TWI533129B - Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines - Google Patents


Info

Publication number
TWI533129B
Authority
TW
Taiwan
Prior art keywords
virtual
core
engine
splittable
execution
Prior art date
Application number
TW101110082A
Other languages
English (en)
Other versions
TW201305819A (zh)
Inventor
Mohammad A. Abdallah
Original Assignee
Soft Machines, Inc.
Application filed by Soft Machines, Inc.
Publication of TW201305819A
Application granted
Publication of TWI533129B

Links

Classifications

    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/3009 Thread control instructions
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30105 Register structure
    • G06F9/30123 Organisation of register space, e.g. banked or distributed register file according to context, e.g. thread buffers
    • G06F9/34 Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes
    • G06F9/3838 Dependency mechanisms, e.g. register scoreboarding
    • G06F9/3851 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • G06F9/3858 Result writeback, i.e. updating the architectural state or memory
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G06F2209/506 Constraint (indexing scheme relating to G06F9/50)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Description

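Executing Instruction Sequence Code Blocks by Using Virtual Cores Instantiated by Partitionable Engines
This application claims priority to commonly assigned, co-pending U.S. Provisional Patent Application No. 61/467,944, entitled "EXECUTING INSTRUCTION SEQUENCE CODE BLOCKS BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES", filed by Mohammad A. Abdallah on March 25, 2011, which is incorporated herein in its entirety.
【Cross-Reference to Related Applications】
This application is related to commonly assigned, co-pending U.S. Patent Application Publication No. 2009/0113170, entitled "APPARATUS AND METHOD FOR PROCESSING AN INSTRUCTION MATRIX SPECIFYING PARALLEL AND DEPENDENT OPERATIONS", filed by Mohammad A. Abdallah on April 12, 2007, which is incorporated herein in its entirety.
This application is related to commonly assigned, co-pending U.S. Patent Application Publication No. 2010/0161948, entitled "APPARATUS AND METHOD FOR PROCESSING COMPLEX INSTRUCTION FORMATS IN A MULTITHREADED ARCHITECTURE SUPPORTING VARIOUS CONTEXT SWITCH MODES AND VIRTUALIZATION SCHEMES", filed by Mohammad A. Abdallah on November 14, 2007, which is incorporated herein in its entirety.
The present invention relates generally to digital computer systems, and more particularly to a system and method for selecting instructions comprising an instruction sequence.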
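Processors are required to handle multiple tasks that are either dependent on one another or totally independent. The internal state of such a processor usually consists of registers that may hold different values at each particular instant of program execution. At each instant of program execution, this internal state image is called the architecture state of the processor.
When code execution is switched to run another function (e.g., another thread, process or program), the state of the machine/processor has to be saved so that the new function can use the internal registers to build its new state. Once the new function terminates, its state can be discarded, and the state of the previous context is restored and execution resumes. Such a switch process is called a context switch and usually takes tens or hundreds of cycles, especially with modern architectures that employ a large number of registers (e.g., 64, 128, 256) and/or out-of-order execution.
In thread-aware hardware architectures, it is normal for the hardware to support multiple context states for a limited number of hardware-supported threads. In this case, the hardware duplicates all architecture state elements for each supported thread. This eliminates the need for a context switch when executing a new thread. However, this still has multiple drawbacks, namely the area, power and complexity of duplicating all architecture state elements (i.e., registers) for each additional thread supported in hardware. In addition, if the number of software threads exceeds the number of explicitly supported hardware threads, the context switch must still be performed.
This becomes common as parallelism is needed on a fine granularity basis, which requires a large number of threads. Hardware thread-aware architectures with duplicated context-state hardware storage do not help non-threaded software code, and only reduce the number of context switches for software that is threaded. However, those threads are usually constructed for coarse-grain parallelism and incur heavy software overhead for initiation and synchronization, leaving fine-grain parallelism, such as function calls and parallel execution of loops, without efficient threading initiation/auto-generation. These overheads are compounded by the difficulty of automatically parallelizing such code using state-of-the-art compiler or user parallelization techniques for software code that is not explicitly or easily parallelized/threaded.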
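In one embodiment, the present invention is implemented as a method for executing instructions using a plurality of virtual cores for a processor. The method includes receiving an incoming instruction sequence using a global front end scheduler and partitioning the incoming instruction sequence into a plurality of code blocks of instructions. The method further includes generating a plurality of inheritance vectors describing interdependencies between instructions of the code blocks, and allocating the code blocks to a plurality of virtual cores of the processor, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines. The code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors.
Other embodiments of the present invention utilize a common scheduler, a common register file and a common memory subsystem to implement fragmented address spaces for multiple partitionable engines of a processor. The partitionable engines can be used to implement a plurality of virtual cores. Fragmentation enables the scaling of microprocessor performance by allowing additional virtual cores to cooperatively execute instruction sequences. The fragmentation hierarchy can be the same across each cache hierarchy (e.g., L1 cache, L2 cache, and the common register file). The fragmentation hierarchy can divide the address space into fragments using address bits, where the address bits are used such that the fragments lie above cache line boundaries and below page boundaries. Each fragment can be configured to utilize a multiport bank structure for storage.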
前述內容為一總結,因此包含必要的簡化、一般化與細節之省略;因此,本技術專業人士將可瞭解到該總結僅為例示性,而並非為任何型式的限制。僅由該等申請專利範圍所定義之本發明的其它態樣、創新特徵與好處將可在以下提出的非限制性詳細說明中做瞭解。
雖然本發明已經配合一具體實施例做說明,本發明並非要被限制於此處所提出的該等特定型式。相反地,本發明係要涵蓋其它這些選項、修正及同等者,其係被合理地包括在由該等附屬申請專利範圍所定義之本發明的範圍之內。
在以下的詳細說明中,已提出許多特定細節,例如特定方法順序、結構、元件與連接。但是應瞭解到這些與其它特定細節不需要被用來實施本發明之具體實施例。在其它狀況下,熟知的結構、元件或連接已被省略或並未特別詳細說明係為了避免不必要地混淆本說明。
在本說明書中的「一(one、an)具體實施例」係要代表配合該具體實施例所描述的一特定特徵、結構或特性被包括在本發明的至少一具體實施例中。在本說明書中多處有用語「在一具體實施例中」的出現並非一定都參照到相同的具體實施例,也非為與其它具體實施例相互排斥的獨立或其它的具體實施例。再者,所述之多種特徵可由一些具體實施例呈現而不出現於其它具體實施例。同樣地,所述之多種需求可為一些具體實施例的需求而非其它具體實施例所需要。
該等詳細說明的一些部份在以下係以程序、步驟、邏輯方塊、處理,以及其它對於一電腦記憶體內資料位元之作業的符號表示來呈現。這些說明及表示為在資料處理技術中那些專業人士所使用來最佳地傳遞他們工作的實質內容到本技藝中其他專業人士的手段。概言之,在此處的程序、電腦可執行步驟、邏輯方塊及程序等,其應視為可達到所想要結果之步驟或指令的一自我符合的順序。該等步驟為那些需要實體量的實體操縱。通常但非必要,這些數量可採取一電腦可讀取儲存媒體的電子或磁性信號之 型式,並能夠在一電腦系統中被儲存、轉換、組合、比較,及另可進行操縱。主要為了共通用法的原因,較為方便地是將這些信號稱為位元、數值、元件、符號、字元、項目、數目或類似者。
但是應要注意到所有這些及類似術語係要關聯於該等適當實體數量,並僅為應用到這些數量的便利標記。除非在以下討論中可瞭解者之外有特定地陳述,將可瞭解到在整個本發明討論中所利用的術語,例如「處理」或「存取」或「寫入」或「儲存」或「重製」或類似者,皆代表一電腦系統或類似的電子運算裝置之動作及程序,其可操縱及轉換表示成該電腦系統之暫存器及記憶體及其它電腦可讀取媒體中的實體(電子)數量的資料成為類似地表示成在該電腦系統記憶體、或暫存器、或其它像是資訊儲存、傳輸或顯示裝置內的實體數量之其它資料。
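Embodiments of the present invention utilize a common global front end scheduler, a plurality of segmented register files, and a memory subsystem to implement fragmented address spaces for multiple cores of a multicore processor. In one embodiment, fragmentation enables the scaling of microprocessor performance by allowing additional virtual cores (e.g., soft cores) to cooperatively execute instruction sequences comprising one or more threads. The fragmentation hierarchy is the same across each cache hierarchy (e.g., L1 cache, L2 cache, and the common register file). The fragmentation hierarchy divides the address space into fragments using address bits, where the address bits are used such that the fragments are identified by bits that are above cache line boundaries and below page boundaries. Each fragment is configured to utilize a multiport bank structure for storage. Embodiments of the present invention are further described in FIGS. 1A and 1B below.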
圖1A所示為根據本發明一具體實施例之一種處理器的概述圖。如圖1A所示,該處理器包括一通用前端提取與排程器10及複數個可分割引擎11-14。
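FIG. 1A shows an overview of the manner in which the global front end generates code blocks and inheritance vectors to support the execution of code sequences on their respective partitionable engines. Each of the code sequences 20-23 can belong to the same logical core/thread or to different logical cores/threads, depending upon the particular virtual core execution mode. The global front end fetch and scheduler processes the code sequences 20-23 to generate code blocks and inheritance vectors. These code blocks and inheritance vectors are allocated to the particular partitionable engines 11-14 as shown.
The partitionable engines implement virtual cores in accordance with a selected mode. A partitionable engine includes a segment, a fragment and a number of execution units. The resources within the partitionable engines can be used to implement virtual cores that have multiple modes. As provisioned by the virtual core mode, one soft core, or many soft cores, can be implemented to support one logical core/thread. In the FIG. 1A embodiment, depending on the selected mode, the virtual cores can support one logical core/thread or four logical cores/threads. In an embodiment where the virtual cores support four logical cores/threads, the resources of each virtual core are spread across each of the partitionable engines. In an embodiment where the virtual cores support one logical core/thread, the resources of all the engines are dedicated to that core/thread. The engines are partitioned such that each engine provides a subset of the resources that comprise each virtual core. In other words, a virtual core comprises a subset of the resources of each of the engines 11-14. Communication between the resources of each of the engines 11-14 is provided by a global interconnection structure 30 in order to facilitate this process. Alternatively, the engines 11-14 can be used to implement a physical mode where the resources of the engines 11-14 are dedicated to support the execution of a dedicated core/thread. In this manner, the soft cores implemented by the engines comprise virtual cores that have resources spread across each of the engines. The virtual core execution modes are further described in the figures that follow.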
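The partitioning described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the description does not say how resources are counted, so the notion of numbered "resource units" per engine and the function name `build_virtual_cores` are hypothetical.

```python
def build_virtual_cores(num_engines, units_per_engine, num_logical_cores):
    """Give every virtual core an equal slice of EVERY engine.

    Hypothetical model: each engine owns `units_per_engine` numbered
    resource units; a virtual core is a dict {engine_id: [unit ids]}.
    """
    if units_per_engine % num_logical_cores:
        raise ValueError("units must divide evenly among logical cores")
    share = units_per_engine // num_logical_cores
    cores = []
    for core in range(num_logical_cores):
        # The same slice position is taken from each engine, so the
        # virtual core's resources are spread across all engines.
        slice_ids = list(range(core * share, (core + 1) * share))
        cores.append({engine: list(slice_ids) for engine in range(num_engines)})
    return cores


# Four logical cores: each virtual core holds a quarter of each engine.
four_cores = build_virtual_cores(num_engines=4, units_per_engine=8, num_logical_cores=4)
# One logical core: all resources of all engines serve that core.
one_core = build_virtual_cores(num_engines=4, units_per_engine=8, num_logical_cores=1)
```

With one logical core the single virtual core spans every unit of engines 11-14, which matches the single-thread mode in the text; with four, each virtual core still touches all four engines but only a slice of each.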
必須注意到在一種習用的核心實施中,僅有一核心/引擎內的資源僅被分配至一邏輯執行緒/核心。相反地,在本發明之具體實施例中,任何引擎/核心之該等資源可被分割成與其它引擎/核心分割共同地實體化被分配至一邏輯執行緒/核心的一虛擬核心。此外,本發明之具體實施例可實施多種虛擬執行模式,其中那些相同引擎可被分割成支援許多專屬的核心/執行緒、許多動態分配的核心/執行緒、或是所有引擎之所有該等資源支援一單一核心/執行緒之執行的一具體實施例。這些具體實施例在下述之說明中做進一步說明。
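FIG. 1B shows an overview diagram of partitionable engines and their components, including segmented scheduler and register files, global interconnects and a fragmented memory subsystem for a multicore processor in accordance with one embodiment of the present invention. As depicted in FIG. 1B, four fragments 101-104 are shown. The fragmentation hierarchy is the same across each cache hierarchy (e.g., L1 cache, L2 cache, and the load store buffer). Data can be exchanged between each of the L1 caches, each of the L2 caches and each of the load store buffers via the memory global interconnect 110a.
The memory global interconnect comprises a routing matrix that allows a plurality of cores (e.g., the address calculation and execution units 121-124) to access data that may be stored at any point in the fragmented cache hierarchy (e.g., L1 cache, load store buffer and L2 cache). FIG. 1B also depicts the manner whereby each of the fragments 101-104 can be accessed by the address calculation and execution units 121-124 via the memory global interconnect 110a.
The execution global interconnect 110b similarly comprises a routing matrix that allows the plurality of cores (e.g., the address calculation and execution units 121-124) to access data that may be stored in any of the segmented register files. Thus, the cores have access to data stored in any of the fragments and to data stored in any of the segments via the memory global interconnect 110a or the execution global interconnect 110b. Additionally, it should be noted that in one embodiment, another global interconnect exists between each of the common partition fetch and schedulers. This is shown by the horizontal arrows connecting each common partition fetch and scheduler.
FIG. 1B further shows a global front end fetch and scheduler 150, which has a view of the entire machine and which manages the utilization of the register file segments and the fragmented memory subsystem. Address generation comprises the basis for fragment definition. The global front end fetch and scheduler functions by allocating instruction sequences to each segment's partition scheduler. The common partition scheduler then dispatches those instruction sequences for execution on the address calculation and execution units 121-124.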
必須注意到在一具體實施例中,該等共用分割提取與排程器之功能性可被加入到通用前端排程器150。在這種具體實施例中,該等節段未包括個別的共用分割提取與排程器,且它們之間將不需要一內連線。
此外,必須注意到圖1A所示的該等可分割引擎可用一階層方式巢化。在這種具體實施例中,一第一級可分割引擎將包括一局部前端提取與排程器,及與其連接的多個次級可分割引擎。
圖2所示為根據本發明一具體實施例之排程器流程圖。如圖2所示,顯示一桶緩衝器(bucket buffer),其包括推測式執行緒桶指標、桶來源及目的地清單。該等排程器與執行桶包括一桶分派選擇器及該虛擬暫存器匹配與讀取,其包括一暫存器階層與一暫存器快取的可能性。該後端為已執行的桶被記錄及在汰除之前強制異常排序的地方。該暫存器階層/快取亦做為該等執行的桶結果之一中間儲存,直到它們為非推測性,並可更新該架構狀態。下述揭示一種該前端、該分派階段與已執行的桶被記錄處的該後端的可能實施。
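FIG. 2 shows how the concept of a bucket buffer managing a small number of closely coupled threads is scaled into hardware circuits that manage multiple bucket buffers and threads. Those circuits, which can be expanded to process larger numbers of threads that may have less close interaction, are described as a global front end (e.g., the global front end scheduler 150 shown in FIG. 1B).
The process starts by fetching a new thread matrix/bucket/block, and then the new thread bucket is assigned into a vacant bucket slot in the bucket buffer. Each of the thread allocation pointers in the thread allocation pointer array 852 composes an interval of buckets in which the thread is allowed physically to place its blocks/buckets of instructions. Each of those threads keeps allocating buckets into the bucket buffer array inside its corresponding interval of contiguous space in round-robin fashion. The buckets/blocks in each thread space are assigned a new number 852 that is incremented each time a new bucket/block is assigned. Each valid source in the bucket 850 has a valid read bit "Rv" indicating that this source is needed by the instructions inside this bucket. In the same way, each destination register that is to be written back by instructions in this bucket has a valid bit "Wv" set in the bucket and has a field in a destination inheritance vector 853. When a new bucket is to be fetched into the bucket buffer, it inherits the destination inheritance vector from the previously allocated bucket pointed to by the thread bucket allocation pointer 852. The inheritance vector is copied from the previously allocated bucket, and then it overwrites those valid destination fields that correspond to the registers which will be updated by those bucket instructions. The valid destinations are labeled with the current bucket number, while the invalid destinations are copied from the corresponding inheritance vector inside the bucket. Then the thread bucket pointer is updated for the newly fetched bucket by incrementing its pointer (it wraps around within its interval).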
在該桶分派與執行階段中,每當一桶被執行而沒有任何異常處理時,則桶執行旗標(包含該桶編號)854被設定,並廣播到整個該桶緩衝器,並在每一桶內被閂鎖/監視,其具有以該桶編號為來源的一來源。亦可能連同該桶編號傳送其它相關的資訊,例如關於虛擬暫存器位置的資訊。當該等來源桶的所有該等執行旗標被設置在一桶內時,則桶預備位元855被設定,且該桶預備好被分派與執行。當該桶執行而沒有任何異常且其預備好以該程式的序列順序來更新該架構狀態時,則其汰除該桶,且汰除執行緒指標857被遞增到該陣列中的下一個桶。該汰除的桶位置可被指定給一新的桶。
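As a rough illustration of the ready-bit mechanism in the dispatch stage, the sketch below models a bucket that snoops broadcast bucket numbers and becomes ready once every source bucket has executed. The class and function names (`Bucket`, `broadcast_executed`) and the use of a Python set for pending sources are assumptions for illustration; the hardware described uses latched execution flags per source.

```python
class Bucket:
    """A bucket waiting on the buckets that produce its sources."""

    def __init__(self, number, source_buckets):
        self.number = number
        # Bucket numbers whose execution flags have not been seen yet.
        self.pending = set(source_buckets)

    @property
    def ready(self):
        # Models the bucket ready bit: set when no source is outstanding.
        return not self.pending

    def snoop(self, executed_number):
        # Latch the broadcast execution flag if it matches a source.
        self.pending.discard(executed_number)


def broadcast_executed(bucket_buffer, executed_number):
    """Broadcast an executed bucket number to the whole bucket buffer."""
    for bucket in bucket_buffer:
        bucket.snoop(executed_number)


consumer = Bucket(number=5, source_buckets={3, 4})
independent = Bucket(number=6, source_buckets=set())
buffer = [consumer, independent]

broadcast_executed(buffer, 3)
ready_after_first = consumer.ready   # still waiting on bucket 4
broadcast_executed(buffer, 4)
ready_after_second = consumer.ready  # all sources executed
```

Retirement in program order and exception handling are deliberately left out of this sketch.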
那些緊密相關的執行緒皆可在該矩陣/桶/區塊緩衝器內共存;每一執行緒將佔據屬於該執行緒的連續桶之一間隔。該執行緒的分配指標以一循環方式移動到桶的此間隔內,以該所述的循環方式提取新的指令桶,並將其分配在該執行緒間隔內。利用這種間隔區段化,該整個桶緩衝器利用桶的不同或相等間隔長度來動態地區分。
此處所介紹的該繼承向量的概念係針對該指令桶以及該執行緒。每一指令矩陣/區塊/桶寫入到該等架構性暫存器當中特定的暫存器當中。在分配階段中每一新的桶更新此繼承向量,其寫入其本身的該執行緒與桶編號到此向量中,而使得其並未寫入其中的該等暫存器之該等欄位保持未更新。此桶繼承向量B_iv 856以程式順序由每一桶轉送至下一個桶。在圖2中,如果在該矩陣中該等指令寫入 到那些暫存器中時,每一矩陣將其本身的編號寫入到該等架構目的地暫存器中,否則其繼承來自在該執行緒中該先前的桶之該B_iv的該數值。
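The inheritance rule for buckets can be sketched in a few lines: a new bucket copies the previous bucket's vector and stamps its own number on the registers it writes. This is a minimal model, assuming eight architectural registers and a plain Python list as the vector; the function name `allocate_bucket` is hypothetical.

```python
def allocate_bucket(prev_vector, bucket_number, dest_regs):
    """Return the inheritance vector for a newly allocated bucket.

    prev_vector[r] holds the number of the last bucket that writes
    register r (None if no earlier bucket in the thread writes it).
    """
    vector = list(prev_vector)          # inherit from the previous bucket
    for reg in dest_regs:
        vector[reg] = bucket_number     # overwrite valid destination fields
    return vector


NUM_REGS = 8  # assumed register count, for illustration only
initial = [None] * NUM_REGS
b0_vector = allocate_bucket(initial, 0, dest_regs=[1, 2])
b1_vector = allocate_bucket(b0_vector, 1, dest_regs=[2, 5])

# A later bucket reading register 2 depends on bucket 1;
# one reading register 1 still depends on bucket 0.
producer_of_r2 = b1_vector[2]
producer_of_r1 = b1_vector[1]
```

Copying rather than mutating the previous vector keeps each bucket's view stable, which is what lets a later bucket look up the producer of each of its sources.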
圖3所示為根據本發明一具體實施例之示例性硬體電路圖,其中顯示有儲存運算子與結果的一分段的暫存器檔案並具有一內連線。圖3所示為經由該執行通用內連線耦合至複數個執行單元的一運算子結果緩衝器。
圖4所示為根據本發明一具體實施例之一通用前端排程器之示意圖。該通用前端排程器設置成處理可能具有較少緊密互動之較多數目的執行緒(例如圖1所示之排程器150中的通用前端)。此圖所示為來自一邏輯核心的一序列的指令如何被分配橫跨許多虛擬核心。此程序將針對存在於該機器中每一邏輯核心來重複。必須注意到圖4的「引擎」包含一虛擬核心的該等組件,其中該暫存器檔案被明確地描述成顯示在該暫存器檔案階級處虛擬核心間的通訊之態樣。
例如,如圖4所示,該通用前端排程器可處理一執行緒標頭902,但並不需要該執行緒內該等實際指令來強制進行橫跨那些遠離的執行緒之相關性檢查。該執行緒的標頭與其桶的該等子標頭僅包含關於那些執行緒與桶寫入(至那些指令之目的地暫存器)的該等架構暫存器之資訊。在那些標頭中不需要包括實際指令或那些指令的來源。實際上其足以列出那些目的地暫存器或一位元向量,其中每 一個別位元針對為一指令之一目的地的每一暫存器做設定。該標頭並不需要被實際上放置成該等指令的一標頭;其可為任何格式的封包,或該等執行緒內該等指令之該等目的地暫存器之小型化表示,其可能或可能不儲存有該等指令資訊之其餘部份。
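This global front end fetches only the headers of the threads/blocks in program order and generates dynamic thread and/or bucket inheritance vectors 901 (Tiv and/or Biv). Each time a new thread is allocated, those inheritance vectors are forwarded by keeping the old fields that the current thread bucket will not write to or update, as shown by 903. Those inheritance vectors are distributed to a large number of engines/cores or processors 904, each of which might include a local front end and a fetch unit (which will fetch and store the actual instructions that produce the dependency vector for each bucket) and a local matrix/block/bucket buffer with local register files 905. The local front ends then fetch the actual instructions and use the information from the inheritance vectors obtained from the global front end to fill the dependency information for the instruction sources of the instructions that are brought into those engines for execution. FIG. 4 illustrates a global front end implementation and the way it disseminates the inheritance vectors to the different engines 904 using only concise information about the instructions (e.g., just the registers that those instructions write to). Other information that is helpful to place in the header is information about changes in the control path within or across the threads. A global branch predictor can be used to predict the flow of control across those threads, so such headers can include the branching destinations and offsets. In addition to the branch predictor determining control flow, the hardware/compiler can decide to dispatch independent threads across the two control paths of a branch. In such a case, it will later merge the execution of those two paths using the inheritance vector. FIG. 4 also shows the forwarding process when a header of a new thread is fetched by the global front end. Thread 2 (906), for example, will update the corresponding inheritance vector 901 that is forwarded to it, resulting in vector 910, in which registers 1, 2, 3, 4, 6, 0 and 7 are updated with T2 labels. Note that in 910, register 5 was not written by T2 buckets and thus its label was inherited from a previous inheritance vector.
One interesting observation is that the register files allow cross communication among the cores/engines. An early request (to reduce the access latency) for the registers that are needed from across engines can be placed as soon as the instruction buckets of the thread are fetched and allocated in the local bucket buffer. At that time the source dependency information is populated, such that cross-engine thread references can be issued, probably long before the actual instructions are dispatched for execution. In any case, the instruction will not be dispatched until the cross-referenced source is forwarded and has arrived. This cross-referenced source can be stored in the local multithreaded register file or register cache. Alternatively, this cross-referenced source can be stored in a buffer similar to the load store buffer (it can reuse the load store buffer physical storage and dependency check mechanisms, but as a register load instead of a memory load). Many topologies can be used to connect the register files across the engines/cores, which may be a ring topology, a crossbar topology or a mesh routed interconnect.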
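The thread 2 example can be replayed with a header that is just a bit vector of destination registers. The sketch below is an assumption-laden illustration: the contents of the incoming vector (which earlier thread last wrote each register) are made up, since the text only states the outcome for vector 910, and `forward_thread_vector` is a hypothetical name.

```python
def forward_thread_vector(prev_vector, thread_tag, dest_mask):
    """Forward a thread inheritance vector through one thread header.

    dest_mask has bit r set when the thread writes register r; those
    fields take the new tag, all others keep the inherited tag.
    """
    return [
        thread_tag if (dest_mask >> reg) & 1 else old_tag
        for reg, old_tag in enumerate(prev_vector)
    ]


# Incoming vector 901: the tags here are invented for the example.
vector_901 = ["T0", "T1", "T1", "T0", "T1", "T1", "T0", "T0"]

# Header of thread 2: writes registers 0, 1, 2, 3, 4, 6 and 7 (not 5).
t2_mask = 0b11011111

vector_910 = forward_thread_vector(vector_901, "T2", t2_mask)
```

Only the header is consulted here, which mirrors the point in the text that the global front end needs no actual instructions or sources to maintain cross-thread dependencies.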
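The following discussion illustrates how register file segmentation can be used inside an engine and also across engines. When the bucket is dispatched, its sources are sent (simultaneously or sequentially) to both the register file and the register cache. If the register file is physically unified and has direct support for threading, then the operand is read directly from the corresponding thread register section. If the register file is a virtual register, including a physically segmented register file that uses tags, then a tag match has to be done as a part of the virtual register read. If the tag matches, then the read happens from the segmented register file.
Disclosed is a register architecture that supports software threads, hardware-generated threads, VLIW execution, SIMD and MIMD execution, and emulation of out-of-order superscalar execution. Although it is physically segmented, it looks like a unified architecture resource. This segmented register is part of the virtual register file, which might include a register hierarchy and a register cache as well as mechanisms to store and check register tags. The tag access can be eliminated if we use a location-based scheme that takes advantage of the dependency inheritance vector. The scheme works such that when the executed bucket number is broadcast during the dispatch stage, all the sources of subsequent instructions perform a CAM (content addressable match), which compares their source buckets with the just dispatched/executed bucket to set the ready flag for that source. Here the physical location where that bucket executed can also be propagated along with the register number, so that any ambiguity is resolved.
For example, consider an implementation with four register file segments, each containing 16 registers. Upon dispatching a bucket #x to segment 2, the bucket number x is broadcast to the bucket buffer, and the segment #2 is broadcast with it, such that all sources that have a dependency on bucket x will record that it wrote all its registers in segment 2. When the time comes to dispatch those instructions, they know that they need to read their register from segment 2 and not any other segment, even if the same register number exists in the other segments. This also applies to the register cache to avoid using tags. We can extend this concept to the global front end where, in addition to the thread information, the inheritance vector can specify in which engine the instruction bucket writing to this register was allocated.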
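The four-segment example can be sketched as follows. This is a simplified software model under assumptions: the names `SourceOperand`, `broadcast_dispatch` and `read_operand` are hypothetical, and the loop over sources stands in for the parallel CAM compare that hardware would perform.

```python
NUM_SEGMENTS = 4
REGS_PER_SEGMENT = 16


class SourceOperand:
    """A source that waits on a producing bucket and remembers where it ran."""

    def __init__(self, reg, producer_bucket):
        self.reg = reg
        self.producer_bucket = producer_bucket
        self.segment = None   # filled in by the broadcast
        self.ready = False


def broadcast_dispatch(sources, bucket_number, segment):
    """Broadcast (bucket number, segment) at dispatch; matching sources latch it."""
    for src in sources:
        if src.producer_bucket == bucket_number:   # CAM-style compare
            src.segment = segment
            src.ready = True


def read_operand(register_files, src):
    """Read from the recorded segment only, so no tag lookup is needed."""
    return register_files[src.segment][src.reg]


# Register 7 exists in every segment, with a different value in each.
register_files = [[0] * REGS_PER_SEGMENT for _ in range(NUM_SEGMENTS)]
for seg in range(NUM_SEGMENTS):
    register_files[seg][7] = 100 + seg

src = SourceOperand(reg=7, producer_bucket=42)
broadcast_dispatch([src], bucket_number=42, segment=2)
value = read_operand(register_files, src)
```

Because the segment travels with the bucket number, the consumer reads segment 2's copy of register 7 even though the same register number exists in the other three segments.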
圖5所示為根據本發明一具體實施例中橫跨許多虛擬核心之指令分配的另一種實施。圖5顯示一運行時間最佳化器排程器550,其藉由分佈繼承向量編碼節段至該等虛擬核心來運作。在一具體實施例中,該最佳化器察看指令的一些程式碼區塊,並重新排程指令橫跨所有該等程式碼區塊,以產生程式碼節段與繼承向量。該最佳化器的目標將是使得程式碼節段在它們個別的虛擬核心上重疊執行之執行效率可最大化。
圖6為根據本發明一具體實施例中具有相對應複數的暫存器檔案與運算子結果緩衝器之複數個暫存器節段。如圖6所示,該執行通用內連線可連接每一暫存器節段至複數個位址計算與執行單元。
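The register segments of FIG. 6 can be used to implement one of three execution modes. In the first, the matrices are grouped together by the compiler/programmer to form a MIMD super instruction matrix. In the second, each matrix is executed independently in a threaded mode, where separate threads execute simultaneously on each of the four hardware sections. The last possible execution mode is the ability to dynamically execute four different instruction matrices from a single thread, using a hardware dependency check to ensure that no dependency exists between those different matrices that execute simultaneously on the four different hardware sections.
The register files in FIG. 6 may alternately be configured depending upon the execution mode. In one mode, the register files are viewed either as a MIMD sectioned register file serving a MIMD width of four sections, or they serve as four individual register files, each serving a separate thread. The register files can also support a dynamic execution mode where the four sections are one unified register file, in which data written to any register in a particular section is accessible by all units in the other sections. Switching between those modes can be seamless, as different execution modes can alternate between individual thread baseline instruction matrices and MIMD super instruction matrix threads.
In a multithread execution mode, each register file and its execution unit that executes a thread is totally independent of the other register files and their threads. This is similar to each thread having its own register state. However, dependency between those threads can be specified. Each matrix that belongs to a thread will execute in the execution unit of that thread's register file. If only one thread or a non-threaded single program is executed on the hardware, then the following method is used to allow parallel matrices belonging to that single thread/program to access the results written into the registers in the other sections. The way this is done is by allowing any matrix writing results into any one of the four register file sections to generate copies of those registers in the other register file sections. Physically, this is done by extending the write ports of each section into the remaining sections. However, this is not scalable, as we cannot build an efficient register file with each memory cell having as many as four times the write ports needed for one section alone. We present a mechanism where the register file is built such that it will not be impacted by such a single-thread register-broadcast extension.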
必須注意到關於在本發明之具體實施例中所使用的暫存器節段之額外態樣可見於Mohammad A.Abdallah於2007年11月14日所立案的美國專利申請案編號2010/0161948,其名為「在支援多種內容切換模式與虛擬化方案之多執行緒架構中處理複雜指令格式的裝置與方法」(APPARATUS AND METHOD FOR PROCESSING COMPLEX INSTRUCTION FORMATS IN A MULTITHREADED ARCHITECTURE SUPPORTING VARIOUS CONTEXT SWITCH MODES AND VIRTUALIZATION SCHEMES)。
圖7為根據本發明一具體實施例之一多核心處理器之一片段化記憶體子系統之細部示意圖。圖7概略顯示出執行緒之間及/或負載與儲存器之間該同步化方案的一種完整方案與實施。該方案描述用於橫跨負載/儲存架構及/或橫跨記憶體參照及/或執行緒的記憶體存取之記憶體參照的同步化與歧義消除的一種較佳的方法。在圖7中,我們顯示了暫存器檔案之多個節段(位址及/或資料暫存器)、執行單元、位址計算單元、及第1級快取及/或負載儲存緩衝器與第2級快取及位址暫存器內連線1200與位址計算單 元內連線1201的片段。那些片段元件可藉由片段化與分佈其集中的資源到數個引擎當中來被建構在一核心/處理器之內,或者它們可由在一多核心/多處理器組態中的不同核心/處理器之元件來建構。那些片段1211之一在圖中顯示為片段編號1;該等片段可調整成一大的數目(概略為圖中所示的N個片段)。
此機制亦用於那些引擎/核心/處理器之間該記憶體架構的一種同調性方案。此方案開始於來自在一片段/核心/處理器中該等位址計算單元之一者的一位址要求。例如,假設該位址由片段1(1211)所要求。其可使用屬於其本身片段的位址暫存器及/或使用位址內連線匯流排1200橫跨其它片段的暫存器來得到並計算其位址。在計算該位址之後,其產生用於存取快取與記憶體之32位元位址或64位元位址的該參照位址。此位址通常被分段成一標籤欄位與一集合與線欄位。此特定片段/引擎/核心將儲存該位址到其負載儲存緩衝器及/或L1及/或L2位址陣列1202當中,同時其將藉由使用一壓縮技術產生該標籤的一壓縮版本(其比該位址之原始標籤欄位具有較少數目的位元)。
有更多不同的片段/引擎/核心/處理器將使用該集合欄位或該集合欄位的一子集合做為一索引來辨識該位址被維護在那一個片段/核心/處理器中。此藉由該位址集合欄位位元之該等片段的索引化可確保在一特定片段/核心/引擎中該位址之擁有權的排除性,即使對應於該位址之記憶體資料可存在於另一個或多個其它片段/引擎/核心/處 理器中。即使該等位址CAM/標籤陣列1202/1206被顯示在要耦合於資料陣列1207的每一片段中,它們可能僅耦合在實際上放置或佈置的鄰近處,或甚至事實上兩者屬於一特定引擎/核心/處理器,但在被保持在該等位址陣列中的位址與在一片段內該等資料陣列中的資料之間並無關係。
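FIG. 8 shows a diagram depicting how bits of an address can be used by address generation to enumerate fragments in accordance with one embodiment of the present invention. In the present embodiment, fragments are defined by the address bits that are above cache line boundaries and below page boundaries, as depicted in FIG. 8. The present invention advantageously stays below the page boundaries to avoid causing TLB misses during the translation from virtual addresses to physical addresses. The process stays above the cache line boundary so that complete cache lines fit correctly within the hardware cache hierarchy. For example, in a system that employs 64-byte cache lines, the fragment boundary would avoid the last six address bits. In comparison, in a system that employs 32-byte cache lines, the fragment boundary would avoid the last five bits. Once defined, the fragment hierarchy is the same across all cache hierarchies of the processor.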
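A minimal sketch of fragment selection from address bits follows. It assumes, as the summary of this document states, that the fragment bits sit directly above the cache-line offset and below the page boundary; the 4 KB page size, the four-fragment count and the function name `fragment_index` are illustrative assumptions, and all sizes are taken to be powers of two.

```python
def fragment_index(address, num_fragments=4, line_bytes=64, page_bytes=4096):
    """Pick the fragment that owns `address`.

    The low log2(line_bytes) bits are skipped so a whole cache line
    stays in one fragment; the selected bits must stay inside the page
    offset so they are identical in virtual and physical addresses.
    """
    line_bits = line_bytes.bit_length() - 1      # 6 for 64-byte lines
    page_bits = page_bytes.bit_length() - 1      # 12 for 4 KB pages
    frag_bits = num_fragments.bit_length() - 1   # 2 for four fragments
    if line_bits + frag_bits > page_bits:
        raise ValueError("fragment bits would cross the page boundary")
    return (address >> line_bits) & (num_fragments - 1)


same_line = (fragment_index(0x40), fragment_index(0x7F))   # one 64-byte line
next_lines = [fragment_index(a) for a in (0x00, 0x40, 0x80, 0xC0, 0x100)]
small_lines = fragment_index(0x20, line_bytes=32)           # skips 5 bits
```

Consecutive cache lines rotate through the fragments, and every byte of a given line maps to the same fragment, which is the property the cache-line boundary constraint is meant to guarantee.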
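FIG. 9 shows a diagram of how loads and stores are handled by embodiments of the present invention. As depicted in FIG. 9, each fragment is associated with its load store buffer and store retirement buffer. For any given fragment, loads and stores that designate an address range associated with that fragment or another fragment are sent to that fragment's load store buffer for processing. It should be noted that they may arrive out of order, as the cores execute instructions out of order. Within each core, the core has access not only to its own register file but also to each of the other cores' register files.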
本發明之具體實施例實施一分散式負載儲存排序系統。該系統被分佈橫跨多個片段。在一片段之內,局部資料相關性檢查由該片段執行。此係因為該片段僅載入與儲存在該特定片段的該儲存汰除緩衝器之內。此限制了必須察看其它片段來維持資料同調性的需求。依此方式,自一片段內的資料相關性被局部地強制。
關於資料一致性,該儲存分派閘極根據嚴格的程式內順序記憶體一致性規則來強制儲存汰除。儲存為無順序地抵達該負載儲存緩衝器。負載亦為無順序地抵達該負載儲存緩衝器。同時,該等無順序的負載與儲存被轉送至該等儲存汰除緩衝器做處理。必須注意到雖然儲存在一給定片段內依順序汰除,因為它們進入該儲存分派閘極,它們可無順序地來自該等多個片段。該儲存分派閘極強制實施一政策,其可確保即使儲存可無順序地存在於橫跨儲存汰除緩衝器,且即使該等緩衝器可相對於其它緩衝器的儲存為無順序地轉送儲存至該儲存分派閘極,該分派閘極可確保它們被嚴格地依順序轉送至片段記憶體。此係因為該儲存分派閘極具有儲存汰除的一整體概觀,並僅允許儲存依順序橫跨所有該等片段(例如通用地)離開至該記憶體之通用可見側。依此方式,該儲存分派閘極係做為一通用觀察者來確保該等儲存最終橫跨所有片段依序地返回到記憶體。
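The store dispatch gate's role as a global observer can be sketched as a reorder point. The text does not say how the gate knows program order, so this sketch assumes each store carries a global program-order sequence number; that field, the class name `StoreDispatchGate` and the heap-based implementation are all assumptions made for illustration.

```python
import heapq


class StoreDispatchGate:
    """Release stores to memory strictly in global program order."""

    def __init__(self):
        self._pending = []        # min-heap keyed by sequence number
        self._next_seq = 0        # next store allowed to leave
        self.memory_order = []    # stores as seen by the visible side

    def accept(self, seq, fragment, address, value):
        """Take a store forwarded by some fragment's retirement buffer."""
        heapq.heappush(self._pending, (seq, fragment, address, value))
        released = []
        # Drain every store that is now next in program order.
        while self._pending and self._pending[0][0] == self._next_seq:
            released.append(heapq.heappop(self._pending))
            self._next_seq += 1
        self.memory_order.extend(released)
        return released


gate = StoreDispatchGate()
first = gate.accept(seq=1, fragment=2, address=0x80, value=11)   # held back
second = gate.accept(seq=0, fragment=0, address=0x00, value=10)  # frees both
```

Stores from different fragments may reach the gate in any order, but `memory_order` only ever grows in sequence order, which is the in-order guarantee the paragraph describes.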
圖10為根據本發明一具體實施例中那些片段可被分成兩個或更多區域之方法。圖10所示為一單一片段可被分成多個區域之方法。區域區分可經由該位址產生程序來實施。區域區分改變了負載儲存檢查必須在一片段內完成的方式,因為在此例中相對於橫跨該整個片段,它們僅必須針對每個區域來完成。區域區分亦有好處在於其可使得單一埠的記憶體之行為可像是多埠記憶體,其中該單一埠係對不同的區域來存取。
圖11為根據本發明一具體實施例中該處理器之一種作業模式,其中該等可分割引擎之該等硬體資源係用於做為類似在執行應用程式中的邏輯核心。在此具體實施例中,該等虛擬核心之該等引擎的該等硬體資源被設置成實體核心。在圖11的模式中,其每一實體核心被設置成做為一邏輯核心。多執行緒應用程式與多執行緒功能性係根據該應用程式的軟體之執行緒化的可程式性。
圖12為根據本發明一具體實施例中該處理器之一種作業模式,其中軟核心被用於像是在執行應用程式時的邏輯核心來運作。在此具體實施例中,虛擬核心的該等可分割引擎將支援複數個軟核心。在圖12的模式中,每一軟核心被設置成做為一邏輯核心。多執行緒應用程式與多執行緒功能性係根據該應用程式的軟體之執行緒化的可程式性。
圖13為根據本發明一具體實施例中該處理器之一種作業模式,其中該等軟核心被用於像是在執行應用程式時的一單一邏輯核心來運作。在圖13的模式中,每一軟核心被設置成做為一單一邏輯核心。在這種實施中,一單一執行緒的應用程式將其指令序列分開,並分配在該等虛擬核心之間,其中它們被協同地執行來達成高單一執行緒效能。依此方式,單一執行緒的效能可隨著加入額外的軟核心來調整。
在選擇該處理器之操作模式時可使用一些策略。對於具有大量引擎(例如8引擎、12引擎等)之一處理器,一些軟核心可設置成做為一單一邏輯核心,而該等其餘的核心可在該等其它模式中運作。此屬性可允許一種資源的智慧型分割來確保該硬體之最大利用率及/或最低浪費的電力消耗。例如,在一具體實施例中,核心(例如軟或邏輯核心)可根據正在執行的應用程式之種類來以每個執行緒為基礎做分配。
圖14為根據本發明一具體實施例中用於支援邏輯核心與虛擬核心功能之片段分段的示例性實施。如上所述,該片段分段化可允許該處理器設置成支援不同的虛擬核心執行模式,如上所述。
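The global interconnect allows a core's threads to access any of the ports 1401. It should be noted that the term "thread" as used herein refers to either instruction sequences from different logical cores, instruction sequences from the same logical core, or some mixture of the two.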
該等執行緒利用埠1401之一來存取該負載儲存緩衝器之方式可根據該等仲裁器之政策而調整,如所示。因此,使用埠1401中任何一者的一執行緒經由埠1402可較大量或較少量地存取該負載儲存緩衝器。該分配的大小與該分配被管理的方式由該仲裁器控制。該仲裁器可動態地根據一特定執行緒的需求而分配存取該等埠。
該負載儲存緩衝器設置成具有散佈橫跨該等埠之複數個入口。至該負載儲存緩衝器之存取由該仲裁器控制。依此方式,該仲裁器可動態地分配在該負載儲存緩衝器中的入口至該等不同的執行緒。
圖14亦顯示了在負載儲存緩衝器與該L1快取之間的該等埠之上的仲裁器。因此,利用上述之該負載儲存緩衝器,使用該等埠1403中任何一者的一執行緒經由埠1404可較大量或較少量地存取該L1快取。該分配的大小與該分配被管理的方式由該仲裁器控制。該仲裁器可動態地根據一特定執行緒的需求而分配存取該等埠。
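The L1 cache is configured to have a plurality of ways spread across the ports. Access to the L1 cache is controlled by the arbiter. In this manner, the arbiter can dynamically allocate ways in the L1 cache to the different threads.
In one embodiment, the arbiters are configured to function with a plurality of counters 1460 that are used for tracking functionality and a plurality of threshold limit registers 1450 that provide a limiting function. The limiting function specifies the maximum resource allocation percentage for a given thread. The tracking function tracks the actual resources allocated to a given thread at any given time. These tracking and limiting functions affect the allocation of the number of per-thread entries, ways, or ports for the load store buffer, L1 cache, L2 cache or the global interconnects. For example, the total number of entries in the load store buffer allocated to each thread can be dynamically checked against a variable threshold. This variable threshold can be updated in accordance with a given thread's forward progress. For example, in one embodiment, threads that are slowed down (e.g., by a large number of L2 misses, etc.) are quantified as making slow forward progress, and thus their respective resource allocation thresholds are lowered, including the entries thresholds, the ways thresholds and the ports thresholds.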
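The interaction of the counters and threshold limit registers can be sketched for load store buffer entries. This is a software model under assumptions: the class name `EntryArbiter`, the 10-point adjustment step and the floor are invented, and raising a threshold again when a thread speeds up is an assumption, since the text only describes lowering it.

```python
class EntryArbiter:
    """Grant load store buffer entries per thread, bounded by a percentage limit."""

    def __init__(self, total_entries, limit_percent):
        self.total = total_entries
        self.limit = dict(limit_percent)            # threshold limit registers
        self.count = {t: 0 for t in limit_percent}  # per-thread counters

    def max_entries(self, thread):
        return self.total * self.limit[thread] // 100

    def request(self, thread):
        """Grant one entry unless the buffer or the thread's limit is exhausted."""
        if sum(self.count.values()) >= self.total:
            return False
        if self.count[thread] >= self.max_entries(thread):
            return False
        self.count[thread] += 1
        return True

    def release(self, thread):
        if self.count[thread] > 0:
            self.count[thread] -= 1

    def report_progress(self, thread, slow, step=10, floor=10):
        """Lower the limit of a slow thread; raise it back otherwise (assumed)."""
        if slow:
            self.limit[thread] = max(floor, self.limit[thread] - step)
        else:
            self.limit[thread] = min(100, self.limit[thread] + step)


arbiter = EntryArbiter(total_entries=48, limit_percent={"A": 50, "B": 50})
granted_to_a = sum(arbiter.request("A") for _ in range(30))
arbiter.report_progress("A", slow=True)   # e.g. many L2 misses
limit_after_slowdown = arbiter.max_entries("A")
```

Entries already held above a lowered limit are not reclaimed in this sketch; the thread simply cannot acquire more until it releases enough to fall below the new threshold.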
圖14亦顯示出一共享的L2快取。在本具體實施例中,該共享的L2快取具有一固定埠配置,而在來自該L1快取的存取之間沒有任何的仲裁。在該處理器上執行的執行緒皆共享存取該L2快取及該L2快取的該等資源。
圖15為根據本發明一具體實施例中實施一多實體對多邏輯模式之一示例性四片段處理器的一片段記憶體。
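One example logical core and its relationship with the resources of the processor is shown by the shading in FIG. 15. In the FIG. 11 mode of operation, the many physical cores to many logical cores mode, wherein the physical cores are used to function like logical cores in executing applications, each logical core is configured to have a fixed ratio of the resources of the load store buffer and the L1 cache. The ports can be specifically assigned to each thread or core. Entries in the load store buffer can be specifically reserved per thread or core. Ways within the L1 cache can be specifically reserved per thread or core. Multithreaded applications and multithreaded functionality depend on the threaded programmability of the application's software. This is shown by the one logical core having an allocated port and an allocated portion of the store buffer and the L1 cache of each of the fragments. In this manner, the logical core comprises a fixed allocated slice of the resources of each fragment.
In one embodiment, in the many physical cores to many logical cores mode, the four fragments can be partitioned in accordance with the number of ports (e.g., ports 1401) that access each fragment. For example, in an embodiment where there are six ports per fragment, the resources of each fragment, and hence the resources of each partition of the engines, can be divided so as to support six physical cores across the four fragments and the four partitions of the engines. Each partition can be allocated its own port. Similarly, the resources of the load store buffer and the L1 cache would be allocated so as to support six physical cores. For example, in an embodiment where the load store buffer has 48 entries, the 48 entries can be allocated such that there are 12 entries per physical core to support a mode where four physical cores are implemented, or such that there are eight entries per physical core where six physical cores are implemented.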
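The 48-entry arithmetic above is a fixed, even split, which can be checked with a short sketch. The function name `entries_per_core` and the choice to reject uneven splits are assumptions for illustration.

```python
def entries_per_core(total_entries, num_physical_cores):
    """Fixed, equal share of load store buffer entries per physical core."""
    if total_entries % num_physical_cores:
        raise ValueError("entries do not divide evenly among the cores")
    return total_entries // num_physical_cores


four_core_share = entries_per_core(48, 4)   # four physical cores
six_core_share = entries_per_core(48, 6)    # six physical cores
```

The same fixed-ratio rule would apply to L1 ways and ports in this mode; the dynamic modes described later replace the fixed quotient with arbiter-managed thresholds.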
圖16為根據本發明另一具體實施例中實施一多實體對多邏輯模式之一示例性四片段處理器的一片段記憶體。
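As with FIG. 15, one example logical core and its relationship with the resources of the processor is shown by the shading in FIG. 16. In the FIG. 11 mode of operation, the many physical cores to many logical cores mode, an entire partition table engine is dedicated to support the execution of a single logical core. This is shown by the shading in FIG. 16. The physical resource engine is used to function like a logical core in executing applications.
FIG. 17 shows the fragmented memory subsystem of an exemplary four fragment processor implementing a many soft cores to many logical cores mode in accordance with one embodiment of the present invention.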
一示例性邏輯核心與其與該處理器的該等資源之關係係如圖17的陰影部所示。在圖12的操作模式中,該多軟核心對多邏輯模式,其中虛擬核心用於類似在執行應用程式中的邏輯核心來運作,該負載儲存緩衝器的該等資源之分配的大小與該分配被管理的方式由該仲裁器控制。該仲裁器可動態地根據一特定執行緒或核心的需求而分配存取該等埠。同樣地,該L1快取的該等資源之分配的大小與該分配被管理的方式由該仲裁器控制。該仲裁器可動態地根據一特定執行緒或核心的需求而分配存取該等埠。因此,在任何給定實例中,該邏輯執行緒/核心(例如陰影部)可使用不同的仲裁器與不同的埠。
依此方式,存取該負載儲存緩衝器的該等資源與存取該L1快取的該等資源可以是更為政策導向,並可更為基於進行轉送進度之各自的執行緒或核心之該等需求。此顯示成一個邏輯核心具有該等片段之每一者的該儲存緩衝器與該L1快取的一動態分配的埠與一動態分配的部份。依此方式,該邏輯核心包含每一片段的該等資源之一非固定動態分配的片層。
圖18為根據本發明一具體實施例中實施一多軟核心對一邏輯核心模式之一示例性四片段處理器的一片段記憶體。
在圖13的操作模式中,該多軟核心對一邏輯核心模式,其中該等軟核心用於類似在執行應用程式中一單一邏輯核心來運作,該等軟核心之每一者設置成協同於該等其它軟核心運作成一單一邏輯核心。一單一執行緒或核心具有該等負載儲存緩衝器之所有該等資源與該等L1快取的所有該等資源。在這種實施中,一單一執行緒的應用程式將其指令序列分開,並分配在該等軟核心之間,其中它們被協同地執行來達成高單一執行緒效能。依此方式,單一執行緒的效能可隨著加入額外的軟核心來調整。此係顯示在圖18中,其中一個示例性邏輯核心與其與該處理器之該等資源的關係藉由遮影該處理器之所有該等資源來顯示。
圖19為根據本發明一具體實施例中實施一多實體對多邏輯模式之一示例性四片段處理器之位址計算與執行單元、運算子/結果緩衝器、執行緒的暫存器檔案與共用分割排程器。
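One example logical core and its relationship with the resources of the processor is shown by the shading in FIG. 19. In the FIG. 11 mode of operation, the many physical cores to many logical cores mode, wherein the physical cores are used to function like logical cores in executing applications, each logical core is configured to have a fixed ratio of the resources of the address calculation units, operand/result buffers, threaded register files, and common partition scheduler. Multithreaded applications and multithreaded functionality depend on the threaded programmability of the application's software. This is shown by the one logical core having an allocated address calculation and execution unit, an allocated threaded register file and an allocated common partition scheduler. In this manner, the logical core comprises a fixed allocated segment. However, in one embodiment, in this mode of operation, the address calculation and execution units can still be shared (e.g., meaning each of the address calculation and execution units would be un-shaded).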
圖20為根據本發明一具體實施例中實施一多實體對多邏輯模式之一示例性四片段處理器之位址計算與執行單元、運算子/結果緩衝器、執行緒的暫存器檔案與共用分割排程器之另一種實施。
一示例性邏輯核心與其與該處理器之該等資源的關係如圖20之陰影部所示。但是在圖20的具體實施例中,一實體核心的該等資源被分散橫跨該等片段之每一者與該等可分割引擎之每一者。此係顯示為一個邏輯核心具有該等位址計算與執行單元之一分配的部份、該等執行緒的暫存器檔案之一分配的部份,及橫跨該等節段之每一者的共用分割排程器之一分配的部份。此外,圖20顯示出一個邏輯核心如何將被分配該等位址計算執行單元之每一者的該等資源之部份。依此方式,該邏輯核心包含該等節段之每一者的一固定分配的部份。
圖21為根據本發明一具體實施例中實施一多軟核心對多邏輯核心模式之一示例性四片段處理器之位址計算與執行單元、暫存器檔案與共用分割排程器。
一示例性邏輯核心與其與該處理器之該等資源的關係如圖21之陰影部所示。在圖12的操作模式中,該多軟核心對多邏輯核心模式,其中該等軟核心用於像是在執行應用程式中的邏輯核心來運作,每一邏輯核心將設置成具有對於該等位址計算單元之任何一者、該等運算子/結果緩衝器之一動態分配部份、執行緒的暫存器檔案與共用分割排程器之一共享的存取。多執行緒應用程式與多執行緒功能性係根據該應用程式的軟體之執行緒化的可程式性。
圖22為根據本發明一具體實施例中實施一多軟核心對一邏輯核心模式之一示例性四片段處理器之位址計算與執行單元、暫存器檔案與共用分割排程器。
一示例性邏輯核心與其與該處理器之該等資源的關係如圖22之陰影部所示。在圖13的操作模式中,該多軟核心對一邏輯核心模式,其中該等軟核心用於像是在執行應用程式中一單一邏輯核心來運作,每一邏輯核心將設置成具有對於所有該等位址計算單元、及所有該等運算子/結果緩衝器、執行緒的暫存器檔案與共用分割排程器之一共享的存取。在這種實施中,一單一執行緒的應用程式將其指令序列分開,並分配在該等虛擬核心之間,其中它們被協同地執行來達成高單一執行緒效能。依此方式,單一執行緒的效能可隨著加入額外的軟核心來調整。
圖23為根據本發明一具體實施例之一示例性微處理器管線2300的示意圖。微處理器管線2300包括一提取模組2301,其實施該程序的功能來辨識與擷取包含一執行的該等指令,如上所述。在圖23的具體實施例中,該提取模組接著為一解碼模組2302、一分配模組2303、一分派模組2304、一執行模組2305與一汰除模組2306。必須注意到微處理器管線2300僅為可實施上述之本發明之具體實施例的功能之管線的一種示例。本技術專業人士將可瞭解到可實施其它的微處理器管線來包括上述之該解碼模組的功能。
為了解釋的目的,前述的說明已經參照特定具體實施例來說明。但是,以上之例示性討論並非窮盡式或限制本發明於所揭示之明確型式。在以上的教示之下可瞭解其有可能許多修改及變化。該等具體實施例係被選擇及描述來最佳地解釋本發明及其實際應用的原理,藉此使得本技術中其它專業人士可在多種具體實施例及多種修正中最佳地利用本發明,使其可適用於所考慮的特定用途。
10‧‧‧通用前端提取與排程器
11-14‧‧‧可分割引擎
20-23‧‧‧程式碼序列
30‧‧‧通用內連線結構
101-104‧‧‧片段
110a‧‧‧記憶體通用內連線
110b‧‧‧執行通用內連線
121-124‧‧‧位址計算與執行單元
150‧‧‧通用前端提取與排程器
550‧‧‧運行時間最佳化器排程器
850‧‧‧桶
852‧‧‧執行緒分配指標陣列
852‧‧‧執行緒桶分配指標
853‧‧‧目的地繼承向量
854‧‧‧桶執行旗標
855‧‧‧桶預備位元
856‧‧‧桶繼承向量B_iv
857‧‧‧汰除執行緒指標
901‧‧‧桶繼承向量
902‧‧‧執行緒標頭
903‧‧‧轉送
904‧‧‧引擎/核心或處理器
905‧‧‧局部暫存器檔案
906‧‧‧執行緒2
910‧‧‧向量
1200‧‧‧位址暫存器內連線
1200‧‧‧位址內連線匯流排
1201‧‧‧位址計算單元內連線
1202‧‧‧位址陣列
1206‧‧‧標籤陣列
1207‧‧‧資料陣列
1211‧‧‧片段
1401、1402、1403、1404‧‧‧埠
1450‧‧‧臨界值限制暫存器
1460‧‧‧計數器
2300‧‧‧微處理器管線
2301‧‧‧提取模組
2302‧‧‧解碼模組
2303‧‧‧分配模組
2304‧‧‧分派模組
2305‧‧‧執行模組
2306‧‧‧汰除模組
本發明藉由範例來例示,但並非限制,在附屬圖面的圖形中類似的參考編號代表類似的元件。
圖1A為該通用前端產生程式碼區塊與繼承向量來在它們個別的可分割引擎上支援程式碼序列之執行的方式概述。
圖1B為根據本發明一具體實施例中針對一多核心處理器之可分割引擎及它們的組件之概要圖,其中包括分段的排程器與暫存器檔案、通用內連線與一片段化記憶體子系統。
圖2為根據本發明一具體實施例之排程器流程圖。
圖3為根據本發明一具體實施例之示例性硬體電路圖,其中顯示有儲存運算子與結果的一分段的暫存器檔案並具有一內連線。
圖4為根據本發明一具體實施例之一通用前端提取及排程器之示意圖。
圖5為根據本發明一具體實施例中橫跨許多虛擬核心之指令分配的另一種實施。
圖6為根據本發明一具體實施例中具有相對應複數的暫存器檔案與運算子及結果緩衝器之複數個暫存器節段。
圖7為根據本發明一具體實施例之一多核心處理器之一片段化記憶體子系統之細部示意圖。
圖8為根據本發明一具體實施例如何使用一位址的位元由位址產生來列舉片段的示意圖。
圖9為本發明之具體實施例如何處理負載與儲存之示意圖。
圖10為根據本發明一具體實施例中那些片段可被分成兩個或更多區域之方法。
圖11為根據本發明一具體實施例中該處理器之一種作業模式,其中虛擬核心被設置成對應於在執行應用程式時邏輯核心之實體核心。
圖12為根據本發明一具體實施例中該處理器之一種作業模式,其中虛擬核心被設置成對應於在執行應用程式時邏輯核心之軟核心。
圖13為根據本發明一具體實施例中該處理器之一種作業模式,其中該等虛擬核心被設置成對應於在執行應用程式時一單一邏輯核心之軟核心。
圖14為根據本發明一具體實施例中用於支援邏輯核心與虛擬核心功能之片段分段的示例性實施。
圖15為根據本發明一具體實施例中實施一多實體對多邏輯模式之一示例性四片段處理器的一片段記憶體。
圖16為根據本發明另一具體實施例中實施一多實體對多邏輯模式之一示例性四片段處理器的一片段記憶體。
圖17為根據本發明一具體實施例中實施一多軟核心對多邏輯核心模式之一示例性四片段處理器的一片段記憶體。
圖18為根據本發明一具體實施例中實施一多軟核心對一邏輯核心模式之一示例性四片段處理器的一片段記憶體。
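FIG. 19 shows address calculation and execution units, operand/result buffers, threaded register files, and common partition schedulers of an exemplary four fragment processor implementing a many physicals to many logicals mode in accordance with one embodiment of the present invention.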
圖20為根據本發明一具體實施例中實施一多實體對多邏輯模式之一示例性四片段處理器之位址計算與執行單元、運算子/結果緩衝器、執行緒的暫存器檔案與共用分割排程器之另一種實施。
圖21為根據本發明一具體實施例中實施一多軟核心對多邏輯模式之一示例性四片段處理器之位址計算與執行單元、暫存器檔案與共用分割排程器。
圖22為根據本發明一具體實施例中實施一多軟核心對一邏輯核心模式之一示例性四片段處理器之位址計算與執行單元、暫存器檔案與共用分割排程器。
圖23為根據本發明一具體實施例之一示例性微處理器管線的示意圖。

Claims (33)

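  1. A method for executing instructions using a plurality of virtual cores of a processor, the method comprising: receiving an incoming instruction sequence using a global front end scheduler; partitioning the incoming instruction sequence into a plurality of code blocks of instructions; generating a plurality of inheritance vectors describing interdependencies between instructions of the code blocks; allocating the code blocks to a plurality of virtual cores of the processor, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein each partitionable engine comprises a segment, a memory fragment, and a plurality of execution units, wherein the resources of each partitionable engine are partitioned to instantiate a virtual core together with partitioned resources of the other partitionable engines, and wherein communication between the resources of each partitionable engine is supported by a global interconnection structure; and executing the code blocks by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors.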
  2. 如申請專利範圍第1項之方法,其中每一可分割引擎節段另包含一共用分割提取與排程器。
  3. 如申請專利範圍第1項之方法,其中每一節段另包含一暫存器檔案。
  4. 如申請專利範圍第1項之方法,其中每一可分割引擎另包含一L1快取片段與L2快取片段與一負載儲存緩衝器。
  5. 如申請專利範圍第1項之方法,其中該等複數個虛擬核心實施一執行模式,其中每一可分割引擎的一實體資源的子集合被分配成支援一邏輯核心的一單一邏輯執行緒之執行。
  6. 如申請專利範圍第5項之方法,該等複數個虛擬核心實施複數個邏輯核心。
  7. 如申請專利範圍第1項之方法,其中該等複數個虛擬核心實施一執行模式,其中每一可分割引擎的實體資源根據一可調整臨界值被動態地分配來支援一單一邏輯核心的一單一邏輯執行緒之執行。
  8. 如申請專利範圍第7項之方法,該等複數個虛擬核心實施複數個邏輯核心。
  9. 如申請專利範圍第1項之方法,其中該等複數個虛擬核心實施一執行模式,其中每一可分割引擎之實體資源的該子集合根據一可調整臨界值被分配來支援一單一邏輯執行緒之執行。
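  10. A system for executing instructions using a plurality of virtual cores of a processor, the system comprising: a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks; and a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein each partitionable engine comprises a segment, a memory fragment, and a plurality of execution units, wherein the resources of each partitionable engine are partitioned to instantiate a virtual core together with partitioned resources of the other partitionable engines, wherein communication between the resources of each partitionable engine is supported by a global interconnection structure, and wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors.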
  11. 如申請專利範圍第10項之系統,其中每一可分割引擎節段另包含一共用分割提取與排程器。
  12. 如申請專利範圍第10項之系統,其中每一節段另包含一暫存器檔案。
  13. 如申請專利範圍第10項之系統,其中每一可分割引擎另包含一L1快取片段與L2快取片段與一負載儲存緩衝器。
  14. 如申請專利範圍第10項之系統,其中該等複數個虛擬核心實施一執行模式,其中每一可分割引擎的一實體資源的子集合被分配成支援一邏輯核心的一單一邏輯執行緒之執行。
  15. 如申請專利範圍第14項之系統,該等複數個虛擬核心實施複數個邏輯核心。
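  16. The system of claim 10, wherein the plurality of virtual cores implement an execution mode in which physical resources of each partitionable engine are dynamically allocated in accordance with an adjustable threshold to support execution of a single logical thread of a single logical core.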
  17. 如申請專利範圍第16項之系統,該等複數個虛擬核心實施複數個邏輯核心。
  18. 如申請專利範圍第10項之系統,其中該等複數個虛擬核心實施一執行模式,其中每一可分割引擎之實體資源的該子集合根據一可調整臨界值被分配來支援一單一邏輯執行緒之執行。
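  19. A system for executing instructions using a plurality of virtual cores of a processor, the system comprising: a runtime optimizer scheduler for receiving an incoming instruction sequence, wherein the runtime optimizer scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks; and a plurality of virtual cores of the processor coupled to receive code blocks allocated by the runtime optimizer scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein each partitionable engine comprises a segment, a memory fragment, and a plurality of execution units, wherein the resources of each partitionable engine are partitioned to instantiate a virtual core together with partitioned resources of other partitionable engines, wherein communication between the resources of each partitionable engine is supported by a global interconnection structure, and wherein the code blocks are executed using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors.
  20. The system of claim 19, wherein each partitionable engine segment further comprises a common partition fetch and scheduler.
  21. The system of claim 19, wherein each segment further comprises a register file.
  22. The system of claim 19, wherein each partitionable engine further comprises an L1 cache fragment, an L2 cache fragment, and a load store buffer.
  23. The system of claim 19, wherein the plurality of virtual cores implement an execution mode in which a subset of the physical resources of each partitionable engine is allocated to support execution of a single logical thread of a logical core.
  24. The system of claim 23, wherein the plurality of virtual cores implement a plurality of logical cores.
  25. The system of claim 19, wherein the plurality of virtual cores implement an execution mode in which the physical resources of each partitionable engine are dynamically allocated in accordance with an adjustable threshold to support execution of a single logical thread of a single logical core.
  26. The system of claim 25, wherein the plurality of virtual cores implement a plurality of logical cores.
  27. The system of claim 19, wherein the plurality of virtual cores implement an execution mode in which the subset of physical resources of each partitionable engine is allocated in accordance with an adjustable threshold to support execution of a single logical thread.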
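  28. A method for executing instructions using a plurality of virtual cores of a processor, the method comprising: receiving an incoming instruction sequence using a global front end scheduler; partitioning the incoming instruction sequence into a plurality of code blocks of instructions; allocating the code blocks to a plurality of virtual cores of the processor, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein each partitionable engine comprises a segment, a memory fragment, and a plurality of execution units, wherein the resources of each partitionable engine are partitioned to instantiate a virtual core together with partitioned resources of other partitionable engines, and wherein communication between the resources of each partitionable engine is supported by a global interconnection structure; and executing the code blocks using the partitionable engines in accordance with a virtual core mode.
  29. The method of claim 28, wherein the plurality of virtual cores implement an execution mode in which a subset of the physical resources of each partitionable engine is allocated to support execution of a single logical thread of a logical core.
  30. The method of claim 29, wherein the plurality of virtual cores implement a plurality of logical cores.
  31. The method of claim 28, wherein the plurality of virtual cores implement an execution mode in which the physical resources of each partitionable engine are dynamically allocated in accordance with an adjustable threshold to support execution of a single logical thread of a single logical core.
  32. The method of claim 31, wherein the plurality of virtual cores implement a plurality of logical cores.
  33. The method of claim 28, wherein the plurality of virtual cores implement an execution mode in which the subset of physical resources of each partitionable engine is allocated in accordance with an adjustable threshold to support execution of a single logical thread.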
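The flow recited in the independent claims (partition the incoming sequence into code blocks, derive inheritance vectors, instantiate virtual cores from slices of every partitionable engine, then allocate blocks) can be illustrated with a small software model. The following Python sketch is only one possible reading, not the claimed hardware: the names `Instr`, `partition`, `inheritance_vectors`, `block_dependencies`, `instantiate_virtual_cores`, and `allocate` are hypothetical, the fixed block size and round-robin allocation are assumptions, and the inheritance vector is modeled simply as a snapshot of which earlier block last wrote each register.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str          # register written by the instruction
    srcs: tuple = ()   # registers read by the instruction


def partition(seq, block_size):
    """Split the incoming instruction sequence into fixed-size code blocks
    (fixed size is an assumption made for this sketch)."""
    return [seq[i:i + block_size] for i in range(0, len(seq), block_size)]


def inheritance_vectors(blocks):
    """Return one vector per block: register -> index of the earlier block
    that last wrote it, as seen on entry to that block."""
    last_writer = {}
    vectors = []
    for idx, block in enumerate(blocks):
        vectors.append(dict(last_writer))   # snapshot before this block runs
        for ins in block:
            last_writer[ins.dest] = idx
    return vectors


def block_dependencies(block, vector):
    """Earlier blocks whose results this block consumes, per its vector."""
    written_locally = set()
    deps = set()
    for ins in block:
        for reg in ins.srcs:
            # A source produced inside this block is not a cross-block dependency.
            if reg not in written_locally and reg in vector:
                deps.add(vector[reg])
        written_locally.add(ins.dest)
    return deps


def instantiate_virtual_cores(num_engines, units_per_engine, num_cores):
    """Give every virtual core an equal slice of execution units from each
    engine, so a core spans all engines (engine id -> unit ids)."""
    share = units_per_engine // num_cores
    if share == 0:
        raise ValueError("not enough execution units per engine")
    return [
        {e: list(range(c * share, (c + 1) * share)) for e in range(num_engines)}
        for c in range(num_cores)
    ]


def allocate(blocks, num_cores):
    """Round-robin block-to-core assignment (an assumed policy)."""
    return {i: i % num_cores for i in range(len(blocks))}
```

In this model, a block may start on its virtual core once every block returned by `block_dependencies` has completed. The adjustable threshold of the dependent claims could be modeled by making the per-core slice size in `instantiate_virtual_cores` a run-time parameter rather than an equal split.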
TW101110082A 2011-03-25 2012-03-23 Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines TWI533129B (zh)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US201161467944P 2011-03-25 2011-03-25

Publications (2)

Publication Number Publication Date
TW201305819A TW201305819A (zh) 2013-02-01
TWI533129B true TWI533129B (zh) 2016-05-11

Family

ID=46878439

Family Applications (1)

Application Number Title Priority Date Filing Date
TW101110082A TWI533129B (zh) 2011-03-25 2012-03-23 Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines

Country Status (6)

Country Link
US (2) US9766893B2 (zh)
EP (1) EP2689327B1 (zh)
KR (1) KR101638225B1 (zh)
CN (1) CN103547993B (zh)
TW (1) TWI533129B (zh)
WO (1) WO2012135031A2 (zh)

Families Citing this family (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007143278A2 (en) 2006-04-12 2007-12-13 Soft Machines, Inc. Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
CN101627365B (zh) 2006-11-14 2017-03-29 Soft Machines, Inc. Multithreaded architecture
US10698859B2 (en) 2009-09-18 2020-06-30 The Board Of Regents Of The University Of Texas System Data multicasting with router replication and target instruction identification in a distributed multi-core processing architecture
US10228949B2 (en) 2010-09-17 2019-03-12 Intel Corporation Single cycle multi-branch prediction including shadow cache for early far branch prediction
TWI520070B (zh) 2011-03-25 2016-02-01 Soft Machines, Inc. Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9842005B2 (en) 2011-03-25 2017-12-12 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
WO2012135031A2 (en) 2011-03-25 2012-10-04 Soft Machines, Inc. Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
US9940134B2 (en) 2011-05-20 2018-04-10 Intel Corporation Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
US9442772B2 (en) 2011-05-20 2016-09-13 Soft Machines Inc. Global and local interconnect structure comprising routing matrix to support the execution of instruction sequences by a plurality of engines
KR101842550B1 (ko) 2011-11-22 2018-03-28 Soft Machines, Inc. An accelerated code optimizer for a multi-engine microprocessor
EP2783281B1 (en) 2011-11-22 2020-05-13 Intel Corporation A microprocessor accelerated code optimizer
US9632825B2 (en) 2013-03-15 2017-04-25 Intel Corporation Method and apparatus for efficient scheduling for asymmetrical execution units
KR20150130510A (ko) 2013-03-15 2015-11-23 Soft Machines, Inc. A method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
WO2014150991A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for implementing a reduced size register view data structure in a microprocessor
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
WO2014150971A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for dependency broadcasting through a block organized source view data structure
KR102063656B1 (ko) 2013-03-15 2020-01-09 Soft Machines, Inc. A method for executing multithreaded instructions grouped into blocks
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
WO2014150806A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for populating register view data structure by using register template snapshots
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9569216B2 (en) 2013-03-15 2017-02-14 Soft Machines, Inc. Method for populating a source view data structure by using register template snapshots
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
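KR101800948B1 (ko) * 2013-03-15 2017-11-23 Intel Corporation A method for executing blocks of instructions using a microprocessor architecture having a register view, source view, instruction view, and a plurality of register templates
JP6086230B2 (ja) * 2013-04-01 2017-03-01 NEC Corporation Central processing unit, information processing apparatus, and method for acquiring register values in a virtual core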
US9218382B1 (en) * 2013-06-18 2015-12-22 Ca, Inc. Exponential histogram based database management for combining data values in memory buckets
US10061592B2 (en) 2014-06-27 2018-08-28 Samsung Electronics Co., Ltd. Architecture and execution for efficient mixed precision computations in single instruction multiple data/thread (SIMD/T) devices
US10061591B2 (en) 2014-06-27 2018-08-28 Samsung Electronics Company, Ltd. Redundancy elimination in single instruction multiple data/thread (SIMD/T) execution processing
US11755484B2 (en) 2015-06-26 2023-09-12 Microsoft Technology Licensing, Llc Instruction block allocation
US10678544B2 (en) 2015-09-19 2020-06-09 Microsoft Technology Licensing, Llc Initiating instruction block execution using a register access instruction
US20170083327A1 (en) 2015-09-19 2017-03-23 Microsoft Technology Licensing, Llc Implicit program order
US20170083341A1 (en) * 2015-09-19 2017-03-23 Microsoft Technology Licensing, Llc Segmented instruction block
US10776115B2 (en) 2015-09-19 2020-09-15 Microsoft Technology Licensing, Llc Debug support for block-based processor
US10198263B2 (en) 2015-09-19 2019-02-05 Microsoft Technology Licensing, Llc Write nullification
US10768936B2 (en) 2015-09-19 2020-09-08 Microsoft Technology Licensing, Llc Block-based processor including topology and control registers to indicate resource sharing and size of logical processor
US11126433B2 (en) 2015-09-19 2021-09-21 Microsoft Technology Licensing, Llc Block-based processor core composition register
US11016770B2 (en) 2015-09-19 2021-05-25 Microsoft Technology Licensing, Llc Distinct system registers for logical processors
US10719321B2 (en) 2015-09-19 2020-07-21 Microsoft Technology Licensing, Llc Prefetching instruction blocks
US10180840B2 (en) 2015-09-19 2019-01-15 Microsoft Technology Licensing, Llc Dynamic generation of null instructions
US10871967B2 (en) * 2015-09-19 2020-12-22 Microsoft Technology Licensing, Llc Register read/write ordering
US10452399B2 (en) 2015-09-19 2019-10-22 Microsoft Technology Licensing, Llc Broadcast channel architectures for block-based processors
US11681531B2 (en) 2015-09-19 2023-06-20 Microsoft Technology Licensing, Llc Generation and use of memory access instruction order encodings
US9977677B2 (en) * 2016-04-07 2018-05-22 International Business Machines Corporation Execution slice with supplemental instruction port for an instruction using a source operand from another instruction port
US10241794B2 (en) 2016-12-27 2019-03-26 Intel Corporation Apparatus and methods to support counted loop exits in a multi-strand loop processor
US20180181398A1 (en) * 2016-12-28 2018-06-28 Intel Corporation Apparatus and methods of decomposing loops to improve performance and power efficiency
WO2018125250A1 (en) 2016-12-31 2018-07-05 Intel Corporation Systems, methods, and apparatuses for heterogeneous computing
US10866806B2 (en) * 2017-11-14 2020-12-15 Nvidia Corporation Uniform register file for improved resource utilization
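KR102213046B1 (ko) 2017-12-25 2021-02-05 Mitsubishi Electric Corporation Design support apparatus, design support method, and program stored on a recording medium
KR102063123B1 (ko) 2018-05-23 2020-01-07 Seoul National University R&DB Foundation Data processing system and method using a dependency graph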
US10534736B1 (en) 2018-12-31 2020-01-14 Texas Instruments Incorporated Shared buffer for multi-output display systems
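CN110659118B (zh) * 2019-09-11 2022-03-08 Shanghai Iluvatar CoreX Semiconductor Co., Ltd. Configurable hybrid heterogeneous computing core system for multi-domain chip design
CN110991666B (zh) 2019-11-25 2023-09-15 Envision Digital International Pte. Ltd. Fault detection method, model training method, apparatus, device, and storage medium
CN111176847B (zh) * 2019-12-31 2022-08-12 Suzhou Inspur Intelligent Technology Co., Ltd. Method and apparatus for optimizing big data cluster performance on servers with physical-core hyper-multithreading
CN112115427A (zh) * 2020-08-14 2020-12-22 MIGU Culture Technology Co., Ltd. Code obfuscation method and apparatus, electronic device, and storage medium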
US20230069890A1 (en) * 2021-09-03 2023-03-09 Advanced Micro Devices, Inc. Processing device and method of sharing storage between cache memory, local data storage and register files

Family Cites Families (472)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US727487A (en) 1902-10-21 1903-05-05 Swan F Swanson Dumping-car.
US4075704A (en) 1976-07-02 1978-02-21 Floating Point Systems, Inc. Floating point data processor for high speech operation
US4228496A (en) 1976-09-07 1980-10-14 Tandem Computers Incorporated Multiprocessor system
US4245344A (en) 1979-04-02 1981-01-13 Rockwell International Corporation Processing system with dual buses
US4527237A (en) 1979-10-11 1985-07-02 Nanodata Computer Corporation Data processing system
US4414624A (en) 1980-11-19 1983-11-08 The United States Of America As Represented By The Secretary Of The Navy Multiple-microcomputer processing
US4524415A (en) 1982-12-07 1985-06-18 Motorola, Inc. Virtual machine data processor
US4597061B1 (en) 1983-01-03 1998-06-09 Texas Instruments Inc Memory system using pipleline circuitry for improved system
US4577273A (en) 1983-06-06 1986-03-18 Sperry Corporation Multiple microcomputer system for digital computers
US4682281A (en) 1983-08-30 1987-07-21 Amdahl Corporation Data storage unit employing translation lookaside buffer pointer
US4600986A (en) 1984-04-02 1986-07-15 Sperry Corporation Pipelined split stack with high performance interleaved decode
US4633434A (en) 1984-04-02 1986-12-30 Sperry Corporation High performance storage unit
JPS6140643A (ja) 1984-07-31 1986-02-26 Hitachi Ltd Resource allocation control method for a system
US4835680A (en) 1985-03-15 1989-05-30 Xerox Corporation Adaptive processor array capable of learning variable associations useful in recognizing classes of inputs
JPS6289149A (ja) 1985-10-15 1987-04-23 Agency Of Ind Science & Technol Multi-port memory system
JPH0658650B2 (ja) 1986-03-14 1994-08-03 Hitachi, Ltd. Virtual computer system
US4920477A (en) 1987-04-20 1990-04-24 Multiflow Computer, Inc. Virtual address table look aside buffer miss recovery method and apparatus
US4943909A (en) 1987-07-08 1990-07-24 At&T Bell Laboratories Computational origami
US5339398A (en) 1989-07-31 1994-08-16 North American Philips Corporation Memory architecture and method of data organization optimized for hashing
US5471593A (en) 1989-12-11 1995-11-28 Branigin; Michael H. Computer processor with an efficient means of executing many instructions simultaneously
US5197130A (en) 1989-12-29 1993-03-23 Supercomputer Systems Limited Partnership Cluster architecture for a highly parallel scalar/vector multiprocessor system
US5317754A (en) 1990-10-23 1994-05-31 International Business Machines Corporation Method and apparatus for enabling an interpretive execution subset
US5317705A (en) 1990-10-24 1994-05-31 International Business Machines Corporation Apparatus and method for TLB purge reduction in a multi-level machine system
US6282583B1 (en) 1991-06-04 2001-08-28 Silicon Graphics, Inc. Method and apparatus for memory access in a matrix processor computer
US5539911A (en) 1991-07-08 1996-07-23 Seiko Epson Corporation High-performance, superscalar-based computer system with out-of-order instruction execution
JPH0820949B2 (ja) 1991-11-26 1996-03-04 Matsushita Electric Industrial Co., Ltd. Information processing apparatus
GB2277181B (en) 1991-12-23 1995-12-13 Intel Corp Interleaved cache for multiple accesses per clock in a microprocessor
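KR100309566B1 (ko) 1992-04-29 2001-12-15 리패치 Method and apparatus for grouping multiple instructions, issuing grouped instructions simultaneously, and executing grouped instructions in a pipelined processor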
EP0638183B1 (en) 1992-05-01 1997-03-05 Seiko Epson Corporation A system and method for retiring instructions in a superscalar microprocessor
DE69329260T2 (de) 1992-06-25 2001-02-22 Canon Kk Apparatus for multiplying integers with many digits
JPH0637202A (ja) 1992-07-20 1994-02-10 Mitsubishi Electric Corp Package for microwave IC
JPH06110781A (ja) 1992-09-30 1994-04-22 Nec Corp Cache memory device
US5493660A (en) 1992-10-06 1996-02-20 Hewlett-Packard Company Software assisted hardware TLB miss handler
US5513335A (en) 1992-11-02 1996-04-30 Sgs-Thomson Microelectronics, Inc. Cache tag memory having first and second single-port arrays and a dual-port array
US5819088A (en) 1993-03-25 1998-10-06 Intel Corporation Method and apparatus for scheduling instructions for execution on a multi-issue architecture computer
JPH0784883A (ja) 1993-09-17 1995-03-31 Hitachi Ltd Method for purging the address translation buffer of a virtual computer system
US6948172B1 (en) 1993-09-21 2005-09-20 Microsoft Corporation Preemptive multi-tasking with cooperative groups of tasks
US5469376A (en) 1993-10-14 1995-11-21 Abdallah; Mohammad A. F. F. Digital circuit for the evaluation of mathematical expressions
US5517651A (en) 1993-12-29 1996-05-14 Intel Corporation Method and apparatus for loading a segment register in a microprocessor capable of operating in multiple modes
US5956753A (en) 1993-12-30 1999-09-21 Intel Corporation Method and apparatus for handling speculative memory access operations
US5761476A (en) 1993-12-30 1998-06-02 Intel Corporation Non-clocked early read for back-to-back scheduling of instructions
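JP3048498B2 (ja) 1994-04-13 2000-06-05 Toshiba Corporation Semiconductor memory device
JPH07287668A (ja) 1994-04-19 1995-10-31 Hitachi Ltd Data processing device
CN1084005C (zh) 1994-06-27 2002-05-01 International Business Machines Corporation Method and apparatus for dynamically controlling address space allocation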
US5548742A (en) 1994-08-11 1996-08-20 Intel Corporation Method and apparatus for combining a direct-mapped cache and a multiple-way cache in a cache memory
US5813031A (en) 1994-09-21 1998-09-22 Industrial Technology Research Institute Caching tag for a large scale cache computer memory system
US5640534A (en) 1994-10-05 1997-06-17 International Business Machines Corporation Method and system for concurrent access in a data cache array utilizing multiple match line selection paths
US5835951A (en) 1994-10-18 1998-11-10 National Semiconductor Branch processing unit with target cache read prioritization protocol for handling multiple hits
JP3569014B2 (ja) 1994-11-25 2004-09-22 Fujitsu Limited Processor and processing method supporting multiple contexts
US5724565A (en) 1995-02-03 1998-03-03 International Business Machines Corporation Method and system for processing first and second sets of instructions by first and second types of processing systems
US5655115A (en) 1995-02-14 1997-08-05 Hal Computer Systems, Inc. Processor structure and method for watchpoint of plural simultaneous unresolved branch evaluation
US5675759A (en) 1995-03-03 1997-10-07 Shebanow; Michael C. Method and apparatus for register management using issue sequence prior physical register and register association validity information
US5751982A (en) 1995-03-31 1998-05-12 Apple Computer, Inc. Software emulation system with dynamic translation of emulated instructions for increased processing speed
US5634068A (en) 1995-03-31 1997-05-27 Sun Microsystems, Inc. Packet switched cache coherent multiprocessor system
US6209085B1 (en) 1995-05-05 2001-03-27 Intel Corporation Method and apparatus for performing process switching in multiprocessor computer systems
US6643765B1 (en) 1995-08-16 2003-11-04 Microunity Systems Engineering, Inc. Programmable processor with group floating point operations
US5710902A (en) 1995-09-06 1998-01-20 Intel Corporation Instruction dependency chain indentifier
US6341324B1 (en) 1995-10-06 2002-01-22 Lsi Logic Corporation Exception processing in superscalar microprocessor
US5864657A (en) 1995-11-29 1999-01-26 Texas Micro, Inc. Main memory system and checkpointing protocol for fault-tolerant computer system
US5983327A (en) 1995-12-01 1999-11-09 Nortel Networks Corporation Data path architecture and arbitration scheme for providing access to a shared system resource
US5793941A (en) 1995-12-04 1998-08-11 Advanced Micro Devices, Inc. On-chip primary cache testing circuit and test method
US5911057A (en) 1995-12-19 1999-06-08 Texas Instruments Incorporated Superscalar microprocessor having combined register and memory renaming circuits, systems, and methods
US5699537A (en) 1995-12-22 1997-12-16 Intel Corporation Processor microarchitecture for efficient dynamic scheduling and execution of chains of dependent instructions
US6882177B1 (en) 1996-01-10 2005-04-19 Altera Corporation Tristate structures for programmable logic devices
US5754818A (en) 1996-03-22 1998-05-19 Sun Microsystems, Inc. Architecture and method for sharing TLB entries through process IDS
US5904892A (en) 1996-04-01 1999-05-18 Saint-Gobain/Norton Industrial Ceramics Corp. Tape cast silicon carbide dummy wafer
US5752260A (en) 1996-04-29 1998-05-12 International Business Machines Corporation High-speed, multiple-port, interleaved cache with arbitration of multiple access addresses
US5806085A (en) 1996-05-01 1998-09-08 Sun Microsystems, Inc. Method for non-volatile caching of network and CD-ROM file accesses using a cache directory, pointers, file name conversion, a local hard disk, and separate small database
US5829028A (en) 1996-05-06 1998-10-27 Advanced Micro Devices, Inc. Data cache configured to store data in a use-once manner
US6108769A (en) 1996-05-17 2000-08-22 Advanced Micro Devices, Inc. Dependency table for reducing dependency checking hardware
US5881277A (en) 1996-06-13 1999-03-09 Texas Instruments Incorporated Pipelined microprocessor with branch misprediction cache circuits, systems and methods
US5860146A (en) 1996-06-25 1999-01-12 Sun Microsystems, Inc. Auxiliary translation lookaside buffer for assisting in accessing data in remote address spaces
US5903760A (en) 1996-06-27 1999-05-11 Intel Corporation Method and apparatus for translating a conditional instruction compatible with a first instruction set architecture (ISA) into a conditional instruction compatible with a second ISA
US5974506A (en) 1996-06-28 1999-10-26 Digital Equipment Corporation Enabling mirror, nonmirror and partial mirror cache modes in a dual cache system
US6167490A (en) 1996-09-20 2000-12-26 University Of Washington Using global memory information to manage memory in a computer network
KR19980032776A (ko) 1996-10-16 1998-07-25 Kanai Tsutomu Data processor and data processing system
EP0877981B1 (en) 1996-11-04 2004-01-07 Koninklijke Philips Electronics N.V. Processing device, reads instructions in memory
US5978906A (en) 1996-11-19 1999-11-02 Advanced Micro Devices, Inc. Branch selectors associated with byte ranges within an instruction cache for rapidly identifying branch predictions
US6253316B1 (en) 1996-11-19 2001-06-26 Advanced Micro Devices, Inc. Three state branch history using one bit in a branch prediction mechanism
US5903750A (en) 1996-11-20 1999-05-11 Institute For The Development Of Emerging Architectures, L.L.P. Dynamic branch prediction for branch instructions with multiple targets
US6212542B1 (en) 1996-12-16 2001-04-03 International Business Machines Corporation Method and system for executing a program within a multiscalar processor by processing linked thread descriptors
US6134634A (en) 1996-12-20 2000-10-17 Texas Instruments Incorporated Method and apparatus for preemptive cache write-back
US5918251A (en) 1996-12-23 1999-06-29 Intel Corporation Method and apparatus for preloading different default address translation attributes
US6016540A (en) 1997-01-08 2000-01-18 Intel Corporation Method and apparatus for scheduling instructions in waves
US6065105A (en) 1997-01-08 2000-05-16 Intel Corporation Dependency matrix
US5802602A (en) 1997-01-17 1998-09-01 Intel Corporation Method and apparatus for performing reads of related data from a set-associative cache memory
US6088780A (en) 1997-03-31 2000-07-11 Institute For The Development Of Emerging Architecture, L.L.C. Page table walker that uses at least one of a default page size and a page size selected for a virtual address space to position a sliding field in a virtual address
US6075938A (en) 1997-06-10 2000-06-13 The Board Of Trustees Of The Leland Stanford Junior University Virtual machine monitors for scalable multiprocessors
US6073230A (en) 1997-06-11 2000-06-06 Advanced Micro Devices, Inc. Instruction fetch unit configured to provide sequential way prediction for sequential instruction fetches
JPH1124929A (ja) 1997-06-30 1999-01-29 Sony Corp Arithmetic processing apparatus and method therefor
US6128728A (en) 1997-08-01 2000-10-03 Micron Technology, Inc. Virtual shadow registers and virtual register windows
US6170051B1 (en) 1997-08-01 2001-01-02 Micron Technology, Inc. Apparatus and method for program level parallelism in a VLIW processor
US6085315A (en) 1997-09-12 2000-07-04 Siemens Aktiengesellschaft Data processing device with loop pipeline
US6101577A (en) 1997-09-15 2000-08-08 Advanced Micro Devices, Inc. Pipelined instruction cache and branch prediction mechanism therefor
US5901294A (en) 1997-09-18 1999-05-04 International Business Machines Corporation Method and system for bus arbitration in a multiprocessor system utilizing simultaneous variable-width bus access
US6185660B1 (en) 1997-09-23 2001-02-06 Hewlett-Packard Company Pending access queue for providing data to a target register during an intermediate pipeline phase after a computer cache miss
US5905509A (en) 1997-09-30 1999-05-18 Compaq Computer Corp. Accelerated Graphics Port two level Gart cache having distributed first level caches
US6226732B1 (en) 1997-10-02 2001-05-01 Hitachi Micro Systems, Inc. Memory system architecture
US5922065A (en) 1997-10-13 1999-07-13 Institute For The Development Of Emerging Architectures, L.L.C. Processor utilizing a template field for encoding instruction sequences in a wide-word format
US6178482B1 (en) 1997-11-03 2001-01-23 Brecis Communications Virtual register sets
US6021484A (en) 1997-11-14 2000-02-01 Samsung Electronics Co., Ltd. Dual instruction set architecture
US6256728B1 (en) 1997-11-17 2001-07-03 Advanced Micro Devices, Inc. Processor configured to selectively cancel instructions from its pipeline responsive to a predicted-taken short forward branch instruction
US6260131B1 (en) 1997-11-18 2001-07-10 Intrinsity, Inc. Method and apparatus for TLB memory ordering
US6016533A (en) 1997-12-16 2000-01-18 Advanced Micro Devices, Inc. Way prediction logic for cache array
US6219776B1 (en) 1998-03-10 2001-04-17 Billions Of Operations Per Second Merged array controller and processing element
US6609189B1 (en) 1998-03-12 2003-08-19 Yale University Cycle segmented prefix circuits
JP3657424B2 (ja) 1998-03-20 2005-06-08 Matsushita Electric Industrial Co., Ltd. Center apparatus and terminal apparatus for broadcasting program information
US6216215B1 (en) 1998-04-02 2001-04-10 Intel Corporation Method and apparatus for senior loads
US6157998A (en) 1998-04-03 2000-12-05 Motorola Inc. Method for performing branch prediction and resolution of two or more branch instructions within two or more branch prediction buffers
US6205545B1 (en) 1998-04-30 2001-03-20 Hewlett-Packard Company Method and apparatus for using static branch predictions hints with dynamically translated code traces to improve performance
US6115809A (en) 1998-04-30 2000-09-05 Hewlett-Packard Company Compiling strong and weak branching behavior instruction blocks to separate caches for dynamic and static prediction
US6256727B1 (en) 1998-05-12 2001-07-03 International Business Machines Corporation Method and system for fetching noncontiguous instructions in a single clock cycle
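JPH11338710A (ja) 1998-05-28 1999-12-10 Toshiba Corp Compiling method and apparatus for a processor having multiple kinds of instruction sets, and recording medium on which the method is programmed and recorded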
US6272616B1 (en) 1998-06-17 2001-08-07 Agere Systems Guardian Corp. Method and apparatus for executing multiple instruction streams in a digital processor with multiple data paths
US6988183B1 (en) 1998-06-26 2006-01-17 Derek Chi-Lan Wong Methods for increasing instruction-level parallelism in microprocessors and digital system
US6260138B1 (en) 1998-07-17 2001-07-10 Sun Microsystems, Inc. Method and apparatus for branch instruction processing in a processor
US6122656A (en) 1998-07-31 2000-09-19 Advanced Micro Devices, Inc. Processor configured to map logical register numbers to physical register numbers using virtual register numbers
US6272662B1 (en) 1998-08-04 2001-08-07 International Business Machines Corporation Distributed storage system using front-end and back-end locking
JP2000057054A (ja) 1998-08-12 2000-02-25 Fujitsu Ltd High-speed address translation system
US8631066B2 (en) 1998-09-10 2014-01-14 Vmware, Inc. Mechanism for providing virtual machines for use by multiple users
US6339822B1 (en) 1998-10-02 2002-01-15 Advanced Micro Devices, Inc. Using padded instructions in a block-oriented cache
US6332189B1 (en) 1998-10-16 2001-12-18 Intel Corporation Branch prediction architecture
GB9825102D0 (en) 1998-11-16 1999-01-13 Insignia Solutions Plc Computer system
JP3110404B2 (ja) 1998-11-18 2000-11-20 NEC Kofu, Ltd. Microprocessor device, method for accelerating its software instructions, and recording medium storing its control program
US6490673B1 (en) 1998-11-27 2002-12-03 Matsushita Electric Industrial Co., Ltd Processor, compiling apparatus, and compile program recorded on a recording medium
US6519682B2 (en) 1998-12-04 2003-02-11 Stmicroelectronics, Inc. Pipelined non-blocking level two cache system with inherent transaction collision-avoidance
US7020879B1 (en) 1998-12-16 2006-03-28 Mips Technologies, Inc. Interrupt and exception handling for multi-streaming digital processors
US6477562B2 (en) 1998-12-16 2002-11-05 Clearwater Networks, Inc. Prioritized instruction scheduling for multi-streaming processors
US6247097B1 (en) 1999-01-22 2001-06-12 International Business Machines Corporation Aligned instruction cache handling of instruction fetches across multiple predicted branch instructions
US6321298B1 (en) 1999-01-25 2001-11-20 International Business Machines Corporation Full cache coherency across multiple raid controllers
JP3842474B2 (ja) 1999-02-02 2006-11-08 Renesas Technology Corp. Data processing device
US6327650B1 (en) 1999-02-12 2001-12-04 Vsli Technology, Inc. Pipelined multiprocessing with upstream processor concurrently writing to local register and to register of downstream processor
US6732220B2 (en) 1999-02-17 2004-05-04 Elbrus International Method for emulating hardware features of a foreign architecture in a host operating system environment
US6668316B1 (en) 1999-02-17 2003-12-23 Elbrus International Limited Method and apparatus for conflict-free execution of integer and floating-point operations with a common register file
US6418530B2 (en) 1999-02-18 2002-07-09 Hewlett-Packard Company Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions
US6437789B1 (en) 1999-02-19 2002-08-20 Evans & Sutherland Computer Corporation Multi-level cache controller
US6850531B1 (en) 1999-02-23 2005-02-01 Alcatel Multi-service network switch
US6212613B1 (en) 1999-03-22 2001-04-03 Cisco Technology, Inc. Methods and apparatus for reusing addresses in a computer
US6529928B1 (en) 1999-03-23 2003-03-04 Silicon Graphics, Inc. Floating-point adder performing floating-point and integer operations
US6449671B1 (en) 1999-06-09 2002-09-10 Ati International Srl Method and apparatus for busing data elements
US6473833B1 (en) 1999-07-30 2002-10-29 International Business Machines Corporation Integrated cache and directory structure for multi-level caches
US6643770B1 (en) 1999-09-16 2003-11-04 Intel Corporation Branch misprediction recovery using a side memory
US6704822B1 (en) 1999-10-01 2004-03-09 Sun Microsystems, Inc. Arbitration protocol for a shared data cache
US6772325B1 (en) 1999-10-01 2004-08-03 Hitachi, Ltd. Processor architecture and operation for exploiting improved branch control instruction
US6457120B1 (en) 1999-11-01 2002-09-24 International Business Machines Corporation Processor and method including a cache having confirmation bits for improving address predictable branch instruction target predictions
US7441110B1 (en) 1999-12-10 2008-10-21 International Business Machines Corporation Prefetching using future branch path information derived from branch prediction
US7107434B2 (en) 1999-12-20 2006-09-12 Board Of Regents, The University Of Texas System, method and apparatus for allocating hardware resources using pseudorandom sequences
JP4693326B2 (ja) 1999-12-22 2011-06-01 Ubicom, Inc. System and method for instruction-level multithreading in an embedded processor using zero-time context switching
US6557095B1 (en) 1999-12-27 2003-04-29 Intel Corporation Scheduling operations using a dependency matrix
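CN1210649C (zh) 2000-01-03 2005-07-13 Advanced Micro Devices, Inc. Scheduler capable of issuing and reissuing dependency chains, processor including the scheduler, and scheduling method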
US6542984B1 (en) 2000-01-03 2003-04-01 Advanced Micro Devices, Inc. Scheduler capable of issuing and reissuing dependency chains
US6594755B1 (en) 2000-01-04 2003-07-15 National Semiconductor Corporation System and method for interleaved execution of multiple independent threads
US6728872B1 (en) 2000-02-04 2004-04-27 International Business Machines Corporation Method and apparatus for verifying that instructions are pipelined in correct architectural sequence
GB0002848D0 (en) 2000-02-08 2000-03-29 Siroyan Limited Communicating instruction results in processors and compiling methods for processors
GB2365661A (en) 2000-03-10 2002-02-20 British Telecomm Allocating switch requests within a packet switch
US6615340B1 (en) 2000-03-22 2003-09-02 Wilmot, Ii Richard Byron Extended operand management indicator structure and method
US6604187B1 (en) 2000-06-19 2003-08-05 Advanced Micro Devices, Inc. Providing global translations with address space numbers
US6557083B1 (en) 2000-06-30 2003-04-29 Intel Corporation Memory system for multiple data types
US6704860B1 (en) 2000-07-26 2004-03-09 International Business Machines Corporation Data processing system and method for fetching instruction blocks in response to a detected block sequence
US7206925B1 (en) 2000-08-18 2007-04-17 Sun Microsystems, Inc. Backing Register File for processors
US6728866B1 (en) 2000-08-31 2004-04-27 International Business Machines Corporation Partitioned issue queue and allocation strategy
US6721874B1 (en) 2000-10-12 2004-04-13 International Business Machines Corporation Method and system for dynamically shared completion table supporting multiple threads in a processing system
US7757065B1 (en) 2000-11-09 2010-07-13 Intel Corporation Instruction segment recording scheme
JP2002185513A (ja) 2000-12-18 2002-06-28 Hitachi Ltd Packet communication network and packet transfer control method
US6877089B2 (en) 2000-12-27 2005-04-05 International Business Machines Corporation Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program
US6907600B2 (en) 2000-12-27 2005-06-14 Intel Corporation Virtual translation lookaside buffer
US6647466B2 (en) 2001-01-25 2003-11-11 Hewlett-Packard Development Company, L.P. Method and apparatus for adaptively bypassing one or more levels of a cache hierarchy
FR2820921A1 (fr) 2001-02-14 2002-08-16 Canon Kk Device and method for transmission in a switch
US6985951B2 (en) 2001-03-08 2006-01-10 International Business Machines Corporation Inter-partition message passing method, system and program product for managing workload in a partitioned processing environment
US6950927B1 (en) 2001-04-13 2005-09-27 The United States Of America As Represented By The Secretary Of The Navy System and method for instruction-level parallelism in a programmable multiple network processor environment
US7707397B2 (en) 2001-05-04 2010-04-27 Via Technologies, Inc. Variable group associativity branch target address cache delivering multiple target addresses per cache line
US7200740B2 (en) 2001-05-04 2007-04-03 Ip-First, Llc Apparatus and method for speculatively performing a return instruction in a microprocessor
US6658549B2 (en) 2001-05-22 2003-12-02 Hewlett-Packard Development Company, L.P. Method and system allowing a single entity to manage memory comprising compressed and uncompressed data
US6985591B2 (en) 2001-06-29 2006-01-10 Intel Corporation Method and apparatus for distributing keys for decrypting and re-encrypting publicly distributed media
US7203824B2 (en) 2001-07-03 2007-04-10 Ip-First, Llc Apparatus and method for handling BTAC branches that wrap across instruction cache lines
US7024545B1 (en) 2001-07-24 2006-04-04 Advanced Micro Devices, Inc. Hybrid branch prediction device with two levels of branch prediction cache
US6954846B2 (en) 2001-08-07 2005-10-11 Sun Microsystems, Inc. Microprocessor and method for giving each thread exclusive access to one register file in a multi-threading mode and for giving an active thread access to multiple register files in a single thread mode
US6718440B2 (en) 2001-09-28 2004-04-06 Intel Corporation Memory access latency hiding with hint buffer
US7150021B1 (en) 2001-10-12 2006-12-12 Palau Acquisition Corporation (Delaware) Method and system to allocate resources within an interconnect device according to a resource allocation table
US7117347B2 (en) 2001-10-23 2006-10-03 Ip-First, Llc Processor including fallback branch prediction mechanism for far jump and far call instructions
US7272832B2 (en) 2001-10-25 2007-09-18 Hewlett-Packard Development Company, L.P. Method of protecting user process data in a secure platform inaccessible to the operating system and other tasks on top of the secure platform
US6964043B2 (en) 2001-10-30 2005-11-08 Intel Corporation Method, apparatus, and system to optimize frequently executed code and to use compiler transformation and hardware support to handle infrequently executed code
GB2381886B (en) 2001-11-07 2004-06-23 Sun Microsystems Inc Computer system with virtual memory and paging mechanism
US7092869B2 (en) 2001-11-14 2006-08-15 Ronald Hilton Memory address prediction under emulation
US7363467B2 (en) 2002-01-03 2008-04-22 Intel Corporation Dependence-chain processing using trace descriptors having dependency descriptors
US6640333B2 (en) 2002-01-10 2003-10-28 Lsi Logic Corporation Architecture for a sea of platforms
US7055021B2 (en) 2002-02-05 2006-05-30 Sun Microsystems, Inc. Out-of-order processor that reduces mis-speculation using a replay scoreboard
US7331040B2 (en) 2002-02-06 2008-02-12 Transitive Limited Condition code flag emulation for program code conversion
US6839816B2 (en) 2002-02-26 2005-01-04 International Business Machines Corporation Shared cache line update mechanism
US6731292B2 (en) 2002-03-06 2004-05-04 Sun Microsystems, Inc. System and method for controlling a number of outstanding data transactions within an integrated circuit
JP3719509B2 (ja) 2002-04-01 2005-11-24 株式会社ソニー・コンピュータエンタテインメント Serial arithmetic pipeline, arithmetic device, arithmetic logic circuit, and arithmetic method using a serial arithmetic pipeline
US7565509B2 (en) 2002-04-17 2009-07-21 Microsoft Corporation Using limits on address translation to control access to an addressable entity
US6920530B2 (en) 2002-04-23 2005-07-19 Sun Microsystems, Inc. Scheme for reordering instructions via an instruction caching mechanism
US7113488B2 (en) 2002-04-24 2006-09-26 International Business Machines Corporation Reconfigurable circular bus
US7281055B2 (en) 2002-05-28 2007-10-09 Newisys, Inc. Routing mechanisms in systems having multiple multi-processor clusters
US7117346B2 (en) 2002-05-31 2006-10-03 Freescale Semiconductor, Inc. Data processing system having multiple register contexts and method therefor
US6938151B2 (en) 2002-06-04 2005-08-30 International Business Machines Corporation Hybrid branch prediction using a global selection counter and a prediction method comparison table
US8024735B2 (en) 2002-06-14 2011-09-20 Intel Corporation Method and apparatus for ensuring fairness and forward progress when executing multiple threads of execution
JP3845043B2 (ja) 2002-06-28 2006-11-15 富士通株式会社 Instruction fetch control device
JP3982353B2 (ja) 2002-07-12 2007-09-26 日本電気株式会社 Fault-tolerant computer apparatus, resynchronization method therefor, and resynchronization program
US6944744B2 (en) 2002-08-27 2005-09-13 Advanced Micro Devices, Inc. Apparatus and method for independently schedulable functional units with issue lock mechanism in a processor
US6950925B1 (en) 2002-08-28 2005-09-27 Advanced Micro Devices, Inc. Scheduler for use in a microprocessor that supports data-speculative execution
US7546422B2 (en) 2002-08-28 2009-06-09 Intel Corporation Method and apparatus for the synchronization of distributed caches
TW200408242A (en) 2002-09-06 2004-05-16 Matsushita Electric Ind Co Ltd Home terminal apparatus and communication system
US6895491B2 (en) 2002-09-26 2005-05-17 Hewlett-Packard Development Company, L.P. Memory addressing for a virtual machine implementation on a computer processor supporting virtual hash-page-table searching
US7334086B2 (en) 2002-10-08 2008-02-19 Rmi Corporation Advanced processor with system on a chip interconnect technology
US7213248B2 (en) 2002-10-10 2007-05-01 International Business Machines Corporation High speed promotion mechanism suitable for lock acquisition in a multiprocessor data processing system
US6829698B2 (en) 2002-10-10 2004-12-07 International Business Machines Corporation Method, apparatus and system for acquiring a global promotion facility utilizing a data-less transaction
US7222218B2 (en) 2002-10-22 2007-05-22 Sun Microsystems, Inc. System and method for goal-based scheduling of blocks of code for concurrent execution
US20040103251A1 (en) 2002-11-26 2004-05-27 Mitchell Alsup Microprocessor including a first level cache and a second level cache having different cache line sizes
AU2003292451A1 (en) 2002-12-04 2004-06-23 Koninklijke Philips Electronics N.V. Register file gating to reduce microprocessor power dissipation
US6981083B2 (en) 2002-12-05 2005-12-27 International Business Machines Corporation Processor virtualization mechanism via an enhanced restoration of hard architected states
US7073042B2 (en) 2002-12-12 2006-07-04 Intel Corporation Reclaiming existing fields in address translation data structures to extend control over memory accesses
US20040117594A1 (en) 2002-12-13 2004-06-17 Vanderspek Julius Memory management method
US20040122887A1 (en) 2002-12-20 2004-06-24 Macy William W. Efficient multiplication of small matrices using SIMD registers
US7191349B2 (en) 2002-12-26 2007-03-13 Intel Corporation Mechanism for processor power state aware distribution of lowest priority interrupt
US6925421B2 (en) 2003-01-09 2005-08-02 International Business Machines Corporation Method, system, and computer program product for estimating the number of consumers that place a load on an individual resource in a pool of physically distributed resources
US20040139441A1 (en) 2003-01-09 2004-07-15 Kabushiki Kaisha Toshiba Processor, arithmetic operation processing method, and priority determination method
US7178010B2 (en) 2003-01-16 2007-02-13 Ip-First, Llc Method and apparatus for correcting an internal call/return stack in a microprocessor that detects from multiple pipeline stages incorrect speculative update of the call/return stack
US7089374B2 (en) 2003-02-13 2006-08-08 Sun Microsystems, Inc. Selectively unmarking load-marked cache lines during transactional program execution
US7278030B1 (en) 2003-03-03 2007-10-02 Vmware, Inc. Virtualization system for computers having multiple protection mechanisms
US6912644B1 (en) 2003-03-06 2005-06-28 Intel Corporation Method and apparatus to steer memory access operations in a virtual memory system
US7111145B1 (en) 2003-03-25 2006-09-19 Vmware, Inc. TLB miss fault handler and method for accessing multiple page tables
US7143273B2 (en) 2003-03-31 2006-11-28 Intel Corporation Method and apparatus for dynamic branch prediction utilizing multiple stew algorithms for indexing a global history
CN1214666C (zh) 2003-04-07 2005-08-10 华为技术有限公司 Method for limiting location information request traffic in location services
US7058764B2 (en) 2003-04-14 2006-06-06 Hewlett-Packard Development Company, L.P. Method of adaptive cache partitioning to increase host I/O performance
EP1471421A1 (en) 2003-04-24 2004-10-27 STMicroelectronics Limited Speculative load instruction control
US7139855B2 (en) 2003-04-24 2006-11-21 International Business Machines Corporation High performance synchronization of resource allocation in a logically-partitioned system
US7290261B2 (en) 2003-04-24 2007-10-30 International Business Machines Corporation Method and logical apparatus for rename register reallocation in a simultaneous multi-threaded (SMT) processor
US7469407B2 (en) 2003-04-24 2008-12-23 International Business Machines Corporation Method for resource balancing using dispatch flush in a simultaneous multithread processor
US7055003B2 (en) 2003-04-25 2006-05-30 International Business Machines Corporation Data cache scrub mechanism for large L2/L3 data cache structures
US7007108B2 (en) 2003-04-30 2006-02-28 Lsi Logic Corporation System method for use of hardware semaphores for resource release notification wherein messages comprises read-modify-write operation and address
ATE554443T1 (de) 2003-06-25 2012-05-15 Koninkl Philips Electronics Nv Instruction-controlled data processing device and method
JP2005032018A (ja) 2003-07-04 2005-02-03 Semiconductor Energy Lab Co Ltd Microprocessor using a genetic algorithm
US7149872B2 (en) 2003-07-10 2006-12-12 Transmeta Corporation System and method for identifying TLB entries associated with a physical address of a specified range
US7089398B2 (en) 2003-07-31 2006-08-08 Silicon Graphics, Inc. Address translation using a page size tag
US8296771B2 (en) 2003-08-18 2012-10-23 Cray Inc. System and method for mapping between resource consumers and resource providers in a computing system
US7133950B2 (en) 2003-08-19 2006-11-07 Sun Microsystems, Inc. Request arbitration in multi-core processor
US7594089B2 (en) 2003-08-28 2009-09-22 Mips Technologies, Inc. Smart memory based synchronization controller for a multi-threaded multiprocessor SoC
US9032404B2 (en) 2003-08-28 2015-05-12 Mips Technologies, Inc. Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor
US7849297B2 (en) 2003-08-28 2010-12-07 Mips Technologies, Inc. Software emulation of directed exceptions in a multithreading processor
US7321965B2 (en) 2003-08-28 2008-01-22 Mips Technologies, Inc. Integrated mechanism for suspension and deallocation of computational threads of execution in a processor
US7111126B2 (en) 2003-09-24 2006-09-19 Arm Limited Apparatus and method for loading data values
JP4057989B2 (ja) 2003-09-26 2008-03-05 株式会社東芝 Scheduling method and information processing system
US7373637B2 (en) 2003-09-30 2008-05-13 International Business Machines Corporation Method and apparatus for counting instruction and memory location ranges
US7047322B1 (en) 2003-09-30 2006-05-16 Unisys Corporation System and method for performing conflict resolution and flow control in a multiprocessor system
FR2860313B1 (fr) 2003-09-30 2005-11-04 Commissariat Energie Atomique Component with a dynamically reconfigurable architecture
TWI281121B (en) 2003-10-06 2007-05-11 Ip First Llc Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence
US8407433B2 (en) 2007-06-25 2013-03-26 Sonics, Inc. Interconnect implementing internal controls
US7395372B2 (en) 2003-11-14 2008-07-01 International Business Machines Corporation Method and system for providing cache set selection which is power optimized
US7243170B2 (en) 2003-11-24 2007-07-10 International Business Machines Corporation Method and circuit for reading and writing an instruction buffer
US20050120191A1 (en) 2003-12-02 2005-06-02 Intel Corporation (A Delaware Corporation) Checkpoint-based register reclamation
US20050132145A1 (en) 2003-12-15 2005-06-16 Finisar Corporation Contingent processor time division multiple access of memory in a multi-processor system to allow supplemental memory consumer access
US7310722B2 (en) 2003-12-18 2007-12-18 Nvidia Corporation Across-thread out of order instruction dispatch in a multithreaded graphics processor
US7293164B2 (en) 2004-01-14 2007-11-06 International Business Machines Corporation Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions
US20050204118A1 (en) 2004-02-27 2005-09-15 National Chiao Tung University Method for inter-cluster communication that employs register permutation
US20050216920A1 (en) * 2004-03-24 2005-09-29 Vijay Tewari Use of a virtual machine to emulate a hardware device
US8055885B2 (en) 2004-03-29 2011-11-08 Japan Science And Technology Agency Data processing device for implementing instruction reuse, and digital data storage medium for storing a data processing program for implementing instruction reuse
US7383427B2 (en) 2004-04-22 2008-06-03 Sony Computer Entertainment Inc. Multi-scalar extension for SIMD instruction set processors
US20050251649A1 (en) 2004-04-23 2005-11-10 Sony Computer Entertainment Inc. Methods and apparatus for address map optimization on a multi-scalar extension
US7418582B1 (en) 2004-05-13 2008-08-26 Sun Microsystems, Inc. Versatile register file design for a multi-threaded processor utilizing different modes and register windows
US7478198B2 (en) 2004-05-24 2009-01-13 Intel Corporation Multithreaded clustered microarchitecture with dynamic back-end assignment
US7594234B1 (en) 2004-06-04 2009-09-22 Sun Microsystems, Inc. Adaptive spin-then-block mutual exclusion in multi-threaded processing
US7284092B2 (en) 2004-06-24 2007-10-16 International Business Machines Corporation Digital data processing apparatus having multi-level register file
US20050289530A1 (en) 2004-06-29 2005-12-29 Robison Arch D Scheduling of instructions in program compilation
EP1628235A1 (en) 2004-07-01 2006-02-22 Texas Instruments Incorporated Method and system of ensuring integrity of a secure mode entry sequence
US8044951B1 (en) 2004-07-02 2011-10-25 Nvidia Corporation Integer-based functionality in a graphics shading language
US7339592B2 (en) 2004-07-13 2008-03-04 Nvidia Corporation Simulating multiported memories using lower port count memories
US7398347B1 (en) 2004-07-14 2008-07-08 Altera Corporation Methods and apparatus for dynamic instruction controlled reconfigurable register file
EP1619593A1 (en) 2004-07-22 2006-01-25 Sap Ag Computer-Implemented method and system for performing a product availability check
JP4064380B2 (ja) 2004-07-29 2008-03-19 富士通株式会社 Arithmetic processing device and control method therefor
US8443171B2 (en) 2004-07-30 2013-05-14 Hewlett-Packard Development Company, L.P. Run-time updating of prediction hint instructions
US7213106B1 (en) 2004-08-09 2007-05-01 Sun Microsystems, Inc. Conservative shadow cache support in a point-to-point connected multiprocessing node
US7318143B2 (en) 2004-10-20 2008-01-08 Arm Limited Reuseable configuration data
US20090150890A1 (en) 2007-12-10 2009-06-11 Yourst Matt T Strand-based computing hardware and dynamically optimizing strandware for a high performance microprocessor system
US7707578B1 (en) 2004-12-16 2010-04-27 Vmware, Inc. Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system
US7257695B2 (en) 2004-12-28 2007-08-14 Intel Corporation Register file regions for a processing system
US7996644B2 (en) 2004-12-29 2011-08-09 Intel Corporation Fair sharing of a cache in a multi-core/multi-threaded processor by dynamically partitioning of the cache
US8719819B2 (en) 2005-06-30 2014-05-06 Intel Corporation Mechanism for instruction set based thread execution on a plurality of instruction sequencers
US7050922B1 (en) 2005-01-14 2006-05-23 Agilent Technologies, Inc. Method for optimizing test order, and machine-readable media storing sequences of instructions to perform same
US7657891B2 (en) 2005-02-04 2010-02-02 Mips Technologies, Inc. Multithreading microprocessor with optimized thread scheduler for increasing pipeline utilization efficiency
US7681014B2 (en) 2005-02-04 2010-03-16 Mips Technologies, Inc. Multithreading instruction scheduler employing thread group priorities
JP2008530642A (ja) 2005-02-07 2008-08-07 ペーアーツェーテー イクスペーペー テクノロジーズ アクチエンゲゼルシャフト Low-latency massively parallel data processing device
US7400548B2 (en) 2005-02-09 2008-07-15 International Business Machines Corporation Method for providing multiple reads/writes using a 2read/2write register file array
US7343476B2 (en) 2005-02-10 2008-03-11 International Business Machines Corporation Intelligent SMT thread hang detect taking into account shared resource contention/blocking
US7152155B2 (en) 2005-02-18 2006-12-19 Qualcomm Incorporated System and method of correcting a branch misprediction
US20060200655A1 (en) 2005-03-04 2006-09-07 Smith Rodney W Forward looking branch target address caching
US8195922B2 (en) 2005-03-18 2012-06-05 Marvell World Trade, Ltd. System for dynamically allocating processing time to multiple threads
US20060212853A1 (en) 2005-03-18 2006-09-21 Marvell World Trade Ltd. Real-time control apparatus having a multi-thread processor
GB2424727B (en) 2005-03-30 2007-08-01 Transitive Ltd Preparing instruction groups for a processor having multiple issue ports
US8522253B1 (en) 2005-03-31 2013-08-27 Guillermo Rozas Hardware support for virtual machine and operating system context switching in translation lookaside buffers and virtually tagged caches
US20060230243A1 (en) 2005-04-06 2006-10-12 Robert Cochran Cascaded snapshots
US7313775B2 (en) 2005-04-06 2007-12-25 Lsi Corporation Integrated circuit with relocatable processor hardmac
US8230423B2 (en) 2005-04-07 2012-07-24 International Business Machines Corporation Multithreaded processor architecture with operational latency hiding
US20060230409A1 (en) 2005-04-07 2006-10-12 Matteo Frigo Multithreaded processor architecture with implicit granularity adaptation
US20060230253A1 (en) 2005-04-11 2006-10-12 Lucian Codrescu Unified non-partitioned register files for a digital signal processor operating in an interleaved multi-threaded environment
US20060236074A1 (en) 2005-04-14 2006-10-19 Arm Limited Indicating storage locations within caches
US7437543B2 (en) 2005-04-19 2008-10-14 International Business Machines Corporation Reducing the fetch time of target instructions of a predicted taken branch instruction
US7461237B2 (en) 2005-04-20 2008-12-02 Sun Microsystems, Inc. Method and apparatus for suppressing duplicative prefetches for branch target cache lines
US8713286B2 (en) 2005-04-26 2014-04-29 Qualcomm Incorporated Register files for a digital signal processor operating in an interleaved multi-threaded environment
GB2426084A (en) 2005-05-13 2006-11-15 Agilent Technologies Inc Updating data in a dual port memory
US7861055B2 (en) 2005-06-07 2010-12-28 Broadcom Corporation Method and system for on-chip configurable data ram for fast memory and pseudo associative caches
US8010969B2 (en) 2005-06-13 2011-08-30 Intel Corporation Mechanism for monitoring instruction set based thread execution on a plurality of instruction sequencers
KR101355496B1 (ko) 2005-08-29 2014-01-28 디 인벤션 사이언스 펀드 원, 엘엘씨 Scheduling mechanism of a hierarchical processor including multiple parallel clusters
EP1927054A1 (en) 2005-09-14 2008-06-04 Koninklijke Philips Electronics N.V. Method and system for bus arbitration
US7350056B2 (en) 2005-09-27 2008-03-25 International Business Machines Corporation Method and apparatus for issuing instructions from an issue queue in an information handling system
US7676634B1 (en) 2005-09-28 2010-03-09 Sun Microsystems, Inc. Selective trace cache invalidation for self-modifying code via memory aging
US7231106B2 (en) 2005-09-30 2007-06-12 Lucent Technologies Inc. Apparatus for directing an optical signal from an input fiber to an output fiber within a high index host
US7613131B2 (en) 2005-11-10 2009-11-03 Citrix Systems, Inc. Overlay network infrastructure
US7681019B1 (en) 2005-11-18 2010-03-16 Sun Microsystems, Inc. Executing functions determined via a collection of operations from translated instructions
US7861060B1 (en) 2005-12-15 2010-12-28 Nvidia Corporation Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior
US7634637B1 (en) 2005-12-16 2009-12-15 Nvidia Corporation Execution of parallel groups of threads with per-instruction serialization
US7770161B2 (en) * 2005-12-28 2010-08-03 International Business Machines Corporation Post-register allocation profile directed instruction scheduling
US8423682B2 (en) 2005-12-30 2013-04-16 Intel Corporation Address space emulation
GB2435362B (en) 2006-02-20 2008-11-26 Cramer Systems Ltd Method of configuring devices in a telecommunications network
JP4332205B2 (ja) 2006-02-27 2009-09-16 富士通株式会社 Cache control device and cache control method
US7543282B2 (en) 2006-03-24 2009-06-02 Sun Microsystems, Inc. Method and apparatus for selectively executing different executable code versions which are optimized in different ways
WO2007143278A2 (en) 2006-04-12 2007-12-13 Soft Machines, Inc. Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US7610571B2 (en) 2006-04-14 2009-10-27 Cadence Design Systems, Inc. Method and system for simulating state retention of an RTL design
US7577820B1 (en) 2006-04-14 2009-08-18 Tilera Corporation Managing data in a parallel processing environment
CN100485636C (zh) 2006-04-24 2009-05-06 华为技术有限公司 Debugging method and device for model-driven carrier-class service development
US7804076B2 (en) 2006-05-10 2010-09-28 Taiwan Semiconductor Manufacturing Co., Ltd Insulator for high current ion implanters
US8145882B1 (en) 2006-05-25 2012-03-27 Mips Technologies, Inc. Apparatus and method for processing template based user defined instructions
US20080126771A1 (en) 2006-07-25 2008-05-29 Lei Chen Branch Target Extension for an Instruction Cache
CN100495324C (zh) 2006-07-27 2009-06-03 中国科学院计算技术研究所 Depth-first exception handling method in a complex instruction set architecture
US7904704B2 (en) 2006-08-14 2011-03-08 Marvell World Trade Ltd. Instruction dispatching method and apparatus
US8046775B2 (en) 2006-08-14 2011-10-25 Marvell World Trade Ltd. Event-based bandwidth allocation mode switching method and apparatus
US7539842B2 (en) 2006-08-15 2009-05-26 International Business Machines Corporation Computer memory system for selecting memory buses according to physical memory organization information stored in virtual address translation tables
US7594060B2 (en) 2006-08-23 2009-09-22 Sun Microsystems, Inc. Data buffer allocation in a non-blocking data services platform using input/output switching fabric
US7752474B2 (en) 2006-09-22 2010-07-06 Apple Inc. L1 cache flush when processor is entering low power mode
US7716460B2 (en) 2006-09-29 2010-05-11 Qualcomm Incorporated Effective use of a BHT in processor having variable length instruction set execution modes
US7774549B2 (en) 2006-10-11 2010-08-10 Mips Technologies, Inc. Horizontally-shared cache victims in multiple core processors
TWI337495B (en) 2006-10-26 2011-02-11 Au Optronics Corp System and method for operation scheduling
US7680988B1 (en) 2006-10-30 2010-03-16 Nvidia Corporation Single interconnect providing read and write access to a memory shared by concurrent threads
US7617384B1 (en) 2006-11-06 2009-11-10 Nvidia Corporation Structured programming control flow using a disable mask in a SIMD architecture
CN101627365B (zh) 2006-11-14 2017-03-29 索夫特机械公司 Multithreaded architecture
US7493475B2 (en) 2006-11-15 2009-02-17 Stmicroelectronics, Inc. Instruction vector-mode processing in multi-lane processor by multiplex switch replicating instruction in one lane to select others along with updated operand address
US7934179B2 (en) 2006-11-20 2011-04-26 Et International, Inc. Systems and methods for logic verification
US20080235500A1 (en) 2006-11-21 2008-09-25 Davis Gordon T Structure for instruction cache trace formation
JP2008130056A (ja) 2006-11-27 2008-06-05 Renesas Technology Corp Semiconductor circuit
US7783869B2 (en) 2006-12-19 2010-08-24 Arm Limited Accessing branch predictions ahead of instruction fetching
WO2008077088A2 (en) 2006-12-19 2008-06-26 The Board Of Governors For Higher Education, State Of Rhode Island And Providence Plantations System and method for branch misprediction prediction using complementary branch predictors
EP1940028B1 (en) 2006-12-29 2012-02-29 STMicroelectronics Srl Asynchronous interconnection system for 3D inter-chip communication
US8321849B2 (en) 2007-01-26 2012-11-27 Nvidia Corporation Virtual architecture and instruction set for parallel thread computing
TW200833002A (en) 2007-01-31 2008-08-01 Univ Nat Yunlin Sci & Tech Distributed switching circuit having fairness
US20080189501A1 (en) 2007-02-05 2008-08-07 Irish John D Methods and Apparatus for Issuing Commands on a Bus
US7685410B2 (en) 2007-02-13 2010-03-23 Global Foundries Inc. Redirect recovery cache that receives branch misprediction redirects and caches instructions to be dispatched in response to the redirects
US7647483B2 (en) 2007-02-20 2010-01-12 Sony Computer Entertainment Inc. Multi-threaded parallel processor methods and apparatus
JP4980751B2 (ja) 2007-03-02 2012-07-18 富士通セミコンダクター株式会社 Data processing device and memory read-active control method
US8452907B2 (en) 2007-03-27 2013-05-28 Arm Limited Data processing apparatus and method for arbitrating access to a shared resource
US20080250227A1 (en) 2007-04-04 2008-10-09 Linderman Michael D General Purpose Multiprocessor Programming Apparatus And Method
US7716183B2 (en) 2007-04-11 2010-05-11 Dot Hill Systems Corporation Snapshot preserved data cloning
US7941791B2 (en) 2007-04-13 2011-05-10 Perry Wang Programming environment for heterogeneous processor resource integration
US7769955B2 (en) 2007-04-27 2010-08-03 Arm Limited Multiple thread instruction fetch from different cache levels
US7711935B2 (en) 2007-04-30 2010-05-04 Netlogic Microsystems, Inc. Universal branch identifier for invalidation of speculative instructions
US8555039B2 (en) 2007-05-03 2013-10-08 Qualcomm Incorporated System and method for using a local condition code register for accelerating conditional instruction execution in a pipeline processor
US8219996B1 (en) 2007-05-09 2012-07-10 Hewlett-Packard Development Company, L.P. Computer processor with fairness monitor
CN101344840B (zh) 2007-07-10 2011-08-31 苏州简约纳电子有限公司 Microprocessor and method for executing instructions in a microprocessor
US7937568B2 (en) 2007-07-11 2011-05-03 International Business Machines Corporation Adaptive execution cycle control method for enhanced instruction throughput
US20090025004A1 (en) 2007-07-16 2009-01-22 Microsoft Corporation Scheduling by Growing and Shrinking Resource Allocation
US8108545B2 (en) 2007-08-27 2012-01-31 International Business Machines Corporation Packet coalescing in virtual channels of a data processing system in a multi-tiered full-graph interconnect architecture
US7711929B2 (en) * 2007-08-30 2010-05-04 International Business Machines Corporation Method and system for tracking instruction dependency in an out-of-order processor
US8725991B2 (en) 2007-09-12 2014-05-13 Qualcomm Incorporated Register file system and method for pipelined processing
US8082420B2 (en) 2007-10-24 2011-12-20 International Business Machines Corporation Method and apparatus for executing instructions
US7856530B1 (en) 2007-10-31 2010-12-21 Network Appliance, Inc. System and method for implementing a dynamic cache for a data storage system
US7877559B2 (en) 2007-11-26 2011-01-25 Globalfoundries Inc. Mechanism to accelerate removal of store operations from a queue
US8245232B2 (en) 2007-11-27 2012-08-14 Microsoft Corporation Software-configurable and stall-time fair memory access scheduling mechanism for shared memory systems
US7809925B2 (en) * 2007-12-07 2010-10-05 International Business Machines Corporation Processing unit incorporating vectorizable execution unit
US8145844B2 (en) 2007-12-13 2012-03-27 Arm Limited Memory controller with write data cache and read data cache
US7831813B2 (en) 2007-12-17 2010-11-09 Globalfoundries Inc. Uses of known good code for implementing processor architectural modifications
US7870371B2 (en) 2007-12-17 2011-01-11 Microsoft Corporation Target-frequency based indirect jump prediction for high-performance processors
US20090165007A1 (en) 2007-12-19 2009-06-25 Microsoft Corporation Task-level thread scheduling and resource allocation
US8782384B2 (en) 2007-12-20 2014-07-15 Advanced Micro Devices, Inc. Branch history with polymorphic indirect branch information
US7917699B2 (en) 2007-12-21 2011-03-29 Mips Technologies, Inc. Apparatus and method for controlling the exclusivity mode of a level-two cache
US9244855B2 (en) 2007-12-31 2016-01-26 Intel Corporation Method, system, and apparatus for page sizing extension
US8645965B2 (en) 2007-12-31 2014-02-04 Intel Corporation Supporting metered clients with manycore through time-limited partitioning
US7877582B2 (en) 2008-01-31 2011-01-25 International Business Machines Corporation Multi-addressable register file
WO2009101563A1 (en) 2008-02-11 2009-08-20 Nxp B.V. Multiprocessing implementing a plurality of virtual processors
US7949972B2 (en) 2008-03-19 2011-05-24 International Business Machines Corporation Method, system and computer program product for exploiting orthogonal control vectors in timing driven synthesis
US7987343B2 (en) * 2008-03-19 2011-07-26 International Business Machines Corporation Processor and method for synchronous load multiple fetching sequence and pipeline stage result tracking to facilitate early address generation interlock bypass
US9513905B2 (en) 2008-03-28 2016-12-06 Intel Corporation Vector instructions to enable efficient synchronization and parallel reduction operations
US8120608B2 (en) 2008-04-04 2012-02-21 Via Technologies, Inc. Constant buffering for a computational core of a programmable graphics processing unit
TWI364703B (en) 2008-05-26 2012-05-21 Faraday Tech Corp Processor and early execution method of data load thereof
US8145880B1 (en) 2008-07-07 2012-03-27 Ovics Matrix processor data switch routing systems and methods
US8516454B2 (en) 2008-07-10 2013-08-20 Rocketick Technologies Ltd. Efficient parallel computation of dependency problems
JP2010039536A (ja) 2008-07-31 2010-02-18 Panasonic Corp Program conversion device, program conversion method, and program conversion program
US8316435B1 (en) 2008-08-14 2012-11-20 Juniper Networks, Inc. Routing device having integrated MPLS-aware firewall with virtual security system support
US8135942B2 (en) 2008-08-28 2012-03-13 International Business Machines Corporation System and method for double-issue instructions using a dependency matrix and a side issue queue
US7769984B2 (en) 2008-09-11 2010-08-03 International Business Machines Corporation Dual-issuance of microprocessor instructions using dual dependency matrices
US8225048B2 (en) 2008-10-01 2012-07-17 Hewlett-Packard Development Company, L.P. Systems and methods for resource access
US9244732B2 (en) 2009-08-28 2016-01-26 Vmware, Inc. Compensating threads for microarchitectural resource contentions by prioritizing scheduling and execution
US7941616B2 (en) 2008-10-21 2011-05-10 Microsoft Corporation System to reduce interference in concurrent programs
US8423749B2 (en) 2008-10-22 2013-04-16 International Business Machines Corporation Sequential processing in network on chip nodes by threads generating message containing payload and pointer for nanokernel to access algorithm to be executed on payload in another node
GB2464703A (en) 2008-10-22 2010-04-28 Advanced Risc Mach Ltd An array of interconnected processors executing a cycle-based program
EP2351325B1 (en) 2008-10-30 2018-09-26 Nokia Technologies Oy Method and apparatus for interleaving a data block
US8032678B2 (en) 2008-11-05 2011-10-04 Mediatek Inc. Shared resource arbitration
US7848129B1 (en) 2008-11-20 2010-12-07 Netlogic Microsystems, Inc. Dynamically partitioned CAM array
US8868838B1 (en) 2008-11-21 2014-10-21 Nvidia Corporation Multi-class data cache policies
US8171223B2 (en) 2008-12-03 2012-05-01 Intel Corporation Method and system to increase concurrency and control replication in a multi-core cache hierarchy
US8200949B1 (en) 2008-12-09 2012-06-12 Nvidia Corporation Policy based allocation of register file cache to threads in multi-threaded processor
US8312268B2 (en) 2008-12-12 2012-11-13 International Business Machines Corporation Virtual machine
US8099586B2 (en) 2008-12-30 2012-01-17 Oracle America, Inc. Branch misprediction recovery mechanism for microprocessors
US20100169578A1 (en) 2008-12-31 2010-07-01 Texas Instruments Incorporated Cache tag memory
US20100205603A1 (en) 2009-02-09 2010-08-12 Unisys Corporation Scheduling and dispatching tasks in an emulated operating system
JP5417879B2 (ja) 2009-02-17 2014-02-19 Fujitsu Semiconductor Ltd. Cache device
US8505013B2 (en) 2010-03-12 2013-08-06 Lsi Corporation Reducing data read latency in a network communications processor architecture
US8805788B2 (en) 2009-05-04 2014-08-12 Moka5, Inc. Transactional virtual disk with differential snapshots
US8332854B2 (en) 2009-05-19 2012-12-11 Microsoft Corporation Virtualized thread scheduling for hardware thread optimization based on hardware resource parameter summaries of instruction blocks in execution groups
US8533437B2 (en) 2009-06-01 2013-09-10 Via Technologies, Inc. Guaranteed prefetch instruction
GB2471067B (en) 2009-06-12 2011-11-30 Graeme Roy Smith Shared resource multi-thread array processor
US9122487B2 (en) 2009-06-23 2015-09-01 Oracle America, Inc. System and method for balancing instruction loads between multiple execution units using assignment history
CN101582025B (zh) 2009-06-25 2011-05-25 Zhejiang University Method for implementing a global register renaming table under a chip multiprocessor architecture
US8397049B2 (en) 2009-07-13 2013-03-12 Apple Inc. TLB prefetching
US8539486B2 (en) 2009-07-17 2013-09-17 International Business Machines Corporation Transactional block conflict resolution based on the determination of executing threads in parallel or in serial mode
JP5423217B2 (ja) 2009-08-04 2014-02-19 Fujitsu Ltd. Arithmetic processing device, information processing device, and control method for an arithmetic processing device
US8127078B2 (en) 2009-10-02 2012-02-28 International Business Machines Corporation High performance unaligned cache access
US20110082983A1 (en) 2009-10-06 2011-04-07 Alcatel-Lucent Canada, Inc. Cpu instruction and data cache corruption prevention system
US8695002B2 (en) 2009-10-20 2014-04-08 Lantiq Deutschland Gmbh Multi-threaded processors and multi-processor systems comprising shared resources
US8364933B2 (en) 2009-12-18 2013-01-29 International Business Machines Corporation Software assisted translation lookaside buffer search mechanism
JP2011150397A (ja) 2010-01-19 2011-08-04 Panasonic Corp Bus arbitration device
KR101699910B1 (ko) 2010-03-04 2017-01-26 Samsung Electronics Co., Ltd. Reconfigurable processor and control method thereof
US20120005462A1 (en) 2010-07-01 2012-01-05 International Business Machines Corporation Hardware Assist for Optimizing Code During Processing
US8312258B2 (en) 2010-07-22 2012-11-13 Intel Corporation Providing platform independent memory logic
US8751745B2 (en) 2010-08-11 2014-06-10 Advanced Micro Devices, Inc. Method for concurrent flush of L1 and L2 caches
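CN101916180B (zh) 2010-08-11 2013-05-29 Institute of Computing Technology, Chinese Academy of Sciences Method and system for executing register-type instructions in a RISC processor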
US8756329B2 (en) 2010-09-15 2014-06-17 Oracle International Corporation System and method for parallel multiplexing between servers in a cluster
US9201801B2 (en) 2010-09-15 2015-12-01 International Business Machines Corporation Computing device with asynchronous auxiliary execution unit
US10228949B2 (en) 2010-09-17 2019-03-12 Intel Corporation Single cycle multi-branch prediction including shadow cache for early far branch prediction
US20120079212A1 (en) 2010-09-23 2012-03-29 International Business Machines Corporation Architecture for sharing caches among multiple processes
WO2012051262A2 (en) 2010-10-12 2012-04-19 Soft Machines, Inc. An instruction sequence buffer to enhance branch prediction efficiency
CN103262027B (zh) 2010-10-12 2016-07-20 Soft Machines, Inc. Instruction sequence buffer for storing branches having reliably predictable instruction sequences
US8370553B2 (en) 2010-10-18 2013-02-05 International Business Machines Corporation Formal verification of random priority-based arbiters using property strengthening and underapproximations
US9047178B2 (en) 2010-12-13 2015-06-02 SanDisk Technologies, Inc. Auto-commit memory synchronization
US8677355B2 (en) 2010-12-17 2014-03-18 Microsoft Corporation Virtual machine branching and parallel execution
WO2012103245A2 (en) 2011-01-27 2012-08-02 Soft Machines Inc. Guest instruction block with near branching and far branching sequence construction to native instruction block
US9842005B2 (en) 2011-03-25 2017-12-12 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
WO2012135031A2 (en) 2011-03-25 2012-10-04 Soft Machines, Inc. Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
TWI520070B (zh) 2011-03-25 2016-02-01 Soft Machines, Inc. Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US20120254592A1 (en) 2011-04-01 2012-10-04 Jesus Corbal San Adrian Systems, apparatuses, and methods for expanding a memory source into a destination register and compressing a source register into a destination memory location
US9740494B2 (en) 2011-04-29 2017-08-22 Arizona Board Of Regents For And On Behalf Of Arizona State University Low complexity out-of-order issue logic using static circuits
US8843690B2 (en) 2011-07-11 2014-09-23 Avago Technologies General Ip (Singapore) Pte. Ltd. Memory conflicts learning capability
US8930432B2 (en) 2011-08-04 2015-01-06 International Business Machines Corporation Floating point execution unit with fixed point functionality
US20130046934A1 (en) 2011-08-15 2013-02-21 Robert Nychka System caching using heterogenous memories
US8839025B2 (en) 2011-09-30 2014-09-16 Oracle International Corporation Systems and methods for retiring and unretiring cache lines
KR101842550B1 (ko) 2011-11-22 2018-03-28 Soft Machines, Inc. Accelerated code optimizer for a multi-engine microprocessor
EP2783282B1 (en) 2011-11-22 2020-06-24 Intel Corporation A microprocessor accelerated code optimizer and dependency reordering method
EP2783281B1 (en) 2011-11-22 2020-05-13 Intel Corporation A microprocessor accelerated code optimizer
US8930674B2 (en) 2012-03-07 2015-01-06 Soft Machines, Inc. Systems and methods for accessing a unified translation lookaside buffer
KR20130119285A (ko) 2012-04-23 2013-10-31 Electronics and Telecommunications Research Institute Apparatus and method for resource allocation in a cluster computing environment
US9684601B2 (en) 2012-05-10 2017-06-20 Arm Limited Data processing apparatus having cache and translation lookaside buffer
US9940247B2 (en) 2012-06-26 2018-04-10 Advanced Micro Devices, Inc. Concurrent access to cache dirty bits
US9710399B2 (en) 2012-07-30 2017-07-18 Intel Corporation Systems and methods for flushing a cache with modified data
US9430410B2 (en) 2012-07-30 2016-08-30 Soft Machines, Inc. Systems and methods for supporting a plurality of load accesses of a cache in a single cycle
US9916253B2 (en) 2012-07-30 2018-03-13 Intel Corporation Method and apparatus for supporting a plurality of load accesses of a cache in a single cycle to maintain throughput
US9229873B2 (en) 2012-07-30 2016-01-05 Soft Machines, Inc. Systems and methods for supporting a plurality of load and store accesses of a cache
US9740612B2 (en) 2012-07-30 2017-08-22 Intel Corporation Systems and methods for maintaining the coherency of a store coalescing cache and a load cache
US9678882B2 (en) 2012-10-11 2017-06-13 Intel Corporation Systems and methods for non-blocking implementation of cache flush instructions
US10037228B2 (en) 2012-10-25 2018-07-31 Nvidia Corporation Efficient memory virtualization in multi-threaded processing units
US9195506B2 (en) 2012-12-21 2015-11-24 International Business Machines Corporation Processor provisioning by a middleware processing system for a plurality of logical processor partitions
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
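KR101800948B1 (ko) 2013-03-15 2017-11-23 Intel Corporation Method for executing blocks of instructions using a microprocessor architecture having a register view, a source view, an instruction view, and a plurality of register templates
KR102063656B1 (ko) 2013-03-15 2020-01-09 Soft Machines, Inc. Method for executing multithreaded instructions grouped into blocks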
US9632825B2 (en) 2013-03-15 2017-04-25 Intel Corporation Method and apparatus for efficient scheduling for asymmetrical execution units
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
KR20150130510A (ko) 2013-03-15 2015-11-23 Soft Machines, Inc. Method for emulating a guest centralized flag architecture by using a native distributed flag architecture
WO2014150971A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for dependency broadcasting through a block organized source view data structure
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
WO2014150806A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for populating register view data structure by using register template snapshots
US9569216B2 (en) 2013-03-15 2017-02-14 Soft Machines, Inc. Method for populating a source view data structure by using register template snapshots
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
WO2014150991A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for implementing a reduced size register view data structure in a microprocessor

Also Published As

Publication number Publication date
US9766893B2 (en) 2017-09-19
US9990200B2 (en) 2018-06-05
WO2012135031A2 (en) 2012-10-04
CN103547993B (zh) 2018-06-26
KR101638225B1 (ko) 2016-07-08
US20120246657A1 (en) 2012-09-27
TW201305819A (zh) 2013-02-01
US20160210145A1 (en) 2016-07-21
CN103547993A (zh) 2014-01-29
EP2689327A2 (en) 2014-01-29
EP2689327A4 (en) 2014-08-13
EP2689327B1 (en) 2021-07-28
KR20140018947A (ko) 2014-02-13
WO2012135031A3 (en) 2013-01-03

Similar Documents

Publication Publication Date Title
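TWI533129B (zh) Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
TWI518504B (zh) Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
TWI520070B (zh) Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
JP6621476B2 (ja) Execution slice circuit for use within a processor core, processor core, and method for executing program instructions by a processor core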