TW201224933A

TW201224933A - Systems and methods for compiler-based vectorization of non-leaf code

Info

Publication number: TW201224933A
Application number: TW100134227A
Authority: TW
Inventors: Jeffry E Gonion
Original assignee: Apple Inc
Priority date: 2010-09-23
Filing date: 2011-09-22
Publication date: 2012-06-16
Also published as: CN103119561A; DE112011103190T5; AU2011305837A1; KR20130096738A; WO2012039937A2; GB201116429D0; AU2011305837B2; TWI446267B; WO2012039937A3; MX2013003339A; KR101573586B1; GB2484000A; BR112013008640A2; CN103119561B

Abstract

Systems and methods for the vectorization of software applications are described. In some embodiments, source code dependencies can be expressed in ways that can extend a compiler's ability to vectorize otherwise scalar functions. For example, when compiling a called function, a compiler may identify dependencies of the called function on variables other than parameters passed to the called function. The compiler may record these dependencies, e.g., in a dependency file. Later, when compiling a calling function that calls the called function, the same (or another) compiler may reference the previously-identified dependencies and use them to determine whether and how to vectorize the calling function. In particular, these techniques may facilitate the vectorization of non-leaf loops. Because non-leaf loops are relatively common, the techniques described herein can increase the amount of vectorization that can be applied to many applications.

Description

VI. Description of the Invention

Technical Field

The present invention relates to computer systems and, more particularly, to systems and methods for enabling the universal vectorization of software applications.

Background

The typical software development paradigm is well known. A computer programmer writes source code in a high-level programming language (e.g., Basic, C++, etc.). At some point, the programmer uses a compiler to transform the source code into object code. After being transformed into executable code (e.g., after linking or other compile-time or run-time processing), the resulting object code can then be executed by a computer or computing device.

Computers now have multiple processing units and are capable of executing instructions in parallel. To take advantage of this architecture, modern compilers may attempt to "parallelize" or "vectorize" certain software functions so that, instead of having a single processing unit sequentially execute one instruction at a time, multiple processing units can execute instructions simultaneously.

During the compilation process, the compiler analyzes a software function to determine whether there are any obstacles to vectorization. One such obstacle, for example, is the presence of true data dependencies.
This happens when a current instruction refers to data obtained through the execution of a preceding instruction. In that case, the later instruction can only be carried out after the earlier one, and therefore the two instructions cannot be executed in parallel. Another potential obstacle is the presence of a function call. For example, if a function being compiled makes a call to an external function, then the compiler may not be able to vectorize the calling function.

Summary

The present invention provides systems and methods for enabling the universal vectorization of software applications. To that end, the systems and methods disclosed herein provide for the expression of dependencies and/or interfaces that extend a compiler's ability to vectorize functions.
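The true data dependency named in the background can be illustrated with a minimal C sketch. The function names `add_arrays` and `running_sum` are hypothetical and chosen only for illustration. The first loop has independent iterations; in the second, each iteration reads a value written by the previous one, so a straightforward element-by-element vectorization of the loop as written would not preserve its result.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: out[i] depends only on a[i] and b[i],
   so several elements could be processed at once. */
static void add_arrays(const int *a, const int *b, int *out, size_t n) {
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

/* True (read-after-write) dependency: iteration i reads a[i - 1],
   which iteration i - 1 has just written, so the iterations as written
   must run in order. */
static void running_sum(int *a, size_t n) {
    assert(a != NULL);
    for (size_t i = 1; i < n; ++i) {
        a[i] = a[i] + a[i - 1];
    }
}
```

Calling `running_sum` on the array {1, 2, 3, 4} leaves it holding {1, 3, 6, 10}; every element after the first depends on the one before it.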

In a non-limiting embodiment, a compiler may examine memory and/or data dependencies within a function (a "called function") during its compilation, and express those dependencies in a dependency database, such as a dependency file. Once compiled, the called function may become, for example, a library function or the like. At a later time, another function (a "calling function") may be created such that it makes a call to the called function. During compilation of the calling function, the compiler may access the dependency file associated with the called function and identify its dependencies. Based on the called function's dependencies, the compiler can decide whether to vectorize the calling function. Additionally or alternatively, the compiler may decide to vectorize only a portion of the calling function.

The visibility provided by dependency files may allow the compiler to vectorize a higher percentage of functions than would otherwise be possible. For example, the use of dependency files allows the vectorization of functions that include non-leaf loops (i.e., loops that make calls to external functions whose source code is not visible). Because the vast majority of software functions today include one or more non-leaf loops, these systems and methods can increase the amount of vectorization that can be applied to any application.
In another non-limiting embodiment, a compiler may generate both scalar and vector versions of a function from a single source code description. The scalar version of the function may use a scalar interface as originally specified by the source code. Meanwhile, the vector version of the function may implement a vector interface to the function, accepting vector parameters and generating vector return values. For example, the vector interface may be exposed in the dependency file associated with the function. The presence of this alternative vector interface allows the compiler, for example, to make vector function calls from within vectorized loops, rather than making multiple serialized scalar function calls from within a vectorized loop.

Various combinations of the techniques disclosed herein also permit the vectorization of functions that do not contain loops, which is contrary to conventional wisdom but provides numerous advantages. In particular, these techniques can increase the overall amount of vectorization in software applications.

Detailed Description

While susceptible to various modifications and alternative forms, the specific embodiments discussed in this specification are shown by way of example in the drawings and are described in detail herein. It should be understood, however, that the drawings and detailed description are not intended to limit the invention to the particular forms disclosed; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Introduction

The following specification first discusses an illustrative computer system or device. The specification also describes an illustrative compiler that may be configured to execute on, and/or generate executable code for, the computer system. The specification then presents several techniques for enabling non-leaf loop and full-function vectorization.

Illustrative Computer System

FIG. 1 depicts an illustrative computer system operable to implement techniques for enabling the universal vectorization of software applications, according to certain embodiments.
In this non-limiting example, computer system 100 includes one or more processors 110a-110n coupled to memory 120 via I/O interface 130. Computer system 100 also includes network interface 140 and storage interface 150, both coupled to I/O interface 130. Storage interface 150 connects external storage device 155 to I/O interface 130. Further, network interface 140 may connect system 100 to a network (not shown) or to another computer system (not shown).

In some embodiments, computer system 100 may be a single-processor system including only one processor 110a. In other embodiments, computer system 100 may include two or more processors 110a-110n. Processors 110a-110n may include any processor capable of executing instructions. For example, processors 110a-110n may be general-purpose or embedded processors implementing any suitable instruction set architecture (ISA), such as the x86, PowerPC™, SPARC™, or MIPS™ ISAs. In an embodiment, processors 110a-110n may include various features of the Macroscalar processors described in U.S. Patent No. 7,617,496 and U.S. Patent No. 7,395,419.

System memory 120 may be configured to store instructions and data accessible by processors 110a-110n. For example, system memory 120 may be static random access memory (SRAM), synchronous dynamic RAM (SDRAM), non-volatile/flash-type memory, or any other suitable type of memory technology. A portion of the program instructions and/or data implementing the desired functions or applications described in detail below may be shown stored within system memory 120. Additionally or alternatively, a portion of those program instructions and/or data may be stored in storage device 155, in a cache memory within one or more of processors 110a-110n, or may arrive from a network via network interface 140.
I/O interface 130 is operable to manage data traffic between processors 110a-110n, system memory 120, and any device in or attached to the system, including network interface 140, storage interface 150, or other peripheral interfaces. For example, I/O interface 130 may convert data or control signals from one component into a format suitable for use by another component. In some embodiments, I/O interface 130 may include support for devices attached through various types of peripheral buses, such as the Peripheral Component Interconnect (PCI) bus or the Universal Serial Bus (USB). Also, in some embodiments, some or all of the functionality of I/O interface 130 may be incorporated into processors 110a-110n.

Network interface 140 is configured to allow data to be exchanged between computer system 100 and other devices attached to a network, such as other computer systems. For example, network interface 140 may support communication via wired or wireless general data networks, telecommunications/telephony networks, storage area networks such as Fibre Channel SANs, and the like.

Storage interface 150 is configured to allow computer system 100 to interface with a storage device such as storage device 155. Storage interface 150 may support standard storage interfaces such as one or more suitable versions of the Advanced Technology Attachment Packet Interface (ATAPI) standard (which may also be referred to as Integrated Drive Electronics (IDE)), the Small Computer System Interface (SCSI) standard, the IEEE 1394 "FireWire" standard, the USB standard, or another standard or proprietary interface suitable for interconnecting a mass storage device with computer system 100. For example, storage device 155 may include magnetic, optical, or solid-state media that may be fixed or removable.

Storage device 155 may also correspond to a hard disk drive or drive array, a CD or DVD drive, or a device based on non-volatile memory (e.g., flash).

System memory 120 and storage device 155 represent illustrative embodiments of a computer-accessible or computer-readable storage medium configured to store program instructions and data. In other embodiments, program instructions and/or data may be received, sent, or stored upon different types of computer-accessible media. Generally speaking, a computer-accessible medium or storage medium may include any type of mass storage media or memory media, such as magnetic or optical media. A computer-accessible medium or storage medium may also include any volatile or non-volatile media such as RAM (e.g., SDRAM, DDR SDRAM, RDRAM, SRAM, etc.), ROM, or the like, whether included in computer system 100 as system memory 120 or as another type of memory. Program instructions and data stored via a computer-accessible medium may be transmitted by transmission media or signals, such as electrical, electromagnetic, or digital signals, which may be conveyed via a communication medium such as a network and/or a wireless link, such as may be implemented via network interface 140.

Typically, computer system 100 may take the form of a desktop or laptop computer.
However, as will be readily understood in light of this disclosure, computer system 100 may be any suitable device capable of executing software. For example, computer system 100 may be a tablet computer, a phone, or the like.

Illustrative Compiler

Generally speaking, a compiler may correspond to a software application (e.g., one or more modules of computer-executable instructions) that is configured to translate or transform source code, which may be represented in a high-level programming language such as C, C++, or any other suitable programming language, into object code. The language in which the source code is expressed may be referred to as the source code language or simply the source language. Typically, object code may be represented in the form of instructions and data suitable for processing by a target computing architecture, although in some embodiments, additional processing (e.g., linking) may be performed on generated object code to transform it into machine-executable code. In various embodiments, this additional processing may be performed by the compiler or by a separate application.

Object code may be represented in machine-readable form (e.g., binary form), in human-readable form (e.g., assembly language) that may require additional processing to generate machine-executable code, or in a combination of human-readable and machine-readable forms. The target architecture for the object code may be the same as the ISA implemented by processors 110a-110n on which the compiler is configured to execute. In some instances, however, a compiler may be configured to generate object code for an ISA different from the one on which the compiler executes (a "cross-compiler").

FIG. 2 depicts an illustrative compiler that may produce executable code when executed by computer system 100 or another suitable computer system, according to certain embodiments.
Compiler 200 includes front end 220 and back end 230, which in turn may include optimizer 240 and code generator 250. As shown, front end 220 receives source code 210 and back end 230 produces object code such as scalar object code 260, vectorized object code 270, or a combination thereof. Compiler 200 may also produce dependency database 280 associated with one or more of object codes 260 and/or 270.

While source code 210 is typically written in a high-level programming language, source code 210 may alternatively correspond to a machine-level language such as assembly language. For example, compiler 200 may be configured to apply its optimization techniques to assembly language code in addition to code written in higher-level programming languages. Also, compiler 200 may include several different instances of front end 220, each configured to process source code 210 written in a different respective language and to produce a similar intermediate representation for processing by back end 230. In such embodiments, compiler 200 may effectively function as a multi-language compiler.

In an embodiment, front end 220 may be configured to perform preliminary processing of source code 210 to determine whether the source is lexically and/or syntactically correct, and to perform any transformation suitable to ready source code 210 for further processing by back end 230. For example, front end 220 may be configured to process any compiler directives present within source code 210, such as conditional compilation directives that may result in some portions of source code 210 being included in the compilation process while other portions are excluded.
Front end 220 may also be variously configured to convert source code 210 into tokens (e.g., according to whitespace and/or other delimiters defined by the source language), to determine whether source code 210 includes any characters or tokens that are disallowed for the source language, and to determine whether the resulting stream of tokens obeys the rules of syntax that define well-formed expressions in the source language. In different situations, front end 220 may be configured to perform different combinations of these processing activities, may omit certain actions described above, or may include different actions, depending on the implementation of front end 220 and the source language to which front end 220 is targeted. For example, if a source language does not provide a syntax for defining compiler directives, front end 220 may omit a processing action that includes scanning source code 210 for compiler directives.

If front end 220 encounters errors during processing of source code 210, it may abort processing and report the errors (e.g., by writing error information to a log).

For example, the intermediate representation may include a control flow graph that explicitly identifies control relationships among different blocks or segments of source code 210. Such control flow information may be employed by back end 230 to determine, for example, how functional portions of source code 210 may be rearranged (e.g., by optimizer 240) to improve performance while preserving necessary execution-ordering relationships within source code 210.

Back end 230 may generally be configured to transform the intermediate representation into one or more of scalar code 260, vectorized code 270, or a combination of both. Specifically, in the depicted embodiment, optimizer 240 may be configured to transform the intermediate representation in an attempt to improve some aspect of the resulting scalar code 260 or vectorized code 270. For example, optimizer 240 may be configured to analyze the intermediate representation to identify memory or data dependencies. In some embodiments, optimizer 240 may be configured to perform a variety of other types of code optimization such as vectorization, loop optimization (e.g., loop fusion, loop unrolling, etc.), data flow optimization (e.g., common subexpression elimination,

constant folding, etc.), or any other suitable optimization techniques. Optimizer 240 may also be configured to generate dependency database 280. As described in greater detail below, dependency database 280 may express an indication of a memory and/or data dependency within source code 210. Additionally or alternatively, in connection with the vectorization of source code 210, dependency database 280 may expose a vector interface associated with vectorized object code 270.

Code generator 250 may be configured to process the intermediate representation, as transformed by optimizer 240, in order to produce scalar code 260, vectorized code 270, or a combination of both types of code. For example, code generator 250 may be configured to generate vectorized machine instructions defined by the ISA of the target architecture,

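such that execution of the generated instructions by a processor implementing the target architecture (e.g., one of processors 110a-110n, or a different processor) may implement the functional behavior specified by source code 210. In an embodiment, code generator 250 may also be configured to generate instructions corresponding to operations that may not have been inherent in source code 210, but which may have been added by optimizer 240 during the optimization process.

In other embodiments, compiler 200 may be partitioned into more, fewer, or different components than those shown. For example, compiler 200 may include a linker (not shown) configured to take one or more object files or libraries as input and combine them to produce a single, usually executable, file. Alternatively, the linker may be an entity separate from compiler 200. As noted above, any of the components of compiler 200, and any of the methods or techniques performed thereby, including those described below with respect to FIGS. 3 to 6, may be implemented partially or entirely as software code stored within a suitable computer-accessible storage medium.

Source code 210 may represent, for example, a software function or algorithm. The resulting object code 260 and/or 270 may be, for example, a library or external function that can be called by other functions. Illustrative techniques employed by compiler 200 during operation, and in particular during its vectorization operation, are discussed in more detail below.

Vectorization of Non-Leaf Loops

Many modern computers have the capability of performing some type of parallel processing of a computational workload by concurrently executing two or more different operations. For example, a superscalar processor may allow a computer to attempt to execute multiple independent instructions at once. Another technique, generally referred to as "vector computing" (which may be considered a special case of parallel computing), allows a computer to attempt to execute a single instruction that operates on multiple data items at once. Various examples of vector computing can be found in the single instruction, multiple data (SIMD) instruction sets now available in various processors, including, for example, IBM's AltiVec™ and SPE™ instruction set extensions for PowerPC™ processors and Intel's variants of the MMX™ and SSE™ instruction set extensions. Such SIMD instructions are examples of vector instructions that may be targeted by a vectorizing compiler, although other types of vector instructions or operations (including variable-length vector operations, predicated vector operations, and vector operations that operate on combinations of vectors and scalars/immediates) are also possible and contemplated.

Generally speaking, the process of transforming source code into vectorized object code may be referred to as "vectorization." When performed using a compiler (as opposed to, for example, vectorizing source code by hand), vectorization may be referred to as "compiler auto-vectorization." One particular type of auto-vectorization is loop auto-vectorization. Loop auto-vectorization may convert procedural loops that iterate over multiple data items into code that is capable of concurrently processing multiple data items within separate processing units (e.g., processors 110a-110n of computer system 100 in FIG. 1, or separate functional units within a processor). For example, to add together two arrays of numbers A[] and B[], a procedural loop may iterate through the arrays, adding a pair of array elements during each iteration. When compiling such a loop, a vectorizing compiler may take advantage of the fact that the target processor implements vector operations capable of concurrently processing a fixed or variable number of vector elements. For example, the compiler may auto-vectorize the array-addition loop so that at each iteration, multiple elements of arrays A[] and B[] are concurrently added, reducing the number of iterations needed to complete the addition. A typical program spends a significant amount of its execution time within such loops. As such, auto-vectorization of loops can generate performance improvements without programmer intervention.

In some embodiments, compiler auto-vectorization is limited to leaf loops, that is, loops that do not make calls to other functions. Vectorization of non-leaf loops (i.e., loops that make calls to other functions) is ordinarily very difficult because the side effects of external function calls are usually opaque, especially when their source code is unavailable for inter-procedural analysis, as is the case with libraries. For purposes of illustration, consider the following loop:

    for (x = 0; x < size; ++x)
    {
        A[x] = x;
        foo(x);
    }

To vectorize this loop, compiler 200 may determine whether function foo() interacts with (e.g., reads or writes) array A[]. Here, three possibilities exist: (1) function foo() does not interact with A[]; (2) function foo() does interact with A[]; or (3) function foo() might interact with A[] (e.g., depending on a compile-time or run-time condition, foo() may or may not interact with A[]). The case where function foo() might interact with A[] presents similar problems to the case where function foo() does in fact interact with A[]. In the case where there is no interaction between foo() and A[], the following vectorizable code is equivalent to the loop above:

    for (x = 0; x < size; ++x)
        A[x] = x;
    for (x = 0; x < size; ++x)
        foo(x);

This example shows that, in the process of vectorizing non-leaf loops, compiler 200 would benefit from knowing the memory that a function accesses and/or whether that memory is read and/or written. Because the majority of loops typically contain function calls within them, the vectorization of non-leaf loops and of the functions called by them is preferred for high degrees of vectorization. To enable this level of vectorization, various embodiments of the techniques and systems described herein increase the compile-time visibility of dependencies and potential dependencies across libraries and modules that may have been previously compiled. For example, this information may be available when the calling function is compiled, independently of when (or where) the library or module was originally compiled. Accordingly, certain techniques described herein establish an illustrative compiler infrastructure to create this visibility and explore the types of vectorization it enables.

Dependency Database

When compiling code that calls an external function, it may be desirable to determine the interface of the external function (e.g., the number and/or types of parameters the external function takes, and/or the number and/or types of results it returns). For example, such interface information may be useful in determining whether the calling code has correctly implemented the external function. Externally callable functions may typically expose their interface definitions in header files. However, such header files may not expose to a calling function the details of variables that are not part of the external function's interface but that nevertheless affect code vectorization. For example, in the loop illustrated above, vectorization of the for loop may depend on how function foo() interacts with array A[]. However, because foo() does not take A[] as a parameter, the header file corresponding to foo() may not adequately indicate this dependency.

A dependency database, which may also be referred to herein as a "persistent dependency database," may describe the dependencies of externally callable functions in a library. That is, a dependency database may expose to a calling function various dependencies of a called function that are not necessarily apparent from the called function's interface alone. This database may be accessed when functions that call a library are compiled. Generally speaking, a dependency database may persistently store indications of the dependencies of callable code such that the dependencies are visible across compiler invocations. For example, in some embodiments, a dependency database may be implemented as a dependency file (analogous to a header file) that includes human-readable and/or machine-readable content indicative of various dependencies. In other embodiments, a dependency database may be implemented using other techniques, such as a table-based relational database, semi-structured data (e.g., formatted using Extensible Markup Language (XML)), or any other suitable technique. For simplicity of exposition, the following discussion refers to an embodiment that employs a dependency file. It should be noted, however, that this is merely a non-limiting example of a dependency database.

In an embodiment, compiler 200 automatically accesses a dependency file (if it exists) upon inclusion of a corresponding header file (e.g., stdlib.h). This mechanism may allow vectorizing compilers, such as Macroscalar compilers, to compile existing code without modification while having the advantage of knowing the dependencies of external libraries. Compiler 200 may then generate dependency files automatically when libraries are compiled.

The information contained in a dependency file may form an Application Compiler Interface (ACI) that provides information compiler 200 can use to understand the constraints of a function. Specifically, dependency files may express information about variables that are not normally within the scope of a calling function. For example, the variables expressed in a dependency file may include data items that are not parameters of the called function (that is, such variables may not be defined by the called function's programming interface as parameters of the called function). Through the use of dependency files, a calling function may become aware, for example, of whether a called function reads or writes function-static or file-static variables. Dependency files may also allow compiler 200 to differentiate between variables that share the same name but have different scopes.

As a non-limiting example, when compiling a library, a compiler would ordinarily generate only an object file. Using the techniques described herein, compiler 200 may also generate a dependency file, for example, at compile time. The dependency file exposes memory dependencies associated with public functions defined in stdlib.h. Other programs that include stdlib.h from their source code may trigger compiler 200 to search for the associated dependency file in corresponding locations. This dependency file may be distributed and installed along with the library's header and object files. In one implementation, the absence of a dependency file would mean that no additional information about the library is available, which may be the default state for legacy libraries and would not cause any compile errors.

Dependency databases may enable the vectorization of non-leaf loops by exposing the data dependency characteristics of a previously compiled library function (or of any function in a program) in a manner that is visible to compiler 200 when it compiles the code that calls the library function. This information may be made available without revealing the source code of the library.

In some embodiments, the dependency information may be generated at compile time of the library. For example, for each function that is compiled, compiler 200 may note the types of accesses to function-static variables, file-static variables, global variables, and/or pointers passed into the function being compiled. Compiler 200 may then record which symbols were read or written, and export this information in the form of a dependency file that can be accessed and used at compile time of other code that references the library.

As another non-limiting example, if function foo() is defined in file foo.c and its interface is defined in header file foo.h, then at compile time of foo.c, the memory dependency characteristics of function foo() may be stored in dependency file foo.hd. (It should be noted that any suitable naming convention for dependency files may be employed.) A calling function that uses function foo() may include header file foo.h but may have no access to file foo.c. When foo.h is referenced during compilation of the calling function, compiler 200 may automatically search for dependency file foo.hd to see whether it exists. Because the existence of dependency file foo.hd is optional, the absence of this file may imply that the dependency characteristics of functions defined in file foo.c are unknown, suggesting that compiler 200 should make pessimistic assumptions when vectorizing the calling function. If the dependency file exists, however, compiler 200 can use the dependency information in this file to make more accurate and aggressive assumptions during vectorization of the calling function.

Referring to FIG. 3, a flowchart representing a method of expressing a dependency in a dependency file according to certain embodiments is depicted. In block 300, compiler 200 receives a function to be compiled. For example, compiler 200 may receive the function when processing source code for compilation, such as during compilation of a library that includes the function. In block 310, compiler 200 analyzes the function and identifies an expressed dependency within the function. This expressed dependency may be, for example, a memory or data dependency associated with a data item that is not a parameter of the called function. More generally, an expressed dependency of a function with respect to a particular data item may indicate whether the function only reads the particular data item, only writes the particular data item, or both reads and writes the particular data item. In various embodiments, analysis of the function may include activities such as performing a lexical, syntactic, and/or semantic analysis of the function. Analysis may also include generating a parse tree, symbol table, intermediate-code representation, and/or any other suitable data structure or representation that is indicative of some aspect of the operations and/or data references of the code being compiled.

In block 320, compiler 200 stores an indication of the expressed dependency in a dependency database associated with the function. For example, during analysis of the function, compiler 200 may identify variables used by the function that are not necessarily local or private to that function, and thus are capable of being read or written by code external to the function. Such variables may be examples of expressed dependencies that compiler 200 can identify, and compiler 200 may store indications of these variables within a dependency database. (It is noted that in some embodiments, compiler 200 may also identify and indicate dependencies that are local or private to the function.) In various embodiments, the indication of an expressed dependency may include information that identifies the expressed dependency, such as the name of the variable depended upon. The indication may also include information that characterizes the expressed dependency, such as information regarding whether the function reads or writes the variable, and/or information regarding the data type or scope of the variable (e.g., whether the variable is global, private, static, etc.). As will be readily apparent in light of this disclosure, the dependency file may be created or updated in any suitable format such as Extensible Markup Language (XML) or the like. Moreover, in some embodiments, dependencies may be indicated in a negative fashion instead of, or in addition to, an affirmative fashion. For example, a dependency file may explicitly indicate that a given variable does not depend on external code, in addition to or instead of indicating those expressed dependencies that do exist.

For example, consider the following case, in which func1.c is to be compiled:

    //---File func1.c---
    int A[1000];   // Global array A
    int F[1000];   // Global array F
    #include <foo1.h>
    int func1(int b)
    {
        int x, c;
        c = 0;
        for (x = 0; x < 100; ++x)
        {
            c = c + foo1(x) + A[x+b];
            F[x] = c;
        }
        return(c);
    }

In this case, func1.c makes a call to the external function defined in foo1.c, shown below:

    //---File foo1.c---
    int foo1(int d)
    {
        static int e = 0;
        e = e + d;
        return(e);
    }

The source code of the called function in foo1.c is reproduced for purposes of illustration only. It should be understood that, so long as a dependency database (in this example, a dependency file) exists for foo1.c, its source code need not be available during compilation of the calling function in func1.c. In this example, the expressed dependency information stored in dependency file foo1.hd, which may have been generated when file foo1.c was compiled, may express the fact that the function-static variable "e" is both read and written. A non-limiting example of the corresponding dependency file is shown below:

    //---File foo1.hd---
    function foo1(void)
    {
        read e;
        write e;
    }

At compile time of file func1.c, the inclusion of header file foo1.h may cause compiler 200 to read dependency file foo1.hd. This information informs the compiler of the expressed dependencies of called function foo1(): namely, that static variable "e" is read and written. It also allows compiler 200 to detect that, even though global variables "A" and "F" are used in calling function func1(), they are not referenced by called function foo1(). This knowledge allows compiler 200 to vectorize the loop in function func1(), because it can determine that parallelism will not cause incorrect operation. In this case, the loop in func1() would call foo1() once for each element in the vector being processed. If function foo1() wrote to global "A," then compiler 200 might not vectorize the loop in func1(), or it might use the information to vectorize only a portion of the function. In that instance, the compiler may, for example, serialize the call to function foo1().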
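The decision described for func1() and foo1() can be sketched as a small C routine. This is a minimal sketch under stated assumptions, not the method of the source: the `access_t` record layout and the function name `call_is_independent_of_loop` are hypothetical, symbols are compared by name only (a real compiler would also have to distinguish scopes), and a missing dependency file is modeled as a NULL pointer that yields the pessimistic answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One recorded access: a symbol name and how it is used.
   Hypothetical layout; the source does not define a record format. */
typedef struct {
    const char *symbol;
    bool reads;
    bool writes;
} access_t;

/* Returns true when no symbol is written on one side while being read or
   written on the other. callee == NULL models a missing dependency file,
   for which the pessimistic answer (false) is returned. */
static bool call_is_independent_of_loop(const access_t *callee, size_t n_callee,
                                        const access_t *loop, size_t n_loop) {
    assert(loop != NULL || n_loop == 0);
    if (callee == NULL) {
        return false;
    }
    for (size_t i = 0; i < n_callee; ++i) {
        for (size_t j = 0; j < n_loop; ++j) {
            bool same = strcmp(callee[i].symbol, loop[j].symbol) == 0;
            bool conflict =
                (callee[i].writes && (loop[j].reads || loop[j].writes)) ||
                (loop[j].writes && callee[i].reads);
            if (same && conflict) {
                return false;
            }
        }
    }
    return true;
}
```

With the entries from foo1.hd (read e, write e) on the callee side and the loop's own accesses (read A, write F) on the other, the routine returns true. Adding a "write A" entry to the callee makes it return false, which corresponds to the case where the loop is either left scalar or only partly vectorized.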
Illustrative techniques employed by compiler 200 during operation, and in particular during its vectorization operation, are discussed in more detail below.

Vectorization of Non-Leaf Loops

Many modern computers can perform some type of parallel processing of a computational workload by concurrently executing two or more different operations. For example, a superscalar processor may allow a computer to attempt to execute multiple independent instructions at once. Another technique, generally referred to as "vector computing" (which may be considered a special case of parallel computing), allows a computer to attempt to execute a single instruction that operates on multiple data items at once. Various examples of vector computing can be found in the single instruction, multiple data (SIMD) instruction sets now available in various processors, including, for example, IBM's AltiVec(TM) and SPE(TM) instruction set extensions for PowerPC(TM) processors and Intel's variants of the MMX(TM) and SSE(TM) instruction set extensions. Such SIMD instructions are examples of vector instructions that may be targeted by a vectorizing compiler, although other types of vector instructions or operations (including variable-length vector operations, predicated vector operations, and vector operations that operate on combinations of vectors and scalars/immediates) are also possible and contemplated.

Generally speaking, the process of transforming source code into vectorized object code may be referred to as "vectorization." When performed using a compiler (as opposed to, for example, vectorizing source code by hand), vectorization may be referred to as "compiler auto-vectorization." One particular type of auto-vectorization is loop auto-vectorization.
Loop auto-vectorization may convert procedural loops that iterate over multiple data items into code that is capable of concurrently processing multiple data items within separate processing units (e.g., processors 110a-110n of computer system 100 in FIG. 1, or separate functional units within a processor). For example, to add together two arrays of numbers A[] and B[], a procedural loop may iterate through the arrays, adding a pair of array elements during each iteration. When compiling such a loop, a vectorizing compiler may take advantage of the fact that the target processor implements vector operations capable of concurrently processing a fixed or variable number of vector elements. For example, the compiler may auto-vectorize the array-addition loop so that at each iteration, multiple elements of arrays A[] and B[] are added concurrently, reducing the number of iterations needed to complete the addition. A typical program spends a significant amount of its execution time within such loops. As such, auto-vectorization of loops can yield performance improvements without programmer intervention.

In some embodiments, compiler auto-vectorization is limited to leaf loops, i.e., loops that do not make calls to other functions. Vectorization of non-leaf loops (i.e., loops that make calls to other functions) is ordinarily very difficult because the side effects of external function calls are usually opaque, especially when their source code is unavailable for inter-procedural analysis, as is the case with libraries. For purposes of illustration, consider the following loop:

    for (x = 0; x < size; ++x)
    {
        A[x] = x;
        foo(x);
    }

To vectorize this loop, compiler 200 may determine whether function foo() interacts with (e.g., reads or writes) array A[].
Here, three possibilities exist: (1) function foo() does not interact with A[]; (2) function foo() does interact with A[]; or (3) function foo() might interact with A[] (e.g., depending on a compile-time or run-time condition, foo() may or may not interact with A[]). The case where foo() might interact with A[] presents problems similar to the case where foo() does in fact interact with A[]. In the case where there is no interaction between foo() and A[], the vectorizable code below is equivalent to the loop above:

    for (x = 0; x < size; ++x)
        A[x] = x;
    for (x = 0; x < size; ++x)
        foo(x);

This example shows that, in the process of vectorizing a non-leaf loop, compiler 200 would benefit from knowing which memory a function accesses and/or whether that memory is read and/or written. Because the majority of loops typically contain function calls within them, vectorizing non-leaf loops and the functions they call is preferred in order to achieve a high degree of vectorization. To enable this level of vectorization, various embodiments of the techniques and systems described herein increase the compile-time visibility of dependencies and potential dependencies across libraries and modules that may have been previously compiled. For example, this information may be available when the calling function is compiled, independently of when (or where) the library or module was originally compiled. Accordingly, certain techniques described herein establish an illustrative compiler infrastructure to create this visibility and explore the types of vectorization it enables.

Dependency Databases

When compiling code that calls an external function, it may be desirable to determine the interface of the external function (e.g., the number and/or types of parameters the external function takes, and/or the number and/or types of results it returns). For example, such interface information may be useful in determining whether the calling code uses the external function correctly.
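The loop-splitting equivalence discussed above can be checked with a small sketch. The two callees below (`foo_independent` and `foo_reads_A`) and the helpers `run_fused` and `run_split` are hypothetical names introduced only for illustration; they are not part of the described compiler. The sketch assumes a fixed array size and compares the original fused loop against the split form for a callee that ignores A[] and for one that reads it.

```c
#include <assert.h>
#include <string.h>

#define SIZE 8

static int A[SIZE];
static int sum;                 /* state updated by the callee */

/* Hypothetical callee that does not interact with A[]. */
static void foo_independent(int x) { sum += x; }

/* Hypothetical callee that reads A[], so its result depends on
 * which elements of A[] have already been written. */
static void foo_reads_A(int x) { sum += A[SIZE - 1 - x]; }

typedef void (*callee_fn)(int);

/* Original non-leaf loop: assignment and call in the same iteration. */
static int run_fused(callee_fn foo)
{
    assert(foo != NULL);
    memset(A, 0, sizeof A);
    sum = 0;
    for (int x = 0; x < SIZE; ++x) {
        A[x] = x;
        foo(x);
    }
    return sum;
}

/* Split form: all assignments first, then all calls. */
static int run_split(callee_fn foo)
{
    assert(foo != NULL);
    memset(A, 0, sizeof A);
    sum = 0;
    for (int x = 0; x < SIZE; ++x)
        A[x] = x;
    for (int x = 0; x < SIZE; ++x)
        foo(x);
    return sum;
}
```

With the independent callee both forms produce the same result, so the split is safe; with the callee that reads A[] the results differ, which is why a compiler without dependency information has to assume the worse case.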
Externally callable functions typically expose their interface definitions in header files. However, such header files may not expose to a calling function the details of variables that are not part of the external function's interface but that may nevertheless affect code vectorization. For example, in the loop illustrated above, vectorization of the for loop may depend on how function foo() interacts with array A[]. However, because foo() does not take A[] as a parameter, the header file corresponding to foo() may not adequately indicate this dependency to compiler 200.

A dependency database, which may also be referred to herein as a "persistent dependency database," may describe the dependencies of externally callable functions in a library. That is, a dependency database may expose to a calling function various dependencies of a called function that are not necessarily apparent from the called function's interface alone. This database may be accessed when functions that call the library are compiled. Generally speaking, a dependency database may persistently store indications of the dependencies of callable code such that the dependencies are visible across compiler invocations. For example, in some embodiments, a dependency database may be implemented as a dependency file (analogous to a header file) that includes human-readable and/or machine-readable content indicative of various dependencies. In other embodiments, a dependency database may be implemented using other techniques, such as a table-based relational database, semi-structured data (e.g., formatted using Extensible Markup Language (XML)), or any other suitable technique. For simplicity of exposition, the following discussion refers to an embodiment that employs a dependency file. However, it should be noted that this is merely a non-limiting example of a dependency database.
In an embodiment, compiler 200 automatically accesses a dependency file (if it exists) when the corresponding header file is included. This mechanism may allow vectorizing compilers (such as Macroscalar compilers) to compile existing code without modification while having the advantage of knowing the dependencies of external libraries. Compiler 200 may then generate dependency files automatically when libraries are compiled.

Information contained in a dependency file may form an Application Compiler Interface (ACI) that provides information compiler 200 can use to understand the constraints of a function. Specifically, dependency files may express information about variables that are not normally within the scope of a calling function. For example, the variables expressed in a dependency file may include data items that are not parameters of the called function (that is, such variables may not be defined by the called function's programming interface as parameters of the called function). Through the use of dependency files, a calling function may become aware of whether a called function reads or writes function-static or file-static variables. Dependency files may also allow compiler 200 to differentiate between variables that share the same name but have different scopes.

As a non-limiting example, when a library is compiled, a compiler would ordinarily generate only an object file. Using the techniques described herein, compiler 200 may also generate a dependency file, for example at compile time. The dependency file exposes memory dependencies associated with the public functions defined in the library's header file. Other programs that include that header file from their source code may trigger compiler 200 to search for the associated dependency file in corresponding locations. The dependency file may be distributed and installed along with the library. In one implementation, the absence of a dependency file means that no additional information about the library is available. This may be the default state for legacy libraries and does not cause any compile errors.

Dependency databases may enable vectorization of non-leaf loops by exposing the data dependency characteristics of a previously compiled library function (or any function in a program) in a manner that is visible to compiler 200 when the code that calls the library function is compiled. This information may be made available without revealing the source code of the library.

In some embodiments, dependency information may be generated at compile time of the library. For example, for each function that is compiled, compiler 200 may note the types of accesses to function-static variables, file-static variables, global variables, and/or pointers passed in to the function being compiled. Compiler 200 may then record which symbols were read or written and export this information in the form of a dependency file that can be accessed and used at compile time of other code that references the library.

As another non-limiting example, if function foo() is defined in file foo.c and its interface is defined in header file foo.h, then at compile time of foo.c the memory dependency characteristics of foo() may be stored in dependency file foo.hd. (It should be noted that any suitable naming convention for dependency files may be employed.) A calling function that uses foo() may include header file foo.h but may have no access to file foo.c. When compiler 200 encounters the reference to foo.h while compiling the calling function, it may automatically search for dependency file foo.hd to see whether it exists. Because the existence of foo.hd is optional, the absence of this file may imply that the dependency characteristics of the functions declared in foo.h are unknown.
When the dependency file is absent, compiler 200 should therefore make pessimistic assumptions when vectorizing the calling function. If the dependency file exists, however, compiler 200 can use the dependency characteristics it contains to make more accurate and more aggressive assumptions while vectorizing the calling function.

Referring to FIG. 3, a flowchart representing a method of expressing a dependency in a dependency file according to some embodiments is depicted. In block 300, compiler 200 receives a function to be compiled. For example, compiler 200 may receive the function while processing source code for compilation, such as during compilation of a library that includes the function. In block 310, compiler 200 analyzes the function and identifies an expressed dependency within the function. This expressed dependency may be, for example, a memory or data dependency associated with a data item that is not a parameter of the called function. More generally, an expressed dependency of a function with respect to a particular data item may indicate whether the function only reads the data item, only writes the data item, or both reads and writes the data item. In various embodiments, analysis of the function may include activities such as performing a lexical, syntactic, and/or semantic analysis of the function. Analysis may also include generating a parse tree, symbol table, intermediate-code representation, and/or any other suitable data structure or representation that indicates some aspect of the operations and/or data references of the code being compiled.

In block 320, compiler 200 stores an indication of the expressed dependency in a dependency database associated with the function.
For example, during analysis of the function, compiler 200 may identify variables used by the function that are not necessarily local or private to that function, and that can therefore be read or written by code external to the function. Such variables are examples of expressed dependencies that compiler 200 may identify, and compiler 200 may store indications of them within the dependency database. (It should be noted that in some embodiments, compiler 200 may also identify and indicate dependencies that are local or private to the function.)

In various embodiments, the indication of an expressed dependency may include information that identifies the dependency, such as the name of the variable depended upon. The indication may also include information that characterizes the dependency, such as whether the function reads or writes the variable, and/or the data type or scope of the variable (e.g., whether the variable is global, private, static, etc.). As will be readily apparent in light of this disclosure, a dependency file may be created or updated in any suitable format, such as Extensible Markup Language (XML) or the like. Moreover, in some embodiments, dependencies may be indicated in a negative fashion instead of, or in addition to, an affirmative fashion. For example, in addition to or instead of indicating those expressed dependencies that do exist, a dependency file may explicitly indicate that a given variable is not dependent on external code.
For example, consider the following example, in which func1.c is to be compiled:

    //--- File func1.c ---
    int A[1000];    // global array A
    int F[1000];    // global array F
    #include <foo1.h>
    int func1(int b)
    {
        int x, c;
        c = 0;
        for (x = 0; x < 100; ++x)
        {
            c = c + foo1(x) + A[x+b];
            F[x] = c;
        }
        return(c);
    }

In this case, func1.c calls the external function foo1(), defined in foo1.c as shown below:

    //--- File foo1.c ---
    int foo1(int d)
    {
        static int e = 0;
        e = e + d;
        return(e);
    }

The source code of the called function foo1() is reproduced here for purposes of illustration only. It should be understood that as long as a dependency database (in this example, a dependency file) exists for foo1(), its source code need not be available during compilation of the calling function func1(). In this example, the expressed dependency information stored in the dependency file foo1.hd, which may have been generated when foo1.c was compiled, may express the fact that function-static variable "e" is both read and written. One non-limiting example of a corresponding dependency file is shown below:

    //--- File foo1.hd ---
    function foo1(void)
    {
        read e;
        write e;
    }

At compile time of file func1.c, the inclusion of header file foo1.h may cause compiler 200 to read the dependency file foo1.hd. This information informs the compiler of the expressed dependencies of the called function foo1(): namely, that static variable "e" is read and written. It also allows compiler 200 to detect that even though global variables "A" and "F" are used in calling function func1(), they are not referenced by called function foo1(). This knowledge allows compiler 200 to vectorize the loop in func1(), because it can determine that parallelism will not cause incorrect operation. In this case, the loop in func1() would call foo1() once for each element of the vector being processed.
If function foo1() wrote to global "A", compiler 200 might not vectorize the loop in func1(), or it might use the information to vectorize only a portion of the function. In that instance, the compiler may, for example, serialize the call to function foo1() and the memory reference to "A" while allowing the rest of the loop to execute in a parallel manner.

Referring to FIG. 4, a flowchart representing an embodiment of a method of vectorizing a function is depicted. In block 400, compiler 200 identifies a calling function. In a non-limiting embodiment, the calling function may include a non-leaf loop, in which case the calling function may include a call to an external or called function. Referring to the code example just given, compiler 200 may process the func1.c source code and identify func1() as a calling function that includes a non-leaf for loop that calls function foo1().
In block 410, compiler 200 may attempt to access a dependency database associated with the called function. In some instances, a dependency database (e.g., a dependency file) may be explicitly indicated to compiler 200, for example via a command-line parameter, a compiler directive embedded within the source code, or another suitable technique. In other instances, compiler 200 may attempt to infer the name of a dependency file from other data according to a naming convention. For example, if a header file is included within the source code, compiler 200 may search for a dependency file whose name is derived from the name of the header file. In some embodiments, compiler 200 may search for dependency files based on the name of the called function.

If the dependency database exists, it may indicate an expressed dependency within the called function. This expressed dependency may be, for example, a memory or data dependency associated with a data item that is not a parameter of the called function, as discussed above. In some instances, compiler 200 may check a number of different naming conventions to determine whether a dependency file exists.

In block 420, compiler 200 then determines, based at least in part on the expressed dependency (or the absence of one), whether the calling function interacts with the called function. For example, upon accessing the dependency file associated with function foo1(), compiler 200 may determine that foo1() depends on variable "e" but not on variables "A" or "F". Thus, compiler 200 may determine that calling function func1() does interact with called function foo1(), at least with respect to variable "e".

In block 430, depending on the determination of whether the calling function interacts with the called function, compiler 200 may determine whether to vectorize at least a portion of the calling function.
For example, based on the expressed dependency information discussed above, compiler 200 may attempt to vectorize calling function func1() by generating vector code that concurrently operates on multiple data items (e.g., array elements) and/or multiple loop iterations.

In various embodiments, a dependency database may express various types of information that are useful to a compiler in determining whether to vectorize a function. Examples include tracking reads and writes to data objects, pointers, pointed-to objects, known offsets within pointed-to objects, unknown offsets into pointed-to objects (which may effectively constitute a reference to the entire object), variable offsets within both pointed-to objects and data objects (which may enable run-time dependency analysis using the variable in question), and known offsets into objects at unknown offsets into higher-level objects (e.g., when an unknown number of known offsets are referenced but other offsets remain unreferenced). Known-offset information may enable compiler 200 to vectorize without generating additional dependency-checking instructions, while variable-offset information may be used to generate dependency-checking instructions that analyze the variable dependencies at run time, which may allow increased vector parallelism while still maintaining program correctness.

As explained above, a dependency database may express information about a called function that is useful to compiler 200 when vectorizing a calling function. In this regard, a dependency database may store information such as the type of memory access, the addressing mode, and/or additional qualifiers.
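The lookup and decision steps of FIG. 4 can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the described compiler: the helper names `dep_file_name`, `callee_touches`, and `may_vectorize_loop` are hypothetical, the dependency file is passed in as text rather than read from disk, and the test applied is deliberately simplified to "the callee does not read or write any variable used by the calling loop." A missing dependency file is modeled as a NULL pointer and leads to the pessimistic answer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive "foo1.hd" from "foo1.h" by appending 'd',
 * following the naming convention of the foo1 example. */
static void dep_file_name(const char *header, char *out, size_t out_size)
{
    assert(out_size > strlen(header) + 1);
    snprintf(out, out_size, "%sd", header);
}

/* Returns 1 if the dependency text records a read or write of `symbol`.
 * Entries look like "read e;" or "write public A[5];". */
static int callee_touches(const char *dep_text, const char *symbol)
{
    char buf[512];
    size_t len = strlen(symbol);
    int expect_name = 0;

    assert(strlen(dep_text) < sizeof buf);
    strcpy(buf, dep_text);

    for (char *tok = strtok(buf, " \t\n;{}"); tok != NULL;
         tok = strtok(NULL, " \t\n;{}")) {
        if (strcmp(tok, "read") == 0 || strcmp(tok, "write") == 0) {
            expect_name = 1;
        } else if (expect_name) {
            if (strcmp(tok, "public") == 0 || strcmp(tok, "private") == 0)
                continue;               /* qualifier precedes the name */
            if (strncmp(tok, symbol, len) == 0 &&
                (tok[len] == '\0' || tok[len] == '['))
                return 1;               /* plain or indexed reference */
            expect_name = 0;
        }
    }
    return 0;
}

/* dep_text == NULL models a missing dependency file: assume the worst. */
static int may_vectorize_loop(const char *dep_text,
                              const char *const *caller_vars, size_t n)
{
    if (dep_text == NULL)
        return 0;
    for (size_t i = 0; i < n; ++i)
        if (callee_touches(dep_text, caller_vars[i]))
            return 0;
    return 1;
}
```

For the foo1.hd contents shown earlier and the caller variables "A" and "F", this sketch answers that the loop may be vectorized; if the file listed a write to A, or if no file were found, it would answer that it may not. A real compiler would also have to consider partial vectorization and the addressing information discussed next.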

In some embodiments, memory accesses by a function generally fall into two types: reads and writes. Thus, as shown in the examples given above, a dependency database may explicitly store indications of whether a data item is read or written.

Addressing modes describe memory accesses within a called function as viewed by the calling function. Some embodiments may define three addressing modes (constant, variable, and unknown), though alternative embodiments are possible and contemplated. These three addressing modes may be distinguished by whether the addressing can be established by the compiler at compile time, by the calling function at run time, or by the called function at run time, respectively. In addition, some embodiments may define two orthogonal qualifiers to the addressing modes: public and private. These qualifiers designate whether the associated variable is visible to external modules.
According to some embodiments, constant addressing describes addressing that can be resolved from outside the module at compile time. This includes, for example, references to named variables, named structure elements within a named structure, or array indexes that can be resolved at compile time. For example, a named variable (such as g), a named structure element within a named structure, an array indexed by a constant (such as h[5]), and a named structure element within a named array of structures indexed by a constant all represent examples of constant addressing. These examples can represent either static or global variables. (Automatic storage is usually temporary, e.g., allocated upon entry to a module and deallocated upon the module's exit, and is thus not generally visible outside the module.) The example below illustrates the dependencies of a function that uses constant addressing:

    function foo(void)
    {
        write public h[5];
        read public g;
    };

In some embodiments, variable addressing describes addressing that is not constant but is also not modified by the called function. It can therefore be evaluated by the calling function at run time. Examples include references to pointed-to objects and to arrays where the addressing can be observed by the calling function. Consider the function below:

    static int A[1000];    // file-static variable, not exported
    void assignA(int g, int x)
    {
        A[g] = A[x];
    }

This function would export the following dependencies to the dependency file, declaring that the function writes A[g] and reads A[x], both variably addressed arrays:

    void assignA(g,x)
    {
        write private A[g];
        read private A[x];
    };

In this example, dependency checking (which may also be referred to as hazard checking) may be unnecessary if function assignA() is called only once per iteration of the calling loop.
The called function assignA() may determine whether g and x overlap and may partition the vector accordingly, for example using Macroscalar techniques.

Consider the situation in which an external loop invokes assignA() twice per iteration:

    for (x=...)
    {
        assignA(g1,x);
        assignA(g2,y);
    }

Although hazards may exist between g1 and x, or between g2 and y, those dependencies pertain to a single invocation of the function. In this particular instance, the calling loop may check for potential hazards only between g1 and y and between g2 and x, which it can recognize from the information in the dependency file.

In some embodiments, unknown addressing is similar to variable addressing as described above, but typically applies to situations in which the run-time addressing cannot be evaluated by the calling function. This may happen, for example, when the called function modifies the values of address variables in a manner that is not visible to the calling function using the information from the dependency file.

The additional qualifiers "public" and "private" may designate whether a linker exports a symbol so that the variable can be inspected by calling functions.
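The cross-invocation check described for the two-call assignA() loop can be sketched as follows. This is only an illustration under assumptions: the type `assign_call` and the functions `cross_call_hazard` and `first_conflicting_iteration` are hypothetical, and the sketch covers only the two pairs named in the text (g1 against y, and g2 against x). Overlap within a single call is deliberately ignored here, since the text leaves that to the called function.

```c
#include <assert.h>

/* One invocation of assignA(g, x): it writes A[g] and reads A[x],
 * as declared in the dependency file. */
typedef struct {
    int write_index;
    int read_index;
} assign_call;

/* Cross-invocation hazard: one call writes the element the other reads. */
static int cross_call_hazard(assign_call first, assign_call second)
{
    return first.write_index == second.read_index ||
           second.write_index == first.read_index;
}

/* Returns the index of the first loop iteration whose two calls,
 * assignA(g1[i], x[i]) and assignA(g2[i], y[i]), conflict with each
 * other, or n if no iteration does. */
static int first_conflicting_iteration(const int *g1, const int *x,
                                       const int *g2, const int *y, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        assign_call a = { g1[i], x[i] };
        assign_call b = { g2[i], y[i] };
        if (cross_call_hazard(a, b))
            return i;
    }
    return n;
}
```

A calling loop could use a check of this kind at run time to decide where the two calls must be kept in order, while treating the remaining iterations as free of cross-call hazards.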
For example, the references to A[] in the next-to-last example given above are designated "private" because A[] is declared as a file-static variable that is not exported to functions that call assignA(). In this example, compiler 200 can determine from the dependency information how the assignA() function addresses A[], but it may not be able to generate code that actually reads the values of A[].

Full-Function Vectorization

As described in detail above, compiler auto-vectorization may be employed to generate vectorized code from non-vectorized source code in a manner that may be transparent to programmers or other users. Such compiler auto-vectorization may enable source code to take advantage of the performance improvements offered by vector computing hardware with little or no programmer intervention.

However, if non-leaf functions (i.e., functions that call other functions) are to be vectorized effectively, it may be desirable to provide versions of called functions that expose a vector interface to the calling function, rather than the scalar interface that may be represented in the original source code.

Moreover, an application developer may wish to target an application to a variety of computing platforms, not all of which offer vector resources. For example, a mobile version of a processor family may omit vector operations to reduce die size and power consumption, whereas a desktop version of the same processor family may be developed to emphasize processing power over power consumption. In this scenario, an application may need to be compiled using only scalar functions in order to execute on the mobile processor, whereas it may use either scalar or vector functions when executing on the desktop processor. As with the auto-vectorization described above, however, it may be desirable to allow the application to execute efficiently on both vector and non-vector platforms while reducing or eliminating programmer intervention.
Accordingly, when vectorizing a function, a compiler according to some embodiments described herein may generate both scalar and vector versions of the function from a single source code description. The function may be, for example, a library function, though more generally it may correspond to any callable procedure or method. In some embodiments, the scalar version of the function may use the scalar interface originally specified by the source code. Meanwhile, the vector version of the function may implement a vector interface to the function, accepting vector parameters and/or generating vector return values. By generating both scalar and vector versions of the function, the compiler may enable code to be tailored more flexibly to the available resources, either at compile time or at run time. Moreover, by generating a vectorized version of a called function and exposing the resulting vector interface to calling functions, the compiler may facilitate the vectorization of calling functions, thus propagating opportunities for vectorization hierarchically upward from leaf functions.

The vector interface may be expressed, for example, in a dependency database associated with the function, such as a dependency file. For example, consider the following function shell, in which the internal details of the function have been omitted:

    int foo(int A)
    {
        int B;
        // function code
        return(B);
    }

A scalar interface for this function may be represented (e.g., within a dependency file) as:

    int foo(int A)

This representation reflects that, according to this version, foo() takes a scalar parameter and returns a scalar result.

The same function, when vectorized to perform operations on multiple data items at a time, may become, for example:

Vector foo(Vector A)Vector foo(Vector A)

Vector B; //函式程式碼return(B); 1587Sl.doc -29- 201224933 因而，此函式之向量介面可矣+ "囬J衣不為（例如，在相依性檔案内）：Vector B; //Function code return(B); 1587Sl.doc -29- 201224933 Thus, the vector interface of this function can be 矣+ "回J衣不( (for example, in the dependency file):

Vector foo(Vector A) 不同於S前表示，此表示指示χ;之此版本採用向量參數且傳回向量結果。參看圖5，描繪表示全函式向量化方法之實施例的流程圖。在區塊500中，編譯器接收待編課之函式。在區塊 51〇中，、編譯器細可編譯該函式之純量版本。纟區塊52〇中，編譯器2GG可編譯該函式之向量版本^且在區塊53〇中，編譯II 200可在相依性f料庫中表達與函式之向量版本相關聯的向量介面。此替代向量介面之存在允許編譯器2⑽自經向量化之迴圈内進行向量函式呼叫，而非自經向量化之迴圈内進行多個經串列化之純量函式呼叫。舉例而t，考慮呼叫外部函式/〇〇之呼叫函式内的以下迴圈： f〇r(x=〇； χ<5 12; ++χ) C[x]=D[x]； foo (c); 若>省具有純量介面，則向量化此迴圈之機會可限; 4如冰派之向量化。然而，/〇〇〇之向量版本之加迴圈向量化之撒奋谢 ^ J * 版太“ 例而言，以上迴圈之經向量化: 1使用向量參數之/〇〇〇且可接收向量結果，從， ::更多同時執行且減少迴圈内之串列化。此外; 月，J方法’此技術准許不含有迴圈之函式之向量化。此， 158751.d〇, -30- 201224933 形可增加應用程式中之總向量化之量β Ο Ο ，可向：化函式之兩個版本中之迴圏。大體而言，「水平」向量化可指代將迴圈之反覆映射至向量之相應元素的向量化類型。「垂直」向量化可指代如下向量化類型：可 m迴圈之反覆性質（亦即，與如水平向量化中之映射至向置70素相對），但用向量變數替換純量變數以使得相比程式碼之純量版本，每一反覆同時對更多資料進行運算。、可使用S集純量技術水平地向量化函式之純量版本中之、圈可水平或垂直地向量化函式之向量版本中之迴圈。此情形可增加應用程式中之向量化之機會。除向量化 =呼^之效能及效率益處以外，此技術亦可增加在應用程式中垂直地向量化之迴圈之數目，因此減小在水平地向量化迴圈時所引起之額外耗用。參看圖6 ’料表讀驗向量化之函式之方法的實施例的流程圖。在F Λ Λ . 圆在q塊_中，、編譯器2〇〇識別呼叫被啤叫函式2^函式。舉例而言，彳叫函式可包括呼叫經預編譯之、％内之函式的迴圈。在區塊610中，編譯器200存取與被呼叫函式相關聯之相依性資料庫。在區塊6 ™檢查相依性資料庫以料被呼叫函式之向量變^ :否可用。在—實施中’當向量版本可用時，在區塊630 ’編譯1^200編譯呼叫函式以利用被呼叫函式之向量變體。若向量版本不可用，則編譯器·編譯呼叫函式以利用純置版本（例如，藉由反覆地呼叫函式之純量版本）。舉例而言，再次考慮以下迴圈： 158751.doc 31 201224933 f〇r(x=〇； χ<512; ++χ) C[x]=D[x]; )f〇° (C)； f向量化此迴圈時，編譯11可檢查與X)相關聯之相依性資料庫以判定與相關聯之向量介面是否存在。若之向量介面不存在，則編譯器2〇〇可（例如）藉由向量化指派同時使函式呼叫保持處於純量格式來僅部分地：：化迴圈。里另方面，若具有表達於其相依性資料庫中之經向量化之介面’則在_些例子中，編譯器⑽可整體地向量化迴圈（例如’藉由將指派與函式呼叫兩者替換或以其他方式轉變為向量運算）。、當編譯器檢查/〇〇0之相依性資料庫以判定是否存在被呼叫函式之經向量化之介面時’編譯器可另外或替代地檢杳與被呼叫函式相_之任何記憶體相依性， = —相關聯之同-（或另―）相依性資料庫中。，、了表達於與在一些實施中’可獨立地追料列之每_維度之定址以最小化不確定性。大體而言，此概念可應用於料類型（諸如，結構及陣列韦窠、心貧干'^ 卜貫例更砰細地說明諸如編譯器200之編譯器（例如）可使用相依性資料庫資訊以實現向量化且可在可能時代替純量版本而使用函式之向量版本的方式（應注意’在其他實施例中’可獨立於判定向量函 1 是否存在來使用相依性資料庫，且可獨立於判定相依性貝料庫是否存在來使用向量函式介面）。 158751.doc -32· 201224933 typedef struct { int a; int b; int c; int *ptr; } myStruct; myStruct g; int bar (myStruct &p, int j) { p.ptr[p.b+j] = 0; return(p.b > j); }Vector foo(Vector A) is different from the pre-S representation, which indicates χ; this version uses vector parameters and returns vector results. 
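The relationship between the two interfaces can be illustrated with a minimal C++ sketch. Everything beyond the names foo, A, and B is an assumption made for illustration: the Vector type is modeled as a fixed-width array of four integers, and the function body (A * 2 + 1) is a placeholder, since the function's internals are omitted above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-width vector type; the width of 4 is an assumption.
constexpr std::size_t kLanes = 4;
using Vector = std::array<int, kLanes>;

// Scalar version: one input, one result. The computation is a placeholder
// for the omitted "function code".
int foo(int A) {
    int B = A * 2 + 1;
    return B;
}

// Vector version: the same computation applied to every element. A real
// compiler would emit vector instructions; a loop over the elements stands
// in for them in this sketch.
Vector foo(const Vector& A) {
    Vector B{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        B[lane] = A[lane] * 2 + 1;
    }
    return B;
}
```

Each element of the vector result matches what the scalar version would return for the corresponding input, which is the property that lets a caller substitute one vector call for several scalar calls.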
Referring to FIG. 5, a flowchart representing an embodiment of a full function vectorization method is depicted. In block 500, a compiler receives a function to be compiled. In block 510, compiler 200 may compile a scalar version of the function. In block 520, compiler 200 may compile a vector version of the function. In block 530, compiler 200 may express a vector interface associated with the vector version of the function in a dependency database.

The presence of this alternate vector interface allows compiler 200 to make vector function calls from within vectorized loops, rather than making multiple serialized scalar function calls from within a vectorized loop. For example, consider the following loop within a calling function that calls the external function foo():

    for (x=0; x<512; ++x)
    {
        C[x] = D[x];
        foo(C);
    }

If foo() had only a scalar interface, the opportunities for vectorizing this loop might be limited (e.g., to the assignment). However, the presence of a vector version of foo() may increase the opportunities for loop vectorization. For example, a vectorized version of the above loop may call foo() using vector parameters and may receive vector results, enabling more concurrent execution and reducing serialization within the loop. Furthermore, unlike previous approaches, this technique permits the vectorization of functions that do not contain loops, which may increase the total amount of vectorization in an application.

Loops in both versions of a function may be vectorized. Generally speaking, "horizontal" vectorization may refer to a type of vectorization in which the iterations of a loop are mapped to corresponding elements of a vector. "Vertical" vectorization may refer to a type of vectorization in which the iterative nature of a loop is preserved (i.e., as opposed to being mapped to vector elements as in horizontal vectorization), but in which scalar variables are replaced with vector variables so that each iteration operates concurrently on more data than the scalar version of the code.

Loops in the scalar version of a function may be vectorized horizontally using Macroscalar techniques, while loops in the vector version of the function may be vectorized either horizontally or vertically. This may increase the opportunities for vectorization in applications. In addition to the performance and efficiency benefits of vectorizing function calls, this technique may increase the number of loops that are vertically vectorized in an application, thus reducing the overhead incurred when loops are horizontally vectorized.

Referring to FIG. 6, a flowchart representing an embodiment of a method of using a vectorized function is depicted. In block 600, compiler 200 identifies a calling function that calls a called function. For example, the calling function may include a loop that calls a pre-compiled function, such as a library function. In block 610, compiler 200 accesses a dependency database associated with the called function. In block 620, compiler 200 checks the dependency database to determine whether a vector variant of the called function is available. In one implementation, when the vector version is available, compiler 200 compiles the calling function in block 630 to use the vector variant of the called function. If the vector version is not available, compiler 200 compiles the calling function to use the scalar version (e.g., by iteratively calling the scalar version of the function).

For example, consider again the following loop:

    for (x=0; x<512; ++x)
    {
        C[x] = D[x];
        foo(C);
    }

When vectorizing this loop, the compiler may check the dependency database associated with foo() to determine whether an associated vector interface exists.
If a vector interface for foo() does not exist, compiler 200 may only partially vectorize the loop, for example by vectorizing the assignment while leaving the function call in scalar form. On the other hand, if foo() has a vectorized interface expressed in its dependency database, then in some instances compiler 200 may vectorize the loop in its entirety (e.g., by replacing or otherwise transforming both the assignment and the function call into vector operations).

When the compiler checks the dependency database of foo() to determine whether a vectorized interface exists for the called function, it may additionally or alternatively examine any memory dependencies associated with the called function, which may be expressed in the same (or another) dependency database associated with foo(). In some implementations, the addressing of each dimension of an array may be tracked independently to minimize uncertainty. This concept applies generally to aggregate data types, such as structures and arrays.

The following example illustrates in greater detail how a compiler such as compiler 200 might use dependency database information to enable vectorization, and might use vector versions of functions in place of scalar versions where possible. (It is noted that in other embodiments, the dependency database may be used independently of determining whether a vector function interface exists, and the vector function interface may be used independently of determining whether a dependency database exists.)

    typedef struct
    {
        int a;
        int b;
        int c;
        int *ptr;
    } myStruct;

    myStruct g;

    int bar(myStruct &p, int j)
    {
        p.ptr[p.b+j] = 0;
        return(p.b > j);
    }

    void foo(int i)
    {
        for (int x=i; x<i+200; ++x)
            if (bar(g,x))
                ++g.a;
    }

In this example, function bar() would export dependencies (e.g., via a dependency file generated by compiler 200 when bar() is compiled, as discussed above) indicating that it writes to p.ptr[] and reads from p.b and p.ptr:

    typedef struct
    {
        int a;
        int b;
        int c;
        int *ptr;
    } myStruct;

    int bar(myStruct *p, int j)
    {
        read p.b;
        read p.ptr;
        write p.ptr[p.b+j];
    };

It is noted that in this particular case it may be unnecessary to identify references to parameters as "public" or "private." It may also be unnecessary to declare that the function reads from p or j, because at least in this example it can be assumed that a function uses its own parameters. The type definition of myStruct may be included in the dependency database to expose it to functions that call bar() but that may not have been exposed to the definition of myStruct through header file inclusion.

During compilation, compiler 200 may compile function bar() without vectorizing it, because it contains no loop to vectorize. In doing so, it may produce a scalar version of bar() having the following interface:

    int bar(myStruct *p, int j)

In this example, bar() takes a single instance of a pointer to a structure and a single integer as parameters, and returns a single integer as its result.
Thus, this version of bar() is scalar in both its inputs and its outputs. However, compiler 200 may also compile a vector function having the following interface, which may likewise be exported in the dependency database:

    Vector bar(Vector p, Vector j, Vector pred)

In this example, the predicate vector pred designates which vector elements should be processed by the function. For example, assuming that vectors include a defined number of elements, a predicate vector may contain the same defined number of bits, each bit corresponding to a respective element. Each bit may serve as a Boolean predicate that determines whether its corresponding vector element should be processed (e.g., "yes" if the predicate bit is "1" and "no" if it is "0," or vice versa).
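As a rough illustration of how a predicate vector gates processing, the following C++ sketch models a vector variant of bar(). It is not the patent's implementation: the vector width of 4, the type names IntVector and PredVector, and the function name bar_vector are assumptions, and a single structure reference stands in for the vector of pointers on the assumption that every element refers to the same structure.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;              // assumed vector width
using IntVector = std::array<int, kLanes>;
using PredVector = std::array<bool, kLanes>;   // one Boolean predicate per element

struct myStruct {
    int a;
    int b;
    int c;
    int* ptr;
};

// Sketch of a predicated vector variant of bar(). Elements whose predicate
// is false are skipped entirely: no memory is written for them and their
// result element is left at zero.
IntVector bar_vector(myStruct& p, const IntVector& j, const PredVector& pred) {
    IntVector result{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        if (!pred[lane]) {
            continue;                           // inactive element: do nothing
        }
        p.ptr[p.b + j[lane]] = 0;               // same store as the scalar bar()
        result[lane] = (p.b > j[lane]);         // same comparison as the scalar bar()
    }
    return result;
}
```

Leaving inactive elements untouched is one possible convention; the sketch uses a "1 means process" polarity, which the text notes could equally be reversed.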
Predicates allow the calling function to make conditional function calls, and to handle the tail of a loop that does not terminate on a vector-length boundary. It is noted that other embodiments may use different types of predicate formats, such as non-Boolean predicates.

Also in this example, vector p is a vector of pointers to structures, although here all of the pointers refer to the same instance. Vector j is a vector of simple integers. The compiler can infer this type information from the scalar function declaration.

One possible vector variant of function bar() computes p.b+j for each element of the input vectors and writes to the corresponding array indexes of p.ptr. It also returns a vector of results based on the comparison of p.b and j. In this particular example, the compiler vectorizes the function vertically. That is, because bar() contains no loop, there are no loop iterations to be transformed into vector elements, as would be the case in horizontal vectorization. Instead, the vectorized version of bar() may operate concurrently on different elements of the vector inputs.

During compilation of foo(), the compiler may read the dependency information for function bar(), which may not necessarily reside in the same source file, and determine that the called function bar() has no dependency on g.a, even though the calling function passes it a pointer to the structure g. Because it has this information, compiler 200 can horizontally vectorize the loop in function foo(). Furthermore, compiler 200 can make a single function call to the vector variant of bar() for each vector processed, rather than calling the scalar variant in every iteration of the loop. Finally, the compiler may create a vector variant of foo() having a vector interface. In this particular case, vertical vectorization may not be applicable, because the dependencies cannot be analyzed over the full range of x.
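The horizontally vectorized loop and its predicated tail can be sketched in C++ as follows. This is an illustrative model under stated assumptions rather than compiler output: the vector width of 16 (chosen so that 200 iterations leave a tail of 8), the names foo_vectorized and bar_vector, and the array-based vector types are all hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 16;             // assumed width; 200 % 16 leaves a tail of 8
using IntVector = std::array<int, kLanes>;
using PredVector = std::array<bool, kLanes>;

struct myStruct { int a; int b; int c; int* ptr; };

// Compact stand-in for a predicated vector variant of bar().
IntVector bar_vector(myStruct& p, const IntVector& j, const PredVector& pred) {
    IntVector result{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        if (!pred[lane]) continue;
        p.ptr[p.b + j[lane]] = 0;
        result[lane] = (p.b > j[lane]);
    }
    return result;
}

// Horizontally vectorized form of the loop in foo(): consecutive iterations
// are mapped to vector elements, and bar_vector() is called once per group
// of kLanes iterations instead of once per iteration.
void foo_vectorized(myStruct& g, int i) {
    const int end = i + 200;
    for (int base = i; base < end; base += static_cast<int>(kLanes)) {
        IntVector x{};
        PredVector pred{};
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            x[lane] = base + static_cast<int>(lane);
            pred[lane] = x[lane] < end;         // masks off elements past the loop bound
        }
        const IntVector taken = bar_vector(g, x, pred);
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            if (pred[lane] && taken[lane]) ++g.a;
        }
    }
}
```

The transformation is only valid because bar() does not depend on g.a, which is exactly the fact the caller obtains from the exported dependency information; the sketch also assumes the buffer behind g.ptr does not overlap g itself.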
Horizontal vectorization of the loop may be applied, and the horizontally vectorized loop is contained within another loop that iterates over the vector elements passed to the vector variant of function foo(). Under these assumptions, function foo() might export the following dependencies:

    void foo(int j)
    {
        readwrite public g.a;
        read public g.b;
        read public g.ptr;
        write public g.ptr[@];
    };

(The @ symbol represents unknown addressing.) Because function bar() exports the dependency "write p.ptr[p.b+j]", compiler 200 can tell that the structure member ptr[] is written to as a function of x. Compiler 200 may therefore report to callers of foo() that the index written to is unknown, because that index cannot be determined by callers of foo().

Additional Implementation Techniques

This section describes non-limiting compiler techniques that may be used to implement non-leaf vectorization and full function vectorization. The following description is based on Macroscalar compiler technology, but a person of ordinary skill in the art will recognize in light of this disclosure that other compiler technologies may be used.

The previous example illustrated that addressing can include mathematical expressions. This is generally true as long as the expression does not involve a function call and contains only terms that are visible to the calling function. This can include indirect addressing, such as when look-up tables are used to calculate indexes into other arrays. Indirect addressing is one situation in which configuring the compiler and linker to export static arrays as public can help vectorize more loops. Consider the following example:

    int foo(int i)
    {
        static int A[100] = {...};
        return(B[A[i]]);
    }

    void bar(void)
    {
        for (x=0; x<100; ++x)
        {
            t = B[x];
            B[t] = foo(x);
        }
    }

The dependencies generated for foo() may differ depending on whether the compiler and linker are configured to export static symbols publicly. In the examples below, the first dependency file expresses private static variables and the second dependency file expresses public static variables:

    int foo(int i)
    {
        read private A[i];
        read public B[@];
    };

    int foo(int i)
    {
        static int A[100];
        read public A[i];
        read public B[A[i]];
    };

It is noted that the type declaration of A may be necessary in the dependency file when A is exported publicly. When static variables are private, the addressing of B[] is unknown, because it cannot be determined from outside the function.
Because hazard checking is then not possible, vectorization of the loop in bar() may not be performed. When the tools are configured to export static variables publicly, however, the compiler can emit instructions that read the contents of A[] and check for hazards between B[A[x]] and B[x], thus enabling vectorization of the loop.

Naturally, when static variables are exported publicly and addressed externally, the opportunity for name conflicts arises. To help avoid such conflicts, static variables may be name-mangled with the function and file in which they are declared.

Some hazards involve memory operations that occur conditionally, or involve addressing that may differ based on a conditional calculation. To support the vectorization of loops that call functions involving conditional dependencies, a mechanism may be provided to express how a condition affects the dependencies. For example, consider the following code:

    if (A[x] < c) d = B[x];

This code may be expressed in the dependency database as:

    read public A[x];
    read public c;
    A[x] < c ? read public B[x];
    A[x] < c ? write public d;

Conditional expressions may also appear in the calculation of an address. For example, consider the following code:

    if (A[x] < c) d = B[x]; else e = B[x+c];

This code may be expressed in the dependency database as:

    read public A[x];
    read public c;
    A[x] < c ? write public d : write public e;
    A[x] < c ? read public B[x] : read public B[x+c];

Alternatively, the latter conditional expression above may be expressed as:

    read public B[ A[x] < c ? x : x+c ];

In some cases, unknowns may creep into a dependency expression. One illustrative example of such a case may be:

    A[x] < c ? read public B[x] : read public B[@];

This expression informs the compiler of a specific dependency on B if the condition is true, and of an unknown dependency on B if the condition is false.
Unknowns that creep into the conditional expression itself may give rise to unconditional dependencies that behave as if the condition were both true and false. For example:

    A[x] < B[@] ? read public f : read public g;

may be expressed as:

    read public f;
    read public g;

and:

    read public A[ x > @ ? x : x+y];

may be expressed as:

    read public A[x];
    read public A[x+y];

Because calling functions are generally unable to evaluate unknown conditions, they may make the conservative assumption that both possible indexes into A[] are accessed.

In some implementations, circular dependencies may also be expressed in the dependency database. For example, consider the following function:

    if (A[x] > b) b = A[x]

In one implementation, this function may be expressed as:

    read public A[x];
    read public b;
    A[x] > b ? write public b;

Where a pointer or reference is passed to a function (also referred to as "passing by reference"), it is possible for the function to modify its calling parameters. This differs from the modification of parameters passed by value because, for example, modifications of parameters passed by reference can affect the operation of the calling function. Modifications of parameters passed by reference may be recorded in the same manner as modifications of static and global storage. Modifications of parameters passed by value may be treated as modifications of local automatic storage. In some instances, such modifications may not be recorded, because modifications of parameters passed by value are invisible to the calling function.

In some implementations, functions that meet a set of criteria may be called speculatively in cases where software speculation would be necessary to vectorize the calling loop. Accordingly, a speculation-safe indicator may be expressed in the dependency file, and may serve as an indication that the corresponding code can safely be called speculatively.
In one non-limiting example, vector functions that are capable of being called speculatively may fall into one of two categories: type A and type B. Type-A functions may be vector functions having the normal vector interface described herein. For example, a type-A function may be called speculatively with no harmful side effects if it meets the following criteria. First, the function accesses no memory other than local automatic non-array storage. Second, the function does not call any function that is not also a type-A function. Examples of type-A functions may include transcendental functions or other iterative convergence algorithms.

Type-B functions may return, in addition to any return value specified by the source code, a predicate vector indicating which elements were processed. In one implementation, the criteria for speculatively calling a type-B function may be as follows. First, any read from non-local storage or local array storage uses a first-faulting read instruction. Second, the function does not write to non-local storage or to static local storage. Third, the function does not call any function that is neither a type-A function nor a type-B function.

Calling a type-A function from a loop may be similar to calling a non-speculative function. Typically, no special action is necessary on the part of the calling loop when a type-A function is called speculatively. Calling a type-B function, however, may require the calling loop to check the returned vector in order to determine which elements were processed, and to adjust the behavior of the calling loop in response.

A compiler such as compiler 200 may choose to have all callers of a type-B vector function adjust their behavior to accommodate the number of elements that were actually processed, regardless of whether software speculation is used in the calling loop.
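The caller-side adjustment for a type-B function can be sketched in C++ as follows. The sketch is hypothetical throughout: the names TypeBResult, lookup_typeB, and processedCount are invented for illustration, the vector width of 4 is an assumption, and a software bounds check merely imitates the effect of a first-faulting read rather than modeling any real instruction.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;  // assumed vector width

// Result of a hypothetical type-B function: the ordinary return values plus
// a predicate recording which elements were actually processed.
struct TypeBResult {
    std::array<int, kLanes> values{};
    std::array<bool, kLanes> processed{};
};

// Hypothetical type-B function: reads table[index] for each element and
// stops at the first element whose read would fall outside the table,
// leaving that element and all later ones unprocessed.
TypeBResult lookup_typeB(const std::vector<int>& table,
                         const std::array<std::size_t, kLanes>& index) {
    TypeBResult r;
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        if (index[lane] >= table.size()) {
            break;                              // stand-in for a suppressed fault
        }
        r.values[lane] = table[index[lane]];
        r.processed[lane] = true;
    }
    return r;
}

// Caller-side check: count the leading processed elements so the calling
// loop can advance by that many iterations rather than by a full vector.
std::size_t processedCount(const TypeBResult& r) {
    std::size_t n = 0;
    while (n < kLanes && r.processed[n]) {
        ++n;
    }
    return n;
}
```

A calling loop would use processedCount() to decide where the next vector of iterations begins, which is the behavioral adjustment described above.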
As an alternative to having every caller adjust its behavior, compiler 200 may create two vector functions for each type-B function: one speculative and one non-speculative. The criteria for type-B loops can generally be designed to ensure that the loops that qualify are few and small, so that the code-size impact of this approach is negligible.

Type-A and type-B vector functions may be identified by their declarations in the dependency database, as shown below. In one implementation, the absence of a designator implies that the function may not be called speculatively.

    int func1(int a) : A
    {
        read public b;   // local-static
        write public c;  // local-static
    };

    int func2(int a) : B
    {
        read public d;   // non-local
    };

For vectorizing compilers, aliasing can sometimes be a problem. Although the Macroscalar architecture addresses the problem through run-time alias analysis, that approach carries overhead. Overhead in Macroscalar programs contributes to the serial component in Amdahl's law, which can limit the benefits of wider vectors. Moreover, aliasing with external or static variables can affect behavior across function calls.
Accordingly, in one implementation, compile-time alias analysis is performed and aliasing indicators are exported to the dependency file.

For example, one approach may be to divide aliasing events into two categories, such as inbound aliasing and outbound aliasing. From the perspective of the called function, inbound aliasing may refer to addresses that come into the function, such as those passed in as parameters, read from external variables, or calculated by the function by taking the address of an external variable. Outbound aliasing, meanwhile, may refer to pointers that the function puts out. These may be return values, i.e., values that the function writes into external variables or de-referenced pointers.

Further, at least two types of aliasing may be tracked. "Copies aliasing" may indicate that a pointer may be a copy of another pointer, and may therefore alias anything that the other pointer can alias. "Points aliasing" may indicate that a pointer is likely to affect another variable. Alias information in the dependency file is an affirmative expression that an alias may exist. It need not be used, for example, when the compiler simply cannot tell whether two pointers reference the same memory because of a lack of information.

The declaration of aliasing for variables may be similar to the declaration of aliasing for return values. For example, consider the following function:

    static int s;
    static void *ptr1, *ptr2;
    static void *A[1000];

    void foo(int x, int y)
    {
        A[x] = (void*) s;
        A[y] = (void*) &s;
        ptr1 = &A[s];
        ptr2 = A[s];
    }

In one implementation, this function may express the following dependencies:

    void foo(int x, int y)
    {
        read public s;
        write public A[x] copies s;
        write public A[y] points s;
        write public ptr1 points A[s];
        read public A[s];
        write public ptr2 copies A[s];
    };

For clarity, the foregoing distinguishes between "points" and "copies," although it may be possible to combine these two concepts in an alternate syntax. As with other dependency information, aliasing information typically propagates upward through the chain of calling functions.

Values returned by a function may also result in aliasing, for example through the return value itself or through information returned by modifying variables passed by reference. These, too, may be tracked in the dependency file. For example, consider the following function:

    static float gVar;

    int *foo(float *ptr1, float **ptr2)
    {
        *ptr2 = &gVar;
        return((int*)ptr1);
    }

In one implementation, this function may export the following dependencies:

    int *foo(float *ptr1, float **ptr2)
    {
        write *ptr2 points gVar;
        return copies ptr1;
    };

The dependency declaration may inform the calling loop that the pointer returned by foo() may be a copy of the pointer that was passed in. This allows the calling loop to take measures to ensure correct operation of the loop regardless of the aliasing that occurs. Furthermore, this knowledge may enable the compiler to better leverage ANSI aliasing rules when faced with code that is not ANSI-C compliant.

As another consideration, the casting of pointers may affect address calculations. For example, consider the following function:

    void ZeroInt(char *ptr, int x)
    {
        *((int*)ptr + x) = 0;
    }

In one implementation, this function may export the following dependencies:

    void ZeroInt(char *ptr, int x)
    {
        write *((int*)ptr+x);
    };
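The "copies" and "points" declarations exported for the pointer-returning foo() above can be checked directly against the function's behavior. The following C++ sketch reproduces that function and adds one hypothetical helper, sameAddress, which is not part of the source and exists only to make the aliasing relationships observable.

```cpp
static float gVar;

// The pointer-returning foo() from the text, unchanged apart from formatting.
// The C-style cast mirrors the original; the result is never dereferenced here.
int* foo(float* ptr1, float** ptr2) {
    *ptr2 = &gVar;            // "write *ptr2 points gVar"
    return (int*)ptr1;        // "return copies ptr1"
}

// Hypothetical helper: true when two pointers refer to the same address.
bool sameAddress(const void* a, const void* b) {
    return a == b;
}
```

A caller that has read the exported declarations knows in advance that the returned pointer may alias the first argument and that the pointer written through the second argument refers to gVar, so it can guard or serialize accesses accordingly instead of assuming independence.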

In that case, the compiler may treat these functions as scalar functions with unknown dependencies, reflecting the uncertainty about their dependencies.

In one implementation, a versioning scheme allows dependencies to be expressed using best practices at any point in time. For example, one embodiment may permit backward compatibility with dependency files generated by older compilers, while another embodiment may permit bidirectional compatibility, in which older compilers are also able to read files generated by newer compilers. Where backward compatibility is the only requirement, a version designator in the dependency file is used to inform older compilers that a given file is unreadable and should be ignored.

Bidirectional compatibility may be implemented as follows. Assume, for example, that compiler version 1 does not support calculations in array indices, but compiler version 2 does. A write to B[x+y] may be expressed by a version 1 compiler as:

    #1 int foo(int x, int y)
    {
        write public B[@];
    };

A version 2 compiler, on the other hand, may additionally export the same function using the version 2 syntax:

    #2 int foo(int x, int y)
    {
        write public B[x+y];
    };

With this approach, not only can a version 2 compiler read version 1 files, but version 2 declarations can also be allowed to override version 1 declarations. A version 1 compiler would know to ignore any declaration with a version greater than 1, leaving it with as much dependency information as it is capable of understanding. This is a significant capability as compiler technology matures.

In general, if developers are required to make changes to software in order to enable vectorization, relatively little code is likely to become vectorized. To address this problem, the techniques described herein provide the ability to perform large-scale vectorization without requiring developers to modify their source code.
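The bidirectional versioning scheme described above can be sketched as a small selection routine. This is an illustration under stated assumptions rather than the format used by any actual compiler: struct dep_decl and select_decl are hypothetical names, and the declaration text is held as an opaque string instead of being parsed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one versioned dependency declaration. */
struct dep_decl {
    int version;          /* the "#1" or "#2" tag in the dependency file */
    const char *function; /* name of the function the declaration describes */
    const char *text;     /* declaration body, kept opaque in this sketch */
};

/* Returns the declaration that a compiler understanding versions up to
 * max_version should use for the named function: the highest-versioned
 * declaration it can read, or NULL if there is none. */
const struct dep_decl *select_decl(const struct dep_decl *decls, size_t count,
                                   const char *function, int max_version)
{
    const struct dep_decl *best = NULL;

    assert(decls != NULL || count == 0);
    assert(function != NULL);

    for (size_t i = 0; i < count; i++) {
        const struct dep_decl *d = &decls[i];
        if (strcmp(d->function, function) != 0)
            continue; /* describes a different function */
        if (d->version > max_version)
            continue; /* too new for this compiler: ignore it */
        if (best == NULL || d->version > best->version)
            best = d; /* a newer readable declaration overrides an older one */
    }
    return best;
}
```

Given both declarations of foo from the text, a reader with max_version 1 would fall back to the conservative "write public B[@];" form, while a reader with max_version 2 would pick "write public B[x+y];". Keeping the older declaration in the file is what lets the older compiler still obtain some dependency information.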
Although the above embodiments have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating a computer system operable to implement techniques for enabling generalized vectorization of software applications, in accordance with some embodiments.
FIG. 2 is a block diagram illustrating a compiler that, when executed by a computer system, can generate executable code, in accordance with some embodiments.
FIG. 3 shows a flow diagram illustrating a method of expressing dependencies in a dependency database, in accordance with some embodiments.
FIG. 4 shows a flow diagram illustrating a method of vectorizing a function, in accordance with some embodiments.
FIG. 5 shows a flow diagram illustrating a full-function vectorization method, in accordance with some embodiments.
FIG. 6 shows a flow diagram illustrating a method of using a vectorized function, in accordance with some embodiments.

[Main component symbol description]

100 Computer system
110a Processor
110b Processor
110n Processor
120 System memory
130 I/O interface
140 Network interface
150 Storage interface
155 Storage device
200 Compiler
210 Source code
220 Front end
230 Back end
240 Optimizer
250 Code generator
260 Scalar object code
270 Vectorized object code
280 Dependency database

Claims

VII. Patent application scope:

1. A method comprising: performing, by one or more computers: identifying a calling function, wherein the calling function includes a call to a called function; accessing a persistent dependency database associated with the called function, wherein the persistent dependency database indicates an expressed dependency of the called function on a data item, and wherein the expressed dependency indicates whether the called function only reads the data item, only writes the data item, or both reads and writes the data item; and generating, based at least in part on the expressed dependency, a determination of whether the calling function interacts with the called function.

2. The method of claim 1, wherein the call to the called function occurs within a loop of the calling function.

3. The method of claim 1, wherein the performing further comprises: determining, from the persistent dependency database, that a vector version of the called function exists; and, within the calling function, replacing a call to a scalar version of the called function with a call to the vector version of the called function.

4. The method of claim 1, wherein the performing further comprises: determining, based on one or more of the following, whether to vectorize at least a portion of the calling function: whether a variable is read or written by the called function; whether the variable is public to the calling function; or an addressing mode associated with the variable.

5. The method of claim 1, wherein the performing further comprises: compiling source code corresponding to a function; during the compiling, identifying an expressed dependency of the function on a data item, wherein the expressed dependency indicates whether the function only reads the data item, only writes the data item, or both reads and writes the data item; and storing an indication of the expressed dependency in the persistent dependency database.
6. The method of claim 5, wherein storing the indication of the expressed dependency in the persistent dependency database comprises, in addition to storing a name of the variable, storing an indication of one or more of the following: whether the variable is [...]; or an addressing mode associated with the variable.

7. The method of claim 5, wherein the performing further comprises: generating a vector version of the function, the vector version including a vector interface; and storing an indication of the vector interface in the persistent dependency database.

8. The method of claim 5, wherein the performing further comprises generating the persistent dependency database during the compiling.

9. The method of claim [...], wherein storing the indication comprises expressing one or more of the following: an addressing pattern associated with the data item in the function; a public or private designation associated with the data item in the function; a speculation-safe indicator associated with the function; or an aliased pointer associated with the data item.

10. The method of claim [...], wherein storing the indication comprises storing one or more of the following: an indication of whether the function reads or writes at a known offset within an object; an indication of whether the function reads or writes at a variable offset within an object; or an indication of whether the function reads or writes at an unknown offset within an object.

11. The method of claim 1 or claim 5, wherein the data item is not a parameter passed to the function through a programming interface of the function.

12. The method of claim 1, wherein the performing further comprises: vectorizing code within the calling function based at least in part on the determination.

13. The method of claim 12, wherein vectorizing the code within the calling function further comprises: vectorizing a loop within the calling function based at least in part on the determination.

14. The method of claim 12, wherein vectorizing the code within the calling function further comprises: modifying the call to refer to a vector version of the called function.

15. The method of claim 1, wherein the performing further comprises: determining, based at least in part on the determination of whether the calling function interacts with the called function, whether to vectorize at least a portion of the calling function; and, in response to determining to vectorize at least a portion of the calling function, generating vector code for performing vector operations.

16. The method of claim 15, wherein the called function is a function in pre-compiled code for which source code is not available [...], and wherein the determining nevertheless determines to vectorize at least a portion of the calling function.

17. The method of claim [...], wherein the calling function includes a non-leaf loop, the non-leaf loop including the call to the called function.

18. The method of claim 17, wherein the performing further comprises: vectorizing a first portion of the non-leaf loop; and serializing a second portion of the non-leaf loop.

19. A computer-readable storage medium storing program instructions that, in response to execution by a computer system, cause the computer system to perform operations implementing the method of any one of claims 1 to 18.

20. A system comprising: one or more memories that, during operation, store instructions; and one or more processors that, during operation, retrieve the instructions from the one or more memories and execute the instructions to cause the system to perform operations implementing the method of any one of claims 1 to 18.