TWI322953B

TWI322953B - Optical proximity correction on hardware or software platforms with graphical processing units

Info

Publication number: TWI322953B
Application number: TW095144746A
Authority: TW
Inventors: H Torunoglu Ilhami; Karakas Ahmet
Original assignee: Gauda Inc
Priority date: 2005-12-02
Filing date: 2006-12-01
Publication date: 2010-04-01
Also published as: WO2007120304A3; WO2007120304A2; TW200739377A

Description

IX. Description of the Invention

[Technical Field]

The present invention relates to electronic design automation, and in particular to an improved technique for performing optical proximity correction.
[Prior Art]

Manufacturers of integrated circuits strive to place ever smaller features on a given area of an integrated circuit chip. One challenge in fabricating such small features is the diffraction of the light used in photolithography. That is, the quality and fidelity of very-large-scale integrated (VLSI) circuit production depend on the wavelength of the light source and the size of the features to be printed.

Recent subwavelength lithography approaches use wavelengths larger than the minimum feature size being imaged (for example, light with a wavelength of 193 nm is used to produce features of 90, 65, or 45 nm). These approaches, however, require correction of the degradation and distortion that light diffraction causes in the final layout. That is, the mask used to produce the circuit layout includes structures that anticipate and at least partly correct the imperfections that arise when fabricating small features.

A computer simulation of the exposure and printing is run, and the degradation and distortion are computed for various additions, inclusions, and adjustments to the mask design. A mask design is then selected that improves the final structure. These methods, commonly called optical proximity correction (OPC), depend mainly on the optical system and the mask features, and can be computationally intensive. Although regions with densely packed features tend to be more prone to distortion (the "proximity" effect), OPC calculations are not limited to such regions and can also be applied advantageously to low-density regions of a circuit.

OPC typically requires the many features in a layout to be processed one or more times. Recent advances in semiconductor manufacturing allow billions of transistors (that is, billions of features) to be placed on a single chip.
The well-known "Moore's law" postulates that the number of transistors that can be placed on a single chip doubles every 12 to 24 months. Unfortunately, despite advances in central processing unit (CPU) speed and computing power, the gap between the computing power required for OPC calculations and the available CPU processing power keeps growing. In other words, the computing power needed to perform OPC calculations efficiently is growing faster than the CPU power available in reasonably priced engineering workstations.

To complicate matters further, the number of masks or layers to which OPC must be applied increases with each new semiconductor manufacturing node. Because features become smaller with each node while the illumination wavelength stays the same or decreases more slowly, the number of neighboring features that affect the fidelity of each feature also increases. As a result, the computer processing power required to perform OPC on new chip designs has been growing by a factor of approximately 3 or 4 or more for each successive manufacturing node.

At present, generating an optically corrected mask takes from several hours to several days, and the complexity of the process continues to grow. Because the features printed after OPC may still differ from the intended features, the effect of each feature on the function and performance of the chip must be corrected iteratively. A typical VLSI design process involves several iterations of mask generation, OPC processing, and interpretation of the results. These iterations can delay chip qualification and manufacturing by several months.

The constant time-to-market pressure on new chip designs has increased the importance of methods for estimating and shortening the OPC process at an early stage of design. Because running several OPC iterations at full-chip scale is computationally prohibitive, partial, model-based OPC methods are applied to limited patterns, and full-chip OPC is still required once the design is complete.

There is therefore a need in the art for improved systems and methods that shorten the time required to perform OPC, improve the accuracy of OPC methods, and can handle larger chip designs.

[Summary of the Invention]

The present invention relates mainly to the field of integrated circuit manufacturing, and in particular to the use of optical proximity correction (OPC) to improve the lithography used for electronic circuit designs. More particularly, the invention relates to performing OPC techniques on hardware or software platforms, or a combination thereof, that use specialized processing units.

The invention provides systems and methods for executing OPC algorithms on hardware or software platforms, or a combination thereof, that have specialized processing units.

In some embodiments of the invention, spatial-domain OPC calculations are executed on a hardware or software platform, or a combination thereof, that includes one or more specialized execution units. Examples of specialized execution units include central processing units (CPUs), graphics processing units (GPUs), physics processors, cell processors, digital signal processors (DSPs), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and the like. Parts of the OPC computation can be converted into mathematical operations on matrices and vectors, and GPUs are particularly well suited to such matrix and vector operations.

The GPU or GPUs can operate on the data until the result converges to the target model within a predetermined error bound. These operations include changing the shapes of the mask features and evaluating detailed models of the chemical and optical processes involved in exposing the photoresist layer.
The final data may be converted back to the original data format and output as a mask for printing the layout on a semiconductor device. The GPU is one example of a specialized processor, and the features of the invention taught here are not limited to GPUs. The invention can be used with any of the specialized processors mentioned above, with other substantially similar processors known to those of ordinary skill in the art, and with similar or related processors that may be developed in the future.

In one embodiment, the invention includes: a computing system having at least one central processing unit and at least one graphics processing unit; a user interface for interacting with the computer system; a computer-readable medium containing data that describes the size and placement of features to be formed on a photolithographic exposure mask used to manufacture a semiconductor device; a computer-readable medium containing optical proximity correction calculation procedures that operate on the data, wherein at least a portion of the procedures is executed using the graphics processing unit; and an output device for displaying the result of applying the optical proximity correction calculation procedures, executed using the graphics processing unit, to the data.
In one embodiment, the invention provides a method that includes: providing a system having at least one central processing unit and at least one graphics processing unit; dividing an optical proximity correction process into tasks according to the type of computation each requires; allocating the tasks of the optical proximity correction process to the central processing unit and the graphics processing unit; and delivering the output of the central processing unit and the graphics processing unit as a result of the optical proximity correction process.

In another embodiment, the invention includes: a computing system including a number of nodes, wherein each node includes at least one of a central processing unit and a graphics processing unit; an interface connecting the nodes; a user interface for interacting with the computer system; a computer-readable medium containing data that describes the size and placement of features to be formed on a photolithographic exposure mask used to manufacture a semiconductor device; and a computer-readable medium containing optical proximity correction calculation procedures that operate on the data, wherein at least a portion of the procedures is executed using the graphics processing unit of one of the nodes.

The interface may be at least one of a PCI Express bus, a front-side bus, Ethernet, the Internet, or another interface, serial or parallel, that supports data transfer of any kind. The computer-readable medium containing the feature data may be directly connected to one of the nodes, with portions of the data passed through the interface to at least one of the other nodes.
The direct connection may be of a different kind from the interface that connects the nodes; for example, it may be an IDE, SATA, or USB interface.

The computer-readable medium containing the optical proximity correction calculation procedures may be directly connected to one of the nodes, with at least a portion of the procedures executed using the graphics processing unit of a node other than the one to which the medium is directly connected. Alternatively, at least a portion of the procedures may be executed using the graphics processing unit of the same node to which the medium is directly connected.

The system may have a computer-readable medium containing optical proximity correction calculation procedures that divide the given layout information into a number of two-dimensional sub-regions, where the sub-regions overlap one another. A computer-readable medium may be provided with procedures that transfer the two-dimensional sub-regions of the given layout information to two or more nodes in the system. A computer-readable medium with procedures executing on two or more nodes may operate on the given layout information as divided into the two-dimensional sub-regions.

A computer-readable medium with optical proximity correction calculation procedures may combine the results that a first node and a second node produce for the two-dimensional sub-regions of the given layout information. The procedures may combine the results by removing the overlapping regions and joining the results together.
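The partitioning described above (overlapping two-dimensional sub-regions sent to several nodes, with the overlap removed when results are joined) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tile size, the halo width, the round-robin assignment, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open on the right and top


def split_with_halo(width: int, height: int, tile: int, halo: int) -> List[Tuple[Rect, Rect]]:
    """Return (core, padded) pairs. Cores tile the layout exactly;
    padded regions add an overlap (halo) so neighbours influence each tile."""
    tiles = []
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            core = (x0, y0, min(x0 + tile, width), min(y0 + tile, height))
            padded = (max(core[0] - halo, 0), max(core[1] - halo, 0),
                      min(core[2] + halo, width), min(core[3] + halo, height))
            tiles.append((core, padded))
    return tiles


def assign_round_robin(tile_count: int, node_count: int) -> Dict[int, int]:
    """Hypothetical scheduling policy: tile index -> node index."""
    return {i: i % node_count for i in range(tile_count)}


def area(r: Rect) -> int:
    return (r[2] - r[0]) * (r[3] - r[1])


tiles = split_with_halo(width=1000, height=600, tile=256, halo=32)
owners = assign_round_robin(len(tiles), node_count=4)
# Keeping only each tile's core after correction removes the overlap,
# so the joined cores cover the layout exactly once.
core_area = sum(area(core) for core, _ in tiles)
```

Each node would correct its padded region and return only the core, which is one simple way to realize "removing the overlapping regions and joining the results together".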
Other objects, features, and advantages of the invention will become apparent from the following detailed description and the accompanying drawings, in which like reference designations represent like features throughout the figures.

[Embodiments]

OPC Method

The invention can be used to improve the lithography used in manufacturing semiconductor devices. A typical device is created by forming layered structures on a substrate. The desired layout in the photoresist is produced by exposing the photoresist through a mask. The exposed photoresist then acts like a physical mask that, through one or more subsequent etching steps, transfers the layout of the photoresist to the underlying material.

Distortion and degradation in the final structure result from a combination of factors, including variations in the light source, optical proximity effects, non-uniformity of the development process, and non-uniformity of the etching process. The total energy deposited in a given volume of photoresist during the exposure or printing step determines whether that volume remains or is removed during the subsequent development step. The image features printed for electronic devices can be much smaller than the wavelength of the light used to print them (for example, light with a wavelength of 193 nm is used to produce features of 90, 65, or 45 nm or below). The distortion can cause lines to become thinner or thicker, to break, or to show similar errors.

Figures 1A and 1B show a typical example. The features on the exposure mask are made with the same size and shape as the desired features on the chip (Figure 1A). Because of the distortions described above, the resulting layout does not faithfully reproduce the layout on the exposure mask, as shown in Figure 1B. In this particular example, the distortion leaves the final layout shorter, thinner, and poorly controlled.

Various OPC methods can be used to improve the fidelity of the layout transferred to the target material. The layout formed on the exposure mask can be altered to compensate for systematic distortions. One such method uses serifs to augment the layout in regions where distortion makes features shorter, thinner, and the like.
A serif is a small feature that can be placed at a corner or vertex of a main feature. A serif can be "positive", adding to the area of the feature, or "negative", subtracting from the area of the feature.

Figures 2A and 2B show a typical example in which positive and negative serifs are used on the exposure mask (Figure 2A) to alter a feature. The resulting structure on the chip, shown in Figure 2B, is the result of a successful application of OPC techniques.

The purpose of an OPC procedure is to compute and improve each feature on the mask so as to obtain the best result. Clearly, when a chip has billions of transistors, each with many small structures, the computational demand of OPC can be very large.

The OPC methods in common use today fall into two main classes: frequency-domain OPC calculations and spatial-domain OPC calculations.

Frequency-domain (FD) OPC calculations use Fourier transforms to compute the deformation of the features on the exposure mask needed to obtain the desired structure on the chip. This method typically has the following steps:

FD-1: The layout (for example, a digitized pixel image) is transformed into the two-dimensional frequency domain.

FD-2: The low-pass filtering effects of the process (for example, of the optical system and the etching characteristics) are applied.

FD-3: An inverse filtering process is applied to compensate for the low-pass filtering effect of the previous step.

FD-4: A two-dimensional inverse transform converts the results of these calculations from the frequency domain back to the spatial domain.

The accuracy of a frequency-domain OPC calculation increases with the number of points used. Enough points must be included to capture all the local structures that affect the distortion of the feature being optimized. However, each of the neighboring local structures also needs to be optimized. Ideally, the whole chip would be considered in a single calculation, but this too would dramatically increase the computational requirements.
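Steps FD-1 to FD-4 can be illustrated with a small one-dimensional sketch. Everything specific here is an assumption made for illustration: the Gaussian shape of the low-pass response, the regularization constant `eps` in the inverse filter, the use of a naive DFT in place of a fast transform, and the reduction from two dimensions to one. The pre-compensated mask it produces is a continuous-valued signal, not a manufacturable binary mask.

```python
import cmath
import math
from typing import List


def dft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (stands in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def lowpass(n: int, cutoff: float) -> List[float]:
    """Assumed Gaussian low-pass response of the process (symmetric in frequency)."""
    return [math.exp(-(min(k, n - k) / cutoff) ** 2) for k in range(n)]


def apply_filter(signal: List[float], response: List[float]) -> List[float]:
    """Transform, multiply by the response, and transform back."""
    spectrum = dft([complex(v) for v in signal])
    filtered = [s * h for s, h in zip(spectrum, response)]
    return [v.real for v in dft(filtered, inverse=True)]


def rms_error(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


n = 64
target = [1.0 if 28 <= i < 36 else 0.0 for i in range(n)]  # desired 1-D feature
h = lowpass(n, cutoff=8.0)                                   # FD-2: process response
eps = 1e-3                                                   # assumed regularization
inverse_h = [hk / (hk * hk + eps) for hk in h]               # FD-3: regularized inverse filter
corrected_mask = apply_filter(target, inverse_h)             # FD-1, FD-3, FD-4
printed_plain = apply_filter(target, h)                      # uncorrected mask through the process
printed_corrected = apply_filter(corrected_mask, h)          # corrected mask through the process
err_plain = rms_error(printed_plain, target)
err_corrected = rms_error(printed_corrected, target)
```

Under these assumptions the corrected mask prints closer to the target than the uncorrected one, which is the effect the inverse-filtering step is meant to achieve.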
The FD method is therefore computationally demanding.

Spatial-domain (SD) OPC calculations are based on the spatial properties of the features. The edges and vertices of the features on the exposure mask, such as polygons or rectangles, are adjusted to minimize the difference between the structure actually obtained with the corrected exposure mask and the desired structure. Control points, or evaluation points, are first placed on the edges and vertices according to the current design rules. An example of the steps is as follows:

SD-1: For each edge, or segment of an edge, an edge placement error (EPE) is determined from a model of the optical system. The calculation is performed by convolving the model with the exposure mask region surrounding each edge.

SD-2: Depending on the edge placement error determined, an edge segment may be "pushed" or "pulled" in an attempt to reduce the error.

SD-3: The simulation and adjustment of each edge segment are repeated several times until the edge placement errors of all features on the chip are within an acceptable range.

Spatial-domain OPC methods are preferred over frequency-domain OPC methods. Optical effects are usually localized to the features in the immediate vicinity of the region under consideration, so the size of a given calculation can be small. However, the same calculation must be performed for all feature groups on the chip.

At present, the typical solution to the OPC computation problem is to use large systems of multi-CPU computers. This increases the cost of the system and, in turn, the cost of the chip. Such a hardware architecture is called a "homogeneous architecture", meaning that the different computational tasks are all executed by equivalent processors.

An alternative architecture consists of a cooperative collection of specialized processing units, each suited to a particular type of computation; this is called a "heterogeneous architecture". The processors in such an architecture perform well on particular types of problems that need high processing power with relatively few memory access steps. Other parts of the problem are handled by other processors.
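The spatial-domain loop of steps SD-1 to SD-3 can be sketched for a single one-dimensional line. The optical model here is an assumption chosen for brevity (a Gaussian blur with a fixed resist threshold, evaluated analytically with `math.erf`), as are the constants `SIGMA` and `THRESHOLD` and the function names; a production model would convolve a calibrated kernel with the mask region around each edge.

```python
import math

SIGMA = 10.0      # assumed blur radius of the optical model
THRESHOLD = 0.6   # assumed resist threshold (above 0.5, so lines print narrow)


def intensity(x: float, left: float, right: float) -> float:
    """Image intensity of a 1-D mask line [left, right] blurred by a Gaussian."""
    s = SIGMA * math.sqrt(2.0)
    return 0.5 * (math.erf((x - left) / s) - math.erf((x - right) / s))


def printed_edge(left: float, right: float, outside: float, inside: float) -> float:
    """Bisect between a dark point and a bright point for the threshold crossing."""
    lo, hi = outside, inside  # intensity(lo) < THRESHOLD <= intensity(hi)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if intensity(mid, left, right) < THRESHOLD:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def correct(target_left: float, target_right: float, tol: float = 1e-3, max_iter: int = 50):
    """Move the mask edges until the printed edges match the target edges."""
    left, right = target_left, target_right  # start from the drawn layout
    center = 0.5 * (target_left + target_right)
    for iteration in range(max_iter):
        # SD-1: edge placement error of each edge
        epe_left = printed_edge(left, right, left - 10 * SIGMA, center) - target_left
        epe_right = printed_edge(left, right, right + 10 * SIGMA, center) - target_right
        # SD-3: stop once every error is within the acceptable range
        if max(abs(epe_left), abs(epe_right)) < tol:
            return left, right, iteration
        # SD-2: push or pull each mask edge against its error
        left -= epe_left
        right -= epe_right
    return left, right, max_iter


mask_left, mask_right, iterations = correct(40.0, 80.0)
```

With these assumed constants the uncorrected line prints too narrow, so the loop widens the mask line until the printed edges fall on the target edges.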
Distributing the different parts of the OPC computation among specialized processors improves the results: in general, performance increases and cost falls.

GPUs are designed for stream processing. Data can be expressed as a stream, that is, an ordered sequence of data of the same type. Kernels are the functions, procedures, methods, algorithms, and the like that are applied to a stream. Kernels are highly efficient because they depend only on their inputs: the value computed for one element is independent of the other elements of the stream, which allows a high degree of parallelism for suitable problems.

GPUs usually have hardware blocks designed specifically for particular problems (for example, specific kernels may be implemented in hardware). Hardware blocks may, for instance, be designed to operate on various forms of vector or matrix values, or both. Graphics data, for example, is typically four-dimensional: the channel values of the red, green, and blue components of a pixel (referred to as RGB) and an opacity value (typically referred to as alpha, or A). GPUs have therefore been designed to process four-dimensional (RGBA) data very quickly and efficiently.

CPU-based approaches to improving OPC procedures usually use multi-CPU systems, as described above. Such approaches typically try to increase computational efficiency by dividing the computation into parallel parts at the task level. Because of their general-purpose design, however, CPUs cannot exploit the additional parallelism available at the instruction level.

OPC computation is inherently a graphics problem. In one embodiment of the invention, graphics data in the form of rectangles or polygons is sent by one or more CPUs to one or more GPUs. The GPUs can be programmed to implement one or more kernels that efficiently execute the steps of the OPC method described above.

In general, the following functions can be implemented with task-level parallelism:

(i) Configuration of the vertex shader or vertex processor for evaluation point selection (step SD-1).

(ii) Configuration of the vertex shader to modify the evaluation points and their positions (step SD-3).

(iii) Configuration of rasterization to determine the evaluation points according to a 2-D cost function (step SD-1).

(iv) Configuration of the pixel shader or fragment processor, or both, to perform the intensity calculation using fast kernel lookups or kernel calculations (step SD-2).

(v) Configuration of fragment tests, such as depth tests, for region queries and for marking edges and edge segments (step SD-2). Other common fragment tests that can be used include the scissor test, alpha test, stencil test, blending, dithering, logical operations, and the like.

In a GPU, the vertex shader or vertex processor is a programmable unit that operates on incoming vertex values and their associated data. Rasterization is the conversion of geometry and pixel data into fragments. The pixel shader or fragment processor is a programmable unit that operates on fragment values and their associated data. In depth testing, a depth buffer keeps track, for each pixel, of the distance between the viewpoint and the object occupying that pixel. If a given depth test passes, the incoming depth value replaces the value stored in the depth buffer.

In general, the following functions can be implemented with evaluation-point parallelism:

(i) Each pixel shader computes one evaluation point in parallel (step SD-2).

(ii) Four-dimensional pixel values and pixel operations are used efficiently for fast kernel calculation (step SD-2).

In general, the following functions can be implemented with instruction-level parallelism:

(i) The convolution table is mapped to a texture map or image map (step SD-2).

(ii) Textures are used so as to optimize the use of the texture cache (step SD-2).

A texture map or image map is a rectangular array of data (for example, color data, luminance data, color and alpha data, or the like). Texture interpolation is mathematical interpolation over texture map or image map data.
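The depth test described above, in which an incoming depth replaces the stored value only when the test passes, can be sketched in a few lines. The "less than" comparison and the function name are assumptions for illustration; real graphics hardware lets the comparison function be configured.

```python
from typing import List


def depth_test_write(depth_buffer: List[List[float]], x: int, y: int, incoming: float) -> bool:
    """Replace the stored depth only if the incoming fragment passes the test."""
    if incoming < depth_buffer[y][x]:  # assumed "nearer wins" comparison
        depth_buffer[y][x] = incoming
        return True
    return False


# A 4 x 2 buffer initialised to "infinitely far away".
depth_buffer = [[float("inf")] * 4 for _ in range(2)]
results = [depth_test_write(depth_buffer, 1, 0, d) for d in (5.0, 7.0, 3.0)]
```

Because the test keeps a running minimum per pixel, it can be read as a per-pixel selection operation of the kind that the region queries and edge marking mentioned above would rely on.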
In general, the following specific hardware functions can be used for search and region query:

(i) A depth processor to select evaluation points (step SD-1).

(ii) A single-instruction, multiple-data (SIMD) video processor to compute the error (step SD-3).

(iii) A multiple-instruction, multiple-data (MIMD) video processor to compute the error (step SD-3).

A depth processor is a programmable unit that operates on incoming fragment or pixel values and their associated data. A video processor is a processor that performs video decoding or encoding operations on video data; it may be of the single-instruction, multiple-data (SIMD) or multiple-instruction, multiple-data (MIMD) type.

A subset of the OPC calculations can therefore be mapped very efficiently onto typical GPU hardware and typical GPU programmable features. GPUs can thus share the computation with CPUs to handle the OPC problem more efficiently, giving higher throughput, lower cost, improved performance, and the like.

Figure 3 is a schematic diagram of a typical OPC method executed on a typical commercial GPU. This particular example uses an Nvidia GeForce® GPU, but the invention can be applied to any commercial GPU or similar device.

The different operations of an OPC flow are executed using a graphics processor 300. The steps of the OPC flow include geometry operations 309, rectangle fragmentation 310, intensity calculation 311, area search 312, and placement error or edge placement error (EPE) calculation 313.

The graphics processor may be a single integrated circuit or multiple integrated circuits.
For example, all of the GPU components shown in the figure (blocks 301, 302, 303, 304, 305, 306, 307, and 308) may reside in a single integrated circuit. Alternatively, any combination of the components may reside in one integrated circuit while the other components reside in one or more other integrated circuits. A single integrated circuit may also have one or more graphics processor cores.

The graphics processor 300 has one or more vertex processors 301, which are connected to a triangle setup block 302. A vertex processor is where vertex shaders are executed. The input to a vertex shader is vertex data, such as position, color, normal, and so on. In a vertex shader, code can be written for tasks such as: transforming vertex positions using the model-view and projection matrices; transforming normals and, if necessary, normalizing them; generating and transforming texture coordinates; computing lighting per vertex or values for per-pixel lighting; and computing color.

The triangle setup block is connected to a shader instruction dispatch block 303, which in turn is connected to one or more fragment processors 304.

The fragment processor is where fragment shaders are executed. This unit is responsible for operations such as: computing colors and texture coordinates per pixel; applying textures; computing fog; and computing normals if per-pixel lighting is required. The inputs to a fragment processor are typically the interpolated values computed in the previous stage of the pipeline, such as vertex positions, colors, normals, and so on.

The fragment processors are connected to a fragment crossbar 305, which is connected to a stencil buffer 306. The stencil buffer is connected to a number of memory partitions 307.

The graphics processor may have one or more video processors 308.
The image processor is connected to […]. Any combination of the components shown in the graphics processor may be included in a single integrated circuit. For example, a graphics processing unit integrated circuit may include a vertex processor unit and a fragment processor unit. The graphics processing unit integrated circuit may include a vertex shader unit and a stencil buffer unit.

As shown in Figure 3, the geometric operations and rectangle fragmentation (step SD-1) can be mapped onto the GPU's vertex processor hardware block. The intensity calculation, area search, and edge placement error (EPE) calculation steps (steps SD-2 through SD-4) can be mapped onto the GPU's fragment processor and depth filter hardware blocks. The EPE calculation may be regarded simply as an error calculation, particularly in an embodiment of the invention where edges are not used.

In other implementations, the vertex processor, the fragment processor, or a combination of the two may be used. Any of the calculations of the procedure may also be executed by the CPU. The intensity calculation, area search, and EPE calculation can be performed by the GPU's processors. The area search can be performed on the GPU. In an embodiment, the EPE calculation can be performed using an image processor.

In an embodiment, the X and Y coordinates of opposite corners of a two-dimensional irregular quadrilateral of the layout data are represented as four-channel data in the graphics processing unit. In particular, the data is represented in an RGBA color space format in the graphics processing unit. For example: X1 is R, Y1 is G, X2 is B, and Y2 is A. The GPU operates on data stored in this four-dimensional format.

In another embodiment, the X and Y coordinates of one corner, a change in X, and a change in Y of a two-dimensional irregular quadrilateral of the data are represented in the RGBA color space format of the graphics processing unit. For example: X1 is R, Y1 is G, ΔX is B, and ΔY is A. The GPU operates on data stored in this four-dimensional format.
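The two four-channel layouts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the RGBA tuple type and the function names are hypothetical, the two stored points are treated as opposite corners of the shape, and plain Python tuples stand in for what would be four-channel texture elements on an actual GPU.

```python
from typing import NamedTuple


class RGBA(NamedTuple):
    """One four-channel element; the field names mirror the color channels."""
    r: float
    g: float
    b: float
    a: float


def pack_corners(x1, y1, x2, y2):
    """First layout: X1 is R, Y1 is G, X2 is B, Y2 is A."""
    return RGBA(x1, y1, x2, y2)


def pack_corner_delta(x1, y1, x2, y2):
    """Second layout: X1 is R, Y1 is G, delta-X is B, delta-Y is A."""
    return RGBA(x1, y1, x2 - x1, y2 - y1)


def unpack_corner_delta(texel):
    """Recover both corners from the corner-plus-delta layout."""
    return (texel.r, texel.g, texel.r + texel.b, texel.g + texel.a)


corners = pack_corners(10, 20, 50, 80)
delta = pack_corner_delta(10, 20, 50, 80)
restored = unpack_corner_delta(delta)
```

Both layouts carry the same four numbers per shape; the corner-plus-delta form simply stores the extent directly instead of the second corner.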
In another embodiment, the X and Y coordinates of one corner, an angle, and a scalar dimension of a two-dimensional irregular quadrilateral of the data are represented in the RGBA color space format of the graphics processing unit. For example: X1 is R, Y1 is G, the angle is B, and r is A. The GPU operates on data stored in this four-dimensional format. These representations of the OPC data are only examples of some of the representations that can be used. In other embodiments of the present invention, other representation schemes may be used.

In an embodiment, a system of the present invention comprises: a computing system having at least one central processing unit and at least one graphics processing unit; a user interface for interacting with the computer system; a computer-readable medium including data describing the size and arrangement of features to be formed on a photolithographic exposure mask used to fabricate semiconductor devices; a computer-readable medium including optical proximity correction calculation procedures for acting on the data, wherein at least a portion of the optical proximity correction calculation procedures is executed using the graphics processing unit; and an output device for the results of applying the optical proximity correction calculation procedures, executed using the graphics processing unit, to the data.

The graphics processing unit can include a vertex processor unit and a fragment processor unit. The graphics processing unit can include a vertex shader unit and a stencil buffer unit.

In an embodiment, there may be multiple CPUs and GPUs that perform the OPC calculations. A system of the present invention can include a plurality of nodes connected by a high-speed interface or connection between them. This interface may include, for example, a PCI Express bus, an AGP bus, a front-side bus, Ethernet, or the Internet, or a combination thereof. Each node has one or more CPUs, one or more GPUs, or a combination of CPUs and GPUs. Each node may or may not be equipped with a storage area, such as a hard disk, a floppy disk, or a CD. The OPC software of the present invention can be executed on any of the machines. There can be a main program that executes on any subset of the nodes of the system. The main program can also execute on only one of the nodes.
The data on which the OPC procedures of the invention act can reside on any node of the system. The main program coordinates the operation of the computing system. The OPC procedures or the data, or both, can be moved to any other node of the system. The results can then be passed back to the main program, where the individual pieces of data are combined.

The graphics processing unit and the optical proximity correction calculation procedures can include at least one of the following:
- a procedure that assigns vertex shaders to select evaluation points;
- a procedure that assigns vertex shaders to modify evaluation points and their positions;
- a procedure that assigns pixel and vertex shaders to intensity calculation, including spatial or frequency domain approaches, to calculate intensity or electromagnetic fields, or a combination, in air or in other media including resist materials and on a chip surface;
- a procedure that assigns pixel shaders to intensity or electromagnetic field calculation, or a combination, for example in the frequency domain using a fast Fourier transform and inverse Fourier transform or any other transform to the same effect, in air or in the resist material and at other relevant locations on the chip surface;
- a procedure that assigns pixel shaders to intensity calculation using fast kernel lookup or fast kernel calculation;
- a procedure that assigns pixel shaders to intensity calculation using light lookup or light calculation;
- a procedure that assigns depth filters to area query and to the tagging of edges and edge fragments;
- a procedure for pixel shader computation at evaluation points;
- a procedure that maps specification tables as texture maps;
- a procedure that uses texture interpolation to optimize texture cache usage;
- a procedure that uses a depth processor for evaluation point selection; and
- a procedure that uses a single instruction multiple data (SIMD) image processor for error calculation.

In an embodiment, a procedure includes splitting the layout into a number of rectangles […]; partial or complete region information can be provided to each node. Figure 5 shows the partitioning of the layout data, with each partition having an overlap region with adjacent partitions. In a technique of the invention, instead of operating on the whole layout at once, the layout is separated or partitioned into a number of sub-regions. Each sub-region can have any shape, such as a square, an irregular quadrilateral, any polygon, or other.

According to a specific method, the data in each two-dimensional sub-region is operated on by one or more computing nodes of the system. As discussed above, each node can have a CPU or a GPU, or both. In a specific implementation, each node has a GPU that performs the OPC calculation on one sub-region of the layout. The calculation can be performed on several sub-regions in parallel, which increases calculation speed. In general, the more nodes there are, the faster the calculation runs, because more calculations are performed in parallel. After a node completes its calculation, the output results are passed back to a calling node (e.g., a node executing the main program) or to another specified location. One or more computing nodes then combine the output results so that the individual partitions together provide the OPC calculation output for the complete layout data.

In a specific implementation of the invention, each sub-region is transferred to a node together with some overlap region data from the adjacent partitions. For example, for a corner partition (see sub-regions 505 and 509), the sub-region is transferred to a node including overlap information from the two adjacent sides. For an edge (non-corner) partition, sub-region 507 is transferred to a node including overlap information from the three adjacent regions. For a middle partition, sub-region 511 includes overlap information from the four adjacent regions.
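The partitioning with overlap described above can be sketched in Python as follows. This is a minimal illustration, not the procedure of the invention: the layout is assumed to be a small rasterized grid, the 3x3 neighborhood sum is only a stand-in for a proximity-dependent OPC calculation, each loop iteration stands in for one node, and all names (box_sum, partition, process_partitioned, halo) are hypothetical. The sketch also assumes each node keeps only the interior of its tile when results are combined.

```python
def box_sum(grid):
    """Stand-in for a proximity-dependent calculation: 3x3 neighborhood sums."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += grid[yy][xx]
            out[y][x] = total
    return out


def partition(width, height, nx, ny, halo):
    """Return (core, padded) bounds per tile as half-open (x0, y0, x1, y1) boxes."""
    tiles = []
    for j in range(ny):
        for i in range(nx):
            x0, x1 = i * width // nx, (i + 1) * width // nx
            y0, y1 = j * height // ny, (j + 1) * height // ny
            padded = (max(0, x0 - halo), max(0, y0 - halo),
                      min(width, x1 + halo), min(height, y1 + halo))
            tiles.append(((x0, y0, x1, y1), padded))
    return tiles


def process_partitioned(grid, nx, ny, halo, kernel=box_sum):
    """Process each padded tile independently, then keep only its core region."""
    h, w = len(grid), len(grid[0])
    result = [[0] * w for _ in range(h)]
    for (x0, y0, x1, y1), (px0, py0, px1, py1) in partition(w, h, nx, ny, halo):
        padded_tile = [row[px0:px1] for row in grid[py0:py1]]
        local = kernel(padded_tile)  # the work one node would do
        for y in range(y0, y1):
            for x in range(x0, x1):
                result[y][x] = local[y - py0][x - px0]
    return result


layout = [[x + 8 * y for x in range(8)] for y in range(8)]
whole = box_sum(layout)
with_overlap = process_partitioned(layout, 3, 3, halo=1)
without_overlap = process_partitioned(layout, 3, 3, halo=0)
```

With an overlap at least as wide as the neighborhood the calculation looks at, the combined tiles match the result of processing the whole layout at once; with no overlap, values near tile boundaries differ, which is the accuracy problem the overlap regions address.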
When performing the OPC calculation, the computing nodes operate on these regions including the overlap data. In a specific implementation, after the OPC calculation, the output from each node is the output data for the sub-region itself, without any overlap region. This approach leads to more accurate results in the OPC calculation.

In a specific embodiment, the calculations of lithography process simulation, OPC, and RET, including mask-preparation-related calculations, EAPSM- and AAPSM-related calculations such as electromagnetic field calculations to account for 3D effects, and the chemical processes occurring in the lithography process, including exposure, post-bake, chemical amplification, and development, are performed wholly or partly in pixel shaders or in a combination of pixel and vertex shaders.

Figure 4 shows a schematic diagram of a computer system relating to various embodiments of the present invention. In some embodiments, the computer system has a server 401, a display 402, one or more input interfaces 403, and one or more output interfaces 404, all connected by one or more buses 405. Suitable buses include PCI-Express®, AGP, PCI, ISA, and the like.

The computer system can include any number of graphics processors. A graphics processor can be located on the motherboard, for example integrated with the motherboard chipset. One or more graphics processors can be located on an external board connected to the system through a bus, such as an ISA bus, PCI bus, AGP port, PCI Express, or other system bus. Graphics processors can be located on separate boards, each separate board connected by a bus, such as PCI Express, to the other separate boards and the rest of the system. Furthermore, there may be a separate bus or connection (e.g., an Nvidia SLI or ATI CrossFire connection) through which the graphics processors can communicate with one another. This separate bus or connection can be in addition to or in place of the system bus.
The server 401 includes one or more CPUs 406, one or more GPUs 407, and one or more memory modules 412. Each CPU and GPU can be a single-core or multi-core unit. Examples of suitable CPUs include the Intel Pentium®, Intel Core™ 2 Duo, AMD Athlon 64, AMD Opteron®, and the like. Examples of suitable GPUs include Nvidia GeForce®, ATI Radeon®, and the like. The input interface 403 can include a keyboard 408 and a mouse 409. The output interface 404 can include a printer 410.

The communication interface 411 is a network interface that allows the computer system to communicate over a wireless or hardwired network. The communication interface 411 […]. In another embodiment, the communication interface 411 […]. Devices that can be used to gain access through the communication interface 411 include mobile phones, PDAs, personal computers, and the like (not shown).

The memory modules 412 generally include different forms of memory, for example semiconductor memory such as random access memory (RAM), disk drives, and others. In a typical embodiment, the memory module 412 stores an operating system 413, data structures 414, instructions 415, applications 416, and procedures 417.

Storage devices can include mass disk drives, floppy disk drives, magnetic disks, optical disks, rewritable optical disks, fixed disks, hard disks, CD-ROMs, recordable CDs, DVDs, recordable DVDs (e.g., DVD-R, DVD+R, DVD-RW, DVD+RW, HD-DVD, or Blu-ray Disc), flash memory and other nonvolatile solid-state storage (e.g., USB flash drives), battery-backed volatile memory, tape storage, readers, and other similar media, and combinations of these.
In different embodiments, the specific software instructions, data structures, and data that implement the various embodiments of the present invention are typically incorporated in the server. A computer-readable medium, for example memory, includes instructions, applications, and procedures that, when executed by a processor, cause the computer system to use the invention, for example for the collection and analysis of data, pixel texturing, determining edge placement errors, moving edge fragments, optimizing edge fragment placement, and the like. The memory can store the software instructions, data structures, and data for any of the operating system, data collection applications, data aggregation applications, data analysis procedures, and the like, in semiconductor memory, disk memory, or a combination of these.

A computer-executable version of the invention can be stored on, or associated with, a computer-readable medium. A computer-readable medium can include any medium that participates in providing instructions for execution. Such a medium can take many forms, including nonvolatile, volatile, and transmission media. Volatile media include memory such as RAM. Transmission media include coaxial cables, copper wire, optical fiber, and the wires arranged in a bus. Transmission media can also take the form of electromagnetic, radio frequency, acoustic, or light waves, such as those generated during radio wave and infrared data communications.

For example, a binary, machine-executable version of the software of the present invention can be stored or reside in RAM or flash memory, or on a mass storage device (e.g., hard disk, magnetic disk, tape, or CD-ROM). As a further example, the code of the invention can be transmitted via wires, radio waves, or through a network such as the Internet.

The operating system can be any conventional operating system, including Windows® (a registered trademark of Microsoft Corporation), Unix® (a registered trademark of the Open Group in the United States and other countries), Mac OS® (a registered trademark of Apple Computer, Inc.), Linux® (a registered trademark of Linus Torvalds), and others not explicitly listed here.
Various embodiments of the present invention can be implemented as a method, system, or article of manufacture using standard programming or engineering techniques, or both, to produce software, firmware, hardware, or any combination thereof. As used in this application, the term "article of manufacture" (or alternatively, "computer program product") refers to a computer program accessible from any computer-readable device, carrier, or medium. In addition, the software in which various embodiments are implemented can be accessed through a transmission medium, for example from a server over a network. The article of manufacture in which the code is implemented also encompasses transmission media, such as network transmission lines and wireless transmission media. Thus the article of manufacture also includes the medium in which the code is embedded. Those skilled in the art will recognize that many modifications can be made to this configuration without departing from the scope of the present invention.

The computer system shown in Figure 4 is not intended to limit the present invention. Other alternative hardware environments can be used without departing from the scope of the present invention.

The above description of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form described, and many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications. This description will enable others skilled in the art to best use and practice the invention in various embodiments and with various modifications as suited to a particular use. The scope of the invention is defined by the following claims.

[Brief description of the drawings]
Figure 1A is a schematic of a typical layout printed on a typical mask;
Figure 1B shows the resulting layout developed on the photoresist without OPC;
Figure 2A is a typical OPC-corrected layout printed on a typical mask;
Figure 2B shows the resulting layout developed on the photoresist;
Figure 3 is a schematic diagram of part of an OPC procedure as typically implemented on a typical consumer GPU;
Figure 4 is a schematic diagram of a computer system relating to various embodiments of the present invention; and
Figure 5 shows the partitioning of layout data, with each partition having an overlap region with adjacent partitions.

[Description of main reference numerals]
300 graphics processor
301 vertex processor
302 triangle setup
303 shader instruction dispatcher
304 fragment processor
305 fragment crossbar
306 stencil buffer
307 memory partition
308 image processor
309 geometric operation
310 rectangle fragmentation
311 intensity calculation
312 area search
313 edge placement error (EPE) calculation
401 server
402 display
403 input interface
404 output interface
405 bus
406 CPU
407 GPU
408 keyboard
409 mouse
410 printer
411 communication interface
412 memory module
413 operating system
414 data structures
415 instructions
416 applications
417 procedures
503 layout
505, 507, 509, 511 sub-regions


Claims

1. An optical proximity correction system on a software or hardware platform with a graphics processing unit, comprising:
a computing system comprising at least one central processing unit and at least one graphics processing unit;
a user interface for interacting with the computing system;
a computer-readable medium containing data describing the size and arrangement of features to be formed on a lithographic exposure mask used in fabricating semiconductor devices;
a computer-readable medium including optical proximity correction calculation procedures for acting on the data, wherein at least a portion of the optical proximity correction calculation procedures uses the graphics processing unit during execution; and
an output device for the results of applying the optical proximity correction calculation procedures, executed using the graphics processing unit, to the data.

2. The system of claim 1, wherein the graphics processing unit comprises a vertex processor unit and a fragment processor unit.

3. The system of claim […], wherein the graphics processing unit comprises […].

[…] wherein the graphics processing unit comprises a vertex processor unit, a fragment processor unit, a stencil buffer, and an image processor unit.

[…] The system of claim […], wherein the at least one graphics processing unit is replaced by at least one of a physics processor, a cell processor, a digital signal processor, or an application-specific integrated circuit.

9. The system of claim 3, wherein the graphics processing unit and the optical proximity correction calculation procedures comprise at least one of:
a procedure of vertex shader assignment for evaluation point selection;
a procedure of vertex shader assignment for modification of evaluation points and their positions;
a procedure of pixel and vertex shader assignment for intensity calculation, including spatial or frequency domain approaches, to calculate intensity or electromagnetic fields, or a combination, in air or in other media including resist materials and on a chip surface;
a procedure of pixel shader assignment for intensity or electromagnetic field calculation, or a combination, for example in the frequency domain using a fast Fourier transform and inverse Fourier transform or any other transform to the same effect, in air or in the resist material and at other relevant locations on the chip surface;
a procedure of pixel shader assignment for intensity calculation using fast kernel lookup or fast kernel calculation;
a procedure of pixel shader assignment for intensity calculation using light lookup or light calculation;
a procedure of depth filter assignment for edge and edge fragment tagging;
a procedure of pixel shader computation at evaluation points;
a procedure that maps specification tables as texture maps;
a procedure that uses texture interpolation to optimize texture cache usage;
a procedure that uses a depth processor for evaluation point selection; or
a procedure that uses a single instruction multiple data (SIMD) image processor for error calculation.

10. […]

11. The system of claim 1, wherein an intensity calculation procedure, an area search procedure, and optimization calculations of a function, including an edge placement error calculation procedure, of the optical proximity correction calculation procedures are executed by the fragment processor unit of the graphics processor.

12. The system of claim […], wherein a geometric operation procedure of the optical proximity correction calculation procedures is […].

13. The system of claim […], wherein a geometric operation procedure of the optical proximity correction calculation procedures is executed by a vertex processor of the graphics processing unit.

14. The system of claim […], wherein […] of the optical proximity correction calculation procedures is […].

15. The system of claim 10, wherein a rectangle fragmentation procedure of the optical proximity correction calculation procedures is executed using the central processing unit.

16. The system of claim 1, wherein a rectangle fragmentation procedure of the optical proximity correction calculation procedures is executed by a vertex processor of the graphics processing unit.

17. The system of claim 1, wherein a rectangle fragmentation procedure of the optical proximity correction calculation procedures is executed by a fragment processor of the graphics processing unit.

18. The system of claim 1, wherein the geometric operations and rectangle fragmentation of the optical proximity correction calculation procedures are performed using the central processing unit.

19. The system of claim […], wherein an area search procedure of the optical proximity correction calculation procedures is performed using a stencil buffer of the graphics processing unit.

20. The system of claim 1, wherein an optical proximity correction calculation procedure is performed using an image processor of the graphics processing unit.

21. The system of claim 3, wherein a two-dimensional irregular quadrilateral of the data is represented as four channels of data in the graphics processing unit.

22. The system of claim 2, wherein the X and Y coordinates of opposite corners of a two-dimensional irregular quadrilateral of the data are represented in an RGBA color space format in the graphics processing unit.

23. The system of claim […], wherein the X and Y coordinates of one corner, […], and a height of a two-dimensional irregular quadrilateral of the data are represented in an RGBA color space format in the graphics processing unit.

24. The system of claim 1, wherein the X and Y coordinates of one corner, a change in X, and a change in Y of a two-dimensional irregular quadrilateral of the data are represented in an RGBA color space format in the graphics processing unit.

25. The system of claim […], wherein the X and Y coordinates of one corner, an angle, and a scalar of a two-dimensional irregular quadrilateral of the data are represented in an RGBA color space format in the graphics processing unit.

26. An optical proximity correction method applied on a software or hardware platform with a graphics processing unit, comprising:
providing a system containing at least one central processing unit and at least one graphics processing unit;
separating an optical proximity correction process into a plurality of types of tasks;
assigning each required task of the optical proximity correction process to the central processing unit or the graphics processing unit; and
delivering the output of the central processing unit and the graphics processing unit as the result of the optical proximity correction process.

27. The method of claim […], wherein the graphics processing unit comprises a vertex processor and a fragment processor unit.

28. The method of claim […], wherein the graphics processing unit comprises a vertex processor and a fragment processor […].

29. The method of claim 27, wherein the graphics processing unit further comprises a stencil buffer.

30. The method of claim 27, wherein the graphics processing unit further comprises an image processor unit.

31. The method of claim 26, wherein the graphics processing unit comprises a vertex processor, a fragment processor unit, a stencil buffer, and an image processor unit.

32. The method of claim 26, wherein the at least one graphics processing unit is replaced by at least one of a physics processor, a digital signal processor, or an application-specific integrated circuit.

33. The method of claim 26, wherein assigning the tasks comprises at least one of:
assigning a vertex shader to select evaluation points;
assigning a vertex shader to modify evaluation points and their positions;
assigning a rasterizer to determine evaluation points based on 1D and 2D cost functions;
assigning pixel shaders to intensity calculation using fast kernel lookup or fast kernel calculation;
assigning pixel shaders to intensity calculation using light lookup or light calculation;
assigning depth filters to area search and to edge and edge fragment tagging;
using pixel shaders to compute evaluation points;
mapping specification tables as texture maps;
using texture interpolation to optimize texture cache usage;
using a depth processor to select evaluation points; or
[…]

34. An optical proximity correction system on a software or hardware platform with a graphics processing unit, comprising:
a computing system comprising a plurality of nodes, wherein each node contains at least one of at least one central processing unit or at least one graphics processing unit;
an interface to connect the nodes;
a user interface to interact with the computer system;
a computer-readable medium containing data describing the size and arrangement of features to be formed on a lithographic exposure mask used in fabricating semiconductor devices; and
a computer-readable medium containing optical proximity correction calculation procedures for acting on the data, wherein at least a portion of the optical proximity correction calculation procedures is executed using a graphics processing unit in one of the nodes.

35. The system of claim 34, wherein the interface comprises at least one of a PCI Express bus, an AGP bus, a front-side bus, Ethernet, or the Internet.

36. The system of claim 34, wherein the computer-readable medium containing data describing the size and arrangement of features to be formed on a lithographic exposure mask used in fabricating semiconductor devices is directly connected to one of the nodes, and a portion of the data is passed through the interface to at least one other of the nodes.

37. The system of claim 34, wherein […].

38. The system of claim […], wherein […].

39. The system of claim 34, comprising: a computer-readable medium including an optical proximity correction calculation procedure to split a given layout into a number of two-dimensional sub-regions, wherein the sub-regions overlap each other.

40. The system of claim 39, comprising: a computer-readable medium including an optical proximity correction calculation procedure executed on the two-dimensional sub-regions […].

41. The system of claim […], comprising: a computer-readable medium including an optical proximity correction calculation procedure that combines the results […].

42. The system of claim 41, wherein the optical proximity correction calculation procedure that combines the results includes linking the results by removing the overlapping regions.

43. The system of claim 42, wherein the linking is executed by a single node.

44. The system of claim 42, wherein the linking is performed by a plurality of nodes.
