TW484110B - Video frame rendering engine - Google Patents

Publication number
TW484110B
Authority
TW
Taiwan
Prior art keywords
circuit
memory
interface
processor
printed
Prior art date
Application number
TW87109751A
Other languages
Chinese (zh)
Inventor
Earle W. Jennings III
Original Assignee
Hyundai Electronics America
Priority claimed from US 08/993,442 (US6854003B2)
Application filed by Hyundai Electronics America
Application granted
Publication of TW484110B

Landscapes

  • Image Processing (AREA)

Abstract

A circuit is provided which contains memory, logic, arithmetic and control circuitry needed to generate all or part of a frame for use in video processing, and animation as well as digital signal and image processing. One or more such circuits are provided on an integrated circuit. A video or image frame generation system is constructed from one or more of these integrated circuits, optionally with additional memory circuitry, to provide exceptional performance in frame production for animation, particularly 3-D and other high performance applications such as medical imaging, virtual reality and real-time scene generation in video games and simulation environments. The circuit(s) are used to process high speed object-oriented graphics related streams such as proposed by MPEG 4, as well as act as a single chip JAVA engine with highly optimized numeric performance.

Description

Statement of related applications:

This patent application claims priority to U.S. provisional application No. 60/033,476, filed December 19, 1996, and No. 60/050,396, filed June 20, 1997. The entire contents of these provisional applications are incorporated herein by reference.

Background of the invention:

The present invention relates to circuits. More particularly, it relates to high-performance integrated circuits suited to video frame generation and digital signal processing (DSP) tasks. A major problem in the graphics industry is generating frames. Each frame is a rectangular array of pixels, typically containing more than one million pixels. For animation, and three-dimensional animation in particular, millions to billions of calculations are generally required to produce one pixel. Similarly, graphics applications such as medical imaging need to generate one or more frames, again often as animation sequences.

Consider the results Pixar published in 1995 on the production of the animated film "Toy Story". The film has 110,000 frames. Pixar used 87 dual-processor 100 MHz Sparc 20s and 30 quad-processor 100 MHz Sparc 20s, 294 central processing units (CPUs) in all. Each CPU had an average of 96 megabytes of random access memory, and each processing node had a local disk drive of 3 to 5 gigabytes. The disk drives and their servers are not relevant to this patent, but the system was quite large. The whole film took 46 days to compute, and an average frame took 1 to 3 hours of Sparc CPU time. See reference [1].
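
The figures quoted above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the function and constant names are hypothetical, and it assumes every CPU was busy for the whole 46-day run, which the source does not state.

```python
# Figures quoted in the text (reference [1]); all names here are illustrative.
DUAL_NODES = 87      # dual-processor Sparc 20 nodes
QUAD_NODES = 30      # quad-processor Sparc 20 nodes
FRAMES = 110_000     # frames in the film
RENDER_DAYS = 46     # wall-clock days for the whole film


def total_cpus(dual_nodes: int, quad_nodes: int) -> int:
    """Count the CPUs in a farm of dual- and quad-processor nodes."""
    return 2 * dual_nodes + 4 * quad_nodes


def cpu_hours_per_frame(cpus: int, days: float, frames: int) -> float:
    """Average CPU-hours per frame, assuming all CPUs are busy for the whole run."""
    return cpus * days * 24 / frames


if __name__ == "__main__":
    cpus = total_cpus(DUAL_NODES, QUAD_NODES)
    print("CPUs:", cpus)
    print("CPU-hours per frame:", round(cpu_hours_per_frame(cpus, RENDER_DAYS, FRAMES), 2))
```

Under that assumption the CPU count matches the 294 stated in the text, and the average per-frame cost falls inside the quoted range of 1 to 3 hours.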

Although "Toy Story" is not photorealistic, it does represent a significant technical breakthrough. It was the first full-length feature film produced entirely by three-dimensional computer animation. Photorealism for such a film would require at least ten times more computation. Assume that such photorealistic frames take 30 hours each to compute.

Many different programs are used to generate frames. See references [19]-[23]. These programs are complex and must be highly efficient. They are written in high-level procedural and object-oriented programming languages such as C, C++ and FORTRAN. Only the most performance-critical parts of these programs are likely to be written directly in assembly or machine language for the underlying rendering-engine hardware, because programming at that level is too expensive and too difficult. Floating-point arithmetic is popular in these programs because of its wide dynamic range and ease of programming.

The need for improved performance is large. Optimal video editing requires one frame per second. Real-time virtual reality requires 30 frames per second. The performance improvement needed to serve these two applications is a speed-up of 108,000 times (= 30 hours/frame × 3600 seconds/hour) for video editing and 3,240,000 times (= 30 × the video-editing factor) for virtual reality.

A similar situation exists in high-performance digital signal processing. Typical requirements include processing images, often collecting signals from groups of two- and three-dimensional sensor arrays and rendering images of the interiors of materials, including human bodies and machine tools.
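
The speed-up factors quoted above follow directly from the assumed 30 hours per photorealistic frame. The sketch below reproduces that arithmetic; the function name and parameters are hypothetical and serve only to make the calculation explicit.

```python
SECONDS_PER_HOUR = 3600


def speedup_needed(hours_per_frame: int, target_fps: int) -> int:
    """Factor by which rendering must accelerate to reach target_fps.

    Current cost is hours_per_frame * 3600 seconds per frame; the target
    cost is 1 / target_fps seconds per frame. Their ratio is the speed-up.
    """
    return hours_per_frame * SECONDS_PER_HOUR * target_fps


if __name__ == "__main__":
    print("video editing (1 frame/s):", speedup_needed(30, 1))
    print("virtual reality (30 frames/s):", speedup_needed(30, 30))
```

Changing `hours_per_frame` shows how sensitive the requirement is to the assumed per-frame cost: at the 3 hours per frame reported for the non-photorealistic case, both factors drop by a factor of ten.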

These multidimensional signal processing applications render images from banks of ultrasound or magnetic imaging sensors. Their performance requirements are the same as for frame generation. The goal of these applications is to resolve features in a reconstruction or simulation of a three- or four-dimensional environment. (Note: four-dimensional here means a three-dimensional domain observed or simulated over time.)
Feature resolution depends on the resolution of the input sensors, the depth of fast Fourier transform (FFT) analysis that can be computed in a given time, the control of rounding errors, and the accumulation of rounding errors as the data frames are processed.

Resolving features sharply in a very short time means that each pixel or output data point requires millions, and often billions, of arithmetic operations. Floating-point arithmetic is commonly used to provide dynamic range control and flexible control of rounding error. Algorithmic flexibility is a priority, because software is under continual development and there are many different applications. These applications often require very different software.

The software development requirements of these applications are quite consistent. In particular, most applications require many programs, mostly written in procedural programming languages such as C, C++ and FORTRAN (see references [11]-[18]), and machine-level programming is limited to the most performance-critical parts of those programs.

The target algorithms share the following characteristics: each processing element needs a large amount of memory, typically in the range of 100 megabytes (MB); each output value (pixel, data point, and so on) needs a large amount of arithmetic; a large amount of computation depends on most of the input values (pixels, data points, and so on); and the communication overhead is small compared with the computing capacity.

Support for high-resolution graphics has been developing for at least 30 years.
The early work of the 1960s and early 1970s, such as that described in reference [40], consisted of computer graphics systems built with a small amount of special-purpose hardware. At the time there was little or no thought of VLSI (very large scale integration).

Semiconductor support for the graphics industry has concentrated on the following topics:

A. Support for input/output (I/O) devices, with a great deal of effort going into display devices. This led to the development of dedicated integrated circuits for controlling displays. See reference [2].

B. Development of high-speed microprocessors and digital signal processors.

C. Development of high-speed, high-density memory devices, particularly dynamic random access memory (DRAM), video memory (VRAM), and so on.

D. Special-purpose components aimed at real-time image processing and frame generation applications.

These efforts have run into bottlenecks, as follows.

A. Display device controllers are constrained to generate each frame within a fixed period on a machine with a fixed execution structure. The range of frame algorithms is therefore necessarily limited.

B. High-speed microprocessors and digital signal processors have great intrinsic algorithmic flexibility and are therefore used in high-speed dedicated frame rendering configurations such as the SUN network that produced "Toy Story". See reference [1]. The arrival of the Intel Pentium microprocessors brought together all the performance techniques of the RISC (reduced instruction set computer) community.
"Appendix D: An Alternative to RISC: The Intel 80x86" in reference [30] and "Appendix: A Superscalar 386" in reference [31] are good references on this point. "Appendix C: A Survey of RISC Architectures" in reference [30] provides a good overview.

However, commercial microprocessor and digital signal processor systems are limited by their large amount of overhead circuitry. In modern superscalar computers, this overhead circuitry may actually be larger than the arithmetic units. See references [30] and [31] for a discussion of comparative evaluation of architectural performance and cost.

C. High-performance memory is necessary but does not guarantee fast frame generation, because memory does not generate data; it only stores it.

D. Several classes of special-purpose components have been proposed that tightly integrate data processing elements on an integrated circuit coupled to high-performance memory (usually DRAM). All of these efforts have run into limitations. The circuits discussed in reference [32] use fixed-point arithmetic engines of limited precision. The circuits discussed in reference [32] are limited in floating-point execution performance and in handling programs larger than the local memory of a single processor.

The special-purpose components mentioned above are optimized to execute several kinds of algorithms. These components include:

D1. Image compression/decompression processors. These circuits, while important, are very specialized and do not provide a general solution for a variety of algorithms.
For example, such engines tend to be very difficult to program efficiently in higher-level procedural languages such as C, C++ and FORTRAN. The need to program them in assembly language means that these units cannot meet the general-purpose needs of multidimensional imaging and graphics frame generation, because the software development cost is very high. See references [24] and [25].

D2. Processors optimized for graphics algorithms such as fractals, Z-buffers, Gouraud shading, and so on. These circuits are not optimized for the broad range of approaches that graphics frame generation and image processing require. See references [26]-[29].

D3. Signal processing pre-processor accelerators such as wavelet and other filters, first-pass radix-4, 8 or 16 fast Fourier transforms (FFTs), and so on, as well as one- and two-dimensional discrete cosine transform engines. These circuits are difficult to program so that they efficiently execute the wide variety of large-scale frame generation tasks.

D4. Multiprocessor image processors. These include hybrid multiple instruction multiple data path (MIMD) and single instruction multiple data path (SIMD) systems, which are poorly suited to general-purpose programming. See references [24] and [41]-[43].

These processors also include VLIW (very long instruction word) SIMD integrated circuits such as Chromatic's MPACT integrated circuits. Such integrated circuits likewise cannot provide the computational flexibility needed by the large body of three-dimensional animation software used in commercial applications, which requires efficient compiler support. See references [34] and [39].

D5. Multimedia signal processors.
These processors also have various limitations, such as a lack of floating-point support, a lack of wide external data memory interface bandwidth to large external memories, insufficient instruction-processing flexibility and data-processing versatility, and a reliance on vector processors. Lacking a highly uniform data access mechanism for accumulating results, they are inefficient and difficult to program. See references [35]-[38].

What is needed is a computation engine that avoids the above limitations for video frame rendering and digital signal processing tasks.

Summary of the invention:

A circuit is provided which contains the memory, logic, arithmetic and control circuitry needed to generate all or part of a frame for use in video processing, animation, and digital signal and image processing. One or more such circuits are provided on an integrated circuit. A video or image frame generation system is constructed from one or more of these integrated circuits, optionally with additional memory circuitry, to provide exceptional performance in frame production for animation, particularly 3-D and other high-performance applications such as medical imaging, virtual reality, and real-time scene generation in video games and simulation environments. The circuit(s) are used to process high-speed object-oriented graphics-related streams such as those proposed by MPEG-4, as well as to act as a single-chip JAVA engine with highly optimized numeric performance.

Brief description of the drawings:

FIG. 1 is a block diagram of a basic circuit according to an embodiment of the present invention.
FIG. 2 is a block diagram of the array processor of FIG. 1 according to an embodiment of the present invention.
FIG. 3 is a block diagram of the array processor of FIG. 1 according to another embodiment of the present invention.

FIG. 4 is a block diagram of the embedded microprocessor of FIG. 1.
FIG. 5 is a block diagram of an integrated circuit with two instances of the basic circuit of FIG. 1 and independent external memory interfaces.
FIG. 6 is a block diagram of an integrated circuit with two instances of the basic circuit of FIG. 1 sharing one external memory interface.
FIG. 7 is a block diagram of an integrated circuit with four instances of the basic circuit of FIG. 1 and independent external memory interfaces.
FIG. 8 is a block diagram of an integrated circuit with four instances of the basic circuit of FIG. 1 sharing one external memory interface.
FIG. 9 is a block diagram of an integrated circuit with four instances of the basic circuit of FIG. 1 sharing two external memory interfaces.
FIG. 10 is a block diagram of an integrated circuit with four instances of the basic circuit of FIG. 1, two shared external memory interfaces, and fully interconnected message ports.
FIG. 11 is a block diagram of an integrated circuit with sixteen instances of the basic circuit of FIG. 1 and four shared external memory interfaces.
FIG. 12 is a block diagram of an integrated circuit with sixteen instances of the basic circuit of FIG. 1 and two shared memory interfaces.
FIG. 13 is a block diagram of a printed circuit board on which instances of the basic circuit of FIG. 1 are integrated with their corresponding memory modules.
FIG. 14 is a block diagram of another printed circuit board on which instances of the basic circuit of FIG. 1 are integrated with their corresponding memory modules.

Definitions:

Wire:
A wire is a mechanism for sharing state among multiple nodes of a circuit. The state is a finite alphabet based on some physical condition, including but not limited to voltage, current, phase, spectral decomposition, and photon amplitude. Symbols are the individual elements of the alphabet. Ranges of measured values of the physical condition typically encode the symbols. The most commonly used alphabet is the set {0, 1}, the binary symbol set. Binary systems exist that use each of the schemes above. Other commonly used alphabets include three-symbol alphabets, for example {0, 1, 2}, and multiple binary alphabets, for example {00, 01, 10, 11}. Other alphabets are also used.
A wire may be embodied, for example, as a strip of metal (for example, in an integrated circuit or on a circuit board), an optical fiber, or a microwave channel (sometimes called a microchannel).

Wire bundle:
A wire bundle is a set of one or more wires.

Bus:
A bus is a wire bundle that has a bus protocol. The bus protocol defines the communication between the circuits connected by the wire bundle. A bus typically consists of component wire bundles, one or more of which determine which of the connected components are receiving and which are transmitting on one or more of the other component wire bundles.

Floating point:
A floating-point notation includes a collection of states representing a numeric entity. The collection includes subsets that define the sign, mantissa and exponent of the represented number. Such notations include, without limitation, binary systems such as IEEE (Institute of Electrical and Electronics Engineers) standard floating point, and other special-purpose floating-point notations such as those discussed in the references of this specification. A floating-point notation also includes extensions with two such subsets, each containing the sign, mantissa and exponent of a number as above. Such a numeric representation uses numbers to denote delimited data.
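
As a minimal illustration of the sign, mantissa and exponent subsets, the sketch below splits a binary floating-point value into those three parts using Python's standard `math` module. The helper names `decompose` and `recompose` are hypothetical. Note that `math.frexp` normalizes the mantissa to the range [0.5, 1), which differs from the [1, 2) significand convention of IEEE 754 encodings; the represented value is the same.

```python
import math
from typing import Tuple


def decompose(x: float) -> Tuple[int, float, int]:
    """Split x into (sign, mantissa, exponent).

    The parts satisfy x == (-1)**sign * mantissa * 2**exponent,
    with 0.5 <= mantissa < 1 for any nonzero finite x.
    """
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    mantissa, exponent = math.frexp(abs(x))
    return sign, mantissa, exponent


def recompose(sign: int, mantissa: float, exponent: int) -> float:
    """Rebuild the value from its three parts."""
    return (-1.0) ** sign * math.ldexp(mantissa, exponent)


if __name__ == "__main__":
    parts = decompose(-6.5)
    print(parts)
    print(recompose(*parts))
```
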
A floating-point notation additionally includes non-binary systems, in which the mantissa and exponent refer to powers of a number other than 2.

Programmable finite state machine:
A programmable finite state machine is a machine that includes a state register, possibly one or more additional registers in which state conditions, values and so on reside, and a mechanism by which the state register, the additional registers and external inputs generate the next values of the state register and of any additional registers.

Single instruction multiple data path (SIMD):
A SIMD architecture executes the same instruction on more than one data path during the same instruction execution cycle. A typical extension of this basic concept is to incorporate "status flag bits" associated with each data path. These flags determine whether a particular data path may or may not execute some or all of the globally shared instructions.

SIMD architectures are optimal when multiple data streams need identical processing. The inherent synchronization of these data streams often simplifies communication control. However, SIMD architectures become inefficient whenever the processing differs between data paths.

SIMD architectures have a relatively small instruction-processing overhead, because there is only one instruction-processing mechanism, shared by the data paths. That mechanism has a single instruction fetch mechanism. The group of data paths typically needs only one instruction memory.
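
The role of the status flag bits can be sketched in software as follows. This is an illustrative model only, not the circuit described in this patent: `simd_step` is a hypothetical helper in which one shared instruction is offered to every data path (lane), and each lane's flag decides whether that lane executes it.

```python
from typing import Callable, List


def simd_step(op: Callable[[float], float],
              lanes: List[float],
              enabled: List[bool]) -> List[float]:
    """Apply one shared instruction `op` to every lane whose status flag is set.

    Lanes whose flag is clear keep their previous value, modelling a data
    path that sits out the globally shared instruction.
    """
    if len(lanes) != len(enabled):
        raise ValueError("one status flag is required per data path")
    return [op(value) if flag else value for value, flag in zip(lanes, enabled)]


if __name__ == "__main__":
    data = [1.0, -2.0, 3.0, -4.0]
    # Set each lane's flag from a per-lane condition (here: value is negative).
    flags = [value < 0 for value in data]
    # The single shared instruction is "negate"; only flagged lanes execute it,
    # so the combined effect is an absolute value across all lanes.
    print(simd_step(lambda v: -v, data, flags))
```

The example also shows the inefficiency noted above: on every step in which a lane's flag is clear, that data path does no useful work.
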
Multiple instruction multiple data path (MIMD): A multiple-instruction, multiple-data-path architecture executes different instructions on different data units. The main advantage of this approach is flexibility: any data processing unit can execute its own instructions independently of the other data processing units. However, this flexibility comes at a cost. In particular, each data processing unit must have its own instruction fetch, decode, and sequencing mechanism. The instruction fetch mechanism often has at least one small memory local to the data processor. This local memory is usually a cache.

(Very) long instruction word processors (VLIW and LIW, respectively): An architecture in which a single instruction-processing mechanism includes a program counter capable of common operations, such as branching on condition codes, and in which multiple instruction fields individually control the data path units. In these architectures the data path units often differ in structure or function.

Array processor: An array processor is a long instruction word (LIW) or very long instruction word (VLIW) instruction-processing architecture that has multiple data path units arranged into one or more data path collections. In embodiments of the present invention, as will be explained, the data path collections receive, and may operate on, a common operand received over a common operand bus having a transmitting unit; each data path receives one or more additional operands from a memory; each data path collection includes an instruction memory holding instruction fields that control the operation of the program-controlled elements within it; each data path includes one or more multiplier/accumulators (MACs); and each MAC has a number of accumulation registers.

Multiplier/accumulator: A multiplier/accumulator is an arithmetic circuit that simultaneously multiplies two operands and adds (or subtracts) one or more other operands.

Fast Fourier transform (FFT): A fast Fourier transform is an efficient algorithm for performing many important operations on the frequency spectrum of a signal. See references [11] and [15] for more details on the topics discussed.

Vector processor: A vector processor is an architecture designed for operating on vector data. A typical vector processor is heavily pipelined. A large body of literature is devoted to vector processing; see references [46]-[53]. Appendix B, "Vector Processing," of reference [30] provides a description of the subject.

The present invention provides a cost-effective basic circuit that encapsulates memory, logic, arithmetic, and special-purpose circuitry and control units particularly suited to supporting a variety of algorithms, especially those used in video frame rendering. Replicating the basic circuit increases computing capacity, yielding the ability to support very high performance requirements.

High-performance signal processing, which is often multidimensional, and high-performance frame rendering differ very little.
In the discussion of the figures, specific notes point out where the applications of the invention differ.

FIG. 1 is a block diagram of a basic circuit 1 according to an embodiment of the present invention. In the basic circuit 1, one or more controllers 2 serve as external local memory interfaces, providing access to a local external memory (shown in later figures) associated with the basic circuit 1. This local external memory is preferably 100 megabytes or more (see Toy Story, reference [1]), for example one gigabyte. A main memory interface and controller (MMIC) 6, also called a "large-memory wide interface circuit" 6, controls the flow of data, including instructions, between the external local memory interface(s) 2 and various other components. The MMIC 6 also controls access to an internal local memory 4. The basic circuit 1 also includes a global external bus interface (GEBI) 7, an embedded microprocessor subsystem 8, a digital signal processor (DSP) (array processor) 9, neighboring-circuit communication ports 10, special-purpose circuitry 11 (for example, dedicated frame generator circuitry, content-addressable memory (CAM), bit code parsers, and so on), and wire bundles 3, 5, 12, 13, 14, 15, 16, and 17, which interconnect these circuits as shown in FIG. 1.

The GEBI 7 provides an interface to the external environment.
The external bus may be a standard computer bus, such as the well-known Peripheral Component Interconnect (PCI), Accelerated Graphics Port (AGP), Universal Serial Bus (USB), IEEE 1394 (formerly FireWire), or Fibre Channel; or it may be a special-purpose bus that supports communication between an external controller or host and multiple instances of the invention.

The embedded microprocessor subsystem 8 controls the operation of the basic circuit 1 but does not perform the computations assigned to the DSP (array processor) 9 or to the special-purpose circuitry 11. The microprocessor subsystem 8 is discussed in more detail below.

The DSP (array processor) 9 performs floating-point computations. In one embodiment of the present invention, the DSP (array processor) 9 is as shown and discussed in U.S. Patent Application No. 60/050,396, filed June 20, 1997, the contents of which are incorporated herein by reference. The DSP (array processor) 9 is discussed in more detail below.

The neighboring-circuit communication port(s) 10 provide communication between the basic circuit 1 and other instances of the basic circuit 1 in a system. In system applications that use only one instance of the basic circuit 1, these ports 10 are unnecessary. An exception arises in certain embodiments in which some of the ports 10 communicate with other components (not shown) that are not instances of the basic circuit 1 but that can conveniently communicate with, or through, instances of the basic circuit 1.
Applications with multiple instances of the basic circuit 1 usually require high-speed, concurrent communication among the instances. By providing the communication port(s) 10, the basic circuit 1 reduces potential congestion on the GEBI 7. In a particular embodiment, each communication port 10 provides dedicated communication between exactly two instances; the protocol used on each port can therefore be simpler than, for example, the bus protocol used on the external bus interface 7.

In general, standard methods are used for communication among instances. For example, various communication schemes based on message-passing protocols, which are well documented in the art, can be used. Likewise, any of several "semaphore" or "handshaking" systems can be used to provide synchronization and control of concurrent sequential processing (CSP) operations among the instances.

The special-purpose circuitry 11 is present in various embodiments to provide added performance for operations that the application requires but that would otherwise be rather complex or time-consuming.
Examples of such circuitry include, but are not limited to: a frame generator for accelerating Z-buffer operations, triangle filling, "BitBlts," and surface texturing; a content-addressable memory (CAM) for accelerating pattern matching, for example in fractal geometry compression or in translating a compressed token into a standard handle in a hash table; and a bit code parser for decoding commands and data headers of highly compressed communication schemes such as MPEG (Moving Picture Experts Group) and of languages such as Java(TM), which is a trademark of Sun Microsystems.

The internal memory bank 4 includes one or more banks of random access memory (RAM) arrays. In embodiments of the present invention these memories are dynamic random access memories (DRAM), although other types of memory can also be used. In a preferred embodiment, a total of 1 to 32 megabytes (MB) of memory is provided in a wide configuration. Widths of up to 1024 bits (1K bits) are contemplated. Widths greater than 64 bits are considered "wide." Preferred configurations include, but are not limited to:

32K × 64, × 128, × 256, × 512, and × 1K bits;
64K × 64, × 128, × 256, × 512, and × 1K bits;
128K × 64, × 128, × 256, × 512, and × 1K bits;
256K × 64, × 128, × 256, × 512, and × 1K bits; and so on.

It is worth noting that configurations providing access to data widths that are not powers of 2, such as 72 bits, can offer many advantages. Such configurations support error detection and correction schemes implemented with standard logic inside the MMIC 6.

The MMIC 6 is a digital logic circuit that supports a number of functions. If the internal memory bank 4 is DRAM, or otherwise requires refresh as DRAM does, the MMIC 6 automatically refreshes the internal RAM bank 4.
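The capacities implied by the wide configurations listed above, and the reason a 72-bit width suits error correction, can be checked with a little arithmetic. The sketch below rests on two assumptions the source does not state: that "K" means 1024, and that the 72-bit case is a 64-bit data word protected by a single-error-correcting, double-error-detecting (SECDED) Hamming code. The function names are hypothetical.

```python
def capacity_bytes(words: int, width_bits: int) -> int:
    """Storage of an array with `words` rows of `width_bits` bits each."""
    return words * width_bits // 8


def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming SECDED code over `data_bits` data bits.

    The smallest r with 2**r >= data_bits + r + 1 gives single-error
    correction; one extra overall parity bit adds double-error detection.
    """
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


K = 1024  # assumption: binary kilo
smallest = capacity_bytes(32 * K, 64)       # 32K x 64 bits
largest = capacity_bytes(256 * K, 1 * K)    # 256K x 1K bits
ecc_width = 64 + secded_check_bits(64)      # data bits plus check bits
```

Under these assumptions the smallest listed array holds 256 KiB and the largest 32 MiB, so the lower end of the stated 1 to 32 MB range would correspond either to several small arrays or to a single larger one such as 128K × 64; a 64-bit word with SECDED protection occupies exactly 72 bits.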
The MMIC 6 permits and controls read and write access to the internal RAM bank 4 by the GEBI 7, the embedded microprocessor 8, the DSP (array processor) 9, and possibly also the neighboring-circuit communication port(s) 10 and the special-purpose circuitry 11. The MMIC 6 likewise permits and controls access to the external local memory interface 2 by the GEBI 7, the embedded microprocessor 8, the DSP (array processor) 9, and possibly also the neighboring-circuit communication port(s) 10 and the special-purpose circuitry 11.

In a preferred embodiment of the present invention, the MMIC 6 includes a finite state machine (FSM) that accepts requests from modules 7, 8, 9, 10, and 11 for data and/or instructions in the external local memory, in order to control the flow of data and instructions. The MMIC 6 satisfies these requests according to a decision algorithm encoded in the FSM. Control of data and instruction flow by an FSM is well known.
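The source does not say which decision algorithm the FSM in the MMIC 6 encodes. Purely as an illustration, the sketch below assumes a round-robin policy among the five requesting modules; the class name, method name, and the policy itself are hypothetical and are not the patent's design.

```python
from typing import Optional, Set

# Reference numerals of the requesting modules in FIG. 1.
MODULES = (7, 8, 9, 10, 11)


class RoundRobinArbiter:
    """Finite state machine whose state is the index of the last module granted."""

    def __init__(self) -> None:
        # Start so that module 7 is examined first.
        self.last = len(MODULES) - 1

    def grant(self, requests: Set[int]) -> Optional[int]:
        """Return the module granted this cycle, or None if nothing is pending."""
        for offset in range(1, len(MODULES) + 1):
            idx = (self.last + offset) % len(MODULES)
            if MODULES[idx] in requests:
                self.last = idx  # state transition
                return MODULES[idx]
        return None


arbiter = RoundRobinArbiter()
# Modules 8, 9, and 11 keep requesting for four consecutive cycles.
grants = [arbiter.grant({8, 9, 11}) for _ in range(4)]
```

Round-robin is chosen here only because it is simple and starvation-free; a priority scheme, or one that lets the microprocessor preempt the FSM, would change only the search order inside `grant`.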

In a preferred embodiment of the present invention, separate instruction and data streams are provided for each of the embedded microprocessor 8, the DSP (array processor) 9, and the special-purpose circuitry 11 to operate on. During program initialization and operation, these instruction and data streams typically reside in the external local memory after being loaded through the GEBI 7. They are typically generated by one or more compilers using known compiler techniques, and may incorporate handshake protocols according to known methods.

In addition to the MMIC 6, the embedded microprocessor 8 and the GEBI 7 provide direction for the control of the data and instruction streams, in a manner known in the art. For example, the microprocessor 8 can preempt control of the FSM, e.g., for error control, initialization, and capture of bus-error transactions on the GEBI 7.

The controller 2 for the external local memory interface (ELMI 2) supports the external timing and interface conventions needed to access one or more kinds of external integrated circuit (IC) memory. For example, the ELMI 2 provides reading and writing of random access memory, and reading of nonvolatile memory. If some or all of the external memory is DRAM (see Chapters 6 and 7 of reference [54]), the ELMI 2 supports automatic memory refresh. Preferably, the ELMI 2 queues access requests during refresh. The ELMI 2 may optionally also queue access requests during writes, erases, and other potentially slow operations.

In embodiments in which at least some of the external memory is programmable and nonvolatile (see Chapters 10, 11, and 12 of reference [54]), the ELMI 2 provides writing of specific words and, as appropriate, erasure of blocks of words. Optionally, the ELMI 2 may provide additional functions well known in the art, such as suspending an erase operation on nonvolatile memory, to facilitate execution of certain applications.

Although not specifically shown in FIG. 1, in other embodiments the basic circuit 1 includes one or more analog interface components, such as analog-to-digital (A/D) converters, digital-to-analog (D/A) converters, voltage-controlled oscillators (VCOs), and so on, together with their corresponding wire bundles.

In other embodiments, additional secondary support circuitry includes, but is not limited to: internal clock multipliers, phase-locked loops (PLLs), clock distribution networks, built-in self-test (BIST) circuitry, boundary-scan paths, and so on, all of which are well known in the art.

The basic circuit 1 has a design that is optimized in a number of ways to provide a high-performance computing engine for frame rendering and similar tasks. Importantly, the embedded microprocessor 8, the DSP (array processor) 9, and the special-purpose circuitry 11 are given the circuit resources to process their individual instructions and data concurrently, with few stalls for lack of data or instructions. This efficiency results from, for example, the overall design and the design of the individual modules, as described above and below. For example, the wire bundle 13 provides a dedicated path between the global external bus 7 and the embedded microprocessor 8 for interactions, e.g., control, that are unrelated to the bulk of the traffic on the main bus interface (the MMIC 6 and wire bundles 12, 14, 15, 16, and 17). This reduces the bandwidth burden on the main bus interface and so conserves its bandwidth.
In a preferred embodiment, the basic circuit 1 is implemented as an integrated circuit (IC). Importantly, because of its design, the basic circuit 1 can be implemented using relatively little IC surface area. Moreover, the architecture is simple yet flexible enough that a compiler can compile efficiently for its computing units, particularly the microprocessor and the array processor.

FIG. 2 is a block diagram of the array processor 9 of FIG. 1 according to an embodiment of the present invention. As discussed in detail below, the array processor 9 can perform, efficiently and in parallel, computations that conventional vector processing techniques, as discussed in the cited references, cannot.

These computations include, for example, those that require operating on several short vectors. One example is the evaluation of complex-valued functions such as X = (az + b) / (cz + d), where a, b, c, d, and z are all complex floating-point numbers. As is well known, these and similar computations are common in frame rendering. To evaluate such a function, a0, b0, c0, d0, z0, and X0 are defined as the real parts and, correspondingly, a1, b1, c1, d1, z1, and X1 as the imaginary parts. The computation proceeds in two multiply-accumulate passes before entering a floating-point division circuit. In the first pass, the following are computed:

A0 = a0*z0 - a1*z1 + b0
A1 = a0*z1 + a1*z0 + b1
B0 = c0*z0 - c1*z1 + d0
B1 = c0*z1 + c1*z0 + d1

In the second pass, the results B0 and B1 are fed back to the multiplier/accumulators (discussed below) as shared operands, producing the numerator multiplied by the complex conjugate of the denominator, and the squared magnitude of the denominator:

C0 = A0*B0 + A1*B1
C1 = A1*B0 - A0*B1
D = B0*B0 + B1*B1

Finally, the divisions are performed:

X0 = C0 / D
X1 = C1 / D

According to the embodiment of FIG. 2, an array processor interface circuit (APIC) 965 controls the requesting and receiving of instructions and data from the MMIC 6 and the sending of data to the MMIC 6. In this embodiment the array processor 9 has a single collection 970 of multiple data path processing units 900, also referred to simply as data paths 900. The collection 970 forms an arithmetic processing unit (APU) 970.

An internal floating-point representation is used within the APU 970. This internal representation may or may not be the same as the floating-point representation used externally. In general, the APU 970 is adapted to accommodate at least one standard external representation, for example standard IEEE floating point. In a preferred embodiment, the floating-point representation used is a simplified one that omits, for example, the exception cases of standard IEEE floating point.
Using the simplified representation permits greater internal efficiency, for example in terms of circuit size, by avoiding complex exception handling. A simplified representation is feasible because exceptions typically matter little in frame rendering and DSP applications.

The APU 970 includes a shared operand circuit (SOC) 910, which serves as an input circuit to the multipliers. Optionally, the SOC 910 includes a RAM circuit with address controls that determine whether the RAM is used as an operand cache or as a first-in-first-out (FIFO) queue. A first subcircuit (not shown) of the SOC 910 includes multiple input registers for receiving and capturing the state of wire bundle 902. The SOC 910 includes a second subcircuit (not shown) coupled to the registers of the first subcircuit. The second subcircuit includes an integer arithmetic logic unit (ALU) (not shown). The second subcircuit performs fixed-point add/subtract operations (in the ALU) on selected fields of the state of wire bundle 902 held in the registers. The second subcircuit also converts the fixed-point result of the add/subtract operation, or a selected field of the input registers, to the internal floating-point representation according to known conversion algorithms.
Accordingly, the second subcircuit includes a floating-point conversion unit (not shown).

The APU 970 also includes two or more multiplier/accumulators (MACs) 900, which serve as the data path processing units 900. MACs are well known in the art. Each MAC 900 is coupled to wire bundle 906 to receive a shared operand from the SOC 910. Each MAC 900 is also coupled to a corresponding local data store 920. In general, there are as many local data memory circuits 920 as there are MACs 900.

During operation, each MAC 900 receives three numbers X, Y, and Z in some numeric format, where X is the shared operand received over wire bundle 906, and Y and Z are received from the local data store 920 over wire bundle 909.

Each MAC 900 has two or more registers, preferably at least four.
In each clock cycle, each MAC 900 can perform one multiplication and one addition or subtraction, thereby producing XY + Z or XY - Z from X, Y, and Z values received in (possibly) the current or earlier clock cycles. Each MAC 900 conditionally stores the result in one of its registers.

The APU 970 further includes a shared output and feedback interface (SOFI) 940. The SOFI 940 includes a floating-point conversion unit (not shown) adapted to convert, using standard techniques, the internal numeric format on wire bundle 907 to the required external floating-point representation (e.g., IEEE floating point) or fixed-point representation. Of course, in embodiments in which the internal floating-point representation is the same as the external one, this conversion need not be performed. The SOFI 940 controls the transmission of the converted results over a wide bus 901 to the APIC 965. In a particular embodiment, the SOFI 940 also temporarily stores such results, as needed, before transmission.

The data memory 920 has a corresponding address generator (e.g., in module 950, discussed below) that supplies addresses for access. The addresses are used to fetch the Y and Z operands, which may be the same or different.
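The multiply-accumulate step described above (one XY + Z or XY - Z per clock cycle, with X shared among the MACs) is enough to carry out the two-pass evaluation of X = (az + b) / (cz + d) discussed earlier. The following is a minimal sketch under stated assumptions, not the patent's circuit: the `Mac` class and `mobius` function are hypothetical names, a single `acc` field stands in for each MAC's registers, Z is taken from a MAC's own previous result where convenient, and the second pass forms A times the complex conjugate of B, which is the form needed for C/D to equal A/B.

```python
from dataclasses import dataclass


@dataclass
class Mac:
    """One multiplier/accumulator data path; `acc` stands in for its registers."""
    acc: float = 0.0

    def step(self, x: float, y: float, z: float, subtract: bool = False) -> float:
        """One clock cycle: form x*y + z, or x*y - z when `subtract` is set."""
        self.acc = x * y - z if subtract else x * y + z
        return self.acc


def mobius(a: complex, b: complex, c: complex, d: complex, z: complex) -> complex:
    """Evaluate (a*z + b) / (c*z + d) with XY+Z / XY-Z steps and a final division."""
    m_a0, m_a1, m_b0, m_b1 = Mac(), Mac(), Mac(), Mac()

    # First pass, cycle 1: shared operand X = z1 (imaginary part of z).
    x = z.imag
    m_a0.step(x, a.imag, b.real, subtract=True)          # a1*z1 - b0
    m_a1.step(x, a.real, b.imag)                         # a0*z1 + b1
    m_b0.step(x, c.imag, d.real, subtract=True)          # c1*z1 - d0
    m_b1.step(x, c.real, d.imag)                         # c0*z1 + d1

    # First pass, cycle 2: shared operand X = z0; Z is each MAC's previous result.
    x = z.real
    A0 = m_a0.step(x, a.real, m_a0.acc, subtract=True)   # a0*z0 - a1*z1 + b0
    A1 = m_a1.step(x, a.imag, m_a1.acc)                  # a1*z0 + a0*z1 + b1
    B0 = m_b0.step(x, c.real, m_b0.acc, subtract=True)   # c0*z0 - c1*z1 + d0
    B1 = m_b1.step(x, c.imag, m_b1.acc)                  # c1*z0 + c0*z1 + d1

    # Second pass: B1, then B0, is fed back as the shared operand.
    m_c0, m_c1, m_d = Mac(), Mac(), Mac()
    m_c0.step(B1, A1, 0.0)                               # A1*B1
    m_c1.step(B1, A0, 0.0)                               # A0*B1
    m_d.step(B1, B1, 0.0)                                # B1*B1
    C0 = m_c0.step(B0, A0, m_c0.acc)                     # A0*B0 + A1*B1
    C1 = m_c1.step(B0, A1, m_c1.acc, subtract=True)      # A1*B0 - A0*B1
    D = m_d.step(B0, B0, m_d.acc)                        # B0*B0 + B1*B1

    # Final step: the two floating-point divisions.
    return complex(C0 / D, C1 / D)
```

In this arrangement each pass takes two cycles because every MAC in a pass shares the same X; the subtraction a0*z0 - a1*z1 + b0 is obtained from two XY - Z steps rather than from a separate negation.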
By supporting the generation of different addresses, it provides an advantage in many application requirements, such as fast Fourier transform (FFT). The arithmetic processing unit 9 7 0 includes an overall control, synchronization circuit, and instruction storage unit (CSIS) 9 5 0. The overall control, synchronization circuit, and instruction storage unit (C S I S) 950 performs arithmetic processing unit 970 control and synchronization of various modules and includes instruction storage. The architecture of the arithmetic processing unit 970 is shown in Figure 2 and will be discussed here. The overall control, synchronization circuit, and instruction memory (CSSI) 950 has a suitable structure for performing its work. Specifically, the overall control, synchronization circuit, and instruction storage unit (CSIS) 9 5 1 includes (not shown in FIG. 2) a state register, a program counter, two or more loop counters, and Multiplier / accumulator (MAC) 9 0 0 At least one index register, an instruction register, and an instruction memory of each Y and Z operand region memory 9 2 0, all of these components are Everyone familiar with this technology is familiar. The instruction characters have different fields for each input, output allocation, multiplier / accumulator (MAC) 900.
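The multiplier/accumulator behavior described above (one shared X operand, per-MAC Y and Z operands, and a result of XY + Z or XY - Z written to a register) can be sketched as a small software model. This is only an illustrative sketch: the names `Mac` and `apu_cycle`, the choice of four MACs with four registers each, and the use of Python floats are assumptions made for the example, not details taken from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Mac:
    """Hypothetical model of one multiplier/accumulator with four registers."""
    registers: list = field(default_factory=lambda: [0.0] * 4)

    def step(self, x, y, z, subtract=False, dest=0):
        # One modelled clock cycle: a multiply plus an add or a subtract.
        result = x * y - z if subtract else x * y + z
        self.registers[dest] = result  # keep the result in one of the registers
        return result


def apu_cycle(macs, x, ys, zs, subtract=False, dest=0):
    """Broadcast the shared operand x; each MAC uses its own y and z."""
    return [m.step(x, y, z, subtract, dest) for m, y, z in zip(macs, ys, zs)]


macs = [Mac() for _ in range(4)]  # four MACs is an assumption for this sketch
ys = [1.0, 2.0, 3.0, 4.0]
zs = [10.0, 20.0, 30.0, 40.0]
sums = apu_cycle(macs, 2.0, ys, zs)                           # XY + Z
diffs = apu_cycle(macs, 2.0, ys, zs, subtract=True, dest=1)   # XY - Z
print(sums)   # [12.0, 24.0, 36.0, 48.0]
print(diffs)  # [-8.0, -16.0, -24.0, -32.0]
```

With a scalar X and vectors for Y and Z, a single call models the "scalar times vector, plus vector" pattern that the shared operand makes possible.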

The status register optionally holds at least a number of wire states. These wire states include at least the status condition bits of the loop counters, the zero-detect signal from the integer arithmetic logic unit of each shared operand circuit 910, and the condition bits of the floating-point conversion unit in the shared output and feedback interface (SOFI) 940 of each arithmetic processing unit 970.

The overall control, synchronization circuitry and instruction store unit (CSIS) is adapted to request instructions (through the main memory interface and controller (MMIC) 6 and the array processor interface circuit 965) and to buffer those instructions in its instruction store.

Wire bundle 903 provides inputs to the local data stores 920. Wire bundle 903 includes at least the addresses for the component memories of each local data store 920, and the read/write control and page-interleaving control signals for each local data store 920.

Each arithmetic processing unit 970 operates on two vectors, as in a "vector added to vector" operation, or on a vector and a scalar, as in a "scalar multiplied by each element of a vector" operation. The arithmetic processing unit 970 is highly versatile because one input operand is shared among several multiplier/accumulators (MACs) and because the other two operands are supplied by a local memory circuit (920) or by the state of a register of another multiplier/accumulator (MAC).

According to the invention, the array processor 9 can efficiently perform computations that other kinds of processor (for example, vector processors) cannot. For example, computing certain discrete wavelet transform filters (DWTFs) often requires sharing several different scalars across several subcomponents of a vector. A vector processor typically needs as many cycles as there are shared scalars to do this. In these discrete wavelet transform filters, every even-numbered input affects all of the filter outputs, whereas the odd-numbered inputs affect only half of the filter outputs.

Processing four inputs in this way exposes a drawback of the vector-processor approach: the odd-numbered items cannot be processed simultaneously. In the array processor 9, however, these problems are minimized because, for example, the array processor can send two (or more) odd-numbered scalar elements to different components of a vector. A radix-2 fast Fourier transform (FFT) presents sums and differences to different components of a vector. By supplying these to the respective components of the vector in parallel, the invention obtains the result in one cycle, rather than the two cycles per pair of elements that a vector processor requires.

FIG. 3 is a block diagram of the array processor 9 of FIG. 1 according to another embodiment of the invention. As shown in FIG. 3, the array processor 9 has two arithmetic processing units 970. This architecture gives applications additional parallelism; floating-point calculation is expected to be especially useful alongside the other computations performed by other modules (for example, the embedded microprocessor).
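The radix-2 sum-and-difference step mentioned above can be illustrated with a toy model that only counts passes over the data. It is a sketch under stated assumptions: the function names, the returned pass counts and the sample values are hypothetical, and the model says nothing about the actual hardware timing beyond the one-cycle versus two-cycle contrast drawn in the text.

```python
def butterfly_two_pass(a, b):
    """Vector-processor style: one whole-vector add, then one whole-vector subtract."""
    sums = [x + y for x, y in zip(a, b)]    # pass 1: every lane adds
    diffs = [x - y for x, y in zip(a, b)]   # pass 2: every lane subtracts
    return sums, diffs, 2                   # 2 = modelled number of passes


def butterfly_one_pass(a, b):
    """Per-lane style: each pair feeds a sum lane and a difference lane together."""
    sums, diffs = [], []
    for x, y in zip(a, b):
        sums.append(x + y)    # lane that receives the sum
        diffs.append(x - y)   # lane that receives the difference, same modelled pass
    return sums, diffs, 1     # 1 = modelled number of passes


a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 1.5, 2.5, 3.5]
two_pass = butterfly_two_pass(a, b)
one_pass = butterfly_one_pass(a, b)
print(two_pass)
print(one_pass)
```

Both functions return the same sums and differences; only the modelled pass count differs, which is the point of the comparison.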
In other embodiments, the number of arithmetic processing units 970 in the basic circuit 1 can be increased beyond two.

The array processor 9 of FIG. 3 is used in a preferred embodiment in which (referring to FIG. 1) the basic circuit 1 includes, in addition to the array processor 9 of FIG. 3 and as required, the global external bus interface (GEBI) 7, the external local memory interface(s) (ELMI) 2, the embedded microprocessor subsystem 8 and the neighboring-circuit communication ports 10, but no additional special-purpose circuit 11 and no memory bank 4. Accordingly, a master interface and controller (MIC) 6 is used, which omits the memory controller of the main memory interface and controller (MMIC) 6.

FIG. 4 is a block diagram of the embedded microprocessor subsystem 8 of FIG. 1. Embedded microprocessors are well known in the art. As shown in FIG. 4, the embedded microprocessor subsystem 8 includes a microprocessor 800. The microprocessor 800 may be a byte-code machine such as a Java or MPEG-4 engine, a programmable finite state machine (PFSM), or a reduced instruction set computer (RISC) engine, preferably a RISC with a 16-bit instruction set and a 32-bit data path, such as the ARM7TDMI (see reference [56]) or MIPS16 (see reference [55]).

The embedded microprocessor subsystem 8 also includes an optional local cache memory 810, preferably 2- to 4-way interleaved, with at least 64 pages and a total capacity of at least 1K words. Typically, the cache memory 810 is organized in words of, for example, 32 bits. The cache memory 810 may comprise two caches, one an instruction cache and the other a data cache.

The embedded microprocessor subsystem 8 also includes additional optional local storage, such as a random access memory 820 or a read-only memory 830, as known in the art. The read-only memory 830 may contain, for example, initialization information for the microprocessor subsystem 8 and/or the basic circuit 1.

In a preferred embodiment, the microprocessor(s) 800 provide two instruction sets: one 16 bits long, containing the instructions most commonly used by compiler-generated code, and the other 32 bits long. Such microprocessors are known in the art. Their advantage is that most generated code will be in the 16-bit format. In each microprocessor 800, each instruction fetch of 32 bits or more can supply the next one to three instructions without further instruction memory accesses.

The architecture, or class of circuits, disclosed here through the basic circuit 1 and the array processor 9 has many notable advantages.

One advantage is that the circuit 1 is highly efficient at executing certain computationally intensive algorithms, compared with the performance of such algorithms on a printed circuit board of similar size that uses microprocessors or conventional digital signal processors.

In general, the basic circuit 1 requires less circuit area than other mechanisms that achieve the same results, and is therefore cheaper than those mechanisms. One factor in the small circuit size of the array processor 9 is that division is not implemented directly; division is in fact uncommon in a large number of the frame rendering and digital signal processing algorithms of interest. In embodiments of the array processor 9 in which the multiplier/accumulators (MACs) are built to support only single-precision arithmetic, which is generally sufficient for most frame rendering and digital signal processing applications, the circuit size of the array processor 9 is reduced substantially further.

The architecture of the array processor 9 provides a "front end" of fixed-point addition and subtraction, followed by conversion to floating point (for example, in the shared operand circuit 910), and then multiplication and accumulation (in the multiplier/accumulators (MACs) 900). This front end can perform the preliminary operations involved in symmetric and antisymmetric FIR filters and in some low-radix fast Fourier transform (FFT) calculations. These operations are very useful in certain digital signal processing applications, where data is usually received in fixed-point form from sampling circuits that typically produce 8 to 16 bits of fixed-point precision.

Another advantage of the array processor 9 concerns the general phenomenon of accumulated rounding error. It is very important that digital signal processing algorithms maintain the precision of arithmetic results. For the sake of efficiency, many early digital signal processors used algorithms that performed fixed-point arithmetic. Such fixed-point arithmetic requires very careful management of rounding error in order to maintain precision.
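The fixed-point front end described above (integer add, conversion to floating point, then multiply-accumulate) maps naturally onto a symmetric FIR filter, where the two samples that share a coefficient are added before the multiply. The following is a minimal sketch under stated assumptions: the function names, the sample data and the coefficients are hypothetical, and Python integers and floats stand in for the fixed-point and floating-point formats. The integer pre-add is exact, so rounding can first enter at the floating-point multiply-accumulate.

```python
def symmetric_fir(samples, half_coeffs):
    """Symmetric FIR with N = 2 * len(half_coeffs) taps, using an integer pre-add."""
    n_taps = 2 * len(half_coeffs)
    out = []
    for n in range(n_taps - 1, len(samples)):
        acc = 0.0
        for k, c in enumerate(half_coeffs):
            # Front end: exact integer add of the two samples sharing coefficient c.
            pre = samples[n - k] + samples[n - (n_taps - 1 - k)]
            # Convert to floating point, then multiply-accumulate (XY + Z form).
            acc = float(pre) * c + acc
        out.append(acc)
    return out


def direct_fir(samples, coeffs):
    """Reference form: one multiply per tap, no pre-add."""
    n_taps = len(coeffs)
    return [
        sum(coeffs[k] * samples[n - k] for k in range(n_taps))
        for n in range(n_taps - 1, len(samples))
    ]


samples = [1, 2, 3, 4, 5, 6]   # e.g. integer output of a sampling circuit
half = [0.5, 0.25]             # first half of a symmetric 4-tap filter
folded = symmetric_fir(samples, half)
reference = direct_fir(samples, half + half[::-1])
print(folded)      # [3.75, 5.25, 6.75]
print(reference)   # [3.75, 5.25, 6.75]
```

The folded form needs half as many multiplies as the direct form, which is why a cheap adder ahead of the multipliers is useful for this class of filter.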
By contrast with such fixed-point designs, the array processor 9 can perform floating-point calculations efficiently. Floating-point calculation has the advantage of a larger dynamic range, and it is easier to program for results of the same precision.

It is instructive to analyze the performance of one specific embodiment of the array processor 9. In this exemplary embodiment, the array processor 9 (according to FIG. 3) receives at least 128 bits from the main memory interface and controller (MMIC) 6 at its array processor interface circuit 965. The shared operand circuit 910 of each arithmetic processing unit 970 can receive at least 64 bits (preferably 128 bits) at a time. The shared operand circuit 910 can split the received data into fixed 8-bit or 16-bit integer fields, and can add or subtract up to four fields of data simultaneously. The architecture of this exemplary embodiment achieves very high performance when computing radix-4 fast Fourier transforms (FFTs), among other calculations.

In the exemplary embodiment, the array processor subsystem 9 has an internal clock rate of 200 megahertz (MHz). The following operations can be performed in each clock cycle: four single-precision floating-point multiplications and four single-precision floating-point additions; one integer/fixed-point addition or subtraction; an integer/fixed-point to floating-point conversion; and a floating-point to fixed-point conversion. In the exemplary embodiment, the local data stores 920 are two-way interleaved, so that data received through accesses via the MMIC 6 can be loaded while the Y and Z operands are fetched for the corresponding multiplier/accumulators (MACs) 900.

In the exemplary embodiment, the global external bus interface (GEBI) 7, the MMIC 6, the embedded microprocessor subsystem 8, the external local memory interface (ELMI) 2 and the array processor interface circuit 965 of the array processor 9 implement, using standard techniques, a communication scheme that allows the array processor 9 to remain fully active for 75% of the time while executing the anticipated frame rendering and digital signal processing applications. This performance assumption is reasonable given the locality of data with respect to the basic circuit 1 and its internal array processor 9.

Given the performance of the exemplary embodiment, this assumption implies a computation rate of one to three gigaflops (billions of floating-point operations per second) for many digital signal processing algorithms and certain frame rendering algorithms.

Assuming the basic circuit 1 is implemented in a standard integrated circuit (IC) package such as a 176-pin TQFP (thin quad flat pack), each integrated circuit occupies about one square inch of printed circuit board area. An embodiment of the invention can therefore be implemented as a Peripheral Component Interconnect (PCI) bus card holding up to 32 such integrated circuits, delivering roughly 16 times the performance of a Silicon Graphics workstation (see the Toy Story article, reference [1]) at far lower cost than that workstation.
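The throughput figures above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the text does not say whether the four multiplications and four additions per cycle are counted per arithmetic processing unit or for the whole array processor, so the number of units is left as a parameter and both readings are shown; the function name `flop_rates` is hypothetical.

```python
def flop_rates(clock_hz, mults_per_cycle, adds_per_cycle, units, utilization):
    """Return (peak, sustained) rates in gigaflops for a simple peak-rate model."""
    peak = clock_hz * (mults_per_cycle + adds_per_cycle) * units
    return peak / 1e9, peak * utilization / 1e9


# 200 MHz clock, four multiplies and four adds per cycle, 75% fully active.
rates = {units: flop_rates(200e6, 4, 4, units, 0.75) for units in (1, 2)}
for units, (peak, sustained) in rates.items():
    print(f"{units} unit(s): peak {peak:.1f} GFLOPS, sustained {sustained:.1f} GFLOPS")
```

Under either reading the sustained figure falls between one and three gigaflops (1.2 with one unit counted, 2.4 with two), which is consistent with the range stated in the text.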
The invention is particularly suited to applications involving Java (see references [64] to [65]) and interactive audio-visual languages such as MPEG-4 (see references [57] to [63]). The MPEG-4 standardization effort is developing an environment in which a wide variety of algorithms can execute on whatever model of hardware the client uses. Java is by definition essentially machine-independent, and it has the semantic capability to create and invoke very complex and computationally expensive algorithms. The invention provides high-performance digital signal processing in support of network appliances.

Supplementary description of certain embodiments:

Certain embodiments of the invention can be summarized here in the form of an integrated circuit that includes:

a wide interface circuit to a large memory, where the large memory may be located inside or outside the integrated circuit, the wide interface circuit being connected to the large memory by a bus B0 that contains a wire bundle D0, wherein:
bidirectional transceivers in the wide interface circuit connect to wire bundle D0 of bus B0 and provide data transfer between the wide interface circuit and the large memory;

one or more data processing circuits, each including:
a programmable finite state machine (PFSM) or a microprocessor; and
a data processor interface circuit connected to the wide interface circuit, wherein the data processing circuit is connected to the wide interface circuit by a bus B1, and both circuits contain:
bidirectional transceivers connected to sub-bundle D1 of bus B1; and
local memory circuitry able to store the states driven onto and/or sensed from sub-bundle D1 of bus B1, so that the state information is retained after the state may no longer be driven and the state of sub-bundle D1 of bus B1 can therefore be read;
wherein the signal states of wire bundle D1 of bus B1 include some or all of the signal states of D0, conveyed forward or backward in time, that is, a mechanism by which data can be sent to, or received from, wire bundle D0 of bus B0;

one or more array processors, each including an array processor interface circuit connected to the wide interface circuit, a shared operand input circuit, a plurality of multiplier/accumulators, a shared multiplier output circuit and an instruction decoder circuit, wherein:
the connection of the array processor interface circuit to the wide interface circuit includes a bus B2, and the array processor interface circuit and the wide interface circuit contain:
bidirectional transceivers connected to sub-bundle D2 of bus B2; and
local memory circuitry able to store the states driven onto and/or sensed from sub-bundle D2 of bus B2, so that the state information is retained after the state may no longer be driven and the state of sub-bundle D2 of bus B2 can therefore be read;
the signal states of wire bundle D2 of bus B2 include some or all of the signal states of D0, conveyed forward or backward in time, that is, a mechanism by which data can be sent to, or received from, wire bundle D0 of bus B0;
the shared operand input circuit is connected to the array processor interface circuit by a bus B3 and is additionally connected to the shared multiplier output circuit and to the local data memory circuits, the local memory circuits communicating with each multiplier/accumulator over a bus B4, wherein:
the shared operand input circuit receives some or all of the state information of the wire bundle;
each multiplier/accumulator contains a multiply-add mechanism plus two mutually independent addressable random access memories, wherein:
the multiply-add mechanisms share one multiplier input, and the other two inputs are supplied from the two mutually independent addressable random access memories, which may or may not be read-only memories;
the address and control lines of each random access memory are part of a control wire bundle (CWB);
the multiply-add mechanism has an output driver connected to a wire bundle M0, which is shared by the multiplier/accumulators and the multiplier output circuit; and
each output driver is controlled by a sub-bundle of the control wire bundle (CWB); and
the multiplier output circuit contains an M0 wire bundle interface, a multiplier output memory circuit and a multiplier output interface, wherein:
the M0 wire bundle interface contains input circuits connected to wire bundle M0;
the multiplier output circuit is connected to the input circuits of the M0 wire bundle interface circuit in such a way as to supply the data stored in the memory circuit; and
the memory output circuit is connected to the multiplier output interface, so that the state of the memory circuit can be output from the memory output interface onto a sub-bundle of bus B3 or onto the whole of bus B3.

In one embodiment, bus B0 additionally includes sub-bundles D0A and D0C, wherein:
output drivers in the wide interface circuit connect to sub-bundle D0A of bus B0 and provide the address signals needed to access data in the large memory; and
output drivers in the wide interface circuit connect to sub-bundle D0C of bus B0 and provide the control signals needed to access data in the large memory.

In an alternative embodiment, bus B0 additionally includes a sub-bundle D0F, wherein:
input drivers in the wide interface circuit connect to sub-bundle D0F of bus B0 and provide feedback signals that convey the status of data accesses to the large memory.

In one embodiment, at least one of the communication interfaces to external circuitry provides a communication interface to a bus on which the circuit acts as a bus master.

In one embodiment, at least one of the communication interfaces to external circuitry provides a communication interface to a bus on which the circuit acts as a bus slave.

FIGS. 5 to 12 illustrate use of the apparatus of the invention to provide near-linear speedup of video frame rendering and other largely parallel computations. As the level of integration advances, there is significant benefit in consolidating external memory interface communication paths into a shared external memory interface. Each external memory interface adds many pins to an integrated circuit, so reducing their number can save substantially on product cost. This affects not only die size but also heat dissipation and power consumption. Sharing an external memory interface in this way optimizes communication bandwidth and reduces manufacturing cost.

FIG. 5 is a block diagram of an integrated circuit with two instances of the basic circuit of FIG. 1 and independent external memory interfaces.

FIG. 6 is a block diagram of an integrated circuit with two instances of the basic circuit of FIG. 1 sharing one external memory interface.

FIG. 7 is a block diagram of an integrated circuit with four instances of the basic circuit of FIG. 1 and independent external memory interfaces.

FIG. 8 is a block diagram of an integrated circuit with four instances of the basic circuit of FIG. 1 sharing one external memory interface.

FIG. 9 is a block diagram of an integrated circuit with four instances of the basic circuit of FIG. 1 sharing two external memory interfaces.

FIG. 10 is a block diagram of an integrated circuit with four instances of the basic circuit of FIG. 1, two shared external memory interfaces and fully interconnected message ports.

FIG. 11 is a block diagram of an integrated circuit with sixteen instances of the basic circuit of FIG. 1 and four shared external memory interfaces.

FIG. 12 is a block diagram of an integrated circuit with sixteen instances of the basic circuit of FIG. 1 and two shared memory interfaces.

FIG. 13 is a block diagram of a printed circuit board, showing instances of the basic circuit of FIG. 1 and their corresponding memory modules integrated on one board.

FIG. 14 is a block diagram of another printed circuit board, showing instances of the basic circuit of FIG. 1 and their corresponding memory modules integrated on one board.

References:

1. Ubois, Jeff, "Sun Goes Hollywood: 117 SPARCstations Render 'Toy Story', the First Feature-Length Computer-Animated Film", SunWorld Online, November 1995, a World Wide Web electronic magazine, at http://www.sun.com/sunworldonline/swol-11-1995/swol-11-pixar.html.

2. Fuchs, Henry, U.S. Patent No. 4,950,465, "Graphics display system using logic-enhanced pixel memory cells", filed January 18, 1982, granted May 20, 1986.

3. Andrews, David H., et al., U.S. Patent No. 4,646,075, "System and method for a data processing pipeline", filed November 3, 1983, granted February 24, 1987.

4. Littlefield, Richard, U.S. Patent No. 4,949,280, "Parallel processor-based raster graphics system architecture", filed May 10, 1988, granted August 14, 1990.

5. Hedley, David, et al., U.S. Patent No. 4,953,107, "Video signal processing", filed August 28, 1989, granted August 28, 1990.

6. Westberg, Thomas, et al., U.S. Patent No. 5,101,365, "Apparatus for extending windows using Z buffer memory", filed November 7, 1990, granted March 31, 1992.

7. Cawley, Robin, U.S. Patent No. 5,103,217, "Electronic image processing", filed November 29, 1988, granted April 7, 1992.

8. Liang, Bob, et al., U.S. Patent No. 5,182,797, "Multi-processor graphics display system for displaying hierarchical data structures", filed May 27, 1992, granted January 26, 1993.


Brief description of the reference numerals of the main parts:

1 basic circuit
2 controller
3, 5, 12, 13, 14, 15, 16, 17 wire bundles
4 internal local memory
6 main memory interface and controller (MMIC)
7 global external bus interface (GEBI)
8 embedded microprocessor subsystem
9 digital signal processor (array processor)
10 neighboring-circuit communication ports
11 special-purpose circuitry
800 microprocessor
810 cache memory
820 random access memory
830 read-only memory
900 data path processing unit
901 bus
902, 903, 906, 907, 909 wire bundles
910 shared operand circuit (SOC)
920 local data store
940 shared output and feedback interface (SOFI)
950 module
951 synchronization circuit and instruction storage unit
965 array processor interface circuit (APIC)
970 group

Claims (1)

1. An integrated circuit that operates together with a memory during operation, the integrated circuit comprising:
an interface circuit configured to control access to the memory, the interface circuit being coupled to the memory;
an embedded microprocessor configured to control the integrated circuit, the embedded microprocessor being coupled to the interface circuit to receive information from the interface circuit; and
an array processor for performing arithmetic calculations, the array processor being coupled to the interface circuit to receive information from the interface circuit.

2. The integrated circuit of claim 1, wherein the array processor comprises:
a plurality of multiplier/accumulators; and
a shared operand circuit coupled to, and providing a shared operand to, at least two of the plurality of multiplier/accumulators.

3. The integrated circuit of claim 1, wherein the interface circuit includes a wire bundle for providing wide access.

4. The integrated circuit of claim 3, wherein the wire bundle comprises at least 256 wires.
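The shared-operand arrangement recited in claim 2 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed circuit: the names `MacUnit`, `SharedOperandArray`, and `matvec` are hypothetical, and using the array for a matrix-vector product is one assumed use that the claims do not recite. Each cycle, one shared operand is broadcast to every multiplier/accumulator, and each unit multiplies it by its own local operand.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class MacUnit:
    """Hypothetical multiplier/accumulator: acc += shared * local."""
    acc: float = 0.0

    def step(self, shared: float, local: float) -> None:
        self.acc += shared * local


@dataclass
class SharedOperandArray:
    """Hypothetical model of MAC units that all receive the same shared operand."""
    n_units: int
    units: List[MacUnit] = field(init=False)

    def __post_init__(self) -> None:
        # Claim 2 recites at least two multiplier/accumulators sharing an operand.
        if self.n_units < 2:
            raise ValueError("need at least two multiplier/accumulators")
        self.units = [MacUnit() for _ in range(self.n_units)]

    def cycle(self, shared: float, local_operands: Sequence[float]) -> None:
        """Broadcast one shared operand; each unit pairs it with its own local operand."""
        if len(local_operands) != self.n_units:
            raise ValueError("one local operand per unit is required")
        for unit, local in zip(self.units, local_operands):
            unit.step(shared, local)

    def results(self) -> List[float]:
        return [unit.acc for unit in self.units]


def matvec(matrix: Sequence[Sequence[float]], vector: Sequence[float]) -> List[float]:
    """Compute matrix @ vector: vector[j] is the shared operand in cycle j,
    and column j of the matrix supplies the local operands."""
    array = SharedOperandArray(len(matrix))
    for j, x_j in enumerate(vector):
        array.cycle(x_j, [row[j] for row in matrix])
    return array.results()
```

In this model a matrix with N rows occupies N accumulators, and a vector of length K takes K broadcast cycles, so only one shared value has to be distributed per cycle regardless of N.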
TW87109751A 1997-06-20 1998-06-18 Video frame rendering engine TW484110B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US5039697P 1997-06-20 1997-06-20
US08/993,442 US6854003B2 (en) 1996-12-19 1997-12-18 Video frame rendering engine

Publications (1)

Publication Number Publication Date
TW484110B (en) 2002-04-21

Family

ID=26728224

Family Applications (1)

Application Number Title Priority Date Filing Date
TW87109751A TW484110B (en) 1997-06-20 1998-06-18 Video frame rendering engine

Country Status (1)

Country Link
TW (1) TW484110B (en)

Similar Documents

Publication Publication Date Title
US6854003B2 (en) Video frame rendering engine
US6721773B2 (en) Single precision array processor
TW588281B (en) Processing multiply-accumulate operations in a single cycle
JP2022106737A (en) Performing matrix multiplication in hardware
US5613146A (en) Reconfigurable SIMD/MIMD processor using switch matrix to allow access to a parameter memory by any of the plurality of processors
EP0282825B1 (en) Digital signal processor
RU2263947C2 (en) Integer-valued high order multiplication with truncation and shift in architecture with one commands flow and multiple data flows
US5933624A (en) Synchronized MIMD multi-processing system and method inhibiting instruction fetch at other processors while one processor services an interrupt
EP2003548B1 (en) Resource management in multi-processor system
US6192384B1 (en) System and method for performing compound vector operations
CN108874744A (en) Generalized acceleration of matrix multiply-accumulate operations
US4689738A (en) Integrated and programmable processor for word-wise digital signal processing
de Kock Multiprocessor mapping of process networks: a JPEG decoding case study
JPH10116268A (en) Single-instruction multiple-data processing using multiple banks of vector registers
US6037947A (en) Graphics accelerator with shift count generation for handling potential fixed-point numeric overflows
CN114118354A (en) Efficient SOFTMAX computation
CN101093577A (en) Picture processing engine and picture processing system
Dehyadegari et al. Architecture support for tightly-coupled multi-core clusters with shared-memory HW accelerators
CN101751356B (en) Method, system and apparatus for improving direct memory access transfer efficiency
CN113743573A (en) Techniques for accessing and utilizing compressed data and state information thereof
Stepchenkov et al. Recurrent data-flow architecture: features and realization problems
TW484110B (en) Video frame rendering engine
US20230359496A1 (en) Stack access throttling for synchronous ray tracing
CN116257208A (en) Method and apparatus for separable convolution filter operation on matrix multiplication array
Niederhagen Parallel cryptanalysis

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees