TWI234738B - Re-configurable streaming vector processor - Google Patents
Re-configurable streaming vector processor
- Publication number
- TWI234738B
- Authority
- TW
- Taiwan
- Prior art keywords
- data
- reconfigurable
- vector processor
- memory
- flow vector
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/355—Indexed addressing
- G06F9/3552—Indexed addressing using wraparound, e.g. modulo or circular addressing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8053—Vector processors
- G06F15/8061—Details on data memory access
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/345—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/345—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results
- G06F9/3455—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results using stride
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
- G06F9/383—Operand prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3877—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3877—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
- G06F9/3879—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor for non-native instruction execution, e.g. executing a command; for Java instruction set
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Advance Control (AREA)
- Complex Calculations (AREA)
Description
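I234738

IX. Description of the Invention:

[Technical Field of the Invention]

This patent application is related to the following co-pending patent applications, each filed on its respective filing date and incorporated herein by reference:
- "Interconnection Device with Integrated Storage" (Attorney Docket No. CML00101D)
- "Memory Interface with Fractional Addressing" (Attorney Docket No. CML00102D)
- "Scheduler for a Streaming Vector Processor" (Attorney Docket No. CML00108D)
- "Method of Programming Linear Graphs for Streaming Vector Computation" (Attorney Docket No. CML00109D)

The present invention relates generally to the field of computer processors. More specifically, it relates to a re-configurable streaming vector processor.

[Prior Art]

Many new applications planned for mobile devices (multimedia, graphics, image compression/decompression, etc.) involve a large amount of streaming vector computation. The computation rates these applications require often exceed what the best general-purpose CPUs can deliver. It is therefore desirable to find ways to improve the performance of the existing computation engines in these devices to meet the computational needs of these new applications.

At the same time, the standards for these new applications, and the best algorithms for complying with them, are constantly changing. This calls for solutions that are programmable and easy to program. Time-to-market pressure is also increasing. One way to address this is to increase the reuse of prior investments in software and hardware. Programmability greatly facilitates reusing hardware across multiple products. Software reuse is promoted by using a unified programming model across multiple implementations of the device, which preserves binary compatibility.

Hardware accelerators have been tried to meet this need. However, these approaches do not solve the problem because they can be reprogrammed only to a limited extent. Fixed-function hardware accelerators can change only the parameters of the functions they perform. They cannot change the type or ordering of those functions.

Programmable solutions exist in vector processors, digital signal processors, SIMD processors, and VLIW processors. These solutions fail to solve the problem because the limitations of their programming models make them difficult to program. Those limitations also make it difficult to maintain a unified programming model across hardware generations. The limitations include:
- programmer visibility of the data-path pipeline;
- memory width and latency;
- data alignment in memory;
- explicit resource dependencies.

[Summary of the Invention]

While this invention is susceptible of embodiment in many different forms, one or more specific embodiments are shown in the drawings and will be described in detail herein. It is to be understood that the present disclosure is to be considered as exemplary of the principles of the invention and is not intended to limit the invention to the specific embodiments shown and described. In the description below, like reference numerals are used to describe the same, similar, or corresponding parts in the several views of the drawings.

The Re-configurable Streaming Vector Processor (RSVP) of the present invention is a coprocessor that performs vector operations, that is, a set of identical operations on a sequence of data elements. It is intended to improve the performance of an embedded general-purpose processor (the host processor) by performing high-speed vector operations. In the RSVP programming model, the specification of a vector operation is split into two parts: access and computation.

[Detailed Description]

In one embodiment of the invention, a re-configurable streaming vector processor includes a number of functional units, a re-configurable interconnection switch, and a micro-sequencer. Each functional unit has one or more inputs for receiving data values and an output for providing a data value. The re-configurable interconnection switch includes one or more links. Each link is operable, as directed by the micro-sequencer, to couple the output of a functional unit to an input of a functional unit.

The vector processor also includes one or more input stream units for retrieving data from memory. Each input stream unit is controlled by a host processor and has a defined interface to the host processor. The vector processor further includes one or more output stream units for writing data to memory. These also have a defined interface to the host processor.

In a further embodiment, the re-configurable interconnection switch includes a memory for storing intermediate data values.

In the preferred embodiment, the defined interface of the input stream unit forms the first part of the programming model. The second part consists of instructions stored in memory that direct the sequence of configurations of the re-configurable interconnection switch.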
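The following C sketch models the switch and the micro-sequencer in software, to show how they divide the work. It is a hypothetical illustration, not the patent's hardware. The following are all assumptions:
- the two-unit functional-unit set (adder and multiplier);
- the source numbering;
- the `control_word` layout;
- the one-register latency per unit.

On each cycle the micro-sequencer issues a control word. For every functional-unit input, the control word selects which source the corresponding switch link couples to that input: the input stream, a constant, or another unit's registered output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sources a switch link can couple to a functional-unit (FU) input. */
enum { SRC_STREAM = 0, SRC_CONST = 1, SRC_FU_BASE = 2 };
/* Hypothetical FU set for this sketch: one adder and one multiplier. */
enum { FU_ADD = 0, FU_MUL = 1, NUM_FU = 2 };

/* One control word: for each FU input, the source its link is coupled to. */
typedef struct {
    uint8_t link[NUM_FU][2];
} control_word;

/* The registered FU outputs are the only state in this model. */
typedef struct {
    int32_t fu_out[NUM_FU];
} datapath;

static int32_t route(const datapath *dp, uint8_t src, int32_t stream_in, int32_t konst)
{
    if (src == SRC_STREAM) return stream_in;
    if (src == SRC_CONST) return konst;
    return dp->fu_out[src - SRC_FU_BASE];   /* output register of another FU */
}

/* One clock cycle: every FU reads its routed inputs, then all outputs latch at once. */
static void clock_cycle(datapath *dp, const control_word *cw, int32_t stream_in, int32_t konst)
{
    int32_t next[NUM_FU];
    for (int fu = 0; fu < NUM_FU; fu++) {
        int32_t a = route(dp, cw->link[fu][0], stream_in, konst);
        int32_t b = route(dp, cw->link[fu][1], stream_in, konst);
        next[fu] = (fu == FU_ADD) ? a + b : a * b;
    }
    for (int fu = 0; fu < NUM_FU; fu++)
        dp->fu_out[fu] = next[fu];
}

/* Micro-sequencer: issues program[cycle % len] each cycle. It collects the adder
   output once the two-stage pipeline (multiplier, then adder) has filled. */
static void run(const control_word *program, size_t len,
                const int32_t *in, int32_t *out, size_t n, int32_t konst)
{
    assert(len > 0);
    datapath dp = {{0, 0}};
    for (size_t cycle = 0; cycle < n + 1; cycle++) {
        int32_t x = (cycle < n) ? in[cycle] : 0;   /* feed zeros while draining */
        clock_cycle(&dp, &program[cycle % len], x, konst);
        if (cycle >= 1)
            out[cycle - 1] = dp.fu_out[FU_ADD];
    }
}
```

As a usage example, consider a one-word program:
- the multiplier's inputs are linked to the input stream and the constant;
- the adder's inputs are linked to the multiplier's output and the constant.

With constant 3, the stream {1, 2, 3, 4} produces {6, 9, 12, 15} after one cycle of pipeline fill. Only the control words change between computations, not the units themselves, which mirrors the patent's point that the data path is reconfigured by instructions.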
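FIG. 1 shows an exemplary embodiment of RSVP hardware 100. Referring to FIG. 1, the outputs and inputs of a number of functional units 102 are interconnected by a re-configurable interconnection switch 104. The functional units may include a multiplier 106, an adder 108, a logic unit 110, and a shifter 112. Other functional units, and multiple functional units of a particular type, may also be included. The output of a functional unit may be a single register or a pipeline register.

A functional unit may support multiple independent operations. For example, the multiplier may have a 128-bit input and a 128-bit output. It may then perform two 32x32-to-64-bit multiplications, four 16x16-to-32-bit multiplications, or any combination of multiplications whose total does not exceed 128 bits of input or 128 bits of output.

The hardware also includes one or more accumulators 114. In the preferred embodiment, the accumulators act as both accumulators and storage registers. They are connected to the interconnection switch 104 and to an external interface 116. The external interface 116 connects the RSVP to a host processor and allows the host processor to access the accumulators and other parts of the RSVP.

The functional units 102 and the re-configurable interconnection switch 104 define the data path of the RSVP. They are linked to a micro-sequencer 118, which includes a memory 120 (preferably a cache). The memory stores a program of instructions that specifies an implementation of the data-flow graph of the desired vector computation. On each cycle of the processor clock, the micro-sequencer generates control words that configure the links in the interconnection switch and drive the functional units.

Constant unit 120 provides storage and presentation of scalar values, as well as tunnel-node functionality. The scalar values and tunnel initialization values may be loaded by the host processor or by the program of instructions.

In operation, input data values are provided to the interconnection switch 104 by one or more input stream units 122 (only one is shown). Each input stream unit 122 is controlled by a set of parameters describing how the data are allocated in memory. This set of parameters is provided by the host processor, to which the input stream unit is connected via the external interface 116. Similarly, each output stream unit 124 (only one is shown) is controlled by the host processor. It is operable to transfer data from the re-configurable interconnection switch 104 to external memory. The input stream units 122 and output stream units 124 are connected to the micro-sequencer 118, which synchronizes the data flow.

The architecture of the exemplary RSVP facilitates separating vector access from vector computation.

Vector access is performed by the input or output stream units. It consists of describing the location, shape, and type of each input and output vector that is part of the vector operation. In the preferred embodiment, these properties are described by two or more of the following parameters:
1. Vector address: the starting address in memory of the next vector element.
2. Stride: the signed increment from one element to the next.
3. Span: the number of strides before a skip.
4. Skip: the signed increment applied after a span of elements has been counted.
5. Size: the size of each data element (e.g., 1, 2, 4, or 8 bytes).

In addition to the input and output vectors, the vector processor includes a number of scalar and accumulator registers. The programmer may specify their initial values. These registers are used in the vector computation. The accumulator values may change during the computation and may be kept for later access. The vector access part of the programming model is described in the programming language used by the host processor and is executed on the host.

Vector computation consists of a partially ordered set of operators that is applied to the elements of the input vectors, to the scalar registers, and to the accumulators, and that produces each element of the output vectors. In the programming model of the present invention, vector computation is specified by a linear representation of the data-flow graph of the computation. In this representation, each node of the graph is expressed as a node descriptor. The descriptor specifies the operation to be performed and the node or nodes from which the operation obtains its input data. Unlike in other CPUs, there are no explicitly named registers for passing data between operators.

A C-language description of an example computation follows:

```c
/* Quantize n coefficients from in[] into out[] with quantizer step qp. */
void quant(short *out, short *in, int n, short qp)
{
    long rq, b, c;

    rq = ((1 << 16) + qp) / (qp << 1);  /* rounded 2^16 / (2 * qp): fixed-point reciprocal */
    b = qp - !(qp & 1);                 /* qp if qp is odd, qp - 1 if qp is even */
    while (n-- > 0) {                   /* process all n elements */
        c = *in++;
        if (c < 0)
            c += b;
        else if (c > 0)
            c -= b;
        *out++ = (c * rq) / (1 << 16);  /* scale by rq, then divide by 2^16 */
    }
}
```

FIG. 2 shows the corresponding data-flow graph. Referring to FIG. 2:
- At block 202, a vector v1 is loaded.
- At node 204, the sign of the vector is obtained.
- At blocks 206 and 208, scalar values s2 and s1 are loaded, respectively.
- At block 210, the immediate shift value 16 is loaded.
- At node 212, the sign of vector v1 is multiplied by scalar s2.
- At node 214, the result of that multiplication is subtracted from v1.
- At node 216, the result of the subtraction is multiplied by scalar s1.
- At node 218, that product is shifted right by 16.
- At block 220, the vector result is stored as v0.

The linear form of the data-flow graph is:

```
Q1: vld.s16 (v1)        // c = *in++;
Q2: vsign.s16 Q1
Q3: vscalar s2          // s2 is b
Q4: vscalar s1          // s1 is rq
Q5: vimm 16
Q6: vmul.s16 Q2,Q3      // if (c < 0) c += b;
Q7: vsub.s16 Q1,Q6      // else if (c > 0) c -= b;
Q8: vmul.s32 Q7,Q4      // c *= rq;
Q9: vasr0.s16 Q8,Q5     // *out++ = c / (1 << 16);
```

The function operations used in this exemplary linear form are:
- vld.s16: load the next data element from a vector of 16-bit data values
- vsign.s16: compute the sign of a data value
- vscalar: load a scalar value
- vmul.s16: multiply two 16-bit data values
- vmul.s32: multiply two 32-bit data values
- vsub.s16: subtract two 16-bit data values
- vasr0.s16: arithmetic right shift of a 16-bit data value

The functional units preferably implement various other function operations, including vector element addition (vadd) and accumulation (vadda). Before execution, the linear graph is scheduled onto the RSVP data path. The data path can be reconfigured on every clock cycle. The functional units can be aggregated, that is, subsets of them can be combined to form larger functional units. The functional-unit interconnect allows the functional units to be pipelined arbitrarily.

The programming model of the RSVP supports both high performance and fast time to market. Because the RSVP is a coprocessor, it preferably uses a single-core programming model. Dual-core solutions, such as those used in general-purpose CPU/DSP combinations, are harder to program. With such solutions, the programmer must use two different sets of programming tools and must explicitly handle synchronization between the CPU and the DSP.

Within the programming model of the present invention, the vector access description is separate from the vector computation description. The programmer therefore does not have to deal with the two problems intertwined. Vector access is described by only five parameters, so the programmer avoids dealing with data alignment and padding, memory bus width, or memory latency. The underlying vector access hardware handles these issues. As a result, the vector access description stays the same however the memory subsystem or the vector access hardware is implemented. This simplifies the programmer's work and also promotes binary code compatibility, because RSVP binary code need not be modified to reflect changes in those implementations.

The vector computation description, in data-flow graph form, contains no information specific to the implementation of the RSVP data path. Apart from the use of accumulators and vector stream units (VSUs), the RSVP has no resource dependencies. In particular, there are no explicitly named registers for passing data between operators. This removes a burden from the scheduler and makes it easier for the scheduler to achieve an optimal schedule. Consequently, the data path can be changed from a scalar data path to a superscalar, VLIW, or SIMD-style data path. The change is transparent to the programmer and requires no change to the RSVP binary code.

FIGS. 3 and 4 illustrate how the programming model separates the vector access description from the vector computation description.

FIG. 3 is a flow chart of a method of generating code for the RSVP:
- After start block 302, a data-flow graph of the computation is specified at block 304.
- At block 306, a linear graph of the computation is generated from the data-flow graph. The linear graph may be generated manually or automatically by a computer program. In one embodiment, the computer program provides a graphical user interface that makes it easier for the user to enter the data-flow graph.
- At block 308, the linear graph generated at block 306 is provided to a scheduler. The scheduler is a computer program that orders the function operations so that RSVP resources are used efficiently.
- Once scheduling is complete, RSVP binary code is generated at block 310.
- The process terminates at block 312.

This process does not consider vector access issues such as data alignment and padding, memory bus width, or memory latency. All of these issues are handled by the hardware.

Data access is specified by the host processor. FIG. 4 shows the programming process for the host processor:
- After start block 402, the data structures in memory are specified at block 404.
- At block 406, the associated data access parameters (starting address, stride, span, skip, and size) are specified. In operation, these parameters are passed to the input stream units of the RSVP.
- The remainder of the host processor code is generated at block 408.
- The process terminates at block 410.

Thus, the host processor specifies vector access independently of the vector computation.

The RSVP hardware exploits several aspects of the programming model to improve performance. Because vector access is separated from computation, the access hardware and the computation hardware operate asynchronously with respect to each other. The vector access hardware can therefore run ahead of the computation and fetch data before it is needed, hiding at least part of the memory latency.

The vector access description is compact enough that all of its information fits in a few registers in the access hardware, and the host processor can access these registers. Likewise, the RSVP data-flow graph is limited to a fixed number of nodes (for example, 256 nodes), so the RSVP micro-sequencer memory can hold the entire linear form of the data-flow graph. The advantage of this approach is that the hardware does not need to fetch instructions to decide how to perform address calculations or vector computations. As a result, instruction fetches do not consume memory bandwidth that is needed to service RSVP computations.

Because vector computation is specified as a data-flow graph with few resource dependencies, the RSVP data path differs from the data paths of other CPUs. Most DSP, SIMD, VLIW, and vector processor devices cannot connect the functional units of their data paths in arbitrary order. None of them has aggregatable functional units.

Those skilled in the art will recognize that the present invention has been described in terms of exemplary embodiments based on a particular architecture. However, the invention should not be so limited, since it may be implemented using equivalent structures. Those skilled in the art will further understand that various changes in form and detail may be made without departing from the spirit and scope of the invention.

While the invention has been described in conjunction with specific embodiments, it is evident that many alternatives, modifications, permutations, and variations will become apparent to those skilled in the art in light of the foregoing description. Accordingly, the present invention is intended to embrace all such alternatives, modifications, and variations as fall within the scope of the appended claims.

[Brief Description of the Drawings]

The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, its preferred mode of use, and its further objects and advantages are best understood by reference to the detailed description of the illustrative embodiments above, read in conjunction with the accompanying drawings, wherein:

FIG. 1 is a diagrammatic representation of an embodiment of a re-configurable streaming vector processor of the present invention.
FIG. 2 is a data-flow graph of an exemplary iterative computation.
FIG. 3 is a flow chart of a process for programming the vector processor of the present invention.
FIG. 4 is a flow chart of a process for programming a host processor to operate with the vector processor of the present invention.

[List of Reference Numerals]

Reference numeral | Element |
---|---|
100 | Re-configurable streaming vector processor |
102 | Functional unit |
104 | Re-configurable interconnection switch |
106 | Multiplier |
108 | Adder |
110 | Logic unit |
112 | Shifter |
114 | Accumulator |
116 | External interface |
118 | Micro-sequencer |
120 | Memory; constant unit |
122 | Input stream unit |
124 | Output stream unit |
202 | Block |
204 | Node |
206 | Block |
208 | Block |
210 | Block |
212 | Node |
214 | Node |
216 | Node |
218 | Node |
220 | Block |
302 | Start block |
304 | Block |
306 | Block |
308 | Block |
310 | Block |
312 | Block |
402 | Start block |
404 | Block |
406 | Block |
408 | Block |
410 | Block |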
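The C sketch below interprets the linear form of the FIG. 2 quantization example in software. It shows how each node descriptor refers to earlier nodes by index instead of to named registers. The following are assumptions made for illustration:
- the `opcode` enum and the `node` struct;
- the 16-node limit of this toy interpreter;
- the convention that the last node's value is written to the output stream;
- 64-bit arithmetic throughout, with the .s16/.s32 width suffixes ignored.

The arithmetic right shift (vasr0) is modelled as floor division by 2^16. C's `/` operator truncates toward zero instead, so for negative products the graph and the C `quant` function can differ by one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of the opcodes used in the quant linear form. */
typedef enum { OP_VLD, OP_VSIGN, OP_VSCALAR, OP_VIMM, OP_VMUL, OP_VSUB, OP_VASR } opcode;

/* Node descriptor: operation plus the indices of the earlier nodes that feed it.
   For OP_VSCALAR, 'a' indexes the scalar registers; for OP_VIMM, 'a' is the immediate. */
typedef struct { opcode op; int a, b; } node;

/* Q1..Q9 from the linear form, as 0-based node indices 0..8. */
static const node quant_graph[9] = {
    {OP_VLD, 0, 0},      /* Q1: vld.s16 (v1)      */
    {OP_VSIGN, 0, 0},    /* Q2: vsign.s16 Q1      */
    {OP_VSCALAR, 0, 0},  /* Q3: vscalar s2 (b)    */
    {OP_VSCALAR, 1, 0},  /* Q4: vscalar s1 (rq)   */
    {OP_VIMM, 16, 0},    /* Q5: vimm 16           */
    {OP_VMUL, 1, 2},     /* Q6: vmul.s16 Q2,Q3    */
    {OP_VSUB, 0, 5},     /* Q7: vsub.s16 Q1,Q6    */
    {OP_VMUL, 6, 3},     /* Q8: vmul.s32 Q7,Q4    */
    {OP_VASR, 7, 4},     /* Q9: vasr0.s16 Q8,Q5   */
};

/* Arithmetic shift right modelled portably as floor division by 2^s. */
static int64_t floor_shift(int64_t x, int s)
{
    int64_t d = (int64_t)1 << s;
    int64_t q = x / d;
    if (x % d != 0 && x < 0)
        q--;
    return q;
}

/* Evaluate every node, in order, once per input element. */
static void run_linear_graph(const node *g, size_t len, const int64_t *scalars,
                             const int16_t *in, int16_t *out, size_t n)
{
    int64_t v[16];
    assert(len > 0 && len <= 16);
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < len; k++) {
            const node *q = &g[k];
            switch (q->op) {
            case OP_VLD:     v[k] = in[i]; break;
            case OP_VSIGN:   v[k] = (v[q->a] > 0) - (v[q->a] < 0); break;
            case OP_VSCALAR: v[k] = scalars[q->a]; break;
            case OP_VIMM:    v[k] = q->a; break;
            case OP_VMUL:    v[k] = v[q->a] * v[q->b]; break;
            case OP_VSUB:    v[k] = v[q->a] - v[q->b]; break;
            case OP_VASR:    v[k] = floor_shift(v[q->a], (int)v[q->b]); break;
            }
        }
        out[i] = (int16_t)v[len - 1];   /* last node feeds the output stream */
    }
}
```

For qp = 4, the C function's set-up gives b = 3 and rq = 8192, so the scalar registers hold {3, 8192}. Feeding {100, -100, 0, 2} through `quant_graph` yields {12, -13, 0, -1}. The C `quant` loop yields {12, -12, 0, 0} for the same inputs, because its division truncates toward zero.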
Claims (1)
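X. Claims:

1. A re-configurable streaming vector processor, comprising:
a plurality of functional units, each having one or more inputs for receiving a data value and an output for providing a data value;
a re-configurable interconnection switch comprising one or more links, each link operable to couple an output of a functional unit to an input of the one or more inputs of a functional unit; and
a micro-sequencer coupled to the re-configurable interconnection switch and operable to control the re-configurable interconnection switch.

2. A re-configurable streaming vector processor in accordance with claim 1, wherein the micro-sequencer includes a program memory for storing a program of instructions.

3. A re-configurable streaming vector processor in accordance with claim 1, wherein the re-configurable interconnection switch includes a switch memory for storing data values.

4. A re-configurable streaming vector processor in accordance with claim 3, wherein the switch memory includes at least one of a FIFO, a programmable delay, and a pipeline register file.

5. A re-configurable streaming vector processor in accordance with claim 1, wherein a link of the re-configurable interconnection switch is directed by the micro-sequencer to receive a data value from an output of a functional unit and to provide a data value to an input of the one or more inputs of a functional unit.

6. A re-configurable streaming vector processor in accordance with claim 1, further comprising:
one or more input stream units coupled to the re-configurable interconnection switch and operable to retrieve input data values from a data memory and to provide data values to the re-configurable interconnection switch; and
one or more output stream units coupled to the re-configurable interconnection switch and operable to accept data values from the re-configurable interconnection switch and to provide output data values to the data memory.

7. A re-configurable streaming vector processor in accordance with claim 6, wherein the input and output stream units include an interface for receiving control instructions from a host computer.

8. A re-configurable streaming vector processor in accordance with claim 7, wherein the control instructions include at least one of:
a starting address of a vector of data values in the data memory;
a stride of the vector of data values;
a span between data values;
a number of memory addresses to skip between spans of vector data values; and
a size of each data value in the vector of data values.

9. A re-configurable streaming vector processor in accordance with claim 6, further comprising an external interface operable to couple the input stream units, the output stream units, and the micro-sequencer to a host computer.

10. A re-configurable streaming vector processor in accordance with claim 1, wherein the functional units include at least one of:
a shifter;
an adder;
a logic unit; and
a multiplier.

11. A re-configurable streaming vector processor in accordance with claim 10, wherein the functional units further include a pass-through functional unit.

12. A re-configurable streaming vector processor in accordance with claim 1, wherein an output of at least one of the plurality of functional units includes a register pipeline.

13. A re-configurable streaming vector processor in accordance with claim 1, further comprising at least one accumulator coupled to the re-configurable interconnection switch.

14. A re-configurable streaming vector processor in accordance with claim 13, wherein the at least one accumulator is operable to be coupled to a host computer.

15. A re-configurable streaming vector processor in accordance with claim 1, further comprising a plurality of scalar registers.

16. A re-configurable streaming vector processor in accordance with claim 15, wherein the plurality of scalar registers provides data tunneling.

17. A method of configuring a streaming vector processor comprising an interconnection switch, a micro-sequencer, and a plurality of functional units, the method comprising:
storing a program of instructions in the micro-sequencer;
retrieving an instruction of the program of instructions;
configuring the interconnection switch in accordance with the instruction retrieved from the program of instructions;
providing data stored in a first memory to a functional unit in accordance with the instruction received from the program of instructions;
the functional unit operating on the data; and
storing data from a functional unit in a second memory in accordance with the instruction received from the program of instructions.

18. A method in accordance with claim 17, wherein the streaming vector processor further comprises one or more input stream units having a buffer memory, and wherein the first memory is one of the one or more buffer memories of the input stream units and a memory in the interconnection switch.

19. A method in accordance with claim 18, further comprising each input stream unit retrieving data values from an external memory in accordance with a set of parameters received from a host computer and storing the data values in the buffer memory of the input stream unit.

20. A method in accordance with claim 17, wherein the streaming vector processor further comprises one or more output stream units having a buffer memory, and wherein the second memory is one of the one or more buffer memories of the output stream units and a memory in the interconnection switch.

21. A method in accordance with claim 20, further comprising each output stream unit writing data values from the buffer memory of the output stream unit to an external memory in accordance with a set of parameters received from a host processor.

22. A method for programming a streaming vector processor to perform an iterative computation, the streaming vector processor having a re-configurable data path, the method comprising:
specifying a data-flow graph of one iteration of the iterative computation;
generating from the data-flow graph a linear graph that specifies a partially ordered set of operations corresponding to the data-flow graph;
scheduling the linear graph onto the data path of the streaming vector processor; and
generating binary code instructions operable to configure the data path of the streaming vector processor.

23. A method in accordance with claim 22, wherein the streaming vector processor includes a micro-sequencer having a memory, the method further comprising storing the binary code instructions in the memory of the micro-sequencer.

24. A method in accordance with claim 23, wherein the scheduling of the linear graph and the generating of the binary code instructions are performed by a computer.

25. A method in accordance with claim 22, wherein generating a linear graph further comprises specifying the data-flow graph using a graphical user interface of a computer, the computer automatically generating the linear graph from the data-flow graph.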
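Claim 8 and the description list the five stream parameters: starting address, stride, span, skip, and size. The C sketch below shows one plausible way an input or output stream unit could expand them into a sequence of element addresses. The following are assumptions, since the source defines the parameters only in prose:
- the `stream_desc` struct;
- byte units for stride and skip;
- the rule that the skip replaces the stride after every `span` elements.

The span wording (a count of strides or a count of elements) admits more than one reading. `size` is carried in the descriptor but is not used for address generation in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor holding the five stream parameters. */
typedef struct {
    uint32_t address;  /* starting byte address of the first element      */
    int32_t  stride;   /* signed increment between elements (bytes)        */
    uint32_t span;     /* elements per span before a skip is applied       */
    int32_t  skip;     /* signed increment applied after each span (bytes) */
    uint32_t size;     /* element size in bytes (not used for addressing)  */
} stream_desc;

/* Write the first n element addresses described by d into addr[].
   Assumption: at the end of each span the skip is applied instead of the stride. */
static void stream_addresses(const stream_desc *d, uint32_t *addr, size_t n)
{
    assert(d->span > 0);
    uint32_t a = d->address;
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++) {
        addr[i] = a;
        if (++count == d->span) {
            a += (uint32_t)d->skip;    /* unsigned wrap-around handles negative skips */
            count = 0;
        } else {
            a += (uint32_t)d->stride;  /* likewise for negative strides */
        }
    }
}
```

As an example, consider a block of 16-bit pixels that is 3 elements wide and 2 rows high, in an image with a 16-byte row pitch, starting at byte 100. It could be described by stride 2, span 3, and skip 16 - 2 * 2 = 12, which gives the addresses 100, 102, 104, 116, 118, 120. Under this reading, the host programs only these few values, and the stream unit handles the rest of the traversal.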
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/184,583 (US7159099B2) | 2002-06-28 | 2002-06-28 | Streaming vector processor with reconfigurable interconnection switch |
Publications (2)
Publication Number | Publication Date |
---|---|
TW200405981A | 2004-04-16 |
TWI234738B | 2005-06-21 |
Family
ID=29779404
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW092115849A (TWI234738B) | 2002-06-28 | 2003-06-11 | Re-configurable streaming vector processor |
Country Status (7)
Country | Publication |
---|---|
US (2) | US7159099B2 |
EP (1) | EP1535171A4 |
JP (1) | JP2005531848A |
CN (1) | CN1666187A |
AU (1) | AU2003228247A1 |
TW (1) | TWI234738B |
WO (1) | WO2004003767A1 |
Family Cites Families (77)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3718912A (en) | 1970-12-22 | 1973-02-27 | Ibm | Instruction execution unit |
US4128880A (en) | 1976-06-30 | 1978-12-05 | Cray Research, Inc. | Computer vector register processing |
JPS5975365A (ja) | 1982-10-22 | 1984-04-28 | Hitachi Ltd | ベクトル処理装置 |
JPS60134974A (ja) | 1983-12-23 | 1985-07-18 | Hitachi Ltd | ベクトル処理装置 |
JPS6145354A (ja) | 1984-08-10 | 1986-03-05 | Nec Corp | マイクロプロセツサ |
US4744043A (en) * | 1985-03-25 | 1988-05-10 | Motorola, Inc. | Data processor execution unit which receives data with reduced instruction overhead |
US4807183A (en) | 1985-09-27 | 1989-02-21 | Carnegie-Mellon University | Programmable interconnection chip for computer system functional modules |
US5821934A (en) | 1986-04-14 | 1998-10-13 | National Instruments Corporation | Method and apparatus for providing stricter data type capabilities in a graphical data flow diagram |
US5481740A (en) | 1986-04-14 | 1996-01-02 | National Instruments Corporation | Method and apparatus for providing autoprobe features in a graphical data flow diagram |
US5734863A (en) | 1986-04-14 | 1998-03-31 | National Instruments Corporation | Method and apparatus for providing improved type compatibility and data structure organization in a graphical data flow diagram |
US4862407A (en) * | 1987-10-05 | 1989-08-29 | Motorola, Inc. | Digital signal processing apparatus |
US4918600A (en) | 1988-08-01 | 1990-04-17 | Board Of Regents, University Of Texas System | Dynamic address mapping for conflict-free vector access |
US5317734A (en) | 1989-08-29 | 1994-05-31 | North American Philips Corporation | Method of synchronizing parallel processors employing channels and compiling method minimizing cross-processor data dependencies |
JP2718254B2 (ja) | 1990-10-02 | 1998-02-25 | NEC Corporation | Vector processing device |
US5966528A (en) | 1990-11-13 | 1999-10-12 | International Business Machines Corporation | SIMD/MIMD array processor with vector processing |
JP3532932B2 (ja) | 1991-05-20 | 2004-05-31 | Motorola, Inc. | Randomly accessible memory with time-overlapped memory access |
US5423040A (en) | 1991-07-24 | 1995-06-06 | International Business Machines Corporation | System and method for efficiently executing directed acyclic graphs |
US5965528A (en) * | 1991-09-27 | 1999-10-12 | Mcgill University | Recombinant human alpha-fetoprotein as an immunosuppressive agent |
US5206822A (en) | 1991-11-15 | 1993-04-27 | Regents Of The University Of California | Method and apparatus for optimized processing of sparse matrices |
DE69413572T2 (de) | 1993-02-08 | 1999-05-20 | Sony Corp., Tokio/Tokyo | Optical waveguide for frequency doubling |
US5717947A (en) | 1993-03-31 | 1998-02-10 | Motorola, Inc. | Data processing system and method thereof |
US5418953A (en) | 1993-04-12 | 1995-05-23 | Loral/Rohm Mil-Spec Corp. | Method for automated deployment of a software program onto a multi-processor architecture |
US5450607A (en) * | 1993-05-17 | 1995-09-12 | Mips Technologies Inc. | Unified floating point and integer datapath for a RISC processor |
US6064819A (en) | 1993-12-08 | 2000-05-16 | Imec | Control flow and memory management optimization |
US5719988A (en) * | 1994-05-31 | 1998-02-17 | Tektronix, Inc. | Dynamically paged non-volatile random access video store |
FR2723652B1 (fr) | 1994-08-11 | 1996-09-13 | Cegelec | Method for scheduling successive tasks |
JP2660163B2 (ja) * | 1994-10-11 | 1997-10-08 | 有限会社アレフロジック | Algorithm education support system |
US5652903A (en) * | 1994-11-01 | 1997-07-29 | Motorola, Inc. | DSP co-processor for use on an integrated circuit that performs multiple communication tasks |
US5887183A (en) | 1995-01-04 | 1999-03-23 | International Business Machines Corporation | Method and system in a data processing system for loading and storing vectors in a plurality of modes |
US5495817A (en) * | 1995-05-22 | 1996-03-05 | Blough-Wagner Manufacturing Co., Inc. | Pedal mechanism for operating presser and motor in sewing machines |
US5719998A (en) * | 1995-06-12 | 1998-02-17 | S3, Incorporated | Partitioned decompression of audio data using audio decoder engine for computationally intensive processing |
JP3598589B2 (ja) | 1995-06-28 | 2004-12-08 | Hitachi, Ltd. | Processor |
JP3520611B2 (ja) | 1995-07-06 | 2004-04-19 | Hitachi, Ltd. | Processor control method |
US5742821A (en) | 1995-11-08 | 1998-04-21 | Lucent Technologies Inc. | Multiprocessor scheduling and execution |
US5764787A (en) | 1996-03-27 | 1998-06-09 | Intel Corporation | Multi-byte processing of byte-based image data |
US6571016B1 (en) | 1997-05-05 | 2003-05-27 | Microsoft Corporation | Intra compression of pixel blocks using predicted mean |
JPH09330304A (ja) | 1996-06-05 | 1997-12-22 | Internatl Business Mach Corp <Ibm> | Method for determining a communication schedule between processors |
US5805614A (en) | 1996-07-03 | 1998-09-08 | General Signal Corporation | Fault tolerant switch fabric with control and data correction by hamming codes |
US5889989A (en) | 1996-09-16 | 1999-03-30 | The Research Foundation Of State University Of New York | Load sharing controller for optimizing monetary cost |
GB2317465B (en) | 1996-09-23 | 2000-11-15 | Advanced Risc Mach Ltd | Data processing apparatus registers. |
GB2317469B (en) | 1996-09-23 | 2001-02-21 | Advanced Risc Mach Ltd | Data processing system register control |
GB2317464A (en) | 1996-09-23 | 1998-03-25 | Advanced Risc Mach Ltd | Register addressing in a data processing apparatus |
US6317774B1 (en) | 1997-01-09 | 2001-11-13 | Microsoft Corporation | Providing predictable scheduling of programs using a repeating precomputed schedule |
US6112023A (en) | 1997-02-24 | 2000-08-29 | Lucent Technologies Inc. | Scheduling-based hardware-software co-synthesis of heterogeneous distributed embedded systems |
US5999736A (en) | 1997-05-09 | 1999-12-07 | Intel Corporation | Optimizing code by exploiting speculation and predication with a cost-benefit data flow analysis based on path profiling information |
US6437804B1 (en) | 1997-10-23 | 2002-08-20 | Aprisma Management Technologies, Inc | Method for automatic partitioning of node-weighted, edge-constrained graphs |
US6173389B1 (en) | 1997-12-04 | 2001-01-09 | Billions Of Operations Per Second, Inc. | Methods and apparatus for dynamic very long instruction word sub-instruction selection for execution time parallelism in an indirect very long instruction word processor |
US5936953A (en) | 1997-12-18 | 1999-08-10 | Raytheon Company | Multi-mode, multi-channel communication bus |
US6430671B1 (en) | 1998-02-10 | 2002-08-06 | Lucent Technologies, Inc. | Address generation utilizing an adder, a non-sequential counter and a latch |
US6104962A (en) | 1998-03-26 | 2000-08-15 | Rockwell Technologies, Llc | System for and method of allocating processing tasks of a control program configured to control a distributed control system |
US6202130B1 (en) * | 1998-04-17 | 2001-03-13 | Motorola, Inc. | Data processing system for processing vector data and method therefor |
US6128775A (en) | 1998-06-16 | 2000-10-03 | Silicon Graphics, Incorporated | Method, system, and computer program product for performing register promotion via load and store placement optimization within an optimizing compiler |
US6052766A (en) | 1998-07-07 | 2000-04-18 | Lucent Technologies Inc. | Pointer register indirectly addressing a second register in the processor core of a digital processor |
US6192384B1 (en) | 1998-09-14 | 2001-02-20 | The Board Of Trustees Of The Leland Stanford Junior University | System and method for performing compound vector operations |
US6629123B1 (en) | 1998-10-02 | 2003-09-30 | Microsoft Corporation | Interception of unit creation requests by an automatic distributed partitioning system |
US6442701B1 (en) | 1998-11-25 | 2002-08-27 | Texas Instruments Incorporated | Power saving by disabling memory block access for aligned NOP slots during fetch of multiple instruction words |
SE9804529L (sv) | 1998-12-23 | 2000-06-24 | Axis Ab | Flexible memory channel |
US6513107B1 (en) | 1999-08-17 | 2003-01-28 | Nec Electronics, Inc. | Vector transfer system generating address error exception when vector to be transferred does not start and end on same memory page |
US6665749B1 (en) | 1999-08-17 | 2003-12-16 | Nec Electronics, Inc. | Bus protocol for efficiently transferring vector data |
US6745160B1 (en) | 1999-10-08 | 2004-06-01 | Nec Corporation | Verification of scheduling in the presence of loops using uninterpreted symbolic simulation |
US6588009B1 (en) | 1999-11-29 | 2003-07-01 | Adelante Technologies Nv | Method and apparatus for compiling source code using symbolic execution |
US6754893B2 (en) | 1999-12-29 | 2004-06-22 | Texas Instruments Incorporated | Method for collapsing the prolog and epilog of software pipelined loops |
US6892380B2 (en) | 1999-12-30 | 2005-05-10 | Texas Instruments Incorporated | Method for software pipelining of irregular conditional control loops |
US6795908B1 (en) * | 2000-02-16 | 2004-09-21 | Freescale Semiconductor, Inc. | Method and apparatus for instruction execution in a data processing system |
JP3674515B2 (ja) * | 2000-02-25 | 2005-07-20 | NEC Corporation | Array-type processor |
US6598221B1 (en) * | 2000-04-13 | 2003-07-22 | Koninklijke Philips Electronics N.V. | Assembly code performance evaluation apparatus and method |
US6647546B1 (en) | 2000-05-03 | 2003-11-11 | Sun Microsystems, Inc. | Avoiding gather and scatter when calling Fortran 77 code from Fortran 90 code |
US7010788B1 (en) | 2000-05-19 | 2006-03-07 | Hewlett-Packard Development Company, L.P. | System for computing the optimal static schedule using the stored task execution costs with recent schedule execution costs |
DE10057343A1 (de) | 2000-11-18 | 2002-05-23 | Philips Corp Intellectual Pty | Packet switching device with a cascade control and a bufferless cascade switching matrix |
US6898691B2 (en) * | 2001-06-06 | 2005-05-24 | Intrinsity, Inc. | Rearranging data between vector and matrix forms in a SIMD matrix processor |
JP3914771B2 (ja) | 2002-01-09 | 2007-05-16 | Hitachi, Ltd. | Packet communication apparatus and packet data transfer control method |
US6732354B2 (en) | 2002-04-23 | 2004-05-04 | Quicksilver Technology, Inc. | Method, system and software for programming reconfigurable hardware |
US7159099B2 (en) * | 2002-06-28 | 2007-01-02 | Motorola, Inc. | Streaming vector processor with reconfigurable interconnection switch |
AT412881B (de) * | 2002-08-23 | 2005-08-25 | Wuester Heinrich | Umbrella-type clothes dryer with a protective cover |
US7610466B2 (en) * | 2003-09-05 | 2009-10-27 | Freescale Semiconductor, Inc. | Data processing system using independent memory and register operand size specifiers and method thereof |
US7107436B2 (en) * | 2003-09-08 | 2006-09-12 | Freescale Semiconductor, Inc. | Conditional next portion transferring of data stream to or from register based on subsequent instruction aspect |
US7315932B2 (en) * | 2003-09-08 | 2008-01-01 | Moyer William C | Data processing system having instruction specifiers for SIMD register operands and method thereof |
2002
- 2002-06-28: US application US10/184,583, published as US7159099B2 (en); not active, Expired - Fee Related
2003
- 2003-05-20: JP application JP2004517568A, published as JP2005531848A (ja); active, Pending
- 2003-05-20: AU application AU2003228247A, published as AU2003228247A1 (en); not active, Abandoned
- 2003-05-20: CN application CN03815336XA, published as CN1666187A (zh); active, Pending
- 2003-05-20: WO application PCT/US2003/016019, published as WO2004003767A1 (en); not active, Application Discontinuation
- 2003-05-20: EP application EP03726946A, published as EP1535171A4 (en); not active, Withdrawn
- 2003-06-11: TW application TW092115849A, published as TWI234738B (zh); not active, IP Right Cessation
- 2003-09-08: US application US10/657,793, published as US7100019B2 (en); not active, Expired - Fee Related
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI476695B (zh) * | 2011-11-30 | 2015-03-11 | Intel Corp | Instructions and logic to provide vector horizontal compare functionality |
US9665371B2 (en) | 2011-11-30 | 2017-05-30 | Intel Corporation | Providing vector horizontal compare functionality within a vector register |
US10318291B2 (en) | 2011-11-30 | 2019-06-11 | Intel Corporation | Providing vector horizontal compare functionality within a vector register |
TWI507982B (zh) * | 2013-08-14 | 2015-11-11 | Qualcomm Inc | Vector arithmetic reduction |
Also Published As
Publication number | Publication date |
---|---|
EP1535171A1 (en) | 2005-06-01 |
US7159099B2 (en) | 2007-01-02 |
AU2003228247A1 (en) | 2004-01-19 |
EP1535171A4 (en) | 2007-02-28 |
US7100019B2 (en) | 2006-08-29 |
TW200405981A (en) | 2004-04-16 |
WO2004003767A1 (en) | 2004-01-08 |
US20040117595A1 (en) | 2004-06-17 |
CN1666187A (zh) | 2005-09-07 |
US20040003206A1 (en) | 2004-01-01 |
JP2005531848A (ja) | 2005-10-20 |
Similar Documents
Publication | Title |
---|---|
TWI234738B (en) | Re-configurable streaming vector processor |
KR102549680B1 (ko) | Vector computation unit |
US6839728B2 (en) | Efficient complex multiplication and fast fourier transform (FFT) implementation on the manarray architecture |
JP3559046B2 (ja) | Data processing management system |
US5179530A (en) | Architecture for integrated concurrent vector signal processor |
JP3983857B2 (ja) | Single-instruction multiple-data processing using multiple banks of vector registers |
JP6895484B2 (ja) | Register file of a multithreaded processor |
Yu et al. | Vector processing as a soft-core CPU accelerator |
US5926644A (en) | Instruction formats/instruction encoding |
EP1840742A2 (en) | Method and apparatus for operating a computer processor array |
WO2006115635A2 (en) | Automatic configuration of streaming processor architectures |
US4298936A (en) | Array Processor |
US6934938B2 (en) | Method of programming linear graphs for streaming vector computation |
TWI794789B (zh) | Vector operation device and method |
CN111183418A (zh) | Configurable hardware accelerator |
Gealow et al. | System design for pixel-parallel image processing |
Corporaal et al. | Cosynthesis with the MOVE framework |
JPH10143494A (ja) | Single-instruction multiple-data processing with combined scalar/vector operations |
KR19980018071A (ko) | Single-instruction multiple-data processing in a multimedia signal processor |
WO2001044964A2 (en) | Digital signal processor having a plurality of independent dedicated processors |
JP2004515856A (ja) | Digital signal processing device |
CN105593809A (zh) | Flexibly configurable hardware streaming unit |
Santiago et al. | Compiler for the Versat reconfigurable architecture |
Pitsianis et al. | High-performance FFT implementation on the BOPS ManArray parallel DSP |
JP2654451B2 (ja) | Data output method |
Legal Events
Code | Title |
---|---|
MM4A | Annulment or lapse of patent due to non-payment of fees |