JP2005532601A - Multiply-accumulate (MAC) unit for single-instruction/multiple-data (SIMD) instructions - Google Patents
- Publication number
- JP2005532601A (application JP2003535084A)
- Authority
- JP
- Japan
- Prior art keywords
- vector
- sum
- bits
- machine
- vectors
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
- G06F7/5443—Sum of products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/38—Indexing scheme relating to groups G06F7/38 - G06F7/575
- G06F2207/3804—Details
- G06F2207/3808—Details concerning the type of numbers or the way they are handled
- G06F2207/3812—Devices capable of handling different types of numbers
- G06F2207/382—Reconfigurable for different fixed word lengths
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/38—Indexing scheme relating to groups G06F7/38 - G06F7/575
- G06F2207/3804—Details
- G06F2207/3808—Details concerning the type of numbers or the way they are handled
- G06F2207/3828—Multigauge devices, i.e. capable of handling packed numbers without unpacking them
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/38—Indexing scheme relating to groups G06F7/38 - G06F7/575
- G06F2207/3804—Details
- G06F2207/386—Special constructional features
- G06F2207/3884—Pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
- G06F7/5318—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel with column wise addition of partial products, e.g. using Wallace tree, Dadda counters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
- G06F7/5324—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel partitioned, i.e. using repetitively a smaller parallel parallel multiplier or using an array of such smaller multipliers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/533—Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even
- G06F7/5334—Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even by using multiple bit scanning, i.e. by decoding groups of successive multiplier bits in order to select an appropriate precalculated multiple of the multiplicand as a partial product
- G06F7/5336—Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even by using multiple bit scanning, i.e. by decoding groups of successive multiplier bits in order to select an appropriate precalculated multiple of the multiplicand as a partial product overlapped, i.e. with successive bitgroups sharing one or more bits being recoded into signed digit representation, e.g. using the Modified Booth Algorithm
- G06F7/5338—Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even by using multiple bit scanning, i.e. by decoding groups of successive multiplier bits in order to select an appropriate precalculated multiple of the multiplicand as a partial product overlapped, i.e. with successive bitgroups sharing one or more bits being recoded into signed digit representation, e.g. using the Modified Booth Algorithm each bitgroup having two new bits, e.g. 2nd order MBA
Landscapes
- Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Advance Control (AREA)
- Complex Calculations (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Mobile Radio Communication Systems (AREA)
Claims (37)
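- 1. A method comprising:
performing a first compression operation in a first multiply-accumulate (MAC) operation in a pipeline;
generating two or more intermediate vectors in the first compression operation in the first MAC operation; and
forwarding at least a portion of each of the two or more intermediate vectors to a second MAC operation in the pipeline.
- 2. The method of claim 1, wherein forwarding at least a portion of each of the two or more intermediate vectors comprises forwarding a lower portion of each of the two or more intermediate vectors.
- 3. The method of claim 1, wherein performing the first compression operation comprises compressing a first plurality of partial products into a first sum vector and a first carry vector, and compressing a second plurality of partial products into a second sum vector and a second carry vector.
- 4. The method of claim 1, wherein generating the two or more intermediate vectors comprises compressing the first sum vector, the second sum vector, the first carry vector, and the second carry vector into an intermediate sum vector and an intermediate carry vector.
- 5. The method of claim 1, wherein said forwarding comprises forwarding at least a portion of each of the two or more intermediate vectors to a Wallace tree compression unit.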
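- 6. An article comprising a machine-readable medium storing machine-executable instructions, the instructions causing a machine to:
perform a first compression operation in a first multiply-accumulate (MAC) operation in a pipeline;
generate two or more intermediate vectors in the first compression operation in the first MAC operation; and
forward at least a portion of each of the two or more intermediate vectors to a second MAC operation in the pipeline.
- 7. The article of claim 6, wherein the instructions causing the machine to forward at least a portion of each of the two or more intermediate vectors comprise instructions causing the machine to forward the lower bits of each of the two or more intermediate vectors.
- 8. The article of claim 6, wherein the instructions causing the machine to perform the first compression operation comprise instructions causing the machine to compress a first plurality of partial products into a first sum vector and a first carry vector, and to compress a second plurality of partial products into a second sum vector and a second carry vector.
- 9. The article of claim 6, wherein the instructions causing the machine to generate the two or more intermediate vectors comprise instructions causing the machine to compress the first sum vector, the second sum vector, the first carry vector, and the second carry vector into an intermediate sum vector and an intermediate carry vector.
- 10. The article of claim 6, wherein the instructions causing the machine to forward comprise instructions causing the machine to forward at least a portion of each of the two or more intermediate vectors to a Wallace tree compression unit.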
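- 11. A method comprising:
compressing, in a first Wallace tree compression stage of a multiply-accumulate (MAC) operation, a first plurality of partial products into a first sum vector and a first carry vector, and a second plurality of partial products into a second sum vector and a second carry vector;
compressing the first sum vector, the second sum vector, the first carry vector, and the second carry vector into a first intermediate sum vector and a first intermediate carry vector; and
compressing, in a second stage of the MAC operation, the intermediate sum vector together with a third plurality of partial products, and the intermediate carry vector together with a fourth plurality of partial products.
- 12. The method of claim 11, wherein the MAC operation comprises a single-instruction/multiple-data (SIMD) operation.
- 13. The method of claim 11, further comprising:
generating the first plurality of partial products from a first pair of operands;
generating the second plurality of partial products from a second pair of operands;
generating the third plurality of partial products from a third pair of operands; and
generating the fourth plurality of partial products from a fourth pair of operands.
- 14. The method of claim 11, further comprising forwarding the intermediate sum vector and the intermediate carry vector to a second MAC operation in a pipeline.
- 15. The method of claim 14, wherein said forwarding comprises eliminating an accumulate data dependency in the second MAC operation.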
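- 16. An article comprising a machine-readable medium storing machine-executable instructions, the instructions comprising:
instructions causing a machine to compress, in a first Wallace tree compression stage of a multiply-accumulate (MAC) operation, a first plurality of partial products into a first sum vector and a first carry vector, and a second plurality of partial products into a second sum vector and a second carry vector;
instructions causing the machine to compress the first sum vector, the second sum vector, the first carry vector, and the second carry vector into a first intermediate sum vector and a first intermediate carry vector; and
instructions causing the machine to compress, in a second stage of the MAC operation, the intermediate sum vector together with a third plurality of partial products, and the intermediate carry vector together with a fourth plurality of partial products.
- 17. The article of claim 16, wherein the MAC operation comprises a single-instruction/multiple-data (SIMD) operation.
- 18. The article of claim 16, further comprising instructions causing the machine to:
generate the first plurality of partial products from a first pair of operands;
generate the second plurality of partial products from a second pair of operands;
generate the third plurality of partial products from a third pair of operands; and
generate the fourth plurality of partial products from a fourth pair of operands.
- 19. The article of claim 16, further comprising instructions causing the machine to forward the intermediate sum vector and the intermediate carry vector to a second MAC operation in a pipeline.
- 20. The article of claim 16, wherein the instructions causing the machine to forward comprise instructions causing the machine to eliminate an accumulate data dependency in the second MAC operation.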
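- 21. An apparatus comprising:
first and second Wallace tree compression units to compress vectors in first and second stages of a multiply-accumulate (MAC) operation;
a compressor to compress a plurality of vectors output from the first and second Wallace tree compression units in the first stage of the MAC operation into two intermediate vectors; and
a data path from an output of the compressor to an input of a multiplexer,
wherein the multiplexer selectively inputs one of the two intermediate vectors to one of the first and second Wallace tree compression units in the second stage of the MAC operation.
- 22. The apparatus of claim 21, further comprising a dual MAC unit.
- 23. The apparatus of claim 21, wherein the plurality of vectors includes a first sum vector, a second sum vector, a first carry vector, and a second carry vector.
- 24. The apparatus of claim 21, wherein the compressor comprises a 4:2 vector compressor.
- 25. The apparatus of claim 21, wherein the multiplexer comprises a first multiplexer having an output coupled to the first Wallace tree compression unit and a second multiplexer having an output coupled to the second Wallace tree compression unit.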
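- 26. A system comprising:
a static random access memory; and
a processor coupled to the static random access memory,
the processor comprising a dual multiply-accumulate (MAC) unit,
the unit comprising:
first and second Wallace tree compression units to compress vectors in first and second stages of a MAC operation;
a compressor to compress a plurality of vectors output from the first and second Wallace tree compression units in the first stage of the MAC operation into two intermediate vectors; and
a data path from an output of the compressor to an input of a multiplexer,
wherein the multiplexer selectively inputs one of the two intermediate vectors to one of the first and second Wallace tree compression units in the second stage of the MAC operation.
- 27. The system of claim 26, wherein the multiplexer comprises a first multiplexer having an output coupled to the first Wallace tree compression unit and a second multiplexer having an output coupled to the second Wallace tree compression unit.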
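- 28. A method comprising performing a multiply-accumulate (MAC) operation on first and second 2n-bit operands as four n-bit operations.
- 29. The method of claim 28, wherein said performing comprises:
generating partial product vectors from the lower n bits of the first operand and the lower n bits of the second operand;
generating partial product vectors from the upper n bits of the first operand and the lower n bits of the second operand;
generating partial product vectors from the upper n bits of the first operand and the upper n bits of the second operand; and
generating partial product vectors from the lower n bits of the first operand and the upper n bits of the second operand.
- 30. The method of claim 28, further comprising:
compressing the partial products generated from the upper n bits of the first operand and the lower n bits of the second operand into two intermediate vectors; and
shifting the intermediate vectors left by n bits.
- 31. The method of claim 28, wherein said performing comprises performing the MAC operation on a tightly coupled dual n-bit MAC unit.
- 32. The method of claim 28, wherein n = 16.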
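- 33. An article comprising a machine-readable medium storing machine-executable instructions, the instructions causing a machine to:
perform a multiply-accumulate (MAC) operation on first and second 2n-bit operands as four n-bit operations.
- 34. The article of claim 33, wherein the instructions causing the machine to perform comprise instructions causing the machine to:
generate partial product vectors from the lower n bits of the first operand and the lower n bits of the second operand;
generate partial product vectors from the upper n bits of the first operand and the lower n bits of the second operand;
generate partial product vectors from the upper n bits of the first operand and the upper n bits of the second operand; and
generate partial product vectors from the lower n bits of the first operand and the upper n bits of the second operand.
- 35. The article of claim 33, further comprising instructions causing the machine to:
compress the partial products generated from the upper n bits of the first operand and the lower n bits of the second operand into two intermediate vectors; and
shift the intermediate vectors left by n bits.
- 36. The article of claim 33, wherein the instructions causing the machine to perform comprise instructions causing the machine to perform the MAC operation on a tightly coupled dual n-bit MAC unit.
- 37. The article of claim 33, wherein n = 16.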
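Claims 1 to 27 keep results in carry-save form: partial products are compressed into sum and carry vectors, a 4:2 step merges them into two intermediate vectors, and those vectors are forwarded into the next multiply-accumulate rather than being resolved by a full addition. The Python sketch below is one way to model that idea in software, not the claimed circuit. It assumes unsigned 16-bit operands, a 40-bit accumulator width, and plain shift-and-add partial products (no Booth recoding); the names `csa`, `compress`, `partial_products`, `mac_stage`, and `dot` are hypothetical.

```python
MASK = (1 << 40) - 1  # assumed accumulator width (hypothetical)


def csa(a, b, c):
    """3:2 carry-save adder: three vectors in, a sum and a carry vector out."""
    s = a ^ b ^ c                              # bitwise sum without carries
    cy = ((a & b) | (a & c) | (b & c)) << 1    # majority bits, moved up one place
    return s & MASK, cy & MASK


def compress(vectors):
    """Reduce any number of vectors to a (sum, carry) pair with 3:2 steps."""
    vs = list(vectors)
    while len(vs) > 2:
        s, c = csa(vs[0], vs[1], vs[2])
        vs = vs[3:] + [s, c]
    while len(vs) < 2:                         # pad so a pair is always returned
        vs.append(0)
    return vs[0], vs[1]


def partial_products(x, y, bits=16):
    """Unsigned shift-and-add partial products of x * y (assumption: no Booth)."""
    return [(x << i) & MASK for i in range(bits) if (y >> i) & 1]


def mac_stage(pairs, fwd_sum=0, fwd_carry=0):
    """One dual-MAC step that stays in carry-save form.

    The forwarded sum vector joins the first tree and the forwarded carry
    vector joins the second, then a 4:2 step merges the four tree outputs.
    """
    (x0, y0), (x1, y1) = pairs
    s0, c0 = compress(partial_products(x0, y0) + [fwd_sum])
    s1, c1 = compress(partial_products(x1, y1) + [fwd_carry])
    return compress([s0, c0, s1, c1])          # 4:2 compression


def dot(xs, ys):
    """Dot product of equal, even-length lists, two products per stage."""
    s = c = 0
    for i in range(0, len(xs), 2):
        s, c = mac_stage([(xs[i], ys[i]), (xs[i + 1], ys[i + 1])], s, c)
    return (s + c) & MASK                      # the only carry-propagate add
```

Only the last line of `dot` performs a carry-propagate addition. Every earlier step preserves the invariant that the sum vector plus the carry vector equals the running total modulo 2^40, which is why the next stage does not have to wait for a resolved accumulator value.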
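Claims 28 to 37 describe a MAC on 2n-bit operands carried out as four n-bit operations, with the cross terms shifted left by n bits. As a rough arithmetic check of that decomposition, the sketch below splits two unsigned 32-bit operands into 16-bit halves (n = 16, as in claims 32 and 37). Unsigned arithmetic and the names `N`, `LOW`, and `mac_2n_from_n` are assumptions made for illustration; the sketch does not model the dual MAC hardware or its carry-save vectors.

```python
N = 16                 # half-width n, as in claims 32 and 37
LOW = (1 << N) - 1     # mask selecting the lower n bits


def mac_2n_from_n(a, b, acc=0):
    """Return acc + a * b for unsigned 2n-bit a and b using four n-bit products."""
    a_lo, a_hi = a & LOW, a >> N
    b_lo, b_hi = b & LOW, b >> N

    ll = a_lo * b_lo   # lower n bits of both operands
    hl = a_hi * b_lo   # upper n of the first, lower n of the second
    hh = a_hi * b_hi   # upper n bits of both operands
    lh = a_lo * b_hi   # lower n of the first, upper n of the second

    # Cross terms carry a weight of 2^n, the high term a weight of 2^(2n).
    return acc + ll + ((hl + lh) << N) + (hh << (2 * N))
```

For example, `mac_2n_from_n(0x12345678, 0x9ABCDEF0, 5)` equals `0x12345678 * 0x9ABCDEF0 + 5`, which is the result a single 32-bit multiply-accumulate would give.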
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/972,720 US7107305B2 (en) | 2001-10-05 | 2001-10-05 | Multiply-accumulate (MAC) unit for single-instruction/multiple-data (SIMD) instructions |
PCT/US2002/031412 WO2003032187A2 (en) | 2001-10-05 | 2002-10-03 | Multiply-accumulate (mac) unit for single-instruction/multiple-data (simd) instructions |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2008073962A Division JP4555356B2 (ja) | 2001-10-05 | 2008-03-21 | Multiply-accumulate (MAC) unit for single-instruction/multiple-data (SIMD) instructions |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2005532601A (ja) | 2005-10-27 |
JP4584580B2 (ja) | 2010-11-24 |
Family
ID=25520040
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2003535084A Expired - Fee Related JP4584580B2 (ja) | 2001-10-05 | 2002-10-03 | Multiply-accumulate (MAC) unit for single-instruction/multiple-data (SIMD) instructions |
JP2008073962A Expired - Fee Related JP4555356B2 (ja) | 2001-10-05 | 2008-03-21 | Multiply-accumulate (MAC) unit for single-instruction/multiple-data (SIMD) instructions |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2008073962A Expired - Fee Related JP4555356B2 (ja) | 2001-10-05 | 2008-03-21 | Multiply-accumulate (MAC) unit for single-instruction/multiple-data (SIMD) instructions |
Country Status (11)
Country | Link |
---|---|
US (1) | US7107305B2 (ja) |
EP (1) | EP1446728B1 (ja) |
JP (2) | JP4584580B2 (ja) |
KR (1) | KR100834178B1 (ja) |
CN (1) | CN100474235C (ja) |
AT (1) | ATE371893T1 (ja) |
AU (1) | AU2002334792A1 (ja) |
DE (1) | DE60222163T2 (ja) |
HK (1) | HK1065127A1 (ja) |
TW (1) | TWI242742B (ja) |
WO (1) | WO2003032187A2 (ja) |
Families Citing this family (79)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6581003B1 (en) * | 2001-12-20 | 2003-06-17 | Garmin Ltd. | Systems and methods for a navigational device with forced layer switching based on memory constraints |
US7532720B2 (en) * | 2003-10-15 | 2009-05-12 | Microsoft Corporation | Utilizing SIMD instructions within montgomery multiplication |
JP4502662B2 (ja) * | 2004-02-20 | 2010-07-14 | Altera Corporation | Multiplier-accumulator block mode splitting |
US7353244B2 (en) | 2004-04-16 | 2008-04-01 | Marvell International Ltd. | Dual-multiply-accumulator operation optimized for even and odd multisample calculations |
US20060004903A1 (en) * | 2004-06-30 | 2006-01-05 | Itay Admon | CSA tree constellation |
US8856201B1 (en) | 2004-11-10 | 2014-10-07 | Altera Corporation | Mixed-mode multiplier using hard and soft logic circuitry |
EP1710691A1 (en) * | 2005-04-07 | 2006-10-11 | STMicroelectronics (Research & Development) Limited | MAC/MUL unit |
US8620980B1 (en) | 2005-09-27 | 2013-12-31 | Altera Corporation | Programmable device with specialized multiplier blocks |
CN101359284B (zh) * | 2006-02-06 | 2011-05-11 | VIA Technologies, Inc. | Multiply-accumulate unit for processing several different data formats and method thereof |
US8301681B1 (en) | 2006-02-09 | 2012-10-30 | Altera Corporation | Specialized processing block for programmable logic device |
US8041759B1 (en) * | 2006-02-09 | 2011-10-18 | Altera Corporation | Specialized processing block for programmable logic device |
US8266198B2 (en) | 2006-02-09 | 2012-09-11 | Altera Corporation | Specialized processing block for programmable logic device |
US8266199B2 (en) * | 2006-02-09 | 2012-09-11 | Altera Corporation | Specialized processing block for programmable logic device |
US7836117B1 (en) | 2006-04-07 | 2010-11-16 | Altera Corporation | Specialized processing block for programmable logic device |
US7822799B1 (en) | 2006-06-26 | 2010-10-26 | Altera Corporation | Adder-rounder circuitry for specialized processing block in programmable logic device |
US7783862B2 (en) * | 2006-08-07 | 2010-08-24 | International Characters, Inc. | Method and apparatus for an inductive doubling architecture |
US8386550B1 (en) | 2006-09-20 | 2013-02-26 | Altera Corporation | Method for configuring a finite impulse response filter in a programmable logic device |
US8122078B2 (en) * | 2006-10-06 | 2012-02-21 | Calos Fund, LLC | Processor with enhanced combined-arithmetic capability |
US7930336B2 (en) | 2006-12-05 | 2011-04-19 | Altera Corporation | Large multiplier for programmable logic device |
US8386553B1 (en) | 2006-12-05 | 2013-02-26 | Altera Corporation | Large multiplier for programmable logic device |
US20080140753A1 (en) * | 2006-12-08 | 2008-06-12 | Vinodh Gopal | Multiplier |
US7814137B1 (en) | 2007-01-09 | 2010-10-12 | Altera Corporation | Combined interpolation and decimation filter for programmable logic device |
US8650231B1 (en) | 2007-01-22 | 2014-02-11 | Altera Corporation | Configuring floating point operations in a programmable device |
US7865541B1 (en) | 2007-01-22 | 2011-01-04 | Altera Corporation | Configuring floating point operations in a programmable logic device |
US8645450B1 (en) | 2007-03-02 | 2014-02-04 | Altera Corporation | Multiplier-accumulator circuitry and methods |
US7949699B1 (en) | 2007-08-30 | 2011-05-24 | Altera Corporation | Implementation of decimation filter in integrated circuit device using ram-based data storage |
US8959137B1 (en) | 2008-02-20 | 2015-02-17 | Altera Corporation | Implementing large multipliers in a programmable integrated circuit device |
US8307023B1 (en) | 2008-10-10 | 2012-11-06 | Altera Corporation | DSP block for implementing large multiplier on a programmable integrated circuit device |
US8706790B1 (en) | 2009-03-03 | 2014-04-22 | Altera Corporation | Implementing mixed-precision floating-point operations in a programmable integrated circuit device |
US8468192B1 (en) | 2009-03-03 | 2013-06-18 | Altera Corporation | Implementing multipliers in a programmable integrated circuit device |
US8645449B1 (en) | 2009-03-03 | 2014-02-04 | Altera Corporation | Combined floating point adder and subtractor |
US8650236B1 (en) | 2009-08-04 | 2014-02-11 | Altera Corporation | High-rate interpolation or decimation filter in integrated circuit device |
JP5376659B2 (ja) * | 2009-09-03 | 2013-12-25 | NEC Computertechno, Ltd. | Multiply-accumulate device and method for controlling a multiply-accumulate device |
US20110055445A1 (en) * | 2009-09-03 | 2011-03-03 | Azuray Technologies, Inc. | Digital Signal Processing Systems |
US8412756B1 (en) | 2009-09-11 | 2013-04-02 | Altera Corporation | Multi-operand floating point operations in a programmable integrated circuit device |
US8396914B1 (en) | 2009-09-11 | 2013-03-12 | Altera Corporation | Matrix decomposition in an integrated circuit device |
US8996845B2 (en) * | 2009-12-22 | 2015-03-31 | Intel Corporation | Vector compare-and-exchange operation |
US7948267B1 (en) | 2010-02-09 | 2011-05-24 | Altera Corporation | Efficient rounding circuits and methods in configurable integrated circuit devices |
US8539016B1 (en) | 2010-02-09 | 2013-09-17 | Altera Corporation | QR decomposition in an integrated circuit device |
US8601044B2 (en) | 2010-03-02 | 2013-12-03 | Altera Corporation | Discrete Fourier Transform in an integrated circuit device |
US8484265B1 (en) | 2010-03-04 | 2013-07-09 | Altera Corporation | Angular range reduction in an integrated circuit device |
US8510354B1 (en) | 2010-03-12 | 2013-08-13 | Altera Corporation | Calculation of trigonometric functions in an integrated circuit device |
US8539014B2 (en) | 2010-03-25 | 2013-09-17 | Altera Corporation | Solving linear matrices in an integrated circuit device |
US8589463B2 (en) | 2010-06-25 | 2013-11-19 | Altera Corporation | Calculation of trigonometric functions in an integrated circuit device |
US8862650B2 (en) | 2010-06-25 | 2014-10-14 | Altera Corporation | Calculation of trigonometric functions in an integrated circuit device |
US8577951B1 (en) | 2010-08-19 | 2013-11-05 | Altera Corporation | Matrix operations in an integrated circuit device |
US8478969B2 (en) * | 2010-09-24 | 2013-07-02 | Intel Corporation | Performing a multiply-multiply-accumulate instruction |
US8645451B2 (en) | 2011-03-10 | 2014-02-04 | Altera Corporation | Double-clocked specialized processing block in an integrated circuit device |
US9600278B1 (en) | 2011-05-09 | 2017-03-21 | Altera Corporation | Programmable device using fixed and configurable logic to implement recursive trees |
US8812576B1 (en) | 2011-09-12 | 2014-08-19 | Altera Corporation | QR decomposition in an integrated circuit device |
US8949298B1 (en) | 2011-09-16 | 2015-02-03 | Altera Corporation | Computing floating-point polynomials in an integrated circuit device |
US9053045B1 (en) | 2011-09-16 | 2015-06-09 | Altera Corporation | Computing floating-point polynomials in an integrated circuit device |
US8762443B1 (en) | 2011-11-15 | 2014-06-24 | Altera Corporation | Matrix operations in an integrated circuit device |
US8868634B2 (en) * | 2011-12-02 | 2014-10-21 | Advanced Micro Devices, Inc. | Method and apparatus for performing multiplication in a processor |
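CN102520906A (zh) * | 2011-12-13 | 2012-06-27 | Institute of Automation, Chinese Academy of Sciences | Vector dot-product accumulation network with configurable vector length supporting fixed-point and floating-point reconfiguration |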
US9235414B2 (en) * | 2011-12-19 | 2016-01-12 | Intel Corporation | SIMD integer multiply-accumulate instruction for multi-precision arithmetic |
US8543634B1 (en) | 2012-03-30 | 2013-09-24 | Altera Corporation | Specialized processing block for programmable integrated circuit device |
US9098332B1 (en) | 2012-06-01 | 2015-08-04 | Altera Corporation | Specialized processing block with fixed- and floating-point structures |
US8996600B1 (en) | 2012-08-03 | 2015-03-31 | Altera Corporation | Specialized processing block for implementing floating-point multiplier with subnormal operation support |
US9207909B1 (en) | 2012-11-26 | 2015-12-08 | Altera Corporation | Polynomial calculations optimized for programmable integrated circuit device structures |
US9275014B2 (en) * | 2013-03-13 | 2016-03-01 | Qualcomm Incorporated | Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods |
US9189200B1 (en) | 2013-03-14 | 2015-11-17 | Altera Corporation | Multiple-precision processing block in a programmable integrated circuit device |
US9348795B1 (en) | 2013-07-03 | 2016-05-24 | Altera Corporation | Programmable device using fixed and configurable logic to implement floating-point rounding |
US9684488B2 (en) | 2015-03-26 | 2017-06-20 | Altera Corporation | Combined adder and pre-adder for high-radix multiplier circuit |
US10489155B2 (en) | 2015-07-21 | 2019-11-26 | Qualcomm Incorporated | Mixed-width SIMD operations using even/odd register pairs for wide data elements |
CN107977192B (zh) * | 2016-10-21 | 2024-09-13 | Advanced Micro Devices, Inc. | Method and system for performing low-power and low-latency multi-precision computation |
US10942706B2 (en) | 2017-05-05 | 2021-03-09 | Intel Corporation | Implementation of floating-point trigonometric functions in an integrated circuit device |
KR102408858B1 (ko) | 2017-12-19 | 2022-06-14 | Samsung Electronics Co., Ltd. | Nonvolatile memory device, memory system including the same, and method of operating a nonvolatile memory device |
US11409525B2 (en) * | 2018-01-24 | 2022-08-09 | Intel Corporation | Apparatus and method for vector multiply and accumulate of packed words |
WO2020046642A1 (en) | 2018-08-31 | 2020-03-05 | Flex Logix Technologies, Inc. | Multiplier-accumulator circuit, logic tile architecture for multiply-accumulate and ic including logic tile array |
US11194585B2 (en) | 2019-03-25 | 2021-12-07 | Flex Logix Technologies, Inc. | Multiplier-accumulator circuitry having processing pipelines and methods of operating same |
US11314504B2 (en) | 2019-04-09 | 2022-04-26 | Flex Logix Technologies, Inc. | Multiplier-accumulator processing pipelines and processing component, and methods of operating same |
US11288076B2 (en) | 2019-09-13 | 2022-03-29 | Flex Logix Technologies, Inc. | IC including logic tile, having reconfigurable MAC pipeline, and reconfigurable memory |
US11455368B2 (en) | 2019-10-02 | 2022-09-27 | Flex Logix Technologies, Inc. | MAC processing pipeline having conversion circuitry, and methods of operating same |
US12015428B2 (en) | 2019-11-05 | 2024-06-18 | Flex Logix Technologies, Inc. | MAC processing pipeline using filter weights having enhanced dynamic range, and methods of operating same |
US11693625B2 (en) | 2019-12-04 | 2023-07-04 | Flex Logix Technologies, Inc. | Logarithmic addition-accumulator circuitry, processing pipeline including same, and methods of operation |
US11960856B1 (en) | 2020-01-15 | 2024-04-16 | Flex Logix Technologies, Inc. | Multiplier-accumulator processing pipeline using filter weights having gaussian floating point data format |
US11442881B2 (en) | 2020-04-18 | 2022-09-13 | Flex Logix Technologies, Inc. | MAC processing pipelines, circuitry to control and configure same, and methods of operating same |
US11604645B2 (en) | 2020-07-22 | 2023-03-14 | Flex Logix Technologies, Inc. | MAC processing pipelines having programmable granularity, and methods of operating same |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3435744B2 (ja) * | 1993-09-09 | 2003-08-11 | Fujitsu Limited | Multiplier circuit |
US6385634B1 (en) * | 1995-08-31 | 2002-05-07 | Intel Corporation | Method for performing multiply-add operations on packed data |
US7395298B2 (en) * | 1995-08-31 | 2008-07-01 | Intel Corporation | Method and apparatus for performing multiply-add operations on packed data |
US5777679A (en) * | 1996-03-15 | 1998-07-07 | International Business Machines Corporation | Video decoder including polyphase fir horizontal filter |
JPH10207863A (ja) * | 1997-01-21 | 1998-08-07 | Toshiba Corp | Arithmetic processing device |
US5847981A (en) * | 1997-09-04 | 1998-12-08 | Motorola, Inc. | Multiply and accumulate circuit |
DE69941287D1 (de) * | 1998-01-21 | 2009-10-01 | Panasonic Corp | Method and apparatus for arithmetic operations |
JP2000081966A (ja) * | 1998-07-09 | 2000-03-21 | Matsushita Electric Ind Co Ltd | Arithmetic unit |
US6571268B1 (en) * | 1998-10-06 | 2003-05-27 | Texas Instruments Incorporated | Multiplier accumulator circuits |
US6542915B1 (en) * | 1999-06-17 | 2003-04-01 | International Business Machines Corporation | Floating point pipeline with a leading zeros anticipator circuit |
US6532485B1 (en) * | 1999-09-08 | 2003-03-11 | Sun Microsystems, Inc. | Method and apparatus for performing multiplication/addition operations |
US6574651B1 (en) * | 1999-10-01 | 2003-06-03 | Hitachi, Ltd. | Method and apparatus for arithmetic operation on vectored data |
US6611856B1 (en) | 1999-12-23 | 2003-08-26 | Intel Corporation | Processing multiply-accumulate operations in a single cycle |
US6922716B2 (en) * | 2001-07-13 | 2005-07-26 | Motorola, Inc. | Method and apparatus for vector processing |
-
2001
- 2001-10-05 US US09/972,720 patent/US7107305B2/en not_active Expired - Fee Related
-
2002
- 2002-10-03 KR KR1020047005030A patent/KR100834178B1/ko not_active IP Right Cessation
- 2002-10-03 EP EP02800879A patent/EP1446728B1/en not_active Expired - Lifetime
- 2002-10-03 DE DE60222163T patent/DE60222163T2/de not_active Expired - Lifetime
- 2002-10-03 WO PCT/US2002/031412 patent/WO2003032187A2/en active IP Right Grant
- 2002-10-03 AT AT02800879T patent/ATE371893T1/de not_active IP Right Cessation
- 2002-10-03 JP JP2003535084A patent/JP4584580B2/ja not_active Expired - Fee Related
- 2002-10-03 AU AU2002334792A patent/AU2002334792A1/en not_active Abandoned
- 2002-10-03 CN CNB028196473A patent/CN100474235C/zh not_active Expired - Fee Related
- 2002-10-04 TW TW091122994A patent/TWI242742B/zh not_active IP Right Cessation
-
2004
- 2004-09-07 HK HK04106791A patent/HK1065127A1/xx not_active IP Right Cessation
-
2008
- 2008-03-21 JP JP2008073962A patent/JP4555356B2/ja not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
US20030069913A1 (en) | 2003-04-10 |
EP1446728B1 (en) | 2007-08-29 |
WO2003032187A3 (en) | 2004-06-10 |
ATE371893T1 (de) | 2007-09-15 |
JP2008217805A (ja) | 2008-09-18 |
TWI242742B (en) | 2005-11-01 |
KR20040048937A (ko) | 2004-06-10 |
CN100474235C (zh) | 2009-04-01 |
CN1633637A (zh) | 2005-06-29 |
AU2002334792A1 (en) | 2003-04-22 |
WO2003032187A2 (en) | 2003-04-17 |
DE60222163D1 (de) | 2007-10-11 |
US7107305B2 (en) | 2006-09-12 |
EP1446728A2 (en) | 2004-08-18 |
KR100834178B1 (ko) | 2008-05-30 |
HK1065127A1 (en) | 2005-02-08 |
JP4555356B2 (ja) | 2010-09-29 |
DE60222163T2 (de) | 2008-06-12 |
JP4584580B2 (ja) | 2010-11-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4555356B2 (ja) | Multiply-accumulate (MAC) unit for single-instruction/multiple-data (SIMD) instructions | |
RU2263947C2 (ru) | High-order integer multiplication with rounding and shifting in a single-instruction, multiple-data architecture | |
JP3869269B2 (ja) | Processing multiply-accumulate operations in a single cycle | |
JP4064989B2 (ja) | Apparatus for performing multiply-add operations on packed data | |
US7536430B2 (en) | Method and system for performing calculation operations and a device | |
EP1576493B1 (en) | Method, device and system for performing calculation operations | |
KR100329339B1 (ko) | Apparatus for performing multiply-add operations on packed data | |
JP5273866B2 (ja) | Multiplier/accumulator unit | |
US8074058B2 (en) | Providing extended precision in SIMD vector arithmetic operations | |
US6324638B1 (en) | Processor having vector processing capability and method for executing a vector instruction in a processor | |
US6675286B1 (en) | Multimedia instruction set for wide data paths | |
US10929101B2 (en) | Processor with efficient arithmetic units | |
Kumar et al. | Analysis of low power, area and high speed multipliers for DSP applications | |
US20220012304A1 (en) | Fast matrix multiplication | |
US7047271B2 (en) | DSP execution unit for efficient alternate modes for processing multiple data sizes | |
Quan et al. | A novel vector/SIMD multiply-accumulate unit based on reconfigurable booth array | |
JP2000081966A (ja) | Arithmetic unit | |
Bhaskaran et al. | Multimedia architectures: from desktop systems to portable appliances | |
JP2002157114A (ja) | Multiplier and integrated circuit device equipped with the same | |
Sangireddy et al. | On-chip adaptive circuits for fast media processing | |
Tang et al. | Integrated partition integer execution unit for multimedia and conventional applications | |
Wu et al. | Design of a low latency high speed pipelining multiplier | |
KURODA et al. | A multimedia architecture extension for an embedded RISC processor | |
Ong et al. | A MDSP (multimedia DSP) chip for portable multimedia applications | |
Farooqui et al. | Architecture for Single-Chip ASIC Processor with Integrated Floating Point Unit |
Legal Events
Code | Title | Description |
---|---|---|
A977 | Report on retrieval | Free format text: JAPANESE INTERMEDIATE CODE: A971007; Effective date: 20061226 |
A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20070116 |
A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601; Effective date: 20070411 |
A602 | Written permission of extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A602; Effective date: 20070418 |
A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601; Effective date: 20070510 |
A602 | Written permission of extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A602; Effective date: 20070517 |
A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601; Effective date: 20070614 |
A602 | Written permission of extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A602; Effective date: 20070621 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20070713 |
A02 | Decision of refusal | Free format text: JAPANESE INTERMEDIATE CODE: A02; Effective date: 20071127 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20080321 |
A911 | Transfer to examiner for re-examination before appeal (zenchi) | Free format text: JAPANESE INTERMEDIATE CODE: A911; Effective date: 20080522 |
A912 | Re-examination (zenchi) completed and case transferred to appeal board | Free format text: JAPANESE INTERMEDIATE CODE: A912; Effective date: 20080815 |
A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601; Effective date: 20100319 |
A602 | Written permission of extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A602; Effective date: 20100325 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A821; Effective date: 20100423 |
A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01 |
A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20100902 |
R150 | Certificate of patent or registration of utility model | Free format text: JAPANESE INTERMEDIATE CODE: R150 |
FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20130910; Year of fee payment: 3 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
LAPS | Cancellation because of no payment of annual fees | |