KR101009095B1 - Graphics processor having a multipurpose double-precision functional unit - Google Patents
Graphics processor having a multipurpose double-precision functional unit
- Publication number
- KR101009095B1
- Authority
- KR
- South Korea
- Prior art keywords
- double
- precision
- operand
- dfma
- operands
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
- G06F9/30014—Arithmetic instructions with variable precision
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/14—Digital output to display device ; Cooperation and interconnection of the display device with other functional units
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/30105—Register structure
- G06F9/30112—Register structure comprising data of variable length
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
- G06F9/30123—Organisation of register space, e.g. banked or distributed register file according to context, e.g. thread buffers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3888—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple threads [SIMT] in parallel
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Multimedia (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Computing Systems (AREA)
- Human Computer Interaction (AREA)
- Image Processing (AREA)
- Image Generation (AREA)
- Complex Calculations (AREA)
Claims (23)
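- A graphics processor comprising: a rendering pipeline configured to generate image data, the rendering pipeline including a processing core configured to execute a plurality of concurrent threads and operating on single-precision operands, wherein the processing core further includes a multipurpose double-precision functional unit configured to selectably execute one of a plurality of double-precision operations on a set of double-precision input operands, the multipurpose double-precision functional unit including at least one arithmetic logic circuit, and wherein all of the arithmetic logic circuits of the double-precision functional unit are capable of operating at double precision.
- The graphics processor of claim 1, wherein the double-precision functional unit is further configured such that each of the plurality of double-precision operations completes in the same number of clock cycles.
- The graphics processor of claim 2, wherein the double-precision functional unit is further configured such that each of the plurality of double-precision operations completes in the same number of clock cycles regardless of whether an overflow or underflow condition occurs.
- The graphics processor of claim 3, wherein the double-precision functional unit is further configured to produce, when an overflow or underflow condition occurs, an overflow or underflow result conforming to a floating-point arithmetic standard, and to set an output status flag to indicate whether the overflow or underflow condition occurred.
- The graphics processor of claim 1, wherein the double-precision functional unit is further configured such that the time required to complete any one of the plurality of double-precision operations is not affected by a floating-point exception.
- The graphics processor of claim 1, wherein the plurality of double-precision operations includes: an addition operation that adds two double-precision operands; a multiplication operation that multiplies two double-precision operands; and a fused multiply-add operation that computes a product of a first double-precision operand and a second double-precision operand and then adds a third double-precision operand to the product.
- The graphics processor of claim 6, wherein the plurality of double-precision operations further includes a double-precision comparison (DSET) operation that performs a comparison test on a first operand and a second operand and generates a Boolean result indicating whether the comparison test is satisfied.
- The graphics processor of claim 6, wherein the plurality of double-precision operations further includes: a double-precision maximum (DMAX) operation that returns the larger of two double-precision input operands; and a double-precision minimum (DMIN) operation that returns the smaller of two double-precision input operands.
- The graphics processor of claim 6, wherein the plurality of double-precision operations further includes at least one format conversion operation that converts an operand from a double-precision format to a non-double-precision format.
- The graphics processor of claim 6, wherein the plurality of double-precision operations further includes at least one format conversion operation that converts an operand from a non-double-precision format to a double-precision format.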
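- A graphics processor comprising: a rendering pipeline configured to generate image data, the rendering pipeline including a processing core configured to execute a plurality of concurrent threads, wherein the processing core includes a single-precision functional unit configured to execute an arithmetic operation on one or more single-precision operands, and further includes a double-precision fused multiply-add (DFMA) functional unit configured to execute a fused multiply-add operation on a set of double-precision input operands and to provide a double-precision result, wherein the DFMA functional unit includes a DFMA pipeline having data paths, the data paths being able to perform the fused multiply-add operation in a single pass through the DFMA pipeline.
- The graphics processor of claim 11, wherein the DFMA functional unit includes: a multiplier configured to compute a product of two double-precision mantissas in a single iteration; and an adder configured to compute a sum of two double-precision mantissas in a single iteration.
- The graphics processor of claim 11, wherein the DFMA functional unit is further configured to execute a multiplication operation on a pair of double-precision input operands and to provide a double-precision result.
- The graphics processor of claim 13, wherein the multiplication operation and the fused multiply-add operation each complete in the same number of clock cycles.
- The graphics processor of claim 11, wherein the DFMA functional unit is further configured to execute an addition operation on a pair of double-precision input operands and to provide a double-precision result.
- The graphics processor of claim 15, wherein the addition operation and the fused multiply-add operation each complete in the same number of clock cycles.
- The graphics processor of claim 16, wherein the DFMA functional unit is further configured to execute a multiplication operation on a pair of double-precision input operands and to provide a double-precision result, and wherein the fused multiply-add operation, the addition operation, and the multiplication operation each complete in the same number of clock cycles regardless of whether an overflow or underflow condition occurs.
- The graphics processor of claim 17, wherein the DFMA functional unit is further configured to produce, when an overflow or underflow condition occurs, an overflow or underflow result conforming to a floating-point arithmetic standard, and to set an output status flag to indicate whether the overflow or underflow condition occurred.
- The graphics processor of claim 11, wherein the processing core includes a number (P) of copies of a first functional unit configured to operate in parallel and a number (N) of copies of the DFMA functional unit.
- The graphics processor of claim 19, wherein P is greater than N.
- The graphics processor of claim 20, wherein N is one.
- The graphics processor of claim 21, wherein the processing core further includes an input manager circuit configured to collect P sets of double-precision input operands for the DFMA functional unit and to deliver different ones of the P sets of double-precision operands to the DFMA functional unit on different clock cycles.
- The graphics processor of claim 22, wherein the input manager circuit is further configured to collect P sets of single-precision input operands for the first functional unit and to deliver a different one of the P sets of single-precision operands to each of the P copies of the first functional unit in parallel.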
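
As an illustration of the fused multiply-add operation recited in the claims, the following Python sketch contrasts a fused computation (one rounding of the final result, in the usual IEEE 754 sense) with an unfused one (the product is rounded before the add). This is a software model under stated assumptions, not the claimed hardware: exact rational arithmetic stands in for the wide intermediate data path, the function names `fused_multiply_add`, `unfused_multiply_add`, `dadd`, and `dmul` are hypothetical, and expressing addition as a*1 + c and multiplication as a*b + 0 on the same path is an assumption made for illustration rather than something the claims state.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Return a*b + c rounded once to double precision.

    Illustrative model only: exact rational arithmetic stands in for the
    wide intermediate result of a hardware FMA. Finite inputs only;
    infinities, NaNs, signed zeros, and out-of-range results are not modeled.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # exact, no rounding yet
    return float(exact)  # the single rounding step (round to nearest even)


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Return a*b + c with two roundings: one after the multiply, one after the add."""
    return a * b + c


def dadd(a: float, c: float) -> float:
    """Addition expressed on the fused path as a*1 + c (assumed mapping)."""
    return fused_multiply_add(a, 1.0, c)


def dmul(a: float, b: float) -> float:
    """Multiplication expressed on the fused path as a*b + 0 (assumed mapping)."""
    return fused_multiply_add(a, b, 0.0)


# Operands chosen so that the exact product, 1 - 2**-60, is not representable
# in double precision and rounds to 1.0 when the multiply is rounded alone.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print("unfused:", unfused_multiply_add(a, b, c))
print("fused:  ", fused_multiply_add(a, b, c))
```

With these operands the unfused form returns zero, because the residual is lost when the product is rounded, while the fused form keeps the residual of -2**-60.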
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/952,858 | 2007-12-07 | ||
US11/952,858 US8106914B2 (en) | 2007-12-07 | 2007-12-07 | Fused multiply-add functional unit |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20090060207A (ko) | 2009-06-11 |
KR101009095B1 (ko) | 2011-01-18 |
Family
ID=40230776
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020080124099A KR101009095B1 (ko) | 2007-12-07 | 2008-12-08 | Graphics processor having a multipurpose double-precision functional unit |
Country Status (7)
Country | Link |
---|---|
US (1) | US8106914B2 (ko) |
JP (2) | JP2009140491A (ko) |
KR (1) | KR101009095B1 (ko) |
CN (1) | CN101452571B (ko) |
DE (1) | DE102008059371B9 (ko) |
GB (1) | GB2455401B (ko) |
TW (1) | TWI402766B (ko) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8051123B1 (en) | 2006-12-15 | 2011-11-01 | Nvidia Corporation | Multipurpose functional unit with double-precision and filtering operations |
US8106914B2 (en) | 2007-12-07 | 2012-01-31 | Nvidia Corporation | Fused multiply-add functional unit |
US8190669B1 (en) | 2004-10-20 | 2012-05-29 | Nvidia Corporation | Multipurpose arithmetic functional unit |
Families Citing this family (61)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8037119B1 (en) | 2006-02-21 | 2011-10-11 | Nvidia Corporation | Multipurpose functional unit with single-precision and double-precision operations |
US8289333B2 (en) | 2008-03-04 | 2012-10-16 | Apple Inc. | Multi-context graphics processing |
US8477143B2 (en) | 2008-03-04 | 2013-07-02 | Apple Inc. | Buffers for display acceleration |
US8633936B2 (en) * | 2008-04-21 | 2014-01-21 | Qualcomm Incorporated | Programmable streaming processor with mixed precision instruction execution |
US8239441B2 (en) * | 2008-05-15 | 2012-08-07 | Oracle America, Inc. | Leading zero estimation modification for unfused rounding catastrophic cancellation |
US8495121B2 (en) * | 2008-11-20 | 2013-07-23 | Advanced Micro Devices, Inc. | Arithmetic processing device and methods thereof |
US20100125621A1 (en) * | 2008-11-20 | 2010-05-20 | Advanced Micro Devices, Inc. | Arithmetic processing device and methods thereof |
KR101511273B1 (ko) * | 2008-12-29 | 2015-04-10 | 삼성전자주식회사 | Method and system for 3D graphics rendering using a multi-core processor |
US8803897B2 (en) * | 2009-09-03 | 2014-08-12 | Advanced Micro Devices, Inc. | Internal, processing-unit memory for general-purpose use |
US8990282B2 (en) * | 2009-09-21 | 2015-03-24 | Arm Limited | Apparatus and method for performing fused multiply add floating point operation |
US8745111B2 (en) | 2010-11-16 | 2014-06-03 | Apple Inc. | Methods and apparatuses for converting floating point representations |
KR101735677B1 (ko) | 2010-11-17 | 2017-05-16 | 삼성전자주식회사 | Floating-point compound arithmetic apparatus and arithmetic method thereof |
US8752064B2 (en) * | 2010-12-14 | 2014-06-10 | Advanced Micro Devices, Inc. | Optimizing communication of system call requests |
US8965945B2 (en) * | 2011-02-17 | 2015-02-24 | Arm Limited | Apparatus and method for performing floating point addition |
DE102011108754A1 (de) * | 2011-07-28 | 2013-01-31 | Khs Gmbh | Inspection unit |
CN102750663A (zh) * | 2011-08-26 | 2012-10-24 | 新奥特(北京)视频技术有限公司 | Method, device, and system for GPU-based geographic information data processing |
US9792087B2 (en) * | 2012-04-20 | 2017-10-17 | Futurewei Technologies, Inc. | System and method for a floating-point format for digital signal processors |
US9110713B2 (en) | 2012-08-30 | 2015-08-18 | Qualcomm Incorporated | Microarchitecture for floating point fused multiply-add with exponent scaling |
US9152382B2 (en) | 2012-10-31 | 2015-10-06 | Intel Corporation | Reducing power consumption in a fused multiply-add (FMA) unit responsive to input data values |
US9665973B2 (en) * | 2012-11-20 | 2017-05-30 | Intel Corporation | Depth buffering |
US9019284B2 (en) | 2012-12-20 | 2015-04-28 | Nvidia Corporation | Input output connector for accessing graphics fixed function units in a software-defined pipeline and a method of operating a pipeline |
US9123128B2 (en) * | 2012-12-21 | 2015-09-01 | Nvidia Corporation | Graphics processing unit employing a standard processing unit and a method of constructing a graphics processing unit |
US9317251B2 (en) | 2012-12-31 | 2016-04-19 | Nvidia Corporation | Efficient correction of normalizer shift amount errors in fused multiply add operations |
GB2511314A (en) | 2013-02-27 | 2014-09-03 | Ibm | Fast fused-multiply-add pipeline |
US9389871B2 (en) | 2013-03-15 | 2016-07-12 | Intel Corporation | Combined floating point multiplier adder with intermediate rounding logic |
US9465578B2 (en) * | 2013-12-13 | 2016-10-11 | Nvidia Corporation | Logic circuitry configurable to perform 32-bit or dual 16-bit floating-point operations |
US10297001B2 (en) * | 2014-12-26 | 2019-05-21 | Intel Corporation | Reduced power implementation of computer instructions |
KR102276910B1 (ko) | 2015-01-06 | 2021-07-13 | 삼성전자주식회사 | Tessellation apparatus and method |
US9952865B2 (en) | 2015-04-04 | 2018-04-24 | Texas Instruments Incorporated | Low energy accelerator processor architecture with short parallel instruction word and non-orthogonal register data file |
US11847427B2 (en) | 2015-04-04 | 2023-12-19 | Texas Instruments Incorporated | Load store circuit with dedicated single or dual bit shift circuit and opcodes for low power accelerator processor |
US9817791B2 (en) | 2015-04-04 | 2017-11-14 | Texas Instruments Incorporated | Low energy accelerator processor architecture with short parallel instruction word |
US10152310B2 (en) * | 2015-05-27 | 2018-12-11 | Nvidia Corporation | Fusing a sequence of operations through subdividing |
US10503474B2 (en) | 2015-12-31 | 2019-12-10 | Texas Instruments Incorporated | Methods and instructions for 32-bit arithmetic support using 16-bit multiply and 32-bit addition |
US10387988B2 (en) * | 2016-02-26 | 2019-08-20 | Google Llc | Compiler techniques for mapping program code to a high performance, power efficient, programmable image processing hardware platform |
US10282169B2 (en) | 2016-04-06 | 2019-05-07 | Apple Inc. | Floating-point multiply-add with down-conversion |
US10157059B2 (en) * | 2016-09-29 | 2018-12-18 | Intel Corporation | Instruction and logic for early underflow detection and rounder bypass |
US10401412B2 (en) | 2016-12-16 | 2019-09-03 | Texas Instruments Incorporated | Line fault signature analysis |
US10275391B2 (en) | 2017-01-23 | 2019-04-30 | International Business Machines Corporation | Combining of several execution units to compute a single wide scalar result |
GB2560766B (en) * | 2017-03-24 | 2019-04-03 | Imagination Tech Ltd | Floating point to fixed point conversion |
US10417734B2 (en) | 2017-04-24 | 2019-09-17 | Intel Corporation | Compute optimization mechanism for deep neural networks |
US10489877B2 (en) | 2017-04-24 | 2019-11-26 | Intel Corporation | Compute optimization mechanism |
US10409614B2 (en) | 2017-04-24 | 2019-09-10 | Intel Corporation | Instructions having support for floating point and integer data types in the same register |
US10417731B2 (en) | 2017-04-24 | 2019-09-17 | Intel Corporation | Compute optimization mechanism for deep neural networks |
US10474458B2 (en) | 2017-04-28 | 2019-11-12 | Intel Corporation | Instructions and logic to perform floating-point and integer operations for machine learning |
US10726514B2 (en) * | 2017-04-28 | 2020-07-28 | Intel Corporation | Compute optimizations for low precision machine learning operations |
CN108595369B (zh) * | 2018-04-28 | 2020-08-25 | 天津芯海创科技有限公司 | Apparatus and method for parallel computation of arithmetic expressions |
US10635439B2 (en) | 2018-06-13 | 2020-04-28 | Samsung Electronics Co., Ltd. | Efficient interface and transport mechanism for binding bindless shader programs to run-time specified graphics pipeline configurations and objects |
CN108958705B (zh) * | 2018-06-26 | 2021-11-12 | 飞腾信息技术有限公司 | Floating-point fused multiply-adder supporting mixed data types and method of applying the same |
US11093579B2 (en) * | 2018-09-05 | 2021-08-17 | Intel Corporation | FP16-S7E8 mixed precision for deep learning and other algorithms |
US11455766B2 (en) * | 2018-09-18 | 2022-09-27 | Advanced Micro Devices, Inc. | Variable precision computing system |
JP7115211B2 (ja) * | 2018-10-18 | 2022-08-09 | 富士通株式会社 | Arithmetic processing device and method of controlling an arithmetic processing device |
AU2020241262A1 (en) | 2019-03-15 | 2021-11-04 | Intel Corporation | Sparse optimizations for a matrix accelerator architecture |
US12013808B2 (en) | 2019-03-15 | 2024-06-18 | Intel Corporation | Multi-tile architecture for graphics operations |
EP4024223A1 (en) | 2019-03-15 | 2022-07-06 | Intel Corporation | Systems and methods for cache optimization |
US11934342B2 (en) | 2019-03-15 | 2024-03-19 | Intel Corporation | Assistance for hardware prefetch in cache access |
US11016765B2 (en) * | 2019-04-29 | 2021-05-25 | Micron Technology, Inc. | Bit string operations using a computing tile |
US10990389B2 (en) * | 2019-04-29 | 2021-04-27 | Micron Technology, Inc. | Bit string operations using a computing tile |
US11907713B2 (en) * | 2019-12-28 | 2024-02-20 | Intel Corporation | Apparatuses, methods, and systems for fused operations using sign modification in a processing element of a configurable spatial accelerator |
US12020349B2 (en) * | 2020-05-01 | 2024-06-25 | Samsung Electronics Co., Ltd. | Methods and apparatus for efficient blending in a graphics pipeline |
CN111610955B (zh) * | 2020-06-28 | 2022-06-03 | 中国人民解放军国防科技大学 | Data saturating-add and packing processing component, chip, and device |
US20220156344A1 (en) | 2020-11-19 | 2022-05-19 | Google Llc | Systolic array cells with output post-processing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0659862A (ja) * | 1992-08-05 | 1994-03-04 | Fujitsu Ltd | Multiplier |
US5778247A (en) | 1996-03-06 | 1998-07-07 | Sun Microsystems, Inc. | Multi-pipeline microprocessor with data precision mode indicator |
KR20010050800A (ko) * | 1999-10-01 | 2001-06-25 | 가나이 쓰토무 | Floating-point instruction set architecture and implementation |
JP2003223316A (ja) | 2002-01-31 | 2003-08-08 | Matsushita Electric Ind Co Ltd | Arithmetic processing device |
Family Cites Families (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5241638A (en) * | 1985-08-12 | 1993-08-31 | Ceridian Corporation | Dual cache memory |
JPS6297060A (ja) | 1985-10-23 | 1987-05-06 | Mitsubishi Electric Corp | Digital signal processor |
US4893268A (en) | 1988-04-15 | 1990-01-09 | Motorola, Inc. | Circuit and method for accumulating partial products of a single, double or mixed precision multiplication |
US4972362A (en) * | 1988-06-17 | 1990-11-20 | Bipolar Integrated Technology, Inc. | Method and apparatus for implementing binary multiplication using booth type multiplication |
US5287511A (en) * | 1988-07-11 | 1994-02-15 | Star Semiconductor Corporation | Architectures and methods for dividing processing tasks into tasks for a programmable real time signal processor and tasks for a decision making microprocessor interfacing therewith |
US4969118A (en) | 1989-01-13 | 1990-11-06 | International Business Machines Corporation | Floating point unit for calculating A=XY+Z having simultaneous multiply and add |
JPH0378083A (ja) * | 1989-08-21 | 1991-04-03 | Hitachi Ltd | Double-precision arithmetic method and multiply-accumulate device |
JPH03100723A (ja) * | 1989-09-13 | 1991-04-25 | Fujitsu Ltd | Processing method for precision conversion instructions |
US5241636A (en) | 1990-02-14 | 1993-08-31 | Intel Corporation | Method for parallel instruction execution in a computer |
US5068816A (en) | 1990-02-16 | 1991-11-26 | Noetzel Andrew S | Interplating memory function evaluation |
DE69129569T2 (de) * | 1990-09-05 | 1999-02-04 | Philips Electronics N.V., Eindhoven | Very long instruction word machine for efficient execution of programs with conditional branches |
JPH0612229A (ja) | 1992-06-10 | 1994-01-21 | Nec Corp | Multiply-accumulate circuit |
EP0576262B1 (en) | 1992-06-25 | 2000-08-23 | Canon Kabushiki Kaisha | Apparatus for multiplying integers of many figures |
US5581778A (en) * | 1992-08-05 | 1996-12-03 | David Sarnoff Research Center | Advanced massively parallel computer using a field of the instruction to selectively enable the profiling counter to increase its value in response to the system clock |
EP0622727A1 (en) | 1993-04-29 | 1994-11-02 | International Business Machines Corporation | System for optimizing argument reduction |
EP0645699A1 (en) * | 1993-09-29 | 1995-03-29 | International Business Machines Corporation | Fast multiply-add instruction sequence in a pipeline floating-point processor |
US5487022A (en) * | 1994-03-08 | 1996-01-23 | Texas Instruments Incorporated | Normalization method for floating point numbers |
US5673407A (en) * | 1994-03-08 | 1997-09-30 | Texas Instruments Incorporated | Data processor having capability to perform both floating point operations and memory access in response to a single instruction |
US5553015A (en) | 1994-04-15 | 1996-09-03 | International Business Machines Corporation | Efficient floating point overflow and underflow detection system |
US5734874A (en) | 1994-04-29 | 1998-03-31 | Sun Microsystems, Inc. | Central processing unit with integrated graphics functions |
JP3493064B2 (ja) | 1994-09-14 | 2004-02-03 | 株式会社東芝 | Barrel shifter |
US5548545A (en) * | 1995-01-19 | 1996-08-20 | Exponential Technology, Inc. | Floating point exception prediction for compound operations and variable precision using an intermediate exponent bus |
US5701405A (en) | 1995-06-21 | 1997-12-23 | Apple Computer, Inc. | Method and apparatus for directly evaluating a parameter interpolation function used in rendering images in a graphics system that uses screen partitioning |
JP3790307B2 (ja) | 1996-10-16 | 2006-06-28 | 株式会社ルネサステクノロジ | Data processor and data processing system |
US6490607B1 (en) | 1998-01-28 | 2002-12-03 | Advanced Micro Devices, Inc. | Shared FP and SIMD 3D multiplier |
US6061781A (en) * | 1998-07-01 | 2000-05-09 | Ip First Llc | Concurrent execution of divide microinstructions in floating point unit and overflow detection microinstructions in integer unit for integer divide |
JP2000081966A (ja) * | 1998-07-09 | 2000-03-21 | Matsushita Electric Ind Co Ltd | Arithmetic device |
JP3600026B2 (ja) | 1998-08-12 | 2004-12-08 | 株式会社東芝 | Floating-point arithmetic unit |
US6317133B1 (en) | 1998-09-18 | 2001-11-13 | Ati Technologies, Inc. | Graphics processor with variable performance characteristics |
US6480872B1 (en) | 1999-01-21 | 2002-11-12 | Sandcraft, Inc. | Floating-point and integer multiply-add and multiply-accumulate |
JP2000293494A (ja) * | 1999-04-09 | 2000-10-20 | Fuji Xerox Co Ltd | Parallel computing device and parallel computing method |
US6198488B1 (en) * | 1999-12-06 | 2001-03-06 | Nvidia | Transform, lighting and rasterization system embodied on a single semiconductor platform |
US6807620B1 (en) * | 2000-02-11 | 2004-10-19 | Sony Computer Entertainment Inc. | Game system with graphics processor |
US6557022B1 (en) | 2000-02-26 | 2003-04-29 | Qualcomm, Incorporated | Digital signal processor with coupled multiply-accumulate units |
US6912557B1 (en) | 2000-06-09 | 2005-06-28 | Cirrus Logic, Inc. | Math coprocessor |
JP2002008060A (ja) * | 2000-06-23 | 2002-01-11 | Hitachi Ltd | Data processing method, recording medium, and data processing device |
US6976043B2 (en) | 2001-07-30 | 2005-12-13 | Ati Technologies Inc. | Technique for approximating functions based on lagrange polynomials |
JP3845009B2 (ja) | 2001-12-28 | 2006-11-15 | 富士通株式会社 | Multiply-accumulate device and multiply-accumulate method |
WO2004015572A1 (en) * | 2002-08-07 | 2004-02-19 | Mmagix Technology Limited | Apparatus, method and system for a synchronicity independent, resource delegating, power and instruction optimizing processor |
US8549501B2 (en) * | 2004-06-07 | 2013-10-01 | International Business Machines Corporation | Framework for generating mixed-mode operations in loop-level simdization |
US7437538B1 (en) * | 2004-06-30 | 2008-10-14 | Sun Microsystems, Inc. | Apparatus and method for reducing execution latency of floating point operations having special case operands |
US7640285B1 (en) | 2004-10-20 | 2009-12-29 | Nvidia Corporation | Multipurpose arithmetic functional unit |
WO2006053173A2 (en) * | 2004-11-10 | 2006-05-18 | Nvidia Corporation | Multipurpose multiply-add functional unit |
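KR20060044124A (ko) * | 2004-11-11 | 2006-05-16 | 삼성전자주식회사 | Graphics system and memory device for 3D graphics acceleration |
JP4571903B2 (ja) * | 2005-12-02 | 2010-10-27 | 富士通株式会社 | Arithmetic processing device, information processing device, and arithmetic processing method |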
US7747842B1 (en) * | 2005-12-19 | 2010-06-29 | Nvidia Corporation | Configurable output buffer ganging for a parallel processor |
US7728841B1 (en) | 2005-12-19 | 2010-06-01 | Nvidia Corporation | Coherent shader output for multiple targets |
US7484076B1 (en) * | 2006-09-18 | 2009-01-27 | Nvidia Corporation | Executing an SIMD instruction requiring P operations on an execution unit that performs Q operations at a time (Q<P) |
US7617384B1 (en) * | 2006-11-06 | 2009-11-10 | Nvidia Corporation | Structured programming control flow using a disable mask in a SIMD architecture |
JP4954799B2 (ja) | 2007-06-05 | 2012-06-20 | 日本発條株式会社 | Shock absorbing device |
US8775777B2 (en) * | 2007-08-15 | 2014-07-08 | Nvidia Corporation | Techniques for sourcing immediate values from a VLIW |
US8106914B2 (en) | 2007-12-07 | 2012-01-31 | Nvidia Corporation | Fused multiply-add functional unit |
2007
- 2007-12-07 US US11/952,858 (US8106914B2): Active
2008
- 2008-11-25 GB GB0821495A (GB2455401B): Active
- 2008-11-27 JP JP2008302713A (JP2009140491A): Pending
- 2008-11-28 DE DE102008059371A (DE102008059371B9): Active
- 2008-12-04 CN CN2008101825044A (CN101452571B): Active
- 2008-12-05 TW TW097147390A (TWI402766B)
- 2008-12-08 KR KR1020080124099A (KR101009095B1): IP Right Grant
2011
- 2011-09-30 JP JP2011217575A (JP2012084142A): Pending
Also Published As
Publication number | Publication date |
---|---|
DE102008059371B9 (de) | 2012-06-06 |
TWI402766B (zh) | 2013-07-21 |
US8106914B2 (en) | 2012-01-31 |
GB0821495D0 (en) | 2008-12-31 |
JP2009140491A (ja) | 2009-06-25 |
CN101452571B (zh) | 2012-04-25 |
KR20090060207A (ko) | 2009-06-11 |
GB2455401A (en) | 2009-06-10 |
CN101452571A (zh) | 2009-06-10 |
TW200937341A (en) | 2009-09-01 |
DE102008059371B4 (de) | 2012-03-08 |
DE102008059371A1 (de) | 2009-06-25 |
JP2012084142A (ja) | 2012-04-26 |
GB2455401B (en) | 2010-05-05 |
US20090150654A1 (en) | 2009-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101009095B1 (ko) | Graphics processor having a multipurpose double-precision functional unit | |
US11797303B2 (en) | Generalized acceleration of matrix multiply accumulate operations | |
US11816482B2 (en) | Generalized acceleration of matrix multiply accumulate operations | |
US7428566B2 (en) | Multipurpose functional unit with multiply-add and format conversion pipeline | |
US7225323B2 (en) | Multi-purpose floating point and integer multiply-add functional unit with multiplication-comparison test addition and exponent pipelines | |
US8037119B1 (en) | Multipurpose functional unit with single-precision and double-precision operations | |
US7724261B2 (en) | Processor having a compare extension of an instruction set architecture | |
KR101515311B1 (ko) | Performing a multiply-multiply-accumulate instruction | |
US20060101244A1 (en) | Multipurpose functional unit with combined integer and floating-point multiply-add pipeline | |
US8051123B1 (en) | Multipurpose functional unit with double-precision and filtering operations | |
CN108076666B (zh) | Reduced power implementation of computer instructions | |
KR100911786B1 (ko) | Multipurpose multiply-add functional unit | |
US7640285B1 (en) | Multipurpose arithmetic functional unit | |
US6732259B1 (en) | Processor having a conditional branch extension of an instruction set architecture | |
US8190669B1 (en) | Multipurpose arithmetic functional unit | |
US7240184B2 (en) | Multipurpose functional unit with multiplication pipeline, addition pipeline, addition pipeline and logical test pipeline capable of performing integer multiply-add operations | |
EP1163591B1 (en) | Processor having a compare extension of an instruction set architecture | |
WO2000048080A9 (en) | Processor having a compare extension of an instruction set architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant | ||
FPAY | Annual fee payment | Payment date: 20131223; Year of fee payment: 4 |
FPAY | Annual fee payment | Payment date: 20141231; Year of fee payment: 5 |
FPAY | Annual fee payment | Payment date: 20160104; Year of fee payment: 6 |
FPAY | Annual fee payment | Payment date: 20170102; Year of fee payment: 7 |
FPAY | Annual fee payment | Payment date: 20180110; Year of fee payment: 8 |
FPAY | Annual fee payment | Payment date: 20190102; Year of fee payment: 9 |
FPAY | Annual fee payment | Payment date: 20200102; Year of fee payment: 10 |