JP2016537726A

JP2016537726A - Vector processing engine with merging circuit between execution unit and vector data memory and associated method

Info

Publication number: JP2016537726A
Application number: JP2016531030A
Authority: JP
Inventors: Raheel Khan (カーン、ラヘール)
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 2013-11-15
Filing date: 2014-11-14
Publication date: 2016-12-01
Anticipated expiration: 2034-11-14
Also published as: JP6339197B2; WO2015073915A1; CN105723333B; US20150143077A1; CN105723333A; US9684509B2; EP3069235A1; KR101781057B1; KR20160085873A

Abstract

A vector processing engine (VPE) is disclosed that employs a merging circuit in the data flow path between the execution units and the vector data memory to provide in-flight merging of output vector data stored to the vector data memory. Related vector processing instructions, systems, and methods are also disclosed. The merging circuit is provided in the data flow path between the execution units and the vector data memory in the VPE. The merging circuit is configured to merge an output vector data sample set from the execution units, produced as a result of performing vector processing operations, in flight while the output vector data sample set is being supplied from the execution units over the output data flow path to the vector data memory to be stored. The merged output vector data sample set is stored in the vector data memory in merged form without requiring additional post-processing steps, which could delay subsequent vector processing operations to be performed in the execution units. [Selected drawing] FIG. 31

Description

Related applications

[0001] This application is related to U.S. Patent Application No. 13/798,641, entitled "VECTOR PROCESSING ENGINES HAVING PROGRAMMABLE DATA PATH CONFIGURATIONS FOR PROVIDING MULTI-MODE VECTOR PROCESSING, AND RELATED VECTOR PROCESSORS, SYSTEMS, AND METHODS," 123249, filed on March 13, 2013, which is incorporated herein by reference in its entirety.

[0002] This application is related to U.S. Patent Application No. 13/798,618, entitled "VECTOR PROCESSING CARRY-SAVE ACCUMULATORS EMPLOYING REDUNDANT CARRY-SAVE FORMAT TO REDUCE CARRY PROPAGATION, AND RELATED VECTOR PROCESSORS, SYSTEMS, AND METHODS," 123248, filed on March 13, 2013, which is incorporated herein by reference in its entirety.

[0003] This application is also related to U.S. Patent Application No. 14/082,075, entitled "VECTOR PROCESSING ENGINES (VPEs) EMPLOYING A TAPPED-DELAY LINE(S) FOR PROVIDING PRECISION FILTER VECTOR PROCESSING OPERATIONS WITH REDUCED SAMPLE RE-FETCHING AND POWER CONSUMPTION, AND RELATED VECTOR PROCESSOR SYSTEMS AND METHODS," 124362, filed on November 15, 2013, which is incorporated herein by reference in its entirety.

[0004] This application is also related to U.S. Patent Application No. 14/082,079, entitled "VECTOR PROCESSING ENGINES (VPEs) EMPLOYING TAPPED-DELAY LINE(S) FOR PROVIDING PRECISION CORRELATION / COVARIANCE VECTOR PROCESSING OPERATIONS WITH REDUCED SAMPLE RE-FETCHING AND POWER CONSUMPTION, AND RELATED VECTOR PROCESSOR SYSTEMS AND METHODS," 124364, filed on November 15, 2013, which is incorporated herein by reference in its entirety.

[0005] This application is also related to U.S. Patent Application No. 14/082,088, entitled "VECTOR PROCESSING ENGINES (VPEs) EMPLOYING FORMAT CONVERSION CIRCUITRY IN DATA FLOW PATHS BETWEEN VECTOR DATA MEMORY AND EXECUTION UNITS TO PROVIDE IN-FLIGHT FORMAT-CONVERTING OF INPUT VECTOR DATA TO EXECUTION UNITS FOR VECTOR PROCESSING OPERATIONS, AND RELATED VECTOR PROCESSOR SYSTEMS AND METHODS," 124365, filed on November 15, 2013, which is incorporated herein by reference in its entirety.

[0006] This application is also related to U.S. Patent Application No. 14/082,081, entitled "VECTOR PROCESSING ENGINES (VPEs) EMPLOYING REORDERING CIRCUITRY IN DATA FLOW PATHS BETWEEN EXECUTION UNITS AND VECTOR DATA MEMORY TO PROVIDE IN-FLIGHT REORDERING OF OUTPUT VECTOR DATA STORED TO VECTOR DATA MEMORY, AND RELATED VECTOR PROCESSOR SYSTEMS AND METHODS," 124450, filed on November 15, 2013, which is incorporated herein by reference in its entirety.

[0007] This application is also related to U.S. Patent Application No. 14/082,067, entitled "VECTOR PROCESSING ENGINES (VPEs) EMPLOYING DESPREADING CIRCUITRY IN DATA FLOW PATHS BETWEEN EXECUTION UNITS AND VECTOR DATA MEMORY TO PROVIDE IN-FLIGHT DESPREADING OF SPREAD-SPECTRUM SEQUENCES, AND RELATED VECTOR PROCESSING INSTRUCTIONS, SYSTEMS, AND METHODS," 124363U2, filed on November 15, 2013, which is incorporated herein by reference in its entirety.

[0008] The field of the disclosure relates to vector processors and related systems for processing vector and scalar operations, including single instruction, multiple data (SIMD) processors and multiple instruction, multiple data (MIMD) processors.

[0009] Wireless computing systems are fast becoming one of the most prevalent technologies in the digital information arena. Advances in technology have resulted in smaller and more powerful wireless communications devices. For example, wireless computing devices commonly include portable wireless telephones, personal digital assistants (PDAs), and paging devices that are small, lightweight, and easily carried by users. More specifically, portable wireless telephones, such as cellular telephones and Internet Protocol (IP) telephones, can communicate voice and data packets over wireless networks. Further, many such wireless communications devices include other types of devices. For example, a wireless telephone may include a digital still camera, a digital video camera, a digital recorder, and/or an audio file player. Wireless telephones can also include a web interface that can be used to access the Internet. Further, wireless communications devices may include complex processing resources for processing high-speed wireless communications data according to designed wireless communications technology standards (e.g., code division multiple access (CDMA), wideband CDMA (WCDMA®), and Long Term Evolution (LTE®)). As such, these wireless communications devices include significant computing capabilities.

[0010] As wireless computing devices become smaller and more powerful, they become increasingly resource constrained. For example, screen size, the amount of available memory and file system space, and the amount of input and output capability may be limited by the small size of the device. Further, battery size, the amount of power provided by the battery, and battery life are also limited. One way to increase the battery life of the device is to design processors that consume less power.

[0011] In this regard, baseband processors that include vector processors may be employed in wireless communications devices. A vector processor has a vector architecture that provides high-level operations that work on vectors, i.e., arrays of data. Vector processing involves fetching a vector instruction once and then executing the vector instruction multiple times across an entire array of data elements, as opposed to executing the vector instruction on one set of data and then re-fetching and decoding the vector instruction for subsequent elements within the vector. This process allows a reduction in the energy required to execute a program because, among other factors, each vector instruction needs to be fetched fewer times. Because vector instructions operate on long vectors over multiple clock cycles at the same time, a high degree of parallelism is achievable with simple in-order vector instruction dispatch.
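The fetch-count argument in the paragraph above can be illustrated with a small counting model. This is only a sketch under stated assumptions: the function names `refetch_per_element` and `fetch_once` are hypothetical, an instruction is modeled as a plain Python callable, and no real instruction set or hardware timing is represented.

```python
def refetch_per_element(data, op):
    """Model of re-fetching and decoding the instruction for every element."""
    fetches = 0
    out = []
    for x in data:
        fetches += 1          # instruction is fetched and decoded again
        out.append(op(x))
    return out, fetches


def fetch_once(data, op):
    """Model of a vector instruction: fetched once, applied to the whole array."""
    fetches = 1               # single fetch and decode for the entire vector
    out = [op(x) for x in data]
    return out, fetches
```

Both functions compute the same result; only the modeled number of instruction fetches differs, which is the energy-related factor the paragraph describes.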

[0012] FIG. 1 illustrates an exemplary baseband processor 10 that may be employed in a computing device, such as a wireless computing device. The baseband processor 10 includes multiple processing engines (PEs) 12, each dedicated to providing function-specific vector processing for specific applications. In this example, six separate PEs 12(0)-12(5) are provided in the baseband processor 10. The PEs 12(0)-12(5) are each configured to provide vector processing for fixed X-bit wide vector data 14 provided from a shared memory 16 to the PEs 12(0)-12(5). For example, the vector data 14 could be 512 bits wide. The vector data 14 can be defined in vector data sample sets 18(0)-18(Y) whose bit widths are smaller multiples of X (e.g., 16-bit and 32-bit sample sets). In this manner, the PEs 12(0)-12(5) can provide vector processing on multiple vector data sample sets 18 provided in parallel to the PEs 12(0)-12(5) to achieve a high degree of parallelism. Each PE 12(0)-12(5) may include a vector register file (VR) for storing the results of a vector instruction processed on the vector data 14.
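As a worked illustration of the widths mentioned above, the sketch below computes how many 16-bit or 32-bit samples fit in a 512-bit vector and splits a bit pattern into sample fields. The helper names are hypothetical, and the least-significant-first sample order is an assumption made for the example, not something the text specifies.

```python
VECTOR_WIDTH_BITS = 512  # example width given in the text


def samples_per_vector(sample_bits, vector_bits=VECTOR_WIDTH_BITS):
    """Number of equal-width samples that fit in one vector."""
    if vector_bits % sample_bits != 0:
        raise ValueError("sample width must evenly divide the vector width")
    return vector_bits // sample_bits


def split_vector(value, sample_bits, vector_bits=VECTOR_WIDTH_BITS):
    """Split an integer bit pattern into sample fields.

    Assumption for illustration: sample 0 occupies the least significant bits.
    """
    mask = (1 << sample_bits) - 1
    count = samples_per_vector(sample_bits, vector_bits)
    return [(value >> (i * sample_bits)) & mask for i in range(count)]
```

With these definitions, a 512-bit vector holds 32 samples of 16 bits or 16 samples of 32 bits.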

[0013] Each PE 12(0)-12(5) in the baseband processor 10 of FIG. 1 includes specific, dedicated circuitry and hardware specifically designed to efficiently perform specific types of fixed operations. For example, the baseband processor 10 of FIG. 1 includes separate WCDMA PEs 12(0), 12(1) and LTE PEs 12(4), 12(5), because WCDMA and LTE involve different types of specialized operations. Thus, by providing separate WCDMA-specific PEs 12(0), 12(1) and LTE-specific PEs 12(4), 12(5), each of the PEs 12(0), 12(1), 12(4), 12(5) can be designed to include specialized, dedicated circuitry specific to frequently performed functions for WCDMA and LTE for highly efficient operation. This design is in contrast to scalar processing engines, which include more general circuitry and hardware designed to be flexible enough to support a larger number of unrelated operations, but in a less efficient manner.

[0014] Some wireless baseband operations require merging of data samples determined from previous processing operations. For example, it may be desirable to accumulate vector data samples of varying widths that are wider than the data paths of the execution units. As another example, it may be desirable to provide dot product multiplication of output vector data samples from different execution units in order to merge output vector data in vector processing operations. Handling the vector data samples in these vector processing operations can involve complex routing that provides data paths crossing vector data lanes. However, because parallelization is difficult for output vector data that must be merged across different vector data lanes, this can increase complexity and reduce the efficiency of the vector processing engine (VPE). A vector processor could also include circuitry that performs post-processing merging of output vector data stored in vector data memory from the execution units. In that case, the output vector data samples stored in vector data memory are fetched from vector data memory, merged as desired, and stored back in vector data memory. However, this post-processing can delay subsequent vector processing operations of the VPE and cause computational components in the execution units to be underutilized.

[0015] Embodiments disclosed herein include vector processing engines (VPEs) that employ a merging circuit in the data flow path between the execution units and the vector data memory to provide in-flight merging of output vector data stored to the vector data memory. Related vector processing instructions, systems, and methods are also disclosed. The merging circuit is provided in the data flow path between the execution units and the vector data memory in the VPE. The merging circuit is configured to merge an output vector data sample set from the execution units, produced as a result of performing vector processing operations, in flight while the output vector data sample set is being supplied from the execution units over the output data flow path to the vector data memory to be stored. In-flight merging of an output vector data sample set means that the desired, programmed output vector data samples in the output vector data sample set provided by the execution units are merged before being stored in vector data memory, so that the output vector data sample set is stored in vector data memory in a merged format. As a non-limiting example, merging of output vector data may include adding an output vector data sample set and an output scalar data sample set to provide a merged output vector data sample set. As another non-limiting example, merging of output vector data sample sets may include generating maximum and/or minimum output vector data between compared output vector data sample sets from the execution units. The merged output vector data sample set is stored in the vector data memory in merged form without requiring additional post-processing steps, which could delay subsequent vector processing operations to be performed in the execution units.
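The two non-limiting merge examples above (addition, and maximum/minimum selection) can be sketched in software as follows. This is an illustrative model only, assuming equal-length sample sets represented as Python lists; the names `merge_add`, `merge_max`, `merge_min`, `VectorDataMemory`, and `store_merged` are hypothetical and do not describe the circuit disclosed in the patent.

```python
def merge_add(sample_sets):
    """Element-wise sum of equal-length output sample sets."""
    return [sum(column) for column in zip(*sample_sets)]


def merge_max(sample_sets):
    """Element-wise maximum between compared output sample sets."""
    return [max(column) for column in zip(*sample_sets)]


def merge_min(sample_sets):
    """Element-wise minimum between compared output sample sets."""
    return [min(column) for column in zip(*sample_sets)]


class VectorDataMemory:
    """Toy stand-in for vector data memory: address -> stored sample set."""

    def __init__(self):
        self.rows = {}

    def store(self, addr, samples):
        self.rows[addr] = list(samples)


def store_merged(memory, addr, sample_sets, merge):
    """Merge on the way to memory; only the merged form is ever stored."""
    merged = merge(sample_sets)
    memory.store(addr, merged)
    return merged
```

In this model the unmerged sample sets never reach `VectorDataMemory`, which mirrors the point that no later fetch, merge, and store-back pass is needed.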

[0016] Thus, the efficiency of the data flow paths in the VPE is not limited by the merging of the output vector data. When output vector data sample sets are to be stored in merged form in vector data memory, subsequent vector processing in the execution units is limited only by computational resources rather than by data flow limitations. The VPE is also configured to provide merged intra-vector output vector data sample sets at the desired destination locations in vector data memory without affecting the efficiency of the computational elements of the execution units.

[0017] In this regard, in one embodiment, a VPE configured to in-flight merge a resultant output vector data sample set generated by at least one execution unit performing a vector processing operation is provided. The VPE comprises at least one vector data file. The vector data file is configured to provide a fetched input vector data sample set in at least one input data flow path for a vector processing operation. The vector data file is also configured to receive at least one merged resultant output vector data sample set from at least one output data flow path to be stored. The VPE also comprises at least one execution unit provided in the at least one input data flow path. The execution unit is configured to receive the input vector data sample set on the at least one input data flow path. The execution unit is also configured to perform the vector processing operation on the input vector data sample set to provide a resultant output vector data sample set on the at least one output data flow path. The VPE also includes at least one merging circuit. The merging circuit is configured to receive the resultant output vector data sample set. The merging circuit is also configured to merge the resultant output vector data sample set to provide at least one merged resultant output vector data sample set, without the resultant output vector data sample set being stored in the at least one vector data file. The merging circuit is also configured to provide the at least one merged resultant output vector data sample set on the at least one output data flow path.

[0018] In another embodiment, a VPE configured to in-flight merge a resultant output vector data sample set generated by at least one execution unit performing a vector processing operation is provided. The VPE comprises at least one vector data file means. The vector data file means comprises a means for providing a fetched input vector data sample set in at least one input data flow path means for a vector processing operation. The vector data file means also comprises a means for receiving at least one merged resultant output vector data sample set from at least one output data flow path means to be stored. The VPE also comprises at least one execution unit means provided in the at least one input data flow path means. The execution unit means comprises a means for receiving the input vector data sample set on the at least one input data flow path means. The execution unit means also comprises an execution means for performing the vector processing operation on the input vector data sample set to provide a resultant output vector data sample set on the at least one input data flow path means.

[0019] Further, the VPE also comprises at least one merging circuit means. The merging circuit means comprises a means for receiving the resultant output vector data sample set on the at least one input data flow path means. The merging circuit means also comprises a merging means for merging the resultant output vector data sample set with a code sequence vector data sample set to provide at least one merged resultant output vector data sample set, without the resultant output vector data sample set being stored in the at least one vector data file means. The merging circuit means also comprises a means for providing the at least one merged resultant output vector data sample set on the at least one output data flow path means.

[0020] In another embodiment, a method of in-flight merging a resultant output vector data sample set generated by at least one execution unit performing a vector processing operation is provided. The method comprises providing an input vector data sample set fetched from at least one vector data file into at least one input data flow path for a vector processing operation. The method also comprises receiving the input vector data sample set on the at least one input data flow path in at least one execution unit provided in the at least one input data flow path. The method also comprises performing the vector processing operation on the input vector data sample set to provide a resultant output vector data sample set on the at least one input data flow path. The method also comprises merging the resultant output vector data sample set to provide at least one merged resultant output vector data sample set, without the resultant output vector data sample set being stored in the at least one vector data file. The method also comprises storing the at least one merged resultant output vector data sample set from at least one output data flow path in the at least one vector data file.
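The sequence of method steps above can be traced with a toy software model that counts accesses to the vector data file. Everything here is a hypothetical sketch: `VectorDataFile`, `add_merge`, and `in_flight_merge_method` are illustrative names, the execution units are modeled as plain Python callables, and element-wise addition is only one possible merge.

```python
class VectorDataFile:
    """Toy vector data file that counts fetches and stores."""

    def __init__(self, rows):
        self.rows = {addr: list(samples) for addr, samples in rows.items()}
        self.reads = 0
        self.writes = 0

    def fetch(self, addr):
        self.reads += 1
        return list(self.rows[addr])

    def store(self, addr, samples):
        self.writes += 1
        self.rows[addr] = list(samples)


def add_merge(sample_sets):
    """Example merge: element-wise sum across the units' output sets."""
    return [sum(column) for column in zip(*sample_sets)]


def in_flight_merge_method(vdf, src, dst, units, merge):
    samples = vdf.fetch(src)                      # provide and receive the input set
    outputs = [unit(samples) for unit in units]   # perform the vector processing operation
    merged = merge(outputs)                       # merge; `outputs` is never stored
    vdf.store(dst, merged)                        # store only the merged set
    return merged
```

For one input set processed by two modeled units, the file sees exactly one fetch and one store, since the unmerged outputs are merged before they reach it.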

A schematic diagram of an exemplary vector processor that includes multiple vector processing engines (VPEs), each dedicated to providing function-specific vector processing for a specific application.

A schematic diagram of an exemplary baseband processor that includes a VPE having programmable data path configurations, such that common circuitry and hardware provided in the VPE can be programmed in multiple modes to perform specific types of vector operations in a highly efficient manner for multiple applications or technologies, without the need to provide separate VPEs.

A schematic diagram of a discrete finite impulse response (FIR) filter that may be provided in a filter vector processing operation supported by a VPE.

A schematic diagram of an exemplary VPE employing tapped-delay lines to receive shifted input vector data sample sets and provide them to execution units to be processed with filter coefficient data, to provide precision filter vector processing operations with reduced re-fetching and power consumption.

A flowchart illustrating an exemplary filter vector processing operation that can be performed in the VPE of FIG. 4 according to an exemplary filter vector instruction.

A schematic diagram of filter tap coefficients stored in a register file in the VPE of FIG. 4.

A schematic diagram of an exemplary input vector data sample set stored in a vector data file in the VPE of FIG. 4.

A schematic diagram illustrating an exemplary tapped-delay line and an optional shadow tapped-delay line that can be provided in the VPE of FIG. 4, each comprising a plurality of pipeline registers for receiving an input vector data sample set from vector data memory and shifted input vector data sample sets, and providing them to the execution units, during a filter vector processing operation performed by the VPE.

A schematic diagram illustrating more exemplary detail of the tapped-delay lines of FIG. 7, showing exemplary detail of the pipeline registers in the data lanes, including intra-lane and inter-lane routing among the pipeline registers for shifting input vector data samples in an input vector data sample set during a filter vector processing operation.

A schematic diagram of an input vector data sample set initially stored in the primary tapped-delay line in the VPE of FIG. 4 as part of a first filter tap execution of an exemplary 8-tap filter vector processing operation.

A schematic diagram of filter tap coefficients stored in the register file, and of a shadow input vector data sample set initially stored in the shadow tapped-delay line in the VPE of FIG. 4, as part of the first filter tap execution of the exemplary 8-tap filter vector processing operation shown in FIG. 9A.

A schematic diagram of the shifted input vector data sample sets stored in the primary tapped-delay line and the shadow tapped-delay line in the VPE of FIG. 4, and of the filter tap coefficients stored in the register file, as part of a second filter tap execution of the exemplary 8-tap filter vector processing operation.

A schematic diagram of the shifted input vector data sample sets stored in the primary tapped-delay line and the shadow tapped-delay line in the VPE of FIG. 4, and of the filter tap coefficients stored in the register file, as part of an eighth filter tap execution of the exemplary 8-tap filter vector processing operation.

A schematic diagram of the contents of the accumulators of the execution units in the VPE of FIG. 4 after the exemplary 8-tap filter vector processing operation has been fully executed.

A schematic diagram of an exemplary VPE employing tapped-delay lines to receive shifted input vector data sample sets and provide them to execution units to be processed with sequence number data, to provide precision correlation/covariance vector processing operations with reduced re-fetching and power consumption.

A flowchart illustrating exemplary correlation/covariance vector processing operations that can be performed in parallel in the VPE of FIG. 11, with interleaved on-time and late input vector data sample sets fetched according to an exemplary correlation/covariance vector processing operation.

A flowchart illustrating exemplary correlation/covariance vector processing operations that can be performed in parallel in the VPE of FIG. 11, with interleaved on-time and late input vector data sample sets fetched according to an exemplary correlation/covariance vector processing operation.

A schematic diagram of a correlation/covariance input vector data sample set stored in a register file in the VPE of FIG. 11.

A schematic diagram illustrating an exemplary tapped-delay line and an optional shadow tapped-delay line that can be provided in the VPE of FIG. 11, each comprising a plurality of pipeline registers for receiving an input vector data sample set from vector data memory and shifted input vector data sample sets, and providing them to the execution units, during a correlation/covariance vector processing operation performed by the VPE.

A schematic diagram of an input vector data sample set from a vector data file initially provided to the primary tapped-delay line in the VPE of FIG. 11 as part of a first processing stage of a correlation/covariance vector processing operation.

A schematic diagram of a shadow input vector data sample set from a vector data file initially provided to the shadow tapped-delay line in the VPE of FIG. 11 as part of the first processing stage of the correlation/covariance vector processing operation.

A schematic diagram of the shifted input vector data sample sets stored in the primary tapped-delay line and the shadow tapped-delay line in the VPE of FIG. 11, and of the shifted input vector data sample set stored in the register file, as part of a second processing stage of the correlation/covariance vector processing operation.

A schematic diagram of the shifted input vector data sample sets stored in the primary tapped-delay line and the shadow tapped-delay line in the VPE of FIG. 11, and of the shifted input vector data sample set stored in the register file, as part of a fourteenth processing stage of the correlation/covariance vector processing operation.

A schematic diagram of the contents of the accumulators of the execution units in the VPE of FIG. 11 after the exemplary correlation/covariance vector processing operation has been fully executed.

A diagram of an exemplary vector data file showing a resultant filter output vector data sample set stored with the real and imaginary components of the resultant filter output vector data samples stored separately.

A diagram of an exemplary vector data file showing a resultant filter output vector data sample set stored with its even and odd resultant filter output vector data samples stored separately.
4 is an exemplary vector data file showing a resulting set of filter output vector data samples stored with their even and odd number of resulting filter output vector data samples stored separately. 符号付き複素数の１６ビットフォーマットでＶＰＥのベクトルデータファイルに記憶されたベクトルデータサンプルセットの例示的なインターリーブされたベクトルデータサンプルの図。FIG. 5 is an illustration of an exemplary interleaved vector data sample of a vector data sample set stored in a VPE vector data file in a signed complex 16-bit format. 符号付き複素数の８ビットフォーマットでＶＰＥのベクトルデータファイルに記憶されたベクトルデータサンプルセットの例示的なインターリーブされたベクトルデータサンプルの図。FIG. 6 is an illustration of an exemplary interleaved vector data sample of a vector data sample set stored in a VPE vector data file in a signed complex 8-bit format. ベクトル処理動作を実行するための少なくとも１つの実行ユニットにフォーマット変換された入力ベクトルデータサンプルセットを供給するために、入力ベクトルデータサンプルセットがベクトルデータファイルから再フェッチされる必要なしに、ベクトルデータファイルと少なくとも１つの実行ユニットとの間の少なくとも１つの入力データフローパスにおいて、入力ベクトルデータサンプルセットのインフライトフォーマット変換を提供するように構成されたフォーマット変換回路を利用する例示的なＶＰＥの概略図。A vector data file without the input vector data sample set having to be refetched from the vector data file to provide a formatted input vector data sample set to at least one execution unit for performing vector processing operations FIG. 4 is a schematic diagram of an example VPE that utilizes a format conversion circuit configured to provide in-flight format conversion of an input vector data sample set in at least one input data flow path between and at least one execution unit. 図１９のＶＰＥにおいて実行され得る、ベクトルデータファイルと少なくとも１つの実行ユニットとの間の少なくとも１つの入力データフローパスにおける入力ベクトルデータサンプルセットの例示的なインフライトフォーマット変換を示すフローチャート。FIG. 20 is a flowchart illustrating an example in-flight format conversion of an input vector data sample set in at least one input data flow path between a vector data file and at least one execution unit that may be performed in the VPE of FIG. 図１９のＶＰＥ内のタップ付き遅延線と実行ユニットとの間に設けられた例示的なフォーマット変換回路の概略図であって、フォーマット変換回路が実行ユニットへの入力データフローパス内のタップ付き遅延線によって供給される入力ベクトルデータサンプルセットのインフライトフォーマット変換を提供するように構成される、概略図。FIG. 
20 is a schematic diagram of an exemplary format conversion circuit provided between the tapped delay line in the VPE of FIG. 19 and the execution unit, where the format conversion circuit is in the tapped delay line in the input data flow path to the execution unit. Schematic configured to provide in-flight format conversion of input vector data sample sets supplied by 実行ユニットにおける受信前に入力データフローパス内で入力ベクトルデータサンプルセットのインフライトフォーマット変換を提供するために、図１９のＶＰＥにプログラミングを提供する例示的なベクトル命令データフォーマットを示す図。FIG. 20 illustrates an exemplary vector instruction data format that provides programming to the VPE of FIG. 19 to provide in-flight format conversion of input vector data sample sets within an input data flow path prior to receipt at an execution unit. 並び替えられた、結果として生じる出力データサンプルセットを供給し記憶するために、結果として生じる出力ベクトルデータサンプルセットが少なくとも１つのベクトルデータファイルに記憶されずに、少なくとも１つの実行ユニットと少なくとも１つのベクトルデータファイルとの間の少なくとも１つの出力データフローパスにおいて、結果として生じる出力ベクトルデータサンプルセットのインフライト並び替えを提供するように構成された並び替え回路を利用する例示的なＶＰＥの概略図。In order to provide and store the rearranged resulting output data sample set, the resulting output vector data sample set is not stored in at least one vector data file, but at least one execution unit and at least one FIG. 4 is a schematic diagram of an example VPE that utilizes a reordering circuit configured to provide in-flight reordering of a resulting output vector data sample set in at least one output data flow path to and from a vector data file. ベクトルデータファイルに並び替えた形式で記憶される図２３のＶＰＥ内のベクトルデータファイルと少なくとも１つの実行ユニットとの間の少なくとも１つの出力データフローパスにおける出力ベクトルデータサンプルセットの例示的なインフライトデインターリービングを示すフローチャート。An exemplary in-flight data set of output vector data sample sets in at least one output data flow path between the vector data file in the VPE of FIG. The flowchart which shows interleaving. 
ベクトルデータファイルに記憶された出力ベクトルデータサンプルセットのインフライト並び替えを提供するために、実行ユニットとベクトルデータファイルとの間の出力データフローパス内の並び替え回路を利用する例示的なＶＰＥの概略図。Overview of an exemplary VPE that utilizes a reordering circuit in the output data flow path between the execution unit and the vector data file to provide in-flight reordering of the output vector data sample set stored in the vector data file. Figure. 図２６Ａは、通信信号を表す例示的なベクトルデータサンプルシーケンスの図である。図２６Ｂは、例示的な符号分割多元接続（ＣＤＭＡ）チップシーケンスの図である。図２６Ｃは、図２６ＢのＣＤＭＡチップシーケンスで拡散された後の図２６Ａのベクトルデータサンプルシーケンスの図である。図２６Ｄは、図２６Ａの元のベクトルデータサンプルシーケンスを復元するために、図２６ＢのＣＤＭＡチップシーケンスで図２６Ｃの拡散されたベクトルデータサンプルシーケンスを逆拡散する図である。FIG. 26A is a diagram of an exemplary vector data sample sequence representing a communication signal. FIG. 26B is a diagram of an exemplary code division multiple access (CDMA) chip sequence. 26C is a diagram of the vector data sample sequence of FIG. 26A after being spread with the CDMA chip sequence of FIG. 26B. FIG. 26D is a diagram for despreading the spread vector data sample sequence of FIG. 26C with the CDMA chip sequence of FIG. 26B to restore the original vector data sample sequence of FIG. 26A. 逆拡散された、結果として生じる出力ベクトルデータサンプルセットを供給し記憶するために、結果として生じる出力ベクトルデータサンプルセットが少なくとも１つのベクトルデータファイルに記憶されずに、少なくとも１つの実行ユニットと少なくとも１つのベクトルデータファイルとの間の少なくとも１つの出力データフローパスにおいて、結果として生じる出力ベクトルデータサンプルセットの逆拡散を提供するように構成された逆拡散回路を利用する例示的なＶＰＥの概略図。In order to provide and store the despread resulting output vector data sample set, the resulting output vector data sample set is not stored in at least one vector data file, but at least one execution unit and at least one FIG. 3 is a schematic diagram of an example VPE that utilizes a despreading circuit configured to provide despreading of a resulting set of output vector data samples in at least one output data flow path between two vector data files. 
少なくとも１つのベクトルデータファイル内に逆拡散された、結果として生じる出力ベクトルデータサンプルセットを供給し記憶するために、図２７のＶＰＥ内の少なくとも１つのベクトルデータファイルと少なくとも１つの実行ユニットとの間の少なくとも１つの出力データフローパスにおける、結果として生じる出力ベクトルデータサンプルセットの例示的な逆拡散を示すフローチャート。Between at least one vector data file and at least one execution unit in the VPE of FIG. 27 to provide and store the resulting output vector data sample set despread in at least one vector data file 6 is a flowchart illustrating an exemplary despreading of the resulting output vector data sample set in at least one output data flow path. 少なくとも１つのベクトルデータファイル内に逆拡散された、結果として生じる出力ベクトルデータサンプルセットを供給し記憶するために、結果として生じる出力ベクトルデータサンプルセットの逆拡散を提供する、図２７のＶＰＥ内の少なくとも１つの実行ユニットと少なくとも１つのベクトルデータファイルとの間の出力データフローパス内の例示的な逆拡散回路の概略図。In the VPE of FIG. 27, providing despreading of the resulting output vector data sample set to provide and store the resulting output vector data sample set despread in at least one vector data file FIG. 3 is a schematic diagram of an exemplary despreading circuit in an output data flow path between at least one execution unit and at least one vector data file. マージされるべき例示的なベクトルデータサンプルとマージされた、結果として生じるベクトルデータサンプルとを示す図。FIG. 4 shows an exemplary vector data sample to be merged and a resulting vector data sample merged. マージされた、結果として生じる出力ベクトルデータサンプルセットを供給し記憶するために、結果として生じる出力ベクトルデータサンプルセットが少なくとも１つのベクトルデータファイルに記憶されずに、少なくとも１つの実行ユニットと少なくとも１つのベクトルデータファイルとの間の少なくとも１つの出力データフローパスにおいて、結果として生じる出力ベクトルデータサンプルセットのマージングを提供するように構成されたマージ回路を利用する例示的なＶＰＥの概略図。In order to provide and store a merged resulting output vector data sample set, the resulting output vector data sample set is not stored in at least one vector data file, but at least one execution unit and at least one FIG. 6 is a schematic diagram of an example VPE that utilizes a merge circuit configured to provide merging of a resulting output vector data sample set in at least one output data flow path to and from a vector data file. 
ベクトルデータファイル内に加算マージされた、結果として生じる出力ベクトルデータサンプルセットを供給し記憶するために、図３１のＶＰＥ内のベクトルデータファイルと少なくとも１つの実行ユニットとの間の少なくとも１つの出力データフローパスにおける、結果として生じる出力ベクトルデータサンプルセットの例示的な加算マージングを示すフローチャート。At least one output data between the vector data file in the VPE of FIG. 31 and at least one execution unit to provide and store the resulting output vector data sample set merged into the vector data file. 6 is a flowchart illustrating exemplary additive merging of a resulting output vector data sample set in a flow path. 結果として生じる出力ベクトルデータサンプルセットの加算マージングと、ベクトルデータファイル内への加算マージされた、結果として生じる出力ベクトルデータサンプルセットの記憶とを提供する、図３１のＶＰＥ内の実行ユニットとベクトルデータファイルとの間の出力データフローパス内の例示的なマージ回路の概略図。The execution unit and vector data in the VPE of FIG. 31 that provides additive merging of the resulting output vector data sample set and storage of the resulting output vector data sample set merged into the vector data file. FIG. 4 is a schematic diagram of an exemplary merge circuit in an output data flow path to and from a file. 結果として生じる出力ベクトルデータサンプルセットの最大／最小マージングと、ベクトルデータファイル内への最大／最小マージされた、結果として生じる出力ベクトルデータサンプルセットの記憶とを提供する、図３１のＶＰＥ内の実行ユニットとベクトルデータファイルとの間の出力データフローパス内の例示的なマージ回路の概略図。Execution in the VPE of FIG. 31 that provides maximum / minimum merging of the resulting output vector data sample set and storage of the resulting output vector data sample set merged into the vector data file. FIG. 3 is a schematic diagram of an exemplary merge circuit in an output data flow path between a unit and a vector data file. ＶＰＥ内に設けられ得る例示的なベクトル処理ステージの概略図であって、ベクトル処理ステージのうちのいくつかがプログラム可能なデータパス構成を有する例示的なベクトル処理ブロックを含む、概略図。FIG. 4 is a schematic diagram of exemplary vector processing stages that may be provided in a VPE, wherein some of the vector processing stages include exemplary vector processing blocks having a programmable data path configuration. 
各々がプログラム可能なデータパス構成を有し、図３５の例示的なＶＰＥ内の様々なベクトル処理ステージ内に設けられる、乗算器ブロックおよび累算器ブロックの例示的なベクトル処理を示すフローチャート。36 is a flow chart illustrating exemplary vector processing of multiplier and accumulator blocks, each having a programmable data path configuration and provided in various vector processing stages within the exemplary VPE of FIG. 図３５のＶＰＥのベクトル処理ステージ内に設けられる複数の乗算器ブロックのより詳細な概略図であって、複数の乗算器ブロックが特定の様々なタイプのベクトル乗算演算を実行するために複数のモードでプログラムされ得るように、複数の乗算器ブロックが各々プログラム可能なデータパス構成を有する、概略図。FIG. 36 is a more detailed schematic diagram of a plurality of multiplier blocks provided within the vector processing stage of the VPE of FIG. 35, wherein the plurality of multiplier blocks perform a plurality of modes for performing certain different types of vector multiplication operations. FIG. 2 is a schematic diagram in which a plurality of multiplier blocks each have a programmable data path configuration such that it can be programmed with ８ビット×８ビットの入力ベクトルデータサンプルセットおよび１６ビット×１６ビットの入力ベクトルデータサンプルセットについての乗算演算を提供するようにプログラムされることが可能なプログラム可能なデータパス構成を有する、図３７の複数の乗算器ブロックの中のある乗算器ブロックの内部構成要素の概略図。FIG. 37 has a programmable data path configuration that can be programmed to provide multiplication operations for an 8 bit × 8 bit input vector data sample set and a 16 bit × 16 bit input vector data sample set. FIG. 2 is a schematic diagram of internal components of a multiplier block among a plurality of multiplier blocks of FIG. 図３８のＶＰＥ内の乗算器ブロックおよび累算器ブロックの一般化された概略図であって、累算器ブロックが桁上げ伝搬を低減するために冗長桁上げ保存フォーマットを利用する桁上げ保存累算器構造を利用する、概略図。FIG. 39 is a generalized schematic diagram of a multiplier block and accumulator block in the VPE of FIG. 38, wherein the accumulator block uses a carry-over save format to reduce carry propagation. Schematic using an arithmetic structure. 図３５のＶＰＥ内に設けられた図３９の累算器ブロックの例示的な内部構成要素の詳細な概略図であって、累算器ブロックが冗長桁上げ保存フォーマットを用いて特定の様々なタイプのベクトル累算演算を実行するために複数のモードでプログラムされ得るように、累算器ブロックがプログラム可能なデータパス構成を有する、概略図。FIG. 40 is a detailed schematic diagram of exemplary internal components of the accumulator block of FIG. 
39 provided within the VPE of FIG. 35, where the accumulator block uses a redundant carry save format to identify various different types. FIG. 6 is a schematic diagram of an accumulator block having a programmable data path configuration so that it can be programmed in multiple modes to perform a vector accumulation operation. 本明細書で開示された実施形態による、ベクトル処理回路とベクトル処理動作とを提供するために、本明細書で開示されたＶＰＥを含むことができるベクトルプロセッサを含むことができる、例示的なプロセッサベースシステムのブロック図。An exemplary processor that can include a vector processor that can include the VPE disclosed herein to provide vector processing circuitry and vector processing operations in accordance with embodiments disclosed herein. The block diagram of a base system.

[0073] With reference now to the drawings, several exemplary embodiments of the present disclosure are described. The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

[0074] Embodiments disclosed herein also include vector processing engines (VPEs) that employ a merging circuit in the data flow path between an execution unit and vector data memory to provide in-flight merging of output vector data stored in the vector data memory. Related vector processing instructions, systems, and methods are also disclosed. The merging circuit is provided in the data flow path between the execution unit and the vector data memory in the VPE. The merging circuit is configured to merge an output vector data sample set from the execution unit, produced as a result of executing a vector processing operation, in flight while the output vector data sample set is being supplied over the output data flow path from the execution unit to the vector data memory to be stored. In-flight merging of the output vector data sample set means that the desired, programmed output vector data samples in the output vector data sample set supplied by the execution unit are merged before being stored in the vector data memory, so that the output vector data sample set is stored in the vector data memory in merged format. As a non-limiting example, merging output vector data may include adding the output vector data sample set and an output scalar data sample set to supply a merged output vector data sample set. As another non-limiting example, merging output vector data sample sets may include generating the maximum and/or minimum output vector data among the compared output vector data sample sets from the execution unit. The merged output vector data sample set is stored in merged form in the vector data memory without requiring additional post-processing steps, which could delay the next vector processing operation to be executed in the execution unit.
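The two merge variants named in paragraph [0074] can be illustrated with a minimal software sketch. This is not the disclosed circuit: the function names (`merge_add`, `merge_max_min`, `store_merged`), the dictionary standing in for vector data memory, and the sample values are all hypothetical, and the additive merge is modeled as a simple element-wise sum, which is only one plausible reading of the paragraph. The point it shows is that the merge happens on the way to memory, so only the merged form is ever stored.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[int]


def merge_add(sample_sets: Sequence[Vector]) -> Vector:
    """Additive merge: element-wise sum of the output sample sets."""
    return [sum(lane) for lane in zip(*sample_sets)]


def merge_max_min(sample_sets: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Max/min merge: element-wise maximum and minimum across the sets."""
    lanes = list(zip(*sample_sets))
    return [max(lane) for lane in lanes], [min(lane) for lane in lanes]


def store_merged(memory: Dict[int, Vector], address: int,
                 sample_sets: Sequence[Vector]) -> None:
    """Merge while writing, so the unmerged sets never reach memory."""
    memory[address] = merge_add(sample_sets)


# Two hypothetical output sample sets produced by execution units.
out_a = [1, -4, 7, 2]
out_b = [3, 5, -2, 2]

vector_memory: Dict[int, Vector] = {}  # stand-in for vector data memory
store_merged(vector_memory, 0x10, [out_a, out_b])
merged_max, merged_min = merge_max_min([out_a, out_b])
```

In this model no separate read-modify-write pass over memory is needed after the store, which mirrors the "no additional post-processing steps" property described above.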

[0075] Thus, the efficiency of the data flow paths in the VPE is not limited by the merging of output vector data. When an output vector data sample set is to be stored in merged form in the vector data memory, subsequent vector processing in the execution unit is limited only by computational resources rather than by data flow limitations. The VPE is also configured to supply the merged in-vector output vector data sample set to the desired destination location in the vector data memory without affecting the efficiency of the compute elements of the execution unit.

[0076] In this regard, FIG. 2 is a schematic diagram of a baseband processor 20 that includes an exemplary vector processing unit 22, also referred to as a vector processing engine (VPE) 22. As described in more detail below, the VPE 22 includes execution units 84 and other particular exemplary circuitry and functionality to provide vector processing operations, including the exemplary vector processing operations disclosed herein. The baseband processor 20 and its VPE 22 can be provided in a semiconductor die 24. In this embodiment, as described in more detail below, the baseband processor 20 includes a common VPE 22 that includes programmable data paths 26 that can be programmed to provide different programmable data path configurations. In this manner, the programmable data paths 26 between the execution units 84 and the vector data files 82 in the VPE 22 can be programmed and reprogrammed to provide different, specific types of vector processing operations in different operation modes, without the need to provide separate VPEs 22 in the baseband processor 20.

[0077] Before discussing, starting with FIG. 3, the particular circuitry and vector processing operations that the VPE 22 in this disclosure is configured to provide for efficient processing, the components of the baseband processor 20 in FIG. 2 are first described. The baseband processor 20 in this non-limiting example is a 512-bit vector processor. The baseband processor 20 includes components in addition to the VPE 22 to support the VPE 22 in providing vector processing in the baseband processor 20. The baseband processor 20 includes vector registers, also known as vector data files 82, that are configured to receive and store vector data 30 from a vector unit data memory (LMEM) 32. For example, the vector data 30 is X bits wide, where "X" is defined according to design choice (e.g., 512 bits). The vector data 30 may be divided into vector data sample sets 34. As a non-limiting example, the vector data 30 may be 256 bits wide and may comprise smaller vector data sample sets 34(Y)-34(0). Some of the vector data sample sets 34(Y)-34(0) may be 16 bits wide, as an example, and others of the vector data sample sets 34(Y)-34(0) may be 32 bits wide. The VPE 22 is capable of providing vector processing on certain selected vector data sample sets 34(Y)-34(0) supplied in parallel to the VPE 22 to achieve a high degree of parallelism. The vector data files 82 are also configured to store results generated when the VPE 22 processes the vector data 30. In certain embodiments, the VPE 22 is configured not to store intermediate vector processing results in the vector data files 82, which reduces register writes and provides faster vector instruction execution times. This contrasts with scalar instructions executed by a scalar processing engine, such as a scalar processing digital signal processor (DSP), that stores intermediate results in registers.
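To make the division of vector data into sample sets concrete, the following sketch splits one 512-bit word into 16-bit or 32-bit samples. It is illustrative only: `split_vector` is a hypothetical helper, the word is modeled as a Python integer, and placing sample 0 in the least-significant bits is an assumption, since the text above does not specify lane ordering.

```python
from typing import List


def split_vector(vector: int, total_bits: int, sample_bits: int) -> List[int]:
    """Split a total_bits-wide word into unsigned sample_bits-wide samples.

    Assumption: sample 0 occupies the least-significant bits.
    """
    if total_bits % sample_bits != 0:
        raise ValueError("sample width must divide the vector width")
    mask = (1 << sample_bits) - 1
    count = total_bits // sample_bits
    return [(vector >> (i * sample_bits)) & mask for i in range(count)]


# A 512-bit test pattern whose bytes are 0x00, 0x01, ..., 0x3F.
vector_512 = int.from_bytes(bytes(range(64)), "little")

samples_16 = split_vector(vector_512, 512, 16)  # 32 samples of 16 bits
samples_32 = split_vector(vector_512, 512, 32)  # 16 samples of 32 bits
```

The same 512-bit word therefore yields 32 parallel lanes at 16 bits or 16 lanes at 32 bits, which is the trade-off between sample width and parallelism that the paragraph describes.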

[0078] The baseband processor 20 in FIG. 2 also includes condition registers 36 configured to supply conditions to the VPE 22 for use in conditional execution of vector instructions and to store updated conditions as a result of vector instruction execution. The baseband processor 20 also includes accumulate registers 38, a global register file 40 that includes global registers, and address registers 42. The accumulate registers 38 are configured to be used by the VPE 22 to store accumulated results produced by executing certain specialized operations on the vector data 30. The global register file 40 is configured to store scalar operands for certain vector instructions supported by the VPE 22. The address registers 42 are configured to store addresses addressable by vector load and store instructions supported by the VPE 22, which retrieve the vector data 30 from the vector unit data memory 32 and store vector processing results in the vector unit data memory 32.

[0079] With continuing reference to FIG. 2, the baseband processor 20 in this embodiment also includes a scalar processor 44 (also referred to as an "integer unit") to provide scalar processing in the baseband processor 20 in addition to the vector processing provided by the VPE 22. It may be desirable to provide a central processing unit (CPU) configured to support both vector and scalar instruction operations, based on the type of instruction executed, for highly efficient operation. In this embodiment, the scalar processor 44 is a 32-bit reduced instruction set computing (RISC) scalar processor, as a non-limiting example. The scalar processor 44 in this example includes an arithmetic logic unit (ALU) 46 to support scalar instruction processing. The baseband processor 20 includes an instruction dispatch circuit 48 configured to fetch instructions from program memory 50, decode the fetched instructions, and direct the fetched instructions either to the scalar processor 44 or, through a vector data path 53, to the VPE 22, based on instruction type. The scalar processor 44 includes general purpose registers 54 for use by the scalar processor 44 when executing scalar instructions. An integer unit data memory (DMEM) 56 is included in the baseband processor 20 to supply data from main memory into the general purpose registers 54 for access by the scalar processor 44 during scalar instruction execution. The DMEM 56 may be cache memory, as a non-limiting example. The baseband processor 20 also includes a memory controller 58 that includes memory controller registers 60 configured to receive memory addresses from the general purpose registers 54, over a memory controller data path 62, when the scalar processor 44 is executing vector instructions that request access to main memory.
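The routing role of the instruction dispatch circuit can be modeled in a few lines. This is a toy model under stated assumptions, not the disclosed hardware: the class and field names, the two queues, and the opcodes `add`, `vfir`, and `load` are hypothetical, and decoding is reduced to reading a pre-tagged instruction type.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class InstrType(Enum):
    SCALAR = auto()
    VECTOR = auto()


@dataclass
class Instruction:
    opcode: str      # hypothetical mnemonic, for illustration only
    kind: InstrType  # result of decoding, assumed already known here


class Dispatcher:
    """Routes each decoded instruction to the scalar unit or to the VPE."""

    def __init__(self) -> None:
        self.scalar_queue: List[Instruction] = []
        self.vector_queue: List[Instruction] = []

    def dispatch(self, instr: Instruction) -> str:
        if instr.kind is InstrType.VECTOR:
            self.vector_queue.append(instr)  # goes down the vector path
            return "VPE"
        self.scalar_queue.append(instr)      # handled by the scalar processor
        return "scalar"


program = [
    Instruction("add", InstrType.SCALAR),
    Instruction("vfir", InstrType.VECTOR),
    Instruction("load", InstrType.SCALAR),
]
dispatcher = Dispatcher()
routes = [dispatcher.dispatch(instr) for instr in program]
```

A single instruction stream is thus split by type, which is what lets one processor serve both scalar control code and vector signal-processing kernels.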

[0080] One type of specialized vector processing operation that may be desirable to support through vector instruction processing by the VPE 22 is filtering. A filter operation computes a quantized time-domain representation of the convolution of a sampled input time function and a representation of the weighting function of the filter. Convolution in the time domain corresponds to multiplication in the frequency domain. Thus, digital filters can be realized in the VPE 22 by an extended sequence of multiplications and additions carried out at uniformly spaced sample intervals. For example, a discrete finite impulse response (FIR) filter can be implemented using a finite number (Y) of delay taps on a delay line, with "Y" computed filter coefficients, to compute the filter function.

[0081] In this regard, FIG. 3 is a schematic diagram of an exemplary discrete FIR filter 64 that may be desirable to support through filter vector processing operations in the VPE 22 in FIG. 2. A digitized input signal 66 (x[n]) can be filtered by passing digitized input signal samples (x[0], x[1], ..., x[n]) through a delay structure called "filter delay taps" 68(1)-68(Y-1). The filter delay taps 68(1)-68(Y-1) shift the clocked digitized input signal samples (i.e., x[0], x[1], ..., x[n]) into multipliers 70(0)-70(Y-1), so that every digitized input signal sample is multiplied by a filter coefficient (h(0) to h(Y-1)) (i.e., h(l)*x[n-l]) to provide filter sample multiplicands 72(0)-72(Y-1). The filter sample multiplicands 72(0)-72(Y-1) are summed together by summers (i.e., adders) 74(1)-74(Y-1) to provide a resulting filtered output signal 76 (i.e., y[n]). Thus, the discrete FIR filter 64 in FIG. 3 can be summarized as follows:

$$y[n] = \sum_{l=0}^{Y-1} h(l)\, x[n-l]$$
where:
n is the number of input signal samples;
x[n] is the digitized input signal 66;
y[n] is the resulting filtered output signal 76;
h(l) are the filter coefficients; and
Y is the number of filter coefficients.
The filter coefficients h(l) can be complex. In one aspect, the VPE 22 can receive the filter coefficients (e.g., from the global register file 40). The VPE 22 can use the received filter coefficients directly to perform the FIR filter function, in which case the filter coefficients h(l) in the equation above represent the received filter coefficients. Alternatively, the VPE 22 can compute the complex conjugates of the received filter coefficients before using them to perform the FIR filter function, in which case the filter coefficients h(l) in the equation above represent the conjugates of the received filter coefficients.
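The FIR relationship defined above, y[n] as the sum over l from 0 to Y-1 of h(l)*x[n-l], can be checked with a short reference implementation. This is a plain scalar sketch, not a model of the VPE: `fir_filter` and its `conjugate` flag are hypothetical names, and treating samples before x[0] as zero is an assumption made here so that the first outputs are defined.

```python
from typing import List, Sequence


def fir_filter(x: Sequence[complex], h: Sequence[complex],
               conjugate: bool = False) -> List[complex]:
    """Direct-form FIR: y[n] = sum over l of h(l) * x[n - l].

    Assumption: input samples before x[0] are zero.
    If conjugate is True, the conjugated coefficients are used instead.
    """
    coeffs = [complex(c).conjugate() for c in h] if conjugate else list(h)
    y = []
    for n in range(len(x)):
        acc = 0
        for l, c in enumerate(coeffs):
            if n - l >= 0:          # skip taps that reach before x[0]
                acc += c * x[n - l]
        y.append(acc)
    return y


# Two-tap real example: y[n] = 1.0*x[n] + 0.5*x[n-1]
y_real = fir_filter([1, 2, 3, 4], [1, 0.5])

# One-tap complex example using the conjugated coefficient (-1j instead of 1j)
y_conj = fir_filter([1, 1j], [1j], conjugate=True)
```

The `conjugate` flag corresponds to the alternative described above, where the received coefficients are conjugated before use; with it disabled the received coefficients are applied directly.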

[0082] The above discrete FIR filter 64 of FIG. 3 can be rewritten as follows, shown here for eight filter coefficients (Y = 8):
y[n] = x[n] * h0 + x[n-1] * h1 + ... + x[n-7] * h7
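The expanded sum above can be sketched in a few lines of Python. This is only an illustrative scalar model, not the VPE data path: the function name `fir_filter` is hypothetical, and samples before x[0] are assumed to be zero.

```python
def fir_filter(x, h):
    """Direct-form FIR: y[n] = sum over l of h[l] * x[n - l]."""
    y = []
    for n in range(len(x)):
        acc = 0
        for l, coeff in enumerate(h):
            if n - l >= 0:  # assumption: samples before x[0] are treated as zero
                acc += coeff * x[n - l]
        y.append(acc)
    return y


# Two-tap example: each output is the sum of the current and previous sample.
print(fir_filter([1, 2, 3, 4], [1, 1]))  # [1, 3, 5, 7]
```

Because the arithmetic uses ordinary Python numbers, the same sketch also accepts complex coefficients, which the description above allows for h(l).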

[0083] However, filtering operations such as the discrete FIR filter 64 of FIG. 3 can be difficult to parallelize in vector processors because of the specialized data flow paths provided in the vector processor. As the input vector data sample set to be filtered (e.g., the vectorized digitized input signal 66) is shifted between the filter delay taps (e.g., 68(1)-68(Y-1)), the input vector data sample set is re-fetched from the vector data file, which increases power consumption and reduces throughput. To minimize re-fetching of the input vector data sample set from the vector data file, the data flow path in the vector processor could be configured to provide the same number of multipliers (e.g., 70(0)-70(Y-1)) as filter delay taps (e.g., 68(1)-68(Y-1)) for efficient parallelized processing. However, other vector processing operations may require fewer multipliers, so the multipliers in the data flow path would scale inefficiently and be underutilized. If the number of multipliers is instead reduced below the number of filter delay taps to provide scalability, parallelism is limited because more re-fetches from memory are needed to obtain the same input vector data sample set for the different phases of the filter processing.

[0084] In this regard, FIG. 4 is a schematic diagram of an exemplary VPE 22(1) that can be provided as the VPE 22 of FIG. 2. As described in more detail below, the VPE 22(1) of FIG. 4 provides precision filter vector processing operations in which re-fetching of vector data samples is eliminated or reduced and power consumption is reduced. These precision filter vector processing operations contrast with filter vector processing operations that require storage of intermediate results, which in turn requires re-fetching of vector data samples and increases power consumption. To eliminate or minimize re-fetching of input vector data samples from the vector data files, thereby reducing power consumption and improving processing efficiency, a tapped delay line 78 is included in the input data flow paths 80(0)-80(X) between the vector data files 82(0)-82(X) and the execution units 84(0)-84(X) (also labeled "EU") in the VPE 22(1). "X"+1 is the maximum number of parallel input data lanes provided in the VPE 22(1) for processing vector data samples in this example. The tapped delay line 78 is configured to receive the input vector data sample set 86(0)-86(X) on tapped delay line inputs 88(0)-88(X), as a subset or all of the input vector data samples 86 of the input vector data sample set 86(0)-86(X), from a corresponding subset or all of the vector data files 82(0)-82(X). The input vector data sample set 86(0)-86(X) is composed of "X+1" input vector data samples 86, which in this example are 86(0), 86(1), ..., and 86(X).

[0085] With continuing reference to FIG. 4, the tapped delay line 78 stores the input vector data sample set 86(0)-86(X) fetched from the vector data files 82(0)-82(X) to be processed by the execution units 84(0)-84(X) for a filter vector processing operation. As described in more detail below with regard to FIGS. 6 and 7, the tapped delay line 78 is configured to shift the input vector data sample set 86(0)-86(X) for each filter delay tap (i.e., filter processing stage) of the filter vector processing operation, according to the filter vector instruction to be executed by the VPE 22(1), to provide a shifted input vector data sample set 86S(0)-86S(X) to the execution units 84(0)-84(X). All of the shifted input vector data samples 86S together make up the shifted input vector data sample set 86S(0)-86S(X). The tapped delay line 78 provides the shifted input vector data samples 86S(0)-86S(X) to execution unit inputs 90(0)-90(X) of the execution units 84(0)-84(X) during the filter vector processing operation. In this manner, intermediate filter results based on the operations performed on the shifted input vector data sample set 86S(0)-86S(X) for the filter taps do not have to be stored, shifted, and re-fetched from the vector data files 82(0)-82(X) during each processing stage of the filter vector processing operation performed by the VPE 22(1). The tapped delay line 78 can thus reduce power consumption and increase processing efficiency for filter vector processing operations performed by the VPE 22(1).
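The load-once, shift-per-tap behavior described above can be illustrated with a toy software model. This is a sketch under stated assumptions, not the circuit itself: the class name `TappedDelayLine`, the `fetches` counter, the shift direction, and the zero fill at the last lane are all hypothetical choices made for illustration.

```python
class TappedDelayLine:
    """Toy model: one register per lane, loaded once, then shifted once per filter tap."""

    def __init__(self, num_lanes):
        self.regs = [0] * num_lanes
        self.fetches = 0  # counts loads from the (modelled) vector data files

    def load(self, sample_set):
        if len(sample_set) != len(self.regs):
            raise ValueError("sample set must fill every lane")
        self.regs = list(sample_set)
        self.fetches += 1

    def shift(self, fill=0):
        # Each lane takes its neighbour's sample; `fill` enters the last lane (assumption).
        self.regs = self.regs[1:] + [fill]

    def outputs(self):
        return list(self.regs)


line = TappedDelayLine(num_lanes=4)
line.load([10, 11, 12, 13])         # a single fetch serves every tap
per_tap = []
for tap in range(3):
    per_tap.append(line.outputs())  # sample set presented to the execution units
    line.shift()

print(per_tap)       # [[10, 11, 12, 13], [11, 12, 13, 0], [12, 13, 0, 0]]
print(line.fetches)  # 1
```

The point of the model is the fetch count: three taps are served from one load, whereas a design without the delay line would return to the vector data files for each tap.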

[0086] A processing stage in the VPE 22(1), also referred to as a "vector processing stage," comprises circuitry and associated vector data paths designed to carry out a specific task or operation. A vector processing operation may be executed by the VPE 22(1) in several different processing stages. Each processing stage may be performed over one or more clock cycles of the VPE 22(1). Consequently, a vector processing operation in the VPE 22(1) may take many clock cycles to complete, since each of its processing stages may consume one or more clock cycles. For example, a processing stage may include fetching the input vector data sample set 86(0)-86(X) into the tapped delay line 78 in the VPE 22(1) of FIG. 4. The vector processing stages in the VPE 22(1) can be pipelined.

[0087] The execution units 84(0)-84(X) may include one or more pipeline stages that process the fetched input vector data sample set 86(0)-86(X). For example, one pipeline stage in the execution units 84(0)-84(X) may be an accumulation stage composed of accumulators configured to perform accumulation operations. As another example, another pipeline stage in the execution units 84(0)-84(X) may be a multiplication stage composed of multipliers configured to perform multiplication operations.

[0088] With continuing reference to FIG. 4, the execution units 84(0)-84(X) receive a filter coefficient 92 from among the filter coefficients 92(0)-92(Y-1) stored in the global register file 40 of FIG. 2 for the filter vector processing operation, where "Y" can be equal to the number of filter coefficients for the filter vector processing operation. During each processing stage of the filter vector processing operation, the execution units 84(0)-84(X) are each configured to multiply one of the received filter coefficients 92(0), 92(1), ... 92(Y-1) by a shifted input vector data sample 86S(0), 86S(1), ... 86S(X) of the shifted input vector data sample set 86S(0)-86S(X), to provide intermediate filter vector data output samples within the execution units 84(0)-84(X). The intermediate filter vector data output sample sets are accumulated in each of the execution units 84(0)-84(X) (i.e., the previously accumulated filter output vector data sample is added to the current filter output vector data sample). This yields a final, resultant filter output vector data sample set 94(0)-94(X), provided by the execution units 84(0)-84(X) on execution unit outputs 96(0)-96(X) on output data flow paths 98(0)-98(X), respectively, for each shifted input vector data sample 86S(0), 86S(1), ... 86S(X) in the shifted input vector data sample set 86S(0)-86S(X). The resultant filter output vector data sample set 94(0)-94(X) is composed of "X+1" resultant filter output vector data samples 94, which in this example are 94(0), 94(1), ..., and 94(X). The resultant filter output vector data sample set 94(0)-94(X) is stored back in the respective vector data files 82(0)-82(X) for further use and/or processing by the VPE 22(1), without the intermediate filter vector data output sample sets generated by the execution units 84(0)-84(X) having to be stored and shifted.
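The per-lane multiply and accumulate described above can be sketched as follows. This is a simplified software model, assuming one coefficient is broadcast to all lanes per filter tap and that each lane keeps a private accumulator; the function name `filter_vector_operation` and the example values are hypothetical.

```python
def filter_vector_operation(shifted_sets, coeffs):
    """Accumulate coeffs[t] * shifted_sets[t][lane] over all taps t, per lane.

    shifted_sets[t][lane] is the sample lane `lane` sees at tap t;
    coeffs[t] is the coefficient applied at that tap.
    """
    num_lanes = len(shifted_sets[0])
    acc = [0] * num_lanes  # one accumulator per execution unit
    for samples, coeff in zip(shifted_sets, coeffs):
        for lane in range(num_lanes):
            # Multiply stage followed by accumulate stage; nothing is written
            # back to the vector data files between taps.
            acc[lane] += coeff * samples[lane]
    return acc


shifted = [
    [1, 2, 3, 4],  # tap 0: unshifted sample set
    [2, 3, 4, 5],  # tap 1: shifted by one sample
    [3, 4, 5, 6],  # tap 2: shifted by two samples
]
print(filter_vector_operation(shifted, [1, 10, 100]))  # [321, 432, 543, 654]
```

Only the final accumulator contents correspond to the resultant output sample set; the intermediate sums never leave the accumulators in this model.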

[0089] With continuing reference to FIG. 4, and as described in more detail below, the tapped delay line 78 is programmable so that it is controlled according to the vector instruction being processed. If a filter vector instruction is not being processed, the tapped delay line 78 can be programmed so that it is not included in the input data flow paths 80(0)-80(X) between the vector data files 82(0)-82(X) and the execution units 84(0)-84(X). In this embodiment, the tapped delay line 78 is configured to load and shift the input vector data sample set 86(0)-86(X) received from the vector data files 82(0)-82(X) to provide a shifted input vector data sample set 86S(0)-86S(X) for each filter tap of the filter vector processing operation. The shifted input vector data sample set 86S(0)-86S(X) can thus be provided to the execution units 84(0)-84(X) for execution of a filter tap of the filter vector processing operation. Without the tapped delay line 78, a separate shifting process would have to be performed to provide the shifted intermediate input vector data sample set to the execution units 84(0)-84(X) again for the next filter tap of the filter vector processing operation, increasing latency and consuming additional power. Moreover, without the tapped delay line 78, the efficiency of the input data flow paths 80(0)-80(X) and the output data flow paths 98(0)-98(X) in the VPE 22(1) would be limited by the delay of re-fetching the shifted input vector data sample set 86S(0)-86S(X) from the vector data files 82(0)-82(X) during the filter vector processing operation.

[0090] The shifted input vector data sample set 86S(0)-86S(X) is provided by the tapped delay line 78, which is local to the execution units 84(0)-84(X). Vector processing in the execution units 84(0)-84(X) is therefore limited only by computational resources rather than by data flow limitations. This means that the execution units 84(0)-84(X) are continuously, or substantially continuously, kept busy receiving shifted input vector data sample sets 86S(0)-86S(X) for performing vector processing operations, without having to wait for the shifted input vector data sample set 86S(0)-86S(X) to be fetched from the vector data files 82(0)-82(X).

[0091] Further, the filter vector processing operations performed by the VPE 22(1) of FIG. 4 can be more precise by employing the tapped delay line 78, because the output accumulations for intermediate filter processing stages in the execution units 84(0)-84(X) do not have to be stored in the vector data files 82(0)-82(X). Storing intermediate output vector data sample sets from the execution units 84(0)-84(X) in the vector data files 82(0)-82(X) may introduce rounding. When the next intermediate output vector data sample set is then provided to the execution units 84(0)-84(X) for the vector processing operation, any rounding error is propagated and added during each multiplication phase of the vector processing operation. In contrast, in the example of the VPE 22(1) of FIG. 4, the intermediate output vector data sample sets calculated by the execution units 84(0)-84(X) do not have to be stored in the vector data files 82(0)-82(X). The execution units 84(0)-84(X) can accumulate the previous intermediate output vector data sample set with the intermediate output vector data sample set for the next filter delay tap, because the tapped delay line 78 provides the shifted input vector data sample sets 86S(0)-86S(X) to the execution units 84(0)-84(X) during the vector processing operation, and each result is accumulated with the previous vector data sample set for the previous filter delay tap.
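The rounding effect described above can be illustrated numerically. The sketch below is a hypothetical integer model, not the VPE's actual number format: it assumes that storing an intermediate result drops a fixed number of low-order bits with round-half-up, and the names `round_to_storage` and `accumulate` as well as the product values are made up for illustration.

```python
def round_to_storage(value, drop_bits):
    """Round a wide accumulator value to a coarser stored precision (round half up)."""
    return ((value + (1 << (drop_bits - 1))) >> drop_bits) << drop_bits


def accumulate(products, drop_bits=None):
    """Sum products; if drop_bits is given, model a store and reload after every tap."""
    acc = 0
    for p in products:
        acc += p
        if drop_bits is not None:
            acc = round_to_storage(acc, drop_bits)
    return acc


products = [5, 5, 5, 5]                      # hypothetical per-tap products
exact = accumulate(products)                 # accumulator kept at full width
stored = accumulate(products, drop_bits=3)   # intermediate rounded after each tap
final_only = round_to_storage(exact, 3)      # rounded once, when the result is stored

print(exact, stored, final_only)  # 20 32 24
```

In this toy case, rounding after every tap drifts to 32, while rounding once at the end gives 24 for an exact sum of 20, which is the kind of error growth that keeping the accumulation inside the execution units avoids.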

[0092] With continuing reference to FIG. 4, the VPE 22(1) in this embodiment is composed of a plurality of vector data lanes (labeled VLANE0 100(0) through VLANEX 100(X)) for parallelized processing. Each vector data lane 100(0)-100(X) contains a vector data file 82 and an execution unit 84 in this embodiment. Taking vector data lane 100(0) as an example, the vector data file 82(0) therein is configured to provide the input vector data sample 86(0) on the input data flow path 80(0) to be received by the execution unit 84(0) for filter vector processing. As discussed above, the tapped delay line 78 is provided in the input data flow path 80(0) to shift the input vector data sample 86(0) and provide the shifted input vector data sample 86S(0) to the execution unit 84(0) for filter vector processing. The vector data file 82(0) is also configured to receive, from the output data flow path 98(0), the resultant filter output vector data sample 94(0) provided by the execution unit 84(0) as a result of filter vector processing, to be stored back in the vector data file 82(0) for a subsequent vector processing operation as needed or desired, according to the current or next vector instruction to be processed by the VPE 22(1).

[0093] Any number of vector data lanes 100(0)-100(X) may be provided in the VPE 22(1) as desired. The number of vector data lanes 100(0)-100(X) provided in the VPE 22(1) may be based on a tradeoff between parallelized vector processing for efficiency and the additional circuitry, space, and power consumption involved in providing more vector data lanes 100(0)-100(X). As one non-limiting example, sixteen vector data lanes 100 may be provided in the VPE 22(1), each vector data lane 100 having a data width capability of thirty-two (32) bits, to provide parallelized processing of up to 512 bits of vector data in the VPE 22(1).

[0094] With continuing reference to FIG. 4, and using the vector data file 82(0) in vector data lane 100(0) as an example that applies to all vector data files 82(0)-82(X), the vector data file 82(0) allows one or more samples of the input vector data sample 86(0) to be stored for vector processing. The width of the input vector data sample 86(0) is set by the programming of the input vector data sample 86(0) according to the particular vector instruction being executed by the VPE 22(1). The width of the input data flow path 80(0) is programmable and reprogrammable on a vector-instruction-by-vector-instruction basis, including on a clock-cycle-by-clock-cycle basis for a given vector instruction, to provide input vector data samples 86(0) of different widths to the tapped delay line 78 and the execution unit 84(0). In this manner, the vector data lane 100(0) can be programmed and reprogrammed to process input vector data samples 86(0) of different widths depending on the type of vector instruction being executed.

[0095] For example, the vector data file 82(0) may be thirty-two (32) bits wide and capable of storing input vector data samples 86 that are likewise up to thirty-two (32) bits wide. The input vector data sample 86(0) may consume the entire width of the vector data file 82(0) (e.g., 32 bits), or may be provided in sample sizes smaller than the width of the vector data file 82(0). The size of the input vector data sample 86(0) can be configured by programming the configuration of the input data flow path 80(0) for that sample size, based on the vector instruction being executed by the VPE 22(1). For example, the input vector data sample 86(0) may comprise two separate 16-bit vector data samples for one vector instruction. As another example, the input vector data sample 86(0) may comprise four 8-bit vector data samples in the vector data file 82(0) for another vector instruction, as opposed to one 32-bit vector data sample. In yet another example, the input vector data sample 86(0) may comprise one 32-bit vector data sample. The VPE 22(1) is also capable of programming and reprogramming the output data flow path 98(0) for the vector data file 82(0) to receive resultant filter output vector data samples 94(0) of different sizes provided by the execution unit 84(0) to the vector data file 82(0), for each vector instruction and/or for each clock cycle of a given vector instruction.
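The three sample-size configurations described above (one 32-bit, two 16-bit, or four 8-bit samples per 32-bit lane word) can be illustrated with a small helper. This is a sketch only: the function name `split_lane_word`, the unsigned interpretation, and the low-bits-first ordering are assumptions, since the text does not specify how sub-samples are ordered within a lane.

```python
def split_lane_word(word, sample_bits):
    """Split a 32-bit lane word into unsigned samples of `sample_bits` each.

    Assumption: the first sample occupies the least significant bits.
    """
    if sample_bits not in (8, 16, 32):
        raise ValueError("sample_bits must be 8, 16, or 32")
    mask = (1 << sample_bits) - 1
    return [(word >> shift) & mask for shift in range(0, 32, sample_bits)]


word = 0x44332211
print([hex(s) for s in split_lane_word(word, 32)])  # ['0x44332211']
print([hex(s) for s in split_lane_word(word, 16)])  # ['0x2211', '0x4433']
print([hex(s) for s in split_lane_word(word, 8)])   # ['0x11', '0x22', '0x33', '0x44']
```

In this model the choice of `sample_bits` plays the role of the per-instruction data flow path programming: the stored bits are the same, and only their interpretation changes.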

[0096] Further details and features of the VPE 22(1) of FIG. 4, and of the tapped delay line 78 for providing the shifted input vector data sample sets 86S(0)-86S(X) to the execution units 84(0)-84(X) in the input data flow paths 80(0)-80(X) in this embodiment, will now be described. In this regard, FIG. 5 is a flowchart illustrating an exemplary filter vector processing operation 102 that can be performed in the VPE 22(1) of FIG. 4 employing the tapped delay line 78 according to an exemplary filter vector instruction. The exemplary tasks performed in the filter vector processing operation 102 of FIG. 5 are described with reference to the examples provided in FIGS. 6A-10.

[0097] With reference to FIG. 5, the input vector data sample set 86(0)-86(X) to be processed in the filter vector processing operation 102 according to a filter vector instruction is fetched from the vector data files 82(0)-82(X) into the input data flow paths 80(0)-80(X) for the filter vector processing operation 102 (block 104). As discussed above with regard to the VPE 22(1) of FIG. 4, the input vector data sample set 86(0)-86(X) is multiplied in the execution units 84(0)-84(X) by the filter coefficients 92(0)-92(Y-1) received from the global register file 40. For example, FIG. 6A illustrates the filter coefficients 92(0)-92(Y-1) (i.e., h7-h0) in the global register file 40. In this example, there are eight (8) filter coefficients 92 stored in the global register file 40, providing eight (8) filter taps in the filter vector processing operation 102 to be performed. Note that in this example, the filter vector processing operation 102, from the equation for the discrete FIR filter 64 of FIG. 3 discussed above, is:
y[n] = x[n] * h0 + x[n-1] * h1 + ... + x[n-7] * h7

[0098] FIG. 6B illustrates an exemplary input vector data sample set 86(0)-86(X) stored in the vector data files 82(0)-82(X) in the VPE 22(1) of FIG. 4, representing the input signal to be filtered by the filter vector processing operation 102. In this example, sample X0 is the oldest sample and sample X63 is the most recent sample. In other words, sample X63 occurs later in time than sample X0. Because each address of the vector data files 82(0)-82(X) is 16 bits wide, the first input vector data sample set 86(0)-86(X) stored in the vector data files 82(0)-82(X) spans ADDRESS 0 and ADDRESS 1, as illustrated in FIG. 6B. This allows the vector data files 82(0)-82(X) to provide input vector data samples 86 that are 32 bits wide, supporting the 32-bit width capability of the execution units 84(0)-84(X) in the example VPE 22(1) of FIG. 4. In this regard, the first input vector data sample set 86(0)-86(X) comprises sixty-four (64) input vector data sample subsets (i.e., X0-X63), each 8 bits wide, for a total of 512 bits. Similarly, ADDRESS 2 and ADDRESS 3 store another, second input vector data sample set 86(0)-86(X) in the vector data files 82(0)-82(X). Note that in this example of FIG. 6B, eight (8) addresses (ADDRESS 0-7) of each vector data file 82(0)-82(X) are shown, holding 256 input vector data samples 86 in total (i.e., X0-X255), but this is not limiting.
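The sizes quoted above can be checked with a little arithmetic. The sketch below assumes the sixteen-lane configuration given earlier as a non-limiting example, and it only maps a sample index to its sample set and address pair; the placement of a sample within a lane is not modelled, since that layout is defined by FIG. 6B rather than by the text. All constant and function names are hypothetical.

```python
NUM_LANES = 16              # assumption: the sixteen-lane example configuration
ADDRESS_BITS_PER_LANE = 16  # each address of a vector data file is 16 bits wide
SAMPLE_BITS = 8             # each sample X0, X1, ... is 8 bits wide
ADDRESSES_PER_SET = 2       # one sample set spans two addresses

BITS_PER_ADDRESS = NUM_LANES * ADDRESS_BITS_PER_LANE                   # 256
SAMPLES_PER_SET = ADDRESSES_PER_SET * BITS_PER_ADDRESS // SAMPLE_BITS  # 64


def locate_sample(k):
    """Return (sample set index, addresses holding that set) for sample Xk."""
    set_index = k // SAMPLES_PER_SET
    first = set_index * ADDRESSES_PER_SET
    return set_index, tuple(range(first, first + ADDRESSES_PER_SET))


print(SAMPLES_PER_SET)                     # 64
print(8 * BITS_PER_ADDRESS // SAMPLE_BITS) # 256 samples in ADDRESS 0-7
print(locate_sample(63))                   # (0, (0, 1))
print(locate_sample(64))                   # (1, (2, 3))
print(locate_sample(255))                  # (3, (6, 7))
```

Under these assumptions the numbers agree with the text: 64 samples of 8 bits make one 512-bit set across two addresses, and eight addresses hold X0 through X255.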

[0099] Depending on the width of the input vector data sample set 86(0)-86(X) involved in the filter vector processing operation 102, one, some, or all of the vector data lanes 100(0)-100(X) in the VPE 22(1) of FIG. 4 can be employed to provide the filter vector processing operation 102 according to the programming of the vector instruction. If the entire width of the vector data files 82(0)-82(X) is required, all of the vector data lanes 100(0)-100(X) can be employed for the filter vector processing operation 102. Note that the filter vector processing operation 102 may require only a subset of the vector data lanes 100(0)-100(X). This may be because the width of the input vector data sample set 86(0)-86(X) is less than the combined width of all the vector data files 82(0)-82(X), in which case it may be desirable to employ the remaining vector data lanes 100 for other vector processing operations to be performed in parallel with the filter vector processing operation 102. For the purposes of discussing the current example, it is assumed that the input vector data sample set 86(0)-86(X) employed in the filter vector processing operation 102 requires all of the vector data lanes 100(0)-100(X).

[00100] With reference back to FIG. 5, the fetched input vector data sample set 86(0)-86(X) is provided from the vector data files 82(0)-82(X) into the input data flow paths 80(0)-80(X) to be loaded into the tapped delay line 78 as the current input vector data sample set 86(0)-86(X) (block 106). The input vector data sample set 86(0)-86(X) is loaded into a primary tapped delay line 78(0) as the input vector data sample set 86(0)-86(X) to be processed by the execution units 84(0)-84(X) for the filter vector processing operation 102. The input vector data sample set 86(0)-86(X) loaded into the primary tapped delay line 78(0) is not shifted for the first filter tap operation of the filter vector processing operation 102. However, as discussed above and in more detail below with regard to FIG. 7, the purpose of the tapped delay line 78 is to shift the input vector data sample set 86(0)-86(X) so that a shifted input vector data sample set 86S(0)-86S(X) is provided to the execution units 84(0)-84(X) for each subsequent filter tap operation of the filter vector processing operation 102. During each processing stage of the filter vector processing operation 102 executed by the execution units 84(0)-84(X), the input vector data samples 86 are shifted in the primary tapped delay line 78(0) to provide the shifted input vector data sample set 86S(0)-86S(X) to the execution units 84(0)-84(X). In this manner, the input vector data sample set 86(0)-86(X) does not have to be stored, shifted in the vector data files 82(0)-82(X), and re-fetched for each filter tap operation of the filter vector processing operation 102.

[00101] If the optional shadow tapped delay line 78(1) is provided in the VPE 22(1), a next input vector data sample set 86N(0)-86N(X) can also be loaded from the vector data files 82(0)-82(X) into the shadow tapped delay line 78(1). As discussed in more detail below with regard to FIG. 7, the next input vector data sample set 86N(0)-86N(X) is shifted into the primary tapped delay line 78(0) during the filter vector processing operation 102 to become at least part of the shifted input vector data sample set 86S(0)-86S(X). Thus, the primary tapped delay line 78(0) can have the shifted input vector data sample set 86S(0)-86S(X) available during the filter vector processing operation 102 without the fetch delay that would otherwise be incurred if the execution units 84(0)-84(X) had to wait until the next input vector data sample set 86N(0)-86N(X) to be processed for the filter vector processing operation 102 was fetched from the vector data files 82(0)-82(X) into the primary tapped delay line 78(0).
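The effect of shifting inside the tapped delay lines, rather than re-fetching for every filter tap, can be illustrated with a small software model. The following Python sketch is not taken from the specification. The function name `fir_with_delay_line`, the list-based registers, the zero fill at the far end of the shadow line, and the assumed filter form y[n] = Σ h[k]·x[n+k] are all illustrative assumptions. The samples are loaded once, the first tap is applied unshifted, and each later tap is applied after a one-register shift in which the shadow line feeds the primary line.

```python
def fir_with_delay_line(primary, shadow, coeffs):
    """Software model of a filter that shifts a delay line once per tap.

    primary: samples loaded once into the primary delay line (assumed list)
    shadow:  next samples loaded once into the shadow delay line (assumed list)
    coeffs:  one filter coefficient per tap
    """
    primary = list(primary)
    shadow = list(shadow)
    acc = [0] * len(primary)          # one accumulator per register position
    for tap, coeff in enumerate(coeffs):
        if tap > 0:
            # Shift every sample one register toward index 0; the first
            # shadow register feeds the last primary register.
            primary = primary[1:] + [shadow[0]]
            shadow = shadow[1:] + [0]  # zero fill is an assumption
        acc = [a + coeff * x for a, x in zip(acc, primary)]
    return acc


# Four positions, three taps: y[n] = 1*x[n] + 2*x[n+1] + 3*x[n+2]
print(fir_with_delay_line([0, 1, 2, 3], [4, 5, 6, 7], [1, 2, 3]))
# prints [8, 14, 20, 26]
```

Note that the source lists are never re-read after the initial load, which mirrors the point made above that no re-fetch from the vector data files is needed between taps.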

[00102] In this regard, FIG. 7 illustrates exemplary tapped delay lines 78 that can be provided in the VPE 22(1) of FIG. 4. In this embodiment, the tapped delay lines 78 comprise a shadow tapped delay line 78(1) and a primary tapped delay line 78(0). The primary tapped delay line 78(0) in this example is comprised of a plurality of 8-bit primary pipeline registers 120 so that input vector data samples 86 can be resolved down to 8 bits in length. The first input vector data sample set 86(0)-86(X) processed by the execution units 84(0)-84(X) is not shifted in this example for the first filter tap of the filter vector processing operation 102, as discussed below with regard to FIG. 9A. When the execution units 84(0)-84(X) process subsequent filter taps of the filter vector processing operation 102, the input vector data samples 86 of the input vector data sample set 86(0)-86(X) stored in the primary tapped delay line 78(0) are shifted in the primary pipeline registers 120(0)-120(4X+3), as indicated by the arrows in FIG. 7, to become the shifted input vector data sample set 86S(0)-86S(X). In this manner, the execution units 84(0)-84(X) are fully utilized: they receive the shifted input vector data sample set 86S(0)-86S(X) and perform the filter vector processing operation 102 on it without the input vector data sample set 86(0)-86(X) having to be stored and shifted, and without the shifted input vector data sample set 86S(0)-86S(X) having to be re-fetched from the vector data files 82(0)-82(X).

[00103] In this embodiment, the primary pipeline registers 120(0)-120(4X+3) are collectively the width of the vector data files 82(0)-82(X) of FIG. 4. In the example of the vector data files 82(0)-82(X) being 512 bits wide with "X" equal to 15, there are sixty-four total primary pipeline registers 120(0)-120(63), each 8 bits wide, to provide a total width of 512 bits (i.e., 64 registers × 8 bits each). Thus, in this example, the primary tapped delay line 78(0) can store the entire width of one input vector data sample set 86(0)-86(X). By providing primary pipeline registers 120(0)-120(4X+3) that are 8 bits wide in this example, the input vector data sample set 86(0)-86(X) can be shifted in the primary pipeline registers 120(0)-120(4X+3) down to a vector data sample size of 8 bits for 8-bit filter vector processing operations. If a larger input vector data sample 86 size is desired for a filter vector processing operation, such as 16-bit or 32-bit samples, the input vector data sample set 86(0)-86(X) can be shifted in the primary pipeline registers 120(0)-120(4X+3) by two primary pipeline registers 120 at a time.
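The register count and total width stated above follow directly from the register numbering 120(0)-120(4X+3). As a quick arithmetic check, written as a short Python sketch (the function names are illustrative and do not appear in the specification):

```python
REGISTER_BITS = 8  # each primary pipeline register 120 is 8 bits wide


def primary_register_count(x: int) -> int:
    """Registers are numbered 120(0) through 120(4X+3), i.e. 4X+4 registers."""
    return 4 * x + 4


def delay_line_width_bits(x: int) -> int:
    """Total width of the primary tapped delay line in bits."""
    return primary_register_count(x) * REGISTER_BITS


print(primary_register_count(15))  # prints 64
print(delay_line_width_bits(15))   # prints 512
```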

[00104] With continuing reference to FIG. 7, the shadow tapped delay line 78(1) is also provided in the tapped delay lines 78. The shadow tapped delay line 78(1) can be employed to latch or carry a next input vector data sample set 86N(0)-86N(X) from the vector data files 82(0)-82(X) for a subsequent vector processing operation. As each filter tap of the filter vector processing operation 102 is executed by the execution units 84(0)-84(X), the next input vector data samples 86N from the next input vector data sample set 86N(0)-86N(X) are shifted from the shadow tapped delay line 78(1) into the primary tapped delay line 78(0). Like the primary tapped delay line 78(0), the shadow tapped delay line 78(1) is comprised of a plurality of 8-bit shadow pipeline registers 122 so that input vector data samples 86 can be resolved down to 8 bits in length. Like the primary pipeline registers 120(0)-120(4X+3), the shadow pipeline registers 122(0)-122(4X+3) provided in the shadow tapped delay line 78(1) are collectively the width of the vector data files 82(0)-82(X), which is 512 bits in this example. Thus, the shadow pipeline registers 122(0)-122(4X+3) of the shadow tapped delay line 78(1) can also store the entire width of one input vector data sample set 86(0)-86(X). In this embodiment, the number of shadow pipeline registers 122(0)-122(4X+3) included in the shadow tapped delay line 78(1) is four times the number of vector data lanes 100(0)-100(X), which total sixteen in this example (i.e., X = 15). The number of shadow pipeline registers 122 therefore also totals sixty-four in this example, for a total of 512 bits (i.e., 64 registers × 8 bits each). As discussed above for the primary tapped delay line 78(0), by providing shadow pipeline registers 122(0)-122(4X+3) that are 8 bits wide in this example, the next input vector data sample set 86N(0)-86N(X) can be shifted down to a vector data sample size of 8 bits for 8-bit filter vector processing operations.

[00105] FIG. 8 is a schematic diagram illustrating selected primary pipeline registers 120 and shadow pipeline registers 122 present in the primary tapped delay line 78(0) and the shadow tapped delay line 78(1) of FIG. 7. FIG. 8 is provided to facilitate discussing an example of shifting input vector data samples 86 between the primary pipeline registers 120 and the shadow pipeline registers 122. As discussed above, the input vector data samples 86 can be shifted within the primary tapped delay line 78(0) and within the shadow tapped delay line 78(1), as well as from the shadow tapped delay line 78(1) into the primary tapped delay line 78(0). The pipeline registers 120, 122 are each 8 bits wide in this example so that the input vector data samples 86 can be shifted at 8-bit resolution when desired, as discussed in more detail below. The primary tapped delay line 78(0) and the shadow tapped delay line 78(1) can also perform 16-bit and 32-bit shifts of the input vector data samples 86, as also discussed in more detail below.

[00106] In this regard, FIG. 8 illustrates the shifting of input vector data samples 86 into primary pipeline registers 120(4X+3), 120(2X+1), 120(4X+2), and 120(2X), which form the storage registers for the input vector data sample 86S(X) in the primary tapped delay line 78(0) of FIG. 7. Primary pipeline registers 120(4X+3) and 120(4X+2) are registers B₃₁ and B₃₀, respectively, in the primary tapped delay line 78(0) of FIG. 7. Primary pipeline registers 120(2X+1) and 120(2X) are registers A₃₁ and A₃₀, respectively, in the primary tapped delay line 78(0) of FIG. 7. As illustrated in FIG. 7, primary pipeline registers 120(4X+3) and 120(4X+2) for registers B₃₁ and B₃₀ are configured to receive shifted input vector data samples 86 from the adjacent shadow pipeline registers 122 in the shadow tapped delay line 78(1). Thus, in the example of FIG. 8, shadow pipeline registers 122(0) and 122(1) for registers A'₀ and A'₁, respectively, are shown as configured to shift input vector data samples 86 into primary pipeline registers 120(4X+3) and 120(4X+2) for B₃₁ and B₃₀. Similarly, in the example of FIG. 8, primary pipeline registers 120(2X+3) and 120(2X+2) for registers B₁ and B₀, respectively, in the primary tapped delay line 78(0) are shown as configured to shift input vector data samples 86 into the adjacent primary pipeline registers 120(2X+1) and 120(2X) for registers A₃₁ and A₃₀. Exemplary shifting of the input vector data samples 86 between these registers is described next.
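The register names used above pair each numbered primary pipeline register with an A or B label. A minimal Python sketch of that correspondence follows. The linear rule (the lower half of the registers is A₀ upward, the upper half is B₀ upward) is inferred from the six labels named in the text (A₃₀, A₃₁, B₀, B₁, B₃₀, B₃₁) and is an assumption for all other registers. The function name `register_label` is hypothetical.

```python
def register_label(index: int, x: int = 15) -> str:
    """Map primary pipeline register 120(index) to its A/B label.

    Assumes registers 120(0)..120(2X+1) are A0.. and
    registers 120(2X+2)..120(4X+3) are B0.., as inferred from the text.
    """
    half = 2 * x + 2
    if not 0 <= index < 2 * half:
        raise ValueError("register index out of range")
    if index < half:
        return f"A{index}"
    return f"B{index - half}"


x = 15
print(register_label(4 * x + 3))  # prints B31
print(register_label(2 * x + 1))  # prints A31
print(register_label(2 * x + 2))  # prints B0
```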

[00107] With continuing reference to FIG. 8, an input vector data sample selector is associated with each of the primary pipeline registers 120 and the shadow pipeline registers 122. This provides the flexibility to configure the primary pipeline registers 120 and the shadow pipeline registers 122 either to load a new input vector data sample set 86(0)-86(X) from the vector data files 82(0)-82(X) of FIG. 4 or to shift the input vector data samples 86. In this regard, input vector data sample selectors 124(0)-124(4X+3) are provided in the primary tapped delay line 78(0) for the vector data loaded or shifted into the primary pipeline registers 120(0)-120(4X+3), respectively. Input vector data sample selectors 126(0)-126(4X+3) are provided in the shadow tapped delay line 78(1) for the vector data loaded or shifted into the shadow pipeline registers 122(0)-122(4X+3), respectively. The input vector data sample selectors 124(0)-124(4X+3) and 126(0)-126(4X+3) are each multiplexers in this example. As discussed in more detail below, the input vector data sample selectors 124(0)-124(4X+3), 126(0)-126(4X+3) can each be controlled by the data width shift control input 125 to select the input vector data to be loaded or shifted into the primary pipeline registers 120(0)-120(4X+3) and the shadow pipeline registers 122(0)-122(4X+3).

[00108] Note that in FIG. 8, only input vector data sample selectors 124(4X+3), 124(4X+2), 124(2X+1), 124(2X) are shown for primary pipeline registers 120(4X+3), 120(4X+2), 120(2X+1), 120(2X), which correspond to registers B₃₁, B₃₀, A₃₁, and A₃₀, respectively. Likewise, only input vector data sample selectors 126(1), 126(0), 124(2X+3), 124(2X+2) are shown in FIG. 8 for pipeline registers 122(1), 122(0), 120(2X+3), 120(2X+2), which correspond to registers A'₁, A'₀, B₁, and B₀, respectively.

[00109] With continuing reference to FIG. 8, if new input vector data is to be loaded into the primary tapped delay line 78(0) and the shadow tapped delay line 78(1) for a vector processing operation, the data width shift control input 125 can be configured by the VPE 22(1) of FIG. 4 to cause input vector data sample selectors 124(4X+3), 124(4X+2), 124(2X+1), 124(2X) to select load data flow paths 133(4X+3), 133(4X+2), 133(2X+1), 133(2X). Selecting load data flow paths 133(4X+3), 133(4X+2), 133(2X+1), 133(2X) allows input vector data from the vector data files 82(0)-82(X) to be stored in primary pipeline registers 120(4X+3), 120(4X+2), 120(2X+1), 120(2X). Loading input vector data from the vector data files 82(0)-82(X) may be performed, as an example, on a new or next vector instruction to be processed by the VPE 22(1). Similarly, the data width shift control input 125 can also be configured by the VPE 22(1) of FIG. 4 to cause input vector data sample selectors 126(1), 124(2X+3), 126(0), 124(2X+2) to select load data flow paths 135(1), 133(2X+3), 135(0), 133(2X+2). Selecting load data flow paths 135(1), 133(2X+3), 135(0), 133(2X+2) allows input vector data from the vector data files 82(0)-82(X) to be stored in pipeline registers 122(1), 120(2X+3), 122(0), 120(2X+2).

[00110] With continuing reference to FIG. 8, if the vector data stored in the primary tapped delay line 78(0) and the shadow tapped delay line 78(1) needs to be shifted for a vector processing operation, the data width shift control input 125 can be configured by the VPE 22(1) of FIG. 4 to cause input vector data sample selectors 124(4X+3), 124(4X+2), 124(2X+1), 124(2X) to select input data flow paths 137(4X+3), 137(4X+2), 137(2X+1), 137(2X) for shifting vector data samples. The data width shift control input 125 also causes input vector data sample selectors 126(1), 124(2X+3), 126(0), 124(2X+2) to select input data flow paths 139(1), 137(2X+3), 139(0), 137(2X+2) for shifting vector data samples. As illustrated therein, input vector data sample selectors 124(4X+3), 124(4X+2), 124(2X+1), 124(2X) and input vector data sample selectors 126(1), 124(2X+3), 126(0), 124(2X+2) each include an output data flow path, namely 141(4X+3), 141(4X+2), 141(2X+1), 141(2X) and 143(1), 141(2X+3), 143(0), 141(2X+2), respectively, that allows vector data to be shifted to other registers. The output data flow paths illustrated in FIG. 8 are a subset of the full sets of output data flow paths 141(0)-141(4X+3) and 143(0)-143(4X+3) that are included for the input vector data sample selectors 124(0)-124(4X+3) in the primary tapped delay line 78(0) and the input vector data sample selectors 126(0)-126(4X+3) in the shadow tapped delay line 78(1), respectively.

[00111] As an example, during 8-bit vector data shifting, input vector data sample selectors 124(4X+3), 124(4X+2), 124(2X+1), 124(2X) and input vector data sample selectors 126(1), 124(2X+3), 126(0), 124(2X+2) are configured to select input data flow paths 137(4X+3), 137(4X+2), 137(2X+1), 137(2X), 139(1), 137(2X+3), 139(0), 137(2X+2), respectively. In this regard, as an example, the vector data in primary pipeline register 120(2X+1) (i.e., A₃₁) is shifted over output data flow path 141(2X+1) to primary pipeline register 120(2X) (i.e., A₃₀), as illustrated in FIG. 8. The vector data in primary pipeline register 120(4X+3) (i.e., B₃₁) is shifted over output data flow path 141(4X+3) to primary pipeline register 120(4X+2) (i.e., B₃₀). The vector data in shadow pipeline register 122(0) (i.e., A'₀) is shifted over output data flow path 143(0) to primary pipeline register 120(4X+3) (i.e., B₃₁). The vector data in primary pipeline register 120(2X+3) (i.e., B₁) is shifted over output data flow path 141(2X+3) to primary pipeline register 120(2X+2) (i.e., B₀). The vector data in shadow pipeline register 122(1) (i.e., A'₁) is shifted over output data flow path 143(1) to shadow pipeline register 122(0) (i.e., A'₀). The vector data in primary pipeline register 120(2X+2) (i.e., B₀) is shifted over output data flow path 141(2X+2) to primary pipeline register 120(2X+1) (i.e., A₃₁).
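The individual 8-bit moves listed above amount to every register handing its contents to the next lower-numbered register, with shadow register 122(0) feeding primary register 120(4X+3). A minimal Python sketch of one such shift step follows. It is a software model under stated assumptions, not the hardware: registers are modelled as list elements, the function name `shift_8bit` is hypothetical, and the value entering the last shadow register (`fill`) is an assumption because the text does not specify it.

```python
def shift_8bit(primary, shadow, fill=None):
    """Return (primary, shadow) after one 8-bit (one-register) shift."""
    new_primary = primary[1:] + [shadow[0]]  # 122(0) feeds 120(4X+3)
    new_shadow = shadow[1:] + [fill]         # far-end fill value is assumed
    return new_primary, new_shadow


X = 15
N = 4 * X + 4
# Tag each register's contents with where it started: P<i> = 120(i), S<i> = 122(i).
primary = [f"P{i}" for i in range(N)]
shadow = [f"S{i}" for i in range(N)]
primary, shadow = shift_8bit(primary, shadow)

print(primary[2 * X])      # prints P31: A31 moved into A30
print(primary[4 * X + 3])  # prints S0:  A'0 moved into B31
print(primary[2 * X + 1])  # prints P32: B0 moved into A31
```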

[00112] With continuing reference to FIG. 8, during 16-bit vector data shifting, input vector data sample selectors 124(4X+3), 124(4X+2), 124(2X+1), 124(2X) and input vector data sample selectors 126(1), 124(2X+3), 126(0), 124(2X+2) are configured to select input data flow paths 145(4X+3), 145(4X+2), 145(2X+1), 145(2X), 147(1), 145(2X+3), 147(0), 145(2X+2), respectively. In this regard, as an example, the vector data in primary pipeline register 120(2X+2) (i.e., B₀) is shifted over output data flow path 141(2X+2) to primary pipeline register 120(2X) (i.e., A₃₀), as illustrated in FIG. 8. The vector data in shadow pipeline register 122(0) (i.e., A'₀) is shifted over output data flow path 143(0) to primary pipeline register 120(4X+2) (i.e., B₃₀). The vector data in primary pipeline register 120(2X+3) (i.e., B₁) is shifted over output data flow path 141(2X+3) to primary pipeline register 120(2X+1) (i.e., A₃₁). The vector data in shadow pipeline register 122(1) (i.e., A'₁) is shifted over output data flow path 143(1) to primary pipeline register 120(4X+3) (i.e., B₃₁).

[00113] If 32-bit vector data shifting is desired in the primary tapped delay line 78(0) and the shadow tapped delay line 78(1), the vector data stored in the primary pipeline registers 120(0)-120(4X+3) and the shadow pipeline registers 122(0)-122(4X+3) can be shifted in two 16-bit vector data shift operations, if desired.
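A 16-bit shift moves each value down by two 8-bit registers, and a 32-bit shift can be composed from two 16-bit shifts. The Python sketch below models this under stated assumptions: registers are list elements, the primary and shadow lines are treated as one contiguous chain, the fill value entering the far end of the shadow line is assumed, and all function names are hypothetical.

```python
def shift_registers(primary, shadow, count, fill=None):
    """Shift the combined primary+shadow chain down by `count` registers."""
    combined = primary + shadow
    shifted = combined[count:] + [fill] * count  # far-end fill is assumed
    n = len(primary)
    return shifted[:n], shifted[n:]


def shift_16bit(primary, shadow):
    """One 16-bit shift: two 8-bit registers at a time."""
    return shift_registers(primary, shadow, 2)


def shift_32bit(primary, shadow):
    """A 32-bit shift performed as two 16-bit shift operations."""
    primary, shadow = shift_16bit(primary, shadow)
    return shift_16bit(primary, shadow)


X = 15
N = 4 * X + 4
start_primary = list(range(N))       # value i starts in 120(i)
start_shadow = list(range(N, 2 * N)) # value N+i starts in 122(i)

p16, s16 = shift_16bit(start_primary, start_shadow)
print(p16[2 * X], p16[4 * X + 3])    # prints 32 65 (B0 into A30, A'1 into B31)

p32, s32 = shift_32bit(start_primary, start_shadow)
print(p32[0], p32[N - 1])            # prints 4 67
```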

[00114] Note that in FIG. 7, primary pipeline registers 120(4X+3) and 120(4X+2) for registers B₃₁ and B₃₀, and primary pipeline registers 120(2X+1) and 120(2X) for registers A₃₁ and A₃₀, are logically associated with each other for the shifted input vector data sample 86S(X), but are not physically adjacent to each other, as illustrated in FIG. 8. This arrangement is provided in this example because of the storage pattern of the input vector data sample set 86(0)-86(X) in the vector data files 82(0)-82(X), as illustrated in FIG. 6B. As also illustrated in FIG. 6B, the input vector data sample set 86(0)-86(X) stored in the vector data files 82(0)-82(X) spans ADDRESS0 and ADDRESS1. Note, however, that the disclosure herein is not limited to this storage pattern of the input vector data sample set 86(0)-86(X) in the vector data files 82(0)-82(X).

[00115] Further, with regard to FIG. 8, the tapped delay lines 78(0), 78(1) are configurable to be selectively provided, or not provided, in the input data flow paths 80(0)-80(X) between the vector data files 82(0)-82(X) and the execution units 84(0)-84(X), based on a programmable input data path configuration for the tapped delay lines 78(0), 78(1) according to the vector instruction to be executed. For example, if the vector instruction is not a filter vector processing instruction and/or does not otherwise require the tapped delay lines 78(0), 78(1) to shift the input vector data sample set 86(0)-86(X), the tapped delay lines 78(0), 78(1) can be configured not to latch the input vector data sample set 86(0)-86(X). The input vector data sample set 86(0)-86(X) can then be provided from the vector data files 82(0)-82(X) to the respective execution units 84(0)-84(X) by bypassing the primary tapped delay line 78(0) and the shadow tapped delay line 78(1). The primary tapped delay line 78(0) and the shadow tapped delay line 78(1) can thus be programmed to be provided, or not provided, in the input data flow paths 80(0)-80(X) on a vector instruction-by-instruction basis, as needed.
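The per-instruction choice between latching samples in the tapped delay lines and bypassing them can be pictured with a small Python sketch. This is purely illustrative: the class `TappedDelayLine`, the function `feed_execution_units`, and the boolean flag `use_delay_line` are hypothetical names standing in for the programmable input data path configuration, whose actual encoding is not described in this passage.

```python
from dataclasses import dataclass, field


@dataclass
class TappedDelayLine:
    """Toy stand-in for a tapped delay line: just a list of latched samples."""
    registers: list = field(default_factory=list)

    def latch(self, samples):
        self.registers = list(samples)
        return self.registers


def feed_execution_units(samples, delay_line, use_delay_line):
    """Return the samples that reach the execution units.

    use_delay_line=True  -> samples are latched in the delay line first
    use_delay_line=False -> the delay line is bypassed and left untouched
    """
    if use_delay_line:
        return delay_line.latch(samples)
    return list(samples)


line = TappedDelayLine()
bypassed = feed_execution_units([1, 2, 3], line, use_delay_line=False)
print(bypassed, line.registers)  # prints [1, 2, 3] []
latched = feed_execution_units([4, 5, 6], line, use_delay_line=True)
print(latched, line.registers)   # prints [4, 5, 6] [4, 5, 6]
```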

[00116] FIG. 9A illustrates an input vector data sample set 86(0)-86(X) loaded from the vector data files 82(0)-82(X) into the primary tapped delay line 78(0) during a first clock cycle (CYCLE0) of a filter vector processing instruction. The primary tapped delay line 78(0) and the shadow tapped delay line 78(1) are shown in simplified form from FIG. 7. The global register file 40 is also shown. The first input vector data sample set 86(0)-86(X) is loaded into the primary tapped delay line 78(0) as input vector data samples X0-X63. For example, a special vector instruction may be supported to load the first input vector data sample set 86(0)-86(X) into the primary tapped delay line 78(0) (and also into the shadow tapped delay line 78(1), as discussed in more detail below). This first input vector data sample set 86(0)-86(X) was stored in ADDRESS0 and ADDRESS1 in the vector data files 82(0)-82(X), as illustrated in FIG. 6B. Note that in this example, X0, X1, X32, and X33 form the first input vector data sample 86(0) solely because of the storage pattern of the vector data files 82(0)-82(X) in the VPE 22(1) of FIG. 4 for this example. The other input vector data samples 86 are formed similarly, as shown in FIG. 9A (e.g., 86(1), 86(2), ... 86(X)). Other patterns could be provided to group the input vector data samples 86 together to form the input vector data sample set 86(0)-86(X).
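The grouping of samples X0-X63 into the per-lane input vector data samples can be sketched in Python as follows. Only two groupings are stated in the text: X0, X1, X32, X33 for 86(0), and registers 120(2X), 120(2X+1), 120(4X+2), 120(4X+3) for the sample of lane X. The linear rule used below for the lanes in between is an assumption that happens to agree with both, and the function name `lane_sample_indices` is hypothetical.

```python
def lane_sample_indices(lane: int, x: int = 15) -> list:
    """Indices of the four 8-bit samples assumed to belong to one lane.

    Assumes lane k takes two consecutive samples from the lower half of
    the delay line and the two matching samples from the upper half.
    """
    if not 0 <= lane <= x:
        raise ValueError("lane out of range")
    half = 2 * x + 2   # 32 when X = 15
    base = 2 * lane
    return [base, base + 1, half + base, half + base + 1]


print(lane_sample_indices(0))   # prints [0, 1, 32, 33]
print(lane_sample_indices(15))  # prints [30, 31, 62, 63]
```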

[00117] FIG. 9B illustrates the next input vector data sample set 86N(0)-86N(X) loaded into the shadow tapped delay line 78(1) during a second clock cycle (CYCLE1) of the filter vector processing instruction. To set up execution of the filter processing operation, the next input vector data sample set 86N(0)-86N(X) is loaded into the shadow tapped delay line 78(1) after the first input vector data sample set 86(0)-86(X) from the vector data files 82(0)-82(X) has been loaded into the primary tapped delay line 78(0). This next input vector data sample set 86N(0)-86N(X) is loaded into the shadow tapped delay line 78(1) as input vector data samples X64-X127. This next input vector data sample set 86N(0)-86N(X) was stored in ADDRESS2 and ADDRESS3 in the vector data files 82(0)-82(X), as illustrated in FIG. 6B. Note that in this example, X64, X65, X96, and X97 form the first input vector data sample 86(0) solely because of the storage pattern of the vector data files 82(0)-82(X) in the VPE 22(1) of FIG. 4 for this example. Other patterns could be provided to group the input vector data samples 86 together to form the input vector data sample set 86(0)-86(X). The first filter coefficient 92(0) from the global register file 40 is also shown as provided in a register ("C") to the execution units 84(0)-84(X) in FIG. 9B for use in the filter vector processing operation 102.

[00118] Referring back to FIG. 7, as the input vector data samples 86 are shifted in the primary tapped delay line 78(0) during each processing stage of the filter vector processing operation 102, the next input vector data samples 86N stored in the shadow pipeline registers 122 are also shifted in the shadow pipeline registers 122 of the shadow tapped delay line 78(1). The input vector data sample 86 stored in the first shadow pipeline register 122(0) of FIG. 7 is shifted into the last primary pipeline register 120(4X+3) of the primary tapped delay line 78(0) during each shift. In this manner, as the processing stages of the filter vector processing operation 102 progress in the execution units 84(0)-84(X), at least a portion of the next input vector data sample set 86N(0)-86N(X) initially stored in the shadow tapped delay line 78(1) is shifted into the primary tapped delay line 78(0) to be provided to the execution units 84(0)-84(X) for processing. In this example, the number of shifts depends on the number of filter taps provided in the filter vector processing operation 102. If the number of input vector data samples 86 in the input vector data sample sets 86(0)-86(X) fetched from the vector data files 82(0)-82(X) into the primary tapped delay line 78(0) and the shadow tapped delay line 78(1) is greater than the number of filter taps in the filter vector processing operation 102, the execution units 84(0)-84(X) can perform the filter vector processing operation 102 without any further input vector data sample sets 86(0)-86(X) being re-fetched from the vector data files 82(0)-82(X). However, if the number of filter taps in the filter vector processing operation 102 is greater than the number of input vector data samples 86 in the input vector data sample sets 86(0)-86(X) fetched from the vector data files 82(0)-82(X) into the primary tapped delay line 78(0) and the shadow tapped delay line 78(1), additional input vector data sample sets 86(0)-86(X) can be fetched from the vector data files 82(0)-82(X) as part of the filter vector processing operation 102. After the filter vector processing operation 102 is completed on the shifted input vector data sample set 86S(0)-86S(X), if unprocessed input vector data samples 86S remain in the tapped delay lines 78(0), 78(1), the execution units 84(0)-84(X) can then be provided with the previous next input vector data sample set 86N(0)-86N(X), now stored in the primary tapped delay line 78(0), as the shifted input vector data sample set 86S(0)-86S(X) for the next filter vector processing operation.
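The hand-off from the shadow tapped delay line to the primary tapped delay line can be pictured with a short Python sketch. It is an illustration only: the function name `shift_delay_lines`, the use of `deque`, the eight-entry line length, and the simple linear sample ordering are assumptions made here and do not reflect the register layout of FIG. 7 or the storage pattern of the vector data files.

```python
from collections import deque


def shift_delay_lines(primary, shadow, fill=None):
    """Shift both delay lines by one sample position (hypothetical model).

    The oldest sample leaves the head of the primary line, the sample at the
    head of the shadow line moves into the tail of the primary line, and the
    tail of the shadow line is back-filled with `fill`.
    """
    dropped = primary.popleft()
    primary.append(shadow.popleft())
    shadow.append(fill)
    return dropped


# Integers stand in for samples X0..X7 (primary) and X8..X15 (shadow).
primary = deque(range(0, 8))
shadow = deque(range(8, 16))

# Three processing stages, one shift per stage.
for _ in range(3):
    shift_delay_lines(primary, shadow)
```

After the three shifts, the primary line in this model holds samples 3 through 10, so the samples that started in the shadow line reach the execution units without another fetch.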

[00119] Another exemplary rationale for providing the shadow tapped delay line 78(1) is as follows. If the current filter vector processing operation 102 involves more input vector data samples 86 than can be provided in the width of the vector data lanes 100(0)-100(X), the additional input vector data sample sets 86(0)-86(X) loaded into the shadow tapped delay line 78(1) are available to the execution units 84(0)-84(X) without delay during the filter vector processing operation 102. As the filter vector processing operation 102 progresses through the shifted input vector data sample sets 86S(0)-86S(X) during execution, the additional next input vector data sample sets 86N(0)-86N(X) loaded into the shadow tapped delay line 78(1) are shifted into the primary tapped delay line 78(0), as discussed above. In this manner, the next input vector data sample sets 86N(0)-86N(X) for use in vector processing by the execution units 84(0)-84(X) are available without delay. The execution units 84(0)-84(X) can continue to be fully utilized during the filter vector processing operation 102, regardless of whether a single fetched input vector data sample set 86(0)-86(X) of the width of the vector data files 82(0)-82(X) is sufficient to carry out the entire filter vector processing operation 102.

[00120] After the first input vector data sample set 86(0)-86(X) and the next input vector data sample set 86N(0)-86N(X) are loaded into the primary tapped delay line 78(0) and the shadow tapped delay line 78(1), respectively, the first input vector data sample set 86(0)-86(X) provided in the primary tapped delay line 78(0) is provided to the respective execution units 84(0)-84(X) to be processed in a first processing stage of the filter vector processing operation 102 (block 108 in FIG. 5). After the first input vector data sample set 86(0)-86(X) is processed by the execution units 84(0)-84(X), it is shifted in the primary tapped delay line 78(0) to become the shifted input vector data sample set 86S(0)-86S(X) to be processed by the execution units 84(0)-84(X). As illustrated in the VPE 22(1) of FIG. 4, shifted input vector data sample 86S(0) is provided to execution unit 84(0), shifted input vector data sample 86S(1) is provided to execution unit 84(1), and so on.

[00121] The execution units 84(0)-84(X) next perform the filter vector processing operation 102 (block 110 in FIG. 5). More specifically, in a first iteration, the execution units 84(0)-84(X) multiply the first input vector data sample set 86(0)-86(X) by the current filter coefficient 92(0) according to the operation y[n] = x[n-7] * h7 in this example, where x[n-7] is the first input vector data sample set 86(0)-86(X), to provide a resultant filter output vector data sample set 94(0)-94(X). In the next iterations of the filter vector processing operation 102 (block 110 in FIG. 5), the next shifted input vector data sample sets 86S(0)-86S(X) for the filter vector processing operation 102 are multiplied by the current filter coefficients 92(1)-92(Y-1). The execution units 84(0)-84(X) accumulate the resultant filter output vector data sample set 94(0)-94(X) with the prior resultant filter output vector data sample set 94(0)-94(X) calculated by the execution units 84(0)-84(X) to provide a new prior resultant filter output vector data sample set 94(0)-94(X) (block 112 in FIG. 5). In the first processing stage of the filter vector processing operation 102, there is no prior resultant filter output vector data sample set.

[00122] If all processing stages of the filter vector processing operation 102 have been completed (block 114 in FIG. 5), the accumulated prior resultant filter output vector data sample set 94(0)-94(X) is provided as the resultant filter output vector data sample set 94(0)-94(X) in the output data flow paths 98(0)-98(X) to be provided to and stored in the vector data files 82(0)-82(X) (block 116 in FIG. 5). If all processing stages of the filter vector processing operation 102 have not been completed (block 114 in FIG. 5), the samples stored in the tapped delay lines 78(0) and 78(1) are shifted within the tapped delay lines 78(0), 78(1) to provide the next shifted input vector data sample set 86S(0)-86S(X) for the filter vector processing operation 102 (block 118 in FIG. 5). The shifted input vector data sample set 86S(0)-86S(X) is provided to calculate the next resultant filter output vector data sample set as an intermediate result, which is accumulated with the prior resultant filter output vector data sample set until the filter vector processing operation 102 is complete. Shifting the input vector data samples 86 in the tapped delay lines 78(0), 78(1) to provide the shifted input vector data sample set 86S(0)-86S(X) was described in detail above with regard to FIG. 7. The final accumulation of the intermediate results provided by the execution units 84(0)-84(X) for the filter vector processing operation 102 is provided from the execution units 84(0)-84(X) as the resultant filter output vector data sample set 94(0)-94(X), as illustrated in FIG. 4.
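The multiply, accumulate, and shift loop of blocks 108-118 can be sketched in Python as follows. This is a simplified model under stated assumptions, not the VPE implementation: the function name `fir_with_tapped_delay_line` is hypothetical, the delay lines are modeled as one flat list of samples, and each lane is assumed to hold a single accumulator.

```python
def fir_with_tapped_delay_line(samples, coeffs, lanes):
    """Lane-parallel FIR built from repeated shift and multiply-accumulate.

    `coeffs` is ordered h0..h(T-1). Stage 0 applies h(T-1) to the unshifted
    samples, and each later stage shifts the samples by one position and
    applies the next lower coefficient, ending with h0.
    """
    taps = len(coeffs)
    if len(samples) < lanes + taps - 1:
        raise ValueError("need lanes + taps - 1 samples to avoid a re-fetch")
    acc = [0] * lanes                      # one accumulator per lane
    for stage in range(taps):              # one processing stage per filter tap
        h = coeffs[taps - 1 - stage]       # current filter coefficient
        for lane in range(lanes):          # lanes operate in parallel in hardware
            # After `stage` shifts, lane `lane` sees samples[lane + stage].
            acc[lane] += samples[lane + stage] * h
    return acc


# Small usage example: 3 taps, 4 lanes, 6 samples fetched once.
outputs = fir_with_tapped_delay_line([1, 2, 3, 4, 5, 6], [1, 2, 3], lanes=4)
```

In this model, lane i ends up holding y[i + T - 1] = x[i + T - 1]*h0 + ... + x[i]*h(T-1), and the samples are read once and then only shifted, which is the property the tapped delay lines are described as providing.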

[00123] FIG. 9C illustrates the contents of the tapped delay lines 78 when the input vector data sample set 86(0)-86(X) is shifted in the second processing stage of the filter vector processing operation 102 to become the next shifted input vector data sample set 86S(0)-86S(X) for the next filter processing operation y[n] = x[n-6] * h6. The shifted input vector data sample set 86S(0)-86S(X) in the primary tapped delay line 78(0) is shifted in the primary pipeline registers 120(0)-120(4X+3) according to the input vector data sample shift width prescribed by the vector instruction being executed. For example, as illustrated in FIG. 9C, sample X2 is shifted in shifted input vector data sample 86S(0). The new shifted input vector data sample set 86S(0)-86S(X) is provided to the execution units 84(0)-84(X) for execution of the next filter tap of the filter vector processing operation 102. The filter coefficient 92 provided to the execution units 84(0)-84(X) is likewise the next filter coefficient 92, which is "h6" in this example.

[00124] With continuing reference to FIG. 5, the process repeats by providing the shifted input vector data sample set 86S(0)-86S(X) from the primary tapped delay line 78(0) to the execution units 84(0)-84(X) (block 108 in FIG. 5) to be multiplied by the next filter coefficient 92 (block 110 in FIG. 5). The resultant filter output vector data sample set 94(0)-94(X) is accumulated with the prior resultant filter output vector data sample set 94(0)-94(X) (block 112 in FIG. 5). FIG. 9D illustrates the state of the input vector data samples 86 present in the tapped delay lines 78(0), 78(1) during the last processing stage of the exemplary filter vector processing operation 102. In the example shown in FIG. 9D, there were eight filter taps (Y) in the filter vector processing operation 102, corresponding to filter coefficients 92 "h7" through "h0" (i.e., 92(0)-92(Y-1)). As shown in FIG. 9D, "h0" is the last filter coefficient 92 in the filter vector processing operation 102. The shifted input vector data sample set 86S(0)-86S(X) has been shifted seven times (one less than the number of filter taps), such that input vector data sample X39 is stored in shifted input vector data sample 86S(0) in the primary tapped delay line 78(0) in the final, eighth processing stage of the filter vector processing operation 102.

[00125] Note that although the example of the filter vector processing operation 102 described above employs each of the vector data lanes 100(0)-100(X) in the VPE 22(1) to provide the filter vector processing operation 102, this is not required. The filter vector processing operation 102 may require only a subset of the vector data lanes 100(0)-100(X). For example, the width of the input vector data sample set 86(0)-86(X) may be less than the width of all the vector data files 82(0)-82(X), in which case it may be desirable to employ the additional vector data lanes 100 for other vector processing operations to be performed in parallel with the filter vector processing operation 102. In this scenario, the tapped delay lines 78(0), 78(1) of FIG. 7 may need to be modified to shift the next input vector data sample set 86N(0)-86N(X) from the shadow tapped delay line 78(1) into the primary tapped delay line 78(0), as the shifted input vector data sample set 86S(0)-86S(X), at a vector data lane 100 before the last vector data lane 100(X) is reached.

[00126] FIG. 10 is a schematic diagram of the contents of the accumulators (i.e., the resultant filter output vector data samples 94) in the execution units 84(0)-84(X) in the VPE 22(1) of FIG. 4 after the exemplary eight-tap filter vector processing stages in the example above have been fully executed according to y[n] = x[n]*h0 + x[n-1]*h1 + ... + x[n-7]*h7. Accumulators Acc0-Acc3 are shown in FIG. 10 because, in this example, each execution unit 84(0)-84(X) has four accumulators disposed in parallel for each vector data lane 100(0)-100(X). The accumulated resultant output vector data samples can be provided on the output data flow paths 98(0)-98(X) to the vector data files 82(0)-82(X) as the collective resultant filter output vector data sample set 94(0)-94(X), to be stored therein for further analysis and/or processing. If needed, a specialized vector instruction may be supported by the VPE 22(1) to move rows of the resultant filter output vector data sample set 94(0)-94(X) from the vector data files 82(0)-82(X) to the vector unit data memory 32 of FIG. 2.

[00127] Types of vector processing operations other than the filter vector processing operation 102 can also benefit from processing efficiencies in a VPE through use of a tapped delay line 78 that is the same as or similar to the tapped delay lines 78 provided in the VPE 22(1) of FIG. 4 described above. For example, another specialized vector processing operation that involves shifting of input vector data sample sets 86 in a VPE is a correlation/covariance vector processing operation (referred to herein as a "correlation vector processing operation"). As an example, it may be desired to employ vector processing to provide correlation operations for choosing a direct spread spectrum code (DSSC) (i.e., a chip sequence) for demodulating a user signal in a CDMA system, so as to provide good separation between the user signal and the signals of other users in the CDMA system. The signals are separated by correlating the received signal with the locally generated chip sequence of the desired user. If the signal matches the desired user's chip sequence, the correlation function is high and the CDMA system can extract that signal. If the desired user's chip sequence has little or nothing in common with the signal, the correlation should be as close to zero as possible (thus eliminating the signal); this is referred to as cross-correlation. If the chip sequence is correlated with the signal at any time offset other than zero, the correlation should be as close to zero as possible. This is referred to as auto-correlation and is used to reject multi-path interference.
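The behavior described for a well-chosen chip sequence, a high correlation at zero offset and a near-zero correlation at every other offset, can be illustrated with a short Python sketch. The length-13 Barker code is used here only as a convenient stand-in with well-known autocorrelation properties. It is an assumption of this example and is not a sequence named in this disclosure, and the helper `correlate_at_offset` is hypothetical.

```python
def correlate_at_offset(signal, chips, offset):
    """Sum of signal[i + offset] * chips[i] over the overlapping samples."""
    return sum(signal[i + offset] * chips[i]
               for i in range(len(chips))
               if 0 <= i + offset < len(signal))


# Length-13 Barker code, used as an illustrative chip sequence.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]

# Correlation of the sequence with itself at zero offset and at offsets 1..12.
peak = correlate_at_offset(BARKER_13, BARKER_13, 0)
sidelobes = [correlate_at_offset(BARKER_13, BARKER_13, k) for k in range(1, 13)]
```

Here `peak` equals the sequence length, while every entry of `sidelobes` has magnitude no greater than one, which is the kind of contrast that lets a receiver reject delayed (multi-path) copies of the signal.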

[00128] However, correlation operations may be difficult to parallelize in vector processors because of the specialized data flow paths provided in vector processors. When the input vector data sample set representing the signal to be correlated is shifted between delay taps, the input vector data sample set is re-fetched from the vector data file, which increases power consumption and reduces throughput. To minimize re-fetching of input vector data sample sets from memory, the data flow path could be configured to provide the same number of multipliers as delay taps for efficient parallelized processing. However, other vector processing operations may require fewer multipliers, which results in inefficient scaling and underutilization of the multipliers in the data flow path. If the number of multipliers is reduced below the number of delay taps to provide scalability, parallelism is limited because more re-fetches from memory are required to obtain the same input vector data sample set for the different phases of the correlation processing.

[00129] In this regard, FIG. 11 is a schematic diagram of another exemplary VPE 22(2) that can be provided as the VPE 22 of FIG. 2. As described in more detail below, the VPE 22(2) of FIG. 11 is configured to provide precision correlation vector processing operations in the VPE 22(2) with eliminated or reduced vector data sample re-fetching and reduced power consumption. The precision correlation vector processing operations can be provided in the VPE 22(2) in contrast to correlation vector processing operations that require storage of intermediate results, which in turn requires vector data sample re-fetching and thereby increases power consumption. To eliminate or minimize re-fetching of input vector data samples from the vector data files, and thus reduce power consumption and improve processing efficiency, the tapped delay lines 78 included in the VPE 22(1) of FIG. 4 are also included in the input data flow paths 80(0)-80(X) between the vector data files 82(0)-82(X) and the execution units 84(0)-84(X) (also labeled "EU") in the VPE 22(2). "X"+1 is the maximum number of parallel input data lanes provided in the VPE 22(2) for processing vector data samples in this example. As previously discussed, the tapped delay lines 78 are configured to receive the input vector data sample set 86(0)-86(X) on tapped delay line inputs 88(0)-88(X), as a subset or all of the input vector data samples 86 of the input vector data sample set 86(0)-86(X), from a corresponding subset or all of the vector data files 82(0)-82(X). All of the input vector data samples 86 together comprise the input vector data sample set 86(0)-86(X). As discussed in more detail below, the input vector data sample set 86(0)-86(X) from the vector data files 82(0)-82(X) is correlated in the VPE 22(2) with a reference vector data sample set 130(0)-130(X) to provide a resultant correlated output vector data sample set 132(0)-132(X). The reference vector data sample set 130(0)-130(X) is comprised of "X+1" reference vector data samples 130, which in this example are 130(0), 130(1), ..., and 130(X). The resultant correlated output vector data sample set 132(0)-132(X) is comprised of "X+1" resultant correlated output vector data samples 132, which in this example are 132(0), 132(1), ..., and 132(X).

[00130] With continuing reference to FIG. 11, the tapped delay lines 78 shift the input vector data sample set 86(0)-86(X) for each correlation delay tap (i.e., correlation processing stage) of a correlation vector processing operation, according to a correlation vector instruction to be executed by the VPE 22(2), to provide a shifted input vector data sample set 86S(0)-86S(X). All of the shifted input vector data samples 86S together comprise the shifted input vector data sample set 86S(0)-86S(X). The tapped delay lines 78 shift the input vector data sample set 86(0)-86(X) to provide the shifted input vector data sample set 86S(0)-86S(X) to execution unit inputs 90(0)-90(X) of the execution units 84(0)-84(X) during the correlation vector processing operation. In this manner, intermediate correlation results based on the operations performed on the shifted input vector data sample set 86S(0)-86S(X) do not have to be stored, shifted, and re-fetched from the vector data files 82(0)-82(X) during each processing stage of the correlation vector processing operation performed by the VPE 22(2). Thus, the tapped delay lines 78 can reduce power consumption and increase processing efficiency for correlation vector processing operations performed by the VPE 22(2).

[00131] With continuing reference to FIG. 11, the execution units 84(0)-84(X) also receive reference vector data samples 130 from among the reference vector data sample set 130(0)-130(X) stored in a sequence number generator (SNG) 134 for the correlation vector processing operation. The execution units 84(0)-84(X) are configured to correlate the reference vector data sample set 130(0)-130(X) with the input vector data sample set 86(0)-86(X) as part of the correlation vector processing operation. Note, however, that the sequence number generator (SNG) 134 could also be a register or other file. The sequence number generator 134 is provided in this embodiment to supply the reference vector data sample set 130(0)-130(X) because the correlation vector processing operation in this example is for a CDMA correlation vector instruction. If the correlation between the reference vector data sample set 130(0)-130(X) and the input vector data sample set 86(0)-86(X) is high, the reference vector data sample set 130(0)-130(X) is provided as the generated chip sequence to be used for signal extraction from the input vector data sample set 86(0)-86(X).

[00132] For example, a correlation vector processing operation for a CDMA vector correlation instruction could provide a correlation between the on-time input vector data samples 86 in the input vector data sample set 86(0)-86(X) and the late input vector data samples in the input vector data sample set 86(0)-86(X). For example, the on-time input vector data samples 86 in the input vector data sample set 86(0)-86(X) may be the even input vector data samples 86 in the input vector data sample set 86(0)-86(X) (e.g., 86(0), 86(2), 86(4), ... 86(X-1)). The late input vector data samples 86 in the input vector data sample set 86(0)-86(X) may be the odd input vector data samples 86 in the input vector data sample set 86(0)-86(X) (e.g., 86(1), 86(3), 86(5), ... 86(X)). Alternatively, the on-time input vector data samples 86 may be the odd input vector data samples 86, and the late input vector data samples 86 may be the even input vector data samples 86. The results of the correlation vector processing operation, namely the resultant correlated output vector data sample sets 132(0)-132(X) for the on-time input vector data samples 86 and for the late input vector data samples 86, can be used to determine whether to use the on-time or the late input vector data samples from the input vector data sample set 86(0)-86(X) for signal extraction. For example, an on-time correlation vector processing operation may be provided according to the following equation:

where:
n is the number of input signal samples,
x[n] is the digitized input signal 66,
y[n] is the reference signal, and
l is the number of samples.

[00133] A late correlation vector processing operation may be provided according to the following equation:

where:
n is the number of input signal samples,
x[n] is the digitized input signal 66,
y[n] is the reference signal, and
l is the number of samples.
The reference signal y[n] (i.e., the reference vector data samples) may be complex. In one aspect, the VPE 22(2) may receive the reference signal (e.g., from the sequence number generator 134). The VPE 22(2) may use the received reference signal directly to perform the on-time and late correlation operations, in which case the reference signal y[n] in the above equations may represent the received reference signal. Alternatively, the VPE 22(2) may compute the complex conjugate of the received reference signal before using it to perform the on-time and late correlation operations, in which case the reference signal y[n] in the above equations may represent the conjugate of the received reference signal.
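The equations for the on-time and late correlation operations are not reproduced in this text. As a hedged sketch only, one form consistent with the even/odd sample interleaving described in [00132] is given below. The summation length $L$, the use of $l$ as the summation index, and the use of $n$ as the output index are assumptions for illustration and are not taken from the source.

```latex
R^{\mathrm{OT}}_{xy}[n] = \sum_{l=0}^{L-1} y[l]\, x[2l + n],
\qquad
R^{\mathrm{L}}_{xy}[n] = \sum_{l=0}^{L-1} y[l]\, x[2l + 1 + n]
```

Under this assumed form, the on-time sum reads the even-indexed input samples and the late sum reads the odd-indexed input samples; swapping the two offsets gives the alternative assignment mentioned in [00132].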

[00134] With continued reference to FIG. 11, the execution units 84(0)-84(X) are each configured to multiply the reference vector data sample set 130(0)-130(X) with the shifted input vector data samples 86S(0), 86S(1), ... 86S(X) of the shifted input vector data sample set 86S(0)-86S(X) during each processing stage of the correlation vector processing operation, to provide intermediate correlation output vector data samples in the execution units 84(0)-84(X). The intermediate correlation output vector data sample sets are accumulated in each of the execution units 84(0)-84(X) (i.e., the previously accumulated correlation output vector data sample is added to the current correlation output vector data sample). This yields the final, resulting correlation output vector data sample set 132(0)-132(X), supplied by the execution units 84(0)-84(X) on execution unit outputs 96(0)-96(X) on output data flow paths 98(0)-98(X), respectively, for each input vector data sample set 86(0), 86(1), ... 86(X), to be stored back in the respective vector data files 82(0)-82(X) for further use and/or processing by the VPE 22(2), without having to store and shift the intermediate correlation output vector data sample sets generated by the execution units 84(0)-84(X).
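The multiply-and-accumulate behavior described above can be sketched in software as follows. This is a minimal illustration, not the VPE's implementation: the function names, the `offset` parameter (0 for on-time, 1 for late), and the even/odd indexing are assumptions chosen to match the description in [00132], and the optional conjugation follows the alternative described for the reference signal.

```python
from typing import List, Sequence


def correlate(x: Sequence[complex], y: Sequence[complex], offset: int,
              outputs: int, conjugate: bool = False) -> List[complex]:
    """Accumulate sum over l of y[l] * x[2*l + offset + n] for n in range(outputs).

    Hypothetical helper: offset=0 reads even samples (on-time),
    offset=1 reads odd samples (late).
    """
    ref = [v.conjugate() for v in y] if conjugate else list(y)
    acc = [0j] * outputs  # one running accumulator per output sample
    for l, r in enumerate(ref):  # one processing stage per reference sample
        for n in range(outputs):
            # The intermediate sum stays in the accumulator between stages.
            acc[n] += r * x[2 * l + offset + n]
    return acc


def on_time_correlation(x, y, outputs, conjugate=False):
    return correlate(x, y, 0, outputs, conjugate)


def late_correlation(x, y, outputs, conjugate=False):
    return correlate(x, y, 1, outputs, conjugate)
```

The caller is assumed to pass an input sequence `x` long enough for the largest index `2 * (len(y) - 1) + offset + outputs - 1`.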

[00135] Note further that the VPE 22(2) of FIG. 11 and the VPE 22(1) of FIG. 4 are provided with the same components and architecture. The sequence number generator 134 is added, and is multiplexed by a multiplexer 136 with the global register file 40, which can supply the filter coefficients 92(0)-92(Y-1) or other data to be processed with the reference vector data sample set 130(0)-130(X). Thus, the VPE 22(2) of FIG. 11 can provide both the filter vector processing operations described previously and the correlation vector processing operations described here and in more detail below, under control of the multiplexer 136. The multiplexer 136 can be controlled by a selector signal 138, which is controlled based on the vector instruction being executed by the VPE 22(2). For a filter vector instruction, the selector signal 138 can be configured to select the filter coefficients 92(0)-92(Y-1) from the global register file 40 to be supplied to the execution units 84(0)-84(X). For a correlation vector instruction, the selector signal 138 can be configured to select the reference vector data sample set 130(0)-130(X) from the sequence number generator 134 to be supplied to the execution units 84(0)-84(X).
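The selection performed by the multiplexer 136 can be pictured with the short sketch below. The enum and function names are hypothetical and stand in for the selector signal 138; this shows only the selection logic described above, not the hardware.

```python
from enum import Enum
from typing import Sequence


class VectorInstruction(Enum):
    """Hypothetical instruction kinds that drive the selector signal."""
    FILTER = "filter"
    CORRELATION = "correlation"


def select_operands(instruction: VectorInstruction,
                    filter_coefficients: Sequence[int],
                    reference_samples: Sequence[int]) -> Sequence[int]:
    """Return the operand set routed to the execution units."""
    if instruction is VectorInstruction.FILTER:
        return filter_coefficients   # from the global register file
    if instruction is VectorInstruction.CORRELATION:
        return reference_samples     # from the sequence number generator
    raise ValueError(f"unsupported instruction: {instruction}")
```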

[00136] With continued reference to FIG. 11, and as described in more detail below, the tapped delay lines 78(0), 78(1) are programmable so that they are controlled according to the vector instruction being processed. When the instruction being processed is not a correlation vector instruction or another instruction that uses the tapped delay lines 78, the tapped delay lines 78 can be programmed so that they are not included in the input data flow paths 80(0)-80(X) between the vector data files 82(0)-82(X) and the execution units 84(0)-84(X). In this embodiment, as described previously, two tapped delay lines 78 are provided, a primary tapped delay line 78(0) and a shadow tapped delay line 78(1); the shadow tapped delay line 78(1) is optional in this embodiment. As described previously, without the tapped delay lines 78, a separate shifting process would have to be performed to supply the shifted intermediate input vector data sample set to the execution units 84(0)-84(X) again, which would increase latency and consume additional power. Further, during the correlation vector processing operation, the efficiency of the input data flow paths 80(0)-80(X) and the output data flow paths 98(0)-98(X) in the VPE 22(2) is not limited by the delay of re-fetching the shifted input vector data sample set 86S(0)-86S(X) from the vector data files 82(0)-82(X). The shifted input vector data sample set 86S(0)-86S(X) is supplied by the tapped delay lines 78, which are local to the execution units 84(0)-84(X). Vector processing in the execution units 84(0)-84(X) is limited only by computational resources rather than by data flow limitations.

[00137] Further, the correlation vector processing operations performed by the VPE 22(2) of FIG. 11 can be made more precise by employing the tapped delay lines 78, because the output accumulations for the intermediate correlation processing stages in the execution units 84(0)-84(X) do not have to be stored in the vector data files 82(0)-82(X). Storing intermediate vector data sample sets from the execution units 84(0)-84(X) in the vector data files 82(0)-82(X) can result in rounding. When the next intermediate vector data sample set is then supplied to the execution units 84(0)-84(X) for the vector processing operation, any rounding error is propagated and added during each multiplication phase of the vector processing operation. By contrast, in the example of the VPE 22(2) of FIG. 11, the intermediate correlation output vector data sample sets calculated by the execution units 84(0)-84(X) do not have to be stored in the vector data files 82(0)-82(X). A prior intermediate correlation output vector data sample set can be accumulated with the intermediate correlation output vector data sample set for the next correlation output vector data sample set, because the tapped delay lines 78 supply the shifted input vector data sample sets 86S(0)-86S(X) to the execution units 84(0)-84(X) during the vector processing operation, and the results are accumulated with the prior vector data sample sets for the prior correlation output vector data sample sets.
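The precision argument above can be illustrated numerically. The sketch below is an assumption-laden toy model, not the VPE's arithmetic: it uses exact fractions for the products and models a store to the vector data file as rounding to the nearest integer, which is a hypothetical storage format chosen only to make the effect visible.

```python
from fractions import Fraction
from typing import Iterable


def accumulate_in_unit(products: Iterable[Fraction]) -> Fraction:
    """Keep the running sum at full precision inside the execution unit."""
    total = Fraction(0)
    for p in products:
        total += p
    return total


def accumulate_via_storage(products: Iterable[Fraction]) -> int:
    """Round the running sum each time it is written back to storage."""
    stored = 0
    for p in products:
        stored = round(stored + p)  # rounding error re-enters every stage
    return stored


# Ten stage products of 2/5 each: the in-unit sum is exactly 4, while the
# value that is rounded on every store never leaves 0.
stage_products = [Fraction(2, 5)] * 10
```

In this toy model, `accumulate_in_unit(stage_products)` keeps the full sum, while `accumulate_via_storage(stage_products)` loses the contribution of every stage to rounding.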

[00138] The previous description of the components provided in the VPE 22(1) of FIG. 4 is equally applicable to the VPE 22(2) of FIG. 11 and thus will not be repeated.

[00139] Further details and features of the VPE 22(2) of FIG. 11, and of the tapped delay lines 78 that supply the shifted input vector data sample sets 86S(0)-86S(X) to the execution units 84(0)-84(X) in the input data flow paths 80(0)-80(X) in this embodiment, will now be described. In this regard, FIGS. 12A and 12B are flowcharts illustrating an exemplary correlation vector processing operation 140 that can be performed in the VPE 22(2) of FIG. 11 employing the tapped delay lines 78 according to an exemplary correlation vector instruction. The flowcharts show exemplary correlation/covariance vector processing operations that can be performed in parallel in the VPE 22(2) of FIG. 11, with interleaved on-time and late input vector data sample sets fetched according to the exemplary correlation/covariance vector processing operation.

[00140] The exemplary tasks performed in the correlation vector processing operation 140 of FIGS. 12A and 12B will be described with reference to the examples provided in FIGS. 13-17B. With reference to FIG. 12A, the input vector data sample set 86(0)-86(X) to be processed in the correlation vector processing operation 140 according to a correlation vector instruction is fetched from the vector data files 82(0)-82(X) into the input data flow paths 80(0)-80(X) for the correlation vector processing operation 140 (block 142). As described above for the VPE 22(2) of FIG. 11, the input vector data sample set 86(0)-86(X) is multiplied in the execution units 84(0)-84(X) by the reference vector data sample set 130(0)-130(X) received from the sequence number generator 134. For example, FIG. 13 illustrates the reference vector data sample set 130(0)-130(X) in the sequence number generator 134. In this example, there are sixteen (16) reference vector data samples 130(0), 130(1), ... 130(15) stored in the global register file 40 to be correlated with sixteen (16) input vector data samples 86(0), 86(1), ... 86(15) in the input vector data sample set 86(0)-86(X). FIG. 6B, described previously above, illustrated an exemplary input vector data sample set 86(0)-86(X) stored in the vector data files 82(0)-82(X); it is also applicable in this example and thus will not be re-described here.

[00141] Depending on the width of the input vector data sample set 86(0)-86(X) and the reference vector data sample set 130(0)-130(X) to be correlated in the correlation vector processing operation 140, one, some, or all of the vector data lanes 100(0)-100(X) in the VPE 22(2) of FIG. 11 can be employed to provide the correlation vector processing operation 140 according to the programming of the vector instruction. If the entire width of the vector data files 82(0)-82(X) is required, all of the vector data lanes 100(0)-100(X) can be employed for the correlation vector processing operation 140. Note that the correlation vector processing operation 140 may require only a subset of the vector data lanes 100(0)-100(X). This may be because the width of the input vector data sample set 86(0)-86(X) is less than the width of all of the vector data files 82(0)-82(X), in which case it may be desirable to employ the additional vector data lanes 100 for other vector processing operations to be performed in parallel with the correlation vector processing operation 140. For purposes of discussing the current example, it is assumed that the input vector data sample set 86(0)-86(X) and the reference vector data sample set 130(0)-130(X) employed in the correlation vector processing operation 140 involve all of the vector data lanes 100(0)-100(X) in the VPE 22(2).
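As a rough illustration of how the sample set width determines lane usage, the sketch below computes the number of lanes an operation would occupy. The function name is hypothetical, and the defaults of 32 bits per lane and sixteen lanes are borrowed from the example given in [00147]; a real instruction encoding may allocate lanes differently.

```python
def lanes_needed(sample_set_bits: int, lane_bits: int = 32,
                 total_lanes: int = 16) -> int:
    """Return how many vector data lanes a sample set of this width occupies."""
    if sample_set_bits <= 0:
        raise ValueError("sample set width must be positive")
    needed = (sample_set_bits + lane_bits - 1) // lane_bits  # ceiling division
    if needed > total_lanes:
        raise ValueError("sample set is wider than the available lanes")
    return needed
```

A 512-bit sample set occupies all sixteen lanes under these assumptions, while a 256-bit set occupies eight and leaves the remaining lanes free for another operation in parallel.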

[00142] With reference back to FIG. 12A, the fetched input vector data sample set 86(0)-86(X) is supplied from the vector data files 82(0)-82(X) into the input data flow paths 80(0)-80(X) to be loaded into the tapped delay lines 78 as a first input vector data sample set 86S(0)-86(X) for the correlation vector processing operation 140 (block 144). The input vector data sample set 86(0)-86(X) is loaded into the primary tapped delay line 78(0) as the input vector data sample set 86(0)-86(X) to be processed by the execution units 84(0)-84(X) for the correlation vector processing operation 140. The input vector data sample set 86(0)-86(X) loaded into the primary tapped delay line 78(0) is not shifted for the first operation of the correlation vector processing operation 140. A next input vector data sample set 86N(0)-86N(X) can also be loaded into the shadow tapped delay line 78(1) as the next input vector data sample set 86N(0)-86N(X) to be processed by the execution units 84(1)-84(X). As described previously above and in more detail below, the purpose of the tapped delay lines 78 is to shift the input vector data sample set 86(0)-86(X) during the correlation vector processing operation 140 so that the shifted input vector data sample set 86S(0)-86S(X) is supplied to the execution units 84(0)-84(X) for the next correlation operation. During each processing stage of the correlation vector processing operation 140 executed by the execution units 84(0)-84(X), the input vector data samples 86 are shifted in the primary tapped delay line 78(0) to supply the shifted input vector data sample set 86S(0)-86S(X) to the execution units 84(0)-84(X). In this manner, the input vector data sample set 86(0)-86(X) does not have to be stored, shifted in the vector data files 82(0)-82(X), and re-fetched for each correlation operation of the correlation vector processing operation 140.
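The idea of loading once and then shifting in place, rather than re-fetching a shifted set for each stage, can be sketched as follows. The class and function names are hypothetical, the shift direction and the zero fill value are assumptions, and the model ignores the shadow line and the on-time/late split described later.

```python
from collections import deque
from typing import Iterable, List


class TappedDelayLine:
    """Toy model of a primary tapped delay line: load once, then shift."""

    def __init__(self, samples: Iterable[int]) -> None:
        self._registers = deque(samples)

    def taps(self, count: int) -> List[int]:
        """Return the samples currently presented to `count` execution units."""
        return list(self._registers)[:count]

    def shift(self, fill: int = 0) -> None:
        """Advance every sample by one register; `fill` enters at the far end."""
        self._registers.popleft()
        self._registers.append(fill)


def shifted_sample_sets(samples: Iterable[int], lanes: int,
                        stages: int) -> List[List[int]]:
    """Collect the sample set seen by the execution units at each stage."""
    line = TappedDelayLine(samples)
    sets = []
    for _ in range(stages):
        sets.append(line.taps(lanes))  # first stage sees the unshifted set
        line.shift()
    return sets
```

For example, `shifted_sample_sets([1, 2, 3, 4, 5], lanes=3, stages=3)` presents `[1, 2, 3]`, then `[2, 3, 4]`, then `[3, 4, 5]`, with a single load from storage.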

[00143] In this regard, FIG. 14 illustrates exemplary tapped delay lines 78 that can be provided in the VPE 22(2) of FIG. 11. In this embodiment, the tapped delay lines 78 comprise the shadow tapped delay line 78(1) and the primary tapped delay line 78(0). As described previously above, the primary tapped delay line 78(0) in this example is composed of a plurality of 8-bit primary pipeline registers 120 to allow resolution of the input vector data samples 86 down to 8 bits in length. The first input vector data sample set 86(0)-86(X) processed by the execution units 84(0)-84(X) is not shifted in this example for the first correlation operation of the correlation vector processing operation 140. When the execution units 84(0)-84(X) process the next correlation operation for the correlation vector processing operation 140, the input vector data samples 86 in the input vector data sample set 86(0)-86(X) stored in the primary tapped delay line 78(0) are shifted in the primary pipeline registers 120(0)-120(4X+3), as indicated by the arrows in FIG. 14, to become the shifted input vector data sample set 86S(0)-86S(X). In this manner, the execution units 84(0)-84(X) are fully utilized by receiving the shifted input vector data sample set 86S(0)-86S(X) and performing the correlation vector processing operation 140 on it, without the input vector data sample set 86(0)-86(X) having to be stored, shifted, and re-fetched from the vector data files 82(0)-82(X).

[00144] The number of shifts performed in the primary tapped delay line 78(0) and the shadow tapped delay line 78(1) for the correlation vector processing operation 140 depends on the number of samples to be correlated. If the number of input vector data samples 86 in the input vector data sample set 86(0)-86(X) fetched from the vector data files 82(0)-82(X) into the primary tapped delay line 78(0) and the shadow tapped delay line 78(1) is greater than the number of correlation operations in the correlation vector processing operation 140, the execution units 84(0)-84(X) can perform the correlation vector processing operation 140 without any further input vector data sample sets 86(0)-86(X) being re-fetched from the vector data files 82(0)-82(X). However, if the number of correlation operations in the correlation vector processing operation 140 is greater than the number of input vector data samples 86 in the input vector data sample set 86(0)-86(X) fetched from the vector data files 82(0)-82(X) into the primary tapped delay line 78(0) and the shadow tapped delay line 78(1), additional input vector data sample sets 86(0)-86(X) can be fetched from the vector data files 82(0)-82(X) as part of the correlation vector processing operation 140.
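The fetch condition described above reduces to a single comparison, sketched below. The function name is hypothetical, and the source does not state what happens when the two counts are equal; this sketch treats that case as not requiring a further fetch, which is an assumption.

```python
def needs_additional_fetch(samples_loaded: int,
                           correlation_operations: int) -> bool:
    """Return True when the loaded samples cannot cover every correlation operation.

    samples_loaded: samples already held in the primary and shadow delay lines.
    correlation_operations: correlation operations in the vector operation.
    """
    return correlation_operations > samples_loaded
```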

[00145] In this embodiment, the primary pipeline registers 120(0)-120(4X+3) are collectively the width of the vector data files 82(0)-82(X). In the example of vector data files 82(0)-82(X) that are 512 bits wide with "X" equal to fifteen (15), there are sixty-four (64) total primary pipeline registers 120(0)-120(63), each eight (8) bits wide, to provide a total width of 512 bits (i.e., 64 registers x 8 bits each). Thus, in this example, the primary tapped delay line 78(0) is capable of storing the entire width of one input vector data sample set 86(0)-86(X). By providing primary pipeline registers 120(0)-120(4X+3) that are eight (8) bits wide in this example, the input vector data sample set 86(0)-86(X) can be shifted down to an 8-bit vector data sample size for 8-bit correlation vector processing operations. If a larger input vector data sample 86 size is desired for the correlation vector processing operation 140, such as 16-bit or 32-bit samples, the input vector data sample set 86(0)-86(X) can be shifted in the primary pipeline registers 120(0)-120(4X+3) by two (2) primary pipeline registers 120 at a time.
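The register arithmetic in this paragraph can be checked with a few lines. The function names are hypothetical; the formulas follow directly from the register range 120(0)-120(4X+3) and the 8-bit register width stated above.

```python
def primary_register_count(x: int) -> int:
    """Registers are indexed 0 through 4X+3, so there are 4X+4 of them."""
    return 4 * x + 4


def delay_line_width_bits(x: int, register_bits: int = 8) -> int:
    """Total width of the delay line in bits."""
    return primary_register_count(x) * register_bits
```

With X equal to 15, `primary_register_count(15)` gives 64 registers and `delay_line_width_bits(15)` gives 512 bits, matching the vector data file width in the example.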

[00146] FIG. 15A illustrates an input vector data sample set 86(0)-86(X) loaded from the vector data files 82(0)-82(X) into the primary tapped delay line 78(0) during a first clock cycle (CYCLE0) of the correlation vector processing instruction 140. The first input vector data sample set 86(0)-86(X) is loaded into the primary tapped delay line 78(0) as input vector data samples X1-X32, although sixty-four (64) input vector data samples are supplied. The primary pipeline registers 120(0)-120(2X+1) (see also FIG. 14) are loaded with on-time input vector data samples and late input vector data samples from the input vector data sample set 86(0)-86(X). For example, a special vector instruction may be supported to load the on-time input vector data samples and the late input vector data samples of the input vector data sample set 86(0)-86(X) into the primary tapped delay line 78(0) (and also into the shadow tapped delay line 78(1), as described in more detail below). For example, primary pipeline registers 120(0), 120(1), 120(2X+2), and 120(2X+3) collectively contain input vector data sample 86(0). Primary pipeline registers 120(0), 120(1) contain the on-time input vector data sample 86OT(0), which is X(0) and X(1), where "OT" means "on-time." Primary pipeline registers 120(2X+2), 120(2X+3) contain the late input vector data sample 86L(0), which is X(1) and X(2), where "L" means "late." This storage pattern of the input vector data samples 86 in the primary tapped delay line 78(0) is repeated for the other primary pipeline registers 120(2)-120(2X+1) and 120(2X+4)-120(4X+3) (see FIG. 14).
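The storage pattern described for sample 86(0) can be sketched as a layout function. This is an illustration under stated assumptions: the function name is hypothetical, and the sketch assumes the pattern repeats so that the lower half of the delay line holds the on-time samples X(0) through X(2X+1) and the upper half holds the same stream offset by one sample, which generalizes the single example given in the text.

```python
from typing import List, Sequence


def load_primary(samples: Sequence[int], x: int) -> List[int]:
    """Lay out on-time and late samples in registers 0..4X+3.

    Registers 0..2X+1 receive the on-time samples; registers 2X+2..4X+3
    receive the late samples (assumed to be the on-time stream shifted
    by one sample).
    """
    half = 2 * x + 2
    if len(samples) < half + 1:
        raise ValueError("not enough samples to fill both halves")
    on_time = list(samples[0:half])
    late = list(samples[1:half + 1])
    return on_time + late
```

With X equal to 15 and samples numbered from 0, registers 0 and 1 hold X(0) and X(1), and registers 32 and 33 hold X(1) and X(2), as in the example for sample 86(0).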

[00147] With reference back to FIG. 14, a shadow tapped delay line 78(1) is also provided in the tapped delay lines 78. The shadow tapped delay line 78(1) can be employed to latch or carry the next input vector data sample set 86N(0)-86N(X) from the vector data files 82(0)-82(X) for the next vector processing operation. Like the primary tapped delay line 78(0), the shadow tapped delay line 78(1) is composed of a plurality of 8-bit shadow pipeline registers 122 to allow resolution of the input vector data samples down to 8 bits in length. The shadow pipeline registers 122 are collectively the width of the vector data files 82(0)-82(X), which is 512 bits in this example, so that the shadow tapped delay line 78(1), like the primary tapped delay line 78(0), is capable of storing the entire width of one input vector data sample set 86(0)-86(X). Thus, in this embodiment, the number of shadow pipeline registers 122(0)-122(4X+3) included in the shadow tapped delay line 78(1) is four times the number of vector data lanes 100(0)-100(X), which total sixteen (16) in this example, with each vector data lane 100(0)-100(X) capable of supporting thirty-two (32) bits. Thus, the number of primary pipeline registers 120 also totals sixty-four (64) in this example, for a total of 512 bits (i.e., 64 registers x 8 bits each).

[00148] FIG. 15B illustrates the next input vector data sample set 86N(0)-86N(X) loaded into the shadow tapped delay line 78(1) during a second clock cycle (CYCLE1) of the correlation vector processing instruction 140. To set up execution of the correlation vector processing operation 140, the next input vector data sample set 86N(0)-86N(X) is loaded into the shadow tapped delay line 78(1) after the first input vector data sample set 86(0)-86(X) from the vector data files 82(0)-82(X) has been loaded into the primary tapped delay line 78(0). This next input vector data sample set 86N(0)-86N(X) is loaded into the shadow tapped delay line 78(1) as input vector data samples X(32)-X(63), with both on-time input vector data samples 86OT and late input vector data samples 86L. Note that in this example, as in the storage pattern provided in the primary tapped delay line 78(0) described above, X(32) and X(33) form the on-time input vector data sample 86OT of input vector data sample 86(0), and X(33) and X(34) form the late input vector data sample 86L of input vector data sample 86(0). Other patterns could be provided to group the input vector data samples 86 together to form the input vector data sample set 86(0)-86(X). The reference vector data samples 130 from the reference vector data sample set 130(0)-130(X) from the sequence number generator 134 that are correlated during the first processing stage of the correlation vector processing operation 140 (i.e., Y(0) and Y(1)) are also shown in FIG. 15B as being supplied in a register ("C") to the execution units 84(0)-84(X) for use in the correlation vector processing operation 140.

[00149] With reference back to FIG. 14, as the input vector data samples 86 in the input vector data sample set 86(0)-86(X) are shifted in the primary tapped delay line 78(0) during each processing stage of the correlation vector processing operation 140, the next input vector data samples 86N stored in the shadow pipeline registers 122 are also shifted in the shadow pipeline registers 122 of the shadow tapped delay line 78(1). In this example, because the input vector data samples 86 of the input vector data sample set 86(0)-86(X) are stored as on-time and late versions, the shift pattern provided between the tapped delay lines 78(0) and 78(1) in FIG. 14 differs from the shift pattern provided between the tapped delay lines 78(0) and 78(1) in FIG. 7. As illustrated in FIG. 14, the on-time input vector data samples 86OT are shifted from shadow pipeline register 122(0) in the shadow tapped delay line 78(1) to primary pipeline register 120(2X+1) in the primary tapped delay line 78(0). Likewise, the late input vector data samples 86L are shifted from shadow pipeline register 122(2X+2) in the shadow tapped delay line 78(1) to primary pipeline register 120(4X+3) in the primary tapped delay line 78(0). In this manner, the on-time input vector data samples 86OT and the late input vector data samples 86L remain segregated from each other in the tapped delay lines 78(0), 78(1) as the input vector data samples 86 are shifted during the correlation vector processing operation 140.

[00150] As the processing stages of the correlation vector processing operation 140 progress in the execution units 84(0)-84(X), the entire next input vector data sample set 86N(0)-86N(X) initially stored in the shadow tapped delay line 78(1) is eventually shifted fully into the primary tapped delay line 78(0) to be provided to the execution units 84(0)-84(X) for processing. In this manner, after the correlation vector processing operation 140 is completed on the current input vector data sample set 86(0)-86(X), the execution units 84(0)-84(X) can then be provided without delay, if desired, with the previous next input vector data sample set 86N(0)-86N(X), now stored in the primary tapped delay line 78(0), as the current input vector data sample set 86(0)-86(X) for a next correlation vector processing operation 140.

[00151] After the first input vector data sample set 86(0)-86(X) and the next input vector data sample set 86N(0)-86N(X) are loaded into the primary tapped delay line 78(0) and the shadow tapped delay line 78(1), respectively, the first input vector data sample set 86(0)-86(X) provided in the primary tapped delay line 78(0), as shown in FIG. 15B, is provided to the respective execution units 84(0)-84(X) to be processed in a first processing stage of the correlation vector processing operation 140 (block 146 in FIG. 12A). The first input vector data sample set 86(0)-86(X) becomes the current input vector data sample set 86(0)-86(X) being processed by the execution units 84(0)-84(X). As shown in the VPE 22(2) of FIG. 11, the current input vector data sample 86(0) is provided to execution unit 84(0), the current input vector data sample 86(1) is provided to execution unit 84(1), and so on. The reference vector data input samples 130(0)-130(X) to be correlated with the input vector data sample set 86(0)-86(X) in the current processing stage of the correlation vector processing operation 140 are provided to the execution units 84(0)-84(X) (block 148 in FIG. 12A).

[00152] Next, the execution units 84(0)-84(X) perform the correlation vector processing operation 140 (block 150 in FIG. 12A). More specifically, during the first processing stage, the execution units 84(0)-84(X) multiply the current input vector data sample set 86(0)-86(X) with the reference vector data samples 130 according to the operations R(OT)[n] = y[0] * x[n] for the on-time input vector data samples 86OT and R(L)[n] = y[1] * x[1+n] for the late input vector data samples 86L, where y[] is the designated reference vector data sample 130 and x[n] is the current input vector data sample set 86(0)-86(X). The results of the correlation are a current on-time correlation output vector data sample set R(OT)[n] and a current late correlation output vector data sample set R(L)[n]. The execution units 84(0)-84(X) then accumulate each current resultant correlation vector data sample set with the prior resultant correlation vector data sample set calculated by the execution units 84(0)-84(X) to provide a new prior input vector data sample set 86(0)-86(X) (block 152 in FIG. 12B). In the first processing stage of the correlation vector processing operation 140, there is no prior resultant correlation output vector data sample set 132(0)-132(X). Thus, the first/current resultant correlation output vector data sample set 132(0)-132(X) simply becomes the prior input vector data sample set 86(0)-86(X) for the second, next processing stage of the correlation vector processing operation 140.
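The multiply-and-accumulate step in the preceding paragraph can be sketched in a few lines of Python. This is an illustrative model only, not the circuit of the disclosure: the function names `correlation_stage` and `accumulate`, the `stage` parameter (which assumes that later stages advance the offsets 0 and 1 by two per stage), and the use of plain integers in place of complex samples are all assumptions made for the example.

```python
def correlation_stage(x, y, stage, lanes):
    """One processing stage: multiply input samples by two reference samples.

    Assumes stage k uses y[2k] for the on-time result and y[2k+1] for the
    late result; stage 0 matches R(OT)[n] = y[0]*x[n], R(L)[n] = y[1]*x[1+n].
    """
    base = 2 * stage
    r_ot = [y[base] * x[base + n] for n in range(lanes)]
    r_l = [y[base + 1] * x[base + 1 + n] for n in range(lanes)]
    return r_ot, r_l


def accumulate(prior, current):
    """Add the current stage result to the prior result (none exists at stage 0)."""
    if prior is None:
        return list(current)
    return [p + c for p, c in zip(prior, current)]


# Hypothetical data: eight input samples, four reference samples, four lanes.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 2, 3, 4]
acc_ot = acc_l = None
for stage in range(len(y) // 2):
    r_ot, r_l = correlation_stage(x, y, stage, lanes=4)
    acc_ot = accumulate(acc_ot, r_ot)
    acc_l = accumulate(acc_l, r_l)
```

In the first stage `accumulate` simply adopts the current result, mirroring the statement that no prior resultant set exists at that point.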

[00153] If all processing stages of the correlation vector processing operation 140 have been completed (block 154 in FIG. 12B), the accumulated prior resultant correlation output vector data sample set 132(0)-132(X) is provided as the resultant correlation output vector data sample set 132(0)-132(X) in the output data flow paths 98(0)-98(X) to be provided to and stored in the vector data files 82(0)-82(X) (block 157 in FIG. 12B). If all processing stages of the correlation vector processing operation 140 have not been completed (block 154 in FIG. 12A), the input vector data sample set is shifted in the tapped delay lines 78(0), 78(1) to the next position for the correlation vector processing operation 140 to provide a shifted input vector data sample set 86S(0)-86S(X) (block 156 in FIG. 12B). The shifted input vector data sample set 86S(0)-86S(X) is provided for calculating the next resultant correlation output vector data sample set 132(0)-132(X), which is to be accumulated with the prior resultant correlation output vector data sample set 132(0)-132(X). Shifting of the input vector data samples 86 in the tapped delay lines 78(0), 78(1) was described in detail above with regard to FIG. 14.

[00154] FIG. 15C illustrates the contents of the tapped delay lines 78 when the input vector data sample set 86(0)-86(X) has been shifted in the second processing stage of the correlation vector processing operation 140 to become the new shifted input vector data sample set 86S(0)-86S(X) for the next correlation processing operation 140: R(OT)[n] = y[2] * x[2+n] for the on-time input vector data samples 86SOT and R(L)[n] = y[3] * x[3+n] for the late input vector data samples 86SL. The input vector data sample set 86(0)-86(X) in the primary tapped delay line 78(0) is shifted by two input vector data samples 86. For example, input vector data sample 86OT(1) of FIG. 15B, holding X(2) and X(3), is now shifted into input vector data sample 86S(0) of FIG. 15C. The shifted input vector data sample set 86S(0)-86S(X) becomes the current input vector data sample set 86(0)-86(X). The reference vector data samples 130 provided to the execution units 84(0)-84(X) are likewise the next reference vector data samples 130, which are Y(2) and Y(3) in this example.
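The shift by two input vector data samples per stage can be pictured with a flat list model. The sketch below is a simplification under stated assumptions: it treats the primary and shadow tapped delay lines as one concatenated sequence and ignores the separate on-time and late register groups of FIG. 14; the name `shift_stage` and the 32-sample line length are chosen for illustration only.

```python
def shift_stage(primary, shadow, step=2):
    """Shift both delay lines by `step` samples toward the primary head.

    Samples leaving the shadow line enter the tail of the primary line,
    so the primary line keeps its length while the shadow line drains.
    """
    combined = primary + shadow
    shifted = combined[step:]
    return shifted[:len(primary)], shifted[len(primary):]


# Hypothetical contents: X(0)..X(31) in the primary line, X(32)..X(63) in the shadow line.
primary = list(range(32))
shadow = list(range(32, 64))

first_primary, first_shadow = shift_stage(primary, shadow)

# Apply sixteen shifts in total, one per processing stage.
p, s = primary, shadow
for _ in range(16):
    p, s = shift_stage(p, s)
```

In this model one shift moves X(2) and X(3) to the head of the primary line, and after sixteen shifts the primary line holds exactly what the shadow line held at the start.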

[00155] With continuing reference to FIG. 12B, the process repeats by providing the next shifted input vector data sample set 86S(0)-86S(X) from the primary tapped delay line 78(0) (and from a portion of the shadow tapped delay line 78(1)) to the execution units 84(0)-84(X) to be multiplied with the next reference vector data samples 130 (block 150 in FIG. 12A), with the resultant correlation output vector data sample set 132(0)-132(X) being accumulated with the prior resultant correlation output vector data sample set 132(0)-132(X) (block 152 in FIG. 12B). FIG. 15D illustrates the state of the input vector data samples 86 present in the tapped delay lines 78(0), 78(1) during the last processing stage of the exemplary correlation vector processing operation 140. In this example, as shown in FIG. 15D, there were sixteen (16) processing stages for the correlation vector processing operation 140, because the full data width of the tapped delay lines 78 was utilized for the input vector data sample set 86(0)-86(X) but was split between the on-time input vector data samples 86OT and the late input vector data samples 86L. As shown in FIG. 15D, Y(30) and Y(31) are the last reference vector data samples 130(X) in the correlation vector processing operation 140, which is reference vector data sample 130(15) in the example of FIG. 13. The shifted input vector data sample set 86S(0)-86S(X) has been shifted sixteen (16) times (the width of the vector data lanes 100(0)-100(X) in this example), such that input vector data samples X(30) and X(31) are stored in shifted input vector data sample 86S(0) in the primary tapped delay line 78(0) in the last, sixteenth processing stage of the correlation vector processing operation 140.
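Taken together, the stage-by-stage operations described above (y[0] and y[1] in the first stage, y[2] and y[3] in the second, and Y(30) and Y(31) in the sixteenth) suggest the following closed form for the accumulated results. The summation index k is introduced here for illustration and does not appear in the original text.

```latex
R_{OT}[n] = \sum_{k=0}^{15} y[2k]\, x[2k+n],
\qquad
R_{L}[n] = \sum_{k=0}^{15} y[2k+1]\, x[2k+1+n]
```

Each term of the sum corresponds to one processing stage, and the two-sample shift between stages accounts for the step of 2 in the indices.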

[00156] FIG. 16 is a schematic diagram of the contents of the accumulators (i.e., the resultant correlation output vector data samples 132) in the execution units 84(0)-84(X) in the VPE 22(2) of FIG. 11 after the exemplary sixteen (16) correlation vector processing stages in the above example have been fully executed. The resultant correlation output vector data sample set is shown as 132(0)-132(X). Accumulators Acc0-Acc3 are shown in FIG. 16 because, in this example, each execution unit 84(0)-84(X) has four accumulators arranged in parallel for each vector data lane 100(0)-100(X). The accumulated resultant output vector data samples can be provided on the output data flow paths 98(0)-98(X) to the vector data files 82(0)-82(X) as the overall resultant correlation output vector data sample set 132(0)-132(X), to be stored there for further analysis and/or processing. If desired, a specialized vector instruction may be supported by the VPE 22(2) to move rows of the resultant correlation output vector data sample set 132(0)-132(X) from the vector data files 82(0)-82(X) to the vector unit data memory 32 (see FIG. 2).

[00157] The resultant output vector data sample sets provided by the execution units 84(0)-84(X), including the resultant filter vector output data sample set 94(0)-94(X) and the resultant correlation output vector data sample set 132(0)-132(X) described above, can be stored back in the vector data files 82(0)-82(X), 82(31) in different interleaved formats, depending on the vector instruction executed by the VPE. "X" is equal to thirty-one (31) in this example to provide vector data files 82(0)-82(X) that are each thirty-two (32) bits wide. For example, as shown in FIG. 17A, a resultant output vector data sample set 158(0)-158(X), 158(31) can be stored in the vector data files 82(0)-82(X) separated by real ("q") and imaginary ("i") components. The resultant output vector data sample set 158(0)-158(X) is comprised of "X+1" resultant output vector data samples 158, which are 158(0), 158(1), ..., and 158(X) in this example. Storing the resultant output vector data sample set 158(0)-158(X), 158(31) separated by real ("q") and imaginary ("i") components may be more efficient, such as when the next vector instruction operates on the real and imaginary components of the resultant output vector data sample set 158(0)-158(X), 158(31) as an input vector data sample set. Alternatively, it may not be possible to store a resultant output vector data sample 158 in a vector data file 82 without separating the resultant output vector data sample 158 into its real and imaginary components. For example, if a 16-bit vector data sample is multiplied with another 16-bit vector data sample, a 32-bit resultant vector data sample results. For example, the 32-bit resultant output vector data sample 158 could be Y0 in FIG. 17A. The imaginary component of Y0, Y0.i 158(I), can be stored in ADDRESS "0" of vector data file 82(0), and the real component of Y0, Y0.q 158(Q), can be stored in another ADDRESS, such as ADDRESS "A".

[00158] The resultant output vector data sample set 158(0)-158(X), 158(31) of FIG. 17A could also be stored in the vector data files 82(0)-82(X), 82(31) interleaved by even and odd resultant output vector data samples. This is shown by example in FIG. 17B. As shown in FIG. 17B, resultant output vector data samples Y0-Y31 158(0)-158(X), 158(31) are stored in a format interleaved by even and odd vector data samples in ADDRESS "0" and ADDRESS "A" in the vector data files 82(0)-82(31). Resultant output vector data sample Y0 158(0) is stored in ADDRESS "0" in vector data file 82(0). Resultant output vector data sample Y1 158(1) is not stored in ADDRESS "0" in vector data file 82(1), but rather in ADDRESS "A" in vector data file 82(0). Resultant output vector data sample Y2 158(2) is stored in ADDRESS "0" in vector data file 82(1), and so on.
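The even/odd placement just described can be written as a small index mapping. The sketch below extrapolates from the three placements named in the text (Y0, Y1, Y2) and assumes the pattern continues unchanged for later samples; the function name `even_odd_slot` is hypothetical, and FIG. 17B remains the authority on the actual layout.

```python
def even_odd_slot(k):
    """Return (vector data file index, address label) for resultant sample Y_k.

    Even-numbered samples go to ADDRESS "0" and odd-numbered samples to
    ADDRESS "A" of the same file, so each file holds one even/odd pair.
    """
    file_index = k // 2
    address = "0" if k % 2 == 0 else "A"
    return file_index, address


# Placement of the first few samples under this assumed pattern.
layout = {f"Y{k}": even_odd_slot(k) for k in range(4)}
```

Here `layout["Y1"]` evaluates to `(0, "A")`, matching the statement that Y1 is stored in ADDRESS "A" of vector data file 82(0) rather than in vector data file 82(1).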

[00159] Certain wireless baseband operations require data samples to be format-converted before being processed. For example, the resultant output vector data sample sets 158(0)-158(X) stored in the vector data files 82(0)-82(X) in the interleaved formats of FIGS. 17A and 17B may need to be deinterleaved for a next vector processing operation. For example, if the resultant output vector data samples 158(0)-158(X) represent a CDMA signal, the resultant output vector data samples 158(0)-158(X) may need to be deinterleaved to separate out the even and odd phases of the signal. The deinterleaved signal may also be correlated with a locally generated code or sequence number in a correlation processing operation, such as the exemplary correlation vector processing operation described above with regard to FIGS. 11-16, to determine whether the CDMA system can extract the signal. Conventional programmable processors implement format conversion of data samples in multiple steps, which adds cycles, power consumption, and data flow complications to the format conversion of vector data samples. Vector processors can pre-process the vector data samples to provide format conversion before the format-converted vector data samples are provided to the execution units. The format-converted vector data samples are stored in vector data memory and re-fetched as part of the vector processing operation that requires the data format conversion, to be processed by the execution units. However, this format pre-processing of the vector data samples delays the subsequent processing of the format-converted vector data samples by the execution units and causes the computational components in the execution units to be underutilized.

[00160] Embodiments disclosed herein below provide for the conversion of interleaved vector data sample sets, such as the vector data sample sets shown in FIGS. 18A and 18B. For example, FIGS. 18A and 18B illustrate vector data sample sets D(0)-D(X) stored in the vector data files 82(0)-82(X) in different formats. FIG. 18A illustrates the vector data sample set D(0)-D(X) stored as signed complex (SC) 16-bit samples (SC16), format-interleaved by real and imaginary components. The 16-bit real component D(0)(Q) and imaginary component D(0)(I) of 32-bit vector data sample D(0) are stored in the 32-bit vector data file 82(0). The 16-bit real component D(X)(Q) and imaginary component D(X)(I) of vector data sample D(X) are stored in the 32-bit vector data file 82(X). FIG. 18B illustrates the vector data sample set D(0)-D(X) stored as SC 8-bit samples (SC8), format-interleaved by real and imaginary components. The 8-bit real component D(0)(1)(Q) and imaginary component D(0)(1)(I) of 16-bit vector data sample D(0)(1) are stored in vector data file 82(0). The 8-bit real component D(0)(0)(Q) and imaginary component D(0)(0)(I) of 16-bit vector data sample D(0)(0) are also stored in the 32-bit vector data file 82(0). Likewise, the 8-bit real component D(X)(1)(Q) and imaginary component D(X)(1)(I) of 16-bit vector data sample D(X)(1) are stored in the 32-bit vector data file 82(X). The 8-bit real component D(X)(0)(Q) and imaginary component D(X)(0)(I) of 16-bit vector data sample D(X)(0) are also stored in the 32-bit vector data file 82(X).
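To make the SC16 and SC8 layouts concrete, the following sketch packs signed real and imaginary components into one 32-bit word and unpacks them again. The bit positions are an assumption made for illustration (real component in the more significant field, sample (1) above sample (0)); the text states only which components share a 32-bit vector data file entry, not their order within it. All function names are hypothetical.

```python
def _signed(value, bits):
    """Interpret an unsigned field of `bits` bits as a two's-complement number."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pack_sc16(q, i):
    """Pack one SC16 sample: 16-bit real (q) and 16-bit imaginary (i)."""
    return ((q & 0xFFFF) << 16) | (i & 0xFFFF)


def unpack_sc16(word):
    """Recover (q, i) from a 32-bit word holding one SC16 sample."""
    return _signed((word >> 16) & 0xFFFF, 16), _signed(word & 0xFFFF, 16)


def pack_sc8(sample1, sample0):
    """Pack two SC8 samples, each an (8-bit q, 8-bit i) pair, into 32 bits."""
    (q1, i1), (q0, i0) = sample1, sample0
    return ((q1 & 0xFF) << 24) | ((i1 & 0xFF) << 16) | ((q0 & 0xFF) << 8) | (i0 & 0xFF)


def unpack_sc8(word):
    """Recover ((q1, i1), (q0, i0)) from a 32-bit word holding two SC8 samples."""
    f = [_signed((word >> shift) & 0xFF, 8) for shift in (24, 16, 8, 0)]
    return (f[0], f[1]), (f[2], f[3])
```

Under these assumptions a single 32-bit entry carries either one SC16 sample or two SC8 samples, which is why the same vector data file width serves both formats.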

[00161] In this regard, FIG. 19 is a schematic diagram of another exemplary VPE 22(3) that can be provided as the VPE 22 of FIG. 2. As described in more detail below, the VPE 22(3) of FIG. 19 is configured to provide in-flight format conversion (e.g., deinterleaving) of the input vector data sample sets provided to the execution units for vector processing operations in the VPE 22(3), so that re-fetching of vector data samples is eliminated or reduced and power consumption is reduced. In-flight format conversion of an input vector data sample set means that the input vector data sample set retrieved from vector data memory is format-converted before being provided to the execution units for execution, without having to be stored in and re-fetched from vector data memory. To eliminate or minimize re-fetching of input vector data samples from the vector data files, and thereby reduce power consumption and improve processing efficiency, format conversion circuitry 159(0)-159(X) is included in each of the vector data lanes 100(0)-100(X) between the vector data files 82(0)-82(X) and the execution units 84(0)-84(X). As discussed in more detail below, the input vector data sample set 86(0)-86(X) from the vector data files 82(0)-82(X) is format-converted (e.g., deinterleaved) in the format conversion circuitry 159(0)-159(X) in the VPE 22(3) to provide a format-converted input vector data sample set 86F(0)-86F(X) to the execution units 84(0)-84(X) for vector processing operations that require deinterleaving of the input vector data sample set 86(0)-86(X). All of the format-converted input vector data samples 86F comprise the format-converted input vector data sample set 86F(0)-86F(X) in this example. "X"+1 is the maximum number of parallel input data lanes provided in the VPE 22(3) for processing of input vector data samples 86 in this example.

[00162] In this manner, format conversion of the input vector data sample set 86(0)-86(X) in the VPE 22(3) does not require pre-processing, storage, and re-fetching from the vector data files 82(0)-82(X), thereby reducing power consumption. Further, because the format conversion of the input vector data sample set 86(0)-86(X) does not require pre-processing, storage, and re-fetching of the format-converted input vector data sample set 86(0)-86(X) from the vector data files 82(0)-82(X), the execution units 84(0)-84(X) are not delayed from performing vector processing operations. Thus, the efficiency of the data flow path in the VPE 22(3) is not limited by format-conversion pre-processing delays of the input vector data sample set 86(0)-86(X). The format-converted (e.g., deinterleaved) input vector data sample set 86F(0)-86F(X) is provided localized to the execution units 84(0)-84(X). The vector processing in the execution units 84(0)-84(X) is limited only by computational resources rather than by data flow limitations.
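As a software analogy for the in-flight idea, the sketch below deinterleaves fetched samples on the way to the consuming operation, with no intermediate write-back and second fetch. It is a conceptual model only, not a description of the format conversion circuitry 159(0)-159(X); the names `deinterleave` and `execute_in_flight` and the list-based "vector data file" are assumptions for illustration.

```python
def deinterleave(samples):
    """Split [a0, b0, a1, b1, ...] into ([a0, a1, ...], [b0, b1, ...])."""
    return samples[0::2], samples[1::2]


def execute_in_flight(vector_file, operation):
    """Fetch once, convert in the data path, and hand the result to the operation.

    The converted samples are never stored back to `vector_file`, so no
    re-fetch is needed before `operation` runs.
    """
    fetched = list(vector_file)           # single fetch from the vector data file
    evens, odds = deinterleave(fetched)   # format conversion between fetch and execute
    return operation(evens, odds)


# Hypothetical usage: subtract the odd-phase sum from the even-phase sum.
result = execute_in_flight([1, 2, 3, 4], lambda e, o: sum(e) - sum(o))
```

A pre-processing approach would instead write the two deinterleaved lists back to memory and read them again, which is the extra storage and re-fetch step that the paragraph above says is avoided.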

[00163] Note that although the primary tapped delay line 78(0) and the shadow tapped delay line 78(1) are illustrated in the VPE 22(3) of FIG. 19, it is not necessary to include tapped delay lines in the VPE 22(3) of FIG. 19. In this example, as illustrated in FIG. 19, the format conversion circuitry 159(0)-159(X) can be included in the optional primary tapped delay line 78(0). This arrangement provides the format conversion circuitry 159(0)-159(X) in the input data flow paths 80(0)-80(X) between the vector data files 82(0)-82(X) and the execution units 84(0)-84(X) in the VPE 22(3) of FIG. 19. The operation of the primary tapped delay line 78(0) was previously described above with regard to VPE 22(1) and VPE 22(2). As previously discussed above, the primary tapped delay line 78(0) and the shadow tapped delay line 78(1) may be employed for vector processing operations that require the format-converted input vector data sample set 86F(0)-86F(X) to be provided to the execution units 84(0)-84(X), which in turn also require the format-converted input vector data sample set to be shifted, designated as 86SF(0)-86SF(X).

[00164] Note that the same components and architecture provided in the VPE 22(3) of FIG. 19 are provided in the VPE 22(2) of FIG. 11. Common components between the VPE 22(3) of FIG. 19 and the VPE 22(2) of FIG. 11 are illustrated in FIG. 19 with element numbers common to the components in FIG. 11 of the VPE 22(2). The previous description and discussion of these common components for the VPE 22(2) of FIG. 11 above are also applicable to the VPE 22(3) of FIG. 19, and thus are not re-described here.

[00165] Further details and features of the VPE 22(3) of FIG. 19, and of the tapped delay lines 78 for providing the format-converted input vector data sample set 86F(0)-86F(X) to the execution units 84(0)-84(X) in the input data flow paths 80(0)-80(X) in this embodiment, will now be described. In this regard, FIG. 20 is a flowchart illustrating an exemplary deinterleaving format conversion vector processing operation 160 that can be performed in the VPE 22(3) of FIG. 19 employing the format conversion circuitry 159(0)-159(X) according to an exemplary vector instruction requiring format conversion of the input vector data sample set 86(0)-86(X).

[00166] With reference to FIG. 20, the input vector data sample set 86(0)-86(X) for a vector processing operation 160 according to a vector instruction is fetched from the vector data files 82(0)-82(X) into the input data flow paths 80(0)-80(X) (block 162). For example, the format conversion for the vector processing operation 160 may be a deinterleaving vector processing operation 160 in which the input vector data sample set 86(0)-86(X) is deinterleaved from its interleaved state in the vector data files 82(0)-82(X) into a deinterleaved input vector data sample set 86F(0)-86F(X). One, some, or all of the vector data lanes 100(0)-100(X) in the VPE 22(3) of FIG. 19 can be employed to provide the vector processing operation 160 according to the programming of the vector instruction, depending on the width of the input vector data sample set 86(0)-86(X) to be format-converted for the vector processing operation 160. If the entire width of the vector data files 82(0)-82(X) is required, all vector data lanes 100(0)-100(X) can be employed for the vector processing operation 160. The vector processing operation 160 may require only a subset of the vector data lanes 100(0)-100(X). This may be because the width of the input vector data sample set 86(0)-86(X) is less than the width of all the vector data files 82(0)-82(X), in which case it is desirable to employ the additional vector data lanes 100 for other vector processing operations to be performed in parallel with the vector processing operation 160. For the purposes of discussing the current example, it is assumed that the input vector data sample set 86(0)-86(X) that is format-converted into the input vector data sample set 86F(0)-86F(X) for the vector processing operation 160 involves all of the vector data lanes 100(0)-100(X) in the VPE 22(3) of FIG. 19.

[00167] With continuing reference to FIG. 20, the fetched input vector data sample sets 86(0)-86(X) are supplied in the input data flow paths 80(0)-80(X) to the format conversion circuits 159(0)-159(X) to be format-converted according to the vector processing operation 160 (block 164). As a non-limiting example, the current input vector data sample sets 86(0)-86(X) may optionally be loaded into the primary tapped delay line 78(0) as the input vector data sample sets 86(0)-86(X) to be format-converted before being supplied to the execution units 84(0)-84(X) for the vector processing operation 160. As previously discussed, the next input vector data sample sets 86(0)-86(X) may also optionally be loaded into the shadow tapped delay line 78(1) as the next input vector data sample sets 86N(0)-86N(X) to be processed by the execution units 84(0)-84(X). As also discussed above, the purpose of the tapped delay line 78 is to shift the input vector data sample sets 86(0)-86(X) into shifted input vector data sample sets 86S(0)-86S(X) to be supplied to the execution units 84(0)-84(X) during a vector processing operation 160 that operates on shifted input vector data samples 86. If the format-converted input vector data sample sets 86F(0)-86F(X) are also shifted in the tapped delay line 78 during the vector processing operation 160, the shifted, format-converted input vector data sample sets are designated 86SF(0)-86SF(X).
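The relationship between the primary and shadow tapped delay lines can be pictured with a small software model. The following is only an illustrative sketch, not the circuit described in the publication: the function name `shifted_windows`, the one-sample shift per stage, and the shift direction (next-set samples entering from the shadow line as current-set samples leave) are all assumptions made for the example.

```python
def shifted_windows(primary, shadow, num_stages):
    """Return the window of samples visible to the execution units at each stage.

    Hypothetical model: `primary` holds the current sample set, `shadow`
    holds the next sample set, and each processing stage shifts the window
    by one sample so that shadow samples gradually enter the window.
    """
    width = len(primary)
    if num_stages - 1 > len(shadow):
        raise ValueError("not enough shadow samples for the requested stages")
    combined = list(primary) + list(shadow)
    # Stage 0 sees the unshifted set; stage k sees the set shifted by k samples.
    return [combined[stage:stage + width] for stage in range(num_stages)]


if __name__ == "__main__":
    for stage, window in enumerate(shifted_windows([0, 1, 2, 3], [4, 5, 6, 7], 3)):
        print(stage, window)
```

In this model the shadow line plays the role of a look-ahead buffer: the window never stalls waiting for the next sample set to be fetched, which is the motivation the text gives for loading the next set in advance.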

[00168] With continuing reference to FIG. 20, the execution units 84(0)-84(X) can then perform the vector processing operation 160 using the format-converted input vector data sample sets 86F(0)-86F(X) (block 166). The execution units 84(0)-84(X) may be configured to provide multiplications and/or accumulations using the format-converted input vector data sample sets 86F(0)-86F(X). If the tapped delay line 78 is employed to shift the format-converted input vector data sample sets 86F(0)-86F(X) during the vector processing operation 160, the execution units 84(0)-84(X) can receive the shifted, format-converted input vector data sample sets 86SF(0)-86SF(X) during each processing stage of the vector processing operation 160 until the vector processing operation 160 is complete (block 168). Once the vector processing operation 160 is complete, the resulting output vector data sample sets 172(0)-172(X), which are based on the vector processing operation with the format-converted input vector data sample sets 86F(0)-86F(X) or the shifted, format-converted input vector data sample sets 86SF(0)-86SF(X), are supplied in the output data flow paths 98(0)-98(X) to be provided to and stored in the vector data files 82(0)-82(X) (block 170). The resulting output vector data sample sets 172(0)-172(X) consist of "X+1" resulting output vector data samples 172, which in this example are 172(0), 172(1), ..., and 172(X).

[00169] FIG. 21 is a schematic diagram of exemplary format conversion circuits 159(0)-159(X) that receive the shifted input vector data sample sets 86S(0)-86S(X) from the primary tapped delay line 78(0). In this example, the format conversion circuits 159(0)-159(X) are provided on the output of the primary tapped delay line 78(0) in the input data flow paths 80(0)-80(X).

[00170] The exemplary format conversion circuits 159(0)-159(X) will now be described. Exemplary detail of the internal components of the format conversion circuit 159(0) is provided in FIG. 21, and it is equally applicable to the format conversion circuits 159(1)-159(X). Taking the format conversion circuit 159(0) in FIG. 21 as an example, the format conversion circuit 159(0) in this example is configured to provide deinterleaving and sign extension of the input vector data sample 86(0) or the shifted input vector data sample 86S(0) from the primary pipeline registers 120(0), 120(1), 120(2X+2), 120(2X+3) in the vector data lane 100(0), in order to supply the format-converted input vector data sample 86F(0) or the shifted, format-converted input vector data sample 86SF(0), respectively. In this regard, four multiplexers 174(3)-174(0) are provided in this example, each arranged according to its assigned primary pipeline register 120(0)-120(2X+3). Each multiplexer 174(3)-174(0) is configured to select either the portion of the shifted input vector data sample 86S(0) in its assigned primary pipeline register 120(0), 120(1), 120(2X+2), 120(2X+3), or the portion of the shifted input vector data sample 86S(0) stored in a primary pipeline register 120 adjacent to the assigned primary pipeline register.

[00171] For example, suppose that the primary pipeline registers 120(0), 120(1), 120(2X+2), 120(2X+3) store an interleaved, shifted input vector data sample 86S(0) in complex interleaved form as real [15:8], imaginary [15:8], real [7:0], imaginary [7:0], and that the desired deinterleaved format according to the vector instruction to be executed is real [15:0] and imaginary [15:0]. The selections of the multiplexers 174(3)-174(0) would then be as follows. The multiplexer 174(3) would select the portion of the shifted input vector data sample 86S stored in its assigned primary pipeline register 120(0). The multiplexer 174(2), however, would select the portion of the shifted input vector data sample 86S stored in the primary pipeline register 120(1). This provides the deinterleaved real portion of the input vector data sample 86S(0) (i.e., real [15:0]) in the adjacent input data flow paths 80(0)(3), 80(0)(2). Similarly, the multiplexer 174(0) would select the portion of the shifted input vector data sample 86S stored in its assigned primary pipeline register 120(2X+3). The multiplexer 174(1), however, would select the portion of the shifted input vector data sample 86S stored in the primary pipeline register 120(2X+2). This provides the deinterleaved imaginary portion of the shifted input vector data sample 86S(0) (i.e., imaginary [15:0]) in the adjacent input data flow paths 80(0)(1), 80(0)(0). As shown in FIG. 21, the multiplexers 176(1), 176(0) give each multiplexer 174(3)-174(0) the ability to select a portion of the shifted input vector data sample 86S(0) from an unassigned, non-adjacent primary pipeline register 120(0), 120(1), 120(2X+2), 120(2X+3).
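The byte-level effect of this deinterleaving can be sketched in software. The example below is a simplified model under stated assumptions, not the multiplexer wiring of FIG. 21: the select table `MUX_SELECT` and the function `deinterleave` are hypothetical names, and the mapping of table entries to physical registers or multiplexers is deliberately not modeled. Only the byte order given in the text (real [15:8], imaginary [15:8], real [7:0], imaginary [7:0]) and the target format (real [15:0], imaginary [15:0]) are taken from the description.

```python
# Interleaved byte order as stated in the text.
INTERLEAVED_ORDER = ("real[15:8]", "imag[15:8]", "real[7:0]", "imag[7:0]")

# Hypothetical select table: for each output byte position, the index of the
# interleaved source byte routed to it. Output order is
# real[15:8], real[7:0], imag[15:8], imag[7:0].
MUX_SELECT = (0, 2, 1, 3)


def deinterleave(interleaved_bytes):
    """Route four interleaved bytes into a (real16, imag16) pair of raw words."""
    if len(interleaved_bytes) != len(INTERLEAVED_ORDER):
        raise ValueError("expected exactly four bytes")
    routed = [interleaved_bytes[src] & 0xFF for src in MUX_SELECT]
    real16 = (routed[0] << 8) | routed[1]
    imag16 = (routed[2] << 8) | routed[3]
    return real16, imag16


if __name__ == "__main__":
    real, imag = deinterleave([0x12, 0xAB, 0x34, 0xCD])
    print(hex(real), hex(imag))
```

Expressing the routing as a table mirrors the idea that the selection is data path configuration rather than computation: changing the format only changes the table, not the surrounding logic.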

[00172] With continuing reference to FIG. 21, the format conversion circuits 159(0)-159(X) can also be configured to sign extend the format-converted input vector data sample sets 86F(0)-86F(X). For example, if the format conversion of the input vector data sample sets 86(0)-86(X) involves signed vector data samples converted from a smaller bit width to a larger bit width, the format conversion circuits 159(0)-159(X) can be configured to sign extend the deinterleaved vector data samples by extending the most significant bits as "0" for non-negative numbers and as "F" for negative numbers. The format conversion circuits 159(0)-159(X) may have sign extension (SC) inputs 178(0)-178(X) that are set according to the vector instruction being executed to indicate whether sign extension is to be performed on the format-converted input vector data sample sets 86F(0)-86F(X). The SC inputs 178(0)-178(X) can be supplied to sign extension circuits 180(0)-180(X) provided in the format conversion circuits 159(0)-159(X), which perform the sign extension according to the programmable data path configuration provided by the SC inputs 178(0)-178(X) for the vector instruction being processed. The SC inputs 178(0)-178(X) can be configured and reconfigured for each vector instruction to provide flexibility in the vector processing by the VPE 22(3). For example, the programmable data paths in the format conversion circuits 159(0)-159(X) can be configured by the SC inputs 178(0)-178(X), which can be configured and reconfigured for each clock cycle of a vector instruction, on a clock-cycle-by-clock-cycle basis if desired, to provide format conversion as needed while fully utilizing the execution units 84(0)-84(X).
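Sign extension from a smaller to a larger bit width can be illustrated as follows. This is a minimal sketch assuming an 8-bit to 16-bit widening of two's-complement values; the function name `sign_extend` and the `enable` parameter (standing in for an SC input) are illustrative choices, not names from the publication.

```python
def sign_extend(value, from_bits=8, to_bits=16, enable=True):
    """Widen a two's-complement value held in `from_bits` bits to `to_bits` bits.

    With `enable` set, the upper bits are filled with copies of the sign bit:
    zeros for non-negative values, ones (hex "F" digits) for negative values.
    With `enable` clear, the upper bits are simply left as zeros.
    """
    narrow_mask = (1 << from_bits) - 1
    wide_mask = (1 << to_bits) - 1
    value &= narrow_mask
    sign_bit = 1 << (from_bits - 1)
    if enable and (value & sign_bit):
        value |= wide_mask & ~narrow_mask  # fill the upper bits with ones
    return value


if __name__ == "__main__":
    for sample in (0x7F, 0x80, 0xFF):
        print(hex(sample), "->", hex(sign_extend(sample)))
```

Making the extension conditional on a flag reflects the text's point that the behavior is selected per vector instruction rather than fixed in the data path.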

[00173] As discussed above, however, the format conversion circuits 159(0)-159(X) do not have to be provided as part of the primary tapped delay line 78(0). The primary tapped delay line 78(0) and the shadow tapped delay line 78(1) are optional. The format conversion circuits 159(0)-159(X) could receive the input vector data sample sets 86(0)-86(X) directly from the vector data files 82(0)-82(X). In this scenario, referring to FIG. 21 as an example, the input vector data sample sets 86(0)-86(X) could be loaded directly from the vector data files 82(0)-82(X) into the primary registers 120(0)-120(4X+3).

[00174] Further, note that although the format conversion circuits 159(0)-159(X) are provided on the output of the primary tapped delay line 78(0) to format-convert the input vector data sample sets 86(0)-86(X), this is not required. The format conversion circuits 159(0)-159(X) of FIG. 21 could instead be provided on the input side of the primary tapped delay line 78(0) and the shadow tapped delay line 78(1), so that the input vector data sample sets 86(0)-86(X) fetched from the vector data files 82(0)-82(X) are format-converted in the format conversion circuits 159(0)-159(X) before being loaded into the primary tapped delay line 78(0) and the shadow tapped delay line 78(1). In this example, the input vector data sample sets 86(0)-86(X) would be stored in the primary tapped delay line 78(0) and the shadow tapped delay line 78(1) as the format-converted input vector data sample sets 86F(0)-86F(X) (or 86SF(0)-86SF(X) after shifting). The format-converted input vector data sample sets 86F(0)-86F(X) (or 86SF(0)-86SF(X) after shifting) could then be supplied directly from the primary tapped delay line 78(0) to the execution units 84(0)-84(X) for execution in a vector processing operation.

[00175] As discussed above, the input data flow paths 80(0)-80(X) can be programmed according to a programmable input data path configuration to employ the format conversion circuits 159(0)-159(X) according to the vector instruction to be executed. In this regard, FIG. 22 is a chart 182 that provides an exemplary data format of the bits of a vector instruction that control the programming of shifting and format conversion of the input vector data sample sets 86(0)-86(X) in the VPE 22(3) of FIG. 19. The data provided in the fields of the chart 182 provide programming to the VPE 22(3) to control whether the format conversion circuits 159(0)-159(X) and/or the tapped delay line 78 are included in the input data flow paths 80(0)-80(X), depending on whether their functions are required for the vector instruction to be processed.

[00176] In FIG. 22, for example, a bias field 184 (BIAS_SC16) is provided in bits [7:0] of the vector instruction or vector programming to indicate whether a shift bias is provided for arithmetic instructions when the signed complex 16-bit format (SC16) is used with the tapped delay line 78. A first source data format conversion field 186 (DECIMATE_SRC1) is provided in bit [16] of the vector instruction or vector programming to indicate whether the first source data (i.e., the input vector data sample sets 86(0)-86(X)) is to be decimated (i.e., deinterleaved) and converted from the SC8 format to the SC16 format. A second source data format conversion field 188 (DECIMATE_SRC2) is provided in bit [17] of the vector instruction or vector programming to indicate whether the second source data (i.e., the input vector data sample sets 86(0)-86(X)) is to be decimated (i.e., deinterleaved) and converted from the SC8 format to the SC16 format. An output data format field 190 (DEST_FMT) is provided in bit [18] to indicate whether the output source data (e.g., the resulting output vector data sample sets 172(0)-172(X) in the VPE 22(3) of FIG. 19), when stored in the vector data files 82(0)-82(X), is to be stored in the SC16 format or converted from the SC16 format to the SC8 format and reordered. A phase format field 192 (DECIMATE_PHASE) is provided in bit [19] to indicate whether the input source data (i.e., the input vector data sample sets 86(0)-86(X)) and the output data (e.g., the resulting output vector data sample sets 172(0)-172(X) in the VPE 22(3) of FIG. 19) are to be decimated (i.e., deinterleaved) along even (e.g., on-time) and odd (e.g., late) samples, which may be particularly useful for CDMA-specific vector processing operations, as discussed above and in FIG. 17B.
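The field layout described for chart 182 can be expressed as a small decoder. The sketch below uses only the bit positions and field names given in the text (BIAS_SC16 in bits [7:0], DECIMATE_SRC1 in bit [16], DECIMATE_SRC2 in bit [17], DEST_FMT in bit [18], DECIMATE_PHASE in bit [19]). The function name `decode_format_control`, the treatment of the programming as a single integer word, and the choice to return raw field values are assumptions; the text does not state which bit value selects which behavior, so no polarity is implied here.

```python
# Field positions taken from the description of chart 182: (low bit, width).
FIELD_LAYOUT = {
    "BIAS_SC16": (0, 8),        # bits [7:0]
    "DECIMATE_SRC1": (16, 1),   # bit [16]
    "DECIMATE_SRC2": (17, 1),   # bit [17]
    "DEST_FMT": (18, 1),        # bit [18]
    "DECIMATE_PHASE": (19, 1),  # bit [19]
}


def decode_format_control(word):
    """Extract the raw value of each field from an integer programming word.

    Hypothetical helper: bits not covered by FIELD_LAYOUT are ignored, and
    the meaning of a set or clear bit is left to the caller.
    """
    fields = {}
    for name, (low_bit, width) in FIELD_LAYOUT.items():
        fields[name] = (word >> low_bit) & ((1 << width) - 1)
    return fields


if __name__ == "__main__":
    example_word = (1 << 19) | (1 << 17) | 0x2A
    print(decode_format_control(example_word))
```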

[00177] As discussed above, after the execution units 84(0)-84(X) in the VPE 22 perform vector processing on the input vector data samples and supply the resulting output vector data sample sets on the output data flow paths 98(0)-98(X), a subsequent vector processing operation may need to be performed on the resulting output vector data sample sets. However, the resulting output vector data sample sets may need to be reordered for the subsequent vector processing operation. In that case, the resulting output vector data sample sets from the previous processing operation must be stored in the vector data files 82(0)-82(X), fetched for reordering, and re-stored in the vector data files 82(0)-82(X) in the reordered format. For example, as discussed above with regard to FIGS. 17A and 17B, the subsequent processing operation may require the previously processed vector data samples to be interleaved when stored in the vector data files 82(0)-82(X).

[00178] As another example, the subsequent processing operation may require the previously processed vector data samples to be deinterleaved when stored in the vector data files 82(0)-82(X). For example, in CDMA processing operations, the data samples representing a signal may need to be stored and interleaved according to the even (e.g., on-time) and odd (e.g., late) phases of the signal. To address this, a vector processor can include circuitry that performs post-processing reordering of the output vector data from the execution units after the output vector data is stored in the vector data memory. The output vector data samples stored in the vector data memory are fetched from the vector data memory, reordered, and stored back in the vector data memory. This post-processing delays the subsequent processing of the reordered vector data samples by the execution units and causes the computational components in the execution units to be underutilized.

[00179] In this regard, FIG. 23 is a schematic diagram of another exemplary VPE 22(4) that can be provided as the VPE 22 of FIG. 2. As will be described in more detail below, the VPE 22(4) of FIG. 23 is configured to provide in-flight reordering of the resulting output vector data sample sets 194(0)-194(X) that are supplied by the execution units 84(0)-84(X) for a vector processing operation and are to be stored in the vector data files 82(0)-82(X) in the VPE 22(4), so that re-fetching of vector data samples is eliminated or reduced and power consumption is reduced. The resulting output vector data sample sets 194(0)-194(X) consist of "X+1" resulting output vector data samples 194, which in this example are 194(0), 194(1), ..., and 194(X). For example, the reordering could include interleaving the resulting output vector data sample sets 194(0)-194(X) before they are stored in the vector data files 82(0)-82(X).

[00180] As shown in FIG. 23 and discussed in more detail below, reordering circuits 196(0)-196(X) are provided in the output data flow paths 98(0)-98(X) between the execution units 84(0)-84(X) and the vector data files 82(0)-82(X) in each of the vector data lanes 100(0)-100(X). The reordering circuits 196(0)-196(X) are configured, based on programming according to the vector instruction to be executed, to reorder the resulting output vector data sample sets 194(0)-194(X) into reordered resulting output vector data sample sets 194R(0)-194R(X) in the output data flow paths 98(0)-98(X). In-flight reordering of the resulting output vector data sample sets 194(0)-194(X) in the VPE 22(4) of FIG. 23 means that the resulting output vector data sample sets 194(0)-194(X) supplied by the execution units 84(0)-84(X) are reordered into the reordered resulting output vector data sample sets 194R(0)-194R(X) before being stored in the vector data files 82(0)-82(X). In this manner, the resulting output vector data sample sets 194(0)-194(X) are stored in the vector data files 82(0)-82(X) in the reordered format as the reordered resulting output vector data sample sets 194R(0)-194R(X). As a non-limiting example, the reordering of the resulting output vector data sample sets 194(0)-194(X) may include interleaving or deinterleaving of the resulting output vector data sample sets 194(0)-194(X) to be stored in the vector data files 82(0)-82(X) as the reordered resulting output vector data sample sets 194R(0)-194R(X).

[00181] Thus, with the reordering circuits 196(0)-196(X) provided in the output data flow paths 98(0)-98(X), the resulting output vector data sample sets 194(0)-194(X) do not have to be first stored in the vector data files 82(0)-82(X), then fetched from the vector data files 82(0)-82(X), reordered, and re-stored in the vector data files 82(0)-82(X). The resulting output vector data sample sets 194(0)-194(X) are reordered before being stored in the vector data files 82(0)-82(X). In this manner, the resulting output vector data sample sets 194(0)-194(X) are stored in the vector data files 82(0)-82(X) in the reordered format without requiring an additional post-processing step, which could delay the subsequent vector processing operation to be performed in the execution units 84(0)-84(X). Thus, the efficiency of the data flow paths in the VPE 22(4) is not limited by the reordering of the resulting output vector data sample sets 194(0)-194(X). When the resulting output vector data sample sets 194(0)-194(X) are to be stored in the vector data files 82(0)-82(X) in the reordered format as the reordered resulting output vector data sample sets 194R(0)-194R(X), the subsequent vector processing in the execution units 84(0)-84(X) is limited only by computational resources rather than by data flow limitations.
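The saving described here can be made concrete with a toy model that counts accesses to the vector data file. This is only a sketch under stated assumptions: the class `VectorDataFile`, the two store functions, and the particular interleaving pattern (first half alternated with second half) are hypothetical and chosen for illustration; the publication does not prescribe this pattern.

```python
class VectorDataFile:
    """Toy stand-in for a vector data file that counts reads and writes."""

    def __init__(self):
        self.data = []
        self.reads = 0
        self.writes = 0

    def store(self, samples):
        self.data = list(samples)
        self.writes += 1

    def fetch(self):
        self.reads += 1
        return list(self.data)


def interleave(samples):
    """Hypothetical reordering: [a0, a1, b0, b1] -> [a0, b0, a1, b1]."""
    half = len(samples) // 2
    reordered = []
    for first, second in zip(samples[:half], samples[half:]):
        reordered.extend((first, second))
    return reordered


def store_with_post_processing(vdf, results):
    """Store, fetch back, reorder, and re-store (the approach being avoided)."""
    vdf.store(results)
    vdf.store(interleave(vdf.fetch()))


def store_in_flight(vdf, results):
    """Reorder on the way to memory, so the data is written exactly once."""
    vdf.store(interleave(results))


if __name__ == "__main__":
    results = ["a0", "a1", "b0", "b1"]
    post, flight = VectorDataFile(), VectorDataFile()
    store_with_post_processing(post, results)
    store_in_flight(flight, results)
    print(post.data, post.reads, post.writes)
    print(flight.data, flight.reads, flight.writes)
```

Both paths leave the same reordered data in memory; the in-flight path simply gets there without the extra fetch and second store, which is the source of the latency and power benefit the text describes.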

[00182] In this example, as shown in FIG. 23, the VPE 22(4) that includes the reordering circuits 196(0)-196(X) can also optionally include the primary tapped delay line 78(0) and/or the shadow tapped delay line 78(1). The operation of the tapped delay lines 78(0), 78(1) was discussed above with regard to the VPE 22(1) and the VPE 22(2). As discussed above, the tapped delay lines 78(0), 78(1) may be employed for vector processing operations that require shifted input vector data sample sets 86S(0)-86S(X) to be supplied to the execution units 84(0)-84(X). Note also that common components provided in the VPEs 22(1)-22(3) of FIGS. 4, 11, and 19 are provided in the VPE 22(4) of FIG. 23. Common components are shown in the VPE 22(4) of FIG. 23 with common element numbers. The previous description of these common components for the VPEs 22(1)-22(3) is also applicable to the VPE 22(4) of FIG. 23 and thus is not repeated here.

[00183] With continuing reference to FIG. 23, more specifically, the reordering circuits 196(0)-196(X) are configured to receive the resulting output vector data sample sets 194(0)-194(X) on reordering circuit inputs 198(0)-198(X) on the output data flow paths 98(0)-98(X). The reordering circuits 196(0)-196(X) are configured to reorder the resulting output vector data sample sets 194(0)-194(X) to provide the reordered resulting output vector data sample sets 194R(0)-194R(X). The reordering circuits 196(0)-196(X) are configured to supply the reordered resulting output vector data sample sets 194R(0)-194R(X) on reordering circuit outputs 200(0)-200(X) in the output data flow paths 98(0)-98(X), to be provided to the vector data files 82(0)-82(X) for storage.

[00184] Further details and features of the VPE 22(4) of FIG. 23 for providing the reordered resultant output vector data sample sets 194R(0)-194R(X) to the vector data files 82(0)-82(X) in the output data flow paths 98(0)-98(X) in this embodiment are described next. In this regard, FIG. 24 is a flowchart illustrating an exemplary reordering of the resultant output vector data sample sets 194(0)-194(X) resulting from a vector processing operation 202 that can be performed in the VPE 22(4) of FIG. 23 employing the reordering circuits 196(0)-196(X), according to an exemplary vector instruction that requires reordering of the resultant output vector data sample sets 194(0)-194(X).

[00185] With reference to FIGS. 23 and 24, the input vector data sample sets 86(0)-86(X) to be processed by the vector processing operation 202 according to a vector instruction are fetched from the vector data files 82(0)-82(X) and provided in the input data flow paths 80(0)-80(X) (block 204 in FIG. 24). The vector processing operation 202 can include any vector processing operation required by the vector instruction to be executed. Non-limiting examples include the filter, correlation, and format conversion vector processing operations described above. Depending on the width of the input vector data sample sets 86(0)-86(X) for the vector processing operation 202, one, some, or all of the vector data lanes 100(0)-100(X) in the VPE 22(4) of FIG. 23 can be employed to provide the vector processing operation 202 according to the programming of the vector instruction. If the entire width of the vector data files 82(0)-82(X) is required, all of the vector data lanes 100(0)-100(X) can be employed for the vector processing operation 202. The vector processing operation 202 may require only a subset of the vector data lanes 100(0)-100(X). This may be because the width of the input vector data sample sets 86(0)-86(X) is less than the width of all the vector data files 82(0)-82(X), in which case it may be desirable to employ the additional vector data lanes 100 for other vector processing operations to be performed in parallel with the vector processing operation 202.

[00186] With continuing reference to FIGS. 23 and 24, the fetched input vector data sample sets 86(0)-86(X) are received from the input data flow paths 80(0)-80(X) at the execution units 84(0)-84(X) (block 206 in FIG. 24). The execution units 84(0)-84(X) perform vector processing on the received input vector data sample sets 86(0)-86(X) according to the vector processing operation 202 provided according to the vector instruction (block 208 in FIG. 24). As a non-limiting example, when the vector processing operation 202 performed by the execution units 84(0)-84(X) involves shifting of the input vector data sample sets 86(0)-86(X), the input vector data sample sets 86(0)-86(X) may optionally be loaded into the primary tapped delay line 78(0) to be shifted during each processing stage of the vector processing operation 202. As previously discussed, the next input vector data sample sets 86N(0)-86N(X) may also optionally be loaded into the shadow tapped delay line 78(1) as the next input vector data sample sets to be processed by the execution units 84(1)-84(X). As previously discussed, the purpose of the tapped delay lines 78 is to shift the input vector data sample sets 86(0)-86(X) into shifted input vector data sample sets 86S(0)-86S(X) to be supplied to the execution units 84(0)-84(X) during a vector processing operation 202 that operates on shifted input vector data samples 86.

[00187] With continuing reference to FIGS. 23 and 24, the execution units 84(0)-84(X) may be configured to provide multiplication and/or accumulation using the input vector data sample sets 86(0)-86(X). If the tapped delay lines 78 are employed to shift format-converted input vector data sample sets 86F(0)-86F(X) during the vector processing operation 202, the execution units 84(0)-84(X) can receive the shifted input vector data sample sets 86S(0)-86S(X) during each processing stage of the vector processing operation 202 until the operation is completed, as described above by example. Once the vector processing operation 202 is completed, the resultant output vector data sample sets 194(0)-194(X), based on vector processing of the input vector data sample sets 86(0)-86(X) or of the shifted, format-converted input vector data sample sets 86S(0)-86S(X), are provided in the output data flow paths 98(0)-98(X).

[00188] With continuing reference to FIGS. 23 and 24, before the resultant output vector data sample sets 194(0)-194(X) are stored in the vector data files 82(0)-82(X), they are provided to the reordering circuits 196(0)-196(X) located in the output data flow paths 98(0)-98(X) between the execution units 84(0)-84(X) and the vector data files 82(0)-82(X). As discussed in more detail below, the reordering circuits 196(0)-196(X) are programmable to be included in the output data flow paths 98(0)-98(X) according to the vector instruction being executed, if the vector instruction calls for reordering of the resultant output vector data sample sets 194(0)-194(X) to be stored in the vector data files 82(0)-82(X). The reordering circuits 196(0)-196(X) reorder the resultant output vector data sample sets 194(0)-194(X) according to the reordering provided in the programming of the vector instruction being executed, without the resultant output vector data sample sets 194(0)-194(X) first being stored in the vector data files 82(0)-82(X) (block 210 in FIG. 24). In this manner, the resultant output vector data sample sets 194(0)-194(X) do not have to be first stored in the vector data files 82(0)-82(X), re-fetched, reordered in a post-processing operation, and stored again in the reordered format, a sequence that would cause delay in the execution units 84(0)-84(X). The resultant output vector data sample sets 194(0)-194(X) are stored in the vector data files 82(0)-82(X) as reordered resultant output vector data sample sets 194R(0)-194R(X) without requiring reordering post-processing (block 212 in FIG. 24). For example, before being reordered by the reordering circuits 196(0)-196(X), the resultant output vector data sample sets 194(0)-194(X) may appear in a format like the formats provided in FIGS. 18A and 18B.

[00189] An example of the reordering circuits 196(0)-196(X) is now described with reference to FIG. 25. FIG. 25 provides exemplary detail of the internal components of the reordering circuits 196(0)-196(X) for one instance, the reordering circuit 196(0) provided in the vector data lane 100(0), but the detail is also applicable to the reordering circuits 196(1)-196(X). Taking the reordering circuit 196(0) in FIG. 25 as an example, the reordering circuit 196(0) is configured to reorder the resultant output vector data sample 194(0) provided by the execution unit 84(0) in the output data flow path 98(0) in the vector data lane 100(0), to provide the reordered resultant output vector data sample 194R(0). In this regard, four output vector data sample selectors 214(3)-214(0), provided in the form of multiplexers in this example, are arranged according to the bit width of the execution unit output 96(0), which in this example consists of four execution unit outputs 96(0)(3)-96(0)(0), each 8 bits wide. Each output vector data sample selector 214(3)-214(0) is configured to select either the portion of the resultant output vector data sample 194(0) on its assigned execution unit output 96(0)(3)-96(0)(0), or the portion of the resultant output vector data sample 194(0) on an execution unit output 96 adjacent to its assigned execution unit output 96(0)(3)-96(0)(0).

[00190] For example, suppose the execution unit outputs 96(0)(3)-96(0)(0) provide the resultant output vector data sample 194(0) in the 16-bit signed complex format real[31:24], real[23:16], imaginary[15:8], imaginary[7:0], and the desired reordered (e.g., interleaved) format according to the vector instruction to be executed is real[31:24], imaginary[23:16], real[15:8], imaginary[7:0]. The selections of the output vector data sample selectors 214(3)-214(0) would then be as follows. The output vector data sample selector 214(3) would select the resultant output vector data sample 194(0)(3) from the execution unit output 96(0)(3) to provide on the output data flow path 98(0)(3). However, the output vector data sample selector 214(2) would select the portion of the resultant output vector data sample 194(0)(1) on the execution unit output 96(0)(1) to provide on the output data flow path 98(0)(2). This would place interleaved portions of the resultant output vector data sample 194(0) (i.e., real[31:24], imaginary[23:16]) in the adjacent output data flow paths 98(0)(3), 98(0)(2) as the reordered resultant output vector data samples 194R(0)(3), 194R(0)(2) of the reordered resultant output vector data sample 194R(0). Likewise, the output vector data sample selector 214(0) would select the resultant output vector data sample 194(0)(0) from the execution unit output 96(0)(0) to provide in the output data flow path 98(0)(0). However, the output vector data sample selector 214(1) would select the resultant output vector data sample 194(0)(2) on the execution unit output 96(0)(2) to provide on the output data flow path 98(0)(1). This would place the reordered, interleaved resultant output vector data samples 194(0)(2), 194(0)(0) (i.e., real[15:8], imaginary[7:0]) in the adjacent output data flow paths 98(0)(1), 98(0)(0) as the reordered resultant output vector data samples 194R(0)(1), 194R(0)(0) of the reordered resultant output vector data sample 194R(0). Output vector data sample selectors 216(1), 216(0), also provided in the form of multiplexers, provide the ability to select between the resultant output vector data samples 194(0)(3)-194(0)(0) from non-assigned, non-adjacent execution unit outputs 96(0)(3)-96(0)(0), as shown in FIG. 25.
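The selector assignments described for FIG. 25 can be sketched in software as a byte-lane permutation. The following Python model is only an illustration under stated assumptions: the names `INTERLEAVE_SELECT`, `PASS_THROUGH_SELECT`, and `reorder_word` are hypothetical, and a 32-bit integer stands in for the four 8-bit execution unit outputs 96(0)(3)-96(0)(0).

```python
# Hypothetical software model of the selectors 214(3)-214(0) of FIG. 25.
# Each map entry reads "selector 214(dest) passes execution unit output
# 96(0)(src) to output data flow path 98(0)(dest)".
INTERLEAVE_SELECT = {3: 3, 2: 1, 1: 2, 0: 0}    # selections given in the text
PASS_THROUGH_SELECT = {3: 3, 2: 2, 1: 1, 0: 0}  # no reordering (assumed name)


def reorder_word(word, select):
    """Rearrange the four 8-bit portions of a 32-bit result word."""
    out = 0
    for dest, src in select.items():
        portion = (word >> (8 * src)) & 0xFF  # read byte lane `src`
        out |= portion << (8 * dest)          # drive byte lane `dest`
    return out


# Illustrative data: 0xA1, 0xA2 stand for the two real portions and
# 0xB1, 0xB2 for the two imaginary portions of one result sample.
sample = 0xA1A2B1B2
interleaved = reorder_word(sample, INTERLEAVE_SELECT)
unchanged = reorder_word(sample, PASS_THROUGH_SELECT)
print(hex(interleaved))  # 0xa1b1a2b2
```

With these selections the two middle portions swap lanes, so the real and imaginary portions alternate across the output data flow paths, while the outer portions stay on their assigned lanes.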

[00191] With continuing reference to FIGS. 23 and 25, the reordering circuits 196(0)-196(X) may be provided as programmable, so that they can be configured or reconfigured not to reorder the resultant output vector data sample sets 194(0)-194(X), according to the vector instruction to be executed. In this case, the reordering circuits 196(0)-196(X) may be programmed so that the output data flow paths 98(0)-98(X) flow straight through the reordering circuits 196(0)-196(X) without any reordering operation being performed. As discussed previously and illustrated in FIG. 22, the output data format field 190 (DEST_FMT) in chart 182 can be provided in bit [18] of the vector instruction, as a non-limiting example, to indicate whether the output source data (e.g., the resultant output vector data sample sets 194(0)-194(X) in the VPE 22(4) of FIG. 23) is to be stored in the SC16 format, or converted from the SC16 format to the SC8 format and reordered, when stored in the vector data files 82(0)-82(X).

[00192] In this regard, the programmable reordering data path configuration input 218(0) in FIG. 25 can be supplied to the reordering circuit 196(0) to program the reordering circuit 196(0) to reorder, or not reorder, the resultant output vector data samples 194(0)(3)-194(0)(0) in the output data flow path 98(0). Programmable reordering data path configuration inputs 218(1)-218(X) (not shown) can likewise be supplied to the reordering circuits 196(1)-196(X) to program them to reorder, or not reorder, the resultant output vector data sample sets 194(1)-194(X) in the output data flow paths 98(1)-98(X), respectively. In this manner, the reordering circuits 196(0)-196(X) can be programmed not to reorder the resultant output vector data sample sets 194(0)-194(X) if the vector instruction to be executed does not call for such processing. The programmable reordering data path configuration inputs 218(0)-218(X) can be configured and reconfigured for each vector instruction to provide flexibility in vector processing by the VPE 22(4). For example, the programmable reordering data path configuration inputs 218(0)-218(X) can be configured and reconfigured on each clock cycle of a vector instruction, clock cycle by clock cycle if desired, to provide reordering as needed while fully utilizing the execution units 84(0)-84(X).
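The per-instruction choice between reordering and passing data straight through can be modeled as a flag decoded from the instruction word. The sketch below is a hypothetical illustration: the text places DEST_FMT in bit [18] as a non-limiting example but does not state which bit value requests reordering, so the encoding used here (1 = reorder) is an assumption, as are the names `reorder_enabled` and `output_path`.

```python
DEST_FMT_BIT = 18  # bit position named in the text as a non-limiting example


def reorder_enabled(instruction_word):
    """Assumed encoding: DEST_FMT == 1 requests reordering on the way out."""
    return (instruction_word >> DEST_FMT_BIT) & 1 == 1


def output_path(portions, instruction_word):
    """Model one lane's output path for one instruction.

    `portions` lists the four 8-bit portions from lane (3) down to lane (0).
    """
    if reorder_enabled(instruction_word):
        p3, p2, p1, p0 = portions
        return [p3, p1, p2, p0]  # swap the middle lanes to interleave
    return list(portions)        # flow straight through, no reordering


result = ["R_hi", "R_lo", "I_hi", "I_lo"]    # illustrative labels
with_reorder = output_path(result, 1 << DEST_FMT_BIT)
without_reorder = output_path(result, 0)
```

Because the flag is evaluated per call, successive instructions can alternate between the two behaviors, which mirrors the reconfiguration per vector instruction described above.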

[00193] Other vector processing operations can also be provided that involve in-flight processing of the resultant output vector data sample sets from the execution units 84(0)-84(X), without requiring additional post-processing steps that may delay subsequent vector processing operations to be performed in the execution units 84(0)-84(X). For example, CDMA wireless baseband operations that require despreading of chip sequences according to spread signal data sequences of varying length may benefit from in-flight vector processing.

[00194] For example, a data signal 220 that can be modulated using CDMA is shown in FIG. 26A. The data signal 220 has a period of 2T. As shown in FIG. 26A, the data signal 220 represents the data sequence 1010 in this example, where a high signal level represents a logical "1" and a low signal level represents a logical "0". In CDMA modulation, the data signal 220 is spread by a chip sequence, such as the chip sequence 222 in FIG. 26B, which may be a pseudorandom code. In this example, the chip sequence 222 has a period one-tenth that of the data signal 220, which gives a spreading rate, or spreading factor, of ten chips per sample of the data signal 220. To spread the data signal 220 in this example, the data signal 220 is exclusive-ORed (i.e., XORed) with the chip sequence 222 to provide the spread transmitted data signal 224, as shown in FIG. 26C. Other data signals for other users, transmitted in the same bandwidth as the spread transmitted data signal 224, are spread with other chip sequences that are orthogonal to each other and to the chip sequence 222. When the original data signal 220 is to be recovered, the spread transmitted data signal 224 is correlated with sequence numbers, as described above with respect to FIGS. 11-16. If there is a high correlation between a sequence number and the spread transmitted data signal 224, as is the case for the chip sequence 222, the original data signal 220 can be recovered using the chip sequence associated with the high-correlation sequence number. The spread transmitted data signal 224 is despread with the high-correlation chip sequence, which is the chip sequence 222 in this example, to recover the original data signal 220 as the recovered data signal 226 in FIG. 26D.
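The XOR spreading and despreading described above can be sketched as follows. This is a minimal Python illustration, not the circuit of the disclosure: the ten-chip sequence is invented for the example (FIG. 26B is not reproduced here), the function names are hypothetical, and the majority vote in `despread_bits` is an assumption added so that the sketch tolerates a few chip errors.

```python
def spread_bits(data_bits, chips):
    """XOR every data bit with each chip, giving len(chips) chips per bit."""
    return [bit ^ chip for bit in data_bits for chip in chips]


def despread_bits(spread, chips):
    """XOR each block with the same chips and take a majority vote."""
    factor = len(chips)
    recovered = []
    for start in range(0, len(spread), factor):
        block = spread[start:start + factor]
        votes = [s ^ c for s, c in zip(block, chips)]  # all equal the data bit
        recovered.append(1 if 2 * sum(votes) > factor else 0)
    return recovered


data = [1, 0, 1, 0]                      # data sequence 1010 from the text
chips = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # illustrative 10-chip sequence
transmitted = spread_bits(data, chips)   # 4 bits x 10 chips = 40 chips
recovered = despread_bits(transmitted, chips)
```

XORing twice with the same chip cancels the chip, which is why applying the same sequence at the receiver returns the original data bit in every chip position.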

[00195] Despreading of the spread transmitted data signal 224 in FIG. 26C can be performed in a despreading vector processing operation as an inner product between the spread transmitted data signal 224 and potential chip sequences, similar to the correlation vector processing operation described above with respect to the VPE 22(2) of FIG. 11, to determine the high-correlation chip sequence. The spread transmitted data signal 224 can then be despread with the chip sequence 222 that is determined to have been used to CDMA modulate the original data signal 220, to provide the recovered data signal 226 in FIG. 26D.
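The inner-product view of despreading can be illustrated with a small Python sketch. It makes several assumptions not found in the text: bits are mapped to +1/-1 values so that XOR becomes multiplication, the three candidate codes are length-4 Walsh codes chosen because they are mutually orthogonal, and the names `to_bipolar`, `correlate`, and `pick_code` are hypothetical.

```python
def to_bipolar(bits):
    """Map bit 0 to +1 and bit 1 to -1 so XOR corresponds to a product."""
    return [1 - 2 * b for b in bits]


def correlate(received_bits, code_bits):
    """Inner product of each received block with the candidate code."""
    rx, code = to_bipolar(received_bits), to_bipolar(code_bits)
    n = len(code)
    return [sum(r * c for r, c in zip(rx[i:i + n], code))
            for i in range(0, len(rx), n)]


def pick_code(received_bits, candidates):
    """Score candidates by summed correlation magnitude; return the best."""
    scores = {name: sum(abs(v) for v in correlate(received_bits, code))
              for name, code in candidates.items()}
    return max(scores, key=scores.get), scores


candidates = {"A": [0, 0, 1, 1], "B": [0, 1, 0, 1], "C": [0, 1, 1, 0]}
data = [1, 0, 1, 0]
received = [bit ^ chip for bit in data for chip in candidates["A"]]

best, scores = pick_code(received, candidates)
# A negative inner product means the block is the inverted code, i.e. bit 1.
recovered = [1 if v < 0 else 0 for v in correlate(received, candidates[best])]
```

In this noiseless example only the code that was actually used produces a nonzero correlation, so the selection step and the recovery step reduce to the same inner products.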

[00196] A vector processor that includes CDMA processing operations may include circuitry that despreads spread signal vector data sequences after they have been output from the execution units and stored in vector data memory. In this regard, the spread signal vector data sequences stored in the vector data memory are fetched from the vector data memory in a post-processing operation and despread with the correlated spreading code sequence, or chip sequence, to recover the original data signal. The despread vector data sequences, which are the original data samples before spreading, are then stored back in the vector data memory. This post-processing operation can delay subsequent vector operation processing by the execution units and can cause the computational components in the execution units to be underutilized. Further, despreading spread signal vector sequences with a spreading code sequence is difficult to parallelize, because the spread signal vector data sequences to be despread cross different data flow paths from the execution units.

[00197] To address this problem, embodiments disclosed below provide a VPE that includes a despreading circuit in the data flow path between the execution units in the VPE and the vector data memory. The despreading circuit is configured to despread spread-spectrum sequences using the output vector data samples from the execution units in-flight, while the output vector data sample sets are being provided from the execution units to the vector data memory over the output data flow paths. In-flight despreading of the output vector data sample sets means that the output vector data sample sets provided by the execution units are despread before being stored in the vector data memory, so that the output vector data sample sets are stored in the vector data memory in a despread format. The despread spread-spectrum sequences (DSSS) can be stored in despread form in the vector data memory without requiring additional post-processing steps that may delay subsequent vector processing operations to be performed in the execution units. Thus, the efficiency of the data flow paths in the VPE may not be limited by the despreading of spread-spectrum sequences. When the despread spread-spectrum sequences are stored in the vector data memory, subsequent vector processing in the execution units is limited only by computational resources rather than by data flow limitations.

[00198] In this regard, FIG. 27 is a schematic diagram of another exemplary VPE 22(5) that can be provided as the VPE 22 of FIG. 2. As described in more detail below, the VPE 22(5) of FIG. 27 is configured to provide in-flight despreading, with a code sequence for a vector processing operation, of the resultant output vector data sample sets 228(0)-228(X) that are provided by the execution units 84(0)-84(X) and are to be stored in the vector data files 82(0)-82(X) in the VPE 22(5), with re-fetching of vector data samples eliminated or reduced and power consumption reduced. The resultant output vector data sample sets 228(0)-228(X) consist of "X+1" resultant output vector data samples 228, which in this example are 228(0), 228(1), ..., and 228(X). The code sequence may be a spread-spectrum CDMA chip sequence for a CDMA despreading vector processing operation, as a non-limiting example. In the VPE 22(5) of FIG. 27, the resultant output vector data sample sets 228(0)-228(X) can be despread with the code sequence before being stored in the vector data files 82(0)-82(X).

[00199] As illustrated in FIG. 27 and described in more detail below, a despreading circuit 230 is provided in the output data flow paths 98(0)-98(X) between the execution units 84(0)-84(X) and the vector data files 82(0)-82(X) in each of the vector data lanes 100(0)-100(X). The despreading circuit 230 is configured, based on programming according to the vector instruction to be executed, to provide in-flight despreading of the resultant output vector data sample sets 228(0)-228(X) with a code sequence supplied as the reference vector data sample sets 130(0)-130(X) generated by the sequence number generator 134, as described above with respect to FIGS. 11-16 for correlation vector processing operations. The despread resultant output vector data sample sets 229(0)-229(Z) are provided by the despreading circuit 230 in the output data flow paths 98(0)-98(X). The despread resultant output vector data sample sets 229(0)-229(Z) consist of "Z+1" despread resultant output vector data samples 229, which in this example are 229(0), 229(1), ..., and 229(Z). In-flight despreading of the resultant output vector data sample sets 228(0)-228(X) in the VPE 22(5) of FIG. 27 means that the resultant output vector data sample sets 228(0)-228(X) provided by the execution units 84(0)-84(X) are despread with the code sequence before being stored in the vector data files 82(0)-82(X). In this manner, the resultant output vector data sample sets 228(0)-228(X) are stored in the vector data files 82(0)-82(X) in despread form as the despread resultant output vector data sample sets 229(0)-229(Z).

[00200] Thus, with the despreading circuit 230 provided in the output data flow paths 98(0)-98(X), the resultant output vector data sample sets 228(0)-228(X) do not have to be first stored in the vector data files 82(0)-82(X), then fetched from the vector data files 82(0)-82(X), despread, and re-stored in despread form in the vector data files 82(0)-82(X). The resultant output vector data sample sets 228(0)-228(X) are despread before being stored in the vector data files 82(0)-82(X). In this manner, the despread resultant output vector data sample sets 229(0)-229(Z) are stored in the vector data files 82(0)-82(X) without requiring additional post-processing steps that may delay subsequent vector processing operations to be performed in the execution units 84(0)-84(X). Thus, the efficiency of the data flow paths in the VPE 22(5) is not limited by the despreading of the resultant output vector data sample sets 228(0)-228(X). When the resultant output vector data sample sets 228(0)-228(X) are stored in the vector data files 82(0)-82(X) in despread form as the despread resultant output vector data sample sets 229(0)-229(Z), subsequent vector processing in the execution units 84(0)-84(X) is limited only by computational resources rather than by data flow limitations.
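The saving in memory traffic can be made concrete with a toy model that counts accesses to a vector data file. Everything in this Python sketch is an illustrative assumption rather than the disclosed hardware: the class `VectorDataFile`, the two path functions, the +1/-1 sample values, and the relation between input and output counts (eight result samples and a 4-chip code give two despread samples).

```python
class VectorDataFile:
    """Toy storage model that counts stores and fetches."""

    def __init__(self):
        self.rows, self.reads, self.writes = {}, 0, 0

    def store(self, addr, samples):
        self.writes += 1
        self.rows[addr] = list(samples)

    def fetch(self, addr):
        self.reads += 1
        return list(self.rows[addr])


def despread(samples, code):
    """Inner product of each code-length block of samples with the code."""
    n = len(code)
    return [sum(s * c for s, c in zip(samples[i:i + n], code))
            for i in range(0, len(samples), n)]


def post_processing_path(vdf, results, code):
    vdf.store(0, results)                  # store the raw results first
    fetched = vdf.fetch(0)                 # re-fetch them later
    vdf.store(0, despread(fetched, code))  # store again in despread form


def in_flight_path(vdf, results, code):
    vdf.store(0, despread(results, code))  # despread on the way to storage


code = [1, -1, 1, -1]
results = [1, -1, 1, -1, -1, 1, -1, 1]     # stand-in execution unit outputs
slow, fast = VectorDataFile(), VectorDataFile()
post_processing_path(slow, results, code)
in_flight_path(fast, results, code)
```

Both paths leave the same despread samples in storage, but the in-flight path reaches that state with a single store and no fetch, whereas the post-processing path needs two stores and one fetch.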

[00201] Further, by providing the despreading circuit 230 in the output data flow paths 98(0)-98(X) between the execution units 84(0)-84(X) and the vector data files 82(0)-82(X), the resultant output vector data sample sets 228(0)-228(X) do not have to cross vector data lanes 100 in the input data flow paths 80(0)-80(X) between the vector data files 82(0)-82(X) and the execution units 84(0)-84(X). Providing data flow paths for despreading of input vector data samples 86 in an input vector data sample set 86(0)-86(X) between different vector data lanes 100 would increase routing complexity. As a result, the execution units 84(0)-84(X) could be underutilized while despreading operations are being performed in the input data flow paths 80(0)-80(X). Also, as discussed above, despreading of the resultant output vector data sample sets 228(0)-228(X) in the input data flow paths 80(0)-80(X) would require the resultant output vector data sample sets 228(0)-228(X) to first be stored in the vector data files 82(0)-82(X) in the VPE 22(5) in FIG. 27. This would increase power consumption when the sample sets are re-fetched and despread, and/or risk underutilization of the execution units 84(0)-84(X), which could be delayed while the despreading operations are being performed.

[00202] Note that common components provided in the VPEs 22(1)-22(4) in FIGS. 4, 11, 19, and 23 are also provided in the VPE 22(5) in FIG. 27. Common components are shown in the VPE 22(5) in FIG. 27 with common element numbers. The previous description and discussion of these common components in the VPEs 22(1)-22(4) also applies to the VPE 22(5) in FIG. 27 and thus is not repeated here.

[00203] With continuing reference to FIG. 27, more specifically, the despreading circuit 230 is configured to receive the resultant output vector data sample sets 228(0)-228(X) on despreading circuit inputs 232(0)-232(X) on the output data flow paths 98(0)-98(X). The despreading circuit 230 is configured to despread the resultant output vector data sample sets 228(0)-228(X) to provide the despread resultant output vector data sample sets 229(0)-229(Z). As discussed in more detail below, the number of despread resultant output vector data samples 229 in the despread resultant output vector data sample sets 229(0)-229(Z) is "Z+1". This number depends on the spreading factor used to despread the resultant output vector data sample sets 228(0)-228(X). The despreading circuit 230 is configured to provide the despread resultant output vector data sample sets 229(0)-229(Z) on despreading circuit outputs 234(0)-234(X) in the output data flow paths 98(0)-98(X) to be provided to the vector data files 82(0)-82(X) for storage.

[00204] Further details and features of the VPE 22(5) in FIG. 27 for providing the despread resultant output vector data sample sets 229(0)-229(Z) to the vector data files 82(0)-82(X) in the output data flow paths 98(0)-98(X) in this embodiment are described next. In this regard, FIG. 28 is a flowchart illustrating exemplary despreading of the resultant output vector data sample sets 228(0)-228(X) resulting from a despreading vector processing operation 236 that can be performed in the VPE 22(5) in FIG. 27 employing the despreading circuit 230, according to an exemplary vector instruction requiring despreading of the resultant output vector data sample sets 228(0)-228(X).

[00205] With reference to FIGS. 27 and 28, the input vector data sample sets 86(0)-86(X) to be processed by a despreading vector processing operation 236 according to a vector instruction are fetched from the vector data files 82(0)-82(X) and provided in the input data flow paths 80(0)-80(X) (block 238 in FIG. 28). Depending on the width of the resultant output vector data sample sets 228(0)-228(X) for the despreading vector processing operation 236, one, some, or all of the vector data lanes 100(0)-100(X) in the VPE 22(5) in FIG. 27 can be employed to provide the despreading vector processing operation 236 according to the programming of the vector instruction. If the despreading vector processing operation 236 requires despreading all of the resultant output vector data samples 228 in the resultant output vector data sample sets 228(0)-228(X), all vector data lanes 100(0)-100(X) in the output data flow paths 98(0)-98(X) from the execution units 84(0)-84(X) can be employed for the despreading vector processing operation 236. Alternatively, the despreading vector processing operation 236 may only require despreading a subset of the resultant output vector data samples 228 in the resultant output vector data sample sets 228(0)-228(X), and thus only requires the vector data lanes 100 in the output data flow paths 98 that correspond to that subset.

[00206] With continuing reference to FIGS. 27 and 28, before the despreading vector processing operation is performed by the despreading circuit 230 in the VPE 22(5) in FIG. 27, the fetched input vector data sample sets 86(0)-86(X) are received from the input data flow paths 80(0)-80(X) at the execution units 84(0)-84(X) (block 240 in FIG. 28). The execution units 84(0)-84(X) perform one or more vector processing operations on the received input vector data sample sets 86(0)-86(X) according to the vector processing operation provided by the vector instruction (block 242 in FIG. 28). For example, the execution units 84(0)-84(X) provide multiplication and/or accumulation using the input vector data sample sets 86(0)-86(X) and a code sequence in reference vector data sample sets 130(0)-130(X) to perform the vector processing operation and provide the resultant output vector data sample sets 228(0)-228(X). For example, the resultant output vector data sample sets 228(0)-228(X), which may be based on vector processing of the input vector data sample sets 86(0)-86(X) with the reference vector data sample sets 130(0)-130(X), are provided in the output data flow paths 98(0)-98(X) of the VPE 22(5) in FIG. 27.
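The multiplication and accumulation step described above can be illustrated with a minimal software sketch. It is not the hardware of the execution units 84(0)-84(X); the function names and the assumption of real-valued code chips of +1 or -1 are hypothetical and chosen only for illustration.

```python
def multiply_by_code(samples, code_chips):
    """Multiply each input sample by the matching code chip.

    Assumption for this sketch: chips are the real values +1 or -1.
    """
    if len(samples) != len(code_chips):
        raise ValueError("samples and code_chips must have the same length")
    if any(chip not in (1, -1) for chip in code_chips):
        raise ValueError("this sketch assumes chips of +1 or -1")
    return [sample * chip for sample, chip in zip(samples, code_chips)]


def accumulate(products):
    """Sum the chip-wise products into a single accumulated value."""
    total = 0
    for product in products:
        total += product
    return total
```

In this sketch the products play the role of the resultant output vector data samples, and the accumulation corresponds to the summing that despreading applies over one spreading-factor-sized group of samples.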

[00207] With continuing reference to FIGS. 27 and 28, if it is desired to despread the resultant output vector data sample sets 228(0)-228(X), the despreading vector processing operation 236 can be performed before the resultant output vector data sample sets 228(0)-228(X) are stored in the vector data files 82(0)-82(X). In this example, the resultant output vector data sample sets 228(0)-228(X) are provided to the despreading circuit 230 provided in the output data flow paths 98(0)-98(X) between the execution units 84(0)-84(X) and the vector data files 82(0)-82(X) in the VPE 22(5) in FIG. 27. The despreading circuit 230 is programmable to selectively despread the resultant output vector data sample sets 228(0)-228(X) in the output data flow paths 98(0)-98(X) according to the vector instruction being executed, if that vector instruction calls for despreading of the resultant output vector data sample sets 228(0)-228(X) to be stored in the vector data files 82(0)-82(X). The despreading circuit 230 despreads the resultant output vector data sample sets 228(0)-228(X) according to the despreading programming of the vector instruction being executed, without the resultant output vector data sample sets 228(0)-228(X) being stored in the vector data files 82(0)-82(X) (block 244 in FIG. 28).

[00208] In this manner, the resultant output vector data sample sets 228(0)-228(X) do not have to first be stored in the vector data files 82(0)-82(X), re-fetched, despread in a post-processing operation, and stored in despread format in the vector data files 82(0)-82(X), which would cause delay in the execution units 84(0)-84(X). The resultant output vector data sample sets 228(0)-228(X) are stored as the despread resultant output vector data sample sets 229(0)-229(Z) in the vector data files 82(0)-82(X) without requiring despread post-processing (block 246 in FIG. 28).

[00209] FIG. 29 is a schematic diagram of an exemplary despreading circuit 230 that can be provided in the output data flow paths 98(0)-98(X) between the execution units 84(0)-84(X) and the vector data files 82(0)-82(X) in the VPE 22(5) in FIG. 27. The despreading circuit 230 is configured to despread the resultant output vector data sample sets 228(0)-228(X) to provide the despread resultant output vector data sample sets 229(0)-229(Z) for different spreading factors of a repeated code sequence in the reference vector data sample sets 130(0)-130(X). The resultant output vector data sample sets 228(0)-228(X) are provided from the execution unit outputs 96(0)-96(X) to the despreading circuit 230, as shown in FIG. 27. Because the spreading factor of the resultant output vector data sample sets 228(0)-228(X) may not be known, it may be desirable to despread the resultant output vector data sample sets 228(0)-228(X) using different spreading factors of the repeated sequence numbers in the reference vector data sample sets 130(0)-130(X) generated by the sequence number generator 134 in FIG. 27.

[00210] For example, if the resultant output vector data sample sets 228(0)-228(X) contain thirty-two (32) samples, and the entire resultant output vector data sample sets 228(0)-228(X) are despread assuming a spreading factor of four (4), the despread resultant output vector data sample sets 229(0)-229(Z) would contain eight (8) despread samples (i.e., 32 samples / spreading factor of 4) after the despreading is performed. However, in this same example, if the entire resultant output vector data sample sets 228(0)-228(X) are despread assuming a spreading factor of eight (8), the despread resultant output vector data sample sets 229(0)-229(Z) would contain four (4) despread samples (i.e., 32 samples / spreading factor of 8) after the despreading is performed.
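The arithmetic in this example can be written as a short sketch. The function name is hypothetical; the only relationship taken from the text is that the number of despread samples equals the number of input samples divided by the spreading factor.

```python
def despread_sample_count(num_samples: int, spreading_factor: int) -> int:
    """Return how many despread samples result from despreading num_samples."""
    if spreading_factor <= 0:
        raise ValueError("spreading_factor must be positive")
    if num_samples % spreading_factor != 0:
        raise ValueError("num_samples must be a multiple of spreading_factor")
    return num_samples // spreading_factor
```

With 32 samples, a spreading factor of 4 yields 8 despread samples and a spreading factor of 8 yields 4, matching the two cases described above.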

[00211] Thus, with continuing reference to FIG. 29, the despreading circuit 230 is configured to despread the resultant output vector data sample sets 228(0)-228(X) for different spreading factors. The despreading circuit 230 in this embodiment is configured to provide the despread resultant output vector data sample sets 229(0)-229(Z) for different spreading factors in one vector processing operation/one vector instruction. In this regard, the despreading circuit 230 contains an adder tree 248 coupled to the execution unit outputs 96(0)-96(X) to receive the resultant output vector data sample sets 228(0)-228(X). The adder tree 248 of the despreading circuit 230 is configured to receive each sample 228 of the resultant output vector data sample sets 228(0)-228(X) in its respective vector data lane 100(0)-100(X). A first adder tree level 248(1) is provided in the adder tree 248. The first adder tree level 248(1) is comprised of adders 250(0)-250(((X+1)/2)-1), 250(7) so that the samples 228 in the resultant output vector data sample sets 228(0)-228(X) can be despread by a spreading factor of 4. Latches 251(0)-251(X) are provided in the despreading circuit 230 to latch the resultant output vector data sample sets 228(0)-228(X) from the output data flow paths 98(0)-98(X).

[00212] For example, if each sample 228 in the resultant output vector data sample sets 228(0)-228(X) is 32 bits wide and is comprised of two 16-bit complex vector data (i.e., first vector data according to format I8Q8 and second vector data according to format I8Q8), a spreading factor of 4 can be applied to despread the four vector data samples in two resultant output vector data samples 228 in the resultant output vector data sample sets 228(0)-228(X) into one despread resultant output vector data sample. For example, as shown in FIG. 29, adder 250(0) is configured to despread resultant output vector data samples 228(0) and 228(1) by a spreading factor of 4 for those samples. Likewise, adder 250(1) is configured to despread resultant output vector data samples 228(2) and 228(3) by a spreading factor of 4 for those samples. Adder 250(((X+1)/2)-1), 250(7) is configured to despread resultant output vector data samples 228(X-1) and 228(X) to provide the despread vector data sample sets 252(0)-252(((X+1)/2)-1), 252(7) with a spreading factor of 4. The despread vector data sample sets 252(0)-252(((X+1)/2)-1), 252(7) from the despreading performed by adders 250(0)-250(((X+1)/2)-1), 250(7) are latched into latches 255(0)-255(((X+1)/2)-1), 255(7).

[00213] If the despreading vector processing operation 236 requires despreading of the resultant output vector data sample sets 228(0)-228(X) by a spreading factor of 4, the despread vector data sample sets 252(0)-252(((X+1)/2)-1), 252(7) can be provided as the despread resultant output vector data sample sets 229(0)-229(Z), where "Z" is 7, as described in more detail below. However, if the despreading vector processing operation 236 calls for a higher spreading factor (e.g., 8, 16, 32, 64, 128, 256), the despread vector data sample sets 252(0)-252(((X+1)/2)-1), 252(7) are not provided as the despread resultant output vector data sample sets 229(0)-229(Z). Instead, the despread vector data sample sets 252(0)-252(((X+1)/2)-1), 252(7) are provided to a second adder tree level 248(2), to adders 254(0)-254(((X+1)/4)-1), 254(3). In this regard, adder 254(0) is configured to perform despreading on despread vector data samples 252(0) and 252(1) to provide a resultant despread vector data sample 256(0) having a spreading factor of 8 for those samples. Likewise, adder 254(1) is configured to perform despreading on despread vector data samples 252(2) and 252(3) to provide a resultant despread vector data sample 256(1) having a spreading factor of 8 for those samples. Adder 254(((X+1)/4)-1), 254(3) is configured to perform despreading on despread vector data samples 252(((X+1)/4)-2), 252(((X+1)/4)-1), 252(3) to provide a resultant despread vector data sample 256(((X+1)/4)-1), 256(3) having a spreading factor of 8. The resultant despread vector data sample sets 256(0)-256(((X+1)/4)-1), 256(3) from the despreading performed by adders 254(0)-254(((X+1)/4)-1), 254(3) are latched into latches 257(0)-257(((X+1)/4)-1), 257(3).

[00214] With continuing reference to FIG. 29, if the despreading vector processing operation 236 requires despreading of the resultant output vector data sample sets 228(0)-228(X) by a spreading factor of 8, the despread vector data sample sets 256(0)-256(((X+1)/4)-1), 256(3) can be provided as the despread resultant output vector data sample sets 229(0)-229(Z), where "Z" is 3, as described in more detail below. However, if the despreading vector processing operation 236 calls for a spreading factor higher than 8 (e.g., 16, 32, 64, 128, 256), the despread vector data sample sets 256(0)-256(((X+1)/4)-1), 256(3) are not provided as the despread resultant output vector data sample sets 229(0)-229(Z). Instead, the despread vector data sample sets 256(0)-256(((X+1)/4)-1), 256(3) are provided to a third adder tree level 248(3), to adders 258(0)-258(((X+1)/8)-1), 258(1). In this regard, adder 258(0) is configured to perform despreading on despread vector data samples 256(0) and 256(1) to provide a spreading factor of 16 for those samples. Likewise, adder 258(1) is configured to perform despreading on despread vector data samples 256(2) and 256(3) to provide the despread vector data sample sets 260(0)-260(((X+1)/8)-1), 260(1) having a spreading factor of 16. The despread vector data sample sets 260(0)-260(((X+1)/8)-1), 260(1) from the despreading performed by adders 258(0)-258(((X+1)/8)-1), 258(1) are latched into latches 259(0)-259(((X+1)/8)-1), 259(2).

[00215] With continuing reference to FIG. 29, if the despreading vector processing operation 236 requires despreading of the resultant output vector data sample sets 228(0)-228(X) by a spreading factor of 16, the despread vector data sample sets 260(0)-260(((X+1)/8)-1), 260(1) can be provided as the despread resultant output vector data sample sets 229(0)-229(Z), where "Z" is 1, as described in more detail below. However, if the despreading vector processing operation 236 calls for a spreading factor higher than 16 (e.g., 32, 64, 128, 256), the despread vector data sample sets 260(0)-260(((X+1)/8)-1), 260(1) are not provided as the despread resultant output vector data sample sets 229(0)-229(Z). Instead, the despread vector data sample sets 260(0)-260(((X+1)/8)-1), 260(1) are provided to a fourth adder tree level 248(4), to adder 262. In this regard, adder 262 is configured to perform despreading on despread vector data samples 260(0) and 260(1) to provide a despread vector data sample 264 having a spreading factor of 32. The despread vector data sample 264 from the despreading performed by adder 262 is latched into latches 266 and 268.
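The first four adder tree levels described above can be summarized in a minimal software sketch. It assumes, for illustration only, that X is 15 and that each 32-bit lane sample has been flattened into its two complex values, giving 32 values in total; the function name and the dictionary layout are hypothetical and are not part of the described circuit.

```python
def adder_tree_levels(values):
    """Return partial sums per spreading factor for one set of 32 values.

    Level 1 sums groups of four values (spreading factor 4, 8 outputs).
    Each later level sums adjacent pairs of the previous level, doubling
    the spreading factor: 8 (4 outputs), 16 (2 outputs), 32 (1 output).
    """
    if len(values) != 32:
        raise ValueError("this sketch assumes exactly 32 input values")
    current = [sum(values[i:i + 4]) for i in range(0, 32, 4)]
    levels = {4: current}
    spreading_factor = 4
    while len(current) > 1:
        current = [current[i] + current[i + 1]
                   for i in range(0, len(current), 2)]
        spreading_factor *= 2
        levels[spreading_factor] = current
    return levels
```

The number of entries at each level (8, 4, 2, 1) corresponds to "Z+1" for spreading factors 4, 8, 16, and 32, so every supported spreading factor is available from a single pass through the tree.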

[00216] With continuing reference to FIG. 29, if the despreading vector processing operation 236 requires despreading of the resultant output vector data sample sets 228(0)-228(X) by a spreading factor of 32, the despread vector data sample 264 can be provided as the despread resultant output vector data sample 229, as described in more detail below. However, if the despreading vector processing operation 236 calls for a spreading factor higher than 32 (e.g., 64, 128, 256), the despread vector data sample 264 is not provided as the despread resultant output vector data sample 229. The despread vector data sample 264 remains latched in latch 268 without having to be stored in the vector data files 82. Another resultant output vector data sample set 228(0)-228(X) is loaded into latches 251(0)-251(X) over an additional processing cycle to be despread using a spreading factor of 32, as described above. The resultant despread vector data sample 264' is added to the previous despread vector data sample 264 by adder 270 in a fifth adder tree level 248(5) to provide a despread vector data sample 272 having a spreading factor of 64. Selector 273 controls whether the despread vector data sample 264 having a spreading factor of 32 or the despread vector data sample 264' having a spreading factor of 64 is latched as the despread vector data sample 272 in latch 274. This same process of latching additional resultant output vector data sample sets 228(0)-228(X) and despreading them can be repeated, if needed, to achieve spreading factors greater than 64. The despread vector data sample 272 is eventually latched into latch 274 as the desired despread resultant output vector data sample 229 according to the desired spreading factor for the despreading vector processing operation 236.
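The multi-cycle accumulation for spreading factors above 32 can be sketched as follows. This is an illustrative model only: it assumes 32 values per processing cycle, the function name is hypothetical, and a running sum stands in for the latched sample 264 and the accumulating adder 270.

```python
def despread_over_cycles(blocks, spreading_factor):
    """Accumulate spreading-factor-32 results over several cycles.

    blocks: one list of 32 values per processing cycle.
    spreading_factor: a multiple of 32 (for example 32, 64, 128, 256).
    """
    if spreading_factor < 32 or spreading_factor % 32 != 0:
        raise ValueError("spreading_factor must be a multiple of 32")
    cycles = spreading_factor // 32
    if len(blocks) != cycles:
        raise ValueError("expected one block of samples per cycle")
    accumulated = 0
    for block in blocks:
        if len(block) != 32:
            raise ValueError("each block must hold 32 values")
        # Full adder-tree result for this cycle (spreading factor 32),
        # added to what earlier cycles left in the accumulator.
        accumulated += sum(block)
    return accumulated
```

Because the intermediate sum stays in the accumulator between cycles, nothing is written to the vector data files until the final value for the requested spreading factor is ready.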

[00217] With continuing reference to FIG. 29, regardless of which spreading factor is called for in the despreading vector processing operation 236, the despread resultant output vector data sample sets 229(0)-229(Z) need to be stored in the vector data files 82(0)-82(X) in FIG. 27. As will now be discussed, the despreading circuit 230 in FIG. 29 is also configured to load the despread resultant output vector data samples 229, which result from performing the despreading vector processing operation 236 on the resultant output vector data sample sets 228(0)-228(X), into latches 276(0)-276(X) to form the despread resultant output vector data sample sets 229(0)-229(Z). The despread resultant output vector data sample sets 229(0)-229(Z) can then be provided to the vector data files 82(0)-82(X) to be stored. In this manner, only one write to the vector data files 82(0)-82(X) is required to store the despread resultant output vector data sample sets 229(0)-229(Z) created by the despreading circuit 230. The adder tree levels 248(1)-248(5) in the despreading circuit 230 in FIG. 29 can generate despread resultant output vector data samples 229 for all of spreading factors 4, 8, 16, and 32, regardless of which spreading factor is called for in the despreading vector processing operation 236. Alternatively, adders in the adder tree that are not needed to perform the despreading vector processing operation 236 according to the desired spreading factor can be disabled or configured to add 0. However, to determine which of these despread resultant output vector data samples 229 are provided to latches 276(0)-276(X) to be stored, selectors 278(0)-278(((X+1)/4)-1), 278(3) are provided, as will now be discussed.

[00218] In this regard, with continued reference to FIG. 29, selector 278(0) can select, based on the despreading vector processing operation 236 being performed, the despread resultant output vector data samples 229 for spreading factors 4, 8, and 16 from adders 250(0), 254(0), and 258(0), respectively, or for any of spreading factors 32, 64, 128, and 256 from adders 262 and 270. Selector 278(1) can likewise select the despread resultant output vector data samples 229 for spreading factors 4, 8, and 16 from adders 250(1), 254(1), and 258(1), respectively. Selector 278(2) can select the despread resultant output vector data samples 229 for spreading factors 4 and 8 from adders 250(2) and 254(2), respectively. Selector 278(3) can select the despread resultant output vector data samples 229 for spreading factors 4 and 8 from adders 250(3) and 254(3), respectively. Selector 278(4) can select the despread resultant output vector data samples 229 for spreading factors 4 and 8 from adder trees 248(1) and 248(2), respectively. Because providing a spreading factor of 8 can be fully satisfied by selectors 278(0)-278(3), no selectors are provided to control the despread resultant output vector data samples 229 provided from adders 250(4)-250(7).

[00219] With continued reference to FIG. 29, a series of data slicers 280(0)-280(((X+1)/2)-1), 280(7) is provided to receive the despread resultant output vector data samples 229 selected by the selectors 278(0)-278(((X+1)/4)-1), 278(3) and provided by the adders 250(4)-250(((X+1)/2)-1), 250(7), respectively. The data slicers 280(0)-280(((X+1)/2)-1), 280(7) are configured to select whether each received despread resultant output vector data sample 229 is characterized as a logic high level (e.g., logic "1") or a logic low level (e.g., logic "0"). The despread resultant output vector data samples 229 are then routed through connections to a crossbar 282 to the desired latch 276 among latches 276(0)-276(X) to be stored. The crossbar 282 provides the flexibility to supply the despread resultant output vector data samples 229 to different latches 276(0)-276(X) according to the despreading vector processing operation 236. In this manner, the despread resultant output vector data samples 229, for example the despread resultant output vector data sample sets 229(0)-229(Z), can be stacked in the latches 276(0)-276(X) over different iterations of the despreading vector processing operation 236 before being stored in the vector data files 82(0)-82(X). Accesses to the vector data files 82(0)-82(X) to store the despread resultant output vector data sample sets 229(0)-229(Z) can thus be minimized for operating efficiency.
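As a non-limiting illustration, the slicing and stacking behavior described above can be sketched in Python. The decision threshold of zero, the eight-entry latch bank in the usage note, and the names `data_slicer` and `LatchBank` are assumptions made for this sketch; the text states only that each sample is characterized as logic "1" or logic "0" and that results are stacked in latches over several iterations before being stored.

```python
def data_slicer(sample, threshold=0):
    """Characterize a despread sample as logic 1 or logic 0.

    The threshold is an assumption; the disclosure does not state one.
    """
    return 1 if sample >= threshold else 0


class LatchBank:
    """Collects sliced bits over several iterations before one write-out."""

    def __init__(self, size=16):
        self.latches = [None] * size
        self.next_free = 0

    def stack(self, bits):
        """Place each bit in the next free latch."""
        for bit in bits:
            if self.next_free >= len(self.latches):
                raise OverflowError("latch bank is full")
            self.latches[self.next_free] = bit
            self.next_free += 1

    def is_full(self):
        return self.next_free == len(self.latches)

    def flush(self):
        """Return the stacked bits for a single write and reset the bank."""
        stacked = self.latches
        self.latches = [None] * len(stacked)
        self.next_free = 0
        return stacked
```

In this model, two iterations that each produce four samples fill an eight-entry bank, so a single `flush()` stands in for the single write to the vector data files.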

[00220] For example, as shown in FIG. 29, selectors 284(0)-284(X) coupled to the crossbar 282 can be controlled to store the despread resultant output vector data samples 229 from data slicer 280(0) in any of latches 276(0)-276(X). Selectors 284(1), 284(3), 284(5), 284(7), 284(9), 284(11), 284(13), and 284(15) coupled to the crossbar 282 can be controlled to store the despread resultant output vector data samples 229 from data slicer 280(1) in latches 276(1), 276(3), 276(5), 276(7), 276(9), 276(11), 276(13), and 276(15). Selectors 284(2), 284(6), 284(10), and 284(14) coupled to the crossbar 282 can be controlled to store the despread resultant output vector data samples 229 from data slicer 280(2) in latches 276(2), 276(6), 276(10), and 276(14). Selectors 284(3), 284(7), 284(11), and 284(15) coupled to the crossbar 282 can be controlled to store the despread resultant output vector data samples 229 from data slicer 280(3) in latches 276(3), 276(7), 276(11), and 276(15). Selectors 284(4) and 284(12) coupled to the crossbar 282 can be controlled to store the despread resultant output vector data samples 229 from data slicer 280(4) in latches 276(4) and 276(12). Selectors 284(5) and 284(13) coupled to the crossbar 282 can be controlled to store the despread resultant output vector data samples 229 from data slicer 280(5) in latches 276(5) and 276(13). Selectors 284(6) and 284(14) coupled to the crossbar 282 can be controlled to store the despread resultant output vector data samples 229 from data slicer 280(6) in latch 276(6) or 276(14). Selectors 284(7) and 284(15) coupled to the crossbar 282 can be controlled to store the despread resultant output vector data samples 229 from data slicer 280(7) in latch 276(7) or 276(15).
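As a non-limiting illustration, the routing described above can be captured as a lookup table in Python. The table entries are transcribed from the paragraph for the case of sixteen latches; the names `REACHABLE_LATCHES` and `route` are hypothetical, and the function models only which slicer-to-latch transfers are permitted, not the selector hardware itself.

```python
NUM_LATCHES = 16

# Latch indices reachable from each data slicer, as listed in the text.
REACHABLE_LATCHES = {
    0: list(range(NUM_LATCHES)),         # slicer 0 can reach any latch
    1: [1, 3, 5, 7, 9, 11, 13, 15],
    2: [2, 6, 10, 14],
    3: [3, 7, 11, 15],
    4: [4, 12],
    5: [5, 13],
    6: [6, 14],
    7: [7, 15],
}


def route(latches, slicer, latch, sample):
    """Store a slicer output in a latch if the crossbar permits that path."""
    if latch not in REACHABLE_LATCHES[slicer]:
        raise ValueError(f"slicer {slicer} cannot reach latch {latch}")
    latches[latch] = sample
```

Read this way, the listed mapping follows a regular stride: slicer k reaches latches k, k + s, k + 2s, and so on, where s is the smallest power of two greater than k. That regularity is an observation about the listed indices rather than a statement made in the disclosure.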

[00221] With continued reference to FIG. 29, the despreading circuit 230 can be programmed to perform, or not to perform, a despreading operation on the resultant output vector data samples 228(0)-228(X) according to the vector instruction to be executed. In this regard, a despreading configuration input 286 in FIG. 29 can be provided to the despreading circuit 230 either to perform the despreading operation on the resultant output vector data sample sets 228(0)-228(X) or simply to pass the resultant output vector data sample sets 228(0)-228(X) to the latches 276(0)-276(X), in either case to be stored in the vector data files 82(0)-82(X). In this manner, the despreading circuit 230 can be programmed not to despread the resultant output vector data sample sets 228(0)-228(X) if the vector instruction to be executed does not call for such processing. The despreading configuration input 286 can be configured and reconfigured for each vector instruction to provide flexibility in the vector processing by the VPE 22(5) of FIG. 27. For example, the despreading configuration input 286 can be configured and reconfigured on each clock cycle of a vector instruction, on a clock-cycle-by-clock-cycle basis if desired, to provide despreading as needed while fully utilizing the execution units 84(0)-84(X).

[00222] Certain other wireless baseband operations require merging of data samples determined from previous processing operations for reasons other than despreading of spread-spectrum data sequences. For example, it may be desired to accumulate vector data samples of varying widths that are wider than the data flow paths for the execution units 84(0)-84(X) provided by the vector data lanes 100(0)-100(X). As another example, it may be desired to provide dot-product multiplication of output vector data samples from different execution units 84(0)-84(X) to provide merging of output vector data in a vector processing operation. The vector data lanes 100(0)-100(X) in a VPE could include complex routing that provides intra-vector data paths crossing the vector data lanes 100(0)-100(X) to support merged vector processing operations. However, because it is difficult to parallelize output vector data that is to be merged across different vector data lanes, this increases complexity and can reduce the efficiency of the VPE. A vector processor could instead include circuitry that performs post-processing merging of output vector data stored in vector data memory from the execution units. The output vector data samples stored in the vector data memory are fetched from the vector data memory, merged as desired, and stored back in the vector data memory. However, this post-processing can delay subsequent vector processing operations of the VPE and cause computational components in the execution units to be underutilized.

[00223] For example, FIG. 30 shows two input vector data samples 290(0), 290(1) provided in the vector data files 82(0), 82(1) in the VPE described above. It may be desired to add these two input vector data samples 290(0), 290(1) together. In this example, the sum of the two input vector data samples 290(0), 290(1) is "0x11250314E", which has a data width larger than either vector data lane 100(0) or 100(1). Data flow paths could be provided in the VPE 22 to route vector data between the vector data lanes 100(0), 100(1) so that the execution units 84(0), 84(1) can perform the sum of the two input vector data samples 290(0), 290(1) together, including providing carry logic between the two execution units 84(0), 84(1) across the vector data lanes 100(0), 100(1). The ability to cross all of the vector data lanes 100(0)-100(X) may be required to provide a scalar result of merged vector data samples, which would further increase the complexity of the data flow paths. As discussed above, however, this added complexity can reduce efficiency.
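As a non-limiting illustration, the width problem in this example can be checked with a few lines of Python. The sketch assumes a 32-bit vector data lane, which is not stated in this passage, and uses only the sum quoted in the text, since the operand values appear in FIG. 30 and are not reproduced here. The names `split_sum` and `add_with_carry` are hypothetical.

```python
LANE_BITS = 32  # assumed lane width; the passage says only that the sum is wider


def split_sum(total, lane_bits=LANE_BITS):
    """Split a sum into the part that fits in one lane and the carry-out."""
    low = total & ((1 << lane_bits) - 1)
    carry = total >> lane_bits
    return low, carry


def add_with_carry(a, b, lane_bits=LANE_BITS):
    """Add two lane-wide samples and report the carry that crosses lanes."""
    return split_sum(a + b, lane_bits)


quoted_sum = 0x11250314E
print(quoted_sum.bit_length())           # number of bits the quoted sum needs
print([hex(part) for part in split_sum(quoted_sum)])
```

Under the 32-bit assumption, the quoted sum needs one bit more than a lane provides, and that extra bit is the carry that would have to be passed between the two execution units.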

[00224] To address this issue, embodiments disclosed below include VPEs that include a merging circuit provided in the output data flow paths between the execution units and the vector data memory in the VPE. The merging circuit is configured to merge output vector data samples from an output vector data sample set provided by the execution units in-flight, while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory. In-flight merging of output vector data samples means that the output vector data samples provided by the execution units can be merged before being stored in the vector data memory, so that the resulting output vector data sample set is stored in the vector data memory in a merged format. The merged output vector data samples can be stored in the vector data files without requiring additional post-processing steps that may delay subsequent vector processing operations to be performed in the execution units. Thus, the efficiency of the data flow paths in the VPE is not limited by the vector data merging operations. When the merged vector data samples are stored in the vector data memory, subsequent vector processing in the execution units is limited only by computational resources rather than by data flow limitations.

[00225] In this regard, FIG. 31 is a schematic diagram of another exemplary VPE 22(6) that can be provided as the VPE 22 of FIG. 2. As described in more detail below, the VPE 22(6) of FIG. 31 is configured to provide in-flight merging of the resultant output vector data sample sets 292(0)-292(X), which are provided by the execution units 84(0)-84(X) using a code sequence for a vector processing operation and are to be stored in the vector data files 82(0)-82(X) in the VPE 22(6), with eliminated or reduced refetching of vector data samples and reduced power consumption. The resultant output vector data sample sets 292(0)-292(X) are comprised of resultant output vector data samples 292(0), ..., 292(X). As non-limiting examples, a merge vector processing operation may include adding resultant output vector data samples 292, determining a maximum vector data sample value among a plurality of resultant output vector data samples 292, or determining a minimum vector data sample value among a plurality of resultant output vector data samples 292. In the VPE 22(6) of FIG. 31, the resultant output vector data samples 292 in the resultant output vector data sample sets 292(0)-292(X) can be merged before being stored in the vector data files 82(0)-82(X).
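As a non-limiting illustration, the three merge operations named above (addition, maximum, and minimum) can be sketched in Python as a selectable reduction over the output samples. The operation keys and the names `MERGE_OPERATIONS` and `merge_samples` are hypothetical; the disclosure does not define an instruction encoding in this passage.

```python
from functools import reduce

# Merge operations named in the text, keyed by hypothetical mnemonics.
MERGE_OPERATIONS = {
    "add": lambda a, b: a + b,
    "max": max,
    "min": min,
}


def merge_samples(samples, operation):
    """Merge output samples into one value using the selected operation."""
    if not samples:
        raise ValueError("at least one sample is required")
    return reduce(MERGE_OPERATIONS[operation], samples)
```

For instance, `merge_samples([3, -7, 12, 5], "max")` reduces four output samples to the single value that would be stored, in place of the four unmerged samples.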

[00226] The merging circuit 294 is configured, based on programming according to the vector instruction to be executed, to provide in-flight merging of the resultant output vector data samples 292 in the resultant output vector data sample sets 292(0)-292(X). The merged resultant output vector data samples 296(0)-296(Z) are provided by the merging circuit 294 in the output data flow paths 98(0)-98(X). The "Z" in the merged resultant output vector data samples 296(0)-296(Z) represents the number of merged resultant output vector data samples 296 in the merged resultant output vector data sample set 296(0)-296(Z). The merged resultant output vector data sample set 296(0)-296(Z) is comprised of resultant output vector data samples 296, which in this example are 296(0), ..., and 296(Z). The number of merged resultant output vector data samples 296 in the merged resultant output vector data sample set 296(0)-296(Z) depends on the merging operation performed on the resultant output vector data sample sets 292(0)-292(X). In-flight merging of the resultant output vector data samples 292 in the VPE 22(6) of FIG. 31 means that the resultant output vector data samples 292 in the resultant output vector data sample sets 292(0)-292(X) provided by the execution units 84(0)-84(X) can be merged together before being stored in the vector data files 82(0)-82(X). In this manner, the merged resultant output vector data samples 296 can be stored in the vector data files 82(0)-82(X) in merged form as the merged resultant output vector data sample sets 296(0)-296(Z).

[00227] Thus, with the merging circuit 294 provided in the output data flow paths 98(0)-98(X), the resultant output vector data sample sets 292(0)-292(X) do not have to be first stored in the vector data files 82(0)-82(X), then fetched from the vector data files 82(0)-82(X), have the desired resultant output vector data samples 292 merged, and be re-stored in the vector data files 82(0)-82(X) in merged form. Instead, the resultant output vector data samples 292 from the resultant output vector data sample sets 292(0)-292(X) can be merged before being stored in the vector data files 82(0)-82(X). In this manner, the merged resultant output vector data samples 296 from the merged resultant output vector data sample sets 296(0)-296(Z) are stored in the vector data files 82(0)-82(X) without requiring additional post-processing steps that may delay subsequent vector processing operations to be performed in the execution units 84(0)-84(X). Thus, the efficiency of the data flow paths in the VPE 22(6) is not limited by the merging of the resultant output vector data samples 292. When the resultant output vector data samples 292 are stored in merged form in the vector data files 82(0)-82(X), subsequent vector processing in the execution units 84(0)-84(X) is limited only by computational resources rather than by data flow limitations.

[00228] Further, by providing the merging circuit 294 in the output data flow paths 98(0)-98(X) between the execution units 84(0)-84(X) and the vector data files 82(0)-82(X), the resultant output vector data sample sets 292(0)-292(X) do not have to cross vector data lanes 100 in the input data flow paths 80(0)-80(X) between the vector data files 82(0)-82(X) and the execution units 84(0)-84(X). Providing data flow paths for merging input vector data samples 86 in an input vector data sample set 86(0)-86(X) between different vector data lanes 100 would increase routing complexity. As a result, the execution units 84(0)-84(X) may be underutilized while merging operations are being performed in the input data flow paths 80(0)-80(X). Likewise, as discussed above, merging the resultant output vector data samples 292 from the resultant output vector data sample sets 292(0)-292(X) in the input data flow paths 80(0)-80(X) would require the resultant output vector data sample sets 292(0)-292(X) to first be stored in the vector data files 82(0)-82(X) in the VPE 22(6) of FIG. 31. This would increase power consumption when the sets are refetched and merged, and/or risk underutilization of the execution units 84(0)-84(X), which may be delayed while the merging operations are performed.

[00229] Note that common components provided in the VPEs 22(1)-22(5) of FIGS. 4, 11, 19, 23, and 27 are also provided in the VPE 22(6) of FIG. 31. The common components are shown in the VPE 22(6) of FIG. 31 with common element numbers. The previous description and discussion of these common components in the VPEs 22(1)-22(5) also apply to the VPE 22(6) of FIG. 31 and thus are not repeated here.

[00230] With continued reference to FIG. 31, more specifically, the merging circuit 294 is configured to receive the resultant output vector data sample sets 292(0)-292(X) on merging circuit inputs 300(0)-300(X) on the output data flow paths 98(0)-98(X). The merging circuit 294 is configured to merge the desired resultant output vector data samples 292 from the resultant output vector data sample sets 292(0)-292(X) to provide the merged resultant output vector data sample sets 296(0)-296(Z). The "Z" in the merged resultant output vector data samples 296(0)-296(Z) represents the bit width of the merged resultant output vector data sample set 296(0)-296(Z). Because of the merging operation, "Z" may be less than the bit width of the resultant output vector data sample set 292(0)-292(X), which is represented by "X". As described in more detail below, the number "Z+1" of merged resultant output vector data samples 296 in the merged resultant output vector data sample set 296(0)-296(Z) depends on which resultant output vector data samples 292 from the resultant output vector data sample sets 292(0)-292(X) are to be merged together. The merging circuit 294 is configured to provide the merged resultant output vector data sample sets 296(0)-296(Z) on merging circuit outputs 301(0)-301(X) in the output data flow paths 98(0)-98(X) to be provided to the vector data files 82(0)-82(X) for storage.

[00231] Further details and features of the VPE 22(6) of FIG. 31 for providing the merged resultant output vector data sample sets 296(0)-296(Z) to the vector data files 82(0)-82(X) in the output data flow paths 98(0)-98(X) in this embodiment will now be described. In this regard, FIG. 32 is a flowchart illustrating exemplary merging of the resultant output vector data samples 292 of the resultant output vector data sample sets 292(0)-292(X) resulting from a vector processing operation 302 that can be performed in the VPE 22(6) of FIG. 31 employing the merging circuit 294, according to an exemplary vector instruction requiring merging of the resultant output vector data samples 292.

[00232] With reference to FIGS. 31 and 32, the input vector data sample sets 86(0)-86(X) to be processed by the vector processing operation 302 according to a vector instruction are fetched from the vector data files 82(0)-82(X) and provided in the input data flow paths 80(0)-80(X) (block 304 in FIG. 32). Depending on the width of the input vector data sample sets 86(0)-86(X) for the vector processing operation 302, one, some, or all of the vector data lanes 100(0)-100(X) in the VPE 22(6) of FIG. 31 can be employed to provide the vector processing operation 302 according to the programming of the vector instruction. If the entire width of the vector data files 82(0)-82(X) is required, all of the vector data lanes 100(0)-100(X) can be employed for the vector processing operation 302. The vector processing operation 302 may require only a subset of the vector data lanes 100(0)-100(X). This may be because the width of the input vector data sample sets 86(0)-86(X) is less than the width of all of the vector data files 82(0)-82(X), in which case it may be desired to employ the remaining vector data lanes 100 for other vector processing operations performed in parallel with the vector processing operation 302.

[00233] With continued reference to FIGS. 31 and 32, the fetched input vector data sample sets 86(0)-86(X) are received from the input data flow paths 80(0)-80(X) at the execution units 84(0)-84(X) (block 306 in FIG. 32). The execution units 84(0)-84(X) perform the vector processing operation 302 on the received input vector data sample sets 86(0)-86(X) according to the vector instruction (block 308 in FIG. 32). The execution units 84(0)-84(X) can provide multiplication and/or accumulation using the input vector data sample sets 86(0)-86(X) so that the vector processing operation 302 produces the resultant output vector data sample sets 292(0)-292(X). Once the vector processing operation 302 is completed, the resultant output vector data sample sets 292(0)-292(X), based on the vector processing operation 302 performed on the input vector data sample sets 86(0)-86(X), are provided in the output data flow paths 98(0)-98(X) of the VPE 22(6) of FIG. 31.

[00234] With continued reference to FIGS. 31 and 32, before the resultant output vector data sample sets 292(0)-292(X) are stored in the vector data files 82(0)-82(X), they are provided to the merging circuit 294 provided in the output data flow paths 98(0)-98(X) between the execution units 84(0)-84(X) and the vector data files 82(0)-82(X). As described in more detail below, the merging circuit 294 is programmable to be included in the output data flow paths 98(0)-98(X) according to the vector instruction being executed, if the vector instruction calls for merging of the resultant output vector data samples 292 from the resultant output vector data sample sets 292(0)-292(X) to be stored in the vector data files 82(0)-82(X). The merging circuit 294 merges the resultant output vector data samples 292 from the resultant output vector data sample sets 292(0)-292(X) according to the vector instruction being executed, without the resultant output vector data sample sets 292(0)-292(X) being stored in the vector data files 82(0)-82(X) (block 310 in FIG. 32). In this manner, the resultant output vector data sample sets 292(0)-292(X) do not have to be first stored in the vector data files 82(0)-82(X), refetched, merged in a post-processing operation, and stored in merged format in the vector data files 82(0)-82(X), which would cause delay in the execution units 84(0)-84(X). The resultant output vector data sample sets 292(0)-292(X) are stored in the vector data files 82(0)-82(X) as the merged resultant output vector data sample sets 296(0)-296(Z) without requiring merge post-processing (block 312 in FIG. 32).

[00235] FIG. 33 is a schematic diagram of an exemplary merging circuit 294 that can be provided in the output data flow paths 98(0)-98(X) between the execution units 84(0)-84(X) and the vector data files 82(0)-82(X) in the VPE 22(6) of FIG. 31. The merging circuit 294 is configured to merge the resultant output vector data sample sets 292(0)-292(X) to provide the merged resultant output vector data sample sets 296(0)-296(Z). The resultant output vector data sample sets 292(0)-292(X) are provided from the execution unit outputs 96(0)-96(X) to the merging circuit 294, as shown in FIG. 31.

[00236] With continued reference to FIG. 33, the merging circuit 294 is configured to merge the resultant output vector data sample sets 292(0)-292(X). The merging circuit 294 in this embodiment is configured to provide the merged resultant output vector data sample sets 296(0)-296(Z). In this regard, the merging circuit 294 includes an adder tree 318 coupled to the execution unit outputs 96(0)-96(X) to receive the resultant output vector data sample sets 292(0)-292(X). The adder tree 318 of the merging circuit 294 is configured to receive each sample 292 of the resultant output vector data sample sets 292(0)-292(X) in its respective vector data lane 100(0)-100(X). A first adder tree level 318(1) is provided in the adder tree 318. The first adder tree level 318(1) is comprised of merge circuits 320(0)-320(((X+1)/2)-1), 320(7) (the adders 320 referred to in the following paragraphs), so that adjacent samples 292 in the resultant output vector data sample sets 292(0)-292(X) can be merged. Latches 321(0)-321(X) are provided in the merging circuit 294 to latch the resultant output vector data sample sets 292(0)-292(X) from the output data flow paths 98(0)-98(X).

[00237] For example, if each sample 292 in the resultant output vector data sample sets 292(0)-292(X) is 32 bits wide and is comprised of two 16-bit complex vector data (i.e., first vector data according to format I8Q8 and second vector data according to format I8Q8), a merging operation could be applied to merge the four vector data samples contained in two resultant output vector data samples 292 into one merged resultant output vector data sample 296. For example, as shown in FIG. 33, adder 320(0) is configured to merge resultant output vector data samples 292(0) and 292(1). Likewise, adder 320(1) is configured to merge resultant output vector data samples 292(2) and 292(3). Adder 320(((X+1)/2)-1), 320(7) is configured to merge resultant output vector data samples 292(X-1) and 292(X). Together, the adders provide the merge vector data sample sets 322(0)-322(((X+1)/2)-1), 322(7). The merge vector data sample sets 322(0)-322(((X+1)/2)-1), 322(7) from the merging performed by the adders 320(0)-320(((X+1)/2)-1), 320(7) are latched into latches 325(0)-325(((X+1)/2)-1), 325(7).

[00238] If the merge vector processing operation 302 calls for merging of the resultant output vector data sample sets 292(0)-292(X), the merge vector data sample sets 322(0)-322(((X+1)/2)-1), 322(7) can be provided as the merged resultant output vector data sample sets 296(0)-296(Z), where "Z" is seven (7), as described in more detail below. However, if the merge vector processing operation 302 calls for merging of non-adjacent resultant output vector data samples 292 in the resultant output vector data sample sets 292(0)-292(X), the merge vector data sample sets 322(0)-322(((X+1)/2)-1), 322(7) are not provided as the merged resultant output vector data sample sets 296(0)-296(Z). Instead, the merge vector data sample sets 322(0)-322(((X+1)/2)-1), 322(7) are provided to adders 324(0)-324(((X+1)/4)-1), 324(3) in a second adder tree level 318(2). In this regard, adder 324(0) is configured to perform merging on merge vector data samples 322(0) and 322(1) to provide resultant merge vector data sample 326(0). Likewise, adder 324(1) is configured to perform merging on merge vector data samples 322(2) and 322(3) to provide resultant merge vector data sample 326(1). Adder 324(((X+1)/4)-1), 324(3) is configured to perform merging on merge vector data samples 322(((X+1)/4)-2), 322(((X+1)/4)-1), 322(3) to provide resultant merge vector data sample 326(((X+1)/4)-1), 326(3). The resultant merge vector data sample sets 326(0)-326(((X+1)/4)-1), 326(3) from the merging performed by adders 324(0)-324(((X+1)/4)-1), 324(3) are latched into latches 327(0)-327(((X+1)/4)-1), 327(3).

[00239] With continued reference to FIG. 33, if the merge vector processing operation 302 calls for merging of the resultant output vector data sample sets 292(0)-292(X) by a merge factor of eight (8), the merge vector data sample sets 326(0)-326(((X+1)/4)-1), 326(3) can be provided as the merged resultant output vector data sample sets 296(0)-296(Z), where "Z" is three (3), as described in more detail below. However, if the merge vector processing operation 302 calls for a merge factor higher than 8 (e.g., 16, 32, 64, 128, 256), the merge vector data sample sets 326(0)-326(((X+1)/4)-1), 326(3) are not provided as the merged resultant output vector data sample sets 296(0)-296(Z). Instead, the merge vector data sample sets 326(0)-326(((X+1)/4)-1), 326(3) are provided to adders 328(0)-328(((X+1)/8)-1), 328(1) in a third adder tree level 318(3). In this regard, adder 328(0) is configured to perform merging on merge vector data samples 326(0) and 326(1) to provide a merge factor of 16 for those samples. Likewise, adder 328(1) is configured to perform merging on merge vector data samples 326(2) and 326(3), so that the merge vector data sample sets 330(0)-330(((X+1)/8)-1), 330(1) have a merge factor of 16. The merge vector data sample sets 330(0)-330(((X+1)/8)-1), 330(1) from the merging performed by adders 328(0)-328(((X+1)/8)-1), 328(1) are latched into latches 329(0)-329(((X+1)/8)-1), 329(1).

[00240] With continued reference to FIG. 33, if the merge vector processing operation 302 calls for merging of the resultant output vector data sample sets 292(0)-292(X) by a merge factor of sixteen (16), the merge vector data sample sets 330(0)-330(((X+1)/8)-1), 330(1) can be provided as the merged resultant output vector data sample sets 296(0)-296(Z), where "Z" is one (1), as described in more detail below. However, if the merge vector processing operation 302 calls for a merge factor higher than 16 (e.g., 32, 64, 128, 256), the merge vector data sample sets 330(0)-330(((X+1)/8)-1), 330(1) are not provided as the merged resultant output vector data sample sets 296(0)-296(Z). Instead, the merge vector data sample sets 330(0)-330(((X+1)/8)-1), 330(1) are provided to adder 332 in a fourth adder tree level 318(4). In this regard, adder 332 is configured to perform merging on merge vector data samples 330(0) and 330(1) to provide merge vector data sample 334 having a merge factor of 32. The merge vector data sample 334 from the merging performed by adder 332 is latched into latches 336 and 338.
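
The level-by-level tapping described in paragraphs [00236] through [00240] can be modeled in software. The following Python is only an illustrative sketch, not the disclosed circuit: it assumes X = 15 (sixteen lanes), treats each sample 292 as a plain integer rather than a packed pair of I8Q8 values, and uses hypothetical names (`adder_tree_levels`, `merge_in_flight`, `MERGE_FACTOR_TO_LEVEL`). Because each 32-bit sample is described as carrying two 16-bit data, a merge factor of 4 corresponds here to summing two adjacent samples.

```python
# Illustrative model only; names and the integer sample type are assumptions.

# Merge factor -> index of the adder tree level whose output is tapped.
# Index 0 models level 318(1), index 3 models level 318(4).
MERGE_FACTOR_TO_LEVEL = {4: 0, 8: 1, 16: 2, 32: 3}


def adder_tree_levels(samples):
    """Return the outputs of every tree level, each level adding adjacent pairs."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("number of lanes must be a power of two")
    levels = []
    current = list(samples)
    while len(current) > 1:
        current = [current[i] + current[i + 1] for i in range(0, len(current), 2)]
        levels.append(current)
    return levels


def merge_in_flight(samples, merge_factor):
    """Tap the tree at the level that matches the requested merge factor."""
    return adder_tree_levels(samples)[MERGE_FACTOR_TO_LEVEL[merge_factor]]


lanes = [1] * 16                       # stand-ins for samples 292(0)..292(15)
merged_by_4 = merge_in_flight(lanes, 4)
merged_by_32 = merge_in_flight(lanes, 32)
```

With sixteen input samples, the tapped level holds 8, 4, 2, or 1 results, which lines up with the text's Z = 7, Z = 3, Z = 1, and the single merge vector data sample 334.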

[00241] With continued reference to FIG. 33, if the merge vector processing operation 302 calls for merging of the resultant output vector data sample sets 292(0)-292(X) by a merge factor of thirty-two (32), the merge vector data sample 334 can be provided as the merged resultant output vector data sample 296, as described in more detail below. However, if the merge vector processing operation 302 calls for a merge factor higher than 32 (e.g., 64, 128, 256), the merge vector data sample 334 is not provided as the merged resultant output vector data sample 296. The merge vector data sample 334 remains latched in latch 338 without having to be stored in the vector data file 82. Another resultant output vector data sample set 292(0)-292(X) is loaded into latches 321(0)-321(X) over an additional processing cycle to be merged using a merge factor of 32, as described above. The resulting merge vector data sample 334' is added to the previous merge vector data sample 334 by adder 340 in a fifth adder tree level 318(5) to provide merge vector data sample 342 having a merge factor of 64. Selector 343 controls whether the merge vector data sample 334 having a merge factor of 32, or the merge vector data sample 334' having a merge factor of 64, is latched into latch 344 as the merge vector data sample 342. This same process of latching additional resultant output vector data sample sets 292(0)-292(X) and merging them can be performed to achieve merge factors greater than 64, if needed. The merge vector data sample 342 is ultimately latched into latch 344 as the desired merged resultant output vector data sample 296, according to the desired merge factor for the merge vector processing operation 302.
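
The multi-cycle accumulation for merge factors above 32 can be sketched as follows. This is a simplified model under stated assumptions: sixteen lanes per cycle, two 16-bit data per sample, plain integers as samples, and hypothetical function names (`merge_factor_for`, `accumulate_partials`). The built-in `sum` stands in for adder tree levels 318(1) through 318(4), and the running total stands in for adder 340 feeding back the previously latched partial result.

```python
# Illustrative model only; names and parameters are assumptions.

def merge_factor_for(cycles, samples_per_cycle=16, data_per_sample=2):
    """Merge factor reached after accumulating the given number of cycles."""
    return cycles * samples_per_cycle * data_per_sample


def accumulate_partials(sample_sets):
    """Add one factor-32 partial per cycle into a running merged value."""
    running = 0                        # models the value held between cycles
    for samples in sample_sets:        # one set 292(0)..292(15) per cycle
        partial = sum(samples)         # levels 318(1)-318(4): factor-32 partial
        running += partial             # level 318(5): add to previous partial
    return running


two_cycles = [[1] * 16, [2] * 16]
merged = accumulate_partials(two_cycles)   # merge factor 64 in this model
```

In this model the vector data file is never touched between cycles; only the final `running` value would be written, which mirrors the single-write behavior the description emphasizes.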

[00242] With continued reference to FIG. 33, regardless of which merge factor is called for in the merge vector processing operation 302, the merged resultant output vector data sample sets 296(0)-296(Z) need to be stored in the vector data files 82(0)-82(X). As described next, the merging circuit 294 of FIG. 33 is also configured to load the merged resultant output vector data samples 296, produced as a result of performing the merge vector processing operation 302 on the resultant output vector data sample sets 292(0)-292(X), into latches 346(0)-346(X) to form the merged resultant output vector data sample sets 296(0)-296(Z). The merged resultant output vector data sample sets 296(0)-296(Z) can then be provided to the vector data files 82(0)-82(X) to be stored. In this manner, only one write to the vector data files 82(0)-82(X) is required to store the merged resultant output vector data sample sets 296(0)-296(Z) created by the merging circuit 294. The adder tree levels 318(1)-318(5) in the merging circuit 294 of FIG. 33 can generate merged resultant output vector data samples 296 for all of merge factors 4, 8, 16, and 32, regardless of which merge factor is called for in the merge vector processing operation 302. Alternatively, adders in the adder tree that are not needed to perform the merge vector processing operation 302 according to the desired merge factor can be disabled or configured to add zero. To determine which of these merged resultant output vector data samples 296 are provided to latches 346(0)-346(X) to be stored, selectors 348(0)-348(((X+1)/4)-1), 348(3) are provided, as described next.

[00243] In this regard, with continued reference to FIG. 33, selector 348(0) can select the merged resultant output vector data sample 296 for any of merge factors 4, 8, and 16 from adders 320(0), 324(0), and 328(0), respectively, and merge factors 32, 64, 128, and 256 from adders 332 and 340, based on the merge vector processing operation 302 being executed. Selector 348(1) can select the merged resultant output vector data sample 296 for merge factors 4, 8, and 16 from adders 320(1), 324(1), and 328(1), respectively, based on the merge vector processing operation 302 being executed. Selector 348(2) can select the merged resultant output vector data sample 296 for merge factors 4 and 8 from adders 320(2) and 324(2), respectively, based on the merge vector processing operation 302 being executed. Selector 348(3) can select the merged resultant output vector data sample 296 for merge factors 4 and 8 from adders 320(3) and 324(3), respectively, based on the merge vector processing operation 302 being executed. Selectors are not provided to control the merged resultant output vector data samples 296 provided from adders 320(4)-320(7), because providing a merge factor of 8 can be fully satisfied by selectors 348(0)-348(3).

[00244] With continued reference to FIG. 33, the data slicers 350(0)-350(((X+1)/2)-1), 350(7) can be bypassed for merge vector processing operations, or configured not to perform data splicing on the received merged resultant output vector data samples 296 selected by selectors 348(0)-348(((X+1)/4)-1), 348(3) and adders 320(4)-320(((X+1)/2)-1), 320(7), respectively. The merged resultant output vector data samples 296 are then routed, via connections to a crossbar 352, to the desired latch 346 among latches 346(0)-346(X) to be stored. The crossbar 352 provides the flexibility to provide the merged resultant output vector data samples 296 to different latches 346(0)-346(X) according to the merge vector processing operation 302. In this manner, the merged resultant output vector data samples 296 can be stacked in latches 346(0)-346(X) over different iterations of the merge vector processing operation 302 before being stored in the vector data files 82(0)-82(X). For example, the merged resultant output vector data sample sets 296(0)-296(Z) can be stacked in latches 346(0)-346(X) over different iterations of the merge vector processing operation 302 before being stored in the vector data files 82(0)-82(X). In this manner, accesses to the vector data files 82(0)-82(X) to store the merged resultant output vector data sample sets 296(0)-296(Z) can be minimized for operating efficiency.

[00245] For example, as shown in FIG. 33, selectors 354(0)-354(X) coupled to the crossbar 352 can be controlled to store the merged resultant output vector data sample 296 from selector 348(0) in any of latches 346(0)-346(X). Selectors 354(1), 354(3), 354(5), 354(7), 354(9), 354(11), 354(13), and 354(15) coupled to the crossbar 352 can be controlled to store the merged resultant output vector data sample 296 from selector 348(1) in latches 346(1), 346(3), 346(5), 346(7), 346(9), 346(11), 346(13), and 346(15). Selectors 354(2), 354(6), 354(10), and 354(14) coupled to the crossbar 352 can be controlled to store the merged resultant output vector data sample 296 from selector 348(2) in latches 346(2), 346(6), 346(10), and 346(14). Selectors 354(3), 354(7), 354(11), and 354(15) coupled to the crossbar 352 can be controlled to store the merged resultant output vector data sample 296 from selector 348(3) in latches 346(3), 346(7), 346(11), and 346(15). Selectors 354(4) and 354(12) coupled to the crossbar 352 can be controlled to store the merged resultant output vector data sample 296 from adder 320(4) in latches 346(4) and 346(12). Selectors 354(5) and 354(13) coupled to the crossbar 352 can be controlled to store the merged resultant output vector data sample 296 from adder 320(5) in latches 346(5) and 346(13). Selectors 354(6) and 354(14) coupled to the crossbar 352 can be controlled to store the merged resultant output vector data sample 296 from adder 320(6) in latch 346(6) or 346(14). Selectors 354(7) and 354(15) coupled to the crossbar 352 can be controlled to store the merged resultant output vector data sample 296 from adder 320(7) in latch 346(7) or 346(15).
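
The routing options listed in paragraph [00245] can be summarized as a lookup from each source to the latches 346 it can reach through the crossbar 352. The Python below is a hedged restatement of that list for the sixteen-latch case (X = 15); the function name `allowed_latches` and the `"selector"`/`"adder"` labels are hypothetical and are used only to tell selectors 348(0)-348(3) apart from adders 320(4)-320(7).

```python
# Illustrative lookup only; the function name and source labels are assumptions.

NUM_LATCHES = 16  # latches 346(0)..346(15) when X = 15


def allowed_latches(kind, index):
    """Return the latch indices a given source can be routed to."""
    if kind == "selector":                        # selectors 348(0)..348(3)
        if index == 0:
            return list(range(NUM_LATCHES))       # any latch
        if index == 1:
            return list(range(1, NUM_LATCHES, 2)) # odd latches
        if index in (2, 3):
            return list(range(index, NUM_LATCHES, 4))
    if kind == "adder" and 4 <= index <= 7:       # adders 320(4)..320(7)
        return [index, index + 8]
    raise ValueError(f"no crossbar route described for {kind} {index}")


routes_for_348_2 = allowed_latches("selector", 2)
routes_for_320_6 = allowed_latches("adder", 6)
```

Laying the routes out this way makes the pattern visible: sources that only ever carry narrow merge results reach fewer latches, while selector 348(0), which can carry every merge factor, reaches all of them.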

[00246] Note that in the merging circuit 294 of FIG. 33, the adders could be configured to allow non-adjacent resultant output vector data samples 292 in the resultant output vector data sample sets 292(0)-292(X) to be merged. For example, if it is desired to merge resultant output vector data sample 292(0) with resultant output vector data sample 292(9), the adders in adder tree levels 318(1)-318(3) could be configured to simply pass resultant output vector data sample 292(0) and resultant output vector data sample 292(9) through to adder tree level 318(4). Adder 332 in adder tree level 318(4) could then merge resultant output vector data sample 292(0) with resultant output vector data sample 292(9) to provide the merged output vector data sample 296.
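
One way to picture the pass-through behavior is to zero every lane that is not selected, so that the lower-level adders effectively add zero and forward the selected samples unchanged. The Python below is a sketch under that assumption (the source says adders can be "configured to add zero" but does not specify a masking mechanism); the names `meeting_level` and `merge_selected` are hypothetical, and samples are plain integers in a sixteen-lane tree.

```python
# Illustrative model only; lane masking is an assumed mechanism.

def meeting_level(lane_a, lane_b):
    """Adder tree level (1-based) at which two lanes first share an adder."""
    level = 0
    while lane_a != lane_b:
        lane_a >>= 1                  # index of the adder one level up
        lane_b >>= 1
        level += 1
    return level


def merge_selected(samples, selected_lanes):
    """Sum only the selected lanes by feeding zeros on all other lanes."""
    current = [s if i in selected_lanes else 0 for i, s in enumerate(samples)]
    while len(current) > 1:
        current = [current[i] + current[i + 1] for i in range(0, len(current), 2)]
    return current[0]


samples = [10 * (i + 1) for i in range(16)]   # stand-ins for 292(0)..292(15)
merged = merge_selected(samples, {0, 9})      # 292(0) merged with 292(9)
```

In this model lanes 0 and 9 first share an adder at level 4, which agrees with the description that adder 332 in adder tree level 318(4) performs the actual merge.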

[00247] Merging circuits that provide types of vector merging operations other than vector and/or scalar addition could also be provided in the output data flow paths 98(0)-98(X) between the execution units 84(0)-84(X) and the vector data files 82(0)-82(X). For example, the merging circuit 294 of FIG. 33 could be configured to provide maximum or minimum vector and/or scalar merging operations. For example, the adders in adder tree levels 318(1)-318(5) of the adder tree 318 of FIG. 33 could be replaced with maximum or minimum function circuits. In other words, each circuit would select to pass either the larger or the smaller of two resultant output vector data samples 292 from the resultant output vector data sample sets 292(0)-292(X). For example, if the two resultant output vector data samples 292 from the resultant output vector data sample sets 292(0)-292(X) were the two input vector data samples 290(0), 290(1) in FIG. 30, the merging circuit 294 could be configured to select vector data sample 290(1) when the merging circuit 294 is configured to select the maximum vector data sample.

[00248] In this regard, with reference to FIG. 34, the adders 320(0)-320(((X+1)/2)-1), 320(7) in the first adder tree level 318(1) of FIG. 33 could be replaced with maximum or minimum merge selectors 320'(0)-320'(((X+1)/2)-1), 320'(7), as shown in FIG. 34. The adders 324(0)-324(((X+1)/4)-1), 324(3) in the second adder tree level 318(2) could be replaced with maximum or minimum selectors 324'(0)-324'(((X+1)/4)-1), 324'(3), as shown in FIG. 34. The adders 328(0)-328(((X+1)/8)-1), 328(1) in the third adder tree level 318(3) could be replaced with maximum or minimum selectors 328'(0)-328'(((X+1)/8)-1), 328'(1), as shown in FIG. 34. The adder 332 in the fourth adder tree level 318(4) could be replaced with a maximum or minimum selector 332', as shown in FIG. 34. The adder 340 in the fifth adder tree level 318(5) could be replaced with a maximum or minimum selector 340', as shown in FIG. 34. In the merging circuit 294 of FIG. 34, the selectors could be configured to select the maximum or minimum resultant output vector data sample 292 between non-adjacent resultant output vector data samples 292 in the resultant output vector data sample sets 292(0)-292(X) to be merged. For example, if it is desired to maximum merge resultant output vector data sample 292(0) with resultant output vector data sample 292(9), the selectors in adder tree levels 318(1)-318(3) could be configured to simply pass resultant output vector data sample 292(0) and resultant output vector data sample 292(9) through to adder tree level 318(4). Selector 332' in adder tree level 318(4) could then maximum merge resultant output vector data sample 292(0) with resultant output vector data sample 292(9) to provide the merged output vector data sample 296.
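
Replacing the adders with maximum or minimum selectors leaves the tree shape unchanged; only the two-input operation at each node differs. The Python sketch below illustrates this with a hypothetical `select_tree` function that takes the node operation as a parameter. It assumes a power-of-two number of lanes and plain integer samples, and is not a description of the circuit in FIG. 34.

```python
# Illustrative model only; the function name and integer samples are assumptions.

def select_tree(samples, pick=max):
    """Reduce the lanes pairwise, keeping the value chosen by `pick` at each node."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("number of lanes must be a power of two")
    current = list(samples)
    while len(current) > 1:
        current = [pick(current[i], current[i + 1])
                   for i in range(0, len(current), 2)]
    return current[0]


samples = [5, 12, 7, 3, 9, 20, 1, 8, 14, 2, 6, 11, 4, 19, 10, 13]
largest = select_tree(samples, pick=max)    # maximum merge across all lanes
smallest = select_tree(samples, pick=min)   # minimum merge across all lanes
```

Passing the node operation as an argument reflects the point of paragraph [00248]: the same five-level structure serves addition, maximum, or minimum merging depending on what each node computes.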

[00249] As described above, the execution units 84(0)-84(X) are provided in the VPEs 22(1)-22(6) to perform vector processing operations on the input vector data sample sets 86(0)-86(X). The execution units 84(0)-84(X) also include programmable data path configurations that allow the execution units 84(0)-84(X) to provide multiple modes of operation with common circuitry and hardware for different vector processing operations. More exemplary detail regarding the execution units 84(0)-84(X), and their programmable data path configurations for providing multiple modes of operation with common circuitry and hardware, is described next.

[00250] In this regard, FIG. 35 is an exemplary schematic diagram of an exemplary execution unit that can be provided for each of the execution units 84(0)-84(X) in the VPEs 22(1)-22(6). As shown in FIG. 35, and as described in more detail below with regard to FIGS. 36-39, the execution unit 84 includes a plurality of exemplary vector pipeline stages 460 having exemplary vector processing blocks that can be configured with programmable data path configurations. As described in more detail below, the programmable data path configurations provided in the vector processing blocks allow specific circuits and hardware to be programmed and reprogrammed to support performing different, specific vector processing operations on the vector data 30 received from the vector unit data memory 32 of FIG. 2.

[00251] For example, certain vector processing operations commonly require multiplication of the vector data 30 followed by accumulation of the multiplied vector data results. Non-limiting examples of such vector processing include filtering operations, correlation operations, and the radix-2 and radix-4 butterfly operations commonly used to perform fast Fourier transform (FFT) operations for wireless communication algorithms, in which a series of parallel multiplications is followed by a series of parallel accumulations of the multiplication results. As also described in more detail below with regard to FIGS. 39 and 40, the execution unit 84 of FIG. 35 also has the option of fusing the multipliers with carry-save accumulators to provide a redundant carry-save format in the carry-save accumulators. Providing a redundant carry-save format in the carry-save accumulators can eliminate the need to provide a carry propagation path and a carry-propagate add operation during each step of accumulation.

[00252] In this regard, with further reference to FIG. 35, the M0 multiply vector pipeline stage 460(1) of the VPE 22 is described first. The M0 multiply vector pipeline stage 460(1) is a second vector pipeline stage containing a plurality of vector processing blocks in the form of any desired number of multiplier blocks 462(A)-462(0), each having a programmable data path configuration. The multiplier blocks 462(A)-462(0) are provided to perform vector multiply operations in the execution unit 84. The multiplier blocks 462(A)-462(0) are arranged in parallel with each other in the M0 multiply vector pipeline stage 460(1) to provide multiplication of up to twelve multiply vector data sample sets 34(Y)-34(0). In this embodiment, "A" is equal to three (3), meaning that four multiplier blocks 462(3)-462(0) are included in the M0 multiply vector pipeline stage 460(1) in this example. The multiply vector data sample sets 34(Y)-34(0) are loaded into the execution unit 84 for vector processing, into a plurality of latches 464(Y)-464(0) provided in an input read (RR) vector pipeline stage, which is the first vector pipeline stage 460(0) in the execution unit 84. There are twelve latches 464(11)-464(0) in the execution unit 84 in this embodiment, meaning that "Y" is equal to eleven (11). The latches 464(11)-464(0) are configured to latch the multiply vector data sample sets 34(11)-34(0) retrieved from the vector registers (see the vector data file 28 of FIG. 2) as vector data input sample sets 466(11)-466(0). In this example, each latch 464(11)-464(0) is 8 bits wide. The latches 464(11)-464(0) are each configured to latch the respective vector data input sample sets 466(11)-466(0), for a total of 96 bits of vector data 30 (i.e., 12 latches × 8 bits each).
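The latch arithmetic in this paragraph (twelve 8-bit latches giving 96 bits of vector data) can be illustrated with a minimal Python sketch. The function names `latch_samples` and `unlatch`, and the choice to place sample 0 in the least significant byte, are assumptions made for illustration only; the source does not specify a bit ordering.

```python
LATCH_COUNT = 12  # "Y" = 11 in this embodiment, i.e. latches 464(11)..464(0)
LATCH_BITS = 8    # each latch is 8 bits wide in this example


def latch_samples(samples):
    """Pack twelve 8-bit samples into one 96-bit integer.

    Assumption: sample index 0 occupies the least significant byte.
    """
    if len(samples) != LATCH_COUNT:
        raise ValueError("expected exactly 12 samples")
    word = 0
    for index, sample in enumerate(samples):
        if not 0 <= sample < (1 << LATCH_BITS):
            raise ValueError("sample does not fit in 8 bits")
        word |= sample << (index * LATCH_BITS)
    return word


def unlatch(word, index):
    """Read back the 8-bit sample held at the given latch index."""
    return (word >> (index * LATCH_BITS)) & ((1 << LATCH_BITS) - 1)
```

Packing the samples into a single integer is only a convenient way to show the 96-bit total; the hardware described holds each sample in its own latch.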

[00253] With continuing reference to FIG. 35, the multiplier blocks 462(3)-462(0) are each configured to be able to receive certain combinations of the vector data input sample sets 466(11)-466(0) for providing vector multiply operations, where "Y" is equal to eleven (11) in this example. The vector data input sample sets 466(11)-466(0) are provided in a plurality of input data paths A3-A0, B3-B0, and C3-C0 according to the design of the execution unit 84. As shown in FIG. 35, the vector data input sample sets 466(3)-466(0) correspond to input data paths C3-C0, the vector data input sample sets 466(7)-466(4) correspond to input data paths B3-B0, and the vector data input sample sets 466(11)-466(8) correspond to input data paths A3-A0. The multiplier blocks 462(3)-462(0) are configured to process the vector data input sample sets 466(11)-466(0) received according to the input data paths A3-A0, B3-B0, C3-C0 respectively provided to them, to provide vector multiply operations.
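The correspondence between sample set indices and input data paths stated above can be written as a small lookup. This is a sketch only: the function name `input_path_for` is hypothetical, and the one-to-one ordering inside each group (for example 466(3) to C3 and 466(0) to C0) is inferred from the parallel listing in the text.

```python
def input_path_for(sample_index):
    """Return the input data path name for vector data input sample set 466(sample_index).

    466(3)..466(0)  -> C3..C0
    466(7)..466(4)  -> B3..B0
    466(11)..466(8) -> A3..A0
    """
    if not 0 <= sample_index <= 11:
        raise ValueError("sample index must be in 0..11")
    group = "CBA"[sample_index // 4]  # groups of four: C, then B, then A
    return f"{group}{sample_index % 4}"
```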

[00254] As described in more detail below with regard to FIGS. 37 and 38, the programmable internal data paths 467(3)-467(0) provided in the multiplier blocks 462(3)-462(0) of FIG. 35 can be programmed to have different data path configurations. These different data path configurations provide different combinations and/or different bit lengths of multiplication of the particular vector data input sample sets 466(11)-466(0) that each multiplier block 462(3)-462(0) receives according to the particular input data paths A3-A0, B3-B0, C3-C0 provided to it. In this regard, the multiplier blocks 462(3)-462(0) provide vector multiply output sample sets 468(3)-468(0) as vector result output sample sets comprising the results of multiplying particular combinations of the vector data input sample sets 466(11)-466(0) together.

[00255] For example, the programmable internal data paths 467(3)-467(0) of the multiplier blocks 462(3)-462(0) can be programmed according to settings provided from a vector instruction decoder in the instruction dispatch circuit 48 of the baseband processor 20 of FIG. 2. In this embodiment, there are four programmable internal data paths 467(3)-467(0) of the multiplier blocks 462(3)-462(0). A vector instruction specifies the particular type of operation to be performed by the execution unit 84. Thus, the execution unit 84 can be programmed and reprogrammed to configure the programmable internal data paths 467(3)-467(0) of the multiplier blocks 462(3)-462(0) to provide different types of vector multiply operations with the same common circuitry in a highly efficient manner. For example, the execution unit 84 can be programmed to configure and reconfigure the programmable internal data paths 467(3)-467(0) on a clock-cycle-by-clock-cycle basis for each vector instruction executed, according to the decoding of the vector instructions in an instruction pipeline in the instruction dispatch circuit 48. Thus, if the M0 multiply vector pipeline stage 460(1) in the execution unit 84 is configured to process vector data input sample sets 466 on every clock cycle, the multiplier blocks 462(3)-462(0) perform vector multiply operations on every clock cycle according to the decoding of the vector instructions in the instruction pipeline in the instruction dispatch circuit 48.

[00256] The multiplier blocks 462 can be programmed to perform real and imaginary multiplications. With continuing reference to FIG. 35, in one vector processing block data path configuration, a multiplier block 462 may be configured to multiply two 8-bit vector data input sample sets 466 together. In another multiplier block data path configuration, a multiplier block 462 may be configured to multiply two 16-bit vector data input sample sets 466 together, which are formed from a first pair of 8-bit vector data input sample sets 466 multiplied by a second pair of 8-bit vector data input sample sets 466. This is illustrated in FIG. 38 and discussed in more detail below. Again, providing the programmable data path configurations in the multiplier blocks 462(3)-462(0) gives the flexibility to configure and reconfigure the multiplier blocks 462(3)-462(0) to perform different types of multiply operations, which reduces area in the execution unit 84 and may allow fewer execution units 84 to be provided in the baseband processor 20 to carry out the desired vector processing operations.

[00257] With reference back to FIG. 35, the multiplier blocks 462(3)-462(0) are configured to provide the vector multiply output sample sets 468(3)-468(0) in programmable output data paths 470(3)-470(0) to either the next vector processing stage 460 or an output processing stage. The vector multiply output sample sets 468(3)-468(0) are provided in the programmable output data paths 470(3)-470(0) according to a configuration programmed on the basis of the vector instruction being executed by the multiplier blocks 462(3)-462(0). In this example, the vector multiply output sample sets 468(3)-468(0) in the programmable output data paths 470(3)-470(0) are provided to the M1 accumulate vector pipeline stage 460(2) for accumulation, as discussed below. In this particular design of the execution unit 84, it is desired to provide the multiplier blocks 462(3)-462(0) followed by accumulators to support specialized vector instructions that call for multiplication of vector data inputs followed by accumulation of the multiplied results. For example, the radix-2 and radix-4 butterfly operations commonly used to provide FFT operations include a series of multiply operations followed by an accumulation of the multiplication results. Note, however, that these combinations of vector processing blocks provided in the execution unit 84 are exemplary and not limiting. A VPE having programmable data path configurations could be configured to include one or any other number of vector processing stages having vector processing blocks. The vector processing blocks could be provided to perform any type of operation according to the design and the specific vector instructions that the execution unit is designed to support.

[00258] With continuing reference to FIG. 35, in this embodiment the vector multiply output sample sets 468(3)-468(0) are provided to a plurality of accumulator blocks 472(3)-472(0) provided in the next vector processing stage, which is the M1 accumulate vector processing stage 460(2). Each accumulator block among the accumulator blocks 472(A)-472(0) contains two accumulators 472(X)(1) and 472(X)(0) (i.e., 472(3)(1), 472(3)(0), 472(2)(1), 472(2)(0), 472(1)(1), 472(1)(0), and 472(0)(1), 472(0)(0)). The accumulator blocks 472(3)-472(0) accumulate the results of the vector multiply output sample sets 468(3)-468(0). As discussed in more detail below with regard to FIGS. 39 and 40, the accumulator blocks 472(3)-472(0) can be provided as carry-save accumulators, in which the carry product is in essence saved and not propagated during the accumulation process until the accumulation operation is completed. The accumulator blocks 472(3)-472(0) also have the option of being fused with the multiplier blocks 462(3)-462(0) of FIGS. 35 and 37 to provide a redundant carry-save format in the accumulator blocks 472(3)-472(0). Providing a redundant carry-save format in the accumulator blocks 472(3)-472(0) can eliminate the need to provide a carry propagation path and a carry-propagate add operation during each step of accumulation in the accumulator blocks 472(3)-472(0). The M1 accumulate vector processing stage 460(2) and its accumulator blocks 472(3)-472(0) are introduced next with reference to FIG. 35.
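To make the carry-save idea concrete, the following Python sketch models a generic textbook carry-save accumulator: the running total is kept as a redundant (sum, carry) pair, each step uses only bitwise 3:2 compression, and a single carry-propagate addition happens at the very end. It is not taken from the patent figures; the function names, the 24-bit default width, and the unsigned wrap-around behavior are assumptions for illustration.

```python
def carry_save_add(sum_word, carry_word, addend, width=24):
    """One 3:2 compression step: no carry ripples across bit positions here."""
    mask = (1 << width) - 1
    new_sum = sum_word ^ carry_word ^ addend
    # Majority function gives the carry out of each bit, shifted to the next bit.
    new_carry = ((sum_word & carry_word)
                 | (sum_word & addend)
                 | (carry_word & addend)) << 1
    return new_sum & mask, new_carry & mask


def accumulate(values, width=24):
    """Accumulate values in redundant carry-save form, resolving carries once."""
    mask = (1 << width) - 1
    sum_word, carry_word = 0, 0
    for value in values:
        sum_word, carry_word = carry_save_add(sum_word, carry_word,
                                              value & mask, width)
    # The only carry-propagate addition, performed after the last step.
    return (sum_word + carry_word) & mask
```

The point of the design is visible in the loop: each accumulation step is a fixed-depth bitwise operation, so its delay does not grow with the accumulator width.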

[00259] With reference to FIG. 35, the accumulator blocks 472(3)-472(0) in the M1 accumulate vector processing stage 460(2) are configured to accumulate the vector multiply output sample sets 468(3)-468(0) in programmable output data paths 474(3)-474(0) (i.e., 474(3)(1), 474(3)(0), 474(2)(1), 474(2)(0), 474(1)(1), 474(1)(0), and 474(0)(1), 474(0)(0)), according to programmable output data path configurations, to provide accumulator output sample sets 476(3)-476(0) (i.e., 476(3)(1), 476(3)(0), 476(2)(1), 476(2)(0), 476(1)(1), 476(1)(0), and 476(0)(1), 476(0)(0)) in either the next vector processing stage 460 or an output processing stage. In this example, the accumulator output sample sets 476(3)-476(0) are provided to an output processing stage, which is an ALU processing stage 460(3). As discussed in more detail below, the accumulator output sample sets 476(3)-476(0) can also be provided to the ALU 46 in the scalar processor 44 in the baseband processor 20 of FIG. 2, as a non-limiting example. For example, the ALU 46 may take the accumulator output sample sets 476(3)-476(0), according to the specialized vector instructions executed by the execution unit 84, for use in more general processing operations.

[00260] With reference back to FIG. 35, the programmable input data paths 478(3)-478(0) and/or the programmable internal data paths 480(3)-480(0) of the accumulator blocks 472(3)-472(0) can be programmed to be reconfigured to receive different combinations and/or bit lengths of the vector multiply output sample sets 468(3)-468(0) provided from the multiplier blocks 462(3)-462(0) to the accumulator blocks 472(3)-472(0). Because each accumulator block 472 is composed of two accumulators 472(X)(1), 472(X)(0), the programmable input data paths 478(A)-478(0) are shown in FIG. 35 as 478(3)(1), 478(3)(0), 478(2)(1), 478(2)(0), 478(1)(1), 478(1)(0), and 478(0)(1), 478(0)(0). Similarly, the programmable internal data paths 480(3)-480(0) are shown in FIG. 35 as 480(3)(1), 480(3)(0), 480(2)(1), 480(2)(0), 480(1)(1), 480(1)(0), 480(0)(1), 480(0)(0). Providing the programmable input data paths 478(3)-478(0) and/or the programmable internal data paths 480(3)-480(0) in the accumulator blocks 472(3)-472(0) is discussed in more detail below with regard to FIGS. 39 and 40. In this manner, according to the programmable input data paths 478(3)-478(0) and/or the programmable internal data paths 480(3)-480(0), the accumulator blocks 472(3)-472(0) can provide the accumulator output sample sets 476(3)-476(0) according to the programmed combination of accumulated vector multiply output sample sets 468(3)-468(0). Again, this gives the flexibility to configure and reconfigure the accumulator blocks 472(3)-472(0) to perform different types of accumulation operations, based on the programming of the programmable input data paths 478(3)-478(0) and/or the programmable internal data paths 480(3)-480(0), which reduces area in the execution unit 84 and may allow fewer execution units 84 to be provided in the baseband processor 20 to carry out the desired vector processing operations.

[00261] For example, in one accumulator mode configuration, the programmable input data paths 478 and/or the programmable internal data paths 480 of two accumulator blocks 472 can be programmed to provide a single 40-bit accumulator, as a non-limiting example. In another accumulator mode configuration, the programmable input data paths 478 and/or the programmable internal data paths 480 of two accumulator blocks 472 can be programmed to provide dual 24-bit accumulators, as a non-limiting example. In another accumulator mode configuration, the programmable input data paths 478 and/or the programmable internal data paths 480 of two accumulator blocks 472 can be programmed to provide a 16-bit carry-save adder followed by a single 24-bit accumulator. Particular different combinations of multiply and accumulate operations can also be supported by the execution unit 84 according to the programming of the multiplier blocks 462(3)-462(0) and the accumulator blocks 472(3)-472(0) (e.g., 16-bit imaginary multiplication with 16-bit accumulation, and 32-bit imaginary multiplication with 16-bit accumulation).
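The difference between the single 40-bit and dual 24-bit accumulator modes can be sketched behaviorally in Python. This is an illustration under stated assumptions, not the circuit of FIGS. 39 and 40: the names `AccMode` and `run_accumulator_block` are hypothetical, values are treated as unsigned with wrap-around at the accumulator width, and the way two input streams are routed into one wide accumulator is assumed.

```python
from enum import Enum


class AccMode(Enum):
    SINGLE_40 = "single 40-bit accumulator"
    DUAL_24 = "dual 24-bit accumulators"


def run_accumulator_block(mode, stream0, stream1):
    """Accumulate two input streams under the selected mode (hypothetical model).

    SINGLE_40: both streams feed one accumulator that wraps at 40 bits.
    DUAL_24:   each stream feeds its own accumulator that wraps at 24 bits.
    """
    if mode is AccMode.SINGLE_40:
        total = sum(stream0) + sum(stream1)
        return (total & ((1 << 40) - 1),)
    if mode is AccMode.DUAL_24:
        mask = (1 << 24) - 1
        return (sum(stream0) & mask, sum(stream1) & mask)
    raise ValueError("unsupported accumulator mode")
```

The same inputs give different results per mode, which is the practical meaning of reprogramming the data paths: a total that overflows a 24-bit accumulator still fits in the 40-bit configuration.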

[00262] The programmable input data paths 478(3)-478(0) and/or the programmable internal data paths 480(3)-480(0) of the accumulator blocks 472(3)-472(0) can be programmed according to settings provided from a vector instruction decoder in the instruction dispatch circuit 48 of the baseband processor 20 of FIG. 2. A vector instruction specifies the particular type of operation to be performed by the execution unit 84. Thus, the execution unit 84 can be configured to reprogram the programmable input data paths 478(3)-478(0) and/or the programmable internal data paths 480(3)-480(0) of the accumulator blocks 472(3)-472(0) for each vector instruction executed, according to the decoding of the vector instruction in an instruction pipeline in the instruction dispatch circuit 48. A vector instruction may execute over one or more clock cycles of the execution unit 84. Also in this example, the execution unit 84 can be configured to reprogram the programmable input data paths 478(3)-478(0) and/or the programmable internal data paths 480(3)-480(0) of the accumulator blocks 472(3)-472(0) on a clock-cycle-by-clock-cycle basis, for each clock cycle of a vector instruction. Thus, for example, if a vector instruction executed by the M1 accumulate vector processing stage 460(2) in the execution unit 84 processes the vector multiply output sample sets 468(3)-468(0) on every clock cycle, the programmable input data paths 478(3)-478(0) and/or the programmable internal data paths 480(3)-480(0) of the accumulator blocks 472(3)-472(0) can be reconfigured on each clock cycle during execution of the vector instruction.

[00263] FIG. 36 is a flowchart illustrating exemplary vector processing by the multiplier blocks 462(A)-462(0) and the accumulator blocks 472(A)(1)-472(0)(0) in the execution unit 84 of FIGS. 2 and 35, to provide further explanation of the exemplary vector processing. The multiplier blocks 462(A)-462(0) and the accumulator blocks 472(A)(1)-472(0)(0) each have programmable data path configurations and are provided in different vector processing stages in the exemplary execution unit 84 of FIGS. 2 and 35. For example, an FFT vector operation involves multiply operations followed by accumulate operations.

[00264] In this regard, with reference to FIG. 36, the vector processing involves receiving a plurality of multiply vector data sample sets 34(Y)-34(0) of a width of a vector array, in an input data path among a plurality of input data paths A3-C0 in the input processing stage 460(0) (block 501). The vector processing then includes receiving the multiply vector data sample sets 34(Y)-34(0) from the plurality of input data paths A3-C0 in the plurality of multiplier blocks 462(A)-462(0) (block 503). The vector processing then includes multiplying the multiply vector data sample sets 34(Y)-34(0) to provide multiply vector result output sample sets 468(A)-468(0) in a multiply output data path among a plurality of multiply output data paths 470(A)-470(0), based on the programmable data path configurations for the multiplier blocks 462(A)-462(0), according to the vector instruction executed by the vector processing stage 460(1) (block 505). The vector processing next includes receiving the multiply vector result output sample sets 468(A)-468(0) from the plurality of multiply output data paths 470(A)-470(0) in the plurality of accumulator blocks 472(A)(1)-472(0)(0) (block 507). The vector processing next includes accumulating the multiply vector result output sample sets 468(A)-468(0) together to provide accumulator output sample sets 476(A)(1)-476(0)(0), based on the configurations of the programmable input data paths 478(A)(1)-478(0)(0), the programmable internal data paths 480(A)(1)-480(0)(0), and the programmable output data paths 474(A)(1)-474(0)(0) for the accumulator blocks 472(A)(1)-472(0)(0), according to the vector instruction executed by the second vector processing stage 460(2) (block 509). The vector processing then includes providing the accumulator output sample sets 476(A)(1)-476(0)(0) in the programmable output data paths 474(A)(1)-474(0)(0) (block 511). The vector processing then includes receiving the accumulator output sample sets 476(A)(1)-476(0)(0) from the accumulator blocks 472(A)(1)-472(0)(0) in the output vector processing stage 460(3) (block 513).
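The sequence of blocks 501 through 513 amounts to an element-wise multiply stage feeding an accumulate stage. The following Python sketch models that flow at a purely behavioral level; the function name `vector_multiply_accumulate`, the use of plain Python integers, and the one-accumulator-per-lane arrangement are assumptions for illustration and ignore bit widths and the programmable data path selection.

```python
def vector_multiply_accumulate(a_samples, b_samples, accumulators):
    """Behavioral model of one pass through the multiply and accumulate stages."""
    if not (len(a_samples) == len(b_samples) == len(accumulators)):
        raise ValueError("all vectors must have the same number of lanes")
    # Blocks 501/503: input sample sets arrive on the input data paths
    # and are received by the multiplier blocks.
    # Block 505: each lane multiplies its pair of inputs.
    products = [a * b for a, b in zip(a_samples, b_samples)]
    # Blocks 507/509: accumulators receive the products and add them
    # to their running totals.
    updated = [acc + p for acc, p in zip(accumulators, products)]
    # Blocks 511/513: the accumulator outputs are passed to the output stage.
    return updated
```

Calling the function repeatedly with the previous result as `accumulators` mimics a multi-cycle multiply-accumulate, such as the inner loop of a filter or correlation.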

[00265] Now that an overview has been given of the exemplary execution unit 84 of FIG. 35 and of the vector processing of FIG. 36, which employ vector processing blocks having programmable data path configurations, the remainder of the description presents more exemplary, non-limiting details of these vector processing blocks in FIGS. 37-40.

[00266] In this regard, FIG. 37 is a more detailed schematic diagram of the plurality of multiplier blocks 462(3)-462(0) in the M0 multiply vector processing stage 460(1) of the execution unit 84 of FIG. 35. FIG. 38 is a schematic diagram of the internal components of a multiplier block 462 of FIG. 37. FIG. 37 shows the vector data input sample sets 466(11)-466(0) that are received by the multiplier blocks 462(3)-462(0) according to the particular input data paths A3-A0, B3-B0, C3-C0. As discussed in more detail below with regard to FIG. 38, each of the multiplier blocks 462(3)-462(0) in this example includes four 8-bit by 8-bit multipliers. With reference back to FIG. 37, each of the multiplier blocks 462(3)-462(0) in this example is configured to multiply multiplicand input "A" by either multiplicand input "B" or multiplicand input "C". The multiplicand inputs "A" and "B" or "C" that can be multiplied together in a multiplier block 462 are controlled by which input data paths A3-A0, B3-B0, C3-C0 are connected to the multiplier blocks 462(3)-462(0), as shown in FIG. 37. A multiplicand selector input 482(3)-482(0) is provided as an input to each multiplier block 462(3)-462(0) to control the programmable internal data paths 467(3)-467(0) in each multiplier block 462(3)-462(0), so as to select whether multiplicand input "B" or multiplicand input "C" is multiplied by multiplicand input "A". In this manner, the multiplier blocks 462(3)-462(0) are provided with the capability for their programmable internal data paths 467(3)-467(0) to be reprogrammed to provide different multiply operations, as desired.

[00267] With continuing reference to FIG. 37, and using multiplier block 462(3) as an example, input data paths A3 and A2 are connected to inputs AH and AL, respectively. Input AH represents the high bits of multiplicand input "A", and AL represents the low bits of multiplicand input "A". Input data paths B3 and B2 are connected to inputs BH and BL, respectively. Input BH represents the high bits of multiplicand input "B", and BL represents the low bits of multiplicand input "B". Input data paths C3 and C2 are connected to inputs CI and CQ, respectively. In this example, input CI represents the real bits portion of multiplicand input "C", and CQ represents the imaginary bits portion of multiplicand input "C". As described in more detail below with regard to FIG. 38, the multiplicand selector input 482(3) in this example also controls whether the programmable internal data path 467(3) of multiplier block 462(3) is configured to perform 8-bit multiplication of multiplicand input "A" with either multiplicand input "B" or multiplicand input "C", or 16-bit multiplication of multiplicand input "A" with either multiplicand input "B" or multiplicand input "C".

[00268] With continuing reference to FIG. 37, the multiplier blocks 462(3)-462(0) are each configured to generate vector multiply output sample sets 468(3)-468(0) as carry "C" and sum "S" vector output sample sets of the multiply operation, based on the configuration of their programmable internal data paths 467(3)-467(0). As described in more detail below with regard to FIGS. 39 and 40, the carry "C" and sum "S" of the vector multiply output sample sets 468(3)-468(0) are fused, meaning that the carry "C" and sum "S" are provided in redundant carry-save format to the plurality of accumulator blocks 472(3)-472(0). As described in more detail below, providing a redundant carry-save format in the accumulator blocks 472(3)-472(0) can eliminate the need for a carry propagation path and a carry-propagate add operation during the accumulation operations performed by the accumulator blocks 472(3)-472(0).

[00269] FIG. 37 shows examples of the multiplier blocks 462(3)-462(0) generating the vector multiply output sample sets 468(3)-468(0) as carry "C" and sum "S" vector output sample sets of the multiply operation, based on the configuration of their programmable internal data paths 467(3)-467(0). For example, multiplier block 462(3) is configured to generate carry C00 and sum S00 as 32-bit values for 8-bit multiplication, and carry C01 and sum S01 as 64-bit values for 16-bit multiplication. The other multiplier blocks 462(2)-462(0) have the same capability in this example. In this regard, multiplier block 462(2) is configured to generate carry C10 and sum S10 as 32-bit values for 8-bit multiplication, and carry C11 and sum S11 as 64-bit values for 16-bit multiplication. Multiplier block 462(1) is configured to generate carry C20 and sum S20 as 32-bit values for 8-bit multiplication, and carry C21 and sum S21 as 64-bit values for 16-bit multiplication. Multiplier block 462(0) is configured to generate carry C30 and sum S30 as 32-bit values for 8-bit multiplication, and carry C31 and sum S31 as 64-bit values for 16-bit multiplication.

[00270] FIG. 38 is provided to explain more exemplary detail of the programmable data path configurations provided in a multiplier block 462 of FIG. 37. FIG. 38 is a schematic diagram of the internal components of a multiplier block 462 of FIG. 37, which has programmable data path configurations capable of multiplying 8-bit by 8-bit vector data input sample sets 466 and 16-bit by 16-bit vector data input sample sets 466. In this regard, the multiplier block 462 includes four 8-bit by 8-bit multipliers 484(3)-484(0) in this example, although any desired number of multipliers 484 could be provided. The first multiplier 484(3) is configured to receive the 8-bit vector data input sample set 466A[H] (the high bits of multiplicand input "A") and multiply it by either the 8-bit vector data input sample set 466B[H] (the high bits of multiplicand input "B") or the 8-bit vector data input sample set 466C[I] (the high bits of multiplicand input "C"). A multiplexer 486(3) is provided to select either the 8-bit vector data input sample set 466B[H] or the 8-bit vector data input sample set 466C[I] as the multiplicand supplied to the multiplier 484(3). The multiplexer 486(3) is controlled by multiplicand selector input 482[3], which is the high bit of the multiplicand selector input 482 in this embodiment. In this manner, the multiplexer 486(3) and multiplicand selector input 482[3] provide a programmable internal data path 467[0] configuration for the multiplier 484(3) that controls whether the 8-bit vector data input sample set 466B[H] or the 8-bit vector data input sample set 466C[I] is multiplied by the received vector data input sample set 466A[H].

[00271] With continuing reference to FIG. 38, the other multipliers 484(2)-484(0) include programmable internal data paths 467[2]-467[0] similar to that provided for the first multiplier 484(3). Multiplier 484(2) includes the programmable internal data path 467[2], which has a programmable configuration to provide either the 8-bit vector data input sample set 466B[H] or the 8-bit vector data input sample set 466C[I] in the programmable internal data path 467[1], to be multiplied by the 8-bit vector data input sample set 466A[L], which is the low bits of multiplicand input "A". The selection is controlled by multiplexer 486(2) according to multiplicand selector input 482[2] of the multiplicand selector input 482 in this embodiment. Multiplier 484(1) includes the programmable internal data path 467[1], which is programmable to provide either the 8-bit vector data input sample set 466B[L], which is the low bits of multiplicand input "B", or the 8-bit vector data input sample set 466C[Q], which is the low bits of multiplicand input "C", in the programmable internal data path 467[1], to be multiplied by the 8-bit vector data input sample set 466A[H]. The selection is controlled by multiplexer 486(1) according to multiplicand selector input 482[1] of the multiplicand selector input 482 in this embodiment. Further, multiplier 484(0) includes the programmable internal data path 467[0], which is programmable to provide either the 8-bit vector data input sample set 466B[L] or the 8-bit vector data input sample set 466C[Q] in the programmable internal data path 467[0], to be multiplied by the 8-bit vector data input sample set 466A[L]. The selection is controlled by multiplexer 486(0) according to multiplicand selector bit input 482[0] of the multiplicand selector input 482 in this embodiment.
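The operand routing described for multipliers 484(3)-484(0) can be sketched as a small behavioral model. This is only an illustration, not the circuit of FIG. 38: the function name `multiplier_block_8x8`, the unsigned treatment of operands, and the select polarity (a set bit picks the "C" operand) are all assumptions made for the example.

```python
def multiplier_block_8x8(a_hi, a_lo, b_hi, b_lo, c_i, c_q, select):
    """Behavioral sketch of one multiplier block in 8-bit by 8-bit mode.

    select is a 4-bit value standing in for selector bits 482[3]..482[0].
    Assumption: bit k set means multiplexer 486(k) passes the "C" operand,
    bit k clear means it passes the "B" operand.
    """
    def mux(bit, b_operand, c_operand):
        return c_operand if (select >> bit) & 1 else b_operand

    p3 = a_hi * mux(3, b_hi, c_i)  # 484(3): A[H] x (B[H] or C[I])
    p2 = a_lo * mux(2, b_hi, c_i)  # 484(2): A[L] x (B[H] or C[I])
    p1 = a_hi * mux(1, b_lo, c_q)  # 484(1): A[H] x (B[L] or C[Q])
    p0 = a_lo * mux(0, b_lo, c_q)  # 484(0): A[L] x (B[L] or C[Q])
    return p3, p2, p1, p0
```

Calling the function with `select=0b0000` multiplies "A" by "B" in all four lanes, while `select=0b1111` multiplies "A" by "C"; mixed values reroute individual lanes.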

[00272] With continuing reference to FIG. 38, as discussed above, the multipliers 484(3)-484(0) can be configured to perform multiply operations of different bit lengths. In this regard, each multiplier 484(3)-484(0) includes a bit-length multiply mode input 488(3)-488(0), respectively. In this example, each multiplier 484(3)-484(0) can be programmed in 8-bit by 8-bit mode according to the inputs that control the configuration of programmable data paths 490(3)-490(0), 491, and 492(3)-492(0), respectively. Each multiplier 484(3)-484(0) can also be programmed, according to the inputs that control the configuration of programmable data paths 490(3)-490(0), 491, and 492(3)-492(0), respectively, to provide part of a larger-bit multiply operation, including 16-bit by 16-bit mode and 24-bit by 8-bit mode. For example, if each multiplier 484(3)-484(0) is configured in 8-bit by 8-bit multiply mode according to the configuration of the programmable data paths 490(3)-490(0), the plurality of multipliers 484(3)-484(0) as a unit can be configured to comprise two individual 8-bit by 8-bit multipliers as part of the multiplier block 462. If each multiplier 484(3)-484(0) is configured in 16-bit by 16-bit multiply mode according to the configuration of the programmable data path 491, the plurality of multipliers 484(3)-484(0) as a unit can be configured to comprise a single 16-bit by 16-bit multiplier as part of the multiplier block 462. If the multipliers 484(3)-484(0) are configured in 24-bit by 8-bit multiply mode according to the configuration of the programmable data paths 492(3)-492(0), the plurality of multipliers 484(3)-484(0) as a unit can be configured to comprise one 16-bit by 24-bit by 8-bit multiplier as part of the multiplier block 462.

[00273] With continuing reference to FIG. 38, the multipliers 484(3)-484(0) in this example are shown configured in 16-bit by 16-bit multiply mode. Sixteen-bit input sums 494(3), 494(2) and input carries 496(3), 496(2) are generated by multipliers 484(3), 484(2), respectively. Sixteen-bit input sums 494(1), 494(0) and input carries 496(1), 496(0) are generated by multipliers 484(1), 484(0), respectively. The 16-bit input sums 494(3), 494(2) and input carries 496(3), 496(2) are provided, together with the 16-bit input sums 494(1), 494(0) and input carries 496(1), 496(0), to a 24-bit 4:2 compressor 515, which adds the input sums 494(3)-494(0) and input carries 496(3)-496(0) together. When the programmable data path 491 is active and gated with the input sums 494(3)-494(0) and input carries 496(3)-496(0), the added input sums and input carries yield a single sum 498 and a single carry 500 in 16-bit by 16-bit multiply mode. The programmable data path 491 is gated by a first AND-based gate 502(3) with the combined input sums 494(3), 494(2) as a 16-bit word, and by a second AND-based gate 502(2) with the combined input carries 496(3), 496(2) as a 16-bit word, to be provided to the 24-bit 4:2 compressor 515. The programmable data path 491 is also gated by a third AND-based gate 502(1) with the combined input sums 494(1), 494(0) as a 16-bit word, and by a fourth AND-based gate 502(0) with the combined input carries 496(1), 496(0) as a 16-bit word, to be provided to the 24-bit 4:2 compressor 515. If the multiplier block 462 is configured in 16-bit by 16-bit multiply mode or 24-bit by 8-bit multiply mode, the programmable output data path 470[0] is provided with the vector multiply output sample set 468[0] as a compressed 32-bit sum S0 and 32-bit carry C0 partial product.
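The idea of building a 16-bit by 16-bit product from four 8-bit by 8-bit partial products and leaving the result as a sum/carry pair can be illustrated with a functional model. The sketch below is an assumption-laden simplification: it treats operands as unsigned, feeds four already-resolved partial products into the compressor (in the text each multiplier itself emits a sum and a carry), and models the 4:2 compressor as two cascaded 3:2 carry-save adders, a textbook construction that is not stated to be the structure of compressor 515. All function names are hypothetical.

```python
def csa_3to2(x, y, z):
    """3:2 carry-save adder: returns (s, c) with s + c == x + y + z."""
    s = x ^ y ^ z                            # bitwise sum, no carry chain
    c = ((x & y) | (x & z) | (y & z)) << 1   # bitwise carries, weighted by 2
    return s, c

def compress_4to2(w, x, y, z):
    """4:2 compressor modeled as two cascaded 3:2 stages (assumption)."""
    s1, c1 = csa_3to2(w, x, y)
    return csa_3to2(s1, c1, z)

def mul16_from_8x8(a, b):
    """Unsigned 16x16 multiply from four 8x8 products, left in sum/carry form."""
    a_hi, a_lo = (a >> 8) & 0xFF, a & 0xFF
    b_hi, b_lo = (b >> 8) & 0xFF, b & 0xFF
    partials = (
        (a_hi * b_hi) << 16,  # A[H] x B[H], weight 2^16
        (a_lo * b_hi) << 8,   # A[L] x B[H], weight 2^8
        (a_hi * b_lo) << 8,   # A[H] x B[L], weight 2^8
        a_lo * b_lo,          # A[L] x B[L], weight 2^0
    )
    return compress_4to2(*partials)

# The pair is redundant: only s + c equals the product.
s, c = mul16_from_8x8(0xBEEF, 0x1234)
assert s + c == 0xBEEF * 0x1234
```

The point of the model is that no carry ripples across bit positions until someone adds `s` and `c`; the multiplier output can therefore be handed on in redundant form.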

[00274] If the multipliers 484(3)-484(0) in the multiplier block 462 are configured in 8-bit by 8-bit multiply mode, the programmable output data path 470[1] configuration provides the 16-bit input sums 494(3)-494(0) and the corresponding 16-bit input carries 496(3)-496(0) as partial products, without compression, as the vector multiply output sample sets 468[1]. The vector multiply output sample sets 468[0], 468[1], depending on the multiply mode of the multiplier block 462, are provided to the accumulator blocks 472(3)-472(0) for accumulation of the sum and carry products according to the vector instruction being executed.

[00275] Now that the multiplier blocks 462(3)-462(0) of FIGS. 37 and 38, which have programmable data path configurations, have been described, features of the multiplier blocks 462(3)-462(0) in the execution unit 84 that are to be fused with the accumulator blocks 472(3)-472(0) configured in redundant carry-save format are discussed in general with regard to FIG. 39.

[00276] In this regard, FIG. 39 is a generalized schematic diagram of a multiplier block and an accumulator block in the execution units 84(0)-84(X) described above, wherein the accumulator block employs a carry-save accumulator structure that uses a redundant carry-save format to reduce carry propagation. As previously discussed and shown in FIG. 38, the multiplier block 462 is configured to multiply multiplicand inputs 466[H] and 466[L] and to provide at least one input sum 494 and at least one input carry 496 as the vector multiply output sample set 468 in the programmable output data path 470. To eliminate the need for a carry propagation path and a carry-propagate adder in the accumulator block 472 for each accumulation step, the at least one input sum 494 and the at least one input carry 496 in the vector multiply output sample set 468 in the programmable output data path 470 are fused in redundant carry-save format to at least one accumulator block 472. In other words, the carry 496 in the vector multiply output sample set 468 is provided to the accumulator block 472 as vector input carry 496 in carry-save format. In this manner, the input sum 494 and input carry 496 in the vector multiply output sample set 468 can be provided to a compressor 508 of the accumulator block 472, which in this embodiment is a complex gate 4:2 compressor. The compressor 508 is configured to accumulate the input sum 494 and the input carry 496 together with a previous accumulated vector output sum 512 and a previous shifted accumulated vector output carry 517, respectively. The previous shifted accumulated vector output carry 517 is in essence the saved carry accumulation during the accumulation operation.

[00277] In this manner, only a single, final carry-propagate adder needs to be provided in the accumulator block 472 to propagate the received input carry 496 into the input sum 494 as part of the accumulation generated by the accumulator block 472. The power consumption associated with performing a carry-propagate add operation during each accumulation step in the accumulator block 472 is reduced in this embodiment. The gate delay associated with performing a carry-propagate add operation during each accumulation step in the accumulator block 472 is also eliminated in this embodiment.

[00278] With continuing reference to FIG. 39, the compressor 508 is configured to accumulate the input sum 494 and the input carry 496, in redundant form, with the previous accumulated vector output sum 512 and the previous shifted accumulated vector output carry 517, respectively. The shifted accumulated vector output carry 517 is generated from the accumulated vector output carry 514 produced by the compressor 508, by shifting the accumulated vector output carry 514 before the compressor 508 performs the next accumulation of the next received input sum 494 and input carry 496. The final shifted accumulated vector output carry 517 is added to the final accumulated vector output sum 512 by a single, final carry-propagate adder 519 provided in the accumulator block 472, which propagates the carry accumulation in the final shifted accumulated vector output carry 517 and converts the final accumulated vector output sum 512 into the final accumulator output sample set 476 in 2's complement notation. The final accumulated vector output sum 512 is provided as the accumulator output sample set 476 in the programmable output data path 474 (see FIG. 35).
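The accumulation scheme of FIG. 39 (compress at every step, propagate carries once at the end) can be sketched in a few lines. This is a behavioral model under stated assumptions, not the disclosed circuit: the class and method names are hypothetical, the 4:2 compressor is modeled as two cascaded 3:2 carry-save adders, the 24-bit width is taken from the carry-save accumulators described elsewhere in this description, and the weighting of the carry word is folded into the compressor model rather than modeled as a separate shifter.

```python
WIDTH = 24
MASK = (1 << WIDTH) - 1

def csa_3to2(x, y, z):
    """3:2 carry-save adder: returns (s, c) with s + c == x + y + z."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def compress_4to2(w, x, y, z):
    """4:2 compressor modeled as two cascaded 3:2 stages (assumption)."""
    s1, c1 = csa_3to2(w, x, y)
    return csa_3to2(s1, c1, z)

class CarrySaveAccumulator:
    """Keeps the running total as a redundant (sum, carry) pair."""

    def __init__(self):
        self.acc_sum = 0    # plays the role of accumulated output sum 512
        self.acc_carry = 0  # plays the role of shifted output carry 517

    def accumulate(self, in_sum, in_carry):
        # One accumulation step: four inputs in, two out, no carry chain.
        s, c = compress_4to2(in_sum, in_carry, self.acc_sum, self.acc_carry)
        self.acc_sum, self.acc_carry = s & MASK, c & MASK

    def resolve(self):
        # The single carry-propagate addition, performed once at the end
        # (the role of final adder 519 in the text).
        return (self.acc_sum + self.acc_carry) & MASK

acc = CarrySaveAccumulator()
for in_sum, in_carry in [(100, 23), (0xFFFF, 0xFFFF), (5, 7)]:
    acc.accumulate(in_sum, in_carry)
assert acc.resolve() == (123 + 0x1FFFE + 12) & MASK
```

Each `accumulate` call uses only bitwise operations, so its delay does not grow with word width; the one width-dependent addition is deferred to `resolve`.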

[00279] Now that FIG. 39 has been described, illustrating the fusing of a multiplier block 462 with an accumulator block 472 configured in redundant carry-save format, more exemplary detail regarding the accumulator blocks 472(3)-472(0) is discussed with regard to FIG. 40. FIG. 40 is a detailed schematic diagram of exemplary internal components of an accumulator block 472 provided in the execution unit 84 of FIG. 35. As previously discussed and as discussed in more detail below, the accumulator block 472 is configured with programmable input data paths 478(3)-478(0) and/or programmable internal data paths 480(3)-480(0), so that the accumulator block 472 can be programmed to act as dedicated circuitry designed to perform specific, different types of vector accumulate operations. For example, the accumulator block 472 can be programmed to provide a number of different accumulations and additions, including signed and unsigned accumulate operations. Specific examples of the programmable input data paths 478(3)-478(0) and/or programmable internal data paths 480(3)-480(0) in the accumulator block 472 being configured to provide different types of accumulate operations are disclosed. The accumulator block 472 is also configured to include carry-save accumulators 472[0], 472[1] to provide redundant carry arithmetic that avoids or reduces carry propagation, so as to provide high-speed accumulate operations with reduced combinational logic.

[00280] Exemplary internal components of the accumulator block 472 are shown in FIG. 40. As illustrated therein, the accumulator block 472 in this embodiment is configured to receive a first input sum 494[0] and first input carry 496[0], and a second input sum 494[1] and second input carry 496[1], from a multiplier block 462 to be accumulated together. With regard to FIG. 40, the input sums 494[0], 494[1] and input carries 496[0], 496[1] are referred to as vector input sums 494[0], 494[1] and vector input carries 496[0], 496[1]. As previously described and illustrated in FIG. 39, the vector input sums 494[0], 494[1] and vector input carries 496[0], 496[1] in this embodiment are each 16 bits in length. The accumulator block 472 in this example is provided as two 24-bit carry-save accumulator blocks 472[0], 472[1], each containing similar components that share common element numbers, with "[0]" designating components of carry-save accumulator 472[0] and "[1]" designating components of carry-save accumulator 472[1]. The carry-save accumulators 472[0], 472[1] can be configured to perform vector accumulate operations concurrently.

[00281] With reference to carry-save accumulator 472[0] in FIG. 40, the vector input sum 494[0] and vector input carry 496[0] are inputs to a multiplexer 504(0) provided as part of the programmable internal data path 480[0]. A negation circuit 506(0), which may be comprised of exclusive-OR-based gates, is also provided. According to an input 521(0), the negation circuit 506(0) generates a negative vector input sum 494[0]' and negative vector input carry 496[0]' as inputs to the multiplexer 504(0) for accumulate operations that require them. The multiplexer 504(0) is configured to select either the vector input sum 494[0] and vector input carry 496[0], or the negative vector input sum 494[0]' and negative vector input carry 496[0]', to be provided to a compressor 508(0) according to a selector input 510(0) generated as a result of vector instruction decoding. In this regard, the selector input 510(0) makes the programmable input data path 478[0] of the carry-save accumulator 472[0] programmable to provide either the vector input sum 494[0] and vector input carry 496[0], or the negative vector input sum 494[0]' and negative vector input carry 496[0]', to the compressor 508(0), according to the accumulate operation the accumulator block 472 is configured to perform.
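One way to see why XOR-based gates suffice for negating a value held in sum/carry form is the two's-complement identity -(s + c) = ~s + ~c + 2 (modulo the word size). The sketch below illustrates that identity only. The function name, the 16-bit width, and in particular the separate "correction" term are assumptions for the example; the text above does not say how the negation circuit 506(0) accounts for the two added ones.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1

def select_operand(in_sum, in_carry, negate):
    """Return (sum, carry, correction) for a pass-through or negated operand.

    XORing every bit with the negate control inverts both words when
    negate is set. Since ~x == -x - 1 in two's complement, inverting two
    words leaves the total short by 2, which is returned as a correction
    (hypothetical bookkeeping, not taken from the source).
    """
    flip = MASK if negate else 0
    return in_sum ^ flip, in_carry ^ flip, 2 if negate else 0

# Negating the redundant pair (1000, 234) yields -1234 modulo 2**16.
s, c, fix = select_operand(1000, 234, negate=True)
assert (s + c + fix) & MASK == (-1234) & MASK
```

With `negate=False` the operand passes through unchanged, which corresponds to the multiplexer selecting the non-negated sum and carry.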

[00282] With continuing reference to FIG. 40, the compressor 508(0) of the carry-save accumulator block 472[0] in this embodiment is a complex gate 4:2 compressor. In this regard, the compressor 508(0) is configured to accumulate sums and carries in redundant carry-save operations. The compressor 508(0) is configured to accumulate, as its four inputs, the current vector input sum 494[0] and vector input carry 496[0], or the current negative vector input sum 494[0]' and negative vector input carry 496[0]', together with the previous accumulated vector input sum 494[0] and vector input carry 496[0], or the accumulated negative vector input sum 494[0]' and negative vector input carry 496[0]'. The compressor 508(0) provides an accumulated vector output sum 512(0) and an accumulated vector output carry 514(0) as the accumulator output sample set 476[0] in the programmable output data path 474[0] (see FIG. 35) to provide the accumulator output sample sets 476(3)-476(0). The accumulated vector output carry 514(0) is shifted by a bit shifter 516(0) during accumulate operations to provide a shifted accumulated vector output carry 517(0), which controls bit-width growth during each accumulation step. For example, the bit shifter 516(0) in this embodiment is a barrel shifter that is fused to the compressor 508(0) in redundant carry-save format. In this manner, the shifted accumulated vector output carry 517(0) is in essence saved without having to be propagated into the accumulated vector output sum 512(0) during the accumulate operation performed by the accumulator block 472[0]. As a result, the power consumption and gate delay associated with performing a carry-propagate add operation during each accumulation step in the accumulator block 472[0] are eliminated in this embodiment.

[00283] Additional follow-on vector input sums 494[0] and vector input carries 496[0], or negative vector input sums 494[0]' and negative vector input carries 496[0]', can be accumulated with the current accumulated vector output sum 512(0) and the current accumulated vector output carry 517(0). The vector input sum 494[0] and vector input carry 496[0], or the negative vector input sum 494[0]' and negative vector input carry 496[0]', are selected by a multiplexer 518(0), as part of the programmable internal data path 480[0], according to a sum-carry selector 520(0) generated as a result of vector instruction decoding. The current accumulated vector output sum 512(0) and the current shifted accumulated vector output carry 517(0) can be provided as inputs to the compressor 508(0) so that the carry-save accumulator block 472[0] provides an updated accumulated vector output sum 512(0) and accumulated vector output carry 514(0). In this regard, the sum-carry selector 520(0) allows the programmable internal data path 480[0] of the accumulator block 472[0] to be programmed to provide the vector input sum 494[0] and vector input carry 496[0] to the compressor 508(0) according to the accumulation operation that the accumulator block 472 is configured to perform. Hold gates 522(0), 524(0) are also provided in this embodiment to cause the multiplexer 518(0) to hold the current state of the accumulated vector output sum 512(0) and the shifted accumulated vector output carry 517(0) according to a hold state input 526(0), in order to control the operational timing of accumulation in the carry-save accumulator block 472[0].
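
The select-and-hold behavior described above can be pictured with a small behavioral model. This is a hedged sketch, not the disclosed circuit: it works on plain integers rather than redundant sum and carry words, and the class name `AccumulatorModel` and the `negate` and `hold` parameters are hypothetical stand-ins for the sum-carry selector 520(0) and the hold state input 526(0).

```python
from dataclasses import dataclass


@dataclass
class AccumulatorModel:
    """Behavioral stand-in for one accumulator block (plain integers only)."""

    acc: int = 0

    def step(self, value, negate=False, hold=False):
        # hold: keep the current accumulated state unchanged for this step
        # (hypothetical analogue of the hold state input).
        if hold:
            return self.acc
        # negate: choose the negative version of the input
        # (hypothetical analogue of the sum-carry selector choosing the
        # negative input sum and carry).
        self.acc += -value if negate else value
        return self.acc
```

For instance, stepping with `value=5`, then `value=3, negate=True`, then any value with `hold=True` leaves the model at 2 after the third step, because the held step does not update the state.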

[00284] With continued reference to FIG. 40, the accumulated vector output sum 512(0) and shifted accumulated vector output carry 517(0) of the carry-save accumulator block 472[0], and the accumulated vector output sum 512(1) and shifted accumulated vector output carry 517(1) of the carry-save accumulator block 472[1], are gated by control gates 534(0), 536(0) and 534(1), 536(1), respectively. The control gates 534(0), 536(0) and 534(1), 536(1) control the accumulated vector output sum 512(0) and shifted accumulated vector output carry 517(0), and the accumulated vector output sum 512(1) and shifted accumulated vector output carry 517(1), respectively, that are returned to the compressors 508(0), 508(1).

[00285] In summary, the programmable input data paths 478[0], 478[1] and the programmable internal data paths 480[0], 480[1] of the accumulator blocks 472[0], 472[1] of the accumulator block 472 in FIG. 40 allow the accumulator block 472 to be configured in different modes. The accumulator block 472 can be configured to provide different accumulation operations according to a specific vector processing instruction using the common accumulator circuitry illustrated in FIG. 40.

[00286] A VPE according to the concepts and embodiments described herein may be provided in or integrated into any processor-based device. Examples, without limitation, include a set-top box, an entertainment unit, a navigation device, a communications device, a fixed location data unit, a mobile location data unit, a mobile phone, a cellular phone, a computer, a portable computer, a desktop computer, a personal digital assistant (PDA), a monitor, a computer monitor, a television, a tuner, a radio, a satellite radio, a music player, a digital music player, a portable music player, a digital video player, a video player, a digital video disc (DVD) player, and a portable digital video player.

[00287] In this regard, FIG. 41 illustrates an example of a processor-based system 550. In this example, the processor-based system 550 includes one or more processing units (PUs) 552, each including one or more processors or cores 554. The PU 552 may be the baseband processor 20 of FIG. 2, as a non-limiting example. The processor 554 may be a vector processor like the baseband processor 20 provided in FIG. 2, as a non-limiting example. In this regard, the processor 554 may also include a VPE 556 that includes, but is not limited to, the execution unit 84 of FIG. 2. The PU 552 may have cache memory 558 coupled to the processor 554 for rapid access to temporarily stored data. The PU 552 is coupled to a system bus 560 and can intercouple master and slave devices included in the processor-based system 550. As is well known, the PU 552 communicates with these other devices by exchanging address, control, and data information over the system bus 560. For example, the PU 552 can communicate bus transaction requests to a memory controller 562 as an example of a slave device. Although not illustrated in FIG. 41, multiple system buses 560 could be provided, wherein each system bus 560 constitutes a different fabric.

[00288] Other master and slave devices can be connected to the system bus 560. As illustrated in FIG. 41, these devices can include, as examples, a memory system 564, one or more input devices 566, one or more output devices 568, one or more network interface devices 570, and one or more display controllers 572. The memory system 564 can include memory 565 accessible by the memory controller 562. The input devices 566 can include any type of input device, including but not limited to input keys, switches, and voice processors. The output devices 568 can include any type of output device, including but not limited to audio, video, and other visual indicators. The network interface devices 570 can be any devices configured to allow exchange of data to and from a network 574. The network 574 can be any type of network, including but not limited to a wired or wireless network, a private or public network, a local area network (LAN), a wide local area network (WLAN), and the Internet. The network interface devices 570 can be configured to support any type of communication protocol desired.

[00289] The PU 552 may also be configured to access the display controller 572 over the system bus 560 to control information sent to one or more displays 578. The display controller 572 sends information to be displayed to the display 578 via one or more video processors 580, which process the information to be displayed into a format suitable for the display 578. The display 578 can include any type of display, including but not limited to a cathode ray tube (CRT), a liquid crystal display (LCD), and a plasma display.

[00290] Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithms described in connection with the embodiments of dual voltage domain memory buffers disclosed herein may be implemented as electronic hardware, as instructions stored in memory or in another computer-readable medium and executed by a processor or other processing device, or as combinations of both. The arbiters, master devices, and slave devices described herein may be employed in any circuit, hardware component, integrated circuit (IC), or IC chip, as examples. Memory disclosed herein may be any type and size of memory and may be configured to store any type of information desired. To clearly illustrate this interchangeability, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. How such functionality is implemented depends upon the particular application, design choices, and/or design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

[00291] The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a processor, a DSP, an application-specific integrated circuit (ASIC), an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

[00292] The embodiments disclosed herein may be embodied in hardware and in instructions that are stored in hardware, and may reside, for example, in random access memory (RAM), flash memory, read-only memory (ROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a remote station. In the alternative, the processor and the storage medium may reside as discrete components in a remote station, base station, or server.

[00293] It is also noted that the operational steps described in any of the exemplary embodiments herein are described to provide examples and discussion. The operations described may be performed in numerous different sequences other than the illustrated sequences. Furthermore, operations described in a single operational step may actually be performed in a number of different steps. Additionally, one or more operational steps discussed in the exemplary embodiments may be combined. It is to be understood that the operational steps illustrated in the flowchart diagrams may be subject to numerous different modifications, as will be readily apparent to one of skill in the art. Those of skill in the art will also understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.

[00294] The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

A vector processing engine (VPE) configured to in-flight merge a resulting output vector data sample set generated by at least one execution unit that performs vector processing operations, comprising:
at least one vector data file configured to:
provide a fetched input vector data sample set in at least one input data flow path for a vector processing operation; and
receive at least one merged resulting output vector data sample set from at least one output data flow path to be stored;
at least one execution unit provided in the at least one input data flow path and configured to:
receive the input vector data sample set on the at least one input data flow path; and
perform the vector processing operation on the input vector data sample set to provide a resulting output vector data sample set on the at least one output data flow path; and
at least one merging circuit configured to:
receive the resulting output vector data sample set;
merge the resulting output vector data sample set to provide at least one merged resulting output vector data sample set, without the resulting output vector data sample set being stored in the at least one vector data file; and
provide the at least one merged resulting output vector data sample set on the at least one output data flow path.

The VPE of claim 1, wherein the at least one vector data file is configured to:
provide the input vector data sample set of the width of the at least one vector data file in the at least one input data flow path for the vector processing operation; and
receive the at least one merged resulting output vector data sample set of the width of the at least one vector data file from the at least one output data flow path to be stored.

The VPE of claim 1, wherein:
the at least one vector data file is configured to:
provide the input vector data sample set on at least one vector data file output in the at least one input data flow path; and
receive the at least one merged resulting output vector data sample set on at least one vector data file input in the at least one output data flow path;
the at least one execution unit is configured to:
receive the input vector data sample set on at least one execution unit input in the at least one input data flow path; and
multiply the input vector data sample set with a code sequence vector data sample set to provide the resulting output vector data sample set on at least one execution unit output in the at least one input data flow path; and
the at least one merging circuit is configured to:
receive from the at least one execution unit the resulting output vector data sample set on at least one merging circuit input in the at least one input data flow path; and
provide the merged resulting output vector data sample set on at least one merging circuit output in the at least one output data flow path.

The VPE of claim 1, wherein the merging circuit is comprised of at least one adder configured to merge at least two resulting output vector data samples in the resulting output vector data sample set to provide the at least one merged resulting output vector data sample set.

The VPE of claim 4, wherein the at least one adder is comprised of a plurality of adders provided in an adder tree, each of the plurality of adders configured to provide a plurality of add-merged resulting output vector data sample sets, each having a different bit width.

The VPE of claim 1, wherein the merging circuit is comprised of at least one maximum vector data sample selector configured to maximum-merge the resulting output vector data sample having the larger vector data value between two resulting output vector data samples in the resulting output vector data sample set to provide the at least one merged resulting output vector data sample set.

The VPE of claim 6, wherein the at least one maximum vector data sample selector is comprised of a plurality of maximum vector data sample selectors, each configured to provide a plurality of maximum-merged resulting output vector data sample sets, each having a different bit width.

The VPE of claim 1, wherein the merging circuit is comprised of at least one minimum vector data sample selector configured to minimum-merge the resulting output vector data sample having the smaller vector data value between two resulting output vector data samples in the resulting output vector data sample set to provide the at least one merged resulting output vector data sample set.

The VPE of claim 8, wherein the at least one minimum vector data sample selector is comprised of a plurality of minimum vector data sample selectors, each configured to provide a plurality of minimum-merged resulting output vector data sample sets, each having a different bit width.

The VPE of claim 4, wherein the merging circuit further comprises a merge selector configured to select one of the at least one merged resulting output vector data sample set.
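
By way of a non-limiting illustration of the add, maximum, and minimum merging and the merge selector recited above, the following sketch models a pairwise merge tree in software. It is a sketch under stated assumptions rather than the claimed circuit: the names `merge_levels`, `select_merge`, and `MERGE_OPS` are hypothetical, the power-of-two sample count is an assumption of the example, and treating each tree level as one of the merged sets of differing width is an interpretation made for the example.

```python
import operator

# Hypothetical table of merge operations: add merge, max merge, min merge.
MERGE_OPS = {"add": operator.add, "max": max, "min": min}


def merge_levels(samples, op):
    """Pairwise-merge samples level by level, as a merge tree would.

    Returns one list per tree level; level k holds the results of merging
    groups of 2**(k+1) neighboring input samples.
    """
    n = len(samples)
    if n < 2 or n & (n - 1):
        raise ValueError("sample count must be a power of two, at least 2")
    levels = []
    current = list(samples)
    while len(current) > 1:
        current = [op(current[i], current[i + 1])
                   for i in range(0, len(current), 2)]
        levels.append(current)
    return levels


def select_merge(levels, level):
    """Hypothetical merge selector: pick the merged set from one tree level."""
    return levels[level]
```

With eight input samples 1 through 8, for example, the first add-merge level holds four pairwise sums and the last level holds the single total, and `select_merge` returns whichever level is chosen.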

The VPE of claim 1, wherein the code sequence vector data sample set is comprised of at least one CDMA chip code sequence.
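
As a hedged illustration of how multiplication by a chip code sequence followed by add merging can behave, the following sketch despreads a short sample vector. The function name `despread`, the `spreading_factor` parameter, and the use of +1/-1 integer chips are assumptions of the example and are not taken from the claims.

```python
def despread(samples, chips, spreading_factor):
    """Multiply samples by chip codes, then add-merge each group of chips.

    samples and chips must have equal length, and that length must be a
    multiple of spreading_factor (assumptions of this sketch).
    """
    if len(samples) != len(chips):
        raise ValueError("samples and chips must have the same length")
    if spreading_factor <= 0 or len(samples) % spreading_factor:
        raise ValueError("length must be a multiple of spreading_factor")
    # Element-wise multiply, standing in for the execution unit.
    products = [s * c for s, c in zip(samples, chips)]
    # Add merge over each group of spreading_factor products, standing in
    # for the merging circuit.
    return [sum(products[i:i + spreading_factor])
            for i in range(0, len(products), spreading_factor)]
```

In this model the merged outputs are produced directly from the products, without the intermediate products being stored and re-fetched for a separate summation pass.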

The VPE of claim 1, wherein the at least one merging circuit is configurable to be reconfigured based on a programmable merge data path configuration input to selectively merge the resulting output vector data sample set.

The VPE of claim 12, wherein the at least one merging circuit is further configured to be reconfigured based on the programmable merge data path configuration input to selectively merge the resulting output vector data sample set on each clock cycle of the VPE to be executed by the at least one execution unit.

The VPE of claim 12, wherein the at least one merging circuit is further configured to be reconfigured based on the programmable merge data path configuration input to selectively merge the resulting output vector data sample set on a next vector instruction to be executed by the at least one execution unit.

The VPE of claim 1, wherein the at least one merging circuit further comprises a plurality of latches, and the at least one merging circuit is further configured to store the at least one merged resulting output vector data sample set in the plurality of latches.

The VPE of claim 15, wherein the at least one merging circuit is further configured to store the at least one merged resulting output vector data sample set in a selected latch of the plurality of latches.

The VPE of claim 16, wherein the at least one merging circuit further comprises a plurality of selectors corresponding to the plurality of latches, and the at least one merging circuit is configured to control a selector among the plurality of selectors to store the merged resulting output vector data sample set in the selected latch of the plurality of latches.

The VPE of claim 17, wherein the at least one merging circuit is configured to store the at least one merged resulting output vector data sample set in the plurality of latches before providing the at least one merged resulting output vector data sample set in the at least one output data flow path for storage in the at least one vector data file.

The VPE of claim 1, wherein the at least one execution unit is configurable to process different bit widths of input vector data samples from the input vector data sample set based on a programmable input data flow path configuration for the at least one execution unit.

A vector processing engine (VPE) configured to in-flight merge a resulting output vector data sample set generated by at least one execution unit that performs vector processing operations, comprising:
at least one vector data file means comprising:
means for providing a fetched input vector data sample set in at least one input data flow path means for a vector processing operation; and
means for receiving at least one merged resulting output vector data sample set from at least one output data flow path means to be stored;
at least one execution unit means provided in the at least one input data flow path means and comprising:
means for receiving the input vector data sample set on the at least one input data flow path means; and
execution means for performing the vector processing operation on the input vector data sample set to provide a resulting output vector data sample set on the at least one input data flow path means; and
at least one merging circuit means comprising:
means for receiving the resulting output vector data sample set;
merging means for merging the resulting output vector data sample set with the code sequence vector data sample set to provide at least one merged resulting output vector data sample set, without the resulting output vector data sample set being stored in the at least one vector data file means; and
means for providing the at least one merged resulting output vector data sample set on the at least one output data flow path means.

A method for in-flight merging a resulting output vector data sample set generated by at least one execution unit that performs vector processing operations, comprising:
providing an input vector data sample set fetched from at least one vector data file into at least one input data flow path for a vector processing operation;
receiving the input vector data sample set on the at least one input data flow path in at least one execution unit provided in the at least one input data flow path;
performing the vector processing operation on the input vector data sample set to provide a resulting output vector data sample set on the at least one input data flow path;
merging the resulting output vector data sample set to provide at least one merged resulting output vector data sample set, without the resulting output vector data sample set being stored in the at least one vector data file; and
storing the at least one merged resulting output vector data sample set from at least one output data flow path in the at least one vector data file.

The method of claim 21, wherein merging the resulting output vector data sample set further comprises add-merging samples in the resulting output vector data sample set in at least one adder to provide the at least one merged resulting output vector data sample set.

The method of claim 22, wherein the at least one adder is comprised of a plurality of adders provided in an adder tree, each of the plurality of adders configured to provide a plurality of merged resulting output vector data sample sets, each having a different bit width.

The method of claim 23, further comprising selecting one of the plurality of resulting output vector data sample sets to provide as the at least one resulting output vector data sample set in the at least one output data flow path.

The method of claim 21, further comprising:
receiving a programmable merge data path configuration input; and
selectively merging the resulting output vector data sample set based on the programmable merge data path configuration input.

The method of claim 25, further comprising selectively merging the resulting output vector data sample set on each clock cycle of a VPE to be executed by the at least one execution unit.

The method of claim 25, further comprising selectively merging the resulting output vector data sample set on a next vector instruction to be executed by the at least one execution unit.