JP2009505486A - Multi-mode wireless broadband signal processor system and method

Info

Publication number: JP2009505486A
Application number: JP2008525972A
Authority: JP
Inventors: ジョンマイヤーズ、セオドア; ダブリュ．ボーセル、ロバート; キュー．グエン、ティエン; カヌラスシンスアン、ケネス; ウェールズプライス、フレデリック; ニールコーエン、ルイス; トーマスワーナー、ダニエル
Original assignee: NXP USA Inc
Current assignee: NXP USA Inc
Priority date: 2005-08-08
Filing date: 2005-09-08
Publication date: 2009-02-05
Also published as: WO2007018553A1

Abstract

A wireless broadband signal processing system includes a program memory, an instruction controller, and processing units. The system can also include sample buffers, single-port memories, and four-port memories. The program memory stores programmed instructions used by the instruction controller. The processing units are configured to perform vector processing, such as demodulation. One processing unit can be configured for convolution operations computed on each clock, another for FFT functions in which a radix-4 butterfly is performed on each clock, and yet another for other vector operations such as despreading, vector addition, vector subtraction, inner product, and component-by-component multiplication. The system can collect diagnostic data, operate across multiple networks, reduce baseband circuitry, and maximize multi-mode operation.

Description

The present invention relates to communication systems and methods. More particularly, it relates to multi-mode wireless broadband signal processor systems and methods.

Wireless devices need to handle ever higher data rates. To accommodate multimedia content, for example, the data rate of a wireless device may need to match the broadband rate of a wired device. Users of wireless devices increasingly demand multifunctional, multi-technology devices in order to obtain different kinds of content and services over multiple wireless networking technologies.

Many attempts have been made to implement broadband functionality in small portable devices. For example, the wireless data technology commonly known as Wi-Fi 802.11 provides high-speed performance for demanding applications such as high-quality (high-resolution) streaming video and image content. However, conventional 802.11 implementations do not meet the power consumption parameters that users find acceptable. Even with the lowest-power 802.11 implementations currently available, battery-operated devices have very limited "talk time" (the operating state in which voice, data, or video is being transferred).

Apart from devising an 802.11 implementation with acceptable power consumption, another challenge is establishing a wireless implementation that supports two or more networking modes of operation, such as 802.11, Bluetooth (registered trademark), UWB (Ultra Wideband), WiMax (802.16d, 802.16e), 802.20, and 3G and 4G cellular systems. Wireless devices need to be able to provide a variety of wireless networking technologies. The ability to operate according to multiple networking standards and technologies in a single device is called "multi-mode" capability.

Most conventional mobile devices are based on a digital signal processor (DSP), on an application-specific integrated circuit (ASIC), or on a hybrid ASIC/DSP architecture. Judged on several technical considerations, such as power efficiency, design flexibility, and cost, none of these approaches is well suited to broadband wireless. Because of architectural limitations, conventional approaches can provide high data rates only at the expense of power consumption, which results in unacceptably short battery life.

New wireless standards are introduced all the time, and conventional ASIC designs are not flexible enough to keep pace with these rapidly evolving standards. Once an integrated circuit design cycle has begun for a new standard, the changes that inevitably occur force the ASIC chip to be restarted from scratch or re-spun. To provide the multiple wireless functions that end users demand on one device, ASIC and DSP approaches support multi-mode functionality simply by stacking additional "processing circuits" in parallel, which significantly increases device volume and the burden on the manufacturer for each new mode.

There is a need for a communication system and architecture that provides multi-mode communication with broadband performance and low power consumption. There is also a need for the ability to collect advanced diagnostic data from a communication device at high frequency for observation and analysis with a logic analyzer. Further, there is a need to provide a wireless communication device that can function across multiple networks and multiple communication standards. Further still, there is a need to reduce baseband circuitry and to improve on ASIC algorithms so as to gain the benefits of extremely low power and low cost, increasing processing capability while reducing power consumption, gate count, and silicon cost.

There is also a need to control input and output in such a communication system, and to dynamically control the rate of connection to sample buffers in a multi-mode wireless processing system that handles multiple communication standards. Further, there is a need to interface with a processor in a multi-mode wireless processing system. Further, there is a need to perform a Fast Fourier Transform (FFT) in a way that minimizes power consumption. Further, there is a need for the ability to prioritize the execution of tasks, and of the operations that make up those tasks, based on task context. There is also a need to perform convolution operations in a multi-mode wireless broadband system.

One exemplary embodiment relates to a method for obtaining processor diagnostic data. The method can include receiving an instruction, enabling write access of an output communication stream to a diagnostic memory, writing to the diagnostic memory at a first frequency, and reading from the diagnostic memory at a second frequency, where the first frequency is greater than the second frequency.

Another exemplary embodiment relates to a system for obtaining processor diagnostic data. The system can include a memory that stores a plurality of instructions, a controller that receives and executes the instructions, and a diagnostic memory that receives communication data at a first frequency and outputs communication data at a second frequency, where the first frequency is greater than the second frequency.

Another exemplary embodiment relates to a method for controlling input and output in a multi-mode wireless processing system. The method can include receiving an instruction for communication in the multi-mode wireless processing system, and determining, from a field of the received instruction, whether a given processing unit generates output data or receives input data.

Another exemplary embodiment relates to an arrangement of input/output components for interfacing with a processing unit in a multi-mode wireless processing system. The arrangement includes a plurality of general purpose inputs for supplying input data to the processing unit, and a plurality of general purpose outputs for receiving output data generated by the processing unit.

Another exemplary embodiment relates to a system for controlling input and output in a multi-mode wireless processing system. The system can include a memory containing instructions for the multi-mode wireless processor system, and a controller that receives an instruction and determines, from a field of the instruction, whether a given processing unit in the multi-mode wireless processing system generates output data or receives input data.

Another exemplary embodiment relates to a method for dynamically controlling the rate of connection to sample buffers in a multi-mode processing system. The method can include receiving an instruction for communication in the multi-mode wireless processing system, and determining the rate at which a plurality of buffers are serially connected to elements external to the multi-mode wireless processing system for receiving or transmitting data.

Another exemplary embodiment relates to a system for dynamically controlling the rate of connection to sample buffers in a multi-mode processing system. The system can include a memory containing instructions for multi-mode wireless processor communication in the multi-mode wireless processing system, and a controller that receives an instruction and determines the rate at which a plurality of buffers are serially connected to elements external to the multi-mode wireless processing system for receiving or transmitting data.

Another exemplary embodiment relates to a method for interfacing two processors. The method can include generating a read/write request at a first processor in order to access a memory that is not directly accessible to the first processor, receiving the read/write request at a second processor that has direct access to the target memory, completing the read/write operation at the second processor, and receiving at the first processor an indication that the read/write operation has completed.

Another exemplary embodiment relates to a system for interfacing two processors. The system can include a first processor that generates a read/write request in order to access a memory that is not directly accessible to it, a second processor that receives the read/write request, has direct access to the target memory, and completes the read/write operation, the target memory itself, and means for communicating between the first processor and the second processor.

Another exemplary embodiment relates to an interface between two processors. The interface can include means for generating a read/write request at the first processor, means for setting a status bit by either processor, means for polling the status bit by both processors, and means for communicating additional data between the two processors.

Another exemplary embodiment relates to a method for performing a Fast Fourier Transform (FFT) in a multi-mode wireless processing system. The method can include loading an input vector into an input buffer, initializing a second counter and a variable N, executing an FFT stage, and comparing s with N and executing additional FFT stages until s = N. Here N = log2(input vector size), and s is the value of the second counter. An FFT stage can include performing a vector operation on the data in the input buffer, sending the result to an output buffer, advancing the value of the second counter, and swapping the roles of the input buffer and the output buffer. The vector operation in an FFT stage can include performing a radix-4 FFT vector operation on four input data at a time, and multiplying the resulting output vector by a twiddle factor. A method for generating the twiddle factor can include generating a control word for the twiddle factor manipulation, and determining, based on the generated twiddle address, whether the twiddle factor needs to be accessed from memory. If the twiddle factor needs to be accessed, the method for generating the twiddle factor can further include reading the twiddle factor from memory, manipulating the twiddle factor based on the control word, and storing the manipulated twiddle factor in the processing unit.

Another exemplary embodiment relates to a system for performing a Fast Fourier Transform (FFT) in a multi-mode wireless processing system. The system can include a memory for providing arithmetic functions to a processing unit, a program memory storing instructions for executing an FFT algorithm, an instruction controller for receiving and executing instructions from the program memory, and a pair of buffers that alternate as input buffer and output buffer in successive FFT stages of the FFT algorithm.

The processing unit in this exemplary embodiment can include: a radix-4 FFT engine that performs eight complex additions on four input vectors to generate four output vectors; a serial-to-parallel converter that receives the four input vectors serially from the input buffer and sends them in parallel to the radix-4 FFT engine; a parallel-to-serial converter that receives the four generated output vectors in parallel and outputs them serially to a twiddle multiplier and the output buffer; a set of registers for storing the manipulated twiddle factors in the processing unit; a twiddle octant manipulator that manipulates twiddle factors based on a control word; a master counter used as a loop variable to monitor the progress of the FFT algorithm within a given FFT stage; a second counter used as a loop variable to track the current stage of the FFT algorithm; an input address generator that generates input buffer addresses, where the input buffer address is used as the output buffer address in every FFT stage except when the final FFT stage is being executed and N is odd, and where N = log2(size of the data in the input buffer); a twiddle address generator for generating a preliminary twiddle address; a dibit (DiBit) interleave generator that generates the output buffer address in the final FFT stage when N is odd; and a twiddle address multiplier for generating the control word and the final twiddle factor address.
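The radix-4 engine described above produces four outputs from four inputs using eight complex additions. A minimal Python sketch of such a butterfly follows; the function names are hypothetical, and the particular decomposition (four additions for the intermediate sums, four for the outputs, with multiplication by -j done as a component swap) is one common way to reach the count of eight, not necessarily the arrangement used in the patent. Twiddle multiplication is omitted.

```python
def mul_neg_j(z):
    """Multiply by -j without a multiplier: (x + jy) * (-j) = y - jx."""
    return complex(z.imag, -z.real)


def radix4_butterfly(a, b, c, d):
    """Return the 4-point DFT of (a, b, c, d) using eight complex additions."""
    # First level: four additions/subtractions.
    t0 = a + c
    t1 = a - c
    t2 = b + d
    t3 = b - d
    r = mul_neg_j(t3)  # component swap and sign change, no real multiplication

    # Second level: four more additions/subtractions.
    y0 = t0 + t2
    y1 = t1 + r  # equals t1 - j*t3
    y2 = t0 - t2
    y3 = t1 - r  # equals t1 + j*t3
    return y0, y1, y2, y3
```

In a full FFT stage each of the outputs would then be multiplied by its twiddle factor before being written to the output buffer.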

Another exemplary embodiment relates to a system for obtaining processor diagnostic data. The system can include a controller that receives instructions from a program memory, and a diagnostic memory that the controller enables to receive data based on the received instruction. The diagnostic memory receives communication data at a first frequency and outputs communication data at a second frequency, where the first frequency is greater than the second frequency. The system can further include an external interface coupled to the diagnostic memory for communicating data at the second rate.

Another exemplary embodiment relates to a method for switching between instruction contexts within a time interval. The method can include performing critical task operations that complete execution within the time interval, where a critical task comprises a plurality of critical task operations; performing non-critical task operations that can cross time interval boundaries, where a non-critical task comprises a plurality of non-critical task operations; and, if the critical and non-critical task operations started in the time interval are completed before the next time interval begins, entering a sleep mode in which no critical or non-critical task operations are executed.
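The interval-based context switching described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: costs are measured in abstract clocks, critical operations are run before non-critical ones (the text does not fix an order), and a non-critical operation that does not finish simply carries its remaining cost into the next interval. The function name `run_interval` is hypothetical.

```python
from typing import List, Tuple


def run_interval(budget: int,
                 critical_ops: List[int],
                 noncritical_ops: List[int]) -> Tuple[int, List[int]]:
    """Simulate one time interval of `budget` clocks.

    critical_ops: cost of each critical operation; all must finish in this interval.
    noncritical_ops: remaining cost of each pending non-critical operation.
    Returns (clocks spent asleep, non-critical costs still pending).
    """
    used = sum(critical_ops)
    if used > budget:
        raise RuntimeError("critical operations do not fit in the interval")

    pending = list(noncritical_ops)
    while pending and used < budget:
        step = min(pending[0], budget - used)
        used += step
        if step == pending[0]:
            pending.pop(0)      # operation finished inside this interval
        else:
            pending[0] -= step  # operation crosses the interval boundary

    # Whatever remains of the interval is spent in sleep mode.
    return budget - used, pending
```

For example, with a 10-clock interval, critical costs of 3 and 2, and non-critical costs of 4 and 6, the first interval ends with 5 clocks of non-critical work outstanding and no sleep; once that work drains, later intervals sleep for the 5 clocks the critical operations leave unused.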

Another exemplary embodiment relates to a method for performing a convolution operation in a multi-mode wireless processing system. The method can include loading an initial value and a stride value into an address generator, generating an address based on the initial value and the stride value, supplying the generated address to a series of memories, loading input data into a series of registers, multiplying the contents of each register by the value stored at the generated address in the memory associated with that register, adding the resulting products, and generating an output based on the resulting sum. The number of memories equals the number of registers, and each register has an associated memory.

Another exemplary embodiment relates to a system for performing a convolution operation in a multi-mode wireless processing system. The system can include an address generator that generates addresses given an initial value and a stride value, a series of memories, a series of registers for storing input values, a series of complex multipliers, and a complex adder tree that adds the series of products to produce their sum. The numbers of complex multipliers, registers, and memories are equal, and each multiplier is associated with one register and one memory. Each multiplier produces the product of the contents of its associated register and the value stored at the generated address in its associated memory.
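The convolution datapath described above (a shared address, one memory and one register per multiplier, and an adder tree) can be sketched in Python as follows. All names are hypothetical, and wrapping the generated address to the memory depth is an assumption made so the sketch is self-contained; the text states only that addresses are generated from an initial value and a stride.

```python
from typing import Iterator, List, Sequence


def address_generator(initial: int, stride: int, depth: int) -> Iterator[int]:
    """Yield initial, initial + stride, ... (wrapping to `depth` is an assumption)."""
    addr = initial
    while True:
        yield addr % depth
        addr += stride


def adder_tree(values: Sequence[complex]) -> complex:
    """Sum the values pairwise, level by level, as a tree of adders would."""
    level = list(values)
    if not level:
        return 0j
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next level
        level = nxt
    return level[0]


def convolve_step(registers: Sequence[complex],
                  memories: Sequence[Sequence[complex]],
                  address: int) -> complex:
    """One output: each register times its own memory's word at the shared address."""
    if len(registers) != len(memories):
        raise ValueError("each register needs exactly one associated memory")
    products = [reg * mem[address] for reg, mem in zip(registers, memories)]
    return adder_tree(products)
```

Because every multiplier reads its own memory at the same generated address, one output sum can be formed per address, which is consistent with the one-convolution-result-per-clock behavior described for processing unit 16.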

FIG. 1 shows a wireless broadband signal processing system 10. The wireless broadband signal processing system 10 can include a program memory 12, an instruction controller 14, and processing units 16, 18, 20. The system 10 can also include sample buffers 22, 24, 26, single-port memories 28, 30, 32, and four-port memories 34, 36. The program memory 12 stores programmed instructions used by the instruction controller 14. The processing units 16, 18, 20 are configured to perform vector processing, such as demodulation. For example, processing unit 16 can be configured for convolution operations computed on each clock, processing unit 18 can be configured for FFT functions in which a radix-4 butterfly is performed on each clock, and processing unit 20 can be configured for other vector operations such as despreading, vector addition, vector subtraction, inner product, and component-by-component multiplication. Additional, fewer, or different processing units can be provided. In one or more exemplary embodiments, a memory 38 is provided to supply arithmetic functions to the processing units 16, 18, 20. The memory 38 can be a read-only memory (ROM).

The instruction controller 14 receives vector instructions from the program memory 12. Based on the received vector instruction, the instruction controller 14 can select the port memories used for input and output. Exemplary operation of the wireless broadband signal processing system 10 is described in U.S. Patent Application Ser. No. 10/613,476, entitled "Multi-Mode Method and Apparatus for Performing Digital Modulation and Demodulation", which is incorporated herein by reference in its entirety.

The wireless broadband signal processing system 10 further includes a diagnostic mailbox 44. The diagnostic mailbox 44 is a memory, such as a random access memory (RAM), coupled either to the outputs of the processing units (as shown) or to the inputs of the wireless broadband signal processing system 10. In either implementation, the diagnostic mailbox 44 receives communication data at a high frequency and transmits it to a logic analyzer 46 at a lower frequency. The logic analyzer 46 produces a log of the contents of the diagnostic mailbox 44. The contents of the diagnostic mailbox 44 can then be examined and studied in order to understand the operation of the wireless broadband signal processing system 10 and to carry out debugging, failure analysis, and the like.

FIG. 2 shows the use of the diagnostic mailbox 44 according to an exemplary embodiment. In operation, the instruction controller 14 receives an instruction from the program memory 12. The instruction includes a diagnostic mailbox field carrying information about the type of instruction being communicated. When the output stream is to be written to the diagnostic mailbox 44, the diagnostic mailbox field is set to logical one. The instruction controller 14 performs the time alignment needed so that write access to the diagnostic mailbox 44 is enabled during the output period of the vector instruction. Writes to the diagnostic mailbox 44 occur at the rate F_wbsp. Reads from the diagnostic mailbox 44 occur at a lower synchronous rate, F_read, which is a rate that off-chip access can support. In one exemplary embodiment, the synchronous rate F_read is 40 MHz or less, which is one-fifth to one-tenth of F_wbsp, itself 40 MHz or more. F_read ≥ N × F_wbsp, where N is the reciprocal of the clocks associated with instructions whose diagnostic mailbox field is set to one.

In an alternative embodiment, the instruction controller 14 enables write access to the diagnostic memory whenever the vector instruction received from the program memory 12 changes. This allows the diagnostic mailbox 44 to provide a continuous log of the output stream.

FIG. 3 shows a preferred embodiment in which the diagnostic mailbox is implemented with a dual-port RAM 54. Logic external to the dual-port RAM 54 (not shown) increments the read address and the write address after each access, except that an address wraps back to 0 when its value would exceed the physical size of the RAM (for example, the address sequence is N-3, N-2, N-1, 0, 1, 2, ..., where N is the number of accessible locations in the dual-port RAM 54). The dual-port RAM 54 therefore behaves like a FIFO.

When the output of an instruction associated with a diagnostic enable is generated, the write port of the dual-port RAM 54 is enabled. The read port of the dual-port RAM 54 operates at a lower frequency than the write port. When the write address A_write is greater than the read address A_read, the dual-port RAM holds valid information, which is clocked out of the read port until A_write = A_read. If A_write advances so far that information is written over data that has not yet been clocked out of the read port, an overflow indicator signalling the error condition is set and latched.

In one exemplary embodiment, mailbox support logic 53 includes instructions that assist the dual-port RAM 54 in carrying out its operations. The mailbox support logic 53 receives the write address and the read address. Based on this information, the mailbox support logic 53 can communicate an overflow indicator. As described above, the overflow indicator shows that information is being written to the dual-port RAM 54 while the diagnostic mailbox 44 is full. An empty indicator can be communicated to show that the dual-port RAM 54 is ready to receive data (the diagnostic mailbox 44 is empty). The mailbox support logic 53 communicates a read enable signal to the dual-port RAM 54 when RAM data is communicated to the logic analyzer 46 over the diagnostic stream.
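The FIFO behavior of the dual-port RAM and its support logic can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than the patented logic: the class and attribute names are hypothetical, and the model keeps running totals of words written and read (taking them modulo the depth to obtain the wrapped physical addresses) so that the full and empty conditions can be told apart.

```python
from typing import List, Optional


class DiagnosticMailbox:
    """Toy model of a dual-port RAM used as a FIFO with wrap-around addresses."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.ram: List[int] = [0] * depth
        self.writes = 0        # total words written; A_write = writes % depth
        self.reads = 0         # total words read;    A_read  = reads % depth
        self.overflow = False  # latched: stays set once an overflow has occurred

    @property
    def empty(self) -> bool:
        """Empty indicator: everything written has been clocked out."""
        return self.writes == self.reads

    def write(self, word: int) -> None:
        """Fast-side write; latches overflow if unread data is overwritten."""
        if self.writes - self.reads >= self.depth:
            self.overflow = True
        self.ram[self.writes % self.depth] = word
        self.writes += 1

    def read(self) -> Optional[int]:
        """Slow-side read toward the logic analyzer; None when empty."""
        if self.empty:
            return None
        word = self.ram[self.reads % self.depth]
        self.reads += 1
        return word
```

Calling `write` more often than `read`, as happens when the write rate exceeds what the slower read port can drain, eventually sets `overflow`, and the flag stays set until the model is re-created, mirroring the latched indicator in the text.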

FIG. 4 shows how the instruction controller 14 processes an instruction, received from the program memory 12, that includes a general purpose input/output (GPIO) instruction field. A GPIO instruction field of N bits can designate a GPI (general purpose input), a GPO (general purpose output), or, with a GPIO code of zero, neither. An N-bit field can address up to 2^N - 1 GPIs and GPOs in combination. The GPIO code can trigger the instruction controller 14 to use GPI selection logic 55 or GPO selection logic 57.
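Decoding of the GPIO field can be illustrated as a table lookup. The sketch below is hypothetical throughout: the code assignments in `EXAMPLE_TABLE`, the element names, and the function names are invented for illustration and do not come from the patent; only the zero-means-neither rule and the 2^N - 1 limit are taken from the text.

```python
from typing import Dict, Optional, Tuple

# Hypothetical code assignments: code -> (direction, element name).
EXAMPLE_TABLE: Dict[int, Tuple[str, str]] = {
    1: ("GPO", "rf_transceiver"),
    2: ("GPO", "pid_register"),
    3: ("GPI", "external_processor"),
}


def max_gpio_targets(field_bits: int) -> int:
    """An N-bit field addresses up to 2^N - 1 GPIs and GPOs (code 0 is reserved)."""
    return 2 ** field_bits - 1


def decode_gpio(code: int, field_bits: int,
                table: Dict[int, Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return (direction, element) for a GPIO code, or None for the zero code."""
    if not 0 <= code < 2 ** field_bits:
        raise ValueError("code does not fit in the GPIO field")
    if code == 0:
        return None      # neither GPI nor GPO is designated
    return table[code]   # KeyError for a code with no assigned element
```

A controller model would route a `"GPI"` result to the input selection logic and a `"GPO"` result to the output selection logic.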

General-purpose output (GPO) operations can be used to control communication to elements external to the wireless broadband signal processor (WBSP) used in the wireless broadband signal processing system 10. Examples of external elements include a processor (such as the processor known as the ARM processor, from ARM, Limited of Cambridge, UK) and an RF transceiver. In addition, GPO operations can be used to access registers related to the operation of the WBSP, such as the PID register described below. When the current instruction in the program memory 12 carries the GPIO code unique to an element, the GPO selection logic 57, which is directly connected to that element, applies an enable pulse unique to that element. The meaning of a particular enable may differ from element to element. Typically, the enable signal causes the element to latch the data on the output stream. Alternatively, the enable itself carries the meaning, so that the output stream can be sent directly to the element without latching.

General-purpose input (GPI) operations can be used to receive input from elements external to the WBSP or from registers related to the operation of the WBSP. Examples of input operations include supporting the interface between the WBSP and an external processor (such as an ARM) and recording the frame error rate. If the code asserted in the GPIO field of the instruction corresponds to a GPI, the input stream is connected to that particular element.

FIG. 5 shows the wireless broadband signal processing system 10, including the processing of instructions that have a general-purpose input/output (GPIO) instruction field. In one input or GPI operation, the sample buffer 22 communicates an input stream of communication data to one of the processing units 16, 18, 20. In another input or GPI operation, element 66 communicates an input stream of communication data to one of the processing units 16, 18, 20.

FIG. 6 shows an exemplary dynamic configuration of the processing iteration duration (PID). The PID refers to the number of samples written to the sample buffers 22, 24, 26 in receive mode (from the A/D converter) or read from the sample buffers 22, 24, 26 in transmit mode (to the DAC). An exemplary buffering technique usable in the wireless broadband signal processing system 10 is described in U.S. patent application Ser. No. 10/613,897, entitled "Buffering Method and Apparatus for Processing Digital Communication Signals," which is incorporated herein by reference in its entirety.

The PID, that is, the number of samples written to the sample buffers 22, 24, 26, determines the rate at which the buffer scheme advances. In other words, the PID is the programmed rate at which the sample buffers 22, 24, 26 are connected to receive samples. A small PID represents a low-latency situation, in which samples are available (on RX) or become available (on TX) after a short time. A larger PID permits greater processing efficiency, because it allows longer vector operations, which are inherently more efficient (the initial processing delay of an instruction is amortized over more output data).

The parameter that determines the rate at which the sample buffers 22, 24, 26 advance is accessible via a GPIO instruction. When the GPIO field of the current instruction holds the value 1, the output stream is routed to the register that controls the rate at which the sample buffers advance. The ability of the instruction controller 14 to change the PID dynamically therefore allows a real-time trade-off between short and long latency. For example, a longer PID can be used while a longer vector operation is being executed or is expected to be executed. In addition, some PIDs are inherently better suited to standards with a particular symbol rate (for example, 4 microseconds is inherently suitable for 802.11g).

FIG. 7 shows operations performed by a processor, such as an ARM processor, and by the wireless broadband signal processor used with the wireless broadband signal processing system 10, according to one or more exemplary embodiments. Depending on the particular embodiment or implementation, additional, fewer, or different operations may be performed.

In one or more exemplary embodiments, the WBSP is used as a signal processor, in which case it needs to be under the control of a main processor such as an ARM processor. The ARM processor therefore needs to be able to read from and write to the WBSP. The interface shown in FIG. 7 is entirely software-defined and is therefore very flexible. The ARM processor and the WBSP can be programmed to define an interface that supports any protocol.

A "read" request is a mechanism for communicating the contents of a particular memory location within a particular WBSP buffer to the ARM processor. A "write" request is a mechanism for communicating, from the ARM processor to the WBSP processor, a particular value to be placed at a particular memory location within a particular buffer of the WBSP processor.

"Read" requests support information that the ARM processor can access from the WBSP processor for various purposes, such as calibration, PHY statistics for a host GUI display (such as RSSI), and dynamic algorithm inputs to ARM processing. "Write" requests support the communication of information that the ARM passes to the WBSP, such as DC removal on TX (I and Q), updates of the TX output as a function of data rate, the 802.11a/b/g operating mode of the modem (allowing lower-power processing when dual acquisition is not required), and RSSI calculation activity (again allowing power-consuming processing to be disabled).

In state A1, the ARM processor initiates a read or write request. Because the processors generally operate asynchronously with respect to each other, the WBSP processor is in state W1, which comprises some general processing. Periodically, the WBSP processor transitions to state W2 and checks the WBSP_STATUS bits. These bits are accessible in the manner of a GPI instruction. If WBSP_STATUS = 0, general processing resumes in state W1. If WBSP_STATUS is non-zero, the WBSP processor transitions to state W3 and the ARM instruction is executed.

If the operation is a "read", the WBSP processor accesses the address specified by WBSP_ADDRESS. This one-dimensional address is translated into a two-dimensional WBSP address comprising a buffer number and an address within the buffer. The contents of this location are accessed, and the output stream is directed to the GPO associated with WBSP_DATA.

If the operation is a "write", the WBSP processor accesses the address specified by WBSP_ADDRESS. This one-dimensional address is translated into a two-dimensional WBSP address comprising a buffer number and an address within the buffer. The value of WBSP_DATA is accessed via the GPI mechanism. The WBSP processor routes this value to the output stream destined for the decoded buffer number and address within the buffer.

In both the "read" and the "write" case, the value of WBSP_STATUS is reset to 0. In the meantime, the ARM processor resumes general processing in state A2. Periodically, the ARM processor checks the value of WBSP_STATUS via its MMIO register ARM_WBSP_ACCESS. When this value is 0, the ARM processor knows that the "read" or "write" instruction has completed. If the operation was a read, the ARM processor can then access the value read in the WBSP_DATA register. Subsequent operations that depend on the result of the "read" may then follow (state A4), including the option of initiating another "read" or "write" instruction. Likewise, WBSP operations that depend on the result of the "write" may continue from state W3.
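
The polling handshake described above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the actual register interface: the status encodings `READ` and `WRITE`, the buffer size `BUFFER_WORDS`, and the use of `divmod` for the one-dimensional to two-dimensional address translation are all hypothetical. The text says only that a non-zero WBSP_STATUS signals a pending request and that the address is translated into a buffer number and an address within the buffer.

```python
READ, WRITE = 1, 2     # assumed encodings; the text only requires "non-zero"
BUFFER_WORDS = 256     # assumed words per WBSP buffer (hypothetical)


class SharedRegs:
    """Registers visible to both processors."""

    def __init__(self):
        self.status = 0    # WBSP_STATUS
        self.address = 0   # WBSP_ADDRESS
        self.data = 0      # WBSP_DATA


class Wbsp:
    def __init__(self, regs, n_buffers=4):
        self.regs = regs
        self.buffers = [[0] * BUFFER_WORDS for _ in range(n_buffers)]

    def poll(self):
        """State W2: check WBSP_STATUS. State W3: service the request."""
        if self.regs.status == 0:
            return False   # nothing pending, resume general processing (W1)
        # Translate the 1-D address into (buffer number, address in buffer).
        buf, offset = divmod(self.regs.address, BUFFER_WORDS)
        if self.regs.status == READ:
            self.regs.data = self.buffers[buf][offset]
        else:
            self.buffers[buf][offset] = self.regs.data
        self.regs.status = 0   # signals completion to the ARM side
        return True


def arm_request(regs, op, address, data=0):
    """State A1: post a read or write request; status is written last."""
    regs.address = address
    regs.data = data
    regs.status = op


def arm_done(regs):
    """ARM-side poll: a status of 0 means the request has completed."""
    return regs.status == 0
```

Because completion is signaled only through the status value, neither side needs an interrupt line in this model. Each processor simply polls at a time that suits its own workload.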

FIG. 8 shows operations performed by an exemplary FFT algorithm executed in the wireless broadband signal processing system 10. Depending on the particular embodiment or implementation, additional, fewer, or different operations may be performed in the algorithm. The FFT algorithm can be encoded as a software program residing in the program memory 12. In operation 82, the data to be FFT/IFFT transformed is loaded into a buffer, and the settings that govern the subsequent operations are initialized. A second counter is initialized to 2, and N is set to log2 of the length of the input vector. In operation 84, GPIO instruction No. 23 resets the master counter in the processing unit 18, and GPIO instruction No. 13 signals the FFT length (N) to the processing unit 18 (FIG. 1). As described in more detail below, the master counter is responsible for address generation.

In operation 86, the processing unit 18 performs the vector operations associated with the FFT/IFFT algorithm. In one or more embodiments, the length of a vector operated on by a vector instruction is limited to 128 words. For data lengths greater than 128 words, the FFT/IFFT algorithm must be looped a sufficient number of times (for example, with a data length of 2048 words and a maximum vector length of 128 words, 16 iterations of the FFT/IFFT algorithm are required to perform the transform). In operation 87, the master counter value is incremented only after the FFT/IFFT algorithm has operated on one 128-word segment of the data in operation 86 (unless it is explicitly reset via GPIO instruction 23).

In operation 88, the second counter is advanced by 2 and processing proceeds to the next stage of the FFT/IFFT. The input and output buffers are also swapped, which allows the processing to be cascaded between FFT/IFFT stages. In operation 89, once all stages of the FFT/IFFT processing have been executed, the FFT/IFFT-transformed data is available for further processing by the processor.
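
The loop bookkeeping of operations 82 to 89 can be illustrated with a short Python sketch. The function names are hypothetical. The 128-word limit and the 2048-word example come from the description, and the counter sequence for odd N (ending in a final radix-2 stage) follows the values the description lists for s in connection with FIG. 11.

```python
MAX_VECTOR_WORDS = 128  # upper limit on vector length stated in the text


def stage_counter_values(fft_size):
    """Values taken by the second counter s: 2, 4, ..., ending at N."""
    n = fft_size.bit_length() - 1          # N = log2(FFT size)
    if 1 << n != fft_size:
        raise ValueError("FFT size must be a power of two")
    values = list(range(2, n + 1, 2))      # radix-4 stages advance s by 2
    if n % 2:                              # odd N: one final radix-2 stage
        values.append(n)
    return values


def segments_per_pass(data_words):
    """Number of 128-word vector operations needed to cover the data once."""
    return -(-data_words // MAX_VECTOR_WORDS)   # ceiling division
```

For a 2048-word transform this gives 16 segments, matching the example in the text, and the counter takes the values 2, 4, 6, 8, 10, 11.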

Referring to FIG. 1, the memory 38 provides arithmetic functions to the processing units 16, 18, 20. In a preferred embodiment, the memory 38 is a read-only memory (ROM). ROM consumes a relatively large amount of power, so minimizing accesses to the memory 38 reduces the total power required. The FFT algorithm requires access to the arithmetic function memory 38, which holds the twiddle factors applied to the outputs of the radix-4 operations.

By reordering the segments of the input vector operated on by the FFT algorithm in a given stage, the same set of three twiddle factors can be applied to the outputs of consecutive radix-4 operations. As an example, consider a 4096-word FFT, which requires log4(4096) = 6 stages. In stage 1, three twiddle factors are read from the memory 38 for every radix-4 operation. Because the twiddle factor of the first output of a radix-4 operation is always 1, only three of the outputs have non-trivial twiddle factors. In the next stage, stage 2 of the FFT algorithm, the same set of three twiddle factors can be used for four consecutive radix-4 operations, provided the optimized address generation scheme described below is used. In stage 3, the same set of three twiddle factors can be used for 16 consecutive radix-4 operations. In stage 4, the number continues to grow geometrically, to 64 consecutive radix-4 operations.
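
The geometric growth in twiddle reuse can be tabulated with a small Python sketch. The function names are hypothetical, and extending the pattern beyond stage 4 (to 256 and 1024 for a 4096-word FFT) is an extrapolation of the "continues to grow geometrically" statement rather than something the text spells out.

```python
def twiddle_reuse_per_stage(fft_size):
    """Consecutive radix-4 operations sharing one twiddle set, per stage."""
    stages = 0
    size = fft_size
    while size > 1:
        if size % 4:
            raise ValueError("this sketch handles powers of 4 only")
        size //= 4
        stages += 1
    return [4 ** k for k in range(stages)]   # 1, 4, 16, 64, ...


def twiddle_fetches_per_stage(fft_size):
    """Number of times a set of three twiddle factors is read, per stage."""
    butterflies = fft_size // 4              # radix-4 operations per stage
    return [butterflies // reuse for reuse in twiddle_reuse_per_stage(fft_size)]
```

Under these assumptions a 4096-word FFT needs 1024 + 256 + 64 + 16 + 4 + 1 = 1365 twiddle-set reads in total, compared with 6144 if every radix-4 operation fetched its own set.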

Other design considerations make it possible to reduce the amount of twiddle factor space required in the memory 38. For example, because the twiddle factors of a larger power of 2 are a superset of those of a smaller power of 2, only the twiddle factors corresponding to the largest FFT size need to be stored. Twiddle address generation thus supports folding all FFT sizes into a single table. The address generation scheme also supports reducing the number of twiddle factors for the largest FFT size. For example, in an 8192-word FFT, adjacent twiddle factors differ by a factor of exp(j × 2 × π / 8192), which is too small to resolve in a 10-bit fixed-point representation. A reduced set of twiddle factors, with all odd values discarded, is therefore stored. By symmetry, the full unit circle of 2 × π radians can be constructed from the twiddle factors stored for π/4 radians (one octant). This symmetry reduces the storage requirement by a further factor of 8. The storage reduction is achieved by twiddle address generation combined with a twiddle octant manipulation block (shown in the processing unit 18 described with reference to FIG. 9).
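
The octant symmetry can be demonstrated with a floating-point Python sketch. This is not the fixed-point hardware scheme: the function names, the forward-transform sign convention exp(-j·2πk/n), and the particular reflect-then-rotate decomposition are assumptions chosen for clarity. The sketch only shows that a table covering one octant is enough to reproduce every twiddle factor on the unit circle.

```python
import cmath
import math


def build_octant_table(n):
    """(cos, sin) of 2*pi*m/n for m = 0 .. n/8, i.e. one octant inclusive."""
    return [(math.cos(2 * math.pi * m / n), math.sin(2 * math.pi * m / n))
            for m in range(n // 8 + 1)]


def twiddle(k, n, table):
    """Rebuild exp(-2j*pi*k/n) from the one-octant table (n a multiple of 8)."""
    quadrant, kq = divmod(k % n, n // 4)
    if kq > n // 8:
        # Reflect about 45 degrees: cosine and sine trade places.
        s, c = table[n // 4 - kq]
    else:
        c, s = table[kq]
    for _ in range(quadrant):
        c, s = -s, c          # rotate by +90 degrees per quadrant
    return complex(c, -s)     # forward-FFT convention (assumed)
```

The reflection step plays a role similar to the subtraction from 512 performed by the summer described later in this document, although the exact bit-level mapping used by the hardware is not reproduced here.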

FIG. 9 shows a more detailed view of the functions of the processing unit 18 described with reference to FIG. 1. In one or more embodiments, because the processing unit receives data sequentially from a 1-port RAM, the processing unit 18 buffers four inputs (X1, X2, X3, X4) to support the radix-4 FFT. The exception is the final radix-2 stage for FFT sizes that are not an integer power of 4. In that case only two inputs are buffered, and X2 and X4 are set to 0.

The radix-4 FFT engine operates at a reduced clock rate compared with the rest of the wireless broadband signal processing system 10. In many embodiments, the radix-4 FFT engine operates at one quarter of the system clock frequency. The exception is the final radix-2 stage for FFT sizes that are not an integer power of 4, in which case it operates at one half of the system clock frequency. The radix-4 FFT engine is optimized to perform eight complex additions to produce four outputs. It comprises two sets of cascaded adders. The first set of adders generates the following partial sums from the four complex inputs:
P1 = X1 + X3
P2 = X1 - X3
P3 = X2 + X4
P4 = X2 - X4

The second set of adders computes the following outputs from these partial sums:
Y1 = P1 + P3
Y2 = P2 - j × P4
Y3 = P1 - P3
Y4 = P2 + j × P4

Here, multiplication by j is implemented by swapping I and Q and negating the I output. In general, no truncation occurs in this operation.
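
The two adder stages can be checked numerically against a direct 4-point DFT. The Python sketch below uses hypothetical function names and floating-point complex numbers in place of the fixed-point I/Q datapath. It implements the partial sums P1 to P4 and the outputs Y1 to Y4 exactly as listed, with multiplication by j done by swapping the real and imaginary parts and negating the new real part.

```python
import cmath


def mul_j(z):
    """j * (I + jQ) = -Q + jI: swap I and Q, then negate the new I."""
    return complex(-z.imag, z.real)


def radix4_butterfly(x1, x2, x3, x4):
    # First set of adders: partial sums.
    p1 = x1 + x3
    p2 = x1 - x3
    p3 = x2 + x4
    p4 = x2 - x4
    # Second set of adders: outputs (eight complex additions in total).
    y1 = p1 + p3
    y2 = p2 - mul_j(p4)
    y3 = p1 - p3
    y4 = p2 + mul_j(p4)
    return y1, y2, y3, y4


def dft4(x):
    """Direct 4-point DFT, used only as a reference."""
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 4) for n in range(4))
            for k in range(4)]
```

Setting `x2` and `x4` to zero reduces the outputs to `x1 + x3` and `x1 - x3`, which is the radix-2 case used for the final stage of FFT sizes that are not an integer power of 4.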

The output of each scalar twiddle factor multiplication is truncated to 11 bits. The output of the complex multiplier is therefore 12 bits. Bits [10:1] are mapped to the output of the processing unit 18. To reduce the rate at which twiddle factors are accessed, there are three storage registers 92 for holding the twiddle factors that are not 1. As described further below with reference to FIGS. 10 to 13, the storage registers 92 are updated only when the twiddle address leaving the mapping block of the twiddle address generator changes. This transition is signaled to the storage registers 92 by the twiddle address transition indicator generated in operation 106, as explained in more detail below. The multiplier 94 supports bypassing every fourth multiplication, where the twiddle factor applied is 1. Based on a 3-bit control word from the multiplier 110, shown in FIG. 10 and described below, the accessed twiddle factor is manipulated by the twiddle octant manipulator 90 as follows. The twiddle factor is subjected to the cascaded effect of three operations:
If either one of bit 1 or bit 2 is 1: swap I and Q of the twiddle factor and negate the signs of the real and imaginary parts.
If bit 2 is 1: negate the sign of the real part of the twiddle factor.
If bit 3 is 1: negate the signs of both the real and imaginary parts of the twiddle factor.

FIG. 10 shows operations performed in address generation for the FFT algorithm described with reference to FIG. 9. Depending on the particular embodiment or implementation, additional, fewer, or different operations may be performed. In operation 104, the master counter information supplied by operation 102 is mapped by the input address generator to produce an input address. FIG. 11 shows an exemplary mapping of the master counter information. As shown, the input address is populated according to N, which reflects the size of the input vector to be transformed by the FFT algorithm. In the exemplary mapping of FIG. 11, the input address is 13 bits long. The high-order 13 - N bits are set to 0, where N = log2(FFT size). The next-highest bits are s bits of the master counter, where s = 2, 4, ..., N - 2, N (for even N) or s = 2, 4, ..., N - 1, N (for odd N). The low-order bits of the input address are N - s bits of the master counter. Referring again to FIG. 10, once the input address has been generated by operation 104, the input buffer receives it. Except in the final stage described below, the output buffer also receives the input address.

In operation 106, the twiddle factor address is generated. FIG. 12 shows an exemplary mapping for the twiddle address. This exemplary mapping involves reshuffling the input address generated in operation 104. The twiddle address has 11 bits. Its high-order bits are input address bits (N - s) through 1. The remaining low-order bits of the twiddle address, 11 - (N - s) in number, are set to 0.

To determine whether new twiddle factors are needed, and to save power, a transition decision is made so as to limit the number of accesses to the memory 38 (for example, a ROM). A twiddle address transition indicator is generated by operation 106 to indicate that the twiddle address has changed and that new twiddle factors are required. The twiddle address transition indicator is sent to the storage registers 92 in the processing unit 18 and to the arithmetic function memory 38. When the memory 38 is accessed, three twiddle factors are read out, manipulated as described above, and stored in the storage registers 92.

The following describes how the storage registers 92 are populated with twiddle factors and how the twiddle factors are used. In this process, the multiplier 110 multiplies the two low-order bits (LSBs) of the master counter by the twiddle address. The product of this multiplication (13 bits in this exemplary embodiment) is split into several parts. Ten of the bits are provided as inputs to a summer 112 and a multiplexer 114. The summer 112 subtracts these 10 bits from 512 and provides the result to input 1 of the multiplexer 114. The other input of the multiplexer 114 (input 0) receives the 10 bits directly from the result of the multiplier 110. One of the remaining bits of the product is used as the select for the multiplexer 114, and the three high-order bits of the product are provided to the twiddle octant manipulator 90 of the processing unit 18 as the control word referred to above. The output of the multiplexer 114 is the address sent to the arithmetic function memory 38 to read out the twiddle factors.

If the length of the input vector to be transformed is an odd power of 2 (that is, not an integer power of 4), the output buffer receives an interleaved version of the input address, formed in operation 108. As shown in FIG. 13, the interleaved input address depends on the value of N, where N denotes log2(FFT size) as above. In accordance with the input address layout shown in FIG. 13, the 13-bit address provided to the output buffer contains 0 in its first 13 - N bits. By design, the processing shown in FIGS. 10 to 13 saves power by limiting accesses to the memory 38 that stores the twiddle factors.

FIG. 14 shows operations performed by a context switching process executed in the wireless broadband signal processing system 10. Depending on the embodiment or implementation, additional, fewer, or different operations may be performed. In operation 142, the operations of critical task 1 are executed. A critical task is one or more operations that must each be completed before a new processing iteration duration (PID) begins. For example, critical task 1 can comprise a number of 802.11 operations that are executed when a PID instruction is received and that each complete before a new PID is received. Once the operations of critical task 1 are complete, the operations of critical task 2 can be executed in operation 144. For example, critical task 2 can be the operations involved in copying DVB samples to an intermediate buffer. If the operations of critical task 2 complete before non-critical task 3 has completed, a program-induced context switch is performed, and the non-critical task operations are executed in operation 146. Non-critical operations may extend beyond a PID boundary. Such a non-critical task 3 can be DVB demodulation. When a PID instruction is received, the induced context switch ends. If the non-critical task is already complete when critical task 2 completes, sleep mode is entered until the end of the PID.

The conventional definition of a context is the set of information from which a task can be resumed where it previously left off. During a context switch, the context of the "current" task is stored and the context of the "next" task is loaded. The "current" task is returned to at some point in the future by loading its previously stored context back. The state of the WBSP is defined by a set of processor registers. In the example shown, the processor register is the instruction pointer, although several additional processor registers may also be present. The WBSP comprises one set of memory elements (for example, hardware registers) for the complete description of each context. The number of sets of memory elements determines the maximum number of simultaneous contexts. In the WBSP, a context switch occurs when the information stored in the set of memory elements for a given context is loaded as the set of processor registers. In the WBSP, the entire set of memory elements is loaded into the processor registers in a single clock. From this point, the WBSP continues its normal, steady execution of instructions.
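
A minimal Python model of this register-set scheme is sketched below. The class name `ContextSwitcher` and the dictionary layout are hypothetical, and the model ignores timing entirely: in the WBSP the load is described as taking a single clock, which plain Python cannot represent.

```python
class ContextSwitcher:
    """Toy model: one stored register set per context, one live register set."""

    def __init__(self, n_contexts):
        # One set of memory elements per context; here only the instruction
        # pointer ("ip"), as in the example shown in the description.
        self.saved = [{"ip": 0} for _ in range(n_contexts)]
        self.current = 0
        self.regs = dict(self.saved[0])   # live processor registers

    def step(self, n=1):
        """Execute n instructions of the current task."""
        self.regs["ip"] += n

    def switch_to(self, ctx):
        self.saved[self.current] = dict(self.regs)   # store "current" context
        self.regs = dict(self.saved[ctx])            # load "next" context
        self.current = ctx
```

The number of entries in `saved` plays the role of the number of sets of memory elements, so it bounds how many tasks can be suspended and resumed at once.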

FIG. 15 shows the timing of the context switching process described with reference to FIG. 14. At PID 1, the operations of critical task 1 begin. The operations of critical task 1 complete before the start of PID 2, which allows the operations of critical task 2 and of non-critical task 3 to be executed. When PID 2 is received, non-critical task 3 is halted (although it has not yet completed) and the operations of critical task 1 are executed. Processing continues in this way, with each received PID triggering execution of the critical task operations. The critical task operations are executed in order, and if no new PID has yet been received, non-critical task operations can be executed. The critical task operations therefore complete within the PID, while otherwise idle periods are used to execute non-critical tasks.
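
The timing pattern can be reproduced with a simple scheduling sketch in Python. The function name, the tick-based cost model, and the example numbers are assumptions made for illustration. The only behavior taken from the text is the ordering: critical tasks run first in each PID, the non-critical task fills the remaining time and may span PID boundaries, and the processor sleeps when nothing is left to do.

```python
def schedule(pid_len, critical_costs, noncritical_cost, n_pids):
    """Return a timeline of (task, start, end) tuples in clock ticks."""
    timeline = []
    remaining = noncritical_cost           # non-critical work still to do
    for p in range(n_pids):
        t = p * pid_len
        end_of_pid = t + pid_len
        # Critical tasks run in order and must finish within the PID.
        for i, cost in enumerate(critical_costs, start=1):
            if t + cost > end_of_pid:
                raise ValueError("critical tasks must finish within the PID")
            timeline.append((f"critical {i}", t, t + cost))
            t += cost
        # Induced context switch: use the idle time for non-critical work.
        run = min(remaining, end_of_pid - t)
        if run > 0:
            timeline.append(("non-critical", t, t + run))
            remaining -= run
            t += run
        # Nothing left to do: sleep until the end of the PID.
        if t < end_of_pid:
            timeline.append(("sleep", t, end_of_pid))
    return timeline
```

For example, with a PID of 10 ticks, critical tasks costing 3 and 2 ticks, and 8 ticks of non-critical work, the non-critical task runs for 5 ticks in the first PID, is halted at the boundary, and finishes 3 ticks into the idle time of the second PID.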

FIG. 16 shows a processing unit in the wireless broadband signal processing system 10. This processing unit can perform convolution operations (FIR filtering) and tap loading. An initial value and a step value are provided to address generation logic 202, which generates the addresses supplied to ROM1, ROM2, ROM3, ROM4, ROM5, ROM6, ROM7, and ROM8. The processing unit receives input data at input shifter 204. Input shifter 204 performs tap loading and loads the received data into registers 206, 208, and 212. The registers can be flip-flop structures.

Complex multiplication operations are performed on the communicated data and on the data loaded from the ROM structures at the locations corresponding to the addresses generated by address generation logic 202. The products of these complex multiplications are summed by complex adder tree 216. Combined shifter 218, which supplies a combined stream to complex adder tree 216, makes it possible to perform more multiplications than the eight available in parallel; a convolution is thus built up by accumulating a plurality of taps. Including the combined stream input in complex adder tree 216 also allows dynamic range control. Output shifter 220 shifts the data from complex adder tree 216 to form the output data stream of the processing unit.
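The way the combined stream extends the eight parallel multipliers to longer filters can be sketched as follows. This is a simplified model under stated assumptions, not the hardware itself: the function names are hypothetical, Python complex numbers replace fixed-point values, and the input, combined, and output shifters (and therefore dynamic range control) are left out.

```python
def multiply_accumulate(coeffs, samples, combine=0j):
    """Model one pass: up to 8 complex products summed with a combine input."""
    if len(coeffs) != len(samples) or len(coeffs) > 8:
        raise ValueError("expected matching lists of at most 8 values")
    # The adder tree sums the parallel products plus the combined stream.
    return combine + sum(c * x for c, x in zip(coeffs, samples))


def fir_output(taps, window):
    """Compute one filter output for any number of taps.

    Taps are processed in groups of 8; each partial sum is fed back as the
    combine input of the next group, so the result accumulates across groups.
    """
    if len(taps) != len(window):
        raise ValueError("taps and window must have the same length")
    acc = 0j
    for i in range(0, len(taps), 8):
        acc = multiply_accumulate(taps[i:i + 8], window[i:i + 8], combine=acc)
    return acc
```

A 10-tap filter, for example, takes two passes in this model: the first covers taps 0 to 7 and the second adds taps 8 and 9 to the partial sum.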

FIG. 17 shows address generation logic 202 of the processing unit of FIG. 16 in more detail. Address generation logic 202 receives an initialized address via a GPIO instruction. This initialized address becomes the current address. The addresses communicated to the ROM memory structures (FIG. 16) are the current address (A0), the current address plus the step value, the current address plus two times the step value, and so on. As data is read from the ROM structures, the current address is incremented by the step value. The address is therefore incremented automatically, without the need to reload the "top", that is, the value for which the sum of the communicated data is computed.
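A short sketch of this address sequence is given below. It is an illustration rather than the circuit of FIG. 17: the class and method names are hypothetical, and wrapping addresses to 8 bits is an assumption based on the stated ROM address range of 0 to 255.

```python
class AddressGenerator:
    """Produces A0, A0 + step, A0 + 2*step, ... for a bank of ROMs."""

    def __init__(self, initial, step, num_roms=8, address_bits=8):
        self.current = initial          # the initialized (current) address A0
        self.step = step                # the step value
        self.num_roms = num_roms
        self.mask = (1 << address_bits) - 1  # assumed wrap to 0..255

    def next_addresses(self):
        """Return one address per ROM, then advance the current address."""
        addrs = [(self.current + n * self.step) & self.mask
                 for n in range(self.num_roms)]
        # Auto-increment: no reload is needed before the next read.
        self.current = (self.current + self.step) & self.mask
        return addrs
```

Each call returns the eight addresses for one read and leaves the generator positioned one step further along.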

The contents of ROM1, ROM2, ROM3, ROM4, ROM5, ROM6, ROM7, and ROM8 of FIG. 16 can be determined using the following equation:

Here, R is the content of the n-th ROM at address A, and A is an address taking the values 0 to 255.

FIG. 1 is a diagram of a wireless broadband signal processing system according to an exemplary embodiment.
FIG. 2 is a diagram illustrating the use of a diagnostic mailbox of the wireless broadband signal processing system of FIG. 1 according to an exemplary embodiment.
FIG. 3 is a diagram illustrating a mailbox diagnostic function implemented with a two-port RAM according to an exemplary embodiment.
FIG. 4 is a diagram of the processing, by the wireless broadband signal processing system of FIG. 1, of an instruction including a general-purpose input/output (GPIO) instruction field according to an exemplary embodiment.
FIG. 5 is a diagram of the wireless broadband signal processing system of FIG. 1 illustrating the operation of the general-purpose inputs and outputs.
FIG. 6 is a diagram of the wireless broadband signal processing system of FIG. 1 illustrating dynamic configuration of the processing iteration time.
FIG. 7 is a diagram illustrating operations performed by an ARM processor and a wireless broadband signal processor (WBSP) used in the wireless broadband signal processing system of FIG. 1 according to an exemplary embodiment.
FIG. 8 is a diagram illustrating an FFT operation performed in the wireless broadband signal processing system of FIG. 1 according to an exemplary embodiment.
FIG. 9 is a diagram illustrating the functions of a processor that executes the FFT algorithm in the wireless broadband signal processing system of FIG. 1.
FIG. 10 is a diagram illustrating operations performed in the address generation process of the FFT algorithm of FIG. 9.
FIG. 11 is a diagram illustrating an exemplary input address mapping according to an exemplary embodiment.
FIG. 12 is a diagram illustrating an exemplary rotation address mapping according to an exemplary embodiment.
FIG. 13 is a diagram illustrating the interleave mapping in final stage processing according to an exemplary embodiment.
FIG. 14 is a diagram illustrating a context switching operation according to an exemplary embodiment.
FIG. 15 is a timing diagram of the context switching operation of FIG. 14.
FIG. 16 is a diagram illustrating a processing unit in the wireless broadband signal processing system of FIG. 1.
FIG. 17 is a diagram illustrating the address arithmetic logic of the processing unit of FIG. 16.

Claims

A method for obtaining processor diagnostic data, comprising:
Receiving an instruction;
Selectively enabling write access of an output stream to a diagnostic memory;
Writing to the diagnostic memory at a first frequency; and
Reading from the diagnostic memory at a second frequency,
Wherein the first frequency is greater than the second frequency.

The method of claim 1 including communicating the contents of the diagnostic memory to a logic analyzer.

The method of claim 1, wherein the diagnostic memory receives communication data from an external source.

The method of claim 1, wherein the diagnostic memory receives communication data from the processing device.

The method of claim 1, wherein write access of the output stream to the diagnostic memory is enabled when received instructions change.

The method of claim 1, wherein the first frequency is 40 MHz or higher.

The method of claim 1, wherein the second frequency is 40 MHz or less.

The method of claim 1, wherein the received instruction includes a diagnostic mailbox field.

The method of claim 8, wherein if the diagnostic mailbox field of the received instruction is set to 1, the output stream of the received instruction is written to the diagnostic memory.

The method of claim 9, wherein the first frequency and the second frequency are selected such that the second frequency is less than or equal to the frequency obtained by multiplying the first frequency by the inverse of the clock associated with the instruction whose diagnostic mailbox field is set to 1.

The method of claim 1, wherein the diagnostic memory is a random access memory (RAM) having one or more read ports and one or more write ports.

The method of claim 11, wherein the random access memory (RAM) is a two-port RAM having one write port and one read port.

The method of claim 11, wherein the read and write addresses applied to the diagnostic memory are automatically incremented after each read or write access to the diagnostic memory until either address matches the maximum RAM address, at which the read and write addresses return to 0.

The method of claim 13, comprising communicating an overflow indication when the diagnostic memory is full of unread data and the received instruction indicates that the output stream is to be written to the diagnostic memory.

The method of claim 13, comprising communicating an empty indication when all data stored in the diagnostic memory has been read.

A system for obtaining processor diagnostic data, comprising:
A memory storing a plurality of instructions;
A controller for receiving and executing the plurality of instructions, including selectively enabling write access of an output stream to a diagnostic memory; and
The diagnostic memory, which receives the output stream at a first frequency and outputs its contents at a second frequency, wherein the first frequency is greater than the second frequency.

The system of claim 16, comprising a logic analyzer that receives the contents of the diagnostic memory.

The system of claim 16, wherein the diagnostic memory receives communication data from an external source.

The system of claim 16, wherein the diagnostic memory receives communication data from the processing device.

The system of claim 16, wherein the controller enables write access to the diagnostic memory when received instructions change.

The system of claim 16, wherein the first frequency is 40 MHz or higher.

The system of claim 16, wherein the second frequency is 40 MHz or less.

The system of claim 16, wherein the received instruction includes a diagnostic mailbox field.

The system of claim 23, wherein the output stream of the received instruction is written to the diagnostic memory if the diagnostic mailbox field of the received instruction is set to 1.

The system of claim 24, wherein the first frequency and the second frequency are selected such that the second frequency is less than or equal to the frequency obtained by multiplying the first frequency by the inverse of the clock associated with the instruction whose diagnostic mailbox field is set to 1.

The system of claim 16, wherein the diagnostic memory is a random access memory (RAM) having one or more read ports and one or more write ports.

The system of claim 26, wherein the random access memory (RAM) is a two-port RAM having one write port and one read port.

The system of claim 26, wherein the read and write addresses applied to the diagnostic memory are automatically incremented after each read or write access to the diagnostic memory until either address matches the maximum RAM address, at which the read and write addresses return to 0.

The system of claim 28, wherein an overflow indication is communicated when the diagnostic memory is full of unread data and the received instruction indicates that the output stream is to be written to the diagnostic memory.

The system of claim 28, wherein an empty indication is communicated when all data stored in the diagnostic memory has been read.

A method for controlling input and output in a multi-mode wireless processing system, comprising:
Receiving an instruction for controlling an interface between the multi-mode wireless processing system and an element external to the multi-mode wireless processing system; and
Determining, from a field of the received instruction, whether a predetermined processing unit generates output data or receives input data.

The method of claim 31, wherein the received instruction directs recording of a frame rate error.

The method of claim 31, wherein the received instruction directs management of a sample buffer within the multi-mode wireless processing system.

The method of claim 31, comprising determining, from a field of the received instruction, a general-purpose input as the source of the input data.

The method of claim 34, wherein the received instruction indicates a communication rate between the predetermined processing unit and the general-purpose input.

The method of claim 34, comprising routing the input data from the source to the predetermined processing unit.

The method of claim 31, comprising determining, from a field of the received instruction, a general-purpose output as the destination of the generated output data.

The method of claim 37, wherein the received instruction indicates a communication rate between the predetermined processing unit and the general-purpose output.

The method of claim 37, comprising routing the generated output data from the predetermined processing unit to the destination.

A configuration of input/output components for interfacing with a processing unit in a multi-mode wireless processing system, comprising:
A plurality of general-purpose inputs for supplying input data to the processing unit in the multi-mode wireless processing system; and
A plurality of general-purpose outputs for receiving output data generated by the processing unit in the multi-mode wireless processing system.

A system for controlling input and output in a multi-mode wireless processing system, comprising:
A memory including instructions for controlling an interface between the multi-mode wireless processing system and an element external to the multi-mode wireless processing system; and
A controller that receives an instruction and determines, from a field of the instruction, whether a predetermined processing unit in the multi-mode wireless processing system generates output data or receives input data.

The system of claim 41, wherein the received instruction directs recording of a frame rate error.

The system of claim 41, wherein the received instruction directs management of a sample buffer within the multi-mode wireless processing system.

The system of claim 41, wherein the controller determines, from a field of the instruction, a general-purpose input as the source of the input data.

The system of claim 44, wherein the controller routes the input data from the source to the predetermined processing unit.

The system of claim 44, wherein the received instruction indicates a communication rate between the predetermined processing unit and the general-purpose input.

The system of claim 41, wherein the controller determines, from a field of the instruction, a general-purpose output as the destination of the generated output data.

The system of claim 47, wherein the controller routes the output data from the predetermined processing unit to the destination.

The system of claim 47, wherein the received instruction indicates a communication rate between the predetermined processing unit and the general-purpose output.

A method for dynamically controlling a connection rate to a sample buffer in a multi-mode processing system, comprising:
Receiving an instruction for communication in a multi-mode wireless processing system;
Determining a rate at which a plurality of buffers are sequentially connected to an element external to the multi-mode wireless processing system for receiving or transmitting data; and
Programming, based on the received instruction, a plurality of registers for controlling the rate at which the plurality of buffers are sequentially connected to the external element.

The method of claim 50, wherein one register controls the rate at which the plurality of buffers are sequentially connected to the external element.

The method of claim 50, wherein the rate at which the plurality of buffers are sequentially connected to the external element varies dynamically based on the received instruction.

The method of claim 50, wherein a field of the received instruction determines whether the rate at which the plurality of buffers are sequentially connected to the external element changes.

A system for dynamically controlling a connection rate to a sample buffer in a multi-mode processing system, comprising:
A memory containing instructions for multi-mode wireless processor communication in a multi-mode wireless processing system;
A controller that receives an instruction and determines a rate at which a plurality of buffers are sequentially connected to an element external to the multi-mode wireless processing system for receiving or transmitting data; and
A plurality of registers for controlling the rate at which the plurality of buffers are sequentially connected to the external element.

The system of claim 54, wherein the controller dynamically changes the rate at which the plurality of buffers are sequentially connected to the external element based on the received instruction.

The system of claim 54, wherein one register controls the rate at which the plurality of buffers are sequentially connected to the external element.

The system of claim 54, wherein the controller determines, based on a field of the received instruction, whether the plurality of registers are programmed.

A method for interfacing two processors, comprising:
Generating a read/write request at a first processor, the read/write request being directed to a target memory to which the first processor does not have direct access;
Receiving the read/write request at a second processor, the second processor having direct access to the target memory accessed by the read/write request;
Completing the read/write operation at the second processor; and
Receiving at the first processor an indication that the read/write operation is complete.

The method of claim 58, comprising continuing operation at the first processor after receiving read data from the second processor, wherein the continued operation is associated with a read/write request that is a read operation.

The method of claim 58, comprising continuing operation at the second processor after completion of the write operation by the second processor, wherein the continued operation is associated with a read/write request that is a write operation.

The method of claim 58, comprising:
Generating a read/write address including a target buffer number and a target address in the target buffer, the target memory being a target buffer that is part of the second processor; and
Receiving the read/write address at the second processor.

The method of claim 58, comprising receiving write data at the second processor if the read/write request is a write request.

The method of claim 58, comprising polling at the second processor for a read/write request of the first processor.

The method of claim 63, wherein the polling at the second processor is performed by periodically monitoring a status bit.

The method of claim 64, wherein a read/write request is indicated by the status bit being set to a non-zero value.

The method of claim 64, wherein the status bit is cleared upon completion of the read/write operation by the second processor.

The method of claim 58, comprising polling at the first processor for an indication that the read/write operation is complete.

The method of claim 67, wherein the polling at the first processor is performed by periodically monitoring a status bit.

A system for interfacing two processors, comprising:
A first processor that generates a read/write request directed to a target memory to which the first processor does not have direct access;
A second processor that receives the read/write request and completes the read/write operation, the second processor having direct access to the target memory accessed by the read/write request;
The target memory; and
Means for communicating between the first processor and the second processor.

The system of claim 69, wherein the second processor is a multi-mode wireless processor.

The system of claim 69, wherein the first processor is an ARM processor.

The system of claim 69, wherein the target memory is part of the second processor.

The system of claim 69, wherein the second processor receives write data if the read/write request is a write request.

The system of claim 69, wherein, after receiving read data from the second processor, the first processor performs an operation affected by the read/write request, the read/write request being a read operation.

The system of claim 69, wherein, after the second processor completes the write operation, the second processor performs an operation affected by the read/write request, the read/write request being a write operation.

The system of claim 69, wherein the target memory is a buffer.

The system of claim 76, wherein the first processor generates a read/write address including a target buffer number and a target address in the target buffer, the target memory is a target buffer that is part of the second processor, and the second processor receives the generated read/write address.

The system of claim 69, wherein the second processor polls for read/write requests of the first processor.

The system of claim 78, wherein the polling by the second processor is performed by periodically monitoring a status bit.

The system of claim 79, wherein a read/write request is indicated by the status bit being set to a non-zero value.

The system of claim 69, wherein the first processor polls for an indication that the read/write operation is complete.

The system of claim 81, wherein the polling by the first processor is performed by periodically monitoring a status bit.

The system of claim 82, wherein the status bit is cleared upon completion of the read/write operation by the second processor.

An interface between two processors, comprising:
Means for generating a read/write request at a first processor;
Means for setting a status bit by either the first processor or a second processor;
Means for polling the status bit by the first processor;
Means for polling the status bit by the second processor; and
Means for communicating additional data between the first processor and the second processor.

The interface of claim 84, wherein the second processor periodically polls the status bit.

The interface of claim 84, wherein the first processor sets a status bit to 0 to indicate a read/write request.

The interface of claim 84, wherein the first processor provides write data to the second processor when the read/write request is for a write operation.

The interface of claim 84, wherein the second processor clears the status bit when the requested read/write operation is complete.

The interface of claim 84, wherein the second processor transmits write data to the first processor when the second processor completes the write operation.

The interface of claim 84, wherein the first processor periodically polls the status bit.

The interface of claim 84, wherein the first processor provides an address to the second processor as part of the read/write request.

The interface of claim 91, wherein the address includes a target buffer number and a target address in the target buffer.

A fast Fourier transform (FFT) method in a multimode wireless processing system, comprising:
Loading an input vector into an input buffer;
Initializing a second counter and a variable N, where N = log₂(input vector size) and s is the value of the second counter; and
Executing an FFT stage, the FFT stage comprising:
Performing a vector operation on the data in the input buffer and sending the result to an output buffer, the data in the input buffer comprising a plurality of segments;
Advancing the value of the second counter;
Switching the roles of the input buffer and the output buffer; and
Comparing s with N and performing additional FFT stages until s = N.

The method of claim 93, wherein, if N is odd, the second counter is initialized to 2, advanced by 2 in the FFT stage, and set to N in the final FFT stage.

The method of claim 93, wherein the vector operation operates on one segment of the data in the input buffer at a time until all segments have been calculated.

The method of claim 93, wherein the vector operation comprises:
Loading four input data from the input buffer into the processing unit;
Performing a radix-4 FFT vector operation on the four input data loaded into the processing unit using a radix-4 FFT engine, the radix-4 FFT engine receiving four input vectors and generating four output vectors;
Multiplying the four generated output vectors by twiddle factors, each output vector having an associated twiddle factor, and each twiddle factor having a real component and an imaginary component; and
Avoiding the multiplication of an output vector when the associated twiddle factor is one.

The method of claim 96, comprising avoiding the multiplication of the first output vector.

The method of claim 96, wherein, if N is odd and the final FFT stage is being executed, two input data are loaded from the input buffer into the processing unit and used as the first and third radix-4 FFT engine input vectors, and the second and fourth radix-4 FFT engine input vectors are set to zero.

The method of claim 96, wherein the four input data loaded into the processing unit are received sequentially by the processing unit and provided in parallel to the radix-4 FFT engine, and the four output vectors of the radix-4 FFT engine are received in parallel from the radix-4 FFT engine and written sequentially to the output buffer.

The method of claim 96, wherein the processing unit operates at the multimode wireless processing system clock frequency reduced by a factor of four, except when the final FFT stage is being performed and N is odd, in which case the processing unit operates at the multimode wireless processing system clock frequency reduced by a factor of two.

The method of claim 100, wherein a master counter is used as a loop variable that is initialized, advanced, and compared with the length of the data in the input buffer to determine when all segments of the data in the input buffer have been calculated.

The method of claim 101, wherein the input buffer address is generated such that bits N to (s+1) of the master counter are mapped to bits (N−s) to 1 of the input buffer address, bits s to 1 of the master counter are mapped to bits N to (N−s+1) of the input buffer address, and the remaining high-order bits of the input buffer address are set to 0, bit 1 being the least significant bit of the input buffer address and of the master counter.

The method of claim 102, wherein the input address is 13 bits.

The method of claim 102, wherein, in all FFT stages except the final FFT stage, the output buffer address is equal to the input buffer address, and in the final FFT stage, bits 13 to 13−N of the output buffer address are set to 0 and, when N is even, bits N to 1 of the output buffer address follow a first mapping sequence I₂, I₁, I₄, I₃, ..., I_N, I_(N−1), and, when N is odd, bits N to 1 of the output buffer address follow a second mapping sequence I₁, I₃, I₂, I₅, I₄, ..., I_N, I_(N−1), where I is the input buffer address and bit 1 is the least significant bit of the input buffer address and of the output buffer address.

The method of claim 96, wherein the twiddle factor is generated by:
Generating a preliminary rotation address;
Generating a control word for controlling manipulation of the twiddle factor;
Generating a final rotation address;
Determining, based on the preliminary rotation address, whether the twiddle factor needs to be accessed from memory; and
If access to the twiddle factor is needed:
Reading the twiddle factor from memory at the final rotation address;
Manipulating the twiddle factor based on the control word; and
Storing the manipulated twiddle factor in the processing unit.

The method of claim 105, wherein the manipulated twiddle factor stored in the processing unit is stored in a register.

The method of claim 105, wherein the preliminary rotation address is generated such that the (N−s) high-order bits of the preliminary rotation address are mapped from bits (N−s) to 1 of the input buffer address and the remaining low-order bits of the preliminary rotation address are set to 0, bit 1 being the least significant bit of the input buffer address.

The method of claim 105, wherein the preliminary rotation address and the final rotation address are 11 bits.

The method of claim 105, wherein the control word is the three high-order bits of the product of the preliminary rotation address and the two low-order bits of the master counter.

The twiddle factor is manipulated according to the control word bits as follows:
First, when the exclusive OR of bit 1 of the control word and bit 2 of the control word is 1, the real and imaginary components of the twiddle factor are exchanged and the signs of the real and imaginary components of the twiddle factor are inverted;
Second, when bit 2 of the control word is 1, the sign of the real component of the twiddle factor is inverted; and
Third, when bit 3 of the control word is 1, the signs of the real and imaginary components of the twiddle factor are inverted.

The method of claim 105, wherein the final rotation address is generated by:
Multiplying the preliminary rotation address by the two low-order bits of the master counter to produce a product;
Subtracting bits 9-0 of the product from 512 to produce a remainder, bit 0 being the least significant bit of the product; and
Sending the remainder to the first input of a 2:1 multiplexer, bits 9-0 of the product to the second input of the 2:1 multiplexer, and bit 10 of the product to the select input of the 2:1 multiplexer, the final rotation address being the output of the 2:1 multiplexer.

A system for performing a fast Fourier transform (FFT) in a multimode wireless processing system, comprising:
A processing unit for performing vector operations;
A memory for providing arithmetic functions to the processing unit;
A program memory storing instructions for executing an FFT algorithm;
An instruction controller for receiving and executing the instructions from the program memory; and
A pair of buffers that alternately function as input buffer and output buffer in successive FFT stages of the FFT algorithm, the data in the input buffer comprising a plurality of segments.

The system of claim 112, wherein the memory that provides the arithmetic functions stores twiddle factors.

The system of claim 112, wherein the memory that provides the arithmetic functions is a ROM.

The system of claim 112, wherein the processing unit comprises:
A radix-4 FFT engine that performs 8 complex additions on 4 input vectors and generates 4 output vectors;
A rotation multiplier for multiplying each generated output vector by an associated twiddle factor, the twiddle factor having a real component and an imaginary component;
A serial-to-parallel converter for sequentially receiving four input vectors from the input buffer and transmitting the four input vectors to the radix-4 FFT engine in parallel;
A parallel-to-serial converter for receiving the four generated output vectors in parallel and sequentially outputting the four output vectors to the rotation multiplier and the output buffer;
A set of registers for storing the manipulated twiddle factor in the processing unit;
A rotating octant controller that manipulates twiddle factors based on a control word;
A master counter used as a loop variable to monitor the progress of the FFT algorithm within a given FFT stage;
A second counter used as a loop variable to track the current stage of the FFT algorithm, where s is the value of the second counter;
An input address generator for generating an input buffer address, wherein the input buffer address is also used as the output buffer address in all FFT stages except when the final FFT stage is being executed and N is odd, where N = log₂(size of the data in the input buffer);
A rotation address generator for generating a preliminary rotation address;
A dibit interleaving generator that generates the output buffer address in the final FFT stage if N is odd;
A rotation address multiplier for generating the control word;
A summer that subtracts 512 bits of the product generated by the rotation address multiplier from 512 to generate a remainder; and
A 2:1 multiplexer for generating a final rotation address from the remainder and the product generated by the rotation address multiplier.
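As an illustration of the radix-4 FFT engine recited above, the following Python sketch computes one radix-4 butterfly using exactly eight complex additions or subtractions on four inputs. The function name radix4_butterfly, the forward-DFT sign convention, and the choice to realize the multiplication by -j as a swap of real and imaginary parts are assumptions made for this example; the claims do not specify them.

```python
def radix4_butterfly(x0, x1, x2, x3):
    """One radix-4 DFT butterfly built from 8 complex additions/subtractions.

    Hypothetical helper for illustration; forward-DFT sign convention assumed.
    """
    # First layer: 4 complex additions/subtractions.
    a = x0 + x2
    b = x0 - x2
    c = x1 + x3
    d = x1 - x3
    # Multiplying by -j needs no multiplier: swap the parts and negate one.
    d_rot = complex(d.imag, -d.real)  # equals -1j * d
    # Second layer: 4 more complex additions/subtractions.
    y0 = a + c
    y1 = b + d_rot
    y2 = a - c
    y3 = b - d_rot
    return y0, y1, y2, y3


if __name__ == "__main__":
    print(radix4_butterfly(1 + 2j, 3 - 1j, -2 + 0.5j, 0.25 + 4j))
```

In this sketch, feeding zeros into the second and fourth inputs reduces the first two outputs to a two-point sum and difference, which appears to correspond to the odd-N final-stage case recited in the dependent claims.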

The system of claim 115, wherein, if N is odd, the second counter is initialized to 2, advanced by 2 in each FFT stage, and set to N in the final FFT stage.

The system of claim 115, wherein the processing unit operates on one segment of the data in the input buffer at a time until all segments have been processed.

The system of claim 115, comprising a multiplier avoidance indicator that indicates when the rotation multiplier is bypassed.

The system of claim 115, wherein, if N is odd and the final FFT stage is being executed, the serial-to-parallel converter receives two input data values from the input buffer, the two received values are supplied as the first and third radix-4 FFT engine input vectors, and the second and fourth radix-4 FFT engine input vectors are set to zero.

The system of claim 115, wherein the processing unit operates at the system clock frequency of the multimode wireless processing system reduced by a factor of four, except when the final FFT stage is being executed and N is odd, in which case the processing unit operates at the system clock frequency reduced by a factor of two.

The system of claim 115, wherein the input buffer address is generated such that bits N to (s+1) of the master counter are mapped to bits (N−s) to 1 of the input buffer address, bits s to 1 of the master counter are mapped to bits N to (N−s+1) of the input buffer address, and the remaining high-order bits of the input buffer address are set to 0, where bit 1 is the least significant bit of both the master counter and the input buffer address.
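The bit mapping recited above amounts to rotating the low N bits of the master counter right by s positions. The following Python sketch shows that reading; the function name input_buffer_address, its parameter names, and the masking of the counter to N bits are assumptions made for this example, and the default width of 13 bits is taken from the dependent claim on address width.

```python
def input_buffer_address(master_counter: int, n: int, s: int, width: int = 13) -> int:
    """Map master-counter bits to an input buffer address (hypothetical helper).

    Bit 1 in the claim language is the least significant bit, i.e. bit index 0 here.
    """
    if not 0 <= s <= n <= width:
        raise ValueError("expected 0 <= s <= n <= width")
    counter = master_counter & ((1 << n) - 1)  # assumption: only N counter bits are used
    # Counter bits N..(s+1) become address bits (N-s)..1.
    low = counter >> s
    # Counter bits s..1 become address bits N..(N-s+1).
    high = (counter & ((1 << s) - 1)) << (n - s)
    # Address bits above N stay 0 because nothing is shifted into them.
    return high | low


if __name__ == "__main__":
    print(bin(input_buffer_address(0b101101, n=6, s=2)))
```

Because the mapping is a rotation, every counter value in a stage yields a distinct address, so each buffer location is visited exactly once per stage in this sketch.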

The system of claim 115, wherein the input buffer address is 13 bits wide.

The system of claim 115, wherein, in all FFT stages except the final FFT stage, the output buffer address is equal to the input buffer address, and in the final FFT stage the output buffer address is generated such that bits 13 to 13−N of the output buffer address are set to 0; when N is even, bits N to 1 of the output buffer address follow a first mapping sequence I_2, I_1, I_4, I_3, ..., I_N, I_(N−1); and when N is odd, bits N to 1 of the output buffer address follow a second mapping sequence I_1, I_3, I_2, I_5, I_4, ..., I_N, I_(N−1), where I is the input buffer address and bit 1 is the least significant bit of both the input buffer address and the output buffer address.

The system of claim 115, wherein the rotation address generator determines whether a new twiddle factor needs to be accessed from the memory that provides the arithmetic function, generates a rotation address transition indicator indicating that a new twiddle factor needs to be accessed from that memory, and transmits the rotation address transition indicator to the set of registers.

The system of claim 115, wherein the preliminary rotation address is generated such that the high-order (N−s) bits of the preliminary rotation address are mapped to bits (N−s) to 1 of the input buffer address and the remaining low-order bits of the preliminary rotation address are set to 0, where bit 1 is the least significant bit of the input buffer address.

The system of claim 115, wherein the preliminary rotation address and the final rotation address are each 11 bits wide.

The system of claim 115, wherein the control word consists of the three high-order bits of the product of the preliminary rotation address and the two low-order bits of the master counter.

The system of claim 115, wherein the twiddle factor is manipulated according to the control word as follows:
First, when the exclusive OR of bit 1 and bit 2 of the control word is 1, the real and imaginary components of the twiddle factor are exchanged and the signs of both components are inverted;
Second, when bit 2 of the control word is 1, the sign of the real component of the twiddle factor is inverted; and
Third, when bit 3 of the control word is 1, the signs of both the real and imaginary components of the twiddle factor are inverted.
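The three manipulation steps above can be sketched in Python as follows. This is a literal reading of the claim wording, not a verified model of the hardware: the function name manipulate_twiddle, the assumption that bit 1 of the control word is its least significant bit, and the assumption that the three steps are applied in the listed order are all choices made for this example.

```python
def manipulate_twiddle(w: complex, control_word: int) -> complex:
    """Apply the three control-word steps to a twiddle factor (hypothetical helper)."""
    b1 = control_word & 1         # bit 1, assumed to be the least significant bit
    b2 = (control_word >> 1) & 1  # bit 2
    b3 = (control_word >> 2) & 1  # bit 3
    re, im = w.real, w.imag
    # First: exchange the components and invert both signs when bit1 XOR bit2 is 1.
    if b1 ^ b2:
        re, im = -im, -re
    # Second: invert the sign of the real component when bit 2 is 1.
    if b2:
        re = -re
    # Third: invert the signs of both components when bit 3 is 1.
    if b3:
        re, im = -re, -im
    return complex(re, im)


if __name__ == "__main__":
    for cw in range(8):
        print(cw, manipulate_twiddle(0.6 + 0.8j, cw))
```

Every step is a swap or a sign change, so the magnitude of the twiddle factor is unchanged, which is what allows a reduced table of stored twiddle factors to be reused with no extra multiplications.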

The system of claim 115, wherein the remainder is sent to a first input of the 2:1 multiplexer, bits 9 to 0 of the product generated by the rotation address multiplier are sent to a second input of the 2:1 multiplexer, bit 10 of the product generated by the rotation address multiplier is sent to the select input of the 2:1 multiplexer, and the final rotation address is the output of the 2:1 multiplexer.

A method for switching between instruction contexts within a time interval in a multi-mode wireless broadband processing system, comprising:
Executing critical task operations, wherein each critical task operation is executed within a time interval and a critical task includes a plurality of critical task operations;
Executing non-critical task operations, wherein the execution of each non-critical task operation can cross time interval boundaries and a non-critical task includes a plurality of non-critical task operations; and
Entering a sleep mode, in which neither critical task operations nor non-critical task operations are executed, if the critical task operations and the non-critical task operations started in the time interval are completed before the next time interval starts.
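The method above can be illustrated with a small time-budget simulation in Python. This is only a sketch under stated assumptions: the function name run_intervals, the representation of each operation as an integer duration, and the policy of running all critical operations at the start of each interval are hypothetical and are not taken from the claims.

```python
from collections import deque


def run_intervals(critical_per_interval, noncritical_ops, interval_len):
    """Return the sleep time in each interval (hypothetical simulation).

    critical_per_interval: one list of critical-operation durations per interval.
    noncritical_ops: durations of non-critical operations, run in order.
    """
    pending = deque(noncritical_ops)
    remaining = 0  # unfinished time of the non-critical operation in progress
    sleep_times = []
    for critical in critical_per_interval:
        used_by_critical = sum(critical)
        if used_by_critical > interval_len:
            raise ValueError("critical operations must finish within the interval")
        budget = interval_len - used_by_critical
        # Non-critical work fills the rest and may carry over to the next interval.
        while budget > 0 and (remaining > 0 or pending):
            if remaining == 0:
                remaining = pending.popleft()
            used = min(budget, remaining)
            remaining -= used
            budget -= used
        # Whatever is left of the interval is spent in sleep mode.
        sleep_times.append(budget)
    return sleep_times


if __name__ == "__main__":
    print(run_intervals([[3, 2], [4], [1]], [7, 2], interval_len=10))
```

In the usage example, the first non-critical operation starts in the first interval and finishes in the second, and the system sleeps only once all started work is done.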

The method of claim 130, wherein non-critical task operations are not initiated in a time interval until a critical task operation has been performed.

The method of claim 130, wherein one or more critical task operations are performed within a time interval.

The method of claim 130, wherein the context is stored in a hardware register.

The method of claim 130, wherein the maximum number of simultaneous contexts is determined by the number of sets of hardware registers.

A system for switching between instruction contexts within a time interval in a multi-mode wireless broadband processing system, comprising:
A memory storing a plurality of instructions including critical task operations and non-critical task operations, wherein a critical task includes a plurality of critical task operations and a non-critical task includes a plurality of non-critical task operations; and
A controller that receives and executes the instructions, wherein each critical task operation is executed within a time interval and the execution of each non-critical task operation can cross time interval boundaries,
Wherein the multi-mode wireless broadband processing system that includes the controller and the memory enters a sleep mode, in which neither critical task operations nor non-critical task operations are executed, if the critical task operations and non-critical task operations initiated in a time interval are completed before the next time interval begins.

The system of claim 135, wherein non-critical task operations are not initiated in a time interval until a critical task operation has been performed.

The system of claim 135, wherein one or more critical task operations are performed within a time interval.

The system of claim 135, wherein the context is stored in a memory element.

The system of claim 138, wherein the maximum number of simultaneous contexts is determined by the number of sets of memory elements.

A method for performing a convolution operation in a multimode wireless processing system, comprising:
Loading an initial value and a step value into an address generator;
Generating an address based on the initial value and the step value;
Supplying the generated address to a series of memories;
Loading input data into a series of registers, the number of registers being equal to the number of memories and each register being associated with one memory;
Multiplying the contents of each register by the value stored at the generated address of the memory associated with that register to generate a series of products;
Adding the series of products to generate a sum of products; and
Generating an output stream from the sum of products.
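The steps above can be sketched in Python as a table-driven multiply-accumulate. The function name convolve_outputs, the use of Python lists to stand in for the memories and registers, and the count parameter that controls how many addresses are generated are assumptions made for this example; the actual memory contents and word widths are not specified here.

```python
def convolve_outputs(memories, registers, initial, step, count):
    """Generate `count` sums of products (hypothetical helper).

    memories: one lookup table per register; registers: the loaded input data.
    """
    if len(memories) != len(registers):
        raise ValueError("the number of registers must equal the number of memories")
    outputs = []
    address = initial  # the first generated address is the initial value
    for _ in range(count):
        # The same generated address is supplied to every memory; each register is
        # multiplied by the value read from its associated memory.
        products = [reg * mem[address] for reg, mem in zip(registers, memories)]
        # Adding the series of products gives one output sample.
        outputs.append(sum(products))
        # Later addresses advance by the step value.
        address += step
    return outputs


if __name__ == "__main__":
    tables = [[1, 2, 3, 4], [10, 20, 30, 40]]
    print(convolve_outputs(tables, [2, 1j], initial=0, step=2, count=2))
```

In hardware the per-register multiplications would presumably run in parallel and feed an adder tree; the list comprehension and sum here only model the arithmetic, not the timing.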

The method of claim 140, wherein the generated address is initially set to the initial value.

The method of claim 140, wherein each register is a flip-flop structure.

The method of claim 140, wherein each memory is a ROM.

The method of claim 140, wherein the multiplications of the register contents and the memory contents are performed in parallel.

The method of claim 140, wherein the addition of the products is performed by a complex addition tree.

The method of claim 140, wherein an input from a combination shifter is included in the sum of products.

The method of claim 140, wherein the value R stored at address A of memory n is determined by the following equation:

(where A takes a value from 0 to 255).

The method of claim 140, wherein there are eight memories and eight registers.

The method of claim 140, further comprising:
Performing subsequent multiplications between the contents of each register and the value stored at the generated address of the memory associated with that register, wherein the generated address is incremented by the step value for each subsequent multiplication;
Adding the products of the subsequent multiplications to generate a subsequent sum of products; and
Generating a subsequent output stream based on the subsequent sum of products.

The method of claim 149, wherein the address generator automatically increments the generated address by a step value.

A system for performing a convolution operation in a multimode wireless processing system, comprising:
An address generator for generating addresses given an initial value and a step value;
A series of memories;
A series of registers for storing input values, the number of registers being equal to the number of memories;
A series of complex multipliers, wherein each multiplier is associated with one register and one memory and generates the product of the contents of the associated register and the value stored at the generated address of the associated memory; and
A complex addition tree that adds the series of products to generate a sum of products.