JP3996858B2

JP3996858B2 - Arithmetic processing unit

Info

Publication number: JP3996858B2
Application number: JP2003009709A
Authority: JP
Inventors: 隆太朗山中; 秀俊鈴木; 英之蕪尾; 稔岡本; ケヴィン・ストーン
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1997-06-30
Filing date: 2003-01-17
Publication date: 2007-10-24
Anticipated expiration: 2018-06-16
Also published as: JP2003234656A

Description

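[0001]
[Technical Field of the Invention]
The present invention relates to an arithmetic processing unit built into mobile communication equipment and the like. In particular, it relates to a technique for efficiently processing the ACS (add, compare, select) operations of Viterbi decoding.
[0002]
[Prior Art]
In recent years, digital signal processors (hereinafter DSPs) have come into wide use as embedded processors in equipment such as mobile phones, in step with the digitization of mobile communications. Bit errors occur frequently in data communication over mobile radio links, so error correction is required. One error-correction method encodes the input bits with a convolutional code and decodes them on the receiving side by Viterbi decoding. A DSP is used for this error-correction processing.
[0003]
Viterbi decoding achieves maximum-likelihood decoding of a convolutional code by repeating a simple cycle of addition, comparison and selection, followed by a final traceback operation that decodes the data. The process is briefly described below. A convolutional code is generated by mod-2 addition of an input bit and a fixed number of preceding bits, so several coded bits are produced for each input bit. The number of input information bits that affect the coded data is called the constraint length (K). It equals the number of shift-register stages used for the mod-2 addition.
[0004]
The coded data is determined by the input bit and the state formed by the preceding (K-1) input bits. When a new information bit is input, this state moves (transitions) to a new state. Which state it can reach depends on whether the new input bit is 0 or 1. Each of the (K-1) bits can be 1 or 0, so the number of states is $2^{K-1}$.
[0005]
Viterbi decoding observes the received coded data sequence and estimates the most likely state from among all possible state transitions. To do this, each time the coded data for one information bit (the received data sequence) arrives, it calculates the signal distance (metric) of the path to each state at that time. Of the paths that reach the same state, the one with the smaller metric is kept as the survivor path. This operation is repeated for each successive symbol.
[0006]
FIG. 25 shows part of the trellis of a convolutional encoder with constraint length K. Two paths, each representing a state transition, lead to a state $S[2n]$ (n a positive integer) at a given time from the states $S[n]$ and $S[n + 2^{K-2}]$ at the preceding time. For example, take K = 3. For n = 1, state S[2] (S10, the state in which the preceding two bits were input in the order "1", "0") can be reached from S[1] (S01) and from S[3] (S11). For n = 2, state S[4] (S00, the state given by the lower 2 bits of the index) can be reached from S[2] (S10) and from S[4] (S00).
[0007]
Path metric a is the sum of two terms:
・the signal distance (branch metric x) between the output symbols of the path entering state $S[2n]$ and the received data sequence, and
・path metric A, the accumulated branch metric of the survivor path up to state $S[n]$ at the preceding time.
Similarly, path metric b is the sum of the distance (branch metric y) between the output symbols of the path entering $S[2n]$ and the received data sequence, and path metric B, the accumulated branch metric of the survivor path up to state $S[n + 2^{K-2}]$ at the preceding time. The two path metrics a and b entering $S[2n]$ are then compared, and the path with the smaller metric is selected as the survivor path.
[0008]
Viterbi decoding therefore performs three steps for each of the $2^{K-1}$ states at every time step: addition to obtain the path metrics, comparison of the path metrics, and path selection. The decoder must also record which path was selected, as path select signals $PS[i]$, $i = 0, \dots, 2^{K-1} - 1$. The rule compares the index of the previous state on the selected path (for example n) with the index of the previous state on the rejected path ($n + 2^{K-2}$). If the selected index is smaller, $PS[i] = 0$; if it is larger, $PS[i] = 1$. In FIG. 25, $n < n + 2^{K-2}$. So when a > b, state $S[n + 2^{K-2}]$ is selected and PS[S2n] = 1. When a ≤ b, state $S[n]$ is selected and PS[S2n] = 0. Finally, during traceback, the decoder follows the survivor path backwards using these path select signals and recovers the data.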
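The selection rule and path-select convention of [0007] and [0008] can be sketched as one ACS step. This is a minimal illustration under stated assumptions: metrics are plain integers, smaller is better, and the function and argument names are hypothetical.

```python
def acs_step(pm_a: int, pm_b: int, bm_x: int, bm_y: int) -> tuple[int, int]:
    """One add-compare-select step for state S[2n] (FIG. 25 convention).

    pm_a: survivor metric A of S[n]; pm_b: survivor metric B of S[n + 2**(K-2)].
    bm_x, bm_y: branch metrics x and y of the two paths entering S[2n].
    Returns (new path metric, path select bit PS[S2n]).
    """
    a = pm_a + bm_x          # add: candidate via S[n] (smaller index)
    b = pm_b + bm_y          # add: candidate via S[n + 2**(K-2)] (larger index)
    if a > b:                # compare
        return b, 1          # larger-index predecessor survives -> PS = 1
    return a, 0              # a <= b: smaller-index predecessor survives -> PS = 0
```

Ties (a == b) resolve to the smaller-index predecessor with PS = 0, as the text specifies.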
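[0009]
ACS processing for Viterbi decoding on a conventional DSP is described below. The example is the TMS320C54x (Texas Instruments; hereinafter C54x), a general-purpose arithmetic device. The GSM cellular radio system uses the following polynomials for its convolutional code:
$G_1(D) = 1 + D^3 + D^4, \qquad G_2(D) = 1 + D + D^3 + D^4$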
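The two GSM generator polynomials can be realized as a shift-register encoder. This is a sketch under assumptions not stated in the source: the $D^k$ tap reads the input from k steps earlier, and the two output bits are emitted in (G1, G2) order.

```python
K = 5  # constraint length of the GSM code

def conv_encode(bits: list[int]) -> list[int]:
    """Rate-1/2 encoder for G1 = 1 + D^3 + D^4 and G2 = 1 + D + D^3 + D^4.

    delay[k - 1] holds the input bit from k steps earlier (the D^k tap).
    Output order per input bit is assumed to be (G1, G2).
    """
    delay = [0] * (K - 1)
    out = []
    for u in bits:
        g1 = u ^ delay[2] ^ delay[3]             # 1 + D^3 + D^4
        g2 = u ^ delay[0] ^ delay[2] ^ delay[3]  # 1 + D + D^3 + D^4
        out += [g1, g2]
        delay = [u] + delay[:-1]                 # shift register advances one bit
    return out
```

Feeding a single 1 followed by four 0s returns `[1, 1, 0, 1, 0, 0, 1, 1, 1, 1]`. That is the G1 tap pattern (1, 0, 0, 1, 1) interleaved with the G2 tap pattern (1, 1, 0, 1, 1). The (K-1) = 4 delay cells give $2^{K-1} = 16$ states.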
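[0010]
This convolutional code is represented by the butterfly-structured trellis diagram shown in FIG. 26. The trellis diagram shows how the convolutional code moves from one state to another. With constraint length K = 5, there are $2^{K-1} = 16$ states, or 8 butterflies, between successive symbols. Two branches enter each state, and the ACS operation determines the new path metric.
[0011]
The branch metric is defined as follows:
$M = SD(2i)\cdot B(J,0) + SD(2i+1)\cdot B(J,1)$
Here SD(2i) is the first symbol of the symbol metric representing the soft-decision input, and SD(2i+1) is the second symbol. B(J,0) and B(J,1) correspond to the code bits generated by the convolutional encoder shown in FIG. 27.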
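A small sketch of the branch-metric formula. It assumes a common antipodal mapping that the source does not state: code bit 0 becomes B = +1, code bit 1 becomes B = -1, and a positive soft decision favours bit 0. The helper names are illustrative.

```python
def to_antipodal(bit: int) -> int:
    """Assumed mapping of a code bit to B(J, .): 0 -> +1, 1 -> -1."""
    return 1 - 2 * bit

def branch_metric(sd0: int, sd1: int, b0: int, b1: int) -> int:
    """M = SD(2i) * B(J,0) + SD(2i+1) * B(J,1) for one received symbol pair."""
    return sd0 * to_antipodal(b0) + sd1 * to_antipodal(b1)
```

Under this mapping the complementary code pair gives -M. Both GSM generators contain the terms 1 and $D^4$, so within a butterfly the branches carry either the same or complementary code bits. That is consistent with the M / -M pair used in the C54x butterfly of [0012].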
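[0012]
The C54x processes the butterfly structure at high speed by putting the ALU in dual 16-bit mode. To determine new path metric (J), the DADST instruction computes two combinations in parallel: the path metrics 2J and 2J+1 with the branch metrics M and -M. The CMPS instruction then performs the comparison. To determine new path metric (J+8), the DSADT instruction computes the same two path metrics with the branch metrics M and -M in parallel. Each pair of results is stored in the upper and lower halves of a double-precision accumulator, and the CMPS instruction determines the new path metric.
[0013]
The CMPS instruction compares the upper and lower halves of the accumulator and stores the larger one in memory. At each comparison it also updates the 16-bit transition register (TRN) to record which half was selected, so that traceback can be performed later. The contents of TRN are stored in memory at the end of each symbol. This stored information is used to search for the optimal path during traceback. FIG. 28 shows a macro program for the butterfly operation of Viterbi decoding; the branch metric value is loaded into the T register before the macro is called. FIG. 29 shows an example of path-metric memory mapping.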
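The compare-and-select behaviour described in [0013] can be modelled as follows. This sketch follows only the description in the text and is not TI's instruction definition. The polarity of the TRN bit (0 when the upper half wins) is an assumption. Keeping the larger value suits correlation-style metrics such as M in [0011].

```python
MASK16 = 0xFFFF

def to_s16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= MASK16
    return x - 0x10000 if x & 0x8000 else x

def cmps_model(acc: int, trn: int) -> tuple[int, int]:
    """Compare the accumulator halves, keep the larger, log the choice in TRN.

    acc = (high << 16) | low, each half holding a 16-bit path metric.
    Assumption: the bit shifted into TRN is 0 when the high half is kept
    and 1 when the low half is kept.
    """
    hi, lo = to_s16(acc >> 16), to_s16(acc)
    if hi > lo:
        return hi, (trn << 1) & MASK16
    return lo, ((trn << 1) | 1) & MASK16
```

After 16 calls the TRN value holds one selection bit per butterfly output. The text stores it to memory once per symbol for traceback.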
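[0014]
Each symbol interval requires 8 butterfly operations, which yield 16 new states. This sequence is repeated over the symbol intervals. When it is complete, traceback searches the 16 paths for the optimal one, and the decoded bit sequence is obtained.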
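For orientation, the whole procedure can be sketched end to end for the GSM code of [0009]: ACS over 16 states per symbol, then traceback through the stored PS bits. This is an illustrative sketch, not the C54x program of FIG. 28, and it makes several assumptions:
・hard decisions with Hamming-distance branch metrics stand in for the soft-decision metric of [0011] (smaller is better, as in [0007]);
・bit k of the state number holds the input from k+1 steps earlier;
・the message is flushed with K-1 zero tail bits.

```python
K = 5                  # constraint length of the GSM code in [0009]
NS = 1 << (K - 1)      # 16 states
BIG = 1 << 30          # metric for states not yet reachable

def outputs(state: int, u: int) -> tuple[int, int]:
    """Code bits (G1, G2); bit k of `state` is the input from k+1 steps ago."""
    d1, d3, d4 = state & 1, (state >> 2) & 1, (state >> 3) & 1
    return u ^ d3 ^ d4, u ^ d1 ^ d3 ^ d4

def encode(bits: list[int]) -> list[int]:
    state, out = 0, []
    for u in bits:
        out.extend(outputs(state, u))
        state = ((state << 1) | u) & (NS - 1)
    return out

def hamming(code: tuple[int, int], rx: tuple[int, int]) -> int:
    return sum(c != r for c, r in zip(code, rx))

def viterbi_decode(rx: list[int]) -> list[int]:
    pm = [0] + [BIG] * (NS - 1)            # encoder starts in state 0
    history = []                           # PS bits for every symbol
    for t in range(0, len(rx), 2):
        r = (rx[t], rx[t + 1])
        new_pm, ps = [0] * NS, [0] * NS
        for s in range(NS):
            u = s & 1                      # input bit that leads into s
            p0 = s >> 1                    # smaller-index predecessor
            p1 = p0 | (NS >> 1)            # larger-index predecessor
            a = pm[p0] + hamming(outputs(p0, u), r)          # add
            b = pm[p1] + hamming(outputs(p1, u), r)          # add
            new_pm[s], ps[s] = (b, 1) if a > b else (a, 0)   # compare, select
        pm = new_pm
        history.append(ps)
    s = min(range(NS), key=pm.__getitem__)  # best final state
    decoded = []
    for ps in reversed(history):           # traceback using the PS bits
        decoded.append(s & 1)
        s = (s >> 1) | (ps[s] << (K - 2))
    return decoded[::-1]
```

Encoding a message with a zero tail and decoding it returns the original bits, including when one received bit is flipped.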
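[0015]
This is the ACS mechanism of the C54x, a general-purpose DSP. As the macro program of FIG. 28 shows, the C54x needs 4 machine cycles to update two path metrics.
[0016]
[Problems to Be Solved by the Invention]
Demand for non-voice communication over mobile radio, such as data transmission, is expected to keep growing. Non-voice communication requires higher transmission quality than conventional voice communication, that is, a lower bit error rate (hereinafter BER). One way to achieve a low BER is to increase the constraint length K of the Viterbi decoding used for error correction. However, each unit increase in the constraint length doubles the number of path metrics (states), and so doubles the computation in Viterbi decoding. Non-voice communication also generally carries more information than voice communication, and more information means more Viterbi-decoding work (ACS operations and so on).
[0017]
Meanwhile, mobile radio terminals are expected to have long battery life. They are also expected to be smaller, lighter and cheaper. Portable terminals are therefore moving to single-chip implementations in which the DSP takes over functions previously handled by dedicated LSIs. The less processing the DSP performs, the longer the battery lasts.
[0018]
As described above, however, the DSP workload is expected to grow, which makes long battery life hard to achieve. Further growth in computation would exceed the capacity of existing DSPs, so the functions could no longer be implemented on a single DSP chip. Adding large amounts of hardware to make the DSP more capable would also raise the cost of the DSP itself, and with it the cost of the portable terminal.
[0019]
The present invention solves these problems. Its object is to provide an arithmetic processing unit that efficiently performs Viterbi decoding on a DSP, in particular the ACS operations, with as little additional hardware as possible.
[0020]
[Means for Solving the Problems]
The arithmetic processing unit of the present invention is capable of Viterbi decoding by a digital signal processor. It comprises a first comparison means that compares first data with second data, and a second comparison means that compares third data with fourth data. Both comparison means are operated by a single instruction in a single cycle.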
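[0029]
[Embodiments of the Invention]
Embodiments of the present invention are described below with reference to the drawings.
(Embodiment 1)
FIG. 1 shows the configuration of the arithmetic processing unit of Embodiment 1. Its components are:
・1, a storage means that stores path metrics;
・2, a bus connected to storage means 1 that supplies data and transfers operation results;
・3, a storage means that stores branch metrics;
・4, a bus connected to storage means 3 that supplies data;
・5 and 9, comparison means that compare data read from storage means 1 and 3 via buses 2 and 4;
・6 and 10, addition means that add data read from storage means 1 and 3 via buses 2 and 4;
・7, a storage means that stores the comparison result of comparison means 5;
・11, a storage means that stores the comparison result of comparison means 9;
・8, a selection means that receives the addition results of addition means 6 and chooses its output according to the comparison result of comparison means 5;
・12, a selection means that receives the addition results of addition means 10 and chooses its output according to the comparison result of comparison means 9;
・13, a bus that carries the selection results of selection means 8 and 12 to storage means 1.
Storage means 7 and 11 are both connected to bus 2, so the comparison results can be transferred to storage means 1 via bus 2.
[0030]
The operation of this embodiment is now described with reference to FIGS. 2 and 3. The description assumes a constraint length K of 4 and a code rate of 1/2. Path metrics and branch metrics are both single-precision data. Double-precision data is written (X, Y), where X is the upper half of the word and Y the lower half.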
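[0031]
Take the convolutional encoder of FIG. 2 as an example, and call the four branch metrics for a code rate of 1/2 BM0, BM1, BM2 and BM3. With these branch metrics, the state transitions for constraint length K = 4 form the butterfly structure shown in FIG. 3. Consider nodes N0 and N1 of the old state (Old State). They transition to nodes N'0 and N'4.
The branch metrics (BM) on these transitions are:
・BM0 from node N0 to node N'0,
・BM1 from node N1 to node N'0,
・BM1 from node N0 to node N'4,
・BM0 from node N1 to node N'4.
Let PM0 be the path metric of node N0 and PM1 the path metric of node N1. The candidate path metrics of N'0 and N'4 are both formed by adding BM0 and BM1 to the shared path metrics PM0 and PM1, with the branch metrics exchanged between the two nodes. Exploiting this relationship, two path metrics can be updated at the same time by parallel processing.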
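The shared-operand relationship of [0031] can be written as a minimal butterfly sketch. Smaller metrics survive, as in [0007], and the function name is illustrative.

```python
def butterfly(pm0: int, pm1: int, bm0: int, bm1: int) -> tuple[int, int]:
    """FIG. 3 butterfly: old nodes N0, N1 -> new nodes N'0, N'4.

    Both outputs reuse the same two path metrics; only the branch
    metrics are exchanged between the two new nodes.
    """
    new_n0 = min(pm0 + bm0, pm1 + bm1)  # N'0: BM0 from N0, BM1 from N1
    new_n4 = min(pm0 + bm1, pm1 + bm0)  # N'4: BM1 from N0, BM0 from N1
    return new_n0, new_n4
```

Because both `min` calls read the same four operands, two independent compare-select units can produce N'0 and N'4 in the same step.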
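[0032]
As FIG. 3 shows, the same relationship holds for the subsequent node pairs (N2 and N3, N4 and N5, N6 and N7). The work is therefore split as in FIG. 3:
・the ACS operations for the first-half nodes N'0 to N'3 use comparison means 5, addition means 6, storage means 7 for the comparison results, and selection means 8;
・the ACS operations for the second-half nodes N'4 to N'7 use comparison means 9, addition means 10, storage means 11 for the comparison results, and selection means 12.
[0033]
The ACS operation from nodes N0 and N1 to nodes N'0 and N'4 is now described in detail. First, storage means 1 outputs two path metrics to bus 2 as (PM1, PM0), and storage means 3 outputs two branch metrics to bus 4 as (BM1, BM0). Comparison means 5 takes the path metrics (PM1, PM0) from bus 2 and the branch metrics (BM1, BM0) from bus 4, and computes
PM1 + BM1 - PM0 - BM0.
Meanwhile, addition means 6 takes the same path metrics (PM1, PM0) from bus 2 and branch metrics (BM1, BM0) from bus 4, and computes
PM1 + BM1 and PM0 + BM0.
It outputs these to selection means 8 as (PM1 + BM1, PM0 + BM0).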
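[0034]
Selection means 8 receives the most significant bit (hereinafter MSB), that is, the sign bit, of the comparison result PM1 + BM1 - PM0 - BM0 from comparison means 5. Depending on the MSB, it outputs either the upper value PM1 + BM1 or the lower value PM0 + BM0.
That is, if
PM1 + BM1 ≥ PM0 + BM0,
then PM1 + BM1 - PM0 - BM0 ≥ 0,
so the MSB is 0. In this case the lower value PM0 + BM0 is selected and output to bus 13 as the new PM'0.
Conversely, if
PM1 + BM1 < PM0 + BM0,
then PM1 + BM1 - PM0 - BM0 < 0,
so the MSB is 1. In this case the upper value PM1 + BM1 is selected and output to bus 13 as the new PM'0. At the same time, the MSB of the comparison result of comparison means 5 is shifted into storage means 7.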
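[0035]
Comparison means 9 takes the two path metrics (PM1, PM0) from bus 2 and the two branch metrics (BM1, BM0) from bus 4, and computes
PM1 + BM0 - PM0 - BM1.
Meanwhile, addition means 10 takes the same path metrics (PM1, PM0) from bus 2 and branch metrics (BM1, BM0) from bus 4, and computes
PM1 + BM0 and PM0 + BM1.
It outputs these to selection means 12 as (PM1 + BM0, PM0 + BM1).
[0036]
Selection means 12 receives the MSB of the comparison result PM1 + BM0 - PM0 - BM1 from comparison means 9. Depending on the MSB, it outputs either the upper value PM1 + BM0 or the lower value PM0 + BM1.
That is, if
PM1 + BM0 ≥ PM0 + BM1,
then PM1 + BM0 - PM0 - BM1 ≥ 0,
so the MSB is 0. In this case the lower value PM0 + BM1 is selected and output to bus 13 as the new PM'4.
Conversely, if
PM1 + BM0 < PM0 + BM1,
then PM1 + BM0 - PM0 - BM1 < 0,
so the MSB is 1. In this case the upper value PM1 + BM0 is selected and output to bus 13 as the new PM'4.
At the same time, the MSB of the comparison result of comparison means 9 is shifted into storage means 11.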
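Both halves of the Embodiment 1 datapath can be sketched as one step that follows the MSB rules of [0034] and [0036]. Two assumptions are made: the words are 16-bit two's complement, and the metrics are kept small enough (for example by normalization) that the differences never overflow.

```python
WIDTH = 16                      # assumed single-precision width
MASK = (1 << WIDTH) - 1

def msb(x: int) -> int:
    """Sign bit of x taken as a WIDTH-bit two's-complement word."""
    return (x & MASK) >> (WIDTH - 1)

def dual_acs(pm: tuple[int, int], bm: tuple[int, int]) -> tuple[int, int, int, int]:
    """One pass of both comparison/addition/selection paths of FIG. 1.

    pm = (PM1, PM0) and bm = (BM1, BM0), upper value first as in the text.
    Returns (PM'0, MSB stored in 7, PM'4, MSB stored in 11).
    """
    pm1, pm0 = pm
    bm1, bm0 = bm
    s5 = msb(pm1 + bm1 - pm0 - bm0)         # comparison means 5
    new0 = pm1 + bm1 if s5 else pm0 + bm0   # selection means 8: MSB = 1 -> upper
    s9 = msb(pm1 + bm0 - pm0 - bm1)         # comparison means 9
    new4 = pm1 + bm0 if s9 else pm0 + bm1   # selection means 12
    return new0, s5, new4, s9
```

Each selected value is always the smaller candidate, and the stored MSBs serve as the path select bits for traceback.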
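[0037]
Applying the same processing to the other node pairs lets the ACS operations of Viterbi decoding run in parallel on a DSP. The example above uses constraint length K = 4 and code rate 1/2, but the same relationship holds for other constraint lengths and code rates. The invention can therefore be implemented for them with corresponding modifications.
[0038]
(Embodiment 2)
FIG. 4 shows the configuration of the arithmetic processing unit of Embodiment 2. It differs from Embodiment 1 (FIG. 1) only in that the path metrics are stored in a RAM 14 made up of four banks. All other configuration and operation are exactly the same as in Embodiment 1.
[0039]
This embodiment suits the pipelined processing shown in FIG. 5. For example, for instruction 1 to execute an ACS operation in the execution stage of cycle n+1, the address of the path metrics to be read must be supplied to RAM 14 beforehand, in the memory-access stage of cycle n. Suppose RAM 14 can read an even address and the following odd address together, that is, it supports double-precision reads. Then the two path metrics for the operation can be read by specifying only an even address, provided that:
・the path metrics of one stage are stored at consecutive addresses in even/odd order, and
・the path metrics of one stage are split into a first half and a second half, each stored in a separate bank.
[0040]
Suppose, for example, that bank 0 of the four-bank RAM 14 holds the first half of the old-state path metrics (PM0, PM1, PM2 and PM3 in FIG. 3), and bank 1 holds the second half (PM4, PM5, PM6 and PM7 in FIG. 3). Each cycle of ACS execution then generates two path metrics, which are stored via bus 13 in bank 2 and bank 3 respectively. Bus 13 thus carries double-precision data. Bank 2 receives the path metrics of nodes N'0 to N'3, and bank 3 those of nodes N'4 to N'7.
[0041]
FIG. 6 shows an example of the memory accesses corresponding to FIG. 3. When the ACS operations for one stage are finished, the next stage reads the old-state path metrics from banks 2 and 3 and writes the new-state path metrics to banks 0 and 1. The read pair and the write pair of banks are swapped in this way after every stage. The four-bank RAM 14 thus allows the ACS operations of Viterbi decoding to run in parallel on the DSP.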
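The bank ping-pong of [0040] and [0041] can be sketched for K = 4. The branch metrics per node pair come from the encoder of FIG. 2, which is not reproduced here. This sketch therefore takes them as a hypothetical input `bms[k] = (BMa, BMb)`, with N'k = min(PM2k + BMa, PM2k+1 + BMb) and N'k+4 = min(PM2k + BMb, PM2k+1 + BMa).

```python
def acs_stage(banks, src, dst, bms):
    """One K = 4 stage with the bank layout of Embodiment 2.

    banks[src[0]] holds PM0..PM3 and banks[src[1]] holds PM4..PM7.
    bms[k] = (BMa, BMb) is a hypothetical per-pair branch-metric input.
    """
    for k in range(4):
        bank = banks[src[0]] if k < 2 else banks[src[1]]
        addr = 2 * (k % 2)                            # even address of the pair
        pm_even, pm_odd = bank[addr], bank[addr + 1]  # one double-word read
        bma, bmb = bms[k]
        banks[dst[0]][k] = min(pm_even + bma, pm_odd + bmb)  # N'k   (first half)
        banks[dst[1]][k] = min(pm_even + bmb, pm_odd + bma)  # N'k+4 (second half)

def run_4bank(initial, stages):
    """Swap the read pair (0, 1) and write pair (2, 3) after every stage."""
    banks = [list(initial[:4]), list(initial[4:]), [0] * 4, [0] * 4]
    src, dst = (0, 1), (2, 3)
    for bms in stages:
        acs_stage(banks, src, dst, bms)
        src, dst = dst, src              # the written pair becomes the read pair
    return banks[src[0]] + banks[src[1]]
```

Because reads and writes always target different banks, a stage never overwrites an old-state metric it still needs.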
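[0042]
The description above pairs bank 0 with bank 1 and bank 2 with bank 3, but other pairings work equally well. Only the addresses supplied in the memory-access stage and the write addresses change. RAM 14 has four banks in this embodiment. Four is the minimum required, and any number of four or more works the same way.
[0043]
(Embodiment 3)
FIG. 7 shows the configuration of the arithmetic processing unit of Embodiment 3. It differs from Embodiment 1 (FIG. 1) only in that the path metrics are stored in a dual-port RAM 15 made up of three banks. All other configuration and operation are exactly the same as in Embodiment 1.
[0044]
Like Embodiment 2, this embodiment suits the pipelined processing shown in FIG. 5. Because the path metrics are stored in dual-port RAM 15, one instruction can specify both a read and a write to the same bank. For example, for instruction 1 to execute an ACS operation in the execution stage of cycle n+1, two addresses are supplied to dual-port RAM 15 in the memory-access stage of cycle n: the address of the path metrics to read and the address of the path metric to write. In cycle n+1, as with RAM 14 of Embodiment 2, an even address and the following odd address are read together from dual-port RAM 15. The ACS operation is performed, and one path metric can then be written to the same bank.
[0045]
Like Embodiment 2, Embodiment 3 operates under the following conditions:
・the path metrics of one stage are stored at consecutive addresses in even/odd order, and
・the path metrics of one stage are split into a first half and a second half, each stored in a separate bank.
[0046]
Suppose, for example, that bank 0 of dual-port RAM 15 holds the first half of the old-state path metrics (PM0, PM1, PM2 and PM3 in FIG. 3), and bank 1 holds the second half (PM4, PM5, PM6 and PM7 in FIG. 3). Each cycle of ACS execution then generates two path metrics, which are stored via bus 13 in bank 0 and bank 2 respectively. Bus 13 thus carries double-precision data. Bank 0 receives the path metrics of nodes N'0 to N'3, and bank 2 those of nodes N'4 to N'7.
[0047]
FIG. 8 shows an example of the memory accesses corresponding to FIG. 3. The difference from Embodiment 2 is in what happens when the ACS operations for one stage are finished. Only banks 1 and 2 swap roles; bank 0 is not switched, yet the ACS operations of Viterbi decoding still run in parallel on the DSP. Dual-port RAM 15 has three banks in this embodiment. Three is the minimum required, and any number of three or more works the same way.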
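The three-bank scheme of [0046] and [0047] can be sketched under the same kind of assumption. Per-pair branch metrics are a hypothetical input `bms[k] = (BMa, BMb)`, with N'k = min(PM2k + BMa, PM2k+1 + BMb) and N'k+4 = min(PM2k + BMb, PM2k+1 + BMa). The sketch shows why bank 0 can be rewritten in place.

```python
def acs_stage_3bank(banks, second, spare, bms):
    """One K = 4 stage with the dual-port layout of Embodiment 3.

    Bank 0 always holds the first half (old N0..N3, then new N'0..N'3);
    banks[second] holds the old second half and banks[spare] receives the new
    one. Writing N'k to bank 0, address k, is safe: pairs 0 and 1 read bank 0
    addresses 0-1 and 2-3, so address k has been read before it is overwritten.
    """
    for k in range(4):
        bank = banks[0] if k < 2 else banks[second]
        addr = 2 * (k % 2)
        pm_even, pm_odd = bank[addr], bank[addr + 1]   # read port
        bma, bmb = bms[k]
        banks[0][k] = min(pm_even + bma, pm_odd + bmb)       # write port, bank 0
        banks[spare][k] = min(pm_even + bmb, pm_odd + bma)   # new second half

def run_3bank(initial, stages):
    """Only banks 1 and 2 swap roles; bank 0 is reused in place."""
    banks = [list(initial[:4]), list(initial[4:]), [0] * 4]
    second, spare = 1, 2
    for bms in stages:
        acs_stage_3bank(banks, second, spare, bms)
        second, spare = spare, second
    return banks[0] + banks[second]
```

The dual port is what makes the same-bank read and write possible within one instruction. A single-port bank would need the separate read and write banks of the four-bank scheme.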
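[0048]
(Embodiment 4)
FIG. 9 shows the configuration of the arithmetic processing unit of Embodiment 4. It differs from Embodiment 2 (FIG. 4) only in that it has input registers 16 and 17. All other configuration and operation are exactly the same as in Embodiment 2. In FIG. 9, input registers 16 and 17 receive data from bus 2 and pass it to comparison means 5 and 9 and addition means 6 and 10.
[0049]
This embodiment suits the pipelined processing shown in FIG. 10. For example, for instruction 1 to execute an ACS operation in the execution stage of cycle n+2, the address of the path metrics to be read is supplied to RAM 14 in the memory-access stage of cycle n. In the data-transfer stage of cycle n+1, the data output from RAM 14 is latched into input registers 16 and 17 via bus 2.
[0050]
The pipeline of FIG. 10 adds one data-transfer stage before the execution stage of the FIG. 5 pipeline. At the start of the execution stage, the data from RAM 14 is therefore already settled in the input registers in front of the arithmetic units (comparison means 5 and 9 and addition means 6 and 10). The transfer time from RAM 14 is thus removed from the execution stage.
[0051]
This embodiment therefore lets the ACS operations of Viterbi decoding run in parallel on a DSP at relatively high speed. A dual-port RAM can equally be used to store the path metrics.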
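[0052]
(Embodiment 5)
FIG. 11 shows the configuration of the arithmetic processing unit of Embodiment 5. It differs from Embodiment 4 (FIG. 9) only in that it has a swap circuit 18. All other configuration and operation are exactly the same as in Embodiment 4. In FIG. 11, swap circuit 18 takes data from storage means 3, which stores the branch metrics, and outputs it to bus 4.
[0053]
This embodiment suits the pipelined processing shown in FIG. 10. Swap circuit 18 receives two branch metrics from storage means 3 as one double-precision word, for example {BM1, BM0}. Under control of an instruction or the like, it outputs them either unchanged as {BM1, BM0} or with the upper and lower halves swapped as {BM0, BM1}.
[0054]
The operation of swap circuit 18 is described for K = 4 and code rate 1/2, using the convolutional encoder of FIG. 2 and the path-metric transitions of the butterfly structure of FIG. 3. FIG. 12 compares two ACS operations:
・the transitions from old-state nodes N0 and N1 to nodes N'0 and N'4;
・the transitions from old-state nodes N6 and N7 to nodes N'3 and N'7.
The ACS operation from N0 and N1 to N'0 and the one from N6 and N7 to N'3 are both performed by comparison means 5 and addition means 6. Both use the same branch metrics BM0 and BM1, but with BM0 and BM1 swapped. The same relationship holds for comparison means 9 and addition means 10, which handle the ACS operations from N0 and N1 to N'4 and from N6 and N7 to N'7. Without a swap circuit, storage means 3 would have to hold the branch metrics in both forms, {BM0, BM1} and {BM1, BM0}, which wastes hardware.
[0055]
Swap circuit 18 removes this redundancy. Storage means 3 holds only one form, for example {BM0, BM1}. Swap circuit 18 takes this {BM0, BM1} and, under control of an instruction or the like, outputs either {BM0, BM1} or {BM1, BM0}. The duplicate storage in storage means 3 is no longer needed.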
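Swap circuit 18 can be modelled on a packed word. This sketch assumes a 32-bit double-precision word holding two 16-bit branch metrics, and a `swap` flag standing in for the controlling instruction field. The names are illustrative.

```python
def pack(hi: int, lo: int) -> int:
    """Place two 16-bit values in one 32-bit word as {hi, lo}."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def swap_circuit(word: int, swap: bool) -> int:
    """Pass the word through, or exchange its upper and lower 16-bit halves."""
    word &= 0xFFFF_FFFF
    if not swap:
        return word
    return ((word & 0xFFFF) << 16) | (word >> 16)
```

A single stored copy, such as `pack(BM0, BM1)`, then serves both the N0/N1 butterfly and the N6/N7 butterfly. The flag selects which order reaches bus 4.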
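[0056]
This embodiment was described for K = 4, code rate 1/2 and old-state nodes N0, N1, N6 and N7. The same relationship also holds for nodes N2, N3, N4 and N5, and for other combinations of constraint length K and code rate, so the invention applies to them in the same way. A dual-port RAM can also be used to store the path metrics.
[0057]
(Embodiment 6)
FIG. 13 shows the configuration of the arithmetic processing unit of Embodiment 6. It differs from Embodiment 5 (FIG. 11) in two respects: each comparison means consists of two adders and one comparator, and each addition means consists of two adders. All other configuration and operation are exactly the same as in Embodiment 5.
[0058]
FIG. 13 contains the following elements:
・19 and 20, adders that add data from bus 4 and input register 16;
・21, a comparator that compares the results of adders 19 and 20 and sends the comparison result to storage means 7 and selection means 8;
・22 and 23, adders that add data from bus 4 and input register 16 and send the results to selection means 8;
・24 and 25, adders that add data from bus 4 and input register 17;
・26, a comparator that compares the results of adders 24 and 25 and sends the comparison result to storage means 11 and selection means 12;
・27 and 28, adders that add data from bus 4 and input register 17 and send the results to selection means 12.
This embodiment suits the pipelined processing shown in FIG. 10.
[0059]
The ACS operation of this embodiment is now described for K = 4 and code rate 1/2. The description uses the convolutional encoder of FIG. 2, the butterfly structure of FIG. 3, and the FIG. 12 comparison between the ACS operation from nodes N0, N1 to nodes N'0, N'4 and the one from nodes N6, N7 to nodes N'3, N'7.
[0060]
As shown in FIG. 13, input registers 16 and 17 output two path metrics as {A, B}, and swap circuit 18 outputs two branch metrics as {C, D}. The comparison side then works as follows:
・adder 19 adds path metric A and branch metric C and outputs A + C;
・adder 20 adds path metric B and branch metric D and outputs B + D;
・comparator 21 takes A + C from adder 19 and B + D from adder 20, computes A + C - (B + D), and outputs the MSB of the result.
On the addition side, adder 22 adds A and C and outputs A + C, and adder 23 adds B and D and outputs B + D.
[0061]
Meanwhile, on the other half:
・adder 24 adds path metric A and branch metric D and outputs A + D;
・adder 25 adds path metric B and branch metric C and outputs B + C;
・comparator 26 takes A + D from adder 24 and B + C from adder 25, computes A + D - (B + C), and outputs the MSB of the result.
Adder 27 adds A and D and outputs A + D, and adder 28 adds B and C and outputs B + C.
[0062]
With this configuration, setting the two path metrics in input registers 16 and 17 to {A, B} = {PM1, PM0} and the output of swap circuit 18 to {C, D} = {BM1, BM0} realizes the ACS operation of FIG. 12 for the transitions from old-state nodes N0 and N1 to nodes N'0 and N'4.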
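[0063]
Likewise, setting the two path metrics in input registers 16 and 17 to {A, B} = {PM7, PM6} and the output of swap circuit 18 to {C, D} = {BM0, BM1} realizes the ACS operation of FIG. 12 for the transitions from old-state nodes N6 and N7 to nodes N'3 and N'7.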
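The Embodiment 6 datapath can be sketched directly from the adder and comparator descriptions of [0060] and [0061]. Several parts are assumptions for illustration: the 16-bit two's-complement sign bit, the selection rule (MSB = 1 selects the upper value, as in Embodiment 1), and the numeric metric values in the usage note.

```python
def msb16(x: int) -> int:
    """Sign bit of x as a 16-bit two's-complement word."""
    return (x & 0xFFFF) >> 15

def embodiment6(a: int, b: int, c: int, d: int) -> tuple[int, int, int, int]:
    """{A, B} from the input registers, {C, D} from swap circuit 18.

    Returns (output of selector 8, MSB to storage 7,
             output of selector 12, MSB to storage 11).
    """
    s21 = msb16((a + c) - (b + d))      # adders 19, 20 + comparator 21
    out8 = a + c if s21 else b + d      # adders 22, 23 + selection means 8
    s26 = msb16((a + d) - (b + c))      # adders 24, 25 + comparator 26
    out12 = a + d if s26 else b + c     # adders 27, 28 + selection means 12
    return out8, s21, out12, s26
```

Take PM0 = 10, PM1 = 4, BM0 = 1 and BM1 = 3. Then `embodiment6(4, 10, 3, 1)` gives N'0 = 7 and N'4 = 5. For the N6/N7 case, take PM6 = 2 and PM7 = 9; then `embodiment6(9, 2, 1, 3)` gives N'3 = min(PM6 + BM1, PM7 + BM0) = 5 and N'7 = min(PM6 + BM0, PM7 + BM1) = 3.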
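[0064]
This embodiment therefore updates two path metrics in one machine cycle through pipelined operation of the DSP. The description used K = 4, code rate 1/2 and old-state nodes N0, N1, N6 and N7. The same relationship also holds for nodes N2, N3, N4 and N5, and for other combinations of constraint length K and code rate, so the invention applies to them in the same way. A dual-port RAM can also be used to store the path metrics.
[0065]
(Embodiment 7)
FIG. 14 shows the configuration of the arithmetic processing unit of Embodiment 7. It differs from Embodiment 6 (FIG. 13) in that one of the comparators is shared with ALU 29. To support this, it adds input registers 30 and 31, buses 32, 33, 37 and 38, and selectors 34 and 35. It also uses a register file 36 as the storage means for the branch metrics. All other configuration and operation are exactly the same as in Embodiment 6.
[0066]
FIG. 14 contains the following elements:
・30, an input register that receives data from the four-bank RAM 14 via bus 37;
・31, an input register that receives data from the four-bank RAM 14 via bus 38;
・32 and 33, buses that receive data from register file 36;
・34, a selector that chooses among bus 32, adder 19 and input register 30;
・35, a selector that chooses among bus 33, adder 20 and input register 31;
・29, an ALU that takes data from selectors 34 and 35 and performs arithmetic and logic operations.
ALU 29 outputs its result to bus 13 and sends the MSB of the result to storage means 7 for the comparison results and to selection means 8. This embodiment suits the pipelined processing shown in FIG. 10.
[0067]
For an ACS operation in this embodiment, selector 34 passes the output of adder 19 to ALU 29, and selector 35 passes the output of adder 20. ALU 29 subtracts one input from the other and sends the MSB of the difference to storage means 7 and selection means 8.
[0068]
For a register-register operation, register file 36 drives buses 32 and 33, and selectors 34 and 35 select buses 32 and 33 respectively. For a register-memory operation, register file 36 drives bus 32, and input register 31 is loaded from the four-bank RAM 14 via bus 38. Selectors 34 and 35 then select bus 32 and input register 31 respectively.
[0069]
Conversely, for a memory-register operation, input register 30 is loaded from the four-bank RAM 14 via bus 37, and register file 36 drives bus 33. Selectors 34 and 35 then select input register 30 and bus 33 respectively.
[0070]
Finally, for a memory-memory operation, input registers 30 and 31 are loaded from the four-bank RAM 14 via buses 37 and 38. Selectors 34 and 35 then select input registers 30 and 31 respectively.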
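The operand routing of [0067] to [0070] can be captured as a small lookup table. This is an illustrative sketch: the mode names, the string keys for the sources, and the 16-bit width are assumptions, not identifiers from the source.

```python
# Selector settings for ALU 29 in Embodiment 7, taken from [0067]-[0070].
SELECTOR_SOURCES = {
    "acs": ("adder 19", "adder 20"),
    "register-register": ("bus 32", "bus 33"),
    "register-memory": ("bus 32", "input register 31"),
    "memory-register": ("input register 30", "bus 33"),
    "memory-memory": ("input register 30", "input register 31"),
}

def alu_inputs(kind: str, values: dict[str, int]) -> tuple[int, int]:
    """Return the operands that selectors 34 and 35 pass to ALU 29."""
    sel34, sel35 = SELECTOR_SOURCES[kind]
    return values[sel34], values[sel35]

def acs_compare_msb(values: dict[str, int]) -> int:
    """ACS mode: ALU 29 subtracts the two adder outputs and yields the MSB."""
    x, y = alu_inputs("acs", values)
    return ((x - y) & 0xFFFF) >> 15      # 16-bit width assumed
```

Only the ACS row uses the adder outputs. The other four rows are the ALU's ordinary operand paths, which is what allows one comparator to be folded into the existing ALU.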
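[0071]
In this embodiment, one of the comparators used for the ACS operation is shared with the ALU. This reduces chip area, and therefore cost, when the arithmetic processing unit is implemented as an LSI. A dual-port RAM can also be used to store the path metrics.
[0072]
(Embodiment 8)
FIG. 15 shows the configuration of the arithmetic processing unit of Embodiment 8. It differs from Embodiment 7 (FIG. 14) only in that the two adders of each comparison means are replaced by 4:2 compressors 39 and 40. All other configuration and operation are exactly the same as in Embodiment 7.
[0073]
In FIG. 15, 39 is a 4:2 compressor that takes data from bus 4 and input register 16 and sends its results to selectors 34 and 35. 40 is a 4:2 compressor that takes data from bus 4 and input register 17 and sends its results to comparator 26. This embodiment suits the pipelined processing shown in FIG. 10.
[0074]
The ACS operation of this embodiment is now described for K = 4 and code rate 1/2. The description uses the convolutional encoder of FIG. 2, the butterfly structure of FIG. 3, and the FIG. 12 comparison between the ACS operation from nodes N0, N1 to nodes N'0, N'4 and the one from nodes N6, N7 to nodes N'3, N'7.
[0075]
Each of 4:2 compressors 39 and 40 is built from single-bit blocks that perform the processing shown in FIG. 16. One block is provided per bit of the single-precision word, and the blocks are connected in series. The compressor performs the addition faster than ordinary full adders.
[0076]
As shown in FIG. 15, input registers 16 and 17 output two path metrics as {A, B}, and swap circuit 18 outputs two branch metrics as {C, D}. 4:2 compressor 39 takes four inputs: path metric A, branch metric C, the inverse $\overline{B}$ of path metric B, and the inverse $\overline{D}$ of branch metric D. ALU 29 receives the two outputs of 4:2 compressor 39 via selectors 34 and 35 and adds them. To form the two's complements of B and D, a "1" is applied to the least-significant carry input of both 4:2 compressor 39 and ALU 29. The result is A + C - (B + D), and its MSB is output. Adder 22 adds A and C and outputs A + C, and adder 23 adds B and D and outputs B + D.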
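[0077]
Meanwhile, 4:2 compressor 40 takes path metric A, branch metric D, the inverse $\overline{B}$ of path metric B, and the inverse $\overline{C}$ of branch metric C. Comparator 26 receives the two outputs of 4:2 compressor 40 and adds them. To form the two's complements of B and C, a "1" is applied to the least-significant carry input of both 4:2 compressor 40 and comparator 26. The result is A + D - (B + C), and its MSB is output. Adder 27 adds A and D and outputs A + D, and adder 28 adds B and C and outputs B + C.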
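The subtraction trick of [0076] and [0077] works because $\overline{X} + 1 = -X$ in two's complement. It can be checked with a word-level model. FIG. 16 is not reproduced, so the sketch assumes a common construction: a 4:2 compressor built from two chained 3:2 carry-save stages, followed by a carry-propagate adder. The two "+1" carry inputs enter at the compressor LSB and at the final adder.

```python
WIDTH = 16                      # single-precision width (assumed)
MASK = (1 << WIDTH) - 1

def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder: x + y + z == s + 2 * c (no carry propagation)."""
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (y & z)
    return s, c

def compress_4_2(w: int, x: int, y: int, z: int, cin: int = 0) -> tuple[int, int]:
    """4:2 compressor modelled as two chained 3:2 stages (assumed structure).

    Returns (sum, carry) words whose total is w + x + y + z + cin mod 2**WIDTH.
    """
    s1, c1 = csa(w, x, y)
    s2, c2 = csa(s1, ((c1 << 1) | cin) & MASK, z)   # cin enters at the LSB
    return s2, (c2 << 1) & MASK

def compare_msb(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """A + C - (B + D) as A + C + ~B + ~D + 1 + 1; returns (MSB, 16-bit result)."""
    s, k = compress_4_2(a & MASK, c & MASK, ~b & MASK, ~d & MASK, cin=1)
    total = (s + k + 1) & MASK          # final adder (ALU 29) with carry-in 1
    return total >> (WIDTH - 1), total
```

Calling `compare_msb(a, b, d, c)` gives the comparator-26 path, A + D - (B + C). Only the final adder propagates carries, which is where the speed advantage over two full-width adders comes from.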
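[0078]
With this configuration, setting the two path metrics in input registers 16 and 17 to {A, B} = {PM1, PM0} and the output of swap circuit 18 to {C, D} = {BM1, BM0} realizes the ACS operation of FIG. 12 for the transitions from old-state nodes N0 and N1 to nodes N'0 and N'4.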
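[0079]
Likewise, setting the two path metrics to {A, B} = {PM7, PM6} and the output of swap circuit 18 to {C, D} = {BM0, BM1} realizes the ACS operation of FIG. 12 for the transitions from old-state nodes N6 and N7 to nodes N'3 and N'7. Two path metrics are therefore updated in one machine cycle through pipelined operation of the DSP.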
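[0080]
In this embodiment, the comparison means for the ACS operation use 4:2 compressors. These compute faster than a pair of adders, so overall operation is faster. The example used K = 4, code rate 1/2 and old-state nodes N0, N1, N6 and N7. The same relationship also holds for nodes N2, N3, N4 and N5, and for other combinations of constraint length K and code rate, so the invention applies to them in the same way. A dual-port RAM can also be used to store the path metrics.
[0081]
(Embodiment 9)
FIG. 17 shows the configuration of the arithmetic processing unit of Embodiment 9. It differs from Embodiment 8 (FIG. 15) in that the addition means are double-precision adders, at least one of which is shared with a double-precision AU (arithmetic unit). All other configuration and operation are exactly the same as in Embodiment 8.
[0082]
In FIG. 17, 41 is a double-precision AU that takes double-precision data from input register 16 and bus 4 and performs double-precision arithmetic. 42 is a double-precision adder that takes double-precision data from input register 17 and bus 4 and performs double-precision addition. The output of AU 41 goes to selection means 8 and bus 13, and the output of adder 42 goes to selection means 12. This embodiment suits the pipelined processing shown in FIG. 10.
[0083]
For an ACS operation in this embodiment, double-precision AU 41 takes two path metrics from input register 16 as the double-precision word {A, B}, and two branch metrics from swap circuit 18 via bus 4 as the double-precision word {C, D}. AU 41 then performs a double-precision addition. However, as shown in FIG. 18, the carry out of the single-precision MSB position into the next bit is forced to zero. This computes the two path-metric/branch-metric sums {A + C, B + D} simultaneously.
[0084]
Meanwhile, double-precision adder 42 takes two path metrics from input register 17 as {A, B}, and two branch metrics from swap circuit 18 via bus 4 as {D, C}. Like AU 41, it forces the carry out of the single-precision MSB position to zero and computes {A + D, B + C} simultaneously.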
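The carry control of FIG. 18 amounts to a 32-bit adder that can be split into two independent 16-bit lanes. The sketch below assumes 16-bit single precision and uses a hypothetical `acs_mode` flag for the control signal.

```python
MASK32 = 0xFFFF_FFFF

def pack(hi: int, lo: int) -> int:
    """Place two 16-bit values in one 32-bit word as {hi, lo}."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def dp_add(x: int, y: int, acs_mode: bool) -> int:
    """32-bit adder; in ACS mode the carry out of bit 15 is forced to zero."""
    x &= MASK32
    y &= MASK32
    if not acs_mode:                               # ordinary double precision
        return (x + y) & MASK32
    lo = ((x & 0xFFFF) + (y & 0xFFFF)) & 0xFFFF    # carry into bit 16 dropped
    hi = ((x >> 16) + (y >> 16)) & 0xFFFF
    return (hi << 16) | lo
```

With A = 4, B = 10, C = 3 and D = 1:
・AU 41: `dp_add(pack(A, B), pack(C, D), True)` yields `pack(A + C, B + D)`;
・adder 42: `dp_add(pack(A, B), pack(D, C), True)` yields `pack(A + D, B + C)`.
With the flag cleared, the same hardware performs a normal 32-bit addition, such as the accumulation in [0085].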
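[0085]
In this embodiment, double-precision AU 41 serves as an addition means for the ACS operation. Control logic forces the carry out of the single-precision MSB position to zero during ACS operations and lets it propagate during other double-precision arithmetic. The AU can then be shared with, for example, the double-precision accumulating adder used for multiply-accumulate operations. This further reduces chip area and cost when the unit is implemented as an LSI. A dual-port RAM can also be used to store the path metrics.
[0086]
(Embodiment 10)
FIG. 19 shows the configuration of the arithmetic processing unit of Embodiment 10. It differs from Embodiment 9 (FIG. 17) only in that shift registers are used as the storage means for the comparison results. All other configuration and operation are exactly the same as in Embodiment 9.
[0087]
In FIG. 19, 43 is a shift register that receives the MSB of the result of ALU 29, and 44 is a shift register that receives the MSB of the result of comparator 26. Both shift registers 43 and 44 can output data to bus 2. This embodiment suits the pipelined processing shown in FIG. 10.
[0088]
During ACS operations in this embodiment, the MSB of each comparison result from ALU 29 is shifted into shift register 43, and the MSB of each comparison result from comparator 26 is shifted into shift register 44. The shift registers thus store the path select signals, which record which of two paths was chosen and are used for traceback after the ACS operations. Suppose the shift registers are, for example, one single-precision word wide. Then after as many ACS operations as there are bits in a word, the contents of shift registers 43 and 44 must be written via bus 2 into the four-bank RAM 14 as path select signals.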
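The shift-register bookkeeping of [0088] can be sketched as a small class. Two stand-ins are assumed: the width is 16 bits (one single-precision word), and a Python list plays the role of RAM 14 written via bus 2. The class name is hypothetical.

```python
class PathSelectShiftRegister:
    """Collect one comparison MSB per ACS operation; flush when full."""

    def __init__(self, memory: list[int], width: int = 16):
        self.memory, self.width = memory, width
        self.value, self.count = 0, 0

    def shift_in(self, bit: int) -> None:
        self.value = ((self.value << 1) | (bit & 1)) & ((1 << self.width) - 1)
        self.count += 1
        if self.count == self.width:     # register full: store path selects
            self.memory.append(self.value)
            self.value, self.count = 0, 0
```

The first bit shifted in ends up as the word's MSB. Alternating 1, 0 sixteen times therefore stores `0xAAAA` in memory.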
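[0089]
In this embodiment, shift registers 43 and 44 store the comparison results of the ACS operation. They can be shared with shift registers used by other instructions, for example division-type instructions. This further reduces chip area and cost when the unit is implemented as an LSI. A dual-port RAM can also be used to store the path metrics.
[0090]
(Embodiment 11)
FIG. 20 shows the configuration of the arithmetic processing unit of Embodiment 11. It differs from Embodiment 10 (FIG. 19) in three respects:
・input register 17 always loads the path-metric data from bus 2 in swapped order;
・4:2 compressor 40 receives the branch-metric data from swap circuit 18 as is, without swapping;
・the negated comparison result of comparator 26 is shifted into shift register 44.
All other configuration and operation are exactly the same as in Embodiment 10. This embodiment suits the pipelined processing shown in FIG. 10.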
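[0091]
During ACS operations in this embodiment, two path metrics {A, B} are loaded into input register 16 unchanged, as {A, B}, but always into input register 17 in swapped order, as {B, A}. 4:2 compressor 40 then receives the branch metrics from swap circuit 18 as C and $\overline{D}$, and the path metrics from input register 17 as B and $\overline{A}$. Comparator 26 adds the two outputs of 4:2 compressor 40 and so computes B + C - (A + D). Meanwhile, double-precision adder 42 receives the branch metrics from swap circuit 18 as {C, D} and the path metrics from input register 17 as {B, A}. It computes B + C and A + D in parallel and outputs them to selection means 12 as {B + C, A + D}. Comparator 26 sends the MSB of its comparison result to selection means 12, and the negated MSB to shift register 44. The stored path select bit therefore keeps the same meaning as in Embodiment 10.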
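The claim that the swapped register plus a negated MSB gives the same result as Embodiment 10 can be checked with a small model of the N'4-side half of both embodiments. This sketch makes several assumptions:
・16-bit sign bits;
・MSB = 1 selects the upper value, as in Embodiment 1;
・the 4:2 compressor arithmetic is replaced by plain integer subtraction.
Both versions keep the same minimum. Whenever the candidates differ, both store the same path select bit, where 1 means the A + D candidate was chosen.

```python
def msb16(x: int) -> int:
    """Sign bit of x as a 16-bit two's-complement word."""
    return (x & 0xFFFF) >> 15

def lower_half_e10(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Embodiment 10: compare A + D - (B + C); adder 42 outputs {A + D, B + C}."""
    m = msb16((a + d) - (b + c))
    selected = (a + d) if m else (b + c)    # MSB = 1 -> upper value (A + D)
    return selected, m                      # m is shifted into register 44

def lower_half_e11(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Embodiment 11: register 17 holds {B, A}; branch metrics arrive as {C, D}."""
    hi, lo = b, a                           # swapped path metrics
    m = msb16((hi + c) - (lo + d))          # B + C - (A + D) via compressor 40
    selected = (hi + c) if m else (lo + d)  # adder 42 outputs {B + C, A + D}
    return selected, 1 - m                  # negated MSB goes to register 44
```

On an exact tie the two versions keep different but equal-valued candidates. Each still stores a bit that correctly names the candidate it kept.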
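[0092]
In this embodiment, one of the two input registers loads the path metrics in swapped order. This removes the swapping at the inputs of 4:2 compressor 40 and double-precision adder 42 in the execution (EX) stage, so the ACS operation runs faster. A dual-port RAM can also be used to store the path metrics.
[0093]
(Embodiment 12)
FIG. 21 shows the configuration of a mobile station apparatus according to Embodiment 12. Mobile station apparatus 45 comprises:
・an antenna unit 46 used for both transmission and reception;
・a radio unit 47 consisting of a receiving unit 48 and a transmitting unit 49;
・a baseband signal processing unit 50 that modulates, demodulates, encodes and decodes signals;
・a speaker 58 that outputs speech and a microphone 59 that picks up speech;
・a data input/output unit 60 that exchanges transmitted and received data with external devices;
・a display unit 61 that shows the operating status and an operation unit 62 such as a numeric keypad;
・a control unit 63 that controls antenna unit 46, radio unit 47, baseband signal processing unit 50, display unit 61, operation unit 62 and so on.
[0094]
Baseband signal processing unit 50 consists of a demodulation unit 51 that demodulates received signals, a modulation unit 52 that modulates transmit signals, and a single-chip DSP 53. DSP 53 implements the following in software:
・a Viterbi decoding unit 55, formed by the arithmetic processing unit of any of Embodiments 1 to 11;
・a convolutional coding unit 56 that convolutionally encodes transmit signals;
・a speech codec unit 57 that encodes and decodes speech signals;
・a timing control unit 54 that manages transmit and receive timing, passing received signals from demodulation unit 51 to Viterbi decoding unit 55 and transmit signals from convolutional coding unit 56 to modulation unit 52.
[0095]
Control unit 63 controls the overall operation of mobile station apparatus 45. For example, it displays signals entered on operation unit 62 on display unit 61. In response to signals from operation unit 62, it also sends control signals for placing and receiving calls to antenna unit 46, radio unit 47, baseband signal processing unit 50 and so on, following the communication sequence.
[0096]
When mobile station apparatus 45 transmits speech, the speech signal from microphone 59 is A/D-converted (not shown) and encoded by speech codec unit 57 of DSP 53. The encoded data is then passed to convolutional coding unit 56. When data is transmitted, data from outside enters convolutional coding unit 56 via data input/output unit 60. Convolutional coding unit 56 encodes the data and passes it to timing control unit 54. Timing control unit 54 reorders the data, adjusts the transmit timing, and passes it to modulation unit 52. Modulation unit 52 digitally modulates the data, which is then D/A-converted (not shown) and sent to transmitting unit 49 of radio unit 47. Transmitting unit 49 converts it to a radio signal and feeds it to antenna unit 46, which radiates it.
[0097]
On reception, radio waves picked up by antenna unit 46 are received by receiving unit 48 of radio unit 47, A/D-converted, and passed to demodulation unit 51 of baseband signal processing unit 50. The demodulated data is reordered and otherwise processed by timing control unit 54, then decoded by Viterbi decoding unit 55. During voice communication, the decoded data is speech-decoded by speech codec unit 57, D/A-converted, and played as sound through speaker 58. During data communication, the decoded data is output externally via data input/output unit 60.
[0098]
Mobile station apparatus 45 implements Viterbi decoding unit 55, convolutional coding unit 56, speech codec unit 57 and timing control unit 54 in software on the single-chip DSP 53, so it can be built from few components. Viterbi decoding unit 55 is formed by the arithmetic processing unit of any of Embodiments 1 to 11. Pipelined processing on DSP 53 therefore updates two path metrics per machine cycle, so DSP 53 performs the ACS operations of Viterbi decoding quickly and with relatively little processing.
[0099]
Demodulation unit 51 and modulation unit 52 are shown separately from DSP 53 here, but they can also be implemented in software on DSP 53. Alternatively, the DSP of Embodiment 6 can be used, with convolutional coding unit 56, speech codec unit 57 and timing control unit 54 implemented as separate components.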
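[0100]
(Embodiment 13)
FIG. 22 shows the configuration of a mobile station apparatus according to Embodiment 13. Mobile station apparatus 45A differs from mobile station apparatus 45 of Embodiment 12 (FIG. 21) in its CDMA baseband signal processing unit 50A. In unit 50A, modulation unit 52A includes a spreading unit 65 and demodulation unit 51A includes a despreading unit 64. The rest of its configuration and operation is similar in most respects to Embodiment 12. For CDMA communication, timing control unit 54 may also include a RAKE receiver that combines multiple fingers selected from a delay profile or the like (not shown).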
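[0101]
With despreading unit 64 in demodulation unit 51A and spreading unit 65 in modulation unit 52A, mobile station apparatus 45A can be used for CDMA communication.
[0102]
(Embodiment 14)
FIG. 23 shows the configuration of a base station apparatus according to Embodiment 14. Components with functions similar to those in FIG. 21 carry the same reference numerals. Base station apparatus 68 comprises:
・an antenna unit 46 consisting of a receiving antenna 66 and a transmitting antenna 67;
・a radio unit 47 consisting of a receiving unit 48 and a transmitting unit 49;
・a baseband signal processing unit 69 that modulates, demodulates, encodes and decodes signals;
・a data input/output unit 60 that exchanges transmitted and received data with a wired line;
・a control unit 63 that controls antenna unit 46, radio unit 47, baseband signal processing unit 69 and so on.
[0103]
Baseband signal processing unit 69 consists of a demodulation unit 51 that demodulates received signals, a modulation unit 52 that modulates transmit signals, and a single-chip DSP 53A. DSP 53A implements the following in software:
・a Viterbi decoding unit 55, formed by the arithmetic processing unit of any of Embodiments 1 to 11;
・a convolutional coding unit 56 that convolutionally encodes transmit signals;
・a timing control unit 54 that manages transmit and receive timing, passing received signals from demodulation unit 51 to Viterbi decoding unit 55 and transmit signals from convolutional coding unit 56 to modulation unit 52.
[0104]
Base station apparatus 68 transmits and receives under the control of its control unit 63. Data from the wired line enters convolutional coding unit 56 via data input/output unit 60. Convolutional coding unit 56 encodes the data and passes it to timing control unit 54. Timing control unit 54 reorders the data, adjusts the transmit timing, and passes it to modulation unit 52. Modulation unit 52 digitally modulates the data, which is then D/A-converted (not shown) and sent to transmitting unit 49 of radio unit 47. Transmitting unit 49 converts it to a radio signal and feeds it to antenna unit 46, which radiates it.
[0105]
On reception, radio waves picked up by antenna unit 46 are received by receiving unit 48 of radio unit 47, A/D-converted, and passed to demodulation unit 51 of baseband signal processing unit 69. The demodulated data is reordered and otherwise processed by timing control unit 54, then decoded by Viterbi decoding unit 55. The decoded data is output to the wired line via data input/output unit 60.
[0106]
Base station apparatus 68 implements Viterbi decoding unit 55, convolutional coding unit 56 and timing control unit 54 in software on the single-chip DSP 53A, so it can be built from few components. Viterbi decoding unit 55 is formed by the arithmetic processing unit of any of Embodiments 1 to 11. Pipelined processing on DSP 53A therefore updates two path metrics per machine cycle, so DSP 53A performs the ACS operations of Viterbi decoding quickly and with relatively little processing.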
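[0107]
Demodulation unit 51 and modulation unit 52 are shown separately from DSP 53A here, but they can also be implemented in software on DSP 53A. Alternatively, the DSP of Embodiment 6 can be used as DSP 53A, with convolutional coding unit 56 and timing control unit 54 implemented as separate components.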
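[0108]
(Embodiment 15)
FIG. 24 shows the configuration of a base station apparatus according to Embodiment 15. Base station apparatus 68A differs from base station apparatus 68 of Embodiment 14 (FIG. 23) in its CDMA baseband signal processing unit 69A. In unit 69A, modulation unit 52A includes a spreading unit 65 and demodulation unit 51A includes a despreading unit 64. The rest of its configuration and operation is similar in most respects to Embodiment 14. For CDMA communication, timing control unit 54 may also include a RAKE receiver that combines multiple fingers selected from a delay profile or the like (not shown).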
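[0109]
With despreading unit 64 in demodulation unit 51A and spreading unit 65 in modulation unit 52A, base station apparatus 68A can be used for CDMA communication.
[0110]
[Effects of the Invention]
As described above, the arithmetic processing unit of the present invention comprises:
・instruction decoding means for decoding instructions;
・storage means of a predetermined bit width that holds two metrics in its upper and lower halves;
・access means that reads the two metrics by accessing the upper and lower halves of the storage means in parallel;
・a first comparison means that receives first to fourth data, the four sums formed by adding one of two path metrics to one of two branch metrics, and compares the first data with the second data;
・a second comparison means that operates in parallel with the first and compares the third data with the fourth data.
Pipelined processing on a DSP therefore updates two path metrics in one machine cycle, so the DSP performs the ACS operations of Viterbi decoding quickly and with relatively little processing. As a result, portable terminals can be made smaller, lighter and cheaper, with longer battery life.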
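[Brief Description of the Drawings]
[FIG. 1] Block diagram showing the configuration of the arithmetic processing unit in Embodiment 1 of the present invention
[FIG. 2] Block diagram showing an example of a rate-1/2 convolutional encoder
[FIG. 3] Schematic diagram showing the butterfly structure for constraint length K = 4
[FIG. 4] Block diagram showing the configuration of the arithmetic processing unit in Embodiment 2 of the present invention
[FIG. 5] Timing chart explaining the pipeline operation of the arithmetic processing unit in Embodiment 2 of the present invention
[FIG. 6] Schematic diagram showing an example of the memory-access operation of the four-bank RAM 14 in Embodiment 2 of the present invention
[FIG. 7] Block diagram showing the configuration of the arithmetic processing unit in Embodiment 3 of the present invention
[FIG. 8] Schematic diagram showing an example of the memory-access operation of dual-port RAM 15 in Embodiment 3 of the present invention
[FIG. 9] Block diagram showing the configuration of the arithmetic processing unit in Embodiment 4 of the present invention
[FIG. 10] Timing chart explaining the pipeline operation of the arithmetic processing unit in Embodiment 4 of the present invention
[FIG. 11] Block diagram showing the configuration of the arithmetic processing unit in Embodiment 5 of the present invention
[FIG. 12] Table comparing the ACS operation from nodes N0, N1 to nodes N'0, N'4 with the ACS operation from nodes N6, N7 to nodes N'3, N'7
[FIG. 13] Block diagram showing the configuration of the arithmetic processing unit in Embodiment 6 of the present invention
[FIG. 14] Block diagram showing the configuration of the arithmetic processing unit in Embodiment 7 of the present invention
[FIG. 15] Block diagram showing the configuration of the arithmetic processing unit in Embodiment 8 of the present invention
[FIG. 16] Input/output diagram of the 4:2 compressor in Embodiment 8 of the present invention
[FIG. 17] Block diagram showing the configuration of the arithmetic processing unit in Embodiment 9 of the present invention
[FIG. 18] Diagram explaining the carry control of the double-precision AU
[FIG. 19] Block diagram showing the configuration of the arithmetic processing unit in Embodiment 10 of the present invention
[FIG. 20] Block diagram showing the configuration of the arithmetic processing unit in Embodiment 11 of the present invention
[FIG. 21] Block diagram showing the configuration of the mobile station apparatus in Embodiment 12 of the present invention
[FIG. 22] Block diagram showing the configuration of the mobile station apparatus in Embodiment 13 of the present invention
[FIG. 23] Block diagram showing the configuration of the base station apparatus in Embodiment 14 of the present invention
[FIG. 24] Block diagram showing the configuration of the base station apparatus in Embodiment 15 of the present invention
[FIG. 25] State transition diagram (trellis diagram) showing the state-transition paths of a convolutional encoder in Viterbi decoding
[FIG. 26] Schematic diagram showing the butterfly structure of a trellis diagram
[FIG. 27] Schematic diagram showing an example of the code generated by a convolutional encoder
[FIG. 28] Program listing showing an example of Viterbi computation for channel coding
[FIG. 29] Schematic diagram showing an example of pointer control and path-metric storage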
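[Description of Reference Numerals]
1 Storage means for storing path metrics
2 Bus
3 Storage means for storing branch metrics
4 Bus
5 Comparison means
6 Addition means
7 Storage means for storing comparison results
8 Selection means
9 Comparison means
10 Addition means
11 Storage means for storing comparison results
12 Selection means
13 Bus
14 RAM consisting of four banks
15 Dual-port RAM
16 Input register
17 Input register
18 Swap circuit
19 Adder
20 Adder
21 Comparator
22 Adder
23 Adder
24 Adder
25 Adder
26 Comparator
27 Adder
28 Adder
29 ALU
30 Input register
31 Input register
32 Bus
33 Bus
34 Selector
35 Selector
36 Register file
37 Bus
38 Bus
39 4:2 compressor
40 4:2 compressor
41 Double-precision AU
42 Double-precision adder
43 Shift register
44 Shift register
45 Mobile station apparatus
46 Antenna unit
47 Radio unit
48 Receiving unit
49 Transmitting unit
50 Baseband signal processing unit
51 Demodulation unit
52 Modulation unit
53 DSP
54 Timing control unit
55 Viterbi decoding unit
56 Convolutional coding unit
57 Speech codec unit
58 Speaker
59 Microphone
60 Data input/output unit
61 Display unit
62 Operation unit
63 Control unit
64 Despreading unit
65 Spreading unit
66 Receiving antenna
67 Transmitting antenna
68 Base station apparatus
69 Baseband signal processing unit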
BACKGROUND OF THE INVENTION
The present invention relates to an arithmetic processing apparatus incorporated in a mobile communication device or the like, and more particularly to a technique that enables efficient processing of ACS (addition, comparison, selection) operation of Viterbi decoding.
[0002]
[Prior art]
2. Description of the Related Art In recent years, digital signal processors (hereinafter referred to as DSPs) are frequently used as, for example, devices embedded in mobile phones in accordance with the trend of digitization in the mobile communication field. In data communication in a mobile radio communication line, bit errors frequently occur, so it is necessary to perform error correction processing. As an error correction method, there is a method in which a convolutional code generated from input bits is decoded by Viterbi decoding on the receiving side, and a DSP is used for this error correction processing.
[0003]
Viterbi decoding realizes maximum likelihood decoding of a convolutional code by repeating a simple process of addition, comparison, and selection, and finally by a traceback operation for decoding data. The Viterbi decoding process will be briefly described below. A convolutional code is generated by mod2 addition of an input bit and a certain number of bits preceding it, and a plurality of encoded data is generated corresponding to one input bit. The number of input information bits affecting the encoded data is called a constraint length (K), and the number is equal to the number of stages of shift registers used for mod2 addition.
[0004]
This encoded data is determined by the input bits and the state of the preceding (K-1) input bits. This state shifts to a new state when a new information bit is input (transition), but the state that can be transitioned is determined by whether the new input bit is 0 or 1. The number of states is (K-1) because each of these bits can take 1 and 0.^K-1 It becomes a piece.
[0005]
In Viterbi decoding, the received encoded data sequence is observed, and the most probable state is estimated from all possible state transitions. Therefore, every time the encoded data (received data series) corresponding to 1 information bit is obtained, the inter-signal distance (metric) of the path to each state at that time is calculated, and among the paths reaching the same state The operation of leaving the one with a smaller metric as a surviving path is sequentially repeated.
[0006]
FIG. 25 shows a state S [n] and a state S [n + 2] at a previous time point with respect to a state S [2n] (n is a positive integer) at a certain time point in a convolutional encoder with a constraint length K.^K-2 ] Shows a state in which two paths representing state transitions are extended. For example, in the case of K = 3, when n = 1, for S [2], that is, the state of S10 (the state in which the preceding two bits are input in the order of “1” and “0”), S [1 ], That is, a transition from the state of S01 and S [3], that is, the state of S11, and when n = 2, the state of S [4], that is, the state of S00 (the state represented by the lower 2 bits) , S [2], that is, the state of S10, and S [4], that is, the state of S00.
[0007]
The path metric a is an inter-signal distance (branch metric x) between the output symbol of the path input to the state S [2n] and the received data series, and the branch of the surviving path up to the state S [n] at the previous time point. It is the sum with path metric A which is the sum of metrics. Similarly, the path metric b is the distance (branch metric y) between the output symbol of the path input to the state S [2n] and the received data series, and the state S [n + 2 at the previous time point.^K-2 ] With the path metric B, which is the sum of the branch metrics of the surviving paths. The path metrics a and b input to the state S [2n] thus obtained are compared, and the smaller path is selected as the surviving path.
[0008]
In Viterbi decoding, each process of addition, path metric comparison, and path selection for obtaining a path metric is performed at each point in time as described above.^K-1 Execute for each state. Further, in selecting a path, the history of which path has been selected is stored in the path select signal PS [i], [i = 0-2.^K-1 -1] must be left as it is. At this time, the subscript (for example, n) of the state immediately before the selected path is the subscript (n + 2) of the state before the other path that has not been selected.^K-2 ) Is smaller than PS), PS [i] = 0, and larger is PS [i] = 1. In the case of FIG. 25, n <(n + 2^K-2 Therefore, when a> b, the state S [n + 2^K-2 ] Is selected and PS [S2n] = 1, and when a ≦ b, the state S [n] is selected and PS [S2n] = 0. Finally, when decoding by traceback, the data is decoded while tracing back the surviving path based on this path select signal.
[0009]
This ACS arithmetic processing for Viterbi decoding in a conventional DSP will be described by taking TMS320C54x (manufactured by TEXAS INSTRUMENTS, hereinafter referred to as C54x) as a general-purpose arithmetic unit as an example. In the GSM cellular radio system, the following polynomials are used as convolutional codes.
G1 (D) = 1 + D^Three + D^Four , G2 (D) = 1 + D + D^Three + D^Four
[0010]
This convolutional code is represented by a trellis diagram having a butterfly structure shown in FIG. This trellis diagram shows how the convolutional code transitions from one state to another. If the constraint length K is now 5, 2^K-1 = 16 states or 8 butterfly structures exist between each symbol, and two branches are input to each state, and a new path metric is determined by ACS operation.
[0011]
The branch metric is defined by the following formula.
$$M = SD(2i)\cdot B(J,0) + SD(2i+1)\cdot B(J,1)$$
Here SD(2i) is the first symbol and SD(2i+1) the second symbol of the symbol metric representing the soft-decision input, and B(J,0) and B(J,1) correspond to the code bits generated by the convolutional encoder shown in FIG.
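The branch-metric formula can be sketched as follows. The mapping of code bit 0 to +1 and code bit 1 to −1 for B(J,0) and B(J,1) is an assumption made for illustration. With this mapping, complementary code words give metrics M and −M, which matches the pairing of M and −M used in the butterfly computation.

```python
def branch_metric(sd_pair, code_bits):
    """M = SD(2i)*B(J,0) + SD(2i+1)*B(J,1).

    sd_pair   = (SD(2i), SD(2i+1)), the soft-decision symbol pair
    code_bits = encoder output bits of the branch; mapped 0 -> +1, 1 -> -1 (assumed)
    """
    b0 = 1 - 2 * code_bits[0]
    b1 = 1 - 2 * code_bits[1]
    return sd_pair[0] * b0 + sd_pair[1] * b1


# Complementary code words give M and -M:
print(branch_metric((5, -3), (0, 0)), branch_metric((5, -3), (1, 1)))  # 2 -2
```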
[0012]
In C54x, butterflies are processed at high speed by setting the ALU to dual 16-bit mode. To determine the new path metric J, the DADST instruction combines the two old path metrics 2J and 2J + 1 with the branch metrics (M and −M) in parallel, and the comparison is performed with the CMPS instruction. To determine the new path metric J + 8, the DSADT instruction likewise combines the two path metrics with the branch metrics (M and −M) in parallel. The two results are stored in the upper and lower halves of the double-precision accumulator, and the new path metric is determined by the CMPS instruction.
[0013]
The CMPS instruction compares the upper and lower halves of the accumulator and stores the larger one in memory. At each comparison it also updates the 16-bit transition register (TRN) to record which value was selected, so that the choice can be traced back later. The contents of TRN are stored in memory after each symbol is processed, and this information is used to search for the optimum path during traceback. FIG. 28 shows a macro program for the butterfly calculation of Viterbi decoding; the branch metric value is loaded into the T register before the macro is called. FIG. 29 shows an example of the memory mapping of the path metrics.
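The butterfly described above can be modelled in a few lines. This is not a statement of the exact C54x instruction semantics. It is a hypothetical Python sketch that assumes the following:

* Metrics are correlation-style, so the larger value survives, as with CMPS.
* Old metrics 2J and 2J + 1 are combined with +M/−M for new metric J and with −M/+M for new metric J + 8.
* A decision bit of 1 means the candidate from old metric 2J + 1 won.

The sign pattern and bit polarity are assumptions.

```python
MASK16 = 0xFFFF


def butterfly(old_pm, J, M, trn, half=8):
    """Hypothetical model of one butterfly (K = 5: half = 8), larger metric survives.

    new[J]      = max(old[2J] + M, old[2J+1] - M)
    new[J+half] = max(old[2J] - M, old[2J+1] + M)
    Each decision is shifted into a 16-bit transition word `trn`
    (1 = candidate from old[2J+1] won; assumed polarity).
    """
    new = {}
    candidates = (
        (J, old_pm[2 * J] + M, old_pm[2 * J + 1] - M),
        (J + half, old_pm[2 * J] - M, old_pm[2 * J + 1] + M),
    )
    for dst, x, y in candidates:
        bit = 1 if y > x else 0
        new[dst] = y if bit else x
        trn = ((trn << 1) | bit) & MASK16
    return new, trn


print(butterfly([10, 20], J=0, M=3, trn=0))  # ({0: 17, 8: 23}, 3)
```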
[0014]
In one symbol interval, 8 butterfly operations are performed and 16 new states are obtained. This processing is repeated over a number of symbol intervals. When it is complete, traceback is performed: the optimum path is found among the 16 paths, and the decoded bit sequence is obtained.
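Traceback can be sketched as follows, using the state numbering of FIG. 25. This is a minimal Python sketch that assumes the following:

* States are 0-based, with predecessors s >> 1 and (s >> 1) + $2^{K-2}$.
* PS values follow the convention of paragraph [0008].
* The low bit of each state is the input bit that entered it.

The function `traceback` and its arguments are hypothetical. A real decoder would also pick the final state with the best metric.

```python
def traceback(ps_history, final_state, K):
    """Recover input bits from stored path select signals (oldest step first)."""
    half = 1 << (K - 2)
    s = final_state
    bits = []
    for ps in reversed(ps_history):
        bits.append(s & 1)                 # input bit that led into state s
        s = (s >> 1) + ps[s] * half        # step back to the surviving predecessor
    bits.reverse()
    return bits


# Path 0 -> 1 -> 2 -> 1 for K = 3 (inputs 1, 0, 1); only entries on the path matter.
history = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]]
print(traceback(history, final_state=1, K=3))  # [1, 0, 1]
```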
[0015]
This is how the general-purpose DSP C54x performs the ACS operation. As the macro program example in FIG. 28 shows, C54x updates two path metrics in 4 machine cycles.
[0016]
[Problems to be solved by the invention]
Demand for non-voice communication, such as data transmission over mobile radio links, is expected to keep growing. Non-voice communication requires higher transmission quality, that is, a lower bit error rate (hereinafter BER). One way to achieve a low BER is to increase the constraint length K of the Viterbi decoding used for error correction. Each increase of the constraint length by one doubles the number of path metrics (the number of states) and therefore doubles the amount of computation in Viterbi decoding. Moreover, non-voice communication generally carries more information than voice communication. The more information there is, the larger the processing load (ACS calculations, etc.) required for Viterbi decoding.
[0017]
On the other hand, mobile radio communication and similar applications call for long battery life in the portable terminal, together with smaller size, lighter weight, and lower price. For this reason, functions conventionally handled by a dedicated LSI are also being moved into DSP processing and integrated on a single chip. The lower the DSP processing load, the longer the battery lasts.
[0018]
However, as described above, the amount of DSP computation tends to increase, which makes it difficult to keep the battery of a portable terminal running for a long time. Furthermore, if the computation grows beyond the processing capacity of existing DSPs, the function can no longer be realized on a single DSP chip. Finally, enhancing the DSP with large-scale additional hardware raises the cost of the DSP itself, so the price of the portable terminal cannot be reduced.
[0019]
The present invention solves these conventional problems. Its object is to provide an arithmetic processing unit that efficiently executes Viterbi decoding on a DSP, in particular the ACS operation, with as little additional hardware as possible.
[0020]
[Means for Solving the Problems]
An arithmetic processing apparatus of the present invention is capable of Viterbi decoding by a digital signal processor. It comprises first comparison means for comparing first data with second data and second comparison means for comparing third data with fourth data. The first comparison means and the second comparison means are configured to operate in one cycle in response to a single instruction.
[0029]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
(Embodiment 1)
FIG. 1 shows the configuration of the arithmetic processing apparatus according to the first embodiment. In FIG. 1:

* 1 is storage means for storing path metrics.
* 2 is a bus connected to the storage means 1 that supplies data and transfers operation results.
* 3 is storage means for storing branch metrics.
* 4 is a bus connected to the storage means 3.
* 5 and 9 are comparison means that compare data read from the storage means 1 and 3 via the buses 2 and 4.
* 6 and 10 are adding means that add data read from the storage means 1 and 3 via the buses 2 and 4.
* 7 is storage means for storing the comparison result of the comparison means 5.
* 11 is storage means for storing the comparison result of the comparison means 9.
* 8 is selection means that receives the addition result of the adding means 6 and determines its output based on the comparison result of the comparison means 5.
* 12 is selection means that receives the addition result of the adding means 10 and determines its output based on the comparison result of the comparison means 9.
* 13 is a bus that receives the selection results of the selection means 8 and 12 and transfers them to the storage means 1.

The storage means 7 and 11 for the comparison results are both connected to the bus 2 and can transfer the comparison results to the storage means 1 via the bus 2.
[0030]
Next, the operation of this embodiment will be described with reference to FIGS. The following description assumes constraint length K = 4 and coding rate 1/2. Both the path metrics and the branch metrics are single-precision data. For convenience, double-precision data is written (X, Y), where X is the upper half and Y the lower half.
[0031]
Taking the convolutional encoder of FIG. 2 as an example, let the four branch metrics for coding rate 1/2 be BM0, BM1, BM2, and BM3. Plotting the state transitions for constraint length K = 4 with these branch metrics gives the butterfly structure shown in FIG. 3. Consider the nodes N0 and N1 of the old state, which transition to the nodes N′0 and N′4.
The branch metrics (BM) for these transitions are:
BM0 when node N0 to node N'0
BM1 when node N1 to node N'0
BM1 when node N0 to node N'4
BM0 when node N1 to node N'4
Let the path metric of node N0 be PM0 and that of node N1 be PM1. Then the path metrics of nodes N′0 and N′4 are both obtained by adding the branch metrics BM0 and BM1, in swapped order, to the common path metrics PM0 and PM1. Using this relationship, two path metrics can be updated simultaneously by parallel processing.
[0032]
As shown in FIG. 3, the same relationship holds for the subsequent node pairs (in the figure, the pairs N2 and N3, N4 and N5, and N6 and N7). Therefore, the ACS operations for the first-half nodes N′0 to N′3 are processed by the comparison means 5, the adding means 6, the storage means 7 for comparison results, and the selection means 8. The ACS operations for the second-half nodes N′4 to N′7 are processed by the comparison means 9, the adding means 10, the storage means 11 for comparison results, and the selection means 12.
[0033]
The ACS operation from nodes N0 and N1 to nodes N′0 and N′4 is now described in detail. First, two path metrics are output from the storage means 1 to the bus 2 as (PM1, PM0), and two branch metrics are output from the storage means 3 to the bus 4 as (BM1, BM0). The comparison means 5 receives the two path metrics (PM1, PM0) from the bus 2 and the two branch metrics (BM1, BM0) from the bus 4 and calculates
PM1 + BM1 − PM0 − BM0.
Meanwhile, the adding means 6 receives the two path metrics (PM1, PM0) from the bus 2 and the two branch metrics (BM1, BM0) from the bus 4. It calculates
PM1 + BM1 and PM0 + BM0
and outputs them to the selection means 8 as (PM1 + BM1, PM0 + BM0).
[0034]
The selection means 8 receives the most significant bit (hereinafter MSB), i.e., the sign bit, of the comparison result PM1 + BM1 − PM0 − BM0 from the comparison means 5. According to its value, it outputs either the upper value PM1 + BM1 or the lower value PM0 + BM0.
That is, if
PM1 + BM1 ≥ PM0 + BM0,
then PM1 + BM1 − PM0 − BM0 ≥ 0,
so the MSB is 0. In this case the lower value PM0 + BM0 is selected and output to the bus 13 as the new PM′0.
Conversely, if
PM1 + BM1 < PM0 + BM0,
then PM1 + BM1 − PM0 − BM0 < 0,
so the MSB is 1. In this case the upper value PM1 + BM1 is selected and output to the bus 13 as the new PM′0. At the same time, the MSB of the comparison result of the comparison means 5 is stored sequentially in the storage means 7.
[0035]
The comparison means 9 receives the two path metrics (PM1, PM0) from the bus 2 and the two branch metrics (BM1, BM0) from the bus 4 and calculates
PM1 + BM0 − PM0 − BM1.
Meanwhile, the adding means 10 receives the two path metrics (PM1, PM0) from the bus 2 and the two branch metrics (BM1, BM0) from the bus 4. It calculates
PM1 + BM0 and PM0 + BM1
and outputs them to the selection means 12 as (PM1 + BM0, PM0 + BM1).
[0036]
The selection means 12 receives the MSB of the comparison result PM1 + BM0 − PM0 − BM1 from the comparison means 9. According to its value, it outputs either the upper value PM1 + BM0 or the lower value PM0 + BM1.
That is, if
PM1 + BM0 ≥ PM0 + BM1,
then PM1 + BM0 − PM0 − BM1 ≥ 0,
so the MSB is 0. In this case the lower value PM0 + BM1 is selected and output to the bus 13 as the new PM′4.
Conversely, if
PM1 + BM0 < PM0 + BM1,
then PM1 + BM0 − PM0 − BM1 < 0,
so the MSB is 1. In this case the upper value PM1 + BM0 is selected and output to the bus 13 as the new PM′4.
At the same time, the MSB of the comparison result of the comparison means 9 is stored sequentially in the storage means 11.
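The two units of the first embodiment can be summarised by the Python model below. It is a behavioural sketch, not the circuit. It assumes 16-bit single-precision words and metrics small enough that the differences do not overflow. The names `msb16` and `dual_acs` are hypothetical. Each call corresponds to one cycle in which both comparison means and both selection means operate on the same operands.

```python
def msb16(value):
    """Sign bit of value seen as a 16-bit two's-complement word."""
    return (value & 0xFFFF) >> 15


def dual_acs(pm_pair, bm_pair, hist7, hist11):
    """One cycle of the two ACS units; pm_pair = (PM1, PM0), bm_pair = (BM1, BM0)."""
    pm1, pm0 = pm_pair
    bm1, bm0 = bm_pair
    s5 = msb16(pm1 + bm1 - pm0 - bm0)      # comparison means 5
    s9 = msb16(pm1 + bm0 - pm0 - bm1)      # comparison means 9
    hist7.append(s5)                        # storage means 7
    hist11.append(s9)                       # storage means 11
    new0 = pm1 + bm1 if s5 else pm0 + bm0   # selection means 8 -> PM'0
    new4 = pm1 + bm0 if s9 else pm0 + bm1   # selection means 12 -> PM'4
    return new0, new4


h7, h11 = [], []
print(dual_acs((1, 10), (2, 0), h7, h11), h7, h11)  # (3, 1) [1] [1]
```

Selecting on the sign bit is equivalent to keeping the smaller candidate, with ties going to the lower (PM0) term.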
[0037]
By performing the same processing for the other node pairs, the ACS operations of Viterbi decoding can be executed in parallel on the DSP. The above description gives a specific example for constraint length K = 4 and coding rate 1/2. The same relationships hold for other constraint lengths and coding rates, so the invention can be implemented in the same way with appropriate modifications.
[0038]
(Embodiment 2)
FIG. 4 shows the configuration of the arithmetic processing apparatus according to the second embodiment. The arithmetic processing unit of the present embodiment is different from the arithmetic processing unit of the first embodiment (FIG. 1) in that the arithmetic processing unit is composed of four banks of RAM 14 as storage means for storing path metrics. Other configurations and operations are the same as those in the first embodiment.
[0039]
The arithmetic processing apparatus of this embodiment is suited to the pipelined arithmetic processing shown in FIG. For example, to execute the ACS operation of instruction 1 in the operation execution stage of the (n + 1)th cycle, the addresses of the path metrics to be read must be supplied to the RAM 14 in advance, in the memory access stage of the nth cycle. Suppose the RAM 14 can read an even address and the following odd address in succession, i.e., it supports double-precision reads. Then the two path metrics used in the calculation can be read simply by specifying an even address, provided that:
・ The path metrics for one state step are stored at consecutive addresses in even-odd order.
・ The path metrics for one state step are divided into a first half and a second half, which are stored in separate banks.
[0040]
For example, the four-bank RAM 14 stores the first-half path metrics of the old state (PM0, PM1, PM2, and PM3 in FIG. 3) in bank 0 and the second-half path metrics (PM4, PM5, PM6, and PM7 in FIG. 3) in bank 1. Each cycle of operation execution (ACS operation execution) generates two path metrics, which are stored in bank 2 and bank 3, respectively. The bus 13 transfers double-precision data: the path metrics of nodes N′0 to N′3 are stored in bank 2, and those of nodes N′4 to N′7 in bank 3.
[0041]
FIG. 6 shows an example of the memory access operation corresponding to FIG. When the ACS operations for one state step are completed, the old-state path metrics for the next step are read from banks 2 and 3, and the new-state path metrics are stored in banks 0 and 1. The four banks of the RAM 14 are used in this way, with the read bank pair and the write bank pair swapping roles each time the ACS operations for one state step are completed. This allows the ACS operations of Viterbi decoding to be executed in parallel on the DSP.
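The bank-swapping schedule of the second embodiment can be checked with a small simulation. This hypothetical Python sketch assumes the following:

* Each bank is a list.
* Each butterfly reads one even/odd address pair with a single double-precision read.
* `bm_for(j)` is an assumed callback returning the (BM1, BM0) pair used by butterfly j.
* Smaller metrics survive, as in the first embodiment.

```python
def four_bank_step(banks, read_pair, write_pair, bm_for):
    """One state step with four banks; returns the bank pairs for the next step.

    banks[read_pair[0]] holds old PM0..PM(h-1), banks[read_pair[1]] holds
    old PM(h)..PM(2h-1), with PM(2j) at an even address and PM(2j+1) after it.
    """
    lo_bank, hi_bank = banks[read_pair[0]], banks[read_pair[1]]
    half = len(lo_bank)
    first, second = [], []
    for j in range(half):                          # one butterfly per cycle
        src = lo_bank if 2 * j < half else hi_bank
        addr = (2 * j) % half                      # even address
        pm0, pm1 = src[addr], src[addr + 1]        # one double-precision read
        bm1, bm0 = bm_for(j)
        first.append(min(pm0 + bm0, pm1 + bm1))    # new PM'j
        second.append(min(pm0 + bm1, pm1 + bm0))   # new PM'(j + half)
    banks[write_pair[0]][:] = first
    banks[write_pair[1]][:] = second
    return write_pair, read_pair                   # read and write pairs swap roles


banks = [[0, 5, 2, 7], [3, 1, 4, 6], [0] * 4, [0] * 4]
rd, wr = four_bank_step(banks, (0, 1), (2, 3), lambda j: (1, 0))
print(banks[2], banks[3], rd, wr)  # [0, 2, 2, 4] [1, 3, 1, 5] (2, 3) (0, 1)
```

Feeding the returned pairs into the next call reproduces the alternation between banks 0/1 and banks 2/3.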
[0042]
In the above description, bank 0 with bank 1 and bank 2 with bank 3 were used as the bank pairs. Other combinations can be implemented in the same way simply by changing the addresses supplied in the memory access stage. Also, although the RAM 14 in this embodiment consists of four banks, four is the minimum number required; the invention can be implemented in the same way with more than four banks.
[0043]
(Embodiment 3)
FIG. 7 shows the configuration of the arithmetic processing apparatus according to the third embodiment. The difference between the arithmetic processing unit of the present embodiment and the arithmetic processing unit of the first embodiment (FIG. 1) is that it is composed of a dual-port RAM 15 consisting of three banks as storage means for storing path metrics. Other configurations and operations are the same as those in the first embodiment.
[0044]
The arithmetic processing apparatus of this embodiment is also suited to the pipelined arithmetic processing shown in FIG. Because the path metric storage is the dual-port RAM 15, a read and a write to the same bank can be specified in one instruction. For example, instruction 1 executes its ACS operation in the operation execution stage of the (n + 1)th cycle. First, in the memory access stage of the nth cycle, the address of the path metrics to be read and the address of the path metric to be written are supplied to the dual-port RAM 15. Then, in the (n + 1)th cycle, an even address and the following odd address are read in succession from the dual-port RAM 15, as with the RAM 14 of the second embodiment. The ACS operation is performed, and one path metric is written to the same bank.
[0045]
Like the second embodiment, the arithmetic processing apparatus of the third embodiment operates under the following conditions:
・ The path metrics for one state step are stored at consecutive addresses in even-odd order.
・ The path metrics for one state step are divided into a first half and a second half, which are stored in separate banks.
[0046]
For example, the dual-port RAM 15 stores the first-half path metrics of the old state (PM0, PM1, PM2, and PM3 in FIG. 3) in bank 0 and the second-half path metrics (PM4, PM5, PM6, and PM7 in FIG. 3) in bank 1. Each cycle of operation execution (ACS operation execution) generates two path metrics, which are stored in bank 0 and bank 2, respectively. The bus 13 transfers double-precision data: the path metrics of nodes N′0 to N′3 are stored in bank 0, and those of nodes N′4 to N′7 in bank 2.
[0047]
FIG. 8 shows an example of the memory access operation corresponding to FIG. The arithmetic processing unit of this embodiment differs from that of the second embodiment in how banks switch when the ACS operations for one state step are completed: only bank 1 and bank 2 swap roles, and bank 0 is not switched. In this way too, the ACS operations of Viterbi decoding can be executed in parallel on the DSP. Although the dual-port RAM 15 in this embodiment consists of three banks, three is the minimum number required; the invention can be implemented in the same way with more than three banks.
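A similar sketch for the three-bank dual-port arrangement shows why bank 0 need not be switched. Address j of bank 0 has always been read by the time new metric j is written there, so it can be updated in place. As before, the names, list-based banks, and `bm_for` callback are assumptions for illustration. Within a cycle, the read is modelled as happening before the write.

```python
def three_bank_step(banks, hi_read, hi_write, bm_for):
    """One state step: bank 0 is read and rewritten in place, banks hi_read/hi_write swap."""
    bank0 = banks[0]
    half = len(bank0)
    for j in range(half):
        src = bank0 if 2 * j < half else banks[hi_read]
        addr = (2 * j) % half
        pm0, pm1 = src[addr], src[addr + 1]             # read first ...
        bm1, bm0 = bm_for(j)
        bank0[j] = min(pm0 + bm0, pm1 + bm1)            # ... then write PM'j in place
        banks[hi_write][j] = min(pm0 + bm1, pm1 + bm0)  # PM'(j + half)
    return hi_write, hi_read


banks = [[0, 5, 2, 7], [3, 1, 4, 6], [0] * 4]
print(three_bank_step(banks, 1, 2, lambda j: (1, 0)), banks[0], banks[2])
# (2, 1) [0, 2, 2, 4] [1, 3, 1, 5]
```

The new metrics equal those of a straightforward out-of-place ACS on the same inputs, so the in-place update of bank 0 loses nothing.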
[0048]
(Embodiment 4)
FIG. 9 shows the configuration of the arithmetic processing apparatus according to the fourth embodiment. It differs from the arithmetic processing unit of the second embodiment (FIG. 4) in that it includes input registers 16 and 17; all other configurations and operations are the same as in the second embodiment. In FIG. 9, the input registers 16 and 17 receive data from the bus 2 and output it to the comparison means 5 and 9 and the adding means 6 and 10.
[0049]
The arithmetic processing apparatus of this embodiment is suited to the pipelined arithmetic processing shown in FIG. For example, suppose the ACS operation of instruction 1 is executed in the operation execution stage of the (n + 2)th cycle. The address of the path metrics to be read is supplied to the RAM 14 in advance, in the memory access stage of the nth cycle. Then, in the data transfer stage of the (n + 1)th cycle, the data output from the RAM 14 is latched into the input registers 16 and 17 via the bus 2.
[0050]
In the pipeline shown in FIG. 10, a data transfer stage is inserted before the operation execution stage of the pipeline shown in FIG. As a result, at the start of the operation execution stage, the data from the RAM 14 is already settled in the input registers in front of each operation unit (the comparison means 5 and 9 and the adding means 6 and 10). This saves the time that transfer would otherwise take within the execution stage.
[0051]
Therefore, according to the present embodiment, it is possible to execute the ACS operation of Viterbi decoding by the DSP in parallel at a relatively high speed. Note that the present invention can be similarly implemented even if a dual port RAM is used as means for storing the path metric.
[0052]
(Embodiment 5)
FIG. 11 shows the configuration of the arithmetic processing unit according to the fifth embodiment. The arithmetic processing unit of the present embodiment is different from the arithmetic processing unit of the fourth embodiment (FIG. 9) in that it includes a swap circuit 18, and other configurations and operations are the same as those of the fourth embodiment. Exactly the same. In FIG. 11, the swap circuit 18 inputs data from the storage means 3 that stores the branch metrics and outputs the data to the bus 4.
[0053]
The arithmetic processing apparatus of this embodiment is suited to the pipelined arithmetic processing shown in FIG. The swap circuit 18 receives the two branch metrics as double-precision data {BM1, BM0} from the storage means 3. As specified by an instruction or the like, it either outputs them unchanged as {BM1, BM0} or swaps the upper and lower halves and outputs {BM0, BM1}.
[0054]
The operation of the swap circuit 18 is described using constraint length K = 4, coding rate 1/2, the path metric transitions of the convolutional encoder shown in FIG. 2, and the butterfly structure shown in FIG. 3. FIG. 12 compares two ACS operations: the transition from nodes N0 and N1 of the old state to nodes N′0 and N′4, and the transition from nodes N6 and N7 of the old state to nodes N′3 and N′7.

The ACS operation from nodes N0 and N1 to node N′0 and the ACS operation from nodes N6 and N7 to node N′3 are both performed by the comparison means 5 and the adding means 6. Both use the same branch metrics BM0 and BM1, but with BM0 and BM1 swapped. The same relationship holds for the comparison means 9 and the adding means 10, in the ACS operation from nodes N0 and N1 to node N′4 and in that from nodes N6 and N7 to node N′7. For this reason, the storage means 3 must hold the branch metrics in both the {BM0, BM1} and {BM1, BM0} forms, which makes the hardware resources redundant.
[0055]
The swap circuit 18 eliminates this redundancy. The storage means 3 stores the branch metrics in only one form, for example {BM0, BM1}. The swap circuit 18 receives {BM0, BM1} and, as specified by an instruction or the like, outputs either {BM0, BM1} or {BM1, BM0}. The branch metric storage means 3 therefore no longer needs to hold both forms.
[0056]
In this embodiment, the description used the old-state nodes N0, N1, N6, and N7 with constraint length K = 4 and coding rate 1/2. The same relationship holds for nodes N2, N3, N4, and N5, and for other combinations of constraint length K and coding rate. The invention can also be implemented in the same way using a dual-port RAM as the path metric storage means.
[0057]
(Embodiment 6)
FIG. 13 shows the configuration of the arithmetic processing apparatus according to the sixth embodiment. The arithmetic processing apparatus of the present embodiment is different from the arithmetic processing apparatus of the fifth embodiment (FIG. 11) in that it comprises two adders and one comparator as comparison means, and two adders as addition means. The other configurations and operations are the same as those in the fifth embodiment.
[0058]
In FIG. 13:

* 19 and 20 are adders that add data from the bus 4 and the input register 16.
* 21 is a comparator that compares the addition results of the adders 19 and 20 and outputs the comparison result to the storage means 7 for comparison results and to the selection means 8.
* 22 and 23 are adders that add data from the bus 4 and the input register 16 and output the addition results to the selection means 8.
* 24 and 25 are adders that add data from the bus 4 and the input register 17.
* 26 is a comparator that compares the addition results of the adders 24 and 25 and outputs the comparison result to the storage means 11 and the selection means 12.
* 27 and 28 are adders that add data from the bus 4 and the input register 17 and output the addition results to the selection means 12.

The arithmetic processing apparatus of this embodiment is suited to the pipelined arithmetic processing shown in FIG.
[0059]
Next, the ACS operation of this embodiment is described for constraint length K = 4 and coding rate 1/2, using the convolutional encoder shown in FIG. 2 and the butterfly structure shown in FIG. 3. The description compares the ACS operation from nodes N0 and N1 to nodes N′0 and N′4 shown in FIG. with the ACS operation from nodes N6 and N7 to nodes N′3 and N′7.
[0060]
As shown in FIG. 13, suppose the two path metrics are output from the input registers 16 and 17 as {A, B} and the two branch metrics are output from the swap circuit 18 as {C, D}.

* The adder 19 receives the path metric {A} and the branch metric {C} and outputs the sum {A + C}.
* The adder 20 receives the path metric {B} and the branch metric {D} and outputs the sum {B + D}.
* The comparator 21 receives the sums {A + C} and {B + D}, computes {A + C − (B + D)}, and outputs the MSB of the result.
* The adder 22 receives {A} and {C} and outputs {A + C}.
* The adder 23 receives {B} and {D} and outputs {B + D}.
[0061]
On the other hand, the adder 24 inputs the path metric {A} and the branch metric {D} and outputs the addition result {A + D}, and the adder 25 inputs the path metric {B} and the branch metric {C}. , The addition result {B + C} is output, and the comparator 26 inputs the addition result {A + D} of the adder 24 and the addition result {B + C} of the adder 25, and compares {A + D− (B + C)}. The MSB of the comparison result is output. The adder 27 inputs the path metric {A} and the branch metric {D} and outputs the addition result {A + D}. The adder 28 inputs the path metric {B} and the branch metric {C} and adds them. The result {B + C} is output.
[0062]
With the above configuration and operation, set the two path metrics in the input registers 16 and 17 to {A, B} = {PM1, PM0} and the output of the swap circuit 18 to {C, D} = {BM1, BM0}. This realizes the ACS operation for the transition from nodes N0 and N1 of the old state to nodes N′0 and N′4 shown in FIG. 12.
[0063]
Further, set the two path metrics in the input registers 16 and 17 to {A, B} = {PM7, PM6} and the output of the swap circuit 18 to {C, D} = {BM0, BM1}. This realizes the ACS operation for the transition from nodes N6 and N7 of the old state to nodes N′3 and N′7 shown in FIG. 12.
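The sixth embodiment's datapath and the swap circuit can be modelled together. In this hypothetical Python sketch:

* `swap_circuit` stands for the swap circuit 18.
* The stored branch-metric word is kept in one form only.
* Metrics are assumed to fit in 16-bit two's complement without overflow.

The two returned tuples give the values passed to the selection means 8 and 12, each with the MSB that drives that selection.

```python
def msb16(value):
    """Sign bit of value seen as a 16-bit two's-complement word."""
    return (value & 0xFFFF) >> 15


def swap_circuit(word, swap):
    """Swap circuit 18: pass (upper, lower) through, or exchange the halves."""
    upper, lower = word
    return (lower, upper) if swap else (upper, lower)


def embodiment6_acs(ab, cd):
    """ab = {A, B} from input registers 16/17, cd = {C, D} from swap circuit 18."""
    a, b = ab
    c, d = cd
    s21 = msb16((a + c) - (b + d))      # adders 19, 20 and comparator 21
    s26 = msb16((a + d) - (b + c))      # adders 24, 25 and comparator 26
    out8 = a + c if s21 else b + d      # adders 22, 23 into selection means 8
    out12 = a + d if s26 else b + c     # adders 27, 28 into selection means 12
    return (out8, s21), (out12, s26)


stored = (2, 5)                          # one stored word {BM1, BM0}
print(embodiment6_acs((10, 4), swap_circuit(stored, False)))  # ((9, 0), (6, 0))
print(embodiment6_acs((10, 4), swap_circuit(stored, True)))   # ((6, 0), (9, 0))
```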
[0064]
Therefore, according to this embodiment, two path metrics can be updated in one machine cycle by pipelined operation on the DSP. In this embodiment, the description used the old-state nodes N0, N1, N6, and N7 with constraint length K = 4 and coding rate 1/2. The same relationship holds for nodes N2, N3, N4, and N5, and for other combinations of constraint length K and coding rate. The invention can also be implemented in the same way using a dual-port RAM as the path metric storage means.
[0065]
(Embodiment 7)
FIG. 14 shows the configuration of the arithmetic processing apparatus according to the seventh embodiment. It differs from the arithmetic processing unit of the sixth embodiment (FIG. 13) in that one of the comparators is shared with the ALU 29. Accordingly, it also includes input registers 30 and 31, buses 32, 33, 37, and 38, selectors 34 and 35, and a register file 36 serving as storage means for branch metrics. All other configurations and operations are the same as in the sixth embodiment.
[0066]
In FIG. 14:

* 30 is an input register that receives data from the four-bank RAM 14 via the bus 37.
* 31 is an input register that receives data from the four-bank RAM 14 via the bus 38.
* 32 and 33 are buses that carry data from the register file 36.
* 34 is a selector that receives data from the bus 32, the adder 19, and the input register 30 and selects one as its output.
* 35 is a selector that receives data from the bus 33, the adder 20, and the input register 31 and selects one as its output.
* 29 is an ALU that receives data from the selectors 34 and 35 and performs an arithmetic logic operation. It outputs the result to the bus 13 and outputs the MSB of the result, as the comparison result, to the storage means 7 and the selection means 8.

The arithmetic processing apparatus of this embodiment is suited to the pipelined arithmetic processing shown in FIG.
[0067]
In this embodiment, when the ACS operation is performed, the selector 34 selects the output of the adder 19 and the selector 35 selects the output of the adder 20 as inputs to the ALU 29. The ALU 29 subtracts one input from the other and outputs the MSB of the result to the storage means 7 for comparison results and to the selection means 8.
[0068]
For a register-to-register arithmetic logic operation, data is output from the register file 36 to the buses 32 and 33, and the selectors 34 and 35 select the buses 32 and 33, respectively. For a register-to-memory operation, data is output from the register file 36 to the bus 32, and data is read from the four-bank RAM 14 into the input register 31 via the bus 38. The selectors 34 and 35 then select the bus 32 and the input register 31, respectively.
[0069]
Conversely, for a memory-to-register operation, data is read from the four-bank RAM 14 into the input register 30 via the bus 37, and data is output from the register file 36 to the bus 33. The selectors 34 and 35 then select the input register 30 and the bus 33, respectively.
[0070]
For a memory-to-memory operation, data is read from the four-bank RAM 14 into the input registers 30 and 31 via the buses 37 and 38, and the selectors 34 and 35 select the input registers 30 and 31, respectively.
[0071]
As described above, according to this embodiment, one of the comparators that perform the ACS operation is shared with the ALU. This reduces chip area and cost when the arithmetic processing unit is implemented as an LSI. The invention can also be implemented in the same way using a dual-port RAM as the path metric storage means.
[0072]
(Embodiment 8)
FIG. 15 shows the configuration of the arithmetic processing apparatus according to the eighth embodiment. It differs from the arithmetic processing unit of the seventh embodiment (FIG. 14) in that the adders used in the comparison means are realized by 4:2 compressors 39 and 40. All other configurations and operations are the same as in the seventh embodiment.
[0073]
In FIG. 15, 39 is a 4:2 compressor that receives data from the bus 4 and the input register 16 and outputs its result to the selectors 34 and 35. 40 is a 4:2 compressor that receives data from the bus 4 and the input register 17 and outputs its result to the comparator 26. The arithmetic processing apparatus of this embodiment is suited to the pipelined arithmetic processing shown in FIG.
[0074]
Next, the ACS operation in this embodiment will be described. The description assumes constraint length K = 4, coding rate 1/2, the convolutional encoder shown in FIG. 2, and the butterfly structure shown in FIG. 3, and compares the ACS operation from the nodes N0 and N1 to the nodes N′0 and N′4 with the ACS operation from the nodes N6 and N7 to the nodes N′3 and N′7, as listed in FIG. 12.
[0075]
First, each of the 4:2 compressors 39 and 40 is built from one-bit blocks, each performing the processing shown in FIG. 16, chained together for the number of bits in a single-precision word. Because no carry ripples along the word inside the compressor, it performs addition faster than an ordinary full adder.
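The bit-slice arrangement described above can be modelled in a few lines of Python. This is a sketch under assumptions rather than the circuit of FIG. 16: each one-bit slice is taken to be the common textbook construction of two chained full adders, and the names `full_adder` and `compress_4to2` are hypothetical. It illustrates why the compressor is fast: a slice's outgoing carry depends only on three of its four input bits, never on its incoming carry, so no carry ripples along the word.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compress_4to2(x1, x2, x3, x4, width, cin0=0):
    """Reduce four width-bit words to a sum word s and a carry word c such that
    (s + 2*c) mod 2**width == (x1 + x2 + x3 + x4 + cin0) mod 2**width.
    (Hypothetical model: each slice is two chained full adders.)
    """
    s_word = c_word = 0
    cin = cin0                                   # lowest carry input of the compressor
    for i in range(width):
        b1, b2, b3, b4 = ((x >> i) & 1 for x in (x1, x2, x3, x4))
        s1, cout = full_adder(b1, b2, b3)        # cout depends only on b1..b3
        s, c = full_adder(s1, b4, cin)           # second adder absorbs b4 and cin
        s_word |= s << i
        c_word |= c << i
        cin = cout                               # feeds the next slice only, no further
    return s_word, c_word


s, c = compress_4to2(5, 6, 7, 8, width=8)
print((s + (c << 1)) & 0xFF)   # 26: one final carry-propagate addition
```

The final `(s + (c << 1))` is the single carry-propagate addition left over, which in FIG. 15 falls to the ALU 29 or the comparator 26.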
[0076]
As shown in FIG. 15, when the input registers 16 and 17 output two path metrics as {A, B} and the swap circuit 18 outputs two branch metrics as {C, D}, the 4:2 compressor 39 receives the path metric {A}, the branch metric {C}, the inversion {￣B} of the path metric {B}, and the inversion {￣D} of the branch metric {D}. The ALU 29 receives the two outputs of the 4:2 compressor 39 via the selectors 34 and 35 and adds them. To form the two's complements of {B} and {D}, “1” is supplied both to the lowest carry input of the 4:2 compressor 39 and to the lowest carry input of the ALU 29. As a result, {A + C − (B + D)} is obtained, and its MSB is output. Meanwhile, the adder 22 receives the path metric {A} and the branch metric {C} and outputs the sum {A + C}, and the adder 23 receives the path metric {B} and the branch metric {D} and outputs the sum {B + D}.
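The arithmetic of the preceding paragraph can be checked with a small Python model. It is a sketch, not the hardware: the 4:2 compressor 39 and the ALU 29 are each collapsed into a modular addition, the 16-bit word width is an assumption, and `acs_select` is a hypothetical name. It also assumes the metrics are kept small enough (for example by periodic normalization) that A + C − (B + D) fits in a signed word, so the MSB is a valid sign bit. Because −x equals ￣x + 1 in two's complement, the two carry-in “1”s supply exactly the two +1 terms needed for −B and −D.

```python
WIDTH = 16                       # assumed single-precision word width
MASK = (1 << WIDTH) - 1


def acs_select(a, b, c, d):
    """Compare A + C with B + D and keep the smaller (hypothetical model).

    a, b: path metrics; c, d: branch metrics (unsigned WIDTH-bit values).
    Returns (survivor_metric, decision); decision is the MSB of A + C - (B + D),
    so 1 means the path through A survived.
    """
    # compressor stage: A, C, ~B, ~D plus a carry-in of 1 (first two's complement)
    partial = (a + c + (~b & MASK) + (~d & MASK) + 1) & MASK
    # carry-propagate stage (ALU 29) with its own carry-in of 1 (second two's complement)
    diff = (partial + 1) & MASK          # == (A + C - (B + D)) mod 2**WIDTH
    decision = diff >> (WIDTH - 1)       # sign bit; valid while |difference| < 2**(WIDTH-1)
    # adders 22 and 23 form the candidates; the selection means keeps one
    survivor = (a + c) if decision else (b + d)
    return survivor, decision


print(acs_select(4, 7, 2, 5))   # (6, 1): A + C = 6 beats B + D = 12
```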
[0077]
On the other hand, the 4:2 compressor 40 receives the path metric {A}, the branch metric {D}, the inversion {￣B} of the path metric {B}, and the inversion {￣C} of the branch metric {C}, and the comparator 26 receives and adds the two outputs of the 4:2 compressor 40. Here, to form the two's complements of {B} and {C}, “1” is supplied to the lowest carry input of the 4:2 compressor 40 and to that of the comparator 26. As a result, {A + D − (B + C)} is obtained, and its MSB is output. The adder 27 receives the path metric {A} and the branch metric {D} and outputs the sum {A + D}, and the adder 28 receives the path metric {B} and the branch metric {C} and outputs the sum {B + C}.
[0078]
With the above configuration and operation, setting the two path metrics in the input registers 16 and 17 to {A, B} = {PM1, PM0} and the output of the swap circuit 18 to {C, D} = {BM1, BM0} realizes the ACS operation for the transition from the nodes N0 and N1 in the old state shown in FIG. 12 to the nodes N′0 and N′4.
[0079]
Further, when the path metrics of the nodes N6 and N7 are set in the input registers 16 and 17 as {A, B} and the output of the swap circuit 18 is set to {C, D} = {BM0, BM1}, the ACS operation for the transition from the nodes N6 and N7 in the old state shown in FIG. 12 to the nodes N′3 and N′7 can be realized. Therefore, two path metrics can be updated in one machine cycle by the pipeline operation of the DSP.
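The two comparisons that the pipeline completes in one machine cycle together form one radix-2 ACS butterfly. The sketch below (hypothetical name `butterfly`; unbounded Python integers stand in for fixed-width hardware words) computes both survivors from one pair of path metrics {A, B} and one pair of branch metrics {C, D}. It also shows that exchanging C and D, as the swap circuit 18 does, simply exchanges the two results, which is consistent with the two swap-circuit settings {BM1, BM0} and {BM0, BM1} used above.

```python
def butterfly(a, b, c, d):
    """Radix-2 ACS butterfly (hypothetical model).

    a, b: path metrics {A, B}; c, d: branch metrics {C, D}.
    Returns ((survivor0, decision0), (survivor1, decision1)) with
    survivor0 = min(A + C, B + D) and survivor1 = min(A + D, B + C).
    A decision of 1 means the candidate through A survived; ties keep B's.
    """
    first = (a + c, b + d)       # first comparison means (ALU 29 in FIG. 15)
    second = (a + d, b + c)      # second comparison means (comparator 26)
    results = []
    for via_a, via_b in (first, second):
        decision = int(via_a < via_b)
        results.append((via_a if decision else via_b, decision))
    return tuple(results)


print(butterfly(4, 7, 2, 5))   # ((6, 1), (9, 0))
print(butterfly(4, 7, 5, 2))   # ((9, 0), (6, 1)): swapping C and D swaps the outputs
```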
[0080]
In this way, according to the present embodiment, applying 4:2 compressors to the comparison means that perform the ACS operation allows the operation to run faster than with the configuration using two adders. Although the example used the nodes N0, N1, N6, and N7 in the old state with constraint length K = 4 and coding rate 1/2, the same relationships hold for the nodes N2, N3, N4, and N5 and for other combinations of constraint length K and coding rate, so the invention can be implemented in the same way. Further, the present invention can be similarly implemented by using a dual-port RAM as the means for storing the path metrics.
[0081]
(Embodiment 9)
FIG. 17 shows the configuration of the arithmetic processing apparatus according to the ninth embodiment. It differs from the arithmetic processing apparatus of the eighth embodiment (FIG. 15) in that double-precision adders are used as the adding means and at least one of them also serves as a double-precision arithmetic unit (AU). Other configurations and operations are the same as in the eighth embodiment.
[0082]
In FIG. 17, 41 is a double-precision AU that receives double-precision-format data from the input register 16 and the bus 4 and performs double-precision arithmetic operations, and 42 is a double-precision adder that receives double-precision-format data from the input register 17 and the bus 4 and performs double-precision addition. The output of the double-precision AU 41 goes to the selection means 8 and the bus 13, and the output of the double-precision adder 42 goes to the selection means 12. The arithmetic processing apparatus of this embodiment is suitable for the arithmetic processing of the pipeline structure shown in FIG.
[0083]
When performing an ACS operation in the present embodiment, the double-precision AU 41 receives two path metrics {A, B} in double-precision format from the input register 16, and two branch metrics {C, D} in double-precision format from the swap circuit 18 via the bus 4. The double-precision AU 41 then performs a double-precision addition but, as shown in FIG. 18, forces to zero the carry from the single-precision MSB position into the next stage. As a result, the two path-metric/branch-metric additions {A + C, B + D} are performed simultaneously in parallel.
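The carry control of FIG. 18 can be sketched in Python as below. The 16-bit half width, the placement of A and C in the upper half, and the names `pack` and `dp_add` are assumptions for illustration. The `split` flag models the ACS-mode carry kill at the single-precision MSB; `split=False` models the ordinary carry-propagating double-precision addition that the embodiment reuses for product-sum accumulation.

```python
HALF = 16                                    # assumed single-precision width
LO = (1 << HALF) - 1


def pack(hi, lo):
    """Place one single-precision value in each half of a double-precision word."""
    return ((hi & LO) << HALF) | (lo & LO)


def dp_add(x, y, split):
    """Double-precision add (hypothetical model).

    split=True: the carry out of bit HALF-1 is forced to 0, so the two halves
    are added independently (two single-precision sums in one pass).
    split=False: ordinary double-precision addition; the carry propagates.
    """
    if not split:
        return (x + y) & ((1 << (2 * HALF)) - 1)
    lo = ((x & LO) + (y & LO)) & LO                      # carry out of the low half dropped
    hi = (((x >> HALF) & LO) + ((y >> HALF) & LO)) & LO
    return (hi << HALF) | lo


A, B, C, D = 5, 0xFFFF, 3, 1                  # B + D deliberately overflows the low half
print(hex(dp_add(pack(A, B), pack(C, D), split=True)))    # 0x80000: A + C stays 8
print(hex(dp_add(pack(A, B), pack(C, D), split=False)))   # 0x90000: carry leaked into A + C
```

Feeding `pack(D, C)` instead of `pack(C, D)` yields the {A + D, B + C} pair that the double-precision adder 42 produces.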
[0084]
On the other hand, the double-precision adder 42 receives the two path metrics {A, B} in double-precision format from the input register 17, and the two branch metrics in double-precision format from the swap circuit 18 via the bus 4, in the order {D, C}. Like the double-precision AU 41, the double-precision adder 42 forces the carry from the single-precision MSB position into the next stage to zero, and so computes {A + D, B + C} simultaneously in parallel.
[0085]
In this way, according to the present embodiment, the double-precision AU 41 is used as the adding means for the ACS operation, with the carry from the single-precision MSB position into the next stage forced to zero during ACS operations. By adding control that lets this carry propagate in other double-precision arithmetic operations, the same unit can also serve, for example, as the double-precision accumulator in product-sum operations. This further reduces the circuit area and hence the cost. Note that the present invention can be similarly implemented even if a dual-port RAM is used as the means for storing the path metrics.
[0086]
(Embodiment 10)
FIG. 19 shows the configuration of the arithmetic processing apparatus according to the tenth embodiment. It differs from the arithmetic processing unit of the ninth embodiment (FIG. 17) in that shift registers are used as the storage means for the comparison results. Other operations are exactly the same as in the ninth embodiment.
[0087]
In FIG. 19, 43 is a shift register that receives the MSB of the operation result of the ALU 29, and 44 is a shift register that receives the MSB of the operation result of the comparator 26; both shift registers 43 and 44 can output their data to the bus 2. The arithmetic processing apparatus of this embodiment is suitable for the arithmetic processing of the pipeline structure shown in FIG.
[0088]
In the present embodiment, during the ACS operation, the MSB of each comparison result from the ALU 29 is shifted into the shift register 43, and the MSB of each comparison result from the comparator 26 is shifted into the shift register 44. In this way the path select signals (the signals indicating which of the two paths was selected, used for traceback after the ACS operations are complete) can be stored. If the shift registers 43 and 44 are, for example, one single-precision word wide, then each time the ACS operation has been performed as many times as there are bits in a single-precision word, the contents of the shift registers 43 and 44 must be transferred via the bus 2 and stored as path select signals in the four-bank RAM 14.
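The shift-and-flush behaviour described in this paragraph can be modelled as follows. This is a sketch under assumptions: the class name, the 16-bit default width, and the convention that each new decision enters at the least significant bit are illustrative choices, and the Python list merely stands in for the path-select area of the four-bank RAM 14.

```python
class PathSelectShiftRegister:
    """Hypothetical model of shift register 43 or 44 collecting one ACS decision per step."""

    def __init__(self, width=16):
        self.width = width          # assumed: one single-precision word
        self.value = 0
        self.count = 0
        self.memory = []            # stands in for the path-select area of RAM 14

    def shift_in(self, bit):
        """Shift one decision bit (an MSB from ALU 29 or comparator 26) in at the LSB."""
        self.value = ((self.value << 1) | (bit & 1)) & ((1 << self.width) - 1)
        self.count += 1
        if self.count == self.width:        # register full: transfer over bus 2 to memory
            self.memory.append(self.value)
            self.value = 0
            self.count = 0


sr = PathSelectShiftRegister(width=4)
for b in [1, 0, 1, 1, 0, 0, 0, 1, 1]:
    sr.shift_in(b)
print(sr.memory, sr.value)   # [11, 1] 1
```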
[0089]
Thus, according to the present embodiment, using the shift registers 43 and 44 as the storage means for the comparison results of the ACS operation allows them to be shared with other operation instructions that use a shift register, such as division. When the arithmetic processing unit is implemented as an LSI, this further reduces the chip area and hence the cost. Note that the present invention can be similarly implemented even if a dual-port RAM is used as the means for storing the path metrics.
[0090]
(Embodiment 11)
FIG. 20 shows the configuration of the arithmetic processing apparatus according to the eleventh embodiment. It differs from the arithmetic processing unit of the tenth embodiment (FIG. 19) in three respects: the input register 17 always swaps the path metric data it receives from the bus 2 before supplying it to the 4:2 compressor 40; the branch metric data from the swap circuit 18 is input as is, without swapping; and the negation of the comparison result of the comparator 26 is shifted into the shift register 44. Other configurations and operations are exactly the same as in the tenth embodiment. The arithmetic processing apparatus of this embodiment is suitable for the arithmetic processing of the pipeline structure shown in FIG.
[0091]
When performing an ACS operation in the present embodiment, the two path metrics {A, B} are loaded into the input register 16 as {A, B}, but into the input register 17 always swapped, as {B, A}. The 4:2 compressor 40 then receives the two branch metrics from the swap circuit 18 as {C} and {￣D} and the two path metrics from the input register 17 as {B} and {￣A}, and the comparator 26 receives and adds the two outputs of the 4:2 compressor 40, computing {B + C − (A + D)}. Meanwhile, the double-precision adder 42 receives the two branch metrics {C, D} from the swap circuit 18 and the two path metrics {B, A} from the input register 17, computes {B + C} and {A + D} simultaneously in parallel, and outputs them to the selection means 12 as {B + C, A + D}. The comparator 26 outputs the MSB of the comparison result to the selection means 12, and the logical negation of that MSB to the shift register 44.
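The effect of the pre-swapped operands can be checked numerically. In the sketch below (16-bit words assumed; the function names are hypothetical), the eighth embodiment's compressor inputs A, D, ￣B, ￣C and the eleventh embodiment's inputs C, ￣D, B, ￣A are each summed together with the two carry-in 1s. The second sum is the two's-complement negation of the first, which is why the negated MSB, rather than the MSB itself, is shifted into the shift register 44.

```python
WIDTH = 16                       # assumed single-precision word width
MASK = (1 << WIDTH) - 1


def diff_eighth(a, b, c, d):
    """Eighth embodiment: A, D, ~B, ~C plus two carry-ins -> A + D - (B + C)."""
    return (a + d + (~b & MASK) + (~c & MASK) + 2) & MASK


def diff_eleventh(a, b, c, d):
    """Eleventh embodiment: C, ~D from the swap circuit and the pre-swapped
    path metrics B, ~A, plus two carry-ins -> B + C - (A + D)."""
    return (c + (~d & MASK) + b + (~a & MASK) + 2) & MASK


def msb(x):
    return x >> (WIDTH - 1)


a, b, c, d = 4, 7, 3, 5            # A + D = 9, B + C = 10
print(msb(diff_eighth(a, b, c, d)), 1 - msb(diff_eleventh(a, b, c, d)))   # 1 1
```

When A + D equals B + C both sums are zero, so in this model the two conventions differ only in how such ties are recorded; the check below therefore compares the bits only for untied inputs.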
[0092]
In this way, according to the present embodiment, one of the two input registers holding the path metrics loads them swapped, so no swapping is needed at the inputs of the 4:2 compressor 40 and the double-precision adder 42 in the execution (EX) stage, which allows a faster ACS operation. Note that the present invention can be similarly implemented even if a dual-port RAM is used as the means for storing the path metrics.
[0093]
(Embodiment 12)
FIG. 21 shows the configuration of the mobile station apparatus in the twelfth embodiment. In FIG. 21, the mobile station apparatus 45 of the present embodiment comprises a transmit/receive antenna unit 46; a radio unit 47 including a reception unit 48 and a transmission unit 49; a baseband signal processing unit 50 that modulates and demodulates signals and encodes and decodes them; a speaker 58 that outputs sound; a microphone 59 that inputs sound; a data input/output unit 60 that inputs and outputs data exchanged with external devices; a display unit 61 that displays the operating status; an operation unit 62; and a control unit 63 that controls the antenna unit 46, the radio unit 47, the baseband signal processing unit 50, the display unit 61, the operation unit 62, and so on.
[0094]
The baseband signal processing unit 50 includes a demodulation unit 51 that demodulates the received signal, a modulation unit 52 that modulates the transmission signal, and a one-chip DSP 53. In the DSP 53, the following are each formed in software: a Viterbi decoding unit 55 consisting of the arithmetic processing unit of any of the first to eleventh embodiments; a convolutional encoding unit 56 that convolutionally encodes the transmission signal; an audio codec unit 57 that encodes and decodes audio signals; and a timing control unit 54 that measures the transmission/reception timing, passes the received signal from the demodulation unit 51 to the Viterbi decoding unit 55, and passes the transmission signal from the convolutional encoding unit 56 to the modulation unit 52.
[0095]
The control unit 63 of the mobile station apparatus 45 controls the operation of the entire mobile station apparatus 45. For example, it displays signals input from the operation unit 62 on the display unit 61 and, in response to signals input from the operation unit 62, outputs control signals for originating and receiving calls to the antenna unit 46, the radio unit 47, the baseband signal processing unit 50, and so on, according to the communication sequence.
[0096]
When voice is transmitted from the mobile station apparatus 45, the voice signal input from the microphone 59 is A/D converted (not shown) and encoded by the audio codec unit 57 of the DSP 53, and the encoded data is input to the convolutional encoding unit 56. When data is transmitted, data input from the outside is input to the convolutional encoding unit 56 via the data input/output unit 60. The convolutional encoding unit 56 convolutionally encodes the input data and outputs it to the timing control unit 54. The timing control unit 54 rearranges the input data, adjusts the transmission output timing, and outputs the result to the modulation unit 52. The data input to the modulation unit 52 is digitally modulated, D/A converted (not shown), and output to the transmission unit 49 of the radio unit 47. The transmission unit 49 converts it into a radio signal and sends it to the antenna unit 46, from which it is transmitted as a radio wave.
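As a rough sketch of what the software convolutional encoding unit 56 computes, the Python function below implements a rate-1/2, K = 4 feedforward encoder. The generator polynomials 15 and 17 (octal) are an assumption, a commonly tabulated choice for K = 4, because the actual connections are given only in FIG. 2; the name `conv_encode` and the zero-tail termination are likewise illustrative.

```python
def conv_encode(bits, gens=(0o15, 0o17), k=4):
    """Rate-1/len(gens) feedforward convolutional encoder (hypothetical parameters)."""
    state = 0                                     # previous k-1 input bits, newest in the MSB
    out = []
    for bit in list(bits) + [0] * (k - 1):        # k-1 zero tail bits return the encoder to state 0
        reg = (bit << (k - 1)) | state            # current bit followed by the k-1 previous bits
        out.extend(bin(reg & g).count("1") & 1 for g in gens)   # mod-2 sum of the tapped bits
        state = reg >> 1                          # the shift register advances one stage
    return out


print(conv_encode([1]))   # impulse response: [1, 1, 1, 1, 0, 1, 1, 1]
```

Each input bit produces one symbol per generator, interleaved, so a message of n bits yields 2(n + K − 1) coded bits including the tail.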
[0097]
At the time of reception, on the other hand, the radio wave received by the antenna unit 46 is received by the reception unit 48 of the radio unit 47, A/D converted, and output to the demodulation unit 51 of the baseband signal processing unit 50. The data demodulated by the demodulation unit 51 is rearranged by the timing control unit 54 and then input to the Viterbi decoding unit 55, where it is decoded. During voice communication, the data decoded by the Viterbi decoding unit 55 is decoded by the audio codec unit 57, D/A converted, and output as sound from the speaker 58. During data communication, the data decoded by the Viterbi decoding unit 55 is output to the outside via the data input/output unit 60.
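The receive path ends in the Viterbi decoding unit 55. The hard-decision decoder below is a minimal sketch of that computation for an assumed K = 4, rate-1/2 code (generators 15 and 17 octal, zero-tail termination, Hamming branch metrics); the function names are hypothetical. The inner add-compare-select loop is the step that the embodiments accelerate to two path-metric updates per machine cycle, which this plain per-state loop does not attempt to model.

```python
def step(state, bit, gens=(0o15, 0o17), k=4):
    """One trellis branch: returns (next_state, output_symbols)."""
    reg = (bit << (k - 1)) | state
    return reg >> 1, tuple(bin(reg & g).count("1") & 1 for g in gens)


def encode(bits, gens=(0o15, 0o17), k=4):
    """Zero-tail encoder built from the same trellis function (for testing)."""
    state, out = 0, []
    for bit in list(bits) + [0] * (k - 1):
        state, sym = step(state, bit, gens, k)
        out.extend(sym)
    return out


def viterbi_decode(received, n_bits, gens=(0o15, 0o17), k=4):
    """Hard-decision Viterbi decoding of a zero-tail convolutional code."""
    n_states, r = 1 << (k - 1), len(gens)
    inf = float("inf")
    metrics = [0] + [inf] * (n_states - 1)            # encoder starts in state 0
    history = []                                       # per step: survivor (prev_state, bit)
    for t in range(n_bits + k - 1):
        sym = received[r * t:r * (t + 1)]
        new, back = [inf] * n_states, [None] * n_states
        for s in range(n_states):
            if metrics[s] == inf:
                continue
            for bit in (0, 1):
                ns, out = step(s, bit, gens, k)
                cand = metrics[s] + sum(o != y for o, y in zip(out, sym))  # add branch metric
                if cand < new[ns]:                      # compare with the best so far
                    new[ns], back[ns] = cand, (s, bit)  # select the survivor
        metrics = new
        history.append(back)
    state, bits = 0, []                                # tail bits force the final state to 0
    for back in reversed(history):                     # traceback
        state, bit = back[state]
        bits.append(bit)
    return bits[::-1][:n_bits]


msg = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
rx = encode(msg)
rx[5] ^= 1                                            # inject one channel bit error
print(viterbi_decode(rx, len(msg)) == msg)            # True
```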
[0098]
In this way, in the mobile station apparatus 45 of the present embodiment, the Viterbi decoding unit 55, the convolutional encoding unit 56, the audio codec unit 57, and the timing control unit 54 are all formed in software on the single-chip DSP 53, so the apparatus can be assembled from a small number of parts. Further, since the Viterbi decoding unit 55 consists of the arithmetic processing unit of the first to eleventh embodiments, two path metrics can be updated in one machine cycle by pipeline processing on the DSP 53, so the ACS operation of Viterbi decoding can be performed on the DSP 53 at high speed and with a relatively small amount of processing.
[0099]
Here, the demodulator 51 and the modulator 52 are shown separately from the DSP 53, but they can also be configured by software of the DSP 53. Further, the DSP of the sixth embodiment can be used as the DSP, and the convolutional encoding unit 56, the audio codec unit 57, and the timing control unit 54 can be configured by separate components.
[0100]
(Embodiment 13)
FIG. 22 shows the configuration of the mobile station apparatus in the thirteenth embodiment. The mobile station apparatus 45A of the present embodiment differs from the mobile station apparatus 45 of the twelfth embodiment (FIG. 21) in that it uses a CDMA baseband signal processing unit 50A, in which a modulation unit 52A is provided with a spreading unit 65 and a demodulation unit 51A is provided with a despreading unit 64; other configurations and operations are similar in many respects to the twelfth embodiment. In the case of CDMA communication, the timing control unit 54 may include a RAKE receiver that combines a plurality of fingers selected from a delay profile or the like (not shown).
[0101]
Thus, mobile station apparatus 45A in the present embodiment can be applied to CDMA communication by providing despreading section 64 in demodulation section 51A and spreading section 65 in modulation section 52A.
[0102]
(Embodiment 14)
FIG. 23 shows the configuration of the base station apparatus according to the fourteenth embodiment. Components having the same functions as in the mobile station apparatus described above are denoted by the same reference numerals. In FIG. 23, the base station apparatus 68 of the present embodiment comprises an antenna unit 46 including a reception antenna 66 and a transmission antenna 67; a radio unit 47 including a reception unit 48 and a transmission unit 49; a baseband signal processing unit 69 that modulates and demodulates signals and encodes and decodes them; a data input/output unit 60 that inputs and outputs data exchanged with a wired line; and a control unit 63 that controls the antenna unit 46, the radio unit 47, the baseband signal processing unit 69, and so on.
[0103]
The baseband signal processing unit 69 includes a demodulation unit 51 that demodulates the received signal, a modulation unit 52 that modulates the transmission signal, and a one-chip DSP 53A. In the DSP 53A, the following are each formed in software: a Viterbi decoding unit 55 consisting of the arithmetic processing unit of any of the first to eleventh embodiments; a convolutional encoding unit 56 that convolutionally encodes the transmission signal; and a timing control unit 54 that measures the transmission/reception timing, passes the received signal from the demodulation unit 51 to the Viterbi decoding unit 55, and passes the transmission signal from the convolutional encoding unit 56 to the modulation unit 52.
[0104]
The control unit 63 of the base station apparatus 68 controls its transmission and reception operations, and data input from the wired line is supplied to the convolutional encoding unit 56 via the data input/output unit 60. The convolutional encoding unit 56 convolutionally encodes the input data and outputs it to the timing control unit 54. The timing control unit 54 rearranges the input data, adjusts the transmission output timing, and outputs the result to the modulation unit 52. The data input to the modulation unit 52 is digitally modulated, D/A converted (not shown), and output to the transmission unit 49 of the radio unit 47. The transmission unit 49 converts it into a radio signal and sends it to the antenna unit 46, from which it is transmitted as a radio wave.
[0105]
At the time of reception, on the other hand, the radio wave received by the antenna unit 46 is received by the reception unit 48 of the radio unit 47, A/D converted, and output to the demodulation unit 51 of the baseband signal processing unit 69. The data demodulated by the demodulation unit 51 is rearranged by the timing control unit 54 and then input to the Viterbi decoding unit 55, where it is decoded. The data decoded by the Viterbi decoding unit 55 is output to the wired line via the data input/output unit 60.
[0106]
Thus, in the base station apparatus 68 of the present embodiment, the Viterbi decoding unit 55, the convolutional encoding unit 56, and the timing control unit 54 are all formed in software on the single-chip DSP 53A, so the apparatus can be assembled from a small number of parts. Further, since the Viterbi decoding unit 55 consists of the arithmetic processing unit of the first to eleventh embodiments, two path metrics can be updated in one machine cycle by pipeline processing on the DSP 53A, so the ACS operation of Viterbi decoding can be performed on the DSP 53A at high speed and with a relatively small amount of processing.
[0107]
Here, the demodulation unit 51 and the modulation unit 52 are shown separately from the DSP 53A, but they can also be implemented in software on the DSP 53A. Also, the DSP of the sixth embodiment can be used as the DSP 53A, with the convolutional encoding unit 56 and the timing control unit 54 configured as separate components.
[0108]
(Embodiment 15)
FIG. 24 shows the configuration of the base station apparatus in the fifteenth embodiment. The base station apparatus 68A of the present embodiment differs from the base station apparatus 68 of the fourteenth embodiment (FIG. 23) in that it uses a CDMA baseband signal processing unit 69A, in which a modulation unit 52A is provided with a spreading unit 65 and a demodulation unit 51A is provided with a despreading unit 64; other configurations and operations are similar in many respects to the fourteenth embodiment. In the case of CDMA communication, the timing control unit 54 may include a RAKE receiving unit that combines a plurality of fingers selected from a delay profile or the like (not shown).
[0109]
Thus, base station apparatus 68A according to the present embodiment can be applied to CDMA communication by providing despreading unit 64 in demodulation unit 51A and spreading unit 65 in modulation unit 52A.
[0110]
【The invention's effect】
As described above, the arithmetic processing device of the present invention comprises: instruction decoding means for decoding an instruction; storage means of a predetermined bit width that holds two metrics, one in its upper part and one in its lower part; access means that reads the two metrics by accessing the upper and lower parts of the storage means in parallel; first comparison means that receives first, second, third, and fourth data, namely the four kinds of data each obtained by adding one of the two path metrics to one of the two branch metrics, and compares the first data with the second data; and second comparison means that operates in parallel with the first comparison means and compares the third data with the fourth data. Consequently, two path metrics can be updated in one machine cycle by pipeline processing on the DSP, and the ACS operation of Viterbi decoding can be performed on the DSP at high speed with a relatively small amount of processing. This yields the advantageous effects of reducing the size, weight, and cost of mobile terminals and extending their battery life.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an arithmetic processing unit according to a first embodiment of the present invention.
FIG. 2 is a block diagram illustrating an example of a convolutional encoder with a coding rate of 1/2.
FIG. 3 is a schematic diagram showing a butterfly structure when the constraint length K = 4.
FIG. 4 is a block diagram showing a configuration of an arithmetic processing unit according to Embodiment 2 of the present invention.
FIG. 5 is a timing chart for explaining a pipeline operation of the arithmetic processing unit according to the second embodiment of the present invention.
FIG. 6 is a schematic diagram showing an example of memory access operation of the 4-bank RAM 14 according to the second embodiment of the present invention.
FIG. 7 is a block diagram showing a configuration of an arithmetic processing unit according to Embodiment 3 of the present invention.
FIG. 8 is a schematic diagram showing an example of memory access operation of the dual port RAM 15 according to the third embodiment of the present invention.
FIG. 9 is a block diagram showing a configuration of an arithmetic processing unit according to Embodiment 4 of the present invention.
FIG. 10 is a timing diagram illustrating pipeline operation of the arithmetic processing unit according to the fourth embodiment of the present invention.
FIG. 11 is a block diagram showing a configuration of an arithmetic processing unit according to a fifth embodiment of the present invention.
FIG. 12 is a list showing a comparative example of the ACS operation from the nodes N0 and N1 to the nodes N′0 and N′4 and the ACS operation from the nodes N6 and N7 to the nodes N′3 and N′7.
FIG. 13 is a block diagram showing a configuration of an arithmetic processing unit according to the sixth embodiment of the present invention.
FIG. 14 is a block diagram showing a configuration of an arithmetic processing unit according to a seventh embodiment of the present invention.
FIG. 15 is a block diagram showing a configuration of an arithmetic processing unit according to an eighth embodiment of the present invention.
FIG. 16 is an input/output diagram of the 4:2 compressor in the eighth embodiment of the present invention.
FIG. 17 is a block diagram showing a configuration of an arithmetic processing unit according to the ninth embodiment of the present invention.
FIG. 18 is a diagram for explaining carry control of a double precision AU.
FIG. 19 is a block diagram showing a configuration of an arithmetic processing unit according to the tenth embodiment of the present invention.
FIG. 20 is a block diagram showing a configuration of an arithmetic processing unit according to the eleventh embodiment of the present invention.
FIG. 21 is a block diagram showing a configuration of a mobile station apparatus in Embodiment 12 of the present invention.
FIG. 22 is a block diagram showing a configuration of a mobile station apparatus in Embodiment 13 of the present invention.
FIG. 23 is a block diagram showing a configuration of a base station apparatus according to Embodiment 14 of the present invention.
FIG. 24 is a block diagram showing a configuration of a base station apparatus according to Embodiment 15 of the present invention.
FIG. 25 is a state transition diagram (trellis diagram) showing a state transition path of a convolutional encoder in Viterbi decoding.
FIG. 26 is a schematic diagram showing a butterfly structure of a trellis diagram.
FIG. 27 is a schematic diagram illustrating an example of a generated code by a convolutional encoder.
FIG. 28 is a program diagram showing an example of a Viterbi operation for channel coding.
FIG. 29 is a schematic diagram showing an example of storing pointer control and path metrics.
[Explanation of symbols]
1 Storage means for storing path metrics
2 Bus
3 Storage means for storing branch metrics
4 Bus
5 Comparison means
6 Addition means
7 Storage means for storing comparison results
8 selection means
9 Comparison means
10 Addition means
11 Storage means for storing comparison results
12 Selection means
13 Bus
14 Four-bank RAM
15 Dual port RAM
16 Input register
17 Input register
18 Swap circuit
19 Adder
20 Adder
21 Comparator
22 Adder
23 Adder
24 Adder
25 Adder
26 Comparator
27 Adder
28 Adder
29 ALU
30 Input register
31 Input register
32 Bus
33 Bus
34 Selector
35 Selector
36 Register file
37 Bus
38 Bus
39 4:2 compressor
40 4:2 compressor
41 Double-precision AU
42 Double-precision adder
43 Shift register
44 Shift register
45 Mobile station equipment
46 Antenna
47 Radio section
48 Receiver
49 Transmitter
50 Baseband signal processor
51 Demodulator
52 Modulator
53 DSP
54 Timing control unit
55 Viterbi decoder
56 Convolutional encoding unit
57 Audio codec section
58 Speaker
59 Microphone
60 Data input/output unit
61 Display
62 Operation unit
63 Control unit
64 Despreading section
65 Spreading unit
66 Receiving antenna
67 Transmitting antenna
68 Base station equipment
69 Baseband signal processor

Claims

1. An arithmetic processing device capable of Viterbi decoding by a digital signal processor, comprising:
first comparison means for comparing first data and second data; and
second comparison means for comparing third data and fourth data;
wherein the first comparison means and the second comparison means operate in one cycle in response to one instruction.

2. The arithmetic processing device according to claim 1, wherein at least one of the first comparison means and the second comparison means is configured by an ALU (Arithmetic Logic Unit).

3. The arithmetic processing device according to claim 1, further comprising two shift registers that hold the comparison results of the first comparison means and the second comparison means.

4. The arithmetic processing device according to claim 2, wherein the ALU (Arithmetic Logic Unit) is capable of performing a register-memory operation.