JP2004239930A

JP2004239930A - Method and system for detecting pitch in packet loss compensation

Info

Publication number: JP2004239930A
Application number: JP2003025727A
Authority: JP
Inventors: Sachiko Nagakura; 祥子長倉
Original assignee: Iwatsu Electric Co Ltd
Current assignee: Iwatsu Electric Co Ltd
Priority date: 2003-02-03
Filing date: 2003-02-03
Publication date: 2004-08-26

Abstract

PROBLEM TO BE SOLVED: To reduce the CPU load caused by performing speech pitch detection within the first frame of a frame-loss section in packet-based speech communication.

SOLUTION: Correlation computation runs at all times, using pitch buffers PB1 to PB5, a correlation computation section 5, and a correlation buffer 6, and pitch detection (7) is performed to prepare interpolation data in readiness for loss of the next frame. When a frame is lost, input data 1 undergoes interpolation processing (8), so the lost speech data can be interpolated immediately. Because the load of the correlation computation is small, very little urgent computation arises at the moment of loss, and the correlation computation section 5 and related circuitry can be implemented with a slow, low-cost CPU.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、パケットによる音声通信におけるピッチ検出方法と装置に関する。さらに具体的には、フレームが消失した場合のパケット損失を補償する音声ピッチ検出方法と装置に関わる。国際電気通信連合の電気通信標準化部門（ＩＴＵ−Ｔ）の勧告Ｇ．７１１ＡＰＰＥＮＤＩＸＩ（以下単に、ＩＴＵ−Ｔ勧告という）に示している推薦案「パケット損失補償方法」において、演算処理を平均化することでＣＰＵにかかる負荷を軽減する改良された方法と装置を提供するものである。
【従来の技術】
【０００２】
ＩＴＵ−Ｔ勧告の「パケット損失補償方法」では、音声通信において、フレームが消失した場合のフレーム消失区間の最初のフレーム間で音声のピッチ検出が行われる。このピッチ検出において、消失したフレームの直前の２０ｍｓ（１６０サンプル）の音声とそれより過去の音声との正規化相互相関計算が実行される。これが、ＣＰＵにおける演算量の大部分を占めている。
【０００３】
ＩＴＵ−Ｔ勧告では、フレーム消失区間の最初に、消失したフレームの直前の音声データを４８．７５ｍｓ（３９０サンプル）長のピッチ・バッファにコピー（記憶）する。この記憶（格納）された音声データは、現時点の音声のピッチ（基本波の周期）を計算するため、および、消失したフレームの期間に存在したであろうと推定される音声の波形を抽出し再現するために、使用される。
【０００４】
音声データのピッチは、５ｍｓ（４０サンプル）から１５ｍｓ（１２０サンプル）までの範囲であり、ピッチ・バッファに記憶した最近（最新）の２０ｍｓ（１６０サンプル）の音声とそれより過去の音声との、正規化相互相関のピークを見つけることで、推定される。
【０００５】
正規化相互相関ｐｉｔｃｈ（ｉ）は、相互相関Ｍ（ｉ）を自己相関Ｓ（ｉ）の平方根で除算して求めることができる。消失したフレームの期間である消失区間の直前に入力された音声データをｘ（ｎ）、サンプル番号をｉ，ｋおよびｎとする。
Ｍ（ｉ）＝Σｘ（ｎ−ｋ）・ｘ（ｎ−ｉ−ｋ）（１）
Ｓ（ｉ）＝Σｘ（ｎ−ｉ−ｋ）^２（２）
ここで、Σはｋ＝０からｋまでの累和を表している。ｘ（ｎ−ｋ）はｘ（ｎ）のｋサンプル前の音声データを表し、ｘ（ｎ−ｉ−ｋ）はｘ（ｎ−ｋ）のｉサンプル前の音声データを表している。
【０００６】
式（１），（２）を用いて、正規化相互相関ｐｉｔｃｈ（ｉ）を求めると、
ｐｉｔｃｈ（ｉ）＝Ｍ（ｉ）／｛Ｓ（ｉ）｝^１／２（３）
となる。ピッチ検出は、ｋ＝０〜１５９の１６０サンプルにつきｉ＝４０〜１２０の範囲において、ｐｉｔｃｈ（ｉ）が最大値を示すｉの値を検出する。
【０００７】
式（３）では、１つの正規化相互相関ｐｉｔｃｈ（ｉ）を求めるために、式（１）の積和演算を１６０回（ｋ＝０〜１５９）、式（２）の積和演算を１６０回（ｋ＝０〜１５９）と平方根（｛｝^１／２）を１回行い、さらに、式（３）の除算を１回実行する必要がある。式（１）と（２）の積和演算にそれぞれ１サイクル、式（２）の平方根に１０サイクルおよび式（３）の除算に１０サイクルの演算量がかかると仮定すると、１回の正規化相互相関ｐｉｔｃｈ（ｉ）を求める計算で、１６０×２＋１０＋１０＝３４０サイクルを必要とし、これを、ｉ＝４０〜１２０について８１回計算するので、２７，５４０サイクルの演算量を必要とする。
【０００８】
すなわち、式（３）の正規化相互相関ｐｉｔｃｈ（ｉ）を求めるためには、２７，５４０サイクルの演算量を必要とするのである。これだけの演算量を実行して、はじめて、式（３）の正規化相互相関ｐｉｔｃｈ（ｉ）が最大値を示すｉの値を検出することが可能となる。
【０００９】
このように大きな演算量の実行を回避するために、ＩＴＵ−Ｔ勧告では、以下の方法による低演算量化を提案している。
【００１０】
ピッチの推定は、２段階に分けて計算する。その第１段階では、２対１に間引いた音声データ信号で粗い探索を実行してピーク値を検出する。第２段階では、粗い探索で検出したピーク値の付近で、詳細な探索を実行する。
【００１１】
第１段階の粗い探索では、式（２）の自己相関計算で、ｉ＝１２０において、
Ｓ（ｉ）＝Σｘ（ｎ−ｉ−２ｋ）^２（４）
すなわち、
Ｓ（１２０）＝Σｘ（ｎ−１２０−２ｋ）^２
ここで、第１段階で２対１に間引いた結果、式（４）において、Σはｋ＝０から７９までの累和を表し、２対１に間引いたので式（２）のｋに代えて２ｋを用いている。
【００１２】
差分計算は、
Ｓ（ｉ＋１）−Ｓ（ｉ）＝ｘ（ｎ−１２０−ｉ）^２ −ｘ（ｎ−ｉ）^２（５）
で表されることから、
Ｓ（ｉ）＝Ｓ（ｉ＋１）−ｘ（ｎ−１２０−ｉ）^２＋ｘ（ｎ−ｉ）^２（６）
となる。式（６）において、ｉ＝１１９〜４０のうち、２対１に間引いたのでｉが偶数のときのみ、すなわち、４０回計算する。
【００１３】
自己相関計算の場合と同様に、式（１）の相互相関計算で、ｉ＝１２０とおいて、
Ｍ（ｉ）＝Σｘ（ｎ−２ｋ）・ｘ（ｎ−ｉ−２ｋ）（７）
となる。ここで、第１段階で２対１に間引いた結果、式（７）において、Σはｋ＝０から７９までの累和を表し、式（１）のｋに代えて２ｋを用いている。式（７）において、ｉ＝１１９〜４０のうち、ｉが偶数のときのみ、すなわち、４１回計算する。
【００１４】
式（７）で得た相互相関Ｍ（ｉ）と、式（６）で得た自己相関Ｓ（ｉ）とから、粗い（ｉ＝１１９〜４０のうちのｉが偶数時）正規化相互相関ｐｉｔｃｈ（ｉ）を求める。
ｐｉｔｃｈ（ｉ）＝Ｍ（ｉ）／｛Ｓ（ｉ）｝^１／２（８）
【００１５】
したがって、式（８）の粗い正規化相互相関ｐｉｔｃｈ（ｉ）を探索するのに必要な演算量は、自己相関計算に８０サイクル、差分計算に２×４０＝８０サイクル、相互相関計算に８０×４１サイクル、平方根計算に１０×４１サイクル、除算計算に１０×４１サイクルの第１段階の計４，２６０サイクルが必要となる。
【００１６】
第２段階では、第１段階で４，２６０サイクルの計算により求めた式（８）の粗い正規化相互相関ｐｉｔｃｈ（ｉ）の探索で検出したピーク値ｉとその前後の３値（ｉ−１，ｉ，ｉ＋１）で、サンプルを間引かずに詳細な正規化相互相関ｐｉｔｃｈ（ｉ）の探索を実行する。
【００１７】
すなわち、式（３）において、ｋ＝０〜１５９として、
ｐｉｔｃｈ（ｉ＋１）＝Ｍ（ｉ＋１）／｛Ｓ（ｉ＋１）｝^１／２（９）
ｐｉｔｃｈ（ｉ）＝Ｍ（ｉ）／｛Ｓ（ｉ）｝^１／２（１０）
ｐｉｔｃｈ（ｉ−１）＝Ｍ（ｉ−１）／｛Ｓ（ｉ−１）｝^１／２（１１）
となる。
【００１８】
第２段階の演算量は、自己相関計算に１６０サイクル、差分計算に２×２＝４サイクル、相互相関計算に１６０×３サイクル、平方根計算に１０×３サイクル、除算計算に１０×３サイクルの第２段階の計７０４サイクルとなる。第１段階と第２段階の合計は、４，９６４サイクルとなる。
【００１９】
第１段階と第２段階の合計では、４，９６４サイクルとなるから、ＩＴＵ−Ｔ勧告をそのまま実行したときに必要とされる２７，５４０サイクルの演算量に対して、約２０％に削減される。しかしながら、ＩＴＵ−Ｔ勧告の低演算量化提案を実行したときに発生する演算量は、４，９６４サイクルである。これを１２５μｓ（８ｋＨｚ）の期間内で処理しようとすると、ＣＰＵにおける処理量（ＩＰＳ：１秒あたりの命令数）は、４９６４／０．０００１２５＝３９．７１２ＭＩＰＳ（Ｍ：メガ）となり、依然として、ＣＰＵにとって大きな負荷となっている。
【００２０】
図１０には、従来例の装置における補間処理の動作を示すタイムチャートが示されている。同図（ａ）に示した音声の入力データ１においては、データ番号０〜７９＝８０個のサンプル（ｓａ）のデータが１フレーム（ｔ８〜ｔ１０）に含まれている。ここで、１ｓａは、１２５μｓである。時点ｔ１０〜ｔ１２のフレームが消失した場合を想定する。同図（ｃ）には、（ａ）の入力データ１を３０ｓａ分遅れさせた遅延データ３が示されている。ここで、（ｂ）は、説明の便宜上、欠番となっている。
【００２１】
（ｄ）のフレーム消失信号１５は、時点ｔ１０までは“Ｌ”のままであるが、時点ｔ１０においてフレームの消失を検出すると“Ｈ”となり、時点ｔ１２においてフレームが検出されると、再び“Ｌ”となる。ＩＴＵ−Ｔ勧告では、補間データと消失直前の入力データ１を滑らかに接続するために、消失直前の３０サンプル（ｓａ）分の遅延データ３（ｃ）と補間データを前方（ｔ３２〜ｔ３４の方向）に延長したデータの重畳加算を時点ｔ３２〜ｔ３４の間行っている。また、補間データと消失直後の入力データを滑らかに接続するために、消失フレームの直後の３０ｓａ分の遅延データ３（ｃ）と補間データを後方に延長したデータとの重畳加算を時点ｔ３５〜ｔ３６の間行っている。
【００２２】
重畳加算は、時点ｔ１０〜ｔ３３の遅延データ３（ｃ）において、ｔ１０のデータ番号５０を補間データに加算する場合には、遅延データの割合を大きく補間データの割合を小さくし、その後ｔ３３に近づくにつれて遅延データの割合を小さく補間データの割合を大きくしている。１ｓａ遅れた出力データ２（ｅ）の時点ｔ３２〜ｔ３４の間は、遅延データの割合を徐々に小さく補間データの割合を徐々に大きくして、ｔ３４以後はｔ３５迄補間データのみとなる。
【００２３】
さらに、時点ｔ３７〜ｔ３８の遅延データ３（ｃ）において、ｔ３７のデータ番号０を補間データに加算する場合には、遅延データの割合を小さく補間データの割合を大きくし、その後ｔ３８に近づくにつれて補間データの割合を小さく遅延データの割合を大きくしている。１ｓａ遅れた出力データ２（ｅ）の時点ｔ３５〜ｔ３６の間は、補間データの割合を徐々に大きくして、ｔ３６以後は遅延データのみとなる。
【００２４】
（ｅ）の出力データ２は、（ｃ）の遅延データ３よりも１ｓａだけ遅れて出力される。時点ｔ１０においてフレームの消失が検出されると、そこから１ｓａ分の１２５μｓの期間内で、前記第１段階と第２段階の演算処理がなされて、補間データが出力される。出力データ２は、時点ｔ１０から１２５μｓ遅れた時点ｔ３２からｔ３４の間（３０サンプル分：３．７５ｍｓ）は重畳加算された補間データであり、ｔ３４〜ｔ３５の間は重畳加算されていない補間データであり、ｔ３５からｔ３６の間は重畳加算された補間データとなっている。
【００２５】
フレーム消失信号（ｄ）の発生した時点ｔ１０からｔ３２の１２５μｓの間に前記第１段階と第２段階の演算処理が完了しなければ、消失フレームの発生によって、音声データが途切れることになる。このような事態は絶対に避けねばならないから、この１２５μｓの間に演算処理を完了しなければならないＣＰＵにとって大きな負荷となる。
【００２６】
【発明が解決しようとする課題】
ＩＴＵ−Ｔ勧告をそのまま実行したときに必要とされる２７，５４０サイクルの大きな演算量の実行を回避するために、ＩＴＵ−Ｔ勧告の低演算量化提案を採用したとしても、ＣＰＵに対する負荷は依然として重い。さりとて、ＣＰＵにおける演算処理時間を延ばすことは、消失フレームの補間処理が遅延して音声信号に不都合を生じてしまうこととなるから許されず、解決されなければならない課題であった。
【００２７】
ＣＰＵおよび、その周辺の回路素子の演算速度を上げることにより、補間処理の遅延を避けることは技術的には可能であるが、著しいコストアップを伴うから、実施することができないという重大な問題点があった。
【００２８】
【課題を解決するための手段】
本発明は、パケットによる音声通信において、フレームが消失した場合のパケット損失を補償する場合に発生する、ＣＰＵにかかる大きな負荷を軽減するべく、演算処理を平均化するようにしている。
【００２９】
音声データを含むフレームが消失するのに備えて、フレーム消失の有無にかかわらず、常時、一連の正常なフレーム列から正規化相互相関計算をして音声のピッチを検出して、一連の正常なフレーム列の直後に入力される次フレームが消失フレームであった場合に、その消失フレームに、検出された音声のピッチに基づいて得た補間データを補間するようにした。この補間作業においては、フレームの消失寸前の３０サンプル分のデータを、重畳加算できるようにしている。
【００３０】
ピッチ・バッファはフレーム周期が１０ｍｓ（８０サンプル）の場合は、フレーム消失直前の３９０サンプルを格納（記憶）するとして、３９０／８０＝４．９となるから、５個のピッチ・バッファを用意している。
【００３１】
自己相関および相互相関を求めて、音声データのピッチ検出をし、補間データを作成しておく演算作業を、フレーム消失が発生するか否かにかかわらず、常時実行している。演算は、常時分散して実行されているために、その計算速度は遅いものでも、十分に対応できる。
【００３２】
フレーム消失が発生すると、ピッチ・バッファにおける音声データの更新は止められ、すでに検出された最新の音声データのピッチから得た補間データにより、消失したフレームを補間する。フレーム消失という異常な事態が発生した時点では、消失したフレームを補間するのに必要なピッチおよび補間データがすでに作成済みであるから、フレーム消失の発生時における処理量は、極めて小さく、消失フレームの補間処理が遅延して音声信号に不都合を生じてしまうこともない。
【００３３】
【発明の実施の形態】
図１および図２は、本発明の実施の形態を示す回路構成図、および回路構成の動作を従来例（図１０）と対比して説明するためのタイムチャートである。ここにおいて、従来例を示す図１０の要素と同じものについては、同じ記号を用いた。図２において、図１０と異なる点は、（ｂ）のスイッチ３０と（ｆ）のスイッチＳＷの機能が追加されている点である。
【００３４】
すなわち、（ｂ）のスイッチ３０は、時点ｔ３１からｔ１０の間（３０ｓａ分：３．７５ｍｓ）はオンとなり、それ以外においてオフとなっている。（ｆ）のスイッチＳＷは、時点ｔ３２からｔ３６の間は端子ａ側に接続され、それ以外において端子ｂ側に接続される。時点ｔ１０までに相関計算は終了している。
【００３５】
音声データが入力データ１（ａ）として印加されている。入力データ１は、遅延器ＤＬによって３０ｓａ分（ｔ３１からｔ１０の間）遅延し、切替スイッチＳＷの端子ｂ側に接続されて、出力バッファ１０において１ｓａ（サンプル：１サンプルは１２５μｓ）遅れて、従来例と同じく出力データ２として出力される。入力データ１はフレーム構成で、１フレーム中に音声信号からサンプルして得た８０サンプル分のデータを含んでいる。
【００３６】
何等かの理由でフレームが消失すると、消失したフレームの音声を再現できなくなるから、消失したフレームのデータを、消失前の音声データから相関計算し、音声のピッチを検出して推定し、その推定したデータで、消失したフレームのデータを補間している。入力データ１にフレームの消失が発生すると、フレーム消失検出部９がこれを検出し、フレーム消失信号１５を出力し（ｔ１０）、１ｓａ分遅れて（ｔ３２）切替スイッチＳＷを端子ａ側に切替て、消失したフレームを補間する補間データ２６を、出力バッファ１０を介して出力データ２として出力する。このフレーム消失検出部９、遅延器ＤＬおよび出力バッファ１０の構成は従来例と同じであり、公知である。
【００３７】
補間処理が終り、正常なフレームの入力をフレーム消失検出部９が時点ｔ１２で確認すると、フレーム消失信号１５が終了した後、６０ｓａ分（ｔ１２〜ｔ３８）経過することにより、次のサンプルで切替スイッチＳＷを端子ｂ側に切替える（ｔ３６）。正常なフレームの入力をフレーム消失検出部９が確認している間は、切替スイッチＳＷは端子ｂ側にあって、出力データ２は出力され、ピッチ・バッファＰＢ１〜５にも同時に印加され、そこに一時格納（記憶）される。遅延データ３（ｃ）における消失フレーム開始時点ｔ３３は、遅延器ＤＬによって３０ｓａ分遅れている。
【００３８】
スイッチ３０（ｂ）は、入力データ１（ａ）の１フレームの最後の３０ｓａ分（ｔ３１〜ｔ１０：３．７５ｍｓ：データ番号５０〜７９）の間オンすることにより、消失するかもしれない次のフレーム開始の直前の３０ｓａ分をピッチ・バッファＰＢ１に格納する。この３０ｓａ分は、遅延データ３（ｃ）における時点ｔ１０〜ｔ３３（データ番号５０〜７９）に対応しており、ピッチ検出と出力データ２（ｅ）における時点ｔ３２〜ｔ３４（データ番号５０〜７９）の重畳加算において、使用される。
【００３９】
ピッチ・バッファ出力２１は、補間処理部８と相関計算部５に送られる。相関バッファ６には、５個の経過バッファＰＡＢ１〜５、２個の自己相関バッファＳＣＢ１，２と、２個の相互相関バッファＭＣＢ１，２が含まれている。相関計算部５と相関バッファ６の間においては、相関入出力２２により、相関計算中のデータのやりとりが行われる。
【００４０】
自己相関バッファＳＣＢ１，２の自己相関バッファ出力２３と、相互相関バッファＭＣＢ１，２の相互相関バッファ出力２４とは、ピッチ検出部７に送られる。ここで音声のピッチが検出されて、ピッチ・データ２５が補間処理部８に送られ、作成された補間データ２６が切替スイッチＳＷの端子ａ側に印加される。フレーム検出部９がフレームの消失を検出すると、フレーム消失信号１５により、切替スイッチＳＷは端子ａ側に切替られて、補間データ２６で補間されたフレームが出力バッファ１０を介して出力データ２として出力される。
【００４１】
図３には、図１に示した回路構成の動作原理を説明するためのタイムチャートが示されている。５個のピッチ・バッファＰＢ１〜５のうちの１つのピッチ・バッファＰＢ１を代表例として、説明している。
【００４２】
図３（ａ）のフレーム（Ｆ）は、時点ｔ０に始まり、時点ｔ２迄に０〜７９の８０サンプル（ｓａ）分のデータを含んでいる。以下同様に、ｔ４，ｔ６，・・・ｔ１０と続いている。同図（ｂ）のピッチ・バッファＰＢ１は、記憶開始時点ｔｓからデータの一時記憶を開始する。ピッチ・バッファＰＢ１の記憶容量は３９０ｓａ分（０〜３８９）である。時点ｔ１０において、ピッチ・バッファＰＢ１は満杯となる。
【００４３】
音声のピッチ計算には２８０ｓａ分のデータを必要とする。そこで時点ｔ３において、同図（ｃ）の相関計算部５の動作が開始されたとする。自己相関計算は時点ｔ３〜ｔ７（１６０ｓａ分）の間に行われる。自己相関差分計算は時点ｔ７〜ｔ９（８０ｓａ分）の間に行われる。相関計算部５の動作は、時点ｔ６〜ｔ１０（１６０ｓａ分）の間に行われる。
【００４４】
同図において、もし、時点ｔ１０以後のフレームが消失したときには、ピッチ検出部７および補間処理部８の動作により、時点ｔ１０から１ｓａ分（１２５μｓ）の間にピッチを検出し、補間データ２６を得ている。ピッチ・バッファＰＢ１〜５、ＣＰＵ構成の相関計算部５および相関バッファ６は、ピッチ・バッファ制御部１１と相関バッファ制御部１２の制御下におかれる。時点ｔ１０〜ｔ１２において存在すべきフレームが消失したと仮定すると、この消失したフレームを補間するべく、時点ｔ１０から１ｓａ分（１２５μｓ）の間に得た補間データ２６を出力することになる。
【００４５】
図３において実行される相関計算およびピッチ検出について、詳細に説明する。同図（ｂ）のピッチ・バッファＰＢ１には、音声のピッチを推定し、消失したフレーム区間（ｔ１０〜ｔ１２）の音声波形を抽出し推定するべく、消失した時点ｔ１０より前のｔｓ〜ｔ１０の３９０ｓａ（０〜３８９のサンプル）が格納されている。演算処理を分散するために、自己相関計算（ｔ３〜ｔ７）、および相互相関計算（ｔ６〜ｔ１０）は、データが入力されて、計算が可能となった時点で順次行う。
【００４６】
ピッチ・バッファＰＢ１が１１０ｓａを格納（記憶）した時点ｔ３から１６０ｓａ格納した時点ｔ７までの自己相関Ｓ（０）を求める。サンプル番号ｎの入力データをｘ（ｎ）とすると、ｎ＝１１０〜２６９のとき（図３（ｂ）ではサンプル番号０，１１０，２３０，２７０，３４９，３８９を表示している）、
Ｓ（０）＝Ｓ（０）＋ｘ（ｎ）^２（１２）
を計算する。
【００４７】
サンプル番号ｎ＝２６９になると、自己相関Ｓ（０）が求まる。
つぎに、ｎ＝１１１を先頭とする１６０ｓａの自己相関Ｓ（１）を求める。これは、Ｓ（０）を用いて、ｎ＝２７０になったとき、
Ｓ（１）＝Ｓ（０）−ｘ（ｎ−１６０）^２＋ｘ（ｎ）^２（１３）
で求められる。
【００４８】
以降、ｎ＝１９０を先頭とする時点ｔ９のｎ＝３４９までの１６０ｓａを用いて、ｔ７〜ｔ９の間に８０個（＝３４９−２６９）の自己相関Ｓ（ｉ）を求めることができる。自己相関Ｓ（ｉ）は、ｎ＝２７０〜３４９において、
ｉ＝ｎ−２６９として、
Ｓ（ｉ）＝Ｓ（ｉ−１）−ｘ（ｎ−１６０）^２＋ｘ（ｎ）^２（１４）
【００４９】
相互相関Ｍ（ｉ）は、フレームが消失する直前の１６０ｓａ（ｔ６〜ｔ１０）すなわち、ｎ＝２３０を先頭にした１６０ｓａに対し、それぞれｉサンプル（ｓａ）前のデータを掛け合わせたものを、順次加算して求める。すなわち、ｔ６〜ｔ１０のｎ＝２３０〜３８９のデータに対し、たとえば、ｉ＝１２０のときは、同図（ｆ）のｎ−ｉ−ｋに１点鎖線の枠で示すｎ＝１１０〜２６９（ｔ３〜ｔ７）のデータを掛け合わせて加算する。
【００５０】
同図（ｄ）のｎ−ｋに１点鎖線の枠で示すｎ＝２３０〜３８９（ｔ６〜ｔ１０）のデータに対し、たとえば、ｉ＝４０のときは、同図（ｅ）のｎ−ｉ−ｋに１点鎖線の枠で示すｎ＝１９０〜３４９（ｔ５〜ｔ９）のデータを掛け合わせて加算する。
【００５１】
ｎ＝３８９のとき、ｉ＝４０〜１２０について、相互相関Ｍ（ｉ）は、
Ｍ（ｉ）＝Σｘ（ｎ−ｋ）・ｘ（ｎ−ｉ−ｋ）（１５）
と表すことができる。ここに、Σは、ｋ＝０〜１５９としたときの累和を表している。
【００５２】
相互相関Ｍ（ｉ）は、ｎ＝２３０〜３８９のとき、ｉ＝４０〜１２０について、それぞれ、
Ｍ（ｉ）＝Ｍ（ｉ−１）＋ｘ（ｎ）・ｘ（ｎ−ｉ）（１６）
を計算すれば、ｎ＝３８９のとき（ｔ１０）、８１個全ての相互相関Ｍ（ｉ）が求められる。
【００５３】
式（１２），（１４），（１６）によるデータ入力時の積和演算の回数は、以下のようになる。式（１２）の自己相関計算でＳ（０）にｘ（ｎ）^２を加算する計算を１回（＝１サイクル）行う。式（１４）の自己相関差分計算でＳ（ｉ−１）からｘ（ｎ−１６０）^２を減算する計算とｘ（ｎ）^２を加算する計算の２回（＝２サイクル）実行する。式（１６）の相互相関計算で８１個のｉ（＝４０〜１２０）についてＭ（ｉ−１）にｘ（ｎ）・ｘ（ｎ−ｉ）を加算するから８１回（＝８１サイクル）計算する。
【００５４】
図３の相互相関計算中の時点ｔ６〜ｔ１０の間には、自己相関計算の一部（ｔ６〜ｔ７）と自己相関差分計算（ｔ７〜ｔ９）が同時に並行して実行されるために、最大で８３回（＝８３サイクル）の積和計算を行う。さらに、式（８）の｛Ｓ（ｉ）｝^１／２を得るために式（１４）の結果の平方根を得る計算に１０サイクルを要するから、演算量は、８３＋１０＝９３サイクルとなる。
【００５５】
フレーム消失が時点ｔ１０において発生すると、すでに求めた相互相関Ｍ（ｉ）と自己相関の平方根｛Ｓ（ｉ）｝^１／２で除算する計算をして、式（８）の正規化相互相関ｐｉｔｃｈ（ｉ）を求めて消失したフレームを補間する。フレーム消失が発生した時点ｔ１０において、必要とされる計算は、演算量削減のために２対１に間引いた信号で粗い探索を行う。
【００５６】
その後に、粗い探索で求めたピーク付近で詳細な探索をするならば、４１＋２＝４３回の除算で済むので、各除算に１０サイクルを要するから、その除算をするのに４３回×１０サイクル＝４３０サイクルとなる。これは、ＩＴＵ−Ｔ勧告の低演算量化提案をそのまま実行したときの４，９６４サイクルの１０％以下であるから、相関計算部５などを含むＣＰＵの負荷は極めて小さい。
【００５７】
以上の説明においては、ピッチ・バッファＰＢ１を代表例として述べたが、相互相関計算が終る時点ｔ１０において、丁度都合よく消失フレームが発生するとは限らない。そのために、ピッチ・バッファＰＢを５個用意して、いつ消失フレームが発生しても、いずれかのピッチ・バッファＰＢが図３に示した状態となっているようにしたので、ただちに対処できる。
【００５８】
図４には、図１に示した回路構成の構成要素であるピッチ・バッファのフレームに対する動作内容を説明するためのタイムチャートが示されている。同図（ａ）には、時点ｔ０〜ｔ１８までのフレームが示されている。同図（ｂ）〜（ｆ）には、それぞれピッチ・バッファＰＢ１〜５の動作が示されている。ここで、Ｓは自己相関計算、ＳＤは自己相関差分計算、Ｍは相互相関計算を表している。
【００５９】
フレーム消失の有無にかかわらず、入力される音声データ８０ｓａ（サンプル）を１フレームとするフレーム毎に対して、常時、自己相関計算と相互相関計算を実行するように対処しなければならない。ピッチ検出および補間のためには３９０ｓａ（サンプル）のデータを格納するピッチ・バッファＰＢがフレーム毎に必要である。
【００６０】
フレーム毎に時間的にずらして、５個のピッチ・バッファＰＢ１〜５で対応できるようにする。１つのピッチ・バッファＰＢは、１フレームの消失に対してだけ対応することができるのみである。そこでたとえば、フレーム周期が１０ｍｓ（８０サンプル）の場合は、フレーム消失直前の３９０サンプルを記憶するとして、３９０／８０＝４．９となるから、５個のピッチ・バッファＰＢ１〜５を用意する。
【００６１】
ピッチ・バッファＰＢ１〜５には、順次入力されるフレーム毎の音声データをピッチ・バッファＰＢ１から順次に記憶する。すなわち、５個のピッチ・バッファＰＢ１〜５により、１フレームずつずらし、それぞれ５フレーム分の音声データが記憶されている。５フレーム期間が経過すると、最も古い音声データを記憶している、たとえば、ピッチ・バッファＰＢ１の記憶内容は、時点ｔ１０から１０ｓａ（サンプル）の時点で更新されて、最新の音声データを記憶することになる。１個のピッチ・バッファＰＢについて見ると、５フレーム期間の経過ごとに記憶されている音声データが更新されることになる。
【００６２】
ピッチ・バッファＰＢ１〜５に記憶しているサンプル（音声データ）から自己相関計算Ｓ，自己相関差分計算ＳＤおよび相互相関計算Ｍをして、音声データのピッチ検出をし、補間データを作成しておく作業を常時実行している。たとえば、時点ｔ１０でフレームの消失が発生すると、ピッチ・バッファＰＢ１のデータから演算して求めた補間データが使用される。同じく、時点ｔ１２，１４，１６，１８でフレームの消失が発生すると、それぞれ、ピッチ・バッファＰＢ２，３，４，５のデータによる補間データが使用される。
【００６３】
自己相関および相互相関を求めるために、相関バッファ６と相関計算部５が設けられ、相関バッファ制御部１２による制御がなされている。ＣＰＵ構成の相関計算部５における演算は、常時分散して実行されているために、その計算速度は遅いものでも、十分に対応できる。
【００６４】
フレーム消失が発生すると、ピッチ・バッファにおける音声データの更新は止められ、すでに検出された最新の音声データのピッチにより、消失したフレームを補間する。フレーム消失という異常な事態が発生した時点では、消失したフレームを補間するのに必要な補間データがすでに作成済みであるから、フレーム消失の発生時における演算量は、極めて小さく、消失フレームの補間処理が遅延して音声信号に不都合を生じてしまうこともない。
【００６５】
図５には、ピッチ・バッファＰＢ１〜５のデータがフレームに割当てられる様子を説明するためのタイムチャートが示されている。同図（ａ）には、時点ｔ０〜ｔ２４までのフレームの番号が示されている。同図（ｂ）〜（ｆ）には、それぞれピッチ・バッファＰＢ１〜５のフレーム対応動作が示されている。たとえば、（ｂ）のピッチ・バッファＰＢ１のｔ０〜ｔ１０のデータは、フレーム６（ｔ１０〜）でフレーム消失が発生したときに使用される。同じくピッチ・バッファＰＢ２のｔ２〜ｔ１２のデータは、フレーム７（ｔ１２〜）でフレーム消失が発生したときに使用される。以下、ピッチ・バッファＰＢ３〜５も同様である。
【００６６】
図６には、図５の時点ｔ１０〜ｔ１２のフレーム６でフレーム消失が発生したときのその後のピッチ・バッファＰＢ１〜５のデータがフレームに割当てられる様子を説明するためのタイムチャートが示されている。（ｂ）のピッチ・バッファＰＢ１のｔ０〜ｔ１０のデータは、フレーム６（ｔ１０〜）でフレーム消失が発生したときには、そのデータは、（ｂ）のピッチ・バッファＰＢ１においては、ｔ１０で更新されずに、消失フレームを補間する補間動作がｔ１２で終了してから更新される。
【００６７】
そのかわりに、時点ｔ１０から蓄積が開始されなければならないフレーム１１の消失に備えるデータは、（ｃ）のピッチ・バッファＰＢ２にｔ１０以後において蓄積される。そのために、ピッチ・バッファＰＢ１のｔ１２〜ｔ２２のデータは、フレーム１２（ｔ２２〜）が消失した場合のために、使用される。
【００６８】
フレーム６の消失に続いてフレーム７も消失した場合は、（ｂ）のピッチ・バッファＰＢ１においてフレーム６の消失に備えたデータ（ｔ０〜ｔ１２）で補間する。そこで、フレーム７の消失に備えてｔ１０まで蓄積した（ｃ）のピッチ・バッファＰＢ２のデータは不要になるので、ＰＢ２にはｔ１０〜ｔ２０においてフレーム１１が消失した場合に備えて、データ蓄積がなされる。
【００６９】
図７には、ピッチ・バッファ制御部１１に含まれた５個のレジスタからなるピッチ・バッファ・カウンタＰＢＣ１〜５の動作のタイムチャートが示されている。５個のピッチ・バッファ・カウンタＰＢＣ１〜５は、５個のピッチ・バッファＰＢ１〜５へのフレームの割り当てを制御するためのものである。各ピッチ・バッファ・カウンタＰＢＣは、３９０進（０〜３８９）のカウンタである。
【００７０】
同図（ｂ），（ｃ），（ｄ），（ｅ），（ｆ）のピッチ・バッファ・カウンタＰＢＣ１〜５のそれぞれは、ピッチ・バッファＰＢ１〜５へのフレームの割当てを制御している。（ｇ）の現フレーム用ピッチ・バッファ番号ＰＦＰＢＮｏは、たとえば、時点ｔ０現在のフレームに割当てたピッチ・バッファＰＢの番号が１（ＰＢ１）であることを示している。（ｈ）の次フレーム用ピッチ・バッファ番号ＮＦＰＢＮｏは、たとえば、時点ｔ０現在のフレームの次に割当てるピッチ・バッファＰＢの番号が２（ＰＢ２）であることを示している。以下も同様である。
【００７１】
図７のフレーム割当ての手順を具体的に説明する。たとえば、ピッチ・バッファ・カウンタＰＢＣ１が、時点ｔ８で３０９（＝３８９−８０）を示したとき、そのピッチ・バッファＰＢ１の番号１を（ｈ）の次フレーム用ピッチ・バッファ番号ＮＦＰＢＮｏの１として記録する。ピッチ・バッファ・カウンタＰＢＣ１が、時点ｔ１０で３８９を示したとき、そのピッチ・バッファＰＢ１の番号１を（ｇ）の現フレーム用ピッチ・バッファ番号ＮＦＰＢＮｏの１として記録する。
【００７２】
現フレームが正常フレームであれば、現フレーム用のピッチ・バッファＰＢのデータは不要となるので、これを５フレーム後の新フレームに割当てる。現フレームが消失フレームであれば、現フレーム用のピッチ・バッファＰＢのデータとしてすでに用意してある補間データにより、消失データを補間し、消失フレームが連続している間は、これを使用する。そのときには、次フレーム用ピッチ・バッファは不要となるので、これを新たなフレームに割当てる。
【００７３】
相関計算部５では、ピッチ・バッファＰＢ毎にピッチ・バッファ・カウンタＰＢＣのカウント値に応じて、相関計算を行っている。たとえば、ピッチ・バッファ・カウンタＰＢＣ１のカウント値が１１０〜２６９のときは（図３を参照）、）自己相関計算を行い、カウント値が２６９のときに自己相関結果の平方根を求める。
【００７４】
さらに、カウント値が２７０〜３４９のときには、自己相関差分計算を行い、それぞれ差分計算結果の平方根を求める。カウント値が２３０〜３８９のときには、相互相関計算を行う。その計算結果は、相関バッファ制御部１２が示す、相関バッファ６に含まれた相互相関バッファＭＣＢ１，２に格納する。
【００７５】
相関バッファ６は、ピッチ・バッファＰＢ毎に自己相関計算の途中経過を格納する５個の経過バッファＰＡＢ１〜５と、１フレームにつき８１個の自己相関計算結果を２フレーム分格納する自己相関バッファＳＣＢ１，２と、１フレームにつき８１個の相互相関計算結果を２フレーム分格納する相互相関バッファＭＣＢ１，２とで構成されている。
【００７６】
相関バッファ制御部１２は、相関計算結果を格納する相関バッファ６を制御している。経過バッファＰＡＢ１〜５は、それぞれピッチ・バッファＰＢ１〜５に対応して割当てられる。自己相関バッファＳＣＢ１，２は、それぞれ１フレーム分づつを格納できるから、１フレーム毎に交互に割当てられる。同様に、相互相関バッファＭＣＢ１，２も、それぞれ１フレーム分づつを格納できるから、１フレーム毎に交互に割当てられる。
【００７７】
図８には、相関バッファ６に含まれた経過バッファＰＡＢ１〜５の動作を説明するためのタイムチャートが示されている。同図（ａ）には、時点ｔ０〜ｔ２４のフレームの番号が示されている。同図（ｂ），（ｃ），（ｄ），（ｅ），（ｆ）の経過バッファＰＡＢ１〜５のそれぞれは、ピッチ・バッファＰＢ１〜５のいずれかと対応している。
【００７８】
たとえば、同図（ｃ）の経過バッファＰＡＢ２はｔ３〜ｔ９において、自己相関計算Ｓと自己相関差分計算ＳＤの間（図４のｔ３〜ｔ９）、１つのピッチ・バッファＰＢ１に割当てられ、６番目のフレーム（ｔ１０〜）の消失に備えていることを表している。同様に、同図（ｄ）の経過バッファＰＡＢ３はｔ５〜ｔ１１において、自己相関計算Ｓと自己相関差分計算ＳＤの間（図４のｔ５〜ｔ１１）、１つのピッチ・バッファＰＢ２に割当てられ、７番目のフレーム（ｔ１２〜）の消失に備えていることを表している。以下、同様である。
【００７９】
図９には、相関バッファ６に含まれた自己相関バッファＳＣＢ１，２と相互相関バッファＭＣＢ１，２の動作を図８と対応して説明するためのタイムチャートが示されている。同図（ａ）には、時点ｔ０〜ｔ２４のフレームの番号が示されている。同図（ｂ），（ｃ），（ｄ），（ｅ）の自己相関バッファＳＣＢ１，２と相互相関バッファＭＣＢ１，２の動作について、説明する。
【００８０】
同図（ｂ）の自己相関バッファＳＣＢ１は、時点ｔ１〜ｔ５（Ｓ）の間に求めた式（１２）のＳ（０）と、時点ｔ５〜ｔ７（ＳＤ）の間に求めた式（１４）のｉ＝１〜８０としたＳ（ｉ）とを相互相関計算Ｍの終了する時点ｔ８まで格納できればよい。同じく、（ｂ）の自己相関バッファＳＣＢ１は、時点ｔ５〜ｔ９（Ｓ）の間に求めた式（１２）のＳ（０）と、時点ｔ９〜ｔ１１（ＳＤ）の間に求めた式（１４）のｉ＝１〜８０としたＳ（ｉ）とを相互相関計算Ｍの終了する時点ｔ１２まで格納できればよい。
【００８１】
（ｃ）の相互相関バッファＭＣＢ１は、時点ｔ４〜ｔ８（Ｍ）の相互相関計算Ｍの式（１６）のｉ＝４０〜１２０としたＭ（ｉ）を格納して、時点ｔ８〜のフレーム番号５の消失に備える。同じく、（ｃ）の相互相関バッファＭＣＢ１は、時点ｔ８〜ｔ１２（Ｍ）の相互相関計算Ｍの式（１６）のｉ＝４０〜１２０としたＭ（ｉ）を格納して、時点ｔ１２〜のフレーム番号７の消失に備える。
【００８２】
同図（ｄ）の自己相関バッファＳＣＢ２は、時点ｔ３〜ｔ７（Ｓ）の間に求めた式（１２）のＳ（０）と、時点ｔ７〜ｔ９（ＳＤ）の間に求めた式（１４）のｉ＝１〜８０としたＳ（ｉ）とを相互相関計算Ｍの終了する時点ｔ１０まで格納できればよい。同じく、（ｄ）の自己相関バッファＳＣＢ２は、時点ｔ７〜ｔ１１（Ｓ）の間に求めた式（１２）のＳ（０）と、時点ｔ１１〜ｔ１３（ＳＤ）の間に求めた式（１４）のｉ＝１〜８０としたＳ（ｉ）とを相互相関計算Ｍの終了する時点ｔ１４まで格納できればよい。
【００８３】
（ｅ）の相互相関バッファＭＣＢ２は、時点ｔ６〜ｔ１０（Ｍ）の相互相関計算Ｍの式（１６）のｉ＝４０〜１２０としたＭ（ｉ）を格納して、時点ｔ１０〜のフレーム番号６の消失に備える。同じく、（ｅ）の相互相関バッファＭＣＢ２は、時点ｔ１０〜ｔ１４（Ｍ）の相互相関計算Ｍの式（１６）のｉ＝４０〜１２０としたＭ（ｉ）を格納して、時点ｔ１４のフレーム番号８の消失に備える。
【００８４】
かくして、Ｓ（ｉ）、Ｍ（ｉ）のそれぞれ８１個分の格納容量をもつ自己相関バッファＳＣＢ１と相互相関バッファＭＣＢ１のペア、および、Ｓ（ｉ）、Ｍ（ｉ）のそれぞれ８１個分の格納容量をもつ自己相関バッファＳＣＢ２と相互相関バッファＭＣＢ２のペアとで重複せずに、常時、計算結果を格納して、フレーム消失の事態に備えている。
【００８５】
図９（ａ）において、時点ｔ１０のフレーム６が消失したと仮定する。すると、フレーム消失検出部９はフレーム消失を検出して、フレーム消失信号１５を出力する。これを受けたピッチ検出部７は、直前のデータ、すなわち、同図（ｄ），（ｅ）の自己相関バッファＳＣＢ２と相互相関バッファＭＣＢ２のペアから、ｔ１０における格納データである相関計算結果（Ｓ（ｉ），Ｍ（ｉ））を読み出して、式（８）の正規化相互相関ｐｉｔｃｈ（ｉ）を計算し、そのピーク値を示すｉを検出して、音声のピッチ周期を抽出する。
【００８６】
補間処理部８では、消失したフレーム番号６の消失に備えていたピッチ・バッファＰＢ１（図６（ｂ））のデータをピッチ・バッファ出力２１により読み出し、ピッチ検出部７で求めた音声のピッチ（周期）を用いて、補間データ２６を作成し、出力する。ピッチ検出および消失したフレームの補間方法は、ＩＴＵ−Ｔ勧告に従って実行される。
【００８７】
図９を用いて、演算量（サイクル数）を説明する。たとえば、時点ｔ７において入力された１ｓａ（サンプル）のデータｘ（ｎ）に対して式（１２）によるＳ（０）の計算を２回（＝２×１＝２サイクル）、式（１４）によるＳ（ｉ）の計算を１回（＝２サイクル）、式（１６）によるｉ＝４０〜１２０とした８１個のｘ（ｎ−ｉ）によるＭ（ｉ）の計算を２回（＝２×８１＝１６２サイクル）、平方根計算を１回（＝１０サイクル）の合計２＋２＋１６２＋１０＝１７６サイクルとなる。
【００８８】
以上の説明では、フレーム周期が１０ｍｓ（８０ｓａ）として例示したが、フレーム周期が１０ｍｓの整数倍である場合には、消失フレームに対して、１０ｍｓ周期のフレームが連続して発生したとみなすことにより、同様に処理できる。フレーム周期が１０ｍｓ以下の場合には、複数のフレームをまとめて１０ｍｓフレームとみなすことにより、同様に処理できる。このとき、まとめられた複数のフレームのうちの１つが消失した場合には、その１まとめのフレーム全体を消失フレームとみなして同様に処理する。
【００８９】
以上において説明した本発明では、常時、正規化相互相関計算を行うので、１つの入力されたデータに対して、１７６サイクルの演算量である。また、消失フレームの発生時の最初に行う正規化相互相関計算は４２回の除算（１回の除算は１０サイクル）の４２０サイクルである。すなわち、消失フレーム発生時には、最大でも１７６＋４２０＝５９６サイクルの演算量でよい。このように演算処理は平均化されて、ＣＰＵ負荷が軽減されている。
【００９０】
【発明の効果】
ＩＴＵ−Ｔ勧告に従って消失フレームを処理する場合には、消失区間の最初の１サンプル（１２５μｓ）の間にピッチ検出をしなければならないので、正規化相互相関計算のために、４，９６４サイクルの演算量が必要となる。相関計算部などを構成するＣＰＵの負荷は、その１サンプル（１２５μｓ）の間に集中してしまうために、極めて重いものとなっていた。
【００９１】
以上の説明から明らかなように、本発明では、フレーム消失発生の有無に関わらず、常時、正規化相互相関計算を分散して実行してフレーム消失発生に備えているために、相関計算部などを構成するＣＰＵの負荷は、極めて軽い。従来例においては、フレーム消失発生時直後において、１サンプル（１２５μｓ）の間に集中する演算量は、４，９６４サイクルであったのが、本発明によれば５９６サイクルとなり、性能の低い安価なＣＰＵを用いても十分処理できるから、本発明の効果は極めて大きい。
【図面の簡単な説明】
【図１】本発明の実施の形態を示す回路構成図である。
【図２】図１に示した回路構成の動作を従来例と対比して説明するためのタイムチャートである。
【図３】図１に示した回路構成の動作原理を説明するためのタイムチャートである。
【図４】図１に示した回路構成の構成要素であるピッチ・バッファの動作を説明するためのタイムチャートである。
【図５】図１に示した回路構成の構成要素であるピッチ・バッファのさらに詳細な動作を説明するためのタイムチャートである。
【図６】図４に示したタイムチャートにおいて、フレーム消失が発生した場合の動作を説明するためのタイムチャートである。
【図７】図１に示した回路構成の構成要素であるピッチ・バッファ制御部の動作を説明するためのタイムチャートである。
【図８】図１に示した回路構成の構成要素である相関バッファに含まれた経過バッファの動作を説明するためのタイムチャートである。
【図９】図１に示した回路構成の構成要素である相関バッファに含まれた自己相関バッファおよび相互相関バッファの動作を説明するためのタイムチャートである。
【図１０】従来例における補間処理動作を説明するためのタイムチャートである。
【符号の説明】
１入力データ
２出力データ
３遅延データ
５相関計算部
６相関バッファ
７ピッチ検出部
８補間処理部
９フレーム消失検出部
１０出力バッファ
１１ピッチ・バッファ制御部
１２相関バッファ制御部
１５フレーム消失信号
２１ピッチ・バッファ出力
２２相関入出力
２３自己相関バッファ出力
２４相互相関バッファ出力
２５ピッチデータ
２６補間データ
３０スイッチ
３１スイッチ信号
３２ピッチ入力データ
ａ，ｂ端子
ＤＬ遅延器
Ｆフレーム
ｉ，ｋサンプル番号
Ｍ相互相関計算
ＭＣＢ１，２相互相関バッファ
ｎサンプル番号
ＮＦＰＢＮｏ次フレーム用ピッチ・バッファ番号
ＰＡＢ１〜５経過バッファ
ＰＢ１〜５ピッチ・バッファ
ＰＢＣ１〜５ピッチ・バッファ・カウンタ
ＰＦＰＢＮｏ現フレーム用ピッチ・バッファ番号
Ｓ自己相関計算
ｓａサンプル
ＳＣＢ１，２自己相関バッファ
ＳＤ自己相関差分計算
ＳＷ切替スイッチ
ｔ１〜２４，３１〜３５時点
ｔｓ記憶開始時点

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a pitch detection method and apparatus for packet-based voice communication. More specifically, it relates to a voice pitch detection method and apparatus for compensating for packet loss when a frame is lost. It provides an improved method and apparatus that reduce the load on the CPU by averaging the arithmetic processing of the "packet loss compensation method" proposed in Appendix I of Recommendation G.711 of the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) (hereinafter simply "the ITU-T recommendation").
[Prior art]
[0002]
In the "packet loss compensation method" of the ITU-T recommendation, when a frame is lost in voice communication, voice pitch detection is performed during the first frame of the frame-loss section. In this pitch detection, a normalized cross-correlation is calculated between the 20 ms (160 samples) of speech immediately before the lost frame and earlier speech. This calculation accounts for most of the computation performed by the CPU.
[0003]
According to the ITU-T recommendation, at the beginning of a frame-loss section, the audio data immediately preceding the lost frame is copied (stored) into a pitch buffer 48.75 ms (390 samples) long. The stored audio data is used to calculate the pitch of the current speech (the period of the fundamental) and to extract and reproduce the speech waveform presumed to have existed during the lost frame.
[0004]
The pitch of the audio data ranges from 5 ms (40 samples) to 15 ms (120 samples). It is estimated by finding the peak of the normalized cross-correlation between the most recent 20 ms (160 samples) of speech stored in the pitch buffer and earlier speech.
[0005]
The normalized cross-correlation pitch (i) can be obtained by dividing the cross-correlation M (i) by the square root of the autocorrelation S (i). Let x (n) be the audio data input immediately before the lost section, which is the period of the lost frame, and let i, k, and n be the sample numbers.
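$$M(i) = \sum_{k} x(n-k)\,x(n-i-k) \tag{1}$$
$$S(i) = \sum_{k} x(n-i-k)^2 \tag{2}$$
Here, Σ denotes the cumulative sum over k starting from k = 0 (k = 0 to 159 for the 160-sample window used in the pitch detection below). x(n−k) is the audio data k samples before x(n), and x(n−i−k) is the audio data i samples before x(n−k).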
[0006]
Using equations (1) and (2), the normalized cross-correlation pitch(i) is
$$\mathrm{pitch}(i) = \frac{M(i)}{\sqrt{S(i)}} \tag{3}$$
Pitch detection finds the value of i at which pitch(i) is maximal, over the range i = 40 to 120, using the 160 samples k = 0 to 159.
[0007]
In equation (3), obtaining one normalized cross-correlation pitch(i) requires 160 multiply-accumulate operations for equation (1) (k = 0 to 159), 160 multiply-accumulate operations for equation (2) (k = 0 to 159), one square root, and one division for equation (3). Assuming that each multiply-accumulate in equations (1) and (2) takes 1 cycle, the square root of equation (2) takes 10 cycles, and the division of equation (3) takes 10 cycles, one evaluation of pitch(i) requires 160 × 2 + 10 + 10 = 340 cycles. Because this is repeated 81 times for i = 40 to 120, 27,540 cycles are required in total.
[0008]
In other words, obtaining the normalized cross-correlation pitch(i) of equation (3) requires 27,540 cycles of computation. Only after this much computation has been executed can the value of i at which pitch(i) is maximal be detected.
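The full search of equations (1) to (3) can be sketched as follows. This is a minimal illustration, not the recommendation's reference code: the function names, the zero-energy guard, and the synthetic 70-sample-period sine are assumptions made for the example. The buffer holds 390 samples and the newest sample plays the role of x(n).

```python
import math

WINDOW = 160                 # k = 0..159, the most recent 20 ms
MIN_LAG, MAX_LAG = 40, 120   # i = 40..120

def normalized_xcorr(x, n, lag):
    """pitch(i) = M(i) / sqrt(S(i)) per equations (1)-(3), with i = lag."""
    m = sum(x[n - k] * x[n - lag - k] for k in range(WINDOW))  # eq. (1)
    s = sum(x[n - lag - k] ** 2 for k in range(WINDOW))        # eq. (2)
    # The s > 0 guard is an assumption added for silent input.
    return m / math.sqrt(s) if s > 0 else 0.0                  # eq. (3)

def full_pitch_search(x):
    """Return the lag in 40..120 that maximizes pitch(i)."""
    n = len(x) - 1  # index of the newest sample
    return max(range(MIN_LAG, MAX_LAG + 1),
               key=lambda lag: normalized_xcorr(x, n, lag))

# Cycle budget stated in the text: 160 multiply-accumulates each for
# (1) and (2), 10 cycles for the square root, 10 for the division,
# repeated for all 81 lags.
CYCLES_PER_LAG = 2 * WINDOW + 10 + 10
FULL_SEARCH_CYCLES = (MAX_LAG - MIN_LAG + 1) * CYCLES_PER_LAG

# Hypothetical test signal: a sine with a 70-sample period.
demo = [math.sin(2 * math.pi * j / 70) for j in range(390)]
detected_lag = full_pitch_search(demo)
```

With a 390-sample buffer the oldest sample touched is index 110 (lag 120, k = 159), which is why the pitch buffer has to be this long.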
[0009]
To avoid executing such a large amount of computation, the ITU-T recommendation proposes reducing the computation by the following method.
[0010]
The pitch estimate is calculated in two stages. In the first stage, a coarse search is performed on the audio signal decimated 2:1 to detect a peak value. In the second stage, a detailed search is performed in the vicinity of the peak found by the coarse search.
[0011]
In the coarse search of the first stage, the autocorrelation of equation (2) is calculated for i = 120 as
$$S(i) = \sum_{k=0}^{79} x(n-i-2k)^2 \tag{4}$$
that is,
$$S(120) = \sum_{k=0}^{79} x(n-120-2k)^2$$
Because of the 2:1 decimation in the first stage, Σ in equation (4) is the cumulative sum from k = 0 to 79, and 2k is used in place of the k of equation (2).
[0012]
The difference calculation is expressed as
$$S(i+1) - S(i) = x(n-120-i)^2 - x(n-i)^2 \tag{5}$$
and therefore
$$S(i) = S(i+1) - x(n-120-i)^2 + x(n-i)^2 \tag{6}$$
In equation (6), of i = 119 to 40, only even values of i are calculated because of the 2:1 decimation, that is, 40 calculations.
[0013]
As with the autocorrelation, the cross-correlation of equation (1) is calculated, with i = 120, as
$$M(i) = \sum_{k=0}^{79} x(n-2k)\,x(n-i-2k) \tag{7}$$
Because of the 2:1 decimation in the first stage, Σ in equation (7) is the cumulative sum from k = 0 to 79, and 2k is used in place of the k of equation (1). Equation (7) is calculated only for even i among i = 119 to 40, that is, 41 times.
[0014]
From the cross-correlation M(i) of equation (7) and the autocorrelation S(i) of equation (6), the coarse normalized cross-correlation pitch(i) (for even i among i = 119 to 40) is obtained:
$$\mathrm{pitch}(i) = \frac{M(i)}{\sqrt{S(i)}} \tag{8}$$
[0015]
Therefore, the coarse search for the normalized cross-correlation pitch(i) of equation (8) requires 80 cycles for the autocorrelation calculation, 2 × 40 = 80 cycles for the difference calculation, 80 × 41 cycles for the cross-correlation calculation, 10 × 41 cycles for the square roots, and 10 × 41 cycles for the divisions, for a first-stage total of 4,260 cycles.
[0016]
In the second stage, a detailed search of the normalized cross-correlation pitch(i) is performed without decimating the samples, at three values: the peak i found by the coarse search of equation (8) (obtained with the 4,260 first-stage cycles) and its immediate neighbors (i−1, i, i+1).
[0017]
That is, in equation (3), with k = 0 to 159,
$$\mathrm{pitch}(i+1) = \frac{M(i+1)}{\sqrt{S(i+1)}} \tag{9}$$
$$\mathrm{pitch}(i) = \frac{M(i)}{\sqrt{S(i)}} \tag{10}$$
$$\mathrm{pitch}(i-1) = \frac{M(i-1)}{\sqrt{S(i-1)}} \tag{11}$$
[0018]
The second stage requires 160 cycles for the autocorrelation calculation, 2 × 2 = 4 cycles for the difference calculation, 160 × 3 cycles for the cross-correlation calculation, 10 × 3 cycles for the square roots, and 10 × 3 cycles for the divisions, for a second-stage total of 704 cycles. The first and second stages together total 4,964 cycles.
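The two-stage search described above can be sketched as follows, together with the cycle totals quoted in the text. This is an illustration under stated assumptions: the function names are hypothetical, and for brevity the sketch recomputes S(i) directly for every lag instead of using the difference equations (5) and (6), so it reproduces the search result but not the per-operation cost.

```python
import math

def xcorr(x, n, lag, step):
    """M(i)/sqrt(S(i)) over a 160-sample span, using every `step`-th sample."""
    ks = range(0, 160, step)          # step=2 gives 2k for k = 0..79
    m = sum(x[n - k] * x[n - lag - k] for k in ks)
    s = sum(x[n - lag - k] ** 2 for k in ks)
    return m / math.sqrt(s) if s > 0 else 0.0   # guard is an assumption

def two_stage_pitch_search(x):
    n = len(x) - 1
    # Stage 1: even lags only, samples decimated 2:1 (equations (4)-(8)).
    coarse = max(range(40, 121, 2), key=lambda lag: xcorr(x, n, lag, 2))
    # Stage 2: full resolution at the coarse peak and its two neighbours
    # (equations (9)-(11)), clamped to the valid lag range.
    fine = [lag for lag in (coarse - 1, coarse, coarse + 1) if 40 <= lag <= 120]
    return max(fine, key=lambda lag: xcorr(x, n, lag, 1))

# Cycle totals as itemized in the text.
STAGE1_CYCLES = 80 + 2 * 40 + 80 * 41 + 10 * 41 + 10 * 41
STAGE2_CYCLES = 160 + 2 * 2 + 160 * 3 + 10 * 3 + 10 * 3
TOTAL_CYCLES = STAGE1_CYCLES + STAGE2_CYCLES

# Hypothetical test signal: a sine with a 70-sample period.
demo = [math.sin(2 * math.pi * j / 70) for j in range(390)]
detected_lag = two_stage_pitch_search(demo)
```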
[0019]
Since the first and second stages total 4,964 cycles, the computation is reduced to about 20% of the 27,540 cycles required when the ITU-T recommendation is executed directly. Even so, the reduced-computation proposal of the ITU-T recommendation still incurs 4,964 cycles. To process this within one 125 μs period (8 kHz), the CPU throughput (IPS: instructions per second) must be 4964 / 0.000125 = 39.712 MIPS (M: mega), which is still a heavy load for the CPU.
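The throughput figure can be checked with a few lines of arithmetic. The sketch assumes, as the text implicitly does, that one cycle corresponds to one instruction; the variable names are illustrative only.

```python
SAMPLE_RATE_HZ = 8000          # one sample period is 125 microseconds
TWO_STAGE_CYCLES = 4964        # reduced-computation proposal
FULL_SEARCH_CYCLES = 27540     # direct execution of equation (3)

# All cycles must fit into a single sample period, so the required rate
# is cycles per period times periods per second.
required_ips = TWO_STAGE_CYCLES * SAMPLE_RATE_HZ
required_mips = required_ips / 1_000_000

# Fraction of the full-search cost that remains after the reduction.
remaining_fraction = TWO_STAGE_CYCLES / FULL_SEARCH_CYCLES
```

The remaining fraction comes out slightly above 0.18, which the text rounds to "about 20%".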
[0020]
FIG. 10 is a time chart showing the operation of the interpolation processing in the conventional device. In the audio input data 1 shown in (a) of the figure, one frame (t8 to t10) contains 80 samples (sa) of data, numbered 0 to 79. Here, 1 sa is 125 μs. Assume that the frame at times t10 to t12 is lost. Part (c) of the figure shows delay data 3, obtained by delaying the input data 1 of (a) by 30 sa. The label (b) is left unused for convenience of explanation.
[0021]
The frame erasure signal 15 in (d) remains "L" until time t10, becomes "H" when the loss of a frame is detected at time t10, and returns to "L" when a frame is detected at time t12. In the ITU-T recommendation, to connect the interpolation data smoothly to the input data 1 immediately before the loss, superimposed addition (overlap-add) is performed between times t32 and t34 on the delay data 3 (c) for the 30 samples (sa) immediately before the loss and data obtained by extending the interpolation data forward (in the direction from t32 to t34). Likewise, to connect the interpolation data smoothly to the input data immediately after the loss, superimposed addition is performed between times t35 and t36 on the delay data 3 (c) for the 30 sa immediately after the lost frame and data obtained by extending the interpolation data backward.
[0022]
In the superimposed addition over the delay data 3 (c) at times t10 to t33, when data number 50 at t10 is added to the interpolation data, the proportion of delay data is large and the proportion of interpolation data is small; as t33 approaches, the proportion of delay data is reduced and the proportion of interpolation data is increased. Accordingly, in the output data 2 (e), which is delayed by 1 sa, the proportion of delay data gradually decreases and the proportion of interpolation data gradually increases between times t32 and t34, and from t34 until t35 the output consists of interpolation data only.
[0023]
Further, in the delay data 3 (c) at times t37 to t38, when data number 0 at t37 is added to the interpolation data, the proportion of delay data is small and the proportion of interpolation data is large; as t38 approaches, the proportion of interpolation data is reduced and the proportion of delay data is increased. In the output data 2 (e), which is delayed by 1 sa, the proportion of interpolation data is gradually increased between times t35 and t36, and after t36 the output consists of delay data only.
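The superimposed addition at each end of the lost frame amounts to a 30-sample crossfade. The sketch below assumes a linear ramp for the weights; the text only states that the proportions change gradually, so the ramp shape and the function name are assumptions made for illustration.

```python
def crossfade(outgoing, incoming):
    """Blend two equal-length segments, fading `outgoing` out and `incoming` in."""
    length = len(outgoing)
    if len(incoming) != length:
        raise ValueError("segments must have the same length")
    mixed = []
    for j in range(length):
        w_in = (j + 1) / (length + 1)   # assumed linear ramp, rising toward 1
        w_out = 1.0 - w_in              # the two weights always sum to 1
        mixed.append(w_out * outgoing[j] + w_in * incoming[j])
    return mixed

OVERLAP = 30  # 30 samples = 3.75 ms at 8 kHz

# Entering the loss: delay data fades out while interpolation data fades in.
# Constant segments make the weight applied to the outgoing data visible.
entering_loss = crossfade([1.0] * OVERLAP, [0.0] * OVERLAP)
```

At the start of the loss the outgoing segment is the delayed real speech and the incoming segment is the interpolation data; at the end of the loss the roles are swapped.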
[0024]
The output data 2 in (e) is output 1 sa later than the delay data 3 in (c). When the loss of a frame is detected at time t10, the first-stage and second-stage computations described above are performed within the following 125 μs (1 sa), and the interpolation data is output. From time t32, which is 125 μs after time t10, to t34 (30 samples: 3.75 ms), the output data 2 is interpolation data with superimposed addition; from t34 to t35 it is interpolation data without superimposed addition; and from t35 to t36 it is again interpolation data with superimposed addition.
[0025]
If the first-stage and second-stage computations are not completed within the 125 μs from time t10, when the frame erasure signal (d) is raised, to time t32, the audio data will be interrupted by the lost frame. Because such a situation must be avoided at all costs, the CPU, which has to complete the computation within these 125 μs, bears a heavy load.
[0026]
[Problems to be solved by the invention]
Even if the reduced-computation proposal of the ITU-T recommendation is adopted to avoid the 27,540 cycles required when the recommendation is executed directly, the load on the CPU remains heavy. Nor is lengthening the CPU's processing time permissible, because it would delay the interpolation of the lost frame and degrade the audio signal. This was a problem that had to be solved.
[0027]
Although it is technically possible to avoid delaying the interpolation processing by raising the operating speed of the CPU and its peripheral circuit elements, doing so entails a significant increase in cost, and so there was the serious problem that it could not be put into practice.
[0028]
[Means for Solving the Problems]
According to the present invention, in voice communication using packets, arithmetic processing is averaged in order to reduce a large load on the CPU that occurs when compensating for packet loss when a frame is lost.
[0029]
In preparation for the loss of a frame containing audio data, the normalized cross-correlation calculation is always performed on a series of normal frames, regardless of whether a frame has been lost, to detect the pitch of the audio. When the next frame input immediately after the series of normal frames is a lost frame, interpolation data obtained from the detected pitch is interpolated into the lost frame. In this interpolation, the data for the 30 samples just before the frame loss is superimposed and added.
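The superimposed addition over the 30 samples mentioned above can be pictured as a short cross-fade between the real signal and the interpolation data. The following is a minimal sketch only: the linear ramp and the function name `overlap_add` are assumptions made for illustration, since the text does not specify the window shape.

```python
def overlap_add(real_tail, synth_head):
    """Cross-fade from real_tail to synth_head.

    Assumption: a linear ramp is used as the weighting; the text only
    states that 30 samples are superimposed and added.
    """
    if len(real_tail) != len(synth_head):
        raise ValueError("segments must have equal length")
    n = len(real_tail)
    out = []
    for k in range(n):
        w = (k + 1) / (n + 1)  # weight of the interpolation data, rising toward 1
        out.append((1.0 - w) * real_tail[k] + w * synth_head[k])
    return out


OLA_SAMPLES = 30  # from the text: 30 samples (3.75 ms)
blended = overlap_add([1.0] * OLA_SAMPLES, [0.0] * OLA_SAMPLES)
```

Because the two weights always sum to 1, two identical segments pass through unchanged, which is the property that keeps the transition free of level jumps.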
[0030]
When the frame period is 10 ms (80 samples), storing the 390 samples immediately before the frame loss corresponds to 390/80 = 4.9 frames. Therefore, five pitch buffers are prepared.
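The buffer count above is simply the ratio of buffer length to frame length, rounded up. A minimal sketch of that arithmetic follows; the function name `buffers_needed` is hypothetical.

```python
import math

PITCH_BUFFER_SAMPLES = 390  # samples stored before a possible loss (from the text)
FRAME_SAMPLES = 80          # samples per 10 ms frame (from the text)


def buffers_needed(buffer_samples=PITCH_BUFFER_SAMPLES, frame_samples=FRAME_SAMPLES):
    """Round the ratio 390/80 = 4.875 up to a whole number of buffers."""
    return math.ceil(buffer_samples / frame_samples)


n_buffers = buffers_needed()
```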
[0031]
The operation of calculating the autocorrelation and the cross-correlation, detecting the pitch of audio data, and creating interpolation data is always performed regardless of whether or not frame loss occurs. Since the calculation is always executed in a distributed manner, even if the calculation speed is slow, it can sufficiently cope with it.
[0032]
When a frame loss occurs, the updating of the audio data in the pitch buffer is stopped, and the lost frame is interpolated by interpolation data obtained from the pitch of the latest audio data already detected. When the abnormal situation of frame loss occurs, the pitch and interpolation data necessary to interpolate the lost frame have already been created, so the amount of processing at the time of frame loss is extremely small, and there is no risk that the interpolation is delayed and the audio signal is impaired.
[0033]
[Best Mode for Carrying Out the Invention]
FIGS. 1 and 2 are, respectively, a circuit configuration diagram showing an embodiment of the present invention and a time chart explaining the operation of that circuit configuration in comparison with the conventional example (FIG. 10). The same symbols are used for the same elements as in FIG. 10 showing the conventional example. FIG. 2 differs from FIG. 10 in that the functions of the switch 30 in (b) and the changeover switch SW in (f) are added.
[0034]
That is, the switch 30 in (b) is on during the period from the time point t31 to t10 (for 30 sa: 3.75 ms), and is off at other times. The switch SW of (f) is connected to the terminal a during the period from time t32 to time t36, and is connected to the terminal b at other times. The correlation calculation has been completed by time t10.
[0035]
Voice data is applied as input data 1 (a). The input data 1 is delayed by 30 sa (between t31 and t10) by the delay unit DL, applied to the terminal b of the changeover switch SW, delayed by 1 sa (one sample: 125 μs) in the output buffer 10, and output as output data 2, as in the conventional example. The input data 1 has a frame structure, and one frame contains 80 samples of data obtained by sampling the audio signal.
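The sample counts and durations quoted in this description are mutually consistent with one sample lasting 125 μs. The sketch below converts between the two; the 8 kHz sampling rate is inferred from the 125 μs sample period rather than stated explicitly, and the helper name `samples_to_ms` is hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # inferred from "125 μs for one sample"


def samples_to_ms(n_samples):
    """Duration in milliseconds of n_samples at the assumed sampling rate."""
    return n_samples * 1000 / SAMPLE_RATE_HZ


delay_unit_ms = samples_to_ms(30)     # delay unit DL: 30 sa
frame_ms = samples_to_ms(80)          # one frame: 80 sa
pitch_buffer_ms = samples_to_ms(390)  # one pitch buffer: 390 sa
normal_path_ms = samples_to_ms(30 + 1)  # DL plus the 1 sa output buffer
```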
[0036]
If a frame is lost for any reason, the voice of the lost frame cannot be reproduced. Therefore, the pitch of the voice is detected by correlating the voice data preceding the loss, and the lost data is estimated and interpolated. When a frame loss occurs in the input data 1, the frame loss detection unit 9 detects it, outputs the frame loss signal 15 (t10), and switches the changeover switch SW to the terminal a side after a delay of 1 sa (t32). The interpolation data 26 for interpolating the lost frame is then output as the output data 2 via the output buffer 10. The configurations of the frame loss detection unit 9, the delay unit DL, and the output buffer 10 are the same as in the conventional example and are well known.
[0037]
When the interpolation processing is completed and the frame loss detection unit 9 confirms the input of a normal frame at time t12, the frame loss signal 15 ends; after 60 sa (t12 to t38) have elapsed, the changeover switch SW is switched to the terminal b side at the next sample (t36). While the frame loss detection unit 9 confirms the input of normal frames, the changeover switch SW stays on the terminal b side, and the output data 2 is output and simultaneously applied to the pitch buffers PB1 to PB5, where it is temporarily stored. The lost-frame start time t33 in the delay data 3 (c) is delayed by 30 sa by the delay unit DL.
[0038]
The switch 30 (b) is turned on for the last 30 sa of each frame of the input data 1 (a) (t31 to t10: 3.75 ms: data numbers 50 to 79), so that the 30 sa immediately before the start of the next frame, which may be lost, are stored in the pitch buffer PB1. These 30 sa correspond to time points t10 to t33 (data numbers 50 to 79) in the delay data 3 (c), and are used for pitch detection and for the superposition addition at time points t32 to t34 (data numbers 50 to 79) in the output data 2 (e).
[0039]
The pitch buffer output 21 is sent to the interpolation processing unit 8 and the correlation calculation unit 5. The correlation buffer 6 includes five progress buffers PAB1 to PAB5, two autocorrelation buffers SCB1 and SCB2, and two cross correlation buffers MCB1 and MCB2. Between the correlation calculator 5 and the correlation buffer 6, data is being exchanged during the correlation calculation by the correlation input / output 22.
[0040]
The autocorrelation buffer output 23 of the autocorrelation buffers SCB1 and SCB2 and the cross-correlation buffer output 24 of the cross-correlation buffers MCB1 and MCB2 are sent to the pitch detection unit 7. Here the pitch of the voice is detected, the pitch data 25 is sent to the interpolation processing unit 8, and the created interpolation data 26 is applied to the terminal a of the changeover switch SW. When the frame loss detection unit 9 detects the loss of a frame, the changeover switch SW is switched to the terminal a by the frame loss signal 15, and the frame interpolated by the interpolation data 26 is output as the output data 2 via the output buffer 10.
[0041]
FIG. 3 shows a time chart for explaining the operation principle of the circuit configuration shown in FIG. 1. One pitch buffer, PB1, of the five pitch buffers PB1 to PB5 is described as a representative example.
[0042]
The frame (F) in FIG. 3A starts at time t0 and contains 80 samples (sa) of data, numbered 0 to 79, up to time t2. Subsequent frames follow in the same way at t4, t6, ..., t10. The pitch buffer PB1 in FIG. 3B starts temporary storage of data from the storage start time ts. The storage capacity of the pitch buffer PB1 is 390 sa (0 to 389). At time t10, the pitch buffer PB1 becomes full.
[0043]
280 sa of data are required for the voice pitch calculation. The correlation calculation unit 5 therefore operates as follows in FIG. 3. The autocorrelation calculation is performed between time points t3 and t7 (160 sa). The autocorrelation difference calculation is performed between time points t7 and t9 (80 sa). The cross-correlation calculation is performed between time points t6 and t10 (160 sa).
[0044]
In the figure, if the frame after time point t10 is lost, the pitch detection unit 7 and the interpolation processing unit 8 detect the pitch and obtain the interpolation data 26 during the 1 sa (125 μs) from time point t10. The pitch buffers PB1 to PB5, the correlation calculation unit 5 implemented by a CPU, and the correlation buffer 6 are under the control of the pitch buffer control unit 11 and the correlation buffer control unit 12. Assuming that the frame that should be present at time points t10 to t12 has been lost, the interpolation data 26 obtained during the 1 sa (125 μs) from time point t10 is output in order to interpolate the lost frame.
[0045]
The correlation calculation and pitch detection performed in FIG. 3 are now described in detail. In order to estimate the pitch of the voice and to extract and estimate the voice waveform of the lost frame section (t10 to t12), the pitch buffer PB1 shown in FIG. 3B stores the 390 sa (samples 0 to 389) from ts to t10 preceding the loss time t10. To distribute the arithmetic processing, the autocorrelation calculation (t3 to t7) and the cross-correlation calculation (t6 to t10) are performed sequentially as data is input and each calculation becomes possible.
[0046]
The autocorrelation S(0) is obtained over the 160 sa from time point t3, when the pitch buffer PB1 has stored 110 sa, to time point t7. Letting x(n) denote the input data of sample number n, for n = 110 to 269 (FIG. 3B shows sample numbers 0, 110, 230, 270, 349, and 389),
$S(0) = S(0) + x(n)^2$                    (12)
is calculated.
[0047]
When the sample number reaches n = 269, the autocorrelation S(0) has been obtained.
Next, the autocorrelation S(1) of the 160 sa beginning at n = 111 is obtained. Using S(0), when n = 270,
$S(1) = S(0) - x(n-160)^2 + x(n)^2$          (13)
is obtained.
[0048]
Thereafter, between t7 and t9, 80 (= 349 - 269) autocorrelations S(i) are obtained in the same way for n = 270 to 349, the last of which covers the 160 sa from n = 190 to n = 349.
With i = n - 269,
$S(i) = S(i-1) - x(n-160)^2 + x(n)^2$      (14)
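The recurrence in equations (12) to (14) can be checked numerically against a direct sum of squares over each 160-sample window. The following is a minimal sketch using the sample numbers from the text; the function name `sliding_energy` and the random test signal are assumptions for illustration.

```python
import random


def sliding_energy(x, first=110, window=160, steps=80):
    """Return S(0)..S(steps) using the update of equations (12)-(14)."""
    s = [0.0] * (steps + 1)
    for n in range(first, first + window):  # n = 110..269, equation (12)
        s[0] += x[n] ** 2
    for i in range(1, steps + 1):           # n = 270..349, equations (13), (14)
        n = first + window - 1 + i          # i = n - 269
        s[i] = s[i - 1] - x[n - window] ** 2 + x[n] ** 2
    return s


random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(390)]  # stand-in for pitch buffer data
S = sliding_energy(x)
# Direct evaluation: S(i) covers samples 110+i .. 269+i.
direct = [sum(v * v for v in x[110 + i:270 + i]) for i in range(81)]
```

Each update costs one subtraction and one addition of a squared sample, which is why the text counts two operations per input sample for this step.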
[0049]
The cross-correlation M(i) is obtained by multiplying each of the 160 sa immediately before the frame loss (t6 to t10), that is, the 160 sa beginning at n = 230, by the data i samples (sa) earlier and summing the products. For example, when i = 120, the data of n = 230 to 389 (t6 to t10) are multiplied by the data of n = 110 to 269 (t3 to t7), shown as n-i-k in the figure, and the products are summed.
[0050]
Likewise, when i = 40, the data of n = 230 to 389 (t6 to t10), indicated by the dash-dot frame at n-k in the figure, are multiplied by the data of n = 190 to 349 (t5 to t9), indicated by the dash-dot frame at n-i-k, and the products are summed.
[0051]
When n = 389, the cross-correlation M(i) for i = 40 to 120 can be expressed as
$M(i) = \sum_{k=0}^{159} x(n-k)\, x(n-i-k)$          (15)
where the sum is taken over k = 0 to 159.
[0052]
For n = 230 to 389, the cross-correlation M(i) for each i = 40 to 120 is updated by
$M(i) = M(i-1) + x(n) \cdot x(n-i)$          (16)
so that, when n = 389 (t10), all 81 cross-correlations M(i) have been obtained.
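The incremental accumulation of equation (16) and the closed form of equation (15) should agree once the last sample (n = 389) has arrived. The sketch below checks this under one assumption that should be stated plainly: equation (16) is read here as a running sum kept separately for each lag i and updated once per input sample. The function names and the random test signal are illustrative only.

```python
import random


def running_cross_correlation(x, start=230, end=389, lags=range(40, 121)):
    """Accumulate M(i) sample by sample (interpretation of equation (16))."""
    m = {i: 0.0 for i in lags}
    for n in range(start, end + 1):
        for i in lags:  # 81 multiply-accumulate operations per input sample
            m[i] += x[n] * x[n - i]
    return m


def direct_cross_correlation(x, i, n=389, window=160):
    """Closed form of equation (15): sum over k = 0..159."""
    return sum(x[n - k] * x[n - i - k] for k in range(window))


random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(390)]  # stand-in for pitch buffer data
M = running_cross_correlation(x)
```

Spreading the 81 multiply-accumulates over each arriving sample is what lets the result be ready at t10 without a burst of computation.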
[0053]
The number of product-sum operations per input sample under equations (12), (14), and (16) is as follows. In the autocorrelation calculation of equation (12), x(n)^2 is added to S(0), which is one operation (= 1 cycle). In the autocorrelation difference calculation of equation (14), x(n-160)^2 is subtracted from S(i-1) and x(n)^2 is added, which is two operations (= 2 cycles). In the cross-correlation calculation of equation (16), x(n)·x(n-i) is added to M(i-1) for each of the 81 values of i (= 40 to 120), which is 81 operations (= 81 cycles).
[0054]
During the cross-correlation calculation between time points t6 and t10 in FIG. 3, part of the autocorrelation calculation (t6 to t7) and the autocorrelation difference calculation (t7 to t9) are performed in parallel, so up to 83 product-sum operations (= 83 cycles) are performed per input sample. Furthermore, obtaining {S(i)}^{1/2} for equation (8) requires 10 cycles to take the square root of the result of equation (14), so the amount of calculation is 83 + 10 = 93 cycles.
[0055]
When a frame loss occurs at time t10, the normalized cross-correlation pitch(i) of equation (8) is calculated from the cross-correlation M(i) and the square root of the autocorrelation {S(i)}^{1/2}, both already obtained, and the lost frame is interpolated. To reduce the amount of computation, the calculation required at time t10 first performs a coarse search on a signal decimated two-to-one.
[0056]
Thereafter, a detailed search is performed in the vicinity of the peak obtained by the coarse search, so only 41 + 2 = 43 divisions are required. Each division takes 10 cycles, giving 43 × 10 = 430 cycles. This is less than 10% of the 4,964 cycles needed when the ITU-T recommendation's proposal for reducing the amount of computation is executed as it is, so the load on the CPU constituting the correlation calculation unit 5 and the like is extremely small.
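The coarse-then-fine search and its division count can be illustrated as follows. This is a sketch under stated assumptions, not the patented implementation: equation (8) is not reproduced in this section, so the normalization M(i) / {S}^{1/2} (with S the energy of the lagged window) is assumed; the correlations are recomputed inline only to keep the example self-contained, whereas in the scheme described they are already held in the correlation buffers; and the function name `detect_pitch` and the test signal are hypothetical.

```python
import math


def detect_pitch(x, n_last=389, window=160, min_lag=40, max_lag=120):
    """Coarse search on every second lag, then test the two neighbours of the peak."""
    first = n_last - window + 1  # 230
    divisions = 0

    def score(lag):
        nonlocal divisions
        m = sum(x[n] * x[n - lag] for n in range(first, n_last + 1))
        # Energy of the lagged window; this equals S(120 - lag) in the text's indexing.
        s = sum(x[n - lag] ** 2 for n in range(first, n_last + 1))
        divisions += 1  # one normalization per evaluated lag
        return m / math.sqrt(s) if s > 0.0 else 0.0

    coarse = {lag: score(lag) for lag in range(min_lag, max_lag + 1, 2)}  # 41 lags
    best = max(coarse, key=coarse.get)
    best_score = coarse[best]
    for lag in (best - 1, best + 1):  # fine search: at most 2 more lags
        if min_lag <= lag <= max_lag:
            candidate = score(lag)
            if candidate > best_score:
                best, best_score = lag, candidate
    return best, divisions


# Exactly periodic test signal with a period of 91 samples.
wave = [math.sin(2 * math.pi * k / 91) for k in range(91)]
x = [wave[n % 91] for n in range(390)]
lag, divisions = detect_pitch(x)
cycles = divisions * 10  # 10 cycles per division, as stated in the text
```

With an odd period the coarse pass lands on an adjacent even lag and the fine pass recovers the true period, using 41 + 2 = 43 normalizations in total.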
[0057]
In the above description, the pitch buffer PB1 has been described as a representative example. However, a lost frame does not necessarily occur exactly at time t10, when the cross-correlation calculation ends. For this reason, five pitch buffers PB are prepared, so that whenever a lost frame occurs, one of the pitch buffers PB is in the state shown in FIG. 3 and the loss can be handled immediately.
[0058]
FIG. 4 is a time chart for explaining, frame by frame, the operation of the pitch buffers, which are components of the circuit configuration shown in FIG. 1. Part (a) shows the frames from time t0 to time t18. Parts (b) to (f) show the operations of the pitch buffers PB1 to PB5, respectively. Here, S indicates the autocorrelation calculation, SD the autocorrelation difference calculation, and M the cross-correlation calculation.
[0059]
Regardless of the presence or absence of frame loss, the autocorrelation and cross-correlation calculations must always be performed for every frame, where one frame consists of 80 sa (samples) of input audio data. For pitch detection and interpolation, a pitch buffer PB storing 390 sa (samples) of data is required for each frame.
[0060]
The five pitch buffers PB1 to PB5 cover every frame by being staggered in time. One pitch buffer PB can respond to the loss of only one frame. For example, when the frame period is 10 ms (80 samples), storing the 390 samples immediately before the frame loss corresponds to 390/80 = 4.9 frames. Therefore, five pitch buffers PB1 to PB5 are prepared.
[0061]
The audio data input frame by frame is stored in the pitch buffers PB1 to PB5 in turn, starting with the pitch buffer PB1. That is, the five pitch buffers PB1 to PB5 store five frames of voice data, each shifted by one frame from the next. After five frame periods have elapsed, the pitch buffer holding the oldest audio data is updated; for example, the contents of the pitch buffer PB1 are updated starting 10 sa (samples) after time t10 so that it stores the latest audio data. Seen from any one pitch buffer PB, the stored audio data is updated every five frame periods.
[0062]
The autocorrelation calculation S, the autocorrelation difference calculation SD, and the cross-correlation calculation M are always performed on the samples (audio data) stored in the pitch buffers PB1 to PB5 to detect the pitch of the audio data and create interpolation data. For example, when a frame is lost at time t10, the interpolation data calculated from the data in the pitch buffer PB1 is used. Similarly, when a frame is lost at time t12, t14, t16, or t18, interpolation data based on the data of the pitch buffer PB2, PB3, PB4, or PB5, respectively, is used.
[0063]
A correlation buffer 6 and a correlation calculator 5 are provided for obtaining the autocorrelation and the cross-correlation, and are controlled by a correlation buffer controller 12. The calculations in the correlation calculation unit 5 having the CPU configuration are always executed in a distributed manner, so that even if the calculation speed is slow, it can sufficiently cope with them.
[0064]
When a frame loss occurs, the update of the audio data in the pitch buffer is stopped, and the lost frame is interpolated using the pitch of the most recently detected voice data. When the abnormal situation of frame loss occurs, the interpolation data necessary to interpolate the lost frame has already been created, so the amount of calculation at the time of frame loss is extremely small, and the interpolation of the lost frame is not delayed and the audio signal is not impaired.
[0065]
FIG. 5 shows a time chart for explaining how the data in the pitch buffers PB1 to PB5 are allocated to frames. Part (a) shows the frame numbers from time t0 to time t24. Parts (b) to (f) show the frame-by-frame operations of the pitch buffers PB1 to PB5, respectively. For example, the data at t0 to t10 of the pitch buffer PB1 in (b) is used when frame 6 (from t10) is lost. Similarly, the data at t2 to t12 of the pitch buffer PB2 is used when frame 7 (from t12) is lost. The same applies to the pitch buffers PB3 to PB5.
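In the loss-free case, the allocation described above is a simple round-robin: frame 6 is covered by PB1, frame 7 by PB2, and so on, wrapping after five frames. The following is a minimal sketch; the function names are hypothetical, and the frame numbering (frame 6 starting at t10, frame 7 at t12) is taken from the text.

```python
def buffer_for_frame(frame_no, n_buffers=5, ref_frame=6, ref_buffer=1):
    """Pitch buffer number (1..5) prepared for the loss of frame_no, with no losses."""
    return (frame_no - ref_frame + ref_buffer - 1) % n_buffers + 1


def frame_start_index(frame_no):
    """Index k of the time point t_k at which the frame starts (frame 6 -> t10)."""
    return 2 * (frame_no - 1)


table = {f: buffer_for_frame(f) for f in range(6, 12)}
```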
[0066]
FIG. 6 is a time chart for explaining how the data in the pitch buffers PB1 to PB5 are subsequently allocated to frames when frame 6 is lost at times t10 to t12. When frame 6 (from t10) is lost, the data at t0 to t10 in the pitch buffer PB1 in (b) is not updated at t10. The update is performed after the interpolation operation for the lost frame is completed at t12.
[0067]
Instead, the data in preparation for the loss of the frame 11 whose accumulation has to be started from the time t10 is accumulated in the pitch buffer PB2 of (c) after the time t10. Therefore, the data from t12 to t22 of the pitch buffer PB1 is used for the case where the frame 12 (t22 to) has disappeared.
[0068]
When frame 7 is also lost following the loss of frame 6, interpolation is performed with the data (t0 to t12) prepared in the pitch buffer PB1 in (b) for the loss of frame 6. The data accumulated in the pitch buffer PB2 in (c) until t10 in preparation for the loss of frame 7 therefore becomes unnecessary, and during t10 to t20 PB2 instead stores data in preparation for the loss of frame 11.
[0069]
FIG. 7 shows a time chart of the operation of the pitch buffer counters PBC1 to PBC5 composed of five registers included in the pitch buffer control unit 11. The five pitch buffer counters PBC1-5 control the allocation of frames to the five pitch buffers PB1-5. Each pitch buffer counter PBC is a counter of 390 (0 to 389).
[0070]
The pitch buffer counters PBC1 to PBC5 in (b), (c), (d), (e), and (f) control the allocation of frames to the pitch buffers PB1 to PB5, respectively. The pitch buffer number for the current frame, PFPBNo, in (g) indicates, for example, that the number of the pitch buffer PB assigned to the frame at time point t0 is 1 (PB1). The pitch buffer number for the next frame, NFPBNo, in (h) indicates, for example, that the number of the pitch buffer PB to be allocated next after the frame at time point t0 is 2 (PB2). The same applies thereafter.
[0071]
The procedure of the frame allocation in FIG. 7 is now described specifically. For example, when the pitch buffer counter PBC1 indicates 309 (= 389 - 80) at time t8, the number 1 of the pitch buffer PB1 is recorded as the pitch buffer number for the next frame, NFPBNo, in (h). When the pitch buffer counter PBC1 indicates 389 at time t10, the number 1 of the pitch buffer PB1 is recorded as the pitch buffer number for the current frame, PFPBNo, in (g).
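The recording rule above depends only on the counter values 389 and 309. A minimal sketch follows; the function name `frame_numbers` and the dictionary representation of the counters are assumptions for illustration.

```python
def frame_numbers(counters, last=389, frame=80):
    """Return (PFPBNo, NFPBNo) from the pitch buffer counter values.

    counters maps pitch buffer number -> current count (0..389).
    A buffer at 389 serves the current frame; one at 309 serves the next frame.
    """
    pf = nf = None
    for number, count in counters.items():
        if count == last:
            pf = number
        elif count == last - frame:
            nf = number
    return pf, nf


# Counter values at time t10, staggered by one frame (80 sa) per buffer.
at_t10 = {1: 389, 2: 309, 3: 229, 4: 149, 5: 69}
pfpb_no, nfpb_no = frame_numbers(at_t10)
```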
[0072]
If the current frame is a normal frame, data in the pitch buffer PB for the current frame becomes unnecessary, and is allocated to a new frame five frames later. If the current frame is a lost frame, the lost data is interpolated by interpolation data already prepared as data in the pitch buffer PB for the current frame, and is used while the lost frames are continuous. At that time, the pitch buffer for the next frame becomes unnecessary, and is allocated to a new frame.
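The reallocation rules above can be traced with a small simulation. This is an illustrative sketch only: the function `simulate_allocation`, its bookkeeping, and the handling of runs of consecutive losses are one interpretation of the text, checked here only against the single-loss case described for FIG. 6 (PB2 reassigned to frame 11, PB1 reassigned to frame 12).

```python
def simulate_allocation(frames, lost, n_buffers=5):
    """Track which frame each pitch buffer is preparing for.

    frames: consecutive frame numbers; lost: set of lost frame numbers.
    Returns (target, log), where target maps buffer number -> frame number.
    """
    frames = list(frames)
    first = frames[0]
    target = {b: first + b - 1 for b in range(1, n_buffers + 1)}
    held = None  # buffer frozen while it supplies interpolation data
    log = []

    def owner(frame):
        for b, f in target.items():
            if f == frame and b != held:
                return b
        return None

    for f in frames:
        if f in lost:
            if held is None:
                held = owner(f)       # stop updating; use its data for interpolation
            nxt = owner(f + 1)        # the next frame's buffer becomes unnecessary
            if nxt is not None:
                target[nxt] = f + n_buffers
            log.append((f, "lost", held))
        else:
            if held is not None:      # loss run ended: the frozen buffer restarts
                target[held] = f + n_buffers
                held = None
            b = owner(f)
            log.append((f, "ok", b))
            if b is not None:         # normal frame: buffer moves on five frames
                target[b] = f + n_buffers
    return target, log


single_loss, _ = simulate_allocation(range(6, 8), lost={6})
no_loss, _ = simulate_allocation(range(6, 11), lost=set())
```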
[0073]
The correlation calculation unit 5 performs a correlation calculation for each pitch buffer PB according to the count value of the pitch buffer counter PBC. For example, when the count value of the pitch buffer counter PBC1 is 110 to 269 (see FIG. 3), the autocorrelation calculation is performed, and when the count value is 269, the square root of the autocorrelation result is obtained.
[0074]
Further, when the count value is 270 to 349, autocorrelation difference calculation is performed, and the square root of the difference calculation result is obtained. When the count value is 230 to 389, a cross-correlation calculation is performed. The calculation result is stored in the cross-correlation buffers MCB1 and MCB2 included in the correlation buffer 6 and indicated by the correlation buffer control unit 12.
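The counter ranges above act as a schedule for one pitch buffer. The sketch below maps a counter value to the active calculations and sums the per-sample cycle costs quoted earlier in the description (1, 2, 81, and 10 cycles); the names `active_steps` and `cycles_at` are hypothetical.

```python
CYCLES = {"S": 1, "SD": 2, "M": 81, "sqrt": 10}  # per-sample costs from the description


def active_steps(count):
    """Calculations performed for one pitch buffer at a given counter value."""
    steps = []
    if 110 <= count <= 269:
        steps.append("S")     # autocorrelation
    if 270 <= count <= 349:
        steps.append("SD")    # autocorrelation difference
    if 230 <= count <= 389:
        steps.append("M")     # cross-correlation
    if count == 269 or 270 <= count <= 349:
        steps.append("sqrt")  # square root of the newly completed S(i)
    return steps


def cycles_at(count):
    return sum(CYCLES[step] for step in active_steps(count))


peak = max(cycles_at(c) for c in range(390))
```

The peak for a single buffer occurs while the difference calculation, the cross-correlation, and the square root overlap.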
[0075]
The correlation buffer 6 includes five progress buffers PAB1 to PAB5, which store the intermediate results of the autocorrelation calculation for each pitch buffer PB; autocorrelation buffers SCB1 and SCB2, which store two frames' worth of autocorrelation results at 81 per frame; and cross-correlation buffers MCB1 and MCB2, which store two frames' worth of cross-correlation results at 81 per frame.
[0076]
The correlation buffer control unit 12 controls the correlation buffer 6 that stores a correlation calculation result. The elapsed buffers PAB1 to PAB5 are assigned corresponding to the pitch buffers PB1 to PB5, respectively. The autocorrelation buffers SCB1 and SCB2 can store one frame at a time, and are assigned alternately for each frame. Similarly, since the cross-correlation buffers MCB1 and MCB2 can store one frame each, they are allocated alternately for each frame.
[0077]
FIG. 8 is a time chart for explaining the operation of the progress buffers PAB1 to PAB5 included in the correlation buffer 6. Part (a) shows the frame numbers at time points t0 to t24. Each of the progress buffers PAB1 to PAB5 in (b), (c), (d), (e), and (f) corresponds to one of the pitch buffers PB1 to PB5.
[0078]
For example, the progress buffer PAB2 in (c) is allocated to the pitch buffer PB1 during the autocorrelation calculation S and the autocorrelation difference calculation SD from t3 to t9 (t3 to t9 in FIG. 4), in preparation for the loss of the frame starting at t10. Similarly, the progress buffer PAB3 in (d) is allocated to the pitch buffer PB2 during the autocorrelation calculation S and the autocorrelation difference calculation SD from t5 to t11 (t5 to t11 in FIG. 4), in preparation for the loss of the frame starting at t12. The same applies thereafter.
[0079]
FIG. 9 is a time chart for explaining, frame by frame, the operation of the autocorrelation buffers SCB1 and SCB2 and the cross-correlation buffers MCB1 and MCB2 included in the correlation buffer 6. Part (a) shows the frame numbers at time points t0 to t24. Parts (b), (c), (d), and (e) show the operation of the autocorrelation buffer SCB1, the cross-correlation buffer MCB1, the autocorrelation buffer SCB2, and the cross-correlation buffer MCB2, respectively.
[0080]
The autocorrelation buffer SCB1 in (b) stores S(0) of equation (12), obtained between time points t1 and t5 (S), and S(i) for i = 1 to 80 of equation (14), obtained between time points t5 and t7 (SD); these need to be held only until time point t8, when the cross-correlation calculation M ends. Similarly, the autocorrelation buffer SCB1 in (b) stores S(0) of equation (12), obtained between time points t5 and t9 (S), and S(i) for i = 1 to 80 of equation (14), obtained between time points t9 and t11 (SD); these need to be held only until time point t12, when the cross-correlation calculation M ends.
[0081]
The cross-correlation buffer MCB1 in (c) stores M(i) for i = 40 to 120 of equation (16), obtained by the cross-correlation calculation M from time t4 to t8 (M), and is ready from time t8 for the loss of frame number 5. Similarly, the cross-correlation buffer MCB1 in (c) stores M(i) for i = 40 to 120 of equation (16), obtained by the cross-correlation calculation M from time t8 to t12 (M), and is ready from time t12 for the loss of frame number 7.
[0082]
The autocorrelation buffer SCB2 in (d) stores S(0) of equation (12), obtained between time points t3 and t7 (S), and S(i) for i = 1 to 80 of equation (14), obtained between time points t7 and t9 (SD); these need to be held only until time point t10, when the cross-correlation calculation M ends. Similarly, the autocorrelation buffer SCB2 in (d) stores S(0) of equation (12), obtained between time points t7 and t11 (S), and S(i) for i = 1 to 80 of equation (14), obtained between time points t11 and t13 (SD); these need to be held only until time point t14, when the cross-correlation calculation M ends.
[0083]
The cross-correlation buffer MCB2 in (e) stores M(i) for i = 40 to 120 of equation (16), obtained by the cross-correlation calculation M from time t6 to t10 (M), and is ready from time t10 for the loss of frame number 6. Similarly, the cross-correlation buffer MCB2 in (e) stores M(i) for i = 40 to 120 of equation (16), obtained by the cross-correlation calculation M from time t10 to t14 (M), and is ready from time t14 for the loss of frame number 8.
[0084]
Thus, the calculation results are always stored, without overlap, in the pair of the autocorrelation buffer SCB1 and the cross-correlation buffer MCB1, each with a storage capacity of 81 values of S(i) or M(i), and in the pair of the autocorrelation buffer SCB2 and the cross-correlation buffer MCB2, each likewise with a storage capacity of 81 values, in preparation for a frame loss.
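The alternation between the two buffer pairs follows the frame number: in the examples above, frames 5 and 7 are served by SCB1/MCB1 and frames 6 and 8 by SCB2/MCB2. A minimal sketch of that rule, with a hypothetical function name:

```python
def correlation_pair_for_frame(frame_no):
    """Return the (autocorrelation, cross-correlation) buffer pair for a frame.

    Odd frame numbers use pair 1 and even frame numbers pair 2, matching the
    examples in the text (frames 5 and 7 -> pair 1; frames 6 and 8 -> pair 2).
    """
    pair = 1 if frame_no % 2 == 1 else 2
    return (f"SCB{pair}", f"MCB{pair}")


pairs = {f: correlation_pair_for_frame(f) for f in (5, 6, 7, 8)}
```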
[0085]
In FIG. 9A, it is assumed that frame 6 at time t10 has been lost. The frame loss detection unit 9 then detects the frame loss and outputs the frame loss signal 15. On receiving this, the pitch detection unit 7 reads out the correlation calculation results (S(i), M(i)) stored as of t10 from the immediately preceding data, that is, from the pair of the autocorrelation buffer SCB2 and the cross-correlation buffer MCB2 in (d) and (e), calculates the normalized cross-correlation pitch(i) of equation (8), detects the i that gives the peak value, and extracts the pitch period of the voice.
[0086]
The interpolation processing unit 8 reads out, via the pitch buffer output 21, the data of the pitch buffer PB1 (FIG. 6B) prepared for the loss of frame number 6, creates the interpolation data 26 using the voice pitch (period), and outputs it. Pitch detection and interpolation of lost frames follow the ITU-T recommendation.
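As a rough picture of how interpolation data can be formed from the pitch buffer contents and the detected pitch period, the sketch below repeats the most recent pitch period to fill one 80-sample frame. This is a simplified illustration, not the full procedure of the ITU-T recommendation (which also includes overlap-add and other steps); the function name `make_interpolation_data` is hypothetical.

```python
def make_interpolation_data(history, pitch, frame_samples=80):
    """Fill one frame by repeating the last `pitch` samples of `history`.

    Simplifying assumption: plain periodic repetition with no smoothing.
    """
    if not 0 < pitch <= len(history):
        raise ValueError("pitch must be between 1 and len(history)")
    last_period = history[-pitch:]
    return [last_period[k % pitch] for k in range(frame_samples)]


# A test signal with period 91 is continued seamlessly into the lost frame.
history = [n % 91 for n in range(390)]
frame = make_interpolation_data(history, pitch=91)
```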
[0087]
The amount of computation (number of cycles) is explained with reference to the figure. For example, for the 1 sa (sample) of data x(n) input at time point t7, the calculation of S(0) by equation (12) is performed twice (= 2 × 1 = 2 cycles), the calculation of S(i) by equation (14) once (= 2 cycles), the calculation of M(i) by equation (16), using the 81 values x(n-i) with i = 40 to 120, twice (= 2 × 81 = 162 cycles), and the square root calculation once (= 10 cycles), for a total of 2 + 2 + 162 + 10 = 176 cycles.
[0088]
In the above description, the frame period is 10 ms (80 sa) by way of example. When the frame period is an integral multiple of 10 ms, a lost frame can be processed in the same way by regarding it as a succession of lost 10 ms frames. When the frame period is 10 ms or less, the same processing can be performed by treating several frames together as one 10 ms frame. In that case, if any one of the combined frames is lost, the whole combined frame is regarded as a lost frame and processed in the same way.
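The two cases above amount to re-expressing a sequence of per-frame loss flags on a 10 ms grid. The following is a minimal sketch; the function name `to_10ms_loss_flags` is hypothetical, and frame periods that neither divide nor are divided by 10 ms are rejected because the text does not cover them.

```python
def to_10ms_loss_flags(loss_flags, frame_ms):
    """Convert per-frame loss flags to per-10-ms-frame loss flags."""
    if frame_ms >= 10:
        if frame_ms % 10:
            raise ValueError("frame period must be a multiple of 10 ms")
        repeat = frame_ms // 10
        # A lost long frame counts as `repeat` consecutive lost 10 ms frames.
        return [flag for flag in loss_flags for _ in range(repeat)]
    if 10 % frame_ms:
        raise ValueError("frame period must divide 10 ms")
    group = 10 // frame_ms
    # A group of short frames is lost if any one of its members is lost.
    return [any(loss_flags[i:i + group]) for i in range(0, len(loss_flags), group)]


long_frames = to_10ms_loss_flags([False, True], frame_ms=20)
short_frames = to_10ms_loss_flags([False, True, False, False], frame_ms=5)
```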
[0089]
In the present invention described above, since the normalized cross-correlation calculation is always performed, the operation amount is 176 cycles for one input data. In addition, the first normalized cross-correlation calculation performed when an erased frame occurs is 420 cycles of 42 divisions (one division is 10 cycles). That is, at the time of occurrence of a lost frame, the calculation amount may be 176 + 420 = 596 cycles at the maximum. In this way, the arithmetic processing is averaged, and the CPU load is reduced.
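The cycle figures quoted above can be reproduced with simple arithmetic. The sketch below uses only numbers stated in the description (per-operation costs, the 42 divisions of this paragraph, and the 4,964-cycle conventional figure); the variable names are illustrative.

```python
# Steady-state work per input sample, summed over the staggered pitch buffers.
per_sample = {
    "S (eq. 12)":  2 * 1,   # two buffers in the autocorrelation phase
    "SD (eq. 14)": 1 * 2,   # one buffer in the difference phase
    "M (eq. 16)":  2 * 81,  # two buffers in the cross-correlation phase
    "square root": 1 * 10,
}
steady_cycles = sum(per_sample.values())

DIVISION_CYCLES = 10
loss_extra_cycles = 42 * DIVISION_CYCLES  # normalization at the first lost sample
worst_case_cycles = steady_cycles + loss_extra_cycles

CONVENTIONAL_CYCLES = 4964  # reduced-computation ITU-T proposal, per the text
ratio = worst_case_cycles / CONVENTIONAL_CYCLES
```

On these figures, the worst-case load in one sample period is roughly one eighth of the conventional peak.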
[0090]
[Effects of the Invention]
When a lost frame is processed according to the ITU-T recommendation, pitch detection must be performed during the first sample (125 μs) of the lost section, which requires 4,964 cycles of normalized cross-correlation calculation. Because this computation is concentrated in one sample period (125 μs), the load on the CPU constituting the correlation calculation unit and the like is extremely heavy.
[0091]
As is apparent from the above description, in the present invention the normalized cross-correlation calculation is always executed in a distributed manner, regardless of whether a frame loss occurs, in preparation for the occurrence of a frame loss, so the load on the CPU is very light. In the conventional example, the amount of computation concentrated in one sample (125 μs) immediately after the occurrence of a frame loss was 4,964 cycles. In the present invention, by contrast, the maximum is 596 cycles, as described above, so the processing can be performed adequately even with a slow, low-cost CPU, and the effect of the present invention is extremely large.
[Brief description of the drawings]
FIG. 1 is a circuit configuration diagram showing an embodiment of the present invention.
FIG. 2 is a time chart for explaining the operation of the circuit configuration shown in FIG. 1 in comparison with a conventional example.
FIG. 3 is a time chart for explaining the operation principle of the circuit configuration shown in FIG. 1;
FIG. 4 is a time chart for explaining an operation of a pitch buffer which is a component of the circuit configuration shown in FIG. 1;
FIG. 5 is a time chart for explaining a more detailed operation of a pitch buffer which is a component of the circuit configuration shown in FIG. 1;
FIG. 6 is a time chart for explaining an operation when a frame loss occurs in the time chart shown in FIG. 4;
FIG. 7 is a time chart for explaining an operation of a pitch buffer control unit which is a component of the circuit configuration shown in FIG. 1;
FIG. 8 is a time chart for explaining an operation of a progress buffer included in a correlation buffer which is a component of the circuit configuration shown in FIG. 1;
FIG. 9 is a time chart for explaining operations of an autocorrelation buffer and a cross-correlation buffer included in a correlation buffer, which are components of the circuit configuration shown in FIG. 1.
FIG. 10 is a time chart for explaining an interpolation processing operation in a conventional example.
[Explanation of symbols]
1 Input data
2 Output data
3 Delayed data
5 Correlation calculator
6 Correlation buffer
7 Pitch detection unit
8 Interpolation processing unit
9 Frame loss detector
10 Output buffer
11 Pitch buffer controller
12 Correlation buffer control unit
15 Frame loss signal
21 Pitch buffer output
22 Correlation input / output
23 Autocorrelation buffer output
24 Cross-correlation buffer output
25 Pitch data
26 Interpolation data
30 Switch
31 Switch signal
32 Pitch input data
a, b Terminals
DL Delay unit
F Frame
i, k Sample numbers
M Cross-correlation calculation
MCB1,2 Cross-correlation buffer
n Sample number
NFPBNo Pitch buffer number for next frame
PAB1-5 Progress buffer
PB1-5 Pitch buffer
PBC1-5 Pitch buffer counter
PFPBNo Pitch buffer number for current frame
S Autocorrelation calculation
sa Sample
SCB1,2 Autocorrelation buffer
SD Autocorrelation difference calculation
SW Changeover switch
t1-24, 31-35 Time points
ts Storage start time

Claims

In preparation for the occurrence of a lost frame in which a frame including voice data is lost, a normalized cross-correlation calculation is always performed on a series of normal frames to detect the pitch of the voice and obtain the detected voice pitch, and, when the next frame input immediately after the series of normal frames is the lost frame, interpolation data obtained based on the detected voice pitch is interpolated into the lost frame,
a pitch detection method in packet loss compensation.

The pitch detection method in packet loss compensation according to claim 1, wherein, when the voice pitch is detected by the normalized cross-correlation calculation, each frame contains 80 samples of voice data, and the normalized cross-correlation is calculated between the voice data of the two frames immediately preceding the lost frame and the voice data of the frames preceding them.
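
The calculation recited in this claim can be illustrated with a minimal sketch. It is not the claimed implementation: it runs the whole search in one call, whereas the invention spreads the work over the normal frames. The lag range of 40 to 120 samples and the name `detect_pitch` are assumptions made for illustration; only the 80-sample frame and the two-frame (160-sample) window come from the claims.

```python
import math

FRAME = 80            # samples per frame (from the claims)
WINDOW = 2 * FRAME    # the two most recent frames, 160 samples
MIN_LAG = 40          # assumed lower bound of the pitch search, in samples
MAX_LAG = 120         # assumed upper bound of the pitch search, in samples


def detect_pitch(history):
    """Return the lag that maximizes the normalized cross-correlation
    between the newest WINDOW samples and the same-length window
    located `lag` samples earlier in `history`."""
    if len(history) < WINDOW + MAX_LAG:
        raise ValueError("history is too short for the assumed lag range")
    recent = history[-WINDOW:]
    best_lag, best_score = MIN_LAG, -math.inf
    for lag in range(MIN_LAG, MAX_LAG + 1):
        start = len(history) - WINDOW - lag
        past = history[start:start + WINDOW]
        cross = sum(a * b for a, b in zip(recent, past))   # cross-correlation
        energy = sum(b * b for b in past)                  # autocorrelation of the past window
        score = cross / math.sqrt(energy) if energy > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Usage: a 390-sample history holding a sine wave with a 70-sample period.
history = [math.sin(2.0 * math.pi * n / 70.0) for n in range(390)]
pitch = detect_pitch(history)
```

Dividing by the square root of the past window's energy keeps a loud but poorly matching segment from winning the search over a quieter segment that matches well.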

A pitch detection device in packet loss compensation, comprising:
normalized cross-correlation calculation means (5, 6, 9, 11, 12, 30, PB1-5) for performing, at all times, a normalized cross-correlation calculation on a series of normal frames in preparation for the occurrence of a lost frame, that is, a frame whose voice data has been lost;
pitch detection means (7) for detecting the voice pitch from the result of the normalized cross-correlation calculation; and
interpolation processing means (8, DL, SW) for interpolating interpolation data obtained from the detected voice pitch into the lost frame, and outputting the result, when the frame input immediately after the series of normal frames is a lost frame.
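
The role of the interpolation processing means can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed circuit: it fills a lost frame by repeating the last detected pitch period and omits any smoothing at the frame boundary and any attenuation over consecutive losses. The function name `conceal_frame` and its parameters are hypothetical.

```python
def conceal_frame(history, pitch, frame_len=80):
    """Build interpolation data for one lost frame by cyclically
    repeating the last `pitch` samples of the received voice data.

    history   : most recent voice samples, oldest first
    pitch     : detected pitch period, in samples (already available
                because detection ran during the normal frames)
    frame_len : samples per frame (80 in the claims)
    """
    if pitch <= 0 or pitch > len(history):
        raise ValueError("pitch must be between 1 and len(history)")
    last_period = history[-pitch:]
    return [last_period[i % pitch] for i in range(frame_len)]


# Usage: with a 40-sample pitch, an 80-sample lost frame is filled
# with two copies of the last pitch period.
received = list(range(100))
filled = conceal_frame(received, pitch=40)
```

Because the pitch is already known when the loss is detected, the only work left at that moment is the copy above, which is the reduction in urgent computation that the invention aims for.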

The pitch detection device in packet loss compensation according to claim 3, wherein the normalized cross-correlation calculation means (5, 6, 9, 11, 12, 30, PB1-5) includes:
five pitch buffer means (PB1-5), each storing 390 samples of voice data and offset from one another by one frame, where one frame contains 80 samples of voice data;
correlation calculation means (5) for performing, when the voice pitch is detected by the normalized cross-correlation calculation, autocorrelation and cross-correlation calculations between the voice data of the two frames immediately preceding the lost frame and the voice data of the frames preceding them; and
correlation buffer means (6) including five progress buffers (PAB1-5) for storing intermediate data generated in the course of the autocorrelation and cross-correlation calculations, two autocorrelation buffers (SCB1, 2) for storing the results of the autocorrelation calculation, and two cross-correlation buffers (MCB1, 2) for storing the results of the cross-correlation calculation.

The pitch detection device in packet loss compensation according to claim 4, wherein the normalized cross-correlation calculation means (5, 6, 9, 11, 12, 30, PB1-5) includes:
pitch buffer control means (11, 30) which has five pitch buffer counters (PBC1-5) for counting the number of data samples stored in each of the five pitch buffer means (PB1-5), which records, when the count value of a pitch buffer counter (PBC1-5) reaches a predetermined value, the pitch buffer number (PFPBNo, NFPBNo) to be assigned to the current frame or the next frame, and which, when the current frame is lost, controls the pitch buffers so that the number of another pitch buffer assigned to the next frame is updated in place of the pitch buffer means that would otherwise be updated; and
correlation buffer control means (12) which performs control so that the pitch buffer (PB1 to PB5) to be judged from the result of the cross-correlation calculation is selected according to the count values of the five pitch buffer counters (PBC1-5), one of the two autocorrelation buffers (SCB1, 2) corresponding to the assigned frame is selected during the autocorrelation difference calculation, and one of the two cross-correlation buffers (MCB1, 2) corresponding to the assigned frame is selected during the cross-correlation calculation.