JP2004301907A

JP2004301907A - Speech encoding device

Info

Publication number: JP2004301907A
Application number: JP2003091747A
Authority: JP
Inventors: Koji Yoshida; 幸司吉田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-03-28
Filing date: 2003-03-28
Publication date: 2004-10-28
Anticipated expiration: 2023-03-28
Also published as: JP4437011B2

Abstract

PROBLEM TO BE SOLVED: To achieve both transmission efficiency and low delay in two-way communication.

SOLUTION: A speech encoding device has a first encoding part 101, which uses a first speech encoding system with a small encoding delay, and a second encoding part 103, which uses a second encoding system with a larger encoding delay and a lower encoding bit rate than the first speech encoding system. An encoding selection part 105 detects the degree of bidirectionality of the speech communication from the transmission-side input speech signal and the reception-side decoded speech signal and, based on the detected degree, selects either the first encoding part 101 or the second encoding part 103 as the means for encoding the transmission-side input speech signal.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、デジタル音声通信に必要な音声符号化装置に関する。
【０００２】
【従来の技術】
デジタル方式の移動通信網や固定通信網、インターネットなどによる音声通信においては、伝送効率の向上を図るため、音声信号を高効率で符号化する音声符号化装置が用いられる。なお、本明細書中、「音声符号化装置」という用語は、符号化機能のみならず復号化機能をも有する広い意味で使用する。
【０００３】
近年、各種通信網の伝送速度の向上やマルチメディア通信の発展により、音声通信のサービス形態として、単なる電話としての機能のみならず、テレビ電話としての利用や、テレビ会議などの、より臨場感が要求される音声通信を行うこと、各種情報を音声で案内する際に背景にＢＧＭがあるような音声信号をより高品質に伝送することなど、様々な形態があり、音声信号の符号化に対して、高い効率を維持しつつより高い品質で音声信号を符号化できる装置が求められている。
【０００４】
図３は、従来の音声符号化装置の一般的な構成を示す図である（たとえば、非特許文献１参照）。
【０００５】
図３の装置において、送信する音声信号は符号化部１に入力され、ここで符号化処理が行われた後、音声符号化データとして出力される。音声符号化データは通信相手に送信される。一方、通信相手から受信した音声符号化データは復号化部３に入力され、ここで復号化処理が行われた後、復号音声信号として出力される。なお、音声信号の符号化は、一般に、入力音声信号に対して一定の区間（以下「音声フレーム」という）毎に区切られ、この音声フレーム単位で符号化処理と復号化処理が行われる。
【０００６】
このような音声符号化装置を用いて音声通信を実現するためには、音声通信による通話の双方向性を考慮して、符号化によって生じる遅延（以下「符号化遅延」という）がある程度小さいこと、たとえば、符号化部単体の遅延量で５０ｍｓ程度まで、また、処理遅延や伝送路遅延を含めた片側（送信側または受信側）の合計遅延量で１５０ｍｓ程度までであることが望まれる。
【０００７】
このような音声通信に適した高効率の符号化方式としては、ＩＴＵ−Ｔ（ＩｎｔｅｒｎａｔｉｏｎａｌＴｅｌｅｃｏｍｍｕｎｉｃａｔｉｏｎＵｎｉｏｎＴｅｌｅｃｏｍｍｕｎｉｃａｔｉｏｎｓｔａｎｄａｒｄｉｚａｔｉｏｎｓｅｃｔｏｒ：国際電気通信連合電気通信標準化部門）や３ＧＰＰ（３ｒｄＧｅｎｅｒａｔｉｏｎＰａｒｔｎｅｒｓｈｉｐＰｒｏｊｅｃｔ）などの標準化機関で規格化されている様々な方式があり、代表的な例として、ＩＴＵ−Ｔ標準Ｇ．７２９（ＣＳ−ＡＣＥＬＰ符号化）や３ＧＰＰ標準のＡＭＲ符号化などが挙げられる。
【０００８】
【非特許文献１】
３ＧＰＰ標準規格ＴＳ２６．０７１：ＡＭＲｓｐｅｅｃｈＣＯＤＥＣ；Ｇｅｎｅｒａｌｄｅｓｃｒｉｐｔｉｏｎ
【０００９】
【発明が解決しようとする課題】
しかしながら、上記に示した従来の高効率な音声符号化装置においては、音声の帯域が３．４ｋＨｚまでの音声信号を対象とした符号化方式であるため、より高い音声品質を実現するためには必ずしも十分ではなく、より音声帯域の広い高品質な符号化が望まれる。
【００１０】
この点、音声帯域の広い非常に高品質な符号化を実現できる符号化方式として、音楽信号の符号化を対象とした、ＭＰ３やＡＡＣと呼ばれる符号化方式がある。しかし、これらは、双方向通信用の符号化方式ではないため、符号化遅延が大きく（たとえば、符号化部単体の遅延量で１００ｍｓのオーダ）、音声通信用の符号化方式として使用する場合には、符号化遅延により音声通信に支障が生じるという問題がある。
【００１１】
また、一般に、音声品質を高く維持したまま低い遅延で符号化を実現する場合には、伝送に要する符号化ビットレートが高くなり、伝送効率が低下するという問題がある。
【００１２】
本発明は、かかる点に鑑みてなされたものであり、伝送効率と双方向通信における低遅延性とを両立させることができる音声符号化装置を提供することを目的とする。
【００１３】
【課題を解決するための手段】
本発明の音声符号化装置は、第１の音声符号化方式を用いて、送信する音声信号を符号化する第１符号化手段と、前記第１の音声符号化方式に比べて、符号化遅延が大きく、かつ、符号化ビットレートが低いまたは符号化音声品質が高い第２の符号化方式を用いて、送信する音声信号を符号化する第２符号化手段と、受信された音声符号化データを復号化する復号化手段と、送信する音声信号および復号化後の音声信号を用いて、音声通信の双方向性度合を検出する検出手段と、検出された音声通信双方向性度合に基づいて、前記第１符号化手段および前記第２符号化手段のいずれか一方を選択する選択手段と、選択された符号化手段の処理結果を出力する出力手段と、を有する構成を採る。このとき、たとえば、前記選択手段は、音声通信の双方向性度合が高い場合は、第１符号化手段を選択し、音声通信の双方向性度合が低い場合は、第２符号化手段を選択する、構成を採る。また、たとえば、前記復号化手段は、前記第１の音声符号化方式および前記第２の音声符号化方式のうち、選択された一の音声符号化方式を用いて、受信された音声符号化データを復号化する、構成を採る。
【００１４】
この構成によれば、送信する音声信号を符号化する符号化手段として、第１の音声符号化方式を用いる第１符号化手段と、第１の音声符号化方式に比べて、符号化遅延が大きく、かつ、符号化ビットレートが低いまたは符号化音声品質が高い第２の符号化方式を用いる第２符号化手段とを設け、送信する音声信号および復号化後の音声信号を用いて音声通信の双方向性度合を検出し、検出した音声通信双方向性度合に基づいて、符号化遅延が比較的小さい第１符号化手段と、符号化遅延は比較的大きいが符号化ビットレートは比較的低いまたは符号化音声品質は比較的高い第２符号化手段のいずれか一方を適切に選択するため、符号化の遅延が適切に制御され、伝送効率と双方向通信における低遅延性とを両立させることができる。
【００１５】
本発明の音声符号化装置は、上記の構成において、前記検出手段は、送信する音声信号に対して判定された有音区間に関する送信側の情報および復号化後の音声信号に対して判定された有音区間に関する受信側の情報を用いて、音声通信の双方向性度合を検出する、構成を採る。
【００１６】
この構成によれば、音声通信の双方向性度合を検出する際に、送信する音声信号に対して判定された有音区間に関する送信側の情報および復号化後の音声信号に対して判定された有音区間に関する受信側の情報を用いるため、音声通信の双方向性度合をより適切に検出することができる。
【００１７】
本発明の音声符号化装置は、上記の構成において、前記検出手段は、送信する音声信号に対して判定された有音区間に関する送信側の情報および復号化後の音声信号に対して判定された有音区間に関する受信側の情報として、送信側の有音率と受信側の有音率との組み合わせ情報、送信側の有音区間と受信側の有音区間との相補性度情報、および送信側の有音区間と受信側の有音区間との交互発生度情報のうち、少なくとも一以上の情報を用いて、音声通信の双方向性度合を検出する、構成を採る。
【００１８】
この構成によれば、送信する音声信号に対して判定された有音区間に関する送信側の情報および復号化後の音声信号に対して判定された有音区間に関する受信側の情報として、具体的に、上記の各種情報を用いるため、音声通信の双方向性度合をより一層高い精度で検出することができる。
【００１９】
本発明の携帯端末装置は、上記構成の音声符号化装置を有する構成を採る。
【００２０】
この構成によれば、上記と同様の作用効果を有する携帯端末装置を実現することができる。
【００２１】
本発明の基地局装置は、上記構成の音声符号化装置を有する構成を採る。
【００２２】
この構成によれば、上記と同様の作用効果を有する基地局装置を実現することができる。
【００２３】
本発明の音声符号化方法は、第１の音声符号化方式を用いて、送信する音声信号を符号化する第１符号化ステップと、前記第１の音声符号化方式に比べて、符号化遅延が大きく、かつ、符号化ビットレートが低いまたは符号化音声品質が高い第２の符号化方式を用いて、送信する音声信号を符号化する第２符号化ステップと、受信された音声符号化データを復号化する復号化ステップと、送信する音声信号および復号化後の音声信号を用いて、音声通信の双方向性度合を検出する検出ステップと、前記検出ステップで検出した音声通信双方向性度合に基づいて、前記第１符号化方式および前記第２符号化方式のいずれか一方を選択する選択ステップと、前記第１符号化ステップおよび前記第２符号化ステップの各処理結果のうち、前記選択ステップで選択した符号化方式に基づく処理結果を出力する出力ステップと、を有するようにした。
【００２４】
この方法によれば、送信する音声信号を符号化する符号化ステップとして、第１の音声符号化方式を用いる第１符号化ステップと、第１の音声符号化方式に比べて、符号化遅延が大きく、かつ、符号化ビットレートが低いまたは符号化音声品質が高い第２の符号化方式を用いる第２符号化ステップとを設け、送信する音声信号および復号化後の音声信号を用いて音声通信の双方向性度合を検出し、検出した音声通信双方向性度合に基づいて、符号化遅延が比較的小さい第１符号化手段と、符号化遅延は比較的大きいが符号化ビットレートは比較的低いまたは符号化音声品質は比較的高い第２符号化手段のいずれか一方を適切に選択するため、符号化の遅延が適切に制御され、伝送効率と双方向通信における低遅延性とを両立させることができる。
【００２５】
本発明の音声符号化プログラムは、コンピュータに、第１の音声符号化方式を用いて、送信する音声信号を符号化する第１符号化ステップと、前記第１の音声符号化方式に比べて、符号化遅延が大きく、かつ、符号化ビットレートが低いまたは符号化音声品質が高い第２の符号化方式を用いて、送信する音声信号を符号化する第２符号化ステップと、受信された音声符号化データを復号化する復号化ステップと、送信する音声信号および復号化後の音声信号を用いて、音声通信の双方向性度合を検出する検出ステップと、前記検出ステップで検出した音声通信双方向性度合に基づいて、前記第１符号化方式および前記第２符号化方式のいずれか一方を選択する選択ステップと、前記第１符号化ステップおよび前記第２符号化ステップの各処理結果のうち、前記選択ステップで選択した符号化方式に基づく処理結果を出力する出力ステップと、を実行させるようにした。
【００２６】
このプログラムによれば、送信する音声信号を符号化する符号化ステップとして、第１の音声符号化方式を用いる第１符号化ステップと、第１の音声符号化方式に比べて、符号化遅延が大きく、かつ、符号化ビットレートが低いまたは符号化音声品質が高い第２の符号化方式を用いる第２符号化ステップとを設け、送信する音声信号および復号化後の音声信号を用いて音声通信の双方向性度合を検出し、検出した音声通信双方向性度合に基づいて、符号化遅延が比較的小さい第１符号化手段と、符号化遅延は比較的大きいが符号化ビットレートは比較的低いまたは符号化音声品質は比較的高い第２符号化手段のいずれか一方を適切に選択するため、符号化の遅延が適切に制御され、伝送効率と双方向通信における低遅延性とを両立させることができる。
【００２７】
【発明の実施の形態】
本発明の骨子は、音声通信の双方向性度合から符号化遅延を制御することで、伝送効率と双方向通信における低遅延性とを両立させることである。
【００２８】
以下、本発明の実施の形態について、図面を参照して詳細に説明する。
【００２９】
図１は、本発明の一実施の形態に係る音声符号化装置の構成を示すブロック図である。
【００３０】
この音声符号化装置は、送信側に、第１符号化部１０１、第２符号化部１０３、符号化選択部１０５、および切り替え器１０７を有し、受信側に、第１復号化部１０９、第２復号化部１１１、切り替え器１１３、および切り替え器１１５を有する。
【００３１】
まず、送信側の構成要素について説明する。
【００３２】
第１符号化部１０１および第２符号化部１０３は、それぞれ、送信する音声信号（入力音声信号）に対して音声符号化を行い、音声符号化データを切り替え器１０７に出力する。ここで、第１符号化部１０１は、符号化遅延が小さい第１の符号化方式を使用し、第２符号化部１０３は、第１の符号化方式に比べて、符号化遅延が大きく、かつ、符号化ビットレートが低いまたは符号化音声品質が高い第２の符号化方式を使用する。たとえば、第２の符号化方式は、第１の符号化方式に比べて、▲１▼符号化遅延が大きくかつ符号化ビットレートが低い場合（符号化音声品質は同等）と、▲２▼符号化遅延が大きくかつ音声品質が高い場合（符号化ビットレートは同等）とを有する。具体的な符号化方式は、上記条件を満たすものであれば任意のものでよい。具体例は、たとえば、次の表１に示すとおりである。
【００３３】
【表１】

【００３４】
ここで、上記の表１において、例１および例２は、第１符号化部１０１に対して、第２符号化部１０３が、符号化遅延が大きく、かつ、符号化音声品質がほぼ同等（または同等に近い）で符号化ビットレートが低い場合であり、例３は、第１符号化部１０１に対して、第２符号化部１０３が、符号化遅延が大きく、かつ、符号化ビットレートが同等で符号化音声品質が高い場合である。
【００３５】
なお、第１符号化部１０１および第２符号化部１０３に適用する符号化方式は、上記の例に限定されるわけではなく、前述のように、上記条件を満たすものであれば任意のものでよい。たとえば、第２符号化部１０３に適用する符号化方式は、第１符号化部１０１に適用する符号化方式に対して、フレーム長の増加や入力音声信号先読み遅延の増加などによる変更を加えたものでもよい。また、第１符号化部１０１にＣＥＬＰ系の符号化方式を、第２符号化部１０３に周波数変換符号化の方式をそれぞれ適用したものでもよい。また、スケーラブル構成の符号化において、ベースレイヤを低遅延の符号化とし、エンハンスレイヤを低遅延の符号化と、遅延の大きい符号化を切りかえる構成として、第１符号化部１０１および第２符号化部１０３を構成するようにしてもよい。
【００３６】
符号化選択部１０５は、送信側の入力音声信号および後述する動作により得られる受信側の復号音声信号を用いて、音声通信の双方向性度合を検出し、検出した音声通信双方向性度合に応じて、送信側の入力音声信号を符号化するために第１符号化部１０１と第２符号化部１０３のいずれを選択すべきかを示す情報（以下「符号化選択情報」という）を出力する。具体的には、たとえば、音声通信の双方向性度合が高い場合は、符号化遅延が小さい第１符号化部１０１を選択し、音声通信の双方向性度合が低い場合は、符号化遅延が大きくかつ符号化ビットレートが低い（または符号化音声品質が高い）第２符号化部１０３を選択し、この情報を符号化選択情報として出力する。符号化選択情報は、切り替え器１０７に出力されるとともに、通信相手に送信される。なお、符号化選択部１０５の内部構成は、後で詳細に説明する。
【００３７】
切り替え器１０７は、符号化選択部１０５から出力された符号化選択情報に基づいて、内部スイッチを切り替え、第１符号化部１０１から出力された音声符号化データと第２符号化部１０３から出力された音声符号化データのうち、選択された方の音声符号化データを、送信すべき音声符号化データとして出力する。なお、切り替え器１０７から出力された音声符号化データは、符号化選択部１０５から出力された符号化選択情報と共に、通信相手に送信される。
【００３８】
次に、受信側の構成要素について説明する。
【００３９】
第１復号化部１０９および第２復号化部１１１は、選択的に、それぞれ、切り替え器１１３の出力（音声符号化データ）に対して音声復号化を行い、復号音声信号を切り替え器１１５に出力する。ここで、第１復号化部１０９は送信側の第１符号化部１０１に対応し、第２復号化部１１１は送信側の第２符号化部１０３に対応している。第１復号化部１０９または第２復号化部１１１から出力された復号音声信号は、切り替え器１１５を介して図示しない所定の処理部および送信側の符号化選択部１０５に供給される。
【００４０】
切り替え器１１３および切り替え器１１５は、それぞれ、互いに同期して動作し、通信相手から受信された符号化選択情報に基づいて、内部スイッチを切り替える。すなわち、切り替え器１１３は、通信相手から受信された音声符号化データを、第１復号化部１０９および第２復号化部１１１のうち、選択された方の符号化方式を備えた復号化部に供給し、切り替え器１１５は、その復号化部から出力された復号音声信号を上記所定の処理部および符号化選択部１０５に供給する。
【００４１】
図２は、図１の符号化選択部１０５の構成の一例を示すブロック図である。
【００４２】
この符号化選択部１０５は、送信音声有音判定部１２１、受信音声有音判定部１２３、送信音声有音率算出部１２５、受信音声有音率算出部１２７、有音区間相補性度算出部１２９、有音区間交互発生度算出部１３１、音声通信双方向性度合検出部１３３、および符号化選択判定部１３５を有する。
【００４３】
送信音声有音判定部１２１は、送信する音声信号（入力音声信号）が、ある一定区間毎に有音か無音かの判定を行い、この判定結果を送信音声有音率算出部１２５、有音区間相補性度算出部１２９、および有音区間交互発生度算出部１３１に出力する。
【００４４】
受信音声有音判定部１２３は、受信した音声信号（復号音声信号）が、ある一定区間毎に有音か無音かの判定を行い、この判定結果を受信音声有音率算出部１２７、有音区間相補性度算出部１２９、および有音区間交互発生度算出部１３１に出力する。
【００４５】
なお、本実施の形態では、このように送信音声有音判定部１２１および受信音声有音判定部１２３を設けて、有音・無音の判定を行うようにしているが、これに限定されない。たとえば、第１符号化部１０１および第２符号化部１０３に適用される音声符号化方式自体にあらかじめ有音・無音の判定処理が組み込まれている場合には、その情報をそのまま利用するようにしてもよい。
【００４６】
送信音声有音率算出部１２５は、送信音声有音判定部１２１から出力された判定結果（送信音声の有音判定情報）を用いて、送信音声の有音率ＶＡＦｓ（０≦ＶＡＦｓ≦１）を算出する。ここで、送信音声の有音率ＶＡＦｓとは、送信音声における有音の割合のことである。算出された有音率ＶＡＦｓは、音声通信双方向性度合検出部１３３に出力される。
【００４７】
受信音声有音率算出部１２７は、受信音声有音判定部１２３から出力された判定結果（受信音声の有音判定情報）を用いて、受信音声の有音率ＶＡＦｒ（０≦ＶＡＦｒ≦１）を算出する。ここで、受信音声の有音率ＶＡＦｒとは、受信音声における有音の割合のことである。算出された有音率ＶＡＦｒは、音声通信双方向性度合検出部１３３に出力される。
【００４８】
有音区間相補性度算出部１２９は、送信音声有音判定部１２１から出力された判定結果（送信音声の有音判定情報）および受信音声有音判定部１２３から出力された判定結果（受信音声の有音判定情報）を用いて、送信音声と受信音声の有音区間相補性度を算出する。ここで、有音区間相補性度とは、送信音声の有音区間と受信音声の有音区間が時間的に重なったり（送信受信共に有音）また空きになったり（送信受信共に有音でない）する場合がどの程度ないかを示す度合である。本実施の形態では、この有音区間相補性度を、音声通信の双方向性度合を示す１つの指標とする。具体的には、たとえば、一例として、次の（式１）に示す値ＣＯＭＰを、この有音区間相補性度を示す１つの指標とする。算出された有音区間相補性度は、音声通信双方向性度合検出部１３３に出力される。
【００４９】

【００５０】
有音区間交互発生度算出部１３１は、送信音声有音判定部１２１から出力された判定結果（送信音声の有音判定情報）および受信音声有音判定部１２３から出力された判定結果（受信音声の有音判定情報）を用いて、送信音声と受信音声の有音区間交互発生度を算出する。ここで、有音区間交互発生度とは、ある単位時間に、送信音声と受信音声がどの程度の頻度で交互に有音区間となっているかを示すパラメータである。本実施の形態では、この有音区間交互発生度を、音声通信の双方向性度合を示す別の指標とする。具体的には、たとえば、有音区間交互発生度ＮＩＮＴＲを、単位時間（１ｓｅｃ）当たりの送話側から受話側（または受話側から送話側）への有音区間の変化の回数と定義する。算出された有音区間交互発生度は、音声通信双方向性度合検出部１３３に出力される。
【００５１】
音声通信双方向性度合検出部１３３は、送信音声有音率算出部１２５、受信音声有音率算出部１２７、有音区間相補性度算出部１２９、および有音区間交互発生度算出部１３１でそれぞれ得られた、送信音声の有音率ＶＡＦｓ、受信音声の有音率ＶＡＦｒ、有音区間相補性度ＣＯＭＰ、および有音区間交互発生度ＮＩＮＴＲを用いて、音声通信双方向性度合を判定（検出）する。この判定（検出）結果は、符号化選択判定部１３５に出力される。
【００５２】
具体的には、たとえば、下記の（式２）、（式３）、（式４）、（式５）を用いて、音声通信双方向性度合を示すフラグＦＬＡＧを求める。
【００５３】

【００５７】
符号化選択判定部１３５は、音声通信双方向性度合検出部１３３で得られた判定（検出）結果ＦＬＡＧに基づいて、入力音声信号を符号化するために第１符号化部１０１と第２符号化部１０３のいずれを選択すべきかを示す情報（符号化選択情報）を決定し、出力する。具体的には、たとえば、ＦＬＡＧ＝１の場合は、音声通信の双方向性度合が高いものと判断して、第１符号化部１０１を選択し、ＦＬＡＧ＝０の場合は、音声通信の双方向性度合が低いものと判断して、第２符号化部１０３を選択する。
【００５８】
なお、本実施の形態では、３種類の判定情報ＦＬＡＧ１、ＦＬＡＧ２、ＦＬＡＧ３から判定を行っているが、これに限定されるわけではなく、これら３種類の中のいずれか１つまたは任意の組み合わせで判定を行うようにしてもよい。
【００５９】
次いで、上記構成を有する音声符号化装置の動作について説明する。
【００６０】
まず、第１符号化部１０１および第２符号化部１０３で、それぞれ、入力音声信号に対して音声符号化を行い、音声符号化データを切り替え器１０７に出力する。上記のように、第１符号化部１０１には、符号化遅延が小さい符号化方式が適用され、第２符号化部１０３には、第１符号化部１０１に比べて符号化遅延が大きくかつ符号化ビットレートが低い（または符号化音声品質が高い）符号化方式が適用されている。
【００６１】
そして、符号化選択部１０５で、送信側の入力音声信号および受信側の復号音声信号を用いて、音声通信の双方向性度合を検出し、検出した音声通信双方向性度合に応じて、第１符号化部１０１と第２符号化部１０３のいずれを選択すべきかを示す情報（符号化選択情報）を出力する。具体的には、たとえば、上記のように、音声通信の双方向性度合が高い場合は、符号化遅延が小さい第１符号化部１０１を選択し、音声通信の双方向性度合が低い場合は、符号化遅延が大きくかつ符号化ビットレートが低い（または符号化音声品質が高い）第２符号化部１０３を選択し、この情報を符号化選択情報として出力する。符号化選択情報は、切り替え器１０７に出力されるとともに、通信相手に送信される。
【００６２】
また、このとき、音声通信の双方向性度合の検出に当たっては、たとえば、上記のように、送信側の入力音声信号および受信側の復号音声信号を用いて、送信音声と受信音声の有音判定を行い、この判定結果を用いて、送信音声有音率、受信音声有音率、有音区間相補性度、有音区間交互発生度をそれぞれ算出した後、これらの算出結果を用いて、上記（式２）〜（式５）により、音声通信の双方向性度合を判定（検出）する。
【００６３】
そして、切り替え器１０７で、符号化選択部１０５から出力された符号化選択情報に基づいて、内部スイッチを切り替え、第１符号化部１０１から出力された音声符号化データと第２符号化部１０３から出力された音声符号化データのうち、選択された方の音声符号化データを、送信すべき音声符号化データとして出力する。なお、切り替え器１０７から出力された音声符号化データは、符号化選択部１０５から出力された符号化選択情報と共に、通信相手に送信される。
【００６４】
一方、通信相手から音声符号化データおよび符号化選択情報を受信すると、受信した符号化選択情報に基づいて、切り替え器１１３、１１５の内部スイッチを切り替え、第１復号化部１０９または第２復号化部１１１で、切り替え器１１３の出力（音声符号化データ）に対して音声復号化を行い、得られた復号音声信号を、切り替え器１１５を介して図示しない所定の処理部および送信側の符号化選択部１０５に出力する。
【００６５】
このように、本実施の形態によれば、送信する音声信号を符号化する符号化手段として、符号化遅延が小さい第１の音声符号化方式を用いる第１符号化部１０１と、第１の音声符号化方式に比べて符号化遅延が大きくかつ符号化ビットレートが低い（または符号化音声品質が高い）第２の符号化方式を用いる第２符号化部１０３とを設け、送信側の入力音声信号および受信側の復号音声信号を用いて音声通信の双方向性度合を検出し、検出した音声通信双方向性度合に基づいて、符号化遅延が小さい第１符号化手段と、符号化遅延は大きいが符号化ビットレートは低い（または符号化音声品質は高い）第２符号化手段のいずれか一方を適切に選択するため、符号化の遅延が適切に制御され、伝送効率と双方向通信における低遅延性とを両立させることができる。
【００６６】
なお、本実施の形態では、符号化の遅延量が異なる２つの符号化部１０１、１０３を切り替えるようにしているが、切り替える符号化部の数はこれに限定されるわけではなく、３つ以上の符号化部を設け、これら３つ以上の符号化部を音声通信双方向性度合によって適切に切り替える構成としてもよい。
【００６７】
また、本実施の形態は、上記の機能を実現させる制御プログラムをコンピュータに実行させる構成としてもよい。
【００６８】
【発明の効果】
以上説明したように、本発明によれば、伝送効率と双方向通信における低遅延性とを両立させることができる。
【図面の簡単な説明】
【図１】本発明の一実施の形態に係る音声符号化装置の構成を示すブロック図
【図２】図１の符号化選択部の構成を示すブロック図
【図３】従来の音声符号化装置の構成の一例を示すブロック図
【符号の説明】
１０１第１符号化部
１０３第２符号化部
１０５符号化選択部
１０７、１１３、１１５切り替え器
１０９第１復号化部
１１１第２復号化部
１２１送信音声有音判定部
１２３受信音声有音判定部
１２５送信音声有音率算出部
１２７受信音声有音率算出部
１２９有音区間相補性度算出部
１３１有音区間交互発生度算出部
１３３音声通信双方向性度合検出部
１３５符号化選択判定部
[0001]
[Technical field of the invention]
The present invention relates to a speech encoding device required for digital voice communication.
[0002]
[Prior art]
In voice communication over digital mobile communication networks, fixed communication networks, the Internet, and the like, a speech encoding device that encodes speech signals with high efficiency is used to improve transmission efficiency. In this specification, the term "speech encoding device" is used in a broad sense that covers not only the encoding function but also the decoding function.
[0003]
In recent years, with higher transmission rates in various communication networks and the development of multimedia communication, voice communication services have come to take many forms: not only plain telephony, but also videophone use, voice communication that demands greater realism such as videoconferencing, and higher-quality transmission of speech signals that have background music (BGM) behind them, as when information is announced by voice. Speech encoding is therefore required to encode speech signals at higher quality while maintaining high efficiency.
[0004]
FIG. 3 is a diagram showing a general configuration of a conventional speech coding apparatus (for example, see Non-Patent Document 1).
[0005]
In the device of FIG. 3, the speech signal to be transmitted is input to the encoding unit 1, encoded there, and output as encoded speech data, which is transmitted to the communication partner. Conversely, encoded speech data received from the communication partner is input to the decoding unit 3, decoded there, and output as a decoded speech signal. In general, the input speech signal is divided into sections of fixed length (hereinafter "speech frames"), and encoding and decoding are performed frame by frame.
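The frame-by-frame processing described above can be sketched as follows. This is only an illustration: the 8 kHz sampling rate, the 20 ms frame length, and the function name `split_into_frames` are assumptions made for the example and are not taken from the specification.

```python
def split_into_frames(samples, frame_len):
    """Split a sample sequence into consecutive fixed-length frames.

    A trailing partial frame is dropped in this sketch; a real codec
    would buffer it until enough samples arrive.
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


SAMPLE_RATE_HZ = 8000   # assumed sampling rate
FRAME_MS = 20           # assumed frame length
frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 160 samples per frame

signal = list(range(500))                # stand-in for input speech samples
frames = split_into_frames(signal, frame_len)
```

Each element of `frames` would then be passed to the encoder as one speech frame.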
[0006]
In order to realize voice communication using such a voice coding device, the delay caused by coding (hereinafter referred to as “coding delay”) must be small to some extent in consideration of the two-way nature of voice communication. For example, it is desired that the delay amount of the encoding unit alone is up to about 50 ms, and that the total delay amount of one side (transmitting side or receiving side) including the processing delay and the transmission path delay is up to about 150 ms.
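As a rough illustration of the delay targets mentioned above, the check below adds up the delay components for one direction and compares them with the approximate limits in the text. The function name and the example figures are hypothetical; only the 50 ms and 150 ms targets come from the paragraph above.

```python
CODEC_DELAY_LIMIT_MS = 50      # "about 50 ms" for the encoding unit alone
ONE_WAY_DELAY_LIMIT_MS = 150   # "about 150 ms" total for one direction


def within_delay_budget(codec_ms, processing_ms, network_ms):
    """Return True if both the codec delay and the one-way total meet the targets."""
    total_ms = codec_ms + processing_ms + network_ms
    return codec_ms <= CODEC_DELAY_LIMIT_MS and total_ms <= ONE_WAY_DELAY_LIMIT_MS


# Hypothetical figures: a low-delay codec fits, a music-style codec does not.
low_delay_ok = within_delay_budget(codec_ms=25, processing_ms=20, network_ms=80)
high_delay_ok = within_delay_budget(codec_ms=100, processing_ms=20, network_ms=80)
```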
[0007]
As high-efficiency encoding schemes suited to such voice communication, there are various schemes standardized by bodies such as the ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) and 3GPP (3rd Generation Partnership Project). Typical examples are ITU-T standard G.729 (CS-ACELP encoding) and the 3GPP standard AMR encoding.
[0008]
[Non-patent document 1]
3GPP standard TS26.071: AMR speech codec; General description
[0009]
[Problems to be solved by the invention]
However, the conventional high-efficiency speech encoding devices described above use encoding schemes intended for speech signals with a bandwidth of up to 3.4 kHz. They are therefore not necessarily sufficient for achieving higher speech quality, and high-quality encoding with a wider speech band is desired.
[0010]
In this regard, encoding schemes such as MP3 and AAC, which target music signals, can achieve very high-quality encoding over a wide audio band. However, because these are not encoding schemes for two-way communication, their encoding delay is large (for example, on the order of 100 ms for the encoding unit alone), and when they are used for voice communication the encoding delay hinders the conversation.
[0011]
In general, when encoding is realized with low delay while maintaining high audio quality, there is a problem that an encoding bit rate required for transmission increases and transmission efficiency decreases.
[0012]
The present invention has been made in view of the above, and has as its object to provide a speech encoding device that can achieve both transmission efficiency and low delay in bidirectional communication.
[0013]
[Means for Solving the Problems]
A speech encoding device according to the present invention comprises: first encoding means for encoding a speech signal to be transmitted using a first speech encoding scheme; second encoding means for encoding the speech signal to be transmitted using a second encoding scheme that, compared with the first speech encoding scheme, has a larger encoding delay and either a lower encoding bit rate or higher encoded speech quality; decoding means for decoding received encoded speech data; detection means for detecting the degree of bidirectionality of the voice communication using the speech signal to be transmitted and the decoded speech signal; selection means for selecting one of the first encoding means and the second encoding means based on the detected degree of bidirectionality; and output means for outputting the processing result of the selected encoding means. Here, for example, the selection means selects the first encoding means when the degree of bidirectionality of the voice communication is high and the second encoding means when it is low. Also, for example, the decoding means decodes the received encoded speech data using whichever of the first and second speech encoding schemes has been selected.
[0014]
According to this configuration, the device is provided with, as encoding means for the speech signal to be transmitted, first encoding means that uses the first speech encoding scheme and second encoding means that uses a second encoding scheme with a larger encoding delay and either a lower encoding bit rate or higher encoded speech quality than the first scheme. The degree of bidirectionality of the voice communication is detected using the speech signal to be transmitted and the decoded speech signal, and, based on the detected degree, either the first encoding means, whose encoding delay is relatively small, or the second encoding means, whose encoding delay is relatively large but whose encoding bit rate is relatively low or whose encoded speech quality is relatively high, is appropriately selected. The encoding delay is thus appropriately controlled, and both transmission efficiency and low delay in bidirectional communication can be achieved.
[0015]
In the speech encoding device of the present invention, in the above configuration, the detection means detects the degree of bidirectionality of the voice communication using transmitting-side information on the active-speech intervals determined for the speech signal to be transmitted and receiving-side information on the active-speech intervals determined for the decoded speech signal.
[0016]
According to this configuration, because the transmitting-side information on the active-speech intervals determined for the speech signal to be transmitted and the receiving-side information on the active-speech intervals determined for the decoded speech signal are used when detecting the degree of bidirectionality of the voice communication, that degree can be detected more appropriately.
[0017]
In the speech encoding device of the present invention, in the above configuration, the detection means detects the degree of bidirectionality of the voice communication using, as the transmitting-side and receiving-side information on the active-speech intervals, at least one of the following: combination information of the transmitting-side speech activity ratio and the receiving-side speech activity ratio; information on the degree of complementarity between the transmitting-side and receiving-side active-speech intervals; and information on the degree of alternation between the transmitting-side and receiving-side active-speech intervals.
[0018]
According to this configuration, because the specific kinds of information listed above are used as the transmitting-side and receiving-side information on the active-speech intervals, the degree of bidirectionality of the voice communication can be detected with still higher accuracy.
[0019]
The portable terminal device of the present invention employs a configuration including the above-described audio encoding device.
[0020]
According to this configuration, it is possible to realize a portable terminal device having the same operation and effect as described above.
[0021]
The base station apparatus according to the present invention employs a configuration having the speech coding apparatus having the above configuration.
[0022]
According to this configuration, it is possible to realize a base station apparatus having the same functions and effects as described above.
[0023]
The speech encoding method of the present invention comprises: a first encoding step of encoding a speech signal to be transmitted using a first speech encoding scheme; a second encoding step of encoding the speech signal to be transmitted using a second encoding scheme that, compared with the first speech encoding scheme, has a larger encoding delay and either a lower encoding bit rate or higher encoded speech quality; a decoding step of decoding received encoded speech data; a detection step of detecting the degree of bidirectionality of the voice communication using the speech signal to be transmitted and the decoded speech signal; a selection step of selecting one of the first encoding scheme and the second encoding scheme based on the degree of bidirectionality detected in the detection step; and an output step of outputting, of the processing results of the first encoding step and the second encoding step, the result based on the encoding scheme selected in the selection step.
[0024]
According to this method, a first encoding step that uses the first speech encoding scheme and a second encoding step that uses a second encoding scheme with a larger encoding delay and either a lower encoding bit rate or higher encoded speech quality than the first scheme are provided as encoding steps for the speech signal to be transmitted. The degree of bidirectionality of the voice communication is detected using the speech signal to be transmitted and the decoded speech signal, and, based on the detected degree, either the first encoding means, whose encoding delay is relatively small, or the second encoding means, whose encoding delay is relatively large but whose encoding bit rate is relatively low or whose encoded speech quality is relatively high, is appropriately selected. The encoding delay is thus appropriately controlled, and both transmission efficiency and low delay in bidirectional communication can be achieved.
[0025]
A speech encoding program according to the present invention causes a computer to execute: a first encoding step of encoding a speech signal to be transmitted using a first speech encoding scheme; a second encoding step of encoding the speech signal to be transmitted using a second encoding scheme that, compared with the first speech encoding scheme, has a larger encoding delay and either a lower encoding bit rate or higher encoded speech quality; a decoding step of decoding received encoded speech data; a detection step of detecting the degree of bidirectionality of the voice communication using the speech signal to be transmitted and the decoded speech signal; a selection step of selecting one of the first encoding scheme and the second encoding scheme based on the degree of bidirectionality detected in the detection step; and an output step of outputting, of the processing results of the first encoding step and the second encoding step, the result based on the encoding scheme selected in the selection step.
[0026]
According to this program, a first encoding step that uses the first speech encoding scheme and a second encoding step that uses a second encoding scheme with a larger encoding delay and either a lower encoding bit rate or higher encoded speech quality than the first scheme are provided as encoding steps for the speech signal to be transmitted. The degree of bidirectionality of the voice communication is detected using the speech signal to be transmitted and the decoded speech signal, and, based on the detected degree, either the first encoding means, whose encoding delay is relatively small, or the second encoding means, whose encoding delay is relatively large but whose encoding bit rate is relatively low or whose encoded speech quality is relatively high, is appropriately selected. The encoding delay is thus appropriately controlled, and both transmission efficiency and low delay in bidirectional communication can be achieved.
[0027]
[Embodiments of the invention]
The gist of the present invention is to achieve both transmission efficiency and low delay in bidirectional communication by controlling the encoding delay based on the degree of bidirectionality of voice communication.
[0028]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0029]
FIG. 1 is a block diagram showing a configuration of a speech coding apparatus according to one embodiment of the present invention.
[0030]
This speech encoding device has, on the transmitting side, a first encoding unit 101, a second encoding unit 103, an encoding selection unit 105, and a switch 107, and, on the receiving side, a first decoding unit 109, a second decoding unit 111, a switch 113, and a switch 115.
[0031]
First, the components on the transmitting side will be described.
[0032]
The first encoding unit 101 and the second encoding unit 103 each perform speech encoding on the speech signal to be transmitted (the input speech signal) and output encoded speech data to the switch 107. The first encoding unit 101 uses a first encoding scheme with a small encoding delay, and the second encoding unit 103 uses a second encoding scheme that, compared with the first, has a larger encoding delay and either a lower encoding bit rate or higher encoded speech quality. For example, the second encoding scheme may, compared with the first, (1) have a larger encoding delay and a lower encoding bit rate (with equivalent encoded speech quality), or (2) have a larger encoding delay and higher speech quality (with an equivalent encoding bit rate). Any specific encoding scheme may be used as long as it satisfies these conditions. Specific examples are shown in Table 1 below.
[0033]
[Table 1]

[0034]
In Table 1 above, Examples 1 and 2 are cases in which the second encoding unit 103, relative to the first encoding unit 101, has a larger encoding delay and a lower encoding bit rate with roughly equivalent (or nearly equivalent) encoded speech quality. Example 3 is a case in which the second encoding unit 103, relative to the first encoding unit 101, has a larger encoding delay and higher encoded speech quality at an equivalent encoding bit rate.
[0035]
The encoding schemes applied to the first encoding unit 101 and the second encoding unit 103 are not limited to the above examples; as stated, any scheme that satisfies the above conditions may be used. For example, the scheme applied to the second encoding unit 103 may be the scheme applied to the first encoding unit 101 modified by increasing the frame length, the look-ahead delay on the input speech signal, or the like. Alternatively, a CELP-type encoding scheme may be applied to the first encoding unit 101 and a frequency-transform encoding scheme to the second encoding unit 103. Further, in a scalable encoding configuration, the first encoding unit 101 and the second encoding unit 103 may be formed by using low-delay encoding for the base layer and switching the enhancement layer between low-delay encoding and high-delay encoding.
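To make the effect of a longer frame or a longer look-ahead concrete, the sketch below uses the common approximation that a frame-based codec's algorithmic delay is the frame length plus the look-ahead, ignoring processing and transmission time. The function name and the second scheme's figures are hypothetical and do not come from Table 1.

```python
def algorithmic_delay_ms(frame_ms, lookahead_ms):
    """Approximate algorithmic delay of a frame-based codec.

    The encoder must buffer one full frame plus the look-ahead samples
    before it can emit the encoded frame.
    """
    return frame_ms + lookahead_ms


# Hypothetical first scheme: short frame, short look-ahead.
first_scheme_delay = algorithmic_delay_ms(frame_ms=10, lookahead_ms=5)

# Hypothetical second scheme: the same codec with a longer frame and look-ahead.
second_scheme_delay = algorithmic_delay_ms(frame_ms=40, lookahead_ms=10)
```

Under these assumed figures the second scheme has the larger encoding delay, which is the property the text requires of the second encoding unit 103.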
[0036]
The encoding selection unit 105 detects the degree of bidirectionality of the voice communication using the transmitting-side input speech signal and the receiving-side decoded speech signal obtained by the operation described later. According to the detected degree, it outputs information indicating which of the first encoding unit 101 and the second encoding unit 103 should be selected to encode the transmitting-side input speech signal (hereinafter "encoding selection information"). Specifically, for example, when the degree of bidirectionality is high, it selects the first encoding unit 101, which has a small encoding delay; when the degree is low, it selects the second encoding unit 103, which has a large encoding delay and a low encoding bit rate (or high encoded speech quality); and it outputs this choice as the encoding selection information. The encoding selection information is output to the switch 107 and also transmitted to the communication partner. The internal configuration of the encoding selection unit 105 is described in detail later.
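The selection rule in this paragraph reduces to a two-way choice. A minimal sketch follows, assuming the detected degree of bidirectionality has already been reduced to a boolean; the constant and function names are hypothetical.

```python
FIRST_ENCODER = 1    # low encoding delay (first encoding unit 101)
SECOND_ENCODER = 2   # larger delay, lower bit rate or higher quality (unit 103)


def select_encoder(bidirectionality_is_high):
    """Return the encoding selection information for the current conditions.

    High bidirectionality (a lively two-way conversation) favours low delay;
    low bidirectionality (mostly one-way speech) tolerates more delay.
    """
    return FIRST_ENCODER if bidirectionality_is_high else SECOND_ENCODER


selection_in_dialogue = select_encoder(True)
selection_in_monologue = select_encoder(False)
```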
[0037]
The switch 107 switches its internal switch based on the encoding selection information output from the encoding selection unit 105 and outputs, as the encoded speech data to be transmitted, whichever of the encoded speech data from the first encoding unit 101 and the encoded speech data from the second encoding unit 103 has been selected. The encoded speech data output from the switch 107 is transmitted to the communication partner together with the encoding selection information output from the encoding selection unit 105.
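The behaviour of the switch 107 can be sketched as follows: both encoders run on the frame, and only the selected output is forwarded along with the selection information. The packet layout (a dictionary) and the stand-in encoders are assumptions for illustration, not a format defined in the specification.

```python
def transmit_frame(frame, selection, encoders):
    """Encode a frame with every encoder and forward only the selected output."""
    outputs = {key: encode(frame) for key, encode in encoders.items()}
    # The selection information travels with the payload so the far end
    # knows which decoder to use.
    return {"selection": selection, "payload": outputs[selection]}


# Stand-in encoders: each simply tags the raw bytes of the frame.
encoders = {
    1: lambda frame: b"A" + bytes(frame),   # placeholder for encoding unit 101
    2: lambda frame: b"B" + bytes(frame),   # placeholder for encoding unit 103
}

packet = transmit_frame([1, 2, 3], selection=2, encoders=encoders)
```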
[0038]
Next, components on the receiving side will be described.
[0039]
The first decoding unit 109 and the second decoding unit 111 each selectively perform speech decoding on the output of the switch 113 (encoded speech data) and output the decoded speech signal to the switch 115. The first decoding unit 109 corresponds to the transmitting-side first encoding unit 101, and the second decoding unit 111 corresponds to the transmitting-side second encoding unit 103. The decoded speech signal output from the first decoding unit 109 or the second decoding unit 111 is supplied via the switch 115 to a predetermined processing unit (not shown) and to the transmitting-side encoding selection unit 105.
[0040]
The switch 113 and the switch 115 operate in synchronization with each other and switch their internal switches based on the encoding selection information received from the communication partner. That is, the switch 113 passes the encoded voice data received from the communication partner to the selected one of the first decoding unit 109 and the second decoding unit 111. The switch 115 supplies the decoded voice signal output from that decoding unit to the predetermined processing unit and to the encoding selection unit 105.
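The synchronized operation of the switches 113 and 115 amounts to routing each received frame to one decoding unit and taking the output of that same unit. The following Python sketch shows this under stated assumptions: the function `receive`, the `decoders` table, and the stand-in decoders are hypothetical and do not represent any actual speech decoding scheme.

```python
from typing import Callable, Dict, List

Decoder = Callable[[bytes], List[float]]


def receive(selection: int, payload: bytes, decoders: Dict[int, Decoder]) -> List[float]:
    """Route the payload to the decoder named by the received selection
    information (switch 113) and return that decoder's output (switch 115)."""
    try:
        decode = decoders[selection]
    except KeyError:
        raise ValueError(f"unknown encoding selection: {selection}") from None
    return decode(payload)


# Stand-in decoders, for illustration only (not real speech decoders).
decoders: Dict[int, Decoder] = {
    1: lambda data: [b / 255.0 for b in data],            # plays the role of decoding unit 109
    2: lambda data: [b / 255.0 for b in reversed(data)],  # plays the role of decoding unit 111
}
```

Because both switches are driven by the same selection value, a single lookup covers the input routing and the output routing.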
[0041]
FIG. 2 is a block diagram illustrating an example of the configuration of the encoding selection unit 105 in FIG. 1.
[0042]
The encoding selection unit 105 includes a transmitted-voice presence determination unit 121, a received-voice presence determination unit 123, a transmitted-voice voiced ratio calculation unit 125, a received-voice voiced ratio calculation unit 127, a voiced section complementarity calculation unit 129, a voiced section alternation degree calculation unit 131, a voice communication bidirectionality detection unit 133, and an encoding selection determination unit 135.
[0043]
The transmitted-voice presence determination unit 121 determines, at fixed intervals, whether the transmitted voice signal (input voice signal) is voiced or silent, and outputs the determination result to the transmitted-voice voiced ratio calculation unit 125, the voiced section complementarity calculation unit 129, and the voiced section alternation degree calculation unit 131.
[0044]
The received-voice presence determination unit 123 determines, at fixed intervals, whether the received voice signal (decoded voice signal) is voiced or silent, and outputs the determination result to the received-voice voiced ratio calculation unit 127, the voiced section complementarity calculation unit 129, and the voiced section alternation degree calculation unit 131.
[0045]
In this embodiment, the transmitted-voice presence determination unit 121 and the received-voice presence determination unit 123 are provided to determine whether voice is present, but the present invention is not limited to this. For example, if the speech encoding schemes applied to the first encoding unit 101 and the second encoding unit 103 already incorporate voiced/silent determination processing, that information may be used as is.
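The text does not specify how the voiced/silent determination is made. As one possible illustration, the following Python sketch uses a simple frame-power threshold. The function name `is_voiced`, the threshold value, and the power criterion itself are assumptions and are not taken from the embodiment.

```python
from typing import Sequence


def is_voiced(frame: Sequence[float], threshold: float = 1e-3) -> bool:
    """Return True if the mean power of the frame exceeds the threshold.

    The threshold (1e-3) is a hypothetical value for samples scaled to [-1, 1].
    """
    if not frame:
        return False  # an empty frame is treated as silent
    power = sum(x * x for x in frame) / len(frame)
    return power > threshold
```

The same function could be applied once per frame to the input voice signal (unit 121) and to the decoded voice signal (unit 123), yielding two sequences of boolean decisions.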
[0046]
Using the determination result (voice presence determination information for the transmitted voice) output from the transmitted-voice presence determination unit 121, the transmitted-voice voiced ratio calculation unit 125 calculates the voiced ratio VAFs (0 ≦ VAFs ≦ 1) of the transmitted voice. Here, the voiced ratio VAFs is the proportion of the transmitted voice that is voiced. The calculated voiced ratio VAFs is output to the voice communication bidirectionality detection unit 133.
[0047]
Using the determination result (voice presence determination information for the received voice) output from the received-voice presence determination unit 123, the received-voice voiced ratio calculation unit 127 calculates the voiced ratio VAFr (0 ≦ VAFr ≦ 1) of the received voice. Here, the voiced ratio VAFr is the proportion of the received voice that is voiced. The calculated voiced ratio VAFr is output to the voice communication bidirectionality detection unit 133.
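As a minimal sketch, the voiced ratio can be computed as the fraction of frames judged voiced within an observation window. The function name `voiced_ratio` and the use of a list of per-frame boolean decisions are assumptions made for illustration; the length of the observation window is not specified in the text.

```python
from typing import Sequence


def voiced_ratio(decisions: Sequence[bool]) -> float:
    """Fraction of frames judged voiced; always within [0, 1]."""
    if not decisions:
        return 0.0  # no frames observed yet
    return sum(1 for d in decisions if d) / len(decisions)
```

Applying this function to the transmitted-side decisions gives a value corresponding to VAFs, and applying it to the received-side decisions gives a value corresponding to VAFr.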
[0048]
The voiced section complementarity calculation unit 129 calculates the voiced section complementarity of the transmitted voice and the received voice, using the determination result (voice presence determination information for the transmitted voice) output from the transmitted-voice presence determination unit 121 and the determination result (voice presence determination information for the received voice) output from the received-voice presence determination unit 123. Here, the voiced section complementarity indicates the degree to which the voiced sections of the transmitted voice and those of the received voice neither overlap in time (both sides voiced) nor leave gaps (neither side voiced). In the present embodiment, the voiced section complementarity is used as one index of the degree of bidirectionality of the voice communication. Specifically, for example, the value COMP given by (Equation 1) below is used as an index of the voiced section complementarity. The calculated voiced section complementarity is output to the voice communication bidirectionality detection unit 133.
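The following Python sketch illustrates one way such a complementarity index could be computed. The formula used here, the fraction of frames in which exactly one side is voiced, is an assumption chosen to match the verbal definition above; it is not a reproduction of (Equation 1). The function name `complementarity` is likewise hypothetical.

```python
from typing import Sequence


def complementarity(tx: Sequence[bool], rx: Sequence[bool]) -> float:
    """Assumed index: fraction of frames where exactly one side is voiced.

    Frames where both sides are voiced (overlap) or both are silent (gap)
    lower the value; 1.0 means the voiced sections are perfectly complementary.
    """
    if len(tx) != len(rx):
        raise ValueError("tx and rx must cover the same frames")
    if not tx:
        return 0.0
    exclusive = sum(1 for s, r in zip(tx, rx) if s != r)
    return exclusive / len(tx)
```

For example, with four frames of which one overlaps and one is silent on both sides, this assumed index evaluates to 0.5.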
[0049]

[0050]
The voiced section alternation degree calculation unit 131 calculates the voiced section alternation degree of the transmitted voice and the received voice, using the determination result (voice presence determination information for the transmitted voice) output from the transmitted-voice presence determination unit 121 and the determination result (voice presence determination information for the received voice) output from the received-voice presence determination unit 123. Here, the voiced section alternation degree is a parameter indicating how frequently, within a given unit time, the voiced section alternates between the transmitted voice and the received voice. In the present embodiment, the voiced section alternation degree is used as another index of the degree of bidirectionality of the voice communication. Specifically, for example, the voiced section alternation degree NINTR is defined as the number of times per unit time (1 sec) that the voiced section changes from the transmitting side to the receiving side (or from the receiving side to the transmitting side). The calculated voiced section alternation degree is output to the voice communication bidirectionality detection unit 133.
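The counting described above can be sketched in Python as follows. Several details are assumptions not stated in the text: frames in which both sides or neither side are voiced are ignored when tracking which side holds the voiced section, and the frame rate of 50 frames per second (20 ms frames) is a hypothetical value. The function name `alternation_degree` is also introduced only for illustration.

```python
from typing import Optional, Sequence


def alternation_degree(tx: Sequence[bool], rx: Sequence[bool],
                       frames_per_second: int = 50) -> float:
    """Number of changes of the voiced side (tx to rx or rx to tx) per second."""
    if len(tx) != len(rx):
        raise ValueError("tx and rx must cover the same frames")
    if not tx:
        return 0.0
    changes = 0
    last: Optional[str] = None  # side that most recently held the voiced section alone
    for s, r in zip(tx, rx):
        if s == r:
            continue  # overlap or mutual silence: no single voiced side (assumption)
        side = "tx" if s else "rx"
        if last is not None and side != last:
            changes += 1
        last = side
    return changes * frames_per_second / len(tx)
```

With one second of decisions in which the transmitting side speaks for the first half and the receiving side for the second half, the function counts one change per second.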
[0051]
The voice communication bidirectionality detection unit 133 determines (detects) the degree of bidirectionality of the voice communication using the voiced ratio VAFs of the transmitted voice, the voiced ratio VAFr of the received voice, the voiced section complementarity COMP, and the voiced section alternation degree NINTR obtained by the transmitted-voice voiced ratio calculation unit 125, the received-voice voiced ratio calculation unit 127, the voiced section complementarity calculation unit 129, and the voiced section alternation degree calculation unit 131, respectively. This determination (detection) result is output to the encoding selection determination unit 135.
[0052]
Specifically, for example, a flag FLAG indicating the degree of voice communication bidirectionality is obtained using the following (Equation 2), (Equation 3), (Equation 4), and (Equation 5).
[0053]

[0057]
Based on the determination (detection) result FLAG obtained by the voice communication bidirectionality detection unit 133, the encoding selection determination unit 135 determines information (encoding selection information) indicating which of the first encoding unit 101 and the second encoding unit 103 is to be selected to encode the input voice signal, and outputs it. Specifically, for example, when FLAG = 1, the degree of bidirectionality of the voice communication is judged to be high and the first encoding unit 101 is selected; when FLAG = 0, the degree of bidirectionality is judged to be low and the second encoding unit 103 is selected.
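The mapping from FLAG to the selected encoding unit can be written as a short Python sketch. The function name `select_encoder` and the use of the reference numerals 101 and 103 as return values are conventions adopted here only for illustration.

```python
def select_encoder(flag: int) -> int:
    """Return the reference numeral of the encoding unit to be selected.

    FLAG = 1 (high bidirectionality) selects the low-delay first encoding unit 101;
    FLAG = 0 (low bidirectionality) selects the second encoding unit 103.
    """
    if flag == 1:
        return 101
    if flag == 0:
        return 103
    raise ValueError(f"FLAG must be 0 or 1, got {flag}")
```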
[0058]
In the present embodiment, the determination is made based on the three types of determination information FLAG1, FLAG2, and FLAG3. However, the present invention is not limited to this, and the determination may be made using any one of these three types or an arbitrary combination of them.
[0059]
Next, the operation of the speech coding apparatus having the above configuration will be described.
[0060]
First, the first encoding unit 101 and the second encoding unit 103 each encode the input voice signal and output encoded voice data to the switch 107. As described above, an encoding scheme with a small encoding delay is applied to the first encoding unit 101, and an encoding scheme with a larger encoding delay and a lower encoding bit rate (or higher encoded voice quality) than that of the first encoding unit 101 is applied to the second encoding unit 103.
[0061]
Then, the encoding selection unit 105 detects the degree of bidirectionality of the voice communication using the input voice signal on the transmitting side and the decoded voice signal on the receiving side, and, based on the detected degree of bidirectionality, outputs information (encoding selection information) indicating which of the first encoding unit 101 and the second encoding unit 103 is to be selected. Specifically, for example, as described above, when the degree of bidirectionality of the voice communication is high, the first encoding unit 101, which has a small encoding delay, is selected; when the degree of bidirectionality is low, the second encoding unit 103, which has a large encoding delay but a low encoding bit rate (or high encoded voice quality), is selected. This choice is output as the encoding selection information. The encoding selection information is output to the switch 107 and transmitted to the communication partner.
[0062]
When the degree of bidirectionality of the voice communication is detected at this time, for example, as described above, the voiced/silent determination is performed on the transmitted voice and the received voice using the input voice signal on the transmitting side and the decoded voice signal on the receiving side. Using this determination result, the voiced ratio of the transmitted voice, the voiced ratio of the received voice, the voiced section complementarity, and the voiced section alternation degree are calculated. Using these calculation results, the degree of bidirectionality of the voice communication is then determined (detected) based on (Equation 2) to (Equation 5).
[0063]
Then, the switch 107 switches its internal switch based on the encoding selection information output from the encoding selection unit 105. Of the encoded voice data output from the first encoding unit 101 and the encoded voice data output from the second encoding unit 103, it outputs the selected one as the encoded voice data to be transmitted. Note that the encoded voice data output from the switch 107 is transmitted to the communication partner together with the encoding selection information output from the encoding selection unit 105.
[0064]
On the other hand, when encoded voice data and encoding selection information are received from the communication partner, the internal switches of the switches 113 and 115 are switched based on the received encoding selection information. The first decoding unit 109 or the second decoding unit 111 then decodes the output (encoded voice data) of the switch 113 and outputs the resulting decoded voice signal via the switch 115 to a predetermined processing unit (not shown) and to the encoding selection unit 105 on the transmitting side.
[0065]
As described above, according to the present embodiment, the device has, as encoding means for encoding the voice signal to be transmitted, the first encoding unit 101, which uses a first speech encoding scheme with a small encoding delay, and the second encoding unit 103, which uses a second encoding scheme with a larger encoding delay and a lower encoding bit rate (or higher encoded voice quality) than the first speech encoding scheme. The degree of bidirectionality of the voice communication is detected using the input voice signal on the transmitting side and the decoded voice signal on the receiving side, and, based on the detected degree, either the first encoding means, which has a small encoding delay, or the second encoding means, which has a large encoding delay but a low encoding bit rate (or high encoded voice quality), is selected as appropriate. Because the encoding delay is thereby controlled appropriately, transmission efficiency and low delay in bidirectional communication can both be achieved.
[0066]
In the present embodiment, the two encoding units 101 and 103, which have different encoding delays, are switched, but the number of encoding units to be switched is not limited to this. Three or more encoding units may be provided and switched as appropriate according to the degree of bidirectionality of the voice communication.
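The text does not describe how three or more encoding units would be switched. As one conceivable illustration, the following Python sketch assumes a multi-level degree of bidirectionality in the range 0 to 1 and a set of hypothetical boundary values; the function name `select_by_degree`, the delay figures, and the boundaries are all assumptions and do not come from the embodiment.

```python
from bisect import bisect_right
from typing import Sequence


def select_by_degree(degree: float, delays_ms: Sequence[int],
                     boundaries: Sequence[float]) -> int:
    """Pick an encoder, identified by its encoding delay, from the degree.

    delays_ms is ordered from largest delay to smallest, and boundaries is
    ascending with one fewer entry, so a higher degree of bidirectionality
    maps to an encoder with a smaller delay.
    """
    if len(boundaries) != len(delays_ms) - 1:
        raise ValueError("need exactly one boundary between adjacent encoders")
    return delays_ms[bisect_right(boundaries, degree)]


# Hypothetical example: three encoders with delays of 60, 30, and 10 ms.
delays = [60, 30, 10]
bounds = [0.3, 0.7]
```

With these hypothetical values, a degree of 0.1 selects the 60 ms encoder, 0.5 selects the 30 ms encoder, and 0.9 selects the 10 ms encoder.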
[0067]
Further, the present embodiment may be configured to cause a computer to execute a control program for realizing the above functions.
[0068]
[Effect of the invention]
As described above, according to the present invention, it is possible to achieve both transmission efficiency and low delay in bidirectional communication.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of a speech coding apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a configuration of an encoding selection unit in FIG. 1;
FIG. 3 is a block diagram showing an example of a configuration of a conventional speech encoding device.
[Explanation of symbols]
101 First encoding unit
103 Second encoding unit
105 Encoding selection unit
107, 113, 115 Switch
109 First decoding unit
111 Second decoding unit
121 Transmitted-voice presence determination unit
123 Received-voice presence determination unit
125 Transmitted-voice voiced ratio calculation unit
127 Received-voice voiced ratio calculation unit
129 Voiced section complementarity calculation unit
131 Voiced section alternation degree calculation unit
133 Voice communication bidirectionality detection unit
135 Encoding selection determination unit

Claims

A speech encoding device comprising:
first encoding means for encoding a voice signal to be transmitted using a first speech encoding scheme;
second encoding means for encoding the voice signal to be transmitted using a second encoding scheme having a larger encoding delay and a lower encoding bit rate or higher encoded voice quality than the first speech encoding scheme;
decoding means for decoding received encoded voice data;
detection means for detecting a degree of bidirectionality of voice communication using the voice signal to be transmitted and the decoded voice signal;
selecting means for selecting one of the first encoding means and the second encoding means based on the detected degree of bidirectionality of the voice communication; and
output means for outputting a processing result of the selected encoding means.

The speech encoding device according to claim 1, wherein the selecting means selects the first encoding means when the degree of bidirectionality of the voice communication is high, and selects the second encoding means when the degree of bidirectionality of the voice communication is low.

The speech encoding device according to claim 1, wherein the detection means detects the degree of bidirectionality of the voice communication using transmitting-side information on the voiced sections determined for the voice signal to be transmitted and receiving-side information on the voiced sections determined for the decoded voice signal.

The speech encoding device according to claim 3, wherein the detection means detects the degree of bidirectionality of the voice communication using, as the transmitting-side information on the voiced sections determined for the voice signal to be transmitted and the receiving-side information on the voiced sections determined for the decoded voice signal, at least one of: the voiced ratios of the transmitting side and the receiving side; complementarity information on the voiced sections of the transmitting side and the voiced sections of the receiving side; and alternation information on the voiced sections of the transmitting side and the voiced sections of the receiving side.

The speech encoding device according to claim 1, wherein the decoding means decodes the received encoded voice data using a selected one of the first and second speech encoding schemes.

A portable terminal device comprising the speech encoding device according to claim 1.

A base station device comprising the speech encoding device according to claim 1.

A speech encoding method comprising:
a first encoding step of encoding a voice signal to be transmitted using a first speech encoding scheme;
a second encoding step of encoding the voice signal to be transmitted using a second encoding scheme having a larger encoding delay and a lower encoding bit rate or higher encoded voice quality than the first speech encoding scheme;
a decoding step of decoding received encoded voice data;
a detection step of detecting a degree of bidirectionality of voice communication using the voice signal to be transmitted and the decoded voice signal;
a selecting step of selecting one of the first encoding scheme and the second encoding scheme based on the degree of bidirectionality of the voice communication detected in the detection step; and
an output step of outputting, of the processing results of the first encoding step and the second encoding step, the processing result based on the encoding scheme selected in the selecting step.

A speech encoding program for causing a computer to execute:
a first encoding step of encoding a voice signal to be transmitted using a first speech encoding scheme;
a second encoding step of encoding the voice signal to be transmitted using a second encoding scheme having a larger encoding delay and a lower encoding bit rate or higher encoded voice quality than the first speech encoding scheme;
a decoding step of decoding received encoded voice data;
a detection step of detecting a degree of bidirectionality of voice communication using the voice signal to be transmitted and the decoded voice signal;
a selecting step of selecting one of the first encoding scheme and the second encoding scheme based on the degree of bidirectionality of the voice communication detected in the detection step; and
an output step of outputting, of the processing results of the first encoding step and the second encoding step, the processing result based on the encoding scheme selected in the selecting step.