JP3886815B2

JP3886815B2 - Speech coding apparatus, speech decoding apparatus, speech coding method, and speech decoding method

Info

Publication number: JP3886815B2
Application number: JP2002020502A
Authority: JP
Inventors: 正山浦
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2002-01-29
Filing date: 2002-01-29
Publication date: 2007-02-28
Anticipated expiration: 2022-01-29
Also published as: JP2003223177A

Description

【０００１】
【発明の属する技術分野】
この発明は、音声信号をディジタル信号に圧縮符号化する音声符号化装置及び音声符号化方法と、圧縮符号化されたディジタル信号を復号化する音声復号化装置及び音声復号化方法とに関するものである。
【０００２】
【従来の技術】
従来、高能率音声符号化方法として、所定時間のフレーム毎に入力音声をスペクトル情報と音源情報とに分離して符号化する音声符号化方法が広く用いられている。その代表的な方法としては、例えば、マルチパルス音声符号化方法やＣＥＬＰ（ＣｏｄｅＥｘｃｉｔｅｄＬｉｎｅａｒＰｒｅｄｉｃｔｉｏｎ）音声符号化方法がよく知られている。
【０００３】これらの音声符号化方法では、スペクトル情報の符号化が予め定められたフレーム単位で行われ、また、音源情報の符号化がフレーム長より短い間隔（サブフレーム）毎に行われる。音源情報を符号化する際にはスペクトル情報を利用するが、本来、スペクトル情報は時間的に滑らかに変化するものであるため、フレーム毎のスペクトル情報をサブフレーム毎に補間することが必要である。
【０００４】
このスペクトル情報を補間する方法としては、一般に先行フレームのスペクトル情報から現フレームのスペクトル情報までの変動を直線近似するものが利用されている。また、どのフレームにおいても単一の変動パターンで近似を施すのは適切でない場合もあるので、特開平４−２３２９９９号公報や特開平６−１１８９９６号公報では、複数の補間パターンから最適な補間パターンを選択する方法が開示されている。
【０００５】
図９は従来の音声符号化装置及び音声復号化装置を示す構成図であり、図において、１は入力音声Ｓ１のスペクトル情報を符号化するとともに、入力音声Ｓ１の音源情報を符号化する符号化部、２は符号化部１により符号化されたスペクトル情報の符号と音源情報の符号を多重化して伝送する多重化部、３は多重化されているスペクトル情報の符号と音源情報の符号を分離する分離部、４はスペクトル情報の符号と音源情報の符号を復号化して復号音声Ｓ３を生成する復号化部である。
【０００６】
１１は入力音声Ｓ１のスペクトル情報を符号化し、そのスペクトル情報の符号と量子化値を出力するスペクトル情報符号化部、１２は予め用意されている複数の補間方法のうち、任意の補間方法を一つ選択し、その補間方法を実行してスペクトル情報の量子化値を補間するとともに、その選択した補間方法を示す補間情報の符号を出力する補間部、１３は補間部１２から出力されたスペクトル情報の量子化値を用いて入力音声Ｓ１の音源情報を符号化し、その音源情報の符号を出力する音源情報符号化部である。
【０００７】
１４はスペクトル情報の符号を復号化し、そのスペクトル情報の量子化値を出力するスペクトル情報復号化部、１５は予め用意されている複数の補間方法のうち、補間情報の符号が示す補間方法を選択し、その補間方法を実行してスペクトル情報の量子化値を補間する補間部、１６は音源情報の符号を復号化し、その音源情報の量子化値を出力する音源情報復号化部、１７は補間部１５から出力されたスペクトル情報の量子化値と音源情報復号化部１６から出力された音源情報の量子化値を合成して復号音声Ｓ３を生成する合成部である。
【０００８】
次に動作について説明する。
まず、符号化部１のスペクトル情報符号化部１１は、入力音声Ｓ１を分析し、例えば、線形予測パラメータやケプストラムなどのスペクトル情報を抽出する。
そして、そのスペクトル情報をベクトル量子化などの既知の方法を実行して符号化し、得られたスペクトル情報の符号を多重化部２に出力する。また、スペクトル情報の量子化値（符号化結果）を補間部１２に出力する。
【０００９】
符号化部１の補間部１２は、予め用意されている複数の補間方法のうち、例えば、特開平４−２３２９９９号公報に開示されている方法を実行することにより、任意の補間方法を一つ選択する。
そして、その選択した補間方法を実行してスペクトル情報の量子化値を補間し、補間後のスペクトル情報の量子化値を音源情報符号化部１３に出力する。また、その選択した補間方法を示す補間情報の符号を多重化部２に出力する。
【００１０】
符号化部１の音源情報符号化部１３は、入力音声Ｓ１を分析して音源情報を抽出する。
そして、補間部１２から出力されたスペクトル情報の量子化値を用いて、入力音声Ｓ１の音源情報を符号化し、得られた音源情報の符号を多重化部２に出力する。
【００１１】
多重化部２は、符号化部１から出力されたスペクトル情報の符号と、補間情報の符号と、音源情報の符号とを多重化し、その多重化結果Ｓ２を分離部３に伝送する。
分離部３は、多重化結果Ｓ２を受けると、それらを分離する。そして、スペクトル情報の符号をスペクトル情報復号化部１４に出力し、補間情報の符号を補間部１５に出力し、音源情報の符号を音源情報復号化部１６に出力する。
【００１２】
復号化部４のスペクトル情報復号化部１４は、分離部３からスペクトル情報の符号を受けると、そのスペクトル情報の符号を復号化し、そのスペクトル情報の量子化値（復号化結果）を補間部１５に出力する。
復号化部４の補間部１５は、予め用意されている複数の補間方法のうち、補間情報の符号が示す補間方法を選択する。即ち、符号化部１の補間部１２により選択された補間方法と同じ補間方法を選択する。そして、その補間方法を実行してスペクトル情報の量子化値を補間し、補間後のスペクトル情報の量子化値を合成部１７に出力する。
【００１３】
復号化部４の音源情報復号化部１６は、分離部３から音源情報の符号を受けると、その音源情報の符号を復号化し、その音源情報の量子化値（復号化結果）を合成部１７に出力する。
復号化部４の合成部１７は、補間部１５から出力されたスペクトル情報の量子化値と音源情報復号化部１６から出力された音源情報の量子化値を合成して復号音声Ｓ３を生成する。
【００１４】
【発明が解決しようとする課題】
従来の音声符号化装置及び音声復号化装置は以上のように構成されているので、常に同一の補間方法を実行してスペクトル情報の量子化値を補間する場合よりも、高品質な復号音声を生成することができるが、補間情報の符号を伝送する必要があるため、低ビットレート化の妨げになる課題があった。
【００１５】
この発明は上記のような課題を解決するためになされたもので、伝送情報量の増加を招くことなく、品質の高い音声の再生を実現することができる音声符号化装置、音声復号化装置、音声符号化方法及び音声復号化方法を得ることを目的とする。
【００１６】
【課題を解決するための手段】
この発明に係る音声符号化装置は、スペクトル情報符号化手段から出力された量子化値に基づいて入力音声の前フレーム及び現フレームが有声であるか無声であるかを判定する様態判定手段を設け、その様態判定手段の判定結果に応じて補間方法を選択し、その補間方法を実行してスペクトル情報符号化手段から出力された量子化値をフレーム間で補間するようにしたものである。
【００１７】
この発明に係る音声符号化装置は、スペクトル情報符号化手段から出力されたスペクトル情報の符号に基づいて入力音声の前フレーム及び現フレームが有声であるか無声であるかを判定する様態判定手段を設け、その様態判定手段の判定結果に応じて補間方法を選択し、その補間方法を実行してスペクトル情報符号化手段から出力された量子化値をフレーム間で補間するようにしたものである。
【００１９】
この発明に係る音声符号化装置は、様態判定手段の判定結果が、入力音声の前フレームが無声であって、現フレームが有声である旨を示す場合、補間対象の一部のサブフレームの量子化値が現フレームの量子化値と一致するような補間方法を実行するようにしたものである。
【００２０】
この発明に係る音声符号化装置は、様態判定手段の判定結果が、入力音声の前フレームが有声であって、現フレームが無声である旨を示す場合、補間対象の一部のサブフレームの量子化値が前フレームの量子化値と一致するような補間方法を実行するようにしたものである。
【００２１】
この発明に係る音声復号化装置は、スペクトル情報復号化手段から出力された量子化値に基づいて復号音声の前フレーム及び現フレームが有声であるか無声であるかを判定する様態判定手段を設け、その様態判定手段の判定結果に応じて補間方法を選択し、その補間方法を実行してスペクトル情報復号化手段から出力された量子化値をフレーム間で補間するようにしたものである。
【００２２】
この発明に係る音声復号化装置は、スペクトル情報の符号に基づいて復号音声の前フレーム及び現フレームが有声であるか無声であるかを判定する様態判定手段を設け、その様態判定手段の判定結果に応じて補間方法を選択し、その補間方法を実行してスペクトル情報復号化手段から出力された量子化値をフレーム間で補間するようにしたものである。
【００２４】
この発明に係る音声復号化装置は、様態判定手段の判定結果が、復号音声の前フレームが無声であって、現フレームが有声である旨を示す場合、補間対象の一部のサブフレームの量子化値が現フレームの量子化値と一致するような補間方法を実行するようにしたものである。
【００２５】
この発明に係る音声復号化装置は、様態判定手段の判定結果が、復号音声の前フレームが有声であって、現フレームが無声である旨を示す場合、補間対象の一部のサブフレームの量子化値が前フレームの量子化値と一致するような補間方法を実行するようにしたものである。
【００２６】
この発明に係る音声符号化方法は、スペクトル情報の量子化値に基づいて入力音声の前フレーム及び現フレームが有声であるか無声であるかを判定し、その判定結果に応じて補間方法を選択し、その補間方法を実行してそのスペクトル情報の量子化値をフレーム間で補間するようにしたものである。
【００２７】
この発明に係る音声符号化方法は、スペクトル情報の符号に基づいて入力音声の前フレーム及び現フレームが有声であるか無声であるかを判定し、その判定結果に応じて補間方法を選択し、その補間方法を実行してそのスペクトル情報の量子化値をフレーム間で補間するようにしたものである。
【００２８】
この発明に係る音声復号化方法は、スペクトル情報の量子化値に基づいて復号音声の前フレーム及び現フレームが有声であるか無声であるかを判定し、その判定結果に応じて補間方法を選択し、その補間方法を実行してそのスペクトル情報の量子化値をフレーム間で補間するようにしたものである。
【００２９】
この発明に係る音声復号化方法は、スペクトル情報の符号に基づいて復号音声の前フレーム及び現フレームが有声であるか無声であるかを判定し、その判定結果に応じて補間方法を選択し、その補間方法を実行してそのスペクトル情報の量子化値をフレーム間で補間するようにしたものである。
【００３０】
【発明の実施の形態】
以下、この発明の実施の一形態を説明する。
実施の形態１．
図１はこの発明の実施の形態１による音声符号化装置及び音声復号化装置を示す構成図であり、図において、２１は入力音声Ｓ１のスペクトル情報を符号化するとともに、入力音声Ｓ１の音源情報を符号化する符号化部、２２は符号化部２１により符号化されたスペクトル情報の符号と音源情報の符号を多重化して伝送する多重化部、２３は多重化されているスペクトル情報の符号と音源情報の符号を分離する分離部、２４はスペクトル情報の符号と音源情報の符号を復号化して復号音声Ｓ３を生成する復号化部である。
【００３１】
３１は入力音声Ｓ１のスペクトル情報を符号化し、そのスペクトル情報の符号と量子化値を出力するスペクトル情報符号化部（スペクトル情報符号化手段）、３２はスペクトル情報符号化部３１から出力されたスペクトル情報の量子化値に基づいて入力音声Ｓ１の様態を判定する様態判定部（様態判定手段）、３３は予め用意されている複数の補間方法のうち、様態判定部３２の判定結果に応じた補間方法を実行して、スペクトル情報符号化部３１から出力されたスペクトル情報の量子化値をフレーム間で補間する補間部（補間手段）、３４は補間部３３から出力されたスペクトル情報の量子化値を用いて入力音声Ｓ１の音源情報を符号化し、その音源情報の符号を出力する音源情報符号化部（音源情報符号化手段）である。
【００３２】
３５はスペクトル情報の符号を復号化し、そのスペクトル情報の量子化値を出力するスペクトル情報復号化部（スペクトル情報復号化手段）、３６はスペクトル情報復号化部３５から出力されたスペクトル情報の量子化値に基づいて復号音声Ｓ３の様態を判定する様態判定部（様態判定手段）、３７は様態判定部３６の判定結果に応じた補間方法を実行して、スペクトル情報復号化部３５から出力されたスペクトル情報の量子化値をフレーム間で補間する補間部（補間手段）、３８は音源情報の符号を復号化し、その音源情報の量子化値を出力する音源情報復号化部（音源情報復号化手段）、３９は補間部３７から出力されたスペクトル情報の量子化値と音源情報復号化部３８から出力された音源情報の量子化値を合成して復号音声Ｓ３を生成する合成部（合成手段）である。
図２はこの発明の実施の形態１による音声符号化方法及び音声復号化方法を示すフローチャートである。
【００３３】
次に動作について説明する。
まず、符号化部２１のスペクトル情報符号化部３１は、入力音声Ｓ１を分析して、例えば、線形予測パラメータやケプストラムなどのスペクトル情報を抽出する。
そして、そのスペクトル情報をベクトル量子化などの既知の方法を実行して符号化し、得られたスペクトル情報の符号を多重化部２２に出力する（ステップＳＴ１）。また、スペクトル情報の量子化値（符号化結果）を様態判定部３２及び補間部３３に出力する。
【００３４】
符号化部２１の様態判定部３２は、スペクトル情報符号化部３１からスペクトル情報の量子化値を受けると、そのスペクトル情報の量子化値に基づいて入力音声Ｓ１の様態を判定する（ステップＳＴ２）。即ち、音声符号化や音声認識などの技術分野において既知の方法を実行することにより、スペクトル情報の量子化値から入力音声Ｓ１の各フレームの様態が有声であるか、または、無声であるかを判定する。
【００３５】
符号化部２１の補間部３３は、予め用意されている複数の補間方法（補間パターン）のうち、様態判定部３２の判定結果に応じた補間パターンを選択する（ステップＳＴ３）。
ここで、図３は補間パターン例を示す説明図であり、図において、（Ａ）はスペクトル情報をフレーム間で線形補間している補間パターン例である。また、（Ｂ）はサブフレームの量子化値が早期に現フレームの量子化値と一致するように補間している補間パターン例である。
【００３６】
補間部３３は、例えば、様態判定部３２の判定結果が、前フレームが無声であって、現フレームが有声である旨を示す場合には、補間パターン（Ｂ）を選択し、それ例外の場合には、補間パターン（Ａ）を選択するようにする。
このような補間パターンを用いることにより、図４に示すように、無声から有声への過渡部（有声区間である第２、第３サブフレーム）において、補間後のスペクトル情報が真のスペクトル情報に近いものとなるので、当該区間の復号音声の品質を高めることができる。
【００３７】
このような補間を行った場合、入力音声Ｓ１のフレーミングによっては、図５に示すように、無声区間である第２サブフレームにおいて、補間後のスペクトル情報が有声のスペクトル様態を示すものとなることもあるが、無声区間はパワーが小さいので、このような齟齬が発生しても復号音声の聴覚上の劣化は生じない。
【００３８】
なお、前フレームが無声であって、現フレームが有声である場合のスペクトル情報の補間パターンは、図３の補間パターン（Ｂ）に限るものではなく、図６に示す補間パターン（Ｂ’）や（Ｂ″）などの他の補間パターンを用いることも当然可能である。また、図３〜図６では１フレーム当り、４つのサブフレームを有する構成を示したが、１フレーム当りのサブフレーム数は４に限るものでなく、他の構成でも同様に補間パターンを与えることは当然可能である。
【００３９】
符号化部２１の補間部３３は、上記のようにして補間パターンを選択すると、その補間パターンにしたがってスペクトル情報符号化部３１から出力されたスペクトル情報の量子化値をフレーム間で補間する（ステップＳＴ４）。
符号化部２１の音源情報符号化部３４は、入力音声Ｓ１を分析して音源情報を抽出する。
そして、補間部３３から出力されたスペクトル情報の量子化値を用いて、入力音声Ｓ１の音源情報を符号化し、得られた音源情報の符号を多重化部２２に出力する（ステップＳＴ５）。
【００４０】
多重化部２２は、符号化部２１から出力されたスペクトル情報の符号と、音源情報の符号とを多重化し、その多重化結果Ｓ２を分離部２３に伝送する（ステップＳＴ６）。
分離部２３は、多重化結果Ｓ２を受けると、それらを分離する（ステップＳＴ７）。そして、スペクトル情報の符号をスペクトル情報復号化部３５に出力し、音源情報の符号を音源情報復号化部３８に出力する。
【００４１】
復号化部２４のスペクトル情報復号化部３５は、分離部２３からスペクトル情報の符号を受けると、そのスペクトル情報の符号を復号化し、そのスペクトル情報の量子化値（復号化結果）を様態判定部３６及び補間部３７に出力する（ステップＳＴ８）。
【００４２】
復号化部２４の様態判定部３６は、スペクトル情報復号化部３５からスペクトル情報の量子化値を受けると、符号化部２１の様態判定部３２と同様に、そのスペクトル情報の量子化値に基づいて復号音声Ｓ３の様態を判定する（ステップＳＴ９）。
【００４３】
復号化部２４の補間部３７は、符号化部２１の補間部３３と同様に、予め用意されている複数の補間方法のうち、様態判定部３６の判定結果に応じた補間パターンを選択する（ステップＳＴ１０）。
そして、その選択した補間パターンにしたがってスペクトル情報復号化部３５から出力されたスペクトル情報の量子化値をフレーム間で補間する（ステップＳＴ１１）。
【００４４】
復号化部２４の音源情報復号化部３８は、分離部２３から音源情報の符号を受けると、その音源情報の符号を復号化し、その音源情報の量子化値（復号化結果）を合成部３９に出力する（ステップＳＴ１２）。
復号化部２４の合成部３９は、補間部３７から出力されたスペクトル情報の量子化値と音源情報復号化部３８から出力された音源情報の量子化値を合成して復号音声Ｓ３を生成する（ステップＳＴ１３）。
【００４５】
以上で明らかなように、この実施の形態１によれば、符号化部２１においては、スペクトル情報符号化部３１から出力されたスペクトル情報の量子化値に基づいて入力音声Ｓ１の様態を判定する様態判定部３２を設け、その様態判定部３２の判定結果に応じた補間方法を実行して、スペクトル情報符号化部３１から出力されたスペクトル情報の量子化値をフレーム間で補間するように構成し、復号化部２４においては、スペクトル情報復号化部３５から出力されたスペクトル情報の量子化値に基づいて復号音声Ｓ３の様態を判定する様態判定部３６を設け、その様態判定部３６の判定結果に応じた補間方法を実行して、スペクトル情報復号化部３５から出力されたスペクトル情報の量子化値をフレーム間で補間するように構成したので、従来例のように、補間情報の符号を伝送する必要がなくなり、その結果、伝送情報量の増加を招くことなく、品質の高い音声の再生を実現することができる効果を奏する。
【００４６】
また、この実施の形態１では、入力音声Ｓ１の様態判定結果が所定条件を満たす場合には、サブフレームの量子化値が早期に現フレームの量子化値と一致するように補間するので、フレーム内で入力音声Ｓ１の様態が急変する場合にも、適正なスペクトル情報の補間を実施して、品質の高い復号音声を生成することができる。
【００４７】
実施の形態２．
図７はこの発明の実施の形態２による音声符号化装置及び音声復号化装置を示す構成図であり、図において、図１と同一符号は同一または相当部分を示すので説明を省略する。
４１はスペクトル情報符号化部３１から出力されたスペクトル情報の符号に基づいて入力音声Ｓ１の様態を判定する様態判定部（様態判定手段）、４２はスペクトル情報の符号に基づいて復号音声Ｓ３の様態を判定する様態判定部（様態判定手段）である。
【００４８】
上記実施の形態１では、様態判定部３２がスペクトル情報符号化部３１から出力されたスペクトル情報の量子化値に基づいて入力音声Ｓ１の様態を判定し、様態判定部３６がスペクトル情報復号化部３５から出力されたスペクトル情報の量子化値に基づいて復号音声Ｓ３の様態を判定するものについて示したが、これに限るものではなく、様態判定部４１がスペクトル情報符号化部３１から出力されたスペクトル情報の符号に基づいて入力音声Ｓ１の様態を判定し、様態判定部４２がスペクトル情報の符号に基づいて復号音声Ｓ３の様態を判定するようにしてもよく、上記実施の形態１と同様の効果を奏することができる。
また、スペクトル情報の符号と量子化値（符号化結果または復号化結果）の両方を用いて音声の様態を判定するようにしてもよい。
【００４９】
実施の形態３．
上記実施の形態１，２では、前フレームが無声であって、現フレームが有声である場合には、サブフレームの量子化値が早期に現フレームの量子化値と一致するように補間し、それ以外の場合には、線形に補間するものについて示したが、前フレームが有声であって、現フレームが無声である場合には、図８（Ｃ）に示すように、サブフレームの量子化値が遅くまで前フレームの量子化値と一致するように補間してもよい。
この実施の形態３によれば、有声から無声への過渡部において、補間後のスペクトル情報が真のスペクトル情報に近いものとなるので、品質の高い復号音声を生成することができる。
【００５０】
実施の形態４．
上記実施の形態１〜３では、音声の様態を有声と無声の２値で判定するものについて示したが、これに限るものではなく、例えば、有声の度合を多値判定し、その判定結果に応じて異なる補間パターンを用いるようにしてもよい。
また、有声と無声だけではなく、無声、有声定常、有声過渡など、他の多くの音声様態に分類して判定し、その様態の判定結果に応じて異なる補間パターンを用いるようにしてもよい。
【００５１】
さらに、上記実施の形態１〜３では、人が発見的に与えられるような単純な補間パターンを示しているが、これに限るものではなく、例えば、大量の発話音声からなるデータベースを用いて統計的な手法により、各音声様態における最適な補間パターンを学習して獲得するようにしてもよい。
この実施の形態４によれば、それぞれの音声様態に適したスペクトル状態の補間を行うなどの細かい制御が可能となるので、品質の高い復号音声を生成することができる。
【００５２】
実施の形態５．
上記実施の形態１〜４では、スペクトル情報のみを用いて音声様態を判定するものについて示したが、これに限るものではなく、例えば、現フレームはスペクトル情報のみから音声様態を判定し、過去のフレームにおける様態判定は過去のフレームの音源情報も用いて再度判定するようにしてもよい。
この実施の形態５によれば、過去のフレームの様態判定精度を向上することが可能となるので、品質の高い復号音声を生成することができる。
【００５３】
【発明の効果】
以上のように、この発明によれば、スペクトル情報符号化手段から出力された量子化値に基づいて入力音声の前フレーム及び現フレームが有声であるか無声であるかを判定する様態判定手段を設け、その様態判定手段の判定結果に応じて補間方法を選択し、その補間方法を実行してスペクトル情報符号化手段から出力された量子化値をフレーム間で補間するように構成したので、伝送情報量の増加を招くことなく、品質の高い音声の再生を実現することができる効果がある。
【００５４】
この発明によれば、スペクトル情報符号化手段から出力されたスペクトル情報の符号に基づいて入力音声の前フレーム及び現フレームが有声であるか無声であるかを判定する様態判定手段を設け、その様態判定手段の判定結果に応じて補間方法を選択し、その補間方法を実行してスペクトル情報符号化手段から出力された量子化値をフレーム間で補間するように構成したので、伝送情報量の増加を招くことなく、品質の高い音声の再生を実現することができる効果がある。
【００５６】
この発明によれば、様態判定手段の判定結果が、入力音声の前フレームが無声であって、現フレームが有声である旨を示す場合、補間対象の一部のサブフレームの量子化値が現フレームの量子化値と一致するような補間方法を実行するように構成したので、無声から有声への過渡部においても、品質の高い音声の再生を実現することができる効果がある。
【００５７】
この発明によれば、様態判定手段の判定結果が、入力音声の前フレームが有声であって、現フレームが無声である旨を示す場合、補間対象の一部のサブフレームの量子化値が前フレームの量子化値と一致するような補間方法を実行するように構成したので、有声から無声への過渡部においても、品質の高い音声の再生を実現することができる効果がある。
【００５８】
この発明によれば、スペクトル情報復号化手段から出力された量子化値に基づいて復号音声の前フレーム及び現フレームが有声であるか無声であるかを判定する様態判定手段を設け、その様態判定手段の判定結果に応じて補間方法を選択し、その補間方法を実行してスペクトル情報復号化手段から出力された量子化値をフレーム間で補間するように構成したので、伝送情報量の増加を招くことなく、品質の高い音声の再生を実現することができる効果がある。
【００５９】
この発明によれば、スペクトル情報の符号に基づいて復号音声の前フレーム及び現フレームが有声であるか無声であるかを判定する様態判定手段を設け、その様態判定手段の判定結果に応じて補間方法を選択し、その補間方法を実行してスペクトル情報復号化手段から出力された量子化値をフレーム間で補間するように構成したので、伝送情報量の増加を招くことなく、品質の高い音声の再生を実現することができる効果がある。
【００６１】
この発明によれば、様態判定手段の判定結果が、復号音声の前フレームが無声であって、現フレームが有声である旨を示す場合、補間対象の一部のサブフレームの量子化値が現フレームの量子化値と一致するような補間方法を実行するように構成したので、無声から有声への過渡部においても、品質の高い音声の再生を実現することができる効果がある。
【００６２】
この発明によれば、様態判定手段の判定結果が、復号音声の前フレームが有声であって、現フレームが無声である旨を示す場合、補間対象の一部のサブフレームの量子化値が前フレームの量子化値と一致するような補間方法を実行するように構成したので、有声から無声への過渡部においても、品質の高い音声の再生を実現することができる効果がある。
【００６３】
この発明によれば、スペクトル情報の量子化値に基づいて入力音声の前フレーム及び現フレームが有声であるか無声であるかを判定し、その判定結果に応じて補間方法を選択し、その補間方法を実行してそのスペクトル情報の量子化値をフレーム間で補間するように構成したので、伝送情報量の増加を招くことなく、品質の高い音声の再生を実現することができる効果がある。
【００６４】
この発明によれば、スペクトル情報の符号に基づいて入力音声の前フレーム及び現フレームが有声であるか無声であるかを判定し、その判定結果に応じて補間方法を選択し、その補間方法を実行してそのスペクトル情報の量子化値をフレーム間で補間するように構成したので、伝送情報量の増加を招くことなく、品質の高い音声の再生を実現することができる効果がある。
【００６５】
この発明によれば、スペクトル情報の量子化値に基づいて復号音声の前フレーム及び現フレームが有声であるか無声であるかを判定し、その判定結果に応じて補間方法を選択し、その補間方法を実行してそのスペクトル情報の量子化値をフレーム間で補間するように構成したので、伝送情報量の増加を招くことなく、品質の高い音声の再生を実現することができる効果がある。
【００６６】
この発明によれば、スペクトル情報の符号に基づいて復号音声の前フレーム及び現フレームが有声であるか無声であるかを判定し、その判定結果に応じて補間方法を選択し、その補間方法を実行してそのスペクトル情報の量子化値をフレーム間で補間するように構成したので、伝送情報量の増加を招くことなく、品質の高い音声の再生を実現することができる効果がある。
【図面の簡単な説明】
【図１】この発明の実施の形態１による音声符号化装置及び音声復号化装置を示す構成図である。
【図２】この発明の実施の形態１による音声符号化方法及び音声復号化方法を示すフローチャートである。
【図３】補間パターン例を示す説明図である。
【図４】スペクトル情報と音声の関係を示す説明図である。
【図５】スペクトル情報と音声の関係を示す説明図である。
【図６】補間パターン例を示す説明図である。
【図７】この発明の実施の形態２による音声符号化装置及び音声復号化装置を示す構成図である。
【図８】補間パターン例を示す説明図である。
【図９】従来の音声符号化装置及び音声復号化装置を示す構成図である。
【符号の説明】
１符号化部、２多重化部、３分離部、４復号化部、１１スペクトル情報符号化部、１２補間部、１３音源情報符号化部、１４スペクトル情報復号化部、１５補間部、１６音源情報復号化部、１７合成部、２１符号化部、２２多重化部、２３分離部、２４復号化部、３１スペクトル情報符号化部（スペクトル情報符号化手段）、３２様態判定部（様態判定手段）、３３補間部（補間手段）、３４音源情報符号化部（音源情報符号化手段）、３５スペクトル情報復号化部（スペクトル情報復号化手段）、３６様態判定部（様態判定手段）、３７補間部（補間手段）、３８音源情報復号化部（音源情報復号化手段）、３９合成部（合成手段）、４１様態判定部（様態判定手段）、４２様態判定部（様態判定手段）。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a speech coding apparatus and speech coding method for compressing and coding speech signals into digital signals, and to a speech decoding apparatus and speech decoding method for decoding the compression-coded digital signals.
[0002]
[Prior art]
Conventionally, speech coding methods that separate the input speech into spectrum information and excitation (sound source) information for each frame of a predetermined duration and encode them have been widely used as high-efficiency speech coding methods. Typical examples are the multipulse speech coding method and the CELP (Code Excited Linear Prediction) speech coding method.
[0003]
In these speech coding methods, spectrum information is encoded in predetermined frame units, while excitation information is encoded at intervals shorter than the frame length (subframes). Spectrum information is used when encoding the excitation information; because spectrum information inherently changes smoothly over time, the per-frame spectrum information must be interpolated for each subframe.
[0004]
As a method for interpolating the spectrum information, a linear approximation of the change from the spectrum information of the preceding frame to that of the current frame is generally used. Because approximating every frame with a single variation pattern is not always appropriate, Japanese Patent Laid-Open Nos. 4-232999 and 6-118996 disclose methods that select an optimal interpolation pattern from a plurality of interpolation patterns.
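The linear approximation described above can be illustrated with a minimal sketch. It assumes that the quantized spectrum information of each frame is a plain list of numbers in a domain where element-wise interpolation is meaningful (line spectral pairs are a common choice), and that the last subframe takes the current frame's value. The function name and the default of four subframes are illustrative assumptions, not taken from the cited publications.

```python
def interpolate_linear(prev_q, cur_q, num_subframes=4):
    """Return one interpolated spectrum vector per subframe.

    prev_q, cur_q: quantized spectrum vectors of the previous and current frame.
    Subframe s (1-based) weights the current frame by s / num_subframes.
    """
    subframes = []
    for s in range(1, num_subframes + 1):
        w = s / num_subframes  # weight of the current frame
        subframes.append([(1.0 - w) * p + w * c for p, c in zip(prev_q, cur_q)])
    return subframes


if __name__ == "__main__":
    for vec in interpolate_linear([0.0, 1.0], [1.0, 3.0]):
        print(vec)
```

With a single fixed rule of this kind, the decoder needs no additional information to reproduce the interpolation.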
[0005]
FIG. 9 is a block diagram showing a conventional speech coding apparatus and speech decoding apparatus. In the figure, 1 is an encoding unit that encodes the spectrum information and the excitation information of the input speech S1; 2 is a multiplexing unit that multiplexes and transmits the spectrum information code and the excitation information code produced by the encoding unit 1; 3 is a separation unit that separates the multiplexed spectrum information code and excitation information code; and 4 is a decoding unit that decodes the spectrum information code and the excitation information code to generate decoded speech S3.
[0006]
11 is a spectrum information encoding unit that encodes the spectrum information of the input speech S1 and outputs the code and the quantized value of the spectrum information; 12 is an interpolation unit that selects one interpolation method from among a plurality of interpolation methods prepared in advance, executes it to interpolate the quantized value of the spectrum information, and outputs a code of interpolation information indicating the selected method; and 13 is an excitation information encoding unit that encodes the excitation information of the input speech S1 using the quantized value of the spectrum information output from the interpolation unit 12 and outputs the excitation information code.
[0007]
14 is a spectrum information decoding unit that decodes the spectrum information code and outputs the quantized value of the spectrum information; 15 is an interpolation unit that selects, from among a plurality of interpolation methods prepared in advance, the interpolation method indicated by the interpolation information code and executes it to interpolate the quantized value of the spectrum information; 16 is an excitation information decoding unit that decodes the excitation information code and outputs the quantized value of the excitation information; and 17 is a synthesis unit that generates decoded speech S3 by synthesizing the quantized value of the spectrum information output from the interpolation unit 15 and the quantized value of the excitation information output from the excitation information decoding unit 16.
[0008]
Next, the operation will be described.
First, the spectrum information encoding unit 11 of the encoding unit 1 analyzes the input speech S1 and extracts, for example, spectrum information such as linear prediction parameters and cepstrum.
Then, the spectrum information is encoded by executing a known method such as vector quantization, and the obtained spectrum information code is output to the multiplexing unit 2. Further, the quantized value (encoding result) of the spectrum information is output to the interpolation unit 12.
[0009]
The interpolation unit 12 of the encoding unit 1 selects one interpolation method from among a plurality of interpolation methods prepared in advance, for example by executing the method disclosed in JP-A-4-232999.
Then, the selected interpolation method is executed to interpolate the quantized value of the spectrum information, and the interpolated quantized value of the spectrum information is output to the excitation information encoding unit 13. In addition, a code of interpolation information indicating the selected interpolation method is output to the multiplexing unit 2.
[0010]
The excitation information encoding unit 13 of the encoding unit 1 analyzes the input speech S1 and extracts excitation information.
Then, using the quantized value of the spectrum information output from the interpolation unit 12, it encodes the excitation information of the input speech S1 and outputs the obtained excitation information code to the multiplexing unit 2.
[0011]
The multiplexing unit 2 multiplexes the spectrum information code, the interpolation information code, and the excitation information code output from the encoding unit 1, and transmits the multiplexing result S2 to the separation unit 3.
Upon receiving the multiplexing result S2, the separation unit 3 separates the multiplexed codes. It outputs the spectrum information code to the spectrum information decoding unit 14, the interpolation information code to the interpolation unit 15, and the excitation information code to the excitation information decoding unit 16.
[0012]
When the spectrum information decoding unit 14 of the decoding unit 4 receives the spectrum information code from the separation unit 3, it decodes the code and outputs the quantized value (decoding result) of the spectrum information to the interpolation unit 15.
The interpolation unit 15 of the decoding unit 4 selects the interpolation method indicated by the interpolation information code from among a plurality of interpolation methods prepared in advance. That is, it selects the same interpolation method as the one selected by the interpolation unit 12 of the encoding unit 1. It then executes that interpolation method to interpolate the quantized value of the spectrum information, and outputs the interpolated quantized value of the spectrum information to the synthesis unit 17.
[0013]
When the excitation information decoding unit 16 of the decoding unit 4 receives the excitation information code from the separation unit 3, it decodes the code and outputs the quantized value (decoding result) of the excitation information to the synthesis unit 17.
The synthesis unit 17 of the decoding unit 4 synthesizes the quantized value of the spectrum information output from the interpolation unit 15 and the quantized value of the excitation information output from the excitation information decoding unit 16 to generate decoded speech S3.
[0014]
[Problems to be solved by the invention]
Because the conventional speech coding apparatus and speech decoding apparatus are configured as described above, they can generate higher-quality decoded speech than when the same interpolation method is always executed to interpolate the quantized values of the spectrum information. However, the code of the interpolation information must be transmitted, which hinders reduction of the bit rate.
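The size of this overhead can be estimated with simple arithmetic. The sketch below assumes that the interpolation information is sent once per frame as a fixed-length index. The pattern count and frame length in the example are illustrative values only; the source does not state them.

```python
def interpolation_side_info_bps(num_patterns, frame_ms):
    """Bits per second spent on a fixed-length interpolation index.

    num_patterns: number of selectable interpolation patterns (assumed value).
    frame_ms: frame length in milliseconds (assumed value).
    """
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    bits_per_frame = (num_patterns - 1).bit_length()
    frames_per_second = 1000.0 / frame_ms
    return bits_per_frame * frames_per_second


if __name__ == "__main__":
    # Hypothetical example: 4 patterns, 20 ms frames.
    print(interpolation_side_info_bps(4, 20))
```

Under these assumed values the index costs 2 bits per frame at 50 frames per second, which is a noticeable share of the budget in a low-bit-rate codec.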
[0015]
The present invention has been made to solve the above problem, and its object is to obtain a speech coding apparatus, a speech decoding apparatus, a speech coding method, and a speech decoding method that can reproduce high-quality speech without increasing the amount of transmitted information.
[0016]
[Means for Solving the Problems]
The speech coding apparatus according to the present invention is provided with mode determination means that determines, based on the quantized value output from the spectrum information encoding means, whether the previous frame and the current frame of the input speech are voiced or unvoiced. An interpolation method is selected according to the determination result of the mode determination means, and that interpolation method is executed to interpolate the quantized value output from the spectrum information encoding means between frames.
[0017]
The speech coding apparatus according to the present invention is provided with mode determination means that determines, based on the spectrum information code output from the spectrum information encoding means, whether the previous frame and the current frame of the input speech are voiced or unvoiced. An interpolation method is selected according to the determination result of the mode determination means, and that interpolation method is executed to interpolate the quantized value output from the spectrum information encoding means between frames.
[0019]
In the speech coding apparatus according to the present invention, when the determination result of the mode determination means indicates that the previous frame of the input speech is unvoiced and the current frame is voiced, an interpolation method is executed in which the quantized values of some of the subframes to be interpolated match the quantized value of the current frame.
[0020]
In the speech coding apparatus according to the present invention, when the determination result of the mode determination means indicates that the previous frame of the input speech is voiced and the current frame is unvoiced, an interpolation method is executed in which the quantized values of some of the subframes to be interpolated match the quantized value of the previous frame.
[0021]
The speech decoding apparatus according to the present invention is provided with mode determination means that determines, based on the quantized value output from the spectrum information decoding means, whether the previous frame and the current frame of the decoded speech are voiced or unvoiced. An interpolation method is selected according to the determination result of the mode determination means, and that interpolation method is executed to interpolate the quantized value output from the spectrum information decoding means between frames.
[0022]
The speech decoding apparatus according to the present invention is provided with mode determination means that determines, based on the spectrum information code, whether the previous frame and the current frame of the decoded speech are voiced or unvoiced. An interpolation method is selected according to the determination result of the mode determination means, and that interpolation method is executed to interpolate the quantized value output from the spectrum information decoding means between frames.
[0024]
In the speech decoding apparatus according to the present invention, when the determination result of the mode determination means indicates that the previous frame of the decoded speech is unvoiced and the current frame is voiced, an interpolation method is executed in which the quantized values of some of the subframes to be interpolated match the quantized value of the current frame.
[0025]
In the speech decoding apparatus according to the present invention, when the determination result of the mode determination means indicates that the previous frame of the decoded speech is voiced and the current frame is unvoiced, an interpolation method is executed in which the quantized values of some of the subframes to be interpolated match the quantized value of the previous frame.
[0026]
The speech coding method according to the present invention determines, based on the quantized value of the spectrum information, whether the previous frame and the current frame of the input speech are voiced or unvoiced, selects an interpolation method according to the determination result, and executes that interpolation method to interpolate the quantized value of the spectrum information between frames.
[0027]
The speech coding method according to the present invention determines, based on the spectrum information code, whether the previous frame and the current frame of the input speech are voiced or unvoiced, selects an interpolation method according to the determination result, and executes that interpolation method to interpolate the quantized value of the spectrum information between frames.
[0028]
The speech decoding method according to the present invention determines, based on the quantized value of the spectrum information, whether the previous frame and the current frame of the decoded speech are voiced or unvoiced, selects an interpolation method according to the determination result, and executes that interpolation method to interpolate the quantized value of the spectrum information between frames.
[0029]
The speech decoding method according to the present invention determines, based on the spectrum information code, whether the previous frame and the current frame of the decoded speech are voiced or unvoiced, selects an interpolation method according to the determination result, and executes that interpolation method to interpolate the quantized value of the spectrum information between frames.
[0030]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment of the present invention will be described below.
Embodiment 1.
FIG. 1 is a block diagram showing a speech coding apparatus and speech decoding apparatus according to Embodiment 1 of the present invention. In the figure, 21 is an encoding unit that encodes the spectrum information and the excitation information of the input speech S1; 22 is a multiplexing unit that multiplexes and transmits the spectrum information code and the excitation information code produced by the encoding unit 21; 23 is a separation unit that separates the multiplexed spectrum information code and excitation information code; and 24 is a decoding unit that decodes the spectrum information code and the excitation information code to generate decoded speech S3.
[0031]
31 is a spectrum information encoding unit (spectrum information encoding means) that encodes the spectrum information of the input speech S1 and outputs the code and the quantized value of the spectrum information; 32 is a mode determination unit (mode determination means) that determines the mode of the input speech S1 based on the quantized value of the spectrum information output from the spectrum information encoding unit 31; 33 is an interpolation unit (interpolation means) that executes, from among a plurality of interpolation methods prepared in advance, the interpolation method corresponding to the determination result of the mode determination unit 32, thereby interpolating the quantized value of the spectrum information output from the spectrum information encoding unit 31 between frames; and 34 is an excitation information encoding unit (excitation information encoding means) that encodes the excitation information of the input speech S1 using the quantized value of the spectrum information output from the interpolation unit 33 and outputs the excitation information code.
[0032]
35 is a spectrum information decoding unit (spectrum information decoding means) that decodes the spectrum information code and outputs the quantized value of the spectrum information; 36 is a mode determination unit (mode determination means) that determines the mode of the decoded speech S3 based on the quantized value of the spectrum information output from the spectrum information decoding unit 35; 37 is an interpolation unit (interpolation means) that executes the interpolation method corresponding to the determination result of the mode determination unit 36, thereby interpolating the quantized value of the spectrum information output from the spectrum information decoding unit 35 between frames; 38 is an excitation information decoding unit (excitation information decoding means) that decodes the excitation information code and outputs the quantized value of the excitation information; and 39 is a synthesis unit (synthesis means) that generates decoded speech S3 by synthesizing the quantized value of the spectrum information output from the interpolation unit 37 and the quantized value of the excitation information output from the excitation information decoding unit 38.
FIG. 2 is a flowchart showing a speech encoding method and speech decoding method according to Embodiment 1 of the present invention.
[0033]
Next, the operation will be described.
First, the spectrum information encoding unit 31 of the encoding unit 21 analyzes the input speech S1 and extracts, for example, spectrum information such as linear prediction parameters and cepstrum.
Then, the spectrum information is encoded by executing a known method such as vector quantization, and the obtained spectrum information code is output to the multiplexing unit 22 (step ST1). Further, the quantized value (encoding result) of the spectrum information is output to the mode determination unit 32 and the interpolation unit 33.
[0034]
Upon receiving the quantized value of the spectrum information from the spectrum information encoding unit 31, the mode determination unit 32 of the encoding unit 21 determines the mode of the input speech S1 based on that quantized value (step ST2). That is, by executing a method known in technical fields such as speech coding and speech recognition, it determines from the quantized value of the spectrum information whether each frame of the input speech S1 is voiced or unvoiced.
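The source leaves the determination method open ("a known method"). Purely as an illustration, the sketch below uses one simple heuristic: voiced speech tends to concentrate energy at low frequencies, so the tilt of the all-pole envelope is compared against a threshold. It assumes that the quantized spectrum information is available as linear prediction coefficients a[0..p-1] of A(z) = 1 + a[0]z^-1 + ... + a[p-1]z^-p. The function names, the 6 dB threshold, and the number of frequency points are hypothetical choices, not part of the described apparatus.

```python
import cmath
import math


def lpc_envelope_power(a, omega):
    """Power of the all-pole envelope 1 / |A(e^{j*omega})|^2."""
    A = 1.0 + sum(ak * cmath.exp(-1j * omega * (k + 1)) for k, ak in enumerate(a))
    return 1.0 / max(abs(A) ** 2, 1e-12)  # guard against division by zero


def is_voiced(a, threshold_db=6.0, points=32):
    """Classify a frame as voiced when low-band envelope energy dominates.

    threshold_db and points are illustrative assumptions.
    """
    half = points // 2
    omegas = [math.pi * (i + 0.5) / points for i in range(points)]
    low = sum(lpc_envelope_power(a, w) for w in omegas[:half]) / half
    high = sum(lpc_envelope_power(a, w) for w in omegas[half:]) / (points - half)
    tilt_db = 10.0 * math.log10(low / high)
    return tilt_db > threshold_db


if __name__ == "__main__":
    print(is_voiced([-0.9]))  # low-pass shaped envelope
    print(is_voiced([0.9]))   # high-pass shaped envelope
```

Because the decision depends only on the quantized spectrum information, a decoder holding the same quantized values can repeat the same computation.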
[0035]
The interpolation unit 33 of the encoding unit 21 selects, from among a plurality of interpolation methods (interpolation patterns) prepared in advance, the interpolation pattern corresponding to the determination result of the mode determination unit 32 (step ST3).
Here, FIG. 3 is an explanatory diagram showing an example of an interpolation pattern. In FIG. 3, (A) is an example of an interpolation pattern in which spectral information is linearly interpolated between frames. (B) is an example of an interpolation pattern in which interpolation is performed so that the quantized value of the subframe matches the quantized value of the current frame at an early stage.
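The two patterns can be pictured as tables of per-subframe weights applied to the current frame's quantized value. The sketch below assumes four subframes per frame. The weights for pattern (A) follow from linear interpolation, whereas the weights shown for pattern (B) are a hypothetical example of "matching the current frame early" and are not read from FIG. 3.

```python
# Weight of the current frame for each of four subframes.
PATTERN_A = [0.25, 0.5, 0.75, 1.0]  # linear interpolation
PATTERN_B = [0.5, 1.0, 1.0, 1.0]    # hypothetical: reaches the current frame early


def apply_pattern(prev_q, cur_q, weights):
    """Interpolate between two quantized spectrum vectors, one result per subframe."""
    return [
        [(1.0 - w) * p + w * c for p, c in zip(prev_q, cur_q)]
        for w in weights
    ]


if __name__ == "__main__":
    print(apply_pattern([0.0], [2.0], PATTERN_A))
    print(apply_pattern([0.0], [2.0], PATTERN_B))
```

Representing a pattern as a weight table keeps the interpolation routine identical for every pattern; only the table changes.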
[0036]
For example, when the determination result of the mode determination unit 32 indicates that the previous frame is unvoiced and the current frame is voiced, the interpolation unit 33 selects interpolation pattern (B). In all other cases, it selects interpolation pattern (A).
With these interpolation patterns, as shown in FIG. 4, the interpolated spectrum information is close to the true spectrum information in the transitional part from unvoiced to voiced (the second and third subframes, which are voiced sections), so the quality of the decoded speech in that section can be improved.
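The selection rule and the per-subframe interpolation described above can be sketched as follows. The weight values are illustrative assumptions, since FIG. 3 is not reproduced in the text; only the shape of the patterns (linear for (A), reaching the current frame's value early for (B)) follows the description. The names `PATTERN_A`, `PATTERN_B`, `select_pattern` and `interpolate` are hypothetical.

```python
from typing import List, Sequence

# Weight of the current frame's quantized value in each of four subframes.
# Assumed values for illustration; the patent's figures define the real ones.
PATTERN_A = (0.25, 0.5, 0.75, 1.0)  # linear interpolation between frames
PATTERN_B = (0.5, 1.0, 1.0, 1.0)    # matches the current frame early


def select_pattern(prev_voiced: bool, cur_voiced: bool) -> Sequence[float]:
    """Pattern (B) for an unvoiced-to-voiced transition, pattern (A) otherwise."""
    if not prev_voiced and cur_voiced:
        return PATTERN_B
    return PATTERN_A


def interpolate(prev_q: Sequence[float], cur_q: Sequence[float],
                weights: Sequence[float]) -> List[List[float]]:
    """Return one interpolated spectral vector per subframe."""
    return [[(1.0 - w) * p + w * c for p, c in zip(prev_q, cur_q)]
            for w in weights]
```

In this sketch, `interpolate([0.0, 2.0], [4.0, 6.0], select_pattern(False, True))` already equals the current frame's vector from the second subframe onward, which is the behaviour attributed to pattern (B).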
[0037]
When such interpolation is performed, depending on how the input speech S1 falls within the frames, the interpolated spectrum information may take on the character of a voiced spectrum in the second subframe even though it is an unvoiced section, as shown in FIG. 5. However, because the power in an unvoiced section is small, such a mismatch does not cause audible degradation of the decoded speech.
[0038]
Note that the interpolation pattern used when the previous frame is unvoiced and the current frame is voiced is not limited to interpolation pattern (B) in FIG. 3. Other interpolation patterns, such as (B′) and (B″) shown in FIG. 6, may of course be used. FIGS. 3 to 6 show a configuration with four subframes per frame, but the number of subframes per frame is not limited to four, and interpolation patterns can naturally be provided for other configurations as well.
[0039]
After selecting the interpolation pattern as described above, the interpolation unit 33 of the encoding unit 21 interpolates, between frames, the quantized value of the spectrum information output from the spectrum information encoding unit 31 according to that interpolation pattern (step ST4).
The sound source information encoding unit 34 of the encoding unit 21 analyzes the input speech S1 and extracts sound source information.
Then, using the quantized value of the spectrum information output from the interpolation unit 33, the sound source information of the input speech S1 is encoded, and the obtained code of the sound source information is output to the multiplexing unit 22 (step ST5).
[0040]
The multiplexing unit 22 multiplexes the code of the spectrum information and the code of the sound source information output from the encoding unit 21, and transmits the multiplexing result S2 to the separation unit 23 (step ST6).
On receiving the multiplexing result S2, the separation unit 23 separates the two codes (step ST7). The code of the spectrum information is output to the spectrum information decoding unit 35, and the code of the sound source information is output to the sound source information decoding unit 38.
[0041]
On receiving the code of the spectrum information from the separation unit 23, the spectrum information decoding unit 35 of the decoding unit 24 decodes the code and outputs the quantized value (decoding result) of the spectrum information to the mode determination unit 36 and the interpolation unit 37 (step ST8).
[0042]
On receiving the quantized value of the spectrum information from the spectrum information decoding unit 35, the mode determination unit 36 of the decoding unit 24 determines the mode of the decoded speech S3 based on that quantized value, in the same way as the mode determination unit 32 of the encoding unit 21 (step ST9).
[0043]
Like the interpolation unit 33 of the encoding unit 21, the interpolation unit 37 of the decoding unit 24 selects, from among a plurality of interpolation methods prepared in advance, the interpolation pattern corresponding to the determination result of the mode determination unit 36 (step ST10).
Then, the quantized value of the spectrum information output from the spectrum information decoding unit 35 is interpolated between frames according to the selected interpolation pattern (step ST11).
[0044]
On receiving the code of the sound source information from the separation unit 23, the sound source information decoding unit 38 of the decoding unit 24 decodes the code and outputs the quantized value (decoding result) of the sound source information to the synthesis unit 39 (step ST12).
The synthesis unit 39 of the decoding unit 24 synthesizes the quantized value of the spectrum information output from the interpolation unit 37 and the quantized value of the sound source information output from the sound source information decoding unit 38 to generate the decoded speech S3 (step ST13).
[0045]
As is apparent from the above, according to Embodiment 1, the encoding unit 21 is provided with the mode determination unit 32, which determines the mode of the input speech S1 based on the quantized value of the spectrum information output from the spectrum information encoding unit 31, and an interpolation method corresponding to the determination result of the mode determination unit 32 is executed to interpolate, between frames, the quantized value of the spectrum information output from the spectrum information encoding unit 31. Likewise, the decoding unit 24 is provided with the mode determination unit 36, which determines the mode of the decoded speech S3 based on the quantized value of the spectrum information output from the spectrum information decoding unit 35, and an interpolation method corresponding to the determination result of the mode determination unit 36 is executed to interpolate, between frames, the quantized value of the spectrum information output from the spectrum information decoding unit 35. Consequently, unlike the conventional example, the code of the interpolation information need not be transmitted, and high-quality speech can be reproduced without increasing the amount of transmitted information.
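The point that no interpolation code needs to be transmitted can be illustrated with a small sketch in which the encoder side and the decoder side each run the same stateful procedure on the same quantized values and therefore arrive at the same subframe values. The class name `SubframeInterpolator`, the toy classifier and the weight values are assumptions made for this example.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


class SubframeInterpolator:
    """Keeps the previous frame's quantized value and mode; shared by both sides."""

    def __init__(self, classify: Callable[[Vector], bool], initial: Vector):
        self.classify = classify          # voiced/unvoiced decision (assumed given)
        self.prev_q = list(initial)
        self.prev_voiced = classify(initial)

    def process(self, cur_q: Vector) -> List[List[float]]:
        cur_voiced = self.classify(cur_q)
        if not self.prev_voiced and cur_voiced:
            weights = (0.5, 1.0, 1.0, 1.0)    # assumed early-match pattern
        else:
            weights = (0.25, 0.5, 0.75, 1.0)  # linear pattern
        out = [[(1.0 - w) * p + w * c for p, c in zip(self.prev_q, cur_q)]
               for w in weights]
        # Update the state for the next frame.
        self.prev_q, self.prev_voiced = list(cur_q), cur_voiced
        return out
```

If two instances are created with the same classifier and initial value and fed the same sequence of quantized values, their outputs are identical frame by frame, so the pattern choice never has to be signalled. This relies on the decoder receiving the spectrum codes without error, which the sketch does not model.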
[0046]
Also, in Embodiment 1, when the mode determination result for the input speech S1 satisfies a predetermined condition, interpolation is performed so that the quantized value of the subframe matches the quantized value of the current frame at an early stage. Therefore, even when the mode of the input speech S1 changes abruptly, the spectrum information is interpolated appropriately and high-quality decoded speech can be generated.
[0047]
Embodiment 2.
FIG. 7 is a block diagram showing a speech encoding apparatus and speech decoding apparatus according to Embodiment 2 of the present invention. In the figure, the same reference numerals as those in FIG. 1 denote the same or corresponding parts, and their description is omitted.
Reference numeral 41 denotes a mode determination unit (mode determination means) that determines the mode of the input speech S1 based on the code of the spectrum information output from the spectrum information encoding unit 31, and 42 denotes a mode determination unit (mode determination means) that determines the mode of the decoded speech S3 based on the code of the spectrum information.
[0048]
In Embodiment 1, the mode determination unit 32 determines the mode of the input speech S1 based on the quantized value of the spectrum information output from the spectrum information encoding unit 31, and the mode determination unit 36 determines the mode of the decoded speech S3 based on the quantized value of the spectrum information output from the spectrum information decoding unit 35. However, the present invention is not limited to this. The mode determination unit 41 may determine the mode of the input speech S1 based on the code of the spectrum information output from the spectrum information encoding unit 31, and the mode determination unit 42 may determine the mode of the decoded speech S3 based on the code of the spectrum information. This provides the same effect as Embodiment 1.
The speech mode may also be determined using both the code of the spectrum information and its quantized value (encoding result or decoding result).
[0049]
Embodiment 3.
In Embodiments 1 and 2 described above, when the previous frame is unvoiced and the current frame is voiced, interpolation is performed so that the quantized value of the subframe matches the quantized value of the current frame at an early stage, and linear interpolation is performed in all other cases. However, when the previous frame is voiced and the current frame is unvoiced, interpolation may instead be performed so that the quantized value of the subframe continues to match the quantized value of the previous frame until a late stage, as shown in FIG. 8.
According to the third embodiment, the interpolated spectrum information is close to the true spectrum information in the transitional part from voiced to unvoiced, so that high-quality decoded speech can be generated.
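Embodiment 3 adds a third case to the selection rule. A minimal sketch, assuming four subframes and illustrative weights (FIG. 8 is not reproduced in the text, so `PATTERN_LATE` and the other values below are guesses at the shape only, and all names are hypothetical):

```python
from typing import List, Sequence

# Weight of the current frame's quantized value per subframe (assumed values).
PATTERN_LINEAR = (0.25, 0.5, 0.75, 1.0)
PATTERN_EARLY = (0.5, 1.0, 1.0, 1.0)   # unvoiced -> voiced
PATTERN_LATE = (0.0, 0.0, 0.5, 1.0)    # voiced -> unvoiced: hold the previous frame


def select_weights(prev_voiced: bool, cur_voiced: bool) -> Sequence[float]:
    """Choose a pattern from the voicing of the previous and current frames."""
    if not prev_voiced and cur_voiced:
        return PATTERN_EARLY
    if prev_voiced and not cur_voiced:
        return PATTERN_LATE
    return PATTERN_LINEAR


def interpolate_scalar(prev_q: float, cur_q: float,
                       weights: Sequence[float]) -> List[float]:
    """Interpolate one spectral parameter across the subframes of a frame."""
    return [(1.0 - w) * prev_q + w * cur_q for w in weights]
```

With these assumed weights, a voiced-to-unvoiced frame keeps the previous frame's value for the first two subframes, so the voiced part of the transition is synthesized with a voiced spectrum.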
[0050]
Embodiment 4.
In Embodiments 1 to 3 described above, the speech mode is determined as one of two values, voiced or unvoiced. However, the present invention is not limited to this.
In addition to voiced and unvoiced, the speech may be classified into a larger number of modes, such as silence, voiced steady, and voiced transient, and a different interpolation pattern may be used according to the mode determination result.
[0051]
Furthermore, Embodiments 1 to 3 described above use simple interpolation patterns given heuristically. However, the present invention is not limited to this. For example, the optimum interpolation pattern for each speech mode may be learned by a statistical method using a database consisting of a large amount of uttered speech.
According to Embodiment 4, fine control becomes possible, such as interpolating the spectrum in a manner suited to each speech mode, so that high-quality decoded speech can be generated.
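The text does not say which statistical method is meant. One simple possibility, shown here purely as an assumption, is a least-squares fit of one weight per mode and subframe: for training triples of previous-frame value p, current-frame value c and true subframe value t, the weight minimizing the squared error of (1 - w)p + wc is w = Σ⟨t - p, c - p⟩ / Σ‖c - p‖². The function name and sample layout below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# (mode label, previous-frame value, current-frame value, true per-subframe values)
Sample = Tuple[str, Vector, Vector, Sequence[Vector]]


def learn_patterns(samples: Sequence[Sample], n_sub: int = 4) -> Dict[str, List[float]]:
    """Fit one least-squares interpolation weight per mode and subframe."""
    num = defaultdict(lambda: [0.0] * n_sub)  # sum of <t - p, c - p> per subframe
    den = defaultdict(float)                  # sum of |c - p|^2
    for mode, prev, cur, truth in samples:
        diff = [c - p for p, c in zip(prev, cur)]
        den[mode] += sum(d * d for d in diff)
        for j in range(n_sub):
            num[mode][j] += sum((t - p) * d
                                for t, p, d in zip(truth[j], prev, diff))
    # Modes whose frames never change (zero denominator) are skipped.
    return {m: [x / den[m] for x in num[m]] for m in num if den[m] > 0.0}
```

The returned weights would replace the hand-chosen patterns for each mode. Since the table is fixed in advance and shared by encoder and decoder, it presumably adds nothing to the transmitted information.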
[0052]
Embodiment 5.
In Embodiments 1 to 4 described above, the speech mode is determined using only the spectrum information. However, the present invention is not limited to this. For example, the mode of the current frame may be determined from the spectrum information alone, while the mode of a past frame is determined again using the sound source information of that past frame.
According to Embodiment 5, the accuracy of the mode determination for past frames can be improved, so that high-quality decoded speech can be generated.
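How the sound source information is used for the re-determination is not specified. The sketch below assumes, hypothetically, that the past frame's sound source information includes per-subframe pitch (adaptive codebook) gains, and treats a high average gain as evidence of voicing. The function name and the 0.5 threshold are illustrative only.

```python
from typing import Optional, Sequence


def redetermine_previous_mode(spectral_voiced: bool,
                              pitch_gains: Optional[Sequence[float]],
                              gain_threshold: float = 0.5) -> bool:
    """Re-evaluate the past frame's voicing once its sound source info is known.

    Falls back to the spectrum-only decision when no gains are available.
    """
    if not pitch_gains:
        return spectral_voiced
    mean_gain = sum(pitch_gains) / len(pitch_gains)
    return mean_gain >= gain_threshold
```

The revised decision for the past frame would then be paired with the spectrum-only decision for the current frame when selecting the interpolation pattern. Since the past frame's sound source information is available on both sides by that point, this should not require extra transmitted information.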
[0053]
[Effects of the Invention]
As described above, according to the present invention, mode determination means is provided that determines, based on the quantized value output from the spectrum information encoding means, whether the previous frame and the current frame of the input speech are voiced or unvoiced, an interpolation method is selected according to the determination result of the mode determination means, and the interpolation method is executed to interpolate, between frames, the quantized value output from the spectrum information encoding means. This has the effect that high-quality speech can be reproduced without increasing the amount of transmitted information.
[0054]
According to the present invention, mode determination means is provided that determines, based on the code of the spectrum information output from the spectrum information encoding means, whether the previous frame and the current frame of the input speech are voiced or unvoiced, an interpolation method is selected according to the determination result of the mode determination means, and the interpolation method is executed to interpolate, between frames, the quantized value output from the spectrum information encoding means. This has the effect that high-quality speech can be reproduced without increasing the amount of transmitted information.
[0056]
According to the present invention, when the determination result of the mode determination means indicates that the previous frame of the input speech is unvoiced and the current frame is voiced, an interpolation method is executed in which the quantized values of some of the subframes to be interpolated match the quantized value of the current frame. This has the effect that high-quality speech can be reproduced even in a transitional part from unvoiced to voiced.
[0057]
According to the present invention, when the determination result of the mode determination means indicates that the previous frame of the input speech is voiced and the current frame is unvoiced, an interpolation method is executed in which the quantized values of some of the subframes to be interpolated match the quantized value of the previous frame. This has the effect that high-quality speech can be reproduced even in a transitional part from voiced to unvoiced.
[0058]
According to the present invention, mode determination means is provided that determines, based on the quantized value output from the spectrum information decoding means, whether the previous frame and the current frame of the decoded speech are voiced or unvoiced, an interpolation method is selected according to the determination result of the mode determination means, and the interpolation method is executed to interpolate, between frames, the quantized value output from the spectrum information decoding means. This has the effect that high-quality speech can be reproduced without increasing the amount of transmitted information.
[0059]
According to the present invention, mode determination means is provided that determines, based on the code of the spectrum information, whether the previous frame and the current frame of the decoded speech are voiced or unvoiced, an interpolation method is selected according to the determination result of the mode determination means, and the interpolation method is executed to interpolate, between frames, the quantized value output from the spectrum information decoding means. This has the effect that high-quality speech can be reproduced without increasing the amount of transmitted information.
[0061]
According to the present invention, when the determination result of the mode determination means indicates that the previous frame of the decoded speech is unvoiced and the current frame is voiced, an interpolation method is executed in which the quantized values of some of the subframes to be interpolated match the quantized value of the current frame. This has the effect that high-quality speech can be reproduced even in a transitional part from unvoiced to voiced.
[0062]
According to the present invention, when the determination result of the mode determination means indicates that the previous frame of the decoded speech is voiced and the current frame is unvoiced, an interpolation method is executed in which the quantized values of some of the subframes to be interpolated match the quantized value of the previous frame. This has the effect that high-quality speech can be reproduced even in a transitional part from voiced to unvoiced.
[0063]
According to the present invention, whether the previous frame and the current frame of the input speech are voiced or unvoiced is determined based on the quantized value of the spectrum information, an interpolation method is selected according to the determination result, and the interpolation method is executed to interpolate the quantized value of the spectrum information between frames. This has the effect that high-quality speech can be reproduced without increasing the amount of transmitted information.
[0064]
According to the present invention, whether the previous frame and the current frame of the input speech are voiced or unvoiced is determined based on the code of the spectrum information, an interpolation method is selected according to the determination result, and the interpolation method is executed to interpolate the quantized value of the spectrum information between frames. This has the effect that high-quality speech can be reproduced without increasing the amount of transmitted information.
[0065]
According to the present invention, whether the previous frame and the current frame of the decoded speech are voiced or unvoiced is determined based on the quantized value of the spectrum information, an interpolation method is selected according to the determination result, and the interpolation method is executed to interpolate the quantized value of the spectrum information between frames. This has the effect that high-quality speech can be reproduced without increasing the amount of transmitted information.
[0066]
According to the present invention, whether the previous frame and the current frame of the decoded speech are voiced or unvoiced is determined based on the code of the spectrum information, an interpolation method is selected according to the determination result, and the interpolation method is executed to interpolate the quantized value of the spectrum information between frames. This has the effect that high-quality speech can be reproduced without increasing the amount of transmitted information.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a speech encoding apparatus and speech decoding apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart showing a speech encoding method and a speech decoding method according to Embodiment 1 of the present invention.
FIG. 3 is an explanatory diagram showing an example of an interpolation pattern.
FIG. 4 is an explanatory diagram showing the relationship between spectrum information and voice.
FIG. 5 is an explanatory diagram showing the relationship between spectrum information and voice.
FIG. 6 is an explanatory diagram showing an example of an interpolation pattern.
FIG. 7 is a block diagram showing a speech encoding apparatus and speech decoding apparatus according to Embodiment 2 of the present invention.
FIG. 8 is an explanatory diagram showing an example of an interpolation pattern.
FIG. 9 is a block diagram showing a conventional speech coding apparatus and speech decoding apparatus.
[Explanation of symbols]
1 encoding unit, 2 multiplexing unit, 3 separation unit, 4 decoding unit, 11 spectrum information encoding unit, 12 interpolation unit, 13 sound source information encoding unit, 14 spectrum information decoding unit, 15 interpolation unit, 16 sound source information decoding unit, 17 synthesis unit, 21 encoding unit, 22 multiplexing unit, 23 separation unit, 24 decoding unit, 31 spectrum information encoding unit (spectrum information encoding means), 32 mode determination unit (mode determination means), 33 interpolation unit (interpolation means), 34 sound source information encoding unit (sound source information encoding means), 35 spectrum information decoding unit (spectrum information decoding means), 36 mode determination unit (mode determination means), 37 interpolation unit (interpolation means), 38 sound source information decoding unit (sound source information decoding means), 39 synthesis unit (synthesis means), 41 mode determination unit (mode determination means), 42 mode determination unit (mode determination means).

Claims

A speech encoding apparatus comprising: spectrum information encoding means for encoding spectrum information of input speech and outputting a code and a quantized value of the spectrum information; mode determination means for determining, based on the quantized value output from the spectrum information encoding means, whether a previous frame and a current frame of the input speech are voiced or unvoiced; interpolation means for selecting an interpolation method according to the determination result of the mode determination means and executing the interpolation method to interpolate, between frames, the quantized value output from the spectrum information encoding means; and sound source information encoding means for encoding sound source information of the input speech using the quantized value output from the interpolation means and outputting a code of the sound source information.

A speech encoding apparatus comprising: spectrum information encoding means for encoding spectrum information of input speech and outputting a code and a quantized value of the spectrum information; mode determination means for determining, based on the code of the spectrum information output from the spectrum information encoding means, whether a previous frame and a current frame of the input speech are voiced or unvoiced; interpolation means for selecting an interpolation method according to the determination result of the mode determination means and executing the interpolation method to interpolate, between frames, the quantized value output from the spectrum information encoding means; and sound source information encoding means for encoding sound source information of the input speech using the quantized value output from the interpolation means and outputting a code of the sound source information.

The speech encoding apparatus according to claim 1 or 2, wherein, when the determination result of the mode determination means indicates that the previous frame of the input speech is unvoiced and the current frame is voiced, the interpolation means executes an interpolation method in which the quantized values of some of the subframes to be interpolated match the quantized value of the current frame.

The speech encoding apparatus according to claim 1 or 2, wherein, when the determination result of the mode determination means indicates that the previous frame of the input speech is voiced and the current frame is unvoiced, the interpolation means executes an interpolation method in which the quantized values of some of the subframes to be interpolated match the quantized value of the previous frame.

A speech decoding apparatus comprising: spectrum information decoding means for decoding a code of spectrum information and outputting a quantized value of the spectrum information; mode determination means for determining, based on the quantized value output from the spectrum information decoding means, whether a previous frame and a current frame of decoded speech are voiced or unvoiced; interpolation means for selecting an interpolation method according to the determination result of the mode determination means and executing the interpolation method to interpolate, between frames, the quantized value output from the spectrum information decoding means; sound source information decoding means for decoding a code of sound source information and outputting a quantized value of the sound source information; and synthesis means for synthesizing the quantized value output from the interpolation means and the quantized value output from the sound source information decoding means to generate the decoded speech.

A speech decoding apparatus comprising: spectrum information decoding means for decoding a code of spectrum information and outputting a quantized value of the spectrum information; mode determination means for determining, based on the code of the spectrum information, whether a previous frame and a current frame of decoded speech are voiced or unvoiced; interpolation means for selecting an interpolation method according to the determination result of the mode determination means and executing the interpolation method to interpolate, between frames, the quantized value output from the spectrum information decoding means; sound source information decoding means for decoding a code of sound source information and outputting a quantized value of the sound source information; and synthesis means for synthesizing the quantized value output from the interpolation means and the quantized value output from the sound source information decoding means to generate the decoded speech.

The speech decoding apparatus according to claim 5, wherein, when the determination result of the mode determination means indicates that the previous frame of the decoded speech is unvoiced and the current frame is voiced, the interpolation means executes an interpolation method in which the quantized values of some of the subframes to be interpolated match the quantized value of the current frame.

The speech decoding apparatus according to claim 5, wherein, when the determination result of the mode determination means indicates that the previous frame of the decoded speech is voiced and the current frame is unvoiced, the interpolation means executes an interpolation method in which the quantized values of some of the subframes to be interpolated match the quantized value of the previous frame.

A speech encoding method comprising: encoding spectrum information of input speech and outputting a code of the spectrum information; determining, based on a quantized value of the spectrum information, whether a previous frame and a current frame of the input speech are voiced or unvoiced; selecting an interpolation method according to the determination result and executing the interpolation method to interpolate the quantized value of the spectrum information between frames; and encoding sound source information of the input speech using the quantized value of the spectrum information after the interpolation processing and outputting a code of the sound source information.

A speech encoding method comprising: encoding spectrum information of input speech and outputting a code of the spectrum information; determining, based on the code of the spectrum information, whether a previous frame and a current frame of the input speech are voiced or unvoiced; selecting an interpolation method according to the determination result and executing the interpolation method to interpolate a quantized value of the spectrum information between frames; and encoding sound source information of the input speech using the quantized value of the spectrum information after the interpolation processing and outputting a code of the sound source information.

A speech decoding method comprising: decoding a code of spectrum information and outputting a quantized value of the spectrum information; determining, based on the quantized value of the spectrum information, whether a previous frame and a current frame of decoded speech are voiced or unvoiced; selecting an interpolation method according to the determination result and executing the interpolation method to interpolate the quantized value of the spectrum information between frames; decoding a code of sound source information and outputting a quantized value of the sound source information; and synthesizing the quantized value of the spectrum information after the interpolation processing and the quantized value of the sound source information to generate the decoded speech.

A speech decoding method comprising: decoding a code of spectrum information and outputting a quantized value of the spectrum information; determining, based on the code of the spectrum information, whether a previous frame and a current frame of decoded speech are voiced or unvoiced; selecting an interpolation method according to the determination result and executing the interpolation method to interpolate the quantized value of the spectrum information between frames; decoding a code of sound source information and outputting a quantized value of the sound source information; and synthesizing the quantized value of the spectrum information after the interpolation processing and the quantized value of the sound source information to generate the decoded speech.