JP2008310238A - Speech coder, decoder, speech coding program and speech decoding program

Info

Publication number: JP2008310238A
Application number: JP2007160092A
Authority: JP
Inventors: Kazuki Kakuno; 一樹客野
Original assignee: Axell Corp
Current assignee: Axell Corp
Priority date: 2007-06-18
Filing date: 2007-06-18
Publication date: 2008-12-25
Anticipated expiration: 2027-06-18
Also published as: JP4470122B2

Abstract

PROBLEM TO BE SOLVED: To effectively suppress noise at the junction between pieces of speech when a plurality of pieces of speech are reproduced in succession.

SOLUTION: The speech coder includes a dividing unit 2 and an encoder 3. The dividing unit 2 divides a series of speech waveforms into a plurality of groups in time series. The encoder 3 performs encoding with each of the groups as a processing unit. In doing so, it applies predictive coding in forward time order to the start group, which contains the start point of the speech waveform, and in reverse time order to the end group, which contains the end point of the speech waveform.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to encoding and decoding of speech waveforms using a predictive coding scheme.

Predictive coding schemes, typified by ADPCM (Adaptive Differential Pulse Code Modulation), are known as methods for encoding and decoding speech waveforms. Predictive coding relies on the fact that a speech waveform changes continuously: it encodes the difference (predicted value) from the immediately preceding value that has already been coded, and it has the advantage of a higher compression ratio than PCM (Pulse-Code Modulation). However, this scheme involves prediction error, so prediction errors are superimposed as decoding progresses and the decoded waveform drifts away from the original waveform. Consequently, when a plurality of pieces of speech are played back in succession, a mismatch (gap) caused by prediction error arises between the end point of the preceding speech waveform and the start point of the following one, even though the PCM values at those two points were made to match. The result is noise at the junction between the preceding and following speech.
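The gap described above can be reproduced with a very small example. The sketch below is not ADPCM; it assumes a toy differential coder (the hypothetical `dpcm_roundtrip`, with a fixed quantizer step of 4) in which only the first sample is stored exactly. It is meant only to show that the decoded end value can differ from the original end value, so a following waveform that starts at the original PCM value no longer lines up.

```python
def dpcm_roundtrip(samples, step=4):
    """Encode and immediately decode with a toy differential coder.

    Assumption for illustration: the first sample is stored exactly and
    every later sample is rebuilt from a difference quantized to a
    multiple of `step`.
    """
    decoded = [samples[0]]
    for s in samples[1:]:
        q = round((s - decoded[-1]) / step)   # quantized difference code
        decoded.append(decoded[-1] + q * step)
    return decoded


previous = [0, 10, 25, 27, 13, 3]
following = [3, 6, 9]          # starts at the PCM value where `previous` ends

decoded_previous = dpcm_roundtrip(previous)
gap = following[0] - decoded_previous[-1]

print(decoded_previous)        # [0, 8, 24, 28, 12, 4]
print(gap)                     # -1
```

The start point survives the round trip exactly, while the end point is off by one quantization residue. That residual mismatch at the junction is the discontinuity the invention sets out to remove.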

Patent Document 1 discloses a playback device that prevents a speech waveform from becoming discontinuous without muting. Specifically, it uses two uncompressed speech waveforms and applies crossfade processing at their connection point, which prevents an abrupt change in value there. However, crossfade processing shortens the playback time by the length of the crossfaded portion.

Patent Document 1: JP 2001-128119 A

The present invention has been made in view of these circumstances, and its object is to effectively suppress noise at the junction between pieces of speech when a plurality of pieces of speech are played back in succession.

To solve this problem, the first invention provides a speech encoding apparatus that has a dividing unit and an encoder and encodes a series of speech waveforms. The dividing unit divides the speech waveform into a plurality of groups in time series. The encoder encodes the speech waveform with each of the groups as a processing unit. In doing so, it applies predictive coding in forward time order to the speech waveform in the start group, which contains the start point of the speech waveform, and in reverse time order to the speech waveform in the end group, which contains the end point of the speech waveform.

In the first invention, the encoder preferably has an inverter and a predictive code encoder. The inverter time-reverses the speech waveform in the end group, which is input in forward time order, so that it is in reverse time order. The predictive code encoder performs predictive coding, in the order in which the samples are input to it, on the forward-order start group, which is input without passing through the inverter, and on the reverse-order end group, which is input through the inverter. The encoder may further have a transform code encoder that applies transform coding to the speech waveform in intermediate groups, that is, groups other than the start group and the end group.

In the first invention, the predictive code encoder preferably adds, to the codewords of the speech waveform, identification information indicating the time-series encoding order within the group. For example, the predictive code encoder may add identification information that identifies the group corresponding to the start group and identification information that identifies the group corresponding to the end group. Identification information that identifies a group corresponding to an intermediate group may also be added to the codewords.

Also in the first invention, the dividing unit preferably divides the speech waveform so that the waveform partially overlaps between groups that are adjacent in time.

The second invention provides a speech decoding apparatus that has a determination unit, a decoder, and a synthesis unit, and that decodes codewords generated by encoding a series of speech waveforms in units of time-series groups. Based on identification information added to the codewords, the determination unit determines whether the group of codewords to be decoded is the start group, which contains the start point of the speech waveform, and whether it is the end group, which contains the end point of the speech waveform. The decoder decodes the codewords group by group. For codewords in a group determined to be the start group, it decodes by the predictive coding scheme in forward time order; for codewords in a group determined to be the end group, it decodes by the predictive coding scheme in reverse time order. The synthesis unit restores the series of speech waveforms by combining, in time series, the plurality of decoded waveforms generated by the decoder.

In the second invention, the decoder preferably has a predictive code decoder and an inverter. The predictive code decoder generates a decoded waveform for each group by decoding the codewords in that group by the predictive coding scheme, in the order in which they are input to it. The inverter time-reverses the decoded waveform of a group determined to be the end group, so that the decoded waveform in that group is reversed in time.

In the second invention, the determination unit may determine whether the group of codewords to be decoded is an intermediate group, that is, a group other than the start group and the end group. In this case, the decoder desirably further has a transform code decoder that decodes, by a transform coding scheme, the codewords in a group determined to be an intermediate group.

In the second invention, the determination unit may determine, based on the identification information added to a codeword, whether the decoded waveform obtained from that codeword overlaps another decoded waveform in time. In this case, when the decoded waveform is determined to lie in a range that overlaps another decoded waveform adjacent to it in time, the synthesis unit preferably applies fade processing to that decoded waveform.

The third invention provides a speech encoding program that causes a computer to execute a speech encoding method for encoding a series of speech waveforms with a group as the processing unit. The method has a first step of dividing the speech waveform into a plurality of groups in time series, and a second step of encoding the speech waveform with each of the groups as a processing unit. In the second step, predictive coding is applied in forward time order to the speech waveform in the start group, which contains the start point of the speech waveform, and in reverse time order to the speech waveform in the end group, which contains the end point of the speech waveform.

The fourth invention provides a speech decoding program that causes a computer to execute a speech decoding method for decoding codewords generated by encoding a series of speech waveforms in units of time-series groups. The method has three steps. The first step determines, based on identification information added to the codewords, whether the group of codewords to be decoded is the start group, which contains the start point of the speech waveform, and whether it is the end group, which contains the end point of the speech waveform. The second step decodes the codewords group by group: codewords in a group determined to be the start group are decoded by the predictive coding scheme in forward time order, and codewords in a group determined to be the end group are decoded by the predictive coding scheme in reverse time order. The third step restores the series of speech waveforms by combining, in time series, the decoded waveforms obtained for each group.

According to the present invention, the start group is predictively encoded in time order beginning at the start point of the speech waveform, so the start point of the decoded waveform coincides with the start point of the waveform before encoding. The end group, in contrast, is predictively encoded in the reverse of time order beginning at the end point of the speech waveform, so the end point of the decoded waveform coincides with the end point of the waveform before encoding. Because both the start point and the end point of the restored waveform match the waveform before encoding, noise caused by a gap between the end point of one speech waveform and the start point of the next can be suppressed.

(First embodiment)
<Encoding>
FIG. 1 is a configuration diagram of the speech encoding apparatus according to this embodiment. The apparatus has a storage unit 1, a dividing unit 2, and an encoder 3, and encodes a series of speech waveforms (the original waveform). The storage unit 1 stores the speech waveform to be encoded. The speech waveform consists of a plurality of sample points, each of which is associated with position information (an address) in time-series order.

The dividing unit 2 reads the speech waveform stored in the storage unit 1 and divides it into a plurality of groups in time series. In this embodiment the number of divisions is two: the speech waveform is divided into a start group, which contains its start point, and an end group, which contains its end point. The dividing unit 2 divides the waveform so that it partially overlaps between groups that are adjacent in time. Specifically, the start group is set so that its tail includes a predetermined length of time (a margin) after the division point. Instead of this, or in combination with it, the end group may be set so that its head includes a predetermined length of time (a margin) before the division point. As a result, the start group and the end group overlap by the several samples that correspond to the margin.
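The division with a margin can be sketched as follows. This is a minimal illustration of the first option in the text (the start group is extended past the division point); the function name `split_with_overlap` and its parameters are hypothetical, and a Python list stands in for the addressed sample points.

```python
def split_with_overlap(samples, split_index, margin):
    """Split a waveform into a start group and an end group that share
    `margin` samples just after the division point."""
    if not 0 < split_index <= len(samples) - margin:
        raise ValueError("division point and margin must fit in the waveform")
    start_group = samples[:split_index + margin]   # tail includes the margin
    end_group = samples[split_index:]              # begins at the division point
    return start_group, end_group


wave = list(range(10))
start_group, end_group = split_with_overlap(wave, split_index=5, margin=2)

print(start_group)   # [0, 1, 2, 3, 4, 5, 6]
print(end_group)     # [5, 6, 7, 8, 9]
```

Samples 5 and 6 appear in both groups, which is the overlap later used for crossfading.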

The encoder 3 takes each group defined by the dividing unit 2 as a processing unit and encodes the speech waveform in that group. In doing so, it applies predictive coding in forward time order to the start group, which contains the start point of the speech waveform, and in reverse time order to the end group, which contains the end point. ADPCM (Adaptive Differential Pulse Code Modulation) is used as one example of the predictive coding scheme.

The encoder 3 consists of an inverter 3a and a predictive code encoder 3b. The inverter 3a time-reverses the speech waveform of the end group, which is input in forward time order, and outputs the end group in reverse time order. As a result, the tail of the end group, which corresponds to the end point of the whole speech waveform, moves to the head of the end group. Specifically, this reversal is performed by reversing, along the time axis, the addresses associated with the sample points of the speech waveform in the end group. The predictive code encoder 3b performs predictive coding on the forward-order start group, which is input without passing through the inverter 3a, and on the reverse-order end group, which is input through the inverter 3a. It processes each in the order in which the samples are input (that is, it treats every input sequence as being in forward order) and outputs the codewords of the speech waveform.
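The encoder path can be sketched as below. The predictive coder here is an assumed toy differential coder (`dpcm_encode`, fixed step of 4, first sample stored exactly), not the ADPCM of the embodiment, and `encode_groups` is a hypothetical name. The point illustrated is only that reversing the end group before encoding makes the waveform's end point the exactly stored first sample.

```python
def dpcm_encode(samples, step=4):
    """Toy predictive coder: keep the first sample exactly, then code each
    later sample as a quantized difference from the running reconstruction."""
    first = samples[0]
    recon = first
    codes = []
    for s in samples[1:]:
        q = round((s - recon) / step)
        codes.append(q)
        recon += q * step
    return first, codes


def encode_groups(start_group, end_group, step=4):
    """Encode the start group as input and the end group after time reversal."""
    reversed_end = end_group[::-1]                 # role of inverter 3a
    return {
        "start": dpcm_encode(start_group, step),   # forward time order
        "end": dpcm_encode(reversed_end, step),    # reverse time order
    }


encoded = encode_groups([0, 10, 25, 27], [25, 27, 13, 3])

print(encoded["start"][0])   # 0
print(encoded["end"][0])     # 3
```

The same `dpcm_encode` routine serves both groups; only the order of the input differs, which mirrors the single predictive code encoder 3b fed either directly or through the inverter.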

For the decoding described later, the predictive code encoder 3b adds to the codewords of the speech waveform, as a group flag, identification information indicating the type of group, specifically the time-series encoding order within the group. The group flag identifies which group corresponds to the start group, which to the end group, and which to an intermediate group.

In addition, the predictive code encoder 3b adds, as a fade flag, identification information indicating whether the encoded portion of the group overlaps another group adjacent to it in time. In this embodiment, when a sample point of the start group that lies in the range overlapping the end group is encoded, that fact is added to the corresponding codeword. Conversely, when a sample point of the end group that lies in the range overlapping the start group is encoded, that fact is added to the corresponding codeword. When a sample point that does not overlap is encoded, that fact is likewise added to the corresponding codeword.

<Decoding>
FIG. 2 is a configuration diagram of the speech decoding apparatus according to this embodiment. The apparatus decodes the codewords produced by the speech encoding apparatus shown in FIG. 1 into a speech waveform. It consists of a determination unit 4, a decoder 5, a synthesis unit 6, and a storage unit 7.

Based on the identification information attached to a codeword, the determination unit 4 determines which group of the original waveform the decoded waveform (more precisely, the sample value) obtained from that codeword corresponds to. Specifically, the type of group (start group, end group, or intermediate group) is determined from the group flag added to the codeword. The overlapping portion described above is also determined, from the fade flag attached to the codeword. The determination results are output to the decoder 5 (inverter 5b) and the synthesis unit 6 (multiplier 6a), both described below.

The decoder 5 consists of a predictive code decoder 5a and an inverter 5b. The predictive code decoder 5a decodes the input codewords by the predictive coding scheme in the order in which they are input and generates a decoded waveform. Because every group was encoded in its input order by the predictive code encoder in the encoding shown in FIG. 1, the predictive code decoder 5a can decode without regard to whether the group is in forward time order.

The inverter 5b time-reverses the decoded waveform generated by the predictive code decoder 5a when necessary, according to the determination of the determination unit 4. When the codewords are determined to encode the start group, the decoded waveform generated from them by the predictive code decoder 5a is output unchanged (the forward-encoded case). When the codewords are determined to encode the end group, the decoded waveform generated from them by the predictive code decoder 5a is time-reversed by the inverter 5b before being output (the reverse-encoded case).
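The decoder path can be sketched in the same spirit. The sketch assumes a toy differential decoder (`dpcm_decode`, fixed step of 4) rather than ADPCM, string group flags, and hand-written codeword values; all of these are illustrative assumptions. It shows the predictive decoder running identically for both groups, with reversal applied only to the end group.

```python
def dpcm_decode(first, codes, step=4):
    """Toy predictive decoder: start from the exactly stored first sample
    and accumulate the quantized differences in input order."""
    out = [first]
    for q in codes:
        out.append(out[-1] + q * step)
    return out


def decode_group(group_flag, first, codes, step=4):
    """Decode one group; time-reverse the result only for the end group."""
    decoded = dpcm_decode(first, codes, step)   # role of predictive code decoder 5a
    if group_flag == "end":
        decoded = decoded[::-1]                 # role of inverter 5b
    return decoded


# Hypothetical codewords: (exact first sample, quantized differences).
start_wave = decode_group("start", 0, [2, 4, 1])
end_wave = decode_group("end", 3, [2, 4, 0])

print(start_wave)   # [0, 8, 24, 28]
print(end_wave)     # [27, 27, 11, 3]
```

After reversal, the exactly stored sample of the end group sits at the very end of the decoded waveform, so the final output value carries no accumulated error.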

The synthesis unit 6 restores the series of speech waveforms by combining, in time series, the decoded waveforms of the individual groups generated by the decoder 5. During this combination, crossfade processing is applied to the decoded waveform output from the inverter 5b, according to the determination of the determination unit 4. FIG. 3 is an explanatory diagram of the crossfade processing in this embodiment. Crossfade processing is a method of mixing two speech waveforms: for example, in the range where two groups adjacent in time overlap, the earlier group is faded out while the later group is faded in. In this embodiment, crossfade processing is performed in the range where the decoded waveforms corresponding to the start group and the end group overlap.

In the encoding shown in FIG. 1, the start group and the end group are divided with some margin in time. Joining the groups with crossfade processing therefore makes the playback time of the decoded speech waveform equal to that of the speech waveform before encoding. Crossfade processing also suppresses the noise that would arise when the two groups are played back consecutively. The synthesis unit 6 consists of a multiplier 6a and an adder 6b.

Following the instruction of the determination unit 4, the multiplier 6a multiplies part of the decoded waveform output from the inverter 5b by a predetermined value when necessary. For example, when the determination unit 4 determines that part of an earlier decoded waveform overlaps the later decoded waveform adjacent to it in time, the multiplier 6a multiplies the values of the sample points in the overlapping range by a predetermined value and outputs them. When the determination unit 4 determines that part of a later decoded waveform overlaps the earlier decoded waveform adjacent to it in time, the multiplier 6a multiplies the values of the sample points in the overlapping range by a different predetermined value and outputs them. When the determination unit 4 determines that a sample overlaps no other decoded waveform, it is output unchanged, without multiplication.

The adder 6b adds the decoded waveform output from the multiplier 6a to the decoded waveform stored in the storage unit 7, aligning the two in time. The storage unit 7 holds the decoded waveform produced by earlier passes through the decoding chain (decoder 5 and synthesis unit 6). The adder 6b adds the stored decoded waveform that corresponds in time to the output of the multiplier 6a and the decoded waveform output from the multiplier 6a, and stores the sum back in the storage unit 7.
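The multiply-then-accumulate behavior of the synthesis unit can be sketched as an overlap-add with fade gains. The text specifies only "a predetermined value" for each side, so the linear gains that sum to one across the overlap are an assumption, as are the function name `crossfade_join` and the use of a list as the accumulation buffer standing in for the storage unit 7.

```python
def crossfade_join(front, back, overlap):
    """Join two decoded groups whose last/first `overlap` samples coincide
    in time, fading the front group out and the back group in."""
    if overlap < 0 or overlap > min(len(front), len(back)):
        raise ValueError("overlap must fit inside both groups")
    total = len(front) + len(back) - overlap
    buffer = [0.0] * total                 # accumulation buffer (storage unit 7)
    offset = len(front) - overlap          # where the back group starts

    for i, value in enumerate(front):
        k = i - offset                     # index inside the overlap, < 0 before it
        gain = 1.0 if k < 0 else 1.0 - (k + 1) / (overlap + 1)
        buffer[i] += gain * value          # multiplier 6a, then adder 6b

    for j, value in enumerate(back):
        gain = 1.0 if j >= overlap else (j + 1) / (overlap + 1)
        buffer[offset + j] += gain * value
    return buffer


joined = crossfade_join([4, 4, 4, 4], [8, 8, 8], overlap=2)
print(len(joined))   # 5
```

Because the overlapping samples are summed rather than played twice, the joined length equals the length of the undivided waveform, which is how the playback time is preserved.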

According to this embodiment, when a plurality of pieces of speech are played back in succession, noise caused by prediction error at the junction between them can be effectively suppressed. Because predictive coding is applied in forward time order to the speech waveform in the start group, no prediction error arises at the start point of the speech waveform. Likewise, because predictive coding is applied in reverse time order to the speech waveform in the end group, no prediction error arises at the end point either. This eliminates the mismatch (gap) between the end point of the preceding speech waveform and the start point of the following one when a plurality of pieces of speech are played back in succession.

For example, when two consecutive speech waveforms (a preceding waveform and a following waveform) are each encoded and decoded as described above as separate original waveforms, no gap arises between them, so they can be played back without crossfade processing between them. Even when the following waveform is selected at random and in real time during playback of the preceding waveform, no crossfade is needed as long as the end point of the preceding waveform matches the start point of the selected following waveform. It is therefore unnecessary to prepare (decode) the entire following waveform in advance.

This embodiment uses ADPCM as the predictive coding scheme, but the present invention only requires a coding scheme in which the value at the start point (end point) of the original waveform is preserved through encoding and decoding. DPCM (Differential Pulse Code Modulation), for example, may therefore be used instead.

In this embodiment, whether the decoder 5 (inverter 5b) and the synthesis unit 6 (multiplier 6a) perform their processing is decided from identification information (the group flag and the fade flag) added separately to the codewords. However, the present invention only requires that, at decoding time, it be possible to determine which group of the original waveform a decoded waveform belongs to and whether it overlaps a group adjacent to it in time. Thus, for example, the inverter 3a may modify the position information (addresses) associated with each group, and the determination unit 4 may determine the type of group and the presence or absence of overlap from that position information.

In this embodiment two groups (the start group and the end group) are encoded, but the present invention is not limited to this. Even when the speech waveform is divided into three or more groups, it suffices that the start group, which contains the start point, and the end group, which contains the end point, are encoded and decoded by the predictive coding scheme. Division into three or more groups is described in the next embodiment.

(Second embodiment)
<Encoding>
FIG. 4 is a configuration diagram of the speech encoding apparatus according to this embodiment. Like the first embodiment, this apparatus encodes a series of speech waveforms, but it differs in that it encodes intermediate groups in addition to the start group and the end group. An intermediate group is any group other than the start group and the end group, that is, a group that contains neither the start point nor the end point of the speech waveform to be encoded. In terms of configuration, the apparatus differs from the first embodiment in that it additionally has a transform code encoder 3c.

The transform code encoder 3c applies transform coding to the intermediate groups extracted by the dividing unit 2 and outputs codewords. Transform coding is a technique that uses an orthogonal transform to map the data onto axes with less redundancy and then compression-codes the result. Examples of the transform coding in this embodiment are MP3 (MPEG-1 Audio Layer-3) and AAC (Advanced Audio Coding), which use the MDCT (Modified Discrete Cosine Transform). For the decoding described later, the transform code encoder 3c adds to the generated codewords a group flag indicating an intermediate group and a fade flag indicating whether the encoded portion overlaps another group adjacent to it in time.

<Decoding>
FIG. 5 is a configuration diagram of the speech decoding apparatus according to the present embodiment. This speech decoding apparatus decodes the codewords produced by the speech encoding apparatus shown in FIG. 4 back into the original waveform. It differs from the speech decoding apparatus of the first embodiment in that it includes a selector 8 and a transform code decoder 5c. The functions of the determination unit 4 and the synthesis unit 6 are modified slightly to match. The other components and functions are the same as in the first embodiment, so their description is omitted.

As in the first embodiment, the determination unit 4 examines the group flag and the fade flag attached to each codeword. When the determination unit 4 determines that a codeword was encoded by the predictive code encoder 3b, it causes the selector 8 to route the codeword to the predictive code decoder 5a. When it determines that a codeword was encoded by the transform code encoder 3c, it causes the selector 8 to route the codeword to the transform code decoder 5c. The transform code decoder 5c decodes the codeword received from the selector 8 by the transform coding scheme and generates a decoded waveform.
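The routing performed by the determination unit 4 and the selector 8 can be sketched as follows. All names, the payload layout, and the quantizer step are hypothetical; the predictive decoder is a toy differential decoder rather than ADPCM, and the transform decoder is a placeholder that only lets the routing run.

```python
from dataclasses import dataclass
from enum import Enum


class Group(Enum):
    START = "start"
    INTERMEDIATE = "intermediate"
    END = "end"


@dataclass
class Codeword:
    group: Group   # group flag
    fade: bool     # fade flag: True if this part overlaps a neighbouring group
    payload: list  # hypothetical payload layout, see the decoders below


STEP = 4  # assumed quantizer step of the toy predictive codec


def predictive_decode(payload):
    """Toy differential decoder: payload[0] is an exact PCM value, the rest are
    quantized differences, decoded in the order in which they are input."""
    wave = [payload[0]]
    for code in payload[1:]:
        wave.append(wave[-1] + code * STEP)
    return wave


def transform_decode(payload):
    """Placeholder for a transform decoder (for example an MDCT-based one).
    It returns the payload unchanged so that the routing can be exercised."""
    return list(payload)


def decode_group(cw):
    """Act as determination unit 4 and selector 8 for one codeword."""
    if cw.group is Group.INTERMEDIATE:
        return transform_decode(cw.payload)
    wave = predictive_decode(cw.payload)
    if cw.group is Group.END:
        # The end group was coded back to front; reverse it (inverter 5b)
        # to restore time order.
        wave.reverse()
    return wave


print(decode_group(Codeword(Group.END, False, [23, 0, -1])))
```

The fade flag is carried along but not used in this sketch; it matters only to the synthesis stage.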

In accordance with instructions from the determination unit 4, the synthesis unit 6 applies crossfade processing to the decoded waveforms output from the decoder 5. FIG. 6 is an explanatory diagram of the crossfade processing according to the present embodiment. As in the first embodiment, the crossfade processing is performed over the range in which a decoded waveform overlaps another decoded waveform in time.
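A minimal crossfade over the overlapping range might look like the following Python sketch. The linear gain curve and the function name are assumptions; this passage does not state the fade curve, only that gains are applied and the results are summed (multiplier 6a and adder 6b).

```python
def crossfade(prev_wave, next_wave, overlap):
    """Join two decoded waveforms whose last/first `overlap` samples coincide in time.

    Inside the overlap the earlier waveform fades out and the later one fades in,
    using linear gains that always sum to 1 (an assumed curve).
    """
    if overlap < 0 or overlap > min(len(prev_wave), len(next_wave)):
        raise ValueError("overlap must fit inside both waveforms")
    if overlap == 0:
        return list(prev_wave) + list(next_wave)
    keep = len(prev_wave) - overlap
    mixed = []
    for i in range(overlap):
        gain_in = (i + 1) / (overlap + 1)   # gain for the later waveform
        gain_out = 1.0 - gain_in            # gain for the earlier waveform
        mixed.append(prev_wave[keep + i] * gain_out + next_wave[i] * gain_in)
    return list(prev_wave[:keep]) + mixed + list(next_wave[overlap:])


joined = crossfade([0.0] * 4, [3.0] * 4, overlap=2)
print(joined)
```

Outside the overlap the samples pass through unchanged, so the output is shorter than the two inputs combined by exactly the overlap length.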

According to the present embodiment, as in the first embodiment, noise caused by prediction error at the joint between sounds can be effectively suppressed when a plurality of sounds are played back in succession. In the first embodiment, the entire original waveform is encoded and decoded by the predictive coding scheme. In the present embodiment, by contrast, only the start group and the end group are encoded and decoded by that scheme, and the remaining part (the intermediate group) is encoded and decoded by transform coding. The present embodiment can therefore prevent quality degradation more effectively than the first embodiment.
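Why the end group is coded in reverse order can be illustrated with a toy differential coder. The sketch below is not ADPCM and not the codec of the specification; the coarse uniform quantizer, the step size, and the function names are assumptions chosen so that the effect is visible with a handful of integers.

```python
STEP = 4  # assumed quantizer step; larger steps mean larger prediction error


def dpcm_encode(samples, step=STEP):
    """Store the first sample exactly, then quantized differences to the
    previously reconstructed value."""
    first = samples[0]
    codes = []
    recon = first
    for s in samples[1:]:
        code = (s - recon + step // 2) // step  # nearest multiple of step
        codes.append(code)
        recon += code * step
    return first, codes


def dpcm_decode(first, codes, step=STEP):
    """Rebuild the waveform in the order in which the codes were produced."""
    wave = [first]
    for code in codes:
        wave.append(wave[-1] + code * step)
    return wave


original = [0, 3, 7, 12, 18, 21, 23]

# Forward order: the start point is exact, the end point carries error.
forward = dpcm_decode(*dpcm_encode(original))

# Reverse order: code the samples back to front, decode, then flip again.
backward = dpcm_decode(*dpcm_encode(original[::-1]))[::-1]

print(forward)   # first sample matches the original
print(backward)  # last sample matches the original
```

In this toy setup the forward-coded waveform is exact at its start point and the reverse-coded waveform is exact at its end point, which is the property that lets the end of one sound meet the start of the next without a gap.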

In the present embodiment, transform coding is used as the coding scheme for the intermediate group, but the present invention is not limited to this; for example, a predictive coding scheme may be used instead. Further, even when the speech waveform is divided into four or more groups so that there are two or more intermediate groups, the same effect as in the first embodiment can be obtained by encoding the intermediate groups with the above coding scheme.

The first and second embodiments have been described in terms of hardware circuits, but the present invention is not limited to this. For example, the invention can also be realized by recording a program that implements the above functions on a computer-readable storage medium and causing a computer system to execute the program recorded on that medium.

FIG. 1 is a configuration diagram of the speech encoding apparatus according to the first embodiment.
FIG. 2 is a configuration diagram of the speech decoding apparatus according to the first embodiment.
FIG. 3 is an explanatory diagram of the crossfade processing in the first embodiment.
FIG. 4 is a configuration diagram of the speech encoding apparatus according to the second embodiment.
FIG. 5 is a configuration diagram of the speech decoding apparatus according to the second embodiment.
FIG. 6 is an explanatory diagram of the crossfade processing according to the second embodiment.

Explanation of symbols

1 Storage unit
2 Dividing unit
3 Encoder
3a Inverter
3b Predictive code encoder
3c Transform code encoder
4 Determination unit
5 Decoder
5a Predictive code decoder
5b Inverter
5c Transform code decoder
6 Synthesis unit
6a Multiplier
6b Adder
7 Storage unit
8 Selector

Claims

A speech encoding apparatus that encodes a series of speech waveforms, comprising:
a dividing unit that divides the speech waveform into a plurality of groups in time series; and
an encoder that encodes the speech waveform with each of the plurality of groups as a processing unit, performs predictive coding in forward time-series order on the speech waveform in a start group including the start point of the speech waveform, and performs predictive coding in reverse time-series order on the speech waveform in an end group including the end point of the speech waveform.

The speech encoding apparatus according to claim 1, wherein the encoder comprises:
an inverter that inverts the speech waveform in the end group, which is input in forward time-series order, so that it is in reverse time-series order; and
a predictive code encoder that performs predictive coding, in the order of input to itself, on the start group input in forward time-series order without passing through the inverter and on the end group input in reverse time-series order through the inverter.

The speech encoding apparatus according to claim 1, wherein the encoder further comprises a transform code encoder that performs transform coding on the speech waveform in an intermediate group excluding the start group and the end group.

The speech encoding apparatus according to claim 1, wherein the predictive code encoder adds, to a codeword of the speech waveform, identification information indicating the time-series coding order within the group.

The speech encoding apparatus according to claim 1, wherein the predictive code encoder adds, to a codeword of the speech waveform, identification information specifying a group corresponding to the start group and identification information specifying a group corresponding to the end group.

The speech encoding apparatus according to claim 3, wherein the transform code encoder adds, to a codeword of the speech waveform, identification information specifying a group corresponding to the intermediate group.

The speech encoding apparatus according to any one of claims 1 to 6, wherein the dividing unit divides the speech waveform so that the speech waveform partially overlaps between groups adjacent to each other in time series.

A speech decoding apparatus that decodes codewords generated by encoding a series of speech waveforms in units of time-series groups, comprising:
a determination unit that determines, based on identification information added to a codeword, whether the group of the codeword to be decoded is a start group including the start point of the speech waveform and whether it is an end group including the end point of the speech waveform;
a decoder that decodes the codewords in units of groups, decoding by a predictive coding scheme in forward time-series order the codewords in a group determined to be the start group, and decoding by a predictive coding scheme in reverse time-series order the codewords in a group determined to be the end group; and
a synthesis unit that restores the series of speech waveforms by synthesizing, in time series, the decoded waveforms generated for each group by the decoder.

The speech decoding apparatus according to claim 8, wherein the decoder comprises:
a predictive code decoder that generates a decoded waveform for each group by decoding the codewords in the group by a predictive coding scheme in the order of input to itself; and
an inverter that inverts the decoded waveform in a group determined to be the end group so that the decoded waveform is reversed in time-series order.

The speech decoding apparatus according to claim 8 or 9, wherein
the determination unit determines whether the group of the codeword to be decoded is an intermediate group excluding the start group and the end group, and
the decoder further comprises a transform code decoder that decodes, by a transform coding scheme, the codewords in a group determined to be the intermediate group.

The speech decoding apparatus according to any one of claims 8 to 10, wherein
the determination unit determines, based on the identification information added to the codeword, whether the decoded waveform obtained by decoding the codeword overlaps in time series with another decoded waveform, and
the synthesis unit applies fade processing to the decoded waveform when the decoded waveform is determined to lie in a range that overlaps another decoded waveform adjacent to it in time series.

A speech encoding program that causes a computer to execute a speech encoding method for encoding a series of speech waveforms with a group as a processing unit, the method comprising:
a first step of dividing the speech waveform into a plurality of groups in time series; and
a second step of encoding the speech waveform with each of the plurality of groups as a processing unit, performing predictive coding in forward time-series order on the speech waveform in a start group including the start point of the speech waveform, and performing predictive coding in reverse time-series order on the speech waveform in an end group including the end point of the speech waveform.

A speech decoding program that causes a computer to execute a speech decoding method for decoding codewords generated by encoding a series of speech waveforms in units of time-series groups, the method comprising:
a first step of determining, based on identification information added to a codeword, whether the group of the codeword to be decoded is a start group including the start point of the speech waveform and whether it is an end group including the end point of the speech waveform;
a second step of decoding the codewords in units of groups, decoding by a predictive coding scheme in forward time-series order the codewords in a group determined to be the start group, and decoding by a predictive coding scheme in reverse time-series order the codewords in a group determined to be the end group; and
a third step of restoring the series of speech waveforms by synthesizing, in time series, the decoded waveforms decoded for each group.