JPH0573089A - Speech reproducing method

Info

Publication number: JPH0573089A
Application number: JP3237666A
Authority: JP
Inventors: Masayuki Misaki; 正之三崎; Ryoji Suzuki; 良二鈴木
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1991-09-18
Filing date: 1991-09-18
Publication date: 1993-03-26

Abstract

PURPOSE: To obtain reproduced speech that sounds natural even when the reproduction speed is varied, and in which processing distortion in the voice sections is reduced, by finding the ratio of the voice section length to the overall block length within a block of specified length, comparing that value with the desired time compression ratio, and selecting a compression method according to the result. CONSTITUTION: The ratio Rv of the total time length of the voice sections in one block of the input signal to the time length of the whole block is found and compared with the compression ratio Rc for time compression of the input signal. When Rc>=kRv (k: constant), only the silent sections are compressed. When Rc<kRv, on the other hand, independent compression ratios are set for the silent sections and the voice sections, and the time compression is performed. Because the compression (expansion) ratios for the silent sections and the voice sections are thus varied according to the value of Rv, speech close to natural speech can be reproduced.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a speech reproducing method that, when reproducing a recorded conversational speech signal or the like, allows the speech to be heard without difficulty and without any change in pitch even when the reproduction speed is changed from the normal speed.

[0002]

[Prior Art] Various devices with a function for varying the reproduction speed of a recorded speech signal have been proposed. The simplest examples are cue (fast-forward playback) and review (rewind playback) on an analog tape recorder, but because both the pitch and the speed of such high-speed playback differ from those of normal-speed speech, it is almost impossible to make out the content. Speech reproducing methods that use speed conversion processing to compensate so that the pitch does not change during high-speed or low-speed playback have also been proposed. A conventional speech reproducing method that performs speed conversion processing is described below with reference to the drawings.

[0003] FIG. 4 shows a flowchart of a conventional speech reproducing method. First, the time companding ratio R_CE is determined. Here, the time companding ratio R_CE is defined as follows.

[0004]

[Equation 1]

[0005] As is clear from (Equation 1), R_CE takes a value smaller than 1 when reproducing at a speed higher than normal, and a value larger than 1 when reproducing at a speed lower than normal. Speed conversion processing is applied to the input speech signal with R_CE as a parameter. The speed conversion processing changes only the speed while keeping the pitch the same as in normal reproduction. For details of speed conversion processing, see for example the section on the TDHS algorithm in "Digital Speech Processing" by Sadaoki Furui (Tokai University Press). When the processing is to be repeated, the method checks whether R_CE is to be changed, determines R_CE, and then performs the speed conversion processing. By repeating the above operations, the speech signal is reproduced at the reproduction speed desired by the user.

[0006]

[Problems to Be Solved by the Invention] In the above configuration, however, the reproduction speed is changed by speed conversion processing that expands or compresses the time axis uniformly, regardless of the state of the input signal. The pitch of the reproduced sound is therefore preserved, but even when the reproduction speed is changed greatly from the normal speed, the time axis is scaled uniformly with no distinction between silent sections and voice sections, so the reproduced sound is perceived as unnatural and unlike the way humans actually speak. In addition, processing distortion due to the speed conversion processing is always added to the voice sections, which carry the information.

[0007] The present invention solves the above problems, and its object is to provide a speech reproducing method for obtaining reproduced sound that can be heard as utterance that is as natural as possible even when the reproduction speed is changed, and in which processing distortion in the voice sections is reduced.

[0008]

[Means for Solving the Problems] To achieve this object, the speech reproducing method of the present invention takes into account the ratio R_V of the voice section length to the block length within one block, and performs time compression (time expansion) with compression ratios (expansion ratios) set separately for the silent sections and the voice sections. The time compression ratio (time expansion ratio) is also compared with R_V so that compression (expansion) is concentrated on the silent sections as far as possible, and the voice sections are compressed (expanded) only when the degree of time compression (time expansion) is large.

[0009]

[Operation] With this method, the compression ratios (expansion ratios) for the silent sections and the voice sections in a block are each determined according to R_V. Here, R_V = 1.0 means the block is 100% voice section, and R_V = 0.0 means it is 100% silent section. First, in the case of time compression, when the block contains many silent sections and k·R_V is smaller than the time compression ratio R_C, the desired time compression ratio can be achieved just by compressing the silent sections. When the block contains few silent sections and k·R_V is larger than R_C, the desired time compression ratio cannot be achieved by compressing the silent sections alone, so the voice sections are compressed as well. Here the constant k is a value greater than 1 and determines the margin for silent section compression.

[0010] Next, in the case of time expansion, when the block contains few silent sections and k/R_V is smaller than the time expansion ratio R_E, the desired time expansion ratio is achieved just by expanding the silent sections. When the block contains many silent sections and k/R_V is larger than R_E, there is a limit to achieving the desired time expansion ratio by expanding the silent sections alone, so the voice sections are expanded as well. Here the constant k is a value smaller than 1 and determines the margin for silent section expansion.
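
The two selection rules described in paragraphs [0009] and [0010] can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the returned labels, and the example values of k (1.2 and 0.8) are assumptions chosen for the sketch. Only the comparisons R_C ≥ k·R_V and R_E ≥ k/R_V come from the text.

```python
def select_compression_mode(r_c: float, r_v: float, k: float = 1.2) -> str:
    """Pick the compression strategy for one block.

    r_c: target time compression ratio R_C
    r_v: voice ratio R_V of the block (0.0 = all silence, 1.0 = all voice)
    k:   margin constant, k > 1 (1.2 is an assumed example value)
    """
    if k <= 1.0:
        raise ValueError("k must be greater than 1 for compression")
    # R_C >= k*R_V: the block has enough silence to reach the target
    # without touching the voice sections.
    return "silence_only" if r_c >= k * r_v else "silence_and_voice"


def select_expansion_mode(r_e: float, r_v: float, k: float = 0.8) -> str:
    """Pick the expansion strategy for one block.

    r_e: target time expansion ratio R_E
    r_v: voice ratio R_V of the block
    k:   margin constant, 0 < k < 1 (0.8 is an assumed example value)
    """
    if not 0.0 < k < 1.0:
        raise ValueError("k must be between 0 and 1 for expansion")
    if r_v == 0.0:
        # k / R_V is unbounded for an all-silence block, so R_E >= k/R_V
        # can never hold.
        return "silence_and_voice"
    return "silence_only" if r_e >= k / r_v else "silence_and_voice"


print(select_compression_mode(0.5, 0.2))  # silence_only
print(select_compression_mode(0.5, 0.8))  # silence_and_voice
print(select_expansion_mode(1.5, 0.9))    # silence_only
print(select_expansion_mode(1.5, 0.3))    # silence_and_voice
```

Keeping the decision separate from the ratio calculation makes it easy to try different margins k without changing the rest of the block processing.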

[0011] With this control, the reproduced sound can be heard as utterance that is as natural as possible even when the reproduction speed is changed, and because little processing is applied to the voice sections, reproduced sound with reduced processing distortion is obtained.

[0012]

[Embodiment 1] A first embodiment of the present invention is described below with reference to the drawings.

[0013] FIG. 1 shows a flowchart of the first embodiment of the present invention. The symbols used in FIG. 1 are as follows. R_V is the ratio of the total length of the voice sections in a block to the length of the whole block, as shown in (Equation 2) below.

[0014]

[Equation 2]
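
As an illustration of how R_V might be obtained for one block, the sketch below labels fixed-length frames as voice or silence and divides the voiced length by the block length. The patent does not say how voice and silence are distinguished, so the energy threshold detector, the function name `voice_ratio`, and the parameters `frame_len` and `threshold` are all assumptions of this sketch.

```python
def voice_ratio(samples, frame_len=160, threshold=0.01):
    """Return R_V = (total voice length) / (block length) for one block.

    A frame counts as voice when its mean squared amplitude exceeds
    `threshold`. This simple energy detector is an assumption of the
    sketch, not the detection method of the patent.
    """
    if not samples:
        raise ValueError("block must not be empty")
    voiced = 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            voiced += len(frame)
    return voiced / len(samples)


# Example block: 320 samples of silence followed by 480 samples of "voice".
block = [0.0] * 320 + [0.5] * 480
print(voice_ratio(block))  # 0.6
```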

[0015] R_C is the compression ratio by which the input signal is time-compressed, as shown in (Equation 3) below.

[0016]

[Equation 3]

[0017] Let T_B be the time length of the whole block, T_S the total time length of the silent sections, T_V the total time length of the voice sections, C_S the compression ratio of the time compression applied to the silent sections, and C_V the compression ratio of the time compression applied to the voice sections. Expressing R_C in terms of T_B, T_S, T_V, C_S and C_V gives (Equation 4) below.

[0018]

[Equation 4]

[0019] The relation between T_B, T_S and T_V is given by (Equation 5) below.

[0020]

[Equation 5]
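
Equations 2, 4 and 5 appear only as placeholders in this text. One reading that is consistent with the definitions in paragraphs [0013] to [0019] is given below. The exact form printed in the patent may differ, and the last line is a consequence derived here under that reading, not a quoted equation.

```latex
\begin{align}
R_V &= \frac{T_V}{T_B} && \text{(voice ratio, cf. Equation 2)} \\
R_C &= \frac{C_S\,T_S + C_V\,T_V}{T_B} && \text{(cf. Equation 4)} \\
T_B &= T_S + T_V && \text{(cf. Equation 5)} \\
C_V = 1 \;\Rightarrow\; C_S &= \frac{R_C\,T_B - T_V}{T_S} = \frac{R_C - R_V}{1 - R_V}
\end{align}
```

Under this reading, the condition R_C ≥ k·R_V with k > 1 implies R_C > R_V whenever the block contains any voice, so the silence-only ratio C_S stays positive.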

[0021] Next, the flowchart of FIG. 1 is described. First, one block of the input signal is read into a buffer memory. The signal in this block is then classified into voice sections and silent sections, and R_V is obtained. Next, k·R_V, the product of R_V and a constant k (k > 1), is compared with R_C, and the values of C_S and C_V for the cases R_C ≥ k·R_V and R_C < k·R_V are calculated as shown in (Table 1) below.

[0022]

[Table 1]

[0023] When the value of R_V is small, that is, when the proportion of silent sections in the block is large, the condition R_C ≥ k·R_V is easily satisfied; the voice sections are not time-compressed, and the target time compression ratio is achieved by time-compressing the silent sections alone. Conversely, when the block contains few silent sections, the condition R_C < k·R_V is easily satisfied, and the silent sections or the voice sections are time-compressed. This case is further divided into two according to the value of R_V. FIGS. 2(a) and 2(b) plot how C_S and C_V change with R_V. As is clear from these figures, as R_V goes from 0 to 1, compression is applied first to the silent sections only, then to both the silent sections and the voice sections, and near 1 to the voice sections only.
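
Table 1 is present only as a placeholder, so the exact expressions for C_S and C_V cannot be quoted. The sketch below shows one way the silence-only branch could be computed, assuming that Equation 4 has the form R_C = (C_S·T_S + C_V·T_V)/T_B, which is equivalent to R_C = C_S·(1 − R_V) + C_V·R_V. The function name, the default k = 1.2, and in particular the uniform fallback used for the mixed branch are assumptions of this sketch, not values from the patent.

```python
def compression_ratios(r_c, r_v, k=1.2):
    """Return (C_S, C_V) for one block.

    The result satisfies C_S*(1 - R_V) + C_V*R_V == R_C, the assumed
    form of Equation 4 after dividing by the block length T_B.
    """
    if not 0.0 < r_c <= 1.0:
        raise ValueError("R_C must satisfy 0 < R_C <= 1")
    if not 0.0 <= r_v <= 1.0:
        raise ValueError("R_V must satisfy 0 <= R_V <= 1")
    if k <= 1.0:
        raise ValueError("k must be greater than 1")

    if r_c >= k * r_v:
        # Silence-only case from the text: voice is left untouched
        # (C_V = 1) and C_S is solved from the relation above.
        # Here R_V <= R_C / k < 1, so the denominator is never zero.
        return (r_c - r_v) / (1.0 - r_v), 1.0

    # Mixed case: the patent's Table 1 is not available, so as a
    # placeholder (assumption) both section types get the same ratio.
    return r_c, r_c


c_s, c_v = compression_ratios(0.5, 0.2)
print(c_s, c_v)
print(compression_ratios(0.5, 0.8))
```

For R_C = 0.5 and R_V = 0.2 the silence-only branch applies and C_S comes out at about 0.375 with C_V = 1, so the voice sections pass through unprocessed.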

[0024] After C_S and C_V have been determined as shown in (Table 1), the data in the block is processed. The silent sections are compressed according to the compression ratio C_S, and the voice sections are compressed according to the compression ratio C_V. This completes the processing of one block, and the method then checks whether the same processing is to be performed on the next block.
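
The per-section step can be sketched as a mapping from each section to its target length. The segment representation (a list of `(is_voice, length)` pairs) and the function name are assumptions of this sketch, and it only computes target lengths. How the samples inside each section are actually shortened (for example by a pitch-preserving method such as the TDHS algorithm mentioned for the prior art) is outside its scope.

```python
def compressed_lengths(segments, c_s, c_v):
    """Map each (is_voice, length) segment to its length after compression.

    segments: list of (is_voice: bool, length_in_samples: int) pairs in
              playback order (a hypothetical representation of one block)
    c_s:      compression ratio applied to silent sections
    c_v:      compression ratio applied to voice sections
    """
    return [
        (is_voice, round(length * (c_v if is_voice else c_s)))
        for is_voice, length in segments
    ]


# One block of 2000 samples with R_V = 800 / 2000 = 0.4.
block = [(False, 400), (True, 800), (False, 800)]
out = compressed_lengths(block, c_s=0.25, c_v=1.0)
print(out)  # [(False, 100), (True, 800), (False, 200)]
print(sum(n for _, n in out) / sum(n for _, n in block))  # 0.55
```

In this example only the silent sections shrink, and the block as a whole ends up at 55% of its original length.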

[0025] As described above, in this embodiment the compression ratios for the silent sections and the voice sections are changed according to the value of R_V in the current block. In a block with many silent sections, the voice sections are not compressed and only the silent sections are compressed to bring the whole block to the desired compression ratio R_C, so no processing distortion is introduced into the voice sections and natural high-speed reproduction can be achieved.

[0026] A second embodiment of the present invention is described below with reference to the drawings. FIG. 3 shows a flowchart of the second embodiment of the present invention. The symbols used in FIG. 3 are as follows. R_E is the expansion ratio by which the input signal is time-expanded, as shown in (Equation 6) below.

[0027]

[Equation 6]

[0028] R_V, T_B, T_S and T_V are the same as in the first embodiment. Let E_S be the expansion ratio of the time expansion applied to the silent sections and E_V the expansion ratio of the time expansion applied to the voice sections. Expressing R_E in terms of T_B, T_S, T_V, E_S and E_V gives (Equation 7) below.

[0029]

[Equation 7]

[0030] Next, the flowchart of FIG. 3 is described. First, one block of the input signal is read into a buffer memory. The signal in this block is then classified into voice sections and silent sections, and R_V is obtained. Next, k/R_V, the product of the reciprocal of R_V and a constant k (k < 1), is compared with R_E, and the values of E_S and E_V for the cases R_E ≥ k/R_V and R_E < k/R_V are calculated as shown in (Table 2) below.

[0031]

[Table 2]

[0032] When the value of R_V is large, that is, when the proportion of silent sections in the block is small, the condition R_E ≥ k/R_V is easily satisfied; the voice sections are not time-expanded, and the target time expansion ratio is achieved by time-expanding the silent sections alone. Conversely, when the block contains many silent sections, the condition R_E < k/R_V is easily satisfied, and time expansion of the silent sections and the voice sections proceeds in parallel. After E_S and E_V have been determined as shown in (Table 2), the data in the block is processed. The silent sections are expanded according to the expansion ratio E_S, and the voice sections are expanded according to the expansion ratio E_V. This completes the processing of one block, and the method then checks whether the same processing is to be performed on the next block.
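
The expansion case can be sketched in the same spirit. Table 2 and Equation 7 are present only as placeholders, so the sketch assumes that Equation 7 has the form R_E = (E_S·T_S + E_V·T_V)/T_B, which is equivalent to R_E = E_S·(1 − R_V) + E_V·R_V. The function name, the default k = 0.8, the handling of blocks with no silence or no voice, and the uniform fallback for the mixed branch are all assumptions of this sketch.

```python
def expansion_ratios(r_e, r_v, k=0.8):
    """Return (E_S, E_V) for one block.

    The result satisfies E_S*(1 - R_V) + E_V*R_V == R_E, the assumed
    form of Equation 7 after dividing by the block length T_B.
    """
    if r_e < 1.0:
        raise ValueError("R_E must be at least 1")
    if not 0.0 <= r_v <= 1.0:
        raise ValueError("R_V must satisfy 0 <= R_V <= 1")
    if not 0.0 < k < 1.0:
        raise ValueError("k must be between 0 and 1")

    # Decision rule from the text: silence-only when R_E >= k / R_V.
    # An all-silence block (R_V == 0) can never satisfy it.
    silence_only = r_v > 0.0 and r_e >= k / r_v

    if silence_only and r_v < 1.0:
        # Voice is left untouched (E_V = 1); E_S is solved from the
        # relation above.
        return (r_e - r_v) / (1.0 - r_v), 1.0

    # Mixed case, or a block with no silence to stretch: Table 2 is not
    # available, so as a placeholder (assumption) use a uniform ratio.
    return r_e, r_e


e_s, e_v = expansion_ratios(1.5, 0.9)
print(e_s, e_v)
print(expansion_ratios(1.5, 0.3))
```

With R_E = 1.5 and R_V = 0.9 the silence-only branch applies, and the short silent part of the block has to be stretched by a factor of about 6 to reach the target while the voice sections are left as they are.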

[0033] As described above, in this embodiment the expansion ratios for the silent sections and the voice sections are changed according to the value of R_V in the current block. In a block with few silent sections, the voice sections are not expanded and only the silent sections are expanded to bring the whole block to the desired expansion ratio R_E, so no processing distortion is introduced into the voice sections and natural low-speed reproduction can be achieved.

[0034]

[Effects of the Invention] As described in detail above, the present invention obtains the ratio R_V of the voice section length to the whole block length within a block of predetermined length, compares this value with the desired compression ratio (expansion ratio), and according to the result selects between (1) performing the overall time compression (time expansion) by compressing (expanding) only the silent sections and (2) performing the time compression (time expansion) with independent compression ratios (expansion ratios) set for the silent sections and the voice sections, and reproduces the speech accordingly. With this method, in high-speed reproduction, time compression of a block containing many silent sections is achieved by compressing the silent sections alone, and in low-speed reproduction, time expansion of a block containing few silent sections is achieved by expanding the silent sections alone. Consequently, when no processing is applied to the voice sections of the original signal, processing distortion is reduced. Furthermore, by removing unnecessary silent sections during high-speed reproduction and inserting silent sections that give the listener time to think during low-speed reproduction, the invention provides a speech reproducing method with which people can listen to speech naturally.

[Brief description of drawings]

FIG. 1 is a flowchart of the speech reproducing method according to the first embodiment of the present invention.

FIG. 2(a) is a graph showing the relationship between R_V and C_S in the first embodiment of the present invention. FIG. 2(b) is a graph showing the relationship between R_V and C_V in the first embodiment of the present invention.

FIG. 3 is a flowchart of the speech reproducing method according to the second embodiment of the present invention.

FIG. 4 is a flowchart of a conventional speech reproducing method.

Claims

[Claims]

1. A speech reproducing method in which voice sections and silent sections are determined within a range of predetermined block length and the ratio R_V of the voice section length to the whole block length is obtained; when k·R_V (k being a constant of 1 or more) is smaller than or equal to a target time compression ratio R_C, the time length is compressed by compressing only the silent sections; when k·R_V is larger than R_C, the time length is compressed by compressing the silent sections or the voice sections at predetermined compression ratios; and the above processing is repeated to reproduce the speech at high speed.

2. A speech reproducing method in which voice sections and silent sections are determined within a range of predetermined block length and the ratio R_V of the voice section length to the whole block length is obtained; when k/R_V (k being a constant of 1 or less) is smaller than or equal to a target time expansion ratio R_E, the time length is expanded by expanding the silent sections; when k/R_V is larger than R_E, the time length is expanded by expanding the silent sections or the voice sections at predetermined expansion ratios; and the above processing is repeated to reproduce the speech at low speed.