JP3249567B2 - Method and apparatus for converting speech speed - Google Patents

Method and apparatus for converting speech speed

Info

Publication number
JP3249567B2
Authority
JP
Japan
Prior art keywords
section
speech
voice
voiced
speed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP05178792A
Other languages
Japanese (ja)
Other versions
JPH05257490A (en)
Inventor
龍 池沢
章 中村
栄一 宮坂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Japan Broadcasting Corp
Original Assignee
Japan Broadcasting Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Japan Broadcasting Corp
Priority to JP05178792A (patent JP3249567B2)
Priority to US07/950,411 (patent US5305420A)
Priority to EP96119237A (patent EP0766229B1)
Priority to DK96119237T (patent DK0766229T3)
Priority to EP92116292A (patent EP0534410B1)
Priority to DK92116292T (patent DK0534410T3)
Publication of JPH05257490A
Application granted
Publication of JP3249567B2
Anticipated expiration
Legal status: Expired - Lifetime (current)

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech speed conversion method and apparatus suited to helping hearing-impaired persons, elderly persons, and similar listeners follow speech.

[0002]

[Summary of the Invention] The present invention relates to a speech speed conversion method and apparatus suited to speech listening by hearing-impaired persons, elderly persons, and the like. When the rate at which the speech being listened to is uttered (the speech speed) is slowed, the silent sections between sentences are shortened to the minimum that causes no audible unnaturalness, and the speech speed is varied according to a fixed rule. The aim is to convert the speech into speech that is slow overall, easy to listen to, and of good quality, while keeping the utterance time equal to that of the original speech.

[0003]

[Prior Art] Techniques for converting speech speed while preserving quality are themselves still under development, and no technique that takes into account the "deviation" from real time (the time frame) has yet been developed.

[0004]

[Problem to Be Solved by the Invention] Uniformly slowing only the speech speed can make speech far easier to follow, particularly for elderly and hearing-impaired listeners, but this operation inevitably lengthens the utterance time. In broadcasts, recorded readings on cassette, and the like, however, the original speech is spoken so as to fit within a predetermined time, so stretching such speech may cause it to exceed that time limit. Furthermore, where audio and video are presented in synchronization, as in television, stretching only the audio produces a temporal "deviation" from the video, which may adversely affect comprehension.

[0005] An object of the present invention is to solve the problems caused by the temporal "deviation" described above by providing a speech speed conversion method and apparatus that moderately slow the semantically important parts of an utterance and, conversely, speed up the remaining parts, thereby converting the speech into speech that is slow overall and easy to listen to without substantially lengthening the utterance time.

[0006]

[Means for Solving the Problem] To achieve the above object, the speech speed conversion method of claim 1 is characterized in that, when the rate at which the speech being listened to is uttered (hereinafter, the speech speed) is slowed, the speech speed is reduced where the pitch is high and increased where the pitch is low, in accordance with changes in the pitch (fundamental frequency) of the speech.

[0007] To achieve the above object, the speech speed conversion method of claim 2 is characterized in that, when the speech speed is slowed, the section from one voice onset to the next voice onset is taken as a unit, a slow speech speed is set at the start point of the section, and the speech speed is gradually increased toward the end point of the section, following the broad change in the fundamental frequency of the speech.

[0008] Here, when the speech speed is slowed, the silent sections between sentences may additionally be shortened to a time that is as short as possible within a range, determined in advance by experiment, that causes no audible unnaturalness. That time may be in the range of 862 ms to approximately 1000 ms.

[0009] To achieve the above object, the speech speed conversion apparatus of claim 5 comprises: speech identification means for classifying a speech signal as voiced, unvoiced, or silent; silent section determination means for determining whether a silent section identified by the speech identification means is a silent section between sentences; silent section shortening means for shortening a silent section that the silent section determination means has determined to lie between sentences to a time that is as short as possible within a range, determined in advance by experiment, that causes no audible unnaturalness; voiced section determination means for determining whether a voiced section identified by the speech identification means begins a voice onset; and voiced section expansion means for performing, when the voiced section determination means determines a voice onset, speech speed conversion processing that takes the section from that voice onset to the next voice onset as a unit, sets a slow speech speed at the start point of the section, and gradually increases the speech speed toward the end point, following the broad change in the fundamental frequency of the speech.

[0010]

[Operation] The present invention is characterized in that, when the rate at which the speech being listened to is uttered (the speech speed) is slowed, the speech speed is reduced where the pitch is high and increased where the pitch is low, in accordance with changes in the pitch (fundamental frequency) of the speech. The present invention is also characterized in that, when the speech speed is slowed, the section from one voice onset to the next is taken as a unit, a slow speech speed is set at the start point of the section, and the speech speed is gradually increased toward the end point, following the broad change in the fundamental frequency of the speech.

[0011] Furthermore, the present invention focuses on the silent sections between sentences and shortens them to a time that is as short as possible within a range, determined in advance by experiment, that causes no audible unnaturalness. As one example, that time is taken to be in the range of 862 ms to approximately 1000 ms.

[0012] Therefore, according to the present invention, a listener can hear slow, easy-to-follow speech that matches his or her preference within the real-time frame, without the utterance time being lengthened.

[0013]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings.

[0014] (1) Apparatus configuration
FIG. 1 shows the configuration of an apparatus according to one embodiment of the present invention. The speech input circuit 1 is a circuit of ordinary configuration for inputting a speech signal and includes, as required, for example a microphone, a tone control circuit, an analog-to-digital converter, a speech storage and playback (recording) circuit, a speech storage medium (for example, an IC memory, hard disk, floppy disk, or VTR), and an interface circuit. The CPU (central processing unit) 2 controls the apparatus as a whole and performs its computations; a known one-chip microcomputer or personal computer, for example, can be used. The program memory (PROM) 3 stores in advance the control procedure (program) according to the present invention that the CPU 2 executes, as shown in FIG. 2, together with tables, constants, and the like.

[0015] The input buffer 4 and the processing buffer 5 are allocated in a RAM (random access memory), not shown, that the CPU 2 uses as a work area. The digital speech signal from the speech input circuit 1 is stored temporarily in the input buffer 4 frame by frame, as described later, and the speech signal held in the input buffer 4 is then stored temporarily in the processing buffer 5 segment by segment, as described later. The file 6 is a memory that stores the speech signal after the voiced section expansion and silent section shortening of the present invention have been applied; besides the RAM mentioned above, a speech storage medium such as an IC memory or floppy disk can be used.

[0016] The speech output circuit 7 is a circuit of ordinary configuration for outputting the speech signal in the file 6 to the outside and includes, as required, for example an interface circuit, a digital-to-analog converter, a loudspeaker, and a recording device (or broadcasting equipment). It is of course also possible to implement the entire procedure of FIG. 2, described later, in hardware using known techniques and to build the apparatus as a dedicated machine.

[0017] (2) Operation example
FIG. 2 shows the operating procedure of one embodiment of the present invention. In this embodiment, when the rate at which the speech being listened to is uttered (the speech speed) is slowed, silent sections are shortened to the minimum that causes no audible unnaturalness. In addition, noting that the semantically important parts of an utterance are usually where the pitch (fundamental frequency) is high, and that the pitch is usually high at a voice onset, the section from one voice onset to the next is taken as a unit, a slow speech speed is set at its start point, and the speech speed is gradually increased toward the end point, following the broad change in the fundamental frequency of the speech.

[0018] Step S1: First, the input speech signal from the speech input circuit 1 is cut into fixed-length portions called frames, which are stored in the input buffer 4. In this embodiment the frame length is, for example, 3.3 ms.
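As a rough illustration of the framing in step S1, the sketch below cuts a sample sequence into consecutive fixed-length frames. The function name `split_into_frames`, the sampling rate, and the handling of a short final frame are assumptions for illustration; the text specifies only the example frame length of 3.3 ms.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=3.3):
    """Cut samples into consecutive frames of about frame_ms milliseconds.

    The last frame may be shorter than the others (an assumption; the
    source does not say how a trailing partial frame is handled).
    """
    frame_len = max(1, round(sample_rate_hz * frame_ms / 1000.0))
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]


# Hypothetical input: 100 samples at 10 kHz gives 33-sample frames.
frames = split_into_frames(list(range(100)), sample_rate_hz=10_000)
```

Each frame would then be passed to the voiced/unvoiced/silent decision of step S2.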

[0019] Step S2: Each frame is classified as voiced, unvoiced, or silent. The known autocorrelation method and zero-crossing method, for example, can be used for this decision; other methods may of course be used. Input sounds other than the voiced and unvoiced sounds uttered by a person (for example, low-level noise and background sound) are in principle treated as silence.
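The decision in step S2 could be sketched as below, combining an energy check, a zero-crossing rate, and a normalized autocorrelation peak. This is only an illustration: every threshold, the function name `classify_frame`, and the use of an analysis window longer than one 3.3 ms frame (so that a pitch period fits inside it) are assumptions, not values from the text.

```python
import math


def classify_frame(frame, silence_energy=1e-4, zcr_limit=0.3,
                   periodicity_limit=0.5, min_lag=20):
    """Label one analysis window as "silent", "unvoiced", or "voiced".

    All thresholds are illustrative assumptions, not values from the source.
    """
    n = len(frame)
    r0 = sum(x * x for x in frame)            # autocorrelation at lag 0
    if n < 2 or r0 / n < silence_energy:
        return "silent"                       # low-level noise counts as silence

    # Zero-crossing rate: high for noise-like (unvoiced) sounds.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (n - 1)

    # Peak of the normalized autocorrelation: high for periodic (voiced) sounds.
    peak = 0.0
    for lag in range(min_lag, n // 2):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        peak = max(peak, r / r0)

    if zcr < zcr_limit and peak > periodicity_limit:
        return "voiced"
    return "unvoiced"


# Hypothetical 40 ms window of a 100 Hz tone sampled at 8 kHz.
sine = [0.5 * math.sin(2 * math.pi * 100 * i / 8000) for i in range(320)]
label = classify_frame(sine)
```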

[0020] Step S3: If the current frame is of the same type as the previous frame, the process returns to step S1; if the type differs, for example if it changes from voiced to unvoiced, the process proceeds to the subsequent steps. As a result, speech of a single type (one section) accumulates in the input buffer 4.
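The effect of step S3, collecting consecutive frames of one type into a section, can be sketched offline as a run-length grouping of frame labels. The function name `group_sections` and the tuple layout are assumptions for illustration; the procedure in the text works incrementally on the input buffer rather than on a complete label list.

```python
from itertools import groupby


def group_sections(frame_labels):
    """Merge consecutive frames with the same label into sections.

    Returns a list of (label, first_frame_index, frame_count) tuples.
    """
    sections = []
    start = 0
    for label, run in groupby(frame_labels):
        count = sum(1 for _ in run)
        sections.append((label, start, count))
        start += count
    return sections


labels = ["silent"] * 3 + ["unvoiced"] * 2 + ["voiced"] * 4
sections = group_sections(labels)
# sections == [("silent", 0, 3), ("unvoiced", 3, 2), ("voiced", 5, 4)]
```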

[0021] Step S4: The threshold values Th1, Th2, and Th3, described later, are set from the average number of morae uttered per second. A mora corresponds to the length of one syllable containing a short vowel; in Japanese it corresponds roughly to one kana character (two characters for a contracted sound, yōon). Step S4 may be performed only at the initial stage, or at predetermined intervals.

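[0022] Step S5: A section that begins with an unvoiced or silent portion and ends with a voiced portion is defined as one block (B_n: n = 1, 2, ...). Within a block, the signal is divided, according to the decision of step S2, into three kinds of section: silent sections (a_n), unvoiced sections (b_n), and voiced sections (c_n), and each section is sent to the corresponding processing path described below. The time of the boundary between b_1 and c_1 is written t_{1,s}, and the first voice onset is denoted α_1 (see FIG. 3).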
…). In this block, it is roughly classified into a silent section (a n ), an unvoiced section (b n ), and a voiced section (C n ) according to the determination in step S2, and each section is sent to the following processing system. Can be The time at the boundary between b 1 and c 1 is expressed as t 1, s, and the first voice is α1 (see FIG. 3).

[0023] Step S6: As shown in FIG. 3, the time interval T_n between the start point t_{n,s} of the n-th voiced section and the end point t_{n-1,e} of the preceding voiced section is computed: T_n = t_{n,s} − t_{n-1,e}.

[0024] Step S7: T_n is compared with the threshold value Th1 used to detect a voice onset. If T_n exceeds Th1, the time t_{n,s} is judged to be a voice onset α_m (see FIG. 3) and the process proceeds to step S8. If there is no preceding voiced section at the start of processing, the process jumps to step S11, described later.
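Steps S6 and S7 amount to measuring the gap between successive voiced sections and comparing it with Th1. A minimal sketch follows; the function name `find_voice_onsets` and the list-of-pairs input are assumptions, and the default of 350 ms is the Th1 value quoted in the experimental example of this document.

```python
def find_voice_onsets(voiced_sections, th1_ms=350):
    """Return the start times (ms) judged to be voice onsets.

    `voiced_sections` is a list of (start_ms, end_ms) pairs in time order.
    The first voiced section is always treated as an onset; a later one is
    an onset when its gap T_n from the previous voiced section exceeds Th1.
    """
    onsets = []
    for n, (start, _end) in enumerate(voiced_sections):
        if n == 0:
            onsets.append(start)
            continue
        gap = start - voiced_sections[n - 1][1]   # T_n = t_{n,s} - t_{n-1,e}
        if gap > th1_ms:
            onsets.append(start)
    return onsets


# Gaps between sections are 80 ms, 400 ms, and 50 ms.
onsets = find_voice_onsets([(100, 300), (380, 600), (1000, 1300), (1350, 1500)])
# onsets == [100, 1000]
```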

[0025] Step S8: The range from the preceding voice onset α_{m-1} to the end point t_{n-1,e} of the preceding voiced section is taken as one segment. In the example of FIG. 3, if T_5 = t_{6,s} − t_{5,e} > Th1, then the time t_{6,s} is the voice onset α_2, and the interval from t_{1,s} to t_{5,e} (of length t_{5,e} − t_{1,s}) is one segment. The segment stored so far in the processing buffer 5 through steps S11, S12, and S15 is then expanded: the expansion ratio r_s applied to the length of the voiced section at the start of the segment is set to a predetermined value in the range 1 ≤ r_s ≤ 2. The expansion ratio is reduced gradually toward the end of the segment so that the expansion ratio r_e of the voiced section at the end point satisfies 0.7 ≤ r_e ≤ 1. FIG. 4 shows an example of how the expansion ratios of the voiced sections belonging to segment 1 of FIG. 3 are obtained. The voiced section c_1 at the start of the segment is expanded to c_1′ = r_s · c_1, and c_2 becomes c_2′ = r_2 · c_2. The voiced section c_5 at the end of the segment becomes c_5′ = r_e · c_5, but since r_e ≤ 1 it is in practice shortened. The silent sections a_n and unvoiced sections b_n are left unprocessed and unchanged.

[0026] That is, since the speech at a voice onset (the first half of a unit) is generally often semantically important, moderately slowing the speech speed there as described above makes the speech easier to follow. The speech speed is varied using a suitable function f(t). In this embodiment, the cosine function shown in FIG. 4 was used as an example. In this case, f(t) is expressed by the following equation (1).

[0027]

[Equation 1]
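Equation (1) itself is not reproduced in this text. As a minimal sketch, one cosine-shaped schedule consistent with the description (ratio r_s at the segment start, falling smoothly to r_e at the segment end) is f(t) = r_e + (r_s − r_e)(1 + cos(π(t − t_start)/(t_end − t_start)))/2. This particular form, the function names, and the choice to evaluate f at the start of each voiced section are assumptions, not the patent's formula; the default ratios 1.2 and 0.92 are the values quoted in the experimental example.

```python
import math


def expansion_ratio(t, t_start, t_end, r_s=1.2, r_e=0.92):
    """Cosine-shaped expansion ratio: r_s at t_start, r_e at t_end."""
    if t_end <= t_start:
        return r_s
    x = min(max((t - t_start) / (t_end - t_start), 0.0), 1.0)
    return r_e + (r_s - r_e) * 0.5 * (1.0 + math.cos(math.pi * x))


def stretch_voiced_sections(voiced_sections, r_s=1.2, r_e=0.92):
    """Return new lengths c_n' = r_n * c_n for the voiced sections of one segment.

    `voiced_sections` is a list of (start_ms, end_ms) pairs in time order.
    The ratio is evaluated at each section's start time (an assumption), so
    the first section gets r_s and the last section gets r_e.
    """
    seg_start = voiced_sections[0][0]
    seg_end = voiced_sections[-1][0]   # start of the last voiced section
    return [
        expansion_ratio(start, seg_start, seg_end, r_s, r_e) * (end - start)
        for start, end in voiced_sections
    ]


new_lengths = stretch_voiced_sections([(0, 100), (200, 300), (400, 500)])
```

With these hypothetical inputs the three 100 ms voiced sections become roughly 120 ms, 106 ms, and 92 ms, so the early part of the segment is slowed and the final part is slightly compressed.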

[0028] Step S9: The speech data whose speech speed was converted in step S8 is written to the file 6.

[0029] Step S10: The processing buffer 5 is cleared.

[0030] Step S11: Processing reaches step S11 when T_n ≤ Th1 in step S7 or after step S10 has been executed. When the decision in step S7 is negative, the voiced section is judged to belong to the current unit and is stored in the processing buffer 5. When step S11 is reached via step S10, the voiced section at the voice onset is stored in the processing buffer 5. The input buffer 4 is then cleared for processing of the next speech data, and unless an instruction to end the processing has been issued (step S16), the process returns to step S1.

[0031] Step S12: Unvoiced sections are always transferred from the input buffer 4 to the processing buffer 5 and stored there. The input buffer 4 is then cleared, and the process returns to step S1 via step S16.

[0032] Step S13: When the section is a silent section, its length is compared with the threshold value Th2 used to detect a break between sentences (a sentence-ending period). If the silent section is longer than Th2, it is judged to be a break between sentences and the process proceeds to step S14; otherwise the process jumps to step S15.

[0033] Step S14: A silent section judged to be a sentence break is shortened by the following procedure.

[0034] To make the silent section as short as possible without audible unnaturalness, the duration of the shortened silent section is set to the threshold value Th3. Let a_n be the duration of the silent section, d_n the duration of the portion to be deleted, and e_n the duration of the silent section after deletion. Then, as shown in FIG. 5(B),

e_n = a_n − d_n ... (2)

Because an error in specifying the silence range during analysis may cause an unvoiced portion to be identified as part of a long silence, d_n is not deleted from the beginning of a_n; instead, as shown in FIG. 5(A), the portion d_n is deleted from around the center of a_n. In addition, a taper of several milliseconds is applied at both ends of d_n for smoothing, which prevents click noise. Since "silence" here includes, as noted above, sounds other than human speech, this smoothing is useful.

[0035] In equation (2), e_n may be set as a variable value in the range e_n ≥ Th3. If, however, e_n is set to a constant value close to Th3 (for example, 862 ms) to simplify processing, then from equation (2) d_n becomes a variable value that depends on a_n. The process then proceeds to step S15.
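Steps S13 and S14 can be sketched as follows: a silent section longer than Th2 has d_n = a_n − e_n samples removed from its center, and a short linear taper is applied on both sides of the cut. The function name `shorten_silence`, the linear taper shape, and the 5 ms taper length are assumptions for illustration (the text says only "several ms"); the 1000 ms and 862 ms defaults are the values quoted in this document.

```python
def shorten_silence(samples, sample_rate_hz, th2_ms=1000, target_ms=862, taper_ms=5):
    """Shorten a between-sentence silence to about target_ms (e_n).

    Sections no longer than th2_ms are returned unchanged (step S13).
    Otherwise d_n = a_n - e_n samples are removed from the center and a
    linear fade is applied on both sides of the cut to avoid clicks.
    """
    duration_ms = 1000.0 * len(samples) / sample_rate_hz
    target_len = round(sample_rate_hz * target_ms / 1000)
    delete_len = len(samples) - target_len            # d_n in samples
    if duration_ms <= th2_ms or delete_len <= 0:
        return list(samples)

    cut_start = (len(samples) - delete_len) // 2      # center the deleted span
    head = list(samples[:cut_start])
    tail = list(samples[cut_start + delete_len:])

    taper_len = min(round(sample_rate_hz * taper_ms / 1000), len(head), len(tail))
    for i in range(taper_len):
        gain = i / taper_len
        head[-1 - i] *= gain      # fade out toward the cut
        tail[i] *= gain           # fade back in after the cut
    return head + tail


# Hypothetical 1500 ms pause at 1 kHz, filled with a constant background level.
pause = [1.0] * 1500
shortened = shorten_silence(pause, sample_rate_hz=1000)
```

Here a_n = 1500 ms and e_n = 862 ms, so d_n = 638 ms is removed from the middle of the pause.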

[0036] Step S15: The silent section is stored in the processing buffer 5. The input buffer 4 is cleared, and the process returns to step S1 via step S16.

[0037] Step S16: When the speech input circuit 1 has no more speech signal data, or when an instruction to stop has been given, this processing routine ends and control returns to the main standby routine or the like.

[0038] (3) Experimental example
In the experiment for this embodiment, the method was applied to a 136-second news passage. The average speech speed was 9.6 morae per second, and on that basis the thresholds were set to Th1 = 350 ms and Th2 = Th3 = 1000 ms. Psychological experiments then gave two findings. For speech speed control, naturalness and intelligibility were rated highly when the speech speed at the start point of a unit (expressed as the expansion ratio of the voiced section length) was 1.0 to 1.3 times that of the original speech and the speech speed at the end point was 0.9 to 1.0 times. For silent section shortening, no audible unnaturalness arose as long as at least 862 ms of the shortened silent section (e_n) remained.

[0039] On the basis of these results, the speech speed was varied from a slow 1.2 times to a fast 0.92 times, and the long silent sections (the pauses between sentences) were shortened to 1200 ms. It was confirmed that the utterance times of the original speech and the converted speech then matched and that good speed-converted speech was obtained.

[0040] (4) Other embodiments
In step S8 (see FIG. 2) of the above embodiment, high sound quality can be maintained by processing the speech so that its pitch does not change even when the speech speed changes. A suitable processing method is, for example, the speech signal processing method disclosed in Japanese Patent Application No. 3-245960, "Speech Rate Control Type Hearing Aid Method and Apparatus".

[0041] In the above embodiment, the expansion ratios r_s and r_e of the voiced section length, the duration e_n of a silent section after deletion, and similar parameters were fixed at predetermined values, but they may instead be variables that the user sets to desired values with a dial, keyboard, or the like. This makes it easier, for example, to match a viewer's preference or to edit material so that it fits a broadcast time slot exactly.

[0042] Instead of the voiced section expansion processing of the above embodiment, the pitch (fundamental frequency) of the speech may be detected directly by a known pitch extraction method, and the speech speed may be reduced where the pitch is high and increased where the pitch is low, in accordance with the change in pitch.
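The text does not say how a detected pitch would be mapped to a speech speed. Purely as an illustration, the sketch below uses a clamped linear mapping from fundamental frequency to expansion ratio, so that high pitch gives a ratio above 1 (slower) and low pitch gives a ratio below 1 (faster). The function name `ratio_from_pitch`, the linear form, and the pitch range are all assumptions; the ratio limits reuse the 1.2 and 0.92 values from the experimental example.

```python
def ratio_from_pitch(f0_hz, f0_low_hz, f0_high_hz, r_fast=0.92, r_slow=1.2):
    """Map a pitch value to an expansion ratio by clamped linear interpolation.

    f0_low_hz maps to r_fast (speech sped up), f0_high_hz maps to r_slow
    (speech slowed down). The mapping itself is a hypothetical choice.
    """
    if f0_high_hz <= f0_low_hz:
        return 1.0
    x = min(max((f0_hz - f0_low_hz) / (f0_high_hz - f0_low_hz), 0.0), 1.0)
    return r_fast + (r_slow - r_fast) * x


# Hypothetical speaker whose pitch ranges from 100 Hz to 200 Hz.
ratios = [ratio_from_pitch(f0, 100.0, 200.0) for f0 in (200.0, 150.0, 100.0)]
```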

[0043]

[Effects of the Invention] As described above, according to the present invention, when the rate at which the speech being listened to is uttered (the speech speed) is slowed, the speech speed is reduced where the pitch is high and increased where the pitch is low, in accordance with changes in the pitch (fundamental frequency) of the speech. In addition, the section from one voice onset to the next is taken as a unit, a slow speech speed is set at the start point of the section, and the speech speed is gradually increased toward the end point, following the broad change in the fundamental frequency of the speech. Furthermore, the silent sections between sentences are shortened to a time that is as short as possible within a range, determined in advance by experiment, that causes no audible unnaturalness. As a result, the speech can be converted into speech that is slow overall, easy to listen to, and of good quality, while the utterance time is kept equal to that of the original speech.

[Brief Description of the Drawings]

[FIG. 1] A block diagram showing the apparatus configuration of one embodiment of the present invention.

[FIG. 2] A flowchart showing the processing of one embodiment of the present invention.

[FIG. 3] A diagram showing the segmentation of speech data by the processing of one embodiment of the present invention.

[FIG. 4] A timing chart showing the change in speech speed in one embodiment of the present invention.

[FIG. 5] A waveform diagram showing the original waveform (A) and the waveform (B) in which a long silent section between sentences has been shortened, according to the processing of one embodiment of the present invention.

[Explanation of Symbols]

1 speech input circuit
2 CPU
3 PROM
4 input buffer
5 processing buffer
6 file
7 speech output circuit
a_n silent section
b_n unvoiced section
c_n voiced section
c_n′ expanded voiced section
B_n section that begins with an unvoiced or silent portion and ends with a voiced portion
Th1 threshold value for detecting a voice onset
Th2 threshold value for detecting a break between sentences (a sentence-ending period)
r_s expansion ratio of the voiced section length at the start point
r_e expansion ratio of the voiced section length at the end point

Continuation of the front page
(56) References: JP-A-1-93795 (JP, A); IEICE Technical Report SP92-56, "A method for absorbing the time extension accompanying speech speed conversion"
(58) Field searched (Int. Cl. 7, DB name): G10L 21/04

Claims (5)

(57)【特許請求の範囲】(57) [Claims] 【請求項1】 受聴音声の発生する速さ(以下、話速と
いう)を遅くする際に、音声のピッチ(基本周波数)の
変化に応じて、ピッチの高いところでは話速を緩め、低
いところでは話速を早めることを特徴とする話速変換方
法。
When the speed at which a received voice is generated (hereinafter referred to as a voice speed) is reduced, the voice speed is reduced at a high pitch and the voice speed is lowered at a low pitch in accordance with a change in voice pitch (fundamental frequency). Is a speech speed conversion method characterized by increasing the speech speed.
【請求項2】 話速を遅くする際に、声立てと次の声立
ての区間を単位にしてこの区間の開始点ではゆっくりと
した話速を設定し、その終了点に向かって音声の基本周
波数の大まかな変化に追従して徐々に話速を早めること
を特徴とする話速変換方法。
2. When the speech speed is reduced, a slow speech speed is set at the start point of a section of a vocal utterance and the next vocal utterance unit, and the basic speech is set toward the end point. A speech speed conversion method characterized by gradually increasing the speech speed following a rough change in frequency.
【請求項3】 話速を遅くする際にさらに、文章間の無
音区間を予め実験で求めた聴感上違和感のない範囲でで
きるだけ短い時間に短縮することを特徴とする請求項1
または2に記載の話速変換方法。
3. The method according to claim 1, wherein, when the speech speed is reduced, a silent section between sentences is shortened to a time as short as possible within a range in which a sense of incongruity obtained by an experiment is obtained in advance.
Or the speech speed conversion method according to 2.
4. The speech speed conversion method according to claim 3, characterized in that the time that is as short as possible within the range, determined in advance by experiment, that does not sound unnatural to the listener is in the range of 862 ms to approximately 1000 ms.
5. A speech speed conversion apparatus characterized by comprising:
speech identification means for classifying a speech signal as voiced, unvoiced, or silent;
silent section determination means for determining whether a silent section identified by the speech identification means is a silent section between sentences;
silent section shortening means for, when the silent section determination means determines that the silent section is a silent section between sentences, shortening that silent section to a time that is as short as possible within a range, determined in advance by experiment, that does not sound unnatural to the listener;
voiced section determination means for determining whether a voiced section identified by the speech identification means begins a voice onset; and
voiced section expansion means for, when the voiced section determination means determines that a voice onset begins, performing speech speed conversion processing that takes the section from that voice onset to the next voice onset as a unit, sets a slow speech speed at the start point of this section, and gradually increases the speech speed toward the end point of the section, following the rough change in the fundamental frequency of the speech.
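The claims above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the linear mapping from fundamental frequency to expansion ratio, and the default values of r_s and r_e are assumptions chosen for illustration. Only the symbols r_s and r_e, the rule that speech is slower where pitch is high, and the 862 ms lower bound come from the text.

```python
from typing import List, Sequence


def expansion_ratios(f0_contour: Sequence[float],
                     r_s: float = 1.4,
                     r_e: float = 1.0) -> List[float]:
    """Per-frame time-expansion ratios for one onset-to-onset unit.

    Assumption: a linear map from the coarse F0 contour to the ratio,
    so the highest F0 gets r_s (slowest) and the lowest F0 gets r_e.
    The defaults 1.4 and 1.0 are hypothetical, not from the patent.
    """
    if not f0_contour:
        return []
    hi, lo = max(f0_contour), min(f0_contour)
    span = hi - lo
    n = len(f0_contour)
    if span == 0:
        # Flat contour: fall back to a plain ramp in time from r_s to r_e.
        return [r_s + (r_e - r_s) * i / max(n - 1, 1) for i in range(n)]
    return [r_e + (r_s - r_e) * (f - lo) / span for f in f0_contour]


def shorten_pause(pause_ms: float, target_ms: float = 862.0) -> float:
    """Shorten a between-sentence pause to the target length.

    Pauses already shorter than the target are left unchanged.
    The 862 ms default is the lower bound stated in claim 4.
    """
    return min(pause_ms, target_ms)


if __name__ == "__main__":
    # A falling F0 contour (Hz) over one unit; the values are made up.
    print(expansion_ratios([200.0, 150.0, 100.0]))
    print(shorten_pause(1500.0))
```

Because the fundamental frequency usually falls over the course of an utterance, a mapping of this kind yields a slow start and a gradually faster finish. The time saved by shortening between-sentence pauses can offset part of the time added by expanding the voiced sections.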
JP05178792A 1991-09-25 1992-03-10 Method and apparatus for converting speech speed Expired - Lifetime JP3249567B2 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
JP05178792A JP3249567B2 (en) 1992-03-10 1992-03-10 Method and apparatus for converting speech speed
US07/950,411 US5305420A (en) 1991-09-25 1992-09-22 Method and apparatus for hearing assistance with speech speed control function
EP96119237A EP0766229B1 (en) 1991-09-25 1992-09-23 Method and apparatus for hearing assistance with speech speed control function
DK96119237T DK0766229T3 (en) 1991-09-25 1992-09-23 Method and apparatus for hearing aid with speech rate control function
EP92116292A EP0534410B1 (en) 1991-09-25 1992-09-23 Method and apparatus for hearing assistance with speech speed control function
DK92116292T DK0534410T3 (en) 1991-09-25 1992-09-23 Method and apparatus for hearing aid with speech rate control function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP05178792A JP3249567B2 (en) 1992-03-10 1992-03-10 Method and apparatus for converting speech speed

Publications (2)

Publication Number Publication Date
JPH05257490A (en) 1993-10-08
JP3249567B2 (en) 2002-01-21

Family

ID=12896658

Family Applications (1)

Application Number Title Priority Date Filing Date
JP05178792A Expired - Lifetime JP3249567B2 (en) 1991-09-25 1992-03-10 Method and apparatus for converting speech speed

Country Status (1)

Country Link
JP (1) JP3249567B2 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5611018A (en) * 1993-09-18 1997-03-11 Sanyo Electric Co., Ltd. System for controlling voice speed of an input signal
EP0797822B1 (en) * 1994-12-08 2002-05-22 The Regents of the University of California Method and device for enhancing the recognition of speech among speech-impaired individuals
WO1996037880A1 (en) * 1995-05-25 1996-11-28 Sanyo Electric Co., Ltd. Recording and reproducing device
JPH09198089A (en) * 1996-01-19 1997-07-31 Matsushita Electric Ind Co Ltd Reproduction speed converting device
JP3432443B2 (en) * 1999-02-22 2003-08-04 日本電信電話株式会社 Audio speed conversion device, audio speed conversion method, and recording medium storing program for executing audio speed conversion method
JP2001312298A (en) * 2000-04-27 2001-11-09 Nippon Hoso Kyokai <Nhk> Device and method for speaking speed conversion processing, recording medium, and using method for speaking speed conversion processing device
JP5054477B2 (en) * 2007-09-26 2012-10-24 日本放送協会 Hearing aid
JP5412204B2 (en) * 2009-07-31 2014-02-12 日本放送協会 Adaptive speech speed converter and program
JP5593244B2 (en) 2011-01-28 2014-09-17 日本放送協会 Spoken speed conversion magnification determination device, spoken speed conversion device, program, and recording medium
JP2014228691A (en) * 2013-05-22 2014-12-08 日本電気株式会社 Aviation control voice communication device and voice processing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
IEICE Technical Report SP92-56, "A method for absorbing the time expansion accompanying speech speed conversion"

Also Published As

Publication number Publication date
JPH05257490A (en) 1993-10-08

Similar Documents

Publication Publication Date Title
US5828994A (en) Non-uniform time scale modification of recorded audio
JP4523257B2 (en) Audio data processing method, program, and audio signal processing system
US6205420B1 (en) Method and device for instantly changing the speed of a speech
JP3249567B2 (en) Method and apparatus for converting speech speed
JPH11175082A (en) Voice interaction device and voice synthesizing method for voice interaction
CN111739536A (en) Audio processing method and device
JP2000152394A (en) Hearing aid for moderately hard of hearing, transmission system having provision for the moderately hard of hearing, recording and reproducing device for the moderately hard of hearing and reproducing device having provision for the moderately hard of hearing
JP4953767B2 (en) Speech generator
Cooke et al. Combining spectral and temporal modification techniques for speech intelligibility enhancement
JP3219892B2 (en) Real-time speech speed converter
JP3553828B2 (en) Voice storage and playback method and voice storage and playback device
JP6314879B2 (en) Reading aloud evaluation device, reading aloud evaluation method, and program
Nakamura et al. A new approach to compensate degeneration of speech intelligibility for elderly listeners-development of a portable real time speech rate conversion system
JP2001184100A (en) Speaking speed converting device
JP3187242B2 (en) Speech speed converter
Junqua et al. Influence of the speaking style and the noise spectral tilt on the Lombard reflex and automatic speech recognition
JP3373933B2 (en) Speech speed converter
JPH09152889A (en) Speech speed transformer
JP3102553B2 (en) Audio signal processing device
JP4313724B2 (en) Audio reproduction speed adjustment method, audio reproduction speed adjustment program, and recording medium storing the same
JP3187241B2 (en) Speech speed converter
JP2867744B2 (en) Audio playback device
JPH08110796A (en) Voice emphasizing method and device
JPH0580791A (en) Device and method for speech rule synthesis
JP3113101B2 (en) Speech synthesizer

Legal Events

Date Code Title Description
R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081109

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091109

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101109

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111109

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121109

Year of fee payment: 11

EXPY Cancellation because of completion of term