JPH06289895A - Real-time speaking speed converting method

Info

Publication number: JPH06289895A
Application number: JP5078098A
Authority: JP
Inventors: Tatsu Ikezawa; 龍池沢; Akira Nakamura; 章中村; Eiichi Miyasaka; 栄一宮坂
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 1993-04-05
Filing date: 1993-04-05
Publication date: 1994-10-18
Anticipated expiration: 2016-10-15
Also published as: JP3219892B2

Abstract

PURPOSE: To convert speech into easy-to-hear speech that is slow overall without substantially extending the speaking time, by using real-time processing to moderately slow the speaking speed of parts of the utterance considered semantically important and, conversely, to speed up the other parts. CONSTITUTION: The apparatus comprises a speech input circuit 1, a CPU circuit 2, a PROM circuit 3, an input buffer circuit 4, a processing buffer circuit 5, a file circuit 6, a speech output circuit 7, and a bus 8. The circuit 1 takes in the speech (original speech) whose speaking speed is to be converted, and changes in the intonation (pitch frequency) of the original speech are detected through real-time processing. Based on the detection result, the speaking speed is varied according to a rule that slows the speaking speed in high-intonation parts and increases it in low-intonation parts, thereby converting the original speech into good, easy-to-hear speech while maintaining the utterance time.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a real-time speech speed conversion method that takes in original speech and converts it into slower speech suited to listening by hearing-impaired persons, elderly persons, and the like.

[0002] [Summary of the Invention] The present invention relates to a real-time speech speed conversion method that takes in original speech and converts it into slower speech suited to listening by hearing-impaired persons, elderly persons, and the like. Changes in the intonation (pitch frequency) of the speech are detected by real-time processing and, based on the detection result, the speech speed is varied according to the rule that it is slowed in high-intonation portions and increased in low-intonation portions. The original speech is thereby converted into good, easy-to-hear speech while its utterance time is maintained.

[0003]

[Prior Art] In general, when a listener's listening ability, such as the critical speech discrimination speed (the maximum speech speed at which speech can be accurately discriminated), declines because of aging or some impairment, the listener's ability to discriminate speech spoken at normal speed or spoken quickly drops sharply.

[0004] Conventionally, the only hearing assistance available to people with such hearing impairment has been the hearing aid.

[0005]

[Problems to Be Solved by the Invention] The hearing aid described above is a device that merely compensates for the transfer characteristics of the outer ear and middle ear of the auditory system, by improving frequency characteristics, controlling gain, and so on. It therefore cannot compensate for the decline in speech discrimination ability that is associated with deterioration of the auditory center.

[0006] As a way of solving this problem, techniques have been developed that convert the speech speed while maintaining the quality of the original speech.

[0007] With this speech speed conversion technique, uniformly slowing only the speech speed makes speech far easier to hear, particularly for elderly and hearing-impaired persons, but the operation inevitably lengthens the utterance time. In broadcasting and similar settings, speech is uttered so as to fit within a fixed time, so lengthening it in this way may cause it to exceed that time limit. Moreover, where audio and video are provided in synchronization, as in television, lengthening only the audio produces a temporal "lag" relative to the video, which may adversely affect comprehension.

[0008] For this reason, speech speed conversion techniques that take this temporal "lag" into account have been developed for offline processing, but none that does so in real-time processing has yet been developed.

[0009] In view of the above circumstances, and in order to solve the problems associated with the temporal "lag" described above, the object of the present invention is to provide a real-time speech speed conversion method that, through real-time processing, moderately slows the speech speed of portions of the uttered speech considered semantically important and conversely speeds up the other portions, thereby converting the speech into speech that is slow overall and easy to hear without substantially lengthening the utterance time.

[0010]

[Means for Solving the Problems] To achieve the above object, the real-time speech speed conversion method according to the present invention is characterized in that, when slowing the speed at which the speech being listened to is uttered (the speech speed), boundaries between sentences are detected and judged to be pause sections, and the speech speed is varied according to a predetermined rule applied separately to the portion within a predetermined time range from a pause section and the portion outside that range.

[0011]

[Operation] With the above configuration, when slowing the speed at which the speech being listened to is uttered (the speech speed), the present invention detects boundaries between sentences and judges them to be pause sections, and varies the speech speed according to a predetermined rule applied separately to the portion within a predetermined time range from a pause section and the portion outside that range. This resolves the problems associated with the temporal "lag" while moderately slowing the semantically important portions of the uttered speech and conversely speeding up the other portions, so that the speech is converted in real time into speech that is slow overall and easy to hear, without substantially lengthening the utterance time.

[0012]

[Embodiment]

<<Configuration of the Embodiment>> FIG. 1 is a block diagram showing an example of a real-time speech speed conversion apparatus to which an embodiment of the real-time speech speed conversion method according to the present invention is applied.

[0013] The real-time speech speed conversion apparatus shown in this figure comprises a speech input circuit 1, a CPU circuit 2, a PROM circuit 3, an input buffer circuit 4, a processing buffer circuit 5, a file circuit 6, a speech output circuit 7, and a bus 8. The speech input circuit 1 takes in the speech (original speech) whose speech speed is to be converted; changes in the intonation (pitch frequency) of the original speech are detected by real-time processing; and, based on the detection result, the speech speed is varied according to the rule that it is slowed in high-intonation portions and increased in low-intonation portions. The original speech is thereby converted into good, easy-to-hear speech while its utterance time is maintained.

[0014] The speech input circuit 1 is a circuit of general configuration for inputting the original speech and comprises, for example, a microphone, a tone control circuit, an analog-to-digital converter, a speech storage and playback (recording) circuit, a speech storage medium (for example, IC memory, a hard disk, a floppy disk, or a VTR), and an interface circuit. It takes in the speech to be converted, converts it into a digital speech signal, and supplies it to the input buffer circuit 4 frame by frame in accordance with instructions from the CPU circuit 2.

[0015] The input buffer circuit 4 consists of RAM or the like of the required capacity and serves as a work area for the CPU circuit 2. It takes in and stores the speech signal output from the speech input circuit 1, and transfers the stored speech signal to the processing buffer circuit 5 in accordance with instructions from the CPU circuit 2.

[0016] The processing buffer circuit 5 consists of RAM or the like of the required capacity and serves as a work area for the CPU circuit 2. It takes in and stores the speech signal output from the input buffer circuit 4, and transfers the stored speech signal to the file circuit 6 and elsewhere in accordance with instructions from the CPU circuit 2.

[0017] The file circuit 6 consists of RAM and, in addition, a speech storage medium such as IC memory or a floppy disk. It is a memory that stores, among other things, the speech signals whose voiced sections have been expanded and the signals whose silent sections have been shortened in accordance with the present invention. When a processed speech signal is output from the processing buffer circuit 5, the file circuit 6 takes it in and stores it, and then supplies the stored speech signal to the speech output circuit 7 in accordance with instructions from the CPU circuit 2.

[0018] The speech output circuit 7 is a circuit of general configuration for outputting the speech signal in the file circuit 6 to the outside and comprises, for example, an interface circuit, a digital-to-analog converter, a speaker, and a recording device (or broadcasting equipment). When a speech signal is output from the file circuit 6, it takes the signal in, converts it into sound, and outputs it externally.

[0019] The CPU circuit 2 consists of a one-chip microcomputer or the like, and controls the apparatus as a whole and performs various data processing in accordance with the program stored in the PROM circuit 3.

[0020] The PROM circuit 3 stores the program that defines the operation of the CPU circuit 2 and the constant data used in the various processes. In response to a read command from the CPU circuit 2, it reads out the stored program and constant data and supplies them to the CPU circuit 2.

[0021] <<Operation of the Embodiment>> The operation of this embodiment is described next with reference to the block diagram of FIG. 1, the flowcharts of FIGS. 2 and 3, and the timing diagram of FIG. 4.

[0022] First, the CPU circuit 2 cuts the speech signal that has been input to and processed by the speech input circuit 1 into fixed-length units called frames, for example every 3.3 ms, and has them transferred to and stored in the input buffer circuit 4 (step ST1).
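The framing in step ST1 can be sketched as follows. This is a minimal illustration only: the function name `split_into_frames`, the 20 kHz sampling rate in the usage line, and the choice to drop a trailing partial frame are assumptions, not details given in the specification.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=3.3):
    """Cut a sample sequence into consecutive fixed-length frames (step ST1)."""
    # Frame length in samples; 3.3 ms is the example value from the text.
    frame_len = max(1, round(sample_rate_hz * frame_ms / 1000.0))
    # Assumption: trailing samples that do not fill a whole frame are dropped.
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# Hypothetical usage: 200 samples at an assumed 20 kHz sampling rate.
frames = split_into_frames(list(range(200)), sample_rate_hz=20000)
```

In a streaming implementation the leftover samples would normally be carried over to the next call rather than dropped.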

[0023] Next, the CPU circuit 2 processes the speech signal stored in the input buffer circuit 4 frame by frame, using a method such as the autocorrelation method or the zero-crossing method, and judges each frame to be voiced, unvoiced, or silent. Input sounds other than the voiced and unvoiced sounds produced by a person (for example, low-level noise or background sound) are in principle treated as silence (step ST2).
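One way to picture the voiced / unvoiced / silent decision of step ST2 is an energy gate followed by a zero-crossing-rate test. The sketch below is an assumption-laden illustration: the specification names the autocorrelation and zero-crossing methods but gives no thresholds, so `silence_rms` and `zcr_voiced_max` are hypothetical values, as is the function name.

```python
import math


def classify_frame(frame, silence_rms=0.01, zcr_voiced_max=0.25):
    """Label one frame as 'silent', 'voiced', or 'unvoiced' (step ST2 sketch)."""
    if not frame:
        return "silent"
    # Energy gate: low-level noise and background sound count as silence.
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    if rms < silence_rms:
        return "silent"
    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1) if len(frame) > 1 else 0.0
    # Voiced speech is dominated by low frequencies, so its rate is low.
    return "voiced" if zcr <= zcr_voiced_max else "unvoiced"
```

A practical system would tune both thresholds to the input level and sampling rate, and would likely analyze a window longer than a single 3.3 ms frame when estimating pitch by autocorrelation.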

[0024] The CPU circuit 2 then determines whether the voiced / unvoiced / silent judgment for the current frame is the same as that for the previous frame (step ST3). If it is the same, processing returns to the frame extraction described above and the same steps are repeated. If it differs, for example if the previous frame was voiced and the current frame is unvoiced, the speech signal judged up to that point to be one section of the same type is transferred to and stored in the processing buffer circuit 5 (step ST4).

[0025] As a result, as shown in FIG. 4, the speech taken in by the speech input circuit 1 is divided into voiced sections, unvoiced sections, and silent sections and stored in the processing buffer circuit 5.
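Steps ST3 and ST4 amount to merging runs of identically labelled frames into sections. A minimal sketch, assuming the per-frame labels are already available as strings; the function name and the `(label, start_ms, duration_ms)` tuple layout are illustrative choices, not taken from the specification.

```python
from itertools import groupby


def group_frames(labels, frame_ms=3.3):
    """Merge runs of equal frame labels into (label, start_ms, duration_ms)."""
    sections = []
    start = 0  # index of the first frame of the current run
    for label, run in groupby(labels):
        count = len(list(run))
        # A section closes when the next frame has a different label (ST3/ST4).
        sections.append((label, start * frame_ms, count * frame_ms))
        start += count
    return sections
```

The batch form above is for clarity; the apparatus described in the text does the same thing incrementally, flushing a section to the processing buffer each time the label changes.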

[0026] Next, among the sections of the speech signal stored in the processing buffer circuit 5 that were judged silent, the CPU circuit 2 judges silent sections 250 ms or longer to be pause sections (sections in which the speaker takes a breath), and judges the section between successive pause sections to be a phrase section (a section uttered in one breath) (step ST5).
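The pause and phrase judgment of step ST5 can be sketched as a single pass over the sections. The 250 ms threshold comes from the text; the function name and the `(label, duration_ms)` representation are assumptions made for this illustration.

```python
def split_into_phrases(sections, pause_min_ms=250.0):
    """Split (label, duration_ms) sections into phrases at pause sections."""
    phrases, current = [], []
    for label, duration_ms in sections:
        # A silent section of at least 250 ms is treated as a pause (ST5).
        is_pause = label == "silent" and duration_ms >= pause_min_ms
        if is_pause:
            if current:
                phrases.append(current)
            current = []
        else:
            # Shorter silences stay inside the phrase they occur in.
            current.append((label, duration_ms))
    if current:
        phrases.append(current)
    return phrases
```

Note that a short silence (for example 100 ms) does not end a phrase; only silences at or above the threshold do.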

[0027] The CPU circuit 2 then performs the voiced section processing 10 shown in FIG. 3 on the sections within each phrase section that were judged voiced (step ST6).

[0028] In this voiced section processing, the CPU circuit 2 first determines whether the voiced section being processed is the voiced section immediately following a pause section (step ST15). If it is, the CPU circuit 2 extracts three voiced sections (the first, second, and third voiced sections) from the start point of the phrase section (Ph_st), takes the highest of the pitch frequencies of the first to third voiced sections as the maximum pitch frequency Pitch_max, and sets the expansion ratio of the speech speed at the start point V_st of the first voiced section to "rs" (step ST16).

[0029] The CPU circuit 2 then determines whether the preset time T (2000 ms in this embodiment) has elapsed since the start point V_st of the first voiced section for the speech signal being processed (step ST17). If the time T has not elapsed, the expansion ratio of the speech speed is varied from "rs" to "re" using a suitable preset decreasing function, for example the cosine function f(t) given by the following equation (step ST18).

[0030]

$$f(t) = \mathrm{re} + \frac{1}{2}\,(\mathrm{rs} - \mathrm{re})\left\{\cos\!\left(\frac{\pi\,(t - \mathrm{V\_st})}{T}\right) + 1.0\right\} \qquad (1)$$

where t ranges from V_st to V_st + T. Within this range, no processing is applied to silent sections or unvoiced sections.
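Equation (1) can be checked numerically with a short sketch. The function name is illustrative, and clamping t to the interval [V_st, V_st + T] is an assumption added here so the function is defined for any input; the specification only defines f(t) inside that interval.

```python
import math


def expansion_ratio(t_ms, v_st_ms, rs, re, T_ms=2000.0):
    """Equation (1): raised-cosine ramp of the expansion ratio from rs to re."""
    # Assumption: clamp outside the defined range V_st <= t <= V_st + T.
    elapsed = min(max(t_ms - v_st_ms, 0.0), T_ms)
    return re + 0.5 * (rs - re) * (math.cos(math.pi * elapsed / T_ms) + 1.0)


# Values from the experimental example: rs = 1.25, re = 0.9, T = 2000 ms.
start = expansion_ratio(0.0, 0.0, rs=1.25, re=0.9)
middle = expansion_ratio(1000.0, 0.0, rs=1.25, re=0.9)
end = expansion_ratio(2000.0, 0.0, rs=1.25, re=0.9)
```

At t = V_st the cosine term is 1, so f = rs; at t = V_st + T it is -1, so f = re. Because the cosine integrates to zero over the ramp, the ratio averaged over the interval is (rs + re) / 2.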

[0031] If the preset time T has elapsed since the start point V_st of the first voiced section (step ST17), the CPU circuit 2 determines whether the average pitch frequency Pitch(n) of the section containing the speech signal being processed (the n-th speech section, where n ≥ k) satisfies the following expression (step ST19).

[0032]

$$\mathrm{Pitch}(n) > \mathrm{Pitch\_max} \times \mathrm{Th2} \qquad (2)$$

where Th2 is a threshold; in this embodiment, Th2 = 0.7.

[0033] If the n-th speech section satisfies expression (2), the CPU circuit 2 sets the start point of this n-th speech section as "V2_st" (step ST20), and performs the within-period judgment and the voiced section expansion using the decreasing function f(t) described above, varying the expansion ratio of the speech speed from "rs - Th3" to "re" over the period T beginning at the start point V2_st (step ST18).

[0034] In this embodiment, the threshold Th3 is set to "0.1".

[0035] If expression (2) is not satisfied (step ST19), the CPU circuit 2 leaves the expansion ratio of the voiced section at re, that is, keeps the speech speed at its fastest (step ST21).
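The branch taken after the time T has elapsed (steps ST19 to ST21) can be summarized as a small decision function. This is a sketch under the stated values Th2 = 0.7 and Th3 = 0.1; the function name and the idea of returning only the starting ratio of the next ramp are illustrative assumptions.

```python
def restart_ratio(pitch_n, pitch_max, rs, re, th2=0.7, th3=0.1):
    """Starting expansion ratio for the n-th section once time T has elapsed."""
    if pitch_n > pitch_max * th2:
        # Expression (2) holds: a new ramp starts at V2_st from rs - Th3 (ST20).
        return rs - th3
    # Expression (2) fails: stay at re, the fastest setting (ST21).
    return re
```

With rs = 1.25 and re = 0.9, a section whose average pitch exceeds 70% of Pitch_max would restart the ramp at about 1.15, while any other section stays at 0.9.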

[0036] Thereafter, until the next pause section, the CPU circuit 2 repeats the above processing on the speech signal in each voiced section as it is detected.

[0037] When this processing is complete, the CPU circuit 2 has the speech-speed-converted speech signal in the processing buffer transferred to and stored in the file circuit 6, and clears the processing buffer circuit 5 (step ST7).

[0038] If, in the section identification processing described above (step ST5), the section being processed is judged to be an unvoiced section, the CPU circuit 2 has the speech signal of this section transferred from the processing buffer circuit 5 to the file circuit 6 and stored, and then clears the processing buffer circuit 5 (step ST8).

[0039] If, in the section identification processing described above (step ST5), the section being processed is judged to be a silent section, the CPU circuit 2 determines whether it is a pause section (step ST9). If it is, the CPU circuit 2 judges it to be a break between sentences (a full stop) and, in order to shorten the speech as far as possible without it sounding unnatural, shortens the silent section using a preset shortening algorithm (step ST10).

[0040] If, in the section identification processing described above (step ST5), the section is judged to be silent but is not a pause section (step ST9), the CPU circuit 2 skips this shortening.

[0041] The CPU circuit 2 then has the processed silent-section signal in the processing buffer circuit 5 transferred to and stored in the file circuit 6, and clears the processing buffer circuit 5 (step ST11).

[0042] The CPU circuit 2 repeats the above processing until no speech signal remains to be processed (step ST12).

[0043] In parallel with the above processing, the CPU circuit 2 has the processed speech signal stored in the file circuit 6 transferred to the speech output circuit 7 and output as speech.

[0044] <<Experimental Example>> When speech including the actual news speech shown in Table 1 was processed with the real-time speech speed conversion apparatus described above, each of the breaks in (1)-1 to (1)-6 in the text was recognized as a pause, so that the speech speed could be slowed at the start of each phrase, and the speech speed could also be slowed in the latter parts (underlined) of (1)-1 and (1)-4.

[0045]

[Table 1]

In this experiment, with the expansion ratios of the speech speed set to rs = 1.25 and re = 0.9, original speech 136 seconds long could be converted into speech 136 seconds long. Compared with uniformly slowing the speech speed with rs = re = 1.3, an absorption rate α of 100% (unabsorbed time 0.0 seconds) was achieved.

[0046] Here, the absorption rate α is the value given by the following equation.

[0047]

$$\alpha = \left\{\frac{T_1 - T_2}{T_1}\right\} \cdot 100 \qquad (3)$$

where T1 is the extension time when the speech speed is uniform and T2 is the extension time when the speech speed is varied.

Thus, by using this embodiment, the silent sections between sentences are effectively shortened, so the original speech can be converted into slow speech in real time without lengthening the overall duration.
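Equation (3) is simple enough to verify with a few lines. In this sketch the function name and the sample value of T1 are hypothetical; the specification reports only that the converted speech kept its 136-second length, which corresponds to T2 = 0.

```python
def absorption_rate(t1_s, t2_s):
    """Equation (3): percentage of the uniform-rate extension that is absorbed."""
    if t1_s <= 0:
        raise ValueError("T1 must be positive for the rate to be defined")
    return (t1_s - t2_s) / t1_s * 100.0


# Hypothetical T1 of 40 s; T2 = 0 s matches the reported 136 s -> 136 s result.
alpha_full = absorption_rate(40.0, 0.0)
alpha_partial = absorption_rate(40.0, 10.0)
```

Whenever T2 is zero the rate is 100% regardless of T1, which is consistent with the reported unabsorbed time of 0.0 seconds.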

[0048] As described above, in this embodiment the speech input circuit 1 takes in the speech (original speech) whose speech speed is to be converted, changes in the intonation (pitch frequency) of the original speech are detected by real-time processing, and, based on the detection result, the speech speed is varied according to the rule that it is slowed in high-intonation portions and increased in low-intonation portions, so that the original speech is converted into good, easy-to-hear speech while its utterance time is maintained. The semantically important portions of the uttered speech can therefore be moderately slowed and the other portions conversely speeded up, which makes it possible to convert the speech into speech that is slow overall and easy to hear without substantially lengthening the utterance time.

[0049]

[Effects of the Invention] As described above, according to the present invention, in order to solve the problems associated with the temporal "lag", real-time processing moderately slows the speech speed of the portions of the uttered speech considered semantically important and conversely speeds up the other portions, so that the speech can be converted into speech that is slow overall and easy to hear without substantially lengthening the utterance time.

[Brief description of drawings]

FIG. 1 is a block diagram showing an example of a real-time speech speed conversion apparatus to which an embodiment of the real-time speech speed conversion method according to the present invention is applied.

FIG. 2 is a main flowchart showing an example of the operation of the real-time speech speed conversion apparatus shown in FIG. 1.

FIG. 3 is a flowchart showing an example of the voiced section processing routine in the operation of the real-time speech speed conversion apparatus shown in FIG. 1.

FIG. 4 is a timing diagram showing an example of the operation of the real-time speech speed conversion apparatus shown in FIG. 1.

[Explanation of symbols]

1 speech input circuit
2 CPU circuit
3 PROM circuit
4 input buffer circuit
5 processing buffer circuit
6 file circuit
7 speech output circuit
8 bus

Claims

[Claims]

1. A real-time speech speed conversion method characterized in that, when slowing the speed at which the speech being listened to is uttered (the speech speed), boundaries between sentences are detected and judged to be pause sections, and the speech speed is varied according to a predetermined rule applied separately to the portion within a predetermined time range from a pause section and the portion outside that range.