JP2000250571A

JP2000250571A - Method and device for editing speech data, and medium recorded with program

Info

Publication number: JP2000250571A
Application number: JP11048710A
Authority: JP
Inventors: Kotaro Machidera; 侯大郎待寺; Hiroshi Ono; 博小野
Original assignee: Anritsu Corp
Current assignee: Anritsu Corp
Priority date: 1999-02-25
Filing date: 1999-02-25
Publication date: 2000-09-14

Abstract

PROBLEM TO BE SOLVED: To make function words and the like contained in speech easier to hear. SOLUTION: An A/D converter 21 converts an analog speech signal into a series of speech data, word blocking means 23 divides the series of speech data into word data, and word data extraction means 24 extracts, from the divided word data, the word data whose signal level is lower than a predetermined threshold R. Amplification means 25 amplifies the signal level of the extracted word data, and synthesizing means 26 combines the amplified word data with the other data in the order of the input speech. The result is converted into an analog speech signal and output.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a technique for editing speech so that it is easier to hear.

[0002]

2. Description of the Related Art

For example, an effective way to learn a foreign language is the LL (language laboratory) method, in which learners study by actually listening to the language.

[0003] In such LL learning, the foreign-language speech is output from a master unit operated by a teacher or the like to a plurality of student slave units, and each student listens to it through headphones.

[0004] However, a class includes not only students whose ears are accustomed to the natural speed of the foreign language but also students who are not accustomed to it at all. Such beginners often cannot understand the speech sent from the master unit.

[0005] One conceivable solution is to convert the input speech into slower speech using a speech rate conversion technique.

[0006]

Problem to Be Solved by the Invention

However, merely slowing the speech down leaves a problem. Words that are pronounced short and weak cannot be recognized clearly. In English, these include prepositions and articles such as "for", "to", and "the" (called function words).

[0007] An object of the present invention is to solve this problem. It provides a speech data editing method, a speech data editing device, and a medium on which a program is recorded, each of which makes the function words contained in speech easier to hear.

[0008]

Means for Solving the Problem

To achieve the above object, the speech data editing method according to claim 1 of the present invention includes the following steps:
- dividing a series of speech data, obtained by digitally converting an analog speech signal, into word data;
- amplifying, among the divided word data, the word data whose signal level is lower than a predetermined threshold; and
- synthesizing the amplified word data and the other word data in the original speech order.

[0009] The speech data editing device according to claim 2 of the present invention comprises:
- an A/D converter that converts an input analog speech signal into digital speech data;
- word blocking means that divides the series of speech data output from the A/D converter into word data;
- word data extraction means that extracts, from the word data divided by the word blocking means, the word data whose signal level is lower than a predetermined threshold;
- amplification means that amplifies the signal level of the word data extracted by the word data extraction means;
- synthesizing means that synthesizes, in the order of the input speech, the word data that was divided by the word blocking means but not extracted by the word data extraction means, and the word data whose signal level has been amplified by the amplification means; and
- a D/A converter that converts the speech data synthesized by the synthesizing means into an analog speech signal and outputs it.

[0010] The medium according to claim 3 of the present invention records a program that causes a computer to function as:
- word blocking means that divides a series of speech data into word data;
- word data extraction means that extracts, from the word data divided by the word blocking means, the word data whose signal level is lower than a predetermined threshold;
- amplification means that amplifies the signal level of the word data extracted by the word data extraction means; and
- synthesizing means that synthesizes, in the order of the input speech, the word data that was divided by the word blocking means but not extracted by the word data extraction means, and the word data whose signal level has been amplified by the amplification means.

[0011]

Description of the Embodiments

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows the configuration of a speech data editing device 20 according to the embodiment. FIG. 2 shows an example of the internal configuration of its main part.

[0012] In FIG. 1, an A/D converter 21 samples the speech signal input at an input terminal 20a at a predetermined period and converts it into digital speech data. It stores the data sequentially in an input buffer 22.

[0013] Word blocking means 23 divides the series of speech data stored in the input buffer 22 into word data and silence data.

[0014] For example, the pitch of the speech is extracted from the series of speech data stored in the input buffer 22. This can be done with the autocorrelation method, the zero-crossing counting method, or the like. Based on the extracted pitch, the speech data is divided into unvoiced sections, voiced sections, and silent sections. Each data string of a continuous sounding portion, made up of unvoiced and voiced sections with no silent section in between, is taken as word data W. Each data string of a silent section between word data is taken as silence data S.
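The autocorrelation pitch estimate named in [0014] can be sketched as follows. The sketch assumes that one voiced frame is held as a list of floats. The function name `estimate_pitch_autocorr` is hypothetical, and so is the 50 to 400 Hz search range; neither comes from this publication. The sketch picks the lag with the largest raw autocorrelation and converts that lag to a frequency.

```python
import math


def estimate_pitch_autocorr(frame, fs, f_min=50.0, f_max=400.0):
    """Estimate the pitch (Hz) of one voiced frame by autocorrelation.

    Hypothetical helper: f_min and f_max bound the lag search and are
    assumed values, not taken from the patent.
    """
    min_lag = int(fs / f_max)
    max_lag = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: longer lags sum fewer products,
        # which biases the search toward the fundamental period rather
        # than its multiples.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


# Usage: a 400-sample, 200 Hz test tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
print(estimate_pitch_autocorr(tone, fs))  # 200.0
```

The unnormalized sum is a deliberate choice. A normalized autocorrelation would score a pure tone almost equally at every multiple of its period, which makes the peak ambiguous.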

[0015] Take the single-sound signal shown in FIG. 3 as an example. The unvoiced section is the high-frequency portion Va at the beginning of the sound (the consonant portion). The voiced section is the low-frequency portion Vb (the vowel portion). The silent section is the portion N with no sound (noise level only).

Although FIG. 3 shows a single sound, actual speech is uttered word by word. Within a word, several signals like the one in FIG. 3 follow one another with no silent section N between them, and silent sections N fall between the words. The data string of each continuous sounding section, made up of unvoiced sections Va and voiced sections Vb, is taken as word data W. The data string of each silent section N is taken as silence data S.
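The section labels of [0015] can be approximated frame by frame. The sketch below uses short-time energy to separate silence N from sound. It then uses the zero-crossing rate to separate high-frequency (Va) from low-frequency (Vb) sound. Finally, it merges runs of Va/Vb frames into word blocks W and runs of N frames into silence blocks S.

The following are all hypothetical choices, not values from this publication:
- the function names `classify_frame`, `block_words`, and `make_tone`;
- the 80-sample frame;
- the thresholds `energy_floor=1e-4` and `zcr_split=0.3`.

```python
import math


def classify_frame(frame, energy_floor, zcr_split):
    """Label a frame 'N' (silent), 'Va' (unvoiced) or 'Vb' (voiced)."""
    energy = sum(x * x for x in frame) / len(frame)
    if energy < energy_floor:
        return "N"  # noise-level portion
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    # Many zero crossings per sample indicate high-frequency (consonant-like) sound.
    return "Va" if zcr > zcr_split else "Vb"


def block_words(samples, frame_len=80, energy_floor=1e-4, zcr_split=0.3):
    """Group frames into word blocks W (runs of Va/Vb) and silence blocks S.

    Returns a list of (kind, start, end) sample ranges in input order.
    """
    blocks = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        label = classify_frame(frame, energy_floor, zcr_split) if len(frame) > 1 else "N"
        kind = "S" if label == "N" else "W"
        end = start + len(frame)
        if blocks and blocks[-1][0] == kind:
            blocks[-1] = (kind, blocks[-1][1], end)  # extend the current run
        else:
            blocks.append((kind, start, end))
    return blocks


def make_tone(freq, n, fs=8000, amp=0.5):
    """Synthetic test tone used in place of real speech."""
    return [amp * math.sin(2 * math.pi * freq * i / fs) for i in range(n)]


signal = [0.0] * 160 + make_tone(200, 400) + [0.0] * 160 + make_tone(200, 240) + [0.0] * 80
print(block_words(signal))
# [('S', 0, 160), ('W', 160, 560), ('S', 560, 720), ('W', 720, 960), ('S', 960, 1040)]
```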

[0016] Word data extraction means 24 compares the signal level of each piece of word data divided by the word blocking means 23 with a predetermined threshold R. Word data Wa whose signal level is lower than R is extracted and output to amplification means 25. Word data Wb whose signal level is R or higher, together with the silence data S, is output to synthesizing means 26. The amplification means 25 amplifies the signal level of the word data Wa by an amplification factor A of 1 or more.
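The routing in [0016] can be sketched as below. Each block carries its input position, so that the two paths (through amplification means 25, or straight to synthesizing means 26) can later be merged back in order.

The publication does not define how a word's "signal level" is measured. This sketch assumes peak absolute amplitude. The names `word_level` and `extract_and_amplify` are hypothetical.

```python
def word_level(samples):
    """Assumed level measure: peak absolute amplitude of the block."""
    return max((abs(x) for x in samples), default=0.0)


def extract_and_amplify(blocks, threshold_r, gain_a):
    """Route blocks as extraction means 24 and amplification means 25 do.

    blocks: list of (kind, samples), kind "W" (word) or "S" (silence).
    Returns (amplified, passthrough), each a list of (index, samples).
    The index is the block's position in the input; it is kept so that
    the original order can be restored when the two paths are merged.
    """
    if gain_a < 1:
        raise ValueError("amplification factor A must be at least 1")
    amplified, passthrough = [], []
    for i, (kind, samples) in enumerate(blocks):
        if kind == "W" and word_level(samples) < threshold_r:
            amplified.append((i, [gain_a * x for x in samples]))  # Wa -> Wa'
        else:
            passthrough.append((i, samples))  # Wb and silence S
    return amplified, passthrough


blocks = [("S", [0.0]), ("W", [0.5, -0.8]), ("S", [0.0]), ("W", [0.1, -0.05]), ("S", [0.0])]
amplified, passthrough = extract_and_amplify(blocks, threshold_r=0.2, gain_a=2.0)
print(amplified)  # [(3, [0.2, -0.1])]
```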

[0017] The synthesizing means 26 combines three kinds of data in the order of the input speech: the word data Wb whose signal level is R or higher, the silence data S, and the word data Wa′ amplified by the amplification means 25. It stores the result in an intermediate buffer 27.
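The "in the order of the input speech" requirement of [0017] can be met by tagging every block with its input position before it is routed, and sorting on that tag when merging. The sketch below assumes blocks arrive as `(index, samples)` pairs from the amplified path and the bypass path. The name `synthesize` is hypothetical.

```python
def synthesize(amplified, passthrough):
    """Merge (index, samples) pairs from both paths back into input order."""
    merged = sorted(amplified + passthrough, key=lambda pair: pair[0])
    stream = []
    for _, samples in merged:
        stream.extend(samples)
    return stream


# Block 1 was amplified; blocks 0 and 2 bypassed the amplifier.
print(synthesize([(1, [0.4, -0.4])], [(0, [0.0]), (2, [0.9, 0.0])]))
# [0.0, 0.4, -0.4, 0.9, 0.0]
```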

[0018] Speech rate conversion means 28 expands the following by a predetermined expansion rate B: the unvoiced and voiced sections of each piece of word data stored in the intermediate buffer 27, and each piece of silence data. It stores the expanded word blocks in an output buffer 29 in the order of the input speech.

[0019] The expansion rate B may be the same for the unvoiced, voiced, and silent sections, or it may differ from section to section.
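The publication does not say how each section is expanded. The sketch below shows one simple technique: per-section time-scale modification by repeating or dropping whole chunks. When a chunk is one pitch period long, voiced sound keeps its pitch while its duration changes. Practical systems more often use overlap-add methods.

The chunk-repetition approach, the names `stretch` and `convert_rate`, and the `rates` dictionary are all assumptions made for illustration. A uniform B corresponds to giving all three labels the same value.

```python
def stretch(samples, rate_b, chunk):
    """Scale a section's duration by rate_b by repeating or dropping chunks.

    Output chunk k copies input chunk floor(k / rate_b). With chunk set to
    one pitch period, voiced sound keeps its pitch while getting longer
    (rate_b > 1) or shorter (rate_b < 1).
    """
    if rate_b <= 0:
        raise ValueError("expansion rate B must be positive")
    pieces = [samples[i:i + chunk] for i in range(0, len(samples), chunk)]
    n_out = round(len(pieces) * rate_b)
    out = []
    for k in range(n_out):
        out.extend(pieces[min(int(k / rate_b), len(pieces) - 1)])
    return out


def convert_rate(sections, rates, chunk):
    """Apply a per-section expansion rate B.

    sections: list of (label, samples), label "Va" (unvoiced),
    "Vb" (voiced) or "N" (silent). rates maps each label to its B;
    use the same value for all three to get a uniform rate.
    """
    out = []
    for label, samples in sections:
        out.extend(stretch(samples, rates[label], chunk))
    return out


print(convert_rate([("Vb", [1, 2, 3, 4]), ("N", [0, 0])], {"Va": 1.0, "Vb": 2.0, "N": 1.5}, chunk=2))
# [1, 2, 1, 2, 3, 4, 3, 4, 0, 0, 0, 0]
```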

[0020] A D/A converter 30 converts the word blocks stored in the output buffer 29 into an analog speech signal in the order of the input speech. It outputs the signal from an output terminal 20b.

[0021] Speakers or headphones are connected to the output terminal 20b via an amplifier or the like. This lets the listener hear the edited version of the speech input at the input terminal 20a.

[0022] Editing parameter setting means 31 sets three parameters:
- the threshold R of the word data extraction means 24;
- the amplification factor A of the amplification means 25; and
- the expansion rate B of the speech rate conversion means 28.

[0023] The editing parameter setting means 31 sets the threshold R and the amplification factor A to values calculated from the levels of the series of speech data stored in the input buffer 22.

[0024] That is, as shown in FIG. 2, level detection means 32 detects the maximum value H and the minimum value L of the absolute signal level of the series of speech data stored in the input buffer 22.

[0025] Threshold setting means 33 calculates the threshold R from three values: the maximum value H, the minimum value L, and a coefficient α (α ≤ 1) preset in coefficient setting means 34. It uses the following formula:

R = α · (H − L)

It then sets R in the word data extraction means 24. The coefficient α is set to, for example, about 0.2, so that the threshold R separates weakly pronounced words from the other words.
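The calculation in [0024] and [0025] can be written directly. The sketch takes H and L as the maximum and minimum of |sample| over the buffered speech. The function name `compute_threshold` is hypothetical. It also assumes that α must be positive as well as at most 1.

```python
def compute_threshold(samples, alpha=0.2):
    """Return (H, L, R) with R = alpha * (H - L), as in [0024]-[0025].

    H and L are the maximum and minimum of |sample| over the buffered speech.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    magnitudes = [abs(x) for x in samples]
    h, l = max(magnitudes), min(magnitudes)
    return h, l, alpha * (h - l)


print(compute_threshold([0.0, 0.5, -1.0, 0.25]))  # (1.0, 0.0, 0.2)
```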

[0026] Amplification factor setting means 35 sets, in the amplification means 25, an amplification factor A that corresponds to the coefficient α set in the coefficient setting means 34.

[0027] The amplification factor A is limited to the range 1 ≤ A ≤ 1/α, so that the signal level of the amplified word data does not exceed the maximum value H. When the coefficient α is variable, A is set to follow it, for example A = 1/(2α).
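The reason the bound 1 ≤ A ≤ 1/α keeps amplified words at or below H can be made explicit. In the derivation below, w is a hypothetical symbol (not used in the publication) for the level of a word extracted as Wa. The last step uses L ≥ 0, which holds because L is a minimum of absolute values. Note also that A = 1/(2α) is at least 1 only when α ≤ 0.5.

```latex
\begin{aligned}
w &< R = \alpha\,(H - L) && \text{(condition for extraction as } W_a\text{)}\\
A\,w &< A\,\alpha\,(H - L) \le \tfrac{1}{\alpha}\,\alpha\,(H - L) = H - L \le H && (1 \le A \le 1/\alpha,\ L \ge 0)\\
A &= \tfrac{1}{2\alpha} = 2.5 \in [1,\ 5] && (\alpha = 0.2)
\end{aligned}
```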

[0028] Expansion rate setting means 36 can set the expansion rate B of the speech rate conversion means 28 to any value within a predetermined range. The effect of B, applied to every section, is as follows:
- B = 1 gives speech at the same speed as the original;
- B greater than 1 gives speech slower than the original; and
- B smaller than 1 gives speech faster than the original.
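Because each section's duration is multiplied by its own B, the output duration is the sum of B × T over the sections. With a uniform B, the output duration is simply B times the input duration.

The sketch below computes this. The section durations are hypothetical figures chosen for illustration, and so are the names `expanded_duration` and `describe`.

```python
def expanded_duration(sections, rates):
    """Total duration (s) after expansion: the sum of B * T over sections."""
    return sum(rates[label] * seconds for label, seconds in sections)


def describe(rate_b):
    """How a uniform expansion rate B changes the perceived speed."""
    if rate_b > 1:
        return "slower than the original"
    if rate_b < 1:
        return "faster than the original"
    return "same speed as the original"


sections = [("Va", 0.25), ("Vb", 0.5), ("N", 0.25)]  # hypothetical durations
print(expanded_duration(sections, {"Va": 2, "Vb": 2, "N": 2}), describe(2))
# 2.0 slower than the original
```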

[0029] Most components of the speech data editing device 20 described above can be implemented by a computer: the buffers 22, 27, and 29 and the means 23 to 26, 28, and 31. The exceptions are the A/D converter 21 and the D/A converter 30.

[0030] In this case, a computer is connected to the A/D converter 21 and the D/A converter 30. A program that makes this computer function as the buffers 22, 27, and 29 and the means 23 to 26, 28, and 31 is loaded from a medium on which it is recorded (a floppy disk, CD-ROM, MO, or the like), and the computer executes the program.

[0031] Suppose the speech of the following eight-word English sentence is input to the speech data editing device 20 configured as above: "It's difficult for me to finish the homework." The analog speech signal shown in FIG. 4(a) (represented by its envelope) is input at the input terminal 20a. The A/D converter 21 converts it into a series of speech data.

[0032] The maximum value H and the minimum value L are detected from this series of speech data. The threshold R and the amplification factor A are then determined from these values and the coefficient α, as described above.

[0033] Meanwhile, the word blocking means 23 divides the series of speech data produced by the A/D converter 21 into eight pieces of word data, W1 to W8, and nine pieces of silence data, S1 to S9.

[0034] From the word data W1 to W8, the word data extraction means 24 extracts the word data Wa whose signal level is lower than the threshold R.

[0035] As shown in FIG. 4(a), the word data whose signal level is lower than R are W3, W5, and W7. These correspond to the function words "for", "to", and "the". They are output to the amplification means 25 and amplified by the factor A.

[0036] The synthesizing means 26 then combines the following in input order and stores the result in the intermediate buffer 27:
- the amplified word data Wa′ (W3′, W5′, W7′);
- the other word data Wb (W1, W2, W4, W6, W8); and
- the silence data S1 to S9.

[0037] Consequently, as shown in FIG. 4(b), the intermediate buffer 27 holds a series of speech data in which only the portions corresponding to "for", "to", and "the" have been amplified.
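The walkthrough in [0031] to [0037] can be imitated end to end on synthetic data. Each word is a stand-in tone whose peak level is chosen by hand. R and A follow [0025] and [0027] with α = 0.2.

The following are assumptions, not data from FIG. 4:
- the per-word peak levels;
- the tone and silence lengths;
- the peak-amplitude level measure;
- the helper `make_word`.

```python
import math


def make_word(amplitude, n=160, freq=200.0, fs=8000):
    """Stand-in for one word: a short tone whose peak equals amplitude."""
    return [amplitude * math.sin(2 * math.pi * freq * i / fs) for i in range(n)]


# Hypothetical peak levels for W1..W8 of "It's difficult for me to finish
# the homework."; W3, W5 and W7 play the weak function words.
peaks = [0.8, 0.9, 0.1, 0.7, 0.12, 0.85, 0.08, 0.9]
silence = [0.0] * 40
blocks = [("S", silence)]
for p in peaks:
    blocks += [("W", make_word(p)), ("S", silence)]  # S1, W1, S2, ..., W8, S9

stream = [x for _, samples in blocks for x in samples]
h = max(abs(x) for x in stream)
l = min(abs(x) for x in stream)
alpha = 0.2
r = alpha * (h - l)      # threshold R, as in [0025]
gain = 1 / (2 * alpha)   # amplification factor A = 1/(2*alpha), as in [0027]

edited, boosted, word_no = [], [], 0
for kind, samples in blocks:
    if kind == "W":
        word_no += 1
        if max(abs(x) for x in samples) < r:  # assumed level: peak amplitude
            samples = [gain * x for x in samples]
            boosted.append(f"W{word_no}")
    edited.extend(samples)

print(boosted)  # ['W3', 'W5', 'W7']
```

With these made-up levels, R is about 0.18 and A is 2.5. The three quiet words rise to peaks of 0.25, 0.3, and 0.2, which is still below H, as the bound in [0027] requires.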

[0038] If the expansion rate B of the speech rate conversion means 28 is 1 for every section, the D/A converter 30 outputs the speech signal shown in FIG. 4(b). This signal is the input speech of FIG. 4(a) with its most weakly pronounced words selectively amplified. Function words that were previously hard to hear therefore become easier to recognize, and even beginning learners of a foreign language can catch them easily.

[0039] If the expansion rate B is set greater than 1 for every section, the duration of each piece of word data and silence data increases, as shown in FIG. 4(c). The input speech can then be heard at a slower speed. Combined with the amplification of word data described above, this makes the input speech even easier to follow.

[0040] The threshold R, the amplification factor A, and the expansion rate B set by the editing parameter setting means 31 may be fixed. Alternatively, they may be made adjustable through an operation unit (not shown). Making them adjustable lets the user set the speech of an LL learning system, television, video, radio, or the like to whatever state is easiest for that user to hear.

[0041] In the description above, each data string of a continuous sounding portion, made up of unvoiced and voiced sections, is treated as word data and kept separate from the silence data. Alternatively, the word blocking means 23 may combine such a sounding data string with the silence data preceding or following it into one set of word data.

In that case, suppose the amplification means 25 amplifies word data whose signal level is judged to be lower than the threshold R. Only the sounding portion of that word data is amplified, and the silence portion is output unchanged. This prevents the S/N ratio from deteriorating.

[0042]

Effect of the Invention

As described above, the present invention processes speech as follows:
- it divides a series of speech data, obtained by digitizing an analog speech signal, into word data;
- it extracts the word data whose signal level is lower than a predetermined value and amplifies their signal level; and
- it combines the amplified word data with the other data in the order of the input speech and converts the result into an analog speech signal.

[0043] As a result, weakly pronounced words in speech become easier to recognize, and speech in a foreign language or the like becomes easier to hear.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a block diagram showing an example configuration of the main part of the embodiment.

FIG. 3 is a diagram showing the correspondence between a speech signal and each section.

FIG. 4 is a signal diagram explaining the operation of the embodiment.

[Explanation of symbols]

20 Speech data editing device
21 A/D converter
23 Word blocking means
24 Word data extraction means
25 Amplification means
26 Synthesizing means
28 Speech rate conversion means
30 D/A converter
31 Editing parameter setting means
32 Level detection means
33 Threshold setting means
34 Coefficient setting means
35 Amplification factor setting means
36 Expansion rate setting means

[Procedure Amendment]

[Submission Date] April 8, 1999

[Amendment 1]

[Document to Be Amended] Drawings

[Item to Be Amended] All figures

[Amendment Method] Change

[Amendment Content]

FIG. 1

FIG. 2

FIG. 3

FIG. 4

Claims


1. A speech data editing method comprising: a step of dividing a series of speech data, obtained by digitally converting an analog speech signal, into word data; a step of amplifying, among the divided word data, the word data whose signal level is lower than a predetermined threshold; and a step of synthesizing the amplified word data and the other word data in the original speech order.

2. A speech data editing device comprising: an A/D converter that converts an input analog speech signal into digital speech data; word blocking means for dividing a series of speech data output from the A/D converter into word data; word data extraction means for extracting, from the word data divided by the word blocking means, the word data whose signal level is lower than a predetermined threshold; amplification means for amplifying the signal level of the word data extracted by the word data extraction means; synthesizing means for synthesizing, in the order of the input speech, the word data divided by the word blocking means but not extracted by the word data extraction means and the word data whose signal level has been amplified by the amplification means; and a D/A converter that converts the speech data synthesized by the synthesizing means into an analog speech signal and outputs it.

3. A medium on which is recorded a program for causing a computer to function as: word blocking means for dividing a series of speech data into word data; word data extraction means for extracting, from the word data divided by the word blocking means, the word data whose signal level is lower than a predetermined threshold; amplification means for amplifying the signal level of the word data extracted by the word data extraction means; and synthesizing means for synthesizing, in the order of the input speech, the word data divided by the word blocking means but not extracted by the word data extraction means and the word data whose signal level has been amplified by the amplification means.