JPS6029800A

JPS6029800A - Voice analysis system

Info

Publication number: JPS6029800A
Application number: JP58137597A
Authority: JP
Inventors: 野田　和生
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1983-07-29
Filing date: 1983-07-29
Publication date: 1985-02-15

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

This invention relates to a voice analysis method. An analog signal containing audio recorded on magnetic tape or a similar medium is converted into a digital signal by analog-to-digital conversion (hereinafter "AD conversion"). Only the voice data is then extracted from that digital signal and stored in a storage device.

[Technical background of the invention]

Recently, various devices have been developed that include a digital voice memory and use it to provide a voice-response function. To create the digital voice memory built into such a device, prerecorded audio signals must first be analyzed and converted into digital data. A voice analysis method performs this work.

Some devices can pronounce a variety of sentences by combining many words. For such a device, many words are recorded as an analog signal on a recording/playback device that uses a medium such as magnetic tape. Each of these analog signals is then analyzed and converted into digital data.

However, such an analog signal contains unrecorded intervals, which include noise, between the individual audio signals. During voice analysis these unrecorded intervals must be removed so that only the audio signals are extracted.

In the conventional voice analysis method, therefore, the operator listens to the reproduced sound and judges the moments at which each utterance starts and ends.

[Problems with background technology]

Because this work relies on the operator's judgment, it requires several rounds of trial and error. Producing a large amount of voice data therefore takes a great deal of time, which makes the process extremely inefficient.

[Purpose of the invention]

The purpose of this invention is to eliminate the above drawbacks. It provides a voice analysis method that automatically detects the start point and end point of speech, so that voice data with no wasted portions can be produced in a single operation.

[Summary of the invention]

In this invention, an audio signal is converted into a digital signal that is input one minimum unit at a time, and each unit is judged to be either silent data or sound data.

Storage of the digital signal starts when sound data continues for a first predetermined time t1 or longer. Storage ends when a second predetermined time t2 (> t1) passes in which there is only silent data, or sound data that does not itself continue for t1 or longer.
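The two-threshold rule above can be written compactly as follows. This is a sketch in assumed notation that does not appear in the original:

- x_n is the n-th minimum unit.
- θ is the threshold level.
- Δ is the duration of one unit.
- r_n is the length of the current sound run.
- g_n is the count toward t2.

When speech is confirmed, the counts are initialized to r = k_1 and g = 0. Under this reading the stored voice data is x_{n_s} through x_{n_e}.

```latex
\begin{aligned}
k_1 &= \lceil t_1/\Delta \rceil, \qquad k_2 = \lceil t_2/\Delta \rceil \\
s_n &= \begin{cases} 1, & |x_n| \ge \theta \quad \text{(sound data)} \\ 0, & |x_n| < \theta \quad \text{(silent data)} \end{cases} \\
n_s &= \min\{\, n : s_n = s_{n+1} = \cdots = s_{n+k_1-1} = 1 \,\} \\
r_n &= (r_{n-1} + 1)\, s_n, \qquad
g_n = \begin{cases} 0, & r_n \ge k_1 \\ g_{n-1} + 1, & r_n < k_1 \end{cases}
\qquad (n \ge n_s + k_1,\ r_{n_s+k_1-1} = k_1,\ g_{n_s+k_1-1} = 0) \\
n_e &= \min\{\, n \ge n_s + k_1 : g_n \ge k_2 \,\}
\end{aligned}
```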

[Embodiments of the invention]

An embodiment of the present invention will now be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing one embodiment of a voice analysis device to which the method of this invention is applied. The voice analysis device of this embodiment comprises:

- an operator console 1;
- a recording/playback device control unit 2;
- a recording/playback device 3 on which many utterances are recorded;
- an AD conversion unit 4 that converts the audio from analog to digital;
- a DA conversion unit 5 that converts the digital voice data from digital to analog;
- a speaker 6;
- a start/end point detection unit 7 for the voice data;
- a central processing unit 8;
- a voice memory 9 that stores one item of voice data;
- a program memory 10;
- a voice data storage device 11 that stores many items of voice data.

FIG. 2 shows an example of the many utterances recorded on the recording/playback device 3. The horizontal axis represents time T, and the vertical axis represents the loudness of the sound as it varies with time T.

The voice component 12 lies in the interval from time T1 to time T2, while noise components 13 and silent components 14 are scattered throughout the recording.

In the device of FIG. 1, voice analysis begins when the operator console 1 requests it:

1. The recording/playback device control unit 2 outputs a control signal that drives the recording/playback device 3, and audio playback begins at time T0. The reproduced audio corresponds, for example, to FIG. 2.
2. The playback output is sent to the AD conversion unit 4, which sequentially converts it into a digital signal.
3. The digital signal is passed to the start/end point detection unit 7, which examines it in sequence and detects the start point of the voice component 12, that is, time T1.

The start point of the voice component 12 is detected according to the flowchart shown in FIG. 3.
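As a rough illustration of the AD conversion step only, the following Python sketch models AD conversion unit 4 as a uniform quantizer. The original does not specify the converter, so these are all assumptions:

- the function name `ad_convert`;
- the default 8-bit word length;
- the ±`full_scale` input range.

```python
def ad_convert(samples, bits=8, full_scale=1.0):
    """Uniformly quantize analog-valued samples to signed integer codes.

    Hypothetical model of AD conversion unit 4: inputs in
    [-full_scale, +full_scale] map to integers in
    [-(2**(bits - 1) - 1), 2**(bits - 1) - 1].
    """
    max_code = 2 ** (bits - 1) - 1
    codes = []
    for v in samples:
        # Clip to the converter's assumed input range before scaling.
        v = max(-full_scale, min(full_scale, v))
        # Scale to the code range and round to the nearest integer code.
        codes.append(round(v / full_scale * max_code))
    return codes
```

For example, `ad_convert([0.0, 1.0, -1.0, 2.0])` yields `[0, 127, -127, 127]`. The last input is clipped to full scale.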

The start/end point detection unit 7 first takes in the digital data converted by the AD conversion unit 4 one minimum unit at a time and judges whether the data is silent data. Specifically, it compares the input data with a predetermined threshold level. Data smaller than the threshold level is judged to be silent data; data larger than it is judged to be sound data.

If the input data is silent data, the process returns to the beginning of the flowchart and repeats the above operation. If the input data is sound data, the unit next judges whether the sound data lasts for time t1 or longer:

- If it does not, the process returns to the beginning of the flowchart and repeats the above operation.
- If it does, the unit judges that voice data has begun and starts storing the digital voice data in the voice memory 9.

That is, the time T1 detected by the start/end point detection unit 7 is sent to the central processing unit 8, and the voice data from the AD conversion unit 4 is stored in the voice memory 9 starting from time T1.
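The FIG. 3 flow can be sketched in Python as below. Several details are assumptions not stated in the original:

- the minimum unit is one integer sample;
- "smaller than the threshold" is evaluated on the sample's magnitude, so a sample exactly at the threshold counts as sound;
- t1 is expressed as a count `k1` of units;
- the names `is_silent` and `find_start` are hypothetical.

The function returns the index of the head of the first sound run lasting `k1` units, which corresponds to time T1.

```python
def is_silent(x, threshold):
    # Assumed rule: a unit is silent data if its magnitude is below the threshold.
    return abs(x) < threshold


def find_start(codes, threshold, k1):
    """Return the index of the first unit of the first run of >= k1 sound units.

    Returns None if no sound run lasts k1 units (no start point found).
    """
    run = 0  # length of the current run of sound data
    for i, x in enumerate(codes):
        if is_silent(x, threshold):
            run = 0  # silent data: back to the top of the flowchart
        else:
            run += 1
            if run >= k1:
                # Sound has lasted t1: storage starts at the head of the run (T1).
                return i - k1 + 1
    return None
```

For instance, with threshold 5 and `k1 = 3`, the isolated sample 9 in `[0, 9, 0, 0, 8, 9, 7, 6, 0]` is skipped. The function returns 4, the index where the sustained run begins.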

This storing operation continues until the start/end point detection unit 7 detects the end point of the voice component 12. The end point is detected according to the flowchart shown in FIG. 4.

The voice data from the AD conversion unit 4 is fed to the start/end point detection unit 7 one minimum unit at a time. The unit first judges whether the input data is silent data.

If the input data is sound data, the unit next judges whether the sound data continues for time t1 or longer. If it does, the unit judges that the voice component 12 is still in progress, clears the time count for t2, and continues storing the voice data in the voice memory 9.

If the input data is silent data, or if the sound data does not continue for time t1 or longer, the unit starts (or continues) timing t2. It then judges whether time t2 has been reached. If not, the process returns to the beginning of the flow and repeats the above operation. If so, the unit judges that the voice component 12 has ended.
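A sketch of the FIG. 4 flow follows, under these assumptions, none of which are in the original:

- the sample magnitude is compared with the threshold;
- t1 and t2 are expressed as unit counts `k1` and `k2`;
- the function name `find_end` is hypothetical.

Scanning starts at index `begin`, just after the sound run that confirmed the start point. The run length therefore starts at `k1` and the t2 count at 0.

```python
def find_end(codes, threshold, k1, k2, begin):
    """Return the index at which the t2 count is reached, or None.

    Assumes codes[begin - k1:begin] is the sound run that confirmed the
    start point, so the sound-run length starts at k1 and the t2 count at 0.
    """
    run, gap = k1, 0
    for i in range(begin, len(codes)):
        if abs(codes[i]) < threshold:   # silent data
            run = 0
            gap += 1                    # start (or continue) timing t2
        else:                           # sound data
            run += 1
            if run >= k1:
                gap = 0                 # still inside speech: clear the t2 count
            else:
                gap += 1                # burst shorter than t1 counts toward t2
        if gap >= k2:
            return i                    # t2 reached: the voice component has ended
    return None
```

A sound burst shorter than `k1` units adds to the t2 count rather than clearing it. Brief noise, such as noise component 13 in FIG. 2, therefore cannot by itself keep the stored segment open.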

Based on this judgment, the central processing unit 8 ends the storing of voice data, and the recording/playback device control unit 2 stops driving the recording/playback device 3.

In other words, this embodiment processes the digital data from the AD conversion unit 4 serially:

1. Silent data, and sound data that does not persist for the fixed time t1, are skipped.
2. The head of a voice component 12 that lasts longer than t1 is found, and storage of voice data into the voice memory 9 begins there.
3. When silent data, and sound data lasting less than t1, have together continued for t2, that point is judged to be the end point, and storage into the voice memory 9 ends.
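The serial processing summarized above might look like the following single-pass Python sketch. It buffers the pending sound run so that storage can begin from the head of the run (time T1) rather than from the moment t1 is confirmed. These details are assumptions:

- `extract_voice` is a hypothetical name;
- `memory` stands in for voice memory 9;
- the magnitude-versus-threshold test is assumed;
- the unit counts `k1 < k2` stand for t1 < t2.

```python
def extract_voice(samples, threshold, k1, k2):
    """Serially process samples and return the stored voice data, or None.

    Skips silent data and sound bursts shorter than k1 units, starts storing
    at the head of the first sound run of >= k1 units, and stops once silent
    data or short bursts have lasted k2 units.
    """
    if not 0 < k1 < k2:
        raise ValueError("expected 0 < k1 < k2 (t1 < t2)")
    memory = []        # stands in for voice memory 9
    pending = []       # current sound run while waiting for it to reach k1
    recording = False
    run = gap = 0      # sound-run length and t2 count once recording
    for x in samples:
        sound = abs(x) >= threshold
        if not recording:
            if sound:
                pending.append(x)
                if len(pending) >= k1:
                    recording = True
                    memory.extend(pending)  # storage starts from T1
                    run, gap = len(pending), 0
            else:
                pending.clear()             # silent data: discard the short burst
            continue
        memory.append(x)
        if sound:
            run += 1
            gap = 0 if run >= k1 else gap + 1
        else:
            run = 0
            gap += 1
        if gap >= k2:
            return memory  # end point reached: stop storing
    return memory if recording else None
```

As in the embodiment, storage stops only once t2 has elapsed, so the returned data ends with that trailing interval. If the input runs out while still recording, the sketch returns whatever has been stored.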

When the operator console 1 issues a voice playback command, the analyzed voice data is read out of the voice memory 9 and output to the DA conversion unit 5. The operator can then play it back through the speaker 6 and check it. If it is satisfactory, the voice data in the voice memory 9 is stored in the voice data storage device 11, and the device is ready for the next voice data analysis.

[Effects of the Invention]

As described above, the method of the present invention can produce voice data with no wasted portions in a single operation. It therefore has the advantage of enabling efficient voice analysis work.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing one embodiment of a voice analysis device to which the method of this invention is applied. FIG. 2 is a graph showing an example of speech. FIG. 3 is a flowchart for detecting the start point of voice data in the same embodiment.

1 ... operator console, 2 ... recording/playback device control unit, 3 ... recording/playback device, 4 ... AD conversion unit, 11 ... voice data storage device, 12 ... voice component, 13 ... noise component, 14 ... silent component.

Patent attorney (agent): 則近憲佑

Claims

[Claims]

A voice analysis method comprising the following steps:

1. An audio signal is converted into a digital signal and input one minimum unit at a time.
2. Each minimum unit of the signal is judged to be silent data or sound data.
3. When sound data continues for a first predetermined time t1 or longer, that point is determined to be the start point of the voice data, and storage of the digital signal is started.
4. When either the silent data, or sound data that does not continue for the first predetermined time t1 or longer, continues for a second predetermined time t2, that point is determined to be the end point, and storage of the digital signal is terminated.