JPH0628830A

JPH0628830A - Speech recognizing device

Info

Publication number: JPH0628830A
Application number: JP4069589A
Authority: JP
Inventors: Makoto Akaha; 誠赤羽; Tetsuo Kobayashi; 哲夫小林; Hiroaki Ogawa; 浩明小川; Miyuki Tanaka; 幸田中; Yasuhiko Kato; 靖彦加藤; Kazuo Ishii; 和夫石井
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1992-02-19
Filing date: 1992-02-19
Publication date: 1994-02-04

Abstract

PURPOSE: To quickly retrieve the recorded content of a tape. CONSTITUTION: When a keyword is input through a keyword voice input unit 11, or from a keyboard 7 through a keyword creation unit 8, the keyword is supplied through a selection processing unit 9 to a keyword processing unit 10. When keywords and their recording positions have been recorded in advance in the header of a tape 4, the keyword processing unit 10 refers to the keywords recorded in the header and searches for one identical to the keyword supplied from the selection processing unit 9. When an identical keyword is found among those recorded in the header, the keyword processing unit 10 reads the recording position of that keyword from the header, and reproduction of the tape 4 is started from that position.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a voice recognition device suitable for use in searching the contents of voice recorded on, for example, a tape or a disk.

[0002]

[Prior Art] A tape recorder, for example, can record the proceedings of a meeting on tape. During recording, an index signal can be recorded at any desired position as needed. This makes it possible to search for the index signal from any position on the tape and to start reproduction from the position of that signal.

[0003]

[Problems to Be Solved by the Invention] With the conventional device, however, when the user wants to check the part concerning a particular item among the matters discussed at a meeting, for example, all of the index signals must be detected one after another from the beginning of the tape and the recording partially reproduced at each one. This is not only troublesome for the user but also makes locating the position time-consuming. Moreover, if no index signal was recorded at that position in advance, the tape has to be reproduced from its start point to its end point.

[0004] The present invention has been made in view of this situation, and makes it possible to quickly search for specific recorded content.

[0005]

[Means for Solving the Problems] The voice recognition device according to claim 1 comprises: a voice input unit 1 as voice input means for inputting voice; a tape 4 as a recording medium for recording the input voice; a keyboard 7 or a keyword voice input unit 11 as keyword input means for inputting a keyword; a standard pattern memory 24 as storage means for storing the input keyword; a voice recognition unit 3 as keyword extraction means for extracting the keyword stored in the standard pattern memory 24 from the voice recorded on the tape 4 (the voice being recorded on the tape 4 from the voice input unit 1, or the voice reproduced from the tape 4); and a keyword processing unit 10 as keyword processing means for detecting the position at which the keyword extracted by the voice recognition unit 3 is recorded on the tape 4 and recording that position, together with the keyword, at a predetermined position on the tape 4 such as a header.

[0006] The voice recognition device according to claim 2 is characterized in that the keyboard 7 or the keyword voice input unit 11 inputs the keyword as characters or as voice.

[0007] The voice recognition device according to claim 3 comprises: a standard pattern memory 24 as storage means for storing a keyword; a tape 4 as a recording medium on which voice containing the keyword is recorded and on which the recording position of the keyword contained in the voice is recorded at a predetermined position such as a header; and a keyword processing unit 10 as search means for searching the tape 4 for the voice containing the keyword stored in the standard pattern memory 24.

[0008]

[Operation] In the voice recognition device according to claim 1, a keyword contained in the voice being recorded on the tape 4 or in the voice reproduced from the tape 4 is extracted, the position at which the extracted keyword is recorded on the tape 4 is detected, and that position is recorded together with the keyword in the header of the tape 4. The position at which a keyword is recorded can therefore be accessed quickly by referring to the header of the tape 4.

[0009] In the voice recognition device according to claim 2, the keyboard 7 or the keyword voice input unit 11 allows a keyword to be input as characters or as voice, so the keyword can be confirmed as characters or as voice.

[0010] In the voice recognition device according to claim 3, a keyword is stored in the standard pattern memory 24. The voice containing the keyword stored in the standard pattern memory 24 is then searched for on the tape 4, on which the voice containing the keyword is recorded together with the recording position of the keyword contained in that voice. Specific content recorded on the tape 4 can therefore be confirmed quickly.

[0011]

[Embodiment] FIG. 1 is a block diagram showing the configuration of an embodiment of the voice recognition device of the present invention. The voice input unit 1 A/D-converts the input voice and outputs the resulting signal, that is, the digitized voice signal, to the voice recording unit 2 and the voice recognition unit 3. The voice recording unit 2 records the voice signal output from the voice input unit 1 on the tape 4. The voice reproduction unit 5 reproduces the voice signal recorded on the tape 4 and supplies it to the voice output unit 6 and the voice recognition unit 3. The voice output unit 6 converts the voice signal supplied from the voice reproduction unit 5 or the keyword processing unit 10 into sound and outputs it.

[0012] The keyword voice input unit 11 supplies input voice to the selection processing unit 9 as a keyword. The keyboard 7 is used to type a keyword as characters (a character string), and the typed characters are output to the keyword creation unit 8. The keyword creation unit 8 creates a keyword in the form of a voice signal from the character keyword output from the keyboard 7 and supplies it to the selection processing unit 9. The selection processing unit 9 selects either the keyword supplied from the keyword voice input unit 11 or the keyword supplied from the keyword creation unit 8 and outputs it to the voice recognition unit 3 and the keyword processing unit 10.

[0013] As shown in FIG. 2, the voice recognition unit 3 comprises an acoustic analysis unit 21, an input pattern memory 22, a DP matching unit 23, a standard pattern memory 24, and a word spotting determination unit 25. When a voice signal is supplied from the selection processing unit 9 as a keyword, the acoustic analysis unit 21 applies frequency analysis, for example linear prediction analysis, to the keyword (voice signal), calculates the feature parameters of the voice (a feature parameter series), and supplies them to the standard pattern memory 24. The standard pattern memory 24 stores the feature parameters calculated from the keyword and supplied by the acoustic analysis unit 21 as a standard pattern.

[0014] When a voice signal is supplied from the voice input unit 1, or when a voice signal reproduced from the tape 4 is supplied by the voice reproduction unit 5, the acoustic analysis unit 21 applies linear prediction analysis to the voice signal, calculates the feature parameters of the voice (a feature parameter series), and supplies them to the input pattern memory 22. The input pattern memory 22 temporarily stores the feature parameters supplied from the acoustic analysis unit 21 as an input pattern.
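The linear prediction analysis performed by the acoustic analysis unit 21 can be sketched roughly as follows. This is only an illustration, assuming fixed-length frames, the autocorrelation method, and the Levinson-Durbin recursion; the source names only "linear prediction analysis", so the function names, frame length, hop, and prediction order below are hypothetical choices and not the device's actual parameters.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that
    x[n] is approximated by sum(a[k] * x[n - k])."""
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]                # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:        # silent or degenerate frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_feature_series(samples, frame_len=32, hop=16, order=2):
    """Feature parameter series: one LPC coefficient vector per frame."""
    series = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        series.append(levinson_durbin(autocorrelation(frame, order), order))
    return series


# A decaying exponential x[n] = 0.5**n is predicted by x[n] ~ 0.5 * x[n-1],
# so the first coefficient of each frame should be close to 0.5.
signal = [0.5 ** n for n in range(64)]
features = lpc_feature_series(signal)
```

The same routine would serve both paths described above: applied to a keyword it yields a standard pattern, and applied to recorded or reproduced voice it yields an input pattern.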

[0015] The DP matching unit 23 calculates, by DP matching, the inter-pattern distance between the input pattern temporarily stored in the input pattern memory 22 and a standard pattern stored in the standard pattern memory 24. That is, the DP matching unit 23 artificially warps the time axis of one of the two patterns, for example that of the standard pattern, so that it most closely resembles the other pattern, the input pattern (so that the distance between the standard pattern and the input pattern is minimized), and obtains the distance between the standard pattern and the input pattern (the time-normalized distance). The DP matching unit 23 then supplies to the word spotting determination unit 25 this distance together with the position at which the time axis of the standard pattern corresponds to the time axis of the input pattern when that distance was obtained, that is, the position of the standard pattern within the input pattern.
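One possible reading of this DP matching step is a dynamic time warping search in which the standard pattern may begin and end anywhere inside the input pattern. The sketch below is written under that assumption; the Euclidean frame distance, the three allowed path steps, and normalization by the standard pattern length are illustrative choices that the source does not specify, and `dp_match` is a hypothetical name.

```python
import math


def frame_distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dp_match(standard, inputs):
    """Find the best occurrence of `standard` inside `inputs`.

    Returns (time-normalized distance, start frame, end frame), where the
    frames index the input pattern.
    """
    m, n = len(standard), len(inputs)
    # cost[i][j]: smallest accumulated distance of a path that aligns
    # standard[0..i] with a stretch of the input ending at frame j.
    # start[i][j]: input frame at which that path began.
    cost = [[0.0] * n for _ in range(m)]
    start = [[0] * n for _ in range(m)]
    for j in range(n):                      # the match may start anywhere
        cost[0][j] = frame_distance(standard[0], inputs[j])
        start[0][j] = j
    for i in range(1, m):
        for j in range(n):
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
                candidates.append((cost[i][j - 1], start[i][j - 1]))
            best_cost, best_start = min(candidates)
            cost[i][j] = best_cost + frame_distance(standard[i], inputs[j])
            start[i][j] = best_start
    end = min(range(n), key=lambda j: cost[m - 1][j])   # may end anywhere
    return cost[m - 1][end] / m, start[m - 1][end], end


# The standard pattern 1, 2, 3 occurs (with the 2 held for two frames)
# at input frames 2 to 5.
standard = [[1.0], [2.0], [3.0]]
inputs = [[0.0], [0.0], [1.0], [2.0], [2.0], [3.0], [0.0]]
distance, start_frame, end_frame = dp_match(standard, inputs)
```

In this toy case the match is exact, so the distance is 0.0 and the reported position is frames 2 through 5 of the input pattern.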

[0016] The DP matching unit 23 performs the above processing for every standard pattern stored in the standard pattern memory 24, obtains, for all of the standard patterns, the minimum distance and the position at which it was obtained, and supplies them to the word spotting determination unit 25.

[0017] From the minimum distances for all of the standard patterns and the positions at which they were obtained, as supplied by the DP matching unit 23, the word spotting determination unit 25 extracts (recognizes) every standard pattern contained in the input pattern and outputs the keywords corresponding to those standard patterns to the keyword processing unit 10 (FIG. 1).
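The source does not state how the word spotting determination unit 25 decides that a standard pattern is actually contained in the input pattern. A simple possibility, shown here purely as an assumption, is to accept a keyword when its minimum distance falls below a threshold; `MatchResult`, `spot_keywords`, and the threshold value are hypothetical.

```python
from typing import NamedTuple


class MatchResult(NamedTuple):
    keyword: str        # keyword whose standard pattern was matched
    distance: float     # minimum time-normalized distance for that pattern
    start_frame: int    # position in the input pattern where it was obtained


def spot_keywords(results, threshold):
    """Keep matches close enough to count as detections, ordered by position."""
    accepted = [r for r in results if r.distance <= threshold]
    return sorted(accepted, key=lambda r: r.start_frame)


results = [
    MatchResult("catalog", 0.12, 940),
    MatchResult("Tokyo", 0.08, 35),
    MatchResult("model", 0.95, 400),   # too distant: treated as absent
]
detected = [r.keyword for r in spot_keywords(results, threshold=0.3)]
```

Sorting by position means the keywords reach the keyword processing unit in the order in which they occur in the recording.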

[0018] The keyword processing unit 10 detects the position (recording position) at which each keyword output from the voice recognition unit 3 (word spotting determination unit 25) is recorded on the tape 4, and records that position together with the keyword on the tape 4, for example in its header. The keyword processing unit 10 also reads the keywords recorded in the header of the tape 4 and supplies them to the voice reproduction unit 5 or the display unit 12. The display unit 12 displays the keywords supplied from the keyword processing unit 10 as an image.
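The keyword index kept in the header can be pictured as a list of keyword and position pairs. The sketch below is a hypothetical in-memory model only: the source does not describe the header format or the unit of the recording position, so the `TapeHeader` class, the millisecond positions, and the 10 ms frame hop are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TapeHeader:
    """Hypothetical in-memory image of the keyword index in the tape header."""
    entries: list = field(default_factory=list)   # (keyword, position in ms)

    def record(self, keyword, position_ms):
        """Record a keyword together with its recording position."""
        self.entries.append((keyword, position_ms))

    def keywords(self):
        """Keywords in recording order, e.g. for the display unit."""
        return [keyword for keyword, _ in self.entries]


def frame_to_ms(frame_index, hop_ms=10):
    """Convert an analysis frame index to a tape position (assumed 10 ms hop)."""
    return frame_index * hop_ms


header = TapeHeader()
for keyword, start_frame in [("Tokyo", 35), ("Walkman", 410)]:
    header.record(keyword, frame_to_ms(start_frame))
```

Integer milliseconds are used here only to keep the positions exact; a real device would more likely store a tape counter or time code.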

[0019] Next, the operation for registering a keyword will be described. A keyword input through the keyword voice input unit 11, or from the keyboard 7 via the keyword creation unit 8, is supplied through the selection processing unit 9 to the acoustic analysis unit 21 of the voice recognition unit 3 and to the keyword processing unit 10. For input confirmation, the keyword processing unit 10 supplies the keyword received from the selection processing unit 9 to the display unit 12, or to the voice output unit 6 via the voice reproduction unit 5. The display unit 12 displays a keyword input from the keyboard 7 via the keyword creation unit 8 as an image (characters), and the voice output unit 6 outputs a keyword input through the keyword voice input unit 11 as sound.

[0020] If a single keyword is input with both the keyboard 7 and the keyword voice input unit 11, a keyword input by voice (through the keyword voice input unit 11) can be confirmed as characters, and a keyword input as characters (from the keyboard 7) can be confirmed by voice.

[0021] Meanwhile, in the acoustic analysis unit 21 of the voice recognition unit 3, the voice signal supplied as a keyword from the selection processing unit 9 undergoes linear prediction analysis, the feature parameters of the voice (a feature parameter series) are calculated, and they are supplied to the standard pattern memory 24. The standard pattern memory 24 stores the feature parameters calculated from the keyword and supplied by the acoustic analysis unit 21 as a standard pattern.

[0022] Next, the operation for recording keywords on the tape 4 will be described. The voice input unit 1 A/D-converts the voice signal and supplies it to the voice recording unit 2 and the voice recognition unit 3. The voice recording unit 2 records the voice signal supplied from the voice input unit 1 on the tape 4.

[0023] Meanwhile, the voice signal supplied to the voice recognition unit 3 undergoes linear prediction analysis in the acoustic analysis unit 21, and the feature parameters of the voice (a feature parameter series) are calculated. The input pattern memory 22 then temporarily stores the feature parameters supplied from the acoustic analysis unit 21 as an input pattern.

[0024] The DP matching unit 23 calculates, by DP matching, the inter-pattern distance between the input pattern temporarily stored in the input pattern memory 22 and a standard pattern stored in the standard pattern memory 24. That is, the DP matching unit 23 artificially warps the time axis of one of the two patterns, for example that of the standard pattern, so that it most closely resembles the other pattern, the input pattern, and obtains the distance between the standard pattern and the input pattern (the time-normalized distance). The DP matching unit 23 also obtains the position at which the time axis of the standard pattern corresponds to the time axis of the input pattern when that distance was obtained, that is, the position of the standard pattern within the input pattern, and supplies it together with the distance to the word spotting determination unit 25.

[0025] The DP matching unit 23 performs the above processing for every standard pattern stored in the standard pattern memory 24, obtains, for all of the standard patterns, the minimum distance and the position at which it was obtained, and supplies them to the word spotting determination unit 25.

[0026] From the minimum distances for all of the standard patterns and the positions at which they were obtained, as supplied by the DP matching unit 23, the word spotting determination unit 25 extracts (recognizes) every standard pattern contained in the input pattern and outputs the keywords corresponding to those standard patterns to the keyword processing unit 10 (FIG. 1).

[0027] The keyword processing unit 10 detects the position at which each keyword output from the voice recognition unit 3 (word spotting determination unit 25) is recorded on the tape 4, and records that position together with the keyword in the header of the tape 4.

[0028] Keywords can also be recorded on the tape 4 in the following alternative way. The voice input unit 1 A/D-converts the voice signal and supplies it to the voice recording unit 2, which records it on the tape 4. After recording of the voice signal has finished, the voice reproduction unit 5 reproduces the voice signal recorded on the tape 4 and supplies it to the voice recognition unit 3. The acoustic analysis unit 21 of the voice recognition unit 3 applies linear prediction analysis to the reproduced voice signal and calculates the feature parameters of the voice (a feature parameter series). The input pattern memory 22 then temporarily stores the feature parameters supplied from the acoustic analysis unit 21 as an input pattern.

[0029] Thereafter, the voice recognition unit 3 and the keyword processing unit 10 carry out the processing described above for recording keywords on the tape 4, and the keywords and their recording positions are recorded in the header of the tape 4.

[0030] Next, to reproduce the voice signal recorded on the tape 4, a keyword is input through the keyword voice input unit 11, or from the keyboard 7 via the keyword creation unit 8, and is supplied through the selection processing unit 9 to the keyword processing unit 10. For input confirmation, the keyword processing unit 10 supplies the keyword received from the selection processing unit 9 to the display unit 12, or to the voice output unit 6 via the voice reproduction unit 5. The display unit 12 displays a keyword input from the keyboard 7 via the keyword creation unit 8 as an image, and the voice output unit 6 outputs a keyword input through the keyword voice input unit 11 as sound.

[0031] If keywords and their recording positions have been recorded in the header of the tape 4 in advance, the keyword processing unit 10 refers to the keywords recorded in the header and searches for one identical to the keyword supplied from the selection processing unit 9. If an identical keyword is found among those recorded in the header, the keyword processing unit 10 reads the recording position of that keyword from the header, that position is located on the tape, and reproduction of the tape 4 starts from there. If the position of the start point of the voice containing the keyword is recorded in the header, the voice containing the keyword can also be reproduced from its start point.
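The header lookup described here amounts to a search through the recorded keyword and position pairs. The following sketch assumes the header has already been read into a list of such pairs; `find_playback_position` and the millisecond positions are hypothetical, and returning only the first occurrence is a simplification.

```python
def find_playback_position(header_entries, keyword):
    """Return the first recorded position of `keyword`, or None if absent."""
    for recorded_keyword, position in header_entries:
        if recorded_keyword == keyword:
            return position
    return None


header_entries = [("Tokyo", 350), ("Walkman", 4100), ("catalog", 9400)]
hit = find_playback_position(header_entries, "Walkman")    # position to play from
miss = find_playback_position(header_entries, "Osaka")     # keyword not in header
```

Because the lookup touches only the header, no audio has to be reproduced before playback can start at the returned position. What to do when the result is None is left to the caller.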

[0032] If keywords and their recording positions have not been recorded in the header of the tape 4 in advance, the voice reproduction unit 5 starts reproducing the voice signal recorded on the tape 4 and supplies it to the voice recognition unit 3. The acoustic analysis unit 21 of the voice recognition unit 3 applies linear prediction analysis to the signal and calculates the feature parameters of the voice (a feature parameter series). The input pattern memory 22 then temporarily stores the feature parameters supplied from the acoustic analysis unit 21 as an input pattern.

[0033] The DP matching unit 23 artificially warps the time axis of either the standard pattern stored in the standard pattern memory 24 or the input pattern stored in the input pattern memory 22, for example that of the standard pattern, so that it most closely resembles the other pattern, the input pattern, and obtains the distance between the standard pattern and the input pattern (the time-normalized distance). The DP matching unit 23 also obtains the position at which the time axis of the standard pattern corresponds to the time axis of the input pattern when that distance was obtained, that is, the position of the standard pattern within the input pattern, and supplies it together with the distance to the word spotting determination unit 25.

[0034] The DP matching unit 23 performs the above processing for every standard pattern stored in the standard pattern memory 24, obtains, for all of the standard patterns, the minimum distance and the position at which it was obtained, and supplies them to the word spotting determination unit 25.

[0035] From the minimum distances for all of the standard patterns and the positions at which they were obtained, as supplied by the DP matching unit 23, the word spotting determination unit 25 extracts (recognizes) every standard pattern contained in the input pattern and outputs the keywords corresponding to those standard patterns to the keyword processing unit 10 (FIG. 1).

[0036] The keyword processing unit 10 supplies the keywords received from the voice recognition unit 3 (word spotting determination unit 25) to the display unit 12, or to the voice output unit 6 via the voice reproduction unit 5. The display unit 12 displays the keywords contained in the voice signal recorded on the tape 4 as an image, and the voice output unit 6 outputs them as sound.

[0037] Suppose, for example, that the following voice message is recorded on the tape 4: "Hello, this is Taro Sony in Tokyo. I am replying to your inquiry of the other day. We currently sell eight types of Walkman. Of these, the model that meets your requirements is the WM-EX70, priced at 19,500 yen. To purchase it, please use a nearby electronics store that carries Sony products. If you wish to have a detailed catalog, please request one and we will send it to you. This is a quick reply for now." If keywords such as "Tokyo, Sony, reply, Walkman, type, yen, requirement, model, product, purchase, electronics store, use, catalog, wish, request, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0" are registered in the standard pattern memory 24, the keywords "Tokyo, Sony, reply, Walkman, 8, type, requirement, model, 70, 19,500, yen, Sony, product, electronics store, use, catalog, wish, request, reply" are output from the voice output unit 6 or the display unit 12. The gist of the voice message can be inferred from these keywords.

[0038] The voice recognition device of the present invention has been described above. If the present invention is applied to, for example, a voice mail system, the contents of received voice mail can be confirmed quickly.

[0039] In this embodiment, a tape is used as the recording medium for the voice signal. If the voice signal is instead recorded on a randomly accessible disk such as an optical disk or a magneto-optical disk, keywords can be searched for even more quickly.

[0040]

[Effects of the Invention] According to the voice recognition device of claim 1, a keyword contained in the voice being recorded on the recording medium or in the voice reproduced from the recording medium is extracted, the position at which the extracted keyword is recorded on the recording medium is detected, and that position is recorded together with the keyword at a predetermined position on the recording medium. The position at which a keyword is recorded can therefore be accessed quickly by referring to that predetermined position on the recording medium.

[0041] According to the voice recognition device of claim 2, the keyword input means inputs a keyword as characters or as voice. The device can therefore be used in whichever way the user prefers.

[0042] According to the voice recognition device of claim 3, a keyword is stored in the storage means. The voice containing the keyword stored in the storage means is then searched for on the recording medium, on which the voice containing the keyword is recorded together with the recording position of the keyword contained in that voice. Specific content recorded on the recording medium can therefore be confirmed quickly.

[Brief description of drawings]

[FIG. 1] A block diagram showing the configuration of an embodiment of the voice recognition device of the present invention.

[FIG. 2] A more detailed block diagram of the voice recognition unit 3 of the embodiment of FIG. 1.

[Explanation of symbols]

1 voice input unit
2 voice recording unit
3 voice recognition unit
4 tape
5 voice reproduction unit
6 voice output unit
7 keyboard
8 keyword creation unit
9 selection processing unit
10 keyword processing unit
11 keyword voice input unit
12 display unit
21 acoustic analysis unit
22 input pattern memory
23 DP matching unit
24 standard pattern memory
25 word spotting determination unit

Continuation of front page
(72) Inventor: Miyuki Tanaka, c/o Sony Corporation, 6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Yasuhiko Kato, c/o Sony Corporation, 6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Kazuo Ishii, c/o Sony Corporation, 6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo

Claims

1. A voice recognition device comprising: voice input means for inputting voice; a recording medium for recording the voice input by the voice input means; keyword input means for inputting a keyword; storage means for storing the keyword input by the keyword input means; keyword extraction means for extracting the keyword stored in the storage means from the voice recorded on the recording medium; and keyword processing means for detecting the position at which the keyword extracted by the keyword extraction means is recorded on the recording medium, and recording that position together with the keyword at a predetermined position on the recording medium.

2. The voice recognition device according to claim 1, wherein the keyword input means inputs the keyword as characters or as voice.

3. A voice recognition device comprising: storage means for storing a keyword; a recording medium on which voice containing the keyword is recorded and on which the recording position of the keyword contained in the voice is recorded at a predetermined position; and search means for searching the recording medium for the voice containing the keyword stored in the storage means.