JPH05234338A

JPH05234338A - Speech recognition apparatus

Info

Publication number: JPH05234338A
Application number: JP4069590A
Authority: JP
Inventors: Hiroaki Ogawa; 浩明小川; Makoto Akaha; 誠赤羽; Kazuo Ishii; 和夫石井; Miyuki Tanaka; 幸田中; Yasuhiko Kato; 靖彦加藤; Tetsuo Kobayashi; 哲夫小林
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1992-02-19
Filing date: 1992-02-19
Publication date: 1993-09-10

Abstract

PURPOSE: To enable the recorded contents of a tape to be checked quickly. CONSTITUTION: In a voice reproducing section 5, a voice signal recorded on a tape 4 is reproduced and supplied to a speech recognizing section 3. At the speech recognizing section 3, keywords contained in the voice signal are extracted and supplied to a display section 12 through a keyword processing section 10. At the display section 12, the keywords contained in the voice signal recorded on the tape are output.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a voice recognition apparatus suitable for use in, for example, confirming the contents of voice recorded on a tape, a disk, or the like.

[0002]

[Prior Art] With a tape recorder, for example, the proceedings of a meeting can be recorded on tape. During recording, an index signal can be recorded at any desired position as needed. The index signal can then be searched for from any position on the tape, and playback can start from the position where it is found.

[0003]

[Problems to Be Solved by the Invention] With the conventional apparatus, however, when the contents of the information recorded on a tape are unknown, everything must be played back from the beginning of the tape in order to check it, so confirming the recorded contents of a tape takes time.

[0004] One alternative is to check the recorded contents by increasing the playback speed of the tape. Even with this method, however, the user must listen to the tape from beginning to end, which takes considerable time and is a nuisance for the user.

[0005] The present invention has been made in view of this situation and makes it possible to confirm recorded contents quickly.

[0006]

[Means for Solving the Problems] The voice recognition apparatus according to claim 1 comprises: a tape 4 as a recording medium for recording voice; a keyboard 7 or a keyword voice input unit 11 as keyword input means for inputting a keyword; a standard pattern memory 24 as storage means for storing the keyword input from the keyboard 7 or the keyword voice input unit 11; a voice recognition unit 3 as keyword extraction means for extracting, from the voice recorded on the tape 4, the keyword stored in the standard pattern memory 24; and a voice output unit 6 or a display unit 12 as output means for outputting the keyword extracted by the voice recognition unit 3.

[0007] The voice recognition apparatus may further comprise: a keyword information storage unit 13 as frequency storage means for storing the appearance frequency of the keywords extracted by the voice recognition unit 3; and a keyword processing unit 10 as keyword processing means for supplying the keywords to the voice output unit 6 or the display unit 12 in accordance with the appearance frequencies stored in the keyword information storage unit 13.

[0008]

[Operation] In the voice recognition apparatus according to claim 1, keywords contained in the voice recorded on the tape 4 are extracted, and the extracted keywords are output from the voice output unit 6 or the display unit 12. The recorded contents of the tape 4 can therefore be confirmed quickly.

[0009] When the keywords can be supplied to the voice output unit 6 or the display unit 12 in accordance with their appearance frequencies, the recorded contents of the tape 4 can be confirmed even more quickly.

[0010]

[Embodiment] FIG. 1 is a block diagram showing the configuration of an embodiment of the voice recognition apparatus of the present invention. The voice input unit 1 A/D-converts the input voice and outputs the resulting digitized voice signal to the voice recording unit 2 and the voice recognition unit 3. The voice recording unit 2 records the voice signal output from the voice input unit 1 on the tape 4. The voice reproduction unit 5 reproduces the voice signal recorded on the tape 4 and supplies it to the voice output unit 6 and the voice recognition unit 3. The voice output unit 6 converts the voice signal supplied from the voice reproduction unit 5, or a keyword supplied from the keyword processing unit 10, into sound and outputs it.

[0011] The keyword voice input unit 11 supplies input voice to the selection processing unit 9 as a keyword. The keyboard 7 is used to type a keyword as characters (a character string), and the typed characters are output to the keyword creation unit 8. The keyword creation unit 8 creates a keyword in the form of a voice signal from the keyword in character form output from the keyboard 7, and supplies it to the selection processing unit 9. The selection processing unit 9 selects either the keyword supplied from the keyword voice input unit 11 or the keyword supplied from the keyword creation unit 8, and outputs it to the voice recognition unit 3 and the keyword processing unit 10.

[0012] As shown in FIG. 2, the voice recognition unit 3 comprises an acoustic analysis unit 21, an input pattern memory 22, a DP matching unit 23, a standard pattern memory 24, and a word spotting determination unit 25. When a voice signal serving as a keyword is supplied from the selection processing unit 9, the acoustic analysis unit 21 applies frequency analysis, for example linear prediction analysis, to the keyword (voice signal), calculates the feature parameters of the voice (a feature parameter sequence), and supplies them to the standard pattern memory 24. The standard pattern memory 24 stores the feature parameters calculated from the keyword and supplied from the acoustic analysis unit 21 as a standard pattern.

[0013] When a voice signal is supplied from the voice input unit 1, or when a voice signal reproduced from the tape 4 by the voice reproduction unit 5 is supplied, the acoustic analysis unit 21 applies linear prediction analysis to the voice signal, calculates the feature parameters of the voice (a feature parameter sequence), and supplies them to the input pattern memory 22. The input pattern memory 22 temporarily stores the feature parameters of the voice supplied from the acoustic analysis unit 21 as an input pattern.
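The acoustic analysis described above can be illustrated with a minimal Python sketch. The specification names linear prediction analysis but gives no frame length, hop size, window, or prediction order, so the Hamming window, the default values, and the function names `hamming`, `autocorrelation`, `levinson_durbin`, and `feature_sequence` are all assumptions made for illustration. The sketch uses the autocorrelation method with the Levinson-Durbin recursion, one common way to obtain linear prediction coefficients; it is not necessarily what the acoustic analysis unit 21 does.

```python
import math

def hamming(frame):
    """Apply a Hamming window to one frame of samples (assumed windowing)."""
    n = len(frame)
    if n < 2:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns predictor coefficients a[1..order] such that
    x[n] is approximated by sum(a[k] * x[n - k]).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    if err == 0.0:
        return a[1:]  # silent frame: all-zero coefficients
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)               # remaining prediction error
        if err <= 0.0:
            break                          # numerically degenerate frame
    return a[1:]

def feature_sequence(samples, frame_len=240, hop=80, order=10):
    """Turn a digitized signal into a sequence of per-frame LPC vectors."""
    starts = range(0, len(samples) - frame_len + 1, hop)
    return [levinson_durbin(autocorrelation(hamming(samples[s:s + frame_len]),
                                            order), order)
            for s in starts]
```

Under these assumptions, the same `feature_sequence` function would produce both the standard pattern (from a keyword utterance) and the input pattern (from live or reproduced voice), which is what allows the two to be compared frame by frame.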

[0014] The DP matching unit 23 calculates, by DP matching, the inter-pattern distance between the input pattern temporarily stored in the input pattern memory 22 and a standard pattern stored in the standard pattern memory 24. That is, the DP matching unit 23 artificially warps the time axis of one of the two patterns, for example that of the standard pattern, so that it becomes most similar to the other pattern, the input pattern (that is, so that the distance between the standard pattern and the input pattern is minimized), and obtains the distance between the standard pattern and the input pattern (the time-normalized distance). The DP matching unit 23 then supplies to the word spotting determination unit 25 this distance together with the position at which the time axis of the standard pattern corresponds to the time axis of the input pattern when the distance was obtained, that is, the position of the standard pattern within the input pattern.
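As a rough illustration of DP matching that returns both a distance and a position, the following Python sketch aligns a standard pattern against any segment of a longer input pattern (endpoint-free dynamic time warping). The Euclidean frame distance, the three allowed path steps, the normalization by the standard pattern length, and the names `frame_distance` and `dp_match` are assumptions; the specification does not state which local constraints or normalization the DP matching unit 23 uses.

```python
import math

def frame_distance(u, v):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dp_match(standard, inp):
    """Find the segment of `inp` that best matches `standard`.

    Returns (normalized_distance, start_frame, end_frame), where the
    frame indices refer to positions in the input pattern.
    """
    m, n = len(standard), len(inp)
    # prev[j] = (accumulated distance, start frame) of the best path that
    # ends with standard frame i aligned to input frame j.
    # The match may start at any input frame, so row 0 has no path cost.
    prev = [(frame_distance(standard[0], inp[j]), j) for j in range(n)]
    for i in range(1, m):
        cur = [None] * n
        for j in range(n):
            d = frame_distance(standard[i], inp[j])
            candidates = [prev[j]]                # standard advances only
            if j > 0:
                candidates.append(prev[j - 1])    # both advance
                candidates.append(cur[j - 1])     # input advances only
            best = min(candidates, key=lambda c: c[0])
            cur[j] = (best[0] + d, best[1])
        prev = cur
    # The match may also end at any input frame: pick the cheapest end.
    end = min(range(n), key=lambda j: prev[j][0])
    total, start = prev[end]
    return total / m, start, end

# Usage: a three-frame standard pattern embedded in a longer input pattern.
standard = [[0.0], [1.0], [2.0]]
inp = [[5.0], [5.0], [0.0], [1.0], [1.0], [2.0], [5.0]]
print(dp_match(standard, inp))  # -> (0.0, 2, 5)
```

In the usage example the input repeats the middle frame, and the warping absorbs that repetition, so the distance is zero and the reported position spans input frames 2 to 5.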

[0015] The DP matching unit 23 performs the above processing for every standard pattern stored in the standard pattern memory 24, obtains the minimum distance for every standard pattern and the position at which it was obtained, and supplies them to the word spotting determination unit 25.

[0016] From the minimum distances for all the standard patterns supplied from the DP matching unit 23 and the positions at which they were obtained, the word spotting determination unit 25 extracts (recognizes) every standard pattern contained in the input pattern and outputs the keywords corresponding to those standard patterns to the keyword processing unit 10 (FIG. 1).
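The specification does not say how the word spotting determination unit 25 decides that a standard pattern is contained in the input pattern. One plausible reading, sketched below in Python, is a simple threshold on the minimum distance. The threshold value, the result format (keyword mapped to a distance, start frame, and end frame), and the function name `spot_keywords` are hypothetical.

```python
def spot_keywords(match_results, max_distance=0.5):
    """Return the keywords judged to be present, ordered by position.

    match_results maps each keyword to (distance, start, end), i.e. the
    minimum distance of its standard pattern and where it was obtained.
    max_distance is an assumed acceptance threshold.
    """
    hits = [(start, keyword)
            for keyword, (distance, start, end) in match_results.items()
            if distance <= max_distance]
    hits.sort()  # earliest position in the input pattern first
    return [keyword for _, keyword in hits]

# Usage with made-up distances and positions:
results = {
    "Nagaoka": (0.2, 40, 70),
    "Tokyo": (1.3, 10, 30),
    "fireworks": (0.4, 5, 35),
}
print(spot_keywords(results))  # -> ['fireworks', 'Nagaoka']
```

A real word spotter would typically also need to handle a keyword occurring several times in one input pattern and overlapping candidates; this sketch leaves both out.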

[0017] The keyword processing unit 10 counts the appearance frequency of each keyword output from the voice recognition unit 3 (word spotting determination unit 25) and writes it, together with the keyword, to the keyword information storage unit 13. The keyword processing unit 10 also reads out keywords in accordance with their appearance frequencies stored in the keyword information storage unit 13 and supplies them to the display unit 12, or to the voice output unit 6 via the voice reproduction unit 5. The keyword information storage unit 13 stores the keywords supplied from the keyword processing unit 10 and their appearance frequencies. The display unit 12 displays the keywords supplied from the keyword processing unit 10 as an image (characters).

[0018] Next, the operation for registering a keyword is described. A keyword input from the keyword voice input unit 11, or from the keyboard 7 via the keyword creation unit 8, is supplied via the selection processing unit 9 to the acoustic analysis unit 21 of the voice recognition unit 3 and to the keyword processing unit 10. The keyword processing unit 10 supplies the keyword received from the selection processing unit 9 to the display unit 12, or to the voice output unit 6 via the voice reproduction unit 5, so that the input can be confirmed. The display unit 12 displays a keyword input from the keyboard 7 via the keyword creation unit 8 as an image, and the voice output unit 6 outputs a keyword input from the keyword voice input unit 11 as sound.

[0019] If a single keyword is input through both the keyboard 7 and the keyword voice input unit 11, the keyword input by voice (from the keyword voice input unit 11) can be confirmed as characters, and the keyword input as characters (from the keyboard 7) can be confirmed by voice. In other words, inputting one keyword through both the keyboard 7 and the keyword voice input unit 11 allows it to be confirmed both as characters (an image) and as sound.

[0020] Meanwhile, in the acoustic analysis unit 21 of the voice recognition unit 3, the voice signal serving as the keyword supplied from the selection processing unit 9 undergoes linear prediction analysis, and the feature parameters of the voice (a feature parameter sequence) are calculated and supplied to the standard pattern memory 24. The standard pattern memory 24 stores the feature parameters calculated from the keyword and supplied from the acoustic analysis unit 21 as a standard pattern.

[0021] In this way, the keyword is registered (stored) in the standard pattern memory 24.

[0022] Next, the operation for recording voice on the tape 4 is described. In the voice input unit 1, the voice signal is A/D-converted and supplied to the voice recording unit 2 and the voice recognition unit 3. The voice recording unit 2 records the voice signal supplied from the voice input unit 1 on the tape 4.

[0023] Meanwhile, the voice signal supplied to the voice recognition unit 3 undergoes linear prediction analysis in the acoustic analysis unit 21, and the feature parameters of the voice (a feature parameter sequence) are calculated. The input pattern memory 22 temporarily stores the feature parameters of the voice supplied from the acoustic analysis unit 21 as an input pattern.

[0024] In the DP matching unit 23, the inter-pattern distance between the input pattern temporarily stored in the input pattern memory 22 and a standard pattern stored in the standard pattern memory 24 is calculated by DP matching. That is, the time axis of one of the two patterns, for example that of the standard pattern, is artificially warped so that it becomes most similar to the other pattern, the input pattern, and the distance between the standard pattern and the input pattern (the time-normalized distance) is obtained. The DP matching unit 23 also obtains, together with this distance, the position at which the time axis of the standard pattern corresponds to the time axis of the input pattern when the distance was obtained, that is, the position of the standard pattern within the input pattern, and supplies both to the word spotting determination unit 25.

[0025] In the DP matching unit 23, the above processing is performed for every standard pattern stored in the standard pattern memory 24, and the minimum distance for every standard pattern and the position at which it was obtained are determined and supplied to the word spotting determination unit 25.

[0026] In the word spotting determination unit 25, every standard pattern contained in the input pattern is extracted (recognized) from the minimum distances for all the standard patterns supplied from the DP matching unit 23 and the positions at which they were obtained, and the keywords corresponding to those standard patterns are output to the keyword processing unit 10 (FIG. 1).

[0027] In the keyword processing unit 10, the appearance frequency of each keyword output from the voice recognition unit 3 (word spotting determination unit 25) is counted and written, together with the keyword, to the keyword information storage unit 13. When the input of voice to the voice input unit 1 ends, the keyword processing unit 10 reads out keywords in accordance with their appearance frequencies stored in the keyword information storage unit 13 (for example, only keywords whose frequency is at or above a preset reference value are read out) and supplies them to the display unit 12, or to the voice output unit 6 via the voice reproduction unit 5. The display unit 12 displays the keywords supplied from the keyword processing unit 10 as an image (characters), and the voice output unit 6 outputs the keywords supplied from the keyword processing unit 10 as sound.

[0028] Next, the operation for reproducing the recorded contents of the tape 4 is described. The voice reproduction unit 5 reproduces the voice signal recorded on the tape 4 and supplies it to the voice recognition unit 3. In the acoustic analysis unit 21 of the voice recognition unit 3, the voice signal recorded on the tape 4 undergoes linear prediction analysis, and the feature parameters of the voice (a feature parameter sequence) are calculated. The input pattern memory 22 temporarily stores the feature parameters of the voice supplied from the acoustic analysis unit 21 as an input pattern.

[0029] In the DP matching unit 23, the inter-pattern distance between the input pattern temporarily stored in the input pattern memory 22 and a standard pattern stored in the standard pattern memory 24 is calculated by DP matching. That is, the time axis of one of the two patterns, for example that of the standard pattern, is artificially warped so that it becomes most similar to the other pattern, the input pattern, and the distance between the standard pattern and the input pattern (the time-normalized distance) is obtained. The DP matching unit 23 then supplies to the word spotting determination unit 25 this distance together with the position at which the time axis of the standard pattern corresponds to the time axis of the input pattern when the distance was obtained, that is, the position of the standard pattern within the input pattern.

[0030] In the DP matching unit 23, the above processing is performed for every standard pattern stored in the standard pattern memory 24, and the minimum distance for every standard pattern and the position at which it was obtained are determined and supplied to the word spotting determination unit 25.

[0031] In the word spotting determination unit 25, every standard pattern contained in the input pattern is extracted (recognized) from the minimum distances for all the standard patterns supplied from the DP matching unit 23 and the positions at which they were obtained, and the keywords corresponding to those standard patterns are output to the keyword processing unit 10 (FIG. 1).

[0032] In the keyword processing unit 10, the appearance frequency of each keyword output from the voice recognition unit 3 (word spotting determination unit 25) is counted and written, together with the keyword, to the keyword information storage unit 13. When the input of voice to the voice reproduction unit 5 ends, the keyword processing unit 10 reads out keywords in accordance with their appearance frequencies stored in the keyword information storage unit 13 and supplies them to the display unit 12, or to the voice output unit 6 via the voice reproduction unit 5. The display unit 12 displays the keywords supplied from the keyword processing unit 10 as an image (characters), and the voice output unit 6 outputs the keywords supplied from the keyword processing unit 10 as sound.

[0033] Accordingly, suppose that the tape 4 holds the following voice message: "Hello, this is Ono. About the Nagaoka fireworks we were talking about the other day: I hear that on August 3 almost all the hot spring inns around Nagaoka will be fully booked. In that case, how about reserving a hot spring a little way from Nagaoka and taking JR to go and see the fireworks? Also, if the hot springs are full, I think an ordinary guesthouse would be fine too." Suppose also that the standard pattern memory 24 holds registered keywords such as "Tokyo, Osaka, Nagoya, Ishikawa, Hokkaido, Niigata, Nagaoka, Kashiwazaki, Kawasaki, Chofu, Ono, Ogawa, Oka, Tanaka, Suzuki, windsurfing, skiing, tennis, hot spring, drive, fireworks, reservation, product, wish, purchase, use, catalog, request, schedule, time, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, month, day". If the appearance-frequency threshold is set to one occurrence for personal names and two occurrences for all other words, the keywords "Ono, Nagaoka, fireworks, hot spring, reservation" are output from the voice output unit 6 or the display unit 12. From these keywords, the gist of the voice message can be inferred.
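The frequency-based selection in this example can be traced with a short Python sketch. It assumes that the word spotter detected the registered keywords in the message in the order listed in `detections` (the English renderings of the keywords and the detection list itself are illustrative, since the specification gives only the message and the final output). The names `PERSON_NAMES` and `select_keywords` are hypothetical.

```python
from collections import Counter

# Registered keywords that are personal names (threshold: one occurrence).
PERSON_NAMES = {"Ono", "Ogawa", "Oka", "Tanaka", "Suzuki"}

def select_keywords(detections, name_threshold=1, other_threshold=2):
    """Return keywords whose count reaches the threshold for their class,
    in order of first appearance."""
    counts = Counter(detections)
    selected = []
    for keyword in counts:  # Counter keeps first-seen order
        threshold = name_threshold if keyword in PERSON_NAMES else other_threshold
        if counts[keyword] >= threshold:
            selected.append(keyword)
    return selected

# Assumed detections for the message about the Nagaoka fireworks.
detections = [
    "Ono", "Nagaoka", "fireworks", "8", "month", "3", "day",
    "Nagaoka", "hot spring", "reservation",
    "Nagaoka", "hot spring", "reservation", "fireworks", "hot spring",
]
print(select_keywords(detections))
# -> ['Ono', 'Nagaoka', 'fireworks', 'hot spring', 'reservation']
```

"8", "month", "3", and "day" each occur once and fall below the two-occurrence threshold for ordinary words, which is why the date does not appear in the output.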

[0034] Furthermore, the keyword processing unit 10 can not only write the keywords and their appearance frequencies to the keyword information storage unit 13 but also record them on the tape 4, for example in a header. At playback time, the keywords and appearance frequencies recorded in the header of the tape 4 can then be consulted so that the keywords are output immediately.
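The specification does not define a header format. Purely as an illustration of the idea, the Python sketch below serializes a keyword-to-frequency table into a length-prefixed byte string that could be written ahead of the audio data and read back before playback. The JSON encoding, the 4-byte big-endian length prefix, and the names `encode_header` and `decode_header` are all hypothetical choices.

```python
import json
import struct

def encode_header(keyword_counts):
    """Pack a {keyword: frequency} table as: 4-byte length + UTF-8 JSON."""
    payload = json.dumps(keyword_counts, ensure_ascii=False).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload

def decode_header(data):
    """Recover the {keyword: frequency} table from the start of `data`."""
    (length,) = struct.unpack(">I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))

# Usage: the table survives a round trip even with audio bytes appended.
header = encode_header({"Ono": 1, "Nagaoka": 3, "fireworks": 2})
print(decode_header(header + b"...audio data..."))
# -> {'Ono': 1, 'Nagaoka': 3, 'fireworks': 2}
```

With such a header, keyword output at playback time would need only a read of the first few bytes rather than recognition over the whole recording.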

[0035] In this embodiment, tape is used as the recording medium for the voice signal. If the voice signal is instead recorded on a randomly accessible disk, such as an optical disk or a magneto-optical disk, keywords can be searched for even more quickly.

[0036]

[Effects of the Invention] According to the voice recognition apparatus of claim 1, keywords contained in the voice recorded on the recording medium are extracted, and the extracted keywords are output from the output means. The recorded contents of the recording medium can therefore be confirmed quickly.

[0037] According to the voice recognition apparatus of claim 2, keywords are supplied to the output means in accordance with their appearance frequencies, so the output of infrequent keywords can be omitted, and the recorded contents of the recording medium can be confirmed even more quickly than when all keywords are output.

[Brief description of drawings]

[FIG. 1] A block diagram showing the configuration of an embodiment of the voice recognition apparatus of the present invention.

[FIG. 2] A more detailed block diagram of the voice recognition unit 3 of the embodiment of FIG. 1.

[Explanation of symbols]

1 voice input unit
2 voice recording unit
3 voice recognition unit
4 tape
5 voice reproduction unit
6 voice output unit
7 keyboard
8 keyword creation unit
9 selection processing unit
10 keyword processing unit
11 keyword voice input unit
12 display unit
13 keyword information storage unit
21 acoustic analysis unit
22 input pattern memory
23 DP matching unit
24 standard pattern memory
25 word spotting determination unit

Continuation of front page
(72) Inventor: Miyuki Tanaka, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Yasuhiko Kato, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Tetsuo Kobayashi, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo

Claims

1. A voice recognition apparatus comprising: a recording medium for recording voice; a keyword input unit for inputting a keyword; a storage unit for storing the keyword input by the keyword input unit; a keyword extraction unit that extracts, from the voice recorded on the recording medium, the keyword stored in the storage unit; and an output unit that outputs the keyword extracted by the keyword extraction unit.

2. The voice recognition apparatus according to claim 1, further comprising: a frequency storage unit that stores the appearance frequency of the keyword extracted by the keyword extraction unit; and a keyword processing unit that supplies the keyword to the output unit in accordance with the appearance frequency of the keyword stored in the frequency storage unit.