JPH01106098A - Voice recognition system

Info

Publication number: JPH01106098A
Application number: JP62264375A
Authority: JP
Inventors: Masayuki Iida; 正幸飯田; Hiroki Onishi; 宏樹大西; Kazuyoshi Okura; 計美大倉
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1987-10-20
Filing date: 1987-10-20
Publication date: 1989-04-24
Anticipated expiration: 2012-05-14
Also published as: JP2609874B2

Abstract

PURPOSE: To generate standard patterns for both microphone speech and recorded speech from a single utterance, and so lighten the burden on the registering person, by recording the registration speech input from a microphone on a recording/playback device and using that recording as the registration speech for generating the standard pattern of recorded speech. CONSTITUTION: The registration speech is input directly to the main body from the microphone 3, with the recording/playback device 2 connected to the main body. While the input registration speech is recorded on the recording/playback device 2, the main body analyzes the registration speech from the microphone 3 to generate a standard pattern, which is stored in a storage device 8. Consequently, the standard pattern generated from direct input through the microphone 3 and the standard pattern generated from the recorded speech can both be produced by a single speech registration operation, which halves the number of utterances required of the registering person and lightens the burden of registration.

Description

[Detailed Description of the Invention]

(a) Field of industrial application

The present invention relates to a speech recognition system.

(b) Prior art

Speech recognition systems are being developed in which speech is recorded on a recording/playback device such as a tape recorder, and the speech reproduced on playback is input to a speech recognition device that recognizes it and converts it into text (Japanese Unexamined Patent Publication No. 58-158736).

Speech recorded on a recording/playback device such as a tape recorder is shaped by the frequency characteristics of that device, so a standard pattern created from recorded speech differs from one created from speech input directly through a microphone. Consequently, when the speech recognition device is to recognize recorded speech, a standard pattern created from recorded speech must be used, whereas when it is to recognize speech input directly through a microphone, a standard pattern created from direct microphone input must be used. To make both microphone speech and recorded speech recognizable, two registration operations are therefore required, one using microphone speech and one using recorded speech, and conventionally these had to be performed separately.

(c) Problems to be solved by the invention

Because the registration operations for microphone speech and recorded speech were performed separately, as described above, the same utterance had to be made twice, which was a considerable burden on the registrant. The present invention provides a speech recognition system that can create standard patterns for both microphone speech and recorded speech from a single utterance, reducing the burden on the registrant.

(d) Means for solving the problems

In the speech recognition system of the present invention, when registration speech is input to the recognition device from a microphone, a standard pattern is created while that registration speech is being recorded on the recording/playback device. The speech recorded on the recording/playback device is then played back and input, and a standard pattern for recorded speech is created.

(e) Operation

In the speech recognition system of the present invention, the registration speech input from the microphone is recorded on the recording/playback device, and this recording is used as the registration speech for creating the standard pattern of recorded speech. A single utterance of the registration speech therefore yields both standard patterns: one created from the microphone input and one created from the recorded speech.

(f) Embodiment

Fig. 1 shows an external view of a dictating machine that employs the present invention to create text from speech input, and Fig. 2 shows a functional block diagram of the machine.

In Fig. 2, (1) is a speech recognition unit built into the main body (100) of Fig. 1. As detailed in the block diagram of Fig. 3, it consists of a preprocessing unit (11) [Fig. 4] that adjusts the sound pressure of the input speech signal; a feature extraction unit (12) [Fig. 5] that extracts parameters representing acoustic features from the level-adjusted speech signal output by the preprocessing unit (11); a word recognition unit (13) [Fig. 6] and a phrase recognition unit (14) [Fig. 7] that recognize the input speech on the basis of the feature parameters obtained from the feature extraction unit (12); and a candidate generation unit (15) that generates candidates for recognized word character strings or recognized syllable characters from the recognition results of either recognition unit (13) or (14).

Also in Fig. 2, (2) is a recording/playback device such as a tape recorder that, as shown in Fig. 1, can be mechanically and electrically attached to and detached from the main body (100); (3) is a microphone, for example the headphone-type microphone shown in Fig. 1; and (4) is an input switching unit [Fig. 8] that switches the connections among the recording/playback device (2), the microphone (3), and the speech recognition unit (1). (6) is a display device for displaying character strings and the like generated from the recognition results; (7) is a keyboard for entering the various control signals of the dictating machine; (8) is a storage device, such as a magnetic disk device, that stores the character strings generated by the dictating machine; and (9) is a speech synthesis unit that reads out the character strings in the storage device through a speaker (10) using rule-based synthesis.

(5) is a control unit consisting of a microprocessor, which controls the operation of each of the above units.

There are two ways to create text with the dictating machine configured as described above; each is detailed below.

In the first method, speech is input from the microphone (3) to the speech recognition unit (1), which performs speech recognition and converts the input speech into a character string; the result is displayed on the display device (6) and simultaneously stored in the storage device (8).

In the second method, the text to be entered is recorded in advance on the recording/playback device (2). The recording/playback device (2) is then connected to the machine and the recorded text is input to the speech recognition unit (1), which performs speech recognition and converts the input speech into a character string; the result is displayed on the display device (6) and simultaneously stored in the storage device (8).

Because there are two ways of inputting speech, as described above, the input switching unit (4) switches between the inputs. In addition to switching the input, the input switching unit (4) also switches between recording the recording signal (a) on the recording/playback device (2) and recording the speech input from the microphone (3).

The operations from speech recording through text creation are described in order below.

(i) Speech registration processing

Before speech recognition is performed, speech registration is carried out to create the speech standard patterns that recognition requires.

First, the syllable registration mode is described.

The standard pattern referred to here serves as the reference pattern for pattern matching in the phrase recognition unit (14) of the speech recognition unit (1). Specifically, it is stored in the syllable standard pattern memory (14d) of the phrase recognition unit (14) shown in Fig. 7.

To register speech in this dictating machine, the switch (14s1) in Fig. 7 is first operated to connect the parameter buffer (14d) and the syllable recognition unit (14b); registration can then proceed by any of the three methods described next.

In the first method, the registration speech is input directly to the main body (100) of the machine from the microphone (3) and analyzed by the speech recognition unit (1) to create a standard pattern, which is stored in the syllable standard pattern memory (14d) and the storage device (8).

In the second method, a recording/playback device (2) on which the registration speech has been recorded in advance is connected to the main body (100), and the registration speech is input by playing back the recording. The input registration speech is analyzed by the speech recognition unit (1) to create a standard pattern, which is stored in the syllable standard pattern memory (14d) and the storage device (8).

In the third method, the registration speech is input directly to the main body (100) of the machine from the microphone (3), with the recording/playback device (2) connected to the main body (100) at the same time. While the input speech is being recorded on the recording/playback device (2), the main body (100) analyzes the registration speech from the microphone (3), creates a standard pattern, and stores it in the storage device (8). Once speech input from the microphone (3) has finished, the speech recorded on the recording/playback device (2) is played back and analyzed by the speech recognition unit (1) to create a standard pattern. This pattern is stored in the syllable standard pattern memory (14d) and, at the same time, in the storage device (8) alongside the syllable standard pattern created from the direct microphone (3) input.
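The third method can be pictured with a short Python sketch. It is illustrative only: `make_standard_pattern`, `recorder_response`, and `register_once` are hypothetical names, the frame averaging merely stands in for whatever analysis the speech recognition unit (1) actually performs, and the per-band gains are an invented placeholder for the recorder's frequency characteristics.

```python
from statistics import fmean

def make_standard_pattern(frames):
    """Average per-frame feature vectors into one pattern (placeholder analysis)."""
    return [fmean(column) for column in zip(*frames)]

def recorder_response(frames, gain=(1.0, 0.8, 0.5)):
    """Hypothetical stand-in for the recorder's frequency characteristics."""
    return [[x * g for x, g in zip(frame, gain)] for frame in frames]

def register_once(mic_frames):
    """One utterance yields both the microphone and the recorded-speech pattern."""
    tape = [list(frame) for frame in mic_frames]   # recorded during microphone input
    mic_pattern = make_standard_pattern(mic_frames)
    played_back = recorder_response(tape)          # playback after input has finished
    rec_pattern = make_standard_pattern(played_back)
    return {"mic": mic_pattern, "recorded": rec_pattern}

patterns = register_once([[1.0, 2.0, 4.0], [3.0, 2.0, 0.0]])
print(patterns)
```

The two patterns differ only because of the assumed recorder response, which is the reason the text keeps them as separate entries in the storage device (8).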

In this third method, the speech recorded on the recording/playback device (2) has been shaped by the frequency characteristics of that device, so a standard pattern created from the recorded speech differs from one created from speech input directly through the microphone (3). Recognizing recorded speech therefore requires a standard pattern created from recorded speech, and recognizing speech input directly through the microphone (3) requires a standard pattern created from direct microphone input. With the method described above, both patterns, the one registered directly from the microphone (3) and the one created from the recorded speech, can be created and stored in a single speech registration operation.

Furthermore, once the registration speech has been recorded on the recording/playback device (2), standard patterns can be created on a dictating machine that does not yet have them simply by playing back the recording, with no further utterance by the registrant. And if the registration speech is recorded on the recording/playback device (2) and followed by recorded text, everything from speech registration to text creation can be carried out automatically just by later connecting the recording/playback device (2) to the main body (100) and playing back the recording.

The registrant's utterances for creating the speech standard patterns are made by reading aloud the characters that the machine shows on the display device (6) in a fixed order.

When a recording/playback device (2) with a display function dedicated to this machine is used, standard patterns can be created even when the recording/playback device (2) is carried on its own: the registrant utters the speech corresponding to each headword shown on its display screen and records it on the recording/playback device (2).

As described above, when the registration speech for creating standard patterns is recorded on the recording/playback device (2), noise and similar disturbances can cause the recorded speech to fall out of step with the headwords it should correspond to when the standard patterns are created from the recording. This is explained below with reference to Fig. 9, taking a tape recorder as the recording/playback device. Fig. 9(a) shows part of a tape on which registration speech for standard pattern creation has been recorded, namely the section holding the registration speech "a" to "ka" corresponding to the headwords 「あ」 to 「か」; in this example [noise] has been recorded between "e" and "o". If speech registration is performed from a tape like that in Fig. 9(a), with [noise] recorded between registration utterances, and the syllable for each input utterance is determined purely by the order of the sounds on the tape (the first recorded sound is "a", the second is "i", and so on), then the [noise] is also treated as registration speech and assigned a headword, and the actual registration speech falls out of step with the headwords.

Fig. 9(b) shows the result when the [noise] is mistaken for speech: the [noise] is entered at the headword 「え」 and the syllable "e" is entered at the headword 「お」.

Because noise and similar disturbances can cause this misalignment between the recorded speech and the headwords when standard patterns are created from registration speech, a character-code sound indicating the type of each registration utterance is recorded on the recording/playback device (2) in correspondence with that utterance, as shown in Fig. 9(c). With this method, even if [noise] is recorded between "u" and "e", the misalignment between input sounds and headwords described above is prevented.
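The difference between order-based assignment and code-based assignment can be sketched in a few lines of Python. The function names and the list-based model of the tape are assumptions made for illustration; romanized syllables stand in for the kana headwords, and the noise is placed between "u" and "e" to match the situation of Fig. 9(b).

```python
HEADWORDS = ["a", "i", "u", "e", "o"]

def assign_by_order(sounds):
    """Pair the n-th sound on the tape with the n-th headword."""
    return dict(zip(HEADWORDS, sounds))

def assign_by_code(tagged_sounds):
    """Use the character code recorded with each utterance; untagged sounds are dropped."""
    return {code: sound for code, sound in tagged_sounds if code is not None}

# Tape with [noise] between "u" and "e".
tape = ["a", "i", "u", "[noise]", "e", "o"]
by_order = assign_by_order(tape)

# Same tape, each utterance tagged with its headword code; the noise carries none.
tagged = [("a", "a"), ("i", "i"), ("u", "u"), (None, "[noise]"), ("e", "e"), ("o", "o")]
by_code = assign_by_code(tagged)

print(by_order)
print(by_code)
```

With order-based assignment the headword "e" receives the noise and "o" receives the syllable "e"; with the codes, every headword keeps its own utterance.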

The method of recording these character-code sounds of a specific frequency, which prevent the misalignment, is explained separately for the case where the tape recorder serving as the recording/playback device (2) is single-track and the case where it is multi-track.

First, with reference to Fig. 10, the case of a recording/playback device that records on multiple tracks is described.

With a multi-track recording/playback device, the character code corresponding to the headword is recorded on a track that carries no speech, as shown in Fig. 10(a). From this character-code sound the speech recognition unit (1) learns the headword of the incoming speech. In addition, of the sounds recorded on the speech track, only those that fall within the interval t1 in which the character-code sound is recorded and that meet the sound pressure threshold are regarded as speech and analyzed.

Alternatively, as shown in Fig. 10(b), the character code corresponding to the headword is recorded at the start and at the end of the utterance. Of the sounds recorded on the speech track, only those that fall within the interval t2 between the character-code sound marking the start of the utterance and the one marking its end, and that meet the sound pressure threshold, are regarded as speech and analyzed.

Alternatively, as shown in Fig. 10(c), the character code corresponding to the headword is recorded at the start of the utterance. Of the sounds recorded on the speech track, only those that fall within the interval t3 from this character-code sound, which indicates the type of utterance, to the character-code sound for the next headword, and that meet the sound pressure threshold, are regarded as speech and analyzed.
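The interval gating shared by the three variants can be sketched as follows, using variant (c) to derive the intervals. Everything named here (`Sound`, `intervals_from_start_codes`, `speech_in_interval`, the time and level units, the threshold value) is hypothetical, and closing the last interval at the end of the tape is an assumption, since the text does not say how the final utterance is bounded.

```python
from dataclasses import dataclass

@dataclass
class Sound:
    start: float   # seconds from the start of the tape
    end: float
    level: float   # sound pressure in arbitrary units

def intervals_from_start_codes(codes, tape_end):
    """Variant (c): each interval t3 runs from one code to the next code.

    codes is a time-sorted list of (time, headword). The last interval is
    closed at tape_end, which is an assumption of this sketch.
    """
    ends = [t for t, _ in codes[1:]] + [tape_end]
    return [(headword, t, end) for (t, headword), end in zip(codes, ends)]

def speech_in_interval(sounds, start, end, threshold):
    """Keep sounds that lie inside the interval and meet the threshold."""
    return [s for s in sounds
            if start <= s.start and s.end <= end and s.level >= threshold]

codes = [(0.0, "a"), (2.0, "i")]
sounds = [Sound(0.5, 1.0, 0.9), Sound(1.2, 1.4, 0.1), Sound(2.3, 2.9, 0.8)]
for headword, start, end in intervals_from_start_codes(codes, tape_end=4.0):
    kept = speech_in_interval(sounds, start, end, threshold=0.5)
    print(headword, [(s.start, s.end) for s in kept])
```

For variants (a) and (b) only the way the interval bounds are obtained changes; `speech_in_interval` would be applied to t1 or t2 in the same way.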

Second, with a single-track recording/playback device (2), the character code corresponding to the headword is represented by a sound outside the frequency band used for speech analysis and is recorded together with the speech on the same track. The character-code sounds are recorded in the same way as in the multi-track case: of the sounds recorded in the intervals t1, t2, and t3 described above, only those that meet the same conditions are regarded as speech and analyzed.

However, except in the embodiment of Fig. 10(a), where the speech and the character-code sound overlap, the character-code sound need not lie outside the speech analysis frequency band.

Next, word standard patterns are registered in the word standard pattern memory (13c) of the word recognition unit (13) shown in Fig. 6 for the words, such as letters of the alphabet, digits, parentheses, and punctuation marks, whose characters are already registered in the word dictionary (13d) of that unit.

First, a predetermined operation connects the parameter buffer (13a) and the word standard pattern memory (13c) of Fig. 6 through the switch (13s1), putting the machine into word registration mode.

Next, letters of the alphabet, digits, parentheses, punctuation marks, and the like are shown on the display device (6) of the main body (100), and the operator speaks the corresponding reading.

The speech recognition unit (1) analyzes this speech and registers the word standard pattern in the word standard pattern memory (13c).

The operations up to this point make speech recognition possible.

However, to have the machine recognize a word that is in neither the independent word/attached word dictionary (14e) nor the word dictionary (13d), the word must be registered either in the independent word/attached word dictionary (14e), or in the word dictionary (13d) together with its word standard pattern in the word standard pattern memory (13c). Which of these to use depends on whether the user wants the word recognized as a phrase utterance or as a word utterance.
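The choice of registration target can be written as a small lookup. This is only a sketch of the rule stated above; the function name and the `"phrase"` / `"word"` labels are hypothetical.

```python
def registration_targets(recognize_as):
    """Return the stores that must be updated for a new word.

    recognize_as is "phrase" or "word" (hypothetical labels for the two
    utterance types the user can choose between).
    """
    if recognize_as == "phrase":
        return ["independent word/attached word dictionary (14e)"]
    if recognize_as == "word":
        return ["word dictionary (13d)", "word standard pattern memory (13c)"]
    raise ValueError(f"unknown utterance type: {recognize_as!r}")

for kind in ("phrase", "word"):
    print(kind, "->", registration_targets(kind))
```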

Likewise, if a word is in the independent word/attached word dictionary (14e) but not in the word dictionary (13d), and the user nevertheless wants it recognized by word recognition, the word and its word standard pattern must be registered in the word dictionary (13d) and the word standard pattern memory (13c).

The method of registering arbitrary words is described below.

There are two ways to register a word: registering its character string in the independent word/attached word dictionary (14e), or registering its word standard pattern in the word standard pattern memory (13c) and its character string in the word dictionary (13d).

To register a word in the independent word/attached word dictionary (14e), the user speaks the word to be registered into the machine.

The machine recognizes this speech in the speech recognition unit (1) and shows the recognition result on the display device (6). If the result is correct, the user presses a designated key on the keyboard (7), and the character string shown on the display device (6) is registered in the independent word/attached word dictionary (14e) as the uttered word. If the displayed recognition result is not correct, the user either corrects it with the machine's syllable correction function or utters the word again. If the result of the repeated utterance is also wrong, it is again corrected with the syllable correction function. These steps are repeated until the character string shown on the display device (6) matches the word to be registered.

To register a word in the word standard pattern memory (13c) and the word dictionary (13d), the character string to be registered is first brought up correctly on the display device (6), in the same way as for registration in the independent word/attached word dictionary (14e). The correctly recognized character string and the word standard pattern are then registered in the word dictionary (13d) and the word standard pattern memory (13c), respectively.

Given the current level of speech recognition technology, recognizing speech input with natural utterance is not feasible. Continuous syllable utterance input is the present limit, so an embodiment using continuous syllable utterance input is described below.

The procedure is the same for continuous syllable utterance input. In that case, however, the word standard pattern would also be a continuous-syllable pattern, so the word to be registered is uttered again naturally, the word standard pattern is created from the natural utterance, and the word standard pattern and character string are registered in the word standard pattern memory (13c) and the word dictionary (13d), respectively.

With the above operations, the data required for creating text by speech recognition has been registered.

(ii) Text creation

An embodiment of text creation is described below.

First, for recognition, the switch (13s1) of the word recognition unit (13) is set to connect the parameter buffer (13a) and the word determination unit (13b), and the switch (14s1) of the phrase recognition unit (14) is set to connect the parameter buffer (14a) and the syllable recognition unit (14b).

There are three methods of creating text.

The first is online recognition, in which the text to be created is spoken directly into the main body of the machine through the microphone (3).

The second is offline recognition, in which a recording/playback device (2) holding recorded text is connected to the machine and the recorded text is played back and recognized.

First, an embodiment of online recognition is described.

In online recognition, text uttered phrase by phrase or word by word is input directly to the machine from the microphone (3), so a predetermined operation makes the input switching unit (4) connect the microphone (3) to the speech recognition unit (1).

If the speech being input from the microphone (3) is also to be recorded on the recording/playback device (2), the recording/playback device (2) is connected to the main body and the input switching unit (4) connects the output of the microphone (3) to the recording terminal of the recording/playback device (2).

At the same time, when a silence detection signal arrives from the feature extraction unit (12), as described later, the input switching unit (4) causes a beep sound marking the phrase or word boundary to be recorded.

During speech recognition, both the word recognition unit (13) and the phrase recognition unit (14) are active.

Speech input from the microphone (3) is processed by the preprocessing unit (11) to give it characteristics suited to speech analysis (for example, when the sound pressure of the input speech is low, an amplifier boosts it) and is then sent to the feature extraction unit (12).
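The amplification mentioned as an example can be sketched as a simple peak-based gain. This is an assumption-laden illustration, not the circuit of Fig. 4: `adjust_level`, `target_peak`, and `max_gain` are hypothetical, and the text does not say how the preprocessing unit (11) decides on the gain.

```python
def adjust_level(samples, target_peak=0.5, max_gain=20.0):
    """Amplify quiet input so its peak reaches target_peak (never attenuate)."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0 or peak >= target_peak:
        return list(samples)          # silent or already loud enough
    gain = min(target_peak / peak, max_gain)
    return [x * gain for x in samples]

print(adjust_level([0.1, -0.25]))
print(adjust_level([0.9, -0.3]))
```

Capping the gain keeps near-silent input from being amplified without limit, which is a design choice of this sketch rather than something the text specifies.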

In the feature extraction unit (12), as shown in FIG. 5, the speech received from the preprocessing unit (11) is analyzed by the analysis section (12a) to extract features, which are stored in the parameter buffer (12c).

At the same time, the analysis-unit determination section (12b) of the feature extraction unit (12) uses the results from the analysis section (12a) to detect two things: the silent interval that follows each utterance made syllable by syllable or phrase by phrase, and the beep recorded after each phrase or word (described in the offline recognition embodiment below). When it detects a silent interval, it generates a silent-interval detection signal (b).

On receiving the silent-interval detection signal (b), the parameter buffer (12c) sends its stored feature parameters to the word recognition unit (13) and the phrase recognition unit (14), then clears its contents.
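The buffering behaviour of the parameter buffer (12c) can be sketched as follows. This is only an illustration under stated assumptions: the class name `ParameterBuffer`, the `sinks` callbacks standing in for the word recognition unit (13) and the phrase recognition unit (14), and the frame layout are all hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one frame of feature parameters (hypothetical layout)


@dataclass
class ParameterBuffer:
    """Holds feature frames until a silent-interval signal arrives."""
    sinks: List[Callable[[List[Frame]], None]]
    frames: List[Frame] = field(default_factory=list)

    def store(self, frame: Frame) -> None:
        # Called for every frame produced by the analysis section.
        self.frames.append(frame)

    def on_silence_detected(self) -> None:
        # Hand the same utterance to every recognizer, then clear the buffer.
        utterance = list(self.frames)
        for sink in self.sinks:
            sink(utterance)
        self.frames.clear()


# Stand-ins for the inputs of the word and phrase recognition units.
word_in: list = []
phrase_in: list = []

buf = ParameterBuffer(sinks=[word_in.append, phrase_in.append])
buf.store([0.1, 0.2])
buf.store([0.3, 0.4])
buf.on_silence_detected()
```

Both recognizers receive the identical frame sequence, which is what allows word recognition and phrase recognition to proceed in parallel on one utterance.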

The feature parameters input to the word recognition unit (13) are stored in the parameter buffer (13a) shown in FIG. 6. The word determination section (13b) compares the feature parameters stored in the parameter buffer (13a) against the word standard pattern memory (13c), selects from the word dictionary (13d) several words whose standard patterns have a high likelihood with respect to those parameters, and sends the character strings of the selected words, together with their likelihood values, to the candidate creation unit (15).

Meanwhile, the feature parameters input to the phrase recognition unit (14) are stored in the parameter buffer (14a). The syllable recognition section (14b) compares the stored feature parameters against the syllable standard pattern memory (14d), converts them into a syllable string, and sends that string to the phrase determination section (14c). The phrase determination section (14c) compares the incoming syllable string with the words registered in the independent-word/attached-word dictionary (14e), combines independent words and attached words to form several high-likelihood phrases, and sends the character strings of those phrases, together with their likelihood values, to the candidate creation unit (15).

The candidate creation unit (15) selects several high-likelihood entries from the character strings it receives and stores each one with its likelihood value and a code indicating whether it came from the word recognition unit (13) or the phrase recognition unit (14). At the same time, it sends the control unit (5) a signal to display the highest-likelihood character string on the display device. On receiving this signal, the control unit (5) appends the delimiter mark "▽" to the highest-likelihood string and displays it; for the input sentence of FIG. 14(a), for example, the display takes the form shown in FIG. 14(c). The candidate creation unit (15) also sends the control unit (5) a signal to store the contents held in the candidate creation unit (15) in the storage device (8). On receiving this signal, the control unit (5) stores the character strings held in the candidate creation unit (15) in the storage device (8), each followed by a code representing the delimiter. The character strings stored in this external storage device serve as the first draft for a word processor. A floppy disk is generally used, in which case the file format of the storage device (8) must match that of the word processor.

The signal generator (42) of the input switching unit (4), shown in FIG. 8, also receives this silent-interval detection signal, generates a beep marking the phrase or word boundary of the text, and feeds it to the switch (41). The switch (41) connects the circuit so that both the speech from the microphone (3) and the beep from the signal generator (42) are recorded on the recording/playback device (2). The beep is thereby recorded in each silent interval judged to be a phrase or word boundary of the recorded text.

Next, an embodiment of offline recognition is described.

In offline recognition, text is created by playing recorded speech from the recording/playback device (2) into this apparatus, so the text is first recorded on the recording/playback device (2).

Because speech is input from the recording/playback device (2), the input switching unit (4) connects the recording/playback device (2) to the speech recognition unit (1).

When recording text, the speaker utters it phrase by phrase or word by word, leaving a silent interval between phrases and words. When the dedicated recording/playback device (2) shown in FIG. 1 is used, a beep marking each boundary is recorded by pressing the delimiter key (71) provided on the recording/playback device (2) or on the dictating machine body, so that phrase and word boundaries are unambiguous.

Words that have been registered as words are uttered word by word. However, if the recording/playback device (2) has a character-tone generation function and has a character corresponding to the word to be input, that character tone may be recorded in place of speech.

In addition, to allow cueing to the start of each sentence and to prevent noise recorded between sentences from being misrecognized as speech, signals marking the start and end of each sentence are recorded together with the speech.

However, as noted in the description of speech registration, the way this signal is recorded depends on whether the recording/playback device (2) is of the multitrack type. FIG. 11 illustrates the multitrack case and FIG. 12 the single-track case. FIGS. 11(a) and 12(a) show a method in which the interval over which a tone such as a DTMF signal is recorded is detected as the speech region.

FIGS. 11(b) and 12(b) show a method in which a tone such as a DTMF signal is recorded before the sentence begins and again when it ends, and the interval between the two signals is detected as the speech region.

In the single-track case of FIG. 12, the speech and the DTMF or similar tone may overlap, so a DTMF or similar signal outside the speech band is used.

When a sentence is recognized, the intervals t4 and t5 before and after the recorded signal are also sampled and tested for speech, so the start of the sentence need not coincide with the start of the signal, nor the end of the sentence with the end of the signal. Recognition therefore still works when the timing of the utterance and the timing of the key press differ slightly.

Next, the recording/playback device (2) is connected to the main unit and the recorded speech is played back for recognition. Before recognition, one of three recognition modes is selected: the fast-listening recognition mode, which raises the playback speed of the recorded speech to shorten recognition time; the mode that recognizes at normal playback speed; or the twice-playback recognition mode, for use when time allows and a high recognition rate is needed.

First, an embodiment of the fast-listening recognition mode is described.

In the fast-listening recognition mode the recorded speech is played back faster, so the characteristics of the input speech differ from those of the standard patterns, which were created from registration speech played back at normal speed. Simply feeding in sped-up speech therefore does not give accurate recognition.

To recognize sped-up speech accurately, the sampling frequency is changed. An embodiment of this method follows.

The sampling frequency control section (12d) of the feature extraction unit (12) in FIG. 5 sets the sampling frequency for the input speech to (playback speed / recording speed) times the sampling frequency used when the standard speech patterns were created, and the speech is sampled and analyzed at that rate. Processing downstream of the feature extraction unit (12) is the same as in the online recognition embodiment, with one difference. When the text played from the recording/playback device (2) already contains beeps marking phrase and word boundaries and the feature extraction unit (12) detects such a beep, it generates a beep detection signal (b') instead of the silent-interval detection signal (b). When the received signal is the beep detection signal (b') rather than the silent-interval detection signal (b), the signal generator (42) of the input switching unit (4) does not generate a boundary beep.
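The sampling-frequency rule for fast-listening recognition amounts to a single scaling step. The sketch below is illustrative only: the function name and the 8 kHz figure are assumptions, not values from the specification.

```python
def recognition_sampling_rate(standard_rate_hz: float,
                              playback_speed: float,
                              recording_speed: float) -> float:
    """Scale the sampling rate by (playback speed / recording speed).

    standard_rate_hz is the rate used when the standard patterns were made.
    Speeds may be in any unit as long as both use the same one.
    """
    if playback_speed <= 0 or recording_speed <= 0:
        raise ValueError("speeds must be positive")
    return standard_rate_hz * (playback_speed / recording_speed)


# Hypothetical numbers: patterns made at 8 kHz, tape played at double speed.
double_speed_rate = recognition_sampling_rate(8000, 2.0, 1.0)
```

Sampling faster by the same ratio as the speed-up means each sample again covers the same stretch of the original utterance, so the analyzed features line up with the normal-speed standard patterns.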

When the speech recognition unit (1) recognizes a character tone representing a word, it outputs the word corresponding to that character tone as the recognition result.

Next, an embodiment of the twice-playback recognition mode is described.

In this mode, the recorded speech is first played back and input to the apparatus.

During this first pass, the preprocessing unit (11) of the speech recognition unit (1) reads the entire sound-pressure variation of the recorded speech and stores it in the sound-pressure variation memory (11b) shown in FIG. 4. The recorded speech is then played back and input a second time. On this pass, the preprocessing unit (11) uses the data in the sound-pressure variation memory (11b) to adjust the amplification factor of the AGC circuit (11a) so that, as shown in FIG. 18, the sound pressure delivered to the feature extraction unit (12) matches the level best suited to speech recognition. That is, the gain G is the fixed gain A multiplied by a control voltage V, which is adjusted variably.
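A minimal sketch of the two-pass idea, assuming frame-wise RMS as the stored "sound-pressure variation" and a per-frame control value chosen so that the product of fixed gain and control value brings each frame to a target level. The function names, the RMS measure, and all numeric values are assumptions made for illustration; the specification does not state how the control voltage is derived.

```python
import math
from typing import List


def measure_levels(frames: List[List[float]]) -> List[float]:
    """First pass: record the RMS level of every frame."""
    return [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]


def apply_gain(frames: List[List[float]], levels: List[float],
               target_level: float, fixed_gain: float = 2.0,
               floor: float = 1e-6) -> List[List[float]]:
    """Second pass: scale each frame by G = A * V using the stored levels."""
    out = []
    for frame, level in zip(frames, levels):
        # Control value V picked so that A * V * level == target_level.
        control = target_level / (fixed_gain * max(level, floor))
        gain = fixed_gain * control
        out.append([s * gain for s in frame])
    return out


frames = [[0.1, -0.1], [0.5, -0.5]]      # quiet frame, loud frame
levels = measure_levels(frames)           # pass 1
leveled = apply_gain(frames, levels, target_level=0.25)  # pass 2
```

Because the levels are known for the whole recording before the second pass starts, the gain can be set without the lag of a conventional feedback AGC. A practical version would presumably cap the gain on near-silent frames so that background noise is not amplified.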

As another embodiment of the twice-playback recognition mode, a multiple-playback recognition mode is also possible. The recorded text is played back and input many times, the recognition method in the speech recognition unit (1) is changed on each pass, the results are compared, and the one with the highest likelihood is selected.
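The selection step of the multiple-playback mode can be sketched as below. The recognizers here are stub functions returning fixed (text, likelihood) pairs; their names and values are hypothetical and only stand in for different recognition methods applied on successive playbacks.

```python
from typing import Callable, List, Tuple

Result = Tuple[str, float]  # (recognized text, likelihood)


def recognize_multi_pass(recording: object,
                         recognizers: List[Callable[[object], Result]]) -> Result:
    """Run one recognition method per playback and keep the likeliest result."""
    results = [recognize(recording) for recognize in recognizers]
    return max(results, key=lambda r: r[1])


# Hypothetical stubs standing in for three recognition methods.
def method_a(_rec: object) -> Result:
    return ("おんけいで", 0.62)


def method_b(_rec: object) -> Result:
    return ("おんせいで", 0.90)


def method_c(_rec: object) -> Result:
    return ("おんせいて", 0.71)


best = recognize_multi_pass(None, [method_a, method_b, method_c])
```

This assumes the likelihood values produced by the different methods are comparable on one scale, which the specification does not discuss.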

The frequency-characteristic correction function described below is used in the following cases: when no registration speech has been recorded on the recording/playback device (2) and the device's frequency characteristic at increased playback speed differs from that at normal speed; when the text to be recognized was recorded on a recording/playback device (2) whose frequency characteristic differs from that of the device used to create the standard speech patterns; or when the text was recorded on a device that nominally has the same frequency characteristic as the one used to create the standard patterns but whose actual characteristic differs because of component tolerances and the like.

First, a reference sine-wave signal, which serves as the reference for measuring the frequency characteristic of the recording/playback device (2), is generated by the reference signal generator (42) and recorded on the recording/playback device (2). The recorded reference signal is then played back into the apparatus. The speech recognition unit (1) analyzes it, determines the difference in frequency characteristic between the recorded reference signal and the signal as generated by the reference signal generator (42), and applies a correction that reduces this difference. Many means of correction are possible, depending on the feature extraction method of the feature extraction unit (12). For example, with the analog filter bank of FIG. 13, which consists of parallel branches each made up of a band-pass filter (BPF) in series with an amplifier (AMP), the amplification factor of each amplifier (AMP) is adjusted so that the filter outputs reduce the difference from the generated reference signal. If the feature extraction unit (12) uses digital filters, the parameters that determine the digital filter characteristics are changed instead. Other methods are possible according to the feature extraction method used.
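For the filter-bank case, the correction reduces to one gain per band. The sketch below assumes that the reference tone's level in each band is known both as generated and as played back from the recorder; the function names and the three-band example values are hypothetical.

```python
from typing import List


def band_corrections(reference_levels: List[float],
                     measured_levels: List[float],
                     floor: float = 1e-9) -> List[float]:
    """Gain per band that maps the played-back level onto the generated one."""
    return [ref / max(meas, floor)
            for ref, meas in zip(reference_levels, measured_levels)]


def correct(band_outputs: List[float], gains: List[float]) -> List[float]:
    """Apply the per-band gains to one frame of filter-bank outputs."""
    return [x * g for x, g in zip(band_outputs, gains)]


# Hypothetical three-band example: the recorder attenuates the upper bands.
generated = [1.0, 1.0, 1.0]
played_back = [1.0, 0.5, 0.25]
gains = band_corrections(generated, played_back)
restored = correct(played_back, gains)
```

Once computed from the reference tone, the same gains would be applied to every frame of speech played back from that recorder.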

The operations described so far convert the voice-input text into a kana string. How to correct this kana string when it differs from the intended text is described below with reference to FIG. 14, case by case according to the type of error.

FIG. 14(a) shows the input sentence, (b) the input speech, (c) the recognition result, (d) to (h) the correction process, and (i) the corrected result.

First, correction when something uttered as a word has been misrecognized as a phrase. When, as shown in FIG. 14(c), an utterance made as a word is recognized as the phrase "しー", first move the cursor (X) to the erroneous word [FIG. 14(d) i]. Then press the next-word-candidate key (72) to display the next word candidate [FIG. 14(d) ii]. If the result is correct, proceed to the next correction; if it is wrong, press the next-word-candidate key (72) again to display the following candidate. Repeat until the correct result appears.

Next, correction when something uttered as a phrase has been misrecognized as a word. When an utterance made as the phrase "い" is recognized as the word "E", first move the cursor (X) to the erroneous phrase. Then press the next-phrase-candidate key (73) to display the next phrase candidate. If the result is correct, proceed to the next correction; if it is wrong, press the next-phrase-candidate key (73) again to display the following candidate. Repeat until the correct result appears.

Pressing the previous-word-candidate key (74) displays the preceding word candidate, and pressing the previous-phrase-candidate key (75) displays the preceding phrase candidate.

If neither of the two correction methods above yields the correct result, correct at the syllable level, or re-utter the word, phrase, or syllable.

To prevent a phrase from again being recognized as a word, or a word as a phrase, during re-utterance, the candidate creation unit (15) can be controlled externally so that it accepts only the results sent from the word recognition unit (13) and ignores those sent from the phrase recognition unit (14).

Likewise, the candidate creation unit (15) can be controlled externally so that it accepts only the results sent from the phrase recognition unit (14) and ignores those sent from the word recognition unit (13).

The next-candidate keys mentioned above have the functions described below, explained with reference to FIG. 15.

As stated earlier, word recognition and phrase recognition run in parallel in the speech recognition unit (1), and both sets of results are obtained. The next-phrase-candidate key (73) displays the phrase recognition results on the display device (6) in descending order of likelihood, and the next-word-candidate key (72) does the same for the word recognition results. The previous-word-candidate key and the previous-phrase-candidate key display the result ranked one step higher in likelihood than the one currently shown on the display device (6).

FIG. 15 shows the candidate buffer (15a) of the candidate creation unit (15). In this figure the first-ranked result is "たんご", marked (word) to show that it came from the word recognition unit (13). Likewise, the second-ranked result is "たんごを", marked (phrase) to show that it came from the phrase recognition unit (14); the third-ranked result is "たんごに", also marked (phrase); and the fourth-ranked result is "たんこう", marked (word) to show that it came from the word recognition unit (13).

Suppose "たんご" is now shown on the display device (6). Pressing the next-phrase-candidate key (73) in this state displays "たんごを", and pressing the next-word-candidate key (72) displays "たんこう".

When "たんこう" is shown on the display device (6), pressing the previous-word-candidate key (74) displays "たんご", and pressing the previous-phrase-candidate key (75) displays "たんごに".
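The key behaviour described for FIG. 15 can be reproduced with a ranked list and a cursor. In the sketch below the class and field names are hypothetical, and the likelihood values are invented solely to give the four entries of FIG. 15 their stated order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    text: str
    source: str        # "word" or "phrase": which recognizer produced it
    likelihood: float


class CandidateBuffer:
    """Ranked candidates with next/previous stepping restricted by source."""

    def __init__(self, candidates: List[Candidate]) -> None:
        self.items = sorted(candidates, key=lambda c: c.likelihood, reverse=True)
        self.pos = 0  # index of the candidate currently displayed

    @property
    def shown(self) -> str:
        return self.items[self.pos].text

    def _step(self, source: str, direction: int) -> str:
        # Walk up or down the ranking until a candidate of the wanted kind
        # is found; stay put if there is none.
        i = self.pos + direction
        while 0 <= i < len(self.items):
            if self.items[i].source == source:
                self.pos = i
                break
            i += direction
        return self.shown

    def next(self, source: str) -> str:
        return self._step(source, +1)

    def prev(self, source: str) -> str:
        return self._step(source, -1)


FIG15 = [
    Candidate("たんご", "word", 0.9),
    Candidate("たんごを", "phrase", 0.8),
    Candidate("たんごに", "phrase", 0.7),
    Candidate("たんこう", "word", 0.6),
]
buf = CandidateBuffer(FIG15)
```

With "たんご" displayed, `buf.next("phrase")` moves to "たんごを" and `buf.next("word")` from the top moves to "たんこう", matching the behaviour of keys (73) and (72).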

Next, batch correction of an entire word is described.

The example of FIG. 14(e) is one in which the word "T" was misrecognized as "A". First move the cursor to the word to be corrected [FIG. 14(e) i].
This is a well-known example. First, move the cursor to the word you want to correct [see (e) i in the same figure.

Then press the next-word-candidate key (72) to display the next word candidate [FIG. 14(e) ii]. If the result is correct, proceed to the next correction. If it is wrong, press the next-word-candidate key (72) again to display the following candidate, and repeat until the correct result appears. If the correct result never appears, re-utter the word to input it again. Pressing the previous-word-candidate key (74) redisplays the previously shown word candidate.

Next, batch correction of an entire phrase is described.

The example of FIG. 14(f) is one in which the phrase "がめんの" was misrecognized as "がいねんの". First move the cursor to the phrase to be corrected [FIG. 14(f) i].

Then press the next-phrase-candidate key (73) to display the next phrase candidate [FIG. 14(f) ii]. If the result is correct, proceed to the next correction; if it is wrong, press the next-phrase-candidate key (73) again to display the following candidate. Repeat until the correct result appears. If the correct result never appears, re-utter the phrase to input it again. Pressing the previous-phrase-candidate key (75) redisplays the previously shown phrase candidate.

Next, syllable-level correction is described.

The example of FIG. 14(h) is one in which the phrase "おんせいで" was misrecognized as "おんけいで". Here the syllable "け" is to be corrected to "せ". First move the cursor (X) to the syllable "け" [FIG. 14(h) i] and press the next-syllable-candidate key (76). This displays the syllable closest in distance to the syllable being corrected [FIG. 14(h) ii]. If the correct syllable appears, move on to the next correction. If not, press the next-syllable-candidate key again to display the following syllable candidate.

Repeat until the correct result appears. If the correct result never appears, re-utter to input it again. If the re-input result is wrong, correct it again by the procedure above, repeating until the correct result appears.

Pressing the previous-syllable-candidate key (77) displays the preceding syllable candidate.

To delete a syllable, move the cursor to that syllable and press the delete key (78).

To insert a syllable, move the cursor to the syllable at the point to be corrected and press the insert key (79).

Next, the multi-syllable correction method is described with reference to FIG. 16.

In this example, the input sentence "かいじょう" of FIG. 16(a) was misrecognized as "がんじょう" of FIG. 16(b). First move the cursor (X) to the syllables to be corrected [FIG. 16(c)] and re-utter "かい". The re-uttered speech is recognized by the speech recognition unit (1) and the result is shown on the display device (6). If the result is correct, proceed to the next correction. If, as in FIG. 16(d), "かい" is misrecognized as "かえ", press the next-word-candidate key (72) for a word or the next-phrase-candidate key (73) for a phrase. FIG. 16 is a word example, so the word procedure is described here. When the next-word-candidate key (72) is pressed in the state of FIG. 16(d), the control unit (5) first compares the pre-correction result "がんじょう" of FIG. 16(b) with the post-re-utterance result "かえじょう" of FIG. 16(d) and finds the common part "じょう". It then selects from the word dictionary (13d) the words containing this common part. FIG. 16(f) shows the contents of the word dictionary (13d), and FIG. 16(g) shows the words containing "じょう" selected from it. The control unit (5) then computes the likelihood between each word in FIG. 16(g) and the post-re-utterance result "かえじょう", and displays the word with the highest likelihood [FIG. 16(e)].
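The three steps of this search (find the common part, filter the dictionary, rank by closeness to the re-utterance result) can be sketched as follows. Two substitutions are made for illustration and should be read as assumptions: the common part is taken as the longest common substring, and character-level edit distance stands in for the acoustic likelihood that the apparatus would actually compute. The dictionary contents are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def common_part(before: str, after: str) -> str:
    """Longest substring shared by the two recognition results."""
    matcher = SequenceMatcher(None, before, after, autojunk=False)
    m = matcher.find_longest_match(0, len(before), 0, len(after))
    return before[m.a:m.a + m.size]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as a stand-in for (inverse) likelihood."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def best_candidate(before: str, after: str,
                   dictionary: List[str]) -> Optional[str]:
    part = common_part(before, after)
    pool = [w for w in dictionary if part and part in w]
    if not pool:
        return None
    return min(pool, key=lambda w: edit_distance(w, after))


# Hypothetical dictionary contents.
DICTIONARY = ["かいぎ", "かいじょう", "こうじょう", "にゅうじょう"]
choice = best_candidate("がんじょう", "かえじょう", DICTIONARY)
```

Restricting the search to words that contain the part both recognitions agree on keeps the candidate pool small before any likelihood is computed.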

Next, correction of recognition-boundary errors in phrases or words is described.

The example of FIG. 14(g) is one in which a silent interval, indicated by a mark in the figure, was wrongly detected between "ん" and "し" of the phrase "ぶんしょうを", so that it was split in two and misrecognized as the word "ぶん" and the phrase "しょうを".

In this case the recognition-boundary error must be corrected. To delete a recognition-boundary delimiter, move the cursor (X) to the delimiter [FIG. 14(g) i] and press the delete key (78) [FIG. 14(g) ii]. To insert a recognition-boundary delimiter, move the cursor (X) to the syllable at the insertion position and press the insert key (79).

However, as described later, the boundary beeps on the recording/playback device (2) and the delimiters attached to the recognition results in the storage device (8) serve as markers for synchronizing the recording/playback device (2) with the storage device (8), so their correspondence must be preserved. The fact that a delimiter has been inserted or deleted is therefore recorded in the storage device (8).

For example, suppose the sentence shown in Fig. 14(g) is stored in the storage device (8) as shown in that figure. If the sentence is corrected as shown in Fig. 14(g), the delimiter "マ" that was stored in the storage device (8) is replaced by the symbol "v". The symbol "v" indicates that a delimiter "マ" has been deleted. It is not used as a symbol marking a recognition unit; it is used only for control in conjunction with the recording/playback device (2) and the like.

With this configuration, even after the delimiter "マ" has been deleted, the two devices can be controlled in synchronization by using the beeps recorded on the recording/playback device (2) and the symbol "v" stored in the storage device (8).

The above is an example in which the delimiter "マ" is deleted, but the same approach applies when a delimiter is inserted. That is, a specific symbol that is not used as a control signal and only indicates a boundary is inserted in place of the delimiter "マ".
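The bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions, not the circuit of the patent: the stored text is modeled as a list of tokens, `DELIM` and `DELETED` stand in for the delimiter and the "delimiter deleted" symbol, and `INSERTED` is a hypothetical name for the display-only boundary symbol. All function names are invented for this example.

```python
DELIM = "マ"     # stand-in for the recognition-unit delimiter (paired with a beep)
DELETED = "v"    # stand-in for the "delimiter deleted" symbol (still paired with a beep)
INSERTED = "|"   # hypothetical display-only boundary (no beep on the recording)


def delete_delimiter(tokens, index):
    """Replace the delimiter at `index` with DELETED instead of removing it."""
    if tokens[index] != DELIM:
        raise ValueError("no delimiter at this position")
    out = list(tokens)
    out[index] = DELETED
    return out


def insert_boundary(tokens, index):
    """Insert a boundary that splits units but is never used for control."""
    out = list(tokens)
    out.insert(index, INSERTED)
    return out


def recognition_units(tokens):
    """Split the text into recognition units; DELETED does not split."""
    units, current = [], []
    for tok in tokens:
        if tok in (DELIM, INSERTED):
            if current:
                units.append("".join(current))
                current = []
        elif tok != DELETED:
            current.append(tok)
    if current:
        units.append("".join(current))
    return units


def sync_mark_count(tokens):
    """Number of marks that correspond one-to-one with beeps on the recording."""
    return sum(tok in (DELIM, DELETED) for tok in tokens)
```

For instance, deleting the delimiter in `["bun", DELIM, "shouwo", DELIM]` merges the two units into one while the number of marks that pair with beeps stays the same, which is the property the synchronization relies on.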

By the above correction procedure, the sentence is corrected as shown in Fig. 14(i).

After the recognition-boundary errors have been corrected, the affected recognition units are themselves corrected according to the correction procedure. When correcting by re-utterance, the voice of anyone who has registered standard patterns can be recognized, so the correction can be carried out by someone other than the person who recorded the text.

The method for correcting kana-string text has been described above. The following functions are provided to assist correction.

By moving the cursor over the character string shown on the display device (6) and scrolling the display screen, the stored text can be displayed successively from the storage device (8). While this is done, the speech corresponding to the portion shown on the screen is played back from the recording/playback device (2).

The system also has the reverse function: the character string corresponding to the portion being played back from the recording/playback device (2) is displayed on the display device (6).

In both cases, the delimiter sounds recorded in the recorded text and the delimiters stored on the display side are used as timing signals, so that playback on the recording/playback device (2) and the display operate in synchronization with each other. When a signal to stop playback is input from the keyboard (7) or from the recording/playback device (2), playback is stopped, and the scrolling of the display or the movement of the cursor is stopped as well.

This synchronization between playback on the recording/playback device (2) and the display allows the character string to be checked while listening to the playback, which makes it easier to find the places that need correction.

Two methods of synchronization are possible here. In one, the character string in the storage device (8) that corresponds to the portion being played back is displayed on the display device (6). In the other, the kana string displayed on the display device (6) lags one delimiter behind the portion being played back.

In this case, by the time the display is stopped for correction, the portion of the recorded speech to be corrected has already been played back, so playing it again requires cueing the recording to the start of that portion. When this method is adopted, the recording/playback device (2) is therefore given a function that automatically backtracks to the previous delimiter when the display is stopped.

In addition, when a tape recorder is used as the recording/playback device (2), the playback position is controlled by the rotation of a motor, and factors such as tape slack may prevent accurate cueing to the portion to be corrected.

In such a case, the input speech is stored for a fixed length of time by PCM or ADPCM recording, and a function is added for listening back to that PCM or ADPCM recording whenever the input speech needs to be heard again.

Fig. 17 shows one embodiment of the above function: a PCM data memory that stores the PCM recording data. The numbers 01 to 05 in the figure denote addresses. The input speech is the sentence shown in Fig. 14, "watashiwa | ten | shi- | a-ru | te- | kamenno | bunshouwo | ten | onseide | shuuseishita | maru", where "|" stands for the delimiter.

When this speech is input, the PCM data memory (DM) stores the speech up to the first silent interval, "watashiwa", at address 01. The speech up to the second silent interval, "ten", is stored at address 02, and so on, until the speech up to the fifth silent interval, "te-", is stored at address 05. At this time the PCM address pointer (AP) holds the address of the data that was stored earliest among the data in the PCM data memory. In this example, 01 is held.

At this stage the PCM data memory is full.

When the next speech is input, it is stored at the address of the data that was stored earliest among the data in the PCM data memory (DM). In this example, "kamenno" is stored at address 01, where "watashiwa" had been stored. The PCM address pointer (AP) then holds the address of the data that is now the earliest stored in the PCM data memory (DM). In this example, 02 is held.

To play back the contents of the PCM data memory (DM) in this state, playback starts from the address indicated by the PCM address pointer (AP). In this example, playback proceeds in the order 02, 03, 04, 05, 01.

This method makes it possible to listen back to the speech accurately and quickly, as many times as desired.

In addition, when the cursor (X) is moved onto a recognition-unit delimiter on the screen and the recorded-speech cue key (70) is pressed, the system searches the recorded text for the delimiter sound on the recording/playback device (2) side that corresponds to the recognition unit indicated by the cursor, and plays back the text that follows it. An embodiment of this function is described below.

To check the recognized text, the recognition results are read from the storage device (8) and displayed on the display device (6) from the beginning.

At this time, the delimiter counter (5a) of the control unit (5), shown in Fig. 19, counts the number of delimiters read from the storage device (8). If the recognition result that has been read out is incorrect, the operator places the cursor on the incorrect portion and presses the cue key. The control unit (5) then causes the recording/playback device (2) to play back the recorded text in fast-forward playback mode. The beep counter (12a) of the feature extraction unit (12) counts the beeps that mark the boundaries in the text input from the recording/playback device (2).

When the value of the beep counter (12a) becomes one less than the value of the delimiter counter (5a) described above, the comparison circuit (5b) sends a signal (ハ) to the recording/playback device (2) to stop playback.
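The cueing logic built from the two counters and the comparison circuit can be sketched as follows. This is an illustration under assumptions that the text does not spell out: the recording is modeled as a list in which `BEEP` marks a delimiter beep, fast-forward playback starts from the beginning of the recording, and the delimiter count includes the delimiter that closes the unit under the cursor. The names `BEEP` and `find_cue_position` are hypothetical.

```python
BEEP = "<beep>"  # hypothetical marker for a delimiter beep on the recording


def find_cue_position(tape, delimiter_count):
    """Return the tape index at which fast-forward playback is stopped.

    `delimiter_count` plays the role of the delimiter counter (5a).
    Playback stops as soon as the number of beeps heard, the role of the
    beep counter (12a), equals delimiter_count - 1.
    """
    target = delimiter_count - 1
    if target <= 0:
        return 0  # the cursor is in the first unit: stay at the start
    beeps = 0
    for pos, item in enumerate(tape):
        if item == BEEP:
            beeps += 1
            if beeps == target:  # the comparison made by circuit (5b)
                return pos + 1   # stop just after this beep
    raise ValueError("recording contains fewer beeps than expected")
```

With `tape = ["watashiwa", BEEP, "ten", BEEP, "shi-", BEEP]` and the cursor in the third unit (a delimiter count of 3), playback stops after the second beep, so normal playback resumes at "shi-". Stopping one beep early is what places the recording at the head of the unit to be checked.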

Furthermore, to check the recognition results and the corrected text, the data in the storage device (8) must be displayed as character strings on the display device (6), and the operator must follow and read them by eye, which is very tiring for the eyes.

In view of this, the device is given a function that uses speech synthesis to read aloud the character strings in the storage device (8) in which the recognition results are stored, so that the recognition results and the corrected text can be checked by listening to the synthesized speech.

Here too, the delimiters are used as timing signals for synchronizing the speech synthesis unit (9), the storage device (8), the recording/playback device (2), and the display device (6).

That is, the character string corresponding to the portion that the speech synthesis unit (9) is reading aloud from the storage device (8) is displayed on the display device (6), and at the same time the recording/playback device (2) is cued to the corresponding recorded portion. As a result, when an error is found during read-back by synthesized speech and the read-back is stopped for correction, both the display on the display device (6) and the recorded portion on the recording/playback device (2) point to the erroneous portion, so the correction can be made immediately.

Two methods of synchronization are possible here. In the first, the kana string in the storage device that corresponds to the portion being read aloud by the speech synthesis function is displayed on the display device (6) while the corresponding syllable portion of the text recorded on the recording/playback device (2) is played back. In the second, the recording/playback device (2) plays back the portion of the recorded text that lags one delimiter behind the portion being read aloud by the speech synthesis function.

In the second case, when speech synthesis is stopped for correction, the recording/playback device (2) has stopped just before the portion to be corrected, so resuming playback from this state immediately reproduces the speech of that portion. In the first case, by the time speech synthesis is stopped for correction, the portion of the recorded speech to be corrected has already been played back, so the recording must be backtracked before that portion can be played again. When the first method is adopted, it is therefore preferable to give the recording/playback device (2) a function that automatically backtracks to the previous delimiter when the display is stopped.

The embodiments described so far store the recognition results in the storage device (8). As another embodiment, a method of storing the recognition results on the recording/playback device (2) is described below.

The recognition results stored in the storage device (8) are recorded on the recording/playback device (2) on which the original text was recorded. Because the original text and the recognition results are then held on the same recording medium, managing them becomes easier.

Alternatively, by recording the recognition results on the recording/playback device (2) while the recorded text is being played back and input, an external storage device becomes unnecessary.

In either case, using a multitrack recording/playback device (2) allows the recognition results to be stored on a track that holds no speech while the recorded speech is being played back.

(g) Effects of the invention
A standard pattern created from speech recorded on a recording/playback device differs from a standard pattern created from speech input directly from a microphone by the frequency characteristics of the recording/playback device.

Therefore, recognizing recorded speech requires standard patterns created from recorded speech, and recognizing speech input directly from a microphone requires standard patterns created from direct microphone input. The speech recognition device must consequently hold both sets of standard patterns. With the system of the present invention, both the standard patterns created from direct microphone input and those created from recorded speech can be produced by a single speech registration operation, so the registrant needs to utter the speech only half as many times, which reduces the burden of registration.

[Brief description of the drawings]

Fig. 1 is an external view of a dictating machine that employs the speech recognition system of the present invention; Fig. 2 is a configuration diagram of the dictating machine; Fig. 3 is a configuration diagram of the speech recognition unit (1); Fig. 4 is a configuration diagram of the preprocessing unit (11); Fig. 5 is a configuration diagram of the feature extraction unit (12); Fig. 6 is a configuration diagram of the word recognition unit (13); Fig. 7 is a configuration diagram of the phrase recognition unit (14); Fig. 8 is a configuration diagram of the input switching unit (4); Fig. 9 is a diagram showing the relationship among headwords, recording methods, and character sounds; Fig. 10 is a diagram showing the relationship between the character-sound recording method and speech intervals; Fig. 11 is a diagram showing the recording method when the recording/playback device is of the multitrack type; Fig. 12 is a diagram showing the recording method when the recording/playback device is of the single-track type; Fig. 13 is a diagram showing an example of a frequency correction circuit; Fig. 14 is a diagram showing corrections made upon misrecognition; Fig. 15 is a diagram showing the candidate buffer (15a) in the candidate creation unit (15); Fig. 16 is a diagram showing an example of correcting several syllables upon misrecognition; Fig. 17 is an explanatory diagram of the PCM recording method; Fig. 18 is an explanatory diagram of AGC operation; and Fig. 19 is an explanatory diagram of the delimiter counter.

(1): speech recognition unit, (2): recording/playback device, (3): microphone, (6): display device, (7): keyboard, (8): storage device, (11): preprocessing unit, (12): feature extraction unit, (13): word recognition unit, (14): phrase recognition unit, (11a): variable gain amplifier, (11b): sound pressure variation memory.

Claims

[Claims]

(1) A speech recognition system comprising: a recording/playback device capable of recording and playing back speech input from a microphone; and a speech recognition device that creates standard patterns from speech input directly from the microphone and from playback speech obtained from the recording/playback device, and stores them as standard patterns; the system being characterized in that, when registration speech is input from the microphone to the speech recognition device to create a standard pattern, the registration speech is also recorded on the recording/playback device, and after registration from the microphone input has been completed, the recorded speech is played back into the speech recognition device and registered, so that a standard pattern of speech input from the recording/playback device is obtained in addition to the standard pattern of speech input from the microphone.