JP2647873B2 - Writing system

Info

Publication number: JP2647873B2
Application number: JP62320214A
Authority: JP
Inventors: 正幸飯田; 宏樹大西; 計美大倉
Original assignee: Sanyo Denki Co Ltd
Current assignee: Sanyo Denki Co Ltd
Priority date: 1987-12-17
Filing date: 1987-12-17
Publication date: 1997-08-27
Anticipated expiration: 2012-08-27
Also published as: JPH01161431A

Description

DETAILED DESCRIPTION OF THE INVENTION
(a) Field of Industrial Application
The present invention relates to an apparatus that creates text by speech recognition.

(b) Prior Art
Text creation systems are being developed in which speech is recorded on a recording/reproducing device such as a tape recorder, and the reproduced speech output on playback is input to a speech recognition device, which recognizes it and converts it into text (Japanese Patent Laid-Open No. 58-158736).

Conventionally, in an apparatus such as a text creation system that combines a recording/reproducing device, a speech synthesis device, a speech recognition device, and a display device, the part that reproduces speech from the recording/reproducing device and displays data read from the storage device on the display device has operated independently of the read-back part based on speech synthesis.

In such an apparatus, if, for example, an error is found while the created text is being checked and the user wants to listen again to the portion of the recording that corresponds to the error, the user has had to replay the text recorded on the recording/reproducing device from the beginning and find the relevant portion.

(c) Problems to Be Solved by the Invention
Conventionally, when text created on a word processor or the like is read aloud by a speech synthesis function, an error in the text is found, and the corresponding portion must be located on the recording/reproducing device, the user has had to either replay and listen to the recorded speech or use a cue function to find the erroneous portion. In either case, after an error in the text is found with the speech-synthesis read-back function, locating and reproducing the corresponding portion on the recording/reproducing device has required cumbersome operations.

In view of this, the present invention allows the cumbersome operation of locating and reproducing the corresponding portion on the recording/reproducing device, after an error in the text has been found with the speech-synthesis read-back function, to be carried out by an extremely simple operation.

(d) Means for Solving the Problems
The text creation system of the present invention has a speech recognition function that, by treating the silent sections recorded on the recording/reproducing device as boundaries between recognition units, recognizes the input speech reproduced from the recording/reproducing device in units of phrases, syllables, or words. When a silent section is detected, a silent-section detection signal indicating the detection is recorded in that silent section, and the speech recognition result is stored in a storage device with a delimiter signal attached to each recognition unit. The silent-section detection signals and the delimiter symbols correspond one to one, and both are used to synchronize the display output and the speech synthesis output among the text displayed on the display device according to the recognition result, the text speech synthesized by the speech synthesis unit according to the recognition result, and the text speech recorded on the recording/reproducing device. In such a system, the invention is characterized in that it has a function of reproducing, from the recording/reproducing device, the text speech synchronized with the portion being synthesized, and in that, when the synthesized speech is stopped, for example to correct a recognition result, the playback position of the recording/reproducing device moves back by a predetermined number of silent-section detection signals.
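The one-to-one correspondence between silent-section detection signals on the tape and delimiters in the stored text can be pictured with a short sketch. It is an illustration only, under assumptions that are not in the specification: the tape is modelled as a sorted list of marker times in seconds, and the function name `align_units` is hypothetical.

```python
def align_units(units, marker_times):
    """Pair each recognized unit with the tape span that produced it.

    units: recognized phrases or words in order, one per delimiter.
    marker_times: times (s) of the silent-section detection signals,
        assumed sorted; marker i sits in the silence after unit i.
    """
    if len(units) != len(marker_times):
        raise ValueError("markers and delimiters must correspond one to one")
    spans = []
    start = 0.0  # assumed: the recording starts at time zero
    for unit, end in zip(units, marker_times):
        spans.append((unit, start, end))
        start = end  # the next unit begins where this marker was recorded
    return spans
```

Given such a table, the display, the synthesizer, and the tape can each be positioned on the same unit index, which is the kind of synchronization this section describes.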

(e) Operation
With the system of the present invention configured in this way, if, for example, an error is found during read-back by speech synthesis and the synthesis function is stopped, the recording/reproducing device moves back from its current position by a predetermined number of silent-section detection signals and stops. That is, because the recording/reproducing device is then positioned that many silent-section detection signals behind the text synthesized by the synthesis unit, it has returned to a point before the portion corresponding to where the speech synthesis output was stopped, and the erroneous portion has been cued.
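The rewind-on-stop behavior can be sketched as follows. This is a minimal illustration and not the specification's implementation: marker positions are assumed to be available as a sorted list of times in seconds, and `rewind_position` and `rewind_markers` are hypothetical names for the operation and for the "predetermined number" of silent-section detection signals.

```python
from bisect import bisect_left

def rewind_position(marker_times, current_time, rewind_markers=2):
    """Return the tape position after moving back past `rewind_markers`
    silent-section markers from `current_time`.

    marker_times: sorted times (s) of the recorded detection signals.
    """
    # Number of markers strictly before the current position.
    passed = bisect_left(marker_times, current_time)
    target = passed - rewind_markers
    if target < 0:
        return 0.0  # fewer markers than requested: stop at the tape start
    return marker_times[target]
```

With markers at 1.0, 2.5, 4.0, and 6.0 s and synthesis stopped at 5.0 s, moving back by two markers leaves the tape at 2.5 s, ahead of the unit in which the error was heard.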

(f) Embodiment
FIG. 1 shows an external view of a dictating machine that employs the present invention to create text by speech input, and FIG. 2 shows a functional block diagram of the machine.

In FIG. 2, (1) is a speech recognition unit implemented as circuitry inside the main body (100) of FIG. 1. As detailed in the block diagram of FIG. 3, it consists of a preprocessing unit (11) [FIG. 4] that adjusts the input speech signal; a feature extraction unit (12) [FIG. 5] that extracts parameters representing acoustic features from the sound-pressure-adjusted speech signal supplied by the preprocessing unit (11); a word recognition unit (13) [FIG. 6], which performs word recognition of the input speech based on the feature parameters obtained from the extraction unit (12), and a phrase recognition unit (14) [FIG. 7]; and a candidate creation unit (15) that creates candidates for recognized word character strings or recognized syllable characters based on the recognition results from either of the recognition units (13) and (14).

Also in FIG. 2, (2) is a recording/reproducing device such as a tape recorder that can be mechanically and electrically attached to and detached from the main body (100), as shown in FIG. 1; (3) is a microphone, for example the headphone type shown in FIG. 1; and (4) is an input switching unit [FIG. 8] that switches the connections among the recording/reproducing device (2), the microphone (3), and the speech recognition unit (1). (6) is a display device for displaying character strings and the like generated from the recognition results; (7) is a keyboard for entering the various control signals of the dictating machine; (8) is a storage device, such as a magnetic disk drive, that stores the character strings generated by the dictating machine; and (9) is a speech synthesis unit that reads out the character strings in the storage device through a speaker (10) by rule-based synthesis. (5) is a control unit consisting of a microprocessor, which controls the operation of each of the above units.

There are two ways to create text with the dictating machine configured as above; each is described in detail below.

In the first method, live speech is input from the microphone (3) to the speech recognition unit (1), which recognizes it and converts it into a character string; the result is displayed on the display device (6) and simultaneously stored in the storage device (8).

In the second method, the text to be input is recorded in advance on the recording/reproducing device (2). The recording/reproducing device (2) is then connected to the apparatus and the recorded text is input to the speech recognition unit (1), which recognizes it and converts it into a character string; the result is displayed on the display device (6) and simultaneously stored in the storage device (8).

Because there are two ways to input speech, as described above, the input switching unit (4) switches between the inputs. In addition, the input switching unit (4) selects whether the recording/reproducing device (2) records the recording signal (イ) or the speech input from the microphone (3).

The operations from speech recording through text creation are described in order below.

(i) Speech Registration
Before speech recognition is performed, speech is registered in order to create the standard speech patterns that recognition requires.

First, the syllable registration mode is described.

The standard patterns referred to here serve as the reference patterns for pattern matching in the phrase recognition unit (14) of the speech recognition unit (1); specifically, they are stored in the syllable standard pattern memory (14d) of the phrase recognition unit (14) shown in FIG. 7.

To register speech in the dictating machine, the switch (14s1) in FIG. 7 is first operated to connect the parameter buffer (14a) to the syllable standard pattern memory (14d); registration can then be performed by any of the following three methods.

In the first method, the registration speech is input directly to the main body (100) of the machine from the microphone (3), the speech recognition unit (1) analyzes it and creates standard patterns, and the created standard patterns are stored in the syllable standard pattern memory (14d) and the storage device (8).

In the second method, a recording/reproducing device (2) on which the registration speech has been recorded in advance is connected to the main body (100), and the registration speech is input by playing back the recording. The speech recognition unit (1) analyzes the input registration speech and creates standard patterns, which are stored in the syllable standard pattern memory (14d) and the storage device (8).

In the third method, the registration speech is input directly to the main body (100) from the microphone (3), but with the recording/reproducing device (2) connected to the main body (100) at the same time. While the input speech is being recorded on the recording/reproducing device (2), the main body (100) analyzes the registration speech from the microphone (3), creates standard patterns, and stores them in the storage device (8). When speech input to the microphone (3) is finished, the speech recorded on the recording/reproducing device (2) is played back, the speech recognition unit (1) analyzes this recorded registration speech and creates standard patterns, and these are stored in the syllable standard pattern memory (14d) and also in the storage device (8), alongside the syllable standard patterns created from the registration speech input directly from the microphone (3).

In this third method, speech recorded on the recording/reproducing device (2) is affected by the frequency characteristics of that device, so standard patterns created from the recorded speech differ from those created from speech input directly from the microphone (3). Recorded speech must therefore be recognized using standard patterns created from recorded speech, and speech input directly from the microphone (3) must be recognized using standard patterns created from directly input speech. With the method described above, both sets of standard patterns, one registered directly from the microphone (3) and one created from the recorded speech, can be created and stored in a single registration operation. Furthermore, once the registration speech has been recorded on the recording/reproducing device (2), standard patterns can be created even on a dictating machine that does not yet have them, simply by playing back the recording, without the registrant having to speak again. And if the registration speech is recorded on the recording/reproducing device (2) followed by the text, then later connecting the recording/reproducing device (2) to the main body (100) and playing back the recording is enough to carry out everything from speech registration to text creation automatically.
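The requirement that each input path be recognized against standard patterns created from the same path can be sketched as a small lookup. This is only an illustration under assumed names: `Source`, `PatternStore`, and the pattern representation (a dict from syllable to a feature list) are hypothetical and do not appear in the specification.

```python
from enum import Enum

class Source(Enum):
    MICROPHONE = "microphone"   # speech input directly from the microphone (3)
    RECORDER = "recorder"       # speech played back from the recorder (2)

class PatternStore:
    """Keeps one set of syllable standard patterns per input source."""

    def __init__(self):
        self._sets = {}

    def register(self, source, patterns):
        # Copy so later changes by the caller do not alter the stored set.
        self._sets[source] = dict(patterns)

    def patterns_for(self, source):
        if source not in self._sets:
            raise LookupError(f"no standard patterns for {source.value} input")
        return self._sets[source]
```

In the third registration method both sets would be filled in one session; a recognizer would then call `patterns_for` with the source currently selected by the input switching unit.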

Note that the registrant's utterances for creating the standard speech patterns are input by having the registrant read aloud characters that the apparatus displays on the display device (6) in a fixed order.

When a recording/reproducing device (2) dedicated to this machine and equipped with a display function is used, standard patterns can be created even when the recording/reproducing device (2) is carried on its own: the user utters the speech corresponding to the headword shown on its display screen and records it on the recording/reproducing device (2).

As described above, when registration speech for creating standard patterns is recorded on the recording/reproducing device (2), noise or the like may cause the recorded speech to become misaligned with the headwords it should correspond to when the standard patterns are created from the recording. This is explained below with reference to FIG. 9, taking as an example the case where a tape recorder is used as the recording/reproducing device. FIG. 9(a) shows the portion of a tape holding registration speech for standard pattern creation that covers the registration speech "a" (あ) through "ka" (か), corresponding to the headwords "a" through "ka"; here, [noise] has been recorded between "e" (え) and "o" (お). If speech is registered from a tape such as that of FIG. 9(a), on which [noise] has been recorded between items of registration speech, and the syllable to which each input corresponds is decided simply from the order of the sounds on the tape (the first recorded sound is "a", the second is "i", and so on), then the [noise] is also treated as registration speech and assigned a headword, so the actual registration speech and the headwords become misaligned.

FIG. 9(b) shows the case where [noise] has been misrecognized as speech, so that [noise] is entered under the headword "e" and the syllable "e" is entered under the headword "o".

Because noise and the like can thus misalign the recorded speech and the headwords when standard patterns are created from registration speech, character code sounds indicating the type of each item of registration speech are recorded on the recording/reproducing device (2) in correspondence with the registration speech, as shown in FIG. 9(c). With this method, even if [noise] is recorded between "u" and "e", the misalignment between input sounds and headwords described above is prevented.
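The difference between labelling by recording order and labelling by character code can be made concrete with a small sketch. It is illustrative only: the tape is modelled as a list of `(sound, code)` pairs, with `None` standing for a sound that carries no character code, and both function names are hypothetical.

```python
def label_by_order(sounds, headwords):
    """Naive labelling: the n-th recorded sound gets the n-th headword."""
    return dict(zip(headwords, sounds))

def label_by_code(tagged_sounds):
    """Labelling by character code; sounds without a code are ignored."""
    return {code: sound for sound, code in tagged_sounds if code is not None}

headwords = ["a", "i", "u", "e", "o"]
# Tape contents with [noise] recorded between "u" and "e".
tape = [("a", "a"), ("i", "i"), ("u", "u"),
        ("[noise]", None), ("e", "e"), ("o", "o")]

by_order = label_by_order([sound for sound, _ in tape], headwords)
by_code = label_by_code(tape)
```

Here `by_order` assigns [noise] to the headword "e" and the syllable "e" to the headword "o", the misalignment described for FIG. 9(b), while `by_code` keeps every headword paired with its own syllable.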

The method of recording these character code sounds of a specific frequency, which prevent the misalignment, is described separately for the case where the tape recorder of the recording/reproducing device (2) is single-track and the case where it is multi-track.

First, the case of using a recording/reproducing device with multi-track recording is described with reference to FIG. 10.

When a recording/reproducing device with multi-track recording is used, the character code corresponding to the headword is recorded on a track that carries no speech, as shown in FIG. 10(a). From the character code sound, the speech recognition unit (1) learns the headword of the incoming speech; and of the sounds recorded on the speech track, it treats as speech and analyzes only those that were recorded in the section t1 in which the character code sound was recorded and that are at or above the sound pressure threshold.
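The two conditions for treating a sound as speech, that it falls inside a section marked by a character code sound and that it reaches the sound pressure threshold, can be sketched as a simple filter. The data layout is an assumption made for illustration: samples are `(time_s, sound_pressure)` pairs, code sections are `(headword, start_s, end_s)` tuples, and the function name is hypothetical.

```python
def speech_in_code_sections(samples, code_sections, threshold):
    """Keep only samples inside a code section and at or above threshold.

    Returns a dict mapping each headword to its accepted samples.
    """
    accepted = {}
    for headword, start, end in code_sections:
        accepted[headword] = [
            (t, p) for t, p in samples
            if start <= t <= end and p >= threshold
        ]
    return accepted

samples = [(0.1, 0.8), (0.2, 0.05), (0.5, 0.9), (1.1, 0.7)]
sections = [("a", 0.0, 0.3), ("i", 1.0, 1.3)]
result = speech_in_code_sections(samples, sections, threshold=0.1)
```

The loud sample at 0.5 s lies outside every code section and is discarded as noise, and the quiet sample at 0.2 s is discarded by the threshold, so only one sample survives for each headword.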

Alternatively, as shown in FIG. 10(b), character codes corresponding to the headword are recorded at the start and end of the speech. Of the sounds recorded on the speech track, only those that were recorded in the section t2 between the character code sound marking the start of the speech and the one marking its end, and that are at or above the sound pressure threshold, are treated as speech and analyzed.

Or, as shown in FIG. 10(c), the character code corresponding to the headword is recorded at the start of the speech. Of the sounds recorded on the speech track, only those that were recorded in the section t3, from the character code sound indicating the type of this speech to the character code sound corresponding to the next headword, and that are at or above the sound pressure threshold, are treated as speech and analyzed.

Second, with a single-track recording/reproducing device (2), the character code corresponding to the headword is represented by a sound outside the frequency band used for speech analysis and is recorded, together with the speech, on the track that carries the speech. The character code sounds are recorded in the same ways as in the multi-track case above: of the sounds recorded in the sections t1, t2, and t3 described above, only those meeting the same conditions are treated as speech and analyzed. However, except in the embodiment of FIG. 10(a), where the speech and the character code sound overlap, the character code sound need not lie outside the speech analysis frequency band.

Next, word standard patterns corresponding to the words registered in advance as characters in the word dictionary (13d) of the word recognition unit (13) shown in FIG. 6, such as letters of the alphabet, digits, parentheses, and punctuation marks, are registered in the word standard pattern memory (13c) of the same figure.

First, a predetermined operation causes the switch (13s1) to connect the parameter buffer (13a) of FIG. 6 to the word standard pattern memory (13c), placing the machine in word registration mode.

Next, letters of the alphabet, digits, parentheses, punctuation marks, and so on are displayed on the display device (6) of the main body (100), and the operator speaks the corresponding readings.

The speech recognition unit (1) analyzes this speech and registers the word standard patterns in the word standard pattern memory (13c).

The operations up to this point make speech recognition possible. However, to have the machine recognize a word that is in neither the independent-word/attached-word dictionary (14e) nor the word dictionary (13d), either the word must be registered in the independent-word/attached-word dictionary (14e), or the word must be registered in the word dictionary (13d) and its word standard pattern in the word standard pattern memory (13c). Which of the two is chosen depends on whether the user wants the word to be recognized as a phrase utterance or as a word utterance.

Likewise, if a word is in the independent-word/attached-word dictionary (14e) but not in the word dictionary (13d), and the user nevertheless wants it recognized by word recognition, the word and its word standard pattern must be registered in the word dictionary (13d) and the word standard pattern memory (13c), respectively.

The method of registering arbitrary words is described below.

There are two ways to register a word: registering its character string in the independent-word/attached-word dictionary (14e), or registering its word standard pattern in the word standard pattern memory (13c) and its character string in the word dictionary (13d).

To register a word in the independent-word/attached-word dictionary (14e), the user utters the word to be registered and inputs it to the apparatus.

The apparatus recognizes this speech in the speech recognition unit (1) and displays the recognition result on the display device (6). If the result is correct, the user presses a predetermined key on the keyboard (7) to register the uttered speech in the independent-word/attached-word dictionary (14e) as the character string shown on the display device (6). If the displayed recognition result is incorrect, the user either corrects it with the apparatus's syllable correction function or utters the word again. If the result of the repeated utterance is also wrong, it is again corrected with the syllable correction function. These operations are repeated until the character string shown on the display device (6) matches the word to be registered.

To register a word in the word standard pattern memory (13c) and the word dictionary (13d), the character string to be registered is first displayed correctly on the display device (6), in the same way as when registering a word in the independent-word/attached-word dictionary (14e). The correctly recognized character string and the word standard pattern are then registered in the word dictionary (13d) and the word standard pattern memory (13c), respectively.

Recognizing speech input with natural utterance is not feasible at the current level of speech recognition technology. Since continuous speech utterance input is the limit at that level, an embodiment of continuous syllable utterance input is described below.

For continuous speech utterance input the procedure is the same as above. For continuous syllable utterance input, however, the word standard patterns are also patterns of continuous syllable utterance, so the word to be registered is uttered again with natural utterance, the word standard pattern is created from that natural utterance, and the word standard pattern and the character string are registered in the word standard pattern memory (13c) and the word dictionary (13d), respectively.

With the above operations, the data needed to create text by speech recognition has been registered.

(ii) Text Creation
An embodiment of text creation is described below.

For recognition, the switch (13s1) of the word recognition unit (13) is set to connect the parameter buffer (13a) to the word determination unit (13b), and the switch (14s1) of the phrase recognition unit (14) is set to connect the parameter buffer (14a) to the syllable recognition unit (14b).

There are two methods of creating text.

The first is online recognition, in which the text to be created is spoken and input directly to the main body of the apparatus from the microphone (3).

The second is offline recognition, in which a recording/reproducing device (2) on which the text has been recorded is connected to the apparatus and the recorded text is played back and recognized.

First, an embodiment of online recognition is described.

In online recognition, text uttered phrase by phrase or word by word is input to the apparatus directly from the microphone (3), so a predetermined operation causes the input switching unit (4) to connect the microphone (3) to the speech recognition unit (1).

To also record the speech being input from the microphone (3) on the recording/reproducing device (2), the recording/reproducing device (2) is connected to the main body, and the input switching unit (4) connects the output of the microphone (3) to the recording terminal of the recording/reproducing device (2).

At the same time, the input switching unit (4) operates so that, when a silence detection signal is input from the feature extraction unit (12) as described later, a beep marking the phrase or word boundary is recorded.

音声認識時は、単語認識部（13）と文節認識部（14）
が起動している。During speech recognition, both the word recognition unit (13) and the phrase recognition unit (14) are running.

マイク（３）より入力された音声は、前処理部（11）
で入力音声を音声分析に適した特性になるよう処理を施
され（例えば入力音声の音圧が小さい時は、増幅器によ
り音圧を増幅をしたりする処理を行なう）、特徴抽出部
（12）に送られる。The voice input from the microphone (3) is processed in the pre-processing unit (11) so that it has characteristics suitable for voice analysis (for example, when the sound pressure of the input voice is low, it is amplified by an amplifier), and is then sent to the feature extraction unit (12).

特徴抽出部（12）では、第５図に示す如く、前処理部
（11）より入力されてきた音声を分析部（12a）で分析
し特徴抽出を行ない、パラメータバッファ（12c）に記
憶する。In the feature extraction unit (12), as shown in FIG. 5, the speech input from the pre-processing unit (11) is analyzed by the analysis unit (12a) to perform feature extraction, and stored in the parameter buffer (12c).

同時に、特徴抽出部（12）の分析単位判定部（12b）
では、分析部（12a）の分析結果より、音節または文節
単位に発声されたあとの無音区間、および文節または単
語単位に発声されたあとに録音されたビープ音（詳細は
後述のオフライン認識の実施例に示す。）の検出を行な
っており、無音区間を検出した場合、無音区間検出信号
（ロ）を発生する。At the same time, the analysis unit determination unit (12b) of the feature extraction unit (12) uses the analysis result of the analysis unit (12a) to detect the silent sections that follow utterances made in units of syllables or phrases, as well as the beep sounds recorded after utterances made in units of phrases or words (described in detail in the offline recognition embodiment below). When it detects a silent section, it generates a silent section detection signal (b).

かかる無音区間検出信号（ロ）を受け取ったパラメー
タバッファ（12c）は、記憶している特徴パラメータを
単語認識部（13）と文節認識部（14）に送り、記憶内容
を消去する。The parameter buffer (12c) that has received the silent section detection signal (b) sends the stored feature parameters to the word recognition section (13) and the phrase recognition section (14), and erases the stored contents.
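
The buffering behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how the silent section is detected, so a simple per-frame energy threshold is assumed here, and the names `ParameterBuffer`, `segment_stream`, `threshold`, and `min_silent_frames` are hypothetical.

```python
from typing import Callable, List

Frame = List[float]  # one feature vector per analysis frame (hypothetical layout)


class ParameterBuffer:
    """Collects feature frames until a silent section is signalled."""

    def __init__(self, recognizers: List[Callable[[List[Frame]], None]]):
        self.frames: List[Frame] = []
        self.recognizers = recognizers

    def push(self, frame: Frame) -> None:
        self.frames.append(frame)

    def on_silence_detected(self) -> None:
        # Mirrors the reaction to the silent section detection signal (b):
        # send the stored parameters to every recognizer, then erase them.
        if self.frames:
            segment = list(self.frames)
            for recognize in self.recognizers:
                recognize(segment)
        self.frames.clear()


def segment_stream(energies, frames, buffer, threshold=0.01, min_silent_frames=3):
    """Feed frames to the buffer and flush after a run of low-energy frames.

    The energy threshold is an assumed stand-in for the detector in (12b).
    """
    silent_run = 0
    for energy, frame in zip(energies, frames):
        if energy < threshold:
            silent_run += 1
            if silent_run == min_silent_frames:
                buffer.on_silence_detected()
        else:
            silent_run = 0
            buffer.push(frame)
    buffer.on_silence_detected()  # flush whatever remains at end of input
```

Passing two callables as `recognizers` stands in for the word recognition unit (13) and the phrase recognition unit (14), which both receive the same segment.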

単語認識部（13）に入力された特徴パラメータは、第
６図に示されたパラメータバッファ（13a）に記憶され
る。単語判定部（13b）では、パラメータバッファ（13
A）に記憶された特徴パラメータと単語標準パラメータ
メモリ（13c）とを比較し、パラメータバッファ（13a）
に記憶された特徴パラメータと、尤度の大きい単語標準
パターンをもつ単語を、単語辞書（13d）より複数語選
び、選ばれた単語の文字列とその尤度値を候補作成部
（15）に送る。The feature parameters input to the word recognition unit (13) are stored in the parameter buffer (13a) shown in FIG. 6. The word determination unit (13b) compares the feature parameters stored in the parameter buffer (13a) with the word standard parameter memory (13c), selects from the word dictionary (13d) a plurality of words whose word standard patterns have a high likelihood with respect to the stored feature parameters, and sends the character string of each selected word and its likelihood value to the candidate creation unit (15).

一方、文節認識部（14）に入力された特徴パラメータ
は、パラメータバッファ（14a）に記憶される。音節認
識部（14b）では、パラメータバッファ（14a）に記憶さ
れた特徴パラメータと音節標準パラメータメモリ（14
d）とを比較し、パラメータバッファ（14a）に記憶され
た特徴パラメータを音節列に変換し、かかる音節列を文
節判定部（14c）へ送る。文節判定部（14c）では入力さ
れた音節列と自立語・付属語辞書（14e）に登録されて
いる単語を比較し、自立語と付属語を組み合わせて尤度
の大きい文節を複数組作成し、作成した文節の文字列と
その尤度値を候補作成部（15）に送る。On the other hand, the feature parameters input to the phrase recognition unit (14) are stored in the parameter buffer (14a). The syllable recognition unit (14b) compares the feature parameters stored in the parameter buffer (14a) with the syllable standard parameter memory (14d), converts them into a syllable string, and sends the syllable string to the phrase determination unit (14c). The phrase determination unit (14c) compares the input syllable string with the words registered in the independent word / adjunct word dictionary (14e), combines independent words and adjunct words to create a plurality of high-likelihood phrases, and sends the character string of each created phrase and its likelihood value to the candidate creation unit (15).

候補作成部（15）は入力された文字列から尤度の大き
いものを複数個選び、尤度値と単語認識部（13）から送
られてきたデータか文節認識部（14）から送られてきた
データかを示すコードを付加し記憶する。同時に、尤度
の最も大きいものの文字列を、表示装置に表示させる信
号を制御部（５）に送る。制御部（５）は、この信号を
受け尤度の最も大きいものの文字列の後に区切り記号マ
ークをつけ、例えば第14図（ａ）の入力文章をに対して第14
図（ｃ）に示すような形式で表示装置に表示させる。同
時に候補作成部（15）は制御部（５）に、候補作成部
（15）に記憶された内容を記憶装置（８）に記憶させる
信号を送る。制御部（５）にこの信号を受け、候補作成
部（15）に記憶された文字列の後に区切り信号を表わす
コードを付加した形で記憶装置（８）に記憶させる。こ
の外部記憶装置に記憶された文字列は、ワープロの一次
原稿とする。一般的にはフロッピーディスクを用いる
が、このとき記憶装置（８）のファイルのフォーマット
はワープロのファイルフォーマットに合わせておく必要
がある。The candidate creation unit (15) selects a plurality of high-likelihood candidates from the input character strings and stores each one together with its likelihood value and a code indicating whether the data came from the word recognition unit (13) or from the phrase recognition unit (14). At the same time, it sends the control unit (5) a signal to display the character string with the highest likelihood on the display device. On receiving this signal, the control unit (5) appends a delimiter mark after that character string and displays it on the display device, for example in the format shown in FIG. 14(c) for the input sentence of FIG. 14(a). At the same time, the candidate creation unit (15) sends the control unit (5) a signal to store the contents held in the candidate creation unit (15) in the storage device (8). On receiving this signal, the control unit (5) stores the character strings held in the candidate creation unit (15) in the storage device (8), each followed by a code representing the delimiter. The character strings stored in this external storage device serve as the first draft for a word processor. A floppy disk is generally used; in that case the file format of the storage device (8) must match the file format of the word processor.
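
As a rough sketch of the candidate creation step, the two recognizer outputs can be merged, tagged with their origin, and ranked by likelihood. The `Candidate` type, the `keep` limit, the `/` delimiter, and the likelihood values used with it are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    text: str
    likelihood: float
    source: str  # "word" or "phrase": which recognizer produced it


DELIMITER = "/"  # hypothetical stand-in for the delimiter mark


def create_candidates(
    word_results: List[Tuple[str, float]],
    phrase_results: List[Tuple[str, float]],
    keep: int = 4,
) -> List[Candidate]:
    """Merge both result lists and keep the `keep` most likely entries."""
    merged = [Candidate(t, l, "word") for t, l in word_results]
    merged += [Candidate(t, l, "phrase") for t, l in phrase_results]
    merged.sort(key=lambda c: c.likelihood, reverse=True)
    return merged[:keep]


def display_string(candidates: List[Candidate]) -> str:
    """Top-ranked text followed by the delimiter mark."""
    return candidates[0].text + DELIMITER
```

For example, `create_candidates([("tango", 0.9), ("tanko", 0.6)], [("tango-o", 0.8), ("tango-ni", 0.7)])` yields a buffer ordered tango, tango-o, tango-ni, tanko, each entry still carrying its source tag.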

また、この無音区間検出信号をうけとった第８図に示
す入力切り換え部（４）の信号発生部（42）は、文章の
文節または単語の区切りを表わすビープ音を発生し、か
かるビープ音をスイッチ（41）に入力する。スイッチ
（41）は、マイク（３）から入力される音声と、信号発
声部（42）より入力されるビープ音を、録音再生装置
（２）に録音するよう、回路を接続し、録音再生装置
（２）に録音されている文章の文節または単語の区切り
と見なされた無音区間にビープ音を録音する。On receiving this silent section detection signal, the signal generating unit (42) of the input switching unit (4) shown in FIG. 8 generates a beep sound that marks a phrase or word break in the sentence and inputs the beep sound to the switch (41). The switch (41) connects the circuit so that both the voice input from the microphone (3) and the beep sound input from the signal generating unit (42) are recorded on the recording/reproducing device (2), so that a beep sound is recorded in each silent section regarded as a phrase or word break of the sentence recorded on the recording/reproducing device (2).

次ぎに、オフライン認識の実施例について述べる。 Next, an embodiment of off-line recognition will be described.

オフライン認識の場合は、本装置に録音再生装置
（２）の録音音声を再生入力することにより文章作成を
行なうものであるため、まず録音再生装置（２）に文章
を録音する。In the case of off-line recognition, a sentence is created by reproducing and inputting a recorded voice of the recording / reproducing device (2) to the present apparatus. First, a sentence is recorded on the recording / reproducing device (2).

また、録音再生装置（２）より音声入力を行なうた
め、入力切り換え部（４）により、録音再生装置（２）
と音声認識部（１）を接続する。Further, since voice is input from the recording/reproducing device (2), the input switching unit (4) connects the recording/reproducing device (2) to the voice recognition unit (1).

文章録音時は、文節単位または単語単位に発声し、文
節および単語間に無音区間を作る。また、第１図に示す
如き本装置専用の録音再生装置（２）を使用する場合
は、文節および単語の区切りを明確にするため、区切り
を示すビープ音を、録音再生装置（２）または本ディス
クテーティングマシン本体に設定されている区切りキー
（71）を押し録音する。When recording a sentence, the speaker utters it phrase by phrase or word by word, leaving a silent section between phrases and between words. When a recording/reproducing device (2) dedicated to this apparatus, as shown in FIG. 1, is used, a beep sound marking each break is recorded by pressing the delimiter key (71) provided on the recording/reproducing device (2) or on the main body of the dictating machine, so that phrase and word breaks are unambiguous.

また、単語登録をした単語は、単語単位に発声をおこ
なうが、録音再生装置（２）がキャラクター音発生機能
を持ち、かつ入力したい単語に相当するキャラクターを
もっていれば、音声の替わりにそのキャラクター音を録
音してもよい。Registered words are uttered word by word. However, if the recording/reproducing device (2) has a character sound generating function and has a character corresponding to the word to be input, that character sound may be recorded in place of the voice.

また、文章単位の頭だしや文章と文章の間に録音され
たノイズを音声と誤り認識してしまうことを避けるため
に文章の始まりと終わりを示す信号を音声と共に録音し
ておく。In addition, signals marking the beginning and end of each sentence are recorded together with the voice, both to allow cueing to the start of each sentence and to avoid misrecognizing noise recorded between sentences as speech.

ただし、この信号の録音方法は、録音再生装置（２）
がマルチトラック方式か否かにより音声登録のところで
述べたように変わる。第11図は、マルチトラック方式お
よび、第12図はシングルトラック方式の図である。第11
図（ａ）、第12図（ａ）は、DTMF信号等の音が、録音さ
れている区間を音声領域として、検出する方法である。However, the method of recording this signal differs, as described in the section on voice registration, depending on whether the recording/reproducing device (2) is of the multi-track type. FIG. 11 illustrates the multi-track system and FIG. 12 the single-track system. FIGS. 11(a) and 12(a) show a method in which the section where a tone such as a DTMF signal is recorded is detected as the voice region.

第11図（ｂ）は、第12図（ｂ）は、DTMF信号等の音
を、文章の始まる前に録音し、文章が終了したときに、
再度録音し、かかる両信号に挾まれた区間を音声領域と
して、検出する方法である。FIGS. 11(b) and 12(b) show a method in which a tone such as a DTMF signal is recorded before the sentence begins and again when the sentence ends, and the section between the two signals is detected as the voice region.
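
The two marking schemes can be contrasted with a small sketch. It assumes that a separate tone detector has already produced either a per-frame "tone present" flag (scheme (a)) or a sorted list of marker times (scheme (b)); the function names and the frame length parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

Region = Tuple[float, float]  # (start, end) in seconds


def regions_from_tone_presence(tone_present: Sequence[bool], frame_seconds: float) -> List[Region]:
    """Scheme (a): the voice region is wherever the marker tone is present."""
    regions: List[Region] = []
    start = None
    for i, present in enumerate(tone_present):
        if present and start is None:
            start = i
        elif not present and start is not None:
            regions.append((start * frame_seconds, i * frame_seconds))
            start = None
    if start is not None:  # tone still present at the end of the recording
        regions.append((start * frame_seconds, len(tone_present) * frame_seconds))
    return regions


def regions_from_marker_pairs(marker_times: Sequence[float]) -> List[Region]:
    """Scheme (b): markers alternate start, end; the region lies between them."""
    if len(marker_times) % 2 != 0:
        raise ValueError("expected an even number of markers (start/end pairs)")
    return [(marker_times[i], marker_times[i + 1]) for i in range(0, len(marker_times), 2)]
```

Scheme (a) needs the tone to be held for the whole sentence, whereas scheme (b) only needs a short tone at each end, at the cost of requiring the markers to stay correctly paired.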

また、第12図のシングルトラック方式の場合は、音声
区間とDTMF信号等の音が、重なることを考え、音声帯域
外のDTMF信号等を用いる。Further, in the case of the single track system shown in FIG. 12, a DTMF signal or the like out of the audio band is used, considering that the sound of the voice section and the sound of the DTMF signal and the like overlap.

また文章を認識するときは、信号の録音されている前
後14およびt5の区間をサンプリングし、音声か否かを判
定するため必ずしも文章の始まりと信号の始まり、およ
び文章の終わりと信号の終わりが一致している必要はな
い。このため、文章を発声するタイミングとキーを押す
タイミングが少々ずれても認識可能である。When a sentence is recognized, the sections t4 and t5 before and after the recorded signal are sampled to determine whether they contain speech, so the start of the sentence need not coincide exactly with the start of the signal, nor the end of the sentence with the end of the signal. Recognition is therefore possible even if the timing of the utterance and the timing of the key press differ slightly.

次に、録音再生装置（２）を本装置の本体と接続し録
音音声を再生し認識処理を行なうが、この録音音声を認
識させる前に認識速度のモードを、録音音声の再生速度
を速くして、認識時間短縮を行なう早聞き認識のモード
が、通常の再生速度で認識させるモードか、時間的に余
裕があり、高認識率を必要とするときは、二度再生認識
モードのいずれかのモードに設定しておく。Next, the recording/reproducing device (2) is connected to the main body of the apparatus, and the recorded voice is reproduced and recognized. Before recognition, the recognition speed mode is set to one of the following: the fast-listening recognition mode, which shortens recognition time by raising the playback speed of the recorded voice; the mode that recognizes at normal playback speed; or, when time permits and a high recognition rate is required, the double-playback recognition mode.

まず早聞き認識モードの実施例を記す。 First, an embodiment of the fast-listening recognition mode will be described.

早聞き認識モードでは、録音音声の再生速度を速くし
ているため、入力音声の特性が、通常の再生速度で再生
された登録音声より作成した、標準パターンで特性が違
っており、単に再生速度を速くした音声を入力しても、
正確に音声認識を行なえない。In the fast-listening recognition mode the recorded voice is played back faster, so the characteristics of the input voice differ from those of the standard patterns, which were created from registered voice played back at normal speed. Simply inputting the sped-up voice therefore does not give accurate recognition.

そこで、再生速度を速くした音声を正確に認識するた
め、サンプリング周波数を変更する。以下に、かかる方
法の、実施例を記す。Therefore, the sampling frequency is changed in order to accurately recognize the sound whose reproduction speed has been increased. Hereinafter, examples of such a method will be described.

第５図の特徴抽出部（12）のサンプリング周波数制御
部（12d）は、特徴抽出部（12）の入力音声のサンプリ
ング周波数を音声の標準パターンを作成したときのサン
プリング周波数の（再生速度／録音速度）倍に設定し、
音声をサンプリングし分析する。特徴抽出部（12）以降
の処理はオンライン認識時の実施例と同様。ただし、録
音再生装置（２）の録音文章に、文節および単語の区切
りを明確にするための区切りを示すピーブ音を録音済み
の文章を入力し、特徴抽出部（12）がかかるピーブ音を
検出したとき、特徴抽出部（12）は無音区間検出信号
（ロ）の代わりに、ピーブ音検出信号（ロ′）を発生す
る。受信信号が、無音区間検出信号（ロ）でなく、ピー
ブ音検出信号（ロ′）の場合、入力切り換え部（４）の
信号っ発生部（42）は、文章の文節または単語の区切り
を表わすピーブ音の発生は行なわない。The sampling frequency control unit (12d) of the feature extraction unit (12) in FIG. 5 sets the sampling frequency for the input voice to (playback speed / recording speed) times the sampling frequency used when the standard voice patterns were created, and the voice is sampled and analyzed at that rate. Processing from the feature extraction unit (12) onward is the same as in the online recognition embodiment. However, when the text recorded on the recording/reproducing device (2) already contains beep sounds marking phrase and word breaks and the feature extraction unit (12) detects such a beep sound, it generates a beep sound detection signal (b') instead of the silent section detection signal (b). When the received signal is the beep sound detection signal (b') rather than the silent section detection signal (b), the signal generating unit (42) of the input switching unit (4) does not generate a beep sound marking a phrase or word break.
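
The sampling frequency rule can be checked numerically. The sketch below assumes an ideal speed-up (all frequencies scaled by the same factor); the function names and the 8000 Hz and 440 Hz figures are illustrative only.

```python
import math


def recognition_sampling_rate(template_rate_hz: float, playback_speed: float, recording_speed: float) -> float:
    """Sampling rate = template rate x (playback speed / recording speed)."""
    if playback_speed <= 0 or recording_speed <= 0:
        raise ValueError("speeds must be positive")
    return template_rate_hz * (playback_speed / recording_speed)


def sampled_tone(freq_hz: float, rate_hz: float, n_samples: int):
    """Samples of a sine tone, used here only to illustrate the effect."""
    return [math.sin(2.0 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]


# A 440 Hz tone sampled at the (assumed) template rate of 8000 Hz ...
normal = sampled_tone(440.0, 8000.0, 16)

# ... and the same tone played back at double speed, which arrives as 880 Hz.
fast_rate = recognition_sampling_rate(8000.0, playback_speed=2.0, recording_speed=1.0)
fast = sampled_tone(880.0, fast_rate, 16)
```

Because both the signal frequencies and the sampling rate are scaled by the same factor, `fast` contains the same sample values as `normal`, which is why the standard patterns made at normal speed remain usable.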

また、音声認識部（１）が、単語を示すキャラクター
音を認識した場合は、かかるキャラクター音に対応した
単語を認識結果として出力する。When the voice recognition unit (1) recognizes a character sound indicating a word, it outputs a word corresponding to the character sound as a recognition result.

次に二度再生認識モードの実施例を記す。 Next, an embodiment of the double reproduction recognition mode will be described.

本モードは、まず録音音声を再生し本装置に入力す
る。このとき音声認識部（１）の前処理部（11）で録音
音声の音圧変動を全て読みとり、このデータを第４図に
示す音圧変動メモリ（11b）に記憶する。次ぎに、再び
録音音声を再生し本装置に入力する。このとき前処理部
（11）では、音圧変動メモリ（11b）に記憶されたデー
タを使用し、特徴抽出部（12）への入力音圧を第18図に
示す如く、音声認識に最も適したレベルにあわせるよ
う、AGC回路（11a）の増幅率を調整する。即ち、利得Ｇ
を固定利得Ａに制御電圧V_G（可変調整される）を乗じた
ものとする。In this mode, the recorded voice is first reproduced and input to the apparatus. During this pass, the pre-processing unit (11) of the voice recognition unit (1) reads the entire sound pressure variation of the recorded voice and stores it in the sound pressure variation memory (11b) shown in FIG. 4. The recorded voice is then reproduced and input a second time. During this pass, the pre-processing unit (11) uses the data stored in the sound pressure variation memory (11b) to adjust the amplification factor of the AGC circuit (11a) so that the sound pressure input to the feature extraction unit (12) is brought to the level best suited to voice recognition, as shown in FIG. 18. That is, the gain G is the fixed gain A multiplied by the control voltage V_G, which is variably adjusted.
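
A minimal sketch of the two-pass idea, assuming block-wise peak level as the stored "sound pressure variation" and a target peak level as the "level best suited to recognition". Neither choice is specified in the patent, and `block`, `target`, and `max_v_g` are hypothetical parameters.

```python
from typing import List, Sequence


def measure_levels(samples: Sequence[float], block: int) -> List[float]:
    """Pass 1: record the peak level of each block (stands in for memory 11b)."""
    return [
        max(abs(s) for s in samples[i:i + block])
        for i in range(0, len(samples), block)
    ]


def control_voltages(levels, target, fixed_gain_a, floor=1e-6, max_v_g=10.0):
    """Choose V_G per block so that G = A * V_G maps the block peak to `target`.

    `max_v_g` caps the gain so that near-silent blocks are not blown up.
    """
    return [min(target / (fixed_gain_a * max(level, floor)), max_v_g) for level in levels]


def apply_gain(samples, block, fixed_gain_a, v_g):
    """Pass 2: amplify each sample with the gain G = A * V_G of its block."""
    return [fixed_gain_a * v_g[i // block] * s for i, s in enumerate(samples)]
```

Knowing the whole level profile before the second pass lets the gain be set per block in advance, rather than reacting after the level has already changed as a single-pass AGC would.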

また、二度再生認識モードの別の実施例として、多数
回再生認識モードも考えられる。これは、録音文章を多
数回再生入力し、入力のつど、音声認識部（１）におけ
る認識方法を変更することによって認識された結果を比
較し、最も確からしさの尤度の大きいものを、選択する
方法である。As another embodiment of the double-playback recognition mode, a multiple-playback recognition mode is also conceivable. In this method the recorded text is reproduced and input many times, the recognition method used in the voice recognition unit (1) is changed for each input, the results are compared, and the one with the highest likelihood is selected.
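
The multiple-playback selection can be sketched as a loop over recognition methods, one playback per method. The `replay` callable and the `(text, likelihood)` result shape are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Result = Tuple[str, float]  # (recognized text, likelihood)


def recognize_multi_pass(
    replay: Callable[[], List[float]],
    methods: Iterable[Callable[[List[float]], Result]],
) -> Result:
    """Replay the recording once per method and keep the most likely result."""
    best = None
    for method in methods:
        result = method(replay())  # one full playback for each method
        if best is None or result[1] > best[1]:
            best = result
    if best is None:
        raise ValueError("at least one recognition method is required")
    return best
```

Comparing likelihoods across methods is only meaningful if the methods report them on a comparable scale, which this sketch takes for granted.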

また、録音再生装置（２）に登録用音声を録音してお
らず、かつ録音再生装置（２）によっては再生速度を速
くした場合の周波数特性と通常の再生速度の場合の周波
数特性が違うものを使用するとき、または音声の標準パ
ターン作成に使用した録音再生装置（２）と違う周波数
特性をもつ録音再生装置（２）に録音した文章を認識さ
せるとき、または音声の標準パターン作成に使用した録
音再生装置（２）と規格上は同じ周波数特性を有するが
使用部品等の誤差の影響をうけ実際の周波数特性が音声
の標準パターン作成に使用した録音再生装置（２）と違
っている録音再生装置（２）に録音した文章を認識させ
るときは、以下に述べる周波数特性の影響を補正する機
能を使用する。The function described below, which corrects for the influence of frequency characteristics, is used in the following cases: when the registration voice has not been recorded on the recording/reproducing device (2) and the device is one whose frequency characteristics at increased playback speed differ from those at normal playback speed; when recognizing text recorded on a recording/reproducing device (2) whose frequency characteristics differ from those of the device used to create the standard voice patterns; or when recognizing text recorded on a device that nominally has the same frequency characteristics as the device used to create the standard voice patterns but whose actual characteristics differ because of component tolerances and the like.

まず、録音再生装置（２）の周波数特性を測定する場
合の基準となる基準正弦波信号を基準信号発生部（42）
で発生させ、録音再生装置（２）に録音する。しかる後
に録音されたかかる基準正弦波信号を本装置に再生入力
する。入力された基準正弦波信号を音声認識部（１）は
分析し、録音された基準正弦波信号と、基準信号発生部
（42）で発生させた基準正弦波信号との周波数特性の差
を求め、録音された基準正弦波信号と、基準信号発生部
（42）で発生させた基準正弦波信号との周波数特性の差
を小さくするように、補正をかける。補正をかける手段
は、音声認識部（１）の特徴抽出部（12）の特徴抽出方
法により、多数考えられる。例えば第13図に示したよう
に、直列接続されたバンドパスフィルタ（BPF）と増巾
器（AMP）との並列接続体からなるアナログフィルター
バンク方式とするものであれば、増幅器（AMP）の増幅
率を調整することにより、基準信号発生部（42）で発生
させた基準正弦波信号との周波数特性の差を小さくする
ようにフィルタからの出力を調整する。また、特徴抽出
部（12）の特徴抽出方法として、ディジタルフィルター
をもちいていれば、ディジタルフィルターの特性を決め
ているパラメータを変更すればよい。その他、音声認識
部（１）の特徴抽出部（12）の特徴抽出方法に対応し
て、あらゆる方法が考えられる。First, a reference sine wave signal, which serves as the reference for measuring the frequency characteristics of the recording/reproducing device (2), is generated by the reference signal generator (42) and recorded on the recording/reproducing device (2). The recorded reference sine wave signal is then reproduced and input to the apparatus. The voice recognition unit (1) analyzes the input signal, obtains the difference in frequency characteristics between the recorded reference sine wave signal and the one generated by the reference signal generator (42), and applies a correction that reduces this difference. Many means of correction are conceivable, depending on the feature extraction method of the feature extraction unit (12) of the voice recognition unit (1). For example, with an analog filter bank consisting of parallel branches, each a band-pass filter (BPF) in series with an amplifier (AMP), as shown in FIG. 13, the amplification factor of each amplifier (AMP) is adjusted so that the filter outputs show a smaller difference in frequency characteristics from the reference sine wave signal generated by the reference signal generator (42). If the feature extraction unit (12) uses a digital filter, the parameters that determine the characteristics of the digital filter may be changed instead. Any other method suited to the feature extraction method of the feature extraction unit (12) is also conceivable.
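
For the filter bank case, the correction amounts to one gain per band. The sketch below assumes the reference and the replayed signal have each been reduced to a list of per-band output levels; the function names and the `floor` guard are hypothetical.

```python
from typing import List, Sequence


def band_correction_gains(
    reference_levels: Sequence[float],
    measured_levels: Sequence[float],
    floor: float = 1e-9,
) -> List[float]:
    """Per-band amplifier gain that maps the replayed level onto the reference."""
    if len(reference_levels) != len(measured_levels):
        raise ValueError("both level lists must cover the same bands")
    return [ref / max(meas, floor) for ref, meas in zip(reference_levels, measured_levels)]


def apply_band_gains(band_outputs: Sequence[float], gains: Sequence[float]) -> List[float]:
    """Scale each band-pass filter output by its correction gain."""
    return [out * g for out, g in zip(band_outputs, gains)]
```

With reference levels `[1.0, 1.0, 1.0]` and replayed levels `[1.0, 0.5, 0.25]`, the gains come out as `[1.0, 2.0, 4.0]`, so the higher bands that the recorder attenuated are boosted back to the reference.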

前記までの操作により、音声入力した文章はかな列に
変更された事となる。このかな列変更された文章が入力
した文章と違っている場合の修正方法を第14図を使用し
それぞれの誤りかたに場合分けして以下に述べる。以下
の手順により修正を行なう。Through the operations described so far, the sentence input by voice has been converted into a kana string. How to correct this kana string when it differs from the input sentence is described below with reference to FIG. 14, case by case according to the type of error. Corrections are made by the following procedures.

第14図（ａ）は入力文章、同図（ｂ）は入力音声、同
図（ｃ）は認識結果、同図（ｄ）〜（ｈ）は修正過程、
同図（ｉ）は修正結果を表わしている。FIG. 14(a) shows the input sentence, FIG. 14(b) the input voice, FIG. 14(c) the recognition result, FIGS. 14(d) to (h) the correction process, and FIG. 14(i) the corrected result.

まず、単語として発声したものが文節として誤認識さ
れた場合の修正法について述べる。同図（ｃ）に示した
ように単語“C"として発声したものが、文節“しー”と
して認識された場合、先ずカーソル（Ｘ）を誤った単語
の部分へ移動する［同図（ｄ）ｉ］。次ぎに単語次候補
キー（72）を押し単語の次候補を表示させる［同図
（ｄ）ii］。この結果が正しければ次の修正部分へ進
む。もしこの結果が誤っていれば、再び単語次候補キー
（72）を押し単語の次の候補を表示させる。この操作を
正解が表示されるまで繰り返す。First, the correction method for the case where something uttered as a word is misrecognized as a phrase is described. When what was uttered as the word "C" is recognized as the phrase "shi-" (しー), as shown in FIG. 14(c), the cursor (X) is first moved to the erroneous word [FIG. 14(d) i]. Next, the next word candidate key (72) is pressed to display the next word candidate [FIG. 14(d) ii]. If the result is correct, the user proceeds to the next correction. If it is incorrect, the next word candidate key (72) is pressed again to display the following word candidate. This operation is repeated until the correct answer is displayed.

次ぎに、文節として発声したものが単語として誤認識
された場合の修正法について述べる。文節“い”として
発声したものが、単語“E"として認識された場合、先ず
カーソル（Ｘ）を誤った文節の部分へ移動する。次ぎに
文節次候補キー（73）を押し文節の次候補を表示させ
る。この結果が正しければ次の修正部分へ進む。Next, a description will be given of a correction method when a phrase uttered as a phrase is incorrectly recognized as a word. When the phrase uttered as "i" is recognized as the word "E", the cursor (X) is first moved to the wrong phrase. Next, the next phrase candidate key (73) is pressed to display the next phrase candidate. If the result is correct, proceed to the next correction.

もしこの結果が誤っていれば、文節次候補キー（73）
を押し文節の次候補を表示させる。この操作を正解が表
示されるまで繰り返す。If this result is incorrect, the next phrase candidate key (73) is pressed to display the next phrase candidate. This operation is repeated until the correct answer is displayed.

単語前候補キー（74）を押すことにより単語、文節前
候補キー（75）を押すことにより文節、それぞれの一つ
前の候補を表示させることも出来る。Pressing the pre-word candidate key (74) displays the previous word candidate, and pressing the pre-phrase candidate key (75) displays the previous phrase candidate.

上述の２通りの修正法で正解が得られないときは音節
単位の修正や、単語または文節または音節を再発声入力
する。When a correct answer cannot be obtained by the above two correction methods, correction is performed in units of syllables, and a word, a phrase, or a syllable is re-voiced.

また、再発声入力時に再び、文節を単語認識したり、
単語を文節認識したりすることを避けるため、候補作成
部（15）を、単語認識部（13）より送られてきた認識結
果のみを認識結果としてみなし、文節認識部（14）より
送られてきた認識結果は、無視するよう外部より制御で
きる。To avoid a phrase again being recognized as a word, or a word as a phrase, when the input is re-uttered, the candidate creation unit (15) can be controlled externally so that it treats only the recognition results sent from the word recognition unit (13) as recognition results and ignores those sent from the phrase recognition unit (14).

また、候補作成部（15）は、文節認識部（14）より送
られてきた認識結果のみを認識結果としてみなし、単語
認識部（13）より送られてきた認識結果は、無視するよ
う外部より制御できる。Likewise, the candidate creation unit (15) can be controlled externally so that it treats only the recognition results sent from the phrase recognition unit (14) as recognition results and ignores those sent from the word recognition unit (13).

上述の次候補キーとは、以下に述べる機能を有するキ
ーの事であり、第15図を使用し説明する。The above-mentioned next candidate key is a key having the functions described below, and will be described with reference to FIG.

本装置の音声認識部（１）では、単語認識と文節認識
が並走しており、単語および文節の両認識結果を求めて
いることは先に述べたが、この両認識結果より、文節認
識処理の結果を尤度の大きいものから順番に認識結果を
表示装置（６）に表示させるためのキーが文節次候補キ
ー（73）であり、単語認識処理の結果を尤度の大きいも
のから順番に認識結果を表示装置に表示させるためのキ
ーが単語次候補キー（72）であり、現在表示装置に表示
されている認識結果よい、一つ尤度の大きい認識結果を
表示装置（６）に表示するキーが、単語前候補キーおよ
び文節前候補キーである。As described above, the voice recognition unit (1) of this apparatus runs word recognition and phrase recognition in parallel and obtains both word and phrase recognition results. The next phrase candidate key (73) displays, on the display device (6), the results of phrase recognition in descending order of likelihood, and the next word candidate key (72) does the same for the results of word recognition. The pre-word candidate key and the pre-phrase candidate key display the recognition result whose likelihood is one rank higher than that of the result currently shown on the display device (6).

第15図は候補作成部（15）の候補バッファ（15a）で
ある。この図は、一位の認識結果が、「たんご」であ
り、これは単語認識部（13）から送られてきた認識結果
であることを（単語）で表わしている。同様に二位の認
識結果が、「たんごを」であり、これは文節認識部（1
4）から送られてきた認識結果であることを（文節）で
表わし、三位の認識結果が、「たんごに」であり、これ
は文節認識部（14）ら送られてきた認識結果であること
を（文節）で表わし、四位の認識結果が、「たんこう」
であり、これは単語認識部（13）から送られてきた認識
結果であることを（単語）で表わしている。FIG. 15 shows the candidate buffer (15a) of the candidate creation unit (15). In this figure the first-ranked recognition result is "tango" (たんご), and (word) indicates that it was sent from the word recognition unit (13). Similarly, the second-ranked result is "tango-o" (たんごを), marked (phrase) because it was sent from the phrase recognition unit (14); the third-ranked result is "tango-ni" (たんごに), also marked (phrase); and the fourth-ranked result is "tanko" (たんこう), marked (word) because it was sent from the word recognition unit (13).

いま、表示装置（６）には、「たんご」が表示されて
いるとする。かかる状態で文節次候補キー（73）を押す
と表示装置（６）には「たんごを」が表示される。ま
た、単語次候補キー（72）を押すと表示装置（６）には
「たんこう」が表示される。Suppose that "tango" (たんご) is now displayed on the display device (6). Pressing the next phrase candidate key (73) in this state displays "tango-o" (たんごを), and pressing the next word candidate key (72) displays "tanko" (たんこう).

また、表示装置（６）には、「たんこう」が表示され
ている場合に、単語前候補キー（74）が押すと表示装置
（６）には「たんご」が表示され、文節前候補キー（7
5）を押すと表示装置（６）には「たんごに」が表示さ
れる。Conversely, when "tanko" (たんこう) is displayed on the display device (6), pressing the pre-word candidate key (74) displays "tango" (たんご), and pressing the pre-phrase candidate key (75) displays "tango-ni" (たんごに).
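
The behaviour of the four candidate keys on the buffer of FIG. 15 can be sketched as a search through a ranked list for the nearest entry of the requested kind. Staying on the current entry when no such candidate exists is an assumption, since the patent does not describe that case; the romanized strings stand for the kana entries.

```python
from typing import List, Tuple

# Candidate buffer in rank order: (text, source), as in FIG. 15.
BUFFER: List[Tuple[str, str]] = [
    ("tango", "word"),       # rank 1
    ("tango-o", "phrase"),   # rank 2
    ("tango-ni", "phrase"),  # rank 3
    ("tanko", "word"),       # rank 4
]


def next_candidate(buffer, current: int, kind: str) -> int:
    """Index of the next lower-ranked candidate of the given kind."""
    for i in range(current + 1, len(buffer)):
        if buffer[i][1] == kind:
            return i
    return current  # assumed: stay put when nothing further exists


def previous_candidate(buffer, current: int, kind: str) -> int:
    """Index of the nearest higher-ranked candidate of the given kind."""
    for i in range(current - 1, -1, -1):
        if buffer[i][1] == kind:
            return i
    return current  # assumed: stay put when nothing earlier exists
```

Starting from index 0 ("tango"), `next_candidate(BUFFER, 0, "phrase")` gives index 1 ("tango-o") and `next_candidate(BUFFER, 0, "word")` gives index 3 ("tanko"), matching the key behaviour described in the text.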

次ぎに一文節全体の一括修正方法について述べる。 Next, the batch correction method for an entire phrase is described.

第14図（ｅ）の例は単語「Ｔ」を「Ａ」と誤認識した
例である。先ずカーソルを修正したい単語へ移動する
［同図（ｅ）ｉ］。The example of FIG. 14(e) is one in which the word "T" is misrecognized as "A". First, the cursor is moved to the word to be corrected [FIG. 14(e) i].

次に単語次候補キー（72）を押し単語の次候補を表示
させる［同図（ｅ）ii］。この結果が正しければ次の修
正部分へ進む。もしこの結果が誤っていれば、単語次候
補キー（72）を押し単語の次候補を表示させる。この操
作を正解が表示されるまで繰り返す。正解が表示され無
ければ、再発声を行ない、再入力をおこなう。前単語候
補キー（74）を押すことにより一つ前に表示した単語の
候補を表示させることも出来る。Next, the next word candidate key (72) is pressed to display the next word candidate [FIG. 14(e) ii]. If the result is correct, the user proceeds to the next correction. If it is incorrect, the next word candidate key (72) is pressed again to display the following word candidate. This operation is repeated until the correct answer is displayed. If the correct answer never appears, the word is re-uttered and input again. Pressing the pre-word candidate key (74) displays the word candidate shown immediately before.

次ぎに一単語全体の一括修正方法について述べる。 Next, a batch correction method for an entire word will be described.

第14図（ｆ）の例は文節「がめんの」を「がいねん
の」と誤認識した例である。先ずカーソルを修正したい
文節へ移動する［同図（ｆ）ｉ］。The example of FIG. 14(f) is one in which the phrase "gamen-no" (がめんの) is misrecognized as "gainen-no" (がいねんの). First, the cursor is moved to the phrase to be corrected [FIG. 14(f) i].

次ぎに文節次候補キー（73）を押し文節の次候補を表
示させる［同図（ｆ）ii］。この結果が正しければ次の
修正部分へ進む。もしこの結果が誤っていれば、文節次
候補キー（73）を押し文節の次候補を表示させる。この
操作を正解が表示されるまで操り返す。正解が表示され
無ければ、再発声を行ない、再入力をおこなう。前文節
候補キー（75）を押すことにより一つ前に表示した文節
の候補を表示させることも出来る。Next, the next phrase candidate key (73) is pressed to display the next phrase candidate [FIG. 14(f) ii]. If the result is correct, the user proceeds to the next correction. If it is incorrect, the next phrase candidate key (73) is pressed again to display the following phrase candidate. This operation is repeated until the correct answer is displayed. If the correct answer never appears, the phrase is re-uttered and input again. Pressing the pre-phrase candidate key (75) displays the phrase candidate shown immediately before.

次ぎに音節単位の修正方法について述べる。 Next, a method for correcting syllable units will be described.

第14図（ｈ）の例は文節「おんせんで」を「おんけい
で」と誤認識した例である。この例は音節「け」を
「せ」に修正する場合であるが、先ずカーソル（Ｘ）を
修正したい音節「け」へ移動し［同図（ｈ）ｉ］、音節
次候補キー（76）を押す。音節次候補キー（76）を押す
ことにより修正したい部分の音節と最も距離が近い音節
が表示される［同図（ｈ）ii］。正解が表示されれば、
次の修正部分へ移動する。もしこの結果が誤っていれ
ば、再度音節次候補キーを押し音節の次候補を表示させ
る。この操作を正解が表示されるまで繰り返す。正解が
表示され無ければ、再発声により再入力を行なう。再入
力の結果が間違っている時は上記の手順により再び修正
する。この操作を正解が表示されるまで繰り返す。The example of FIG. 14(h) is one in which the phrase "onsen-de" (おんせんで) is misrecognized as "onkei-de" (おんけいで). Here the syllable "ke" (け) is to be corrected to "se" (せ). First, the cursor (X) is moved to the syllable "ke" to be corrected [FIG. 14(h) i] and the next syllable candidate key (76) is pressed. This displays the syllable closest in distance to the syllable being corrected [FIG. 14(h) ii]. If the correct answer is displayed, the user moves to the next correction. If it is incorrect, the next syllable candidate key is pressed again to display the following syllable candidate. This operation is repeated until the correct answer is displayed. If the correct answer never appears, the syllable is re-uttered and input again. If the result of the re-input is also wrong, it is corrected again by the above procedure, and this is repeated until the correct answer is displayed.

また前音節候補キー（77）を押すことにより音節の一
つ前の候補を表示させることも出来る。By pressing the previous syllable candidate key (77), the candidate immediately before the syllable can be displayed.

音節を削除したい時は、カーソルを修正したい音節へ
移動し削除キー（78）を押し削除する。To delete a syllable, move the cursor to the syllable you want to modify and press the delete key (78) to delete it.

音節を挿入したい時は、カーソルを修正したい音節移
動し挿入キー（79）を押し挿入する。To insert a syllable, move the cursor to the syllable you want to modify and press the insert key (79) to insert it.

次に第16図を使用し、数音節修正法について記す。 Next, the method of correcting several syllables will be described with reference to FIG.

この例は、同図（ａ）の入力文章“かいじょう”を同
図（ｂ）「かんじょう」と誤認識した例である。この場
合、まずカーソル（Ｘ）を修正したい音節にもっていき
［同図（ｃ）］、“かい”と再発声入口する。かかる再
発声入力音声は音声認識部（１）で認識され、認識結果
は表示装置（６）に表示される。認識結果が正しけれ
ば、次の修正部へすすむ。もし、同図（ｄ）に示すよう
に、「かい」を「かえ」と誤認識した場合、単語の場合
は、単語次候補キー（72）を押す。文節の場合は、文節
次候補キー（73）を押す。第16図は単語の場合の例であ
るので、以下単語の修正方法について記す。同図（ｄ）
の状態で、単語次候補キー（72）を押した場合、まず、
制御部（５）は、単語辞書（13d）より、修正前の同図
（ｂ）の認識結果「かんじょう」と再発声後の同図
（ｄ）の認識結果「かえじょう」とを比較し、同一部分
「じょう」をみつける。次に、制御部（５）は、単語辞
書（13d）より、かかる同一部分「じょう」をもつ単語
を選ぶ。同図（ｆ）は単語辞書（13d）の記憶内容を示
しており、同図（ｇ）は記憶内容より選んだ「じょう」
をもつ単語を示している。次に制御部（５）は、同図
（ｇ）に記した単語と、再発声後の認識結果「かえじょ
う」との尤度を計算し、最も尤度値の大きい単語を表示
する［同図（ｅ）］。In this example, the input sentence "kaijo" (かいじょう) of FIG. 16(a) was misrecognized as "kanjo" (かんじょう), as shown in FIG. 16(b). In this case, the cursor (X) is first moved to the syllable to be corrected [FIG. 16(c)] and "kai" (かい) is re-uttered. The re-uttered voice is recognized by the voice recognition unit (1), and the result is displayed on the display device (6). If the recognition result is correct, the user proceeds to the next correction. If "kai" is misrecognized as "kae" (かえ), as shown in FIG. 16(d), the next word candidate key (72) is pressed for a word, or the next phrase candidate key (73) for a phrase. Since FIG. 16 shows the case of a word, the word correction method is described below. When the next word candidate key (72) is pressed in the state of FIG. 16(d), the control unit (5) first compares the recognition result "kanjo" before correction [FIG. 16(b)] with the recognition result "kaejo" (かえじょう) after re-utterance [FIG. 16(d)] and finds the common part "jo" (じょう). Next, the control unit (5) selects from the word dictionary (13d) the words containing this common part. FIG. 16(f) shows the contents of the word dictionary (13d), and FIG. 16(g) shows the words containing "jo" selected from those contents. The control unit (5) then calculates the likelihood between each word in FIG. 16(g) and the re-utterance recognition result "kaejo", and displays the word with the highest likelihood value [FIG. 16(e)].
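
The dictionary search described above can be sketched on kana strings. Two substitutions are made and should be read as assumptions: the likelihood calculation is replaced by the string similarity of Python's `difflib`, and the already rejected result is excluded from the candidates. The dictionary contents are invented for the example and are not those of FIG. 16(f).

```python
from difflib import SequenceMatcher
from typing import List


def common_part(before: str, after: str) -> str:
    """Longest substring shared by the two recognition results."""
    m = SequenceMatcher(None, before, after).find_longest_match(
        0, len(before), 0, len(after)
    )
    return before[m.a:m.a + m.size]


def correct_word(before: str, after: str, dictionary: List[str]) -> str:
    """Pick the dictionary word that shares the common part and is most
    similar to the re-utterance result (similarity stands in for likelihood)."""
    shared = common_part(before, after)
    candidates = [w for w in dictionary if shared in w and w != before]
    if not candidates:
        raise LookupError("no dictionary word contains the common part")
    return max(candidates, key=lambda w: SequenceMatcher(None, w, after).ratio())


# Hypothetical dictionary contents.
DICTIONARY = ["かいじょう", "かんじょう", "こうじょう", "かいぎ", "じょうほう"]
```

With `before="かんじょう"` and `after="かえじょう"`, `common_part` returns `"じょう"` and `correct_word` selects `"かいじょう"`. Restricting the search to words that contain the unchanged part keeps the comparison small even for a large dictionary.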

次に文節または単語の認識境界誤りを修正する場合に
ついて述べる。Next, a case of correcting a recognition boundary error of a phrase or a word will be described.

In the example of FIG. 14(g), the phrase "bunsho-o" was misrecognized as having a silent section, indicated by a mark, between "n" and "shi", and was therefore split into two units: the word "bun" and the phrase "sho-o". This recognition boundary error must be corrected. To delete a recognition boundary delimiter, the cursor (X) is moved to the delimiter to be deleted [FIG. 14(g)i] and the delete key (78) is pressed [FIG. 14(g)ii]. To insert a recognition boundary delimiter, the cursor (X) is moved to the syllable at the desired position and the insert key (79) is pressed.

However, as described later, the delimiter beep sounds recorded in the recording/reproducing device (2) and the delimiters attached to the recognition result stored in the storage device (8) serve as markers for synchronizing the recording/reproducing device (2) with the storage device (8), so their correspondence must be preserved. For this reason, whenever a delimiter is inserted into or deleted from the storage device (8), that fact is itself recorded in the storage device (8).

For example, suppose that part of the sentence shown in FIG. 14(b) is stored in the storage device (8) as shown in FIG. 14(g)ii. When the text of (g)i is corrected as shown in (g)ii, the delimiter stored in the storage device (8) is replaced by a different marker symbol, as shown in FIG. 14. This marker symbol records that a delimiter has been deleted. It is not used as a symbol indicating a recognition unit and is used only for control of the recording/reproducing device (2) and the like.

With this configuration, even after a delimiter has been deleted, the two devices can be controlled in synchronization by using the beep sound recorded in the recording/reproducing device (2) and the marker symbol stored in the storage device (8).
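A minimal sketch of this bookkeeping is given below, assuming the stored recognition result is a list of tokens. The glyphs `|` and `~` and all function names are hypothetical stand-ins, since the actual symbols are not reproduced in this text. Deleting a delimiter replaces it with a marker, so the count of marks that line up with the recorded beeps stays the same while the two recognition units merge.

```python
DELIM = "|"    # recognition-unit delimiter (hypothetical glyph)
DELETED = "~"  # marker left where a delimiter was deleted (hypothetical glyph)


def delete_delimiter(stored, index):
    """Replace the delimiter at `index` with the deletion marker."""
    if stored[index] != DELIM:
        raise ValueError("no delimiter at this position")
    return stored[:index] + [DELETED] + stored[index + 1:]


def recognition_units(stored):
    """Split into recognition units; deletion markers do not split."""
    units, current = [], ""
    for token in stored:
        if token == DELIM:
            units.append(current)
            current = ""
        elif token != DELETED:
            current += token
    if current:
        units.append(current)
    return units


def sync_marks(stored):
    """Count the marks that correspond one-to-one with recorded beeps."""
    return sum(1 for token in stored if token in (DELIM, DELETED))


stored = ["gamenno", DELIM, "bun", DELIM, "sho-o", DELIM, "ten", DELIM]
fixed = delete_delimiter(stored, 3)
print(recognition_units(fixed), sync_marks(stored), sync_marks(fixed))
```

Keeping a marker in place rather than removing the entry is what lets a beep count on the tape side still be matched against a position in the stored text.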

The above is an example of deleting a delimiter, but the same approach applies to insertion: a specific symbol that is not used as a control signal and represents only a boundary is inserted in place of the delimiter.

By the above correction procedure, the sentence is corrected as shown in FIG. 14(i).

After a recognition boundary error has been corrected, the affected recognition units are corrected according to the correction procedure. For correction by re-utterance, the speech of anyone who has registered standard patterns can be recognized, so the correction can be performed by someone other than the person who recorded the text.

The method of correcting kana-string text has been described above. The apparatus also provides the following functions to assist correction.

By moving the cursor over the character string shown on the display device (6) and by scrolling the display screen, the stored text can be displayed on the screen sequentially from the storage device (8). While this is done, the speech corresponding to the portion shown on the screen is reproduced from the recording/reproducing device (2).

The apparatus also has the reverse function: the character string corresponding to the portion being reproduced from the recording/reproducing device (2) is shown on the display device (6).

In both of the above methods, the delimiter sounds recorded in the recorded text and the delimiters recorded on the display side are used as timing signals for synchronization, so that reproduction by the recording/reproducing device (2) and the display operate in step with each other. When a signal to stop reproduction is input from the keyboard (7) or the recording/reproducing device (2), reproduction is stopped, and scrolling of the display or movement of the cursor is stopped as well.

This synchronization between reproduction by the recording/reproducing device (2) and the display allows the character string to be checked while listening to the reproduced speech, which makes it easier to find the places that need correction.

Two methods of synchronization are available: displaying on the display device (6) the character string in the storage device (8) that corresponds to the portion being reproduced, or displaying on the display device (6) the kana string of the portion that is one delimiter behind the portion being reproduced.

In this case, by the time the display is stopped for correction, the part of the recorded speech to be corrected has already been reproduced, so reproducing it again requires cueing back to the start of that part. Therefore, when this method is adopted, the recording/reproducing device (2) is given a function of automatically backtracking to the immediately preceding delimiter when the display is stopped.

When a tape recorder is used as the recording/reproducing device (2), the reproduction position is controlled by motor rotation, and this, together with tape slack and the like, may prevent the part corresponding to the correction from being cued accurately.

In such a case, the incoming speech is stored for a fixed length of time as a PCM or ADPCM recording, and a function is added for replaying that PCM or ADPCM recording when the operator wants to hear the input speech again.

FIG. 17 shows one embodiment of this function and depicts a PCM data memory that stores PCM recording data. The numbers 01 to 05 in the figure are addresses. The input speech is the sentence shown in FIG. 14: "watashiwa | ten | shii | aaru | tee | gamenno | bunsho-o | ten | onseide | shuuseishita | maru".

When this speech is input, the PCM data memory (DM) stores the speech up to the first silent section, "watashiwa", at address 01, the speech up to the second silent section, "ten", at address 02, and so on, until the speech up to the fifth silent section, "tee", is stored at address 05. At this point the PCM address pointer (AP) holds the address of the oldest data stored in the PCM data memory. In this example, that address is 01.

At this stage, the PCM data memory is full.

When the next speech is input, it is stored at the address of the oldest data in the PCM data memory (DM). In this example, "gamenno" is stored at address 01, where "watashiwa" had been stored. The PCM address pointer (AP) again holds the address of the oldest data stored in the PCM data memory (DM). In this example, that address is now 02.

To reproduce the contents of the PCM data memory (DM) in this state, reproduction starts from the address indicated by the PCM address pointer (AP). In this example, the addresses are reproduced in the order 02, 03, 04, 05, 01.
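The behavior of the PCM data memory (DM) and address pointer (AP) described above is that of a ring buffer. The following is a minimal sketch under that reading; the class and method names are hypothetical, and plain strings stand in for the PCM data of each segment.

```python
class PcmRingMemory:
    """Fixed-size memory that overwrites its oldest segment when full."""

    def __init__(self, size=5):
        self.slots = [None] * size
        self.oldest = 0  # zero-based index of the oldest segment (the AP)
        self.count = 0   # number of segments currently held

    def store(self, segment):
        size = len(self.slots)
        if self.count < size:
            # Memory not yet full: fill the next free address.
            self.slots[(self.oldest + self.count) % size] = segment
            self.count += 1
        else:
            # Memory full: overwrite the oldest segment, then advance the AP.
            self.slots[self.oldest] = segment
            self.oldest = (self.oldest + 1) % size

    def playback_addresses(self):
        """One-based addresses in playback order, starting at the AP."""
        size = len(self.slots)
        return [(self.oldest + i) % size + 1 for i in range(self.count)]

    def playback(self):
        """Segments in playback order, oldest first."""
        return [self.slots[a - 1] for a in self.playback_addresses()]


memory = PcmRingMemory(size=5)
for segment in ["watashiwa", "ten", "shii", "aaru", "tee", "gamenno"]:
    memory.store(segment)
print(memory.playback_addresses(), memory.playback())
```

After the sixth segment is stored, the pointer has moved to address 02 and playback runs 02, 03, 04, 05, 01, as in the example of FIG. 17.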

With this method, the speech can be replayed accurately and quickly any number of times.

In addition, by moving the cursor (X) onto a recognition-unit delimiter on the screen and pressing the recorded-speech cue key (70), the delimiter sound on the recording/reproducing device (2) side that corresponds to the recognition unit indicated by the cursor is located in the recorded text, and the text that follows it is reproduced. An embodiment of this function is described below.

To check the recognized text, the recognition result is read from the storage device (8) and displayed on the display device (6) from the beginning. As this is done, the delimiter counter (5a) of the control unit (5), shown in FIG. 19, counts the delimiters read from the storage device (8). If the recognition result that has been read out is wrong, the operator places the cursor on the erroneous part and presses the cue key. The control unit (5) then causes the recording/reproducing device (2) to reproduce the recorded text in fast-forward reproduction mode. The beep counter (12e) of the feature extraction unit (12) counts the beep sounds that mark the delimiters in the text input from the recording/reproducing device (2).

When the value of the beep counter (12e) becomes one less than the value of the delimiter counter (5a) described above, the comparison circuit (5b) sends signal (c) to the recording/reproducing device (2) and stops reproduction.
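The comparison performed by the comparison circuit (5b) can be sketched as follows. This is an illustration only: the tape is modeled as a hypothetical list of events in which a `BEEP` entry follows each recorded unit, and the function name and return convention are assumptions, not part of the described hardware.

```python
BEEP = "<beep>"  # stand-in for a delimiter beep recorded on the tape


def cue_position(tape_events, delimiter_count):
    """Scan the tape from the start and return the index at which
    reproduction stops: the point where the number of beeps heard is
    one less than the delimiter count.
    """
    target = delimiter_count - 1
    beeps = 0
    for position, event in enumerate(tape_events):
        if beeps == target:
            return position  # comparison matched: stop here
        if event == BEEP:
            beeps += 1
    if beeps == target:
        return len(tape_events)
    raise ValueError("tape holds fewer beeps than required")


tape = ["watashiwa", BEEP, "ten", BEEP, "shii", BEEP, "aaru", BEEP]
stop = cue_position(tape, 3)
print(stop, tape[stop])
```

With a delimiter count of 3, scanning stops after the second beep, so the next thing reproduced is the third unit, "shii". Stopping one beep short of the delimiter count is what places the tape at the start of the unit under the cursor rather than at its end.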

To check the recognition result and the corrected text, the data stored in the storage device (8) must be displayed as a character string on the display device (6), and the displayed string must be followed and read by eye, which is very tiring for the eyes.

In view of this, the apparatus is given a function of reading aloud, by speech synthesis, the character string in the storage device (8) that holds the recognition result, so that the recognition result and the corrected text can be checked by listening to the synthesized speech.

Here too, the delimiters are used as timing signals for synchronizing the speech synthesis unit (9), the storage device (8), the recording/reproducing device (2), and the display device (6).

That is, the character string corresponding to the portion being read out of the storage device (8) by the speech synthesis unit (9) is shown on the display device (6), and at the same time the recording/reproducing device (2) cues the corresponding recorded portion. As a result, when an error is found during read-out by synthesized speech and the read-out is stopped for correction, both the display on the display device (6) and the recorded portion in the recording/reproducing device (2) point to the erroneous part, and the correction can be made immediately.

Two methods of synchronization are available here. In the first, the kana string in the storage device that corresponds to the portion being read out by speech synthesis is shown on the display device (6), and at the same time the corresponding syllable portion of the text recorded in the recording/reproducing device (2) is reproduced. In the second, the portion of the text recorded in the recording/reproducing device (2) that is reproduced is one delimiter behind the portion being read out by speech synthesis. In the second method, when speech synthesis is stopped for correction, the recording/reproducing device (2) has stopped just before the part to be corrected, so reproducing from this state plays the speech of that part right away. In the first method, by the time speech synthesis is stopped for correction, the part of the recorded speech to be corrected has already been reproduced, so the tape must be backtracked to reproduce it again. Therefore, when the first method is adopted, it is preferable to give the recording/reproducing device (2) a function of automatically backtracking to the immediately preceding delimiter when the display is stopped.

The embodiments described so far store the recognition result in the storage device (8). As another embodiment, the recognition result may be stored in the recording/reproducing device (2).

If the recognition result stored in the storage device (8) is recorded in the recording/reproducing device (2) in which the original text was recorded, the original and the recognition result are held on the same recording medium, which makes them easier to manage.

Alternatively, recording the recognition result in the recording/reproducing device (2) as the recorded text is reproduced and input makes an external storage device unnecessary.

In either case, using a multi-track recording/reproducing device (2) allows the recognition result to be stored on a track that holds no speech while the recorded speech is being reproduced.

(g) Effects of the Invention According to the sentence creation system of the present invention, when an error is found while checking the created text and the synthesis function is stopped, the reproduction position of the recording/reproducing device has moved back by a predetermined number of silent-section detection signals. The recording/reproducing device has therefore already cued the erroneous part, and reproducing from this state lets the operator hear the text at the erroneous part immediately.

Thus, while the speech recognition result is being checked using the speech synthesis function, the original (the text recorded in the recording/reproducing device) is cued at the same time, so the original is easy to reproduce and the efficiency of the correction work can be expected to improve.

[Brief description of the drawings]

FIG. 1 is an external view of a dictating machine employing the speech recognition system of the present invention; FIG. 2 is a structural diagram of the dictating machine; FIG. 3 is a configuration diagram of the speech recognition unit (1); FIG. 4 is a configuration diagram of the preprocessing unit (11); FIG. 5 is a configuration diagram of the feature extraction unit (12); FIG. 6 is a configuration diagram of the word recognition unit (13); FIG. 7 is a configuration diagram of the phrase recognition unit (14); FIG. 8 is a configuration diagram of the input switching unit (4); FIG. 9 is a diagram of the relationship among headwords, recording methods, and character sounds; FIG. 10 is a diagram of the relationship between the character sound recording method and speech sections; FIG. 11 is a diagram showing the recording method when the recording/reproducing device is of the multi-track type; FIG. 12 is a diagram showing the recording method when the recording/reproducing device is of the single-track type; FIG. 13 is a diagram showing an example of a frequency correction circuit; FIG. 14 is a diagram of correction after misrecognition; FIG. 15 is a diagram showing the candidate buffer (15a) in the candidate creation unit (15); FIG. 16 is a diagram showing an example of correcting several syllables after misrecognition; FIG. 17 is an explanatory diagram of the PCM recording method; FIG. 18 is an explanatory diagram of the AGC operation; and FIG. 19 is an explanatory diagram of the delimiter counter.

(1): speech recognition unit; (2): recording/reproducing device; (3): microphone; (6): display device; (7): keyboard; (8): storage device; (11): preprocessing unit; (12): feature extraction unit; (13): word recognition unit; (14): phrase recognition unit.

Continuation of the front page (56) References: JP-A-62-113264 (JP, A); JP-A-59-62949 (JP, A); JP-A-59-127148 (JP, A); JP-A-58-217044 (JP, A)

Claims

(57) [Claims]

1. A sentence creation system having a speech recognition function that recognizes input speech reproduced from a recording/reproducing device in units of syllables, phrases, or words, using silent sections recorded in the recording/reproducing device as delimiters of recognition units, wherein, when a silent section is detected, a silent-section detection signal indicating that the silent section has been detected is recorded in that silent section, the speech recognition result is stored in a storage device with a delimiter attached for each recognition unit, the silent-section detection signals and the delimiters are made to correspond one-to-one, and these two signals are used to synchronize the display output of the text shown on a display device in accordance with the speech recognition result with the speech synthesis output of the text synthesized by a speech synthesis unit in accordance with the speech recognition result, the system further having a function of reproducing the text speech from the recording/reproducing device, and wherein, when the synthesized speech is stopped, for example to correct the speech recognition result, the reproduction position of the recording/reproducing device moves back by a predetermined number of silent-section detection signals.