JP2777366B2

JP2777366B2 - Speech recognition system

Info

Publication number: JP2777366B2
Application number: JP62264379A
Authority: JP
Inventors: 正幸飯田; 宏樹大西; 計美大倉; 憲敬森
Original assignee: Sanyo Denki Co Ltd
Current assignee: Sanyo Denki Co Ltd
Priority date: 1987-10-20
Filing date: 1987-10-20
Publication date: 1998-07-16
Anticipated expiration: 2013-07-16
Also published as: JPH01106099A

Description

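[Detailed Description of the Invention]

(a) Field of industrial application
The present invention relates to a speech recognition system.

(b) Prior art
Speech recognition systems are being developed in which speech is recorded on a recording/playback device such as a tape recorder, and the speech reproduced on playback is fed to a speech recognition device, which recognizes it and converts it to text (Japanese Unexamined Patent Publication No. 58-158736). In a conventional system combining a recording/playback device with a speech recognition device, previously recorded text was input instead of speech input directly from a microphone. In other words, the recording/playback device did nothing more than record in advance the text that would otherwise be spoken into the microphone.

(c) Problems to be solved by the invention
In such a conventional system the recording/playback device merely recorded the text in advance. Consequently, although the system was equipped with a recording/playback device that has a storage function, it still required a separate storage device for the recognition results. Moreover, because the original text (the recorded speech) and the recognition results were stored on separate storage media, managing those media was difficult.

(d) Means for solving the problems
The speech recognition system of the present invention stores the recognition results on the same recording/playback device that holds the original text.

(e) Operation
According to the present invention, the recognition results of the speech recognition device are stored on the recording/playback device that holds the speech input to that device, so a storage medium such as the conventional floppy disk, used only to store recognition results, becomes unnecessary.

(f) Embodiment
Fig. 1 is an external view of a dictating machine that embodies the present invention and creates text from speech input, and Fig. 2 is a functional block diagram of the machine.

In Fig. 2, (1) is a speech recognition unit built into the main body (100) of Fig. 1. As detailed in the block diagram of Fig. 3, it comprises a preprocessing unit (11) [Fig. 4] that adjusts the sound pressure of the input speech signal; a feature extraction unit (12) [Fig. 5] that extracts parameters representing the acoustic features from the level-adjusted signal; a word recognition unit (13) [Fig. 6] and a phrase (bunsetsu) recognition unit (14) [Fig. 7] that recognize the input speech on the basis of the feature parameters obtained from the extraction unit (12); and a candidate generation unit (15) that, on the basis of the results from either recognition unit (13), (14), generates candidates for the recognized word character string or the recognized syllable characters.

Also in Fig. 2, (2) is a recording/playback device such as a tape recorder that, as shown in Fig. 1, can be mechanically and electrically attached to and detached from the main body (100); (3) is a microphone, for example the headset type shown in Fig. 1; and (4) is an input switching unit [Fig. 8] that switches the connections among the recording/playback device (2), the microphone (3) and the speech recognition unit (1). (6) is a display for showing character strings and the like generated from the recognition results; (7) is a keyboard for entering the various control signals of the dictating machine; (8) is a storage device, such as a magnetic disk drive, that stores the character strings generated by the dictating machine; and (9) is a speech synthesis unit that reads out the character strings in the storage device through a loudspeaker (10) by rule-based synthesis.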
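(5) is a control unit consisting of a microprocessor, which controls the operation of the units above.

There are two ways to create text with the dictating machine configured as above, each described in detail below.

In the first method, live speech from the microphone (3) is input to the speech recognition unit (1), recognized, converted to a character string, shown on the display (6) and, at the same time, stored in the storage device (8).

In the second method, the text to be entered is recorded in advance on the recording/playback device (2), the device (2) is connected to the apparatus, and the recorded text is input to the speech recognition unit (1), where it is recognized, converted to a character string, shown on the display (6) and simultaneously stored in the storage device (8).

Because there are these two ways of inputting speech, the input switching unit (4) switches between the inputs. Besides switching the input, the input switching unit (4) also switches between recording the recording signal (イ) on the recording/playback device (2) and recording the speech input from the microphone (3).

The operations from speech recording to text creation are described in order below.

(i) Speech registration
Before speech recognition is performed, speech is registered in order to create the reference patterns that recognition requires.

The syllable registration mode is described first. The reference patterns referred to here serve as the templates for pattern matching in the phrase recognition unit (14) of the speech recognition unit (1); specifically, they are stored in the syllable reference pattern memory (14d) of the phrase recognition unit (14) shown in Fig. 7.

To register speech on the dictating machine, the switch (14s1) of Fig. 7 is first operated to connect the parameter buffer (14d) to the syllable recognition unit (14b); registration then proceeds by one of the following three methods.

In the first method, registration speech is input directly from the microphone (3) to the main body (100); the speech recognition unit (1) analyzes it and creates reference patterns, which are stored in the syllable reference pattern memory (14d) and the storage device (8).

In the second method, a recording/playback device (2) on which the registration speech was recorded beforehand is connected to the main body (100), and the registration speech is input by playing the recording back; the speech recognition unit (1) analyzes it and creates reference patterns, which are stored in the syllable reference pattern memory (14d) and the storage device (8).

In the third method, registration speech is input directly from the microphone (3) to the main body (100) while a recording/playback device (2) connected to the main body records the same speech. The main body analyzes the speech from the microphone (3), creates reference patterns and stores them in the storage device (8). When the speech input to the microphone (3) ends, the speech recorded on the recording/playback device (2) is played back, analyzed by the speech recognition unit (1) and turned into reference patterns, which are stored in the syllable reference pattern memory (14d) and also stored in the storage device (8) alongside the syllable reference patterns obtained directly from the microphone (3).

In this third method, the speech recorded on the recording/playback device (2) is colored by the frequency characteristics of that device, so reference patterns created from the recorded speech differ from those created from speech input directly from the microphone (3). Recorded speech must therefore be recognized with reference patterns created from recorded speech, and speech input directly from the microphone (3) with reference patterns created from direct microphone input. With the method above, both sets of patterns can be created and stored in a single registration operation.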
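The third registration method yields two sets of reference patterns, one per input path. The following is a minimal Python sketch of that idea only. The class name, the `"mic"`/`"recorded"` keys, the list-of-floats feature format and the `tape_response` helper are all hypothetical; the patent does not specify any data layout.

```python
from dataclasses import dataclass, field


@dataclass
class ReferencePatternStore:
    """Keeps one set of syllable reference patterns per input path (assumed layout)."""
    patterns: dict = field(default_factory=lambda: {"mic": {}, "recorded": {}})

    def register(self, source: str, syllable: str, features: list) -> None:
        self.patterns[source][syllable] = list(features)

    def patterns_for(self, source: str) -> dict:
        # Recognition must use the set that matches the current input path.
        return self.patterns[source]


def tape_response(features, band_gains):
    """Hypothetical stand-in for the recorder's frequency characteristic."""
    return [f * g for f, g in zip(features, band_gains)]


store = ReferencePatternStore()
live = [1.0, 0.8, 0.5]  # made-up band energies for one syllable
store.register("mic", "a", live)
store.register("recorded", "a", tape_response(live, [1.0, 0.9, 0.6]))
```

Selecting `store.patterns_for("recorded")` for tape input and `store.patterns_for("mic")` for live input mirrors the rule stated above that each input path needs patterns created through the same path.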
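In addition, once the registration speech has been recorded on the recording/playback device (2), reference patterns can be created on a dictating machine that has none simply by playing the recording into it, without the registrant having to speak again. Furthermore, if the registration speech is recorded on the recording/playback device (2) and the text is recorded after it, then connecting the device (2) to the main body (100) and playing the recording back is enough to carry out everything from speech registration to text creation automatically.

The registrant's utterances for creating the reference patterns are made by reading aloud the text that the apparatus shows on the display (6) in a fixed order. When a recording/playback device (2) dedicated to this machine and equipped with a display function is used, reference patterns can be prepared even while the device (2) is carried on its own: the user utters the speech corresponding to the headword shown on its display screen and records it on the device.

When registration speech for creating reference patterns is recorded on the recording/playback device (2) as described above, noise and the like can cause the recorded speech to fall out of step with the headword it should correspond to when the reference patterns are created. This is explained below with reference to Fig. 9, taking a tape recorder as the recording/playback device for the sake of explanation.

Fig. 9(a) shows the section of tape holding the registration sounds "あ" to "か", corresponding to the headwords 「あ」 to 「か」, after registration speech has been recorded; here [noise] has been recorded between "え" and "お". If registration is performed from a tape like that of Fig. 9(a), with [noise] between registration sounds, and the syllable corresponding to each input sound is decided purely by order on the tape (the first recorded sound is "あ", the second is "い", and so on), the [noise] is also treated as registration speech and assigned a headword, so the actual registration sounds and the headwords fall out of step. Fig. 9(b) shows the [noise] misrecognized as speech: the [noise] is entered under the headword 「え」 and the syllable "え" under the headword 「お」.

Because the recorded speech and the headwords can become misaligned in this way, a character-code tone indicating the kind of registration sound is recorded on the recording/playback device (2) in association with each registration sound, as shown in Fig. 9(c). This prevents the misalignment described above even if [noise] is recorded between "う" and "え".

The method of recording these character-code tones of specific frequencies is described separately for the case where the tape recorder of the recording/playback device (2) is single-track and the case where it is multitrack.

First, the multitrack case is described with reference to Fig. 10. With a multitrack recording/playback device, the character code corresponding to the headword is recorded on a track that carries no speech, as shown in Fig. 10(a). From this character-code tone the speech recognition unit (1) learns the headword of the incoming speech; and of the sounds on the speech track, it treats as speech and analyzes only those that were recorded within the interval t1 in which the character-code tone is present and that are at or above the sound-pressure threshold.

Alternatively, as shown in Fig. 10(b), the character code corresponding to the headword is recorded at the beginning and at the end of the speech. Of the sounds on the speech track, only those recorded within the interval t2 between the tone marking the beginning and the tone marking the end, and at or above the sound-pressure threshold, are treated as speech and analyzed.

Alternatively, as shown in Fig. 10(c), the character code corresponding to the headword is recorded at the beginning of the speech. Only those sounds on the speech track recorded within the interval t3, from this character-code tone to the character-code tone of the next headword, and at or above the sound-pressure threshold, are treated as speech and analyzed.

Second, with a single-track recording/playback device (2), the character code corresponding to the headword is represented by a tone outside the frequency band used for speech analysis and is recorded together with the speech on the speech track. The tones are recorded in the same ways as in the multitrack case; that is, only the sounds recorded within the intervals t1, t2 and t3 above that satisfy the same condition are treated as speech and analyzed. However, except in the arrangement of Fig. 10(a), where speech and character-code tone overlap, the tone need not lie outside the analysis band.

Next, word reference patterns are registered in the word reference pattern memory (13c) of Fig. 6 for the words, such as letters of the alphabet, digits, parentheses and punctuation marks, whose characters are already registered in the word dictionary (13d) of the word recognition unit (13).

First, by a prescribed operation, the switch (13s1) connects the parameter buffer (13a) of Fig. 6 to the word reference pattern memory (13c), putting the apparatus in word registration mode. The display (6) of the main body (100) then shows the letters, digits, parentheses, punctuation marks and so on, and the operator speaks the corresponding reading. The speech recognition unit (1) analyzes this speech and registers the word reference pattern in the word reference pattern memory (13c).

Speech recognition is possible once these operations are complete. However, to recognize a word that is in neither the independent-word/ancillary-word dictionary (14e) nor the word dictionary (13d), the word must be registered either in the dictionary (14e), or in the word dictionary (13d) together with its word reference pattern in the word reference pattern memory (13c).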
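Returning briefly to the character-code tones used during syllable registration: the rule that only sound inside a code interval (t1, t2 or t3) and at or above the sound-pressure threshold counts as speech can be sketched as below. This is an illustration under assumptions, not the patented circuit. Frames are taken to be `(time, level)` pairs, the code track is taken to be already decoded into `(headword, start, end)` intervals, and all numbers are invented.

```python
def gate_frames(frames, code_intervals, threshold):
    """Keep, per headword, only frames inside its code interval and above threshold.

    frames:         list of (time_seconds, level) pairs from the speech track
    code_intervals: list of (headword, start, end) decoded from the code tones
    threshold:      sound-pressure threshold (same arbitrary unit as level)
    """
    gated = {}
    for headword, start, end in code_intervals:
        gated[headword] = [
            (t, level)
            for t, level in frames
            if start <= t < end and level >= threshold
        ]
    return gated


# Invented example: loud noise at t=1.0 and t=1.5 lies between the two intervals.
frames = [(0.0, 0.9), (0.1, 0.1), (1.0, 0.8), (1.5, 0.7), (2.0, 0.85)]
code_intervals = [("う", 0.0, 0.5), ("え", 2.0, 2.5)]
gated = gate_frames(frames, code_intervals, threshold=0.5)
```

Because the noise frames fall outside every code interval, they are never assigned to a headword, which is the misalignment the code tones are meant to prevent.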
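Which of the two is used, registration in the independent-word/ancillary-word dictionary (14e) or registration of the word and its reference pattern in the word dictionary (13d) and the word reference pattern memory (13c), depends on whether the user wants the word recognized as a phrase utterance or as a word utterance. A word that is in the dictionary (14e) but not in the word dictionary (13d), and that the user nevertheless wants recognized by word recognition, must likewise be registered, with its reference pattern, in the word dictionary (13d) and the word reference pattern memory (13c).

The registration of arbitrary words is described below. There are two ways to register a word: registering its character string in the independent-word/ancillary-word dictionary (14e), or registering its word reference pattern in the word reference pattern memory (13c) and its character string in the word dictionary (13d).

To register a word in the independent-word/ancillary-word dictionary (14e), the user utters the word into the apparatus. The speech recognition unit (1) recognizes the utterance and the result is shown on the display (6). If the result is correct, the user presses a prescribed key on the keyboard (7), and the utterance is registered in the dictionary (14e) as the character string shown on the display (6).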
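If the recognition result shown on the display (6) is not correct, the user either corrects it with the syllable correction function of the apparatus or utters the word again. If the result of the re-utterance is also wrong, it is again corrected with the syllable correction function. These operations are repeated until the character string on the display (6) matches the word to be registered.

To register a word in the word reference pattern memory (13c) and the word dictionary (13d), the character string to be registered is first brought up correctly on the display (6), as in registration in the dictionary (14e). The correctly recognized character string and the word reference pattern are then registered in the word dictionary (13d) and the word reference pattern memory (13c), respectively.

Recognizing speech spoken naturally is not feasible at the present level of speech recognition technology, whose limit is continuous-syllable utterance input; an embodiment for continuous-syllable input is therefore described below. The procedure is the same as above, but because the word reference pattern would then also be a continuous-syllable pattern, the word to be registered is uttered again naturally, the word reference pattern is created from the natural utterance, and the word reference pattern and the character string are registered in the word reference pattern memory (13c) and the word dictionary (13d), respectively.

With the above operations, the data needed to create text by speech recognition has been registered.

(ii) Text creation
An embodiment of text creation is described below.

For recognition, the switch (13s1) of the word recognition unit (13) is set to connect the parameter buffer (13a) to the word decision unit (13b), and the switch (14s1) of the phrase recognition unit (14) is set to connect the parameter buffer (14a) to the syllable recognition unit (14b).

There are two ways to create text. The first is online recognition, in which the document is spoken directly into the main body through the microphone (3). The second is offline recognition, in which a recording/playback device (2) holding the recorded text is connected to the apparatus and the recording is played back and recognized.

Online recognition is described first. Here the text, uttered phrase by phrase or word by word, is input directly from the microphone (3), so a prescribed operation makes the input switching unit (4) connect the microphone (3) to the speech recognition unit (1). To keep a recording of the speech being input from the microphone (3), the recording/playback device (2) is connected to the main body and the input switching unit (4) connects the output of the microphone (3) to the recording terminal of the device (2). At the same time, as described later, the input switching unit (4) records a beep marking a phrase or word boundary whenever a silence detection signal arrives from the feature extraction unit (12).

During recognition, both the word recognition unit (13) and the phrase recognition unit (14) are active.

Speech input from the microphone (3) is processed in the preprocessing unit (11) to give it characteristics suited to analysis (for example, the sound pressure is raised by an amplifier when the input is quiet) and is sent to the feature extraction unit (12).

In the feature extraction unit (12), as shown in Fig. 5, the analysis unit (12a) analyzes the speech from the preprocessing unit (11), extracts its features and stores them in the parameter buffer (12c). Meanwhile, the analysis-unit determiner (12b) uses the results from the analysis unit (12a) to detect the silent interval that follows each syllable or phrase utterance, as well as the beep recorded after each phrase or word utterance (detailed in the offline recognition embodiment); when it detects a silent interval it generates a silent-interval detection signal (ロ).

On receiving the signal (ロ), the parameter buffer (12c) sends the stored feature parameters to the word recognition unit (13) and the phrase recognition unit (14) and clears its contents.

The feature parameters input to the word recognition unit (13) are stored in the parameter buffer (13a) shown in Fig. 6. The word decision unit (13b) compares them with the word reference pattern memory (13c), selects from the word dictionary (13d) several words whose reference patterns have a high likelihood with respect to the stored parameters, and sends the character strings of the selected words and their likelihood values to the candidate generation unit (15).

The feature parameters input to the phrase recognition unit (14), for their part, are stored in the parameter buffer (14a). The syllable recognition unit (14b) compares them with the syllable reference pattern memory (14d), converts them to a syllable string and sends that string to the phrase decision unit (14c). The phrase decision unit (14c) compares the syllable string with the words registered in the independent-word/ancillary-word dictionary (14e), combines independent and ancillary words to form several high-likelihood phrases, and sends their character strings and likelihood values to the candidate generation unit (15).

The candidate generation unit (15) selects several of the received character strings with the highest likelihood, attaches to each its likelihood value and a code indicating whether it came from the word recognition unit (13) or the phrase recognition unit (14), and stores them.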
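The behaviour of the candidate generation unit (15) described above, merging word and phrase results by likelihood and tagging each with its origin, can be sketched as follows. The `Candidate` type, the function names and the likelihood values are hypothetical; only the four strings and their word/phrase tags follow the example the patent gives for the candidate buffer (15a).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    likelihood: float
    source: str  # "word" or "phrase": the origin code attached by unit (15)


def build_candidates(word_results, phrase_results, keep=4):
    """Merge both recognizers' (text, likelihood) results, best first."""
    tagged = [Candidate(t, l, "word") for t, l in word_results]
    tagged += [Candidate(t, l, "phrase") for t, l in phrase_results]
    return sorted(tagged, key=lambda c: c.likelihood, reverse=True)[:keep]


def next_of_kind(buffer, current_index, kind):
    """Index of the next lower-ranked candidate with the given origin, or None."""
    for i in range(current_index + 1, len(buffer)):
        if buffer[i].source == kind:
            return i
    return None


# Likelihood values are invented for illustration.
buffer = build_candidates(
    word_results=[("たんご", 0.9), ("たんこう", 0.6)],
    phrase_results=[("たんごを", 0.8), ("たんごに", 0.7)],
)
```

Keeping the origin tag with every entry is what lets a later correction step walk through word candidates only or phrase candidates only, as `next_of_kind` does here.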
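At the same time, the candidate generation unit (15) sends the control unit (5) a signal to display the character string with the highest likelihood. On receiving it, the control unit (5) appends the delimiter mark 「▽」 to that character string and displays it in the form shown in Fig. 14(c) for, for example, the input text of Fig. 14(a). The candidate generation unit (15) also sends the control unit (5) a signal to store the contents held in the candidate generation unit (15) in the storage device (8). The control unit (5) then stores the character strings in the storage device (8), with a code representing the delimiter symbol appended after each.

The character strings stored in this external storage device serve as the first draft for a word processor. A floppy disk is generally used, and the file format of the storage device (8) must match that of the word processor.

On receiving the silent-interval detection signal, the signal generator (42) of the input switching unit (4) shown in Fig. 8 generates a beep marking a phrase or word boundary and feeds it to the switch (41). The switch (41) connects the circuit so that both the speech from the microphone (3) and the beep from the signal generator (42) are recorded on the recording/playback device (2); the beep is thus recorded in each silent interval taken to be a phrase or word boundary of the recorded text.

Offline recognition is described next. Because text is created by playing the recording on the recording/playback device (2) into the apparatus, the text is first recorded on the device (2), and the input switching unit (4) connects the device (2) to the speech recognition unit (1) for input.

When recording, the speaker utters the text phrase by phrase or word by word, leaving a silent interval between phrases and words. When the dedicated recording/playback device (2) shown in Fig. 1 is used, a beep marking the boundary is recorded, to make the boundaries between phrases and words explicit, by pressing the delimiter key (71) provided on the device (2) or on the main body of the dictating machine.

Registered words are uttered word by word, but if the recording/playback device (2) can emit character tones and has a character corresponding to the word to be entered, the character tone may be recorded in place of speech.

To allow cueing passage by passage and to avoid misrecognizing noise recorded between passages as speech, signals marking the beginning and end of each passage are recorded together with the speech. As with speech registration, the way these signals are recorded depends on whether the device (2) is multitrack. Fig. 11 shows the multitrack scheme and Fig. 12 the single-track scheme. In Figs. 11(a) and 12(a), the interval over which a tone such as a DTMF signal is recorded is detected as the speech region. In Figs. 11(b) and 12(a), a tone such as a DTMF signal is recorded before the passage begins and again when it ends, and the interval between the two signals is detected as the speech region. In the single-track scheme of Fig. 12, DTMF or similar signals outside the speech band are used, since the speech interval and the tone may overlap.

During recognition, the intervals t4 and t5 before and after the recorded signal are also sampled to judge whether they contain speech, so the start of the passage need not coincide with the start of the signal, nor the end of the passage with the end of the signal. Recognition therefore works even if the timing of the utterance and of the key press differ slightly.

Next, the recording/playback device (2) is connected to the main body and the recording is played back and recognized. Beforehand, the recognition speed mode is set to one of the following: a fast-listening mode, which raises the playback speed to shorten recognition time; a mode that recognizes at normal playback speed; or, when time permits and a high recognition rate is needed, a two-pass playback mode.

The fast-listening mode is described first. Because the recording is played back faster, the characteristics of the input speech differ from those of the reference patterns, which were created from registration speech played back at normal speed; simply feeding in the sped-up speech does not give accurate recognition. The sampling frequency is therefore changed, as in the following embodiment.

The sampling frequency controller (12d) of the feature extraction unit (12) in Fig. 5 sets the sampling frequency for the input speech to (playback speed / recording speed) times the sampling frequency used when the reference patterns were created, and the speech is sampled and analyzed at that rate. Processing after the feature extraction unit (12) is the same as in online recognition. However, when the recorded text already contains beeps marking the phrase and word boundaries and the feature extraction unit (12) detects such a beep, it generates a beep detection signal (ロ′) instead of the silent-interval detection signal (ロ). When the received signal is (ロ′) rather than (ロ), the signal generator (42) of the input switching unit (4) does not generate a boundary beep.

When the speech recognition unit (1) recognizes a character tone representing a word, it outputs the corresponding word as the recognition result.

The two-pass playback mode is described next. In this mode the recording is first played back and input to the apparatus. During this pass the preprocessing unit (11) of the speech recognition unit (1) reads the entire sound-pressure variation of the recording and stores it in the sound-pressure variation memory (11b) shown in Fig. 4.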
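The two-pass idea, in which the first pass only measures levels and a second pass reuses them to set the amplifier gain, can be sketched as follows. The gain law G = A × V_G is the one given in this description for the AGC circuit (11a). How V_G is derived from the stored levels is not stated, so the peak-normalization rule in `control_voltage` is purely an assumption, as are the function names and the frame format.

```python
def first_pass_levels(frames):
    """Pass 1: store the peak magnitude of every frame (the 'sound-pressure variation')."""
    return [max(abs(s) for s in frame) for frame in frames]


def control_voltage(levels, target_level, fixed_gain):
    """Assumed rule: pick V_G so the loudest stored frame reaches target_level."""
    peak = max(levels)
    if peak == 0:
        return 1.0  # silent recording: leave the gain at A
    return target_level / (fixed_gain * peak)


def second_pass(frames, fixed_gain, v_g):
    """Pass 2: apply G = A * V_G to every sample."""
    gain = fixed_gain * v_g
    return [[gain * s for s in frame] for frame in frames]


# Invented samples: two short frames of a quiet recording.
frames = [[0.1, -0.2], [0.05, 0.4]]
levels = first_pass_levels(frames)
v_g = control_voltage(levels, target_level=1.0, fixed_gain=2.0)
adjusted = second_pass(frames, fixed_gain=2.0, v_g=v_g)
```

Measuring the whole recording before choosing the gain avoids the lag of a conventional AGC loop, which is the apparent motivation for spending a second playback pass.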
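Next, the recording is played back and input to the apparatus a second time. This time the preprocessing unit (11) uses the data in the sound-pressure variation memory (11b) to adjust the gain of the AGC circuit (11a) so that, as shown in Fig. 18, the sound pressure input to the feature extraction unit (12) is at the level best suited to recognition. That is, the gain G is the fixed gain A multiplied by the control voltage V_G, which is variably adjusted: G = A × V_G.

As a variant of the two-pass mode, a multi-pass playback mode is also conceivable: the recorded text is played back and input many times, the recognition method of the speech recognition unit (1) is changed on each pass, the results are compared, and the one with the highest likelihood is selected.

The function described below, which compensates for the effect of frequency characteristics, is used in any of the following situations: no registration speech has been recorded on the recording/playback device (2) and the device is one whose frequency characteristics at increased playback speed differ from those at normal speed; the text to be recognized was recorded on a device (2) whose frequency characteristics differ from those of the device used to create the reference patterns; or the text was recorded on a device (2) that nominally has the same frequency characteristics as the device used to create the reference patterns but actually differs because of component tolerances and the like.

First, a reference sine-wave signal, which serves as the basis for measuring the frequency characteristics of the device (2), is generated by the reference signal generator (42) and recorded on the device (2). The recorded signal is then played back into the apparatus. The speech recognition unit (1) analyzes it, finds the difference in frequency characteristics between the recorded signal and the signal generated by the reference signal generator (42), and applies a correction that reduces this difference.

Many means of correction are conceivable, depending on the feature extraction method of the feature extraction unit (12). For example, with the analog filter bank of Fig. 13, made up of parallel branches each consisting of a band-pass filter (BPF) and an amplifier (AMP) in series, the gains of the amplifiers (AMP) are adjusted so that the filter outputs reduce the difference from the reference sine-wave signal generated by the reference signal generator (42). If the feature extraction unit (12) uses digital filters, the parameters that determine their characteristics can be changed instead. Any other method suited to the feature extraction method may also be used.

With the operations so far, the spoken text has been converted to a kana string. How to correct it when it differs from the intended text is described below with reference to Fig. 14, case by case according to the kind of error. Fig. 14(a) shows the input text, (b) the input speech, (c) the recognition result, (d) to (h) the correction steps, and (i) the corrected result.

First, correction when something uttered as a word has been misrecognized as a phrase. When, as shown in Fig. 14(c), the word "C" has been recognized as the phrase "しー", the cursor (X) is first moved to the erroneous word [Fig. 14(d) i]. The next-word-candidate key (72) is then pressed to display the next word candidate [Fig. 14(d) ii]. If this result is correct, the user moves on to the next correction; if not, the key (72) is pressed again to display the following candidate. This is repeated until the correct result appears.

Next, correction when something uttered as a phrase has been misrecognized as a word. When the phrase "い" has been recognized as the word "E", the cursor (X) is first moved to the erroneous phrase.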
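Next, the next-phrase-candidate key (73) is pressed to display the next phrase candidate. If this result is correct, the user moves on to the next correction; if not, the key (73) is pressed again to display the following candidate. This is repeated until the correct result appears.

Pressing the previous-word-candidate key (74) or the previous-phrase-candidate key (75) brings back the preceding word or phrase candidate, respectively.

If neither of these two methods yields the correct result, the user corrects syllable by syllable or utters the word, phrase or syllable again.

To prevent a phrase from again being recognized as a word, or a word as a phrase, on re-utterance, the candidate generation unit (15) can be controlled externally so that it accepts only the results from the word recognition unit (13) and ignores those from the phrase recognition unit (14), or conversely accepts only the results from the phrase recognition unit (14) and ignores those from the word recognition unit (13).

The next-candidate keys work as follows, explained with reference to Fig. 15. As already noted, word recognition and phrase recognition run in parallel in the speech recognition unit (1), so both word and phrase results are available. The next-phrase-candidate key (73) displays the phrase recognition results on the display (6) in descending order of likelihood; the next-word-candidate key (72) does the same for the word recognition results; and the previous-word-candidate and previous-phrase-candidate keys display the result ranked one step higher in likelihood than the one currently shown.

Fig. 15 shows the candidate buffer (15a) of the candidate generation unit (15). The first-ranked result is 「たんご」, marked (word) because it came from the word recognition unit (13); the second is 「たんごを」, marked (phrase) because it came from the phrase recognition unit (14); the third is 「たんごに」, also marked (phrase); and the fourth is 「たんこう」, marked (word).

Suppose 「たんご」 is currently displayed. Pressing the next-phrase-candidate key (73) displays 「たんごを」, and pressing the next-word-candidate key (72) displays 「たんこう」. If 「たんこう」 is displayed, pressing the previous-word-candidate key (74) displays 「たんご」 and pressing the previous-phrase-candidate key (75) displays 「たんごに」.

Next, correcting an entire word at once. Fig. 14(e) is an example in which the word "T" has been misrecognized as "A". The cursor is first moved to the word to be corrected [Fig. 14(e) i], and the next-word-candidate key (72) is pressed to display the next word candidate [Fig. 14(e) ii]. If the result is correct, the user moves on to the next correction; if not, the key (72) is pressed again, and this is repeated until the correct result appears. If it never appears, the word is uttered again and re-input. Pressing the previous-word-candidate key (74) brings back the word candidate shown one step earlier.

Next, correcting an entire phrase at once. Fig. 14(f) is an example in which the phrase 「がめんの」 has been misrecognized as 「がいねんの」. The cursor is first moved to the phrase to be corrected [Fig. 14(f) i], and the next-phrase-candidate key (73) is pressed to display the next phrase candidate [Fig. 14(f) ii]. If the result is correct, the user moves on to the next correction; if not, the key (73) is pressed again, and this is repeated until the correct result appears. If it never appears, the phrase is uttered again and re-input. Pressing the previous-phrase-candidate key (75) brings back the phrase candidate shown one step earlier.

Next, correction syllable by syllable. Fig. 14(h) is an example in which the phrase 「おんせいで」 has been misrecognized as 「おんけいで」, so the syllable 「け」 must be corrected to 「せ」. The cursor (X) is first moved to the syllable 「け」 [Fig. 14(h) i] and the next-syllable-candidate key (76) is pressed, which displays the syllable closest in distance to the syllable being corrected [Fig. 14(h) ii]. If the correct syllable appears, the user moves on to the next correction. If not, the key is pressed again to display the following candidate, and this is repeated until the correct syllable appears. If it never appears, the syllable is re-input by re-utterance; if the re-input is also wrong, it is corrected again by the same procedure until the correct result appears. Pressing the previous-syllable-candidate key (77) brings back the preceding syllable candidate.

To delete a syllable, the cursor is moved to that syllable and the delete key (78) is pressed. To insert a syllable, the cursor is moved to the syllable at the insertion point and the insert key (79) is pressed.

Next, correction of several syllables at once is described with reference to Fig. 16. In this example the input "かいじょう" of Fig. 16(a) has been misrecognized as 「かんじょう」, as in Fig. 16(b). The cursor (X) is moved back to the syllables to be corrected [Fig. 16(c)] and "かい" is uttered again. The re-uttered speech is recognized by the speech recognition unit (1) and the result is shown on the display (6). If it is correct, the user moves on to the next correction. If, as shown in Fig. 16(d), 「かい」 has been misrecognized as 「かえ」, the next-word-candidate key (72) is pressed in the case of a word, or the next-phrase-candidate key (73) in the case of a phrase. Fig. 16 is a word example, so word correction is described here.

When the next-word-candidate key (72) is pressed in the state of Fig. 16(d), the control unit (5) first compares the result before correction, 「かんじょう」 in Fig. 16(b), with the result after re-utterance, 「かえじょう」 in Fig. 16(d), and finds the common part 「じょう」. It then selects from the word dictionary (13d) the words that contain 「じょう」. Fig. 16(f) shows the contents of the word dictionary (13d), and Fig. 16(g) the words containing 「じょう」 selected from it. The control unit (5) then computes the likelihood between each word in Fig. 16(g) and the re-utterance result 「かえじょう」 and displays the word with the greatest likelihood [Fig. 16(e)].

Next, correcting a recognition-boundary error in a phrase or word. Fig. 14(g) is an example in which the phrase 「ぶんしょうを」 was mistakenly judged to contain a silent interval, marked [ ], between 「ん」 and 「し」, and so was split into the word 「ぶん」 and the phrase 「しょうを」. To delete a recognition-boundary delimiter, the cursor (X) is moved to it [Fig. 14(g) i] and the delete key (78) is pressed [Fig. 14(g) ii]. To insert one, the cursor (X) is moved to the syllable at the insertion point and the insert key (79) is pressed.

However, as described later, the delimiter beeps on the recording/playback device (2) and the delimiter symbols attached to the recognition results in the storage device (8) are the markers used to synchronize the device (2) with the storage device (8), so their correspondence must be preserved. The fact that a delimiter symbol has been inserted or deleted is therefore recorded in the storage device (8).

For example, suppose the text of Fig. 14(g) i is stored in the storage device (8) and is corrected as shown in Fig. 14(g) ii. The delimiter symbol 「▽」 held in the storage device (8) is then changed to the symbol 「▼」. The symbol 「▼」 indicates that a delimiter 「▽」 has been deleted; it is not used to mark a recognition unit and serves only for control of the recording/playback device (2) and the like.

With this arrangement, even after a delimiter 「▽」 is deleted, the two devices can be controlled in synchronization using the beeps recorded on the device (2) and the symbols 「▽」 and 「▼」 stored in the storage device (8).

The above is an example of deleting a delimiter symbol 「▽」, but the same idea applies when one is inserted.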
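The reason for keeping a 「▼」 placeholder can be made concrete with a small sketch: every stored marker, whether 「▽」 or 「▼」, still corresponds to one beep on the recording, while only 「▽」 separates displayed recognition units. The list-of-pairs storage format and the function names are assumptions made for illustration.

```python
VISIBLE, DELETED = "▽", "▼"


def display_units(stored):
    """Join segments across ▼ (deleted boundaries); split only at ▽."""
    units, current = [], ""
    for text, marker in stored:
        current += text
        if marker == VISIBLE:
            units.append(current)
            current = ""
    if current:
        units.append(current)
    return units


def beeps_before_unit(stored, unit_index):
    """Number of recorded beeps that precede the given displayed unit."""
    beeps, unit = 0, 0
    for _, marker in stored:
        if unit == unit_index:
            break
        beeps += 1  # every marker, ▽ or ▼, matches one recorded beep
        if marker == VISIBLE:
            unit += 1
    return beeps


# After the boundary between 「ぶん」 and 「しょうを」 has been deleted:
stored = [("ぶん", DELETED), ("しょうを", VISIBLE), ("てん", VISIBLE)]
```

Counting only 「▽」 would give one beep before 「てん」, although the tape carries two, so playback would be cued one unit too early; counting both symbols keeps the display and the recording in step.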
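That is, a specific symbol that merely marks a boundary and is not used as a control signal is inserted in place of the delimiter symbol 「▽」.

The text is corrected by the above procedures, as shown in Fig. 14(i). After a boundary error has been corrected, the affected recognition units are corrected further by the same procedures. For correction by re-utterance, the voice of anyone who has registered reference patterns can be recognized, so the corrections need not be made by the person who recorded the text.

In addition to the methods above for correcting kana text, the apparatus has the following functions to assist correction.

Moving the cursor over the displayed character string and scrolling the screen bring the stored text from the storage device (8) onto the screen in sequence; as this happens, the speech corresponding to the displayed portion is played back from the recording/playback device (2). The reverse function is also provided: the character string corresponding to the portion being played back from the device (2) is shown on the display (6).

In both cases the delimiter tones recorded with the text and the delimiter symbols stored on the display side are used as timing signals, so that playback on the device (2) and the display proceed in step. When a signal to stop playback arrives from the keyboard (7) or the device (2), playback stops and so does the scrolling of the display or the movement of the cursor.

This synchronization of playback and display lets the user check the character strings while listening to the playback, making it easy to find the places that need correction.

Synchronization can be done in two ways: displaying the character string in the storage device (8) that corresponds to the portion being played back, or displaying the kana string one delimiter behind the portion being played back. In this case, by the time the display is stopped for a correction the relevant portion of the recording has already been played, so to hear it again the recording must be cued back to that portion. When this method is adopted, therefore, the device (2) is made to backtrack automatically to the preceding delimiter when the display is stopped.

When a tape recorder is used as the device (2), the playback position is controlled by motor rotation, and this, together with tape slack and the like, can make it impossible to cue accurately to the portion being corrected. For such cases a function is added that stores the incoming speech for a fixed length of time as a PCM or ADPCM recording and plays that recording back when the user wants to hear the input again.

Fig. 17 shows an embodiment of this function: a PCM data memory that holds the PCM-recorded data. The numbers 01 to 05 in the figure are addresses. The input speech is the text of Fig. 14, "わたしわ｜てん｜しー｜あーる｜てー｜がめんの｜ぶんしょうを｜てん｜おんせいで｜しゅうせいした｜まる".

When this speech is input, the PCM data memory (DM) stores the speech up to the first silent interval, "わたしわ", at address 01, the speech up to the second silent interval, "てん", at address 02, and so on up to the speech before the fifth silent interval, "てー", at address 05. The PCM address pointer (AP) holds the address of the oldest data in the PCM data memory, here 01. At this stage the PCM data memory is full.

When further speech is input, it is written to the address of the oldest data in the PCM data memory (DM). Here "がめんの" is stored at address 01, which held "わたしわ". The PCM address pointer (AP) again holds the address of the oldest data, now 02.

To play back the contents of the PCM data memory (DM) in this state, playback starts from the address indicated by the PCM address pointer (AP); here the order is 02, 03, 04, 05, 01. In this way the speech can be replayed accurately and quickly any number of times.

The apparatus also has a function whereby moving the cursor (X) onto a recognition-unit delimiter on the screen and pressing the recording cue key (70) locates, in the recorded text, the delimiter tone on the device (2) that corresponds to the recognition unit at the cursor and plays back the text that follows. An embodiment of this function follows.

To check the recognized text, the recognition results are read from the storage device (8) and displayed on the display (6) from the beginning. As this is done, the delimiter counter (5a) of the control unit (5), shown in Fig. 19, counts the delimiter symbols read from the storage device (8). If a recognition result is wrong, the user places the cursor on the erroneous portion and presses the cue key. The control unit (5) plays the text recorded on the device (2) in fast-forward playback mode. The beep counter (12e) of the feature extraction unit (12) counts the boundary beeps in the text coming from the device (2). When the value of the beep counter (12e) becomes one less than the value of the delimiter counter (5a), the comparison circuit (5b) sends the signal (ハ) to the device (2) and stops playback.

To check the recognition results and the corrected text, the data in the storage device (8) would have to be displayed as character strings on the display (6) and followed by eye, which is very tiring. In view of this, the apparatus can read out the character strings held in the storage device (8) by speech synthesis, so that the recognition results and the corrected text can be checked by listening to the synthesized speech.

Here too, the delimiter symbols serve as the timing signals that synchronize the speech synthesis unit (9), the storage device (8), the recording/playback device (2) and the display (6). That is, the character string corresponding to the portion that the speech synthesis unit (9) is reading from the storage device (8) is shown on the display (6), and at the same time the device (2) is cued to the corresponding recorded portion. Thus, when the user notices an error during read-back and stops the synthesis in order to correct it, both the display (6) and the recorded portion on the device (2) are at the error, and the correction can be made at once.

Synchronization can be done in two ways: displaying on the display (6) the kana string in the storage device corresponding to the portion being read out while simultaneously playing back the corresponding portion of the text recorded on the device (2); or playing back the portion of the recorded text one delimiter behind the portion being read out. In the latter case, when synthesis is stopped for a correction, the device (2) has stopped just before the portion to be corrected, so playing back from there gives the speech of that portion immediately. In the former case the portion has already been played by the time synthesis is stopped, so backtracking is needed to play it again; when the former method is adopted, it is therefore preferable for the device (2) to backtrack automatically to the preceding delimiter when the display is stopped.

The embodiments so far store the recognition results in the storage device (8). As another embodiment, a method of storing the recognition results on the recording/playback device (2) is described below.

The recognition results held in the storage device (8) are recorded on the recording/playback device (2) on which the original text was recorded. The original text and the recognition results are then on the same recording medium, which makes them easier to manage. Alternatively, recording the recognition results on the device (2) as the recorded text is played back and recognized makes an external storage device unnecessary. In either case, using a multitrack recording/playback device (2) allows the recognition results to be stored on a track without speech while the recorded speech is being played back.

(g) Effects of the invention
Because the speech recognition system of the present invention stores the recognition results, such as character-code strings, on the recording/playback device that holds the original speech, a storage device such as the conventional floppy disk used only for the recognition results becomes unnecessary, and the apparatus can be made smaller and cheaper. In addition, because the original text (the recorded speech) and the recognition results can be stored on the same storage medium, they can be kept together as a set, which simplifies file management.

DETAILED DESCRIPTION OF THE INVENTION
(a) Field of industrial application
The present invention relates to a speech recognition system.
(b) Prior art
Speech is recorded on a recording/playback device such as a tape recorder,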
and the speech reproduced on playback is fed to a speech recognition device, which recognizes it and converts it to text; speech recognition systems of this kind are being developed (Japanese Unexamined Patent Publication No. 58-158736). In a conventional system combining a recording/playback device with a speech recognition device, previously recorded text was input instead of speech input directly from a microphone. In other words, the recording/playback device did nothing more than record in advance the text that would otherwise be spoken into the microphone.
(c) Problems to be solved by the invention
In such a conventional system the recording/playback device merely recorded the text in advance. Consequently, although the system was equipped with a recording/playback device that has a storage function, it still required a separate storage device for the recognition results. Moreover, because the original text (the recorded speech) and the recognition results were stored on separate storage media, managing those media was difficult.
(d) Means for solving the problems
The speech recognition system of the present invention stores the recognition results on the same recording/playback device that holds the original text.
(e) Operation
According to the present invention, the recognition results of the speech recognition device are stored on the recording/playback device that holds the speech input to that device, so a storage medium such as the conventional floppy disk, used only to store recognition results, becomes unnecessary.
(f) Embodiment
Fig. 1 is an external view of a dictating machine that embodies the present invention and creates text from speech input, and Fig. 2 is a functional block diagram of the machine.
In Fig. 2, (1) is a speech recognition unit built into the main body (100) of Fig. 1. As detailed in the block diagram of Fig. 3, it comprises a preprocessing unit (11) [Fig. 4] that adjusts the sound pressure of the input speech signal; a feature extraction unit (12) [Fig. 5] that extracts parameters representing the acoustic features from the level-adjusted signal; a word recognition unit (13) [Fig. 6] and a phrase (bunsetsu) recognition unit (14) [Fig. 7] that recognize the input speech on the basis of the feature parameters obtained from the extraction unit (12); and a candidate generation unit (15) that, on the basis of the results from either recognition unit (13), (14), generates candidates for the recognized word character string or the recognized syllable characters.
Also in Fig. 2, (2) is a recording/playback device such as a tape recorder that, as shown in Fig. 1, can be mechanically and electrically attached to and detached from the main body (100); (3) is a microphone, for example the headset type shown in Fig. 1; and (4) is an input switching unit [Fig. 8] that switches the connections among the recording/playback device (2), the microphone (3) and the speech recognition unit (1). (6) is a display for showing character strings and the like generated from the recognition results; (7) is a keyboard for entering the various control signals of the dictating machine; (8) is a storage device, such as a magnetic disk drive, that stores the character strings generated by the dictating machine; and (9) is a speech synthesis unit that reads out the character strings in the storage device through a loudspeaker (10) by rule-based synthesis.
(5) is a control unit consisting of a microprocessor, which controls the operation of the units above.
There are two ways to create text with the dictating machine configured as above, each described in detail below.
In the first method, live speech from the microphone (3) is input to the speech recognition unit (1), recognized, converted to a character string, shown on the display (6) and, at the same time, stored in the storage device (8).
In the second method, the text to be entered is recorded in advance on the recording/playback device (2), the device (2) is connected to the apparatus, and the recorded text is input to the speech recognition unit (1), where it is recognized, converted to a character string, shown on the display (6) and simultaneously stored in the storage device (8).
Because there are these two ways of inputting speech, the input switching unit (4) switches between the inputs. Besides switching the input, the input switching unit (4) also switches between recording the recording signal (イ) on the recording/playback device (2) and recording the speech input from the microphone (3).
The operations from speech recording to text creation are described in order below.
(i) Speech registration
Before speech recognition is performed, speech is registered in order to create the reference patterns that recognition requires.
The syllable registration mode is described first. The reference patterns referred to here serve as the templates for pattern matching in the phrase recognition unit (14) of the speech recognition unit (1); specifically, they are stored in the syllable reference pattern memory (14d) of the phrase recognition unit (14) shown in Fig. 7.
To register speech on the dictating machine, the switch (14s1) of Fig. 7 is first operated to connect the parameter buffer (14d) to the syllable recognition unit (14b); registration then proceeds by one of the following three methods.
In the first method, registration speech is input directly from the microphone (3) to the main body (100); the speech recognition unit (1) analyzes it and creates reference patterns, which are stored in the syllable reference pattern memory (14d) and the storage device (8).
In the second method, a recording/playback device (2) on which the registration speech was recorded beforehand is connected to the main body (100), and the registration speech is input by playing the recording back; the speech recognition unit (1) analyzes it and creates reference patterns, which are stored in the syllable reference pattern memory (14d) and the storage device (8).
In the third method, registration speech is input directly from the microphone (3) to the main body (100) while a recording/playback device (2) connected to the main body records the same speech. The main body analyzes the speech from the microphone (3), creates reference patterns and stores them in the storage device (8). When the speech input to the microphone (3) ends, the speech recorded on the recording/playback device (2) is played back, analyzed by the speech recognition unit (1) and turned into reference patterns, which are stored in the syllable reference pattern memory (14d) and also stored in the storage device (8) alongside the syllable reference patterns obtained directly from the microphone (3).
In this third method, the speech recorded on the recording/playback device (2) is colored by the frequency characteristics of that device, so reference patterns created from the recorded speech differ from those created from speech input directly from the microphone (3). Recorded speech must therefore be recognized with reference patterns created from recorded speech, and speech input directly from the microphone (3) with reference patterns created from direct microphone input. With the method above, both sets of patterns can be created and stored in a single registration operation.
Once the registered voice is recorded in the recording / playback device (2),
Dictating machine without quasi-pattern
The recorded voice is not required
A standard pattern can be created simply by inputting it for playback. Ma
In addition, the registered voice is recorded on the recording / reproducing device (2), and
If you record a sentence after the registered voice of
Connect the sound playback device (2) to the main unit (100) and record
Just by playing the audio, you can go from voice registration to text creation.
All can be done automatically. In addition, the registrant's utterance for creating a standard voice pattern
The input is displayed on the display device (6) by the device in a certain order.
This is done by the registrant reading out a sentence. In addition, a recording and playback device with a display function dedicated to this machine
When using (2), this recording and playback device (2) alone
Even when you carry your phone,
Produce a corresponding voice and record it on the recording and playback device (2)
Thus, a standard pattern can be created. As described above, the registered voice for creating the standard pattern
When recording to the recording / playback device (2),
Noise when creating a standard pattern from the registered voice
Which recordings are affected and the corresponding headwords
There is a possibility that it will be shifted.
Using a tape recorder as a recording and playback device
The case is described. FIG. 9 (a) shows a mark on the tape recorder.
Record the registered voice for creating the semi-pattern
The registered voice "A" corresponding to the headwords "A" to "KA"
~ "" Represents the state of the tape, here
Shows the case where [Noise] is recorded between "E" and "O".
You. As shown in FIG. 9 (a), between the registered voice and the registered voice
[Noise] is registered by using the recorded tape.
The first recorded sound is "A" and the second is recorded
The recorded sound is simply recorded on tape, like "I"
To which syllable the input registered voice depends on the order of the
If you have determined whether it is compatible, go up to [Noise].
Since it is regarded as a recorded voice and the headword is corresponded, it is input
The actual registered voice and the headword are shifted. Here, FIG. 9 (b) shows that [noise] is erroneously recognized as voice.
Then, [Noise] is entered at the headword "E",
In the figure where the syllable "E" is entered at the headword "O"
is there. When creating a standard pattern from registered voices like this
Recorded voice and headword are shifted due to noise
In some cases, as shown in FIG.
Character code sound indicating the type of
In response, recording is made on the recording / reproducing device (2). By this method
And [Noise] is recorded between "U" and "E"
Also, as described above, the difference between the input sound and the headword
To prevent. Character code of specific frequency to prevent this shift
The sound recording method was changed to the tape recorder of the recording / playback device (2).
Is single-track and multi-track
Will be described separately. First, in Fig. 10, a multi-track recording method is used.
This is the case in which a recording/reproducing device with a multi-track recording capability is used. With such a device, the voice is recorded on the audio track and, as shown in FIG. 10, the character code sound corresponding to the headword is recorded on a separate character code track. Once the voice recognition unit (1) knows the headword of the input voice from this character code sound, it takes, from the sounds recorded on the audio track, only those that lie in the section t1 in which the character code sound was recorded and that satisfy conditions such as exceeding the sound pressure threshold; these are regarded as voice and analyzed.
Alternatively, character code sounds corresponding to the headword are recorded so that, of the sounds on the audio track, only those lying in the section t2 between one character code sound and the next, and satisfying conditions such as exceeding the sound pressure threshold, are regarded as voice and analyzed.
As a further alternative, the character code sound corresponding to the headword is recorded, and of the sounds on the audio track, only those lying in the section t3, which runs from the character code sound indicating the type of the voice to the character code sound corresponding to the next headword, and satisfying conditions such as exceeding the sound pressure threshold, are regarded as voice and analyzed.
The second case is a single-track recording/reproducing device (2). Here the character code sound is placed outside the frequency band used for voice analysis and is recorded together with the voice on the single audio track. The character code sounds are otherwise recorded in the same way as in the multi-track case: of the sounds recorded in the section t1, t2, or t3 described above, only those satisfying the same conditions are regarded as voice and analyzed. However, except where the voice and the character code sound overlap, as in the embodiment in which the voice lies within the section t1, the character code sound need not lie outside the analysis band.
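
The section rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented circuit: the `Sound` and `CodeSound` records, the time stamps, the levels, and the threshold value are all hypothetical, and only the sections t1 and t3 are modeled.

```python
from typing import NamedTuple

class Sound(NamedTuple):
    start: float    # position on the tape, in seconds
    end: float
    level: float    # sound pressure, arbitrary units
    label: str      # what the sound really is (for illustration only)

class CodeSound(NamedTuple):
    start: float
    end: float
    headword: str   # headword encoded by the character code sound

def sections_t1(codes):
    """t1: the section in which each character code sound is recorded."""
    return [(c.start, c.end, c.headword) for c in codes]

def sections_t3(codes, tape_end):
    """t3: from one code sound up to the code sound of the next headword."""
    next_starts = [c.start for c in codes[1:]] + [tape_end]
    return [(c.start, nxt, c.headword) for c, nxt in zip(codes, next_starts)]

def gate(sounds, sections, threshold):
    """Assign to each headword the sounds inside its section whose level
    is at or above the sound pressure threshold."""
    result = {}
    for start, end, headword in sections:
        result[headword] = [
            s.label for s in sounds
            if start <= s.start and s.end <= end and s.level >= threshold
        ]
    return result

codes = [CodeSound(4.0, 5.0, "U"), CodeSound(8.0, 9.0, "E")]
sounds = [
    Sound(4.1, 4.8, 0.8, "voice U"),
    Sound(6.0, 6.5, 0.6, "[noise]"),   # loud noise between "U" and "E"
    Sound(8.1, 8.9, 0.7, "voice E"),
]

by_t1 = gate(sounds, sections_t1(codes), threshold=0.5)
by_t3 = gate(sounds, sections_t3(codes, tape_end=10.0), threshold=0.5)
print(by_t1)
print(by_t3)
```

In this sketch the noise never receives a headword of its own, so "E" stays paired with its own voice under either rule. With the wider section t3, a noise that exceeds the threshold is still analyzed together with "U", which is why the sound pressure condition is applied inside every section.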
Next, for words such as letters of the alphabet, numerals, parentheses, and punctuation marks, which are registered as characters in the word dictionary (13d) of the word recognition unit (13) shown in FIG. 6, the corresponding word standard patterns are registered in the word standard pattern memory (13c).
First, the switch (13s1) shown in FIG. 6 is set so that it connects the parameter buffer (13a) to the word standard pattern memory (13c), which puts the unit in word registration mode. Next, the letters, numerals, parentheses, and punctuation marks are shown on the display device (6) of the main body (100), and the operator speaks the reading of each one. The voice recognition unit (1) analyzes this voice and registers the resulting word standard pattern in the word standard pattern memory (13c).
The operations described so far make voice recognition possible. However, to recognize a word that is in neither the independent word/attached word dictionary (14e) nor the word dictionary (13d), either the word must be registered in the independent word/attached word dictionary (14e), or the word must be registered in the word dictionary (13d) and its word standard pattern in the word standard pattern memory (13c). Which of the two is used depends on whether the user wants the word to be recognized as a phrase or as a word. Likewise, if a word is in the independent word/attached word dictionary (14e) but not in the word dictionary (13d), and the user also wants it to be recognized by word recognition, the word and its word standard pattern must be registered in the word dictionary (13d) and the word standard pattern memory (13c).
The method of registering an arbitrary word is as follows. There are two ways to register a word: registering its character string in the independent word/attached word dictionary (14e), or registering its word standard pattern in the word standard pattern memory (13c) and its character string in the word dictionary (13d).
To register a word in the independent word/attached word dictionary (14e), the user utters the word to be registered and inputs it to the device. The device recognizes this voice with the voice recognition unit (1).
The recognition result is then shown on the display device (6). If the result is correct, the user presses a predetermined key on the keyboard (7), and the uttered word is registered in the independent word/attached word dictionary (14e) as the character string shown on the display device (6). If the recognition result shown on the display device (6) is wrong, the user either corrects it with the syllable correction function of the device or utters the word again. If the result of the re-utterance is also wrong, it is corrected with the syllable correction function of the device. These operations are repeated until the character string shown on the display device (6) matches the word to be registered.
To register a word in the word standard pattern memory (13c) and the word dictionary (13d), the character string to be registered is first brought up correctly on the display device (6), in the same way as for registration in the independent word/attached word dictionary (14e). The displayed character string and the word standard pattern are then registered in the word dictionary (13d) and the word standard pattern memory (13c), respectively.
Recognizing naturally spoken utterances is not possible at the current level of voice recognition technology, which limits input to continuous syllable utterance. The following example therefore assumes continuous syllable utterance input. The procedure described above is the same for continuous syllable utterance input.
In continuous syllable utterance input, however, a word standard pattern taken from the input would be one of syllable-by-syllable utterance. The word is therefore uttered again naturally, a word standard pattern is created from the natural utterance, and the word standard pattern and the character string are registered in the word standard pattern memory (13c) and the word dictionary (13d), respectively.
With the operations above, the data needed to create sentences by voice recognition has been registered.
(ii) Text preparation
An example of text preparation is described below. First, for the recognition operation, the switch (13s1) of the word recognition unit (13) is set so that it connects the parameter buffer (13a) to the word determination unit (13b), and the switch (14s1) of the phrase recognition unit (14) is set so that it connects the parameter buffer (14a) to the syllable recognition unit (14b).
There are two ways to create a sentence. The first is online recognition, in which the sentence to be created is input directly from the microphone (3). The second is offline recognition, in which the recording/reproducing device (2) holding a recorded sentence is connected to the device and the recorded sentence is played back and recognized.
Online recognition is described first. In online recognition the sentence, uttered in units of phrases or words, is input directly from the microphone (3), so the input switching unit (4) connects the microphone (3) to the voice recognition unit (1). If the voice input from the microphone (3) is also to be recorded on the recording/reproducing device (2), the recording/reproducing device (2) is connected to the main body and the input switching unit (4) connects the output of the microphone (3) to the recording terminal of the recording/reproducing device (2). In that case the input switching unit (4) also generates a beep sound marking a phrase or word break whenever a silent section detection signal is input from the feature extraction unit (12).
During voice recognition, both the word recognition unit (13) and the phrase recognition unit (14) are running. The voice input from the microphone (3) is sent to the preprocessing unit (11).
The preprocessing unit (11) processes the input voice so that it has characteristics suitable for voice analysis (for example, it amplifies the sound pressure when the sound pressure of the input voice is low) and sends it to the feature extraction unit (12). In the feature extraction unit (12), as shown in FIG. 5, the analysis unit (12a) analyzes the voice input from the preprocessing unit (11), extracts its features, and stores them in the parameter buffer (12c). At the same time, the analysis unit determination unit (12b) of the feature extraction unit (12) uses the analysis result of the analysis unit (12a) to detect the silence that follows each phrase or word, as well as the beep sound recorded after each phrase or word (described in detail in the offline recognition embodiment below). When it detects a silent section, it generates the silent section detection signal (b). On receiving the silent section detection signal (b), the parameter buffer (12c) sends the stored feature parameters to the word recognition unit (13) and the phrase recognition unit (14) and then clears its contents.
The feature parameters input to the word recognition unit (13) are stored in the parameter buffer (13a) shown in FIG. 6. The word determination unit (13b) compares the feature parameters stored in the parameter buffer (13a) with the word standard pattern memory (13c), selects from the word dictionary (13d) several words whose standard patterns have a high likelihood with respect to those feature parameters, and sends the character strings of the selected words and their likelihood values to the candidate creation unit (15).
The feature parameters input to the phrase recognition unit (14), on the other hand, are stored in the parameter buffer (14a). The syllable recognition unit (14b) compares the feature parameters stored in the parameter buffer (14a) with the syllable standard pattern memory (14d), converts them into a syllable string, and sends the syllable string to the phrase determination unit (14c). The phrase determination unit (14c) compares the input syllable string with the independent words and attached words registered in the independent word/attached word dictionary (14e), builds several phrases with high likelihood from combinations of independent words and attached words, and sends their character strings and likelihood values to the candidate creation unit (15).
The candidate creation unit (15) selects the several character strings with the highest likelihood from those it receives and stores each one together with its likelihood value and a code indicating whether it came from the word recognition unit (13) or the phrase recognition unit (14). At the same time, it sends the character string with the highest likelihood to the control unit (5) for display on the display device (6). The control unit (5) attaches the delimiter mark "▽" to this character string and has it displayed on the display device (6); the input sentence of FIG. 14(a), for example, is displayed in the format shown in FIG. 14(c). The candidate creation unit (15) also sends the control unit (5) a signal requesting that the contents stored in the candidate creation unit (15) be stored in the storage device (8). On receiving this signal, the control unit (5) appends a code representing the delimiter symbol after the character strings and other data stored in the candidate creation unit (15) and stores them in the storage device (8).
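
The merging step performed by the candidate creation unit (15) can be sketched as follows. This is a minimal illustration, assuming that both recognition units report (character string, likelihood) pairs on a comparable scale; the `Candidate` class, the function `create_candidates`, the likelihood values, and the extra word "Tanto" are hypothetical, while the other strings follow the FIG. 15 example given later in the description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    text: str          # recognized character string
    likelihood: float  # larger means more likely
    source: str        # "word" (unit 13) or "phrase" (unit 14)

def create_candidates(word_results, phrase_results, keep=4):
    """Merge both result lists, tag each entry with its source, and keep
    the `keep` entries with the highest likelihood, best first."""
    merged = [Candidate(t, l, "word") for t, l in word_results]
    merged += [Candidate(t, l, "phrase") for t, l in phrase_results]
    merged.sort(key=lambda c: c.likelihood, reverse=True)
    return merged[:keep]

# Hypothetical likelihood values.
word_results = [("Tango", 0.91), ("Tanko", 0.55), ("Tanto", 0.20)]
phrase_results = [("Tango", 0.84), ("Tango ni", 0.70)]

buffer = create_candidates(word_results, phrase_results)
best = buffer[0]   # the string sent to the control unit (5) for display
for rank, c in enumerate(buffer, start=1):
    print(rank, c.text, f"({c.source})", c.likelihood)
```

Keeping the source tag with every entry is what later allows the word and phrase candidate keys to step through the same buffer independently.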
The character strings stored in this external storage device serve as the first draft for a word processor. A floppy disk is generally used for this, and the file format in the storage device (8) must then match the file format of the word processor.
On receiving the silent section detection signal, the signal generation unit (42) of the input switching unit (4) shown in FIG. 8 generates a beep sound marking a phrase or word break and feeds it to the switch (41). The switch (41) connects the circuit so that the voice input from the microphone (3) and the beep sound from the signal generation unit (42) are both recorded on the recording/reproducing device (2). A beep sound is thus recorded in each silent section that is regarded as a phrase or word break in the sentence recorded on the recording/reproducing device (2).
Next, an embodiment of offline recognition is described. In offline recognition, a sentence is created by playing back and inputting voice recorded on the recording/reproducing device (2), so the sentence is first recorded on the recording/reproducing device (2). Because the voice is input from the recording/reproducing device (2), the input switching unit (4) connects the recording/reproducing device (2) to the voice recognition unit (1).
When recording a sentence, the speaker utters it in units of phrases or words and leaves a silent section between phrases and words. When a recording/reproducing device (2) dedicated to this device is used, as shown in the drawings, a beep sound marking the phrase or word break is recorded by pressing the delimiter key (71) provided on the main body of the dictating machine. Words that have been registered as words are uttered one word at a time.
If the recording/reproducing device (2) has a function for generating character sounds and has the character corresponding to the word to be input, the character sound may be recorded in place of the voice.
In addition, so that noise recorded at the start of a sentence or between sentences is not mistaken for voice, a signal indicating the beginning and end of each sentence is recorded along with the voice. How this signal is recorded depends on whether or not the recording/reproducing device (2) is a multi-track system, as described below. FIG. 11 shows the multi-track system and FIG. 12 the single-track system.
FIGS. 11(a) and 12(a) show a method in which the section where a sound such as a DTMF signal is recorded is detected as the voice section. FIGS. 11(b) and 12(b) show a method in which a sound such as a DTMF signal is recorded before the sentence begins and again when the sentence ends, and the section between the two signals is detected as the voice section. In the single-track system of FIG. 12, the voice section and the sound such as a DTMF signal may overlap, so a DTMF signal or the like outside the voice band is used.
When a sentence is recognized, the sections t4 and t5 before and after the recorded signal are also sampled, so the beginning of the sentence need not coincide with the beginning of the signal, nor the end of the sentence with the end of the signal. The sentence can therefore be recognized even if the timing of the utterance and of the key press differ slightly.
Next, the recording/reproducing device (2) is connected to the main body of the device, the recorded voice is played back, and recognition is performed.
Before recognition, the recognition speed mode is set to one of the following: the fast-listening recognition mode, which raises the playback speed of the recorded voice to shorten the recognition time; the mode that recognizes at the normal playback speed; or, when there is time to spare and a high recognition rate is required, the double-playback recognition mode.
The fast-listening recognition mode is described first. In this mode the playback speed of the recorded voice is raised, so the characteristics of the input voice differ from those of the standard patterns, which were created from registered voice played back at the normal speed. Simply inputting the sped-up voice therefore does not give accurate recognition, and a way of recognizing the sped-up voice accurately is needed.
To this end the sampling frequency is changed. An example of the method is as follows.
The sampling frequency control unit (12d) of the feature extraction unit (12) in FIG. 5 sets the sampling frequency for the voice input to the feature extraction unit (12) to (playback speed / recording speed) times the sampling frequency used when the standard voice patterns were created, and the voice is sampled and analyzed at that frequency. Operation from the feature extraction unit (12) onward is the same as in the online recognition embodiment. However, when beep sounds marking phrase and word breaks have been recorded in advance in the sentence recorded on the recording/reproducing device (2) and the feature extraction unit (12) detects such a beep sound, it generates a beep sound detection signal (b') instead of the silent section detection signal (b).
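
The sampling-frequency rule of the fast-listening mode can be checked numerically. The sketch below is an illustration only: the function name `analysis_sampling_frequency`, the 8000 Hz standard frequency, and the 400 Hz voice component are assumed values, not figures from the description.

```python
def analysis_sampling_frequency(standard_fs_hz, playback_speed, recording_speed):
    """Sampling frequency for sped-up playback:
    (playback speed / recording speed) x the frequency used when the
    standard patterns were created."""
    if playback_speed <= 0 or recording_speed <= 0:
        raise ValueError("speeds must be positive")
    return standard_fs_hz * playback_speed / recording_speed

standard_fs = 8000.0     # Hz, hypothetical
fast_fs = analysis_sampling_frequency(
    standard_fs, playback_speed=2.0, recording_speed=1.0
)

# A 400 Hz component of the voice appears at 800 Hz when the tape is
# played at double speed. Sampled at fast_fs it spans the same number of
# samples per cycle as the original component did at standard_fs.
samples_per_cycle_normal = standard_fs / 400.0
samples_per_cycle_fast = fast_fs / 800.0
print(fast_fs, samples_per_cycle_normal, samples_per_cycle_fast)
```

Because the number of samples per cycle is unchanged, the sampled sequence looks the same to the analysis stage as it would at normal speed, which is why the standard patterns can still be used.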
When the signal received is the beep sound detection signal (b') rather than the silent section detection signal (b), the signal generation unit (42) of the input switching unit (4) does not generate a beep sound. In addition, when the voice recognition unit (1) recognizes a character sound that represents a word, it outputs the word corresponding to that character sound as the recognition result.
Next, an embodiment of the double-playback recognition mode is described. In this mode the recorded voice is first played back and input to the device. During this pass the preprocessing unit (11) of the voice recognition unit (1) reads the sound pressure variation of the entire recorded voice and stores this data in the sound pressure variation memory (11b) shown in FIG. 4. The recorded voice is then played back and input to the device a second time. During this pass the preprocessing unit (11) uses the data in the sound pressure variation memory (11b) to adjust the amplification factor of the AGC circuit (11a) so that the sound pressure input to the feature extraction unit (12) is at the level best suited to voice recognition. That is, the gain G is taken to be the fixed gain A multiplied by the variably adjusted control voltage V_G.
A multiple-playback recognition mode is also conceivable as another form of the double-playback recognition mode. In this method the recorded sentence is played back and input several times, the recognition method in the voice recognition unit (1) is changed for each pass, and of the recognition results obtained, the one with the highest likelihood is selected.
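
The two-pass gain control of the double-playback mode, with G = A × V_G, can be sketched as follows. The description does not say how V_G is derived from the stored sound pressure data, so the rule used here (choose V_G so that every interval reaches one target level) is an assumption, as are the function name `control_voltages` and all numeric values.

```python
def control_voltages(levels, fixed_gain_a, target_level, floor=1e-6):
    """Second-pass control voltage V_G for each stored level, chosen so
    that G * level = target_level with G = A * V_G."""
    return [target_level / (fixed_gain_a * max(level, floor)) for level in levels]

# First pass: sound pressure levels read from the whole recording
# (hypothetical values, one per short interval of the tape).
stored_levels = [0.10, 0.40, 0.25]

A = 2.0          # fixed gain of the amplifier (assumed)
TARGET = 0.5     # level assumed to suit the feature extraction unit

# Second pass: apply G = A * V_G to each interval.
v_g = control_voltages(stored_levels, A, TARGET)
outputs = [A * v * level for v, level in zip(v_g, stored_levels)]
print(v_g)
print(outputs)
```

Knowing the levels of the whole recording before the second pass is what lets the gain be set in advance for each interval rather than tracked after the fact.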
The function described below, which compensates for the effect of frequency characteristics, is used in the following cases: when the registered voice is recorded on a recording/reproducing device (2) before being input; when, depending on the recording/reproducing device (2), the frequency characteristics at the raised playback speed differ from those at the normal playback speed; when the sentence to be recognized was recorded on a recording/reproducing device (2) whose frequency characteristics differ from those of the device used to create the standard voice patterns; and when the sentence was recorded on a device that nominally has the same frequency characteristics as the one used to create the standard patterns but whose actual characteristics differ because of component tolerances.
First, to measure the frequency characteristics of the recording/reproducing device (2), the reference signal generation unit (42) generates a reference sine wave signal, which is recorded on the recording/reproducing device (2). The recorded reference sine wave signal is then played back and input to the voice recognition unit (1). The voice recognition unit (1) analyzes the input reference sine wave signal, determines the difference in frequency characteristics between the recorded reference sine wave signal and the reference sine wave signal generated by the reference signal generation unit (42), and applies a correction that reduces this difference.
Many ways of applying the correction are possible, depending on the feature extraction method of the feature extraction unit (12) of the voice recognition unit (1). For example, if the feature extraction unit is a filter bank in which branches, each a band-pass filter (BPF) connected in series with an amplifier (AMP), are connected in parallel, as shown in the drawings, the amplification factor of each amplifier (AMP) is adjusted so that the filter outputs cancel the difference in frequency characteristics from the reference sine wave signal generated by the reference signal generation unit (42). If a digital filter is used for feature extraction, the parameters that determine the characteristics of the digital filter can be changed instead. Other methods suited to the feature extraction method of the feature extraction unit (12) are also conceivable.
By the operations described above, the sentence input by voice is converted into a kana string.
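
For the filter bank case, the adjustment of the amplifier gains can be sketched as below. This is a simplified illustration: it assumes that the level of the reference sine wave is known per band both as generated and after recording and playback, and the names `compensation_gains` and `apply_gains` and the four band levels are hypothetical.

```python
def compensation_gains(reference_levels, played_back_levels, floor=1e-9):
    """Per-band amplifier gain that restores each band of the played-back
    reference signal to the level of the generated reference signal."""
    return [ref / max(got, floor)
            for ref, got in zip(reference_levels, played_back_levels)]

def apply_gains(band_outputs, gains):
    """Scale the output of each band-pass filter by its amplifier gain."""
    return [x * g for x, g in zip(band_outputs, gains)]

# Hypothetical band levels for a four-band filter bank.
reference = [1.0, 1.0, 1.0, 1.0]        # as generated
played_back = [0.9, 1.0, 0.7, 0.5]      # after recording and playback

gains = compensation_gains(reference, played_back)
corrected = apply_gains(played_back, gains)
print(gains)
print(corrected)
```

The same gains would then be applied to the band outputs of the recorded voice, so that the feature parameters resemble those obtained with the device used for the standard patterns.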
When the sentence converted into a kana string differs from the sentence that was input, it is corrected. The correction method for each case is described below, and correction follows the procedures given there. FIG. 14(a) shows the input sentence, FIG. 14(b) the input voice, FIG. 14(c) the recognition result, FIGS. 14(d) to 14(h) the correction process, and FIG. 14(i) the result of the correction.
First, the correction method used when something uttered as a word has been misrecognized as a phrase is described. As shown in FIG. 14(d), when what was uttered as the word "C" has been misrecognized, the cursor (X) is moved to the misrecognized word [(d) i in the figure]. The next word candidate key (72) is then pressed to display the next word candidate [(d) ii in the figure]. If this result is correct, the user moves on to the next correction. If it is wrong, the next word candidate key (72) is pressed again to display the following word candidate. This is repeated until the correct answer is displayed.
Next, the correction method used when something uttered as a phrase has been misrecognized as a word is described. When what was uttered as the phrase "i" has been recognized as the word "E", the cursor (X) is moved to the misrecognized phrase, and the next phrase candidate key (73) is pressed to display the next phrase candidate. If this result is correct, the user moves on to the next correction. If it is wrong, the next phrase candidate key (73) is pressed again to display the following phrase candidate. This is repeated until the correct answer is displayed.
Pressing the previous word candidate key (74) for a word, or the previous phrase candidate key (75) for a phrase, displays the candidate one before the current one.
If neither of these two methods gives the correct answer, the result is corrected syllable by syllable, or the word, phrase, or syllable is uttered and input again. So that the same confusion between word and phrase recognition does not recur on re-utterance, the candidate creation unit (15) can be controlled externally so that only the recognition results sent from the word recognition unit (13) are treated as recognition results and those sent from the phrase recognition unit (14) are ignored. It can likewise be controlled externally so that only the recognition results sent from the phrase recognition unit (14) are treated as recognition results and those sent from the word recognition unit (13) are ignored.
The next candidate keys mentioned above are keys with the functions described below.
This is explained with reference to FIG. 15. In the voice recognition unit (1) of this device, word recognition and phrase recognition run in parallel, and both a word recognition result and a phrase recognition result are obtained. As noted earlier, the next phrase candidate key (73) displays on the display device (6), in descending order of likelihood, the results produced by phrase recognition, and the next word candidate key (72) does the same for the results produced by word recognition. The previous word candidate key and the previous phrase candidate key display on the display device (6) the recognition result whose likelihood is one rank higher than that of the result currently displayed.
FIG. 15 shows the candidate buffer (15a) of the candidate creation unit (15). In this figure the first-ranked recognition result is "Tango", a result sent from the word recognition unit (13), which is indicated by (word). Similarly, the second-ranked result is "Tango", a result sent from the phrase recognition unit (14), indicated by (phrase); the third-ranked result is "Tango ni", also sent from the phrase recognition unit (14) and indicated by (phrase); and the fourth-ranked result is "Tanko", sent from the word recognition unit (13) and indicated by (word).
Suppose the first-ranked "Tango" is currently shown on the display device (6). Pressing the next phrase candidate key (73) in this state displays the next phrase result in the candidate buffer on the display device (6), and pressing the next word candidate key (72) displays "Tanko". When "Tanko" is shown on the display device (6), pressing the previous word candidate key (74) displays "Tango", and pressing the previous phrase candidate key (75) displays "Tango ni".
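
The behavior of the four candidate keys on the buffer of FIG. 15 can be sketched as follows. The buffer contents are taken from the description; the functions `next_candidate` and `previous_candidate` and the index-based representation are hypothetical and only illustrate one way of stepping through entries by source.

```python
# Candidate buffer of FIG. 15, best first: (character string, source).
BUFFER = [
    ("Tango", "word"),
    ("Tango", "phrase"),
    ("Tango ni", "phrase"),
    ("Tanko", "word"),
]

def next_candidate(buffer, current, source):
    """Index of the nearest lower-ranked entry from `source`, or None."""
    for i in range(current + 1, len(buffer)):
        if buffer[i][1] == source:
            return i
    return None

def previous_candidate(buffer, current, source):
    """Index of the nearest higher-ranked entry from `source`, or None."""
    for i in range(current - 1, -1, -1):
        if buffer[i][1] == source:
            return i
    return None

shown = 0                                                       # first-ranked "Tango"
after_next_word = next_candidate(BUFFER, shown, "word")         # key (72)
after_next_phrase = next_candidate(BUFFER, shown, "phrase")     # key (73)
back_word = previous_candidate(BUFFER, after_next_word, "word")      # key (74)
back_phrase = previous_candidate(BUFFER, after_next_word, "phrase")  # key (75)
print(BUFFER[after_next_word], BUFFER[after_next_phrase],
      BUFFER[back_word], BUFFER[back_phrase])
```

Starting from the first-ranked "Tango", the next word key reaches "Tanko", and from "Tanko" the previous word and previous phrase keys reach "Tango" and "Tango ni", matching the sequence described for FIG. 15.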
Next, batch correction of an entire word is described. FIG. 14(e) is an example in which the word "T" has been misrecognized as "A". First, the cursor is moved to the word to be corrected [(e) i in the figure]. The next word candidate key (72) is then pressed to display the next word candidate [(e) ii in the figure]. If this result is correct, the user moves on to the next correction. If it is wrong, the next word candidate key (72) is pressed again to display the following word candidate. This is repeated until the correct answer is displayed. If the correct answer is never displayed, the word is uttered and input again. Pressing the previous word candidate key (74) brings back the word candidate displayed previously.
Next, batch correction of an entire phrase is described. FIG. 14(f) is an example in which the phrase "Gamenno" has been misrecognized as "Gainen no". First, the cursor is moved to the phrase to be corrected [(f) i in the figure]. The next phrase candidate key (73) is then pressed to display the next phrase candidate [(f) ii in the figure]. If this result is correct, the user moves on to the next correction. If it is wrong, the next phrase candidate key (73) is pressed again to display the following phrase candidate. This is repeated until the correct answer is displayed. If the correct answer is never displayed, the phrase is uttered and input again. Pressing the previous phrase candidate key (75) brings back the phrase candidate displayed previously.
Next, correction in units of syllables is described. FIG. 14(h) is an example in which the phrase "onsei" has been misrecognized; here the syllable "ke" is corrected. First, the cursor (X) is moved to the syllable "ke" to be corrected [(h) i in the figure], and the next syllable candidate key (76) is pressed. This displays the syllable closest to the syllable being corrected [(h) ii in the figure]. If the correct answer is displayed, the user moves on to the next correction. If the result is wrong, the next syllable candidate key is pressed again to display the following syllable candidate. This is repeated until the correct answer is displayed. If the correct answer is never displayed, the syllable is uttered and input again. If the result of the re-input is wrong, it is corrected again by the procedure above, and this is repeated until the correct answer is displayed. Pressing the previous syllable candidate key (77) displays the previous candidate.
To delete a syllable, the cursor is moved to the syllable in question and the delete key (78) is pressed. To insert a syllable, the cursor is moved to the position in question and the insert key (79) is pressed.
Next, the method of correcting several syllables at once is described with reference to FIG. 16. In this example the input sentence "Kaijo" of FIG. 16(a) has been misrecognized as "kanjo", as shown in FIG. 16(b). In this case the cursor (X) is first moved back to the syllables to be corrected [FIG. 16(c)] and "Kai" is uttered again. The re-uttered voice is recognized by the voice recognition unit (1), and the result is shown on the display device (6). If the recognition result is correct, the user moves on to the next correction. If, as shown in FIG. 16(d), "Kai" has been misrecognized as "Kae", the next word candidate key (72) is pressed in the case of a word and the next phrase candidate key (73) in the case of a phrase. FIG. 16 is an example for a word, so correction of a word is described below.
When the next word candidate key (72) is pressed in the state of FIG. 16(d), the control unit (5), in order to extract candidates from the word dictionary (13d), first compares the recognition result "Kanjo" of FIG. 16(b) with the recognition result "Kaejo" obtained after the re-utterance, FIG. 16(d), and finds their common part "Jo". The control unit (5) then selects from the word dictionary (13d) the words that contain "Jo". FIG. 16(f) shows the stored contents of the word dictionary (13d), and FIG. 16(g) shows the words containing "Jo" that were selected from those contents. Finally, the control unit (5) compares the words of FIG. 16(g) with the recognition result "Kaejo" obtained after the re-utterance and displays the word with the highest likelihood value [FIG. 16(e)].
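
The dictionary search used for correcting several syllables can be sketched as follows. Several points are assumptions made only so that the example runs: the common part is taken as the shared trailing syllables (which yields "jo" for this example), the dictionary contents are invented, `difflib.SequenceMatcher` stands in for the likelihood value that the description does not define, and the result the user has already rejected is excluded from the choice.

```python
from difflib import SequenceMatcher

def common_suffix(a, b):
    """Syllables shared at the end of two recognition results."""
    shared = []
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        shared.append(x)
    return shared[::-1]

def select_words(dictionary, part):
    """Dictionary words that end with the shared syllables."""
    n = len(part)
    return [w for w in dictionary if n and w[-n:] == part]

def best_word(candidates, reuttered, rejected=()):
    """Candidate most similar to the re-uttered result. SequenceMatcher
    is a stand-in for the likelihood value of the description."""
    pool = [w for w in candidates if w not in rejected]
    return max(pool, key=lambda w: SequenceMatcher(None, w, reuttered).ratio())

first_result = ["ka", "n", "jo"]    # "Kanjo", FIG. 16(b)
reuttered = ["ka", "e", "jo"]       # "Kaejo", FIG. 16(d)

# Hypothetical dictionary contents, written as syllable lists.
dictionary = [
    ["ka", "i", "jo"],
    ["ka", "n", "jo"],
    ["te", "n", "jo"],
    ["ka", "i", "sha"],
]

part = common_suffix(first_result, reuttered)
candidates = select_words(dictionary, part)
chosen = best_word(candidates, reuttered, rejected=[first_result])
print(part, candidates, chosen)
```

With these assumed inputs the shared part is "jo", three dictionary words contain it, and the word chosen is "ka-i-jo", corresponding to the corrected display of FIG. 16(e).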
I will talk about it. In the example of Fig. 14 (g), the phrase "bunsho o"
False recognition that there is a silent section indicated by [] between "shi"
And the word “bun” and the phrase “showo”
This is an example of incorrect recognition. In this case, the recognition-boundary delimiters must be corrected. To delete a recognition-boundary delimiter, move the cursor (X) onto the delimiter to be deleted [(g) i in the same figure] and press the delete key (78) [(g) ii in the figure]. To insert a recognition-boundary delimiter, move the cursor (X) to the syllable at the position where the delimiter is to be inserted and press the insert key (79).
However, as described later, the beep sounds recorded on the recording/reproducing device (2) and the delimiters added to the recognition result stored in the storage device (8) are marks for synchronizing the recording/reproducing device (2) with the storage device (8), so they must be preserved. Therefore, the fact that a delimiter has been inserted or deleted is itself stored in the storage device (8). For example, suppose that the sentence shown in the figure is stored in the storage device (8) as shown in (g) ii. When the sentence of (g) i is modified as shown in (g) ii, the symbol “▽” is changed to the symbol “▼”. The symbol “▼” indicates that the delimiter “▽” has been deleted, and it is used only for control of the recording/reproducing device (2) and the like.
With this configuration, even after the delimiter “▽” has been deleted, both devices can be controlled in synchronization by using the beep sounds recorded on the recording/reproducing device (2) and the symbols “▽” and “▼” stored in the storage device (8).
The above is an example in which the delimiter “▽” is deleted, but the same idea applies when a delimiter is inserted. That is, it suffices to insert, in place of the delimiter “▽”, a symbol that is not used as a control signal and serves only as a delimiter.
By the above correction procedure, the sentence is corrected as shown in the figure. After the recognition-boundary errors have been corrected, the recognition units that are no longer correct as a result of that correction are corrected in turn. In the case of correction by re-utterance, the voice of anyone whose standard pattern has been registered can be recognized, so any such person can perform the correction operation.
The above describes how kana-string sentences are corrected.
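The bookkeeping of the symbols “▽” and “▼” described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the stored recognition result is modelled as a plain string, and the symbol “¦” for an inserted, display-only delimiter is invented for the example (the text does not name that symbol).

```python
ACTIVE = "▽"    # recognition-boundary delimiter, paired with a beep on the recording
DELETED = "▼"   # delimiter deleted by the user, kept only as a synchronization mark
INSERTED = "¦"  # hypothetical symbol: user-inserted delimiter with no beep behind it


def delete_delimiter(stored: str, index: int) -> str:
    """Mark the delimiter at `index` as deleted instead of removing it."""
    if stored[index] != ACTIVE:
        raise ValueError("no active delimiter at this position")
    return stored[:index] + DELETED + stored[index + 1:]


def insert_delimiter(stored: str, index: int) -> str:
    """Insert a display-only delimiter before the character at `index`."""
    return stored[:index] + INSERTED + stored[index:]


def display_text(stored: str) -> str:
    """Text as the user sees it: deleted marks hidden, inserted ones shown."""
    return stored.replace(DELETED, "").replace(INSERTED, ACTIVE)


def sync_mark_count(stored: str) -> int:
    """Number of marks that still line up one-to-one with recorded beeps."""
    return stored.count(ACTIVE) + stored.count(DELETED)


stored = "bun▽showo▽kaku"
edited = delete_delimiter(stored, stored.index(ACTIVE))
edited = insert_delimiter(edited, 7)
print(display_text(edited), sync_mark_count(edited))
```

In this sketch the number of synchronization marks stays equal to the number of beeps on the recording regardless of how the user edits the boundaries, which is the property the “▼” symbol is meant to preserve.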
The following functions are provided to assist correction.
By moving the cursor over the character string displayed on the display device (6) or by scrolling the display screen, the text stored in the storage device (8) can be displayed on the display screen in sequence, and the sound corresponding to the displayed portion is reproduced from the recording/reproducing device (2). Conversely to this function, the character string corresponding to the portion being reproduced from the recording/reproducing device (2) is displayed on the display device (6). In both cases, the delimiter sounds recorded in the recorded text and the delimiter symbols stored on the display side are used as timing signals for synchronization, and reproduction by the recording/reproducing device (2) and the display are controlled so as to operate in synchronization with each other. When a signal to stop playback is input from the keyboard (7) or the recording/reproducing device (2), playback stops, and scrolling of the display or movement of the cursor also stops.
This synchronization between playback on the recording/reproducing device (2) and the display makes it possible to check the character string while listening to the reproduced sound, which makes it easier to find the portions that need correction.
Two methods of synchronization are possible here: displaying on the display device (6) the character string in the storage device (8) that corresponds to the portion being reproduced, and displaying on the display device (6) the kana string of the portion that is one delimiter behind the portion being reproduced. In this case, when the display is stopped for correction, the recorded sound of the portion to be corrected has already been played, so in order to play it again the portion to be corrected must be cued up in the recorded text. Therefore, when this method is adopted, a function is added that automatically backtracks the playback position of the recording/reproducing device (2) to the previous delimiter when the display is stopped.
When a tape recorder is used as the recording/reproducing device (2), the playback position is controlled by rotating the motor, and because of tape slack and the like the portion corresponding to the portion to be corrected sometimes cannot be located accurately. For such cases, a function is added in which only a certain length of the input voice is stored by PCM or ADPCM recording, and when the input voice is to be listened to again, the PCM- or ADPCM-recorded sound is played back.
FIG. 17 shows an embodiment of this function and is a diagram of the PCM data memory (DM) that stores the PCM data. In the figure, numerals 01 to 05 indicate addresses. The input audio is the sentence “Iwa｜Ten｜Shi｜Aar｜Tee｜Gamenno｜Bunsho｜Ten｜Onsei｜Shu…”. The PCM data memory (DM) stores the voice up to the first silent section at address 01; the voice up to the second silent section, “Ten”, is stored at address 02; and so on, until the voice up to the fifth silent section is stored at address 05. At this time, the PCM address pointer (AP) stores the address of the data that was stored first among the data in the PCM data memory (DM). In this example, 01 is stored. At this stage, the PCM data memory is full.
Next, when further voice is input, the input voice is stored at the address of the data that was stored first among the data in the PCM data memory (DM). In this example, “Gamenno” is stored at address 01, where “Iwa” had been stored. At this time, the PCM address pointer (AP) stores the address of the data that is now the first stored among the data in the PCM data memory (DM). In this example, 02 is stored.
When the contents of the PCM data memory (DM) are played back in this state, playback starts from the address indicated by the PCM address pointer (AP). In this example, addresses 02, 03, 04, 05, 01 are played in that order. In this way, the audio can be listened to again accurately and quickly.
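The wrap-around behaviour of the PCM data memory (DM) and the PCM address pointer (AP) can be sketched as a small ring buffer. This is a minimal illustration, not the circuit of FIG. 17: the class name `PcmDataMemory`, its method names, and the use of text labels in place of PCM samples are assumptions made for the example.

```python
from typing import List, Optional, Tuple


class PcmDataMemory:
    """Fixed number of cells; the oldest segment is overwritten when full."""

    def __init__(self, size: int = 5) -> None:
        self.size = size
        self.cells: List[Optional[str]] = [None] * size
        self.stored = 0   # number of cells holding data (at most `size`)
        self.oldest = 0   # zero-based index of the segment stored first

    @property
    def ap(self) -> int:
        """Address pointer (AP): 1-based address of the oldest segment."""
        return self.oldest + 1

    def store(self, segment: str) -> int:
        """Store one segment (voice up to a silent section); return its address."""
        if self.stored < self.size:
            index = self.stored          # memory not yet full: next free cell
            self.stored += 1
        else:
            index = self.oldest          # memory full: overwrite the oldest cell
            self.oldest = (self.oldest + 1) % self.size
        self.cells[index] = segment
        return index + 1

    def playback(self) -> List[Tuple[int, Optional[str]]]:
        """Segments in playback order, starting at the address held in AP."""
        order = [(self.oldest + i) % self.size for i in range(self.stored)]
        return [(i + 1, self.cells[i]) for i in order]


dm = PcmDataMemory(size=5)
for unit in ["Iwa", "Ten", "Shi", "Aar", "Tee"]:
    dm.store(unit)
dm.store("Gamenno")   # overwrites address 01; AP moves on to 02
print(dm.ap, [address for address, _ in dm.playback()])
```

Because playback always starts at the oldest surviving segment, the most recent stretch of input can be replayed without positioning the tape at all, which is the point of keeping this memory alongside the tape recorder.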
In addition, there is a function whereby, when the cursor (X) is moved to a recognition-unit delimiter on the screen and the cue key (70) for the recorded voice is pressed, the delimiter sound on the recording/reproducing device (2) that corresponds to the recognition unit indicated by the cursor is located in the recorded text, and the sentence following it is reproduced. FIG. 1 shows an embodiment of such a function.
To confirm the recognized sentence, the recognition result is read from the storage device (8) and displayed on the display device (6) from the beginning. At this time, the delimiter counter (5a) in the figure counts the number of delimiter symbols read from the storage device (8). If the recognition result that has been read out is incorrect, the cursor is placed on the incorrect portion and the cue key is pressed. The control unit (5) then causes the recording/reproducing device (2) to play back in fast-forward playback mode. The beep sound counter (12e) of the feature extraction unit (12) counts the beep sounds that serve as delimiters in the text input from the recording/reproducing device (2).
When the value of the beep sound counter (12e) becomes one less than the value of the delimiter counter (5a), the comparison circuit (5b) sends the signal (c) to the recording/reproducing device (2) and stops playback.
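The interplay of the delimiter counter (5a), the beep sound counter (12e), and the comparison circuit (5b) can be sketched as below. This is a simplified model under assumptions that are not in the text: the tape is represented as a list of `"speech"` and `"beep"` events, the delimiter count is taken to include the delimiter under the cursor, and both function names are hypothetical.

```python
from typing import Iterable, Optional

DELIMITERS = ("▽", "▼")  # both symbols correspond to a beep on the recording


def count_delimiters(stored: str, cursor: int) -> int:
    """Delimiter counter (5a): delimiters read from storage up to the cursor.

    Assumption: the delimiter under the cursor itself is included.
    """
    return sum(1 for ch in stored[:cursor + 1] if ch in DELIMITERS)


def fast_forward_until_cue(tape: Iterable[str], delimiter_count: int) -> Optional[int]:
    """Fast-forward through `tape`, counting beeps like counter (12e).

    Returns the number of tape events passed when the comparison of the two
    counters would raise the stop signal (c), i.e. when the beep count is one
    less than the delimiter count, or None if that never happens.
    """
    beep_count = 0
    if beep_count == delimiter_count - 1:
        return 0                          # target is the very first unit
    for passed, event in enumerate(tape, start=1):
        if event == "beep":
            beep_count += 1
            if beep_count == delimiter_count - 1:
                return passed             # stop: next event is the wanted unit
    return None


stored = "iwa▽ten▽shi▽aar"
tape = ["speech", "beep", "speech", "beep", "speech", "beep", "speech"]
cursor = 11                               # on the delimiter that follows "shi"
stop_at = fast_forward_until_cue(tape, count_delimiters(stored, cursor))
print(stop_at)
```

Stopping one beep short of the delimiter count leaves the tape positioned at the start of the recognition unit that precedes the selected delimiter, so normal playback from that point reproduces the unit to be checked.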
Confirming the recognition result and the corrected sentence by displaying the character string stored in the storage device (8) on the display device (6) means following the sentence on the display screen by eye and reading it, which is tiring. In view of this point, the present device reads out the character string in the storage device (8), which stores the recognition result, by means of the speech synthesis function, so that the recognition result and the corrected sentence can be confirmed by listening to the synthesized speech.
Also in this case, the delimiters are used as timing signals for synchronizing the speech synthesis unit (9), the storage device (8), the recording/reproducing device (2), and the display device (6).
That is, the character string corresponding to the portion that the speech synthesis unit (9) reads out from the storage device (8) is displayed on the display device (6), and at the same time the corresponding recorded portion is located on the recording/reproducing device (2). With this method, when an error is found while the synthesized speech is being read out and the speech synthesis read-out function is stopped for correction, both the display on the display device (6) and the recorded portion of the recording/reproducing device (2) indicate the erroneous portion, so the correction can be made immediately.
Two methods of synchronization are possible here. In one, the character string of the storage device (8) corresponding to the portion being read out by the speech synthesis function is displayed on the display device (6), and at the same time the corresponding syllables are reproduced from the sentence recorded on the recording/reproducing device (2). In the other, the portion of the sentence recorded on the recording/reproducing device (2) that is reproduced is one delimiter behind the portion corresponding to the portion being read out. In the latter case, when speech synthesis is stopped for correction, the recording/reproducing device (2) has stopped just short of the portion to be corrected, so playing back from this state immediately reproduces the sound of the portion to be corrected. In the former case, when speech synthesis is stopped for correction, the recorded sound of the portion to be corrected has already been played, so the device must be backtracked in order to play that portion again. Therefore, when the former method is adopted, it is preferable to add a function that automatically backtracks the recording/reproducing device (2) to the previous delimiter when the display is stopped.
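The difference between the two synchronization methods can be made concrete with a small position model. This is a deliberately simplified sketch: positions are counted in delimiters passed by the tape head, the unit being read out is assumed to have been played in full at the moment of stopping, and the names `SyncMode`, `head_after_stop`, and `backtrack_needed` are hypothetical.

```python
from enum import Enum


class SyncMode(Enum):
    IN_STEP = "former"      # recorder plays the same unit that is being read out
    ONE_BEHIND = "latter"   # recorder plays the unit one delimiter earlier


def head_after_stop(unit: int, mode: SyncMode) -> int:
    """Delimiters passed by the tape head when read-out stops on `unit`.

    Units are numbered from 0; unit k starts after k delimiters.
    """
    if mode is SyncMode.IN_STEP:
        return unit + 1     # unit k itself has already been played
    return unit             # only unit k-1 has been played


def backtrack_needed(unit: int, mode: SyncMode) -> int:
    """Number of delimiters to rewind so that playback restarts at `unit`."""
    return head_after_stop(unit, mode) - unit


for mode in SyncMode:
    print(mode.value, backtrack_needed(3, mode))
```

Under these assumptions the former method always needs a rewind of one delimiter, which is why an automatic backtrack is suggested for it, while the latter method needs none.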
The above describes an example in which the recognition result is stored in the storage device (8). Below, as another embodiment, a method of storing the recognition result in the recording/reproducing device (2) is described.
The recognition result stored in the storage device (8) is recorded on the recording/reproducing device (2) on which the original sentence is recorded. In this way, the original text and the recognition result are recorded on the same recording medium, which makes the original text and the recognition result easier to manage.
Alternatively, by recording the recognition result on the recording/reproducing device (2) while the recorded text is being reproduced and input, the storage device becomes unnecessary.
In either case, by using a multi-track recording/reproducing device (2), the recognition result can be recorded on a track on which no voice is recorded while the recorded voice is being played back.
(G) Effect of the Invention
Because the speech recognition system of the present invention stores recognition results such as character strings in the recording/reproducing device in which the original voice is stored, a storage device such as the conventional floppy disk used only for storing recognition results becomes unnecessary, and the apparatus can be made smaller and less expensive.
In addition, because the recognition result is stored on the same storage medium as the original text (recorded text), the original text and the recognition result can be stored as a set, and file management becomes easier.

[Brief Description of the Drawings]
FIG. 1 is an external view of a dictating machine employing the speech recognition system of the present invention; FIG. 2 is a configuration diagram of the dictating machine; FIG. 3 is a configuration diagram of the speech recognition unit (1); FIG. 4 is a configuration diagram of the preprocessing unit (11); FIG. 5 is a configuration diagram of the feature extraction unit (12); FIG. 6 is a configuration diagram of the word recognition unit (13); FIG. 7 is a configuration diagram of the phrase recognition unit (14); FIG. 8 is a configuration diagram of the input switching unit (4); FIG. 9 is a diagram showing the relationship between headwords, the recording method, and character sounds; FIG. 10 is a diagram showing the relationship between the recording method of character sounds and voice sections; FIG. 11 is a diagram showing the recording method when the recording/reproducing device is of the multi-track type; FIG. 12 is a diagram showing the recording method when the recording/reproducing device is of the single-track type; FIG. 13 is a diagram showing an example of a frequency correction circuit; FIG. 14 is a diagram of correction at the time of misrecognition; FIG. 15 is a diagram showing the candidate buffer (15a) in the candidate creating unit (15); FIG. 16 is a diagram showing an example of correcting several syllables at the time of misrecognition; FIG. 17 is an explanatory diagram of the PCM recording method; FIG. 18 is an explanatory diagram of the AGC operation; and FIG. 19 is an explanatory diagram of the delimiter counter.
(1) ... speech recognition unit, (2) ... recording/reproducing device, (3) ... microphone, (6) ... display device, (7) ... keyboard, (8) ... storage device, (11) ... preprocessing unit, (12) ... feature extraction unit, (13) ... word recognition unit, (14) ... phrase recognition unit, (11a) ... variable gain amplifier, (11b) ... sound pressure fluctuation memory.

Continuation of front page
(72) Inventor: Noritaka Mori, c/o Sanyo Electric Co., Ltd., 2-18 Keihan-Hondori, Moriguchi-shi, Osaka
(56) References: JP-A-55-7758 (JP, A); JP-A-62-113264 (JP, A); JP-A-54-136134 (JP, A); JP-A-62-65285 (JP, A); JP-U-61-140405 (JP, U); JP-U-62-22700 (JP, U)
(58) Fields searched (Int. Cl.⁶, DB name): G10L 3/00 551; G10L 3/00 561

Claims

(57) [Claims] A speech recognition system comprising a recording/reproducing device that records input voice and a speech recognition device that recognizes a sentence recorded on the recording/reproducing device, wherein the recognition result from the speech recognition device is stored in the recording/reproducing device.