JP2723214B2 - Voice document creation device

Info

Publication number: JP2723214B2
Application number: JP60253206A
Authority: JP
Inventors: 洋一竹林; 宏之坪井; 博史金沢
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1985-11-12
Filing date: 1985-11-12
Publication date: 1998-03-09
Anticipated expiration: 2013-03-09
Also published as: JPS62113264A

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]
The present invention relates to a voice document creation device capable of creating documents efficiently by voice input.

[Technical Background of the Invention and Its Problems]
Documents have traditionally been created by writing on manuscript paper with a writing instrument. Busy authors, on the other hand, create documents by dictating to a stenographer, or by recording the document content by voice on a tape recorder or the like and having a secretary transcribe it.

More recently, with the spread of Japanese word processors, documents have come to be created easily by, for example, batch input of document information through keyboard operation.

However, the most natural means of inputting document information is voice input, which requires no cumbersome operation of a keyboard or the like.
Accordingly, various attempts have been made to develop document creation devices based on voice input, that is, voice word processors.

Research toward the practical use of this type of voice word processor is being advanced against the background of developments in information processing technology and semiconductor manufacturing technology.

Voice recognition technology, however, has so far been put to practical use only as speaker-dependent recognition in, for example, product management processes in factories, and as speaker-independent recognition for limited recognition targets in telephone services and the like. Voice recognition must take into account the various factors that cause input voice to vary. Various refinements to recognition methods have therefore been tried in order to improve recognition performance, and that performance is gradually improving.

However, just as humans mishear one another in conversation, it is extremely difficult for a voice recognition device to achieve a perfect 100% recognition rate. Consequently, a document created by voice input requires even stricter checking of its content than a document created by handwriting.

[Object of the Invention]
The present invention has been made in view of these circumstances. Its object is to provide a voice document creation device that can create documents effectively by voice input and that allows the created document to be checked and edited easily and efficiently.

[Summary of the Invention]
A voice document creation device according to the present invention comprises: means for analyzing voice uttered and input in a predetermined language unit; means for comparing the analysis result with a voice dictionary to obtain recognition candidates for each predetermined voice processing unit; means for comparing the sequence of recognition candidates with a language dictionary to obtain a recognition result for the input voice of the predetermined language unit; first storage means for storing the recognition result; second storage means for storing the input voice data corresponding to the recognition result; correspondence management means for managing the recognition result stored in the first storage means and the input voice data stored in the second storage means in association with each other in a predetermined language processing unit; display means for displaying the recognition result stored in the first storage means; voice output instruction means for designating, in the predetermined language processing unit, a desired portion of the displayed recognition result for which the input voice is to be referenced; and means for selectively reading out, and outputting as voice, the input voice data stored in the second storage means in association with the designated portion of the recognition result.

[Effects of the Invention]
According to the present invention, the input voice from which a recognition result was obtained can be played back and monitored in order to check that recognition result, so that document information created by voice input can be checked and corrected easily and efficiently. Moreover, because the input voice is stored in association with its recognition result and can be played back as needed, the voice for a document can be input in a single batch, which improves document creation efficiency. These are substantial practical benefits.

[Embodiment of the Invention]
An apparatus according to one embodiment of the invention is described below with reference to the drawing.

The figure is a schematic configuration diagram of the apparatus of the embodiment. Reference numeral 1 denotes a voice input unit comprising a microphone, an amplifier, and the like. Voice information input from the voice input unit 1 is passed to the voice detection unit 2, which detects voice sections. The control unit 3 controls execution of the voice recognition processing described below in accordance with this voice section detection information.

When voice to be recognized is input from the voice input unit 1, the voice analysis unit 4, which consists of, for example, a bank of multi-channel band-pass filters, obtains feature parameters of the input voice by, for example, detecting its spectral components. The voice recognition unit 5 extracts a feature vector from the time series of these feature parameters and recognizes the input voice by, for example, comparing the feature vector with the voice dictionary 6 and calculating a similarity for each recognition target category.
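The matching step performed by the voice recognition unit 5 can be illustrated with a minimal sketch. The patent does not specify the similarity measure, so cosine similarity is assumed here purely for illustration; the function names, the toy dictionary, and the 4-channel feature vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a silent (all-zero) vector matches nothing
    return dot / (norm_a * norm_b)


def rank_candidates(feature_vector, voice_dictionary, top_n=3):
    """Return up to top_n (category, similarity) pairs, best match first."""
    scored = [
        (category, cosine_similarity(feature_vector, reference))
        for category, reference in voice_dictionary.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical dictionary: one reference pattern per recognition category.
voice_dictionary = {
    "a": [1.0, 0.0, 0.0, 0.0],
    "i": [0.0, 1.0, 0.0, 0.0],
    "u": [0.0, 0.0, 1.0, 0.0],
}
candidates = rank_candidates([0.9, 0.1, 0.0, 0.0], voice_dictionary, top_n=2)
```

Returning several ranked candidates rather than a single best match reflects the description, in which a later language processing stage chooses among recognition candidates.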
The language processing unit 7 checks the sequence of recognition results obtained in this way linguistically by referring to the language dictionary 8. For example, from the combinations of multiple recognition candidates it selects a recognition result sequence that is linguistically valid, and obtains that sequence as the document information represented by the input voice.

The recognition results obtained in this way are stored sequentially in the document file 10 under the management of the file management unit 9.

Meanwhile, when a recognition result for input voice is stored in the document file 10, the input voice from which that result was obtained, or voice data produced by analyzing that input voice, is stored in the voice file 11 in association with the recognition result. The voice file 11, for example, digitizes and stores the input voice data, and selectively reads out the stored voice data for playback.

Once the voice for a document has been input in a batch and recognition processing of that input voice is complete, the recognition results (the created document information) stored in the document file 10 are displayed on the display unit 12, which forms the interactive interface with the operator, and are checked.

If the operator wishes to refer to the input voice behind a displayed recognition result, the voice output instruction unit 13 is used to specify the recognition result in question and to issue an instruction to reference its input voice. This instruction information is given to the control unit 3 and also to the voice file 11. As a result, under the control of the file management unit 9, the input voice data corresponding to the specified recognition result is read out from the voice file 11 to the voice output unit 14, and the input voice is played back. By listening to the played-back voice, the operator judges whether the recognition result is correct and corrects it as necessary. A recognition result is corrected by, for example, entering the correct recognition category from a keyboard; this input data corrects the corresponding recognition result stored in the document file 10.

If the document information (recognition results) stored in the document file 10 needs to be edited in light of the voice output from the voice output unit 14, the editing information is entered from the editing information input unit 15. The document information stored in the document file 10 is then edited in accordance with this editing information under the control of the file management unit 9. In this case, as the recognition results stored in the document file 10 are edited, the voice data stored in the voice file 11 is also edited correspondingly.

In addition, the apparatus can, as needed, read the recognition results stored in the document file 10 into the rule synthesis unit 16, which performs synthesis by rule on them to obtain voice data. The synthesized voice data can then be output as voice through the voice output unit 17. This function is used when the results of recognition processing are to be checked by ear.
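The correspondence between the document file 10 and the voice file 11 can be pictured as two parallel stores indexed by language processing unit. The following is a minimal sketch under that assumption; the class `AlignedStore`, its method names, and the use of raw byte strings for voice data are hypothetical and are not taken from the patent.

```python
class AlignedStore:
    """Keeps recognition results and input voice data aligned per unit."""

    def __init__(self):
        self.document_file = []  # recognition result for each unit
        self.voice_file = []     # input voice data for the same unit

    def append(self, recognized_text, voice_data):
        """Store a recognition result together with the voice behind it."""
        self.document_file.append(recognized_text)
        self.voice_file.append(voice_data)

    def voice_for(self, unit_index):
        """Return the input voice data behind the selected unit."""
        return self.voice_file[unit_index]

    def correct(self, unit_index, corrected_text):
        """Fix a misrecognized unit; its original voice stays attached."""
        self.document_file[unit_index] = corrected_text

    def delete(self, unit_index):
        """Delete a unit from both files so the indices stay aligned."""
        del self.document_file[unit_index]
        del self.voice_file[unit_index]


store = AlignedStore()
store.append("kyou", b"\x01\x02")
store.append("wa", b"\x03")
store.append("ame", b"\x04\x05")

store.correct(2, "hare")  # operator listened to unit 2 and fixed the text
store.delete(1)           # editing the document also edits the voice file
```

Because every edit is applied to both lists at the same index, a unit selected on the display always maps to the voice data from which it was recognized, even after deletions.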
For example, the sequence of recognition results stored in the document file 10 is synthesized by rule and output one language processing unit at a time. If, during this output, the operator instructs reference to the input voice behind a recognition result that has just been synthesized and output, the control unit 3 cuts off the synthesized voice output of the recognition results. Playback then starts from the input voice data several language processing units before the unit at which the output was cut off.

In other words, recognition results are read from the document file 10 one predetermined language unit at a time under the control of the file management unit 9 and output as synthesized voice. When reference to the input voice is instructed during this output, reading of recognition results from the document file 10 is replaced by reading of input voice data from the voice file 11. Reading from the voice file 11 begins at the language processing unit several units before the unit at which the stop was instructed.

As a result, after hearing the voice output of a recognition result, the operator can repeatedly listen to the input voice from which that result was obtained, and can compare the two.

As described above, when this apparatus recognizes the input voice and stores the recognition results sequentially in the document file 10, it simultaneously stores the input voice data from which those results were obtained in the voice file 11, and manages the two in mutual correspondence in a predetermined language processing unit. When checking a recognition result, the input voice behind it can therefore be referenced easily, and the input voice can be referenced at will in units of the predetermined language processing unit.

Hence, after the voice data for a document has been input in a batch and recognized, the recognition results can be checked simply and effectively by referring to the input voice as needed. Not only can the speaker check the recognition results personally, but a third party such as a secretary can also check the created document. These are substantial practical benefits.

Needless to say, when analyzed input voice data is stored in the voice file 11, processing to resynthesize the input voice from that analyzed data is required. The recognition method for the input voice, the storage format of the input voice data, and the like may be determined according to the specifications of the apparatus. In short, the present invention can be implemented with various modifications without departing from its gist.
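The hand-off from synthesized readback to playback of the original input voice, described in the embodiment above, amounts to an index calculation. The sketch below is illustrative only: the patent says playback resumes "several units" back without fixing the number, so the default `rewind_units=2`, the function name `playback_indices`, and the choice to continue playback to the end of the document are all assumptions.

```python
def playback_indices(interrupted_unit, total_units, rewind_units=2):
    """Indices of the voice-file units to replay after an interruption.

    interrupted_unit: index of the unit being synthesized when the
        operator asked to hear the input voice.
    total_units: number of language processing units in the document.
    rewind_units: how many units to step back (hypothetical default).
    """
    if not 0 <= interrupted_unit < total_units:
        raise ValueError("interrupted_unit out of range")
    # Step back a few units, but never before the start of the document.
    start = max(0, interrupted_unit - rewind_units)
    return list(range(start, total_units))


# Readback was interrupted at unit 5 of an 8-unit document.
replay = playback_indices(5, 8)
```

Stepping back a few units gives the operator some context before the doubtful passage, which is presumably why the description rewinds rather than resuming at the exact point of interruption.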

[Brief Description of the Drawings]
The figure is a schematic configuration diagram of an apparatus according to one embodiment of the present invention.

[Reference Signs List]
1 ... voice input unit
2 ... voice detection unit
3 ... control unit
4 ... voice analysis unit
5 ... voice recognition unit
6 ... voice dictionary
7 ... language processing unit
8 ... language dictionary
9 ... file management unit
10 ... document file
11 ... voice file
12 ... display unit
13 ... voice output instruction unit
14 ... voice output unit
15 ... editing information input unit
16 ... rule synthesis unit
17 ... voice output unit

Claims

(57) [Claims]

1. A voice document creation device comprising: means for analyzing voice uttered and input in a predetermined language unit; means for comparing the analysis result of the voice with a voice dictionary to obtain recognition candidates for a predetermined voice processing unit; means for comparing the sequence of recognition candidates with a language dictionary to obtain a recognition result for the input voice of the predetermined language unit; first storage means for storing the recognition result; second storage means for storing input voice data corresponding to the recognition result; correspondence management means for managing the recognition result stored in the first storage means and the input voice data stored in the second storage means in association with each other in a predetermined language processing unit; display means for displaying the recognition result stored in the first storage means; voice output instruction means for designating, in the predetermined language processing unit, a desired portion of the recognition result displayed by the display means for which the input voice is to be referenced; and means for selectively reading out, and outputting as voice, the input voice data stored in the second storage means in association with the portion of the recognition result designated by the voice output instruction means.

2. The voice document creation device according to claim 1, wherein the predetermined language unit in which voice is input is a word, a phrase, a clause, a sentence, or the like.

3. The voice document creation device according to claim 1, wherein, when the recognition result stored in the first storage means is edited, the correspondence management means causes the input voice data stored in the second storage means to be edited in accordance with that recognition result.