JP2001067097A

JP2001067097A - Document preparation device and document preparing method

Info

Publication number: JP2001067097A
Application number: JP24562099A
Authority: JP
Inventors: Shoichi Matsunaga; 昭一松永; Yoshiaki Noda; 喜昭野田; Katsutoshi Ofu; 克年大附
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-08-31
Filing date: 1999-08-31
Publication date: 2001-03-16

Abstract

PROBLEM TO BE SOLVED: To provide a document creation device that uses speech recognition and can use it efficiently even when sufficient text cannot be collected. SOLUTION: A speech recognition part, which analyzes input speech into a group of acoustic feature parameters and performs recognition based on the feature parameter information and on linguistic information, has a continuous speech recognition part 3 for recognizing continuously uttered speech and a word speech recognition part 4 for recognizing speech uttered as a word. The user creates a document while switching arbitrarily between the two recognition parts by selection with a mouse or keyboard, by a voice command, or the like.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a document creation device that uses a plurality of speech recognition functions having different properties.

[0002]

[Prior Art] Conventionally, continuous speech recognition has been widely used for creating documents by dictation with a speech recognition function. Continuous speech recognition operates using an acoustic model, which holds information about voice quality, and linguistic information about words. Hidden Markov models (HMMs) are widely used as the acoustic model, and statistical models of the order of occurrence of word pairs (bigrams) and of word triples (trigrams) are widely used as the language model (see, for example, Seiichi Nakagawa, "Speech Recognition by Probabilistic Models," The Institute of Electronics, Information and Communication Engineers, July 1988).

[0003] To build such a language model for Japanese, text similar to the desired text (the text to be entered) is collected and segmented into words, and the word-chain statistics of that text are then computed and used as the language model.
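The word-chain statistics described here can be illustrated with a minimal sketch. It assumes the collected text has already been segmented into words (the segmentation step is not shown) and uses plain maximum-likelihood bigram estimates without smoothing. The function names `bigram_counts` and `bigram_prob` and the toy English corpus are illustrative and do not come from the patent.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count history words and word pairs over pre-segmented sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        unigrams.update(padded[:-1])             # every word that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent word pairs
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


# Toy corpus standing in for the collected, already segmented text.
corpus = [
    ["initial", "evaluation", "was", "performed"],
    ["initial", "evaluation", "is", "pending"],
]
unigrams, bigrams = bigram_counts(corpus)
```

With this corpus, `bigram_prob(unigrams, bigrams, "evaluation", "was")` is 0.5, while a word that never follows "evaluation" in the collected text receives probability 0.0.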

[0004]

[Problem to Be Solved by the Invention] In the conventional dictation described above, when little text can be collected, the word chains in that text do not yield sufficient statistics, and for some words (for example, proper nouns such as personal names) the statistics do not work effectively. Recognition performance for such words has therefore remained insufficient.

[0005] An object of the present invention is to solve the above problem, which arises from using only the continuous speech recognition function while creating text by voice input, and to provide a document creation device that uses speech recognition functions efficiently.

[0006]

[Means for Solving the Problem] The document creation device according to claim 1 of the present invention is a speech recognition device that analyzes input speech into a group of acoustic feature parameters and performs recognition based on the feature parameter information and on linguistic information. It is characterized in that it has a continuous speech recognition unit that recognizes continuously uttered speech and a word speech recognition unit that recognizes speech uttered as a word, and in that the user creates a document while switching arbitrarily between the two recognition units by selection with a mouse or keyboard, or by input such as a voice command.

[0007] The document creation device according to claim 2 is the document creation device according to claim 1, characterized in that, in the word speech recognition, the user arbitrarily selects one word dictionary from among a plurality of word dictionaries. The document creation device according to claim 3 is the document creation device according to claim 2, characterized in that an all-word dictionary, in which the word entries of all the other word dictionaries are registered, is registered as one of the word dictionaries.

[0008] The document creation method according to claim 4 is characterized in that, in the document creation device according to claim 1, while the user is creating a document with the continuous speech recognition unit, a voice command interrupts continuous speech recognition and activates the word recognition unit, and continuous speech recognition can be resumed after word recognition ends.

[0009]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram of the document creation device of the present invention. The document creation device 1 comprises input means 7 consisting of a microphone 7-1, a keyboard 7-2, a mouse 7-3, and the like; a main editor 2 provided with an acoustic model 5 and a speaker adaptation unit 6; a continuous speech recognition unit 3 provided with a word n-gram (a chain of n elements) dictionary 3-1; a word speech recognition unit 4 provided with a plurality of word dictionaries 4-1; a display device 8; and so on.

[0010] When speech is input, the main editor 2 analyzes it into a group of acoustic feature parameters using the acoustic model 5 and the speaker adaptation unit 6. Continuously uttered speech is recognized by the continuous speech recognition unit 3 using the word n-gram dictionary 3-1, and speech uttered as a word is recognized by the word speech recognition unit 4 using the word dictionaries 4-1. The main editor 2 creates the document from the recognition results and shows it on the display device 8. The main editor 2 also has a function for switching between the continuous speech recognition unit 3 and the word speech recognition unit 4 in response to input from the input means 7, and a function for selecting among the plurality of word dictionaries 4-1 of the word speech recognition unit 4.
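The routing role of the main editor can be sketched as follows. This is only an illustration under stated assumptions: acoustic analysis is omitted, utterances are represented as plain strings, and the class `MainEditor`, its methods, and the two stub recognizers are hypothetical names chosen to mirror reference numerals 2, 3, and 4.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a recognizer is any callable that maps an utterance to text.
Recognizer = Callable[[str], str]


@dataclass
class MainEditor:
    """Sends each utterance to the selected recognizer and appends the result."""
    recognizers: Dict[str, Recognizer]
    mode: str = "continuous"
    document: List[str] = field(default_factory=list)

    def switch(self, mode: str) -> None:
        """Select which recognition unit handles the next utterance."""
        if mode not in self.recognizers:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def on_speech(self, utterance: str) -> str:
        """Recognize one utterance in the current mode and add it to the document."""
        text = self.recognizers[self.mode](utterance)
        self.document.append(text)
        return text

    def text(self) -> str:
        return "".join(self.document)


# Hypothetical stubs standing in for recognition units 3 and 4.
def continuous_stub(utterance: str) -> str:
    return f"[cont:{utterance}]"


def word_stub(utterance: str) -> str:
    return f"[word:{utterance}]"


editor = MainEditor({"continuous": continuous_stub, "word": word_stub})
editor.on_speech("a")
editor.switch("word")
editor.on_speech("b")
```

Keeping the recognizers behind a common callable interface means the editor does not need to know how either unit works, only which one is currently selected.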

[0011] FIG. 2 is a conceptual illustration of the document creation device according to the present invention. It shows input speech being converted into a character sequence by the speech recognition function to build up the desired document. On the display device 8, the upper part is the operation area for invoking document creation commands and the lower part shows the recognition result; here the document "初期評価実施致しましたので連絡します。" ("We have carried out the initial evaluation and are writing to inform you.") is being created. In one embodiment of the present invention, S1: a continuous speech recognition start/end button (for example, on the keyboard) starts and ends continuous speech recognition, and S2: a word speech recognition start/end button starts and ends word speech recognition (the button is pressed once to start and pressed again to end). That is, with respect to the invention of claim 1, the document is created by combining continuous speech recognition and word speech recognition in this way. Continuous speech recognition is used for text such as everyday, frequently used expressions, and word speech recognition is used to enter proper nouns that appear only in special cases (for example, unusual personal or place names). Compared with conventional document creation using continuous speech recognition alone, this allows documents to be created more efficiently and with less burden on the user.
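The press-to-start, press-again-to-end behavior of the S1 and S2 buttons can be sketched as a small toggle. The class name `RecognitionButtons` is hypothetical, and one detail is an assumption not stated in the text: pressing the other mode's button while one mode is running switches directly to that other mode.

```python
class RecognitionButtons:
    """Tracks which recognizer, if any, is currently listening."""

    MODES = ("continuous", "word")

    def __init__(self):
        self.active = None  # None means no recognizer is running

    def press(self, mode):
        """Press the start/end button for `mode` and return the active mode."""
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        # Pressing the running mode's button ends it; otherwise that mode starts
        # (assumption: this replaces whichever mode was running before).
        self.active = None if self.active == mode else mode
        return self.active
```

For example, pressing "continuous" twice in a row starts and then ends continuous speech recognition, leaving no recognizer active.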

[0012] As another approach in this embodiment, FIG. 3 shows selection from an operation menu. For example, the user selects the "Insert" menu with the mouse or the like and then begins input by choosing S3: the "Start continuous speech recognition" item or S4: the "Start word speech recognition" item. During continuous speech input after the menu selection, the "Start continuous speech recognition" item becomes "End continuous speech recognition" and the "Start word speech recognition" item becomes "End word speech recognition". Selecting the item again ends speech input.

[0013] In another embodiment, operation is by voice: by saying "start continuous speech recognition" or "start word speech recognition" (voice input), the two recognizers can be used efficiently as needed. As for the word speech recognition of claim 2, selecting the word dictionary to be used, either before or after word recognition is selected, narrows the recognition vocabulary and improves recognition performance.

[0014] In FIG. 4, pressing the ▼ button of the word dictionary field brings up the list of word dictionary types, from which one is selected. In FIG. 4, word dictionary A is a personal name dictionary, word dictionary B is a place name dictionary, and word dictionary C is a fish name dictionary. When the user says "いとー" (itō), the output is "伊藤" or "伊東" if personal name dictionary A is selected, "伊東" if place name dictionary B is selected, and "伊富" if fish name dictionary C is selected. FIG. 4 shows word dictionary A selected, as indicated by its inverted (highlighted) display.
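The effect of dictionary selection on the "いとー" example can be sketched with a simple lookup. This is a simplification: a real word recognizer would score acoustic matches against the selected vocabulary, whereas here each dictionary is a mapping from a romanized reading to its written forms. The names `WORD_DICTIONARIES` and `recognize_word` and the reading key "ito" are illustrative assumptions.

```python
# Each dictionary maps a romanized reading to its candidate written forms.
WORD_DICTIONARIES = {
    "A": {"ito": ["伊藤", "伊東"]},  # personal names
    "B": {"ito": ["伊東"]},          # place names
    "C": {"ito": ["伊富"]},          # fish names
}


def recognize_word(reading, dictionary):
    """Return the candidates for `reading` from the selected dictionary only."""
    return WORD_DICTIONARIES[dictionary].get(reading, [])
```

Restricting the lookup to one dictionary is what narrows the vocabulary: the same reading yields different candidates depending on whether A, B, or C is selected.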

[0015] As for the all-word dictionary of claim 3, in the embodiment of claim 2 the user selects from four dictionaries, namely word dictionary A, word dictionary B, word dictionary C, and the all-word dictionary, as shown in FIG. 5. All the words of word dictionaries A, B, and C are registered in the all-word dictionary. It is used when the user does not know which dictionary contains the word to be output. Recognition accuracy and processing time are worse than when the appropriate word dictionary is chosen, but the word can be output reliably. In the example above, saying "いとー" (itō) outputs the candidates "伊東", "伊藤", and "伊富", and the user chooses among them.
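One way to picture the all-word dictionary is as the union of the entries of the other dictionaries. The sketch below assumes each dictionary is a mapping from a romanized reading to a list of written forms; the function name `build_all_word_dictionary` and this data layout are illustrative assumptions, not the patent's implementation.

```python
def build_all_word_dictionary(dictionaries):
    """Merge all entries of the given dictionaries, dropping duplicate words."""
    merged = {}
    for entries in dictionaries.values():
        for reading, words in entries.items():
            bucket = merged.setdefault(reading, [])
            for word in words:
                if word not in bucket:  # a word shared by two dictionaries appears once
                    bucket.append(word)
    return merged


dictionaries = {
    "A": {"ito": ["伊藤", "伊東"]},  # personal names
    "B": {"ito": ["伊東"]},          # place names
    "C": {"ito": ["伊富"]},          # fish names
}
all_words = build_all_word_dictionary(dictionaries)
candidates = all_words["ito"]
```

Here `candidates` contains all three written forms, so the user has to pick one, which reflects the trade-off described above: a larger vocabulary guarantees coverage at the cost of more ambiguity.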

[0016] As for the document creation method using the speech recognition functions according to claim 4, while a document is being created with continuous speech recognition, the user utters a voice command such as "word recognition" to interrupt continuous speech recognition and activate the word recognition unit. When word speech recognition ends, the earlier continuous speech recognition is resumed, so that text is created efficiently. FIG. 6 shows the procedure for creating, for example, the sentence "都市の名前：伊東 静岡県にある都市" ("City name: Ito, a city in Shizuoka Prefecture").

[0017] First, continuous speech recognition is started, and S7: the user says "としのなまえころん" (toshi no namae koron, "city name, colon"). At this point "都市の名前：" is displayed. Then, with the place name word dictionary selected, the user S9: says "単語認識モード" ("word recognition mode") and S10: says "いとー" (itō). The word recognition unit is activated automatically, the utterance "いとう" is recognized, and S11: "都市の名前：伊東" is output. At this point processing has already returned to continuous speech recognition, so when the user S12: says "すぺーすしずおかけんにあるとし" (supēsu Shizuoka-ken ni aru toshi, "space, a city in Shizuoka Prefecture"), "都市の名前：伊東 静岡県にある都市" is displayed, and ending continuous speech recognition completes the desired sentence. Calling word speech recognition in the middle of continuous speech recognition in this way makes it possible to create text efficiently.
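The interrupt-and-resume flow of this example can be sketched as a small loop. Several things here are assumptions for illustration: utterances are romanized strings, the two recognizers are lookup stubs, the function name `dictate` is hypothetical, and word mode is taken to end after exactly one recognized word, as the S10 to S12 sequence suggests.

```python
def dictate(utterances, recognize_continuous, recognize_word,
            command="tango ninshiki moodo"):
    """Build a document, passing one utterance to the word recognizer after
    each voice command and then resuming continuous recognition."""
    document = []
    word_mode = False
    for utterance in utterances:
        if word_mode:
            document.append(recognize_word(utterance))
            word_mode = False  # word recognition finished: resume continuous mode
        elif utterance == command:
            word_mode = True   # voice command interrupts continuous recognition
        else:
            document.append(recognize_continuous(utterance))
    return "".join(document)


# Hypothetical lookup stubs standing in for the two recognition units.
continuous_stub = {
    "toshi no namae koron": "都市の名前：",
    "supeesu shizuoka-ken ni aru toshi": " 静岡県にある都市",
}.__getitem__
place_names_stub = {"ito": "伊東"}.__getitem__

result = dictate(
    ["toshi no namae koron", "tango ninshiki moodo", "ito",
     "supeesu shizuoka-ken ni aru toshi"],
    continuous_stub,
    place_names_stub,
)
```

The command utterance itself is never added to the document; it only changes which recognizer handles the next utterance.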

[0018] As described above, by providing a document creation device equipped with both a continuous speech recognition function and a word speech recognition function, the user can create documents efficiently by switching between the two functions.

[0019]

[Effects of the Invention] As described in detail above, according to the present invention, a document creation device (dictation device) using speech recognition creates documents while switching efficiently between a continuous speech recognition function and a word speech recognition function, and thereby offers better usability (convenience) than a conventional document creation device that uses only the continuous speech recognition function.

[Brief description of the drawings]

FIG. 1 is a block diagram of the document creation device of the present invention.

FIG. 2 is an illustration of switching between the continuous speech recognition unit and the word speech recognition unit with buttons, together with an example of document creation.

FIG. 3 is an illustration of switching between the continuous speech recognition unit and the word speech recognition unit with menus, together with an example of document creation.

FIG. 4 is a diagram showing the selection menu for a plurality of word dictionaries.

FIG. 5 is a diagram showing the word dictionary menu, including the all-word dictionary, used in word speech recognition.

FIG. 6 is a diagram showing the procedure for creating a document by performing word speech recognition for part of the text while performing continuous speech recognition.

[Explanation of symbols]

1 Document creation device
2 Main editor
3 Continuous speech recognition unit
3-1 Word n-gram dictionary
4 Word speech recognition unit
4-1 Word dictionary
5 Acoustic model
6 Speaker adaptation unit
7 Input means
8 Display device

Continued from the front page

(51) Int.Cl.7, identification symbol, FI, theme code (reference): G10L 15/10 G10L 3/00 531M 15/00 551B 15/22 571V

(72) Inventor: Katsutoshi Ohtsuki, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo

F-terms (reference): 5B009 KB01 MB06 ME12; 5D015 AA01 BB01 HH06 HH12 KK01 KK03 LL08 LL10

Claims

[Claims]

1. A document creation device comprising: an analysis unit that analyzes input speech into a group of acoustic feature parameters; a speech recognition unit having a speech recognition function that performs recognition based on the feature parameter information and on linguistic information; and a document creation unit that creates a document using the recognition results, wherein the speech recognition unit has a continuous speech recognition unit that recognizes continuously uttered speech and a word speech recognition unit that recognizes speech uttered as a word, the device further comprises input means that allows a user to switch arbitrarily between the two recognition units, and the document is created while the user switches arbitrarily between the two recognition units.

2. The document creation device according to claim 1, wherein the word speech recognition unit comprises a plurality of word dictionaries and word dictionary selection means that allows a user to arbitrarily select one of the plurality of word dictionaries.

3. The document creation apparatus according to claim 2, wherein one of the plurality of word dictionaries includes an all word dictionary in which word items of all other word dictionaries are registered.

4. A document creation method for a document creation device comprising: an analysis unit that analyzes input speech into a group of acoustic feature parameters; a continuous speech recognition unit that has a speech recognition function for performing recognition based on the feature parameter information and on linguistic information and that recognizes continuously uttered speech; a word speech recognition unit that recognizes speech uttered as a word; input means that allows a user to switch arbitrarily between the two recognition units; and a document creation unit that creates a document using the recognition results while the user switches arbitrarily between the two recognition units, the method comprising: procedure 1 of creating a document using the continuous speech recognition unit; procedure 2 of interrupting the continuous speech recognition of the continuous speech recognition unit and activating the word speech recognition unit to perform word speech recognition; and procedure 3 of resuming the continuous speech recognition after the word speech recognition ends.