JP2001092494A

JP2001092494A - Device and method for recognizing speech, and speech recognition program recording medium

Info

Publication number: JP2001092494A
Application number: JP27119799A
Authority: JP
Inventors: Hirotaka Goi (伍井 啓恭); Yoshiharu Abe (阿部 芳春)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1999-09-24
Filing date: 1999-09-24
Publication date: 2001-04-06
Anticipated expiration: 2019-09-24
Also published as: JP3976959B2

Abstract

PROBLEM TO BE SOLVED: To make it easy to assign an accurate reading to the syllable string that matches an unknown word extracted during speech recognition processing. SOLUTION: The speech recognition device includes an unknown word syllable estimating device 8. The device 8 generates various syllable string candidates for an unknown word by referring to a sub-word dictionary 11, in which various readings of the sub-words that make up words are registered as syllable strings, and by combining the syllable strings of the sub-words that make up the unknown word. It then refers to a difference table 12, which evaluates the degree of similarity between two syllable strings, to find the candidate most similar to the recognized syllable string corresponding to the unknown word. Finally, it estimates this maximum likelihood candidate to be the syllable string that matches the unknown word.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to speech recognition. More particularly, it relates to a speech recognition apparatus, a speech recognition method, and a speech recognition program recording medium that extract an unknown word from an utterance containing it and estimate the syllable string of a reading that matches the unknown word.

[0002]

2. Description of the Related Art
Speech input is a useful way to enter Japanese documents, but its recognition accuracy must improve before it is practical. In particular, methods that use a word dictionary have been studied as a way to raise the recognition accuracy of input speech. However, a word dictionary can hold only a limited number of words, so such methods find it extremely difficult to recognize newly coined words (unknown words) correctly.

JP-A-2-163874, for example, discloses the following approach. When the user inputs a confirmed character string, a candidate unknown-word character string is extracted using information such as character type. The unknown word is then either confirmed by the user or identified by accessing a large-scale dictionary, and it is newly registered in the word dictionary.

[0003] FIG. 23 is a block diagram showing the configuration of a conventional, general speech recognition apparatus with an unknown word extracting function. In the figure:
101 is a microphone;
102 is a syllable string calculating device;
103 is a word string calculating device;
104 is an output device;
105 is a correction device;
106 is a character type dividing device;
107 is an unknown word extraction device;
108 is a RAM;
109 is a word dictionary.

[0004] Next, the operation will be described. FIG. 24 is a flowchart of the process of extracting the syllables of an unknown word with the above speech recognition apparatus.

The process starts when the user speaks into the microphone 101 (step ST101). Speech is input through the microphone 101 (step ST102) and converted into an electrical signal inside the microphone 101 (step ST103). The syllable string calculating device 102 A/D-converts and quantizes the electrical signal and performs spectrum analysis of the speech pattern. It then connects the syllable-level recognition results to generate syllable string candidates and stores them in the RAM 108 (step ST104). The word string calculating device 103 calculates word string candidates for all of the syllable string candidates (step ST105). Next, the output device 104 selects the maximum likelihood syllable string candidate and word string candidate and outputs them (step ST106).

The user checks the display output of the output device 104. If the recognition result contains an error, the user corrects the misrecognized portion with the correction device 105 (step ST107). The correction device 105 receives this correction input and outputs the correct character string. The character type dividing device 106 receives the correct character string from the correction device 105, divides it by character type (hiragana, katakana, kanji, English letters, and so on), and outputs the character-type-divided string (step ST108).

The unknown word extraction device 107 receives the character-type-divided string. It searches the word dictionary 109 using partial strings of it as keys, and outputs as an unknown word any key string that is not registered in the word dictionary (step ST109). In this way the unknown words contained in the utterance are extracted, and the process ends (step ST110).
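Steps ST108 (character-type division) and ST109 (dictionary lookup) of this conventional apparatus can be sketched as follows. This is a minimal sketch under stated assumptions, not the cited apparatus:
- The Unicode block ranges used to classify characters are assumptions.
- The function names are hypothetical.
- Each whole run of one character type is used as the lookup key, rather than every partial string of it.

```python
from itertools import groupby


def char_type(ch: str) -> str:
    """Classify one character by script, using assumed Unicode block ranges."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"


def split_by_char_type(text: str) -> list[str]:
    """Step ST108 (sketch): split text into maximal runs of one character type."""
    return ["".join(run) for _, run in groupby(text, key=char_type)]


def extract_unknown_words(text: str, word_dictionary: set[str]) -> list[str]:
    """Step ST109 (sketch): report runs that are not in the word dictionary."""
    return [seg for seg in split_by_char_type(text) if seg not in word_dictionary]


if __name__ == "__main__":
    dictionary = {"の", "ロボット"}  # hypothetical dictionary contents
    print(split_by_char_type("三菱電機のロボット"))
    print(extract_unknown_words("三菱電機のロボット", dictionary))
```

With the hypothetical dictionary above, the string splits into 三菱電機 / の / ロボット, and only 三菱電機 is reported as unknown. As [0005] explains, this yields the unknown word's spelling but not its reading.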

【０００５】[0005]

Because the conventional speech recognition apparatus is configured as described above, it can extract an unknown word from the correct character string entered by the user. However, speech recognition also requires the syllable string of the reading that matches the unknown word.

When an unknown word is written in katakana or hiragana, a syllable string can sometimes be assigned. For kanji or English (Latin-alphabet) strings, however, it is difficult to assign an accurate syllable string. For kanji, one proposed method concatenates the syllable strings of each individual kanji to give a syllable string for the whole unknown word. Because a single kanji usually has many candidate syllable strings, though, it is difficult to select the correct syllable string accurately.

[0006] Kana words pose a similar problem. For the word 「ロウソク」 (rousoku, "candle"), applying the vowel-lengthening rule gives the matching syllable string #roosoku#. If the same rule is applied to the word 「シロウサギ」 (sirousagi, "white rabbit"), however, the result is #siroosagi#. This differs from the correct syllable string #sirousagi#, because the ウ (u) begins the morpheme ウサギ (usagi, "rabbit") and does not lengthen the preceding ロ (ro). Thus, even for kana, syllable strings cannot be assigned accurately simply by applying syllabification rules based on the written form.
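The failure described in [0006] can be reproduced with a toy rule. This sketch rests on three assumptions:
- The rule operates on romanized syllables rather than kana.
- The morpheme segmentation シロ + ウサギ is supplied by hand.
- The helper names are hypothetical.

The second function only shows why boundary information matters. It is not the method of the invention, which instead estimates readings from a subword dictionary and a difference table.

```python
def naive_lengthen(syllables: str) -> str:
    """Surface-only rule (sketch): read every 'ou' as a long vowel 'oo'."""
    return syllables.replace("ou", "oo")


def lengthen_within_morphemes(morphemes: list[str]) -> str:
    """Apply the same rule per morpheme, so it never spans a word boundary."""
    return "".join(naive_lengthen(m) for m in morphemes)


if __name__ == "__main__":
    print(naive_lengthen("rousoku"))    # roosoku   (matches #roosoku#)
    print(naive_lengthen("sirousagi"))  # siroosagi (wrong; correct is #sirousagi#)
    print(lengthen_within_morphemes(["siro", "usagi"]))  # sirousagi
```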

[0007]
SUMMARY OF THE INVENTION
The present invention has been made to solve the above problems. Its object is to provide a speech recognition apparatus, a speech recognition method, and a speech recognition program recording medium that can accurately assign the correct syllable string matching an unknown word extracted during speech recognition.

[0008]

A speech recognition apparatus according to the present invention comprises the following means:
- speech input means for receiving speech and generating a processable speech pattern represented by an electrical signal or the like;
- syllable string calculating means for performing syllable-level recognition on the speech pattern to calculate syllable string candidates corresponding to the speech;
- word string calculating means for calculating word string candidates corresponding to the syllable string candidates;
- output means for outputting, as the speech recognition result, at least the maximum likelihood recognized word string calculated by the syllable string calculating means and the word string calculating means;
- correction means with which the user enters a correct character string when the recognized word string displayed by the output means contains an error;
- morphological analysis means for performing morphological analysis on the entered correct character string;
- unknown word range extracting means for comparing the correct character string with the morphological analysis result to identify an unknown word and the recognized syllable string corresponding to it;
- unknown word syllable estimating means, which works in three stages. First, it generates various syllable string candidates for the unknown word by referring to a subword dictionary, in which various readings of the subwords that make up words are registered as syllable strings, and by combining the syllable strings of the subwords that make up the unknown word. Second, it refers to a difference table, which evaluates the degree of similarity between two syllable strings, to find the candidate most similar to the recognized syllable string corresponding to the unknown word. Third, it estimates this maximum likelihood candidate to be the syllable string that matches the unknown word.
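The core of the unknown word syllable estimating means can be sketched as shown here. The patent says only that the difference table gives the log-likelihood between two syllables. This sketch makes several assumptions:
- Table entries are treated as substitution costs (for example, negative log-likelihoods, so lower means more similar).
- The two syllable strings are aligned with a standard weighted edit distance; the patent does not specify the alignment method.
- The table values, the subword readings and all function names are hypothetical.

```python
from collections.abc import Iterator
from itertools import product

# Hypothetical subword dictionary: each subword maps to candidate readings,
# and each reading is a list of syllables.
SUBWORD_DICT: dict[str, list[list[str]]] = {
    "白": [["si", "ro"], ["si", "ra"], ["ha", "ku"]],
    "兎": [["u", "sa", "gi"], ["to"]],
}

# Hypothetical difference table: cost of one syllable being recognized as
# another (lower = more confusable). Pairs not listed cost 1.0.
DIFF_TABLE: dict[tuple[str, str], float] = {("ro", "ru"): 0.3, ("gi", "ki"): 0.4}

INDEL_COST = 1.0  # assumed cost of inserting or deleting one syllable


def sub_cost(a: str, b: str) -> float:
    """Substitution cost looked up symmetrically in the difference table."""
    if a == b:
        return 0.0
    return DIFF_TABLE.get((a, b), DIFF_TABLE.get((b, a), 1.0))


def distance(cand: list[str], recog: list[str]) -> float:
    """Weighted edit distance between two syllable strings (dynamic programming)."""
    n, m = len(cand), len(recog)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * INDEL_COST
    for j in range(1, m + 1):
        d[0][j] = j * INDEL_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + INDEL_COST,  # candidate syllable missing from recognition
                d[i][j - 1] + INDEL_COST,  # extra syllable in recognition
                d[i - 1][j - 1] + sub_cost(cand[i - 1], recog[j - 1]),
            )
    return d[n][m]


def candidates(
    subwords: list[str], subword_dict: dict[str, list[list[str]]]
) -> Iterator[list[str]]:
    """Combine one reading per subword into a full syllable-string candidate."""
    for readings in product(*(subword_dict[s] for s in subwords)):
        yield [syl for reading in readings for syl in reading]


def estimate_syllables(
    subwords: list[str],
    recognized: list[str],
    subword_dict: dict[str, list[list[str]]] = SUBWORD_DICT,
) -> list[str]:
    """Return the candidate closest to the recognized syllable string."""
    return min(candidates(subwords, subword_dict), key=lambda c: distance(c, recognized))


if __name__ == "__main__":
    recognized = ["si", "ru", "u", "sa", "ki"]  # noisy recognizer output for 白兎
    print(estimate_syllables(["白", "兎"], recognized))  # ['si', 'ro', 'u', 'sa', 'gi']
```

With these toy values, all 3 × 2 = 6 candidates are scored. si·ro·u·sa·gi wins with cost 0.7 (ro→ru costs 0.3 and gi→ki costs 0.4). Exhaustive enumeration grows multiplicatively with the number of subwords, which is the candidate explosion described in [0005]. Scoring against the recognized syllables is what lets the estimator choose among the candidates.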

[0009] The speech recognition apparatus according to the present invention comprises the following means:
- syllable string calculating means for performing syllable-level recognition on the speech pattern to calculate a plurality of syllable string candidates, ranked highest in likelihood, corresponding to the speech;
- word string calculating means for calculating a word string candidate for each of the plurality of syllable string candidates;
- output means for finding, among the combinations of syllable string candidates and word string candidates calculated by the syllable string calculating means and the word string calculating means, the combination with the largest linguistic likelihood. The syllable string candidate and word string candidate of that combination become the recognized syllable string and the recognized word string, respectively, and at least the recognized word string is output.
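The N-best selection in [0009] can be sketched as a search over all syllable/word pairs. This sketch makes three assumptions:
- Each pair already carries a language log-likelihood; the text does not say how that score is computed.
- The score values are hypothetical.
- The `select_best` name is hypothetical.

```python
# Hypothetical N-best structure: each syllable-string candidate maps to its
# word-string candidates, each paired with a language log-likelihood.
NBest = dict[str, list[tuple[str, float]]]


def select_best(nbest: NBest) -> tuple[str, str]:
    """Return the (syllable string, word string) pair with the largest language likelihood."""
    pairs = (
        (syllables, words, score)
        for syllables, word_cands in nbest.items()
        for words, score in word_cands
    )
    syllables, words, _ = max(pairs, key=lambda p: p[2])
    return syllables, words


if __name__ == "__main__":
    nbest = {
        "kyou wa ame": [("今日は雨", -4.1), ("京は雨", -7.9)],
        "kyou wa ane": [("今日は姉", -6.3)],
    }
    print(select_best(nbest))  # ('kyou wa ame', '今日は雨')
```

Because the winning pair fixes both outputs together, the recognized syllable string that is later compared with the unknown word ([0008]) stays consistent with the displayed word string.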

[0010] The speech recognition apparatus according to the present invention comprises word registration means. This means registers, in a word dictionary, the unknown word identified by the unknown word range extracting means together with the matching syllable string estimated by the unknown word syllable estimating means.

[0011] The speech recognition apparatus according to the present invention comprises n-gram registration means. This means registers in the word dictionary, as an n-gram, the unknown word identified by the unknown word range extracting means together with the matching syllable string estimated by the unknown word syllable estimating means.

[0012] The speech recognition apparatus according to the present invention comprises second output means and second correction means. The second output means displays to the user a notation representing the unknown word identified by the unknown word range extracting means and the matching syllable string estimated by the unknown word syllable estimating means. If that notation is incorrect, the user enters the correct notation with the second correction means.

[0013] The speech recognition apparatus according to the present invention comprises variant notation registering means. For the unknown word identified by the unknown word range extracting means, this means registers in the word dictionary the matching syllable string estimated by the unknown word syllable estimating means. It also registers the same syllable string for variant notations (alternative spellings) of the unknown word.
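The variant notation registration of [0013] could look like the following. This sketch makes three assumptions:
- The text does not say where variant notations come from. For illustration only, the hiragana spelling of a katakana word is treated as one variant; kanji/kana variants would need a separate table, which is not shown.
- The dictionary layout is assumed.
- The function names are hypothetical.

```python
def katakana_to_hiragana(text: str) -> str:
    """Map katakana ァ..ヶ to the corresponding hiragana (code point offset 0x60)."""
    return "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text)


def register_with_variants(dictionary: dict[str, set[str]], word: str, syllables: str) -> None:
    """Register the estimated syllable string under the word and its assumed kana variant."""
    for variant in {word, katakana_to_hiragana(word)}:
        dictionary.setdefault(variant, set()).add(syllables)


if __name__ == "__main__":
    d: dict[str, set[str]] = {}
    register_with_variants(d, "シロウサギ", "sirousagi")
    print(sorted(d))  # ['しろうさぎ', 'シロウサギ']
```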

[0014] The speech recognition apparatus according to the present invention comprises syllable string registering means. For the unknown word identified by the unknown word range extracting means, this means determines whether a matching syllable string could be estimated.
- If it could, the estimated syllable string is registered in the word dictionary as the syllable string matching the unknown word.
- If it could not, the recognized syllable string corresponding to the unknown word, as identified by the unknown word range extracting means, is registered in the word dictionary as the matching syllable string instead.
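The fallback rule of [0014] reduces to a few lines. This sketch makes three assumptions:
- A syllable string that could not be estimated is represented as `None`, for example when a subword of the unknown word has no entry in the subword dictionary.
- The dictionary layout is assumed.
- The function name is hypothetical.

```python
from typing import Optional


def register_syllables(
    dictionary: dict[str, list[str]],
    unknown_word: str,
    estimated: Optional[str],
    recognized: str,
) -> str:
    """Register the estimated reading if one exists; otherwise fall back to the recognized syllables."""
    syllables = estimated if estimated is not None else recognized
    dictionary.setdefault(unknown_word, []).append(syllables)
    return syllables


if __name__ == "__main__":
    d: dict[str, list[str]] = {}
    register_syllables(d, "白兎", "sirousagi", "siruusaki")
    register_syllables(d, "新語", None, "sinngo")  # no estimate: keep what was heard
    print(d)  # {'白兎': ['sirousagi'], '新語': ['sinngo']}
```

Falling back to the recognized syllables means the unknown word is still added to the dictionary even when no estimate is available. The registered reading then reflects what the recognizer actually heard.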

[0015] The speech recognition apparatus according to the present invention comprises variant reading registering means. For the unknown word identified by the unknown word range extracting means, this means registers in the word dictionary the matching syllable string estimated by the unknown word syllable estimating means. It also registers syllable strings of variant readings that match the unknown word.

[0016] A speech recognition method according to the present invention comprises the following steps:
- a speech input step of receiving speech and generating a processable speech pattern represented by an electrical signal or the like;
- a syllable string calculating step of performing syllable-level recognition on the speech pattern to calculate syllable string candidates corresponding to the speech;
- a word string calculating step of calculating word string candidates corresponding to the syllable string candidates;
- an output step of outputting, as the speech recognition result, at least the maximum likelihood recognized word string calculated in the syllable string calculating step and the word string calculating step;
- a correction step in which the user enters a correct character string when the recognized word string displayed in the output step contains an error;
- a morphological analysis step of performing morphological analysis on the entered correct character string;
- an unknown word range extracting step of comparing the correct character string with the morphological analysis result to identify an unknown word and the recognized syllable string corresponding to it;
- an unknown word syllable estimating step, carried out in three stages. First, various syllable string candidates for the unknown word are generated by referring to a subword dictionary, in which various readings of the subwords that make up words are registered as syllable strings, and by combining the syllable strings of the subwords that make up the unknown word. Second, a difference table that evaluates the degree of similarity between two syllable strings is consulted to find the candidate most similar to the recognized syllable string corresponding to the unknown word. Third, this maximum likelihood candidate is estimated to be the syllable string that matches the unknown word.

[0017] In the speech recognition method according to the present invention:
- the syllable string calculating step performs syllable-level recognition on the speech pattern to calculate a plurality of syllable string candidates, ranked highest in likelihood, corresponding to the speech;
- the word string calculating step calculates a word string candidate for each of the plurality of syllable string candidates;
- the output step finds, among the combinations of syllable strings and word strings calculated in those two steps, the combination with the largest linguistic likelihood. The syllable string candidate and word string candidate of that combination become the recognized syllable string and the recognized word string, respectively, and at least the recognized word string is output.

[0018] The speech recognition method according to the present invention comprises a word registration step. This step registers, in a word dictionary, the unknown word identified in the unknown word range extracting step together with the matching syllable string estimated in the unknown word syllable estimating step.

[0019] In the speech recognition method according to the present invention, the word registration step registers in the word dictionary, as an n-gram, the unknown word identified in the unknown word range extracting step together with the matching syllable string estimated in the unknown word syllable estimating step.

[0020] The speech recognition method according to the present invention comprises a second output step and a second correction step. The second output step displays to the user a notation representing the unknown word identified in the unknown word range extracting step and the matching syllable string estimated in the unknown word syllable estimating step. If that notation is incorrect, the user enters the correct notation in the second correction step.

[0021] The speech recognition method according to the present invention comprises a variant notation registering step. For the unknown word identified in the unknown word range extracting step, this step registers in the word dictionary the matching syllable string estimated in the unknown word syllable estimating step. It also registers the same syllable string for variant notations of the unknown word.

[0022] The speech recognition method according to the present invention comprises a syllable string registering step. For the unknown word identified in the unknown word range extracting step, this step determines whether a matching syllable string could be estimated.
- If it could, the estimated syllable string is registered in the word dictionary as the syllable string matching the unknown word.
- If it could not, the recognized syllable string corresponding to the unknown word, as identified in the unknown word range extracting step, is registered in the word dictionary as the matching syllable string instead.

[0023] The speech recognition method according to the present invention comprises a variant reading registering step. For the unknown word identified in the unknown word range extracting step, this step registers in the word dictionary the matching syllable string estimated in the unknown word syllable estimating step. It also registers syllable strings of variant readings that match the unknown word.

[0024] The speech recognition program recording medium according to the present invention is a computer-readable recording medium. It records a speech recognition program that causes a computer to implement the following functions:
- a syllable string calculating function of performing syllable-level recognition on an input speech pattern to calculate syllable string candidates corresponding to the speech;
- a word string calculating function of calculating word string candidates corresponding to the syllable string candidates;
- an output function of outputting at least the maximum likelihood recognized word string calculated with the syllable string calculating function and the word string calculating function;
- a correction function that lets the user enter a correct character string when the recognized word string displayed with the output function contains an error;
- a morphological analysis function of performing morphological analysis on the entered correct character string;
- an unknown word range extracting function of comparing the correct character string with the morphological analysis result to identify an unknown word and the recognized syllable string corresponding to it;
- an unknown word syllable estimating function, which works in three stages. First, it generates various syllable string candidates for the unknown word by referring to a subword dictionary, in which various readings of the subwords that make up words are registered as syllable strings, and by combining the syllable strings of the subwords that make up the unknown word. Second, it refers to a difference table that evaluates the degree of similarity between two syllable strings to find the candidate most similar to the recognized syllable string corresponding to the unknown word. Third, it estimates this maximum likelihood candidate to be the syllable string that matches the unknown word.

[0025] The speech recognition program recording medium according to the present invention is a computer-readable recording medium. It records a speech recognition program supplemented with a program that causes a computer to implement the following functions:
- a syllable string calculating function of performing syllable-level recognition on the speech pattern to calculate a plurality of syllable string candidates, ranked highest in likelihood, corresponding to the speech;
- a word string calculating function of calculating a word string candidate for each of the plurality of syllable string candidates;
- an output function of finding, among the combinations of syllable string candidates and word string candidates calculated with those two functions, the combination with the largest linguistic likelihood. The syllable string candidate and word string candidate of that combination become the recognized syllable string and the recognized word string, respectively, and at least the recognized word string is output.

[0026] The speech recognition program recording medium according to the present invention is a computer-readable recording medium. It records a speech recognition program supplemented with a program that causes a computer to implement a word registration function. This function registers, in a word dictionary, the unknown word identified with the unknown word range extracting function together with the matching syllable string estimated by the unknown word syllable estimating function.

[0027] The speech recognition program recording medium according to the present invention is a computer-readable recording medium. It records a speech recognition program supplemented with a program that causes a computer to implement an n-gram registration function. This function registers in the word dictionary, as an n-gram, the unknown word identified with the unknown word range extracting function together with the matching syllable string estimated with the unknown word syllable estimating function.

[0028] The speech recognition program recording medium according to the present invention is a computer-readable recording medium. It records a speech recognition program supplemented with a program that causes a computer to implement two functions:
- a second output function of displaying to the user a notation representing the unknown word identified with the unknown word range extracting function and the matching syllable string estimated with the unknown word syllable estimating function;
- a second correction function that lets the user enter the correct notation when the notation displayed with the second output function is incorrect.

[0029] The speech recognition program recording medium according to the present invention is a computer-readable recording medium. It records a speech recognition program supplemented with a program that causes a computer to implement a variant notation registering function. For the unknown word identified with the unknown word range extracting function, this function registers in the word dictionary the matching syllable string estimated with the unknown word syllable estimating function. It also registers the same syllable string for variant notations of the unknown word.

[0030] The speech recognition program recording medium according to the present invention is a computer-readable recording medium. It records a speech recognition program supplemented with a program that causes a computer to implement a syllable string registering function. For the unknown word identified with the unknown word range extracting function, this function determines whether a matching syllable string could be estimated.
- If it could, the estimated syllable string is registered in the word dictionary as the syllable string matching the unknown word.
- If it could not, the recognized syllable string corresponding to the unknown word, as identified with the unknown word range extracting function, is registered in the word dictionary as the matching syllable string instead.

[0031] In a speech recognition program recording medium according to the present invention, a speech recognition program is recorded on a computer-readable recording medium. The program is supplemented with a program that causes a computer to implement a variant-reading registration function. This function registers in the word dictionary the syllable string that the unknown word syllable estimation function estimated to match an unknown word identified by the unknown word range extraction function. It also registers in the word dictionary syllable strings of variant readings that match the same unknown word.

[0032]

EMBODIMENTS OF THE INVENTION

One embodiment of the present invention is described below.

Embodiment 1. FIG. 1 is a block diagram showing the configuration of a speech recognition device according to Embodiment 1 of the present invention. The components are as follows:

1: a microphone (speech input means). It receives speech uttered by the user and converts it into an electrical signal, producing a speech pattern that can be processed.

2: a syllable string calculating device (syllable string calculating means). It performs syllable-unit recognition on the speech pattern obtained by the microphone 1 and calculates the maximum likelihood syllable string candidate for the speech.

3: a word string calculating device (word string calculating means). It calculates the maximum likelihood word string candidate from the syllable string candidate.

4: an output device (output means). It outputs the maximum likelihood syllable string candidate and word string candidate as the speech recognition result, that is, as the recognized syllable string and the recognized word string.

5: a correction device (correction means). When the recognition result displayed by the output device 4 contains an error, it receives the user's correction of the misrecognized portion and outputs the correct character string.

6: a morphological analysis device (morphological analysis means). It receives the correct character string from the correction device 5 and performs morphological analysis on it.

7: an unknown word range extracting device (unknown word range extraction means). It identifies an unknown word in the correct character string segmented by the morphological analysis device 6. By referring to the recognized syllable string output to the output device 4, it also identifies the recognized syllable string corresponding to that unknown word.

8: an unknown word syllable estimating device (unknown word syllable estimation means). It estimates the accurate syllable string matching the unknown word identified by the unknown word range extracting device 7.

9: a RAM for storing syllable string candidates and the like.

10: a word dictionary in which words are registered.

11: a subword dictionary in which the various readings of subwords that make up words, such as single kanji, are registered as syllable strings.

12: a difference table giving the log likelihood between two syllables (or syllable strings).

[0033] The arithmetic processing performed by the word string calculating device 3 is as follows. A word string candidate is generated by finding the word string W that maximizes the probability P(W | Y) given by the following equation.

[0034]

$$P(W \mid Y) = \frac{P(Y \mid W)\,P(W)}{P(Y)} \qquad \text{(Equation 1)}$$

[0035] In the above equation, W denotes the uttered word string and Y denotes the uttered syllable string. P(Y), the probability of observing Y, appears on the right-hand side and does not depend on W. Finding the W that maximizes P(W | Y) therefore reduces to finding the W that maximizes P(Y | W) · P(W). Here, P(Y | W) is the probability that the syllable string Y appears given the word string W, and P(W) is the probability that the word string W appears. Suppose the syllable string corresponding to the word string W at times t = 1, 2, ..., L is Y = Y_1, Y_2, ..., Y_L. Then P(Y | W) can be calculated from the syllable probabilities by the following equation.

[0036]

(Equation 2)

[0037] Suppose a word string W consists of m words, W = w_1, w_2, ..., w_m. The appearance probability P(W) of this word string can then be calculated from the following equation (word n-gram information), independently of the syllable probabilities.

[0038]

(Equation 3)

[0039] The calculation above is applied to each syllable string candidate whose corresponding word string consists entirely of words present in the word dictionary 10. For these candidates, the W that maximizes the word string probability P(W | Y) is calculated. In the above equations, the appearance probability of each word is assumed to be stored in the word dictionary 10 in advance. The combinatorial search can be carried out quickly with, for example, the Viterbi method or the stack decoding method described in Seiichi Nakagawa, "Speech Recognition by Stochastic Models" (確率モデルによる音声認識). The probabilities may also be handled as log probabilities, so that the products in the equations become sums.
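The decoding rule above, which maximizes P(Y | W) · P(W) and can be evaluated as a sum of log probabilities, can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation. It assumes the acoustic term log P(Y | W) has already been computed for each candidate word string, and it uses a word 2-gram table for P(W), as in this embodiment. The names `bigram_log_prob`, `map_decode`, `BOS`, `EOS` and the floor probability for unseen word pairs are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical boundary symbols; the text marks sentence start and end with '#'.
BOS, EOS = "<s>", "</s>"

def bigram_log_prob(words: Sequence[str],
                    bigram: Dict[Tuple[str, str], float],
                    floor: float = 1e-6) -> float:
    """log P(W) under a word 2-gram model; unseen pairs get an assumed floor probability."""
    seq = [BOS, *words, EOS]
    return sum(math.log(bigram.get((prev, cur), floor))
               for prev, cur in zip(seq, seq[1:]))

def map_decode(candidates: List[Tuple[List[str], float]],
               bigram: Dict[Tuple[str, str], float]) -> Tuple[List[str], float]:
    """Pick the word string W maximizing log P(Y|W) + log P(W).

    Each candidate is (W, log P(Y|W)). P(Y) is the same for every W, so it is dropped.
    """
    scored = [(words, log_ac + bigram_log_prob(words, bigram))
              for words, log_ac in candidates]
    return max(scored, key=lambda item: item[1])
```

As an example, take `bigram = {(BOS, "A"): 0.5, ("A", EOS): 0.5, (BOS, "B"): 0.1, ("B", EOS): 0.1}` with candidates `[(["A"], math.log(0.2)), (["B"], math.log(0.9))]`. Here `["A"]` wins, because 0.2 × 0.25 = 0.05 exceeds 0.9 × 0.01 = 0.009, even though `["B"]` has the better acoustic score.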

[0040] The word dictionary 10 stores each word as one record containing three items:

- the character notation of the word;
- the syllable string notation of the reading that matches the word;
- the appearance probability (likelihood) P(W) of the final word.

Table (a) in FIG. 2 shows an example of the contents of the word dictionary 10 in the 1-gram storage form, and Table (b) shows an example in the 2-gram storage form. This embodiment handles word chains of up to two words, but chains of three or more words may also be used.
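A record in the word dictionary 10 holds a notation, a syllable string and the probability of the final word, in either 1-gram or 2-gram form. The following Python sketch shows one possible in-memory layout. The class and method names are illustrative only, and so is the assumption that every known word also has a 1-gram record; none of this is taken from FIG. 2.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class Record:
    notation: str    # character notation of the (final) word, e.g. "音声"
    syllables: str   # syllable string notation of its reading, e.g. "onsee"
    prob: float      # appearance probability (likelihood) of the final word

class WordDictionary:
    """Hypothetical layout of word dictionary 10 with 1-gram and 2-gram records."""

    def __init__(self) -> None:
        self.unigrams: Dict[str, Record] = {}
        self.bigrams: Dict[Tuple[str, str], Record] = {}

    def add_unigram(self, notation: str, syllables: str, prob: float) -> None:
        self.unigrams[notation] = Record(notation, syllables, prob)

    def add_bigram(self, prev: str, notation: str, syllables: str, prob: float) -> None:
        # The record describes the final word of the two-word chain prev -> notation.
        self.bigrams[(prev, notation)] = Record(notation, syllables, prob)

    def contains(self, notation: str) -> bool:
        # Assumes every registered word has a 1-gram record.
        return notation in self.unigrams

    def bigram_prob(self, prev: str, notation: str) -> Optional[float]:
        rec = self.bigrams.get((prev, notation))
        return None if rec is None else rec.prob
```

The `contains` lookup is what an unknown-word check needs. The 2-gram lookup returns `None` for unseen chains, so the caller decides how to smooth them.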

[0041] Next, the functions of the morphological analysis device 6, the unknown word range extracting device 7 and the unknown word syllable estimating device 8 are described. The morphological analysis device 6 takes the correct character string given by the user as input, performs morphological analysis on it, and outputs the string divided into morphemes. Each morpheme consists of three elements: the morpheme character notation, the morpheme syllable string notation and the morpheme part of speech. For example, the analysis outputs a morpheme such as "音声 onsee noun", which has the notation 音声 ("speech"), the syllable string onsee and the part of speech noun.

[0042] The unknown word range extracting device 7 receives the morphological analysis result, together with the recognized syllable string and recognized word string displayed by the output device 4. It compares the morphemes with the recognized word string and takes each morpheme containing a character string whose notation differs as an unknown word candidate. It then searches the word dictionary 10, using the character string given by the notation of the candidate morpheme as the key. If that string is not registered in the word dictionary 10, the candidate is identified as an unknown word. The portion of the recognized syllable string output to the output device 4 that corresponds to the unknown word is then identified and output as the unknown word range. In this embodiment, unknown word candidates are the morphemes whose notation differs from the recognized word string. A subword, or a run of consecutive subwords, may be used as an unknown word candidate instead. As in conventional methods, words delimited by character type, such as kana and kanji, may also be taken as unknown word candidates.
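The comparison performed by the unknown word range extracting device 7 can be approximated at the word level. The Python sketch below rests on assumptions and is not the patent's procedure. It aligns the recognized word string with the corrected morphemes using the standard library's `difflib.SequenceMatcher`. Morphemes in substituted regions that are missing from the dictionary are treated as unknown words, and each one is assigned the recognized syllables of its substituted region. The function name `extract_unknown_ranges` and its input shapes are hypothetical.

```python
import difflib
from typing import Iterable, List, Sequence, Tuple

def extract_unknown_ranges(morphemes: Sequence[str],
                           recognized: Sequence[Tuple[str, str]],
                           dictionary: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (unknown word, recognized syllable string) pairs.

    morphemes:  notations of the corrected string after morphological analysis
    recognized: (notation, syllables) pairs of the recognized word string
    dictionary: notations registered in the word dictionary
    """
    known = set(dictionary)
    rec_notations = [notation for notation, _ in recognized]
    matcher = difflib.SequenceMatcher(a=rec_notations, b=list(morphemes), autojunk=False)
    ranges: List[Tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue  # only substituted regions carry a user correction in this sketch
        rec_syllables = "".join(syl for _, syl in recognized[i1:i2])
        for word in morphemes[j1:j2]:
            if word not in known:  # notation differs and the word is not registered
                ranges.append((word, rec_syllables))
    return ranges
```

Consider the example used later in this description: the recognized string 音声/認知/処理 with syllables onsee/niNhi/sjori, corrected to 音声/認識/処理. For this input the sketch returns `[("認識", "niNhi")]`. Insertions and deletions made by the user are ignored here, which a fuller implementation would need to handle.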

[0043] The unknown word syllable estimating device 8 estimates the correct syllable string matching the unknown word from the unknown word range supplied by the unknown word range extracting device 7. To do so, it uses the subword dictionary 11 to divide the character string of the unknown word into arbitrary substrings. It then combines the syllable strings assigned to the resulting subwords in every possible way, generating all syllable string candidates for the unknown word. For each candidate, it uses the difference table 12 to calculate the degree of approximation to the recognized syllable string that the unknown word range extracting device 7 identified for the unknown word. The candidate with the highest degree of approximation is output as the syllable string matching the reading of the unknown word.
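Candidate generation enumerates every split of the unknown word into registered subwords and every combination of their readings. The Python sketch below assumes the subword dictionary is a plain mapping from each subword to a list of readings. The function names are hypothetical. The `shared_prefix` scorer is only a placeholder for the difference-table degree of approximation, kept so the example runs on its own.

```python
from typing import Callable, Dict, List, Optional

def syllable_candidates(word: str, subword_dict: Dict[str, List[str]]) -> List[str]:
    """All syllable strings obtained by splitting `word` into registered subwords and
    concatenating one registered reading per subword (exhaustive; fine for short words)."""
    if not word:
        return [""]
    results: List[str] = []
    for end in range(1, len(word) + 1):
        for reading in subword_dict.get(word[:end], []):
            for rest in syllable_candidates(word[end:], subword_dict):
                results.append(reading + rest)
    return list(dict.fromkeys(results))  # drop duplicates, keep first-seen order

def shared_prefix(candidate: str, recognized: str) -> float:
    """Placeholder similarity (length of the common prefix); NOT the difference-table score."""
    n = 0
    for a, b in zip(candidate, recognized):
        if a != b:
            break
        n += 1
    return float(n)

def estimate_reading(word: str, recognized: str,
                     subword_dict: Dict[str, List[str]],
                     similarity: Callable[[str, str], float] = shared_prefix) -> Optional[str]:
    """Return the candidate closest to the recognized syllables, or None if none can be built."""
    candidates = syllable_candidates(word, subword_dict)
    if not candidates:
        return None
    return max(candidates, key=lambda c: similarity(c, recognized))
```

Suppose 「認」 has the readings niN and mitome, and 「識」 has siki. Then `syllable_candidates("認識", ...)` yields `["niNsiki", "mitomesiki"]`. Returning `None` when no candidate can be built leaves the caller free to fall back to the recognized syllable string.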

[0044] The subword dictionary 11 registers syllable strings for the various readings of the subwords that make up words. Its purpose is to allow a matching syllable string to be assigned to an unknown word that is not registered in the word dictionary 10. For example, it registers syllable strings for the various readings of the single kanji that make up words. It also registers syllable strings for subword readings both with and without various pronunciation rules applied, such as the vowel lengthening rule. FIG. 3 shows an example of the data registered in the subword dictionary 11.

[0045] As shown in FIG. 4, the difference table 12 gives the log likelihood between two contrasted syllables (or syllable strings). A syllable string candidate generated from the subword dictionary 11 is compared with the recognized syllable string corresponding to the unknown word as follows. Each syllable string is divided appropriately so that corresponding partial syllables (syllable strings) are determined. The log likelihoods of the corresponding syllable pairs are then read from the difference table 12 and summed, giving the degree of approximation between the candidate and the recognized syllable string.
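Summing log likelihoods over the best division into corresponding syllables can be done with a dynamic-programming alignment. The Python sketch below is one way to do it, under assumptions the text does not state:

- the difference table is a dictionary keyed by (standard syllable, optimal syllable), with log-likelihood values;
- an empty string stands for a missing syllable, that is, an insertion or deletion;
- pairs absent from the table are disallowed;
- only single syllables, not multi-syllable units, are paired.

All names, and the toy table values in the usage note, are hypothetical and are not the values of FIG. 4.

```python
from typing import Dict, Sequence, Tuple

GAP = ""                    # hypothetical marker for "no syllable" (insertion or deletion)
NEG_INF = float("-inf")

def alignment_score(standard: Sequence[str], optimal: Sequence[str],
                    table: Dict[Tuple[str, str], float]) -> float:
    """Best total log likelihood over monotone alignments of two syllable sequences.

    standard: syllables of a candidate (correct) reading, e.g. ["ni", "N", "si", "ki"]
    optimal:  syllables output by the recognizer, e.g. ["ni", "N", "hi"]
    table[(s, o)]: log likelihood that standard syllable s is recognized as o;
                   either side may be GAP. Pairs missing from the table are not allowed.
    """
    n, m = len(standard), len(optimal)
    dp = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            here = dp[i][j]
            if here == NEG_INF:
                continue  # cell not reachable by any allowed alignment
            moves = []
            if i < n and j < m:
                moves.append((i + 1, j + 1, (standard[i], optimal[j])))  # pair two syllables
            if i < n:
                moves.append((i + 1, j, (standard[i], GAP)))  # standard syllable dropped
            if j < m:
                moves.append((i, j + 1, (GAP, optimal[j])))   # extra recognized syllable
            for ni, nj, key in moves:
                w = table.get(key)
                if w is not None and here + w > dp[ni][nj]:
                    dp[ni][nj] = here + w
    return dp[n][m]
```

Take a toy table in which ni→ni and N→N score -0.1, si→hi scores -1.0, and dropping ki scores -2.0. With it, the candidate ni/N/si/ki aligns to the recognized ni/N/hi with a total of -3.2. The candidate with the largest total would be chosen as the reading.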

[0046] Next, the operation of the speech recognition device according to Embodiment 1 is described using a concrete example. FIG. 5 is a flowchart showing how the speech recognition device according to Embodiment 1 extracts unknown word syllables. Processing starts when the user speaks into the microphone 1 (step ST1). When speech is input through the microphone 1 (step ST2), the microphone 1 converts the input speech into an electrical signal and captures it as analog data (step ST3). In this example, the user is assumed to have uttered "おんせいにんしきしょり" (onsei ninshiki shori, "speech recognition processing").

[0047] The syllable string calculating device 2 A/D-converts and quantizes the analog data captured by the microphone 1, then performs spectral analysis. It connects the syllable candidates recognized in syllable units and outputs one maximum likelihood syllable string candidate (step ST4). Various methods for selecting syllable string candidates are described in detail in, for example, Seiichi Nakagawa, "Speech Recognition by Stochastic Models". In this example, the following maximum likelihood syllable string candidate and its likelihood are output:

#oNseeniNhisjori#  0.3

Here # is a symbol marking the beginning and end of the sentence. A log probability or the like may be used for the acoustic likelihood instead of a probability.

[0048] The word string calculating device 3 calculates a word string candidate from the maximum likelihood syllable string candidate output by the syllable string calculating device 2 (step ST5). This calculation uses the method described in Abe et al., "A two-stage search method considering the difference tendency between the first-stage optimal solution and the correct answer" (1段目の最適解と正解の差分傾向を考慮した2段階探索法), Proc. Acoustical Society of Japan Meeting (音講論), 1-R-15, Sept. 1998. The maximum likelihood syllable string candidate calculated by the syllable string calculating device 2 is assumed to be the only syllable string candidate. The device finds the maximum likelihood word string candidate for that syllable string and outputs the candidate and its likelihood:

「音声認知処理」 ("speech cognition processing")  0.4

[0049] The maximum likelihood syllable string candidate and word string candidate are then stored in the RAM 9 as the recognized syllable string and the recognized word string (step ST6):

# NULL sentence-start / 音声 onsee noun / 認知 niNhi noun / 処理 sjori sa-hen (verbal) noun / # NULL sentence-end   probability 0.4

[0050] Next, the output device 4 reads the maximum likelihood recognized word string from the RAM 9, concatenates the notation elements other than #, and outputs the result (step ST7). In this example, the following character string is output: 「音声認知処理」

[0051] If the recognized word string displayed by the output device 4 contains an error, the user inputs the correct character string with the correction device 5 (step ST8). If the character string contains no error and needs no correction, processing ends. In this example, the following correct character string is input: 「音声認識処理」 ("speech recognition processing"), in which 認識 ("recognition") replaces the misrecognized 認知 ("cognition"). FIG. 6 shows the character string before and after correction.

[0052] Next, the morphological analysis device 6 performs morphological analysis on the correct character string (step ST9). In this example, the result is:

# NULL sentence-start / 音声 onsee noun / 認識 ? noun / 処理 sjori sa-hen (verbal) noun / # NULL sentence-end

[0053] The morphological analysis algorithm is now described in detail. FIG. 7 is a flowchart of the algorithm. First, the correct character string 「音声認識処理」 is input using the correction device 5 (step ST121). The morphological analysis device 6 then stores the input correct character string in the RAM 9 (step ST122). At this point it uses the subword dictionary 11 to decompose the correct character string into combinations of subwords. Here the subwords 「音」, 「声」, 「認」, 「識」, 「処」 and 「理」 are obtained, and a virtual word is formed from any combination of these subwords. Each virtual word is assigned the probability for its word length, extracted in advance from a large corpus, and stored in the RAM 9. It is assumed here that 「認識」 is an unknown word not registered in the word dictionary 10.

[0054] The morphological analysis device 6 retrieves the correct character string from the RAM 9 and performs initialization (step ST123). As initialization, the null word "# # sentence-start" and its probability value 1 are stored in the RAM 9 as the initial value of the preceding word string.

[0055] Next, the morphological analysis device 6 builds up the preceding word string by searching for prefix-matching character strings in the word dictionary 10 and the subword dictionary 11. It continues until the preceding word string matches the correct character string. In this example, "# # sentence-start" is first taken as the preceding word string (step ST124).

[0056] Once the preceding word string is set, the device examines the part of the correct character string that follows the preceding word string. It checks whether a following word exists that matches the beginning of this remaining part (step ST125). In this search the word dictionary 10 and the subword dictionary 11 are consulted, and subwords and subword sequences are treated as following words as long as they are registered. If no prefix-matching following word exists, the process returns to step ST124 to reinitialize the preceding word string. If one exists, the likelihood of the matched following word is calculated and stored in the RAM 9. The following word is then appended to the preceding word string, and the result is stored in the RAM 9 as the new preceding word string (step ST126).

In this example, the device looks for a word following the preceding word string "# # sentence-start". It searches the word dictionary 10 for a string matching the beginning of 「音声認識処理#」, the text that follows "#". Because 「音声」 is registered in the word dictionary 10, "音声 onsee noun" is extracted as the following word. The preceding word string "# # sentence-start" is then replaced with "# # sentence-start / 音声 onsee noun". Here the 2-gram probability is used to compute the language likelihood. The language likelihood of "# # sentence-start / 音声 onsee noun" is therefore the product of two values: the probability of the preceding word string "# # sentence-start" (that is, 1), and the 2-gram probability of "# # sentence-start / 音声 onsee noun" stored in the word dictionary 10.

[0057] Next, the device checks whether the preceding word string matches the correct character string (step ST127). If not, the process returns to step ST125 to find further following words. If it does, the preceding word string with the maximum language likelihood is selected. This comparison includes any preceding word string that has already matched the correct character string through another combination of subwords. The selected preceding word string and its likelihood are stored in the RAM 9 (step ST128).

[0058] Next, the device checks whether all combinations of preceding word strings have been searched (step ST129). If not, the process returns to step ST124 to build a preceding word string from another combination and calculate its language likelihood. Once all combinations have been searched, the combination of preceding word strings with the maximum language likelihood is read from the RAM 9 and output as the solution of the morphological analysis (step ST130). In this example, the correct character string is "#音声認識処理#". The combination with the maximum language likelihood for it, "# # sentence-start / 音声 onsee noun / 認識 ? noun / 処理 sjori sa-hen (verbal) noun / # # sentence-end", is output as the analysis result. The unknown word 「認識」 is not registered in the word dictionary 10, so it is output with the syllable string notation "?" (unknown). The process then returns to step ST9 with the morpheme sequence of the solution as the return value (step ST131).
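The search in steps ST123 to ST130 can be sketched as an exhaustive recursive search over segmentations. The Python sketch below is illustrative only and rests on several assumptions:

- known words come from a dictionary mapping each notation to its syllables;
- any run of registered subwords that is not a known word is a virtual word, given the reading "?" and scored by a word-length probability;
- transitions between words use 2-gram probabilities;
- anything unseen gets a small floor probability.

The function name, boundary symbols, floor value and the toy probabilities in the usage note are all hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Morpheme = Tuple[str, str]  # (notation, syllable string); "?" marks an unknown reading
BOS, EOS = "<s>", "</s>"    # hypothetical sentence-boundary symbols

def analyze(text: str,
            words: Dict[str, str],
            subwords: Set[str],
            bigram: Dict[Tuple[str, str], float],
            length_prob: Dict[int, float],
            floor: float = 1e-6) -> Tuple[List[Morpheme], float]:
    """Return the segmentation of `text` with the highest language log likelihood."""
    best: Tuple[List[Morpheme], float] = ([], -math.inf)

    def search(pos: int, prev: str, logp: float, path: List[Morpheme]) -> None:
        nonlocal best
        if pos == len(text):  # the preceding word string now covers the whole text
            total = logp + math.log(bigram.get((prev, EOS), floor))
            if total > best[1]:
                best = (path, total)
            return
        for end in range(pos + 1, len(text) + 1):  # every prefix-matching following word
            piece = text[pos:end]
            if piece in words:
                p = bigram.get((prev, piece), floor)
                search(end, piece, logp + math.log(p), path + [(piece, words[piece])])
            elif all(ch in subwords for ch in piece):  # virtual word built from subwords
                p = length_prob.get(len(piece), floor)
                search(end, piece, logp + math.log(p), path + [(piece, "?")])

    search(0, BOS, 0.0, [])
    return best
```

For the usage example, the dictionary contains 音声, 認知 and 処理, with probabilities of 0.2 for the transition from the sentence start to 音声 and 0.5 for the transition from 処理 to the sentence end. Length probabilities are 0.3 for two-character virtual words and 0.01 for one- and four-character ones. With these toy values, 「音声認識処理」 is segmented as 音声 / 認識 (?) / 処理. Exhaustive enumeration grows exponentially with string length; a Viterbi search keyed on the last word gives the same maximum for a 2-gram model.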

[0059] When the process returns to step ST9, the unknown word range extracting device 7 compares the recognized word string 「音声認知処理」 output to the output device 4 with the morphological analysis result. It detects the morpheme containing a character string whose notation differs, that is, the morpheme containing the corrected part 「知」 → 「識」. In the recognized syllable string #oNseeniNhisjori# output to the output device 4, it then locates the partial recognized syllable string #niNhi# corresponding to the unknown word 「認識」. The unknown word 「認識」 and its corresponding recognized syllable string #niNhi# are identified and output as the unknown word range (step ST10).

[0060] Next, the unknown word syllable estimating device 8 analyzes the input unknown word range using the difference table 12 and estimates the syllable string of the accurate reading that matches the unknown word (step ST11). To obtain the accurate reading of the unknown word 「認識」, it first searches the subword dictionary 11 for every substring of 「認識」. Both 「認」 and 「識」 are registered as subwords: 「認」 has the readings #niN# and #mitome#, and 「識」 has the reading #siki#. The device then calculates the degree of approximation between #niNhi# and each combination, #niNsiki# and #mitomesiki#.

As shown in FIG. 8, the two syllable strings being compared are divided into syllable units and aligned at the syllable level, in the way that maximizes the sum of the log likelihoods of the corresponding syllables. The log likelihood of each aligned syllable pair is taken from the difference table shown in FIG. 4, and the degree of approximation is the sum of these log likelihoods. In the difference table 12, the "optimal syllable string" is normally the syllable string recognized by the speech recognition device, and the "standard syllable string" is the correct syllable string. Finally, #niNsiki# is estimated as the syllable string matching the unknown word, and processing ends (step ST12).
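Before the alignment of FIG. 8, each syllable string has to be split into syllable units. For the romanized notation used in this example, such as #niNsiki# and #niNhi#, a simple rule may suffice. The Python sketch below makes three assumptions that the text does not support: "N" is the moraic nasal, every other unit is zero or more consonant letters followed by one vowel, and a long vowel such as the second "e" of "onsee" forms its own unit. The function name is hypothetical.

```python
import re
from typing import List

# Hypothetical unit pattern: "N" (moraic nasal) or consonant letters followed by one vowel.
_UNIT = re.compile(r"N|[^aiueoN#]*[aiueo]")

def split_syllables(notation: str) -> List[str]:
    """Split a romanized syllable string such as '#niNsiki#' into syllable units."""
    body = notation.strip("#")          # drop the boundary marks
    units = _UNIT.findall(body)
    if "".join(units) != body:          # some letters could not be assigned to a unit
        raise ValueError(f"cannot split {notation!r} into syllable units")
    return units
```

Under these rules #niNsiki# splits into ni / N / si / ki and #niNhi# into ni / N / hi, which are the sequences an alignment over the difference table would compare.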

[0061] As described above, in Embodiment 1 the speech recognition device includes the morphological analysis device 6, the unknown word range extracting device 7, the unknown word syllable estimating device 8, the subword dictionary 11 and the difference table 12. The unknown word range extracting device 7 compares the result of the morphological analysis device 6 with the recognized word string and recognized syllable string output to the output device 4. From this comparison it identifies the unknown word and its corresponding recognized syllable string. Next, various syllable string candidates are generated by combining, with reference to the subword dictionary 11, the readings of the subwords that make up the unknown word. The degree of approximation between each candidate and that recognized syllable string is calculated, and the candidate with the highest degree of approximation is estimated as the syllable string matching the unknown word. As a result, an accurate syllable string can be assigned with high precision to an unknown word extracted during speech recognition.

[0062] The speech recognition device is made up of the syllable string calculating device 2, word string calculating device 3, output device 4, correction device 5, morphological analysis device 6, unknown word range extracting device 7 and unknown word syllable estimating device 8. The functions these devices provide can be realized by a program running on a computer equipped with a CPU, memory, input/output devices and the like. A program that realizes these speech recognition functions can therefore be recorded on a computer-readable recording medium. Speech recognition can then be performed on any computer by having it read that recording medium.

[0063] Embodiment 2. FIG. 9 is a block diagram showing the configuration of a speech recognition device according to Embodiment 2 of the present invention. In FIG. 9, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Embodiment 2 differs from Embodiment 1 in three respects. First, the top N syllable string candidates with the highest likelihoods are output. Second, a word string candidate is calculated for each of them. Third, the syllable string candidate and word string candidate whose combination gives the largest language likelihood are taken as the recognized syllable string and recognized word string, and the syllable string matching the unknown word is estimated from them. The new components in FIG. 9 are as follows:

21: an N-best syllable string calculating device (syllable string calculating means). It performs syllable-unit recognition on the speech pattern obtained by the microphone 1 and outputs the top N syllable string candidates with the highest likelihoods.

22: an N-best word string calculating device (word string calculating means). It outputs the maximum likelihood word string candidate for each of the top N syllable string candidates output by the N-best syllable string calculating device 21.

23: an N-best unknown word syllable estimating device. From the N combinations of syllable string candidates and word string candidates, it takes the maximum likelihood recognized syllable string and recognized word string, and estimates from them the syllable string matching the unknown word.

[0064] Next, the operation will be described. FIG. 10 is a flowchart showing the process of estimating the syllables of an unknown word using the speech recognition device according to Embodiment 2 of the present invention. In FIG. 10, the same reference numerals as in FIG. 5 denote the same or corresponding processes, and their description is omitted.

[0065] When the input speech is converted into an electrical signal in step ST3, the N-best syllable string calculating device 21 outputs the top N syllable string candidates with the highest likelihood (step ST21). Next, it is checked whether word string candidates have been calculated for all N syllable string candidates (step ST22). If they have, the process proceeds to step ST7. If not, the maximum likelihood word string candidate is calculated for each syllable string candidate in turn (step ST23). At this time, the product of the probability of the syllable string candidate and the probability of the word string candidate given that syllable string candidate is calculated as the language likelihood of the combination. For example, if the syllable string candidate and its likelihood are #oNseeniNhisjori# and 0.3, and the maximum likelihood word string candidate given that syllable string and its likelihood are 「音声認知処理」 ("speech cognition processing") and 0.4, then the language likelihood of the combination of #oNseeniNhisjori# and 「音声認知処理」 is 0.3 × 0.4 = 0.12.
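Written as formulas (the symbols are introduced here for illustration and are not the patent's notation): let s_i be the i-th of the N syllable string candidates and w_i its maximum likelihood word string. The language likelihood of the pair, and the example above with s_1 = #oNseeniNhisjori# and w_1 = 「音声認知処理」, are then:

```latex
\begin{aligned}
L(s_i, w_i) &= P(s_i)\, P(w_i \mid s_i), \qquad w_i = \arg\max_{w} P(w \mid s_i),\\
L(s_1, w_1) &= 0.3 \times 0.4 = 0.12.
\end{aligned}
```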

[0066] Next, it is checked whether the language likelihood of the combination of the current syllable string candidate and word string candidate is the maximum (step ST24). If it is not, the process returns to step ST22 to calculate the word string candidate for the next syllable string candidate. If it is, the syllable string candidate and the corresponding word string candidate are stored in the RAM 9. Then, in step ST7, the stored syllable string candidate and word string candidate are read from the RAM 9 as the recognized syllable string and recognized word string, and the output device 4 displays at least the recognized word string.
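The loop of steps ST22 to ST24 can be sketched as a running maximum over the N-best list. This is a minimal illustration, assuming each syllable string candidate arrives with its probability and that a callable (hypothetical here) returns the most likely word string and its conditional probability; the toy candidates and numbers are invented for the example.

```python
from typing import Callable, List, Optional, Tuple


def best_combination(
    nbest_syllables: List[Tuple[str, float]],
    best_word_string: Callable[[str], Tuple[str, float]],
) -> Optional[Tuple[str, str, float]]:
    """Pick the (syllable string, word string) pair with the largest
    language likelihood P(s) * P(w | s) among the N-best syllable strings."""
    best: Optional[Tuple[str, str, float]] = None
    for syllables, p_syl in nbest_syllables:          # visit each candidate (ST22/ST23)
        words, p_words = best_word_string(syllables)   # most likely word string for s
        score = p_syl * p_words                        # joint language likelihood
        if best is None or score > best[2]:            # keep the running maximum (ST24)
            best = (syllables, words, score)
    return best


# Hypothetical N-best list and word-string lookup, for illustration only
toy_lookup = {
    "#oNseeniNhisjori#": ("音声認知処理", 0.4),
    "#oNseeniNsikisjori#": ("音声認識処理", 0.5),
}
nbest = [("#oNseeniNhisjori#", 0.3), ("#oNseeniNsikisjori#", 0.25)]
result = best_combination(nbest, lambda s: toy_lookup[s])
print(result)
```

With these toy numbers the second candidate wins (0.25 × 0.5 = 0.125 against 0.3 × 0.4 = 0.12), even though its syllable string alone is less likely, which is the point of scoring the combination rather than the syllable string alone.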

[0067] As described above, according to Embodiment 2, the speech recognition device includes the N-best syllable string calculating device 21, the N-best word string calculating device 22, and the N-best unknown word syllable string estimating device 23. The N-best word string calculating device 22 calculates a word string candidate for each of the top N syllable string candidates calculated by the N-best syllable string calculating device 21, and the syllable string candidate and word string candidate whose combination has the largest language likelihood are taken as the recognized syllable string and recognized word string, from which the syllable string matching the unknown word is estimated. Because the estimation is based on the overall language likelihood of the combined syllable string and word string, syllable strings can be assigned to unknown words with higher accuracy.

[0068] The functions provided by the devices constituting the speech recognition device, including the N-best syllable string calculating device 21, the N-best word string calculating device 22, and the N-best unknown word syllable string estimating device 23, can be realized by a program running on a computer equipped with a CPU, memory, input/output devices, and the like. Therefore, a program implementing these functions for speech recognition processing can be recorded on a computer-readable recording medium, and speech recognition processing can be performed on any computer by having the computer read the medium.

[0069] Embodiment 3. FIG. 11 is a block diagram showing the configuration of a speech recognition device according to Embodiment 3 of the present invention. In FIG. 11, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Embodiment 3 differs from Embodiment 1 in that the unknown word and the syllable string estimated to match it are registered in the word dictionary. In FIG. 11, 31 is a word registration device (word registration means) that registers the unknown word and the syllable string estimated to match it in the word dictionary 10.

[0070] Next, the operation will be described. FIG. 12 is a flowchart showing the process of estimating the syllables of an unknown word and registering the unknown word in the dictionary using the speech recognition device according to Embodiment 3 of the present invention. In FIG. 12, the same reference numerals as in FIG. 5 denote the same or corresponding processes, and their description is omitted. When, in step ST11, the maximum likelihood syllable string #niNsiki# is estimated for, for example, the unknown word 「認識」 ("recognition"), the word registration device 31 registers the character notation 「認識」 and the syllable string notation #niNsiki# of the unknown word in the word dictionary 10 with the part of speech "noun" (名詞).
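A word dictionary entry in this embodiment pairs a character notation with a syllable string and a part of speech. The sketch below models that as a toy in-memory store; the class and field names are hypothetical and say nothing about how the word dictionary 10 is actually laid out.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True, order=True)
class DictEntry:
    surface: str    # character notation, e.g. 「認識」
    syllables: str  # syllable string notation, e.g. #niNsiki#
    pos: str        # part of speech, e.g. 名詞 (noun)


class WordDictionary:
    """Toy in-memory word dictionary (hypothetical structure)."""

    def __init__(self) -> None:
        self._entries: Set[DictEntry] = set()

    def register(self, surface: str, syllables: str, pos: str = "名詞") -> None:
        # Storing entries in a set makes re-registering the same word harmless.
        self._entries.add(DictEntry(surface, syllables, pos))

    def lookup(self, surface: str) -> List[DictEntry]:
        return sorted(e for e in self._entries if e.surface == surface)


dictionary = WordDictionary()
dictionary.register("認識", "#niNsiki#")  # result of step ST11 for the unknown word
print(dictionary.lookup("認識"))
```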

[0071] As described above, according to Embodiment 3, the speech recognition device includes the word registration device 31, so the automatically identified unknown word and the automatically estimated syllable string matching it are registered in the word dictionary 10. The word dictionary 10 is thus enriched progressively, which improves recognition accuracy.

[0072] The functions provided by the devices constituting the speech recognition device, including the word registration device 31, can be realized by a program running on a computer equipped with a CPU, memory, input/output devices, and the like. Therefore, a program implementing these functions for speech recognition processing can be recorded on a computer-readable recording medium, and speech recognition processing can be performed on any computer by having the computer read the medium.

[0073] Embodiment 4. FIG. 13 is a block diagram showing the configuration of a speech recognition device according to Embodiment 4 of the present invention. In FIG. 13, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Embodiment 4 differs from Embodiment 1 in that the unknown word and the syllable string estimated to match it are registered in the word dictionary as an n-gram. In FIG. 13, 41 is an n-gram registration device (n-gram registration means) that registers in the word dictionary 10 an n-gram formed by connecting the unknown word with its adjacent morphemes.

[0074] Next, the operation will be described. FIG. 14 is a flowchart showing the process of estimating the syllables of an unknown word and registering the unknown word in the dictionary as an n-gram using the speech recognition device according to Embodiment 4 of the present invention. In FIG. 14, the same reference numerals as in FIG. 5 denote the same or corresponding processes, and their description is omitted. When, in step ST11, the maximum likelihood syllable string #niNsiki# matching, for example, the unknown word 「認識」 is estimated, the n-gram registration device 41 connects the unknown word with its adjacent morphemes to form n-grams and registers them in the word dictionary 10, including the character notation 「認識」 of the unknown word and the matching syllable string notation #niNsiki# with the part of speech "noun". A fixed value (for example, 0.1) is assigned as the language likelihood. For example, when registration is in 2-gram form, the following records (first word, second word, language likelihood) are registered in the word dictionary 10:

  音声 onsee 名詞 (noun) + 認識 niNsiki 名詞 (noun), 0.1
  認識 niNsiki 名詞 (noun) + 処理 sjori サ変名詞 (sa-hen verbal noun), 0.1
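The 2-gram records above join the unknown word to the morpheme on each side and give every record the same fixed likelihood. A minimal sketch, assuming the morphological analysis result is a list of (surface, syllable string, part of speech) triples and that the position of the unknown word is known; the function name is hypothetical.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str, str]             # (surface, syllable string, part of speech)
BigramRecord = Tuple[Morpheme, Morpheme, float]


def bigram_records(morphemes: List[Morpheme], unknown_index: int,
                   likelihood: float = 0.1) -> List[BigramRecord]:
    """Build the 2-grams joining the unknown word to its left and right
    neighbours, each with the same fixed language likelihood."""
    records: List[BigramRecord] = []
    if unknown_index > 0:                    # left neighbour + unknown word
        records.append((morphemes[unknown_index - 1], morphemes[unknown_index], likelihood))
    if unknown_index + 1 < len(morphemes):   # unknown word + right neighbour
        records.append((morphemes[unknown_index], morphemes[unknown_index + 1], likelihood))
    return records


sentence = [("音声", "onsee", "名詞"), ("認識", "niNsiki", "名詞"), ("処理", "sjori", "サ変名詞")]
for left, right, p in bigram_records(sentence, unknown_index=1):
    print(*left, *right, p)
```

Run on 「音声認識処理」 with 「認識」 as the unknown word, this prints the same two records listed in [0074].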

[0075] As described above, according to Embodiment 4, the speech recognition device includes the n-gram registration device 41, so the automatically identified unknown word and the automatically estimated syllable string matching it are registered in the word dictionary 10 in n-gram form. The word dictionary 10 is thus enriched progressively, and a target word can be recognized accurately on the basis of the words that precede and follow it, which improves recognition accuracy.

[0076] The functions provided by the devices constituting the speech recognition device, including the n-gram registration device 41, can be realized by a program running on a computer equipped with a CPU, memory, input/output devices, and the like. Therefore, a program implementing these functions for speech recognition processing can be recorded on a computer-readable recording medium, and speech recognition processing can be performed on any computer by having the computer read the medium.

[0077] Embodiment 5. FIG. 15 is a block diagram showing the configuration of a speech recognition device according to Embodiment 5 of the present invention. In FIG. 15, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Embodiment 5 differs from Embodiment 1 in that the unknown word identified by the unknown word range extracting device and the syllable string estimated by the unknown word syllable estimating device to match it are presented to the user, so that the user can correct the unknown word and the matching syllable string. In FIG. 15, 51 is a second output device (second output means) that displays to the user the notations of the identified unknown word and the estimated syllable string; 52 is a second correcting device (second correction means) through which the user enters the correct character string or syllable string when the unknown word or its matching syllable string contains an error; and 53 is a word registration device that registers the unknown word and its matching syllable string in the word dictionary 10.

[0078] Next, the operation will be described. FIG. 16 is a flowchart showing the process of extracting an unknown word and registering it in the dictionary after correction by the user, using the speech recognition device according to Embodiment 5 of the present invention. In FIG. 16, the same reference numerals as in FIG. 5 denote the same or corresponding processes, and their description is omitted. Suppose that in step ST11 #mitomesiki# is estimated as the syllable string matching, for example, the unknown word 「認識」. The second output device 51 displays the character notation 「認識」 and the syllable string notation #mitomesiki# of the unknown word, presenting the unknown word and the matching syllable string to the user (step ST51). Next, if the displayed character notation or syllable string notation of the unknown word contains an error, the user corrects it to the correct character string or syllable string using the second correcting device 52; in this case, the syllable string #mitomesiki# is corrected to #niNsiki# (step ST52). The word registration device 53 then registers the character notation 「認識」 and the syllable string notation #niNsiki# of the unknown word in the word dictionary 10 with the part of speech "noun" (step ST53).
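Steps ST51 to ST53 amount to "show the estimate, accept an optional correction, then register". A minimal sketch, assuming the user's corrections arrive as optional arguments (None meaning the displayed value was accepted) and the dictionary is a plain list of triples; both choices are illustrative, not the patent's interface.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, str, str]  # (surface, syllable string, part of speech)


def confirm_and_register(dictionary: List[Entry], surface: str, syllables: str,
                         corrected_surface: Optional[str] = None,
                         corrected_syllables: Optional[str] = None,
                         pos: str = "名詞") -> Entry:
    """Register an unknown word after optional user correction (ST51-ST53).
    A correction of None means the user accepted the displayed value."""
    entry = (
        corrected_surface if corrected_surface is not None else surface,
        corrected_syllables if corrected_syllables is not None else syllables,
        pos,
    )
    dictionary.append(entry)  # ST53: register in the word dictionary
    return entry


word_dictionary: List[Entry] = []
# ST51 would display 「認識」 / #mitomesiki#; in ST52 the user fixes only the reading.
confirm_and_register(word_dictionary, "認識", "#mitomesiki#", corrected_syllables="#niNsiki#")
print(word_dictionary)
```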

[0079] As described above, according to Embodiment 5, the speech recognition device includes the second output device 51 and the second correcting device 52, so the user can check the unknown word and the matching syllable string and correct any error. Accurate, error-free speech data is therefore reliably registered in the dictionary, which improves recognition accuracy.

[0080] The functions provided by the devices constituting the speech recognition device, including the second output device 51 and the second correcting device 52, can be realized by a program running on a computer equipped with a CPU, memory, input/output devices, and the like. Therefore, a program implementing these functions for speech recognition processing can be recorded on a computer-readable recording medium, and speech recognition processing can be performed on any computer by having the computer read the medium.

[0081] Embodiment 6. FIG. 17 is a block diagram showing the configuration of a speech recognition device according to Embodiment 6 of the present invention. In FIG. 17, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Embodiment 6 differs from Embodiment 1 in that, for an unknown word set as a substring of the correct character string entered by the user, the syllable string matching the unknown word is registered in the word dictionary not only for the regular character notation given by the user but also for different character notations having the same reading. In FIG. 17, 61 is a variant notation registration device (variant notation registration means) that registers the matching syllable string in the word dictionary 10 for variant character notations giving the same reading and meaning, in addition to the regular character notation of the unknown word.

[0082] Next, the operation will be described. FIG. 18 is a flowchart showing the process of extracting an unknown word and registering in the word dictionary 10 the matching syllable string for variant notations as well as for the regular notation, using the speech recognition device according to Embodiment 6 of the present invention. In FIG. 18, the same reference numerals as in FIG. 5 denote the same or corresponding processes, and their description is omitted. Suppose that in step ST11 the syllable string #kansuu# matching, for example, the unknown word 「関数」 ("function") is estimated. Using an internally held variant notation conversion rule (for example, 関 -> 函), the variant notation registration device 61 also registers in the word dictionary 10 the combination of the variant notation 「函数」 and the matching syllable string #kansuu#.
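Registering variant notations amounts to applying character conversion rules to the surface form while keeping the reading fixed. A minimal sketch, assuming the rules are simple one-character substitutions stored in a dictionary; only the 関 -> 函 rule comes from the text, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table; 関 -> 函 is the example rule given in [0082].
VARIANT_NOTATION_RULES: Dict[str, str] = {"関": "函"}


def notation_entries(surface: str, syllables: str,
                     rules: Dict[str, str] = VARIANT_NOTATION_RULES) -> List[Tuple[str, str]]:
    """Return (notation, syllable string) pairs to register: the regular
    notation plus one variant for each rule whose source character occurs."""
    entries = [(surface, syllables)]
    for src, dst in rules.items():
        if src in surface:
            # Same reading, variant spelling.
            entries.append((surface.replace(src, dst), syllables))
    return entries


print(notation_entries("関数", "#kansuu#"))
```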

[0083] As described above, according to Embodiment 6, the speech recognition device includes the variant notation registration device 61, so matching syllable strings are registered in the word dictionary 10 and learned not only for the regular notation of the unknown word entered by the user but also for its variant notations, which improves recognition accuracy.

[0084] The functions provided by the devices constituting the speech recognition device, including the variant notation registration device 61, can be realized by a program running on a computer equipped with a CPU, memory, input/output devices, and the like. Therefore, a program implementing these functions for speech recognition processing can be recorded on a computer-readable recording medium, and speech recognition processing can be performed on any computer by having the computer read the medium.

[0085] Embodiment 7. FIG. 19 is a block diagram showing the configuration of a speech recognition device according to Embodiment 7 of the present invention. In FIG. 19, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Embodiment 7 differs from Embodiment 1 in that, even when no syllable string matching the unknown word can be estimated, the recognized syllable string corresponding to the unknown word identified by the unknown word range extracting device is registered in the dictionary as the syllable string matching the unknown word. In FIG. 19, 71 is a syllable string registration device (syllable string registration means) that determines whether a syllable string matching the unknown word has been estimated. If one has, it registers the estimated syllable string in the word dictionary 10 as the syllable string corresponding to the unknown word; if not, it registers the recognized syllable string corresponding to the unknown word identified by the unknown word range extracting device 7 in the word dictionary 10 as the matching syllable string.

[0086] Next, the operation will be described. FIG. 20 is a flowchart showing the process of extracting an unknown word and registering the matching syllable string in the word dictionary using the speech recognition device according to Embodiment 7 of the present invention. In FIG. 20, the same reference numerals as in FIG. 5 denote the same or corresponding processes, and their description is omitted. After the syllable string matching the unknown word is estimated in step ST11, it is checked whether a matching syllable string could be estimated (step ST71). If it could, the unknown word and the estimated syllable string are registered in the word dictionary 10 (step ST73). If no matching syllable string can be estimated, for example because the likelihoods of all syllable string candidates for the unknown word are below a predetermined threshold, the recognized syllable string corresponding to the unknown word identified by the unknown word range extracting device 7 is set as the syllable string matching the unknown word (step ST72). The unknown word and this syllable string are then registered in the word dictionary 10 (step ST73).
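The decision in steps ST71 and ST72 can be sketched as follows. This assumes the estimator's candidates come as (syllable string, likelihood) pairs, possibly an empty list when no candidate could be generated; the threshold value and the misrecognized string #niNsiti# are hypothetical examples.

```python
from typing import List, Tuple


def syllables_to_register(candidates: List[Tuple[str, float]],
                          recognized_syllables: str,
                          threshold: float) -> str:
    """Choose the syllable string to register for an unknown word (ST71/ST72).

    candidates: (syllable string, likelihood) pairs from unknown word syllable
    estimation; empty if none could be generated (e.g. a subword missing from
    the subword dictionary).
    recognized_syllables: the recognized syllable string aligned to the unknown word.
    """
    if candidates:
        best, likelihood = max(candidates, key=lambda c: c[1])
        if likelihood >= threshold:   # estimation succeeded
            return best
    return recognized_syllables       # fallback: use the recognition result (ST72)


# Hypothetical values for illustration
print(syllables_to_register([("#niNsiki#", 0.8)], "#niNsiti#", threshold=0.5))
print(syllables_to_register([("#niNsiki#", 0.2)], "#niNsiti#", threshold=0.5))
print(syllables_to_register([], "#niNsiti#", threshold=0.5))
```

Comparing only the best candidate against the threshold is equivalent to the patent's condition that every candidate falls below it.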

[0087] As described above, according to Embodiment 7, the syllable string registration device 71 is provided. Even when the character string extracted as an unknown word contains a substring that is not registered in the subword dictionary 11, so that no syllable string can be assigned to the identified unknown word, the recognized syllable string corresponding to the unknown word can still be assigned. Because this recognized syllable string is itself the recognition result for the user's utterance, it is unlikely to cause a mismatch in speech recognition, and recognition accuracy can be improved.

[0088] The functions provided by the devices constituting the speech recognition device, including the syllable string registration device 71, can be realized by a program running on a computer equipped with a CPU, memory, input/output devices, and the like. Therefore, a program implementing these functions for speech recognition processing can be recorded on a computer-readable recording medium, and speech recognition processing can be performed on any computer by having the computer read the medium.

[0089] Embodiment 8. FIG. 21 is a block diagram showing the configuration of a speech recognition device according to Embodiment 8 of the present invention. In FIG. 21, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Embodiment 8 differs from Embodiment 1 in that, in addition to the syllable string estimated by the unknown word syllable estimating device 8, syllable strings of variant readings of the unknown word are also registered in the word dictionary as matching the unknown word. In FIG. 21, 81 is a variant reading registration device (variant reading registration means) that registers in the word dictionary 10, as matching the unknown word, not only the syllable string estimated by the unknown word syllable estimating device 8 but also the syllable strings of variant readings derived according to syllable string change rules.

[0090] Next, the operation will be described. FIG. 22 is a flowchart showing the process of extracting an unknown word and registering in the word dictionary 10 the syllable strings of variant readings in addition to the regular syllable string, using the speech recognition device according to Embodiment 8 of the present invention. In FIG. 22, the same reference numerals as in FIG. 5 denote the same or corresponding processes, and their description is omitted. Suppose that in step ST11 the syllable string #seNtakuki# matching, for example, the unknown word 「洗濯機」 ("washing machine") is estimated. Using an internally held syllable string conversion rule (for example, akuki -> aQki), the variant reading registration device 81 also registers the syllable string of the variant reading in the word dictionary 10. That is, not only the combination of 「洗濯機」 and #seNtakuki# but also the combination of 「洗濯機」 and #seNtaQki# is registered in the word dictionary 10.
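Variant readings are produced by rewriting the syllable string rather than the notation. A minimal sketch, assuming the change rules are plain substring substitutions applied one at a time; only the akuki -> aQki rule is taken from the text (Q here marks the geminate consonant, as in the colloquial reading せんたっき), and the function name is hypothetical.

```python
from typing import List, Tuple

# Hypothetical rule list; akuki -> aQki is the example rule given in [0090].
READING_RULES: List[Tuple[str, str]] = [("akuki", "aQki")]


def reading_entries(surface: str, syllables: str,
                    rules: List[Tuple[str, str]] = READING_RULES) -> List[Tuple[str, str]]:
    """Return (notation, syllable string) pairs to register: the estimated
    reading plus one variant reading for each rule that applies."""
    entries = [(surface, syllables)]
    for src, dst in rules:
        if src in syllables:
            entries.append((surface, syllables.replace(src, dst)))
    return entries


print(reading_entries("洗濯機", "#seNtakuki#"))
```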

[0091] As described above, according to Embodiment 8, the speech recognition device includes the variant reading registration device 81, so syllable strings of variant readings of the unknown word are also registered and learned automatically, which improves recognition accuracy.

[0092] The functions provided by the devices constituting the speech recognition device, including the variant reading registration device 81, can be realized by a program running on a computer equipped with a CPU, memory, input/output devices, and the like. Therefore, a program implementing these functions for speech recognition processing can be recorded on a computer-readable recording medium, and speech recognition processing can be performed on any computer by having the computer read the medium.

[0093]

EFFECT OF THE INVENTION As described above, according to the present invention, unknown word range extraction compares the correct character string entered by the user with the morphological analysis result to identify an unknown word and the recognized syllable string corresponding to it. Unknown word syllable estimation then refers to a subword dictionary, in which various readings of the subwords composing words are registered as syllable strings, and combines the syllable strings for the subwords of the unknown word to generate various syllable string candidates for it. Referring to a difference table for evaluating the degree of similarity between two syllable strings, it detects the candidate most similar to the recognized syllable string corresponding to the unknown word and estimates this maximum likelihood candidate to be the syllable string matching the unknown word. A correct syllable string can therefore be assigned accurately to an unknown word extracted during speech recognition.
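The estimation summarized above can be sketched under several stated assumptions; this is an illustration, not the patent's implementation. The unknown word is assumed to be pre-split into subwords (one character each here), readings are tuples of syllables without the # delimiters, and the difference table is modelled as a weighted edit distance whose substitution costs come from a small hypothetical table (unlisted mismatches, insertions, and deletions cost 1). The candidate at the smallest distance plays the role of the maximum likelihood candidate.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Syllables = Tuple[str, ...]


def weighted_edit_distance(a: Sequence[str], b: Sequence[str],
                           sub_cost: Dict[Tuple[str, str], float],
                           indel: float = 1.0) -> float:
    """Edit distance between syllable sequences; substitution costs are looked
    up in sub_cost (a stand-in for the difference table), default 1.0."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = 0.0 if a[i - 1] == b[j - 1] else sub_cost.get((a[i - 1], b[j - 1]), 1.0)
            d[i][j] = min(d[i - 1][j] + indel, d[i][j - 1] + indel, d[i - 1][j - 1] + s)
    return d[n][m]


def estimate_reading(subwords: List[str], subword_dict: Dict[str, List[Syllables]],
                     recognized: Syllables,
                     sub_cost: Dict[Tuple[str, str], float]) -> Optional[Tuple[Syllables, float]]:
    """Combine subword readings into candidates and return the one closest to
    the recognized syllable string. Assumes every subword is in subword_dict."""
    best: Optional[Tuple[Syllables, float]] = None
    for combo in product(*(subword_dict[sw] for sw in subwords)):
        candidate = tuple(s for reading in combo for s in reading)
        dist = weighted_edit_distance(candidate, recognized, sub_cost)
        if best is None or dist < best[1]:
            best = (candidate, dist)
    return best


# Hypothetical subword dictionary, difference-table costs, and recognition result
SUBWORD_DICT = {"認": [("ni", "N"), ("mi", "to", "me")], "識": [("si", "ki")]}
SUB_COST = {("ki", "ti"): 0.5}
print(estimate_reading(["認", "識"], SUBWORD_DICT, ("ni", "N", "si", "ti"), SUB_COST))
```

Enumerating every combination with `itertools.product` is exhaustive and grows multiplicatively with the number of readings per subword; that is acceptable for short unknown words like this toy example.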

[0094] According to the present invention, syllable string calculation, word string calculation, and output are performed as follows: syllable-based recognition is carried out on the voice pattern to calculate a plurality of high-likelihood syllable string candidates for the speech; a word string candidate is calculated for each of these syllable string candidates; and, among the combinations of syllable string candidates and word string candidates, the combination with the largest linguistic likelihood is detected, and its syllable string candidate and word string candidate are output as the recognized syllable string and the recognized word string, respectively. Because the estimation is based on the overall linguistic likelihood of each syllable string and word string combination, syllable strings matching unknown words can be estimated more accurately.
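A minimal sketch of the N-best selection in [0094] follows. The patent does not give the scoring formula at this point, so the combined linguistic likelihood is assumed here to be the sum of a syllable-string log-likelihood and a word-string log-likelihood; the class, field names, and scores are hypothetical.

```python
from typing import NamedTuple


class Combination(NamedTuple):
    syllables: tuple  # syllable string candidate from N-best recognition
    syl_logp: float   # log-likelihood of that syllable string (assumed given)
    words: tuple      # word string candidate computed for those syllables
    word_logp: float  # language-model log-likelihood of the words (assumed)


def select_best(combinations):
    """Return (recognized syllable string, recognized word string) for the
    combination with the highest combined score.

    Assumption: combined score = sum of the two log-likelihoods.
    """
    best = max(combinations, key=lambda c: c.syl_logp + c.word_logp)
    return best.syllables, best.words


nbest = [
    # Top syllable hypothesis, but an implausible word string: total -3.4.
    Combination(("ki", "shi", "ya"), -0.9, ("岸", "屋"), -2.5),
    # Slightly worse syllables, far more plausible words: total -1.7.
    Combination(("ki", "sha"), -1.2, ("記者",), -0.5),
]
print(select_best(nbest))  # → (('ki', 'sha'), ('記者',))
```

The point of the joint score is visible in the example: the best syllable hypothesis alone would lose to a combination whose word string is linguistically more plausible.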

[0095] According to the present invention, word registration is performed to register the unknown word and the syllable string matching it in the word dictionary. Because the automatically identified unknown word and the automatically estimated syllable string matching it are registered in the word dictionary, the word dictionary is progressively enriched and recognition accuracy can be improved.

[0096] According to the present invention, n-gram registration is performed to register the unknown word and the syllable string matching it in the word dictionary as an n-gram. Because the automatically identified unknown word and the automatically estimated matching syllable string are registered in the word dictionary in n-gram form, the word dictionary is progressively enriched and the target word can be recognized accurately on the basis of the words that precede and follow it, which improves recognition accuracy.
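The n-gram registration in [0096] can be sketched with a toy dictionary that stores readings plus bigram counts. This assumes n = 2 and simple counts rather than smoothed probabilities; the class and method names are hypothetical and do not come from the patent.

```python
from collections import defaultdict


class WordDictionary:
    """Toy word dictionary holding readings and bigram counts (hypothetical)."""

    def __init__(self):
        self.readings = defaultdict(set)  # notation -> set of syllable tuples
        self.bigrams = defaultdict(int)   # (left word, right word) -> count

    def register_ngram(self, sentence, index, reading):
        """Register sentence[index] with its reading, plus the bigrams that
        connect it to the words immediately before and after it."""
        word = sentence[index]
        self.readings[word].add(tuple(reading))
        if index > 0:
            self.bigrams[(sentence[index - 1], word)] += 1
        if index + 1 < len(sentence):
            self.bigrams[(word, sentence[index + 1])] += 1


d = WordDictionary()
d.register_ngram(["<s>", "三菱", "電機", "</s>"], 1, ("mi", "tsu", "bi", "shi"))
print(dict(d.bigrams))  # → {('<s>', '三菱'): 1, ('三菱', '電機'): 1}
```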

[0097] According to the present invention, a notation representing the unknown word identified by unknown word range extraction and the syllable string, estimated by unknown word syllable estimation, that matches the unknown word is displayed to the user, and the user inputs the correct notation if the notation of the unknown word or of its matching syllable string contains an error. The user can therefore check the unknown word and its matching syllable string and correct any error, so that accurate, error-free speech data is reliably registered in the dictionary and recognition accuracy can be improved.

[0098] According to the present invention, different notation registration is performed: the syllable string estimated by unknown word syllable estimation as matching the unknown word identified by unknown word range extraction is registered in the word dictionary, and the same matching syllable string is also registered for different notations (variant spellings) of the unknown word. Because the matching syllable string is registered and learned not only for the standard notation entered by the user but also for its variant notations, recognition accuracy can be improved.
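The different notation registration in [0098] amounts to registering one reading under several notations of the same word. In the sketch below, variants are assumed to come from two sources: Unicode NFKC normalization (for example, full-width letters to ASCII) and a hypothetical hand-made variant table. The patent does not say how variant notations are obtained, so both sources are illustrative only.

```python
import unicodedata
from collections import defaultdict

# Hypothetical table of known spelling variants for particular words.
VARIANT_TABLE = {"三菱": ["ミツビシ"]}


def variant_notations(word):
    """The word itself, its NFKC-normalized form, and any listed variants."""
    forms = {word, unicodedata.normalize("NFKC", word)}
    forms.update(VARIANT_TABLE.get(word, []))
    return forms


def register_with_variants(word_dict, word, reading):
    """Register the same matching reading under every notation of the word."""
    for form in variant_notations(word):
        word_dict[form].add(tuple(reading))


word_dict = defaultdict(set)
register_with_variants(word_dict, "ＤＶＤ", ("di", "i", "bu", "i", "di", "i"))
print(sorted(word_dict))  # → ['DVD', 'ＤＶＤ']
```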

[0099] According to the present invention, syllable string registration is performed: for the unknown word identified by unknown word range extraction, it is determined whether a syllable string matching the unknown word could be estimated. If it could, the estimated syllable string is registered in the word dictionary as the syllable string matching the unknown word; if not, the recognized syllable string corresponding to the unknown word identified by unknown word range extraction is registered in the word dictionary instead. Consequently, even when no syllable string can be assigned to the identified unknown word because the character string extracted as the unknown word contains a substring that is not registered in the subword dictionary, the recognized syllable string corresponding to the unknown word can still be assigned. Since this recognized syllable string is itself the recognition result for the user's utterance, it is unlikely to cause a mismatch in speech recognition, and recognition accuracy can be improved.
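The fallback rule in [0099] can be sketched as follows, assuming that estimation fails exactly when some subword of the unknown word is missing from the subword dictionary (the case the paragraph describes). The `estimate` argument stands for any estimator, such as a difference-table matcher; the stand-in estimator and all data below are hypothetical.

```python
def register_reading(word_dict, word, subwords, subword_dict, recognized, estimate):
    """Register a reading for an unknown word.

    If every subword is in the subword dictionary, use the estimated reading;
    otherwise fall back to the recognized syllable string itself.
    `estimate` is any callable (subwords, recognized) -> sequence of syllables.
    """
    if all(s in subword_dict for s in subwords):
        reading = tuple(estimate(subwords, recognized))
    else:
        reading = tuple(recognized)
    word_dict.setdefault(word, set()).add(reading)
    return reading


# Hypothetical data: "菱" is missing from the subword dictionary.
subword_dict = {"三": [("mi", "tsu")]}


def first_reading(subs, rec):
    """Stand-in estimator: first listed reading of each subword (hypothetical)."""
    return [syl for s in subs for syl in subword_dict[s][0]]


word_dict = {}
print(register_reading(word_dict, "三菱", ["三", "菱"], subword_dict,
                       ("mi", "tsu", "bi", "shi"), first_reading))
# → ('mi', 'tsu', 'bi', 'shi')
```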

[0100] According to the present invention, different reading registration is performed: the syllable string estimated by unknown word syllable estimation as matching the unknown word identified by unknown word range extraction is registered in the word dictionary, and a syllable string for a different reading that matches the unknown word is also registered in the word dictionary. Because syllable strings for different readings of the unknown word are also registered and learned automatically, recognition accuracy can be improved.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech recognition device according to Embodiment 1 of the present invention.

FIG. 2 is a diagram showing the contents of records stored in the word dictionary.

FIG. 3 is a diagram showing the contents of records stored in the subword dictionary.

FIG. 4 is a diagram showing the contents of records stored in the difference table.

FIG. 5 is a flowchart showing a speech recognition method according to Embodiment 1 of the present invention.

FIG. 6 is a diagram showing a recognized character string and a corrected character string.

FIG. 7 is a flowchart showing the morphological analysis algorithm.

FIG. 8 is a diagram showing one stage of the process of calculating the degree of approximation between two syllable strings.

FIG. 9 is a block diagram showing the configuration of a speech recognition device according to Embodiment 2 of the present invention.

FIG. 10 is a flowchart showing a speech recognition method according to Embodiment 2 of the present invention.

FIG. 11 is a block diagram showing the configuration of a speech recognition device according to Embodiment 3 of the present invention.

FIG. 12 is a flowchart showing a speech recognition method according to Embodiment 3 of the present invention.

FIG. 13 is a block diagram showing the configuration of a speech recognition device according to Embodiment 4 of the present invention.

FIG. 14 is a flowchart showing a speech recognition method according to Embodiment 4 of the present invention.

FIG. 15 is a block diagram showing the configuration of a speech recognition device according to Embodiment 5 of the present invention.

FIG. 16 is a flowchart showing a speech recognition method according to Embodiment 5 of the present invention.

FIG. 17 is a block diagram showing the configuration of a speech recognition device according to Embodiment 6 of the present invention.

FIG. 18 is a flowchart showing a speech recognition method according to Embodiment 6 of the present invention.

FIG. 19 is a block diagram showing the configuration of a speech recognition device according to Embodiment 7 of the present invention.

FIG. 20 is a flowchart showing a speech recognition method according to Embodiment 7 of the present invention.

FIG. 21 is a block diagram showing the configuration of a speech recognition device according to Embodiment 8 of the present invention.

FIG. 22 is a flowchart showing a speech recognition method according to Embodiment 8 of the present invention.

FIG. 23 is a block diagram showing the configuration of a conventional speech recognition device having a typical unknown word extraction function.

FIG. 24 is a diagram showing the process of extracting unknown word syllables using the conventional speech recognition device.

[Explanation of symbols]

Reference Signs List: 1 microphone (voice input means), 2 syllable string calculating device (syllable string calculating means), 3 word string calculating device (word string calculating means), 4 output device (output means), 5 correction device (correction means), 6 morphological analysis device (morphological analysis means), 7 unknown word range extracting device (unknown word range extracting means), 8 unknown word syllable estimating device (unknown word syllable estimating means), 9 RAM, 10 word dictionary, 11 subword dictionary, 12 difference table, 21 N-best syllable string calculating device (syllable string calculating means), 22 N-best word string calculating device (word string calculating means), 23 N-best unknown word syllable estimating device, 31 word registration device (word registration means), 41 n-gram registration device (n-gram registration means), 51 second output device (second output means), 52 second correction device (second correction means), 53 word registration device, 61 different notation registration device (different notation registration means), 71 syllable string registration device (syllable string registration means), 81 different reading registration device (different reading registration means).

Claims

1. A speech recognition apparatus comprising: voice input means for receiving speech and generating a voice pattern, represented by an electric signal or the like, that can be processed as information; syllable string calculating means for performing syllable-based recognition on the voice pattern to calculate syllable string candidates corresponding to the speech; word string calculating means for calculating word string candidates corresponding to the syllable string candidates; output means for outputting, as the speech recognition result, at least the maximum-likelihood recognized word string calculated by the syllable string calculating means and the word string calculating means; correction means with which a user inputs a correct character string for correction when the recognized word string displayed by the output means contains an error; morphological analysis means for performing morphological analysis on the input correct character string; unknown word range extracting means for comparing the correct character string with the result of the morphological analysis to identify an unknown word and the recognized syllable string corresponding to the unknown word; and unknown word syllable estimating means for generating various syllable string candidates corresponding to the unknown word by combining syllable strings for the subwords constituting the unknown word with reference to a subword dictionary in which various readings of the subwords constituting words are registered as syllable strings, detecting the syllable string candidate closest to the recognized syllable string corresponding to the unknown word with reference to a difference table that evaluates the degree of approximation between two syllable strings, and estimating this maximum-likelihood syllable string candidate as the syllable string matching the unknown word.

2. The speech recognition apparatus according to claim 1, wherein the syllable string calculating means performs syllable-based recognition on the voice pattern to calculate a plurality of high-likelihood syllable string candidates corresponding to the speech; the word string calculating means calculates a word string candidate corresponding to each of the plurality of syllable string candidates; and the output means detects, among the combinations of the plurality of syllable string candidates and word string candidates calculated by the syllable string calculating means and the word string calculating means, the combination having the largest linguistic likelihood, and outputs at least the recognized word string, with the syllable string candidate and the word string candidate of that combination serving as the recognized syllable string and the recognized word string, respectively.

3. The speech recognition apparatus according to claim 1, further comprising word registering means for registering, in a word dictionary, the unknown word identified by the unknown word range extracting means and the syllable string, estimated by the unknown word syllable estimating means, that matches the unknown word.

4. The speech recognition apparatus according to claim 3, comprising n-gram registration means for registering the unknown word identified by the unknown word range extracting means and the syllable string, estimated by the unknown word syllable estimating means, that matches the unknown word in the word dictionary as an n-gram.

5. The speech recognition apparatus according to claim 3 or 4, further comprising: second output means for displaying to a user a notation representing the unknown word identified by the unknown word range extracting means and the syllable string, estimated by the unknown word syllable estimating means, that matches the unknown word; and second correction means with which the user inputs the correct notation when the notation, displayed by the second output means, representing the unknown word and the matching syllable string contains an error.

6. The speech recognition apparatus according to claim 1, further comprising different notation registration means for registering, in a word dictionary, the syllable string estimated by the unknown word syllable estimating means as matching the unknown word identified by the unknown word range extracting means, and for also registering the matching syllable string in the word dictionary for a different notation of the unknown word.

7. The speech recognition apparatus according to claim 1, further comprising syllable string registration means for determining whether a syllable string matching the unknown word identified by the unknown word range extracting means could be estimated and, if it could, registering the estimated syllable string in a word dictionary as the syllable string matching the unknown word, or, if it could not, registering the recognized syllable string corresponding to the unknown word identified by the unknown word range extracting means in the word dictionary as the syllable string matching the unknown word.

8. The speech recognition apparatus according to claim 1, further comprising different reading registration means for registering, in a word dictionary, the syllable string estimated by the unknown word syllable estimating means as matching the unknown word identified by the unknown word range extracting means, and for also registering, in the word dictionary, a syllable string for a different reading that matches the unknown word.

9. A speech recognition method comprising: a voice input step of receiving speech and generating a voice pattern, represented by an electric signal or the like, that can be processed as information; a syllable string calculating step of performing syllable-based recognition on the voice pattern to calculate syllable string candidates corresponding to the speech; a word string calculating step of calculating word string candidates corresponding to the syllable string candidates; an output step of outputting, as the speech recognition result, at least the maximum-likelihood recognized word string calculated in the syllable string calculating step and the word string calculating step; a correction step in which a user inputs a correct character string for correction when the recognized word string displayed in the output step contains an error; a morphological analysis step of performing morphological analysis on the input correct character string; an unknown word range extraction step of comparing the correct character string with the morphological analysis result to identify an unknown word and the recognized syllable string corresponding to the unknown word; and an unknown word syllable estimation step of generating various syllable string candidates corresponding to the unknown word by combining syllable strings for the subwords constituting the unknown word with reference to a subword dictionary in which various readings of the subwords constituting words are registered as syllable strings, detecting the syllable string candidate closest to the recognized syllable string corresponding to the unknown word with reference to a difference table that evaluates the degree of approximation between two syllable strings, and estimating this maximum-likelihood syllable string candidate as the syllable string matching the unknown word.

10. The speech recognition method according to claim 9, wherein, in the syllable string calculating step, syllable-based recognition is performed on the voice pattern to calculate a plurality of high-likelihood syllable string candidates corresponding to the speech; in the word string calculating step, a word string candidate corresponding to each of the plurality of syllable string candidates is calculated; and, in the output step, the combination having the largest linguistic likelihood is detected among the combinations of the plurality of syllable string candidates and word string candidates calculated in the syllable string calculating step and the word string calculating step, and at least the recognized word string is output, with the syllable string candidate and the word string candidate of that combination serving as the recognized syllable string and the recognized word string, respectively.

11. The speech recognition method according to claim 9, further comprising a word registration step of registering, in a word dictionary, the unknown word identified in the unknown word range extraction step and the syllable string, estimated in the unknown word syllable estimation step, that matches the unknown word.

12. The speech recognition method according to claim 11, wherein, in the word registration step, the unknown word identified in the unknown word range extraction step and the syllable string, estimated in the unknown word syllable estimation step, that matches the unknown word are registered in the word dictionary as an n-gram.

13. The speech recognition method according to claim 11 or 12, further comprising: a second output step of displaying to a user a notation representing the unknown word identified in the unknown word range extraction step and the syllable string, estimated in the unknown word syllable estimation step, that matches the unknown word; and a second correction step in which the user inputs the correct notation when the notation, displayed in the second output step, representing the unknown word and the matching syllable string contains an error.

14. The speech recognition method according to claim 9, further comprising a different notation registration step of registering, in a word dictionary, the syllable string estimated in the unknown word syllable estimation step as matching the unknown word identified in the unknown word range extraction step, and of also registering the matching syllable string in the word dictionary for a different notation of the unknown word.

15. The speech recognition method according to claim 9, further comprising a syllable string registration step of determining whether a syllable string matching the unknown word identified in the unknown word range extraction step could be estimated and, if it could, registering the estimated syllable string in a word dictionary as the syllable string matching the unknown word, or, if it could not, registering the recognized syllable string corresponding to the unknown word identified in the unknown word range extraction step in the word dictionary as the syllable string matching the unknown word.

16. A speech recognition method comprising a different reading registration step of registering, in a word dictionary, the syllable string estimated in the unknown word syllable estimation step as matching the unknown word identified in the unknown word range extraction step, and of also registering, in the word dictionary, a syllable string for a different reading that matches the unknown word.

17. A computer-readable recording medium on which is recorded a speech recognition program for causing a computer to realize: a syllable string calculating function of performing syllable-based recognition on an input voice pattern to calculate syllable string candidates corresponding to the speech; a word string calculating function of calculating word string candidates corresponding to the syllable string candidates; an output function of outputting at least the maximum-likelihood recognized word string calculated using the syllable string calculating function and the word string calculating function; a correction function that allows a user to input a correct character string for correction when the recognized word string displayed using the output function contains an error; a morphological analysis function of performing morphological analysis on the input correct character string; an unknown word range extraction function of comparing the correct character string with the morphological analysis result to identify an unknown word and the recognized syllable string corresponding to the unknown word; and an unknown word syllable estimation function of generating various syllable string candidates corresponding to the unknown word by combining syllable strings for the subwords constituting the unknown word with reference to a subword dictionary in which various readings of the subwords constituting words are registered as syllable strings, detecting the syllable string candidate closest to the recognized syllable string corresponding to the unknown word with reference to a difference table that evaluates the degree of approximation between two syllable strings, and estimating this maximum-likelihood syllable string candidate as the syllable string matching the unknown word.

18. The computer-readable recording medium recording the speech recognition program according to claim 17, on which is additionally recorded a program for causing a computer to realize: a syllable string calculating function of performing syllable-based recognition on the voice pattern to calculate a plurality of high-likelihood syllable string candidates corresponding to the speech; a word string calculating function of calculating a word string candidate corresponding to each of the plurality of syllable string candidates; and an output function of detecting, among the combinations of the plurality of syllable string candidates and word string candidates calculated using the syllable string calculating function and the word string calculating function, the combination having the largest linguistic likelihood, and outputting at least the recognized word string, with the syllable string candidate and the word string candidate of that combination serving as the recognized syllable string and the recognized word string, respectively.

19. The computer-readable recording medium recording the speech recognition program according to claim 17, on which is additionally recorded a program for causing a computer to realize a word registration function of registering, in a word dictionary, the unknown word identified using the unknown word range extraction function and the syllable string, estimated using the unknown word syllable estimation function, that matches the unknown word.

20. The computer-readable recording medium according to claim 19, on which is additionally recorded a program for causing a computer to realize an n-gram registration function of registering the unknown word identified using the unknown word range extraction function and the syllable string, estimated using the unknown word syllable estimation function, that matches the unknown word in the word dictionary as an n-gram.

21. The computer-readable recording medium recording the speech recognition program according to claim 19, on which is additionally recorded a program for causing a computer to realize: a second output function of displaying to a user a notation representing the unknown word identified using the unknown word range extraction function and the syllable string, estimated using the unknown word syllable estimation function, that matches the unknown word; and a second correction function that allows the user to input the correct notation when the notation, displayed using the second output function, representing the unknown word and the matching syllable string contains an error.

22. The computer-readable recording medium recording the speech recognition program according to claim 17, on which is additionally recorded a program for causing a computer to realize a different notation registration function of registering, in a word dictionary, the syllable string estimated using the unknown word syllable estimation function as matching the unknown word identified using the unknown word range extraction function, and of also registering the matching syllable string in the word dictionary for a different notation of the unknown word.

23. The computer-readable recording medium recording the speech recognition program according to claim 17, on which is additionally recorded a program for causing a computer to realize a syllable string registration function of determining whether a syllable string matching the unknown word identified using the unknown word range extraction function could be estimated and, if it could, registering the estimated syllable string in a word dictionary as the syllable string matching the unknown word, or, if it could not, registering the recognized syllable string corresponding to the unknown word identified using the unknown word range extraction function in the word dictionary as the syllable string matching the unknown word.

24. The computer-readable recording medium recording the speech recognition program according to claim 17, on which is additionally recorded a program for causing a computer to realize a different reading registration function of registering, in a word dictionary, the syllable string estimated using the unknown word syllable estimation function as matching the unknown word identified using the unknown word range extraction function, and of also registering, in the word dictionary, a syllable string for a different reading that matches the unknown word.