JP2010231149A

JP2010231149A - Terminal using kana-kanji conversion system for voice recognition, method and program

Info

Publication number: JP2010231149A
Application number: JP2009081489A
Authority: JP
Inventors: Toshiaki Uchibe; 利明内部
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2009-03-30
Filing date: 2009-03-30
Publication date: 2010-10-14
Anticipated expiration: 2029-03-30
Also published as: JP5243325B2

Abstract

PROBLEM TO BE SOLVED: To provide a terminal, a method, and a program capable of integrating the language model resources of a whole system by using a kana-kanji conversion system for speech recognition.

SOLUTION: The terminal includes: an acoustic score calculating means for calculating, for each of a plurality of candidate kana readings for a phoneme frame, an acoustic score given by an acoustic model; an N-gram score calculating means for calculating an N-gram score for a candidate kana reading sequence according to the connection strength between words; a case grammar score calculating means for calculating a case grammar score according to the connection strength between clauses; and an integrated score calculating means for calculating an integrated score based on the plurality of scores. The terminal further includes a search control means that repeats the score calculation in order to search, based on morphological analysis, for the kana-kanji string with the highest integrated score, and a case frame processing means that selects a conversion candidate according to the case structure of the sentence.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a terminal, a method, and a program that use a kana-kanji conversion system for speech recognition.

Conventionally, there are kana-kanji conversion systems that convert hiragana entered by a user through key operations into a kana-kanji character string so that text can be input to an application. Such a system usually operates as a Japanese input function built into word processor or text editor software.

In contrast, there are speech recognition systems that acquire a speech signal uttered by a user with a microphone and convert it into a kana-kanji character string. The user can input a character string without operating keys. However, the vocabulary that people utter is enormous, which makes it difficult to improve the accuracy of speech recognition.

To improve the accuracy of character strings predicted by speech recognition, there is a technique that creates topic prediction data from text (see, for example, Patent Document 1). A kana-kanji character string is predicted based on the topic prediction data and the hiragana.

There is also a technique that improves recognition accuracy by narrowing the predicted candidate character strings down to specific recognition target words (see, for example, Patent Document 2). However, because the recognition target words are narrowed down, the applications of the speech recognition system are also limited.

FIG. 1 shows a prior-art system having a kana-kanji conversion function and a speech recognition function.

According to FIG. 1, the terminal 1 includes, as hardware, a display 100, a key operation unit 111, and a microphone 121. The display 100 displays the input interface of an application such as a text editor, together with the hiragana entered by the user and the predicted kana-kanji candidates. The key operation unit 111 lets the user enter hiragana using keys. The microphone 121 acquires the speech signal uttered by the user.

As its kana-kanji conversion function, the terminal 1 includes a kana-kanji conversion engine 112, a language model storage unit 113, a history storage unit 114, and a kana-kanji selection screen control unit 115. The kana-kanji conversion engine 112 uses the language model storage unit 113 and the history storage unit 114 to convert the hiragana entered through the key operation unit 111 into a plurality of kana-kanji candidates. The kana-kanji selection screen control unit 115 lets the user select one of those candidates.

The language model storage unit 113 stores a word dictionary and a language model. The language model defines the probabilities of connection relationships between words. A word that is not registered in the word dictionary cannot be produced by conversion. The language model storage unit 113 therefore needs to store a word dictionary containing every word that may be entered by key operation, together with the corresponding language model. The language model storage unit 113 for the kana-kanji conversion system stores dependency grammar probabilities and case grammar probabilities.

The history storage unit 114 stores a learning history of the kana-kanji strings (words, clauses, and so on) that the user selected in the past.

The terminal 1 also connects to a speech recognition server 2 via a communication interface and transmits the speech signal acquired by the microphone 121 to the speech recognition server 2 over a network. Because the vocabulary that people utter is enormous, speech recognition requires a large language model and a high processing load. It is therefore preferable to provide the speech recognition engine separately, as a speech recognition server.

The speech recognition server includes a speech recognition engine 122, an acoustic model storage unit 123, and a language model storage unit 124. The speech recognition engine 122 uses the acoustic model storage unit 123 and the language model storage unit 124 to convert the received speech signal into a plurality of kana-kanji candidates, which are transmitted to the terminal 1 over the network. The speech recognition selection screen control unit 125 of the terminal 1 lets the user select one of those candidates.

The acoustic model storage unit 123 stores an acoustic model. The acoustic model holds the correspondence between acoustic feature parameters and phonemes. A phoneme corresponds roughly to one letter of the alphabet when a word is written in romaji. The acoustic model storage unit 123 stores the probability of each phoneme given the acoustic feature parameters.

The language model storage unit 124 defines the probabilities of connection relationships between words. The language model storage unit 124 of the speech recognition system adopts an N-gram model, which uses the probabilities of connection relationships among N adjacent words. In contrast, the language model storage unit 113 of the kana-kanji conversion system adopts a dependency grammar that uses only the probabilities of connection relationships between two adjacent words. As a result, the language model storage unit 124 of the speech recognition system requires far more resources than the language model storage unit 113 of the kana-kanji conversion system. Of course, the terminal 1 itself may further include the speech recognition engine 122, the acoustic model storage unit 123, and the language model storage unit 124.

There is also a technique in which a speech recognition system first converts the speech signal uttered by a person into hiragana, and a kana-kanji conversion system then converts that hiragana into kana-kanji (see, for example, Non-Patent Documents 1 and 2).

Patent Document 1: JP 2000-285112 A
Patent Document 2: Japanese Patent No. 4012143

Non-Patent Document 1: Speech recognition engine "VORERO (VOice REcognition RObustR)", Asahi Kasei, [online], [retrieved March 19, 2009], Internet <URL: http://www.asahi-kasei.co.jp/vorero/jp/vorero/index.html>
Non-Patent Document 2: Japanese kana-kanji conversion system "Mobile Wnn", OMRON, [online], [retrieved March 19, 2009], Internet <URL: http://www.omronsoft.co.jp/SP/mobile/>

According to the prior art described above, the speech recognition system and the kana-kanji conversion system are provided separately and use mutually independent resources. The acoustic model and language model of the speech recognition system must cover all spoken-language expressions, including buzzwords, so their resources are enormous. Moreover, because the speech recognition system does not perform the processing built into the kana-kanji conversion system, such as case grammar, morphological analysis, and case frame processing, erroneous kana-kanji conversions are also likely to occur.

Accordingly, an object of the present invention is to provide a terminal, a method, and a program that, by using a kana-kanji conversion system for speech recognition, can integrate the language model resources of the whole system and reduce erroneous conversions in speech recognition.

According to the present invention, the terminal comprises:
acoustic model storage means for storing an acoustic model used in speech recognition;
language model storage means for storing a word dictionary and a language model that supports N-grams and case grammar;
microphone means for acquiring a speech signal uttered by a user;
acoustic analysis means for extracting phoneme frames from the speech signal;
acoustic score calculation means for calculating, for each of a plurality of candidate kana readings for a phoneme frame, an acoustic score given by the acoustic model;
N-gram score calculation means for calculating, for a candidate kana reading sequence, an N-gram score according to the connection strength between words given by the language model;
case grammar score calculation means for calculating, for the candidate kana reading sequence, a case grammar score according to the connection strength between clauses;
integrated score calculation means for calculating, for the candidate kana reading sequence, an integrated score based on the acoustic score, the N-gram score, and the case grammar score;
search control means for selecting a plurality of kana-kanji candidates based on morphological analysis and for repeating the acoustic score calculation means, the N-gram score calculation means, the case grammar score calculation means, and the integrated score calculation means in order to search for the kana-kanji string with the highest integrated score; and
case frame processing means for selecting a conversion candidate according to the case structure of a sentence containing the kana-kanji string found by the search,
thereby realizing a speech recognition function.

According to another embodiment of the terminal of the present invention, it is also preferable that the terminal comprises:
history storage means for storing kana-kanji strings converted in the past; and
history score calculation means for calculating, using the history storage means, a history score for the candidate kana reading sequence according to the past learning history,
and that the integrated score calculation means further includes the history score in the integrated score.

According to another embodiment of the terminal of the present invention, it is also preferable that the terminal further comprises key operation means for acquiring a hiragana sequence entered by the user using keys,
and that a kana-kanji conversion function is realized by the N-gram score calculation means, the case grammar score calculation means, the integrated score calculation means, the search control means, and the case frame processing means.

According to another embodiment of the terminal of the present invention, it is also preferable that the terminal further comprises candidate selection screen control means for letting the user select any one kana-kanji string from among a plurality of conversion candidates,
and that the candidate selection screen control means presents the same candidate selection screen to the user for both the speech recognition function and the kana-kanji conversion function.

According to the present invention, there is provided a speech recognition method in a terminal, the terminal comprising:
an acoustic model storage unit that stores an acoustic model used in speech recognition; and
a language model storage unit that stores a word dictionary and a language model that supports N-grams and case grammar,
the method comprising:
a first step of acquiring, with a microphone unit, a speech signal uttered by a user;
a second step of extracting phoneme frames from the speech signal;
a third step of calculating, for each of a plurality of candidate kana readings for a phoneme frame, an acoustic score given by the acoustic model;
a fourth step of calculating, for a candidate kana reading sequence, an N-gram score according to the connection strength between words given by the language model;
a fifth step of calculating, for the candidate kana reading sequence, a case grammar score according to the connection strength between clauses;
a sixth step of calculating, for the candidate kana reading sequence, an integrated score based on the acoustic score, the N-gram score, and the case grammar score;
a seventh step of selecting a plurality of kana-kanji candidates based on morphological analysis and repeating the third to sixth steps in order to search for the kana-kanji string with the highest integrated score; and
an eighth step of selecting a conversion candidate according to the case structure of a sentence containing the kana-kanji string found by the search,
thereby realizing a speech recognition function.

According to another embodiment of the speech recognition method of the present invention, it is also preferable that:
the terminal further comprises a history storage unit that stores kana-kanji strings converted in the past;
the method further comprises a step of calculating, using the history storage unit, a history score for the candidate kana reading sequence according to the past learning history; and
the sixth step further includes the history score in the integrated score.

According to another embodiment of the speech recognition method of the present invention, it is also preferable that the terminal further comprises a key operation unit that acquires a hiragana sequence entered by the user using keys,
and that a kana-kanji conversion function is realized by the fourth to eighth steps.

According to another embodiment of the speech recognition method of the present invention, it is also preferable that the method further comprises a step of letting the user select any one kana-kanji string from among a plurality of conversion candidates, so that the same candidate selection screen is presented to the user for both the speech recognition function and the kana-kanji conversion function.

According to the present invention, there is provided a speech recognition program that causes a computer mounted on a terminal to function as:
acoustic model storage means for storing an acoustic model used in speech recognition;
language model storage means for storing a word dictionary and a language model that supports N-grams and case grammar;
microphone means for acquiring a speech signal uttered by a user;
acoustic analysis means for extracting phoneme frames from the speech signal;
acoustic score calculation means for calculating, for each of a plurality of candidate kana readings for a phoneme frame, an acoustic score given by the acoustic model;
N-gram score calculation means for calculating, for a candidate kana reading sequence, an N-gram score according to the connection strength between words given by the language model;
case grammar score calculation means for calculating, for the candidate kana reading sequence, a case grammar score according to the connection strength between clauses;
integrated score calculation means for calculating, for the candidate kana reading sequence, an integrated score based on the acoustic score, the N-gram score, and the case grammar score;
search control means for selecting a plurality of kana-kanji candidates based on morphological analysis and for repeating the acoustic score calculation means, the N-gram score calculation means, the case grammar score calculation means, and the integrated score calculation means in order to search for the kana-kanji string with the highest integrated score; and
case frame processing means for selecting a conversion candidate according to the case structure of a sentence containing the kana-kanji string found by the search.

According to the terminal, method, and program of the present invention, using a kana-kanji conversion system for speech recognition makes it possible to integrate the language model resources of the whole system and to reduce erroneous conversions in speech recognition.

FIG. 1 shows a prior-art system having a kana-kanji conversion function and a speech recognition function.
FIG. 2 is a functional configuration diagram of a kana-kanji conversion system and a speech recognition system.
FIG. 3 is a functional configuration diagram of the terminal according to the present invention.
FIG. 4 is a flowchart according to the present invention.
FIG. 5 is a kana-kanji predictive conversion sequence comparing the prior art with the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 2 is a functional configuration diagram of the kana-kanji conversion system and the speech recognition system.

FIG. 2(a) shows the functional configuration of the kana-kanji conversion system alone. The kana-kanji conversion engine 112 calculates the probability of each kana-kanji candidate as a score for every connection between words. The kana-kanji conversion engine 112 includes a dependency grammar score calculation unit 1121, a case grammar score calculation unit 1122, a history score calculation unit 1123, an integrated score calculation unit 1124, a morphological analysis unit 1125, and a case frame processing unit 1126. The dependency grammar score calculation unit 1121 expresses the connection strength between words as the co-occurrence probability between those words. In the kana-kanji conversion system, the kana reading of the text is already fixed because it is entered by key. It is therefore sufficient to find the homophones for that reading.

FIG. 2(b) shows the functional configuration of the speech recognition system alone. The speech recognition engine 122 includes an acoustic analysis unit 1221, an acoustic score calculation unit 1222, an N-gram score calculation unit 1223, an integrated score calculation unit 1224, and a candidate range control unit 1225. Where the kana-kanji conversion system uses the dependency grammar score calculation unit 1121, the speech recognition system uses the N-gram score calculation unit 1223, which calculates the probabilities of connection relationships among N adjacent words. The N-gram is an extension of the "dependency grammar", which expresses the probability of the connection relationship between two adjacent words. In a speech recognition system, the kana reading itself is uncertain. The candidate range control unit 1225 therefore needs to search for the kana-kanji strings corresponding to each of the different readings.

In the speech recognition system of the present invention, the calculation of the integrated score additionally involves the case grammar score calculation unit 1122 and the history score calculation unit 1123 of the kana-kanji conversion system. The morphological analysis unit 1125 and the case frame processing unit 1126 of the kana-kanji conversion system are integrated with the candidate range control unit 1225 of the speech recognition system. Furthermore, the kana-kanji conversion system of the present invention uses the N-gram score instead of the dependency grammar score.

FIG. 3 is a functional configuration diagram of the terminal according to the present invention.

According to FIG. 3, the terminal 1 further includes an acoustic model storage unit 123, a language model storage unit 133, a history storage unit 114, an acoustic analysis unit 1221, an acoustic score calculation unit 1222, an N-gram score calculation unit 1223, a case grammar score calculation unit 1122, a history score calculation unit 1123, an integrated score calculation unit 130, a search control unit 131, a case frame processing unit 1126, and a candidate selection screen control unit 132. These functional components are realized by executing a program that causes a computer mounted on the terminal to function.

The N-gram score calculation unit 1223 calculates an N-gram score for a candidate kana reading sequence according to the connection strength between words given by the language model. In speech recognition the kana reading is uncertain, so scores must be calculated for a plurality of different readings. For example, when "き..." (ki...) follows "こうつう" (koutsuu), the conversion candidates for the word that comes after "交通" (traffic) include "規制" (kisei, regulation), "きれい" (kirei, pretty), and "季節" (kisetsu, season); among these, "規制" receives the highest score.
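The candidate ranking in the "交通" example can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in the publication: the table `BIGRAM_LOGPROB`, its probability values, the floor `UNSEEN_LOGPROB`, and the function names are all hypothetical, and a bigram (N = 2) stands in for the general N-gram.

```python
import math

# Hypothetical bigram log-probabilities log P(next | previous).
# The numbers are illustrative only and do not come from the publication.
BIGRAM_LOGPROB = {
    ("交通", "規制"): math.log(0.30),
    ("交通", "きれい"): math.log(0.01),
    ("交通", "季節"): math.log(0.02),
}
UNSEEN_LOGPROB = math.log(1e-6)  # assumed floor for word pairs not in the table


def ngram_score(previous, candidate):
    """Return the bigram log-probability of `candidate` following `previous`."""
    return BIGRAM_LOGPROB.get((previous, candidate), UNSEEN_LOGPROB)


def rank_candidates(previous, candidates):
    """Sort candidates from highest to lowest N-gram score."""
    return sorted(candidates, key=lambda w: ngram_score(previous, w), reverse=True)


ranked = rank_candidates("交通", ["きれい", "季節", "規制"])
print(ranked[0])  # 規制
```

Because the candidates here have different readings (kisei, kirei, kisetsu), a real recognizer would weigh this language score against the acoustic score of each reading rather than use it alone.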

The case grammar score calculation unit 1122 calculates a case grammar score for a candidate kana reading sequence according to the clause-to-clause connection strength given by the language model. Case grammar analysis focuses on particles to analyze the structure of a sentence (simple, compound, or complex sentence; subject, object, modifier, predicate, complement, and so on). For example, for "わたしはくににぜいきんがおさめられているのでおさめる" (watashi wa kuni ni zeikin ga osamerarete iru node osameru), the sentence structure is determined as follows.
わたしは: subject
くにに: object
ぜいきんがおさめられているので: modifier, containing ぜいきんが (subject) and おさめられている (predicate)
おさめる: predicate
The connection strength between the clause "ぜいきんが" and the clause "おさめられている" is high, while their importance to the overall sentence structure is low. "ぜいきんがおさめられているので" is therefore judged to be a modifier, and "わたしは", "くにに", and "おさめる" are judged to be highly important to the sentence structure. Thus, whereas a conventional speech recognition system considers only the N-gram as its language model, the present invention can also handle utterances whose grammar has a high degree of freedom.

The language model storage unit 133 stores a word dictionary and a language model. In particular, it stores the connection relationships between words in the word dictionary as scores. The language model storage unit 133 of the present invention holds both N-gram probabilities and case grammar probabilities, and thus serves the N-gram score calculation unit 1223 and the case grammar score calculation unit 1122. In this respect it differs from the existing language model storage units of kana-kanji conversion systems and speech recognition systems.

The history score calculation unit 1123 uses the history storage unit 114 to assign, for a candidate kana reading sequence, a high score to the kana-kanji strings (words, clauses, and so on) that the user selected in the past. In other words, past learning results are reflected in the language score. Basically, the kana-kanji string most recently selected for that reading becomes a predicted candidate. Kana-kanji strings are also made predicted candidates for that reading according to the content of sentences used in the past.
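One way to picture a history score is a per-reading selection count with a bonus for the most recent choice. The sketch below is an assumption made for illustration: the class `HistoryStore`, its methods, and the `recency_bonus` parameter are hypothetical, and the publication does not specify how the score is actually computed.

```python
from collections import defaultdict


class HistoryStore:
    """Hypothetical store of the conversions a user selected in the past."""

    def __init__(self):
        # reading -> surface form -> number of times selected
        self._counts = defaultdict(lambda: defaultdict(int))
        # reading -> surface form selected most recently
        self._last = {}

    def record(self, reading, surface):
        """Remember that `surface` was chosen for `reading`."""
        self._counts[reading][surface] += 1
        self._last[reading] = surface

    def history_score(self, reading, surface, recency_bonus=0.5):
        """Selection count, plus an assumed bonus for the latest selection."""
        score = float(self._counts.get(reading, {}).get(surface, 0))
        if self._last.get(reading) == surface:
            score += recency_bonus
        return score


store = HistoryStore()
store.record("きせい", "規制")
store.record("きせい", "規制")
store.record("きせい", "帰省")

best = max(["規制", "帰省", "寄生"], key=lambda s: store.history_score("きせい", s))
print(best)  # 規制
```

A single store of this kind could be written to by both key input and speech input, which corresponds to the sharing of history information between the two functions.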

A prior-art speech recognition system cannot change the candidates it presents according to the user. In contrast, the present invention can improve predictive conversion accuracy by identifying the words each user uses frequently. In addition, history information is shared between the kana-kanji conversion function and the speech recognition function.

The acoustic analysis unit 1221 converts the speech signal input from the microphone 121 into acoustic feature parameters, which are output to the acoustic score calculation unit 1222. The acoustic feature parameters are a parameter sequence, such as LPC cepstrum or MFCC, obtained by analyzing the input speech signal in units of phoneme frames of several tens of milliseconds.
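The framing step that precedes feature extraction can be sketched as below. The 25 ms frame length, the 10 ms hop, and the 16 kHz sample rate are assumptions chosen for illustration (the text only says "several tens of milliseconds"), the function name is hypothetical, and the computation of LPC cepstrum or MFCC for each frame is deliberately omitted.

```python
def split_into_frames(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Split a sample sequence into overlapping fixed-length frames.

    frame_ms and hop_ms are assumed values; trailing samples that do not
    fill a whole frame are dropped.
    """
    frame_len = sample_rate * frame_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


signal = [0.0] * 16000  # one second of silence at an assumed 16 kHz
frames = split_into_frames(signal, sample_rate=16000)
print(len(frames), len(frames[0]))  # 98 400
```

Each frame would then be turned into one acoustic feature vector and passed on for acoustic scoring.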

The acoustic score calculation unit 1222 uses the acoustic model storage unit 123 to calculate, for each of a plurality of candidate kana readings for a phoneme frame, the acoustic score given by the acoustic model.

統合スコア算出部１３０は、候補となる読み仮名系列に対して、言語スコア（N-gramスコア、格文法スコア及び履歴スコア）と音響スコアとを統合して、最も高いスコアの単語列を、認識結果として出力する。入力された音声信号に対して、音響的な類似度と言語的な妥当性とをスコア化することができる。 The integrated score calculation unit 130 integrates, for each candidate kana sequence, the language score (N-gram score, case grammar score, and history score) with the acoustic score, and outputs the word string with the highest score as the recognition result. In this way, both the acoustic similarity and the linguistic validity of the input speech signal can be scored.

探索制御部１３１は、形態素解析に基づいて探索候補を選別すると共に、統合スコアが最も高い仮名漢字列を探索するために、音響スコア算出部１２２２、N-gramスコア算出部１２２３、格文法スコア算出部１１２２、履歴スコア算出部１１２３及び統合スコア算出部１３０を繰り返す。 The search control unit 131 selects search candidates based on morphological analysis and, in order to search for the kana-kanji string with the highest integrated score, repeatedly invokes the acoustic score calculation unit 1222, the N-gram score calculation unit 1223, the case grammar score calculation unit 1122, the history score calculation unit 1123, and the integrated score calculation unit 130.

形態素解析とは、文法及び単語辞書を情報源として用いて、自然言語で書かれた文を形態素（Morpheme、言語で意味を持つ最小単位）の列に分割する。例えば、文「きのうわたしはしっていた。」に対して、多数の単語に区切ることができる。区切り方によっては、意味が通じない単語となる。 Morphological analysis uses a grammar and a word dictionary as information sources to divide a sentence written in natural language into a sequence of morphemes (the smallest units that carry meaning in a language). For example, the sentence 「きのうわたしはしっていた。」 (kinou watashi ha shitte ita) can be split into words in many different ways. Depending on where the boundaries fall, the resulting words make no sense.
・き/のう/わ/たし/は/しっ/て/いた
・きのう/わたし/は/しっ/て/いた
・きのう/わたし/はしっ/て/いた
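The ambiguity in the example can be reproduced with a small sketch that enumerates every dictionary-consistent split of the kana string. The function name `segmentations` and the toy lexicon are assumptions chosen to cover the three splits listed; a real morphological analyzer would also apply grammar constraints and scores, which are not modeled here.

```python
from typing import List, Set


def segmentations(text: str, lexicon: Set[str]) -> List[List[str]]:
    """Return every way to split `text` into entries of `lexicon`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Recurse on the remainder and prepend the matched word.
            for tail in segmentations(text[end:], lexicon):
                results.append([head] + tail)
    return results


# Toy lexicon (hypothetical) covering the three splits shown in the text.
lexicon = {"き", "のう", "わ", "たし", "きのう", "わたし",
           "は", "はしっ", "しっ", "て", "いた"}
for seg in segmentations("きのうわたしはしっていた", lexicon):
    print("/".join(seg))
```

Even this tiny lexicon yields several candidate splits, which is why the candidates are subsequently ranked by score rather than accepted as they come.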

次に、探索制御部１３１は、最適な文節に区切るべく、スコアが高いＮ個の候補を選別する。スコアを用いることによって、仮名漢字の出現頻度だけでなく、単語又は文節の連接強度によって、Ｎ個の候補が選別される。 Next, the search control unit 131 selects the N highest-scoring candidates in order to find the optimal phrase segmentation. Because the score is used, the N candidates are selected not only by the appearance frequency of the kana-kanji but also by the connection strength between words or phrases.
$$\sum_{m=1}^{M} \alpha_m \sum_{n=1}^{N_m} P_m(x_{mn})$$
$M$: number of words
$w_m$: the $m$-th word
$\alpha_m$: language score of word $w_m$
$N_m$: number of phoneme frames contained in word $w_m$
$x_{mn}$: the $n$-th phoneme frame contained in word $w_m$
$P_m(x_{mn})$: acoustic score given by the acoustic model of word $w_m$ for the phoneme frame
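The score expression can be evaluated directly, as in the sketch below. The function name `sequence_score` and the numeric inputs are illustrative assumptions; `language_scores[m]` plays the role of α_m and `acoustic_scores[m][n]` the role of P_m(x_mn).

```python
from typing import Sequence


def sequence_score(language_scores: Sequence[float],
                   acoustic_scores: Sequence[Sequence[float]]) -> float:
    """Compute sum over m of alpha_m * (sum over n of P_m(x_mn))."""
    if len(language_scores) != len(acoustic_scores):
        raise ValueError("one list of frame scores is needed per word")
    return sum(alpha * sum(frames)
               for alpha, frames in zip(language_scores, acoustic_scores))


# Two words: the first spans two phoneme frames, the second spans one.
print(sequence_score([0.5, 2.0], [[1.0, 3.0], [0.25]]))  # 0.5*4.0 + 2.0*0.25 = 2.5
```

Each word's acoustic evidence is summed over its frames and then weighted by that word's language score, so a word that fits the context contributes more to the candidate's total.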

従来技術における音声認識システムでは、候補となる読み仮名系列の中から、音響スコア及びN-gramスコアが上位となる読み仮名に絞り込んでいる。これに対し、本発明によれば、形態素解析によって文構造の正しい候補を選別した上で、スコアが上位となる読み仮名に絞り込んでいる。これによって、予測精度も向上する。 In the speech recognition system according to the prior art, the kana characters with the highest acoustic score and N-gram score are narrowed down from the candidate kana sequences. On the other hand, according to the present invention, the correct sentence structure candidates are selected by morphological analysis, and then narrowed down to reading kana with higher scores. This also improves the prediction accuracy.

格フレーム処理部１１２６は、文の格構造によって変換候補を選択する。格フレームとは、動詞に、「どんな助詞（で、に、を、から、より、・・・）と一緒に使われるのか」という情報を持たせることを意味する。入力された読みと一致する格フレームを持つ単語を、変換候補の第１候補にする。例えば、「私はドレスを着る。」「私はナイフでドレスを切る。」のように、「着る」と「切る」の同音異義語を区別することができる。 The case frame processing unit 1126 selects a conversion candidate according to the case structure of the sentence. A case frame attaches to a verb the information "which particles (で, に, を, から, より, ...) it is used with". A word whose case frame matches the input reading is made the first conversion candidate. For example, in 「私はドレスを着る。」 ("I wear a dress.") and 「私はナイフでドレスを切る。」 ("I cut the dress with a knife."), the homophones 着る ("wear") and 切る ("cut"), both read "kiru", can be distinguished.
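A minimal sketch of this kind of homophone selection follows. The `CASE_FRAMES` table, the function `rank_by_case_frame`, and the mismatch-count criterion are all assumptions made for illustration; the source does not specify how a case frame is stored or matched. Only case particles are passed in, and the topic marker は is left out by assumption.

```python
from typing import Dict, FrozenSet, List

# Hypothetical case frames: the particles each verb is assumed to occur with.
CASE_FRAMES: Dict[str, FrozenSet[str]] = {
    "着る": frozenset({"を"}),
    "切る": frozenset({"を", "で"}),
}


def rank_by_case_frame(candidates: List[str], particles: List[str]) -> List[str]:
    """Order homophone candidates so the closest case-frame match comes first."""
    observed = frozenset(particles)

    def mismatch(word: str) -> int:
        # Count particles that appear in only one of the two sets.
        return len(observed ^ CASE_FRAMES[word])

    return sorted(candidates, key=mismatch)


print(rank_by_case_frame(["着る", "切る"], ["を"]))        # ドレスを きる
print(rank_by_case_frame(["着る", "切る"], ["で", "を"]))  # ナイフで ドレスを きる
```

Because the ranking looks at the particles of the whole sentence rather than at the neighboring word, it still works when the deciding phrase is far from the verb.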

従来技術における音声認識システムでは、N-gramの性質上、隣接しない単語同士の同音異義語を正しい候補に変換することが難しい。これに対し、本発明によれば、格フレーム処理によって、文の格構造を概念的に認識し、名詞の後の助詞に基づいて「どの単語に係るか」を解析する。これによって、文節（格構造）が隣接しない（特に離れている）場合であっても、同音異義語の中から適切な候補を選択することができる。 In the conventional speech recognition system, it is difficult to convert homonyms of non-adjacent words into correct candidates due to the nature of N-gram. On the other hand, according to the present invention, the case structure of a sentence is conceptually recognized by case frame processing, and “which word is involved” is analyzed based on the particle after the noun. Thereby, even if the clause (case structure) is not adjacent (particularly apart), an appropriate candidate can be selected from the homonyms.

候補選択画面制御部１３２は、利用者に対して複数の変換候補の中から、いずれか１つの仮名漢字を選択させる。これにより、利用者は、音声認識機能及び仮名漢字変換機能を、同一の候補選択画面として視認することができる。 The candidate selection screen control unit 132 causes the user to select any one kana / kanji character from a plurality of conversion candidates. Thereby, the user can visually recognize the voice recognition function and the kana-kanji conversion function as the same candidate selection screen.

図４は、本発明におけるフローチャートである。 FIG. 4 is a flowchart in the present invention.

（Ｓ４０１）マイクを用いて、利用者によって発声された音声信号を取得する。
（Ｓ４０２）音響分析によって、音声信号から音素フレームを抽出する。
（Ｓ４０３）以下、Ｓ４０８までを繰り返す。
（Ｓ４０４）音素フレームに対する複数候補の読み仮名毎に、音響モデルが与える音響スコアを算出する。
（Ｓ４０５）候補となる読み仮名系列に対して、言語モデルを用いて、単語間の連接強度に応じてN-gramスコアを算出する。
（Ｓ４０６）候補となる読み仮名系列に対して、言語モデルを用いて、文節−文節間の連接強度に応じて格文法スコアを算出する。
（Ｓ４０７）候補となる読み仮名系列に対して、履歴情報を用いて、過去の学習履歴に応じて履歴スコアを算出する。
（Ｓ４０８）候補となる読み仮名系列に対して、音響スコア及び言語スコア（N-gramスコア、格文法スコア及び履歴スコア）に基づく統合スコアを算出する。
（Ｓ４０９）形態素解析に基づいて複数候補の仮名漢字を選別すると共に、統合スコアが最も高い仮名漢字を探索するために、Ｓ４０３からＳ４０８までを繰り返す。
（Ｓ４１０）探索された仮名漢字を含む文章の格構造によって変換候補を選択する。
（Ｓ４１１）利用者に対して複数の変換候補の中から、いずれか１つの仮名漢字を選択させる。利用者から見て、音声認識機能及び仮名漢字変換機能は、同一の候補選択画面として視認される。
(S401) A speech signal uttered by the user is acquired using the microphone.
(S402) Phoneme frames are extracted from the speech signal by acoustic analysis.
(S403) The steps from here through S408 are repeated.
(S404) The acoustic score given by the acoustic model is calculated for each of a plurality of candidate kana readings for the phoneme frame.
(S405) For each candidate kana sequence, an N-gram score is calculated from the language model according to the connection strength between words.
(S406) For each candidate kana sequence, a case grammar score is calculated from the language model according to the connection strength between phrases.
(S407) For each candidate kana sequence, a history score is calculated from the history information according to the past learning history.
(S408) For each candidate kana sequence, an integrated score is calculated from the acoustic score and the language score (N-gram score, case grammar score, and history score).
(S409) Candidate kana-kanji are selected based on morphological analysis, and S403 to S408 are repeated to search for the kana-kanji with the highest integrated score.
(S410) A conversion candidate is selected according to the case structure of the sentence containing the kana-kanji found by the search.
(S411) The user is prompted to select one kana-kanji from the plurality of conversion candidates. From the user's point of view, the speech recognition function and the kana-kanji conversion function appear as the same candidate selection screen.
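The scoring loop of steps S403 to S409 can be outlined as below. This is a sketch under stated assumptions: `search_candidates`, the `Scorer` and `Integrator` types, and the `n_best` parameter are hypothetical names, every scorer is supplied by the caller, and the way the four scores are combined is deliberately left to the caller-supplied `integrate` function.

```python
from typing import Callable, Iterable, List, Tuple

Scorer = Callable[[str], float]
Integrator = Callable[[float, float, float, float], float]


def search_candidates(candidates: Iterable[str], acoustic: Scorer, ngram: Scorer,
                      case_grammar: Scorer, history: Scorer,
                      integrate: Integrator, n_best: int = 3) -> List[Tuple[float, str]]:
    """Score every candidate and return the n_best highest integrated scores."""
    scored = []
    for kana_kanji in candidates:                 # S403: loop over candidates
        a = acoustic(kana_kanji)                  # S404: acoustic score
        g = ngram(kana_kanji)                     # S405: N-gram score
        c = case_grammar(kana_kanji)              # S406: case grammar score
        h = history(kana_kanji)                   # S407: history score
        scored.append((integrate(a, g, c, h), kana_kanji))  # S408
    # S409: keep the highest integrated scores (ties keep input order).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n_best]
```

The returned list would then be narrowed by case frame processing (S410) and shown to the user on the candidate selection screen (S411).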

図５は、従来技術と本発明とを比較した、仮名漢字の予測変換シーケンスである。 FIG. 5 is a kana-kanji predictive conversion sequence comparing the prior art and the present invention.

図５によれば、従来技術における音声認識の場合、音響スコア及びN-gramスコアによって探索を繰り返すために、隣接する単語に影響されやすく、誤変換されやすい。 According to FIG. 5, in the case of speech recognition in the prior art, since the search is repeated with the acoustic score and the N-gram score, it is likely to be influenced by adjacent words and misconverted.

これに対し、本発明における音声認識の場合、格文法スコア及び履歴スコアを統合スコアに更に含めて探索を繰り返すことによって、最適な形態素に区分される。更に、格フレーム処理によって文章の格構造が判断され、最終的に、最適な仮名漢字に変換される。 On the other hand, in the case of speech recognition according to the present invention, the case grammar score and the history score are further included in the integrated score, and the search is repeated to classify them into optimum morphemes. Further, the case structure of the sentence is determined by the case frame processing, and finally converted into an optimal kana / kanji character.

以上、詳細に説明したように、本発明の端末、方法及びプログラムによれば、音声認識に仮名漢字変換システムを用いることによって、システム全体の言語モデルのリソースを統合すると共に、音声認識の誤変換を削減することができる。 As described above in detail, according to the terminal, method, and program of the present invention, using the kana-kanji conversion system for speech recognition makes it possible to integrate the language model resources of the entire system and to reduce erroneous conversions in speech recognition.

本発明によれば、音声認識における誤認識を、仮名漢字変換システムにおける言語スコア計算部によって補うことによって、音声認識における認識精度を向上させる。また、言語モデルを統合することによって、当該端末が備えるべきリソースを削減させ、携帯端末への組込みを可能とする。音声認識システムと仮名漢字変換システムとにおける予測候補の選択画面が共通化されるので、利用者から見て理解を容易にする。更に、仮名漢字変換システムで利用者によって選択された仮名漢字は、音声認識システムでも同様に、高いスコアで算出される。逆に、音声認識システムで利用者によって選択された仮名漢字は、仮名漢字変換システムでも同様に、高いスコアで算出される。即ち、利用者から見て、両システムともに、同様の仮名漢字変換が実行される。 According to the present invention, the recognition accuracy in speech recognition is improved by compensating for misrecognition in speech recognition by the language score calculation unit in the kana-kanji conversion system. Also, by integrating the language model, the resources that the terminal should have are reduced, and can be incorporated into a portable terminal. Since the prediction candidate selection screens in the speech recognition system and the kana-kanji conversion system are shared, it is easy for the user to understand. Furthermore, the kana-kanji selected by the user in the kana-kanji conversion system is calculated with a high score in the same manner in the speech recognition system. Conversely, the kana / kanji character selected by the user in the speech recognition system is similarly calculated with a high score in the kana / kanji conversion system. That is, the same kana-kanji conversion is executed in both systems from the viewpoint of the user.

前述した本発明の種々の実施形態について、本発明の技術思想及び見地の範囲の種々の変更、修正及び省略は、当業者によれば容易に行うことができる。前述の説明はあくまで例であって、何ら制約しようとするものではない。本発明は、特許請求の範囲及びその均等物として限定するものにのみ制約される。 Various changes, modifications, and omissions of the above-described various embodiments of the present invention can be easily made by those skilled in the art. The above description is merely an example, and is not intended to be restrictive. The invention is limited only as defined in the following claims and the equivalents thereto.

１端末
１００ディスプレイ
１１１キー操作部
１１２仮名漢字変換エンジン
１１２１係り受け文法スコア算出部
１１２２格文法スコア算出部
１１２３履歴スコア算出部
１１２４統合スコア算出部
１１２５形態素解析部
１１２６格フレーム処理部
１１３言語モデル蓄積部
１１４履歴蓄積部
１１５仮名漢字選択画面制御部
１２１マイク
１２２音声認識エンジン
１２２１音響分析部
１２２２音響スコア算出部
１２２３ N-gramスコア算出部
１２２４統合スコア算出部
１２２５候補範囲制御部
１２３音響モデル蓄積部
１２４言語モデル蓄積部
１２５音声認識選択画面制御部
１３０統合スコア算出部
１３１探索制御部
１３２候補選択画面制御部
１３３言語モデル蓄積部

DESCRIPTION OF SYMBOLS
1 Terminal
100 Display
111 Key operation unit
112 Kana-kanji conversion engine
1121 Dependency grammar score calculation unit
1122 Case grammar score calculation unit
1123 History score calculation unit
1124 Integrated score calculation unit
1125 Morphological analysis unit
1126 Case frame processing unit
113 Language model storage unit
114 History storage unit
115 Kana-kanji selection screen control unit
121 Microphone
122 Speech recognition engine
1221 Acoustic analysis unit
1222 Acoustic score calculation unit
1223 N-gram score calculation unit
1224 Integrated score calculation unit
1225 Candidate range control unit
123 Acoustic model storage unit
124 Language model storage unit
125 Speech recognition selection screen control unit
130 Integrated score calculation unit
131 Search control unit
132 Candidate selection screen control unit
133 Language model storage unit

Claims

Acoustic model storage means for storing an acoustic model used in speech recognition;
Language model storage means for storing a word dictionary and a language model corresponding to N-gram and case grammar;
Microphone means for acquiring a voice signal uttered by a user;
Acoustic analysis means for extracting phoneme frames from the speech signal;
Acoustic score calculation means for calculating an acoustic score given by the acoustic model for each of a plurality of candidate kana readings for the phoneme frame;
N-gram score calculating means for calculating an N-gram score according to the concatenation strength between words given by the language model for candidate reading kana series;
Case grammar score calculating means for calculating a case grammar score according to the connection strength between the clauses for the candidate kana series,
An integrated score calculation means for calculating an integrated score based on the acoustic score, the N-gram score, and the case grammar score for the candidate kana series,
Search control means for selecting a plurality of candidate kana-kanji based on morphological analysis and for repeatedly invoking the acoustic score calculation means, the N-gram score calculation means, the case grammar score calculation means, and the integrated score calculation means in order to search for the kana-kanji having the highest integrated score;
A terminal having a case frame processing means for selecting a conversion candidate according to a case structure of a sentence including a searched kana / kanji, and realizing a voice recognition function.

History accumulating means for storing kana-kanji characters converted in the past;
A history score calculating unit that calculates a history score according to a past learning history with respect to candidate reading kana series using the history storage unit,
The terminal according to claim 1, wherein the integrated score calculation unit further includes the history score in the integrated score.

A key operation means for obtaining a hiragana string input by the user using a key;
The kana-kanji conversion function is realized by the N-gram score calculation means, the case grammar score calculation means, the integrated score calculation means, the search control means, and the case frame processing means. The terminal described in.

Candidate selection screen control means for causing the user to select any one kana / kanji from a plurality of conversion candidates;
The terminal according to claim 3, wherein the candidate selection screen control means causes the user to visually recognize the same candidate selection screen for the voice recognition function and the kana-kanji conversion function.

A speech recognition method on a terminal,
The terminal
An acoustic model storage unit that stores an acoustic model used in speech recognition;
A language model storage unit that stores a word dictionary and a language model corresponding to N-gram and case grammar;
A first step of acquiring an audio signal uttered by a user with a microphone unit;
A second step of extracting a phoneme frame from the speech signal;
A third step of calculating an acoustic score given by the acoustic model for each of a plurality of candidate kana readings for the phoneme frame;
A fourth step of calculating an N-gram score according to a connection strength between words given by the language model for a candidate kana sequence;
A fifth step of calculating a case grammar score according to the connection strength between clauses for the candidate kana series;
A sixth step of calculating an integrated score based on the acoustic score, the N-gram score, and the case grammar score for candidate kana series;
A seventh step of selecting a plurality of candidate kana-kanji based on morphological analysis and repeating the third to sixth steps to search for the kana-kanji having the highest integrated score;
A speech recognition method comprising: an eighth step of selecting a conversion candidate according to a case structure of a sentence including a searched kana / kanji character, and realizing a speech recognition function.

The terminal further includes a history storage unit for storing kana-kanji characters converted in the past,
Using the history accumulating unit, further comprising a step of calculating a history score according to a past learning history for a candidate reading kana series;
The voice recognition method according to claim 5, wherein the sixth step further includes calculating the history score in the integrated score.

The terminal further includes a key operation unit that acquires a hiragana sequence input by the user using a key,
The speech recognition method according to claim 5 or 6, wherein the kana-kanji conversion function is realized by the fourth step to the eighth step.

The speech recognition method according to claim 7, further comprising a step of causing the user to select any one kana-kanji from a plurality of conversion candidates, wherein the speech recognition function and the kana-kanji conversion function are presented to the user as the same candidate selection screen.

A speech recognition program that allows a computer installed in a terminal to function,
Acoustic model storage means for storing an acoustic model used in speech recognition;
Language model storage means for storing a word dictionary and a language model corresponding to N-gram and case grammar;
Microphone means for acquiring a voice signal uttered by a user;
Acoustic analysis means for extracting phoneme frames from the speech signal;
Acoustic score calculation means for calculating an acoustic score given by the acoustic model for each of a plurality of candidate kana readings for the phoneme frame;
N-gram score calculating means for calculating an N-gram score according to the concatenation strength between words given by the language model for candidate reading kana series;
Case grammar score calculating means for calculating a case grammar score according to the connection strength between the clauses for the candidate kana series,
An integrated score calculation means for calculating an integrated score based on the acoustic score, the N-gram score, and the case grammar score for the candidate kana series,
Search control means for selecting a plurality of candidate kana-kanji based on morphological analysis and for repeatedly invoking the acoustic score calculation means, the N-gram score calculation means, the case grammar score calculation means, and the integrated score calculation means in order to search for the kana-kanji having the highest integrated score;
A speech recognition program for causing a computer to function as a case frame processing means for selecting a conversion candidate according to a case structure of a sentence including a searched kana / kanji character.