JP2001134285A

JP2001134285A - Speech recognition device

Info

Publication number: JP2001134285A
Application number: JP31055199A
Authority: JP
Inventors: Takahiro Kudo; 貴弘工藤; Kenji Mizutani; 研治水谷; Yumi Wakita; 由実脇田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1999-11-01
Filing date: 1999-11-01
Publication date: 2001-05-18

Abstract

PROBLEM TO BE SOLVED: Conventional speech recognition uses a single language model regardless of the speaker's conversation partner, situation, or utterance content, which increases recognition errors and prevents recognition of expressions and phrasing that the model has not learned in advance. SOLUTION: According to what the user intends to say and to whom, a learning data selection unit 1 selects, from a learning data storage unit 2, a language model or corpus to be used by a speech recognition unit 5. A learning data conversion unit 3 converts the selected data into a format usable by the speech recognition unit 5, which then performs speech recognition using a language model suited to the utterance content and the conversation partner.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a dialogue device that takes speech as input and outputs text, images, audio data, and the like.

[0002]

2. Description of the Related Art

Conventional speech recognition techniques are described below.

[0003] Conventionally, language models for speech recognition have generally been trained on large-scale written-language databases, such as newspaper articles, that are easy to obtain. However, a language model trained on such written language cannot sufficiently learn the free spoken expressions of the users targeted by speech recognition. Smaller spoken-language databases, obtained from dialogue collections and the like, were therefore collected and added to the large-scale written-language database for training so that free speech could be handled.

[0004] In addition, to reduce the variation in recognition performance from user to user, acoustic models were adapted by learning the acoustic characteristics of each user's utterances.

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION

No particular criterion has been applied in choosing which spoken-language databases to add to the large-scale written-language database; as much data as could be collected was simply added. Consequently, the task content of each spoken-language database (for example, whether it is a conversation in a meeting or in a restaurant, or whether the speaker is a customer or a clerk) was not taken into account.

[0006] However, experiments have confirmed that adding more spoken-language databases does not necessarily adapt the model to free speech; that is, it does not directly improve the recognition rate.

[0007] FIG. 12 shows the perplexity measured for several language models trained on different databases. Perplexity is a measure of how easily the next word can be predicted, and a language model with lower perplexity is generally considered to perform better.
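As an illustration of the measure, the following is a minimal sketch of how perplexity could be computed for a word-bigram model. The bigram form, the add-one smoothing, the `<s>`/`</s>` boundary markers, and the function names `train_bigram` and `perplexity` are assumptions made for this example; the source does not specify how the models behind FIG. 12 were built or smoothed.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count histories and adjacent word pairs over tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts, vocab


def perplexity(model, sentences):
    """Return 2 ** (average negative log2 probability per predicted word)."""
    history_counts, bigram_counts, vocab = model
    v = len(vocab) + 1  # assumption: one extra slot reserves mass for unseen words
    log_prob = 0.0
    n = 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            # Add-one smoothed conditional probability P(cur | prev).
            p = (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + v)
            log_prob += math.log2(p)
            n += 1
    return 2 ** (-log_prob / n)
```

Evaluation text that resembles the training text yields a lower value than text from a different task, which is the sense in which a lower perplexity indicates a better-matched model.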

[0008] The 1995 edition of Mainichi Shimbun newspaper articles (labeled "Mainichi DB" in FIG. 12) was used as the large-scale written-language database. Conversations between a clerk and a customer at a hotel (labeled "clerk" and "customer" in FIG. 12) were used as small-scale spoken-language databases whose task content is considered to differ because the speakers differ. Language models were trained by adding the clerk and customer conversation databases to the Mainichi Shimbun DB, and perplexity was measured on evaluation data consisting of clerk utterances and of customer utterances.

[0009] For the clerk evaluation data, perplexity decreases as the added spoken-language data grows from the clerk or customer database alone to both, which suggests that adding spoken-language data has an adaptation effect. For the customer evaluation data, however, perplexity is lowest when only the customer spoken-language database is added; further adding the clerk spoken-language database (that is, a spoken-language database with different task content) increases perplexity. It therefore cannot be assumed that training a language model on ever more spoken-language databases necessarily adapts it to the user's free spoken expressions.

[0010] Furthermore, the expressions that come easily differ from user to user, yet the language model provided to a user does not reflect how readily that user utters particular expressions. Adapting only the acoustic model by learning the user's acoustic characteristics therefore cannot fully compensate for the variation in recognition rate from user to user.

[0011] In view of the above problems, the present invention provides a language model adapted to the user. According to the task content of the user's utterance, it changes the type and number of spoken-language databases added to the large-scale written-language database when the language model is trained, or changes the language model itself. In addition, so that the diverse expressions of individual users are reflected in the language model, it adds a new spoken-language database built from the history of user-specific expressions.

[0012]

MEANS FOR SOLVING THE PROBLEMS

To solve the problems described above, a first aspect of the present invention (corresponding to claim 1) is a speech recognition device comprising: a voice input unit for inputting speech; a learning data storage unit that stores language data; a learning data selection unit that selects language data stored in the learning data storage unit; a learning data conversion unit that statistically processes the language data selected by the learning data selection unit or converts its data format; a speech recognition unit that recognizes the speech input to the voice input unit using the data processed by the learning data conversion unit; and a data output unit that outputs text, audio, images, or data combining them based on the recognition result of the speech recognition unit.

[0013] A second aspect of the present invention (corresponding to claim 2) is the speech recognition device according to claim 1, wherein, in the learning data selection unit, the user selects one or more items from among the plurality of language data stored in the learning data storage unit according to the purpose of the user's utterance.

[0014] A third aspect of the present invention (corresponding to claim 3) is the speech recognition device according to claim 1, wherein the learning data selection unit automatically selects from among the plurality of language data stored in the learning data storage unit according to the purpose of the user's utterance.

[0015] A fourth aspect of the present invention (corresponding to claim 4) is the speech recognition device according to claim 1, wherein the language data according to claim 2 or claim 3 is the connection frequency or connection probability of the basic units constituting a language.

[0016] A fifth aspect of the present invention (corresponding to claim 5) is the speech recognition device according to claim 1, wherein the language data according to claim 2 or claim 3 is a word string, and the learning data conversion unit according to claim 1 converts the word string selected by the method of claim 2 or claim 3 into connection frequencies or connection probabilities of the language.

[0017] A sixth aspect of the present invention (corresponding to claim 6) is the speech recognition device according to claim 1, wherein the language data according to claim 2 or claim 3 is classified into groups according to the utterance situation.

[0018] A seventh aspect of the present invention (corresponding to claim 7) is a speech recognition device wherein the language data stored in the learning data storage unit is created by a language data input unit for inputting a character string and a language processing unit that processes the input character string.

[0019] An eighth aspect of the present invention (corresponding to claim 8) is a speech recognition device wherein the language data input unit explicitly requests the user to input a character string.

[0020] A ninth aspect of the present invention (corresponding to claim 9) is a speech recognition device wherein the language data input unit extracts substrings from a character string input by the user without the user being aware of it.

[0021]

EMBODIMENTS OF THE INVENTION

Embodiments of the present invention are described below with reference to the drawings.

[0022] First, the speech recognition device as a whole is described with reference to FIG. 1.

[0023] Before the user makes an utterance whose content varies with the task or purpose, the learning data selection unit 1 selects, from the learning data storage unit 2, the language data to be used by the speech recognition unit 5. The learning data conversion unit 3 receives the selected data from the learning data storage unit 2, determines its format, and, if the format differs from the one used by the speech recognition unit 5, converts the data into a usable format. The format conversion is described later.

[0024] The speech recognition unit 5 loads the data output by the data output unit 6.

[0025] When the user speaks, the utterance is input to the voice input unit 4. The speech recognition unit 5 recognizes the input speech based on the loaded data and determines the corresponding output. The data output unit 6 then outputs data such as audio, text, or images as determined by the speech recognition unit 5.
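The overall flow of FIG. 1 can be sketched roughly as follows. This is only an illustrative outline under several assumptions: `LanguageData`, `convert`, `Recognizer`, and `run` are hypothetical names, the language model is reduced to adjacent-pair counts, and the recognizer is a stand-in that chooses among candidate transcripts that are already text, since acoustic decoding is outside the scope of this sketch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LanguageData:
    kind: str        # "model" (adjacent-pair counts) or "corpus" (token lists)
    payload: object


def convert(data):
    """Learning data conversion unit 3 (sketch): check the format and
    convert a corpus into pair counts; a model is passed through as is."""
    if data.kind == "model":
        return data.payload
    counts = Counter()
    for tokens in data.payload:
        counts.update(zip(tokens, tokens[1:]))
    return counts


class Recognizer:
    """Stand-in for speech recognition unit 5: scores text candidates."""

    def __init__(self):
        self.model = Counter()

    def load(self, model):
        self.model = model

    def recognize(self, candidates):
        def score(tokens):
            # Sum of counts of adjacent pairs known to the loaded model.
            return sum(self.model[pair] for pair in zip(tokens, tokens[1:]))
        return max(candidates, key=score)


def run(store, selected_key, candidates):
    """Select (unit 1) from the store (unit 2), convert (unit 3),
    recognize (units 4 and 5), and output text (unit 6)."""
    recognizer = Recognizer()
    recognizer.load(convert(store[selected_key]))
    best = recognizer.recognize(candidates)
    return " ".join(best)
```

The point of the sketch is that the same recognizer gives different results depending on which language data was selected and loaded before the utterance.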

[0026] The first to fourth embodiments below assume dictation of e-mail.

[0027] (First Embodiment) The first embodiment assumes that the language data stored in the learning data storage unit 2 of FIG. 1 is the language model shown in FIG. 2 and that the user is the one who makes the selection in the learning data selection unit 1.

[0028] This embodiment is described with reference to FIG. 3.

[0029] Based on the content of the e-mail and its recipient, the user selects a language model suited to the utterance and inputs that choice to the selection data input unit 21. Based on this input, the corresponding language model is sent from the learning data storage unit 2 to the learning data conversion unit 3. The speech recognition unit 5 loads the language model output by the learning data conversion unit 3.

[0030] (Second Embodiment) In the second embodiment, the language data stored in the learning data storage unit 2 of FIG. 1 is the corpus shown in FIG. 4, and the learning data selection unit 1 selects the language data automatically.

[0031] The corpora are assumed to be tagged and to comprise a large-scale written-language corpus of newspaper articles, a corpus of conversations at a hospital, and a corpus of dialogues between the user and friends.

[0032] This embodiment is described with reference to FIGS. 5 to 9.

[0033] When the user is about to send an e-mail in the format shown in FIG. 5, the Subject and the recipient are input to the determination condition input unit 51 of FIG. 6. Based on the Subject and recipient input to the determination condition input unit 51, the selection data determination unit 52 consults the determination condition database 53, which stores the concept structure and the relationships between concepts and their corresponding corpora, predicts the content of the e-mail body the user is about to dictate, and selects the data to be used from the learning data storage unit 2. In the example of FIG. 5, Subject 42 is “amusement park” and the recipient (To 41 in FIG. 5) is “Yoshinobu”. According to the concept diagram in FIG. 7, “amusement park” is a “place to play” and “Yoshinobu” is a “friend”. From FIG. 8, which shows the concepts and their corresponding corpora, the selection data determination unit 52 of FIG. 6 determines that the corpora to select are the newspaper article corpus and the corpus of dialogues with friends.
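The automatic selection described here could be sketched as two table lookups: one that generalizes the Subject and the recipient to concepts, and one that maps concepts to corpora. The table contents below are hypothetical stand-ins for FIG. 7 and FIG. 8, whose actual entries are not reproduced in the text, and the names `CONCEPT_PARENT`, `CONCEPT_TO_CORPORA`, `DEFAULT_CORPORA`, and `select_corpora` are illustrative only.

```python
# Hypothetical fragment of the concept tree (child -> parent concept).
CONCEPT_PARENT = {
    "amusement park": "place to play",
    "Yoshinobu": "friend",
}

# Hypothetical concept-to-corpus table; the pairing is assumed, chosen so
# that the example in the text yields the corpora it names.
CONCEPT_TO_CORPORA = {
    "place to play": ["newspaper"],
    "friend": ["friend_dialogue"],
    "hospital": ["hospital_dialogue"],
}

# Assumption: fall back to the large written-language corpus if nothing matches.
DEFAULT_CORPORA = ["newspaper"]


def concepts_for(term):
    """Walk up the concept tree and return every ancestor of the term."""
    chain = []
    while term in CONCEPT_PARENT:
        term = CONCEPT_PARENT[term]
        chain.append(term)
    return chain


def select_corpora(subject, recipient):
    """Return the corpora to load, in first-seen order without duplicates."""
    selected = []
    for term in (subject, recipient):
        for concept in concepts_for(term):
            for corpus in CONCEPT_TO_CORPORA.get(concept, []):
                if corpus not in selected:
                    selected.append(corpus)
    return selected or list(DEFAULT_CORPORA)
```

With these assumed tables, `select_corpora("amusement park", "Yoshinobu")` returns the newspaper corpus and the friend dialogue corpus, matching the outcome described for the FIG. 5 example.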

[0034] On receiving the corpora selected by the selection data determination unit 52, the learning data conversion unit 3 performs the morphological analysis shown in FIG. 9, counts the adjacent-occurrence frequencies between words, phrases, and classes, and converts the result into a language model format usable by the speech recognition unit 5, as shown in FIG. 2. Here, a class is a unit that groups words or phrases whose connection relations to the preceding and following elements are similar.
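A minimal sketch of this conversion step is given below. It assumes that morphological analysis has already produced token lists, that classes are supplied as a simple word-to-class dictionary, and that connection probabilities are plain relative frequencies; `adjacency_counts`, `to_probabilities`, and the `<CITY>` class label are hypothetical, and the actual format of FIG. 2 is not reproduced.

```python
from collections import Counter


def adjacency_counts(sentences, word_class=None):
    """Count adjacent pairs of units, replacing words by their class if any."""
    word_class = word_class or {}
    counts = Counter()
    for tokens in sentences:
        units = [word_class.get(t, t) for t in tokens]
        counts.update(zip(units, units[1:]))
    return counts


def to_probabilities(counts):
    """Convert pair counts to P(next | previous) by relative frequency."""
    totals = Counter()
    for (prev, _), c in counts.items():
        totals[prev] += c
    return {pair: c / totals[pair[0]] for pair, c in counts.items()}
```

Grouping words into a class pools their counts, so a pair seen with one member of the class also supports the other members; this is one reason a class-based unit can help when the spoken-language corpus is small.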

[0035] (Third Embodiment) The third embodiment assumes that the language data stored in the learning data storage unit 2 of FIG. 1 is newly created.

[0036] This embodiment is described with reference to FIG. 10.

[0037] The user inputs the expressions or character strings to be added to the learning data storage unit 2 into the language data input unit 91. These may be, for example, phrases the user frequently uses or expressions peculiar to the user.

[0038] The language processing unit 92 receives the input character string and converts it into the format stored in the learning data storage unit 2.

[0039] For example, if the learning data storage unit 2 stores data in the form of a language model, the language processing unit 92 performs morphological analysis on the input data and calculates the adjacent-occurrence frequencies between words, phrases, and classes, and the learning data storage unit 2 stores that information.
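The addition of user-specific expressions might be sketched as follows. Whitespace splitting stands in for morphological analysis (Japanese text would need a real morphological analyzer), the store is a plain dictionary of pair counts, and `process_text` and `add_to_store` are hypothetical names introduced for this example.

```python
from collections import Counter


def process_text(text):
    """Language processing unit 92 (sketch): tokenize each line and count
    adjacent token pairs. Whitespace splitting is an assumption."""
    counts = Counter()
    for line in text.splitlines():
        tokens = line.split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def add_to_store(store, key, text):
    """Merge the counts for newly entered text into the stored language data."""
    store.setdefault(key, Counter()).update(process_text(text))
    return store[key]
```

Because the counts accumulate, an expression the user enters repeatedly gains weight in the stored data each time it is added.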

[0040] (Fourth Embodiment) The fourth embodiment assumes an environment in which the user uses e-mail, as a case where expressions and character strings that the user inputs to another device or system for another purpose are stored in the learning data storage unit 2 even though the user is not conscious of providing input to the language data input unit 91.

[0041] This embodiment is described with reference to FIG. 11.

[0042] The user routinely inputs e-mail body text as character strings into the e-mail software 101. The language processing unit 92 receives the character strings input to the e-mail software 101 and performs the same processing as in the third embodiment.

[0043]

EFFECTS OF THE INVENTION

As is clear from the above, according to the present invention the speech recognition unit can use a language model suited to the content and purpose of each user utterance, so improved recognition performance can be expected.

[0044] In addition, storing expressions peculiar to the user and phrases the user frequently uses makes it possible to recognize expressions absent from the pre-trained language model and to improve recognition performance for frequently used expressions.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of the speech recognition device.

FIG. 2 is a diagram of the format of the language model used by the speech recognition unit.

FIG. 3 is a configuration diagram of the speech recognition device according to the first embodiment.

FIG. 4 is a diagram of a dialogue corpus.

FIG. 5 is a diagram of an e-mail transmission template.

FIG. 6 is a configuration diagram of the speech recognition device according to the second embodiment.

FIG. 7 is a diagram showing the tree structure of concepts.

FIG. 8 is a diagram showing the relationship between concepts and their corresponding corpora.

FIG. 9 is a diagram showing a result of morphological analysis.

FIG. 10 is a diagram showing the structure for adding learning data in the third embodiment.

FIG. 11 is a diagram showing the structure for adding learning data via e-mail in the fourth embodiment.

FIG. 12 is a diagram showing the relationship between language models and perplexity values.

[Explanation of symbols]

1 Learning data selection unit
2 Learning data storage unit
3 Learning data conversion unit
4 Voice input unit
5 Speech recognition unit
6 Data output unit
11 Language model
21 Selection data input unit
31 Dialogue corpus
41 Recipient
42 Subject
51 Determination condition input unit
52 Selection data determination unit
53 Determination condition database
61 Concept structure of recipients
62 Concept structure of Subject
71 Concepts and corresponding corpora
81 Target sentence for morphological analysis
82 Morphological analysis result
91 Language data input unit
92 Language processing unit
101 E-mail software
111 Perplexity for evaluation data

Continued from the front page: (72) Inventor: Yumi Wakita, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka. F-term (reference): 5D015 AA01 AA05 HH11

Claims

1. A speech recognition device comprising: a voice input unit for inputting speech; a learning data storage unit that stores language data; a learning data selection unit that selects language data stored in the learning data storage unit; a learning data conversion unit that statistically processes the language data selected by the learning data selection unit or converts its data format; a speech recognition unit that recognizes the speech input to the voice input unit using the data processed by the learning data conversion unit; and a data output unit that outputs text, audio, images, or data combining them based on the recognition result of the speech recognition unit.

2. The speech recognition device according to claim 1, wherein, in the learning data selection unit, the user selects one or more items from among the plurality of language data stored in the learning data storage unit according to the purpose of the user's utterance.

3. The speech recognition device according to claim 1, wherein the learning data selection unit automatically selects from among the plurality of language data stored in the learning data storage unit according to the purpose of the user's utterance.

4. The speech recognition apparatus according to claim 1, wherein the language data according to claim 2 or 3 is a connection frequency or a connection probability of a basic unit constituting the language.

5. The speech recognition device according to claim 1, wherein the language data according to claim 2 or 3 is a word string, and the learning data conversion unit according to claim 1 converts the word string selected by the method according to claim 2 or 3 into connection frequencies or connection probabilities of the language.

6. The speech recognition device according to claim 1, wherein the language data according to claim 2 or 3 is classified into groups according to the utterance situation.

7. A speech recognition device wherein the language data stored in the learning data storage unit is created by a language data input unit for inputting a character string and a language processing unit that processes the input character string.

8. The speech recognition device according to claim 1, wherein the language data input unit explicitly requests a user to input a character string.

9. The speech recognition device according to claim 1, wherein the language data input unit extracts a partial character string of the character string from a character string input by a user without making the user aware.