JP2006145891A

JP2006145891A - Voice processor, voice processing method, voice processing program and recording medium

Info

Publication number: JP2006145891A
Application number: JP2004336484A
Authority: JP
Inventors: Kentaro Yamamoto; 健太郎山本; Atsushi Shinohara; 淳篠原
Original assignee: Pioneer Electronic Corp
Current assignee: Pioneer Corp
Priority date: 2004-11-19
Filing date: 2004-11-19
Publication date: 2006-06-08
Anticipated expiration: 2024-11-19
Also published as: JP4718163B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice processor which smoothly copes with the voice uttered by a user.

SOLUTION: The voice processor is provided with: a voice recognition section 3 which recognizes uttered voice and converts the voice to character data Sc; a character data generating means 6 which determines whether recognized operation contents that are indicated by the character data Sc, match with the registered operation contents that are already registered in a database 8, and when the recognized operation contents do not match with the registered operation contents, generates related character data Sg that indicate operation contents related to the recognized operation contents, while acting as an artificial brain; and a voice synthesizing section 9 which responds to the voice by using the related character data Sg.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present application relates to a voice processing device that produces voice responses and executes operations corresponding to uttered voice.

Voice processing devices have long been used in various fields, car navigation among them. Specifically, a voice processing device is a device that provides a user (in car navigation, the driver or a passenger) by voice with various kinds of information corresponding to the voice uttered by that user (Patent Document 1).

In recent voice processing devices, the recognition rate under the word recognition method has reached a reasonably high level. Therefore, if the voice uttered by the user is a word registered in the voice processing device in advance, the information corresponding to that utterance can be provided to the user accurately.
Patent Document 1: JP 2003-241797 A

However, the conventional voice processing device described above does not recognize an utterance unless the user speaks a word registered in the device in advance, so the user must always speak fixed phrases. The user therefore cannot operate the device by voice in freely chosen words. Moreover, even when an utterance is recognized, the device answers only with a fixed phrase, so the conversation always produces much the same result. The interaction loses variety, and the user grows tired of using the voice processing device at all.

Furthermore, even if the recognition rate is high, when the user utters a word that is not registered in the voice processing device, the device cannot recognize the word and gives no reaction, and the user comes to feel reluctance toward, or even rejection of, voice operation.

The present application has been made in view of these problems, and one example of its object is to provide a voice processing device that can respond smoothly to free-form speech uttered by the user.

To solve the above problems, the invention according to claim 1 comprises: voice recognition means for recognizing uttered voice and converting it into character data; storage means for storing registered operation content registered in advance; character data generation means for generating, when the recognized operation content, which is the operation content indicated by the character data, does not match the registered operation content stored in the storage means, related character data denoting operation content related to the recognized operation content; and response means for responding to the voice using the generated related character data.

To solve the above problems, the invention according to claim 13 comprises: a voice recognition step of recognizing uttered voice and converting it into character data; a storage step of storing registered operation content registered in advance; a character data generation step of generating, when the recognized operation content, which is the operation content indicated by the character data, does not match the registered operation content stored in the storage means, related character data denoting operation content related to the recognized operation content; and a response step of responding to the voice using the generated related character data.

To solve the above problems, the invention according to claim 14 causes a computer to function as the voice processing device according to any one of claims 1 to 12.

To solve the above problems, in the invention according to claim 15, the voice processing program according to claim 14 is recorded so as to be readable by the computer.

Next, the voice processing device of the present application is described in detail with reference to the drawings. The embodiment described below applies the voice processing device of the present application to a car navigation system with an audio playback function.

(I) Embodiment
FIG. 1 is a block diagram showing the schematic configuration of the voice processing device of the present application.

As shown in FIG. 1, the voice processing device V according to the embodiment comprises a microphone 1, an A/D conversion unit 2, a voice recognition unit 3 serving as voice recognition means, an operation execution unit 4 serving as first operation execution means and second operation execution means, a control unit 5 serving as execution process determination means and character data generation process determination means, a character data generation unit 6 serving as character data generation means, an analysis unit 7, a database 8 serving as storage means, a voice synthesis unit 9 serving as first voice synthesis means and second voice synthesis means, a D/A conversion unit 10, and a speaker 11.

Next, the overall operation is described.

In the above configuration, when the user utters a voice, the voice signal Sa corresponding to that voice is output through the microphone 1 to the A/D conversion unit 2. The A/D conversion unit 2 converts the voice signal Sa into voice data Sb and outputs it to the voice recognition unit 3. The voice recognition unit 3 then recognizes the content of the voice data Sb, converts it into the corresponding character data Sc, and outputs it to the control unit 5.

The control unit 5 then determines whether the recognized operation content, that is, the operation content indicated by the character data Sc (hereinafter sometimes simply called the "recognized operation content"), matches the registered operation content registered in advance (hereinafter sometimes simply called the "registered operation content"). The registered operation content is described in detail later.

Here, operation content means the content of processing that the car navigation system according to the embodiment can execute: for example, displaying a map for navigation, searching for a route, giving route guidance using the search result, or, as an audio playback function, playing music recorded on an MD (Mini Disc).

If the recognized operation content matches the registered operation content, the operation execution unit 4 performs the process of executing the registered operation content identical to the recognized operation content (hereinafter sometimes simply called the "operation execution process").

In addition, the character data Sc indicating that registered operation content is output to the voice synthesis unit 9, which synthesizes it into voice data Sd. The D/A conversion unit 10 converts the voice data Sd into a voice signal Se, and the speaker 11 emits the voice corresponding to the voice signal Se.

Through the above operation, when the recognized operation content matches the registered operation content, the operation execution unit 4 executes the registered operation content identical to the recognized operation content obtained from the user's utterance, so the operation the user intended is performed.

On the other hand, if the recognized operation content indicated by the character data Sc does not match the registered operation content, the control unit 5 outputs the character data Sc to the character data generation unit 6.

The character data generation unit 6 generates character data indicating operation content related to the recognized operation content (hereinafter called "related character data"). More specifically, it is realized by running a conversation program of the kind known as an "artificial brainless" program.

"Artificial brainless" (人工無脳, jinko muno) is a collective term for conversation programs, used in contrast to ordinary "artificial intelligence". Processing as so-called bottom-up artificial intelligence requires complex processing before it reaches "humanlikeness"; these conversation programs instead try to produce "humanlikeness" by building a model of it top-down. The term originated as "artificial incompetence" (人工無能), coined in contrast to "artificial intelligence" such as Internet search engines and expert systems, which is useful though not humanlike. Because the negative connotation of "incompetence" was disliked, the form "artificial brainless" (人工無脳) has come into use in recent years.

The related character data generated by the character data generation process of the character data generation unit 6 is, for example, data that contains any word in the word key string (described later) of the character data indicating the recognized operation content, data that contains a word with the same reading, data that is semantically similar, data whose meaning connects to it, or data from which the user, on hearing the words uttered from the related character data, can readily understand that the response was generated from what the user said.

Accordingly, the character data generation process performed by the character data generation unit 6 may generate, as the related character data Sg, unexpected words whose meaning does not connect to the character data Sc obtained from the user's utterance, or it may generate, as the related character data Sg, words centered on the character data Sc denoting registered operation content.

Thus, even when the recognized operation content obtained from the user's utterance does not match the registered operation content, the voice processing device V can give the user some response.

Next, the operation content of the character data Sc output by the character data generation unit 6 is decomposed into parts of speech by the analysis unit 7, converted into character data Sf, and stored in the database 8. Then, a plurality of word keys Si that are related to the character data Sf obtained from the part-of-speech decomposition and that denote registered operation content are output from the database 8 to the character data generation unit 6. The character data generation unit 6 generates the related character data Sg on the basis of the word keys Si and outputs it to the control unit 5.
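The flow from Sc through Sf and Si to Sg can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the method of the embodiment: simple word splitting stands in for the part-of-speech decomposition of the analysis unit 7, and the `WORD_KEYS` table, the `analyze` and `generate_related` functions, and the response template are all hypothetical.

```python
import re
from typing import List, Optional

# Hypothetical stand-in for database 8: maps an analyzed word to word keys Si
# that denote registered operation content. The entries are illustrative only.
WORD_KEYS = {
    "weather": ["weather forecast screen"],
    "md": ["MD playback"],
}


def analyze(character_data_sc: str) -> List[str]:
    """Stand-in for analysis unit 7: split Sc into lowercase words (Sf)."""
    return re.findall(r"[a-z0-9]+", character_data_sc.lower())


def generate_related(character_data_sc: str) -> Optional[str]:
    """Stand-in for character data generation unit 6: build Sg from word keys Si."""
    sf = analyze(character_data_sc)
    si = [key for word in sf for key in WORD_KEYS.get(word, [])]
    if not si:
        return None  # no related word key was found for this utterance
    # Hypothetical template; the source does not specify how Sg is worded.
    return f"Did you mean: {si[0]}?"
```

A real analysis unit would use a morphological analyzer rather than word splitting, and the generation step could choose among several word keys instead of always taking the first.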

The operation execution unit 4 then executes the operation content indicated by the related character data Sg described above: that is, operation content for which unexpected words unconnected in meaning to the character data Sc obtained from the user's utterance were generated as the related character data Sg, or operation content for which words centered on the character data Sc denoting registered operation content were generated as the related character data Sg.

The related character data Sg is also output to the voice synthesis unit 9, which synthesizes it into voice data Sk. The D/A conversion unit 10 converts the voice data Sk into a voice signal Sl, and the speaker 11 emits the voice corresponding to the voice signal Sl.

Through this operation, even when the recognized operation content obtained from the user's utterance does not match the registered operation content, the character data generation unit 6 generates the related character data Sg and the voice processing device V gives the user some response. The user can therefore converse in free expressions without speaking pre-registered fixed phrases, the voice processing device V is kept from becoming unresponsive, and the user's reluctance toward and rejection of voice operation are reduced.

Next, the voice processing of the voice processing device is described in detail with reference to FIGS. 2 to 4.

First, FIG. 2 is a flowchart showing the overall voice processing of the voice processing device.

In the voice processing according to the embodiment, the voice uttered by the user is first recognized by the voice recognition unit and converted into character data Sc (step S21).

It is then determined whether the recognized operation content indicated by the character data Sc matches the registered operation content (step S22).

Here, with regard to step S22, examples of registered operation content are described with reference to Table 1.

As the registered operation content, specific operations of the navigation system according to the embodiment that the operation execution unit 4 can execute are registered in advance, for example "MD playback", "navigation screen display", "play next song", "play previous song", or "check traffic congestion information", as shown in Table 1. In step S22, it is determined whether the recognized operation content matches any of these registered operation contents.
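The step S22 check can be pictured as a lookup against the registered entries. The following is a minimal sketch, assuming a subset of the examples the text gives for Table 1; the names `REGISTERED_OPERATIONS` and `matches_registered` and the case-insensitive comparison are assumptions of this sketch, not details from the source.

```python
# Subset of the registered operation content that the text names as
# examples from Table 1.
REGISTERED_OPERATIONS = {
    "MD playback",
    "navigation screen display",
    "play next song",
    "play previous song",
}


def matches_registered(recognized: str) -> bool:
    """Step S22: does the recognized operation content match a registered one?"""
    # Case-insensitive exact comparison is an assumption of this sketch.
    normalized = recognized.strip().lower()
    return any(normalized == op.lower() for op in REGISTERED_OPERATIONS)
```

A free-form utterance such as "I bought a new MD today" fails this check and falls through to the partial-content check of step S23.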

If the recognized operation content indicated by the character data Sc does not match the registered operation content (step S22; NO), it is next determined whether the recognized operation content includes partial content, that is, a partial operation, or content attached to such an operation, that is registered in advance in the database 8 as something that can form part of the registered operation content (hereinafter simply called "partial content") (step S23).

Here, examples of the partial content for step S23 are described with reference to Table 2.

As the partial content for step S23, for example "MD", "navi", "next", "previous", and "weather" are registered in advance, as shown in the left column of Table 2.

In step S23, for example, when the user utters "The weather has been bad recently", the analysis unit 7 analyzes the content of the character data Sc obtained from the utterance, and it is determined whether the analysis result matches any of the partial contents used in step S23. For example, if "weather", one of the partial contents shown in Table 2, is included in the recognized operation content, the recognized operation content is determined to include partial content (that is, part of the operation content indicated by the character data Sc matches a partial content for step S23) (step S23; YES). Similarly, when the user utters "I bought a new MD today", "MD", one of the partial contents shown in Table 2, is included in the recognized operation content, so it is determined to include partial content for step S23 (step S23; YES).
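The two example utterances above can be checked with a short sketch of step S23. It assumes that word-level matching on the recognized text is an acceptable stand-in for the analysis by the analysis unit 7; `PARTIAL_CONTENTS` lists the examples the text gives for Table 2, and `find_partial_content` is a hypothetical helper name.

```python
import re
from typing import Optional

# Partial contents that the text names as examples from the left column of Table 2.
PARTIAL_CONTENTS = ["MD", "navi", "next", "previous", "weather"]


def find_partial_content(recognized: str) -> Optional[str]:
    """Step S23: return the first registered partial content found, else None."""
    # Word-level, case-insensitive matching is an assumption of this sketch.
    words = set(re.findall(r"\w+", recognized.lower()))
    for partial in PARTIAL_CONTENTS:
        if partial.lower() in words:
            return partial
    return None
```

A result of `None` corresponds to step S23; NO, and any other result corresponds to step S23; YES.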

On the other hand, if the determination in step S23 finds no partial content at all (step S23; NO), that is, if none of the partial contents shown in Table 2 appears in the recognized operation content, the character data generation unit 6 generates related character data denoting operation content related to the recognized operation content (step S28). The generation of the related character data Sg is described in detail later.

Next, if the determination in step S23 finds that the recognized operation content includes partial content (step S23; YES), it is determined whether to perform the operation execution process, that is, the process of executing the registered operation content corresponding to the partial content (step S24). Specifically, when the recognized operation content uttered by the user includes the partial content "MD" as described above, it is determined whether to perform the corresponding operation execution process based on that partial content.

Here, the criteria for determining in step S24 whether to perform the operation execution process are described with examples from Table 3.

The criteria for deciding whether to perform the operation execution process include, for example, whether the user is driving or whether another operation is in progress, as shown in Table 3. More specifically, if the user is driving and the character data generation unit 6, acting as the artificial brainless program, generates related character data Sg and responds with it, a resulting sudden change in volume or forced route change, for example, could startle the user at the wheel. Responses that could startle a user who is driving must be avoided, so it is considered appropriate not to perform the character data generation process while the user is driving. When there is such a possibility, it is therefore determined that the operation execution process is performed without using the artificial brainless function (step S24; YES).

On the other hand, if it is determined in step S24 that the operation execution process is not performed (step S24; NO), the process moves to step S28, described later, to generate the related character data Sg using the artificial brainless function of the character data generation unit 6.
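The step S24 branch might be sketched as below. Only the driving criterion is modeled, because the text states its direction explicitly (driving leads to step S24; YES). The text also names "another operation in progress" as a criterion but does not say which way it decides, so it is left out. The `VehicleState` class, both function names, and the choice to return NO when the user is not driving are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    """Hypothetical snapshot of a condition the text associates with Table 3."""
    is_driving: bool


def should_execute_operation(state: VehicleState) -> bool:
    """Step S24: True means proceed to S25; False means go to S28."""
    # The text says generated responses could startle a driver, so while
    # driving the operation is executed without the artificial brainless
    # function. Returning False when not driving is an assumption.
    return state.is_driving


def next_step(state: VehicleState) -> str:
    """Name the step that follows the S24 determination."""
    return "S25" if should_execute_operation(state) else "S28"
```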

Next, if it is determined in step S24 that the operation execution process is performed (step S24; YES), the operation execution process shown in Table 2 above (the operation execution process corresponding to the partial content) is performed (step S25). Specifically, the operation content shown in the right column of Table 2 is executed as the registered operation content corresponding to the partial content in the left column. For example, the operation execution unit 4 executes "play MD" for "MD", "display navigation screen" for "navi", "play next song" for "next", "play previous song" for "previous", and "display weather forecast screen" for "weather". The case above selects "play MD" as the registered operation content corresponding to "MD". If, in addition to the registered operation content shown in Table 2, other registered content such as "record MD" or "stop MD" is also registered for "MD", the operation is executed after one of these registered operation contents has been selected. In that case the operation execution process includes selecting the operation to execute from the registered content corresponding to the partial content and then executing it.
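The correspondence between partial content and registered operation content, including the case where several operations are registered for "MD", can be sketched as a lookup followed by a selection. The pairs come from the examples in the text; the source does not say how the selection is made, so the `chooser` callback and the default of taking the first candidate are assumptions, as are the names `PARTIAL_TO_OPERATIONS` and `select_operation`.

```python
from typing import Callable, Dict, List, Optional

# Partial content (left column of Table 2) mapped to candidate registered
# operation content (right column), using the examples named in the text.
PARTIAL_TO_OPERATIONS: Dict[str, List[str]] = {
    "MD": ["play MD", "record MD", "stop MD"],
    "navi": ["display navigation screen"],
    "next": ["play next song"],
    "previous": ["play previous song"],
    "weather": ["display weather forecast screen"],
}


def select_operation(
    partial: str,
    chooser: Optional[Callable[[List[str]], str]] = None,
) -> Optional[str]:
    """Step S25 (selection part): pick the operation to execute for a partial content."""
    candidates = PARTIAL_TO_OPERATIONS.get(partial, [])
    if not candidates:
        return None
    if chooser is None:
        # Assumption: default to the first registered candidate.
        return candidates[0]
    return chooser(candidates)
```

For instance, `select_operation("MD")` picks "play MD" by default, while a caller-supplied `chooser` can pick "stop MD" instead.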

Next, it is determined, on the basis of the criteria given below, whether to perform the character data generation process, that is, the processing by the character data generation unit (step S26).

Here, with regard to step S26, the criteria for determining whether to perform the character data generation process are described with reference to Table 4.

As shown in Table 4, the criteria for determining in step S26 whether to perform the character data generation process include the following. When MD playback or the like is performed, for example, the output of music itself shows that the MD is being played, so there is no need to confirm the operation by voice. Likewise, when frequent operations such as playing or stopping an MD or adjusting the volume are performed and the user does not want to hear "MD played" or "MD stopped" over and over, it is preferable not to confirm the operation by voice. In such cases, the character data generation process is performed (step S26; YES).

If the character data generation process is judged appropriate, the related character data Sg is generated (step S28). In this case the character data generation process may generate, as the related character data Sg, unexpected words unconnected in meaning to the character data Sc obtained from the user's utterance, or words centered on the character data denoting registered operation content. For example, if the character data denoting the registered operation content is "MD playback", the result might be "I'd like to listen to an MD, but there's no good MD." Furthermore, even when the phrase "go home" uttered by the user includes partial content, if the operation corresponding to that partial content is not performed and the emphasis is placed on making the conversation entertaining, the character data generation process generates related character data Sg expressing a phrase such as "No way" (step S28).

Note that, apart from the character data generation process (step S28) that generates the related character data Sg, the regular operation of the car navigation system (that is, the operation for the character data Sc obtained from the user's utterance, such as displaying a map to the user's home in response to an instruction to go home) has already been completed at the stage of step S25 described above. The processing in step S28 uses the related character data Sg generated by the character data generation process purely for the enjoyment of conversation.

Next, the related character data Sg generated by the character data generation process is subjected to voice synthesis by the voice synthesis unit 9, that is, converted into voice data Sk, and is output as a voice response to the user through the speaker 11 as described above (steps S29 and S30). For example, when there is character data denoting the registered operation content "check traffic congestion information", the operation displays the congestion information on the car navigation screen, and the voice response is something like "There is a 5 km traffic jam on the Tohoku Expressway."

On the other hand, if, under the criteria described above, the determination in step S26 is that conveying the processing result to the user correctly takes priority over making the conversation entertaining, for example to avoid responses that could startle a user who is driving and thereby cause danger (step S26; NO), the voice synthesis unit 9 performs voice response processing corresponding to the registered operation content just executed (step S25), and the response is output as voice through the speaker 11 (steps S29 and S30). In this case, therefore, both the operation performed by the operation execution process in step S25 and the voice response process of step S27, which is subsequently rendered as voice, are carried out by the operation execution process without the character data generation process.
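The step S26 determination could be sketched as follows. The contents of Table 4 are not reproduced in the text, so the `SELF_EVIDENT_OPERATIONS` set is built only from the examples the description mentions (playing an MD, stopping an MD, adjusting the volume), and the rule for operations outside that set is an assumption. All names here are hypothetical.

```python
# Operations the text cites as self-evident or frequent, for which a spoken
# confirmation is said to be unnecessary.
SELF_EVIDENT_OPERATIONS = {"play MD", "stop MD", "adjust volume"}


def should_generate_character_data(operation: str, accuracy_has_priority: bool) -> bool:
    """Step S26: True means generate related character data Sg (go to S28)."""
    if accuracy_has_priority:
        # Conveying the result correctly takes priority, e.g. while driving.
        return False
    # Assumption: operations outside the set get a plain confirmation instead.
    return operation in SELF_EVIDENT_OPERATIONS


def voice_response_step(operation: str, accuracy_has_priority: bool) -> str:
    """Return 'S28' for a generated response or 'S27' for a fixed confirmation."""
    generate = should_generate_character_data(operation, accuracy_has_priority)
    return "S28" if generate else "S27"
```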

Alternatively, when the determination in step S23 or step S24 is NO (that is, when the recognized operation content does not contain even part of a registered operation content (step S23; NO), or when the recognized operation content contains part of a registered operation content but, in view of vehicle operation, the processing that uses the artificial non-intelligence (chatbot) function is not executed (step S24; NO)), the flow proceeds to step S28. In this case the operation execution process (step S25) is not executed, so, as in the case described above, both the operational response and the voice response are made entirely using the related character data Sg generated by the character data generation unit 6.

Next, when it is determined in step S22 that the recognized operation content matches character data Sc representing a registered operation content (step S22; YES), the operation content of the character data Sc is executed (step S31). As described above, the operation content executed here is the registered operation content that is identical to the recognized operation content; that is, what the registered operation content specifies is executed.

Next, as the operation from step S26 onward described above, it is first determined whether or not to perform the character data generation process. As described above, the registered operation content is executed by the operation execution process, with the same content as the operation for the character data Sc obtained from the user's utterance. If it is appropriate to additionally respond with related character data Sg generated by the character data generation process (step S26; YES), the character data generation process is performed (step S28). If correctly conveying the processing result to the user takes priority (step S26; NO), the voice response, like the operation already performed, is made using the character data representing the registered operation content. According to which of these cases is determined, the voice response is produced by the voice synthesis process (steps S29, S30).
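The branching among steps S22 to S31 described above can be summarized in a short sketch. This is only an illustration under stated assumptions: the function name `respond`, its parameters, and the use of simple substring matching for "partial content" are hypothetical and are not taken from the embodiment; the two boolean inputs stand in for the determinations made at steps S24 and S26.

```python
def respond(recognized, registered, partials, execute_partial, prefer_chat):
    """Return (steps, reply_source) for one utterance.

    recognized      -- recognized operation content (string)
    registered      -- set of registered operation contents
    partials        -- dict: partial content -> registered operation content
    execute_partial -- outcome of the S24 determination (hypothetical input)
    prefer_chat     -- outcome of the S26 determination (hypothetical input)
    """
    steps = ["S22"]
    if recognized in registered:                  # S22; YES
        steps.append("S31")                       # execute the registered operation
    else:                                         # S22; NO
        steps.append("S23")
        # Substring matching is an assumption made for this sketch.
        match = next((op for part, op in partials.items() if part in recognized), None)
        if match is None:                         # S23; NO
            return steps + ["S28", "S29", "S30"], "generated"
        steps.append("S24")
        if not execute_partial:                   # S24; NO
            return steps + ["S28", "S29", "S30"], "generated"
        steps.append("S25")                       # execute the matched operation
    steps.append("S26")
    if prefer_chat:                               # S26; YES: conversational reply
        steps.append("S28")
        source = "generated"
    else:                                         # S26; NO: plain, accurate reply
        steps.append("S27")
        source = "registered"
    return steps + ["S29", "S30"], source


steps, source = respond("go home", {"go home"}, {"home": "go home"}, True, False)
print(steps, source)
```

With an exact match and the accuracy-first setting, the sketch follows S22, S31, S26, S27, S29, S30 and the reply comes from the registered content; every path that reaches S28 replies with generated text instead.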

Next, the processing in step S28 of FIG. 2 (generation of the related character data Sg by the character data generation process) is described in detail with reference to FIG. 3. FIG. 3 is a flowchart showing the generation of related character data by the character data generation process used in the present application.

First, as a precondition for generating the related character data Sg by the character data generation process, the "word keys" used to generate the related character data Sg are registered in the database 8.

Here, as explained with reference to FIG. 1, a word key is one that is related to the character data Sf obtained by decomposing the recognized operation content into parts of speech and that represents a registered operation content. For example, 「こんにちは」 ("hello") is registered as a word key Si. Details are given with reference to Table 5.

As shown in Table 5, for the first word key Si, the surface form 「こんにちは」 ("hello") is registered together with its reading 「コンニチハ」, its base form 「こんにちは」, the reading of the base form 「コンニチハ」, its part of speech "interjection", and so on. Similarly, for the second word key Si, the surface form 「初め」 is registered together with data such as its reading 「ハジメ」, its base form 「初め」, the reading of the base form 「ハジメ」, and its part of speech "noun". In the same way, word keys Si such as 「まして」 for the third word key and 「お年玉」 (New Year's gift money) for the fourth are registered. The word keys Si may be registered by the user, or word keys registered in the database 8 in advance may be used for generating related character data.

The counter shown in Table 5 is incremented each time the user uses the word key Si. A larger count indicates that the user has used the word key more frequently, and on the basis of this information the probability that the word key is used in generating the related character data Sg becomes higher.

This is described more specifically with reference to Table 6.

Table 6 records how often the word keys Si registered in Table 5 have been used as themes by the user. From the word keys the user has used in the past, nouns, verbs and interjections are extracted, and each such word key Si is registered together with the order in which it was used as a theme and the time at which the word key was used (expressed as elapsed time from Greenwich Mean Time).

As a result, an entry with a large theme number and an acquisition time close to the current time is known to have been used recently by the user, and it therefore has a higher probability of becoming the theme when related character data is generated in the future.
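The two tables described above can be pictured as simple records. The following is a minimal sketch, assuming field names chosen for illustration (`WordKey`, `ThemeEntry`, `use_word` and `pick_by_frequency` are hypothetical names, and weighting by `counter + 1` is an assumed rule, since the text only states that more frequently used word keys are more likely to be chosen).

```python
import random
from dataclasses import dataclass


@dataclass
class WordKey:
    """One row of the word key table (fields follow the Table 5 description)."""
    number: int
    surface: str        # surface form
    reading: str        # reading of the surface form
    base: str           # base form
    base_reading: str   # reading of the base form
    pos: str            # part of speech
    counter: int = 0    # incremented each time the user uses the word


@dataclass
class ThemeEntry:
    """One row of the theme history (fields follow the Table 6 description)."""
    theme_number: int   # order in which the word was used as a theme
    word_key: int       # number of the word key
    acquired_at: float  # time of use on some reference clock (assumed: seconds)


def use_word(key, history, now):
    """Count one use; nouns, verbs and interjections are also logged as themes."""
    key.counter += 1
    if key.pos in {"noun", "verb", "interjection"}:
        history.append(ThemeEntry(len(history) + 1, key.number, now))


def pick_by_frequency(keys, rng):
    """Pick a word key with probability proportional to counter + 1 (assumed rule)."""
    weights = [k.counter + 1 for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


hello = WordKey(1, "こんにちは", "コンニチハ", "こんにちは", "コンニチハ", "interjection")
first = WordKey(2, "初め", "ハジメ", "初め", "ハジメ", "noun")
history = []
use_word(hello, history, now=100.0)
use_word(hello, history, now=200.0)
print(hello.counter, len(history))
print(pick_by_frequency([hello, first], random.Random(0)).surface)
```

Adding 1 to each counter keeps never-used word keys selectable; a different smoothing rule would serve equally well in this sketch.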

Next, the generation of related character data by the related character data generation process, using the word keys Si registered in the database 8, is described.

As shown in FIG. 3, after the user utters speech, whether or not the voice processing device V replies is set probabilistically (step S41). Next, it is determined whether the result of the setting made by the user is to reply (step S42). If the setting result is not to reply (step S42; NO), the operation ends (step S43). If the setting result is to reply (step S42; YES), the theme to be used for generating the related character data is then set by the user, and the theme used for generating the related character data is thereby acquired (step S44). The related character data Sg is later generated in accordance with this theme.

The theme here is a noun, verb, interjection or the like, as described above. The related character data Sg is generated on the basis of this word. As described above, the theme for generating the related character data Sg when replying is set from among the word keys Si registered in the database 8.

Thereafter, the reply time, that is, how long the voice processing device V takes to reply after recognizing the speech uttered by the user, is recorded (step S45). Next, a theme is selected (step S46). At this time, themes that have the same reading as the theme word are searched for, and one of them is selected at random. After that, a word key string is acquired (step S47).
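The reply decision (steps S41 to S43) and the theme selection (step S46) can be sketched as follows. The function names, the `reply_probability` parameter and the tuple layout of `word_keys` are assumptions introduced for illustration; the text itself only states that the reply is decided probabilistically and that a theme is picked at random among entries with the same reading.

```python
import random


def should_reply(reply_probability, rng):
    """S41/S42: decide at random whether the device replies at all."""
    return rng.random() < reply_probability


def select_theme(theme_reading, word_keys, rng):
    """S46: pick at random among word keys whose reading equals the theme's reading.

    word_keys is a list of (number, surface, reading) tuples.
    Returns the chosen word key number, or None when nothing matches.
    """
    candidates = [number for number, _surface, reading in word_keys
                  if reading == theme_reading]
    return rng.choice(candidates) if candidates else None


rng = random.Random(0)
word_keys = [(1, "こんにちは", "コンニチハ"), (2, "初め", "ハジメ")]
if should_reply(1.0, rng):
    print(select_theme("ハジメ", word_keys, rng))
```

Matching on the reading rather than the surface form means that homophones written differently can all be candidates for the same spoken theme word.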

This is described specifically with reference to Table 7.

In Table 7, the numbers of the word keys Si registered in Table 5 are arranged in the order first word, second word, third word, and these are combined to represent character data.

As shown in Table 7, a sequence whose first word is 0 forms the opening segment of a sentence. Conversely, a sequence whose third word is 0 forms the closing segment. Word key sequence 1 is arranged as 0-1-0, so it yields the character string 「こんにちは」 ("hello"). Word key sequence 2 is 0-2-3, so it yields the character string 「初めまして」 ("nice to meet you").

When the theme for generating the related character data Sg is 「８ 正月」 (New Year) in Table 5 (that is, when entry 4 or 5 of the theme data above has been determined as the theme), the data whose second word is "8" is retrieved. In other words, as shown in Table 7, the word key string whose sequence is "7-8-9" is retrieved.

Next, in building up the connected related character data, it is determined whether there is a word key Si that connects in front of the word key string (step S48). If there is (step S48; YES), that word key Si is prepended to the word key string (step S49). For example, when the word key string is "7-8-9", it connects with "0-7-8" to become "0-7-8-9". As long as there is a word key Si that connects in front of the word key string (step S48; YES), the operation of prepending it (step S49) is repeated until 0 comes to the head of the sequence. Once 0 is at the head of the sequence, this operation ends, and it is then determined whether there is a word key Si that connects after the word key string (step S50).

If there is no word key Si that connects in front of the word key string (step S48; NO), it is next determined whether there is a word key Si that connects after the word key string (step S50).

If there is no word key Si that connects after the word key string (step S50; NO), the word key string is then converted into a character string (step S52).

If there is a word key Si that connects after the word key string (step S50; YES), that word key Si is appended to the end of the word key string (step S51). For example, when the word key string is "0-7-8-9", it connects with "8-9-10" to become "0-7-8-9-10". In the same way, as long as there is a word key Si that connects after the word key string (step S50; YES), the operation of appending it (step S51) is repeated until 0 comes to the end of the sequence. Once 0 is at the end of the sequence, this operation ends, and the word key string is then converted into a character string (step S52).

Once the word key sequence "0-7-8-9-10-0" has been determined, the surface forms of the corresponding words are assigned.

Since 0 is treated as the start and end marker and no actual word is assigned to it, the words corresponding to 7, 8, 9 and 10 are looked up in Table 5, and the phrase 「そろそろ-正月-だ-な」 ("It's almost New Year, isn't it") is generated.

In this way, different three-word sequences centered on the theme word are retrieved from the database 8 and joined together to generate character data.
The more natural the connections within the three-word sequences registered in the database 8, the more meaningful the output sentence is to a human reader or listener.
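The chaining procedure of steps S47 to S52 can be sketched in a few lines. The word key numbers and the four three-word sequences below are the ones used in the example in the text; the function names, the `max_len` guard against cycles and the use of a seeded random generator are assumptions added for illustration.

```python
import random

# Word keys from the example in the text (key number -> surface form).
WORDS = {7: "そろそろ", 8: "正月", 9: "だ", 10: "な"}

# Three-word sequences; 0 marks the start or the end of a sentence.
TRIGRAMS = [(0, 7, 8), (7, 8, 9), (8, 9, 10), (9, 10, 0)]


def build_chain(theme, trigrams, rng, max_len=50):
    """Grow a word key string around `theme` (steps S47 to S51)."""
    seeds = [t for t in trigrams if t[1] == theme]           # S47: theme is the 2nd word
    if not seeds:
        return []
    chain = list(rng.choice(seeds))
    # S48/S49: prepend while some sequence overlaps the first two keys.
    while chain[0] != 0 and len(chain) < max_len:
        fronts = [t for t in trigrams if t[1:] == tuple(chain[:2])]
        if not fronts:
            break
        chain.insert(0, rng.choice(fronts)[0])
    # S50/S51: append while some sequence overlaps the last two keys.
    while chain[-1] != 0 and len(chain) < max_len:
        backs = [t for t in trigrams if t[:2] == tuple(chain[-2:])]
        if not backs:
            break
        chain.append(rng.choice(backs)[2])
    return chain


def to_text(chain, words):
    """S52: drop the 0 markers and join the surface forms."""
    return "".join(words[k] for k in chain if k != 0)


chain = build_chain(8, TRIGRAMS, random.Random(0))
print(chain)                   # [0, 7, 8, 9, 10, 0]
print(to_text(chain, WORDS))   # そろそろ正月だな
```

With only one candidate at each step the result is deterministic; once the database holds several sequences that overlap the same pair of keys, the random choice at each step is what makes replies vary from one conversation to the next.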

As described above, even when the recognized operation content obtained from the user's utterance does not match any registered operation content, the character data generation unit 6 can generate related character data Sg so that the voice processing device V gives the user some kind of response. However, if the word keys Si used to generate the related character data Sg cannot be updated, the word keys Si used in generating related character data lose their freshness, and the conversation becomes only half as entertaining.

Therefore, so that the character data generation process produces fresh and interesting related character data, the word keys Si used for generating the related character data Sg are updated.

The method by which the word keys Si stored in the database 8 used in the present application are updated is described below with reference to FIG. 4. FIG. 4 is a flowchart of updating the character data stored in the database used in the present application.

When the database 8 is updated, the character data obtained from the user's utterance is first analyzed by the analysis unit, decomposed into parts of speech, and segmented into words (step S61). It is then determined whether the word keys Si form a three-word sequence as shown in Table 7 (step S62). If there is no three-word sequence (step S62; NO), nothing is registered (step S66). If there is a three-word sequence (step S62; YES), it is next determined whether that sequence is already registered in the database (step S63).

If the sequence is already registered in the database 8 (step S63; YES), there is no need to register it again, so it is not registered (step S66). If the sequence is not registered in the database 8 (step S63; NO), it is determined whether to register the word data and the word sequence data (step S64). If they are not to be registered (step S64; NO), nothing is registered (step S66); if they are to be registered (step S64; YES), the word keys Si, such as nouns, verbs and interjections, or the word key sequence are registered (step S65).
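The update flow of steps S61 to S66 might be sketched as below. Several points are assumptions made for illustration: the utterance is taken to be already segmented into words (step S61 would need a morphological analyzer, which is outside this sketch), sentence boundaries are padded with 0 in the manner of Table 7, new words receive the next free key number, and the step S64 decision is modeled as a hypothetical `accept` callback.

```python
def update_database(words, known_words, known_trigrams, accept=lambda trigram: True):
    """Register unseen three-word sequences from one utterance (S62 to S66).

    words          -- the utterance, already segmented into words (S61 assumed done)
    known_words    -- dict: surface form -> word key number (updated in place)
    known_trigrams -- set of (k1, k2, k3) tuples (updated in place)
    accept         -- stand-in for the S64 decision (hypothetical callback)
    Returns the list of newly registered sequences.
    """
    added = []
    if not words:                                   # S62; NO -> S66
        return added
    # Tentatively number unseen words; nothing is stored until S65.
    tentative = dict(known_words)
    next_id = max(tentative.values(), default=0) + 1
    for word in words:
        if word not in tentative:
            tentative[word] = next_id
            next_id += 1
    by_key = {key: word for word, key in tentative.items()}
    keys = [0] + [tentative[word] for word in words] + [0]
    for trigram in zip(keys, keys[1:], keys[2:]):
        if trigram in known_trigrams:               # S63; YES -> S66
            continue
        if not accept(trigram):                     # S64; NO -> S66
            continue
        for key in trigram:                         # S65: register words and sequence
            if key != 0:
                known_words.setdefault(by_key[key], key)
        known_trigrams.add(trigram)
        added.append(trigram)
    return added


known_words = {"そろそろ": 7, "正月": 8, "だ": 9, "な": 10}
known_trigrams = {(0, 7, 8), (7, 8, 9), (8, 9, 10), (9, 10, 0)}
print(update_database(["そろそろ", "夏", "だ", "な"], known_words, known_trigrams))
```

In this usage the unseen word 「夏」 (summer) receives a new key number, and only the sequences that contain it are added, because the sequence ending the sentence is already registered.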

In this way, the character data stored in the database 8 for use in generating the related character data Sg can be updated, so the character data used when generating related character data stays fresh.

Updating the database 8 over a network is also conceivable.

As described above, even if the user does not utter a fixed phrase registered in the voice processing device, the character data generation means creates sentences with an element of randomness, based on the words uttered by the user or at times ignoring them, so dialogue using free expression is possible without speaking fixed phrases.

Consequently, the conversation produces a different result every time and is therefore varied. The user can enjoy it as a verbal game of catch in which there is no telling what will come back, and a voice device that does not become tiresome can be realized.

Moreover, even when the user utters a word that is not registered in the voice processing device, character data is generated and a response is given. Because the device answers whatever the user says, situations in which there is no response are eliminated, and resistance or aversion to voice operation can be reduced.

Furthermore, even when speech is misrecognized, some response is given if priority is placed on making the conversation entertaining, so a state of no response can be avoided.

In the embodiment described above, if the emphasis is placed on making the conversation entertaining regardless of whether the recognized operation content matches a registered operation content, the character data generation process may be used for both the operational response and the voice dialogue. In that case, the determination of whether the character data matches character data representing a registered operation content is not made in the first place.

Furthermore, although the embodiment described above performs the operational response first and the voice response afterwards, the voice response may come first.

The embodiment described above assumes that the speech uttered by the user is recognized correctly. However, the present invention can still be used effectively even if the speech has already been misrecognized at the point where it is recognized and converted into character data.

Here, misrecognition means a case in which an incorrect determination is made when determining whether the recognized operation content matches a registered operation content, or whether the recognized operation content contains partial content. Conceivable examples include: the recognized operation content matches a registered operation content but is determined not to match (S22; No); the recognized operation content does not match a registered operation content but contains partial content, yet is determined neither to match nor to contain partial content (S22; No → S23; No); and the recognized operation content contains neither a registered operation content nor partial content, yet is determined not to match but to contain partial content (S22; No → S23; Yes). When misrecognition occurs, the response the user truly intended may not be given. Even then, if priority is placed on making the conversation entertaining, some response is given by the operation execution process or the character data generation process on the basis of the misrecognized character data, so a state of no response can be avoided.

Furthermore, when misrecognition is deliberately allowed in order to enjoy the conversation as described above, the device can also be configured so that, for example, once misrecognition has occurred 10 times within one minute, recognition is performed correctly thereafter and the character data generation process is performed on that basis.
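One way such a setting could be realized is a sliding-window counter, sketched below. This is an interpretation rather than the disclosed mechanism: the class name, the `allow` method and the choice to let the allowance recover as old events leave the one-minute window are all assumptions; the text gives only the example of 10 misrecognitions per minute.

```python
from collections import deque


class MisrecognitionLimiter:
    """Allow at most `limit` deliberate misrecognitions per `window` seconds."""

    def __init__(self, limit=10, window=60.0):
        self.limit = limit
        self.window = window
        self._times = deque()   # timestamps of recent deliberate misrecognitions

    def allow(self, now):
        """Return True if a deliberate misrecognition is still allowed at `now`."""
        # Forget events that have left the time window.
        while self._times and now - self._times[0] >= self.window:
            self._times.popleft()
        if len(self._times) >= self.limit:
            return False        # limit reached: use correct recognition instead
        self._times.append(now)
        return True


limiter = MisrecognitionLimiter()
print([limiter.allow(float(t)) for t in range(12)])
```

Ten calls in quick succession are allowed and the following ones are refused until the earliest events are more than a minute old.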

The voice processing device, voice processing method, voice processing program and recording medium have been described above. The same approach can be applied not only to voice but also to combinations of video and sound.

For example, the embodiment described above shows the response of the voice processing device to speech uttered by the user, but the approach can also be applied to response processing by a processing device that uses video in response to speech uttered by the user. In that case, steps S28 to S30 of FIG. 2 described above are carried out as processing using video.

Response processing by a processing device that uses video is described next as a modification.

(II) Modification
In the embodiment described above, the content of the speech input by the user is used as the recognized operation content. In the following modification, by contrast, the user's movement is captured as video, and the result of recognizing its content is used as the recognized operation content.

More specifically, the user's movement is first recognized by a camera and acquired as data. Data representing similar movements is then retrieved from the database, the retrieved data is combined using the artificial non-intelligence (chatbot) function to generate a new movement, and the result is displayed on a display unit (not shown). In this case the result may sometimes be an exact imitation of the user's movement, but since uniform behavior is not expected in the first place when the artificial non-intelligence function is used, the device can respond by displaying a variety of reactions whatever movement the user makes.

Furthermore, a program corresponding to the flowchart shown in FIG. 1 may be recorded on a recording medium such as a flexible disk or a hard disk, or acquired via a network such as the Internet and recorded, and then read out and executed by a microcomputer or the like, so that the microcomputer functions as the control unit according to each embodiment.

The voice processing device, voice processing method, voice processing program and recording medium of the present application are not limited to the embodiments described above. The invention of the present application can also be installed in televisions, audio systems and the like, or used in voice devices that respond to free-form speech.

A block diagram showing the schematic configuration of the voice processing device of the present application.
A flowchart showing the overall voice processing of the voice processing device of the present application.
A flowchart showing related character data generation by the character data generation process used in the present application.
A flowchart of updating the character data stored in the database used in the present application.

Explanation of symbols

1 ... Microphone
2 ... A/D conversion unit
3 ... Voice recognition unit
4 ... Operation execution unit
5 ... Control unit
6 ... Character data generation unit
7 ... Analysis unit
8 ... Database
9 ... Voice synthesis unit
10 ... D/A conversion unit
11 ... Speaker

Claims

A voice processing device comprising:
voice recognition means for recognizing uttered voice and converting it into character data;
storage means for storing registered operation content that has been registered in advance;
character data generation means for generating, when the recognized operation content that is the operation content indicated by the character data does not match the registered operation content stored in the storage means, related character data representing operation content related to the recognized operation content; and
response means for responding to the voice using the generated related character data.

The voice processing device according to claim 1,
wherein the character data generation means generates the related character data when the recognized operation content does not match the registered operation content and the recognized operation content does not contain any of the partial contents stored in the storage means as parts of the registered operation content.

The voice processing device according to claim 2,
further comprising execution process determination means for determining, when the recognized operation content does not match the registered operation content and the recognized operation content contains the partial content, whether or not to execute the registered operation content corresponding to the partial content,
wherein the character data generation means generates the related character data when the execution process determination means determines that the registered operation content corresponding to the partial content is not to be executed.

The voice processing device according to claim 3,
further comprising first operation execution means for executing the registered operation content when the execution process determination means determines that the registered operation content corresponding to the partial content is to be executed.

The voice processing device according to claim 4,
wherein the character data generation means generates the related character data when the first operation execution means executes the registered operation content corresponding to the partial content.

The voice processing device according to any one of claims 1 to 5,
further comprising first voice synthesis means for converting the related character data generated by the character data generation means into voice data and emitting the corresponding voice.

The voice processing device according to any one of claims 4 to 6,
wherein the first operation execution means executes the operation content of the related character data generated by the character data generation means.

The voice processing device according to any one of claims 1 to 7,
further comprising second operation execution means for executing the registered operation content when the recognized operation content matches the registered operation content.

The voice processing device according to claim 8,
wherein the character data generation means generates the related character data when the second operation execution means executes the registered operation content.

The voice processing device according to claim 8,
wherein the character data generation means generates registered operation character data related to the registered operation content when the second operation execution means executes the registered operation content.

The voice processing device according to any one of claims 8 to 10,
further comprising second voice synthesis means for converting character data indicating the operation content executed by the second operation execution means into voice data and emitting the corresponding voice.

The voice processing device according to any one of claims 8 to 11,
wherein the second operation execution means executes the operation content of the related character data generated by the character data generation means.

A voice processing method comprising:
a voice recognition step of recognizing uttered voice and converting it into character data;
a character data generation step of generating, when the recognized operation content that is the operation content indicated by the character data does not match the registered operation content stored in advance in storage means, related character data representing operation content related to the recognized operation content; and
a response step of responding to the voice using the generated related character data.

A voice processing program for causing a computer to function as the voice processing device according to any one of claims 1 to 12.

An information recording medium on which the voice processing program according to claim 14 is recorded so as to be readable by the computer.