JP6287754B2

JP6287754B2 - Response generation apparatus, response generation method, and response generation program

Info

Publication number: JP6287754B2
Application number: JP2014214615A
Authority: JP
Inventors: 生聖渡部
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2014-10-21
Filing date: 2014-10-21
Publication date: 2018-03-07
Anticipated expiration: 2034-10-21
Also published as: JP2016080980A

Description

The present invention relates to a response generation apparatus, a response generation method, and a response generation program for responding to a user.

A known response generation apparatus includes speech recognition means for recognizing a user's speech, structure analysis means for analyzing the structure of the speech recognized by the speech recognition means, and response output means for generating a response sentence to the user's speech based on the structure analyzed by the structure analysis means and outputting the generated response sentence (see, for example, Patent Document 1).

Patent Document 1: JP 2010-157081 A

Such a response generation apparatus takes time to analyze the structure of the speech and to generate the response sentence, so the user has to wait for a response. This can make the dialogue feel unnatural. One conceivable remedy is to respond simply during that wait by using the user's speech recognized by the speech recognition means as a repeat response sentence. This shortens the wait and eases the awkwardness, but the response pattern becomes uniform and the dialogue still feels unnatural.

The present invention has been made to solve this problem, and its main object is to provide a response generation apparatus, a response generation method, and a response generation program that can alleviate the unnaturalness of dialogue caused by a uniform response pattern.

To achieve the above object, one aspect of the present invention is a response generation apparatus comprising: speech recognition means for recognizing a user's speech; structure analysis means for analyzing the structure of the speech recognized by the speech recognition means; noun extraction means for extracting a noun from the speech recognized by the speech recognition means; repeat generation means for generating, based on the noun extracted by the noun extraction means, a repeat response sentence that repeats the user's speech; and response output means for generating an optional response sentence to the user's speech based on the structure of the speech analyzed by the structure analysis means, and for outputting the generated optional response sentence after outputting the repeat response sentence generated by the repeat generation means. The apparatus further comprises storage means for storing information that associates each of a plurality of keywords with the magnitude of that keyword's impression level, and information that associates the magnitude of the impression level with an additional word and an output mode. The repeat generation means selects the keyword in the storage means that matches the noun extracted by the noun extraction means, and generates the repeat response sentence by adding to the noun the additional word corresponding to the magnitude of the impression level of the selected keyword. The response output means selects the keyword stored in the storage means that matches the noun extracted by the noun extraction means, and outputs the repeat response sentence generated by the repeat generation means in the output mode corresponding to the magnitude of the impression level of the selected keyword.
In this aspect, the storage means may store information that associates a numerical value representing the impression level of each keyword with at least one output mode among volume, frequency, and intonation.
In this aspect, the output mode may be set so that at least one of volume, frequency, and intonation increases as the magnitude of the impression level of each keyword increases.
In this aspect, the apparatus may further include phoneme analysis means for analyzing the phonemes of the user's speech, and backchannel generation means for generating a backchannel (aizuchi) response to the user's speech based on the analysis result of the phoneme analysis means, and the backchannel response generated by the backchannel generation means may be output before the repeat response sentence generated by the repeat generation means is output.
To achieve the above object, one aspect of the present invention may be a response generation method including: a step of recognizing a user's speech; a step of analyzing the structure of the recognized speech; a step of extracting a noun from the recognized speech; a step of generating, based on the extracted noun, a repeat response sentence that repeats the user's speech; a step of generating an optional response sentence to the user's speech based on the analyzed structure of the speech, and outputting the generated optional response sentence after outputting the generated repeat response sentence; and a step of storing information that associates each of a plurality of keywords with the magnitude of that keyword's impression level, and information that associates the magnitude of the impression level with an additional word and an output mode. In the method, the keyword that matches the extracted noun is selected and the repeat response sentence is generated by adding to the noun the additional word corresponding to the magnitude of the impression level of the selected keyword, and the keyword that matches the extracted noun is selected and the generated repeat response sentence is output in the output mode corresponding to the magnitude of the impression level of the selected keyword.
To achieve the above object, one aspect of the present invention may be a response generation program that causes a computer to execute: a process of recognizing a user's speech; a process of analyzing the structure of the recognized speech; a process of extracting a noun from the recognized speech; a process of generating, based on the extracted noun, a repeat response sentence that repeats the user's speech; and a process of generating an optional response sentence to the user's speech based on the analyzed structure of the speech, and outputting the generated optional response sentence after outputting the generated repeat response sentence. Information that associates each of a plurality of keywords with the magnitude of that keyword's impression level, and information that associates the magnitude of the impression level with an additional word and an output mode, are stored. The program causes the computer to execute a process of selecting the keyword that matches the extracted noun and generating the repeat response sentence by adding to the noun the additional word corresponding to the magnitude of the impression level of the selected keyword, and a process of selecting the keyword that matches the extracted noun and outputting the generated repeat response sentence in the output mode corresponding to the magnitude of the impression level of the selected keyword.

According to the present invention, it is possible to provide a response generation apparatus, a response generation method, and a response generation program that can alleviate the unnaturalness of dialogue caused by a uniform response pattern.

A block diagram showing the schematic system configuration of the response generation apparatus according to Embodiment 1 of the present invention.
A block diagram showing the schematic hardware configuration of the response generation apparatus according to Embodiment 1 of the present invention.
A flowchart showing the processing flow of the response generation method according to Embodiment 1 of the present invention.
A block diagram showing the schematic system configuration of the response generation apparatus according to Embodiment 2 of the present invention.

Embodiment 1
Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic system configuration of the response generation apparatus according to Embodiment 1 of the present invention. The response generation apparatus 1 according to Embodiment 1 includes a speech recognition unit 2 that recognizes a user's speech, a structure analysis unit 3 that analyzes the structure of the speech, a response output unit 4 that generates and outputs a response sentence to the user's speech, a repeat generation unit 5 that generates a repeat response sentence, and a noun extraction unit 9.

The hardware of the response generation apparatus 1 is built around a microcomputer that includes, for example, a CPU (Central Processing Unit) 1a that performs arithmetic processing and the like, a memory 1b consisting of a ROM (Read Only Memory) and a RAM (Random Access Memory) that store the arithmetic programs, control programs, and the like executed by the CPU 1a, and an interface unit (I/F) 1c that inputs and outputs signals to and from the outside (FIG. 2). The CPU 1a, the memory 1b, and the interface unit 1c are connected to one another via a data bus 1d or the like.

The speech recognition unit 2 performs speech recognition on the user's speech information acquired by the microphone 6, converting the speech information into text and recognizing it as character string information. The speech recognition unit 2 is a specific example of the speech recognition means. It detects an utterance section in the user's speech information output from the microphone 6 and recognizes the speech in the detected section by pattern matching with reference to, for example, a statistical language model. Here, the statistical language model is a probability model for calculating the occurrence probability of linguistic expressions, such as the occurrence distribution of words or the distribution of words that follow a given word, and is obtained by learning connection probabilities in units of morphemes. The statistical language model is stored in advance in the memory 1b or the like. The speech recognition unit 2 also generates morpheme information with part-of-speech information, in which each morpheme of the user's speech information is tagged with its part of speech (noun, adjective, verb, adverb, and so on). The speech recognition unit 2 outputs the recognized speech information of the user to the structure analysis unit 3 and the noun extraction unit 9.

The structure analysis unit 3 analyzes the structure of the speech information recognized by the speech recognition unit 2. The structure analysis unit 3 is a specific example of the structure analysis means. For example, it uses a general morphological analyzer to perform morphological analysis and the like on the character string information representing the recognized speech of the user, and interprets the meaning of the character string information. The structure analysis unit 3 outputs the analysis result of the character string information to the response output unit 4.

The response output unit 4 generates a response sentence to the user's speech information (hereinafter referred to as an optional response sentence) based on the structure of the speech information analyzed by the structure analysis unit 3, and outputs the generated optional response sentence. The response output unit 4 is a specific example of the response output means. For example, it generates the optional response sentence based on the analysis result of the character string information output from the structure analysis unit 3, and then outputs the generated response sentence through the speaker 7.

More specifically, the structure analysis unit 3 extracts the predicate-argument structure from the character string information "tonkatsu wo taberu" ("eat tonkatsu") and identifies the predicate "taberu" ("eat") and the case particle "wo". The response output unit 4 then extracts the types of case particle that the predicate "taberu" identified by the structure analysis unit 3 can take from the missing-case dictionary database 8, which stores the correspondence between predicates and case particles. The missing-case dictionary database 8 is built in, for example, the memory 1b.

The response output unit 4 generates, for example, the predicate-argument structures "nani wo taberu" ("eat what"), "doko de taberu" ("eat where"), "itsu ni taberu" ("eat when"), and "dare to taberu" ("eat with whom") as candidate optional response sentences. It then excludes the structure for the surface case "wo", which does not match the user's speech, selects one of the remaining predicate-argument structures at random, and uses it as the optional response sentence. For example, the response output unit 4 selects "dare to tabeta no?" ("Who did you eat with?") and outputs it as the optional response sentence. This generation method is only an example; any generation method can be used.
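The candidate generation and random selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary `MISSING_CASE_DICT` is a hypothetical stand-in for the missing-case dictionary database 8, the romanized strings and function names are assumptions, and the sketch simply drops the case particle already present in the user's utterance before choosing at random.

```python
import random

# Hypothetical stand-in for the missing-case dictionary database 8:
# predicate -> {case particle: question word}. Contents are illustrative only.
MISSING_CASE_DICT = {
    "taberu": {"wo": "nani", "de": "doko", "ni": "itsu", "to": "dare"},
}


def candidate_questions(predicate, used_particles):
    """Build question-style predicate-argument structures for the predicate,
    skipping any case particle that already appears in the user's utterance."""
    cases = MISSING_CASE_DICT.get(predicate, {})
    return [
        f"{question_word} {particle} {predicate}"
        for particle, question_word in cases.items()
        if particle not in used_particles
    ]


def optional_response(predicate, used_particles, rng=random):
    """Pick one remaining candidate at random; None if the predicate is unknown."""
    candidates = candidate_questions(predicate, used_particles)
    return rng.choice(candidates) if candidates else None
```

For the utterance "tonkatsu wo taberu", calling `candidate_questions("taberu", {"wo"})` leaves the "de", "ni", and "to" questions, and `optional_response` returns one of them.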

The noun extraction unit 9 extracts only the nouns from the recognized speech information of the user, based on the morpheme information with part-of-speech information output from the speech recognition unit 2. The noun extraction unit 9 is a specific example of the noun extraction means. For example, from the morpheme information "tonkatsu (noun) / wo (particle) / tabeta (verb) / yo (particle)" ("I ate tonkatsu"), it extracts only "tonkatsu (noun)". The nouns it extracts include, for example, common nouns such as "tonkatsu", proper nouns such as "Yabaton", and sa-hen (verbal) nouns such as "tōhyō" ("vote") taken from "tōhyō suru" ("to vote"); some nouns, such as numerals, are excluded. The noun extraction unit 9 outputs the extracted nouns to the repeat generation unit 5.
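As a rough illustration of the noun extraction step, the sketch below filters a part-of-speech-tagged morpheme list. The tuple layout `(surface, part_of_speech, subtype)` and the tag names are assumptions made for this example; a real morphological analyzer would supply its own tag set.

```python
# Each morpheme is (surface form, part of speech, subtype).
# The tag names here are illustrative assumptions, not a real analyzer's tag set.
def extract_nouns(morphemes, excluded_subtypes=("numeral",)):
    """Return the surface forms of nouns, skipping excluded subtypes such as numerals."""
    return [
        surface
        for surface, pos, subtype in morphemes
        if pos == "noun" and subtype not in excluded_subtypes
    ]


utterance = [
    ("tonkatsu", "noun", "common"),
    ("wo", "particle", ""),
    ("tabeta", "verb", ""),
    ("yo", "particle", ""),
]
nouns = extract_nouns(utterance)  # only "tonkatsu" survives the filter
```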

As described above, analyzing the structure of the speech information and generating the response sentence take time (for example, about 3 seconds), so the processing cost is high. The resulting wait for a response can make the dialogue feel unnatural.

In contrast, in the response generation apparatus 1 according to Embodiment 1, the repeat generation unit 5 generates a repeat response sentence from the user's speech recognized by the speech recognition unit 2 by a simple process. The response output unit 4 outputs the repeat response sentence generated by the repeat generation unit 5 and then outputs the optional response sentence based on the structure of the speech.

Because the repeat response sentence merely parrots the recognized speech of the user, it takes little time to generate (for example, about 1 second) and its processing cost is low. The low-cost repeat response sentence can therefore be output while the user is waiting for the high-cost optional response sentence. This alleviates the unnaturalness caused by a long pause in the dialogue while waiting for a response.
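One way to picture this ordering is to start the slow structural analysis in the background and speak the cheap repeat response while it runs. The sketch below uses Python's standard `concurrent.futures`; the two response functions are hypothetical placeholders with hard-coded outputs, and the short sleep merely stands in for the multi-second analysis.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_optional_response(text):
    """Placeholder for structure analysis plus response generation (the slow path)."""
    time.sleep(0.05)  # stands in for the roughly 3-second analysis in the source
    return "dare to tabeta no?"


def fast_repeat_response(text):
    """Placeholder for the low-cost repeat response (the fast path)."""
    return "O, tonkatsu kaa"


def respond(text, speak):
    """Speak the repeat response first, then the optional response once it is ready."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(slow_optional_response, text)  # start the slow path
        speak(fast_repeat_response(text))                    # fill the wait
        speak(pending.result())                              # blocks until ready


spoken = []
respond("tonkatsu wo tabeta yo", spoken.append)
```

The `speak` callback is called from a single thread, so the repeat response is always emitted before the optional response regardless of how long the slow path takes.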

As described above, the repeat generation unit 5 generates, from the speech information recognized by the speech recognition unit 2, a repeat response sentence for parroting the user. The dialogue is more natural when a specific additional word is added to a noun in the user's speech information than when the user's speech is parroted back unchanged. For example, in response to the user's utterance "umi ni itta yo" ("I went to the sea"), the dialogue is more natural if the response generation apparatus 1 replies "umi kaa" ("The sea, huh") or "O, umi kaa" ("Oh, the sea, huh") than if it simply replies "umi ni itta yo".

The repeat generation unit 5 according to Embodiment 1 therefore generates the repeat response sentence by adding a specific additional word to the noun extracted by the noun extraction unit 9. For example, the repeat generation unit 5 selects the keyword in the memory 1b that matches the noun extracted by the noun extraction unit 9 and adds the additional word corresponding to the numerical value of the selected keyword to that noun. This varies the wording of the parroted repeat response sentence, so the response pattern is not uniform and the unnaturalness of the dialogue is further alleviated.

The memory 1b stores impression level information (table information or the like) that associates each of a plurality of keywords with an impression value indicating the magnitude of that keyword's impression level, as follows. The memory 1b is a specific example of the storage means.

Keyword "tonkatsu": impression value 0.7
Keyword "steak": impression value -0.3
Keyword "baseball": impression value 0.7
Keyword "soccer": impression value -0.47

The impression value is set, for example, between -1.0 (unpleasant) and +1.0 (pleasant), but is not limited to this range. The response generation apparatus 1 may set the impression value of each keyword based on a previously accumulated history of dialogue with the user and a decision function (for example, M = Log(n + 1)) that raises the impression value of a word as positive expressions containing it (such as "I like tonkatsu") appear more frequently. In this way, the impression value of each keyword is set so that it increases the better the user's impression of the keyword is.
Although numerical values are used as impression values above, this is not a limitation. The impression value may be, for example, large, medium, or small; anything that expresses the magnitude of the keyword's impression level can be used.
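The decision function M = Log(n + 1) mentioned above can be sketched as below. The source does not state the base of the logarithm or how the result is kept within the example range, so the natural logarithm and the clamp to +1.0 are both assumptions made only for this illustration, as is the function name.

```python
import math


def impression_from_count(n_positive):
    """Map the number of positive mentions n to an impression value M = log(n + 1).

    Assumptions (not from the source): natural logarithm, and clamping to the
    upper bound +1.0 of the example range.
    """
    if n_positive < 0:
        raise ValueError("count must be non-negative")
    return min(1.0, math.log(n_positive + 1))


values = [impression_from_count(n) for n in range(4)]  # non-decreasing in n
```

With these assumptions the value starts at 0 for a keyword that has never been mentioned positively and saturates quickly, so a different scaling would be needed if finer gradations were wanted.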

The repeat generation unit 5 sets an impression value for the noun output from the noun extraction unit 9, based on that noun and the impression level information in the memory 1b.
Furthermore, excitement levels are set in advance for the impression value x, for example in the following three steps, but this is not a limitation and the excitement levels can be set arbitrarily.
x < 0.3: excitement level = 0
0.3 ≤ x ≤ 0.7: excitement level = 1
0.7 ≤ x: excitement level = 2
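The lookup and thresholding can be sketched as follows. The table values are those listed in the source (with keywords romanized or translated); the function names are assumptions. Note that the listed ranges overlap at x = 0.7; this sketch assigns 0.7 to level 1, following the worked example later in this embodiment, where impression value 0.7 yields excitement level 1. Returning `None` for an unmatched noun is also an assumption, since the source does not say what happens in that case.

```python
# Impression level information as listed in the source.
IMPRESSION_TABLE = {
    "tonkatsu": 0.7,
    "steak": -0.3,
    "baseball": 0.7,
    "soccer": -0.47,
}


def impression_value(noun):
    """Return the impression value of the keyword matching the noun, or None."""
    return IMPRESSION_TABLE.get(noun)


def excitement_level(x):
    """Three-step excitement level; the boundary x == 0.7 is treated as level 1."""
    if x < 0.3:
        return 0
    if x <= 0.7:
        return 1
    return 2
```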

The memory 1b stores excitement level information (table information) that associates each excitement level with additional words (a prefix, a suffix, and so on).
The repeat generation unit 5 sets an excitement level for the noun based on the impression value set for it and the excitement level information in the memory 1b, and outputs that excitement level to the response output unit 4. Based on the noun's excitement level and the excitement level information in the memory 1b, the repeat generation unit 5 selects the additional words corresponding to the noun. It then generates the repeat response sentence by adding the additional words to the noun.

More specifically, the noun extraction unit 9 extracts only the noun "tonkatsu" from the character string information "tonkatsu wo tabeta yo" ("I ate tonkatsu") recognized by the speech recognition unit 2 and outputs it to the repeat generation unit 5.
Based on the noun "tonkatsu" output from the noun extraction unit 9 and the impression level information in the memory 1b, the repeat generation unit 5 sets impression value = 0.7 for the noun "tonkatsu". It then sets excitement level = 1, which corresponds to impression value = 0.7, for the noun "tonkatsu".

Based on excitement level = 1 set for the noun "tonkatsu" and the excitement level information in the memory 1b, the repeat generation unit 5 selects the additional words corresponding to the noun (prefix "O," and suffix "kaa"). It adds these additional words to the noun "tonkatsu" to generate the repeat response sentence "O, tonkatsu kaa" ("Oh, tonkatsu, huh").
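A minimal sketch of this step is shown below. Only the entry for excitement level 1 (prefix "O,", suffix "kaa") comes from the source's worked example; the entries for levels 0 and 2, the table name, and the function name are hypothetical placeholders added so the example runs.

```python
# Excitement level information: level -> (prefix, suffix).
ADDITIONAL_WORDS = {
    0: ("", ""),             # hypothetical: no additional words
    1: ("O, ", " kaa"),      # from the source's worked example
    2: ("Oo, ", " kaa!"),    # hypothetical
}


def repeat_response(noun, level):
    """Wrap the noun in the additional words registered for its excitement level."""
    prefix, suffix = ADDITIONAL_WORDS[level]
    return f"{prefix}{noun}{suffix}"


sentence = repeat_response("tonkatsu", 1)
```

Because the step is a table lookup plus string concatenation, it is far cheaper than the structural analysis that produces the optional response sentence.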

The memory 1b may instead store, for example, information that associates impression values directly with additional words. In this case, the repeat generation unit 5 selects the additional word corresponding to the impression value, based on the impression value set for the noun and the information in the memory 1b, and generates the repeat response sentence by adding the additional word to the noun.

Furthermore, in Embodiment 1, to break up the uniform response pattern and make the dialogue more natural, the output mode is changed in addition to adding a specific additional word to the noun. That is, the response output unit 4 according to Embodiment 1 outputs the repeat response sentence generated by the repeat generation unit 5 in the output mode corresponding to the impression value of the keyword in the memory 1b that matches the noun extracted by the noun extraction unit 9. This varies how the parroted repeat response sentence sounds, so the response pattern is not uniform and the unnaturalness of the dialogue is further alleviated.

The memory 1b stores, for example, output information (table information or the like) that associates each excitement level with an output mode, as follows. The output information, impression level information, and excitement level information in the memory 1b can be changed via any input device, such as a touch panel, a keyboard, or a speech recognition device.
Excitement level = 0: output mode: speaker volume = 1x
Excitement level = 1: output mode: speaker volume = 1.5x
Excitement level = 2: output mode: speaker volume = 2x

The response output unit 4 selects the output mode of the repeat response sentence based on the excitement level of the noun output from the repeat generation unit 5 and the output information in the memory 1b. It then outputs the repeat response sentence output from the repeat generation unit 5 in the selected output mode.

For example, based on excitement level = 1 of the noun "tonkatsu" output from the repeat generation unit 5 and the output information in the memory 1b, the response output unit 4 selects the output mode "speaker volume = 1.5x" for the repeat response sentence. It then outputs the repeat response sentence "O, tonkatsu kaa" output from the repeat generation unit 5 in the selected output mode "speaker volume = 1.5x". As a result, the higher the user's excitement level (impression value) for the noun, the louder the speaker volume. This gives the repeat response sentences variety and further alleviates the unnaturalness of the dialogue.
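The output mode selection can be sketched as a second table lookup. The volume multipliers are the ones listed in the source's output information; the `Utterance` type and the function name are assumptions for illustration, and actual audio playback is out of scope here.

```python
from dataclasses import dataclass

# Output information from the source: excitement level -> speaker volume multiplier.
VOLUME_BY_EXCITEMENT = {0: 1.0, 1: 1.5, 2: 2.0}


@dataclass(frozen=True)
class Utterance:
    """A sentence paired with the volume multiplier it should be spoken at."""
    text: str
    volume_scale: float


def with_output_mode(text, level):
    """Attach the speaker volume registered for the given excitement level."""
    return Utterance(text=text, volume_scale=VOLUME_BY_EXCITEMENT[level])


utterance = with_output_mode("O, tonkatsu kaa", 1)
```

The same pattern would extend to other output modes such as a fundamental frequency multiplier by adding a field to `Utterance` and a matching table.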

The memory 1b may instead store, for example, output information that associates impression values directly with output modes. In this case, the repeat generation unit 5 selects the output mode corresponding to the impression value, based on the impression value set for the noun and the output information in the memory 1b. The response output unit 4 outputs the repeat response sentence output from the repeat generation unit 5 in the selected output mode.

さらに、出力態様の一例として、上記スピーカ音量が挙げられているが、これに限定されない。出力態様は、スピーカ音声の基本周波数（倍率、増加量など）、あるいは、イントネーション（倍率、増加量など）の設定であってもよい。例えば、応答出力部４は、繰返生成部５から出力される繰返応答文を、選択した出力態様「基本周波数＝１．２倍」で出力する。これにより、名詞に対するユーザの興奮度（印象値）が高いほど、基本周波数が増加する（音声が高くなる）こととなる。したがって、その繰返応答文に多様性を持たせることができ、対話の違和感をより緩和することができる。 Although the speaker volume is given above as an example of an output mode, the output mode is not limited to this. The output mode may instead be a setting of the fundamental frequency (a scale factor, an increment, etc.) or of the intonation (a scale factor, an increment, etc.) of the speaker's output voice. For example, the response output unit 4 outputs the repeated response sentence output from the repeat generation unit 5 in the selected output mode "fundamental frequency = 1.2 times". As a result, the higher the user's excitement level (impression value) for the noun, the higher the fundamental frequency (the higher-pitched the voice). This gives the repeated response sentences variety and further reduces the unnaturalness of the dialogue.

なお、上述した繰返応答文を生成する処理は、音声情報から名詞を抽出し、抽出した名詞に対して、対応する付加語を付加し、対応する出力態様で出力するだけの簡易な処理である。したがって、この繰返応答文を生成する処理コストは、音声情報の構造解析を行って随意応答文を生成する処理コストと比較して、低くなる。 Note that the process of generating the repeated response sentence described above is a simple one: it only extracts a noun from the voice information, adds the corresponding additional word to the extracted noun, and outputs the result in the corresponding output mode. The processing cost of generating the repeated response sentence is therefore lower than the processing cost of generating an arbitrary response sentence by analyzing the structure of the voice information.
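The repeated-response process described above (look up the noun, derive an excitement level, attach an additional word, pick an output mode) can be illustrated with a minimal Python sketch. The table contents, the threshold rule in `excitement_level`, and all identifiers are assumptions made for illustration; the description only states that such associations are stored in the memory 1b, not how they are encoded.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OutputMode:
    volume_scale: float = 1.0  # speaker volume multiplier
    f0_scale: float = 1.0      # fundamental-frequency multiplier


# Hypothetical stand-ins for the information held in memory 1b.
IMPRESSION_INFO = {"トンカツ": 2, "友達": 0}  # keyword -> impression value
ADDITIONAL_WORDS = {                          # excitement level -> (prefix, suffix)
    1: ("お、", "かぁ"),
    0: ("", "なんだぁ"),
}
OUTPUT_INFO = {1: OutputMode(volume_scale=1.5), 0: OutputMode()}


def excitement_level(impression_value: int) -> int:
    """Assumed rule: an impression value of 1 or more counts as excited."""
    return 1 if impression_value >= 1 else 0


def generate_repeat_response(noun: str) -> Tuple[str, OutputMode]:
    """Build the repeated response sentence and choose its output mode."""
    impression = IMPRESSION_INFO.get(noun, 0)       # set the impression value
    excitement = excitement_level(impression)       # set the excitement level
    prefix, suffix = ADDITIONAL_WORDS[excitement]   # select the additional words
    sentence = f"{prefix}{noun}{suffix}"            # attach them to the noun
    return sentence, OUTPUT_INFO[excitement]        # select the output mode
```

Every step here is a dictionary lookup or a string concatenation, which is consistent with the low processing cost the description attributes to this path compared with structure analysis.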

図３は、本実施形態１に係る応答生成方法の処理フローを示すフローチャートである。 FIG. 3 is a flowchart showing the processing flow of the response generation method according to the first embodiment.

音声認識部２は、マイク６により取得されたユーザの音声情報の音声認識を行い（ステップＳ１０１）、認識したユーザの音声情報を構造解析部３、繰返生成部５及び名詞抽出部９に出力する。 The voice recognition unit 2 performs voice recognition on the user's voice information acquired by the microphone 6 (step S101), and outputs the recognized voice information to the structure analysis unit 3, the repeat generation unit 5, and the noun extraction unit 9.

名詞抽出部９は、音声認識部２から出力された音声情報の品詞情報付き形態素情報に基づいて、認識されたユーザの音声情報から名詞のみを抽出する（ステップＳ１０２）。名詞抽出部９は、抽出した名詞を繰返生成部５に出力する。 The noun extraction unit 9 extracts only the noun from the recognized user's speech information based on the morpheme information with part of speech information of the speech information output from the speech recognition unit 2 (step S102). The noun extraction unit 9 outputs the extracted noun to the repetition generation unit 5.

繰返生成部５は、名詞抽出部９から出力された名詞と、メモリ１ｂの印象度情報と、に基づいて、その名詞抽出部９からの名詞に対して印象値を設定する（ステップＳ１０３）。繰返生成部５は、上記設定した名詞の印象値と、メモリ１ｂの興奮度情報と、に基づいて、その名詞に対して興奮度を設定する（ステップＳ１０４）。繰返生成部５は、設定した名詞の興奮度を応答出力部４に出力する。 The repeat generation unit 5 sets an impression value for the noun from the noun extraction unit 9, based on the noun output from the noun extraction unit 9 and the impression degree information in the memory 1b (step S103). The repeat generation unit 5 then sets an excitement level for the noun based on the impression value set above and the excitement level information in the memory 1b (step S104). The repeat generation unit 5 outputs the set excitement level of the noun to the response output unit 4.

繰返生成部５は、上記設定した名詞の興奮度と、メモリ１ｂの興奮度情報と、に基づいて、その名詞に対応する付加語を選択する（ステップＳ１０５）。繰返生成部５は、上記設定した名詞に付加語を付加することで、繰返応答文を生成する（ステップＳ１０６）。繰返生成部５は、生成した繰返応答文を応答出力部４に出力する。 The repeat generation unit 5 selects an additional word corresponding to the noun based on the set excitement level of the noun and the excitement level information in the memory 1b (step S105). The repeat generation unit 5 generates a repeat response sentence by adding an additional word to the set noun (step S106). The repeat generation unit 5 outputs the generated repeat response sentence to the response output unit 4.

応答出力部４は、繰返生成部５から出力される名詞の興奮度と、メモリ１ｂの出力情報と、に基づいて、繰返応答文の出力態様を選択する（ステップＳ１０７）。応答出力部４は、繰返生成部５から出力される繰返応答文を選択した出力態様でスピーカ７から出力する（ステップＳ１０８）。 The response output unit 4 selects the output mode of the repeated response sentence based on the excitement level of the noun output from the repeated generation unit 5 and the output information of the memory 1b (step S107). The response output unit 4 outputs the repeated response sentence output from the repeated generation unit 5 from the speaker 7 in the selected output mode (step S108).

上記（ステップ１０２）及び（ステップ１０８）と平行して、構造解析部３は、音声認識部２により認識された音声情報の構造を解析し（ステップＳ１０９）、その文字列情報の解析結果を応答出力部４に出力する。 In parallel with (step S102) and (step S108) above, the structure analysis unit 3 analyzes the structure of the voice information recognized by the voice recognition unit 2 (step S109), and outputs the analysis result of the character string information to the response output unit 4.

応答出力部４は、構造解析部３から出力される文字列情報の解析結果に基づいて随意応答文を生成し（ステップＳ１１０）、生成した随意応答文をスピーカ７から出力する（ステップＳ１１１）。 The response output unit 4 generates an arbitrary response sentence based on the analysis result of the character string information output from the structure analysis unit 3 (step S110), and outputs the generated arbitrary response sentence from the speaker 7 (step S111).
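The flow of FIG. 3 runs a cheap path (noun extraction and repeated response) alongside an expensive path (structure analysis and arbitrary response). A minimal sketch of that arrangement, using Python's standard `concurrent.futures`, is shown below. The two worker functions are hypothetical stand-ins, and the `time.sleep` call merely simulates the slower structure analysis; neither comes from the description.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def make_repeat_response(recognized_text: str) -> str:
    """Stand-in for steps S102 to S107 (cheap path)."""
    return f"repeat:{recognized_text}"


def make_arbitrary_response(recognized_text: str) -> str:
    """Stand-in for steps S109 and S110 (expensive path)."""
    time.sleep(0.05)  # simulated cost of structure analysis
    return f"arbitrary:{recognized_text}"


def respond(recognized_text: str, speak: Callable[[str], None]) -> None:
    """Start both paths at once, then speak the repeated response first."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        slow = pool.submit(make_arbitrary_response, recognized_text)
        fast = pool.submit(make_repeat_response, recognized_text)
        speak(fast.result())  # corresponds to step S108
        speak(slow.result())  # corresponds to step S111
```

Calling `respond("tonkatsu", print)` prints the repeated response while the arbitrary response is still being prepared, which is the waiting-time benefit the description aims for.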

以上、本実施形態１において、繰返生成部５は、名詞抽出部９により抽出された名詞と一致するメモリ１ｂの各キーワードを選択し、選択したキーワードの数値に対応する付加語を、その名詞に付加して繰返応答文を生成する。さらに、応答出力部４は、名詞抽出部９により抽出された名詞と一致するメモリ１ｂの各キーワードの印象値に対応する出力態様で、繰返生成部５により生成された繰返応答文を出力する。これにより、オオム返しの繰返応答文の語感に多様性を持たせることができるため、画一的な応答パターンにならず、対話の違和感をより緩和することができる。 As described above, in the first embodiment, the repeat generation unit 5 selects each keyword in the memory 1b that matches the noun extracted by the noun extraction unit 9, and generates a repeated response sentence by adding, to the noun, the additional word corresponding to the numerical value of the selected keyword. Furthermore, the response output unit 4 outputs the repeated response sentence generated by the repeat generation unit 5 in the output mode corresponding to the impression value of each keyword in the memory 1b that matches the noun extracted by the noun extraction unit 9. This gives variety to the wording of the parroted repeated response sentences, so the responses do not fall into a uniform pattern and the unnaturalness of the dialogue is further reduced.

実施形態２． Embodiment 2.
図４は、本発明の実施形態２に係る応答生成装置の概略的なシステム構成を示すブロック図である。本実施形態２に係る応答生成装置２０は、上記実施形態１に係る応答生成装置１の構成に加えて、ユーザの音声情報の音韻を分析する音韻分析部２１と、ユーザの音声情報に対する相槌の応答を生成する相槌生成部２２と、を更に備える点を特徴とする。 FIG. 4 is a block diagram showing a schematic system configuration of the response generation device according to the second embodiment of the present invention. In addition to the configuration of the response generation device 1 according to the first embodiment, the response generation device 20 according to the second embodiment further includes a phoneme analysis unit 21 that analyzes the phonemes of the user's voice information, and a backchannel generation unit 22 that generates a backchannel response to the user's voice information.

音韻分析部２１は、マイク６により取得されたユーザの音声情報に基づいてユーザの音声情報の音韻を分析する。音韻分析部２１は、音韻分析手段の一具体例である。例えば、音韻分析部２１は、音声情報の音量レベル変化や周波数変化（基本周波数等）を検出することで、ユーザの音声の切れ目を推定する。音韻分析部２１は、音韻の分析結果を相槌生成部２２に出力する。 The phoneme analysis unit 21 analyzes the phonemes of the user's voice information based on the user's voice information acquired by the microphone 6. The phoneme analysis unit 21 is a specific example of phoneme analysis means. For example, the phoneme analysis unit 21 estimates breaks in the user's speech by detecting changes in the volume level or frequency (such as the fundamental frequency) of the voice information. The phoneme analysis unit 21 outputs the phoneme analysis result to the backchannel generation unit 22.

相槌生成部２２は、音韻分析部２１から出力される音韻の分析結果に基づいて、ユーザの音声に対する相槌の応答（以下、相槌応答と称す）を生成する。相槌生成部２２は、相槌生成手段の一具体例である。例えば、相槌生成部２２は、音声情報の音量レベルが閾値以下となったとき、相槌のパターンが記憶された定型応答データベース２３を検索する。そして、相槌生成部２２は、定型応答データベース２３からランダムに相槌応答を選択する。定型応答データベース２３は、「うん。うん。」などの相槌に用いられる複数のパターンが記憶されている。定型応答データベース２３は、上記メモリ１ｂなどに構築されている。相槌生成部２２は、生成した相槌応答を応答出力部４に出力する。 Based on the phoneme analysis result output from the phoneme analysis unit 21, the backchannel generation unit 22 generates a backchannel (aizuchi) response to the user's voice (hereinafter referred to as a backchannel response). The backchannel generation unit 22 is a specific example of backchannel generation means. For example, when the volume level of the voice information falls to or below a threshold, the backchannel generation unit 22 searches the fixed response database 23, in which backchannel patterns are stored. The backchannel generation unit 22 then randomly selects a backchannel response from the fixed response database 23. The fixed response database 23 stores a plurality of patterns used as backchannels, such as "Yeah. Yeah." The fixed response database 23 is built in the memory 1b or the like. The backchannel generation unit 22 outputs the generated backchannel response to the response output unit 4.
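The 相槌 (backchannel) generation just described, a threshold test on the volume level followed by a random pick from stored patterns, can be sketched as follows. The pattern list reuses the backchannel phrases that appear in this document's example dialogue; the function name, the `rng` parameter, and the numeric values in the usage note are assumptions for illustration only.

```python
import random
from typing import Optional

# Stand-in for the fixed response database 23; phrases are taken from the
# example dialogue in this document.
BACKCHANNEL_PATTERNS = ["うん。うん。", "そうなんだ。", "なるほど。", "ふーん。"]


def generate_backchannel(volume_level: float,
                         threshold: float,
                         rng=random) -> Optional[str]:
    """Return a randomly chosen backchannel when the volume level has
    dropped to or below the threshold (taken as a break in the user's
    speech); otherwise return None."""
    if volume_level <= threshold:
        return rng.choice(BACKCHANNEL_PATTERNS)
    return None
```

For instance, `generate_backchannel(0.1, 0.2)` returns one of the stored patterns, while `generate_backchannel(0.9, 0.2)` returns `None` because the user is presumably still speaking. Passing a seeded `random.Random` instance as `rng` makes the choice reproducible.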

応答出力部４は、繰返生成部５により生成された繰返応答文の前に、相槌生成部２２により生成された相槌応答をスピーカ７から出力させる。なお、音韻分析部２１は、処理コストの低い特徴量を用いて音韻分析を行っている。このため、その相槌応答の生成時間は、上記繰返応答文の生成時間より短く、処理コストがより低い。 The response output unit 4 causes the speaker 7 to output the backchannel response generated by the backchannel generation unit 22 before the repeated response sentence generated by the repeat generation unit 5. Note that the phoneme analysis unit 21 performs phoneme analysis using feature quantities with a low processing cost. The backchannel response therefore takes less time to generate than the repeated response sentence and has a lower processing cost.

したがって、上記繰返応答文を出力するまでの間に、より処理コストが低い相槌応答を出力することができる。これにより、対話間の繋がりがよりスムーズになり、対話の違和感をより緩和することができる。さらに、処理コストの異なるより多くの応答及び応答文を並列で生成し、その生成順に出力する。これにより、対話の連続性をより滑らかに維持しそのテンポ感を損なわないより自然な対話を実現できる。 Therefore, a backchannel response with a lower processing cost can be output before the repeated response sentence is output. This makes the connection between turns of the dialogue smoother and further reduces its unnaturalness. Furthermore, a larger number of responses and response sentences with different processing costs are generated in parallel and output in the order in which they are generated. This keeps the dialogue flowing continuously and enables a more natural dialogue that does not lose its tempo.

なお、相槌生成部２２は、相槌応答を定型的に生成しており、繰返生成部５は、音声認識結果の表層的な解釈のみを行って繰返応答文を生成している。したがって、応答出力部４は、相槌生成部２２により生成された相槌応答および繰返生成部５により生成された繰返応答と同様の随意応答候補を生成することが想定される。 Note that the backchannel generation unit 22 generates backchannel responses in a fixed-form manner, and the repeat generation unit 5 generates the repeated response sentence by interpreting only the surface of the speech recognition result. It is therefore expected that the response output unit 4 may generate arbitrary response candidates that are the same as the backchannel response generated by the backchannel generation unit 22 or the repeated response generated by the repeat generation unit 5.

これに対し、応答出力部４は、随意応答候補の中から、相槌生成部２２により生成された相槌応答および繰返生成部５により生成された繰返応答と重複する随意応答候補を除外する。そして、応答出力部４は、その除外された随意応答候補の中から最適な候補を選択し、随意応答文とする。これにより、重複する無駄な言葉を排除できより自然な対話を実現できる。 To address this, the response output unit 4 excludes, from the arbitrary response candidates, any candidate that duplicates the backchannel response generated by the backchannel generation unit 22 or the repeated response generated by the repeat generation unit 5. The response output unit 4 then selects the most suitable candidate from those remaining after the exclusion and uses it as the arbitrary response sentence. This eliminates redundant wording and enables a more natural dialogue.

例えば、ユーザの発話「今日は暑いね」に対して、相槌生成部２２が相槌応答「うん」を生成する。続いて、繰返生成部５は、繰返応答文「暑いね」を生成する。これに対し、応答出力部４は、随意応答候補「嫌だね」、「いつまで暑いのかな？」、「暑いね」、「そうだね」等を生成する。応答出力部４は、生成した随意応答候補の中から繰返生成部５により生成された繰返応答文と重複する「暑いね」を排除する。そして、応答出力部４は、その除外された随意応答候補の中から、例えば「いつまで暑いのかな？」を選択し、随意応答文とする。 For example, in response to the user's utterance "It's hot today, isn't it", the backchannel generation unit 22 generates the backchannel response "Yeah". The repeat generation unit 5 then generates the repeated response sentence "It's hot, isn't it". Meanwhile, the response output unit 4 generates arbitrary response candidates such as "That's unpleasant", "I wonder how long it will stay hot?", "It's hot, isn't it", and "That's right". From these candidates, the response output unit 4 eliminates "It's hot, isn't it", which duplicates the repeated response sentence generated by the repeat generation unit 5. It then selects, for example, "I wonder how long it will stay hot?" from the remaining candidates and uses it as the arbitrary response sentence.
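The duplicate-exclusion step in the example above can be sketched in a few lines of Python. The description does not say how the "most suitable" remaining candidate is chosen, so the `score` parameter is a hypothetical placeholder, and its default (string length) is used purely so that the sketch runs; it is not the patent's selection criterion.

```python
from typing import Callable, Iterable, Optional, Sequence


def select_arbitrary_response(candidates: Sequence[str],
                              already_output: Iterable[str],
                              score: Callable[[str], float] = len
                              ) -> Optional[str]:
    """Drop candidates that duplicate the backchannel or repeated response,
    then return the highest-scoring remaining candidate (or None if every
    candidate was a duplicate)."""
    used = set(already_output)
    remaining = [c for c in candidates if c not in used]
    return max(remaining, key=score) if remaining else None


# The example from the description: the repeated response 「暑いね」 is
# filtered out before the arbitrary response sentence is chosen.
candidates = ["嫌だね", "いつまで暑いのかな？", "暑いね", "そうだね"]
chosen = select_arbitrary_response(candidates, ["うん", "暑いね"])
```

With the placeholder length-based score, `chosen` is 「いつまで暑いのかな？」, matching the candidate picked in the example. A real system would presumably replace `score` with a relevance measure.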

なお、本実施形態２に係る応答生成装置２０において、上記実施形態１に係る応答生成装置１と同一部分に同一符号を付して詳細な説明は省略する。 Note that, in the response generation device 20 according to the second embodiment, the same parts as those in the response generation device 1 according to the first embodiment are denoted by the same reference numerals, and detailed description thereof is omitted.

以下、応答生成装置２０とユーザとの対話の一例を示す。下記一例において、Ｍは、応答生成装置２０の応答文及び応答であり、Ｕはユーザの発話である。 An example of a dialogue between the response generation device 20 and the user is shown below. In this example, M denotes a response sentence or response of the response generation device 20, and U denotes an utterance of the user.
Ｍ（話題提供）：お昼何を食べたの？
Ｕ：トンカツを食べたよ。
Ｍ（相槌応答）：うん。うん。
Ｍ（繰返応答文）：お、トンカツかぁ。（「お」、「かぁ」を付加、スピーカ音量＝１．５倍で出力）
Ｍ（随意応答文）：誰と食べたのかな？
Ｕ：友達と食べたよ。
Ｍ（相槌応答）：そうなんだ。
Ｍ（繰返応答文）：友達なんだぁ。（「なんだぁ」を付加）
Ｍ（随意応答文）：どこで食べたのかな？
Ｕ：矢場とんで食べたよ。
Ｍ（相槌応答）：なるほど。
Ｍ（繰返応答文）：おおー、矢場とんだね。（「おおー」、「だね」を付加、スピーカ音量＝１．５倍、基本周波数＝１．１倍）
Ｍ（随意応答文）：食べたね。
Ｕ：美味しかったよ。
Ｍ（相槌応答）：ふーん。
Ｍ（繰返応答文）：美味しかったのか。（「のか」を付加）
Ｍ（随意応答文）：それはいいね。○○さん。
M (topic offering): What did you have for lunch?
U: I had tonkatsu.
M (backchannel response): Yeah. Yeah.
M (repeated response sentence): Oh, tonkatsu, huh. ("O" and "kaa" added; output at speaker volume = 1.5 times)
M (arbitrary response sentence): Who did you eat with?
U: I ate with a friend.
M (backchannel response): Is that so.
M (repeated response sentence): A friend, huh. ("nandaa" added)
M (arbitrary response sentence): Where did you eat?
U: I ate at Yabaton.
M (backchannel response): I see.
M (repeated response sentence): Ooh, Yabaton, huh. ("Ooo" and "dane" added; speaker volume = 1.5 times, fundamental frequency = 1.1 times)
M (arbitrary response sentence): So you ate there.
U: It was delicious.
M (backchannel response): Hmm.
M (repeated response sentence): So it was delicious. ("noka" added)
M (arbitrary response sentence): That's nice, ○○-san.

上記対話の一例が示すように、ユーザが発話すると、この発話に対して、応答生成装置２０の相槌応答、繰返応答文、及び随意応答文がテンポよく連続し、対話間の繋がりがよりスムーズになることが分かる。また、繰返応答文に付加した付加語および出力態様に多様性を持たせることで、対話の自然性がより向上していることが分かる。 As the above example shows, when the user speaks, the backchannel response, the repeated response sentence, and the arbitrary response sentence of the response generation device 20 follow the utterance in quick succession, making the connection between turns of the dialogue smoother. The example also shows that varying the additional words added to the repeated response sentence and the output mode makes the dialogue more natural.

なお、本発明は上記実施の形態に限られたものではなく、趣旨を逸脱しない範囲で適宜変更することが可能である。 Note that the present invention is not limited to the above-described embodiment, and can be changed as appropriate without departing from the spirit of the present invention.

上記実施形態において、応答出力部４は相槌生成部２２により生成された相槌応答をスピーカ７から出力させているが、これに限られない。応答出力部４は、相槌生成部２２により生成された相槌応答に基づいて、処理負荷の低い任意の応答を行っても良い。例えば、応答出力部４は、振動装置の振動、ライト装置の点灯／点滅、表示装置の表示、ロボットの手足、頭部、胴体など各部の動作などをおこなってもよく、これらを任意に組み合わせて行ってもよい。 In the above embodiment, the response output unit 4 causes the speaker 7 to output the backchannel response generated by the backchannel generation unit 22, but the present invention is not limited to this. Based on the backchannel response generated by the backchannel generation unit 22, the response output unit 4 may perform any response with a low processing load. For example, the response output unit 4 may vibrate a vibration device, turn on or blink a light device, show a display on a display device, or move parts of a robot such as its limbs, head, and torso, and may perform any combination of these.

上記実施形態において、応答出力部４は、繰返生成部５により生成された繰返応答文をスピーカ７から出力させているが、これに限らない。応答出力部４は、繰返生成部５により生成された繰返応答文に基づいて、処理負荷の低い任意の繰返応答文を出力しても良い。例えば、応答出力部４は、表示装置の表示などを用いて繰返応答文を出力してもよく、任意に手段を組み合わせて出力してもよい。この場合、例えば、応答出力部４の出力態様は、文字の大きさ、輝度、形状などの設定であってもよい。 In the above embodiment, the response output unit 4 outputs the repeated response text generated by the repeated generation unit 5 from the speaker 7, but is not limited thereto. The response output unit 4 may output an arbitrary repeated response sentence with a low processing load based on the repeated response sentence generated by the repeated generation unit 5. For example, the response output unit 4 may output a repetitive response sentence using a display on the display device or the like, or may output it by arbitrarily combining means. In this case, for example, the output mode of the response output unit 4 may be settings such as character size, brightness, and shape.

また、本発明は、例えば、図３に示す処理を、ＣＰＵ１ａにコンピュータプログラムを実行させることにより実現することも可能である。 The present invention can also be realized, for example, by causing the CPU 1a to execute a computer program that performs the processing shown in FIG. 3.

プログラムは、様々なタイプの非一時的なコンピュータ可読媒体（non-transitory computer readable medium）を用いて格納され、コンピュータに供給することができる。非一時的なコンピュータ可読媒体は、様々なタイプの実体のある記録媒体（tangible storage medium）を含む。非一時的なコンピュータ可読媒体の例は、磁気記録媒体（例えばフレキシブルディスク、磁気テープ、ハードディスクドライブ）、光磁気記録媒体（例えば光磁気ディスク）、ＣＤ−ＲＯＭ（Read Only Memory）、ＣＤ−Ｒ、ＣＤ−Ｒ／Ｗ、半導体メモリ（例えば、マスクＲＯＭ、ＰＲＯＭ（Programmable ROM）、ＥＰＲＯＭ（Erasable PROM）、フラッシュＲＯＭ、ＲＡＭ（random access memory））を含む。 The program can be stored using various types of non-transitory computer readable media and supplied to a computer. Non-transitory computer readable media include various types of tangible storage media. Examples of non-transitory computer readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (random access memory)).

また、プログラムは、様々なタイプの一時的なコンピュータ可読媒体（transitory computer readable medium）によってコンピュータに供給されてもよい。一時的なコンピュータ可読媒体の例は、電気信号、光信号、及び電磁波を含む。一時的なコンピュータ可読媒体は、電線及び光ファイバ等の有線通信路、又は無線通信路を介して、プログラムをコンピュータに供給できる。 The program may also be supplied to the computer by various types of transitory computer readable media. Examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves. The temporary computer-readable medium can supply the program to the computer via a wired communication path such as an electric wire and an optical fiber, or a wireless communication path.

１応答生成装置、２音声認識部、３構造解析部、４応答出力部、５繰返生成部、６マイク、７スピーカ、８不足格辞書データベース、９名詞抽出部、２１音韻分析部、２２相槌生成部、２３定型応答データベース DESCRIPTION OF SYMBOLS: 1 response generation device, 2 voice recognition unit, 3 structure analysis unit, 4 response output unit, 5 repeat generation unit, 6 microphone, 7 speaker, 8 missing-case dictionary database, 9 noun extraction unit, 21 phoneme analysis unit, 22 backchannel generation unit, 23 fixed response database

Claims

Voice recognition means for recognizing the user's voice;
Structure analysis means for analyzing the structure of the voice recognized by the voice recognition means;
Noun extraction means for extracting nouns from the speech recognized by the speech recognition means;
Repeat generation means for generating, based on the noun extracted by the noun extraction means, a repeated response sentence that repeats the user's voice; and
Response output means for generating an arbitrary response sentence to the user's voice based on the structure of the voice analyzed by the structure analysis means, and for outputting the generated arbitrary response sentence after outputting the repeated response sentence generated by the repeat generation means;
A response generation device comprising the above means, further comprising:
Storage means for storing information in which a plurality of keywords are associated with the magnitude of the impression degree of each keyword, and information in which the magnitude of the impression degree is associated with an additional word and an output mode,
Wherein the repeat generation means selects the keyword in the storage means that matches the noun extracted by the noun extraction means, and generates the repeated response sentence by adding, to the noun, the additional word corresponding to the magnitude of the impression degree of the selected keyword, and
The response output means selects each keyword in the storage means that matches the noun extracted by the noun extraction means, and outputs the repeated response sentence generated by the repeat generation means in the output mode corresponding to the magnitude of the impression degree of the selected keyword.

The response generation device according to claim 1,
Wherein the storage means stores information in which at least one output mode among volume, frequency, and intonation is associated with the magnitude of the impression degree of each keyword.

The response generation device according to claim 2,
Wherein the output mode is set so that at least one of volume, frequency, and intonation increases as the magnitude of the impression degree of each keyword increases.

The response generation device according to any one of claims 1 to 3,
Phoneme analysis means for analyzing the phoneme of the user's voice;
Backchannel generation means for generating a backchannel response to the user's voice based on the analysis result of the phoneme analyzed by the phoneme analysis means;
Wherein the backchannel response generated by the backchannel generation means is output before the repeated response sentence generated by the repeat generation means is output.

Recognizing the user's voice;
Analyzing the structure of the recognized speech;
Extracting a noun from the recognized speech;
Generating a repetitive response sentence for repeating the user's voice based on the extracted noun;
Generating an arbitrary response sentence to the user's voice based on the analyzed structure of the voice, and outputting the generated arbitrary response sentence after outputting the generated repeated response sentence;
A response generation method including the above steps, wherein:
Information in which a plurality of keywords are associated with the magnitude of the impression degree of each keyword, and information in which the magnitude of the impression degree is associated with an additional word and an output mode, are stored;
The keyword that matches the extracted noun is selected, and the additional word corresponding to the magnitude of the impression degree of the selected keyword is added to the noun to generate the repeated response sentence; and
Each keyword that matches the extracted noun is selected, and the generated repeated response sentence is output in the output mode corresponding to the magnitude of the impression degree of the selected keyword.

Processing to recognize the user's voice,
Processing to analyze the structure of the recognized speech;
A process of extracting nouns from the recognized speech;
Based on the extracted noun, a process of generating a repeated response sentence for repeating the user's voice;
A process of generating an arbitrary response sentence to the user's voice based on the analyzed structure of the voice, and outputting the generated arbitrary response sentence after outputting the generated repeated response sentence;
A response generation program that causes a computer to execute the above processes, wherein:
Information in which a plurality of keywords are associated with the magnitude of the impression degree of each keyword, and information in which the magnitude of the impression degree is associated with an additional word and an output mode, are stored; and
The program causes the computer to execute a process of selecting the keyword that matches the extracted noun and adding the additional word corresponding to the magnitude of the impression degree of the selected keyword to the noun to generate the repeated response sentence, and a process of selecting each keyword that matches the extracted noun and outputting the generated repeated response sentence in the output mode corresponding to the magnitude of the impression degree of the selected keyword.