JP6773074B2

JP6773074B2 - Response generation method, response generator and response generation program

Info

Publication number: JP6773074B2
Application number: JP2018088229A
Authority: JP
Inventors: 生聖渡部
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2018-05-01
Filing date: 2018-05-01
Publication date: 2020-10-21
Anticipated expiration: 2034-08-21
Also published as: JP2018160248A

Description

The present invention relates to a response generation device, a response generation method, and a response generation program that respond to a user.

A response generation device is known that includes voice recognition means for recognizing a user's voice, structural analysis means for analyzing the structure of the voice recognized by the voice recognition means, and response output means for generating a response sentence to the user's voice based on the structure analyzed by the structural analysis means and outputting the generated response sentence (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2010-157081

However, a response generation device of this kind takes time to analyze the structure of the voice and to generate the response sentence, so the user is left waiting for a response. The dialogue may therefore feel unnatural.

The present invention has been made to solve this problem, and its main object is to provide a response generation method, a response generation device, and a response generation program capable of alleviating the unnaturalness of a dialogue caused by waiting for a response.

One aspect of the present invention for achieving the above object is a response generation method including a step of recognizing a user's voice, a step of analyzing the structure of the recognized voice, and a step of generating a free response sentence to the user's voice based on the analyzed structure and outputting the generated free response sentence, the method being characterized by further including a step of generating the recognized user's voice as a repeat response sentence, and a step of outputting the generated repeat response sentence before outputting the free response sentence based on the structure of the voice.
In this aspect, the method may further include a step of analyzing the phonology of the user's voice and a step of generating an aizuchi (backchannel) response to the user's voice based on the result of the phonological analysis, and the generated aizuchi response may be output before the generated repeat response sentence is output.
In this aspect, a plurality of response candidates for the user's voice may be generated based on the analyzed structure of the voice; response candidates that duplicate the generated repeat response sentence or the generated aizuchi response may be excluded from the generated candidates; and a response candidate selected from the candidates remaining after the exclusion may be used as the free response sentence.
In this aspect, a keyword and the part of speech of the keyword may be extracted from the recognized user's voice; the additional ending corresponding to the extracted keyword and part of speech may be selected based on additional information that associates a plurality of keywords, the part of speech of each keyword, and additional endings with one another; and the repeat response sentence may be generated by appending the selected additional ending to the extracted keyword.
Another aspect of the present invention for achieving the above object may be a response generation device including voice recognition means for recognizing a user's voice, structural analysis means for analyzing the structure of the voice recognized by the voice recognition means, and response output means for generating a free response sentence to the user's voice based on the structure analyzed by the structural analysis means and outputting the generated free response sentence, the device being characterized by further including repeat generation means for generating the user's voice recognized by the voice recognition means as a repeat response sentence, wherein the response output means outputs the repeat response sentence generated by the repeat generation means before outputting the free response sentence based on the structure of the voice.
Another aspect of the present invention for achieving the above object may be a response generation program characterized by causing a computer to execute a process of recognizing a user's voice, a process of analyzing the structure of the recognized voice, a process of generating a free response sentence to the user's voice based on the analyzed structure and outputting the generated free response sentence, a process of generating the recognized user's voice as a repeat response sentence, and a process of outputting the generated repeat response sentence before outputting the free response sentence based on the structure of the voice.

According to the present invention, it is possible to provide a response generation method, a response generation device, and a response generation program capable of alleviating the unnaturalness of a dialogue caused by waiting for a response.

FIG. 1 is a block diagram showing the schematic system configuration of the response generation device according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart showing the processing flow of the response generation method according to Embodiment 1 of the present invention.
FIG. 3 is a block diagram showing the schematic system configuration of the response generation device according to Embodiment 2 of the present invention.
FIG. 4 is a flowchart showing the processing flow of the response generation method according to Embodiment 2 of the present invention.
FIG. 5 shows an example of the additional information stored in the memory.

Embodiment 1
Hereinafter, embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the schematic system configuration of the response generation device according to Embodiment 1 of the present invention. The response generation device 1 according to Embodiment 1 includes a voice recognition unit 2 that recognizes the user's voice, a structural analysis unit 3 that analyzes the structure of the voice, a response output unit 4 that generates and outputs a response sentence to the user's voice, and a repeat generation unit 5 that generates a repeat response sentence.

The response generation device 1 is implemented in hardware centered on a microcomputer that includes, for example, a CPU (Central Processing Unit) that performs arithmetic processing and the like, a memory consisting of a ROM (Read Only Memory) and a RAM (Random Access Memory) that stores arithmetic programs, control programs, and the like executed by the CPU, and an interface unit (I/F) that inputs and outputs signals to and from the outside. The CPU, the memory, and the interface unit are connected to one another via a data bus or the like.

The voice recognition unit 2 is a specific example of the voice recognition means. It performs voice recognition processing on the user's voice information acquired by the microphone 6 and converts the user's voice into text to generate character string information. The voice recognition unit 2 detects an utterance section in the user's voice information output from the microphone 6 and recognizes the voice by performing pattern matching on the voice information of the detected utterance section with reference to, for example, a statistical language model. Here, the statistical language model is a probability model for calculating the occurrence probability of linguistic expressions, such as the distribution of word occurrences and the distribution of words that follow a given word, and is obtained by learning connection probabilities in units of morphemes. The statistical language model is stored in advance in the above memory or the like. The voice recognition unit 2 outputs the recognized voice information of the user to the structural analysis unit 3 and the repeat generation unit 5.
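The idea of learning connection probabilities between adjacent morphemes can be illustrated with a minimal bigram sketch. This is not the statistical language model of the embodiment, whose details are not given here; the function names, the romanized tokens, and the sentence-boundary markers `<s>` and `</s>` are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count how often each token follows another; sentences are token lists."""
    follow = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            follow[prev][cur] += 1
    return follow

def connection_probability(follow, prev, cur):
    """Estimate P(cur | prev) from the counts; 0.0 if prev was never seen."""
    total = sum(follow[prev].values())
    return follow[prev][cur] / total if total else 0.0

def sentence_probability(follow, tokens):
    """Multiply the connection probabilities along the token sequence."""
    padded = ["<s>"] + tokens + ["</s>"]
    probability = 1.0
    for prev, cur in zip(padded, padded[1:]):
        probability *= connection_probability(follow, prev, cur)
    return probability

# Hypothetical, already tokenized training data (romanized for readability).
corpus = [
    ["tonkatsu", "o", "taberu"],
    ["sushi", "o", "taberu"],
    ["tonkatsu", "o", "kau"],
]
model = train_bigram(corpus)
```

A recognizer could use such scores to prefer one candidate transcription over another; how the embodiment combines them with pattern matching is not specified in this description.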

The structural analysis unit 3 is a specific example of the structural analysis means and analyzes the structure of the voice information recognized by the voice recognition unit 2. For example, the structural analysis unit 3 uses a general morphological analyzer to perform morphological analysis and the like on the character string information representing the recognized voice information of the user, and interprets the meaning of the character string information. The structural analysis unit 3 outputs the analysis result of the character string information to the response output unit 4.

The response output unit 4 is a specific example of the response output means. Based on the structure of the voice information analyzed by the structural analysis unit 3, it generates a response sentence to the user's voice information (hereinafter referred to as a free response sentence) and outputs the generated free response sentence. For example, the response output unit 4 generates the free response sentence to the user's voice information based on the analysis result of the character string information output from the structural analysis unit 3. The response output unit 4 then outputs the generated response sentence through the speaker 7.

More specifically, the structural analysis unit 3 extracts the predicate-argument structure from the character string information "eat pork cutlet" and identifies the predicate "eat" and the case particle "o". The response output unit 4 then extracts the types of case particles that can attach to the predicate "eat" identified by the structural analysis unit 3 from the deficient-case dictionary database 8, which stores the correspondence between predicates and case particles. The deficient-case dictionary database 8 is built, for example, in the above memory.

The response output unit 4 generates, for example, the predicate-argument structures "what to eat", "where to eat", "when to eat", and "who to eat with" as candidates for the free response sentence. From these, the response output unit 4 sets aside the structure with the surface case "o", which the user's utterance already contains, randomly selects one of the remaining predicate-argument structures, which do not match the user's voice, and uses the selected structure as the free response sentence. In this way, the response output unit 4 interprets the meaning of the voice information based on the structure analyzed by the structural analysis unit 3 and generates a plurality of free response candidates. The response output unit 4 then selects the most suitable candidate from the generated free response candidates and uses it as the free response sentence. For example, the response output unit 4 selects the predicate-argument structure "Who did you eat with?" and outputs it as the free response sentence.
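The candidate generation described above can be sketched roughly as follows. The dictionary contents, the romanized particle keys, the English question phrasings, and the function names are all hypothetical stand-ins; the actual format of the deficient-case dictionary database 8 is not given in this description.

```python
import random

# Hypothetical stand-in for the deficient-case dictionary database 8:
# predicate -> {case particle: question that asks for that case}.
DEFICIENT_CASE_DICTIONARY = {
    "eat": {
        "o": "What did you eat?",
        "de": "Where did you eat?",
        "ni": "When did you eat?",
        "to": "Who did you eat with?",
    },
}

def generate_candidates(predicate, used_particles,
                        dictionary=DEFICIENT_CASE_DICTIONARY):
    """Return questions for the cases the user's utterance did not fill."""
    cases = dictionary.get(predicate, {})
    return [question for particle, question in cases.items()
            if particle not in used_particles]

def choose_free_response(predicate, used_particles, rng=random):
    """Pick one remaining candidate at random, or None if there is none."""
    candidates = generate_candidates(predicate, used_particles)
    return rng.choice(candidates) if candidates else None
```

For the utterance "eat pork cutlet", the predicate is "eat" and the particle "o" is already used, so `generate_candidates("eat", {"o"})` leaves the where, when, and with-whom questions, from which one is chosen.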

However, the structural analysis of the voice information and the generation of the response sentence described above take time (for example, about 3 seconds), and their processing cost is high. As a result, the user has to wait for a response, and the dialogue may feel unnatural.

In contrast, in the response generation device 1 according to Embodiment 1, the repeat generation unit 5 generates the user's voice recognized by the voice recognition unit 2 as a repeat response sentence. The response output unit 4 then outputs the repeat response sentence generated by the repeat generation unit 5 before outputting the free response sentence based on the structure of the voice.

Because the repeat response sentence simply repeats the recognized voice of the user as it is, it takes little time to generate (for example, about 1 second), and its processing cost is low. Therefore, the low-cost repeat response sentence can be output while the user would otherwise be waiting for the high-cost free response sentence generated from the structure of the voice. This alleviates the unnaturalness caused by a long pause in the dialogue while waiting for a response.

The repeat generation unit 5 generates the voice information recognized by the voice recognition unit 2 as a repeat response sentence for so-called parroting. The repeat generation unit 5 outputs the generated repeat response sentence to the response output unit 4. The response output unit 4 then outputs the repeat response sentence received from the repeat generation unit 5 through the speaker 7 before the free response sentence generated from the analysis result of the character string information output by the structural analysis unit 3. In this way, a plurality of response sentences with different processing costs are generated in parallel and are output in the order in which they are generated. This makes it possible to maintain the continuity of the dialogue without disturbing its tempo.

FIG. 2 is a flowchart showing the processing flow of the response generation method according to Embodiment 1.
The voice recognition unit 2 performs voice recognition on the user's voice information acquired by the microphone 6 (step S101) and outputs the recognized voice information to the structural analysis unit 3 and the repeat generation unit 5.

The repeat generation unit 5 generates the voice information recognized by the voice recognition unit 2 as a repeat response sentence (step S102) and outputs the generated repeat response sentence to the response output unit 4.
The response output unit 4 outputs the repeat response sentence received from the repeat generation unit 5 through the speaker 7 (step S103).

In parallel with steps S102 and S103, the structural analysis unit 3 analyzes the structure of the voice information recognized by the voice recognition unit 2 (step S104) and outputs the analysis result of the character string information to the response output unit 4.

The response output unit 4 generates a free response sentence based on the analysis result of the character string information output from the structural analysis unit 3 (step S105) and outputs the generated free response sentence through the speaker 7 (step S106).
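The parallel flow of steps S102 to S106 can be sketched with two worker threads, as below. The delays are shortened stand-ins for the generation times mentioned above, the fixed free response is a placeholder for real structural analysis, and every function name is an assumption made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def generate_repeat_response(text, delay=0.01):
    """Cheap path (S102): parrot the recognized text back."""
    time.sleep(delay)  # stands in for the short generation time
    return ("repeat", text)

def generate_free_response(text, delay=0.1):
    """Expensive path (S104, S105): placeholder for analysis and generation."""
    time.sleep(delay)  # stands in for the long generation time
    return ("free", "Who did you eat with?")

def respond(text, speak):
    """Start both generators together and speak each result as it completes."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(generate_free_response, text),
            pool.submit(generate_repeat_response, text),
        ]
        for future in as_completed(futures):  # completion order, not submit order
            speak(*future.result())
```

Calling `respond("I ate pork cutlet.", lambda kind, sentence: print(kind, sentence))` prints the repeat response first, because it finishes first, and the free response afterwards. Emitting results in completion order is one way to realize "output in the order of generation"; a real device would also have to queue speech so that outputs do not overlap.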

As described above, in Embodiment 1 the recognized voice of the user is generated as a repeat response sentence, and the repeat response sentence is output before the free response sentence based on the structure of the voice. The low-cost repeat response sentence can thus be output while the user would otherwise be waiting for the high-cost free response sentence generated from the structure of the voice. This alleviates the unnaturalness caused by a long pause in the dialogue while waiting for a response.

Embodiment 2
FIG. 3 is a block diagram showing the schematic system configuration of the response generation device according to Embodiment 2 of the present invention. The response generation device 20 according to Embodiment 2 is characterized in that, in addition to the configuration of the response generation device 1 according to Embodiment 1, it further includes a phonological analysis unit 21 that analyzes the phonology of the user's voice information and an aizuchi generation unit 22 that generates an aizuchi (backchannel) response to the user's voice information.

The phonological analysis unit 21 is a specific example of the phonological analysis means and analyzes the phonology of the user's voice information acquired by the microphone 6. For example, the phonological analysis unit 21 estimates a break in the user's speech by detecting changes in the volume level or in the frequency (such as the fundamental frequency) of the voice information. The phonological analysis unit 21 outputs the result of the phonological analysis to the aizuchi generation unit 22.

The aizuchi generation unit 22 is a specific example of the aizuchi generation means. Based on the result of the phonological analysis output from the phonological analysis unit 21, it generates an aizuchi response to the user's voice (hereinafter referred to as an aizuchi response). For example, when the volume level of the voice information falls to or below a threshold, the aizuchi generation unit 22 searches the fixed-form response database 23, in which aizuchi patterns are stored, and randomly selects an aizuchi response from it. The fixed-form response database 23 stores a plurality of patterns used for aizuchi, such as "Yeah. Yeah.", "I see.", and "Hmm." The fixed-form response database 23 is built in the above memory or the like. The aizuchi generation unit 22 outputs the generated aizuchi response to the response output unit 4.
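The threshold-based trigger and the random selection from stored patterns can be sketched as follows. The use of RMS as the volume level, the frame representation, the default threshold, and the requirement that speech must have been heard before a break counts are assumptions added for illustration; the list merely stands in for the fixed-form response database 23.

```python
import math
import random

# Stand-in for the fixed-form response database 23.
AIZUCHI_PATTERNS = ["Yeah. Yeah.", "I see.", "Hmm."]

def rms_level(frame):
    """Volume level of one frame of samples (assumed here to be the RMS)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))

def detect_break(frames, threshold):
    """Index of the first frame at or below the threshold after speech, else None."""
    speaking = False
    for index, frame in enumerate(frames):
        if rms_level(frame) > threshold:
            speaking = True
        elif speaking:
            return index
    return None

def generate_aizuchi(frames, threshold=0.1, rng=random):
    """Return a randomly chosen aizuchi once a break is detected, else None."""
    if detect_break(frames, threshold) is None:
        return None
    return rng.choice(AIZUCHI_PATTERNS)
```

Because only a running level per frame is needed, this check can be evaluated as audio arrives, without waiting for voice recognition to finish.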

The response output unit 4 outputs the aizuchi response generated by the aizuchi generation unit 22 through the speaker 7 before the repeat response sentence generated by the repeat generation unit 5.

The phonological analysis unit 21 can detect changes in volume level in real time. In addition, the amount of computation it needs to detect a frequency change is smaller than that of pattern matching, so the processing delay is small. The phonological analysis unit 21 thus performs its analysis using features with a low processing cost. As a result, the aizuchi response takes less time to generate than the repeat response sentence (for example, about 300 msec), and its processing cost is lower.

Therefore, the aizuchi response, whose processing cost is even lower, can be output before the repeat response sentence is output. This makes the turns of the dialogue connect more smoothly and further alleviates the unnaturalness of the dialogue. Moreover, a larger number of responses and response sentences with different processing costs are generated in parallel and output in the order in which they are generated. This maintains the continuity of the dialogue more smoothly and realizes a more natural dialogue that does not lose its sense of tempo.

Note that the aizuchi generation unit 22 generates aizuchi responses from fixed patterns, and the repeat generation unit 5 generates the repeat response sentence from only a surface-level interpretation of the voice recognition result. It is therefore to be expected that the response output unit 4 may generate free response candidates identical to the aizuchi response generated by the aizuchi generation unit 22 or to the repeat response sentence generated by the repeat generation unit 5.

To address this, the response output unit 4 excludes from the free response candidates any candidate that duplicates the aizuchi response generated by the aizuchi generation unit 22 or the repeat response sentence generated by the repeat generation unit 5. The response output unit 4 then selects the most suitable candidate from the candidates remaining after the exclusion and uses it as the free response sentence. This eliminates redundant, repeated wording and realizes a more natural dialogue.

For example, in response to the user's utterance "It's hot today", the aizuchi generation unit 22 generates the aizuchi response "Yeah". Next, the repeat generation unit 5 generates the repeat response sentence "It's hot". Meanwhile, the response output unit 4 generates free response candidates such as "I don't like it", "I wonder how long it will stay hot?", "It's hot", and "That's right". From the generated free response candidates, the response output unit 4 excludes "It's hot", which duplicates the repeat response sentence generated by the repeat generation unit 5. The response output unit 4 then selects, for example, "I wonder how long it will stay hot?" from the remaining free response candidates and uses it as the free response sentence.
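The exclusion step in this example can be sketched as a simple filter. How the embodiment decides that two responses "duplicate" each other is not specified; the comparison below, which ignores case, surrounding whitespace, and trailing punctuation, is an assumption, as are the function names.

```python
def normalize(text):
    """Fold case and drop trailing punctuation so near-identical strings match."""
    return text.strip().rstrip(".。!！?？").strip().lower()

def exclude_duplicates(candidates, earlier_responses):
    """Keep only the candidates that do not repeat an already generated response."""
    seen = {normalize(response) for response in earlier_responses}
    return [candidate for candidate in candidates
            if normalize(candidate) not in seen]

candidates = [
    "I don't like it.",
    "I wonder how long it will stay hot?",
    "It's hot.",
    "That's right.",
]
remaining = exclude_duplicates(candidates, ["Yeah.", "It's hot"])
```

Here `remaining` no longer contains "It's hot.", so whichever candidate is selected next cannot repeat the aizuchi response or the repeat response sentence.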

An example of a dialogue generated as described above is shown below. In this example, M denotes a response sentence or response of the response generation device 20, and U denotes an utterance of the user.
U: "It's hot today."
M (aizuchi response): "Yeah."
M (repeat response sentence): "It's hot."
M (free response sentence): "I wonder how long it will stay hot?"
In this way, redundant wording can be eliminated while the continuity of the dialogue is maintained more smoothly, and a more natural dialogue can be realized.

In the response generation device 20 according to Embodiment 2, parts identical to those of the response generation device 1 according to Embodiment 1 are given the same reference numerals, and their detailed description is omitted.

FIG. 4 is a flowchart showing the processing flow of the response generation method according to Embodiment 2.
The phonological analysis unit 21 analyzes the phonology of the user's voice information acquired by the microphone 6 (step S201) and outputs the result of the phonological analysis to the aizuchi generation unit 22.

The aizuchi generation unit 22 generates an aizuchi response to the user's voice based on the result of the phonological analysis output from the phonological analysis unit 21 (step S202) and outputs the generated aizuchi response to the response output unit 4.
The response output unit 4 outputs the aizuchi response received from the aizuchi generation unit 22 through the speaker 7 (step S203).

In parallel with the processing of steps S201 to S203, the voice recognition unit 2 performs voice recognition on the user's voice information acquired by the microphone 6 (step S204) and outputs the recognized voice information to the structural analysis unit 3 and the repeat generation unit 5.

The repeat generation unit 5 generates the voice information recognized by the voice recognition unit 2 as a repeat response sentence (step S205) and outputs the generated repeat response sentence to the response output unit 4.
The response output unit 4 outputs the repeat response sentence received from the repeat generation unit 5 through the speaker 7 (step S206).

In parallel with the processing of steps S205 and S206, the structural analysis unit 3 analyzes the structure of the voice information recognized by the voice recognition unit 2 (step S207) and outputs the analysis result of the character string information to the response output unit 4.

The response output unit 4 generates a plurality of free response candidates based on the analysis result of the character string information output from the structural analysis unit 3 (step S208). From these free response candidates, the response output unit 4 excludes any candidate that duplicates the aizuchi response generated by the aizuchi generation unit 22 or the repeat response sentence generated by the repeat generation unit 5. The response output unit 4 then selects the most suitable candidate from the candidates remaining after the exclusion and uses it as the free response sentence (step S209). The response output unit 4 outputs the generated free response sentence through the speaker 7 (step S210).
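The three-stage flow of steps S201 to S210 can be sketched with concurrent tasks, as below. The delays are shortened stand-ins for the differing processing costs, the stage outputs are fixed placeholders, and "most suitable" is reduced to taking the first remaining candidate; none of these choices, nor the function names, come from this description.

```python
import asyncio

async def aizuchi_stage(delay=0.01):
    """S201, S202: cheapest stage; returns a fixed-form aizuchi response."""
    await asyncio.sleep(delay)
    return "Yeah."

async def repeat_stage(text, delay=0.03):
    """S204, S205: parrot the recognized text back."""
    await asyncio.sleep(delay)
    return text

async def candidate_stage(text, delay=0.06):
    """S207, S208: placeholder for structural analysis and candidate generation."""
    await asyncio.sleep(delay)
    return ["It's hot.", "Yeah.", "I wonder how long it will stay hot?"]

async def respond(text, speak):
    """Run all stages concurrently, then speak aizuchi, repeat, and free response."""
    aizuchi_task = asyncio.create_task(aizuchi_stage())
    repeat_task = asyncio.create_task(repeat_stage(text))
    candidates_task = asyncio.create_task(candidate_stage(text))

    aizuchi = await aizuchi_task
    speak(aizuchi)                      # S203
    repeat = await repeat_task
    speak(repeat)                       # S206
    candidates = await candidates_task
    remaining = [c for c in candidates if c not in (aizuchi, repeat)]  # S209
    if remaining:
        speak(remaining[0])             # S210; "first remaining" is a placeholder
```

Running `asyncio.run(respond("It's hot.", print))` prints the aizuchi response, the repeat response sentence, and then a free response that duplicates neither. Awaiting the stages in a fixed order while all three run concurrently keeps the output sequence stable even if the timing of a stage varies.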

The following is an example of a dialogue between the response generation device 20 and the user.
M (topic provided): "What did you eat for lunch?"
U: "I ate pork cutlet."
M (aizuchi response): "Yeah. Yeah."
M (repeat response sentence): "You ate pork cutlet."
M (free response sentence): "Who did you eat with?"
U: "I ate with a friend."
M (aizuchi response): "Is that so."
M (repeat response sentence): "You ate with a friend."
M (free response sentence): "Where did you eat?"
U: "I ate at Yabaton."
M (aizuchi response): "I see."
M (repeat response sentence): "You ate at Yabaton."
M (free response sentence): "So you ate."
U: "It was delicious."
M (aizuchi response): "Hmm."
M (repeat response sentence): "It was delicious."
M (free response sentence): "That's nice, XX-san."

As the above example dialogue shows, when the user speaks, the aizuchi response, the repeat response sentence, and the voluntary response sentence of the response generator 20 follow the utterance in succession at a good tempo, and the exchanges connect more smoothly.

As described above, in the second embodiment, the phonology of the user's voice information is analyzed, an aizuchi response to the user's voice information is generated based on the analysis result, and the generated aizuchi response is output before the repeat response sentence is output. As a result, an aizuchi response, which has a lower processing cost, can be output during the interval before the repeat response sentence is output. This makes the exchanges connect more smoothly and further alleviates the discomfort of the dialogue.
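The ordering described for the second embodiment can be sketched with a thread pool: all three generators start at once, and the outputs are spoken cheapest first. This is an illustrative sketch only. The function names, the fixed return strings, and the `time.sleep` delays are placeholders standing in for phonological analysis, repeat generation, and structural analysis.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_aizuchi(utterance):
    time.sleep(0.01)  # stand-in for phonological analysis (cheapest)
    return "Yeah. Yeah."


def make_repeat(utterance):
    time.sleep(0.02)  # stand-in for repeat generation
    return utterance


def make_voluntary(utterance):
    time.sleep(0.05)  # stand-in for structural analysis (most expensive)
    return "Who did you eat with?"


def respond(utterance, speak):
    """Start all three generators at once, then speak in cheapest-first order."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(fn, utterance)
                   for fn in (make_aizuchi, make_repeat, make_voluntary)]
        for future in futures:
            speak(future.result())  # waits only until this response is ready


spoken = []
respond("I ate tonkatsu.", spoken.append)
print(spoken)
```

Because the expensive structural analysis is already running while the cheaper responses are spoken, the wait before the voluntary response is filled rather than silent.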

Embodiment 3.
The repeat generation unit 5 according to the third embodiment is characterized in that it extracts a keyword from the user's voice information recognized by the voice recognition unit 2 and generates a repeat response sentence by adding a specific additional ending to the extracted keyword.

The repeat generation unit 5 generates a repeat response sentence that parrots back the voice information recognized by the voice recognition unit 2. Here, parroting with a specific additional ending attached to the user's voice information makes the dialogue more natural than parroting the user's voice back completely unchanged. For example, in response to the user's utterance "I went to the sea" (海に行ったよ), the dialogue is more natural when the response generator 1 responds "The sea, huh" (海かぁ) than when it simply responds "I went to the sea" as is.

For example, additional information that associates a plurality of keywords, the part of speech of each keyword, and additional endings with one another is stored in the memory. The repeat generation unit 5 extracts a keyword and the part of speech of that keyword from the user's voice information recognized by the voice recognition unit 2. Based on the additional information stored in the memory, the repeat generation unit 5 selects the additional ending corresponding to the extracted keyword and part of speech. The repeat generation unit 5 then generates a repeat response sentence by adding the selected additional ending to the extracted keyword.

More specifically, from the character string information "I ate tonkatsu" (トンカツを食べたよ) of the voice recognized by the voice recognition unit 2, the repeat generation unit 5 extracts the keyword "tonkatsu" (トンカツ) and its part of speech "noun", the keyword "wo" (を) and its part of speech "particle", the keyword "tabeta" (食べた, "ate") and its part of speech "verb", and the keyword "yo" (よ) and its part of speech "particle". From among these extracted keywords and parts of speech, the repeat generation unit 5 then selects the keyword "tonkatsu" and the part of speech "noun" and, based on the additional information in the memory, selects the additional ending "kaa" (かぁ) corresponding to them. Here, as described above, the repeat generation unit 5 arbitrarily extracts a noun or adjective keyword from the character string information of the voice recognized by the voice recognition unit 2 and selects the corresponding additional ending.

When there are a plurality of additional endings corresponding to the extracted keyword and part of speech, the repeat generation unit 5 may select one according to a preset priority order. Similarly, when the repeat generation unit 5 selects, for example, the keyword "yatta" (やった) and the part of speech "interjection" from the character string information of the voice recognized by the voice recognition unit 2, it selects, based on the additional information, the additional ending "ne" (ね) corresponding to that keyword and part of speech. Extracting the part of speech together with the keyword makes it possible to distinguish the interjection "yatta" ("hooray") from the verb "yatta" ("did").
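A minimal sketch of priority-ordered selection keyed on both keyword and part of speech is shown below. The table name `ENDINGS_BY_PRIORITY`, the function `select_ending`, and the endings listed for the verb entry are hypothetical; only the interjection entry ("やった" with ending "ね") comes from the description.

```python
# (keyword, part of speech) -> additional endings, highest priority first.
ENDINGS_BY_PRIORITY = {
    ("やった", "interjection"): ["ね"],       # from the description
    ("やった", "verb"): ["んだ", "のか"],      # hypothetical entries for illustration
}


def select_ending(keyword, part_of_speech):
    """Return the highest-priority ending for the pair, or None if no entry exists."""
    endings = ENDINGS_BY_PRIORITY.get((keyword, part_of_speech))
    return endings[0] if endings else None


print(select_ending("やった", "interjection"))
print(select_ending("やった", "verb"))
```

Using the pair as the lookup key is what keeps the two readings of "やった" apart: the same surface string maps to different endings depending on its part of speech.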

For example, the voice recognition unit 2 may attach the uninflected base form "oishii" (美味しい, "delicious") to the character string information "oishikatta" (美味しかった, "was delicious") of the recognized user's voice and output it to the repeat generation unit 5. In this case, the repeat generation unit 5 extracts the keyword "oishii" and the part of speech "adjective". Based on the additional information, the repeat generation unit 5 selects the additional ending "noka" (のか) corresponding to the keyword "oishii" and the part of speech "adjective".

FIG. 5 is an example of the additional information stored in the memory. In the additional information shown in FIG. 5, when the keyword is the wildcard "*", all keywords are targeted. Therefore, when the repeat generation unit 5 extracts the keyword "tonkatsu" and the part of speech "noun", it refers to the additional information and randomly selects one of the additional endings "kaa" (かぁ) and "nanda" (なんだ).
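The wildcard lookup can be sketched as a small table scan followed by a random choice. The rows below are assembled from the endings mentioned in the description and are an assumption about what the figure contains, not a copy of it; the names `ADDITIONAL_INFO`, `matching_endings`, and `pick_ending` are likewise hypothetical.

```python
import random

# Rows of (keyword, part of speech, additional ending); "*" matches any keyword.
ADDITIONAL_INFO = [
    ("*", "noun", "かぁ"),
    ("*", "noun", "なんだ"),
    ("美味しい", "adjective", "のか"),
    ("やった", "interjection", "ね"),
]


def matching_endings(keyword, part_of_speech):
    """Collect every ending whose row matches the part of speech and the keyword."""
    return [ending for kw, pos, ending in ADDITIONAL_INFO
            if pos == part_of_speech and kw in ("*", keyword)]


def pick_ending(keyword, part_of_speech, rng=random):
    """Randomly choose among the matching endings, or return None if there are none."""
    endings = matching_endings(keyword, part_of_speech)
    return rng.choice(endings) if endings else None


print(matching_endings("トンカツ", "noun"))
print(pick_ending("トンカツ", "noun"))
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which is convenient for testing.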

As described above, the repeat generation unit 5 extracts the keyword "tonkatsu" and the part of speech "noun". The repeat generation unit 5 then refers to the additional information and randomly selects the additional ending "kaa" (かぁ) corresponding to the keyword "tonkatsu" and the part of speech "noun". Finally, the repeat generation unit 5 generates the repeat response sentence "tonkatsu kaa" (トンカツかぁ, roughly "Tonkatsu, huh") by adding the selected additional ending "kaa" to the extracted keyword "tonkatsu". Here, the repeat generation unit 5 may, for example, add the additional ending "kaa" to the extracted keyword repeated twice, "tonkatsu, tonkatsu", to generate the repeat response sentence "tonkatsu, tonkatsu kaa" (トンカツ、トンカツかぁ). This heightens the sense of tempo in the dialogue and further improves its naturalness.
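Putting the steps together, repeat response generation might look like the sketch below. It assumes the utterance has already been split into (surface, part of speech) pairs by some morphological analyzer, which is not shown. Taking the first noun or adjective is one possible reading of "arbitrarily extracts", and the names `build_repeat_response` and `ending_for` are hypothetical.

```python
def build_repeat_response(tokens, ending_for, repeat_keyword=False):
    """Build a repeat response from (surface, part_of_speech) pairs.

    ending_for(keyword, pos) returns an additional ending or None.
    """
    for surface, pos in tokens:
        if pos not in ("noun", "adjective"):
            continue  # only nouns and adjectives are used as keywords
        ending = ending_for(surface, pos)
        if ending is None:
            continue  # no entry in the additional information
        stem = f"{surface}、{surface}" if repeat_keyword else surface
        return stem + ending
    return None


# Token list for "トンカツを食べたよ" as given in the description.
tokens = [("トンカツ", "noun"), ("を", "particle"), ("食べた", "verb"), ("よ", "particle")]


def noun_ending(keyword, pos):
    # Fixed choice for the example; a real lookup would consult the additional information.
    return "かぁ" if pos == "noun" else None


print(build_repeat_response(tokens, noun_ending))
print(build_repeat_response(tokens, noun_ending, repeat_keyword=True))
```

The whole procedure is a table lookup plus string concatenation, which is consistent with the low processing cost the description attributes to repeat generation.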

According to the third embodiment, the repeat generation unit 5 generates a repeat response sentence simply by extracting a keyword and its part of speech from the user's voice information, selecting the additional ending corresponding to that keyword and part of speech based on the additional information, and adding the additional ending to the keyword. Because the repeat response sentence can be generated by simple processing, the processing cost can be kept low. Furthermore, rather than merely parroting the user's voice, the device can parrot in varied ways by adding an appropriate additional ending according to the user's voice information, which further improves the naturalness of the dialogue.

The present invention is not limited to the above embodiments and can be modified as appropriate without departing from its spirit.

In the above embodiments, the response output unit 4 outputs the aizuchi response generated by the aizuchi generation unit 22 from the speaker 7, but the invention is not limited to this. The response output unit 4 may perform any response with a low processing load based on the aizuchi response generated by the aizuchi generation unit 22. For example, the response output unit 4 may vibrate a vibration device, light or blink a light device, show a display on a display device, or move parts of a robot such as its limbs, head, and torso, and may perform any combination of these.

In the above embodiments, the response output unit 4 outputs the repeat response sentence generated by the repeat generation unit 5 from the speaker 7, but the invention is not limited to this. The response output unit 4 may output any repeat response sentence with a low processing load based on the repeat response sentence generated by the repeat generation unit 5. For example, the response output unit 4 may output the repeat response sentence using the display of a display device or the like, and may output it using any combination of means.

The present invention can also be realized, for example, by causing a CPU to execute a computer program that performs the processes shown in FIGS. 2 and 4.

The program can be stored on various types of non-transitory computer readable media and supplied to a computer. Non-transitory computer readable media include various types of tangible storage media. Examples of non-transitory computer readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memories (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (random access memory)).

The program may also be supplied to a computer by various types of transitory computer readable media. Examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer readable medium can supply the program to a computer via a wired communication path, such as an electric wire or optical fiber, or via a wireless communication path.

1 response generator, 2 voice recognition unit, 3 structural analysis unit, 4 response output unit, 5 repeat generation unit, 6 microphone, 7 speaker, 8 insufficient case dictionary database, 21 phonological analysis unit, 22 aizuchi generation unit, 23 fixed-form response database

Claims

A response generation method comprising:
a step of recognizing a user's voice;
a step of analyzing the structure of the recognized voice; and
a step of generating a voluntary response sentence to the user's voice based on the analyzed voice structure and outputting the generated voluntary response sentence,
the response generation method being characterized by further comprising:
a step of generating, in parallel, a plurality of repeat response sentences with different processing costs from the recognized user's voice; and
a step of outputting the generated repeat response sentences in the order in which they were generated, before outputting the voluntary response sentence based on the structure of the voice.

The response generation method according to claim 1, further comprising:
a step of analyzing the phonology of the user's voice; and
a step of generating an aizuchi response to the user's voice based on the phonological analysis result,
characterized in that the generated aizuchi response is output before the generated repeat response sentences are output.

The response generation method according to claim 2,
characterized in that a plurality of response candidates for the user's voice are generated based on the analyzed voice structure, response candidates that overlap with the generated repeat response sentences or the generated aizuchi response are excluded from the generated plurality of response candidates, and a response candidate selected from the remaining response candidates is used as the voluntary response sentence.

The response generation method according to claim 3,
characterized in that a keyword and the part of speech of the keyword are extracted from the recognized user's voice, the additional ending corresponding to the extracted keyword and part of speech is selected based on additional information in which a plurality of keywords, the part of speech of each keyword, and additional endings are associated with one another, and the repeat response sentence is generated by adding the selected additional ending to the extracted keyword.

A response generator comprising:
voice recognition means for recognizing a user's voice;
structural analysis means for analyzing the structure of the voice recognized by the voice recognition means; and
response output means for generating a voluntary response sentence to the user's voice based on the structure of the voice analyzed by the structural analysis means and outputting the generated voluntary response sentence,
the response generator being characterized in that:
it further comprises repeat generation means for generating, in parallel, a plurality of repeat response sentences having different processing costs from the user's voice recognized by the voice recognition means; and
the response output means outputs the plurality of repeat response sentences generated by the repeat generation means in the order in which they were generated, before outputting the voluntary response sentence based on the structure of the voice.

A response generation program characterized by causing a computer to execute:
a process of recognizing a user's voice;
a process of analyzing the structure of the recognized voice;
a process of generating a voluntary response sentence to the user's voice based on the analyzed voice structure and outputting the generated voluntary response sentence;
a process of generating, in parallel, a plurality of repeat response sentences having different processing costs from the recognized user's voice; and
a process of outputting the generated repeat response sentences in the order in which they were generated, before outputting the voluntary response sentence based on the structure of the voice.