JP2008293098A - Answer score information generation device and interactive processor

Info

Publication number: JP2008293098A
Application number: JP2007135469A
Authority: JP
Inventors: Ryuta Terajima (寺嶌 立太)
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2007-05-22
Filing date: 2007-05-22
Publication date: 2008-12-04

Abstract

PROBLEM TO BE SOLVED: To provide an interactive processor capable of making interaction long-lasting by selecting an appropriate answer, and an answer score information generation device provided to the interactive processor.

SOLUTION: The answer score information generation device comprises an answer knowledge information storage means which stores answer knowledge information including a plurality of pieces of speech information, each showing a speech, and answer information showing an answer responding to each speech; a conversion means which converts the answer knowledge information stored by the storage means to a speech transition matrix showing a correspondence between speech information and answer information; a calculation means which calculates a maximum eigenvector of the speech transition matrix converted by the conversion means; an answer score information generation means which generates answer score information by associating each element of the maximum eigenvector calculated by the calculation means with each answer shown in the answer information; and an answer score information storage means which stores the answer score information generated by the answer score information generation means.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a response score information generation device and a dialogue processing device, and more particularly to a response score information generation device and a dialogue processing device that make it possible to sustain a dialogue with a user.

With recent improvements in speech recognition technology, various dialogue processing devices have been proposed. For example, Patent Document 1 proposes a dialogue processing device that generates utterances using response knowledge that can be reduced to a form consisting of event-event, event-evaluation, and evaluation-evaluation pairs.

This dialogue processing device matches an utterance input by the user against the utterances in the response knowledge, based on the similarity of the word sets contained in the utterances, and selects candidate utterances with which the device should respond.

When more than one candidate is selected, the device settles on a single utterance either by choosing one of the candidates at random or by processing such as masking utterances that have already appeared in the dialogue history.

Patent Document 1: JP 2006-201870 A

However, because the technique disclosed in Patent Document 1 selects one utterance at random, an appropriate response may not be selected, and the dialogue does not last long.

In view of the above problem, an object of the present invention is to provide a dialogue processing device that enables a dialogue to last long by selecting an appropriate response, and a response score information generation device for use in the dialogue processing device.

To achieve the above object, the invention of claim 1 comprises: response knowledge information storage means storing response knowledge information that includes a plurality of pieces of utterance information, each indicating an utterance, and response information indicating a response corresponding to each utterance; conversion means for converting the response knowledge information stored in the response knowledge information storage means into an utterance transition matrix indicating the correspondence between the utterance information and the response information; calculation means for calculating the maximum eigenvector of the utterance transition matrix produced by the conversion means; response score information generation means for generating response score information in which each element of the maximum eigenvector calculated by the calculation means is associated with a response indicated by the response information; and response score information storage means for storing the response score information generated by the response score information generation means.

According to the invention of claim 1, the response knowledge information storage means stores response knowledge information that includes a plurality of pieces of utterance information and response information indicating a response corresponding to each utterance; the conversion means converts the stored response knowledge information into an utterance transition matrix indicating the correspondence between the utterance information and the response information; the calculation means calculates the maximum eigenvector of that utterance transition matrix; the response score information generation means generates response score information in which each element of the maximum eigenvector is associated with a response indicated by the response information; and the response score information storage means stores the generated response score information. Because each element of the maximum eigenvector indicates the degree to which the corresponding response is an appropriate response, a response score information generation device for sustaining a dialogue can be provided.

To solve the above problem, the invention of claim 2 has the response score information generation device of claim 1, and operates as follows. Input means performs speech recognition on an utterance by the user and inputs the user's utterance as utterance information. Response information extraction means extracts, from the response knowledge information stored in the response knowledge information storage means, response information indicating the responses that correspond to the utterance indicated by the input utterance information. Element acquisition means acquires, from the response score information stored in the response score information storage means, the elements of the maximum eigenvector that correspond to the responses indicated by the extracted response information. Response information selection means selects the response information indicating the response that corresponds to the largest of the acquired elements. Voice output means converts the selected response information into speech and outputs it. Because each element of the maximum eigenvector indicates the degree to which the corresponding response is an appropriate response, selecting an appropriate response to the user's utterance and outputting it makes it possible to provide a dialogue processing device that can sustain a dialogue.

As in the invention of claim 3, the invention of claim 2 may further comprise utterance promotion response information storage means storing utterance promotion response information that indicates a predetermined response prompting the user to make the next utterance, and the response information selection means selects the utterance promotion response information stored in the utterance promotion response information storage means when no response information is extracted by the response information extraction means.

According to the invention of claim 3, when no response information is extracted, utterance promotion response information indicating a predetermined response that prompts the user to make the next utterance is selected, so the dialogue can be sustained.

According to the present invention, it is possible to provide a dialogue processing device that enables a dialogue to last long by selecting an appropriate response, and a response score information generation device for use in the dialogue processing device.

Embodiments of the present invention are described in detail below with reference to the drawings. This embodiment describes an example in which the response score information generation device and the dialogue processing device are applied to a personal computer.

First, the configuration of the personal computer 12 is described with reference to FIG. 1. The personal computer 12 includes a CPU (Central Processing Unit) 60, a ROM (Read Only Memory) 61, a RAM (Random Access Memory) 62, an HDD (Hard Disk Drive) 63, a display unit 64, an operation input unit 65, a microphone 66, and a speaker 67, each connected via a bus B.

The CPU 60 controls the overall operation of the personal computer 12 and executes the programs described later. The ROM 61 is a non-volatile storage device that stores, among other things, the boot program that runs when the personal computer 12 starts up. The RAM 62 is a volatile storage device into which the OS (Operating System), programs, and data are loaded. The HDD 63 is a non-volatile storage device that stores the response knowledge database (hereinafter, response knowledge DB), the response score database (hereinafter, response score DB), and the utterance promotion response database (hereinafter, utterance promotion response DB), all described later, as well as the OS and programs. The HDD 63 corresponds to the response knowledge information storage means, the response score information storage means, and the utterance promotion response information storage means.

The display unit 64 displays various kinds of information. The operation input unit 65 is used when the user operates the personal computer 12 or inputs information into it. The microphone 66 includes an analog/digital converter, converts the user's utterance into a digital signal, and outputs it to the bus B. The speaker 67 includes a digital/analog converter, converts a digital signal input from the bus B into an analog signal, and outputs sound.

Next, the response knowledge DB is described with reference to FIG. 2. The response knowledge DB is a DB containing a plurality of pieces of utterance information, each indicating an utterance, and response information indicating a response corresponding to each utterance.

Specifically, as shown in the figure, the utterance information comprises multiple text entries indicating utterances such as “make a doghouse” and “the dog has grown”. The response information likewise comprises multiple text entries indicating responses, such as “the dog is pleased”, a response to “make a doghouse”, and “make a doghouse”, a response to “the dog has grown”.

Although the figure shows only seven pieces of utterance information and seven corresponding pieces of response information in the response knowledge DB, the number is not limited to seven.

Next, the response score DB is described. Before that, the utterance network and the utterance transition matrix are explained.

FIG. 3 shows an utterance network representing the correspondence between the utterance information and the response information in the response knowledge DB. The only response corresponding to “the dog has grown”, one of the utterances in the response knowledge DB, is “make a doghouse”. If “the dog has grown” is taken as n1 and “make a doghouse” as n2, the pair can be represented as a network linking node n1 to node n2, as shown in FIG. 3.

Furthermore, there are two responses corresponding to “make a doghouse”: “the dog is pleased” and “a hammer is needed”. As before, if “the dog is pleased” is taken as n3 and “a hammer is needed” as n4, these can be represented as a network linking node n2 to node n3 and node n2 to node n4, as shown in FIG. 3.

In this way the utterance network shown in FIG. 3 is obtained from the response knowledge DB. FIG. 4(A) shows this utterance network expressed as a matrix whose rows are start nodes and whose columns are end nodes.

In the matrix shown in FIG. 4(A), the element in row 1, column 2 (row n1, column n2), for example, is “1”. This indicates that there is an edge whose start node is n1 and whose end node is n2, in this case the edge linking “the dog has grown” (n1) to “make a doghouse” (n2) described above.

Conversely, a “0”, as in row 2, column 5 (row n2, column n5), indicates that there is no edge whose start node is n2 and whose end node is n5.
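The construction of the 0/1 matrix can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_adjacency` and the list `PAIRS` are hypothetical, the node order simply follows first appearance, and only the three utterance pairs that the description spells out are used (with English renderings of the example utterances), since the remaining entries of FIG. 2 are not given in the text.

```python
def build_adjacency(pairs):
    """Return (nodes, matrix) with matrix[i][j] == 1 when nodes[i] -> nodes[j] is a stored pair."""
    nodes = []
    for utterance, response in pairs:
        for text in (utterance, response):
            if text not in nodes:
                nodes.append(text)  # node numbering follows order of first appearance
    index = {text: i for i, text in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for utterance, response in pairs:
        # Row = start node (utterance), column = end node (response).
        matrix[index[utterance]][index[response]] = 1
    return nodes, matrix


# Only the pairs named in the description; the rest of FIG. 2 is not reproduced here.
PAIRS = [
    ("the dog has grown", "make a doghouse"),
    ("make a doghouse", "the dog is pleased"),
    ("make a doghouse", "a hammer is needed"),
]

nodes, adjacency = build_adjacency(PAIRS)
for name, row in zip(nodes, adjacency):
    print(row, name)
```

With these three pairs the first row has a single 1 in the second column, matching the n1 to n2 edge discussed above, and the second row has 1s in the third and fourth columns.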

The utterance transition matrix shown in FIG. 4(B) is obtained from this matrix (FIG. 4(A)) by dividing each element by the sum of the elements in its own row. For example, in FIG. 4(A) the second row contains two “1”s and five “0”s, so the sum of the elements in that row is 2; dividing each element of the second row by 2 gives the second row shown in FIG. 4(B).
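The row normalization can be sketched in a few lines. The function name `row_normalize` is hypothetical, and the handling of a row that contains no 1 at all is an assumption (the description does not say what happens to such rows before smoothing); the sample row is the second row described above, with 1s in the n3 and n4 columns.

```python
def row_normalize(matrix):
    """Divide every element by the sum of its row, giving transition probabilities."""
    result = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            # Assumption: a row with no outgoing edge is left as all zeros.
            result.append([0.0] * len(row))
        else:
            result.append([value / total for value in row])
    return result


# Second row of FIG. 4(A) as described: two 1s and five 0s.
second_row = [0, 0, 1, 1, 0, 0, 0]
print(row_normalize([second_row])[0])
# [0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0]
```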

In other words, the matrix shown in FIG. 4(B) represents a probabilistic network giving the probability of moving from a given start node to each end node. The limit of the probability of staying at each node during a random walk on this probabilistic network can be obtained by normalizing the maximum eigenvector (the eigenvector belonging to the largest eigenvalue) calculated by eigenvalue decomposition of the utterance transition matrix.

Therefore, when a node is connected to several nodes (that is, when an utterance has several candidate responses), the node with the highest stay probability among them (the one with the largest maximum eigenvector element) is considered to be the node representing the appropriate response. Each element of the maximum eigenvector thus indicates the degree to which the corresponding response is an appropriate response.

Normalizing the maximum eigenvector means dividing each of its elements by the sum of all its elements.
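One way to approximate these limiting stay probabilities without a full eigenvalue decomposition is power iteration, sketched below. This swaps in a different numerical technique from the eigenvalue decomposition named in the description, and it rests on two assumptions: every row of the transition matrix sums to 1, and the stay probabilities are taken as the vector p satisfying p = pP (the eigenvector of the transposed matrix for eigenvalue 1). The function name and the 2-by-2 matrix are hypothetical and are not the values of FIG. 4(B).

```python
def stationary_distribution(transition, iterations=1000, tolerance=1e-12):
    """Approximate the limiting stay probabilities of a random walk by power iteration."""
    n = len(transition)
    probs = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        # One step of the walk: new_j = sum_i probs_i * P[i][j].
        nxt = [sum(probs[i] * transition[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [value / total for value in nxt]  # normalize so the elements sum to 1
        if max(abs(a - b) for a, b in zip(nxt, probs)) < tolerance:
            return nxt
        probs = nxt
    return probs


# Hypothetical two-node transition matrix, used only to show the calculation.
HYPOTHETICAL_TRANSITION = [
    [0.9, 0.1],
    [0.5, 0.5],
]
scores = stationary_distribution(HYPOTHETICAL_TRANSITION)
print([round(p, 4) for p in scores])  # [0.8333, 0.1667]
```

For this small matrix the result can be checked by hand: p1 = 0.9 p1 + 0.5 p2 gives p1 = 5 p2, so the normalized vector is (5/6, 1/6).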

To guarantee the strong connectivity of the utterance transition matrix shown in FIG. 4(B), smoothing may be applied so that no element of the matrix is 0. Let $n$ be the number of elements in one row of the utterance transition matrix, $\lambda$ a positive real number smaller than 1, and $a_{ij}$ the element in row $i$, column $j$ of the utterance transition matrix. Smoothing replaces each $a_{ij}$ with

$$\frac{a_{ij} + \lambda}{1 + n\lambda}.$$

Even after this replacement, each $(a_{ij} + \lambda)/(1 + n\lambda)$ is smaller than 1. Moreover, taking the sum over $j$ from 1 to $n$, we have $\sum_{j=1}^{n} a_{ij} = 1$, and therefore

$$\sum_{j=1}^{n} \frac{a_{ij} + \lambda}{1 + n\lambda} = \frac{1 + n\lambda}{1 + n\lambda} = 1.$$
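The smoothing step translates directly into code. In this sketch the function name `smooth` and the value chosen for λ are hypothetical; the description only requires λ to be a positive real number smaller than 1. The rows are assumed to sum to 1 already, which is what makes the smoothed rows sum to 1 as well.

```python
def smooth(transition, lam=0.1):
    """Replace each a_ij with (a_ij + lam) / (1 + n * lam), n being the row length."""
    n = len(transition[0])
    return [[(a + lam) / (1 + n * lam) for a in row] for row in transition]


# A row that already sums to 1 (assumed); after smoothing no element is 0.
row = [0.0, 0.5, 0.5, 0.0]
smoothed_row = smooth([row], lam=0.1)[0]
print(smoothed_row)
print(sum(smoothed_row))
```

Because every element becomes strictly positive, every node can be reached from every other node in one step, which is the strong connectivity the text asks for.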

FIG. 5 shows the maximum eigenvector calculated by eigenvalue decomposition of the utterance transition matrix described above. The labels n1 to n7 written beside the maximum eigenvector in the figure correspond to the node texts shown in FIG. 3, such as “the dog has grown”. The maximum eigenvector shown in the figure is not normalized.

The response score DB shown in the same figure is information in which each element of the maximum eigenvector of the utterance transition matrix is associated with a response indicated by the response information. As the figure shows, the response score DB refers to each element of the maximum eigenvector as a score.

In the matrix shown in FIG. 4(A), an element is set to “1” when the corresponding edge exists in the network, but values other than “1” can also be stored. This method is described below. First, the contents of the response knowledge DB can be acquired automatically from documents published on a network (see, for example, Inui et al., “Automatic acquisition of causal knowledge from a document collection based on the connective marker tame”, Transactions of the Information Processing Society of Japan, vol. 45, no. 3, pp. 919-933, 2004).

Because the source document collection may contain the same piece of response knowledge more than once, the number of occurrences is stored in the response knowledge DB, and that occurrence count is stored in the matrix element.

The occurrence count can be regarded as a value representing how strongly the utterances are linked, so using it makes it possible to select a more natural and appropriate response.
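A count-weighted matrix can be built the same way as the 0/1 matrix, storing the number of occurrences instead of 1. The sketch below is illustrative only: `build_count_matrix` is a hypothetical name and the occurrence counts in `OBSERVED` are invented for the example, not taken from any real document collection.

```python
from collections import Counter


def build_count_matrix(observed_pairs):
    """Return (nodes, matrix) where matrix[i][j] is how often nodes[i] -> nodes[j] was observed."""
    counts = Counter(observed_pairs)  # duplicates in the source collection add up
    nodes = []
    for utterance, response in counts:
        for text in (utterance, response):
            if text not in nodes:
                nodes.append(text)
    index = {text: i for i, text in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for (utterance, response), count in counts.items():
        matrix[index[utterance]][index[response]] = count
    return nodes, matrix


# Invented occurrences: the first pair is seen three times, the second once.
OBSERVED = [
    ("make a doghouse", "the dog is pleased"),
    ("make a doghouse", "the dog is pleased"),
    ("make a doghouse", "the dog is pleased"),
    ("make a doghouse", "a hammer is needed"),
]
count_nodes, count_matrix = build_count_matrix(OBSERVED)
print(count_matrix[0])  # [0, 3, 1]
```

Dividing that row by its sum then gives transition probabilities of 0.75 and 0.25 rather than 0.5 each, so the more frequently observed pair carries more weight.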

Next, the utterance promotion response DB is described with reference to FIG. 6. The utterance promotion response DB holds information indicating predetermined responses that prompt the user to make the next utterance when no response information has been extracted. Here, “no response information has been extracted” means that the response knowledge DB contains no response corresponding to the user's utterance. In that case the dialogue would otherwise end, so the utterance promotion response DB is used to prompt the user for the next utterance.

As shown in the figure, the utterance promotion response DB stores responses that prompt the user to make the next utterance, such as “So?” and “And then?”.

Next, the functional blocks of the response score information generation device and the dialogue processing device are described with reference to FIG. 7. The figure shows a transition matrix conversion unit 70, a maximum eigenvector calculation unit 71, an utterance input unit 72, a response selection unit 73, a response output unit 74, a response knowledge DB 80, an utterance transition matrix 81, a response score DB 82, and an utterance promotion response DB 83.

The transition matrix conversion unit 70 converts the response knowledge DB 80 into an utterance transition matrix indicating the correspondence between the utterance information and the response information. The maximum eigenvector calculation unit 71 calculates the maximum eigenvector of the utterance transition matrix produced by the transition matrix conversion unit 70.

The utterance input unit 72 performs speech recognition on the user's utterance and inputs it as utterance information. Even when the user says, for example, “I made a doghouse”, the utterance input unit 72 inputs the utterance information as “make a doghouse”. That is, the utterance input unit 72 matches on the noun and the verb.

The response selection unit 73 extracts, from the response knowledge DB 80, response information indicating the responses that correspond to the utterance indicated by the utterance information input by the utterance input unit 72; acquires, from the response score DB 82, the maximum eigenvector elements corresponding to the responses indicated by the extracted response information; and selects the response information indicating the response that corresponds to the largest of the acquired elements.

When no response information is extracted, the response selection unit 73 instead selects utterance promotion response information stored in the utterance promotion response DB 83.

The response output unit 74 converts the response information selected by the response selection unit 73 into speech and outputs it.

The response knowledge DB 80, the utterance transition matrix 81, the response score DB 82, and the utterance promotion response DB 83 are as described above.

Of the functional blocks described above, those constituting the response score information generation device are the transition matrix conversion unit 70, the maximum eigenvector calculation unit 71, the response knowledge DB 80, the utterance transition matrix 81, and the response score DB 82.

The functional blocks constituting the dialogue processing device comprise those of the response score information generation device plus the utterance input unit 72, the response selection unit 73, the response output unit 74, and the utterance promotion response DB 83.

The processing executed by the personal computer 12 is described below with reference to flowcharts. This processing is executed by the CPU 60.

First, the response score information generation processing that the personal computer 12 executes as the response score information generation device is described with reference to the flowchart of FIG. 8. In this flowchart, an utterance in the response knowledge DB 80 shown in FIG. 2 together with the response corresponding to it is referred to as an utterance pair.

In step 101, the matrix element corresponding to an utterance pair stored in the response knowledge DB 80 is set to 1 (see FIG. 4(A)). In step 102, it is determined whether step 101 has been completed for all utterance pairs in the response knowledge DB 80. If not, step 101 is performed for the next pair; if so, processing proceeds to step 103.

In step 103, each element in one row is divided by the sum of the elements in that row (see FIG. 4(B)). In step 104, the smoothing described above is applied to the elements of that row. In step 105, it is determined whether steps 103 and 104 have been completed for all rows. If not, step 103 is performed for the next row; if so, processing proceeds to step 106. Through this processing, the response knowledge DB 80 is converted into an utterance transition matrix indicating the correspondence between the utterance information and the response information.

In step 106, the maximum eigenvector is calculated by eigenvalue decomposition of the utterance transition matrix. In step 107, the response score DB is generated from the maximum eigenvector (see FIG. 5), and the processing ends.

Because each element of the maximum eigenvector obtained in this response score information generation processing indicates the degree to which the corresponding response is an appropriate response, a response score information generation device for sustaining a dialogue can be provided.

次に、図９のフローチャートを用いてパソコン１２が対話処理装置として実行する対話処理について説明する。まず、ステップ２０１で、ユーザの発話を音声認識し、発話情報を入力する。次のステップ２０２で、応答知識ＤＢ８０に記憶されている発話情報のうち、入力された発話情報と比較して、一致するものがあるか否か判断する。ステップ２０２で、否定判断した場合には、ステップ２０４に処理が移行する。 Next, dialogue processing executed by the personal computer 12 as the dialogue processing device will be described using the flowchart of FIG. First, in step 201, the user's speech is recognized and speech information is input. In the next step 202, the utterance information stored in the response knowledge DB 80 is compared with the inputted utterance information to determine whether there is a match. If a negative determination is made in step 202, the process proceeds to step 204.

一方、ステップ２０２で肯定判断した場合には、ステップ２０３で、ユーザによる発話情報と一致した応答知識ＤＢ８０に記憶されている発話に対応する応答を応答候補とし、ステップ２０４に処理が移行する。 On the other hand, if an affirmative determination is made in step 202, a response corresponding to the utterance stored in the response knowledge DB 80 that matches the utterance information by the user is set as a response candidate in step 203, and the process proceeds to step 204.

次のステップ２０４で、応答知識ＤＢ８０に記憶されている全ての発話情報と、ユーザによる発話情報とを比較したか否か判断し、否定判断した場合には、再びステップ２０２の処理を行い、肯定判断した場合には、ステップ２０５に処理が移行する。 In the next step 204, it is determined whether or not all the utterance information stored in the response knowledge DB 80 and the utterance information by the user have been compared. If so, the process proceeds to step 205.

上記ステップ２０２からステップ２０４の処理により、入力された発話情報が示す発話に対応する応答を示す応答情報（応答候補）が抽出される。 Through the processing from step 202 to step 204, response information (response candidate) indicating the response corresponding to the utterance indicated by the input utterance information is extracted.

次のステップ２０５で、応答候補が０か否か判断する。応答候補が０、すなわち応答情報が抽出されなかった場合には、ステップ２０６で、発話促進応答ＤＢ８３から発話促進応答を応答情報として選択し、ステップ２０９に処理が移行する。 In the next step 205, it is determined whether or not the response candidate is 0. If the response candidate is 0, that is, if response information is not extracted, in step 206, the speech promotion response is selected from the speech promotion response DB 83 as response information, and the process proceeds to step 209.

このように、応答候補が抽出されなかった場合には、ユーザに次の発話を促す予め定められた応答を示す発話促進応答情報を選択するので、対話を長続きさせることができる。 As described above, when no response candidate is extracted, utterance promotion response information indicating a predetermined response that prompts the user to utter the next utterance is selected, so that the conversation can be continued for a long time.

On the other hand, if the determination in step 205 is negative, then in step 207 the elements (scores) corresponding to the response candidates are acquired from the response score DB 82, and in step 208 the response candidate corresponding to the largest element is selected as the response information.

In the next step 209, the response information is converted into speech and output, and the process of step 201 is performed again.

According to this dialogue processing, when the user says "make a doghouse", for example, "the dog is pleased" and "a hammer is needed" become the response candidates. The element corresponding to "the dog is pleased" is 0.569, and the element corresponding to "a hammer is needed" is 0.012. Therefore, "the dog is pleased" is selected as the response. For a declarative sentence such as this, the sentence-final particle "ne" (ね) may be appended before the speech is output.
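The selection in this example (steps 207 and 208) reduces to taking the candidate with the largest score. The sketch below uses the two scores quoted above; the dictionary `RESPONSE_SCORES` is a hypothetical stand-in for the response score DB 82, the function name is an assumption, and the strings are English renderings of the example responses.

```python
# Hypothetical stand-in for the response score DB 82,
# holding the two element values quoted in the example.
RESPONSE_SCORES = {
    "the dog is pleased": 0.569,
    "a hammer is needed": 0.012,
}


def select_response(candidates, scores):
    """Pick the candidate whose score (eigenvector element) is largest."""
    # Step 207: look up each candidate's score; step 208: take the maximum.
    return max(candidates, key=lambda response: scores[response])


chosen = select_response(["the dog is pleased", "a hammer is needed"], RESPONSE_SCORES)
```

With these values, `chosen` is "the dog is pleased", since 0.569 is greater than 0.012.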

Because each element of the maximum eigenvector used in the dialogue processing described above indicates the degree to which the corresponding response is appropriate, it is possible to provide a dialogue processing device that selects an appropriate response to the user's utterance and outputs it, thereby keeping the dialogue going.
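For readers unfamiliar with the term, the maximum eigenvector is the eigenvector belonging to the largest eigenvalue of the utterance transition matrix. One common way to approximate it is power iteration, sketched below. This section does not say which numerical method is used, so power iteration is an assumption, as are the function name and the small matrix `HYPOTHETICAL_MATRIX`, which is not taken from the patent. The method also assumes that a single eigenvalue dominates in magnitude.

```python
# Hypothetical 3x3 non-negative matrix, used only to demonstrate the method.
HYPOTHETICAL_MATRIX = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
]


def principal_eigenvector(matrix, iterations=200, tol=1e-12):
    """Approximate the eigenvector of the largest eigenvalue by power iteration.

    The result is scaled so that the absolute values of its elements sum to 1.
    """
    n = len(matrix)
    v = [1.0 / n] * n  # uniform starting vector
    for _ in range(iterations):
        # Multiply the matrix by the current vector.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in w)
        if norm == 0:
            raise ValueError("matrix maps the vector to zero")
        w = [x / norm for x in w]
        # Stop once successive vectors barely change.
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v


scores = principal_eigenvector(HYPOTHETICAL_MATRIX)
```

For this matrix the middle element comes out largest, so under the scoring scheme described above the middle item would be ranked as the most appropriate response.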

FIG. 1 is a diagram showing the configuration of the personal computer.
FIG. 2 is a diagram showing the response knowledge database.
FIG. 3 is a diagram showing the utterance network.
FIG. 4 is a diagram showing the utterance transition matrix.
FIG. 5 is a diagram showing the maximum eigenvector and the response score database.
FIG. 6 is a diagram showing the utterance promotion response database.
FIG. 7 is a functional block diagram of the response score information generation device and the dialogue processing device.
FIG. 8 is a flowchart showing the response score information generation processing.
FIG. 9 is a flowchart showing the dialogue processing.

Explanation of symbols

12 Personal computer
60 CPU
63 HDD
66 Microphone
67 Speaker
70 Transition matrix conversion unit
71 Maximum eigenvector calculation unit
72 Utterance input unit
73 Response selection unit
74 Response output unit
80 Response knowledge DB
81 Utterance transition matrix
82 Response score DB
83 Utterance promotion response DB

Claims

1. A response score information generation device comprising:
response knowledge information storage means for storing response knowledge information including a plurality of pieces of utterance information indicating utterances and response information indicating a response corresponding to each of the utterances;
conversion means for converting the response knowledge information stored by the response knowledge information storage means into an utterance transition matrix indicating the correspondence between the utterance information and the response information;
calculation means for calculating a maximum eigenvector of the utterance transition matrix obtained by the conversion means;
response score information generation means for generating response score information in which each element of the maximum eigenvector calculated by the calculation means is associated with a respective response indicated by the response information; and
response score information storage means for storing the response score information generated by the response score information generation means.

2. A dialogue processing device comprising:
the response score information generation device according to claim 1;
input means for recognizing speech uttered by a user and inputting the user's utterance as the utterance information;
response information extraction means for extracting, from the response knowledge information stored by the response knowledge information storage means, response information indicating a response corresponding to the utterance indicated by the utterance information input by the input means;
element acquisition means for acquiring, from the response score information stored by the response score information storage means, the element of the maximum eigenvector corresponding to the response indicated by the response information extracted by the response information extraction means;
response information selection means for selecting the response information indicating the response corresponding to the largest element among the elements acquired by the element acquisition means; and
voice output means for converting the response information selected by the response information selection means into voice and outputting the voice.

3. The dialogue processing device according to claim 2, further comprising utterance promotion response information storage means for storing utterance promotion response information indicating a predetermined response that prompts the user for the next utterance,
wherein the response information selection means selects the utterance promotion response information stored by the utterance promotion response information storage means when no response information is extracted by the response information extraction means.