JP2013016151A

JP2013016151A - Language processing apparatus

Info

Publication number: JP2013016151A
Application number: JP2012110061A
Authority: JP
Inventors: Kotaro Funakoshi; 孝太郎船越; Mikio Nakano; 幹生中野; Takenobu Tokunaga; 健伸徳永; Ryu Iida; 龍飯田
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2011-07-06
Filing date: 2012-05-11
Publication date: 2013-01-24

Abstract

PROBLEM TO BE SOLVED: To provide a reference expression processing apparatus using a probabilistic model that interprets and generates reference expressions, including descriptive, anaphoric and deictic expressions, according to the progress of a dialogue. SOLUTION: The language processing apparatus according to the present invention includes: a reference expression processing unit that performs at least one of the interpretation and the generation of reference expressions by means of a probabilistic model consisting of a reference expression Bayesian network, which represents the relationships among a reference domain (D), i.e., a set of conceivable referents, a referent (X) in the reference domain, a concept (C) relating to the referent, and words (W) expressing the concept; and a storage unit that stores the data needed to form the reference expression Bayesian network.

Description

The present invention relates to a reference expression processing apparatus that processes reference expressions, a language processing apparatus including the reference expression processing apparatus, and a reference expression processing method.

For example, consider a case where a human and a robot communicate using a spoken dialogue system. Suppose that there are several desks and chairs in a room and that a person designates one desk with the reference expression "the white desk with red legs". Here, a reference expression is a linguistic expression with which a speaker points out to a listener a specific thing the speaker is interested in. Understanding a reference expression is the task in which the robot's language processing apparatus identifies, from that expression, the desk the person designated. Generating a reference expression is the task in which the robot's language processing apparatus produces an expression that describes a desk the robot has in mind and that is easy for a person to understand, so that the person can identify that desk. Because human reference expressions naturally depend on human knowledge, the robot's language processing apparatus needs to use information about human knowledge when understanding and generating reference expressions.

One conceivable approach is for the language processing apparatus to use a probabilistic model so that it can exploit information about human knowledge when understanding and generating reference expressions.

Patent Document 1, filed by the same applicant as the present application, discloses a language processing apparatus that handles references to parts of objects in reference expressions, and a probability calculation method used in language processing by that apparatus.

However, Patent Document 1 does not disclose a mechanism for understanding and generating reference expressions, including descriptive, anaphoric and deictic expressions, according to the progress of a dialogue.

Patent Document 1: JP 2010-224536 A

There is therefore a need for a reference expression processing apparatus, a language processing apparatus and a reference expression processing method that use a probabilistic model to understand and generate reference expressions, including descriptive, anaphoric and deictic expressions, according to the progress of a dialogue.

A reference expression processing apparatus according to a first aspect of the present invention includes: a reference expression processing unit that performs at least one of the understanding and the generation of reference expressions using a probabilistic model consisting of a reference expression Bayesian network, which represents the relationships among a reference domain (D), i.e., a set of conceivable referents, a referent (X) in the reference domain, a concept (C) relating to the referent, and words (W) expressing the concept; and a storage unit that stores the data needed to form the reference expression Bayesian network.

Because the reference expression processing apparatus according to this aspect uses a probabilistic model consisting of a reference expression Bayesian network that represents the relationships among the referent (X), the concept (C) relating to the referent, and the words (W) expressing the concept, it can understand and generate reference expressions including descriptive, anaphoric and deictic expressions. Furthermore, because the reference expression Bayesian network includes the reference domain (D), a set of conceivable referents, the apparatus can process reference expressions according to the situation.

In one embodiment, the reference expression processing apparatus is configured so that the reference expression Bayesian network is formed anew each time a reference expression is processed while a dialogue is in progress.

With the reference expression processing apparatus of this embodiment, reference expressions can be processed in accordance with the progress of the dialogue.

In one embodiment, the reference expression processing apparatus is configured to change how the reference domain is determined depending on the type of reference expression.

By taking the type of reference expression into account, the reference expression processing apparatus of this embodiment can process reference expressions with higher accuracy.

In one embodiment, the reference expression processing apparatus is configured so that, when a reference expression contains a demonstrative, the reference domain includes all elements (all objects that can be referred to).

When a reference expression contains a demonstrative, the reference expression processing apparatus of this embodiment considers only a single reference domain containing all elements, which allows the reference expression to be processed with higher accuracy.

In one embodiment, the reference expression processing apparatus is configured to form a plurality of estimation models of the reference domain, each taking the salience of the reference domain as a parameter, and to select and use one of these estimation models depending on whether the referent of the reference expression is a single object or a set.

By selecting one of the estimation models according to whether the referent of the reference expression is a single object or a set, the reference expression processing apparatus of this embodiment can process reference expressions with higher accuracy.

A language processing apparatus according to a second aspect of the present invention includes the reference expression processing apparatus according to the present invention.

Because the language processing apparatus of this aspect includes the reference expression processing apparatus according to the present invention, it can process reference expressions with high accuracy, as described above.

A reference expression processing method according to a third aspect of the present invention includes: a step in which a reference expression processing unit of a language processing apparatus uses data stored in a storage unit to form, for a reference expression, a reference expression Bayesian network representing the relationships among a reference domain (D), a referent (X) in the reference domain, a concept (C) relating to the referent, and words (W) expressing the concept; a step in which the reference expression processing unit marginalizes the Bayesian network to obtain the probability P(X|W); and a step in which the reference expression processing unit finds the x′ that maximizes P(X|W) and takes it as the referent of the reference expression.

Because the reference expression processing method according to this aspect uses a probabilistic model consisting of a reference expression Bayesian network that represents the relationships among the referent (X), the concept (C) relating to the referent, and the words (W) expressing the concept, reference expressions including descriptive, anaphoric and deictic expressions can be understood and generated. Furthermore, because the reference expression Bayesian network includes the reference domain (D), a set of conceivable referents, the method can process reference expressions according to the situation.

FIG. 1 is a diagram showing the configuration of a language processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing the basic network structure of a reference Bayesian network.
FIG. 3 is a diagram showing the reference Bayesian network for a reference expression that indicates one referent, such as "the table".
FIG. 4 is a diagram showing the reference Bayesian network for a reference expression that indicates two referents, such as "his table".
FIG. 5 is a diagram showing a scene in which a tangram puzzle consisting of seven pieces is being solved.
FIG. 6 is a flowchart showing the process by which the reference expression processing unit understands a reference expression.
FIG. 7 is a flowchart explaining the detailed processing of step S1010 of FIG. 6.
FIG. 8 is a flowchart showing the process by which the reference expression processing unit generates a reference expression.
FIG. 9 is a diagram showing an excerpt of the concept dictionary defined for the REX-J corpus.
FIG. 10 is a diagram showing an excerpt of the static fitness table defined for the seven pieces shown in FIG. 5.
FIG. 11 is a flowchart showing a method of obtaining […] using a list of reference domains.

FIG. 1 is a diagram showing the configuration of a language processing apparatus 100 according to an embodiment of the present invention. The language processing apparatus 100 may consist of one or more processors and one or more memories.

The language processing apparatus 100 includes, as its characteristic components, a reference expression processing unit 101 and a storage unit 105.

The reference expression processing unit 101 includes a probabilistic model for calculating the probability that a given reference expression refers to a candidate referent. The storage unit 105 stores the data used by the probabilistic model. The reference expression processing unit 101 updates the probabilistic model according to the progress of the dialogue. When understanding a reference expression, that is, when identifying the referent denoted by the reference expression, the reference expression processing unit 101 uses the probabilistic model to calculate the probability that the reference expression refers to each candidate referent, and identifies the candidate with the highest probability as the referent of the reference expression. When generating a reference expression, that is, when selecting a reference expression that denotes a given referent, the reference expression processing unit 101 calculates the probability that each candidate reference expression refers to that referent, and selects the candidate with the highest probability as the reference expression for the referent. The reference expression processing unit 101 and the storage unit 105 together constitute the reference expression processing apparatus, whose components are described in detail later.

The speech recognition unit 115, for example, recognizes human speech, divides it into morphemes (the smallest meaningful units of language), and determines the part of speech of each morpheme using a dictionary. The structure analysis unit 111 analyzes the structure of each sentence from the morpheme information obtained by the speech recognition unit 115.

The surface realization unit 113 generates a natural-language expression that includes the reference expression generated by the reference expression processing unit 101. The speech synthesis/display unit 117 either synthesizes the natural-language expression as speech or displays it.

The language processing apparatus 100 further includes a language understanding processing unit 103, a language generation processing unit 107 and a dialogue management unit 109. The language understanding processing unit 103 receives the result of structural analysis from the structure analysis unit 111, performs language processing other than the reference expression processing performed by the reference expression processing unit 101, and sends the result to the dialogue management unit 109. The language generation processing unit 107 performs language generation other than the reference expression generation performed by the reference expression processing unit 101, and sends the result to the surface realization unit 113. The dialogue management unit 109 receives the referent of a reference expression from the reference expression processing unit 101 and, from the language understanding processing unit 103, the results of language processing other than reference expression processing, and processes these inputs. Based on the inputs or other conditions, the dialogue management unit 109 creates an output and sends it to the reference expression processing unit 101 and the language generation processing unit 107. The reference expression processing unit 101 receives the output of the dialogue management unit 109 and generates an appropriate reference expression. The language generation processing unit 107 receives the output of the dialogue management unit 109 and performs language generation other than the selection of reference expressions.

The probabilistic model used by the reference expression processing unit 101 is described next. The model uses a reference expression Bayesian network.

FIG. 2 is a diagram showing the basic network structure of the reference Bayesian network. In FIG. 2, the four nodes W, C, X and D represent, respectively, an observed word, the concept denoted by the word, the referent of the reference expression, and the presumed reference domain. Words are entries in the concept dictionary described later.

Reference domains are explained here (Susanne Salmon-Alt and Laurent Romary. 2000. Generating referring expressions in multimodal context. In Proceedings of the INLG 2000 Workshop on Coherence in Generated Multimedia, Mitzpe Ramon, Israel, June; Susanne Salmon-Alt and Laurent Romary. 2001. Reference resolution within the framework of cognitive grammar. In Proceedings of the International Colloquium on Cognitive Science, San Sebastian, Spain, May; Alexandre Denis. 2010. Generating referring expressions with reference domain theory. In Proceedings of the 6th International Natural Language Generation Conference (INLG), pages 27-35). A reference domain is a set that contains referents. Its elements may be individual concrete objects or other reference domains. Each reference domain d has a focus and a degree of salience (a non-negative real number), denoted foc(d) and sal(d), respectively. Reference domains are sorted in descending order of salience.

FIG. 5 is a diagram showing a scene in which a tangram puzzle consisting of seven pieces is being solved. The tangram puzzle is explained later. Reference domains are introduced into the mental space of the dialogue participants either linguistically, by hearing reference expressions, or visually, by observing the physical situation. If "the two large triangles" is uttered in the situation shown in FIG. 5, a reference domain consisting of pieces 1 and 2 is recognized. If piece 1 is moved and attached to piece 2, a reference domain consisting of pieces 1, 2 and 6 is perceptually recognized from their proximity (Kristinn R. Thorisson. 1994. Simulated perceptual grouping: An application to human-computer interaction. In Proceedings of the 16th Annual Conference of the Cognitive Science Society, pages 876-881, Atlanta, GA, USA). Likewise, a reference domain consisting of pieces 5 and 7 is recognized. Below, a reference domain is written as @ with an index, and its elements are enclosed in []; for example, @₁ = [1,2], @₂ = [1,2,6] and @₃ = [5,7]. The focused element is marked with *; for example, foc([1*,2]) = 1.
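The reference-domain notation above maps naturally onto a small data structure. The following Python sketch is illustrative only: the class `ReferenceDomain` and the function `sort_by_salience` are hypothetical names, and the salience values in the example are invented, since the text does not give any.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

# An element is an individual entity (here a tangram piece number) or a nested domain.
Element = Union[int, "ReferenceDomain"]


@dataclass
class ReferenceDomain:
    """A reference domain @k with its focus foc(d) and salience sal(d) (hypothetical type)."""
    name: str
    elements: List[Element]
    focus: Optional[Element] = None  # foc(d); None when no element is focused
    salience: float = 0.0            # sal(d), a non-negative real number

    def __post_init__(self) -> None:
        if self.salience < 0:
            raise ValueError("salience must be non-negative")
        if self.focus is not None and self.focus not in self.elements:
            raise ValueError("the focused element must belong to the domain")


def sort_by_salience(domains: List[ReferenceDomain]) -> List[ReferenceDomain]:
    """Return the domains in descending order of salience."""
    return sorted(domains, key=lambda d: d.salience, reverse=True)


# @1 = [1*, 2], @2 = [1, 2, 6], @3 = [5, 7]; the salience values are made up.
d1 = ReferenceDomain("@1", [1, 2], focus=1, salience=1.0)
d2 = ReferenceDomain("@2", [1, 2, 6], salience=3.0)
d3 = ReferenceDomain("@3", [5, 7], salience=2.0)
print([d.name for d in sort_by_salience([d1, d2, d3])])  # ['@2', '@3', '@1']
```

Because an element may itself be a `ReferenceDomain`, nested domains such as a group of groups can be represented without a separate type.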

The referent of a reference expression depends on which reference domain is presumed. If @₁ = [1,2] or @₂ = [1,2,6] is presumed, "the right piece" refers to piece 1; if @₃ = [5,7] is presumed, it refers to piece 5.

The literature on reference domains cited above takes an approach based on logical symbol manipulation rather than a probabilistic approach.

FIG. 3 is a diagram showing the reference Bayesian network for a reference expression that indicates a single referent, such as "the table". A reference expression to a referent that humans recognize as a single unit in this way is called a unit reference expression. Besides this example, reference expressions such as "the red ball", "that couple", "the next room" and "yesterday" are unit reference expressions.

FIG. 4 is a diagram showing the reference Bayesian network for a reference expression that indicates two referents, such as "his table". A reference expression that contains two or more unit reference expressions in this way is called a composite reference expression. Besides this example, reference expressions such as "my ball", "on the desk" and "December last year" are composite reference expressions.

Although the reference Bayesian network can also process composite reference expressions, for simplicity the following description covers only unit reference expressions.

The reference Bayesian network for an N-word unit reference expression instance has 2N + 2 discrete random variables $W_1, \ldots, W_N$, $C_1, \ldots, C_N$, $X$ and $D$. The domain of each variable is determined by the current context and the corresponding reference domains. Here, $\mathrm{dom}(V)$ denotes the domain of a random variable $V$.

$\mathrm{dom}(W_i)$ contains the corresponding observed word $w_i$ and a symbol $\omega$ representing all other possibilities:

$$\mathrm{dom}(W_i) = \{w_i, \omega\}$$

Each $W_i$ has a corresponding node $C_i$. $\mathrm{dom}(C_i)$ contains the $M$ concepts that can be expressed by $w_i$ and a special concept $\Omega$ representing all other possibilities:

$$\mathrm{dom}(C_i) = \{c_i^1, \ldots, c_i^M, \Omega\}$$

The concepts $c_i^1, \ldots, c_i^M$ are looked up in the concept dictionary described later.

$\mathrm{dom}(D)$ contains the $L + 1$ reference domains recognized up to that point:

$$\mathrm{dom}(D) = \{@_0, @_1, \ldots, @_L\}$$

$@_0$ is the ground domain, which contains all the individual entities referred to in the dialogue. At the start of the dialogue, $\mathrm{dom}(D) = \{@_0\}$. The other $L$ reference domains are added incrementally as the dialogue proceeds.

$\mathrm{dom}(X)$ contains all conceivable referents, i.e., the $K$ individual entities and the $L + 1$ reference domains:

$$\mathrm{dom}(X) = \{x_1, \ldots, x_K, @_0, \ldots, @_L\}$$
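As an illustration of these definitions, the sketch below builds the variable domains for an N-word unit reference expression. It is a hypothetical Python helper, not part of the described apparatus; the concept dictionary is modelled as a plain mapping from words to concept labels, and the entries in the example ("RIGHT", "PIECE") are invented.

```python
from typing import Dict, List

OMEGA_WORD = "ω"      # stands for "any other word"
OMEGA_CONCEPT = "Ω"   # stands for "any other concept"


def variable_domains(words: List[str],
                     concept_dictionary: Dict[str, List[str]],
                     entities: List[str],
                     reference_domains: List[str]) -> Dict[str, List[str]]:
    """Return dom(V) for every random variable V of the network (hypothetical helper).

    reference_domains must start with the ground domain "@0".
    """
    if not reference_domains or reference_domains[0] != "@0":
        raise ValueError("dom(D) must contain the ground domain @0 first")
    doms: Dict[str, List[str]] = {}
    for i, w in enumerate(words, start=1):
        doms[f"W{i}"] = [w, OMEGA_WORD]                                  # {w_i, ω}
        doms[f"C{i}"] = concept_dictionary.get(w, []) + [OMEGA_CONCEPT]  # M concepts + Ω
    doms["D"] = list(reference_domains)                    # {@0, ..., @L}
    doms["X"] = list(entities) + list(reference_domains)   # K entities + L+1 domains
    return doms


# "right piece" at the start of a dialogue, when dom(D) = {@0}.
dictionary = {"right": ["RIGHT"], "piece": ["PIECE"]}
doms = variable_domains(["right", "piece"], dictionary,
                        [str(k) for k in range(1, 8)], ["@0"])
print(len(doms))   # 6, i.e. 2N + 2 with N = 2
print(doms["C1"])  # ['RIGHT', 'Ω']
```

Concatenating lists (rather than appending) keeps the concept dictionary itself unchanged between calls.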

Using the reference Bayesian network, the referent is found as

$$x' = \operatorname*{argmax}_{x \in \mathrm{dom}(X)} P(X = x \mid W_1 = w_1, \ldots, W_N = w_N)$$

The posterior $P(X \mid W_1, \ldots, W_N)$ is obtained by marginalizing the joint probability computed from the probability tables described below.

Because all random variables in the reference Bayesian network are discrete, the probability distributions are given as probability tables. The four probability tables used in the network are described below.

$P(W_i \mid C_i, X)$ is the probability that the listener observes the word $w$ given the concept $c$ and the referent $x$ that the speaker intends to convey.

In most cases, $W_i$ does not depend on $X$, that is,

$$P(W_i \mid C_i, X) \equiv P(W_i \mid C_i)$$

$X$ is nevertheless needed to handle individualized terms such as names.

There are several conceivable ways to assign these probabilities. One simple way is as follows. For each $c_i^j \in \mathrm{dom}(C_i) \setminus \{\Omega\}$,

$$P(W_i = w_i \mid C_i = c_i^j) = \frac{1}{T}, \qquad P(W_i = \omega \mid C_i = c_i^j) = 1 - \frac{1}{T}$$

and for $\Omega$,

$$P(W_i = w_i \mid C_i = \Omega) = \epsilon, \qquad P(W_i = \omega \mid C_i = \Omega) = 1 - \epsilon$$

where $T$ is the number of words that can express the concept $c_i^j$, and $\epsilon$ is a predetermined small number such as $10^{-8}$.

$P(C_i \mid X, D)$ is the probability that the concept $c$ is chosen from $\mathrm{dom}(C_i)$ to refer to $x$ within $d$. Because this probability depends on the context, the developer of a dialogue system cannot specify it in advance. The approach adopted is therefore to construct $P(C_i \mid X, D)$ from a fitness function $R(C_i, X, D)$.

$R(c, x, d)$ is the fitness of the concept $c \in \mathrm{dom}(C_i)$ to the referent $x$ with respect to $d$, where $0 \le R(c, x, d) \le 1$. A value of 1 means a perfect fit, 0 means no fit, and 0.5 is neutral. For example, when $x$ is a suitcase, the concept "box" has a high fitness such as 0.8, whereas the concept "ball" has a low fitness such as 0.1. If $x$ is not in $d$, $R(c, x, d)$ is 0. If none of the concepts $c \in \mathrm{dom}(C_i) \setminus \{\Omega\}$ has a high fitness, the concept $\Omega$ is assigned a high probability.

If $R$ is static, it is given in advance as numerical values in a table. Otherwise, the dialogue system developer implements it as a function:

$$R(c, x, d) = f(c, x, d, I)$$

where $I$ denotes all the information available from the dialogue system.
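A minimal Python sketch of the two conditional probability tables described above follows. The word table uses the simple 1/T and ε assignment. The concept table needs a rule for turning fitness values into probabilities, which the text does not spell out; as an assumption for illustration only, this sketch gives Ω the pseudo-fitness of the product of (1 - R(c, x, d)) over the ordinary concepts and then normalises all values. The function names and the concept labels are hypothetical; the suitcase fitness values (0.8 and 0.1) come from the text.

```python
from typing import Callable, Dict, List

OMEGA_WORD = "ω"
OMEGA_CONCEPT = "Ω"


def p_word_given_concept(word: str, concept: str, t: int,
                         eps: float = 1e-8) -> Dict[str, float]:
    """P(W_i | C_i = concept) over dom(W_i) = {word, ω}; t is T for this concept."""
    p = eps if concept == OMEGA_CONCEPT else 1.0 / t
    return {word: p, OMEGA_WORD: 1.0 - p}


def p_concept_given_referent(concepts: List[str], x: str, d: str,
                             fitness: Callable[[str, str, str], float]) -> Dict[str, float]:
    """P(C_i | X = x, D = d) obtained by normalising fitness values R(c, x, d).

    Assumption (not stated in the text): Ω gets pseudo-fitness prod(1 - R),
    so it dominates when no ordinary concept fits x well.
    """
    scores = {c: fitness(c, x, d) for c in concepts if c != OMEGA_CONCEPT}
    omega = 1.0
    for r in scores.values():
        omega *= 1.0 - r
    scores[OMEGA_CONCEPT] = omega
    total = sum(scores.values())  # > 0: either omega > 0 or some R(c, x, d) == 1
    return {c: s / total for c, s in scores.items()}


# Static fitness table for the suitcase example; R is 0 whenever x is not in d.
members = {"@0": {"suitcase"}}
table = {("BOX", "suitcase"): 0.8, ("BALL", "suitcase"): 0.1}


def static_fitness(c: str, x: str, d: str) -> float:
    return table.get((c, x), 0.0) if x in members[d] else 0.0


print(p_concept_given_referent(["BOX", "BALL", OMEGA_CONCEPT], "suitcase", "@0", static_fitness))
print(p_word_given_concept("box", "BOX", t=4))  # {'box': 0.25, 'ω': 0.75}
```

With these numbers "box" receives most of the probability mass, Ω comes second, and "ball" is least likely, matching the intuition in the text.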

For example, in the situation shown in FIG. 5, the fitness function of the positional concept "left" is implemented as follows:

$$R(\text{left}, x, d) = \frac{u_r - u_x}{u_r - u_l}$$

Here, $u_x$, $u_l$ and $u_r$ are the horizontal coordinates of $x$, of the leftmost piece in $d$, and of the rightmost piece in $d$, respectively. These are obtained from $I$. If $x$ is a reference domain, its fitness is the average of the fitness values of the elements it contains.

$P(X \mid D)$ is the probability that the element $x$ in the reference domain $d$ is referred to. It is estimated from the context information at the time the corresponding reference expression is uttered, independently of the attributes of the elements in the reference domain. The context information includes the history of previous references (discourse) and physical states such as the gaze of the person making the reference (situation). $P(X \mid D)$ is called the prediction model; it is described later in connection with the experiments.

$P(D)$ is the probability that the reference domain $d$ is presumed at the time the reference expression is uttered. Because reference domains are implicit, data for estimating this probability cannot be collected. Therefore, a priori approximation functions based on the salience of $d$ are tested. Salience is proportional to recency; it is described later in connection with the evaluation experiments.

Uniform model: this model ignores salience. It is introduced to assess the importance of salience.

$$P(D = d) = \frac{1}{|\mathrm{dom}(D)|}$$

Linear model: this model distributes probability in proportion to salience.

$$P(D = d) = \frac{\mathrm{sal}(d)}{\sum_{d' \in \mathrm{dom}(D)} \mathrm{sal}(d')}$$

Exponential model: this model emphasizes recent reference domains. The function is the so-called softmax.

$$P(D = d) = \frac{\exp(\mathrm{sal}(d))}{\sum_{d' \in \mathrm{dom}(D)} \exp(\mathrm{sal}(d'))}$$
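The positional fitness function and the three salience-based priors for P(D) can be sketched in Python as below. The sketch assumes that the fitness of "left" falls linearly from 1 at the leftmost piece of d to 0 at the rightmost piece. The function names are hypothetical, the handling of a degenerate domain (u_r = u_l) and of all-zero salience are choices of this sketch rather than of the text, and the softmax subtracts the maximum salience only for numerical stability, which leaves the result unchanged.

```python
import math
from typing import Dict, List


def fitness_left(u_x: float, u_l: float, u_r: float) -> float:
    """Assumed R(left, x, d) = (u_r - u_x) / (u_r - u_l): 1 at the leftmost piece, 0 at the rightmost."""
    if u_r == u_l:  # all pieces share one column: neutral 0.5 (choice of this sketch)
        return 0.5
    return (u_r - u_x) / (u_r - u_l)


def fitness_left_of_group(us: List[float], u_l: float, u_r: float) -> float:
    """Fitness of a reference domain x: the average over its elements' coordinates."""
    return sum(fitness_left(u, u_l, u_r) for u in us) / len(us)


def p_domain(salience: Dict[str, float], model: str = "exponential") -> Dict[str, float]:
    """Prior P(D = d) over dom(D) computed from the salience values sal(d)."""
    if model == "uniform":
        weights = {d: 1.0 for d in salience}
    elif model == "linear":
        weights = dict(salience)
        if sum(weights.values()) == 0:  # no salience information: fall back to uniform
            weights = {d: 1.0 for d in salience}
    elif model == "exponential":        # softmax
        m = max(salience.values())
        weights = {d: math.exp(s - m) for d, s in salience.items()}
    else:
        raise ValueError(f"unknown model: {model}")
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


sal = {"@0": 1.0, "@1": 3.0}                     # hypothetical salience values
print(p_domain(sal, "uniform"))                  # {'@0': 0.5, '@1': 0.5}
print(p_domain(sal, "linear"))                   # {'@0': 0.25, '@1': 0.75}
print(fitness_left(u_x=0.0, u_l=0.0, u_r=10.0))  # 1.0
```

The exponential model concentrates more mass on the most salient domain than the linear model does, which is why it emphasizes recent reference domains.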

FIG. 6 is a flowchart showing the process by which the reference expression processing unit 101 understands a reference expression.

In step S1010 of FIG. 6, for every conceivable referent x, the reference expression processing unit 101 uses a reference expression Bayesian network (REBN) to obtain the probability $P(X = x \mid W_1, \ldots, W_N)$.

In step S1020 of FIG. 6, the reference expression processing unit 101 selects the x′ that maximizes $P(X = x \mid W_1, \ldots, W_N)$ as the referent of the reference expression.

FIG. 7 is a flowchart explaining the detailed processing of step S1010 of FIG. 6.

In step S2010 of FIG. 7, the reference expression processing unit 101 obtains P(D), as described above.

In step S2020 of FIG. 7, the reference expression processing unit 101 obtains P(X | D). As noted above, this prediction model is described later.

In step S2030 of FIG. 7, the reference expression processing unit 101 obtains the conditional probability of the concept nodes C. It is obtained as described above.

In step S2040 of FIG. 7, the reference expression processing unit 101 obtains the conditional probability of the word nodes W given their concepts C. It is obtained as described above.

In step S2050 of FIG. 7, the reference expression processing unit 101 obtains [expression not reproduced].

In step S2060 of FIG. 7, the reference expression processing unit 101 marginalizes the reference expression Bayesian network using an existing method to obtain P(X | W).
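Steps S1010 to S1020, with the per-network computation of S2010 to S2060, can be illustrated by brute-force enumeration instead of a Bayesian network library such as BNJ. This is a minimal sketch under stated assumptions: the network structure D → X, (X, D) → C_i, C_i → W_i with one concept node per word, a concept distribution obtained by normalizing fitness values, and a toy domain of two triangles (pieces 1, 2) and one square (piece 6). All table values and function names are hypothetical.

```python
def posterior_over_referents(words, domains, p_d, p_x_given_d,
                             p_c_given_xd, p_w_given_c, concepts):
    """P(X = x | W = words), computed by summing out D and every concept node C_i.

    Assumed structure: D -> X, (X, D) -> C_i, C_i -> W_i (one concept node per word).
    """
    scores = {}
    for d, members in domains.items():
        for x in members:  # P(X = x | D = d) = 0 for x outside d, so skip those
            joint = p_d[d] * p_x_given_d(x, d)
            for w in words:
                # Marginalize the concept node attached to this word.
                joint *= sum(p_c_given_xd(c, x, d) * p_w_given_c(w, c) for c in concepts)
            scores[x] = scores.get(x, 0.0) + joint  # sum over reference domains
    z = sum(scores.values())
    return {x: s / z for x, s in scores.items()}


def resolve(words, **model):
    """S1010-S1020: return the referent x' maximizing P(X = x | W) and the posterior."""
    post = posterior_over_referents(words, **model)
    return max(post, key=post.get), post


# Hypothetical toy model: one domain "@0" holding two triangles (1, 2) and a square (6).
CONCEPTS = ["TRI", "SQR"]
FITNESS = {("TRI", 1): 1.0, ("TRI", 2): 1.0, ("TRI", 6): 0.0,
           ("SQR", 1): 0.0, ("SQR", 2): 0.0, ("SQR", 6): 1.0}
WORD_GIVEN_CONCEPT = {("triangle", "TRI"): 0.9, ("square", "TRI"): 0.1,
                      ("triangle", "SQR"): 0.1, ("square", "SQR"): 0.9}


def concept_given_referent(c, x, d):
    # Assumption: normalize the fitness values of x over all concepts.
    return FITNESS[(c, x)] / sum(FITNESS[(c2, x)] for c2 in CONCEPTS)


MODEL = dict(
    domains={"@0": [1, 2, 6]},
    p_d={"@0": 1.0},
    p_x_given_d=lambda x, d: 1.0 / 3.0,  # uniform prediction model for the toy
    p_c_given_xd=concept_given_referent,
    p_w_given_c=lambda w, c: WORD_GIVEN_CONCEPT[(w, c)],
    concepts=CONCEPTS,
)

best, post = resolve(["square"], **MODEL)
print(best, post)
```

For the word "square" the posterior concentrates on piece 6 (about 0.82); a real system would hand the same tables to an inference engine rather than enumerate every combination.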

FIG. 8 is a flowchart showing the reference expression generation process performed by the reference expression processing unit 101.

In step S3010 of FIG. 8, the reference expression processing unit 101 receives the referent x and determines the candidates for W.

In step S3020 of FIG. 8, the reference expression processing unit 101 uses the reference expression Bayesian network to compute P(X = x | W) for a candidate W by the procedure shown in the flowchart of FIG. 7.

In step S3030 of FIG. 8, the reference expression processing unit 101 determines whether all candidates W have been processed. If so, the process proceeds to step S3040; otherwise, it returns to step S3020.

In step S3040 of FIG. 8, the reference expression processing unit 101 selects, for the referent x, the W that maximizes P(X = x | W) as the reference expression.
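The generation loop of FIG. 8 reuses the same posterior: it scores every candidate word sequence W by P(X = x | W) for the target referent and keeps the best. This is a minimal sketch, assuming the candidates are already enumerated (how S3010 chooses them is not specified here) and using a hypothetical toy posterior over pieces 1 and 2 (triangles) and 6 (square); `toy_posterior` and `generate` are illustrative names.

```python
# Hypothetical toy tables: P(C | X) per piece and P(W | C) per concept.
CONCEPT_GIVEN_X = {1: {"TRI": 1.0, "SQR": 0.0},
                   2: {"TRI": 1.0, "SQR": 0.0},
                   6: {"TRI": 0.0, "SQR": 1.0}}
WORD_GIVEN_C = {"TRI": {"triangle": 0.9, "square": 0.1},
                "SQR": {"triangle": 0.1, "square": 0.9}}


def toy_posterior(words):
    """P(X | W) in a single domain {1, 2, 6} with a uniform prediction model."""
    scores = {}
    for x, concept_dist in CONCEPT_GIVEN_X.items():
        score = 1.0 / len(CONCEPT_GIVEN_X)
        for w in words:
            # Sum out the concept node attached to word w.
            score *= sum(p_c * WORD_GIVEN_C[c][w] for c, p_c in concept_dist.items())
        scores[x] = score
    z = sum(scores.values())
    return {x: s / z for x, s in scores.items()}


def generate(target, candidates, posterior):
    """S3020-S3040: return the candidate W maximizing P(X = target | W)."""
    best_w, best_p = None, -1.0
    for w in candidates:  # S3030 loops until every candidate has been scored
        p = posterior(w).get(target, 0.0)
        if p > best_p:
            best_w, best_p = w, p
    return best_w, best_p


print(generate(6, [["triangle"], ["square"]], toy_posterior))
```

For target 1 the best candidate, "triangle", still scores below 0.5 because piece 2 fits it equally well, so the score also exposes how ambiguous a generated expression would be.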

As described above, the reference expression processing unit according to the present embodiment can both understand and generate reference expressions with a single probability model, the reference expression Bayesian network.

An experiment for evaluating the reference expression processing according to the present embodiment is described below.

The REX-J corpus (Philipp Spanger, Masaaki Yasuhara, Ryu Iida, Takenobu Tokunaga, Asuka Terai, and Naoko Kuriyama. 2010. REX-J: Japanese referring expression corpus of situated dialog. Language Resources and Evaluation. Online First, DOI: 10.1007/s10579-010-9134-8) was used as evaluation data. The REX-J corpus consists of 24 human-human dialogues, in each of which two participants solve the seven-piece tangram puzzle shown in FIG. 5. The goal of the puzzle is to combine the seven pieces to form a specified shape. One of the two participants plays the role of the operator (OP) and the other the role of the solver (SV). The OP can manipulate the virtual puzzle pieces displayed on a PC monitor with the mouse but does not know the goal shape. The SV knows the goal shape but cannot manipulate the pieces. The states of the pieces and of the mouse cursor operated by the OP are shared by both participants in real time. The two participants thus carry on a collaborative dialogue containing many reference expressions to the pieces. In addition to the reference expressions, the positions and orientations of the pieces, the position of the mouse cursor, and the OP's operations were recorded together with time stamps and the IDs of the referent pieces.

Table 1 shows example reference expressions annotated with their referents. The first reference expression in Table 1, 「おっきい三角形」 ("big triangle"), is ambiguous and refers to piece 1 or 2. The seventh and eighth reference expressions refer to the set of pieces 1 and 2. The other reference expressions refer to individual pieces.

To avoid problems caused by errors in the structural analysis of reference expressions, the intermediate structure (REX-graph) from which the reference expression Bayesian network is constructed is also given. The intermediate structure is a list of segmented words in parentheses.

BNJ (http://bnj.sourceforge.net/) is used for probability calculation. The following describes implementation details that are more or less specific to the task domain of the REX-J corpus.

FIG. 9 shows an excerpt of the concept dictionary defined for the REX-J corpus. Forty concepts were defined by observing the dialogues.

FIG. 10 shows an excerpt of the static fitness table defined for the seven pieces shown in FIG. 5. Fitness values were defined for 13 of the 40 concepts. OBJ fits all pieces uniformly and perfectly. FIG fits all pieces uniformly, but only weakly. TRI fits only pieces 1 to 5. SQR fits only pieces 6 and 7, but because piece 7 is not a "square" in the strict sense, SQR does not fit it perfectly.

Fitness functions were implemented for the remaining 27 concepts. Some of them are described below.

“Another” (もう一つの, ANOTHER)
The focused element among the elements of the reference domain d is denoted focus(d).

“Remaining” (残りの, REST)
Only for a reference domain whose elements are two groups, the fitness of the unfocused group is 1; in all other cases it is 0.

“Both” (両方, BOTH)
The fitness is 1 when x is a group with exactly two elements.

“Figure” (図形, FIG)
This expression refers to a group of assembled pieces. Therefore, when x is a single piece (single(x) = true), the fitness is the value r taken from the static fitness table, and when x is a group whose pieces are connected to one another to form a shape (shape(x) = true), the fitness is 1.

“All” (全部, ALL)
Every reference domain is regarded as containing, among its elements, a special reference to itself (a self-reference). The fitness of ALL is then defined in terms of this self-reference.
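The dynamic fitness functions above can be sketched as small predicates over a referent x and a reference domain d. The source gives only prose definitions, so this is a minimal sketch with hypothetical representations: pieces are ints, groups are frozensets of pieces, and the string token `SELF` stands for a domain's self-reference. The rules for ANOTHER (any non-focused member of d fits), ALL (only the self-reference fits), and the 0 value for unconnected groups under FIG are assumptions, not the patent's formulas.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SELF = "SELF"  # hypothetical token for a domain's self-reference


@dataclass(frozen=True)
class RefDomain:
    elements: tuple          # pieces (int), groups (frozenset), and the SELF token
    focus: Optional[object] = None


def fit_another(x, d: RefDomain) -> float:
    # Assumption: every member of d other than focus(d) and the self-reference fits.
    return 1.0 if x in d.elements and x != d.focus and x != SELF else 0.0


def fit_rest(x, d: RefDomain) -> float:
    # Only for domains whose elements include exactly two groups:
    # the unfocused group gets 1, everything else 0.
    groups = [e for e in d.elements if isinstance(e, frozenset)]
    if len(groups) == 2 and d.focus in groups and x in groups and x != d.focus:
        return 1.0
    return 0.0


def fit_both(x, d: RefDomain) -> float:
    # 1 when x is a group with exactly two elements.
    return 1.0 if isinstance(x, frozenset) and len(x) == 2 else 0.0


def fit_fig(x, d: RefDomain, static_r: dict, is_shape: Callable[[frozenset], bool]) -> float:
    if isinstance(x, int):
        return static_r[x]  # single(x): value r from the static fitness table
    # shape(x): a connected group gets 1 (0 otherwise is an assumption).
    return 1.0 if isinstance(x, frozenset) and is_shape(x) else 0.0


def fit_all(x, d: RefDomain) -> float:
    # Assumption: ALL fits only the domain's self-reference.
    return 1.0 if x == SELF and SELF in d.elements else 0.0


pair, rest = frozenset({1, 2}), frozenset({3, 4, 5, 6, 7})
inclusive = RefDomain(elements=(pair, rest, SELF), focus=pair)
print(fit_rest(rest, inclusive), fit_rest(pair, inclusive))
```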

Next, the list of reference domains is described. As reference resolution (understanding of reference expressions) proceeds, reference domains are added to the list and updated by the following procedure. The reference domains in the list are sorted in descending order of salience.

For each reference resolution, all previous reference expressions are assumed to have been resolved correctly. Therefore, after each reference resolution, if the correct referent of the last reference expression is a set, a new reference domain identical to that set is added to the list of reference domains unless the list already contains it. In either case, unless the reference domain identical to the set is already at the head of the list, its salience is set to σ + 1, where σ is the maximum salience in the list at that time, i.e., the salience of the head reference domain.

Before each reference resolution, at the start of the target reference expression, perceptual grouping (described later) is used to check whether the piece manipulated most recently after the preceding reference expression forms a perceptual group. If a group is recognized, a new reference domain identical to the recognized group is added to the list unless the list already contains it. In either case, unless that reference domain is already at the head of the list, its salience is set to σ + 1, and its focus is set to the most recently manipulated piece.

When a new reference domain @_m is added to the list, its complement reference domain @_n and the inclusive reference domain @_l are also inserted after @_m in the list. Here @_n consists of the pieces not contained in @_m, and @_l = {@_m, @_n}. This operation is needed to handle concepts such as “remaining” (REST).
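The list maintenance above (promotion to salience σ + 1, and insertion of the complement and inclusive domains after a new domain) can be sketched as a small class. This is a minimal sketch with hypothetical names (`Domain`, `DomainList`, `promote`). The text does not state the salience of the inserted complement and inclusive domains, so here they simply inherit the salience of @_m, which keeps the list in non-increasing order; whether the focus update applies only when the domain was not already at the head is ambiguous, and the sketch applies it in either case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Domain:
    elements: frozenset      # pieces, or groups of pieces for an inclusive domain
    salience: int = 0
    focus: Optional[object] = None


class DomainList:
    """Reference domains kept in descending order of salience (head = index 0)."""

    def __init__(self, all_pieces):
        self.all_pieces = frozenset(all_pieces)
        self.domains: list = []

    def _find(self, elements):
        return next((dom for dom in self.domains if dom.elements == elements), None)

    def promote(self, elements, focus=None) -> Domain:
        """Add `elements` as a domain if missing; move it to the head with salience sigma + 1."""
        elements = frozenset(elements)
        dom = self._find(elements)
        is_new = dom is None
        if is_new:
            dom = Domain(elements)
        if not self.domains or self.domains[0] is not dom:
            sigma = self.domains[0].salience if self.domains else 0
            dom.salience = sigma + 1
            if not is_new:
                self.domains.remove(dom)
            self.domains.insert(0, dom)
        if focus is not None:
            dom.focus = focus  # e.g. the most recently manipulated piece
        if is_new:
            self._insert_complement_and_inclusive(dom)
        return dom

    def _insert_complement_and_inclusive(self, dom: Domain) -> None:
        complement = self.all_pieces - dom.elements          # @_n
        if not complement:
            return
        inclusive = frozenset({dom.elements, complement})    # @_l = {@_m, @_n}
        pos = self.domains.index(dom) + 1
        for elems in (complement, inclusive):
            if self._find(elems) is None:
                # Assumption: inserted domains inherit the salience of @_m.
                self.domains.insert(pos, Domain(elems, dom.salience))
                pos += 1


dl = DomainList(range(1, 8))  # the seven tangram pieces
dl.promote({1, 2})            # e.g. after a set referent {1, 2} was resolved
for dom in dl.domains:
    print(dom.elements, dom.salience)
```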

Perceptual grouping is performed as follows. Two pieces are regarded as touching when the shortest distance between them is at most a predetermined value, and only sets of touching pieces are recognized as groups. This method is not general-purpose, but because of the nature of the tangram puzzle it works satisfactorily in the REX-J corpus domain.
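The touching-based grouping can be sketched as connected components over a "touching" relation. This is a minimal sketch, assuming each piece is given as a list of polygon vertices and that groups are the connected components (with at least two pieces) of the touching relation. The vertex-to-edge distance used here equals the shortest distance between the polygon boundaries only when the boundaries do not cross, a simplification for illustration; the threshold value is hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def polygon_distance(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Shortest vertex-to-edge distance between two polygons (simplification)."""
    def edges(poly):
        return list(zip(poly, list(poly[1:]) + list(poly[:1])))
    d_pq = min(point_segment_distance(v, a, b) for v in p for a, b in edges(q))
    d_qp = min(point_segment_distance(v, a, b) for v in q for a, b in edges(p))
    return min(d_pq, d_qp)


def perceptual_groups(pieces: Dict[int, List[Point]], threshold: float) -> List[FrozenSet[int]]:
    """Connected components (size >= 2) of the 'touching' relation, via union-find."""
    parent = {pid: pid for pid in pieces}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(pieces, 2):
        if polygon_distance(pieces[i], pieces[j]) <= threshold:
            parent[find(i)] = find(j)
    components: Dict[int, set] = {}
    for pid in pieces:
        components.setdefault(find(pid), set()).add(pid)
    return [frozenset(c) for c in components.values() if len(c) >= 2]


def group_of(piece: int, groups: List[FrozenSet[int]]) -> Optional[FrozenSet[int]]:
    """The perceptual group containing `piece` (e.g. the last manipulated one), if any."""
    return next((g for g in groups if piece in g), None)


pieces = {1: [(0, 0), (1, 0), (1, 1), (0, 1)],
          2: [(1, 0), (2, 0), (2, 1), (1, 1)],
          3: [(5, 5), (6, 5), (6, 6), (5, 6)]}
groups = perceptual_groups(pieces, threshold=0.1)  # hypothetical threshold
print(groups, group_of(2, groups))
```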

FIG. 11 is a flowchart showing how P(D) is obtained using the list of reference domains. This process may be performed by the dialogue management unit 109.

In step S4010 of FIG. 11, immediately before the reference expression processing unit 101 performs reference resolution, the dialogue management unit 109 updates the list of reference domains based on the result of perceptual grouping.

In step S4020 of FIG. 11, the reference expression processing unit 101 inputs the salience values obtained from the list of reference domains into the salience model described above to obtain P(D).

In step S4030 of FIG. 11, immediately after the reference expression processing unit 101 performs reference resolution, the dialogue management unit 109 updates the list of reference domains according to the result.

As mentioned above, the prediction model P(X | D) is built with a ranking-based method (Ryu Iida, Shumpei Kobayashi, and Takenobu Tokunaga. 2010. Incorporating extra-linguistic information into reference resolution in collaboration task dialogue. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1259-1267, Uppsala, Sweden, July) that uses SVMrank (Thorsten Joachims. 2006. Training linear SVMs in linear time. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), pages 217-226, Philadelphia, PA, USA, August). This model ranks elements according to 16 binary features, such as whether the target element has been referred to before (discourse feature) and whether the target is under the mouse cursor (mouse cursor feature).

If the target is a set, i.e., a reference domain, its discourse features are computed in the same way as for pieces. Mouse cursor features, however, are handled differently: if any one member of the group satisfies the criterion of a mouse cursor feature, the group is judged to satisfy it.

The rank is a function of the element x and the context information θ. P(X = x | D = d) is computed from this rank and scaled by a normalization coefficient chosen so that $\sum_{x \in d} P(X = x \mid D = d) = 1$. If x is not in d, P(X = x | D = d) is 0.
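The text does not reproduce the exact mapping from SVMrank ranks to P(X = x | D = d). A minimal sketch, assuming (hypothetically) that probability is proportional to the reciprocal rank, shows the normalization over the members of d and the zero probability for elements outside d; it also encodes the group rule for mouse cursor features. The function names are illustrative.

```python
from typing import Callable, Dict, Hashable, Iterable


def prediction_model(ranks: Dict[Hashable, int], members: Iterable[Hashable]) -> Dict[Hashable, float]:
    """P(X = x | D = d) for x in d, from ranks (1 = best) produced by a ranker.

    Assumption: weight = 1 / rank; the normalization constant makes the
    probabilities over the members of d sum to 1. Elements outside d are absent
    from the result, i.e. they have probability 0.
    """
    weights = {x: 1.0 / ranks[x] for x in members}
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}


def group_mouse_feature(group: Iterable[Hashable], piece_feature: Callable[[Hashable], bool]) -> bool:
    """A group satisfies a mouse cursor feature if any one member satisfies it."""
    return any(piece_feature(p) for p in group)


ranks = {1: 1, 2: 2, 6: 3}  # hypothetical ranker output for the current context
p = prediction_model(ranks, members=[1, 2])
print(p, p.get(6, 0.0))
```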

Twenty-four dialogues were used for evaluation. They contain 1,474 simple reference expressions and 28 compound reference expressions. Besides the compound reference expressions, reference expressions mentioning complex concepts whose fitness functions are difficult to implement in a short time were excluded. After these exclusions, 1,310 reference expressions were available. Of these 1,310, 182 (13.9%) refer to sets and 612 (46.7%) are demonstrative pronouns such as 「それ」 ("it").

The experiment assumes the following conditions.

Independence of speaker roles
Reference expressions are assumed to be independent of the speaker's role, i.e., solver or operator. All reference expressions are mixed and processed sequentially.

Perfect preprocessing and past information
No errors are assumed to arise from preprocessing, including speech recognition, morphological analysis, and syntactic analysis. In addition, the correct referents of all past reference expressions are assumed to be known.

No future information
In human-human dialogue, information useful for resolving a reference expression is often given after the expression is uttered. Such future information is not considered.

Number information
Many languages, including English, require number distinctions to be marked with articles, singular and plural noun forms, copulas, and so on. Japanese has no such grammatical mechanism, but the distinction can be predicted with machine learning techniques that use linguistic and gesture information. The effect of providing such number information was therefore examined. In the following experiment, singular/plural information is given to the reference expression Bayesian network by looking up the annotation of the correct referent in advance. This is done by adding a special evidence node C0 that takes a singular or a plural value; its conditional probabilities given X favor the singular value when x is a piece and, conversely, the plural value when x is a set.

As the baseline of the experiment, a P(D) model called the single domain was prepared. In the single-domain model, the set of candidate reference domains consists of only one reference domain, which contains all individual pieces and all reference domains recognized up to that point; this domain therefore receives probability 1.

In this experiment, when a reference expression contains a demonstrative, using the single domain gave better performance than using reference domains from the models above. In the results below, the single domain was always used for reference expressions containing a demonstrative.

Table 2 shows the experimental results. Reference resolution performance is reported as accuracy for each category and condition, where accuracy is the number of correctly resolved reference expressions divided by the total number of reference expressions.

Three categories were set for evaluation: Singular, Plural, and Total. The Singular category is the set of reference expressions that refer to a single piece; Plural is the set of reference expressions that refer to a set of pieces; Total is their union. An ambiguous reference expression, such as the first one in Table 1, is counted as Singular, and its resolution is considered correct if the resolved result is one of its possible referents.
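The accuracy measure and the counting rule for ambiguous expressions can be sketched as follows. The input format (a set of acceptable referents, the system's answer, and a plural flag per expression) is a hypothetical choice for illustration, and the three example items are made up, not corpus data.

```python
def accuracy_by_category(items):
    """Accuracy for the Singular, Plural and Total categories.

    Each item is (acceptable_referents, predicted, refers_to_set). An ambiguous
    expression lists every possible referent and counts as Singular; it is
    correct if the prediction is any one of them.
    """
    counts = {"Singular": [0, 0], "Plural": [0, 0], "Total": [0, 0]}
    for acceptable, predicted, refers_to_set in items:
        correct = predicted in acceptable
        for category in ("Plural" if refers_to_set else "Singular", "Total"):
            counts[category][0] += int(correct)
            counts[category][1] += 1
    return {c: (k / n if n else float("nan")) for c, (k, n) in counts.items()}


# Made-up examples: an ambiguous "big triangle", a wrong single-piece answer,
# and a correctly resolved set {1, 2}.
items = [({1, 2}, 2, False), ({6}, 7, False), ({frozenset({1, 2})}, frozenset({1, 2}), True)]
print(accuracy_by_category(items))
```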

“Without number information” shows results obtained without singular/plural information, and “With number information” shows results obtained with it. Singular/plural information clearly has a strong effect.

The best performance for the Singular category was achieved by the linear model, whereas the best performance for the Plural category was achieved by the exponential model. If it is known whether a reference expression is singular or plural, i.e., if number information is available, the appropriate P(D) model can be selected. By switching models in this way, the best Total performance with number information reached 83.2%, a gain of 2.0 points over the baseline (sign test, p < 0.0001).

Introducing reference domains markedly improved resolution in the Plural category. The largest gain is 9.3 points (sign test, p < 0.005).
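The significance levels quoted in these results come from sign tests. A minimal sketch of an exact two-sided sign test: given the number of expressions that only the new model resolves correctly (wins) and that only the baseline resolves correctly (losses), with ties discarded, the p-value is twice the smaller binomial tail under p = 0.5. The counts in the usage line are hypothetical, not figures from Table 2.

```python
from math import comb


def sign_test(wins: int, losses: int) -> float:
    """Exact two-sided sign test p-value (ties must already be discarded)."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # P(K <= k) for K ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


print(sign_test(30, 10))  # hypothetical counts
```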

In addition, with the uniform, linear, and exponential models, more reference expressions containing positional concepts such as LEFT and RIGHT were resolved correctly than with the single-domain baseline.

Table 3 summarizes the resolution results for the four positional concepts when number information is used. The values in Table 3 are the total number of expressions or the number of correct answers. The single-domain baseline resolved 65% correctly, whereas the linear model resolved 75% correctly (sign test, p < 0.05).

For each resolution, a dedicated Bayesian network is built for the reference expression in question. The resulting reference expression Bayesian network handles descriptive, deictic, and anaphoric reference expressions in a unified way. It incorporates the notion of reference domains, which makes it possible to resolve reference expressions using context-dependent attributes and to handle reference expressions that refer to sets. The reference expression Bayesian network could become a standard approach for any task-oriented application, such as personal agents on smartphones, in-vehicle systems, and service robots.

DESCRIPTION OF SYMBOLS: 100 language processing apparatus; 101 reference expression processing unit; 103 language understanding processing unit; 105 storage unit; 107 language generation processing unit; 109 dialogue management unit

Claims

A reference expression processing apparatus comprising: a reference expression processing unit that performs at least one of understanding and generation of a reference expression using a probability model composed of a reference expression Bayesian network representing the relationships among a reference domain (D), which is a set of possible referents, a referent (X) in the reference domain, a concept (C) about the referent, and a word (W) expressing the concept; and
a storage unit that stores data necessary to form the reference expression Bayesian network.

The reference expression processing apparatus according to claim 1, wherein the reference expression Bayesian network is formed each time a reference expression is processed while a dialogue is in progress.

The reference expression processing apparatus according to claim 2, configured to change the method of determining the reference domain according to the type of the reference expression.

The reference expression processing apparatus according to claim 3, wherein the reference domain includes all elements when the reference expression includes a demonstrative.

The reference expression processing apparatus according to claim 3, configured to form a plurality of estimation models of the reference domain that take the salience of the reference domain as a parameter, and to select and use one of the plurality of estimation models depending on whether the referent of the reference expression is a single object or a set.

A language processing apparatus comprising the reference expression processing apparatus according to claim 1.

A reference expression processing method comprising: a step in which the reference expression processing unit of the language processing apparatus uses the data stored in the storage unit to form a reference expression Bayesian network representing the relationships among a reference domain (D), a referent (X) in the reference domain, a concept (C) about the referent, and words (W) expressing the concept;
a step in which the reference expression processing unit marginalizes the Bayesian network to obtain the probability P(X | W); and
a step of obtaining the x′ that maximizes the probability P(X | W) and taking it as the referent of the reference expression.