
JP7308550B2 - Utterance generation device, utterance generation method, and computer program

Info

Publication number: JP7308550B2
Application number: JP2021151210A
Authority: JP
Inventors: 敦青山; 健太朗辻
Original assignee: Datavision
Current assignee: Datavision
Priority date: 2018-12-25
Filing date: 2021-09-16
Publication date: 2023-07-14
Anticipated expiration: 2038-12-25
Also published as: JP2020102117A; JP2021193608A; JP6951763B2

Description

The present invention relates to an utterance generation device, an utterance generation method, and a computer program.

In recent years, dialogue devices have been developed that carry on a dialogue with a user by analyzing the user's utterance and returning a response that matches its content. Such dialogue devices are used, for example, in car navigation systems and in guidance devices in public facilities.

JP 2004-110524 A

However, because the dialogue devices described above are task-oriented and aim to answer questions, they cannot be expected to evoke a change in the user's state of mind, such as providing comfort, therapy, advice, or support for generating ideas.

The present invention has been made in view of these circumstances, and an object thereof is to provide an utterance generation device, an utterance generation method, and a computer program that can be expected to evoke a change in the user's state of mind, such as comfort, therapy, advice, or support for generating ideas.

An utterance generation device according to one aspect of the present application generates a system utterance to be output in response to an input user utterance, and includes: a recognition unit that recognizes the user's intention from the user utterance; an acquisition unit that extracts a noun contained in the user utterance and acquires a plurality of associated words evoked by the extracted noun, together with a plurality of co-occurrence expressions, each containing a predicate (an inflecting word such as a verb or adjective) that co-occurs with the noun or with one of the associated words; a classification unit that classifies the noun and the associated words into a plurality of clusters according to how many co-occurrence expressions they share; a selection unit that selects one noun or associated word from one of the clusters and selects one co-occurrence expression that co-occurs with some noun or associated word contained in that cluster; a conversion unit that converts the user's intention recognized by the recognition unit into an intention to be given to the system utterance; and an utterance generation unit that generates the system utterance by joining the noun or associated word selected by the selection unit to the co-occurrence expression selected by the selection unit, and attaching the intention converted by the conversion unit to that co-occurrence expression.

An utterance generation method according to one aspect of the present application uses a computer to generate a system utterance to be output in response to an input user utterance. The computer recognizes the user's intention from the user utterance; extracts a noun contained in the user utterance and acquires a plurality of associated words evoked by the extracted noun, together with a plurality of co-occurrence expressions, each containing a predicate that co-occurs with the noun or with one of the associated words; classifies the noun and the associated words into a plurality of clusters according to how many co-occurrence expressions they share; selects one noun or associated word from one of the clusters and selects one co-occurrence expression that co-occurs with some noun or associated word contained in that cluster; converts the recognized user intention into an intention to be given to the system utterance; and generates the system utterance by joining the selected noun or associated word to the selected co-occurrence expression and attaching the converted intention to that co-occurrence expression.

A computer program according to one aspect of the present application causes a computer to execute processing for generating a system utterance to be output in response to an input user utterance, the processing comprising: recognizing the user's intention from the user utterance; extracting a noun contained in the user utterance and acquiring a plurality of associated words evoked by the extracted noun, together with a plurality of co-occurrence expressions, each containing a predicate that co-occurs with the noun or with one of the associated words; classifying the noun and the associated words into a plurality of clusters according to how many co-occurrence expressions they share; selecting one noun or associated word from one of the clusters and selecting one co-occurrence expression that co-occurs with some noun or associated word contained in that cluster; converting the recognized user intention into an intention to be given to the system utterance; and generating the system utterance by joining the selected noun or associated word to the selected co-occurrence expression and attaching the converted intention to that co-occurrence expression.

According to the present application, the generated utterances can be expected to evoke a change in the user's state of mind, such as comfort, therapy, advice, or support for generating ideas.

FIG. 1 is a block diagram illustrating the overall configuration of a dialogue system according to Embodiment 1.
FIG. 2 is a schematic diagram showing an example of a dialogue in the dialogue system.
FIG. 3 is a block diagram illustrating the internal configuration of the utterance generation device.
FIG. 4 is a block diagram illustrating the internal configuration of a terminal device.
FIG. 5 is an explanatory diagram outlining the utterance generation process.
FIG. 6 is a diagram showing an example of noun extraction using Word2Vec.
FIG. 7 is an explanatory diagram illustrating the clustering method in this embodiment.
FIG. 8 is an explanatory diagram illustrating the process of selecting a noun and a co-occurrence expression.
FIG. 9 is a conceptual diagram showing an example of the intention conversion table.
FIG. 10 is a conceptual diagram showing an example of the intention conversion table.
FIG. 11 is a flowchart illustrating the procedure of processing executed by the utterance generation device.
FIG. 12 is an explanatory diagram illustrating a system utterance generation method according to Embodiment 2.

Hereinafter, the present invention will be described in detail with reference to the drawings showing its embodiments.
(Embodiment 1)
FIG. 1 is a block diagram illustrating the overall configuration of the dialogue system according to Embodiment 1. The dialogue system according to this embodiment includes an utterance generation device 10 and a terminal device 20 that are communicably connected to each other via a communication network N. The terminal device 20 is an information processing terminal used by a user, such as a personal computer, a smartphone, or an AR (Augmented Reality) device, on which an application program for accessing the utterance generation device 10 is installed. For example, when the utterance generation device 10 accepts an access from the terminal device 20, it performs user authentication and, if authentication succeeds, provides a dialogue service to the terminal device 20.

FIG. 2 is a schematic diagram showing an example of a dialogue in the dialogue system, as displayed on the display screen 20A of the terminal device 20. The dialogue consists of user utterances entered by the user on the terminal device 20 and system utterances, which are the responses of the utterance generation device 10 to those user utterances. In this embodiment, user utterances and system utterances are described as text, but they may of course be speech.

When the terminal device 20 accepts the input of a user utterance U01, it transmits the accepted user utterance U01 to the utterance generation device 10. On receiving the user utterance U01 from the terminal device 20, the utterance generation device 10 generates a system utterance S01 as a response and transmits it to the terminal device 20. Thereafter, until a fixed phrase signaling the end of the dialogue is entered on the terminal device 20 (in the example of FIG. 2, the user utterance U06, "ばいばい" ("bye-bye")), the utterance generation device 10 generates system utterances S02, S03, ... one at a time, each time it receives a user utterance U02, U03, ... from the terminal device 20, and returns each generated system utterance to the terminal device 20.
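The request/response loop described above can be sketched as follows. This is a minimal illustration, assuming the user utterances arrive as an iterable of strings and that `generate` stands in for the whole utterance generation process; `END_PHRASES` and `run_dialogue` are hypothetical names, and network transport and user authentication are omitted.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical set of fixed phrases that end the dialogue (FIG. 2 uses "ばいばい").
END_PHRASES = {"ばいばい"}


def run_dialogue(user_utterances: Iterable[str],
                 generate: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Pair each user utterance with a generated system utterance.

    Stops, without replying, as soon as an end phrase is received.
    """
    log: List[Tuple[str, str]] = []
    for user in user_utterances:
        if user in END_PHRASES:
            break
        # One system utterance is generated per received user utterance.
        log.append((user, generate(user)))
    return log
```

The returned pairs correspond to U01/S01, U02/S02, and so on, in the chronological order in which the terminal device displays them.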

The display screen 20A of the terminal device 20 shows the entered user utterances U01, U02, U03, ... and the system utterances S01, S02, S03, ... received from the utterance generation device 10. In the example of FIG. 2, the user utterances U01 to U05 are displayed on the right side of the display screen 20A in chronological order from top to bottom, and the system utterances S01 to S05 are displayed on the left side, also in chronological order from top to bottom.

The method by which the utterance generation device 10 generates system utterances will be described in detail later.

FIG. 3 is a block diagram illustrating the internal configuration of the utterance generation device 10. The utterance generation device 10 is, for example, a server device, and includes a control unit 11, a storage unit 12, a communication unit 13, a display unit 14, and an operation unit 15.

The control unit 11 includes, for example, a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory). The CPU of the control unit 11 loads various computer programs stored in the ROM or the storage unit 12 into the RAM and executes them, thereby causing the device as a whole to function as the utterance generation device of the present application.

The control unit 11 is not limited to the above configuration and may be any processing circuit including one or more CPUs, a multi-core CPU, a microcomputer, or the like. The control unit 11 may also provide functions such as a timer that measures the time elapsed between a measurement start instruction and a measurement end instruction, and a counter that counts occurrences.

The storage unit 12 is a storage device such as a hard disk drive and stores various computer programs and data. The computer programs stored in the storage unit 12 include a computer program (utterance generation program 120) that causes the utterance generation device 10 to execute the processing of generating system utterances from input user utterances.

The programs stored in the storage unit 12 may be provided on a non-transitory recording medium M1 on which they are recorded in readable form. The recording medium M1 is, for example, a portable memory such as a CD-ROM, a USB memory, an SD (Secure Digital) card, a microSD card, or a CompactFlash (registered trademark) card. In this case, the control unit 11 reads the programs from the recording medium M1 using a reading device (not shown) and installs them in the storage unit 12. The programs stored in the storage unit 12 may also be provided by communication via the communication unit 13, in which case the control unit 11 acquires the programs through the communication unit 13 and installs them in the storage unit 12.

The data stored in the storage unit 12 include an intention conversion table 121 and Word2Vec data 122, both of which are described in detail later.

The communication unit 13 includes an interface for communicating with the terminal device 20 via the communication network N. When information to be transmitted to the terminal device 20 is input from the control unit 11, the communication unit 13 transmits it to the terminal device 20; it also outputs information received from the terminal device 20 via the communication network N to the control unit 11.

The display unit 14 includes a display device such as a liquid crystal display or an organic EL display and displays information to be reported to the administrator of the utterance generation device 10. The operation unit 15 includes a touch panel and various buttons, accepts operations by the administrator of the utterance generation device 10, and outputs the accepted operation information to the control unit 11.

In this embodiment, the utterance generation device 10 is described as a single device for simplicity, but it may be composed of a plurality of server devices or of one or more virtual machines.

Also for simplicity, this embodiment assumes that the storage unit 12 of the utterance generation device 10 holds the intention conversion table 121 and the Word2Vec data 122. Alternatively, these data may be kept on one or more external servers, and the necessary data may be obtained by accessing those servers as needed.

FIG. 4 is a block diagram illustrating the internal configuration of the terminal device 20. The terminal device 20 is an information processing terminal such as a personal computer or a smartphone, and includes a control unit 21, a storage unit 22, a communication unit 23, a display unit 24, and an operation unit 25.

The control unit 21 includes, for example, a CPU, a ROM, and a RAM. The CPU of the control unit 21 controls the operation of the entire device by loading various computer programs stored in the ROM or the storage unit 22 into the RAM and executing them.

The storage unit 22 is a nonvolatile memory such as an EEPROM (Electrically Erasable Programmable Read-Only Memory) and stores various computer programs and data. The computer programs stored in the storage unit 22 include the application program for accessing the utterance generation device 10.

The communication unit 23 includes an interface for communicating with the utterance generation device 10 via the communication network N. When information to be transmitted to the utterance generation device 10 is input from the control unit 21, the communication unit 23 transmits it to the utterance generation device 10; it also outputs information received from the utterance generation device 10 via the communication network N to the control unit 21.

The display unit 24 includes a display device such as a liquid crystal display or an organic EL display and displays information to be presented to the user of the terminal device 20 (for example, the dialogue). The operation unit 25 includes a touch panel and various buttons, accepts operations by the user of the terminal device 20, and outputs the accepted operation information to the control unit 21.

In this embodiment, the terminal device 20 includes the display unit 24, but it may include an audio output device instead of, or in addition to, the display unit 24. In that case, the terminal device 20 can output information to be presented to the user (for example, the dialogue) as speech from the audio output device.

The utterance generation process executed by the utterance generation device 10 is described below.
FIG. 5 is an explanatory diagram outlining the utterance generation process. When the utterance generation device 10 receives a user utterance from the terminal device 20, it first extracts the content and the user's intention from the received utterance. To do so, it uses morphological analysis to decompose the user utterance into a plurality of morphemes. In the example of FIG. 5, the user utterance "学校で学びたい" ("I want to learn at school") received from the terminal device 20 is decomposed into four morphemes: "学校" (school), "で" (a particle meaning "at"), "学び" (learning), and "たい" (an auxiliary expressing "want to"). Likewise, the user utterance "技術経営？" ("Technology management?") is decomposed into two morphemes, "技術経営" (technology management) and "？".

Based on the result of the morphological analysis, the utterance generation device 10 acquires the main parts of speech. One or more of them are taken from the user utterance, starting from the final verb and proceeding in the order verb, noun, adjective, adjectival noun. Among the acquired morphemes, the device recognizes nouns, verbs, and adjectives as content, and recognizes the remaining morphemes, chiefly particles and auxiliary verbs, as intention. In the example of FIG. 5, of the four morphemes "学校", "で", "学び", and "たい", the device recognizes "学校" and "学び" as content and "たい" as the user's intention (here, desire). Of the two morphemes "技術経営" and "？", it recognizes "技術経営" as content and "？" as the user's intention (here, a question). The user intention extracted by the utterance generation device 10 may also include information about the user's personality, emotions, and the state of the dialogue.
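The split between content and intention can be sketched on morphemes that have already been analyzed. This sketch assumes the morphological analyzer has produced (surface, part-of-speech) pairs with simplified English tags, and uses a hypothetical `INTENT_MARKERS` table containing only the two markers from the FIG. 5 example; a real system would need a morphological analyzer and a much larger marker inventory. Functional morphemes not in the table, such as the particle "で", are simply ignored here.

```python
from typing import List, Tuple

# Parts of speech treated as content (simplified English tags, an assumption).
CONTENT_POS = {"noun", "verb", "adjective"}

# Hypothetical table mapping functional morphemes to intention labels.
INTENT_MARKERS = {"たい": "desire", "？": "question"}


def split_content_and_intent(
        morphemes: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (content words, intention labels) from (surface, POS) pairs."""
    content: List[str] = []
    intents: List[str] = []
    for surface, pos in morphemes:
        if pos in CONTENT_POS:
            content.append(surface)
        elif surface in INTENT_MARKERS:
            intents.append(INTENT_MARKERS[surface])
    return content, intents


# FIG. 5, first utterance: 学校 / で / 学び / たい
print(split_content_and_intent(
    [("学校", "noun"), ("で", "particle"), ("学び", "verb"), ("たい", "auxiliary")]))
# → (['学校', '学び'], ['desire'])
```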

The utterance generation device 10 then extracts associated words evoked by the recognized content. This embodiment describes extracting nouns as the associated words, but the associated words may also be verbs or adjectives. While retaining the subject noun or the initial noun as the theme, the utterance generation device 10 combines the main parts of speech of the most recent user utterance, the nouns, verbs, or adjectives of the most recent system utterance, and the theme, and extracts a preset number of nouns by referring to the Word2Vec data 122. For example, when associating nouns with the content of the second user utterance, "技術経営？", the device can take into account not only the content "技術経営" but also the content of the preceding user or system utterance ("学校" and "学び"). After extracting a plurality of nouns associated with the content, the utterance generation device 10 selects one of them to include in the system utterance. Co-occurrence expressions containing a predicate that co-occurs with the selected noun are also extracted as needed. When the associated words extracted from the content are verbs or adjectives, co-occurrence expressions containing a nominal that co-occurs with the verb or adjective may be extracted instead.

Although this embodiment uses the Word2Vec data 122 to extract nouns associated with the recognized content, other data that represent the relationships between words as vectors, such as GloVe or fastText, may of course be used instead.

Also, although in this embodiment associated words are extracted from the recognized content alone, the association may be intentionally biased by adding a preset term. For example, adding the term "物理学" (physics) biases the association toward concepts close to physics.
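A minimal sketch of the association step, assuming that "combining" the theme and the recent utterances means summing their word vectors (the document does not specify the operation) and that candidate nouns are ranked by cosine similarity to that sum. The toy three-dimensional vectors and the function name `associate` are hypothetical stand-ins for the Word2Vec data 122. Passing a bias term such as "物理学" adds its vector to the query, which pulls the scores toward nearby concepts.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def associate(context: Iterable[str], nouns: Dict[str, Vector],
              embeddings: Dict[str, Vector], k: int,
              bias_terms: Iterable[str] = ()) -> List[Tuple[str, float]]:
    """Rank candidate nouns against the summed vectors of context and bias words."""
    words = [w for w in list(context) + list(bias_terms) if w in embeddings]
    if not words:
        return []
    # Element-wise sum of the context (and optional bias) vectors.
    query = [sum(xs) for xs in zip(*(embeddings[w] for w in words))]
    scored = [(n, cosine_similarity(query, v)) for n, v in nouns.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real Word2Vec vectors have hundreds of dimensions.
embeddings = {"学校": [1.0, 0.0, 0.0], "学び": [0.8, 0.2, 0.0],
              "技術経営": [0.5, 0.5, 0.0], "物理学": [0.0, 0.0, 1.0]}
nouns = {"概論": [0.7, 0.7, 0.0], "講義": [0.9, 0.3, 0.1], "量子": [0.1, 0.1, 1.0]}

print([n for n, _ in associate(["技術経営", "学校", "学び"], nouns, embeddings, k=2)])
# → ['講義', '概論']
```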

In this embodiment, the device is allowed to select the same content as that contained in the user utterance (parroting). The example of FIG. 5 shows that, for the content "学び" in the first user utterance, "学校で学びたい", the content of the system utterance was selected by parroting.

Using the intention conversion table 121, the utterance generation device 10 converts the user intention recognized from the user utterance into an intention to be given to the system utterance. In this embodiment, the converted intention may be the same as the user intention recognized from the original utterance. In the example of FIG. 5, the user intention (desire) in the first user utterance is converted into a system-utterance intention of desire, and the user intention (question) in the second user utterance is likewise converted into a system-utterance intention of desire.
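The intention conversion can be pictured as a table lookup. The sketch below is a minimal illustration containing only the two conversions visible in the FIG. 5 example (desire to desire, question to desire); falling back to the user's own intention for unlisted labels is an assumption, consistent with the statement that the converted intention may equal the original one. The actual contents of the intention conversion table 121 are not reproduced here.

```python
from typing import Dict

# Hypothetical excerpt: only the conversions shown in the FIG. 5 example.
INTENT_CONVERSION: Dict[str, str] = {
    "desire": "desire",
    "question": "desire",
}


def convert_intent(user_intent: str,
                   table: Dict[str, str] = INTENT_CONVERSION) -> str:
    """Map a recognized user intention to the intention for the system utterance.

    Unlisted intentions are passed through unchanged (an assumption).
    """
    return table.get(user_intent, user_intent)
```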

The utterance generation device 10 generates a system utterance containing the extracted noun and the converted intention. In the example of FIG. 5, for the first user utterance "学校で学びたい", it generates the system utterance "私も学びたいよ" ("I want to learn too"), which contains the content selected by parroting ("学び") and the converted intention ("たい"). For the second user utterance, it generates the system utterance "概論を取りたいです" ("I'd like to take an introductory course"), which contains the noun (content) extracted by association, "概論" (introductory course); the co-occurrence expression "を取る" ("take"), which contains a predicate that co-occurs with this noun; and the converted intention ("たい").
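Joining the selected word, the co-occurrence expression, and the converted intention amounts to string concatenation once each piece is in the right inflected form. The sketch assumes, hypothetically, that co-occurrence expressions are stored already conjugated into the continuative (連用形) form, for example "を取り" rather than "を取る", and that each intention label maps to a single surface ending. The document does not say how conjugation or sentence-final particles such as the "よ" in "私も学びたいよ" are produced, so those are not modeled.

```python
from typing import Dict, Optional

# Hypothetical surface endings for intention labels.
INTENT_SURFACE: Dict[str, str] = {"desire": "たいです"}


def build_utterance(word: str, cooccurrence: Optional[str], intent: str) -> str:
    """Concatenate word + optional co-occurrence expression + intention ending."""
    parts = [word]
    if cooccurrence:  # a parroted verb stem such as "学び" needs no expression
        parts.append(cooccurrence)
    parts.append(INTENT_SURFACE[intent])
    return "".join(parts)


print(build_utterance("概論", "を取り", "desire"))  # → 概論を取りたいです
print(build_utterance("学び", None, "desire"))      # → 学びたいです
```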

Each time the utterance generation device 10 receives a user utterance from the terminal device 20, it generates a system utterance and returns it to the terminal device 20. The terminal device 20 displays the entered user utterances and the system utterances received from the utterance generation device 10 on the display unit 24 in chronological order.

The process of extracting nouns associated with the content of a user utterance is described in detail below.

FIG. 6 is a diagram showing an example of noun extraction using Word2Vec. Based on the content recognized from the user utterance, the utterance generation device 10 obtains nouns associated with that content from the Word2Vec data 122. These nouns are extracted as the neighboring vocabulary of the content: words located within a predetermined distance of the target content in the vector space, as determined by the cosine distance between the content and each word. The example of FIG. 6 shows part of the neighboring vocabulary obtained from the Word2Vec data 122 for the content "意味" (meaning) in a user utterance such as "意味は？" ("What does it mean?"). The neighboring vocabulary includes nouns such as "いみ" (meaning, written in kana), "わけ" (reason), "意図" (intention), "真意" (true intent), and "ニュアンス" (nuance), as well as verbs such as "分かる" (understand). Because the neighboring vocabulary obtained from the Word2Vec data 122 may contain verbs, adjectives, and other parts of speech in addition to nouns, this embodiment excludes everything other than nouns from extraction.
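The neighboring-vocabulary step can be sketched with plain cosine distance (1 minus cosine similarity). The toy two-dimensional vectors, the part-of-speech dictionary, the function name `neighbor_nouns`, and the threshold `max_distance` are hypothetical; the document states only that a predetermined distance range is used and that non-nouns are excluded.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - (dot / norm if norm else 0.0)


def neighbor_nouns(target: str, embeddings: Dict[str, Sequence[float]],
                   pos_of: Dict[str, str],
                   max_distance: float) -> List[Tuple[str, float]]:
    """Nouns within max_distance of target, nearest first; other POS are dropped."""
    t = embeddings[target]
    found = []
    for word, vec in embeddings.items():
        if word == target or pos_of.get(word) != "noun":
            continue  # skip the target itself and verbs, adjectives, etc.
        d = cosine_distance(t, vec)
        if d <= max_distance:
            found.append((word, d))
    return sorted(found, key=lambda item: item[1])


embeddings = {"意味": [1.0, 0.0], "いみ": [0.99, 0.1], "意図": [0.9, 0.3],
              "分かる": [0.95, 0.2], "天気": [0.0, 1.0]}
pos_of = {"意味": "noun", "いみ": "noun", "意図": "noun",
          "分かる": "verb", "天気": "noun"}
print([w for w, _ in neighbor_nouns("意味", embeddings, pos_of, 0.1)])  # → ['いみ', '意図']
```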

In this embodiment, the extracted nouns are clustered to ease the limits on utterance generation caused by gaps in the language resources. For example, suppose the only language resources are "炭酸水を飲む" (drink carbonated water), "水を飲む" (drink water), and "水が美味しい" (water is delicious). Then the particle-and-verb expression "を飲む" (drink) is the only one connected to "炭酸水" (carbonated water), so if the noun associated with the user utterance is "炭酸水", the only system utterance that can be generated is "炭酸水を飲む". If, however, "炭酸水" and "水" are grouped into the same cluster because both take "を飲む", then either "を飲む" or "が美味しい" (is delicious) can be chosen for "炭酸水", and the system utterance "炭酸水が美味しい" becomes possible in addition to "炭酸水を飲む". Clustering thus widens the range of particles and verbs (or adjectives) that can be naturally connected to a noun in an utterance.
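The effect of clustering on the available expressions can be shown with the carbonated-water example. In this minimal sketch, an illustration rather than the document's exact procedure, a noun may use any co-occurrence expression attested for any member of its cluster; the function name `expressions_for` and the data layout are hypothetical.

```python
from typing import Dict, Iterable, Set


def expressions_for(noun: str, clusters: Iterable[Set[str]],
                    cooc: Dict[str, Set[str]]) -> Set[str]:
    """Co-occurrence expressions usable with noun, pooled over its cluster."""
    for cluster in clusters:
        if noun in cluster:
            pooled: Set[str] = set()
            for member in cluster:
                pooled |= cooc.get(member, set())
            return pooled
    # Unclustered noun: only its own attested expressions.
    return set(cooc.get(noun, set()))


cooc = {"炭酸水": {"を飲む"}, "水": {"を飲む", "が美味しい"}}
print(sorted(expressions_for("炭酸水", [], cooc)))               # → ['を飲む']
print(sorted(expressions_for("炭酸水", [{"炭酸水", "水"}], cooc)))  # → ['が美味しい', 'を飲む']
```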

FIG. 7 illustrates the clustering method of this embodiment, which clusters the nouns obtained from the Word2Vec data 122 using an undirected graph and k-means++. As an example, consider clustering ten nouns: the content "意味" ("meaning") recognized from the user utterance, plus the nine nouns obtained from the Word2Vec data 122 as its neighboring vocabulary, namely "いみ" (imi), "わけ" (wake), "意図" ("intention"), "真意" ("true intention"), "ニュアンス" ("nuance"), "真偽" ("truth"), "文脈" ("context"), "仕組み" ("mechanism"), and "理由" ("reason"). The utterance generation device 10 builds an undirected graph from these ten nouns and from the co-occurrence expressions containing predicates that co-occur with each noun. A co-occurrence expression consists, for example, of a verb (or adjective) that co-occurs with the noun and the particle connecting the noun to that verb (or adjective). Each noun is positioned on the undirected graph according to the number of co-occurrence expressions connected to it.
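The text does not spell out how the nouns are laid out on the undirected graph. One plausible reading, sketched below under that assumption, is a bipartite graph linking each noun to its co-occurrence expressions: two nouns are related by the number of expression nodes they share, and each noun's adjacency row can serve as a feature vector for clustering. All function names here are hypothetical.

```python
from itertools import combinations


def build_graph(cooccurrences):
    """Build an undirected bipartite graph as an adjacency dict.

    Noun nodes are ("N", noun); expression nodes are ("E", expression)."""
    graph = {}
    for noun, exprs in cooccurrences.items():
        graph.setdefault(("N", noun), set())
        for expr in exprs:
            graph.setdefault(("E", expr), set())
            # Undirected edge: record it in both directions.
            graph[("N", noun)].add(("E", expr))
            graph[("E", expr)].add(("N", noun))
    return graph


def shared_counts(graph):
    """Number of co-occurrence expressions shared by each pair of nouns."""
    nouns = sorted(name for kind, name in graph if kind == "N")
    return {
        (a, b): len(graph[("N", a)] & graph[("N", b)])
        for a, b in combinations(nouns, 2)
    }


def feature_vectors(graph):
    """Represent each noun by its adjacency row over the expression nodes,
    so nouns sharing many expressions end up close together."""
    exprs = sorted(name for kind, name in graph if kind == "E")
    nouns = sorted(name for kind, name in graph if kind == "N")
    return exprs, {
        n: [1.0 if ("E", e) in graph[("N", n)] else 0.0 for e in exprs] for n in nouns
    }


cooc = {"炭酸水": ["を飲む"], "水": ["を飲む", "が美味しい"], "意味": ["が分かる"]}
g = build_graph(cooc)
```

Here "水" and "炭酸水" share one expression ("を飲む") while "意味" shares none with either, so a distance-based clusterer applied to the feature vectors would tend to group the first two.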

The utterance generation device 10 clusters the ten nouns placed on the undirected graph using k-means++. The method places as many centroids as the desired number of clusters and then repeats two steps: associating each data point (here, each noun) with its nearest centroid, and moving each centroid to the mean of the data associated with it. The number of clusters must be set in advance for k-means++, but it may instead be set automatically using the elbow method.
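The clustering step can be sketched in plain Python. k-means++ differs from plain k-means only in how the initial centroids are seeded (distance-weighted sampling); the assign-and-update loop is the one described above. Running several restarts and keeping the best result is a common practice added here, and locating the elbow by the largest second difference of the inertia curve is one heuristic among several. Both are assumptions, since the text does not specify them.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _kmeans_once(points, k, rng, iters):
    # k-means++ seeding: first centroid uniformly at random, each further one drawn
    # with probability proportional to squared distance from the nearest centroid.
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            centroids.append(list(rng.choice(points)))
        else:
            centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    labels = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def kmeans_pp(points, k, n_init=10, iters=100, seed=0):
    """Run k-means++ n_init times and keep the lowest-inertia (labels, centroids)."""
    rng = random.Random(seed)
    runs = [_kmeans_once(points, k, rng, iters) for _ in range(n_init)]
    return min(runs, key=lambda run: inertia(points, *run))


def elbow_k(points, k_max):
    """Pick k where the inertia curve bends most (largest second difference)."""
    ks = list(range(1, min(k_max, len(points)) + 1))
    curve = [inertia(points, *kmeans_pp(points, k)) for k in ks]
    if len(curve) < 3:
        return ks[-1]
    bend = {
        ks[i]: (curve[i - 1] - curve[i]) - (curve[i] - curve[i + 1])
        for i in range(1, len(curve) - 1)
    }
    return max(bend, key=bend.get)


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [20, 0], [20, 1], [21, 0]]
```

On these three well-separated groups, `kmeans_pp(points, 3)` recovers the groups and `elbow_k(points, 5)` picks 3. In the device, the points would be the nouns' graph-derived feature vectors rather than 2-D coordinates.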

In the example of FIG. 7, the three nouns "意味" ("meaning"), "ニュアンス" ("nuance"), and "文脈" ("context") form the first cluster; the three nouns "真偽" ("truth"), "真意" ("true intention"), and "意図" ("intention") form the second cluster; and the four nouns "わけ" (wake), "いみ" (imi), "仕組み" ("mechanism"), and "理由" ("reason") form the third cluster.

For simplicity, this embodiment has described clustering ten nouns into three clusters, but neither the number of nouns to be clustered nor the number of clusters is limited to the example of FIG. 7. For example, about 50 nouns may be obtained from a piece of content, in which case clustering may produce about 8 to 16 clusters.

The utterance generation device 10 then refers to the clustering results to select the noun and co-occurrence expression to include in the system utterance.

FIG. 8 illustrates the selection of nouns and co-occurrence expressions. It schematically shows the flow from extracting a plurality of nouns from the content recognized in the user utterance to selecting the noun and co-occurrence expression to include in the system utterance. When the utterance generation device 10 recognizes content in a user utterance, it extracts the nouns associated with that content from the Word2Vec data 122, as described above, and also reads the co-occurrence expressions containing predicates that co-occur with each extracted noun. A co-occurrence expression includes, for example, a particle and a verb, or a particle and an adjective. A noun extracted from the Word2Vec data 122 may be identical to the content recognized from the original user utterance (a "parrot"), and a co-occurrence expression read along with a noun may likewise be identical to the co-occurrence expression in the original user utterance (also a parrot). In FIG. 8, nouns and co-occurrence expressions extracted as parrots are hatched.

Once it has the nouns extracted from the Word2Vec data 122 and the co-occurrence expressions read with them, the utterance generation device 10 builds an undirected graph and clusters the nouns with k-means++, as described above. In the example of FIG. 8, n clusters are generated: the first cluster contains three nouns, and the n-th cluster contains four nouns, including a parrot. The co-occurrence expressions associated with a cluster are the union of the co-occurrence expressions of every noun in that cluster. Thus, even if "が美味しい" ("is delicious") is not among the co-occurrence expressions of "炭酸水" ("carbonated water"), it is included in that cluster as long as it is a co-occurrence expression of "水" ("water"), which belongs to the same cluster.
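The cluster-level pooling of co-occurrence expressions can be illustrated with the carbonated-water example. This is a hypothetical sketch that assumes clusters are given as lists of nouns and co-occurrence data as a dict mapping each noun to a set of expressions.

```python
def pool_cluster_expressions(clusters, cooccurrences):
    """For each cluster (a list of nouns), return the union of its members' expressions."""
    pooled = []
    for members in clusters:
        exprs = set()
        for noun in members:
            exprs.update(cooccurrences.get(noun, ()))
        pooled.append(exprs)
    return pooled


def phrases_for(noun, clusters, pooled):
    """All noun + expression strings available to `noun` through its cluster."""
    for members, exprs in zip(clusters, pooled):
        if noun in members:
            return sorted(noun + e for e in exprs)
    return []


clusters = [["炭酸水", "水"], ["意味"]]
cooc = {"炭酸水": {"を飲む"}, "水": {"を飲む", "が美味しい"}, "意味": {"が分かる"}}
pooled = pool_cluster_expressions(clusters, cooc)
```

Although "炭酸水" itself only co-occurs with "を飲む", pooling lets it borrow "が美味しい" from "水", so `phrases_for("炭酸水", clusters, pooled)` yields both "炭酸水が美味しい" and "炭酸水を飲む".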

When selecting the noun and co-occurrence expression to include in the system utterance, the utterance generation device 10 first selects one of the generated clusters at random. It then selects one noun at random from the nouns in that cluster, and one co-occurrence expression at random from the set of co-occurrence expressions associated with the cluster. In the example of FIG. 8, the n-th cluster is chosen from among the first to n-th clusters, one noun is chosen from the n-th cluster, and one co-occurrence expression is chosen from the set associated with the n-th cluster.
When the system parrots the user utterance back, the noun and co-occurrence expression included as parrots are selected.
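The three random choices, plus the parrot case, might look like the following sketch. It assumes each cluster is represented as a pair (set of nouns, pooled set of expressions) and treats parroting as simply returning the user's own noun and expression; the function name and the `parrot` argument are illustrative, not from the source.

```python
import random


def select_noun_and_expression(clusters, rng, parrot=None):
    """clusters: list of (nouns, expressions) pairs of sets.

    parrot: optional (noun, expression) taken from the user utterance; when given,
    it is returned unchanged so the system echoes the user back."""
    if parrot is not None:
        return parrot
    nouns, exprs = rng.choice(clusters)  # 1. pick one cluster uniformly at random
    noun = rng.choice(sorted(nouns))     # 2. pick one noun from that cluster
    expr = rng.choice(sorted(exprs))     # 3. pick one expression from the pooled set
    return noun, expr


clusters = [({"炭酸水", "水"}, {"を飲む", "が美味しい"}), ({"意味", "意図"}, {"が分かる"})]
rng = random.Random(0)
```

Sorting the sets before `rng.choice` is needed because `random.choice` requires an indexable sequence, and it also makes runs reproducible for a fixed seed.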

Through the above processing, the utterance generation device 10 obtains the noun and co-occurrence expression to include in the system utterance.

Next, the utterance generation device 10 converts the intention recognized from the user utterance into the intention to be given to the system utterance. In this embodiment, the conversion uses an intention conversion table 121 that defines the relationship between intentions in user utterances and intentions in system utterances.

FIGS. 9 and 10 are conceptual diagrams showing an example of the intention conversion table 121, which consists of a first table 121A shown in FIG. 9 and a second table 121B shown in FIG. 10.

In the first table 121A shown in FIG. 9, the first column holds tags that classify the intention of the user utterance, and the second to tenth columns hold candidates for the intention of the system utterance. In this embodiment, user utterances are classified into nine intentions: [present/affirmative/declarative], [past/affirmative/declarative], [*1/negative/declarative], [present/affirmative/interrogative], [past/affirmative/interrogative], [*1/negative/interrogative], [present/affirmative/7W2H], [past/affirmative/7W2H], and [*1/negative/7W2H].
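The text does not say how an utterance is assigned one of the nine tags. Purely as an illustration, the hypothetical rule-based tagger below looks at surface cues in Japanese: a negative marker, a past-tense ending, a question mark or a final か/の, and an interrogative word for 7W2H. It assumes "*1" means that tense is not distinguished for negative utterances. A real system would work from morphological analysis rather than string matching, and these rules will misfire on many inputs.

```python
import re

# Hypothetical cue lists; string matching is only a stand-in for morphological analysis.
WH_WORDS = ("何", "なに", "誰", "だれ", "いつ", "どこ", "なぜ", "どうして", "どう", "どれ", "いくら")
NEGATIVE_MARKERS = ("ない", "なかった", "ません")
PAST_ENDING = re.compile(r"た[かのよね]?$")  # e.g. ...た, ...たの, ...たか


def classify_intent(utterance):
    """Return a tag such as "past/affirmative/declarative" (assumed labeling scheme)."""
    text = utterance.strip()
    is_question = text.endswith(("？", "?"))
    core = text.rstrip("？?！!。、 ")
    if core.endswith(("か", "の")):
        is_question = True
    if any(m in core for m in NEGATIVE_MARKERS):
        tense, polarity = "*1", "negative"  # assumed: tense not distinguished
    else:
        tense = "past" if PAST_ENDING.search(core) else "present"
        polarity = "affirmative"
    if any(w in core for w in WH_WORDS):
        kind = "7W2H"
    elif is_question:
        kind = "interrogative"
    else:
        kind = "declarative"
    return f"{tense}/{polarity}/{kind}"
```

For "勉強をしたかった" ("I wanted to study") the sketch returns "past/affirmative/declarative", matching the classification given in the example that follows.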

For example, the user utterance "勉強をしたかった" ("I wanted to study") is classified as [past/affirmative/declarative], so in the example first table 121A the row "0x00000002" supplies the candidates. That is, the candidates for the system utterance intention are "0x00000000" through "0xFFFFC000", and the utterance generation device 10 chooses one of them at random using preset weights.

If, for example, the utterance generation device 10 chooses "0x00000000" as the system utterance candidate, it looks up the page number registered in the first table 121A (page 10 in this example) and reads the second table 121B for that page from the storage unit 12.

FIG. 10 shows page 10 of the second table 121B read from the storage unit 12. The only input to the second table 121B is the intention tag of the user utterance. Because the user utterance "I wanted to study" expresses "subject/desire", the "subject/desire" row of the second table 121B supplies the candidates. The candidate intentions for the system utterance are therefore the six entries "potential/plain", "potential/desire", "subject/desire", "subject/hope", "obligation", and "proposal". The utterance generation device 10 chooses one of them at random using preset weights. If, for example, the weighted random choice yields "proposal", the intention of the user utterance (subject/desire) is converted into the intention to be given to the system utterance (proposal).
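The two-stage lookup with weighted random choice can be sketched as follows. The weights, page 11, and the mapping of 0xFFFFC000 to it are hypothetical placeholders; only row 0x00000002, candidate 0x00000000 pointing to page 10, and the six candidate intentions of the "subject/desire" row come from the description. The weighted draw uses the standard `random.Random.choices`.

```python
import random

# First table 121A (hypothetical excerpt): user tag ID -> [(candidate ID, weight, page)].
FIRST_TABLE = {
    0x00000002: [(0x00000000, 3, 10), (0xFFFFC000, 1, 11)],
}
# Second table 121B pages (hypothetical weights): page -> user intent -> [(system intent, weight)].
SECOND_TABLE = {
    10: {"subject/desire": [("potential/plain", 1), ("potential/desire", 1),
                            ("subject/desire", 1), ("subject/hope", 1),
                            ("obligation", 1), ("proposal", 2)]},
    11: {"subject/desire": [("proposal", 1)]},
}


def weighted_pick(options, rng):
    """Draw one item from [(item, weight), ...] with probability proportional to weight."""
    items, weights = zip(*options)
    return rng.choices(items, weights=weights, k=1)[0]


def convert_intent(user_tag_id, user_intent, rng):
    """Stage 1: pick a candidate (and its page) from table A.
    Stage 2: pick the system intent from that page of table B."""
    candidates = FIRST_TABLE[user_tag_id]
    cid, _, page = rng.choices(candidates, weights=[w for _, w, _ in candidates], k=1)[0]
    return cid, weighted_pick(SECOND_TABLE[page][user_intent], rng)


rng = random.Random(1)
```

Keeping the weights in the tables rather than in code matches the text's note that tables may be swapped or selected parametrically.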

The cell where the "subject/desire" row meets the "proposal" column contains six "S" entries. These are control flags that govern whether noun association and verb (adjective) association are performed: the S that is selected determines whether nouns are associated or not, and whether verbs (adjectives) are associated or not.

The content may also be recognized from the user utterance while taking information about the user utterance's intention into account.

In this embodiment the single intention conversion table 121 converts the intention recognized from the user utterance into the intention to be given to the system utterance, but a plurality of tables may instead be prepared and one selected parametrically, and the selection parameter may be linked to other information so that it changes dynamically. The intention conversion table 121 may also be learned by machine learning from dialogue data.

Parameters such as the user's attributes, emotions, and personality, the character settings, the conversation state, and the flow of the conversation may also be used when converting the intention recognized from the user utterance into the intention to be given to the system utterance.

The procedure executed by the utterance generation device 10 is described below.
FIG. 11 is a flowchart of the procedure executed by the utterance generation device 10. The control unit 11 of the utterance generation device 10 receives, via the communication unit 13, a user utterance transmitted from the terminal device 20 (step S101). The control unit 11 determines whether the input user utterance is a fixed phrase (step S102); if it is (S102: YES), the control unit 11 outputs a fixed phrase as the system utterance (step S103). The system utterance output from the control unit 11 is transmitted to the terminal device 20 via the communication unit 13.
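A minimal sketch of the fixed-phrase branch (S102/S103), assuming a hypothetical dictionary that maps fixed phrases to canned responses; returning `None` stands for the S102: NO path, where generation continues.

```python
# Hypothetical fixed-phrase table: user phrase -> canned system utterance.
FIXED_RESPONSES = {
    "おはよう": "おはようございます",
    "ありがとう": "どういたしまして",
}


def respond_fixed(utterance):
    """Return the canned response for a fixed phrase (S102: YES), else None (S102: NO)."""
    key = utterance.strip().rstrip("！!。")
    return FIXED_RESPONSES.get(key)
```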

If the user utterance is not a fixed phrase (S102: NO), the control unit 11 recognizes the content and intention of the user utterance (step S104). Specifically, the control unit 11 performs morphological analysis on the user utterance to obtain its main parts of speech, recognizes the morphemes that are nouns as the content, and recognizes the user's intention from the morphemes other than nouns, verbs, and adjectives.
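Given the output of a morphological analyzer, the content/intention split of step S104 reduces to filtering by part of speech. The sketch assumes the utterance has already been tokenized into (surface, part-of-speech) pairs with simplified English labels; the segmentation of "勉強をしたかった" in the usage example is illustrative, not the output of any particular analyzer.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str]  # (surface form, simplified part of speech)


def split_content_and_intent(morphemes: List[Morpheme]) -> Tuple[List[str], List[str]]:
    """Nouns become the content; everything except nouns, verbs, and adjectives
    (particles, auxiliaries, ...) is kept as evidence of the user's intention."""
    content = [surface for surface, pos in morphemes if pos == "noun"]
    intent = [surface for surface, pos in morphemes
              if pos not in ("noun", "verb", "adjective")]
    return content, intent


tokens = [("勉強", "noun"), ("を", "particle"), ("し", "verb"),
          ("たかっ", "auxiliary"), ("た", "auxiliary")]
```

For these tokens the content is `["勉強"]` and the intention evidence is the particle and the auxiliaries たかっ/た, which carry the desire and past tense.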

Next, the control unit 11 refers to the Word2Vec data 122 and extracts nouns associated with the recognized content (step S105). At this point the control unit 11 also reads the co-occurrence expressions connected to each extracted noun.

Next, the control unit 11 builds an undirected graph from the extracted nouns and co-occurrence expressions (step S106) and clusters the nouns placed on the graph (step S107). The clustering uses, for example, k-means++, and the number of clusters may be set automatically with the elbow method where necessary.

Next, the control unit 11 randomly selects one of the clusters generated in step S107 (step S108). It then randomly selects one noun from the nouns in the selected cluster, and randomly selects one co-occurrence expression from the set of co-occurrence expressions associated with that cluster (step S109).

Next, the control unit 11 refers to the intention conversion table 121 and converts the intention of the user utterance into the intention to be given to the system utterance (step S110). This embodiment performs the intention conversion after extracting the nouns associated with the content of the user utterance and selecting the noun and co-occurrence expression for the system utterance, but the order may be reversed so that intention conversion comes first, or the two procedures may be performed in parallel.

Next, the control unit 11 generates the system utterance from the noun and co-occurrence expression selected in step S109 and the system utterance intention obtained in step S110 (step S111). In doing so, the control unit 11 adjusts the conjugation of the verb or adjective in the co-occurrence expression so that it connects properly to the system utterance intention. The generated system utterance is transmitted to the terminal device 20 via the communication unit 13.
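Step S111 can be illustrated with a toy realizer. The conjugation lexicon and the intention-to-pattern mapping below are hypothetical; a real system would derive conjugated forms from a morphological dictionary rather than a hand-written table, and would cover adjectives as well as verbs.

```python
# Hypothetical conjugation lexicon: dictionary form -> forms used by the patterns below.
CONJUGATIONS = {
    "飲む": {"stem": "飲み", "ta": "飲んだ"},
    "分かる": {"stem": "分かり", "ta": "分かった"},
}
# Hypothetical mapping: system intention -> (required verb form, suffix to attach).
INTENT_PATTERNS = {
    "subject/desire": ("stem", "たい"),   # "I want to ..."
    "proposal": ("ta", "らどうかな"),      # "How about ...?"
}


def realize(noun, expression, intent):
    """Join noun + particle + conjugated verb + intention suffix.

    expression is a (particle, verb in dictionary form) pair, e.g. ("を", "飲む")."""
    particle, verb = expression
    form, suffix = INTENT_PATTERNS[intent]
    return noun + particle + CONJUGATIONS[verb][form] + suffix
```

With the noun "炭酸水" and the expression ("を", "飲む"), the "proposal" intention yields "炭酸水を飲んだらどうかな" ("How about drinking carbonated water?"), while "subject/desire" yields "炭酸水を飲みたい".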

As described above, this embodiment recognizes the content and intention of the user utterance and then generates the system utterance from a plurality of nouns associated with the content and co-occurrence expressions containing predicates that co-occur with those nouns. The dialogue can therefore proceed without relying on predefined patterns such as scenarios, and the user can be offered a dialogue environment aimed at evoking mental changes such as healing, therapy, advice, and idea support.

Although this embodiment generates the system utterance from the user utterance, multimodal information obtained from the terminal device 20, such as a smartphone or an AR device (for example, visual, auditory, acceleration, calendar, and e-mail information), may also be used as auxiliary input when generating the system utterance.

Furthermore, information about the user and about the character that speaks the system utterance may be stored in the storage unit 12 and taken into account when generating the system utterance.

(Embodiment 2)
Embodiment 2 describes a configuration that evaluates the mental vocabulary associated with the content and generates a system utterance based on the evaluation result.
The overall system configuration and the internal configurations of the utterance generation device 10 and the terminal device 20 are the same as in Embodiment 1, so their description is omitted.

FIG. 12 illustrates the system utterance generation method of Embodiment 2, which aims to respond to a user utterance such as "二日酔いです" ("I have a hangover") with a system utterance such as "辛いね" ("That must be rough"). When a user utterance is input to the utterance generation device 10, it is decomposed into its main parts of speech by morphological analysis and the content is recognized, as in Embodiment 1. From "I have a hangover", for example, "二日酔い" ("hangover") is recognized as the content.

Next, the utterance generation device 10 uses the Word2Vec data 122 to compute the cosine distance between "hangover" and pairs of mental vocabulary such as (happy, sad), (relieved, worried), and (fun, pain), and performs a Sens analysis: when the cosine distance between the two points exceeds a threshold, it determines which member of the pair the content is closer to. In the example of FIG. 12, the value for "苦痛" ("pain") is close to 1 and exceeds the threshold. Because a value near 1 indicates closeness, this "cosine distance" behaves as a cosine similarity.

The utterance generation device 10 retrieves from an utterance template the system utterance stored in association with the evaluation result "pain" ("辛い", "rough", in the example of FIG. 12) and generates "辛いね" ("That must be rough") as the system utterance.
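The Sens analysis and template lookup can be sketched as follows, assuming toy three-dimensional vectors in place of the Word2Vec data 122, an illustrative threshold of 0.9, and a hypothetical template table. The score is treated as a cosine similarity (larger means closer), consistent with the description of a value near 1 exceeding the threshold.

```python
import math

# Toy vectors (hypothetical) for content words and mental-vocabulary poles.
VEC = {
    "二日酔い": [0.1, 0.9, 0.2],
    "机": [0.0, 0.0, 1.0],
    "嬉しい": [0.8, 0.0, 0.3], "悲しい": [0.2, 0.5, 0.6],
    "安心": [0.7, 0.2, 0.4], "心配": [0.3, 0.4, 0.8],
    "楽しい": [0.9, 0.1, 0.0], "苦痛": [0.1, 0.95, 0.1],
}
POLE_PAIRS = [("嬉しい", "悲しい"), ("安心", "心配"), ("楽しい", "苦痛")]
# Hypothetical utterance templates keyed by the winning pole.
TEMPLATES = {"苦痛": "辛い", "悲しい": "悲しい", "心配": "心配だ",
             "嬉しい": "嬉しい", "安心": "よかった", "楽しい": "楽しい"}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def sens_analysis(word, threshold=0.9):
    """For each pole pair take the pole closer to `word`; keep it only if its
    similarity exceeds `threshold`. Return the strongest surviving pole, or None."""
    best = None
    for pair in POLE_PAIRS:
        sim, pole = max((cosine_similarity(VEC[word], VEC[p]), p) for p in pair)
        if sim > threshold and (best is None or sim > best[0]):
            best = (sim, pole)
    return None if best is None else best[1]


def sens_response(word):
    """Template utterance for the evaluated pole, or None if no pole is close enough."""
    pole = sens_analysis(word)
    return None if pole is None else TEMPLATES[pole] + "ね"
```

With these vectors "二日酔い" lands nearest "苦痛" (similarity about 0.99) and yields "辛いね", while a neutral word such as "机" ("desk") clears no threshold, so the caller can fall back to the Embodiment 1 pipeline.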

System utterance generation by Sens analysis in Embodiment 2 need not be performed every time; it may be performed at random.

The embodiments disclosed herein are illustrative in all respects and should not be considered restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

Reference Signs List
10 utterance generation device
11 control unit
12 storage unit
120 utterance generation program
121 intention conversion table
122 Word2Vec data
13 communication unit
14 display unit
15 operation unit
20 terminal device
21 control unit
22 storage unit
23 communication unit
24 display unit
25 operation unit

Claims

1. An utterance generation device that generates a system utterance to be output in response to an input user utterance, the device comprising:
a recognition unit that recognizes a user's intention from the user utterance;
an acquisition unit that extracts a noun included in the user utterance and acquires a plurality of associative words associated with the extracted noun, together with a plurality of co-occurrence expressions each including a predicate that co-occurs with the noun or with one of the associative words;
a classification unit that classifies the noun and the plurality of associative words into a plurality of clusters based on the number of co-occurrence expressions shared among the noun and the associative words;
a selection unit that selects one noun or associative word from one of the plurality of clusters, and selects one co-occurrence expression that co-occurs with any noun or associative word included in that cluster;
a conversion unit that converts the user's intention recognized by the recognition unit into an intention to be given to the system utterance; and
an utterance generation unit that connects the noun or associative word selected by the selection unit with the co-occurrence expression selected by the selection unit, and connects the intention converted by the conversion unit to the co-occurrence expression, thereby generating the system utterance.

2. The utterance generation device according to claim 1, wherein the selection unit randomly selects the one cluster, the one noun or associative word, and the one co-occurrence expression.

3. The utterance generation device according to claim 1, wherein the acquisition unit acquires, as the plurality of associative words, words located within a predetermined distance range from the noun in a language space.

4. The utterance generation device according to any one of claims 1 to 3, wherein the associative words are nouns, verbs, or adjectives.

5. The utterance generation device according to any one of claims 1 to 4, further comprising an intention conversion table that defines the relationship between intentions recognized from user utterances and intentions to be given to system utterances,
wherein the conversion unit refers to the intention conversion table and converts the intention recognized from an input user utterance into the intention to be given to the system utterance.

6. The utterance generation device according to claim 5, wherein the intention conversion table includes a plurality of candidates for the intention to be given to the system utterance, and
the conversion unit selects one intention to be given to the system utterance from the candidates included in the intention conversion table.

7. The utterance generation device according to any one of claims 1 to 6, further comprising a fixed-phrase output unit that, when the user utterance is a fixed phrase, outputs a system utterance containing a fixed phrase instead of the system utterance generated by the utterance generation unit.

8. An utterance generation method in which a computer generates a system utterance to be output in response to an input user utterance, wherein the computer:
recognizes the user's intention from the user utterance;
extracts a noun included in the user utterance;
acquires a plurality of associative words associated with the extracted noun, together with a plurality of co-occurrence expressions each including a predicate that co-occurs with the noun or with one of the associative words;
classifies the noun and the plurality of associative words into a plurality of clusters based on the number of co-occurrence expressions shared among the noun and the associative words;
selects one noun or associative word from one of the plurality of clusters, and selects one co-occurrence expression that co-occurs with any noun or associative word included in that cluster;
converts the recognized intention of the user into an intention to be given to the system utterance; and
connects the selected noun or associative word with the selected co-occurrence expression, and connects the converted intention to the co-occurrence expression, thereby generating the system utterance.

9. A computer program for causing a computer to execute processing that generates a system utterance to be output in response to an input user utterance, the processing causing the computer to:
recognize the user's intention from the user utterance;
extract a noun included in the user utterance;
acquire a plurality of associative words associated with the extracted noun, together with a plurality of co-occurrence expressions each including a predicate that co-occurs with the noun or with one of the associative words;
classify the noun and the plurality of associative words into a plurality of clusters based on the number of co-occurrence expressions shared among the noun and the associative words;
select one noun or associative word from one of the plurality of clusters, and select one co-occurrence expression that co-occurs with any noun or associative word included in that cluster;
convert the recognized intention of the user into an intention to be given to the system utterance; and
connect the selected noun or associative word with the selected co-occurrence expression, and connect the converted intention to the co-occurrence expression, thereby generating the system utterance.