JP2022103472A

JP2022103472A - Information processor, information processing method, and program

Info

Publication number: JP2022103472A
Application number: JP2020218112A
Authority: JP
Inventors: 和哉渡邉; Kazuya Watanabe
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2022-07-08
Anticipated expiration: 2040-12-28
Also published as: JP7449852B2

Abstract

To provide an information processor capable of improving the usability of a voice user interface. SOLUTION: An information processor in an embodiment includes: an extraction unit that extracts named entities from the utterances of each of multiple target users; a determination unit that determines, for each target user and based on that user's behavior history, whether or not the target user has visited a specific point where the number of visits has increased sharply; a first generation unit that generates, for each target user, a feature value combining the named entities extracted by the extraction unit and the determination result of the determination unit; an analysis unit that clusters the multiple target users for whom the feature values have been generated by the first generation unit; and a second generation unit that generates, for each cluster produced by the clustering, a dictionary for at least one of speech recognition and natural language understanding. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing device, an information processing method, and a program.

Voice user interfaces that use speech recognition technology are known (see, for example, Patent Documents 1 to 4).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2017-58674
Patent Document 2: Japanese Translation of PCT International Application Publication No. 2019-536172
Patent Document 3: Japanese Unexamined Patent Application Publication No. 2014-106523
Patent Document 4: Japanese Translation of PCT International Application Publication No. 2004-502985

The content of what users say to a voice user interface, and the named entities contained in those utterances, change with the communities the users belong to, with current trends, and so on. Although the users who speak and the information sources behind their utterances have become diverse, conventional techniques have not kept up with this diversification, and the usability of voice user interfaces has in some cases been insufficient.

Aspects of the present invention have been made in view of these circumstances, and one object is to provide an information processing device, an information processing method, and a program capable of improving the usability of a voice user interface.

The information processing device, information processing method, and program according to the present invention adopt the following configurations.
(1) A first aspect of the present invention is an information processing device including: an extraction unit that extracts named entities from the utterances of each of a plurality of target users; a determination unit that determines, for each target user and based on the behavior history of that target user, whether or not the target user has visited a specific point where the number of visits has surged; a first generation unit that generates, for each target user, a feature amount combining the named entities extracted by the extraction unit and the determination result of the determination unit; an analysis unit that clusters the plurality of target users for whom the feature amounts have been generated by the first generation unit; and a second generation unit that generates, for each cluster produced by the clustering, a dictionary for at least one of speech recognition and natural language understanding.

(2) A second aspect of the present invention is the information processing device of the first aspect, in which the specific point is a point where the number of visits by other users is equal to or greater than a threshold, or a point where the rate of increase per predetermined period in the number of visits by the other users is equal to or greater than a threshold.

(3) A third aspect of the present invention is the information processing device of the first or second aspect, further including a collection unit that collects, for each cluster, co-occurring expressions of the named entities extracted from the utterances of the target users belonging to that cluster, in which the second generation unit generates, for each cluster, the dictionary including the co-occurring expressions collected by the collection unit.

(4) A fourth aspect of the present invention is the information processing device of any one of the first to third aspects, further including a providing unit that provides a target user belonging to a specific cluster among the plurality of clusters with usage guidance information for the dictionary, among the plurality of dictionaries, that corresponds to the specific cluster.

(5) A fifth aspect of the present invention is the information processing device of the fourth aspect, in which the second generation unit generates a new dictionary combining the dictionary generated for each cluster with an existing dictionary, and the providing unit provides the target user belonging to the specific cluster with usage guidance information for the new dictionary in which the dictionary corresponding to the specific cluster and the existing dictionary are combined.

(6) A sixth aspect of the present invention is the information processing device of any one of the first to fifth aspects, further including a verification unit that verifies the dictionary based on utterances of the target user within a predetermined group of users.

(7) A seventh aspect of the present invention is the information processing device of any one of the first to sixth aspects, in which the first generation unit generates, as the feature amount, a combination of a first feature amount based on the named entities and a second feature amount based on the determination result of the determination unit.

(8) An eighth aspect of the present invention is the information processing device of the seventh aspect, in which the second feature amount includes a feature amount representing one or both of whether the specific point has been visited and the number of visits to the specific point.

(9) A ninth aspect of the present invention is the information processing device of any one of the first to eighth aspects, in which the named entities include the wording of a place name or a mark.

(10) A tenth aspect of the present invention is an information processing method in which a computer: extracts named entities from the utterances of each of a plurality of target users; determines, for each target user and based on the behavior history of that target user, whether or not the target user has visited a specific point where the number of visits has surged; generates, for each target user, a feature amount combining the extracted named entities and the result of the determination; clusters the plurality of target users for whom the feature amounts have been generated; and generates, for each cluster produced by the clustering, a dictionary for at least one of speech recognition and natural language understanding.

(11) An eleventh aspect of the present invention is a program for causing a computer to: extract named entities from the utterances of each of a plurality of target users; determine, for each target user and based on the behavior history of that target user, whether or not the target user has visited a specific point where the number of visits has surged; generate, for each target user, a feature amount combining the extracted named entities and the result of the determination; cluster the plurality of target users for whom the feature amounts have been generated; and generate, for each cluster produced by the clustering, a dictionary for at least one of speech recognition and natural language understanding.
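As a non-limiting illustration, the flow shared by the first, second, seventh, and eighth aspects can be sketched in Python as follows. The function names, the threshold values, the bag-of-entities encoding, the sample place names, and the small k-means routine are all assumptions made for this sketch; the aspects above do not prescribe a particular feature encoding or clustering algorithm.

```python
from collections import Counter, defaultdict


def is_specific_point(visits_per_period, count_threshold=100, growth_threshold=2.0):
    """Check the two alternative conditions of aspect (2).

    visits_per_period: visit counts by other users, oldest period first.
    Both thresholds are illustrative values, not taken from the source.
    """
    if sum(visits_per_period) >= count_threshold:
        return True
    if len(visits_per_period) < 2 or visits_per_period[-2] == 0:
        return False
    return visits_per_period[-1] / visits_per_period[-2] >= growth_threshold


def user_vector(entities, visits_to_specific_points, vocabulary):
    """Combine a first feature (entity counts) with a second feature (visits)."""
    counts = Counter(entities)
    first = [counts[entity] for entity in vocabulary]
    # Aspect (8): presence of a visit and number of visits.
    second = [1 if visits_to_specific_points > 0 else 0, visits_to_specific_points]
    return first + second


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Tiny k-means (assumed algorithm); requires k <= len(vectors)."""
    centroids = [list(v) for v in vectors[:k]]  # deterministic initialisation
    labels = [0] * len(vectors)
    for _ in range(iterations):
        for i, v in enumerate(vectors):
            labels[i] = min(range(k), key=lambda c: squared_distance(v, centroids[c]))
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def build_dictionaries(user_entities, labels):
    """Generate one dictionary (here, a set of entities) per cluster."""
    dictionaries = defaultdict(set)
    for entities, label in zip(user_entities, labels):
        dictionaries[label].update(entities)
    return dict(dictionaries)


# Hypothetical data: named entities per user and visits to specific points.
vocabulary = ["Cafe Aoi", "Mt. Kita"]
user_entities = [
    ["Cafe Aoi", "Cafe Aoi"],
    ["Mt. Kita"],
    ["Cafe Aoi"],
    ["Mt. Kita", "Mt. Kita"],
]
visit_counts = [3, 0, 2, 0]

vectors = [user_vector(e, n, vocabulary) for e, n in zip(user_entities, visit_counts)]
labels = kmeans(vectors, k=2)
dictionaries = build_dictionaries(user_entities, labels)
```

In this sketch the per-cluster dictionary is just the set of entities spoken by the cluster's members; a practical dictionary for speech recognition or natural language understanding would carry more information, such as readings or intents.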

According to the above aspects, the usability of a voice user interface can be improved.
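As one illustration of the third aspect above, the per-cluster collection of co-occurring expressions might look like the following Python sketch. It assumes that utterances are already tokenized and that a "co-occurring expression" is any other token in an utterance that contains a named entity; the function name and the sample tokens are hypothetical, and Japanese text would in practice need a morphological analyzer for tokenization.

```python
from collections import Counter, defaultdict


def collect_cooccurrences(utterances, cluster_of, named_entities):
    """Count, per cluster, tokens that appear alongside a named entity.

    utterances: list of (user_id, tokens) pairs.
    cluster_of: mapping from user_id to cluster label.
    named_entities: set of tokens treated as named entities.
    """
    per_cluster = defaultdict(Counter)
    for user_id, tokens in utterances:
        if not any(token in named_entities for token in tokens):
            continue  # no named entity in this utterance, nothing co-occurs
        others = [token for token in tokens if token not in named_entities]
        per_cluster[cluster_of[user_id]].update(others)
    return per_cluster


# Hypothetical sample data.
utterances = [
    ("u1", ["navigate", "to", "AoiCafe"]),
    ("u2", ["play", "music"]),
    ("u3", ["AoiCafe", "parking"]),
]
cluster_of = {"u1": 0, "u2": 1, "u3": 0}
cooccurrences = collect_cooccurrences(utterances, cluster_of, {"AoiCafe"})
```

The resulting counters could then be merged into the dictionary generated for each cluster.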

A configuration diagram of the information providing system 1 of the embodiment.
A diagram for explaining the contents of the user authentication information 132.
A diagram for explaining the contents of the individual utterance history information 134A.
A diagram for explaining the contents of the group utterance history information 134B.
A diagram for explaining the contents of the individual behavior history information 136A.
A diagram for explaining the contents of the group behavior history information 136B.
A configuration diagram of the communication terminal 300 of the embodiment.
A diagram showing an example of the schematic configuration of the vehicle M equipped with the agent device 500.
A flowchart showing the flow of a series of processes performed by the information providing device 100 of the embodiment.
A flowchart showing the flow of a series of processes performed by the information providing device 100 of the embodiment.
A diagram for explaining points where the number of visits has surged and points where it has not.
A diagram showing an example of a clustering result of user vectors.
A diagram for explaining a method of generating an information processing dictionary.
A diagram schematically showing a scene in which usage guidance information for the information processing dictionary is provided.

Embodiments of the information processing device, information processing method, and program of the present invention are described below with reference to the drawings.

FIG. 1 is a configuration diagram of the information providing system 1 of the embodiment. The information providing system 1 includes, for example, an information providing device 100, a communication terminal 300 used by a user U1 of the information providing system 1, and a vehicle M used by a user U2 of the information providing system 1. These components can communicate with one another via a network NW. The network NW includes, for example, the Internet, a WAN (Wide Area Network), a LAN (Local Area Network), a telephone line, a public line, a dedicated line, a provider device, a radio base station, and the like. The information providing system 1 may include a plurality of communication terminals 300, a plurality of vehicles M, or both. The vehicle M includes, for example, an agent device 500. The information providing device 100 is an example of an "information processing device".

The information providing device 100 receives inquiries, requests, and the like of the user U1 from the communication terminal 300, performs processing according to the received inquiry or request, and transmits the processing result to the communication terminal 300. The information providing device 100 also receives inquiries, requests, and the like of the user U2 from the agent device 500 mounted on the vehicle M, performs processing according to the received inquiry or request, and transmits the processing result to the agent device 500. The information providing device 100 may function, for example, as a cloud server that communicates with the communication terminal 300 and the agent device 500 via the network NW and exchanges various data with them.

The communication terminal 300 is, for example, a portable terminal such as a smartphone or a tablet. The communication terminal 300 accepts information such as inquiries and requests from the user U1. The communication terminal 300 transmits the information accepted from the user U1 to the information providing device 100 and outputs the information obtained in reply. In other words, the communication terminal 300 functions as a voice user interface.

The vehicle M on which the agent device 500 is mounted is, for example, a two-wheeled, three-wheeled, or four-wheeled vehicle, and its drive source is an internal combustion engine such as a diesel engine or a gasoline engine, an electric motor, or a combination of these. The electric motor operates using power generated by a generator coupled to the internal combustion engine, or power discharged from a secondary battery or a fuel cell. The vehicle M may also be an automated driving vehicle. Automated driving means, for example, automatically controlling one or both of the steering and the speed of the vehicle. Such driving control may include various kinds of control, for example, ACC (Adaptive Cruise Control), ALC (Auto Lane Changing), and LKAS (Lane Keeping Assistance System). An automated driving vehicle may at times be driven manually by an occupant (driver).

The agent device 500 converses with an occupant of the vehicle M (for example, the user U2) and provides information in response to inquiries, requests, and the like from the occupant. For example, the agent device 500 accepts information such as an inquiry or request from the user U2, transmits the accepted information to the information providing device 100, and outputs the information obtained in reply. In other words, like the communication terminal 300, the agent device 500 functions as a voice user interface.

[Information providing device]
The configuration of the information providing device 100 is described below. The information providing device 100 includes, for example, a communication unit 102, an authentication unit 104, an acquisition unit 106, a speech recognition unit 108, a natural language processing unit 110, a determination unit 112, a user vector generation unit 114, an analysis unit 116, a collection unit 118, a dictionary generation unit 120, a verification unit 122, a providing unit 124, and a storage unit 130. The combination of the speech recognition unit 108 and the natural language processing unit 110 is an example of the "extraction unit". The user vector generation unit 114 is an example of the "first generation unit". The dictionary generation unit 120 is an example of the "second generation unit".

The authentication unit 104, the acquisition unit 106, the speech recognition unit 108, the natural language processing unit 110, the determination unit 112, the user vector generation unit 114, the analysis unit 116, the collection unit 118, the dictionary generation unit 120, the verification unit 122, and the providing unit 124 are each realized by a hardware processor such as a CPU (Central Processing Unit) executing a program (software). Some or all of these components may instead be realized by hardware (including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or by software and hardware working together. The program may be stored in advance in a storage device (a storage device including a non-transitory storage medium) such as an HDD (Hard Disk Drive) or a flash memory, or it may be stored on a removable storage medium (a non-transitory storage medium) such as a DVD or a CD-ROM and installed in the storage device of the information providing device 100 when the storage medium is loaded into a drive device or the like.

The storage unit 130 is realized by the various storage devices described above, or by an EEPROM (Electrically Erasable Programmable Read Only Memory), a ROM (Read Only Memory), a RAM (Random Access Memory), or the like. In addition to the program referenced by the processor, the storage unit 130 stores, for example, user authentication information 132, utterance history information 134, and behavior history information 136.

The user authentication information 132 includes, for example, information identifying users of the information providing device 100 and information used for authentication by the authentication unit 104. The user authentication information 132 is, for example, a user ID, password, address, name, age, gender, hobbies, special skills, and orientation information. Orientation information is information indicating the user's inclinations, for example, information indicating the user's way of thinking, information indicating tastes and the like (preference information), and information indicating matters the user considers important.

The utterance history information 134 is history information on the words users have spoken (that is, their utterances) to the communication terminal 300 or the agent device 500 functioning as a voice user interface. The utterance history information 134 includes individual utterance history information 134A, which is the utterance history of a single user, and group utterance history information 134B, which is the utterance history of a plurality of users. For example, when only one user is riding in the vehicle M equipped with the agent device 500 (when the agent device 500 picks up the utterances of only one user), that user's utterance history is recorded as the individual utterance history information 134A. When a plurality of users are riding in the vehicle M as a group (when the agent device 500 picks up the utterances of a plurality of users), the utterance histories of those users are recorded as the group utterance history information 134B.

The behavior history information 136 is history information on user behavior such as visits to tourist spots and Internet searches. The behavior history information 136 includes individual behavior history information 136A, which is the behavior history of a single user, and group behavior history information 136B, which is the behavior history of a plurality of users. For example, when only one user is riding in the vehicle M equipped with the agent device 500, the position transition history (movement history) of the vehicle M carrying that user is recorded as the individual behavior history information 136A. When a plurality of users are riding in the vehicle M as a group, the position transition history (movement history) of the vehicle M carrying those users is recorded as the group behavior history information 136B. Likewise, when a single user moves while carrying the communication terminal 300, the position transition history (movement history) of that communication terminal 300 is recorded as the individual behavior history information 136A. When each of a plurality of users moves while carrying a communication terminal 300, the position transition histories (movement histories) of those communication terminals 300 are recorded as the group behavior history information 136B.

The communication unit 102 is an interface that communicates with the communication terminal 300, the agent device 500, and other external devices via the network NW. For example, the communication unit 102 includes a NIC (Network Interface Card), an antenna for wireless communication, and the like.

The authentication unit 104 registers information about users of the information providing system 1 (for example, the users U1 and U2) in the storage unit 130 as the user authentication information 132. For example, when the authentication unit 104 receives a user registration request from the communication terminal 300 or the agent device 500, it causes the device that sent the registration request to display a GUI (Graphical User Interface) for entering the various items included in the user authentication information 132. When the user enters the items in the GUI, the authentication unit 104 acquires the information about the user from that device. The authentication unit 104 then registers the information about the user acquired from the communication terminal 300 or the agent device 500 in the storage unit 130 as the user authentication information 132.

FIG. 2 is a diagram for explaining the contents of the user authentication information 132. In the user authentication information 132, for example, a user's authentication information is associated with information such as that user's address, name, age, gender, contact information, and orientation information. The authentication information includes, for example, a user ID, which is identification information identifying the user, and a password. The authentication information may also include biometric authentication information such as fingerprint information and iris information. The contact information may be, for example, address information for communicating with the voice user interface (the communication terminal 300 or the agent device 500) used by that user, or it may be the user's telephone number, e-mail address, terminal identification information, or the like. Based on the contact information, the information providing device 100 communicates with each mobile communication device and provides various kinds of information.

The authentication unit 104 authenticates users of the service of the information providing system 1 based on the user authentication information 132 registered in advance. For example, the authentication unit 104 authenticates a user when it receives a service use request from the communication terminal 300 or the agent device 500. Specifically, on receiving a use request, the authentication unit 104 causes the requesting terminal device to display a GUI for entering authentication information such as a user ID and password, and compares the authentication information entered in the GUI (the input authentication information) with the authentication information in the user authentication information 132. The authentication unit 104 determines whether authentication information matching the input authentication information is stored in the user authentication information 132 and, if it is, permits use of the service. If no matching authentication information is stored, the authentication unit 104 prohibits use of the service or performs processing to have the user register.
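The registration and authentication flow described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the in-memory dictionary standing in for the user authentication information 132, the function names, and the return values are hypothetical, and the salted PBKDF2 hashing is an added precaution that the description does not mention. Sending unknown user IDs to registration while refusing wrong passwords is one possible reading of "prohibit use or have the user register".

```python
import hashlib
import hmac
import os

# Hypothetical stand-in for the user authentication information 132:
# user ID -> (salt, password hash).
user_auth_info = {}


def hash_password(password, salt):
    """Derive a salted hash (assumption: the source does not specify hashing)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(user_id, password):
    """Store authentication information for a new user."""
    salt = os.urandom(16)
    user_auth_info[user_id] = (salt, hash_password(password, salt))


def authenticate(user_id, password):
    """Compare input authentication information with the stored information."""
    entry = user_auth_info.get(user_id)
    if entry is None:
        return "register"  # no such user: guide the user to new registration
    salt, stored_hash = entry
    if hmac.compare_digest(stored_hash, hash_password(password, salt)):
        return "permit"    # matching information found: allow use of the service
    return "deny"          # mismatch: prohibit use of the service


register("U001", "correct horse")
result_ok = authenticate("U001", "correct horse")
result_bad = authenticate("U001", "wrong password")
result_new = authenticate("U999", "anything")
```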

The acquisition unit 106 acquires the utterances of one or more users from the communication terminal 300 or the agent device 500 via the communication unit 102 (via the network NW) and stores them in the storage unit 130 as the utterance history information 134. A user's utterance may be voice data (also called acoustic data or an acoustic stream) or text data recognized from that voice data. The acquisition unit 106 also acquires the behavior histories of one or more users from the communication terminal 300 or the agent device 500 via the communication unit 102 (via the network NW) and stores them in the storage unit 130 as the behavior history information 136.

FIG. 3 is a diagram for explaining the contents of the individual utterance history information 134A. In the individual utterance history information 134A, for example, the date and time at which the user spoke is associated with the place where the utterance was made, the content of the utterance, and provided information. The utterance content may be the voice the user uttered, or it may be text obtained through speech recognition by the speech recognition unit 108 described later. The provided information is the information that the providing unit 124 provided in response to the user's utterance. The provided information includes, for example, voice information for dialogue and display information such as images and motions.

FIG. 4 is a diagram for explaining the contents of the group utterance history information 134B. In the group utterance history information 134B, for example, group member information is associated with the date and time at which the user made an utterance, in addition to the place where the utterance was made, the content of the utterance, and the provided information. The group member information is, for example, information (such as a user ID) about other users who rode in the same vehicle M, other users who accompanied the user to the same place, or other users who can be regarded, from position information, as having been at the same place at the same time.

FIG. 5 is a diagram for explaining the contents of the personal action history information 136A. In the personal action history information 136A, for example, an action history is associated with a user ID and a date and time. The action history includes, for example, the destinations visited by the user and the means of transportation used to reach them. As described above, the action history may also include an action history on the Internet.

FIG. 6 is a diagram for explaining the contents of the group action history information 136B. In the group action history information 136B, for example, an action history and group member information are associated with a user ID and a date and time.
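The four history tables of FIGS. 3 to 6 can be pictured as two record types, each with an optional list of group members. The following is a minimal sketch only; the class names, field names, and the helper `is_group_record` are hypothetical and do not appear in the embodiment.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class UtteranceRecord:
    """One row of the utterance history information 134 (FIGS. 3 and 4)."""
    spoken_at: datetime
    place: str
    content: str        # recognized text (audio could be referenced instead)
    provided_info: str  # response returned by the providing unit 124
    # Empty for personal history 134A, filled for group history 134B.
    group_members: List[str] = field(default_factory=list)


@dataclass
class ActionRecord:
    """One row of the action history information 136 (FIGS. 5 and 6)."""
    user_id: str
    occurred_at: datetime
    destination: str
    transport: Optional[str] = None
    # Empty for personal history 136A, filled for group history 136B.
    group_members: List[str] = field(default_factory=list)


def is_group_record(record) -> bool:
    """A record counts as group history when it lists at least one other user."""
    return bool(record.group_members)
```

Using one record type per history kind, with an optional member list, keeps the personal and group variants interchangeable in later processing; this is a design choice of the sketch, not something the embodiment prescribes.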

The voice recognition unit 108 performs voice recognition, that is, processing that converts the user's uttered voice into text. For example, the voice recognition unit 108 performs voice recognition on the voice data representing the user's utterance acquired by the acquisition unit 106, and generates text data from that voice data. The text data includes a character string in which the content of the utterance is written out as characters.

For example, the voice recognition unit 108 may convert voice data into text using an acoustic model and a dictionary for automatic speech recognition (hereinafter referred to as an ASR dictionary). The acoustic model is a model trained or tuned in advance to separate input voice by frequency and to convert each separated component into phonemes (a spectrogram); it is, for example, a neural network or a hidden Markov model. The ASR dictionary is a database in which character strings are associated with combinations of phonemes and in which the positions at which character strings are delimited are defined by syntax. The ASR dictionary is a so-called pattern matching dictionary. For example, the voice recognition unit 108 inputs the voice data to the acoustic model, searches the ASR dictionary for the set of phonemes output by the acoustic model, and acquires the character string corresponding to that set of phonemes. The voice recognition unit 108 generates the combination of character strings obtained in this way as the text data. Instead of using the ASR dictionary, the voice recognition unit 108 may generate the text data from the output of the acoustic model by using a language model implemented as, for example, an n-gram model.
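The dictionary lookup described above can be illustrated as follows. This is a minimal sketch under stated assumptions: the phoneme notation, the greedy longest-match strategy, and the function name `phonemes_to_text` are all hypothetical, since the embodiment only states that the phoneme set is searched for in the ASR dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical ASR dictionary: a phoneme sequence maps to a character string.
AsrDictionary = Dict[Tuple[str, ...], str]


def phonemes_to_text(phonemes: List[str], asr_dict: AsrDictionary) -> List[str]:
    """Greedy longest-match lookup of phoneme runs in a pattern-match dictionary."""
    max_len = max((len(key) for key in asr_dict), default=0)
    words: List[str] = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate first so longer entries win over prefixes.
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            key = tuple(phonemes[i:i + n])
            if key in asr_dict:
                words.append(asr_dict[key])
                i += n
                break
        else:
            i += 1  # No entry starts here; skip this phoneme (assumed behavior).
    return words
```

For instance, with entries for `("h", "o", "n", "d", "a")` and `("g", "i", "k", "e", "n")`, the phoneme stream of "hondagiken" is split into two dictionary strings. A real recognizer would score competing hypotheses rather than commit greedily.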

The natural language processing unit 110 performs natural language understanding, which interprets the structure and meaning of text. For example, the natural language processing unit 110 interprets the meaning of the text data generated by the voice recognition unit 108 while referring to a dictionary prepared in advance for semantic interpretation (hereinafter referred to as an NLU dictionary). The NLU dictionary is a database in which abstracted semantic information is associated with text data. For example, the NLU dictionary defines that the word "I" and the word "colleague" are closely related to each other, and that the word "hamburger" and the word "eat" are closely related to each other. As a result, the sentence "I ate a hamburger with a colleague" is interpreted not as a single subject, "I", performing the act of "eating" on two objects, "colleague" and "hamburger", but as two subjects, "I" and "colleague", performing the act of "eating" on a single object, "hamburger". The NLU dictionary may include synonyms, near-synonyms, and the like. Voice recognition and natural language understanding do not necessarily have to be separated into distinct stages; they may influence each other, for example by correcting the voice recognition result in light of the natural language understanding result.

The natural language processing unit 110 also extracts named entities from the text data generated by the voice recognition unit 108. For example, the natural language processing unit 110 may extract named entities using TF (Term Frequency)-IDF (Inverse Document Frequency) or the like.
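TF-IDF weighting can be sketched as below, treating each user's tokenized utterances as one document. This is an illustrative sketch only: the tokenization, the unsmoothed IDF formula, and the names `tf_idf` and `top_terms` are assumptions, and the embodiment does not specify how scores are turned into extracted expressions.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Score each term per user as (term frequency) * log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document

    scores: Dict[str, Dict[str, float]] = {}
    for user, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[user] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(term_scores: Dict[str, float], k: int = 1) -> List[str]:
    """Return the k highest-scoring terms as candidate characteristic expressions."""
    return sorted(term_scores, key=term_scores.get, reverse=True)[:k]
```

Terms that every user says (such as "go" or "to") receive a score of zero, so expressions that are specific to a user or region, such as a local nickname for a restaurant, rise to the top.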

A named entity may be, for example, a single word such as a noun, a single phrase in which nouns are connected by another part of speech (for example, a particle), or a single sentence containing various parts of speech such as nouns, verbs, particles, and auxiliary verbs.

For example, named entities include a place name and alternative phrasings of that place name, and a mark and alternative phrasings of that mark. Marks include, for example, company names, brand names, and store names. Suppose, for example, that the official name of a company is "本田技研工業株式会社" (Honda Giken Kogyo Kabushiki Kaisha), and that users in the region where that company's head office is located affectionately call it "本田技研" (Honda Giken). In this case, the named entity "本田技研" is treated as an alternative phrasing of the named entity "本田技研工業株式会社". Likewise, suppose that the official name of a restaurant is "ABCDEF", that users in one region abbreviate it to "ABC", and that users in another region abbreviate it to "DEF". In this case, named entities whose form differs by region, such as "ABC" and "DEF", are treated as alternative phrasings of the named entity "ABCDEF". The relationship between a named entity and its alternative phrasings is not limited to regions; it may also arise from differences in age or generation, such as between young people and adults, or from differences between communities or factions. Such a relationship may also exist, for example, between a widely known major name and a minor name used with the same meaning as the major name. In this way, the natural language processing unit 110 extracts, each as a named entity, expressions that differ from each other as character strings but refer to the same thing.

The determination unit 112 determines, based on the action history of each of the plurality of users included in the action history information 136, whether or not each user has visited a point where the number of visits has increased sharply. A "point where the number of visits has increased sharply" is, for example, a point that other users have visited a number of times equal to or greater than a threshold, or a point for which the rate of increase, per predetermined period, in the number of visits by other users is equal to or greater than a threshold.
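The two criteria above (an absolute visit count and a growth rate per period) can be sketched as follows. This is a simplified illustration: the tuple layout, the integer period index, and both function names are hypothetical, the counts include all users rather than excluding the user being judged, and a point with no visits in the previous period is judged by the count criterion alone.

```python
from collections import Counter
from typing import List, Set, Tuple

Visit = Tuple[str, str, int]  # (user_id, place, period index), all assumed


def surging_places(visits: List[Visit], current_period: int,
                   count_threshold: int, growth_threshold: float) -> Set[str]:
    """Places whose visit count or period-over-period growth meets a threshold."""
    now = Counter(place for _, place, t in visits if t == current_period)
    before = Counter(place for _, place, t in visits if t == current_period - 1)
    surging: Set[str] = set()
    for place, n in now.items():
        prev = before.get(place, 0)
        # Growth is undefined without a baseline; fall back to the count test.
        growth = (n - prev) / prev if prev else 0.0
        if n >= count_threshold or growth >= growth_threshold:
            surging.add(place)
    return surging


def visited_surging_place(user_id: str, visits: List[Visit],
                          surging: Set[str]) -> bool:
    """Determination result for one user: did they visit any surging place?"""
    return any(u == user_id and place in surging for u, place, _ in visits)
```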

The user vector generation unit 114 generates, for each user, a multidimensional feature amount that combines the named entities extracted by the natural language processing unit 110 with the determination result of the determination unit 112. For example, the user vector generation unit 114 combines a vector of named entities obtained using TF-IDF or the like (hereinafter referred to as an "utterance vector") with a vector based on the determination result of the determination unit 112 (hereinafter referred to as an "action vector"), and generates the combination as a single vector. Hereinafter, the vector that combines the utterance vector and the action vector is referred to as a "user vector". The utterance vector is an example of a "first feature amount", and the action vector is an example of a "second feature amount".
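One straightforward way to combine the two vectors is concatenation over a fixed vocabulary and a fixed list of places. The sketch below assumes exactly that; the function name, the argument layout, and the 0/1 encoding of the determination result are illustrative assumptions rather than the embodiment's definition.

```python
from typing import Dict, List


def build_user_vector(tfidf_scores: Dict[str, float], vocabulary: List[str],
                      visited: Dict[str, bool], places: List[str]) -> List[float]:
    """Concatenate an utterance vector and an action vector into a user vector."""
    # Utterance vector: one TF-IDF score per vocabulary term (0.0 if unused).
    utterance_vector = [tfidf_scores.get(term, 0.0) for term in vocabulary]
    # Action vector: 1.0 if the user visited the surging place, else 0.0.
    action_vector = [1.0 if visited.get(place, False) else 0.0 for place in places]
    return utterance_vector + action_vector
```

Fixing the order of `vocabulary` and `places` across users ensures that every user vector has the same dimensions, which the subsequent clustering requires.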

The analysis unit 116 clusters the plurality of users for whom user vectors have been generated, classifying users with similar characteristics, such as utterance content and visited points, into the same cluster. In doing so, the analysis unit 116 may compress the dimensions of the user vectors. For the dimension compression, for example, principal component analysis, a topic model typified by LDA (Latent Dirichlet Allocation), or a neural network such as Word2Vec may be used. LASSO (Least Absolute Shrinkage and Selection Operator), which is a form of regularized regression, or NMF (Nonnegative Matrix Factorization) may also be used.
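The embodiment does not name a clustering algorithm. As one possible illustration, the sketch below uses plain k-means with deterministic initialization from the first k vectors; the algorithm choice, the function names, and the omission of dimension compression are all assumptions of this example.

```python
from typing import List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Assign each user vector to one of k clusters; requires len(vectors) >= k."""
    # Deterministic start: the first k vectors serve as initial centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

In practice the result depends on initialization and on the choice of k, so a production system would typically use a library implementation with multiple restarts.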

The collection unit 118 collects, for each cluster generated by the clustering, co-occurring expressions of the named entities extracted from the utterances of the users belonging to that cluster. A co-occurring expression is, for example, a word that tends to appear on websites and the like together with a named entity extracted from a user's utterance. For example, the collection unit 118 accesses a web server or the like via the communication unit 102, crawls the website provided by that web server, and collects content. The collection unit 118 then extracts co-occurring expressions from the collected content using association analysis or the like. In this way, the collection unit 118 may function like a web crawler.
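Once content has been collected and tokenized, co-occurring expressions can be found by counting which terms share a document with the named entity. The sketch below uses simple document-level co-occurrence counting with a minimum count, which is a simplification standing in for full association analysis; the function name and the `min_count` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def co_occurring_terms(entity: str, documents: Iterable[List[str]],
                       min_count: int = 2) -> List[str]:
    """Terms that appear with `entity` in at least `min_count` documents."""
    counts: Counter = Counter()
    for tokens in documents:
        if entity in tokens:
            # Count each co-occurring term once per document.
            counts.update(set(tokens) - {entity})
    return [term for term, n in counts.most_common() if n >= min_count]
```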

The dictionary generation unit 120 generates, for each cluster generated by the clustering, a dictionary for information processing such as voice recognition and natural language understanding (hereinafter referred to as an information processing dictionary). The information processing dictionary is one or both of the ASR dictionary referred to during the voice recognition described above (conversion of voice into text) and the NLU dictionary referred to during natural language understanding (semantic interpretation of text). For example, when a language model is used for voice recognition instead of the ASR dictionary, the information processing dictionary includes only the NLU dictionary. The information processing dictionary incorporates the co-occurring expressions collected by the collection unit 118.
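At its simplest, per-cluster dictionary generation amounts to grouping the named entities of each cluster's users together with their co-occurring expressions. The sketch below models only the vocabulary of such a dictionary, not the phoneme mappings of an ASR dictionary or the semantic relations of an NLU dictionary; all names and argument layouts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_cluster_vocabularies(labels: Dict[str, int],
                               user_entities: Dict[str, Set[str]],
                               co_occurrences: Dict[str, List[str]]) -> Dict[int, Set[str]]:
    """Collect, per cluster, its users' named entities and co-occurring terms."""
    vocab: Dict[int, Set[str]] = defaultdict(set)
    for user, cluster in labels.items():
        for entity in user_entities.get(user, ()):
            vocab[cluster].add(entity)
            # Fold in the expressions collected for this entity, if any.
            vocab[cluster].update(co_occurrences.get(entity, ()))
    return dict(vocab)
```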

The verification unit 122 verifies the accuracy of the information processing dictionary generated by the dictionary generation unit 120. The verification method will be described in detail later.

The providing unit 124 provides (transmits) various information, via the communication unit 102, to the communication terminal 300 or the agent device 500, each of which serves as a voice user interface. For example, when the acquisition unit 106 acquires an inquiry or request as an utterance from the communication terminal 300 or the agent device 500, the providing unit 124 generates information that responds to that inquiry or request. For example, when an utterance meaning "tell me today's weather" is acquired, the providing unit 124 may generate content corresponding to the named entities "today" and "weather" (such as an image, video, or audio representing the weather forecast). The providing unit 124 then returns the generated information, via the communication unit 102, to the voice user interface that made the inquiry or request.

The providing unit 124 also provides the communication terminal 300 or the agent device 500 with usage guidance information for the information processing dictionary generated by the dictionary generation unit 120. The usage guidance information is, for example, information recommending that the user change a setting so that the ASR dictionary is newly referred to (used) during voice recognition, or so that the NLU dictionary is newly referred to (used) during natural language understanding.

[Communication terminal]
Next, the configuration of the communication terminal 300 will be described. FIG. 7 is a configuration diagram of the communication terminal 300 of the embodiment. The communication terminal 300 includes, for example, a terminal-side communication unit 310, an input unit 320, a display 330, a speaker 340, a microphone 350, a position acquisition unit 355, a camera 360, an application execution unit 370, an output control unit 380, and a terminal-side storage unit 390. The position acquisition unit 355, the application execution unit 370, and the output control unit 380 are realized by, for example, a hardware processor such as a CPU executing a program (software). Some or all of these components may be realized by hardware (a circuit unit; including circuitry) such as an LSI, ASIC, FPGA, or GPU, or by cooperation between software and hardware. The program may be stored in advance in a storage device such as an HDD or flash memory (a storage device including a non-transitory storage medium), or it may be stored on a removable storage medium (a non-transitory storage medium) such as a DVD or CD-ROM and installed in the storage device of the communication terminal 300 when the storage medium is mounted in a drive device, card slot, or the like.

The terminal-side storage unit 390 may be realized by the various storage devices described above, or by an EEPROM, ROM, RAM, or the like. The terminal-side storage unit 390 stores, for example, the above program, the information providing application 392, and various other information.

The terminal-side communication unit 310 communicates with the information providing device 100, the agent device 500, and other external devices using, for example, the network NW.

The input unit 320 accepts input from the user U1 through, for example, operation of various keys and buttons. The display 330 is, for example, an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) display. The input unit 320 may be configured integrally with the display 330 as a touch panel. The display 330 displays various information in the embodiment under the control of the output control unit 380. The speaker 340 outputs predetermined audio under the control of, for example, the output control unit 380. The microphone 350 accepts voice input from the user U1 under the control of, for example, the output control unit 380.

The position acquisition unit 355 acquires position information of the communication terminal 300. For example, the position acquisition unit 355 includes a GNSS (Global Navigation Satellite System) receiver, of which GPS (Global Positioning System) is a typical example. The position information may be, for example, two-dimensional map coordinates or latitude and longitude information. The position acquisition unit 355 may transmit the acquired position information to the information providing device 100 via the terminal-side communication unit 310.

The camera 360 is, for example, a digital camera using a solid-state image sensor such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor. For example, when the communication terminal 300 is attached to the instrument panel of the vehicle M as a substitute for a navigation device or the like, the camera 360 of the communication terminal 300 may capture images of the interior of the vehicle M automatically or in response to an operation by the user U1.

The application execution unit 370 executes the information providing application 392 stored in the terminal-side storage unit 390. The information providing application 392 is an application program for controlling the output control unit 380 so that an image provided by the information providing device 100 is output to the display 330 and audio corresponding to information provided by the information providing device 100 is output from the speaker 340. The application execution unit 370 also transmits information entered through the input unit 320 to the information providing device 100 via the terminal-side communication unit 310. The information providing application 392 may be, for example, downloaded from an external device via the network NW and installed on the communication terminal 300.

The output control unit 380 causes the display 330 to display images and the speaker 340 to output audio under the control of the application execution unit 370. In doing so, the output control unit 380 may control the content and manner of the images displayed on the display 330, and the content and manner of the audio output from the speaker 340.

[Vehicle]
Next, a schematic configuration of the vehicle M equipped with the agent device 500 will be described. FIG. 8 is a diagram showing an example of the schematic configuration of the vehicle M equipped with the agent device 500. The vehicle M shown in FIG. 8 is equipped with the agent device 500, a microphone 610, a display and operation device 620, a speaker unit 630, a navigation device 640, an MPU (Map Positioning Unit) 650, vehicle equipment 660, an in-vehicle communication device 670, an occupant recognition device 690, and an automatic driving control device 700. A general-purpose communication device 680 such as a smartphone may also be brought into the vehicle interior and used as a communication device. The general-purpose communication device 680 is, for example, the communication terminal 300. These devices are connected to each other by a multiplex communication line such as a CAN (Controller Area Network) communication line, a serial communication line, a wireless communication network, or the like.

The components other than the agent device 500 will be described first. The microphone 610 collects voice uttered in the vehicle interior. The display and operation device 620 is a device (or group of devices) capable of displaying images and accepting input operations. The display and operation device 620 is typically a touch panel, and may further include a HUD (Head Up Display) or a mechanical input device. The speaker unit 630 outputs, for example, voice and warning sounds to the vehicle interior and to the outside of the vehicle. The display and operation device 620 may be shared by the agent device 500 and the navigation device 640.

The navigation device 640 includes a navigation HMI (Human machine Interface), a positioning device such as a GPS receiver, a storage device storing map information, and a control device (navigation controller) that performs route search and the like. Some or all of the microphone 610, the display and operation device 620, and the speaker unit 630 may be used as the navigation HMI. The navigation device 640 refers to the map information based on the position of the vehicle M identified by the positioning device, searches the map information for a route (navigation route) from the position of the vehicle M to the destination entered by the user, and outputs guidance information using the navigation HMI so that the vehicle M can travel along the route. The route search function may reside in the information providing device 100 or in a navigation server accessible via the network NW. In that case, the navigation device 640 acquires the route from the information providing device 100 or the navigation server and outputs the guidance information. The agent device 500 may be built on the navigation controller, in which case the navigation controller and the agent device 500 are integrated in hardware.

The MPU 650, for example, divides the on-map route provided by the navigation device 640 into a plurality of blocks (for example, every 100 [m] in the vehicle traveling direction) and determines a recommended lane for each block. For example, the MPU 650 determines in which lane, counted from the left, the vehicle should travel. The MPU 650 may determine the recommended lane using map information that is more accurate than the map information stored in the storage device of the navigation device 640 (a high-precision map). The high-precision map may be stored, for example, in the storage device of the MPU 650, in the storage device of the navigation device 640, or in the vehicle-side storage unit 560 of the agent device 500. The high-precision map may include information on lane centers or lane boundaries, traffic regulation information, address information (addresses and postal codes), facility information, telephone number information, and the like.
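The block division can be illustrated with distances measured along the route. The sketch below is an assumption-laden example: the function name, the representation of a block as a (start, end) pair in meters, and the shorter final block are not specified in the embodiment.

```python
from typing import List, Tuple


def split_into_blocks(route_length_m: float,
                      block_length_m: float = 100.0) -> List[Tuple[float, float]]:
    """Divide a route into consecutive (start, end) blocks along its length."""
    if block_length_m <= 0:
        raise ValueError("block_length_m must be positive")
    blocks: List[Tuple[float, float]] = []
    start = 0.0
    while start < route_length_m:
        # The last block is shortened so it ends exactly at the route end.
        end = min(start + block_length_m, route_length_m)
        blocks.append((start, end))
        start = end
    return blocks
```

A recommended lane would then be chosen per block, for example by storing a lane index (counted from the left) alongside each pair.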

The vehicle equipment 660 includes, for example, a camera, a radar device, a LIDAR (Light Detection and Ranging), and an object recognition device. The camera is, for example, a digital camera using a solid-state image sensor such as a CCD or CMOS sensor, and is attached at an arbitrary location on the vehicle M. The radar device radiates radio waves such as millimeter waves around the vehicle M and detects the radio waves reflected by an object (reflected waves) to detect at least the position (distance and direction) of the object. The LIDAR irradiates the surroundings of the vehicle M with light and measures the scattered light. The LIDAR detects the distance to a target based on the time from light emission to light reception. The object recognition device performs sensor fusion processing on the detection results from some or all of the camera, the radar device, and the LIDAR to recognize the position, type, speed, and the like of objects around the vehicle M. The object recognition device outputs the recognition result to the agent device 500 and the automatic driving control device 700.
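The time-of-flight relation used by a LIDAR is distance = (speed of light × round-trip time) / 2, because the light travels to the target and back. The following minimal sketch shows the arithmetic; the function and constant names are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def lidar_distance_m(round_trip_time_s: float) -> float:
    """Distance to the target from the emission-to-reception time."""
    # Halve the path length because the light covers the distance twice.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0
```

A round trip of one microsecond therefore corresponds to a target roughly 150 m away.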

The vehicle equipment 660 also includes, for example, driving operators, a traveling driving force output device, a brake device, and a steering device. The driving operators include, for example, an accelerator pedal, a brake pedal, a shift lever, a steering wheel, an irregularly shaped steering wheel, a joystick, and other operators. A sensor that detects the amount of operation or the presence or absence of operation is attached to each driving operator, and the detection result is output to the agent device 500 and the automatic driving control device 700, or to some or all of the traveling driving force output device, the brake device, and the steering device. The traveling driving force output device outputs, to the drive wheels, the traveling driving force (torque) with which the vehicle M travels. The brake device includes, for example, a brake caliper, a cylinder that transmits hydraulic pressure to the brake caliper, an electric motor that generates hydraulic pressure in the cylinder, and a brake ECU. The brake ECU controls the electric motor according to information input from the automatic driving control device 700 or from the driving operators, so that brake torque corresponding to the braking operation is output to each wheel. The steering device includes, for example, a steering ECU and an electric motor. The electric motor changes the direction of the steered wheels by, for example, applying force to a rack and pinion mechanism. The steering ECU drives the electric motor according to information input from the automatic driving control device 700 or from the driving operators, and changes the direction of the steered wheels.

また、車両機器６６０は、例えば、ドアロック装置、ドア開閉装置、窓、窓の開閉装置および窓の開閉制御装置、シート、シート位置の制御装置、ルームミラーおよびその角度位置制御装置、車両内外の照明装置およびその制御装置、ワイパーやデフォッガーおよびそれぞれの制御装置、方向指示灯およびその制御装置、空調装置などの車両情報装置などが含まれてもよい。 The vehicle equipment 660 may also include, for example, a door lock device, a door opening/closing device, windows, a window opening/closing device and a window opening/closing control device, seats, a seat position control device, a rearview mirror and its angular position control device, lighting devices inside and outside the vehicle and their control devices, wipers and defoggers and their respective control devices, direction indicator lights and their control device, and vehicle information devices such as an air conditioner.

車載通信装置６７０は、例えば、セルラー網やＷｉ－Ｆｉ網を利用してネットワークＮＷにアクセス可能な無線通信装置である。 The in-vehicle communication device 670 is a wireless communication device that can access the network NW using, for example, a cellular network or a Wi-Fi network.

乗員認識装置６９０は、例えば、着座センサ、車室内カメラ、画像認識装置などを含む。着座センサは座席の下部に設けられた圧力センサ、シートベルトに取り付けられた張力センサなどを含む。車室内カメラは、車室内に設けられたＣＣＤカメラやＣＭＯＳカメラである。画像認識装置は、車室内カメラの画像を解析し、座席ごとのユーザの有無、ユーザの顔などを認識して、ユーザの着座位置を認識する。また、乗員認識装置６９０は、予め登録された顔画像とのマッチング処理を行うことで、画像に含まれる運転席や助手席等に着座するユーザを特定してもよい。 The occupant recognition device 690 includes, for example, a seating sensor, a vehicle interior camera, an image recognition device, and the like. The seating sensor includes a pressure sensor provided at the bottom of the seat, a tension sensor attached to the seat belt, and the like. The vehicle interior camera is a CCD camera or a CMOS camera provided in the vehicle interior. The image recognition device analyzes the image of the vehicle interior camera, recognizes the presence / absence of the user for each seat, the face of the user, and the like, and recognizes the seating position of the user. Further, the occupant recognition device 690 may specify a user who is seated in a driver's seat, a passenger's seat, or the like included in the image by performing a matching process with a face image registered in advance.

自動運転制御装置７００は、例えば、ＣＰＵなどのハードウェアプロセッサがプログラム（ソフトウェア）を実行することにより処理を行う。自動運転制御装置７００の構成要素のうち一部または全部は、ＬＳＩやＡＳＩＣ、ＦＰＧＡ、ＧＰＵ等のハードウェア（回路部；circuitryを含む）によって実現されてもよいし、ソフトウェアとハードウェアの協働によって実現されてもよい。プログラムは、予め自動運転制御装置７００のＨＤＤやフラッシュメモリなどの記憶装置（非一過性の記憶媒体を備える記憶装置）に格納されていてもよいし、ＤＶＤやＣＤ－ＲＯＭなどの着脱可能な記憶媒体に格納されており、記憶媒体（非一過性の記憶媒体）がドライブ装置に装着されることで自動運転制御装置７００のＨＤＤやフラッシュメモリにインストールされてもよい。 The automatic driving control device 700 performs processing by, for example, a hardware processor such as a CPU executing a program (software). Some or all of the components of the automatic driving control device 700 may be realized by hardware (including circuitry) such as an LSI, ASIC, FPGA, or GPU, or by cooperation between software and hardware. The program may be stored in advance in a storage device (a storage device including a non-transitory storage medium) such as an HDD or flash memory of the automatic driving control device 700, or may be stored in a removable storage medium such as a DVD or CD-ROM and installed in the HDD or flash memory of the automatic driving control device 700 when the storage medium (non-transitory storage medium) is mounted in a drive device.

自動運転制御装置７００は、車両機器６６０の物体認識装置を介して入力された情報に基づいて、車両Ｍの周辺にある物体の位置、および速度、加速度等の状態を認識する。自動運転制御装置７００は、原則的にはＭＰＵ６５０により決定された推奨車線を走行し、更に、車両Ｍの周辺状況に対応できるように、車両Ｍが自動的に（運転者の操作に依らずに）将来走行する目標軌道を生成する。目標軌道は、例えば、速度要素を含んでいる。例えば、目標軌道は、車両Ｍの到達すべき地点（軌道点）を順に並べたものとして表現される。 The automatic driving control device 700 recognizes the position, speed, acceleration, and other states of objects around the vehicle M based on information input via the object recognition device of the vehicle equipment 660. The automatic driving control device 700 generates a target trajectory along which the vehicle M will travel in the future automatically (without depending on the driver's operation) so that the vehicle M, in principle, travels in the recommended lane determined by the MPU 650 and can also respond to the conditions around the vehicle M. The target trajectory includes, for example, a speed element. For example, the target trajectory is expressed as an ordered sequence of points (trajectory points) that the vehicle M should reach.

自動運転制御装置７００は、目標軌道を生成するにあたり、自動運転のイベントを設定してよい。自動運転のイベントには、定速走行イベント、低速追従走行イベント、車線変更イベント、分岐イベント、合流イベント、テイクオーバーイベント、自動駐車イベントなどがある。自動運転制御装置７００は、起動させたイベントに応じた目標軌道を生成する。また、自動運転制御装置７００は、生成した目標軌道を、予定の時刻通りに車両Ｍが通過するように、車両機器６６０の走行駆動力出力装置、ブレーキ装置、およびステアリング装置を制御する。例えば、自動運転制御装置７００は、目標軌道（軌道点）に付随する速度要素に基づいて、走行駆動力出力装置またはブレーキ装置を制御したり、目標軌道の曲がり具合に応じて、ステアリング装置を制御する。 The automatic driving control device 700 may set an automatic driving event when generating the target trajectory. Automatic driving events include a constant-speed traveling event, a low-speed following event, a lane change event, a branching event, a merging event, a takeover event, and an automatic parking event. The automatic driving control device 700 generates a target trajectory according to the activated event. The automatic driving control device 700 also controls the traveling driving force output device, the brake device, and the steering device of the vehicle equipment 660 so that the vehicle M passes along the generated target trajectory at the scheduled times. For example, the automatic driving control device 700 controls the traveling driving force output device or the brake device based on the speed element associated with the target trajectory (trajectory points), and controls the steering device according to the curvature of the target trajectory.

次に、エージェント装置５００について説明する。エージェント装置５００は、車両Ｍの乗員と対話を行う装置である。例えば、エージェント装置５００は、乗員の発話を情報提供装置１００に送信し、その発話に対する回答を情報提供装置１００から受信する。エージェント装置５００は、受信した回答を、音声や画像を用いて乗員に提示する。 Next, the agent device 500 will be described. The agent device 500 is a device that interacts with the occupant of the vehicle M. For example, the agent device 500 transmits the utterance of the occupant to the information providing device 100, and receives the answer to the utterance from the information providing device 100. The agent device 500 presents the received answer to the occupant using voice or an image.

エージェント装置５００は、例えば、管理部５２０と、エージェント機能部５４０と、車両側記憶部５６０とを備える。管理部５２０は、例えば、音響処理部５２２と、表示制御部５２４と、音声制御部５２６とを備える。図８において、これらの構成要素の配置は説明のために簡易に示しており、実際には、例えば、エージェント機能部５４０と車載通信装置６０の間に管理部５２０が介在してもよく、その配置は任意に改変することができる。 The agent device 500 includes, for example, a management unit 520, an agent function unit 540, and a vehicle-side storage unit 560. The management unit 520 includes, for example, an acoustic processing unit 522, a display control unit 524, and a voice control unit 526. In FIG. 8, the arrangement of these components is simply shown for the sake of explanation, and in reality, for example, a management unit 520 may intervene between the agent function unit 540 and the vehicle-mounted communication device 60. The arrangement can be modified arbitrarily.

エージェント装置５００の車両側記憶部５６０以外の各構成要素は、例えば、ＣＰＵなどのハードウェアプロセッサがプログラム（ソフトウェア）を実行することにより実現される。これらの構成要素のうち一部または全部は、ＬＳＩやＡＳＩＣ、ＦＰＧＡ、ＧＰＵなどのハードウェア（回路部；circuitryを含む）によって実現されてもよいし、ソフトウェアとハードウェアの協働によって実現されてもよい。プログラムは、予めＨＤＤ（Hard Disk Drive）やフラッシュメモリなどの記憶装置（非一過性の記憶媒体を備える記憶装置）に格納されていてもよいし、ＤＶＤやＣＤ－ＲＯＭなどの着脱可能な記憶媒体（非一過性の記憶媒体）に格納されており、記憶媒体がドライブ装置に装着されることでインストールされてもよい。 Each component of the agent device 500 other than the vehicle-side storage unit 560 is realized by, for example, a hardware processor such as a CPU executing a program (software). Some or all of these components may be realized by hardware (including circuitry) such as an LSI, ASIC, FPGA, or GPU, or by cooperation between software and hardware. The program may be stored in advance in a storage device (a storage device including a non-transitory storage medium) such as an HDD (Hard Disk Drive) or flash memory, or may be stored in a removable storage medium (non-transitory storage medium) such as a DVD or CD-ROM and installed when the storage medium is mounted in a drive device.

車両側記憶部５６０は、上記の各種記憶装置、或いはＥＥＰＲＯＭ、ＲＯＭ、またはＲＡＭ等により実現されてよい。車両側記憶部５６０には、例えば、プログラム、その他各種情報が格納される。 The vehicle-side storage unit 560 may be realized by the above-mentioned various storage devices, EEPROM, ROM, RAM, or the like. For example, a program and various other information are stored in the vehicle side storage unit 560.

管理部５２０は、ＯＳ（Operating System）やミドルウェアなどのプログラムが実行されることで機能する。 The management unit 520 functions by executing a program such as an OS (Operating System) or middleware.

音響処理部５２２は、車両Ｍの乗員（例えば、ユーザＵ２）から受け付けた各種音声のうち、問い合わせや要求等に関する情報を認識するのに適した状態になるように、入力された音に対して音響処理を行う。具体的には、音響処理部５２２は、ノイズ除去などの音響処理を行ってよい。 The acoustic processing unit 522 performs acoustic processing on the input sound so that, among the various sounds received from an occupant of the vehicle M (for example, user U2), the sound is in a state suitable for recognizing information on inquiries, requests, and the like. Specifically, the acoustic processing unit 522 may perform acoustic processing such as noise removal.

表示制御部５２４は、エージェント機能部５４０からの指示に応じて、表示・操作装置６２０等の出力装置に車両Ｍの乗員からの問い合わせや要求に対する回答結果に関する画像を生成する。回答結果に関する画像とは、例えば、問い合わせや要求等に対する回答結果を示す店舗や施設の一覧リストを示す画像や、各店舗や施設に関する画像、目的地までの走行経路を示す画像、その他レコメンド情報や処理の開始または終了を示す画像等である。また、表示制御部５２４は、エージェント機能部５４０からの指示に応じて、乗員とコミュニケーションを行う擬人化されたキャラクタ画像（以下、エージェント画像と称する）を生成してもよい。エージェント画像は、例えば、乗員に対して話しかける態様の画像である。エージェント画像は、例えば、少なくとも観者（乗員）によって表情や顔向きが認識される程度の顔画像を含んでよい。表示制御部５２４は、生成した画像を表示・操作装置６２０に出力させる。 The display control unit 524 generates an image regarding the response result to the inquiry or request from the occupant of the vehicle M to the output device such as the display / operation device 620 in response to the instruction from the agent function unit 540. The image related to the answer result is, for example, an image showing a list of stores and facilities showing the answer result to inquiries and requests, an image about each store and facility, an image showing the driving route to the destination, and other recommendation information. An image or the like showing the start or end of processing. Further, the display control unit 524 may generate an anthropomorphic character image (hereinafter referred to as an agent image) that communicates with the occupant in response to an instruction from the agent function unit 540. The agent image is, for example, an image of a mode of talking to an occupant. The agent image may include, for example, a facial image such that the facial expression and the facial orientation are recognized by the viewer (occupant) at least. The display control unit 524 outputs the generated image to the display / operation device 620.

音声制御部５２６は、エージェント機能部５４０からの指示に応じて、スピーカ６３０に含まれるスピーカのうち一部または全部に音声を出力させる。音声には、例えば、エージェント画像が乗員と対話を行うための音声や、表示制御部５２４により画像を表示・操作装置６２０に出力された画像に対応する音声が含まれる。また、音声制御部５２６は、複数のスピーカ６３０を用いて、エージェント画像の表示位置に対応する位置にエージェント音声の音像を定位させる制御を行ってもよい。エージェント画像の表示位置に対応する位置とは、例えば、エージェント画像がエージェント音声を喋っていると乗員が感じると予測される位置であり、具体的には、エージェント画像の表示位置付近（例えば、２～３［ｃｍ］以内）の位置である。また、音像が定位するとは、例えば、ユーザの左右の耳に伝達される音の大きさを調節することにより、乗員が感じる音源の空間的な位置を定めることである。 The voice control unit 526 causes some or all of the speakers included in the speaker 630 to output voice in response to an instruction from the agent function unit 540. The voice includes, for example, voice with which the agent image converses with the occupant and voice corresponding to an image that the display control unit 524 has output to the display/operation device 620. The voice control unit 526 may also use a plurality of speakers 630 to perform control that localizes the sound image of the agent voice at a position corresponding to the display position of the agent image. The position corresponding to the display position of the agent image is, for example, a position at which the occupant is expected to feel that the agent image is speaking the agent voice; specifically, it is a position near the display position of the agent image (for example, within 2 to 3 [cm]). Localizing a sound image means, for example, determining the spatial position of the sound source as perceived by the occupant by adjusting the loudness of the sound transmitted to the user's left and right ears.
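The description says only that a sound image is localized by adjusting the loudness delivered to the left and right ears. As an illustration, the following minimal sketch uses constant-power panning, a common audio technique that the source does not name; the function `pan_gains` and the position convention (-1 for fully left, +1 for fully right) are assumptions made for this example.

```python
import math

def pan_gains(position):
    """Return (left_gain, right_gain) for a horizontal position in [-1, 1].

    Hypothetical helper: -1 is fully left, 0 is centre, +1 is fully right.
    Constant-power panning keeps left**2 + right**2 equal to 1, so the
    perceived loudness stays roughly constant as the image moves.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be within [-1, 1]")
    theta = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] to [0, pi/2]
    return math.cos(theta), math.sin(theta)

# Example: place the agent voice slightly to the right of centre.
left, right = pan_gains(0.3)
```

With more than two speakers, a real system would need a per-speaker gain model; this two-channel case only shows the principle of steering the perceived position through relative loudness.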

エージェント機能部５４０は、管理部５２０により取得される各種情報に基づいて、情報提供装置１００と協働してエージェント画像等を出現させ、車両Ｍの乗員の発話に応じて、音声による応答を含むサービスを提供する。例えば、エージェント機能部５４０は、音響処理部５２２により処理された音声に含まれる起動ワードに基づいてエージェントを起動したり、終了ワードに基づいてエージェントを終了させたりする。また、エージェント機能部５４０は、音響処理部５２２により処理された音声データを、車載通信装置６７０を介して情報提供装置１００に送信したり、情報提供装置１００から得られる情報を乗員に提供したりする。また、エージェント機能部５４０は、汎用通信装置６８０と連携し、情報提供装置１００と通信する機能を備えていてもよい。この場合、エージェント機能部５４０は、例えば、Ｂｌｕｅｔｏｏｔｈ（登録商標）によって汎用通信装置６８０とペアリングを行い、エージェント機能部５４０と汎用通信装置６８０とを接続させる。また、エージェント機能部５４０は、ＵＳＢ（Universal Serial Bus）などを利用した有線通信によって汎用通信装置６８０に接続されるようにしてもよい。 Based on various information acquired by the management unit 520, the agent function unit 540 causes an agent image or the like to appear in cooperation with the information providing device 100 and provides a service that includes a voice response to the utterances of the occupant of the vehicle M. For example, the agent function unit 540 activates the agent based on a wake word included in the voice processed by the acoustic processing unit 522, and terminates the agent based on an end word. The agent function unit 540 also transmits the voice data processed by the acoustic processing unit 522 to the information providing device 100 via the in-vehicle communication device 670 and provides the occupant with information obtained from the information providing device 100. The agent function unit 540 may also have a function of communicating with the information providing device 100 in cooperation with the general-purpose communication device 680. In this case, the agent function unit 540 pairs with the general-purpose communication device 680 by, for example, Bluetooth (registered trademark), thereby connecting the agent function unit 540 and the general-purpose communication device 680. The agent function unit 540 may also be connected to the general-purpose communication device 680 by wired communication using USB (Universal Serial Bus) or the like.

［情報処理装置の処理フロー］ [Processing flow of the information processing device]
次に、情報提供装置１００による一連の処理の流れについてフローチャートを用いて説明する。図９及び１０は、実施形態の情報提供装置１００による一連の処理の流れを表すフローチャートである。 Next, the flow of a series of processes performed by the information providing device 100 will be described with reference to flowcharts. FIGS. 9 and 10 are flowcharts showing the flow of a series of processes performed by the information providing device 100 of the embodiment.

まず、取得部１０６は、通信部１０２を介して、通信端末３００またはエージェント装置５００から、複数のユーザの発話及び行動履歴を取得する（ステップＳ１００）。取得部１０６は、ユーザの発話及び行動履歴を取得すると、それらを発話履歴情報１３４及び行動履歴情報１３６として記憶部１３０に記憶させる。 First, the acquisition unit 106 acquires the utterances and action histories of a plurality of users from the communication terminal 300 or the agent device 500 via the communication unit 102 (step S100). When the acquisition unit 106 acquires the user's utterance and action history, it stores them in the storage unit 130 as the utterance history information 134 and the action history information 136.

次に、音声認識部１０８は、音声認識により複数のユーザのそれぞれの発話からテキストデータを生成する（ステップＳ１０２）。通信端末３００またはエージェント装置５００において既に発話がテキスト化されていた場合、つまり、取得部１０６によって取得されたユーザの発話がテキストデータであった場合、Ｓ１０２の処理は省略されてよい。 Next, the voice recognition unit 108 generates text data from the utterances of each of the plurality of users by voice recognition (step S102). If the utterance is already converted into text in the communication terminal 300 or the agent device 500, that is, if the user's utterance acquired by the acquisition unit 106 is text data, the process of S102 may be omitted.

次に、自然言語処理部１１０は、音声認識部１０８によって生成された各ユーザの発話のテキストデータの中から、一人の対象ユーザの発話由来のテキストデータを選択し、その選択したテキストデータから固有表現を抽出する（ステップＳ１０４）。つまり、自然言語処理部１１０は、不特定多数のユーザの中から対象ユーザを選択し、その対象ユーザの発話から固有表現を抽出する。 Next, the natural language processing unit 110 selects, from the text data of each user's utterances generated by the voice recognition unit 108, the text data derived from the utterances of one target user, and extracts named entities from the selected text data (step S104). That is, the natural language processing unit 110 selects a target user from an unspecified number of users and extracts named entities from that target user's utterances.

次に、判定部１１２は、複数のユーザの行動履歴に基づいて、それら複数のユーザが訪れた地点の中に訪問回数が急増した地点が存在するか否かを判定する（ステップＳ１０６）。 Next, the determination unit 112 determines whether or not there is a point where the number of visits has increased sharply among the points visited by the plurality of users based on the action history of the plurality of users (step S106).

図１１は、訪問回数が急増した地点とそうでない地点とを説明するための図である。図示の例では、地図上に観光地のような３つの候補地Ｘ１～Ｘ３が存在している。例えば、候補地Ｘ１では、訪問数が急増しており、ユーザの訪問回数が閾値以上、又はユーザの訪問回数の所定期間あたりの増加率が閾値以上となっている。一方、候補地Ｘ２やＸ３では、訪問数に変動がなく、ユーザの訪問回数が閾値未満、又はユーザの訪問回数の所定期間あたりの増加率が閾値未満となっている。このような場合、判定部１１２は、候補地Ｘ１が訪問回数が急増した地点であり、候補地Ｘ２やＸ３が訪問回数が急増していない地点であると判定する。 FIG. 11 is a diagram for explaining a point where the number of visits has increased sharply and a point where the number of visits has not increased. In the illustrated example, there are three candidate sites X1 to X3 such as tourist spots on the map. For example, in the candidate site X1, the number of visits is rapidly increasing, and the number of visits by the user is equal to or greater than the threshold value, or the rate of increase in the number of visits by the user per predetermined period is equal to or greater than the threshold value. On the other hand, at the candidate sites X2 and X3, the number of visits does not change, the number of visits by the user is less than the threshold value, or the rate of increase in the number of visits by the user per predetermined period is less than the threshold value. In such a case, the determination unit 112 determines that the candidate site X1 is a point where the number of visits has increased sharply, and the candidate sites X2 and X3 are points where the number of visits has not increased sharply.

また、判定部１１２は、観光地のような人が集まりやすい地点でなくとも、ユーザの訪問回数が閾値以上、又はユーザの訪問回数の所定期間あたりの増加率が閾値以上の地点を、訪問回数が急増した地点として判定してよい。例えば、判定部１１２は、不特定多数のユーザの位置情報を参照し、とある地点に多数のユーザが集まっており、その地点におけるユーザの訪問回数が閾値以上、又はその増加率が閾値以上である場合には、当該地点を訪問回数が急増した地点として判定してよい。つまり、何らかの理由によって多数のユーザを感化させている地点が存在する場合、その地点が訪問回数が急増した地点として判定される。 Further, even for a point that is not a place where people tend to gather, such as a tourist spot, the determination unit 112 may determine a point where the number of user visits is equal to or greater than a threshold value, or where the rate of increase in the number of user visits per predetermined period is equal to or greater than a threshold value, to be a point where the number of visits has increased sharply. For example, the determination unit 112 may refer to the position information of an unspecified number of users and, when many users are gathered at a certain point and the number of user visits at that point is equal to or greater than the threshold value, or its rate of increase is equal to or greater than the threshold value, determine that point to be a point where the number of visits has increased sharply. That is, if there is a point that is drawing in a large number of users for some reason, that point is determined to be a point where the number of visits has increased sharply.
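The determination in step S106 can be illustrated with a short sketch. It assumes that the rate of increase is measured as the ratio of visits in the most recent period to visits in the preceding period. The source does not define the rate or give threshold values, so `VisitStats`, `is_surging`, and the default thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class VisitStats:
    current_visits: int   # visits during the most recent period
    previous_visits: int  # visits during the period before that

def is_surging(stats, count_threshold=100, rate_threshold=2.0):
    """Return True if the visit count OR its growth ratio meets a threshold."""
    if stats.current_visits >= count_threshold:
        return True
    if stats.previous_visits == 0:
        # Growth ratio is undefined here; this sketch treats it as "not surging".
        return False
    growth = stats.current_visits / stats.previous_visits
    return growth >= rate_threshold

# Candidate sites loosely modelled on X1 to X3 in FIG. 11 (illustrative numbers).
candidates = {
    "X1": VisitStats(current_visits=90, previous_visits=30),
    "X2": VisitStats(current_visits=40, previous_visits=41),
    "X3": VisitStats(current_visits=12, previous_visits=12),
}
surging = [name for name, s in candidates.items() if is_surging(s)]
```

Here only X1 qualifies: its count is below the count threshold, but its visits tripled between periods.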

図９及び１０のフローチャートの説明に戻る。次に、判定部１１２は、対象ユーザが訪問回数が急増した地点を実際に訪問したか否かを判定する（ステップＳ１０８）。 Returning to the description of the flowcharts of FIGS. 9 and 10. Next, the determination unit 112 determines whether or not the target user has actually visited a point where the number of visits has increased rapidly (step S108).

例えば、判定部１１２は、地図上において、対象ユーザの位置座標と、訪問回数が急増した地点を訪問した他のユーザの位置座標とを比較し、それらユーザ同士の位置座標が同じ場合、対象ユーザが訪問回数が急増した地点を訪問したと判定してよい。また、訪問回数が急増した地点として判定された施設（例えば商業ビルや駐車場）内において無料Ｗｉ－Ｆｉなどの通信サービスが提供されており、そこで対象ユーザと他のユーザとが共にその通信サービスを利用したとする。この場合、対象ユーザ及び他のユーザのそれぞれの位置情報には、Ｗｉ－Ｆｉのアクセスポイントの位置情報が含まれる。従って、判定部１１２は、対象ユーザ及び他のユーザのそれぞれの位置情報の中に共通のアクセスポイントの位置情報が含まれる場合、対象ユーザが訪問回数が急増した地点を訪問したと判定してよい。 For example, the determination unit 112 may compare, on a map, the position coordinates of the target user with the position coordinates of other users who visited a point where the number of visits has increased sharply, and determine that the target user visited that point when the position coordinates of these users are the same. Also, suppose that a communication service such as free Wi-Fi is provided in a facility (for example, a commercial building or a parking lot) determined to be a point where the number of visits has increased sharply, and that the target user and other users both used that communication service there. In this case, the position information of the target user and of the other users each includes the position information of the Wi-Fi access point. Therefore, when the position information of the target user and of the other users includes the position information of a common access point, the determination unit 112 may determine that the target user visited the point where the number of visits has increased sharply.
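The two checks in step S108 might be sketched as follows. The text says the coordinates are "the same"; since real position fixes rarely match exactly, this sketch substitutes a distance tolerance, which is an assumption. The record layout (`lat`, `lon`, `ap_id`), the function names, and the 50 m radius are likewise hypothetical.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def visited_surge_point(target_history, surge_point, surge_ap_ids, radius_m=50.0):
    """Return True if any history record matches the surge point.

    A record matches if it shares a Wi-Fi access point with users known to
    have visited the point, or if its coordinates fall within radius_m of it.
    """
    for rec in target_history:
        if rec.get("ap_id") in surge_ap_ids:
            return True
        if "lat" in rec and "lon" in rec:
            d = haversine_m(rec["lat"], rec["lon"], surge_point[0], surge_point[1])
            if d <= radius_m:
                return True
    return False

surge_point = (35.0, 139.0)    # illustrative coordinates
surge_ap_ids = {"ap-mall-01"}  # access points seen by other visitors
history = [{"lat": 36.0, "lon": 139.0}, {"ap_id": "ap-mall-01"}]
visited = visited_surge_point(history, surge_point, surge_ap_ids)
```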

ユーザベクトル生成部１１４は、対象ユーザが訪問回数が急増した地点を訪問したと判定部１１２によって判定された場合、「訪問した」ということを表す対象ユーザの行動ベクトルを生成する（ステップＳ１１０）。 When the determination unit 112 determines that the target user has visited a point where the number of visits has increased rapidly, the user vector generation unit 114 generates an action vector of the target user indicating "visited" (step S110).

一方、ユーザベクトル生成部１１４は、訪問回数が急増した地点が存在しない、又は対象ユーザが訪問回数が急増した地点を訪問していないと判定部１１２によって判定された場合、「訪問していない」ということを表す対象ユーザの行動ベクトルを生成する（ステップＳ１１２）。 On the other hand, when the determination unit 112 determines that there is no point where the number of visits has increased sharply, or that the target user has not visited such a point, the user vector generation unit 114 generates an action vector of the target user indicating "not visited" (step S112).

例えば、ユーザベクトル生成部１１４は、「訪問した」ということを「１」とし、「訪問していない」ということを「０」とした一次元のベクトル（スカラ）を行動ベクトルとして生成してよい。また、ユーザベクトル生成部１１４は、対象ユーザが何度も繰り返し訪問回数が急増した地点を訪問している場合、訪問回数Ｎを要素とした行動ベクトルを生成してもよい。 For example, the user vector generation unit 114 may generate, as the action vector, a one-dimensional vector (scalar) in which "visited" is represented by "1" and "not visited" by "0". When the target user has repeatedly visited a point where the number of visits has increased sharply, the user vector generation unit 114 may instead generate an action vector whose element is the number of visits N.

次に、ユーザベクトル生成部１１４は、対象ユーザの発話から抽出された固有表現がベクトル化された発話ベクトルと、対象ユーザの行動ベクトルとを組み合わせて、対象ユーザのユーザベクトルを生成する（ステップＳ１１４）。例えば、発話ベクトルが１０次元であり、行動ベクトルが１次元である場合、ユーザベクトルは１１次元のベクトルとなる。ユーザベクトルは、対象ユーザのユーザＩＤ等に対応付けられてよい。 Next, the user vector generation unit 114 generates the user vector of the target user by combining the utterance vector, obtained by vectorizing the named entities extracted from the target user's utterances, with the action vector of the target user (step S114). For example, if the utterance vector is 10-dimensional and the action vector is 1-dimensional, the user vector is an 11-dimensional vector. The user vector may be associated with the user ID of the target user or the like.
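Steps S110 to S114 amount to appending the action vector to the utterance vector. A minimal sketch follows; it assumes the utterance vector has already been computed by some embedding step that the source does not specify, and the names `action_vector` and `user_vector` are hypothetical.

```python
def action_vector(visit_count, binary=True):
    """Encode visits to a surge point as a one-element vector.

    binary=True gives 1.0 for "visited" and 0.0 for "not visited";
    binary=False uses the visit count N itself as the element.
    """
    if binary:
        return [1.0 if visit_count > 0 else 0.0]
    return [float(visit_count)]

def user_vector(utterance_vec, visit_count, binary=True):
    """Concatenate the utterance vector and the action vector."""
    return list(utterance_vec) + action_vector(visit_count, binary)

# A 10-dimensional utterance vector (placeholder values) plus a 1-dimensional
# action vector yields an 11-dimensional user vector, as in the description.
utterance_vec = [0.1] * 10
vec_binary = user_vector(utterance_vec, visit_count=3)
vec_count = user_vector(utterance_vec, visit_count=3, binary=False)
```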

次に、自然言語処理部１１０は、発話及び行動履歴が取得された全ユーザについてユーザベクトルが生成されたか否かを判定する（ステップＳ１１６）。全ユーザについてユーザベクトルが生成されていない場合、自然言語処理部１１０は、Ｓ１０４に処理を戻し、前回対象ユーザとして選択したユーザと異なる他のユーザを新たな対象ユーザとして選択し直し、その新たな対象ユーザの発話から固有表現を抽出する。以降、新たな対象ユーザに関してＳ１０６からＳ１１４の処理が行われ、新たな対象ユーザのユーザベクトルが生成される。このようにしてユーザベクトルが繰り返し生成される。 Next, the natural language processing unit 110 determines whether user vectors have been generated for all users whose utterances and action histories were acquired (step S116). If user vectors have not yet been generated for all users, the natural language processing unit 110 returns the process to S104, selects as the new target user a user different from the one previously selected, and extracts named entities from the utterances of the new target user. The processes of S106 to S114 are then performed for the new target user, and the user vector of the new target user is generated. User vectors are generated repeatedly in this way.

一方、全ユーザについてユーザベクトルが生成された場合、解析部１１６は、それら複数のユーザベクトルのそれぞれの次元を圧縮する（ステップＳ１１８）。例えば、解析部１１６は、ユーザベクトルが１１次元である場合、１０次元又はそれ以下まで圧縮する（ベクトルの要素数を減らす）。 On the other hand, when user vectors have been generated for all users, the analysis unit 116 reduces the dimensionality of each of those user vectors (step S118). For example, when the user vectors are 11-dimensional, the analysis unit 116 compresses them to 10 dimensions or fewer (that is, it reduces the number of vector elements).

次に、解析部１１６は、次元を圧縮した複数のユーザベクトルのクラスタリングを行い、発話内容や訪問地点といった特徴が類似するユーザ同士を同一のクラスタに分類する（ステップＳ１２０）。 Next, the analysis unit 116 performs clustering of a plurality of user vectors with compressed dimensions, and classifies users having similar characteristics such as utterance contents and visiting points into the same cluster (step S120).
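The source does not name a clustering algorithm for step S120. As one possibility, the sketch below uses plain k-means with explicitly supplied initial centroids. The dimensionality reduction of step S118 is omitted, and the three-element vectors, the user IDs, and the function names are illustrative assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans(points, initial_centroids, iterations=20):
    """Assign each point to its nearest centroid, then recompute centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = mean(members)
    return labels, centroids

# Toy user vectors: two utterance features followed by the action element.
users = {
    "u1": [0.9, 0.1, 1.0],
    "u2": [0.8, 0.2, 1.0],
    "u3": [0.1, 0.9, 0.0],
    "u4": [0.2, 0.8, 0.0],
}
ids = list(users)
labels, centroids = kmeans(
    [users[i] for i in ids], initial_centroids=[users["u1"], users["u3"]]
)
clusters = dict(zip(ids, labels))
```

Users with similar utterance features and the same visit behaviour end up with the same label, which is the grouping the description relies on for the per-cluster dictionaries.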

次に、収集部１１８は、クラスタリングによって生成されたクラスタごとに、そのクラスタに属するユーザの発話から抽出された固有表現の共起表現を収集する（ステップＳ１２２）。 Next, the collecting unit 118 collects the co-occurrence expression of the named entity extracted from the utterance of the user belonging to the cluster for each cluster generated by the clustering (step S122).

図１２は、ユーザベクトルのクラスタリング結果の一例を表す図である。図示の例では、複数のユーザベクトルが、Ａ、Ｂ、Ｃの３つのクラスタに分類されている。この場合、収集部１１８は、クラスタＡにユーザベクトルが属するユーザ（以下、ユーザ群Ａという）の固有表現に対する共起表現をウェブサイトなどから収集する。クラスタＡ、Ｂ、Ｃのうちいずれか一つは「特定クラスタ」の一例である。 FIG. 12 is a diagram showing an example of the clustering result of the user vector. In the illustrated example, the plurality of user vectors are classified into three clusters A, B, and C. In this case, the collecting unit 118 collects the co-occurrence expression for the named entity of the user (hereinafter referred to as the user group A) to which the user vector belongs to the cluster A from the website or the like. Any one of clusters A, B, and C is an example of a "specific cluster".

同様に、収集部１１８は、クラスタＢにユーザベクトルが属するユーザ（以下、ユーザ群Ｂという）の固有表現に対する共起表現と、クラスタＣにユーザベクトルが属するユーザ（以下、ユーザ群Ｃという）の固有表現に対する共起表現とを、ウェブサイトなどから収集する。 Similarly, the collecting unit 118 collects, from websites and the like, the co-occurrence expressions for the named entities of the users whose user vectors belong to cluster B (hereinafter referred to as user group B) and the co-occurrence expressions for the named entities of the users whose user vectors belong to cluster C (hereinafter referred to as user group C).

例えば、ユーザ群Ａでは、「ＡＢＣＤＥＦ」という飲食店の言い回しである「ＡＢＣ」が頻繁に発話されていたとする。この場合、収集部１１８は、ウェブページなどにおいて「ＡＢＣ」という表現とともに出現しやすい表現を、ユーザ群Ａの共起表現として収集する。一方、ユーザ群Ｂでは、「ＡＢＣＤＥＦ」という飲食店の言い回しである「ＤＥＦ」が頻繁に発話されていたとする。この場合、収集部１１８は、ウェブページなどにおいて「ＤＥＦ」という表現とともに出現しやすい表現を、ユーザ群Ｂの共起表現として収集する。 For example, suppose that in user group A, "ABC", a colloquial way of referring to a restaurant named "ABCDEF", is frequently uttered. In this case, the collecting unit 118 collects expressions that tend to appear together with the expression "ABC" on web pages and the like as the co-occurrence expressions of user group A. On the other hand, suppose that in user group B, "DEF", another colloquial way of referring to the restaurant "ABCDEF", is frequently uttered. In this case, the collecting unit 118 collects expressions that tend to appear together with the expression "DEF" on web pages and the like as the co-occurrence expressions of user group B.
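Collecting co-occurrence expressions (step S122) can be sketched as counting which terms appear in the same document as a named entity. The example assumes whitespace-separated text and treats each string as one web page; Japanese text would first need a morphological analyzer, and the sample pages and the name `collect_cooccurrences` are made up for illustration.

```python
from collections import Counter

def collect_cooccurrences(named_entity, documents, top_n=3):
    """Return the terms that most often share a document with named_entity."""
    counts = Counter()
    for doc in documents:
        tokens = doc.split()
        if named_entity in tokens:
            counts.update(t for t in tokens if t != named_entity)
    return [term for term, _ in counts.most_common(top_n)]

# Hypothetical page snippets standing in for crawled web pages.
pages = [
    "ABC ramen lunch",
    "ABC ramen queue",
    "DEF dessert menu",
]
cooccur_a = collect_cooccurrences("ABC", pages, top_n=1)
cooccur_b = collect_cooccurrences("DEF", pages)
```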

図９及び１０のフローチャートの説明に戻る。次に、辞書生成部１２０は、クラスタリングによって生成されたクラスタごとに、音声認識のためのＡＳＲ辞書や自然言語理解のためのＮＬＵ辞書を含む情報処理辞書を生成する（ステップＳ１２４）。 Returning to the description of the flowcharts of FIGS. 9 and 10. Next, the dictionary generation unit 120 generates an information processing dictionary including an ASR dictionary for speech recognition and an NLU dictionary for natural language understanding for each cluster generated by clustering (step S124).

図１３は、情報処理辞書の生成方法を説明するための図である。図示のように、辞書生成部１２０は、クラスタＡについて、ユーザ群Ａの固有表現や共起表現が互いに対応付けられた情報処理辞書ＤＩＣＴ＿Ａを生成してよい。同様に、辞書生成部１２０は、クラスタＢについて、ユーザ群Ｂの固有表現や共起表現が互いに対応付けられた情報処理辞書ＤＩＣＴ＿Ｂを生成し、クラスタＣについて、ユーザ群Ｃの固有表現や共起表現が互いに対応付けられた情報処理辞書ＤＩＣＴ＿Ｃを生成してよい。このように、辞書生成部１２０は、クラスタごとに情報処理辞書を生成する。 FIG. 13 is a diagram for explaining a method of generating an information processing dictionary. As shown in the figure, for cluster A, the dictionary generation unit 120 may generate an information processing dictionary DICT_A in which the named entities and co-occurrence expressions of user group A are associated with each other. Similarly, the dictionary generation unit 120 may generate, for cluster B, an information processing dictionary DICT_B in which the named entities and co-occurrence expressions of user group B are associated with each other, and, for cluster C, an information processing dictionary DICT_C in which the named entities and co-occurrence expressions of user group C are associated with each other. In this way, the dictionary generation unit 120 generates an information processing dictionary for each cluster.

図９及び１０のフローチャートの説明に戻る。次に、辞書生成部１２０は、通信端末３００又はエージェント装置５００の各記憶装置の中に既存辞書が存在するか否かを判定するか否かを判定する（ステップＳ１２６）。既存辞書とは、例えば、携帯電話やパーソナルコンピュータなどにおいて利用される文字の予測変換機能や入力予測機能（サジェスト機能）を実現するための各種辞書である。 Returning to the description of the flowcharts of FIGS. 9 and 10. Next, the dictionary generation unit 120 determines whether or not the existing dictionary exists in each storage device of the communication terminal 300 or the agent device 500 (step S126). The existing dictionary is, for example, various dictionaries for realizing a character prediction conversion function and an input prediction function (suggest function) used in mobile phones, personal computers, and the like.

辞書生成部１２０は、既存辞書が存在すると判定した場合、情報処理辞書と既存辞書とを組み合わせた新情報処理辞書を生成する（ステップＳ１２８）。新情報処理辞書には、音声認識のためのＡＳＲ辞書及び／又は自然言語理解のためのＮＬＵ辞書に加えて、更に既存辞書が含まれる。新情報処理辞書は「新辞書」の一例である。 When it is determined that the existing dictionary exists, the dictionary generation unit 120 generates a new information processing dictionary by combining the information processing dictionary and the existing dictionary (step S128). The new information processing dictionary includes an existing dictionary in addition to an ASR dictionary for speech recognition and / or an NLU dictionary for natural language understanding. The new information processing dictionary is an example of a "new dictionary".
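Steps S124 and S128 can be pictured with a simple mapping from each named entity to its co-occurrence expressions. The actual structure of the ASR and NLU dictionaries is not given in the source, so the dict-of-lists layout, the function names, and the sample entries below are assumptions.

```python
def build_cluster_dictionary(entity_to_cooccurrences):
    """Associate each named entity with its de-duplicated co-occurrence terms."""
    return {
        entity: sorted(set(terms))
        for entity, terms in entity_to_cooccurrences.items()
    }

def merge_with_existing(cluster_dict, existing_dict):
    """Combine a cluster dictionary with an existing dictionary.

    Existing entries are kept, and cluster terms are appended when they are
    not already present for the same key.
    """
    merged = {key: list(values) for key, values in existing_dict.items()}
    for entity, terms in cluster_dict.items():
        current = merged.get(entity, [])
        merged[entity] = current + [t for t in terms if t not in current]
    return merged

# DICT_A for cluster A, DICT_X as an existing (e.g. input-prediction) dictionary.
dict_a = build_cluster_dictionary({"ABC": ["ramen", "lunch", "ramen"]})
dict_x = {"ABC": ["alphabet"], "hello": ["greeting"]}
dict_new = merge_with_existing(dict_a, dict_x)
```

Keeping the existing entries first is a design choice of this sketch; a real system might instead weight cluster-specific entries more heavily.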

次に、検証部１２２は、辞書生成部１２０によって生成された情報処理辞書（新情報処理辞書を含む）の精度を検証する（ステップＳ１３０）。例えば、検証部１２２は、ユーザ群Ａの発話に基づいて、クラスタＡの情報処理辞書の精度を検証する。より具体的には、検証部１２２は、ユーザ群Ａ（クラスタＡにユーザベクトルが所属するユーザ）における発話頻度に対するカバレッジ（被覆率）とユーザに対するカバレッジとが、予め設定された閾値以上である場合、クラスタＡの情報処理辞書の精度が閾値以上であると判定する。同様に、検証部１２２は、ユーザ群Ｂの発話に基づいて、クラスタＢの情報処理辞書の精度を検証し、ユーザ群Ｃの発話に基づいて、クラスタＣの情報処理辞書の精度を検証する。 Next, the verification unit 122 verifies the accuracy of the information processing dictionary (including the new information processing dictionary) generated by the dictionary generation unit 120 (step S130). For example, the verification unit 122 verifies the accuracy of the information processing dictionary of cluster A based on the utterances of user group A. More specifically, the verification unit 122 determines that the accuracy of the information processing dictionary of cluster A is equal to or higher than the threshold value when both the coverage with respect to utterance frequency and the coverage with respect to users in user group A (the users whose user vectors belong to cluster A) are equal to or higher than preset threshold values. Similarly, the verification unit 122 verifies the accuracy of the information processing dictionary of cluster B based on the utterances of user group B, and verifies the accuracy of the information processing dictionary of cluster C based on the utterances of user group C.
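The source names two coverage measures for step S130 without defining them. The sketch below adopts one plausible reading, labelled here as an assumption: utterance-frequency coverage is the share of uttered named entities found in the dictionary, and user coverage is the share of users with at least one such entity. The threshold values and function names are also hypothetical.

```python
def coverage_metrics(dictionary_entries, utterances_by_user):
    """Return (utterance_frequency_coverage, user_coverage) for one cluster."""
    entries = set(dictionary_entries)
    total = covered = covered_users = 0
    for entities in utterances_by_user.values():
        hits = sum(1 for e in entities if e in entries)
        total += len(entities)
        covered += hits
        if hits > 0:
            covered_users += 1
    freq_cov = covered / total if total else 0.0
    user_cov = covered_users / len(utterances_by_user) if utterances_by_user else 0.0
    return freq_cov, user_cov

def passes_verification(freq_cov, user_cov, freq_threshold=0.7, user_threshold=0.7):
    """Both coverages must reach their thresholds for the dictionary to pass."""
    return freq_cov >= freq_threshold and user_cov >= user_threshold

# Named entities uttered by the users of group A (illustrative data).
group_a = {"u1": ["ABC", "ABC", "XYZ"], "u2": ["ABC"], "u3": ["QRS"]}
freq_cov, user_cov = coverage_metrics({"ABC"}, group_a)
ok = passes_verification(freq_cov, user_cov)
```

In this toy data, three of five uttered entities and two of three users are covered, so the dictionary fails the default 0.7 thresholds and would not be offered in step S132.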

次に、提供部１２４は、情報処理辞書の精度が閾値以上である場合、その情報処理辞書の利用案内情報を、通信端末３００又はエージェント装置５００に提供する（ステップＳ１３２）。これによって本フローチャートの処理が終了する。 Next, when the accuracy of the information processing dictionary is equal to or higher than the threshold value, the providing unit 124 provides the usage guidance information of the information processing dictionary to the communication terminal 300 or the agent device 500 (step S132). This ends the processing of this flowchart.

FIG. 14 is a diagram schematically showing a scene in which usage guidance information for an information processing dictionary is provided. U3 in the figure is a user whose user vector is close to cluster A in the feature space of the clustering. That is, the user U3 is a user whose characteristics, such as utterance content and behavior history, are similar to those of the user group A. Such a user U3 is recommended, for example, to set on the voice user interface a new information processing dictionary DICT_NEW, which is a combination of the information processing dictionary DICT_A containing the named entities and co-occurrence expressions of the user group A and the existing dictionary DICT_X. Suppose, for example, that the user U3 enables the recommended new information processing dictionary DICT_NEW on the voice user interface. In this case, the voice user interface transmits, to the information providing device 100, information indicating that use of the new information processing dictionary DICT_NEW has been permitted. Upon receiving this permission information, the information providing device 100 uses the new information processing dictionary DICT_NEW permitted by the user U3 to perform voice recognition on the utterances of the user U3 and to interpret the meaning of the recognized speech. This makes it possible to interpret the meaning of the named-entity wording that the user U3 uses on a daily basis, and to provide an appropriate answer to an inquiry or request made with that wording. As a result, the user can use the voice user interface with a sense of familiarity.
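The scene of FIG. 14 can be sketched as follows, under stated assumptions. Here "close to cluster A" is taken to mean the smallest Euclidean distance to a cluster centroid, dictionaries are modeled as plain mappings from a phrase to its interpretation, and cluster-specific entries override existing ones on a collision. None of these choices is specified by the embodiment, and all function names are hypothetical.

```python
import math


def nearest_cluster(user_vector, centroids):
    """Return the id of the centroid closest to user_vector (Euclidean distance)."""
    return min(centroids, key=lambda cid: math.dist(user_vector, centroids[cid]))


def merge_dictionaries(existing, cluster_specific):
    """Combine an existing dictionary (DICT_X) with a cluster dictionary (DICT_A).

    Assumption: on a key collision the cluster-specific entry wins, because it
    reflects the wording that the user group actually uses.
    """
    merged = dict(existing)
    merged.update(cluster_specific)
    return merged


def select_dictionary(existing, new, permitted):
    """Use the new dictionary only after the user has permitted it."""
    return new if permitted else existing
```

In this sketch the existing dictionary stays in force until the permission information arrives, which mirrors the order of events described for the user U3.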

According to the embodiment described above, the information providing device 100 converts the utterances of a plurality of users into text and extracts named entities from the text data. Based on the behavior history of each of the plurality of users, the information providing device 100 determines whether or not each user has visited a point where the number of visits has increased sharply. The information providing device 100 generates a user vector for each user by combining an utterance vector, obtained by vectorizing the named entities extracted from that user's text data, with an action vector, obtained by vectorizing whether the user has visited a point where the number of visits has increased sharply and how many times. The information providing device 100 clusters the plurality of users for which user vectors have been generated, and classifies users having similar characteristics, such as utterance content and visited points, into the same cluster. For each cluster generated by the clustering, the information providing device 100 generates an information processing dictionary including an ASR dictionary for speech recognition and an NLU dictionary for natural language understanding. Then, the information providing device 100 provides, for example, the users of cluster A among the plurality of clusters with usage guidance information for the information processing dictionary corresponding to cluster A.
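The construction of the user vector and the clustering summarized above can be sketched as follows. This is an illustration only: the count-based utterance vector, the flag-plus-count layout of the action vector, and the use of plain k-means seeded with the first k vectors are assumptions made for the sketch, and the function names are hypothetical.

```python
import math


def utterance_vector(entities, vocabulary):
    """Count how often each vocabulary term appears among the extracted entities."""
    return [float(entities.count(term)) for term in vocabulary]


def action_vector(visit_counts, spike_points):
    """For each spike point, emit a visited flag followed by the visit count."""
    vec = []
    for point in spike_points:
        n = visit_counts.get(point, 0)
        vec.extend([1.0 if n > 0 else 0.0, float(n)])
    return vec


def user_vector(entities, visit_counts, vocabulary, spike_points):
    """Concatenate the utterance vector and the action vector."""
    return utterance_vector(entities, vocabulary) + action_vector(visit_counts, spike_points)


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because the two parts of the vector are on different scales in this sketch, a practical system would likely normalize or weight them before clustering; that step is omitted here.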

This makes it possible to accommodate the diversity of utterances, which can vary with region, age, trends, and the like. As a result, the usability of the voice user interface improves; for example, users can operate the voice user interface using wording that is familiar to them. In addition, even if a word registered in the dictionary falls out of use, or a new word not registered in the dictionary becomes popular, the dictionary can be updated automatically.

The embodiment described above can be expressed as follows.
An information processing apparatus comprising:
a memory that stores a program; and
a processor,
wherein the processor, by executing the program, is configured to:
extract named entities from the utterances of each of a plurality of target users;
determine, for each target user and based on the behavior history of the target user, whether or not the target user has visited a specific point where the number of visits has increased sharply;
generate, for each target user, a multidimensional feature amount that combines the extracted named entities and the result of the determination;
perform clustering of the plurality of target users for which the feature amounts have been generated; and
generate, for each cluster generated by the clustering, a dictionary for at least one of speech recognition and natural language understanding.
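The determination step can be sketched as follows, assuming that a specific point is one where the number of visits by other users is equal to or greater than a threshold, or where the rate of increase per period is equal to or greater than a threshold. Comparing two consecutive periods, the treatment of growth from zero, and the function names are assumptions made for this sketch.

```python
def is_specific_point(previous_visits, current_visits, count_threshold, rate_threshold):
    """Decide whether a point counts as one where visits have increased sharply."""
    if current_visits >= count_threshold:
        return True
    if previous_visits == 0:
        # Assumption: a rate of increase from zero is undefined, so the point
        # does not qualify on the rate criterion alone.
        return False
    rate = (current_visits - previous_visits) / previous_visits
    return rate >= rate_threshold


def visited_specific_point(visited_points, specific_points):
    """Return True if the user's behavior history contains any specific point."""
    return any(point in specific_points for point in visited_points)
```

The Boolean returned by `visited_specific_point` is the per-user determination result that is then combined with the extracted named entities.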

Although modes for carrying out the present invention have been described above using embodiments, the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

1 ... Information providing system, 100 ... Information providing device, 102 ... Communication unit, 104 ... Authentication unit, 106 ... Acquisition unit, 108 ... Voice recognition unit, 110 ... Natural language processing unit, 112 ... Determination unit, 114 ... User vector generation unit, 116 ... Analysis unit, 118 ... Collection unit, 120 ... Dictionary generation unit, 122 ... Verification unit, 124 ... Providing unit, 130 ... Storage unit, 300 ... Communication terminal, 310 ... Terminal-side communication unit, 320 ... Input unit, 330 ... Display, 340, 630 ... Speaker, 350, 610 ... Microphone, 355 ... Position acquisition unit, 360 ... Camera, 370 ... Application execution unit, 380 ... Output control unit, 390 ... Terminal-side storage unit, 500 ... Agent device, 520 ... Management unit, 540 ... Agent function unit, 560 ... Vehicle-side storage unit, 620 ... Display/operation device, 640 ... Navigation device 640 ... MPU, 660 ... Vehicle equipment, 670 ... In-vehicle communication device, 680 ... General-purpose communication device, 690 ... Occupant recognition device, 700 ... Automated driving control device, M ... Vehicle

Claims

An information processing apparatus comprising:
an extraction unit that extracts named entities from the utterances of each of a plurality of target users;
a determination unit that determines, for each target user and based on the behavior history of the target user, whether or not the target user has visited a specific point where the number of visits has increased sharply;
a first generation unit that generates, for each target user, a feature amount that combines the named entities extracted by the extraction unit and the determination result of the determination unit;
an analysis unit that performs clustering of the plurality of target users for which the feature amounts have been generated by the first generation unit; and
a second generation unit that generates, for each cluster generated by the clustering, a dictionary for at least one of speech recognition and natural language understanding.

The information processing apparatus according to claim 1,
wherein the specific point is a point where the number of visits by other users is equal to or greater than a threshold value, or a point where the rate of increase in the number of visits by other users per predetermined period is equal to or greater than a threshold value.

The information processing apparatus according to claim 1 or 2, further comprising
a collection unit that collects, for each cluster, co-occurrence expressions of the named entities extracted from the utterances of the target users belonging to that cluster,
wherein the second generation unit generates, for each cluster, the dictionary containing the co-occurrence expressions collected by the collection unit.

The information processing apparatus according to any one of claims 1 to 3, further comprising
a providing unit that provides, to the target users belonging to a specific cluster among the plurality of clusters, usage guidance information for the dictionary corresponding to the specific cluster among the plurality of dictionaries.

The information processing apparatus according to claim 4,
wherein the second generation unit generates a new dictionary by combining the dictionary generated for each cluster with an existing dictionary, and
the providing unit provides the target users belonging to the specific cluster with usage guidance information for the new dictionary in which the dictionary corresponding to the specific cluster and the existing dictionary are combined.

The information processing apparatus according to any one of claims 1 to 5, further comprising
a verification unit that verifies the dictionary based on the utterances of the target users in a predetermined user group.

The information processing apparatus according to any one of claims 1 to 6,
wherein the first generation unit generates, as the feature amount, a combination of a first feature amount based on the named entities and a second feature amount based on the determination result of the determination unit.

The information processing apparatus according to claim 7,
wherein the second feature amount includes a feature amount representing one or both of whether the specific point has been visited and the number of visits to the specific point.

The information processing apparatus according to any one of claims 1 to 8,
wherein the named entities include wording for a place name or a mark.

An information processing method in which a computer:
extracts named entities from the utterances of each of a plurality of target users;
determines, for each target user and based on the behavior history of the target user, whether or not the target user has visited a specific point where the number of visits has increased sharply;
generates, for each target user, a feature amount that combines the extracted named entities and the result of the determination;
performs clustering of the plurality of target users for which the feature amounts have been generated; and
generates, for each cluster generated by the clustering, a dictionary for at least one of speech recognition and natural language understanding.

A program for causing a computer to execute:
extracting named entities from the utterances of each of a plurality of target users;
determining, for each target user and based on the behavior history of the target user, whether or not the target user has visited a specific point where the number of visits has increased sharply;
generating, for each target user, a feature amount that combines the extracted named entities and the result of the determination;
performing clustering of the plurality of target users for which the feature amounts have been generated; and
generating, for each cluster generated by the clustering, a dictionary for at least one of speech recognition and natural language understanding.