JP2018022290A - Information processing device and program

Info

Publication number: JP2018022290A
Application number: JP2016152282A
Authority: JP
Inventors: 友紀谷口; Tomonori Taniguchi; 茂之榊; Shigeyuki Sakaki; 大熊　智子; Tomoko Okuma; 智子大熊; 三浦康秀; Yasuhide Miura; 康秀三浦; 元樹谷口; Motoki Taniguchi
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2016-08-02
Filing date: 2016-08-02
Publication date: 2018-02-08
Anticipated expiration: 2036-08-02
Also published as: JP6844143B2

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of at least one of training and identification in comparison with a configuration in which lacking information is not interpolated in the case of using information of a plurality of formats to perform machine learning. SOLUTION: A multi-modal identification device 300 includes: a relevant information selection part 312 and an information interpolation part 313 for interpolating information of a lacking format by relevant information related to information of a plurality of formats in the case that information of a portion of formats in the information of the plurality of formats is lacking; and a multi-modal identification part 314 for using the information of the plurality of formats obtained by interpolating the information of the lacking format to execute at least one of training and identification. SELECTED DRAWING: Figure 4

Description

The present invention relates to an information processing apparatus and a program.

For example, Patent Document 1 discloses a method for customizing a structured document that is electronically delivered to a consumer's computer equipped with a display device. In this method, the structured document is received at the consumer computer and segmented into a plurality of variable content sections, each having a plurality of selectable content alternatives. For each variable content section, one of the content alternatives is selected to augment the section by evaluating the alternatives against the consumer's profile, the section is augmented with the selected alternative, and the augmented structured document is presented to the consumer on the display device.

Japanese Translation of PCT Publication No. 2002-520689

Conventionally, machine learning techniques are known in which a certain amount of data is input and analyzed, and useful rules, decision criteria, and the like are derived from that data. In such machine learning, various kinds of data are input and training and identification processes are performed. When information in a plurality of formats is used as input data, for example, it is desirable that none of the input information be missing.
An object of the present invention is to improve the accuracy of at least one of training and identification, compared with a configuration in which missing information is not interpolated, when machine learning is performed using information in a plurality of formats.

The invention according to claim 1 is an information processing apparatus comprising: interpolation means for, when information in some of a plurality of formats is missing, interpolating the information in the missing format with related information that is related to the information in the plurality of formats; and processing execution means for executing at least one of training and identification using the information in the plurality of formats in which the information in the missing format has been interpolated.
The invention according to claim 2 is the information processing apparatus according to claim 1, wherein the interpolation means acquires, as the related information, information that has the missing format and was posted within a predetermined time from the time at which the non-missing information among the information in the plurality of formats was posted.
The invention according to claim 3 is the information processing apparatus according to claim 1, wherein the interpolation means acquires, as the related information, information that has the missing format and was posted at a position within a predetermined range from the position at which the non-missing information among the information in the plurality of formats was posted.
The invention according to claim 4 is the information processing apparatus according to claim 2 or 3, wherein, when information in a first format is missing and information in a second format is not missing among the information in the plurality of formats, the interpolation means acquires, as the related information, first-format information that was posted together with other second-format information whose similarity to the non-missing second-format information satisfies a predetermined condition.
The invention according to claim 5 is the information processing apparatus according to claim 4, wherein, when the non-missing second-format information is text-format information, the interpolation means uses the similarity between document vectors calculated from text-format information to acquire, as the related information, first-format information that was posted together with other text-format information whose similarity to the non-missing second-format information satisfies a predetermined condition.
The invention according to claim 6 is a program for causing a computer to realize: a function of, when information in some of a plurality of formats is missing, interpolating the information in the missing format with related information that is related to the information in the plurality of formats; and a function of executing at least one of training and identification using the information in the plurality of formats in which the information in the missing format has been interpolated.
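
As a rough illustration of the document-vector similarity mentioned for claim 5, the following Python sketch compares two texts. The bag-of-words vectors, whitespace tokenization, cosine measure, and the names `document_vector` and `cosine_similarity` are assumptions made for this example only; the text above states merely that a similarity between document vectors is used.

```python
import math
from collections import Counter


def document_vector(text):
    """Build a simple bag-of-words vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical usage: compare the text of a post lacking an image
# with the text of another post that does have an image.
score = cosine_similarity(
    document_vector("my dog is cute"),
    document_vector("cute dog at the park"),
)
```

A predetermined condition such as "score is at least some threshold" could then decide whether the image of the other post qualifies as related information; the threshold itself would be a design parameter.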

According to the invention of claim 1, when machine learning is performed using information in a plurality of formats, the accuracy of at least one of training and identification can be improved compared with a configuration in which missing information is not interpolated.
According to the invention of claim 2, when part of the information in a plurality of formats is missing, interpolation can be performed using information that is close in time.
According to the invention of claim 3, when part of the information in a plurality of formats is missing, interpolation can be performed using information that is geographically close.
According to the invention of claim 4, when part of the information in a plurality of formats is missing, interpolation can be performed based on the similarity to the information that is not missing.
According to the invention of claim 5, when much of the information in a plurality of formats includes text-format information, interpolation can be performed for a large share of that information.
According to the invention of claim 6, a function of improving the accuracy of at least one of training and identification, compared with a configuration in which missing information is not interpolated, when machine learning is performed using information in a plurality of formats can be realized by a computer.

FIG. 1 is a diagram for explaining an example of machine learning.
FIG. 2 is a diagram showing an example of the overall configuration of a computer system to which the present embodiment is applied.
FIG. 3 is a diagram showing a hardware configuration example of the multimodal identification device according to the present embodiment.
FIG. 4 is a block diagram showing a functional configuration example of the multimodal identification device according to Embodiment 1.
FIGS. 5(a) and 5(b) are diagrams for explaining an example of the related information selection processing according to Embodiment 1.
FIG. 6(a) is a flowchart showing the procedure of processing at the time of training by the multimodal identification device according to Embodiment 1. FIG. 6(b) is a flowchart showing the procedure of processing at the time of identification by the multimodal identification device according to Embodiment 1.
FIG. 7 is a block diagram showing a functional configuration example of the multimodal identification device in a modification of Embodiment 1.
FIGS. 8(a) and 8(b) are diagrams showing an example of the similarity of text information.
FIG. 9 is a block diagram showing a functional configuration example of the multimodal identification device according to Embodiment 2.
FIGS. 10(a) and 10(b) are diagrams for explaining an example of the related information selection processing according to Embodiment 2.
FIG. 11(a) is a flowchart showing the procedure of processing at the time of training by the multimodal identification device according to Embodiment 2. FIG. 11(b) is a flowchart showing the procedure of processing at the time of identification by the multimodal identification device according to Embodiment 2.

Embodiments of the present invention will be described below in detail with reference to the accompanying drawings.
<Background>
First, the background of the present embodiment will be described.
FIG. 1 is a diagram for explaining an example of machine learning. The example in FIG. 1 shows machine learning using a neural network model. Specifically, the input data passes through the input layer, the intermediate layer, and the output layer in turn, and an output result (output data) is generated. In the training process, a plurality of data items are input and the weights connecting the input layer, the intermediate layer, and the output layer are adjusted. Adjusting the weights through such training makes it possible to obtain the desired output result when some data is later input for identification.
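
The flow through the layers and the weight adjustment described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model of the embodiment: the layer sizes, the sigmoid activation, the squared-error loss, the learning rate, and the names `forward` and `train_step` are all chosen here for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Pass input x through the intermediate layer, then the output layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def train_step(x, target, w_hidden, w_out, lr=0.5):
    """Adjust the weights once by gradient descent on the squared error."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output unit for the loss 0.5 * (output - target)^2.
    delta_out = (output - target) * output * (1.0 - output)
    # Error signals at the intermediate units, computed before w_out changes.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]
    for j in range(len(w_out)):
        w_out[j] -= lr * delta_out * hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] -= lr * delta_hidden[j] * x[i]
    return 0.5 * (output - target) ** 2


random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
w_out = [random.uniform(-1, 1) for _ in range(3)]
x, target = [1.0, 0.5], 1.0  # one training example with its desired output
losses = [train_step(x, target, w_hidden, w_out) for _ in range(200)]
```

Repeating `train_step` moves the output for `x` toward `target`, which is the sense in which training adjusts the weights so that the desired result is obtained at identification time. Bias terms are omitted to keep the sketch short.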

In general, machine learning includes supervised learning, in which training is performed on data for which a human has indicated the decision criteria in advance (so-called teacher data) and rules that produce the proper output for the input data are derived. Unsupervised learning is also known, in which only input data is given, training is performed on that data, and the computer itself automatically determines an assignment of outputs that is optimal according to a fixed criterion.

The present embodiment performs machine learning using multimodal information, in other words, information made up of a plurality of communication modes. That is, in the present embodiment, training is performed using multimodal information and the weights are adjusted. Data is then input to the model whose weights have been adjusted, and the input data is identified.

Here, multimodal information includes information in formats (communication modes) such as text, images, video, and audio, and contains information in a plurality of these formats, in other words, in at least two formats. Text here means character strings or document data composed of character codes.
Examples of such multimodal information include information posted by users and made public on social networking services (SNS) such as Twitter (registered trademark) and Instagram (registered trademark).

For example, consider a case in which the text "cute" and an image of a dog are posted together. Precisely because the post is handled as multimodal information consisting of text and an image, it can be understood as a post describing the dog as cute, and it can be predicted, for example, that the poster is a dog lover. By contrast, if only the text "cute" is available, what is being called cute is unknown, and if only the image of the dog is available, it cannot be determined whether the poster likes or dislikes dogs. Thus, using multimodal information makes it easier to predict a user's preferences, behavior, and the like than using information in only one format. The predicted information on user preferences and behavior can be used, for example, as marketing information for advertisement distribution.

However, information posted on SNS such as Twitter and Instagram often does not contain information in two or more formats and is therefore not multimodal information. For example, Twitter is an SNS centered on text posts, so text information is usually present while image information is often missing. Likewise, Instagram is an SNS centered on image posts, so image information is usually present while text information is often missing. The following therefore describes a procedure for interpolating the missing part of the information in a plurality of formats so that machine learning using multimodal information can be performed.

<Overall system configuration>
A computer system to which the present embodiment is applied will be described. FIG. 2 is a diagram showing an example of the overall configuration of a computer system to which the present embodiment is applied. As illustrated, in this computer system, client terminals 100 (client terminals 100a to 100c in the illustrated example), an information storage device 200, and a multimodal identification device 300 are connected to a network 400.

The client terminal 100 is a terminal device used by a user; examples include portable information terminals (such as smartphones and tablet terminals) and PCs (Personal Computers). The user uses the client terminal 100 to post text and images on services such as Twitter and Instagram. In other words, based on the user's operation input, the client terminal 100 transmits information posted by the user, such as text and images (hereinafter referred to as post information), to the information storage device 200. Although FIG. 2 shows only three client terminals 100, the number is not limited to three; in practice, the many client terminals that use services such as Twitter and Instagram are covered.

The information storage device 200 is a computer device, such as a server, that stores the post information acquired from the client terminals 100. More specifically, the information storage device 200 is a server provided to offer a service such as Twitter or Instagram. Although FIG. 2 shows only one information storage device 200, the number is not limited to one; in practice, an information storage device 200 may be provided for each service, and a single service may use a plurality of information storage devices 200.

The multimodal identification device 300 is a computer device that performs machine learning using multimodal information. Examples of the multimodal identification device 300 include a PC and a workstation.
More specifically, the multimodal identification device 300 acquires post information stored in the information storage device 200 and performs training based on the acquired post information. As described in detail later, if the acquired post information is not multimodal information because some information is missing, the multimodal identification device 300 interpolates the missing information before performing this training.
After generating a training model through training, the multimodal identification device 300 uses the generated training model to identify post information acquired from the information storage device 200. As described in detail later, if the acquired post information is not multimodal information because some information is missing, the multimodal identification device 300 interpolates the missing information before performing this identification.

The network 400 is communication means used for information communication among the client terminals 100, the information storage device 200, and the multimodal identification device 300, and is, for example, the Internet.

In the example shown in FIG. 2, the multimodal identification device 300 acquires post information directly from the information storage device 200, but the configuration is not limited to this. For example, a separate storage device that holds the post information of the information storage device 200 may be provided, and the multimodal identification device 300 may acquire the post information from that separate storage device.

<Hardware configuration of the multimodal identification device 300>
Next, the hardware configuration of the multimodal identification device 300 according to the present embodiment will be described. FIG. 3 is a diagram showing a hardware configuration example of the multimodal identification device 300 according to the present embodiment. As illustrated, the multimodal identification device 300 includes a CPU (Central Processing Unit) 301 serving as computing means, and a main memory 302 and a magnetic disk device 303 serving as storage means.

Here, the CPU 301 executes various programs such as an OS (Operating System) and applications, and realizes the various functions of the multimodal identification device 300. The main memory 302 is a storage area that stores the various programs and the data used in their execution. The magnetic disk device 303 is a storage area that stores input data for the various programs, output data from the various programs, and the like.
The multimodal identification device 300 further includes a communication interface (communication I/F) 304 for communicating with external devices, a display mechanism 305 including a video memory, a display, and the like, and an input device 306 such as a keyboard and a mouse.

[Embodiment 1]
<Functional configuration of the multimodal identification device>
Embodiment 1 will now be described.
FIG. 4 is a block diagram showing a functional configuration example of the multimodal identification device 300 according to Embodiment 1. The multimodal identification device 300 includes an information loss detection unit 311 that detects missing information in post information, a related information selection unit 312 that selects related information, which is information used to interpolate the missing information, an information interpolation unit 313 that interpolates the missing information with the related information, a multimodal identification unit 314 that performs training and identification based on post information, and a training model storage unit 315 that stores the training model generated by training.

The information loss detection unit 311 acquires post information to be used for training or identification from the information storage device 200. The information loss detection unit 311 then determines whether any information is missing from the acquired post information. More specifically, the information loss detection unit 311 determines whether the acquired post information satisfies a predetermined condition concerning missing information (hereinafter referred to as the missing detection condition). The information loss detection unit 311 determines that no information is missing when the missing detection condition is satisfied, and that information is missing when the missing detection condition is not satisfied. The missing detection condition specifies the formats of the information that make up the multimodal information.

As an example, consider a case in which the missing detection condition specifies that multimodal information includes text information and image information. In this case, the information loss detection unit 311 determines whether the post information acquired from the information storage device 200 contains text information and image information. If the acquired post information contains both text information and image information, the missing detection condition is satisfied, and the information loss detection unit 311 determines that no information is missing. If the acquired post information lacks at least one of text information and image information, the missing detection condition is not satisfied, and the information loss detection unit 311 determines that information is missing.
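
The check described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that a post is a dictionary keyed by format name and that the missing detection condition is the set of required formats; the names `REQUIRED_FORMATS`, `missing_formats`, and `is_missing` are hypothetical.

```python
# Missing detection condition for this example: a post must contain
# both text information and image information.
REQUIRED_FORMATS = frozenset({"text", "image"})


def missing_formats(post, required=REQUIRED_FORMATS):
    """Return the required formats that the post does not contain.

    An absent key, None, or an empty value all count as missing.
    """
    present = {fmt for fmt in required if post.get(fmt)}
    return set(required) - present


def is_missing(post, required=REQUIRED_FORMATS):
    """True when the missing detection condition is NOT satisfied."""
    return bool(missing_formats(post, required))


text_only = {"text": "cute"}
complete = {"text": "cute", "image": "dog.jpg"}
```

Here `is_missing(text_only)` reports that information is missing (the image), while `complete` satisfies the condition and would be treated as multimodal information as is.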

When the information loss detection unit 311 determines that information is missing from the post information acquired from the information storage device 200, it outputs that post information to the related information selection unit 312 and the information interpolation unit 313. When the information loss detection unit 311 determines that no information is missing, the post information is already multimodal information, so it is output to the multimodal identification unit 314 as is.

When it is determined that information is missing from the post information, the related information selection unit 312 selects related information, which is information used to interpolate the missing information. This related information can be regarded as information related to the post information in which the missing information was detected.
In the present embodiment, the related information selection unit 312 includes a candidate information extraction unit 312a and a candidate information selection unit 312b.

The candidate information extraction unit 312a widens the extraction range of post information to include past posts and extracts a group of information items that are candidates for the related information. More specifically, the candidate information extraction unit 312a acquires, from the information storage device 200, the past post information of the user who posted the post information determined to have missing information. The information group acquired here (that is, the past post information) is treated as the candidates for the related information.

The candidate information selection unit 312b selects the related information from the information group extracted by the candidate information extraction unit 312a. In other words, the candidate information selection unit 312b selects the related information from among the post information posted earlier than the post information determined to have missing information. The procedure for selecting the related information is described in detail later.

When the information loss detection unit 311 determines that information is missing, the information interpolation unit 313 interpolates the missing information with the related information selected by the related information selection unit 312. More specifically, the information interpolation unit 313 associates the related information selected by the related information selection unit 312 with the post information that the information loss detection unit 311 determined to have missing information. Associating the related information with the post information interpolates the missing information, so that the post information can be handled as multimodal information.
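
The extraction, selection, and interpolation steps described above might be sketched as follows in Python. The `Post` structure, the function names, and in particular the selection rule (the most recent earlier post by the same user that has the missing format and falls within a time window) are assumptions for illustration; the window mirrors the time-based criterion stated for claim 2, and copying the field is a simplification of "associating" the related information with the post.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Post:
    user: str
    posted_at: datetime
    text: Optional[str] = None
    image: Optional[str] = None  # e.g. a file name (hypothetical)


def select_related(target: Post, history: List[Post], missing: str,
                   window: timedelta) -> Optional[Post]:
    """Pick the latest earlier post by the same user that has the missing format."""
    candidates = [
        p for p in history
        if p.user == target.user                       # same user's past posts
        and p.posted_at < target.posted_at             # posted earlier
        and target.posted_at - p.posted_at <= window   # within the time window
        and getattr(p, missing) is not None            # has the missing format
    ]
    return max(candidates, key=lambda p: p.posted_at, default=None)


def interpolate(target: Post, history: List[Post],
                window: timedelta = timedelta(hours=1)) -> Post:
    """Fill each missing format from the selected related post, if any."""
    for missing in ("text", "image"):
        if getattr(target, missing) is None:
            related = select_related(target, history, missing, window)
            if related is not None:
                target = replace(target, **{missing: getattr(related, missing)})
    return target


t0 = datetime(2016, 8, 2, 12, 0)
history = [
    Post("A", t0 - timedelta(minutes=30), text="walk", image="dog.jpg"),
    Post("A", t0 - timedelta(hours=5), image="old.jpg"),
    Post("B", t0 - timedelta(minutes=5), image="other.jpg"),
]
filled = interpolate(Post("A", t0, text="cute"), history)
```

In this example the text-only post by user A is completed with the image from A's post 30 minutes earlier; the five-hour-old post and user B's post are not candidates. If no candidate exists, the post is returned unchanged.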

The multimodal identification unit 314 acquires, from the information loss detection unit 311, post information that is multimodal information with nothing missing. The multimodal identification unit 314 also acquires, from the information interpolation unit 313, post information that has been made into multimodal information by interpolating the missing information. The multimodal identification unit 314 then performs training based on the acquired post information.

In this training, the multimodal identification unit 314 updates or generates a training model using the post information that is multimodal information as the training set. In the case of supervised learning, for example, the user assigns a correct label (that is, the desired output result) to the interpolated multimodal information and training is performed. In the case of unsupervised learning, the multimodal identification device 300 automatically assigns a correct label to the interpolated multimodal information and training is performed. After training has been executed in this way and the weights have been adjusted, the identification process begins. That is, the multimodal identification unit 314 performs identification based on the post information acquired from the information loss detection unit 311 and the information interpolation unit 313.

The training model storage unit 315 stores the training model that is generated and updated through training by the multimodal identification unit 314.

Each functional unit of the multimodal identification device 300 shown in FIG. 4 is realized by software and hardware resources working together. Specifically, when the multimodal identification device 300 is realized with the hardware configuration shown in FIG. 3, the OS program and application programs stored in the magnetic disk device 303 are read into the main memory 302 and executed by the CPU 301, which realizes the functions of the information loss detection unit 311, the related information selection unit 312, the information interpolation unit 313, and the multimodal identification unit 314. The training model storage unit 315 is realized by storage means such as the main memory 302 and the magnetic disk device 303.
In the present embodiment, the related information selection unit 312 and the information interpolation unit 313 are used as an example of interpolation means. The multimodal identification unit 314 is used as an example of processing execution means.

<Related information selection process>
Next, the process of selecting related information, that is, the information used to interpolate missing information, is described in detail. FIGS. 5A and 5B are diagrams for explaining an example of the related information selection process according to the first embodiment. The examples in FIGS. 5A and 5B show post information posted by a certain user (user A in the illustrated example), and indicate for each piece of post information whether text information and image information are present.

For example, “t = 0” indicates the current time. “t = −1” indicates the time at which the post immediately preceding the “t = 0” post was posted. Similarly, “t = −2” indicates the time at which the post immediately preceding the “t = −1” post was posted, and “t = −3” indicates the time at which the post immediately preceding the “t = −2” post was posted.

In the example shown in FIG. 5A, the post information posted at time “t = 0” contains text information but no image information. When the information loss detection unit 311 detects that image information is missing from the “t = 0” post information, the candidate information extraction unit 312a of the related information selection unit 312 acquires, from the information storage device 200, post information posted within a predetermined time before “t = 0”. The candidate information selection unit 312b then selects related information from the acquired past post information.

In the example shown in FIG. 5A, the candidate information extraction unit 312a acquires the post information of “t = −1”, “t = −2”, and “t = −3” as the post information within the predetermined time before “t = 0”. The candidate information selection unit 312b then searches the acquired past post information for information in the format that is missing at “t = 0”, that is, image information. Here, the post information posted at “t = −1” contains image information. The candidate information selection unit 312b therefore selects the image information posted at “t = −1” as the related information.
In this way, the “t = 0” post information is interpolated with the image information of “t = −1”. As a result, the text information of “t = 0” and the image information of “t = −1” are used together as multimodal information for training and identification in the multimodal identification unit 314.

To explain further, the post information posted at “t = −3” also contains image information, but “t = −1” is closer to “t = 0” than “t = −3” is. In general, posts tend to be more similar in content the closer together they are posted, and less similar the further apart they are posted. The candidate information selection unit 312b therefore selects the image information posted at “t = −1” as the image information posted at the closest past time to “t = 0”. If the past post information acquired by the candidate information extraction unit 312a contains no image information, no interpolation is performed, and the post information is output to the multimodal identification unit 314 with the information still missing.

In the example shown in FIG. 5B, the post information posted at time “t = 0” contains image information but no text information. When the information loss detection unit 311 detects that text information is missing from the “t = 0” post information, the candidate information extraction unit 312a acquires, from the information storage device 200, post information posted within a predetermined time before “t = 0”, as in the example of FIG. 5A. The candidate information selection unit 312b then selects related information from the acquired past post information.

In the example shown in FIG. 5B, the candidate information extraction unit 312a acquires the post information of “t = −1”, “t = −2”, and “t = −3” as the post information within the predetermined time before “t = 0”. The candidate information selection unit 312b then searches the acquired past post information for information in the format that is missing at “t = 0”, that is, text information. Here, the “t = −1” post information contains no text information, whereas the “t = −2” post information does. The candidate information selection unit 312b therefore selects the text information posted at “t = −2” as the related information.
In this way, the “t = 0” post information is interpolated with the text information of “t = −2”. As a result, the image information of “t = 0” and the text information of “t = −2” are used together as multimodal information for training and identification in the multimodal identification unit 314.

To explain further, the post information posted at “t = −3” also contains text information, but “t = −2” is closer to “t = 0” than “t = −3” is. The candidate information selection unit 312b therefore selects the text information posted at “t = −2” as the text information posted at the closest past time to “t = 0”. If the past post information acquired by the candidate information extraction unit 312a contains no text information, no interpolation is performed, and the post information is output to the multimodal identification unit 314 with the information still missing.

As described above, in the present embodiment, when missing information is detected in post information, the related information selection unit 312 selects related information from the post information posted within a predetermined time before that post. In this selection, the candidate information selection unit 312b selects information that has the same format as the missing information and that was posted at the closest past time to the post information determined to be missing information. Interpolating the missing information in this way makes it more likely that the interpolation uses content or a topic that is the same as (or similar to) that of the post information.
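The nearest-past selection rule summarized above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Post` record, the `window` parameter (standing in for the "predetermined time"), and the function names `select_related` and `interpolate` are all assumptions introduced here, and image data is represented by a plain string.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass(frozen=True)
class Post:
    """Hypothetical post record; `time` grows toward the present."""
    time: float
    text: Optional[str] = None
    image: Optional[str] = None  # stand-in for real image data


FORMATS = ("text", "image")


def select_related(target: Post, history: Sequence[Post],
                   window: float, fmt: str) -> Optional[Post]:
    """Return the most recent earlier post within `window` that has `fmt`."""
    candidates = [
        p for p in history
        if target.time - window <= p.time < target.time
        and getattr(p, fmt) is not None
    ]
    return max(candidates, key=lambda p: p.time, default=None)


def interpolate(target: Post, history: Sequence[Post], window: float) -> Post:
    """Fill each missing format from the nearest past post, if one exists."""
    filled = {}
    for fmt in FORMATS:
        if getattr(target, fmt) is None:
            related = select_related(target, history, window, fmt)
            if related is not None:
                filled[fmt] = getattr(related, fmt)
    # With no usable candidate the post is passed on with the format missing.
    return replace(target, **filled)
```

With a history shaped like FIG. 5A (an image at t = −1 and another at t = −3), `interpolate` picks the image from t = −1; with no image anywhere in the window, the post is returned unchanged.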

<Procedure of processing by the multimodal identification device>
Next, the procedure of processing by the multimodal identification device 300 according to the present embodiment is described. The processing during training and the processing during identification are described separately. FIG. 6A is a flowchart showing the procedure of processing during training by the multimodal identification device 300 according to the first embodiment. FIG. 6B is a flowchart showing the procedure of processing during identification by the multimodal identification device 300 according to the first embodiment.

First, the processing during training by the multimodal identification device 300 is described with reference to FIG. 6A.
The information loss detection unit 311 acquires the post information to be used for training from the information storage device 200 (step 101). Acquisition conditions may be set here so that only post information satisfying them is acquired. Possible acquisition conditions include, for example, the period in which the post information was posted, the region from which it was posted, or the user who posted it.

Next, the information loss detection unit 311 determines whether any information is missing from the acquired post information (step 102). Here, it is determined whether the acquired post information satisfies the loss detection condition. If the acquired post information satisfies the loss detection condition, it is determined that no information is missing. If it does not satisfy the loss detection condition, it is determined that information is missing.

If it is determined in step 102 that no information is missing from the post information (No in step 102), the post information is output to the multimodal identification unit 314, and the process proceeds to step 106.
If, on the other hand, it is determined in step 102 that information is missing from the post information (Yes in step 102), the post information is output to the related information selection unit 312. Next, the candidate information extraction unit 312a of the related information selection unit 312 expands the extraction range of post information to past posts and acquires past post information as candidates for related information (step 103). The candidate information selection unit 312b then selects related information from the past post information acquired by the candidate information extraction unit 312a (step 104).

Next, the information interpolation unit 313 associates the related information selected by the related information selection unit 312 with the post information determined to be missing information, thereby interpolating the missing information (step 105). The post information with the missing information interpolated is output to the multimodal identification unit 314, and the process proceeds to step 106.

After a negative determination (No) in step 102, or after step 105, the multimodal identification unit 314 performs training based on the post information acquired from the information loss detection unit 311 or the information interpolation unit 313 (step 106). Next, the information loss detection unit 311 determines whether all of the post information acquired in step 101 has been processed (step 107). If some post information has not yet been processed (No in step 107), the process returns to step 102. If all of the post information has been processed (Yes in step 107), this processing flow ends. Training is completed by running this processing flow once or repeating it multiple times, after which identification processing starts.
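The control flow of steps 102 to 107 can be sketched as a simple loop. This is only an illustrative outline under assumptions: posts are represented as dictionaries, the loss detection condition is taken to be "both text and image are present", and `find_related` and `process` are hypothetical callbacks standing in for the related information selection unit 312 and the multimodal identification unit 314 respectively.

```python
from typing import Callable, Dict, Iterable, Optional

Post = Dict[str, Optional[str]]  # hypothetical record: {"text": ..., "image": ...}
REQUIRED_FORMATS = ("text", "image")  # assumed loss detection condition


def has_missing(post: Post) -> bool:
    """Step 102: True when any required format is absent."""
    return any(post.get(fmt) is None for fmt in REQUIRED_FORMATS)


def run_pass(posts: Iterable[Post],
             find_related: Callable[[Post], Optional[Post]],
             process: Callable[[Post], None]) -> None:
    """One pass over the acquired posts (the loop is closed by step 107)."""
    for post in posts:
        if has_missing(post):
            # Steps 103-104: look for related information for this post.
            related = find_related(post)
            if related:
                # Step 105: associate the related information with the post.
                post = {**post, **related}
        # Step 106: train on the post, interpolated or not.
        process(post)
```

Because the identification flow of FIG. 6B differs only in what happens at the final step, the same loop could be reused by passing an identification callback as `process`.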

Next, the processing during identification by the multimodal identification device 300 is described with reference to FIG. 6B.
The information loss detection unit 311 acquires the post information to be identified from the information storage device 200 (step 201). As in step 101 of FIG. 6A, acquisition conditions for the post information may be set here. The processing of the subsequent steps 202 to 205 is the same as that of steps 102 to 105 in FIG. 6A, so its description is omitted here.

After a negative determination (No) in step 202, or after step 205, the multimodal identification unit 314 performs identification based on the post information acquired from the information loss detection unit 311 or the information interpolation unit 313 (step 206). Next, the information loss detection unit 311 determines whether all of the post information acquired in step 201 has been processed (step 207). If some post information has not yet been processed (No in step 207), the process returns to step 202. If all of the post information has been processed (Yes in step 207), this processing flow ends. As described above, the information obtained by identification is used as marketing information, for example for advertisement distribution.

In the present embodiment, missing information is interpolated with the information posted at the closest past time to the post information determined to be missing information. However, the interpolation may instead be based on the similarity calculated in the modification described later.
For example, in the example shown in FIG. 5A, the “t = 0” post information was interpolated with the image information of “t = −1”. Suppose that the related information selection unit 312 calculates the similarity between the text information of “t = 0” and the text information of “t = −1”, and the similarity between the text information of “t = 0” and the text information of “t = −3”. If the text information of “t = −3” is judged to be more similar to the text information of “t = 0” than the text information of “t = −1” is, the image information of “t = −3” may be selected for interpolation instead of the image information of “t = −1”.

<Modification of Embodiment 1>
Next, a modification of the first embodiment is described. FIG. 7 is a block diagram showing an example of the functional configuration of the multimodal identification device 300 in the modification of the first embodiment. In this modification, compared with the configuration of the first embodiment shown in FIG. 4, the related information selection unit 312 additionally has a candidate range specifying unit 312c. The functions of the information loss detection unit 311, the information interpolation unit 313, the multimodal identification unit 314, and the training model storage unit 315 are the same as in FIG. 4. The following description therefore covers the related information selection unit 312, which is where the configuration differs from the first embodiment shown in FIG. 4.

The candidate range specifying unit 312c newly provided in this modification specifies the range of the information group that the candidate information extraction unit 312a extracts. More specifically, the candidate range specifying unit 312c acquires, from the information storage device 200, the past post information of the user who posted the post information determined to be missing information. The candidate range specifying unit 312c then calculates the similarity between the post information determined to be missing information and the past post information, and specifies the range of related information candidates based on the calculated similarity.

For example, suppose that the post information posted at time “t = 0” contains text information but no image information, and that the information loss detection unit 311 has detected the missing image information. In this case, the candidate range specifying unit 312c acquires, from the information storage device 200, the past post information that the same user posted before “t = 0”. Suppose, for example, that these pieces of post information were posted at times “t = −1”, “t = −2”, “t = −3”, and “t = −4”, respectively.

Here, the candidate range specifying unit 312c calculates the similarity between the text information of “t = 0” and each of the text information of “t = −1”, “t = −2”, “t = −3”, and “t = −4”. A conventional procedure may be used for this calculation; for example, the vector space method is used. The vector space method represents documents as vectors in a multidimensional space and calculates similarity by comparing those vectors. To explain further, when the vector space method is used, each of the text information of “t = 0”, “t = −1”, “t = −2”, “t = −3”, and “t = −4” is converted into a document vector. The similarity between document vectors (that is, the similarity of the text information) is then calculated from these document vectors, for example by obtaining the cosine similarity, which expresses how close the angle between two vectors is. In the present embodiment, image information and text information are used as examples of first-format information and second-format information. The term document vector covers any vector that represents a document; for example, a vectorized form of the words that represent a document is also a document vector.
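As a concrete illustration of the vector space method mentioned above, the following sketch builds bag-of-words document vectors and computes their cosine similarity. The whitespace tokenizer and raw term counts are assumptions chosen for brevity; the source does not specify how document vectors are constructed, and Japanese text in particular would need a proper tokenizer.

```python
import math
from collections import Counter


def document_vector(text: str) -> Counter:
    """Bag-of-words term counts (assumed vectorization; whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Note that cosine similarity grows as two texts become more alike, so a measure in which smaller values mean "more similar" would need a conversion such as `1 - cosine_similarity(a, b)`; that conversion is an assumption here, not something the source states.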

FIGS. 8A and 8B are diagrams showing examples of the similarity of text information.
The graph in FIG. 8A shows, for each of the text information of “t = −1”, “t = −2”, “t = −3”, and “t = −4”, the value of its similarity to the text information of “t = 0”. Here, a smaller similarity value indicates greater similarity to the text information of “t = 0”. As described above, posts are usually more similar the closer together they are posted (that is, the similarity value is smaller) and less similar the further apart they are posted (that is, the similarity value is larger). The candidate range specifying unit 312c therefore specifies, as the range of related information candidates, the range in which the similarity value satisfies a predetermined condition; here, the range in which the similarity value is below a predetermined threshold TH.

In the case shown in FIG. 8A, the similarity values of “t = −1”, “t = −2”, and “t = −3” are smaller than the threshold TH. The candidate information extraction unit 312a therefore extracts the post information posted at “t = −1”, “t = −2”, and “t = −3” as related information candidates. The candidate information selection unit 312b then selects related information from the post information extracted by the candidate information extraction unit 312a.
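The thresholding of FIG. 8A can be sketched as a simple filter. The function name and the numeric values in the usage example are hypothetical, since the source gives no concrete values for the similarity or for TH; the only property taken from the text is that smaller values mean greater similarity and that values below TH are kept.

```python
from typing import Dict, List


def candidates_below_threshold(values: Dict[int, float], th: float) -> List[int]:
    """Return the time indices whose value is below `th`, most recent first.

    `values` maps a time index (e.g. -1, -2, ...) to the similarity value
    against the t = 0 text, where smaller means more similar.
    """
    return sorted((t for t, v in values.items() if v < th), reverse=True)


# Hypothetical values shaped like FIG. 8A: only t = -4 exceeds the threshold.
example = {-1: 0.20, -2: 0.35, -3: 0.50, -4: 0.80}
selected = candidates_below_threshold(example, th=0.60)
```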

The example in FIG. 8B shows, for each of the text information of “t = −1”, “t = −2”, “t = −3”, and “t = −4”, the difference (absolute value) between its similarity (for example, cosine similarity) and that of the text information posted immediately before it. For example, the text information of “t = −2” is compared with the text information of “t = −1”, which was posted immediately before it. That is, the difference is calculated between the similarity of the “t = −1” text information to the “t = 0” text information and the similarity of the “t = −2” text information to the “t = 0” text information. When the topic of the posts changes, the similarity can usually be expected to change sharply. The candidate range specifying unit 312c therefore specifies, as the range of related information candidates, the range in which the similarity value satisfies a predetermined condition; here, the range in which the difference in similarity is below a predetermined threshold TH2.

In the case shown in FIG. 8B, the differences in similarity for “t = −1” and “t = −2” are smaller than the threshold TH2. The candidate information extraction unit 312a therefore extracts the post information posted at “t = −1” and “t = −2” as related information candidates. The candidate information selection unit 312b then selects related information from the post information extracted by the candidate information extraction unit 312a.
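The difference-based range of FIG. 8B can be sketched as a backward walk that stops at the first sharp change. Two points are assumptions made for this illustration rather than statements from the source: the walk stops at the first difference that reaches TH2 (treating it as a topic change, so older posts are excluded even if their own differences are small), and the value used for the t = 0 post itself is supplied explicitly as `baseline`.

```python
from typing import List


def candidates_by_difference(values: List[float], baseline: float,
                             th2: float) -> List[int]:
    """Return time indices -1, -2, ... up to the first sharp change.

    `values[i]` is the similarity value of the post at t = -(i + 1) against
    the t = 0 text; `baseline` is the value assumed for t = 0 itself.
    """
    selected: List[int] = []
    previous = baseline
    for i, value in enumerate(values):
        if abs(value - previous) >= th2:
            break  # assumed topic change: stop extending the range
        selected.append(-(i + 1))
        previous = value
    return selected


# Hypothetical values: a jump between t = -2 and t = -3 ends the range.
selected = candidates_by_difference([0.10, 0.20, 0.60, 0.65],
                                    baseline=0.0, th2=0.30)
```

A plain filter on the differences would also admit t = −4 in this example, because its difference from t = −3 is small; which behavior is intended is not stated in the text.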

As described above, in this modification, the related information selection unit 312 calculates the similarity to the post information determined to be missing information and specifies the range to be extracted as related information candidates. Compared with a case in which the extraction range is not specified based on similarity, this suppresses interpolation with information unrelated to the post information.

Although this modification calculates the similarity of text information, the similarity of information in another format, such as image information, may be calculated instead. The similarity of image information is calculated, for example, by comparing the color values of the pixels in the image information.
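One simple way to compare pixel color values is the mean absolute difference per color channel, sketched below. The source does not say which comparison is used, so this metric, the flat list-of-RGB-tuples image representation, and the requirement that both images have the same number of pixels are all assumptions for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def mean_color_difference(img_a: List[Pixel], img_b: List[Pixel]) -> float:
    """Mean absolute per-channel difference; 0.0 means identical colors."""
    if not img_a or len(img_a) != len(img_b):
        raise ValueError("images must be non-empty and the same size")
    total = sum(
        abs(ca - cb)
        for pa, pb in zip(img_a, img_b)
        for ca, cb in zip(pa, pb)
    )
    return total / (3 * len(img_a))
```

Like the values plotted in FIG. 8A, a smaller result here means the two images are more alike.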

[Embodiment 2]
<Functional configuration of the multimodal identification device>
Next, the second embodiment is described.
In the first embodiment, missing information is interpolated based on the past posts of the user who posted the post information determined to be missing information. In the present embodiment, by contrast, missing information is interpolated based on posts from locations geographically close to the position of the user who posted the post information determined to be missing information.
As a supplementary note, information posted from a common place, such as an amusement park or an event venue, is considered likely to describe the same (or similar) content. In the present embodiment, therefore, missing information is interpolated based on post information posted from geographically close locations.

FIG. 9 is a block diagram showing an example of the functional configuration of the multimodal identification device 300 according to the second embodiment. In the present embodiment, the configuration of the related information selection unit 312 differs from that of the first embodiment shown in FIG. 4. The functions of the information loss detection unit 311, the information interpolation unit 313, the multimodal identification unit 314, and the training model storage unit 315 are the same as in FIG. 4. The following description therefore covers the related information selection unit 312, which is where the configuration differs from the first embodiment shown in FIG. 4.

In the present embodiment, the related information selection unit 312 has a candidate position information extraction unit 312d and a candidate information selection unit 312b.
The candidate position information extraction unit 312d expands the extraction range of post information to geographically close locations and extracts an information group as candidates for related information. More specifically, the candidate position information extraction unit 312d acquires the position information of the user who posted the post information determined to be missing information (hereinafter, the target user). It then acquires, from the information storage device 200, the post information of other users (hereinafter, proximity users) posted from locations close to the target user's position, in other words, from within a predetermined range of the target user's position. The information group acquired here (that is, the post information of the proximity users) is treated as candidates for related information.

An example of the user's position information is the geotag information attached when the user posts. A geotag is numerical data indicating a position on a map (latitude and longitude). Based on the geotag information contained in post information, the candidate position information extraction unit 312d according to the present embodiment can identify the target user's position and can identify post information posted from locations close to the target user's position.

The candidate information selection unit 312b selects related information from the group of information extracted by the candidate information extraction unit 312a. More specifically, the candidate information selection unit 312b selects related information from post information posted from places near the position from which the post information determined to have missing information was posted.

<Related information selection process>
Next, the process of selecting related information will be described in detail. FIGS. 10A and 10B are diagrams for explaining an example of the related information selection process according to the second embodiment.

In the example shown in FIG. 10A, it is assumed that the post information posted by the target user has been determined to have missing information. In this case, the candidate position information extraction unit 312d identifies users located within a predetermined range of the position of the target user as proximity users. An example of the predetermined range is the area within a radius of L m centered on the target user. As a result, other user A and other user B are identified as proximity users. Other user C, on the other hand, has posted text information and image information but is located outside the predetermined range and is therefore excluded from the proximity users.
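The range check described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of this application: each post is assumed to carry a geotag as latitude and longitude in degrees, the haversine great-circle formula is used as one possible distance measure, and the names `Post`, `haversine_m`, and `proximity_posts`, the 500 m radius, and the sample coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; a spherical Earth is assumed


@dataclass
class Post:
    user: str
    lat: float                   # geotag latitude in degrees
    lon: float                   # geotag longitude in degrees
    text: Optional[str] = None   # None means the text format is missing
    image: Optional[str] = None  # None means the image format is missing


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two geotags."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def proximity_posts(target: Post, others: List[Post], radius_m: float) -> List[Post]:
    """Posts by other users made within radius_m of the target post."""
    return [
        p for p in others
        if p.user != target.user
        and haversine_m(target.lat, target.lon, p.lat, p.lon) <= radius_m
    ]


# Hypothetical data mirroring FIG. 10A: A and B are nearby, C is far away.
target = Post("target", 35.6812, 139.7671, text="lunch near the station")
others = [
    Post("A", 35.6815, 139.7675, text="nice weather"),
    Post("B", 35.6809, 139.7668, text="ramen", image="b.jpg"),
    Post("C", 35.7101, 139.8107, text="tower", image="c.jpg"),
]
near = proximity_posts(target, others, radius_m=500.0)
print([p.user for p in near])
```

With these sample values, users A and B fall within the radius and user C does not, which matches the situation the figure describes.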

FIG. 10B shows an example of the post information of the target user and other users A to C. The post information of the target user includes text information but does not include image information. When the information loss detection unit 311 detects that image information is missing from the post information of the target user, the candidate position information extraction unit 312d acquires, from the information storage device 200, the post information of the proximity users located within the predetermined range of the position of the target user (here, other user A and other user B). The candidate information selection unit 312b then selects related information from the acquired post information of the proximity users.

Here, the post information of other user A does not include image information, whereas the post information of other user B does. The candidate information selection unit 312b therefore selects the image information of other user B as the related information. In this way, the post information of the target user is interpolated with the image information of other user B. As a result, the text information of the target user and the image information of other user B are used as multimodal information for training and identification in the multimodal identification unit 314.

In the example shown in FIGS. 10A and 10B, other user B is the only neighboring user who has posted image information, but a plurality of neighboring users may have posted image information. In such a case, the related information is selected from the image information of the plurality of neighboring users according to a predetermined condition.

For example, among the plurality of neighboring users who have posted image information, the image information of the neighboring user located closest to the target user is selected.
Alternatively, which neighboring user's image information to select may be decided based on the similarity calculated in the modification of the first embodiment. In this case, the candidate position information extraction unit 312d calculates, for example, the similarity between the text information of the target user and the text information of each neighboring user. Then, based on the calculated similarity, it selects the image information of the neighboring user whose posted content is most similar to that of the target user.
As a further example, among the image information posted by the plurality of neighboring users, the image information posted at the time closest to the posting time of the target user may be selected.

In addition, in the present embodiment, which of the plurality of neighboring users' post information is used for interpolation may be decided based on conditions such as the distance between the target user and the neighboring user, the similarity between the posted content of the target user and that of the neighboring user, and the time difference between the posting time of the target user and that of the neighboring user.
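The three selection conditions named above (distance, content similarity, and time difference) can be illustrated with the following sketch. It is not the method of this application: the `Candidate` record, the `select_related` function, the criterion names, and the sample data are hypothetical, and the similarity is assumed here to be a cosine similarity over whitespace-tokenized bag-of-words vectors, which is only one possible choice.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Candidate:
    user: str
    text: str
    image: str
    distance_m: float    # distance from the target user's position
    posted_at: datetime  # posting time of the neighboring user


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of whitespace-tokenized bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_related(target_text: str, target_time: datetime,
                   candidates: List[Candidate], criterion: str) -> Candidate:
    """Pick one candidate; smaller key means a better match."""
    keys = {
        "distance": lambda c: c.distance_m,
        "similarity": lambda c: -cosine_similarity(target_text, c.text),
        "time": lambda c: abs((c.posted_at - target_time).total_seconds()),
    }
    return min(candidates, key=keys[criterion])


# Hypothetical neighboring users, all of whom posted image information.
target_text = "ramen lunch near the station"
target_time = datetime(2016, 8, 2, 12, 0)
candidates = [
    Candidate("B", "great ramen lunch", "b.jpg", 120.0, datetime(2016, 8, 2, 12, 30)),
    Candidate("D", "sunset at the park", "d.jpg", 40.0, datetime(2016, 8, 2, 18, 0)),
    Candidate("E", "train delayed again", "e.jpg", 300.0, datetime(2016, 8, 2, 12, 5)),
]
for criterion in ("distance", "similarity", "time"):
    print(criterion, select_related(target_text, target_time, candidates, criterion).user)
```

The sample data is chosen so that each condition picks a different neighboring user, which shows that the choice of condition can change which image information is used for interpolation.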

As described above, in the present embodiment, when missing information is detected in post information, the related information selection unit 312 selects related information from post information posted from places within a predetermined range of the position from which that post information was posted. Interpolating the missing information in this way raises the likelihood that the interpolation uses content or a topic that is the same as (or similar to) that of the post information.

In the present embodiment, proximity users located within a predetermined range of the target user are identified and the post information of the identified proximity users is acquired. Alternatively, for example, post information may be acquired in order starting from the user closest to the target user. In this case, for example, if the post information of the target user does not include image information, the post information of other users is checked for image information in order of proximity to the target user, and the image information is interpolated.
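The nearest-first variation can be sketched as below. This is a minimal illustration that assumes each post is a dictionary with a precomputed, hypothetical `distance_m` field giving its distance from the target user; the function name `interpolate_nearest_first` and the sample posts are likewise assumptions.

```python
from typing import Dict, List


def interpolate_nearest_first(target: Dict, others: List[Dict], missing: str = "image") -> Dict:
    """Fill target[missing] from the closest other post that has that format."""
    if target.get(missing) is not None:
        return target  # nothing is missing, so nothing to interpolate
    for post in sorted(others, key=lambda p: p["distance_m"]):
        if post.get(missing) is not None:
            return {**target, missing: post[missing]}
    return target  # assumption: left unchanged when no post has the format


# Hypothetical posts: A is closest but has no image, so B's image is used.
target = {"user": "target", "text": "lunch near the station", "image": None}
others = [
    {"user": "C", "text": "tower", "image": "c.jpg", "distance_m": 5000.0},
    {"user": "A", "text": "nice weather", "image": None, "distance_m": 50.0},
    {"user": "B", "text": "ramen", "image": "b.jpg", "distance_m": 80.0},
]
result = interpolate_nearest_first(target, others)
print(result)
```

Unlike the fixed-range approach, this sketch applies no upper bound on distance, so a distant post such as C's would be used if no closer post had image information.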

<Procedure of processing by the multimodal identification device>
Next, the procedure of processing by the multimodal identification device 300 according to the present embodiment will be described. The processing during training and the processing during identification by the multimodal identification device 300 are described separately. FIG. 11A is a flowchart showing the procedure of processing during training by the multimodal identification device 300 according to the second embodiment. FIG. 11B is a flowchart showing the procedure of processing during identification by the multimodal identification device 300 according to the second embodiment.

First, the processing during training by the multimodal identification device 300 will be described with reference to FIG. 11A.
The information loss detection unit 311 acquires post information to be used for training from the information storage device 200 (step 301). Here, as in step 101 of FIG. 6, acquisition conditions for the post information may be set. Next, the information loss detection unit 311 determines whether the acquired post information has missing information (step 302). Here, as in step 102 of FIG. 6, it is determined whether the acquired post information satisfies the loss detection condition.

If it is determined in step 302 that the post information has no missing information (No in step 302), the post information is output to the multimodal identification unit 314, and the process proceeds to step 306.
On the other hand, if it is determined in step 302 that the post information has missing information (Yes in step 302), the post information is output to the related information selection unit 312. Next, the candidate position information extraction unit 312d of the related information selection unit 312 widens the extraction range of post information to geographically nearby locations and acquires post information of other users that serves as candidates for related information (step 303). Next, the candidate information selection unit 312b selects related information from the post information of other users acquired by the candidate position information extraction unit 312d (step 304).

Next, the information interpolation unit 313 associates the related information selected by the related information selection unit 312 with the post information determined to have missing information, thereby interpolating the missing information (step 305). The post information in which the missing information has been interpolated is output to the multimodal identification unit 314, and the process proceeds to step 306.

After a negative determination (No) in step 302, or after step 305, the multimodal identification unit 314 performs training based on the post information acquired from the information loss detection unit 311 or the information interpolation unit 313 (step 306). Next, the information loss detection unit 311 determines whether all of the post information acquired from the multimodal identification device 300 in step 301 has been processed (step 307). If some post information has not yet been processed (No in step 307), the process proceeds to step 302. If all of the post information has been processed (Yes in step 307), this processing flow ends. Training is completed by repeating this processing flow once or a plurality of times, after which the identification processing is started.
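The training flow of steps 301 to 307 can be summarized in the following sketch. It is an illustration only: the function `run_training`, its callback parameters, and the toy stand-ins are hypothetical, and the behavior when no related information is found (the post is used as is) is an assumption, since the flow described above does not address that case.

```python
from typing import Callable, Dict, Iterable, List, Optional

Post = Dict[str, Optional[str]]


def run_training(
    posts: Iterable[Post],
    is_missing: Callable[[Post], bool],
    extract_candidates: Callable[[Post], List[Post]],
    select_related: Callable[[Post, List[Post]], Optional[Post]],
    interpolate: Callable[[Post, Post], Post],
    train_step: Callable[[Post], None],
) -> None:
    for post in posts:  # posts acquired in step 301; loop exit corresponds to step 307
        if is_missing(post):  # step 302
            candidates = extract_candidates(post)  # step 303
            related = select_related(post, candidates)  # step 304
            if related is not None:
                post = interpolate(post, related)  # step 305
            # Assumption: with no related information, the post is used as is.
        train_step(post)  # step 306


# Toy stand-ins (hypothetical) to exercise the flow.
store: List[Post] = [
    {"user": "A", "text": "nice weather", "image": None},
    {"user": "B", "text": "ramen", "image": "b.jpg"},
]
seen: List[Post] = []
run_training(
    posts=[{"user": "target", "text": "lunch", "image": None}, store[1]],
    is_missing=lambda p: p["text"] is None or p["image"] is None,
    extract_candidates=lambda p: [q for q in store if q["user"] != p["user"]],
    select_related=lambda p, cs: next((c for c in cs if c["image"] is not None), None),
    interpolate=lambda p, r: {**p, "image": r["image"]},
    train_step=seen.append,
)
print(seen)
```

Passing the individual stages as callbacks keeps the loop itself identical for identification: replacing `train_step` with an identification step would give the corresponding flow of steps 401 to 407.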

Next, the processing during identification by the multimodal identification device 300 will be described with reference to FIG. 11B.
The information loss detection unit 311 acquires post information to be identified from the information storage device 200 (step 401). Here, as in step 301 of FIG. 11A, acquisition conditions for the post information may be set. The processing of the subsequent steps 402 to 405 is the same as that of steps 302 to 305 of FIG. 11A, and its description is therefore omitted here.

After a negative determination (No) in step 402, or after step 405, the multimodal identification unit 314 performs identification based on the post information acquired from the information loss detection unit 311 or the information interpolation unit 313 (step 406). Next, the information loss detection unit 311 determines whether all of the post information acquired from the multimodal identification device 300 in step 401 has been processed (step 407). If some post information has not yet been processed (No in step 407), the process proceeds to step 402. If all of the post information has been processed (Yes in step 407), this processing flow ends. As described above, the information obtained as a result of the identification is used as marketing information, for example for advertisement distribution.

In the first and second embodiments, the multimodal identification unit 314 performs identification after performing training, but the multimodal identification unit 314 may perform training and identification in parallel. In this case, for example, the information loss detection unit 311 acquires post information for training and post information for identification separately and detects missing information in each. The multimodal identification unit 314 then updates the training model using the post information for training and, in parallel, uses the training model to identify the post information for identification.

The first and second embodiments mainly describe the case where information is missing from multimodal information that includes information of two formats, but the same processing is performed when information is missing from multimodal information that includes information of three or more formats. For example, for multimodal information that includes information of three or more formats, if information of at least one of those formats is missing, the missing information is interpolated.
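The extension to three or more formats can be sketched as follows. This is a minimal illustration under assumptions that go beyond the description above: the `audio` format, the function names `missing_formats` and `interpolate_all`, and the policy of filling each missing format independently (so that different formats may come from different candidate posts) are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Post = Dict[str, Optional[str]]


def missing_formats(post: Post, formats: Sequence[str]) -> List[str]:
    """Formats that the post does not contain."""
    return [f for f in formats if post.get(f) is None]


def interpolate_all(post: Post, candidates: List[Post], formats: Sequence[str]) -> Post:
    """Fill every missing format from the first candidate that has it."""
    filled = dict(post)
    for fmt in missing_formats(post, formats):
        for cand in candidates:  # candidates are assumed to be ordered by preference
            if cand.get(fmt) is not None:
                filled[fmt] = cand[fmt]
                break
    return filled


# Hypothetical three-format example: the target has text only.
FORMATS = ("text", "image", "audio")
target = {"text": "lunch near the station", "image": None, "audio": None}
candidates = [
    {"text": "nice weather", "image": "a.jpg", "audio": None},
    {"text": "ramen", "image": "b.jpg", "audio": "b.wav"},
]
filled = interpolate_all(target, candidates, FORMATS)
print(missing_formats(target, FORMATS), filled)
```

Whether all missing formats should instead be taken from a single related post is a design choice that this sketch does not settle.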

Furthermore, in the first and second embodiments, a single multimodal identification device 300 has the functions of the information loss detection unit 311, the related information selection unit 312, the information interpolation unit 313, the multimodal identification unit 314, and the training model storage unit 315, but these functions may be realized by a plurality of devices instead of a single device.

The program that realizes the embodiments of the present invention can be provided via communication means and can also be provided stored on a recording medium such as a CD-ROM.

Although the embodiments of the present invention have been described above, the technical scope of the present invention is not limited to the scope described in the above embodiments. It is clear from the claims that embodiments to which various changes or improvements have been made are also included in the technical scope of the present invention.

300 ... multimodal identification device, 311 ... information loss detection unit, 312 ... related information selection unit, 312a ... candidate information extraction unit, 312b ... candidate information selection unit, 312c ... candidate range designation unit, 312d ... candidate position information extraction unit, 313 ... information interpolation unit, 314 ... multimodal identification unit, 315 ... training model storage unit

Claims

An information processing apparatus comprising:
interpolation means for, when information of some formats is missing from information of a plurality of formats, interpolating the information of the missing format with related information related to the information of the plurality of formats; and
process execution means for executing at least one of training and identification using the information of the plurality of formats in which the information of the missing format has been interpolated.

The information processing apparatus according to claim 1, wherein the interpolation means acquires, as the related information, information that has the missing format and was posted within a predetermined time from the time at which the information that is not missing among the information of the plurality of formats was posted.

The information processing apparatus according to claim 1, wherein the interpolation means acquires, as the related information, information that has the missing format and was posted at a position within a predetermined range from the location at which the information that is not missing among the information of the plurality of formats was posted.

The information processing apparatus according to claim 2, wherein, when information of a first format is missing from the information of the plurality of formats and information of a second format is not missing, the interpolation means acquires, as the related information, information of the first format posted together with other information of the second format whose similarity to the information of the second format that is not missing satisfies a predetermined condition.

The information processing apparatus according to claim 4, wherein, when the information of the second format that is not missing is text-format information, the interpolation means acquires, as the related information, information of the first format posted together with other text-format information whose similarity to the information of the second format that is not missing, determined from the similarity of document vectors calculated from the text-format information, satisfies a predetermined condition.

A program for causing a computer to realize:
a function of, when information of some formats is missing from information of a plurality of formats, interpolating the information of the missing format with related information related to the information of the plurality of formats; and
a function of executing at least one of training and identification using the information of the plurality of formats in which the information of the missing format has been interpolated.