JP2017173900A

JP2017173900A - Information processing device

Info

Publication number: JP2017173900A
Application number: JP2016055931A
Authority: JP
Inventors: 由樹子牧野; Yukiko Makino; 直治山田; Naoharu Yamada; 渉一岡; Wataru Ichioka
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2016-03-18
Filing date: 2016-03-18
Publication date: 2017-09-28
Anticipated expiration: 2036-03-18
Also published as: JP6602245B2

Abstract

PROBLEM TO BE SOLVED: To provide an information processing device that improves the accuracy of image search.

SOLUTION: A tagging device 10 according to the present invention includes: a communication unit 11 for obtaining multiple pieces of image management information indicating a time, a place, or an event related to an image; and a combination unit 14 for deriving a context indicating an action of a user related to the image by combining the multiple pieces of image management information obtained by the communication unit 11.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus for image management.

Tagging systems that associate a date and time or a place with an image are conventionally known (see, for example, Patent Document 1). Using such a tagging system makes images easier to search for.

Patent Document 1: Japanese Unexamined Patent Application Publication (Translation of PCT Application) No. 2015-501982

A tagging system such as the one described above associates a date and time, a place, and so on with an image individually, each as a separate item. With such a tagging system, the accuracy of image search cannot be said to be sufficiently ensured.

The present invention has been made in view of the above circumstances, and its object is to provide an information processing apparatus that improves the accuracy of image search.

An information processing apparatus according to one aspect of the present invention includes: an acquisition unit that acquires a plurality of pieces of image management information, each indicating a time, a place, or an event related to an image; and a combination unit that derives a user context indicating the user's action related to the image by combining the plurality of pieces of image management information acquired by the acquisition unit.

In this information processing apparatus, a plurality of pieces of image management information, that is, information indicating when, where, and what the user was doing, are combined to derive a user context indicating the user's action related to the image. Using such a user context for image search makes it possible to search on the basis of information that is more consistent with the image (that is, in line with the user's action in the image) than when, for example, information such as a date and time or a place is associated with the image on its own. This improves the accuracy of image search. In addition, even for the same recognition target, a different user context is estimated depending on the place or situation in which the image was acquired (for example, captured), which also improves the accuracy of image search.

The information processing apparatus may further include a group creation unit that creates an image group by grouping a plurality of images for which the position information of the point where each image was recorded and the date and time information of when each image was recorded fall within predetermined ranges. The combination unit may derive, for each image group, a group context indicating the user's action on the basis of the image management information of the images included in that image group.

Deriving a group context for each group formed in consideration of position and date and time makes it easier for images related to a common event to appear in the same search result. If a user context were derived only for each individual image, images of a common event could end up in separate search results (output piecemeal) merely because their positions differ slightly, for example when the user has lunch in the middle of the event. Conversely, if a single user context were derived over a long span such as one day, images of different events could be output as the same search result. Grouping in consideration of position and date and time addresses both problems, so that, for example, only images of a common event are returned as a search result. Moreover, even when the user context of a single image suffers from misrecognition or an estimation error, considering the user contexts of a plurality of images allows some of those errors to be disregarded, which further improves the accuracy of image search.

The information processing apparatus may further include a candidate estimation unit that estimates, as context candidates, those pieces of image management information that, among the plurality of pieces acquired by the acquisition unit, satisfy a predetermined condition regarding the accuracy of the image management information. The combination unit may derive the user context and the group context by combining the pieces of image management information that the candidate estimation unit has estimated to be context candidates.

Using image management information of high accuracy, rather than selecting the image management information to be combined at random, further improves the accuracy of image search.

The plurality of pieces of image management information may include a plurality of image recognition results for the image, and the candidate estimation unit may include an image recognition estimation unit that estimates image candidates, which are context candidates based on the image recognition results. From the image recognition results for an image, the image recognition estimation unit may estimate information indicating an object whose score, which indicates a degree of similarity, is equal to or greater than a predetermined threshold as an image candidate for deriving the user context, and may estimate the most frequent image candidate within an image group as the image candidate for deriving the group context of that image group.

Treating information with a high similarity score as an image candidate allows the user context to be derived by combining highly accurate image management information, which further improves the accuracy of image search. For example, this prevents a user context from being derived from an image candidate obtained from an image (photograph) that was blurred when it was captured. In addition, using the most frequent image candidate within an image group as the image candidate for deriving the group context allows the group context to be derived from an image candidate that is representative of the image group, which further improves the accuracy of image search in units of image groups. Note that, because the group-level estimation determines the image candidate in this way rather than simply by score, images whose individual scores are low can also be taken into account.
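The threshold-and-majority rule described above can be sketched as follows. This is only an illustrative reading: the function names, the threshold value of 0.8, and the (label, score) tuple layout are assumptions and do not come from the specification. How candidates from low-scoring images enter the group count is not spelled out at this point, so the group function simply counts whatever per-image candidate lists it is given.

```python
from collections import Counter

SCORE_THRESHOLD = 0.8  # assumed value; the text only says "predetermined threshold"


def image_candidates(results, threshold=SCORE_THRESHOLD):
    """Keep the labels whose similarity score is at or above the threshold.

    `results` is a list of (label, score) pairs, a hypothetical layout for
    the image recognition result.
    """
    return [label for label, score in results if score >= threshold]


def group_image_candidate(per_image_candidates):
    """Return the label that appears in the most images of a group, or None."""
    counts = Counter()
    for candidates in per_image_candidates:
        # Count each label at most once per image; ties go to the first seen.
        counts.update(list(dict.fromkeys(candidates)))
    return counts.most_common(1)[0][0] if counts else None
```

For a group of five images in which "baseball" is a candidate for three of them, `group_image_candidate` returns "baseball" even if one image in the group produced no candidate at all.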

The plurality of pieces of image management information may include a plurality of pieces of POI information for the image, and the candidate estimation unit may include a POI estimation unit that estimates POI candidates, which are context candidates based on the POI information. The POI estimation unit may calculate, for each piece of POI information, a POI score by dividing the number of check-ins at the POI by the distance from the POI to the position where the image was recorded, estimate the POI information with the highest POI score as the POI candidate for deriving the user context, and estimate the most frequent POI candidate within an image group as the POI candidate for deriving the group context of that image group.

Taking the number of POI check-ins into account makes POI information for a place the user is likely to have visited more likely to be estimated as a POI candidate, which suppresses erroneous estimation. In the group-level estimation, using the most frequent POI candidate as the POI candidate for deriving the group context likewise allows POI information for a place the user is likely to have visited to be used.
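As a minimal sketch of the POI score (check-ins divided by distance) and the group-level majority vote, consider the following. The tuple layout, the function names, and the guard against a zero distance are assumptions added for illustration; the specification does not say how a distance of zero is handled.

```python
from collections import Counter


def poi_score(check_ins, distance_m):
    """POI score = number of check-ins / distance to the capture position."""
    # Assumption: clamp very small distances to 1 m to avoid division by zero.
    return check_ins / max(distance_m, 1.0)


def poi_candidate(pois):
    """Pick the POI with the highest score for a single image.

    `pois` is a list of (name, check_ins, distance_m) tuples (hypothetical layout).
    """
    if not pois:
        return None
    return max(pois, key=lambda p: poi_score(p[1], p[2]))[0]


def group_poi_candidate(per_image_candidates):
    """Pick the POI candidate chosen most often within an image group."""
    counts = Counter(c for c in per_image_candidates if c is not None)
    return counts.most_common(1)[0][0] if counts else None
```

With a stadium at 200 m with 5000 check-ins (score 25) and a cafe at 10 m with 50 check-ins (score 5), the stadium is chosen even though the cafe is closer, which is the effect of weighting by check-ins.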

The plurality of pieces of image management information may include event information specified by information indicating a position and information indicating a date and time, and the candidate estimation unit may include an event estimation unit that estimates event candidates, which are context candidates based on the event information. The event estimation unit may estimate, as an event candidate for deriving the user context, event information whose position matches the position information of the point where the image was recorded and whose date and time matches the date and time information of when the image was recorded. It may also estimate, as an event candidate for deriving the group context of an image group, event information whose position matches the position information of the points where the images in the image group were recorded and whose date and time falls between the date and time of the oldest image and the date and time of the newest image in the image group. In this way, event candidates that the user is considered to have attended can be estimated appropriately in consideration of position and date and time, which further improves the accuracy of image search.
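The two matching rules for events can be illustrated roughly as below. The data classes, the use of a POI name as the position, and the choice of calendar-day granularity for "the date and time matches" are all assumptions made for this sketch; the specification does not define how exact a match must be.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Event:
    name: str
    poi: str        # where the event was held (hypothetical: a POI name)
    held_on: date   # assumption: events are dated with day granularity


@dataclass(frozen=True)
class Photo:
    poi: str
    taken_at: datetime


def event_candidates(photo, events):
    """Events at the photo's POI on the day the photo was taken."""
    return [e.name for e in events
            if e.poi == photo.poi and e.held_on == photo.taken_at.date()]


def group_event_candidates(photos, events):
    """Events at one of the group's POIs, dated between the oldest and newest photo."""
    if not photos:
        return []
    pois = {p.poi for p in photos}
    oldest = min(p.taken_at for p in photos).date()
    newest = max(p.taken_at for p in photos).date()
    return [e.name for e in events
            if e.poi in pois and oldest <= e.held_on <= newest]
```

Note that the group rule can pick up an event that no single photo matches on its own, for example when photos were taken the day before and the day after the event at the same POI.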

The plurality of pieces of image management information may include a character recognition result for the image, and the candidate estimation unit may include a character recognition estimation unit that estimates character candidates, which are context candidates based on the character recognition result. The character recognition estimation unit may estimate, from the characters in the character recognition result, predetermined characters as character candidates for deriving the user context, and may estimate the most frequent character candidate within an image group as the character candidate for deriving the group context of that image group.

Restricting character candidates to predetermined characters ensures that only keywords appropriate for indicating the user's action become character candidates. In addition, using the most frequent character candidate within an image group as the character candidate for deriving the group context allows the group context to be derived from a character candidate that is representative of the image group, which further improves the accuracy of image search in units of image groups.
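The keyword filter and the group-level vote for character candidates might look like the following sketch. The keyword list, the case-insensitive comparison, and the function names are hypothetical; the specification only says that the characters are "predetermined".

```python
from collections import Counter

# Hypothetical keyword list; the specification does not enumerate the keywords.
KEYWORDS = frozenset({"wedding", "marathon", "graduation"})


def character_candidates(recognized_words, keywords=KEYWORDS):
    """Keep only recognized words that are in the predetermined keyword list."""
    # Assumption: matching is case-insensitive and candidates are lower-cased.
    return [w.lower() for w in recognized_words if w.lower() in keywords]


def group_character_candidate(per_image_candidates):
    """Return the keyword that occurs most often across a group, or None."""
    counts = Counter(c for cands in per_image_candidates for c in cands)
    return counts.most_common(1)[0][0] if counts else None
```

Words such as a place name or a year that happen to be printed on a sign are dropped by the filter, so only keywords that say something about the user's action reach the combination step.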

The combination unit may associate a tag corresponding to the user context with the image and associate a tag corresponding to the group context with the image group. Compared with the conventional case in which only a single recognition result is associated with an image, tagging can then be performed with less ambiguity and at a lower level of abstraction, which improves the accuracy of image search.

The information processing apparatus may further include a category assignment unit that associates with the image a category that defines one or more user contexts as a higher-level concept, and associates with the image group a category that defines one or more group contexts as a higher-level concept. Associating categories with images and image groups makes it possible, for example, to search for images by a concept at a higher level than the user context and the group context.

The group creation unit may combine, from among the image groups it has created, a plurality of image groups whose images have date and time information within a predetermined range into a common new image group. The combination unit may then use, as the group context of the new image group, the group context of the image group that contains the largest number of images among the image groups included in the new image group.

This allows image groups with similar date and time information to be consolidated further, so that similar image search results can be checked more easily. In addition, using the group context of the image group with the largest number of images as the group context of the new image group improves the accuracy of image search for the new image group.
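One possible reading of this consolidation step is sketched below. The `Group` class, the two-hour gap, and the interpretation of "within a predetermined range" as the gap between the last image of one group and the first image of the next are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Group:
    context: str
    times: list  # capture datetimes of the images in the group


def merge_groups(groups, max_gap=timedelta(hours=2)):
    """Merge time-adjacent groups; the merged group inherits the context of
    the member group that holds the most images."""
    clusters = []
    for g in sorted(groups, key=lambda g: min(g.times)):
        if clusters:
            last_time = max(t for m in clusters[-1] for t in m.times)
            if min(g.times) - last_time <= max_gap:
                clusters[-1].append(g)
                continue
        clusters.append([g])
    return [
        Group(
            context=max(members, key=lambda m: len(m.times)).context,
            times=sorted(t for m in members for t in m.times),
        )
        for members in clusters
    ]
```

For example, a three-image "watching baseball" group followed half an hour later by a one-image "lunch" group would be merged into a single four-image group labelled "watching baseball".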

According to the present invention, an information processing apparatus that improves the accuracy of image search can be provided.

A diagram showing the functional configuration of a tagging system according to an embodiment of the present invention.
A diagram showing the hardware configuration of the tagging device included in the tagging system shown in FIG. 1.
An explanatory diagram of the grouping process performed by the group creation unit.
A table showing an estimation result table.
A table showing a tagging result table.
An explanatory diagram of an example in which tagging is performed for a single image.
An explanatory diagram of an example in which tagging is performed in units of image groups.
An explanatory diagram of an example in which tagging is not performed for a single image.
An explanatory diagram of an example in which tagging is not performed for a single image.
An explanatory diagram of an example in which tagging is not performed for a single image.
An explanatory diagram of an example in which tagging is not performed in units of image groups.
An explanatory diagram of an example of category assignment.
A flowchart showing the processing of the tagging device according to the embodiment of the present invention.
A flowchart showing details of the processing of the character recognition estimation unit.
A flowchart showing details of the processing of the image recognition estimation unit.
A flowchart showing details of the processing of the POI estimation unit.
A flowchart showing details of the processing of the event estimation unit.
An explanatory diagram of album creation by a tagging device according to a modification.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description of the drawings, identical or equivalent elements are denoted by the same reference numerals, and redundant description is omitted.

FIG. 1 is a diagram showing the functional configuration of a tagging system according to the present embodiment. As shown in FIG. 1, the tagging system 1 is a system for managing images and includes a tagging device 10 (information processing apparatus), a data display terminal 30, a POI conversion device 40, a character recognition device 50, an image recognition device 60, and an event estimation device 70.

The tagging device 10 is a server for image management and is configured to communicate with the data display terminal 30, the POI conversion device 40, the character recognition device 50, the image recognition device 60, and the event estimation device 70. More specifically, the tagging device 10 makes image search easier by attaching search tags to the images managed by the data display terminal 30. Details of the tagging device 10 will be described later.

The data display terminal 30 is a terminal that stores images and their tagging results and displays the images. An image is a visually recognizable figure or photograph. In the following description, an image is assumed to be photograph data captured by the data display terminal 30. The data display terminal 30 transmits a captured image and the position information of the point where the image was captured to the tagging device 10. The data display terminal 30 receives from the tagging device 10 the tagging result that the tagging device 10 derived for the image, and stores it. In response to a search operation by its user, the data display terminal 30 also searches for and displays the images the user wants. The tagging result described above is used for this search.

The data display terminal 30 is, for example, a device carried and used by the user, such as a mobile phone (for example, a smartphone), a tablet terminal, or a laptop personal computer. Alternatively, the data display terminal 30 may be a desktop personal computer or the like installed in the user's home or workplace.

The POI conversion device 40 is a server that takes position information expressed as latitude and longitude as input and outputs POI information. The POI information includes the POIs (points of interest) around the position indicated by the input position information, the category of each POI, and the number of check-ins at each POI. A POI is information indicating a specific place in which some user has taken an interest, for example, information including a latitude and longitude and a name (the name of the place). The number of check-ins is the number of users who have visited the POI. The POI conversion device 40 updates the POI information it outputs as, for example, users add and edit POIs. The POI conversion device 40 outputs POI information to the tagging device 10 in response to a POI conversion request from the tagging device 10. The POI conversion device 40 may be a server managed by the telecommunications carrier that manages the tagging device 10, or may be a publicly available, widely used POI conversion server.

The character recognition device 50 is a server that identifies characters contained in an image. The character recognition device 50 stores, for example, character templates and identifies the characters contained in an image by determining whether they match the characters of the templates. The character recognition device 50 receives from the tagging device 10 a character recognition request containing the image to be processed, identifies the characters contained in the image, and outputs the result (character recognition result) to the tagging device 10. On the basis of a keyword formed by the identified characters, the character recognition device 50 also derives the category associated with that keyword. That is, the character recognition result includes a keyword indicating the identified characters and the category associated with the keyword. The character recognition device 50 may be a server managed by the telecommunications carrier that manages the tagging device 10, or may be a publicly available, widely used character recognition server.

The image recognition device 60 is a server that identifies an object or a scene contained in an image. The image recognition device 60 stores, for example, object templates and identifies the objects contained in an image by determining whether they match the objects of the templates. The image recognition device 60 receives from the tagging device 10 an image recognition request containing the image to be processed, identifies the objects contained in the image, and outputs the result (image recognition result) to the tagging device 10. The image recognition result includes a label indicating the identified object and a score indicating the degree of matching (degree of similarity) with the template. The image recognition device 60 may be a server managed by the telecommunications carrier that manages the tagging device 10, or may be a publicly available, widely used image recognition server.

The event estimation device 70 is a server that takes as input the POI of the point where an image was recorded (captured) and the date and time information of when the image was recorded, and outputs event information. The event information includes an event name and an event category. The event estimation device 70 stores the event information in association with the POI and the date and time at which the event was held. The event estimation device 70 receives from the tagging device 10 an event estimation request containing the POI of the point where the image was recorded (captured) and the date and time information of when the image was recorded, identifies the event information related to the image, and outputs the identified event information to the tagging device 10. The event estimation device 70 may be a server managed by the telecommunications carrier that manages the tagging device 10, or may be a publicly available, widely used event estimation server.

Next, details of the tagging device 10 will be described. FIG. 2 is a diagram showing the hardware configuration of the tagging device 10 included in the tagging system 1 shown in FIG. 1. As shown in FIG. 2, the tagging device 10 is physically configured as a computer that includes one or more CPUs 101, a RAM 102 and a ROM 103 serving as main storage devices, an input device 104 such as a keyboard and a mouse, an output device 105 such as a display, a communication module 106 that is a data transmission and reception device such as a network card, an auxiliary storage device 107 such as a semiconductor memory, and the like.

Each function of the tagging device 10 is realized by loading one or more predetermined pieces of computer software onto hardware such as the CPU 101 and the RAM 102 shown in FIG. 2, thereby operating the input device 104, the output device 105, and the communication module 106 under the control of the CPU 101, and by reading and writing data in the RAM 102 and the auxiliary storage device 107.

Returning to FIG. 1, the tagging device 10 includes, as its functional configuration, a communication unit 11 (acquisition unit), a group creation unit 12, a candidate estimation unit 13, a combination unit 14, a category assignment unit 15, an image metadata storage unit 16, a position information storage unit 17, and an estimation information storage unit 18.

The communication unit 11 is a function for communicating with the data display terminal 30, the POI conversion device 40, the character recognition device 50, the image recognition device 60, and the event estimation device 70. The communication unit 11 receives from the data display terminal 30 a captured image and the position information of the point where the image was captured. The communication unit 11 stores the metadata of the received image in the image metadata storage unit 16 and stores the received position information in the position information storage unit 17. The image metadata includes the date and time information of when the image was recorded (that is, the creation date and time of the image), the file format of the image, and the like. The position information storage unit 17 stores position information in association with images. The position information may indicate a latitude and longitude, or may indicate a POI received from the POI conversion device 40.

The communication unit 11 transmits to the POI conversion device 40 a POI conversion request containing the position information of the point where the image was captured, and acquires POI information (image management information) from the POI conversion device 40 in response. The communication unit 11 transmits to the character recognition device 50 a character recognition request containing the image, and acquires a character recognition result (image management information) from the character recognition device 50 in response. The communication unit 11 transmits to the image recognition device 60 an image recognition request containing the image, and acquires an image recognition result (image management information) from the image recognition device 60 in response. The communication unit 11 transmits to the event estimation device 70 an event estimation request containing the POI of the point where the image was recorded (captured) and the date and time information of when the image was recorded, and acquires event information (image management information) from the event estimation device 70 in response. In this way, the communication unit 11 acquires a plurality of pieces of image management information indicating a time, a place, or an event related to the image. The communication unit 11 stores each piece of image management information in the estimation information storage unit 18 as estimation information used for context estimation, and outputs an estimation request to the candidate estimation unit 13.

The group creation unit 12 creates image groups, each grouping a plurality of images whose recording positions and recording dates and times fall within predetermined ranges. The group creation unit 12 first determines an image that serves as the reference for grouping (the reference image). By referring to the position information storage unit 17, it extracts images whose position information differs from that of the reference image by no more than a predetermined range. By referring to the image metadata storage unit 16, it further extracts images whose date and time information differs from that of the reference image by no more than a predetermined range. The group creation unit 12 then groups the images for which both differences are within the predetermined ranges into one image group, and outputs an estimation request including the grouping result to the candidate estimation unit 13. When no position information is associated with an image, the group creation unit 12 may exclude that image from grouping. When the group creation unit 12 has acquired information indicating that a schedule entry covering the date and time at which images were captured exists, it may group the images belonging to the same schedule entry.

FIG. 3 is an explanatory diagram of the grouping process performed by the group creation unit 12. In the example of FIG. 3, images b1, s1, s2, and b4, which are photographs of amusement park A, and images b2 and s3, which are photographs of park B, are the images to be grouped. As shown by the time axis t in FIG. 3, the images were captured in the order b1, s1, s2, b2, s3, b4, and the differences in date and time information among all the images are assumed to be within the predetermined range. Suppose, for example, that a difference in position information counts as within the predetermined range when it is 500 m or less from the reference image. Then image b1, the reference image, and images s1 and s2, recorded 400 m away from image b1, are placed in the same image group (group 1). Image b2, a photograph of park B taken 1000 m away from image b1, is determined not to belong to group 1. Image b2 becomes a new reference image, and image b2 and image s3, recorded 200 m away from image b2, are placed in the same image group (group 2). Image b4, recorded 1000 m away from image b2, is determined not to belong to group 2 and becomes the reference image of a new group 3.
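The grouping walk-through for FIG. 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Photo` class, the `group_photos` function, the planar coordinates in metres, and the 12-hour time range are hypothetical and do not appear in the description. Only the 500 m distance range and the distances between the images come from the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import hypot


@dataclass
class Photo:
    image_id: str
    x_m: float  # hypothetical planar coordinates in metres (assumption)
    y_m: float
    taken_at: datetime


def group_photos(photos, max_dist_m=500.0, max_gap=timedelta(hours=12)):
    """Group photos in capture order around a moving reference image."""
    groups = []
    reference = None
    for photo in sorted(photos, key=lambda p: p.taken_at):
        within_range = (
            reference is not None
            and hypot(photo.x_m - reference.x_m, photo.y_m - reference.y_m) <= max_dist_m
            and photo.taken_at - reference.taken_at <= max_gap
        )
        if within_range:
            groups[-1].append(photo.image_id)
        else:
            # Out of range: this photo becomes the reference image of a new group.
            reference = photo
            groups.append([photo.image_id])
    return groups


start = datetime(2016, 3, 18, 10, 0)
photos = [
    Photo("b1", 0, 0, start),
    Photo("s1", 400, 0, start + timedelta(minutes=10)),
    Photo("s2", 0, 400, start + timedelta(minutes=20)),
    Photo("b2", 1000, 0, start + timedelta(minutes=60)),
    Photo("s3", 1200, 0, start + timedelta(minutes=70)),
    Photo("b4", 0, 0, start + timedelta(minutes=120)),
]
groups = group_photos(photos)
print(groups)  # [['b1', 's1', 's2'], ['b2', 's3'], ['b4']]
```

Because each image is compared only with the current reference image, b4 starts group 3 even though it was taken at the same place as b1, which matches the behaviour described for FIG. 3.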

The candidate estimation unit 13 estimates, as context candidates, those pieces of image management information (estimation information) acquired by the communication unit 11 and stored in the estimation information storage unit 18 that satisfy a predetermined condition regarding the accuracy of the image management information. Image management information that satisfies the predetermined condition regarding accuracy is image management information that can appropriately indicate the user's action. The candidate estimation unit 13 estimates the context candidates from each piece of image management information stored as estimation information in the estimation information storage unit 18 by the communication unit 11. The candidate estimation unit 13 outputs a context derivation request, which includes an estimation result table (described later) with which the estimated context candidates are associated, to the combination unit 14. The candidate estimation unit 13 includes a character recognition estimation unit 13a, an image recognition estimation unit 13b, a POI estimation unit 13c, and an event estimation unit 13d.

The character recognition estimation unit 13a acquires the character recognition result stored in the estimation information storage unit 18 and estimates a character candidate, which is a context candidate based on that character recognition result. Specifically, when a keyword included in the character recognition result is a predetermined keyword (target keyword), the character recognition estimation unit 13a takes the keyword as a character candidate for deriving the user context. Target keywords are keywords considered able to indicate the user's action appropriately. For example, a keyword such as "graduation ceremony" recognized in the image of FIG. 8(a) indicates an event in which the user participates and can appropriately indicate the user's action, so it is a target keyword. By contrast, a keyword indicating a station name, such as "Satsumakawashiri" recognized in the image of FIG. 8(b), does not readily indicate the user's action and is therefore not a target keyword. Similarly, when a category included in the character recognition result is a predetermined category (target category), the character recognition estimation unit 13a takes the category and keyword of that character recognition result as character candidates.

The character recognition estimation unit 13a also estimates character candidates from the character recognition results for each image group created by the group creation unit 12. Specifically, it takes the character string that occurs most often among the character candidates of the individual images in an image group as the character candidate for deriving the group context (described later) of that image group. Alternatively, it may list all character candidates of the individual images in the image group and use all of them as character candidates for deriving the group context (described later) of the image group.

The image recognition estimation unit 13b acquires the image recognition result stored in the estimation information storage unit 18 and estimates image candidates, which are context candidates based on that image recognition result. Specifically, the image recognition estimation unit 13b acquires the labels and scores of the objects recognized in one image, sorts the scores in ascending order, determines the label with the highest score to be the label of the image, and takes the labels with the five highest scores as image candidates for deriving the user context. Alternatively, the image recognition estimation unit 13b may take labels whose scores are higher than a predetermined threshold as image candidates. For example, when the score threshold is 0.9 and the object (label: animal) recognized in the photograph of a koala in FIG. 9(a) has a score of 0.92, the label of that object becomes an image candidate. When the object (label: animal) recognized in the photograph of lion and crocodile figurines in FIG. 9(b) has a score of 0.44, the label of that object does not become an image candidate. An image for which no image candidate exists is determined to be an others image. As a result of this determination, a single image may have one or more image candidates, or none.
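As a rough illustration of the two selection rules above (top five labels, or labels above a threshold), a minimal Python sketch might look like this. The function names and the `(label, score)` input format are assumptions made for the example, and the sketch sorts in descending order so that the highest scores come first; the selected labels are the same.

```python
def image_candidates(detections, top_n=5, threshold=None):
    """Return (image_label, candidate_labels) for one image.

    detections: list of (label, score) pairs from an image recognizer
    (hypothetical input format).
    """
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)
    image_label = ranked[0][0] if ranked else None
    if threshold is None:
        # Rule 1: the labels with the top_n highest scores become candidates.
        candidates = [label for label, _ in ranked[:top_n]]
    else:
        # Rule 2: only labels scoring above the threshold become candidates.
        candidates = [label for label, score in ranked if score > threshold]
    return image_label, candidates


def is_others_image(candidates):
    """An image without any image candidate is treated as an 'others' image."""
    return not candidates


koala = image_candidates([("animal", 0.92)], threshold=0.9)
figurines = image_candidates([("animal", 0.44)], threshold=0.9)
print(koala[1], is_others_image(koala[1]))          # ['animal'] False
print(figurines[1], is_others_image(figurines[1]))  # [] True
```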

The image recognition estimation unit 13b also estimates context candidates from the image recognition results for each image group created by the group creation unit 12. Specifically, it takes the image candidate that occurs most often within an image group as the image candidate for deriving the group context (described later) of that image group. Alternatively, it takes the label with the highest total score within the image group as that image candidate. It may also take the most frequent image candidate and, when several image candidates occur the same number of times, choose the one with the higher total score. For an image group in which the proportion of the others images described above is equal to or greater than a threshold, the image recognition estimation unit 13b may determine that no image candidate for deriving the group context exists. For example, suppose that the score threshold is 0.9 and that an image group consists of the four images of FIGS. 11(a) to 11(d), of which only the image of FIG. 11(a), with a score of 0.92, exceeds the threshold, while the scores of the images of FIGS. 11(b) to 11(d) do not. If the threshold for the proportion of others images is 0.6 (60%), the proportion of others images in this group is 0.75 (75%), so it is determined that no image candidate for deriving the group context exists for the group. Thus, for an image group, there may be one or more image candidates, or none.
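The group-level rule (most frequent image candidate, ties broken by total score, no candidate when too many others images are present) can be sketched as below. This is an illustrative reading, not the claimed implementation: the function name and the per-image list format are assumptions, and only the 0.6 ratio threshold and the four-image example come from the text.

```python
from collections import Counter, defaultdict


def group_image_candidate(per_image, others_ratio_threshold=0.6):
    """Pick one image candidate for an image group, or None.

    per_image: one list per image holding the (label, score) pairs that
    survived the per-image filter; an empty list marks an 'others' image.
    """
    if not per_image:
        return None
    others = sum(1 for candidates in per_image if not candidates)
    if others / len(per_image) >= others_ratio_threshold:
        return None  # too many 'others' images in the group
    counts = Counter()
    totals = defaultdict(float)
    for candidates in per_image:
        for label, score in candidates:
            counts[label] += 1
            totals[label] += score
    if not counts:
        return None
    # Most frequent label wins; equal counts fall back to the total score.
    return max(counts, key=lambda label: (counts[label], totals[label]))


fig11_group = [[("animal", 0.92)], [], [], []]  # 3 of 4 are 'others' images
print(group_image_candidate(fig11_group))  # None
```

With three of the four images being others images, the ratio 0.75 is at or above 0.6, so the sketch returns `None`, in line with the FIG. 11 example.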

The POI estimation unit 13c acquires the POI information stored in the estimation information storage unit 18 and estimates a POI candidate, which is a context candidate based on that POI information. Specifically, for each of the pieces of POI information for one image, the POI estimation unit 13c calculates a POI score by dividing the number of check-ins at the POI by the distance from the POI to the position where the image was recorded, and takes the POI information with the highest POI score as the POI candidate for deriving the user context. More precisely, the POI estimation unit 13c takes the POI name and POI category of the POI information with the highest POI score as the POI candidate. When several pieces of POI information have the same POI score, the POI estimation unit 13c may take the one with the shorter distance as the POI candidate for the image. Alternatively, the POI estimation unit 13c may take the POI category as the POI candidate when the POI category is a predetermined keyword (target POI keyword). For example, when the target POI keywords include "baseball stadium" but not "restaurant", the POI category "baseball stadium" of the image of FIG. 10(a) becomes a POI candidate, whereas the POI category "restaurant" of the image of FIG. 10(b) does not.
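The POI score described above is simply the check-in count divided by the distance. A small Python sketch, assuming a hypothetical dictionary layout for POI information and a hypothetical minimum distance of 1 m to avoid division by zero (neither is specified in the description), could be:

```python
def best_poi(pois, min_distance_m=1.0):
    """Return (name, category) of the POI with the highest POI score.

    POI score = check-ins / distance. The min_distance_m clamp is an
    assumption added here so that a zero distance does not divide by zero.
    """
    if not pois:
        return None

    def sort_key(poi):
        distance = max(poi["distance_m"], min_distance_m)
        # Higher score first; on equal scores the shorter distance wins.
        return (poi["checkins"] / distance, -distance)

    best = max(pois, key=sort_key)
    return best["name"], best["category"]


pois = [
    {"name": "B Park", "category": "park", "checkins": 300, "distance_m": 50.0},        # 6.0
    {"name": "Cafe C", "category": "restaurant", "checkins": 100, "distance_m": 20.0},  # 5.0
]
print(best_poi(pois))  # ('B Park', 'park')
```

The names "B Park" and "Cafe C" and the check-in counts are made-up sample values.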

The POI estimation unit 13c also estimates context candidates from the POI information for each image group created by the group creation unit 12. Specifically, it estimates the POI candidate that occurs most often within an image group as the POI candidate for deriving the group context (described later) of the image group. When several POI candidates are tied for the most occurrences, the POI estimation unit 13c may estimate the one with the higher total POI score as the POI candidate for deriving the group context (described later) of the image group.

The event estimation unit 13d acquires the event information stored in the estimation information storage unit 18 and estimates an event candidate, which is a context candidate based on that event information. Specifically, the event estimation unit 13d estimates, as the event candidate for deriving the user context, event information whose position matches the position information of the point where the image was recorded and whose date and time match the date and time at which the image was recorded. The event estimation unit 13d first acquires the POI of the point where the image was recorded from the POI information stored in the estimation information storage unit 18, and acquires the date and time at which the image was recorded from the image metadata storage unit 16. It then searches the event information stored in the estimation information storage unit 18 for event information whose POI and date and time match, and, if such event information exists, takes its event name and event category as the event candidate for the image.

The event estimation unit 13d also estimates context candidates from the event information for each image group created by the group creation unit 12. Specifically, if there is event information whose POI and date and time match, the event estimation unit 13d takes its event name and event category as the event candidate for deriving the group context (described later) of the image group. When judging whether the date and time match for an image group, the event estimation unit 13d checks whether there is event information whose date and time fall between the recording date and time of the oldest image in the image group and the recording date and time of the newest image in the image group.
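The group-level event matching can be pictured with the following Python sketch. It assumes, purely for illustration, that each piece of event information carries a POI name and a single `held_at` timestamp; the dictionary layout, the function name, and the sample events are all hypothetical.

```python
from datetime import datetime


def group_event_candidates(events, group_poi, image_times):
    """Return (event name, event category) pairs matching an image group.

    An event matches when its POI equals the group's POI and its date and
    time lie between the oldest and newest recording times in the group.
    """
    if not image_times:
        return []
    oldest, newest = min(image_times), max(image_times)
    return [
        (event["name"], event["category"])
        for event in events
        if event["poi"] == group_poi and oldest <= event["held_at"] <= newest
    ]


events = [
    {"name": "Spring Festival", "category": "festival", "poi": "B Park",
     "held_at": datetime(2016, 3, 18, 11, 0)},
    {"name": "Night Market", "category": "market", "poi": "B Park",
     "held_at": datetime(2016, 3, 18, 20, 0)},
    {"name": "Parade", "category": "festival", "poi": "A Amusement Park",
     "held_at": datetime(2016, 3, 18, 11, 0)},
]
times = [datetime(2016, 3, 18, 10, 0), datetime(2016, 3, 18, 12, 30)]
print(group_event_candidates(events, "B Park", times))
# [('Spring Festival', 'festival')]
```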

As described above, the candidate estimation unit 13 outputs to the combination unit 14 an estimation result table with which the context candidates, namely the character candidates, image candidates, POI candidates, and event candidates, are associated. FIG. 4 shows the estimation result table 180. In the estimation result table 180, an image ID, an image candidate, a POI candidate, position information, date and time information, a character candidate, a municipality name, an event candidate, and a file format are associated with one another.

The image ID is an identifier that uniquely identifies the image. The image candidate, POI candidate, character candidate, and event candidate are the context candidates estimated by the image recognition estimation unit 13b, the POI estimation unit 13c, the character recognition estimation unit 13a, and the event estimation unit 13d, respectively. For example, the image with image ID "P0001" has "B Park (park)" as its POI candidate, where "B Park" is the POI name and "(park)" is the POI category. The position information is the same as the position information of the image stored in the position information storage unit 17. The date and time information is the date and time information of the image stored in the image metadata storage unit 16. The municipality name is the name of the municipality derived from the position information. The file format is the file format of the image stored in the image metadata storage unit 16. Because the estimation result table 180 relates to deriving the user context of a single image, it does not include context candidates for image groups.

The combination unit 14 derives a user context, which indicates the user's action related to the image, by combining the plurality of pieces of image management information acquired by the communication unit 11. More specifically, the combination unit 14 derives the user context by combining those pieces of image management information that the candidate estimation unit 13 has taken as context candidates. The combination unit 14 then outputs to the category assignment unit 15 a category assignment request that includes a tagging result table (described later), in which a tag corresponding to the derived user context is associated with the image. A tag is a specific information element about the image and serves as a search key for the image. The derived user context and the tag may be the same keyword, or may be different keywords that correspond to each other.

The combination unit 14 identifies the context candidates associated with an image by referring to the estimation result table (see, for example, FIG. 4). For the image with image ID "P0001" in FIG. 4, for example, the image candidate "park", the POI candidate "B Park (park)", and the character candidate "athletic meet" are identified as context candidates. Combining these context candidates yields the user context "park". The user context may be derived from a combination of context candidates by using, for example, a table that associates combinations of context candidates (a plurality of context candidates) with user contexts. By referring to such a table, the user context can be determined uniquely from the combination of context candidates. The tag "park" is then assigned according to the derived user context "park". The tag is assigned based on, for example, a table that associates user contexts with tags.
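One way to picture the two lookup tables mentioned above is the following Python sketch. The table contents are hypothetical examples built from the candidates named in the text; the description does not specify how the tables are keyed, so using a `frozenset` of candidates as the key is an assumption of this sketch.

```python
# Hypothetical table: combination of context candidates -> user context.
CONTEXT_TABLE = {
    frozenset({"park", "B Park (park)", "athletic meet"}): "park",
    frozenset({"park", "B Park (park)"}): "park",
    frozenset({"meal", "B Park (park)"}): "lunch",
}

# Hypothetical table: user context -> tag. The tag may equal the context.
TAG_TABLE = {"park": "park", "lunch": "lunch"}


def derive_tag(candidates):
    """Return (user_context, tag) for the given context candidates.

    Missing candidates (None or empty strings) are ignored. When the
    combination is not in the table, (None, None) is returned.
    """
    key = frozenset(c for c in candidates if c)
    context = CONTEXT_TABLE.get(key)
    if context is None:
        return None, None
    # Fall back to the context itself when no separate tag is registered.
    return context, TAG_TABLE.get(context, context)


print(derive_tag(["park", "B Park (park)", "athletic meet"]))  # ('park', 'park')
```

An exact-match table keeps the mapping unique, as the text requires, at the cost of needing one entry per combination.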

For each image group created by the group creation unit 12, the combination unit 14 also derives a group context, which indicates the user's action, based on the image management information of the images included in the image group. More specifically, the combination unit 14 derives the group context by combining the pieces of image management information taken as context candidates for the image group (character candidates, image candidates, POI candidates, and event candidates). The combination unit 14 then outputs to the category assignment unit 15 a category assignment request that includes a tagging management table in which a tag corresponding to the derived group context is associated with the image group.

The category assignment unit 15 associates with an image a category that defines one or more user contexts as a broader concept. The category assignment unit 15 associates the category with the image in the tagging management table input from the combination unit 14. It derives the category from the user context based on, for example, a table in which one category is associated with a plurality of user contexts. For an image with which a plurality of user contexts are associated, the category assignment unit 15 may derive a plurality of categories.

FIG. 12 illustrates an example of category assignment. For example, the images of FIGS. 12(a) to 12(c) are all associated with the user contexts "B Park" and "park". The category assignment unit 15 associates the "outing" category shown in FIG. 12(d) with these images. The image of FIG. 12(b) is also associated with the user context "lunch". The category assignment unit 15 associates the "meal" category shown in FIG. 12(e) with this image.
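The category lookup in the FIG. 12 example can be sketched in Python as follows. The table below is a hypothetical instance of "one category associated with a plurality of user contexts"; its contents are taken from the example, and the function name is an assumption.

```python
# Hypothetical table: category -> user contexts it covers.
CATEGORY_TABLE = {
    "outing": {"B Park", "park"},
    "meal": {"lunch"},
}


def categories_for(user_contexts):
    """Return every category that covers at least one of the user contexts."""
    contexts = set(user_contexts)
    return [
        category
        for category, members in CATEGORY_TABLE.items()
        if members & contexts
    ]


print(categories_for(["B Park", "park"]))           # ['outing']
print(categories_for(["B Park", "park", "lunch"]))  # ['outing', 'meal']
```

The second call corresponds to the image of FIG. 12(b), which carries both the park-related contexts and "lunch" and therefore receives two categories.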

The category assignment unit 15 also associates with an image group a category that defines one or more group contexts as a broader concept. The category assignment unit 15 associates the category with the image group in the tagging management table input from the combination unit 14.

FIG. 5 shows the tagging result table 190. As shown in FIG. 5, in the tagging result table 190, an image ID, the tagging result for the single image, the category for the single image, an image candidate, a POI candidate, a character candidate, a municipality name, an event candidate, and a file format are associated with one another. The tagging result table 190 shown in FIG. 5 stores only the context candidates for single images (character candidates, image candidates, POI candidates, and event candidates). When tagging and category assignment are performed for image groups, the tagging result table also stores the context candidates for each image group (described later).

As described above, the tagging device 10 performs tagging for single images and tagging for image groups. An example of each is described below with reference to FIGS. 6 and 7. FIG. 6 is an explanatory diagram of an example of tagging a single image. FIG. 7 is an explanatory diagram of an example of tagging an image group.

FIG. 6(a) shows an example of an image, FIG. 6(b) shows the estimation result table 181 for the image shown in FIG. 6(a) (image ID: P0001), and FIG. 6(c) shows the tagging result table 191 for the image shown in FIG. 6(a) (image ID: P0001).

As shown in the estimation result table 181 of FIG. 6(b), the image with image ID "P0001" is associated with "park" as the image candidate and "B Park (park)" as the POI candidate. The tagging result table 191 of FIG. 6(c) is derived from the estimation result table 181. As shown in the tagging result table 191 of FIG. 6(c), the image with image ID "P0001" is associated with "park" as the tagging result for the single image and "outing" as the category for the single image. The tagging result "park" corresponds to the user context "park", which is derived from the context candidates "park" and "B Park (park)" described above. The category "outing" corresponds to the user context "park" described above.

FIGS. 7(a) to 7(c) show examples of images, FIG. 7(d) shows the estimation result table 182 for the images shown in FIGS. 7(a) to 7(c) (image IDs: P0001 to P0003), and FIG. 7(e) shows the tagging result table 192 for the same images. The images with image IDs P0001 to P0003 belong to the same image group (group ID: G0001). Although FIGS. 7(d) and 7(e) show only the context candidates for the single images, these tables actually also store the context candidates for the image group.

As shown in the estimation result table 182 of FIG. 7(d), the images with image IDs "P0001" and "P0003" are each associated with "park" as the image candidate and "B Park (park)" as the POI candidate. The image with image ID "P0002" is associated with "meal" as the image candidate and "B Park (park)" as the POI candidate.

In the estimation result table 182, "park", the most frequent image candidate in the image group, is taken as the image candidate for deriving the group context of the image group. Likewise, "B Park (park)", the most frequent POI candidate in the image group, is taken as the POI candidate for deriving the group context of the image group. The image candidate and POI candidate for deriving the group context of the image group are not shown in FIG. 7(d), but they are actually stored in the estimation result table 182.

The tagging result table 192 of FIG. 7(e) is derived from the estimation result table 182. As shown in the tagging result table 192 of FIG. 7(e), the images with image IDs "P0001" and "P0003" are each associated with "park" as the tagging result for the single image and "outing" as the category for the single image. The tagging result "park" corresponds to the user context "park", which is derived from the context candidates "park" and "B Park (park)" described above. The category "outing" corresponds to the user context "park" described above. The image with image ID "P0002" is associated with "lunch" as the tagging result for the single image and "meal" as the category for the single image. The tagging result "lunch" corresponds to the user context "lunch", which is derived from the context candidates "meal" and "B Park (park)" described above. The category "meal" corresponds to the user context "lunch" described above.

Furthermore, as shown in the tagging result table 192 of FIG. 7(e), tags and categories are also associated at the level of the image group containing these images: “park” is derived as the tagging result of the image group, and “outing” as its category. The group tagging result “park” corresponds to the group context “park”, which is derived from the group-level context candidates “park” and “B Park (park)” described above. The group category “outing” corresponds to that group context “park”.

For the images with image IDs “P0001” and “P0003”, the single-image tagging result “park” duplicates the group-level tagging result “park”, and the single-image category “outing” duplicates the group-level category “outing”. When the single-image estimation result and the group-level estimation result are duplicated in this way, one of the duplicated results is hidden, as shown in FIG. 7(a) and FIG. 7(c) (the broken-line portions in FIG. 7(a) and FIG. 7(c) are the hidden parts).
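
The duplicate-hiding rule above can be illustrated with a minimal sketch. The function name `displayed_tags` is hypothetical, and the choice to hide the single-image copy rather than the group-level copy is an assumption; the description only states that one of the duplicated results is hidden.

```python
def displayed_tags(image_tag, group_tag):
    """Return (image-level tag to display, group-level tag to display).

    Assumption: when both tags are identical, the image-level copy is
    the one that is hidden (represented here by None).
    """
    if image_tag == group_tag:
        return None, group_tag
    return image_tag, group_tag


# P0001: "park" at both levels, so only one copy is displayed.
print(displayed_tags("park", "park"))
# P0002: "lunch" differs from the group tag, so both are displayed.
print(displayed_tags("lunch", "park"))
```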

Next, the tagging process performed by the tagging device 10 will be described with reference to FIG. 13. FIG. 13 is a flowchart showing the processing of the tagging device 10.

In the tagging process of the tagging device 10, the communication unit 11 first acquires the estimation information used for context estimation (step S1). Specifically, the communication unit 11 acquires, as estimation information, the POI information of the image from the POI conversion device 40, the character recognition result from the character recognition device 50, the image recognition result from the image recognition device 60, and the event information from the event estimation device 70; each of these is image management information. The communication unit 11 stores the estimation information in the estimation information storage unit 18.

Subsequently, the POI estimation unit 13c performs POI estimation (step S2), the image recognition estimation unit 13b performs image recognition estimation (step S3), and the character recognition estimation unit 13a performs character recognition estimation (step S4). Although not shown in FIG. 13, the event estimation unit 13d may also perform event estimation. The candidate estimation unit 13 then outputs the estimation result, which contains the per-image context candidates, to the combination unit 14 (step S5).

The detailed processing of the components of the candidate estimation unit 13, namely the character recognition estimation unit 13a, the image recognition estimation unit 13b, the POI estimation unit 13c, and the event estimation unit 13d, will now be described with reference to FIGS. 14 to 17. FIG. 14 is a flowchart showing details of the processing of the character recognition estimation unit 13a. FIG. 15 is a flowchart showing details of the processing of the image recognition estimation unit 13b. FIG. 16 is a flowchart showing details of the processing of the POI estimation unit 13c. FIG. 17 is a flowchart showing details of the processing of the event estimation unit 13d.

As shown in FIG. 14, the character recognition estimation unit 13a acquires the character recognition result stored in the estimation information storage unit 18 (step S70). The character recognition estimation unit 13a then searches the predetermined categories and determines whether the category associated with a keyword identified in the character recognition result is a predetermined category (step S71). If it is, the keyword and category identified in the character recognition result are taken as a character candidate. The character recognition estimation unit 13a then searches the predetermined keywords and determines whether a keyword identified in the character recognition result is a predetermined keyword (step S72). If it is, that keyword is acquired as a character candidate.
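
Steps S71 and S72 can be sketched roughly as follows. This is only an illustration: the contents of the two allow-lists, the `RecognizedText` record, and the decision to skip the keyword check once the category check has already matched are all assumptions, since the description says only that the categories and keywords are "predetermined".

```python
from dataclasses import dataclass

# Hypothetical allow-lists; the description does not enumerate them.
PREDETERMINED_CATEGORIES = {"restaurant", "station"}
PREDETERMINED_KEYWORDS = {"menu", "ticket"}


@dataclass(frozen=True)
class RecognizedText:
    keyword: str   # keyword identified in the character recognition result
    category: str  # category associated with that keyword


def character_candidates(results):
    """Collect character candidates from character recognition results."""
    candidates = []
    for r in results:
        if r.category in PREDETERMINED_CATEGORIES:
            # S71: keyword and category together become a candidate.
            candidates.append((r.keyword, r.category))
        elif r.keyword in PREDETERMINED_KEYWORDS:
            # S72: the keyword alone becomes a candidate.
            candidates.append((r.keyword, None))
    return candidates


results = [
    RecognizedText("Cafe B", "restaurant"),
    RecognizedText("menu", "other"),
    RecognizedText("hello", "other"),
]
print(character_candidates(results))
```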

As shown in FIG. 15, the image recognition estimation unit 13b acquires the labels and scores of the image recognition result stored in the estimation information storage unit 18 (step S60). The image recognition estimation unit 13b sorts the labels of the image recognition result in ascending order of score (step S61) and determines the label with the highest score to be the label of the image (step S62). Finally, the image recognition estimation unit 13b acquires the five image recognition results with the highest scores as image candidates (step S63).
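
A minimal sketch of steps S61 to S63 is given here, assuming the image recognition result is available as a list of (label, score) pairs. The function name `image_candidates` and the sample labels and scores are hypothetical.

```python
def image_candidates(results, top_n=5):
    """Return (image label, top-N image candidates) from (label, score) pairs."""
    if not results:
        return None, []
    # S61: sort in ascending order of score, as the description states.
    ordered = sorted(results, key=lambda r: r[1])
    # S62: with an ascending sort, the highest-scoring label is the last one.
    image_label = ordered[-1][0]
    # S63: take the top five results, listed best first.
    top = ordered[-top_n:][::-1]
    return image_label, top


results = [("park", 0.9), ("meal", 0.2), ("tree", 0.7),
           ("dog", 0.1), ("sky", 0.5), ("car", 0.05)]
label, top = image_candidates(results)
print(label, [name for name, _ in top])
```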

As shown in FIG. 16, the POI estimation unit 13c acquires the POI information stored in the estimation information storage unit 18 (step S50). The POI estimation unit 13c then calculates, for each POI, a POI score obtained by dividing the number of check-ins at the POI by the distance to the POI (step S51). The POI estimation unit 13c estimates the POI with the highest score to be the POI candidate for the image (step S52). More specifically, it acquires the POI name and POI category of the POI information with the highest POI score as the POI candidate (step S53).
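
The POI score of step S51 and the selection of steps S52 and S53 can be sketched as below. The `Poi` record, the use of metres as the distance unit, and the `MIN_DISTANCE_M` floor that guards against division by zero are assumptions made for this illustration and do not appear in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Poi:
    name: str
    category: str
    check_ins: int     # number of check-ins at the POI
    distance_m: float  # distance from the recording position to the POI

# Assumption: clamp very small distances so the division is always defined.
MIN_DISTANCE_M = 1.0


def poi_score(poi):
    """S51: number of check-ins divided by the distance to the POI."""
    return poi.check_ins / max(poi.distance_m, MIN_DISTANCE_M)


def poi_candidate(pois):
    """S52/S53: name and category of the POI with the highest POI score."""
    if not pois:
        return None
    best = max(pois, key=poi_score)
    return best.name, best.category


pois = [
    Poi("B Park", "park", 300, 20.0),   # score 15.0
    Poi("C Cafe", "cafe", 100, 10.0),   # score 10.0
    Poi("D Shop", "shop", 500, 200.0),  # score 2.5
]
print(poi_candidate(pois))
```

Dividing by distance means that a nearby POI with few check-ins can still outrank a popular POI that is far from the recording position.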

As shown in FIG. 17, the event estimation unit 13d acquires the POI information stored in the estimation information storage unit 18 (step S80). The event estimation unit 13d then acquires, from the image metadata storage unit 16, the date and time information at which the image was recorded, and determines whether the event information stored in the estimation information storage unit 18 includes event information whose POI and date and time information match (step S81). If there is no matching event information in S81, the process ends. If there is matching event information in S81, the event estimation unit 13d identifies the event information associated with the image (step S82) and acquires the event name and event category of that event information as the event candidate for the image (step S83). This concludes the processing of the components of the candidate estimation unit 13.
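
The matching of steps S81 to S83 might look like the following sketch. It assumes that a POI match is a match on the POI name and that a date and time match means the recording time falls within the event period; the description does not define either comparison, and the `Event` record and sample event are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    name: str
    category: str
    poi_name: str    # POI at which the event is held
    start: datetime
    end: datetime


def event_candidate(events, poi_name, recorded_at):
    """Return (event name, event category) for the first matching event."""
    for event in events:
        # S81: both the POI and the date and time must match.
        if event.poi_name == poi_name and event.start <= recorded_at <= event.end:
            # S82/S83: the matching event supplies the event candidate.
            return event.name, event.category
    # No matching event information: the process ends without a candidate.
    return None


events = [Event("Spring Festival", "festival", "B Park",
                datetime(2016, 3, 18, 10), datetime(2016, 3, 18, 17))]
print(event_candidate(events, "B Park", datetime(2016, 3, 18, 12, 30)))
```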

Returning to FIG. 13, when the processing of S5 is complete, the combination unit 14 combines the image management information taken as context candidates by the candidate estimation unit 13 and estimates (derives) the per-image user context (step S6). The combination unit 14 then performs tagging according to the user context (step S7). Specifically, the combination unit 14 outputs to the category assigning unit 15 a category assignment request containing a tagging result table in which a tag corresponding to the derived user context is associated with the image. The category assigning unit 15 then associates with the image a category that defines one or more user contexts as a superordinate concept. The processing up to this point is the per-image tagging process. The tagging process for image groups is described below.

In the image-group tagging process of the tagging device 10, the group creation unit 12 first creates image groups (step S8). Specifically, the group creation unit 12 creates an image group by grouping a plurality of images whose recording position information and recording date and time information fall within a predetermined range.
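
One possible reading of step S8 is sketched below. The thresholds `MAX_GAP` and `MAX_DISTANCE_M`, the `Image` record, the great-circle distance, and the rule of comparing each image with the previous image in time order are all assumptions; the description states only that position and date and time must be "within a predetermined range".

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Image:
    image_id: str
    lat: float
    lon: float
    recorded_at: datetime

# Hypothetical "predetermined ranges".
MAX_GAP = timedelta(hours=1)
MAX_DISTANCE_M = 500.0


def distance_m(a, b):
    """Great-circle (haversine) distance between two images in metres."""
    r = 6371000.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def create_groups(images):
    """Group images that are close in both time and position (S8)."""
    groups = []
    for img in sorted(images, key=lambda i: i.recorded_at):
        if groups:
            last = groups[-1][-1]
            near_in_time = img.recorded_at - last.recorded_at <= MAX_GAP
            if near_in_time and distance_m(last, img) <= MAX_DISTANCE_M:
                groups[-1].append(img)
                continue
        groups.append([img])
    return groups


images = [
    Image("P0002", 35.001, 139.0, datetime(2016, 3, 18, 12, 20)),
    Image("P0001", 35.000, 139.0, datetime(2016, 3, 18, 12, 0)),
    Image("P0003", 35.000, 139.0, datetime(2016, 3, 18, 12, 40)),
    Image("P0004", 35.500, 139.0, datetime(2016, 3, 18, 12, 50)),
]
print([[i.image_id for i in g] for g in create_groups(images)])
```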

Subsequently, the components of the candidate estimation unit 13 estimate context candidates for each image group. The POI estimation unit 13c identifies the most frequent POI candidate within the same image group as the POI candidate for deriving the group context of the image group (step S9). The event estimation unit 13d identifies event information whose POI and date and time information match as the event candidate for deriving the group context of the image group (step S10). The image recognition estimation unit 13b identifies the label with the highest total score within the same image group as the image candidate for deriving the group context of the image group (step S11). Finally, the character recognition estimation unit 13a lists all character candidates contained in the same image group and identifies them as the character candidates for deriving the group context of the image group (step S12).
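
The group-level aggregation of steps S9, S11, and S12 can be sketched as follows, assuming the per-image candidates are already available as plain Python lists. The function names and sample values are hypothetical, and the tie-breaking behaviour (first encountered wins) is an assumption.

```python
from collections import Counter, defaultdict


def group_poi_candidate(per_image_poi):
    """S9: the most frequent POI candidate within the image group."""
    counts = Counter(p for p in per_image_poi if p is not None)
    return counts.most_common(1)[0][0] if counts else None


def group_image_candidate(per_image_results):
    """S11: the label whose scores sum to the highest total in the group."""
    totals = defaultdict(float)
    for results in per_image_results:
        for label, score in results:
            totals[label] += score
    return max(totals, key=totals.get) if totals else None


def group_character_candidates(per_image_chars):
    """S12: every character candidate in the group, listed in order."""
    return [c for chars in per_image_chars for c in chars]


pois = ["B Park (park)", "B Park (park)", "C Cafe (cafe)"]
labels = [[("park", 0.9), ("meal", 0.1)], [("meal", 0.8), ("park", 0.3)], [("park", 0.7)]]
print(group_poi_candidate(pois), group_image_candidate(labels))
```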

The combination unit 14 then combines the image management information taken by the candidate estimation unit 13 as context candidates for deriving the group context, and estimates (derives) the group context (step S13). After that, the combination unit 14 performs tagging according to the group context (step S14), and the category assigning unit 15 associates with the image group a category that defines one or more group contexts as a superordinate concept (step S15).

Next, the operation and effects of the tagging device 10 according to the embodiment will be described.

The tagging device 10 according to the present embodiment includes the communication unit 11, which acquires a plurality of pieces of image management information indicating a time, place, or event related to an image, and the combination unit 14, which derives a user context indicating the user's action related to the image by combining the plurality of pieces of image management information acquired by the communication unit 11.

In the tagging device 10, a plurality of pieces of image management information, that is, information indicating when, where, and what the user was doing, are combined to derive a user context indicating the user's action related to the image. Using such a user context for image search makes it possible to search on information that is more consistent with the image (that is, in line with the user's action in the image) than when, for example, information such as the date and time or the place is individually associated with the image and used for the search. This improves the accuracy of image search.

The tagging device 10 further includes the group creation unit 12, which creates an image group by grouping a plurality of images whose recording position information and recording date and time information fall within a predetermined range. For each image group, the combination unit 14 derives a group context indicating the user's action based on the image management information of the images included in the image group.

Deriving a group context for each group formed in consideration of position and date and time makes it easier for images related to a common event to be returned as the same search result. For example, when the user has lunch in the middle of an event, this keeps images of the same event from being returned as separate search results (output in scattered fashion) merely because the position or the like differs slightly. In addition, even if the user context of a single image contains a recognition or estimation error, taking the user contexts of a plurality of images into account allows some of those errors to be disregarded, which further improves the accuracy of image search.

The tagging device 10 further includes the candidate estimation unit 13, which estimates, as context candidates, those pieces of image management information acquired by the communication unit 11 that satisfy a predetermined condition regarding the accuracy of the image management information. The combination unit 14 derives the user context and the group context by combining the pieces of image management information taken as context candidates by the candidate estimation unit 13.

Using highly accurate image management information, rather than selecting the image management information to be combined at random, further improves the accuracy of image search.

The plurality of pieces of image management information include a plurality of image recognition results for the image, and the candidate estimation unit 13 includes the image recognition estimation unit 13b, which estimates image candidates, that is, context candidates based on the image recognition results for the image. The image recognition estimation unit 13b estimates, as an image candidate for deriving the user context, information indicating an object whose score indicating the degree of similarity in the image recognition result is equal to or greater than a predetermined threshold, and estimates the most frequent image candidate within the same image group as the image candidate for deriving the group context of that image group.

The plurality of pieces of image management information include a plurality of pieces of POI information for the image, and the candidate estimation unit 13 includes the POI estimation unit 13c, which estimates POI candidates, that is, context candidates based on the POI information for the image. For each of the plurality of pieces of POI information, the POI estimation unit 13c calculates a POI score by dividing the number of check-ins at the POI by the distance from the POI to the position at which the image was recorded. It estimates the POI information with the highest POI score as the POI candidate for deriving the user context, and estimates the most frequent POI candidate within the same image group as the POI candidate for deriving the group context of that image group.

The plurality of pieces of image management information include event information specified by information indicating a position and information indicating a date and time, and the candidate estimation unit 13 includes the event estimation unit 13d, which estimates event candidates, that is, context candidates based on the event information. The event estimation unit 13d estimates, as an event candidate for deriving the user context, event information whose position information matches the position at which the image was recorded and whose date and time information matches the date and time at which the image was recorded. It also estimates, as an event candidate for deriving the group context of an image group, event information whose position information matches the position at which an image included in the image group was recorded and whose date and time information falls between the recording date and time of the oldest image and that of the newest image in the image group.

This makes it possible to appropriately estimate, in consideration of position and date and time, the event candidate that the user is considered to have attended. In other words, the accuracy of image search can be further improved.

The plurality of pieces of image management information include a character recognition result for the image, and the candidate estimation unit 13 includes the character recognition estimation unit 13a, which estimates character candidates, that is, context candidates based on the character recognition result for the image. The character recognition estimation unit 13a estimates predetermined characters among the characters in the character recognition result as character candidates for deriving the user context, and estimates the most frequent character candidate within the same image group as the character candidate for deriving the group context of that image group.

The combination unit 14 associates a tag corresponding to the user context with the image, and associates a tag corresponding to the group context with the image group. Compared with the conventional case in which only one recognition result is used as the recognition result associated with an image, this allows tagging with reduced ambiguity and a lower level of abstraction, which improves the accuracy of image search.

The tagging device 10 further includes the category assigning unit 15, which associates with an image a category that defines one or more user contexts as a superordinate concept, and associates with an image group a category that defines one or more group contexts as a superordinate concept. Associating categories with images and image groups makes it possible, for example, to search for images using a concept superordinate to the user context and the group context.
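
The category assignment can be pictured as a lookup from context to superordinate category. The sketch below assumes a simple dictionary; the two entries are taken from the FIG. 7(e) example (“park” to “outing”, “lunch” to “meal”), while the dictionary form and the function name `assign_category` are hypothetical.

```python
# Context-to-category mapping; entries follow the FIG. 7(e) example.
# Holding the mapping in a plain dictionary is an assumption.
CATEGORY_BY_CONTEXT = {
    "park": "outing",
    "lunch": "meal",
}


def assign_category(context):
    """Return the superordinate category for a user or group context."""
    return CATEGORY_BY_CONTEXT.get(context)


print(assign_category("park"))   # outing
print(assign_category("lunch"))  # meal
```

Because several contexts may share one category, a search on the category can return images whose individual tags differ.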

Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment. For example, a plurality of image groups may be further combined into a new image group (album). That is, the group creation unit 12 may combine, among the created image groups, a plurality of image groups whose images have date and time information within a predetermined range into a common new image group, and the combination unit 14 may take the group context of the image group with the largest number of images among the image groups included in the new image group as the group context of the new image group.

FIG. 18 is an explanatory diagram of album creation by a tagging device according to a modification. As shown in the table 200 of FIG. 18(a), a plurality of image groups with different group IDs “G0001” and “G0002” are combined into a common new image group (album ID “A0001”). Suppose the group context of the image group with group ID “G0001” is “B Park” and that of the image group with group ID “G0002” is “B Aquarium”. Because the image group with group ID “G0001” contains one more image, as shown in FIG. 18(b), the album title (the group context of the album) is set to “B Park”.
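
The album title rule of this modification can be sketched as follows. The function name `album_context` and the concrete image counts (4 and 3) are hypothetical; the description states only that group “G0001” contains one more image than group “G0002”.

```python
def album_context(groups):
    """Return the group context of the image group with the most images.

    `groups` is a list of (group context, number of images) pairs.
    """
    if not groups:
        return None
    context, _ = max(groups, key=lambda g: g[1])
    return context


# G0001 ("B Park") has one more image than G0002 ("B Aquarium").
print(album_context([("B Park", 4), ("B Aquarium", 3)]))
```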

Description of reference signs: 10…tagging device, 11…communication unit (acquisition unit), 12…group creation unit, 13…candidate estimation unit, 13a…character recognition estimation unit, 13b…image recognition estimation unit, 13c…POI estimation unit, 13d…event estimation unit, 14…combination unit, 15…category assigning unit.

Claims

An information processing apparatus comprising:
an acquisition unit that acquires a plurality of pieces of image management information indicating a time, place, or event related to an image; and
a combination unit that derives a user context indicating a user's action related to the image by combining the plurality of pieces of image management information acquired by the acquisition unit.

The information processing apparatus according to claim 1, further comprising a group creation unit that creates an image group by grouping a plurality of images whose recording position information and recording date and time information fall within a predetermined range,
wherein the combination unit derives, for each image group, a group context indicating the user's action based on the image management information of the images included in the image group.

The information processing apparatus according to claim 2, further comprising a candidate estimation unit that estimates, as a context candidate, image management information that satisfies a predetermined condition regarding the accuracy of the image management information among the plurality of pieces of image management information acquired by the acquisition unit,
wherein the combination unit derives the user context and the group context by combining, among the plurality of pieces of image management information, the image management information taken as the context candidate by the candidate estimation unit.

The information processing apparatus according to claim 3, wherein
the plurality of pieces of image management information include a plurality of image recognition results for the image,
the candidate estimation unit includes an image recognition estimation unit that estimates an image candidate, which is the context candidate based on the image recognition results for the image, and
the image recognition estimation unit
estimates, as the image candidate for deriving the user context, information indicating an object whose score indicating a degree of similarity in the image recognition result for the image is equal to or greater than a predetermined threshold, and
estimates the most frequent image candidate within the same image group as the image candidate for deriving the group context of the image group.

The information processing apparatus according to claim 3 or 4, wherein
the plurality of pieces of image management information include a plurality of pieces of POI information for the image,
the candidate estimation unit includes a POI estimation unit that estimates a POI candidate, which is the context candidate based on the POI information for the image, and
the POI estimation unit
calculates, for each of the plurality of pieces of POI information, a POI score obtained by dividing the number of check-ins at the POI by the distance from the POI to the position at which the image was recorded, and estimates the POI information with the highest POI score as the POI candidate for deriving the user context, and
estimates the most frequent POI candidate within the same image group as the POI candidate for deriving the group context of the image group.

The information processing apparatus according to any one of the above, wherein
the plurality of pieces of image management information include event information specified by information indicating a position and information indicating a date and time,
the candidate estimation unit includes an event estimation unit that estimates an event candidate, which is the context candidate based on the event information, and
the event estimation unit
estimates, as the event candidate for deriving the user context, the event information whose information indicating the position matches the position information of the point at which the image was recorded and whose information indicating the date and time matches the date and time information at which the image was recorded, and
estimates, as the event candidate for deriving the group context of the image group, the event information whose information indicating the position matches the position information of the point at which an image included in the image group was recorded and whose information indicating the date and time falls between the recording date and time of the image with the oldest date and time information and that of the image with the newest date and time information among the images included in the image group.

The information processing apparatus according to claim 3, wherein
the plurality of pieces of image management information include a character recognition result for the image,
the candidate estimation unit includes a character recognition estimation unit that estimates a character candidate, which is the context candidate based on the character recognition result for the image, and
the character recognition estimation unit
estimates predetermined characters among the characters in the character recognition result as the character candidate for deriving the user context, and
estimates the most frequent character candidate within the same image group as the character candidate for deriving the group context of the image group.

The information processing apparatus according to claim 2, wherein the combination unit
associates a tag corresponding to the user context with the image, and
associates a tag corresponding to the group context with the image group.

The information processing apparatus according to claim 2, further comprising a category assigning unit that associates with the image a category that defines one or more of the user contexts as a superordinate concept,
and associates with the image group a category that defines one or more of the group contexts as a superordinate concept.

The information processing apparatus according to any one of the above, wherein the group creation unit combines, among the created image groups, a plurality of the image groups whose images have date and time information within a predetermined range into a common new image group, and
the combination unit takes the group context of the image group with the largest number of images among the plurality of image groups included in the new image group as the group context of the new image group.