JP5837447B2

JP5837447B2 - Metadata candidate generation device and metadata candidate generation method

Info

Publication number: JP5837447B2
Application number: JP2012060997A
Authority: JP
Inventors: 直治山田; 未來原; 浩三野秋; 武史長沼
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2012-03-16
Filing date: 2012-03-16
Publication date: 2015-12-24
Anticipated expiration: 2032-03-16
Also published as: JP2013196189A

Description

The present invention relates to a metadata candidate generation device and a metadata candidate generation method for generating candidates for metadata to be added to stored data stored by a user.

Techniques are conventionally known for automatically generating metadata for image data recorded by a user with a digital camera or the like. Such metadata makes it possible to search for image data using keywords contained in the metadata.

For example, Patent Document 1 below discloses a metadata generation device that receives imaging information including the imaging date and time and the imaging position of image data and, based on that imaging information, generates initial keywords used when generating metadata for the image data. With this function, initial keywords are extracted from schedule data corresponding to the imaging date and time and from position information corresponding to the imaging position.

Patent Document 1: JP 2007-52581 A

However, in the conventional metadata generation device described above, initial keywords corresponding to the imaging information are extracted, and intermediate keywords and final keywords are then extracted from document data in which the initial keywords appear frequently. Keywords suitable for searching the imaging data are therefore not always selected. As a result, searches of the imaging data tend not to be carried out efficiently.

The present invention has been made in view of this problem, and its object is to provide a metadata candidate generation device and a metadata candidate generation method capable of efficiently extracting, from text information created by a user, metadata suitable for searching stored data.

To solve the above problem, the metadata candidate generation device of the present invention comprises: first assigned-information acquisition means for acquiring generation time information relating to the data generation time assigned to stored data stored by a user; second assigned-information acquisition means for acquiring time information relating to the times assigned to a plurality of pieces of text information created by the user; temporally similar information acquisition means for identifying, from among the plurality of pieces of text information, text information to which time information indicating a time within a fixed time range of the time indicated by the generation time information is assigned, and acquiring that text information; and important word extraction means for extracting, from the text information acquired by the temporally similar information acquisition means, a plurality of important words each assigned an importance, and outputting the plurality of important words as candidates for metadata to be added to the stored data.

Alternatively, the metadata candidate generation method of the present invention comprises: a first assigned-information acquisition step in which first assigned-information acquisition means acquires generation time information relating to the data generation time assigned to stored data stored by a user; a second assigned-information acquisition step in which second assigned-information acquisition means acquires time information relating to the times assigned to a plurality of pieces of text information created by the user; a temporally similar information acquisition step in which temporally similar information acquisition means identifies, from among the plurality of pieces of text information, text information to which time information indicating a time within a fixed time range of the time indicated by the generation time information is assigned, and acquires that text information; and an important word extraction step in which important word extraction means extracts, from the text information acquired by the temporally similar information acquisition means, a plurality of important words each assigned an importance, and outputs the plurality of important words as candidates for metadata to be added to the stored data.

With such a metadata candidate generation device or metadata candidate generation method, the generation time information assigned to the stored data is acquired, together with the time information assigned to the plurality of pieces of text information. Text information whose time information falls within a fixed time range of the time indicated by the generation time information is then acquired, and a plurality of important words, each assigned an importance, are output from that text information as metadata candidates. Because text information related to times close to the generation time of the stored data is gathered broadly and the important words in it are output, candidates for search keywords for the stored data can be extracted efficiently, and metadata can be chosen from those candidates and added to the stored data.

Preferably, the first assigned-information acquisition means further acquires generation position information relating to the data generation position assigned to the stored data, and the second assigned-information acquisition means further acquires position information relating to the positions assigned to the text information. The device further comprises spatially similar information acquisition means for identifying, from among the plurality of pieces of text information, text information to which position information indicating a position spatially related to the position indicated by the generation position information is assigned, and acquiring that text information. The important word extraction means extracts a plurality of important words, each assigned an importance, from the text information acquired by the temporally similar information acquisition means and the spatially similar information acquisition means, and outputs the plurality of important words as candidates for metadata to be added to the stored data.

With this configuration, the generation position information assigned to the stored data is acquired, together with the position information assigned to the plurality of pieces of text information. Text information whose position information is spatially related to the position indicated by the generation position information is then acquired, and a plurality of important words, each assigned an importance, are output from that text information as metadata candidates. Because text information related to positions close to the generation position of the stored data is gathered broadly and the important words in it are output, candidates for search keywords for the stored data can be extracted still more efficiently, and metadata can be chosen from those candidates and added to the stored data.

Preferably, the important word extraction means determines the importance of the plurality of important words based on the frequency of appearance of a plurality of words in the text information. This allows important words better suited as search keywords for the stored data to be added to the stored data.

It is also preferable that the important word extraction means determines the importance based on the frequency of appearance of the plurality of words and on the degree of temporal match between the generation time information and the time information assigned to the text information from which the words were extracted. With this configuration, the importance of each important word reflects not only its frequency of appearance but also its degree of temporal match with the stored data, so important words suitable as search keywords can be extracted more efficiently from the plurality of pieces of text information.

It is also preferable that the important word extraction means determines the importance based on the frequency of appearance of the plurality of words, the degree of temporal match between the generation time information and the time information assigned to the text information from which the words were extracted, and the degree of spatial match between the generation position information and the position information assigned to that text information. With this configuration, the importance of each important word reflects its frequency of appearance together with its degrees of temporal and spatial match with the stored data, so important words suitable as search keywords can be extracted more efficiently from the plurality of pieces of text information.

It is further preferable that the important word extraction means calculates the importance of each of the plurality of words as a weighted sum of its frequency of appearance and its degree of temporal match. In this case, the importance of each important word is determined while weighting the frequency of appearance in the text information against the degree of temporal match with the stored data, so important words still better suited as search keywords can be selected.

It is further preferable that the important word extraction means calculates the importance of each of the plurality of words as a weighted sum of its frequency of appearance, its degree of temporal match, and its degree of spatial match. In this case, the importance of each important word is determined while weighting the frequency of appearance in the text information against the degrees of temporal and spatial match with the stored data, so important words still better suited as search keywords can be selected.
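The weighted addition described above can be sketched as follows. This is a minimal illustration only: the function names, the default weights, and the assumption that the three input scores have already been normalized to the range 0 to 1 are hypothetical and are not specified at this point in the text.

```python
from typing import Dict, List, Tuple


def importance(s_tfidf: float, s_time: float, s_space: float,
               alpha: float = 0.5, beta: float = 0.3, gamma: float = 0.2) -> float:
    """Weighted sum of three scores (assumed already normalized to [0, 1]).

    The default weights are illustrative assumptions, not values from the text.
    """
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    return alpha * s_tfidf + beta * s_time + gamma * s_space


def rank_candidates(scores: Dict[str, Tuple[float, float, float]],
                    top_n: int = 5) -> List[Tuple[str, float]]:
    """Return the top_n words ordered by descending importance."""
    ranked = sorted(((word, importance(*s)) for word, s in scores.items()),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical per-word scores: (frequency, temporal match, spatial match).
    example = {"Hokkaido": (0.9, 1.0, 1.0), "lunch": (0.2, 0.1, 0.0)}
    for word, score in rank_candidates(example):
        print(word, round(score, 3))
```

Setting `gamma` to 0 and rescaling `alpha` and `beta` gives the two-term variant that uses only the frequency of appearance and the degree of temporal match.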

It is also preferable that the device further comprises stored data specifying means for specifying a plurality of pieces of stored data whose generation time information lies within a fixed time range of one another, and that the important word extraction means outputs the plurality of important words extracted for one of those pieces of stored data as candidates for metadata to be added to all of them. In this way, important words can be output collectively for stored data that are closely related in time, which greatly improves data processing efficiency.
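One possible way to specify stored data whose generation times lie within a fixed range of one another is sketched below. The grouping rule (every member must fall within the window measured from the earliest member of its group), the function names, and the sample IDs are assumptions made for illustration; the text does not prescribe a particular grouping algorithm.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def group_by_time(items: Dict[str, datetime], window: timedelta) -> List[List[str]]:
    """Group item IDs so that every member lies within `window` of the
    earliest item in its group (hence within `window` of each other)."""
    groups: List[List[str]] = []
    start = None
    for item_id, t in sorted(items.items(), key=lambda kv: kv[1]):
        if start is not None and t - start <= window:
            groups[-1].append(item_id)
        else:
            groups.append([item_id])
            start = t
    return groups


def share_candidates(group: List[str], candidates: List[str]) -> Dict[str, List[str]]:
    """Attach the candidates extracted for one item to every item in its group."""
    return {item_id: list(candidates) for item_id in group}


if __name__ == "__main__":
    photos = {  # hypothetical stored data IDs and generation times
        "photo_a": datetime(2012, 1, 30, 10, 0),
        "photo_b": datetime(2012, 1, 30, 11, 0),
        "photo_c": datetime(2012, 2, 5, 9, 0),
    }
    for group in group_by_time(photos, timedelta(hours=3)):
        print(share_candidates(group, ["family trip", "Hokkaido"]))
```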

It is also preferable that the device further comprises stored data specifying means for specifying a plurality of pieces of stored data whose generation position information is spatially related to one another, and that the important word extraction means outputs the plurality of important words extracted for one of those pieces of stored data as candidates for metadata to be added to all of them. In this case as well, important words can be output collectively for stored data that are closely related in space, which greatly improves data processing efficiency.

According to the present invention, metadata suitable for searching stored data can be efficiently extracted from text information created by a user.

A schematic configuration diagram of the metadata candidate generation server device 1 according to a preferred embodiment of the present invention.
A block diagram showing the hardware configuration of the computer constituting the metadata candidate generation server device 1 of FIG. 1.
A diagram showing an example of the data structure of the text data stored in the text data storage unit 12 of FIG. 1.
A flowchart showing the operation of the metadata candidate generation server device 1 of FIG. 1 when generating metadata candidates.
A diagram showing an example of metadata candidates output by the metadata candidate generation server device 1 of FIG. 1.
A schematic configuration diagram of a mobile communication terminal 101 according to a modification of the present invention.

Preferred embodiments of the metadata candidate generation device and metadata candidate generation method according to the present invention are described in detail below with reference to the drawings. In the description of the drawings, identical elements are given identical reference numerals, and redundant description is omitted.

FIG. 1 is a schematic configuration diagram of a metadata candidate generation server device 1 according to a preferred embodiment of the present invention. The metadata candidate generation server device 1 shown in FIG. 1 is a server device that stores and manages stored data, such as photo data, music data, and document data, that a user has stored using a mobile communication terminal 2. The metadata candidate generation server device 1 may consist of a single server device or may be a server system in which a plurality of server devices operate in cooperation. It can exchange data with the mobile communication terminal 2 via a communication network NW made up of a mobile communication network using a mobile communication scheme, a wired communication network, and the like. The mobile communication terminal 2 is a terminal device typified by a mobile phone, a smartphone, or a PDA.

FIG. 2 is a block diagram showing the hardware configuration of the computer constituting the metadata candidate generation server device 1 of FIG. 1. The computer 100 constituting the metadata candidate generation server device 1 is physically configured as an information processing device including a CPU 31; a RAM 32 and a ROM 33, which are main storage devices; an auxiliary storage device 36 such as a hard disk device; an input device 35 such as input keys, a touch panel, or a mouse; an output device 37 such as a display or a speaker; and a communication module 34 that handles the transmission and reception of data to and from other terminal devices and server devices via the communication network NW. The functions of the metadata candidate generation server device 1 are realized by loading a predetermined program onto hardware such as the CPU 31 and the RAM 32 shown in FIG. 2, thereby operating the communication module 34, the input device 35, and the output device 37 under the control of the CPU 31, and by reading and writing data in the RAM 32 and the auxiliary storage device 36.

Returning to FIG. 1, the metadata candidate generation server device 1 includes the following functional components: a stored data storage unit 11; a text data storage unit 12; a generation time information acquisition unit (first assigned-information acquisition means) 13; a generation position information acquisition unit (first assigned-information acquisition means) 14; a time information acquisition unit (second assigned-information acquisition means) 15; a position information acquisition unit (second assigned-information acquisition means) 17; a temporally similar information acquisition unit (temporally similar information acquisition means) 16; a spatially similar information acquisition unit (spatially similar information acquisition means) 18; a morphological analysis unit (important word extraction means) 19; an important word extraction unit (important word extraction means) 20; an important word storage unit 21; a similar data specifying unit (stored data specifying means) 22; and a candidate data output unit (important word extraction means) 23.

First, the function of each component of the metadata candidate generation server device 1 is described in detail.

The stored data storage unit 11 stores a plurality of pieces of stored data generated or acquired by the user with the mobile communication terminal 2. Each piece of stored data is assigned generation time information relating to the time at which the user stored the data and generation position information indicating the position of the mobile communication terminal 2 when the data was stored. The generation time information and generation position information may be included in the stored data as metadata, as with EXIF (Exchangeable Image File Format) in photo data, or may be included in the file name of the stored data. The generation time information may indicate the time at which the user generated the stored data with the mobile communication terminal 2, or the time at which the stored data was stored in the metadata candidate generation server device 1. Likewise, the generation position information may indicate the position at which the user generated the stored data with the mobile communication terminal 2, or the position at the time the stored data was stored in the metadata candidate generation server device 1.

The text data storage unit 12 stores a plurality of pieces of text data (text information) created by the user with the mobile communication terminal 2. Examples of text data include schedule data indicating the user's planned and past activities; text data such as diary data and memo data that the user has posted to computers connected to the communication network NW that provide services such as an SNS (Social Networking Service); and mail data that the user has sent over the communication network NW. Each piece of text data is assigned time information relating to a time associated with the data and position information indicating a position associated with the data. For example, schedule data includes the date, time, and place of each scheduled item; event data includes the date and time of events such as Christmas or birthdays; diary data includes the date, time, and place of the diary entry; mail data includes the date, time, and place at which the mail was sent; and news data includes the date, time, and place of the news item. The time information and position information may be contained directly in the text data or may be attached to the text data as metadata.

The generation time information acquisition unit 13 reads and acquires the generation time information from the stored data in the stored data storage unit 11 that is the target of metadata candidate output processing. The generation position information acquisition unit 14 reads and acquires the generation position information from the same target stored data in the stored data storage unit 11. The generation time information acquisition unit 13 then passes the acquired generation time information to the temporally similar information acquisition unit 16, and the generation position information acquisition unit 14 passes the acquired generation position information to the spatially similar information acquisition unit 18.

The time information acquisition unit 15 reads and acquires time information from the plurality of pieces of text data stored in the text data storage unit 12, and the position information acquisition unit 17 reads and acquires position information from the same text data. The time information acquisition unit 15 then passes the acquired time information to the temporally similar information acquisition unit 16, and the position information acquisition unit 17 passes the acquired position information to the spatially similar information acquisition unit 18.

FIG. 3 shows an example of the structure of the data stored in the text data storage unit 12. As shown in the figure, one piece of text data contains the ID information "Calendar_001" identifying the data, the data content "Family trip @ Hokkaido", the time information "Date/time: 2012/1/30", and the position information "Sapporo, Hokkaido…". Another piece of text data contains the ID information "Diary_001" identifying the data, the data content "Went to Hokkaido with the family over summer vacation. …", and the time information "Date/time: 2012/1/30". From such text data, the time information acquisition unit 15 acquires combinations of ID and time information: specifically, the combination of the ID information "Calendar_001" with the time information "Date/time: 2012/1/30", and the combination of the ID information "Diary_001" with the time information "Date/time: 2012/1/30". Similarly, the position information acquisition unit 17 acquires combinations of ID and position information: specifically, the combination of the ID information "Calendar_001" with the position information "Sapporo, Hokkaido…". The position information contained in the text data may also be information indicating latitude and longitude.
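As a rough illustration of the record structure in FIG. 3 and of the (ID, time) and (ID, position) pairs acquired from it, consider the following sketch. The class name `TextRecord`, its field names, and the helper functions are hypothetical; only the sample values come from the example above, rendered in English.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TextRecord:
    """One piece of text data; time and position are optional attachments."""
    record_id: str
    content: str
    time_info: Optional[str] = None
    position_info: Optional[str] = None


def time_pairs(records: List[TextRecord]) -> List[Tuple[str, str]]:
    """(ID, time information) pairs for records that carry time information."""
    return [(r.record_id, r.time_info) for r in records if r.time_info is not None]


def position_pairs(records: List[TextRecord]) -> List[Tuple[str, str]]:
    """(ID, position information) pairs for records that carry position information."""
    return [(r.record_id, r.position_info) for r in records
            if r.position_info is not None]


# The two sample records of FIG. 3; elided text ("…") is kept as in the source.
RECORDS = [
    TextRecord("Calendar_001", "Family trip @ Hokkaido", "2012/1/30", "Sapporo, Hokkaido…"),
    TextRecord("Diary_001", "Went to Hokkaido with the family over summer vacation. …",
               "2012/1/30"),
]

if __name__ == "__main__":
    print(time_pairs(RECORDS))
    print(position_pairs(RECORDS))
```

The diary record has no position information, so it contributes a time pair but no position pair, matching the description above.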

The temporally similar information acquisition unit 16 identifies, from the text data stored in the text data storage unit 12, the text data whose assigned time information indicates a time within a fixed time range of the time indicated by the generation time information assigned to the target stored data. That is, from the combinations of ID information and time information acquired by the time information acquisition unit 15, it extracts the combinations whose time information lies within a predetermined period of the time indicated by the generation time information, and identifies the text data from the ID information in the extracted combinations. In the text data example of FIG. 3, it computes the range of times within one day of the time "2012/1/30 01:00:00" indicated by the generation time information assigned to the target stored data, namely "2012/1/29 01:00:00" to "2012/1/31 01:00:00", extracts the time information "2012/1/30" that falls within that range, and identifies the corresponding ID information "Calendar_001" and "Diary_001". The temporally similar information acquisition unit 16 then acquires the text data corresponding to the identified ID information from the text data storage unit 12 and passes it to the morphological analysis unit 19.
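The time-window selection in this example can be sketched as below. The function names, the accepted timestamp formats, and the extra record "Memo_001" are assumptions for illustration; a date without a time of day is treated as midnight, which the text does not specify.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def parse_time(text: str) -> datetime:
    """Parse 'YYYY/M/D HH:MM:SS' or 'YYYY/M/D' (date only is taken as midnight)."""
    for fmt in ("%Y/%m/%d %H:%M:%S", "%Y/%m/%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported time format: {text!r}")


def ids_within_window(generation_time: str,
                      pairs: List[Tuple[str, str]],
                      window: timedelta = timedelta(days=1)) -> List[str]:
    """Return IDs whose time lies within +/- window of the generation time."""
    center = parse_time(generation_time)
    return [rid for rid, t in pairs if abs(parse_time(t) - center) <= window]


if __name__ == "__main__":
    pairs = [
        ("Calendar_001", "2012/1/30"),
        ("Diary_001", "2012/1/30"),
        ("Memo_001", "2012/3/1"),  # hypothetical record outside the window
    ]
    print(ids_within_window("2012/1/30 01:00:00", pairs))
```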

The spatially similar information acquisition unit 18 identifies, from the text data stored in the text data storage unit 12, the text data whose assigned position information indicates a position closely related in space to the position indicated by the generation position information assigned to the target stored data. That is, from the combinations of ID information and position information acquired by the position information acquisition unit 17, it extracts the combinations whose position information lies within a predetermined distance of the position indicated by the generation position information, and identifies the text data from the ID information in the extracted combinations. In the text data example of FIG. 3, it extracts the position information "Sapporo, Hokkaido…", which indicates a position within Z km of the position "latitude X degrees, longitude Y degrees" indicated by the generation position information assigned to the target stored data, as closely related position information, and identifies the corresponding ID information "Calendar_001". In doing so, the spatially similar information acquisition unit 18 may convert position information given as a place name into latitude and longitude before comparing, or may convert position information given as latitude and longitude into a place name before comparing. It may extract position information by evaluating the distance between the positions indicated by the two pieces of position information, or by checking whether the place names indicated by the two pieces of position information partially match. The spatially similar information acquisition unit 18 then acquires the text data corresponding to the identified ID information from the text data storage unit 12 and passes it to the morphological analysis unit 19.
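Both comparison strategies mentioned above, distance between coordinates and partial agreement of place names, can be sketched as follows. The great-circle (haversine) distance is one common choice and is an assumption here, as are the function names, the token-based place-name comparison, the coordinates, and the record "Memo_001"; converting between place names and coordinates would require a geocoding service, which is not shown.

```python
import math
import re
from typing import List, Tuple

LatLon = Tuple[float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def ids_within_distance(origin: LatLon, pairs: List[Tuple[str, LatLon]],
                        max_km: float) -> List[str]:
    """Return IDs whose position lies within max_km of the origin."""
    lat0, lon0 = origin
    return [rid for rid, (lat, lon) in pairs
            if haversine_km(lat0, lon0, lat, lon) <= max_km]


def place_names_overlap(a: str, b: str) -> bool:
    """True if the two place names share at least one comma/space separated part."""
    def split(s: str) -> set:
        return {part for part in re.split(r"[,\s]+", s.strip()) if part}
    return bool(split(a) & split(b))


if __name__ == "__main__":
    origin = (43.06, 141.35)  # approximate coordinates, for illustration only
    pairs = [("Calendar_001", (43.19, 140.99)), ("Memo_001", (35.68, 139.77))]
    print(ids_within_distance(origin, pairs, max_km=50.0))
    print(place_names_overlap("Sapporo, Hokkaido", "Hokkaido"))
```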

The morphological analysis unit 19 performs morphological analysis on the text data passed from the temporal similarity information acquisition unit 16 and the spatial similarity information acquisition unit 18, splits the text data into words, and passes all of the resulting words to the important word extraction unit 20. Preferably, the morphological analysis unit 19 determines the part of speech of each word and passes to the important word extraction unit 20 only the words judged to be specific parts of speech such as nouns and verbs. It is also preferable to determine the part of speech of each word and exclude the words judged to be specific parts of speech such as particles and auxiliary verbs. For example, the morphological analysis unit 19 extracts the words “family trip” and “Hokkaido” from the text data “family trip @ Hokkaido”.
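The two part-of-speech filters described above (keep only certain parts of speech, or drop certain parts of speech) can be sketched as follows. The sketch assumes that a morphological analyzer has already produced `(word, part_of_speech)` pairs; the function name `filter_by_pos` and the part-of-speech labels are hypothetical and do not correspond to any particular analyzer.

```python
def filter_by_pos(tagged_words, allow=None, deny=None):
    """Filter (word, part_of_speech) pairs produced by a morphological analyzer.

    allow: if given, keep only words whose part of speech is in this set.
    deny:  otherwise, if given, drop words whose part of speech is in this set.
    With neither, every word is passed through unchanged.
    """
    if allow is not None:
        return [word for word, pos in tagged_words if pos in allow]
    if deny is not None:
        return [word for word, pos in tagged_words if pos not in deny]
    return [word for word, _ in tagged_words]


# Hypothetical analyzer output for the text "family trip @ Hokkaido".
tagged = [("family trip", "noun"), ("@", "symbol"), ("Hokkaido", "noun")]
print(filter_by_pos(tagged, allow={"noun", "verb"}))
```

Under these assumed labels, the allow-list variant keeps “family trip” and “Hokkaido” and drops the “@” symbol, matching the example in the text.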

The important word extraction unit 20 determines the importance of each word extracted by the morphological analysis unit 19 based on the word's appearance frequency, its temporal coincidence with the generation time information, and its spatial coincidence with the generation position information. Specifically, the important word extraction unit 20 calculates a TF (Term Frequency) value tf from the word's appearance frequency in the text data from which it was extracted, calculates an IDF (Inverse Document Frequency) value idf from the total number of text data items stored in the text data storage unit 12 and the number of text data items containing the word, and uses the product of these values, the TF-IDF value tfidf, as the importance parameter $S_{TF\text{-}IDF}$. The formula is given here in the standard TF-IDF form, which matches the symbol definitions that follow it:

$$tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}, \qquad idf_{i} = \log \frac{|D|}{|\{d : d \ni t_{i}\}|}, \qquad tfidf_{i,j} = tf_{i,j} \cdot idf_{i}$$

In the above formula, $n_{i,j}$ is the number of occurrences of word i in text data j, $|D|$ is the total number of text data items, and $|\{d : d \ni t_{i}\}|$ is the number of text data items containing word i.

The important word extraction unit 20 also takes the time difference between the time indicated by the time information assigned to the text data from which the word was extracted and the time indicated by the generation time information assigned to the stored data to be processed as the importance parameter $S_{TIME}$, which indicates the temporal proximity of each word to the stored data. Likewise, it takes the distance between the position indicated by the position information assigned to the text data from which the word was extracted and the position indicated by the generation position information assigned to the stored data to be processed as the importance parameter $S_{SPACE}$, which indicates the spatial proximity of each word to the stored data. The important word extraction unit 20 then combines the three importance parameters $S_{TF\text{-}IDF}$, $S_{TIME}$, and $S_{SPACE}$ by weighted addition using the following formula:

$$S = \alpha S'_{TF\text{-}IDF} + \beta S'_{TIME} + \gamma S'_{SPACE}$$

to calculate the importance parameter S, which indicates the importance of each word for the stored data, and attaches it to each word. Here, $\alpha + \beta + \gamma = 1$, and $S'_{TF\text{-}IDF}$ is the value obtained by normalizing $S_{TF\text{-}IDF}$ by the sum of the top N values among the values of all words. $S'_{TIME}$ and $S'_{SPACE}$ are the values obtained by normalizing $S_{TIME}$ and $S_{SPACE}$ with the following formulas:

$$S'_{TIME} = \frac{\sum S_{TIME} - S_{TIME}}{\sum S_{TIME}}, \qquad S'_{SPACE} = \frac{\sum S_{SPACE} - S_{SPACE}}{\sum S_{SPACE}}$$

where $\sum S_{TIME}$ and $\sum S_{SPACE}$ denote the sums of the values over all words. When the sum of the n values of $S_{TIME}$ is $\sum S_{TIME} = 0$, $S'_{TIME} = 1/n$ is used, and when the sum of the n values of $S_{SPACE}$ is $\sum S_{SPACE} = 0$, $S'_{SPACE} = 1/n$ is used. Because a smaller time difference or distance yields a larger normalized value, words taken from text data closer in time and space to the stored data receive higher importance. The important word extraction unit 20 then stores each word, with its importance parameter S attached, in the important word storage unit 21 for each stored data item targeted for metadata candidate output.
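The scoring described above can be sketched in a few functions. This is a minimal sketch under stated assumptions, not the implementation in the specification: text data is represented as lists of words, the TF-IDF uses the standard definition with a natural logarithm, and all function names are hypothetical. Returning 0 when the top-N TF-IDF sum is zero is a choice made for this sketch; the specification defines a zero-sum fallback only for the time and space parameters.

```python
import math


def tf_idf(word, doc, corpus):
    """TF-IDF of `word` in `doc`; `corpus` is a list of word lists that includes `doc`."""
    tf = doc.count(word) / len(doc)
    df = sum(1 for d in corpus if word in d)  # number of texts containing the word
    return tf * math.log(len(corpus) / df)


def normalize_by_top_n(values, n):
    """Divide each value by the sum of the n largest values (S'_TF-IDF)."""
    total = sum(sorted(values, reverse=True)[:n])
    # Zero-sum handling is an assumption of this sketch.
    return [v / total if total else 0.0 for v in values]


def normalize_gap(values):
    """Map time differences or distances to (sum - value) / sum.

    Smaller gaps give larger results. If the sum is 0, every word gets 1/n.
    """
    if not values:
        return []
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [(total - v) / total for v in values]


def importance(tfidf, time_gap, space_gap, alpha, beta, gamma, top_n):
    """Per-word S = alpha*S'_TF-IDF + beta*S'_TIME + gamma*S'_SPACE."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    s_tfidf = normalize_by_top_n(tfidf, top_n)
    s_time = normalize_gap(time_gap)
    s_space = normalize_gap(space_gap)
    return [alpha * a + beta * b + gamma * c
            for a, b, c in zip(s_tfidf, s_time, s_space)]
```

The three input lists are aligned by word: element k of each list belongs to the same candidate word. How a word that occurs in several text data items is merged into a single score is not specified in the text and is left out of the sketch.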

The similar data identification unit 22 identifies stored data similar to the stored data targeted for metadata candidate output by referring to the stored data storage unit 11. Specifically, it groups the stored data items whose generation time information indicates a time within a fixed time range of the time indicated by the generation time information assigned to the stored data to be processed. For example, when the generation time information “2012/1/30 01:00:00” is assigned to the stored data to be processed, it identifies the stored data whose generation time information falls within one hour before or after that time, that is, within “2012/1/30 00:00:00” to “2012/1/30 02:00:00”. The similar data identification unit 22 also identifies and groups the stored data items whose generation position information indicates a position spatially closely related to the position indicated by the generation position information assigned to the stored data to be processed. For example, when the generation position information “latitude X degrees, longitude Y degrees” is assigned to the stored data to be processed, it identifies the stored data whose generation position information lies within 10 km of that position, or whose position has a place name that overlaps with the place name derived from the generation position information. The similar data identification unit 22 may also group mutually similar stored data by using a clustering technique such as self-organization or the K-means method to find concentrations in generation time or generation position. The similar data identification unit 22 then outputs information identifying the stored data items similar to the stored data to be processed to the candidate data output unit 23.
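The fixed-time-window grouping in the example above can be sketched as follows. This is an illustrative sketch only; the function name `similar_by_time`, the `(id, datetime)` representation of stored data, and the inclusive treatment of the window boundaries are assumptions.

```python
from datetime import datetime, timedelta


def similar_by_time(target_time, stored, window=timedelta(hours=1)):
    """Return IDs of stored data generated within +/- window of target_time.

    stored: iterable of (id, generation_time) pairs (hypothetical layout).
    The boundaries are treated as inclusive, which is an assumption.
    """
    return [sid for sid, t in stored if abs(t - target_time) <= window]


target = datetime(2012, 1, 30, 1, 0, 0)
photos = [
    ("G_2", datetime(2012, 1, 30, 0, 0, 0)),
    ("G_3", datetime(2012, 1, 30, 1, 30, 0)),
    ("G_5", datetime(2012, 1, 30, 2, 0, 1)),
]
print(similar_by_time(target, photos))
```

The clustering alternative mentioned in the text (for example, K-means over generation times or positions) would replace the fixed window with data-driven groups.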

In response to a transmission request from the mobile communication terminal 2, the candidate data output unit 23 reads a plurality of words from the important word storage unit 21 in order of importance and outputs them as candidates for metadata to be added to the stored data to be processed. At this time, the candidate data output unit 23 simultaneously outputs the words output for one stored data item to be processed as metadata candidates for the stored data items that the similar data identification unit 22 has grouped as similar to that stored data item. Furthermore, when the candidate data output unit 23 receives from the mobile communication terminal 2 an input selecting, from among the metadata candidates, the words to be reflected in the metadata, it updates the corresponding stored data in the stored data storage unit 11 so that those words are reflected in the metadata. When outputting metadata candidates for the stored data to be processed, the candidate data output unit 23 may rank and output the candidates separately for each type of text data from which they were extracted (for example, separately for schedule data, diary data, and mail data), or may output together the ranked candidates extracted from all types of text data.
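The two output modes (one ranking per text data type, or a single merged ranking) can be sketched as follows. The names `ranked_candidates` and `_top`, the `(word, importance, source_type)` tuple layout, and the rule of keeping a duplicate word's highest score are assumptions of this sketch, not details given in the specification.

```python
from collections import defaultdict


def _top(pairs, k):
    """Return up to k distinct words, highest score first.

    A word that appears more than once keeps its highest score (an assumption).
    """
    best = {}
    for word, score in pairs:
        if word not in best or score > best[word]:
            best[word] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:k]]


def ranked_candidates(scored_words, top_k, per_type=False):
    """Rank metadata candidates by importance.

    scored_words: list of (word, importance, source_type) tuples.
    per_type=False: one merged ranking over all text data types.
    per_type=True:  a dict mapping each source type to its own ranking.
    """
    if not per_type:
        return _top([(w, s) for w, s, _ in scored_words], top_k)
    groups = defaultdict(list)
    for word, score, source_type in scored_words:
        groups[source_type].append((word, score))
    return {t: _top(pairs, top_k) for t, pairs in groups.items()}
```

The same ranked list would then be offered for every stored data item in the similar group, so the user confirms the candidates once for the whole group.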

The operation of the metadata candidate generation server device 1 is described below with reference to FIG. 4, together with a detailed description of the metadata candidate generation method performed by the metadata candidate generation server device 1. FIG. 4 is a flowchart showing the operation of the metadata candidate generation server device 1 when generating metadata candidates.

First, an input selecting the stored data targeted for metadata candidate output is received from the user of the mobile communication terminal 2 (step S101). The generation time information acquisition unit 13 then reads the generation time information assigned to the stored data to be processed from the stored data storage unit 11, and at the same time the time information acquisition unit 15 reads the time information assigned to a plurality of text data items from the text data storage unit 12 (step S102). The temporal similarity information acquisition unit 16 then identifies the text data items assigned time information within a fixed time range of the time indicated by the generation time information, and acquires those text data items from the text data storage unit 12 (step S103). Next, the generation position information acquisition unit 14 reads the generation position information assigned to the stored data to be processed from the stored data storage unit 11, and at the same time the position information acquisition unit 17 reads the position information assigned to a plurality of text data items from the text data storage unit 12 (step S104). The spatial similarity information acquisition unit 18 then identifies the text data items assigned position information that is spatially closely related to the position indicated by the generation position information, and acquires those text data items from the text data storage unit 12 (step S105).

After that, the morphological analysis unit 19 splits the acquired text data items into words (step S106). Next, the important word extraction unit 20 calculates the importance parameter S for the resulting words, and the words, ranked according to the importance parameter S, are stored in the important word storage unit 21 for each stored data item to be processed (step S107). The similar data identification unit 22 then groups the stored data items that are spatially or temporally similar to one another (step S108). Finally, in response to a transmission request from the mobile communication terminal 2, the words serving as metadata candidates are ranked and output for the stored data to be processed and for the stored data grouped as similar to it (step S109).

FIG. 5 shows an example of the output screen D_1 displayed on the mobile communication terminal 2 in response to the output of metadata candidates from the metadata candidate generation server device 1. As shown in the figure, the words “Hokkaido”, “family trip”, “summer vacation”, “barbecue”, “campfire”, and “bear figurine” are displayed as metadata candidates for the photo data G_1, which is the stored data to be processed, and for the photo data G_2, G_3, and G_4, which are temporally or spatially similar to the photo data G_1. From these candidates, the user has selected “Hokkaido”, “family trip”, “summer vacation”, “barbecue”, and “campfire” as the words to be reflected in the metadata.

According to the metadata candidate generation server device 1 and the metadata candidate generation method performed by it as described above, the generation time information and generation position information assigned to the stored data are acquired, together with the time information and position information assigned to a plurality of text data items. In addition, the text data assigned time information within a fixed time range of the time indicated by the generation time information is acquired, the text data assigned position information spatially related to the position indicated by the generation position information is acquired, and a plurality of important words with assigned importance are output from this text information as metadata candidates. As a result, text information related to times and positions close to the generation time and generation position of the stored data is gathered broadly, and the important words in that text information are output. Candidate search keywords for the stored data can therefore be extracted efficiently, and words chosen from those candidates can be added to the stored data as metadata.

In addition, because the importance of the words is determined based on their appearance frequency in the text data, important words better suited as search keywords for the stored data can be added to the stored data. Furthermore, because the importance of each important word is determined by combining its appearance frequency with its temporal and spatial coincidence with the stored data, important words suited as search keywords can be extracted more efficiently from the plurality of text information items.

In addition, metadata candidates can be output together for a plurality of stored data items whose generation time information lies within a fixed time range of one another, which markedly improves data processing efficiency. Likewise, metadata candidates can be output together for a plurality of stored data items whose generation position information is spatially related, which further improves data processing efficiency.

The present invention is not limited to the embodiment described above.

For example, some or all of the components shown in FIG. 1 may be provided in a terminal device used by the user. For example, as shown in FIG. 6, all of the components may be provided in the mobile communication terminal 101 used by the user who wishes to add metadata to the stored data.

DESCRIPTION OF SYMBOLS 1: metadata candidate generation server device; 2, 101: mobile communication terminal; 13: generation time information acquisition unit (first assigned-information acquisition means); 14: generation position information acquisition unit (first assigned-information acquisition means); 15: time information acquisition unit (second assigned-information acquisition means); 16: temporal similarity information acquisition unit (temporal similarity information acquisition means); 17: position information acquisition unit (second assigned-information acquisition means); 18: spatial similarity information acquisition unit (spatial similarity information acquisition means); 19: morphological analysis unit (important word extraction means); 20: important word extraction unit (important word extraction means); 22: similar data identification unit (stored data identification means); 23: candidate data output unit (important word extraction means).

Claims

1. A metadata candidate generation device comprising:
first assigned-information acquisition means for acquiring generation time information on a data generation time assigned to stored data stored by a user;
second assigned-information acquisition means for acquiring time information on times assigned to a plurality of pieces of text information created by the user;
temporal similarity information acquisition means for identifying, from among the plurality of pieces of text information, text information assigned time information indicating a time within a fixed time range of the time indicated by the generation time information, and acquiring that text information; and
important word extraction means for extracting, from the text information acquired by the temporal similarity information acquisition means, a plurality of important words with assigned importance, and outputting the plurality of important words as candidates for metadata to be added to the stored data,
wherein the important word extraction means calculates and determines the importance of the plurality of important words by weighted addition of an appearance frequency and a temporal coincidence, based on the appearance frequency of a plurality of words in the text information and on the temporal coincidence, with the generation time information, of the time information assigned to the text information from which the plurality of words were extracted.

2. A metadata candidate generation device comprising:
first assigned-information acquisition means for acquiring generation time information on a data generation time assigned to stored data stored by a user and generation position information on a data generation position assigned to the stored data;
second assigned-information acquisition means for acquiring time information on times assigned to a plurality of pieces of text information created by the user and position information on positions assigned to the text information;
temporal similarity information acquisition means for identifying, from among the plurality of pieces of text information, text information assigned time information indicating a time within a fixed time range of the time indicated by the generation time information, and acquiring that text information;
spatial similarity information acquisition means for identifying, from among the plurality of pieces of text information, text information assigned position information indicating a position spatially related to the position indicated by the generation position information, and acquiring that text information; and
important word extraction means for extracting, from the text information acquired by the temporal similarity information acquisition means and the spatial similarity information acquisition means, a plurality of important words with assigned importance, and outputting the plurality of important words as candidates for metadata to be added to the stored data,
wherein the important word extraction means calculates and determines the importance of the plurality of important words by weighted addition of an appearance frequency, a temporal coincidence, and a spatial coincidence, based on the appearance frequency of a plurality of words in the text information, on the temporal coincidence, with the generation time information, of the time information assigned to the text information from which the plurality of words were extracted, and on the spatial coincidence, with the generation position information, of the position information assigned to that text information.

3. The metadata candidate generation device according to claim 1 or 2, further comprising stored data identification means for identifying a plurality of pieces of the stored data having generation time information within a fixed time range of one another,
wherein the important word extraction means outputs the plurality of important words extracted for one of the plurality of pieces of stored data as candidates for metadata to be added to the plurality of pieces of stored data.

4. The metadata candidate generation device according to any one of claims 1 to 3, further comprising stored data identification means for identifying a plurality of pieces of the stored data having generation position information spatially related to one another,
wherein the important word extraction means outputs the plurality of important words extracted for one of the plurality of pieces of stored data as candidates for metadata to be added to the plurality of pieces of stored data.

5. A metadata candidate generation method comprising:
a first assigned-information acquisition step in which first assigned-information acquisition means acquires generation time information on a data generation time assigned to stored data stored by a user;
a second assigned-information acquisition step in which second assigned-information acquisition means acquires time information on times assigned to a plurality of pieces of text information created by the user;
a temporal similarity information acquisition step in which temporal similarity information acquisition means identifies, from among the plurality of pieces of text information, text information assigned time information indicating a time within a fixed time range of the time indicated by the generation time information, and acquires that text information; and
an important word extraction step in which important word extraction means extracts, from the text information acquired by the temporal similarity information acquisition means, a plurality of important words with assigned importance, and outputs the plurality of important words as candidates for metadata to be added to the stored data,
wherein, in the important word extraction step, the importance of the plurality of important words is calculated and determined by weighted addition of an appearance frequency and a temporal coincidence, based on the appearance frequency of a plurality of words in the text information and on the temporal coincidence, with the generation time information, of the time information assigned to the text information from which the plurality of words were extracted.

6. A metadata candidate generation method comprising:
a first assigned-information acquisition step in which first assigned-information acquisition means acquires generation time information on a data generation time assigned to stored data stored by a user and generation position information on a data generation position assigned to the stored data;
a second assigned-information acquisition step in which second assigned-information acquisition means acquires time information on times assigned to a plurality of pieces of text information created by the user and position information on positions assigned to the text information;
a temporal similarity information acquisition step in which temporal similarity information acquisition means identifies, from among the plurality of pieces of text information, text information assigned time information indicating a time within a fixed time range of the time indicated by the generation time information, and acquires that text information;
a spatial similarity information acquisition step in which spatial similarity information acquisition means identifies, from among the plurality of pieces of text information, text information assigned position information indicating a position spatially related to the position indicated by the generation position information, and acquires that text information; and
an important word extraction step in which important word extraction means extracts, from the text information acquired by the temporal similarity information acquisition means and the spatial similarity information acquisition means, a plurality of important words with assigned importance, and outputs the plurality of important words as candidates for metadata to be added to the stored data,
wherein, in the important word extraction step, the importance of the plurality of important words is calculated and determined by weighted addition of an appearance frequency, a temporal coincidence, and a spatial coincidence, based on the appearance frequency of a plurality of words in the text information, on the temporal coincidence, with the generation time information, of the time information assigned to the text information from which the plurality of words were extracted, and on the spatial coincidence, with the generation position information, of the position information assigned to that text information.