JP5133294B2

JP5133294B2 - Spatio-temporal search device, method and program

Info

Publication number: JP5133294B2
Application number: JP2009098279A
Authority: JP
Inventors: Ken Kurashima; Ko Fujimura
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-04-14
Filing date: 2009-04-14
Publication date: 2013-01-30
Anticipated expiration: 2029-04-14
Also published as: JP2010250496A

Description

The present invention relates to a spatio-temporal search device, method, and program. In particular, it relates to a spatio-temporal search device, method, and program for analyzing the relationship between human emotions and spatio-temporal regions, targeting data in which an individual's experience is described in natural language and to which numerical information is attached that uniquely identifies the date, time, and place at which the experience was had (or at which the experience described there was actually carried out).

In recent years, so-called CGM (Consumer Generated Media) such as blogs, SNSs (social networking services), and Internet bulletin boards have spread remarkably. Unlike advertising issued by companies or objective data, these media characteristically contain a great deal of human experience, that is, individuals' behavior histories and subjective descriptions. The need to use them is therefore growing in fields such as marketing, corporate management, and consumer behavior. In addition, with the spread of GPS-equipped mobile terminals such as mobile phones, CGM and web text tagged with location information (latitude, longitude) and time information (date) are being published more and more often.

As a first conventional technique, experience mining is known, which targets such CGM, blogs in particular, and discovers people's behaviors and emotions that are specific to a time or a place. This technique derives the relationships among time, space, action, and emotion from the time information attached to a blog entry and the co-occurrence of place names, action words, and emotion words in the entry (see, for example, Non-Patent Document 1).

As a second conventional technique, a technique for automatically extracting the polarity (positive/negative) of a word is known. It uses the hit counts of a Web search engine to determine a word's polarity automatically, based on whether the word co-occurs more readily with a positive word (good) or a negative word (bad). With this technique it can be found, for example, that the word "accident" carries a stronger negative than positive meaning (see, for example, Non-Patent Document 2).
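The co-occurrence idea in this second technique can be illustrated with a small sketch. It assumes a log-ratio score built from a single positive/negative seed pair, which is one common form of semantic orientation; the function name `semantic_orientation`, the smoothing constant, and all hit counts are made up for illustration and do not come from a real search engine.

```python
import math

def semantic_orientation(hits_with_pos, hits_with_neg, hits_pos, hits_neg,
                         smoothing=0.01):
    """Log-ratio association of a word with a positive vs. a negative seed word.

    hits_with_pos / hits_with_neg: hit counts for the word near each seed.
    hits_pos / hits_neg: hit counts for the seed words alone.
    A score above 0 suggests a positive word, below 0 a negative one.
    The smoothing term (an assumption) avoids division by zero.
    """
    numerator = (hits_with_pos + smoothing) * (hits_neg + smoothing)
    denominator = (hits_with_neg + smoothing) * (hits_pos + smoothing)
    return math.log2(numerator / denominator)

# Hypothetical hit counts for the word "accident" (illustrative only).
score = semantic_orientation(hits_with_pos=1_200, hits_with_neg=9_800,
                             hits_pos=5_000_000, hits_neg=3_000_000)
polarity = "positive" if score > 0 else "negative"
```

With these invented counts the word co-occurs far more often with the negative seed, so the score comes out below zero.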

A third conventional technique extends the first technique, which evaluates the polarity of a word along a single axis, and evaluates the polarity of a word along four axes that make up human emotion (<happy, sad>, <surprise, anticipation>, <acceptance, disgust>, <anger, fear>) (see, for example, Non-Patent Document 3).

The second and third conventional techniques can also be applied to document classification. In the simplest case, a document in which positive words appear more often than negative words is judged to be positive as a whole.

Non-Patent Document 1: Ken Kurashima, Ko Fujimura, Hidenori Okuda, "Experience mining from large-scale texts," The 19th IEICE Data Engineering Workshop / The 6th Annual Conference of the Database Society of Japan (DEWS2008), A1-4, 2008.
Non-Patent Document 2: P. Turney and M. L. Littman, "Measuring Praise and Criticism: Inference of Semantic Orientation from Association," ACM Transactions on Information Systems, Vol. 21, No. 4, 2003.
Non-Patent Document 3: T. Kumamoto and K. Tanaka, "Proposal of Impression Mining from News Articles," Proc. of the 10th International Conference on Knowledge-Based & Intelligent Information & Engineering Systems (KES 2005), LNAI 3681, pp. 901-910, 2005.

When time attributes (dates) or spatial attributes are given as numerical data, the first conventional technique solves the problem by reducing these numeric attributes to categorical attributes, so the solutions it obtains are not optimal. Here, an attribute that takes one of two or more values, such as blood type or gender, is called a categorical attribute, and an attribute that takes general numerical values, such as time, latitude/longitude, weight, or height, is called a numeric attribute. Reducing a numeric attribute to a categorical one means ignoring its continuity. For example, the moment the value "November 6, 2008" of the numeric attribute "date" is treated as the value "November 6, 2008" of a plain categorical attribute "date", the continuity of November 5, November 6, November 7, and so on is lost. As a result, trends that span several categorical values, such as a trend over the week from November 1 to November 7, can no longer be obtained. The same holds for spatial attributes: because the first conventional technique discovers trends using the region denoted by a place name as a single unit, it cannot discover trends that span several such regions or that occur in only part of one.

The second and third conventional techniques, in turn, are techniques for discovering a fixed relationship between words and polarity that does not depend on time, and they cannot discover relationships that change with temporal or spatial factors.

The present invention has been made in view of the above points. Its object is to provide a spatio-temporal search device, method, and program that can find the optimal spatio-temporal region that best characterizes people who have a certain emotion, targeting data in which an individual's experience is described in natural language and to which numerical information is attached that uniquely identifies the date, time, and place at which the experience was had (or at which the experience described there was actually carried out).

FIG. 1 is a diagram showing the principle configuration of the present invention.

The present invention (claim 1) is a spatio-temporal search device that finds an optimal spatio-temporal region that best characterizes specific people from analysis target data structured as documents in which individuals' experiences are described in natural language, each accompanied by the location information and time information of the experience given as numerical data, the device comprising:
input means 10 for inputting the analysis target data, constraints on the spatio-temporal region to be found, and a condition that characterizes people, and storing them in input information storage means 11;
target value deriving means 30 for setting the condition that characterizes people, given by the input means 10, as the target value for which the optimal spatio-temporal region is found;
document polarity determination means 21 for determining whether the text of the analysis target data was written by people who satisfy the condition that characterizes people;
transaction generation means 22 for generating, from the determination result of the document polarity determination means 21 and the location information and time information attached to the analysis target data, transactions having the location information, the time information, and the determination result as elements, and storing them in transaction storage means 15; and
numeric attribute association rule extraction means 40 for deriving, from the set of transactions in the transaction storage means 15, the spatio-temporal region condition that maximizes the occurrence probability of transactions matching the target value, by extracting numeric attribute association rules under the constraints on the spatio-temporal region to be found.

The present invention (claim 2) is the spatio-temporal search device of claim 1, wherein
the input means 10 includes means for characterizing people in terms of whether they have a certain emotion or its opposite, by specifying, as the condition that characterizes people, an emotion polarity consisting of two emotions with opposite meanings;
the target value deriving means 30 includes means for setting each of the two emotions given by the input means 10 as a target value for which the optimal spatio-temporal region is found; and
the document polarity determination means 21 includes means for determining to which of the two emotions, which are the polarity values of the emotion polarity, the text of the analysis target data belongs.

The present invention (claim 3) is the spatio-temporal search device of claim 1, wherein
the input means 10 includes means for accepting, as the condition that characterizes people, a plurality of emotion polarities, each consisting of two specified emotions with opposite meanings, and for characterizing people in terms of whether they have several emotions at the same time;
the target value deriving means 30 includes means for taking as the target values the Cartesian product of the sets, one for each emotion polarity given by the input means, whose elements are the two opposing emotions that make up that emotion polarity; and
the document polarity determination means 21 includes means for determining, for each emotion polarity, to which of the two emotions, which are the polarity values of that emotion polarity, the text of the analysis target data belongs.

The present invention (claim 4) is the spatio-temporal search device of any one of claims 1 to 3, further comprising output means for outputting the optimal spatio-temporal region that best characterizes the specific people.

FIG. 2 is a diagram for explaining the principle of the present invention.

The present invention (claim 5) is a spatio-temporal search method that finds an optimal spatio-temporal region that best characterizes specific people from analysis target data structured as documents in which individuals' experiences are described in natural language, each accompanied by the location information and time information of the experience given as numerical data, the method performing:
an input step (step 1) of inputting the analysis target data, constraints on the spatio-temporal region to be found, and a condition that characterizes people, and storing them in input information storage means;
a target value deriving step (step 2) of setting the condition that characterizes people, given in the input step (step 1), as the target value for which the optimal spatio-temporal region is found;
a document polarity determination step (step 3) of determining whether the text of the analysis target data was written by people who satisfy the condition that characterizes people;
a transaction generation step (step 4) of generating, from the determination result of the document polarity determination step (step 3) and the location information and time information attached to the analysis target data, transactions having the location information, the time information, and the determination result as elements, and storing them in transaction storage means; and
a numeric attribute association rule extraction step (step 5) of deriving, from the set of transactions in the transaction storage means, the spatio-temporal region condition that maximizes the occurrence probability of transactions matching the target value, by extracting numeric attribute association rules under the constraints on the spatio-temporal region to be found.

The present invention (claim 6) is the spatio-temporal search method of claim 5, wherein
in the input step (step 1), people are characterized in terms of whether they have a certain emotion or its opposite, by specifying, as the condition that characterizes people, an emotion polarity consisting of two emotions with opposite meanings;
in the target value deriving step (step 2), each of the two emotions with opposite meanings given in the input step is set as a target value for which the optimal spatio-temporal region is found; and
in the document polarity determination step (step 3), it is determined to which of the two emotions, which are the polarity values of the emotion polarity, the text of the analysis target data belongs.

The present invention (claim 7) is the spatio-temporal search method of claim 5, wherein
in the input step (step 1), a plurality of emotion polarities, each consisting of two specified emotions with opposite meanings, are accepted as the condition that characterizes people, and people are characterized in terms of whether they have several emotions at the same time;
in the target value deriving step (step 2), the target values are the Cartesian product of the sets, one for each emotion polarity given in the input step, whose elements are the two opposing emotions that make up that emotion polarity; and
in the document polarity determination step (step 3), it is determined, for each emotion polarity, to which of the two emotions, which are the polarity values of that emotion polarity, the text of the analysis target data belongs.

The present invention (claim 8) is the spatio-temporal search method of any one of claims 5 to 7, further performing an output step of outputting, to display means, the optimal spatio-temporal region that best characterizes the specific people.

The present invention (claim 9) is a spatio-temporal search program for causing a computer to function as each of the means constituting the spatio-temporal search device of any one of claims 1 to 4.

According to the present invention, the optimal spatio-temporal region that best characterizes people who have a certain emotion can be found from data structured as text in which individuals' experiences are described in natural language, accompanied by the location information and time information of the experience given as numerical data.

FIG. 1 is a diagram showing the principle configuration of the present invention.
FIG. 2 is a diagram for explaining the principle of the present invention.
FIG. 3 is a configuration diagram of the spatio-temporal search device in one embodiment of the present invention.
FIG. 4 is a flowchart of the overall operation of the spatio-temporal search device in one embodiment of the present invention.
FIG. 5 shows examples of region families on a two-dimensional plane in one embodiment of the present invention.
FIG. 6 shows an example of transactions generated by the transaction generation unit in one embodiment of the present invention.
FIG. 7 is a flowchart showing the processing of the target value deriving unit in one embodiment of the present invention.
FIG. 8 shows an example in which target attributes are added to the transactions of FIG. 6 in one embodiment of the present invention.
FIG. 9 shows an example in which transactions are mapped onto a two-dimensional plane in one embodiment of the present invention.
FIG. 10 shows an example in which the optimal rectilinear convex region is found from FIG. 9 in one embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 3 shows the configuration of the spatio-temporal search device in one embodiment of the present invention.

The spatio-temporal search device comprises an input unit 10, an analysis target data storage unit 11, a threshold storage unit 12, an emotion polarity storage unit 13, a region family storage unit 14, a transaction storage unit 15, an optimal solution storage unit 16, a transaction generation function unit 20, a target value deriving unit 30, a numeric attribute association rule extraction unit 40, and an output unit 50. It finds the optimal spatio-temporal region that best characterizes certain specific people, typically people who have a certain emotion. In the present invention, the term "spatio-temporal region" is defined as a concept covering a time interval, a spatial region, or a spatio-temporal region.

FIG. 4 is a flowchart of the overall operation of the spatio-temporal search device in one embodiment of the present invention.

Step 100) The input unit 10 stores the analysis target data, the threshold (minimum support), the emotion polarities (pairs of emotions with opposite meanings, such as <positive, negative>), and the region family (the type of region in the plane spanned by two numeric attributes or in the space spanned by three numeric attributes) entered by the user in the analysis target data storage unit 11, the threshold storage unit 12, the emotion polarity storage unit 13, and the region family storage unit 14, respectively.

Step 200) The target value deriving unit 30 reads the emotion polarity from the emotion polarity storage unit 13 and derives the target values for which the optimal time interval, spatial region, or spatio-temporal region is to be found. Alternatively, it computes the Cartesian product of the sets whose elements are the two opposing emotions of each emotion polarity and uses it as the target values.

Step 300) The transaction generation function unit 20 reads one or more emotion polarities from the emotion polarity storage unit 13 and the analysis target data from the analysis target data storage unit 11, and determines, for each emotion polarity, to which of the two emotions that are its polarity values the analysis target data belongs.

Step 400) The transaction generation function unit 20 generates transactions whose elements are the date, latitude, and longitude of the analysis target data and the emotion polarity values determined in step 300, and stores them in the transaction storage unit 15.

Step 500) The numeric attribute association rule extraction unit 40 obtains the transactions stored in the transaction storage unit 15, the target values derived by the target value deriving unit 30, the region family stored in the region family storage unit 14, and the minimum support stored in the threshold storage unit 12. It then finds the optimal spatial region or spatio-temporal region that best characterizes the specific people and stores it as the optimal solution in the optimal solution storage unit 16.

Step 600) The output unit 50 outputs the optimal solution stored in the optimal solution storage unit 16.
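To make the search in step 500 concrete, the following is a minimal sketch for the simplest case, where only the date is used. It assumes that "optimal" means the interval with the highest share of target-valued transactions among intervals that reach the minimum support. It tries every interval by brute force, which is not the extraction algorithm of the numeric attribute association rule extraction unit 40. The function name `best_interval` and the sample data are hypothetical.

```python
from datetime import date

def best_interval(transactions, target, min_support):
    """Return (confidence, start, end) of the best date interval, or None.

    transactions: list of (date, polarity_value) pairs.
    An interval qualifies when it holds at least `min_support` (a fraction of
    all transactions); among qualifying intervals, the one with the highest
    share of `target` values wins (the first one found wins ties).
    """
    days = sorted({day for day, _ in transactions})
    total = len(transactions)
    best = None
    for i, start in enumerate(days):
        for end in days[i:]:
            inside = [value for day, value in transactions if start <= day <= end]
            if len(inside) / total < min_support:
                continue
            confidence = sum(value == target for value in inside) / len(inside)
            if best is None or confidence > best[0]:
                best = (confidence, start, end)
    return best

# Made-up transactions: one per day, with a run of "happy" in the middle.
sample = [
    (date(2008, 11, 1), "sad"),
    (date(2008, 11, 2), "happy"),
    (date(2008, 11, 3), "happy"),
    (date(2008, 11, 4), "happy"),
    (date(2008, 11, 5), "sad"),
    (date(2008, 11, 6), "sad"),
]
result = best_interval(sample, target="happy", min_support=0.5)
```

Because the interval endpoints are chosen from the data rather than fixed in advance, the search can return a run of days such as November 2 to November 4, which a per-day categorical treatment would not expose.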

The detailed operation of each component is described below.

The input unit 10 receives from the user the analysis target data, the threshold used when extracting association rules, the emotion polarities, and the region family. The input unit 10 consists of, for example, a terminal equipped with a keyboard, OCR, pen input, a speech recognition device, GPS, or the like, or means for reading a text file placed on a network.

The analysis target data is data in which a history of an individual's experiences, or impressions and evaluations based on those experiences, are described in natural language, and to which numerical information is attached that uniquely identifies the time and position at which the text was written or at which the experience described in it actually took place. In this embodiment, a date is given as the time information, and latitude and longitude are given as the position information.

The threshold given is the minimum support. The minimum support is a threshold on support, which is one of the measures of the usefulness of an association rule, described later.
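As a rough illustration of how a minimum support threshold is used, the sketch below computes support and confidence for one rectangular latitude/longitude region. It assumes the definitions commonly used for numeric-attribute rules: support is the share of all transactions that fall inside the region, and confidence is the share of those that carry the target value. The definitions given later in the patent take precedence, and the function name, dictionary keys, and data here are hypothetical.

```python
def support_and_confidence(transactions, region, target):
    """Support and confidence of the rule "inside region => target".

    transactions: list of dicts with keys 'lat', 'lon', 'polarity'.
    region: (lat_min, lat_max, lon_min, lon_max), bounds inclusive.
    """
    lat_min, lat_max, lon_min, lon_max = region
    inside = [t for t in transactions
              if lat_min <= t["lat"] <= lat_max and lon_min <= t["lon"] <= lon_max]
    support = len(inside) / len(transactions) if transactions else 0.0
    matching = sum(t["polarity"] == target for t in inside)
    confidence = matching / len(inside) if inside else 0.0
    return support, confidence

# Made-up transactions; the last one lies outside the region.
transactions = [
    {"lat": 35.68, "lon": 139.76, "polarity": "positive"},
    {"lat": 35.69, "lon": 139.70, "polarity": "positive"},
    {"lat": 35.66, "lon": 139.73, "polarity": "negative"},
    {"lat": 35.70, "lon": 139.77, "polarity": "positive"},
    {"lat": 34.69, "lon": 135.50, "polarity": "negative"},
]
region = (35.6, 35.8, 139.6, 139.8)
support, confidence = support_and_confidence(transactions, region, "positive")
meets_threshold = support >= 0.5  # 0.5 plays the role of the minimum support
```

A region that falls below the minimum support would be discarded no matter how high its confidence is, which keeps the search from returning regions backed by only a handful of documents.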

The emotion polarity specifies two concepts with opposite meanings, such as <positive, negative>, and thereby indicates the polarity used to narrow down people with a certain characteristic. Here, an emotion axis made up of two opposing concepts, such as <positive, negative>, is called an "emotion polarity" (or "polarity" for short), and "positive" and "negative" are each called a "polarity value". Typical emotion polarities other than <positive, negative> include the following four:
(1) <happy, sad>
(2) <surprise, anticipation>
(3) <anger, fear>
(4) <acceptance, disgust>
These four axes are known to be orthogonal concepts, and the user can also specify several orthogonal emotion polarities to query for mixed emotions. In this embodiment, the user is assumed to have specified m emotion polarities Q_1 = {p_1, n_1}, ..., Q_m = {p_m, n_m}. Besides the emotion polarities listed here, any pair of antonymous adjectives, such as <good, bad> or <bright, dark>, may be entered. A polarity of the form <contains, does not contain> for a certain keyword or concept may also be used.

The region family is the type of region in the plane spanned by two numeric attributes or in the space spanned by three numeric attributes. The user specifies the region family when seeking an optimal spatial region (two-dimensional) or spatio-temporal region (three-dimensional). When seeking a time interval, only a single numeric attribute is involved, so no region family needs to be specified. Examples of region families in the plane spanned by two numeric attributes include:
(1) rectangular regions,
(2) x-monotone regions, and
(3) rectilinear convex regions.

A rectangular region (1) is, as shown in FIG. 5(a), a region expressed as the direct product of intervals of the two numeric attributes, with sides parallel to the axes of the plane spanned by the two attributes.

An x-monotone region (2) is, as shown in FIG. 5(b), a connected region whose intersection with any line perpendicular to the x axis (or y axis) is a single interval or empty.

A rectilinear convex region (3) is, as shown in FIG. 5(c), a connected region that is both x-monotone and y-monotone.
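The three region families can be made concrete on a grid of cells. The following is a minimal sketch, assuming a region is represented as a set of integer (x, y) cells and that "connected" means 4-connected. These representation choices and all function names are illustrative and are not taken from the patent.

```python
from collections import deque

def is_connected(cells):
    """True if the cells form a single 4-connected component."""
    cells = set(cells)
    if not cells:
        return False
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if neighbor in cells and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(cells)

def slices_are_intervals(cells, axis):
    """True if, for each fixed coordinate on `axis`, the cells have no gaps.

    axis=0 checks lines perpendicular to the x axis,
    axis=1 checks lines perpendicular to the y axis.
    """
    groups = {}
    for cell in set(cells):
        groups.setdefault(cell[axis], []).append(cell[1 - axis])
    return all(max(v) - min(v) + 1 == len(v) for v in groups.values())

def is_x_monotone(cells):
    return is_connected(cells) and slices_are_intervals(cells, axis=0)

def is_y_monotone(cells):
    return is_connected(cells) and slices_are_intervals(cells, axis=1)

def is_rectilinear_convex(cells):
    return is_x_monotone(cells) and is_y_monotone(cells)

def is_rectangle(cells):
    """True if the cells fill their bounding box completely."""
    cells = set(cells)
    if not cells:
        return False
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return len(cells) == (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)

rectangle = {(x, y) for x in range(3) for y in range(2)}
l_shape = {(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)}   # rectilinear convex, not a rectangle
u_shape = {(0, 0), (1, 0), (2, 0), (0, 1), (2, 1)}   # x-monotone only
```

The examples show the nesting of the families: every rectangle is rectilinear convex, and every rectilinear convex region is x-monotone, but not the other way round.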

The analysis target data storage unit 11 stores the analysis target data entered through the input unit 10. Any storage may be used as long as the structure of the input data is preserved and can be restored. For example, the data is stored in a database or in a specific area of a general-purpose storage device provided in advance (a memory, a hard disk device, or the like).

The threshold storage unit 12 stores the threshold entered through the input unit 10. As with the analysis target data storage unit 11, any storage may be used as long as the minimum support is preserved and can be restored. For example, it is stored in a database or in a specific area of a general-purpose storage device provided in advance (a memory, a hard disk device, or the like).

The emotion polarity storage unit 13 stores the emotion polarities entered through the input unit 10. Any storage may be used as long as the emotion polarities are preserved and can be restored. For example, they are stored in a database or in a specific area of a general-purpose storage device provided in advance (a memory, a hard disk device, or the like).

The region family storage unit 14 stores the region family entered through the input unit 10. Any storage may be used as long as the entered region family is preserved and can be restored. For example, it is stored in a database or in a specific area of a general-purpose storage device provided in advance (a memory, a hard disk device, or the like).

The transaction generation function unit 20 has a document polarity determination unit 21 and a transaction generation unit 22. The document polarity determination unit 21 determines, based on the polarities stored in the emotion polarity storage unit 13, the polarity value of the text portion of the analysis target data stored in the analysis target data storage unit 11. For example, when text is evaluated along the <positive, negative> axis, the text "The clerk's attitude was bad" is classified as "negative" and "The food was very delicious" as "positive". When m polarities are given, each text is judged m times, once per polarity.

文書極性判定部２１は、機械学習を用いた文書分類技術、前述の非特許文献２、非特許文献３や、評価表現辞書、感情辞書等のシソーラスを利用した方法等で実現できる。 The document polarity determination unit 21 can be realized by a document classification technique using machine learning, the above-described Non-Patent Document 2, Non-Patent Document 3, or a method using a thesaurus such as an evaluation expression dictionary or an emotion dictionary.
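As a minimal sketch of the dictionary-based option mentioned above, and not the method of the cited non-patent documents, the following Python function counts lexicon hits for the two values of one emotion polarity. The lexicon contents, the function name `judge_polarity`, and the rule that ties go to the first value are all assumptions made for illustration.

```python
import re

# Hypothetical lexicon: for each polarity axis, the words associated with
# its first and second polarity value. Not taken from the patent.
LEXICON = {
    ("positive", "negative"): (
        {"delicious", "great", "friendly"},
        {"bad", "rude", "dirty"},
    ),
}


def judge_polarity(text, axis, lexicon=LEXICON):
    """Return the polarity value of `axis` that has more lexicon hits in `text`."""
    first_words, second_words = lexicon[axis]
    tokens = re.findall(r"[a-z']+", text.lower())
    first_hits = sum(token in first_words for token in tokens)
    second_hits = sum(token in second_words for token in tokens)
    # Assumption: a tie (including no hits at all) falls back to the first value.
    return axis[0] if first_hits >= second_hits else axis[1]
```

With m emotion polarities, the function would be called once per axis for each text, which corresponds to judging one text m times.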

トランザクション生成部２２は、文書極性判定部２１の結果と、解析対象データ記憶部１１に記憶された解析対象データの日付、緯度、経度とから、以下の形式のトランザクションを生成する。 The transaction generation unit 22 generates a transaction of the following format from the result of the document polarity determination unit 21 and the date, latitude, and longitude of the analysis target data stored in the analysis target data storage unit 11.

R = {ID, date, latitude, longitude, emotion polarity_1, …, emotion polarity_m}
The ID is the transaction identifier. Each emotion polarity attribute takes the polarity value obtained by the document polarity determination unit 21. FIG. 6 shows an example of the result when two emotion polarities, <happy, sad> and <surprise, anticipation>, are input.
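The transaction format above can be sketched in Python as follows. This is an illustrative data structure only; the class name `Transaction`, the field names, and the `judge` callback (standing in for the document polarity determination unit 21) are assumptions, not names from the patent.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Transaction:
    id: int
    day: date                    # "date" in the transaction format
    latitude: float
    longitude: float
    polarities: Tuple[str, ...]  # one polarity value per emotion polarity


def make_transactions(
    documents: Iterable[Tuple[str, date, float, float]],
    judge: Callable[[str, Tuple[str, str]], str],
    axes: Sequence[Tuple[str, str]],
) -> List[Transaction]:
    """Build one transaction per (text, day, latitude, longitude) document."""
    return [
        Transaction(i, day, lat, lon, tuple(judge(text, axis) for axis in axes))
        for i, (text, day, lat, lon) in enumerate(documents, start=1)
    ]
```

IDs are assigned sequentially from 1 here; any scheme that yields unique identifiers would serve equally well.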

トランザクション記憶部１５は、トランザクション生成部２２で生成したトランザクションを格納する。トランザクション記憶部１５は、トランザクションの構造を保存するものであればなんでもよい。例えば、データベースや予め備えられた汎用的な記憶装置（メモリやハードディスク装置等）の特定領域に記憶される。 The transaction storage unit 15 stores the transaction generated by the transaction generation unit 22. The transaction storage unit 15 may be anything as long as it stores the transaction structure. For example, it is stored in a specific area of a database or a general-purpose storage device (a memory, a hard disk device, etc.) provided in advance.

The target value deriving unit 30 derives, from the m emotion polarities input by the user and stored in the emotion polarity storage unit 13, the set P of target values for which the optimal time, spatial region, and spatio-temporal region are to be obtained.

The target value set P is the Cartesian product of the m sets Q_1, …, Q_m:

P = Q_1 × Q_2 × … × Q_m

For example, when three emotion polarities Q_1 = {happy, sad}, Q_2 = {surprise, anticipation}, and Q_3 = {anger, fear} are input, P = {(happy, surprise, anger), (happy, surprise, fear), (happy, anticipation, anger), (happy, anticipation, fear), (sad, surprise, anger), (sad, surprise, fear), (sad, anticipation, anger), (sad, anticipation, fear)}. That is, a target value set P with 2^m elements is generated from m emotion polarities. When m = 1, however, the single emotion polarity set Q is used as the set P. FIG. 7 shows a flowchart of the target value deriving unit 30.

Step 201) The target value deriving unit 30 reads the emotion polarities O_1, …, O_m from the emotion polarity storage unit 13.

Step 202) If m ≥ 2, the process proceeds to Step 203; otherwise it proceeds to Step 208.

Step 203) The temporary variable i is initialized (i ← 1).

Step 204) If i ≤ m, the process proceeds to Step 205; otherwise it proceeds to Step 207.

Step 205) A set Q_i whose elements are the polarity values of the emotion polarity O_i is generated.

Step 206) i is incremented (i ← i + 1), and the process returns to Step 204.

Step 207) When i > m in Step 204, the Cartesian product of Q_1, …, Q_m is set as the target value set P, and the process proceeds to Step 209.

Step 208) When m < 2 in Step 202, a target value set P whose elements are the polarity values of the emotion polarity O_1 is generated.

Step 209) The target value set P is stored in a memory (not shown) in the target value deriving unit 30.
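Steps 201 to 209 can be sketched in Python with `itertools.product`. The function name `derive_target_set` is an assumption, and representing the m = 1 case as 1-tuples (so that every element of P has the same shape) is a choice made for this sketch rather than something the flowchart specifies. At least one emotion polarity is assumed to be given.

```python
from itertools import product


def derive_target_set(polarities):
    """Return the target value set P as a list of tuples.

    polarities: list of m >= 1 emotion polarities O_1..O_m, each a pair of
    opposing polarity values, e.g. ("happy", "sad").
    """
    m = len(polarities)
    if m < 2:
        # Step 208: a single polarity; P consists of its polarity values.
        return [(value,) for value in polarities[0]]
    # Steps 203-206: build Q_i from the polarity values of O_i.
    q_sets = [tuple(o_i) for o_i in polarities]
    # Step 207: P is the Cartesian product Q_1 x ... x Q_m.
    return list(product(*q_sets))
```

For the three polarities of the example above, the result has 2^3 = 8 elements, beginning with ("happy", "surprise", "anger").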

The numerical attribute correlation rule extraction unit 40 extracts optimal-confidence numerical attribute correlation rules based on the transactions stored in the transaction storage unit 15, the target value set P derived by the target value deriving unit 30, the region family stored in the region family storage unit 14, and the minimum support stored in the threshold storage unit 12.

数値属性相関ルール抽出部４０は、目的値導出部３０によって得られた目的値集合Ｐの全ての要素について、最適な時間、空間領域、時空間領域を順々に求めていく。 The numerical attribute correlation rule extraction unit 40 sequentially obtains the optimal time, space region, and spatio-temporal region for all elements of the target value set P obtained by the target value deriving unit 30.

Consider obtaining the optimal time, spatial region, and spatio-temporal region for an element p = [q_1, …, q_m] of the target value set P, where q_1 ∈ Q_1, …, q_m ∈ Q_m. The numerical attribute correlation rule extraction unit 40 first assigns, to each transaction stored in the transaction storage unit 15, a target attribute E that indicates whether the polarity attribute part of the transaction, p′ = [q′_1, …, q′_m] with q′_1 ∈ Q_1, …, q′_m ∈ Q_m, is equal to p. In the present embodiment, E takes the value 1 when p = p′ and 0 otherwise. FIG. 8 shows the example of FIG. 6 with the target attribute added.
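The assignment of the target attribute can be illustrated with a short Python sketch. The dictionary layout of a transaction (a "polarities" tuple) and the function name `assign_target_attribute` are assumptions made for illustration.

```python
def assign_target_attribute(transactions, p):
    """Return copies of the transactions with the target attribute E added.

    transactions: list of dicts, each with a "polarities" tuple (hypothetical layout).
    p: the element of the target value set P currently being analysed.
    """
    target = tuple(p)
    return [
        # E = 1 when the polarity part p' of the transaction equals p, else 0.
        {**tx, "E": 1 if tuple(tx["polarities"]) == target else 0}
        for tx in transactions
    ]
```

The input list is left unmodified, so the same stored transactions can be relabelled for each element of P in turn.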

数値属性相関ルール抽出部４０は、最適な時間を求める場合には、一次元数値属性相関ルールを抽出する。また、最適な空間領域を求める場合には、二次元数値属性相関ルールを抽出する。また、最適な時空間領域を求める際には、三次元数値属性相関ルールを抽出する。数値属性相関ルールは以下の形式で表される。 The numerical attribute correlation rule extraction unit 40 extracts a one-dimensional numerical attribute correlation rule when obtaining the optimum time. Further, when obtaining an optimal space region, a two-dimensional numerical attribute correlation rule is extracted. Further, when obtaining the optimal space-time region, a three-dimensional numerical attribute correlation rule is extracted. Numeric attribute association rules are represented in the following format:

(A ∈ (v_1, v_2)) → (E = 1)
Here A is a numerical attribute, v_1 ≤ v_2 are values in the domain of A, and E is the target attribute. The term to the left of the arrow is called the condition part, and the term to the right is called the conclusion part. Support and confidence are used as measures of the usefulness of a correlation rule. Let N be the total number of transactions, s the number of transactions whose value of attribute A lies between v_1 and v_2, and h the number of those transactions that also have the value 1 for attribute E. The support is then h / N and the confidence is h / s. For ease of setting, the support may instead be taken to be the count h itself. Rules with one, two, and three numerical attributes in the condition part are called one-dimensional, two-dimensional, and three-dimensional numerical attribute correlation rules, respectively.
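The two measures can be computed directly from the definitions of N, s, and h. The following Python sketch assumes the range is closed at both ends (v_1 ≤ A ≤ v_2); the function name and argument layout are illustrative.

```python
def support_and_confidence(values, targets, v1, v2):
    """Return (support, confidence) of the rule (A in [v1, v2]) -> (E = 1).

    values[i]:  value of the numerical attribute A for transaction i.
    targets[i]: value of the target attribute E (0 or 1) for transaction i.
    """
    n = len(values)                                   # N: all transactions
    in_range = [e for a, e in zip(values, targets) if v1 <= a <= v2]
    s = len(in_range)                                 # s: transactions in the range
    h = sum(in_range)                                 # h: those that also have E = 1
    support = h / n if n else 0.0
    confidence = h / s if s else 0.0
    return support, confidence
```

For instance, with values [1, 2, 3, 4], targets [1, 0, 1, 1], and the range [2, 4], N = 4, s = 3, and h = 2, so the support is 0.5 and the confidence is 2/3.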

最適な時間を求めるには、以下の形式の一次元数値属性相関ルールを抽出する。 In order to obtain the optimum time, a one-dimensional numerical attribute correlation rule of the following format is extracted.

(T ∈ [t1, t2]) → (E = 1)
Here T is the time attribute, and t1 ≤ t2 are values in the domain of T. For example, the fact that "from January 1, 2008 to January 2, 2008, people feel happy with high probability" is expressed by the following correlation rule.

(T ∈ [2008-01-01, 2008-01-02]) → (E = 1)
As described above, the target attribute E here takes the value 1 when the polarity value is "happy". From the set of transactions stored in the transaction storage unit 15, the rule whose support is at least the minimum support and whose confidence is highest among such rules (the optimal-confidence correlation rule) is selected. If several rules attain the maximum confidence, the one with the largest support is preferred. The time region indicated by the condition part of that rule is taken as the optimal region that best characterizes the value of the target attribute.
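The selection rule just described can be sketched as a brute-force search over all intervals whose endpoints are observed time values. This is only the simplest possible approach, with support taken as the count h; the function name `best_interval` and the (time, E) pair layout are assumptions for illustration.

```python
def best_interval(rows, min_support):
    """Return (t1, t2, h, confidence) of the optimal-confidence interval, or None.

    rows: list of (t, e) pairs, where t is a comparable time value and e is
          the target attribute E (0 or 1).
    min_support: minimum support, expressed as the count h.
    """
    times = sorted({t for t, _ in rows})
    best = None  # ((confidence, h), (t1, t2))
    for i, t1 in enumerate(times):
        for t2 in times[i:]:
            inside = [e for t, e in rows if t1 <= t <= t2]
            s, h = len(inside), sum(inside)
            if h < min_support:
                continue
            # Highest confidence first; ties are broken by larger support.
            key = (h / s, h)
            if best is None or key > best[0]:
                best = (key, (t1, t2))
    if best is None:
        return None
    (confidence, h), (t1, t2) = best
    return t1, t2, h, confidence
```

The time values may be integers, `datetime.date` objects, or any other mutually comparable type.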

最適な空間領域を求めるには、以下の形式の二次元数値属性相関ルールを抽出する。 In order to obtain the optimum spatial region, a two-dimensional numerical attribute correlation rule having the following format is extracted.

(<L, A> ∈ R) → (E = 1)
Here L is the latitude attribute and A is the longitude attribute. R is a region in the plane spanned by the numerical attributes L and A. The form of the region R is given by the region family stored in the region family storage unit 14. As described above, the typical region families for two numerical attributes are:
1) rectangular regions,
2) x-monotone regions,
3) rectilinear convex regions.

The rectangular region of 1) is expressed as the Cartesian product of intervals of the two numerical attributes, and is an axis-parallel region in the plane they span. A correlation rule of the following form, a straightforward extension of the one-dimensional numerical attribute correlation rule, therefore corresponds to a rectangular region.

(L ∈ [35.0000, 36.0000]) ∧ (A ∈ [140.0000, 141.0000]) → (E = 1)
From the set of transactions stored in the transaction storage unit 15, the rule whose support is at least the minimum support and whose confidence is highest among such rules is selected. If several rules attain the maximum confidence, the one with the largest support is preferred. The spatial region <L, A> indicated by the condition part of that rule is taken as the optimal region that best characterizes the value of the target attribute. This method may yield different optimal solutions depending on whether the assumed region family is (1) rectangular regions, (2) x-monotone regions, or (3) rectilinear convex regions.
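For the rectangular region family, the same selection can be sketched as an exhaustive search over axis-parallel rectangles on a grid of per-cell counts. This is a naive illustration rather than an efficient algorithm; the grid layout (`total[r][c]` transactions and `hits[r][c]` transactions with E = 1 in each cell) and the function name are assumptions, and support is again taken as the count h.

```python
def best_rectangle(total, hits, min_support):
    """Return (r1, r2, c1, c2, confidence, h) for the best rectangle, or None.

    total[r][c]: number of transactions in grid cell (r, c).
    hits[r][c]:  number of those transactions with E = 1.
    Row and column bounds are inclusive cell indices.
    """
    rows, cols = len(total), len(total[0])
    best = None  # ((confidence, h), (r1, r2, c1, c2))
    for r1 in range(rows):
        for r2 in range(r1, rows):
            for c1 in range(cols):
                for c2 in range(c1, cols):
                    cells = [(r, c) for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]
                    s = sum(total[r][c] for r, c in cells)
                    h = sum(hits[r][c] for r, c in cells)
                    if s == 0 or h < min_support:
                        continue
                    key = (h / s, h)  # confidence first, then support
                    if best is None or key > best[0]:
                        best = (key, (r1, r2, c1, c2))
    if best is None:
        return None
    (confidence, h), (r1, r2, c1, c2) = best
    return r1, r2, c1, c2, confidence, h
```

Converting the returned cell indices back to latitude and longitude intervals gives a rule of the rectangular form shown above.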

最適な時空間領域を求めるには、以下の形式の三次元数値属性相関ルールを考える。 To find the optimal spatio-temporal region, consider a three-dimensional numerical attribute correlation rule of the form

(<T, L, A> ∈ R) → (E = 1)
Here R is a region in the three-dimensional space spanned by the numerical attributes T, L, and A. The form of the region R is given by the region family stored in the region family storage unit 14. Possible region shapes include: regions parallel to the axes of the space spanned by the three numerical attributes; connected regions whose intersection with any line perpendicular to one particular axis is a single interval or empty; connected regions whose intersection with any line perpendicular to two particular axes is a single interval or empty; and connected regions whose intersection with any line perpendicular to any of the axes is a single interval or empty.

From the set of transactions stored in the transaction storage unit 15, the rule whose support is at least the minimum support and whose confidence is highest among such rules (the optimal-confidence rule) is selected. If several rules attain the maximum confidence, the one with the largest support is preferred. The spatio-temporal region <T, L, A> indicated by the condition part of that rule is taken as the optimal region that best characterizes the value of the target attribute. This method may yield different optimal solutions depending on the assumed region family.

全ての区間、領域を列挙して最大の確信度となる区間を選ぶ素朴な手法から、より効率的な既存技術など、それぞれの数値属性相関ルールを求める具体的な手法は問わない。 There is no limitation on a specific method for obtaining each numerical attribute correlation rule, such as a simple method of enumerating all the sections and areas and selecting a section having the highest certainty factor, or a more efficient existing technique.

For example, for a one-dimensional numerical attribute correlation rule, the numerical attribute is divided into M evenly balanced buckets, and the following sequence of points in the two-dimensional plane is considered for k = 1, …, M:

(Σ_{i=1}^{k} u_i, Σ_{i=1}^{k} v_i)

Here u_i is the number of transactions in bucket i, and v_i is the number of those transactions whose target attribute value satisfies the user's request. A known technique recasts the task as a geometric problem: among pairs of points that are at least the minimum support apart in the x direction, find the pair whose connecting line has the greatest slope. This technique may be used (for example, Mining optimized association rules for numeric attributes, T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama, ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 182-191, 1996). For rectangular regions, two-dimensional and three-dimensional numerical attribute correlation rules can also be solved efficiently by reducing them to the one-dimensional problem. Techniques for efficiently obtaining x-monotone regions and rectilinear convex regions for two-dimensional numerical attribute correlation rules are also known (for example, Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization, T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama, ACM SIGMOD Conference on Management of Data, pp. 13-23, 1996).
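The geometric reformulation can be illustrated as follows. With cumulative points whose x coordinate is the running sum of u_i and whose y coordinate is the running sum of v_i, the slope of the line between two points equals the confidence of the bucket range between them, and their x distance is the number of transactions in that range. The Python sketch below checks every pair of points, so it takes O(M^2) time and is not the efficient algorithm of the cited paper; prepending the origin (0, 0) and the function name `best_bucket_range` are assumptions of this sketch.

```python
from itertools import accumulate


def best_bucket_range(u, v, min_support):
    """Return (first_bucket, last_bucket, confidence), 1-based, or None.

    u[i]: number of transactions in bucket i + 1.
    v[i]: number of those transactions with E = 1.
    min_support: minimum number of transactions the range must contain.
    """
    # Cumulative points; the origin is prepended so a range can start at bucket 1.
    xs = [0] + list(accumulate(u))
    ys = [0] + list(accumulate(v))
    best = None  # (slope, width, first_bucket, last_bucket)
    for m in range(len(xs)):
        for n in range(m + 1, len(xs)):
            width = xs[n] - xs[m]            # transactions in buckets m+1..n
            if width == 0 or width < min_support:
                continue
            slope = (ys[n] - ys[m]) / width  # confidence of that bucket range
            candidate = (slope, width, m + 1, n)
            # Steepest slope wins; ties are broken by the wider range.
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is None:
        return None
    return best[2], best[3], best[0]
```

Note that in this formulation the support constraint is applied to the number of transactions in the range (the x distance), following the description of the geometric method.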

最適解記憶部１６は、数値属性相関ルール抽出部４０で得られた、目的値、数値属性相関ルール、各種ルールの優位性を示す指標（支持度、確信度）の組み合わせを格納する。最適解（最も高い確信度を持つルール）のみを記憶しても良いし、最小支持度以上の支持度を持つ数値属性相関ルールの全てを記憶しても良い。つまり、数値属性相関ルール抽出部４０で得られた全ての情報を記憶可能なものである。例えば、データベースや予め備えられた汎用的な記憶装置（メモリやハードディスク装置等）の特定領域に記憶される。 The optimal solution storage unit 16 stores combinations of objective values, numerical attribute correlation rules, and indices (support level, confidence factor) indicating superiority of various rules obtained by the numerical attribute correlation rule extraction unit 40. Only the optimal solution (the rule having the highest certainty factor) may be stored, or all the numerical attribute correlation rules having the support level equal to or higher than the minimum support level may be stored. That is, all the information obtained by the numerical attribute correlation rule extraction unit 40 can be stored. For example, it is stored in a specific area of a database or a general-purpose storage device (a memory, a hard disk device, etc.) provided in advance.

The output unit 50 outputs the target values, the numerical attribute correlation rules (or the time region, spatial region, or spatio-temporal region in the condition part of each rule), and the indices indicating the quality of each rule (support and confidence) stored in the optimal solution storage unit 16. Here, output is a concept that includes display on a screen, printing on a printer, sound output, transmission to an external device, storage on a recording medium, and the like. The output unit 50 may or may not be regarded as including an output device such as a display or a speaker. The output unit 50 can be realized by driver software for an output device, or by such driver software together with the output device.

The operation by which the numerical attribute correlation rule extraction unit 40 generates a numerical attribute correlation rule in the above embodiment is described below, using the example of obtaining an optimal spatial region. Here, two numerical attributes, a latitude attribute L and a longitude attribute A, are assumed to be attached to each transaction as position information. The region family specified by the user is assumed to be rectilinear convex regions, and the frequency 6 is assumed to be specified as the minimum support (to simplify the explanation, support is defined on a frequency basis).

First, to obtain the optimal region, the domain of the plane spanned by the two numerical attributes L and A is divided into a grid of suitable granularity, and regions are formed by joining pixels of the grid. Next, each transaction stored in the transaction storage unit 15 is mapped onto the plane spanned by L and A based on its position (latitude, longitude). The mapping keeps transactions whose target attribute value is 1 distinguishable from those whose value is 0. FIG. 9 is an example of such a mapping by latitude and longitude, in which transactions with target attribute value 1 are shown as black circles and those with value 0 as white circles.
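The grid mapping step can be sketched in Python as two count grids, one for all transactions and one for those with target attribute 1, which keeps the two kinds distinguishable per pixel. The function name `map_to_grid`, the (latitude, longitude, E) tuple layout, and the assumption that every position lies within the given ranges are illustrative choices.

```python
def map_to_grid(transactions, lat_range, lon_range, rows, cols):
    """Return (total, hits) count grids of size rows x cols.

    transactions: iterable of (latitude, longitude, e), with e the target
                  attribute E (0 or 1); positions are assumed to lie in range.
    lat_range, lon_range: (min, max) of the domain of each attribute.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    total = [[0] * cols for _ in range(rows)]
    hits = [[0] * cols for _ in range(rows)]
    for lat, lon, e in transactions:
        # Scale the position to a cell index; the upper bound maps to the last cell.
        r = min(int((lat - lat_min) / (lat_max - lat_min) * rows), rows - 1)
        c = min(int((lon - lon_min) / (lon_max - lon_min) * cols), cols - 1)
        total[r][c] += 1
        hits[r][c] += e
    return total, hits
```

A region search over the chosen region family can then operate on these per-pixel counts instead of on individual transactions.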

When the numerical attribute correlation rule algorithm is applied to the region shown in FIG. 9 to find rectilinear convex regions whose support is at least the minimum support, three regions are obtained, as shown in FIG. 10 (rectilinear convex regions A, B, and C). Among them, the region with the highest confidence, namely rectilinear convex region A, is the optimal spatial region.

なお、上記の時空間検索装置の構成要素の動作をプログラムとして構築し、時空間検索装置として利用されるコンピュータにインストールして実行させる、または、ネットワークを介して流通させることが可能である。 The operations of the components of the spatio-temporal search device described above can be constructed as a program, installed in a computer used as the spatio-temporal search device, executed, or distributed via a network.

The constructed program can also be stored on a hard disk or on a portable storage medium such as a flexible disk or CD-ROM, and then installed on a computer or distributed.

なお、本発明は、上記の実施の形態に限定されることなく、特許請求の範囲内において種々変更・応用が可能である。 The present invention is not limited to the above-described embodiment, and various modifications and applications can be made within the scope of the claims.

本発明は、ブログ等の個人発信情報を時空間上で分類する技術に適用可能である。 The present invention can be applied to a technique for classifying personally transmitted information such as a blog in space-time.

DESCRIPTION OF SYMBOLS
10 Input means, input unit
11 Input information storage means, analysis target data storage unit
12 Threshold storage unit
13 Emotion polarity storage unit
14 Region family storage unit
15 Transaction storage means, transaction storage unit
16 Optimal solution storage unit
20 Transaction generation function unit
21 Document polarity determination unit
22 Transaction generation means, transaction generation unit
30 Target value calculation means, target value deriving unit
40 Numerical attribute correlation rule extraction means, numerical attribute correlation rule extraction unit
50 Output unit

Claims

A spatio-temporal search device for obtaining the spatio-temporal region best suited to characterizing specific people, from analysis target data structured as documents in which personal experiences are described in natural language and to which the position information and time information of the experience are attached as numerical data,
Input means for inputting the analysis target data, constraints on the space-time region to be obtained, conditions for characterizing people, and storing them in storage means;
A target value deriving unit that sets a condition characterizing the people given by the input unit as a target value for obtaining an optimal space-time region;
Document polarity determination means for determining whether the content of the text of the analysis target data is written by people who meet the conditions characterizing the people;
Based on the determination result of the document polarity determination means and the position information and time information given to the analysis target data, a transaction having the position information, time information, and determination result as elements is generated and stored in the transaction storage means. Transaction generation means;
Extracting a numerical attribute correlation rule from a set of transactions in the transaction storage means based on a constraint on the spatio-temporal region to determine a spatio-temporal region condition that maximizes the appearance probability of the transaction corresponding to the target value. Numeric attribute association rule extraction means for deriving;
A spatio-temporal search device characterized by comprising:

The input means includes
The means for characterizing the people includes means for characterizing people in terms of whether they have a certain emotion, or vice versa, by specifying an emotion polarity consisting of two emotions having opposite meanings,
The target value deriving means includes
Each of the two emotions given by the input means includes means for obtaining a target value for obtaining an optimal spatiotemporal region,
The document polarity determination means includes
The spatio-temporal search device according to claim 1, further comprising means for determining which of the two emotions, which are polar values of emotion polarity, the content of the text of the analysis target data belongs to.

The input means includes
Means for characterizing people in terms of accepting multiple emotional polarities consisting of two emotions with opposite specified meanings as conditions for characterizing the people,
The target value deriving means includes
For each emotion polarity given by the input means, including means for setting a direct product set of a set based on two opposing emotions constituting the emotion polarity as a target value,
The document polarity determination means includes
The spatio-temporal search device according to claim 1, further comprising means for determining, for each emotional polarity, the content of the text of the analysis target data belongs to two emotions which are polarity values of the emotional polarity.

4. The spatio-temporal search device according to claim 1, further comprising output means for outputting an optimal spatiotemporal region that best characterizes a specific person.

A spatio-temporal search method for obtaining the spatio-temporal region best suited to characterizing specific people, from analysis target data structured as documents in which personal experiences are described in natural language and to which the position information and time information of the experience are attached as numerical data,
An input step for inputting the analysis target data, restrictions on the space-time region to be obtained, conditions for characterizing people, and storing them in storage means;
A target value deriving step in which the conditions characterizing the people given in the input step are set as target values for obtaining an optimal spatiotemporal region;
A document polarity determination step for determining whether the text content of the analysis target data is written by people corresponding to a condition characterizing the people;
Based on the determination result of the document polarity determination step and the position information and time information given to the analysis target data, a transaction having the position information, time information, and determination result as elements is generated and stored in the transaction storage means. A transaction generation step;
Extracting a numerical attribute correlation rule from a set of transactions in the transaction storage means based on a constraint on the spatio-temporal region to determine a spatio-temporal region condition that maximizes the appearance probability of the transaction corresponding to the target value. A numerical attribute association rule extraction step to be derived;
A spatio-temporal search method characterized by:

In the input step,
Characterize people in terms of whether they have a certain emotion, or vice versa, by specifying an emotion polarity consisting of two emotions with opposite meanings as a condition that characterizes the people,
In the target value derivation step,
Each of the two emotions given in the input step is set as a target value for obtaining an optimal spatiotemporal region,
In the document polarity determination step,
The spatio-temporal search method according to claim 5, wherein the text content of the analysis target data is determined to belong to one of two emotions, which is a polarity value of emotion polarity.

In the input step,
As a condition for characterizing the people, it accepts input of a plurality of emotion polarities consisting of two emotions having opposite meanings specified, and characterizes people in terms of whether to hold a plurality of emotions simultaneously,
In the target value derivation step,
For each emotion polarity given in the input step, the objective value is a Cartesian product set of a set based on two opposing emotions constituting the emotion polarity,
In the document polarity determination step,
6. The spatiotemporal search method according to claim 5, wherein, for each emotion polarity, the content of the text of the analysis target data is determined to belong to two emotions which are polarity values of the emotion polarity.

8. The spatio-temporal search method according to claim 5, further comprising an output step of outputting an optimal spatio-temporal region that best characterizes a specific person.

A spatio-temporal search program for causing a computer to function as each means constituting the spatio-temporal search device according to any one of claims 1 to 4.