JP2010113511A

JP2010113511A - Device for extracting and retrieving sensitivity information, method therefor and program

Info

Publication number: JP2010113511A
Application number: JP2008285501A
Authority: JP
Inventors: Hisako Asano; 久子浅野; Izumi Takahashi; いづみ高橋; Nozomi Kobayashi; のぞみ小林; Yoshihiro Matsuo; 義博松尾
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-11-06
Filing date: 2008-11-06
Publication date: 2010-05-20
Anticipated expiration: 2028-11-06
Also published as: JP5137134B2

Abstract

PROBLEM TO BE SOLVED: To achieve Kansei (sensitivity) retrieval in which search results are obtained simply by entering a target, without the need to assign metadata or the like to targets manually.

SOLUTION: A Kansei information extraction unit 10 performs linguistic analysis on text documents. It extracts the Kansei expressions in each document using the resulting text analysis information, a Kansei expression dictionary 20, and Kansei expression extraction rules 30. For each extracted Kansei expression, it generates Kansei information that includes at least information on that expression, and stores it in a Kansei information DB 40. A Kansei search unit 50 searches a Kansei vector dictionary 60 together with the Kansei information DB 40 based on an input search condition, aggregates the search results according to an aggregation condition to create an aggregation result, and uses that aggregation result to output search results in a predetermined format.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a technique for extracting Kansei information from text documents, compiling it into a database, and performing information retrieval using that database.

Most conventional information retrieval devices, when given a keyword, return the documents containing that keyword as the search result. To satisfy vague, sensory requests, such as wanting to find other objects whose image resembles that of a given object, the user therefore had to search using expressions describing the image of that object as keywords, then examine each document in a result set containing many items far from the search intent in order to locate the objects that actually resemble it. Moreover, because it is difficult to enumerate every expression describing the image of the object, it was also difficult to find objects that accurately resemble it.

For example, if a user perceives a certain actor as "quiet and cool", the user had to search for "quiet cool", check each document in the search results, and find another person who resembles that actor.

To meet such needs, a technology called "Kansei search", which can find products from a product image or from vague wording, has been put into practical use (see Non-Patent Document 1). However, for this technology to be able to search, for example, for other products related to the image of a given product, metadata (accompanying information) representing the various characteristics of each product must be entered manually in advance along the coordinate axes of that product's image, which is laborious.
Non-Patent Document 1: "Kansei search from a marketing perspective: Chairman Yamakawa of Albert (People Taking On Recommendation Engines, Part 3)", [online], May 29, 2008, (C) 2008 Nikkei Inc. / Nikkei Digital Media, Inc., [retrieved October 21, 2008], Internet <URL: http://it.nikkei.co.jp/internet/news/index.aspx?n=MMITba000029052008>

When a user wants to find other objects whose image resembles that of a given object, convenience improves if, as in ordinary information retrieval, search results can be obtained simply by entering that object.

In addition, manually assigning metadata or the like to each object in order to enable such retrieval, that is, Kansei-based retrieval, takes considerable effort, so it is desirable to reduce that effort as much as possible.

The present invention has been made in view of the above. Its object is to extract Kansei information from text documents into a database and to perform information retrieval by using this database to compute the Kansei distance between a search target and its comparison targets. Search results can thus be obtained simply by entering a target, and there is no need to assign metadata or the like to targets manually, which saves labor and reduces operating costs accordingly.

The present invention is a Kansei information extraction/retrieval device that extracts Kansei information from text documents into a database and performs information retrieval using the database. The device comprises a Kansei information extraction unit, a Kansei information database, and a Kansei search unit. The Kansei information extraction unit comprises: a text analysis unit that performs text analysis on a text document and outputs text analysis information; a Kansei expression extraction unit that extracts the Kansei expressions in the text document using the text analysis information, a Kansei expression dictionary in which information is registered on words that can be Kansei expressions (expressions representing impressions of, or feelings about, an arbitrary object), and Kansei expression extraction rules in which the patterns of Kansei expressions to be extracted are registered; and a Kansei information generation unit that generates and outputs, for each Kansei expression, Kansei information including at least information on that expression. The Kansei information database stores the Kansei information output by the Kansei information extraction unit. The Kansei search unit comprises: a condition input unit that accepts input search conditions and aggregation conditions; a search/aggregation unit that, based on the search conditions, searches at least the Kansei information database and a Kansei vector dictionary in which a Kansei vector is registered for each Kansei expression, and aggregates the search results according to the aggregation conditions to create an aggregation result; and a result output unit that outputs search results in a predetermined format using the aggregation result.

According to the present invention, Kansei information is extracted from arbitrary text documents into a database, and information retrieval is performed by using this database to obtain the Kansei distance between a search target and each comparison target. This enables Kansei-based retrieval in which search results are obtained simply by entering a target. Moreover, because metadata or the like need not be assigned to targets manually, labor is saved and operating costs are reduced accordingly.

The present invention is described in detail below with reference to the illustrated embodiments.

The Kansei information extraction/retrieval device of the present invention is implemented on a computer that includes input means such as a keyboard, display (output) means such as a monitor, storage means such as a hard disk and memory, and a communication device connectable to an external network (none of which are shown).

<First Embodiment>
FIG. 1 shows a first embodiment of the Kansei information extraction/retrieval device of the present invention. The device comprises a Kansei information extraction unit 10, a Kansei expression dictionary 20, Kansei expression extraction rules 30, a Kansei information database (DB) 40, a Kansei search unit 50, a Kansei vector dictionary 60, and an aggregation result accumulation database (DB) 70. FIG. 2 is a flowchart of the Kansei information extraction in the Kansei information extraction unit 10, FIG. 3 is a flowchart of the Kansei search processing in the Kansei search unit 50, and FIG. 4 is an explanatory diagram showing an example of the processing in the Kansei information extraction unit 10.

As shown in FIG. 1, the Kansei information extraction unit 10 comprises a text analysis unit 11, a Kansei expression extraction unit 12, and a Kansei information generation unit 13. It takes as input a set of documents (text documents), whether entered directly through input means (not shown), read from the storage means, or received from another device via a communication medium, and outputs Kansei information to the Kansei information DB 40.

Each text document has at least a document ID identifying the document and its text (text data). It may additionally carry document meta information such as the creation date and time, the document type (e.g., blog), and the author ID. FIG. 4(a) shows an example of a text document whose document meta information consists of the document type, creation date and time, and author ID.

The text analysis unit 11 performs text analysis, including at least well-known morphological analysis (which generates word information), on the text of each text document, and outputs text analysis information containing at least word information (surface form, part of speech, reading, and so on) (step S1 in FIG. 2).

As an example of text analysis information containing at least word information, FIG. 4(b) shows the text analysis information for the text document of FIG. 4(a) (surface forms, readings, etc. are omitted). FIG. 4(b) also underlines the Kansei expressions extracted by the Kansei expression extraction processing described later; these underlines do not yet exist at this stage.

The Kansei expression extraction unit 12 extracts the Kansei expressions in the text document using the text analysis information output by the text analysis unit 11, together with the Kansei expression dictionary 20 and the Kansei expression extraction rules 30 (step S2 in FIG. 2).

Here, a Kansei expression is an expression that represents an impression of, or a feeling about, an arbitrary object, such as "quiet", "cool" (kakkoii), "dependable", or "cool" (the loanword kūru).

The Kansei expression dictionary 20 registers, for each word that can be a Kansei expression, its surface form, part of speech, and reading together with category information. FIG. 5 shows an example.

The Kansei expression extraction rules 30 register, for each rule describing how Kansei expressions are written, a rule ID, a Kansei expression pattern consisting of regular expressions over the words that make up the expression, and the category of the resulting Kansei expression.

FIG. 6 shows an example of the Kansei expression extraction rules 30. In a Kansei expression pattern, <> is a regular expression matching exactly one word, (?:<>)* matches zero or more words, and (?:<>)? matches zero or one word. Within the brackets, "e:" denotes a condition on the Kansei expression, "p:" a condition on the part of speech, and "h:" a condition on the surface form. A category of "dic-category" means that the category assigned in the Kansei expression dictionary 20 is used unchanged as the category of the Kansei expression.
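The pattern notation above is ordinary regular-expression syntax applied to words instead of characters. One way to run such patterns, sketched below in Python under stated assumptions, is to serialize every analyzed word into a bracketed string and translate each <...> element into a regex that matches exactly one such word. The Token type, the comma-separated condition syntax inside <...>, and the reading of "e:1" as "listed in the Kansei expression dictionary" are all hypothetical simplifications, not the actual format of FIG. 6.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    surface: str  # surface form, tested by "h:"
    pos: str      # part of speech, tested by "p:"
    kansei: bool  # listed in the Kansei expression dictionary? tested by "e:" (assumed 0/1)


def encode(tokens: list[Token]) -> str:
    # One "<h=...\tp=...\te=0|1>" chunk per word, so a word-level pattern can run
    # as a character-level regex (assumes fields contain no '<', '>' or tab).
    return "".join(f"<h={t.surface}\tp={t.pos}\te={int(t.kansei)}>" for t in tokens)


def word_regex(conditions: str) -> str:
    # Regex for exactly one word; fields without a condition match anything.
    fields = {"h": r"[^\t>]*", "p": r"[^\t>]*", "e": "[01]"}
    for cond in filter(None, conditions.split(",")):
        key, value = cond.split(":", 1)
        fields[key] = re.escape(value)
    return "<h=" + fields["h"] + r"\tp=" + fields["p"] + r"\te=" + fields["e"] + ">"


def compile_pattern(pattern: str) -> re.Pattern:
    # Rewrite each "<...>" element; "(?:", ")*" and ")?" pass through to the regex engine.
    return re.compile(re.sub(r"<([^<>]*)>", lambda m: word_regex(m.group(1)), pattern))


def match_spans(pattern: str, tokens: list[Token]) -> list[tuple[int, int]]:
    # (first word index, end word index) for every non-overlapping match.
    text = encode(tokens)
    spans = []
    for m in compile_pattern(pattern).finditer(text):
        start = text.count("<h=", 0, m.start())  # words before the match
        spans.append((start, start + m.group(0).count("<h=")))
    return spans


tokens = [Token("とても", "adverb", False), Token("かっこいい", "adjective", True),
          Token("です", "auxiliary", False)]
print(match_spans("(?:<p:adverb>)?<e:1>", tokens))  # [(0, 2)]
```

Encoding words into a string lets Python's own regex engine handle the *, ? and grouping operators, so only the single-word elements need translating.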

Kansei expressions can be extracted by reusing an existing technique, for example the processing of the evaluation expression extraction unit described in Japanese Patent Application Laid-Open No. 2008-140359, "Evaluation information extraction device, evaluation information extraction method, and program therefor". In that case, the Kansei expression dictionary 20 is used in place of the evaluation expression dictionary, and the Kansei expression extraction rules 30 are used in place of the evaluation expression rules.

The Kansei expressions extracted by the Kansei expression extraction rules 30 may be further narrowed down using information attached during extraction. For example, with the rules of FIG. 6, only expressions matched by rules whose rule ID begins with "Kansei" are extracted as Kansei expressions.
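The narrowing step and the "dic-category" convention can be sketched together as a post-processing pass over raw rule matches. In this hypothetical Python sketch, each match carries the ID of the rule that produced it, the dictionary is assumed to map a standard form to its category, and the rule IDs, category labels, and "unknown" fallback are invented placeholders.

```python
from typing import NamedTuple


class RuleMatch(NamedTuple):
    rule_id: str        # ID of the extraction rule that fired
    standard_form: str  # matched expression, normalized
    category: str       # rule category, or "dic-category" to defer to the dictionary


def finalize(matches: list[RuleMatch], dic_category: dict[str, str],
             prefix: str = "Kansei") -> list[RuleMatch]:
    kept = []
    for m in matches:
        if not m.rule_id.startswith(prefix):
            continue  # not a Kansei rule (e.g. a helper rule): drop the match
        if m.category == "dic-category":
            # Reuse the category registered in the Kansei expression dictionary as-is;
            # "unknown" for unregistered words is an assumption of this sketch.
            m = m._replace(category=dic_category.get(m.standard_form, "unknown"))
        kept.append(m)
    return kept


raw = [RuleMatch("Kansei-01", "かっこいい", "dic-category"),
       RuleMatch("Other-07", "です", "none")]
print(finalize(raw, {"かっこいい": "CAT_A"}))
```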

The underlines in FIG. 4(b) show the result of processing by the Kansei expression extraction unit 12 (the extracted Kansei expressions) for the text document of FIG. 4(a).

The Kansei information generation unit 13 generates and outputs, for each Kansei expression extracted by the Kansei expression extraction unit 12, Kansei information that includes at least Kansei expression information, i.e., information about that expression (step S3 in FIG. 2).

The Kansei information consists of one record per Kansei expression extracted by the Kansei expression extraction unit 12. Each record contains at least a record ID and Kansei expression information comprising the surface form, standard form, and category of the expression. It may also contain document meta information (document information) about the text document from which the expression was extracted, such as the document ID, author ID, and author category. FIG. 4(c-1) shows an example of the Kansei information for the text document of FIG. 4(a).
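As a rough illustration of the record layout just described, the Python sketch below builds one record per extracted expression and copies optional document information onto each. The field names, the "doc-001" document ID, and the category labels are hypothetical; "abc0005" is the author ID from the FIG. 9 example, and "秋っぽい" as the standard form of "秋っぽく" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KanseiRecord:
    record_id: int
    surface: str        # expression as written in the document
    standard_form: str  # normalized form, later usable as the Kansei expression ID
    category: str
    # Optional document information copied from the source text document
    doc_id: Optional[str] = None
    author_id: Optional[str] = None
    author_category: Optional[str] = None


def make_records(expressions: list[tuple[str, str, str]], doc_meta: dict,
                 first_id: int = 1) -> list[KanseiRecord]:
    # One record per extracted Kansei expression: (surface, standard_form, category).
    return [
        KanseiRecord(first_id + i, surface, standard, category,
                     doc_meta.get("doc_id"), doc_meta.get("author_id"),
                     doc_meta.get("author_category"))
        for i, (surface, standard, category) in enumerate(expressions)
    ]


records = make_records([("シック", "シック", "CAT_A"), ("秋っぽく", "秋っぽい", "CAT_B")],
                       {"doc_id": "doc-001", "author_id": "abc0005"})
```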

The Kansei information DB 40 stores the Kansei information output by the Kansei information extraction unit 10. Any well-known database whose records can be searched under various search conditions, for example through SQL, may be used.
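Because any SQL-searchable store will do, a minimal sketch can use SQLite from the Python standard library. The table and column names below are hypothetical; the index on author_id reflects the author-based lookups used in the FIG. 9 example, and the optional LIMIT mirrors the capped search described for author vectors.

```python
import sqlite3
from typing import Optional


def open_kansei_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS kansei_info (
               record_id INTEGER PRIMARY KEY,
               surface TEXT, standard_form TEXT, category TEXT,
               doc_id TEXT, author_id TEXT, author_category TEXT)""")
    # Index the column the author-based searches filter on.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_author ON kansei_info(author_id)")
    return conn


def forms_for_author(conn: sqlite3.Connection, author_id: str,
                     limit: Optional[int] = None) -> list[str]:
    sql = "SELECT standard_form FROM kansei_info WHERE author_id = ? ORDER BY record_id"
    params: list = [author_id]
    if limit is not None:
        sql += " LIMIT ?"  # optional cap on the number of records retrieved
        params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]


conn = open_kansei_db()
conn.executemany("INSERT INTO kansei_info VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "シック", "シック", "CAT_A", "doc-001", "abc0005", "other"),
    (2, "秋っぽく", "秋っぽい", "CAT_B", "doc-001", "abc0005", "other"),
    (3, "クール", "クール", "CAT_A", "doc-002", "star01", "entertainer"),
])
print(forms_for_author(conn, "abc0005"))
```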

As shown in FIG. 1, the Kansei search unit 50 comprises a condition input unit 51, a search/aggregation unit 52, and a result output unit 53. It takes as input search conditions and aggregation conditions, whether entered directly through input means (not shown), read from the storage means, or received from another device via a communication medium, and outputs search results using the Kansei information DB 40 and the Kansei vector dictionary 60, optionally together with the aggregation result accumulation DB 70.

The Kansei vector dictionary 60 registers a Kansei vector for each Kansei expression together with a Kansei expression ID identifying that expression.

The Kansei expression ID identifies the Kansei expression to which a Kansei vector belongs. For example, the standard form of the expression may serve as its ID; alternatively, a numeric Kansei expression ID may be held in a column of the Kansei information in the Kansei information DB 40 and used unchanged as the ID in the Kansei vector dictionary 60. If the Kansei expression category is used as the ID, several Kansei expressions can be represented jointly by a single Kansei vector. Each Kansei vector has a predetermined number of dimensions, and each dimension holds a value on a predetermined Kansei axis.

FIG. 7 shows an example of the Kansei vector dictionary 60. In this example, the standard form of each Kansei expression is used as its Kansei expression ID. The Kansei vectors have seven dimensions: the first axis is beauty (the more beautiful, the larger the value), the second is quietness, …, and the seventh is strength.
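An in-memory mapping keyed by Kansei expression ID is a natural form for a dictionary like FIG. 7. In the Python sketch below, only the first, second, and seventh axis names come from the text; the names of axes three to six and every numeric value are placeholders invented for illustration.

```python
from typing import Optional

# Axes 3-6 are not named in the text; these labels are placeholders.
AXES = ("beauty", "quietness", "axis3", "axis4", "axis5", "axis6", "strength")

# Keyed by Kansei expression standard form, the ID choice used in FIG. 7.
# All numbers are made up for illustration.
KANSEI_VECTORS: dict[str, tuple[float, ...]] = {
    "もの静か": (0.2, 0.9, 0.0, 0.0, 0.0, 0.0, 0.1),  # "quiet": high on quietness
    "クール": (0.5, 0.6, 0.0, 0.0, 0.0, 0.0, 0.4),
}


def vector_for(expression_id: str) -> Optional[tuple[float, ...]]:
    vec = KANSEI_VECTORS.get(expression_id)
    if vec is not None and len(vec) != len(AXES):
        raise ValueError(f"{expression_id}: expected {len(AXES)} dimensions")
    return vec


def axis_value(vec: tuple[float, ...], axis: str) -> float:
    # Look up one dimension by its axis name.
    return vec[AXES.index(axis)]
```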

The aggregation result accumulation DB 70 stores aggregation results computed in advance for specific search conditions and aggregation conditions. FIG. 8 shows an example; its contents are described in detail later.

In addition, a document DB and an author DB may be prepared and used. The document DB holds the text documents input to the Kansei information extraction unit 10 so that they can be retrieved by document ID; the author DB holds the attributes of each text document's author (name, gender, age group, hobbies, occupation, etc.) so that they can be retrieved by author ID.

FIG. 9 shows an example of the processing in the Kansei search unit 50: given one author specified by author ID, a list of entertainers is output as the search result in order of Kansei closeness to that author. The operation of the Kansei search unit 50 is described below using this example.

In this embodiment, Kansei closeness is judged by creating a Kansei vector for each author: the smaller the distance between two vectors (this embodiment uses the well-known cosine distance, but other distance measures may be used), the closer the two authors are treated as being in Kansei.
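The text does not spell out which form of cosine distance is meant. A common convention, assumed in this Python sketch, is one minus the cosine similarity, which compares only the direction of two Kansei vectors.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal, 2 for opposite."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)
```

Because the measure ignores vector length, two authors whose expressions lean toward the same Kansei axes count as close even if one author's averaged values are uniformly smaller.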

The Kansei vector of the author with author ID = x is computed as follows. The input is either all records in the Kansei information DB 40 with author ID = x, or all records returned when the DB 40 is searched for author ID = x up to a prescribed maximum number of records. For each of these records, the Kansei vector corresponding to its Kansei expression is obtained from the Kansei vector dictionary 60, and these vectors are averaged dimension by dimension; the result is the Kansei vector of author ID = x.
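The per-author averaging can be sketched in a few lines of Python. This is an illustration under assumptions: records are simplified to (author ID, Kansei expression standard form) pairs, and expressions missing from the vector dictionary are skipped, which the text does not specify.

```python
from typing import Iterable, Optional, Sequence


def author_vector(records: Iterable[tuple[str, str]],
                  vector_dict: dict[str, Sequence[float]],
                  author_id: str,
                  max_records: Optional[int] = None) -> Optional[list[float]]:
    # records: (author_id, Kansei expression standard form) pairs, e.g. rows of DB 40.
    forms = [form for aid, form in records if aid == author_id]
    if max_records is not None:
        forms = forms[:max_records]  # the "prescribed maximum number of records"
    # Expressions missing from the vector dictionary are skipped (an assumption).
    vectors = [vector_dict[f] for f in forms if f in vector_dict]
    if not vectors:
        return None
    # Per-dimension mean over all retrieved vectors.
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```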

In this embodiment, no author DB is used; instead, the author's occupation is held as the author category in the document information of the Kansei information stored in the Kansei information DB 40. It is also assumed that, as shown in FIG. 8, the aggregation result accumulation DB 70 holds at least the precomputed Kansei vector of each author whose author category is entertainer, together with that author's ID and author category.

The condition input unit 51 accepts the input search conditions and aggregation conditions and passes them to the search/aggregation unit 52 (step S11 in FIG. 3).

Various input formats for the search and aggregation conditions are possible. For example, all search and aggregation conditions may be freely enterable. Alternatively, to provide a Kansei search for a specific application, some of the conditions may be fixed and held in advance, and the user enters only the freely enterable conditions through a graphical user interface or the like.

In the search and aggregation conditions shown in FIG. 9(a), only the author ID of the author to be searched can be freely entered as a search condition. As a further, fixed search condition, the author IDs of authors whose occupation is entertainer (i.e., whose author category is entertainer) are designated as comparison targets. As a fixed aggregation condition, the distance between the Kansei vector of the entered search-target author and that of each comparison-target author is computed, and the comparison targets are ranked in order of increasing distance. In this example, "abc0005" is entered as the search-target author ID.
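The split between fixed and freely entered conditions can be sketched as a small immutable settings object whose fixed fields are never taken from user input. The Conditions class, the form dictionary, and the field names are hypothetical; the text only states which conditions are fixed in the FIG. 9(a) example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conditions:
    target_author_id: str                       # the only free-entry field in FIG. 9(a)
    comparison_category: str = "entertainer"    # fixed search condition
    ranking: str = "ascending_kansei_distance"  # fixed aggregation condition


def conditions_from_form(form: dict[str, str]) -> Conditions:
    # Only the free field is read from user input; the fixed fields keep their
    # defaults even if the form tries to supply them.
    author_id = form.get("author_id", "").strip()
    if not author_id:
        raise ValueError("an author ID is required")
    return Conditions(target_author_id=author_id)


print(conditions_from_form({"author_id": "abc0005"}))
```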

Next, the search/aggregation unit 52 searches the Kansei information DB 40 together with the Kansei vector dictionary 60 and the aggregation result accumulation DB 70 based on the search conditions received from the condition input unit 51, and aggregates the search results according to the aggregation conditions received from the condition input unit 51 to create an aggregation result (step S12 in FIG. 3).

In this embodiment, the Kansei vectors of the comparison-target author IDs are already stored in the aggregation result accumulation DB 70 (FIG. 8) and need not be computed, but the Kansei vector of the search-target author ID = abc0005 (whose author category is not entertainer) must be calculated. If the only document written by author ID = abc0005 is the one shown in FIG. 4(a), searching the Kansei information DB 40 for records with author ID = abc0005 yields the Kansei information shown in FIG. 4(c-1).

The Kansei vector dictionary 60 (FIG. 7) is searched using the Kansei expression standard form of each item of Kansei information in FIG. 4(c-1) as the key, the Kansei vector of each expression is obtained, and their average is taken as the Kansei vector of author ID = abc0005. FIG. 9(b) shows the Kansei vector of author ID = abc0005 computed in this way.

Next, the Kansei distance is computed between the Kansei vector of author ID = abc0005 and the Kansei vector of each comparison-target author ID retrieved from the aggregation result accumulation DB 70 (an upper limit may be placed in advance on the number retrieved), here using the well-known cosine distance. FIG. 9(c) shows the result sorted in order of increasing Kansei distance.
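The ranking step can be sketched as follows, assuming the precomputed comparison vectors have already been read from the aggregation result accumulation DB into a dictionary keyed by author ID. The optional cap on retrieved candidates, the one-minus-cosine-similarity distance, and the tie-break by author ID are assumptions of this Python sketch.

```python
import math
from itertools import islice
from typing import Optional, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    # 1 - cosine similarity; raises ZeroDivisionError for a zero vector.
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def rank_candidates(query: Sequence[float],
                    candidates: dict[str, Sequence[float]],
                    max_candidates: Optional[int] = None) -> list[tuple[str, float]]:
    # candidates: author ID -> precomputed Kansei vector, e.g. read from DB 70.
    pool = islice(candidates.items(), max_candidates)  # None means no upper limit
    scored = [(author_id, cosine_distance(query, vec)) for author_id, vec in pool]
    # Nearest first; ties broken by author ID so the order is deterministic.
    return sorted(scored, key=lambda item: (item[1], item[0]))
```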

Finally, the result output unit 53 uses the aggregation result created by the search/aggregation unit 52 to output the search result to a screen or the like in a predetermined format (step S13 in FIG. 3).

FIG. 9(d) shows an example of output created from the aggregation result of FIG. 9(c). The "Blog URL" and "What I write about" fields (a snippet containing Kansei expressions from the blog at that URL) can be obtained either by adding columns for this information to the Kansei information DB 40 and storing it in advance, or by maintaining a document DB, searching the Kansei information DB 40 by author ID to identify the document IDs, and then searching the document DB. Likewise, the "Entertainer name" can be obtained either by adding a column to the Kansei information DB 40 and storing it in advance, or by maintaining an author DB and searching it by author ID.
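The second option above, keeping a separate document DB and author DB, can be sketched with SQLite joins. All table layouts, the choice of each author's first Kansei expression as the snippet anchor, and the fixed context window are hypothetical simplifications for illustration.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE kansei_info (record_id INTEGER PRIMARY KEY, surface TEXT,
                                  doc_id TEXT, author_id TEXT);
        CREATE TABLE documents (doc_id TEXT PRIMARY KEY, url TEXT, body TEXT);
        CREATE TABLE authors (author_id TEXT PRIMARY KEY, name TEXT);
    """)


def output_rows(conn: sqlite3.Connection, ranking: list[tuple[str, float]],
                context: int = 10) -> list[dict]:
    rows = []
    for rank, (author_id, distance) in enumerate(ranking, start=1):
        author = conn.execute("SELECT name FROM authors WHERE author_id = ?",
                              (author_id,)).fetchone()
        # First Kansei expression of this author, joined to its source document.
        hit = conn.execute(
            """SELECT d.url, d.body, k.surface
                 FROM kansei_info AS k JOIN documents AS d ON d.doc_id = k.doc_id
                WHERE k.author_id = ? ORDER BY k.record_id LIMIT 1""",
            (author_id,)).fetchone()
        url = snippet = None
        if hit is not None:
            url, body, surface = hit
            pos = body.find(surface)
            if pos >= 0:  # cut a window of text around the expression
                snippet = body[max(0, pos - context): pos + len(surface) + context]
        rows.append({"rank": rank, "name": author[0] if author else None,
                     "blog_url": url, "snippet": snippet, "distance": distance})
    return rows
```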

<Second Embodiment>
FIG. 10 shows a second embodiment of the Kansei information extraction/retrieval device of the present invention. Here, a relation extraction unit 14 is added to the Kansei information extraction unit 10 of the first embodiment, so that the Kansei information output includes not only each Kansei expression but also information about its target.

The relation extraction unit 14 extracts from the text document the target of each Kansei expression extracted by the Kansei expression extraction unit 12. Related targets can be extracted using existing techniques such as Hirano et al., "Extraction of semantic relations between named entities in text", Proceedings of the 13th Annual Meeting of the Association for Natural Language Processing, 2007.

Relation extraction requires, as text analysis information, not only word information but also phrase (bunsetsu) and dependency information obtained by well-known dependency parsing, and named entity information obtained by well-known named entity extraction. The text analysis unit 11 therefore also performs dependency parsing and named entity extraction, and outputs the phrase/dependency information and named entity information to the relation extraction unit 14 as part of the text analysis information.

The Kansei information generation unit 13 then generates and outputs, for each Kansei expression extracted by the Kansei expression extraction unit 12, Kansei information that includes both the Kansei expression information (information about the expression) and Kansei target information (information about the target extracted by the relation extraction unit 14).

FIG. 4(c-2) shows an example of Kansei information containing both Kansei expression information and Kansei target information for the text document of FIG. 4(a). It indicates that the target of the Kansei expressions "chic" (shikku) and "autumnal" (akippoku) is "○○".

The Kansei search unit 50 is the same as in the first embodiment, except that the Kansei target information can also be used in the search conditions and aggregation conditions.

According to this embodiment, in addition to the author sensitivity vectors calculated per author ID, target sensitivity vectors can be created per unit of sensitivity target information (for example, per standard form of the sensitivity target) by the same method used to calculate the author sensitivity vectors. Using the distances between these target vectors then makes it possible, for example, to search for similar targets.
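A minimal sketch of target sensitivity vectors and similar-target search, assuming the averaging and cosine distance that the claims describe for the search target: each target's vector is the mean of the dictionary vectors of the expressions attached to it, and cosine distance is taken here as 1 − cosine similarity, so smaller values mean more similar targets. The dictionary entries, the three impression axes, and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical sensitivity vector dictionary: base form -> vector over three
# illustrative impression axes (values are made up for this sketch).
VECTOR_DICT = {
    "シック":   [0.9, 0.1, 0.2],
    "秋っぽい": [0.5, 0.0, 0.9],
    "かわいい": [0.1, 0.9, 0.1],
    "ポップ":   [0.0, 0.8, 0.3],
}


def target_vectors(records):
    """Average the vectors of all sensitivity expressions attached to each
    target, mirroring the per-author averaging."""
    sums, counts = {}, defaultdict(int)
    for target, expr in records:
        vec = VECTOR_DICT.get(expr)
        if vec is None:          # expressions missing from the dictionary are skipped
            continue
        acc = sums.setdefault(target, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[target] += 1
    return {t: [x / counts[t] for x in s] for t, s in sums.items()}


def cosine_distance(u, v):
    """1 - cosine similarity (assumes neither vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def similar_targets(records, query):
    """Return (distance, target) pairs for all other targets, nearest first."""
    vecs = target_vectors(records)
    q = vecs[query]
    return sorted((cosine_distance(q, v), t) for t, v in vecs.items() if t != query)


records = [("○○", "シック"), ("○○", "秋っぽい"),
           ("△△", "シック"), ("△△", "秋っぽい"), ("△△", "かわいい"),
           ("□□", "かわいい"), ("□□", "ポップ")]
for dist, target in similar_targets(records, "○○"):
    print(f"{target}: {dist:.3f}")
```

With these made-up values, △△ shares ○○'s "chic" and "autumn-like" expressions and ranks first, while □□, described only as "cute" and "pop", ranks last.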

FIG. 1 is a block diagram showing a first embodiment of the sensitivity information extraction/retrieval device of the present invention.
FIG. 2 is a flowchart of the sensitivity information extraction process in the sensitivity information extraction unit.
FIG. 3 is a flowchart of the search process in the sensitivity search unit.
FIG. 4 is an explanatory diagram showing an example of processing in the sensitivity information extraction unit.
FIG. 5 is an explanatory diagram showing an example of the sensitivity expression dictionary.
FIG. 6 is an explanatory diagram showing an example of the sensitivity expression extraction rules.
FIG. 7 is an explanatory diagram showing an example of the sensitivity vector dictionary.
FIG. 8 is an explanatory diagram showing an example of the aggregation result accumulation DB.
FIG. 9 is an explanatory diagram showing an example of processing in the sensitivity search unit.
FIG. 10 is a block diagram showing a second embodiment of the sensitivity information extraction/retrieval device of the present invention.

Explanation of symbols

10: sensitivity information extraction unit, 11: text analysis unit, 12: sensitivity expression extraction unit, 13: sensitivity information generation unit, 14: relation extraction unit, 20: sensitivity expression dictionary, 30: sensitivity expression extraction rules, 40: sensitivity information database (DB), 50: sensitivity search unit, 51: condition input unit, 52: search/aggregation unit, 53: result output unit, 60: sensitivity vector dictionary, 70: aggregation result accumulation database (DB).

Claims

A sensitivity information extraction/retrieval device that extracts sensitivity information from a text document into a database and performs information retrieval using the database, the device comprising:
a sensitivity information extraction unit comprising:
a text analysis unit that performs text analysis on a text document and outputs text analysis information;
a sensitivity expression extraction unit that extracts sensitivity expressions in the text document using the text analysis information, a sensitivity expression dictionary in which information on words that can serve as sensitivity expressions (expressions of impressions and feelings about an arbitrary target) is registered, and sensitivity expression extraction rules in which patterns of sensitivity expressions to be extracted are registered; and
a sensitivity information generation unit that generates and outputs, for each sensitivity expression, sensitivity information including at least information about that sensitivity expression;
a sensitivity information database that stores the sensitivity information output from the sensitivity information extraction unit; and
a sensitivity search unit comprising:
a condition input unit that receives an input search condition and aggregation condition;
a search/aggregation unit that, based on the search condition, searches at least a sensitivity vector dictionary, in which a sensitivity vector for each sensitivity expression is registered, together with the sensitivity information database, and aggregates the search results based on the aggregation condition to create an aggregation result; and
a result output unit that outputs a search result in a predetermined format using the aggregation result.

The sensitivity information extraction/retrieval device, wherein the search/aggregation unit searches the sensitivity vector dictionary together with the sensitivity information database based on a search condition specifying a search target, obtains the sensitivity vector of each sensitivity expression related to the search target, averages these vectors to obtain a sensitivity vector corresponding to the search target, and calculates, using the cosine distance, the sensitivity distance between a sensitivity vector corresponding to a comparison target based on the aggregation condition and the sensitivity vector corresponding to the search target.

The sensitivity information extraction/retrieval device according to claim 2, further comprising
an aggregation result accumulation database that stores aggregation results for specific search conditions and aggregation conditions,
wherein the search/aggregation unit reads the sensitivity vector corresponding to the comparison target based on the aggregation condition from the aggregation result accumulation database and uses it.

The sensitivity information extraction/retrieval device according to any one of claims 1 to 3, wherein the sensitivity information extraction unit further comprises
a relation extraction unit that extracts the target of each sensitivity expression from the text document,
and the sensitivity information generation unit generates and outputs, for each sensitivity expression, sensitivity information including at least information about the sensitivity expression and information about its target.

A sensitivity information extraction/retrieval method for extracting sensitivity information from a text document into a database and performing information retrieval using the database, the method comprising:
a step in which a text analysis unit performs text analysis on the text document and outputs text analysis information;
a step in which a sensitivity expression extraction unit extracts sensitivity expressions in the text document using the text analysis information, a sensitivity expression dictionary in which information on words that can serve as sensitivity expressions (expressions of impressions and feelings about an arbitrary target) is registered, and sensitivity expression extraction rules in which patterns of sensitivity expressions to be extracted are registered;
a step in which a sensitivity information generation unit generates, for each sensitivity expression, sensitivity information including at least information about that sensitivity expression, and outputs it to a sensitivity information database;
a step in which a condition input unit receives an input search condition and aggregation condition;
a search/aggregation step in which a search/aggregation unit, based on the search condition, searches at least a sensitivity vector dictionary, in which a sensitivity vector for each sensitivity expression is registered, together with the sensitivity information database, and aggregates the search results based on the aggregation condition to create an aggregation result; and
a step in which a result output unit outputs a search result in a predetermined format using the aggregation result.

The sensitivity information extraction/retrieval method, wherein the search/aggregation step searches the sensitivity vector dictionary together with the sensitivity information database based on a search condition specifying a search target, obtains the sensitivity vector of each sensitivity expression related to the search target, averages these vectors to obtain a sensitivity vector corresponding to the search target, and calculates, using the cosine distance, the sensitivity distance between a sensitivity vector corresponding to a comparison target based on the aggregation condition and the sensitivity vector corresponding to the search target.

The sensitivity information extraction/retrieval method, wherein the search/aggregation step reads the sensitivity vector corresponding to a comparison target based on the aggregation condition from an aggregation result accumulation database, which stores aggregation results for specific search conditions and aggregation conditions, and uses it.

The sensitivity information extraction/retrieval method according to any one of claims 5 to 7, further comprising
a step in which a relation extraction unit extracts the target of each sensitivity expression from the text document,
wherein the sensitivity information generation step generates and outputs, for each sensitivity expression, sensitivity information including at least information about the sensitivity expression and information about its target.

A program for causing a computer to execute each processing step of the sensitivity information extraction / retrieval method according to any one of claims 5 to 8.