JP2012008874A - Image selection device, method and program

Info

Publication number: JP2012008874A
Application number: JP2010145301A
Authority: JP
Inventors: Tomoaki Konno; 智明今野; Emi Meido; 絵美明堂; Ryoichi Kawada; 亮一川田
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2010-06-25
Filing date: 2010-06-25
Publication date: 2012-01-12
Anticipated expiration: 2030-06-25
Also published as: JP5455232B2

Abstract

PROBLEM TO BE SOLVED: To provide an image selection device that automatically attaches an image suited to the impression of a document such as a blog. SOLUTION: The image selection device comprises: an impression information extraction part 2 for extracting, from an input document, impression words and, for each impression word, a document impression relevance ratio showing the strength of the impression of that word in the input document; an impression/image DB 30 for storing images, image impression words, and image impression relevance ratios; and an image retrieval part 3 for retrieving images whose image impression words in the impression/image DB 30 match the extracted document impression words, and for obtaining an image selection score from the document impression relevance ratio and the image impression relevance ratio of each retrieved image.

Description

The present invention relates to an image selection apparatus, method, and program for documents, and more particularly to an image selection apparatus, method, and program that automatically select and attach an image matching the impression and atmosphere of a document.

Blog authors often attach images when writing blog articles. For example, an author adds an image to an article about the events of the day and posts it. In such cases the author usually uses a photograph he or she took. On the other hand, authors sometimes write about their feelings, thoughts, and worries, either together with the events or independently of them. If the atmosphere and impression of an article, which are hard to convey through the text alone, can be expressed with an image, the author's feelings can be conveyed effectively to blog readers.

In such cases, however, the blog author is unlikely to have an image that corresponds to the article, and acquiring and attaching a suitable image takes effort. Moreover, even when an image that represents the atmosphere of the article can be obtained, it is preferable that the color scheme of the image be balanced with that of the blog template. Obtaining an image that both represents the atmosphere of the article and is balanced in color scheme with the blog template is even more difficult for the author. A system that automatically collects and selects images representing an impression suited to a blog would therefore be very useful.

Prior art related to these circumstances includes the techniques disclosed in Patent Documents 1 to 3 and Non-Patent Documents 1 to 4 below.

Patent Document 1 (emotion image creation system, emotion image creation method, and emotion image creation program) uses an emotion dictionary, which associates words with emotion vectors representing the degree of each emotion, to determine the emotion of an input document or word and to obtain an emotion vector for it. Images in an image database prepared in advance are likewise stored in association with emotion vectors. The image whose emotion vector is closest to the one obtained from the document is retrieved from the image database and used as the image representing the emotion of the document. When the document contains words expressing several emotions, the size of each image is varied according to the closeness of the emotion vectors, which is said to allow several emotions to be represented with images.

Patent Document 2 (data processing apparatus, data processing method, and program) discloses a technique for bringing the impression of an image published together with a blog document and its comments closer to the impression given by those documents. The impression of a document is judged, and the document is classified, according to whether the blog document contains keywords expressing emotion such as "happy" or "sad". A keyword group prepared in advance is used. The image input together with the document is then modified according to the emotion information obtained from the document, which is said to bring the impressions of the document and the image closer together. For example, if the document gives a "happy" impression, the corners of the mouth in an image of a person's face are raised so that the image also gives a "happy" impression.

Patent Document 3 (apparel purchase support system, server, apparel purchase support method, and program) provides a method of narrowing down apparel items using impression information and color scheme information, with the aim of helping the user coordinate apparel. To narrow down items by impression information, the user enters a "quality" such as "luxurious" or "formal" for each body part, such as the upper body. The system then presents items that match the user's request from the viewpoint of impression. To narrow down items by color scheme information, the user enters coordination rules: a "base color", which is the base color of the outfit, together with the "corresponding part" to which it applies, and a "color effect" such as "similar effect" or "opposite hue" together with the "corresponding part" to which it applies. The system then presents items considered to match the user's request from the viewpoint of color. By using either impression information or color scheme information, the system narrows down candidate items in consideration of the user's preferences.

Non-Patent Document 1 (generating scenes by extracting event sentences from blog articles) presents a technique that extracts event sentences describing occurrences from a blog article and represents each of three aspects of an event sentence, namely place, object, and action, with an image. To decide whether a word denotes a place, an object, or an action, the technique uses the number of hits returned by a text search engine for queries that combine the word with specific phrases (a context-aware search).

Non-Patent Document 2 (video search method based on social annotation) proposes video search that uses viewers' comments in addition to the titles, summaries, and tags attached to videos on a video sharing site. Nico Nico Douga (registered trademark) is used as the example; on such sites a comment can be attached to a specific playback time of a video. The document proposes scoring by the number of times a search query appears in the comments and, for multiple queries, scoring based on the interval between the times at which they appear in the comments. It further proposes searching with keywords such as nouns and then sorting the results by affective (kansei) words.

Non-Patent Document 3 (WordNet: A Lexical Database for English) discloses a thesaurus in which the relationships between words of parts of speech such as nouns, verbs, and adjectives have been classified by hand. As of 2010 this thesaurus database (DB) is publicly available and widely used, notably in the field of language processing. The DB describes relations such as hypernyms, hyponyms, and antonyms for a given word (more precisely, for a set of synonyms called a synset).

Non-Patent Document 4 (word impression estimation method using adjective co-occurrence) proposes a method of estimating the impression of a word using a Web search engine. The impressions of adjectives, adjectival nouns, and nouns are estimated from co-occurrence relations in Web page text. To measure the similarity between adjectives, the method focuses on coordinated co-occurrence and uses the number of hits obtained when two adjectives are combined in a Web search query. For example, given the adjectives "bright" and "beautiful", the number of hits for the query "bright and beautiful" is taken as their similarity. Adjectival nouns are handled in the same way. For nouns, on the other hand, coordinated co-occurrence between nouns is considered of little use for learning impressions, so the similarity of a noun to each impression is obtained from its co-occurrence with adjectives or adjectival nouns. For example, the adjective "bright" and the noun "flower" co-occur often, so their similarity is high, whereas "dark" and "flower" co-occur rarely, so their similarity is low. It follows that the noun "flower" is more strongly associated with the adjective "bright", and therefore gives a "bright" impression rather than a "dark" one. The impression of other nouns is said to be estimable in the same way.
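
The hit-count idea attributed to Non-Patent Document 4 can be illustrated with a minimal sketch. It assumes a tiny in-memory corpus in place of a real Web search engine; `CORPUS`, `hit_count`, and `noun_impression` are hypothetical names introduced only for this illustration and do not come from the cited paper.

```python
# Toy stand-in for a Web search engine (assumption: a real system would
# query an external engine and read the reported number of hits).
CORPUS = [
    "a bright and beautiful flower",
    "bright flower in the garden",
    "a dark night",
    "bright beautiful morning",
]

def hit_count(*terms: str) -> int:
    """Number of corpus documents that contain every query term."""
    return sum(all(t in doc.split() for t in terms) for doc in CORPUS)

def noun_impression(noun: str, impressions: list[str]) -> str:
    """Return the impression word that co-occurs most often with the noun."""
    return max(impressions, key=lambda adj: hit_count(adj, noun))

# "flower" co-occurs with "bright" more often than with "dark".
print(noun_impression("flower", ["bright", "dark"]))
```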

Patent Document 1: JP 2005-208892 A
Patent Document 2: JP 2008-276409 A
Patent Document 3: JP 2009-37288 A

Non-Patent Document 1: Keita Sato, Yoko Nishihara, Wataru Sunayama: "Generating scenes by extracting event sentences from blog articles", Annual Conference of the Japanese Society for Artificial Intelligence 2007, 2F4-5 (2007)
Non-Patent Document 2: Satoshi Nakamura, Katsumi Tanaka: "Video Search Method Based on Social Annotation", DEIM2009, D6-1 (2009)
Non-Patent Document 3: George A. Miller: "WordNet: A Lexical Database for English", Comm. of the ACM, 38-11, pp.39-41 (1995)
Non-Patent Document 4: Kohei Shimizu, Masafumi Hagiwara: "Word Impression Estimation Method Using Adjective Co-occurrence", IEICE Transactions (Japanese edition), Vol.J89-D, No.11, pp.2483-2490 (2006)

However, the prior art described above cannot realize a system that automatically collects and selects images representing an impression suited to a document such as a blog. Patent Document 1 determines the emotion in a document and measures its relevance to the emotion information attached to images in an image database, thereby selecting an image suited to the impression of the document. It does not disclose, however, how the emotion information attached to the images is obtained.

Patent Document 2 extracts words expressing emotion from a blog document and modifies the image input together with the document according to that emotion information, thereby generating an image suited to the document. Because emotion is expressed by changing properties of the original image such as its contrast and size, the user must supply the original image.

Patent Document 3 uses impression information and color scheme information for coordinating clothes. It narrows down the apparel items to be presented by impression and color, but its purpose is apparel coordination, which differs greatly from attaching images to blogs.

Non-Patent Document 1 attaches images corresponding to the events written in a blog document. This prior art focuses on depicting plainly what actually happened. Because its aim is to attach images corresponding to places, objects, and actions, it does not attach images that reflect impressions, such as the user's feelings at the time or the atmosphere of the document as a whole.

Non-Patent Document 2 proposes a video search method that uses comments on videos at a video sharing site, but its accuracy cannot be called sufficient. It is also aimed at video search and does not search directly with affective words of the kind that appear in blogs, so its purpose differs from attaching images suited to the impression of a blog.

Non-Patent Document 3 presents a thesaurus that expresses the relationships between words. Because the thesaurus was built by hand it is highly reliable, but building relationships for new words is costly. Moreover, because it was created to define the relationships between words of a language in general use, it cannot be called useful for purposes such as attaching images.

Non-Patent Document 4 uses Web search engine hit counts to estimate automatically the impression not only of words that express impressions directly, such as adjectives and adjectival nouns, but also of nouns. Using an existing Web search engine makes it easy to estimate the impression of a word. As with Non-Patent Document 3, however, the method examines general relationships between words, so its accuracy is considered insufficient for the purpose of attaching images.

The present invention has been made in view of the above circumstances, and its object is to provide an image selection apparatus, method, and program that accurately select, for a document, an image suited to the impression of that document.

To achieve the above object, the present invention is, as a first feature, an image selection apparatus that selects and outputs an image suited to a document, comprising: an impression information extraction unit that extracts words contained in an input document, determines document impression words based on the extracted words and given impression words, and obtains, for each document impression word, a document impression relevance representing the strength of the impression of that word in the input document; an impression/image database that stores search target images, image impression words that include the given impression words, and, for each image impression word, an image impression relevance with respect to the search target image, in association with one another; and an image search unit that receives the document impression words and the document impression relevances, searches the search target images in the impression/image database for images whose associated image impression words include a word matching a document impression word, and selects, from the retrieved images, an image suited to the input document based on the document impression relevance and the image impression relevance corresponding to each matched word.
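
The matching and selection step of the first feature can be sketched as follows. This is only an illustration under stated assumptions: the text says the selection is "based on" the two relevances without fixing a formula here, so the sum of products over matched words used below is an assumed scoring rule, and `selection_score`, `select_image`, and the dictionary layout are hypothetical.

```python
from typing import Optional

def selection_score(doc_imp: dict[str, float], img_imp: dict[str, float]) -> float:
    """Assumed score: sum over matched impression words of
    document impression relevance x image impression relevance."""
    matched = doc_imp.keys() & img_imp.keys()
    return sum(doc_imp[w] * img_imp[w] for w in matched)

def select_image(doc_imp: dict[str, float],
                 image_db: dict[str, dict[str, float]]) -> Optional[str]:
    """Search images sharing at least one impression word with the document,
    then return the one with the highest assumed score (None if no match)."""
    candidates = {img: imp for img, imp in image_db.items()
                  if doc_imp.keys() & imp.keys()}
    if not candidates:
        return None
    return max(candidates, key=lambda img: selection_score(doc_imp, candidates[img]))

# Hypothetical data: impression word -> relevance.
document = {"happy": 0.8, "calm": 0.2}
impression_image_db = {
    "sunrise.jpg": {"calm": 0.9, "happy": 0.1},
    "party.jpg": {"happy": 0.7},
    "storm.jpg": {"scary": 1.0},
}
print(select_image(document, impression_image_db))
```

Any other monotone combination of the two relevances (for example a minimum or a weighted sum) would fit the wording equally well; the product is chosen here only because it rewards words that are strong on both sides.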

As a second feature, the present invention comprises a comment/image database that stores images in association with comments on those images, and a database impression information extraction unit that builds the impression/image database from the comment/image database. The database impression information extraction unit acquires each image stored in the comment/image database as a search target image, determines the image impression words based on the comments associated with the image and the given impression words, and obtains the image impression relevances, thereby building the impression/image database.

As a third feature, the images and comments stored in the comment/image database include images posted by a plurality of users of an image posting system over a network, and comments attached to those images by users who viewed them.

As a fourth feature, the present invention comprises a first text database that stores predetermined documents and responds to searches over them. The impression information extraction unit sets each of the given impression words as a document impression word, searches the first text database using each pair of a document impression word and a word extracted from the input document as the search key to obtain a hit count, and obtains the document impression relevance of each document impression word based on the sum of the hit counts obtained for the words paired with it. The database impression information extraction unit sets each of the given impression words as an image impression word, extracts the words contained in the comments associated with an image, searches the first text database using each pair of an image impression word and a word extracted from the comments as the search key to obtain a hit count, and obtains the image impression relevance of each image impression word based on the sum of the hit counts obtained for the words paired with it.
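
A minimal sketch of the hit-count-based relevance of the fourth feature is given below. It assumes a small in-memory list in place of the first text database and returns the raw sums without normalization; `TEXT_DB`, `hits`, and `impression_relevance` are hypothetical names. The same function would serve for document words or for comment words.

```python
# Toy stand-in for the first text database (assumption).
TEXT_DB = [
    "happy birthday cake",
    "happy sunny picnic",
    "sad rainy day",
    "sunny happy beach",
]

def hits(impression: str, word: str) -> int:
    """Hit count for the pair (impression word, extracted word):
    the number of stored documents that contain both."""
    return sum(impression in d.split() and word in d.split() for d in TEXT_DB)

def impression_relevance(words: list[str], impression_words: list[str]) -> dict[str, int]:
    """For each given impression word, sum the pair hit counts over all
    words extracted from the input document (or from the comments)."""
    return {imp: sum(hits(imp, w) for w in words) for imp in impression_words}

print(impression_relevance(["sunny", "cake", "rainy"], ["happy", "sad"]))
```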

As a fifth feature, the impression information extraction unit sets, as document impression words, those words extracted from the input document that match the given impression words, and obtains the document impression relevance based on the frequency in the input document of the word corresponding to each document impression word. The database impression information extraction unit extracts words from the comments, sets those extracted words that match the given impression words as the image impression words for the search target image corresponding to the comments, and obtains the image impression relevance based on the frequency in the comments of the word corresponding to each image impression word.
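
The frequency-based variant of the fifth feature can be sketched as follows. Whitespace tokenization and the use of the raw count as the relevance are assumptions made for brevity (Japanese text would need a morphological analyzer), and `frequency_relevance` is a hypothetical name.

```python
from collections import Counter

def frequency_relevance(text: str, impression_words: set[str]) -> dict[str, int]:
    """Keep the words of the text that match a given impression word and
    use their frequency in the text as the relevance."""
    counts = Counter(text.lower().split())
    return {w: counts[w] for w in impression_words if counts[w] > 0}

given = {"happy", "tired", "sad"}
print(frequency_relevance("So happy today really happy but a little tired", given))
```

The same function applies to an input document (document impression relevance) and to the concatenated comments on an image (image impression relevance).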

As a sixth feature, the present invention comprises an impression word conversion table that groups the given impression words and associates them with impression groups, and an impression word conversion unit that converts impression words into impression categories by referring to the impression word conversion table. The impression information extraction unit passes the determined document impression words to the impression word conversion unit to have them converted into document impression categories, and updates the document impression relevance so that it is based on the sum of the document impression relevances of the document impression words belonging to each document impression category. The database impression information extraction unit passes the determined image impression words to the impression word conversion unit to have them converted into image impression categories, updates the image impression relevance so that it is based on the sum of the image impression relevances of the image impression words belonging to each image impression category, and rebuilds the impression/image database so that it stores the search target images, the converted image impression categories, and the updated image impression relevances in association with one another. The image search unit receives the document impression categories and the updated document impression relevances, searches the search target images in the rebuilt impression/image database for images whose associated image impression categories include a category matching a document impression category, and selects, from the retrieved images, an image suited to the input document based on the document impression relevance and the image impression relevance corresponding to each matched category.
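
The category conversion of the sixth feature amounts to re-keying a relevance map by category and summing within each category. A minimal sketch, assuming the impression word conversion table is a plain word-to-category dictionary and that words missing from the table are skipped (`to_categories` and the sample table are hypothetical):

```python
def to_categories(relevance: dict[str, float], table: dict[str, str]) -> dict[str, float]:
    """Convert impression words to impression categories and sum the
    relevances of the words that fall into the same category."""
    result: dict[str, float] = {}
    for word, score in relevance.items():
        category = table.get(word)
        if category is None:  # assumption: words without a category are ignored
            continue
        result[category] = result.get(category, 0.0) + score
    return result

conversion_table = {"happy": "positive", "joyful": "positive", "sad": "negative"}
print(to_categories({"happy": 2.0, "joyful": 1.0, "sad": 0.5}, conversion_table))
```

Applied to both the document side and the image side, this lets "happy" in a document match an image annotated only with "joyful", which exact word matching would miss.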

As a seventh feature, the present invention comprises a categorization unit that reads the impression/image database before it is rebuilt and builds the impression word conversion table. The categorization unit obtains the similarity between impression words based on the co-occurrence of the image impression words associated with each search target image stored in the impression/image database, and builds the impression word conversion table by grouping the impression words according to that similarity.

As an eighth feature, the categorization unit obtains the similarity using the image impression relevances associated with the image impression words in the impression/image database.
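
One way to picture the seventh and eighth features is to describe each impression word by its image impression relevances across all images and to group words whose vectors are similar. The sketch below is an assumption-laden illustration: cosine similarity, the fixed threshold, and the simple label-merging scheme are choices made here, not taken from the text, and `word_vectors`, `cosine`, and `group_words` are hypothetical names.

```python
import math
from itertools import combinations

def word_vectors(image_db: dict[str, dict[str, float]]) -> dict[str, list[float]]:
    """One vector per impression word: its relevance for each image (0 if absent)."""
    images = sorted(image_db)
    words = sorted({w for imp in image_db.values() for w in imp})
    return {w: [image_db[i].get(w, 0.0) for i in images] for w in words}

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def group_words(image_db: dict[str, dict[str, float]], threshold: float = 0.5) -> dict[str, str]:
    """Return a word -> group label table; words whose similarity reaches
    the threshold end up with the same label."""
    vecs = word_vectors(image_db)
    group = {w: w for w in vecs}  # every word starts in its own group
    for a, b in combinations(sorted(vecs), 2):
        if cosine(vecs[a], vecs[b]) >= threshold:
            old, new = group[b], group[a]
            for w in group:  # merge b's group into a's group
                if group[w] == old:
                    group[w] = new
    return group

db = {
    "img1": {"happy": 0.9, "joyful": 0.8},
    "img2": {"happy": 0.7, "joyful": 0.6},
    "img3": {"sad": 1.0},
}
print(group_words(db))
```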

As a ninth feature, the present invention comprises a keyword extraction unit that extracts keywords contained in the input document, and the impression/image database stores each search target image with tags describing the content of the image. The image search unit receives the keywords and, when searching the search target images in the impression/image database, restricts the search to those search target images whose tags include a keyword.
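
The tag-based restriction of the ninth feature can be sketched as a simple pre-filter applied before the impression-based search. The data layout (a set of tags per image) and the name `filter_by_keywords` are assumptions for illustration.

```python
def filter_by_keywords(keywords: set[str], image_tags: dict[str, set[str]]) -> list[str]:
    """Keep only the images whose tags include at least one document keyword."""
    return [img for img, tags in image_tags.items() if keywords & tags]

tags = {
    "a.jpg": {"beach", "sunset"},
    "b.jpg": {"city"},
    "c.jpg": {"dog"},
}
print(filter_by_keywords({"beach", "dog"}, tags))
```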

As a tenth feature, the present invention comprises a keyword extraction unit that extracts, from the input document, keywords contained in it and the weight of each keyword in the input document, and the impression/image database stores each search target image with tags describing the content of the image. The image search unit receives the keywords and the weights, searches the search target images in the impression/image database for images whose associated image impression words include a word matching a document impression word or whose tags include a tag matching a keyword, and selects, from the retrieved images, an image suited to the input document based on at least one of (a) the document impression relevance and the image impression relevance corresponding to each matched word and (b) the keyword weight corresponding to each matched tag.

As an eleventh feature, the present invention comprises a frequent co-occurring word list that enumerates a predetermined number of the most frequently co-occurring words for each of the given impression words, and a co-occurrence information extraction unit that receives the document impression words and extracts the frequent co-occurring words for each of them by referring to the list. The impression/image database stores each search target image with tags describing the content of the image. The image search unit receives the frequent co-occurring words for the document impression words, searches the search target images in the impression/image database for images whose associated image impression words include a word matching a document impression word, and preferentially selects, as images suited to the input document, images whose tags do not match the frequent co-occurring words corresponding to the matched impression word.

As a twelfth feature, the present invention comprises a second text database that stores predetermined documents and responds to searches over them, and a database text search unit that builds the frequent co-occurring word list from the second text database. The database text search unit has the second text database search, with each given impression word as the search key, for documents containing a word that matches the key; extracts a predetermined number of the top words co-occurring with the given impression word from a predetermined number of the documents so obtained; for each co-occurring word, has the second text database search again with the co-occurring word and the given impression word as the search key to obtain the number of matching documents; and ranks the co-occurring words by that number of documents, thereby obtaining the predetermined number of top frequent co-occurring words for the given impression word and building the frequent co-occurring word list.
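
The two-pass procedure of the twelfth feature (collect candidate words from the first search results, then rank them by a second, paired search) can be sketched as below. The in-memory `TEXT_DB`, the whitespace tokenization, and the parameter names `n_docs`, `n_candidates`, and `top_k` are assumptions for illustration only.

```python
from collections import Counter

# Toy stand-in for the second text database (assumption).
TEXT_DB = [
    "happy sunny beach",
    "happy sunny picnic",
    "happy birthday",
    "sad rainy day",
    "sunny beach",
]

def search(*terms: str) -> list[str]:
    """Documents that contain every search term."""
    return [d for d in TEXT_DB if all(t in d.split() for t in terms)]

def frequent_cooccurring(impression: str, n_docs: int = 100,
                         n_candidates: int = 10, top_k: int = 3) -> list[str]:
    # Pass 1: take up to n_docs documents that contain the impression word
    # and count the other words appearing in them.
    docs = search(impression)[:n_docs]
    counts = Counter(w for d in docs for w in d.split() if w != impression)
    candidates = [w for w, _ in counts.most_common(n_candidates)]
    # Pass 2: re-search with (candidate, impression word) and rank the
    # candidates by the number of matching documents.
    ranked = sorted(candidates, key=lambda w: len(search(w, impression)), reverse=True)
    return ranked[:top_k]

print(frequent_cooccurring("happy", top_k=1))
```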

As a thirteenth feature, the present invention comprises a color scheme information extraction unit that extracts template color scheme information from the template used together with the input document. The image search unit receives the template color scheme information, searches the search target images in the impression/image database for images whose associated image impression words include a word matching a document impression word or whose color scheme is similar to the template color scheme information, and selects an image suited to the input document based on at least one of (a) the document impression relevance and the image impression relevance corresponding to each matched word and (b) the color scheme similarity of the retrieved images with a similar color scheme.
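
The text does not specify here how color scheme similarity is measured. As one possible illustration, the sketch below assumes that the template color scheme information and each image are summarized as normalized color histograms over the same bins and compares them by histogram intersection; `histogram_intersection` and `most_similar_image` are hypothetical names.

```python
def histogram_intersection(h1: list[float], h2: list[float]) -> float:
    """Similarity of two normalized histograms over the same bins (1.0 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def most_similar_image(template_hist: list[float],
                       image_hists: dict[str, list[float]]) -> str:
    """Image whose color histogram is closest to the template's."""
    return max(image_hists,
               key=lambda img: histogram_intersection(template_hist, image_hists[img]))

template = [0.5, 0.5, 0.0]  # hypothetical 3-bin color histogram of a blog template
images = {"a.jpg": [0.6, 0.4, 0.0], "b.jpg": [0.0, 0.1, 0.9]}
print(most_similar_image(template, images))
```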

また本発明は、上記の目的を達成するために、文書に対して適した画像を選定し出力する画像選定方法であって、入力文書から、該入力文書に含まれる単語を抽出し当該抽出された単語と、所与の印象語と、に基づいて文書印象語を定め且つ該文書印象語の各々に対して該文書印象語の前記入力文書における印象の強さを表す文書印象関連度を求める印象情報抽出手段と、検索対象画像と、前記所与の印象語を含む画像印象語と、該画像印象語の各々に対して前記検索対象画像に対する画像印象関連度と、を対応づけて保存する印象・画像保存手段と、前記文書印象語と前記文書関連度とを受け取り、前記印象・画像データベース内の検索対象画像から対応する画像印象語に前記文書印象語と一致する語が存在する画像を検索し、当該検索された各画像の中から、当該検索一致した各語に対応する前記文書印象関連度と、当該検索一致した各語に対応する前記画像印象関連度とに基づいて前記入力文書に対して適した画像を選定する画像検索手段を備えることを第１４の特徴とする。 In order to achieve the above object, the present invention is an image selection method for selecting and outputting an image suitable for a document, wherein a word contained in the input document is extracted and extracted from the input document. A document impression word is determined based on a given word and a given impression word, and a document impression relevance level representing an impression strength of the document impression word in the input document is obtained for each of the document impression words. Impression information extraction means, a search target image, an image impression word including the given impression word, and an image impression relation level with respect to the search target image are stored in association with each of the image impression words. An impression / image storage means, the document impression word, and the document relevance level are received, and an image in which a word that matches the document impression word exists in a corresponding image impression word from a search target image in the impression / image database. Search and search From among the images, an image suitable for the input document based on the document impression relevance level corresponding to the search matched words and the image impression relevance level corresponding to the search matched words. It is a fourteenth feature that the image search means for selecting is provided.

また本発明は、上記の目的を達成するために、文書に対して適した画像を選定し出力する画像選定プログラムであって、入力文書から、該入力文書に含まれる単語を抽出し当該抽出された単語と、所与の印象語と、に基づいて文書印象語を定め且つ該文書印象語の各々に対して該文書印象語の前記入力文書における印象の強さを表す文書印象関連度を求める印象情報抽出手段と、検索対象画像と、前記所与の印象語を含む画像印象語と、該画像印象語の各々に対して前記検索対象画像に対する画像印象関連度と、を対応づけて保存する印象・画像保存手段と、前記文書印象語と前記文書関連度とを受け取り、前記印象・画像データベース内の検索対象画像から対応する画像印象語に前記文書印象語と一致する語が存在する画像を検索し、当該検索された各画像の中から、当該検索一致した各語に対応する前記文書印象関連度と、当該検索一致した各語に対応する前記画像印象関連度とに基づいて前記入力文書に対して適した画像を選定する画像検索手段を備えることを特徴とする画像選定プログラムとして、コンピュータを機能させることを第１５の特徴とする。 In order to achieve the above object, the present invention is also an image selection program for selecting and outputting an image suitable for a document. A fifteenth feature is that the program causes a computer to function as: impression information extraction means that extracts words contained in an input document, determines document impression words based on the extracted words and given impression words, and obtains, for each document impression word, a document impression relevance representing the strength of the impression of that word in the input document; impression/image storage means that stores search target images in association with image impression words, which include the given impression words, and with an image impression relevance of each image impression word to the search target image; and image search means that receives the document impression words and the document impression relevance, searches the search target images in the impression/image database for images whose corresponding image impression words include a word matching a document impression word, and selects, from the retrieved images, an image suitable for the input document based on the document impression relevance and the image impression relevance corresponding to each matched word.

前記第１〜第５の特徴によれば、文書の印象に適した画像を簡単に付与する画像選定装置を提供することができる。前記第６〜第８の特徴によれば、印象・画像データベース中に出現頻度が低い印象語が保存されている場合であっても、印象カテゴリへ変換することによって印象カテゴリとしての出現頻度が高くなり、当該印象カテゴリを用いて画像検索することによって文書の印象に適した画像の検索精度が改善される。 According to the first to fifth features, it is possible to provide an image selection device that easily gives an image suitable for the impression of a document. According to the sixth to eighth features, even if an impression word having a low appearance frequency is stored in the impression / image database, the appearance frequency as the impression category is increased by converting the impression word into the impression category. Thus, the image search accuracy using the impression category is improved by searching for an image suitable for the impression of the document.

前記第９、第１０の特徴によれば、文書の印象のみでなく内容も適度に考慮した画像を付与できる。前記第１１、第１２の特徴によれば、文書の印象を表す画像としてより抽象的な画像を選別して、文書の内容に関わらず、より汎用的に利用できる画像を付与できる。前記第１３の特徴によれば、文書と共に利用されるテンプレートの配色とマッチするような色合いの画像が選別され、テンプレートとバランスの取れた画像を付与できる。 According to the ninth and tenth features, it is possible to provide an image that appropriately considers not only the impression of the document but also the content. According to the eleventh and twelfth features, a more abstract image can be selected as an image representing the impression of a document, and an image that can be used more generally regardless of the content of the document. According to the thirteenth feature, an image having a hue that matches the color scheme of the template used together with the document is selected, and an image balanced with the template can be given.

前記第１４の特徴によれば、文書の印象に適した画像を簡単に付与する画像選定方法を提供することができる。前記第１５の特徴によれば、文書の印象に適した画像を簡単に付与する画像選定プログラムを提供することができる。 According to the fourteenth feature, it is possible to provide an image selection method for easily giving an image suitable for the impression of a document. According to the fifteenth feature, it is possible to provide an image selection program that easily gives an image suitable for the impression of a document.

本発明による画像付与の流れの概略を示すブロック図である。It is a block diagram which shows the outline of the flow of the image provision by this invention.
本発明の画像選定装置の構成を示すブロック図である。It is a block diagram which shows the structure of the image selection apparatus of this invention.
本発明の画像選定装置のうち、印象・画像ＤＢを構築するための構成を示すブロック図である。It is a block diagram which shows the structure for constructing impression and image DB among the image selection apparatuses of this invention.
文書印象情報及び画像印象情報を説明する図である。It is a figure explaining document impression information and image impression information.
印象語変換テーブルの例を示す図である。It is a figure which shows the example of an impression word conversion table.
ブログテンプレートから配色を抽出する箇所の例を示す図である。It is a figure which shows the example of the location which extracts a color scheme from a blog template.
印象・画像ＤＢに格納されるデータの例を示す図である。It is a figure which shows the example of the data stored in impression and image DB.
コメント・画像ＤＢに格納されるデータの例を示す図である。It is a figure which shows the example of the data stored in comment and image DB.
本発明において好ましい印象語変換テーブルを作成する過程を模式的に示す図である。It is a figure which shows typically the process which produces a preferable impression word conversion table in this invention.
キーワード情報を説明する図である。It is a figure explaining keyword information.
共起情報抽出部を用いて抽象画像を検索する実施形態を模式的に示すブロック図である。It is a block diagram which shows typically embodiment which searches an abstract image using a co-occurrence information extraction part.
本発明の画像選定装置のうち頻出共起単語リストを構築するための構成を示すブロック図である。It is a block diagram which shows the structure for constructing the frequent occurrence word list among the image selection apparatuses of this invention.
頻出共起単語リストを構築する過程を模式的に示すブロック図である。It is a block diagram which shows typically the process of building a frequent occurrence co-occurrence word list.
本発明の画像選定プログラムを実行するコンピュータシステムのブロック図である。It is a block diagram of a computer system that executes an image selection program of the present invention.

以下に、図面を参照して本発明の実施形態を詳細に説明する。図１に本発明による処理の流れの概略を示す。まず、ブログ投稿者から入力されたブログ記事（同図（ａ）「動物園に行って楽しかった。…」）から、1つ以上の文書印象語（同図（ｂ）"楽しい"、"つまらない"等）と記事の各印象の強さを表す文書印象関連度（"1.0"、"0.1"等）を含む印象情報（文書印象情報）を抽出する。この文書印象情報をキーとして、予め用意しておいた、同図（ｃ）に示すような印象・画像DBに対して検索を行う。すなわち、文書印象情報中の文書印象語とマッチする印象語を、印象・画像DB中の印象情報（画像印象情報）に含まれる画像印象語の中から検索し、マッチしたものの画像を取得するといったことを行う。同図（ｄ）に示すようにこの取得した画像の中から、画像を選択する基準となる画像選択スコアを求めて、上位何枚かの画像を最終的な出力とする。この候補画像を見て、ユーザは最終的に気に入った画像を選択して、同図（ｅ）に示すようにブログに画像を付与する。 Embodiments of the present invention will be described below in detail with reference to the drawings. FIG. 1 shows an outline of the processing flow according to the present invention. First, from the blog article entered by the blog poster (FIG. 1(a), "I went to the zoo and had fun. ..."), impression information (document impression information) is extracted, consisting of one or more document impression words (FIG. 1(b), "fun", "boring", etc.) and a document impression relevance ("1.0", "0.1", etc.) representing the strength of each impression in the article. Using this document impression information as a key, a search is performed on an impression/image DB prepared in advance, as shown in FIG. 1(c). That is, impression words matching the document impression words in the document impression information are searched for among the image impression words contained in the impression information (image impression information) in the impression/image DB, and the images of the matches are acquired. As shown in FIG. 1(d), an image selection score serving as the criterion for selecting images is computed for the acquired images, and the top several images are taken as the final output. Viewing these candidate images, the user finally selects a preferred image and attaches it to the blog as shown in FIG. 1(e).
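The flow of FIG. 1(b) to FIG. 1(d) can be sketched roughly as follows. This is only an illustration under stated assumptions: the passage above does not give the score formula, so the sum of products of the document and image impression relevance values used here is an assumption, as are the function name `select_images`, the dictionary layout of the impression/image DB and the image identifiers.

```python
def select_images(doc_impressions, image_db, top_k=3):
    """Rank images whose impression words match those of the document.

    doc_impressions: {impression word: document impression relevance}
    image_db: {image id: {impression word: image impression relevance}}
    The sum-of-products score is an assumption made for this sketch.
    """
    scored = []
    for image_id, img_impressions in image_db.items():
        matched = set(doc_impressions) & set(img_impressions)
        if not matched:
            continue  # no shared impression word: the image is not retrieved
        score = sum(doc_impressions[w] * img_impressions[w] for w in matched)
        scored.append((image_id, score))
    # Highest score first; ties broken by image id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]

doc = {"fun": 1.0, "boring": 0.1}
db = {
    "img_a": {"fun": 0.8},
    "img_b": {"boring": 0.9},
    "img_c": {"calm": 1.0},
}
ranking = select_images(doc, db)
```

Here `img_c` is never returned because it shares no impression word with the document; the remaining candidates would be shown to the user for the final choice.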

図２に本発明の画像選定装置の構成を示す。本発明の画像選定装置は、入力部１、印象情報抽出部２、画像検索部３、印象・画像データベース（印象・画像ＤＢ）３０、出力部４、印象語変換部５、印象語変換テーブル５０、キーワード情報抽出部６、配色情報抽出部７、共起情報抽出部８、頻出共起単語リスト８０及び第１テキストデータベース（第１テキストＤＢ）９を備える。図１の処理の概要説明における（ａ）を入力部１が、（ｂ）を印象情報抽出部２が、（ｃ）を印象・画像ＤＢ３０が、（ｄ）を画像検索部３が、（ｅ）を出力部４が担う。 FIG. 2 shows the configuration of the image selection apparatus of the present invention. The image selection apparatus of the present invention includes an input unit 1, an impression information extraction unit 2, an image search unit 3, an impression / image database (impression / image DB) 30, an output unit 4, an impression word conversion unit 5, and an impression word conversion table 50. , A keyword information extraction unit 6, a color arrangement information extraction unit 7, a co-occurrence information extraction unit 8, a frequent co-occurrence word list 80, and a first text database (first text DB) 9. In the outline description of the processing of FIG. 1, (a) is the input unit 1, (b) is the impression information extraction unit 2, (c) is the impression / image DB 30, (d) is the image search unit 3, (e) ) Is handled by the output unit 4.

第１実施形態では、本発明の画像選定装置は図２に示す構成のうち入力部１、印象情報抽出部２、画像検索部３、印象・画像ＤＢ３０及び出力部４によって、さらに印象関連度の実施形態によっては第１テキストＤＢ９を追加して画像選定を行う。第２〜第５実施形態では第１実施形態にさらに図２において括弧で囲った構成ブロックすなわち、印象語変換部５及び印象語変換テーブル５０、キーワード情報抽出部６、配色情報抽出部７、並びに共起情報抽出部及び頻出共起単語リスト８０、のうち少なくとも一つを追加した構成によって画像選定を行う。これらの各実施形態の詳細については後述するが、各追加構成から得られる情報は全て画像検索部３に渡され、ここでの画像選択スコアや画像検索方式に反映された形となって画像選定が行われる。 In the first embodiment, the image selection apparatus of the present invention performs image selection using, of the configuration shown in FIG. 2, the input unit 1, the impression information extraction unit 2, the image search unit 3, the impression/image DB 30 and the output unit 4, with the first text DB 9 added depending on how the impression relevance is obtained. In the second to fifth embodiments, image selection is performed by a configuration that adds to the first embodiment at least one of the constituent blocks enclosed in parentheses in FIG. 2, namely the impression word conversion unit 5 and impression word conversion table 50, the keyword information extraction unit 6, the color arrangement information extraction unit 7, and the co-occurrence information extraction unit 8 and frequent co-occurrence word list 80. Details of each embodiment are given later; all information obtained from the additional components is passed to the image search unit 3, where it is reflected in the image selection score or the image search method used for image selection.

すなわち、詳しくは後述するが第１実施形態では上記構成により画像選択スコアとして印象情報に基づく印象語スコアを採用する。第２実施形態は印象語スコアの精度を上げるために、印象情報を扱うにあたって印象語単位ではなく印象語をグループ化した印象カテゴリ単位で算出処理を行い、印象語スコアの別実施形態として印象カテゴリスコアを求め、これを画像選択スコアとする。第３〜第５実施形態では、第１、第２実施形態における印象情報以外の情報のみを利用する又は印象情報以外の情報を追加して利用する等して画像選択スコアを求める。 That is, although described in detail later, in the first embodiment, an impression word score based on impression information is adopted as an image selection score by the above configuration. In the second embodiment, in order to improve the accuracy of the impression word score, calculation processing is performed in the impression category unit in which the impression words are grouped instead of the impression word unit in handling the impression information. A score is obtained and used as an image selection score. In the third to fifth embodiments, the image selection score is obtained by using only information other than the impression information in the first and second embodiments or by adding information other than the impression information.

第３実施形態では、配色情報抽出部７の利用によってテンプレートと画像の配色に基づく配色スコアを求めて、画像選択スコア算出に利用する。第４実施形態では、キーワード情報抽出部６の利用によりキーワード情報を抽出してキーワードスコアを求めて画像選択スコア算出に利用する。第５実施形態では、共起情報抽出部８の利用によって画像検索部３による検索対象に制限を加える。 In the third embodiment, the use of the color arrangement information extraction unit 7 obtains a color arrangement score based on the color arrangement between the template and the image, and uses it for image selection score calculation. In the fourth embodiment, keyword information is extracted by using the keyword information extraction unit 6 to obtain a keyword score, which is used for image selection score calculation. In the fifth embodiment, the use of the co-occurrence information extraction unit 8 limits the search target by the image search unit 3.

なお第１又は第２実施形態に対する追加として、第３ないし第５実施形態による追加の少なくとも一つを任意に追加できる。しかしこのうち第４実施形態と第５実施形態とは文書に付与すべき適切な画像の定義が対照的であるので、第４実施形態と第５実施形態との両方を追加する実施形態は形式的には可能であるが効果の観点から好ましくない。 As an addition to the first or second embodiment, at least one of the additions according to the third to fifth embodiments can be arbitrarily added. However, among these, the fourth embodiment and the fifth embodiment are in contrast to the definition of an appropriate image to be assigned to a document. Therefore, an embodiment in which both the fourth embodiment and the fifth embodiment are added is a format. Although it is possible, it is not preferable from the viewpoint of the effect.

なおまた、第１〜第５実施形態全てにおいて用いる印象関連度は頻度に基づいて求める実施形態と、検索により求める実施形態とが利用可能であるが、検索により求める実施形態においては図２、図３に示す第１テキストＤＢ９を利用する。 In addition, the impression relevance used in all of the first to fifth embodiments can be obtained either on the basis of frequency or by search; the embodiment that obtains it by search uses the first text DB 9 shown in FIGS. 2 and 3.

また本発明において図２の印象・画像ＤＢ３０を構築するための構成を図３に示す。印象・画像ＤＢ３０は、コメント・画像データベース（コメント・画像ＤＢ）１０、データベース印象情報抽出部（ＤＢ印象情報抽出部）１１、データベース印象語変換部（ＤＢ印象語変換部）１２、印象語変換テーブル５０及びカテゴライズ部１３を備えた構成によって構築され、これらは図２には示していないが本発明の画像選定装置に含まれる。印象・画像ＤＢ３０においても印象関連度を頻度ではなく検索で求める実施形態では図２と共通の第１テキストＤＢ９を用いる。 FIG. 3 shows the configuration for constructing the impression/image DB 30 of FIG. 2 in the present invention. The impression/image DB 30 is constructed by a configuration comprising a comment/image database (comment/image DB) 10, a database impression information extraction unit (DB impression information extraction unit) 11, a database impression word conversion unit (DB impression word conversion unit) 12, the impression word conversion table 50 and a categorizing unit 13; these are included in the image selection apparatus of the present invention although not shown in FIG. 2. For the impression/image DB 30 as well, in the embodiment where the impression relevance is obtained by search rather than by frequency, the first text DB 9, shared with FIG. 2, is used.

なお前記第１実施形態において印象・画像ＤＢ３０を構築するための構成は図３に示すうち括弧で囲った部分を除いたコメント・画像ＤＢ１０、ＤＢ印象情報抽出部１１及び印象・画像ＤＢ３０である。前記第１実施形態に対して印象語変換部５及び印象語変換テーブル５０を追加した実施形態（後述するようにこれが第２実施形態である）において印象・画像ＤＢ３０を構築するための構成が、図３に示す全構成である。 The structure for constructing the impression / image DB 30 in the first embodiment is the comment / image DB 10, the DB impression information extraction unit 11, and the impression / image DB 30 except for the portion enclosed in parentheses in FIG. 3. The configuration for constructing the impression / image DB 30 in the embodiment in which the impression word conversion unit 5 and the impression word conversion table 50 are added to the first embodiment (this is the second embodiment as will be described later). It is the whole structure shown in FIG.

なお図２、図３の印象語変換テーブル５０は矢印の向かう当該テーブルを参照して利用するブロックが異なるが、共通の番号５０を付しているように同一である。後述のように印象語変換テーブル５０は印象語変換部５から参照され、DB印象語変換部１２から参照され、またカテゴライズ部１３によって構築される。 The impression word conversion tables 50 in FIG. 2 and FIG. 3 are the same as shown by the common number 50, although the blocks used by referring to the table to which the arrow is directed differ. As will be described later, the impression word conversion table 50 is referred to by the impression word conversion unit 5, is referred to from the DB impression word conversion unit 12, and is constructed by the categorizing unit 13.

なおまた後述するがＤＢ印象情報抽出部１１は印象情報抽出部２と、ＤＢ印象語変換部１２は印象語変換部５と、テキスト処理に関しては同一機能である。よってこれらはそれぞれ同一の構成ブロックであるとしてもよいが、本発明における処理の流れを説明する便宜上、別の名称と番号とを与えている。ＤＢ印象情報抽出部１１及びＤＢ印象語変換部１２では、処理対象テキストにコメント・画像ＤＢ１０によって画像が対応づけられているという点が異なる。 As will be described later, the DB impression information extraction unit 11 has the same function as the impression information extraction unit 2 and the DB impression word conversion unit 12 has the same function with respect to the text processing. Therefore, these may be the same constituent blocks, but different names and numbers are given for convenience in explaining the flow of processing in the present invention. The DB impression information extraction unit 11 and the DB impression word conversion unit 12 are different in that an image is associated with the processing target text by the comment / image DB 10.

また、本発明において図２の頻出共起単語リスト８０を構築するための構成を図１２に示す。頻出共起単語リスト８０は第２テキストデータベース（第２テキストDB）８０１とデータベーステキスト検索部（DBテキスト検索部）８０２とを備えた構成によって構築される。これら構成も図２には示していないが本発明の画像選定装置に含まれる。なお第１テキストＤＢ９と第２テキストＤＢ８０１は共にテキスト検索に応答するものであるので、同一のデータベースとして利用してもよいが、本発明における処理の流れを説明する便宜上、別の名称と番号とを与えている。 FIG. 12 shows a configuration for constructing the frequent co-occurrence word list 80 of FIG. 2 in the present invention. The frequent co-occurrence word list 80 is constructed by a configuration including a second text database (second text DB) 801 and a database text search unit (DB text search unit) 802. These configurations are not shown in FIG. 2, but are included in the image selection apparatus of the present invention. Since both the first text DB 9 and the second text DB 801 respond to text search, they may be used as the same database. However, for convenience of explaining the flow of processing in the present invention, different names and numbers are used. Is given.

以下、図２、３の各部について説明してから、まず前記第１実施形態における画像選択スコアの算出を説明する。 In the following, each part of FIGS. 2 and 3 is described first, and then the calculation of the image selection score in the first embodiment is described.

入力部１には、ブログ投稿者が書いたブログ文書などの文書が入力される。文書はユーザがPC（パーソナルコンピュータ）などの端末で作成したものを入力してもよいし、既に存在しているブログサイト上にアップロードされている文書などでもよい。また、主として配色情報抽出部７で配色情報の抽出に利用されることとなる、ブログのテンプレート（タイトルバー、メニューバー、ブログ記事が書かれる部分の背景）も入力文書と共に入力される。 A document such as a blog document written by a blog poster is input to the input unit 1. The document may be input by a user using a terminal such as a PC (personal computer), or may be a document uploaded on an already existing blog site. Also, a blog template (title bar, menu bar, background of a part where a blog article is written), which is mainly used for extraction of color arrangement information by the color arrangement information extraction unit 7, is input together with the input document.

印象情報抽出部２は、入力された文書から1つ以上の印象語および該印象語の文書印象関連度を含む文書印象情報を抽出する。印象語は入力文書とは独立に所与のものを用意しておき、形態素解析などにより入力文書中の単語から一致するものを抽出する。入力文書から抽出された印象語を最初に用意しておいた所与の印象語と特に区別する場合、文書印象語と呼ぶこととする。 The impression information extraction unit 2 extracts document impression information including one or more impression words and a document impression related degree of the impression words from the input document. A given impression word is prepared independently of the input document, and a matching word is extracted from words in the input document by morphological analysis or the like. When the impression word extracted from the input document is particularly distinguished from the given impression word prepared first, it will be called a document impression word.

文書印象関連度とは、各文書印象語が実際に入力文書で表現されている印象をどの程度の強さで表しているかを示す量であり、一実施形態では各文書印象語の入力文書中の頻度に基づいて求められる。すなわち当該実施形態では文書印象関連度は頻度自体でもよいし、現れた印象語の間での割合でもよい。また、大量の印象語を含んだ文書コーパスから予め計算しておいた各印象語のdocument frequency (df) を使って、当該文書中に現れた印象語の数であるterm frequency (tf) からtfidfを計算し、それを文書印象関連度としてもよい。なおtfidfも、tfを用いることから明らかなように、印象語の入力文書中の頻度に基づく量に含まれる。 The document impression relevance is a quantity indicating how strongly each document impression word represents the impression actually expressed in the input document; in one embodiment it is obtained from the frequency of each document impression word in the input document. That is, in this embodiment the document impression relevance may be the frequency itself, or the share of each word among the impression words that appear. Alternatively, using the document frequency (df) of each impression word, calculated in advance from a document corpus containing a large number of impression words, together with the term frequency (tf), i.e. the number of times the impression word appears in the document, tfidf may be computed and used as the document impression relevance. Since tfidf uses tf, it is also a quantity based on the frequency of the impression word in the input document.
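The frequency-based variants described above (raw frequency, share among the impression words that appear, and tfidf) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the pre-tokenized base-form input, the `doc_freq` table and the logarithmic idf formula are assumptions; the text only states that df is calculated in advance from a corpus.

```python
import math
from collections import Counter

def document_impression_relevance(tokens, impression_words, doc_freq=None, n_docs=None):
    """Return {impression word: (frequency, ratio, tfidf or None)}.

    tokens: base-form words of the input document (morphological analysis
            is assumed to have been done elsewhere).
    doc_freq / n_docs: hypothetical precomputed document frequencies.
    """
    # Keep only the words that match one of the given impression words.
    tf = Counter(t for t in tokens if t in impression_words)
    total = sum(tf.values())
    result = {}
    for word, count in tf.items():
        ratio = count / total  # share among the impression words that appear
        tfidf = None
        if doc_freq is not None and n_docs:
            # One common idf form; the source does not fix the formula.
            tfidf = count * math.log(n_docs / (1 + doc_freq.get(word, 0)))
        result[word] = (count, ratio, tfidf)
    return result

tokens = ["動物園", "楽しい", "かわいい", "ペンギン", "見る", "本当に", "楽しい"]
info = document_impression_relevance(tokens, {"楽しい", "かわいい", "つまらない"})
```

Impression words that never occur in the document, such as "つまらない" here, simply do not appear in the result; a caller could equally treat them as having relevance zero.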

また、文書印象関連度を上述のように頻度に基づいて求める一実施形態に対する別実施形態を説明する。当該実施形態においては印象情報抽出部２が第１テキストＤＢ９を利用する。印象情報抽出部２はまず前述の実施形態と同様に形態素解析などにより入力文書に含まれる単語を抽出する。前述の頻度に基づく実施形態では当該抽出された単語の中から所与の印象語に一致する単語のみを選んで用いたが、当該実施形態では以下のように当該抽出された単語を全て用いて文書印象関連度を求める。 Further, another embodiment for one embodiment in which the document impression relevance level is obtained based on the frequency as described above will be described. In the present embodiment, the impression information extraction unit 2 uses the first text DB 9. The impression information extraction unit 2 first extracts words included in the input document by morphological analysis or the like as in the above-described embodiment. In the embodiment based on the above-described frequency, only the word that matches the given impression word is selected from the extracted words and used. However, in the embodiment, all the extracted words are used as follows. Obtain the document impression relevance.

なおまた当該実施形態では前述実施形態と異なり入力文書中に一致する単語がない場合でも印象語の文書印象関連度が求まるが、前述実施形態での呼称と対応させて当該実施形態では入力文書に対して文書印象関連度が求められた印象語のことを文書印象語と呼ぶこととする。さらに前述実施形態と対応させて入力文書中に一致する単語がない場合であっても、所与の印象語の利用により「入力文書から抽出された文書印象語」などと表現するものとする。これら表現は後述のＤＢ印象情報抽出部１１における画像印象語等に対しても同様に用いるものとする。 In this embodiment, unlike the previous embodiment, the document impression relevance level of the impression word is obtained even when there is no matching word in the input document. However, in this embodiment, the input document is associated with the name in the previous embodiment. On the other hand, an impression word for which a document impression relevance degree is obtained is called a document impression word. Further, even if there is no matching word in the input document in correspondence with the above-described embodiment, it is expressed as “document impression word extracted from the input document” by using a given impression word. These expressions are also used for image impression words and the like in the DB impression information extraction unit 11 described later.

当該実施形態では印象情報抽出部２は次に、前記予め用意しておいた所与の印象語（例えば「かわいい」、「美しい」など）との関係の強さを、抽出した全ての単語について求め、この結果から文書印象関連度を算出する。関係の強さを求めるため、印象情報抽出部２は抽出した各単語と所与の印象語（例えば、「象」など）の組み合わせ（象ＡＮＤかわいい，象ＡＮＤ美しい， ...）を検索キーとして用いて、大量のテキストが含まれた第１テキストＤＢ９（例えばweb検索エンジン等）に対して検索を行い、その時のヒット数を得る。検索キーの組み合わせすなわちペアの方式は、ＡＮＤ検索でなくて、OR検索(象 OR かわいい)やPHRASE検索（"かわいい象"）などでもよい。 In this embodiment, the impression information extraction unit 2 next obtains, for every extracted word, the strength of its relationship with each of the given impression words prepared in advance (for example, "cute" and "beautiful"), and calculates the document impression relevance from the result. To determine the strength of the relationship, the impression information extraction unit 2 uses, as search keys, combinations of each extracted word (for example, "elephant") and a given impression word (elephant AND cute, elephant AND beautiful, ...), searches the first text DB 9, which contains a large amount of text (for example, a web search engine), and obtains the number of hits. The way the search keys are paired need not be an AND search; an OR search (elephant OR cute) or a PHRASE search ("cute elephant") may be used instead.

印象情報抽出部２は当該検索によるヒット数を用いて、所与の印象語と入力文書から抽出された各単語との関係の強さを求める。ここで検索ヒット数が多ければ、当該単語と当該印象語の関係性は強いと判断され、少なければ関係性が弱いと判断できる。よってこのヒット数をそのまま単語単位の印象関連度としてもよいし、単語単独で検索した数などでヒット数を正規化した数（Dice係数など）を印象関連度としてもよい。入力文書に対する各印象語すなわち各文書印象語の文書印象関連度は、この単語単位の印象関連度を文書から抽出された単語につき足し合わせた値、または必要に応じて当該値を文書中に出現した、すなわち足し合わせた単語数などで正規化したものなどとする。 Using the number of hits from this search, the impression information extraction unit 2 obtains the strength of the relationship between a given impression word and each word extracted from the input document. If the number of hits is large, the relationship between the word and the impression word is judged to be strong; if it is small, the relationship is judged to be weak. The hit count may therefore be used directly as the word-level impression relevance, or the hit count normalized by, for example, the hit counts obtained when each word is searched for alone (such as the Dice coefficient) may be used. The document impression relevance of each impression word for the input document, that is, of each document impression word, is the sum of this word-level impression relevance over the words extracted from the document, or, where necessary, that sum normalized by, for example, the number of words that appeared in the document and were summed.
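The search-based calculation can be sketched as follows. This is a rough illustration only: the small in-memory `corpus` of word sets stands in for the first text DB 9 (the text mentions a web search engine as an example), and the function names, the AND-search implementation and the choice of the Dice coefficient with normalization by the number of words are assumptions chosen from the options the text lists.

```python
def hit_count(corpus, *terms):
    """Number of documents containing all terms (AND search).

    Hypothetical stand-in for a query to the first text DB.
    """
    return sum(all(t in doc for t in terms) for doc in corpus)

def dice(corpus, word, impression):
    """Pair hit count normalized by the single-term hit counts."""
    pair = hit_count(corpus, word, impression)
    denom = hit_count(corpus, word) + hit_count(corpus, impression)
    return 2 * pair / denom if denom else 0.0

def document_relevance(corpus, words, impression_words, normalize=True):
    """Sum the word-level relevance over the extracted words, per impression word."""
    scores = {}
    for imp in impression_words:
        total = sum(dice(corpus, w, imp) for w in words)
        # Optional normalization by the number of summed words.
        scores[imp] = total / len(words) if normalize and words else total
    return scores

corpus = [
    {"elephant", "cute", "zoo"},
    {"elephant", "cute"},
    {"elephant", "beautiful"},
    {"penguin", "cute"},
]
scores = document_relevance(corpus, ["elephant", "penguin"], ["cute", "beautiful"])
```

With this toy corpus, "cute" receives a higher document impression relevance than "beautiful" even though neither impression word needs to occur in the input document itself.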

なお文書印象関連度は，全印象語に対して求めるのではなく最も強い印象（最大数）または上位所定数の印象に対する数と印象を保持してもよい。この場合、後述の各実施形態におけるスコア算出などにおいて、文書印象関連度が求められなかった印象語は文書印象語が存在しないものとして扱うか、文書印象語は存在するが文書印象関連度の値がゼロであるとして扱えばよい。 The document impression relevance level may not be obtained for all impression words but may hold the number and impression for the strongest impression (maximum number) or the upper predetermined number of impressions. In this case, in the score calculation or the like in each embodiment described later, an impression word for which the document impression relevance level is not calculated is treated as a document impression word does not exist, or there is a document impression word but a value of the document impression relevance level Can be treated as zero.

なお以上のような各実施形態で文書印象関連度は求められるので、当該文書印象関連度は入力文書の印象の強さを表す量であるが、本願発明において入力文書における印象語の印象の強さとは人の主観に左右されるものではない。 Since the document impression relevance is obtained as in the embodiments above, it is a quantity representing the strength of an impression in the input document; in the present invention, however, the strength of the impression of an impression word in the input document does not depend on human subjectivity.

図４は上述のような各実施形態で求まる文書印象情報の例を示す図である。同図（ａ１）のブログ入力文書「動物園楽しかった。かわいいペンギンも見れたし。本当に楽しかった。」に対して、印象語が同図（ｂ）に示すように「楽しい」「かわいい」「つまらない」の３語である場合に得られる文書印象情報が同図（ｃ１）である。これは文書印象語として「楽しい」２語、「かわいい」１語を抽出して、文書印象関連度として各文書印象語の頻度を抽出した例である。図中にも示すように、印象情報抽出部２は形態素解析などにより入力文書中の「楽しかった」から印象語「楽しい」を抽出するものとする。印象語「つまらない」は入力文書に存在しないので抽出されていないが、文書印象関連度の値がゼロとして抽出されたとみなしてもよい。なお同図（ａ２）及び（ｃ２）は上述のように印象情報抽出部２で文書印象情報を抽出するのと同一の処理によってＤＢ印象情報抽出部１１で画像印象情報を抽出することを説明するものであり、後述する。 FIG. 4 shows an example of the document impression information obtained in the embodiments described above. For the blog input document in FIG. 4(a1), "The zoo was fun. I got to see cute penguins too. It was really fun.", FIG. 4(c1) shows the document impression information obtained when the impression words are the three words "fun", "cute" and "boring", as shown in FIG. 4(b). In this example, "fun" (two occurrences) and "cute" (one occurrence) are extracted as document impression words, and the frequency of each is taken as its document impression relevance. As also shown in the figure, the impression information extraction unit 2 extracts the impression word "fun" (楽しい) from "was fun" (楽しかった) in the input document by morphological analysis or the like. The impression word "boring" is not extracted because it does not occur in the input document, but it may instead be regarded as extracted with a document impression relevance of zero. FIGS. 4(a2) and 4(c2) illustrate that the DB impression information extraction unit 11 extracts image impression information by the same processing as the impression information extraction unit 2 uses to extract document impression information; this is described later.

なお、印象語は、図中のように"楽しい"などの形容詞でもいいし、"春"などの名詞でもよい。また、非特許文献4などにある語の共起関係による印象の推定手法を利用して、"春"といった語から"楽しい"などを推定したものを印象語としてもよい。なお前述のように本発明においては、文書の印象を抽出するにあたって基本となる印象語（"楽しい"など）は予め定義しておいたものとする。 The impression word may be an adjective such as “fun” as in the figure, or a noun such as “spring”. In addition, an impression word may be used that is obtained by estimating “fun” or the like from a word such as “spring” by using an impression estimation method based on a co-occurrence relationship of words in Non-Patent Document 4 or the like. As described above, in the present invention, the basic impression word (such as “fun”) for extracting the impression of a document is defined in advance.

印象語変換部５（第２実施形態にて利用）では、所定の印象語を印象カテゴリに変換する。また後述のように当該変換に対応した文書印象関連度の値も求める。印象カテゴリへの変換の際に、印象語変換テーブル５０を利用する。印象語変換テーブル５０とは、印象語と印象カテゴリの対応付けがされている対応表であり、例を図５に示す。例えば、印象語として、「愉快な」と「心地よい」があった場合に、その印象カテゴリは「楽しい」に対応する。また、印象カテゴリは必ずしも「楽しい」といった言葉である必要はなく、印象カテゴリの識別を表すためのID（番号）でもよい。印象語変換テーブル５０は、非特許文献3、4などの所定の知見から作成してもよいし、後述する手法を利用して作成してもよい。 The impression word conversion unit 5 (used in the second embodiment) converts a predetermined impression word into an impression category. Further, as will be described later, a document impression relevance value corresponding to the conversion is also obtained. The impression word conversion table 50 is used when converting to the impression category. The impression word conversion table 50 is a correspondence table in which impression words are associated with impression categories, and an example is shown in FIG. For example, when there are “pleasant” and “comfortable” as impression words, the impression category corresponds to “fun”. The impression category is not necessarily a word such as “fun”, and may be an ID (number) for identifying the impression category. The impression word conversion table 50 may be created from predetermined knowledge such as Non-Patent Documents 3 and 4, or may be created using a method described later.
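The table lookup can be sketched as follows. This is only an illustration: the text defers the calculation of the converted document impression relevance to a later passage, so summing the relevance of the words that fall into the same category is an assumption of this sketch, as are the function name and the handling of words missing from the table.

```python
def to_impression_categories(doc_impressions, conversion_table):
    """Map impression words to categories and aggregate their relevance.

    conversion_table: {impression word: impression category}
    Summation per category is an assumption of this sketch.
    Words missing from the table are kept as their own category.
    """
    categories = {}
    for word, relevance in doc_impressions.items():
        category = conversion_table.get(word, word)
        categories[category] = categories.get(category, 0.0) + relevance
    return categories

table = {"pleasant": "fun", "comfortable": "fun", "dull": "boring"}
converted = to_impression_categories({"pleasant": 2, "comfortable": 1, "dull": 1}, table)
```

The category values could just as well be numeric IDs, as the text notes; only the grouping of words matters for the lookup.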

キーワード抽出部６（第３実施形態にて利用）では、入力文書中から印象語とは別に、名詞や動詞などキーワードとなる語を抽出する。キーワード抽出方法は、言語処理の分野などで広く用いられているtfidfを用いればよい。なお、印象語とキーワードは語として同じものがあってもよいが、前述のように印象語とは入力文書とは独立に所与の語を用意しておくものであり、キーワードとは入力文書自体から抽出されるものである。 The keyword extraction unit 6 (used in the third embodiment) extracts words that are keywords such as nouns and verbs from the input document, in addition to the impression words. The keyword extraction method may be tfidf which is widely used in the field of language processing. The impression word and the keyword may be the same as the word, but as described above, the impression word is prepared for a given word independently of the input document, and the keyword is the input document. It is extracted from itself.

配色情報抽出部７（第４実施形態にて利用）は、入力部１から入力されたブログ等のテンプレートから、テンプレート配色情報を抽出する。ブログテンプレートから配色を抽出する例を図６に示す。例えば、ブログのテンプレートが「自然」というテーマのものであれば、ブログのタイトルバーやメニューバーは、緑色や茶色を基調とした配色が行われる。このタイトルバーなどの色情報を抽出する。予め、ブログテンプレートと配色の対応をとっておいたものを利用してもよいし、PCなどの端末におけるディスプレイでの表示画面をスクリーンショットしたものを配色情報としてもよい。 The color arrangement information extraction unit 7 (used in the fourth embodiment) extracts template color arrangement information from a template such as a blog input from the input unit 1. An example of extracting a color scheme from a blog template is shown in FIG. For example, if the blog template has a theme of “nature”, the title bar and menu bar of the blog are colored based on green and brown. Color information such as the title bar is extracted. It is possible to use a blog template that corresponds to the color scheme in advance, or a screen shot of a display screen on a terminal such as a PC may be used as the color scheme information.

In this way, the color arrangement information extraction unit 7 extracts the colors of the blog's template area. As shown in FIG. 6, a template here is one of several prepared patterns that specify the colors of the title area, where the blog title is displayed; the menu area, where links to the user's past articles and the like are displayed; and the background of the article entry field. These areas together form the template area. The blog's poster (administrator) generally selects a template to suit his or her taste. For example, in a template with a "nature" theme as in the figure, the title and menu areas are decorated in colors based on green. The colors of the parts belonging to the template are extracted. The RGB components are acquired and then, either in RGB space or after conversion from RGB to another color space such as HSV, L*a*b*, or L*u*v*, the color components are used by the image search unit 3 described later.
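The color space conversion step can be sketched with Python's standard `colorsys` module. This is only an illustration of converting one template color from RGB to HSV; the function name `template_color_hsv` and the sample title bar color are hypothetical, and how the image search unit 3 consumes the result is not shown.

```python
import colorsys

def template_color_hsv(rgb):
    """Convert an 8-bit (R, G, B) triple to (H, S, V), each in the range 0..1."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)

# Hypothetical title bar color of a "nature" template (a mid green).
title_bar_rgb = (0, 128, 0)
h, s, v = template_color_hsv(title_bar_rgb)
```

For a pure green such as this one, the hue is one third of the way around the color wheel and the saturation is at its maximum, so hue alone already separates a green-based template from, say, a brown-based one.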

The co-occurrence information extraction unit 8 (used in the fifth embodiment) extracts, from the frequent co-occurrence word list 80, co-occurrence information on words that frequently co-occur with the impression words extracted by the impression information extraction unit 2. Co-occurrence information consists of the co-occurring word, the frequency with which that word appears, and its rank, which is an ordering based on that frequency. The frequent co-occurrence word list 80 is described later.

The impression/image DB 30 is a database that stores images (image ID and image content) in association with image impression information (impression words and relevance). Tags, which are text information for identifying each image, may additionally be stored in association with the images. FIG. 7 shows an example of the data stored in the impression/image DB 30. The data and the construction of the impression/image DB 30 are described later.

The comment/image DB 10 stores, for each image, image identification tags together with the comments that viewers have left on the image; FIG. 8 shows an example. An image sharing site such as Flickr (registered trademark) is one such source. As the figure shows, the tags associated with an image are assigned by a site administrator or the like for purposes such as identifying the image content, and are distinct from the comments posted on the image.

The present invention uses the comment/image DB 10 for the following reason. An automatic image selection device requires an image database (image DB) in which each image is associated with its impression. Assigning impressions by hand to every image in the image DB is impractical. One way to build the image DB easily is to use an image sharing site such as Flickr. On such a site, each image has tags, which are text information for identifying the image, and comments from viewers. These comments are likely to contain many of the viewers' reactions to and impressions of the image, and so to reflect those impressions directly. If the words that express impressions in the comments (impression words) can be associated with the images, a useful image DB can be built easily, making it possible to attach images suited to the impression of a blog.

The DB impression information extraction unit 11 applies the same processing as the impression information extraction unit 2 to the comments stored in the comment/image DB 10. That is, using the same given impression words and the same text processing as the impression information extraction unit 2, it extracts from the comments document impression information comprising document impression words and document impression relevance. However, because these comments are evaluations of and reactions to the associated image in the comment/image DB 10, what the DB impression information extraction unit 11 extracts is an impression of the image. To distinguish its results from the document impression words, document impression relevance, and document impression information that the impression information extraction unit 2 extracts from the user's blog document, the results that the DB impression information extraction unit 11 extracts from the comments associated with an image are called image impression words, image impression relevance, and image impression information. Where the context makes the meaning clear, they are simply called impression words and so on.

Because the DB impression information extraction unit 11 has the same text processing function as the impression information extraction unit 2, image impression relevance can likewise be obtained in two ways: in one embodiment it is based on the frequency of the words extracted from the comments in the comment/image DB 10 that match the given impression words, and in another it is obtained for all the given impression words by searching the first text DB 9. "Same function" here means only that each embodiment based on that function is available to both units; the impression information extraction unit 2 and the DB impression information extraction unit 11 need not use the same embodiment to obtain document impression relevance and image impression relevance. For example, the impression information extraction unit 2 may obtain document impression relevance from frequency while the DB impression information extraction unit 11 obtains image impression relevance by searching the first text DB 9.

The comments that the DB impression information extraction unit 11 processes for an image are the set of all comments posted on that image. For example, the comments on image 1 in FIG. 8 are comments 1, 2, 3, ... taken together: "Looks fun!", "Cute!", "Fun!", and so on. The same processing as in the impression information extraction unit 2 is applied to this set to extract the image impression information for image 1, comprising image impression words and image impression relevance.

FIG. 4, referred to earlier, shows an example of this extraction. The input document to the DB impression information extraction unit 11 is the set of comments on image 1, as shown in (a1) of the figure. Using the same impression words as the impression information extraction unit 2, namely "fun", "cute", and "boring" as shown in (b), the image impression information shown in (c2) is obtained.

Like document impression relevance, image impression relevance may be the frequency with which each impression word appears in all the comments on the image, or the proportion of its appearances. Alternatively, the frequency of each impression word for the image may be divided by the frequency of that impression word in all comments on all images in the comment/image DB 10. This normalizes between impression words that tend to appear in comments and those that rarely do, using only comments on images rather than general documents. Also as with document impression relevance, the df obtained from a given document corpus or the like may be used to compute tf-idf for the image's comments, and the result used as the image impression relevance. The DB impression information extraction unit 11 may also obtain image impression relevance by searching the first text DB 9.
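Two of the options above, raw frequency and frequency normalized by the counts over all comments in the DB, can be sketched as follows. This is a simplified illustration under stated assumptions: the comments and impression words are English stand-ins, matching is done by plain substring counting rather than real text processing, and the names `count_impression_words` and `image_relevance` are hypothetical.

```python
from collections import Counter

# Given impression words (English stand-ins for the patent's examples).
IMPRESSION_WORDS = ["fun", "cute", "boring"]

def count_impression_words(comments, words=IMPRESSION_WORDS):
    """Count occurrences of each impression word over a list of comments."""
    counts = Counter()
    for comment in comments:
        text = comment.lower()
        for w in words:
            # Naive substring count; a real system would tokenize first.
            counts[w] += text.count(w)
    return counts

def image_relevance(comment_db, image_id, normalize=False):
    """Image impression relevance for one image, as frequency or normalized."""
    local = count_impression_words(comment_db[image_id])
    if not normalize:
        return {w: n for w, n in local.items() if n > 0}
    # Normalize by the frequency of each word over all comments on all images.
    all_comments = [c for cs in comment_db.values() for c in cs]
    global_counts = count_impression_words(all_comments)
    return {w: n / global_counts[w] for w, n in local.items() if n > 0}

# Toy comment/image DB, for illustration only.
comment_db = {
    "image1": ["Looks fun!", "Cute!", "So fun!"],
    "image2": ["Cute...", "boring"],
}
raw = image_relevance(comment_db, "image1")
normalized = image_relevance(comment_db, "image1", normalize=True)
```

In this toy data "cute" appears for both images, so its normalized relevance for image 1 is halved, while "fun" appears only for image 1 and keeps full weight.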

The impression/image DB 30 is constructed as follows. The DB impression information extraction unit 11 processes the comments on each image stored in the comment/image DB 10 as described above and stores the resulting image impression information in the impression/image DB 30 in association with the image. The tags associated with each image in the comment/image DB 10 are copied unchanged to the corresponding image in the impression/image DB 30. With respect to the association between images and image impression information, the comment/image DB 10 is thus the primary database and the impression/image DB 30 the secondary database.

Thus, in the present invention, the comment/image DB 10 uses data from a general image sharing site, image posting system, or the like on a network, in which many viewers have already attached comments reflecting their impressions to an enormous number of images. Each image is posted by one of the many users of such a system, and many of the users who view it add comments. Images with no comments can simply be excluded from processing by the DB impression information extraction unit 11. Using the impression/image DB 30 generated automatically from such a comment/image DB 10 by the DB impression information extraction unit 11 and related units has the following effect: the user of the image selection device need not search manually through an enormous number of images, the system developer need not manually assign words expressing the impression of each image, and a predetermined number of top-ranked images matching the atmosphere of the user's blog document are selected automatically and presented to the user.

The image search unit 3 searches the impression/image DB 30 using the document impression words obtained by the impression information extraction unit 2 as keys. For each document impression word and the image impression word that matches it, an image selection score is calculated from the document impression relevance and the image impression relevance. One or more high-scoring images (candidates) are then passed to the output unit 4. The user selects a preferred image from the candidates presented by the output unit 4.

The first embodiment consists of the input unit 1, the impression information extraction unit 2, the image search unit 3, the impression/image DB 30, and the output unit 4 (the comment/image DB 10 and the DB impression information extraction unit 11 are also used when the impression/image DB 30 is constructed). The image selection score in this embodiment is as follows; because it is calculated from impression words, it is called the impression word score.

Let k_x denote each document impression word extracted by any embodiment of the impression information extraction unit 2 from the user's input document and the given impression words, let D(k_x) denote the document impression relevance of k_x, and let I_iz(k_x) denote the image impression relevance of an image i_z, found by the search of the image search unit 3, that has an image impression word k_x matching the document impression word k_x. The impression word score IS_iz is then calculated by (Formula 1).

Here, K = {k_x | x = 1, 2, ..., N_K} is the set of impression words extracted from the input document, and N_K is the number of impression words extracted from the input document. In general, the set of image impression words for a single image i_z only partly coincides with the set of document impression words of the input document. In (Formula 1), therefore, for any document impression word k_y ∈ K that matches none of the image impression words of image i_z, the image impression relevance I_iz(k_y) is taken to be zero.
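(Formula 1) itself is not reproduced in this text. The sketch below therefore assumes one form that is consistent with the definitions above: the sum, over the document impression words k_x in K, of the product D(k_x) × I_iz(k_x), with I_iz treated as zero for unmatched words. The product-sum form, the function names, and the toy data are assumptions for illustration, not the confirmed formula.

```python
def impression_word_score(doc_relevance, image_relevance):
    """Assumed score: sum over document impression words of D(k) * I(k).

    Words absent from the image's impression words contribute zero,
    matching the rule that I_iz(k_y) = 0 for unmatched k_y.
    """
    return sum(d * image_relevance.get(k, 0.0) for k, d in doc_relevance.items())

def rank_images(doc_relevance, image_db, top_n=3):
    """Return up to top_n (image_id, score) pairs in descending score order."""
    scored = [
        (impression_word_score(doc_relevance, rel), image_id)
        for image_id, rel in image_db.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(image_id, score) for score, image_id in scored[:top_n]]

# Toy data: D(k) for the input document and I(k) for each image.
doc_relevance = {"fun": 3, "cute": 1}
image_db = {
    "image1": {"fun": 2, "cute": 1},
    "image2": {"boring": 4, "cute": 2},
}
ranking = rank_images(doc_relevance, image_db)
```

Image 2 has a high relevance for "boring", but since that word is not among the document impression words it adds nothing to the score, so image 1 ranks first.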

The output unit 4 outputs one or more images from the group of images that the image search unit judged to have high scores, together with the document entered at the input unit. A predetermined number of images are output in descending order of score. The user may use all the output images, or may select only the preferred ones, send the selection to the image selection device, and receive the selected images again together with the input document.

Next, the calculation of the image selection score in the second embodiment, which adds the impression word conversion unit 5 and the impression word conversion table 50 to the first embodiment, is described. When the impression/image DB 30 is constructed in the second embodiment, the entire configuration of FIG. 3 is used, as described earlier.

Whereas the first embodiment calculates the image selection score (impression word score) per impression word as in (Formula 1), the second embodiment calculates the image selection score per impression category; this is called the impression category score. The second embodiment has the following effect in addition to those of the first. In the first embodiment, when document and image impression relevance are based on frequency, some of the words prepared as given impression words may appear only rarely in the comments of the comment/image DB 10. If the impression/image DB 30 is built by associating images with impressions through such low-frequency impression words, the associations become less reliable, and the image selection device may be less able to select appropriate images. The second embodiment therefore categorizes (groups) similar impression words, which raises the frequency of each impression category, and associates images with impression categories, which improves reliability.

In the embodiments that obtain document and image impression relevance by searching the first text DB 9, the choice of impression words is considered unlikely to reduce the reliability of the associations in this way; even so, those embodiments can be combined with the second embodiment.

First, assuming that the impression word conversion table 50 of FIG. 5 has already been created, the calculation of the impression category score used as the image selection score in the second embodiment is described. The impression information extraction unit 2 extracts document impression information as in the first embodiment and passes it to the impression word conversion unit 5. The impression word conversion unit 5 converts each received document impression word into the corresponding impression category by referring to the impression word conversion table 50. The result is called a document impression category. The impression word conversion unit 5 also processes the document impression relevance that the impression information extraction unit 2 extracted for each document impression word, in a way that corresponds to the conversion from impression words to impression categories: the document impression relevance, defined per impression word, is extended to a value per impression category. This value is the sum of the relevance of the document impression words belonging to the document impression category, and is likewise called "document impression relevance". Where necessary, the two are distinguished as "document impression relevance corresponding to a document impression word" and "document impression relevance corresponding to a document impression category" or "updated document impression relevance".

(Example 1) Suppose that in the first embodiment the given impression words are "pleasant" (愉快な), "comfortable" (心地よい), "agreeable" (快い), and so on, and that for some input document the document impression relevance, taken as impression word frequency, is pleasant = 3 occurrences, comfortable = 4 occurrences, agreeable = 2 occurrences, and so on. If in the second embodiment the impression word conversion table 50 is "impression category 1: pleasant, comfortable", "impression category 2: agreeable", and so on, then the document impression relevance of the input document is the sum of the frequencies of all impression words in each category: impression category 1 = 7 = (pleasant: 3) + (comfortable: 4), impression category 2 = 2 = (agreeable: 2), and so on. When a measure other than frequency is used, the values are summed in the same way.
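The aggregation in (Example 1) can be sketched as a table lookup followed by a per-category sum. The English words are stand-ins for 愉快な, 心地よい, and 快い, and the names `CONVERSION_TABLE` and `to_category_relevance` are hypothetical; only the numbers come from the example.

```python
from collections import defaultdict

# Impression word conversion table: impression word -> impression category ID.
# "pleasant" = 愉快な, "comfortable" = 心地よい, "agreeable" = 快い (stand-ins).
CONVERSION_TABLE = {"pleasant": 1, "comfortable": 1, "agreeable": 2}

def to_category_relevance(word_relevance, table=CONVERSION_TABLE):
    """Sum per-word relevance values into per-category relevance values."""
    category_relevance = defaultdict(float)
    for word, value in word_relevance.items():
        if word in table:  # words missing from the table are ignored
            category_relevance[table[word]] += value
    return dict(category_relevance)

# Frequencies from (Example 1).
word_relevance = {"pleasant": 3, "comfortable": 4, "agreeable": 2}
category_relevance = to_category_relevance(word_relevance)
```

The same function applies unchanged when the per-word values are tf-idf weights or another measure instead of frequencies, since the rule is simply to add the values within each category.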

The document impression categories and their corresponding document impression relevance are passed from the impression word conversion unit 5 to the image search unit 3. The image search unit 3 searches the impression/image DB 30 using the received document impression categories as keys and obtains the corresponding image impression relevance.

Whereas in the first embodiment the impression/image DB 30 stores, for each image, the corresponding image impression words and image impression relevance, in the second embodiment it stores the corresponding image impression categories and image impression relevance. Image impression relevance per image impression category is obtained using the DB impression word conversion unit 12 and the impression word conversion table 50, in the same way as described for the impression word conversion unit 5. That is, the DB impression information extraction unit 11 computes image impression information, comprising image impression words and image impression relevance, from all the comments on an image in the comment/image DB 10 and passes it to the DB impression word conversion unit 12, which refers to the impression word conversion table 50 and groups the image impression words into image impression categories; the DB impression information extraction unit 11 then obtains the image impression relevance per image impression category. Exactly as in (Example 1), this value is the sum of the relevance of the image impression information belonging to the image impression category. In the second embodiment, the impression/image DB 30 is thus constructed by calculating image impression relevance per image impression category. The tags can clearly be used in this construction exactly as in the first embodiment.

In this way, the second embodiment obtains the impression category score IS_iz, an extension of the impression word score IS_iz of the first embodiment, by the same (Formula 1) and uses it as the image selection score. For the second embodiment, (Formula 1) is to be read as follows: k_x is each document impression category extracted from the user's input document via the impression information extraction unit 2 and the impression word conversion unit 5; D(k_x) is the document impression relevance corresponding to the document impression category k_x; I_iz(k_x) is the image impression relevance of an image i_z, found by the search, that has an image impression category k_x matching the document impression category k_x; K = {k_x | x = 1, 2, ..., N_K} is the set of impression categories extracted from the input document; and N_K is the number of impression categories extracted from the input document.

Next, a method of creating the impression word conversion table 50 used in the second embodiment is described. As noted earlier, the table can be created by the techniques of Non-Patent Documents 3 and 4. Although one is automatic and the other manual, both build general relationships among words alone, without using images, and so may be unsuitable for attaching images as in the present invention. The following describes how the present invention builds word relationships in a form specialized for image attachment, using the comments stored in the impression/image DB 30.

In this construction, the categorizing unit 13 first reads the impression/image DB 30 that was built per impression word by the construction method of the first embodiment, and creates the impression word conversion table 50 by the method described in detail below. The categorizing unit 13 then uses the result to update the impression/image DB 30 into the form required by the second embodiment, that is, a database built per impression category. In this update, the per-category data may be added to the database instead of overwriting it, so that the per-word data remains available for reference.

FIG. 9 schematically shows how the categorizing unit 13 categorizes impression words based on the co-occurrence of words in the comments on images. The method rests on the following observation. On an image sharing site or the like, a single image may have comments from several viewers. If those viewers share a common impression of the image, their comments often have similar meanings, so the words in those comments can also be presumed to have similar meanings. Words that co-occur in comments on the same image are therefore considered to be highly similar. Because this similarity is derived from comments that reflect a relationship with the image, it is more specialized for image attachment than the word relationships of Non-Patent Documents 3 and 4. As this observation makes clear, categorizing impression words by this similarity to create the impression word conversion table 50 improves the accuracy with which the image selection device selects appropriate images.

The description above speaks of "words in comments" to emphasize the observation behind this effect, but what the categorizing unit 13 actually processes are the image impression words in the impression/image DB 30. As described earlier, these image impression words originate in the comments attached to images in the comment/image DB 10; they are extracted from those comments by the DB impression information extraction unit 11 and stored in the impression/image DB 30. The phrase "words in comments" was used to make clear that the effect stems from this origin.

The example of FIG. 9 shows the relationships among three impression words: "pleasant" (愉快な), "comfortable" (心地よい), and "agreeable" (快い). Before categorization, in (a) of the figure, that is, in the impression/image DB 30 built by the first embodiment, the co-occurrence of the three words as image impression words of a single image is such that "comfortable" and "agreeable" each co-occur with "pleasant". In the example, the pairs "pleasant"-"comfortable" and "pleasant"-"agreeable" each co-occur twice (in two images), and this co-occurrence frequency may be adopted as the similarity between impression words. In this case the two similarities are judged to be equal. If the co-occurrence frequencies differed, the pair with the higher frequency would have the higher similarity. Once highly similar impression words have been grouped, similarities between impression word categories must also be calculated; as is clear from (Formula 2) described below, this can be done in the same way.
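Counting how often two impression words co-occur as image impression words of the same image can be sketched as follows. The layout of the toy DB is an assumption chosen to reproduce the counts stated for FIG. 9 (two images per pair); the actual figure is not reproduced here, the English words are stand-ins, and `cooccurrence_counts` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(image_db):
    """Count, for each unordered word pair, the images in which both appear."""
    counts = Counter()
    for words in image_db.values():
        # Each pair is counted once per image, keyed in sorted order.
        for a, b in combinations(sorted(set(words)), 2):
            counts[(a, b)] += 1
    return counts

# Toy impression/image DB: image ID -> image impression words of that image.
image_db = {
    "img1": ["pleasant", "comfortable"],
    "img2": ["pleasant", "comfortable"],
    "img3": ["pleasant", "agreeable"],
    "img4": ["pleasant", "agreeable"],
}
counts = cooccurrence_counts(image_db)
```

With these counts used directly as similarity, "pleasant"-"comfortable" and "pleasant"-"agreeable" tie at 2, while "comfortable"-"agreeable" never co-occur and have similarity 0.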

The similarity (or distance) Rel(A,B) between impression word category A and impression word category B is obtained by (Equation 2) below. To obtain it, first, for the comment group γ_z that is included in the set Γ of all comments and is attached to a single image z, the strength of the relationship between an impression word α_x in category A and an impression word β_y in category B is computed as f(α_x, β_y, γ_z). When the co-occurrence frequency of α_x and β_y is considered, f is obtained by method (1), (Equation 3), below.

In (Equation 3), λ may take any value, but normally λ = 1. Applying (Equation 3) to (Equation 2) yields the similarity Rel(A,B) as the normalized sum of the co-occurrence frequencies between impression words over all comments. In method (1), f is intuitively a similarity, so a higher value means the impression word categories are more similar.

If, on the other hand, the co-occurrence frequency is weighted, f is obtained by method (2), (Equation 4), below. The weight reflects how strongly each impression word is related to the image. The underlying idea is that when two impression words co-occur, they are similar if the difference between their image impression relevance values is small, and dissimilar if the difference is large. This rests on the assumption that impression words with similar meanings will co-occur at about the same frequency. In method (2), f is intuitively a distance, so a higher value means the impression word categories are less similar.

In (Equation 4), g(α_x, γ_z) is the relevance between impression word α_x and the image in the comment group γ_z for an image z; likewise, g(β_y, γ_z) is the relevance between impression word β_y and the image. Both have already been obtained when the impression/image DB 30 is built according to the first embodiment.

Equations 2 to 4 referred to in the description above are as follows.

In (Equation 2), the normalization terms N_A and N_B are the numbers of distinct impression words belonging to impression word categories A and B, respectively; x and y are the subscripts used, in the part (α_x∈A, β_y∈B), to express the sum over the impression words α_x and β_y belonging to categories A and B; and the normalization term N_Γ is the total number of comments for a single image.
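The equations themselves are not reproduced in this text, so the following is only a minimal sketch of how the category similarity of method (1) could be computed from the description above. The function names, the layout of the hypothetical `db`, and the normalization by the number of images (standing in for N_Γ) are assumptions made for illustration, not the patent's exact (Equation 2) and (Equation 3).

```python
from itertools import product

def f_cooccur(a, b, image_words, lam=1.0):
    """Method (1) as described: lam if both impression words occur for one image, else 0."""
    return lam if a in image_words and b in image_words else 0.0

def rel(cat_a, cat_b, db, f=f_cooccur):
    """Sum f over every word pair (a in A, b in B) and every image, then normalize."""
    total = sum(f(a, b, words)
                for a, b in product(cat_a, cat_b)
                for words in db.values())
    # Assumed normalization: N_A * N_B * (number of images in the DB).
    return total / (len(cat_a) * len(cat_b) * len(db))

# Hypothetical impression/image DB: image id -> {image impression word: relevance}.
db = {
    "img1": {"pleasant": 0.5, "comfortable": 0.4},
    "img2": {"pleasant": 0.6, "comfortable": 0.2},
    "img3": {"pleasant": 0.3, "agreeable": 0.3},
    "img4": {"pleasant": 0.7, "agreeable": 0.1},
}

print(rel({"pleasant"}, {"comfortable"}, db))   # 0.5 (co-occur in 2 of 4 images)
print(rel({"comfortable"}, {"agreeable"}, db))  # 0.0 (never co-occur)
```

Because `rel` accepts sets of any size, the same call works for single words at the start of categorization and for merged categories later on.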

Using the value Rel(A,B) computed by (Equation 2) as the criterion, impression word categories A and B are categorized with a given hierarchical clustering method (the shortest distance method, the farthest distance method, the group average method, Ward's method, and so on). As noted above, a higher value means more similar in method (1), and a lower value means more similar in method (2). At the start of categorization, the value of (Equation 2) is computed with each impression word forming its own category.

Given an arbitrary threshold condition, clusters are merged until the number of clusters satisfies it, which forms impression word categories such as those shown in FIG. 9(b). The correspondence between categories and impression words is stored in the impression word conversion table 50, as shown in FIG. 9(c). Using this correspondence, document impression words and image impression words are converted into document impression categories and image impression categories, respectively, as described earlier, and the image search unit 3 performs the search.
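The merging procedure can be sketched as follows. This is an illustrative group-average agglomeration over a similarity (method (1) style, higher means more similar), stopping when the number of categories falls to a threshold. The names `group_average`, `categorize`, and `pair_sim`, and the example values, are hypothetical; the text leaves the choice of clustering method open.

```python
def group_average(cat_a, cat_b, pair_sim):
    """Average pairwise similarity between two categories (group average method)."""
    total = sum(pair_sim.get(frozenset((a, b)), 0.0) for a in cat_a for b in cat_b)
    return total / (len(cat_a) * len(cat_b))

def categorize(words, pair_sim, max_categories):
    """Merge the most similar pair of categories until max_categories (>= 1) remain."""
    categories = [frozenset([w]) for w in words]  # start: one word per category
    while len(categories) > max_categories:
        pairs = [(a, b) for i, a in enumerate(categories) for b in categories[i + 1:]]
        a, b = max(pairs, key=lambda p: group_average(p[0], p[1], pair_sim))
        categories = [c for c in categories if c not in (a, b)] + [a | b]
    # Conversion table in the spirit of table 50: impression word -> category id.
    return {w: idx for idx, cat in enumerate(categories) for w in cat}

# Hypothetical co-occurrence counts used as word-to-word similarity.
pair_sim = {
    frozenset(("pleasant", "comfortable")): 2.0,
    frozenset(("pleasant", "agreeable")): 2.0,
    frozenset(("comfortable", "agreeable")): 1.0,
}
table = categorize(["pleasant", "comfortable", "agreeable", "sad"], pair_sim, 2)
print(table)
```

For method (2), where the value is a distance, the same loop would pick the pair with the minimum value instead of the maximum.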

Next, a third embodiment is described, which realizes an image search that considers, in addition to the impression information, the template of the input document and the color scheme of the candidate images. The color scheme information extraction unit 7 obtains each color component in the regions (per pixel, per block, etc.) of the template corresponding to the input document, as described with FIG. 6. The color components of each region (per pixel, per block, background, foreground, etc.) are obtained in the same way for the images in the impression/image DB 30. These may be computed in advance for the stored images and saved together with them, or computed by the image search unit 3 at search time. Computing the distance between the color components of the template and those of an image in the DB gives the degree to which the color schemes of the template and the image match (hereinafter, the color scheme score).

Color histograms of the template and of the image are used as the information representing color components, and the color scheme score is obtained by computing the distance between the histograms. Let C be the color space (e.g., RGB), c_y its elements, i.e., the color components (e.g., R, G, B), T_cy(w) the frequency in bin w of the template's histogram for color component c_y, and h_izcy(w) the frequency in bin w of the histogram of image i_z for that component. The color scheme score CS_iz can then be obtained as their Euclidean distance by (Equation 5).

Here N_B is the number of histogram bins (the number of divisions of the color space) and is set arbitrarily. For example, dividing the histogram space into 10 gives 10 bins, so N_B = 10. The template histogram frequency T_cy(w) and the image histogram frequency h_izcy(w) are each normalized by the total of the frequencies. T'_cy(w) and h'_izcy(w) denote the frequencies before normalization, as in (Equation 6) and (Equation 7), and N_T and N_H are the respective frequency totals.
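As a rough illustration of the histogram comparison, the sketch below normalizes each channel histogram by its total count and takes the Euclidean distance over all channels and bins. The exact form of (Equation 5), for instance any additional scaling by N_B, is not reproduced in this text, so the function names and the unscaled distance are assumptions.

```python
import math

def normalize(counts):
    """Divide raw bin counts by their total (in the spirit of Equations 6 and 7)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def color_scheme_distance(template_hists, image_hists):
    """Euclidean distance over all color components and bins; smaller means closer colors."""
    squared = 0.0
    for channel, template_counts in template_hists.items():
        t = normalize(template_counts)
        h = normalize(image_hists[channel])
        squared += sum((tw - hw) ** 2 for tw, hw in zip(t, h))
    return math.sqrt(squared)

# Hypothetical 4-bin histograms per RGB channel (raw pixel counts).
template = {"R": [8, 2, 0, 0], "G": [5, 5, 0, 0], "B": [0, 0, 5, 5]}
similar = {"R": [16, 4, 0, 0], "G": [10, 10, 0, 0], "B": [0, 0, 10, 10]}
different = {"R": [0, 0, 2, 8], "G": [0, 0, 5, 5], "B": [5, 5, 0, 0]}

print(color_scheme_distance(template, similar))    # 0.0: same proportions
print(color_scheme_distance(template, different))  # clearly larger
```

Normalizing first makes the comparison independent of image size, which is why the template and an image with the same color proportions but twice the pixels give a distance of zero.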

The example above covers the case where all of R, G, and B of the RGB space are used, but the color scheme score may be computed from only some of them, e.g., R alone. The color space need not be RGB; other color spaces such as HSV may be used, and the score may be computed from the H component alone. Frequencies may be counted per pixel, per block of arbitrary size (rectangle), or per region of arbitrary shape (e.g., a circle). When counting per block, the most frequent color component in the block may be taken as its representative color component in computing the color frequencies of the whole image. Frequencies may also be weighted by position in the image (for example, so that the center weighs more than the edges). The distance is not limited to the Euclidean distance; another distance definition, such as the Mahalanobis distance, which accounts for the variance and correlation of each dimension, may be used. Weights may also be assigned per bin.

By taking a weighted linear sum of the color scheme score CS_iz obtained in this way and the impression word score (or impression category score) IS_iz described for the first (or second) embodiment, the image selection score of the third embodiment can be obtained as in (Equation 8). Here a is the integration weight, an arbitrary continuous value from 0.0 to 1.0.

If a = 1.0, this corresponds to the first and second embodiments, where only the impression word score or impression category score is used; conversely, if a = 0.0, only the color scheme score, the feature of the third embodiment, is used. To create a sense of unity with an image whose colors are close to the template's, an image with high similarity is used. Conversely, to create impact through a complementary-color effect with an image whose colors are far from the template's, the sign of the color scheme score CS_iz obtained by (Equation 5) is made negative before it is used in (Equation 8), for example, so that an image with low color similarity and a high impression word score is used. The third embodiment thus has the effect of realizing an impression-word search that takes the color scheme into account to an appropriate degree.
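The behavior described for the integration weight a is consistent with a convex combination of the two scores, which can be sketched as below. The form a·IS + (1 − a)·CS is inferred from the a = 1.0 and a = 0.0 cases in the text rather than copied from (Equation 8), and the `contrast` flag is a hypothetical way to express the sign flip.

```python
def selection_score(impression_score, color_score, a=0.5, contrast=False):
    """Weighted linear sum of impression score IS and color scheme score CS.

    a=1.0 uses only IS, a=0.0 uses only CS. With contrast=True the sign of CS
    is flipped, as the text suggests for a complementary-color effect.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0.0, 1.0]")
    cs = -color_score if contrast else color_score
    return a * impression_score + (1.0 - a) * cs

# Hypothetical candidates: image id -> (IS, CS).
candidates = {"img1": (0.8, 0.4), "img2": (0.6, 0.9)}
ranked = sorted(candidates,
                key=lambda i: selection_score(*candidates[i], a=0.5),
                reverse=True)
print(ranked)  # ['img2', 'img1']
```

With a = 0.5, img2 wins because its higher color scheme score outweighs its lower impression score; raising a toward 1.0 reverses the order.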

Next, a fourth embodiment is described, in which images are searched using keyword information. FIG. 10 shows an example of the keyword information extracted by the keyword extraction unit 6 from an input document such as a blog. As shown in FIG. 10, the keyword information (b) consists of keywords in the input document (a) and their keyword weights. Here, words such as nouns and verbs are called keywords to distinguish them from impression words. Keywords are extracted with an existing method, e.g., the tf-idf method with the tf-idf value as the keyword weight: a predetermined number of the most important words in the input document are extracted together with their weights. When the image search unit 3 searches the impression/image DB 30 using this keyword information, it can handle a search query such as "cute + dog", which improves the accuracy of selecting an image that directly reflects the impression of the input document from its content.

As shown in FIG. 7, the impression/image DB 30 stores tags for each image in addition to impression words. As described earlier, these can be the tags assigned in the comment/image DB 10, carried over directly by the DB impression information extraction unit 11. The image search unit 3 searches these tags using the keywords received from the keyword extraction unit 6 as keys, and computes the keyword score as follows.

Using the function E_iz(w_x, t_y) of (Equation 10), which determines whether a keyword w_x in the keyword set W of the input document matches a tag t_y in the tag set T_iz of an image i_z in the impression/image DB 30, the keyword score KS_iz, which represents the strength of the relationship between the keywords and the image, can be expressed by (Equation 9). Here ω_wx is the weight of keyword w_x; it may be set to ω_wx = 1 (unweighted), or a value such as the tf-idf computed for each word by the keyword extraction unit 6 may be used. N_W is the number of keywords in the document, and N_Tiz is the number of tags in the tag set T_iz of image i_z. Normally λ = 1.
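A minimal sketch of a keyword score along these lines is given below. It sums the keyword weight over every matching keyword-tag pair and divides by N_W × N_Tiz; this normalization and the function names are assumptions, since (Equation 9) and (Equation 10) are not reproduced in the text.

```python
def match(keyword, tag, lam=1.0):
    """Match indicator in the spirit of Equation 10: lam if keyword equals tag, else 0."""
    return lam if keyword == tag else 0.0

def keyword_score(keyword_weights, tags, lam=1.0):
    """Weighted count of keyword-tag matches, normalized by N_W * N_Tiz (assumed form)."""
    if not keyword_weights or not tags:
        return 0.0
    total = sum(weight * match(keyword, tag, lam)
                for keyword, weight in keyword_weights.items()
                for tag in tags)
    return total / (len(keyword_weights) * len(tags))

# Hypothetical keyword information (keyword -> weight, e.g. tf-idf) and image tags.
keywords = {"dog": 2.0, "walk": 1.0}
print(keyword_score(keywords, ["dog", "park"]))  # 0.5
print(keyword_score(keywords, ["sea", "sky"]))   # 0.0
```

Setting every weight to 1.0 gives the unweighted variant mentioned in the text.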

Besides the tags themselves, nouns and other words equivalent to tags may be extracted from the comments and used in place of tags.

By integrating this keyword score KS_iz with the impression word score or impression category score IS_iz of the first or second embodiment, which represents the relationship between impression words and the image, the image search unit 3 can obtain the image selection score S_iz of the fourth embodiment as in (Equation 11). Here b is the integration weight, an arbitrary continuous value from 0.0 to 1.0.

If b = 1.0, this coincides with the first and second embodiments, which use only the impression word score or impression category score; conversely, if b = 0.0, only the keyword score, the feature of the fourth embodiment, is used.

As a variant of the fourth embodiment, instead of using the keyword score as above, the search may be restricted to those images in the impression/image DB 30 whose tags include a keyword, and the image search unit 3 may then search them with impression words or impression categories as keys, as in the first or second embodiment.

In this case, the search targets are images that have an image impression word corresponding to a document impression word extracted from the input document and whose tags include at least one of the input keywords; only the impression word score or impression category score IS_iz is computed, by (Equation 12), and used as the image selection score.

As a result, the fourth embodiment makes it possible to search for images that express the impression of the input document while taking the content of the image into account to an appropriate degree through impression words and the color scheme.

Similarly, an impression-word image search that adds to the first or second embodiment both the color scheme of the third embodiment and the keywords of the fourth embodiment can be realized by taking a weighted linear sum of the three scores: the color scheme score, the keyword score, and the impression word score. In this case there are two integration weight parameters, a and b, and the score is obtained by (Equation 13).

This makes it possible to search for images that express the impression while taking both the color scheme and the content of the image into account.

Next, as a fifth embodiment, the case is described in which the image search unit 3 searches for abstract images using the result that the co-occurrence information extraction unit 8 obtains from the frequent co-occurrence word list 80. FIG. 11 shows the flow of the abstract image search.

The images searched for and selected in the fifth embodiment are called abstract images for the following reason. Even when the image selection apparatus of the present invention selects a concrete image as one expressing an impression, that image may not be appropriate. For example, if a blog article about a "cat" is being written and its impression is "cute", an image of a "cute dog" may not be suitable. An image of a "cute dog" expresses the impression "cute", but it also depicts the concrete object "dog". To express an impression with an image that depends on such objects as little as possible, an image that is as abstract as possible is desirable. To obtain abstract images, then, images associated with words that frequently co-occur with the impression word, such as "dog" with "cute", can be omitted from the candidates.

From this viewpoint, the fifth embodiment has the effect that an abstract image expressing an impression can be obtained. Although the purpose of selecting an effective image is shared, this approach is the opposite of the one in the third embodiment, which picks keywords directly from the input document and retrieves images whose tags match them; the fifth and third embodiments therefore should not be used together (although this is formally possible).

In the fifth embodiment, to realize the abstract image search, the co-occurrence information extraction unit 8 extracts words that co-occur with the impression words extracted by the impression information extraction unit 2. From the frequent co-occurrence word list 80, it extracts, in order of co-occurrence frequency, the words that frequently co-occur with the document impression words extracted by the impression information extraction unit 2. The frequent co-occurrence word list 80 is created for each impression word and stores the "word" that co-occurs with the impression word, the "frequency" with which it co-occurred, and the "rank" giving its position when sorted by frequency. Words that frequently co-occur with an impression word are called frequent co-occurrence words.

The image search unit 3 receives the results of the impression information extraction unit 2 and the co-occurrence information extraction unit 8, and performs the image search using the impression words together with a predetermined number of top frequent co-occurrence words whose "frequency" or "rank" passes a predetermined threshold. When searching the impression/image DB 30, not only the impression words are used as queries, as in the first and second embodiments, but also the frequent co-occurrence words. Images in the impression/image DB 30 that include the impression word and do not include a frequent co-occurrence word in their tags are selected. As is clear from the example of the impression/image DB 30 in FIG. 7, the impression word query is matched against image impression words and the frequent co-occurrence word query against tags, and images that satisfy all of these queries are selected.

Thus, in the fifth embodiment, taking (Equation 1) of the first embodiment as an example, the image search unit 3 can compute the image selection score by reading the equation as applying to images i_z that have an image impression word k_x matching a document impression word k_x and whose tags contain none of the top predetermined frequent co-occurrence words of any document impression word k_x. The second to fourth embodiments can be combined with the fifth embodiment by the same reinterpretation.

The abstract image search is explained using the concrete example of FIG. 11. In FIG. 11(a), the document impression information extraction unit 2 has extracted impression words; FIG. 11 shows, starting from there, how frequent co-occurrence words are obtained for "cute", one of the extracted impression words. Each impression word in (a) is passed from the document impression information extraction unit 2 to the co-occurrence information extraction unit 8, which reads the frequent co-occurrence words of each impression word from the frequent co-occurrence word list 80. (b) shows example data for the part of the list 80 corresponding to the impression word "cute".

From the frequent co-occurrence words listed for an impression word, the co-occurrence information extraction unit 8 extracts those whose frequency satisfies a predetermined criterion, in order of co-occurrence frequency. In the example the criterion is a frequency of 10 or more; as shown in (c), the frequent co-occurrence word "dog" is extracted for the impression word "cute" and passed to the image search unit 3. The image search unit 3 therefore searches the impression/image DB 30 shown in (e) using the impression word "cute" and the frequent co-occurrence word "dog". "Image 1" has "cute" among its impression words in the image DB, but "dog" is among its tags, so it is not selected. "Image 2" has "cute" among its impression words and no "dog" among its tags, so it is selected, as shown in (f).
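The selection logic of this example can be sketched as a two-step filter. The frequencies, the tags of image 2 and image 3, and the function names below are hypothetical; only the threshold of 10 and the outcome for "image 1" and "image 2" follow the description of FIG. 11.

```python
def frequent_cooccurring(cooccurrence_list, min_frequency=10):
    """Words whose co-occurrence frequency meets the criterion, most frequent first."""
    ranked = sorted(cooccurrence_list.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, freq in ranked if freq >= min_frequency]

def select_abstract_images(db, impression_word, excluded_words):
    """Keep images that have the impression word and none of the excluded words as tags."""
    excluded = set(excluded_words)
    return [image for image, entry in db.items()
            if impression_word in entry["impressions"]
            and not excluded & set(entry["tags"])]

# Hypothetical slice of the frequent co-occurrence word list for "cute".
cooccurrence_for_cute = {"dog": 15, "cat": 8, "baby": 5}

# Hypothetical impression/image DB entries.
db = {
    "image 1": {"impressions": {"cute"}, "tags": {"dog"}},
    "image 2": {"impressions": {"cute"}, "tags": {"flower"}},
    "image 3": {"impressions": {"calm"}, "tags": {"sea"}},
}

excluded = frequent_cooccurring(cooccurrence_for_cute, min_frequency=10)
print(excluded)                                      # ['dog']
print(select_abstract_images(db, "cute", excluded))  # ['image 2']
```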

Above, the restriction on selected images using frequent co-occurrence words excludes images whose tags include a frequent co-occurrence word. As a variant, the comments in the comment/image DB 10, which as described earlier serves as the primary database for the impression/image DB 30, may also be kept in the impression/image DB 30, and images whose comments include a frequent co-occurrence word may be excluded from selection.

Furthermore, images in which at least one of the tags and the comments includes a frequent co-occurrence word may be excluded from the candidates, or may instead be ranked lower among them. In the ranking, images that do not include a frequent co-occurrence word are placed higher and images that include one are placed lower. Such rank weighting by whether a frequent co-occurrence word is included can be done, for example, by applying (Equation 1) of the first embodiment to the fifth embodiment as in (Equation 14).

Here, as in (Equation 1), K = {k_x | x = 1, 2, ..., N_K} is the set of impression words extracted from the input document, and N_K is the number of those impression words. C(k_x) is a weighting coefficient that depends on whether a frequent co-occurrence word is included: for example, C(k_x) = C₁ if the tags etc. of image i_z contain a frequent co-occurrence word of the image impression word k_x of that image, and C(k_x) = C₂ otherwise, where C₁ and C₂ are predetermined constants satisfying 0 ≤ C₁ < C₂. The second to fourth embodiments can be combined with the fifth embodiment in the same way.
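The effect of the coefficient C(k_x) can be illustrated with the sketch below. Neither (Equation 1) nor (Equation 14) appears in this text, so the base score is assumed here to be the average, over the N_K document impression words, of the product of document relevance and image relevance; that form, the function name, and the constants C₁ = 0.5 and C₂ = 1.0 are all assumptions for illustration.

```python
def weighted_selection_score(doc_relevance, image_relevance, image_tags,
                             frequent_words, c1=0.5, c2=1.0):
    """Assumed Equation 1-style score with a per-word penalty C(k_x).

    C(k_x) = c1 when the image tags contain a frequent co-occurrence word of
    impression word k_x, and c2 otherwise (0 <= c1 < c2).
    """
    if not doc_relevance:
        return 0.0
    tags = set(image_tags)
    total = 0.0
    for word, doc_rel in doc_relevance.items():
        if word not in image_relevance:
            continue  # impression word not attached to this image
        penalized = bool(tags & set(frequent_words.get(word, ())))
        coefficient = c1 if penalized else c2
        total += coefficient * doc_rel * image_relevance[word]
    return total / len(doc_relevance)

doc = {"cute": 0.8}                      # hypothetical document impression relevance
frequent = {"cute": ["dog"]}             # hypothetical frequent co-occurrence words
with_dog = weighted_selection_score(doc, {"cute": 0.5}, ["dog"], frequent)
without_dog = weighted_selection_score(doc, {"cute": 0.5}, ["flower"], frequent)
print(with_dog < without_dog)  # True: the "dog" image is ranked lower, not removed
```

Setting c1 to 0 reproduces hard exclusion of such images, while a value between 0 and c2 only lowers their rank.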

So far the fifth embodiment has been described assuming that the frequent co-occurrence word list 80 is given in advance. The construction of this list is now explained with FIG. 13. The configuration that performs the construction is that of FIG. 12, as described earlier. As explained below, the DB text search unit 802 searches the second text DB 801 using each predetermined impression word as a key and creates a co-occurrence word frequency list from the results; the second text DB 801 is then searched again using each impression word together with the words of the co-occurrence word frequency list as keys, and the frequent co-occurrence word list 80 is built from the results.

Using each impression word shown in FIG. 13(a) as a key, a text search is performed on the second text DB 801 in FIG. 13(b). An existing text search engine may be used for the second text DB 801. The impression words are the given impression words used in common throughout the apparatus of the present invention, the same ones used by the impression information extraction unit 2 and the DB impression information extraction unit 11. As shown in (c), a predetermined number of Web pages matching each key ("cute" in this example) are retrieved, the co-occurrence frequency between each word in those pages and the key impression word is computed, and a co-occurrence word frequency list like (d) is created. This gives a first rough picture of the words that co-occur with the impression word. Although the documents searched by the text search engine are described as Web pages, Web pages are simply the preferred choice when the image selection apparatus of the present invention is applied to blog documents; other general documents may be searched instead.

Next, using each combination of the impression word and a word in this co-occurrence word frequency list as a key, the text search engine of (b) is queried again as shown in (e), and the number of hits shown in (f) becomes the frequency in the frequent co-occurrence word list, as in (g).

For combining the words in the second search of (e), either an AND search such as "impression word" AND "co-occurrence word" may be performed, or the impression word and co-occurrence word may be searched as a single phrase. The former, e.g., "cute" AND "dog", hits even when "cute" and "dog" are not close together in the text of the Web page. The latter searches with the combination "cute dog" as a single key, so only pages whose text contains the sequence "cute dog" hit. Alternatively, instead of co-occurrence within a page, only co-occurrence within a single sentence (intra-sentence co-occurrence) may count as a hit. For example, the two words co-occur within the sentence "That dog is cute." In contrast, in "That dog is big. But it is cute." they co-occur within the page but not within a sentence, so there is no hit. If word-to-word co-occurrence relations already created by some other method are available, they may be used instead.
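The three ways of counting a hit can be compared with a small sketch over in-memory page texts. This stands in for the text search engine, which the description treats as an existing component; the function names are hypothetical, and the plain substring matching used here is a simplification (Japanese text would need proper word segmentation).

```python
import re

def page_cooccurs(page, a, b):
    """AND search: both words appear anywhere in the page."""
    return a in page and b in page

def phrase_occurs(page, a, b):
    """Phrase search: the two words appear as one adjacent sequence."""
    return f"{a} {b}" in page

def sentence_cooccurs(page, a, b):
    """Intra-sentence co-occurrence: both words appear within one sentence."""
    sentences = re.split(r"[.!?。]", page)
    return any(a in s and b in s for s in sentences)

def hit_count(pages, a, b, mode):
    """Number of pages counted as hits under the given matching mode."""
    return sum(1 for page in pages if mode(page, a, b))

pages = [
    "That dog is cute.",
    "That dog is big. But it is cute.",
    "A cute dog ran by.",
]
print(hit_count(pages, "cute", "dog", page_cooccurs))      # 3
print(hit_count(pages, "cute", "dog", phrase_occurs))      # 1
print(hit_count(pages, "cute", "dog", sentence_cooccurs))  # 2
```

The hit count under the chosen mode would then serve as the frequency stored in the frequent co-occurrence word list.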

The user who inputs the input document may be allowed to select which of the first to fifth embodiments described above is applied. In that case, the image selection apparatus of the present invention receives the selection instruction together with the input document and applies the selected embodiment.

FIG. 14 is a block diagram of a computer system that executes all or some of the image selection functions of the present invention described above. Its main components are: a drive device 100 that reads all or part of an image selection program recorded on a recording medium such as a CD-ROM (hereinafter, this whole or part is referred to as the program of the present invention); an interface (I/F) 101 that transmits and receives data via a network; an auxiliary storage device 102, consisting of, for example, an HDD or flash memory, that temporarily stores, together with the operating system, the read program of the present invention and the data needed to execute it; an input device 103, such as a keyboard or mouse, through which input documents such as blogs, color arrangement information, and the like are entered; an output device 104, such as a display, that outputs the execution results of the program of the present invention; a ROM 105 in which various data and programs are stored in a nonvolatile manner; a CPU 106 that executes the program of the present invention; and a RAM 107 that provides a work area for the CPU 106.

Instead of being read from the recording medium by the drive device 100, the program of the present invention may be read from another computer on the network via the I/F 101. When the program of the present invention is executed on multiple computers, intermediate data may be exchanged via the I/F 101. The input document and color arrangement information may also be received from another computer via the I/F 101 rather than from the input device 103. Accordingly, the input unit 1 in FIG. 2 corresponds to the input device 103 or the I/F 101, and the output unit 4 corresponds to the output device 104 or the I/F 101.

Accordingly, the functions of, for example, the impression information extraction unit 2, the image search unit 3, the impression word conversion unit 5, the keyword information extraction unit 6, the color arrangement information extraction unit 7, the co-occurrence information extraction unit 8, the DB impression information extraction unit 11, the DB impression word conversion unit 12, the categorizing unit 13, and the DB text search unit 802 each correspond to a function of the CPU 106, the ROM 105, the RAM 107, and so on operating according to the program of the present invention. The first text DB 9, the comment/image DB 10, the impression/image DB 30, the impression word conversion table 50, the frequent co-occurrence word list 80, the second text DB 801, and the like correspond to a function of the auxiliary storage device 102.

The first text DB 9 and the second text DB 801 combine a function of the auxiliary storage device 102 with functions of the CPU 106 and the like so that the searches described above are possible. The comment/image DB 10, the text DB 801, and the like serve as a function of the auxiliary storage device 102, but may instead reside on another computer accessed via the I/F 101.

As described above, the present invention provides the following effects. The first embodiment makes it easy to attach to a blog an image suited to the impression of the blog text. The second embodiment converts impression words into impression word categories, which may improve image search accuracy. The third embodiment extracts keywords from the document and uses them together with impression words in the search, so that an image expressing the impression can be obtained while also taking the image content into account. The fourth embodiment performs the image search in consideration of how well the color arrangement of the image fits the color arrangement of the blog template, so that images matching the color arrangement can be obtained accurately. This yields a blog with a unified look (or one with impact, for example through the use of complementary colors). The fifth embodiment uses abstract image search to obtain more abstract, general-purpose images that express an impression, which can increase the fit with the blog article.

In the above description of the present invention, the impression words were all Japanese words such as かわいい (“cute”), but impression words may also be English words such as “cute”. In that case, the correspondence between Japanese and English impression words (かわいい: cute) is established in advance. The invention can therefore also handle cases where many comments are in English, as on image sharing sites such as Flickr. Moreover, the invention is not limited to blogs and can be expected to attach suitable images to documents in general, including presentation slides.
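One way to picture the pre-established correspondence between Japanese and English impression words is a simple lookup table applied before impression words are counted. The sketch below is only an illustration under that assumption: the table `EN_TO_JA`, the function names, and the sample comment are hypothetical, and only the かわいい/cute pair comes from the description.

```python
# Hypothetical correspondence table; only the かわいい / cute pair
# is taken from the description.
EN_TO_JA = {"cute": "かわいい"}

def to_canonical(word, table=EN_TO_JA):
    """Map an English impression word to its Japanese counterpart, if known."""
    return table.get(word.lower(), word)

def impression_counts(comment_words, impression_words, table=EN_TO_JA):
    """Count how often each given impression word occurs in a comment,
    after mapping English words onto the Japanese impression words."""
    counts = {}
    for w in comment_words:
        canonical = to_canonical(w, table)
        if canonical in impression_words:
            counts[canonical] = counts.get(canonical, 0) + 1
    return counts

# An English comment, already split into words, as might be found on an
# image sharing site.
counts = impression_counts(["So", "cute", "Cute", "dog"], {"かわいい"})
```

Mapping both languages onto one canonical set of impression words means the rest of the processing (frequency counting and matching) does not need to know which language a comment was written in.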

2 ... Impression information extraction unit, 3 ... Image search unit, 30 ... Impression/image DB

Claims

An image selection device that selects and outputs an image suitable for a document, comprising:
An impression information extraction unit that extracts, from an input document, words included in the input document, determines document impression words based on the extracted words and given impression words, and obtains, for each of the document impression words, a document impression relevance level representing the strength of the impression of that document impression word in the input document;
An impression / image database that stores, in association with one another, a search target image, an image impression word including the given impression word, and an image impression relevance level of the search target image with respect to each of the image impression words; and
An image search unit that receives the document impression words and the document impression relevance levels, searches the search target images in the impression / image database for images whose corresponding image impression words include a word matching a document impression word, and selects, from among the retrieved images, an image suitable for the input document based on the document impression relevance level and the image impression relevance level corresponding to each search-matched word.

A comment / image database for storing images and comments for the images in association with each other;
A database impression information extraction unit for constructing the impression / image database from the comment / image database,
The database impression information extraction unit obtains each image stored in the comment / image database as the search target image, and based on the comment associated with the image and the given impression word, the image impression word The image selection apparatus according to claim 1, wherein the impression / image database is constructed by determining the image impression relevance level.

The image and comment stored in the comment / image database include an image posted by a plurality of users using an image posting system via a network and a comment given to the image by a viewing user of the image. The image selection device according to claim 2.

A first text database for storing a predetermined document and responding to a search for the document;
The impression information extraction unit defines each of the given impression words as the document impression word, and uses the pair of each of the document impression words and each of the words extracted from the input document as a search key. A text database is searched to determine the number of hits, and the document impression relevance is determined based on the sum of the number of hits of each word determined by the search in pairs with the document impression word,
The database impression information extraction unit defines each of the given impression words as the image impression word, extracts a word included in a comment associated with the image, and extracts the word from each of the image impression word and the comment. The first text database is searched using a pair with each of the extracted words as a search key to obtain the number of hits, and the degree of image impression relevance is determined for each word obtained by the search in pairs with the image impression word. The image selection apparatus according to claim 2, wherein the image selection apparatus is obtained based on a sum of hit numbers.

The impression information extraction unit determines a word that matches the given impression word among the words extracted from the input document as the document impression word, and sets the document impression relevance of the word corresponding to the document impression word. Obtained based on the frequency in the input document,
The database impression information extraction unit extracts a word from the comment, obtains a word that matches the given impression word among the extracted words as the image impression word for the search target image corresponding to the comment, 4. The image selection apparatus according to claim 2, wherein the image impression relevance level is obtained based on a frequency in the comment of a word corresponding to the image impression word.

An impression word conversion table that groups each of the given impression words and associates them with an impression group;
An impression word conversion unit that converts an impression word into an impression category with reference to the impression word conversion table,
The impression information extraction unit passes the determined document impression word to the impression word conversion unit to convert it into a document impression category, and the document impression relevance level is a sum of document impression relevance levels for document impression words belonging to each document impression category. Updated to be based on
The database impression information extraction unit passes the determined image impression word to the impression word conversion unit to convert the image impression word into an image impression category, and sets the image impression relevance level of the image impression relevance level for image impression words belonging to each image impression category. Updated to be based on the sum, and restructured the impression / image database to store the search target image, the converted image impression category, and the updated image impression relevance level in association with each other,
The image search unit receives the document impression category and the updated document impression relevance level, and matches the document impression category with a corresponding image impression category from a search target image in the reconstructed impression / image database. Search for an image in which a category exists, and from among the searched images, the document impression relevance level corresponding to each category that matches the search, and the image impression relevance level corresponding to each category that matches the search 6. The image selection apparatus according to claim 2, wherein an image suitable for the input document is selected based on the input document.

A categorizing unit that reads the impression / image database before being reconstructed and constructs the impression word conversion table,
The categorizing unit obtains a similarity between impression words based on a co-occurrence relationship between image impression words associated with each search target image stored in the impression / image database, and the impression is based on the similarity. The image selection apparatus according to claim 6, wherein the impression word conversion table is constructed by grouping words.

The image selection apparatus according to claim 7, wherein the categorizing unit obtains the similarity using the image impression relevance level associated with the image impression word in the impression / image database.

A keyword extraction unit for extracting a keyword included in the input document from the input document;
In the impression / image database, a tag representing the content of the image is assigned to each of the search target images and stored,
The image search unit receives the keyword, and when performing the search from the search target image in the impression / image database, the search target is limited to only the search target image including the keyword in the assigned tag. 9. The image selection apparatus according to claim 1, wherein a search is performed.

A keyword extraction unit for extracting a keyword included in the input document and a weight of the keyword in the input document from the input document;
In the impression / image database, a tag representing the content of the image is assigned to each of the search target images and stored,
The image search unit receives the keyword and the weight, and from the search target image in the impression / image database, an image in which a word matching the document impression word exists in a corresponding image impression word, or a corresponding tag An image having a tag that matches the keyword is searched, and from the searched images, the document impression related degree and the image impression related degree corresponding to the search matched words or the search match is found. 9. The image selection apparatus according to claim 1, wherein an image suitable for the input document is selected based on at least one of the weights of the keywords corresponding to the tags.

A frequent co-occurrence word list listing the top predetermined number of frequent co-occurrence words for each of the given impression words;
A co-occurrence information extraction unit that receives the document impression words and extracts the frequent co-occurrence words for each of the document impression words with reference to the frequent co-occurrence word list;
In the impression / image database, a tag representing the content of the image is assigned to each of the search target images and stored,
The image search unit receives frequent co-occurrence words for the document impression word, and searches the search target image in the impression / image database for an image in which a word matching the document impression word exists in the corresponding image impression word. And preferentially selecting an image in which the frequent co-occurrence word corresponding to the search-matched impression word does not match the assigned tag as an image suitable for the input document. The image selection apparatus according to any one of 1 to 10.

A second text database for storing a predetermined document and responding to a search for the document;
A database text search unit for constructing the frequent co-occurrence word list from the second text database,
The database text search unit causes the second text database to search for a document containing a word that matches the search key using each of the given impression words as a search key, and from the predetermined number of documents obtained by the search, The upper predetermined number of words that co-occur with a given impression word are extracted, and for each of the co-occurring words, the second text database using the co-occurring word and the given impression word as a search key The above-mentioned predetermined number of frequently occurring co-occurrence words for the given impression word is obtained by re-searching to obtain the number of hit documents and ranking the co-occurrence words according to the number of hit documents. The image selection apparatus according to claim 11, wherein the frequent word list is constructed.

A color arrangement information extraction unit that extracts template color arrangement information from a template used together with the input document;
The image search unit receives the template color arrangement information, searches the search target images in the impression / image database for images whose corresponding image impression words include a word matching the document impression word, or for similar color arrangement images whose color arrangement is similar to the template color arrangement information, and selects, from among the retrieved images, an image suitable for the input document based on the document impression relevance level and the image impression relevance level corresponding to each search-matched word, or on the color arrangement similarity of each retrieved similar color arrangement image. The image selection apparatus according to claim 1.

An image selection method for selecting and outputting an image suitable for a document, comprising:
Impression information extraction means for extracting, from an input document, words included in the input document, determining document impression words based on the extracted words and given impression words, and obtaining, for each of the document impression words, a document impression relevance level representing the strength of the impression of that document impression word in the input document;
Impression / image storage means for storing, in association with one another, a search target image, an image impression word including the given impression word, and an image impression relevance level of the search target image with respect to each of the image impression words; and
Image search means for receiving the document impression words and the document impression relevance levels, searching the search target images in the impression / image database for images whose corresponding image impression words include a word matching a document impression word, and selecting, from among the retrieved images, an image suitable for the input document based on the document impression relevance level and the image impression relevance level corresponding to each search-matched word.

An image selection program for selecting and outputting an image suitable for a document, the program causing a computer to function as:
Impression information extraction means for extracting, from an input document, words included in the input document, determining document impression words based on the extracted words and given impression words, and obtaining, for each of the document impression words, a document impression relevance level representing the strength of the impression of that document impression word in the input document;
Impression / image storage means for storing, in association with one another, a search target image, an image impression word including the given impression word, and an image impression relevance level of the search target image with respect to each of the image impression words; and
Image search means for receiving the document impression words and the document impression relevance levels, searching the search target images in the impression / image database for images whose corresponding image impression words include a word matching a document impression word, and selecting, from among the retrieved images, an image suitable for the input document based on the document impression relevance level and the image impression relevance level corresponding to each search-matched word.