JP2011180824A - Device and method for imparting word meaning classified content, and program - Google Patents

Device and method for imparting word meaning classified content, and program

Info

Publication number
JP2011180824A
JP2011180824A JP2010044257A JP2010044257A JP2011180824A JP 2011180824 A JP2011180824 A JP 2011180824A JP 2010044257 A JP2010044257 A JP 2010044257A JP 2010044257 A JP2010044257 A JP 2010044257A JP 2011180824 A JP2011180824 A JP 2011180824A
Authority
JP
Japan
Prior art keywords
meaning
word
search
headword
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2010044257A
Other languages
Japanese (ja)
Inventor
Sanae Fujita
早苗 藤田
Masaaki Nagata
昌明 永田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2010044257A
Publication of JP2011180824A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

PROBLEM TO BE SOLVED: To provide a device for assigning content by word sense that, given a headword and definition sentences (or similar text) indicating its senses, assigns appropriate content such as images and videos to each sense, and that broadly covers multiple senses, including minor ones.

SOLUTION: For a headword, the device receives a list in which a plurality of search word candidates, each labeled with its relation to the headword, are grouped by word sense. A search word selection unit selects and outputs at least one search word for each sense on the basis of those relations. A content search unit acquires predetermined content for each sense of the headword by searching a communication network using the search words corresponding to that sense.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to natural language analysis technology and to technology for retrieving content such as images and video. In particular, it relates to a device, a method, and a program for assigning content by word sense which, given a word and a definition sentence (or similar text) indicating its sense, acquire, assign, or retrieve content appropriate to the meaning of the word.

A vast number of images and videos circulate on the Web. If these can be associated with existing dictionaries, visual information that cannot be obtained from character strings alone becomes available. For example, if an appropriate image is displayed when a word is looked up, the sense should be easier to understand even for children and non-native speakers. For polysemous words in particular, presenting an appropriate image for each sense should make the differences between senses easier to grasp intuitively.

Prior art for attaching images to a dictionary (thesaurus) includes the techniques disclosed in Non-Patent Documents 1 to 3. In recent years, mechanisms by which large numbers of users share and tag data (folksonomy) have developed, and large quantities of tagged images and videos have accumulated. For example, ImageNet, described in Non-Patent Document 3, attaches a large number of images to a subset of synsets (groups of synonyms) selected from WordNet (Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press.), an existing thesaurus. The images are collected by Web search, and whether each image is appropriate for the target synset is judged manually using AMT (Amazon Mechanical Turk, http://www.mturk.com/).

[Non-Patent Document 1] Francis Bond, Hitoshi Isahara, Sanae Fujita, Kiyotaka Uchimoto, Takayuki Kuribayashi, and Kyoko Kanzaki, "Enhancing the Japanese WordNet", Proceedings of the 7th Workshop on Asian Language Resources, ACL-IJCNLP, 2009, p.1-8
[Non-Patent Document 2] Atsushi Fujii, Tetsuya Ishikawa, "Image disambiguation by text processing and its application to an encyclopedia search site" (in Japanese), Proceedings of the 11th Annual Meeting of the Association for Natural Language Processing (NLP-2005), 2005, p.1002-1005
[Non-Patent Document 3] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li and Li Fei-Fei, "ImageNet: A Large-Scale Hierarchical Image Database", IEEE Conference on Computer Vision and Pattern Recognition, 2009, p.248-255

Non-Patent Document 1 attaches images acquired from the Open Clip Art Library (OCAL) to the Japanese WordNet (http://nlpwww.nict.go.jp/wn-ja/). Image candidates are obtained by comparing the hierarchical structures of OCAL and WordNet, and whether each candidate is appropriate as an image for a synset (semantic class) is then judged manually. This method has the advantage that OCAL is copyright-free and redistributable, but because the variety of images is limited, it is difficult to cover the senses of a dictionary broadly.

Non-Patent Document 2 associates images acquired from the Web with the senses in the dictionary search system CYCLONE (http://cyclone.cl.cs.titech.ac.jp/). The sense to which an image should correspond is estimated by disambiguating the text that links to the image. However, when the senses of a word differ greatly in frequency, a search using only the headword returns images for almost nothing but the most common sense. For example, in a search for the headword "arch" (アーチ), no image showing a "home run" appeared among at least the top 500 images. A search by headword alone thus makes it difficult to acquire appropriate images for minor senses.

The method of Non-Patent Document 3 is excellent in that it can collect a large amount of data with high accuracy, but because the target synsets are currently limited, its coverage of senses remains questionable. ImageNet also expands search terms with hypernyms. Hypernyms work well in some cases but obscure the meaning of the word in others. For example, expanding "niboshi" (煮干し, small dried fish) with the hypernym "food" (食品) returns many results for "foods made with niboshi". Using only hypernyms for expansion can thus actually lower search precision.

In view of the above problems of the prior art, an object of the present invention is to provide a device, a method, and a program for assigning content by word sense which, given a headword and a definition sentence (or similar text) indicating each of its senses, can assign appropriate content such as images and videos to each sense and can broadly cover multiple senses, including minor ones.

The content assignment device by word sense of the present invention comprises a search word selection unit and a content search unit.

The search word selection unit receives, for a headword, a list in which a plurality of search word candidates, each labeled with its relation to the headword, are grouped by word sense, and selects and outputs one or more search words for each sense on the basis of those relations.

The content search unit acquires predetermined content for each sense of the headword by searching a communication network using the search words corresponding to that sense.

According to the device, method, and program for assigning content by word sense of the present invention, given a headword and a definition sentence (or similar text) indicating each of its senses, appropriate content such as images and videos can be assigned to each sense, and multiple senses can be broadly covered, including minor ones.

FIG. 1 is a block diagram showing a configuration example of the content assignment device 100.
FIG. 2 shows an example of the processing flow of the content assignment device 100.
FIG. 3 shows an example of the dictionary 10.
FIG. 4 shows the senses and definition sentences extracted from FIG. 3.
FIG. 5 shows another example of the dictionary 10.
FIG. 6 shows the senses and definition sentences extracted from FIG. 5.
FIG. 7 shows an example of a list generated from the definition sentences of FIG. 4, in which a plurality of search word candidates, each labeled with its relation to the headword, are grouped by sense.
FIG. 8 shows variations of the relation between a headword and a search word candidate.
FIG. 9 shows only the words extracted from the explanatory text of FIG. 5 that are synonyms of the headword "EU".
FIG. 10 shows an example of a screen display when images are attached to the dictionary of FIG. 3.
FIG. 11 shows an example of a screen display when images are attached to the extracted information of FIG. 4.
FIG. 12 is a block diagram showing a configuration example of the content assignment device 100'.
FIG. 13 shows an example in which search results containing image content are displayed by sense for a search on the headword "sunflower" (ヒマワリ).

Embodiments of the present invention are described in detail below.

FIG. 1 is a block diagram showing a configuration example of the content assignment device 100 of the present invention, and FIG. 2 shows an example of its processing flow. The device 100 comprises a search word extraction unit 110, a search word selection unit 120, and a content search unit 130. The following description takes Japanese as the target natural language, but any natural language may be used, such as English, Chinese, Spanish, German, or French.

The search word extraction unit 110 receives as input a dictionary or the like that contains at least a headword and texts each expressing one of the one or more senses of that headword. For each sense, it extracts from the text a plurality of search word candidates that have a predetermined relation to the headword, and generates a list in which the candidates, each labeled with its relation to the headword, are grouped by sense (S1).

Various resources can serve as the dictionary 10 input to the content assignment device 100: for example, the basic word semantic database Lexeed (see Reference 1 and FIG. 3), a user-contributed dictionary on the Web (http://ja.wikipedia.org/), or WordNet, a thesaurus that includes explanatory text for its semantic classes (synsets). The input need not be organized as a dictionary; headwords and passages that appear to be their explanations, extracted from the Web, may also be input.
[Reference 1] Kaname Kasahara, Hiroshi Sato, Francis Bond, Takaaki Tanaka, Sanae Fujita, Tomoko Kanasugi, Akinari Amano, "Construction of the 'Basic Word Semantic Database: Lexeed'" (in Japanese), Information Processing Society of Japan Research Report, 2004, 2004-NLC-159, p.75-82

FIGS. 3 and 5 show examples of the dictionary 10. FIG. 3 is the case where a basic word semantic database is used as the dictionary 10. It compiles related information for the three senses of the headword "arch" (アーチ), from which the headword and the definition sentences of its three senses can be extracted as shown in FIG. 4. FIG. 5 is the case where a Wikipedia (registered trademark) disambiguation page is used as the dictionary 10. From it, the definition sentences of seven senses of the headword "EU" can be extracted as shown in FIG. 6. From the page of FIG. 5, it is also possible to extract three, two, and two definition sentences for the three sub-headwords "EU", "Eu", and "eu", respectively, under the headword "EU".

From the headword extracted in this way and the texts (definition sentences, explanatory sentences, and the like) each expressing one of its senses, the search word extraction unit 110 extracts a plurality of search word candidates having a predetermined relation to the headword, and generates a list in which the candidates, each labeled with its relation to the headword, are grouped by sense. FIG. 7 is an example of such a list generated from the definition sentences of FIG. 4. In this list, nearly all candidates that can be extracted from the definition sentences of FIG. 4 under the relations synonym, hypernym, related word, and domain have been extracted, but as shown in FIG. 8, any relation may be set for extraction, such as hyponym, part-whole, abbreviation, or alias. Extraction may also focus on particular relations: for example, extracting only synonyms, extracting only synonyms and hypernyms, or extracting related words with weights. For example, FIG. 9 shows only the words extracted from the explanatory text of FIG. 5 that are synonyms of the headword "EU" (the sense numbers are assigned for convenience). Other possibilities include extracting synonyms when they exist and hypernyms otherwise, using different relations for different senses so as to obtain different search words for each sense, or extracting words from a definition that appear infrequently in the other definition sentences.
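The list produced in step S1 can be pictured as a mapping from sense number to relation-labeled candidates. The following is a minimal sketch under stated assumptions: the names `Candidate` and `build_candidate_list` are hypothetical, and the sample triples are illustrative only (the description confirms "home run" for sense 3 of "arch"; the other entries are made up and are not the contents of FIG. 7).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One search word candidate and its relation to the headword."""
    term: str
    relation: str  # e.g. "synonym", "hypernym", "related", "domain"


def build_candidate_list(triples):
    """Group (sense_id, term, relation) triples into {sense_id: [Candidate, ...]}."""
    by_sense = defaultdict(list)
    for sense_id, term, relation in triples:
        by_sense[sense_id].append(Candidate(term, relation))
    return dict(by_sense)


# Illustrative input for the headword "arch"; not taken from FIG. 7.
triples = [
    (1, "structure", "hypernym"),
    (1, "architecture", "domain"),
    (3, "home run", "synonym"),
    (3, "baseball", "domain"),
]
candidates = build_candidate_list(triples)
```

Keeping the relation label on every candidate is what allows the later selection step to choose by relation type rather than by position in the definition text.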

Relation information of this kind can be extracted using known methods such as that of Reference 2, but extraction rules tailored to each dictionary may also be written. For example, Wikipedia contains intra-site links, shown underlined in FIG. 5; such links and emphasized text may be extracted as related words. In doing so, words that are linked from many pages may be excluded on the ground that they do little to characterize the sense, and links to time expressions such as years may likewise be excluded. To extract synonyms, the leading part before the hyphen (−) in each explanatory sentence of FIG. 5 can be extracted as a synonym of the headword "EU". Alternatively, when a reference target is indicated explicitly by words or an arrow, as in "One of the twelve zodiac signs. →", the name of the reference target can be extracted as a synonym. Alternatively, when the headword is alphanumeric, as with "computer science" for the headword "CS", the portion of the text containing each of its characters can be extracted as a synonym.
[Reference 2] Francis Bond, Eric Nichols, Sanae Fujita, and Takaaki Tanaka, "Acquiring an Ontology for a Fundamental Vocabulary", COLING-2004, 2004, p.1319-1325
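Two of the dictionary-specific rules described above (the leading part before a hyphen, and the target named after an arrow) can be sketched with regular expressions. This is only an illustration: the function name `synonym_from_line`, the set of separator characters, and the sample lines are assumptions, not the rules actually used for FIG. 5.

```python
import re

# Separator between a leading term and its explanation: a hyphen-like
# character surrounded by whitespace (ASCII hyphen, en dash, minus sign,
# fullwidth hyphen-minus). The exact character set is an assumption.
_SEPARATOR = re.compile(r"\s+[-\u2013\u2212\uFF0D]\s+")
# Explicit reference such as "... → Target" or "... -> Target".
_ARROW = re.compile(r"(?:→|->)\s*(\S.*)$")


def synonym_from_line(line):
    """Return a synonym candidate from one definition line, or None."""
    arrow = _ARROW.search(line)
    if arrow:
        # Rule: the reference target named after an arrow is a synonym.
        return arrow.group(1).strip()
    parts = _SEPARATOR.split(line.strip(), maxsplit=1)
    if len(parts) == 2 and parts[0]:
        # Rule: the leading part before the hyphen is a synonym.
        return parts[0]
    return None


# Hypothetical definition lines for illustration.
first = synonym_from_line("European Union - a political and economic union")
second = synonym_from_line("One of the twelve zodiac signs. → Rat")
third = synonym_from_line("no separator here")
```

Rules like these are brittle by design; they trade generality for precision on one dictionary's layout, which is the trade-off the description accepts when it suggests per-dictionary rules.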

The search word selection unit 120 receives the list generated by the search word extraction unit 110, in which a plurality of search word candidates, each labeled with its relation to the headword, are grouped by sense, and selects and outputs one or more search words for each sense on the basis of the relations between the candidates and the headword (S2). For example, when a list such as those in FIGS. 7 to 9 is input, the unit may select only synonyms, or only hypernyms, or may select a synonym if one exists, a word indicating the domain if there is no synonym, and a hypernym if there is neither. It may also select several relation types, or cap the number of selected words at an upper limit. It may further use frequency of occurrence in a sense bank or dictionary, sense numbers, link frequency, and the like so as not to select related words of minor senses. Furthermore, when a sense is a very minor one for the headword, the headword itself may be left out of the search words and only related words such as synonyms used. For example, for "arch 3" (アーチ3), only "home run" (ホームラン) is used. In addition, when the headword is a loanword, only the word in the original language may be used as the search word. For example, for the name of a foreign artist, the spelling in the original language is used for the search instead of the Japanese (katakana, etc.) rendering.
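The fallback policy in step S2 (synonym first, then domain, then hypernym, with an upper limit and optional omission of the headword) can be sketched as follows. The function name, parameter names, default limit, and the candidate data are hypothetical; the description leaves the concrete policy open.

```python
def select_search_terms(headword, candidates,
                        priority=("synonym", "domain", "hypernym"),
                        max_terms=2, include_headword=True):
    """Pick search terms for one sense from (term, relation) pairs.

    The first relation in `priority` that has any candidate wins.
    Set include_headword=False for very minor senses.
    """
    selected = []
    for relation in priority:
        selected = [term for term, rel in candidates if rel == relation]
        if selected:
            break
    selected = selected[:max_terms]  # cap the number of selected words
    if include_headword:
        return [headword] + selected
    return selected


# Illustrative candidates for two senses of "arch" (not from FIG. 7).
sense1 = [("structure", "hypernym"), ("architecture", "domain")]
sense3 = [("home run", "synonym"), ("baseball", "domain")]

terms1 = select_search_terms("arch", sense1)
terms3 = select_search_terms("arch", sense3, include_headword=False)
```

Here sense 1 has no synonym, so the domain word is used together with the headword, while the minor sense 3 is searched with "home run" alone, mirroring the "arch 3" example in the text.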

The content search unit 130 searches the communication network using the search words corresponding to each sense, thereby acquiring predetermined content for each sense of the headword (S3), and attaches or links this content to the dictionary 10 to create a dictionary 10'. FIG. 10 shows an example of a screen display when images are attached to the dictionary of FIG. 3, and FIG. 11 shows an example when images are attached to the information of FIG. 4. The content to be acquired is not limited to images; any content that one wishes to attach to the text expressing a sense, such as video, may be chosen. The content search can use a known image search system on the Web or any other method. Known image search systems on the Web include, for example, Google (registered trademark) (Google AJAX images API, http://code.google.com/intl/ja/apis/ajaxsearch/), goo (registered trademark) (http://bsearch.goo.ne.jp/), and Yahoo! (registered trademark) (http://image-search.yahoo.co.jp/).
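Step S3 can be sketched with the search backend passed in as a plain function, so that no particular search service is assumed. Everything here is hypothetical: `attach_content`, `fake_image_search`, the `example.invalid` URLs, and the per-sense limit are stand-ins for illustration, not the API of any of the services named above.

```python
def attach_content(terms_by_sense, search, per_sense=3):
    """Run one query per sense and keep the top results for each sense.

    terms_by_sense: {sense_id: [search terms]}
    search: callable taking a query string and returning an iterable of items
    """
    attached = {}
    for sense_id, terms in terms_by_sense.items():
        query = " ".join(terms)
        items = list(search(query))[:per_sense]
        attached[sense_id] = {"query": query, "items": items}
    return attached


def fake_image_search(query):
    """Stand-in backend that fabricates placeholder URLs for the demo."""
    slug = query.replace(" ", "_")
    return [f"https://example.invalid/{slug}/{i}.jpg" for i in range(5)]


dictionary_with_content = attach_content(
    {1: ["arch", "architecture"], 3: ["home run"]},
    fake_image_search,
)
```

Injecting the backend keeps the per-sense logic testable offline and lets a real image or video search client be swapped in without changing the loop.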

With the above configuration, given a headword and a definition sentence (or similar text) indicating each of its senses, appropriate content such as images and videos can be assigned to each sense, and multiple senses can be broadly covered, including minor ones.

When a list such as those shown in FIGS. 7 to 9 can be prepared in advance for a headword having one or more senses, that is, a list in which a plurality of search word candidates, each labeled with its relation to the headword, are grouped by sense, the search word extraction unit 110 is unnecessary. In that case, as shown in FIG. 12, a content assignment device 100' consisting of the search word selection unit 120 and the content search unit 130 may be configured, and only S2 and S3 of FIG. 2 executed.

FIG. 1 illustrates a form in which content is attached to a dictionary. Alternatively, the list of search words grouped by sense that is obtained in the search word selection unit 120 may be stored in a database, and when a user issues a search request, the content search unit 130 may run a search for each sense in real time and display the results by sense. FIG. 13 shows an example in which search results containing image content are displayed by sense for a search on the headword "sunflower" (ヒマワリ).
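The on-demand variant can be sketched as a small service that looks up stored search words and queries per sense at request time. This is an assumption-laden illustration: the class name `SenseSearchService`, the in-memory dict standing in for the database, the recording backend, and the two sample senses of "sunflower" are all made up and do not reflect the contents of FIG. 13.

```python
class SenseSearchService:
    """Answer a headword query with one result list per stored sense."""

    def __init__(self, stored_terms, search):
        # stored_terms: {headword: {sense_id: [search terms]}}
        # An in-memory dict stands in for the database here.
        self._stored_terms = stored_terms
        self._search = search

    def lookup(self, headword):
        senses = self._stored_terms.get(headword, {})
        # One search per sense, returned in sense-number order.
        return [
            (sense_id, self._search(" ".join(terms)))
            for sense_id, terms in sorted(senses.items())
        ]


calls = []


def recording_search(query):
    """Stand-in backend that records each query it receives."""
    calls.append(query)
    return [f"result for {query}"]


service = SenseSearchService(
    {"sunflower": {1: ["sunflower", "plant"], 2: ["sunflower", "satellite"]}},
    recording_search,
)
results = service.lookup("sunflower")
```

Because only the selected search words are stored, the displayed content always reflects the search backend's current results, at the cost of one query per sense for every request.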

When the above device and method for assigning content by word sense are implemented on a computer, the processing functions of the search word extraction unit, the search word selection unit, and the content search unit are described by a program. By executing this program on a personal computer or mobile terminal, with data exchanged between the CPU and the input means and various storage means, hardware and software cooperate, the above processing functions are realized on the computer, and the effects of the device and method of the present invention are obtained. In this case, at least part of the processing functions may be implemented in hardware. The various processes described above need not be executed only in the chronological order described; they may be executed in parallel or individually according to the processing capacity of the executing device or as needed. Other modifications may be made as appropriate without departing from the spirit of the present invention.

Claims (5)

1. A content assignment device by word sense, comprising:
a search word selection unit that receives, for a headword, a list in which a plurality of search word candidates, each labeled with its relation to the headword, are grouped by word sense, and that selects and outputs one or more search words for each sense on the basis of the relations; and
a content search unit that acquires predetermined content for each sense of the headword by searching a communication network using the search words corresponding to that sense.

2. The content assignment device by word sense according to claim 1, further comprising:
a search word extraction unit that receives at least a headword and texts each expressing one of one or more senses of the headword, extracts from the text, for each sense, a plurality of search word candidates having a predetermined relation to the headword, and generates the list in which the plurality of search word candidates, each labeled with its relation to the headword, are grouped by sense.

3. A content assignment method by word sense, executing:
a search word selection step in which a search word selection unit reads, for a headword, a list in which a plurality of search word candidates, each labeled with its relation to the headword, are grouped by word sense, and selects and outputs one or more search words for each sense on the basis of the relations; and
a content search step in which a content search unit acquires predetermined content for each sense of the headword by searching a communication network using the search words corresponding to that sense.

4. The content assignment method by word sense according to claim 3, further executing, prior to the search word selection step:
a search word extraction step in which a search word extraction unit receives at least a headword and texts each expressing one of one or more senses of the headword, extracts from the text, for each sense, a plurality of search word candidates having a predetermined relation to the headword, and generates the list in which the plurality of search word candidates, each labeled with its relation to the headword, are grouped by sense.

5. A program for causing a computer to function as the device according to claim 1 or 2.
JP2010044257A 2010-03-01 2010-03-01 Device and method for imparting word meaning classified content, and program Pending JP2011180824A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2010044257A JP2011180824A (en) 2010-03-01 2010-03-01 Device and method for imparting word meaning classified content, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2010044257A JP2011180824A (en) 2010-03-01 2010-03-01 Device and method for imparting word meaning classified content, and program

Publications (1)

Publication Number Publication Date
JP2011180824A true JP2011180824A (en) 2011-09-15

Family

ID=44692265

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2010044257A Pending JP2011180824A (en) 2010-03-01 2010-03-01 Device and method for imparting word meaning classified content, and program

Country Status (1)

Country Link
JP (1) JP2011180824A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019215684A (en) * 2018-06-12 2019-12-19 国立大学法人大阪大学 Intellectual property support device, intellectual property support method and intellectual property support program

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000029901A (en) * 1998-07-14 2000-01-28 Canon Inc Device for retrieving image and method therefor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
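Atsushi Fujii: "Disambiguation of images by text processing and its application to an encyclopedia search site", Proceedings of the 11th Annual Meeting of the Association for Natural Language Processing, JPN6013023169, 15 March 2005 (2005-03-15), JP, pages 1002 - 1005, ISSN: 0002531303 *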

Legal Events

Date Code Title Description
2011-07-01 RD03 Notification of appointment of power of attorney Free format text: JAPANESE INTERMEDIATE CODE: A7423
2012-02-13 A621 Written request for application examination Free format text: JAPANESE INTERMEDIATE CODE: A621
2013-05-10 A977 Report on retrieval Free format text: JAPANESE INTERMEDIATE CODE: A971007
2013-05-21 A131 Notification of reasons for refusal Free format text: JAPANESE INTERMEDIATE CODE: A131
2013-12-03 A02 Decision of refusal Free format text: JAPANESE INTERMEDIATE CODE: A02