JP2007011973A

JP2007011973A - Information retrieval device and information retrieval program

Info

Publication number: JP2007011973A
Application number: JP2005195216A
Authority: JP
Inventors: Mitsuomi Sugano; 充臣菅野; Masashi Hirozawa; 昌司広沢
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2005-07-04
Filing date: 2005-07-04
Publication date: 2007-01-18

Abstract

PROBLEM TO BE SOLVED: To provide an information retrieval device capable of increasing the amount of information obtained about a retrieval character string without increasing the amount of user operation. SOLUTION: This information retrieval device has: an input device 2 functioning as a retrieval character string input part for inputting the retrieval character string; an information analysis device 1 functioning as a retrieval target information acquisition part that acquires retrieval target information, a retrieval character string information extraction part that extracts information related to the retrieval character string from the retrieval target information, a related term extraction part that extracts, from the retrieval target information and on the basis of the information extracted by the retrieval character string information extraction part, related terms having a prescribed correspondence with the retrieval character string, and a retrieval result generation part that generates a retrieval result using the related terms extracted by the related term extraction part; and a display device 3 that displays the retrieval result. COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to an information search apparatus that searches search target information, and to an information search program that causes a computer to search search target information.

With the spread of the Internet, the volume of search target information handled by ordinary clients has grown dramatically. As a result, methods for selecting the information a user actually needs have become increasingly important.

The most widely used means of finding information on the Internet is a search system known as a search engine. Among search engines, the full-text search type is especially dominant. A full-text search engine stores the contents of Web pages in a database on the search engine side and, when it receives a search request, searches that database and returns the results. Most of the results returned by a full-text search engine are simply matches of the search character string.

Even when a conventional information search system applies some processing to its search results, it often needs additional information from the client, which makes the process more or less cumbersome. Moreover, to explore a wide range of knowledge, the client has had to rely on the primitive method of entering new search character strings.

Besides the conventional search systems described above, various other information search systems have been proposed.

Patent Document 1 discloses an example of an information search system that improves the accuracy and reliability of search results. The system provides a dictionary database containing words that represent what the questioner wants to know, and uses it to improve the accuracy and reliability of the results.

Patent Document 2 discloses an example of an information search system that extracts the portions of a retrieved document that relate to the search character string. For each sentence of a matching document, the system accumulates the relevance of the related keywords it contains; it then extracts sentences in descending order of accumulated relevance until a predetermined length is reached, while preserving their original order of appearance in the document.
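
The extraction procedure attributed to Patent Document 2 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the method of that publication: the keyword weights, the substring-count scoring, the character-length threshold `min_length`, and the function name `extract_summary` are all hypothetical.

```python
from typing import Dict, List

def extract_summary(sentences: List[str], keyword_weights: Dict[str, float],
                    min_length: int) -> List[str]:
    """Select high-relevance sentences, then return them in document order."""
    # Accumulate the (assumed) relevance of every related keyword in each sentence.
    scored = []
    for index, sentence in enumerate(sentences):
        score = sum(weight * sentence.count(keyword)
                    for keyword, weight in keyword_weights.items())
        scored.append((score, index))
    # Take sentences from the highest accumulated score down (ties: earlier first)
    # until the selected text reaches the predetermined length.
    chosen: List[int] = []
    total = 0
    for score, index in sorted(scored, key=lambda pair: (-pair[0], pair[1])):
        if total >= min_length:
            break
        chosen.append(index)
        total += len(sentences[index])
    # Restore the original order of appearance in the document.
    return [sentences[i] for i in sorted(chosen)]
```

Because the final step restores document order, the relevance ranking computed along the way is not part of the output.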

Patent Document 3 discloses an example of an information search system that ranks retrieved information by priority. The system parses the retrieved text, extracts keywords, examines the syntactic information of each keyword, determines a priority according to whether that syntactic information satisfies preset conditions, and assigns the priority to the retrieved text.
Patent Document 1: JP 2002-123541 A
Patent Document 2: JP 2001-84255 A
Patent Document 3: JP H06-215035 A

In the system of Patent Document 1, the accuracy and reliability of the search results vary with the size of the dictionary database of words representing what the questioner wants to know.

The system of Patent Document 2 does select sentences by relevance, but because it extracts them in descending order of accumulated relevance while keeping their order of appearance in the document, the analysis result is of little use as reusable data, and it is hard to draw out information about the search character string itself, which is what matters most.

The system of Patent Document 3 assigns priorities to retrieved text according to whether the syntactic information of its keywords satisfies preset conditions. Depending on those conditions, however, some texts receive unreasonably low ratings, so the ranking does not provide a consistent evaluation standard.

As described above, the systems of Patent Documents 1 to 3 try to provide denser information by discarding a considerable amount of data while processing it, but the criteria applied during that processing can lead to inefficient searches.

Furthermore, with a conventional information search system, a client who wants to search a wide range of knowledge about a search term through simple input must first obtain a set of results, extract related terms from them, and search again. In searching, it is important to obtain as much of the needed knowledge as possible with few steps and simple operations. Requesting new input from the client, requiring many complicated steps to configure a search, and setting criteria that may reduce the search results should therefore be avoided.

In view of the above problems, an object of the present invention is to provide an information search apparatus and an information search program that increase the amount of information obtained about a search character string without increasing the amount of user operation.

To achieve the above object, an information search apparatus according to the present invention includes: search character string acquisition means for acquiring a search character string; search target information acquisition means for acquiring search target information; search character string information extraction means for extracting information related to the search character string from the search target information; related phrase extraction means for extracting, from the search target information and based on the information extracted by the search character string information extraction means, related phrases that have a predetermined correspondence with the search character string; and search result generation means for generating a search result using the related phrases extracted by the related phrase extraction means.

With this configuration, as in a conventional full-text information search system, a search is executed simply by entering a search character string, and related phrases that have a predetermined correspondence with the search character string are obtained as the search result. The amount of information about the search character string therefore increases without increasing the amount of user operation.

The apparatus may further include additional information generation means for generating additional information to be included in the search result. Based on the related phrases extracted by the related phrase extraction means, the additional information generation means may include a summary sentence containing a related phrase as one element of the additional information. Furthermore, based on both the related phrases extracted by the related phrase extraction means and the information extracted by the search character string information extraction means, the additional information generation means may include a summary sentence containing both the related phrase and the search character string as one element of the additional information.
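
A minimal sketch of the two summary-sentence variants, assuming plain text input, a naive sentence splitter on `。`, `.`, `!`, and `?`, and case-sensitive substring matching. The names `split_sentences` and `summarize` are hypothetical. The sketch prefers a sentence containing both the related phrase and the search character string and falls back to one containing only the related phrase.

```python
import re
from typing import List, Optional

# Naive splitter (an assumption): a sentence runs up to and including 。 . ! or ?
SENTENCE = re.compile(r"[^。.!?]+[。.!?]?")

def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]

def summarize(text: str, search_string: str, related_phrase: str) -> Optional[str]:
    """Return a summary sentence for a related phrase, or None if none is found."""
    sentences = split_sentences(text)
    # Preferred: a sentence containing both the related phrase and the search string.
    for sentence in sentences:
        if search_string in sentence and related_phrase in sentence:
            return sentence
    # Fallback: a sentence containing the related phrase alone.
    for sentence in sentences:
        if related_phrase in sentence:
            return sentence
    return None
```

The fallback keeps a summary available even when the related phrase never co-occurs with the search character string in a single sentence.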

With this configuration, each search result carries more information on its own, so the client can obtain more of the knowledge it needs.

The apparatus may also include additional information generation means for generating additional information to be included in the search result, where, based on the related phrases extracted by the related phrase extraction means, the additional information generation means includes the storage location of information containing those related phrases as one element of the additional information.

With this configuration, the client can reach the location of the needed information easily and with few operations, which makes the search process more efficient.

The apparatus may also include additional information generation means for generating additional information to be included in the search result, where the additional information generation means includes, as one element of the additional information, a degree of relevance: a numerical value quantifying the relationship between a related phrase and the search character string.

The apparatus may also include related phrase sorting means for sorting the related phrases by their degree of relevance. The additional information generation means may calculate the degree of relevance from the number of characters between the related phrase and the search character string, or from the total number of occurrences of the same related phrase among those extracted by the related phrase extraction means.
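
The two relevance measures and the sort can be sketched as follows. The inverse-distance formula `1 / (1 + gap)`, the tie-breaking by dictionary order, and all function names are assumptions; the text only says that relevance may be based on the number of characters between the phrases or on the number of identical related phrases.

```python
from collections import Counter
from typing import Dict, List, Tuple

def chars_between(a_start: int, a_len: int, b_start: int, b_len: int) -> int:
    """Number of characters strictly between two spans (0 if they touch or overlap)."""
    if a_start <= b_start:
        return max(0, b_start - (a_start + a_len))
    return max(0, a_start - (b_start + b_len))

def relevance_by_distance(extractions: List[Tuple[str, int]]) -> Dict[str, float]:
    """extractions: (related phrase, gap to the search string) per occurrence."""
    scores: Dict[str, float] = {}
    for phrase, gap in extractions:
        # Closer occurrences contribute more (assumed inverse-distance weighting).
        scores[phrase] = scores.get(phrase, 0.0) + 1.0 / (1 + gap)
    return scores

def relevance_by_count(extractions: List[Tuple[str, int]]) -> Dict[str, int]:
    """Relevance = total number of times the same related phrase was extracted."""
    return dict(Counter(phrase for phrase, _ in extractions))

def sort_by_relevance(scores: Dict[str, float]) -> List[str]:
    """Highest relevance first; ties broken by dictionary order."""
    return sorted(scores, key=lambda phrase: (-scores[phrase], phrase))
```

The two measures can disagree. With one occurrence of "cat" immediately next to the search character string and three occurrences of "pet" twelve characters away, the distance measure ranks "cat" first and the count measure ranks "pet" first.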

To achieve the above object, an information search program according to the present invention causes a computer to function as: search character string acquisition means for acquiring a search character string; search target information acquisition means for acquiring search target information; search character string information extraction means for extracting information related to the search character string from the search target information; related phrase extraction means for extracting, from the search target information and based on the information extracted by the search character string information extraction means, related phrases that have a predetermined correspondence with the search character string; and search result generation means for generating a search result using the related phrases extracted by the related phrase extraction means.

The present invention thus realizes an information search apparatus and an information search program that increase the amount of information obtained about a search character string without increasing the amount of user operation.

Embodiments of the present invention are described below with reference to the drawings. In the following description, identical parts and components are given the same reference numerals, and their detailed description is not repeated.

<Definition of Terms>
First, the terms used in describing the embodiments of the present invention are defined.

A “client” is any entity that requests search processing from the information search apparatus according to the present invention, for example a person who actually uses the apparatus, an internal or external processing device that transmits search requests, or a processing program.

“Exists in the vicinity” means that there is some kind of correspondence between multiple character strings, for example when they are positionally close to each other (such as appearing in the same sentence) or conceptually close to each other.
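
The two kinds of vicinity in this definition can be sketched as a simple predicate. The sentence pattern and the thesaurus (`THESAURUS`, a plain dictionary of related words) are hypothetical stand-ins; the document does not specify how conceptual closeness is determined.

```python
import re
from typing import Dict, Set

# Hypothetical thesaurus used to judge conceptual closeness.
THESAURUS: Dict[str, Set[str]] = {"dog": {"pet", "wolf"}}

def in_same_sentence(text: str, a: str, b: str) -> bool:
    """Positional closeness: both strings occur in one sentence."""
    sentences = re.findall(r"[^。.!?]+[。.!?]?", text)
    return any(a in s and b in s for s in sentences)

def conceptually_close(a: str, b: str, thesaurus: Dict[str, Set[str]]) -> bool:
    """Conceptual closeness: the thesaurus links the two strings in either direction."""
    return b in thesaurus.get(a, set()) or a in thesaurus.get(b, set())

def exists_in_vicinity(text: str, a: str, b: str,
                       thesaurus: Dict[str, Set[str]] = THESAURUS) -> bool:
    return in_same_sentence(text, a, b) or conceptually_close(a, b, thesaurus)
```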

A “related phrase” is a character string that has some relationship with a given character string. A related phrase may contain the search character string itself; for example, the phrase “guide dog” extracted with the search character string “dog” may legitimately serve as a related phrase of “dog”.

“Recording means” refers to any means of recording information, such as a main storage device typified by flash memory, an external storage device typified by an HDD (Hard Disk Drive), or recording media typified by CD-ROM (Compact Disc Read-Only Memory) and memory cards.

An “input device” is a device that supplies information corresponding to a human input operation to a processing unit such as a CPU; it may take any form, such as a keyboard, mouse, or tablet.

“Input information” is information input by the client, information stored in a recording means, or both. “Information input by the client” includes, for example, character information entered from a keyboard, stroke information entered on a tablet, and character information obtained by applying character recognition to that stroke information. It may also be an image of characters written on paper captured with a camera or similar device, or character information obtained by applying character recognition to that image.

“Notification means” refers to any means of conveying information supplied by a processing unit such as a CPU. It includes not only a display device that shows such information but also a device that dispenses goods in response to a signal, such as a vending machine.

A “communication device” is a device installed between a processing unit such as a CPU and the outside so that information not held inside the information search apparatus can be received from outside; it may take any form, such as a gateway device, a router, or a dedicated line.

“Search target information” mainly refers to electronically recorded data and may be any data that can be expressed in a form a person can ordinarily perceive. For characters and images, for example, such a form means displaying them on a screen or printing them on paper. The data may be anything expressible in a form perceptible through the five senses: visual and auditory content such as characters, figures, images, video, and sound, as well as tactile, olfactory, and gustatory content. Examples of search target information include business document files used in offices, electronic books, and electronic data on the Internet. Data that cannot ordinarily be handled as character strings, such as music and films, is also included, since it may be searched using special techniques. Search target information is usually recorded in a recording means in some format; many formats are possible, but file and database formats are typical. The search target information may be read and written directly on a device inside the information search apparatus, or indirectly on a device in a different layer reached over a communication line.

An “index” is a lookup table that records location information, such as where each line is stored, in order to speed up information retrieval. The index need not be organized by line; any representation that identifies where data is located, such as byte offsets or URLs, may be used.
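
As one possible illustration, a character-level inverted index maps each character to the (location, offset) pairs where it occurs, and a query is answered by verifying only the candidate positions of its first character. Character-level indexing is an assumption chosen because Japanese text has no spaces between words (n-gram indexes are a common refinement); locations could be file paths or URLs, offsets stand in for byte positions, and the names `build_index` and `lookup` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Posting = Tuple[str, int]  # (location such as a path or URL, character offset)

def build_index(documents: Dict[str, str]) -> Dict[str, List[Posting]]:
    """Map every character to the places where it occurs."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for location, text in documents.items():
        for offset, char in enumerate(text):
            index[char].append((location, offset))
    return dict(index)

def lookup(index: Dict[str, List[Posting]], documents: Dict[str, str],
           query: str) -> List[Posting]:
    """Return every (location, offset) where the query string starts."""
    if not query:
        return []
    # Only positions of the query's first character need to be verified.
    return [
        (location, offset)
        for location, offset in index.get(query[0], [])
        if documents[location].startswith(query, offset)
    ]
```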

“Morphological analysis” is the process of dividing a given sentence into morphemes, where a “morpheme” is the smallest unit within a word that does not change. Morphological analysis also includes determining which part of speech each morpheme corresponds to. The present invention uses only these two aspects of morphological analysis: splitting a sentence into morphemes and assigning parts of speech. Conversely, if morphemes can be segmented and their parts of speech identified without a rigorous morphological analysis method, such a method may be used in the present invention in place of morphological analysis.
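
As a deliberately crude example of such a substitute (not a real morphological analyzer), Japanese text can be split wherever the character type changes, treating runs of kanji, katakana, or ASCII letters and digits as noun candidates. The Unicode ranges, the labels `noun` and `other`, and the function names are simplifying assumptions.

```python
from typing import List, Tuple

def char_class(ch: str) -> str:
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"  # CJK Unified Ideographs block only
    if ch.isascii() and ch.isalnum():
        return "latin"
    return "symbol"

def segment(text: str) -> List[Tuple[str, str]]:
    """Split text at character-type boundaries and guess a part of speech per run."""
    runs: List[Tuple[str, str]] = []
    for ch in text:
        cls = char_class(ch)
        if runs and runs[-1][1] == cls:
            runs[-1] = (runs[-1][0] + ch, cls)  # extend the current run
        else:
            runs.append((ch, cls))  # start a new run
    tagged = []
    for surface, cls in runs:
        if cls == "symbol":
            continue  # drop punctuation and whitespace
        pos = "noun" if cls in ("kanji", "katakana", "latin") else "other"
        tagged.append((surface, pos))
    return tagged
```

Its failure modes, such as splitting a verb like 走る into 走 and る, are why a real morphological analyzer is preferable when one is available.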

<Outline of the Information Search Apparatus According to the Present Invention>
Next, an outline of the information search apparatus according to the present invention is given. FIG. 1 shows an example of its external appearance. The information search apparatus shown in FIG. 1 consists of an information analysis device 1, an input device 2 that supplies the information needed for a search to the information analysis device 1, and a display device 3 that displays the search results received from the information analysis device 1.

In the information search apparatus shown in FIG. 1, the information analysis device 1, the input device 2, and the display device 3 are separate units. The information search apparatus according to the present invention is not limited to this arrangement, however; the three devices may be integrated into a single unit. Another notification means may also be used instead of the display device 3; for example, when the search target information is music data, an audio output device could be used.

In response to a search character string entered by the client, the information search apparatus according to the present invention does not simply output the search character string as found. An occurrence of the search character string in the search target information located by full-text search does not stand alone: in most cases it sits inside a sentence surrounded by other phrases. These surrounding phrases are considered to have strong semantic ties, including contextual relationships, with the search character string.

The information search apparatus according to the present invention therefore collects these surrounding phrases as related phrases of the search character string and outputs them as the search result. FIG. 2 shows an actual display example, in which “cat”, “pet”, “service dog”, and “guide dog” are listed as related phrases of the search character string “dog”.

The search results are grouped by identical related phrase and output group by group. A search result here means a summary of a sentence containing the search character string, the location where a related phrase is stored, the frequency of a related phrase quantified from the collected data, or a combination of these; it is not particularly limited as long as it can be produced from the obtained data. FIG. 3 shows an actual display example, in which the related phrases “cat” and “pet” of the search character string “dog” are displayed together with their additional information: summary sentences, numerical values quantifying the relationship between the search character string and each related phrase, and Internet addresses.
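
The grouping step can be sketched as follows. Each extraction record is assumed to be a tuple of (related phrase, storage location, summary sentence); the record shape, the function name `group_by_phrase`, and the dictionary keys `count`, `locations`, and `summaries` are hypothetical, and the occurrence count stands in for the quantified frequency.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (related phrase, storage location, summary sentence)

def group_by_phrase(records: List[Record]) -> Dict[str, dict]:
    """Collect all records that share the same related phrase into one group."""
    groups: Dict[str, dict] = defaultdict(
        lambda: {"count": 0, "locations": [], "summaries": []}
    )
    for phrase, location, summary in records:
        group = groups[phrase]
        group["count"] += 1                      # frequency of the related phrase
        if location not in group["locations"]:   # list each location once
            group["locations"].append(location)
        group["summaries"].append(summary)
    return dict(groups)
```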

<Details of the Information Search Apparatus According to the Present Invention>
Next, the information search apparatus according to the present invention is described in detail.

The functions of the information search apparatus shown in FIG. 1 are described with reference to FIG. 4, which is a functional block diagram of that apparatus. The information search apparatus of FIG. 1 comprises a search character string input unit 4, a search target information acquisition unit 5, an index generation unit 6, a search character string information extraction unit 7, a related phrase extraction unit 8, a related phrase sorting unit 9, an additional information generation unit 10, a search result generation unit 11, and a search result output unit 12. The search target information acquisition unit 5 through the search result generation unit 11 are implemented by the information analysis device 1, the search character string input unit 4 by the input device 2, and the search result output unit 12 by the display device 3.

The search character string input unit 4 sends the information supplied by the client to the search character string information extraction unit 7.

The search target information acquisition unit 5 acquires search target information stored in a recording means inside or outside the information search apparatus and sends it to the search character string information extraction unit 7. When the search target information is very large, for example, the search target information acquisition unit 5 instead routes it through the index generation unit 6, which builds an index, before it reaches the search character string information extraction unit 7.

The search character string information extraction unit 7 receives the information sent from the search character string input unit 4 and from the search target information acquisition unit 5 or the index generation unit 6, and extracts the positions (addresses, etc.) or index entries of the data containing the search character string.
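
A minimal sketch of what the search character string information extraction unit 7 might produce when no index is used: one record per occurrence, holding the storage location, the character offset (standing in for an address), and the enclosing sentence. The `Hit` record, the function name `find_hits`, and the naive sentence pattern are assumptions for illustration.

```python
import re
from typing import Dict, List, NamedTuple

# Naive sentence pattern (an assumption): text up to and including 。 . ! or ?
SENTENCE = re.compile(r"[^。.!?]+[。.!?]?")

class Hit(NamedTuple):
    location: str  # storage location of the data (path, URL, ...)
    offset: int    # character offset of the match, standing in for an address
    sentence: str  # sentence that contains the match

def find_hits(documents: Dict[str, str], query: str) -> List[Hit]:
    """Locate every occurrence of the search character string, sentence by sentence."""
    if not query:
        return []
    hits: List[Hit] = []
    for location, text in documents.items():
        for match in SENTENCE.finditer(text):
            sentence = match.group()
            start = sentence.find(query)
            while start != -1:
                hits.append(Hit(location, match.start() + start, sentence.strip()))
                start = sentence.find(query, start + 1)
    return hits
```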

Based on the positions (addresses, etc.) or index entries of the data containing the search character string extracted by the search character string information extraction unit 7, the related phrase extraction unit 8 extracts the phrases that exist in the vicinity of the search character string as related phrases.
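
The extraction of surrounding phrases can be sketched for a single sentence as follows. For readability the sketch uses English text with a regular-expression word tokenizer and a small hypothetical stop-word list in place of morphological analysis; the function name `related_phrases` is also hypothetical. Each related phrase is returned with the number of characters separating it from the nearest occurrence of the search character string, the quantity on which the distance-based relevance described earlier can be computed. Words that contain the search character string, such as "dogs" for "dog", are kept, in line with the definition of a related phrase.

```python
import re
from typing import List, Tuple

# Hypothetical stop-word list standing in for part-of-speech filtering.
STOP_WORDS = {"a", "an", "the", "and", "with", "is", "are"}

def related_phrases(sentence: str, query: str) -> List[Tuple[str, int]]:
    """Return (word, characters to the nearest query occurrence) for surrounding words."""
    if not query:
        return []
    query_spans = [(m.start(), m.end()) for m in re.finditer(re.escape(query), sentence)]
    if not query_spans:
        return []
    results = []
    for m in re.finditer(r"\w+", sentence):
        word = m.group()
        if word.lower() == query.lower() or word.lower() in STOP_WORDS:
            continue  # skip the search string itself and function words
        # Gap to the nearest query span; overlapping spans (e.g. "dogs" vs "dog") count as 0.
        gap = min(
            max(0, max(m.start(), q_start) - min(m.end(), q_end))
            for q_start, q_end in query_spans
        )
        results.append((word, gap))
    return results
```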

The related phrase sorting unit 9 sorts the related phrases extracted by the related phrase extraction unit 8 according to a rule such as dictionary order. Any rule may be used, as long as it presents the search results in a form convenient for the client.

The additional information generation unit 10 generates whatever additional information can be derived from the information extracted by the search character string information extraction unit 7 and the related phrase extraction unit 8.

The search result generation unit 11 combines the information sorted by the related phrase sorting unit 9 with the additional information generated by the additional information generation unit 10 to produce the search result to be presented to the client.

The search result output unit 12 displays the search result generated by the search result generation unit 11, thereby notifying the client of the information obtained by the search.

In the above description, the verb “send” is used for data transfer. For convenience, it covers not only cases where the subject actively passes data to the object, but also cases where the object acts on the subject to obtain data from it.

Next, the configuration of the information search apparatus shown in FIG. 1 is described with reference to FIG. 5, which shows an example configuration of the apparatus. The information analysis device 1 includes a main storage device 1A, an external storage device 1B, a CPU (central processing unit) 1C, a communication device 1D, and a data bus 1E, and exchanges data with the input device 2 and the display device 3 via the data bus 1E.

The main storage device 1A is a temporary storage device such as a flash memory. Its main purpose is to store the data acquired, extracted, or generated by the search target information acquisition unit 5, the index generation unit 6, the search character string information extraction unit 7, the related phrase extraction unit 8, the related phrase sorting unit 9, the additional information generation unit 10, and the search result generation unit 11. The register area of the CPU 1C shares this role with the main storage device 1A and likewise stores data acquired, extracted, or generated by these units.

The external storage device 1B is a removable storage device such as an HDD (hard disk drive). Its main purpose is to store analysis results computed by the CPU 1C and data that is temporarily not needed in the main storage device 1A and has been saved out of it. A main storage device or external storage device outside the information search apparatus, accessible over a wired or wireless connection via the communication device 1D and the network 13, may also be used as the external storage device 1B.

The CPU 1C acquires a program describing the data processing procedures performed by the search target information acquisition unit 5, the index generation unit 6, the search character string information extraction unit 7, the related phrase extraction unit 8, the related phrase sorting unit 9, the additional information generation unit 10, and the search result generation unit 11 from the main storage device 1A, the external storage device 1B, the network 13 via the communication device 1D, or the like, and executes the data processing of these units. Data exchange between the CPU 1C and the other devices is not limited to the data bus 1E; any means capable of transmitting and receiving data, such as a wired or wireless communication device, may be used. Furthermore, the means for executing the data processing of these units is not limited to the CPU 1C; any device capable of executing that processing may be used.

The display device 3 functions as the search result output unit 12: it displays the search result generated by the search result generation unit 11 and thereby presents the information obtained by the search to the client.

The input device 2 functions as the search character string input unit 4 and supplies the information given by the client (the search character string) to the CPU 1C. The form of the input device 2 is not particularly limited; any device through which a search character string can be entered may be used.

The communication device 1D exchanges data with other network devices over a wired or wireless connection.

Next, the data processing procedure of the information search apparatus configured as in FIG. 5 will be described with reference to FIG. 6, a flowchart showing an example of that procedure.

The data processing shown in FIG. 6 is realized by the CPU 1C reading out and executing a program stored in the main storage device 1A, the external storage device 1B, or the like, and thereby controlling the entire information search apparatus.

In step S10, the input device 2, functioning as the search character string input unit 4, sends the character string entered by the user (the client) to the CPU 1C as the search character string.

In step S20, the CPU 1C functions as the search target information acquisition unit 5 and the search character string information extraction unit 7. Specifically, the CPU 1C acquires the search target information, whether stored in the external storage device 1B or the like or obtained from the network 13 via the communication device 1D, extracts the data positions at which the search character string occurs in it, and writes those positions to its own register area, the main storage device 1A, the external storage device 1B, or the like. The extracted data positions are stored in the array index[k] (k is any natural number with 1 ≤ k ≤ n) shown in FIG. 8(a).
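The position extraction of step S20 amounts to finding every offset of the search character string in the search target text. A minimal Python sketch follows, assuming the search target information is already available as a single string; the function name `find_positions` is hypothetical, and the returned list plays the role of index[k] (0-based here, whereas the text counts k from 1).

```python
def find_positions(target: str, query: str) -> list[int]:
    """Return every 0-based offset at which `query` occurs in `target`."""
    if not query:
        raise ValueError("search string must not be empty")
    positions = []
    start = target.find(query)
    while start != -1:
        positions.append(start)
        # Step forward one character so overlapping matches are not missed.
        start = target.find(query, start + 1)
    return positions


if __name__ == "__main__":
    # Offsets of "dog": 4 and 18.
    print(find_positions("The dog barked. A dog and a cat played.", "dog"))
```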

FIG. 8 shows an example of the internal structure of the data obtained by the procedure of FIG. 6. FIG. 8(a) shows the array index[k], which holds the data positions of the search character string. FIG. 8(b) shows the structure relation, which holds together a word extracted as a related phrase (element), the part of speech of that word determined by analyzing the sentence (attribute), and the position of the word (index). FIG. 8(c) shows the structure addition, which holds together a word extracted as a related phrase (element) and its additional information (add_info).
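The three structures of FIG. 8 could be modelled as follows. This is a sketch assuming Python dataclasses; the field names element, attribute, index, and add_info come from the text, while the class names, types, and sample values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """Structure `relation` of FIG. 8(b)."""
    element: str    # word extracted as a (candidate) related phrase
    attribute: str  # part of speech found by analysing the sentence
    index: int      # data position of the word in the search target


@dataclass
class Addition:
    """Structure `addition` of FIG. 8(c)."""
    element: str  # the related phrase, copied unchanged
    add_info: list[str] = field(default_factory=list)  # one or more pieces of additional information


# Array index[k] of FIG. 8(a): data positions of the search character string.
index: list[int] = [4, 18]

# Sample records for the text "The dog barked. A dog and a cat played."
rel = Relation(element="cat", attribute="noun", index=28)
add = Addition(element=rel.element, add_info=["A dog and a cat played."])
```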

FIG. 9 is a schematic diagram of the internal structure of the data shown in FIG. 8, focusing on its hierarchy. Each index[k] holds a set of relation structures as its child elements. Each relation contains a word extracted as a related phrase (element), and has as its child element an addition structure that associates that word with one or more pieces of additional information (add_info), in a one-to-one or one-to-many relationship. The hierarchical data structure of FIG. 9 is presented as an effective form for this embodiment; the information search apparatus according to the present invention can also be realized with other data structures. For example, the data may be stored sequentially in a single record in the order in which it is acquired, as shown in FIG. 10(a), or a record in which the processing history is recorded may be prepared, as shown in FIG. 10(b).
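To make the difference between the hierarchy of FIG. 9 and a flat, sequential record such as FIG. 10(a) concrete, the sketch below nests relations under each search-string position and then serialises them into one list. The exact layout of FIG. 10 is not described here, so the tuple format produced by the hypothetical `flatten` function is an assumption, as is attaching a single addition to each relation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Addition:
    element: str
    add_info: list[str] = field(default_factory=list)


@dataclass
class Relation:
    element: str
    attribute: str
    index: int
    addition: Optional[Addition] = None  # child element in the FIG. 9 hierarchy


# FIG. 9 style: each search-string position index[k] owns its relations.
Hierarchy = dict[int, list[Relation]]


def flatten(tree: Hierarchy) -> list[tuple]:
    """Serialise the hierarchy into one sequential record (assumed layout)."""
    record = []
    for pos, relations in tree.items():
        for rel in relations:
            infos = rel.addition.add_info if rel.addition else []
            # A relation without additional information still gets one row.
            for info in infos or [None]:
                record.append((pos, rel.element, rel.attribute, rel.index, info))
    return record
```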

In step S30, the CPU 1C functions as the related phrase extraction unit 8; that is, it extracts related phrases based on the data positions of the search character string extracted in step S20. The details of step S30 will be described with reference to FIG. 7, a flowchart showing an example of the related phrase extraction procedure.

In step S31, the CPU 1C reads the array index[k] shown in FIG. 8(a), which holds the data positions of the search character string, from its own register area, the main storage device 1A, the external storage device 1B, or the like.

In step S32, the CPU 1C uses a data position of the search character string read in step S31 to determine the working region of the search target information.

In step S33, the CPU 1C extracts a sentence containing the search character string from the working region determined in step S32. Here, a "sentence" means a span delimited by the Japanese full stop "。".
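Step S33 can be sketched as locating the nearest delimiters on either side of the search-string position. This assumes the search target is a plain string; treating the ASCII period "." as a delimiter alongside "。" is an added assumption for English text, and `sentence_at` is a hypothetical name.

```python
def sentence_at(text: str, pos: int, delimiters: str = "。.") -> str:
    """Return the sentence of `text` that contains offset `pos`.

    The sentence runs from just after the previous delimiter up to and
    including the next one (or the end of the text if there is none).
    """
    # rfind returns -1 when no delimiter precedes pos, giving start = 0.
    start = max(text.rfind(d, 0, pos) for d in delimiters) + 1
    ends = [i for i in (text.find(d, pos) for d in delimiters) if i != -1]
    end = min(ends) + 1 if ends else len(text)
    return text[start:end].strip()
```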

In step S34, the CPU 1C splits the extracted sentence into words by morphological analysis and, for each word, stores the word in the element field, its part of speech in the attribute field, and its data position in the index field of a relation structure shown in FIG. 8(b). Ambiguity is usually a concern in morphological analysis, but for mixed kana-kanji text there are few sentences for which multiple candidate analyses are produced, and English text is easily split into words, so ambiguity can be regarded as posing no obstacle to step S34.
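A real implementation of step S34 would call a morphological analyser. The sketch below substitutes a deliberately simple stand-in: it splits English text on runs of letters and looks up parts of speech in a tiny hypothetical lexicon (`TOY_LEXICON`), filling one relation record per word with its absolute data position. It is not a morphological analyser and would not handle Japanese text.

```python
import re
from dataclasses import dataclass


@dataclass
class Relation:
    element: str
    attribute: str
    index: int


# Hypothetical toy lexicon standing in for a morphological analyser's dictionary.
TOY_LEXICON = {"dog": "noun", "cat": "noun", "barked": "verb", "played": "verb",
               "the": "determiner", "a": "determiner", "and": "conjunction"}


def analyse(sentence: str, offset: int) -> list[Relation]:
    """Split an English sentence into words and fill one Relation per word.

    `offset` is the sentence's position in the search target, so that
    Relation.index holds an absolute data position.
    """
    relations = []
    for m in re.finditer(r"[A-Za-z]+", sentence):
        word = m.group()
        part_of_speech = TOY_LEXICON.get(word.lower(), "unknown")
        relations.append(Relation(word, part_of_speech, offset + m.start()))
    return relations
```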

In step S35, the CPU 1C treats the word (element) of a relation structure shown in FIG. 8(b) as a related phrase only if its part of speech (attribute) has a specific value (e.g., noun), and deletes every relation structure that does not satisfy this condition.
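The filter of step S35 reduces to a single comprehension over the relation records. In this sketch the wanted part of speech defaults to noun, as in the example above; also dropping the search character string itself is an assumption not stated in the text, and `keep_related` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    element: str
    attribute: str
    index: int


def keep_related(relations: list[Relation], query: str,
                 wanted: frozenset = frozenset({"noun"})) -> list[Relation]:
    """Step S35: keep relations whose part of speech is in `wanted`.

    Records for the search string itself are also dropped (an assumption).
    """
    return [r for r in relations
            if r.attribute in wanted and r.element.lower() != query.lower()]
```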

In step S36, the CPU 1C checks whether any element of the array index[k] shown in FIG. 8(a) has not yet undergone steps S32 to S35. If such an element remains (NO in step S36), the process returns to step S32. If none remains (YES in step S36), the process exits this flowchart and proceeds to step S40 in FIG. 6.
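Putting steps S31 to S36 together, the loop below visits each position in index[k], isolates the surrounding sentence, splits it into words, and keeps the nouns. It is a self-contained sketch under the same assumptions as before: English text, a hypothetical toy lexicon in place of morphological analysis, "." accepted as a sentence delimiter alongside "。", and the search string itself excluded from the related phrases.

```python
import re
from dataclasses import dataclass


@dataclass
class Relation:
    element: str
    attribute: str
    index: int


# Hypothetical toy part-of-speech lexicon (stand-in for morphological analysis).
TOY_LEXICON = {"dog": "noun", "cat": "noun", "ball": "noun", "park": "noun"}


def extract_related(text: str, query: str,
                    delimiters: str = "。.") -> dict[int, list[Relation]]:
    """Steps S31 to S36: map each search-string position to its noun relations."""
    # S31: positions of the search string (as produced in step S20).
    index = [m.start() for m in re.finditer(re.escape(query), text)]
    result: dict[int, list[Relation]] = {}
    for pos in index:  # S36: repeat until every element of index[k] is processed
        # S32/S33: bounds of the sentence around this occurrence.
        start = max(text.rfind(d, 0, pos) for d in delimiters) + 1
        ends = [i for i in (text.find(d, pos) for d in delimiters) if i != -1]
        end = min(ends) + 1 if ends else len(text)
        # S34: split into words, recording absolute data positions.
        rels = [Relation(m.group(), TOY_LEXICON.get(m.group().lower(), "other"),
                         start + m.start())
                for m in re.finditer(r"[A-Za-z]+", text[start:end])]
        # S35: keep nouns other than the search string itself.
        result[pos] = [r for r in rels
                       if r.attribute == "noun" and r.element.lower() != query.lower()]
    return result
```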

Returning to FIG. 6, the description of the data processing procedure of the information search apparatus configured as in FIG. 5 continues.

In step S40, the CPU 1C uses the position (index) of each relation structure shown in FIG. 8(b) to extract from the search target information a sentence containing that structure's word (element), that is, a sentence containing the related phrase, and stores it in add_info of an addition structure shown in FIG. 8(c). The element field of the addition structure stores, unchanged, the related phrase on which the current extraction is based.
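Step S40 can be sketched as looking up, for each relation record, the sentence around its stored position and appending that sentence to the add_info of an addition record for the same word. Grouping the additions by related phrase and skipping duplicate sentences are design assumptions of this sketch, not details given in the text; `build_additions` and `sentence_at` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    element: str
    attribute: str
    index: int


@dataclass
class Addition:
    element: str
    add_info: list[str] = field(default_factory=list)


def sentence_at(text: str, pos: int, delimiters: str = "。.") -> str:
    """Sentence of `text` containing offset `pos` (delimiter kept)."""
    start = max(text.rfind(d, 0, pos) for d in delimiters) + 1
    ends = [i for i in (text.find(d, pos) for d in delimiters) if i != -1]
    end = min(ends) + 1 if ends else len(text)
    return text[start:end].strip()


def build_additions(text: str, relations: list[Relation]) -> dict[str, Addition]:
    """Step S40: collect, per related phrase, the sentences that contain it."""
    additions: dict[str, Addition] = {}
    for rel in relations:
        add = additions.setdefault(rel.element, Addition(rel.element))
        sentence = sentence_at(text, rel.index)
        if sentence not in add.add_info:  # store each sentence only once
            add.add_info.append(sentence)
    return additions
```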

In step S50, the CPU 1C checks whether any element of the array index[k] shown in FIG. 8(a) has not yet had its additional information extracted. If such an element remains (NO in step S50), the process returns to step S40; if none remains (YES in step S50), the process proceeds to step S60.

In step S60, the CPU 1C computes a specific ordering criterion from element and add_info of the addition structures shown in FIG. 8(c) (e.g., the dictionary order of element), reorders the relation structures shown in FIG. 8(b) according to the result, and stores them again in the main storage device 1A, the external storage device 1B, or the like.
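The reordering of step S60 is a sort of the relation records keyed on a value computed from the matching addition record. The sketch below defaults to the dictionary order of element, the example given in the text; the function `reorder`, its `key` parameter, and the alternative criterion in the usage note are hypothetical, and every relation is assumed to have an addition with the same element.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Relation:
    element: str
    attribute: str
    index: int


@dataclass
class Addition:
    element: str
    add_info: list[str] = field(default_factory=list)


def reorder(relations: list[Relation], additions: dict[str, Addition],
            key: Optional[Callable[[Addition], object]] = None) -> list[Relation]:
    """Step S60: sort relations by a criterion computed from their addition."""
    # Default criterion: dictionary order of the related phrase.
    criterion = key if key is not None else (lambda add: add.element)
    return sorted(relations, key=lambda rel: criterion(additions[rel.element]))
```

Passing `key=lambda add: -len(add.add_info)` instead would, for example, list first the phrases found in the most sentences; Python's sort is stable, so ties keep their previous order.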

In step S70, the CPU 1C generates the data forming the search result based on the addition structures shown in FIG. 8(c) and sends the search result to the display device 3, whereupon the flow shown in FIG. 6 ends.

Specific examples of how search results are output and displayed are given below. FIG. 2 shows an output in which, with "dog" specified as the search character string, only the related phrases are listed. The related phrases may be listed, for example, in the order in which they were found in the search target information, in dictionary order, or in descending order of the number of occurrences found.
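The three orderings mentioned can be produced from the list of related phrases in the order they were found. A minimal sketch, assuming that list is available with one entry per occurrence; `order_phrases` and its `mode` values are hypothetical names.

```python
from collections import Counter


def order_phrases(found: list[str], mode: str = "found") -> list[str]:
    """List distinct related phrases in one of three orders.

    `found` holds the related phrases in the order they were discovered.
    """
    distinct = list(dict.fromkeys(found))  # order of first discovery
    if mode == "found":
        return distinct
    if mode == "dictionary":
        return sorted(distinct)
    if mode == "count":
        counts = Counter(found)
        # Stable sort: phrases with equal counts keep discovery order.
        return sorted(distinct, key=lambda p: -counts[p])
    raise ValueError(f"unknown mode: {mode}")
```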

FIG. 11 shows an output in which, with "dog" specified as the search character string, a summary sentence is displayed as additional information together with each related phrase. This increases the amount of information conveyed by the search result itself, making it easier for the client to obtain the needed information from this screen. A summary sentence may be extracted, for example, as a sentence containing the search character string, a sentence containing the related phrase, or a sentence containing both. The length of the extracted summary may be set, for example, so that the summary is one sentence bounded by punctuation (in English, a period or another sentence-delimiting symbol), or so that as much text as possible is extracted within a set threshold. As shown in FIG. 11, highlighting the search character string and the related phrases contained in the summary, for example in bold type, is expected to make the display easier for the client to read.
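One way to realise the summary options above is sketched below: prefer a sentence containing both the search string and the related phrase, fall back to one containing only the related phrase, cap the length at a threshold, and mark both words in bold. Producing HTML `<b>` tags, the default threshold of 80 characters, and the function names are all assumptions.

```python
import html
import re


def split_sentences(text: str, delimiters: str = "。.") -> list[str]:
    """Split text after each delimiter, keeping the delimiter."""
    sentences, buf = [], []
    for ch in text:
        buf.append(ch)
        if ch in delimiters:
            sentences.append("".join(buf).strip())
            buf = []
    sentences.append("".join(buf).strip())  # trailing text without a delimiter
    return [s for s in sentences if s]


def summarize(text: str, query: str, phrase: str, max_len: int = 80) -> str:
    """Return an HTML summary sentence for `phrase`, or "" if none is found."""
    sentences = split_sentences(text)
    both = [s for s in sentences if query in s and phrase in s]
    candidates = both or [s for s in sentences if phrase in s]
    if not candidates:
        return ""
    summary = candidates[0]
    if len(summary) > max_len:  # length threshold
        summary = summary[:max_len] + "…"
    summary = html.escape(summary)
    # Bold both words in one pass so inserted tags are never re-matched.
    words = sorted({query, phrase}, key=len, reverse=True)
    pattern = "|".join(re.escape(html.escape(w)) for w in words)
    return re.sub(pattern, lambda m: f"<b>{m.group(0)}</b>", summary)
```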

FIG. 12 shows an output in which, with "dog" specified as the search character string, the Internet address at which the corresponding data is published is displayed as additional information together with each related phrase. Although URLs (Uniform Resource Locators) are listed in this example, any form that simply indicates where the data is stored may be used, such as a list of different character strings, each given a link using HTML (HyperText Markup Language). This allows the client to move to the location of the needed information easily and with few operations, improving the efficiency of the search.
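For the HTML variant, each related phrase can serve as the link text for the address where its data is stored. The sketch assumes the (phrase, URL) pairs have already been collected; `link_list` is a hypothetical name, and values are escaped with the standard `html.escape`.

```python
import html


def link_list(entries: list[tuple[str, str]]) -> str:
    """Render (related phrase, URL) pairs as an HTML list of links."""
    items = [
        f'<li><a href="{html.escape(url, quote=True)}">{html.escape(phrase)}</a></li>'
        for phrase, url in entries
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```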

FIG. 13 shows an output in which, with "dog" specified as the search character string, the degree of association between the search character string and each related phrase is displayed as additional information together with the related phrase. This increases the amount of information conveyed by the search result itself, making it easier for the client to obtain the needed information from this screen. In this example, the degree of association is the number of occurrences of the related phrase actually found, and the related phrases are displayed in descending order of that count. Not all of the related phrases found need be displayed; only a predetermined number of them (e.g., three) with the highest counts may be displayed. In other words, the number of related phrases displayed is not limited to any particular value.
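With occurrence counts as the degree of association, the ranking and the optional cut-off to a predetermined number map directly onto `collections.Counter`. A minimal sketch, assuming the related phrases found are given as a list with one entry per occurrence; `top_related` is a hypothetical name.

```python
from collections import Counter
from typing import Optional


def top_related(found: list[str], n: Optional[int] = 3) -> list[tuple[str, int]]:
    """Rank related phrases by how many times they were found.

    n=None keeps every phrase instead of only the top n.
    """
    return Counter(found).most_common(n)
```

`Counter.most_common` lists elements with equal counts in the order they were first encountered, so ties keep discovery order.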

FIG. 14 shows an output in which, with "dog" specified as the search character string, the proximity between the search character string and each related phrase is displayed as additional information together with the related phrase. This increases the amount of information conveyed by the search result itself, making it easier for the client to obtain the needed information from this screen. In this example, proximity is measured as the number of characters between the search character string and the corresponding related phrase. A percentage is displayed beside each related phrase as the basis for ordering them; it may be calculated, for example, as the proportion of the extracted occurrences of the same related phrase whose proximity is at or below a certain value. Alternatively, no percentage need be displayed, and the average proximity may simply be used instead. The average may be taken, for example, over all occurrences found, or over a predetermined number (e.g., 10) of occurrences taken in ascending order of proximity.
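The proximity measure and the two aggregation options can be sketched as follows. The character gap is counted strictly between the two spans; the percentage uses a threshold whose default of 10 characters is an assumption, while averaging over the 10 nearest occurrences follows the example in the text. All function names are hypothetical.

```python
from collections import defaultdict


def char_gap(q_pos: int, q_len: int, p_pos: int, p_len: int) -> int:
    """Characters strictly between two spans (0 if adjacent or overlapping)."""
    if p_pos >= q_pos + q_len:      # related phrase after the search string
        return p_pos - (q_pos + q_len)
    if q_pos >= p_pos + p_len:      # related phrase before the search string
        return q_pos - (p_pos + p_len)
    return 0


def proximity_stats(pairs: list[tuple[str, int]], threshold: int = 10,
                    k: int = 10) -> dict[str, tuple[float, float]]:
    """Map each phrase to (percent of gaps <= threshold, mean of k smallest gaps)."""
    gaps: dict[str, list[int]] = defaultdict(list)
    for phrase, gap in pairs:
        gaps[phrase].append(gap)
    stats = {}
    for phrase, gs in gaps.items():
        within = sum(g <= threshold for g in gs) / len(gs) * 100
        nearest = sorted(gs)[:k]
        stats[phrase] = (within, sum(nearest) / len(nearest))
    return stats
```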

FIG. 1 is a diagram showing an example of the external appearance of the information search apparatus according to the present invention.
FIG. 2 is a diagram showing an example of an output display of the information search apparatus according to the present invention.
FIG. 3 is a diagram showing an example of an output display of the information search apparatus according to the present invention.
FIG. 4 is a functional block diagram of the information search apparatus shown in FIG. 1.
FIG. 5 is a diagram showing a configuration example of the information search apparatus shown in FIG. 1.
FIG. 6 is a flowchart showing an example of the data processing procedure in the information search apparatus configured as in FIG. 5.
FIG. 7 is a flowchart showing an example of the related phrase extraction procedure.
FIG. 8 is a diagram showing the internal structure of the data obtained by the data processing procedure shown in FIG. 6.
FIG. 9 is a schematic diagram showing the internal structure of the data obtained by the data processing procedure shown in FIG. 6.
FIG. 10 is a diagram showing other structures of the data obtained by the data processing procedure shown in FIG. 6.
FIG. 11 is a diagram showing an example of an output display of the information search apparatus according to the present invention.
FIG. 12 is a diagram showing an example of an output display of the information search apparatus according to the present invention.
FIG. 13 is a diagram showing an example of an output display of the information search apparatus according to the present invention.
FIG. 14 is a diagram showing an example of an output display of the information search apparatus according to the present invention.

Explanation of symbols

1 Information analysis device
1A Main storage device
1B External storage device
1C CPU (central processing unit)
1D Communication device
1E Data bus
2 Input device
3 Display device
4 Search character string input unit
5 Search target information acquisition unit
6 Index generation unit
7 Search character string information extraction unit
8 Related phrase extraction unit
9 Related phrase sorting unit
10 Additional information generation unit
11 Search result generation unit
12 Search result output unit
13 Network

Claims

1. An information search apparatus comprising:
search character string acquisition means for acquiring a search character string;
search target information acquisition means for acquiring search target information;
search character string information extraction means for extracting information related to the search character string from the search target information;
related phrase extraction means for extracting, from the search target information, a related phrase having a predetermined correspondence with the search character string, based on the information extracted by the search character string information extraction means; and
search result generation means for generating a search result using the related phrase extracted by the related phrase extraction means.

2. The information search apparatus according to claim 1, further comprising additional information generation means for generating additional information included in the search result, wherein the additional information generation means uses, as one element of the additional information, a summary sentence containing the related phrase, based on the related phrase extracted by the related phrase extraction means.

3. The information search apparatus according to claim 2, wherein the additional information generation means uses, as one element of the additional information, a summary sentence containing both the related phrase and the search character string, based on the related phrase extracted by the related phrase extraction means and the information extracted by the search character string information extraction means.

4. The information search apparatus according to any one of claims 1 to 3, further comprising additional information generation means for generating additional information included in the search result, wherein the additional information generation means uses, as one element of the additional information, the storage location of information containing the related phrase, based on the related phrase extracted by the related phrase extraction means.

5. The information search apparatus according to any one of claims 1 to 4, further comprising additional information generation means for generating additional information included in the search result, wherein the additional information generation means uses, as one element of the additional information, a degree of association, which quantifies the relevance between the related phrase and the search character string.

6. The information search apparatus according to claim 5, further comprising related phrase sorting means for reordering the related phrases based on the degree of association.

7. The information search apparatus according to claim 5 or 6, wherein the additional information generation means calculates the degree of association based on the number of characters between the related phrase and the search character string.

8. The information search apparatus according to claim 5 or 6, wherein the additional information generation means calculates the degree of association based on the total number of occurrences of each identical related phrase extracted by the related phrase extraction means.

9. An information search program for causing a computer to function as:
search character string acquisition means for acquiring a search character string;
search target information acquisition means for acquiring search target information;
search character string information extraction means for extracting information related to the search character string from the search target information;
related phrase extraction means for extracting, from the search target information, a related phrase having a predetermined correspondence with the search character string, based on the information extracted by the search character string information extraction means; and
search result generation means for generating a search result using the related phrase extracted by the related phrase extraction means.