JP2023125592A - Information processing system, information processing method, and program

Info

Publication number: JP2023125592A
Application number: JP2022029784A
Authority: JP
Inventors: 義治進; Yoshiharu Shin
Original assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Priority date: 2022-02-28
Filing date: 2022-02-28
Publication date: 2023-09-07
Anticipated expiration: 2042-02-28
Also published as: JP7545061B2

Abstract

To provide a mechanism that enables efficient checking of search results. SOLUTION: An information processing system performs a document search using a search query specified by a user, controls display of the search results, acquires characteristic words from each retrieved document, and displays those characteristic words so that they are identifiable as identification words. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing system, an information processing method, and a program.

As the number of digitized documents within companies grows, document search systems for efficiently finding the documents needed for work are becoming more important. A document search system presents the user with a set of documents related to the search conditions the user has entered. A typical example accepts a search query as a character string and retrieves related documents based on the search keywords contained in the query.

Some search systems display the character strings surrounding the positions where a search keyword appears in a retrieved document and further display the search keyword in an identifiable manner, for example by highlighting it (highlight function). Hereinafter, a word that is the target of such identification display is called an identification word.

The highlight function lets a user efficiently find words of interest across the multiple documents returned as search results and, as a result, quickly grasp which document is the one being sought.

Non-Patent Document 1 discloses a highlight function in a document search system.

https://www.hitachi-systems.com/ind/srpartner/product/highlight/index.html

Specifically, Non-Patent Document 1 discloses a function that highlights the keywords used in a search in a color specified by the user.

However, depending on how the search is performed, highlighting only the keywords used in the search may not allow the user to efficiently grasp the characteristics of the retrieved documents.

Accordingly, an object of the present invention is to provide a mechanism that allows search results to be checked efficiently.

The information processing system of the present invention comprises: search means for performing a document search using a search query specified by a user; display control means for controlling display of the search results obtained by the search means; and characteristic word acquisition means for acquiring, from a retrieved document, the characteristic words of that document, wherein the display control means performs control so that the characteristic words in the document retrieved by the search means are displayed identifiably as identification words.

Further, the information processing system of the present invention comprises: search means for performing a document search using a search query specified by a user; display control means for controlling display of the search results obtained by the search means; and related word acquisition means for acquiring related words, which are words related to the documents retrieved by the search means, wherein the display control means performs control so that the related words contained in a document retrieved by the search means are displayed identifiably as identification words.

According to the present invention, search results can be checked efficiently.

FIG. 1 is a diagram showing an example of the system configuration of the document search system in an embodiment of the present invention.

FIG. 2 is a block diagram showing an example of the hardware configuration of the document search system and a client terminal in an embodiment of the present invention.

FIG. 3 is a diagram showing an example of the set of search target documents stored in the document DB in an embodiment of the present invention.

FIG. 4 is a diagram showing an example of data used as a search condition in an embodiment of the present invention.

FIG. 5 is a diagram showing an example of data used as a search result in an embodiment of the present invention.

FIG. 6 is a diagram showing an example of the priority rule table for the search result list in an embodiment of the present invention.

FIG. 7 is a diagram showing an example of the priority rule table for the search result details in an embodiment of the present invention.

FIG. 8 is a diagram showing an example of the identification word candidate source table created by the identification word candidate creation unit in an embodiment of the present invention.

FIG. 9 is a diagram showing an example of the identification word candidate table created by the identification word candidate creation unit in an embodiment of the present invention.

FIG. 10 is a flowchart showing the search processing performed by the search processing unit in an embodiment of the present invention.

FIG. 11 is a flowchart showing the identification word candidate creation processing performed by the identification word candidate creation unit in an embodiment of the present invention.

FIG. 12 is a diagram showing an example of the search result screen in an embodiment of the present invention.

FIG. 13 is a diagram showing an example of the search result screen while identification word candidates are displayed in an embodiment of the present invention.

FIG. 14 is a diagram showing an example of the search result screen with an identification word selected in an embodiment of the present invention.

FIG. 15 is a diagram showing an example of the search result details screen while identification word candidates are displayed in an embodiment of the present invention.

FIG. 16 is a diagram showing an example of the search result details screen with an identification word selected in an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a diagram showing an example of the system configuration of a document search system 100 according to an embodiment of the present invention.

The document search system 100 consists of a document registration device 110, a document DB 120, a document search device 130, and a characteristic word update device 140.

The document registration device 110 is a device for registering the documents that users will search, and consists of a document receiving unit 111, a keyword extraction unit 112, and a document registration processing unit 113.

The document receiving unit 111 is a functional unit that accepts documents to be registered. A user (client terminal) can send any document to the document receiving unit 111 through a web browser or the like. Alternatively, a crawler may collect and send documents automatically.

The keyword extraction unit 112 is a functional unit that extracts, from a document accepted by the document receiving unit 111, the keywords that are candidates for the characteristic words of that document, together with their appearance frequencies. Characteristic words are described in detail later. The keyword extraction processing in the keyword extraction unit 112 uses a known morphological analysis technique. The extracted morphemes may be limited to specific parts of speech, such as proper nouns, depending on the purpose of the document search system. Alternatively, character strings matching a predetermined pattern may be extracted as keywords without using morphological analysis.
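The pattern-based alternative mentioned above can be illustrated with a minimal sketch. The regular expression, the lower-casing step, and the function name `extract_keywords` are assumptions made for illustration only; the specification does not define the pattern, and a real deployment for Japanese text would more likely rely on a morphological analyzer.

```python
import re
from collections import Counter

# Hypothetical pattern (not from the specification): an ASCII letter
# followed by one or more letters or digits.
KEYWORD_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]+")


def extract_keywords(text: str) -> Counter:
    """Return a keyword -> appearance frequency mapping for one document.

    Text is lower-cased first so that "Mobile" and "mobile" count as the
    same keyword (an illustrative choice).
    """
    return Counter(KEYWORD_PATTERN.findall(text.lower()))


if __name__ == "__main__":
    print(extract_keywords("Mobile screen design; mobile API"))
```

The resulting mapping corresponds to the "keyword: appearance frequency" values that are stored with each document.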

The document registration processing unit 113 associates the document accepted by the document receiving unit 111 with the keywords extracted by the keyword extraction unit 112 and stores them in the document DB 120.

The document DB 120 has fields for a document ID that uniquely identifies a document, a document name, a body text, "keyword: appearance frequency" pairs holding the values extracted by the keyword extraction unit 112, and characteristic words. FIG. 3 shows an example of the data stored in the document DB 120. The method of creating characteristic words is described later. These five items are given as the configuration needed to explain the present idea; further items used by a document search system, such as a URL indicating the location of the document, the document size, or the document author, may be added.

The document search device 130 consists of a search processing unit 131, a search condition storage unit 132, a search operation storage unit 133, a search result storage unit 134, an identification word candidate creation unit 135, a priority rule table (search result list) 136, and a priority rule table (search result details) 137.

The search processing unit 131 is a functional unit that accepts a search operation from the user, interprets it to generate a search condition for querying the search DB, and searches the document DB 120 for documents matching that condition. It can retrieve the documents relevant to the search condition in order of score. The search processing performed by the search processing unit 131 is described in detail later. The user can also send identification words to the search processing unit 131 together with the search operation.

The search condition storage unit 132 is a functional unit that stores the search conditions of the searches performed by the user. As shown in FIG. 4, a search condition consists of a "search query", "characteristic words of the similar document search query", and a "keyword filter".

The search operation storage unit 133 is a functional unit that stores the search operations of the searches performed by the user. At least three kinds of values can be stored in the search operation storage unit 133: "search by search query", "similar document search", and "add keyword filter". For "add keyword filter", the keyword string can be stored as additional information.

The search result storage unit 134 is a functional unit that stores the search results of the searches performed by the user. As shown in FIG. 5, a search result consists of a "document list", which is the set of documents in the document DB 120 that match the search condition; "related words", which is a list of words related to the search result; and, for each document ID in the document list, a "snippet", which is the portion of the body text extracted for identification display.

Each time it executes search processing, the search processing unit 131 updates the information stored in the search condition storage unit 132, the search operation storage unit 133, and the search result storage unit 134.

The identification word candidate creation unit 135 performs identification word candidate creation processing, which creates an identification word candidate table such as the one in FIG. 9. The identification word candidate table holds words and their priorities. The identification word candidate table creation processing is described later.

The priority rule table (search result list) 136 holds priority rules such as those in FIG. 6, which are used in the identification word candidate creation processing for the search result list. The priority rule table (search result details) 137 holds priority rules such as those in FIG. 7, which are used in the identification word candidate creation processing for the search result details. How these tables are used is explained together with the identification word candidate table creation processing. The system administrator can set the values of these priority rule tables when the document search system is built.

The characteristic word update device 140 is a device that, for each document stored in the document DB, extracts characteristic keywords as characteristic words and updates the corresponding record. Characteristic words can be selected using tf-idf, one of the indicators of how characteristic a word is. The characteristic word update device 140 obtains the appearance frequency of each word from the "keyword: appearance frequency" field of the document DB 120 and extracts up to N keywords as characteristic words in descending order of tf-idf value. The system administrator can set the value of N when the document search system is built. For example, in the document DB 120 of FIG. 3, the characteristic words of document 1 are the three words "design", "screen", and "mobile".
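As a rough illustration of the tf-idf selection described above, the following sketch ranks the keywords of each document and keeps at most N of them. The exact tf-idf variant is not given in the specification; the formula used here (relative term frequency multiplied by the natural log of the inverse document frequency), the alphabetical tie-breaking, and the function name `select_characteristic_words` are assumptions.

```python
import math


def select_characteristic_words(keyword_freqs, n):
    """Pick up to n characteristic words per document by tf-idf.

    keyword_freqs maps doc_id -> {keyword: appearance frequency}, i.e. the
    "keyword: appearance frequency" field of each record.
    """
    num_docs = len(keyword_freqs)

    # Document frequency: in how many documents each keyword appears.
    doc_freq = {}
    for freqs in keyword_freqs.values():
        for word in freqs:
            doc_freq[word] = doc_freq.get(word, 0) + 1

    result = {}
    for doc_id, freqs in keyword_freqs.items():
        total = sum(freqs.values())
        scores = {
            word: (count / total) * math.log(num_docs / doc_freq[word])
            for word, count in freqs.items()
        }
        # Highest score first; ties broken alphabetically (an assumption).
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[doc_id] = ranked[:n]
    return result


if __name__ == "__main__":
    sample = {
        1: {"design": 3, "screen": 2, "product": 1},
        2: {"product": 2, "price": 4},
    }
    print(select_characteristic_words(sample, 2))
```

A keyword that appears in every document receives a score of zero under this variant, so it is chosen only when a document has fewer than N other keywords.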

Although FIG. 1 shows an example in which the document search system 100 is made up of three devices (the document registration device 110, the document search device 130, and the characteristic word update device 140) and the document DB 120, the document search system of the present invention is not limited to this configuration and may be a system in which a single device provides the functions of all of these devices.

FIG. 2 is a block diagram showing an example of the hardware configuration of an information processing device that can be used as the document search system 100 of the present invention or as any of its devices.

As shown in FIG. 2, in the information processing device a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, a storage device 204, an input controller 205, an audio controller 206, a video controller 207, a memory controller 208, and a communication I/F controller 209 are connected via a system bus 200.

The CPU 201 centrally controls the devices and controllers connected to the system bus 200.

The ROM 202 or the external memory 213 holds the BIOS (Basic Input/Output System) and the OS (Operating System), which are control programs executed by the CPU 201, as well as the computer-readable and executable programs that implement the present information processing method and the various data they require (including data tables).

The RAM 203 functions as the main memory, work area, and so on of the CPU 201. When executing processing, the CPU 201 loads the necessary programs and the like from the ROM 202 or the external memory 213 into the RAM 203 and executes the loaded programs to carry out its various operations.

The input controller 205 controls input from input devices such as a keyboard 210 and a pointing device such as a mouse (not shown). When the input device is a touch panel, the user can issue various instructions by pressing (touching with a finger or the like) an icon, cursor, or button displayed on the touch panel.

The touch panel may be one that can detect positions touched by multiple fingers, such as a multi-touch screen.

The video controller 207 controls display on an external output device such as a display 212. The display includes one integrated with the main body, as in a notebook PC. The external output device is not limited to a display and may be, for example, a projector. A device that can accept the touch operations described above also serves as an input device.

The video controller 207 can control a video memory (VRAM) used for display control. Part of the RAM 203 may be used as the video memory area, or a separate dedicated video memory may be provided.

The memory controller 208 controls access to the external memory 213. The external memory may be an external storage device (hard disk) that stores a boot program, various applications, font data, user files, edit files, and various data; a flexible disk (FD); or a CompactFlash (registered trademark) memory connected to a PCMCIA card slot via an adapter.

The communication I/F controller 209 connects to and communicates with external devices via a network and executes communication control processing on the network. For example, it supports communication using TCP/IP, telephone lines such as ISDN, and mobile 4G and 5G lines.

The CPU 201 enables display on the display 212 by, for example, rasterizing outline fonts into a display information area in the RAM 203. The CPU 201 also enables user instructions given with a mouse cursor (not shown) or the like on the display 212.

Next, the search processing that the search processing unit 131 executes when it receives a search request from a client terminal in the embodiment of the present invention is described with reference to the flowchart of FIG. 10.

First, in step S1001, a single search condition is created by combining the search condition contained in the search operation received from the client with the search condition stored in the search condition storage unit 132.

As shown in FIG. 4, a search condition includes a "search query", "characteristic words of the similar document search query", and a "keyword filter". As FIG. 4 shows, the search query is a keyword. A search condition that includes a search query causes a full-text search to be performed on the body text of the documents registered in the document DB 120.

The characteristic words of the similar document search query are the keywords used as the search condition in a similar document search, in which the user sends a document on hand to the document search system 100 and the document search system 100 retrieves documents similar to it from the search DB. On receiving a document from the user, the document search system 100 determines the words characteristic of that document from a statistic such as tf-idf and automatically extracts them as characteristic words. Search processing with a search condition that includes the characteristic words of the similar document search query retrieves the documents for which the match rate between the keywords of the document registered in the document DB and the characteristic words of the similar document search query is at or above a threshold.

The keyword filter is a narrowing condition that contains keywords. Search processing with a search condition that includes a keyword filter retrieves from the document DB only the documents that contain those keywords.

When a search condition includes more than one of "search query", "characteristic words of the similar document search query", and "keyword filter", the final document list of the search result is determined by combining the individual conditions with AND.
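The AND combination of the three kinds of condition can be sketched as a single predicate over one document record. Several details here are assumptions rather than part of the specification: full-text search is simplified to substring containment of each whitespace-separated query term, the match rate is taken to be the fraction of the query's characteristic words found among the document's keywords, the keyword filter is checked against the document's keyword set, and the names `matches` and `threshold` are hypothetical.

```python
def matches(doc, query="", similar_words=(), filters=(), threshold=0.5):
    """Return True if doc satisfies every condition that is present.

    doc is a dict with "body" (str) and "keywords" (set of str).
    Conditions that are empty are treated as not specified.
    """
    body = doc["body"]
    keywords = set(doc["keywords"])

    # Search query: simplified full-text search (every term must occur).
    if query and not all(term in body for term in query.split()):
        return False

    # Similar document search: match rate must reach the threshold.
    if similar_words:
        hits = sum(1 for word in similar_words if word in keywords)
        if hits / len(similar_words) < threshold:
            return False

    # Keyword filter: every filter keyword must be present.
    return all(keyword in keywords for keyword in filters)


if __name__ == "__main__":
    doc = {
        "body": "productX spec: the admin screen layout",
        "keywords": {"productX", "spec", "screen", "admin"},
    }
    print(matches(doc, query="productX spec", filters=["screen"]))
```

Because each check returns early on failure, a document appears in the result list only when all specified conditions hold, which is the AND semantics described above.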

For example, suppose that nothing is stored in the search condition storage unit 132 and the search operation "operation type: search by search query; search query: 'Product X specifications'" is received from the user. A search condition whose search query is "Product X specifications" is then generated. Later processing stores the search condition "search query: 'Product X specifications'" in the search condition storage unit 132.

Suppose the user then issues the additional search operation "operation type: add keyword filter; keyword: 'screen'". The search condition stored in the search condition storage unit 132, "search query: 'Product X specifications'", is combined with the keyword filter "screen" generated from the received operation, producing the search condition "search query: 'Product X specifications'" AND "keyword filter: 'screen'".
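The two-step example above can be traced with a small sketch of step S1001. The class `SearchCondition`, the operation dictionary layout, and the operation type strings are hypothetical names chosen for illustration. The specification also does not state what happens when a new search query arrives while an older one is stored; replacing the old query is an assumption made here.

```python
from dataclasses import dataclass, field


@dataclass
class SearchCondition:
    query: str = ""
    similar_doc_words: list = field(default_factory=list)
    keyword_filters: list = field(default_factory=list)


def merge_condition(saved, operation):
    """Combine the stored condition with a newly received search operation."""
    # Copy so that the stored condition is not mutated in place.
    merged = SearchCondition(
        saved.query, list(saved.similar_doc_words), list(saved.keyword_filters)
    )
    kind = operation["type"]
    if kind == "query_search":
        merged.query = operation["query"]  # assumption: replaces old query
    elif kind == "similar_document_search":
        merged.similar_doc_words = list(operation["characteristic_words"])
    elif kind == "add_keyword_filter":
        if operation["keyword"] not in merged.keyword_filters:
            merged.keyword_filters.append(operation["keyword"])
    else:
        raise ValueError(f"unknown operation type: {kind}")
    return merged


if __name__ == "__main__":
    first = merge_condition(
        SearchCondition(), {"type": "query_search", "query": "productX spec"}
    )
    second = merge_condition(
        first, {"type": "add_keyword_filter", "keyword": "screen"}
    )
    print(second)
```

In this sketch `second` carries both the query and the "screen" filter, mirroring the combined condition in the example.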

In step S1002, the documents matching the search condition are retrieved from the document DB 120 and arranged in descending order of score, a value indicating how well each document matches the search condition. To make the search efficient, the document registration processing unit 113 may create an inverted index, a known technique, for use at search time.
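A minimal sketch of an inverted index and a scored lookup follows. The scoring function is not specified in the source; summing the term counts of the query terms is an illustrative stand-in, and the function names `build_inverted_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Build token -> {doc_id: term count} from doc_id -> list of tokens."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index


def search(index, terms):
    """Return ids of documents containing every term, best score first."""
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    # Only documents present in every posting list satisfy all terms.
    common = set(postings[0]).intersection(*postings[1:])
    # Illustrative score: total occurrences of the query terms.
    scores = {doc_id: sum(p[doc_id] for p in postings) for doc_id in common}
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    docs = {
        1: ["product", "spec", "spec"],
        2: ["product", "price"],
        3: ["spec", "product", "product", "spec"],
    }
    print(search(build_inverted_index(docs), ["product", "spec"]))
```

Building the index once at registration time lets each query touch only the posting lists of its own terms instead of scanning every document body.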

In step S1003, the related words of the search result are determined. Some of the keywords contained in the retrieved documents are used as related words. One concrete way to obtain them is to select the keywords that are contained in roughly half of the retrieved documents. When the retrieved documents are a mixture of documents on several topics, this selection method can be expected to pick related words that divide those topics appropriately.
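One way to read "contained in roughly half of the retrieved documents" is sketched below. The tolerance around 0.5, the cap on the number of words, the alphabetical tie-breaking, and the function name `select_related_words` are all assumptions; the specification gives no numeric definition of "roughly half".

```python
def select_related_words(doc_keywords, max_words=3, tolerance=0.2):
    """Pick keywords whose document ratio is closest to one half.

    doc_keywords is a list with one collection of keywords per retrieved
    document.
    """
    total = len(doc_keywords)
    if total == 0:
        return []

    # Count, for each keyword, how many retrieved documents contain it.
    doc_freq = {}
    for keywords in doc_keywords:
        for word in set(keywords):
            doc_freq[word] = doc_freq.get(word, 0) + 1

    candidates = []
    for word, count in doc_freq.items():
        distance = abs(count / total - 0.5)
        if distance <= tolerance:
            candidates.append((distance, word))

    # Closest to one half first; ties broken alphabetically.
    return [word for _, word in sorted(candidates)[:max_words]]


if __name__ == "__main__":
    retrieved = [
        {"product", "screen"},
        {"product", "screen"},
        {"product", "mobile"},
        {"product", "price", "mobile"},
    ]
    print(select_related_words(retrieved))
```

A keyword present in every document, such as "product" in the usage example, is skipped because it cannot separate the result set into topics.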

In step S1004, the search condition storage unit 132 is updated with the generated search condition, and the search operation storage unit 133 is updated with the search operation received from the user.

In step S1005, processing branches depending on whether the user has presented (specified) identification words. If identification words have been presented, they are used as the identification words and processing proceeds to step S1010. If not, processing proceeds to step S1006.

In step S1006, processing branches depending on whether the document search system is in the identification word simple selection operation mode. The system administrator of the document search system can set whether this mode is active. If the simple selection mode is active, processing proceeds to step S1007; otherwise it proceeds to step S1008.

In the identification word simple selection mode (step S1006: YES), in step S1007 the search query stored in the search condition storage unit is split on whitespace, and the resulting words are used as the identification words.

When the system is not in the identification word simple selection mode (step S1006: NO), in step S1008 the identification word candidate creation unit is made to perform the identification word candidate creation processing and its result is received. The identification word candidate creation processing is described later. The result is an identification word candidate table holding words and priorities, as in FIG. 9.

In step S1009, the words in the identification word candidate table that satisfy a predetermined condition (for example, the N words with the highest priority, or the words whose priority is at or above a threshold) are selected as identification words. Here N is a constant defined in the document search system 100.
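The branching of steps S1005 to S1009 can be summarized in a short sketch. The candidate table is taken as an already computed word-to-priority mapping (the result of step S1008, whose details are described elsewhere), only the top-N rule is shown, higher numbers are assumed to mean higher priority, and the function name `choose_identification_words` and its parameters are hypothetical.

```python
def choose_identification_words(user_words, simple_mode, query, candidates, top_n=3):
    """Select identification words following steps S1005 to S1009.

    user_words: words specified by the user, or None / empty
    simple_mode: True if the simple selection operation mode is active
    query: the stored search query string
    candidates: {word: priority} candidate table (result of S1008)
    """
    # S1005: words specified by the user take precedence.
    if user_words:
        return list(user_words)

    # S1006 -> S1007: simple mode splits the search query on whitespace.
    if simple_mode:
        return query.split()

    # S1009: take the top_n words by priority (ties broken alphabetically).
    ranked = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


if __name__ == "__main__":
    table = {"screen": 5, "mobile": 3, "design": 4, "spec": 1}
    print(choose_identification_words(None, False, "productX spec", table, top_n=2))
```

A threshold rule could replace the slice in the last step by filtering on the priority value instead.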

Users may be accustomed to having the words contained in the search query highlighted. In that case, the system administrator can set the document search system to the identification word simple selection operation mode so that the words of the search query are highlighted, giving behavior that feels natural to such users.

On the other hand, when the aim is to highlight words that are more likely to interest the user, the system administrator can set the document search system to perform the identification word candidate creation processing instead.

In step S1010, the character string surrounding each identification word is extracted as a snippet from the body text of each document obtained as a search result. The N characters before and after the identification word can be used as the surrounding character string. Here N is a constant defined in the document search system 100.

Furthermore, the identification word portions of the extracted snippet are enclosed in identification display tags so that the client terminal displays them in an identifiable manner. Any display form that makes the identification words identifiable may be used, such as displaying them in bold, in a font different from the other text, with a colored marker, or in a text color different from the other text.

スニペットを効率よく抽出するために、公知の技術である転置インデックスを用いて本文内における識別単語の位置を取得することができる。 In order to efficiently extract snippets, the position of the identified word within the text can be obtained using a known technique, inverted index.
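As one possible illustration of how an inverted index can supply the positions of identification words, the following minimal sketch builds a positional index in memory. The function name and the fixed-vocabulary scan are assumptions of this sketch; a production search engine would typically tokenize text at indexing time (for Japanese, usually by morphological analysis or n-grams) rather than scan for a known word list.

```python
def build_positional_index(
    docs: dict[str, str], vocabulary: list[str]
) -> dict[str, dict[str, list[int]]]:
    """Map each word to {document id: [character offsets where the word starts]}."""
    index: dict[str, dict[str, list[int]]] = {}
    for doc_id, text in docs.items():
        for word in vocabulary:
            start = text.find(word)
            while start != -1:
                index.setdefault(word, {}).setdefault(doc_id, []).append(start)
                # Continue searching after the current match.
                start = text.find(word, start + 1)
    return index


example_index = build_positional_index(
    {"d1": "the mobile screen; mobile design"},
    ["mobile", "screen", "tablet"],
)
print(example_index["mobile"]["d1"])
```

With such an index, the offsets of an identification word in a document can be looked up directly instead of scanning the body text each time a snippet is requested.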

識別単語が複数ある場合、各識別単語および周辺文脈の抽出結果を文字列結合したものをスニペットとして用いることができる。 If there are a plurality of identification words, a string of the extraction results of each identification word and surrounding context can be combined and used as a snippet.
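The snippet extraction of step S1010 could be sketched, under stated assumptions, as follows. The `<em>` tag, the separator string, and the function name are hypothetical choices made for this sketch; the description only requires that N characters before and after the identification word be taken, that the word be enclosed in an identification display tag, and that the results for several words may be concatenated. For brevity, the sketch uses only the first occurrence of each word and tags only that word within its own window.

```python
def extract_snippet(
    text: str,
    words: list[str],
    n: int = 10,
    open_tag: str = "<em>",
    close_tag: str = "</em>",
    separator: str = " ... ",
) -> str:
    """Build a snippet from the N characters around each identification word."""
    parts = []
    for word in words:
        pos = text.find(word)
        if pos == -1:
            # The word does not occur in this text, so it contributes nothing.
            continue
        start = max(0, pos - n)
        end = min(len(text), pos + len(word) + n)
        before = text[start:pos]
        after = text[pos + len(word):end]
        parts.append(f"{before}{open_tag}{word}{close_tag}{after}")
    # Concatenate the per-word extraction results into one snippet.
    return separator.join(parts)


body = "The management screen is not provided for mobile users."
print(extract_snippet(body, ["mobile", "screen"], n=8))
```

The client terminal is then free to render the tag as bold text, a marker color, or any other identifiable display form.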

For example, on the search screen shown in FIG. 12, the user has not explicitly specified an identification word. The search query 1201 is set to "Product X specifications", the characteristic words 1202 of the similar document search query are set to "Product Y", "specifications", and "screen", and the keyword filter 1203 is set to the keywords "screen" and "mobile". Note that the keyword filter is displayed on the screen under the label "Narrowing conditions". Here, if the simple identification word selection mode is Yes in step S1006, the two words "Product X" and "specifications" are selected as identification words in step S1007. The search processing unit then extracts, from the body of the first document in the search results, the snippet "The specifications of the management screen of Product X are as follows" as the surrounding context of "Product X" and "specifications", and encloses "Product X" and "specifications" in identification display tags. As a result, for the first document in the search results, the client terminal displays a snippet consisting of the two texts "The specifications of the management screen of Product X are as follows" and "The specifications of the management screen are as follows", with "Product X" and "specifications" displayed in an identifiable manner.

また、図１４の検索画面において、ユーザーは識別単語として「モバイル」を指定している。この場合、検索処理部は識別単語として「モバイル」を選出する。以下同様にして、クライアント端末では検索結果の一番目の文書において、「管理画面はモバイル向けには提供しない」「モバイル向けの検索画面の設計は以下の通り」という２つのテキストからなるスニペットが表示され、「モバイル」は識別表示される。 Furthermore, on the search screen shown in FIG. 14, the user specifies "mobile" as the identification word. In this case, the search processing unit selects "mobile" as the identification word. Similarly, on the client terminal, the first document in the search results displays a snippet consisting of two texts: "The management screen will not be provided for mobile users" and "The design of the search screen for mobile users is as follows." "Mobile" is identified and displayed.

ステップＳ１０１１において、文書ＤＢ１２０から得られた文書一覧と求めた関連語とスニペットにより検索結果保存部１３４を更新する。 In step S1011, the search result storage unit 134 is updated with the document list obtained from the document DB 120 and the obtained related words and snippets.

ステップＳ１０１２において、検索結果をクライアント端末へ返す。クライアント端末では図１２のような検索結果画面が表示される。 In step S1012, the search results are returned to the client terminal. A search result screen as shown in FIG. 12 is displayed on the client terminal.

次に、図１１のフローチャートを用いて、本発明の実施形態における識別単語候補作成部１３５が実行する識別単語候補作成処理について説明する。 Next, the identification word candidate creation process executed by the identification word candidate creation unit 135 in the embodiment of the present invention will be described using the flowchart of FIG. 11.

As a reference example, assume that the search conditions of FIG. 4 are stored in the search condition storage unit 132, that the search results of FIG. 5 are stored in the search result storage unit 134, and that the search operation stored in the search operation storage unit 133 is "add keyword filter, keyword 'mobile'". Also assume that the priority rule table (search result list) 136 is used as the priority rule table and that its contents are as shown in FIG. 6.

ステップＳ１１０１からＳ１１０５にかけて、識別単語候補ソース表の作成処理が行われる。識別単語候補ソース表は図８のように単語と、その単語の取得元および取得元詳細からなる表である。取得元詳細は空のことがありうる。同じ単語が２回以上出現する場合もある。 From steps S1101 to S1105, an identification word candidate source table creation process is performed. The identification word candidate source table, as shown in FIG. 8, is a table consisting of words, sources of acquisition of the words, and details of the sources of acquisition. The source details may be empty. The same word may appear more than once.
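One way to picture the identification word candidate source table is as a list of small records. The class and field names below are assumptions made for this sketch; the description specifies only that each row holds a word, its acquisition source, and optional acquisition source details, and that the same word may appear in more than one row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceEntry:
    word: str
    source: str                          # where the word was obtained from
    source_detail: Optional[str] = None  # may be empty


# The same word may appear in several rows with different acquisition sources.
source_table = [
    SourceEntry("mobile", "keyword filter"),
    SourceEntry("mobile", "characteristic word of search result",
                "document search score rank 1"),
]
print(source_table)
```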

ステップＳ１１０１において、検索条件保存部１３２に保存されている検索クエリから単語を識別単語候補ソース表に加える。 In step S1101, words from the search query stored in the search condition storage unit 132 are added to the identification word candidate source table.

For example, the search conditions in FIG. 4 include the search query "Product X specifications". Splitting this query string on whitespace yields the words "Product X" (a single word, 製品Ｘ, in the original) and "specifications". "Product X" is added to the identification word candidate source table with the acquisition source "search query" and, because it is the first word from the left in the search query, with the acquisition source details "first from the front". Similarly, "specifications" is added with the acquisition source "search query" and, because it is the second word from the left in the search query, with the acquisition source details "second from the front".
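Step S1101 could be sketched as follows. The function name and the tuple layout are assumptions of this sketch, and the example query is written with an explicit space between the two words, which is assumed here because the description states that the query is split on whitespace.

```python
def entries_from_query(query: str) -> list[tuple[str, str, str]]:
    """Split a search query on whitespace and record each word's position."""
    # str.split() with no argument splits on any Unicode whitespace,
    # including the full-width space often typed in Japanese queries.
    return [
        (word, "search query", f"position {position} from the front")
        for position, word in enumerate(query.split(), start=1)
    ]


query_entries = entries_from_query("製品Ｘ 仕様")
print(query_entries)
```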

ステップＳ１１０２において、検索条件保存部１３２に保存されている類似文書検索クエリの特徴語から単語を識別単語候補ソース表に加える。 In step S1102, words are added to the identification word candidate source table from the characteristic words of the similar document search query stored in the search condition storage unit 132.

例えば、図４の検索条件には類似文書検索クエリの特徴語として「製品Ｙ」「仕様」「画面」が含まれるため、「製品Ｙ」「仕様」「画面」という単語が得られる。これらの単語は、取得元を「類似文書検索クエリの特徴語」として識別単語候補ソース表に加える。 For example, the search conditions in FIG. 4 include "product Y," "specifications," and "screen" as characteristic words of the similar document search query, so the words "product Y," "specifications," and "screen" are obtained. These words are added to the identification word candidate source table with the acquisition source as "feature word of similar document search query".

ステップＳ１１０３において、検索条件保存部１３２に保存されているキーワードフィルターから単語を識別単語候補ソース表に加える。 In step S1103, words are added to the identification word candidate source table from the keyword filter stored in the search condition storage unit 132.

例えば、図４の検索条件にはキーワードフィルターとして「画面」「モバイル」が含まれるため、「画面」「モバイル」という単語が得られる。これらの単語は、取得元を「キーワードフィルター」として識別単語候補ソース表に加える。 For example, the search conditions in FIG. 4 include "screen" and "mobile" as keyword filters, so the words "screen" and "mobile" are obtained. These words are added to the identification word candidate source table with the source as the "keyword filter".

ステップＳ１１０４において、検索結果保存部１３４に保存されている関連語から単語を識別単語候補ソース表に加える。 In step S1104, words from the related words stored in the search result storage unit 134 are added to the identification word candidate source table.

例えば、図５の検索結果には関連語として「企画」「設計」「提案」が含まれるため、「企画」「設計」「提案」という単語が得られる。これらの単語は、取得元を「検索結果の関連語」として識別単語候補ソース表に加える。 For example, the search results in FIG. 5 include "planning," "design," and "proposal" as related words, so the words "planning," "design," and "proposal" are obtained. These words are added to the identification word candidate source table with the acquisition source as "related words of search results."

In step S1105, the characteristic words of each document in the document list stored in the search result storage unit 134 (words that are characteristic of that document and are determined from a statistic such as tf-idf) are added to the identification word candidate source table.

For example, the first document in the search results of FIG. 5 includes "design", "screen", and "mobile" as characteristic words, so the words "design", "screen", and "mobile" are obtained. These words are added to the identification word candidate source table with the acquisition source "characteristic word of search result" and, because they come from the document with the highest search score, with the acquisition source details "document search score rank 1". Characteristic words of the second and third documents in the search results can be added in the same way.
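The description leaves open how characteristic words are computed beyond naming tf-idf as an example statistic. The following is a minimal sketch of one common tf-idf variant (term frequency multiplied by log(N/df)); the function name, the pre-tokenized input, and the sample documents are assumptions of this sketch and are not taken from FIG. 5.

```python
import math
from collections import Counter


def characteristic_words(
    docs: dict[str, list[str]], doc_id: str, top_k: int = 3
) -> list[str]:
    """Return the top_k words of one document ranked by tf-idf."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for tokens in docs.values() for word in set(tokens))
    tokens = docs[doc_id]
    tf = Counter(tokens)
    scores = {
        word: (count / len(tokens)) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Highest score first; ties are broken by character-code order.
    return sorted(scores, key=lambda word: (-scores[word], word))[:top_k]


sample_docs = {
    "d1": ["design", "screen", "mobile", "spec"],
    "d2": ["spec", "plan"],
    "d3": ["spec", "proposal"],
}
print(characteristic_words(sample_docs, "d1"))
```

In this sample, "spec" occurs in every document and therefore scores zero, which is the behavior that makes tf-idf suitable for picking words that distinguish one document from the rest.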

ステップＳ１１０６において、識別単語候補ソース表のエントリ一覧について繰り返す処理を開始する。 In step S1106, processing is started to repeat the list of entries in the identification word candidate source table.

ステップＳ１１０７において、取得元および検索操作保存部１３３に保存されている検索操作に応じて、優先度ルールから加算する優先度を計算（算出）する。 In step S1107, the priority to be added is calculated from the priority rule according to the acquisition source and the search operation stored in the search operation storage unit 133.

例えば、図８の識別単語候補ソース表におけるエントリ８０１の単語「モバイル」の優先度は、図６の優先度ルール表（検索結果一覧）により以下のように計算される。まず取得元がキーワードフィルターであるため、ルール６０４より優先度は＋３００される。 For example, the priority of the word "mobile" in the entry 801 in the identification word candidate source table of FIG. 8 is calculated as follows using the priority rule table (search result list) of FIG. First, since the acquisition source is a keyword filter, the priority is increased by +300 according to rule 604.

In addition, since the operation type of the most recent search operation stored in the search operation storage unit 133 is "add keyword filter" and the added keyword is "mobile", rule 605 adds a further +800 to the priority.

一般に、ユーザーが最後に行った検索操作はユーザーが直前に興味を持った内容を反映していると考えられる。そのため、ユーザーが最後に行った検索操作に関連する識別単語は、優先度を上げて優先的に表示することが有益と考えられる。ルール６０１、６０２、６０３も同様に、ユーザーが直前に興味を持った内容の優先度を高めるためのルールである。 In general, the user's last search operation is considered to reflect the user's most recent interest. Therefore, it is considered beneficial to increase the priority and display preferentially the identification words related to the last search operation performed by the user. Similarly, rules 601, 602, and 603 are rules for increasing the priority of content that the user was interested in immediately before.

最終的にはエントリ８０１の単語「モバイル」の優先度は１１００となる。 Finally, the priority of the word "mobile" in entry 801 is 1100.
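The calculation for entry 801 can be sketched as follows. Only the two point values stated above (+300 and +800) come from the description; the function name, the data layout, and the condition that the bonus applies only to entries whose acquisition source is the keyword filter are assumptions of this sketch (the latter is inferred from the fact that the same word obtained from another source does not receive the bonus).

```python
# Points by acquisition source; only the keyword filter value is stated in the text.
SOURCE_POINTS = {"keyword filter": 300}
# Bonus when the word is the keyword added by the most recent search operation.
LAST_ADDED_KEYWORD_BONUS = 800


def entry_priority(word: str, source: str, last_operation: tuple[str, str]) -> int:
    """Return the priority contributed by one source-table entry."""
    operation_type, operation_keyword = last_operation
    points = SOURCE_POINTS.get(source, 0)
    if (
        operation_type == "add keyword filter"
        and source == "keyword filter"
        and word == operation_keyword
    ):
        points += LAST_ADDED_KEYWORD_BONUS
    return points


mobile_priority = entry_priority(
    "mobile", "keyword filter", ("add keyword filter", "mobile")
)
print(mobile_priority)  # 300 + 800 = 1100
```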

As another example, the priority of the word "design" in entry 802 of the identification word candidate source table of FIG. 8 is calculated as follows. First, since the acquisition source is "characteristic word of search result", rule 604 adds +50 to the priority.

Furthermore, since the acquisition source details indicate that the document ranks first by search score, rule 605 adds -1 to the priority.

このルールは、検索順位が高いほど優先度を高くする（下げ幅を小さくする）というルールである。 This rule is such that the higher the search ranking, the higher the priority (the smaller the reduction).

検索順位が高い文書はユーザーの指定した検索条件への一致度合いが高く、ユーザーが興味を持つ可能性が高い文書といえる。そのため、検索順位が高い文書に特徴的に現れる識別単語は、優先度を上げて優先的に表示することが有益と考えられる。 A document with a high search ranking has a high degree of matching with the search conditions specified by the user, and can be said to be a document that is likely to be of interest to the user. Therefore, it is considered beneficial to increase the priority and preferentially display identification words that characteristically appear in documents with high search rankings.
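The rank-dependent rule can be illustrated with a short sketch. The description gives only the base value (+50) and the adjustment for rank 1 (-1); treating the adjustment as the negative of the rank for every rank is an assumption of this sketch, as is the function name.

```python
CHARACTERISTIC_WORD_POINTS = 50  # base points stated in the description


def characteristic_word_priority(rank: int) -> int:
    """Priority for a characteristic word taken from the document at this rank."""
    if rank < 1:
        raise ValueError("rank starts at 1")
    # Assumed rule: subtract the rank, so better-ranked documents lose less.
    return CHARACTERISTIC_WORD_POINTS - rank


design_priority = characteristic_word_priority(1)
print(design_priority)  # 50 - 1 = 49
```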

ステップＳ１１０８では、識別単語候補表におけるその単語エントリの優先度を更新する。その際の優先度の値は、識別単語候補表に既にその単語が存在する場合、既存の優先度とステップＳ１１０７で求めた値を足したものである。一方、識別単語候補表にその単語が存在していない場合、ステップＳ１１０７で求めた値を優先度の初期値として単語を追加する。 In step S1108, the priority of the word entry in the identification word candidate table is updated. If the word already exists in the identification word candidate table, the priority value at this time is the sum of the existing priority and the value obtained in step S1107. On the other hand, if the word does not exist in the identification word candidate table, the word is added with the value determined in step S1107 as the initial priority value.

For example, in the identification word candidate source table of FIG. 8, the word "mobile" has two entries: entry 801, whose acquisition source is "keyword filter", and entry 803, whose acquisition source is "characteristic word of search result document".

If entry 801 yields a priority of 1100 and entry 803 yields a priority of 49, the final priority of the word "mobile" is their sum, 1149.
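The accumulation performed in step S1108 amounts to summing per-entry priorities by word. A minimal sketch, with a hypothetical function name and the per-entry values from the example above:

```python
def accumulate_priorities(entries: list[tuple[str, int]]) -> dict[str, int]:
    """Sum the priority of every source-table entry into one value per word."""
    candidate_table: dict[str, int] = {}
    for word, points in entries:
        # A new word starts from its first value; an existing word is added to.
        candidate_table[word] = candidate_table.get(word, 0) + points
    return candidate_table


candidate_table = accumulate_priorities(
    [("mobile", 1100), ("design", 49), ("mobile", 49)]
)
print(candidate_table)  # {'mobile': 1149, 'design': 49}
```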

In step S1109, if unprocessed entries remain in the identification word candidate source table, the process returns to step S1106; if all entries have been processed, the process proceeds to step S1110.

ステップＳ１１１０では識別単語候補表を優先度の降順に並べ替える。なお、同一の優先度である単語はどのような順番にしてもよい。順番を一意にしたい場合、単語の文字コード順にしてもよい。 In step S1110, the identification word candidate table is sorted in descending order of priority. Note that words having the same priority may be placed in any order. If you want to make the order unique, you can use the word character code order.
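The ordering of step S1110, including the optional tie-break by character code, could be sketched as follows. The function name and the sample priorities are assumptions of this sketch.

```python
def sort_candidates(candidate_table: dict[str, int]) -> list[tuple[str, int]]:
    """Sort by priority (descending), then by character-code order of the word."""
    # Python compares strings by Unicode code point, which gives a unique order.
    return sorted(candidate_table.items(), key=lambda item: (-item[1], item[0]))


sorted_candidates = sort_candidates({"screen": 49, "mobile": 1149, "design": 49})
print(sorted_candidates)
```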

最終的な識別単語候補表は図９のようになる。仮に検索処理のステップＳ１００６で識別単語簡易選択の動作モードがＮｏであり、文書検索システムが優先度上位１単語を識別単語として選択する設定になっている場合、識別単語は「モバイル」となる。これは単純に検索クエリから識別単語を選んだ場合とは別の結果になる。こうして、直近の検索操作に関係が強いキーワードフィルターの単語など、ユーザーが興味を持つ可能性が高い単語を自動で優先して識別表示することができる。 The final identification word candidate table is as shown in FIG. If the operation mode for simple identification word selection is No in step S1006 of the search process and the document search system is set to select one word with the highest priority as an identification word, the identification word will be "mobile". This results in a different result than simply selecting identifying words from a search query. In this way, it is possible to automatically prioritize and display words that are likely to be of interest to the user, such as keyword filter words that are closely related to the most recent search operation.

識別単語候補作成処理は、検索処理の一部として呼ばれるだけでなく、クライアント端末からのリクエストに応じて単体で実行されることもある。ユーザーが識別単語を手動入力する際、候補をユーザーに提示して識別単語を簡易に入力できるようにすることを目的とする。 The identification word candidate generation process is not only called as part of the search process, but also may be executed independently in response to a request from a client terminal. When a user manually inputs an identification word, the purpose is to present candidates to the user so that the user can easily input the identification word.

図１３を例として説明する。クライアント端末に検索結果画面が表示されているとき、ユーザーはクライアント端末を操作し、識別単語を手動で入力するフォーム１３０１にフォーカスを当てる。このとき、クライアント端末は文書検索システムへ識別単語候補表をリクエストする。リクエストを受け取った文書検索システムは識別単語候補作成処理を実施し、識別単語候補表をクライアント端末へ返す。なお、この場合には作成された識別単語候補表を上位だけに絞り込むことをせず、全件クライアント端末へ返す。なお、現在の検索結果において識別単語になっている単語のエントリを返さないこともできる。 This will be explained using FIG. 13 as an example. When the search result screen is displayed on the client terminal, the user operates the client terminal and focuses on form 1301 for manually inputting an identification word. At this time, the client terminal requests an identification word candidate table from the document search system. Upon receiving the request, the document search system executes identification word candidate creation processing and returns an identification word candidate table to the client terminal. In this case, the created identification word candidate table is not narrowed down to only the top ones, but all items are returned to the client terminal. Note that it is also possible not to return entries for words that are identified words in the current search results.
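The response to a candidate request could be sketched as below: all candidates are returned without a top-N cut, optionally leaving out words that are already identification words in the current search results. The function and parameter names, and the priority values in the example, are hypothetical.

```python
def candidates_for_client(
    candidate_table: list[tuple[str, int]],
    current_identification_words: frozenset[str] = frozenset(),
    exclude_current: bool = False,
) -> list[tuple[str, int]]:
    """Return every candidate, optionally dropping words already highlighted."""
    if not exclude_current:
        return list(candidate_table)
    return [
        (word, priority)
        for word, priority in candidate_table
        if word not in current_identification_words
    ]


# Illustrative priorities only (hypothetical values).
all_candidates = [("mobile", 1149), ("screen", 350), ("design", 49)]
remaining = candidates_for_client(
    all_candidates, frozenset({"mobile"}), exclude_current=True
)
print(remaining)
```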

Upon receiving the identification word candidate table, the client terminal displays an identification word candidate list 1302 below the form 1301. The identification word candidate list 1302 lays out the words of the identification word candidate table: words displayed higher have higher priority, and among words at the same height, words further to the left have higher priority.

ユーザーがクライアント端末を操作して識別単語候補一覧１３０２に含まれる単語をクリックすると、フォーム１３０１にその単語が入力される。それと同時に、クライアント端末は、クリックされた単語を識別単語として現在と同じ検索条件で検索するよう、文書検索システムへリクエストを送る。その結果、文書検索システムにおいて検索処理が行われ、クライアント端末に検索結果が返ってくる。クライアント端末は返ってきた検索結果をもとに、検索結果の表示を更新する。なお、検索条件が同じであるため、検索結果の文書および順位は同一になり、実際にはスニペットの表示だけ更新されることになる。 When the user operates the client terminal and clicks on a word included in the identification word candidate list 1302, the word is input into the form 1301. At the same time, the client terminal sends a request to the document search system to use the clicked word as an identification word and search using the same search conditions as the current one. As a result, a search process is performed in the document search system, and search results are returned to the client terminal. The client terminal updates the search result display based on the returned search results. Note that since the search conditions are the same, the documents and rankings of the search results will be the same, and in reality only the display of the snippet will be updated.

例えば図１３において単語１３０３である「モバイル」がクリックされたとき、検索画面は図１４のように変化する。すなわち、検索結果の文書は同一であり、スニペットの表示が「モバイル」周辺の文字列に変化する。 For example, when the word 1303 "mobile" in FIG. 13 is clicked, the search screen changes as shown in FIG. 14. In other words, the documents in the search results are the same, but the snippet display changes to a string around "mobile."

さらに、クライアント端末に表示される画面として、特定の文書の詳細を表す図１５のような検索結果詳細画面も存在する。 Furthermore, as a screen displayed on the client terminal, there is also a search result details screen as shown in FIG. 15 showing details of a specific document.

This screen displays in detail a single document from among the documents obtained as the search result returned for a search request. It allows the user to check the body of a document of interest in more detail.

The search result details screen also provides a function that lets the user manually set an identification word and check its surrounding context. Here too, the document search system presents identification word candidates to the user to assist the user in entering the identification word.

The flow for presenting identification word candidates is similar to that of the search result list screen.

When the user operates the client terminal and focuses on the form 1501 for manually entering an identification word, the client terminal requests an identification word candidate table from the document search system. In this case, however, the client terminal requests that the priority be determined using the priority rule table (search result details). The document search system returns to the client terminal the identification word candidate table obtained using the priority rule table (search result details). Upon receiving the identification word candidate table, the client terminal displays an identification word candidate list 1502 below the form 1501.

ユーザーがクライアント端末を操作して識別単語候補一覧１５０２に含まれる単語をクリックすると、フォーム１５０１にその単語が入力される。それと同時に、クライアント端末は、クリックされた単語を識別単語として、現在詳細表示している文書の文書ＩＤを検索条件として検索するよう、文書検索システムへリクエストを送る。その結果文書検索システムにおいて検索処理が行われ、当該文書１件のスニペットを含む検索結果がクライアント端末に返される。端末は返された検索結果をもとに、本文の表示をスニペットに置き換える。なお、検索システムは本文以外でも、タイトルのようなテキスト項目からスニペットを抽出できる構成にすることが可能であり、本文以外のフィールドの表示を抽出されたスニペットで置き換えることができる。 When the user operates the client terminal and clicks on a word included in the identification word candidate list 1502, the word is input into the form 1501. At the same time, the client terminal sends a request to the document search system to search using the clicked word as an identification word and the document ID of the document currently being displayed in detail as a search condition. As a result, a search process is performed in the document search system, and a search result including a snippet of the one document is returned to the client terminal. Based on the search results returned, the device replaces the text display with a snippet. Note that the search system can be configured to extract snippets from text items such as titles in addition to the main text, and the display of fields other than the main text can be replaced with the extracted snippets.

例えば図１５において識別単語候補一覧にある単語１５０３「モバイル」がクリックされると、図１６のように本文が「モバイル」周辺の文字列を表示するよう変化する。タイトルは「モバイル」を含まないため、空欄を表示する。 For example, when the word 1503 "mobile" in the identification word candidate list in FIG. 15 is clicked, the main text changes to display character strings surrounding "mobile" as shown in FIG. 16. The title does not include "mobile", so a blank field is displayed.
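The per-field behavior on the details screen, including the blank shown for a field that does not contain the identification word, could be sketched as follows. The function name, the field names, and the use of the first occurrence only are assumptions of this sketch.

```python
def field_snippets(fields: dict[str, str], word: str, n: int = 10) -> dict[str, str]:
    """Return, per text field, the characters around the word, or '' if absent."""
    snippets: dict[str, str] = {}
    for name, text in fields.items():
        pos = text.find(word)
        if pos == -1:
            # The field does not contain the word, so it is shown as blank.
            snippets[name] = ""
            continue
        start = max(0, pos - n)
        end = min(len(text), pos + len(word) + n)
        snippets[name] = text[start:end]
    return snippets


detail_snippets = field_snippets(
    {"title": "Screen design", "body": "xx mobile yy"}, "mobile", n=3
)
print(detail_snippets)
```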

The process of creating the identification word candidate table for the search result details is described next. As with the search result list, processing basically follows the flowchart of FIG. 11. However, in step S1105, only the characteristic words of the document being displayed in detail are added to the identification word candidate source table; characteristic words of other documents are not added, because words related to other documents are unlikely to be relevant to the document being displayed in detail. In addition, the priority calculation in step S1107 uses the priority rule table (search result details) 137, an example of which is shown in FIG. 7. This table is basically the same as the priority rule table (search result list) 136, except that, for characteristic words of search results, a rule such as rule 701 applies when the acquisition source is a characteristic word of the document in question, and the search score ranking is not used.
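The difference between the two modes in step S1105 could be sketched as follows. The function name, the dictionary keys, and the source labels are assumptions of this sketch; the description specifies only that the details mode uses the characteristic words of the displayed document alone and does not use the search score ranking.

```python
from typing import Optional


def characteristic_word_entries(
    results: list[dict], detail_doc_id: Optional[str] = None
) -> list[tuple[str, str, Optional[str]]]:
    """Build source-table rows from characteristic words for list or details mode."""
    entries: list[tuple[str, str, Optional[str]]] = []
    for rank, doc in enumerate(results, start=1):
        if detail_doc_id is None:
            # List mode: every document contributes, and its rank is recorded.
            detail = f"document search score rank {rank}"
            entries += [
                (word, "characteristic word of search result", detail)
                for word in doc["characteristic_words"]
            ]
        elif doc["id"] == detail_doc_id:
            # Details mode: only the displayed document contributes; no rank.
            entries += [
                (word, "characteristic word of this document", None)
                for word in doc["characteristic_words"]
            ]
    return entries


sample_results = [
    {"id": "doc1", "characteristic_words": ["design", "mobile"]},
    {"id": "doc2", "characteristic_words": ["plan"]},
]
list_mode = characteristic_word_entries(sample_results)
details_mode = characteristic_word_entries(sample_results, detail_doc_id="doc2")
print(list_mode, details_mode)
```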

以上説明した通り、本発明では、検索クエリとして指定された単語だけでなく、検索された文書から取得される、当該文書を特徴付ける単語や、検索された文書に関連する単語についても識別表示の対象とすることが可能となるため、文書検索を行ったユーザは、検索結果を効率的に確認することが可能となる。 As explained above, in the present invention, not only words specified as a search query but also words that characterize the document retrieved from the document and words related to the retrieved document are subject to identification display. Therefore, a user who has performed a document search can efficiently check the search results.

本発明は、例えば、システム、装置、方法、プログラムもしくは記録媒体等としての実施態様をとることが可能である。具体的には、複数の機器から構成されるシステムに適用しても良いし、また、一つの機器からなる装置に適用しても良い。 The present invention can be implemented as, for example, a system, an apparatus, a method, a program, a recording medium, or the like. Specifically, the present invention may be applied to a system consisting of a plurality of devices, or may be applied to a device consisting of a single device.

The program in the present invention is a program that enables a computer to execute the processing methods of the flowcharts shown in FIGS. 10 and 11, and the storage medium of the present invention stores a program that enables a computer to execute the processing methods of FIGS. 10 and 11. Note that the program in the present invention may be a separate program for each processing method of each device in FIGS. 10 and 11.

As described above, it goes without saying that the object of the present invention is also achieved by supplying a system or device with a recording medium on which a program implementing the functions of the embodiments described above is recorded, and having the computer (or CPU or MPU) of that system or device read and execute the program stored in the recording medium.

この場合、記録媒体から読み出されたプログラム自体が本発明の新規な機能を実現することになり、そのプログラムを記録した記録媒体は本発明を構成することになる。 In this case, the program itself read from the recording medium will realize the novel function of the present invention, and the recording medium on which the program is recorded constitutes the present invention.

Examples of recording media for supplying the program include flexible disks, hard disks, optical disks, magneto-optical disks, CD-ROMs, CD-Rs, DVD-ROMs, magnetic tapes, non-volatile memory cards, ROMs, EEPROMs, and silicon disks.

It also goes without saying that the present invention covers not only the case where the functions of the embodiments described above are realized by the computer executing the program it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program, and the functions of the embodiments described above are realized by that processing.

Furthermore, it goes without saying that the present invention also covers the case where the program read from the recording medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on the function expansion board or in the function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the embodiments described above are realized by that processing.

また、本発明は、複数の機器から構成されるシステムに適用しても、ひとつの機器から成る装置に適用しても良い。また、本発明は、システムあるいは装置にプログラムを供給することによって達成される場合にも適応できることは言うまでもない。この場合、本発明を達成するためのプログラムを格納した記録媒体を該システムあるいは装置に読み出すことによって、そのシステムあるいは装置が、本発明の効果を享受することが可能となる。 Moreover, the present invention may be applied to a system made up of a plurality of devices, or to a device made up of one device. It goes without saying that the present invention can also be applied to cases where the present invention is achieved by supplying a program to a system or device. In this case, by reading a recording medium storing a program for achieving the present invention into the system or device, the system or device can enjoy the effects of the present invention.

さらに、本発明を達成するためのプログラムをネットワーク上のサーバ、データベース等から通信プログラムによりダウンロードして読み出すことによって、そのシステムあるいは装置が、本発明の効果を享受することが可能となる。なお、上述した各実施形態およびその変形例を組み合わせた構成も全て本発明に含まれるものである。 Further, by downloading and reading a program for achieving the present invention from a server, database, etc. on a network using a communication program, the system or device can enjoy the effects of the present invention. Note that all configurations that are combinations of the above-described embodiments and their modifications are also included in the present invention.

100 Document search system

Claims

1. An information processing system comprising:
search means for performing a document search using a search query specified by a user;
display control means for performing control so as to display search results obtained by the search means; and
characteristic word acquisition means for acquiring, from a retrieved document, characteristic words in the document,
wherein the display control means performs control so that the characteristic words in the document retrieved by the search means are displayed identifiably as identification words.

2. An information processing system comprising:
search means for performing a document search using a search query specified by a user;
display control means for performing control so as to display search results obtained by the search means; and
related word acquisition means for acquiring related words, which are words related to the document retrieved by the search means,
wherein the display control means performs control so that the related words included in the document retrieved by the search means are displayed identifiably as identification words.

3. The information processing system according to claim 1, wherein the display control means controls to display a word that satisfies a predetermined condition among the characteristic word or the related word so as to be identifiable as an identification word.

4. The information processing system according to claim 3, wherein the display control means controls to display words that satisfy a predetermined condition among the identification words in a recognizable manner.

5. The information processing system according to claim 4, wherein the predetermined condition is a condition based on a priority calculated for each of the identification words.

6. The information processing system according to claim 5, wherein the priority is a value calculated based on information including an acquisition source of the identification word.

7. The information processing system according to claim 5, wherein the display control means controls to display a predetermined number of identification words in an identifiable manner in descending order of priority.

8. The information processing system described, further comprising reception means for accepting a designation of an identification word by the user,
wherein the display control means performs control so as to display the identification word identifiably when the reception means accepts the designation of the identification word from the user.

9. The information processing system according to claim 8, wherein the display control means performs control so as to display identification word candidates, and
the reception means accepts the designation of the identification word by accepting a selection, from among the displayed identification word candidates, of a word to be displayed identifiably.

10. The information processing system according to item 1, wherein the display control means performs control so as to display, as a snippet, the character strings surrounding the identification word to be displayed identifiably, taken from the document retrieved by the search means.

11. The information processing system according to claim 10, wherein the display control means displays the words included in the snippet in an identifiable manner.

12. An information processing method in an information processing system, comprising:
a search step in which search means of the information processing system performs a document search using a search query specified by a user;
a display control step in which display control means of the information processing system performs control so as to display the search results of the search step; and
a characteristic word acquisition step in which characteristic word acquisition means of the information processing system acquires, from a retrieved document, characteristic words in the document,
wherein, in the display control step, control is performed so that the characteristic words in the document retrieved in the search step are displayed identifiably as identification words.

13. An information processing method in an information processing system, comprising:
a search step in which search means of the information processing system performs a document search using a search query specified by a user;
a display control step in which display control means of the information processing system performs control so as to display the search results of the search step; and
a related word acquisition step in which related word acquisition means of the information processing system acquires related words, which are words related to the document retrieved in the search step,
wherein, in the display control step, control is performed so that the related words included in the document retrieved in the search step are displayed identifiably as identification words.

14. A program for causing a computer to function as each means according to any one of claims 1 to 11.