JP7251625B2

JP7251625B2 - Method and system for searching and displaying relevant documents

Info

Publication number: JP7251625B2
Application number: JP2021528777A
Authority: JP
Inventors: 勇樹石川
Original assignee: Shimadzu Corp
Current assignee: Shimadzu Corp
Priority date: 2019-06-27
Filing date: 2019-06-27
Publication date: 2023-04-04
Anticipated expiration: 2039-06-27
Also published as: WO2020261479A1; JPWO2020261479A1

Description

The present invention relates to a method and a system for searching for and displaying related documents.

Methods of searching for and displaying documents related to a search query (related documents) are known. For example, Japanese Patent Application Laid-Open No. 2018-10482 (Patent Document 1) discloses a document concept search method that retrieves search target documents conceptually matching a search query entered by a user. The method uses a set (correct answer information) of pairs, each consisting of a search query and a set of correct answer documents, which are search target documents that conceptually match that query. According to this method, search accuracy can be improved by using the correct answer information.

Patent Document 1: JP 2018-10482 A

In natural language processing in the field of artificial intelligence, machine learning is applied to convert the meanings of words into vector representations (distributed representations). Specifically, based on the distributional hypothesis that the meaning of a word is determined by the distribution of the words appearing around it, a neural network is trained on a large amount of text to generate a vector space that represents the meaning of each word.

The distance between the vectors of two linguistic units with similar meanings is relatively short. In a document search method based on vector representations, documents that contain more linguistic units close in meaning to the search query are ranked higher. Consequently, even a document that contains no character string matching the search query may be ranked near the top of the search results. In such a case, the user may be unable to understand why the document was retrieved.

The document concept search method disclosed in Patent Document 1 displays, as the search result, the search target documents ranked in descending order of similarity between the concept vector of the search query and the concept vector of each search target document. However, it gives no consideration to showing the user the basis for the search results.

The present invention has been made to solve this problem. Its object is to show the user the basis for the search results in a method of searching a database storing a plurality of documents for at least one related document related to a search query and displaying it.

A method for searching for and displaying related documents according to a first aspect of the present invention searches a database containing a plurality of documents for at least one related document related to a search query and displays it. The method includes a search step and a display step. The search step calculates the distance between the vector representation of each of the plurality of documents and the vector representation of the search query, using a vector space that converts an arbitrary linguistic unit into a vector representation, and retrieves the at least one related document according to the distance. The display step displays each of the at least one related document. The display step includes a step of displaying each of a plurality of linguistic units contained in the related document in a display mode that depends on the degree of relevance between the vector representation of that linguistic unit and the vector representation of the search query.

A system for searching for and displaying related documents according to a second aspect of the present invention searches a plurality of documents for at least one related document related to a search query and displays it. The system includes a database and a search processing unit. The database stores the plurality of documents. The search processing unit calculates the distance between the vector representation of each of the plurality of documents and the vector representation of the search query, using a vector space that converts an arbitrary linguistic unit into a vector representation, and retrieves the at least one related document according to the distance. The search processing unit displays each of the at least one related document, and displays each of a plurality of linguistic units contained in the related document in a display mode that depends on the degree of relevance between the vector representation of that linguistic unit and the vector representation of the search query.

According to the method and system for searching for and displaying related documents of the present invention, each of a plurality of linguistic units contained in a related document is displayed in a display mode that depends on the degree of relevance between that linguistic unit and the search query, the relevance being based on the vector representation of the linguistic unit and the vector representation of the search query. This makes it possible to show the user the basis for the search results.

FIG. 1 is an external view of an analysis case search system, which is an example of a system for searching for and displaying related documents according to an embodiment.
FIG. 2 is a functional block diagram showing the configuration of the analysis case search system of FIG. 1.
FIG. 3 is a diagram showing an example of an analysis report.
FIG. 4 is a flowchart for explaining the flow of the learning processing performed by the learning processing unit of FIG. 2.
FIG. 5 is a flowchart for explaining the flow of the search processing performed by the search processing unit of FIG. 2.
FIG. 6 is a diagram showing an example of a search result window displayed on the display by the display control unit of FIG. 2.
FIG. 7 is a diagram showing the content of a related document displayed in the search result window when the related document of FIG. 6 is selected.
FIG. 8 is a diagram showing each of a plurality of words contained in a related document highlighted using the contribution of the word as the relevance between the word and the search query.
FIG. 9 is a diagram showing the content of a related document displayed in the search result window when the highlighted linguistic unit is a sentence.
FIG. 10 is a diagram showing the content of a related document displayed in the search result window when the highlighted linguistic unit is a paragraph.
FIG. 11 is a functional block diagram showing the configuration of the analysis case search system, which is an example of a system for searching for and displaying related documents according to the embodiment, when it is connected to a plurality of client terminals via a network.

Embodiments of the present invention are described in detail below with reference to the drawings. Identical or corresponding parts in the drawings are given the same reference signs, and their description is in principle not repeated.

FIG. 1 is an external view of an analysis case search system 100, which is an example of a system for searching for and displaying related documents according to the embodiment. As shown in FIG. 1, the analysis case search system 100 includes a computer 10, a display 60, a keyboard KB1, and a mouse MS1. The display 60, the keyboard KB1, and the mouse MS1 are connected to the computer 10.

The display 60 shows a search window Wn1 and a cursor Cr. The user moves the cursor Cr by operating the mouse MS1 and enters a search query into the search window Wn1 by operating the keyboard KB1. FIG. 1 shows a case in which the user has entered the character string "臭素酸" (bromic acid) into the search window Wn1 as a search query in order to find documents describing analytical instruments, analysis methods, analysis conditions, and the like suited to the analysis of bromic acid.

(Configuration of Analysis Case Search System 100)
FIG. 2 is a functional block diagram showing the configuration of the analysis case search system 100 of FIG. 1. As shown in FIG. 2, the analysis case search system 100 includes a learning processing unit 20 and a search processing unit 40.

The analysis case search system 100 searches the plurality of documents contained in a database 30 based on the search query entered by the user and retrieves related documents related to the search query. The database 30 contains document data in the field of analysis cases, including, for example, analysis reports, analysis-related papers, and analysis-related patent documents. An analysis report is a document on compound analysis such as the one shown in FIG. 3, and includes information on the method used to analyze the target compound, the analytical instrument, the analysis conditions, and the like.

(Configuration of Learning Processing Unit 20)
Referring again to FIG. 2, the learning processing unit 20 includes a morphological analysis unit 21, a vector generation unit 23, a relevance learning unit 25, a corpus 27, and the database 30. The corpus 27 is a language resource in which a large amount of document data in the field of analysis cases is organized and accumulated for machine learning using natural language processing.

The morphological analysis unit 21 segments all of the document data accumulated in the corpus 27 into the smallest meaningful linguistic units (morphemes or words) by morphological analysis.

The vector generation unit 23 generates a vector space for converting words into vector representations by machine learning on the results of the morphological analysis by the morphological analysis unit 21. The vector space is generated in the course of training a model formed by a neural network. Examples of such a model include the CBOW (Continuous Bag-of-Words) model, which infers the central word (target) from the surrounding words (context), and the skip-gram model, which infers the surrounding words from the central word.

The vector generation unit 23 generates word vectors, which are semantic representations of words. It also generates sentence vectors representing the features of sentences and document vectors representing the features of documents, for example from the sum of the feature values of the words contained in the document. The document data may be segmented into any linguistic unit, such as characters, morphemes, words, sentences, or paragraphs. A sentence is the smallest meaningful unit of text, and a document is composed of a plurality of sentences. Japanese sentences are delimited by the kuten (。), and sentences written in languages such as English are delimited by periods (full stops). The morphological analysis described above splits the document into sentences at each kuten or period, and sentence vectors are generated by applying the machine learning described above to the results of that analysis. The document may also be split into paragraphs, with a sentence vector generated for each paragraph.
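The summation described above can be sketched as follows. This is a minimal illustration, assuming that the "feature values" of a word are simply its word vector and that words without a vector are skipped; the names `sum_vectors` and `word_vectors` and the toy two-dimensional vectors are hypothetical and not taken from the patent.

```python
from typing import Dict, List

Vector = List[float]


def sum_vectors(words: List[str], word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Build a sentence or document vector as the sum of its word vectors.

    Words that have no entry in word_vectors are skipped (an assumption).
    """
    total = [0.0] * dim
    for word in words:
        vec = word_vectors.get(word)
        if vec is None:
            continue
        for j, value in enumerate(vec):
            total[j] += value
    return total


# Toy word vectors (hypothetical values, for illustration only).
word_vectors = {
    "bromate": [1.0, 0.0],
    "ion": [0.5, 0.5],
    "chromatography": [0.0, 1.0],
}

doc_vector = sum_vectors(["bromate", "ion", "chromatography", "the"], word_vectors, dim=2)
print(doc_vector)  # [1.5, 1.5]
```

The same function can be applied per sentence or per paragraph by passing only the words of that unit.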

The word vectors, sentence vectors, and document vectors generated by the vector generation unit 23 are sent to the relevance learning unit 25. The relevance learning unit 25 includes a word vector learning unit 25a, a word-sentence learning unit 25b, and a word-document learning unit 25c.

Because the distance between vectors in the word vector space represents the semantic similarity between words, the word vector learning unit 25a calculates the semantic relevance between words in the vector space and the vector distance between words. Similarly, the word-sentence learning unit 25b calculates the relevance and the vector distance between words and sentences in the vector space, and the word-document learning unit 25c calculates the relevance and the vector distance between words and documents in the vector space. One example of the distance between two vector representations is the cosine distance (cosine similarity). The smaller the cosine distance between two vector representations, the closer the meanings of the two linguistic units they represent.
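As a minimal sketch, the cosine distance between two vectors can be computed as below. It assumes the common definition of cosine distance as one minus cosine similarity, which the text does not spell out; the function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def cosine_distance(a: Vector, b: Vector) -> float:
    """Assumed definition: 1 - cosine similarity (smaller means closer in meaning)."""
    return 1.0 - cosine_similarity(a, b)


print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # same direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

With this definition, vectors pointing in the same direction have distance 0 and orthogonal vectors have distance 1, which matches the statement that a smaller distance means closer meaning.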

The calculation results of the word vector learning unit 25a, the word-sentence learning unit 25b, and the word-document learning unit 25c are stored, together with the learning data, as word vectors, word-sentence vectors, and word-document vectors in the database 30, which has a multidimensional vector space whose coordinate axes are words, sentences, and documents.

Accumulating data centered on a specific field, such as analysis cases, in the corpus 27 yields vector representations better suited to that field, which improves the accuracy of searches for text in that field. The corpus 27 may consist of an internal corpus storing in-house reports, technical reports, application news, and the like, and an external corpus collecting data published externally on the web, such as Wikipedia (registered trademark). Because the external corpus serves to improve the learning of the vector representations, excluding it from the search targets prevents a decrease in search speed.

FIG. 4 is a flowchart for explaining the flow of the learning processing performed by the learning processing unit 20 of FIG. 2. In the following, a step is denoted simply as S. As shown in FIG. 4, in S11 the morphological analysis unit 21 splits the document data (learning data) stored in the corpus 27 into a plurality of words by morphological analysis using an existing dictionary.

In S13 following S11, the vector generation unit 23 generates, by machine learning based on the results of the morphological analysis in S11, word vectors that are semantic representations of words. It also generates sentence vectors representing the features of sentences and document vectors, which are feature vectors of documents, for example from the sum of the feature values of the words contained in the document.

In S15 following S13, the word-sentence learning unit 25b calculates the relevance and the vector distance between words and sentences in the vector space. In S17 following S15, the word-document learning unit 25c calculates the relevance and the vector distance between words and documents in the vector space.

In S19 following S17, the relevance learning unit 25 stores the calculation results of S13, S15, and S17 in the database 30 as word vectors, word-sentence vectors, and word-document vectors, together with the document data of the corpus 27 used as learning data.

(Configuration of Search Processing Unit 40)
Referring again to FIG. 2, the search processing unit 40 includes an input unit 1, an analysis unit 11, a feature extraction unit 13, a search unit 15, a display control unit 17, and an output unit 5. The user enters a search query into the input unit 1. The search query includes, for example, analysis-related search keywords, names of analysis-related compounds, and names of analysis-related analytes. The input unit 1 includes the keyboard KB1 and the mouse MS1. The output unit 5 includes the display 60.

The analysis unit 11 performs morphological analysis on the search query entered into the input unit 1, based on a predefined search dictionary, and splits the search query into words. The feature extraction unit 13 calculates the vector representation of the search query using the vector space generated by the learning processing unit 20.

The search unit 15 uses the vector representation of the search query obtained from the feature extraction unit 13 to search the database 30 for related documents related to the search query. Specifically, it retrieves from the database 30 the related documents whose distance from the vector representation of the search query is smaller than a threshold. The search unit 15 outputs to the display control unit 17 a search result in which each retrieved related document is ranked higher the shorter the distance between its vector representation and that of the search query. The display control unit 17 controls the output unit 5 so that the related documents are displayed in the order ranked by the search unit 15. The output unit 5 displays information on the display 60 according to the control by the display control unit 17.
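The threshold filtering and ranking performed by the search unit 15 might look like the following sketch. It assumes cosine distance (taken here as one minus cosine similarity) and an in-memory dictionary of precomputed document vectors; `search`, `doc_vectors`, and the sample values are hypothetical stand-ins for the database 30.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Assumed distance: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def search(query_vec: Vector, doc_vectors: Dict[str, Vector],
           threshold: float) -> List[Tuple[str, float]]:
    """Keep documents closer than the threshold, ranked by ascending distance."""
    hits = [(doc_id, cosine_distance(vec, query_vec))
            for doc_id, vec in doc_vectors.items()]
    hits = [hit for hit in hits if hit[1] < threshold]
    hits.sort(key=lambda hit: hit[1])  # rank 1 = shortest distance
    return hits


# Hypothetical document vectors and query vector.
doc_vectors = {"D3": [0.0, 1.0], "D2": [1.0, 1.0], "D1": [1.0, 0.1]}
for rank, (doc_id, dist) in enumerate(search([1.0, 0.0], doc_vectors, 0.5), start=1):
    print(rank, doc_id, round(dist, 4))
```

A linear scan is used here for clarity; how the database 30 actually indexes its vectors is not described in the text.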

FIG. 5 is a flowchart for explaining the flow of the search processing performed by the search processing unit 40 of FIG. 2. As shown in FIG. 5, in S21 the input unit 1 accepts the search query entered by the user. In S23 following S21, the analysis unit 11 performs morphological analysis on the search query and splits it into morphemes (words), the smallest units. In S25 following S23, the feature extraction unit 13 calculates the vector representation of the search query using the result of the morphological analysis of the search query and the vector space generated by the learning processing unit 20.

In S27 following S25, the search unit 15 searches the database 30, in which the learning data vectorized by learning on the corpus 27 and other data are accumulated, for related documents related to the search query. In S27, documents that are related or highly related to the search query are retrieved as the search result. A document highly related to the search query is one for which the relevance between words and the document in the vector space, obtained by calculating the relationship between words and documents in advance, is high and the vector distance is short. The search unit 15 ranks the retrieved related documents in ascending order of vector distance.

In S29 following S27, the display control unit 17 displays the retrieved related documents on the output unit 5 according to the ranking by the search unit 15. The user can decide which related document to view according to the ranking of relevance between each of the at least one retrieved related document and the search query.

FIG. 6 is a diagram showing an example of a search result window Wn2 displayed on the display 60 by the display control unit 17 of FIG. 2. As shown in FIG. 6, the search result window Wn2 lists related documents D1 to D4 in order, together with their ranks 1 to 4. A hyperlink is set for each of the related documents D1 to D4. In FIG. 6, among the related documents retrieved from the database 30, the one most related to the search query "臭素酸" (bromic acid) is D1.

Referring also to FIG. 5 again, when the user operates the cursor Cr in FIG. 6 and selects the hyperlink of the related document D1, the display control unit 17 displays, in S31 following S29, the content of the related document corresponding to the selected hyperlink. FIG. 7 is a diagram showing the content of the related document D1 displayed in the search result window Wn2 when the related document D1 of FIG. 6 is selected.

As shown in FIG. 7, for each of the plurality of words contained in the related document D1, the display control unit 17 highlights the word by changing the color of the area surrounding the word in the search result window Wn2 according to the distance between the vector representation of the word and the vector representation of the search query. This distance is divided into four ranges: a range R1 greater than a distance Ds3; a range R2 greater than a distance Ds2 (< Ds3) and not greater than Ds3; a range R3 greater than a distance Ds1 (< Ds2) and not greater than Ds2; and a range R4 not greater than Ds1. In the color map CM1, mutually different colors CL1 to CL4 are assigned to the ranges R1 to R4, respectively. The relevance between each word and the search query may instead be displayed in the color map CM1 as a continuous color change (gradation).
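The mapping from a word's distance to one of the four highlight colors can be sketched as follows. The range boundaries follow the description of R1 to R4; the function name and the concrete threshold values in the example are hypothetical, and the color labels CL1 to CL4 are returned as plain strings rather than actual colors.

```python
def color_for_distance(distance: float, ds1: float, ds2: float, ds3: float) -> str:
    """Return the color label for a word, given thresholds Ds1 < Ds2 < Ds3."""
    if not ds1 < ds2 < ds3:
        raise ValueError("thresholds must satisfy Ds1 < Ds2 < Ds3")
    if distance > ds3:
        return "CL1"  # range R1: least related
    if distance > ds2:
        return "CL2"  # range R2
    if distance > ds1:
        return "CL3"  # range R3
    return "CL4"      # range R4: most related


# Hypothetical thresholds, for illustration only.
DS1, DS2, DS3 = 0.2, 0.4, 0.6
for d in (0.7, 0.6, 0.4, 0.2, 0.1):
    print(d, color_for_distance(d, DS1, DS2, DS3))
```

Each upper boundary belongs to the closer range, so a distance exactly equal to Ds3 falls in R2, not R1.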

The word WD5, highlighted in the color CL2, is more relevant to the search query than a word (not shown) highlighted in the color CL1. The words WD2, WD4, and WD7, highlighted in the color CL3, are more relevant to the search query than the word WD5. The words WD1, WD3, WD6, and WD8, highlighted in the color CL4, are more relevant to the search query than the words WD2, WD4, and WD7.

Referring also to FIG. 5 again, in S33 following S31 the display control unit 17 determines whether a selection operation (for example, a double-click with the mouse) has been performed on a linguistic unit highlighted in the search result window. If a selection operation has been performed (YES in S33), the display control unit 17 sets the selected linguistic unit as the search query in S35 and returns the processing to S23. For example, if the user double-clicks the mouse MS1 while the cursor Cr is over the area surrounding the word WD2 in FIG. 7, the word WD2 is set as the search query and the search processing starts again from S23 in FIG. 5. The user can thus choose a new search query with attention to its relevance to the current search query.

If no selection operation has been performed on a highlighted linguistic unit (NO in S33), the display control unit 17 determines in S37 whether an operation to close the search result window has been performed. If the search result window has not been closed (for example, by pressing the button Bn3 in FIG. 7) (NO in S37), the display control unit 17 returns the processing to S33. If the search result window has been closed (YES in S37), the display control unit 17 ends the processing.
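The control flow of S33 to S37 can be sketched as a simple event loop. This is an illustration only: the event tuples, the `run_search` callback (standing in for S23 to S31), and the function name are assumptions, not part of the described system.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Event = Tuple[str, Optional[str]]  # ("select", unit) or ("close", None), else ignored


def result_window_loop(initial_query: str, events: Iterable[Event],
                       run_search: Callable[[str], None]) -> List[str]:
    """Process UI events until the window is closed; return the queries searched."""
    history = [initial_query]
    run_search(initial_query)          # S23 to S31 for the first query
    for kind, payload in events:
        if kind == "select" and payload is not None:   # S33: YES
            history.append(payload)                    # S35: new search query
            run_search(payload)                        # back to S23
        elif kind == "close":                          # S37: YES
            break
        # any other event: S37 is NO, keep waiting at S33
    return history


searched: List[str] = []
events = [("move", None), ("select", "WD2"), ("close", None), ("select", "WD5")]
print(result_window_loop("臭素酸", events, searched.append))
```

Events that arrive after the close operation are never processed, which mirrors the flowchart ending at YES in S37.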

In the search result window Wn2 shown in FIG. 7, the user can see the relevance of each of the words contained in the related document D1 to the search query as a difference in highlight color. Even if the related document D1 contains no character string matching the search query, the user can visually grasp why the related document D1 was retrieved. In addition, because the highlight color of each word is changed according to the distance between the vector representation of that word and the vector representation of the search query, the relevance between each word in the retrieved related document and the search query can be shown to the user as a direct relevance that does not depend on any other word.

FIG. 7 illustrates the case in which, for each of the words contained in a related document, the relevance between the word and the search query is expressed as the distance between the vector representation of the word and the vector representation of the search query. The relevance between the word and the search query may instead be expressed as the contribution of the word to the distance between the vector representation of the related document and the vector representation of the search query.

データベース３０に格納されている文書Ｗ_ｉは、以下の式（１）のように単語ｗｄ_ｉ，ｋの集合として表現される。なお、自然数ｉは、１から自然数Ｄ（＞１）までのいずれかの自然数である。自然数ｋ，ｔの各々は、１から自然数Ｎ（＞１）までのいずれかの自然数である。 A document W_i stored in the database 30 is expressed as a set of words wd_i,k as shown in Equation (1) below. The natural number i is any natural number from 1 to a natural number D (>1). Each of the natural numbers k and t is any natural number from 1 to a natural number N (>1).

データベース３０に格納されている複数の文書は、以下の式（２）のように文章集合Ｗとして表現される。 A plurality of documents stored in the database 30 are expressed as a set of sentences W as shown in Equation (2) below.

文書Ｗ_ｉと検索クエリＱとの距離Ｄｓ_ｉは、以下の式（３）のように表現される。 The distance Ds_i between the document W_i and the search query Q is expressed as in Equation (3) below.

式（３）における関数ｆは、引数のベクトル表現を返す関数である。関数ｆとしては、たとえば、Ｄｏｃ２Ｖｅｃ、Ｋ－ｈｏｔベクトル、Ｏｎｅ－ｈｏｔベクトルの線形結合、単語の数え上げによるベクトル表現、およびトピックモデルを挙げることができる。 The function f in equation (3) is a function that returns a vector representation of its arguments. Functions f include, for example, Doc2Vec, K-hot vectors, linear combinations of One-hot vectors, vector representations by counting words, and topic models.

式（１）の文書Ｗ_ｉに含まれる単語ｗｄ_ｉ，ｔの距離Ｄｓ_ｉへの寄与度を求めるために、文書Ｗ_ｉから単語ｗｄ_ｉ，ｔを削除した文書Ｗ_ｉ，／ｔを以下の式（４）のように定義する。 In order to obtain the contribution of the word wd_i,t included in the document W_i of Equation (1) to the distance Ds_i, the document W_i,/t, obtained by deleting the word wd_i,t from the document W_i, is defined as in Equation (4) below.

文書Ｗ_ｉ，／ｔと検索クエリＱとの距離Ｄｓ_ｉ，／ｔは、以下の式（５）のように表現される。 The distance Ds_i,/t between the document W_i,/t and the search query Q is expressed as in Equation (5) below.

検索クエリＱと文書Ｗ_ｉとの関連性への単語ｗｄ_ｉ，ｔの寄与度が大きい程、検索クエリＱと文書Ｗ_ｉ，／ｔとの関連性は小さくなる。すなわち、検索クエリＱと文書Ｗ_ｉとの関連性への単語ｗｄ_ｉ，ｔの寄与度が大きい程、距離Ｄｓ_ｉ，／ｔが大きくなる。その結果、距離Ｄｓ_ｉ，／ｔと距離Ｄｓ_ｉとの差が大きくなる。単語ｗｄ_ｉ，ｔの寄与度Ｃｎ_ｉ，ｎは、以下の式（６）のように表される。 The greater the contribution of the word wd_i,t to the relevance between the search query Q and the document W_i, the smaller the relevance between the search query Q and the document W_i,/t. That is, the greater the contribution of the word wd_i,t to the relevance between the search query Q and the document W_i, the greater the distance Ds_i,/t. As a result, the difference between the distance Ds_i,/t and the distance Ds_i increases. The contribution Cn_i,n of the word wd_i,t is expressed as in Equation (6) below.
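Equations (1) to (6) are referred to in the text but are not reproduced in it. The block below is a reconstruction from the surrounding definitions only, offered as a reading aid rather than as the published formulas. The symbol d for the distance between two vector representations is an assumption, and the contribution is indexed here by t (the deleted word), whereas the text writes it as Cn_i,n.

```latex
\begin{align}
W_i &= \{\, wd_{i,1},\, wd_{i,2},\, \dots,\, wd_{i,N} \,\} \tag{1}\\
W &= \{\, W_1,\, W_2,\, \dots,\, W_D \,\} \tag{2}\\
Ds_i &= d\bigl(f(W_i),\, f(Q)\bigr) \tag{3}\\
W_{i,/t} &= W_i \setminus \{\, wd_{i,t} \,\} \tag{4}\\
Ds_{i,/t} &= d\bigl(f(W_{i,/t}),\, f(Q)\bigr) \tag{5}\\
Cn_{i,t} &= Ds_{i,/t} - Ds_i \tag{6}
\end{align}
```

Under this reading, a word whose removal moves the document away from the query has a positive contribution, which agrees with the statement that a larger contribution produces a larger distance Ds_i,/t.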

図８は、関連文書Ｄ１に含まれる単語と検索クエリとの関連性として当該単語の寄与度を用いて、関連文書Ｄ１に含まれる複数の単語の各々をハイライトした様子を示す図である。図８の検索結果ウィンドウＷｎ２の内容は、図７の検索結果ウィンドウＷｎ２のカラーマップＣＭ１がカラーマップＣＭ２に置き換えられた内容である。これ以外は同様であるため、説明を繰り返さない。 FIG. 8 is a diagram showing how each of a plurality of words included in the related document D1 is highlighted using the degree of contribution of the word as the relationship between the word included in the related document D1 and the search query. The contents of the search result window Wn2 in FIG. 8 are obtained by replacing the color map CM1 of the search result window Wn2 in FIG. 7 with the color map CM2. Other than this, they are the same, so the description will not be repeated.

図８に示されるように、表示制御部１７は、関連文書Ｄ１に含まれる複数の単語の各々について、当該単語の寄与度に応じて、検索結果ウィンドウＷｎ２における当該単語の周辺領域の色を変更し、当該単語をハイライトする。関連文書Ｄ１に含まれる単語の寄与度は、寄与度Ｃｎ１より小さい範囲Ｒ１１、寄与度Ｃｎ１以上であって寄与度Ｃｎ２（＞Ｃｎ１）より小さい範囲Ｒ１２、寄与度Ｃｎ２以上であって寄与度Ｃｎ３（＞Ｃｎ２）より小さい範囲Ｒ１３、および寄与度Ｃｎ３以上の範囲Ｒ１４の４段階に分けられている。カラーマップＣＭ２において、範囲Ｒ１１～Ｒ１４にそれぞれ互いに異なる色ＣＬ１～ＣＬ４が割り当てられている。なお、単語の寄与度は、カラーマップＣＭ２において連続的な色変化（グラデーション）として表示されてもよい。 As shown in FIG. 8, for each of the words included in the related document D1, the display control unit 17 changes the color of the area surrounding the word in the search result window Wn2 according to the contribution of the word, thereby highlighting the word. The contributions of the words included in the related document D1 are divided into four levels: a range R11 below the contribution Cn1, a range R12 of at least Cn1 and below the contribution Cn2 (>Cn1), a range R13 of at least Cn2 and below the contribution Cn3 (>Cn2), and a range R14 of at least Cn3. In the color map CM2, mutually different colors CL1 to CL4 are assigned to the ranges R11 to R14, respectively. Note that the contribution of a word may be displayed as a continuous color change (gradation) in the color map CM2.
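The four-level mapping from contribution to highlight color can be sketched with a sorted-threshold lookup. The numeric values chosen for Cn1 to Cn3 and the hex codes standing in for CL1 to CL4 are hypothetical; the text specifies only the ordering Cn1 < Cn2 < Cn3 and that the four colors differ.

```python
from bisect import bisect_right

# Hypothetical thresholds Cn1 < Cn2 < Cn3 and placeholder colors CL1..CL4.
THRESHOLDS = (0.01, 0.05, 0.10)
COLORS = ("#ffffff", "#fff3b0", "#ffc36b", "#ff7a59")


def highlight_color(contribution):
    """Map a contribution to the color of its range R11..R14.

    bisect_right counts the thresholds that are <= contribution, so a value
    equal to Cn1 falls in R12 ("Cn1 or more"), matching the range definitions.
    """
    return COLORS[bisect_right(THRESHOLDS, contribution)]
```

A continuous gradation, which the text also allows, would replace the lookup with an interpolation between two end colors.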

検索された関連文書に含まれる複数の単語の各々の寄与度に応じて当該単語のハイライト色を変更することにより、当該関連文書に含まれる複数の言語単位の各々と検索クエリとの関連性を、当該言語単位以外の他の言語単位と検索クエリとの関連性が反映された総合的な関連性としてユーザに示すことができる。 By changing the highlight color of each word included in the retrieved related document according to the contribution of that word, the relevance between each of the linguistic units included in the related document and the search query can be shown to the user as a comprehensive relevance that also reflects the relevance between the other linguistic units and the search query.

図７および図８においては、検索結果ウィンドウにおいてハイライトされる言語単位が単語である場合について説明した。ハイライトされる言語単位は、単語以外であってもよい。言語単位の種類を変化させることにより、複数の観点から関連文書が検索された根拠をユーザに示すことができる。 In FIGS. 7 and 8, the case where the linguistic unit highlighted in the search result window is a word has been described. The highlighted linguistic unit may be something other than a word. By changing the type of linguistic unit, the reason why the related document was retrieved can be shown to the user from multiple viewpoints.

図９は、ハイライトされる言語単位が文章である場合の検索結果ウィンドウＷｎ２に表示される関連文書Ｄ１の内容を示す図である。図９に示されるカラーマップＣＭ２は、文章の寄与度の分布を示す。 FIG. 9 shows the contents of the related document D1 displayed in the search result window Wn2 when the highlighted linguistic unit is a sentence. The color map CM2 shown in FIG. 9 shows the distribution of the degree of contribution of sentences.

図９に示されるように、色ＣＬ２でハイライトされた文章ＳＴ４と検索クエリとの関連性は、色ＣＬ１でハイライトされた文章（不図示）と検索クエリとの関連性よりも大きい。色ＣＬ３でハイライトされた文章ＳＴ３，ＳＴ６と検索クエリとの関連性は、文章ＳＴ４と検索クエリとの関連性よりも大きい。色ＣＬ４でハイライトされた文章ＳＴ１，ＳＴ２，ＳＴ５，ＳＴ７と検索クエリとの関連性は、文章ＳＴ３，ＳＴ６と検索クエリとの関連性よりも大きい。 As shown in FIG. 9, the relevance between the sentence ST4 highlighted with the color CL2 and the search query is greater than the relevance between the sentence (not shown) highlighted with the color CL1 and the search query. The relevance between the sentences ST3 and ST6 highlighted with the color CL3 and the search query is greater than the relevance between the sentence ST4 and the search query. The relevance between the sentences ST1, ST2, ST5 and ST7 highlighted with the color CL4 and the search query is greater than the relevance between the sentences ST3 and ST6 and the search query.

図１０は、ハイライトされる言語単位が段落である場合の検索結果ウィンドウＷｎ２に表示される関連文書Ｄ１の内容を示す図である。図１０に示されるカラーマップＣＭ２は、段落の寄与度の分布を示す。 FIG. 10 shows the contents of the related document D1 displayed in the search result window Wn2 when the highlighted linguistic unit is a paragraph. The color map CM2 shown in FIG. 10 indicates the distribution of paragraph contributions.

図１０に示されるように、色ＣＬ２でハイライトされた段落ＰＲ３と検索クエリとの関連性は、色ＣＬ１でハイライトされた段落（不図示）と検索クエリとの関連性よりも大きい。色ＣＬ３でハイライトされた段落ＰＲ１と検索クエリとの関連性は、段落ＰＲ３と検索クエリとの関連性よりも大きい。色ＣＬ４でハイライトされた段落ＰＲ２と検索クエリとの関連性は、段落ＰＲ１と検索クエリとの関連性よりも大きい。 As shown in FIG. 10, the relevance between the paragraph PR3 highlighted with the color CL2 and the search query is greater than the relevance between the paragraph (not shown) highlighted with the color CL1 and the search query. The relevance of paragraph PR1 highlighted with color CL3 to the search query is greater than the relevance of paragraph PR3 to the search query. The relevance between paragraph PR2 highlighted with color CL4 and the search query is greater than the relevance between paragraph PR1 and the search query.

なお、検索結果ウィンドウＷｎ２においてハイライトされる言語単位は１種類である必要はなく、文字、形態素、単語、文章、段落、およびこれらの任意の組み合わせからなる群から選択されてもよい。たとえば、文字および形態素がハイライトされる言語単位とされてもよいし、単語、文章、および段落がハイライトされる言語単位とされてもよい。 Note that the linguistic unit highlighted in the search result window Wn2 does not have to be of one type, and may be selected from the group consisting of characters, morphemes, words, sentences, paragraphs, and any combination thereof. For example, letters and morphemes may be the linguistic units highlighted, or words, sentences, and paragraphs may be the linguistic units highlighted.
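Switching the highlighted linguistic unit amounts to segmenting the same document at a different granularity before scoring. The sketch below is a rough illustration using regular expressions; the function name `split_units` and the splitting rules are assumptions. Morpheme-level segmentation of Japanese text would need a morphological analyzer, which is not shown here.

```python
import re


def split_units(text, unit):
    """Split text into paragraphs, sentences, or words (simplified rules)."""
    if unit == "paragraph":
        parts = re.split(r"\n\s*\n", text)            # blank line separates paragraphs
    elif unit == "sentence":
        parts = re.split(r"(?<=[.!?。])\s*", text)    # split after sentence-final marks
    elif unit == "word":
        parts = re.findall(r"\w+", text)              # whitespace/punctuation delimited
    else:
        raise ValueError(f"unsupported unit: {unit}")
    return [p.strip() for p in parts if p.strip()]


sample = "One two. Three four.\n\nFive."
paragraphs = split_units(sample, "paragraph")
sentences = split_units(sample, "sentence")
words = split_units(sample, "word")
```

Each resulting unit can then be scored against the search query and colored in the same way as a word.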

図１１は、実施の形態に係る関連文書を検索して表示するシステムの一例である分析事例検索システム１００Ａがネットワーク７０を介して複数のクライアント端末８０ａ～８０ｎに接続された場合の、分析事例検索システム１００Ａの構成を示す機能ブロック図である。図１１に示されるように、分析事例検索システム１００Ａは、たとえばインターネット環境において、ユーザ等からの要求に応じて分析事例の検索を可能にし、検索結果をユーザ等に提供する。 FIG. 11 is a functional block diagram showing the configuration of an analysis case search system 100A, which is an example of a system for searching and displaying related documents according to the embodiment, when the analysis case search system 100A is connected to a plurality of client terminals 80a to 80n via a network 70. As shown in FIG. 11, the analysis case search system 100A enables analysis cases to be searched in response to a request from a user or the like, for example in an Internet environment, and provides the search results to the user or the like.

分析事例検索システム１００Ａと、複数のクライアント端末８０ａ～８０ｎとが、インターネット等の情報通信用のネットワーク７０を介して通信可能に接続されている。分析事例検索システム１００Ａおよび複数のクライアント端末８０ａ～８０ｎは、クライアントサーバシステム１０００を構成する。分析事例検索システム１００Ａ内の通信部６１は、ネットワーク７０とのインターフェイスである。制御部６５は、ＣＰＵ（Central Processing Unit）を含み、学習処理部２０および検索処理部４０を含む分析事例検索システム１００Ａの全体の制御を司る。 The analysis case search system 100A and the plurality of client terminals 80a to 80n are communicably connected via a network 70 for information communication such as the Internet. The analysis case search system 100A and the plurality of client terminals 80a to 80n constitute a client-server system 1000. A communication unit 61 in the analysis case search system 100A is an interface with the network 70. The control unit 65 includes a CPU (Central Processing Unit) and controls the entire analysis case search system 100A, including the learning processing unit 20 and the search processing unit 40.

メモリ６７には、上述した学習処理部２０における学習処理プログラム、および検索処理部４０による検索処理プログラム等が格納されている。制御部６５は、メモリ６７からこれらのプログラムを読み出して、図４および図５に示される所定の処理等を実行する。なお、分析事例検索システム１００Ａを、ネットワーク７０に接続されたサーバ装置と位置づけることもできる。すなわち、分析事例検索システム１００Ａにおいては、検索処理部４０による表示処理（表示工程）がサーバサイドにおいて行われる。既存のクライアント端末をサーバ装置に接続することにより、当該クライアント端末を介して関連文書が検索された根拠をユーザに示すことができる。 The memory 67 stores the learning processing program of the learning processing unit 20 described above, the search processing program of the search processing unit 40, and the like. The control unit 65 reads these programs from the memory 67 and executes the predetermined processes shown in FIGS. 4 and 5 and the like. Note that the analysis case search system 100A can also be regarded as a server device connected to the network 70. That is, in the analysis case search system 100A, the display processing (display step) by the search processing unit 40 is performed on the server side. By connecting an existing client terminal to the server device, the reason why the related document was retrieved can be shown to the user via the client terminal.

検索処理部４０による表示処理は、複数のクライアント端末８０ａ～８０ｎ（クライアントサイド）において行われてもよい。クライアント端末を既存のサーバ装置に接続することにより、当該クライアント端末を介して関連文書が検索された根拠をユーザに示すことができる。 The display processing by the search processing unit 40 may be performed in a plurality of client terminals 80a to 80n (client side). By connecting the client terminal to the existing server device, it is possible to show the user the reason why the related document was retrieved via the client terminal.
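One way to picture the difference between server-side and client-side display processing is in what the server sends. The sketch below is purely illustrative: the function names, the HTML markup, and the JSON field names are assumptions and do not come from the embodiment.

```python
import html
import json


def render_html(units):
    """Server-side variant: return finished markup with colors already applied.

    units is a list of (text, color) pairs.
    """
    return "".join(
        f'<span style="background-color:{color}">{html.escape(text)}</span>'
        for text, color in units
    )


def render_payload(units):
    """Client-side variant: return data only; the client applies the colors."""
    return json.dumps(
        [{"text": text, "color": color} for text, color in units],
        ensure_ascii=False,
    )
```

In the first variant an existing client only has to show the markup it receives; in the second the highlighting logic lives in the client, so it can be attached to an existing server.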

以上、実施の形態に係る関連文書を検索して表示する方法およびシステムによれば、検索結果の根拠をユーザに示すことができる。 As described above, according to the method and system for searching and displaying related documents according to the embodiment, it is possible to show the user the basis of the search result.

［態様］ [Aspects]
上述した複数の例示的な実施の形態は、以下の態様の具体例であることが当業者により理解される。 It will be understood by those skilled in the art that the exemplary embodiments described above are specific examples of the following aspects.

（第１項）一態様に係る関連文書を検索して表示する方法は、複数の文書を含むデータベースから検索クエリに関連する少なくとも１つの関連文書を検索して表示する。当該方法は、検索工程と、表示工程とを含む。検索工程は、任意の言語単位をベクトル表現に変換するベクトル空間を用いて複数の文書の各々についてのベクトル表現と検索クエリのベクトル表現との距離を算出し、距離に応じて少なくとも１つの関連文書を検索する。表示工程は、少なくとも１つの関連文書の各々を表示する。表示工程は、当該関連文書に含まれる複数の言語単位の各々のベクトル表現と検索クエリのベクトル表現とに基づく当該言語単位と検索クエリとの関連性の大きさに応じた表示態様で、当該言語単位を表示する工程を含む。 (Item 1) A method for searching and displaying related documents according to one aspect searches for and displays at least one related document related to a search query from a database including a plurality of documents. The method includes a searching step and a displaying step. The searching step calculates the distance between the vector representation of each of the plurality of documents and the vector representation of the search query using a vector space that converts any linguistic unit into a vector representation, and searches for at least one related document according to the distance. The displaying step displays each of the at least one related document. The displaying step includes a step of displaying each of the plurality of linguistic units included in the related document in a display mode according to the degree of relevance between the linguistic unit and the search query, based on the vector representation of the linguistic unit and the vector representation of the search query.

第１項に記載の方法によれば、当該関連文書に含まれる複数の言語単位の各々のベクトル表現と検索クエリのベクトル表現とに基づく当該言語単位と検索クエリとの関連性の大きさに応じた表示態様で、当該言語単位を表示することにより、検索結果の根拠をユーザに示すことができる。 According to the method described in Item 1, the basis of the search result can be shown to the user by displaying each linguistic unit in a display mode according to the degree of relevance between the linguistic unit and the search query, based on the vector representation of each of the plurality of linguistic units included in the related document and the vector representation of the search query.

（第２項）第１項に記載の方法において、当該言語単位の表示態様は、当該言語単位の周辺領域の色を含む。 (Section 2) In the method described in Section 1, the display mode of the linguistic unit includes the color of the surrounding area of the linguistic unit.

第２項に記載の方法によれば、検索された関連文書に含まれる複数の言語単位の各々と検索クエリとの関連性が、当該言語単位の周辺領域の色の違いとして視覚的に把握することができる。 According to the method described in Item 2, the relevance between each of the plurality of linguistic units included in the retrieved related document and the search query can be grasped visually as a difference in the color of the area surrounding the linguistic unit.

（第３項）第１項または第２項に記載の方法において、当該言語単位と検索クエリとの関連性の大きさは、複数の言語単位の各々のベクトル表現と検索クエリのベクトル表現との距離である。 (Item 3) In the method described in Item 1 or 2, the degree of relevance between the linguistic unit and the search query is the distance between the vector representation of each of the plurality of linguistic units and the vector representation of the search query.

第３項に記載の方法によれば、検索された関連文書に含まれる複数の言語単位の各々と検索クエリとの関連性を、当該言語単位以外の他の言語単位に依存しない直接的な関連性としてユーザに示すことができる。 According to the method described in Item 3, the relevance between each of the plurality of linguistic units included in the retrieved related document and the search query can be shown to the user as a direct relevance that does not depend on any other linguistic unit.

（第４項）第１項または第２項に記載の方法において、当該言語単位と検索クエリとの関連性の大きさは、複数の言語単位の各々の寄与度である。寄与度は、少なくとも１つの関連文書の各々から当該言語単位を除いたベクトル表現と検索クエリのベクトル表現との距離から、当該関連文書のベクトル表現と検索クエリのベクトル表現との距離を減算した値である。 (Item 4) In the method described in Item 1 or 2, the degree of relevance between the linguistic unit and the search query is the contribution of each of the plurality of linguistic units. The contribution is a value obtained by subtracting the distance between the vector representation of the related document and the vector representation of the search query from the distance between the vector representation of each of the at least one related document with the linguistic unit removed and the vector representation of the search query.

第４項に記載の方法によれば、検索された関連文書に含まれる複数の言語単位の各々と検索クエリとの関連性を、当該言語単位以外の他の言語単位と検索クエリとの関連性が反映された総合的な関連性としてユーザに示すことができる。 According to the method described in Item 4, the relevance between each of the plurality of linguistic units included in the retrieved related document and the search query can be shown to the user as a comprehensive relevance that also reflects the relevance between the other linguistic units and the search query.
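The contribution defined in Item 4 is a leave-one-out quantity, and can be sketched as below. Two choices in this sketch are assumptions: the vector representation is a plain word-count vector (one of the options the description lists for the function f), and the distance is cosine distance. All function names are hypothetical.

```python
import math
from collections import Counter


def count_vector(words, vocabulary):
    """Word-count vector of a word list over a fixed vocabulary."""
    counts = Counter(words)
    return [counts[term] for term in vocabulary]


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def contributions(doc_words, query_words):
    """For each word: distance without that word minus distance of the full document."""
    vocabulary = sorted(set(doc_words) | set(query_words))
    query_vec = count_vector(query_words, vocabulary)
    base = cosine_distance(count_vector(doc_words, vocabulary), query_vec)
    result = []
    for t, word in enumerate(doc_words):
        reduced = doc_words[:t] + doc_words[t + 1:]   # document with word t removed
        dist = cosine_distance(count_vector(reduced, vocabulary), query_vec)
        result.append((word, dist - base))
    return result


doc = ["liquid", "chromatography", "column", "maintenance"]
query = ["chromatography", "column"]
scores = contributions(doc, query)
```

In this toy example the query words receive positive contributions, while the words unrelated to the query receive negative ones, because removing them moves the document closer to the query.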

（第５項）第１項に記載の方法において、検索工程は、検索クエリのベクトル表現と少なくとも１つの関連文書の各々についてのベクトル表現との距離が短いほど、当該文書を上位に順位付ける工程を含む。表示工程は、検索工程による順位付けに従って少なくとも１つの関連文書を表示する工程を含む。 (Section 5) In the method according to Section 1, the search step ranks the document higher as the distance between the vector representation of the search query and the vector representation for each of the at least one related document is shorter. including. The displaying step includes displaying the at least one relevant document as ranked by the searching step.

第５項に記載の方法によれば、ユーザは、検索された少なくとも１つの関連文書の各々と検索クエリとの関連性の順位に従って、閲覧する関連文書を決定することができる。 According to the method of item 5, the user can decide which related document to browse according to the ranking of relevance between each of the retrieved at least one related document and the search query.
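The ranking in Item 5 reduces to sorting the retrieved documents by ascending distance. A minimal sketch, in which the function name and the document identifiers are hypothetical:

```python
def rank_documents(distances):
    """Return document ids ordered from the shortest distance to the longest.

    distances maps a document id to its distance from the search query.
    """
    return sorted(distances, key=distances.get)


ranking = rank_documents({"D1": 0.12, "D2": 0.40, "D3": 0.05})
```

The display step would then list the related documents in the order given by `ranking`.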

（第６項）第１項に記載の方法において、複数の言語単位の種類は、文字、形態素、単語、文章、段落、およびこれらの任意の組み合わせからなる群から選択される。 (Item 6) In the method of item 1, the types of the plurality of linguistic units are selected from the group consisting of characters, morphemes, words, sentences, paragraphs, and any combination thereof.

第６項に記載の方法によれば、複数の言語単位の種類を変化させることにより、複数の観点から関連文書が検索された根拠をユーザに示すことができる。 According to the method described in item 6, by changing the types of a plurality of linguistic units, it is possible to show the user the reason why the related document was retrieved from a plurality of viewpoints.

（第７項）第１項に記載の方法は、クライアントサーバシステムにおいて行われる。表示工程は、サーバサイドにおいて行われる。 (Section 7) The method described in Section 1 is performed in a client-server system. The display process is performed on the server side.

第７項に記載の方法によれば、既存のクライアント端末をサーバ装置に接続することにより、当該クライアント端末を介して関連文書が検索された根拠をユーザに示すことができる。 According to the method described in Item 7, by connecting an existing client terminal to the server device, it is possible to show the user the reason why the related document was retrieved via the client terminal.

（第８項）第１項に記載の方法は、クライアントサーバシステムにおいて行われる。表示工程は、クライアントサイドにおいて行われる。 (Section 8) The method described in Section 1 is performed in a client-server system. The display process takes place on the client side.

第８項に記載の方法によれば、クライアント端末を既存のサーバ装置に接続することにより、当該クライアント端末を介して関連文書が検索された根拠をユーザに示すことができる。 According to the method described in Item 8, by connecting the client terminal to the existing server device, it is possible to show the user the reason why the related document was retrieved via the client terminal.

（第９項）第１項に記載の方法において、検索工程は、表示されている複数の言語単位に含まれる或る言語単位がユーザによって選択された場合、当該言語単位を検索クエリとして少なくとも１つの関連文書を検索する工程を含む。 (Item 9) In the method described in Item 1, the searching step includes a step of, when a certain linguistic unit included in the displayed plurality of linguistic units is selected by the user, searching for at least one related document using that linguistic unit as the search query.

第９項に記載の方法によれば、ユーザは今回の検索クエリとの関連性に着目して、新たな検索クエリを決定することができる。 According to the method described in item 9, the user can determine a new search query by focusing on the relevance to the current search query.

（第１０項）第１項に記載の方法において、ベクトル空間は、データベースを含むコーパスに対して自然言語処理を行う機械学習によって生成される。 (Item 10) In the method described in Item 1, the vector space is generated by machine learning that performs natural language processing on a corpus including a database.

第１０項に記載の方法によれば、コーパスを用いたモデルに対する機械学習の過程で、コーパスの特徴が高精度に反映されたベクトル空間を生成することができる。 According to the method described in item 10, in the process of machine learning for a model using a corpus, a vector space in which the features of the corpus are highly accurately reflected can be generated.

（第１１項）第１１項に記載の関連文書を検索して表示するシステムは、複数の文書から検索クエリに関連する少なくとも１つの関連文書を検索して表示する。当該システムは、データベースと、検索処理部とを含む。データベースは、複数の文書を含む。検索処理部は、任意の言語単位をベクトル表現に変換するベクトル空間を用いて複数の文書の各々についてのベクトル表現と検索クエリのベクトル表現との距離を算出し、距離に応じて少なくとも１つの関連文書を検索する。検索処理部は、少なくとも１つの関連文書の各々を表示する。検索処理部は、当該関連文書に含まれる複数の言語単位の各々のベクトル表現と検索クエリのベクトル表現とに基づく当該言語単位と検索クエリとの関連性の大きさに応じた表示態様で、当該言語単位を表示する。 (Item 11) The system for searching and displaying related documents according to Item 11 searches for and displays at least one related document related to a search query from a plurality of documents. The system includes a database and a search processing unit. The database contains the plurality of documents. The search processing unit calculates the distance between the vector representation of each of the plurality of documents and the vector representation of the search query using a vector space that converts any linguistic unit into a vector representation, and searches for at least one related document according to the distance. The search processing unit displays each of the at least one related document. The search processing unit displays each of the plurality of linguistic units included in the related document in a display mode according to the degree of relevance between the linguistic unit and the search query, based on the vector representation of the linguistic unit and the vector representation of the search query.

第１１項に記載のシステムによれば、関連文書に含まれる複数の言語単位の各々のベクトル表現と検索クエリのベクトル表現とに基づく当該言語単位と検索クエリとの関連性の大きさに応じた表示態様で、当該言語単位を表示することにより、検索結果の根拠をユーザに示すことができる。 According to the system of item 11, based on the vector representation of each of the plurality of linguistic units contained in the related document and the vector representation of the search query, according to the degree of relevance between the linguistic unit and the search query By displaying the relevant linguistic units in the display mode, it is possible to show the user the grounds for the search results.

なお、上述した実施の形態および変更例について、明細書内で言及されていない組み合わせを含めて、不都合または矛盾が生じない範囲内で、実施の形態で説明された構成を適宜組み合わせることは出願当初から予定されている。 Note that, for the embodiments and modifications described above, it has been planned since the time of filing that the configurations described in the embodiments may be combined as appropriate, including combinations not mentioned in the specification, to the extent that no inconvenience or contradiction arises.

今回開示された実施の形態はすべての点で例示であって制限的なものではないと考えられるべきである。本発明の範囲は上記した説明ではなくて請求の範囲によって示され、請求の範囲と均等の意味および範囲内でのすべての変更が含まれることが意図される。 It should be considered that the embodiments disclosed this time are illustrative in all respects and not restrictive. The scope of the present invention is indicated by the scope of the claims rather than the above description, and is intended to include all changes within the scope and meaning equivalent to the scope of the claims.

１入力部、５出力部、１０計算機、１１解析部、１３特徴抽出部、１５検索部、１７表示制御部、２０学習処理部、２１形態素解析部、２３ベクトル生成部、２５関連度学習部、２５ａ単語ベクトル学習部、２５ｂ単語－文章間学習部、２５ｃ単語－文書間学習部、２７コーパス、３０データベース、４０検索処理部、６０ディスプレイ、６１通信部、６５制御部、６７メモリ、７０ネットワーク、８０ａ～８０ｎクライアント端末、１００，１００Ａ分析事例検索システム、１０００クライアントサーバシステム、ＫＢ１キーボード、ＭＳ１マウス。 1 input unit, 5 output unit, 10 computer, 11 analysis unit, 13 feature extraction unit, 15 search unit, 17 display control unit, 20 learning processing unit, 21 morphological analysis unit, 23 vector generation unit, 25 relevance learning unit, 25a word vector learning unit, 25b word-sentence learning unit, 25c word-document learning unit, 27 corpus, 30 database, 40 search processing unit, 60 display, 61 communication unit, 65 control unit, 67 memory, 70 network, 80a to 80n client terminals, 100, 100A analysis case search system, 1000 client-server system, KB1 keyboard, MS1 mouse.

Claims

1. A computer-implemented method of searching for and displaying at least one related document related to a search query from a database containing a plurality of documents, the method comprising:
a searching step of calculating a distance between the vector representation of each of the plurality of documents and the vector representation of the search query using a vector space that converts any linguistic unit into a vector representation, and searching for the at least one related document according to the distance; and
a displaying step of displaying each of the at least one related document,
wherein the displaying step includes
a step of displaying each of a plurality of linguistic units contained in the related document in a display mode according to the degree of relevance between the linguistic unit and the search query, the degree of relevance being based on the vector representation of the linguistic unit and the vector representation of the search query,
the degree of relevance is the contribution of each of the plurality of linguistic units, and
the contribution is a value obtained by subtracting the distance between the vector representation of the related document and the vector representation of the search query from the distance between the vector representation of each of the at least one related document excluding the linguistic unit and the vector representation of the search query.

2. The method of claim 1, wherein the display aspect includes the color of the surrounding area of the linguistic unit.

3. The method according to claim 1 or 2, wherein said measure of relevance is a distance between a vector representation of each of said plurality of linguistic units and a vector representation of said search query.

4. The method of claim 1, wherein the searching step includes ranking the related documents higher as the distance between the vector representation of the search query and the vector representation of each of the at least one related document is shorter, and
the displaying step includes displaying the at least one related document according to the ranking by the searching step.

5. The method of claim 1, wherein the types of the plurality of linguistic units are selected from the group consisting of characters, morphemes, words, sentences, paragraphs, and any combination thereof.

6. The method of claim 1, wherein the method is performed in a client-server system, and
the displaying step is performed on the server side.

7. The method of claim 1, wherein the method is performed in a client-server system, and
the displaying step is performed on the client side.

8. The method of claim 1, wherein the searching step includes, when a certain linguistic unit included in the displayed plurality of linguistic units is selected by a user, searching for the at least one related document using that linguistic unit as the search query.

9. The method of claim 1, wherein the vector space is generated by machine learning that performs natural language processing on a corpus containing the database.

10. A system for searching for and displaying at least one related document related to a search query from a plurality of documents, comprising:
a database containing the plurality of documents; and
a search processing unit that calculates a distance between the vector representation of each of the plurality of documents and the vector representation of the search query using a vector space that converts any linguistic unit into a vector representation, searches for the at least one related document according to the distance, and displays each of the at least one related document,
wherein the search processing unit
displays each of a plurality of linguistic units contained in the related document in a display mode according to the degree of relevance between the linguistic unit and the search query, the degree of relevance being based on the vector representation of the linguistic unit and the vector representation of the search query,
the degree of relevance is the contribution of each of the plurality of linguistic units, and
the contribution is a value obtained by subtracting the distance between the vector representation of the related document and the vector representation of the search query from the distance between the vector representation of each of the at least one related document excluding the linguistic unit and the vector representation of the search query.