JP2002014970A - Language transverse type concept retrieving system

Info

Publication number: JP2002014970A
Application number: JP2000197062A
Authority: JP
Inventors: Yoichi Nakatani; 洋一中谷; Masao Tanaka; 雅雄田中; Shigeto Higuchi; 重人樋口
Original assignee: Patolis Corp
Current assignee: Patolis Corp
Priority date: 2000-06-29
Filing date: 2000-06-29
Publication date: 2002-01-18

Abstract

PROBLEM TO BE SOLVED: To provide a cross-language concept retrieval system that retrieves documents in many languages simultaneously from a query prepared in a single language, and that lets a user with only word-level knowledge of a foreign language obtain documents in various languages more precisely. SOLUTION: The cross-language concept retrieval system includes a plurality of similar document retrieval systems (21, 23), each constructed to retrieve documents in a different language, among which documents of the same content written in the respective languages are scattered. The system is provided with a corresponding document retrieval device (6), which extracts the documents of the same content written in each language that correspond to a document found in the results of a search query in one language and outputs them as search seed documents for each language, and a similar document retrieval system (23), which uses the output search seed documents to retrieve similar documents in the plural languages simultaneously.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a cross-language concept search system that includes a plurality of similar document search systems, each constructed to search documents in a different language, and in which documents of the same content written in the respective languages are scattered among the similar document search systems.

[0002]

[Prior Art] Foreign patent documents can be obtained in two ways: by accessing a foreign-language patent document database and composing a query (question text) in the foreign language, or by composing a query in one's native language, translating it into a foreign-language query with an automatic translator, and searching with the translated query.

[0003] The former requires, in order to compose a query that includes synonyms and related terms, not only knowledge of the foreign language but also knowledge of the characteristics of the search system used; it is said that only an expert can obtain adequate search results. The latter has the problem that word meanings do not correspond one-to-one between the native language and the foreign language, for example between Japanese and English, so a Japanese query often cannot be translated into an equivalent English query and satisfactory results are not obtained.

[0004]

[Problems to Be Solved by the Invention] A first object of the present invention is to provide a cross-language concept search system that can search documents in many languages simultaneously from a query formulated in a single language.

[0005] A second object of the present invention is to provide a cross-language concept search system that, given word-level knowledge of a foreign language other than the user's native language, can obtain documents in various languages more precisely and simultaneously.

[0006]

[Means for Solving the Problems] The above problems are solved by the following means. The cross-language concept search system according to the first aspect of the invention includes a plurality of similar document search systems, each constructed to search documents in a different language, among which documents of the same content written in the respective languages are scattered. The system comprises: a corresponding document search device that extracts the documents of the same content, written in each language, that correspond to a document found in the search results for a search query in one language, and outputs them as search seed documents for each language; and a similar document search system that uses the output search seed documents to search for similar documents in a plurality of languages simultaneously.
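The flow of the first aspect can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function name `cross_language_search` and the three callables passed to it (`source_search`, `family_lookup`, and the per-language functions in `similar_search_by_lang`) are hypothetical stand-ins for the source-language search, the corresponding document search device, and the similar document search systems. The per-language searches run one after another here rather than truly simultaneously.

```python
def cross_language_search(query, source_search, family_lookup, similar_search_by_lang):
    """Search several languages from a query written in one language.

    All parameters are hypothetical stand-ins:
      source_search(query)          -> list of document ids in the query language
      family_lookup(doc_id, lang)   -> list of same-content document ids in `lang`
      similar_search_by_lang[lang]  -> function(seed_ids) -> ranked list of ids
    """
    hits = source_search(query)
    results = {}
    for lang, similar_search in similar_search_by_lang.items():
        # Same-content counterparts of the hits become the seed documents for `lang`.
        seeds = [doc for hit in hits for doc in family_lookup(hit, lang)]
        # Without any counterpart there is no seed, so nothing is searched.
        results[lang] = similar_search(seeds) if seeds else []
    return results
```

A language for which no hit has a same-content counterpart yields an empty result in this sketch, because no seed document is available for it.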

[0007] The cross-language concept search system according to the second aspect of the invention includes a plurality of similar document search systems, each constructed to search documents in a different language, among which documents of the same content written in the respective languages are scattered. The system comprises: a corresponding document search device that extracts the documents of the same content, written in each language, that correspond to a document found in the search results for a search query in one language, and outputs them as search seed documents for each language; a similar document search system that uses the output search seed documents to search for similar documents in a plurality of languages simultaneously; and a seed document meaning adjustment device that adjusts the meaning of a document selected from the search results of the similar document search system so that the similar document search can be performed again.

[0008] The cross-language concept search system according to the third aspect of the invention includes a plurality of similar document search systems, each constructed to search documents in a different language, among which documents of the same content written in the respective languages are scattered. The system comprises: a corresponding document search device that extracts the documents of the same content, written in each language, that correspond to a document found in the search results for a search query in one language, and outputs them as search seed documents for each language; a similar document search system that uses the output search seed documents to search for similar documents in a plurality of languages simultaneously; and a re-search instruction device that selects the most suitable seed document from the search results of the similar document search system so that the similar document search can be performed again.

[0009] The cross-language concept search system according to the fourth aspect of the invention is the cross-language concept search system according to any of the first to third aspects, further comprising a text display device that displays search results.

[0010] The cross-language concept search system according to the fifth aspect of the invention is the cross-language concept search system according to any of the first to fourth aspects, further comprising an automatic translation device that translates from any one language into any other language.

[0011]

[Embodiments] In the patent field, it has long been common to file patent applications in many countries by using the right of priority recognized under the Paris Convention, the Patent Cooperation Treaty, and similar agreements (the right, for an application filed in another member country within one year of the filing date in the first country for the same invention, to be treated as if it had been filed on the first country's filing date).

[0012] Applications filed in various countries with such priority claims can be identified, using the priority number as a key, as a group of applications, one per country, that describe the same technical content. Such a group is commonly called a patent family. These patent applications are concentrated in globalized technical fields.
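Grouping by priority number can be illustrated with a short Python sketch. The record layout `(priority_number, country, document_number)` and the function name `group_into_families` are assumptions made for illustration; the source does not describe how the family data is stored, and the identifiers used with it below are placeholders, not real document numbers.

```python
from collections import defaultdict


def group_into_families(applications):
    """Group (priority_number, country, document_number) records by priority number.

    Returns {priority_number: {country: [document_number, ...]}}.
    The record layout is a hypothetical one chosen for this sketch.
    """
    families = defaultdict(dict)
    for priority_number, country, document_number in applications:
        # All applications that claim the same priority belong to one family.
        families[priority_number].setdefault(country, []).append(document_number)
    return dict(families)
```

With such a table, every member of a family can be reached from the priority number that the members share.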

[0013] In other words, the patent document databases of the various countries contain, in a certain proportion, patent documents with exactly the same technical content. For patent documents that form a patent family, specifying a number such as the application number in one country makes it possible to find the patent documents with the same technical content in each country immediately.

[0014] A similar document search system searches with an input query document on the basis of the vector space model. Products currently on the market include, for example, Vext Search (registered trademark, Komatsu Soft Co., Ltd.) and Concept Base Search (registered trademark, Justsystem). Such a similar document search system is outlined below.

[0015] Each document in the database is morphologically analyzed to extract content words (terms), and an index by term is created. Each term is handled as a term vector weighted according to its importance within the document. Each document is in turn represented as a document vector composed from the weighted term vectors of its terms (the vector space model).

[0016] At search time, the query is likewise represented as a query vector composed from weighted term vectors. The similarity between the query and a document is calculated as the cosine of the angle between the two vectors; that is, the inner product of the query vector and the document vector is computed as the measure of similarity (for vectors normalized to unit length, the inner product equals the cosine). The retrieved documents are presented sorted in descending order of similarity.
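The ranking step described here can be sketched in Python as follows. This is a minimal illustration that assumes vectors are stored as sparse `{term: weight}` dictionaries; the function names `cosine_similarity` and `rank_documents` are hypothetical and do not come from the source or from any named product.

```python
import math


def cosine_similarity(query_vec, doc_vec):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_q * norm_d)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, similarity) pairs sorted by descending similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Dividing by the two vector lengths is what turns the plain inner product into the cosine, so long documents are not favored merely for containing more terms.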

[0017] Term weights are calculated using the TF·IDF method. TF (Term Frequency) is the frequency with which a term occurs in the query or in each document, and IDF (Inverse Document Frequency) is the logarithm of the ratio of the total number of documents to the number of documents in which the term occurs. A term with a large TF·IDF weight, that is, a term that occurs in few documents but occurs many times in a particular document, can be regarded as a keyword that characterizes that document. Such systems usually also support Boolean search (search by logical expression) using the term index as is.
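The weighting just described can be sketched in Python as follows. This is a minimal illustration that uses the plain definition given in the text (raw term count multiplied by the logarithm of total documents over documents containing the term); real systems often add smoothing or normalization, which is not assumed here. The function name `tf_idf_vectors` is hypothetical, and the pre-tokenized term lists stand in for the output of morphological analysis.

```python
import math
from collections import Counter


def tf_idf_vectors(tokenized_docs):
    """Map each document id to a {term: tf * idf} weight dictionary.

    tokenized_docs: {doc_id: [term, term, ...]} (terms already extracted).
    """
    n_docs = len(tokenized_docs)
    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter()
    for terms in tokenized_docs.values():
        doc_freq.update(set(terms))
    vectors = {}
    for doc_id, terms in tokenized_docs.items():
        term_freq = Counter(terms)
        vectors[doc_id] = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
    return vectors
```

With this definition, a term that occurs in every document receives a weight of zero, which matches the observation that only terms concentrated in few documents characterize a document.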

[0018] The vector space model is described in Salton, G. and McGill, M. J., "Introduction to Modern Information Retrieval", McGraw-Hill (1983), and in "Ranking Algorithms" (Donna Harman, in Information Retrieval, pp. 363-392).

[0019] From the above, if applications in a patent family relationship can be extracted from the set of Japanese patent documents retrieved from a Japanese patent document database as matching a search query, and the corresponding U.S. patent application can be obtained, then similar U.S. patent documents can be obtained by performing a similar document search with the content of that U.S. patent application as the seed document.

[0020] In the outline of similar document search above, it was explained that "each term is handled as a term vector weighted according to its importance within the document, and each document is represented as a document vector composed from the weighted term vectors of its terms." This means that manipulating the weights of the term vectors changes the document vector; in other words, the content of the seed document can be changed by manipulating the weights. Put another way, the search query against the U.S. patent document database can be changed by manipulating the weights of the terms in the seed document that carry particularly important meaning.

[0021] The present invention is based on the facts described above. Specific examples of the invention are described below. FIGS. 1 and 2 are, respectively, a block diagram and a flowchart for explaining the first embodiment of the present invention.

[0022] In FIG. 1, 1 is a search query input device for entering a search query, which is the search request; 2 is a Japanese patent database; 3 is a text search device; and 4 is a text display device that displays search results.

[0023] The patent information search system composed of the search query input device 1, the Japanese patent database 2, the text search device 3, and the text display device 4 may be a Boolean search system such as PATOLIS (registered trademark, Japan Patent Information Organization), which is currently offered as a commercial service, or a full-text search system (n-gram) for patent CD-ROM gazettes. Preferably, however, it is a similar document search system based on the vector space model, which accepts search queries both in document form and as logical expressions of keywords and can rank the search results.

[0024] When the program of this embodiment starts (step S00) and a search query for the Japanese search system is entered in step S01, the query is sent to the text search device 3 and a search of the Japanese patent database 2 is executed (step S02). In step S03, the text display device 4 displays the results obtained by the search. The items displayed are Japanese text (abstract or full text, bibliographic items, and so on). If the searcher, on viewing the display, judges that satisfactory documents have been obtained (step S04, YES), the process proceeds to step S05. Otherwise, the process returns to step S01, and modification of the query (step S01), search (step S02), and display (step S03) are repeated until a satisfactory result is obtained.

[0025] In step S05, a so-called patent family search is executed. The corresponding patent relation database 5 used here is a database that registers Japanese patent document numbers together with the foreign patent document numbers in a patent family relationship with them. Entering a Japanese patent document number into the corresponding patent search device 6 retrieves the corresponding U.S. patent document number.

[0026] 7 is a relevant patent confirmation device for evaluating the relevance of the Japanese patent documents that have a U.S. patent family member. The proportion of Japanese patent documents that have a corresponding U.S. patent document is usually about 5% on average, although it is said to be markedly higher in technical fields where activity is global. Conversely, however, this means that the Japanese patent document that best matches the search request may have no corresponding U.S. patent document; in other words, more relevant documents may exist among the U.S. patents. The relevant patent confirmation device 7 is therefore provided so that, while the terms assigned to the Japanese patent documents found in step S05, the term weights, the abstracts, and so on are displayed, the most relevant patent document among the Japanese patent documents that have a corresponding U.S. patent document can be confirmed (step S06).

[0027] The ranking presented by the text search device 3, described above, could be used to select automatically (mechanically) the highest-ranked document among the Japanese patent documents that have a corresponding U.S. patent document. In this embodiment, however, to ensure higher relevance, the process proceeds to step S07 only after the user has made and confirmed the selection in step S06.
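The automatic (mechanical) alternative mentioned here can be sketched in Python as follows. The sketch assumes the corresponding patent relation database is available as a nested dictionary `{jp_number: {country: [document_number, ...]}}`; that layout, the function name `pick_seed_automatically`, and the identifiers used with it are hypothetical.

```python
def pick_seed_automatically(ranked_jp_numbers, relation_db, country="US"):
    """Return (jp_number, counterpart_number) for the best-ranked hit that has
    a counterpart in `country`, or None when no hit has one.

    ranked_jp_numbers is assumed to be sorted best-first already.
    """
    for jp_number in ranked_jp_numbers:
        counterparts = relation_db.get(jp_number, {}).get(country, [])
        if counterparts:
            return jp_number, counterparts[0]
    return None
```

Hits without a counterpart are skipped, so the document returned is not necessarily the top-ranked hit overall, which is one reason the embodiment prefers confirmation by the user.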

[0028] The relevant corresponding patent selection system is composed of the corresponding patent relation database 5, the corresponding patent search device 6, and the relevant patent confirmation device 7. When the search set of Japanese patent documents output on the text display device 4 is judged to match the user's search request, the search set is output from the text search device 3 to this selection system.

[0029] Next, when the user selects and confirms the Japanese patent document that has a corresponding U.S. patent document and best matches the search request, the document number of the corresponding U.S. patent document is output to the similar document search device 8. The similar document search device 8 is a search device capable of at least number search and similar document search. First, a number search is performed and the corresponding U.S. patent document is extracted from the U.S. patent database 9 (step S07).

[0030] Then, in step S08, a similar document search is performed with the extracted document as the search seed document (query text), and similar U.S. patent documents from the U.S. patent database 9 are ranked and output. This output is displayed on the text display device 10 (step S09). The text display device 10 can display an appropriate number of patent documents, for example the top 10 or so, each as the original document (original text, in English) together with its translation (produced by the English-Japanese automatic translation device 11) and the weighted feature terms (in English) of the original document. The number of feature terms can also be specified as appropriate, for example the top 10.
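Selecting the feature terms to display can be sketched in Python as follows. The sketch assumes a document's term weights are held in a `{term: weight}` dictionary; the function name `top_feature_terms` and its parameter `k` are hypothetical.

```python
def top_feature_terms(term_weights, k=10):
    """Return the k highest-weighted (term, weight) pairs, heaviest first."""
    ranked = sorted(term_weights.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

When a document has fewer than `k` terms, all of them are returned.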

[0031] The searcher enters whether the documents displayed in step S09 are appropriate (step S10). If they are (step S10, YES), the documents are output in step S11 and the process ends (step S99).

[0032] If, on the other hand, the judgment in step S10 is NO, the process proceeds to step S21, where the searcher adjusts the meaning of the seed document, that is, adjusts the weights of the feature terms.

[0033] The seed document meaning adjustment device 12 is intended to adjust or change the meaning of the search seed document, that is, of the search query, by adjusting the weights of the feature terms when the search seed document extracted from the corresponding U.S. patent document is not sufficiently relevant. Referring to the original document, its translation, and the weighted feature terms of the original document, the user (searcher) changes the weights of the feature terms and repeats the similar document search so that more relevant U.S. patent documents can be found. Steps S08 to S10 are repeated until appropriate documents are obtained.
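The repetition of steps S08 to S10 with the adjustment in step S21 can be sketched in Python as follows. The three callables (`search`, `is_satisfactory`, `adjust`) are hypothetical stand-ins for the similar document search, the searcher's judgment, and the searcher's weight changes; `max_rounds` is a safety limit added for this sketch and is not mentioned in the source.

```python
def search_with_adjustment(seed_vec, search, is_satisfactory, adjust, max_rounds=5):
    """Repeat search (S08), judgment (S10), and weight adjustment (S21).

    Returns the last results and the seed vector that produced them.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    for round_no in range(max_rounds):
        results = search(seed_vec)                # step S08 (display would be S09)
        if is_satisfactory(results) or round_no == max_rounds - 1:
            break                                 # step S10 YES, or limit reached
        seed_vec = adjust(seed_vec, results)      # step S21: change term weights
    return results, seed_vec
```

In the embodiment the judgment and the adjustment are made interactively by the searcher; passing them in as functions is only a way to make the control flow explicit.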

[0034] FIGS. 3 and 4 are explanatory diagrams that use an actual example to show how the term weights described above are changed. FIG. 3 shows (part of) a screen on which the original document displayed on the text display device 10 and its translation are shown side by side. FIG. 4 shows a screen listing the term number, the translated term, the term in the original document with its assigned weight, and the changed weight.

[0035] The screens of FIG. 3 and FIG. 4 can be switched between by pressing a switch button (not shown). In these figures, the weights of the following terms are changed (Japanese translated term / English original term): 多言語 / origin of multi language, 中間語 / intermediate language, 記述子 / descriptor, and 効率 / efficiency. The change is made by operating the change button (right side of FIG. 4).
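The effect of such a weight change on the ranking can be illustrated with a small Python sketch. The four term names come from the text, but every weight value below is hypothetical (the actual values appear only in FIG. 4), as are the two candidate documents `doc_a` and `doc_b` and the function names `cosine` and `change_weights`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def change_weights(seed_vec, new_weights):
    """Return a copy of the seed vector with the listed term weights replaced."""
    updated = dict(seed_vec)
    updated.update(new_weights)
    return updated


# Hypothetical weights; the real values are shown only in FIG. 4.
seed = {"origin of multi language": 1.0, "intermediate language": 1.0,
        "descriptor": 1.0, "efficiency": 1.0}
doc_a = {"intermediate language": 1.0, "descriptor": 1.0}
doc_b = {"efficiency": 1.0, "origin of multi language": 1.0}

adjusted = change_weights(seed, {"intermediate language": 3.0, "efficiency": 0.2})
print(cosine(seed, doc_a), cosine(seed, doc_b))            # same score before the change
print(cosine(adjusted, doc_a) > cosine(adjusted, doc_b))   # doc_a is favored afterwards
```

Raising the weight of one term and lowering another moves the seed vector toward documents that emphasize the raised term, which is how the adjustment changes the meaning of the query without rewriting it.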

[0036] As shown above, in this first embodiment, the corresponding U.S. patent document in the patent family is retrieved from a Japanese patent document obtained by the Japanese patent search system, and the U.S. patent database is searched with this U.S. patent document as the seed document (query text). The U.S. patent database contains many documents that have no family member among Japanese patents; that is, it covers many U.S. patent documents that the Japanese patent search system does not cover, so the desired documents can be found among these as well.

[0037] If the results of this search are unsatisfactory, the searcher adjusts the meaning of the seed document by changing the term weights, which produces a more appropriate query and gives access to more appropriate U.S. patent documents.

[0038] FIGS. 5 and 6 are a block diagram and a flowchart for explaining the second embodiment of the present invention. This embodiment uses a re-search instruction device 13 in place of the seed document meaning adjustment device 12 of the first embodiment. It differs from the first embodiment only in the re-search instruction device 13 in FIG. 5 and step S31 in FIG. 6, so overlapping explanation is omitted as far as possible.

[0039] As already described for the first embodiment, in the second embodiment too, when the user selects and confirms the Japanese patent document that has a corresponding U.S. patent document and best matches the search request, the document number of the corresponding U.S. patent document is output to the similar document search device 8.

[0040] The similar document search device 8 is a search device capable of at least number search and similar document search. First, a number search is performed and the corresponding U.S. patent document is extracted from the U.S. patent database 9. Then a similar document search is performed with the extracted document as the search seed document, and similar U.S. patent documents from the U.S. patent database 9 are ranked and output.

[0041] The text display device 10 can display (step S09) an appropriate number of patent documents, for example the top 10 or so, each as the original document (English) together with its translation (Japanese) and the weighted feature terms (English) of the original document.

[0042] When the first search does not yield the desired U.S. patent documents (step S10, NO), the user refers to the translations (Japanese), selects a document that matches the search request and differs from the previous seed (step S31), and uses the re-search instruction device 13 to run the similar document search again with it as the seed document (step S08). This is repeated several times so that more suitable documents are ranked higher.
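The re-search cycle of the second embodiment can be sketched in Python as follows. The callables `search` and `choose_new_seed` are hypothetical stand-ins for the similar document search and for the searcher's selection of a new seed from the displayed results; `max_rounds` is a safety limit added for this sketch and does not appear in the source.

```python
def search_with_reseeding(first_seed, search, choose_new_seed, max_rounds=5):
    """Rerun the similar document search with a different seed chosen from the results.

    choose_new_seed(results, tried) returns the next seed, or None when the
    searcher is satisfied (step S10, YES).
    Returns the last results and the list of seeds tried, in order.
    """
    tried = [first_seed]
    results = search(first_seed)                     # step S08
    for _ in range(max_rounds - 1):
        new_seed = choose_new_seed(results, tried)   # steps S10 and S31
        if new_seed is None:
            break
        tried.append(new_seed)
        results = search(new_seed)                   # step S08 again
    return results, tried
```

Keeping the list of seeds already tried lets the selection step avoid returning to a document that has been used before, matching the requirement that the new seed differ from the previous one.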

[0043] As shown above, in this second embodiment, as in the first, the corresponding U.S. patent document in the patent family is retrieved from a Japanese patent document obtained by the Japanese patent search system, and the U.S. patent database is searched with this U.S. patent document as the seed document (query text). The U.S. patent database contains many documents that have no family member among Japanese patents; that is, it covers many U.S. patent documents that the Japanese patent search system does not cover, so the desired documents can be found among these as well.

[0044] In this second embodiment, if the results of the first search are unsatisfactory, the searcher can reach more appropriate U.S. patent documents by looking at the search results, selecting a different U.S. patent document as the seed document, and running the search again.

[0045] As is clear from the first and second embodiments above, the present invention allows Japanese patent documents and foreign-language patent documents to be searched simultaneously with word-level knowledge, with reference to a dictionary, even if the user's knowledge of the foreign language's grammar and of document composition is insufficient.

[0046] In the description of the embodiments, patent families were used as the example of documents with the same content scattered across a Japanese document database and a foreign-language document database. The invention is not limited to patent families, however. It can clearly be applied to any database containing documents whose content has been translated into a language other than the original, such as widely available academic papers, newspaper articles, or PAJ (Patent Abstracts of Japan).

[0047]

[Effects of the Invention] The cross-language concept search system of the present invention has the effect that documents in many languages can be searched simultaneously from a query formulated in a single language.

[0048] Furthermore, the cross-language concept search system of the present invention has the effect that a searcher who has word-level knowledge of a foreign language other than his or her native language can obtain documents in various languages simultaneously and with greater precision.

[0049] In addition, the cross-language concept search system of the present invention retrieves, from the Japanese patent documents obtained by the Japanese patent search system, the corresponding US patent documents of the same patent family, and searches the US patent database using such a US patent document as the seed document (query text). The US patent database contains many documents that have no Japanese family member, that is, many US patent documents not covered by the Japanese patent search system, so the system has the effect that the target documents can be found among these as well.

[0050] In addition, the cross-language concept search system of the present invention has the effect that, if the results of the first search are unsatisfactory, the searcher can change the term weights to adjust the meaning of the seed document, so that a more appropriate query text is created and more relevant US patent documents can be accessed.
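
The effect of changing term weights can be illustrated with a minimal sketch. The scoring rule used here (summing the weights of the seed terms that occur in a document), the function names, and the toy documents are assumptions for illustration only; the specification does not prescribe this particular formula.

```python
def score(weights, text):
    """Sum the weights of the seed terms that occur in the document."""
    tokens = set(text.lower().split())
    return sum(w for term, w in weights.items() if term in tokens)

def rank(weights, docs):
    """Rank documents by descending score; ties are broken by document ID."""
    scored = [(score(weights, text), doc_id) for doc_id, text in docs.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ordered if s > 0]

def adjust(weights, changes):
    """Return a copy of the seed term weights with the searcher's changes applied."""
    updated = dict(weights)
    updated.update(changes)
    return updated

# Hypothetical seed-document terms and toy US documents.
SEED_WEIGHTS = {"optical": 1.0, "disk": 1.0, "controller": 1.0}
US_DOCS = {
    "US-A": "optical disk drive",
    "US-B": "magnetic disk controller",
}
```

With the initial weights both documents score the same. Once the searcher raises the weight of "controller" and lowers that of "optical", for example with `adjust(SEED_WEIGHTS, {"controller": 3.0, "optical": 0.2})`, `US-B` moves to the top of the ranking.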

[0051] In addition, the cross-language concept search system of the present invention has the effect that, if the results of the first search are unsatisfactory, the searcher can, while reviewing the search results, reselect a different US patent document as the seed document and run the search again, thereby accessing more relevant US patent documents.

[Brief description of the drawings]

FIG. 1 is a block diagram illustrating the first embodiment of the present invention.

FIG. 2 is a flowchart illustrating the first embodiment of the present invention.

FIG. 3 is an explanatory diagram that, together with FIG. 4, uses a concrete example to show how the term weighting is changed.

FIG. 4 is an explanatory diagram that, together with FIG. 3, uses a concrete example to show how the term weighting is changed.

FIG. 5 is a block diagram illustrating the second embodiment of the present invention.

FIG. 6 is a flowchart illustrating the second embodiment of the present invention.

[Explanation of symbols]

1: Search query input device
2: Japanese patent database
3: Text search device
4, 10: Text display device
5: Corresponding patent relation database
6: Corresponding patent search device
7: Conforming patent confirmation device
8: Similar document search device
9: US patent database
11: English-Japanese automatic translation device
12: Seed document semantic adjustment device
13: Re-search instruction device
21: Patent (Japanese) information search system
22: Conforming corresponding patent selection system
23: US patent similar document search system

Continued from the front page
(72) Inventor: Shigeto Higuchi, c/o Japan Patent Information Organization, Sato Daiya Building, 4-1-7 Toyo, Koto-ku, Tokyo
F-terms (reference): 5B075 ND03 PP23 PQ02 PQ36 PQ72 PR06 QM05 QM08

Claims

[Claims]

1. A cross-language concept search system that includes a plurality of similar document search systems, each constructed to search documents in a different language, and in which documents of the same content written in the respective languages are scattered across the similar document search systems, the cross-language concept search system comprising: a corresponding document search device that extracts, for a document present in the result of a search for a search query in one language, the documents of the same content written in each language, and outputs them as search seed documents for each language; and a similar document search system that simultaneously searches for similar documents in the plurality of languages using the output search seed documents.

2. A cross-language concept search system that includes a plurality of similar document search systems, each constructed to search documents in a different language, and in which documents of the same content written in the respective languages are scattered across the similar document search systems, the cross-language concept search system comprising: a corresponding document search device that extracts, for a document present in the result of a search for a search query in one language, the documents of the same content written in each language, and outputs them as search seed documents for each language; a similar document search system that simultaneously searches for similar documents in the plurality of languages using the output search seed documents; and a seed document semantic adjustment device that adjusts the meaning of a document selected from the search results of the similar document search system so that the similar document search can be performed again.

3. A cross-language concept search system that includes a plurality of similar document search systems, each constructed to search documents in a different language, and in which documents of the same content written in the respective languages are scattered across the similar document search systems, the cross-language concept search system comprising: a corresponding document search device that extracts, for a document present in the result of a search for a search query in one language, the documents of the same content written in each language, and outputs them as search seed documents for each language; a similar document search system that simultaneously searches for similar documents in the plurality of languages using the output search seed documents; and a re-search instruction device that selects an optimum seed document from the search results of the similar document search system and causes the similar document search to be performed again.

4. The cross-language concept search system according to any one of claims 1 to 3, further comprising a text display device that displays search results.

5. The cross-language concept search system according to any one of claims 1 to 4, further comprising an automatic translation device that translates from any one language into any other language.