JP5516916B2

JP5516916B2 - Method, system and apparatus for targeted investigation of multi-selected documents in an electronic document collection

Info

Publication number: JP5516916B2
Application number: JP2012509771A
Authority: JP
Inventors: レスニック，ジェイソン，デイヴィッド; ラカス，ランディー，ダブリュー
Original assignee: シーピーエーソフトウェアリミテッド
Priority date: 2009-05-08
Filing date: 2009-05-08
Publication date: 2014-06-11
Anticipated expiration: 2029-05-08
Also published as: KR20140056402A; AU2009345829A1; NZ596910A; WO2010128974A1; JP2012526319A; CN102804125A; EP2438507A1; EP2438507A4; CA2761542A1

Description

The present invention relates to electronic document collections and to searching a collection in response to receipt of a query. More specifically, the invention relates to classifying multiple sections of each document and efficiently processing queries according to the classified sections of the documents in the collection.

All intellectual property documents, including patent, trademark, and copyright applications, must be filed for registration or examination before the government agency designated to receive such applications. A patent application filed for examination before a government patent office must meet several requirements, including that the invention be new, useful, and non-obvious. Similar criteria apply in most, if not all, foreign patent offices. To properly prepare a patent application for examination, it is useful to know the prior patents in the relevant technical field (i.e., the prior art), since only one patent is granted per invention. The process of identifying prior art is known as a patent search. The results of a patent search generally help the drafter of a subsequent patent application to focus on what is likely to be patentable subject matter and to formulate a reasonable strategy for achieving the goals of the inventor or patent owner.

It is known that patent searches were conducted manually before technology developed into the current electronic information age. A searcher would review the patent disclosure, determine where the disclosure would be classified under the patent classification system, and then conduct the search. With the advent of information technology, paper searching is no longer available, because all patents and published patent applications are available only in electronic form. Even with patent documents in electronic form, a strategy similar to that used in a manual search can be applied to searching an electronic patent database.

Different classes of searches may be commissioned to obtain different results. For example, a novelty search may be commissioned to decide whether to apply for a patent. A product clearance search may be commissioned to determine whether a product is covered by the claims of a patent in force. An invalidity search may be commissioned to determine, for example, whether the issued claims of a patent are valid. Conventional electronic search tools do not accommodate the different classes of searches. Rather, the burden falls on the person conducting the search (also known as the searcher) to limit the sections of the patent documents reviewed according to the scope of the search. Because the volume of patents and published patent applications in the databases keeps growing, the burden of searching grows with the number of patents and published patent applications that must be reviewed on each search.

Accordingly, there is a need for a tool that searchers can use to reduce the burden associated with a search and its scope. Such a tool should allow the searcher to leverage the different sections of a patent document during a search in order to obtain accurate and desirable search results more efficiently and effectively.

The present invention comprises a method, system, and article for efficiently and effectively searching a collection of intellectual property documents, such as patent documents.

In one aspect of the invention, a computer-implemented method is provided for searching an electronic document collection. A collection of intellectual property documents is compiled, and each intellectual property document in the collection consists of a plurality of sections. When the collection is indexed, at least one document vector is derived for each patent document in the collection. Deriving the document vectors includes creating at least one static document vector for each document in the collection. When a query is presented to the collection, a dynamic document vector is created based on the string submitted with the query input. Upon presentation of the query input to the collection, the dynamic document vector associated with the query input is compared with each static document vector in the collection. Based on the comparison of the static document vectors of the collection with the dynamic document vector, a compilation of pertinent patent documents is returned.

In another aspect of the invention, a computer system is provided with a processor in communication with a storage medium, and an electronic document collection maintained on the storage medium. The electronic document collection is a compilation of patents or other intellectual property documents. Based on the characteristics of patent documents, each patent document in the collection has a plurality of sections. At index time, at least one document vector is derived for each patent document in the collection. Creating the document vectors includes creating at least one static document vector for each patent document in the document collection. At query time, a dynamic document vector is created from the string data received with the query input. After the dynamic document vector has been created, the query input is presented to the electronic patent document collection. A query manager, in communication with an input manager, compares the dynamic document vector with each static document vector in the collection in response to presentation of the query input to the patent document collection. Following presentation by the query manager, a compilation of pertinent patent documents is returned based on the comparison of the static document vectors with the dynamic document vector.

In yet another aspect of the invention, an article is provided with a computer-readable carrier including computer program instructions configured to search an electronic document collection in computer memory. The computer-readable carrier includes computer program instructions to be executed on the document collection. Instructions are provided to compile a collection of patent documents. Each patent document in the collection is divided into a plurality of sections. Instructions are provided to derive at least one document vector for each patent document in the collection when the collection is indexed. This includes creating at least one static document vector for each patent document in the document collection. Instructions are provided to create a dynamic document vector based on the string data from the query input when a query is presented to the collection. After the dynamic document vector has been created, the query is presented to the electronic document collection so that the dynamic document vector can be compared with each static document vector in the collection. The result of the query presentation includes a compilation of pertinent patent documents, returned based on the comparison of the dynamic document vector with the static document vectors in the collection.

Other features and advantages of the invention will become apparent from the following detailed description of the presently preferred embodiments of the invention, taken in conjunction with the accompanying drawings.

The drawings referenced herein form a part of this specification. Unless otherwise explicitly indicated, the features shown in the drawings are illustrative of only some embodiments of the invention, not of all embodiments of the invention. Implications to the contrary are otherwise not to be made.

FIG. 1 is a flowchart illustrating a search of an electronic document collection, more specifically a collection pertaining to patents and patent publications.
FIG. 2 is a flowchart illustrating a general process for presenting a query to a patent document collection.
FIG. 3 is a flowchart illustrating a process for further parsing the static document vectors in a patent document collection using stop words.
FIG. 4 is a flowchart illustrating a process for creating a plurality of document vectors for each patent document in the collection.
FIG. 5 is a flowchart illustrating a process for presenting a query to a document collection with a plurality of document vectors, according to the preferred embodiment of the invention, and is suggested for printing on the first page of the issued patent.
FIG. 6 is a block diagram illustrating a set of tools used to process a query presented to an electronic document collection.
FIG. 7 is a block diagram of a graphical user interface for entering user input to search an electronic document collection.

It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the apparatus, system, and method of the present invention, as presented in the figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention.

The functional units described in this specification are labeled as managers. A manager may be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, or programmable logic devices. A manager may also be implemented in software for execution by various types of processors. An identified manager of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions, which may be organized as an object, procedure, function, or other construct. Nevertheless, the executables of an identified manager need not be physically located together; they may comprise disparate instructions stored in different locations which, when joined logically together, constitute the manager and achieve its stated purpose.

Indeed, a manager of executable code may be a single instruction or many instructions, and may be distributed over several different code segments, among different applications, and across several memory devices. Similarly, operational data may be identified and illustrated within a manager, may be embodied in any suitable form, and may be organized within any suitable type of data structure. The operational data may be collected as a single data set or may be distributed over different locations, including different storage devices, and may exist, at least partially, as electronic signals on a system or network.

Throughout this specification, reference to "a select embodiment", "one embodiment", or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the phrases "a select embodiment", "in one embodiment", or "in an embodiment" appearing in various places throughout this specification do not necessarily refer to the same embodiment.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided, such as examples of document managers, input managers, and query managers, to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, materials, and so on. In other instances, well-known structures, materials, or operations are not shown or described in detail, to avoid obscuring aspects of the invention.

The illustrated embodiments of the invention will be best understood by reference to the drawings, in which like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and processes that are consistent with the invention as claimed herein.

(Overview)
Static document vectors and dynamic document vectors are used with intellectual property documents. The description below refers specifically to patent documents; in one embodiment, document vectors may be applied to any intellectual property document. A document vector is a set of (keyword, weight) pairs, where a keyword is a word or phrase associated with the underlying document and the weight is a numerical measure of how important the keyword is to the document. More specifically, a document vector is a kind of document signature that represents the content of a document in a way that facilitates comparison between documents. It is a numerical representation of the document's unstructured text content. Static document vectors are associated with patents and published patent applications, because such documents do not change frequently. A dynamic document vector is associated with the query string data (hereinafter, the string) presented to the patent document collection. Static document vectors may be parsed to exclude strings that are particular to patents and have minimal value when conducting a search. The excluded strings are called stop words. In one embodiment, the stop words used herein are specific to the patent industry. Furthermore, each patent document has sections defined within it, and each section identifies a different part of the patent document. When a patent search is conducted, the different sections of a patent document carry different value. Accordingly, depending on the scope of the patent search, the search may be limited to particular sections of the patent documents. Document vectors are therefore used with the patent document collection to create, efficiently and effectively, a result set of data pertinent to the query presented to the collection. The result set consists of the one or more documents in the patent document collection whose static document vectors are calculated to fall within a set numerical range of the dynamic document vector associated with the presented query string.
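As a minimal sketch of the (keyword, weight) representation described above, a document vector can be modelled as a mapping from keyword to weight. The type alias `DocumentVector`, the helper `top_keywords`, and the sample weights are all hypothetical and chosen only for illustration; the specification does not prescribe a concrete data structure.

```python
from typing import Dict, List

# Assumption: a document vector is stored as {keyword: weight}.
DocumentVector = Dict[str, float]


def top_keywords(vector: DocumentVector, n: int) -> List[str]:
    """Return the n heaviest keywords, ties broken alphabetically."""
    return sorted(vector, key=lambda k: (-vector[k], k))[:n]


# Hypothetical static vector for one patent document (weights invented).
static_vector: DocumentVector = {
    "document vector": 1.0,
    "query": 0.8,
    "stop word": 0.5,
}
```

A mapping keeps lookups by keyword cheap, which is convenient when two vectors are later compared term by term.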

(Technical details)
In the following description of the embodiments, reference is made to the accompanying drawings, which form a part hereof and which show, by way of illustration, specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized, because structural changes may be made without departing from the scope of the present invention.

FIG. 1 is a flowchart (100) giving an overview of searching an electronic document collection, and more specifically a collection pertaining to patents and patent publications. Initially, a collection of patent documents is compiled (102). It is understood in the art that patents and patent publications consist of a plurality of sections. Following compilation of the documents, the collection is indexed (104). The process of indexing the compilation includes converting the collection of data into a database suitable for search and retrieval. More specifically, indexing the document collection includes deriving a document vector for each patent document in the collection (106). A document vector includes a weighted list of words and phrases. In one embodiment, the terms selected for the document vector include, but are not limited to, noun phrases, words that are in title case and not at the beginning of a sentence, and words that appear frequently in the document. A weight is calculated for each term placed in the vector. In one embodiment, the methods for calculating weights include, but are not limited to: normalizing the frequency of each word in the document to a number between 0 and 1, where 1 is assigned to the most frequently occurring word in the document; emphasizing words or word pairs in selected fields of the document; assigning higher weights to noun phrases; emphasizing title case words in the body of the document; and assigning higher weights to longer strings than to shorter strings. After the words and phrases to include in the document vector have been selected and their weights chosen, the document vector is calculated using an integrator. In one embodiment, the integrator can select which fields to include in the vector and how much to emphasize the words and phrases they contain, select how much each factor contributes to the final term weight, add entity types to the vector, for example to boost the importance of business entities found in the document, and extend the stop word list to remove common phrases found in the database. The document vector created for each patent document in the collection is termed a "static document vector".
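The weighting steps above can be sketched as follows. This is only an illustration under stated assumptions: the function name `term_weights`, the multiplicative way the factors are combined, and the parameters `title_boost` and `length_boost` are hypothetical, since the specification lists the factors but not how the integrator combines them. Noun-phrase detection is omitted because it would need a part-of-speech tagger.

```python
from collections import Counter


def term_weights(body_terms, title_terms=(), title_boost=1.5, length_boost=0.01):
    """Weight terms by normalized frequency, then apply illustrative boosts."""
    counts = Counter(body_terms)
    if not counts:
        return {}
    max_count = max(counts.values())
    title = set(title_terms)
    weights = {}
    for term, count in counts.items():
        weight = count / max_count  # most frequent term gets 1.0
        if term in title:
            weight *= title_boost  # emphasize terms from a selected field
        weight *= 1.0 + length_boost * len(term)  # favor longer strings
        weights[term] = weight
    return weights


weights = term_weights(["vector", "vector", "query"], title_terms=["query"])
```

Note that after the boosts are applied a weight may exceed 1; only the raw frequency component is normalized to the 0 to 1 range.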

With few exceptions, a patent document generally is not changed after it has issued. Exceptions to this rule include, but are not limited to, issuance of a certificate of correction, reexamination of an issued patent, and reissue of an issued patent. To accommodate these exceptions, the document collection is updated. More specifically, a time interval is set for updating changes to the documents in the collection and the associated document vectors (108). Examples of the time interval include, but are not limited to, monthly, semi-annually, and annually. It is then determined whether the set time interval has elapsed (110). A positive response to the determination at step (110) is followed by a return to step (102). Conversely, a negative response to the determination at step (110) is followed by waiting for the set time interval and updating the patent document vectors to incorporate changes to the patent documents (112), followed by a return to step (110). In one embodiment, the patent collection is not limited to issued patents and includes published patent applications. Accordingly, because of the properties particular to patents, the patent document collection must be updated periodically to accommodate changes to any of the patents in the collection.

After the document collection has been parsed and the static document vectors for the collection have been created, queries may be run against the entire collection. FIG. 2 is a flowchart (200) illustrating a general process for presenting a query to a patent document collection. Initially, an input query is received (202). In one embodiment, the input query consists of a string. A document vector is created for the query input (204). Because the document vector for the query is created at the time of presentation, it is hereinafter referred to as a dynamic document vector. The dynamic document vector is created based on the text input for the query. More specifically, the dynamic document vector consists of the most pertinent terms from the query input text. Various tools can be used to select the strings to include in the dynamic document vector and to assign weights to the terms selected for inclusion. In one embodiment, the following are extracted from the input query: noun phrases, title case words (i.e., words whose first letter is capitalized but which are not at the beginning of a sentence), words that appear frequently in the document, and word pairs that appear frequently in the document. As with the static document vectors, designated stop words are removed and are not included in the dynamic document vector. After the terms for inclusion in the dynamic vector have been extracted from the text of the input query, weights are assigned to them. In one embodiment, the frequency of each term or phrase in the document is normalized to a number between 0 and 1, where 1 is assigned to the word that appears most frequently in the document. Similarly, in one embodiment, words or word pairs in special fields, such as the title, are emphasized; noun phrases are assigned higher weights; title case words in the body of the document are emphasized; and longer strings are assigned higher weights than shorter strings. The calculation of the document vector is highly configurable. In one embodiment, the user can assign weights to the search terms. Accordingly, a variety of tools can be invoked to create an appropriate dynamic document vector from the query input.
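One of the extraction rules above, title case words that are not at the beginning of a sentence, can be sketched with the standard `re` module. The function name `title_case_terms`, the sentence-splitting heuristic, and the sample query text are assumptions made for illustration; a production system would likely use a proper tokenizer.

```python
import re


def title_case_terms(text, stop_words=frozenset()):
    """Collect title case words that do not start a sentence."""
    terms = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        for word in words[1:]:  # skip the sentence-initial word
            is_title_case = word[0].isupper() and word[1:].islower()
            if is_title_case and word.lower() not in stop_words:
                terms.append(word)
    return terms


query_text = "The system uses Bluetooth radios. Each radio pairs with an Android phone."
```

For `query_text`, the call `title_case_terms(query_text)` picks out `Bluetooth` and `Android` while ignoring the sentence-initial `The` and `Each`.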

Following step (204), the query, in the form of the dynamic document vector, is presented to the document collection (206), where the dynamic document vector is compared with the static document vectors in the patent document collection (208). It is then determined whether any of the static document vectors in the collection are within a defined numerical range of the dynamic document vector (210). A positive response to the determination at step (210) is followed by placing in the result set all underlying patent documents in the collection that have one or more static document vectors within the defined numerical range (212). Following step (212), or in response to a negative response to the determination at step (210), it is determined whether the user wants to present a new query to the document collection (214). In one embodiment, the new query may narrow the scope of the previously presented query. Similarly, the new query may broaden the scope of the previously presented query. Regardless of the scope of the new query, a positive response to the determination at step (214) is followed by a return to step (204). A negative response to the determination at step (214) concludes the process of presenting queries to the document collection. Accordingly, presenting a query to the document collection includes converting the presented string into a dynamic document vector and comparing that vector with the static vectors of the document collection.
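Steps (208) through (212) can be sketched as below. The specification only says that a static vector must fall "within a defined numerical range" of the dynamic vector; using cosine similarity with a minimum threshold is an assumption made here for illustration, and the names `cosine_similarity`, `search`, and `min_similarity` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two {keyword: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(dynamic_vector, static_vectors, min_similarity=0.5):
    """Return (doc_id, score) pairs at or above the threshold, best first."""
    results = []
    for doc_id, static_vector in static_vectors.items():
        score = cosine_similarity(dynamic_vector, static_vector)
        if score >= min_similarity:  # assumed form of the "numerical range" test
            results.append((doc_id, score))
    return sorted(results, key=lambda pair: -pair[1])


# Hypothetical collection of two documents with invented weights.
collection = {
    "A": {"vector": 1.0, "query": 0.5},
    "B": {"engine": 1.0},
}
hits = search({"vector": 1.0}, collection)
```

Here only document `A` shares a keyword with the query, so it is the only entry in `hits`; narrowing or broadening a follow-up query corresponds to changing the query vector or the threshold.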

A patent document collection is a unique collection of technical documents. Patent documents come in the form of issued patent grants and published patent applications. The difference between the two categories of documents lies in their enforceable value. More specifically, a patent grant is an actual property right that can be enforced in a court of law, whereas a published patent application is a pending application with pending patent rights. Each patent document contains customary words and phrases that are placed in the application. However, such words and phrases have minimal value in a search, because they appear in most patent documents and are not unique to the invention. Examples of such words and phrases include, but are not limited to, "embodiment", "exemplary", and "prior art". Similarly, each country may have different words that are common in patent applications. For example, in some countries the word "characterized" is a common word with little patentability or search value. Such words are referred to herein as stop words. The purpose of identifying stop words particular to a country, language, or culture is to minimize the size of the document vectors that are searched. Each document vector in the patent document collection may be parsed to remove the identified stop words from the collection.

FIG. 3 is a flowchart (300) illustrating a process for using stop words to further parse the static document vectors in a patent document collection. Before a query is presented to the document collection, it is determined whether stop words should be parsed out of the static document vectors. The stop words may be limited to a specific country (302), a specific language (304), and/or a specific culture (306). Following a positive response to any individual selection or combination of selections at steps (302), (304), and/or (306), a compilation of stop words for parsing the static document vectors in the patent document collection is created (308). A collection of patent documents is then compiled (310). In one embodiment, the collection of patent documents may be limited to the selected country, language, and/or culture. After the documents are compiled at step (310), the collection is indexed (312) and the stop words are parsed out of the collection (314). The process of indexing the compilation and removing stop words from it includes converting the collection of data into a database suitable for search and retrieval. Following step (314), one or more sections of the documents in the collection are selected for inclusion in the document vectors created for the collection (316). Based on the selection of at least one section at step (316), a document vector is created for each patent document in the collection (318). More specifically, after the document collection is indexed, a document vector is derived for the selected sections of each patent document in the collection, with the identified stop words absent from the derived vector. Such a document vector is referred to herein as a static document vector.
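By way of illustration only, the stop word handling of FIG. 3 might be sketched as follows. The sketch assumes that a document vector is a simple term-frequency count, which the description does not state; the list names, the function names, and the sample sentence are hypothetical, and multi-word phrases such as “prior art” are not handled.

```python
import re
from collections import Counter

# Hypothetical stop-word lists; the entries are the examples named in the text.
STOP_WORDS = {
    "common": {"embodiment", "embodiments", "exemplary"},
    "some_countries": {"characterized"},
}

def build_stop_words(*keys):
    """Merge the selected stop-word lists into one compilation (step 308)."""
    merged = set()
    for key in keys:
        merged |= STOP_WORDS.get(key, set())
    return merged

def static_vector(text, stop_words):
    """Tokenize the text, drop stop words, and count the remaining terms (steps 314-318)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

stop_words = build_stop_words("common", "some_countries")
vec = static_vector(
    "An exemplary valve characterized by a spring-loaded valve seat", stop_words
)
```

In this sketch the resulting vector retains terms such as "valve" and "spring" while the listed stop words are absent, so the vector is smaller than one built from the raw text.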

With few exceptions, patent documents generally do not change once they have issued. To accommodate those exceptions, the document collection is updated infrequently. More specifically, a time period is set (320) for applying any changes to documents in the collection and updating the corresponding document vectors. Examples of the time period include, but are not limited to, monthly, semi-annually, and annually. It is then determined whether the set time period has expired (322). Following a negative response to the determination at step (322), the process waits for the set time period (324), updates the patent document vectors to incorporate changes to the patent documents, and then returns to step (320). Conversely, following a positive response to the determination at step (322), it is determined whether there are new stop words to be applied to the document collection (326). Following a positive response to the determination at step (326), the process returns to step (310); following a negative response to the determination at step (326), new stop words and/or phrases are added to the compilation of non-relevant patent terms (328). Following step (328), the process of creating and/or updating the static document vectors of the patent document collection returns to step (310). Accordingly, the static document vectors may be parsed to remove selected stop words so that a presented query can focus on the relevant strings in the static document collection.
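The addition of new stop words at step (328) might be sketched as below. This is a simplification and not the flow of FIG. 3 itself: the flowchart returns to step (310) and recompiles the collection, whereas the sketch strips the new terms from the existing vectors in place. The function name and the sample data are hypothetical, and vectors are again assumed to be term-frequency counts.

```python
from collections import Counter

def apply_new_stop_words(stop_words, new_terms, static_vectors):
    """Add new_terms to the stop-word compilation and remove them from every vector.

    Returns the set of terms that were actually new.
    """
    added = {term.lower() for term in new_terms} - stop_words
    stop_words |= added  # update the compilation in place
    for vec in static_vectors.values():
        for term in added:
            vec.pop(term, None)  # ignore vectors that never held the term
    return added

stop_words = {"exemplary"}
static_vectors = {"doc-1": Counter({"valve": 2, "said": 5})}
added = apply_new_stop_words(stop_words, ["Said"], static_vectors)
```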

It is recognized that issued patents and published patent applications are divided into multiple sections. Each section of a patent document is required for a complete patent application, and each section serves a purpose. The details of each section are not addressed at length herein; however, the various sections are identified. For the most part, each patent application includes a title, a priority filing date, a summary, a background, an abstract, a brief description of the drawings (if any), a detailed description of the invention, and claims. The patent community uses different search categories depending on the purpose of the search. For example, infringement and/or product clearance searches are concerned with the words of the claims, and should therefore target the claims in the document collection. Validity and/or invalidity searches are concerned with any known prior art and require identifying the priority filing date of the patent document. An inventor, or the inventor's agent or representative, may request a novelty search to determine the novelty of an invention before or after a patent application is filed. Such a search may de-emphasize the claims and focus on the detailed description of the invention. Accordingly, as shown herein, each type of search emphasizes different sections of the patent documents in the document collection.

As described above, each patent in the document collection may be parsed to select stop words that have minimal value in a search of the collection. In addition to, or separately from, the selection of stop words, it may be desirable to compile multiple static document vectors for a single patent document, with each individual document vector pertaining to one identified section of the patent document. Creating multiple document vectors, each identifying a specific section, may refine a search of the document collection according to a defined search scope. For example, an infringement search within the document collection may not be limited to the document vectors pertaining to the claims section of each patent in the document collection.

FIG. 4 is a flowchart (400) illustrating a process for creating multiple document vectors for each patent document in the collection. Initially, a collection of patent documents is compiled (402) and indexed (404). The variable M_Total is assigned the total number of documents in the patent document collection (406), and the counting variable M is assigned the integer 1 (408). The number of sections in patent document M of the collection is identified (410). Following step (410), the variable N_Total is assigned the total number of sections in patent document M (412), and the counting variable N is assigned the integer 1 (414). A document vector is created for each section of each patent document in the collection. More specifically, a document vector is created for each Section_N of PatentDocument_M (416). After the document vector is created at step (416), if the patent document has another section, the counting variable N is incremented (418) and the process proceeds to the next section of the patent document to create the next document vector. Following step (418), it is determined whether the patent document has further sections for which document vectors are to be created (420). Following a negative response to the determination at step (420), the process returns to step (416). Conversely, following a positive response to the determination at step (420), the variable M is incremented (422). It is then determined whether every document in the collection has been parsed to create its multiple document vectors (424). Following a negative response to the determination at step (424), the process returns to step (410) to create the multiple document vectors for the next document in the collection. As noted above, it is known in the art that a static document collection may need to be updated periodically. Updates may be frequent or infrequent, depending on the accuracy of the collection. In one embodiment, the frequency with which the static document vectors are updated may be proportional to the rate at which patents issue. A positive response to the determination at step (424) indicates that the patent document collection has been parsed to create multiple document vectors for each patent document. It is then determined whether the time period for updating the static vectors in the collection has expired (426). Following a positive response to the determination at step (426), the process returns to step (402). Conversely, following a negative response to the determination at step (426), the process waits for the set time period, updates the patent document vectors to incorporate changes to the patent documents (428), and then returns to step (426). Accordingly, each patent document in the document collection may be parsed to create multiple static document vectors, with each vector pertaining to one identified section of the patent document.
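The nested iteration of FIG. 4 over documents (counter M) and sections (counter N) might be sketched as follows. The representation of a document as a mapping from section name to text, the term-frequency vectors, and all names in the sketch are assumptions made for illustration; the periodic update loop of steps (426) and (428) is omitted.

```python
import re
from collections import Counter

def term_vector(text):
    """Assumed vector form: lower-cased term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def build_section_vectors(collection):
    """Create one vector per section of each document.

    collection: list of dicts mapping section name -> section text.
    Returns {(document number M, section name): term vector}.
    """
    vectors = {}
    m_total = len(collection)                        # step 406
    for m in range(1, m_total + 1):                  # steps 408, 422, 424
        document = collection[m - 1]
        for section_name, text in document.items():  # steps 412-420
            vectors[(m, section_name)] = term_vector(text)  # step 416
    return vectors

collection = [
    {"title": "Valve assembly", "claims": "A valve comprising a seat and a spring"},
    {"title": "Pump housing", "claims": "A pump comprising a housing"},
]
vectors = build_section_vectors(collection)
```

Keying each vector by document and section is one way to let a later query address only the vectors of a chosen section.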

After the patent documents have been parsed to create multiple document vectors for each document in the collection, query presentation may take advantage of the parsed document sections. FIG. 5 is a flowchart (500) illustrating a process for presenting a query to a document collection having multiple document vectors. Initially, the user presenting the query to the collection defines the scope of the search (502). In one embodiment, the user may be provided with a graphical user interface, layered over the computer instructions, to facilitate selection of the search scope. Following step (502), the defined search scope is associated with a selection of document vector categories in the document collection (504), and a query string is presented to the document collection (506). A dynamic document vector is then created for the presented query string (508), and that dynamic document vector is presented to the document collection to determine the relevant documents (510). Query presentation is limited to comparing the dynamic document vector with specific static document vectors of the document collection (512). In one embodiment, the selection of static document vectors may be a selection of a group of static document vectors (513). More specifically, in a search limited to the claims section of the patent documents, only the static document vectors for the claims sections of the patents in the collection, i.e., a group of like static document vectors, are searched. The comparison at step (512) is a mathematical comparison of the dynamic document vector with the static document vectors. The result set of the comparison is sorted based on the mathematical comparison (514). In one embodiment, the sort is hierarchical, based on the proximity of the static document vectors of the document collection to the dynamic document vector. Accordingly, comparing the dynamic document vector with the static document vectors of the collection yields a result set.
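The comparison and sort of steps (508) through (514) might be sketched as below. The description specifies only a "mathematical comparison"; cosine similarity over term-frequency vectors is an assumption adopted here for illustration, and the function names and document labels are hypothetical.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Assumed vector form: lower-cased term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def ranked_search(query, static_vectors, sections):
    """Compare the query only with vectors of the selected sections, closest first."""
    dynamic = term_vector(query)                      # step 508
    scored = [
        (cosine(dynamic, vec), doc_id, section)       # step 512
        for (doc_id, section), vec in static_vectors.items()
        if section in sections                        # step 513
    ]
    return sorted(scored, reverse=True)               # step 514

static_vectors = {
    ("doc-1", "claims"): term_vector("a valve comprising a spring and a seat"),
    ("doc-1", "description"): term_vector("the pump housing is described in detail"),
    ("doc-2", "claims"): term_vector("a pump comprising a housing"),
}
results = ranked_search("pump housing", static_vectors, {"claims"})
```

Because the search is restricted to the claims vectors, the description of doc-1 is never compared with the query even though it mentions the query terms.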

After the result set is sorted (514), a numerical value is used to define the range of proximity within which sorted documents are determined to be relevant (516). Following step (516), it is determined whether any documents in the sorted collection fall within the defined numerical range (518). Following a positive response to the determination at step (518), a list of all the underlying patents whose static document vectors are within the defined range of the dynamic document vector is placed in the result set (520). Following step (520), or following a negative response to the comparison at step (518), it is determined whether the user wants to present a new query string or to narrow the query of the prior query string presentation (522). A negative response to the determination at step (522) signals the end of the query presentation process. Conversely, following a positive response to the determination at step (522), it is determined whether the user wants to change the sections searched (i.e., the static document vectors) that are compared with the query (i.e., the dynamic document vector) (524). In one embodiment, a change in the scope of the search may directly change the selection of static document vectors used in the search. Following a positive response to the determination at step (524), the process returns to step (502) to change the sections of the patent documents evaluated in the next query. Conversely, a negative response to the determination at step (524) indicates that the new query further narrows the scope of the prior query while maintaining the same restriction of document vectors within the patent collection as the prior query. Accordingly, following the negative response, a further modification of the query, rather than of the patent document collection, is presented, and the process returns to step (506). The scope of the search may thus be changed in two respects to modify the result set that is based on comparing the dynamic document vector of the query with the static document vectors of the patent document collection.
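The range test of steps (516) through (520) might be sketched as follows, assuming that proximity is expressed as a similarity score where higher means closer. The threshold value, the scores, and the document labels are hypothetical.

```python
def within_range(sorted_results, min_score):
    """Return the documents whose score falls within the defined range.

    sorted_results: list of (score, doc_id) pairs sorted from closest to farthest.
    Each document is listed once, in order of its best score.
    """
    hits = []
    for score, doc_id in sorted_results:
        if score < min_score:
            break  # the list is sorted, so no later entry can qualify
        if doc_id not in hits:
            hits.append(doc_id)
    return hits

sorted_results = [(0.91, "doc-3"), (0.74, "doc-1"), (0.74, "doc-3"), (0.12, "doc-2")]
hits = within_range(sorted_results, min_score=0.5)
```

An empty list corresponds to the negative response at step (518), after which the user may present a new or narrower query.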

As shown in FIGS. 1-5, document vectors specific to a patent document collection are created and then used in query presentation to produce a result set of documents whose static document vectors are within a defined range of the dynamic document vector. FIG. 6 is a block diagram (600) illustrating a set of tools for creating static and dynamic document vectors and for using those vectors in connection with queries presented to a document collection. As shown, a computer system (602) includes a processing unit (604) coupled to memory (606) by a bus structure (608). Although only one processing unit (604) is shown, in one embodiment more processing units may be provided in an expanded design. The system (602) is shown in communication with a storage medium (640) configured to house a document collection (642). In one embodiment, the electronic document collection includes a compilation of patent documents, including issued patents and published patent applications. The storage medium (640) is in communication with the processing unit (604). In addition, the system is shown in communication with a display device (650) for presenting visual data. Each of the elements shown and described herein supports query presentation to the document collection (642).

A document manager (660) is provided local to the computer system (602) and in communication with the memory (606). The document manager (660) is responsible for deriving a document vector for each patent document in the collection (642) at indexing time. More specifically, the document manager (660) creates at least one static document vector (644) for each patent document in the collection (642). As noted above, each patent document consists of specific standardized sections, which may be consistent among documents issued by the same patent office jurisdiction. In one embodiment, the document manager (660) is employed to create multiple static document vectors (644) for each patent document. The document vectors (644) created by the document manager (660) are stored on the storage medium (640). An input manager (662) is also provided local to the computer system (602) and in communication with the memory (606). The input manager (662) is responsible for creating a dynamic document vector at query time, based on string data received from the query input. The input manager (662) is in communication with a query manager (664), which is likewise provided local to the computer system (602) and in communication with the memory (606). The query manager (664) is responsible for comparing the dynamic document vector created by the input manager (662) with each static document vector (644) in response to presentation of the query input to the document collection (642). This comparison yields a compilation of relevant patent documents (646). In one embodiment, the compilation is presented on the display device (650). Similarly, in one embodiment, the compilation may be retained on a storage device either temporarily or permanently.

A compilation of non-relevant string data (648) may be employed to parse non-relevant string data out of the static document vectors (644). In one embodiment, the compilation of non-relevant string data (648) is retained on the storage medium (640) and is periodically updated by the document manager (660). With or without use of the non-relevant string data, the document manager (660) may be instructed to create multiple static document vectors for each patent document in the document collection (642). A selection manager (666) is provided local to the computer system (602) and in communication with the memory (606). More specifically, the selection manager (666) is in communication with the query manager (664) to select a search scope for the document collection. The selected search scope determines the selection of static document vectors that the query manager (664) applies when processing the query.
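The role of the selection manager (666), in which a search scope determines which static document vectors are compared, might be sketched as a lookup table. The mapping for infringement, product clearance, and novelty follows the emphasis stated in the description; the mapping for validity/invalidity is an assumption, since the description says only that such searches require the priority filing date. All names are hypothetical.

```python
# Hypothetical mapping from search scope to the sections whose vectors are searched.
SCOPE_TO_SECTIONS = {
    "infringement": {"claims"},
    "product_clearance": {"claims"},
    "novelty": {"detailed_description"},
    # Assumption: not specified in the description.
    "validity_invalidity": {"claims", "detailed_description"},
}

def sections_for(scopes):
    """Union of the sections implied by every selected scope."""
    selected = set()
    for scope in scopes:
        selected |= SCOPE_TO_SECTIONS[scope]
    return selected

def select_vectors(scopes, static_vectors):
    """Keep only the static vectors whose section matches the selected scopes.

    static_vectors: {(doc_id, section name): vector}
    """
    wanted = sections_for(scopes)
    return {key: vec for key, vec in static_vectors.items() if key[1] in wanted}

static_vectors = {
    ("doc-1", "claims"): {"valve": 1},
    ("doc-1", "detailed_description"): {"valve": 4, "spring": 2},
    ("doc-1", "abstract"): {"valve": 1},
}
chosen = select_vectors(["infringement"], static_vectors)
```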

In one embodiment, the input manager (662), query manager (664), document manager (660), and selection manager (666) may reside in the memory (606) local to the computer system (602). However, the invention should not be limited to this embodiment. For example, in one embodiment, the input, query, document, and selection managers (660)-(666) may each reside as hardware tools external to the local memory (606), or may be implemented as a combination of hardware and software. Similarly, in one embodiment, the managers (660)-(666) may reside on a remote system in communication with the storage medium (640). Accordingly, the managers may be implemented as software tools or hardware tools that support presentation of one or more queries to an electronic patent document collection to obtain a compilation of relevant patent documents.

As described herein, a query may be presented to the patent document collection together with specific instructions pertaining to the static document vectors processed in executing the query. FIG. 7 is a block diagram (700) of a graphical user interface (702) that may be used to support presentation of those instructions. The interface (702) serves as a veneer over the instructions that support the underlying database of the electronic document collection. As shown, there are four primary fields. The first field (710) includes a field (712) for presenting a query to the document collection. The second field (720) includes multiple fields for selecting a search category. More specifically, as shown, the second field (720) may include subfields for novelty (722), state of the art (724), infringement (726), product clearance (728), and validity/invalidity (730). In one embodiment, the search field (720) may accommodate selection of multiple subfields. The third field (740) includes multiple fields for selecting the maximum number of documents returned in the result compilation. More specifically, the third field (740) may include subfields such as 10 documents (742), 50 documents (744), 100 documents (746), 500 documents (748), and 1000 documents (750), together with an input field (752) that supports custom entry of the maximum number returned. The invention should not be limited to the number of subfields shown at (742)-(750). The numbers provided herein are merely exemplary. The fourth field (760) of the interface is used to present the query string to the document collection. In one embodiment, the fourth field (760) has an execute button (762) for submitting the query and a cancel button (764) for cancelling the submission. Accordingly, the interface shown herein facilitates communication and presentation of a query to the electronic document collection, and enhances use of the one or more static document vectors in the electronic document collection.
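The inputs gathered by the interface of FIG. 7 might be represented as a single request object, sketched below. The class, field, and category identifiers are hypothetical; only the five search categories and the preset limits are taken from the description.

```python
from dataclasses import dataclass

PRESET_LIMITS = (10, 50, 100, 500, 1000)  # subfields (742)-(750)
CATEGORIES = {
    "novelty", "state_of_the_art", "infringement",
    "product_clearance", "validity_invalidity",
}  # subfields (722)-(730)

@dataclass(frozen=True)
class SearchRequest:
    query: str                            # field (712)
    categories: frozenset                 # field (720); several may be selected
    max_results: int = PRESET_LIMITS[2]   # field (740); custom values allowed (752)

    def __post_init__(self):
        unknown = set(self.categories) - CATEGORIES
        if unknown:
            raise ValueError(f"unknown search categories: {sorted(unknown)}")
        if self.max_results < 1:
            raise ValueError("max_results must be a positive integer")

def limit_results(ranked_doc_ids, request):
    """Truncate a ranked result list to the maximum the user selected."""
    return ranked_doc_ids[: request.max_results]

request = SearchRequest("pump housing", frozenset({"infringement"}), max_results=10)
top = limit_results(list(range(25)), request)
```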

In one embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, and microcode. The invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

Embodiments within the scope of the present invention also include articles of manufacture comprising program storage means having program code encoded therein. Such program storage means can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such program storage means can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired program code means and that can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of the program storage means.

The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device), or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, random access memory (RAM), read-only memory (ROM), a rigid magnetic disk, and an optical disk. Current examples of optical disks include compact disk read-only memory (CD-ROM), compact disk read/write (CD-R/W), and DVD.

A data processing system suitable for storing and/or executing program code includes at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output or I/O devices (including but not limited to keyboards, displays, and pointing devices) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or to remote printers or storage devices through intervening private or public networks.

The software implementation can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.

(Advantages over the prior art)
It is known in the art that each patent document has a defined outline of sections required to meet statutory filing requirements. Multiple document vectors are created for each electronic document, with the option of removing non-applicable patent strings from the document vectors. In one embodiment, one document vector is created for the claims section of the document collection, another document vector is created for the title, abstract, and claims sections, and a third document vector is created for all sections of the document collection combined. Parsing the vectors yields smaller, more concise document vectors, and smaller document vectors improve query processing efficiency because the vectors require no additional processing of the parsed strings. Not all queries are the same; different queries are submitted to the collection to obtain different results. Accordingly, parsing out non-applicable patent terms, together with categorizing the static document vectors, allows a query submission to be processed efficiently and effectively to return a desired compilation of document results.
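The per-section vectors described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the term list `NON_APPLICABLE_TERMS`, the category names in `VECTOR_CATEGORIES`, the raw term-count weighting, and the cosine comparison in `search` are all assumptions made for the example, since the text does not specify a weighting scheme or a similarity measure.

```python
import math
import re
from collections import Counter

# Hypothetical list of non-applicable patent terms (illustrative only).
NON_APPLICABLE_TERMS = {"said", "wherein", "comprising", "embodiment", "claim"}

# Hypothetical vector categories and the document sections each one covers.
VECTOR_CATEGORIES = {
    "claims": ("claims",),
    "title_abstract_claims": ("title", "abstract", "claims"),
    "all": None,  # None means every section present in the document
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_static_vectors(document, stop_terms=NON_APPLICABLE_TERMS):
    """Index time: build one term-count vector per category for a document.

    `document` maps section names to section text.
    """
    vectors = {}
    for category, sections in VECTOR_CATEGORIES.items():
        names = document.keys() if sections is None else sections
        terms = Counter()
        for name in names:
            terms.update(
                t for t in tokenize(document.get(name, "")) if t not in stop_terms
            )
        vectors[category] = terms
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse term-count vectors (an assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def search(query, index, category, stop_terms=NON_APPLICABLE_TERMS):
    """Query time: build a dynamic vector and rank documents within one category."""
    dynamic = Counter(t for t in tokenize(query) if t not in stop_terms)
    scored = [(doc_id, cosine(dynamic, vecs[category])) for doc_id, vecs in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the static vectors are built once at indexing time, a query only has to build its own dynamic vector and compare it against the single category that matches the chosen search scope.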

(Other embodiments)
Although specific embodiments of the invention have been described herein for purposes of illustration, it will be appreciated that various modifications may be made without departing from the spirit and scope of the invention. In particular, the search of intellectual property documents is not limited to issued patents and published patent applications. The search may be extended to include all forms of intellectual property documents, including but not limited to trademark registrations and applications, copyright registrations and applications, and all forms of patent documents. Regardless of the document classification of the query submission, updating the static document vectors in the document collection places a burden on resources. In the natural course of scientific progress, the document collection grows as new documents are added to it weekly or at other intervals. The time period set for updating the static document vectors may be fixed, since intellectual property documents are granted and published at a set frequency. However, in one embodiment, one or more variables may be used to modify the time period. For example, in one embodiment, the time period variable may be modified based on the quantity of documents added to the collection within a defined time period. The goal is to maintain an accurate document collection, which requires periodic updates of the static document vectors in the collection, in order to ensure a comprehensive data repository.
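The variable update period mentioned above could be sketched as follows. The rule itself is an assumption chosen for illustration: the text says only that the period may be modified based on the quantity of documents added within a defined time period, so the function name `next_update_interval`, the proportional shortening, and the `high_volume_threshold` and `min_interval` parameters are all hypothetical.

```python
from datetime import timedelta

def next_update_interval(
    base_interval,
    docs_added,
    high_volume_threshold,
    min_interval=timedelta(days=1),
):
    """Return the period to wait before re-indexing the static document vectors.

    Assumed rule: keep the fixed base period while the number of documents
    added in the last period stays at or below the threshold; above it,
    shorten the period in proportion to the excess volume, but never below
    `min_interval`.
    """
    if docs_added <= high_volume_threshold:
        return base_interval
    scaled = base_interval * (high_volume_threshold / docs_added)
    return max(scaled, min_interval)
```

With a weekly base period and a threshold of 1,000 documents, a week in which 4,000 documents arrive shortens the next period to a quarter of a week, while a quiet week leaves the weekly schedule unchanged.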

Furthermore, the electronic document collection has been described specifically in relation to intellectual property documents. However, the invention should not be limited to these specific categories of electronic documents. In one embodiment, the electronic document collection may include any type of document having a plurality of defined sections. This enables the manager to parse the documents into the defined sections, create a plurality of static document vectors for the defined sections, and support queries defined on the basis of the defined sections of the documents. Accordingly, the scope of protection of the invention is limited only by the appended claims and their equivalents.

Claims

A computer-implemented method for searching an electronic document collection, comprising:
compiling a collection of intellectual property documents, each document in the collection having at least one section;
at indexing time, creating at least one static document vector for each document in the document collection, including deriving at least one document vector for each document in the collection based on the at least one section;
limiting the static document vector to a selection of fields from the intellectual property document;
creating a group of a plurality of static document vectors for each intellectual property document in the collection;
at query time, identifying a specific document vector based on query input;
submitting the identified specific document vector to a search engine;
selecting a search scope having a plurality of search scope sub-fields for application to the document collection, wherein selecting the search scope includes matching at least one static document vector category from the document collection based on the defined search scope, and comparing the selected at least one static vector category with the created dynamic vector; and
compiling returned relevant documents based on a comparison of the identified specific document vector with the at least one created static document vector,
wherein the fields are selected from the group consisting of title, abstract, background, summary, detailed description, claims, drawings, and combinations thereof, and
wherein each static document vector is based on one or more fields of the intellectual property document.

The method of claim 1, wherein the step of identifying a specific document vector based on a query input further comprises creating a dynamic document vector based on string data from the query input.

The method of claim 1, further comprising creating, in a file, a list of terms commonly used in patent applications, and minimizing, removing, excluding, or reducing the weight of the terms of the list in the document vectors, including excluding each string in the compilation from each of the document vectors.

The method of claim 3, wherein the list of terms is language specific.

The method of claim 3, wherein the list of terms is culture specific.

The method of claim 3, further comprising dynamically updating the list of terms, including identifying specific terms for inclusion in the compilation.

The method of claim 1, wherein the search scope includes selection of an intellectual property infringement search category,
the method further comprising selecting a claims vector category for the infringement search,
wherein selection of the claims vector category limits the static document vectors from the document collection to the claims of the underlying documents in the collection.

The method of claim 1, wherein the search scope includes selection of an intellectual property invalidity search category,
the method further comprising selecting the title, abstract, summary, detailed description, claims, and drawings vector categories for the invalidity search,
wherein selection of the selected vector categories limits the static document vectors from the document collection to the representative sections of the intellectual property documents, in the form of document vectors, of the underlying documents in the collection.

The method of claim 1, wherein the search scope includes selection of a patent novelty search category,
the method further comprising selecting the detailed description vector category for the novelty search,
wherein selection of the detailed description vector category limits the static document vectors from the document collection to the detailed description sections of the intellectual property documents, in the form of document vectors, of the underlying documents in the collection.

The method of claim 1, further comprising using a graphical user interface layer to select the search scope.

The method of claim 1, further comprising setting a maximum limit on the number of relevant documents returned in the search.

The method of claim 1, wherein the compilation of returned relevant documents includes a document determined to have at least one static document vector within a defined numerical range of the dynamic document vector.

A system comprising:
a processor in communication with a storage medium, wherein the storage medium stores an electronic document collection, the electronic document collection includes a compilation of intellectual property documents, and the intellectual property documents in the collection each have a plurality of sections;
a document manager that, at indexing time, creates at least one static document vector for each intellectual property document in the document collection, including deriving at least one document vector for each intellectual property document in the collection;
the document manager limiting the static document vector to a selection of fields from the intellectual property document;
an input manager that, at query time, creates a dynamic document vector based on string data from query input submitted to the electronic intellectual property document collection;
a query manager, in communication with the input manager, that compares the dynamic document vector with each static document vector in the collection in response to submission of the query input to the intellectual property document collection;
a selection manager in communication with the query manager; and
a compilation of relevant intellectual property documents returned in response to the query manager and based on the comparison of the dynamic document vector with the static document vectors,
wherein the fields are selected from the group consisting of title, background, abstract, summary, detailed description, claims, drawings, and combinations thereof,
wherein the document manager creates a plurality of static document vectors for each intellectual property document in the collection,
wherein each static document vector is based on one or more fields of the intellectual property document, and
wherein the selection manager selects a search scope having a plurality of search scope sub-fields to apply to the document collection, selecting the search scope including matching at least one static document vector category from the document collection based on the defined search scope, and comparing the selected at least one static vector category with the created dynamic vector.

The system of claim 13, further comprising a compilation of non-applicable strings of intellectual property terms stored in a file,
wherein the query manager applies the compilation to the static document vectors and excludes each string in the compilation from each of the document vectors.

The system of claim 14, wherein the compilation of intellectual property terms is language specific.

The system of claim 14, wherein the compilation of intellectual property terms is culture specific.

The system of claim 14, further comprising the document manager dynamically updating the compilation of non-applicable intellectual property terms, including identifying specific terms for inclusion in the compilation.

The system of claim 13, wherein the search scope includes selection of an infringement search category,
the system further comprising the selection manager selecting the claims vector category for the infringement search,
wherein selection of the claims vector category limits the static document vectors from the document collection to the claims of the underlying documents in the collection.

The system of claim 13, wherein the search scope includes selection of an invalidity search category,
the system further comprising the selection manager selecting the title, abstract, summary, detailed description, claims, and drawings vector categories for the invalidity search,
wherein selection of the selected vector categories limits the static document vectors from the document collection to the representative sections of the intellectual property documents, in the form of document vectors, of the underlying documents in the collection.

The system of claim 13, wherein the search scope includes selection of a novelty search category,
the system further comprising the selection manager selecting the detailed description vector category for the novelty search,
wherein selection of the detailed description vector category limits the static document vectors from the document collection to the detailed description sections of the intellectual property documents, in the form of document vectors, of the underlying documents in the collection.

The system of claim 13, further comprising a graphical user interface in communication with the query manager,
wherein the graphical user interface comprises an array of input selectors defined to select the search scope to apply to the document collection.

A computer readable medium having recorded thereon a computer program that is executed on a computer to search a collection of electronic documents in computer memory and to execute a query,
the computer program including computer program instructions to be executed by the computer,
the computer program instructions comprising:
instructions for compiling a collection of intellectual property documents, wherein the intellectual property documents in the collection each have a plurality of sections;
instructions for creating, at indexing time, at least one static document vector for each intellectual property document in the document collection, including instructions for deriving at least one document vector for each intellectual property document in the collection;
instructions for limiting the static document vector to a selection of fields from the intellectual property document;
instructions for creating a plurality of static document vectors for each intellectual property document in the collection;
instructions for creating, at query time, a dynamic document vector based on string data from query input;
instructions for comparing the dynamic document vector with each static document vector in the collection in response to submission of the query input to the electronic document collection;
instructions for selecting a search scope having a plurality of search scope sub-fields for application to the document collection, wherein selecting the search scope includes instructions for matching at least one static document vector category from the document collection based on the defined search scope, and for comparing the selected at least one static vector category with the created dynamic vector; and
instructions for returning a compilation of relevant intellectual property documents based on the comparison of the static document vectors with the dynamic document vector,
wherein the fields are selected from the group consisting of title, abstract, background, summary, detailed description, claims, drawings, and combinations thereof, and
wherein each static document vector is based on one or more fields of the intellectual property document.

The computer readable medium of claim 22, further comprising instructions for creating a compilation of non-applicable strings of intellectual property terms in a file, applying the compilation to the document vectors, and excluding each string in the compilation from each of the document vectors.

The computer readable medium of claim 23, wherein the compilation of intellectual property terms is language specific.

The computer readable medium of claim 23, wherein the compilation of intellectual property terms is culture specific.

The computer readable medium of claim 23, further comprising instructions for dynamically updating the compilation of non-applicable intellectual property terms, including instructions that identify specific terms for inclusion in the compilation.

The computer readable medium of claim 22, wherein the search scope includes selection of an infringement search category,
further comprising instructions for selecting the claims vector category for the infringement search,
wherein selection of the claims vector category limits the static document vectors from the document collection to the claims of the underlying documents in the collection.

The computer readable medium of claim 22, wherein the search scope includes selection of an invalidity search category,
further comprising instructions for selecting the title, abstract, summary, detailed description, claims, and drawings vector categories for the invalidity search,
wherein selection of the selected vector categories limits the static document vectors from the document collection to the representative sections of the intellectual property documents, in the form of document vectors, of the underlying documents in the collection.

The computer readable medium of claim 22, wherein the search scope includes selection of a novelty search category,
further comprising instructions for selecting the detailed description vector category for the novelty search,
wherein selection of the detailed description vector category limits the static document vectors from the document collection to the detailed description sections of the intellectual property documents, in the form of document vectors, of the underlying documents in the collection.

A computer readable medium having recorded thereon a computer program that is executed on a computer to search a collection of electronic documents in computer memory and to execute a query,
the computer program including computer program instructions to be executed by the computer,
the computer program instructions comprising:
instructions for compiling a collection of intellectual property documents, wherein the intellectual property documents in the collection each have a plurality of sections;
instructions for creating, at indexing time, at least one static document vector for each intellectual property document in the document collection, including instructions for deriving at least one document vector for each intellectual property document in the collection;
instructions for limiting the static document vector to a selection of fields from the intellectual property document;
instructions for creating a plurality of static document vectors for each intellectual property document in the collection;
instructions for creating, at query time, a dynamic document vector based on string data from query input;
instructions for submitting the query input to the electronic document collection, including instructions for comparing the dynamic document vector with each static document vector in the collection;
instructions for selecting a search scope having a plurality of search scope sub-fields for application to the document collection, wherein selecting the search scope includes instructions for matching at least one static document vector category from the document collection based on the defined search scope, and for comparing the selected at least one static vector category with the created dynamic vector; and
instructions for returning a compilation of relevant intellectual property documents based on the comparison of the static document vectors with the dynamic document vector,
wherein the fields are selected from the group consisting of title, abstract, background, summary, detailed description, claims, drawings, and combinations thereof, and
wherein each static document vector is based on one or more fields of the intellectual property document.