JP2021140282A - Program for extracting case components from English patent specification

Info

Publication number: JP2021140282A
Application number: JP2020035392A
Authority: JP
Inventors: 玲雄加藤; Reo Kato; 翔久保田; Sho Kubota; 匡伊津野; Tadashi Izuno; 友子山下; Tomoko Yamashita; 元安彦; Hajime Abiko
Original assignee: Cyber Patent Ltd; Management of Technology Solution Cooperation
Current assignee: Cyber Patent Ltd; Management of Technology Solution Cooperation
Priority date: 2020-03-02
Filing date: 2020-03-02
Publication date: 2021-09-16

Abstract

PROBLEM TO BE SOLVED: To accurately extract case components defined in documents converted to digital data, including English patent specifications and the like.

SOLUTION: A computer is caused to execute a character string extraction step of extracting a character string from the claims of an English patent specification converted to digital data, and a case component extraction step of extracting noun phrases from the character string extracted in the character string extraction step and extracting case components from the extracted noun phrases. In the case component extraction step, when a noun phrase modifying a verb or a noun phrase modified by a verb is described in each of a preceding stage and a following stage, and the noun phrases in the preceding stage and in the following stage are consistent with each other, the noun phrase in the following stage is excluded from the case components to be extracted.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a program for extracting case components from an English patent specification, which extracts case components from character strings in an English patent specification that has been converted into electronic data.

Conventionally, Patent Document 1 discloses a specification analysis display device intended to allow a patent specification to be checked in a short time. In particular, even when the number of claims is kept as small as possible, the device allows the specification, including its description of intermediate concepts, to be checked more simply and with the scope of rights and the grant rate taken into account. It also allows the direction of the description itself, intermediate concepts included, to be checked easily.

Further, Patent Document 2 discloses a patent specification analysis display device capable of counting and displaying the degree of limitation of an invention defined in the claims of an English patent specification.

Patent Document 1: Japanese Unexamined Patent Publication No. 2006-155151

Patent Document 2: Japanese Unexamined Patent Publication No. 2014-222371

However, the technique disclosed in Patent Document 2 leaves unexamined, in some respects, how case components, as a measure of the degree of limitation of an invention, can actually be extracted from an English patent specification accurately and simply.

The present invention has therefore been devised in view of the above problem. Its object is to provide a program for extracting case components from an English patent specification that can extract, accurately and simply, the case components defined in documents converted into electronic data, such as English patent specifications.

The first invention is a program for extracting case components from an English patent specification, which causes a computer to execute: a character string extraction step of extracting a character string from the claims of an English patent specification converted into electronic data; and a case component extraction step of extracting noun phrases from the character string extracted in the character string extraction step and extracting case components from the extracted noun phrases. In the case component extraction step, when there is a verb, gerund, to-infinitive, present participle, or past participle that modifies a noun phrase or is modified by it, and the semantic content matches that of a verb, gerund, to-infinitive, present participle, or past participle that modifies, or is modified by, the same noun phrase appearing later in the text, the same noun phrase appearing later is excluded from the case components to be extracted.

The second invention is a program for extracting case components from an English patent specification, which causes a computer to execute: a character string extraction step of extracting a character string from the claims of an English patent specification converted into electronic data; and a case component extraction step of extracting noun phrases from the character string extracted in the character string extraction step and extracting case components from the extracted noun phrases. In the case component extraction step, when character strings corresponding to the object and the complement of the fifth sentence pattern (SVOC) appear in both a preceding part and a following part, and these match each other, the combination of character strings corresponding to the object and the complement in the following part is excluded from the case components to be extracted.

The third invention is characterized in that, in the first invention, in the case component extraction step, when the character strings corresponding to the object and the complement are described in a subject-predicate relationship in the preceding part or the following part, and these match each other, the combination of character strings corresponding to the object and the complement in the following part is excluded from the case components to be extracted.

The fourth invention is characterized in that, in any of the first to third inventions, when the case component extraction step determines whether two descriptions match, it judges the identity of their verbs solely on whether the verb itself is the same, without considering whether they share the active or passive voice.

The fifth invention is characterized in that, in any of the first to third inventions, when the case component extraction step determines whether two descriptions match, it judges two nouns or noun phrases to be identical if they differ only in "a", "the", "each", "both", or "said".

The sixth invention is characterized in that, in any of the first to third inventions, when the case component extraction step determines whether two descriptions match and "configured to + verb phrase" modifies a noun phrase from behind, identity is judged by treating the noun phrase as being modified from behind by the past participle of that verb.

The seventh invention is characterized in that, in any of the first to third inventions, when the preceding and following character strings being compared in the case component extraction step each attach to a conditional clause consisting of one of "when", "if", "in case", "in the case", "while", "only", "providing", or "whether" + noun phrase + verb, and the conditional clauses differ between the preceding part and the following part, the character strings are judged not to be identical.

The eighth invention is characterized in that, in any of the first to third inventions, when the case component extraction step determines whether two descriptions match, it judges the identity of their verbs solely on whether the verb itself is the same, without considering whether they share the same auxiliary verb.
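The matching rules of the fourth, fifth, and eighth inventions can be illustrated with a minimal sketch. It assumes that noun phrases and verb phrases have already been extracted as plain strings; the helper names, the auxiliary list, and the tiny lemma table are hypothetical stand-ins for a real morphological analyzer and are not taken from the specification.

```python
# Determiners that the fifth invention treats as irrelevant to identity.
DETERMINERS = {"a", "the", "each", "both", "said"}

# Assumed list of auxiliaries; dropping "is"/"are"/... also hides the
# active/passive distinction (fourth invention) once verbs are lemmatized.
AUXILIARIES = {"is", "are", "was", "were", "be", "been", "being",
               "can", "may", "will", "shall", "must",
               "could", "would", "should", "might"}

# Hypothetical lemma table; a real system would use a lemmatizer instead.
LEMMAS = {"controls": "control", "controlled": "control",
          "controlling": "control"}


def noun_key(phrase):
    """Normalize a noun phrase by dropping the listed determiners."""
    return tuple(w for w in phrase.lower().split() if w not in DETERMINERS)


def verb_key(phrase):
    """Normalize a verb phrase by dropping auxiliaries and lemmatizing."""
    words = [w for w in phrase.lower().split() if w not in AUXILIARIES]
    return tuple(LEMMAS.get(w, w) for w in words)


def same_mention(noun_a, verb_a, noun_b, verb_b):
    """True when both the noun phrase and its verb are judged identical."""
    return (noun_key(noun_a) == noun_key(noun_b)
            and verb_key(verb_a) == verb_key(verb_b))


print(same_mention("a light beam", "controls",
                   "said light beam", "is controlled"))  # True
print(same_mention("the light beam", "controls",
                   "the lens", "controls"))              # False
```

Comparing normalized keys rather than raw strings keeps the identity test independent of article choice, voice, and auxiliary verbs, which is the behavior these three inventions describe.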

With the configuration described above, the present invention makes it possible to extract, accurately and simply, the case components that express the degree of limitation of the invention defined in the claims of an English patent specification.

FIG. 1 is a diagram for explaining case components.

FIG. 2 is another diagram for explaining case components.

FIG. 3 is a flowchart showing the processing steps of a case component extraction program to which the present invention is applied.

Hereinafter, as a mode for carrying out the present invention, a case component extraction program that extracts case components from character strings in an English patent specification will be described in detail with reference to the drawings.

The following example describes extracting case components from the claims of an English patent specification. The present invention is not limited to this, however, and may of course be applied to any other document.

Theory of the number of case components

The case component has been proposed as the minimum extraction unit in a method for quantifying the breadth of the technical scope of a patented invention.

Assume that the claims are written in the so-called element-enumeration style, which defines the invention by listing its constituent elements A, B, and C as in "an XX device (method) comprising ... A, ... B, and C". A verb phrase then modifies each of the constituent elements A, B, and C. That is, when each constituent element is taken as a subject, a verb phrase attaches to it, and a sentence consisting of a subject and a predicate can be formed. In forming a sentence as a predicate, the verb in each of these verb phrases essentially determines which combination of noun phrases it takes in order to realize the action, state, or relationship it expresses. If we tentatively call "case government" the function by which a verb selectively requires the combination of noun phrases needed to form a sentence, according to the category of lexical meaning the verb carries, then the noun phrases that the verb requires as sentence components can be called case components that complement the verb.

As an example, suppose that the constituent element "signal generation means" is defined by B-1) below.

B-1) "signal generation means for generating a drive signal in response to a user's request, and ..."

Here, the verb phrase "generating a drive signal in response to a user's request" modifies the signal generation means. Within this verb phrase, the function that requires the noun phrases "(in response to) a user's request" and "a drive signal" in order to realize and complete the action of the verb "generate" is case government, and these noun phrases are the case components. The categorical relational meanings that these noun phrases bear toward the verb "generate", namely "operation start condition" and "target" respectively, are the cases. The lexical meaning carried by the noun phrases that realize the case components, together with the cases as the relational meanings those noun phrases bear, forms the core of the proposition to be realized by the verb "generate". In the example above, to realize the proposition of the verb "generate", the operation must be "in response to a user's request" as its start condition, and a "drive signal" must be generated as its target. These two are therefore the conditions for realizing the proposition of the verb, and the number of conditions is two.

FIG. 1 schematically shows a case in which the number of conditions that must be cleared to realize the proposition of such a verb is small. Because few conditions must be met before the action of the verb can start, the proposition is more likely to be realized. FIG. 2, by contrast, schematically shows a case in which the number of conditions that must be cleared has increased; in that case the proposition is less likely to be realized.

As the number of conditions increases, so does the number of steps needed to judge whether the conditions are satisfied before the operation actually starts. Even allowing for some differences in the probability of satisfying each condition, an increase in the number of judgment steps, in other words an increase in the case components that define the conditions, means that a given object is less likely to fall under the constituent element A that the verb phrase modifies, and the technical scope narrows by that reduction. Conversely, when the number of case components is small, an object is more likely to fall under constituent element A, and the technical scope widens accordingly.

The number of case components (the number of conditions) thus governs the likelihood that the action of the verb can start, and hence the likelihood that the proposition is realized, and this in turn affects the breadth of the technical scope. It is therefore considered that taking the case component as the minimum extraction unit for quantifying the claims, and counting the case components contained in each verb phrase, yields a numerical value that reflects the breadth of the technical scope.

In the B-1) example above, two case components, "in response to a user's request" and "a drive signal", attach to the verb "generate" that realizes the proposition of the constituent element "signal generation means", so the number of case components is 2. If the case component "in response to a user's request", which defines the operation start condition of the verb "generate", were absent, the number of case components would be 1: the "drive signal" could be "generated" at any time, whether or not the user makes a request, and the proposition of the "signal generation means" would be more likely to be realized. This suggests that the constituent element "signal generation means" would then make it more likely that identity with the technical elements of an allegedly infringing product can be established. That increase in likelihood is considered to correspond to how readily the patented invention captures allegedly infringing products, and hence to the breadth of its technical scope. Expressing the likelihood of realizing each constituent element's proposition through the number of case components is therefore considered to allow a quantification better matched to the breadth of the technical scope.

Assuming again that the claims are written in the element-enumeration style, listing the constituent elements A, B, and C as in "an XX device (method) comprising ... A, ... B, and C", the number of case components is determined for each of the constituent elements A, B, and C, and their sum is taken as the number of case components of the XX device as a patented invention. For example, if constituent element A has 1 case component, constituent element B has 3, and constituent element C has 2, the number of case components of the XX device composed of these elements is their sum, 6.
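The summation described above can be written out as a short sketch. The dictionary layout and the function name are illustrative assumptions; only the per-element counts (1, 3, 2) come from the text.

```python
# Per-element case component counts from the worked example in the text.
case_counts = {"A": 1, "B": 3, "C": 2}


def total_case_components(counts):
    """Return the claim-level count as the sum over constituent elements."""
    return sum(counts.values())


print(total_case_components(case_counts))  # 6
```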

When the number of case components of each constituent element is actually determined, in the B-1) example above the case components "in response to a user's request" and "a drive signal", which attach to the verb "generate", are extracted. In practice, this extraction is carried out using morphemes as markers, such as "に応じて" ("in response to") and the object marker "を" in the Japanese wording.

Table 1 shows examples of morphemes that are referred to as markers when extracting case components.

Claims are not always defined in an orthodox form such as B-1); they may also be defined as shown in C-1), for example.

C-1) "reflecting means for totally reflecting, by a reflector, a luminous flux whose spot diameter is controlled by a first lens, and ..."

For example, the verb "totally reflect" attaches to the constituent element "reflecting means" of the claim. The case components that define the operating conditions of the verb "totally reflect" are "a luminous flux" (the object) and "by a reflector", so a case component count of 2 is registered first. Focusing on one of these case components, "luminous flux", however, we see that the verb phrase "whose spot diameter is controlled by a first lens" further modifies it. Defining a luminous flux that is subject to a condition as the target of total reflection, rather than any luminous flux whatsoever, adds a technical limitation and lowers the likelihood that the proposition of the constituent element "reflecting means" is realized. This reduction must therefore be reflected as a correction to the number of case components.

In this case, the verb phrase "whose spot diameter is controlled by a first lens", which modifies "luminous flux", contains two case components governed by the verb "control": "by a first lens" and "spot diameter". A further 2 is therefore added, giving C-1) a total of 4 case components.

Depending on the drafter, the claim language of C-1) may instead be written as C-2) below.

C-2) "luminous flux control means for controlling the spot diameter of a luminous flux by a first lens, reflecting means for totally reflecting such luminous flux by a reflector, and ..."

The technical scope formed by the "luminous flux control means" and the "reflecting means" in C-2) is substantially the same as that of C-1). C-2) takes the content of the verb phrase that modifies "luminous flux" in C-1) and redefines it as a proposition to be realized by the "luminous flux control means". In C-1) as well, some means or member must in fact be used to produce the defined "luminous flux"; C-1) simply does not name a "luminous flux control means" explicitly. Counting the case components of C-2) in the same way gives 2 for the luminous flux control means and 2 for the reflecting means, for a total of 4, the same result as C-1).
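The counts for C-1) and C-2) can be reproduced with a small recursive sketch. It assumes the claim has already been parsed into verb phrases, each holding its case components, where a case component may itself carry a modifying verb phrase. The dictionary structure and function names are hypothetical; the specification does not prescribe a data structure.

```python
def count_case_components(verb_phrase):
    """Count case components, descending into verb phrases that modify them."""
    total = 0
    for component in verb_phrase["components"]:
        total += 1  # the noun phrase itself
        modifier = component.get("modifier")
        if modifier is not None:
            total += count_case_components(modifier)
    return total


def claim_total(elements):
    """Sum the counts over the verb phrases of all constituent elements."""
    return sum(count_case_components(vp) for vp in elements)


control = {"verb": "control",
           "components": [{"text": "first lens"}, {"text": "spot diameter"}]}

# C-1): one element; "luminous flux" carries the modifying verb phrase.
c1 = [{"verb": "totally reflect",
       "components": [{"text": "luminous flux", "modifier": control},
                      {"text": "reflector"}]}]

# C-2): the same content split into two elements.
c2 = [control,
      {"verb": "totally reflect",
       "components": [{"text": "luminous flux"}, {"text": "reflector"}]}]

print(claim_total(c1), claim_total(c2))  # 4 4
```

Both drafting styles yield 4 because the nested verb phrase in C-1) contributes the same two case components that the separate element contributes in C-2).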

As described above, the method measures the breadth of the technical scope through the number of case components required to realize the proposition of each constituent element. C-1) and C-2), which differ in drafting style and word count but cover substantially the same technical scope, can therefore be expressed by the same number of case components. If the minimum extraction unit were the constituent element, C-1) would score 1 and C-2) would score 2; if it were the word, C-1) would score 7 and C-2) would score 9. In light of this, the case component is considered a more suitable minimum extraction unit for quantification.

The configuration of C-1) and C-2) may also be defined as in C-3) below.

C-3) "luminous flux control means for controlling the spot diameter of a luminous flux by a first lens, reflecting means for totally reflecting, by a reflector, the luminous flux whose spot diameter is controlled by the luminous flux control means, and ..."

The underlined portion of C-3), "the luminous flux whose spot diameter is controlled by the luminous flux control means", corresponds to "such luminous flux" in C-2). The content of the verb phrase in this underlined portion, however, is a proposition that has already been realized by "controlling the spot diameter of a luminous flux", which modifies the "luminous flux control means". The underlined wording is included only as confirmation, so that no doubt arises in interpreting which "luminous flux" the "reflecting means" totally reflects. If this underlined wording were counted in the same way when calculating the number of case components, its content would be counted twice, degrading the accuracy of the measurement.

By taking care not to count again an underlined description whose proposition has already been realized, discrepancies in the number of case components caused by differences in the order or style of the claim language can be eliminated. In practice, it is necessary to judge whether the proposition realized by the earlier description, "controlling the spot diameter of a luminous flux", is the same as that realized by the later underlined description, "the luminous flux whose spot diameter is controlled by the luminous flux control means".

A quantification method that uses the case component as its minimum extraction unit is particularly useful when judging whether propositions are identical. Unlike a word, a case component is governed by a verb, so two case components are identical only if, in addition to their content, the verbs that govern them are also identical. If both the case component and the verb that governs it are the same, the proposition they realize is also the same. Judging the identity of case components therefore makes it possible to determine whether propositions are identical, and hence to determine easily whether a description would be counted twice. Differences in drafting style between specification drafters also no longer affect the resulting value.
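The double-count check can be sketched as follows, treating each case component together with its governing verb as a single key. The pairs below are a simplified, hypothetical encoding of C-3), in which the later restatement of "spot diameter" under "control" repeats an earlier pair; a real implementation would first have to normalize wording before comparing.

```python
def count_without_duplicates(pairs):
    """Keep each (verb, case component) pair once, in order of appearance."""
    seen = set()
    counted = []
    for verb, component in pairs:
        key = (verb, component)
        if key in seen:
            continue  # same verb and same case component: already counted
        seen.add(key)
        counted.append(key)
    return counted


pairs = [
    ("control", "first lens"),
    ("control", "spot diameter"),
    ("totally reflect", "luminous flux"),
    ("totally reflect", "reflector"),
    ("control", "spot diameter"),  # confirmatory restatement in a later part
]

print(len(count_without_duplicates(pairs)))  # 4
```

Keying on the verb as well as the noun phrase means that the same noun phrase governed by a different verb is still counted, which matches the requirement that both must coincide for the proposition to be the same.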

The example above assumed claims written in the element-enumeration style, but claims written in other styles can be counted in the same way. For a Jepson-type claim, for example, the preamble, which is treated as the known part, is counted by the same method. If it is unclear how to count the preamble, the claim may be rewritten from the Jepson type into the element-enumeration style before counting. Methods for rewriting a Jepson-type claim into the element-enumeration style are well known.

Claims written in a run-on (narrative) style can also be counted by the same method. For example, C-2) written in the run-on style becomes C-4) below.

C-4) "controlling the spot diameter of a luminous flux by a first lens, totally reflecting such luminous flux by a reflector, and ..."

C-4), written in the run-on style, differs from C-2) only in that the terms "luminous flux control means" and "reflecting means" are omitted; otherwise the two are identical. The "luminous flux control means" and the "reflecting means" are subjects and are not themselves counted. Only the noun phrases attached to these subjects are counted, so the number of case components does not differ between the run-on style and the element-enumeration style.

Extraction of case components from an English patent specification

Based on the theory of case components described above, the present invention extracts case components from the claims of an English patent specification converted into electronic data. Extraction follows the process described below. In what follows, "count" always means extraction as a case component.

The process of extracting case components from an English patent specification follows the flowchart shown in FIG. 3. It consists of a character string extraction step S11, a word inclusion step S12, a noun phrase extraction step S13, and a case component extraction step S14.

The operation of each step is described in detail below.

First, in the character string extraction step S11, a character string is extracted from the claims of an English patent specification converted into electronic data. In this step, the extracted character string is also normalized.

正規化処理は、単語内において、文字と文字の間にスペースが空いていた場合、当該スペースを詰めて文字と文字とを連結する処理を行う。またこの正規化処理においては、抽出した文字列中に引用符が付されている場合に、これを削除する処理を行う。また、正規化処理では、カッコ内に1単語もしくは1単語を区切り文字（「,」,「:」,「;」）でつなぐ文字列のみからなる場合、括弧とその中身を削除する処理を行う。更にこの正規化処理においては、抽出した文字列中に「・」が含まれている場合に、これを削除する処理を行う。正規化処理では、全角文字を半角文字に置換する処理を行う。 In the normalization process, when there is a space between characters in a word, the space is filled and the characters are connected to each other. Further, in this normalization process, if a quotation mark is attached to the extracted character string, the process of deleting the quotation mark is performed. In the normalization process, if one word or one word is connected by a delimiter (",", ":", ";") in parentheses, the parentheses and their contents are deleted. .. Further, in this normalization process, if the extracted character string contains "・", a process of deleting the "・" is performed. In the normalization process, full-width characters are replaced with half-width characters.
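
The normalization described above can be sketched roughly as follows. This is a minimal illustration rather than the program of the invention: the function name `normalize_claim_text` and the regular expressions are assumptions, and the closing-up of stray spaces inside a word is left out because it would need a dictionary or a tokenizer.

```python
import re
import unicodedata

# Parentheses holding one word, or single words joined by "," ":" or ";"
# (hypothetical pattern; the specification does not give one).
_PAREN_SINGLE_WORDS = re.compile(r"\s*\(\s*\w+(?:\s*[,:;]\s*\w+)*\s*\)")


def normalize_claim_text(text):
    """Apply the normalization steps of S11 to a claim string (sketch)."""
    # Full-width characters -> half-width equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Delete straight and curly double quotation marks.
    text = re.sub(r'["\u201c\u201d]', "", text)
    # Delete parentheses that contain only single words and delimiters.
    text = _PAREN_SINGLE_WORDS.sub("", text)
    # Delete the middle dot "・".
    text = text.replace("\u30fb", "")
    # Collapse runs of whitespace left behind by the deletions.
    return re.sub(r"\s+", " ", text).strip()
```

Parenthesized text containing several space-separated words, such as "(made of glass)", is left in place, since it does not match the single-word condition.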

In the character string extraction step S11, the title of the invention is also extracted. The noun phrase at the head of the character string is identified as the title of the invention, and both the identified title and any wording of the form "the" or "said" + (title of the invention) + "comprising" are excluded from the case components to be extracted.

For example, if the claim begins with "A data transfer device for transferring the image data input from an image processing device to an electronic device, said data transfer device comprising", then "A data transfer device" is identified as the title of the invention and is excluded from the case components to be extracted. Likewise, the "data transfer device" in "said data transfer device comprising", which has the form said + (title of the invention) + comprising, is excluded from the case components to be extracted. Even when this wording only partly matches the title of the invention, as in "said device comprising", it is excluded in the same way.
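
As a rough illustration of the title exclusion, the sketch below looks for "the/said ... comprising" and checks whether the words in between all come from the title. It assumes the title has already been identified (in the specification this is done by parsing, not shown here); the function name `title_references` and the word-subset test for a "partial match" are assumptions.

```python
import re

_ARTICLES = {"a", "an", "the"}


def title_references(claim, title):
    """Return the phrases in 'the/said <phrase> comprising' that match the
    title of the invention fully or partly (hypothetical helper)."""
    title_words = [w.lower() for w in title.split() if w.lower() not in _ARTICLES]
    hits = []
    pattern = r"\b(?:the|said)\s+((?:\w+\s+)+?)comprising\b"
    for m in re.finditer(pattern, claim, flags=re.IGNORECASE):
        words = m.group(1).lower().split()
        # Partial match: every word of the phrase also occurs in the title.
        if words and all(w in title_words for w in words):
            hits.append(m.group(1).strip())
    return hits
```

With the example claim above and the title "A data transfer device", the helper reports "data transfer device"; with "said device comprising" it reports "device". Both would then be dropped from the case components.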

Step S11 also breaks the sentence into phrases. First, morphological analysis is performed on the extracted character string to obtain the part of speech of each word. Next, syntactic analysis is performed to obtain the tree structure of the character string treated as a single sentence. Because the tree structure differs from one parser to another, acquiring it at this stage is intended to eliminate parser-dependent discrepancies. Phrase formation also includes chunking, which groups the clause structure into phrases.

In step S11, each phrase is also converted into a part of speech and dependency structure that are easy to handle during case component extraction. For this purpose, the target idioms are registered in advance in an idiom master. The extracted character string is compared against the idioms registered in the master; any matching portion is grouped as a single phrase and given an idiom flag. Each flagged phrase is then converted into an easy-to-handle part of speech and dependency structure. Idioms containing verbs are excluded from this processing.

One of the case component extraction rules is that nouns forming part of an English idiom are excluded from the case components to be extracted. For example, "in association with" and "in response to" contain the words "association" and "response", which could be read as nouns, but these are excluded from the case components to be extracted. In practice, such idioms are registered in a database, the extracted character string is checked for any registered idiom, and where one is found it is excluded from the case components to be extracted.

Setting an idiom flag in this way also makes it easy to determine whether an item should be excluded from the case components.
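
The idiom flagging can be pictured with a small lookup over tokens. This is only a sketch: the two-entry `IDIOM_MASTER` and the function `flag_idioms` are hypothetical stand-ins for the idiom master and database mentioned above, whose actual contents are not given.

```python
# Hypothetical idiom master holding the two idioms named in the text.
IDIOM_MASTER = [
    ("in", "association", "with"),
    ("in", "response", "to"),
]


def flag_idioms(tokens):
    """Return one boolean per token; True marks tokens inside a registered idiom."""
    flags = [False] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for idiom in IDIOM_MASTER:
        n = len(idiom)
        for i in range(len(tokens) - n + 1):
            if tuple(lowered[i:i + n]) == idiom:
                flags[i:i + n] = [True] * n
    return flags
```

A noun such as "response" whose flag is set would then be skipped when case components are collected.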

In step S11, verbs that take the fifth sentence pattern and causative verbs may also be flagged to simplify later processing. The fifth sentence pattern is an English sentence of the form subject (S) + verb (V) + object (O) + complement (C); the object and complement may refer to the same thing, or the complement may supplement the meaning of the object.

Step S11 also performs the following processing based on the "adverb rule", one of the case component extraction rules.

Adverb rule
Under the adverb rule, adverbs that depend directly on a verb are not counted, for example the Japanese forms 〜的に ("in a ... manner"), 〜自在に ("freely"), 〜可能に ("so as to be possible"), 〜不能に and 〜不可に ("so as to be impossible"). In English, adverbs such as "temporarily" are not included in the case components. Adverbs often end in "-ly", and they may be identified by extracting such words with text mining techniques. These adverbs are extracted and deleted in the character string extraction step S11. When several adverbs occur in succession separated by connectors such as ",", the connectors are deleted together with the adverbs.
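
A minimal sketch of the adverb deletion follows, using the "-ly" ending as a stand-in for proper part-of-speech tagging. The exception list `NOT_ADVERBS`, the length threshold, and the treatment of "and"/"or" as connectors alongside "," are all assumptions made for illustration.

```python
# Words ending in "ly" that are not adverbs (illustrative, far from complete).
NOT_ADVERBS = {"assembly", "supply", "family", "apply", "reply"}
# Connectors allowed between consecutive adverbs; "and"/"or" are assumed.
CONNECTORS = {",", "and", "or"}


def is_adverb(token):
    t = token.lower()
    return t.endswith("ly") and len(t) > 4 and t not in NOT_ADVERBS


def remove_adverbs(tokens):
    """Drop adverbs, and the connectors linking runs of adverbs, from a token list."""
    out = []
    i = 0
    while i < len(tokens):
        if is_adverb(tokens[i]):
            i += 1
            # Swallow "<connector> <adverb>" pairs that continue the run.
            while (i + 1 < len(tokens) and tokens[i] in CONNECTORS
                   and is_adverb(tokens[i + 1])):
                i += 2
            continue
        out.append(tokens[i])
        i += 1
    return out
```

In practice the suffix test would be replaced by the part-of-speech tags obtained from the morphological analysis in step S11.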

After the character string extraction step S11 is completed, processing proceeds to the word inclusion step S12.

In the word inclusion step S12, each adnominal expression in the character string analyzed in step S11 is grouped into a single phrase. The adnominal expressions include: article + noun ("the data"); noun + noun; adjective + noun; present participle (-ing) + noun; past participle (-ed) + noun; "at least, not less than, any of, either of, any" + quantity expression + plural noun; and "at least, not less than, any of, either of, any" + quantity expression + plural noun + "between, among" + noun group. Adnominal expressions also include numbers, units, parentheses, sequences of arithmetic symbols, and the like.

Such adnominal expressions usually form noun phrases. To extract these noun phrases, the character string is broken down into English words by morphological and syntactic analysis. An individual English word may itself constitute a noun phrase, in which case that word is extracted as a noun phrase. Here, "noun phrase" is not limited to a grammatical phrase; it also covers a noun consisting of a single English word. A noun together with its article, such as "the data", may be extracted as a noun phrase, and grouped words such as noun + noun, for example "the image data" or "an optical disc", may also be extracted as noun phrases.

A noun phrase may contain the -ing form (present participle) of a verb, as in "light receiving areas" or "the management information writing means", or a past participle, as in "a fixed range"; these are likewise extracted as noun phrases using morphological and syntactic analysis.

Extracting noun phrases in this way also accommodates the "adjective rule", one of the case component extraction rules.

Adjective rule
Under the adjective rule, adjectives that depend on a noun, and other modifiers of a noun, are treated as one unit with the noun and are not counted separately. For example, in "a handle with a large ..." and "information that can be input by ...", the modifier depends on the noun, so modifier and noun are treated together as a single case component.

Likewise, extracting "at least, not less than, any of, either of, any" + quantity expression + plural noun, and the same pattern followed by "between, among" + noun group, makes it possible to count an expression such as "any one of ..." as a single case component.

In the word inclusion step S12, the following may also be grouped into one phrase based on the "in case" rule.

The "in case" rule
Suppose there is an adverbial clause of the form "when, if, in case, while, only, providing, whether, in a case when" + noun phrase (S) + verb (V).

The verb (V) in this pattern is a verb inside the adverbial clause and is therefore defined as the adverbial-clause verb. If an object or adverbial phrase attached to the adverbial-clause verb has already been recited earlier in the claim, counting it again would be double counting, so it is not counted as a case component. If it has not been recited earlier, it is counted as a separate case component.

When the adverbial-clause verb (V) is a "be" verb taking a complement, the pattern becomes "when, if, in case, while, only, providing, whether, in a case when" + noun phrase (S) + be verb + complement. In that case, the span from the start of the adverbial clause ("when", etc.) through the complement is counted as one case component.

For a noun clause "that, whether, whether or not" + noun phrase (S) + verb (V) serving as the object of a verb (V), an object or adverbial phrase attached to the verb in the noun clause is likewise not counted as a case component if it has already been recited earlier, since counting it again would be double counting. If it has not been recited earlier, it is counted as a separate case component.

When the verb (V) is a "be" verb taking a complement, the whole phrase "that, ..." + noun phrase (S) + be verb + complement is counted as one case component.

In short, when a subject and verb immediately follow "when" or the like, "when" etc. + subject + verb are grouped into one case component. For example, in "... when the wiper arm is located", the span "when the wiper arm is located" is one case component. If further wording such as "at a predetermined position" follows, each such item is judged individually. For example, from "when the wiper arm is located at a predetermined position and a relative position signal", the phrases "when the wiper arm is located", "predetermined position", and "a relative position signal" are each extracted as case components. Similarly, from "when the management information writing means writes updated management information to a page of the flash memory", "when the management information writing means writes" is extracted as one case component, and "updated management information" and "a page of the flash memory" are each extracted as case components.
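
The grouping of "when" etc. + subject + verb can be sketched over text that has already been chunked and tagged. The chunk tags ("CONJ", "NP", "VP", "PREP") and the function `extract_components` are assumptions for illustration; the check against items already recited earlier in the claim is not included.

```python
SUBORDINATORS = {"when", "if", "in case", "while", "only", "providing",
                 "whether", "in a case when"}


def extract_components(chunks):
    """chunks: list of (text, tag) pairs. Returns the case components (sketch)."""
    components = []
    i = 0
    while i < len(chunks):
        text, tag = chunks[i]
        if (tag == "CONJ" and text.lower() in SUBORDINATORS
                and i + 2 < len(chunks)
                and chunks[i + 1][1] == "NP" and chunks[i + 2][1] == "VP"):
            # "when" etc. + subject + verb form a single case component.
            components.append(" ".join(c[0] for c in chunks[i:i + 3]))
            i += 3
        elif tag == "NP":
            # Remaining noun phrases are judged individually.
            components.append(text)
            i += 1
        else:
            i += 1
    return components
```

Applied to the flash memory example, the sketch yields the three components named in the text.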

In the word inclusion step S12, the following phrases may also be grouped and counted as one case component based on the "morpheme rule" below.

Morpheme rule
Under the morpheme rule, Japanese expressions such as "B pertaining to A", "B relating to A", "B in A", "B that becomes A", "B that is A", "B at A", "B among A", and "B as A" are counted with A and B together as one case component, and the same applies when they are written in English.

・noun phrase + noun-linking preposition (of, in, for, or, within, about) + noun phrase
・noun phrase + noun-linking preposition (of, in, for, or, within, about) + noun phrase + noun-linking preposition (of, in, for, or, within, about) + noun phrase
・noun phrase + "between, among, from between, from among" + noun phrase whose noun is plural
However, when the preposition belongs to a verb idiom, or when the noun phrase following "in" or "for" is "present participle (-ing) + noun phrase" or "past participle (-ed) + noun phrase", the expression is excluded from this counting rule.
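
The first two patterns and the participle exception can be illustrated as below. This is a sketch under several assumptions: the input is already chunked and tagged, the function names are hypothetical, participles are guessed from the "-ing"/"-ed" ending of the first word rather than from part-of-speech tags, and the verb-idiom exception and the "between/among" pattern are omitted.

```python
# Linking words as listed in the rule above.
LINKING = {"of", "in", "for", "or", "within", "about"}


def starts_with_participle(noun_phrase):
    # Crude suffix test; a real implementation would use part-of-speech tags.
    first = noun_phrase.split()[0].lower()
    return first.endswith("ing") or first.endswith("ed")


def merge_linked(chunks):
    """Merge NP + linking word + NP (at most two links) into one NP chunk."""
    out = []
    i = 0
    while i < len(chunks):
        text, tag = chunks[i]
        if tag == "NP":
            links = 0
            while (links < 2 and i + 2 < len(chunks)
                   and chunks[i + 1][1] == "PREP"
                   and chunks[i + 1][0].lower() in LINKING
                   and chunks[i + 2][1] == "NP"
                   and not (chunks[i + 1][0].lower() in {"in", "for"}
                            and starts_with_participle(chunks[i + 2][0]))):
                text = f"{text} {chunks[i + 1][0]} {chunks[i + 2][0]}"
                i += 2
                links += 1
        out.append((text, tag))
        i += 1
    return out
```

For "a page of the flash memory for storing data", the sketch merges "a page of the flash memory" but leaves "for storing data" unmerged because of the participle exception.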

Processing then proceeds to the noun phrase extraction step S13. In step S13, on the premise that each adnominal expression has been grouped into one phrase in the character string extraction step S11, these phrases are extracted and each is counted as one case component.

Processing then proceeds to the case component extraction step S14, in which case components are extracted according to the rules described below.

First, the wording regarded as the title of the invention is excluded from the case components, as is the "the/said" + "title of the invention or part of it" portion of any wording of the form "the/said" + "title of the invention or part of it" + comprising.

In step S14, case components are also extracted according to the following double definition rule.

Double definition rule
When a noun phrase attached to a verb is duplicated, that noun phrase is excluded from the case components. This follows the double definition rule among the case component extraction rules.

Under the double definition rule, the following operations are performed.
1) Verbs in the passive voice are extracted one after another, because such a verb may represent a double definition.
2) For each extracted passive verb, it is determined whether its active form (including the infinitive), progressive form, or passive form has already been recited earlier in the claim.
3) If so, the noun phrase that depends on that earlier active, progressive, or passive form, or the noun phrase indicating the subject on which that form depends, is identified.
4) It is determined whether the noun phrase identified in 3) and the noun phrase attached to the extracted passive verb through "by" or "in" are at least partly identical.
5) If they are judged in 4) to be at least partly identical, the noun phrase attached to the extracted passive verb is excluded from the case components to be extracted. If they are judged not to be even partly identical, that noun phrase is included in the case components to be extracted.

This processing is explained using a specific example.
For example,
"a graphic memory for storing the image data;
a control circuit for managing the image data stored in said graphic memory,"
reads in Japanese as "an image memory that stores image data, and a control circuit that controls the image data stored in said image memory,".
Here, "stored in said image memory" is an operation corresponding to a proposition already established earlier in the claim. If it were also extracted as a case component, the same proposition would be extracted twice. "Stored in said image memory" is therefore excluded from the case components to be extracted.

The same processing is carried out on English claims.

First, in 1), the passive verb "stored" is extracted. Next, in 2), it is determined whether the active, progressive, or passive form of the extracted verb is recited earlier. In this example, "storing", the progressive form of "store", is found.

Processing then moves to 3). Because "storing" was found, the noun phrase that depends on it, or the noun phrase indicating the subject on which it depends, is identified. In this example, "a graphic memory" and "the image data" are identified.

Processing then moves to 4), where it is determined whether the identified noun phrases "a graphic memory" and "the image data" are at least partly identical to the noun phrase "said graphic memory", which is attached to the extracted passive verb "stored" through "by" or "in". In this example, "graphic memory" is common to both, so they are judged to be at least partly identical.

Processing then moves to 5). Because the noun phrases were judged in 4) to be at least partly identical, the noun phrase "said graphic memory" attached to the extracted passive verb is excluded from the case components to be extracted.

If, on the other hand, they are judged in 4) not to be even partly identical, the noun phrase attached to the extracted passive verb is included in the case components to be extracted.

Likewise, if in 2) none of the active, progressive, or passive forms of "store" is recited earlier, or if in 4) no noun phrase is at least partly identical to "said graphic memory", the noun phrase attached to "stored" through "by" or "in", then "said graphic memory" is included in the case components to be extracted.

When the following claim wording is processed under the double definition rule, "store", the active form of "stored", is found; because "the memory unit" following "stored in" is at least partly identical to "memory unit operable", "the memory unit" following "stored in" is excluded from the case components to be extracted.
"a memory unit operable to store a program composed of a plurality of instructions; and a processor operable to fetch each instruction in turn from the program stored in the memory unit,"

The processes 1) to 5) need not be performed in this order; their order may be partly changed, and some of them may be performed simultaneously.
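
Steps 1) to 5) can be sketched as follows, assuming a parser has already produced one record per verb occurrence in claim order. The `VerbUse` record, its field names, and the containment test used for "at least partly identical" are assumptions made for this illustration, not the specification's own data model.

```python
from dataclasses import dataclass, field

DETERMINERS = {"a", "an", "the", "said", "each", "both"}


@dataclass
class VerbUse:
    lemma: str          # base form, e.g. "store"
    voice: str          # "active", "progressive" or "passive"
    related_nps: list   # noun phrases the verb governs, or its subject
    by_in_nps: list = field(default_factory=list)  # NPs attached via "by"/"in"


def _strip_det(noun_phrase):
    return " ".join(w for w in noun_phrase.lower().split() if w not in DETERMINERS)


def partially_same(a, b):
    """True if one phrase, minus determiners, is contained in the other."""
    a, b = _strip_det(a), _strip_det(b)
    if not a or not b:
        return False
    return f" {a} " in f" {b} " or f" {b} " in f" {a} "


def excluded_noun_phrases(uses):
    excluded = []
    for idx, use in enumerate(uses):
        if use.voice != "passive":                                   # step 1
            continue
        earlier = [u for u in uses[:idx] if u.lemma == use.lemma]    # step 2
        earlier_nps = [np for u in earlier for np in u.related_nps]  # step 3
        for np in use.by_in_nps:                                     # step 4
            if any(partially_same(np, e) for e in earlier_nps):
                excluded.append(np)                                  # step 5
    return excluded
```

For the graphic memory example, the passive "stored" finds the earlier "storing" with "a graphic memory", so "said graphic memory" is reported for exclusion; without an earlier use of "store" nothing is excluded.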

The double definition rule is applied in the same way in the following cases.

Suppose there is a recitation of a verb, gerund, to-infinitive, present participle, or past participle that depends on a noun phrase or on which a noun phrase depends. If, later in the claim, a verb, gerund, to-infinitive, present participle, or past participle with the same meaning depends on, or is depended on by, the same noun phrase, the later occurrence of that noun phrase may be excluded from the case components to be extracted.

For example, as shown below, suppose that in the element "map generation means" defined in the preceding stage, the noun phrase "a map" depends on the present participle "generating". The same noun phrase "map" appears again in the following stage, where "generated" depends on it.
"map generation means for generating a map ... the map generated by the map generation means ..."

In this case, the present participle on which the noun phrase depends in the preceding stage and the past participle that depends on the same noun in the following stage have the same meaning, so the noun phrase "map" recited in the following stage is excluded from the case components to be extracted.

Suppose further that the preceding stage and the following stage each recite a [noun phrase] that modifies a verb or a [noun phrase] that is modified by a verb. If these noun phrases in the preceding and following stages match, the one recited in the following stage is excluded from the case components to be counted.

Likewise, when the preceding and following stages each recite "that", "which", or "wherein" + noun phrase (S) + verb or gerund, and these match, the "that", "which", or "wherein" + noun phrase (S) + verb or gerund recited in the following stage is excluded from the case components to be counted. Alternatively, the identity of noun phrase (S) + verb or gerund alone may be checked, and if identical, the noun phrase (S) + verb or gerund recited in the following stage may be excluded from the case components to be counted.

When O and C of the fifth sentence pattern are recited in both the preceding and following stages and the combination of O and C matches between them, the combination recited in the following stage may be excluded from the case components to be counted. A case in which O and C are recited in a subject-predicate relationship in the preceding or following stage may also be included in the identity judgment.

The judgment of whether the preceding and following stages match may take into account whether the verb is active or passive, or, conversely, may disregard voice and treat the two as identical as long as the verb is the same. That is, the identity of the verbs may be judged solely on the verb itself, without regard to whether the voice is shared. When a relative pronoun is the subject, the noun phrase it refers to may be treated as the subject for the identity judgment. When two expressions differ only in a, the, each, both, or said, they may be judged identical. When "configured to + verb phrase" modifies a noun phrase from behind, the identity may be judged as if the past participle of the verb modified the noun phrase from behind. Verbs whose base form is "comprise" may be excluded from the identity judgment.
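
The optional identity criteria can be sketched as a comparison of (noun phrase, verb base form, voice) triples. The function `same_pair`, the `compare_voice` switch, and the reading of "excluded from the identity judgment" as "never treated as a duplicate" are assumptions; "an" is added to the determiner list for convenience and is not named in the text.

```python
# a, the, each, both, said are listed in the text; "an" is an assumed addition.
DETERMINERS = {"a", "an", "the", "each", "both", "said"}


def np_key(noun_phrase):
    """Noun phrase reduced to its words other than determiners."""
    return tuple(w for w in noun_phrase.lower().split() if w not in DETERMINERS)


def same_pair(first, second, compare_voice=False):
    """first/second: (noun_phrase, verb_lemma, voice). True if judged identical."""
    np1, lemma1, voice1 = first
    np2, lemma2, voice2 = second
    if "comprise" in (lemma1, lemma2):
        return False  # "comprise" is kept out of the identity judgment
    if np_key(np1) != np_key(np2) or lemma1 != lemma2:
        return False
    # Voice is compared only when the stricter variant is chosen.
    return voice1 == voice2 if compare_voice else True
```

With the map example, "a map" + generate (active) and "the map" + generate (passive) are judged identical by default, and non-identical when `compare_voice=True`.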

Suppose also that a sentence is modified by a conditional clause consisting of "when, if, in case, in the case, while, only, providing, whether" + noun phrase (S) + verb (V). If no conditional clause exactly matching that clause appears earlier in the text, a case component that is duplicated between the preceding stage and the following stage of the modified sentence may be counted rather than excluded from the count. In other words, when the character strings of the preceding stage and the following stage that are subject to the match determination are each modified by a conditional clause consisting of any one of when, if, in case, in the case, while, only, providing, or whether + noun phrase (S) + verb (V), and the modifying conditional clauses differ between the two stages, the two are judged to be non-identical. As an example, suppose the following:
Preceding stage: noun phrase A (identity determination target) + verb B when noun phrase P + verb Q
Following stage: noun phrase A (identity determination target) + verb B when noun phrase P + verb R
In this case, the conditional clauses consisting of "when, if, in case, in the case, while, only, providing, whether" + noun phrase (S) + verb (V) differ between the preceding stage and the following stage, so the two are judged to be non-identical. On the other hand, if the when clause is common to both stages, the identity determination targets of the preceding and following stages may be judged to be identical.
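The A/B/P/Q/R example can be expressed as a small comparison. The sketch below is only illustrative: the `Statement` record and its fields are assumptions, and the conditional clause is represented as a pre-parsed (marker, noun phrase, verb) tuple.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Statement:
    """Noun phrase + verb, optionally modified by a conditional clause (hypothetical record)."""
    noun_phrase: str
    verb: str
    # (marker, noun phrase S, verb V), e.g. ("when", "P", "Q"); None if there is no clause.
    condition: Optional[Tuple[str, str, str]] = None


def is_duplicate(preceding: Statement, following: Statement) -> bool:
    """The following statement duplicates the preceding one only if the
    noun phrase, the verb, and the conditional clause all agree."""
    same_core = (preceding.noun_phrase, preceding.verb) == (following.noun_phrase, following.verb)
    return same_core and preceding.condition == following.condition
```

For "A + B when P + Q" followed by "A + B when P + R" the function returns `False`, so the following stage is still counted. When both stages carry "when P + Q" it returns `True`.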

In determining the degree of match between the preceding stage and the following stage described above, when the verbs are the same and only the auxiliary verbs differ, the two may be judged to be identical, or they may be judged to be different.

Subject rule
In the case component extraction step S14, case components may also be extracted based on the subject rule described below. Under this rule, when the extracted character string contains a relationship between a subject and a verb or gerund in a dependency relation with it, and wording corresponding to that subject has already been described earlier in the text, that wording is excluded from the case components to be extracted. When wording corresponding to the subject has not yet been described earlier in the text, that wording is included in the case components to be extracted.

For example, in the case of that (which/wherein) + subject + verb, the subject is judged according to the rule described above.

For example, in the case of "...are lowered by the drive machine, wherein the drive machine is arranged a rotation axis", the subject "the drive machine" has already appeared in the preceding stage. Therefore, "the drive machine" is excluded from the case components. If this subject were instead written as "a drive machine" or the like, and the same noun phrase had not been described earlier in the text, "a drive machine" would be included in the case components.

Under this subject rule, the same processing may also be performed in the case of that (which/wherein) + subject + gerund.

When processing is actually performed based on the subject rule, it may follow the processing flow below.
1) First, extract the subject.
2) Sort and check whether the extracted subject has been described earlier. If there is earlier wording that matches the extracted subject 100%, or that partially contains it, proceed to 3). Otherwise, the extracted subject is appearing for the first time, so it is identified as a case component.
3) When the flow proceeds from 2) to 3), the subject is not counted as a case component.
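Steps 1) to 3) can be sketched as a simple lookup against the text that precedes the subject. This is a rough illustration under stated assumptions: the function name is hypothetical, the subject is assumed to have been extracted already (step 1), and "matches or contains" is approximated by a case-insensitive substring test, without the determiner handling discussed elsewhere in the description.

```python
def subject_is_case_component(subject: str, preceding_text: str) -> bool:
    """Step 2: check whether the subject already appears in the preceding text.

    Returns True when the subject appears for the first time (it is counted),
    and False when it was already described (step 3: it is not counted).
    """
    already_described = subject.lower() in preceding_text.lower()
    return not already_described


# The example from the description: the subject of the wherein clause
# repeats a noun phrase from the preceding stage.
preceding = "are lowered by the drive machine,"
print(subject_is_case_component("the drive machine", preceding))
print(subject_is_case_component("a rotation axis", preceding))
```

The first call reports that "the drive machine" is not counted, because it already appears in the preceding stage. The second reports that "a rotation axis" is counted.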

That is, it is first determined whether the extracted character string contains a subject-verb relationship. If it does, it is checked whether wording corresponding to the subject has already been described earlier in the text. If such wording has already been described, it is excluded from the case components to be extracted. If it has not yet been described, it is appearing for the first time and is therefore included in the case components to be extracted.

For claims that contain many subjects, this eliminates the unreasonable result in which the number of case components grows simply because the number of subjects is large.

Specifically, the subject rule may be applied according to the following algorithm.

It is determined whether the extracted character string contains a relationship between a subject and a [verb / gerund / to-verb] in a dependency relation with it. If it does, it is determined whether wording corresponding to the subject has already been described earlier in the text. If it has, the subject is excluded from the case components to be extracted. In determining whether wording matching the subject has been described earlier, the match may include not only an exact match but also a partial match.

A partial match here may be taken to include the case in which only a, the, each, both, or said differs. In addition, when each, both, said, or the stands at the beginning of the noun phrase serving as the subject, and there exists a noun phrase that contains the noun phrase with that word removed (searched by suffix match), this may be regarded as a partial match. When noun phrases in a parallel relationship with the subject noun phrase are connected to it by "and" or ",", the parallel noun phrases may be treated together as the subject.
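The suffix-match variant of the partial match can be sketched as follows. The helper names are hypothetical, and the earlier noun phrases are assumed to be available as a list produced by a prior extraction pass.

```python
# Leading words that are dropped before the suffix search, per the text above.
LEADING_WORDS = ("each", "both", "said", "the")


def strip_leading_word(noun_phrase: str) -> str:
    """Remove one leading each/both/said/the, if present."""
    words = noun_phrase.lower().split()
    if words and words[0] in LEADING_WORDS:
        words = words[1:]
    return " ".join(words)


def partially_matches(subject: str, earlier_noun_phrases: list) -> bool:
    """True when some earlier noun phrase ends with the subject's core words."""
    core = strip_leading_word(subject)
    if not core:
        return False
    for earlier in earlier_noun_phrases:
        candidate = earlier.lower()
        # Suffix match on word boundaries: equal, or ending in " <core>".
        if candidate == core or candidate.endswith(" " + core):
            return True
    return False
```

For instance, "the drive machine" partially matches an earlier "a first drive machine", whereas "said rotation axis" does not match when only "a drive machine" has been described.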

As an alternative to the relationship between a subject and a [verb / gerund / to-verb] in a dependency relation with it, the subject rule may likewise be applied to the pattern "that", "which", or "wherein" + noun phrase (S) + verb or gerund. The noun phrase (S) following "that", "which", or "wherein" is in many cases the subject of the verb or gerund in a dependency relation with it, and here too it is determined whether wording matching the subject has been described earlier in the text.

Component rule
In the case component extraction step S14, case components may also be extracted based on the "component rule". Under this rule, the underlined components written in the requirement enumeration format shown below are not counted toward the number of case components. The adjustment prevents a difference in the number of case components from arising between the requirement enumeration format and a free-flowing format that does not list the components.

- Requirement enumeration format
"A device characterized by comprising:
~A;
~B disposed on the above A;
C, connected to the above B and consisting of ~; and
D attached to the above C."

In the case of an English patent specification, when the claim text has the following structure, the leading noun phrase of each clause following comprising (comprise), including (include), consisting of (consist of), or characterized by is excluded from the case components.

Similarly, for the pattern noun phrase + comprising (comprise), including (include), consisting of (consist of), or characterized by (+ noun phrase), each of these noun phrases may be processed so that it is not counted toward the number of case components.

For a noun phrase judged under the component rule to be excluded from the case components, when noun phrases in a parallel relationship with it are connected by "and" or ",", the parallel noun phrases may be treated together as components and processed so that they are not counted as case components.
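One way to picture the component rule is as a filter over the noun phrases of each enumerated element. The sketch below is only an assumption-laden illustration: the function name is hypothetical, and each element following the transitional phrase is assumed to have already been parsed into an ordered list of noun phrases. It applies the component rule alone, without the other exclusion rules.

```python
def apply_component_rule(elements: list) -> list:
    """Drop the leading noun phrase of each enumerated element.

    `elements` holds one list of noun phrases per element that follows
    "comprising", "including", "consisting of", or "characterized by",
    in the order the noun phrases appear in the text.
    """
    counted = []
    for noun_phrases in elements:
        # The first noun phrase names the component itself and is skipped.
        counted.extend(noun_phrases[1:])
    return counted
```

For a claim body such as "a base; a lever disposed on the base; and a spring connected to the lever", the leading phrases "a base", "a lever", and "a spring" are dropped, and only the remaining noun phrases go on to the other rules.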

Others
In the case component extraction step S14, the following rules may additionally be adopted on a case-by-case basis when counting case components.

In the case of noun phrase + "to be + passive verb", the "to be + passive verb" part may be excluded from the case components. "each other" may also be excluded from the case components. "the step of" following "comprising" may be excluded from the case components. When "in a ~ manner" is present, "a ~ manner" may be excluded from the case components. When "as a result of ~" is present, "as a result of ~" may be excluded from the case components. When "in part" forms an adverbial phrase by itself (with no following phrase attached to "part"), "part" may be excluded from the case components. When "in response to ~" is present, "response" may be excluded from the case components.

Among the adjectives modifying the head noun of a noun phrase subject to case component extraction, several adjectives may stand in a parallel relationship, that is, be connected by "and". In that case, the number of case components is counted according to the number of adjectives. That is, when two adjectives, A and B, modify a noun, a total of two case components are counted: one consisting of the noun + adjective A and one consisting of the noun + adjective B.
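The counting of parallel adjectives amounts to expanding one noun phrase into one case component per adjective. A minimal sketch, with a hypothetical function name and with the noun and its adjectives assumed to be already identified by a parser:

```python
def expand_parallel_adjectives(noun: str, adjectives: list) -> list:
    """Return one case component per adjective modifying the noun.

    A noun with no adjectives still yields a single case component.
    """
    if not adjectives:
        return [noun]
    return [f"{adjective} {noun}" for adjective in adjectives]


# "a rigid and lightweight frame": adjectives "rigid" and "lightweight" modify "frame".
components = expand_parallel_adjectives("frame", ["rigid", "lightweight"])
print(len(components), components)
```

Here the two adjectives give two case components, "rigid frame" and "lightweight frame".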

Method of carrying out the present invention
The present invention is carried out using a personal computer (PC), based on a program loaded into it.

First, from an English patent specification converted into electronic data, the character string described in a specific section of the specification (the claims) is extracted. This specific section corresponds to one claim of the claims.

Next, case components are extracted from the extracted character string based on the rules described above. The number of case components may then be obtained from the extracted case components. The number of case components means the number of case components in one claim. For a main claim, the number of case components may be counted directly from the character string extracted from the claim in which that main claim is defined. For a subclaim, the number is counted from the character string extracted from the claim in which the subclaim is defined, and the number of case components of the main claim on which the subclaim depends is added to it.
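The addition for subclaims can be sketched as below. The function and parameter names are hypothetical, the per-claim counts are made-up illustrative numbers, and the mapping is assumed to point each subclaim directly at its main claim, as the paragraph above describes.

```python
def total_case_components(claim_no: int, own_counts: dict, main_claim_of: dict) -> int:
    """Number of case components for a claim, including its main claim's count.

    own_counts:    claim number -> case components counted in that claim's own text
    main_claim_of: subclaim number -> number of the main claim it depends on
    """
    total = own_counts[claim_no]
    main_claim = main_claim_of.get(claim_no)
    if main_claim is not None:
        # A subclaim also carries the limitations of its main claim.
        total += own_counts[main_claim]
    return total


# Illustrative numbers only: claim 1 is a main claim, claim 2 depends on it.
own_counts = {1: 12, 2: 3}
main_claim_of = {2: 1}
print(total_case_components(1, own_counts, main_claim_of))
print(total_case_components(2, own_counts, main_claim_of))
```

With these numbers the main claim yields 12 and the subclaim yields 3 + 12 = 15.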

The counted number of case components is then displayed, for example on a display, as the degree of limitation of the invention.

The present invention may be embodied as a device such as a PC on which a program for performing the above analysis is installed, as a program for causing a PC to execute the analysis, or as a recording medium on which that program is recorded.

The present invention may also be applied in a network system. First, on the server side, data in which the number of case components has been computed in advance for patent specifications is stored on the server. Then, when the client side requests the number of case components for a desired patent, the number of case components for that patent may be read from the server and sent to the client side via the network.

When the number of case components has not yet been obtained for the patent requested by the client side, the server may count the number of case components for the requested patent and send it to the client side via the network.
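The server-side behaviour (return a stored count, or count on demand when none is stored) can be sketched as follows. The class name, the in-memory dictionary, and the `counter` callable are all assumptions for illustration. Storing the freshly computed count for later requests is also an assumption, since the description only says that the server counts and sends it.

```python
from typing import Callable, Dict


class CaseCountServer:
    """Hypothetical server-side lookup of per-patent case component counts."""

    def __init__(self, counter: Callable[[str], int]) -> None:
        self._counter = counter            # computes the count for a patent number
        self._stored: Dict[str, int] = {}  # counts computed in advance or on demand

    def store(self, patent_no: str, count: int) -> None:
        """Register a count that was computed in advance."""
        self._stored[patent_no] = count

    def handle_request(self, patent_no: str) -> int:
        """Return the stored count, counting first if none exists yet."""
        if patent_no not in self._stored:
            # Assumption: keep the new result so later requests can reuse it.
            self._stored[patent_no] = self._counter(patent_no)
        return self._stored[patent_no]
```

A client request for a patent whose count was stored in advance is answered from the store. Any other request triggers the counting routine first.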

Any existing text mining, data mining, or language analysis technology may be used to extract the case components. Using these technologies, and based on the rules described above, it is determined for each character string under analysis whether it corresponds to a case component, and the case components for one claim are finally identified. The total number of identified case components for that claim is then obtained and output as the number of case components of the claim.

In the present invention, in addition to simply outputting the number of case components for each claim, the output data may be attached to any kind of information, such as patent maps, graphs, and other evaluation values.

In the present invention, the case components identified for each claim may also be displayed on a screen, printed, or converted into data. That is, as shown in the following embodiment, the case components identified for a claim may be underlined, highlighted, or otherwise marked so that they can be grasped visually.
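As a plain-text stand-in for underlining or highlighting, the identified case components can be wrapped in markers. This is a minimal sketch: the function name and the `[[ ]]` markers are arbitrary choices, and the list of case components is assumed to come from the extraction step.

```python
import re


def mark_case_components(text: str, components: list, left: str = "[[", right: str = "]]") -> str:
    """Wrap every occurrence of each case component in the given markers."""
    if not components:
        return text
    # Try longer phrases first so a short phrase does not split a longer one.
    ordered = sorted(components, key=len, reverse=True)
    pattern = "|".join(re.escape(component) for component in ordered)
    return re.sub(pattern, lambda match: f"{left}{match.group(0)}{right}", text)


claim = "a drive machine having a rotation axis"
print(mark_case_components(claim, ["drive machine", "rotation axis"]))
```

The printed line is `a [[drive machine]] having a [[rotation axis]]`. In a real display the markers would be replaced by whatever underline or highlight mechanism the output medium supports.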

As an alternative to the number of case components described above, an evaluation value based on the number of case components may be obtained and then output, stored, or transmitted.

The present invention is not limited to extracting case components from the claims of an English patent specification. It can also be applied, using the same method, to extracting case components from any other English document.

Claims

A program for extracting case components from an English patent specification, the program causing a computer to execute: a character string extraction step of extracting a character string from the description of the claims in an English patent specification converted into electronic data; and
a case component extraction step of extracting noun phrases from the character string extracted in the character string extraction step and extracting case components from the extracted noun phrases,
wherein, in the case component extraction step, when a verb, gerund, to-verb, present participle, or past participle is described in a dependency relation with a noun phrase, and the verb, gerund, to-verb, present participle, or past participle in a dependency relation with the same noun phrase described in a following stage has the same meaning, the same noun phrase described in the following stage is excluded from the case components to be extracted.

A program for extracting case components from an English patent specification, the program causing a computer to execute: a character string extraction step of extracting a character string from the description of the claims in an English patent specification converted into electronic data; and
a case component extraction step of extracting noun phrases from the character string extracted in the character string extraction step and extracting case components from the extracted noun phrases,
wherein, in the case component extraction step, when character strings corresponding to the object and the complement of the fifth sentence pattern are described in both a preceding stage and a following stage and match each other, the combination of the character strings corresponding to the object and the complement described in the following stage is excluded from the case components to be extracted.

The program for extracting case components from an English patent specification according to claim 1, wherein, in the case component extraction step, when the character strings corresponding to the object and the complement are described in a subject-predicate relationship in the preceding stage or the following stage and match each other, the combination of the character strings corresponding to the object and the complement described in the following stage is excluded from the case components to be extracted.

The program for extracting case components from an English patent specification according to any one of claims 1 to 3, wherein, in the case component extraction step, in determining whether they match each other, the identity of the verbs is determined based only on the verbs being the same, without determining whether they share the active or passive voice.

The program for extracting case components from an English patent specification according to any one of claims 1 to 3, wherein, in the case component extraction step, in determining whether they match each other, nouns or noun phrases that differ only in a, the, each, both, or said are determined to be identical.

The program for extracting case components from an English patent specification according to any one of claims 1 to 3, wherein, in the case component extraction step, in determining whether they match each other, when "configured to + verb phrase" modifies a noun phrase from behind, identity is determined by treating the past participle of the verb as modifying the noun phrase from behind.

The program for extracting case components from an English patent specification according to any one of claims 1 to 3, wherein, in the case component extraction step, when the character strings of the preceding stage and the following stage that are subject to the match determination are each modified by a conditional clause consisting of any one of when, if, in case, in the case, while, only, providing, or whether + a noun phrase + a verb, and the modifying conditional clause differs between the preceding stage and the following stage, the character strings are determined to be non-identical.

The program for extracting case components from an English patent specification according to any one of claims 1 to 3, wherein, in the case component extraction step, in determining whether they match each other, the identity of the verbs is determined based only on the verbs being the same, without determining whether they share the same auxiliary verb.