WO2008053949A1

WO2008053949A1 - Document group analysis device

Info

Publication number: WO2008053949A1
Application number: PCT/JP2007/071282
Authority: WO
Inventors: Hiroaki Masuyama; Norio Araki; Kazumi Hasuko; Toshiro Ohsaki
Original assignee: Intellectual Property Bank Corp.
Priority date: 2006-11-01
Filing date: 2007-11-01
Publication date: 2008-05-08
Also published as: JPWO2008053949A1

Abstract

A document analysis device includes: a text data acquisition unit (101) for acquiring a plurality of technical documents; a document vector acquisition unit (102) for acquiring a weighting amount of each index word for each acquired technical document; a factor calculation unit (103) which performs factor analysis treating the acquired technical documents as subjects and the index words as observed variables, using the weighting amounts of the respective index words, calculates a factor loading for each factor for each of the index words, and calculates a factor score for each factor for each of the technical documents; and an attribution factor determination unit (104) which determines the attribution factor of each index word by using the factor loading of that index word and determines the attribution factor of each technical document by using the factor score of that technical document. This enables appropriate analysis of what kinds of concepts or characteristics are contained in a document or document group.

Description

Specification

Document group analyzer

Technical field

[0001] The present invention relates to a technique for analyzing technical literature, and more particularly, to a technique for analyzing what kind of concept or characteristic exists in a plurality of documents.

Background art

[0002] One known approach to analyzing the contents of a plurality of technical documents is to classify the technical documents into a plurality of clusters. For example, Japanese Patent Laid-Open No. 2005-92443 (Patent Document 1) assigns a weight to each word obtained by morphological analysis of the retrieved technical documents, vectorizes each technical document, and groups technical documents whose vectors point in similar directions into the same cluster. Important words are then extracted for each cluster.

Patent Document 1: JP 2005-92443

Disclosure of the invention

Problems to be solved by the invention

[0003] However, with such classification based on conventional cluster analysis, it may be difficult to accurately grasp the concepts or features of the technical field represented by the plurality of technical documents to be analyzed. For example, in Japanese Patent Laid-Open No. 2005-92443 (Patent Document 1), classification is based on the closeness of vector directions: documents fall into the same cluster if their vector directions are closer than a certain threshold and into different clusters otherwise, so the characteristics of each cluster are not always clear. Even when important words are extracted for each cluster, the extracted words may be similar across clusters, making it difficult to grasp the differences between clusters and calling the validity of the classification itself into question.

In other words, the method disclosed in Patent Document 1 does not take into consideration the analysis of the characteristics of the classified clusters by the analyst. Therefore, if an analyst tries to grasp the characteristics of the classified cluster, the analyst is forced to read each technical document belonging to the cluster, and the analysis takes a long time.

The present invention has been made in view of the above circumstances, and its object is to enable an analyst, when analyzing technical documents, to understand what concepts or features the technical field represented by a plurality of technical documents has, while maintaining objectivity.

Means for solving the problem

[0005] (1) In order to solve the above-described problem, one aspect of the present invention is applied to a document group analysis apparatus that analyzes text data.

The document group analysis device comprises: text data acquisition means for acquiring a plurality of technical documents represented by text data; weighting amount calculation means for calculating a weighting amount of each index word for each of the acquired technical documents; calculation means for performing factor analysis with the acquired technical documents as subjects and the index words as observed variables, using the calculated weighting amounts of the index words, to calculate a factor loading for each factor for each index word and a factor score for each factor for each technical document; attribution factor determination means for determining the attribution factor of each index word using the factor loading of that index word and determining the attribution factor of each technical document using the factor score of that technical document; and output means for outputting, for each factor, the index word or index word group belonging to that factor together with data of the technical document or technical document group belonging to the same factor.

[0006] According to one aspect of the present invention, index words and technical documents can each be attributed to a factor. From the index words, the characteristics or concepts of the technical field can therefore be grasped through language information that an analyst can understand. From the technical documents, the characteristics or concepts of the technical field can be grasped from specific tendencies in the bibliographic information included in each technical document. Note that although the present invention uses the factor analysis technique, it does not require the analyst's subjective or arbitrary judgment, such as the “interpretation of factors” needed in ordinary factor analysis. This is because each index word is used as an observed variable. In other words, since the observed variables themselves represent content, once the observed variables belonging to a factor are determined from the factor loadings, the content of each factor can be shown directly by those observed variables.

[0007] (2) In the document group analysis apparatus, the attribution factor determination means may, for each index word, use the calculated factor loadings to select the factor having the maximum factor loading and specify the selected factor as the attribution factor of that index word, and may, for each technical document, use the calculated factor scores to select the factor having the maximum factor score and specify the selected factor as the attribution factor of that technical document.

[0008] By assigning each index word or each technical document to the factor most relevant to it, each factor can be explained by the index words and documents that describe it best. As a result, it becomes possible to grasp the features and concepts of the technical field more clearly.
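The maximum-value rule described in (2) can be sketched as follows. This is only an illustrative sketch: the function name `attribution_factor`, the index words, the document names, and all numeric values are hypothetical and do not come from the specification.

```python
from typing import Dict, List

def attribution_factor(values: List[float]) -> int:
    """Return the 0-based index of the factor with the largest value."""
    return max(range(len(values)), key=lambda k: values[k])

# Hypothetical factor loadings: one row per index word, one column per factor.
loadings: Dict[str, List[float]] = {
    "battery": [0.82, 0.10, -0.05],
    "electrode": [0.64, 0.31, 0.02],
    "antenna": [0.05, 0.77, 0.12],
}
# Hypothetical factor scores: one row per technical document.
scores: Dict[str, List[float]] = {
    "doc1": [1.4, -0.2, 0.1],
    "doc2": [-0.3, 0.9, 0.5],
}

# Each index word goes to the factor with its maximum loading,
# each document to the factor with its maximum score.
word_factor = {w: attribution_factor(a) for w, a in loadings.items()}
doc_factor = {d: attribution_factor(f) for d, f in scores.items()}

# Collect index words and documents under the factor they belong to.
by_factor: Dict[int, Dict[str, List[str]]] = {}
for w, k in word_factor.items():
    by_factor.setdefault(k, {"words": [], "docs": []})["words"].append(w)
for d, k in doc_factor.items():
    by_factor.setdefault(k, {"words": [], "docs": []})["docs"].append(d)
```

Grouping the results per factor mirrors the output described in (1), where index words are output together with the documents attributed to the same factor.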

[0009] (3) In the document group analysis apparatus,

The apparatus may further include important index word extraction means that obtains the frequency of occurrence of the index words included in the plurality of documents, calculates the importance of each index word using the frequency of occurrence, and extracts a predetermined number of index words in descending order of the calculated importance; and the weighting amount calculation means may calculate the weighting amounts of the extracted predetermined number of high-importance index words as the weighting amount of each index word.

[0010] Because only the index words that represent the characteristics of the technical document group as a whole are extracted from the plurality of technical documents as important index words and analyzed, the characteristics or concepts of the technical field can be understood more clearly. In addition, narrowing down the index words in advance improves the efficiency of the analysis process.
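The narrowing-down step in (3) might look like the sketch below. The passage does not state how importance is computed from the frequency of occurrence, so the TF-IDF-style measure used here is an assumption, as are the function name `top_index_words` and the sample documents.

```python
import math
from collections import Counter
from typing import Dict, List

def top_index_words(docs: List[List[str]], n: int) -> List[str]:
    """Return the n index words with the highest importance.

    Importance is an assumed TF-IDF-style measure (not taken from the source):
    (total occurrences) * log(1 + number of documents / document frequency).
    """
    total: Counter = Counter()
    doc_freq: Counter = Counter()
    for words in docs:
        total.update(words)          # occurrences across all documents
        doc_freq.update(set(words))  # number of documents containing the word
    n_docs = len(docs)
    importance: Dict[str, float] = {
        w: total[w] * math.log(1 + n_docs / doc_freq[w]) for w in total
    }
    # Descending importance; ties are broken alphabetically for determinism.
    ranked = sorted(importance, key=lambda w: (-importance[w], w))
    return ranked[:n]

# Hypothetical documents, already split into index words.
docs = [
    ["battery", "electrode", "battery", "device"],
    ["antenna", "device", "signal"],
    ["battery", "device", "electrode"],
]
important = top_index_words(docs, 2)
```

Only the words returned here would then be used as the dimensions of the document vectors, which keeps the later factor analysis small.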

[0011] (4) In the document group analyzing apparatus,

The apparatus may further include factor evaluation value calculation means for calculating, for each of the factors, a factor evaluation value indicating a technical evaluation of the factor, and the output means may output the factor evaluation value as the data of the technical document or technical document group.

[0012] With the above configuration, factors can be compared with one another on the basis of their factor evaluation values. As a result, the relative positional relationship between the technical elements represented by the technical documents belonging to the factors can be grasped, and important technical elements can be distinguished from unimportant ones.

(5) In the document group analyzing apparatus,

The technical document may be a patent document such as a patent publication. The apparatus may further include progress information acquisition means for acquiring progress information of each patent document, and factor evaluation value calculation means for calculating, for each of the factors, a factor evaluation value indicating a technical evaluation of the factor using the progress information of the technical document or technical document group belonging to the factor; and the output means may output the factor evaluation value as data of the technical document or technical document group.

[0014] In this aspect of the present invention, factors can be evaluated on the basis of patent gazettes. Highly accurate factor evaluation can therefore be performed by utilizing the examination progress information included in the patent information related to the patent gazettes.

[0015] (6) In the document group analyzing apparatus,

The apparatus may further include document number determination means for determining the number of technical documents or technical document groups belonging to each factor, and the factor evaluation value calculation means may, for each of the factors, calculate a first index by applying a predetermined weight to the number of technical documents or technical document groups belonging to the factor, calculate a second index by indexing the progress information of the technical documents or technical document groups belonging to the factor, and calculate the factor evaluation value of the factor using the calculated first index and second index.

[0016] By determining the number of technical documents, the share of each factor in the population consisting of the technical document group can be grasped. By giving a predetermined weight to the number of documents, factors can be evaluated with economic aspects and technological competitiveness taken into account. By indexing the progress information, factors can be evaluated quantitatively and objectively. As a result, the share and economic value of the technical elements in the technical field can be grasped numerically and presented quantitatively.

[0017] (7) In the document group analysis apparatus,

The progress information may include at least one of the number of citations by other companies, the number of oppositions to the patent, the number of requests for patent invalidation trials, whether a request for examination has been filed, and whether establishment of the patent right has been registered. The first index may be a value obtained by weighting the number of documents using at least one of the total number of citations by other companies, the total number of oppositions, and the total number of requests for patent invalidation trials. The second index, obtained by indexing the progress information, may be an index value of at least one of the total number of citations by other companies, the total number of oppositions, the total number of requests for patent invalidation trials, the examination request rate, and the registration decision rate.

[0018] Factor evaluation can thus take into account the influence of patents that may obstruct patent acquisition and technological development by other companies, as well as the applicant's willingness to acquire rights and the examiner's evaluation. This ensures the fairness and appropriateness of the factor evaluation, so that the relative positional relationship between the technical elements and their importance can be grasped fairly and appropriately.

(8) In the document group analyzing apparatus,

The output means may calculate the factor evaluation value for each factor and for each applicant.

[0020] For each factor, it is possible to grasp the rank and share of applicants of patent publications belonging to the factor. As a result, it is possible to grasp the characteristics of a specific technical field from the viewpoint of the competitive state of the development company.

[0021] (9 and 10) Other aspects of the present invention are a data analysis method including the same steps as the processes executed by each of the above apparatuses, and a program that can execute the same processes as those executed by each of the above apparatuses. This program can be recorded on a recording medium such as an FD, CD-ROM, or DVD, or transmitted and received over a network.

[0022] (11) In the document group analyzing apparatus,

The technical document may be a patent document such as a patent publication, and the apparatus may include means for acquiring, for each acquired patent document, a patent score that individually evaluates the value of the patent document, and factor evaluation value calculation means for calculating, for each of the factors, a factor evaluation value indicating a technical evaluation of the factor using the patent scores of the patent documents belonging to the factor.

[0023] By using a patent score that individually evaluates the value of each patent document, a factor evaluation value can be calculated that reflects the value of the patent documents belonging to each factor. As a result, it is possible to grasp the features and concepts of the technical field more clearly.

[0024] (12) In the document group analyzing apparatus,

The factor evaluation value calculation means may, for each factor, select the patent scores that are equal to or higher than a predetermined threshold from among the patent scores of the patent documents belonging to the factor, and calculate the sum of the selected patent scores as the factor evaluation value.

[0025] By discarding values below the predetermined threshold and summing only the values at or above it, a factor that has few important patents but merely a large number of low-importance patents can be prevented from receiving a high score. As a result, an appropriate factor evaluation value can be calculated, and it becomes possible to understand the characteristics or concepts of the technical field more clearly.
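The thresholded sum in (12) can be sketched in a few lines. The function name `factor_evaluation_value`, the threshold of 50, and the score values are hypothetical and chosen only to show the effect described above.

```python
from typing import Iterable

def factor_evaluation_value(patent_scores: Iterable[float], threshold: float) -> float:
    """Sum only the patent scores that are at or above the threshold."""
    return sum(s for s in patent_scores if s >= threshold)

# Hypothetical patent scores of the documents belonging to two factors.
many_minor = [40.0] * 20        # many low-importance patents
few_major = [72.0, 65.0, 41.0]  # a few important patents

value_minor = factor_evaluation_value(many_minor, threshold=50.0)
value_major = factor_evaluation_value(few_major, threshold=50.0)
```

A plain sum would rank the first factor far higher (800 against 178); with the threshold applied, the factor containing the important patents is ranked higher instead.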

[0026] (13) In the document group analyzing apparatus,

The patent score is desirably a value standardized with respect to the population document group that includes the factors for which the factor evaluation value is calculated.

[0027] Calculating the factor evaluation value from values standardized within the population improves the accuracy of relative comparison between different factors. As a result, it becomes possible to grasp the features and concepts of the technical field more clearly.

(14) Further, in the document group analyzing apparatus,

The patent score may be a value calculated by classifying the patent documents into groups for each technical field and each predetermined period and, for each classified group, using the progress information of the patent documents belonging to that group.

[0029] By subdividing the patent documents into groups for each technical field and each predetermined period and calculating patent scores using the progress information within each group, biases in the progress information caused by differences in technical field and filing time can be corrected, and an accurate patent score can be calculated. As a result, an appropriate factor evaluation value can be calculated, and the characteristics and concepts of the technical field can be grasped more clearly.
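One way to picture the group-wise calculation in (14) is the sketch below. It is an assumption-laden illustration: the citation count stands in for the progress information, a z-score within each group stands in for the patent score, and the five-year period, the record layout, and the function name `grouped_scores` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Hypothetical record: (document id, technical field, filing year, citation count)
Record = Tuple[str, str, int, int]

def grouped_scores(records: List[Record], period: int = 5) -> Dict[str, float]:
    """Standardize each citation count within its (field, period) group."""
    groups: Dict[Tuple[str, int], List[Record]] = defaultdict(list)
    for rec in records:
        _, field, year, _ = rec
        groups[(field, year // period)].append(rec)  # e.g. 1995-1999 share a bucket
    scores: Dict[str, float] = {}
    for members in groups.values():
        counts = [c for _, _, _, c in members]
        mu, sigma = mean(counts), pstdev(counts)
        for doc_id, _, _, c in members:
            # A group with no spread carries no ranking information: score 0.
            scores[doc_id] = (c - mu) / sigma if sigma > 0 else 0.0
    return scores

records = [
    ("A", "H01M", 1996, 10), ("B", "H01M", 1998, 30),  # older filings, more time to be cited
    ("C", "H01M", 2004, 1), ("D", "H01M", 2003, 3),    # recent filings, few citations so far
]
scores = grouped_scores(records)
```

In this toy data, document D (3 citations among recent filings) receives the same score as document B (30 citations among older filings), which is the kind of filing-time bias the grouping is meant to remove.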

Brief Description of Drawings

FIG. 1 is a diagram showing a hardware configuration of a document group analysis apparatus according to an embodiment of the present invention.

FIG. 2(A) is a flowchart explaining the procedure of the factor extraction process in the above document group analyzer, and FIG. 2(B) is a flowchart explaining the procedure for calculating the factor progress information score as document or document group data.

FIG. 3A is an explanatory diagram regarding an example of a method for acquiring text data of a plurality of documents to be analyzed.

FIG. 3B is an explanatory diagram regarding an example of a method for acquiring text data of a plurality of documents to be analyzed.

FIG. 3C is an explanatory diagram regarding an example of a method for acquiring text data of a plurality of documents to be analyzed.

FIG. 4 shows the factor loading of each index word calculated in the embodiment of the present invention.

FIG. 5 shows the factor scores for each publication calculated in the examples of the present invention.

FIG. 6 shows the index words belonging to each factor extracted in the embodiment of the present invention, and the publications belonging to each factor.

FIG. 7 shows a plurality of indicators of each factor calculated in the embodiment of the present invention, and a patent impact index calculated based on them.

FIG. 8 shows a plurality of indicators for each factor calculated in the embodiment of the present invention, and a progress information index calculated based on the indicators.

FIG. 9 shows the factor progress information score (for each factor) calculated in the embodiment of the present invention.

FIG. 10 is an example illustrating the factor progress information score calculated in the embodiment of the present invention.

FIG. 11 shows the factor progress information score for each factor and for each applicant calculated in the example of the present invention.

FIG. 12 is an example illustrating the factor progress information score for each factor and for each applicant, in which the publication group belonging to each factor in the embodiment of the present invention is further classified by applicant.

FIG. 13 is a functional block diagram of a document group analysis apparatus according to a modification of the embodiment.

FIG. 14 is a flowchart showing the procedure for calculating the technical element score in the above modification.

FIG. 15 is a graph showing the distribution of the technical element score of the modified example and the factor progress information score of the example in relation to the number of publications.

FIG. 16 is an example showing the technical element score for each factor and for each applicant in the above modification.

FIG. 17 is a diagram schematically illustrating an example of a data configuration of progress information used in the modification example.

FIG. 18 is a diagram schematically illustrating an example of a data configuration of content information used in the above modification.

FIG. 19 is a flowchart showing the procedure of a patent score calculation process in the modified example.

FIG. 20 is a flowchart showing details of processing for calculating an evaluation value of each patent data in the modified example.

Explanation of symbols

[0031] 1: processing device, 2: input device, 3: recording device, 4: output device, 101: text data acquisition means, 102: document vector acquisition means, 103: factor load and factor score calculation means, 104: Attribution factor determination means

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

<1. Configuration of document group analyzer>

FIG. 1 is a diagram showing a hardware configuration of a document group analysis apparatus according to an embodiment of the present invention. The document group analysis apparatus of the present embodiment is a computer device comprising: a processing device 1 having a CPU (central processing unit) and a memory (recording device); an input device 2, which is input means such as a keyboard (manual input device); a recording device 3, which is recording means for storing document group data, conditions, work results of the processing device 1, and the like; and an output device 4, which is output means for displaying or printing data of the documents or document groups belonging to the extracted factors.

The processing device 1 includes a text data acquisition unit 101, a document vector acquisition unit 102, a factor calculation unit 103, an attribution factor determination unit 104, a document number determination unit 201, a progress information reading unit 202, an index calculation unit 203, a patent impact calculation unit 204, a progress information calculation unit 205, and a score calculation unit 206.

In the present embodiment, the memory of the processing device 1 stores programs for realizing the functions of the functional units of the processing device 1 (the text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, document number determination unit 201, progress information reading unit 202, index calculation unit 203, patent impact calculation unit 204, progress information calculation unit 205, and score calculation unit 206), namely a text data acquisition program, a document vector acquisition program, a factor calculation program, an attribution factor determination program, a document number determination program, a progress information reading program, an index calculation program, a patent impact calculation program, a progress information calculation program, and a score calculation program. The functions of the functional units of the processing device 1 are realized by the CPU executing these programs stored in the memory.

The recording device 3 includes a condition recording unit 31, a work result storage unit 32, a document storage unit 33, and the like. The document storage unit 33 holds document group data obtained from an external database or an internal database. An external database means, for example, a document database such as the IPDL patent electronic library provided by the Japan Patent Office or PATOLIS (registered trademark) provided by PATOLIS Corporation. The internal database includes a database that itself stores commercially sold data such as patent JP-ROMs; devices that read documents from storage media such as disks and DVDs (digital versatile discs); devices such as OCR (optical character readers) that read documents printed on paper or handwritten; and devices that convert the read data into electronic data such as text.

In the present embodiment, the documents to be analyzed are mainly various patent gazettes, such as the published patent publication, published patent publication, patent publication, patent invention specification, published patent publication, republished patent publication, patent trial request publication, published patent publication English abstract, published utility model gazette, published utility model specification, published utility model gazette, republished utility model gazette, public utility model gazette, utility model registration gazette, registered utility model gazette, registered utility model specification, notice of request for utility model trial, and public technical bulletin. The documents are not limited to these, however; technical documents in general can be analyzed.

[0035] As communication means for exchanging signals and data among the processing device 1, the input device 2, the recording device 3, and the output device 4, the devices may be connected directly by a USB (Universal Serial Bus) cable or the like, data may be transmitted and received via a network such as a LAN (local area network), or data may be exchanged via a medium such as an FD, CD-ROM, MO, or DVD that stores documents. Alternatively, some or several of these may be combined.

[0036] <1-1. Details of input device 2>

Next, the configuration and function of the document group analysis apparatus will be described in detail.

The input device 2 accepts input of conditions such as the text data acquisition conditions for the document group to be analyzed, the document vector acquisition conditions, the factor loading and factor score calculation conditions, the attribution factor determination conditions, the factor progress information score calculation conditions described later, and the output conditions. These input conditions are sent to and stored in the condition recording unit 31 of the recording device 3.

[0037] <1-2. Details of processing device 1>

The text data acquisition unit 101 acquires data of a document group to be analyzed from the document storage unit 33 of the recording device 3 according to the acquisition conditions of the text data input by the input device 2. For example, text data is acquired for I documents belonging to one cluster among the clusters obtained as a result of cluster analysis based on the similarity of documents extracted under certain conditions. The acquired text data is sent directly to the document vector acquisition unit 102 and used for processing there, or sent to the work result storage unit 32 of the recording device 3 and stored therein.

[0038] The document vector acquisition unit 102 calculates I document vectors from the text data of the I documents acquired by the text data acquisition unit 101, in accordance with the document vector acquisition conditions input through the input device 2. Each document vector is a J-dimensional vector, where J is the number of index words, whose elements are the weighting amounts z_ij of index word j in document i. The calculated document vectors are sent directly to the factor calculation unit 103 and used for processing there, or sent to the work result storage unit 32 of the recording device 3 and stored there.

The factor calculation unit 103 calculates factor loadings a_jk and factor scores f_ik from the vector elements z_ij of the document vectors calculated by the document vector acquisition unit 102, in accordance with the factor loading and factor score calculation conditions input through the input device 2. Here k is the factor number; the factor loading a_jk is calculated for each factor for each index word j, and the factor score f_ik is calculated for each factor for each document i. The calculated factor loadings a_jk and factor scores f_ik are sent directly to the attribution factor determination unit 104 and used for processing there, or sent to the work result storage unit 32 of the recording device 3 and stored there.

[0040] The attribution factor determination unit 104 determines, in accordance with the attribution factor determination conditions input through the input device 2, the attribution factor of each index word j from the factor loadings a_jk calculated by the factor calculation unit 103, and the attribution factor of each document i from the factor scores f_ik. The determined attribution factors are sent directly to the document number determination unit 201 and the progress information reading unit 202 and used for processing there, or sent to the work result storage unit 32 of the recording device 3 and stored there. Then, in accordance with the output conditions input through the input device 2, the index words j belonging to the same factor are output by the output device 4 together with the data of the documents i belonging to that factor.
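As a rough illustration of how factor loadings and factor scores can be derived from a matrix of index word weights (documents in rows, index words in columns), the NumPy sketch below uses principal-component extraction without rotation. That extraction method is an assumption made here for brevity; this passage does not say which factor extraction or rotation method the factor calculation unit 103 uses. The function name `principal_factors` and the weight values are hypothetical.

```python
import numpy as np

def principal_factors(Z: np.ndarray, n_factors: int):
    """Extract factors from an I x J matrix of index word weights.

    Returns (loadings of shape J x K, scores of shape I x K).
    Principal-component extraction is used as a simplified stand-in
    for a full factor analysis; no rotation is applied.
    """
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)     # standardize each index word
    R = (Zs.T @ Zs) / Z.shape[0]                  # J x J correlation matrix
    eigval, eigvec = np.linalg.eigh(R)            # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_factors]  # keep the K largest
    lam, V = eigval[order], eigvec[:, order]
    loadings = V * np.sqrt(lam)                   # loading of word j on factor k
    scores = (Zs @ V) / np.sqrt(lam)              # score of document i on factor k
    return loadings, scores

# Hypothetical weights: 6 documents (rows) x 4 index words (columns).
Z = np.array([
    [3.0, 2.5, 0.1, 0.0],
    [2.8, 3.1, 0.0, 0.3],
    [3.2, 2.7, 0.2, 0.1],
    [0.1, 0.0, 2.9, 3.3],
    [0.3, 0.2, 3.1, 2.6],
    [0.0, 0.4, 2.7, 3.0],
])
loadings, scores = principal_factors(Z, n_factors=2)
```

The signs of unrotated factors are arbitrary, so in practice a rotation step would normally precede the maximum-loading and maximum-score attribution performed by the attribution factor determination unit 104.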

The document number determination unit 201 to the score calculation unit 206 calculate the factor progress information score according to the factor progress information score calculation condition input by the input device 2.

Based on the attribution factor of each document determined by the attribution factor determination unit 104, the document number determination unit 201 reads the document group for each factor, or the document group for each factor and each applicant, as the score calculation target document group and determines the number of documents in it.

The progress information reading unit 202 reads the progress information of each document from the document storage unit 33 of the recording device 3 for the score calculation target document group.

The index calculation unit 203 calculates an index based on the progress information read by the progress information reading unit 202 for the score calculation target document group.

The patent impact calculation unit 204 calculates the patent impact for the score calculation target document group based on the number of documents determined by the document number determination unit 201 and the index calculated by the index calculation unit 203.

The progress information calculation unit 205 calculates a progress information index based on the index calculated by the index calculation unit 203 for the score calculation target document group. The score calculation unit 206 calculates a factor progress information score for the score calculation target document group based on the patent impact calculated by the patent impact calculation unit 204 and the progress information index calculated by the progress information calculation unit 205.

The work results by each functional unit are sent to and stored in the work result storage unit 32 of the recording device 3.

[0042] <1-3. Details of recording device 3>

In the recording device 3, the condition recording unit 31 records information such as conditions obtained from the input device 2, and sends necessary data based on a request from the processing device 1. The work result storage unit 32 stores the work result of each component in the processing device 1 and sends necessary data based on a request from the processing device 1. The document storage unit 33 stores and provides necessary document group data obtained from the external database or the internal database based on the request of the input device 2 or the processing device 1. When storing patent document data, the document storage unit 33 preferably stores the bibliographic information (such as the name of the applicant) and the progress information (information such as the request for examination) together.

[0043] <1-4. Details of output device 4>

The output device 4 outputs the document and index word for which the attribution factor is determined by the attribution factor determination unit 104 of the processing device 1 for each factor. The output device 4 includes a display unit such as a display device, and displays a correspondence table between documents and / or index words and factors, a factor evaluation value of a document or a document group calculated for each factor, and the like. The output format is not limited to the display on the display unit, but may be printed on a print medium such as paper, or transmitted to a computer device on a network via communication means.

[0044] <2. Factor extraction processing>

FIG. 2 (A) is a flowchart for explaining the procedure of the factor extraction process in the document group analyzer. The document group analysis apparatus of the present embodiment extracts factors from a plurality of documents to be analyzed using a factor analysis technique.

[0045] <2-1. Acquisition of text data>

In the document group analysis apparatus of the present embodiment, the text data acquisition unit 101 acquires, from the document storage unit 33, the text data of I documents i (i = 1, 2, ..., I) to be analyzed (S101). Any kind of document group may be selected as the I documents; for example, they may be selected as follows.

3A to 3C are explanatory diagrams regarding an example of a method for acquiring text data of a plurality of documents to be analyzed. The items (A) and (B) described below are realized by the procedure described in the patent gazette filed by the applicant (see International Publication No. 2006/030751). Therefore, the following description is simplified.

(A) First, the technology of interest is selected from the patent gazettes of a certain company (target company). Specifically, the text data acquisition unit 101 of the processing device 1 clusters the patent documents of the target company and obtains an evaluation value of each cluster (target company cluster) (for example, one corresponding to a factor progress information score described later). Calculate and select the target company cluster with the highest evaluation value as the technology of interest (Figure 3A).

(B) Next, from a patent document group that contains the publications of both the target company and other companies (own-and-other-company patent document group), the text data acquisition unit 101 extracts the document group belonging to the selected technology of interest and to technologies similar to it (a specific technical field). This is the own-and-other-company specific-field patent document group. Specifically, the text data acquisition unit 101 calculates the degree of similarity between each document in the own-and-other-company patent document group and the technology of interest, and extracts from the document storage unit 33 the documents with higher similarity to the technology of interest as the own-and-other-company specific-field patent document group (FIG. 3B). This makes it possible, for example, to select an important patent group from the patent group of the target company and then analyze similar patent groups that include other companies' patents.

(C) Next, the text data acquisition unit 101 obtains the document group to be analyzed by clustering the extracted own-and-other-company specific-field patent document group. The document group to be analyzed here (own-and-other-company patent cluster) is not limited to a lower cluster having a high degree of similarity between documents; it may be a middle or upper cluster. FIG. 3C shows an example in which 70 lower clusters, 8 middle clusters (lines), and 4 upper clusters (groups) are generated. Whether to use a lower cluster or a middle or upper cluster as the own-and-other-company patent cluster can be selected according to the purpose of the analysis. As a result, for example, the own-and-other-company specific-field patent document group can be classified into subdivided technical areas, and the patent group can be analyzed for each technical area, or for each middle or upper cluster. In addition, by clustering the specific-field patent document group into groups of mutually similar documents and analyzing the document groups with higher similarity, the proportion of documents explained by the factors in factor analysis (cumulative contribution rate) can be raised, and the concepts and features included in the document group can be expressed accurately.

When the text data acquisition unit 101 has acquired the text data of each document i to be analyzed from the document storage unit 33, it extracts a predetermined number of index words j (j = 1, 2, 3, ..., J). The index words to be extracted are, for example, the index words of highest importance in the I documents. To extract them, the importance of each index word included in the documents is calculated, the index words are sorted in descending order of importance, and the top J words are extracted.

[0048] The importance of an index word calculated here is preferably an importance within the plurality of documents to be analyzed that were acquired by the text data acquisition unit, for example GFIDF, or a value obtained by correcting GFIDF based on the degree of co-occurrence of index words.

[0049] GFIDF is the value obtained by multiplying the global frequency (GF: the total number of occurrences of the index word in the document group to be analyzed) by the reciprocal of the document frequency (DF: the number of documents in which the index word appears) or by its logarithm (IDF: inverse document frequency). An index word that is used many times in the document group to be analyzed, and that is rarely used in a predetermined document group different from the document group to be analyzed, receives a high GFIDF value and is evaluated as an important word representing the characteristics of the document group to be analyzed.
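As a minimal sketch of the GFIDF ranking described above, the following Python fragment computes GF over the documents to be analyzed and an IDF over a separate reference document group, then keeps the top J index words. The function names, the logarithmic form ln(|P| / DF), the flooring of DF at 1, and the sample word lists are all illustrative assumptions and are not taken from the specification.

```python
import math
from collections import Counter

def gfidf_scores(target_docs, reference_docs):
    """Return {index word: GF * IDF} for every word in target_docs.

    GF  = total occurrences of the word in the documents under analysis.
    IDF = ln(|P| / DF), where DF is the number of reference documents that
          contain the word (assumed form; DF is floored at 1 so that words
          unseen in the reference group keep a finite score).
    """
    gf = Counter(w for doc in target_docs for w in doc)
    df = Counter(w for doc in reference_docs for w in set(doc))
    n_ref = len(reference_docs)
    return {w: gf[w] * math.log(n_ref / max(df[w], 1)) for w in gf}

def top_index_words(scores, j):
    """Sort by importance in descending order and keep the top J words."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:j]

# Hypothetical tokenized documents (each document is a list of index words).
target = [["nox", "catalyst", "engine"], ["nox", "catalyst", "nox"]]
reference = [["engine", "valve"], ["engine", "piston"],
             ["catalyst", "engine"], ["seat", "belt"]]

scores = gfidf_scores(target, reference)
print(top_index_words(scores, 2))  # -> ['nox', 'catalyst']
```

In this toy data "engine" occurs in most reference documents, so its IDF is small and it drops out of the top two even though it appears in the target group.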

[0050] Skey, described below, is a correction of GFIDF based on the co-occurrence of index words. The items described below are realized by the procedures described in a patent publication filed by the present applicant (see International Publication No. 2006/048998), so the description is simplified.

To obtain Skey, first, high-frequency words that appear frequently in the document group C to be analyzed are extracted, and the per-document co-occurrence degree of each high-frequency word with each index word of the document group is calculated:

$$c(w_j, w_k) = \sum_{D \in C} \left[ DF(w_j, D) \times DF(w_k, D) \right]$$

(where w_j and w_k are index words included in the document group C, D is each document belonging to the document group C, and DF(w, D) is the document frequency of the index word w in the document D). High-frequency words that are similar to each other are grouped into bases g_h (h = 1, 2, ..., b). The per-document co-occurrence degree of an index word w with a base g_h is

$$Co(w, g_h) = \sum_{w' \in g_h,\ w' \neq w} c(w, w')$$

Here, w' is a high-frequency word that belongs to the base g_h and is other than the index word w whose co-occurrence degree Co(w, g_h) is being measured. That is, the co-occurrence degree Co(w, g_h) of the index word w with the base g_h is the sum of the co-occurrence degrees c(w, w') over all such w'. Based on Co(w, g_h), key(w) is calculated as follows.

$$key(w) = 1 - \prod_{1 \le h \le b} \left[ 1 - \frac{Co(w, g_h)}{F(g_h)} \right]$$

where

$$F(g_h) = \sum_{w \in C} Co(w, g_h)$$

that is, F(g_h) is the sum of the co-occurrence degrees Co(w, g_h) over all index words w. key(w) is obtained by dividing Co(w, g_h) by F(g_h), taking the difference from 1, multiplying these values over all bases g_h, and taking the difference of the product from 1.

Skey(w) is calculated by the following equation.

$$Skey(w) = GF(w, C) \times \left[ IDF(w, P) + \ln key(w) \right]$$

Here, GF(w, C) is the global frequency of the index word w in the document group C to be analyzed, and IDF(w, P) is the inverse document frequency of the index word w in a predetermined document group P different from the document group C to be analyzed.

An index word that has a high GFIDF and that co-occurs with the bases of the document group C, and therefore has a high affinity with the content of the document group C (a high key(w)), receives a high Skey(w) value and is evaluated as an important index word representing the characteristics of the document group C to be analyzed.
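The Skey calculation can be sketched in Python as below. This is an illustration under stated assumptions rather than the procedure of International Publication No. 2006/048998: DF(w, D) is taken to be 1 when w occurs in D and 0 otherwise, the bases are supplied by the caller because their construction is not detailed here, and the IDF values and sample documents are hypothetical.

```python
import math
from collections import Counter

def skey_scores(docs, bases, idf):
    """Skey(w) = GF(w, C) * [IDF(w, P) + ln key(w)] for every word in docs.

    docs  : document group C, each document a list of index words
    bases : list of bases g_h, each a list of high-frequency words (given)
    idf   : {word: IDF(w, P)} for a separate reference group P (given)
    """
    vocab = sorted({w for d in docs for w in d})
    doc_sets = [set(d) for d in docs]
    gf = Counter(w for d in docs for w in d)

    def c(wj, wk):
        # Assumed binary DF(w, D): count documents containing both words.
        return sum(1 for s in doc_sets if wj in s and wk in s)

    def co(w, base):
        # Co(w, g_h): sum of c(w, w') over w' in the base, w' != w.
        return sum(c(w, w2) for w2 in base if w2 != w)

    f = [sum(co(w, g) for w in vocab) for g in bases]  # F(g_h)
    scores = {}
    for w in vocab:
        prod = 1.0
        for g, fg in zip(bases, f):
            if fg > 0:
                prod *= 1.0 - co(w, g) / fg
        key = 1.0 - prod
        ln_key = math.log(key) if key > 0 else float("-inf")
        scores[w] = gf[w] * (idf.get(w, 0.0) + ln_key)
    return scores

docs = [["nox", "catalyst", "storage"], ["nox", "catalyst"], ["nox", "valve"]]
bases = [["nox", "catalyst"]]                 # one hypothetical base g_1
idf = {w: 2.0 for w in ["nox", "catalyst", "storage", "valve"]}

skey = skey_scores(docs, bases, idf)
for word in sorted(skey, key=skey.get, reverse=True):
    print(word, round(skey[word], 3))
```

With equal GF and IDF, "storage" scores above "valve" because it co-occurs with both words of the base, which is the correction that key(w) is meant to introduce.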

As described above, by extracting from a plurality of technical documents only the index words that represent the characteristics of the entire technical document group and analyzing them as important index words, the features or concepts of the technical field can be grasped more clearly. In addition, narrowing down the index words in advance improves the efficiency of the analysis process.

<2-2. Acquisition of document vectors>

Next, after the text data acquisition unit 101 has acquired the text data, the document vector acquisition unit 102 generates, for each document i, a document vector whose elements are the weighting amounts z_ij of the index words j (S102). As a result, data of I rows and J columns such as the following is obtained. Let Z be the matrix of I rows and J columns with z_ij as its elements.

[table 1]

Here, the weighting amount is a quantity given to each index word in each document from a predetermined viewpoint; for example, TFIDF is preferably used. For a given index word, TFIDF is the value obtained by multiplying the index word frequency (TF: the number of occurrences of the index word in a document) by the reciprocal of the document frequency (DF: the number of documents in the document group in which the index word appears) or by its logarithm (IDF: inverse document frequency). An index word that is used many times in the document for which the document vector is calculated, and that is rarely used in the given document group, receives a high TFIDF value.
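The construction of the I-row, J-column matrix Z from TFIDF weights can be sketched as follows. The function name, the choice of ln(I / DF) as the logarithmic IDF, and the sample documents are assumptions made for illustration only.

```python
import math

def tfidf_matrix(docs, index_words):
    """Return Z as a list of I rows, each with J TFIDF weights z_ij.

    TF  = occurrences of index word j in document i.
    IDF = ln(I / DF), DF being the number of documents containing the word
          (assumed form; a word absent from every document gets weight 0).
    """
    n_docs = len(docs)
    df = {w: sum(1 for d in docs if w in d) for w in index_words}
    z = []
    for d in docs:
        row = []
        for w in index_words:
            idf = math.log(n_docs / df[w]) if df[w] else 0.0
            row.append(d.count(w) * idf)
        z.append(row)
    return z

# Hypothetical tokenized documents and index words.
docs = [["nox", "nox", "catalyst"],
        ["catalyst", "valve"],
        ["valve", "valve", "engine"]]
index_words = ["nox", "catalyst", "valve"]

Z = tfidf_matrix(docs, index_words)
for row in Z:
    print([round(x, 3) for x in row])
```

Each row of Z is the document vector of one document, and each column corresponds to one observation variable in the factor analysis that follows.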

[0053] <2-3. Calculation of factor loadings and factor scores>

Next, the factor calculation unit 103 calculates the factor loadings and factor scores of a factor analysis in which each document i is a subject, each index word j is an observation variable, and the document vector of each document is the response of that subject (S103).

Specifically, first, let K be the number of factors k (k = 1, 2, ..., K), and let a_jk be the factor loading of each index word j on each factor k. Also, let f_ik be the factor score of each document i on each factor k. Then, a factor loading matrix A having the factor loadings a_jk as its elements and a factor score matrix F having the factor scores f_ik as its elements are set as follows.

[Table 2]

[Table 3]

             Factor 1   Factor 2   ...   Factor K
Document 1   f_11       f_12       ...   f_1K
Document 2   f_21       f_22       ...   f_2K
Document 3   f_31       f_32       ...   f_3K
...          ...        ...        ...   ...
Document I   f_I1       f_I2       ...   f_IK

[0054] Next, let E be the residual matrix of I rows and J columns, and solve

$$Z = F A^{t} + E$$

(where $A^{t}$ is the transpose of A) to obtain the factor loading matrix A and the factor score matrix F.

[0055] Regarding the factor scores f_ik, which are the elements of the factor score matrix F, and the residuals e_ij, which are the elements of the residual matrix E, under the general assumptions that (1) the factor scores are standardized to mean 0 and standard deviation 1, (2) the correlation between factor scores is 0, (3) the correlation between residuals is 0, and (4) the correlation between each factor score and each residual is 0, it is known that

$$R = A A^{t} + V$$

holds, where R is the correlation matrix between the observation variables and V is the residual covariance matrix. Therefore, the factor loadings are obtained from the following equation.

$$A A^{t} = R - V$$

Next, let R - V = R*. To calculate R*, the correlation matrix R is calculated from the elements z_ij of the matrix Z, and R* is then estimated by replacing the diagonal elements of the correlation matrix with estimated communalities (examples of communality estimation methods include the SMC method and the RMAX method). Since R* = A A^t, the factor loading matrix A is calculated from R* to obtain the factor loadings (methods for calculating the factor loadings include, for example, the principal factor method, the least squares method, and the maximum likelihood method).
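The step from R to R* and then to loadings can be sketched as follows for a single factor. This is a simplified illustration under stated assumptions: communalities are estimated with RMAX (the largest absolute off-diagonal correlation in each row), and only the first factor is extracted, using power iteration in place of the full eigendecomposition that a principal factor implementation would normally use. The correlation matrix is hypothetical.

```python
import math

def reduced_correlation_rmax(r):
    """Replace each diagonal element of R with its RMAX communality
    estimate (largest absolute off-diagonal correlation in that row)."""
    n = len(r)
    r_star = [row[:] for row in r]
    for j in range(n):
        r_star[j][j] = max(abs(r[j][k]) for k in range(n) if k != j)
    return r_star

def first_factor_loadings(r_star, iterations=500):
    """Loadings on the first factor: sqrt(eigenvalue) * leading eigenvector
    of R*, found by power iteration (illustrative; one factor only)."""
    n = len(r_star)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(r_star[j][k] * v[k] for k in range(n)) for j in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return [math.sqrt(eigenvalue) * x for x in v]

# Hypothetical correlation matrix of three index words.
R = [[1.00, 0.72, 0.63],
     [0.72, 1.00, 0.56],
     [0.63, 0.56, 1.00]]

R_star = reduced_correlation_rmax(R)
loadings = first_factor_loadings(R_star)
print([round(a, 3) for a in loadings])
```

Extracting K factors would repeat this for the next eigenvectors (or use a linear algebra library), and the rotation described next would then be applied to the resulting matrix A.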

[0056] Then, in order to find more meaningful factors, it is desirable that the factor calculation unit 103 performs factor rotation. As the method of rotating the factor axes in the present embodiment, it is preferable to use the varimax (orthogonal rotation) method. That is, the relationship between the observation variables and the factors is obtained by rotating the factor axes, while maintaining orthogonality with the other factors, so as to maximize the variance of the factor loadings. In particular, when the document group to be analyzed consists of documents with high similarity between document vectors, using the varimax method has the advantage that the factor loadings of each factor can be narrowed down and the characteristics of each factor can be clarified. General factor axis rotation methods include, in addition to the above-mentioned varimax method, orthogonal rotations such as quartimax, equamax, parsimax, orthomax, and orthogonal Procrustes, and oblique rotations such as promax, oblimin, Harris-Kaiser, and oblique Procrustes. The method of rotating the factor axes in this embodiment is not limited to varimax, and may be appropriately selected from these rotation methods according to the embodiment.

[0057] The factor score matrix F is obtained, for example, by

$$F = Z R^{-1} A$$

(where Z here is the standardized data).

[0058] <2-4. Determination of attribution factors>

When the factor calculation unit 103 has obtained the factor loading matrix A and the factor score matrix F, the attribution factor determination unit 104 determines the attribution factor of each index word j based on the factor loadings a_jk, and determines the attribution factor of each document i based on the factor scores f_ik (S104).

As a result, index words and a plurality of technical documents can be attributed to each factor, so the characteristics or concepts of the technical field can be grasped from the index words, that is, through linguistic information understandable to the analyst. In addition, the characteristics or concepts of the technical field can be grasped from the specific tendencies of the information included in each of the technical documents. Note that the present invention does not require the subjective or arbitrary judgment of the analyst, such as the "interpretation of factors" in ordinary analysis using the factor analysis technique. This is because each index word is used as an observation variable. In other words, since the observation variable itself expresses content, once the observation variables belonging to a factor are determined based on the factor loadings, the content of each factor can be shown simply by those observation variables.

[0059] For example, if, among the factor loadings a_j1, a_j2, ..., a_jK of an index word j on the respective factors, the factor loading a_jk on a factor k is the maximum, the attribution factor of the index word j is the factor k. Similarly, if, among the factor scores f_i1, f_i2, ..., f_iK of a document i on the respective factors, the factor score f_ik on a factor k is the maximum, the attribution factor of the document i is the factor k. By determining the attribution factors in this way, each index word or technical document is attributed to the factor with which it is most closely related, so that the factor can be explained most appropriately. As a result, the features and concepts of the technical field can be grasped more clearly. In this case, one index word can belong to only one factor, and one document can belong to only one factor. On the other hand, the index words belonging to one factor are not necessarily limited to one, and the documents belonging to one factor are not necessarily limited to one.

[0060] Furthermore, it is preferable to set a lower limit for the factor loading and, when the maximum factor loading a_jk of an index word j is less than the lower limit, not to attribute the index word j to any factor, so that index words with low factor loadings are excluded from the index word group indicating the content of a factor. Similarly, it is preferable to set a lower limit for the factor score and, when the maximum factor score f_ik of a document i is less than the lower limit, not to attribute the document i to any factor, so that only documents closely related to a factor are attributed to it.
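The attribution rule, including the lower limit, reduces to a row-wise maximum with a threshold. The following Python sketch applies it to a small loading matrix; the function name, the lower limit of 0.3, and all values other than the 0.96 loading mentioned later in the example are hypothetical.

```python
def attribution_factors(values, lower_limit):
    """values[r][k] is the loading (or score) of item r on factor k + 1.

    Returns, for each item, the 1-based number of the factor with the
    maximum value, or None when that maximum is below lower_limit.
    """
    result = []
    for row in values:
        k_best = max(range(len(row)), key=lambda k: row[k])
        result.append(k_best + 1 if row[k_best] >= lower_limit else None)
    return result

# Rows: index words, columns: factors 1..3 (illustrative values).
loadings = [
    [0.96, 0.05, 0.01],   # attributed to factor 1
    [0.10, 0.72, 0.20],   # attributed to factor 2
    [0.15, 0.12, 0.18],   # maximum below the lower limit: no factor
]

print(attribution_factors(loadings, lower_limit=0.3))  # -> [1, 2, None]
```

The same function can be applied to the rows of the factor score matrix F with a separate lower limit to obtain the attribution factor of each document.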

[0061] Once the attribution factor of each index word is determined as described above, the index word or index word group belonging to each factor can be regarded as indicating the content of that factor, that is, a concept or feature included in the I documents to be analyzed. Since the document or document group belonging to each factor is closely related to that factor, it can be regarded as indicating in which documents the concept or feature extracted as that factor appears.

[0062] <2-5. Output>

Then, the document group analysis apparatus outputs, for each factor, the index word or index word group belonging to the factor together with the data of the document or document group belonging to the same factor, by means of the output device 4 (S105). What kind of data is output for the document or document group belonging to each factor is arbitrary. As an example, data specifying the document or document group (such as a publication number in the case of a patent publication) may be output. A factor evaluation value of the document or document group may also be calculated and output.

By calculating the factor evaluation value in this way, the factors can be compared with each other relatively on the basis of the factor evaluation value. As a result, the relative positional relationship between the technical elements represented by the technical documents belonging to the factors can be grasped, and it is also possible to distinguish the key technical elements among them from those that are not. As a preferred example of calculating the factor evaluation value, an example of calculating and outputting the factor progress information score of the document or document group is described below.

[0063] <3. Calculation process of factor progress information score>

FIG. 2 (B) is a flowchart for explaining the processing procedure for calculating a factor progress information score as data of a document or document group. The process of this flowchart is executed for (1) the document or document group belonging to each factor for which a factor progress information score is to be calculated, or (2) when a factor progress information score is calculated for each factor and for each applicant as described later, the document or document group of each factor and each applicant (hereinafter, in either case (1) or (2), referred to as the "score calculation target document group").

[0064] <3-1. Determination of the number of documents>

First, the document number determination unit 201 determines the number of documents N of the score calculation target document group (S201). If both a published patent application and a patent publication have been issued for one patent application, the number of documents for that patent application is counted as two.

[0065] <3-2. Reading progress information>

Next, the progress information reading unit 202 reads the progress information, as a document attribute of each document of the score calculation target document group, from the document storage unit 33 of the recording device 3 or another database (S202). This database records the history of the patent application related to each document. Examples of the progress information to be read for each patent application are as follows.

“Other company citation count” (0 or a positive integer),

“Number of requests for patent opposition or invalidation trial” (0 or a positive integer),

“Presence or absence of request for examination” (1 or 0),

“Presence or absence of patent registration” (1 or 0),

“Presence or absence of request for accelerated examination” (1 or 0),

“Presence or absence of appeal against assessment” (1 or 0)

Other information may also be used.

[0066] <3-3. Calculation of indicators based on progress information>

Next, the index calculation unit 203 calculates, as evaluation values for the score calculation target document group, a plurality of indices based on the progress information recorded in the database (S203). Examples of these indices include the “total number of citations by other companies”, the “total number of requests for patent opposition or invalidation trial”, the “examination request rate”, the “patent registration rate”, the “registration assessment rate”, the “accelerated examination request rate”, the “other company citation ratio”, the “assessment appeal rate”, and the “opposition or invalidation request ratio”. Other indices may also be used. Each is defined as follows.

“Total number of citations by other companies” = sum of the “other company citation count” over the score calculation target document group

“Total number of requests for patent opposition or invalidation trial” = sum of the “number of requests for patent opposition or invalidation trial” over the score calculation target document group

“Examination request rate” = number of requests for examination / number of patent applications

“Patent registration rate” = number of patent registrations / number of patent applications

“Registration assessment rate” = number of patent registrations / number of requests for examination

“Accelerated examination request rate” = number of requests for accelerated examination / number of requests for examination

“Other company citation ratio” = “total number of citations by other companies” / number of patent applications

“Assessment appeal rate” = number of appeals against assessment / number of requests for examination

“Opposition or invalidation request ratio” = “total number of requests for patent opposition or invalidation trial” / number of patent registrations
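The index definitions above can be sketched in Python as follows, with each count obtained by summing the corresponding 1-or-0 progress flag over the document group. The record field names, the treatment of one record as one patent application, the rule that a ratio with a zero denominator is 0, and the sample records are illustrative assumptions.

```python
def progress_indices(applications):
    """Compute the indices of S203 from per-application progress records."""
    def total(field):
        return sum(app[field] for app in applications)

    def ratio(numerator, denominator):
        # Assumption: an undefined ratio (zero denominator) is treated as 0.
        return numerator / denominator if denominator else 0.0

    n_apps = len(applications)   # assumed: one record per patent application
    exam, registered = total("exam_request"), total("registered")
    citations, oppositions = total("citations_by_others"), total("oppositions")
    return {
        "total_citations": citations,
        "total_oppositions": oppositions,
        "exam_request_rate": ratio(exam, n_apps),
        "patent_registration_rate": ratio(registered, n_apps),
        "registration_assessment_rate": ratio(registered, exam),
        "accelerated_exam_rate": ratio(total("accelerated"), exam),
        "citation_ratio": ratio(citations, n_apps),
        "assessment_appeal_rate": ratio(total("appeal"), exam),
        "opposition_ratio": ratio(oppositions, registered),
    }

def record(cit, opp, exam, reg, acc, app):
    """Build one hypothetical progress record."""
    return {"citations_by_others": cit, "oppositions": opp,
            "exam_request": exam, "registered": reg,
            "accelerated": acc, "appeal": app}

apps = [record(3, 1, 1, 1, 0, 0), record(0, 0, 1, 1, 1, 1),
        record(1, 0, 1, 0, 0, 0), record(0, 0, 0, 0, 0, 0)]

idx = progress_indices(apps)
for name, value in idx.items():
    print(f"{name}: {value:.3f}")
```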

In these definitions, the counts are given as follows.

The number of requests for examination is the sum of the “presence or absence of request for examination” (1 or 0) over the score calculation target document group.

The number of patent registrations is the sum of the “presence or absence of patent registration” (1 or 0) over the score calculation target document group.

The number of requests for accelerated examination is the sum of the “presence or absence of request for accelerated examination” (1 or 0) over the score calculation target document group.

The number of appeals against assessment is the sum of the “presence or absence of appeal against assessment” (1 or 0) over the score calculation target document group.

[0067] <3-4. Calculation of patent impact index>

Next, the patent impact calculation unit 204 calculates a patent impact index by applying a predetermined weighting to the “number of documents” of the score calculation target document group (S204). The patent impact index is intended to evaluate, for the score calculation target document group, the power to restrain other companies (the degree to which other companies' rights are suppressed and the value of the company's own patents is raised). For example, the “number of documents” is weighted based on the “total number of citations by other companies” and/or the “total number of requests for patent opposition or invalidation trial”, so that the index can be calculated as

Patent impact index = “number of documents” + “total number of citations by other companies” + “total number of requests for patent opposition or invalidation trial”

The weighting of the “number of documents” may be performed by addition as in the above formula, or by multiplication by some ratio.

In addition to the above, the predetermined weighting may be based on various quantities such as patent profitability, patent productivity, patent utilization, and patent competitiveness, and is not limited to these.

By determining the number of technical documents as described above, it is possible to grasp the share of factors in the population composed of technical documents. In addition, by applying a predetermined weight to the number of documents, it is possible to evaluate factors that take into account economic aspects and technical competitiveness.
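A minimal sketch of the additive form of the patent impact index follows. The record layout (a list of publication kinds per patent application, so that an application with both a published application and a patent publication counts as two documents) and the sample values are assumptions for illustration.

```python
def count_documents(applications):
    """Number of documents N: every issued publication counts once, so an
    application with two publications contributes two documents."""
    return sum(len(app["publications"]) for app in applications)

def patent_impact_index(n_documents, total_citations, total_oppositions):
    """Additive weighting of the number of documents."""
    return n_documents + total_citations + total_oppositions

# Hypothetical document group: "A" = published application, "B" = patent publication.
apps = [{"publications": ["A"]},
        {"publications": ["A", "B"]},
        {"publications": ["A", "B"]},
        {"publications": ["A"]}]

n = count_documents(apps)
print(n)                              # -> 6
print(patent_impact_index(n, 4, 1))   # -> 11
```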

[0068] <3-5. Calculation of progress information index>

Next, the progress information calculation unit 205 calculates the progress information index by averaging the indices based on the progress information of the score calculation target document group (S205). The progress information index is intended to evaluate, for the score calculation target document group, the value placed on the patents by the company itself, the JPO, and competitors. For example, it can be calculated as

$$\text{Progress information index} = \sqrt{\frac{1}{d} \sum_{n=1}^{d} (\text{index}_n)^2}$$

In other words, taking as the d indices based on progress information, for example, the seven indices “examination request rate”, “registration assessment rate”, “patent registration rate”, “accelerated examination request rate”, “other company citation ratio”, “assessment appeal rate”, and “opposition or invalidation request ratio”, the progress information index is the positive square root of the sum of the squares of these indices divided by the number of indices d = 7. The seven indices above are shown here as an example; in addition, for example, an “in-house citation ratio”, a “domestic priority claim ratio”, a “foreign priority claim ratio”, and a “file wrapper inspection rate” may be used.

By calculating the factor progress information score using the above indices, factors can be evaluated in a way that takes into account the influence of patents that can be an obstacle to other companies' patent acquisition and technology development. The evaluation can also take into account the applicant's willingness to obtain rights and the examiner's assessment. As a result, the fairness and appropriateness of the factor evaluation can be ensured, and the relative positional relationship and importance of the technical elements can be grasped fairly and appropriately.
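The root-mean-square form of the progress information index can be sketched as below. The function name and the sample index values are hypothetical; only the formula itself follows the text.

```python
import math

def progress_information_index(indices):
    """Positive square root of (sum of squared indices / number of indices)."""
    d = len(indices)
    return math.sqrt(sum(x * x for x in indices) / d)

# Seven illustrative index values, in the order: examination request rate,
# registration assessment rate, patent registration rate, accelerated
# examination request rate, other company citation ratio, assessment appeal
# rate, opposition or invalidation request ratio.
seven = [0.75, 2 / 3, 0.5, 1 / 3, 1.0, 1 / 3, 0.5]
print(round(progress_information_index(seven), 4))
```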

[0069] As a method of calculating the progress information index from the d indices, calculating the sum of the indices is also conceivable (simple sum method). Since a patent group with many high-valued indices should be evaluated highly, it is reasonable to use the sum of the indices as the progress information index. However, as the number of indices d increases, the weight of the indices that should be emphasized, such as the “examination request rate”, “registration assessment rate”, and “patent registration rate”, may decrease.

One way to solve this problem is to use the maximum value among the indices as the progress information index, or to calculate the progress information index using only, for example, the top three indices (maximum value method). However, if only the maximum value or the highest indices are adopted, the other indices are not taken into account at all, which may result in a one-sided evaluation.

The above-mentioned method of dividing the sum of squares by the number of indices d and taking the square root can be said to combine the advantages of the simple sum method and the maximum value method. That is, because the sum of squares is taken, if any of the d indices for a document group has a high value, that high value strongly affects the progress information index. Therefore, a document group for which the “examination request rate”, “registration assessment rate”, “patent registration rate”, and the like, which tend to be high, are particularly high (and which consequently has many patent assessments) can be given an outstanding evaluation score. At the same time, the indices other than the high-valued ones are also taken into account to some extent.
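The difference between the three aggregation methods can be illustrated with three hypothetical document groups. The index values below are invented for the comparison and carry no meaning beyond it.

```python
import math

def simple_sum(indices):
    return sum(indices)

def maximum_value(indices):
    return max(indices)

def root_mean_square(indices):
    return math.sqrt(sum(x * x for x in indices) / len(indices))

# Hypothetical groups with four indices each.
group_x = [0.9, 0.1, 0.1, 0.1]   # one outstanding index, the rest low
group_y = [0.3, 0.3, 0.3, 0.3]   # uniformly moderate
group_z = [0.9, 0.5, 0.5, 0.5]   # same maximum as X, better elsewhere

for name, g in (("X", group_x), ("Y", group_y), ("Z", group_z)):
    print(name, round(simple_sum(g), 3), maximum_value(g),
          round(root_mean_square(g), 3))
```

The simple sum cannot separate X from Y, and the maximum value cannot separate X from Z, whereas the root mean square ranks Z above X above Y: the outstanding index dominates, but the remaining indices still contribute.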

As described above, in the present embodiment, the progress information index is calculated with all the d indices taken into account. As a result, it is possible to evaluate the value of a patent from multiple angles.

[0070] <3-6. Calculation of factor progress information score>

Next, the score calculation unit 206 multiplies the patent impact index by the progress information index to calculate a factor evaluation value of the score calculation target document group (S206). By converting the progress information into an index in this way, factors can be evaluated, for example, quantitatively and objectively.

This evaluation value is referred to as a “factor progress information score”. The calculated score is output in S105. This factor progress information score has the following properties.

When “request for examination” = 0, the progress information index is 0 in most cases, and as a result, the factor progress information score is also 0.

The progress information index increases as the number of patents registered increases. In addition, if there are appeals against assessment, oppositions, and the like, they are taken into account.

Since the patent impact index counts the number of publications, it increases as the number of patent applications increases, and further increases when patent publications are issued. It is weighted by “total number of citations from other companies” and “total number of requests for opposition to patent or patent invalidation request”.

With this factor progress information score, the patent document group can be evaluated from the aspect of the progress information.
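The final score is the product of the two indices, which the following sketch computes per factor and uses to rank the factors. The dictionary layout and all numeric values are hypothetical.

```python
def factor_progress_score(patent_impact_index, progress_information_index):
    """Factor progress information score = patent impact index
    multiplied by progress information index."""
    return patent_impact_index * progress_information_index

# Hypothetical (patent impact index, progress information index) per factor.
factors = {1: (11, 0.5), 2: (20, 0.25), 3: (4, 0.0)}

scores = {k: factor_progress_score(*v) for k, v in factors.items()}
ranking = sorted(scores, key=lambda k: (-scores[k], k))

print(scores)    # -> {1: 5.5, 2: 5.0, 3: 0.0}
print(ranking)   # -> [1, 2, 3]
```

Factor 3 shows the property noted above: a progress information index of 0 gives a score of 0 regardless of the number of documents.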

By calculating the factor progress information score as described above, the factors can be evaluated on the basis of patent gazettes. As a result, highly accurate factor evaluation can be performed by utilizing the examination progress information included in the patent information related to each patent gazette. The method for calculating the factor progress information score is not limited to this. For example, the factor progress information score for each factor can also be obtained by calculating an individual evaluation value for each patent gazette belonging to the factor and totaling these values.

<4. Example>

Next, the analysis results for a specific document group are introduced.

A cluster analysis was performed, based on similarity, on the published patent applications and patent publications issued in Japan by a car manufacturer (here called company A) to obtain a plurality of clusters. As a result of studying these clusters, it was decided to analyze in detail the relationship with the surrounding patents of other companies, in particular for the cluster related to “exhaust control of internal combustion engines”. Therefore, from a group of about 4 million publications including the published patent applications and patent publications of company A and other companies, 2869 publications with high similarity to the publications belonging to the cluster related to “exhaust control of internal combustion engines” were extracted. The analysis results below are an example in which these 2869 publications were cluster-analyzed based on the similarity of the document vectors of the publications, and the 569 publications belonging to a specific cluster among them were analyzed.

[0072] <4-1. Factor analysis processing>

In this example, for the document group of 569 documents to be analyzed, the 77 index words with the highest importance were used as observation variables, and 16 factors were extracted using the factor analysis technique. The number of factors to be extracted is not limited to this, and may be changed as appropriate according to the embodiment. Examples of criteria for determining the number of factors include eigenvalues, factor contributions, interpretability, and the Kaiser-Guttman criterion.

Examples of the factor loadings of each index word and the factor scores of each gazette calculated by the factor calculation unit 103 are shown in the tables of FIGS. 4 and 5 (only some of the index words and gazettes are shown).

[0073] The table shown in FIG. 4 is a table showing an example of the factor loading of each index word calculated for each factor from factor numbers 1 to 16; in the figure, the rows are index words and the columns are The number of each factor is indicated. In FIG. 4, for convenience of explanation, not all index words are shown, but only some index words are extracted and shown.

In this example, for each index word, the factor indicating the maximum factor load (shown by shading the relevant part) shown in FIG. Specifically, for example, the index word “Kura” at the top of FIG. 4 has a factor loading of 0.96 for factor 1. This index word “Kura” is attributed to factor 1 because it has a larger factor loading for factor 1 than any other factor up to factor 2 to 16; In addition, the index word “sucking, NOx, reduction, release, spike” belongs to factor 1 as well as the index word “Kura”. However, among the index terms used as observation variables, those that did not have a factor loading equal to or higher than a predetermined threshold were not attributed to any factor.

[0074] The table in FIG. 5 shows an example of the factor scores of each gazette calculated for each of factors 1 to 16. In the figure, the rows are publication numbers and the columns are factor numbers. For convenience of explanation, FIG. 5 shows only an excerpt of the gazettes belonging to each factor, and the gazette numbers are partially masked.

In this example, as with the attribution factor of an index word, the factor for which the factor score shown in FIG. 5 is largest (the relevant cell is shaded) is taken as the attribution factor of each gazette. Specifically, for example, the publication “XX-XXX601” at the top of FIG. 5 has a factor score of 5.41 for factor 1. Because this score is higher than its score for any of factors 2 to 16, the publication “XX-XXX601” is attributed to factor 1. The publications “XX-XXX097” and “XX-XXX189” likewise belong to factor 1. However, gazettes in the analyzed document group whose maximum factor score is below a predetermined threshold (here, 1) are not attributed to any factor.
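The attribution rule described for index words and gazettes (take the factor with the largest value, unless that value falls below a threshold) can be sketched in Python as follows. This is only an illustrative sketch: the function name `attribution_factor` is hypothetical, the vectors are shortened to four factors, and apart from the values 0.96 and 5.41 quoted above every number, including the loading threshold of 0.4, is an assumption.

```python
def attribution_factor(values, threshold):
    """Return the 1-based number of the factor with the largest value,
    or None when that largest value is below `threshold`."""
    best = max(range(len(values)), key=lambda k: values[k])
    return best + 1 if values[best] >= threshold else None


LOADING_THRESHOLD = 0.4  # hypothetical; the text only says "predetermined"
SCORE_THRESHOLD = 1.0    # the text gives 1 as the factor score threshold

# Factor loadings of one index word (0.96 is from the text, the rest are made up).
kura_loadings = [0.96, 0.12, -0.03, 0.08]
# Factor scores of two gazettes (5.41 is from the text, the rest are made up).
gazette_scores = [5.41, 0.37, -0.22, 0.90]
weak_gazette_scores = [0.41, 0.77, -0.10, 0.25]

print(attribution_factor(kura_loadings, LOADING_THRESHOLD))
print(attribution_factor(gazette_scores, SCORE_THRESHOLD))
print(attribution_factor(weak_gazette_scores, SCORE_THRESHOLD))
```

The last call returns `None` because the largest score of that gazette is below the threshold, so the gazette is attributed to no factor.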

[0075] The table in FIG. 6 shows examples of the index words and publications belonging to each factor. For convenience of explanation, not all of the publications belonging to each factor are shown.
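A table of the kind shown in FIG. 6 can be produced by inverting the per-item attribution results into a per-factor listing. The following Python sketch assumes the attribution results are held in dictionaries mapping each index word or publication number to a factor number (or `None` when unattributed); the function name `group_by_factor` and the publication “XX-XXX000” are hypothetical, while the other assignments are the ones mentioned in the text.

```python
from collections import defaultdict


def group_by_factor(attribution):
    """Invert {item: factor number or None} into {factor number: [items]}."""
    table = defaultdict(list)
    for item, factor in attribution.items():
        if factor is not None:  # unattributed items are left out of the table
            table[factor].append(item)
    return dict(table)


word_factors = {"Kura": 1, "NOx": 1, "Lee": 4, "N": 4, "catalyst": 5}
publication_factors = {
    "XX-XXX601": 1,
    "XX-XXX097": 1,
    "XX-XXX189": 1,
    "XX-XXX000": None,  # hypothetical publication below the score threshold
}

words_by_factor = group_by_factor(word_factors)
publications_by_factor = group_by_factor(publication_factors)
print(words_by_factor)
print(publications_by_factor)
```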

This makes it easy to grasp, through the index words belonging to each factor, the concepts and features contained in the document group to be analyzed, even when, as with the 569 documents here, the documents are highly similar to one another. When a document group is classified by clustering or the like, it can be difficult to grasp what distinguishes one class from another. According to this embodiment, independent concepts called factors are extracted, and the index words and gazettes most relevant to each factor are assigned to it, so the characteristics or concepts of the technical field can be grasped from the technical elements indicated by the index words and the technical information contained in the gazettes.

In addition, since specific gazettes can be referred to for each factor as necessary, more advanced analysis is possible, such as actually reading the gazettes belonging to a factor of interest or calculating the factor progress information score described later from the data of the gazettes belonging to each factor.

[0076] Unlike languages such as English, in which a space is inserted between words, written Japanese does not mark word boundaries explicitly. For this reason, when text mining is performed, keywords are generally extracted in advance by running a morphological analysis program on a computer. However, current morphological analysis programs cannot adequately handle the wide variety of expressions in Japanese natural-language text, so words that originally form a single meaningful unit are sometimes split unnaturally, which can interfere with the analysis.

In this example, however, split words were observed to gather again in the same factor. For example, the index words “Lee” and “N” belonged to factor 4, and these appear to have originally been the single term “Lean” (which, in the field of this document group, means that the ratio of fuel to air in the mixture is low). Even when a term is split in this way, the fragments are considered to gather again in the same factor because the appearance patterns of these index words have much in common and their factor loadings for each factor are therefore similar. Similarly, “sky”, “fuel”, “ratio”, and “theory” in factor 2 appear to have originally been the single term “theoretical air-fuel ratio”. Even in languages written with spaces between words, such as English, “theoretical air-fuel ratio” may be split into “sky”, “fuel”, “ratio”, and “theory” for analysis. Even in that case, if the present invention is applied, the fragments can be expected to gather again in the same factor.

[0077] <4-2. Calculation of factor progress information score>

Next, an example in which a factor progress information score is calculated for each factor based on the progress information of the gazette belonging to each factor is shown.

The tables in FIGS. 7 and 8 show an example of the results of calculating, for each factor, the indicators needed to calculate the factor progress information score from the progress information of the gazettes belonging to that factor, and of calculating the patent impact index and the progress information index from those indicators.

FIG. 7 shows, for each of factors 1 to 16, an example of the examination request rate, patent registration rate, registration decision rate, accelerated examination request rate, ratio of citations by other companies, ratio of appeals against examiner's decisions, ratio of oppositions filed, and progress information index. FIG. 8 shows, for each of factors 1 to 16, an example of the number of gazettes, the number of citations by other companies, the number of oppositions, and the patent impact index.

The table in FIG. 9 shows the factor progress information score calculated by the score calculation unit 206 from the patent impact index and the progress information index, and the factor progress information score per publication (average) that the score calculation unit 206 calculates by dividing the factor progress information score by the number of documents. For reference, the eigenvalue of each factor in the factor analysis is also shown. The factor numbers in the leftmost column are common to all of the figures. In FIGS. 4 to 8, 11, and 12 the factors are arranged in order of factor number, whereas in FIG. 9 they are arranged in descending order of factor progress information score.
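The per-publication average and the descending ordering used in FIG. 9 can be sketched in Python as follows. The scores and document counts below are hypothetical and are not taken from the figures, and `rank_factors` is an illustrative name; how the factor progress information score itself is derived from the patent impact index and the progress information index is not reproduced here.

```python
def rank_factors(scores, doc_counts):
    """Return rows (factor, score, score per publication),
    sorted by score with the highest first."""
    rows = []
    for factor, score in scores.items():
        n_docs = doc_counts[factor]
        average = score / n_docs if n_docs else 0.0
        rows.append((factor, score, average))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical factor progress information scores and document counts.
scores = {13: 60.0, 4: 120.0, 5: 90.0}
doc_counts = {13: 6, 4: 40, 5: 45}

ranking = rank_factors(scores, doc_counts)
for factor, score, average in ranking:
    print(factor, score, average)
```

A factor with few publications can rank low by total score while having a high per-publication average, which is why both values are reported.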

As can be seen from FIG. 9, outputting the factor progress information score of each factor makes it clear which factor is most important in the document group to be analyzed. The eigenvalue of each factor indicates how much of the data that factor explains, and FIG. 9 shows that it is independent of the importance, in terms of the factor progress information score, of the documents included in each factor.

FIG. 10 shows an example in which the output device 4 outputs the factor progress information scores calculated by the score calculation unit 206 in this embodiment. In FIG. 10, each circle represents a factor, the vertical position of each circle represents the factor progress information score of that factor, and the horizontal position represents the factor progress information score per publication (the average value per publication attributed to the factor). The size of each circle indicates the number of publications belonging to the factor, and the technical terms attached to each circle are the index words belonging to the factor.

According to this, for example, although factor 13 (exhaust, NOx) has a small number of publications, it can be inferred to be an extremely important factor from the viewpoint of progress information. By focusing on factors with high factor progress information scores, such as factor 4 (Lee, N) and factor 5 (downstream, upstream, catalyst, deterioration, diagnosis), attention can be directed to the important technical elements in the document group under investigation.

Specifically, for example, factor 5 (downstream, upstream, catalyst, deterioration, diagnosis) in FIG. 10 is presumed, from the combination of the index words “catalyst”, “deterioration”, and “diagnosis”, to be a technical element representing a so-called advanced on-board diagnosis system, which automatically detects performance deterioration of exhaust gas reduction devices such as catalyst systems and notifies the driver. With recent exhaust gas regulations, various councils and local governments have called for the introduction of advanced on-board diagnosis systems that monitor exhaust gas reduction devices, such as catalysts for reducing NOx, for malfunctions, detect them, and notify the driver. The importance of this technical element appears to be reflected in factor 5, together with meaningful index words.

[0080] In the above example, the factor progress information score is calculated for each factor by taking the publication group belonging to each factor as the “score calculation target document group”. However, the present invention is not limited to this. The publication group belonging to each factor may be further classified by applicant, and the group of publications belonging to each combination of factor and applicant may be taken as the “score calculation target document group”.

The table shown in Fig. 11 shows the calculation results of factor progress information scores for each factor and for each applicant.

The method of calculating the factor progress information score for each factor and each applicant is the same as for the factor progress information score (for each factor) described above, except that the score calculation target document group is different. Because the documents are classified by factor and by applicant, the number of documents in each score calculation target document group is likely to be small. The examination request rate and similar indicators, however, range from 0 to 1 whether or not the documents are classified by applicant, and do not necessarily become small. Therefore, summing the per-applicant factor progress information scores of a factor over all applicants does not reproduce the factor progress information score of that factor (without classification by applicant).
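The non-additivity noted above can be seen with a single rate-type indicator. The following Python sketch uses only the examination request rate, with four hypothetical publications of one factor split between two hypothetical applicants; it is not the full factor progress information score, and the record layout is an assumption.

```python
def examination_request_rate(docs):
    """Fraction of documents for which examination was requested (0 to 1)."""
    return sum(1 for d in docs if d["requested"]) / len(docs)


# Hypothetical publications belonging to one factor.
docs = [
    {"applicant": "X", "requested": True},
    {"applicant": "X", "requested": False},
    {"applicant": "Y", "requested": True},
    {"applicant": "Y", "requested": True},
]

# Classify the publications of the factor by applicant.
by_applicant = {}
for d in docs:
    by_applicant.setdefault(d["applicant"], []).append(d)

overall = examination_request_rate(docs)
per_applicant = {a: examination_request_rate(ds) for a, ds in by_applicant.items()}
summed = sum(per_applicant.values())
print(overall, per_applicant, summed)
```

Each per-applicant rate stays between 0 and 1 however small the group is, so their sum differs from the rate of the unclassified group.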

FIG. 12 is an example in which the publication group belonging to each factor is further classified for each applicant in this embodiment, and the factor progress information score is shown for each factor and for each applicant. Specifically, the score calculation unit 206 further classifies the publication group belonging to each factor for each applicant, and calculates a factor progress information score for each factor and for each applicant. In this example, the output device 4 generates a diagram based on the calculated factor progress information score and outputs the diagram.

In FIG. 12, factors are arranged along the horizontal direction and applicants along the depth direction, and the height indicates the factor progress information score for each factor and each applicant. Note that FIG. 12 shows the factor progress information scores only for the ten applicants with the most applications among all publications whose factor score is at or above a certain level.

Specifically, for example, it can be seen that Company A is strong in factors 4 and 5, which were found in FIGS. 9 and 10 to have high factor progress information scores (Company A is the automaker whose publications were used to extract the document group to be analyzed). In this way, it is possible to grasp the strengths and weaknesses of the company relative to other companies, for example that Company A holds a strong position in factors 4 and 5 whereas its position in factor 13 differs.

According to this, for each factor, it is possible to ascertain the ranks and shares of applicants of patent publications belonging to the factor. As a result, it is possible to grasp the characteristics of a given technical field from the viewpoint of the competitive state of the company that is the development entity.

The present invention is not limited to the embodiment described above, and various modifications can be made within the scope of the gist of the present invention.

In the above embodiment, the case where the technical documents to be analyzed are patent gazettes is taken as an example, but the invention is not particularly limited to this. The technical documents to be analyzed may be technical papers. In this case, the factor progress information score may be obtained using the number of technical papers belonging to each factor, the number of citations, and the like.

Further, in the above embodiment, the predetermined weight applied to the “number of documents” of the score calculation target document group is exemplified as being determined by at least one, or all, of the “total number of citations by other companies”, the “total number of oppositions to patents”, and the “total number of requests for patent invalidation trials”, but the weight is not limited to this. For example, it may be determined using patent profitability, patent productivity, patent utilization, or patent competitiveness.

Further, in the above embodiment, each functional unit of the processing device 1 (the text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, document number determination unit 201, progress information reading unit 202, index calculation unit 203, patent impact calculation unit 204, progress information calculation unit 205, and score calculation unit 206) is realized by software as an example. However, the present invention is not limited to this. Each functional unit of the processing device 1 may be realized by a circuit designed exclusively for executing that functional unit, such as an ASIC (Application Specific Integrated Circuit).

Further, in the above embodiment, the processing device 1 acquires the patent publications to be analyzed from the storage device 3 as an example, but the invention is not particularly limited to this. For example, the processing device 1 may communicate with an external information providing server via a network such as the Internet and acquire the patent gazettes from that server.

In the above embodiment, the “total number of oppositions to patents” or the “total number of requests for patent invalidation trials” is used as an index for calculating the patent impact index, but this is only an example. Both the “total number of oppositions to patents” and the “total number of requests for patent invalidation trials” may be used as indexes for calculating the patent impact index. Similarly, in the above embodiment, the “ratio of the number of oppositions to patents” or the “ratio of the number of requests for patent invalidation trials” is used as an index for calculating the progress information index, but this is also only an example. Both the “ratio of the number of oppositions to patents” and the “ratio of the number of requests for patent invalidation trials” may be used as indexes for calculating the progress information index.

[0083] <5. Variation>

Next, a modification of the above-described embodiment of the present invention will be described.

In this modification, among the processes performed in the above-described embodiment, a factor evaluation value different from the factor progress information score is calculated (hereinafter, the factor evaluation value calculated in the modification is referred to as the “technical element score”). Here, a technical element refers to an extracted factor, named on the basis of the technical documents contained in the factor and the content of the factor as represented by its index words. In the following explanation, a case where patent documents such as patent gazettes are used as the documents to be analyzed is taken as an example.

Hereinafter, modified examples of the present embodiment will be described in detail with reference to FIGS. 13 to 20. In the description of the modified example of the present embodiment, the same reference numerals are used for the same configurations as in the above embodiment. Further, in the description of this modification, the description will focus on the parts different from the above embodiment, and the description of the same configuration will be omitted.

[0085] <5-1. Configuration of Modified Example>

First, FIG. 13 shows a configuration of a modification of the present embodiment. FIG. 13 is a functional block diagram of a document group analysis apparatus according to a modification of the present embodiment.

As shown in the figure, the document group analysis apparatus includes an input device 2, a recording device 3, an output device 4, and a processing device 100.

The input device 2, the recording device 3, and the output device 4 are the same as those in the above embodiment. In response to a request from the input device 2, the processing device 100 classifies the patent documents stored in the recording device 3 by factor, following the procedure described above for the embodiment. The processing device 100 also calculates a technical element score for the factor designated by the user via the input device 2.

[0087] Specifically, the processing device 100 includes a text data acquisition unit 101, a document vector acquisition unit 102, a factor calculation unit 103, an attribution factor determination unit 104, a progress information reading unit 202, a score calculation unit 2060, and a patent score calculation unit 2070. The text data acquisition unit 101, the document vector acquisition unit 102, the factor calculation unit 103, the attribution factor determination unit 104, and the progress information reading unit 202 have the same functions as in the above embodiment, so their description is omitted here.

[0088] The score calculation unit 2060 accepts a request to calculate a technical element score from the user via the input device 2, in a state where the attribution factor determination unit 104 has classified the patent documents by factor and associated index words with each factor.

When receiving the calculation request for the technical element score, the score calculation unit 2060 calculates a technical element score using a “patent score (PS)” indicating an evaluation value for each patent document belonging to the calculation target factor. The “patent score (PS)” is calculated in advance by the patent score calculation unit 2070 shown below.

[0089] For each patent document, the patent score calculation unit 2070 calculates a “patent score (PS)” that evaluates the patent document, using the progress information of the patent document (such as whether priority is claimed and the number of times it has been cited in the examination of other patent applications) and its content information (claims). Then, for each piece of information identifying a patent document (gazette number), the patent score calculation unit 2070 generates information (hereinafter referred to as “PS information”) that associates the “patent score (PS)” of the patent document with “abandonment information” indicating whether the patent right has been abandoned (including information indicating whether a rejection has become final).

Next, the hardware configuration of the processing device 100 will be described. As in the above embodiment, the processing device 100 is realized by a computer equipped with a CPU (Central Processing Unit), a memory, and an I/F that exchanges data with external devices (the input device 2, recording device 3, output device 4, and so on). Each functional unit of the processing device 100 (the text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, progress information reading unit 202, score calculation unit 2060, and patent score calculation unit 2070) is assumed to be realized by software.

Specifically, the memory of the processing device 100 stores a program for realizing each functional unit (the text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, progress information reading unit 202, score calculation unit 2060, and patent score calculation unit 2070). Each functional unit of the processing device 100 is realized by the CPU executing the program stored in the memory.

[0092] <5-2. Calculation process in modified example>

Next, a technical element score calculation process according to a modified example different from the above-described embodiment will be described.

FIG. 14 is a flowchart showing the procedure of a technical element score calculation process according to a modification of the embodiment of the present invention.

The processing flow below is assumed to be performed after the attribution factor of each document is determined (S104) and before the data of the documents or document groups belonging to each factor are output (S105) in the factor extraction process of FIG. 2(A) described above. It is also assumed that the “information that associates patent documents with the factor to which they belong (the information shown in FIG. 6)” obtained by the factor extraction process of FIG. 2(A) is stored in a predetermined area of the memory of the processing device 100.

It is also assumed that the patent score (PS) of each patent document belonging to each factor has been calculated by the patent score calculation unit 2070 before this processing. The memory (or storage device 3) of the processing device 100 stores, for each piece of information identifying a patent document (gazette number), information (hereinafter referred to as “PS information”) that associates the “patent score (PS)” of the patent document with “abandonment information” indicating whether the patent right has been abandoned (including information on whether a rejection has become final). The procedure for calculating the patent score (PS) will be described later.

Specifically, the score calculation unit 2060 receives a request for technical element score calculation processing from the user via the input device 2 (S2010). When requesting the calculation, the user also specifies the category to be calculated.

For example, the factors obtained by the factor extraction process of FIG. 2(A) may be specified as the category to be calculated. In this case, a technical element score is calculated for each factor.

Alternatively, for example, the patent gazettes belonging to each factor may be further classified by applicant, and the combination of factor and applicant may be specified as the category to be calculated. In this case, a technical element score is calculated for each factor and for each applicant.

In the following, an example is given in which a request for calculating a technical element score for a certain factor is received.

Next, the score calculation unit 2060 acquires the patent scores (PS) of the patent documents belonging to the factor for which calculation of the technical element score was requested in S2010 (S2020).

Specifically, the score calculation unit 2060 uses the “information in which patent documents are associated with each factor (the information shown in FIG. 6)” and the “PS information” stored in the memory of the processing device 100 to obtain the “patent score (PS)” and “abandonment information” of the patent documents belonging to the factor to be calculated.

[0095] Next, the score calculation unit 2060 uses the “patent score (PS)” and “abandonment information” of the patent documents belonging to the factor to be calculated, obtained above, to obtain the standard value of each patent score (PS) whose right has not been abandoned (S2030).

[0096] Specifically, the score calculation unit 2060 refers to the “abandonment information” and identifies, among the patent documents belonging to the designated factor, the patent scores (PS) of the patent documents whose rights have not been abandoned (including patent applications pending at the JPO).

For each identified patent score (PS), the score calculation unit 2060 obtains its standard value within the population (for example, the patent documents in the analysis target document group subjected to the factor extraction process whose rights have not been abandoned). More specifically, the score calculation unit 2060 obtains the standard value of each identified patent score (PS) using the following (Equation 1) and the identified patent scores (PS).

[0097] In the following (Equation 1), it is assumed that there are m patent scores (PS) of patent documents whose rights have not been abandoned, and each is written with a subscript i as PSi (1 ≤ i ≤ m, where m is an integer greater than or equal to 1).

In (Equation 1), the standard value of the “patent score PSj” of each patent document j belonging to a specific factor is obtained from the PSi of m patent documents.

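[Equation 1]

$$\langle\text{Standard value of patent } j \text{ in the population}\rangle = \widetilde{PS}_j = 10 \times \frac{PS_j - \overline{PS}}{S_{PS}}, \qquad \overline{PS} = \frac{1}{m}\sum_{i=1}^{m} PS_i \qquad \text{(Equation 1)}$$

Here $\overline{PS}$ is the average of the $m$ patent scores $PS_i$ in the population and $S_{PS}$ is their standard deviation ($S_{PS} \neq 0$).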
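The surrounding text describes the standard value of (Equation 1) as an ordinary standard value multiplied by a coefficient of 10, computed over the patents whose rights have not been abandoned. A minimal Python sketch of S2030 under that reading follows. The use of the population standard deviation (`statistics.pstdev`), the layout of `ps_info`, the gazette numbers, and the scores are all assumptions made for illustration, not details confirmed by the specification.

```python
from statistics import mean, pstdev


def standard_values(ps_info, scale=10.0):
    """Return {gazette number: standard value} for patents not abandoned.

    `ps_info` maps a gazette number to (patent score, abandoned flag).
    Assumes standard value = scale * (PS - mean) / standard deviation.
    """
    # Keep only patents whose rights have not been abandoned.
    active = {num: ps for num, (ps, abandoned) in ps_info.items() if not abandoned}
    avg = mean(active.values())
    sd = pstdev(active.values())
    if sd == 0:
        raise ValueError("standard deviation is zero; standard values are undefined")
    return {num: scale * (ps - avg) / sd for num, ps in active.items()}


# Hypothetical PS information: gazette number -> (patent score, abandoned?).
ps_info = {
    "P1": (2.0, False),
    "P2": (4.0, False),
    "P3": (6.0, False),
    "P4": (9.0, True),  # abandoned, so excluded from the population
}

values = standard_values(ps_info)
for number, value in values.items():
    print(number, round(value, 3))
```

Because the abandoned patent is removed before the mean and standard deviation are taken, it affects neither the population statistics nor the resulting standard values.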

[0098] Next, the score calculation unit 2060 obtains the total of those standard values of the patent scores PSj of the patent documents belonging to the specific factor, obtained in S2030, that are equal to or greater than the threshold, and sets the total as the “technical element score” of the factor (S2040). In this step, the score calculation unit 2060 also obtains the maximum value among the standard values of the patent scores PSj of the patent documents belonging to the specific factor obtained in S2030.

[0099] Specifically, the score calculation unit 2060 calculates the “technical element score” of the factor specified by the user using the following (Equation 2) and the standard values of the patent scores PSj obtained in S2030. The score calculation unit 2060 also selects the maximum (MAX) standard value from the standard values of the patent scores PSj obtained in S2030 and sets it as the maximum value in the factor.

In (Equation 2), n denotes the number of standard values in the factor, among the standard values of the patent scores PSj obtained in S2030, that are equal to or greater than the threshold. As an example, (Equation 2) uses 0 as the threshold PS_std, which is the population average of the standard values of the patent scores PSi obtained by (Equation 1).

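[0100] [Equation 2]

$$\langle\text{Technical element score}\rangle = \sum_{j=1}^{n} \widetilde{PS}_j \quad \left(\widetilde{PS}_j \ge PS_{std}\right) \qquad \text{(Equation 2)}$$

$\widetilde{PS}_j$: standard value of the patent score $PS_j$ obtained by (Equation 1)

$PS_{std}$: threshold; the default is 0 (= average)

$m$: number of patents in the population whose rights have not been abandoned (the standard deviation of their patent scores being nonzero)

$n$: number of patents in the factor whose standard value is $PS_{std}$ or more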

[0101] Then, when the technical element score has been calculated by the score calculation unit 2060, the process proceeds to S105 (output).

In the flow of FIG. 14, the technical element score for one factor is calculated, but this is just an example. When receiving a request to calculate the technical element score of a plurality of factors, the processing of S2020 to S2040 is performed for each factor, and the technical element score and the maximum value are obtained for each factor.
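Step S2040, repeated for several factors, can be sketched in Python as follows. The sketch assumes the standard values of S2030 are already available per factor; the function names and the numbers are hypothetical. The threshold comparison uses “equal to or greater than”, following the wording of S2040, and the default threshold of 0 follows the example given for (Equation 2).

```python
def technical_element_score(standard_values, threshold=0.0):
    """Return (sum of standard values >= threshold, maximum standard value).

    Expects at least one standard value for the factor.
    """
    kept = [v for v in standard_values if v >= threshold]
    return sum(kept), max(standard_values)


def score_all_factors(values_by_factor, threshold=0.0):
    """Apply S2040 to every factor: {factor: (score, maximum)}."""
    return {
        factor: technical_element_score(values, threshold)
        for factor, values in values_by_factor.items()
    }


# Hypothetical standard values of non-abandoned patents, grouped by factor.
values_by_factor = {
    4: [12.0, 3.5, -4.0],
    5: [-1.0, -6.5],
}

results = score_all_factors(values_by_factor)
for factor, (score, maximum) in results.items():
    print(factor, score, maximum)
```

A factor whose patents all fall below the threshold receives a technical element score of 0, while its maximum value is still reported.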

[0102] In S105 of Fig. 2 (A), the output device 4 outputs the technical element score obtained in S2040. Alternatively, the output device 4 outputs the maximum value of the factor together with the technical element score.

The score calculation unit 2060 may also calculate the technical element score for each factor and each applicant, generate information representing a graph of the technical element score for each factor and each applicant, and have the output device 4 output it. In this case, the technical element score and the maximum value may be shown for each factor and for each applicant.

As described above, in this modification the technical element score is calculated using the patent scores (PSi) of patent documents whose rights have not been abandoned. The reason is as follows. Suppose, for example, that a company's patents are to be evaluated by technical field. The number of patent documents classified into a technical field (factor) may be very large even though many of them have been abandoned (or are applications for which a decision of rejection has become final). In such a case, if applications that have already been abandoned (or for which rejection has become final) are included in the evaluation of patents in that technical field, a technical field in which few patents are actually held is evaluated highly, and proper analysis is not possible.

Therefore, in this modification, the technical element score is calculated using the patent score (PSi) of the patent document that has not been waived to improve the accuracy of the score.

[0104] In addition, in this modification, the standard value of the patent score (PSi) is calculated not as a plain standard value but as the ordinary standard value multiplied by a coefficient (10 in (Equation 1)). This makes the obtained standard values easier to tell apart. The factor of 10 in (Equation 1) is only an example.

[0105] In this modification, only the standard values of the patent scores PSi that are equal to or greater than the threshold are used for calculating the technical element score. This is to mitigate the influence of the number of patent documents on the value of the technical element score.

For example, suppose that a technical element score is obtained for each applicant and for each factor, and the technical tendency of each applicant is analyzed by comparing the obtained technical element scores. In this case, if the threshold is not taken into consideration, unlike in this modification, the technical element score of an applicant with a large number of applications tends to be too high, and a highly accurate analysis may not be possible.

Certainly, if the patents in a specific technical field are extracted without excess or deficiency and used as the analysis target document group (population), the number of applications for each applicant and each factor can itself be regarded as a sufficiently significant value. However, if the analysis target document group (population) is extracted by some other method, the number of applications varies with the characteristics of the industry to which each applicant belongs, and relying on the number of applications for each factor may prevent a highly accurate analysis. In addition, if the main objective is to select important elements from an analysis target document group (population) containing a huge number of patents, it may be preferable to place more weight on having even a few patents of high importance than on having many patents of low importance.

For this reason, in this modification, only the standard value of the patent score PSi that is equal to or higher than a predetermined value is used, and a high technical element score is given only to factors including important patents that are higher than the predetermined value. In this way, the accuracy of the technical element score was improved.

In particular, for example, when the patent scores are standardized so that the average becomes 0 and the standard values at or above the average (0) are summed to obtain the technical element score, the patent scores below the average are discarded, and even if there are many patent scores near the average, their effect on the value of the technical element score is small. Therefore, the influence of the number of patents contained in a technical element can be further mitigated, and technical elements containing highly important patents can be extracted accurately.
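The effect described above can be illustrated per factor and per applicant. In the following Python sketch, applicant “P” has many patents whose standard values lie near or below the average, while applicant “Q” has only two patents with high standard values; the applicants, the record layout, and all numbers are hypothetical, and the standard values are assumed to have been computed beforehand.

```python
from collections import defaultdict


def scores_by_factor_and_applicant(records, threshold=0.0):
    """Sum standard values >= threshold for each (factor, applicant).

    `records` is an iterable of (factor, applicant, standard value).
    """
    totals = defaultdict(float)
    for factor, applicant, value in records:
        # Values below the threshold contribute nothing, but the
        # (factor, applicant) key is still created with a score of 0.
        totals[(factor, applicant)] += value if value >= threshold else 0.0
    return dict(totals)


records = (
    [(1, "P", 0.1)] * 10     # many patents just above the average
    + [(1, "P", -3.0)] * 5   # below-average patents, discarded
    + [(1, "Q", 15.0), (1, "Q", 12.0)]  # few but important patents
)

totals = scores_by_factor_and_applicant(records)
for key, score in totals.items():
    print(key, round(score, 3))
```

Although “P” has fifteen patents in the factor and “Q” only two, the score of “Q” is far higher, which is the behavior the thresholding is intended to produce.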

FIG. 15 is a diagram showing the distribution of the technical element score of this modification and the factor progress information score of the embodiment in relation to the number of publications. Specifically, for multiple factors extracted from a certain analysis target document group, factor progress information scores were calculated and standardized to mean 0 and variance 1, and technical element scores were calculated for the same factors and likewise standardized to mean 0 and variance 1. The standardized factor progress information score and technical element score are plotted on the vertical axis, and the number of publications for each factor is plotted on the horizontal axis.

As shown in the figure, the factor progress information score has a distribution close to a straight line showing a direct proportional relationship with the number of publications, and is greatly influenced by the number of publications. On the other hand, although the technical element score is not completely irrelevant to the number of publications, it is distributed in a region far from the straight line showing the direct proportional relationship, which shows that the influence of the number of publications is mitigated.

[0107] FIG. 16 shows an example in which, after the attribution factor of each document is determined (S104) in the factor extraction process of FIG. 2 (A) described above, the publication group belonging to each factor is further classified by applicant, and the technical element score of this modification is calculated for each factor and for each applicant. The specific calculation method is as follows.

A cluster analysis based on similarity was performed on the published patent applications and patent publications issued in Japan by a household equipment manufacturer (Company a) to obtain multiple clusters. Among these clusters, it was decided to analyze in detail the relationship with the patents of other companies surrounding Company a, in particular for the cluster related to “composite structures”. Therefore, out of approximately 4 million publications, including published patent applications and patent publications of Company a and other companies, about 3000 publications with high similarity to the publication group belonging to the “composite structure” cluster were extracted.

A cluster analysis was performed on the approximately 3000 publication groups based on the similarity of the document vectors of each publication, and among these, 323 publications belonging to a specific cluster were extracted as factors to be analyzed. The score calculation unit 2060 further classifies the publication group belonging to each factor for each applicant, and calculates the technical element score for each factor and each applicant. Then, the output device 4 generates a diagram based on the calculated technical element score, and outputs the diagram.

In FIG. 16, factors are listed in the horizontal direction, applicants are listed in the depth direction in order of the number of applications, and the height direction indicates the technical element score for each factor and each applicant. However, FIG. 16 shows only the technical element scores calculated for the applicants ranking in the top 10 by number of applications among all publications whose factor score exceeds a certain level.

[0108] In this modification, the technical element score is calculated by adding the patent scores that are at or above the average, excluding publications that are below the average. For a factor whose patents are mostly or entirely below the average, the technical element score is therefore 0 or close to 0. As a result, the contrast between factors becomes clear, and the ranking and evaluation of the factors can be easily grasped visually.
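As a minimal sketch of the per-factor, per-applicant aggregation, the following Python function sums the standardized patent scores at or above a threshold for each (factor, applicant) pair. The function name, the tuple layout of the input, and the factor and applicant labels used in the usage note are hypothetical and serve only as illustration.

```python
def scores_by_factor_and_applicant(publications, threshold=0.0):
    """Aggregate standardized patent scores per (factor, applicant).

    publications: iterable of (factor, applicant, standard_value) tuples.
    Values below the threshold are excluded from the sum, so a pair whose
    publications are all below the threshold receives a score of 0.
    """
    totals = {}
    for factor, applicant, standard_value in publications:
        key = (factor, applicant)
        totals.setdefault(key, 0.0)
        if standard_value >= threshold:
            totals[key] += standard_value
    return totals
```

For instance, an applicant "g" with a single publication of standard value 2.0 in factor "F1" scores higher in that factor than an applicant "a" with two publications of standard values 1.5 and -0.5, even though "a" has more publications.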

Looking at the output results in FIG. 16, it can be seen that company g, which has only a small number of applications in the analysis target document group, nevertheless holds, in a certain technical element, patents comparable to those of companies a and b, which have a large number of applications. In addition, it is clear at a glance where the strengths of Company a, the top applicant by number of applications, lie, and in which technical elements it lags behind other companies. In this way, it is possible to clearly grasp the strengths and weaknesses of each company.

[0109] It should be noted that although the average of the population is used as the threshold value in the present modification, the threshold is not particularly limited to this. For example, the average of the standard values of the patent score PSi within the patent group of a specific applicant, or another threshold value determined by the user, may be set in the processing apparatus 100. Further, although the standard value of the patent score PSi is used in this modification, the invention is not limited to this. For example, the effect of the number of cases can be mitigated even when non-standardized patent scores PSi are simply added.

[0110] Also, according to this modification, when the technical element score is presented to the user, the highest standard value of the patent score (PSj) among the patent documents classified into the factor can be presented as well. As a result, the user can grasp which technical elements (factors) contain highly evaluated patents, even if the evaluation value of the technical element (factor) as a whole is low. For example, suppose that the technical element score for each factor of a company (applicant) is obtained in order to evaluate its patents. In this case, by presenting the highest value for each factor, it becomes possible to grasp in which technical fields the company has strong patents.

[0111] <6. Patent Score (PS)>

Next, the patent score (PS) used to calculate the technical element score in the above modification will be described with reference to FIGS.

In the following description, the calculation of the patent score (PS) is performed by the patent score calculation unit 2070 of the processing apparatus 100. The present invention is not particularly limited to this.

Another computer having a CPU (Central Processing Unit), a memory, and the like may perform the calculation of the patent score. In this case, a program (PS calculation program) for realizing the function of the patent score calculation unit 2070 is stored in the other computer. The CPU of the other computer then executes the “PS calculation program”, thereby calculating the patent score PS and generating the above-described PS information. The processing apparatus 100 acquires the PS information generated by the other computer and stores it in the memory. When another computer performs the patent score calculation process, it is not necessary to provide the patent score calculation unit 2070 in the processing apparatus 100.

[0112] <6-1. Data structure>

First, the data structure used for calculating the patent score PS will be described. The storage device 3 stores patent data (electronic data indicating a patent gazette) and patent attribute information. The electronic data indicating the patent gazette includes at least bibliographic information such as the patent data ID (gazette number, etc.), filing date, and IPC code.

In addition, the patent attribute information includes progress information 300 of the patent document (information such as whether priority is claimed and the number of times the document has been cited in the examination of other patent applications) and content information 400 (information such as the number of claims and the number of specification pages). Hereinafter, the data structure of the progress information 300 and the content information 400 will be described.

First, FIG. 17 shows an example of the data configuration of the progress information 300.

FIG. 17 is a diagram schematically showing an example of the data structure of the progress information used in the modification of the present embodiment.

As shown in the figure, one record of the progress information 300 consists of: a field 301 for registering the “patent data ID (gazette number, etc.)”; a field 302 for registering “the filing date, etc.”; a field 303 for registering the “number of days elapsed from the examination request date”; a field 304 for registering the “number of days elapsed from the registration date”; a field 305 for registering information indicating the presence or absence of a “divisional application”; a field 306 for registering information indicating the presence or absence of “early examination”; a field 307 for registering information indicating the presence or absence of a “patent decision in an appeal against a decision of refusal”; a field 308 for registering information indicating the presence or absence of a “maintenance decision in an opposition”; a field 309 for registering information indicating the presence or absence of an “invalidation trial maintenance decision”; a field 310 for registering information indicating the presence or absence of a “priority claim”; a field 311 for registering information indicating the presence or absence of a “PCT application”; a field 312 for registering information indicating the presence or absence of “file inspection” (a request to inspect the application file); and a field 313 for registering information indicating the “number of times cited”. The progress information 300 includes a plurality of records.

Here, “elapsed days from application”, “elapsed days from examination request”, and “elapsed days from registration date” are information relating to the period of the corresponding patent data. “Elapsed days from application” is counted from the filing date, “elapsed days from examination request” from the date of the request for examination, and “elapsed days from registration date” from the registration date of the patent right. In each case, the number of days up to the evaluation date (the patent score calculation date) or a date near the evaluation date is calculated and stored in the storage device 3. For a patent application for which examination has not yet been requested, “elapsed days from examination request” is NULL. For a patent application that has not yet been registered, “elapsed days from registration date” is NULL.

[0115] Among the progress information 300, “divisional application”, “early examination”, “patent decision in an appeal against a decision of refusal”, “maintenance decision in an opposition”, “invalidation trial maintenance decision”, “priority claim”, “PCT application”, and “file inspection” are information indicating the presence or absence of a predetermined act on the patent data. “Divisional application” indicates whether a divisional application has been filed based on the patent application. “Early examination” indicates whether the patent application has undergone expedited examination. “Patent decision in an appeal against a decision of refusal” indicates whether an appeal against a decision to reject the patent application was requested and a decision to grant a patent was made in that appeal. “Maintenance decision in an opposition” indicates whether an opposition was filed against the patent and a decision to maintain the patent was made. “Invalidation trial maintenance decision” indicates whether a trial for invalidation was requested and a decision rejecting the request was made in that trial. “Priority claim” indicates whether the patent application is accompanied by a priority claim based on an earlier patent application. “PCT application” indicates whether the patent application is an international application based on the Patent Cooperation Treaty. “File inspection” indicates whether a request to inspect the file has been made. In each case, for example, 1 is given when the predetermined act has been performed and 0 is given when it has not.

Next, the data structure of the content information 400 is shown in FIG.

FIG. 18 is a diagram schematically showing an example of the data configuration of the content information used in the modification of the present embodiment.

As shown in the figure, one record of the content information 400 consists of a field 401 for registering the “patent data ID (gazette number, etc.)”, a field 402 for registering the “number of claims” of the patent data, a field 403 for registering the “average number of characters per claim”, and a field 404 for registering the “number of specification pages” of the patent data. The content information 400 includes a plurality of records.

Here, the “number of claims” is information indicating the number of claims of the patent application. “Average number of characters” is information indicating the average number of characters (or the number of words) per claim of the patent application. “Number of specification pages” is information indicating the number of specification pages or the number of publication pages of the patent application. This information is extracted from the published patent gazette and other patent data of each patent application.
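The two record layouts described above could be modeled, for illustration only, as the following Python data classes. The class and attribute names are hypothetical. Field 302 is represented here as elapsed days since filing, which is an assumption based on the later use of fields 302 to 304 as period information; NULL is represented as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressRecord:
    """One record of the progress information 300 (names are hypothetical)."""
    patent_id: str                            # field 301
    days_since_filing: int                    # field 302 (assumed elapsed days)
    days_since_exam_request: Optional[int]    # field 303, None when NULL
    days_since_registration: Optional[int]    # field 304, None when NULL
    divisional_application: int = 0           # field 305, 1 = present
    early_examination: int = 0                # field 306
    appeal_patent_decision: int = 0           # field 307
    opposition_maintained: int = 0            # field 308
    invalidation_trial_maintained: int = 0    # field 309
    priority_claim: int = 0                   # field 310
    pct_application: int = 0                  # field 311
    file_inspection: int = 0                  # field 312
    times_cited: int = 0                      # field 313


@dataclass
class ContentRecord:
    """One record of the content information 400 (names are hypothetical)."""
    patent_id: str                   # field 401
    number_of_claims: int            # field 402
    average_chars_per_claim: float   # field 403
    specification_pages: int         # field 404
```

The two record types share `patent_id`, which is what allows the progress and content records of the same patent data to be looked up together.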

[0118] <6— 2. Patent Score Calculation Processing>

Next, the patent score calculation process will be described with reference to FIG. 19. FIG. 19 is a flowchart showing the procedure of a patent score calculation process according to a modification of the present embodiment.

As shown in FIG. 19, the patent score calculation unit 2070 receives input of an IPC code from the user, and acquires patent data (electronic data indicating a patent publication) (S400).

Specifically, when receiving an IPC code input from a user, the patent score calculation unit 2070 accesses the storage device 3 and acquires the patent data classified under that IPC code. The patent data contains bibliographic information such as the filing date information of the patent application and its priority date information (only where priority is claimed).

[0120] Next, the patent score calculation unit 2070 uses the filing date information or the priority date information in the bibliographic information of the acquired patent data to classify the patent data into groups t for each predetermined period (in this modification, by the year to which the filing date or the priority date belongs) (S500).
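The classification of S500 can be sketched in Python as follows. The function name and input layout are hypothetical, and the rule that a priority date, when present, takes precedence over the filing date is an assumption; the text only states that the filing date or the priority date is used.

```python
from collections import defaultdict
from datetime import date


def group_by_year(patents):
    """Classify patent data into groups by year.

    patents: iterable of (patent_id, filing_date, priority_date_or_None).
    Assumption: the priority date is used when it exists, otherwise the
    filing date.
    """
    groups = defaultdict(list)
    for patent_id, filing_date, priority_date in patents:
        reference = priority_date or filing_date
        groups[reference.year].append(patent_id)
    return dict(groups)
```

Each resulting group is then evaluated separately, so that the denominators used in the later evaluation point formulas are totals over patent data of the same period.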

Next, the patent score calculation unit 2070 calculates an evaluation value of each patent data (S600). Details of this processing will be described with reference to FIG.

FIG. 20 is a flowchart showing details of a process for calculating an evaluation value of each patent data according to a modification of the present embodiment.

The patent score calculation unit 2070 acquires the progress information 300 and the content information 400 for the patent data belonging to a group generated by the classification of S500 (S610). Specifically, using the patent ID (gazette number, etc.) included in the bibliographic information of the acquired patent data, the patent score calculation unit 2070 acquires, from the progress information 300 and the content information 400 stored in the storage device 3, the progress information 300 and the content information 400 associated with that patent ID.

Here, in FIG. 20, it is assumed that one group consists of J acquired patent data. The subscript j (j = 1, 2, ..., J) is used to distinguish each of the J cases.

Once the J patent data have been acquired, the “total value over the J cases of the data corresponding to each evaluation item” and similar values, which are used in S6302 to S6304 described later, are obtained in advance using the progress information 300 and the content information 400 of these J patent data.

[0122] Next, the variable j is set to 1 (S620), and the evaluation score of the patent data j is calculated as follows.

[0123] First, the information registered in each field of the progress information 300 is used as an evaluation item, and for each of the I evaluation items i (i = 1, 2, ..., I), the evaluation point calculation method set in advance for that evaluation item is selected (S6301).

[0124] There are the following three evaluation point calculation methods in the modification of the present embodiment. For fields 305, 306, 307, 308, 309, 310, 311, and 312, S6302 [presence / absence type] is selected, since these fields register information indicating the presence or absence of a predetermined act. For fields 302, 303, and 304, S6303 [time decay type] is selected, since these fields register information relating to the period of the patent data. For field 313, S6304 [number-of-times type] is selected, since this field registers information indicating the number of times the patent data is cited.

[0125] After selecting the evaluation score calculation method, the evaluation score of patent data j is calculated for each of the I evaluation items i (S6302, S6303, S6304).

[0126] <6-2- 1. Calculation of evaluation score for presence / absence type>

For the evaluation item i for which S6302 [Presence / absence type] is selected, the evaluation point is calculated according to [Equation 3] below.

[Equation 3]

$$\text{evaluation point}_{i,j} = \frac{(\text{applicable data of evaluation item } i)_j}{\sqrt{\sum_{j=1}^{J} (\text{applicable data of evaluation item } i)_j}}$$

[0127] Here, the "relevance data for evaluation item i" arranged in the numerator is, for example, "1" if a divisional application has been filed as described above, and "0" if it has not been filed.

[0128] In the denominator, the positive square root of the total value within the group of the above-mentioned "data corresponding to evaluation item i" is arranged. Therefore, when many patent data in the group correspond to the evaluation item, the denominator is large, and when only a few patent data in the group correspond to the evaluation item, the denominator is small. Patents corresponding to an evaluation item with few occurrences (such as “invalidation trial maintenance decision”) tend to have a higher maintenance rate after registration than patents corresponding to an evaluation item with many occurrences (such as “file inspection”). In general, a high maintenance rate is considered to indicate a high economic value commensurate with the maintenance cost (patent fees), so each evaluation item is in effect weighted automatically. In addition, since the totals are taken within groups for each predetermined period, the tendency for newer patents to be evaluated low can be alleviated, even though older patents have more progress information attached while newer patents that have only just been published often have little progress information attached.
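The presence / absence type calculation can be sketched as follows, assuming the applicable data is 1 or 0 as described. The function name is hypothetical, and returning 0 for every patent when no patent in the group corresponds to the item is an assumption made to avoid division by zero; the specification does not address that case.

```python
import math


def presence_points(flags):
    """Evaluation points for one presence / absence evaluation item.

    flags: 0/1 applicable data of the item for each patent in one group.
    Each point is the flag divided by the positive square root of the
    group total of the flags.
    """
    total = sum(flags)
    if total == 0:
        # Assumption: if no patent in the group applies, all points are 0.
        return [0.0] * len(flags)
    denominator = math.sqrt(total)
    return [flag / denominator for flag in flags]
```

When only one of four patents in a group corresponds to the item, that patent receives 1 point, whereas when all four correspond, each receives 0.5, which reflects the automatic weighting of rare items.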

The attribute information of patent data is useful for relative evaluation within the analysis target population, but a proper evaluation cannot be performed if all patent applications or patent rights within the population are treated alike. According to this embodiment, the analysis target population is classified into groups for each period, and a value obtained for each classified group is used as the denominator, so that an appropriate relative evaluation is possible within a population that includes patent applications or patent rights from different periods. For example, within one technical field, the value of one application in a contemporaneous group with few patent applications is often higher than the value of one application in a contemporaneous group with many patent applications. On the other hand, a patent application filed several years ago is more likely to have been given progress information, such as a file inspection request, than a patent application that has only just been published. It would therefore be a mistake to evaluate a patent application low merely because it has only just been published. For example, if only a few patent applications in the group have received an inspection request, a patent application that has received one is attracting particularly high attention and should be highly evaluated. Conversely, if a large number of patent applications in the group have received an inspection request, a patent application should not be highly evaluated merely because it has received one. According to the present embodiment, for each group, the evaluation point is calculated by multiplying the value obtained from the patent attribute information of each patent data belonging to the group by the value of a decreasing function of the sum, over the group, of the values obtained from the patent attribute information of the patent data belonging to the group. With this configuration, a value that takes into account the relative position of each patent data within its group can be obtained as an evaluation value. As a result, the lower the group total of the numerical information based on the progress information, the higher the weight, and the higher the total, the lower the weight, so that patent applications or patent rights can be evaluated appropriately.

[0129] <6— 2— 2. Calculation of evaluation points for time decay type>

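For the evaluation item i for which S6303 [Time decay type] is selected, the evaluation point is calculated according to [Equation 4] below.

[Equation 4]

$$\text{evaluation point}_{i,j} = \frac{\exp\left(-\dfrac{\min(\text{elapsed time}_j,\ \text{term})}{\text{term}}\right)}{\sqrt{\sum_{j=1}^{J} (\text{applicable data of evaluation item } i)_j}}$$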

[0130] “Exp(−Min(elapsed time, term) / term)” placed in the numerator is, for “elapsed days from request for examination”, Napier's number e raised to the power obtained by taking the smaller of the elapsed time (the elapsed days converted into years) and the “term”, dividing it by the “term”, and multiplying the result by −1. Here, the “term” is the maximum number of years from the filing date to the expiration of the patent term (20 years under current Japanese law). The same formula is used for “elapsed days from registration date”, where the “term” is again the maximum number of years from the filing date to the expiration of the patent term (20 years under current Japanese law). The same formula is also used for “elapsed days from filing date”, but in that case the “term” is the number of years from the filing date to the deadline for requesting examination (3 years under current Japanese law). Accordingly, while the elapsed time is short, the value of the numerator is close to Exp(0) = 1; it decays with the passage of time and, when elapsed time ≥ term, falls to Exp(−1) = 1/e. The advantages of using an exponential function are that a depreciation effect on the value can be introduced and that the distribution of evaluation values can be made smooth rather than discrete. “Elapsed days from request for examination”, “elapsed days from filing date”, and “elapsed days from registration date” are basic evaluation items applicable to many patents.

[0131] The denominator has the same form as in S6302 [Presence / absence type] above. For “elapsed days from request for examination”, the denominator is the positive square root of the group total of a value that is, for example, 1 if examination has been requested for the patent application and 0 if not. For “elapsed days from registration date”, the denominator is the positive square root of the group total of a value that is, for example, 1 if the patent application has been registered and 0 if not. For “elapsed days from filing date”, all patent data are applicable, so if the applicable data of the evaluation item is 1, the denominator is equal to the positive square root of the number of patent data in the group. In each case, the denominator is large when many patent data in the group correspond to the evaluation item and small when only a few do. As described above, “elapsed days from request for examination”, “elapsed days from filing date”, and “elapsed days from registration date” are basic evaluation items applicable to many patents, so the evaluation points for these items tend to be smaller.
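A minimal Python sketch of the time decay type calculation follows. The function name is hypothetical, elapsed times are assumed to be already converted to years, a NULL elapsed time is represented as `None`, and giving 0 points to a patent for which the item is NULL is an assumption that the text does not state explicitly.

```python
import math


def time_decay_points(elapsed_years, term_years):
    """Evaluation points for one time decay evaluation item in one group.

    elapsed_years: elapsed time in years per patent, None where NULL.
    term_years: the "term" (for example 20, or 3 for the filing date item).
    """
    # Applicable data is 1 where the item applies, 0 where it is NULL.
    applicable = sum(1 for e in elapsed_years if e is not None)
    if applicable == 0:
        return [0.0] * len(elapsed_years)
    denominator = math.sqrt(applicable)
    points = []
    for e in elapsed_years:
        if e is None:
            points.append(0.0)  # assumption: NULL items contribute 0
        else:
            numerator = math.exp(-min(e, term_years) / term_years)
            points.append(numerator / denominator)
    return points
```

In a group of one patent, the point starts at 1 when the elapsed time is 0 and bottoms out at 1/e once the elapsed time reaches the term, so it never decays to zero.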

The evaluation score calculated in S6303 [time decay type] is further corrected by the content information. In the following, the content information 400 shown in FIG. 18 is used.

If the evaluation is based only on progress information, a patent application or patent right may not be evaluated correctly immediately after the application is published or the patent right is registered, because the progress information expected to be given in the future has not yet been given. To correct this, the content information is added to the evaluation based on the progress information. However, content information tends not to correlate with the maintenance rate as strongly as progress information does, and careless use of content information may reduce the accuracy of the evaluation. Therefore, in order to minimize the influence of the content information on the evaluation of patents with sufficient progress information and to reflect the content information effectively in the evaluation of patents with insufficient progress information, only the evaluation points calculated in S6303 [Time decay type] are multiplied by the correction coefficient based on the content information.

As described above, according to the present embodiment, the content information of each patent data can be added to the information relating to the period, which has the characteristic of being readily given to any patent data regardless of whether the application is old or new. As a result, patent data consisting of new applications with little progress information can also be evaluated appropriately.

[0133] Specifically, each evaluation point obtained by [Equation 4] above is multiplied by

$$a_1 \times a_2 \times a_3$$

where

$a_1 = 2^{1/3}$ (if the average number of characters per claim is below average) or $2^{-1/3}$ (if the average number of characters per claim is above average),

$a_2 = 2^{1/3}$ (when the total number of pages is above average) or $2^{-1/3}$ (when the total number of pages is below average),

$a_3 = 2^{1/3}$ (when the number of claims is within ±1 standard deviation of the average) or $2^{-1/3}$ (when the number of claims is outside that range).

By setting the maximum values of $a_1$, $a_2$, and $a_3$ to $2^{1/3}$ each, the maximum value of $a_1 \times a_2 \times a_3$ is capped, so that the correction remains limited. In the above embodiment, the maximum value of $a_1 \times a_2 \times a_3$ is 2.
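The correction coefficient can be sketched in Python as below. The function and parameter names are hypothetical. How a value exactly equal to the average is treated is not stated in the text, so the boundary handling here (equality falls on the $2^{-1/3}$ side for $a_1$ and $a_2$, and on the $2^{1/3}$ side for $a_3$) is an assumption.

```python
def correction_coefficient(avg_chars, mean_chars, pages, mean_pages,
                           claims, mean_claims, sd_claims):
    """Return a1 * a2 * a3 for one patent, given group statistics.

    mean_* and sd_claims are the group averages and the standard
    deviation of the number of claims (assumed to be precomputed).
    """
    up, down = 2 ** (1 / 3), 2 ** (-1 / 3)
    # a1: shorter-than-average claims raise the coefficient.
    a1 = up if avg_chars < mean_chars else down
    # a2: more pages than average raise the coefficient.
    a2 = up if pages > mean_pages else down
    # a3: a number of claims within one standard deviation of the average.
    a3 = up if abs(claims - mean_claims) <= sd_claims else down
    return a1 * a2 * a3
```

Because each factor is either $2^{1/3}$ or $2^{-1/3}$, the product always lies between 1/2 and 2, so the content information can at most double or halve a time decay type evaluation point.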

[0134] <6-2-3. Calculation of evaluation score for number-of-times type>

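For the evaluation item i for which S6304 [number-of-times type] is selected, the evaluation point is calculated according to [Equation 5] below.

[Equation 5]

$$\text{evaluation point}_{i,j} = \frac{f(\text{citation}) \times \log(n_j + 1)}{\sqrt{\sum_{j=1}^{J} f(\text{citation}) \times \log(n_j + 1)}}$$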

[0135] Here, “f(citation) × log(n_j + 1)” placed in the numerator is the weight f(citation) multiplied by the logarithm of the value obtained by adding 1 to the “number of times cited” n_j. The verification by the present inventors has shown that the maintenance rate of patent rights changes depending not only on the presence or absence of citations but also on the number of citations. The logarithm is taken because the maintenance rate tends to level off gradually as the number of citations increases.

[0136] In the denominator, the positive square root of the total value within the group of the above "f(citation) × log(n_j + 1)" is arranged. Therefore, if many patent data in the group are cited in other applications, the denominator is large, and if only a few patent data in the group are cited in other applications, the denominator is small.

[0137] In the numerator and denominator of [Equation 5] above, any positive number can be used as the weight f(citation). Different weights may also be given to the number of times cited in the patent applications of other companies (other-company citation count) $n_{j,\text{other}}$ and the number of times cited in the company's own patent applications (in-house citation count) $n_{j,\text{self}}$. In this case, the following [Equation 6] is used instead of [Equation 5] above.

[Equation 6]

As specific weights, the ratio of $f(\text{citation})_{\text{other}}$ for other-company citations to $f(\text{citation})_{\text{self}}$ for in-house citations was set to 1 : 2.

[0138] The number of times cited is highly correlated with the value of a patent. Furthermore, according to the verification by the present inventors, when the number of times cited in the examination of patent applications of other companies (other-company citations) is compared with the number of times cited in the examination of other patent applications of the same company (in-house citations), the latter was found to have a significantly higher correlation with the value of the patent. It is presumed that an invention cited in the examination of other patent applications of the same company is often a basic invention at the core of the technology implemented by that company. While recognizing that it has already applied for such a basic invention, the company has very likely also applied for improved technology and thereby built a strong patent portfolio.

According to the present embodiment, it is possible to appropriately evaluate a patent application or a patent right by considering the number of citations separately from other company citations and company citations, and reflecting the latter number more in the evaluation value.
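The number-of-times type calculation with a single weight, corresponding to [Equation 5], can be sketched as follows. The function name is hypothetical, the natural logarithm is an assumption since the base of the logarithm is not stated, and the separate weighting of other-company and in-house citations by [Equation 6] is not reproduced here.

```python
import math


def citation_points(times_cited, weight=1.0):
    """Evaluation points for the number-of-times item in one group.

    times_cited: number of times cited (n_j) for each patent in the group.
    weight: f(citation), any positive number.
    Assumption: natural logarithm.
    """
    values = [weight * math.log(n + 1) for n in times_cited]
    total = sum(values)
    if total == 0:
        # Assumption: if no patent in the group is cited, all points are 0.
        return [0.0] * len(values)
    denominator = math.sqrt(total)
    return [v / denominator for v in values]
```

Because of the logarithm, a patent cited three times receives twice the points of a patent cited once rather than three times the points, which reflects the gradual leveling off described in the text.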

[0139] <6— 2— 4. Calculation of evaluation score>

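After the evaluation point of patent data j has been calculated for all evaluation items i (i = 1, 2, ..., I), the evaluation raw score of the patent data j is calculated from these points according to [Equation 7] (S640).

[Equation 7]

$$\text{evaluation raw score}_j = \sqrt{\sum_{i=1}^{I} \left(\text{evaluation point}_{i,j}\right)^2} \quad \text{or} \quad 0$$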

As shown in this equation, the evaluation raw score is either the positive square root of the sum of squares of the I evaluation points, or 0. The evaluation raw score is set to 0 when no request for examination has been made by the deadline for requesting examination, when the application has been withdrawn or abandoned, when a decision of refusal has become final, when the patent application has otherwise lapsed, when a decision to cancel in an opposition or a decision of invalidation in an invalidation trial has become final, when the patent right has been abandoned, when the term of the patent right has expired, or when the patent right has otherwise been extinguished. This information is also read from the progress information of each patent data, and the evaluation raw score is set to 0 if any of these cases applies.

As described above, the evaluation points calculated in S6303 [time decay type] are corrected by the content information. Specifically, each of the evaluation points calculated by [Equation 4] above based on “elapsed days from examination request”, “elapsed days from filing date”, and “elapsed days from registration date” is multiplied by $a_1 \times a_2 \times a_3$, and the square root of the sum of squares is then taken according to [Equation 7].

As a method of calculating an evaluation raw score from the evaluation points i based on a plurality of evaluation items, there is a method of simply summing the evaluation points i (simple sum method). According to this calculation method, a patent to which a large amount of progress information correlated with the patent maintenance rate (economic value) has been given is evaluated highly. At first glance this seems reasonable, but care should be taken, because the evaluation raw score of a patent given a large amount of progress information that does not correlate very highly with the maintenance rate (for which many low evaluation points are calculated) may exceed the evaluation raw score of a patent given a small amount of progress information that correlates very highly with the maintenance rate.

One way to solve this problem is to use the maximum value among the evaluation points i as the evaluation raw score (maximum value method). When the correlation between a particular item of progress information and the maintenance rate of a patent group is investigated without regard to what other progress information has been given, the maintenance rate of a patent is expected to be best represented by the maintenance rate of the progress information item with the highest maintenance rate, so it is reasonable to some extent to take the maximum value of the evaluation points i as the evaluation raw score. However, if the maximum value of the evaluation points i is the same for two patents, no superiority or inferiority can be assigned between them. Furthermore, when the maximum value method is used, it is not possible to make an evaluation that takes into account the perspectives of three different parties, namely the applicant, the JPO, and competitors. Only the perspective of one of those parties is reflected, and the perspectives of the remaining parties cannot be reflected in the evaluation of the patent data.

The above-mentioned method of taking the square root of the sum of squares can be said to combine the advantages of the simple sum method and the maximum value method. In other words, by taking the square root of the sum of squares, if a certain patent data j has a high evaluation point among its I evaluation items i, that high evaluation point strongly influences the evaluation raw score, while the evaluation points of the other evaluation items are also taken into account to some extent. Therefore, a higher evaluation can be given to patent data j that corresponds to multiple items for which the evaluation point i tends to be high, such as “early examination”, “maintenance decision in an opposition”, and “invalidation trial maintenance decision”.
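The three ways of combining evaluation points discussed above can be compared with a short Python sketch. The function names are hypothetical, and the `lapsed` flag is an illustrative stand-in for the cases in which the raw score is set to 0.

```python
import math


def raw_score_sum(points):
    """Simple sum method."""
    return sum(points)


def raw_score_max(points):
    """Maximum value method."""
    return max(points)


def raw_score_rss(points, lapsed=False):
    """Square root of the sum of squares, or 0 for a lapsed application
    or right (the lapsed flag is a hypothetical simplification)."""
    if lapsed:
        return 0.0
    return math.sqrt(sum(p * p for p in points))
```

For example, a patent with six low points of 0.2 each outranks a patent with a single point of 1.0 under the simple sum (about 1.2 against 1.0) but not under the root of the sum of squares (about 0.49 against 1.0). Conversely, patents with points [1.0] and [1.0, 0.5] tie under the maximum value method, while the root of the sum of squares ranks the second higher.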

Thus, in this modification, patent evaluation is performed in consideration of all evaluation points calculated according to the type of patent attribute information (S630, S640). As a result, it is possible to evaluate the value of patent data from multiple angles.
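The three aggregation rules compared above can be illustrated with a minimal sketch. The function names and the sample evaluation points are hypothetical, chosen only for illustration; they are not taken from the specification.

```python
import math

def simple_sum(points):
    """Simple sum method: add up all evaluation points i."""
    return sum(points)

def maximum_value(points):
    """Maximum value method: keep only the largest evaluation point i."""
    return max(points)

def root_sum_of_squares(points):
    """Square root of the sum of squares of the evaluation points i."""
    return math.sqrt(sum(p * p for p in points))

# Hypothetical evaluation points (illustrative values only).
many_low = [1.0, 1.0, 1.0, 1.0, 1.0]  # many items, each weakly correlated
one_high = [4.0]                       # a single strongly correlated item
high_plus_low = [4.0, 1.0]             # same maximum, plus one extra item
```

With these values the simple sum ranks `many_low` above `one_high`, the maximum value method cannot separate `one_high` from `high_plus_low`, and the root of the sum of squares ranks `many_low` lowest and `high_plus_low` highest.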

[0141] <6— 2— 5. Calculation of Evaluation Value>

When the evaluation raw score is calculated, the logarithm thereof is calculated as the evaluation value of the patent data j (S650).

The evaluation value calculated based on the progress information or content information tends to give a high value to a small number of patent applications or patent rights from which distinctive progress or content can be read, while often giving a low value to the many other patent applications or patent rights. Therefore, looking at the distribution of counts by evaluation value, patent applications or patent rights with high evaluation values are few and sparse, and many patent applications or patent rights with low evaluation values are densely distributed.

In such a case, the average value (arithmetic mean) is strongly influenced by the small number of patent applications or patent rights with high evaluation values, so care is needed when evaluating by comparison with such an average. In addition, for example, when comparing two patent applications or patent rights that have both obtained high evaluation values, there may in fact be no significant difference between them even if their evaluation values appear to differ greatly.
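The logarithm taken in S650 compresses large raw scores, which can be seen in a minimal sketch. The natural logarithm and the sample raw scores are assumptions made for illustration; this passage does not state the base of the logarithm.

```python
import math

def evaluation_value(raw_score):
    """Evaluation value of one patent data j: logarithm of its raw score.

    The natural logarithm is an assumption of this sketch.
    """
    return math.log(raw_score)

# Hypothetical raw scores: many low values and two very high ones.
raw_scores = [1.0, 1.2, 1.5, 2.0, 2.5, 50.0, 100.0]
values = [evaluation_value(r) for r in raw_scores]

def top_gap_share(xs):
    """Gap between the two largest values as a share of the full range."""
    ordered = sorted(xs)
    return (ordered[-1] - ordered[-2]) / (ordered[-1] - ordered[0])

gap_raw = top_gap_share(raw_scores)  # gap between the two top raw scores
gap_log = top_gap_share(values)      # same gap after taking the logarithm
```

Here the gap between the two highest items takes up a much smaller share of the overall range after the logarithm than before it.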

[0142] Next, it is determined whether or not the evaluation values have been calculated for all patent data j (S660). If the calculation has not been completed (S660: NO), the process proceeds to S670, sets the variable j to j + 1, and then returns to S630 to calculate the evaluation value for the next patent data.

When the evaluation values are calculated for all the patent data j (S660: YES), the processing for calculating the evaluation values for the patent data belonging to the relevant group ends. As described above, in this embodiment, a plurality of patent data having different characteristics are evaluated in consideration of the characteristics for each technical field and each application time. As a result, the value of patent data can be more appropriately evaluated.

[0143] The evaluation value calculation processing from S610 to S670 is executed for all groups t obtained by classifying the patent data acquired in S400 in S500.

When the evaluation values are calculated for all the groups t, the process returns to FIG. 19, and based on these evaluation values, the deviation value in the analysis target population acquired in S400 is calculated as the patent score PS (S700). This deviation value also enables relative comparison of patent data between different technical fields that would otherwise be difficult to compare (for example, comparison with an analysis target population selected separately under a different IPC in S400).
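The deviation value of S700 can be sketched as below, assuming the conventional definition in which the population mean maps to 50 and one standard deviation corresponds to 10 points. That definition, the use of the population standard deviation, and the function name are assumptions; this passage does not give the formula.

```python
from statistics import mean, pstdev

def patent_scores(evaluation_values):
    """Convert evaluation values into deviation values (patent scores PS).

    Assumes the conventional form 50 + 10 * (x - mean) / std, using the
    population standard deviation of the analysis target population.
    """
    mu = mean(evaluation_values)
    sigma = pstdev(evaluation_values)
    if sigma == 0:
        # All values identical: every patent sits exactly at the mean.
        return [50.0 for _ in evaluation_values]
    return [50.0 + 10.0 * (x - mu) / sigma for x in evaluation_values]

# Hypothetical evaluation values for a small analysis target population.
scores = patent_scores([0.0, 1.0, 2.0, 3.0, 4.0])
```

Because each population is rescaled to the same mean and spread, scores from populations in different technical fields end up on a common scale.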

[0144] In the modified example of the present embodiment, the technical element score is calculated based on the patent score PS obtained by the above procedure, which has significant advantages.

Specifically, in the above modification, the patent score PS, which is the basis of the technical element score, takes into account the weight according to the type of historical information. Since the technical element score is obtained using the patent score PS, a score with higher accuracy is calculated in the modified example.

According to the patent score of this modification, the analysis target population is classified into groups for each period, and the values obtained for each classified group are used as denominators, so that appropriate relative evaluation is possible within an analysis target population that includes patent applications or patent rights from different periods.

Therefore, it is possible to reduce the possibility that a high evaluation value is calculated for the technical element score of a factor in which many patent data whose applications are old are classified.

Claims

The scope of the claims

[1] A document group analysis device characterized by comprising:

text data acquisition means for acquiring a plurality of technical documents represented by text data;

weighting amount calculation means for calculating a weighting amount of each index word for each of the acquired technical documents;

calculation means for performing factor analysis with each acquired technical document as a subject and each index word as an observation variable, using the calculated weighting amount of each index word, calculating a factor loading for each factor for each index word, and calculating a factor score for each factor for each technical document;

attribution factor determination means for determining the attribution factor of each index word using the factor loading of each index word, and determining the attribution factor of each technical document using the factor score of each technical document; and

output means for outputting each index word or index word group together with data of the technical document or technical document group belonging to the corresponding factor.

[2] The document group analysis device according to claim 1, characterized in that

the attribution factor determination means

selects, for each index word, the factor with the largest factor loading using the calculated factor loadings and specifies the selected factor as the attribution factor of the index word, and selects, for each technical document, the factor with the highest factor score using the calculated factor scores and specifies the selected factor as the attribution factor of the technical document.
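The selection rule of claim 2 amounts to picking the factor with the largest value per index word or per document. A minimal sketch follows; the function name, the sample loadings and scores, and the tie-breaking rule are hypothetical.

```python
def attribution_factor(values_by_factor):
    """Return the index of the factor with the largest value.

    Works for one index word (factor loadings) or for one technical
    document (factor scores). Ties resolve to the lowest factor index,
    which is an assumption of this sketch.
    """
    return max(range(len(values_by_factor)), key=lambda k: values_by_factor[k])

# Hypothetical factor loadings: one entry per index word, one value per factor.
loadings = {
    "battery": [0.82, 0.10, 0.05],
    "electrode": [0.64, 0.31, 0.02],
    "antenna": [0.07, 0.12, 0.77],
}
# Hypothetical factor scores: one entry per technical document.
scores = {
    "doc_A": [1.4, -0.2, 0.1],
    "doc_B": [-0.5, 0.3, 2.1],
}

word_factor = {w: attribution_factor(v) for w, v in loadings.items()}
doc_factor = {d: attribution_factor(v) for d, v in scores.items()}
```

Index words and documents that share an attribution factor can then be grouped and output together.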

[3] The document group analysis device according to claim 1 or 2, characterized by further comprising

important index word extraction means for obtaining the appearance frequency of the index words included in the plurality of technical documents, calculating the importance of each index word using the appearance frequency, and extracting a predetermined number of index words of highest importance using the calculated importance, wherein

the weighting amount calculation means obtains the weighting amounts of the predetermined number of index words of highest importance as the weighting amount of each index word.
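The extraction of a predetermined number of high-importance index words can be sketched as follows. The claim does not state how importance is derived from the appearance frequency, so the TF-IDF-style measure used here, the function name, and the sample documents are all assumptions.

```python
import math
from collections import Counter

def top_index_words(documents, n):
    """Return the n index words with the highest importance.

    documents: list of token lists, one per technical document.
    Importance is total frequency * log(N / document frequency), an
    assumed TF-IDF-style measure, not the formula of the specification.
    """
    total = Counter()
    doc_freq = Counter()
    for tokens in documents:
        total.update(tokens)
        doc_freq.update(set(tokens))
    n_docs = len(documents)
    importance = {w: total[w] * math.log(n_docs / doc_freq[w]) for w in total}
    # Sort by importance (descending), then alphabetically to break ties.
    ranked = sorted(importance, key=lambda w: (-importance[w], w))
    return ranked[:n]

# Hypothetical tokenized technical documents.
docs = [
    ["battery", "electrode", "battery", "device"],
    ["antenna", "signal", "device"],
    ["battery", "device", "electrode"],
]
selected = top_index_words(docs, 2)
```

Only the selected words would then be used as observation variables, which keeps the factor analysis to a manageable number of dimensions.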

[4] The document group analysis device according to any one of claims 1 to 3, characterized by further comprising

factor evaluation value calculation means for calculating, for each of the factors, a factor evaluation value indicating a technical evaluation of the factor, wherein

the output means outputs the factor evaluation value of the technical document or technical document group as data of the technical document or technical document group.

[5] The document group analysis device according to any one of claims 1 to 3, characterized in that

the technical document is a patent document, including a laid-open patent publication and a patent publication, and the device further comprises:

progress information acquisition means for acquiring progress information of each patent document; and

factor evaluation value calculation means for calculating, for each of the factors, a factor evaluation value indicating a technical evaluation of the factor using the progress information of the technical document or technical document group belonging to the factor, wherein

the output means outputs the factor evaluation value as data of the technical document or technical document group.

[6] The document group analysis device according to claim 5, characterized by further comprising

document number determination means for determining the number of documents of the technical document or technical document group belonging to each factor, wherein

the factor evaluation value calculation means,

for each of the factors, calculates a first index by giving a predetermined weight to the number of documents of the technical document or technical document group belonging to the factor, calculates a second index by indexing the progress information of the technical document or technical document group belonging to the factor, and calculates the factor evaluation value of the factor using the calculated first index and second index.

[7] The document group analysis device according to claim 6, characterized in that

the progress information includes the number of citations from other companies, the number of patent oppositions, the number of requests for patent invalidation trials, the presence or absence of an examination request, and the presence or absence of registration of establishment of the patent right,

the first index is a value weighted using at least one of the total number of citations from other companies, the total number of patent oppositions, and the total number of requests for patent invalidation trials, and

the second index, which indexes the progress information, is a value obtained by indexing at least one of the total number of citations from other companies, the total number of patent oppositions, the total number of requests for patent invalidation trials, the examination request rate, and the registration decision rate.

[8] The document group analysis device according to any one of claims 5 to 7,

characterized in that the output means calculates the factor evaluation value for each factor and for each applicant.

[9] A data analysis method performed by an information processing device, characterized in that

the information processing device performs:

a step of acquiring a plurality of technical documents represented by text data;

a step of determining a weighting amount of each index word for each of the acquired technical documents;

a step of performing factor analysis with each of the acquired technical documents as a subject and each index word as an observation variable, using the determined weighting amount of each index word, calculating a factor loading for each factor for each index word, and calculating a factor score for each factor for each of the technical documents; and

a step of determining the attribution factor of each index word using the factor loading of each index word, and determining the attribution factor of each technical document using the factor score of each technical document.

[10] A program for causing an information processing device to perform data analysis processing, characterized in that

the program causes the information processing device to execute:

a process of acquiring a plurality of technical documents represented by text data;

a process of determining a weighting amount of each index word for each of the acquired technical documents;

a process of performing factor analysis with each of the acquired technical documents as a subject and each index word as an observation variable, using the determined weighting amount of each index word, calculating a factor loading for each factor for each index word, and calculating a factor score for each factor for each of the technical documents; and

a process of determining the attribution factor of each index word using the factor loading of each index word, and determining the attribution factor of each technical document using the factor score of each technical document.

[11] The document group analysis device according to any one of claims 1 to 3,

Document group analysis device characterized by.

[12] The document group analysis device according to claim 11,

The factor evaluation value calculation means includes

Document group analysis device characterized by.

[13] The document group analysis device according to claim 12,

Document group analysis device characterized by.

[14] The document group analysis device according to any one of claims 11 to 13, characterized in that

the patent score is a value calculated by classifying the patent documents into groups for each technical field and for each predetermined period and, for each classified group, using the progress information of the patent documents belonging to the group.