WO2008053949A1 - Document group analysis device - Google Patents

Document group analysis device

Publication number
WO2008053949A1
WO2008053949A1 PCT/JP2007/071282 JP2007071282W WO2008053949A1 WO 2008053949 A1 WO2008053949 A1 WO 2008053949A1 JP 2007071282 W JP2007071282 W JP 2007071282W WO 2008053949 A1 WO2008053949 A1 WO 2008053949A1
Authority
WO
WIPO (PCT)
Prior art keywords
factor
document
technical
score
index
Application number
PCT/JP2007/071282
Other languages
French (fr)
Japanese (ja)
Inventor
Hiroaki Masuyama
Norio Araki
Kazumi Hasuko
Toshiro Ohsaki
Original Assignee
Intellectual Property Bank Corp.
Application filed by Intellectual Property Bank Corp.
Priority to JP2008542170A (JPWO2008053949A1)
Publication of WO2008053949A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems

Definitions

  • the present invention relates to a technique for analyzing technical literature, and more particularly, to a technique for analyzing what kind of concept or characteristic exists in a plurality of documents.
  • Patent Document 1 adds a weight to each word obtained by performing morphological analysis on a searched technical document, vectorizes each technical document, and groups technical documents whose vectors have similar orientations into a cluster. Then, important words are extracted for each cluster.
  • Patent Document 1 JP 2005-92443
  • In Patent Document 1, classification is performed on the scale of closeness of vector direction: documents fall into the same cluster if their vector directions are closer than a certain threshold and into different clusters otherwise, so the characteristics of each cluster are not always clear. Even if important words are extracted for each cluster, the extracted important words are similar, it is difficult to grasp the difference between clusters, and the validity of the classification itself ends up being questioned.
  • Patent Document 1 does not take into consideration the analysis of the characteristics of the classified clusters by the analyst. Therefore, if an analyst tries to grasp the characteristics of the classified cluster, the analyst is forced to read each technical document belonging to the cluster, and the analysis takes a long time.
  • The present invention has been made in view of the above circumstances. Its object is to enable analysts, in the analysis of technical documents, to understand what concepts or features the technical field represented by a plurality of technical documents has, while maintaining objectivity.
  • one aspect of the present invention is applied to a document group analysis apparatus that analyzes text data.
  • The document group analysis device comprises: text data acquisition means for acquiring a plurality of technical documents represented by text data; weighting amount calculation means for calculating a weighting amount of each index word for each of the acquired technical documents; factor calculation means for performing a factor analysis in which each technical document is a subject and each index word is an observation variable, using the obtained weighting amounts, and calculating a factor loading of each index word for each factor and a factor score of each technical document for each factor; attribution factor determination means for determining the attribution factor of each index word using the factor loadings of that index word, and the attribution factor of each technical document using the factor scores of that technical document; and output means for outputting, for each factor, the index word or index word group belonging to the factor together with data of the technical document or technical document group belonging to the same factor.
  • Index words and a plurality of technical documents can be attributed to each factor. Therefore, from the index words, it becomes possible to grasp the characteristics or concepts of the technical field through language information that can be understood by an analyst. Also, from the plurality of technical documents, it is possible to grasp the characteristics or concepts of the technical field from specific tendencies in the bibliographic information included in each technical document. Note that although the present invention uses the factor analysis technique, it does not require the subjective or arbitrary judgment of the analyst, such as the "interpretation of factors" in ordinary factor analysis. This is because each index word is used as an observation variable. In other words, since the observed variable itself represents the content, if the observed variables belonging to a factor are determined based on the factor loading, the content of each factor can be shown directly using the observed variables.
  • the attribution factor determination means uses the calculated factor loading for each index word to select a factor having the maximum factor loading, and identifies the selected factor as the attribution factor of the index word.
  • the factor having the maximum factor score may be selected, and the selected factor may be specified as the attribution factor of the technical document.
  • The device may further include an important index word extracting unit that obtains the frequency of occurrence of the index words included in the plurality of documents, calculates the importance of each index word using that frequency, and extracts a predetermined number of index words with the highest calculated importance. In this case, the weighting amount calculating unit may obtain, as the weighting amount of each index word, the weighting amounts of the predetermined number of index words with the highest importance.
  • In this way, index words that represent the characteristics of the entire technical document group are extracted as important index words and analyzed, so that the characteristics or concepts of the technical field can be understood more clearly. In addition, narrowing down the index words in advance improves the efficiency of the analysis process.
  • The device may further include factor evaluation value calculation means for calculating, for each factor, a factor evaluation value indicating a technical evaluation of the factor, and the output means may output the factor evaluation value as data of the technical document or technical document group.
  • The technical document may be a patent document such as a patent publication. In this case, the device may include progress information acquisition means for acquiring progress information of each patent document, and factor evaluation value calculation means for calculating, for each factor, a factor evaluation value indicating a technical evaluation of the factor using the progress information of the technical document or technical document group belonging to the factor, and the output means may output the factor evaluation value as data of the technical document or technical document group.
  • In this way, factors can be evaluated based on patent gazettes. Highly accurate factor evaluation can therefore be performed by utilizing the examination progress information included in the patent information related to each patent gazette.
  • The device may further include document number determination means for determining the number of technical documents or technical document groups belonging to each factor. For each factor, the factor evaluation value calculation means may calculate a first index by applying a predetermined weight to the number of technical documents belonging to the factor, calculate a second index by indexing the progress information of the technical documents belonging to the factor, and use the calculated first index and second index to calculate the factor evaluation value of the factor.
  • The progress information may include the number of citations from other companies, the number of oppositions to the patent, the number of requests for patent invalidation trials, whether a request for examination has been made, and whether the establishment of the patent right has been registered. The first index may be a value obtained by weighting the number of documents using at least one of the total number of citations from other companies, the total number of oppositions, and the total number of requests for patent invalidation trials.
  • The second index, obtained by indexing the progress information, may be an index value of at least one of the total number of citations from other companies, the total number of oppositions, the total number of requests for patent invalidation trials, the examination request rate, and the registration assessment rate.
  • the output means may calculate the factor evaluation value for each factor and for each applicant.
  • Other aspects of the present invention are a data analysis method including the same steps as the processes executed by each of the above apparatuses, and a program that causes a computer to execute the same processes. This program can be recorded on a recording medium such as an FD, CD-ROM, or DVD, or sent and received over a network.
  • The technical document may be a patent document such as a patent publication, and the device may include means for acquiring, for each acquired patent document, a patent score that individually evaluates the value of the patent document, and
  • factor evaluation value calculation means for calculating, for each factor, a factor evaluation value indicating a technical evaluation of the factor using the patent scores of the patent documents belonging to the factor.
  • The factor evaluation value calculation means may select, from the patent scores of the patent documents belonging to the factor, the patent scores equal to or higher than a predetermined threshold, and calculate the sum of the selected patent scores as the factor evaluation value.
  • The patent score is preferably a value standardized with respect to the population document group that includes the factor for which the factor evaluation value is calculated.
  • The patent score is preferably a value calculated by classifying the patent documents into groups by technical field and by predetermined period and, for each classified group, using the progress information of the patent documents belonging to the group.
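  • The threshold-and-sum rule for the factor evaluation value described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the sample scores, the threshold of 50, and the use of a z-score as the form of standardization are all assumptions made for the example.

```python
from statistics import mean, pstdev


def standardize(scores, population):
    """Standardize scores against a population document group.

    Assumption: a plain z-score is used; the source only says the score
    is "standardized" and does not give the formula.
    """
    mu = mean(population)
    sigma = pstdev(population)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def factor_evaluation_value(patent_scores, threshold):
    """Add up only the patent scores at or above the threshold."""
    return sum(s for s in patent_scores if s >= threshold)


# Hypothetical patent scores of the documents attributed to each factor.
scores_by_factor = {
    "factor_1": [62.0, 48.0, 55.0],
    "factor_2": [40.0, 51.0],
}
values = {
    name: factor_evaluation_value(scores, threshold=50.0)
    for name, scores in scores_by_factor.items()
}
print(values)
```

Because low scores are dropped before summing, a factor with many weak documents does not outrank a factor with a few strong ones merely by document count.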
  • FIG. 1 is a diagram showing a hardware configuration of a document group analysis apparatus according to an embodiment of the present invention.
  • FIG. 2 (A) is a flowchart explaining the procedure of the factor extraction process in the above document group analyzer, and FIG. 2 (B) is a flowchart explaining the processing procedure for calculating the factor progress information score as document or document group data.
  • FIG. 3A is an explanatory diagram regarding an example of a method for acquiring text data of a plurality of documents to be analyzed.
  • FIG. 3C is an explanatory diagram regarding an example of a method for acquiring text data of a plurality of documents to be analyzed.
  • the publication group belonging to each factor is further classified for each applicant.
  • FIG. 13 is a functional block diagram of a document group analysis apparatus according to a modification of the embodiment.
  • FIG. 15 is a graph showing the distribution of the technical element score of the modified example and the factor progress information score of the example in relation to the number of publications.
  • FIG. 16 is an example showing the technical element score for each factor and for each applicant in the above variation.
  • FIG. 17 is a diagram schematically illustrating an example of a data configuration of progress information used in the modification example.
  • FIG. 18 is a diagram schematically illustrating an example of a data configuration of content information used in the above modification.
  • FIG. 19 is a flowchart showing the procedure of a patent score calculation process in the modified example.
  • FIG. 20 is a flowchart showing details of processing for calculating an evaluation value of each patent data in the modified example.
  • FIG. 1 is a diagram showing a hardware configuration of a document group analysis apparatus according to an embodiment of the present invention.
  • The document group analysis apparatus of the present embodiment consists of a computer device equipped with: a processing device 1 having a CPU (central processing unit) and a memory (recording device); an input device 2 that is input means such as a keyboard (manual input device); a recording device 3 that is recording means for storing document group data, conditions, work results by the processing device 1, and the like; and an output device 4 that is output means for displaying or printing data of documents or document groups belonging to the extracted factors.
  • The processing device 1 includes a text data acquisition unit 101, a document vector acquisition unit 102, a factor calculation unit 103, an attribution factor determination unit 104, a document number determination unit 201, a progress information reading unit 202, an index calculation unit 203, a patent impact calculation unit 204, a progress information calculation unit 205, and a score calculation unit 206.
  • The memory of the processing device 1 stores programs for realizing the functions of each functional unit (text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, document number determination unit 201, progress information reading unit 202, index calculation unit 203, patent impact calculation unit 204, progress information calculation unit 205, and score calculation unit 206), namely a text data acquisition program, document vector acquisition program, factor calculation program, attribution factor determination program, document number determination program, progress information reading program, index calculation program, patent impact calculation program, progress information calculation program, and score calculation program.
  • the functions of the functional units of the processing device 1 are realized by the CPU executing the above program stored in the memory.
  • the recording device 3 includes a condition recording unit 31, a work result storage unit 32, a document storage unit 33, and the like.
  • the document storage unit 33 includes document group data obtained from an external database or an internal database.
  • An external database means, for example, a document database such as the IPDL patent electronic library provided by the Japan Patent Office or PATOLIS (registered trademark) provided by PATOLIS Corporation.
  • An internal database includes a database that itself stores data such as commercially sold patent JP-ROMs, devices that read documents stored on media such as disks and DVDs (digital versatile discs), devices such as an OCR (optical character reader) that read documents output on paper or handwritten documents, and devices that convert the read data into electronic data such as text.
  • Documents that can be analyzed include various patent gazettes such as published patent publications, patent publications, patent invention specifications, republished patent publications, patent trial request publications, English abstracts of published patent publications, published utility model gazettes, published utility model specifications, republished utility model gazettes, utility model registration gazettes, registered utility model gazettes, registered utility model specifications, and notices of requests for utility model trials, as well as public technical bulletins. The invention is not limited to these, and technical documents in general can be analyzed.
  • The communication means for exchanging signals and data among the processing device 1, the input device 2, the recording device 3, and the output device 4 may be a direct connection such as a USB (Universal Serial Bus) cable, may be transmission and reception via a network such as a LAN (local area network), or may be via a medium such as an FD, CD-ROM, MO, or DVD that stores documents. Alternatively, a part or a combination of these may be used.
  • The input device 2 accepts input such as the text data acquisition conditions of the analysis target document group, the document vector acquisition conditions, the factor loading and factor score calculation conditions, the attribution factor determination conditions, the factor progress information score calculation conditions described later, and the output conditions.
  • These input conditions are sent to and stored in the condition recording unit 31 of the recording device 3.
  • the text data acquisition unit 101 acquires data of a document group to be analyzed from the document storage unit 33 of the recording device 3 according to the acquisition conditions of the text data input by the input device 2. For example, text data is acquired for I documents belonging to one cluster among the clusters obtained as a result of cluster analysis based on the similarity of documents extracted under certain conditions.
  • the acquired text data is sent directly to the document vector acquisition unit 102 and used for processing there, or sent to the work result storage unit 32 of the recording device 3 and stored therein.
  • In accordance with the document vector acquisition condition input by the input device 2, the document vector acquisition unit 102 calculates I document vectors based on the text data of the I documents acquired by the text data acquisition unit 101.
  • This document vector is a J-dimensional vector whose vector element is the weighting amount z of the index word j in each document i, where J is the number of index words.
  • the calculated document vector is directly sent to the factor calculation unit 103 and used for processing there, or sent to the work result storage unit 32 of the recording device 3 and stored therein.
  • In accordance with the factor loading and factor score calculation conditions input by the input device 2, the factor calculation unit 103 calculates, based on the vector elements z_ij of the document vectors calculated by the document vector acquisition unit 102, the factor loading a_jk of each index word j for each factor k and the factor score f_ik of each document i for each factor k.
  • In accordance with the attribution factor determination condition input by the input device 2, the attribution factor determination unit 104 determines the attribution factor of each index word j based on the factor loading a_jk calculated by the factor calculation unit 103, and the attribution factor of each document i based on the factor score f_ik.
  • the index word j belonging to the same factor is output by the output device 4 together with the data of the corresponding document i.
  • the document number determination unit 201 to the score calculation unit 206 calculate the factor progress information score according to the factor progress information score calculation condition input by the input device 2.
  • The document number determination unit 201 reads the document group for each factor, or the document group for each factor and each applicant, as the score calculation target document group and determines the number of documents.
  • the progress information reading unit 202 reads the progress information of each document from the document storage unit 33 of the recording device 3 for the score calculation target document group.
  • the index calculation unit 203 calculates an index based on the progress information read by the progress information reading unit 202 for the score calculation target document group.
  • the patent impact calculation unit 204 calculates the patent impact for the score calculation target document group based on the number of documents determined by the document number determination unit 201 and the index calculated by the index calculation unit 203.
  • the progress information calculation unit 205 calculates a progress information index based on the index calculated by the index calculation unit 203 for the score calculation target document group.
  • the score calculation unit 206 calculates a factor progress information score for the score calculation target document group based on the patent impact calculated by the patent impact calculation unit 204 and the progress information index calculated by the progress information calculation unit 205.
  • the work results by each functional unit are sent to and stored in the work result storage unit 32 of the recording device 3.
  • the condition recording unit 31 records information such as conditions obtained from the input device 2, and sends necessary data based on a request from the processing device 1.
  • the work result storage unit 32 stores the work result of each component in the processing device 1 and sends necessary data based on a request from the processing device 1.
  • the document storage unit 33 stores and provides necessary document group data obtained from the external database or the internal database based on the request of the input device 2 or the processing device 1. When storing patent document data, the document storage unit 33 preferably stores the bibliographic information (such as the name of the applicant) and the progress information (information such as the request for examination) together.
  • the output device 4 outputs the document and index word for which the attribution factor is determined by the attribution factor determination unit 104 of the processing device 1 for each factor.
  • the output device 4 includes a display unit such as a display device, and displays a correspondence table between documents and / or index words and factors, a factor evaluation value of a document or a document group calculated for each factor, and the like.
  • the output format is not limited to the display on the display unit, but may be printed on a print medium such as paper, or transmitted to a computer device on a network via communication means.
  • FIG. 2 (A) is a flowchart for explaining the procedure of the factor extraction process in the document group analyzer.
  • the document group analysis apparatus of the present embodiment extracts factors from a plurality of documents to be analyzed using a factor analysis technique.
  • The kind of document group to be selected as the I documents is arbitrary; one example is as follows.
  • FIGS. 3A to 3C are explanatory diagrams regarding an example of a method for acquiring text data of a plurality of documents to be analyzed.
  • the items (A) and (B) described below are realized by the procedure described in the patent gazette filed by the applicant (see International Publication No. 2006/030751). Therefore, the following description is simplified.
  • the technology of interest is selected from the patent gazettes of a certain company (target company). Specifically, the text data acquisition unit 101 of the processing device 1 clusters the patent documents of the target company and obtains an evaluation value of each cluster (target company cluster) (for example, one corresponding to a factor progress information score described later). Calculate and select the target company cluster with the highest evaluation value as the technology of interest (Figure 3A).
  • The text data acquisition unit 101 extracts, from the patent document group containing the publications of the target company and of other companies (own-and-other patent document group), the document group belonging to "the selected technology of interest and technologies similar to it (specific technical field)" (own-and-other patent specific-field document group). Specifically, the text data acquisition unit 101 calculates the degree of similarity between each document in the own-and-other patent document group and the technology of interest, and extracts the documents ranking highest in similarity to the technology of interest from the document storage unit 33 as the own-and-other patent specific-field document group (Fig. 3B). This makes it possible, for example, to select an important patent group from the patent group of the target company and then analyze similar patent groups including other companies' patents.
  • the text data acquisition unit 101 obtains a document group to be analyzed by clustering the extracted patent documents in the patent-specific field.
  • The group of documents to be analyzed here (own-and-other patent cluster) is not limited to a lower cluster having a high degree of similarity between documents. It may be a middle or upper cluster.
  • Figure 3C shows an example in which 70 lower clusters, 8 middle clusters (lines), and 4 upper clusters (groups) are generated. Whether to use a lower cluster or a middle or upper cluster as the own-and-other patent cluster can be selected according to the purpose of the analysis.
  • The index words to be extracted are, for example, index words of high importance in the I documents.
  • The importance is calculated for the index words included in the documents i, the index words are sorted in descending order of importance, and the top-ranked index words are extracted.
  • As the importance of an index word, it is preferable to use, for example, GFIDF computed over the plurality of documents to be analyzed acquired by the text data acquisition unit, or GFIDF corrected based on the degree of co-occurrence of index words.
  • GFIDF is the value obtained, for a given index word, by multiplying the global frequency (GF: total number of occurrences of the index word in the document group to be analyzed) by the reciprocal of the document frequency (DF: number of documents in which the index word appears) or by a logarithm of that reciprocal (IDF: inverse document frequency). An index word that is used many times in the document group to be analyzed but is not often used in a predetermined document group different from the document group to be analyzed receives a high GFIDF value and is evaluated as an important word representing the characteristics of the document group to be analyzed.
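  • As a rough sketch of the GFIDF ranking just described: GF is counted in the document group to be analyzed, while the document frequency is counted in a separate reference group. The function names and sample documents are hypothetical, and the smoothed logarithmic form ln((N + 1) / (DF + 1)) is an assumption chosen so that words absent from the reference group do not cause a division by zero; the source does not fix the exact IDF formula.

```python
import math
from collections import Counter


def gfidf(target_docs, reference_docs):
    """Score each index word in target_docs by GF x IDF.

    target_docs / reference_docs: lists of documents, each a list of
    index words. IDF form is an assumption (smoothed natural log).
    """
    # GF: total occurrences of each word across the documents to analyze.
    gf = Counter(w for doc in target_docs for w in doc)
    # DF: number of reference documents containing each word.
    df = Counter(w for doc in reference_docs for w in set(doc))
    n_ref = len(reference_docs)
    return {w: gf[w] * math.log((n_ref + 1) / (df[w] + 1)) for w in gf}


def top_index_words(scores, n):
    """Return the n highest-scoring words (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]


# Hypothetical tokenized documents.
analyzed = [["factor", "score", "document"], ["factor", "loading", "document"]]
reference = [["document", "method"], ["document", "device"], ["score", "device"]]

scores = gfidf(analyzed, reference)
important = top_index_words(scores, 2)
print(important)
```

Here "document" is frequent in the analyzed group but also common in the reference group, so it ranks below "factor" and "loading".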
  • DF (w, D) is the document frequency of index word w in document D.
  • Let each group of mutually similar high-frequency words be a base g_h (h = 1, 2, ...). The degree of co-occurrence between a base g and an index word w is defined as follows.
  • w' is a high-frequency word belonging to a certain base g, other than the index word w that is the measurement target of the co-occurrence degree Co(w, g).
  • The degree of co-occurrence Co(w, g) of the index word w and the base g is the sum, over all w', of the co-occurrence c(w, w') of w and w'.
  • Next, key(w) is calculated.
  • F(g) = Σ Co(w, g) is defined, that is, the sum of the co-occurrence degrees Co(w, g) of the index words w with the base g.
  • key(w) is obtained by forming a product over all bases g and taking its difference from 1.
  • Skey (w) is calculated by the following equation.
  • GF(w, C) is the global frequency of the index word w in the document group C to be analyzed.
  • IDF(w, P) is the inverse document frequency of the index word w in a predetermined document group P different from the document group C to be analyzed.
  • A high Skey(w) value is calculated for a word that has a high GFIDF and that co-occurs with the bases of document group C and therefore has a high affinity with the contents of document group C (that is, key(w) is high). Such a word is evaluated as an important index word representing the characteristics of the document group C to be analyzed.
  • For each of the I documents i, the document vector acquisition unit 102 generates a document vector whose elements are the weighting amounts z_ij of the index words j (S102). As a result, data of I rows and J columns is obtained. Let Z be the matrix of I rows and J columns with z_ij as the matrix elements.
  • the weighting amount refers to the quantity given to each index word in each document from a predetermined viewpoint, and for example, TFIDF is preferably used.
  • TFIDF is the value obtained, for a certain index word, by multiplying the index word frequency (TF: number of occurrences of the index word in a document) by the reciprocal of the document frequency (DF: number of documents in the document group where the index word appears) or by a logarithm of that reciprocal (IDF).
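  • The I-by-J matrix Z of weighting amounts can be illustrated with a small TFIDF sketch. The function name and sample documents are hypothetical, and the IDF form ln(I / DF) is an assumption; the source leaves the exact formula open.

```python
import math


def tfidf_matrix(docs, index_words):
    """Build Z (I rows x J columns) with z_ij = TF(i, j) * IDF(j).

    docs: list of I documents, each a list of index words.
    index_words: list of the J index words used as observation variables.
    """
    n_docs = len(docs)
    # DF: number of documents in which each index word appears.
    df = {w: sum(1 for d in docs if w in d) for w in index_words}
    # Assumed IDF form: ln(I / DF); words appearing nowhere get weight 0.
    idf = {w: math.log(n_docs / df[w]) if df[w] else 0.0 for w in index_words}
    return [[d.count(w) * idf[w] for w in index_words] for d in docs]


docs = [
    ["factor", "loading", "factor"],
    ["cluster", "vector"],
    ["factor", "vector", "vector"],
]
index_words = ["factor", "vector", "cluster"]
Z = tfidf_matrix(docs, index_words)
for row in Z:
    print([round(z, 3) for z in row])
```

Each row of Z is one document vector, so Z can be passed directly to a factor analysis in which documents are the subjects and index words are the observed variables.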
  • the factor calculation unit 103 calculates factor loadings and factor scores in factor analysis in which each document i is a subject, each index word j is an observation variable, and the document vector of each document is a response by the subject ( S103).
  • a_jk: the factor loading of each index word j for each factor k
  • f_ik: the factor score of each document i for each factor k.
  • A factor loading matrix A having the factor loadings a_jk as matrix elements and a factor score matrix F having the factor scores f_ik as matrix elements are set as follows.
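  • For orientation, the usual common factor model ties these quantities together as sketched below. This is the textbook form, given here as an assumption because the extracted text does not preserve the original expressions; K (the number of factors), e_ij (the part of z_ij not explained by the common factors) and the matrix E are symbols introduced for this sketch, and in practice the model is normally applied to standardized z_ij.

```latex
% Textbook common factor model (assumed form; K, e_{ij} and E are introduced here)
z_{ij} = \sum_{k=1}^{K} f_{ik}\, a_{jk} + e_{ij},
\qquad i = 1,\dots,I,\quad j = 1,\dots,J

% Matrix form, with Z of size I x J, F of size I x K, and A of size J x K
Z = F A^{\mathsf{T}} + E,
\qquad A = (a_{jk}),\quad F = (f_{ik})
```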
  • R—V R *.
  • the factor calculation unit 103 performs an operation of factor rotation.
  • As the method of rotating the factor axes in the present embodiment, it is preferable to use the varimax (orthogonal rotation) method.
  • the relationship between the observed variable and the factor is obtained by rotating the factor axis while maintaining orthogonality with other factors so as to maximize the variance of the factor loading.
  • When the document group to be analyzed consists of documents with high similarity between document vectors, using this varimax method has the advantage that the amount of each factor loading can be reduced and the characteristics of the factor can be clarified.
  • Other orthogonal rotations include, for example, quartimax, equamax, parsimax, orthomax, and orthogonal Procrustes.
  • Oblique rotations include, for example, promax, oblimin, Harris-Kaiser, and oblique Procrustes.
  • the method of rotating the factor axis in this embodiment is not limited to the above varimax, and may be appropriately selected from these rotation methods according to the embodiment.
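  • A varimax rotation can be sketched with Kaiser's classic pairwise planar rotations, shown below in plain Python. This is an illustrative raw (non-normalized) varimax and not necessarily the procedure used in the embodiment; the function name, the sweep limit, the tolerance, and the sample loadings are assumptions.

```python
import math


def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Raw varimax via pairwise planar rotations of factor columns.

    loadings: list of rows (index words) by columns (factors).
    Returns a new, rotated list of rows; the input is not modified.
    """
    rows = [list(r) for r in loadings]
    n = len(rows)
    k = len(rows[0])
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for p in range(k - 1):
            for q in range(p + 1, k):
                u = [r[p] ** 2 - r[q] ** 2 for r in rows]
                v = [2.0 * r[p] * r[q] for r in rows]
                a, b = sum(u), sum(v)
                c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d = sum(2.0 * ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the varimax criterion for this pair.
                phi = 0.25 * math.atan2(d - 2.0 * a * b / n,
                                        c - (a * a - b * b) / n)
                largest_angle = max(largest_angle, abs(phi))
                cos_phi, sin_phi = math.cos(phi), math.sin(phi)
                for r in rows:
                    x, y = r[p], r[q]
                    r[p] = x * cos_phi + y * sin_phi
                    r[q] = -x * sin_phi + y * cos_phi
        if largest_angle < tol:
            break  # no pair needed a meaningful rotation in this sweep
    return rows


# Hypothetical loadings with a simple structure, mixed by a 30-degree rotation.
simple = [[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]]
theta = math.radians(30)
mixed = [[x * math.cos(theta) - y * math.sin(theta),
          x * math.sin(theta) + y * math.cos(theta)] for x, y in simple]

rotated = varimax(mixed)
for row in rotated:
    print([round(val, 3) for val in row])
```

Because every step is an orthogonal rotation, the sum of squared loadings of each index word is unchanged; only the way it is distributed across the factors becomes more clear-cut.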
  • the factor score matrix F is, for example,
  • The attribution factor determination unit 104 determines the attribution factor of each index word j based on the factor loading a_jk, and determines the attribution factor of each document i based on the factor score f_ik (S104).
  • Index words and multiple technical documents can thus be attributed to each factor, so it is possible to grasp the characteristics or concepts of the technical field from the index words through linguistic information understandable to the analyst.
  • the present invention does not require the subjective or arbitrary judgment of the analyst, such as “interpretation of factors” in the ordinary factor analysis using the factor analysis technique. This is because each index word is used as an observation variable. In other words, since the observed variable itself represents the content, if the observed variable belonging to the factor is determined based on the factor loading, the content of each factor can be shown simply using the observed variable.
  • the attribution factor of the index word j is the factor k for which the factor loading a_jk is the maximum.
  • the attribution factor of the document i is the factor k for which the factor score f_ik is the maximum. Determining the attribution factors in this way assigns each index word and each technical document to the factor most closely related to it, so that each factor is best explained. As a result, it becomes possible to grasp the features and concepts of the technical field more clearly. In this case, one index word belongs to only one factor, and one document belongs to only one factor. On the other hand, the number of index words belonging to one factor is not necessarily one, and the number of documents belonging to one factor is not necessarily one.
  • preferably, a lower limit is set for the factor loading, and if the maximum factor loading a_jk of a certain index word j is less than the lower limit, the index word j is not attributed to any factor.
  • the index word j having a low factor load is preferably excluded from the index word group indicating the contents of the factor.
  • similarly, a lower limit is preferably set for the factor score, and if the maximum value f_ik of the factor score for a document i is less than the lower limit, the document i is not attributed to any factor, so that only documents closely related to a factor are attributed to it.
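  • the attribution rule described above (largest loading or score wins, subject to a lower limit) can be sketched as follows. The function names, the example values, and the lower limit of 0.3 for loadings are hypothetical; the lower limit of 1 for factor scores follows the worked example given later in the text.

```python
def attribute(values, lower_limit):
    """Return the index of the factor with the largest value,
    or None when even the largest value is below the lower limit."""
    best = max(range(len(values)), key=lambda k: values[k])
    return best if values[best] >= lower_limit else None

def attribute_all(matrix, lower_limit):
    """Apply the rule to every row (one row per index word or per document)."""
    return [attribute(row, lower_limit) for row in matrix]

# Hypothetical factor loadings: rows are index words, columns are factors.
loadings = [
    [0.96, 0.05, 0.10],
    [0.12, 0.81, 0.02],
    [0.20, 0.15, 0.18],  # no loading reaches the limit
]
# Hypothetical factor scores: rows are documents, columns are factors.
scores = [
    [5.41, 0.20, -0.30],
    [0.40, 0.70, 0.10],  # maximum score is below 1
]

word_factors = attribute_all(loadings, lower_limit=0.3)
doc_factors = attribute_all(scores, lower_limit=1.0)
```

  • factor indices are zero-based here, so index 0 corresponds to factor 1 in the text; each row receives at most one factor, while a factor may receive many rows.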
  • the index word or index word group belonging to each factor indicates the contents of that factor, and can be thought of as showing a concept or feature contained in the I documents to be analyzed. Since the document or document group belonging to each factor is closely related to that factor, it can be considered to indicate in which documents the concept or feature extracted as the factor appears.
  • the document group analysis apparatus outputs, by the output device 4 and for each factor, the index word or index word group belonging to the factor together with data of the document or document group belonging to the same factor (S105). What kind of data is output for the document or document group belonging to each factor is arbitrary. As an example, data specifying the document or document group (such as a publication number in the case of a patent publication) may be output. A factor evaluation value of the document or document group may also be calculated and output.
  • by calculating the factor evaluation value as described above, the factors can be compared relative to one another based on the factor evaluation value. As a result, the relative positional relationship between the technical elements represented by the technical documents belonging to the factors can be grasped, and key technical elements can be distinguished from those that are not. As a preferred example of calculating the factor evaluation value, an example of calculating and outputting the factor progress information score of the document or document group will be described below.
  • FIG. 2 (B) is a flowchart for explaining a processing procedure for calculating a factor progress information score as data of a document or a document group.
  • the process of this flowchart is executed for (1) the document or document group belonging to each factor for which a factor progress information score is to be calculated, or (2) when the factor progress information score is calculated for each factor and for each applicant as described later, the document or document group of each factor and each applicant (hereinafter, in either case (1) or (2), referred to as the “score calculation target document group”).
  • the document number determination unit 201 determines the number of documents N of the score calculation target document group (S201). If both an unexamined patent application publication and a patent publication have been issued for one patent application, the number of documents for that patent application should be counted as two.
  • the progress information reading unit 202 reads the progress information as the document attribute of each document of the score calculation target document group from the document storage unit 33 of the recording device 3 or another database (S202).
  • This database records the history of patent applications related to each document. As an example of the progress information to be read, for each patent application,
  • whether or not the patent right has been registered (1 or 0)
  • the index calculation unit 203 calculates, as evaluation values for the score calculation target document group, a plurality of indices based on the above progress information recorded in the database (S203). Examples of these indices include the “total number of citations from other companies”, the “total number of oppositions or requests for patent invalidation trials”, the “examination request rate”, the “registration decision rate”, the “patent registration rate”, the “accelerated examination request rate”, the “ratio of citations from other companies”, the “ratio of appeals against examiner's decisions”, and the “ratio of oppositions or invalidation trial requests”. Other indices may also be used. The definitions are as follows.
  • Total number of citations from other companies = total of the “number of citations from other companies” in the score calculation target document group
  • Patent registration rate = number of patent registrations / number of patent applications
  • Accelerated examination request rate = number of requests for accelerated examination / number of requests for examination
  • the number of requests for examination is the total, over the score calculation target document group, of “whether or not a request for examination has been made” (1 or 0).
  • the number of patent registrations is the total, over the score calculation target document group, of “whether or not the patent right is registered” (1 or 0).
  • the number of requests for accelerated examination is the total, over the score calculation target document group, of “whether or not accelerated examination has been requested” (1 or 0).
  • the number of appeals against examiner's decisions is the total, over the score calculation target document group, of “whether or not an appeal against the examiner's decision has been filed” (1 or 0).
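  • the counts and the two rates whose formulas are given above can be computed from the 1-or-0 flags as in the following sketch. The record layout (one dictionary per patent application) and the key names are assumptions made for illustration; only the two rates defined in the text are computed.

```python
FLAG_KEYS = ("exam_requested", "registered", "accelerated", "appealed")

def progress_counts(records):
    """Total each 1-or-0 flag over the score calculation target document group."""
    return {key: sum(r[key] for r in records) for key in FLAG_KEYS}

def progress_rates(records):
    """Compute the two rates defined in the text; 0.0 when a denominator is 0."""
    counts = progress_counts(records)
    n_applications = len(records)  # assumption: one record per application
    n_requests = counts["exam_requested"]
    return {
        "patent_registration_rate":
            counts["registered"] / n_applications if n_applications else 0.0,
        "accelerated_examination_request_rate":
            counts["accelerated"] / n_requests if n_requests else 0.0,
    }

# Hypothetical progress information for four patent applications.
records = [
    {"exam_requested": 1, "registered": 1, "accelerated": 1, "appealed": 0},
    {"exam_requested": 1, "registered": 0, "accelerated": 0, "appealed": 1},
    {"exam_requested": 1, "registered": 1, "accelerated": 0, "appealed": 0},
    {"exam_requested": 0, "registered": 0, "accelerated": 0, "appealed": 0},
]
counts = progress_counts(records)
rates = progress_rates(records)
```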
  • the patent impact calculation unit 204 calculates a patent impact index by assigning a predetermined weight to the “number of documents” of the score calculation target document group (S204).
  • the patent impact index is an index intended to evaluate, for the score calculation target document group, the power to restrain other companies (the degree to which the rights of other companies are suppressed and the value of the company's own patents is increased).
  • specifically, the “number of documents” is given a predetermined weighting based on the “total number of citations from other companies” and/or the “total number of oppositions or requests for patent invalidation trials”. For example:
  • Patent impact index = “number of documents” + “total number of citations from other companies” + “total number of oppositions or requests for patent invalidation trials”
  • the weighting for the “number of documents” may be performed by addition as in the above formula, or by multiplication by some ratio.
  • the predetermined weighting may also be based on various other measures, such as patent profitability, patent productivity, patent utilization, and patent competitiveness, and is not limited to the above.
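  • the additive form of the patent impact index given above can be written as a one-line function. The function and parameter names and the example counts are hypothetical; the multiplicative variant mentioned in the text is not shown because its ratio is not specified.

```python
def patent_impact_index(num_documents, citations_by_others, oppositions_or_invalidations):
    """Additive weighting: number of documents plus the two totals."""
    return num_documents + citations_by_others + oppositions_or_invalidations

# Hypothetical factor: 35 documents, 12 citations from other companies,
# 3 oppositions or invalidation trial requests.
impact = patent_impact_index(35, 12, 3)
```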
  • the progress information calculation unit 205 calculates the progress information index by averaging the indices based on the progress information of the score calculation target document group (S205).
  • the progress information index is intended to evaluate, for the score calculation target document group, the value of the patents as seen by the company itself, the JPO, and competitors.
  • it is calculated from d indices based on progress information, such as the “examination request rate”, “registration decision rate”, “patent registration rate”, “accelerated examination request rate”, “ratio of citations from other companies”, and “ratio of appeals against examiner's decisions”.
  • the above seven indices are given as examples of indices used for the progress information index, but in addition, for example, the “ratio of in-house citations”, “domestic priority claim ratio”, “foreign priority claim ratio”, “file wrapper inspection rate”, and the like may be used.
  • by calculating the factor progress information score using the above indices, it is possible to perform factor evaluation that takes into account the influence of patents that can be an obstacle to the patent acquisition and technology development of other companies.
  • it is also possible to perform factor evaluation that takes into account the applicant's willingness to obtain rights and the examiner's evaluation.
  • as a result, the fairness and appropriateness of the factor evaluation can be ensured.
  • the relative positional relationship and importance of the technical elements can thus be grasped fairly and appropriately.
  • the above-mentioned method of taking the square root by dividing the sum of squares by the index number d can be said to be a method that combines the advantages of the simple sum method and the maximum value method.
  • in this method, high values strongly affect the progress information index. Therefore, a document group for which indices that tend to be high, such as the “examination request rate”, “registration decision rate”, and “patent registration rate”, are particularly high (and which consequently has a large number of patent grants) can be given an outstanding evaluation score. Indices other than those with high values are also taken into account to some extent in the progress information index.
  • the progress information index is calculated with all the d indices taken into account. As a result, it is possible to evaluate the value of a patent from multiple angles.
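  • the calculation described above (dividing the sum of squares of the d indices by d and taking the square root) is a root mean square. The sketch below compares it with the simple average and the maximum value; the function name and the example values are hypothetical.

```python
import math

def progress_information_index(indices):
    """Root mean square of the d progress-information indices."""
    d = len(indices)
    return math.sqrt(sum(x * x for x in indices) / d)

# Hypothetical rates for one document group: one high index, two low ones.
indices = [0.9, 0.1, 0.1]

rms = progress_information_index(indices)
simple_mean = sum(indices) / len(indices)
maximum = max(indices)
```

  • for non-negative indices the result always lies between the simple average and the maximum value, which is one way to read the statement that the method combines the advantages of both.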
  • the score calculation unit 206 multiplies the patent impact index by the progress information index to calculate the factor evaluation value of the score calculation target document group (S206). Indexing the progress information in this way makes it possible, for example, to evaluate the factors quantitatively and objectively.
  • This evaluation value is referred to as a “factor progress information score”.
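  • step S206 can be sketched as a simple product. The per-publication average (the score divided by the number of documents) and the ranking of factors in descending order of score, both of which appear in the worked example later in the text, are included for convenience. All names and numbers below are hypothetical.

```python
def factor_progress_information_score(patent_impact_index, progress_information_index):
    """S206: multiply the patent impact index by the progress information index."""
    return patent_impact_index * progress_information_index

def score_per_publication(score, num_documents):
    """Average score per publication belonging to the factor."""
    return score / num_documents

def rank_factors(scores_by_factor):
    """Factor numbers sorted in descending order of their score."""
    return sorted(scores_by_factor, key=scores_by_factor.get, reverse=True)

# Hypothetical inputs: factor number -> (impact index, progress index, documents).
inputs = {1: (50, 0.5, 35), 2: (20, 0.8, 15), 3: (80, 0.4, 60)}
scores = {k: factor_progress_information_score(i, p) for k, (i, p, _) in inputs.items()}
averages = {k: score_per_publication(scores[k], n) for k, (_, _, n) in inputs.items()}
ranking = rank_factors(scores)
```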
  • the calculated score is output in S105.
  • This factor progress information score has the following properties.
  • the progress information index increases as the number of registered patents increases. In addition, appeals against examiner's decisions, oppositions, and the like, if any, are taken into account.
  • the patent document group can be evaluated from the aspect of the progress information.
  • by calculating the factor progress information score as described above, each factor can be evaluated based on the patent gazettes. As a result, highly accurate factor evaluation can be performed by utilizing the examination progress information included in the patent information related to the patent gazettes.
  • the method for calculating the factor progress information score is not limited to this. For example, the factor progress information score for each factor can also be obtained by calculating an individual evaluation value for each patent gazette belonging to the factor and totaling these values.
  • a cluster analysis was performed, based on document similarity, on the unexamined patent publications and patent publications published in Japan by a car manufacturer (called Company A) to obtain multiple clusters.
  • 2869 publication groups with high similarity to the publication group belonging to the cluster related to the “exhaust control of internal combustion engine” were extracted from the group of about 4 million publications including patent publications.
  • the analysis results are an example in which these 2869 publication groups are cluster-analyzed based on the similarity of the document vector of each publication, and among these, 569 publications belonging to a specific cluster are analyzed.
  • the number of factors to be extracted is not limited to this, and may be changed as appropriate according to the embodiment. Examples of methods for determining the number of factors include eigenvalues, factor contributions, interpretability, and the Kaiser-Guttman criterion.
  • the table shown in FIG. 4 shows an example of the factor loading of each index word calculated for each of factors 1 to 16; in the figure, the rows are index words and the columns indicate the factor numbers. In FIG. 4, for convenience of explanation, only some of the index words are extracted and shown, not all of them.
  • the factor showing the maximum factor loading in FIG. 4 is the attribution factor of each index word.
  • specifically, for example, the index word “Kura” at the top of FIG. 4 has a factor loading of 0.96 for factor 1.
  • this index word “Kura” is attributed to factor 1 because its factor loading for factor 1 is larger than for any of the other factors 2 to 16.
  • like the index word “Kura”, the index words “sucking”, “NOx”, “reduction”, “release”, and “spike” also belong to factor 1.
  • index words that did not have a factor loading equal to or higher than a predetermined threshold were not attributed to any factor.
  • the table shown in FIG. 5 shows the factor scores of each gazette calculated for each of factors 1 to 16.
  • the factor for which the factor score shown in FIG. 5 takes its maximum value (the corresponding cell is shaded) is the attribution factor of the gazette. Specifically, for example, the publication “XX-XXX601” at the top of FIG. 5 has a factor score of 5.41 for factor 1. This publication “XX-XXX601” is attributed to factor 1 because its factor score for factor 1 is higher than for any of the other factors 2 to 16. In addition, like the publication “XX-XXX601”, the publications “XX-XXX097” and “XX-XXX189” belong to factor 1. However, among the gazettes included in the analyzed document group, those whose maximum factor score is below the predetermined threshold (here, 1) are not attributed to any factor.
  • FIG. 6 shows index words belonging to each factor and publications belonging to each factor.
  • Fig. 6 is a table showing examples of index words and publications belonging to each factor. For convenience of explanation, not all of the publications belonging to each factor are shown.
  • based on this, more advanced analysis is possible, such as actually reading the gazettes belonging to a factor of interest, or calculating the factor progress information score described later from the data of the gazettes belonging to each factor.
  • Figure 7 shows, for each of factors 1 to 16, examples of the examination request rate, patent registration rate, registration decision rate, accelerated examination request rate, ratio of citations from other companies, ratio of appeals against examiner's decisions, ratio of oppositions filed, and progress information index.
  • Figure 8 shows examples of the number of gazettes, the number of citations from other companies, the number of oppositions, and the patent impact index for each of factors 1 to 16.
  • the factor progress information score calculated by the score calculation unit 206 based on the patent impact index and the progress information index, and the factor progress information score (average) per publication, calculated by the score calculation unit 206 by dividing the factor progress information score by the number of documents, are shown.
  • the eigenvalues of each factor in the factor analysis are also shown in the figure.
  • the factor numbers in the leftmost column are common to the figures. In Figs. 4 to 8, 11, and 12, the factors are arranged in order of factor number.
  • the factors in Fig. 9 are arranged in descending order of the factor progress information score.
  • FIG. 10 shows an example in which the output device 4 outputs the factor progress information score calculated by the score calculation unit 206 in this embodiment.
  • each circle represents each factor
  • the position of each circle on the vertical axis represents the factor progress information score for each factor
  • the position of each circle on the horizontal axis represents the factor progress information score (average) of each factor, that is, the average value per publication belonging to the factor.
  • the size of each circle indicates the number of publications belonging to each factor
  • the technical term attached to each circle indicates the index word belonging to each factor.
  • factor 13 exhaust, NOX
  • factor 5 downstream, upstream, catalyst, deterioration, diagnosis
  • from the combination of the index words “catalyst”, “deterioration”, and “diagnosis”, factor 5 is presumed to be a technical factor representing a so-called advanced on-board diagnostic system, which automatically detects performance deterioration of exhaust gas reduction devices such as catalyst systems and notifies the driver. Owing to recent exhaust gas regulations, various councils and local governments have called for the introduction of advanced on-board diagnostic systems that monitor exhaust gas reduction devices, such as NOx-reducing catalysts, for malfunctions, detect them, and notify the driver. The importance of this technical element appears to be manifested in factor 5 together with meaningful index words.
  • in the above description, the publication group belonging to each factor is set as the “score calculation target document group” and the score is calculated for each factor.
  • however, the present invention is not limited to this.
  • the publication group belonging to each factor may be further classified for each applicant, and the group of publications in each category for each factor and each applicant may be used as the “score calculation target document group”.
  • the table shown in Fig. 11 shows the calculation results of factor progress information scores for each factor and for each applicant.
  • the calculation method of the factor progress information score for each factor and each applicant is the same as that of the above-described factor progress information score (for each factor) except that the score calculation target document group is changed. Because the documents are classified by factor and by applicant, the number of documents in each score calculation target document group tends to be small. The examination request rate and the like, however, range from 0 to 1 whether or not the documents are classified by applicant, and do not necessarily become small. Therefore, even if the per-applicant factor progress information scores for a factor are summed over all applicants, the total does not match the factor progress information score for that factor calculated without classification by applicant.
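  • the non-additivity noted above can be checked with a small numerical sketch. It is deliberately simplified and rests on several assumptions not taken from the source: the patent impact index is reduced to the number of documents, only two hypothetical rates are used, and each rate is taken per document. The function name `score` and all data are illustrative.

```python
import math

def score(docs):
    """Simplified score: document count times the root mean square of two rates.
    docs is a list of (registered, accelerated) 1-or-0 flags, one per document."""
    n = len(docs)
    rates = [sum(d[i] for d in docs) / n for i in range(2)]
    rms = math.sqrt(sum(r * r for r in rates) / len(rates))
    return n * rms

# Hypothetical documents of two applicants within the same factor.
applicant_x = [(1, 1)] * 2 + [(1, 0)] * 3 + [(0, 0)] * 5    # 10 documents
applicant_y = [(1, 1)] * 2 + [(1, 0)] * 14 + [(0, 0)] * 4   # 20 documents

sum_of_parts = score(applicant_x) + score(applicant_y)
whole_factor = score(applicant_x + applicant_y)
difference = sum_of_parts - whole_factor
```

  • the document counts add up across applicants, but the rates are recomputed within each group rather than added, so the per-applicant scores need not sum to the score of the whole factor.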
  • FIG. 12 is an example in which the publication group belonging to each factor is further classified for each applicant in this embodiment, and the factor progress information score is shown for each factor and for each applicant.
  • the score calculation unit 206 further classifies the publication group belonging to each factor for each applicant, and calculates a factor progress information score for each factor and for each applicant.
  • the output device 4 generates a diagram based on the calculated factor progress information score and outputs the diagram.
  • FIG. 12 shows only the factor progress information scores calculated for applicants ranked in the top 10 by number of applications among all publications with factor scores above a certain level.
  • for Factor 4 and Factor 5, which were found in FIGS. 9 and 10 to have high factor progress information scores, Company A is shown to be strong (Company A is the automaker from which the document group to be analyzed was extracted). In this way, for example, Company A is seen to be in a strong position in factors 4 and 5, in contrast to factor 13, so that the strengths and weaknesses of Company A relative to other companies can be grasped.
  • in the above embodiment, the case where the technical document to be analyzed is a patent gazette is taken as an example, but the invention is not particularly limited to this.
  • the technical document to be analyzed may be a technical paper.
  • the factor progress information score may be obtained using the number of technical papers belonging to each factor, the number of citations, and the like.
  • the weighting has been exemplified by the case where it is performed using one, several, or all of the totals such as the total number of requests for invalidation trials, but it is not limited to this. For example, it may be determined using patent profitability, patent productivity, patent utilization, or patent competitiveness.
  • in the above embodiment, the case where each functional unit of the processing device 1 (text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, document number determination unit 201, progress information reading unit 202, index calculation unit 203, patent impact calculation unit 204, progress information calculation unit 205, and score calculation unit 206) is realized by software has been described as an example.
  • however, the present invention is not limited to this.
  • Each functional unit of the processing device 1 may be realized by a circuit (ASIC (Application Specific Integrated Circuit) or the like) designed exclusively for executing each functional unit.
  • the case where the processing device 1 acquires the patent publications to be analyzed from the recording device 3 has been described as an example, but the invention is not particularly limited to this.
  • for example, the processing device 1 may communicate with an external information providing server via a network such as the Internet and acquire the patent gazettes from that server.
  • in the above embodiment, the “total number of oppositions to patents” or the “total number of requests for patent invalidation trials” is used as an index for calculating the patent impact index, but this is only an example. Both the “total number of oppositions to patents” and the “total number of requests for patent invalidation trials” may be used. Similarly, in the above embodiment, the “ratio of oppositions to patents” or the “ratio of requests for patent invalidation trials” is used as an index for calculating the progress information index, but both may be used.
  • the modified example of the present embodiment calculates, among the processes performed in the above-described embodiment, a factor evaluation value different from the above-described factor progress information score (hereinafter, the factor evaluation value calculated in the modified example is referred to as the “technical element score”).
  • the technical element refers to each extracted factor and is named from the technical literature contained in each factor and the content of each factor represented by an index word.
  • patent documents such as patent gazettes are used for the document to be analyzed is taken as an example.
  • FIG. 13 shows a configuration of a modification of the present embodiment.
  • FIG. 13 is a functional block diagram of a document group analysis apparatus according to a modification of the present embodiment.
  • the document group analysis apparatus includes an input device 2, a recording device 3, an output device 4, and a processing device 100.
  • the input device 2, the recording device 3, and the output device 4 are the same as those in the above embodiment.
  • in response to a request from the input device 2, the processing device 100 classifies the patent documents stored in the recording device 3 for each factor according to the above-described factor extraction procedure. Further, the processing device 100 calculates a technical element score for the factor designated by the user via the input device 2.
  • the processing device 100 includes a text data acquisition unit 101, a document vector acquisition unit 102, a factor calculation unit 103, an attribution factor determination unit 104, a progress information reading unit 202, a score calculation unit 2060, and a patent score calculation unit 2070.
  • the text data acquisition unit 101, the document vector acquisition unit 102, the factor calculation unit 103, the attribution factor determination unit 104, and the progress information reading unit 202 have the same functions as those in the above embodiment, and thus description thereof is omitted here.
  • the score calculation unit 2060 accepts a technical element score calculation request from the user via the input device 2 in a state where the patent documents have been classified for each factor by the attribution factor determination unit 104 and index words have been associated with each factor.
  • the score calculation unit 2060 calculates a technical element score using a “patent score (PS)” indicating an evaluation value for each patent document belonging to the calculation target factor.
  • the “patent score (PS)” is calculated in advance by the patent score calculation unit 2070 shown below.
  • the patent score calculation unit 2070 calculates a “patent score (PS)” that evaluates each patent document, using progress information of the patent document (information such as whether or not priority is claimed and the number of times cited in the examination of other patent applications) and content information (such as the claims). Then, for each piece of information (gazette number) identifying a patent document, the patent score calculation unit 2070 generates information (hereinafter referred to as “PS information”) that associates the “patent score (PS)” of the patent document with “abandonment information” indicating whether or not the patent has been abandoned (including information indicating whether or not a rejection has become final).
  • the processing device 100 is realized by a computer equipped with a CPU (Central Processing Unit), a memory, and an I/F that exchanges data with external devices (input device 2, recording device 3, output device 4, etc.). Each functional unit of the processing device 100 (text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, progress information reading unit 202, score calculation unit 2060, and patent score calculation unit 2070) is assumed to be realized by software.
  • the memory stores a program for realizing each functional unit (text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, progress information reading unit 202, score calculation unit 2060, and patent score calculation unit 2070).
  • Each functional unit of the processing device 100 is realized by the CPU executing the program stored in the memory.
  • FIG. 14 is a flowchart showing the procedure of a technical element score calculation process according to a modification of the embodiment of the present invention.
  • the technical element score calculation process is assumed to be performed, in the factor extraction process of Fig. 2 (A) described above, after the attribution factor of each document is determined (S104), in order to output data of the documents or document groups belonging to each factor (S105).
  • the “information that associates, for each factor, the patent documents belonging to the factor (information shown in FIG. 6)” obtained by the factor extraction process in Fig. 2 (A) is assumed to be stored in a predetermined area of the memory of the processing device 100.
  • the score calculation unit 2060 receives a request for technical element score calculation processing from the user via the input device 2 (S2010).
  • the user when requesting the calculation process of the technical element score, the user also specifies the category to be calculated.
  • the factor obtained by the factor extraction process in Fig. 2 (A) may be specified as the classification target.
  • a technical element score is calculated for each factor.
  • patent gazettes belonging to each factor may be classified for each applicant, and classification for each factor and each applicant may be designated.
  • a technical element score is calculated for each factor and for each applicant.
  • the score calculation unit 2060 acquires the patent score (PS) of the patent document belonging to the factor for which the technical element score received in S2010 is calculated (S2020).
  • the score calculation unit 2060 uses the “information in which patent documents are associated with each factor (information shown in FIG. 6)” and the “PS information” stored in the memory of the processing device 100 to obtain the “patent score (PS)” and “abandonment information” of the patent documents belonging to the factor to be calculated.
  • the score calculation unit 2060 uses the obtained “patent score (PS)” and “abandonment information” of the patent documents belonging to the factor to be calculated to obtain the standard value of each patent score (PS) that has not been abandoned (S2030).
  • specifically, the score calculation unit 2060 refers to the “abandonment information” and identifies, among the patent documents belonging to the designated factor, the patent scores (PS) of the patent documents that have not been abandoned (including patent applications pending at the JPO).
  • the score calculation unit 2060 then obtains the standard value of each identified patent score (PS) in the population. More specifically, the score calculation unit 2060 obtains the standard value of each identified patent score (PS) using the following (Equation 1).
  • the score calculation unit 2060 obtains the total value of the standard values of the patent scores PSj that are equal to or greater than the threshold among the standard values of the patent scores PSj of the patent documents belonging to the specific factor obtained in S2030, and The total value is set as the “technical element score” of the factor (S2040).
  • the score calculation unit 2060 also obtains the maximum value among the standard values of the patent scores PSj of the patent documents belonging to the specific factor obtained in S2030.
  • specifically, the score calculation unit 2060 uses the following (Equation 2) and the standard values of the patent scores (PSj) obtained in S2030 to calculate the “technical element score” for the factor specified by the user. Further, the score calculation unit 2060 selects the maximum (MAX) standard value from the standard values of the patent scores PSj obtained in S2030, and sets the selected standard value as the maximum value in the factor.
  • Equation 2 among the standard values of each patent score PSj obtained in S2030, the number of standard values of the patent score PSj above the threshold is assumed to be “n” in the factor.
  • Equation 2 uses, as an example of the threshold value PSstd, 0 according to the average number 1 in the population of the standard value of each patent score PSi obtained in S2030.
  • the technical element score for one factor is calculated, but this is just an example.
  • the processing of S2020 to S2040 is performed for each factor, and the technical element score and the maximum value are obtained for each factor.
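The steps S2030 to S2040 can be sketched as follows. This is only an illustrative sketch: Equations 1 and 2 are not reproduced in this text, so the code assumes that the standard value is an ordinary z-score (mean 0, variance 1) over the non-abandoned documents of the population and that the technical element score is the plain sum of the standard values at or above the threshold. The function names and the dictionary layout are hypothetical.

```python
from statistics import mean, pstdev


def standard_values(scores):
    """Standardize scores to mean 0 and variance 1 (assumed form of Equation 1)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def technical_element_score(population, factor_ids, threshold=0.0):
    """population: dict doc_id -> (patent_score, abandoned_flag) (hypothetical layout).

    Returns (technical element score, maximum standard value, n) for one factor.
    """
    # Keep only documents that have not been abandoned.
    live = {d: ps for d, (ps, abandoned) in population.items() if not abandoned}
    ids = list(live)
    z = dict(zip(ids, standard_values([live[d] for d in ids])))
    # Standard values of the live documents that belong to the factor.
    in_factor = [z[d] for d in factor_ids if d in z]
    # Sum only the standard values at or above the threshold (assumed Equation 2).
    kept = [v for v in in_factor if v >= threshold]
    maximum = max(in_factor) if in_factor else None
    return sum(kept), maximum, len(kept)
```

Calling the function once per factor (and, if desired, once per applicant within a factor) gives the per-factor scores described above.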
  • The output device 4 outputs the technical element score obtained in S2040. Alternatively, the output device 4 outputs the maximum value of the factor together with the technical element score.
  • The score calculation unit 2060 may also calculate the technical element score for each factor and each applicant, generate information representing a graph of the technical element score for each factor and each applicant (as shown in the figure), and have the output device 4 output it. In this case, the technical element score and the maximum value may be indicated for each factor and for each applicant.
  • The technical element score is calculated using the patent scores (PSi) of patent documents that have not been abandoned.
  • The reason for this is as follows. Suppose, for example, that a company tries to evaluate its patents in each technical field, and the number of patent documents classified into a technical field (factor) is very large but many of them have been abandoned (or are applications for which a decision of rejection has become final). If applications that have already been abandoned (or finally rejected) were included in the evaluation of the patents in that technical field, a technical field in which few patents are actually held would be highly evaluated, and proper analysis would not be possible.
  • For this reason, the technical element score is calculated using only the patent scores (PSi) of patent documents that have not been abandoned, which improves the accuracy of the score.
  • In some cases the number of applications for each applicant and each factor can itself be regarded as a sufficiently significant value. However, if the documents to be analyzed (the population) are extracted by a method for which this does not hold, the number of applications itself varies with the characteristics of the industry to which each applicant belongs, and an analysis limited to the number of applications for each factor may not be highly accurate.
  • FIG. 15 is a diagram showing the distribution of the technical element score of the modified example and the factor progress information score of the example in relation to the number of publications.
  • For multiple factors extracted from a certain group of documents to be analyzed, factor progress information scores were calculated and standardized to mean 0 and variance 1, and technical element scores were calculated for the same factors and likewise standardized to mean 0 and variance 1.
  • The standardized factor progress information score and technical element score are plotted on the vertical axis, and the number of publications for each factor is plotted on the horizontal axis.
  • The factor progress information score has a distribution close to a straight line indicating direct proportion to the number of publications, and is greatly influenced by the number of publications.
  • Although the technical element score is not completely unrelated to the number of publications, it is distributed in a region far from the straight line indicating direct proportion, which shows that the influence of the number of publications is mitigated.
  • FIG. 16 shows the result obtained when, after the attribution factor of each document is determined (S104) in the factor extraction process of FIG. 2 (A) described above, the publications belonging to each factor are further classified by applicant and the technical element score of this modification is calculated for each factor and each applicant.
  • the specific calculation method is as follows.
  • First, a cluster analysis based on similarity was performed on the published patent applications and patent gazettes published in Japan by a household equipment manufacturer (Company a) to obtain multiple clusters. Of these clusters, it was decided to analyze in detail the cluster related to “composite structures”, in particular its relationship with the surrounding patents of other companies. Accordingly, from approximately 4 million publications, including the published applications and patent gazettes of Company a and of other companies, about 3000 publications with high similarity to the publications belonging to the “composite structures” cluster were extracted.
  • A cluster analysis was then performed on these approximately 3000 publications based on the similarity of the document vectors of each publication, and the 323 publications belonging to a specific cluster were extracted as the target of the factor analysis.
  • The score calculation unit 2060 further classifies the publications belonging to each factor by applicant and calculates the technical element score for each factor and each applicant. The output device 4 then generates a diagram based on the calculated technical element scores and outputs it.
  • In FIG. 16, factors are listed in the horizontal direction, applicants are listed in the depth direction in order of number of applications, and the height indicates the technical element score for each factor and each applicant.
  • FIG. 16 shows only the technical element scores calculated for applicants ranked in the top 10 by number of applications among all publications whose factor score exceeds a certain level.
  • The technical element score is calculated by adding the patent scores that are at or above the average, excluding publications that are below the average. For factors whose publications are mostly below the average, the technical element score is 0 or close to 0. The contrast between factors therefore becomes clear, and the ranking and evaluation of factors can easily be grasped visually.
  • Although the average of the population is used as the threshold value here, the threshold is not particularly limited to this.
  • For example, the average of the standard values of the patent scores PSi within the patent group of a specific applicant, or another threshold value determined by the user, may be set in the processing apparatus 100.
  • Although the standard values of the patent scores PSi are used here, the invention is not limited to this. For example, the effect of the number of cases can be mitigated even when non-standardized patent scores PSi are simply added.
  • the calculation of the patent score (PS) is performed by the patent score calculation unit 2070 of the processing apparatus 100.
  • the present invention is not particularly limited to this.
  • Another computer having a CPU (Central Processing Unit), a memory, and the like may perform the calculation of the patent score.
  • a program (PS calculation program) for realizing the function of the patent score calculation unit 2070 is stored in another computer.
  • the CPU of another computer executes the “PS calculation program”, thereby calculating the patent score PS and generating the above-described PS information.
  • The processing device 100 acquires the PS information generated by the other computer and stores it in the memory.
  • When another computer performs the patent score calculation process, it is not necessary to provide the patent score calculation unit 2070 in the processing apparatus 100.
  • the storage device 3 stores patent data (electronic data indicating a patent gazette) and patent attribute information.
  • The electronic data indicating the patent gazette includes at least bibliographic information such as the patent data ID (gazette number, etc.), filing date, and IPC code.
  • The patent attribute information includes progress information 300 of the patent document (information such as whether priority is claimed or the number of citations in the examination of other patent applications) and content information 400 (information such as the number of claims and the number of specification pages).
  • FIG. 17 shows an example of the data configuration of the progress information 300.
  • FIG. 17 is a diagram schematically showing an example of the data structure of the progress information used in the modification of the present embodiment.
  • In the progress information 300, one record is composed of a field 301 for registering the “patent data ID (gazette number, etc.)”, a field 302 for registering “the filing date, etc.”, fields for “examination” […], a field 311 for registering information indicating the presence or absence of a “PCT application”, a field 312 for registering information indicating the presence or absence of “file wrapper inspection”, and a field 313 for registering information indicating the “number of times cited”.
  • the progress information 300 includes a plurality of records.
  • “elapsed days from application”, “elapsed days from examination request”, and “elapsed days from registration date” are information relating to the period of the corresponding patent data.
  • “Elapsed days from application” is counted from the filing date, “elapsed days from examination request” from the date of the request for examination, and “elapsed days from registration date” from the registration date of the patent right. In each case the number of days elapsed up to the evaluation date (the patent score calculation date), or a date near the evaluation date, is calculated and stored in the storage device 3.
  • If a request for examination has not yet been filed for the patent application, the “elapsed days from examination request” is set to NULL.
  • Likewise, where no registration date exists, the “elapsed days from registration date” is NULL.
  • “Divisional application” indicates whether a divisional application has been filed based on the patent application.
  • “Early examination” indicates whether the patent application has been subject to early (expedited) examination.
  • A further item indicates whether an appeal against a decision of rejection of the patent application has been requested and whether a decision to grant the patent has been made in that appeal.
  • “Opposition maintenance decision” indicates whether an opposition has been filed against the patent and a decision to maintain it has been made. A corresponding item for invalidation trials indicates whether a trial for invalidation has been requested and whether a decision rejecting the request has been made in that trial.
  • “Priority” indicates whether the patent application claims priority based on an earlier patent application.
  • FIG. 18 is a diagram schematically showing an example of the data configuration of the content information used in the modification of the present embodiment.
  • In the content information 400, one record is composed of a field 401 for registering the “patent data ID (gazette number, etc.)”, a field 402 for registering the “number of claims” of the patent data, a field 403 for registering the “average number of characters in a claim”, and a field 404 for registering the “number of specification pages” of the patent data.
  • the content information 400 includes a plurality of records.
  • the “number of claims” is information indicating the number of claims of the patent application.
  • “Average number of characters” is information indicating the average number of characters (or the number of words) per claim of the patent application.
  • “Number of specification pages” is information indicating the number of specification pages or the number of publication pages of the patent application. This information is extracted from the published patent gazette and other patent data of each patent application.
  • FIG. 19 is a flowchart showing the procedure of a patent score calculation process according to a modification of the present embodiment.
  • the patent score calculation unit 2070 receives input of an IPC code from the user, and acquires patent data (electronic data indicating a patent publication) (S400).
  • The patent score calculation unit 2070 accesses the storage device 3 and acquires the patent data classified under that IPC code.
  • The patent data contains bibliographic information such as the filing date information and, where priority is claimed, the priority date information of the patent application.
  • Using the filing date information or the priority date information in the bibliographic information of the acquired patent data, the patent score calculation unit 2070 classifies the patent data into groups t for each predetermined period (in this modification, by the year to which the filing date or the priority date belongs) (S500).
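The classification of S500 can be illustrated with a minimal sketch. The record layout (`id`, `filing_date`, `priority_date`) is hypothetical, and the choice to prefer the priority date whenever one exists is an assumption; the text only says that the filing date or the priority date is used.

```python
from collections import defaultdict
from datetime import date


def group_by_year(patents):
    """Classify patent records into groups t keyed by calendar year.

    Each record is a dict with "id", "filing_date" and an optional
    "priority_date" (hypothetical layout). The priority date is used
    when present (an assumption), otherwise the filing date.
    """
    groups = defaultdict(list)
    for p in patents:
        basis = p.get("priority_date") or p["filing_date"]
        groups[basis.year].append(p["id"])
    return dict(groups)


patents = [
    {"id": "P1", "filing_date": date(2004, 5, 1)},
    {"id": "P2", "filing_date": date(2005, 2, 1), "priority_date": date(2004, 3, 1)},
    {"id": "P3", "filing_date": date(2005, 7, 9)},
]
groups = group_by_year(patents)
```

Grouping by period first is what later allows each evaluation item to be normalized against applications of the same age.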
  • the patent score calculation unit 2070 calculates an evaluation value of each patent data (S600). Details of this processing will be described with reference to FIG.
  • FIG. 20 is a flowchart showing details of a process for calculating an evaluation value of each patent data according to a modification of the present embodiment.
  • The patent score calculation unit 2070 acquires the progress information 300 and the content information 400 for the patent data belonging to the group generated by the classification of S210 (S610). Specifically, using the patent ID (gazette number, etc.) included in the bibliographic information of the acquired patent data, the patent score calculation unit 2070 acquires, from the progress information 300 and the content information 400 stored in the storage device 3, the progress information 300 and content information 400 associated with that patent ID.
  • variable j is set to 1 (S620), and the evaluation score of the patent data j is calculated as follows.
  • the evaluation score of patent data j is calculated for each of the I evaluation items i (S6302, S6303, S6304).
  • The evaluation score is calculated according to [Equation 3].
  • The “relevance data for evaluation item i” placed in the numerator is, for example, “1” if a divisional application has been filed as described above, and “0” if it has not.
  • The attribute information of patent data is useful for relative evaluation within the analysis population, but proper evaluation is not possible if all patent applications or patent rights in the analysis population are treated alike.
  • In this modification, the analysis target population is classified into groups for each period, and the value obtained for each classified group is used as the denominator, so that appropriate relative evaluation is possible within an analysis target population that includes patent applications or patent rights of different periods. For example, within one technical field, when one case in a contemporaneous group with few patent applications is compared with one case in a contemporaneous group with many patent applications, the value of the former is often higher.
  • A patent application published several years ago is more likely to have accumulated progress information, such as a request for inspection, than a patent application that has just been published. It would therefore be a mistake to rate a patent application low merely because it has only just been published.
  • A patent application that has received a request for inspection is one that attracts a particularly high degree of attention and should be highly evaluated.
  • That does not mean, however, that a patent application should be highly evaluated simply because inspection of it has been requested.
  • For each group, both a value based on the patent attribute information of each individual patent data belonging to the group and a value based on the patent attribute information of all the patent data belonging to the group are obtained.
  • the evaluation score is calculated by multiplying the sum of the values by the value of the decreasing function.
  • The evaluation score is calculated according to [Equation 4].
  • The denominator has the same form as in S6302 [presence/absence type] above.
  • For “elapsed days from examination request”, the denominator is the positive square root of the sum of values that are, for example, 1 if examination has been requested for the patent application and 0 if not.
  • For “elapsed days from registration date”, the denominator is the positive square root of the sum of values that are 1 if the patent right has been registered for the patent application and 0 if not.
  • For “elapsed days from application”, all patent data are applicable, so if the relevance data for the evaluation item is 1, the denominator equals the positive square root of the number of patent data in the group.
  • the denominator is large when there are many patent data corresponding to the evaluation items in the group, and the denominator is small when there are only a few patent data corresponding to the evaluation items in the group.
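The effect of this denominator can be shown with a short sketch for the presence/absence case. Equation 3 itself is not reproduced in this text, so the form below (relevance data divided by the positive square root of the group sum of the relevance data) is an assumption reconstructed from the description, and the function name is hypothetical.

```python
from math import sqrt


def presence_scores(flags):
    """Evaluation points for one evaluation item within one period group.

    flags: list of 0/1 relevance data, one per patent in the group.
    Each point is flag / sqrt(sum of flags in the group) (assumed form).
    """
    denom = sqrt(sum(flags))
    if denom == 0:
        # No patent in the group matches the item, so every point is 0.
        return [0.0] * len(flags)
    return [f / denom for f in flags]


rare = presence_scores([1, 0, 0, 0])      # only one patent matches the item
common = presence_scores([1, 1, 1, 1])    # every patent matches the item
```

With these inputs the single matching patent in `rare` receives a larger point than any patent in `common`, which is the behavior the description attributes to the denominator.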
  • “Elapsed days from examination request”, “elapsed days from application”, and “elapsed days from registration date” are basic evaluation items applicable to many patents, so the evaluation points for these items tend to be smaller.
  • The evaluation score calculated in S6303 [time decay type] is further corrected by the content information.
  • The content information 400 shown in FIG. 18 is used.
  • The multiplier $a_1 \times a_2 \times a_3$ is at most 2, the maximum value of each of $a_1$, $a_2$ and $a_3$ being set to $2^{1/3}$.
  • The evaluation score is calculated according to [Equation 5].
  • The term $f(\text{citation}) \times \log(n + 1)$ placed in the numerator is the logarithm of the value obtained by adding 1 to the “number of times cited” n, multiplied by the weight f(citation).
  • Verification by the present inventors has shown that the retention rate of patent rights changes depending not only on the presence or absence of citations but also on their number. The logarithm is taken because this effect tends to level off gradually as the number of citations increases.
  • As specific weights, an f(citation) is set for citations by other companies and an f(citation) for citations by the applicant's own company, and the ratio between the two was set to 1:2.
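The weighted logarithmic term in the numerator can be sketched as below. The base of the logarithm is not stated in this text, so the natural logarithm is an assumption; the weight is left as a caller-supplied parameter because the text does not make clear which citation type receives which side of the 1:2 ratio. The function name is hypothetical.

```python
from math import log


def citation_numerator(times_cited, weight):
    """Weighted, logarithmically damped citation term: weight * log(n + 1).

    times_cited: number of times the patent was cited (n >= 0).
    weight: the f(citation) weight chosen by the caller (assumption: any
    positive number; the mapping of the 1:2 ratio is not fixed here).
    """
    return weight * log(times_cited + 1)


# Each additional citation adds less than the previous one.
first_step = citation_numerator(1, 1.0) - citation_numerator(0, 1.0)
later_step = citation_numerator(11, 1.0) - citation_numerator(10, 1.0)
```

Adding 1 before taking the logarithm makes an uncited patent contribute exactly 0.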
  • the evaluation raw score is the positive square root of the sum of squares of I evaluation points, or 0.
  • The evaluation score is 0 when no request for examination has been filed by the deadline for requesting examination.
  • The evaluation score calculated in S6303 [time decay type] is corrected by the content information. Specifically, each of the evaluation points calculated by [Equation 4] above for “elapsed days from examination request”, “elapsed days from application”, and “elapsed days from registration date” is multiplied by $a_1 \times a_2 \times a_3$, and the square root of the sum of squares is then taken according to [Equation 7].
  • One way to solve this problem is to use the maximum value among the evaluation points i as the evaluation raw score (maximum value method).
  • The maximum value method is grounded in the way the correlation between a given item of progress information and the retention rate of a patent group was investigated, namely regardless of what other progress information had been given.
  • The patent retention rate is expected to be best expressed by the retention rate of the progress information item with the highest retention rate, so it is reasonable to take the maximum value of the evaluation points i as the evaluation raw score. However, if the maximum value of the evaluation points i is the same for two patents, neither can be ranked above the other.
  • The above-mentioned method of taking the square root of the sum of squares can be said to combine the advantages of the simple sum method and the maximum value method. By taking the square root of the sum of squares, if one of the I evaluation items i has a high evaluation point for a certain patent data j, that high evaluation point greatly affects the evaluation raw score.
  • At the same time, the evaluation points of the other evaluation items are also taken into account to some degree in the evaluation raw score. Therefore, a higher raw score can be given to patent data j that corresponds to multiple items whose evaluation points tend to be high, such as “early examination”, “opposition maintenance decision”, and “invalidation trial decision”.
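The three ways of combining evaluation points discussed above can be compared directly. The following is a minimal sketch with hypothetical function names; the example evaluation points are invented purely for illustration.

```python
from math import sqrt


def raw_score_simple_sum(points):
    """Simple sum method: add all evaluation points."""
    return sum(points)


def raw_score_maximum(points):
    """Maximum value method: use the largest evaluation point."""
    return max(points, default=0.0)


def raw_score_root_sum_squares(points):
    """Positive square root of the sum of squares of the evaluation points."""
    return sqrt(sum(p * p for p in points))


patent_a = [3.0, 0.0, 0.0]   # one high evaluation point
patent_b = [3.0, 4.0, 0.0]   # two high evaluation points
```

Here the maximum value method cannot separate `patent_a` from a patent such as `[3.0, 1.0, 0.0]`, while the root of the sum of squares is dominated by the largest point yet still rises when further items apply.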
  • patent evaluation is performed in consideration of all evaluation points calculated according to the type of patent attribute information (S630, S640). As a result, it is possible to evaluate the value of patent data from multiple angles.
  • The evaluation value calculated based on the progress information or content information gives a high value to the few patent applications or patent rights in which a distinctive course of events or distinctive content can be discerned, whereas many other patent applications or patent rights are often given a low value. Therefore, looking at the distribution of counts by evaluation value, patent applications or patent rights with high evaluation values are few and sparse, and the many patent applications or patent rights with low evaluation values are densely distributed.
  • The average value (arithmetic average value) is greatly influenced by a small number of patent applications or patent rights with high evaluation values, so care must be taken when evaluating by comparison with such an average value.
  • the evaluation value calculation processing from S610 to S670 is executed for all groups t obtained by classifying the patent data acquired in S400 in S500.
  • the process returns to FIG. 19, and based on this evaluation value, the deviation value in the analysis target population acquired in S400 is calculated as the patent score PS (S700).
  • This deviation value also enables relative comparison of patent data between different technical fields that would otherwise be difficult to compare (that is, comparison with an analysis population selected separately under a different IPC in S400).
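As a rough illustration of S700, the sketch below computes deviation values under the conventional definition, 50 + 10 × (value − mean) / standard deviation. This text does not give the formula, so the conventional definition and the use of the population standard deviation are assumptions, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def deviation_values(evaluation_values):
    """Deviation value of each evaluation value within its analysis population.

    Assumes the conventional definition: 50 + 10 * (x - mean) / stdev,
    with the population standard deviation.
    """
    mu = mean(evaluation_values)
    sigma = pstdev(evaluation_values)
    if sigma == 0:
        # All values are identical, so every patent sits at the center.
        return [50.0] * len(evaluation_values)
    return [50.0 + 10.0 * (v - mu) / sigma for v in evaluation_values]


scores = deviation_values([10.0, 20.0, 30.0])
```

Because each population is rescaled to the same center and spread, scores from populations selected under different IPC codes land on a common scale.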
  • the technical element score is calculated based on the patent score PS obtained by the above procedure.
  • the patent score PS which is the basis of the technical element score, takes into account the weight according to the type of historical information. Since the technical element score is obtained using the patent score PS, a score with higher accuracy is calculated in the modified example.
  • the analysis target population is classified into groups for each period, and the values obtained for each classified group are used as denominators to include patent applications or patent rights at different periods. Appropriate relative evaluation is possible within the analysis population.

Abstract

A document analysis device includes: a text data acquisition unit (101) for acquiring a plurality of technical documents; a document vector acquisition unit (102) for acquiring a weighting amount of each index word for each technical document acquired; a factor calculation unit (103) which uses the acquired technical documents as an examinee and the weighting amounts of the respective index words to perform factor analysis with the respective index words as observation variables, calculates a factor load amount for each of factors for each of the index words, and calculates a factor point for each of factors for each of the technical documents; and an imputed factor decision unit (104) which decides an imputed factor of each index word by using the factor load amount of each index word and decides the imputed factor of each technical document by using the factor point of each technical document. This enables appropriate analysis of a plurality of documents, i.e., what kind of concepts or characteristics are contained in the document or the document group.

Description

明 細 書  Specification
文書群分析装置  Document group analyzer
技術分野  Technical field
[0001] 本発明は、技術文献を分析する技術に関し、特に複数の文書の中に、どのようなコ ンセブト乃至特徴が存在するかを分析するための技術に関する。  [0001] The present invention relates to a technique for analyzing technical literature, and more particularly, to a technique for analyzing what kind of concept or characteristic exists in a plurality of documents.
背景技術  Background art
[0002] 複数の技術文献の内容を分析するために、技術文献を複数のクラスタに分類する ものが知られている。例えば特開 2005— 92443号公報(特許文献 1)は、検索され た技術文献に対して形態素解析を行って得られた各単語にウェイトを付加して、各技 術文献をベクトル化し、ベクトルの向きが近い技術文献同士を一つのクラスタにまとめ ている。そして、個々のクラスタごとに重要単語を抽出している。  [0002] In order to analyze the contents of a plurality of technical documents, there is known one that classifies technical documents into a plurality of clusters. For example, Japanese Patent Laid-Open No. 2005-92443 (Patent Document 1) adds a weight to each word obtained by performing morphological analysis on a searched technical document, vectorizes each technical document, Technical documents with similar orientations are grouped into a cluster. Then, important words are extracted for each cluster.
特許文献 1 :特開 2005— 92443号公報  Patent Document 1: JP 2005-92443
発明の開示  Disclosure of the invention
発明が解決しょうとする課題  Problems to be solved by the invention
[0003] しかし、このような従来のクラスタ分析による分類では、分析対象の複数の技術文献 に表される技術分野のコンセプト乃至特徴を的確に把握することが困難な場合があ る。例えば上記特開 2005— 92443号公報(特許文献 1)では、ベクトルの向きの近さ という尺度で分類を行っているので、ベクトルの向きが一定の閾値より近ければ同一 クラスタ、少しでも遠ければ他クラスタとなってしまい、クラスタごとの特徴が必ずしも 明確とはならない。クラスタごとに重要単語を抽出しても、抽出された重要単語が似 通って!/、るクラスタ間ではその違!/、を把握しにくいし、分類そのものの妥当性に対し ても疑問が生じてしまう。  [0003] However, in such classification based on conventional cluster analysis, it may be difficult to accurately grasp the concept or feature of the technical field represented in a plurality of technical documents to be analyzed. For example, in the above Japanese Patent Laid-Open Publication No. 2005-92443 (Patent Document 1), classification is performed on the scale of closeness of vector direction, so the same cluster if the vector direction is closer than a certain threshold, otherwise It becomes a cluster, and the characteristics of each cluster are not always clear. Even if important words are extracted for each cluster, the extracted important words are similar! /, And it is difficult to grasp the difference between clusters! /, And the validity of the classification itself is questioned. End up.
つまり、上記特許文献 1の手法については、分類したクラスタの特徴を分析者に把 握させることについて特に考慮されてはいないのである。従って、分析者は、分類さ れたクラスタの特徴を把握しょうとすると、クラスタに属する各技術文献を読まざるを得 なくなり、分析に長大な時間を費やしてしまうことになる。  In other words, the method disclosed in Patent Document 1 does not take into consideration the analysis of the characteristics of the classified clusters by the analyst. Therefore, if an analyst tries to grasp the characteristics of the classified cluster, the analyst is forced to read each technical document belonging to the cluster, and the analysis takes a long time.
[0004] そこで、本発明は上記事情に鑑みてなされたものであり、本発明の目的は、分析の 客観性を維持しつつ、分析者が技術文献の分析において、複数の技術文献によつ て表される技術分野がどのようなコンセプト乃至特徴を有しているかを把握できるよう にすることにある。 Therefore, the present invention has been made in view of the above circumstances, and the object of the present invention is to analyze It is intended to enable analysts to understand what concepts or features the technical fields represented by multiple technical documents have in the analysis of technical documents while maintaining objectivity. .
課題を解決するための手段  Means for solving the problem
[0005] (1)上記課題を解決するため、本発明の一態様は、テキストデータの分析を行う文 書群分析装置に適用される。 [0005] (1) In order to solve the above-described problem, one aspect of the present invention is applied to a document group analysis apparatus that analyzes text data.
そして、前記文書群分析装置は、テキストデータで表された、複数の技術文献を取 得するテキストデータ取得手段と、前記取得した各技術文献につき、各索引語の重 み付け量を求める重み付け量算出手段と、前記取得した各技術文献を被験者とし、 前記求めた各索引語の重み付け量を用いて、前記各索引語を観測変数とした因子 分析を行い、各索引語の各々について、因子毎に因子負荷量を算出するとともに、 前記各技術文献の各々について、因子毎に因子得点を算出する演算手段と、各索 引語の因子負荷量を用いて各索引語の帰属因子を決定するとともに、各技術文献 の因子得点を用いて各技術文献の帰属因子を決定する帰属因子決定手段と、同じ 因子に属する索引語又は索引語群を、それぞれ該当する各因子に属する技術文献 又は技術文献群のデータとともに、各因子につき出力する出力手段と、を備えること を特徴とする。  Then, the document group analysis device is a text data acquisition means for acquiring a plurality of technical documents represented by text data, and a weighting amount calculation for calculating a weighting amount of each index word for each of the acquired technical documents. And a factor analysis using each index word as an observation variable using the obtained weighted amount of each index word as a subject, and for each index word for each factor. In addition to calculating the factor loading, for each of the technical documents, calculating means for calculating the factor score for each factor, and determining the attribution factor of each index word using the factor loading of each index word, The attribution factor determination means for determining the attribution factor of each technical document using the factor score of each technical document, and the index word or group of index words belonging to the same factor, respectively. With operative document group of data, characterized by comprising output means for outputting for each factor, a.
[0006] According to one aspect of the present invention, index words and a plurality of technical documents can be attributed to each factor. From the index words, the characteristics or concepts of a technical field can be grasped through linguistic information that the analyst can understand. From the plurality of technical documents, the characteristics or concepts of the technical field can be grasped from specific tendencies in the bibliographic information contained in each technical document. Although the present invention uses the technique of factor analysis, it does not require subjective or arbitrary judgment by the analyst, such as the "interpretation of factors" in ordinary factor analysis. This is because each index word is used as an observed variable. In other words, since the observed variables themselves express content, once the observed variables belonging to a factor are determined based on the factor loadings, the content of each factor can be shown directly using those observed variables.
[0007] (2) In the document group analysis device, the attribution factor determination means may, for each index word, use the calculated factor loadings to select the factor with the largest factor loading and specify the selected factor as the attribution factor of that index word, and, for each technical document, use the calculated factor scores to select the factor with the largest factor score and specify the selected factor as the attribution factor of that technical document.
[0008] By attributing each index word or each technical document to the factor with which it is most strongly related, that factor can be explained best. As a result, the characteristics or concepts of the technical field can be grasped more clearly.
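The selection rule in (2) amounts to taking, for each index word or document, the factor whose value is largest. The following is a minimal sketch of that rule; the function name `assign_to_factor` and all sample values are hypothetical and serve only as illustration. Ties are resolved in favor of the lowest factor number, which is an assumption the text does not address.

```python
def assign_to_factor(values_by_item):
    """Map each item to the index of the factor with the largest value."""
    return {
        item: max(range(len(values)), key=lambda k: values[k])
        for item, values in values_by_item.items()
    }


# Hypothetical factor loadings: one row per index word, one column per factor.
loadings = {
    "battery": [0.82, 0.10, 0.05],
    "electrode": [0.64, 0.31, 0.02],
    "antenna": [0.08, 0.12, 0.77],
}
# Hypothetical factor scores: one row per document, one column per factor.
scores = {
    "doc-001": [1.4, -0.2, 0.3],
    "doc-002": [-0.5, 0.1, 2.0],
}

word_factor = assign_to_factor(loadings)  # attribution factor of each index word
doc_factor = assign_to_factor(scores)     # attribution factor of each document
```

The same function serves both cases because index words and documents are assigned by the same largest-value rule, only with different input tables.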
[0009] (3) The document group analysis device may further comprise important index word extraction means for obtaining the appearance frequency of the index words contained in the plurality of documents, calculating the importance of each index word using the appearance frequency, and extracting a predetermined number of index words ranked highest in importance using the calculated importance, and the weighting amount calculation means may obtain, as the weighting amount of each index word, the weighting amounts of the predetermined number of highest-importance index words.
[0010] By extracting from the plurality of technical documents only the index words that represent the characteristics of the technical document group as a whole as important index words and then analyzing them, the characteristics or concepts of the technical field can be grasped more clearly. In addition, narrowing down the index words in advance makes the analysis processing more efficient.
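As a rough illustration of extracting a predetermined number of important index words, the sketch below ranks words by an importance value derived from appearance frequency. This passage does not fix the importance formula, so the total occurrence count used here is purely an assumption, and `top_index_words` and the sample documents are hypothetical.

```python
from collections import Counter


def top_index_words(documents, n):
    """Return the n index words with the highest importance.

    Assumption: importance is the total number of occurrences across
    all documents; the source does not specify the formula here.
    """
    importance = Counter()
    for words in documents:
        importance.update(words)
    # Descending importance, then alphabetical order for a stable result.
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


docs = [
    ["battery", "electrode", "battery"],
    ["electrode", "separator"],
    ["battery", "casing"],
]
important = top_index_words(docs, 2)
```

Only the words returned here would then be carried forward as observed variables, which is what reduces the size of the later analysis.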
[0011] (4) The document group analysis device may comprise factor evaluation value calculation means for calculating, for each of the factors, a factor evaluation value indicating a technical evaluation of the factor, and the output means may output, as data of the technical document or technical document group, the factor evaluation value of the technical document or technical document group.
[0012] With the above configuration, factors can be compared relative to one another by their factor evaluation values. This makes it possible to grasp the relative positional relationship among the technical elements represented by the technical documents belonging to the factors, and further to classify those technical elements into important ones and others.
[0013] (5) In the document group analysis device, the technical documents may be patent documents including published patent application gazettes and granted patent gazettes, and the device may comprise progress information acquisition means for acquiring progress information of each patent document, and factor evaluation value calculation means for calculating, for each of the factors, a factor evaluation value indicating a technical evaluation of the factor using the progress information of the technical document or technical document group belonging to the factor, and the output means may output the factor evaluation value as data of the technical document or technical document group.
[0014] In one aspect of the present invention, factors can be evaluated on the basis of patent gazettes. By making use of the examination progress information contained in the patent information related to those gazettes, highly accurate factor evaluation becomes possible.
[0015] (6) The document group analysis device may comprise document number determination means for determining the number of technical documents or technical document groups belonging to each factor, and the factor evaluation value calculation means may, for each of the factors, calculate a first index by applying a predetermined weighting to the number of technical documents or technical document groups belonging to the factor, calculate a second index by indexing the progress information of the technical documents or technical document groups belonging to the factor, and calculate the factor evaluation value of the factor using the calculated first index and second index.
[0016] By determining the number of technical documents, the share of each factor in the population consisting of the technical document group can be grasped. By applying a predetermined weighting to the number of documents, factors can be evaluated in a way that takes economic aspects and technological competitiveness into account. By indexing the progress information, factors can be evaluated quantitatively and objectively. As a result, not only can the share and economic value of the technical elements in the technical field be grasped, but they can also be grasped quantitatively in numerical form.
[0017] (7) In the document group analysis device, the progress information may include the number of citations by other companies, the number of patent oppositions received, the number of requests for patent invalidation trial received, whether examination has been requested, and whether the patent right has been registered. The first index may be a value obtained by weighting the number of documents using at least one of the total number of citations by other companies, the total number of patent oppositions received, and the total number of requests for patent invalidation trial received. The second index, obtained by indexing the progress information, may be a value obtained by indexing at least one of the total number of citations by other companies, the total number of patent oppositions received, the total number of requests for patent invalidation trial received, the examination request rate, and the registration decision rate.
[0018] This enables factor evaluation that takes into account the influence of patents that may become obstacles to patent acquisition and technical development by other companies. It also enables factor evaluation that takes into account the applicant's eagerness to obtain rights and the examiner's assessment. The fairness and appropriateness of the factor evaluation can thereby be ensured, so that the relative positional relationship among technical elements and their importance can be grasped fairly and appropriately.
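To make the two-index idea in (6) and (7) concrete, the sketch below computes a first index from the document count weighted by citations and a second index from examination and registration rates. The text only says that "at least one of" the listed quantities is used, so the specific weighting `count * (1 + total citations)`, the averaging of the two rates, and the multiplication of the two indices are all assumptions, as are the names `PatentRecord`, `first_index`, `second_index`, and `factor_evaluation`.

```python
from dataclasses import dataclass


@dataclass
class PatentRecord:
    citations_by_others: int
    oppositions: int
    invalidation_requests: int
    examination_requested: bool
    registered: bool


def first_index(records):
    """Document count weighted by total citations (assumed weighting form)."""
    total_citations = sum(r.citations_by_others for r in records)
    return len(records) * (1 + total_citations)


def second_index(records):
    """Mean of the examination request rate and registration rate (assumed form)."""
    if not records:
        return 0.0
    n = len(records)
    request_rate = sum(r.examination_requested for r in records) / n
    registration_rate = sum(r.registered for r in records) / n
    return (request_rate + registration_rate) / 2


def factor_evaluation(records):
    """Combine the two indices by multiplication (an assumption)."""
    return first_index(records) * second_index(records)


# Hypothetical documents belonging to one factor.
records = [
    PatentRecord(3, 0, 0, True, True),
    PatentRecord(1, 1, 0, True, False),
]
value = factor_evaluation(records)
```

With these sample records the first index is 2 × (1 + 4) = 10 and the second is (1.0 + 0.5) / 2 = 0.75, so the combined value is 7.5.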
[0019] (8) In the document group analysis device, the factor evaluation value output by the output means may be calculated for each factor and for each applicant.
[0020] For each factor, the ranking and share of the applicants of the patent gazettes belonging to that factor can be grasped. As a result, the characteristics of a given technical field can be grasped from the viewpoint of the state of competition among the companies that carry out development.
[0021] (9 and 10) Other aspects of the present invention are a data analysis method comprising the same steps as the methods executed by each of the above devices, and a program capable of causing a computer to execute the same processing as that executed by each of the above devices. This program may be recorded on a recording medium such as an FD, CDROM, or DVD, or may be transmitted and received over a network.
[0022] (11) In the document group analysis device, the technical documents may be patent documents including published patent application gazettes and granted patent gazettes, and the device may comprise means for acquiring, for each acquired patent document, a patent score that individually evaluates the value of that patent document, and factor evaluation value calculation means for calculating, for each of the factors, a factor evaluation value indicating a technical evaluation of the factor using the patent scores of the patent documents belonging to the factor.
[0023] By using patent scores that individually evaluate the value of each patent document, a factor evaluation value can be calculated that reflects the value of the patent documents belonging to each factor. As a result, the characteristics or concepts of the technical field can be grasped more clearly.
[0024] (12) In the document group analysis device, the factor evaluation value calculation means may, for each factor, select from the patent scores of the patent documents belonging to that factor those patent scores that are equal to or greater than a predetermined threshold, and calculate the value obtained by aggregating the selected patent scores as the factor evaluation value.
[0025] By aggregating only values at or above the predetermined threshold and disregarding values below it, a factor that contains many patents of low importance but few important patents is prevented from receiving a high score merely because of its document count. As a result, an appropriate factor evaluation value can be calculated, and the characteristics or concepts of the technical field can be grasped more clearly.
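The effect of the threshold can be seen in a small example. The sketch below assumes that "aggregating" means a simple sum; the function name `factor_evaluation_value`, the threshold of 50, and the sample scores are hypothetical.

```python
def factor_evaluation_value(patent_scores, threshold):
    """Sum only the patent scores at or above the threshold."""
    return sum(s for s in patent_scores if s >= threshold)


# Hypothetical factor with many low-importance patents and one strong one.
many_weak = [30] * 8 + [55]
# Hypothetical factor with few patents, most of them strong.
few_strong = [70, 65, 20]

plain_totals = (sum(many_weak), sum(few_strong))
thresholded = (
    factor_evaluation_value(many_weak, 50),
    factor_evaluation_value(few_strong, 50),
)
```

Without the threshold the first factor would score 295 against 155; with it, the scores become 55 and 135, so the factor with more important patents ranks higher.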
[0026] (13) In the document group analysis device, the patent score is desirably a value standardized within the document group of the population that includes the factors for which the factor evaluation value is calculated.
[0027] By obtaining standardized values within the population and calculating the factor evaluation value from them, the accuracy of relative comparison between different factors can be improved. As a result, the characteristics or concepts of the technical field can be grasped more clearly.
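One common way to standardize a score within a population is the z-score, sketched below. The text does not say which standardization is meant, so the choice of z-score with the population standard deviation is an assumption, and `standardize` and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def standardize(raw_scores):
    """Return z-scores of the raw patent scores within the population."""
    mu = mean(raw_scores.values())
    sigma = pstdev(raw_scores.values())
    if sigma == 0:
        # All scores are identical, so no document stands out.
        return {doc: 0.0 for doc in raw_scores}
    return {doc: (s - mu) / sigma for doc, s in raw_scores.items()}


# Hypothetical raw patent scores for the whole population of documents.
population = {"doc-001": 2.0, "doc-002": 4.0, "doc-003": 6.0}
z = standardize(population)
```

Because every document is scored against the same population mean and spread, scores of documents attributed to different factors stay directly comparable.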
[0028] (14) In the document group analysis device, the patent score may be a value calculated for each patent document by classifying the patent documents into groups by technical field and by predetermined period and, for each classified group, using the progress information of the patent documents belonging to that group.
[0029] By classifying the documents into groups by technical field and by predetermined period and calculating the patent score using the progress information within each classified group, bias in the progress information caused by differences in technical field and filing date is corrected, and an accurate patent score can be calculated. As a result, an appropriate factor evaluation value can be calculated, and the characteristics or concepts of the technical field can be grasped more clearly.
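The grouping step can be sketched as follows. The example assumes that the period is the filing year, that the only progress information used is the citation count, and that a document's score is its citation count divided by its group's mean; none of these choices is stated in the text, and `group_relative_scores` and the sample records are hypothetical.

```python
from collections import defaultdict


def group_relative_scores(patents):
    """Score each patent relative to the mean citation count of its group."""
    groups = defaultdict(list)
    for p in patents:
        # Group key: (technical field, period); the period is assumed to be a year.
        groups[(p["field"], p["year"])].append(p)
    scores = {}
    for members in groups.values():
        avg = sum(m["citations"] for m in members) / len(members)
        for m in members:
            scores[m["id"]] = m["citations"] / avg if avg else 0.0
    return scores


patents = [
    {"id": "A1", "field": "A", "year": 2000, "citations": 10},
    {"id": "A2", "field": "A", "year": 2000, "citations": 30},
    {"id": "B1", "field": "B", "year": 2005, "citations": 1},
    {"id": "B2", "field": "B", "year": 2005, "citations": 3},
]
scores = group_relative_scores(patents)
```

Here A1 and B1 receive the same score even though A1 has ten times as many citations, because each is measured against documents of the same field and period.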
Brief Description of Drawings
[FIG. 1] A diagram showing the hardware configuration of a document group analysis device according to one embodiment of the present invention.
[FIG. 2] FIG. 2(A) is a flowchart explaining the procedure of factor extraction processing in the document group analysis device, and FIG. 2(B) is a flowchart explaining the processing procedure for calculating a factor progress information score as data of a document or document group.
[FIG. 3A] An explanatory diagram of an example of a method for acquiring the text data of a plurality of documents to be analyzed.
[FIG. 3B] An explanatory diagram of an example of a method for acquiring the text data of a plurality of documents to be analyzed.
[FIG. 3C] An explanatory diagram of an example of a method for acquiring the text data of a plurality of documents to be analyzed.
[FIG. 4] Factor loadings of each index word calculated in an example of the present invention.
[FIG. 5] Factor scores of each gazette calculated in an example of the present invention.
[FIG. 6] Index words belonging to each factor extracted in an example of the present invention, and the gazettes belonging to each factor.
[FIG. 7] A plurality of indicators for each factor calculated in an example of the present invention, and the patent impact index calculated from them.
[FIG. 8] A plurality of indicators for each factor calculated in an example of the present invention, and the progress information index calculated from them.
[FIG. 9] Factor progress information scores (per factor) calculated in an example of the present invention.
[FIG. 10] An example plotting the factor progress information scores calculated in an example of the present invention.
[FIG. 11] Factor progress information scores per factor and per applicant calculated in an example of the present invention.
[FIG. 12] An example in which, in an example of the present invention, the gazette group belonging to each factor is further classified by applicant and the factor progress information score is plotted for each factor and each applicant.
[FIG. 13] A functional block diagram of a document group analysis device according to a modification of the above embodiment.
[FIG. 14] A flowchart showing the procedure for calculating the technical element score in the above modification.
[FIG. 15] A diagram showing the distributions of the technical element score of the above modification and the factor progress information score of the above example in relation to the number of gazettes.
[FIG. 16] An example plotting the technical element score for each factor and each applicant in the above modification.
[FIG. 17] A diagram schematically showing an example of the data structure of the progress information used in the above modification.
[FIG. 18] A diagram schematically showing an example of the data structure of the content information used in the above modification.
[FIG. 19] A flowchart showing the procedure for calculating the patent score in the above modification.
[FIG. 20] A flowchart showing details of the processing for calculating the evaluation value of each item of patent data in the above modification.
Explanation of symbols
[0031] 1: processing device, 2: input device, 3: recording device, 4: output device, 101: text data acquisition means, 102: document vector acquisition means, 103: factor loading and factor score calculation means, 104: attribution factor determination means
BEST MODE FOR CARRYING OUT THE INVENTION
[0032] Embodiments of the present invention are described in detail below with reference to the drawings.
<1. Configuration of the document group analysis device>
FIG. 1 is a diagram showing the hardware configuration of a document group analysis device according to one embodiment of the present invention. The document group analysis device of this embodiment is constituted by a computer device comprising: a processing device 1 having a CPU (central processing unit), memory (storage device), and the like; an input device 2 serving as input means, such as a keyboard (manual input device); a recording device 3 serving as recording means that stores document group data, conditions, work results of the processing device 1, and the like; and an output device 4 serving as output means that displays or prints the data of the documents or document groups belonging to the extracted factors.
[0033] The processing device 1 comprises a text data acquisition unit 101, a document vector acquisition unit 102, a factor calculation unit 103, an attribution factor determination unit 104, a document number determination unit 201, a progress information reading unit 202, an indicator calculation unit 203, a patent impact calculation unit 204, a progress information calculation unit 205, and a score calculation unit 206.
In this embodiment, the case where each functional unit of the processing device 1 (the text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, document number determination unit 201, progress information reading unit 202, indicator calculation unit 203, patent impact calculation unit 204, progress information calculation unit 205, and score calculation unit 206) is realized by software is taken as an example.
Specifically, the memory of the processing device 1 stores programs for realizing the functions of the functional units (a text data acquisition program, document vector acquisition program, factor calculation program, attribution factor determination program, document number determination program, progress information reading program, indicator calculation program, patent impact calculation program, progress information calculation program, and score calculation program). The function of each functional unit of the processing device 1 is realized by the CPU executing these programs stored in the memory.
The recording device 3 comprises a condition recording unit 31, a work result storage unit 32, a document storage unit 33, and the like. The document storage unit 33 contains document group data obtained from an external database or an internal database. An external database means a document database such as the IPDL of the Industrial Property Digital Library operated by the Japan Patent Office or PATOLIS (registered trademark) operated by PATOLIS Corporation. An internal database includes a self-hosted database storing commercially sold data such as patent JP-ROM; devices that read documents from media such as disks and DVDs (digital versatile discs); devices such as OCR (optical character readers) that read documents printed on paper or handwritten; and devices that convert the read data into electronic data such as text.
In this embodiment, the documents to be analyzed are mainly various patent gazettes, such as laid-open patent gazettes, examined patent gazettes, patent publication gazettes, patent specifications, published patent gazettes, republished patent gazettes, publications of requests for patent trial, English abstracts of laid-open patent gazettes, laid-open utility model gazettes, laid-open utility model specifications, published utility model gazettes, republished utility model gazettes, examined utility model gazettes, utility model registration gazettes, registered utility model gazettes, registered utility model specifications, publications of requests for utility model trial, and published technical bulletins. The analysis is not limited to these, however, and can be applied broadly to technical literature in general, such as technical papers, technical journals, and books.
[0035] As communication means for exchanging signals and data among the processing device 1, input device 2, recording device 3, and output device 4, the devices may be connected directly by a USB (Universal Serial Bus) cable or the like, may transmit and receive over a network such as a LAN (local area network), or may exchange data via media such as an FD, CDROM, MO, or DVD on which documents are stored. Some or several of these may also be combined.
[0036] <1-1. Details of input device 2>
Next, the configuration and functions of the document group analysis device are described in detail.
The input device 2 accepts input of the acquisition conditions for the text data of the document group to be analyzed, the acquisition conditions for document vectors, the calculation conditions for factor loadings and factor scores, the determination conditions for attribution factors, the calculation conditions for the factor progress information score described later, output conditions, and the like. These input conditions are sent to and stored in the condition recording unit 31 of the recording device 3.
[0037] <1-2. Details of processing device 1>
The text data acquisition unit 101 acquires the data of the document group to be analyzed from the document storage unit 33 of the recording device 3 in accordance with the text data acquisition conditions input through the input device 2. For example, from among the clusters obtained by cluster analysis, based on similarity, of a document group extracted under certain conditions, it acquires the text data of the I documents belonging to one cluster. The acquired text data is either sent directly to the document vector acquisition unit 102 and used for processing there, or sent to and stored in the work result storage unit 32 of the recording device 3.
[0038] The document vector acquisition unit 102 calculates I document vectors from the text data of the I documents acquired by the text data acquisition unit 101, in accordance with the document vector acquisition conditions input through the input device 2. With J as the number of index words, each document vector is a J-dimensional vector whose elements are the weighting amounts $z_{ij}$ of index word j in document i. The calculated document vectors are either sent directly to the factor calculation unit 103 and used for processing there, or sent to and stored in the work result storage unit 32 of the recording device 3.
[0039] The factor calculation unit 103 calculates factor loadings $a_{jk}$ and factor scores $f_{ik}$ based on the vector elements $z_{ij}$ of the document vectors calculated by the document vector acquisition unit 102, in accordance with the calculation conditions for factor loadings and factor scores input through the input device 2. Here k is the factor number; the factor loading $a_{jk}$ is calculated for each index word j and each factor, and the factor score $f_{ik}$ is calculated for each document i and each factor. The calculated factor loadings $a_{jk}$ and factor scores $f_{ik}$ are either sent directly to the attribution factor determination unit 104 and used for processing there, or sent to and stored in the work result storage unit 32 of the recording device 3.
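As an illustration of the data shape handled here, the sketch below builds one J-dimensional vector per document, giving an I × J table of weighting amounts. The weighting scheme is not specified in this passage, so using the raw occurrence count as the weighting amount is an assumption, and `document_vectors` and the sample data are hypothetical. The factor analysis that would turn this table into factor loadings and factor scores is not shown.

```python
def document_vectors(documents, index_words):
    """Return one J-dimensional vector per document.

    Assumption: the weighting amount of index word j in document i is
    the number of times the word occurs in that document.
    """
    return [[words.count(w) for w in index_words] for words in documents]


index_words = ["battery", "electrode", "antenna"]  # J = 3
documents = [
    ["battery", "electrode", "battery"],  # document 1
    ["antenna", "antenna", "battery"],    # document 2
]
vectors = document_vectors(documents, index_words)  # I = 2 rows, J = 3 columns
```

Each row corresponds to a document and each column to an index word, so a factor loading is associated with a column and a factor score with a row.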
[0040] The attribution factor determination unit 104 determines the attribution factor of each index word j based on the factor loadings $a_{jk}$ calculated by the factor calculation unit 103, and determines the attribution factor of each document i based on the factor scores $f_{ik}$, in accordance with the attribution factor determination conditions input through the input device 2. The determined attribution factors are either sent directly to the document number determination unit 201 and the progress information reading unit 202 and used for processing there, or sent to and stored in the work result storage unit 32 of the recording device 3. Then, in accordance with the output conditions input through the input device 2, the index words j belonging to the same factor are output by the output device 4 together with the data of the corresponding documents i.
[0041] The document number determination unit 201 through the score calculation unit 206 calculate the factor progress information score in accordance with the calculation conditions for the factor progress information score input through the input device 2.
Based on the attribution factor of each document determined by the attribution factor determination unit 104, the document number determination unit 201 reads out the document group for each factor, or the document group for each factor and each applicant, as the score calculation target document group, and determines its number of documents.
The progress information reading unit 202 reads the progress information of each document in the score calculation target document group from the document storage unit 33 of the recording device 3.
The indicator calculation unit 203 calculates, for the score calculation target document group, indicators based on the progress information read by the progress information reading unit 202.
The patent impact calculation unit 204 calculates, for the score calculation target document group, the patent impact based on the number of documents determined by the document number determination unit 201 and the indicators calculated by the indicator calculation unit 203.
The progress information calculation unit 205 calculates, for the score calculation target document group, the progress information index based on the indicators calculated by the indicator calculation unit 203.
The score calculation unit 206 calculates, for the score calculation target document group, the factor progress information score based on the patent impact calculated by the patent impact calculation unit 204 and the progress information index calculated by the progress information calculation unit 205.
The work results of each functional unit are sent to and stored in the work result storage unit 32 of the recording device 3.
[0042] < 1 3.記録装置 3の詳細〉 [0042] <1 3. Details of recording device 3>
記録装置 3において、条件記録部 31は、入力装置 2から得られた条件などの情報 を記録し、処理装置 1の要求に基づいて、必要なデータを送る。作業結果格納部 32 は、処理装置 1における各構成要素の作業結果を格納し、処理装置 1の要求に基づ いて、必要なデータを送る。文書格納部 33は、入力装置 2或いは処理装置 1の要求 に基づいて、外部データベース或いは内部データベースから得た、必要な文書群の データを格納し、提供する。文書格納部 33は、特許文書のデータを格納するときは、 その書誌情報(出願人名など)及び経過情報 (審査請求などの情報)を併せて格納 するのが好ましい。  In the recording device 3, the condition recording unit 31 records information such as conditions obtained from the input device 2, and sends necessary data based on a request from the processing device 1. The work result storage unit 32 stores the work result of each component in the processing device 1 and sends necessary data based on a request from the processing device 1. The document storage unit 33 stores and provides necessary document group data obtained from the external database or the internal database based on the request of the input device 2 or the processing device 1. When storing patent document data, the document storage unit 33 preferably stores the bibliographic information (such as the name of the applicant) and the progress information (information such as the request for examination) together.
[0043] <1-4. Details of output device 4>
The output device 4 outputs, for each factor, the documents and index words whose attribution factor has been determined by the attribution factor determination unit 104 of the processing device 1. The output device 4 includes a display unit such as a display device, and displays a correspondence table between documents and/or index words and factors, the factor evaluation value of a document or document group calculated for each factor, and the like. Output is not limited to display on the display unit; it may also take the form of printing on a print medium such as paper, or transmission to a computer device on a network via communication means.
[0044] <2. Factor extraction processing>
FIG. 2(A) is a flowchart explaining the procedure of the factor extraction processing in the document group analysis device. The document group analysis device of this embodiment extracts factors from the plurality of documents to be analyzed using the technique of factor analysis.
[0045] <2-1. Acquisition of text data>
In the document group analysis device of this embodiment, the text data acquisition unit 101 acquires, from the document storage unit 33, the text data of I documents i (i = 1, 2, ..., I) to be analyzed (S101). Which document group is chosen as the I documents is arbitrary; for example, it can be chosen as follows.
FIGS. 3A to 3C are explanatory diagrams of an example of a method for acquiring the text data of the plurality of documents to be analyzed. Items (A) and (B) described below are realized by the procedure described in a patent publication filed by the present applicant (see International Publication No. 2006/030751), so the description below is simplified.
(A) First, processing is performed to select a technology of interest from the patent publications of a certain company (the target company). Specifically, the text data acquisition unit 101 of the processing device 1 clusters the target company's patent documents, calculates an evaluation value for each cluster (target company cluster), for example one corresponding to the factor progress information score described later, and selects the target company cluster with the highest evaluation value as the technology of interest (FIG. 3A).
(B) Next, the text data acquisition unit 101 extracts, from an own-and-others patent document group containing publications of the target company and of other companies, the document group belonging to "the selected technology of interest and technologies similar to it (the specific technical field)" (the own-and-others specific-field patent document group). Specifically, the text data acquisition unit 101 calculates the similarity between each document of the own-and-others patent document group and the technology of interest, and extracts from the document storage unit 33 a predetermined number of documents ranked highest in similarity to the technology of interest as the own-and-others specific-field patent document group (FIG. 3B). This makes it possible, for example, to pick out an important patent group from the target company's patents and then analyze a group of similar patents that includes other companies' patents.
(C) Next, the text data acquisition unit 101 clusters the extracted own-and-others specific-field patent document group to obtain the document group to be analyzed. The document group analyzed here (an own-and-others patent cluster) is not limited to a lower cluster of mutually similar documents; it may be an upper cluster of mutually similar lower clusters, or a middle cluster in between. FIG. 3C shows an example in which 70 lower clusters, 8 middle clusters (lines), and 4 upper clusters (groups) were generated. Whether to use lower, middle, or upper clusters as the own-and-others patent clusters can be chosen according to the purpose of the analysis. This makes it possible, for example, to classify the own-and-others specific-field patent document group into subdivided technical areas and to analyze the patent group for each technical area, or for each middle or upper cluster. Moreover, by clustering the already highly similar own-and-others specific-field patent document group and analyzing an even more similar document group, the proportion explained by the factors in the factor analysis (the cumulative contribution rate) can be raised, and the concepts or features contained in the document group can be expressed accurately.
[0047] After acquiring the text data of each document i to be analyzed from the document storage unit 33, the text data acquisition unit 101 extracts a predetermined number J of index words j (j = 1, 2, 3, ..., J). The extracted index words are, for example, the J index words of highest importance in the I documents. To extract them, the importance of the index words contained in the documents i is calculated, the index words are sorted in descending order of importance, and the top J are taken.
[0048] The importance of an index word calculated here is preferably its importance in the plurality of documents to be analyzed acquired by the text data acquisition unit, for example GFIDF, or GFIDF corrected on the basis of the degree of co-occurrence between index words.
[0049] GFIDF is, for a given index word, the product of the global frequency (GF: the total number of occurrences of the index word in the document group to be analyzed) and the inverse document frequency (IDF: the reciprocal of the document frequency, or the reciprocal of the logarithm of the document frequency, where the document frequency DF is the number of documents in a predetermined document collection in which the index word appears). An index word that is used frequently in the document group to be analyzed but seldom used in a predetermined document collection different from that group receives a high GFIDF value and is evaluated as an important word representing the features of the document group to be analyzed.
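The GFIDF ranking described above can be sketched in a few lines of Python. This is only an illustration, not the procedure of the cited publication: the function names `gfidf` and `top_terms` are hypothetical, documents are assumed to be pre-tokenized lists of index words, and IDF is assumed to take the common logarithmic form ln(|P| / DF), which is one possible reading of the definition.

```python
import math
from collections import Counter

def gfidf(target_docs, reference_docs):
    """Return {term: GF(term, C) * IDF(term, P)}.

    target_docs    -- document group C, each document a list of index words
    reference_docs -- document collection P, same representation
    IDF is taken as ln(|P| / DF); this log form is an assumption.
    """
    # GF: total occurrences of each term across the whole target group C.
    gf = Counter(term for doc in target_docs for term in doc)
    # DF: number of documents in P that contain the term at least once.
    df = Counter(term for doc in reference_docs for term in set(doc))
    n_ref = len(reference_docs)
    scores = {}
    for term, freq in gf.items():
        # Assumption: a term unseen in P is treated as DF = 1 (maximally rare).
        idf = math.log(n_ref / max(df[term], 1))
        scores[term] = freq * idf
    return scores

def top_terms(scores, j):
    """Pick the J highest-scoring index words (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:j]

C = [["rotor", "coil", "rotor"], ["rotor", "magnet"]]
P = [["rotor"], ["coil", "device"], ["device"], ["device", "method"]]
scores = gfidf(C, P)
print(top_terms(scores, 2))  # ['rotor', 'coil']
```

"rotor" ranks first because it occurs three times in C while appearing in only one of the four documents in P.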
[0050] One such correction of GFIDF based on the degree of co-occurrence between index words is Skey, described next. The matters described below are realized by the procedure described in a patent publication filed by the present applicant (see International Publication No. 2006/048998), so the description is simplified.
To obtain Skey, first take the high-frequency words that appear often in the document group C to be analyzed. Among them, those whose document-level co-occurrence degrees with each index word of the document group,
$$c(w_j, w_k) = \sum_{D \in C} \left[ DF(w_j, D) \times DF(w_k, D) \right]$$
(where $w_j$ and $w_k$ are index words contained in the document group C, $D$ is each document belonging to C, and $DF(w, D)$ is the document frequency of index word $w$ in document $D$), are similar to one another are grouped into a base $g_h$ ($h = 1, 2, \ldots, b$). The document-level co-occurrence degree between a base $g_h$ and an index word $w$ is defined as
$$Co(w, g_h) = \sum_{w' \in g_h,\ w' \neq w} c(w, w')$$
Here $w'$ is a high-frequency word belonging to a given base $g_h$, other than the index word $w$ whose co-occurrence degree $Co(w, g_h)$ is being measured. The co-occurrence degree $Co(w, g_h)$ between index word $w$ and base $g_h$ is the sum of the co-occurrence degrees $c(w, w')$ with $w$ over all such $w'$. Based on this co-occurrence degree $Co(w, g_h)$, the following $key(w)$ is calculated:
$$key(w) = 1 - \prod_{1 \le h \le b} \left[ 1 - \frac{Co(w, g_h)}{F(g_h)} \right]$$
Here $F(g_h) = \sum_{w \in C} Co(w, g_h)$, that is, the sum over all index words $w$ of the co-occurrence degree $Co(w, g_h)$ between index word $w$ and base $g_h$. $key(w)$ is obtained by dividing $Co(w, g_h)$ by $F(g_h)$, subtracting the result from 1, multiplying these values over all bases $g_h$, and subtracting the product from 1.
Skey(w) is calculated by the following formula:
$$Skey(w) = GF(w, C) \times \left[ IDF(w, P) + \ln key(w) \right]$$
Here $GF(w, C)$ is the global frequency of index word $w$ in the document group C to be analyzed, and $IDF(w, P)$ is the inverse document frequency of index word $w$ in a predetermined document collection P different from the document group C.
A word that has a high GFIDF and that co-occurs with the base words of document group C, and thus has a high affinity with the content of C (a high key(w)), receives a high Skey(w) value and is evaluated as an important index word representing the features of the document group C to be analyzed.
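The Skey calculation can be followed step by step in the Python sketch below. It is a minimal illustration under stated assumptions, not the procedure of International Publication No. 2006/048998: the bases are supplied by hand rather than derived by grouping similar high-frequency words, DF(w, D) is assumed to be 1 when w occurs in D and 0 otherwise, documents are modeled as sets of words, and all function names and sample values are hypothetical.

```python
import math

def cooccurrence(w1, w2, docs):
    """c(w1, w2): number of documents in C containing both words.

    DF(w, D) is taken as 1 if w occurs in D and 0 otherwise (assumption).
    """
    return sum(1 for d in docs if w1 in d and w2 in d)

def key_scores(words, bases, docs):
    """key(w) = 1 - prod_h [1 - Co(w, g_h) / F(g_h)] for every index word."""
    # Co(w, g_h): co-occurrence of w with the base's members other than w.
    co = {
        (w, h): sum(cooccurrence(w, m, docs) for m in base if m != w)
        for w in words for h, base in enumerate(bases)
    }
    # F(g_h): Co(w, g_h) summed over all index words.
    f = [sum(co[(w, h)] for w in words) for h in range(len(bases))]
    keys = {}
    for w in words:
        prod = 1.0
        for h in range(len(bases)):
            if f[h] > 0:  # skip a base that nothing co-occurs with
                prod *= 1.0 - co[(w, h)] / f[h]
        keys[w] = 1.0 - prod
    return keys

def skey(w, gf, idf, keys):
    """Skey(w) = GF(w, C) * [IDF(w, P) + ln key(w)]; -inf when key(w) = 0."""
    if keys[w] <= 0.0:
        return float("-inf")
    return gf[w] * (idf[w] + math.log(keys[w]))

docs = [{"rotor", "coil"}, {"rotor", "magnet"},
        {"rotor", "coil", "magnet"}, {"valve"}]
words = ["rotor", "coil", "magnet", "valve"]
bases = [{"rotor"}]                      # hand-picked base (illustrative)
keys = key_scores(words, bases, docs)
gf = {"coil": 2, "valve": 1}             # illustrative GF values
idf = {"coil": 1.0, "valve": 1.0}        # illustrative IDF values
print(keys["coil"], skey("coil", gf, idf, keys))
```

Because ln key(w) is never positive, the co-occurrence term acts as a penalty: a word such as "valve" that never co-occurs with any base has key(w) = 0, and this sketch maps it to negative infinity so that it sorts last.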
As described above, extracting from the plurality of technical documents only the index words that represent the features of the technical document group as a whole, as important index words, and analyzing on that basis makes it possible to grasp the features or concepts of the technical field more clearly. Narrowing down the index words in advance also makes the analysis processing more efficient.
<2-2. Acquisition of document vectors>
Next, after the text data acquisition unit 101 has acquired the text data, the document vector acquisition unit 102 generates, for each of the I documents i, a J-dimensional document vector whose vector elements are the weights $z_{ij}$ of the index words j (S102). As a result, the following data of I rows and J columns is obtained. The matrix of I rows and J columns whose elements are $z_{ij}$ is denoted Z.
[Table 1] (table image not reproduced: the data of I rows and J columns, with one row per document i, one column per index word j, and entries z_ij)
[0052] Here, the weight is a quantity assigned to each index word in each document from a predetermined viewpoint; TFIDF, for example, is preferably used. TFIDF is, for a given index word, the product of the index word frequency (TF: the number of occurrences of the index word in a given document) and the inverse document frequency (IDF: the reciprocal of the document frequency, or the reciprocal of the logarithm of the document frequency, where the document frequency DF is the number of documents in a predetermined document collection in which the index word appears). An index word that is used frequently in the document for which the document vector is calculated but seldom used in the predetermined document collection receives a high TFIDF value.
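As a rough illustration of how the I-row, J-column matrix of TFIDF weights could be assembled, the following Python sketch builds one row per document and one column per index word. The function name `tfidf_matrix` is hypothetical, documents are assumed to be token lists, and IDF is assumed to be ln(|P| / DF), which is only one possible form of the definition.

```python
import math

def tfidf_matrix(docs, index_words, reference_docs):
    """Build Z (I rows x J columns) with z_ij = TF(j, doc i) * IDF(j, P).

    IDF is taken as ln(|P| / DF); that log form is an assumption.
    """
    n_ref = len(reference_docs)
    idf = []
    for word in index_words:
        df = sum(1 for d in reference_docs if word in d)
        idf.append(math.log(n_ref / max(df, 1)))  # DF floor of 1 (assumption)
    # TF is the raw count of the index word in the document.
    return [[doc.count(word) * w_idf for word, w_idf in zip(index_words, idf)]
            for doc in docs]

docs = [["rotor", "coil", "rotor"], ["magnet", "coil"]]
index_words = ["rotor", "coil", "magnet"]
reference = [["rotor"], ["coil"], ["coil", "magnet"], ["valve"]]
Z = tfidf_matrix(docs, index_words, reference)
for row in Z:
    print([round(z, 3) for z in row])
```

Each row of `Z` is the document vector of one document, so the rows can be passed directly to the factor analysis as the responses of one subject.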
[0053] <2-3. Calculation of factor loadings and factor scores>
Next, the factor calculation unit 103 calculates the factor loadings and factor scores of a factor analysis in which each document i is treated as a subject, each index word j as an observed variable, and the document vector of each document as that subject's responses (S103).
Specifically, let K be the number of factors k (k = 1, 2, ..., K), and let $a_{jk}$ be the factor loading of each index word j on each factor k. Let $f_{ik}$ be the factor score of each document i for each factor k. The factor loading matrix A, whose elements are the factor loadings $a_{jk}$, and the factor score matrix F, whose elements are the factor scores $f_{ik}$, are then set up as follows.
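[Table 2] (table image not reproduced: the factor loading matrix A of J rows and K columns, with one row per index word j, one column per factor k, and entries a_jk)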
[Table 3]
              Factor 1   Factor 2   ...   Factor K
Document 1    f_11       f_12       ...   f_1K
Document 2    f_21       f_22       ...   f_2K
Document 3    f_31       f_32       ...   f_3K
...           ...        ...        ...   ...
Document I    f_I1       f_I2       ...   f_IK
[0054] Next, let E be the residual matrix of I rows and J columns, and solve the equation
$$Z = F A^{T} + E$$
(where $A^{T}$ is the transpose of A) as described below to obtain the factor loading matrix A and the factor score matrix F.
[0055] For the factor scores $f_{ik}$, the elements of the factor score matrix F, and the residuals $e_{ij}$, the elements of the residual matrix E, assume that (1) the factor scores are standardized to mean 0 and standard deviation 1, (2) the correlation between factor scores is 0, (3) the correlation between residuals is 0, and (4) the correlation between each factor score and each residual is 0. Under these assumptions it is known that, in general,
$$R = A A^{T} + V$$
holds, where R is the correlation matrix of the observed variables and V is the variance-covariance matrix of the residuals. The factor loadings are therefore obtained from
$$A A^{T} = R - V$$
Next, let $R^{*} = R - V$. To calculate $R^{*}$, the correlation matrix R is calculated from the values of the elements $z_{ij}$ of the matrix Z, and the diagonal elements of the correlation matrix are replaced with estimates of the communalities, which gives an estimate of the $R^{*}$ matrix (communality estimation methods include, for example, the SMC method and the RMAX method). Since $R^{*} = A A^{T}$, the factor loading matrix A is calculated from this $R^{*}$ matrix to obtain the factor loadings (methods for obtaining the factor loadings include, for example, the principal factor method, the least squares method, and the maximum likelihood method).
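The relation between R, A, and V follows in a few lines from the model equation and the four assumptions. The derivation below is a sketch that additionally assumes the columns of $Z$ are standardized, the residuals have mean 0, and correlations are computed with population scaling $1/I$; the symbol $\mathrm{Id}_K$ (the $K \times K$ identity matrix) is notation introduced here for illustration only.

$$
\begin{aligned}
R &= \tfrac{1}{I} Z^{T} Z = \tfrac{1}{I}\,(F A^{T} + E)^{T}(F A^{T} + E) \\
  &= A \left(\tfrac{1}{I} F^{T} F\right) A^{T} + A \left(\tfrac{1}{I} F^{T} E\right) + \left(\tfrac{1}{I} E^{T} F\right) A^{T} + \tfrac{1}{I} E^{T} E \\
  &= A\, \mathrm{Id}_K\, A^{T} + 0 + 0 + V = A A^{T} + V
\end{aligned}
$$

Assumptions (1) and (2) give $\tfrac{1}{I} F^{T} F = \mathrm{Id}_K$, assumption (4) removes the two cross terms, and assumption (3) makes $V = \tfrac{1}{I} E^{T} E$ diagonal. Because V is diagonal, only the diagonal of R has to be adjusted, which is why replacing the diagonal elements with communality estimates is enough to estimate $R - V$.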
[0056] To find more meaningful factors, the factor calculation unit 103 preferably performs an operation called factor rotation. The varimax method (an orthogonal rotation) is preferably used to rotate the factor axes in this embodiment. That is, the relationship between observed variables and factors is obtained by rotating the factor axes, while keeping them orthogonal to the other factors, so as to maximize the variance of the factor loadings. In particular, when the document group to be analyzed consists of documents whose document vectors are highly similar, the varimax method has the advantage of sharpening the contrast between large and small factor loadings and thereby clarifying the features of each factor.
Besides the varimax method, common methods of rotating the factor axes include, among orthogonal rotations, quartimax, equamax, parsimax, orthomax, and orthogonal Procrustes, and, among oblique rotations, promax, oblimin, Harris-Kaiser, and oblique Procrustes. The method of rotating the factor axes in this embodiment is not limited to varimax and may be selected from these rotation methods as appropriate for the implementation.
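A dependency-free way to experiment with varimax rotation is the classical pairwise scheme, in which each pair of factor axes is rotated in its own plane by the angle that maximizes the variance of the squared loadings. The Python sketch below implements that scheme for a J x K loading matrix given as nested lists. The source does not specify an algorithm, so this is an assumed choice; it uses raw loadings (no row normalization), and the function names and sample matrix are hypothetical.

```python
import math

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    n, k = len(loadings), len(loadings[0])
    total = 0.0
    for q in range(k):
        sq = [row[q] ** 2 for row in loadings]
        mean = sum(sq) / n
        total += sum((s - mean) ** 2 for s in sq) / n
    return total

def varimax(loadings, sweeps=50, tol=1e-10):
    """Rotate a J x K loading matrix by pairwise planar rotations."""
    a = [list(row) for row in loadings]  # work on a copy
    n, k = len(a), len(a[0])
    for _ in range(sweeps):
        largest = 0.0
        for p in range(k - 1):
            for q in range(p + 1, k):
                u = [row[p] ** 2 - row[q] ** 2 for row in a]
                v = [2.0 * row[p] * row[q] for row in a]
                su, sv = sum(u), sum(v)
                c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle maximizing the criterion in the (p, q) plane.
                phi = 0.25 * math.atan2(d - 2.0 * su * sv / n,
                                        c - (su * su - sv * sv) / n)
                cos_p, sin_p = math.cos(phi), math.sin(phi)
                for row in a:
                    x, y = row[p], row[q]
                    row[p] = x * cos_p + y * sin_p
                    row[q] = -x * sin_p + y * cos_p
                largest = max(largest, abs(phi))
        if largest < tol:  # no axis pair moved: converged
            break
    return a

A = [[0.6, 0.6], [0.7, 0.5], [0.5, -0.6], [0.6, -0.5]]
B = varimax(A)
print(round(varimax_criterion(A), 4), round(varimax_criterion(B), 4))
```

Because every step is an orthogonal rotation, the sum of squared loadings in each row is unchanged; only the way the loadings are distributed across factors changes, which is the contrast-sharpening effect described above.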
[0057] The factor score matrix F is calculated, for example, by
$$F = Z R^{-1} A$$
(where Z here denotes the standardized data).
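The factor score formula can be evaluated without forming the inverse of R explicitly, by solving the linear system R X = A and then multiplying Z by X. The Python sketch below does this with plain lists. It assumes that Z is already column-standardized and that R is computed with population scaling (division by I); the helper names `solve`, `correlation`, and `factor_scores` are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * X = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(map(float, matrix[r])) + list(map(float, rhs[r]))
           for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def correlation(z):
    """R = Z^T Z / I for column-standardized Z (population scaling)."""
    i, j = len(z), len(z[0])
    return [[sum(z[r][a] * z[r][b] for r in range(i)) / i for b in range(j)]
            for a in range(j)]

def factor_scores(z, loadings):
    """F = Z R^{-1} A, computed as Z times the solution X of R X = A."""
    x = solve(correlation(z), loadings)
    return [[sum(zr[j] * x[j][k] for j in range(len(x)))
             for k in range(len(x[0]))] for zr in z]

# Two uncorrelated, standardized observed variables and one factor.
Z = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
A = [[0.8], [0.6]]
F = factor_scores(Z, A)
print([round(row[0], 3) for row in F])  # [1.4, -0.2, 0.2, -1.4]
```

In this toy case R is the identity matrix, so the scores reduce to Z A; with correlated index words, the solve step downweights variables that carry redundant information.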
[0058] <2-4. Determination of attribution factors>
Once the factor calculation unit 103 has obtained the factor loading matrix A and the factor score matrix F, the attribution factor determination unit 104 determines the attribution factor of each index word j based on the factor loadings $a_{jk}$, and the attribution factor of each document i based on the factor scores $f_{ik}$ (S104).
Index words and a plurality of technical documents can thereby be attributed to each factor. From the index words, the features or concepts of the technical field can be grasped through linguistic information that the analyst can understand. From the technical documents, the features or concepts of the technical field can be grasped from the particular tendencies of the information each document contains. Although the present invention uses the technique of factor analysis, it does not require the subjective or arbitrary judgment of the analyst that the "interpretation of factors" in ordinary factor analysis involves. This is because the index words themselves are used as the observed variables. Since the observed variables themselves express content, once the observed variables belonging to a factor are determined from the factor loadings, the content of each factor can be stated directly in terms of those observed variables.
[0059] For example, if among the factor loadings $a_{j1}, a_{j2}, \ldots, a_{jK}$ of an index word j on the factors, the loading $a_{jk}$ on a factor k is the largest, that factor k is taken as the attribution factor of index word j. Similarly, if among the factor scores $f_{i1}, f_{i2}, \ldots, f_{iK}$ of a document i for the factors, the score $f_{ik}$ for a factor k is the largest, that factor k is taken as the attribution factor of document i. Determining the attribution factors in this way attributes each index word or technical document to the factor to which it is most strongly related, so that each factor is explained as well as possible. As a result, the features or concepts of the technical field can be grasped more clearly. In this case, each index word can belong to only one factor, and each document can belong to only one factor. By contrast, more than one index word may belong to a single factor, and more than one document may belong to a single factor.
[0060] Furthermore, it is preferable to set a lower limit on the factor loading: if the maximum factor loading $a_{jk}$ of an index word j is below the lower limit, index word j is treated as belonging to no factor, so that index words with low factor loadings are excluded from the index word group that indicates the content of a factor. Similarly, it is preferable to set a lower limit on the factor score: if the maximum factor score $f_{ik}$ of a document i is below the lower limit, document i is treated as belonging to no factor, so that only documents strongly related to a factor are attributed to it.
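The attribution rule, including the optional lower limit, amounts to a per-row argmax with a threshold. The Python sketch below applies the same function to factor loadings (rows are index words) or factor scores (rows are documents). The function names, the sample values, and the threshold 0.3 are hypothetical, and factors are numbered from 0 in the code.

```python
def attribute(values, lower_limit=None):
    """Map each row to the index of its largest entry, or None if below limit.

    values      -- rows of factor loadings a_jk (per index word) or
                   factor scores f_ik (per document)
    lower_limit -- optional threshold; its value is a hypothetical tuning knob
    """
    result = []
    for row in values:
        k = max(range(len(row)), key=lambda idx: row[idx])
        if lower_limit is not None and row[k] < lower_limit:
            result.append(None)  # belongs to no factor
        else:
            result.append(k)
    return result

def group_by_factor(labels, assignment):
    """Collect the labels attributed to each factor, skipping unassigned ones."""
    groups = {}
    for label, k in zip(labels, assignment):
        if k is not None:
            groups.setdefault(k, []).append(label)
    return groups

loadings = [[0.8, 0.1], [0.2, 0.7], [0.15, 0.1]]
words = ["rotor", "coil", "valve"]
assignment = attribute(loadings, lower_limit=0.3)
print(group_by_factor(words, assignment))  # {0: ['rotor'], 1: ['coil']}
```

Each label ends up in at most one group, while a group may hold any number of labels, matching the one-to-many relationship between factors and index words or documents.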
[0061] Once the attribution factor of each index word has been determined as above, the index word or index word group belonging to each factor can be regarded as indicating the content of that factor, that is, a concept or feature contained in the I documents analyzed. And since the document or document group belonging to each factor is closely related to that factor, it can be regarded as indicating in which documents the concept or feature extracted as that factor appears most strongly.
[0062] <2-5. Output>
The document group analysis device then outputs, through the output device 4 and for each factor, the index word or index word group belonging to that factor together with data on the document or document group belonging to the same factor (S105). What is output as the data on the document or document group belonging to each factor is arbitrary; examples include outputting data that identifies the document or document group (such as the publication number, for a patent publication), or calculating and outputting a factor evaluation value for the document or document group.
Calculating factor evaluation values in this way allows the factors to be compared with one another by their factor evaluation values. As a result, the relative positions of the technical elements represented by the technical documents attributed to the factors can be grasped, and the technical elements can further be classified into important ones and others. As a preferred example of calculating a factor evaluation value, calculating and outputting the factor progress information score of the document or document group is described next.
[0063] <3. Calculation processing of the factor progress information score>
FIG. 2(B) is a flowchart explaining the processing procedure for calculating a factor progress information score as the data on a document or document group. The processing of this flowchart is executed for each of (1) the document or document group belonging to each factor for which a factor progress information score is to be calculated, or (2) when factor progress information scores are calculated per factor and per applicant as described later, the document or document group of each factor and each applicant (in both cases (1) and (2), referred to below as the "score calculation target document group").
[0064] <3-1. Determination of the number of documents>
First, the document number determination unit 201 determines the number of documents N in the score calculation target document group (S201). When both a publication of the unexamined application and a publication of the granted patent have been issued for a patent application, the number of documents for that application is preferably counted as two.
[0065] <3-2. Reading of progress information>
Next, the progress information reading unit 202 reads progress information, as a document attribute of each document in the score calculation target document group, from the document storage unit 33 of the recording device 3 or from another database (S202). This database records the progress information of the patent application to which each document relates. Examples of the progress information read for each patent application include:
"number of citations by other companies" (0 or a positive integer),
"number of patent oppositions or patent invalidation trial requests received" (0 or a positive integer),
"whether examination has been requested" (1 or 0),
"whether the patent right has been registered" (1 or 0),
"whether accelerated examination has been requested" (1 or 0),
"whether an appeal against a decision of refusal has been filed" (1 or 0),
although other information may also be used.
[0066] <3-3. Calculation of indices based on progress information>
Next, the index calculation unit 203 calculates, as evaluation values for the score calculation target document group, a plurality of indices based on the progress information recorded in the above database (S203).
Examples of these indices include the "total number of citations by other companies", the "total number of patent oppositions or patent invalidation trial requests received", the "examination request rate", and the "grant rate", as well as the "patent registration rate", the "accelerated examination request rate", the "other-company citation ratio", the "appeal ratio", and the "opposition or invalidation trial request ratio"; other indices may also be used. Each is defined as follows.
"total number of citations by other companies" = sum of the "number of citations by other companies" over the score calculation target document group
"total number of patent oppositions or patent invalidation trial requests received" = sum of the "number of patent oppositions or patent invalidation trial requests received" over the score calculation target document group
"examination request rate" = number of examination requests / number of patent applications
"patent registration rate" = number of patent registrations / number of patent applications
"grant rate" = number of patent registrations / number of examination requests
"accelerated examination request rate" = number of accelerated examination requests / number of examination requests
"other-company citation ratio" = "total number of citations by other companies" / number of patent applications
"appeal ratio" = number of appeals against decisions of refusal / number of examination requests
"opposition or invalidation trial request ratio" = "total number of patent oppositions or patent invalidation trial requests received" / number of patent registrations
In these definitions, the counts are obtained as follows:
the number of examination requests is the sum of "whether examination has been requested" (1 or 0) over the score calculation target document group;
the number of patent registrations is the sum of "whether the patent right has been registered" (1 or 0) over the score calculation target document group;
the number of accelerated examination requests is the sum of "whether accelerated examination has been requested" (1 or 0) over the score calculation target document group;
the number of appeals against decisions of refusal is the sum of "whether an appeal against a decision of refusal has been filed" (1 or 0) over the score calculation target document group.
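The index definitions reduce to a handful of sums and ratios over per-application records, as the Python sketch below illustrates. The record keys, the function name `progress_indices`, and the choice to return 0.0 when a denominator is zero are all assumptions made for the example; each record here stands for one patent application, which is separate from the document count N.

```python
def progress_indices(applications):
    """Aggregate per-application progress records into group-level indices.

    Each record is a dict with the hypothetical keys used below; counts are
    integers and the flag fields are 1 or 0.
    """
    def total(key):
        return sum(app[key] for app in applications)

    def ratio(num, den):
        return num / den if den else 0.0  # assumption: empty base gives 0

    n_apps = len(applications)
    citations = total("citations_by_others")
    challenges = total("oppositions_or_invalidations")
    exam = total("exam_requested")
    registered = total("registered")
    return {
        "citations_total": citations,
        "challenges_total": challenges,
        "exam_request_rate": ratio(exam, n_apps),
        "registration_rate": ratio(registered, n_apps),
        "grant_rate": ratio(registered, exam),
        "accelerated_exam_rate": ratio(total("accelerated_exam"), exam),
        "citation_ratio": ratio(citations, n_apps),
        "appeal_ratio": ratio(total("appeal"), exam),
        "challenge_ratio": ratio(challenges, registered),
    }

def record(cit, opp, exam, reg, acc, appeal):
    """Build one illustrative application record."""
    return {"citations_by_others": cit, "oppositions_or_invalidations": opp,
            "exam_requested": exam, "registered": reg,
            "accelerated_exam": acc, "appeal": appeal}

apps = [record(3, 1, 1, 1, 0, 0), record(0, 0, 1, 0, 1, 1),
        record(1, 0, 0, 0, 0, 0), record(0, 0, 1, 1, 0, 0)]
indices = progress_indices(apps)
print(indices["exam_request_rate"], indices["registration_rate"])  # 0.75 0.5
```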
で与えればよい。 [0067] < 3— 4.特許インパクト指数の算出〉 Give it. [0067] <3— 4. Calculation of Patent Impact Index>
Next, the patent impact calculation unit 204 calculates the patent impact index by applying a predetermined weighting to the "number of documents" of the score calculation target document group (S204). The patent impact index is intended to evaluate, for the score calculation target document group, its power to restrain other companies (the degree to which it suppresses other companies' acquisition of rights and raises the value of the company's own patents). For example, the "number of documents" can be weighted on the basis of the "total number of citations by other companies" and/or the "total number of patent oppositions or patent invalidation trial requests", so that the index is calculated as
Patent impact index = "number of documents" + "total number of citations by other companies" + "total number of patent oppositions or patent invalidation trial requests"
The weighting of the "number of documents" may be performed by addition, as in the above formula, or by multiplication by some other ratio.
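The additive form of the formula above can be sketched as follows. This is a minimal illustration and not the implementation of the patent impact calculation unit 204: the function name, the parameter names, and the sample counts are all hypothetical. The multiplicative variant mentioned in the text is not shown because the source does not specify the ratio to be used.

```python
def patent_impact_index(num_documents, other_company_citations=0,
                        oppositions_or_invalidations=0):
    """Weight the document count by simple addition, as in the formula above.

    All names are hypothetical; the source only gives the formula itself.
    """
    return num_documents + other_company_citations + oppositions_or_invalidations


# Invented counts for one score calculation target document group.
index = patent_impact_index(num_documents=40,
                            other_company_citations=25,
                            oppositions_or_invalidations=3)
print(index)
```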
Besides these, the predetermined weighting may be based on various other measures, such as patent profitability, patent productivity, patent utilization, and patent competitiveness, and is not limited to these.
By determining the number of technical documents as described above, the share of each factor within the population made up of the technical document group can be grasped. In addition, by applying a predetermined weighting to the number of documents, factors can be evaluated in a way that takes economic aspects and technical competitiveness into account.
[0068] <3-5. Calculation of Progress Information Index>
Next, the progress information calculation unit 205 calculates the progress information index by taking the root mean square of the indicators based on the progress information of the score calculation target document group (S205). The progress information index is intended to evaluate, for the score calculation target document group, the value of the patents from the standpoints of the company itself, the patent office, and competitors. It can be calculated, for example, as follows.
$$\text{Progress information index} = \sqrt{\frac{1}{d}\sum_{i=1}^{d} \left(\text{indicator}_i\right)^2}$$
That is, the progress information index can be calculated by summing the squares of d indicators based on the progress information, dividing the sum by the number of indicators d, and taking the positive square root of the result. For example, d = 7 when the seven indicators defined above are used: "examination request rate", "registration decision rate", "patent registration rate", "accelerated examination request rate", "other-company citation ratio", "appeal-against-refusal ratio", and "opposition or invalidation trial ratio". These seven are shown here as examples of progress information indicators; others, such as a "self-citation ratio", "domestic priority claim rate", "foreign priority claim rate", or "file wrapper inspection rate", may also be used. Calculating the factor progress information score from such indicators enables a factor evaluation that takes into account the influence of patents that can obstruct other companies' patent acquisition and technology development. It also enables a factor evaluation that reflects the applicant's willingness to obtain rights and the examiner's evaluation. This ensures that the factor evaluation is fair and appropriate, and as a result the relative positions and importance of the technical elements can be grasped fairly and appropriately.
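The root-mean-square calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical and the seven indicator values are invented, listed in the order in which the text names them.

```python
import math


def progress_information_index(indicators):
    """Root mean square of the d progress-information indicators."""
    d = len(indicators)
    if d == 0:
        raise ValueError("at least one indicator is required")
    return math.sqrt(sum(x * x for x in indicators) / d)


# Invented values: examination request rate, registration decision rate,
# patent registration rate, accelerated examination request rate,
# other-company citation ratio, appeal-against-refusal ratio,
# opposition or invalidation trial ratio.
example = [0.60, 0.50, 0.30, 0.05, 0.80, 0.02, 0.01]
print(round(progress_information_index(example), 3))
```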
[0069] As a method of calculating the progress information index from the d indicators, simply summing the indicators is also possible (simple sum method). Because a patent group with high values on many indicators is rated highly, using the sum of the indicators as the progress information index appears reasonable at first glance. However, the more the number of indicators d is increased, the lower the relative weight of the indicators that should be emphasized, such as the "examination request rate", "registration decision rate", and "patent registration rate", so the resulting index may diverge from the purpose of the evaluation.
One way to address this problem is to use the maximum of the indicators as the progress information index, or to calculate the index from only the top few indicators, for example the top three (maximum value method). However, if only the maximum or the top indicators are adopted, the remaining indicators are not considered at all, which may lead to a one-sided evaluation.
The method described above, which divides the sum of squares by the number of indicators d and takes the square root, can be said to combine the advantages of the simple sum method and the maximum value method. Because the indicators are squared before being summed, a high value among the d indicators for a document group strongly influences the progress information index. Accordingly, a document group for which indicators that tend to take high values, such as the "examination request rate", "registration decision rate", and "patent registration rate", are especially high (and which consequently has many granted patents) can be given a prominently high evaluation score. At the same time, the indicators other than the high-valued ones are still reflected in the index to some degree.
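The difference between the three aggregation methods can be illustrated with a small comparison. This is a sketch with invented indicator values for three hypothetical document groups; the function names are not from the source.

```python
import math


def simple_sum(indicators):
    """Simple sum method: add up all indicators."""
    return sum(indicators)


def maximum(indicators):
    """Maximum value method: keep only the largest indicator."""
    return max(indicators)


def root_mean_square(indicators):
    """Method of this embodiment: square, average, take the square root."""
    return math.sqrt(sum(x * x for x in indicators) / len(indicators))


# Invented indicator sets for three hypothetical document groups.
groups = {
    "A (one high value)": [0.9, 0.1, 0.1],
    "B (evenly moderate)": [0.4, 0.4, 0.3],
    "C (high plus moderate)": [0.9, 0.5, 0.5],
}

for name, values in groups.items():
    print(name,
          round(simple_sum(values), 3),
          round(maximum(values), 3),
          round(root_mean_square(values), 3))
```

In this sketch, A and B have about the same simple sum, yet the root mean square ranks A higher because of its single high indicator. A and C share the same maximum, yet the root mean square ranks C higher because its remaining indicators are larger.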
Thus, in this embodiment, the progress information index is calculated with all d indicators taken into account. As a result, the value of patents can be evaluated from multiple perspectives.
[0070] <3-6. Calculation of Factor Progress Information Score>
Next, the score calculation unit 206 multiplies the patent impact index by the progress information index to calculate a factor evaluation value for the score calculation target document group (S206). Indexing the progress information in this way makes it possible, for example, to evaluate factors quantitatively and objectively.
This evaluation value is referred to as the "factor progress information score". The calculated score is output in S105. The factor progress information score has the following properties.
When the "examination request rate" is 0, the progress information index is 0 in most cases, and as a result the factor progress information score is also 0.
The progress information index increases as the number of registered patents increases. Appeals against decisions of refusal, oppositions filed against the patents, and the like are also taken into account where they exist.
Because the patent impact index counts the number of publications, it increases as the number of patent applications increases, and increases further when patent gazettes are issued. It is additionally weighted by the "total number of citations by other companies" and the "total number of patent oppositions or patent invalidation trial requests".
With this factor progress information score, a patent document group can be evaluated from the aspect of its progress information, so that the strength of the patents, which cannot be measured by the number of patents alone, can be perceived.
By calculating the factor progress information score as described above, factors can be evaluated on the basis of patent publications. Consequently, by using the examination progress information contained in the patent information related to those publications, highly accurate factor evaluation becomes possible. The method of calculating the factor progress information score is not limited to this; for example, an individual evaluation value may be calculated for each patent publication belonging to each factor, and these values may be totaled to obtain the factor progress information score for each factor.
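The product described in this section can be sketched as follows, combining the additive patent impact index with the root-mean-square progress information index. The function names and all input values are hypothetical; this is an illustration under those assumptions, not the implementation of the score calculation unit 206.

```python
import math


def patent_impact_index(num_documents, citations, oppositions):
    """Additive weighting of the document count (hypothetical helper)."""
    return num_documents + citations + oppositions


def progress_information_index(indicators):
    """Root mean square of the indicators (hypothetical helper)."""
    return math.sqrt(sum(x * x for x in indicators) / len(indicators))


def factor_progress_information_score(num_documents, citations, oppositions,
                                      indicators):
    """Patent impact index multiplied by progress information index."""
    return (patent_impact_index(num_documents, citations, oppositions)
            * progress_information_index(indicators))


# Invented case 1: no examination requests and no citations, so every
# indicator is 0 and the score collapses to 0 regardless of document count.
print(factor_progress_information_score(10, 0, 0, [0.0] * 7))

# Invented case 2: every indicator equals 1, so the score equals the
# patent impact index itself.
print(factor_progress_information_score(10, 5, 1, [1.0] * 7))
```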
<4. Example>
Next, the results of analyzing a specific document group are presented.
The Japanese patent application publications and patent gazettes of a certain automobile manufacturer (referred to as Company A) were cluster-analyzed on the basis of similarity to obtain a plurality of clusters. After these clusters were examined, it was decided to analyze in detail the relationship with surrounding patents of other companies, in particular for the cluster relating to "exhaust control of internal combustion engines". To that end, from a group of about 4 million publications, including patent application publications and patent gazettes of Company A and of other companies, 2869 publications with high similarity to the publications belonging to that cluster were extracted. The analysis results presented below are an example in which these 2869 publications were cluster-analyzed on the basis of the similarity of the document vectors of the publications, and 569 publications belonging to a specific cluster among them were taken as the analysis target document group.
[0072] <4-1. Factor Analysis Processing>
In this example, of the index terms contained in the 569 analysis target documents, the 77 terms of highest importance were used as observed variables, and 16 factors were extracted by factor analysis. The number of factors to be extracted is not limited to this and may be changed as appropriate for the embodiment. Methods for determining the number of factors include, for example, methods based on eigenvalues, factor contributions, interpretability, and the Kaiser-Guttman criterion.
Examples of the factor loadings of the index terms and of the factor scores of the publications, as calculated by the factor calculation unit 103, are shown in the tables of FIG. 4 and FIG. 5, respectively (only some of the index terms and publications are shown).
[0073] The table in FIG. 4 shows an example of the factor loadings of each index term calculated for each of factors 1 to 16. In the table, the rows are index terms and the columns are factor numbers. For convenience of explanation, FIG. 4 shows only an excerpt of the index terms rather than all of them.
In this example, for each index term, the factor for which the factor loading in FIG. 4 is largest (the corresponding cell is shaded) is taken as the attributed factor of that index term. Specifically, the index term "蔵" (store) in the top row of FIG. 4 has a factor loading of 0.96 on factor 1. Because this loading is larger than its loading on any of factors 2 to 16, the index term "蔵" is attributed to factor 1. The index terms "吸" (absorb), "NOx", "還元" (reduction), "放出" (release), and "スパイク" (spike) are attributed to factor 1 in the same way. However, index terms used as observed variables that did not have a factor loading at or above a predetermined threshold were not attributed to any factor.
[0074] The table in FIG. 5 shows an example of the factor scores of each publication calculated for each of factors 1 to 16. In the table, the rows are publication numbers and the columns are factor numbers. For convenience of explanation, FIG. 5 shows only an excerpt of the publications attributed to each factor, and the publication numbers are partly masked.
In this example, as with the index terms, the factor for which the factor score in FIG. 5 is largest (the corresponding cell is shaded) is taken as the attributed factor of each publication. Specifically, the publication "XX-XXX601" in the top row of FIG. 5 has a factor score of 5.41 on factor 1. Because this score is larger than its score on any of factors 2 to 16, the publication "XX-XXX601" is attributed to factor 1. The publications "XX-XXX097" and "XX-XXX189" are attributed to factor 1 in the same way. However, publications in the analysis target document group whose largest factor score was at or below a predetermined threshold (here, 1) were not attributed to any factor.
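The attribution rule described above (take the factor with the largest value, and leave the item unattributed when that value does not clear a threshold) can be sketched as follows. The function name and data layout are hypothetical, and apart from the score 5.41 of publication "XX-XXX601" on factor 1, all values are invented. The sketch follows the publication rule, in which a maximum at or below the threshold means no attribution; for index terms the text instead requires a loading at or above the threshold, so the comparison would be adjusted accordingly.

```python
def attributed_factor(values_by_factor, threshold):
    """Return the factor number with the largest value, or None when that
    largest value does not exceed the threshold."""
    best = max(values_by_factor, key=values_by_factor.get)
    return best if values_by_factor[best] > threshold else None


# Factor scores per publication; only the 5.41 entry comes from the text.
factor_scores = {
    "XX-XXX601": {1: 5.41, 2: 0.12, 3: -0.30},
    "hypothetical-publication": {1: 0.40, 2: 0.90, 3: 0.20},
}

for publication, scores in factor_scores.items():
    print(publication, attributed_factor(scores, threshold=1.0))
```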
[0075] The table in FIG. 6 shows the index terms and the publications attributed to each factor. For convenience of explanation, only an excerpt of the publications attributed to each factor is shown rather than all of them.
In this way, even for the 569 analysis target documents, which were highly similar to one another to begin with, the concepts and characteristics contained in the document group can be readily understood from the index terms attributed to each factor. When a document group is classified by clustering or the like, it can be difficult to grasp what differs between the classes. According to this embodiment, mutually independent concepts, namely factors, are extracted, and the most strongly related index terms and publications are attributed to each factor, so that the characteristics and concepts of the technical field can be grasped from the technical elements indicated by the index terms and from the technical information contained in the publications.
In addition, because specific publications can be consulted for each factor as needed, more advanced analysis becomes possible, such as actually reading the publications attributed to a factor of interest, or calculating the factor progress information score described later from the data of the publications attributed to each factor.
[0076] Unlike English and other writing systems that put spaces between words, written Japanese does not mark word boundaries explicitly. For this reason, when text mining is applied to Japanese text, keywords are generally extracted in advance by running a morphological analysis program on a computer. However, current morphological analysis programs do not cope adequately with the wide variety of expressions in natural Japanese text, so words that are meaningful only as a whole are sometimes split unnaturally, which can hinder the analysis.
In this example, however, split words were observed to gather again in the same factor. For example, the index terms "リー" and "ン" were both attributed to factor 4; these presumably came from the single term "リーン" ("lean", which in the field of this analysis target document group means that the fuel-to-air mixture is lean). Split terms gather again in the same factor presumably because these index terms have highly similar occurrence patterns, so their factor loadings on each factor are also similar. Likewise, "空" (air), "燃" (fuel), "比" (ratio), and "理論" (theoretical) in factor 2 presumably came from the single term "理論空燃比" (theoretical air-fuel ratio). Even in a language such as English that puts spaces between words, "theoretical air-fuel ratio" may be split into "air", "fuel", "ratio", and "theoretical" for analysis, and even then the split terms can be expected to gather again in the same factor when the present invention is applied.
[0077] <4-2. Calculation of Factor Progress Information Score>
Next, an example is shown in which a factor progress information score is calculated for each factor from the progress information of the publications attributed to that factor.
The tables in FIG. 7 and FIG. 8 show an example of the results of calculating, for each factor, the indicators needed for the factor progress information score from the progress information data of the publications attributed to that factor, and of calculating the patent impact index and the progress information index from those indicators.
FIG. 7 shows an example of the examination request rate, patent registration rate, registration decision rate, accelerated examination request rate, other-company citation ratio, appeal-against-refusal ratio, opposition ratio, and progress information index calculated for each of factors 1 to 16. FIG. 8 shows an example of the number of publications, number of citations by other companies, number of oppositions, and patent impact index for each of factors 1 to 16.
[0078] The table in FIG. 9 shows the factor progress information score calculated by the score calculation unit 206 from the patent impact index and the progress information index, together with the factor progress information score per publication (average), which the score calculation unit 206 calculates by dividing the factor progress information score by the number of documents. For reference, the eigenvalue of each factor from the factor analysis is also shown. The factor numbers in the leftmost column are common to FIG. 4 through FIG. 12. In FIGS. 4 to 8, 11, and 12 the factors are arranged in order of factor number, whereas in FIG. 9 they are arranged in descending order of factor progress information score.
As FIG. 9 shows, outputting the factor progress information score of each factor makes it clear which factors in the analysis target document group are of high importance. The eigenvalue of each factor indicates how much of the data that factor explains, but it can be seen to be unrelated to the importance of the documents contained in the factor as viewed through the factor progress information score.
FIG. 10 shows an example of the output produced by the output device 4 from the factor progress information scores calculated by the score calculation unit 206 in this example. In FIG. 10, each circle represents a factor. The position of each circle on the vertical axis indicates the factor progress information score of the factor, and its position on the horizontal axis indicates the factor progress information score per publication attributed to the factor (average). The size of each circle indicates the number of publications attributed to the factor, and the technical terms next to each circle are the index terms attributed to the factor.
From this it can be inferred, for example, that factor 13 (排気 (exhaust), NOX) has few publications but consists of a group of publications that is extremely important from the viewpoint of progress information. Focusing on other factors with high factor progress information scores, such as factor 4 (リー, ン) and factor 5 (下流 (downstream), 上流 (upstream), 触媒 (catalyst), 劣化 (deterioration), 診断 (diagnosis)), directs attention to the important technical elements in the document group under investigation.
Specifically, in FIG. 10, factor 5 (downstream, upstream, catalyst, deterioration, diagnosis) is presumed, from the combination of the index terms catalyst, deterioration, and diagnosis, to be a technical element factor representing a so-called advanced on-board diagnostic system, which automatically detects performance deterioration of exhaust gas reduction devices such as catalyst systems and notifies the driver. Under recent exhaust gas regulations, various councils of the national government and of local public bodies have called for the introduction of advanced on-board diagnostic systems that monitor exhaust gas reduction devices, such as catalysts for reducing NOx, for malfunctions, detect them, and notify the driver. The importance of this technical element can be considered to appear in factor 5, accompanied by meaningful index terms.
[0080] In the example above, the factor progress information score was calculated for each factor, with the publications attributed to each factor taken as the "score calculation target document group". The calculation is not limited to this: the publications attributed to each factor may be further divided by applicant, and the publications in each factor-and-applicant division may be taken as the "score calculation target document group".
The table in FIG. 11 shows the results of calculating the factor progress information score for each factor and each applicant.
The method of calculating the factor progress information score for each factor and each applicant is the same as for the (per-factor) factor progress information score described above, except that the score calculation target document group is changed. Because the documents are divided by both factor and applicant, the number of documents in each score calculation target document group will often be smaller. However, indicators such as the examination request rate take values from 0 to 1 whether or not the documents are divided by applicant, so they do not necessarily become smaller. Consequently, summing the per-applicant factor progress information scores for a factor over all applicants does not give the factor progress information score of that factor (the one that is not per applicant).
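The non-additivity noted above can be checked with a small worked example. This is a simplified sketch under stated assumptions: only three of the indicators are used, the patent impact index is reduced to the bare document count (no citations or oppositions), the number of documents is taken to equal the number of applications, and the function name and all counts are invented.

```python
import math


def score_from_counts(applications, exam_requests, registrations):
    """Simplified factor progress information score from raw counts."""
    indicators = [
        exam_requests / applications,   # examination request rate
        registrations / applications,   # patent registration rate
        # registration decision rate; defined as 0 when nothing was requested
        registrations / exam_requests if exam_requests else 0.0,
    ]
    progress = math.sqrt(sum(x * x for x in indicators) / len(indicators))
    impact = applications  # document count without further weighting
    return impact * progress


# Two hypothetical applicants within the same factor.
applicant_x = score_from_counts(applications=10, exam_requests=10, registrations=10)
applicant_y = score_from_counts(applications=10, exam_requests=0, registrations=0)

# The same 20 documents scored as one group.
whole_factor = score_from_counts(applications=20, exam_requests=10, registrations=10)

print(applicant_x + applicant_y, round(whole_factor, 2))
```

In this sketch the ratios are recomputed on the merged counts rather than added, which is why the sum of the two per-applicant scores differs from the score of the factor as a whole.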
[0081] FIG. 12 shows an example from this example in which the publications attributed to each factor are further classified by applicant and the factor progress information score is plotted for each factor and each applicant. Specifically, the score calculation unit 206 further classifies the publications attributed to each factor by applicant and calculates the factor progress information score for each factor and each applicant. The output device 4 then generates a chart based on the calculated scores and outputs it.
In FIG. 12, the factors are listed along the left-right direction, the applicants along the depth direction, and the height indicates the factor progress information score for each factor and each applicant. FIG. 12 shows only the results calculated for applicants ranked in the top 10 by number of applications among all publications whose factor score exceeds a certain level.
Specifically, for factor 4 and factor 5, which FIG. 9 and FIG. 10 showed to have high factor progress information scores, Company A can be seen to be strong (Company A is the automobile manufacturer, that is, the company itself, from whose publications the analysis target document group was derived). It is also possible to grasp the strengths and weaknesses of Company A relative to other companies, and of the companies relative to one another: for example, Company A is in a strong position for factors 4 and 5 but lags behind other companies for factor 13.
In this way, the ranking and share of the applicants of the patent publications attributed to each factor can be grasped for each factor. As a result, the characteristics of a given technical field can be grasped from the viewpoint of the competitive situation among the companies engaged in development.
The present invention is not limited to the embodiment described above, and various modifications are possible within the scope of the gist of the invention.
In the above embodiment, the technical documents to be analyzed are patent publications, but the invention is not limited to this. The technical documents to be analyzed may be technical papers. In that case, the factor progress information score may be obtained using, for example, the number of technical papers belonging to each factor and their citation counts.
In the above embodiment, the predetermined weighting applied to the "number of documents" of the score calculation target document group uses at least one, or all, of the "total number of citations by other companies", the "total number of patent oppositions received", and the "total number of patent invalidation trial requests received", but the invention is not limited to this. For example, the weighting may instead be obtained using patent profitability, patent productivity, degree of patent utilization, patent competitiveness, or the like.
In the above embodiment, the functional units of the processing device 1 (text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, document count determination unit 201, progress information reading unit 202, index calculation unit 203, patent impact calculation unit 204, progress information calculation unit 205, and score calculation unit 206) are implemented in software, but the invention is not limited to this. Each functional unit of the processing device 1 may instead be implemented by a circuit designed specifically to perform that function, such as an ASIC (Application Specific Integrated Circuit).
In the above embodiment, the processing device 1 acquires the patent publications to be analyzed from the storage device 3, but the invention is not limited to this. For example, the processing device 1 may communicate with an external information providing server over a network such as the Internet and acquire the patent publications from that server.
In the above embodiment, either the "total number of patent oppositions received" or the "total number of patent invalidation trial requests received" is used as the indicator for calculating the patent impact index, but this is merely an example. Both indicators may be used together. Similarly, in the above embodiment, either the "ratio of patent oppositions received" or the "ratio of patent invalidation trial requests received" is used as the indicator for calculating the progress information index, but this too is merely an example, and both ratios may be used together.
[0083] <5. Modification>
Next, a modification of the above-described embodiment of the present invention is described.
Among the processes performed by the above embodiment, this modification calculates a factor evaluation value different from the factor progress information score described above (hereinafter, the factor evaluation value calculated in the modification is called the "technical element score"). Here, a technical element refers to each extracted factor, named in view of the content of that factor as represented by the technical documents and index terms it contains. The following description takes as an example the case where patent documents such as patent publications are used as the documents to be analyzed.
[0084] The modification of this embodiment is described in detail below with reference to FIGS. 13 to 20. In this description, components identical to those of the above embodiment are given the same reference numerals. The description focuses on the parts that differ from the above embodiment, and descriptions of identical components are omitted.
[0085] <5-1. Configuration of the Modification>
First, FIG. 13 shows the configuration of the modification of this embodiment. FIG. 13 is a functional block diagram of the document group analysis device according to the modification.
[0086] As illustrated, the document group analysis device includes an input device 2, a recording device 3, an output device 4, and a processing device 100.
The input device 2, the recording device 3, and the output device 4 are the same as those in the above embodiment.
In response to a request from the input device 2, the processing device 100 takes the patent documents stored in the recording device 3 and classifies them by factor according to the procedure of FIG. 2(A) described above. The processing device 100 also calculates a technical element score for the factor designated by the user via the input device 2.
[0087] Specifically, the processing device 100 includes a text data acquisition unit 101, a document vector acquisition unit 102, a factor calculation unit 103, an attribution factor determination unit 104, a progress information reading unit 202, a score calculation unit 2060, and a patent score calculation unit 2070. The text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, and progress information reading unit 202 have the same functions as in the above embodiment, so their description is omitted here.
[0088] Once the attribution factor determination unit 104 has classified the patent documents by factor and index terms have been associated with each factor, the score calculation unit 2060 accepts a technical element score calculation request from the user via the input device 2.
On accepting the request, the score calculation unit 2060 calculates the technical element score using the "patent score (PS)", an evaluation value for each patent document belonging to the factor being calculated. The patent score (PS) is assumed to have been calculated in advance by the patent score calculation unit 2070 described below.
[0089] For each patent document, the patent score calculation unit 2070 calculates a "patent score (PS)" that evaluates the document, using its progress information (such as whether priority is claimed and how many times it has been cited in the examination of other patent applications) and its content information (such as the number of claims and the number of pages of the specification). Then, for each piece of information identifying a patent document (publication number), the patent score calculation unit 2070 generates information (hereinafter "PS information") that associates the document's patent score (PS) with "abandonment information" indicating whether the patent right has been abandoned (this is taken to include whether a rejection has become final).
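The PS information described above can be pictured as a small lookup table keyed by publication number. The following is only a sketch under assumptions: the names `PSInfo`, `build_ps_table`, and `live_scores` are hypothetical and do not appear in the publication, and the abandonment information is reduced to a single boolean that is true both for abandoned rights and for finally rejected applications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSInfo:
    """One PS information record (hypothetical layout)."""
    publication_number: str  # information identifying the patent document
    patent_score: float      # PS computed beforehand for this document
    abandoned: bool          # True if abandoned or if a rejection is final


def build_ps_table(records):
    """Index PS information records by publication number."""
    return {r.publication_number: r for r in records}


def live_scores(ps_table, publication_numbers):
    """Return the patent scores of the listed documents that are not abandoned.

    Publication numbers without a PS information record are skipped.
    """
    return [
        ps_table[n].patent_score
        for n in publication_numbers
        if n in ps_table and not ps_table[n].abandoned
    ]
```

Keeping the abandonment flag next to the score lets a later step drop abandoned documents with a single filter before any standardization is done.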
[0090] Next, the hardware configuration of the processing device 100 is described. As in the above embodiment, the processing device 100 is implemented by a computer equipped with a CPU (Central Processing Unit), memory, and an I/F for exchanging data with external devices (input device 2, recording device 3, output device 4, and so on).
The functional units of the processing device 100 (text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, progress information reading unit 202, score calculation unit 2060, and patent score calculation unit 2070) are implemented in software.
[0091] Specifically, the memory of the processing device 100 stores programs for implementing the functional units (text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, progress information reading unit 202, score calculation unit 2060, and patent score calculation unit 2070). Each functional unit of the processing device 100 is realized by the CPU executing these programs stored in the memory.
[0092] <5-2. Calculation Processing in the Modification>
Next, the technical element score calculation process of the modification, which differs from the above embodiment, is described.
FIG. 14 is a flowchart showing the procedure of the technical element score calculation process according to the modification of the embodiment of the present invention.
The processing flow below is performed after the attribution factor of each document has been determined (S104) in the factor extraction process of FIG. 2(A) described above, in order to output (S105) the data of the documents or document groups belonging to each factor. The "information associating each factor with the patent documents belonging to it (the information shown in FIG. 6)", obtained by the factor extraction process of FIG. 2(A), is assumed to be stored in a predetermined area of the memory of the processing device 100.
Before the process of FIG. 14 is performed, the patent score calculation unit 2070 is assumed to have calculated the patent score (PS) of each patent document belonging to each factor. The memory of the processing device 100 (or the storage device 3) is assumed to hold, for each piece of information identifying a patent document (publication number), information (hereinafter "PS information") that associates the document's patent score (PS) with "abandonment information" indicating whether the patent right has been abandoned (this is taken to include whether a rejection has become final). The procedure for calculating the patent score (PS) is described later with reference to FIGS. 17 to 20.
[0093] Specifically, the score calculation unit 2060 accepts from the user, via the input device 2, a request for the technical element score calculation process (S2010). When making the request, the user also designates the grouping for which the score is to be calculated.
As the grouping, the user may designate, for example, the factors obtained by the factor extraction process of FIG. 2(A). In this case, a technical element score is calculated for each factor.
Alternatively, the patent publications belonging to each factor may be divided by applicant, and a grouping by factor and by applicant may be designated. In this case, a technical element score is calculated for each factor and each applicant.
The following description takes as an example the case where a request to calculate the technical element score of a certain factor has been accepted.
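The two groupings mentioned above (by factor, or by factor and applicant) amount to choosing a different grouping key. A minimal sketch, assuming each document is available as a `(publication_number, factor, applicant)` tuple; this tuple layout and the function name `group_documents` are illustrative assumptions, not part of the publication.

```python
from collections import defaultdict


def group_documents(documents, by_applicant=False):
    """Group publication numbers by factor, or by (factor, applicant).

    documents: iterable of (publication_number, factor, applicant) tuples
    (a hypothetical layout chosen for this sketch).
    """
    groups = defaultdict(list)
    for publication_number, factor, applicant in documents:
        key = (factor, applicant) if by_applicant else factor
        groups[key].append(publication_number)
    return dict(groups)
```

A technical element score would then be computed once per key of the returned dictionary.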
[0094] Next, the score calculation unit 2060 acquires the patent scores (PS) of the patent documents belonging to the factor for which the technical element score requested in S2010 is to be calculated (S2020).
Specifically, the score calculation unit 2060 uses the "information associating each factor with patent documents (the information shown in FIG. 6)" and the "PS information" stored in the memory of the processing device 100 to acquire the "patent score (PS)" and "abandonment information" of the patent documents belonging to the target factor.
[0095] Next, using the acquired "patent score (PS)" and "abandonment information" of the patent documents belonging to the target factor, the score calculation unit 2060 obtains a standard value for each patent score (PS) whose patent right has not been abandoned (S2030).
[0096] Specifically, the score calculation unit 2060 refers to the "abandonment information" and identifies, among the patent documents belonging to the designated factor, the patent scores (PS) of those that have not been abandoned (including applications still pending before the patent office).
For each identified patent score (PS), the score calculation unit 2060 obtains its standard value within the population (for example, the non-abandoned patent documents in the analysis target document group on which the factor extraction process was performed). More specifically, the score calculation unit 2060 obtains a standard value for each identified patent score (PS) using (Equation 1) below and the identified patent scores (PS).
[0097] (Equation 1) below assumes that the population contains m patent scores (PS) of patent documents that have not been abandoned, and denotes each with a subscript i as PSi (1 ≤ i ≤ m, where m is an integer of 1 or more).
(Equation 1) gives the standard value of the patent score PSj of each patent document j belonging to a specific factor, among the PSi of the m patent documents.
[Equation 1] Standard value of patent j in the population (Formula 1; given in the original publication as formula image imgf000037_0001)
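The formula itself survives only as an image reference, so the following is a hedged reconstruction rather than the published equation. It assumes an ordinary standardization over the m non-abandoned patent scores of the population, multiplied by the coefficient 10 that the text attributes to (Equation 1), which gives a population mean of 0 as stated. The symbols $\overline{PS}$, $\sigma$, and $Z_j$, and the choice of the population (rather than sample) standard deviation, are assumptions introduced here.

```latex
\overline{PS} = \frac{1}{m}\sum_{i=1}^{m} PS_i, \qquad
\sigma = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(PS_i - \overline{PS}\right)^{2}}, \qquad
Z_j = 10 \cdot \frac{PS_j - \overline{PS}}{\sigma}
```

Under this form, a patent whose score equals the population mean has a standard value of 0, and a patent one standard deviation above the mean has a standard value of 10.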
[0098] Next, among the standard values of the patent scores PSj of the patent documents belonging to the specific factor obtained in S2030, the score calculation unit 2060 sums those at or above a threshold and takes the total as the "technical element score" of that factor (S2040). In this step, the score calculation unit 2060 also obtains the maximum of the standard values of the patent scores PSj of the patent documents belonging to the specific factor obtained in S2030.
[0099] Specifically, the score calculation unit 2060 calculates the "technical element score" of the factor designated by the user using (Equation 2) below and the standard values of the patent scores (PSj) obtained in S2030. The score calculation unit 2060 also selects the largest (MAX) of the standard values of the patent scores PSj obtained in S2030 and takes it as the maximum value for that factor.
(Equation 2) assumes that, among the standard values of the patent scores PSj obtained in S2030, n values in the factor are at or above the threshold. As an example of the threshold PSstd, (Equation 2) uses the population mean of the standard values of the patent scores PSi obtained in S2030 (which is 0 according to [Equation 1]).
[0100] [Equation 2] Technical element score (Formula 2)
$$\text{technical element score} = \sum_{j=1}^{n} \left(\text{standard value of } PS_j\right), \qquad \text{where } \left(\text{standard value of } PS_j\right) > PS_{std}$$
PSstd: threshold; 0 (= mean) by default
m: number of patents in the population that have not been abandoned (PSi ≠ 0)
n: number of patents in the factor at or above the threshold (PSj > PSstd)
[0101] Once the score calculation unit 2060 has calculated the technical element score, processing proceeds to S105 (output) of FIG. 2(A).
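Steps S2030 and S2040 can be sketched as follows. This is an illustration under assumptions, not the device's implementation: the standardization uses the population standard deviation and the coefficient 10 mentioned in the text, the threshold comparison is "at or above", and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def standard_values(population_scores, scale=10.0):
    """Standardize non-abandoned patent scores to mean 0, then scale (S2030).

    Assumption: population standard deviation; scale=10 follows the text.
    """
    mean = fmean(population_scores)
    spread = pstdev(population_scores)
    if spread == 0:
        # All scores identical: every standard value is 0 by convention here.
        return [0.0] * len(population_scores)
    return [scale * (ps - mean) / spread for ps in population_scores]


def technical_element_score(factor_standard_values, threshold=0.0):
    """Sum the standard values at or above the threshold (S2040).

    Also returns the maximum standard value in the factor (None if empty).
    """
    kept = [z for z in factor_standard_values if z >= threshold]
    best = max(factor_standard_values, default=None)
    return sum(kept), best


# Hypothetical data: five non-abandoned patents in the population,
# three of which (indices 1, 3, 4) belong to the factor of interest.
population = [1.0, 2.0, 3.0, 4.0, 5.0]
z = standard_values(population)
score, best = technical_element_score([z[i] for i in (1, 3, 4)])
```

With the default threshold of 0, a document exactly at the population mean contributes nothing to the sum, so including or excluding it does not change the score.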
The flow of FIG. 14 calculates the technical element score for a single factor, but this is merely an example. When a request to calculate technical element scores for a plurality of factors is received, S2020 to S2040 are performed for each factor, and the technical element score and maximum value are obtained factor by factor.
[0102] In S105 of FIG. 2(A), the output device 4 outputs the technical element score obtained in S2040. Alternatively, the output device 4 outputs the maximum value of the factor together with the technical element score.
The score calculation unit 2060 may also obtain the technical element score for each factor and each applicant, generate information representing a graph of these scores as in FIG. 12 described above, and have the output device 4 output it. In that case, the maximum value may be shown together with the technical element score for each factor and each applicant.
[0103] As described, this modification calculates the technical element score using the patent scores (PSi) of patent documents that have not been abandoned. The reason is as follows. Suppose a company wants to evaluate its patents by technical field, and a very large number of patent documents are classified into a certain technical field (factor), but most of them are abandoned applications (or applications whose rejection has become final). If applications that have already been abandoned (or finally rejected) are included in the evaluation of that field, a technical field in which few patent rights are actually held will be rated highly, and a proper analysis is not possible.
Therefore, this modification calculates the technical element score using the patent scores (PSi) of patent documents that have not been abandoned, which improves the accuracy of the score.
[0104] Also, in calculating the standard value of the patent score (PSi), this modification does not use a plain standard value but multiplies the ordinary standard value by a coefficient (a factor of 10 in (Equation 1)). This makes differences between the resulting standard values easier to distinguish. The factor of 10 in (Equation 1) is merely an example.
[0105] Also, this modification uses only the standard values of patent scores PSi that exceed the threshold when calculating the technical element score. This mitigates the influence that the number of patent documents has on the value of the technical element score.
For example, suppose a technical element score is obtained for each applicant and each factor, and the scores are compared to analyze the technical tendencies of each applicant. If no threshold were applied, unlike in this modification, the technical element scores of applicants with many applications would tend to become too high, and an accurate analysis might not be possible.
Admittedly, when the patents of a specific technical field have been extracted completely and without excess as the analysis target document group (population), the number of applications per applicant and per factor can itself be regarded as a sufficiently meaningful figure. However, when the analysis target document group (population) has been extracted by some other, arbitrary method, the number of applications itself varies with, for example, the characteristics of the industry to which each applicant belongs, so fixating on the number of applications per applicant and per factor may prevent an accurate analysis.
Also, when the main aim is to pick out important elements from an analysis target document group (population) containing an enormous number of patents, it may be preferable to give more weight to a factor that contains "individually important patents" than to one that contains "many patents of low individual importance".
For this reason, this modification uses only those standard values of the patent scores PSi that are at or above a predetermined value, so that a high technical element score is given only to factors containing important patents at or above that value, which improves the accuracy of the technical element score.
In particular, when the patent scores are standardized so that the mean is 0 and the standard values at or above the mean (0) are summed to give the technical element score, patent scores below the mean are not merely disregarded: a large number of patent scores near the mean has little effect on the technical element score, whereas a score far above the mean has a large effect. The influence of the number of documents contained in a technical element is thus further mitigated, and technical elements containing highly important patents can be extracted accurately.
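The effect described above can be seen with a small numeric illustration. The standard values below are hypothetical numbers chosen only for illustration (they are not data from the publication), and `thresholded_sum` is an assumed helper name.

```python
def thresholded_sum(std_values, threshold=0.0):
    """Sum only the standard values at or above the threshold."""
    return sum(z for z in std_values if z >= threshold)


# Hypothetical standard values (population mean is 0), for illustration only.
many_ordinary = [1.0] * 15 + [-2.0] * 15  # 30 documents, all near the mean
few_strong = [20.0, 20.0, 20.0, -5.0]     # 4 documents, three far above the mean

for name, values in (("many_ordinary", many_ordinary), ("few_strong", few_strong)):
    # Prints the group name, its document count, and its thresholded sum.
    print(name, len(values), thresholded_sum(values))
```

In this example the group with 30 documents totals 15.0 while the group with only 4 documents totals 60.0, so the ranking by technical element score is the reverse of the ranking by document count.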
[0106] FIG. 15 shows the distributions of the technical element score of the modification and the factor progress information score of the above embodiment in relation to the number of publications. Specifically, factor progress information scores were calculated for a plurality of factors extracted from a certain analysis target document group and standardized to mean 0 and variance 1, and technical element scores (likewise standardized to mean 0 and variance 1) were calculated for the same factors. These standardized scores are plotted on the vertical axis against the number of publications of each factor on the horizontal axis.
As the figure shows, the factor progress information score is distributed close to a straight line representing direct proportion to the number of publications, and is therefore strongly affected by that number. The technical element score, by contrast, is not entirely unrelated to the number of publications but is also distributed in regions well away from that line, which shows that the influence of the number of publications is mitigated.
[0107] FIG. 16 shows an example in which, after the attribution factor of each document was determined (S104) in the factor extraction process of FIG. 2(A) described above, the publications belonging to each factor were further classified by applicant, and the technical element score of this modification was calculated and plotted for each factor and each applicant. The specific calculation method is as follows.
The Japanese published patent applications and patent gazettes of a housing equipment manufacturer (Company a) were cluster-analyzed on the basis of similarity to obtain a plurality of clusters. After these clusters were examined, it was decided to analyze in detail, for the cluster relating to "composite structures" in particular, its relationship with the neighboring patents of other companies. Accordingly, from a group of about 4 million publications, including published patent applications and patent gazettes of the company itself and of other companies, about 3,000 publications with high similarity to the publications belonging to the "composite structures" cluster were extracted.
These roughly 3,000 publications were cluster-analyzed on the basis of the similarity of their document vectors, and factors were extracted using the 323 publications belonging to a specific cluster as the analysis target document group. The score calculation unit 2060 further classified the publications belonging to each factor by applicant and calculated the technical element score for each factor and each applicant. The output device 4 then generated a chart based on the calculated technical element scores and output it.
In FIG. 16, the factors are listed along the left-right direction, the applicants are listed along the depth direction in order of number of applications, and the height shows the technical element score for each factor and each applicant. However, FIG. 16 shows only the results calculated for applicants ranked in the top 10 by number of applications among all publications whose factor score exceeded a certain level.
[0108] In this modification, publications whose patent score is at or below the mean are excluded from the sum when the technical element score is calculated. For factors dominated by such publications, or consisting entirely of them, the technical element score is therefore close to 0 or exactly 0. The contrast between factors becomes clear, and as a result the ranking and evaluation of the factors are easier to grasp visually.
The output in FIG. 16 shows that Company g, which has few applications in the analysis target document group, holds in one technical element a patent group comparable to those of Companies a and b, which have many applications. It is also apparent at a glance where the strengths and weaknesses lie: even Company a, which leads in number of applications, has technical elements in which it lags behind other companies. In this way, the strengths and weaknesses of each company can be grasped clearly.
[0109] Although this modification uses the population average as the threshold, the threshold is not limited to this. For example, the average of the standard values of the patent scores PSi within a specific applicant's patent group, or another user-defined threshold, may be set in the processing device 100. Likewise, although this modification uses the standard value of the patent score PSi, it is not limited to this. For example, even when only those non-standardized patent scores PSi that are at or above a predetermined value are summed, the influence of the number of publications can be mitigated.
[0110] Further, according to this modification, when the technical element score is presented to the user, the highest standard value of the patent scores (PSj) of the patent documents classified under that factor can also be presented. This allows the user to see which technical elements (factors) contain highly rated patents, and hence to identify technical elements (factors) that contain a highly rated patent even though the evaluation value of the technical element (factor) as a whole is low. For example, suppose a company wishes to evaluate its patents by technical field and obtains the technical element score for each factor for that company (applicant). In this case, presenting the highest value for each factor makes it possible to see in which of the company's technical fields strong patents exist.
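A minimal sketch of presenting the highest score next to the technical element score is given below. The function name, dictionary keys, and sample scores are hypothetical, the threshold of 50 is an assumed population mean, and each factor is assumed to contain at least one publication.

```python
def summarize_factors(scores_by_factor, threshold):
    """Return the thresholded sum and the single highest score for each factor."""
    summary = {}
    for factor, scores in scores_by_factor.items():
        summary[factor] = {
            # Technical element score: only scores above the threshold count.
            "technical_element_score": sum(s for s in scores if s > threshold),
            # Highest individual score, reported regardless of the threshold.
            "highest_score": max(scores),
        }
    return summary


# Hypothetical standardized patent scores for one applicant.
summary = summarize_factors(
    {"factor 1": [40.0, 45.0, 80.0], "factor 2": [55.0, 56.0, 52.0]},
    threshold=50.0,
)
```

Here factor 1 has the lower total (80 against 163) but contains the single strongest patent, which is the situation the highest-value display is meant to reveal.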
[0111] <6. Patent score (PS)>
Next, the patent score (PS) used to calculate the technical element score in the above modification is described with reference to FIGS. 17 to 20.
In the following description, the patent score (PS) is calculated by the patent score calculation unit 2070 of the processing device 100, but this is not a limitation.
Another computer equipped with a CPU (Central Processing Unit), memory, and the like may perform the patent score calculation instead. In that case, a program implementing the function of the patent score calculation unit 2070 (a PS calculation program) is stored on the other computer. The CPU of the other computer executes the PS calculation program to calculate the patent score PS and generate the PS information described above. The processing device 100 acquires the PS information generated by the other computer and stores it in memory. When another computer performs the patent score calculation, the processing device 100 need not include the patent score calculation unit 2070.
[0112] <6-1. Data structure>
First, the data structure used to calculate the patent score PS is described. The storage device 3 stores patent data (electronic data representing patent publications) and patent attribute information. The electronic data representing a patent publication includes at least bibliographic information such as the patent data ID (publication number or the like), the filing date, and the IPC code.
The patent attribute information includes progress information 300 for the patent document (such as whether priority is claimed and the number of times the document has been cited in the examination of other patent applications) and content information 400 (such as the number of claims and the number of specification pages). The data structures of the progress information 300 and the content information 400 are described below.
[0113] First, FIG. 17 shows an example of the data structure of the progress information 300.
FIG. 17 schematically shows an example of the data structure of the progress information used in the modification of this embodiment.
As illustrated, one record of the progress information 300 consists of the following fields:
field 301: patent data ID (publication number or the like)
field 302: days elapsed since the filing date
field 303: days elapsed since the date of the request for examination
field 304: days elapsed since the registration date
field 305: whether a divisional application exists
field 306: whether accelerated examination took place
field 307: whether an appeal decision granting a patent exists
field 308: whether an opposition decision maintaining the patent exists
field 309: whether an invalidation trial decision maintaining the patent exists
field 310: whether priority is claimed
field 311: whether the application is a PCT application
field 312: whether the file wrapper has been inspected
field 313: number of times cited
The progress information 300 consists of a plurality of such records.
[0114] Here, "days elapsed since filing", "days elapsed since the examination request", and "days elapsed since registration" are information about time periods of the patent data. They are computed from the filing date, the date of the request for examination, and the date of registration of establishment of the patent right, respectively, as the number of days elapsed up to the evaluation date (the date on which the patent score is calculated) or a predetermined date close to it, and are stored in the storage device 3. For a patent application for which examination has not yet been requested, "days elapsed since the examination request" is NULL; for a patent application that has not yet been registered, "days elapsed since registration" is NULL.
[0115] Among the progress information 300, "divisional application", "accelerated examination", "appeal decision granting a patent", "opposition maintenance decision", "invalidation trial maintenance decision", "file wrapper inspection", and "priority" indicate whether a predetermined act has occurred with respect to the patent data. "Divisional application" indicates whether a divisional application has been filed with the patent application as its parent. "Accelerated examination" indicates whether the patent application has undergone accelerated examination. "Appeal decision granting a patent" indicates whether an appeal against a decision of refusal was filed for the patent application and a decision granting the patent was issued in that appeal. "Opposition maintenance decision" indicates whether an opposition was filed against the patent and a decision maintaining it was issued. "Invalidation trial maintenance decision" indicates whether an invalidation trial was requested against the patent and a decision dismissing the request was issued. "Priority" indicates whether the patent application claims priority based on an earlier patent application or the like, or whether it is the national phase of an international application under the Patent Cooperation Treaty. "File wrapper inspection" indicates whether a request to inspect the file of the patent application has been made. In each case, a value of, for example, 1 is assigned if the act has occurred and, for example, 0 if it has not.
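One record of the progress information could be modeled as in the following sketch. The class, attribute, and helper names are hypothetical; only the field numbers, the 0/1 convention, and the NULL rule come from the description, and `None` is used here to stand for NULL.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


def days_since(event: Optional[date], evaluation_date: date) -> Optional[int]:
    """Days elapsed up to the evaluation date, or None (NULL) if the event has not occurred."""
    return None if event is None else (evaluation_date - event).days


@dataclass
class ProgressRecord:
    patent_id: str                              # field 301
    days_since_filing: Optional[int]            # field 302
    days_since_exam_request: Optional[int]      # field 303, None if not requested
    days_since_registration: Optional[int]      # field 304, None if not registered
    divisional: int = 0                         # field 305 (1 = yes, 0 = no)
    accelerated_exam: int = 0                   # field 306
    appeal_grant_decision: int = 0              # field 307
    opposition_maintained: int = 0              # field 308
    invalidation_maintained: int = 0            # field 309
    priority_claim: int = 0                     # field 310
    pct_application: int = 0                    # field 311
    file_inspection: int = 0                    # field 312
    times_cited: int = 0                        # field 313


# Hypothetical application: filed two years before evaluation, not yet registered.
evaluation_date = date(2007, 10, 31)
record = ProgressRecord(
    patent_id="JP-EXAMPLE-1",
    days_since_filing=days_since(date(2005, 10, 31), evaluation_date),
    days_since_exam_request=days_since(date(2006, 10, 31), evaluation_date),
    days_since_registration=days_since(None, evaluation_date),
    priority_claim=1,
)
```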
[0116] Next, FIG. 18 shows the data structure of the content information 400.
FIG. 18 schematically shows an example of the data structure of the content information used in the modification of this embodiment.
[0117] As illustrated, one record of the content information 400 consists of a field 401 for registering the patent data ID (publication number or the like), a field 402 for the number of claims of the patent data, a field 403 for the average number of characters per claim, and a field 404 for the number of specification pages of the patent data. The content information 400 consists of a plurality of such records.
Here, "number of claims" is the number of claims of the patent application, and "average number of characters per claim" is the average number of characters (or words) per claim of the patent application. "Number of specification pages" is the number of pages of the specification or of the publication of the patent application. This information is extracted from the published patent gazette or other patent data of each patent application.
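As an illustration, a content information record might be derived from claim texts as sketched below. The names are hypothetical, characters rather than words are counted, and the claim list is assumed to be non-empty.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContentRecord:
    patent_id: str          # field 401
    num_claims: int         # field 402
    avg_claim_chars: float  # field 403
    num_pages: int          # field 404


def build_content_record(patent_id: str, claims: List[str], num_pages: int) -> ContentRecord:
    """Derive the claim count and the average characters per claim from the claim texts."""
    avg_chars = sum(len(claim) for claim in claims) / len(claims)
    return ContentRecord(patent_id, len(claims), avg_chars, num_pages)


# Hypothetical claims of 4 and 6 characters.
content = build_content_record("JP-EXAMPLE-1", ["abcd", "abcdef"], num_pages=12)
```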
[0118] <6-2. Patent score calculation processing>
Next, the processing is described with reference to FIG. 19, a flowchart showing the procedure of the patent score calculation processing in the modification of this embodiment.
[0119] As shown in FIG. 19, the patent score calculation unit 2070 accepts input of an IPC code from the user and acquires patent data (electronic data representing patent publications) (S400).
Specifically, on accepting the IPC code from the user, the patent score calculation unit 2070 accesses the storage device 3 and acquires the patent data classified under that IPC code. The patent data includes bibliographic information such as the filing date of the patent application and, where priority is claimed, the priority date.
[0120] Next, the patent score calculation unit 2070 uses the filing date, priority date, or similar information in the bibliographic information of the acquired patent data to classify the patent data into groups t, one per predetermined period (in this modification, per filing year, per year containing the priority date, or the like) (S500).
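The grouping of S500 can be sketched as follows, assuming calendar-year groups. The dictionary keys and the rule of falling back to the filing date when no priority date exists are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date


def group_by_year(patents, use_priority_date=False):
    """Classify patent data into groups t keyed by filing year or priority year."""
    groups = defaultdict(list)
    for patent in patents:
        reference = patent.get("priority_date") if use_priority_date else None
        # Assumed fallback: use the filing date when no priority date exists.
        reference = reference or patent["filing_date"]
        groups[reference.year].append(patent["id"])
    return dict(groups)


# Hypothetical patent data.
patents = [
    {"id": "A", "filing_date": date(2001, 3, 1)},
    {"id": "B", "filing_date": date(2001, 9, 1), "priority_date": date(2000, 9, 10)},
    {"id": "C", "filing_date": date(2003, 1, 15)},
]
by_filing_year = group_by_year(patents)
by_priority_year = group_by_year(patents, use_priority_date=True)
```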
Next, the patent score calculation unit 2070 calculates an evaluation value for each item of patent data (S600). This processing is described in detail with reference to FIG. 20.
[0121] FIG. 20 is a flowchart showing the details of the processing for calculating the evaluation value of each item of patent data in the modification of this embodiment.
The patent score calculation unit 2070 acquires the progress information 300 and the content information 400 for the patent data belonging to a group generated by the classification in S500 (S610). Specifically, using the patent ID (publication number or the like) contained in the bibliographic information of the acquired patent data, the patent score calculation unit 2070 retrieves, from the progress information 300 and content information 400 stored in the storage device 3, the progress information 300 and content information 400 associated with that patent ID.
In FIG. 20, one acquired group is assumed to consist of J items of patent data, and the subscript j (j = 1, 2, ..., J) is used to distinguish them.
Once the J items of patent data have been acquired, their progress information 300 and content information 400 are used to compute in advance the quantities needed in S6302 to S6304 described later, such as the total over the J items of the applicability data for each evaluation item.
[0122] Next, the variable j is set to 1 (S620), and the raw evaluation score of patent data j is calculated as follows.
[0123] First, the information registered in each field of the progress information 300 is treated as an evaluation item, and for each of the I evaluation items i (i = 1, 2, ..., I) the evaluation-point calculation method preset for that item is selected (S6301).
[0124] The modification of this embodiment uses three evaluation-point calculation methods. For the information registered in fields 305, 306, 307, 308, 309, 310, 311, and 312, which indicates whether a predetermined act has occurred with respect to the patent data, S6302 [presence type] is selected. For fields 302, 303, and 304, which hold information about time periods of the patent data, S6303 [time-decay type] is selected. For field 313, which holds the number of times the patent data has been cited, S6304 [count type] is selected.
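The selection in S6301 amounts to a fixed lookup from field number to calculation method, which can be sketched as below. The constant and function names are hypothetical; the field-to-step assignment follows the paragraph above.

```python
PRESENCE, TIME_DECAY, COUNT = "S6302", "S6303", "S6304"

# Field number of the progress information 300 -> calculation method.
METHOD_BY_FIELD = {
    **{field: TIME_DECAY for field in (302, 303, 304)},
    **{field: PRESENCE for field in range(305, 313)},  # fields 305 to 312
    313: COUNT,
}


def select_method(field: int) -> str:
    """Return the step that computes the evaluation point for the given field."""
    return METHOD_BY_FIELD[field]
```

Field 301 (the patent data ID) is deliberately absent from the table because it is an identifier, not an evaluation item.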
[0125] Once the evaluation-point calculation method has been selected, the evaluation point of patent data j is calculated for each of the I evaluation items i (S6302, S6303, S6304).
[0126] <6-2-1. Calculation of evaluation points for the presence type>
For an evaluation item i for which S6302 [presence type] is selected, the evaluation point is calculated by the following [Equation 3].
[Equation 3]
\[
\text{evaluation point} = \frac{(\text{applicability data for evaluation item } i)}{\sqrt{\sum_{j=1}^{J} (\text{applicability data for evaluation item } i)}}
\]
[0127] Here, the "applicability data for evaluation item i" in the numerator is, for "divisional application" for example, 1 if a divisional application has been filed as described above and 0 otherwise.
[0128] The denominator is the positive square root of the in-group total of the "applicability data for evaluation item i". The denominator is therefore large when many items of patent data in the group satisfy the evaluation item and small when only a few do. Patents that satisfy an evaluation item with few hits (such as "invalidation trial maintenance decision") tend to have a higher maintenance rate after registration than patents that satisfy an evaluation item with many hits (such as "file wrapper inspection"); a high maintenance rate is generally taken to indicate economic value commensurate with the maintenance cost (patent fees). The evaluation items are thus weighted automatically. In addition, because the totals are taken per group of a predetermined period, the tendency for new patents to be rated low for no reason other than their age can be mitigated: older patents have typically accumulated more progress information, whereas recently published patents often have none yet.
Attribute information of patent data is useful for relative evaluation within the population to be analyzed, but treating all patent applications or patent rights in the population equally does not yield an appropriate evaluation. According to this embodiment, the population is classified into groups by period, and the value obtained for each group is used as the denominator, which enables appropriate relative evaluation within a population containing patent applications or patent rights from different periods.
For example, in a given technical field, a single application in a same-period group with few applications is often worth more than a single application in a same-period group with many. Similarly, an application filed several years ago is inevitably more likely than a recently published one to have accumulated progress information such as a request for inspection, but it would be wrong to rate the recently published application low for that reason alone. If few applications in a same-period group have received a request for inspection, one that has is attracting exceptional attention and should be rated highly. Conversely, if many applications in the group have received such a request, an application should not be rated highly merely because it has received one.
According to this embodiment, the evaluation point is calculated as the product of a value obtained from the patent attribute information of each item of patent data belonging to a group and the value of a decreasing function of the group-wise total of such values. This yields an evaluation value that reflects the relative position of each item of patent data within its group. By weighting numerical information based on progress information more heavily the lower its total in the same-period group is, and less heavily the higher that total is, the patent applications or patent rights in the document group to be analyzed can be evaluated appropriately.
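The presence-type calculation of [Equation 3] can be sketched for one evaluation item within one same-period group as follows. The function name is hypothetical, and returning 0 for every patent when no patent in the group satisfies the item is an assumption made to avoid division by zero.

```python
import math


def presence_scores(flags):
    """Evaluation points for one evaluation item; flags holds 1 or 0 per patent in the group."""
    total = sum(flags)
    if total == 0:
        # Assumed convention: no patent satisfies the item, so nobody scores.
        return [0.0] * len(flags)
    denominator = math.sqrt(total)
    return [flag / denominator for flag in flags]


# A rare item (one hit in the group) versus a common item (four hits).
rare = presence_scores([1, 0, 0, 0])
common = presence_scores([1, 1, 1, 1])
```

A patent satisfying the rare item receives 1.0, while one satisfying the common item receives 0.5, which illustrates the automatic weighting described above.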
[0129] <6-2-2. Calculation of evaluation points for the time-decay type>
For an evaluation item i for which S6303 [time-decay type] is selected, the evaluation point is calculated by the following [Equation 4].
[Equation 4]
\[
\text{evaluation point} = \frac{\exp\!\left(-\dfrac{\min(\text{elapsed time},\ \text{term})}{\text{term}}\right)}{\sqrt{\sum_{j=1}^{J} (\text{applicability data for evaluation item } i)}}
\]
[0130] The numerator, "Exp(-(Min(elapsed time, term))/term)", is, for "days elapsed since the examination request", Napier's number e raised to the power obtained by taking the smaller of the "days elapsed since the examination request (converted to years)" and the "term", dividing it by the "term", and multiplying by -1. The "term" is the maximum number of years from the filing date to expiry of the patent term (20 years under current Japanese law). The same formula is used for "days elapsed since registration", with the "term" again being the maximum number of years from the filing date to expiry of the patent term (20 years under current Japanese law). The same formula is also used for "days elapsed since filing", but there the "term" is the number of years from the filing date to the deadline for requesting examination (3 years under current Japanese law). Accordingly, the numerator is close to Exp(0) = 1 while the elapsed time is short, decays as time passes, and falls to Exp(-1) = 1/e once the elapsed time reaches or exceeds the term. The advantages of using an exponential function are that it introduces a depreciation effect on value and that it removes discreteness from the distribution of evaluation values, making the distribution smooth. "Days elapsed since the examination request", "days elapsed since filing", and "days elapsed since registration" are basic evaluation items that apply to many patents, and this approach avoids ties among patents to which only these three evaluation items apply.
[0131] The denominator has the same form as in S6302 [presence type]. For "days elapsed since the examination request", it is the positive square root of the in-group total of a value that is, for example, 1 if examination has been requested for the patent application and 0 if not. For "days elapsed since registration", it is likewise the positive square root of the in-group total of a value that is 1 if the patent right has been registered and 0 if not. For "days elapsed since filing", every item of patent data qualifies, so if the applicability data is taken as 1, the denominator equals the positive square root of the number of items of patent data in the group. In every case the denominator is large when many items of patent data in the group satisfy the evaluation item and small when only a few do. Because, as noted above, these three are basic evaluation items that apply to many patents, the points allotted to them tend to be small.
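The time-decay calculation of [Equation 4] can be sketched as follows for one evaluation item within one group. The function name is hypothetical, elapsed times are assumed to be already converted to years, `None` stands for NULL, and assigning 0 points to a patent whose elapsed time is NULL is an assumption not stated in the text.

```python
import math


def time_decay_scores(elapsed_years, term_years):
    """Evaluation points for one time-decay item; None marks patents the item does not apply to."""
    applicable_count = sum(1 for t in elapsed_years if t is not None)
    if applicable_count == 0:
        return [0.0] * len(elapsed_years)
    denominator = math.sqrt(applicable_count)
    scores = []
    for t in elapsed_years:
        if t is None:
            scores.append(0.0)  # assumed: no event, no points
        else:
            # Decays from 1 toward 1/e and stays at 1/e once t reaches the term.
            scores.append(math.exp(-min(t, term_years) / term_years) / denominator)
    return scores


# Hypothetical group: registered 0, 3, and 25 years ago, plus one unregistered application.
decay = time_decay_scores([0.0, 3.0, 25.0, None], term_years=20.0)
```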
The evaluation point calculated in S6303 [time-decay type] is further corrected using content information. The content information 400 shown in FIG. 18 is used for this purpose.
When evaluation relies on progress information alone, a patent application or patent right shortly after publication or registration lacks the progress information it can be expected to accumulate later, and may not be evaluated correctly. To correct for this, content information is factored into the evaluation based on progress information. However, content information tends to correlate less strongly with the maintenance rate than progress information does, so factoring it in carelessly may actually reduce the accuracy of the evaluation. Therefore, to keep the influence of content information small for patents with ample progress information while reflecting it effectively for patents with insufficient progress information, only the evaluation point calculated in S6303 [time-decay type] is multiplied by a correction coefficient based on content information.
Thus, according to this embodiment, the content information of each item of patent data can be factored into the period-related information, which tends to be given uniformly to all patent data regardless of how old or new the application is. As a result, even patent data from new applications, to which little progress information has yet been attached, can be evaluated appropriately.
[0133] Specifically, each evaluation point of [Equation 4] above is multiplied by
\[
a_1 \times a_2 \times a_3
\]
where
\(a_1 = 2^{1/3}\) (if the average number of characters per claim is at or below the average) or \(2^{-1/3}\) (if the average number of characters per claim is at or above the average),
\(a_2 = 2^{1/3}\) (if the total number of pages is at or above the average) or \(2^{-1/3}\) (if the total number of pages is at or below the average),
\(a_3 = 2^{1/3}\) (if the number of claims is within the mean ± 1 standard deviation) or \(2^{-1/3}\) (if the number of claims is outside that range).
Capping each of \(a_1\), \(a_2\), and \(a_3\) at \(2^{1/3}\) bounds the combined correction \(a_1 \times a_2 \times a_3\). In the above embodiment, the value of \(a_1 \times a_2 \times a_3\) is at most 2.
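The correction coefficient can be sketched as follows. Several points are assumptions rather than statements from the text: the averages and standard deviation are taken over a caller-supplied comparison population, the population standard deviation is used, and a value exactly equal to the average is given the larger factor. All names are hypothetical.

```python
import statistics

UP, DOWN = 2 ** (1 / 3), 2 ** (-1 / 3)


def correction_factor(avg_claim_chars, num_pages, num_claims, population):
    """Return a1 * a2 * a3; population is a list of (avg_claim_chars, num_pages, num_claims)."""
    mean_chars = statistics.mean([p[0] for p in population])
    mean_pages = statistics.mean([p[1] for p in population])
    claim_counts = [p[2] for p in population]
    mean_claims = statistics.mean(claim_counts)
    std_claims = statistics.pstdev(claim_counts)  # assumed: population standard deviation

    a1 = UP if avg_claim_chars <= mean_chars else DOWN   # shorter claims score higher
    a2 = UP if num_pages >= mean_pages else DOWN         # longer specifications score higher
    a3 = UP if abs(num_claims - mean_claims) <= std_claims else DOWN
    return a1 * a2 * a3


# Hypothetical comparison population.
population = [(100.0, 10, 5), (200.0, 20, 10), (300.0, 30, 15)]
best = correction_factor(100.0, 30, 10, population)
worst = correction_factor(300.0, 10, 15, population)
```

With all three factors favorable the product is 2, and with all three unfavorable it is 1/2, so the correction changes a time-decay point by at most a factor of two in either direction.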
[0134] <6-2-3. Calculation of evaluation points for the count type>
For an evaluation item i for which S6304 [count type] is selected, the evaluation point is calculated by the following [Equation 5].
[Equation 5]
\[
\text{evaluation point} = \frac{f(\text{citation}) \times \log(n_j + 1)}{\sqrt{\sum_{j=1}^{J} f(\text{citation}) \times \log(n_j + 1)}}
\]
[0135] The numerator, "f(citation) × log(n + 1)", is the logarithm of the number of times cited n plus 1, multiplied by a weight f(citation). The inventors' verification has shown that the maintenance rate of a patent right varies not only with whether the patent has been cited but also with how many times it has been cited. The two are not proportional, however: the increase in maintenance rate levels off as the citation count grows, which is why the logarithm is taken.
[0136] 分母には、上記「f (引用) Xlog(n + l)」の当該グループ内合計値の正の平方根が 配置されている。従って、当該グループ内に他の出願で引用された特許データが多 数存在する場合は分母が大きぐ当該グループ内に他の出願で引用された特許デ ータが少数しか存在しない場合は分母が小さくなる。 [0137] 上記ほ女 5]の分子及び分母において、重み f (引用)は任意の正数を用いることがで きる力 他社の特許出願で引用された回数 (他社引用回数) n と自社の他の特許 [0136] In the denominator, the positive square root of the total value in the group of the above "f (quotation) Xlog (n + l)" is arranged. Therefore, if there are many patent data cited in other applications in the group, the denominator is large.If there are only a few patent data cited in other applications in the group, the denominator is Get smaller. [0137] In the numerator and denominator of the above woman 5], the weight f (quotation) can use any positive number. Number of times cited in other patent applications (number of times other company citations) n Patents
j other  j other
出願で引用された回数(自社引用回数) n とで区別し、それぞれの対数に異なる重  Number of times cited in the application (in-house citation number) n
j self  j self
みを付与する。この場合、上記ほ女 5]に代え、次のほ女 6]を用いる。  Grant only. In this case, the following woman 6] is used instead of the above woman 5].
[Equation 6]
Figure imgf000051_0001
As specific weights, the ratio of f(citation_other) for other-company citations to f(citation_self) for self-citations was set to 1 : 2.
[0138] The citation count is highly correlated with the value of a patent. Furthermore, according to the inventors' verification, when the number of times a patent was cited in the examination of other companies' patent applications (other-company citations) is compared with the number of times it was cited in the examination of the company's own other patent applications (self-citations), the latter shows a significantly higher correlation with patent value. This is presumed to be because an invention cited in the examination of the company's own other applications is often a basic invention at the core of the technology the company practices. It is highly likely that the company, aware that it had already filed for such a basic invention, also filed for improvements to it in order to build a strong patent portfolio.
According to this embodiment, the citation count is split into other-company citations and self-citations, and the latter is reflected more heavily in the evaluation value, which makes appropriate evaluation of a patent application or patent right possible.
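The count-type score of [Equation 5] can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's implementation: the function name `count_type_points`, the natural logarithm as the base, the default weight of 1.0, and the handling of a group with no citations are all assumptions, and the split weighting of [Equation 6] is not reproduced because its exact form is not given in the text.

```python
import math


def count_type_points(citation_counts, weight=1.0):
    """Count-type evaluation points for every patent in one group.

    citation_counts: citation count n_j of each patent j in the group.
    weight: f(citation); any positive number.
    """
    # Numerator of [Equation 5] for each patent: f(citation) * log(n_j + 1).
    terms = [weight * math.log(n + 1) for n in citation_counts]
    # Denominator: positive square root of the group total of those terms.
    denominator = math.sqrt(sum(terms))
    if denominator == 0.0:
        # Assumption: when no patent in the group is cited, every point is 0.
        return [0.0] * len(terms)
    return [term / denominator for term in terms]


# Hypothetical citation counts for a four-patent group.
points = count_type_points([0, 1, 3, 10])
```

Because of the logarithm, a patent cited ten times scores higher than one cited once, but by far less than a factor of ten, which matches the levelling-off described above.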
[0139] <6-2-4. Calculation of the raw evaluation score>
Once the evaluation points of patent data j have been calculated for all evaluation items i (i = 1, 2, ..., I), the raw evaluation score of patent data j is calculated from them by the following [Equation 7] (S640).
[Equation 7]
Figure imgf000051_0002
As this equation shows, the raw evaluation score is either the positive square root of the sum of squares of the I evaluation points, or 0. The raw evaluation score is 0 in the following cases: no request for examination was filed by the deadline, the application was withdrawn or abandoned, a decision of refusal became final, or the patent application otherwise lapsed; or a revocation decision on opposition or an invalidation decision in an invalidation trial became final, the patent right was abandoned, the term of the patent right expired, or the patent right was otherwise extinguished. This information is also read from the progress information of each patent data, and the raw evaluation score is set to 0 where applicable.
As described above, evaluation points calculated in S6303 [time decay type] are corrected using content information. Specifically, the evaluation points calculated by [Equation 4] above from the "days elapsed since the request for examination", the "days elapsed since the filing date", and the "days elapsed since the registration date" are each multiplied by a1 × a2 × a3 described above, and the square root of the sum of squares is then taken according to [Equation 7].
One way to calculate the raw evaluation score from the evaluation points of multiple evaluation items is to take the sum of the evaluation points (simple sum method). Under this method, a patent to which many pieces of progress information correlated with the patent maintenance rate (economic value) have been assigned receives a high score, so taking the sum as the raw evaluation score seems reasonable at first glance. Caution is needed, however: the raw score of a patent with many pieces of progress information that correlate only weakly with the maintenance rate (many low evaluation points added together) can exceed the raw score of a patent with a few pieces of progress information that correlate very strongly with it.
One way to solve this problem is to use the largest of the evaluation points as the raw evaluation score (maximum value method). When the correlation between a given piece of progress information and the maintenance rate of a patent group is examined without regard to what other progress information has been assigned, the maintenance rate of a patent can be expected to be best represented by the maintenance rate of the progress information with the highest maintenance rate, so taking the maximum evaluation point as the raw score also seems reasonable at first glance. However, two patents with the same maximum evaluation point cannot be ranked against each other. Moreover, the maximum value method cannot produce an evaluation that takes into account the viewpoints of three different parties (the applicant, the Patent Office, and competitors); only one party's viewpoint is reflected, and the viewpoints of the others cannot be reflected in the evaluation of the patent data.
The above method of taking the square root of the sum of squares combines the advantages of the simple sum method and the maximum value method. That is, when one of the I evaluation items for patent data j has a high evaluation point, that point strongly influences the raw evaluation score, while the evaluation points of the other items are still taken into account to some degree. Patent data j that falls under several of the items whose evaluation points tend to be high, such as "accelerated examination", "decision to maintain on opposition", and "trial decision to maintain in an invalidation trial", can therefore be given an outstandingly high raw evaluation score.
Thus, in this modification, the patent evaluation takes into account all of the evaluation points calculated according to the type of patent attribute information (S630, S640). As a result, the value of patent data can be evaluated from multiple angles.
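The comparison of the three aggregation methods can be made concrete with a short sketch. The function names and the evaluation points below are hypothetical and chosen only for illustration; only the three aggregation rules themselves come from the text.

```python
import math


def simple_sum(points):
    """Simple sum method: add all evaluation points."""
    return sum(points)


def maximum_value(points):
    """Maximum value method: keep only the largest evaluation point."""
    return max(points)


def root_sum_squares(points):
    """Method of [Equation 7]: positive square root of the sum of squares."""
    return math.sqrt(sum(p * p for p in points))


# Hypothetical evaluation points, chosen only to illustrate the comparison.
many_weak = [1.0] * 6       # six items that correlate weakly with maintenance
one_strong = [5.0]          # one item that correlates strongly
strong_plus = [5.0, 2.0]    # the same strong item plus a second item
```

With these values the simple sum ranks `many_weak` above `one_strong`, the maximum value method cannot separate `one_strong` from `strong_plus`, and the root of the sum of squares ranks them `many_weak` < `one_strong` < `strong_plus`.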
[0141] <6-2-5. Calculation of the evaluation value>
Once the raw evaluation score has been calculated, its logarithm is calculated and used as the evaluation value of patent data j (S650).
Evaluation values calculated from progress information or content information tend to be high for the few patent applications or patent rights from which a distinctive history or content can be read, and low for the great majority of others. Consequently, in the distribution of the number of cases by evaluation value, patent applications or patent rights with high values are few and sparsely distributed, while those with low values are numerous and densely distributed.
In such a case, the mean (arithmetic mean) is strongly influenced by the few patent applications or patent rights with high evaluation values, so care is needed when evaluating by comparison with such a mean. Likewise, when two patent applications or patent rights that both have high evaluation values are compared, a difference that looks large numerically may not actually be significant.
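The effect of taking the logarithm in S650 can be illustrated with a small sketch. The function name `evaluation_value`, the natural logarithm as the base, and the treatment of a raw score of 0 are assumptions; the text here does not say how a raw score of 0 is converted.

```python
import math


def evaluation_value(raw_score):
    """Logarithm of a raw evaluation score (S650)."""
    if raw_score <= 0.0:
        # Assumption: a raw score of 0 (lapsed application or extinguished
        # right) has no logarithm, so None is returned in this sketch.
        return None
    return math.log(raw_score)


# Hypothetical raw scores: a low pair and a high pair with the same ratio.
low_gap = evaluation_value(2.0) - evaluation_value(1.0)
high_gap = evaluation_value(200.0) - evaluation_value(100.0)
```

On the raw scale the high pair differs by 100 and the low pair by 1, but after the logarithm both pairs differ by the same amount, so large numerical gaps among high scores are compressed.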
[0142] Next, it is determined whether evaluation values have been calculated for all patent data j (S660). If not (S660: NO), the process proceeds to S670, sets the variable j to j + 1, and returns to S630 to calculate the evaluation value for the next patent data.
If evaluation values have been calculated for all patent data j (S660: YES), the evaluation value calculation for the patent data belonging to the group ends.
Thus, in this embodiment, multiple patent data with different characteristics are evaluated with the characteristics of each technical field and each filing period taken into account. As a result, the value of patent data can be evaluated more appropriately.
[0143] The evaluation value calculation process of S610 to S670 is executed for every group t obtained by classifying, in S500, the patent data acquired in S400.
Once evaluation values have been calculated for all groups t, the process returns to FIG. 19, and based on these evaluation values, the deviation value within the analysis target population acquired in S400 is calculated as the patent score PS (S700). This deviation value also makes possible a relative comparison of patent data across different technical fields (comparison with an analysis target population separately selected in S400 using a different IPC), which would otherwise be difficult.
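The deviation value of S700 can be sketched as below. The text does not spell out the formula, so this sketch assumes the conventional deviation value, 50 plus 10 times the standardized value, computed with the population standard deviation; the function name `patent_scores` and the handling of a population with zero spread are also assumptions.

```python
from statistics import mean, pstdev


def patent_scores(evaluation_values):
    """Deviation value of each evaluation value within its population."""
    mu = mean(evaluation_values)
    sigma = pstdev(evaluation_values)
    if sigma == 0.0:
        # Assumption: with no spread, every patent sits at the centre (50).
        return [50.0] * len(evaluation_values)
    # Assumed convention: 50 + 10 * (value - mean) / standard deviation.
    return [50.0 + 10.0 * (v - mu) / sigma for v in evaluation_values]


# Hypothetical evaluation values for a three-patent population.
ps = patent_scores([1.0, 2.0, 3.0])
```

Under this convention every population has a mean score of 50 and a standard deviation of 10, which is what allows scores from populations in different technical fields to be placed on a common scale.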
[0144] In the modification of this embodiment, the technical element score is calculated from the patent score PS obtained by the above procedure, which gives the following advantages over the embodiment described above.
Specifically, in the modification, the patent score PS on which the technical element score is based takes into account weights according to the type of progress information. Because the technical element score is obtained from this patent score PS, the modification yields a more accurate score.
With the patent score of this modification, the analysis target population is classified into groups by period, and the value obtained for each group is used as the denominator. This enables appropriate relative evaluation within an analysis target population that includes patent applications or patent rights from different periods.
This reduces the likelihood that a high technical element score is calculated for a factor into which many patent data with old filing dates are classified.

Claims

[1] A document group analysis device comprising:
text data acquisition means for acquiring a plurality of technical documents represented as text data;
weighting amount calculation means for obtaining, for each acquired technical document, a weighting amount of each index word;
calculation means for performing a factor analysis in which each acquired technical document is treated as a subject and each index word as an observed variable, using the obtained weighting amounts of the index words, to calculate a factor loading for each factor for each index word and a factor score for each factor for each technical document;
attribution factor determination means for determining the attribution factor of each index word using the factor loadings of that index word, and the attribution factor of each technical document using the factor scores of that technical document; and
output means for outputting, for each factor, the index word or index word group belonging to that factor together with data of the technical document or technical document group belonging to that factor.
[2] The document group analysis device according to claim 1, wherein
the attribution factor determination means,
for each index word, uses the calculated factor loadings to select the factor with the largest factor loading and specifies the selected factor as the attribution factor of that index word, and, for each technical document, uses the calculated factor scores to select the factor with the largest factor score and specifies the selected factor as the attribution factor of that technical document.
[3] The document group analysis device according to claim 1 or 2,
further comprising important index word extraction means for obtaining the appearance frequency of the index words contained in the plurality of technical documents, calculating the importance of each index word using the appearance frequency, and using the calculated importance to extract a predetermined number of index words ranked highest in importance,
wherein the weighting amount calculation means obtains, as the weighting amount of each index word, the weighting amounts of the predetermined number of index words ranked highest in importance.
[4] The document group analysis device according to any one of claims 1 to 3,
further comprising factor evaluation value calculation means for calculating, for each of the factors, a factor evaluation value indicating a technical evaluation of that factor,
wherein the output means outputs, as the data of the technical document or technical document group, the factor evaluation value of the technical document or technical document group.
[5] The document group analysis device according to any one of claims 1 to 3, wherein
the technical documents are patent documents including unexamined patent application publications and granted patent gazettes,
the device further comprising:
progress information acquisition means for acquiring progress information of each patent document; and
factor evaluation value calculation means for calculating, for each of the factors, a factor evaluation value indicating a technical evaluation of that factor using the progress information of the technical document or technical document group belonging to that factor,
wherein the output means outputs the factor evaluation value as the data of the technical document or technical document group.
[6] The document group analysis device according to claim 5,
further comprising document number determination means for determining the number of documents in the technical document or technical document group belonging to each factor,
wherein the factor evaluation value calculation means,
for each of the factors, calculates a first index by applying a predetermined weighting to the number of documents in the technical document or technical document group belonging to that factor, calculates a second index by indexing the progress information of the technical document or technical document group belonging to that factor, and calculates the factor evaluation value of that factor using the calculated first index and second index.
[7] The document group analysis device according to claim 6, wherein
the progress information includes the number of citations by other companies, the number of patent oppositions received, the number of requests for patent invalidation trials received, whether a request for examination has been filed, and whether the patent right has been registered,
the first index is a value obtained by weighting the number of documents using at least one of the total number of citations by other companies, the total number of patent oppositions received, and the total number of requests for patent invalidation trials received, and
the second index obtained by indexing the progress information is a value obtained by indexing at least one of the total number of citations by other companies, the total number of patent oppositions received, the total number of requests for patent invalidation trials received, the examination request rate, and the grant decision rate.
[8] The document group analysis device according to any one of claims 5 to 7,
wherein, for the output means, the factor evaluation value is calculated for each factor and for each applicant.
[9] A data analysis method performed by an information processing device,
wherein the information processing device executes:
a step of acquiring a plurality of technical documents represented as text data;
a step of obtaining, for each acquired technical document, a weighting amount of each index word;
a step of performing a factor analysis in which each acquired technical document is treated as a subject and each index word as an observed variable, using the obtained weighting amounts of the index words, to calculate a factor loading for each factor for each index word and a factor score for each factor for each technical document; and
a step of determining the attribution factor of each index word using the factor loadings of that index word, and the attribution factor of each technical document using the factor scores of that technical document.
[10] A program for causing an information processing device to execute data analysis processing,
wherein the program causes the information processing device to execute:
a process of acquiring a plurality of technical documents represented as text data;
a process of obtaining, for each acquired technical document, a weighting amount of each index word;
a process of performing a factor analysis in which each acquired technical document is treated as a subject and each index word as an observed variable, using the obtained weighting amounts of the index words, to calculate a factor loading for each factor for each index word and a factor score for each factor for each technical document; and
a process of determining the attribution factor of each index word using the factor loadings of that index word, and the attribution factor of each technical document using the factor scores of that technical document.
[11] The document group analysis device according to any one of claims 1 to 3, wherein
the technical documents are patent documents including unexamined patent application publications and granted patent gazettes,
the device further comprising:
means for acquiring, for each acquired patent document, a patent score that individually evaluates the value of that patent document; and
factor evaluation value calculation means for calculating, for each of the factors, a factor evaluation value indicating a technical evaluation of that factor using the patent scores of the patent documents belonging to that factor.
[12] The document group analysis device according to claim 11, wherein
the factor evaluation value calculation means,
for each factor, selects, from among the patent scores of the patent documents belonging to that factor, the patent scores that are equal to or greater than a predetermined threshold, and calculates a value obtained by aggregating the selected patent scores as the factor evaluation value.
[13] The document group analysis device according to claim 12, wherein
the patent score is a value standardized within the document group of the population that includes the factor for which the factor evaluation value is calculated.
[14] The document group analysis device according to any one of claims 11 to 13, wherein
the patent score is a value calculated for each patent document by classifying the patent documents into groups by technical field and by predetermined period and, for each classified group, using the progress information of the patent documents belonging to that group.
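The attribution rule of claim 2 and the thresholded aggregation of claim 12 can be sketched as follows. This is an illustrative sketch only: the function names, the index words, the document identifiers, and all numeric values are hypothetical, the factor loadings and factor scores are assumed to have been computed already by a factor analysis, ties are broken in favour of the lowest factor index, and "aggregating" is assumed to mean summing.

```python
def attribution_factor(values_by_factor):
    """Index of the factor with the largest value (loading or score)."""
    return max(range(len(values_by_factor)), key=values_by_factor.__getitem__)


def factor_evaluation(doc_factor, patent_score, factor, threshold):
    """Sum of the patent scores at or above the threshold for one factor."""
    return sum(
        patent_score[doc]
        for doc, assigned in doc_factor.items()
        if assigned == factor and patent_score[doc] >= threshold
    )


# Hypothetical factor loadings (per index word) and factor scores (per document).
loadings = {"laser": [0.82, 0.10], "resin": [0.05, 0.71]}
scores = {"doc1": [1.3, -0.2], "doc2": [-0.4, 0.9], "doc3": [0.8, 0.1]}

word_factor = {word: attribution_factor(v) for word, v in loadings.items()}
doc_factor = {doc: attribution_factor(v) for doc, v in scores.items()}

# Hypothetical patent scores and threshold.
patent_score = {"doc1": 62.0, "doc2": 48.0, "doc3": 55.0}
evaluation_f0 = factor_evaluation(doc_factor, patent_score, factor=0, threshold=50.0)
evaluation_f1 = factor_evaluation(doc_factor, patent_score, factor=1, threshold=50.0)
```

Here the only document attributed to factor 1 falls below the threshold, so that factor's evaluation value is 0, while factor 0 collects the two documents that clear it.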
PCT/JP2007/071282 2006-11-01 2007-11-01 Document group analysis device WO2008053949A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2008542170A JPWO2008053949A1 (en) 2006-11-01 2007-11-01 Document group analyzer

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2006298324 2006-11-01
JP2006-298324 2006-11-01

Publications (1)

Publication Number Publication Date
WO2008053949A1 true WO2008053949A1 (en) 2008-05-08

Family

ID=39344289

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2007/071282 WO2008053949A1 (en) 2006-11-01 2007-11-01 Document group analysis device

Country Status (2)

Country Link
JP (1) JPWO2008053949A1 (en)
WO (1) WO2008053949A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010020563A (en) * 2008-07-10 2010-01-28 Fujitsu Ltd Progress information output method and progress information output program
JP2012517046A (en) * 2009-02-02 2012-07-26 エルジー エレクトロニクス インコーポレイティド Literature analysis system
WO2014118861A1 (en) * 2013-01-31 2014-08-07 アスタミューゼ株式会社 Information presentation device and information presentation system
JP2014199661A (en) * 2013-03-15 2014-10-23 国立大学法人神戸大学 Patent ranking device and patent ranking method
JP2016018336A (en) * 2014-07-07 2016-02-01 株式会社パテント・リザルト Patent assessment device
CN113641825A (en) * 2021-10-15 2021-11-12 人民法院信息技术服务中心 Smart court system big data processing method and device based on objective information theory

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000250919A (en) * 1999-02-26 2000-09-14 Fujitsu Ltd Document processor and its program storage medium

Also Published As

Publication number Publication date
JPWO2008053949A1 (en) 2010-02-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07831016

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2008542170

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: LOSS OF RIGHTS COMMUNICATION (EPO F1205A OF 10.08.09)

122 Ep: pct application non-entry in european phase

Ref document number: 07831016

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)