WO2007069663A1 - Technical document attribute association analysis supporting apparatus - Google Patents

Technical document attribute association analysis supporting apparatus

Info

Publication number
WO2007069663A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
cluster
vectors
attribute
generated
Prior art date
Application number
PCT/JP2006/324876
Other languages
French (fr)
Japanese (ja)
Inventor
Hiroaki Masuyama
Makoto Asada
Kazumi Hasuko
Original Assignee
Intellectual Property Bank Corp.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/JP2006/321958 external-priority patent/WO2007069408A1/en
Application filed by Intellectual Property Bank Corp. filed Critical Intellectual Property Bank Corp.
Priority to US12/097,446 priority Critical patent/US20090138465A1/en
Priority to JP2007550208A priority patent/JPWO2007069663A1/en
Publication of WO2007069663A1 publication Critical patent/WO2007069663A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/11 Patent retrieval

Definitions

  • the present invention relates to an analysis support device, support method and support program for analyzing the relevance of document attributes in a technical document group.
  • Non-patent literature 1 Taichiro Ueda et al. "Practical workshop Excel thorough utilization Multivariate analysis
  • the object of the present invention is to clarify the interrelationships within the first vector group corresponding to the first attribute X of the technical documents and within the second vector group corresponding to the second attribute Y, to analyze the first attribute X and the second attribute Y in detail, and to make explicit the state of concentration or dispersion of the data distribution of the document attributes in the technical document group. A further object is to provide a device, method and program for supporting relevance analysis of technical document attributes that can offer criteria for judging the direction of a company's technological development.
  • the technical document attribute relevance analysis support device of the present invention is:
  • Data acquisition means for acquiring data of a technical document group including a plurality of technical documents each having at least two types of attributes;
  • Score calculation means for calculating a score according to data of a technical document belonging to each of a combination of the first attribute X and the second attribute Y among the at least two types of attributes; and the first attribute X
  • a vector is generated based on the scores belonging to each column in the matrix arrangement.
  • First vector relevancy calculating means for calculating interrelationships of vectors generated by the first vector group generating means
  • First vector arranging means for arranging, among the vectors generated by the first vector group generating means, vectors with higher relevancy closer to each other;
  • Second vector group generation means for generating a vector based on the scores belonging to each row in the matrix arrangement;
  • Second vector relevancy calculating means for calculating interrelationships of vectors generated by the second vector group generating means;
  • second vector arranging means for arranging, among the vectors generated by the second vector group generation means, vectors with higher relevancy closer to each other.
  • the correlation between the vectors corresponding to the first attribute X (each column of the scores arranged in a matrix) is calculated, and vectors having similar distributions of the second attribute Y are placed closer together.
  • the correlation between the vectors corresponding to the second attribute Y (each row of the scores arranged in a matrix) is calculated, and vectors having similar distributions of the first attribute X are placed closer together.
  • one of the first attribute X and the second attribute Y is a human attribute of each technical document, and the other is a technical attribute of each technical document.
  • Human attributes include, for example, applicants and inventors for patent documents, and authors and editors for technical papers and books.
  • Technical attributes include technical elements, keywords, etc. in addition to technical classifications such as IPC (International Patent Classification).
  • the score is calculated.
  • the score may be calculated by assigning a weight to each technical document and summing the weights.
  • the weighting reflects the importance or quality of a technical document, for example by giving a granted patent publication more weight than a publication of an unexamined application.
  • the first vector group generation means or the second vector group generation means generates a vector including, as a component, a logarithm of each of the scores belonging to each column or each row in the matrix arrangement.
  • the distribution of the vector components becomes close to a normal distribution, particularly when the scores are nonnegative and concentrated around 0, so that the reliability of the relevancy calculation result is improved.
  • First cluster generation means for selecting two vectors among the vector groups generated by the first vector group generation means according to a predetermined criterion, and causing the two vectors to be adjacent to generate a cluster;
  • the vector having the highest relevance to either of the end vectors located at both ends of the vectors forming the cluster generated by the first cluster generation means is selected as a joining vector from those vectors, among the vectors generated by the first vector group generation means, that are not in the cluster,
  • and the joining vector is joined to the cluster by placing it adjacent to the end vector to which it is most highly related;
  • first cluster expanding means for sequentially expanding the cluster by adding joining vectors to it in this way; and/or the second vector arranging means includes
  • a second cluster generation unit configured to select two vectors among the vector groups generated by the second vector group generation unit according to a predetermined criterion, and to make the two vectors adjacent to generate a cluster;
  • the vector having the highest relevance to either of the end vectors located at both ends of the vectors forming the cluster generated by the second cluster generation means is selected as a joining vector from those vectors, among the vectors generated by the second vector group generation means, that are not in the cluster,
  • and the joining vector is joined to the cluster by placing it adjacent to the end vector to which it is most highly related;
  • second cluster expanding means for sequentially expanding the cluster by adding joining vectors to it in this way.
  • vectors of high relevance are made adjacent one after another to expand the cluster, so that highly relevant vectors are reliably placed close to each other, and the state of concentration or dispersion of the document attribute data distribution can be made explicit.
  • the first cluster generation unit or the second cluster generation unit is configured to operate on the vector group generated by the first vector group generation means or the vector group generated by the second vector group generation means,
  • a first cluster expansion cancellation determination unit for stopping the selection of the joining vector by the first cluster expansion unit and the expansion of the cluster if every relevance is equal to or less than a predetermined threshold
  • a first cluster regenerating unit configured to select two vectors out of a group of vectors other than clusters generated by the first cluster generating unit according to a predetermined criterion and to make the two vectors adjacent to generate another cluster;
  • the vector having the highest relevance to either of the end vectors located at both ends of the vectors forming the other cluster generated by the first cluster regenerating means is selected as a joining vector from those vectors, among the vectors generated by the first vector group generating means, that belong neither to the cluster generated by the first cluster generating means nor to the other cluster,
  • and the joining vector is joined to the other cluster by placing it adjacent to the end vector to which it is most highly related.
  • the second vector arranging unit is
  • Second cluster expansion cancellation determination means for stopping the selection of the joining vector by the second cluster expanding means and the expansion of the cluster
  • Second cluster regenerating means for selecting two vectors out of a group of vectors other than clusters generated by the second cluster generating means according to a predetermined criterion and causing the two vectors to be adjacent to generate another cluster
  • the vector having the highest relevance to either of the end vectors located at both ends of the vectors forming the other cluster generated by the second cluster regenerating means is selected as a joining vector from those vectors, among the vectors generated by the second vector group generating means, that belong neither to the cluster generated by the second cluster generating means nor to the other cluster,
  • and the joining vector is joined to the other cluster by placing it adjacent to the end vector to which it is most highly related.
  • the distribution state of the score is not clear at first glance only by numerically indicating the distribution of the score, but the distribution state of the score can be displayed more easily by marking or coloring it.
  • the present invention also provides a method of supporting analysis of the relevance of document attributes that includes the same steps as the processing executed by each of the above devices, and a technical document attribute relevance analysis support program that causes a computer to execute the same processing as each of the above devices. This program may be recorded on a recording medium such as an FD, CD-ROM or DVD, or may be transmitted and received over a network.
  • FIG. 1 is a diagram showing a hardware configuration of a technical document attribute relevance analysis support apparatus according to a first embodiment of the present invention.
  • FIG. 2 is a flowchart showing an operation procedure of the processing device 1 in the relevance analysis support device of the first embodiment.
  • FIG. 3 is a view showing a display example by a display unit.
  • FIG. 4 is a view showing another display example by the display unit.
  • FIG. 5 The flowchart which shows the operation
  • FIG. 6 is an example of the document number matrix generated in the second embodiment.
  • 1: processing device, 2: input device, 3: recording device, 4: output device, 110: data acquisition unit, 120: score calculation unit, 130 and 140: first and second vector group generation units, 150 and 160: first and second vector relevance calculation units, 170 and 180: first and second vector arrangement units
  • X, Y: Attributes of individual technical documents. For example, applicant, technical field (keyword or IPC), etc.
  • X_j, Y_k: The value of the attribute. For example, it refers to the specific name of the applicant or technical field, and is not necessarily expressed numerically.
  • Score (subscript kj): Calculated for each combination of the value X_j of attribute X and the value Y_k of attribute Y. The value range of attribute X is X_1, X_2, ...
  • FIG. 1 is a diagram showing the hardware configuration of the technical document attribute relevance analysis supporting apparatus according to the first embodiment of the present invention.
  • the relevance analysis support device of this embodiment includes a processing device 1 comprising a CPU (central processing unit), a memory (recording device) and the like; an input device 2, which is input means such as a keyboard (manual input device); a recording device 3, which is recording means for storing the data of the technical document group, the conditions, and the processing results of the processing device 1; and an output device 4, which is output means for displaying or printing the scores and the like arranged in a matrix.
  • the processing device 1 includes a data acquisition unit 110, a score calculation unit 120, first and second vector group generation units 130 and 140, first and second vector relevance calculation units 150 and 160, and first and second vector arrangement units 170 and 180.
  • the recording device 3 includes a condition recording unit 31, a work result storage unit 32, a document storage unit 33, and the like.
  • the document storage unit 33 contains data of a technical document group acquired from an external database or an internal database.
  • the external database means, for example, a document database such as the IPDL patent electronic library provided by the Japan Patent Office, or a service operated by Patrices Corporation.
  • the internal database is assumed to include a database storing data from commercially sold sources such as patent JP-ROMs; devices that read documents from media such as FDs (flexible disks), CD-ROMs (compact disc ROMs), MOs (magneto-optical disks) and DVDs (digital video disks); devices such as OCR (optical information readers) that read documents printed or handwritten on paper; and devices that convert the read data into electronic data.
  • patent publications are mainly handled as technical documents, but not limited to this, general technical documents can be analyzed widely, such as utility model publications, technical articles, technical magazines, books, and the like.
  • the input device 2 receives inputs such as acquisition conditions of data of technical documents, calculation conditions of scores, generation conditions of vectors, calculation conditions of relevance, arrangement conditions of vectors, and the like. These input conditions are sent to the condition recording unit 31 of the recording device 3 and stored.
  • the data acquisition unit 110 acquires data of a technical document group to be analyzed from the document storage unit 33 of the recording device 3 in accordance with the acquisition conditions of data input by the input device 2. For example, based on bibliographic information of each technical document, at least two types of attributes of each technical document are acquired as data.
  • the acquired data of the technical document group is directly sent to the score calculation unit 120 to be used for processing there, or sent to the work result storage unit 32 of the recording device 3 to be stored.
  • the score calculation unit 120 calculates, for each combination of the first attribute X and the second attribute Y among the at least two types of attributes, a score according to the data of the technical documents belonging to that combination. The score is calculated for each combination of a value X_j of the first attribute X and a value Y_k of the second attribute Y. The calculated scores are sent directly to the first and second vector group generation units 130 and 140 to be used for processing there, or sent to the work result storage unit 32 of the recording device 3 to be stored.
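The score calculation described above can be sketched as follows. This is a minimal illustration, not the apparatus itself: it assumes the score is a plain document count, that each document is a dictionary whose hypothetical keys "X" and "Y" hold lists of attribute values, and that the applicant and IPC names are invented for the example.

```python
from collections import Counter

def score_matrix(documents, x_values, y_values):
    """Count documents for every (X value, Y value) combination.

    Rows follow y_values (vertical axis); columns follow x_values
    (horizontal axis).
    """
    counts = Counter()
    for doc in documents:
        # Assumption: a document may carry several values per attribute.
        for x in doc["X"]:
            for y in doc["Y"]:
                counts[(x, y)] += 1
    return [[counts[(x, y)] for x in x_values] for y in y_values]

# Hypothetical documents: X = applicant, Y = IPC class.
docs = [
    {"X": ["A Corp"], "Y": ["G06F"]},
    {"X": ["A Corp"], "Y": ["G06F", "H04L"]},
    {"X": ["B Corp"], "Y": ["H04L"]},
]
matrix = score_matrix(docs, ["A Corp", "B Corp"], ["G06F", "H04L"])
column_vectors = [list(col) for col in zip(*matrix)]  # one vector per X value
row_vectors = [list(row) for row in matrix]           # one vector per Y value
```

Each column of the resulting matrix is then a candidate vector for a value of attribute X, and each row a candidate vector for a value of attribute Y.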
  • the first vector group generation unit 130 generates a vector group X_j based on the scores calculated by the score calculation unit 120.
  • when the scores are arranged in a matrix with the first attribute X on the horizontal axis and the second attribute Y on the vertical axis, each vector of the vector group X_j is formed from the scores belonging to one column of the matrix arrangement. The second vector group generation unit 140 generates a vector group Y_k based on the scores calculated by the score calculation unit 120.
  • with the first attribute X on the horizontal axis and the second attribute Y on the vertical axis, each vector of the vector group Y_k is formed from the scores belonging to one row of the matrix arrangement.
  • the vector groups X and Y generated by the first and second vector group generation units 130 and 140 are identical to each other.
  • the vector groups X and Y generated by the first and second vector group generation units 130 and 140 are
  • the first vector relevancy calculation unit 150 calculates the relevancy of the vector group X generated by the first vector group generation unit 130.
  • the second vector relevancy calculation unit 160 calculates the relevancy of the vector group Y generated by the second vector group generation unit 140.
  • the data of the relevancy calculated by the first and second vector relevancy calculation units 150 and 160 are sent directly to the first and second vector arrangement units 170 and 180, respectively, to be used for processing there, or are sent to and stored in the work result storage unit 32 of the recording device 3.
  • the first vector arrangement unit 170 arranges vectors having high relevance closer to each other based on the interrelationships of the vectors X_j calculated by the first vector relevancy calculation unit 150.
  • the second vector arrangement unit 180 arranges vectors having high relevance closer to each other based on the interrelationships of the vectors Y_k calculated by the second vector relevancy calculation unit 160.
  • the arrangement of the vectors determined by the first and second vector arrangement units 170 and 180 is sent to and stored in the work result storage unit 32 of the recording device 3 and output from the output device 4 as necessary.
  • the first cluster generation unit 171 selects two vectors among the vectors generated by the first vector group generation unit 130 based on a predetermined criterion, and generates a cluster by causing these two vectors to be adjacent to each other.
  • the second cluster generation unit 181 selects two vectors among the vectors generated by the second vector group generation unit 140 based on a predetermined reference, and generates a cluster by causing these two vectors to be adjacent to each other.
  • the predetermined criterion for selecting the two vectors is, for example, the level of relevancy: the two vectors with the highest relevancy can be selected.
  • the clusters generated by the first and second cluster generation units 171 and 181 are directly sent to the first and second cluster expansion units 172 and 182, respectively, to be used for processing there, or the work result of the recording apparatus 3 It is sent to and stored in the storage unit 32.
  • the first cluster expanding unit 172 sequentially expands the clusters generated by the first cluster generating unit 171 by adding joining vectors to the clusters generated by the first cluster generating unit 171.
  • the joining vector is the vector having the highest relevance to either of the end vectors located at both ends of the vectors forming the cluster generated by the first cluster generation unit 171. It is determined by selecting from those vectors, among the vector group X generated by the first vector group generation unit 130, that are not in the cluster.
  • the joining vector is joined to the cluster by placing it adjacent to the end vector determined to be most relevant to it, but this is not limiting; the joining vector may be joined at another position in the cluster.
  • the second cluster expanding unit 182 sequentially expands the clusters generated by the second cluster generating unit 181 by adding joining vectors to the clusters generated by the second cluster generating unit 181.
  • the addition vector is a vector having the highest relevance to any of the end vectors located at both ends of the vectors forming the cluster generated by the second cluster generation unit 181.
  • the joining vector is joined to the cluster by placing it adjacent to the end vector that has the highest relevance to it.
  • the present invention is not limited to this; the joining vector may be joined at another position in the cluster.
  • the clusters are expanded by the first and second cluster expansion units 172 and 182, and when no vector remains unjoined to the cluster, the processing of the first and second vector arrangement units 170 and 180 is finished.
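The cluster generation and expansion just described can be sketched as a greedy ordering. The sketch assumes the relevance values are already available as a symmetric matrix, that the "predetermined criterion" is the highest relevance, and that a joining vector is always attached at the more relevant end; the function name `arrange` is hypothetical.

```python
def arrange(relevance):
    """Order vector indices so that highly related vectors are adjacent.

    relevance is a symmetric matrix (list of lists); relevance[a][b] is
    the relevance between vectors a and b.
    """
    n = len(relevance)
    if n < 2:
        return list(range(n))
    # Cluster generation: start from the most related pair.
    a, b = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda p: relevance[p[0]][p[1]])
    cluster = [a, b]
    remaining = set(range(n)) - {a, b}
    # Cluster expansion: join the outside vector most related to either end.
    while remaining:
        end, cand = max(((e, c) for e in (cluster[0], cluster[-1])
                         for c in remaining),
                        key=lambda p: relevance[p[0]][p[1]])
        if end == cluster[0]:
            cluster.insert(0, cand)   # attach next to the left end
        else:
            cluster.append(cand)      # attach next to the right end
        remaining.remove(cand)
    return cluster

# Hypothetical relevance values for four vectors.
rel = [
    [1.0, 0.9, 0.1, 0.6],
    [0.9, 1.0, 0.5, 0.2],
    [0.1, 0.5, 1.0, 0.4],
    [0.6, 0.2, 0.4, 1.0],
]
order = arrange(rel)
```

The returned list gives the left-to-right (or top-to-bottom) order in which the columns or rows of the score matrix would be displayed.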
  • the first cluster expansion stop determination unit 174 determines whether every relevance between the end vectors located at both ends of the vectors forming the cluster generated by the first cluster generation unit 171 and those vectors, among the vector group X generated by the first vector group generation unit 130, that are not in the cluster is equal to or less than a predetermined threshold.
  • if so, the first cluster expanding unit 172 stops selecting joining vectors and expanding the cluster.
  • the second cluster expansion stop determination unit 184 determines whether every relevance between the end vectors located at both ends of the vectors forming the cluster generated by the second cluster generation unit 181 and those vectors, among the vector group Y generated by the second vector group generation unit 140, that are not in the cluster is equal to or less than a predetermined threshold.
  • if so, the second cluster expanding unit 182 stops selecting joining vectors and expanding the cluster.
  • as the predetermined threshold, it is desirable to set, for example, 0 (no correlation) when the correlation coefficient is used.
  • the first cluster regeneration unit 175 selects two vectors, according to a predetermined criterion, from the vectors other than the cluster generated by the first cluster generation unit 171 (the expanded cluster, if it has been expanded by the first cluster expansion unit 172), and makes the two vectors adjacent to generate another cluster.
  • the second cluster regeneration unit 185 selects two vectors, according to a predetermined criterion, from the vectors other than the cluster generated by the second cluster generation unit 181 (the expanded cluster, if it has been expanded by the second cluster expansion unit 182), and makes the two vectors adjacent to generate another cluster.
  • the other clusters generated by the first and second cluster regenerating units 175 and 185 are sent directly to the first and second cluster re-expanding units 176 and 186, respectively, to be used for processing there, or are sent to and stored in the work result storage unit 32 of the recording device 3.
  • the first cluster re-expanding unit 176 sequentially expands the other cluster generated by the first cluster regenerating unit 175 by adding joining vectors to it.
  • the joining vector is the vector that has the highest relevance to either of the end vectors located at both ends of the vectors forming the other cluster generated by the first cluster regeneration unit 175. It is determined by selecting from those vectors, among the vector group X generated by the first vector group generation unit 130, that belong neither to the cluster generated by the first cluster generation unit 171 nor to the other cluster.
  • the addition vector is added to the other cluster by bringing the end vector most associated with the addition vector and the addition vector adjacent to each other.
  • the second cluster re-expanding unit 186 sequentially expands the other cluster generated by the second cluster regenerating unit 185 by adding joining vectors to it.
  • the addition vector is a vector that has the highest relevance to any of the end vectors located at both ends of the vectors forming the other cluster generated by the second cluster regenerating unit 185.
  • Joining of the join vector to the other cluster is performed by bringing the end vector that is considered to be most relevant to the join vector to be adjacent to the join vector.
  • the clusters are expanded by the first and second cluster re-enlargement units 176 and 186, and when there are no vectors other than clusters, the processing of the first and second vector arrangement units 170 and 180 ends.
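The combination of expansion, stop determination and cluster regeneration can be sketched as one loop. This is an illustrative reading under stated assumptions: the relevance matrix is symmetric, the selection criterion is the highest relevance, the default threshold of 0 follows the correlation-coefficient case, and a single leftover vector is treated as its own cluster (a simplification not stated in the text). The name `arrange_with_threshold` is hypothetical.

```python
def arrange_with_threshold(relevance, threshold=0.0):
    """Return a list of clusters, each an ordered list of vector indices."""
    remaining = set(range(len(relevance)))
    clusters = []
    while len(remaining) >= 2:
        # (Re)generation: most related pair among vectors not yet clustered.
        a, b = max(((i, j) for i in remaining for j in remaining if i < j),
                   key=lambda p: relevance[p[0]][p[1]])
        cluster = [a, b]
        remaining -= {a, b}
        # Expansion until no outside vector exceeds the threshold.
        while remaining:
            end, cand = max(((e, c) for e in (cluster[0], cluster[-1])
                             for c in remaining),
                            key=lambda p: relevance[p[0]][p[1]])
            if relevance[end][cand] <= threshold:
                break  # stop determination: begin another cluster
            if end == cluster[0]:
                cluster.insert(0, cand)
            else:
                cluster.append(cand)
            remaining.remove(cand)
        clusters.append(cluster)
    # Assumption: a single leftover vector forms its own cluster.
    clusters.extend([v] for v in remaining)
    return clusters

# Hypothetical correlation coefficients: two unrelated pairs of vectors.
rel = [
    [1.0, 0.8, -0.2, -0.5],
    [0.8, 1.0, -0.1, -0.3],
    [-0.2, -0.1, 1.0, 0.7],
    [-0.5, -0.3, 0.7, 1.0],
]
clusters = arrange_with_threshold(rel)
```

Concatenating the clusters in the order returned yields one possible final arrangement of the vectors.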
  • the condition recording unit 31 records information such as conditions obtained from the input device 2, and sends necessary data based on the request of the processing device 1.
  • the work result storage unit 32 stores the work result of each component in the processing device 1 and sends necessary data based on the request of the processing device 1.
  • the document storage unit 33 stores and provides the data of the necessary technical document group, obtained from an external database or an internal database, based on the request of the input device 2 or the processing device 1.
  • the output device 4 outputs a score or the like arranged in a matrix based on the arrangement of vectors determined by the first and second vector arrangement units 170 and 180 of the processing device 1.
  • the output device 4 includes, for example, a display unit 41 such as a display device, and displays the distribution state of the scores arranged in a matrix with a pattern or a color according to the score.
  • the form of the output is not limited to the display on the display unit 41, but may be printing on a print medium such as paper, or transmission to a computer apparatus on a network via communication means.
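As an illustration of displaying the score distribution with a pattern according to the score, the following sketch renders a reordered score matrix as text. It assumes nonnegative scores, a linear mapping from score to shade, and a hypothetical four-level character palette; none of these details come from the source.

```python
def render(matrix, col_order, row_order, shades=" .:#"):
    """Render a score matrix as text, one shade character per cell."""
    top = max(max(row) for row in matrix) or 1  # avoid division by zero
    lines = []
    for r in row_order:
        cells = []
        for c in col_order:
            # Map the score linearly onto the available shade levels.
            level = int(matrix[r][c] / top * (len(shades) - 1) + 0.5)
            cells.append(shades[level])
        lines.append("".join(cells))
    return "\n".join(lines)

# Hypothetical 2x2 score matrix shown in its original order.
picture = render([[0, 1], [2, 3]], col_order=[0, 1], row_order=[0, 1])
```

Passing the column and row orders produced by the vector arrangement step would show related columns and rows side by side.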
  • FIG. 2 is a flow chart showing the operation procedure of the processing device 1 in the relevance analysis support device of the first embodiment.
  • the data acquisition unit 110 acquires data of a technical document group to be analyzed (step S110).
  • Each document of this technical document group needs to have at least two kinds of attributes X and Y respectively.
  • the number of documents in this technical document group is N.
  • the data shown in [Table 1] below is obtained.
  • the number of attribute values may be one for each technical document, and the attribute of each technical document may be one like attribute Z of technical document numbers 2, 3 and 4 in the following [Table 1].
  • the score calculation unit 120 calculates a score corresponding to the data of the technical document belonging to each of the combinations of the first attribute X and the second attribute Y among the at least two types of attributes (step S120).
  • two types of the above-mentioned attributes are selected. This selection is made based on the user's instruction input from the input device 2.
  • One of the two types of attributes is a human attribute such as the applicant or the inventor, and the other is a technical field attribute such as a keyword or IPC.
  • both of the two types of attributes may be technical field attributes, for example, one may be a technical classification and the other may be a technical element.
  • an attribute that is neither a human attribute nor a technical field attribute may be selected as one or both of the two types of attributes, such as the filing date.
  • for each of the attributes X and Y, the range of attribute values X_j and Y_k to be analyzed is specified (a value is, for example, the specific name of an applicant or a keyword, and is not limited to a numerical value).
  • for example, a ranking in descending order of the number of relevant technical documents is created, as shown in [Table 2] below, and the values that fall within the top p for attribute X and the top q for attribute Y are specified as the range.
  • the number p of values X_j in the range of attribute X and the number q of values Y_k in the range of attribute Y may be the same or different.
  • the value range may be selected according to the purpose of analysis, such as how many of the top-ranked companies to analyze or which technical fields to analyze.
  • in the following description, the values X_1, X_2, ..., X_p for attribute X and the values Y_1, Y_2, ..., Y_q for attribute Y are assumed to have been determined as the range.
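The selection of the top p or top q values by document count can be sketched as follows. The sketch assumes each document is a dictionary mapping a hypothetical attribute key to a list of values; `top_values` is an illustrative name.

```python
from collections import Counter

def top_values(documents, attribute, limit):
    """Return the `limit` most frequent values of an attribute,
    most frequent first."""
    counts = Counter(value for doc in documents for value in doc[attribute])
    return [value for value, _ in counts.most_common(limit)]

# Hypothetical documents with an applicant attribute "X".
docs = [
    {"X": ["A"]},
    {"X": ["A", "B"]},
    {"X": ["C"]},
    {"X": ["B"]},
    {"X": ["A"]},
]
x_range = top_values(docs, "X", 2)
```

The same function applied to attribute Y with a different limit gives the range of q values, since p and q need not be equal.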
  • the number of technical documents in the technical document group that have the attribute pair (X_j, Y_k) is counted and used as the score.
  • the scores are, for example, as shown in [Table 3] below.
  • the virtual cases shown in [Table 3] will be referred to as appropriate.
  • the score may be determined after grouping the attribute values into intervals of a certain width. For example, when the filing date is selected as attribute X, the number of distinct values reaches 1000 or more within a few years if the dates are used as they are, so the filing year (or a similar period) may be set as the attribute value instead. This makes the range of the attribute easier to analyze.
  • the sum of the weights of the technical documents may be used as the score.
  • the weight may be given based on application progress information, for example a large value if a patent document has been granted and a small value if it has not been registered, or it may be given based on the number of citations and the like.
  • analysis can be appropriately performed with a score that takes into account the importance or qualitative factors of technical documents.
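A weighted score of this kind can be sketched as below. The legal-status labels and weight values are purely hypothetical and chosen only to show the summation; the source does not specify any particular weights.

```python
from collections import defaultdict

# Hypothetical weights by legal status; the values are illustrative only.
STATUS_WEIGHT = {"granted": 2.0, "pending": 1.0, "rejected": 0.5}

def weighted_scores(documents):
    """Sum document weights for each (X value, Y value) combination."""
    scores = defaultdict(float)
    for doc in documents:
        # Unknown statuses fall back to a neutral weight of 1.0 (assumption).
        weight = STATUS_WEIGHT.get(doc["status"], 1.0)
        scores[(doc["X"], doc["Y"])] += weight
    return dict(scores)

docs = [
    {"X": "A Corp", "Y": "G06F", "status": "granted"},
    {"X": "A Corp", "Y": "G06F", "status": "pending"},
    {"X": "B Corp", "Y": "H04L", "status": "rejected"},
]
scores = weighted_scores(docs)
```

A citation-based weight could be substituted by replacing the status lookup with a function of each document's citation count.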
  • vectors are generated in the first and second vector group generation units 130 and 140 (steps S130 and S140).
  • This vector X_j is a vector representing the distribution of the attribute Y for the value X_j of the attribute X.
  • when attribute X is the applicant and attribute Y is the technical field, for example, this vector X_j becomes a vector indicating the distribution of technical fields.
  • applicant X has many features in the technical fields ⁇ and ⁇ .
  • the vector Y_k is a vector indicating the distribution of the attribute X with respect to the value Y_k of the attribute Y.
  • for example, the vector Y_k represents the distribution of applicants.
  • the vectors X_j and Y_k may have the scores themselves as components as described above, but the logarithms of the scores may be used as components instead.
  • when a score is 0, its logarithm cannot be defined; in that case, for example, the logarithm of 0 may be taken for convenience as -1 or another negative number, or 1 or another positive number may be added to every score before the logarithm is taken.
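The two ways of handling zero scores mentioned above can be sketched as follows. The function names and default values (offset 1, substitute value -1) are assumptions chosen to mirror the examples in the text.

```python
import math

def log_components(scores, offset=1.0):
    """Return log(score + offset) for each score; the offset avoids log(0)."""
    return [math.log(s + offset) for s in scores]

def log_components_with_floor(scores, zero_value=-1.0):
    """Return log(score), substituting zero_value where the score is 0."""
    return [math.log(s) if s > 0 else zero_value for s in scores]

shifted = log_components([0, 1, 9])
floored = log_components_with_floor([0, 1, 10])
```

With the offset variant a zero score maps to 0, while the substitution variant keeps the logarithms of nonzero scores unchanged and marks zeros with a fixed negative value.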
  • a method of generating a vector a method using the score itself as a component as described above, a score ⁇
  • the product of the score multiplied by the reciprocal of the frequency of occurrence is the component
  • the score ⁇ is attribute property ⁇
  • the score is multiplied by 1/4, which is the reciprocal of this frequency of occurrence. Then, for example, the score with k = 1 and j = 2, which is
  • the first component of vector X_2, or the second component of vector Y_1, is multiplied by 1/4.
  • vectors composed of the components of each column corresponding to 16 be vectors X to X
  • vectors composed of the components of each row corresponding to the range Y to Y be each
  • the correlation between p vectors X can be obtained as data shown in [Table 5] below using, for example, a correlation coefficient.
  • the calculation result of the relevance is shown here for the vectors X_j corresponding to the attribute X, but the same can be applied to the attribute Y.
  • as methods of evaluating relevance other than the correlation coefficient, a method using an inner product, a method of calculating Spearman's rank correlation coefficient, and the like can be considered.
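The three relevance measures named above can be sketched in plain Python. The handling of constant vectors (returning 0) and of tied ranks (average ranks) are assumptions made for this illustration; the source does not specify them.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / (su * sv)

def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(u, v):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(u), ranks(v))
```

Applying one of these functions to every pair of column vectors (or row vectors) yields the symmetric relevance table used by the arrangement step.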
  • in the first and second vector arrangement units 170 and 180, processing is performed to arrange vectors having high relevance closer to each other than vectors having low relevance.
  • the following explains one of the methods. Although the following description will be mainly made by showing an example of the attribute X, the same can be applied to the attribute Y.
  • the two vectors having the highest correlation among the p vectors X are selected, and these vectors are placed adjacent to each other to generate a cluster.
  • the vectors X and X, whose correlation coefficient is 0.84, are the most relevant pair of vectors.
  • vectors may be made adjacent by other methods. For example, when it is desired to contrast a specific applicant (such as one's own company) with the remaining applicants, the vector of the specific applicant may be placed adjacent to the vector most relevant to it. Also, for example, when it is desired to compare two specific applicants (such as one's own company and a competitor) with each other and with the remaining applicants, the vectors of the two specific applicants may be placed adjacent to each other.
  • a group of a plurality of adjacent vectors is referred to as a cluster.
  • the joining vector is added to the cluster to expand the cluster (steps S172 and S182).
  • the most relevant set of vectors is determined between the vectors located at both ends of the cluster and the remaining vectors not in the cluster.
  • the most relevant vector to the vector X or X located at either end of the cluster is the vector
  • the vectors are made adjacent to form larger clusters.
  • join vector X is adjacent.
  • the class is not limited to this
  • the join vector may be joined to another point in the data.
  • the vectors with high relevance are surely arranged close to each other, and the state of concentration or dispersion of data distribution of the document attribute is obtained.
  • the distribution can be formed to be explicit.
  • when there are no vectors not yet joined to the cluster (steps S173 and S183: NO), the arrangement of vectors ends. If there is a vector not yet joined to the cluster (steps S173 and S183: YES), the process proceeds to steps S174 and S184, respectively.
  • in steps S174 and S184, the first and second cluster expansion stop determination units 174 and 184 determine whether all the relevance values between the cluster and the vectors outside the cluster are less than or equal to a predetermined threshold. If at least one relevance value exceeds the predetermined threshold (steps S174 and S184: NO), the process returns to steps S172 and S182 to expand the cluster sequentially. For example, between the two ends X or X of adjacent clusters in the order of vectors X, X, X
  • the most relevant vector is the vector X, whose correlation coefficient with the vector X is 0.49
  • join vector X is adjacent to vector X.
  • the highly relevant vector may be made adjacent in other ways. For example, if the highly relevant vector is always determined for, and made adjacent to, only one of the two ends of the cluster, the vectors that first made up the cluster can end up at the end of the matrix. Alternatively, if the highly relevant vector to be made adjacent is determined alternately at one end and at the other end of the cluster, the vectors that first made up the cluster can end up at the center of the matrix.
  • in steps S174 and S184, when all the relevance values are less than or equal to the predetermined threshold (steps S174 and S184: YES), the process proceeds to steps S175 and S185, respectively.
  • in steps S175 and S185, the first and second cluster regenerating units 175 and 185 make two vectors in the vector group outside the above cluster adjacent to each other to generate another cluster.
  • the join vector is added to the other cluster to expand the other cluster (steps S176 and S186). That is, when there is no vector having relevance higher than the threshold value, a cluster is generated again from the remaining vectors only, and the same cluster expansion procedure as described above is repeated.
  • the threshold of relevance may be, for example, 0 (no correlation) when the correlation coefficient is used. Using the correlation coefficient as the method of evaluating relevance is also advantageous in that the threshold is easy to set in this manner.
  • when no vectors remain outside the clusters (steps S177 and S187: NO), the arrangement of the vectors ends. If unclustered vectors remain (steps S177 and S187: YES), the process proceeds to steps S178 and S188, respectively.
  • in steps S178 and S188, it is determined whether all the relevance values between the other cluster and the vectors outside the clusters are less than or equal to a predetermined threshold. If at least one relevance value exceeds the predetermined threshold (steps S178 and S188: NO), the process returns to steps S176 and S186, respectively, to expand the other cluster sequentially. If all the relevance values are less than or equal to the predetermined threshold (steps S178 and S188: YES), the process returns to steps S175 and S185, respectively, to generate a further cluster.
  • a plurality of clusters can be formed, and finally, these clusters are adjacent to each other.
  • as methods of making clusters adjacent to each other, a method of arranging them in one direction from one end to the other in descending or ascending order of cluster size (the number of vectors included in the cluster), a method of arranging them alternately, and the like can be considered.
  • step 2 steps S140, S160, and S181 to S188
  • steps S140, S160, and S181 to S188 may be executed either before or after the other set of steps, or both may be executed at the same time. Alternatively, only one of them may be executed.
  • one of the attributes X is a human attribute such as an applicant
  • the other attribute Y is a technology classification according to a coding system such as IPC
  • the output by the output device 4 may be in the form shown in [Table 6] above, and, to make it easier to view, the distribution of the scores may be displayed with a pattern or color according to the score. For example, it is preferable to give dark or warm colors to areas where high scores are distributed, and light or cold colors to areas where low scores are distributed. The distribution may not be apparent at first glance when the scores are shown only numerically, but adding a pattern or a color makes the distribution of the scores easy to see.
  • FIG. 3 is a view showing one display example by the display unit.
  • the area with dense distribution is hatched with a high linear density
  • the area with coarse distribution is hatched with a low density.
  • the density of the distribution of scores becomes clear, and the distribution of scores can be displayed in a more easily viewed manner.
  • FIG. 4 is a view showing another display example by the display unit.
  • the value of each attribute is shown specifically when "applicant" is selected as the first attribute X and "technical field" is selected as the second attribute Y.
  • areas with dense distribution are hatched with a grid of high line density, and areas with coarse distribution are hatched with a grid of low line density, so the sparse or dense state of the score distribution is clear. In other words, if you select a specific “applicant” and look at where the distribution is dense, you can read the main technical fields being developed by that applicant, and if you select a specific “technical field” and look at where the distribution is dense, you can read the major applicants developing in that technical field.
  • FIG. 4 illustrates an example in which one of the two types of attributes is a human attribute and the other is a technical field attribute
  • the present invention is not limited to this, and both of the two types of attributes are technical field attributes, example
  • one may be a technology classification and the other may be a technology element.
  • one may be an IPC main classification (section, class), and the other may be an IPC subclass (group, subgroup) or the like.
  • a company can grasp for itself the technological development results produced by its own research and development organization and the current state of its technology asset portfolio. It becomes possible to hold objective guidelines, which can contribute to the company's technology development investment decisions.
  • the current state of the development system of a specific company can be analyzed more precisely from multiple angles.
  • the hardware configuration of the technical document attribute relevance analysis support apparatus according to the second embodiment is the same as the hardware configuration (FIG. 1) in the first embodiment, and thus the description thereof will be omitted.
  • FIG. 5 is a flowchart showing the operation procedure of the processing device 1 in the relevance analysis support device of the second embodiment.
  • the second embodiment has its main features in the portion corresponding to the processing up to the generation of the first and second vector groups in the first embodiment. That is, in the second embodiment, the task words and the solution words included in the documents are used as the attributes X and Y of the technical documents, and the rate of change in the number of documents having the same combination of task word and solution word is used as the score serving as a vector component.
  • the process of arranging the generated vector group is substantially the same as that of the first embodiment. The operation procedure of this second embodiment will be described in detail below.
  • the data acquisition unit 110 acquires the technical document group to be analyzed (step S210).
  • the technical documents to be acquired may be of any type, such as patent documents and technical papers, but patent documents in particular can be said to be preferable because they are written in a format from which the task words and solution words described below can be extracted by computer processing.
  • the acquisition condition for the analysis target document group may be specified by, for example, an IPC code, or a predetermined number of documents having the highest similarity to a specific technical document may be acquired.
  • the data acquisition unit 110 extracts candidates for the “task word” and the “solved word” from each document of the acquired analysis target document group (step S211). For example, if there is an item “task” or “solution” in the summary part or other part of each document, the word of that part is extracted. Also, for example, when each document includes a description such as “The subject of the present invention is' ⁇ > Extract
  • the data acquisition unit 110 selects each of the “task word” and the “solved word” used for the analysis from among the extracted “task word” and “solved word” candidates (step S 212).
  • the selection method for example, the document frequency in the analysis target document group for each “problem word” and “solved word” candidate (DF: the number of hit documents when searched by each index word in the analysis target document group)
  • the upper predetermined number for example, 100 words each
  • the data acquisition unit 110 performs factor analysis using the selected “task word” to calculate the factor loading amount of each task word (step S 213). Specifically, it is performed as follows.
  • let Z be an I-by-G matrix with z_ig as its matrix elements.
  • TFIDF is a value determined, for a given index term, by multiplying the term frequency (TF: the number of occurrences of the task word in a document) by the reciprocal of the document frequency (DF: the number of documents in a predetermined document group in which the task word appears) or by the reciprocal of the logarithm of the document frequency (IDF: inverse document frequency). A high TFIDF value is calculated for a task word that is used many times in the document for which the document vector is calculated and that is used infrequently in the predetermined document group.
  • the number of factors is H
  • the factor loading of each task word g on each factor h is a_gh.
  • let A be the factor loading matrix having the factor loadings a_gh as matrix elements, and let F be the factor score matrix having the factor scores f_ih as matrix elements.
  • the factor loading amount is determined in the following equation.
  • as methods of rotating the factor axes, orthogonal rotations such as Varimax, Quartimax, Equamax, Parsimax, Orthomax, and orthogonal Procrustes, and oblique rotations such as Promax, Oblimin, Harris-Kaiser, and oblique Procrustes can be mentioned.
  • the data acquisition unit 110 also performs factor analysis on the “solved word” to calculate the factor loading amount of each resolved word (step S 214).
  • the calculation method of the factor load amount is the same as that described for the “task word”.
  • the data acquisition unit 110 selects a predetermined number of factors (hereinafter referred to as “task factors” and “solution factors”) from those obtained as a result of the factor analysis of the task words and the solution words, respectively (steps S215 and S216).
  • a predetermined number of factors with the highest eigenvalues are selected.
  • the number of factors to be selected is arbitrary.
  • p task factors and q resolution factors are selected.
  • “task factor” and “solution factor” are selected as the two types of attributes X and Y, and the p task factors and the q solution factors with the highest eigenvalues are selected as the value ranges of the attributes.
  • the data acquisition unit 110 determines an attribution factor of each task word and each solution word (steps S217 and S218).
  • the factor loading amount a for a certain factor h is maximum
  • an assignment factor of the subject word (or solution term) g is set as the factor h.
  • one task word (or solution word) can belong to only one factor.
  • however, the number of factors to which one task word (or solution word) belongs is not limited to one.
  • a lower limit may be set for the factor loading, and when the maximum factor loading of a certain task word (or solution word) g is less than the lower limit, the task word (or solution word) g may be treated as belonging to no factor.
  • the score calculation unit 120 counts the number of relevant technical documents for each combination of each task word and each solution word whose attribution factor has been determined (step S220). For example, an AND search is performed to search for documents that contain both one task word and one solution word whose attribution factor has been determined in the document or its summary part, and the number of hit documents is the number of relevant technical documents.
  • the score calculation unit 120 sums up the number of documents for each combination of each task factor and each solution factor (step S221). For example, the number of relevant technical documents is totaled for all combinations of one of the task words belonging to a certain task factor and one of the solution words belonging to a certain resolution factor. For example, the task word belonging to a certain task factor is 3 of Xg, Xg, Xg
  • each document returns the factor score for each factor h of each document i calculated by the above factor analysis based on f.
  • This document number matrix shows how many technical documents exist for each combination of problem factor and solution factor, and what kinds of problems and solutions are focused in a certain technical field
  • FIG. 6 shows an example of the document number matrix generated in the second embodiment.
  • This document number matrix was obtained by extracting a predetermined number of patent documents with the highest similarity to a certain patent document relating to “semiconductor devices and their manufacturing methods” and performing factor analysis on each of the task words and the solution words by the method described above.
  • the meaning of the factor interpreted by the analyst based on the task word group and the solution word group included in each task factor and each solution factor is described in the margin of this matrix.
  • the task factor represents a disadvantage that may occur in any application, and the resolution factor is a technology that can eliminate it.
  • The application can be inferred from the task factor, and the technology from the solution factor.
  • Each element (the number of documents) of the document number matrix of p rows and q columns may be set as a score ⁇, the first and second vector groups generated as in the first embodiment, and the vectors arranged on that basis to analyze the state of concentration and dispersion of the task factors and the solution factors; in the second embodiment, however, vector groups are further generated as follows.
  • the score calculation unit 120 classifies each element of the document number matrix of p rows and q columns by predetermined period (step S222). For example, in the case of patent documents, classification by year of application or by spans of multiple years can be considered. Preferably, the documents are classified into two periods, before and after a predetermined time.
  • the score calculation unit 120 calculates the increase/decrease rate of the number of technical documents, based on the classification by predetermined period, for each element of the document number matrix of p rows and q columns. If the classification is into two periods, the increase/decrease rate is calculated for each element of the document number matrix, so one change rate matrix of p rows and q columns is generated. If the classification is into T periods (T ≥ 3), (T - 1) change rate matrices of p rows and q columns may be generated, one for each pair of adjacent periods, or a single matrix of average rates of change may be generated.
  • the change rate matrix generated in this way makes it possible to detect changes in the trends of tasks and solutions. For example, one can focus on a specific solution factor (one column of the matrix) to find changes in the applications of the technology, or focus on a specific task factor (one row of the matrix) to find changes in the means of solving the task.
  • the subsequent processing is the same as in the first embodiment, and the first and second vector group generation units 130 and 140 set each element (increase / decrease ratio) of this increase / decrease ratio matrix of p rows and q columns as a score ⁇ .
  • a second vector group is generated (steps S230 and S240).
  • the first and second vector relation calculation units 150 and 160 respectively calculate the relevance between the vectors (steps S250 and S260), and the first and second vector arrangement units 170 and 180 respectively arrange the vectors (steps S271 to S278 and S281 to S288).
  • the q-dimensional vectors relating to the p task factors are referred to as “task factor gazette-count increase/decrease rate vectors”.
  • the p-dimensional vectors relating to the q solution factors are referred to as “solution factor gazette-count increase/decrease rate vectors”.
  • the first and second clusters are respectively referred to as “problem factor cluster” and “solving factor cluster”.
  • each element of the matrix is the change rate of the number of documents etc., it becomes possible to grasp the temporal transition of the problem factor (use) and the solution factor (technology) in detail.
  • the task factors (applications) and the solution factors (technologies) that show remarkable changes can be grasped quickly from the matrix.
  • the attributes arranged on the axes of the matrix have been described for the case where one is a human attribute and the other is a technical attribute, with the applicant given as an example of the human attribute. However, this is only an example. Other human information, such as the inventors, may be used as the human attribute. In this case too, the same functions and effects as in the first embodiment can be obtained.
  • a matrix may be generated for only one technical document group to be analyzed, or each element of a certain matrix may be classified, for example, for each predetermined period, and divided into the matrix for each predetermined period.
  • a plurality of matrixes may be generated.
  • the matrix is classified by predetermined period (S222), the increase/decrease rate of the number of corresponding gazettes is calculated for each combination of task factor and solution factor (S223), and the processing of S230 to S277 (or S240 to S287) is then performed; however, the order is not particularly limited to this.
  • the processing of S222 and S223 may be performed not after S221 but after the processing of S277 (or S287).
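
The cluster-based arrangement described in the list above (seed a cluster with the most relevant pair, expand it at either end, stop at a threshold, then start another cluster from the remaining vectors) can be sketched roughly as follows. This is only an illustrative sketch: the function name `arrange_vectors`, the use of NumPy, the treatment of a single leftover vector as its own group, and the choice to join finished clusters in descending order of size are assumptions made for the example, not details taken from the specification.

```python
import numpy as np


def arrange_vectors(corr, threshold=0.0):
    """Order vector indices so that highly correlated vectors sit side by side.

    corr is a symmetric matrix of pairwise relevance (e.g. correlation
    coefficients); threshold=0.0 follows the "no correlation" example.
    """
    corr = np.asarray(corr, dtype=float)
    remaining = set(range(len(corr)))
    clusters = []
    while remaining:
        if len(remaining) == 1:
            # Assumption: a single leftover vector forms its own group.
            clusters.append([remaining.pop()])
            break
        # Seed a cluster with the most relevant pair among the remaining vectors.
        _, a, b = max((corr[a, b], a, b)
                      for a in remaining for b in remaining if a < b)
        cluster = [a, b]
        remaining -= {a, b}
        # Expand at whichever end has the most relevant outside vector.
        while remaining:
            best, side, v = max((corr[end, r], side, r)
                                for side, end in ((0, cluster[0]), (1, cluster[-1]))
                                for r in remaining)
            if best <= threshold:
                break  # every remaining relevance is at or below the threshold
            if side == 0:
                cluster.insert(0, v)
            else:
                cluster.append(v)
            remaining.remove(v)
        clusters.append(cluster)
    # Assumption: join the clusters in descending order of size.
    clusters.sort(key=len, reverse=True)
    return [i for c in clusters for i in c]


corr = [[1.0, 0.1, 0.8, -0.5],
        [0.1, 1.0, 0.6, -0.2],
        [0.8, 0.6, 1.0, -0.4],
        [-0.5, -0.2, -0.4, 1.0]]
order = arrange_vectors(corr)  # -> [0, 2, 1, 3]
```

In this toy input, vectors 0 and 2 seed the cluster, vector 1 joins next to vector 2, and vector 3 stays outside because its correlation with both ends is negative.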

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Data on a group of technical documents having an attribute X and an attribute Y is acquired, and a score corresponding to the data on the technical documents belonging to each combination of the attribute X and the attribute Y is calculated. The attribute X is placed on the horizontal axis and the attribute Y is placed on the vertical axis, and the scores are placed in a matrix. A group of vectors Xj is created from the scores belonging to each column of the matrix, and a group of vectors Yk is created from the scores belonging to each row. Within each of the groups of vectors Xj and Yk, vectors having higher association with each other are placed nearer to each other. The associations between the vectors of the first group corresponding to the first attribute X of the technical documents and the associations between the vectors of the second group corresponding to the second attribute Y are analyzed in detail, and an examination that takes both the first and second attributes X and Y into account can be performed.

Description

Specification
Technical document attribute relevance analysis support apparatus
Technical field
[0001] The present invention relates to an analysis support apparatus, a support method, and a support program for analyzing the relevance of document attributes in a technical document group.
Background art
[0002] It is not easy for a company to grasp for itself the technological development results produced by its own research and development organization and the current state of its technology asset portfolio, and to hold objective guidelines for its future development direction. Collecting and analyzing data obtained from the technical document groups of one's own company and of other companies is considered an effective way to obtain such guidelines, but extracting useful information from an enormous group of technical documents involves considerable difficulty.
[0003] Conventionally, as an attempt to unearth information buried in enormous amounts of data, there are methods that arrange two kinds of items, X_j (j = 1, 2, ..., p) and Y_k (k = 1, 2, ..., q), on the horizontal and vertical axes and analyze a cross table that tabulates the aggregated results for each combination of these items.
[0004] For example, dual scaling, described in the following document, assigns scales X_j (j = 1, 2, ..., p) and Y_k (k = 1, 2, ..., q) to the items X_j on the horizontal axis (column headings) and the items Y_k on the vertical axis (row headings) of such a cross table, respectively, and tries to find trends hidden in the cross table. In this document, to calculate specific numerical values of the scales X_j and Y_k, the components of the vectors X and Y are determined so that the square of the correlation coefficient between the p-dimensional vector X = (X_1, X_2, ..., X_p) and the q-dimensional vector Y = (Y_1, Y_2, ..., Y_q) is as close to 1 as possible.
Non-patent document 1: Taichiro Ueda et al., "Practical Workshop: Thorough Use of Excel, Multivariate Analysis", Shuwa System Co., Ltd., published September 5, 2003, pp. 323-337
Disclosure of the invention
Problem to be solved by the invention
[0005] However, dual scaling and other conventional methods do not sufficiently analyze the mutual relationships among the items X_j (j = 1, 2, ..., p) on the horizontal axis of the cross table or among the items Y_k (k = 1, 2, ..., q) on the vertical axis, so an examination that considers X_j and Y_k together cannot be carried out adequately. Dual scaling assigns scales to X_j and Y_k, but the information obtained from them is limited. Even with this method, the relevance of document attributes in a technical document group cannot be analyzed sufficiently. It therefore cannot serve as a criterion for obtaining objective guidelines on the direction of a company's technology development.
[0006] An object of the present invention is to provide a relevance analysis support apparatus, support method, and support program for technical document attributes that analyze in detail the mutual relevance of a first vector group corresponding to a first attribute X of technical documents and the mutual relevance of a second vector group corresponding to a second attribute Y, and that, by then examining the first attribute X and the second attribute Y together, can identify the state of concentration or dispersion of the data distribution of document attributes in a technical document group and indicate criteria for judging the direction of a company's technology development.
Means for solving the problem
[0007] (1) To solve the above problem, the technical document attribute relevance analysis support apparatus of the present invention comprises:
data acquisition means for acquiring data of a technical document group including a plurality of technical documents each having at least two types of attributes;
score calculation means for calculating a score according to the data of the technical documents belonging to each combination of a first attribute X and a second attribute Y among the at least two types of attributes;
first vector group generation means for generating vectors based on the scores belonging to each column of a matrix arrangement in which the scores are arranged with the first attribute X on the horizontal axis and the second attribute Y on the vertical axis;
first vector relevance calculation means for calculating the mutual relevance of the vectors in the vector group generated by the first vector group generation means;
first vector arrangement means for arranging, within the vector group generated by the first vector group generation means, vectors having high relevance closer to each other;
second vector group generation means for generating vectors based on the scores belonging to each row of the matrix arrangement;
second vector relevance calculation means for calculating the mutual relevance of the vectors in the vector group generated by the second vector group generation means; and
second vector arrangement means for arranging, within the vector group generated by the second vector group generation means, vectors having high relevance closer to each other.
[0008] With this configuration, the mutual relevance of the vectors corresponding to the first attribute X (the columns of the scores arranged in a matrix) is calculated and vectors with similar distributions over the second attribute Y are arranged closer to each other, and the mutual relevance of the vectors corresponding to the second attribute Y (the rows of the scores arranged in a matrix) is calculated and vectors with similar distributions over the first attribute X are arranged closer to each other. Accordingly, by analyzing in detail the mutual relevance of the vectors corresponding to the first attribute X and that of the vectors corresponding to the second attribute Y, and then examining the first attribute X and the second attribute Y together, the state of concentration or dispersion of the data distribution of document attributes in the technical document group can be identified.
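
As a rough illustration of the relevance calculation on both axes, the following Python sketch computes correlation coefficients between the column vectors (attribute X) and between the row vectors (attribute Y) of a small score matrix. The helper name `column_correlations`, the sample scores, and the choice of the Pearson correlation coefficient via NumPy are assumptions made for this example; the specification also allows other relevance measures.

```python
import numpy as np


def column_correlations(scores):
    """Pearson correlation between every pair of columns of a score matrix."""
    scores = np.asarray(scores, dtype=float)
    # rowvar=False treats each column as one variable (one vector).
    return np.corrcoef(scores, rowvar=False)


# Hypothetical scores: rows are values of attribute Y, columns are values of X.
scores = np.array([[5, 4, 0],
                   [3, 3, 1],
                   [0, 1, 6]])

corr_x = column_correlations(scores)    # relevance among the column vectors (X)
corr_y = column_correlations(scores.T)  # relevance among the row vectors (Y)
```

Note that a column or row whose scores are all identical has zero variance, so its correlation coefficient is undefined and would need separate handling.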
[0009] (2) In the above technical document attribute relevance analysis support apparatus, it is desirable that one of the first attribute X and the second attribute Y be a human attribute of each technical document and the other be a technical field attribute of each technical document.
Human attributes include, for example, applicants and inventors in the case of patent documents, and authors and editors in the case of technical papers and books. Technical field attributes include technical classifications such as the IPC (International Patent Classification), as well as technical elements, keywords, and the like.
[0010] This makes it possible to analyze the mutual relevance of the vectors corresponding to the human attribute and the mutual relevance of the vectors corresponding to the technical field attribute, and then to examine the human attribute and the technical field attribute together. For example, because the relevance of technology development areas between one's own company and other companies is shown, companies with similar development tendencies can be found. A company with similar development tendencies in this sense is not necessarily one that currently competes in the same market. If a company compared with one's own has similar development tendencies and has already entered an industry that one's own company has not yet entered, the technical hurdles for one's own company to newly enter that industry can be expected to be low. It is also possible to discover the strengths and weaknesses of one's own development department by comparison with a company that competes in the market but has different development tendencies, or to look for a technology alliance partner with which the weaknesses of each other's development departments can be compensated, and thereby to help formulate a technology development policy that allows one's own company to compete with others in an industry it wishes to enter.
Furthermore, because the relevance of developing entities between one technical field and another is shown, the relevance between technical fields can be analyzed. For example, when the technical fields being compared tend strongly to be handled together by the same company, (a) one can identify the possibility that handling both leads to an existing business, and judge whether to enter that business and whether technology development is needed to do so. Alternatively, (b) one can identify the possibility that technologies can be transferred between fields that at first glance appear technically unrelated.
[0011] (3) In the above relevance analysis support apparatus for technical document attributes,
the score calculation means preferably calculates the score based on the number of technical documents having the same combination $(X_j, Y_k)$ of a value $X_j$ ($j = 1, 2, \dots, p$) of the first attribute X and a value $Y_k$ ($k = 1, 2, \dots, q$) of the second attribute Y.
[0012] By calculating the score based on the number of technical documents having the same combination, the state of concentration or dispersion of the attribute distribution can be expressed simply and objectively.
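The count-based score can be illustrated with a minimal sketch. It assumes that each document carries exactly one value per attribute, stored under the hypothetical keys `x` and `y`; the function name `count_scores` and the sample data are illustrative and do not come from the specification.

```python
from collections import Counter

def count_scores(documents, x_values, y_values):
    """Return a q-by-p matrix whose entry [k][j] is the number of documents
    whose attribute values are (x_values[j], y_values[k])."""
    counts = Counter((doc["x"], doc["y"]) for doc in documents)
    # Rows follow attribute Y, columns follow attribute X.
    return [[counts[(x, y)] for x in x_values] for y in y_values]

# Hypothetical sample data: attribute X = applicant, attribute Y = IPC class.
docs = [
    {"x": "Company A", "y": "G06F"},
    {"x": "Company A", "y": "G06F"},
    {"x": "Company A", "y": "H04L"},
    {"x": "Company B", "y": "H04L"},
]
sigma = count_scores(docs, ["Company A", "Company B"], ["G06F", "H04L"])
print(sigma)  # [[2, 0], [1, 1]]
```

A document with several applicants or several classifications would need to be expanded into one record per combination before counting; how that should be done is not specified here.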
[0013] (4) Alternatively, the score calculation means may calculate the score by weighting each of the technical documents having the same combination $(X_j, Y_k)$ of a value $X_j$ ($j = 1, 2, \dots, p$) of the first attribute X and a value $Y_k$ ($k = 1, 2, \dots, q$) of the second attribute Y, and summing the weighted values.
[0014] By weighting each of the technical documents having the same combination and summing to calculate the score, the analysis can be carried out appropriately with a score that reflects the importance or qualitative factors of the technical documents.
As for the weighting, for example, giving a gazette of a granted patent a larger weight than a publication of an unexamined patent application emphasizes the importance or high quality of the technical document.
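The weighted variant might be sketched as follows. The weight values (2.0 for a granted patent, 1.0 for an unexamined publication), the `kind` key, and the function name are all assumptions made for illustration; the specification only says that granted patents receive the larger weight.

```python
from collections import defaultdict

# Hypothetical weights: only their ordering is taken from the text.
WEIGHTS = {"granted": 2.0, "unexamined": 1.0}

def weighted_scores(documents, x_values, y_values, weights=WEIGHTS):
    """Return a q-by-p matrix of summed document weights per combination."""
    totals = defaultdict(float)
    for doc in documents:
        totals[(doc["x"], doc["y"])] += weights[doc["kind"]]
    return [[totals.get((x, y), 0.0) for x in x_values] for y in y_values]

docs = [
    {"x": "Company A", "y": "G06F", "kind": "granted"},
    {"x": "Company A", "y": "G06F", "kind": "unexamined"},
    {"x": "Company B", "y": "H04L", "kind": "unexamined"},
]
weighted = weighted_scores(docs, ["Company A", "Company B"], ["G06F", "H04L"])
print(weighted)  # [[3.0, 0.0], [0.0, 1.0]]
```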
[0015] (5) In the above relevance analysis support apparatus for technical document attributes,
the first vector group generation means or the second vector group generation means preferably generates vectors whose components include the logarithm of each of the scores belonging to each column or each row of the matrix arrangement.
[0016] As a result, particularly when each score is non-negative and the distribution is concentrated near 0, the distribution of the vector components becomes closer to a normal distribution, so the reliability of the relevance calculation result can be improved.
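A logarithmic transform of the score matrix could look like the sketch below. Because the logarithm of a zero score is undefined and the text does not say how zeros are handled, the sketch uses `log(1 + score)` as an assumption; other conventions are equally possible.

```python
import math

def log_transform(matrix):
    """Replace every non-negative score s by log(1 + s).

    The "+ 1" is an assumption added so that zero scores stay finite.
    """
    return [[math.log1p(score) for score in row] for row in matrix]

logged = log_transform([[0, 1, 9], [99, 0, 3]])
# Large counts are compressed: 99 maps to about 4.6, while 0 stays at 0.
```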
[0017] (6) In the above relevance analysis support apparatus for technical document attributes, it is preferable that
the first vector arrangement means comprises:
first cluster generation means for selecting two vectors from the vector group generated by the first vector group generation means according to a predetermined criterion, and generating a cluster by placing the two vectors adjacent to each other; and
first cluster expansion means for selecting, as a joining vector, from the vectors generated by the first vector group generation means that are outside the cluster, the vector having the highest relevance to either of the end vectors located at both ends of the vectors constituting the cluster generated by the first cluster generation means, and for placing the joining vector adjacent to the end vector to which it was found most relevant, thereby adding the joining vector to the cluster and expanding the cluster successively; and/or
the second vector arrangement means comprises:
second cluster generation means for selecting two vectors from the vector group generated by the second vector group generation means according to a predetermined criterion, and generating a cluster by placing the two vectors adjacent to each other; and
second cluster expansion means for selecting, as a joining vector, from the vectors generated by the second vector group generation means that are outside the cluster, the vector having the highest relevance to either of the end vectors located at both ends of the vectors constituting the cluster generated by the second cluster generation means, and for placing the joining vector adjacent to the end vector to which it was found most relevant, thereby adding the joining vector to the cluster and expanding the cluster successively.
[0018] According to this, vectors are placed adjacent to one another in descending order of relevance and the cluster is expanded step by step, so highly relevant vectors are reliably placed close together, and the state of concentration or dispersion of the document attribute data distribution can be made apparent.
[0019] (7) In the above relevance analysis support apparatus for technical document attributes,
the first cluster generation means or the second cluster generation means preferably selects, from the vector group generated by the first vector group generation means or the vector group generated by the second vector group generation means, respectively,
the two vectors of that vector group having the highest mutual relevance.
[0020] This ensures that the most relevant vectors are placed adjacent to each other, so the quantitative objectivity of the vector arrangement can be secured.
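The cluster generation of (7) and the cluster expansion of (6) can be sketched as a greedy ordering. The sketch assumes that a symmetric relevance matrix `sim` has already been computed (for example from correlation coefficients); the function name `arrange` and the tie-breaking by lowest index are illustrative choices, not taken from the specification.

```python
def arrange(sim):
    """Order indices 0..n-1 so that highly related vectors sit side by side.

    sim is a symmetric n-by-n relevance matrix.
    """
    n = len(sim)
    if n < 2:
        return list(range(n))
    # Cluster generation: seed with the pair of highest mutual relevance.
    a, b = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda pair: sim[pair[0]][pair[1]])
    chain = [a, b]
    remaining = set(range(n)) - {a, b}
    # Cluster expansion: attach the best outside vector to the nearer end.
    while remaining:
        left, right = chain[0], chain[-1]
        cand = max(sorted(remaining),
                   key=lambda v: max(sim[left][v], sim[right][v]))
        if sim[left][cand] > sim[right][cand]:
            chain.insert(0, cand)
        else:
            chain.append(cand)
        remaining.remove(cand)
    return chain

sim = [[1.0, 0.1, 0.9, 0.2],
       [0.1, 1.0, 0.3, 0.8],
       [0.9, 0.3, 1.0, 0.4],
       [0.2, 0.8, 0.4, 1.0]]
print(arrange(sim))  # [0, 2, 3, 1]
```

In the example, vectors 0 and 2 seed the cluster, vector 3 joins next to vector 2, and vector 1 joins next to vector 3, so each vector ends up beside its most related neighbor.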
[0021] (8) In the above relevance analysis support apparatus for technical document attributes, it is preferable that
the first vector arrangement means further comprises:
first cluster expansion stop determination means for stopping the selection of the joining vector and the expansion of the cluster by the first cluster expansion means when the relevance between each end vector located at either end of the vectors constituting the cluster generated by the first cluster generation means and every vector outside the cluster among the vectors generated by the first vector group generation means is equal to or less than a predetermined threshold;
first cluster regeneration means for selecting two vectors according to a predetermined criterion from the vectors outside the cluster generated by the first cluster generation means, and generating another cluster by placing the two vectors adjacent to each other; and
first cluster re-expansion means for selecting, as a joining vector, from the vectors generated by the first vector group generation means that are outside both the cluster generated by the first cluster generation means and the other cluster, the vector having the highest relevance to either of the end vectors located at both ends of the vectors constituting the other cluster generated by the first cluster regeneration means, and for placing the joining vector adjacent to the end vector to which it was found most relevant, thereby adding the joining vector to the other cluster and expanding the other cluster successively; and/or
the second vector arrangement means further comprises:
second cluster expansion stop determination means for stopping the selection of the joining vector and the expansion of the cluster by the second cluster expansion means when the relevance between each end vector located at either end of the vectors constituting the cluster generated by the second cluster generation means and every vector outside the cluster among the vectors generated by the second vector group generation means is equal to or less than a predetermined threshold;
second cluster regeneration means for selecting two vectors according to a predetermined criterion from the vectors outside the cluster generated by the second cluster generation means, and generating another cluster by placing the two vectors adjacent to each other; and
second cluster re-expansion means for selecting, as a joining vector, from the vectors generated by the second vector group generation means that are outside both the cluster generated by the second cluster generation means and the other cluster, the vector having the highest relevance to either of the end vectors located at both ends of the vectors constituting the other cluster generated by the second cluster regeneration means, and for placing the joining vector adjacent to the end vector to which it was found most relevant, thereby adding the joining vector to the other cluster and expanding the other cluster successively.
[0022] According to this, when the relevance to the end vectors is equal to or less than the predetermined threshold, vectors are not forced into a single cluster, combinations of vectors with higher relevance are given priority, and the reliability of the vector arrangement can be improved. As the relevance threshold, for example, a correlation coefficient of 0 is used.
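The stop, regeneration, and re-expansion steps of (8) might be sketched as follows. The sketch assumes a precomputed symmetric relevance matrix `sim` and a threshold of 0. It also makes two assumptions the text leaves open: a new cluster is seeded from the most related remaining pair regardless of the threshold, and a single leftover vector forms a cluster of its own. The function name `arrange_clusters` is hypothetical.

```python
def arrange_clusters(sim, threshold=0.0):
    """Return a list of chains (clusters) of vector indices."""
    remaining = set(range(len(sim)))
    clusters = []
    while remaining:
        if len(remaining) == 1:
            clusters.append([remaining.pop()])  # assumption: lone vector
            break
        items = sorted(remaining)
        # (Re)generation: seed from the most related remaining pair.
        a, b = max(((i, j) for x, i in enumerate(items) for j in items[x + 1:]),
                   key=lambda pair: sim[pair[0]][pair[1]])
        chain = [a, b]
        remaining -= {a, b}
        # (Re)expansion, with the stop determination inside the loop.
        while remaining:
            left, right = chain[0], chain[-1]
            cand = max(sorted(remaining),
                       key=lambda v: max(sim[left][v], sim[right][v]))
            if max(sim[left][cand], sim[right][cand]) <= threshold:
                break  # no outside vector is related enough to either end
            if sim[left][cand] > sim[right][cand]:
                chain.insert(0, cand)
            else:
                chain.append(cand)
            remaining.remove(cand)
        clusters.append(chain)
    return clusters

sim = [[1.0, 0.9, -0.5, -0.4],
       [0.9, 1.0, -0.3, -0.6],
       [-0.5, -0.3, 1.0, 0.7],
       [-0.4, -0.6, 0.7, 1.0]]
print(arrange_clusters(sim))  # [[0, 1], [2, 3]]
```

With the threshold at 0, vectors 2 and 3 are negatively related to both ends of the first cluster, so they are not forced into it and instead form a second cluster.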
[0023] (9) The above relevance analysis support apparatus for technical document attributes
preferably comprises display means for displaying the distribution of the scores, arranged in a matrix based on the arrangements made by the first vector arrangement means and the second vector arrangement means, with a pattern or color assigned according to each score.
[0024] When the score distribution is shown only as numerical values, the state of the distribution is not apparent at a glance; assigning patterns or colors makes the score distribution easier to discern.
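As a rough illustration of pattern-based display, the sketch below maps each score to a text character of increasing density. The five-step character scale, the linear scaling by the maximum score, and the function name `render` are assumptions; an actual display would more likely use colors in a graphical output. Scores are assumed to be non-negative.

```python
SHADES = " .:*#"  # hypothetical pattern scale, lightest to darkest

def render(matrix):
    """Render a matrix of non-negative scores as rows of shade characters."""
    top = max(max(row) for row in matrix) or 1  # avoid dividing by zero
    lines = []
    for row in matrix:
        lines.append("".join(
            SHADES[int(score / top * (len(SHADES) - 1))] for score in row))
    return "\n".join(lines)

picture = render([[0, 2, 8],
                  [4, 8, 1]])
print(picture)
```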
[0025] (10) The present invention also provides a relevance analysis support method for technical document attributes comprising the same steps as the method executed by each of the above apparatuses, and a relevance analysis support program for technical document attributes capable of causing a computer to execute the same processing as that executed by each of the above apparatuses. The program may be recorded on a recording medium such as an FD, CD-ROM, or DVD, or may be transmitted and received over a network.
Brief Description of the Drawings
[0026] [FIG. 1] A diagram showing the hardware configuration of the relevance analysis support apparatus for technical document attributes according to the first embodiment of the present invention.
[FIG. 2] A flowchart showing the operation procedure of the processing device 1 in the relevance analysis support apparatus of the first embodiment.
[FIG. 3] A diagram showing a display example produced by the display unit.
[FIG. 4] A diagram showing another display example produced by the display unit.
[FIG. 5] A flowchart showing the operation procedure of the processing device 1 in the relevance analysis support apparatus of the second embodiment.
[FIG. 6] An example of the document count matrix generated in the second embodiment.
Explanation of Reference Numerals
[0027] 1: processing device; 2: input device; 3: recording device; 4: output device; 110: data acquisition unit; 120: score calculation unit; 130 and 140: first and second vector group generation units; 150 and 160: first and second vector relevance calculation units; 170 and 180: first and second vector arrangement units
Best Mode for Carrying Out the Invention
[0028] Embodiments of the present invention are described in detail below with reference to the drawings.
< 1. Explanation of symbols, etc. >
i: The technical document number assigned to each technical document, for example to each of all the patent applications extracted under given conditions. When the number of technical documents is N, $i = 1, 2, \dots, N$.
X, Y: Attributes of the individual technical documents, for example applicant or technical field (keyword or IPC).
$X_j$, $Y_k$: Values of the attributes. These refer, for example, to the specific names of applicants or technical fields, and are not limited to values expressed numerically.
$\sigma_{kj}$: The score calculated for each combination of a value of attribute X and a value of attribute Y. When the range of attribute X is $X_1, X_2, \dots, X_p$ and the range of attribute Y is $Y_1, Y_2, \dots, Y_q$, $p \times q$ scores $\sigma_{kj}$ can be defined, and these can be arranged in a matrix of q rows and p columns. The q-dimensional vector whose components are the scores $\sigma_{1j}, \sigma_{2j}, \dots, \sigma_{qj}$ belonging to each column of the matrix is called vector $X_j$, and the p-dimensional vector whose components are the scores $\sigma_{k1}, \sigma_{k2}, \dots, \sigma_{kp}$ belonging to each row is called vector $Y_k$ (the same symbols as the corresponding attribute values $X_j$ and $Y_k$ are used).
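The relationship between the score matrix and the two vector groups can be shown with a short sketch. It assumes the matrix is held as a list of q rows, each with p entries; the helper names `column_vectors` and `row_vectors` are illustrative.

```python
def column_vectors(sigma):
    """Vectors X_j: one q-dimensional vector per column of the matrix."""
    return [list(col) for col in zip(*sigma)]

def row_vectors(sigma):
    """Vectors Y_k: one p-dimensional vector per row of the matrix."""
    return [list(row) for row in sigma]

sigma = [[2, 0, 1],
         [1, 1, 0]]  # q = 2 rows (attribute Y), p = 3 columns (attribute X)
print(column_vectors(sigma))  # [[2, 1], [0, 1], [1, 0]]
print(row_vectors(sigma))     # [[2, 0, 1], [1, 1, 0]]
```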
[0029] < 2. Configuration of the relevance analysis support apparatus for technical document attributes >
FIG. 1 is a diagram showing the hardware configuration of the relevance analysis support apparatus for technical document attributes according to the first embodiment of the present invention. As shown in the figure, the relevance analysis support apparatus of this embodiment comprises a processing device 1 composed of a CPU (central processing unit), memory (storage), and the like; an input device 2, which is input means such as a keyboard (manual input device); a recording device 3, which is recording means for storing the data of the technical document group, conditions, the results of work by the processing device 1, and the like; and an output device 4, which is output means for displaying or printing the scores arranged in a matrix and the like.
[0030] The processing device 1 includes a data acquisition unit 110, a score calculation unit 120, first and second vector group generation units 130 and 140, first and second vector relevance calculation units 150 and 160, and first and second vector arrangement units 170 and 180.
[0031] The recording device 3 comprises a condition recording unit 31, a work result storage unit 32, a document storage unit 33, and the like. The document storage unit 33 contains the data of technical document groups obtained from external or internal databases. An external database means a document database such as the IPDL patent electronic library service provided by the Japan Patent Office or PATOLIS (registered trademark) provided by PATOLIS Corporation. An internal database is taken to include a database in which commercially available data such as patent JP-ROM has been stored in-house; devices that read from media holding documents, such as an FD (flexible disk), CD (compact disc) ROM, MO (magneto-optical disk), or DVD (digital video disc); devices such as an OCR (optical information reader) that read documents printed or handwritten on paper or the like; and devices that convert the read data into electronic data such as text.
In this embodiment, patent gazettes are mainly handled as the technical documents, but the invention is not limited to these; technical documents in general, such as utility model gazettes, technical papers, technical magazines, and books, can be analyzed.
[0032] As communication means for exchanging signals and data among the processing device 1, the input device 2, the recording device 3, and the output device 4, the devices may be connected directly by a USB (Universal Serial Bus) cable or the like, may send and receive data over a network such as a LAN (local area network), or may pass data via media such as an FD, CD-ROM, MO, or DVD storing the documents. Some of these, or a combination of several of them, may also be used.
[0033] < 2-1. Details of the input device 2 >
Next, the configuration and functions of the above relevance analysis support apparatus are described in detail.
The input device 2 accepts input of conditions such as the acquisition conditions for the data of the technical document group, the score calculation conditions, the vector generation conditions, the relevance calculation conditions, and the vector arrangement conditions. The input conditions are sent to and stored in the condition recording unit 31 of the recording device 3.
[0034] < 2-2. Details of the processing device 1 >
The data acquisition unit 110 acquires the data of the technical document group to be analyzed from the document storage unit 33 of the recording device 3, in accordance with the data acquisition conditions entered through the input device 2. For example, it acquires at least two kinds of attributes of each technical document as data, based on the bibliographic information of each technical document. The acquired data of the technical document group is either sent directly to the score calculation unit 120 and used for processing there, or sent to and stored in the work result storage unit 32 of the recording device 3.
[0035] The score calculation unit 120 calculates, based on the data of the technical document group acquired by the data acquisition unit 110, a score $\sigma_{kj}$ corresponding to the data of the technical documents belonging to each combination of the first attribute X and the second attribute Y among the at least two kinds of attributes. The score $\sigma_{kj}$ is calculated for each combination of a value of the first attribute X and a value of the second attribute Y. The calculated scores $\sigma_{kj}$ are either sent directly to the first and second vector group generation units 130 and 140 and used for processing there, or sent to and stored in the work result storage unit 32 of the recording device 3.
[0036] The first vector group generation unit 130 generates the vector group $X_j$ based on the scores $\sigma_{kj}$ calculated by the score calculation unit 120. The vector group $X_j$ is calculated based on the scores belonging to each "column" of the matrix arrangement obtained when the scores $\sigma_{kj}$ are arranged in a matrix with the first attribute X on the horizontal axis and the second attribute Y on the vertical axis.
The second vector group generation unit 140 generates the vector group $Y_k$ based on the scores $\sigma_{kj}$ calculated by the score calculation unit 120. The vector group $Y_k$ is calculated based on the scores belonging to each "row" of the matrix arrangement obtained when the scores $\sigma_{kj}$ are arranged in a matrix with the first attribute X on the horizontal axis and the second attribute Y on the vertical axis.
The vector groups $X_j$ and $Y_k$ generated by the first and second vector group generation units 130 and 140 are either sent directly to the first and second vector relevance calculation units 150 and 160, respectively, and used for processing there, or sent to and stored in the work result storage unit 32 of the recording device 3.
[0037] The first vector relevance calculation unit 150 calculates the mutual relevance of the vectors in the vector group $X_j$ generated by the first vector group generation unit 130.
The second vector relevance calculation unit 160 calculates the mutual relevance of the vectors in the vector group $Y_k$ generated by the second vector group generation unit 140.
The relevance data calculated by the first and second vector relevance calculation units 150 and 160 are either sent directly to the first and second vector arrangement units 170 and 180, respectively, and used for processing there, or sent to and stored in the work result storage unit 32 of the recording device 3.
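One way to compute the mutual relevance of a vector group is sketched below, assuming that the Pearson correlation coefficient is used as the relevance measure. The text mentions the correlation coefficient only as an example, so this choice, the function names, and the handling of constant vectors are all assumptions.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / (norm_u * norm_v)

def relevance_matrix(vectors):
    """Symmetric matrix of pairwise relevance values for a vector group."""
    return [[pearson(u, v) for v in vectors] for u in vectors]

rel = relevance_matrix([[1, 2, 3], [2, 4, 6], [3, 2, 1]])
# rel[0][1] is close to 1 (same trend); rel[0][2] is close to -1 (opposite).
```

The same function can be applied to the column vectors to obtain the relevance among values of attribute X, and to the row vectors for attribute Y.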
[0038] The first vector arrangement unit 170 performs processing that places highly relevant vectors closer to each other, based on the mutual relevance of the vectors $X_j$ calculated by the first vector relevance calculation unit 150.
The second vector arrangement unit 180 performs processing that places highly relevant vectors closer to each other, based on the mutual relevance of the vectors $Y_k$ calculated by the second vector relevance calculation unit 160.
The vector arrangements determined by the first and second vector arrangement units 170 and 180 are sent to and stored in the work result storage unit 32 of the recording device 3, and are output by the output device 4 as needed.
[0039] As a particularly preferred form of the first and second vector arrangement units 170 and 180, FIG. 1 shows units provided with first and second cluster generation units 171 and 181 and first and second cluster expansion units 172 and 182, respectively. As a still more preferred form, FIG. 1 also shows units provided with first and second cluster expansion stop determination units 174 and 184, first and second cluster regeneration units 175 and 185, and first and second cluster re-expansion units 176 and 186, respectively.
[0040] The first cluster generation unit 171 selects two vectors from the vector group generated by the first vector group generation unit 130 according to a predetermined criterion, and generates a cluster by placing these two vectors adjacent to each other.
The second cluster generation unit 181 selects two vectors from the vector group generated by the second vector group generation unit 140 according to a predetermined criterion, and generates a cluster by placing these two vectors adjacent to each other.
The predetermined criterion for selecting the two vectors may be, for example, the degree of relevance, in which case the two vectors with the highest mutual relevance are selected.
The clusters generated by the first and second cluster generation units 171 and 181 are either sent directly to the first and second cluster expansion units 172 and 182, respectively, and used for processing there, or sent to and stored in the work result storage unit 32 of the recording device 3.
[0041] The first cluster expansion unit 172 successively expands the cluster generated by the first cluster generation unit 171 by adding joining vectors to it. The joining vector is determined by selecting, from the vectors of the vector group $X_j$ generated by the first vector group generation unit 130 that are outside the cluster, the vector having the highest relevance to either of the end vectors located at both ends of the vectors constituting the cluster generated by the first cluster generation unit 171. The joining vector is added to the cluster by placing it adjacent to the end vector to which it was found most relevant, although the joining vector may instead be inserted at another position in the cluster.
The second cluster expansion unit 182 successively expands the cluster generated by the second cluster generation unit 181 by adding joining vectors to it. The joining vector is determined by selecting, from the vectors of the vector group $Y_k$ generated by the second vector group generation unit 140 that are outside the cluster, the vector having the highest relevance to either of the end vectors located at both ends of the vectors constituting the cluster generated by the second cluster generation unit 181. The joining vector is added to the cluster by placing it adjacent to the end vector to which it was found most relevant, although the joining vector may instead be inserted at another position in the cluster.
When the clusters have been expanded by the first and second cluster expansion units 172 and 182 and no vectors remain outside a cluster, the processing of the first and second vector arrangement units 170 and 180 ends.
[0042] The first cluster expansion stop determination unit 174 stops the selection of the joining vector and the expansion of the cluster by the first cluster expansion unit 172 when the relevance between each end vector located at either end of the vectors constituting the cluster generated by the first cluster generation unit 171 and every vector of the vector group $X_j$ generated by the first vector group generation unit 130 that is outside the cluster is equal to or less than a predetermined threshold.
The second cluster expansion stop determination unit 184 stops the selection of the joining vector and the expansion of the cluster by the second cluster expansion unit 182 when the relevance between each end vector located at either end of the vectors constituting the cluster generated by the second cluster generation unit 181 and every vector of the vector group $Y_k$ generated by the second vector group generation unit 140 that is outside the cluster is equal to or less than a predetermined threshold.
The predetermined threshold is preferably, for example, 0 (no correlation) when the correlation coefficient is used.
[0043] The first cluster regeneration unit 175 selects, according to a predetermined criterion, two vectors from the vectors that do not belong to the cluster generated by the first cluster generation unit 171 (or, if that cluster has been expanded by the first cluster expanding unit 172, to the expanded cluster), and places the two vectors adjacent to each other to generate another cluster.
The second cluster regeneration unit 185 selects, according to a predetermined criterion, two vectors from the vectors that do not belong to the cluster generated by the second cluster generation unit 181 (or, if that cluster has been expanded by the second cluster expanding unit 182, to the expanded cluster), and places the two vectors adjacent to each other to generate another cluster.
The other clusters generated by the first and second cluster regeneration units 175 and 185 are either sent directly to the first and second cluster re-expansion units 176 and 186, respectively, for use in their processing, or sent to and stored in the work result storage unit 32 of the recording device 3.
[0044] The first cluster re-expansion unit 176 successively expands the other cluster generated by the first cluster regeneration unit 175 by adding joining vectors to it. Each joining vector is determined by selecting, from those vectors of the vector group X_j generated by the first vector group generation unit 130 that belong neither to the cluster generated by the first cluster generation unit 171 nor to the other cluster, the vector most relevant to either of the end vectors located at the two ends of the other cluster. The joining vector is added to the other cluster by placing it adjacent to the end vector to which it was found most relevant.
The second cluster re-expansion unit 186 successively expands the other cluster generated by the second cluster regeneration unit 185 by adding joining vectors to it. Each joining vector is determined by selecting, from those vectors of the vector group Y_k generated by the second vector group generation unit 140 that belong neither to the cluster generated by the second cluster generation unit 181 nor to the other cluster, the vector most relevant to either of the end vectors located at the two ends of the other cluster. The joining vector is added to the other cluster by placing it adjacent to the end vector to which it was found most relevant.
When the first and second cluster re-expansion units 176 and 186 have expanded the clusters until no vector remains outside a cluster, the processing of the first and second vector arrangement units 170 and 180 ends.
[0045] <2-3. Details of Recording Device 3>
In the recording device 3, the condition recording unit 31 records information such as conditions obtained from the input device 2 and sends the necessary data in response to requests from the processing device 1. The work result storage unit 32 stores the work results of each component of the processing device 1 and sends the necessary data in response to requests from the processing device 1. The document storage unit 33 stores and provides the data of the required technical document group, obtained from an external or internal database in response to requests from the input device 2 or the processing device 1.
[0046] <2-4. Details of Output Device 4>
The output device 4 outputs the scores and related data arranged in a matrix according to the vector arrangement determined by the first and second vector arrangement units 170 and 180 of the processing device 1. The output device 4 includes a display unit 41, such as a display device, and shows the distribution of the scores arranged in the matrix by giving each cell a pattern or color corresponding to its score. The output is not limited to display on the display unit 41; it may also be printed on a print medium such as paper, or transmitted to a computer on a network via communication means.
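As a rough illustration of this kind of output, the sketch below reorders a score matrix by a given row and column arrangement and renders each cell as a shade character. The score values are those of [Table 3]; the function names, the shade characters, and the column order are assumptions made only for this example and are not taken from the source.

```python
# Scores of [Table 3]: rows are Y1..Y6, columns are X1..X6
SIGMA = [
    [1, 8, 0, 1, 0, 1],
    [0, 0, 5, 2, 1, 0],
    [6, 0, 0, 0, 1, 0],
    [2, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
]
SHADES = " .:*#"  # lightest to darkest; an arbitrary choice for this sketch


def reorder(matrix, row_order, col_order):
    """Return the matrix with rows and columns permuted into the given orders."""
    return [[matrix[k][j] for j in col_order] for k in row_order]


def render(matrix):
    """Map each score to a shade; zero stays blank, the maximum gets the darkest shade."""
    top = max(max(row) for row in matrix) or 1
    levels = len(SHADES) - 1
    return "\n".join(
        "".join(SHADES[(s * levels + top - 1) // top] for s in row)
        for row in matrix
    )


col_order = [0, 4, 2, 3, 1, 5]  # hypothetical arrangement: X1, X5, X3, X4, X2, X6
row_order = [0, 1, 2, 3, 4, 5]  # rows left in their original order here
arranged = reorder(SIGMA, row_order, col_order)
picture = render(arranged)
```

A real display would use colors or patterns on a screen or printout; the text rendering only stands in for that step.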
[0047] <3. Operation of First Embodiment>
FIG. 2 is a flowchart showing the operating procedure of the processing device 1 in the relevance analysis support apparatus of the first embodiment.
[0048] <3-1. Acquisition of Technical Document Group Data>
First, the data acquisition unit 110 acquires the data of the technical document group to be analyzed (step S110). Each document in this group must have at least two kinds of attributes, X and Y. Let N be the number of documents in the group. For example, data such as that in [Table 1] below is obtained. Each technical document may have a single value for an attribute or, as with attribute Z of technical documents 2, 3 and 4 in [Table 1], several values. For example, when several inventors are named in one patent document, the inventor attribute has as many values as there are inventors.
[Table 1]
Technical document number i | Attribute X | Attribute Y | ... | Attribute Z
1 | X_1 | Y_1 | ... | [illegible]
2 | X_1 | Y_3 | ... | Z_2, Z_4
3 | X_1 | Y_3 | ... | [illegible]
4 | [illegible] | [illegible] | ... | Z_2, Z_3
5 | [illegible] | [illegible] | ... | Z_3
6 | X_2 | [illegible] | ... | Z_4
7 | X_2 | Y_3 | ... | Z_4
8 | X_2 | Y_4 | ... | Z_4
9 | X_2 | [illegible] | ... | Z_5
10 | X_3 | Y_2 | ... | Z_5
N | X_4 | [illegible] | ... | Z_3
[0049] <3-2. Calculation of Scores>
Next, the score calculation unit 120 calculates, for each combination of a value of the first attribute X and a value of the second attribute Y among the at least two kinds of attributes, a score according to the data of the technical documents belonging to that combination (step S120).
To this end, two of the above attributes are first selected (for example, "applicant" and "keyword"; hereinafter referred to as X and Y, respectively, in the description of this embodiment). The selection is made according to a user instruction entered through the input device 2. Preferably, one of the two attributes is a person-related attribute such as applicant or inventor, and the other is a technical-field attribute such as keyword or IPC. Alternatively, both attributes may be technical-field attributes, for example a technical classification for one and a technical element for the other. An attribute that is neither person-related nor a technical-field attribute, such as the filing date, may also be selected for either or both of the two attributes.
[0050] Once the two attributes have been selected, the range of attribute values X_j and Y_k is determined for each of the attributes X and Y (a value here means a specific name, such as that of an applicant or a keyword, and is not limited to a number). For example, a ranking in descending order of the number of matching technical documents, such as [Table 2] below, is created, and the values ranked within the top p for attribute X and within the top q for attribute Y are taken as the value range of each attribute. The number p of values X_j in the range of attribute X and the number q of values Y_k in the range of attribute Y may be equal or different. The ranges can be chosen to suit the purpose of the analysis, for example how many of the top-filing companies to analyze or which technical fields to analyze. The following description assumes that the values X_1, X_2, ..., X_p have been determined as the range of attribute X and the values Y_1, Y_2, ..., Y_q as the range of attribute Y.
[Table 2]
(Table image imgf000019_0001: ranking of attribute values in descending order of the number of matching technical documents.)
Next, for each combination of attribute values X_j and Y_k (where j = 1, 2, ..., p and k = 1, 2, ..., q), p × q scores σ_kj are calculated based on the number of technical documents having that combination of attribute values.
The score σ_kj may be the number of technical documents sharing the combination (X_j, Y_k) itself, or a function of that number, such as a normalized value. When the score σ_kj is the document count itself and, as in [Table 1] above, the only one of the N technical documents having the pair of attribute values (X_1, Y_1) is technical document number i = 1, the score σ_11 for the pair (X_1, Y_1) is 1. Likewise, when, as in [Table 1], the technical documents having the pair (X_1, Y_3) are the two documents i = 2 and i = 3, the score σ_31 for the pair (X_1, Y_3) is 2. The scores σ_kj take values such as those in [Table 3] below. The hypothetical case shown in [Table 3] is referred to as appropriate in what follows.
[Table 3]
    | X_1 | X_2 | X_3 | X_4 | X_5 | X_6
Y_1 | σ_11 = 1 | σ_12 = 8 | σ_13 = 0 | σ_14 = 1 | σ_15 = 0 | σ_16 = 1
Y_2 | σ_21 = 0 | σ_22 = 0 | σ_23 = 5 | σ_24 = 2 | σ_25 = 1 | σ_26 = 0
Y_3 | σ_31 = 6 | σ_32 = 0 | σ_33 = 0 | σ_34 = 0 | σ_35 = 1 | σ_36 = 0
Y_4 | σ_41 = 2 | σ_42 = 1 | σ_43 = 0 | σ_44 = 0 | σ_45 = 1 | σ_46 = 0
Y_5 | σ_51 = 1 | σ_52 = 0 | σ_53 = 1 | σ_54 = 0 | σ_55 = 0 | σ_56 = 0
Y_6 | σ_61 = 0 | σ_62 = 1 | σ_63 = 0 | σ_64 = 0 | σ_65 = 0 | σ_66 = 1
[0052] Since there are p × q combinations of attribute values, the p × q scores σ_kj (j = 1, 2, ..., p; k = 1, 2, ..., q) can be arranged in a matrix of q rows and p columns. In the example of [Table 3], the matrix has 6 rows and 6 columns.
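The counting that fills such a matrix can be sketched as follows. This is a minimal illustration, not the apparatus itself: the document records, the dictionary layout, and the function name `score_matrix` are assumptions made for the example. A document carrying several values of one attribute is counted once for every (X_j, Y_k) combination it contains, which is one possible reading of the multi-valued case.

```python
from collections import Counter
from itertools import product

# Hypothetical documents: each attribute maps to a list of values,
# because one document may carry several values for the same attribute.
documents = [
    {"X": ["X1"], "Y": ["Y1"]},
    {"X": ["X1"], "Y": ["Y3"]},
    {"X": ["X1"], "Y": ["Y3"]},
    {"X": ["X2"], "Y": ["Y1", "Y3"]},
]


def score_matrix(docs, x_values, y_values):
    """Return a q-by-p matrix whose entry [k][j] counts documents having (X_j, Y_k)."""
    counts = Counter()
    for doc in docs:
        # set() guards against a value listed twice within one document
        for x, y in product(set(doc["X"]), set(doc["Y"])):
            counts[(x, y)] += 1
    return [[counts[(x, y)] for x in x_values] for y in y_values]


x_values = ["X1", "X2"]        # value range of attribute X (p = 2)
y_values = ["Y1", "Y2", "Y3"]  # value range of attribute Y (q = 3)
sigma = score_matrix(documents, x_values, y_values)
```

Here `sigma[k][j]` follows the row-then-column order of σ_kj, so the row for Y3 holds the count 2 for X1.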
[0053] When the value range of attribute X or Y is so wide that p or q becomes too large, the attribute values may be redefined with a certain width before the scores σ_kj are determined. For example, if the filing date is chosen as attribute X, p would exceed 1000 for only a few years of data if the dates were used as they are; the filing year, or the filing year and month, may be set as the attribute value instead. The value range of the attribute can thereby be kept to a size that is easy to analyze.
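A small sketch of this coarsening, assuming filing dates are available as `datetime.date` objects; the sample dates and helper names are invented for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical filing dates of three documents
filing_dates = [date(2004, 3, 1), date(2004, 11, 20), date(2005, 1, 15)]


def to_year(d):
    """Coarsen a filing date to its filing year."""
    return str(d.year)


def to_year_month(d):
    """Coarsen a filing date to its filing year and month."""
    return f"{d.year:04d}-{d.month:02d}"


by_year = Counter(to_year(d) for d in filing_dates)
by_month = Counter(to_year_month(d) for d in filing_dates)
```

With yearly values the three dates collapse into two attribute values, while monthly values keep all three distinct.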
[0054] The example described here calculates the score σ_kj based on the number of documents, but the calculation is not limited to this. A weight α_i (i = 1, 2, ..., N) may be assigned to each technical document and reflected in the score. For example, for each combination of attribute values X_j and Y_k, the score may be calculated as
\sigma_{kj} = \sum_{i \in (X_j, Y_k)} \alpha_i
That is, the score σ_kj may be the sum of the weights α_i over all documents i whose combination of attribute values is (X_j, Y_k). For example, when, as in [Table 1] above, the technical documents having the pair (X_1, Y_3) are the two documents i = 2 and i = 3 among the N technical documents, and they are given the weights α_2 and α_3 respectively, the score σ_31 for the pair (X_1, Y_3) is α_2 + α_3.
The weight α_i is preferably assigned based on prosecution history information, for example a large value for a patent document that has been registered and a small value for one that has not, or based on the number of independent claims, the number of times the document has been cited, or the like.
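The weighted sum can be sketched in a few lines. The records, the weight values, and the function name below are hypothetical; only the rule "sum α_i over the documents sharing (X_j, Y_k)" comes from the text.

```python
from collections import defaultdict

# Hypothetical documents with one X value, one Y value and a weight alpha_i
documents = [
    {"id": 1, "X": "X1", "Y": "Y1", "alpha": 1.0},
    {"id": 2, "X": "X1", "Y": "Y3", "alpha": 2.0},  # e.g. a registered patent
    {"id": 3, "X": "X1", "Y": "Y3", "alpha": 0.5},  # e.g. not registered
]


def weighted_scores(docs):
    """Sum the weights alpha_i of all documents sharing the same (X_j, Y_k) pair."""
    scores = defaultdict(float)
    for doc in docs:
        scores[(doc["X"], doc["Y"])] += doc["alpha"]
    return dict(scores)


sigma = weighted_scores(documents)
```

Setting every `alpha` to 1 reduces this to the plain document count.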
[0055] Expressing the score σ_kj as a document count (that is, giving every technical document the same weight α_i = 1) has the advantage that the distribution of the attributes is expressed simply and objectively.
On the other hand, when a separate weight α_i is given to each technical document and the weights are summed to obtain the score σ_kj, the analysis can be performed appropriately with scores that take the importance or qualitative factors of the technical documents into account.
[0056] <3-3. Generation of Vectors>
Next, the first and second vector group generation units 130 and 140 generate vectors (steps S130 and S140).
Specifically, with the scores arranged in a matrix of q rows and p columns as described above, the q-dimensional vector whose components are the scores σ_1j, σ_2j, ..., σ_qj belonging to each column is taken as the vector X_j (j = 1, 2, ..., p). The vector X_j represents the distribution of attribute Y for the value X_j of attribute X; for example, it represents the distribution over technical fields of the patent applications of a company X_j. In the hypothetical case of [Table 3] above, applicant X_1 has filed many patent applications in technical fields Y_3 and Y_4 but none in technical fields Y_2 and Y_6.
Similarly, the p-dimensional vector whose components are the scores σ_k1, σ_k2, ..., σ_kp belonging to each row of the matrix is taken as the vector Y_k (k = 1, 2, ..., q). The vector Y_k represents the distribution of attribute X for the value Y_k of attribute Y; for example, it represents the distribution of applicants in a technical field Y_k. In the hypothetical case of [Table 3] above, applicant X_2 has filed many patent applications in technical field Y_1, while the other applicants have filed few.
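In code, the two vector groups are simply the columns and rows of the score matrix. The sketch below uses the scores of [Table 3]; the function names are illustrative.

```python
# Scores of [Table 3]: rows are Y1..Y6, columns are X1..X6
SIGMA = [
    [1, 8, 0, 1, 0, 1],
    [0, 0, 5, 2, 1, 0],
    [6, 0, 0, 0, 1, 0],
    [2, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
]


def column_vectors(matrix):
    """q-dimensional vectors X_j: one per column of the score matrix."""
    return [list(col) for col in zip(*matrix)]


def row_vectors(matrix):
    """p-dimensional vectors Y_k: one per row of the score matrix."""
    return [list(row) for row in matrix]


X = column_vectors(SIGMA)
Y = row_vectors(SIGMA)
```

`X[0]` is the vector X_1, showing many applications in Y_3 and Y_4 and none in Y_2 and Y_6; `Y[0]` is the vector Y_1, dominated by applicant X_2.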
[0057] The vectors X_j and Y_k may have the scores themselves as components, as described above, but it is preferable to use the logarithms of the scores σ_kj as components. This is because scores σ_kj based on a combination of two technical document attributes are non-negative and their distribution tends to concentrate near 0. Taking logarithms in such a case brings the distribution of the vector components closer to a normal distribution, which improves the reliability of the relevance calculation. Using logarithms is particularly desirable when the correlation coefficient is chosen as the measure of relevance. The logarithm is undefined for a score σ_kj of 0; in that case the logarithm of 0 may, for convenience, be replaced by -1 or another negative number, or 1 or another positive number may be added to every score before the logarithm is taken.
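A minimal sketch of the second option, adding a positive offset to every score before taking the logarithm. The natural logarithm and the default offset of 1 are choices made for this example; the text does not fix the base.

```python
import math


def log_components(vector, offset=1.0):
    """Take log(score + offset) so that zero scores remain defined."""
    return [math.log(s + offset) for s in vector]


x1 = [1, 0, 6, 2, 1, 0]  # vector X_1 of [Table 3]
log_x1 = log_components(x1)
```

With an offset of 1, a zero score maps to exactly 0, and large scores are compressed relative to small ones.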
[0058] Besides using the scores themselves or the logarithms of the scores σ_kj as components, a vector may also be generated by using as components the scores multiplied by the reciprocals of their occurrence frequencies.
For example, in [Table 3] above, for the value X_2 of attribute X, the scores σ_k2 occur three times over the range Y_1 to Y_6 of attribute Y (scores with σ_kj = 0 are not counted as occurrences). The scores σ_k2 corresponding to the value X_2 are therefore multiplied by 1/3, the reciprocal of this occurrence frequency. Likewise, for the value Y_1 of the other attribute Y, the scores σ_1j occur four times over the range X_1 to X_6 of attribute X, so the scores σ_1j corresponding to the value Y_1 are multiplied by 1/4, the reciprocal of this occurrence frequency. The score σ_12 = 8, for example, is thus multiplied by both the reciprocal 1/3 of the occurrence frequency for the value X_2 and the reciprocal 1/4 of the occurrence frequency for the value Y_1, so that the first component of the vector X_2, which is also the second component of the vector Y_1 (the component corresponding to the combination (X_2, Y_1)), becomes 8/(3 × 4). Multiplying every other score by the reciprocals of its occurrence frequencies in the same way yields the components shown in [Table 4]. The vectors formed from the components of the columns corresponding to the values X_1 to X_6 are the vectors X_1 to X_6, and the vectors formed from the components of the rows corresponding to the values Y_1 to Y_6 are the vectors Y_1 to Y_6.
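[Table 4]
(Table image imgf000022_0001: components obtained by multiplying each score of [Table 3] by the reciprocals of its occurrence frequencies.)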
In this way, vector components that occur in many vectors are given low values and components that occur only in particular vectors are given high values, so that vectors emphasizing the scores specific to each document attribute value can be generated.
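The inverse-frequency weighting can be checked against the worked figure 8/(3 × 4) with a short sketch. Exact fractions are used so the result can be compared without rounding; the function name is illustrative, and the scores are those of [Table 3].

```python
from fractions import Fraction

# Scores of [Table 3]: rows are Y1..Y6, columns are X1..X6
SIGMA = [
    [1, 8, 0, 1, 0, 1],
    [0, 0, 5, 2, 1, 0],
    [6, 0, 0, 0, 1, 0],
    [2, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
]


def inverse_frequency_weighted(matrix):
    """Divide each score by the number of nonzero scores in its row and in its column."""
    col_freq = [sum(1 for row in matrix if row[j] != 0) for j in range(len(matrix[0]))]
    row_freq = [sum(1 for s in row if s != 0) for row in matrix]
    return [
        [Fraction(s, row_freq[k] * col_freq[j]) if s else Fraction(0)
         for j, s in enumerate(row)]
        for k, row in enumerate(matrix)
    ]


weighted = inverse_frequency_weighted(SIGMA)
```

`weighted[0][1]` is the component for (X_2, Y_1), equal to 8/12. A nonzero score always has a row and column frequency of at least 1, so no division by zero can occur.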
<3-4. Calculation of Relevance>
Next, the first and second vector relevance calculation units 150 and 160 calculate the relevance among the p vectors X_j and the relevance among the q vectors Y_k, respectively (steps S150 and S160).
For example, in the hypothetical case of [Table 3] above, the relevance among the p vectors X_j is obtained, using the correlation coefficient for instance, as data such as that in [Table 5] below.
[Table 5]
(Table image imgf000023_0001: relevance among the vectors X_j of [Table 3], for example as correlation coefficients.)
The relevance calculation results are shown here for the vectors X_j corresponding to attribute X, but the same calculation can be performed for attribute Y. Besides the correlation coefficient, possible measures of relevance include the inner product and Spearman's rank correlation coefficient.
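A sketch of the correlation-based relevance calculation, assuming the Pearson correlation coefficient applied to the raw scores of [Table 3]. Treating a constant vector as uncorrelated (returning 0) is an assumption of this example, since the coefficient is undefined in that case.

```python
import math

# Scores of [Table 3]: rows are Y1..Y6, columns are X1..X6
SIGMA = [
    [1, 8, 0, 1, 0, 1],
    [0, 0, 5, 2, 1, 0],
    [6, 0, 0, 0, 1, 0],
    [2, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / math.sqrt(var_u * var_v)


X = [list(col) for col in zip(*SIGMA)]            # vectors X_1..X_6
R = [[pearson(a, b) for b in X] for a in X]       # p-by-p relevance matrix
```

With these inputs the coefficients for the pairs (X_3, X_4), (X_3, X_5) and (X_1, X_5) round to 0.84, 0.37 and 0.49, the values quoted in the description of the vector arrangement.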
[0061] <3-5. Arrangement of Vectors>
Next, the first and second vector arrangement units 170 and 180 arrange the vectors so that highly relevant vectors are placed closer to each other than less relevant ones. One such method is described below. The description mainly uses attribute X as the example, but the same procedure applies to attribute Y.
[0062] <3-5-1. Generation of a Cluster>
First, the first and second cluster generation units 171 and 181 place two vectors adjacent to each other to generate a cluster (steps S171 and S181).
In one example of the method, the two vectors with the highest mutual relevance among the p vectors X_j are selected and placed adjacent to each other to generate a cluster. In the example of [Table 5] above, the vectors X_3 and X_4, whose correlation coefficient is 0.84, are the most relevant pair, so they are placed adjacent to each other.
Generating the cluster from the two most relevant vectors guarantees that the most relevant vectors are adjacent, which ensures the quantitative objectivity of the vector arrangement.
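The selection of the initial pair can be sketched as a search over all pairs of vectors. The example assumes the Pearson correlation coefficient on the raw scores of [Table 3]; the helper names are illustrative, and vector indices are zero-based (index 2 is X_3).

```python
import math
from itertools import combinations

# Scores of [Table 3]: rows are Y1..Y6, columns are X1..X6
SIGMA = [
    [1, 8, 0, 1, 0, 1],
    [0, 0, 5, 2, 1, 0],
    [6, 0, 0, 0, 1, 0],
    [2, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
]


def pearson(u, v):
    """Pearson correlation coefficient; 0.0 if either vector is constant (assumption)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v) if var_u and var_v else 0.0


def most_related_pair(vectors):
    """Return the index pair (i, j), i < j, with the highest mutual relevance."""
    return max(
        combinations(range(len(vectors)), 2),
        key=lambda pair: pearson(vectors[pair[0]], vectors[pair[1]]),
    )


vectors = [list(col) for col in zip(*SIGMA)]  # X_1..X_6
i, j = most_related_pair(vectors)
cluster = [i, j]  # the two vectors are placed adjacent to each other
```

The resulting cluster holds the indices of X_3 and X_4, matching the pair named in the text.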
[0063] The vectors to be placed adjacent to each other may be selected by other methods. For example, to compare a particular applicant (such as one's own company) with the remaining applicants, the vector of that applicant and the vector most relevant to it may be placed adjacent to each other. Or, to compare two particular applicants (such as one's own company and a competitor) with each other and with the remaining applicants, the vectors of those two applicants may be placed adjacent to each other.
Hereinafter, a set of vectors placed adjacent to one another is referred to as a "cluster".
[0064] <3-5-2. Expansion of the Cluster>
Next, the first and second cluster expanding units 172 and 182 expand the cluster by adding a joining vector to it (steps S172 and S182).
Specifically, the most relevant pair is determined between the vectors at the two ends of the cluster and each of the remaining vectors not yet in the cluster. In the example above, the vector most relevant to either of the vectors X_3 and X_4 at the ends of the cluster is the vector X_5, whose correlation coefficient with the vector X_3 is 0.37. This vector X_5 is taken as the joining vector.
Once the most relevant pair has been determined, the two vectors are placed adjacent to each other to form a larger cluster. In the example above, the joining vector X_5 is placed next to the vector X_3 of the already adjacent vectors X_3 and X_4. However, the joining vector may instead be inserted at another position within the cluster.
Expanding the cluster by successively placing highly relevant vectors adjacent to one another in this way reliably places highly relevant vectors close together, producing an arrangement that makes the concentration and dispersion of the document attribute data evident.
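One expansion step can be sketched as follows, again assuming the Pearson correlation coefficient on the raw scores of [Table 3]. The function `expand_once` and its return shape are choices made for this example; the threshold check that may stop the expansion is deliberately left out of this step. Indices are zero-based, so the starting cluster `[2, 3]` stands for X_3, X_4.

```python
import math

# Scores of [Table 3]: rows are Y1..Y6, columns are X1..X6
SIGMA = [
    [1, 8, 0, 1, 0, 1],
    [0, 0, 5, 2, 1, 0],
    [6, 0, 0, 0, 1, 0],
    [2, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
]
VECTORS = [list(col) for col in zip(*SIGMA)]  # X_1..X_6


def pearson(u, v):
    """Pearson correlation coefficient; 0.0 if either vector is constant (assumption)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v) if var_u and var_v else 0.0


def expand_once(cluster, remaining):
    """Join the remaining vector most related to either end, next to that end."""
    best_rel, end, joiner = max(
        (pearson(VECTORS[e], VECTORS[c]), e, c)
        for e in (cluster[0], cluster[-1])
        for c in remaining
    )
    grown = [joiner] + cluster if end == cluster[0] else cluster + [joiner]
    return grown, [c for c in remaining if c != joiner], best_rel


step1 = expand_once([2, 3], [0, 1, 4, 5])   # start from the cluster X_3, X_4
step2 = expand_once(step1[0], step1[1])     # expand the enlarged cluster again
```

The first step yields the order X_5, X_3, X_4 (X_5 joins next to X_3), and the second yields X_1, X_5, X_3, X_4.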
[0065] If no vector remains outside the cluster as a result of the expansion (steps S173 and S183: NO), the arrangement of the vectors ends. If vectors outside the cluster remain (steps S173 and S183: YES), the process proceeds to steps S174 and S184, respectively.
[0066] In steps S174 and S184, the first and second cluster expansion stop determination units 174 and 184 determine whether every relevance between the end vectors and the vectors outside the cluster is equal to or below a predetermined threshold. If even one relevance exceeds the threshold (steps S174 and S184: NO), the process returns to steps S172 and S182, respectively, and the cluster is expanded further.
For example, if, for the cluster in which the vectors X_5, X_3 and X_4 are adjacent in this order, the vector most relevant to either of the ends X_5 and X_4 is the vector X_1, whose correlation coefficient with the vector X_5 is 0.49, the joining vector X_1 is placed next to the vector X_5.
[0067] Which end of the cluster a highly relevant vector is to be joined to may be decided in advance. For example, if highly relevant vectors are determined with respect to only one end of the cluster and joined there, the vectors that formed the initial cluster end up at an edge of the matrix. Alternatively, if highly relevant vectors are determined and joined at the one end and the other end of the cluster alternately, the vectors that formed the initial cluster end up at the center of the matrix.
[0068] <3-5-3. Generation of Other Clusters>
If every relevance is equal to or below the predetermined threshold in steps S174 and S184 (steps S174 and S184: YES), the process proceeds to steps S175 and S185, respectively. In steps S175 and S185, the first and second cluster regeneration units 175 and 185 place two of the vectors outside the above cluster adjacent to each other to generate another cluster.
The first and second cluster re-expansion units 176 and 186 then expand the other cluster by adding joining vectors to it (steps S176 and S186). In other words, when no remaining vector has a relevance above the threshold, a cluster is generated anew from the remaining vectors alone, and the same cluster expansion procedure as above is repeated.
[0069] In this way, when the relevance to the vectors at both ends of a cluster is at or below the predetermined threshold, vectors are not forced into a single cluster. Giving priority to combinations of vectors with higher relevance improves the reliability of the vector arrangement.
When the correlation coefficient is used as the relevance, the threshold is desirably 0 (no correlation). Using the correlation coefficient to evaluate relevance is also advantageous in that the threshold is easy to set in this way.
[0070] If, as a result of expanding the other cluster, no vector remains outside the clusters (steps S177 and S187: NO), the vector arrangement ends. If vectors that have not joined any cluster remain (steps S177 and S187: YES), the process proceeds to steps S178 and S188, respectively.
[0071] In steps S178 and S188, it is determined whether the relevance to every vector outside the clusters is at or below the predetermined threshold. If even one relevance exceeds the threshold (steps S178 and S188: NO), the process returns to steps S176 and S186, respectively, and the other cluster is expanded further. If every relevance is at or below the threshold (steps S178 and S188: YES), the process returns to steps S175 and S185, respectively, and yet another cluster is generated.
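The expand, stop, and regenerate loop described in these steps can be sketched as follows. This is a minimal illustration rather than the apparatus itself: the function name `arrange_into_clusters`, the dictionary-of-pairs representation of relevance, and the choice to seed each cluster with the most related remaining pair are all assumptions made for the example.

```python
from itertools import combinations


def arrange_into_clusters(names, relevance, threshold=0.0):
    """Group vectors into ordered clusters by greedy end-to-end chaining.

    names:     list of vector labels
    relevance: dict mapping frozenset({a, b}) -> relevance of a and b
    threshold: expansion stops when no relevance exceeds this value
    """
    remaining = list(names)
    clusters = []
    while remaining:
        if len(remaining) == 1:
            clusters.append([remaining.pop()])
            break
        # Assumption: seed each cluster with the most related remaining pair.
        a, b = max(combinations(remaining, 2),
                   key=lambda pair: relevance[frozenset(pair)])
        cluster = [a, b]
        remaining.remove(a)
        remaining.remove(b)
        while remaining:
            # Best candidate measured against either end of the cluster.
            score, at_front, cand = max(
                (relevance[frozenset((end, v))], at_front, v)
                for at_front, end in ((True, cluster[0]), (False, cluster[-1]))
                for v in remaining
            )
            if score <= threshold:
                break  # nothing exceeds the threshold: start another cluster
            if at_front:
                cluster.insert(0, cand)
            else:
                cluster.append(cand)
            remaining.remove(cand)
        clusters.append(cluster)
    return clusters


# Hypothetical relevance values (e.g. correlation coefficients).
pairs = {("A", "B"): 0.9, ("B", "C"): 0.5, ("A", "C"): 0.1,
         ("A", "D"): -0.2, ("B", "D"): -0.3, ("C", "D"): -0.1}
relevance = {frozenset(p): r for p, r in pairs.items()}
clusters = arrange_into_clusters(["A", "B", "C", "D"], relevance)
```

With a threshold of 0, vector D is negatively correlated with both ends of the first cluster, so it is left out and starts a cluster of its own instead of being forced in.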
[0072] The above processing yields multiple clusters, so as a final step these clusters are placed adjacent to one another. Possible methods include arranging the clusters in one direction, from one end to the other, in descending or ascending order of cluster size (the number of vectors in the cluster), and arranging them alternately from both ends toward the center.
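The two cluster orderings mentioned here can be sketched as below. The function names are hypothetical, and the detail of which end receives the first cluster in the alternating method is an assumption; the text only says the clusters are placed alternately from both ends toward the center.

```python
def order_one_direction(clusters, descending=True):
    """Arrange clusters from one end to the other by size."""
    return sorted(clusters, key=len, reverse=descending)


def order_from_both_ends(clusters, descending=True):
    """Arrange clusters alternately from both ends toward the center."""
    ranked = sorted(clusters, key=len, reverse=descending)
    left, right = [], []
    for i, cluster in enumerate(ranked):
        # Even ranks fill from the left end, odd ranks from the right end.
        (left if i % 2 == 0 else right).append(cluster)
    return left + right[::-1]


example = [["a"], ["b", "c", "d"], ["e", "f"]]
one_way = order_one_direction(example)
both_ends = order_from_both_ends(example)
# Flatten to obtain the final order of vectors along the matrix axis.
axis_order = [v for cluster in both_ends for v in cluster]
```

In the alternating variant with descending order, the largest clusters sit at the two edges and the smallest end up in the middle.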
The same procedure is carried out for attribute Y as well as attribute X, which completes the arrangement. For the example above, the result is as shown in [Table 6] below.
[Table 6]
Figure imgf000026_0001
Note that, after the score calculation in step S120, the processing by the first vector group generation unit 130, the first vector relevance calculation unit 150, and the first vector arrangement unit 170 (steps S130, S150, and S171 to S178) and the processing by the second vector group generation unit 140, the second vector relevance calculation unit 160, and the second vector arrangement unit (steps S140, S160, and S181 to S188) may be executed in either order, or both may be executed in parallel. Alternatively, only one of the two may be executed. Executing only one is appropriate when, for example, attribute X is a personal attribute such as the applicant and attribute Y is a technical classification under a code system such as the IPC; in that case the result may be easier to read if attribute Y is arranged in the order of the systematized code numbers rather than by relevance.
[0073] <3-6. Output Example>
The output from the output device 4 may take the form of [Table 6] above. To make it easier to read, the score distribution may instead be displayed with patterns or colors assigned according to the score. For example, regions where high scores are distributed are preferably given dark or warm colors, and regions where low scores are distributed light or cool colors. Showing the scores only as numbers may not make the distribution apparent at a glance, whereas adding patterns or colors makes it easy to see.
[0074] FIG. 3 shows one display example produced by the display unit. In this figure, regions where the distribution is dense are marked with cross-hatching of high line density, and regions where it is sparse with cross-hatching of low line density. As the figure shows, presenting the score distribution as a so-called cloud map or contour map makes its density clear and makes the distribution easier to discern.
[0075] FIG. 4 shows another display example produced by the display unit. This figure shows concrete attribute values for the case where "applicant" is chosen as the first attribute X and "technical field" as the second attribute Y. Here too, dense regions are marked with cross-hatching of high line density and sparse regions with cross-hatching of low line density, so the density of the score distribution is clear. That is, selecting a particular applicant and looking at where the distribution is dense reveals the main technical fields that applicant is developing, and selecting a particular technical field and looking at where the distribution is dense reveals the main applicants developing in that field.
[0076] Using a personal attribute and a technical field attribute as in FIG. 4 enables analyses such as the following.
[0077] Because the relationship between the technology development areas of one's own company and those of other companies is shown:
(a) Companies with a similar development tendency can be found. In FIG. 4, for example, if "E Automobile" is one's own company, the adjacent "F Electric" can be found. The companies found this way are not necessarily ones currently competing with one's own company in the market. If "F Electric", compared with one's own "E Automobile", has a similar development tendency (for example in "batteries" and "ceramics") yet has already entered an industry that one's own company has not (for example, electrical products), the technical hurdles to newly entering that industry can be expected to be low.
(b) The strengths and weaknesses of one's own development department can be identified by comparison with a company that competes in the same market but has a different development tendency. In FIG. 4, for example, if one's own company is "D Electric Machinery", which is strong in "semiconductors" but weak in "electrical/electronics", comparison with "A Electric Machinery", which has a different development tendency and is strong in "electrical/electronics" but weak in "semiconductors", reveals those strengths and weaknesses.
(c) Technology alliance partners that have a different development tendency and can offset each other's development weaknesses can be found. In FIG. 4, for example, if one's own company is "C Works", which specializes in "semiconductors" and "optics" and has no other strong fields, a company such as "A Electric Machinery", which has a different development tendency and is strong in "electrical/electronics" and similar fields, can be found.
[0078] In addition, because the relationship between the developing entities of one technical field and another is shown, the relationship between technical fields can be analyzed. For example, when the technical fields being compared tend to be handled together by the same company ("E Automobile" or "F Electric"), as with the adjacent "batteries" and "ceramics" in FIG. 4:
(a) The possibility that handling both leads to an existing business can be identified, which helps in judging whether to enter that business and whether technology development is needed to do so. Alternatively,
(b) The possibility of transferring technology between the two fields can be identified, even if they appear technically unrelated at first glance.
[0079] FIG. 4 was described using an example in which one of the two attributes is a personal attribute and the other a technical field attribute, but the invention is not limited to this. Both attributes may be technical field attributes; for example, one may be a technical classification and the other a technical element. Alternatively, one may be the IPC main classification (section, class) and the other the IPC sub-classification (group, subgroup).
[0080] As described above, this embodiment enables a company to grasp for itself the technology development results produced by its own research and development organization and the current state of its technology asset portfolio, and to obtain objective guidance on its future development direction, thereby contributing to the company's technology development investment decisions.
Furthermore, applying the method of the present invention to various combinations of technical document attributes makes it possible to analyze the current state of a particular company's development organization more precisely and from more angles, and the results of that analysis can more effectively support the company's decision-making on the direction of future development.
[0081] <4. Second Embodiment>
Next, a second embodiment of the present invention is described. The hardware configuration of the technical document attribute relevance analysis support apparatus according to the second embodiment is the same as that of the first embodiment (FIG. 1), so its description is omitted.
[0082] FIG. 5 is a flowchart showing the operating procedure of the processing device 1 in the relevance analysis support apparatus of the second embodiment.
The second embodiment is characterized mainly by the part corresponding to the processing up to generation of the first and second vector groups in the first embodiment. Specifically, in the second embodiment, problem words and solution words contained in the documents are used as the attributes X and Y of the technical documents, and the rate of increase or decrease in the number of technical documents having the same combination of problem word and solution word is used as the score that forms the vector components. The processing that arranges the generated vector groups is largely the same as in the first embodiment. The operating procedure of the second embodiment is described in detail below.
[0083] <4-1. Acquisition of the Technical Document Group>
First, the data acquisition unit 110 acquires the group of technical documents to be analyzed, based on the acquisition conditions for the analysis target document group entered from the input device 2 (step S210). The technical documents acquired may be of any type, such as patent documents or technical papers, but patent documents are particularly suitable because the problem words and solution words described next are written in a format that can be extracted by computer processing. The acquisition conditions may, for example, be specified by IPC code, or a predetermined number of documents most similar to a particular technical document may be acquired.
[0084] <4-2. Selection of Problem Words and Solution Words>
Next, the data acquisition unit 110 extracts candidate "problem words" and "solution words" from each document of the acquired analysis target document group (step S211). For example, if the abstract or another part of a document has headings such as "Problem" or "Means for Solving", the words in those parts are extracted. Likewise, if a document contains phrasing such as "The object of the present invention is ..." or "To solve this problem, the present invention ...", words are extracted from the text immediately following that phrasing.
[0085] Next, the data acquisition unit 110 selects, from the extracted candidates, the "problem words" and "solution words" to be used in the analysis (step S212). One possible selection method is to take, for the problem word candidates and the solution word candidates respectively, a predetermined number (for example, 100 each) of candidates with the highest document frequency in the analysis target document group (DF: the number of documents hit when the analysis target document group is searched with the index term); other methods may also be used.
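Selection by document frequency can be sketched as follows, assuming the candidate words of each document are already available as lists of strings. The function name and the sample words are illustrative only.

```python
from collections import Counter


def select_by_document_frequency(candidates_per_doc, top_n):
    """Return the top_n candidate words ranked by document frequency (DF)."""
    df = Counter()
    for words in candidates_per_doc:
        # A word counts once per document, however often it occurs there.
        df.update(set(words))
    return [word for word, _ in df.most_common(top_n)]


# Hypothetical problem-word candidates extracted from three documents.
docs = [["leak", "cost", "leak"], ["cost", "noise"], ["cost", "leak"]]
selected = select_by_document_frequency(docs, top_n=2)
```

The same function would be applied once to the problem word candidates and once to the solution word candidates, with the top 100 taken from each as in the example in the text.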
[0086] <4-3. Calculation of Factor Loadings>
Next, the data acquisition unit 110 performs factor analysis using the selected "problem words" and calculates the factor loadings of each problem word (step S213). Specifically, this is done as follows.
Let I be the number of documents in the analysis target document group and denote each document by i (i = 1, 2, ..., I). Let G be the number of selected problem words and denote each problem word by g (g = 1, 2, ..., G). For each of the I documents i, the weight z_ig of each problem word g is calculated. This yields data of I rows and G columns as shown below. The I-row, G-column matrix whose elements are z_ig is denoted Z.
[Table 7]
              Index term 1   Index term 2   ...   Index term G
Document 1    z_11           z_12           ...   z_1G
Document 2    z_21           z_22           ...   z_2G
Document 3    z_31           z_32           ...   z_3G
...           ...            ...            ...   ...
Document I    z_I1           z_I2           ...   z_IG
[0087] Here, the weight is a quantity assigned to each problem word in each document from a given point of view; TFIDF, for example, is preferably used. For a given index term, TFIDF is the product of the index term frequency (TF: the number of times the problem word occurs in a document) and the inverse document frequency (IDF: the reciprocal of the document frequency, or the reciprocal of the logarithm of the document frequency, where the document frequency DF is the number of documents in a given document set in which the problem word occurs). A high TFIDF value is obtained for a problem word that is used many times in the document for which the document vector is calculated but is used little in the document set as a whole.
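As a rough illustration of how the weight matrix Z could be filled, the sketch below computes TF multiplied by the reciprocal of DF, which is the first of the two IDF forms named in the text. The function name, the token-list input format, and the sample data are assumptions; practical systems often use a log-scaled IDF instead.

```python
def tfidf_matrix(docs, terms):
    """Build the I-by-G weight matrix Z with z_ig = TF(i, g) * (1 / DF(g)).

    docs:  list of documents, each a list of word tokens
    terms: list of the G selected words (column order of Z)
    """
    # DF: number of documents in which each term occurs at least once.
    df = {t: sum(1 for d in docs if t in d) for t in terms}
    return [
        [d.count(t) / df[t] if df[t] else 0.0 for t in terms]
        for d in docs
    ]


# Hypothetical tokenized documents and selected terms.
docs = [["a", "a", "b"], ["b"], ["b", "c"]]
Z = tfidf_matrix(docs, ["a", "b", "c"])
```

A term that appears in every document (here "b") receives a small weight everywhere, while a term concentrated in one document (here "a") receives a large weight there.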
[0088] Next, factor loadings are calculated by a factor analysis in which each document i is treated as a subject, each problem word g as an observed variable, and each weight z_ig as the subject's response.
Specifically, let the number of factors be H, denote each factor by h (h = 1, 2, ..., H), and let a_gh be the factor loading of problem word g on factor h. Let f_ih be the factor score of document i for factor h. The factor loading matrix A, whose elements are the factor loadings a_gh, and the factor score matrix F, whose elements are the factor scores f_ih, are then defined as follows.
[Table 8]
Figure imgf000031_0001
[Table 9]
Figure imgf000031_0002
Next, let E be the I-row, G-column residual matrix, and solve the equation
Z = F A^t + E
(where A^t is the transpose of A)
as follows to obtain the factor loading matrix A.
[0090] Regarding the factor scores f_ih, which are the elements of the factor score matrix F, and the residuals e_ig, which are the elements of the residual matrix E, assume that (1) the factor scores are standardized to mean 0 and standard deviation 1, (2) the correlation between factor scores is 0, (3) the correlation between residuals is 0, and (4) the correlation between each factor score and each residual is 0. Under these assumptions it is known that, in general,
R = A A^t + V
(where R is the correlation matrix of the observed variables and V is the variance-covariance matrix of the residuals)
holds. The factor loadings are therefore obtained from
A A^t = R - V
Next, let R - V = R*. To calculate R*, the correlation matrix R is first computed from the values of the elements z_ig of the matrix Z, and R* is then estimated by replacing the diagonal elements of the correlation matrix with estimates of the communalities (communality estimation methods include, for example, the SMC method and the RMAX method). Since R* = A A^t, the factor loading matrix A is calculated from R* to obtain the factor loadings (methods include, for example, the principal factor method, the least squares method, and the maximum likelihood method).
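For readers who want to see where the relation A A^t = R - V comes from, the following derivation sketches it from the model Z = F A^t + E. It assumes, beyond what the text states, that each column of Z has been standardized so that R = Z^t Z / I, and it writes Id_H for the H-by-H identity matrix.

```latex
% Assumption: columns of Z are standardized, so R = Z^t Z / I.
\begin{align*}
R &= \tfrac{1}{I}\, Z^{t} Z
   = \tfrac{1}{I}\,(F A^{t} + E)^{t}(F A^{t} + E) \\
  &= A\Bigl(\tfrac{1}{I}F^{t}F\Bigr)A^{t}
   + A\Bigl(\tfrac{1}{I}F^{t}E\Bigr)
   + \Bigl(\tfrac{1}{I}E^{t}F\Bigr)A^{t}
   + \tfrac{1}{I}E^{t}E \\
  &= A\,\mathrm{Id}_{H}\,A^{t} + 0 + 0 + V
   \qquad\text{by assumptions (1)--(4)} \\
  &= A A^{t} + V .
\end{align*}
```

Assumptions (1) and (2) give F^t F / I = Id_H, assumption (4) removes the two cross terms, and assumption (3) makes E^t E / I the diagonal matrix V, which is why only the diagonal of R needs to be replaced by communality estimates to obtain R*.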
[0091] To find more meaningful factors, it is desirable to perform the operation known as factor rotation. Factor axis rotation methods include orthogonal rotations such as varimax, quartimax, equamax, parsimax, orthomax, and orthogonal Procrustes, and oblique rotations such as promax, oblimin, Harris-Kaiser, and oblique Procrustes.
[0092] The data acquisition unit 110 also performs factor analysis on the "solution words" and calculates the factor loadings of each solution word (step S214). The calculation method is the same as that described for the problem words.
[0093] <4-4. Selection of Factors>
Next, the data acquisition unit 110 selects a predetermined number of factors from those obtained by the factor analyses of the problem words and the solution words (referred to as "problem factors" and "solution factors", respectively) (steps S215 and S216). For example, a predetermined number of factors with the largest eigenvalues are selected. The number of factors selected is arbitrary; here, p problem factors and q solution factors are selected.
In terms of the first embodiment, the second embodiment selects "problem factor" and "solution factor" as the two attributes X and Y, and takes as the value ranges of these attributes the p problem factors and the q solution factors with the largest eigenvalues, respectively.
[0094] <4-5. Determination of the Attribution Factors of Problem Words and Solution Words>
Next, the data acquisition unit 110 determines the attribution factor of each problem word and each solution word (steps S217 and S218).
For example, if, among the factor loadings of a problem word (or solution word) g on each factor (excluding factors not selected in the factor selection above), the loading a_gh on a factor h is the largest, factor h is taken as the attribution factor of that problem word (or solution word) g. In this case, each problem word (or solution word) can belong to only one factor, but a factor may have more than one problem word (or solution word) belonging to it.
A lower limit may also be set for the factor loading: if the maximum factor loading a_gh of a problem word (or solution word) g is below the lower limit, that word is treated as belonging to no factor.
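The attribution rule, including the optional lower limit, can be sketched as below. The function name, the dictionary layout of the loadings, and the use of `None` for "belongs to no factor" are assumptions for illustration. The sketch compares signed loadings, as the text describes; comparing absolute values would be a different design choice.

```python
def assign_factors(loadings, selected, lower_limit=None):
    """Assign each word to the selected factor on which it loads most.

    loadings:    dict mapping word -> list of loadings [a_g1, ..., a_gH]
    selected:    indices of the factors kept in the factor selection step
    lower_limit: if given, words whose best loading is below it get None
    """
    assignment = {}
    for word, row in loadings.items():
        best = max(selected, key=lambda h: row[h])
        if lower_limit is not None and row[best] < lower_limit:
            assignment[word] = None  # belongs to no factor
        else:
            assignment[word] = best
    return assignment


# Hypothetical loadings on three factors; only factors 0 and 1 were selected.
loadings = {
    "leak": [0.8, 0.1, 0.3],
    "cost": [0.2, 0.6, 0.9],
    "noise": [0.1, 0.2, 0.05],
}
assignment = assign_factors(loadings, selected=[0, 1], lower_limit=0.3)
```

Note that "cost" loads most heavily on factor 2, but because that factor was not selected it is attributed to factor 1, the best of the selected factors.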
[0095] <4-6. Matrix Creation>
Next, the score calculation unit 120 counts the number of corresponding technical documents for each combination of a problem word and a solution word whose attribution factors have been determined (step S220). For example, an AND search is run for documents that contain, in the document or in its abstract, both one such problem word and one such solution word, and the number of hits is taken as the number of corresponding technical documents.
[0096] Next, the score calculation unit 120 totals the number of documents for each combination of a problem factor and a solution factor (step S221). For example, the numbers of corresponding technical documents are summed over all combinations of one problem word belonging to a given problem factor and one solution word belonging to a given solution factor. If, for example, three problem words Xg_1, Xg_2, and Xg_3 belong to a problem factor and two solution words Yg_1 and Yg_2 belong to a solution factor, then the sum of
the number of corresponding technical documents for (Xg_1, Yg_1),
the number of corresponding technical documents for (Xg_1, Yg_2),
the number of corresponding technical documents for (Xg_2, Yg_1),
the number of corresponding technical documents for (Xg_2, Yg_2),
the number of corresponding technical documents for (Xg_3, Yg_1), and
the number of corresponding technical documents for (Xg_3, Yg_2)
is the number of documents for the combination of that problem factor and that solution factor.
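The totaling over word pairs can be sketched as follows, assuming the per-pair document counts from the AND search and the word-to-factor assignments are already available as dictionaries. All names and counts here are hypothetical.

```python
from collections import defaultdict


def factor_pair_counts(pair_counts, problem_factor_of, solution_factor_of):
    """Sum per-word-pair document counts into per-factor-pair totals.

    pair_counts:        dict mapping (problem_word, solution_word) -> count
    problem_factor_of:  dict mapping problem word -> problem factor (or None)
    solution_factor_of: dict mapping solution word -> solution factor (or None)
    """
    totals = defaultdict(int)
    for (problem_word, solution_word), count in pair_counts.items():
        pf = problem_factor_of.get(problem_word)
        sf = solution_factor_of.get(solution_word)
        if pf is None or sf is None:
            continue  # words that belong to no factor are skipped
        totals[(pf, sf)] += count
    return dict(totals)


# Three problem words and two solution words, each set on one factor.
problem_factor_of = {"Xg1": 0, "Xg2": 0, "Xg3": 0}
solution_factor_of = {"Yg1": 0, "Yg2": 0}
pair_counts = {
    ("Xg1", "Yg1"): 1, ("Xg1", "Yg2"): 2,
    ("Xg2", "Yg1"): 3, ("Xg2", "Yg2"): 4,
    ("Xg3", "Yg1"): 5, ("Xg3", "Yg2"): 6,
}
totals = factor_pair_counts(pair_counts, problem_factor_of, solution_factor_of)
```

Because the counts are summed per word pair, a document that contains several of the word pairs is counted once for each pair, which follows the summation described in the text.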
[0097] The method of totaling the number of documents for each factor combination is not limited to this. For example, the combination of factors to which each document belongs may be determined from the factor scores f_ih of each document i for each factor h calculated in the factor analysis described above, and the number of documents may be totaled on that basis.
[0098] Calculating the number of documents for each combination of a problem factor and a solution factor in this way yields a document count matrix of p rows and q columns, since there are p × q combinations of the p problem factors and the q solution factors.
This document count matrix shows how many technical documents exist for each combination of problem factor and solution factor. It is useful for grasping which problems and solutions are attracting attention in a technical field, for finding multiple problems (applications) that a technology can solve by focusing on a particular solution factor (one row of the matrix), and for finding multiple solutions to a problem by focusing on a particular problem factor (one column of the matrix).
[0099] 図 6は、第二実施形態で生成される文書数マトリクスの一例を示したものである。こ の文書数マトリクスは、「半導体装置及びその製造方法」に関するある特許文献 iの類 似度上位所定件数の特許文献を抽出し、上述の方法により課題語及び解決語につ いてそれぞれ因子分析を行って得られたものである。このマトリクスの欄外に、各課題 因子及び各解決因子に含まれる課題語群及び解決語群に基づいて分析者が解釈 した因子の意味が記載されて 、る。  FIG. 6 shows an example of the document number matrix generated in the second embodiment. This document number matrix is used to extract a predetermined number of patent documents similar to the top of a certain patent document i relating to “semiconductor devices and their manufacturing methods”, and perform factor analysis for each of the problem word and the solution word by the method described above. It is obtained by going. The meaning of the factor interpreted by the analyst based on the task word group and the solution word group included in each task factor and each solution factor is described in the margin of this matrix.
First, look at the matrix vertically. Totaling the number of patent documents along the vertical axis reveals the main problems of the document group under analysis. In this example, the counts for problem factors 1 and 2 are large. Therefore, in the group of documents similar to patent document i relating to "semiconductor device and method of manufacturing the same", the main problems can be said to be miniaturization and manufacturing control. Furthermore, calculating the average filing year for each column shows that problem factor 3, although small in number, has a concentration of relatively recent patent documents. In other words, the main problem is shifting from miniaturization and manufacturing control to power consumption. This suggests that the trend is moving from stationary applications such as personal computers toward battery-driven applications such as portable terminals.
Next, look at the matrix horizontally. For problem factor 1, solution factors 1 and 2 have many patent documents. In other words, lithography and etching are the main solutions for miniaturization. Solution factor 2 also has many patent documents for problem factor 2. That is, etching can also be an effective solution for manufacturing control. Various further analyses become possible, for example, by looking at the applicant composition of each solution factor under problem factor 1, or by focusing on a particular cell and following its year-by-year change.
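The vertical reading described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the record layout `(problem_factor, solution_factor, filing_year)`, the sample data, and the function names are all assumptions, and the matrix is laid out with solution factors as rows and problem factors as columns.

```python
from collections import defaultdict

# Hypothetical records: (problem_factor, solution_factor, filing_year).
docs = [
    (1, 1, 1998), (1, 1, 2000), (1, 2, 1999),
    (2, 2, 2001), (2, 2, 1999), (3, 1, 2004), (3, 3, 2005),
]

def count_matrix(docs, n_problem, n_solution):
    """Document-count matrix: rows are solution factors, columns are problem factors."""
    m = [[0] * n_problem for _ in range(n_solution)]
    for prob, sol, _year in docs:
        m[sol - 1][prob - 1] += 1
    return m

def column_summary(docs, n_problem):
    """Per problem factor: (document count, average filing year or None)."""
    years = defaultdict(list)
    for prob, _sol, year in docs:
        years[prob].append(year)
    return [
        (len(years[p]), sum(years[p]) / len(years[p]) if years[p] else None)
        for p in range(1, n_problem + 1)
    ]

matrix = count_matrix(docs, 3, 3)
summary = column_summary(docs, 3)
```

In this made-up data, problem factor 3 has few documents but the latest average filing year, which mirrors the kind of observation made for FIG. 6.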
As described above, when one of the attributes is the problem factor and the other is the solution factor, and the problem factor is taken to represent an inconvenience that can arise in some application while the solution factor is a technology that can eliminate it, the application can be inferred from the problem factor and the technology from the solution factor.
Furthermore, by tallying the solution factors for a given problem by company, each company's technology strategy for the same problem can be analyzed.
[0100] Each element (document count) of this p-row, q-column document-count matrix may be used as the score σ to generate the first and second vector groups as in the first embodiment, and the vectors may then be arranged based on the relevance between them so that the concentration and dispersion of the problem factors and solution factors can be analyzed. In the second embodiment, however, vector groups are further generated as follows.
[0101] <4-7. Creation of the increase/decrease rate matrix>
The score calculation unit 120 classifies each element of the p-row, q-column document-count matrix by predetermined period (step S222). For patent documents, for example, the documents may be classified by filing year or by spans of several years. Preferably, they are classified into two periods, before and after a predetermined point in time.
[0102] Next, for each element of the p-row, q-column document-count matrix, the score calculation unit 120 calculates the rate of increase or decrease in the number of technical documents based on the above classification by period. When the documents are classified into two periods, one rate is calculated for each element of the document-count matrix, so a single p-row, q-column increase/decrease rate matrix is generated. When the documents are classified into T periods (T ≥ 3), either (T − 1) p-row, q-column increase/decrease rate matrices may be generated, one for each pair of adjacent periods, or a single matrix of average rates may be generated.
The increase/decrease rate matrix generated in this way makes it possible to detect changes in the trends of problems and solutions. For example, focusing on a specific solution factor (one row of the matrix) reveals changes in the applications of that technology, and focusing on a specific problem factor (one column of the matrix) reveals changes in the solutions applied to that problem.
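The two options for T periods can be sketched as follows. The text here does not give the formula for the rate, so this sketch assumes the rate is (later count − earlier count) / earlier count and marks it as undefined (`None`) when the earlier count is zero. The function names and sample counts are hypothetical.

```python
def rate_matrix(earlier, later):
    """Element-wise (later - earlier) / earlier; None where the earlier count is 0.

    The rate definition is an assumption made for this sketch.
    """
    return [
        [(b - a) / a if a else None for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(earlier, later)
    ]

def adjacent_rate_matrices(period_matrices):
    """For T per-period count matrices, return the T - 1 adjacent-period rate matrices."""
    return [rate_matrix(a, b) for a, b in zip(period_matrices, period_matrices[1:])]

def average_rate_matrix(rate_matrices):
    """Element-wise mean over the rate matrices, skipping undefined (None) entries."""
    rows, cols = len(rate_matrices[0]), len(rate_matrices[0][0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            vals = [m[r][c] for m in rate_matrices if m[r][c] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        out.append(row)
    return out

# Hypothetical document counts for T = 3 periods (2 x 2 matrices for brevity).
periods = [
    [[4, 2], [1, 0]],
    [[6, 2], [2, 3]],
    [[3, 4], [2, 6]],
]
rates = adjacent_rate_matrices(periods)   # T - 1 = 2 matrices
avg = average_rate_matrix(rates)          # single averaged matrix
```

How a zero count in the earlier period should be treated is a design choice; leaving the entry undefined avoids a division by zero but means that cell carries no trend information.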
[0103] <4-8. Generation of vectors, etc.>
The subsequent processing is the same as in the first embodiment. The first and second vector group generation units 130 and 140 generate the first and second vector groups using each element (increase/decrease rate) of this p-row, q-column increase/decrease rate matrix as the score σ (steps S230 and S240).
Then the first and second vector relevance calculation units 150 and 160 each calculate the relevance between vectors (steps S250 and S260), and the first and second vector arrangement units 170 and 180 each arrange the vectors (steps S271 to S278 and S281 to S288).
Regarding the first and second vector groups, in this second embodiment the q-dimensional vectors for the p problem factors are called "problem factor publication-count increase/decrease rate vectors", and the p-dimensional vectors for the q solution factors are called "solution factor publication-count increase/decrease rate vectors". Likewise, the first and second clusters are called "problem factor clusters" and "solution factor clusters", respectively.
Arranging the vectors of the increase/decrease rate matrix in this way makes it possible to analyze the concentration and dispersion of trends in the problem factors and solution factors.
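The vector generation and relevance calculation can be sketched as follows, with problem factors taken as columns and solution factors as rows. This passage does not state which relevance measure is used, so cosine similarity is assumed here purely for illustration. The function names and the sample rate matrix are hypothetical.

```python
import math

def column_vectors(matrix):
    """One vector per column of the matrix (here: one per problem factor)."""
    return [list(col) for col in zip(*matrix)]

def row_vectors(matrix):
    """One vector per row of the matrix (here: one per solution factor)."""
    return [list(row) for row in matrix]

def cosine(u, v):
    """Cosine similarity, used as an assumed relevance measure; 0.0 for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def relevance_table(vectors):
    """Symmetric table of pairwise relevance values."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

# Hypothetical increase/decrease rate matrix (rows: solution factors).
rates = [
    [0.5, 0.4, -0.2],
    [1.0, 0.8, 0.0],
    [-0.3, -0.2, 0.6],
]
problem_vectors = column_vectors(rates)
solution_vectors = row_vectors(rates)
problem_relevance = relevance_table(problem_vectors)
```

With these sample rates, problem factors 1 and 2 rise and fall together across the solution factors, so their relevance is high and an arrangement step would place them next to each other.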
When each element of the matrix is the increase/decrease rate of the document count or the like, the changes over time in the problem factors (applications) and solution factors (technologies) can be grasped in detail. In particular, the matrix can be visualized so that the problem factors (applications) and solution factors (technologies) with marked increases or decreases can be identified quickly. In some cases, elements whose counts are trending upward can also be discovered.
When a particular solution factor (technology) is trending upward for a given problem factor (application), it can be detected that the mainstream technology for that application has been changing. Similarly, signs that the application of a given technology is changing can be caught. This points to the possibility of diverting a technology, as a seed, to new needs, and it can serve as a basis for formulating seed-based technology development strategies.
[0104] <5. Other Embodiments>
The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the gist of the invention.
For example, the first embodiment described the case in which, of the attributes arranged on the axes of the matrix, one is a personal attribute and the other is a technical field attribute, with the applicant given as an example of the personal attribute. This is merely an example. Other personal information, such as the inventor, may be used as the personal attribute. In that case as well, the same effects as in the first embodiment can be obtained.
The second embodiment described the case in which the document count is used as the score forming each element of the matrix and the case in which the increase/decrease rate of the document count or the like is used, but the score is not limited to these. Any score that depends on the data of the technical documents may be used as the score forming each element of the matrix.
A single matrix may be generated for one technical document group under analysis, or a plurality of matrices may be generated by classifying the elements of a matrix by, for example, predetermined period and dividing the matrix into one matrix per period.
When a plurality of matrices is generated, for example by dividing into one matrix per predetermined period, following the patent documents in the matrix elements by filing year gives a general picture of the trends in the document group under analysis (for example, the technology trend for a certain application). Furthermore, when one of the attributes is the problem factor and the other is the solution factor, for example, several applications, the technologies that constitute them, and the main problems are organized, and it becomes possible to grasp comprehensively which solutions were mainstream at which times.
In the increase/decrease rate relevance matrix creation process described in the second embodiment (see FIG. 5), after the processing of S221 the matrix is classified by predetermined period (S222), the increase/decrease rate of the number of applicable publications is calculated for each combination of problem factor and solution factor in each period (S223), and the processing of S230 to S277 (or S240 to S287) is then performed. The process is not limited to this order, however. For example, the processing of S222 and S223 may be performed after the processing of S277 (or S287) rather than after S221.
By generating the relevance matrix for the problem factors and solution factors that form the axes of the matrix in this way, similar problem factors come to lie next to one another, as do similar solution factors. The problem-solving means in the technical field under analysis are thereby consolidated and can be classified into several applications, the technologies that constitute them, and the main problems.
Furthermore, because the increase/decrease rate is calculated for each element of the relevance matrix, it is possible to grasp which problems the field is facing and paying increasing attention to, and which technologies are being pursued intensively as solutions to those problems.

Claims

The scope of the claims
[1] A technical document attribute relevance analysis support apparatus comprising:
data acquisition means for acquiring data of a technical document group including a plurality of technical documents each having at least two types of attributes;
score calculation means for calculating a score according to the data of the technical documents belonging to each combination of a first attribute X and a second attribute Y among the at least two types of attributes;
first vector group generation means for generating vectors based on the scores belonging to each column of a matrix arrangement in which the scores are arranged in matrix form with the first attribute X on the horizontal axis and the second attribute Y on the vertical axis;
first vector relevance calculation means for calculating the mutual relevance of the vectors in the vector group generated by the first vector group generation means;
first vector arrangement means for arranging the vector group generated by the first vector group generation means such that vectors with higher relevance are placed closer to each other;
second vector group generation means for generating vectors based on the scores belonging to each row of the matrix arrangement;
second vector relevance calculation means for calculating the mutual relevance of the vectors in the vector group generated by the second vector group generation means; and
second vector arrangement means for arranging the vector group generated by the second vector group generation means such that vectors with higher relevance are placed closer to each other.
[2] The technical document attribute relevance analysis support apparatus according to claim 1, wherein
one of the first attribute X and the second attribute Y is a personal attribute of each technical document, and the other is a technical field attribute of each technical document.
[3] The technical document attribute relevance analysis support apparatus according to claim 1 or claim 2, wherein
the score calculation means calculates the score based on the number of technical documents for which the combination (X_j, Y_k) of a value X_j (j = 1, 2, ..., p) of the first attribute X and a value Y_k (k = 1, 2, ..., q) of the second attribute Y is identical.
[4] The technical document attribute relevance analysis support apparatus according to claim 1 or claim 2, wherein
the score calculation means calculates the score by weighting and summing each of the technical documents for which the combination (X_j, Y_k) of a value X_j (j = 1, 2, ..., p) of the first attribute X and a value Y_k (k = 1, 2, ..., q) of the second attribute Y is identical.
[5] The technical document attribute relevance analysis support apparatus according to any one of claims 1 to 4, wherein
the first vector group generation means or the second vector group generation means generates vectors whose components include the logarithm of each of the scores belonging to each column or each row of the matrix arrangement.
[6] The technical document attribute relevance analysis support apparatus according to any one of claims 1 to 5, wherein
the first vector arrangement means comprises:
first cluster generation means for selecting two vectors, by a predetermined criterion, from the vector group generated by the first vector group generation means and placing the two vectors adjacent to each other to generate a cluster; and
first cluster expansion means for selecting as a joining vector, from among the vectors generated by the first vector group generation means that are outside the cluster, the vector having the highest relevance to either of the end vectors located at the two ends of the vector group constituting the cluster generated by the first cluster generation means, and placing the joining vector adjacent to the end vector to which it is most relevant, thereby adding the joining vector to the cluster and successively expanding the cluster;
and/or the second vector arrangement means comprises:
second cluster generation means for selecting two vectors, by a predetermined criterion, from the vector group generated by the second vector group generation means and placing the two vectors adjacent to each other to generate a cluster; and
second cluster expansion means for selecting as a joining vector, from among the vectors generated by the second vector group generation means that are outside the cluster, the vector having the highest relevance to either of the end vectors located at the two ends of the vector group constituting the cluster generated by the second cluster generation means, and placing the joining vector adjacent to the end vector to which it is most relevant, thereby adding the joining vector to the cluster and successively expanding the cluster.
[7] The technical document attribute relevance analysis support apparatus according to claim 6, wherein
the first cluster generation means or the second cluster generation means selects, from the vector group generated by the first vector group generation means or the vector group generated by the second vector group generation means, respectively,
the two vectors of that vector group having the highest mutual relevance.
[8] The technical document attribute relevance analysis support apparatus according to claim 6 or claim 7, wherein
the first vector arrangement means further comprises:
first cluster expansion stop determination means for stopping the selection of the joining vector and the expansion of the cluster by the first cluster expansion means when the relevance between the end vectors located at the two ends of the vector group constituting the cluster generated by the first cluster generation means and each of the vectors generated by the first vector group generation means that are outside the cluster is in every case equal to or less than a predetermined threshold;
first cluster regeneration means for selecting two vectors, by a predetermined criterion, from the vectors outside the cluster generated by the first cluster generation means and placing the two vectors adjacent to each other to generate another cluster; and
first cluster re-expansion means for selecting as a joining vector, from among the vectors generated by the first vector group generation means that are outside both the cluster generated by the first cluster generation means and the other cluster, the vector having the highest relevance to either of the end vectors located at the two ends of the vector group constituting the other cluster generated by the first cluster regeneration means, and placing the joining vector adjacent to the end vector to which it is most relevant, thereby adding the joining vector to the other cluster and successively expanding the other cluster;
and/or the second vector arrangement means further comprises:
second cluster expansion stop determination means for stopping the selection of the joining vector and the expansion of the cluster by the second cluster expansion means when the relevance between the end vectors located at the two ends of the vector group constituting the cluster generated by the second cluster generation means and each of the vectors generated by the second vector group generation means that are outside the cluster is in every case equal to or less than a predetermined threshold;
second cluster regeneration means for selecting two vectors, by a predetermined criterion, from the vectors outside the cluster generated by the second cluster generation means and placing the two vectors adjacent to each other to generate another cluster; and
second cluster re-expansion means for selecting as a joining vector, from among the vectors generated by the second vector group generation means that are outside both the cluster generated by the second cluster generation means and the other cluster, the vector having the highest relevance to either of the end vectors located at the two ends of the vector group constituting the other cluster generated by the second cluster regeneration means, and placing the joining vector adjacent to the end vector to which it is most relevant, thereby adding the joining vector to the other cluster and successively expanding the other cluster.
[9] The technical document attribute relevance analysis support apparatus according to any one of claims 1 to 8, further comprising
display means for displaying the distribution of the scores, arranged in matrix form based on the arrangements made by the first vector arrangement means and the second vector arrangement means, with a pattern or color that depends on the score.
[10] A technical document attribute relevance analysis support method comprising:
a data acquisition step of acquiring data of a technical document group including a plurality of technical documents each having at least two types of attributes;
a score calculation step of calculating a score according to the data of the technical documents belonging to each combination of a first attribute X and a second attribute Y among the at least two types of attributes;
a first vector group generation step of generating vectors based on the scores belonging to each column of a matrix arrangement in which the scores are arranged in matrix form with the first attribute X on the horizontal axis and the second attribute Y on the vertical axis;
a first vector relevance calculation step of calculating the mutual relevance of the vectors in the vector group generated in the first vector group generation step;
a first vector arrangement step of arranging the vector group generated in the first vector group generation step such that vectors with higher relevance are placed closer to each other;
a second vector group generation step of generating vectors based on the scores belonging to each row of the matrix arrangement;
a second vector relevance calculation step of calculating the mutual relevance of the vectors in the vector group generated in the second vector group generation step; and
a second vector arrangement step of arranging the vector group generated in the second vector group generation step such that vectors with higher relevance are placed closer to each other.
[11] A technical document attribute relevance analysis support program causing a computer to execute:
a data acquisition step of acquiring data of a technical document group including a plurality of technical documents each having at least two types of attributes;
a score calculation step of calculating a score according to the data of the technical documents belonging to each combination of a first attribute X and a second attribute Y among the at least two types of attributes;
a first vector group generation step of generating vectors based on the scores belonging to each column of a matrix arrangement in which the scores are arranged in matrix form with the first attribute X on the horizontal axis and the second attribute Y on the vertical axis;
a first vector relevance calculation step of calculating the mutual relevance of the vectors in the vector group generated in the first vector group generation step;
a first vector arrangement step of arranging the vector group generated in the first vector group generation step such that vectors with higher relevance are placed closer to each other;
a second vector group generation step of generating vectors based on the scores belonging to each row of the matrix arrangement;
a second vector relevance calculation step of calculating the mutual relevance of the vectors in the vector group generated in the second vector group generation step; and
a second vector arrangement step of arranging the vector group generated in the second vector group generation step such that vectors with higher relevance are placed closer to each other.
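The cluster generation, expansion, stop determination, and regeneration recited in claims 6 to 8 can be read as a greedy chain-building procedure. The sketch below is one possible reading under stated assumptions, not the claimed implementation: the function name `arrange`, the sample relevance table, and the use of the most relevant remaining pair as the "predetermined criterion" for every new cluster (claim 7 states this criterion for cluster generation) are assumptions, as is the treatment of a single leftover vector as its own cluster.

```python
def arrange(rel, threshold=float("-inf")):
    """Return clusters (ordered index lists) built by growing chains at their ends.

    rel is a symmetric relevance table; larger rel[i][j] means more related.
    """
    remaining = set(range(len(rel)))
    clusters = []
    while len(remaining) >= 2:
        # Seed a cluster with the most related pair among the remaining vectors.
        i, j = max(
            ((a, b) for a in sorted(remaining) for b in sorted(remaining) if a < b),
            key=lambda p: rel[p[0]][p[1]],
        )
        chain = [i, j]
        remaining -= {i, j}
        while remaining:
            # Best outside candidate with respect to either end of the chain.
            end, cand = max(
                ((e, c) for e in (chain[0], chain[-1]) for c in sorted(remaining)),
                key=lambda p: rel[p[0]][p[1]],
            )
            if rel[end][cand] <= threshold:
                break  # nothing outside is related enough: stop expanding
            if end == chain[0]:
                chain.insert(0, cand)  # attach next to the left end vector
            else:
                chain.append(cand)     # attach next to the right end vector
            remaining.remove(cand)
        clusters.append(chain)
    # Assumption: a single leftover vector forms its own cluster.
    clusters.extend([v] for v in sorted(remaining))
    return clusters

# Hypothetical relevance table for four vectors.
rel = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.1],
    [0.2, 0.8, 1.0, 0.3],
    [0.1, 0.1, 0.3, 1.0],
]
one_chain = arrange(rel)
two_clusters = arrange(rel, threshold=0.5)
```

Without a threshold, every vector ends up in a single chain. With a threshold of 0.5, vector 3 is not related enough to either end of the chain and is left outside it.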
PCT/JP2006/324876 2005-12-13 2006-12-13 Technical document attribute association analysis supporting apparatus WO2007069663A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US12/097,446 US20090138465A1 (en) 2005-12-13 2006-12-13 Technical document attribute association analysis supporting apparatus
JP2007550208A JPWO2007069663A1 (en) 2005-12-13 2006-12-13 Technical document attribute relevance analysis support device

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2005358529 2005-12-13
JP2005-358529 2005-12-13
JPPCT/JP2006/321958 2006-11-02
PCT/JP2006/321958 WO2007069408A1 (en) 2005-12-13 2006-11-02 Technical document attribute association analysis supporting apparatus

Publications (1)

Publication Number Publication Date
WO2007069663A1 true WO2007069663A1 (en) 2007-06-21

Family

ID=38162966

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2006/324876 WO2007069663A1 (en) 2005-12-13 2006-12-13 Technical document attribute association analysis supporting apparatus

Country Status (1)

Country Link
WO (1) WO2007069663A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010009307A (en) * 2008-06-26 2010-01-14 Kyoto Univ Feature word automatic learning system, content linkage type advertisement distribution computer system, retrieval linkage type advertisement distribution computer system and text classification computer system, and computer program and method for them
JP2010231564A (en) * 2009-03-27 2010-10-14 Nomura Research Institute Ltd Patent application evaluation device
JP2011198111A (en) * 2010-03-19 2011-10-06 Toshiba Corp Feature word extraction device and program
JPWO2014034557A1 (en) * 2012-08-31 2016-08-08 日本電気株式会社 Text mining device, text mining method and program
JP6134973B1 (en) * 2016-02-22 2017-05-31 ジャパンモード株式会社 Literature data analysis system
JP2018049430A (en) * 2016-09-21 2018-03-29 ジャパンモード株式会社 Literature data analysis program and system
JP2018055605A (en) * 2016-09-30 2018-04-05 ジャパンモード株式会社 Innovation creation support program
JP2018055604A (en) * 2016-09-30 2018-04-05 ジャパンモード株式会社 Innovation creation support program
JP2018060493A (en) * 2016-10-03 2018-04-12 ジャパンモード株式会社 Problem solution supporting system, problem solution supporting method, and problem solution supporting program
JP2018077548A (en) * 2016-11-07 2018-05-17 株式会社Personal AI Artificial intelligence device automatically determining existence or non-existence of similarity of thinking sentence from object sentence group
JP2019067191A (en) * 2017-10-02 2019-04-25 株式会社東芝 Information processing device, information processing method, and program
JP2021125068A (en) * 2020-02-07 2021-08-30 学校法人金沢工業大学 New business proposal computer system, new business proposal computer program, new business proposal method, and new business proposal device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001092825A (en) * 1999-09-17 2001-04-06 Nec Corp Device and method for processing information
JP2003345811A (en) * 2002-05-27 2003-12-05 Hitachi Ltd System and method for displaying document information, and document retrieving method

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010009307A (en) * 2008-06-26 2010-01-14 Kyoto Univ Feature word automatic learning system, content linkage type advertisement distribution computer system, retrieval linkage type advertisement distribution computer system and text classification computer system, and computer program and method for them
JP2010231564A (en) * 2009-03-27 2010-10-14 Nomura Research Institute Ltd Patent application evaluation device
JP2011198111A (en) * 2010-03-19 2011-10-06 Toshiba Corp Feature word extraction device and program
US10140361B2 (en) 2012-08-31 2018-11-27 Nec Corporation Text mining device, text mining method, and computer-readable recording medium
JPWO2014034557A1 (en) * 2012-08-31 2016-08-08 日本電気株式会社 Text mining device, text mining method and program
JP6134973B1 (en) * 2016-02-22 2017-05-31 ジャパンモード株式会社 Literature data analysis system
JP2017151533A (en) * 2016-02-22 2017-08-31 ジャパンモード株式会社 Literature data analysis system
JP2018049430A (en) * 2016-09-21 2018-03-29 ジャパンモード株式会社 Literature data analysis program and system
JP7012298B2 (en) 2016-09-21 2022-01-28 ジャパンモード株式会社 Literature data analysis program and system
JP2018055604A (en) * 2016-09-30 2018-04-05 ジャパンモード株式会社 Innovation creation support program
JP2018055605A (en) * 2016-09-30 2018-04-05 ジャパンモード株式会社 Innovation creation support program
JP2018060493A (en) * 2016-10-03 2018-04-12 ジャパンモード株式会社 Problem solution supporting system, problem solution supporting method, and problem solution supporting program
JP2018077548A (en) * 2016-11-07 2018-05-17 株式会社Personal AI Artificial intelligence device automatically determining existence or non-existence of similarity of thinking sentence from object sentence group
JP2019067191A (en) * 2017-10-02 2019-04-25 株式会社東芝 Information processing device, information processing method, and program
JP2021125068A (en) * 2020-02-07 2021-08-30 学校法人金沢工業大学 New business proposal computer system, new business proposal computer program, new business proposal method, and new business proposal device
JP7454213B2 (en) 2020-02-07 2024-03-22 学校法人金沢工業大学 New business proposal computer system, new business proposal computer program, new business proposal method, new business proposal device

Similar Documents

Publication Publication Date Title
WO2007069663A1 (en) Technical document attribute association analysis supporting apparatus
Bergstrom Eigenfactor: Measuring the value and prestige of scholarly journals
Ur-Rahman et al. Textual data mining for industrial knowledge management and text classification: A business oriented approach
Marqués et al. Ranking-based MCDM models in financial management applications: analysis and emerging challenges
US9135242B1 (en) Methods and systems for the analysis of large text corpora
Yang et al. Tag-based expert recommendation in community question answering
Thorleuchter et al. Analyzing existing customers’ websites to improve the customer acquisition process as well as the profitability prediction in B-to-B marketing
US20140108311A1 (en) Information processing apparatus and method, and program thereof
US20070106663A1 (en) Methods and apparatus for using user personality type to improve the organization of documents retrieved in response to a search query
Ku et al. Artificial intelligence and visual analytics: a deep-learning approach to analyze hotel reviews & responses
JPWO2007069663A1 (en) Technical document attribute relevance analysis support device
Bergstrom Measuring the value and prestige of scholarly journals
Nazemi et al. Visual analytics for technology and innovation management: An interaction approach for strategic decision making
CN112116380A (en) Dynamic satisfaction-based intelligent interactive information terminal visualization method
WO2017203672A1 (en) Item recommendation method, item recommendation program, and item recommendation apparatus
Vlachos et al. Toward interpretable predictive models in B2B recommender systems
Mo et al. An interval efficiency measurement in DEA when considering undesirable outputs
Meng et al. Fine-grained job salary benchmarking with a nonparametric dirichlet process–based latent factor model
CN115829683A (en) Power integration commodity recommendation method and system based on inverse reward learning optimization
Fernandes Andry et al. Big data implementation in Tesla using classification with rapid miner
Ramsey et al. Text mining to identify customers likely to respond to cross-selling campaigns: Reading notes from your customers
You et al. A hotel ranking model through online reviews with aspect-based sentiment analysis
Tuarob et al. Automated discovery of product preferences in ubiquitous social media data: A case study of automobile market
García et al. Ranking-based MCDM models in financial management applications: analysis and emerging challenges
Amigo et al. What is my problem? Identifying formal tasks and metrics in data mining on the basis of measurement theory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase (Ref document number: 1020087011111; Country of ref document: KR)
WWE Wipo information: entry into national phase (Ref document number: 2007550208; Country of ref document: JP)
WWE Wipo information: entry into national phase (Ref document number: 12097446; Country of ref document: US)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 06834631; Country of ref document: EP; Kind code of ref document: A1)