WO2007069663A1

WO2007069663A1 - Technical document attribute association analysis supporting apparatus

Info

Publication number: WO2007069663A1
Application number: PCT/JP2006/324876
Authority: WO
Inventors: Hiroaki Masuyama; Makoto Asada; Kazumi Hasuko
Original assignee: Intellectual Property Bank Corp.
Priority date: 2005-12-13
Filing date: 2006-12-13
Publication date: 2007-06-21

Abstract

Data on a group of technical documents having an attribute X and an attribute Y is acquired, and a score is calculated for each combination of attribute X and attribute Y from the data on the technical documents belonging to that combination. The scores are arranged in a matrix with attribute X on the horizontal axis and attribute Y on the vertical axis. A group of vectors X_j is generated from the scores belonging to each column of the matrix, and a group of vectors Y_k is generated from the scores belonging to each row. Within each of the groups X_j and Y_k, vectors that are more closely related to each other are placed nearer to each other. In this way, the interrelationships among the vectors of the first group, corresponding to the first attribute X of the technical documents, and among the vectors of the second group, corresponding to the second attribute Y, are analyzed in detail, and a study that takes both attributes X and Y into account can be performed.

Description

Specification

Relevancy analysis support device for technical document attributes

Technical field

[0001] The present invention relates to an analysis support device, support method and support program for analyzing the relevance of document attributes in a technical document group.

Background art

[0002] It is not easy for a company to grasp by itself the current status of its technological asset portfolio and the results of the technological development carried out in its own R&D organization, and to obtain objective guidelines for its future development direction. Collecting and analyzing data obtained from the technical documents of the company and of other companies is considered an effective means of obtaining such objective guidelines, but extracting useful information from an enormous body of technical documents involves considerable difficulty.

[0003] Conventionally, in attempts to dig out information buried in huge amounts of data, some approaches take two items, X_j (j = 1, 2, ..., p) and Y_k (k = 1, 2, ..., q), on the horizontal and vertical axes and analyze a cross table that tabulates the aggregation results for each combination of these items.

For example, the dual scaling method described in the following document assigns a scale X_j (j = 1, 2, ..., p) to the item X_j on the horizontal axis and a scale Y_k (k = 1, 2, ..., q) to the item Y_k on the vertical axis of such a cross table, and tries to find hidden trends in the cross table. In this document, in order to calculate specific numerical values of the scales X_j and Y_k, the components of the p-dimensional vector X = (X_1, X_2, ..., X_p) and the q-dimensional vector Y = (Y_1, Y_2, ..., Y_q) are determined so that the square of the correlation coefficient between them is as close to 1 as possible.

Non-patent literature 1: Taichiro Ueda et al., "Practical workshop Excel thorough utilization Multivariate analysis", Shuwa System Co., Ltd., published on September 5, 2003, pp. 323-337

Disclosure of the invention

Problem that invention tries to solve

However, the above dual scaling method and other conventional methods do not analyze in detail the mutual relations among the items X_j (j = 1, 2, ..., p) on the horizontal axis or the mutual relations among the items Y_k (k = 1, 2, ..., q) on the vertical axis, so a study that fully takes both X_j and Y_k into account cannot be carried out. The dual scaling method only gives one scale to each of X_j and Y_k, so the information obtained is limited. Even with this method, the relevance of document attributes in technical documents cannot be analyzed sufficiently, and it therefore cannot serve as a criterion for obtaining objective guidance on the direction of technology development in a company.

The object of the present invention is to provide a device, method and program for supporting relevance analysis of technical document attributes that analyze in detail the interrelationships within the first vector group, corresponding to the first attribute X of the technical documents, and within the second vector group, corresponding to the second attribute Y, and that thereby make it possible to study the first attribute X and the second attribute Y together, to identify the state of concentration or dispersion of the data distribution of document attributes in the technical document group, and to present criteria for judging the direction of a company's technological development.

Means to solve the problem

(1) In order to solve the above problems, the technical document attribute relevance analysis support device of the present invention comprises:

Data acquisition means for acquiring data of a technical document group including a plurality of technical documents each having at least two types of attributes;

Score calculation means for calculating a score according to the data of the technical documents belonging to each combination of a first attribute X and a second attribute Y among the at least two types of attributes;

First vector group generation means for generating, when the scores are arranged in a matrix with the first attribute X on the horizontal axis and the second attribute Y on the vertical axis, vectors based on the scores belonging to each column of the matrix arrangement;

First vector relevancy calculating means for calculating the interrelationships of the vectors generated by the first vector group generation means;

First vector arranging means for arranging the vectors generated by the first vector group generation means so that vectors with higher relevancy are placed closer to each other;

Second vector group generation means for generating vectors based on the scores belonging to each row of the matrix arrangement;

Second vector relevancy calculating means for calculating the interrelationships of the vectors generated by the second vector group generation means; and

Second vector arranging means for arranging the vectors generated by the second vector group generation means so that vectors with higher relevancy are placed closer to each other.

According to this, the interrelationships among the vectors corresponding to the first attribute X (the columns of the scores arranged in a matrix) are calculated, and vectors having similar distributions over the second attribute Y are placed closer together. Likewise, the interrelationships among the vectors corresponding to the second attribute Y (the rows of the scores arranged in a matrix) are calculated, and vectors having similar distributions over the first attribute X are placed closer together. Therefore, by analyzing in detail the interrelationships among the vectors corresponding to the first attribute X and among the vectors corresponding to the second attribute Y, and then conducting a study that takes both the first attribute X and the second attribute Y into account, it is possible to identify the state of concentration or dispersion of the data distribution of document attributes in the technical document group.

(2) In the relevance analysis support device of the above technical document attribute,

Preferably, one of the first attribute X and the second attribute Y is a human attribute of each technical document, and the other is a technical attribute of each technical document.

Human attributes include, for example, applicants and inventors for patent documents, and authors and editors for technical papers and books. Technical attributes include technical elements, keywords, etc. in addition to technical classifications such as IPC (International Patent Classification).

[0010] In this way, it is possible to analyze the interrelationships among the vectors corresponding to human attributes and the interrelationships among the vectors corresponding to technical attributes, and then to make a study that combines the human attributes and the technical attributes. For example, because the relationships between the company and other companies with respect to technology development areas are shown, companies with similar development tendencies can be found. Companies with similar development tendencies are not necessarily those currently competing in the market. If another company with a development tendency similar to that of your company has entered an industry that your company has not yet entered, the technical hurdles for your company to enter that industry can be expected to be low. In addition, by comparison with companies that compete with your company but have different development tendencies, you can find the strengths of your own development department and/or find a technology alliance partner that can compensate for the weaknesses of each other's development departments. The results can also be used to formulate a technology development policy for competing with other companies in an industry that you want to enter. Furthermore, because the relationships between technical fields with respect to development entities are shown, the relationships between technical fields can be analyzed. For example, if the same companies tend to work on both of the technical fields being compared, it is possible (a) to find the possibility that working on both leads to an existing business, and to judge whether to enter that business and whether technology development is necessary for doing so, or (b) to find the possibility of diverting technology between fields that seem technically unrelated at first glance.

(3) In the relevance analysis support device for the above technical document attributes,

Preferably, the score calculation means calculates the score based on the number of technical documents for which the combination (X_j, Y_k) of the value X_j (j = 1, 2, ..., p) of the first attribute X and the value Y_k (k = 1, 2, ..., q) of the second attribute Y is identical.

By calculating the score based on the number of technical documents having the same combination, it is possible to simply and objectively express the state of concentration or dispersion of attribute distribution.
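The count-based score can be illustrated with a minimal Python sketch. The record layout (dictionaries with "x" and "y" keys), the function name count_scores, and the sample applicants and IPC classes are assumptions made for illustration only and do not appear in the specification.

```python
from collections import Counter

def count_scores(documents, x_values, y_values):
    """Return a q-by-p matrix (list of rows) of document counts.

    Rows follow y_values (attribute Y, vertical axis) and columns follow
    x_values (attribute X, horizontal axis).
    """
    pair_counts = Counter((doc["x"], doc["y"]) for doc in documents)
    return [[pair_counts[(x, y)] for x in x_values] for y in y_values]

# Hypothetical sample: applicant as attribute X, IPC class as attribute Y.
documents = [
    {"x": "Company A", "y": "G06F"},
    {"x": "Company A", "y": "G06F"},
    {"x": "Company A", "y": "H04L"},
    {"x": "Company B", "y": "H04L"},
]
scores = count_scores(documents, ["Company A", "Company B"], ["G06F", "H04L"])
print(scores)  # [[2, 0], [1, 1]]
```

Each entry is simply the number of documents sharing one (X_j, Y_k) combination, so a combination with no documents gets a score of 0.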

(4) Further, the score calculation means may calculate the score by weighting and summing each of the technical documents for which the combination (X_j, Y_k) of the value X_j (j = 1, 2, ..., p) of the first attribute X and the value Y_k (k = 1, 2, ..., q) of the second attribute Y is identical.

[0014] By calculating the score with a weight applied to each of the technical documents having the same combination, the analysis can properly take into account the importance or qualitative factors of the technical documents.

The weighting reflects the importance or quality of each technical document, for example, by giving a granted patent publication a larger weight than a published unexamined patent application.
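A weighted score of this kind might be computed as in the following sketch. The weight values (2.0 and 1.0), the "kind" field, and the function name weighted_scores are hypothetical; the specification states only that some documents may be given more weight than others.

```python
from collections import defaultdict

# Hypothetical weights: the text only says one kind of publication may weigh more.
WEIGHTS = {"granted": 2.0, "published": 1.0}

def weighted_scores(documents, x_values, y_values, weights=WEIGHTS):
    """Sum a per-document weight for each (X value, Y value) combination."""
    totals = defaultdict(float)
    for doc in documents:
        totals[(doc["x"], doc["y"])] += weights[doc["kind"]]
    # Rows follow y_values, columns follow x_values.
    return [[totals[(x, y)] for x in x_values] for y in y_values]

documents = [
    {"x": "Company A", "y": "G06F", "kind": "granted"},
    {"x": "Company A", "y": "G06F", "kind": "published"},
    {"x": "Company B", "y": "H04L", "kind": "published"},
]
weighted = weighted_scores(documents, ["Company A", "Company B"], ["G06F", "H04L"])
print(weighted)  # [[3.0, 0.0], [0.0, 1.0]]
```

With all weights set to 1.0 this reduces to the plain document count.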

(5) In the relevance analysis support device of the above technical document attribute,

It is desirable that the first vector group generation means or the second vector group generation means generate vectors whose components are the logarithms of the scores belonging to each column or each row of the matrix arrangement.

[0016] As a result, the distribution of the vector components becomes closer to a normal distribution, particularly when the scores are non-negative and their distribution is concentrated around 0, so that the reliability of the relevancy calculation result can be improved.
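The logarithmic transformation can be sketched as follows. The specification does not say how a score of exactly 0 is handled, so this example assumes log(1 + score), which keeps zero scores at 0; that choice and the function name log_vector are assumptions of this sketch.

```python
import math

def log_vector(scores):
    """Map each non-negative score s to log(1 + s) (assumed handling of zeros)."""
    return [math.log1p(s) for s in scores]

column = [0, 1, 9, 99]        # hypothetical scores of one column
logged = log_vector(column)
print(logged)
```

The transformation compresses large counts, so a few very large scores no longer dominate the comparison between vectors.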

(6) In the relevance analysis support device of the above technical document attribute,

The first vector arrangement means includes:

First cluster generation means for selecting two vectors from the vector group generated by the first vector group generation means according to a predetermined criterion, and generating a cluster by making the two vectors adjacent; and

First cluster expanding means for selecting, as a joining vector, from the vectors generated by the first vector group generation means other than those in the cluster, the vector having the highest relevance to either of the end vectors located at both ends of the vectors forming the cluster generated by the first cluster generation means, and for adding the joining vector to the cluster by making it adjacent to the end vector with which it has the highest relevance, thereby expanding the cluster sequentially; and/or

The second vector arranging means includes:

Second cluster generation means for selecting two vectors from the vector group generated by the second vector group generation means according to a predetermined criterion, and generating a cluster by making the two vectors adjacent; and

Second cluster expanding means for selecting, as a joining vector, from the vectors generated by the second vector group generation means other than those in the cluster, the vector having the highest relevance to either of the end vectors located at both ends of the vectors forming the cluster generated by the second cluster generation means, and for adding the joining vector to the cluster by making it adjacent to the end vector with which it has the highest relevance, thereby expanding the cluster sequentially.

[0018] According to this, highly relevant vectors are made adjacent one after another as the cluster is expanded, so that highly relevant vectors are reliably placed close to each other, and the state of concentration or dispersion of the data distribution of document attributes can be made explicit.
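The cluster generation and expansion described in (6) can be sketched as a simple ordering routine. This is an illustrative sketch, not the applicant's implementation: it takes a symmetric relevance matrix as input, assumes that the "predetermined criterion" for the first pair is the highest relevance, and breaks ties by taking the first candidate found. The function name arrange_by_relevance and the sample values are hypothetical.

```python
def arrange_by_relevance(relevance):
    """Return vector indices ordered so that related vectors are adjacent.

    relevance[a][b] is the relevance between vectors a and b (symmetric).
    """
    n = len(relevance)
    if n < 2:
        return list(range(n))
    # Cluster generation: start from the most related pair (assumed criterion).
    first, second = max(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda pair: relevance[pair[0]][pair[1]],
    )
    order = [first, second]
    remaining = set(range(n)) - {first, second}
    # Cluster expansion: attach the outside vector most related to either end.
    while remaining:
        cand, end = max(
            ((c, e) for c in sorted(remaining) for e in (0, -1)),
            key=lambda ce: relevance[ce[0]][order[ce[1]]],
        )
        if end == 0:
            order.insert(0, cand)   # join at the left end
        else:
            order.append(cand)      # join at the right end
        remaining.remove(cand)
    return order

relevance = [
    [1.0, 0.9, 0.1, 0.4],
    [0.9, 1.0, 0.2, 0.8],
    [0.1, 0.2, 1.0, 0.3],
    [0.4, 0.8, 0.3, 1.0],
]
print(arrange_by_relevance(relevance))  # [0, 1, 3, 2]
```

In the sample, vectors 0 and 1 form the initial cluster, vector 3 joins next to vector 1 (relevance 0.8), and vector 2 joins last next to vector 3 (relevance 0.3).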

(7) In the relevance analysis support device for the above technical document attributes,

It is desirable that the first cluster generation means or the second cluster generation means select, from the vector group generated by the first vector group generation means or the vector group generated by the second vector group generation means, the two vectors with the highest relevance to each other.

[0020] This makes it possible to ensure that the most relevant vectors are adjacent to each other, so that quantitative objectivity of vector arrangement can be secured.

(8) In the relevance analysis support device for the above technical document attributes,

It is desirable that the first vector arrangement means further includes:

First cluster expansion stop determination means for stopping the selection of the joining vector and the expansion of the cluster by the first cluster expanding means when every relevance between the end vectors located at both ends of the vectors forming the cluster generated by the first cluster generation means and the vectors, among those generated by the first vector group generation means, other than those in the cluster is equal to or less than a predetermined threshold;

First cluster regenerating means for selecting two vectors, according to a predetermined criterion, from the vectors other than those in the cluster generated by the first cluster generation means, and generating another cluster by making the two vectors adjacent; and

First cluster re-expanding means for selecting, as a joining vector, from the vectors generated by the first vector group generation means that belong neither to the cluster generated by the first cluster generation means nor to the other cluster, the vector having the highest relevance to either of the end vectors located at both ends of the vectors forming the other cluster generated by the first cluster regenerating means, and for adding the joining vector to the other cluster by making it adjacent to the end vector with which it has the highest relevance, thereby expanding the other cluster sequentially; and/or

that the second vector arranging means further includes:

Second cluster expansion stop determination means for stopping the selection of the joining vector and the expansion of the cluster by the second cluster expanding means when every relevance between the end vectors located at both ends of the vectors forming the cluster generated by the second cluster generation means and the vectors, among those generated by the second vector group generation means, other than those in the cluster is equal to or less than a predetermined threshold;

Second cluster regenerating means for selecting two vectors, according to a predetermined criterion, from the vectors other than those in the cluster generated by the second cluster generation means, and generating another cluster by making the two vectors adjacent; and

Second cluster re-expanding means for selecting, as a joining vector, from the vectors generated by the second vector group generation means that belong neither to the cluster generated by the second cluster generation means nor to the other cluster, the vector having the highest relevance to either of the end vectors located at both ends of the vectors forming the other cluster generated by the second cluster regenerating means, and for adding the joining vector to the other cluster by making it adjacent to the end vector with which it has the highest relevance, thereby expanding the other cluster sequentially.

[0022] According to this, when the relevance to the end vectors is equal to or less than the predetermined threshold, vectors are not forcibly merged into a single cluster, combinations of vectors with higher relevance can be given priority, and the reliability of the vector arrangement can be improved. As the relevance threshold, for example, a correlation coefficient of 0 is used.
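The stop determination, regeneration and re-expansion of (8) can be sketched by letting the ordering routine close a cluster and start a new one whenever no outside vector exceeds the threshold. The sketch below is illustrative only. It assumes the highest relevance as the "predetermined criterion" for each starting pair, pairs the two most related remaining vectors even if their relevance is below the threshold, and places a single leftover vector in a cluster of its own; none of these details is fixed by the text. The function name arrange_into_clusters and the sample values are hypothetical.

```python
def arrange_into_clusters(relevance, threshold=0.0):
    """Return a list of clusters, each an ordered list of vector indices."""
    remaining = list(range(len(relevance)))
    clusters = []
    while len(remaining) >= 2:
        # (Re)generation: seed a cluster with the most related remaining pair.
        a, b = max(
            ((u, v) for k, u in enumerate(remaining) for v in remaining[k + 1:]),
            key=lambda pair: relevance[pair[0]][pair[1]],
        )
        cluster = [a, b]
        remaining = [v for v in remaining if v not in (a, b)]
        # (Re)expansion: grow at either end until the stop condition holds.
        while remaining:
            cand, end = max(
                ((c, e) for c in remaining for e in (0, -1)),
                key=lambda ce: relevance[ce[0]][cluster[ce[1]]],
            )
            if relevance[cand][cluster[end]] <= threshold:
                break  # every relevance to the end vectors is <= threshold
            if end == 0:
                cluster.insert(0, cand)
            else:
                cluster.append(cand)
            remaining.remove(cand)
        clusters.append(cluster)
    if remaining:
        clusters.append(list(remaining))  # assumed: leftover vector stands alone
    return clusters

relevance = [
    [1.0, 0.9, -0.2, -0.5, -0.1],
    [0.9, 1.0, 0.3, -0.4, -0.3],
    [-0.2, 0.3, 1.0, -0.1, -0.2],
    [-0.5, -0.4, -0.1, 1.0, 0.7],
    [-0.1, -0.3, -0.2, 0.7, 1.0],
]
print(arrange_into_clusters(relevance))  # [[0, 1, 2], [3, 4]]
```

In the sample, vectors 3 and 4 have no positive relevance to either end of the first cluster, so they are not forced into it and form a second cluster instead.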

(9) In the relevance analysis support device for the above technical document attributes,

It is desirable to further provide display means for displaying the distribution state of the scores, arranged in a matrix based on the arrangements by the first vector arrangement means and the second vector arrangement means, with a pattern or a color according to each score.

When the distribution of the scores is indicated only numerically, its state is not clear at first glance, but it can be displayed more intelligibly by applying patterns or colors.
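A pattern-based display can be sketched in plain text as follows. The three pattern characters, the cut-off value of 5, the function name render_patterns, and the sample matrix and orderings are all hypothetical choices for illustration; an actual display unit could equally use colors.

```python
def render_patterns(scores, column_order, row_order, y_labels):
    """Return one text line per row, with one pattern character per score."""
    def pattern(score):
        if score == 0:
            return "."      # no documents
        if score < 5:       # hypothetical cut-off between "few" and "many"
            return "o"
        return "#"

    lines = []
    for k in row_order:
        cells = "".join(pattern(scores[k][j]) for j in column_order)
        lines.append(f"{y_labels[k]:<6}{cells}")
    return lines

scores = [
    [7, 0, 6],
    [0, 3, 0],
    [8, 0, 5],
]
# Hypothetical orderings produced by the vector arrangement step.
lines = render_patterns(scores, column_order=[0, 2, 1], row_order=[0, 2, 1],
                        y_labels=["G06F", "H01L", "H04L"])
print("\n".join(lines))
```

After reordering, the high scores gather in one block of "#" characters, which is the kind of concentration the rearranged display is meant to reveal.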

(10) The present invention also provides a technical document attribute relevance analysis support method including steps that are the same as the processing executed by each of the above devices, and a technical document attribute relevance analysis support program that causes a computer to execute the same processing as that executed by each of the above devices. This program may be recorded on a recording medium such as an FD, CD-ROM, or DVD, and may be transmitted and received over a network.

Brief description of the drawings

FIG. 1 is a diagram showing the hardware configuration of a technical document attribute relevance analysis support apparatus according to a first embodiment of the present invention.

FIG. 2 is a flowchart showing the operation procedure of the processing device 1 in the relevance analysis support device of the first embodiment.

FIG. 3 is a view showing a display example by a display unit.

FIG. 4 is a view showing another display example by the display unit.

FIG. 5 is a flowchart showing the operation procedure of the processing device 1 in the relevance analysis support device of a second embodiment.

FIG. 6 is an example of the document number matrix generated in the second embodiment.

Explanation of reference numerals

1: processing device; 2: input device; 3: recording device; 4: output device; 110: data acquisition unit; 120: score calculation unit; 130 and 140: first and second vector group generation units; 150 and 160: first and second vector relevance calculation units; 170 and 180: first and second vector arrangement units

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

<1. Description of abbreviations>

i: Technical document number assigned to each technical document. For example, a number is assigned to each of the patent applications extracted under certain conditions. If the number of technical documents is N, then i = 1, 2, ..., N.

X, Y: Attributes of individual technical documents. For example, applicant, technical field (keyword or IPC) etc.

X_j, Y_k: The values of the attributes. For example, they refer to the specific name of an applicant or technical field, and are not limited to values expressed numerically.

σ_kj: Score calculated for each combination of attribute X and attribute Y. If the value range of attribute X is X_1, X_2, ..., X_p and the value range of attribute Y is Y_1, Y_2, ..., Y_q, then p × q scores σ_kj can be defined, and these can be arranged in a matrix of q rows and p columns. Let the vector X_j be the q-dimensional vector whose components are the scores σ_1j, σ_2j, ..., σ_qj belonging to each column of the matrix, and let the vector Y_k be the p-dimensional vector whose components are the scores σ_k1, σ_k2, ..., σ_kp belonging to each row (the same symbols as the corresponding attribute values X_j and Y_k are used).
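The relation between the score matrix and the two vector groups can be made concrete with a short Python sketch. The matrix values and the function names column_vectors and row_vectors are hypothetical; the matrix is stored as a list of q rows, each with p entries.

```python
def column_vectors(sigma):
    """Vectors X_j: one q-dimensional vector per column of the score matrix."""
    return [list(col) for col in zip(*sigma)]

def row_vectors(sigma):
    """Vectors Y_k: one p-dimensional vector per row of the score matrix."""
    return [list(row) for row in sigma]

# Hypothetical score matrix with q = 2 rows (values of Y) and p = 3 columns (values of X).
sigma = [
    [5, 0, 1],
    [2, 3, 0],
]
X = column_vectors(sigma)
Y = row_vectors(sigma)
print(X)  # [[5, 2], [0, 3], [1, 0]]
print(Y)  # [[5, 0, 1], [2, 3, 0]]
```

Each X_j describes how one value of attribute X is distributed over the values of attribute Y, and each Y_k describes the reverse.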

[0029] <2. Configuration of Relevancy Analysis Support Device for Technical Document Attributes>

FIG. 1 is a diagram showing the hardware configuration of the technical document attribute relevance analysis supporting apparatus according to the first embodiment of the present invention. As shown in the figure, the relevance analysis support device of this embodiment includes a processing device 1 comprising a CPU (central processing unit), a memory (storage device) and the like; an input device 2, which is input means such as a keyboard (manual input device); a recording device 3, which is recording means for storing the data of the technical document group, the conditions, the results of processing by the processing device 1 and the like; and an output device 4, which is output means for displaying or printing the scores and the like arranged in a matrix.

The processing device 1 includes a data acquisition unit 110, a score calculation unit 120, first and second vector group generation units 130 and 140, first and second vector relevance calculation units 150 and 160, and first and second vector arrangement units 170 and 180.

The recording device 3 includes a condition recording unit 31, a work result storage unit 32, a document storage unit 33, and the like. The document storage unit 33 contains data of a technical document group acquired from an external database or an internal database. An external database means, for example, a document database such as the IPDL (Industrial Property Digital Library) service provided by the Japan Patent Office or the PATOLIS service operated by PATOLIS Corporation. An internal database is assumed to include a database that itself stores data such as commercially sold patent JP-ROMs; devices that read documents from media on which they are stored, such as an FD (flexible disc), CD (compact disc) ROM, MO (magneto-optical disc) or DVD (digital video disk); devices such as an OCR (optical information reader) that read documents printed or handwritten on paper or the like; and devices that convert the read data into electronic data.

In the present embodiment, patent publications are mainly handled as technical documents, but the invention is not limited to this, and a wide range of general technical documents, such as utility model publications, technical articles, technical magazines and books, can be analyzed. As communication means for exchanging signals and data among the processing device 1, the input device 2, the recording device 3 and the output device 4, the devices may be connected directly by a USB (Universal Serial Bus) cable or the like, data may be sent and received via a network such as a LAN (local area network), or data may be exchanged via a medium such as an FD, CD-ROM, MO or DVD on which the documents are stored. Some or all of these may also be combined.

[0033] <2-1. Details of Input Device 2>

Next, the configuration and functions of the relevance analysis support device described above will be described in detail. The input device 2 receives inputs such as acquisition conditions of data of technical documents, calculation conditions of scores, generation conditions of vectors, calculation conditions of relevance, arrangement conditions of vectors, and the like. These input conditions are sent to the condition recording unit 31 of the recording device 3 and stored.

[0034] <2-2. Details of Processing Unit 1>

The data acquisition unit 110 acquires data of a technical document group to be analyzed from the document storage unit 33 of the recording device 3 in accordance with the acquisition conditions of data input by the input device 2. For example, based on bibliographic information of each technical document, at least two types of attributes of each technical document are acquired as data. The acquired data of the technical document group is directly sent to the score calculation unit 120 to be used for processing there, or sent to the work result storage unit 32 of the recording device 3 to be stored.

Based on the data of the technical document group acquired by the data acquisition unit 110, the score calculation unit 120 calculates a score σ_kj according to the data of the technical documents belonging to each combination of the first attribute X and the second attribute Y among the at least two types of attributes. The score σ_kj is calculated for each combination of the value X_j of the first attribute X and the value Y_k of the second attribute Y. The calculated scores σ_kj are sent directly to the first and second vector group generation units 130 and 140 for use in processing there, or are sent to the work result storage unit 32 of the recording device 3 and stored.

The first vector group generation unit 130 generates a vector group X_j based on the scores σ_kj calculated by the score calculation unit 120. When the scores σ_kj are arranged in a matrix with the first attribute X on the horizontal axis and the second attribute Y on the vertical axis, each vector of the group X_j is calculated from the scores belonging to one "column" of the matrix arrangement. The second vector group generation unit 140 generates a vector group Y_k based on the scores σ_kj calculated by the score calculation unit 120. With the scores σ_kj arranged in the same matrix, each vector of the group Y_k is calculated from the scores belonging to one "row" of the matrix arrangement.

The vector groups X_j and Y_k generated by the first and second vector group generation units 130 and 140 are sent directly to the first and second vector relevance calculation units 150 and 160 for use in processing there, or are sent to the work result storage unit 32 of the recording device 3 and stored.

The first vector relevancy calculation unit 150 calculates the interrelationships among the vectors of the vector group X_j generated by the first vector group generation unit 130.

The second vector relevancy calculation unit 160 calculates the interrelationships among the vectors of the vector group Y_k generated by the second vector group generation unit 140.

The data of the relevancy calculated by the first and second vector relevancy calculation units 150 and 160 are directly sent to the first and second vector arrangement units 170 and 180, respectively, to be used for processing there, or It is sent to and stored in the work result storage unit 32 of the recording device 3.
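One way to compute such interrelationships is the correlation coefficient, which the specification mentions as an example of a relevance measure. The following sketch computes the Pearson correlation coefficient for every pair of vectors in a group. Returning 0.0 for a constant vector is an assumption of this sketch, and the function names and sample vectors are hypothetical.

```python
import math

def correlation(u, v):
    """Pearson correlation coefficient of two equal-length vectors.

    Returns 0.0 when either vector is constant (assumed convention).
    """
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)

def relevance_matrix(vectors):
    """Relevance of every pair of vectors in one group (X_j or Y_k)."""
    return [[correlation(u, v) for v in vectors] for u in vectors]

# Hypothetical vector group: the second vector follows the first, the third opposes it.
vectors = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
relevance = relevance_matrix(vectors)
print(relevance[0][1], relevance[0][2])
```

The resulting matrix is symmetric, and it is the input that the vector arrangement units would use to decide which vectors to place next to each other.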

[0038] The first vector arrangement unit 170 performs processing that arranges vectors having high relevance closer to each other, based on the interrelationships among the vectors X_j calculated by the first vector relevancy calculation unit 150.

The second vector arrangement unit 180 performs processing that arranges vectors having high relevance closer to each other, based on the interrelationships among the vectors Y_k calculated by the second vector relevancy calculation unit 160.

The arrangement of the vectors determined by the first and second vector arrangement units 170 and 180 is sent to and stored in the work result storage unit 32 of the recording device 3 and output from the output device 4 as necessary.

As particularly preferable embodiments of the first and second vector arrangement units 170 and 180, FIG. 1 shows units provided with first and second cluster generation units 171 and 181 and first and second cluster expansion units 172 and 182, respectively. As a further preferable embodiment, FIG. 1 also shows units provided with first and second cluster expansion stop determination units 174 and 184, first and second cluster regeneration units 175 and 185, and first and second cluster re-expansion units 176 and 186, respectively.

The first cluster generation unit 171 selects two vectors from the vectors generated by the first vector group generation unit 130 based on a predetermined criterion, and generates a cluster by making these two vectors adjacent to each other.

The second cluster generation unit 181 selects two vectors from the vectors generated by the second vector group generation unit 140 based on a predetermined criterion, and generates a cluster by making these two vectors adjacent to each other.

The predetermined criterion for selecting the two vectors is, for example, the level of relevance, in which case the two vectors with the highest relevance to each other can be selected.

The clusters generated by the first and second cluster generation units 171 and 181 are directly sent to the first and second cluster expansion units 172 and 182, respectively, to be used for processing there, or the work result of the recording apparatus 3 It is sent to and stored in the storage unit 32.

The first cluster expanding unit 172 sequentially expands the cluster generated by the first cluster generation unit 171 by adding joining vectors to it. The joining vector is the vector having the highest relevance to either of the end vectors located at both ends of the vectors forming the cluster generated by the first cluster generation unit 171, and is determined by selecting it from the vectors of the vector group X_j generated by the first vector group generation unit 130 other than those in the cluster. The joining vector is joined to the cluster by making it adjacent to the end vector determined to have the highest relevance to it; however, the invention is not limited to this, and the joining vector may be joined at another place in the cluster.

The second cluster expanding unit 182 sequentially expands the cluster generated by the second cluster generation unit 181 by adding joining vectors to it. The joining vector is the vector having the highest relevance to either of the end vectors located at both ends of the vectors forming the cluster generated by the second cluster generation unit 181, and is determined by selecting it from the vectors of the vector group Y_k generated by the second vector group generation unit 140 other than those in the cluster. The joining vector is joined to the cluster by making it adjacent to the end vector determined to have the highest relevance to it; however, the invention is not limited to this, and the joining vector may be joined at another place in the cluster.

The clusters are expanded by the first and second cluster expansion units 172 and 182, and when no vector remains that has not been joined to the cluster, the processing of the first and second vector arrangement units 170 and 180 ends.

The first cluster enlargement stop determination unit 174 determines whether the relevance between the end vectors located at both ends of the vectors forming the cluster generated by the first cluster generation unit 171 and each vector outside the cluster among the vector group X_j generated by the first vector group generation unit 130 is, in every case, less than or equal to a predetermined threshold. If so, the selection of joining vectors and the expansion of the cluster by the first cluster expanding unit 172 are stopped.

The second cluster enlargement stop determination unit 184 determines whether the relevance between the end vectors located at both ends of the vectors forming the cluster generated by the second cluster generation unit 181 and each vector outside the cluster among the vector group Y_k generated by the second vector group generation unit 140 is, in every case, less than or equal to a predetermined threshold. If so, the selection of joining vectors and the expansion of the cluster by the second cluster expanding unit 182 are stopped.

Here, the predetermined threshold is desirably set to, for example, 0 (no correlation) when the correlation coefficient is used as the measure of relevance.

The first cluster regeneration unit 175 selects two vectors, according to a predetermined criterion, from the vectors outside the cluster generated by the first cluster generation unit 171 (the expanded cluster, if it has been expanded by the first cluster expanding unit 172), and makes the two vectors adjacent to generate another cluster.

The second cluster regeneration unit 185 selects two vectors, according to a predetermined criterion, from the vectors outside the cluster generated by the second cluster generation unit 181 (the expanded cluster, if it has been expanded by the second cluster expanding unit 182), and makes the two vectors adjacent to generate another cluster.

The other clusters generated by the first and second cluster regeneration units 175 and 185 are either sent directly to the first and second cluster re-expanding units 176 and 186, respectively, for processing there, or sent to and stored in the work result storage unit 32 of the recording device 3.

The first cluster re-expanding unit 176 sequentially expands the other cluster generated by the first cluster regeneration unit 175 by adding joining vectors to it. A joining vector is the vector having the highest relevance to either of the end vectors located at both ends of the vectors forming the other cluster generated by the first cluster regeneration unit 175. It is determined by selecting, from those vectors of the vector group X_j generated by the first vector group generation unit 130 that are outside the cluster generated by the first cluster generation unit 171, a vector that is also outside the other cluster. The joining vector is joined to the other cluster by making it adjacent to the end vector determined to be most relevant to it.

The second cluster re-expanding unit 186 sequentially expands the other cluster generated by the second cluster regeneration unit 185 by adding joining vectors to it. A joining vector is the vector having the highest relevance to either of the end vectors located at both ends of the vectors forming the other cluster generated by the second cluster regeneration unit 185. It is determined by selecting, from those vectors of the vector group Y_k generated by the second vector group generation unit 140 that are outside the cluster generated by the second cluster generation unit 181, a vector that is also outside the other cluster. The joining vector is joined to the other cluster by making it adjacent to the end vector determined to be most relevant to it.

The other clusters are expanded by the first and second cluster re-expanding units 176 and 186, and when no vector remains outside the clusters, the processing of the first and second vector arrangement units 170 and 180 ends.

[0045] <2— 3. Details of Recording Device 3>

In the recording device 3, the condition recording unit 31 records information such as the conditions obtained from the input device 2 and sends the necessary data in response to requests from the processing device 1. The work result storage unit 32 stores the work results of each component of the processing device 1 and sends the necessary data in response to requests from the processing device 1. The document storage unit 33 stores and provides the data of the required technical document group, obtained from an external or internal database in response to a request from the input device 2 or the processing device 1.

<2-4. Details of Output Device 4>

The output device 4 outputs the scores and the like arranged in a matrix based on the arrangement of vectors determined by the first and second vector arrangement units 170 and 180 of the processing device 1. The output device 4 includes, for example, a display unit 41 such as a display device, and displays the distribution state of the scores arranged in a matrix with a pattern or a color according to the score. The form of the output is not limited to display on the display unit 41, but may be printing on a print medium such as paper, or transmission to a computer apparatus on a network via communication means.

<3. Operation of First Embodiment>

FIG. 2 is a flow chart showing the operation procedure of the processing device 1 in the relevance analysis support device of the first embodiment.

<3— 1. Acquisition of data of technical documents>

First, the data acquisition unit 110 acquires data of a technical document group to be analyzed (step S110). Each document of this technical document group needs to have at least two kinds of attributes, X and Y. The number of documents in this technical document group is N. For example, the data shown in [Table 1] below is obtained. Each technical document may have a single value for an attribute, or a single attribute of a technical document may have multiple values, like attribute Z of technical document numbers 2, 3 and 4 in [Table 1] below. For example, when a plurality of inventors are named in one patent document, the inventor attribute has as many values as there are inventors.

[Table 1]

Technical document number i    Attribute X    Attribute Y    ...    Attribute Z
1                              X_1            Y_1            ...    ?
2                              X_1            Y_3            ...    Z_2, Z_4
3                              X_1            Y_3            ...    ?
4                              ?              ?              ...    Z_2, Z_3
5                              ?              ?              ...    Z_3
6                              X_2            ?              ...    Z_4
7                              X_2            Y_3            ...    Z_4
8                              X_2            Y_4            ...    Z_4
9                              X_2            ?              ...    Z_5
10                             X_3            Y_2            ...    Z_5
...
N                              X_4            ?              ...    Z_3

[0049] <3-2. Calculation of score>

Next, the score calculation unit 120 calculates a score from the data of the technical documents belonging to each combination of values of the first attribute X and the second attribute Y among the at least two types of attributes (step S120).

For this purpose, first, two of the above-mentioned attributes are selected (for example, "applicant" and "keyword"; hereinafter referred to as X and Y respectively in the description of this embodiment). This selection is made based on the user's instruction input from the input device 2. It is preferable that one of the two types of attributes be a human attribute such as the applicant or the inventor, and the other a technical field attribute such as a keyword or IPC. Alternatively, both of the two types of attributes may be technical field attributes; for example, one may be a technical classification and the other a technical element. In addition, an attribute that is neither a human attribute nor a technical field attribute, such as the filing date, may be selected as one or both of the two types of attributes.

[0050] After the two types of attributes are selected, a range of values X_j and Y_k is specified for each of the attributes X and Y (the values are, for example, the specific names of applicants and keywords, and are not limited to numerical values). For example, a ranking in descending order of the number of relevant technical documents is created, as shown in [Table 2] below, and the values falling within the top p for attribute X and the top q for attribute Y are specified as the range. The number p of values X_j in the range of attribute X and the number q of values Y_k in the range of attribute Y may be the same or different. The value range may also be selected according to the purpose of the analysis, such as whether to analyze the companies ranked highest by number of documents or which technical fields to analyze. In the following description, the values X_1, X_2, ..., X_p of attribute X and the values Y_1, Y_2, ..., Y_q of attribute Y are assumed to have been determined as the value ranges.

[Table 2]

Then, for each combination of attribute values X_j and Y_k (where j = 1, 2, ..., p and k = 1, 2, ..., q), p × q scores σ_kj are calculated based on the number of technical documents having that combination of attribute values.

The score σ_kj may be the number of technical documents having the combination of attribute values (X_j, Y_k) itself, or it may be the value of a function of that number, such as a normalized value. Assume that the score σ_kj is the number of documents itself. If, as in [Table 1] above, the only technical document in the group having the attribute value pair (X_1, Y_1) is technical document number i = 1, then the score σ_11 for the pair (X_1, Y_1) is 1. Likewise, if the technical documents having the pair (X_1, Y_3) are the two documents numbered i = 2 and 3, then the score σ_31 for the pair (X_1, Y_3) is 2. The scores σ_kj are, for example, as shown in [Table 3] below. Hereafter, the hypothetical case shown in [Table 3] is referred to as appropriate.

[Table 3]

σ_kj   X_1   X_2   X_3   X_4   X_5   X_6
Y_1     1     8     0     1     0     1
Y_2     0     0     5     2     1     0
Y_3     6     0     0     0     1     0
Y_4     2     1     0     0     1     0
Y_5     1     0     1     0     0     0
Y_6     0     1     0     0     0     1

Thus, since there are p × q combinations of attribute values, the p × q scores σ_kj (j = 1, 2, ..., p; k = 1, 2, ..., q) can be arranged in a matrix of q rows and p columns. In the example of [Table 3], the matrix has 6 rows and 6 columns.
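The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the applicant's implementation: the record layout `(document number, X value, Y value)`, the sample records, and the function name `score_matrix` are hypothetical.

```python
from collections import Counter

# Hypothetical records: (document number, value of attribute X, value of attribute Y).
documents = [
    (1, "X1", "Y1"),
    (2, "X1", "Y3"),
    (3, "X1", "Y3"),
    (4, "X2", "Y1"),
]
x_values = ["X1", "X2"]        # selected range of attribute X (p values)
y_values = ["Y1", "Y2", "Y3"]  # selected range of attribute Y (q values)


def score_matrix(docs, xs, ys):
    """Return a q-row, p-column matrix whose entry [k][j] is the number of
    documents having the value pair (xs[j], ys[k])."""
    counts = Counter((x, y) for _, x, y in docs)
    return [[counts[(x, y)] for x in xs] for y in ys]


sigma = score_matrix(documents, x_values, y_values)
for y, row in zip(y_values, sigma):
    print(y, row)
```

Entry `sigma[k][j]` plays the role of σ_kj, so rows follow attribute Y and columns follow attribute X, as in [Table 3].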

When the value range of attribute X or Y is wide and p or q would be too large, the score σ_kj may be determined after the attribute values are redefined with a certain width. For example, when the filing date is selected as attribute X, the number of values p reaches 1000 or more within several years if the dates are used as they are, but the filing year or the like may be used as the attribute value instead. This keeps the range of attribute values small enough to analyze easily.
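As a small illustration of redefining attribute values with a coarser width, the sketch below replaces exact filing dates by filing years. The sample dates and the helper name `coarsen_to_year` are hypothetical, and other widths (months, multi-year periods) could be used in the same way.

```python
from collections import Counter
from datetime import date

# Hypothetical filing dates of four technical documents.
filing_dates = [
    date(2003, 4, 1),
    date(2003, 11, 20),
    date(2004, 2, 14),
    date(2005, 7, 7),
]


def coarsen_to_year(d):
    """Replace an exact filing date by its filing year."""
    return d.year


values_before = sorted(set(filing_dates))
values_after = sorted(set(coarsen_to_year(d) for d in filing_dates))
documents_per_year = Counter(coarsen_to_year(d) for d in filing_dates)

print(len(values_before), "distinct dates ->", len(values_after), "distinct years")
print(dict(documents_per_year))
```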

Although an example of calculating the score σ_kj from the number of documents has been described here, the invention is not limited to this; a weight α_i (i = 1, 2, ..., N) may be given to each technical document and reflected in the calculation of the score. For example, for each combination of attribute values X_j and Y_k, the score may be calculated as

$$\sigma_{kj} = \sum_{i \in (X_j, Y_k)} \alpha_i$$

That is, the sum of the weights α_i of all technical documents i whose combination of attribute values is (X_j, Y_k) may be used as the score σ_kj. For example, if, as in [Table 1] above, the technical documents having the attribute pair (X_1, Y_3) are two of the N technical documents, numbers i = 2 and 3, and they are given the weights α_2 and α_3 respectively, then the score for the pair (X_1, Y_3) is σ_31 = α_2 + α_3.

In this case, the weight may be given based on application progress information, for example a large value if a patent document has been registered and a small value if it has not; it is also preferable to give the weight based on the number of citations or the like.

[0055] When the score σ_kj is expressed as the number of documents (that is, when the same weight α_i = 1 is given to all technical documents), there is the advantage that the distribution of the attributes is expressed simply and objectively. On the other hand, when a separate weight α_i is given to each technical document and the score σ_kj is calculated by summing the weights α_i, the analysis can be performed with scores that take into account the importance or qualitative factors of the technical documents.
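The weighted variant of the score can be sketched as follows. The record layout, the weight values, and the function name `weighted_scores` are hypothetical; the only point illustrated is that the score of a value pair is the sum of the weights α_i of the documents having that pair.

```python
from collections import defaultdict

# Hypothetical records: (document number i, X value, Y value, weight alpha_i).
documents = [
    (1, "X1", "Y1", 1.0),
    (2, "X1", "Y3", 2.0),  # e.g. a registered patent, weighted more heavily
    (3, "X1", "Y3", 0.5),  # e.g. an application that has not been registered
]


def weighted_scores(docs):
    """Sum the weights of all documents sharing the same (X, Y) value pair."""
    totals = defaultdict(float)
    for _, x, y, alpha in docs:
        totals[(x, y)] += alpha
    return dict(totals)


scores = weighted_scores(documents)
print(scores[("X1", "Y3")])  # alpha_2 + alpha_3
```

Setting every weight to 1 reduces this to the plain document count.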

[0056] <3— 3. Generation of Vectors>

Next, vectors are generated in the first and second vector group generation units 130 and 140 (steps S130 and S140).

Specifically, when the scores are arranged in a matrix of q rows and p columns as described above, the q-dimensional vector whose components are the scores σ_1j, σ_2j, ..., σ_qj belonging to each column is taken as the vector X_j (j = 1, 2, ..., p). This vector X_j represents the distribution over attribute Y for the value X_j of attribute X. For example, for the patent applications of a certain company X_j, it is a vector indicating their distribution over technical fields. In the hypothetical case of [Table 3] above, applicant X_1 has filed many patent applications in technical fields Y_3 and Y_4, but has filed none in technical fields Y_2 and Y_6.

Similarly, the p-dimensional vector whose components are the scores σ_k1, σ_k2, ..., σ_kp belonging to each row of the matrix is taken as the vector Y_k (k = 1, 2, ..., q). This vector Y_k represents the distribution over attribute X for the value Y_k of attribute Y. For example, for a technical field Y_k, the vector Y_k represents the distribution of applicants. In the hypothetical case of [Table 3] above, in technical field Y_1, applicant X_2 has many patent applications, while the other applicants have few.

[0057] The vectors X_j and Y_k may have the scores themselves as components as described above, but it is desirable to use the logarithm of the score σ_kj as the component. This is because a score σ_kj based on the combination of two technical document attributes is non-negative and its distribution tends to be concentrated near zero. In such a case, using the logarithm of the score σ_kj as the component brings the distribution of the vector components closer to a normal distribution, which can improve the reliability of the relevance calculation. In particular, when the correlation coefficient is selected as the method of evaluating relevance, it is desirable to use the logarithm of the score σ_kj as the component. When the score σ_kj is 0, the logarithm is undefined; in that case the logarithm of 0 may, for convenience, be taken to be -1 or another negative number, or 1 or another positive number may be added to every score before the logarithm is taken.
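The vector generation and the logarithmic variant can be sketched as follows, using the scores of [Table 3]. The helper names are hypothetical, and adding 1 before taking the logarithm is just one of the conventions the text mentions for handling scores of 0.

```python
import math

# Scores of [Table 3]: rows are Y_1..Y_6, columns are X_1..X_6.
SIGMA = [
    [1, 8, 0, 1, 0, 1],
    [0, 0, 5, 2, 1, 0],
    [6, 0, 0, 0, 1, 0],
    [2, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
]


def column_vectors(matrix):
    """Vectors X_j: one q-dimensional vector per column."""
    return [list(col) for col in zip(*matrix)]


def row_vectors(matrix):
    """Vectors Y_k: one p-dimensional vector per row."""
    return [list(row) for row in matrix]


def log_components(vector, offset=1.0):
    """Logarithm of each score after adding a positive offset, so that a
    score of 0 stays defined (with offset 1 it maps to 0)."""
    return [math.log(v + offset) for v in vector]


x_vectors = column_vectors(SIGMA)
y_vectors = row_vectors(SIGMA)
print("X_1 =", x_vectors[0])
print("Y_1 =", y_vectors[0])
print("log-transformed X_1 =", [round(c, 3) for c in log_components(x_vectors[0])])
```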

[0058] As a method of generating the vectors, in addition to the method using the score itself as the component and the method using the logarithm of the score σ_kj as the component, there is also a method in which the score multiplied by the reciprocal of its frequency of occurrence is used as the component.

For example, in [Table 3] above, for the value X_2 of attribute X, the score σ_k2 appears three times in the range Y_1 to Y_6 of attribute Y (scores with σ_kj = 0 are not counted as occurrences). Therefore, each score σ_k2 corresponding to the value X_2 is multiplied by 1/3, the reciprocal of this frequency of occurrence. Furthermore, in [Table 3], for the value Y_1 of the other attribute Y, the score σ_1j appears four times in the range X_1 to X_6 of attribute X. Therefore, each score σ_1j corresponding to the value Y_1 is multiplied by 1/4, the reciprocal of this frequency of occurrence. Then, for example, the score σ_12 = 8 is multiplied by 1/3, the reciprocal of the frequency of occurrence for the value X_2, and by 1/4, the reciprocal of the frequency of occurrence for the value Y_1, so that the first component of vector X_2, which is also the second component of vector Y_1 (the component corresponding to the value pair (X_2, Y_1)), becomes 8 / (3 × 4). If each of the other scores is likewise multiplied by the reciprocals of its frequencies of occurrence, the components shown in [Table 4] are obtained. The vectors formed from the components of each column corresponding to the range X_1 to X_6 are taken as vectors X_1 to X_6, and the vectors formed from the components of each row corresponding to the range Y_1 to Y_6 are taken as vectors Y_1 to Y_6.

[Table 4]

In this way, components that appear in common across many vectors are given lower values, and components that appear only in specific vectors are given relatively higher values, so that vectors emphasizing what is unique to each document attribute value can be generated.
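The frequency weighting can be sketched as below, again on the scores of [Table 3]. The function name is hypothetical, and the sketch assumes, as the worked figures 1/3 and 1/4 suggest, that the frequency of occurrence is the number of non-zero scores in the relevant column or row.

```python
# Scores of [Table 3]: rows are Y_1..Y_6, columns are X_1..X_6.
SIGMA = [
    [1, 8, 0, 1, 0, 1],
    [0, 0, 5, 2, 1, 0],
    [6, 0, 0, 0, 1, 0],
    [2, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
]


def inverse_frequency_weighted(matrix):
    """Multiply each score by the reciprocal of the number of non-zero scores
    in its column and by the reciprocal of the number in its row."""
    col_freq = [sum(1 for row in matrix if row[j] != 0) for j in range(len(matrix[0]))]
    row_freq = [sum(1 for v in row if v != 0) for row in matrix]
    return [
        # A non-zero score guarantees both frequencies are at least 1.
        [v / (row_freq[k] * col_freq[j]) if v != 0 else 0.0 for j, v in enumerate(row)]
        for k, row in enumerate(matrix)
    ]


weighted = inverse_frequency_weighted(SIGMA)
print(weighted[0][1])  # component for (X_2, Y_1): 8 / (3 * 4)
```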

<3-4. Relevance calculation>

Next, in the first and second vector relevance calculation units 150 and 160, the relevance among the p vectors X_j and the relevance among the q vectors Y_k are respectively calculated (steps S150 and S160).

For example, in the hypothetical case of [Table 3] above, the relevance among the p vectors X_j, evaluated using for example the correlation coefficient, is obtained as the data shown in [Table 5] below.

[Table 5]

Although the calculation result of relevance is shown here for the vectors X_j corresponding to attribute X, the same applies to attribute Y. As methods of evaluating relevance, besides the correlation coefficient, a method using the inner product, a method calculating Spearman's rank correlation coefficient, and the like are conceivable.
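The three relevance measures named above can be sketched with the standard library alone. The function names are hypothetical; treating a constant vector as uncorrelated and ranking ties by their average rank are assumptions of this sketch, not statements from the specification. The vectors X_3 and X_4 are the corresponding columns of [Table 3].

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / (norm_u * norm_v)


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(u, v):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(u), average_ranks(v))


x3 = [0, 5, 0, 0, 1, 0]  # column X_3 of [Table 3]
x4 = [1, 2, 0, 0, 0, 0]  # column X_4 of [Table 3]
print(round(pearson(x3, x4), 2), inner_product(x3, x4), round(spearman(x3, x4), 2))
```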

[0061] <3-5. Arrangement of Vectors>

Next, in the first and second vector arrangement units 170 and 180, processing is performed to arrange vectors that are highly relevant to each other closer together than vectors with low relevance. One such method is explained below. Although the following description mainly uses attribute X as the example, the same applies to attribute Y.

<3 — 5 — 1. Create Cluster>

First, in the first and second cluster generation units 171 and 181, two vectors are made adjacent to generate a cluster (steps S171 and S181).

As an example of the method, the two vectors having the highest correlation among the p vectors X_j are selected and made adjacent to generate a cluster. In the example of [Table 5] above, vectors X_3 and X_4, whose correlation coefficient is 0.84, are the most relevant pair, so they are made adjacent. By selecting the two most relevant vectors to generate the cluster, the vectors with the highest relevance are guaranteed to be adjacent, which ensures the quantitative objectivity of the vector arrangement.

The vectors to be made adjacent may also be selected by other methods. For example, when it is desired to contrast a specific applicant (such as one's own company) with the remaining applicants, the vector of the specific applicant may be made adjacent to the vector most relevant to it. Also, for example, when it is desired to compare two specific applicants (such as one's own company and a competitor) with each other and at the same time with the remaining applicants, the vectors of the two specific applicants may be made adjacent.

Hereinafter, a group of a plurality of adjacent vectors will be referred to as a "cluster".
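The cluster creation step can be sketched as a search for the most relevant pair. The code below is illustrative only: it computes the correlation coefficients from the raw scores of [Table 3], and the names `pearson` and `most_relevant_pair` are hypothetical.

```python
import math
from itertools import combinations

# Scores of [Table 3]: rows are Y_1..Y_6, columns are X_1..X_6.
SIGMA = [
    [1, 8, 0, 1, 0, 1],
    [0, 0, 5, 2, 1, 0],
    [6, 0, 0, 0, 1, 0],
    [2, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / (norm_u * norm_v)


def most_relevant_pair(vectors, relevance):
    """Indices of the two vectors with the highest mutual relevance."""
    return max(
        combinations(range(len(vectors)), 2),
        key=lambda pair: relevance(vectors[pair[0]], vectors[pair[1]]),
    )


x_vectors = [list(col) for col in zip(*SIGMA)]
cluster = list(most_relevant_pair(x_vectors, pearson))
print(["X_%d" % (j + 1) for j in cluster])
```

On these scores the selected pair is X_3 and X_4, in line with the correlation coefficient of 0.84 quoted in the text.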

<3 — 5 — 2. Cluster Expansion>

Next, in the first and second cluster expanding units 172 and 182, the joining vector is added to the cluster to expand the cluster (steps S172 and S182).

Specifically, the most relevant pair of vectors is determined between the vectors located at both ends of the cluster and the remaining vectors not in the cluster. In the example above, the vector most relevant to either vector X_3 or X_4 located at the ends of the cluster is vector X_5, whose correlation coefficient with vector X_3 is 0.37. This vector X_5 is taken as the joining vector.

Once the most relevant pair of vectors is determined, those vectors are made adjacent to form a larger cluster. In the example above, of the already adjacent vectors X_3 and X_4, the joining vector X_5 is placed next to vector X_3. However, the invention is not limited to this, and the joining vector may be joined at another position in the cluster.

As described above, by expanding the cluster through sequentially placing highly relevant vectors adjacent to each other, highly relevant vectors are reliably arranged close together, and the arrangement can be formed so that the concentration or dispersion of the data distribution of the document attributes becomes apparent.
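A single expansion step can be sketched as follows. The relevance values 0.84, 0.37 and 0.49 are the correlation coefficients quoted in the text; the remaining entries of the table, and the names `RELEVANCE`, `relevance` and `expand_once`, are illustrative assumptions.

```python
# Pairwise relevance between vectors. The values 0.84, 0.37 and 0.49 are the
# correlation coefficients quoted in the text; the others are illustrative.
RELEVANCE = {
    frozenset(("X3", "X4")): 0.84,
    frozenset(("X3", "X5")): 0.37,
    frozenset(("X1", "X5")): 0.49,
    frozenset(("X4", "X5")): 0.22,
    frozenset(("X1", "X3")): -0.40,
    frozenset(("X1", "X4")): -0.42,
}


def relevance(a, b):
    return RELEVANCE[frozenset((a, b))]


def expand_once(cluster, remaining):
    """Join the outside vector most relevant to either end of `cluster`.

    Returns the enlarged cluster and the vectors still outside it."""
    ends = (cluster[0], cluster[-1])
    end, joined = max(
        ((e, c) for e in ends for c in remaining),
        key=lambda pair: relevance(pair[0], pair[1]),
    )
    if end == cluster[0]:
        enlarged = [joined] + cluster  # attach before the first vector
    else:
        enlarged = cluster + [joined]  # attach after the last vector
    return enlarged, [c for c in remaining if c != joined]


cluster, outside = expand_once(["X3", "X4"], ["X1", "X5"])
print(cluster)  # X5 joins next to X3
cluster, outside = expand_once(cluster, outside)
print(cluster)  # X1 joins next to X5
```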

[0065] As a result of cluster expansion, when no vector remains that has not been joined to the cluster (steps S173 and S183: NO), the arrangement of vectors ends. If vectors not yet joined to the cluster remain (steps S173 and S183: YES), the process goes to steps S174 and S184, respectively.

In steps S174 and S184, the first and second cluster enlargement stop determination units 174 and 184 determine whether or not the relevance to every vector outside the cluster is less than or equal to a predetermined threshold. If at least one relevance exceeds the predetermined threshold (steps S174 and S184: NO), the process returns to steps S172 and S182 and the cluster is expanded further. For example, if the vector most relevant to either end X_5 or X_4 of the cluster arranged in the order X_5, X_3, X_4 is vector X_1, whose correlation coefficient with vector X_5 is 0.49, the joining vector X_1 is placed adjacent to vector X_5.

[0067] It may be determined in advance at which of the two ends of the cluster highly relevant vectors are to be made adjacent. For example, if highly relevant vectors are always determined for, and joined to, only one of the two ends of the cluster, an arrangement can be produced in which the vectors that first formed the cluster end up at the edge of the matrix. Alternatively, if highly relevant vectors are determined for, and joined to, one end and the other end of the cluster alternately, an arrangement can be produced in which the vectors that first formed the cluster end up at the center of the matrix.

<3—5— 3. Generation of Other Clusters>

In steps S174 and S184, when the relevancy is less than or equal to a predetermined threshold (steps S174 and S184: YES), the process proceeds to steps S175 and S185, respectively. In steps S175 and S185, in the first and second cluster regenerating units 175 and 185, two vectors in the vector group other than the above cluster are made adjacent to generate another cluster.

Then, in the first and second cluster re-expanding units 176 and 186, joining vectors are added to the other cluster to expand it (steps S176 and S186). That is, when no vector has relevance higher than the threshold, a cluster is generated again from the remaining vectors only, and the same cluster expansion procedure as described above is repeated.

As described above, when the relevance to the vectors at both ends of the cluster is less than or equal to the predetermined threshold, the vectors are prevented from being forcibly integrated into one cluster, and by giving priority to combinations of vectors with higher relevance, the reliability of the vector arrangement can be improved.

The threshold of relevance is desirably, for example, 0 (no correlation) when the correlation coefficient is used. Using the correlation coefficient as the method of evaluating relevance is also advantageous in that the threshold is easy to set in this way.

[0070] As a result of the expansion of the other cluster, when no vector remains that has not been joined to a cluster (steps S177 and S187: NO), the arrangement of the vectors ends. If vectors not yet joined to a cluster remain (steps S177 and S187: YES), the process proceeds to steps S178 and S188, respectively.

In steps S178 and S188, it is determined whether or not the relevance to every vector outside the clusters is less than or equal to a predetermined threshold. If at least one relevance exceeds the predetermined threshold (steps S178 and S188: NO), the process returns to steps S176 and S186, respectively, to expand the other cluster further. If every relevance is less than or equal to the predetermined threshold (steps S178 and S188: YES), the process returns to steps S175 and S185, respectively, to generate a further cluster.

By the above processing, a plurality of clusters may be formed; finally, these clusters are made adjacent to each other. Conceivable methods of making the clusters adjacent include arranging them in one direction, from one end to the other, in descending or ascending order of cluster size (the number of vectors included in the cluster), and arranging them alternately.
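The whole arrangement procedure (cluster creation, expansion, the stop determination against a threshold, generation of further clusters, and ordering of clusters by size) can be sketched as below. This is a minimal reading of the steps described above, not the applicant's implementation. The function names are hypothetical, and two points are assumptions of the sketch: a single leftover vector becomes a one-vector cluster, and clusters are placed in descending order of size.

```python
import math
from itertools import combinations

# Scores of [Table 3]: rows are Y_1..Y_6, columns are X_1..X_6.
SIGMA = [
    [1, 8, 0, 1, 0, 1],
    [0, 0, 5, 2, 1, 0],
    [6, 0, 0, 0, 1, 0],
    [2, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
]


def pearson(u, v):
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / (norm_u * norm_v)


def arrange(vectors, relevance=pearson, threshold=0.0):
    """Return clusters of vector indices, largest cluster first."""
    n = len(vectors)
    rel = [[relevance(vectors[a], vectors[b]) for b in range(n)] for a in range(n)]
    remaining = list(range(n))
    clusters = []
    while remaining:
        if len(remaining) == 1:
            clusters.append([remaining.pop()])  # assumption: leftover singleton
            break
        # Cluster creation: make the most relevant remaining pair adjacent.
        a, b = max(combinations(remaining, 2), key=lambda p: rel[p[0]][p[1]])
        cluster = [a, b]
        remaining = [c for c in remaining if c not in (a, b)]
        # Cluster expansion: join the vector most relevant to either end.
        while remaining:
            end, cand = max(
                ((e, c) for e in (cluster[0], cluster[-1]) for c in remaining),
                key=lambda ec: rel[ec[0]][ec[1]],
            )
            if rel[end][cand] <= threshold:
                break  # stop determination: nothing exceeds the threshold
            if end == cluster[0]:
                cluster.insert(0, cand)
            else:
                cluster.append(cand)
            remaining.remove(cand)
        clusters.append(cluster)
    clusters.sort(key=len, reverse=True)
    return clusters


x_vectors = [list(col) for col in zip(*SIGMA)]
for cluster in arrange(x_vectors):
    print(["X_%d" % (j + 1) for j in cluster])
```

With the threshold at 0, the first steps on these scores follow the text: X_3 and X_4 form the cluster, X_5 joins next to X_3, and X_1 joins next to X_5. Raising the threshold makes the expansion stop earlier and leaves the remaining vectors to form a separate cluster.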

The same procedure is performed not only for the attribute X but also for the attribute Y, and the placement determination is completed. In the above example, it is as shown in [Table 6] below.

[Table 6]

Note that, after the score calculation in step S120, the processing in the first vector group generation unit 130, the first vector relevance calculation unit 150 and the first vector arrangement unit 170 (steps S130, S150, and S171 to S178) and the processing in the second vector group generation unit 140, the second vector relevance calculation unit 160 and the second vector arrangement unit 180 (steps S140, S160, and S181 to S188) may be executed in either order, or both may be executed at the same time. Alternatively, only one of them may be executed. For example, when one attribute X is a human attribute such as the applicant and the other attribute Y is a technical classification under a coding system such as the IPC, the result may in some cases be easier to read if only the former is executed and attribute Y is arranged in the order of its systematic code numbers rather than on the basis of relevance.

[0073] <3— 6. Output Example>

The output by the output device 4 may be in the form shown in [Table 6] above, but to make it easier to view, the distribution state of the scores may be displayed with a pattern or color according to the score. For example, it is preferable to give dark or warm colors to areas where high scores are distributed, and light or cool colors to areas where low scores are distributed. The distribution state may not be apparent at first glance when the scores are shown only numerically, but it can be displayed in an easy-to-see manner by adding a pattern or a color.
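As a rough text-only illustration of shading by score, the sketch below reorders a score matrix by a given arrangement of columns and rows and maps each score to a character of increasing darkness. The character ramp, the function name `render`, and the sample matrix are hypothetical, and the sketch assumes non-negative integer scores; an actual display unit would use patterns or colors instead.

```python
SHADES = " .:*#"  # from light (low score) to dark (high score)


def render(matrix, col_order, row_order):
    """Return one text line per row, shading each cell by its score.

    Assumes non-negative integer scores such as document counts."""
    peak = max(max(row) for row in matrix) or 1
    top = len(SHADES) - 1
    lines = []
    for k in row_order:
        # Ceiling division so that every non-zero score gets a visible shade.
        levels = [(matrix[k][j] * top + peak - 1) // peak for j in col_order]
        lines.append("".join(SHADES[level] for level in levels))
    return lines


scores = [
    [0, 8],
    [2, 4],
]
for line in render(scores, col_order=[1, 0], row_order=[0, 1]):
    print("|" + line + "|")
```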

FIG. 3 is a view showing one display example by the display unit. In this figure, areas with dense distribution are hatched with a high line density, and areas with sparse distribution are hatched with a low line density. As shown in this figure, by showing the distribution of scores in the manner of a so-called cloud map or contour map, the density of the score distribution becomes clear, and the distribution of scores can be displayed in a more easily readable way.

FIG. 4 is a view showing another display example by the display unit. In this figure, the values of each attribute are shown specifically for the case where "applicant" is selected as the first attribute X and "technical field" is selected as the second attribute Y. In this figure as well, a grid with high line density is drawn in areas with dense distribution and a grid with low line density in areas with sparse distribution, so that the sparse or dense state of the score distribution is clear. In other words, by selecting a specific "applicant" and looking at where the distribution is dense, one can read the main technical fields being developed by that applicant, and by selecting a specific "technical field" and looking at where the distribution is dense, one can read the major applicants developing in that technical field.

By using a human attribute and a technical field attribute as shown in FIG. 4, the following analyses become possible.

[0077] Because the relationship between the technology development areas of one's own company and those of other companies is shown:

(a) Companies with similar development tendencies can be found. In FIG. 4, for example, when "E car" is one's own company, the adjacent "F electricity" can be found. The companies found in this way are not limited to companies that currently compete with one's own company in the market. If "F electricity" has development tendencies similar to those of one's own "E car", such as in "battery" and "ceramics", while having entered an industry (for example, electrical products) that one's own company has not entered, the technical hurdles for newly entering that industry can be expected to be low.

(b) The strengths and weaknesses of one's own development department can be discovered by comparison with companies that compete in the market but have different development tendencies. In FIG. 4, for example, if one's own company is "D Electric", which is strong in "Semiconductors" but not in "Electrics and electronics", then by comparison with "A Electric", which has a different development tendency, being strong in "Electrics and electronics" but not in "Semiconductors", the strengths and weaknesses of its development department can be discovered.

(c) Technology alliance partners that have different development tendencies and can compensate for the weaknesses of each other's development departments can be searched for. In FIG. 4, for example, if one's own company is "C Mfg.", which specializes in "semiconductors" and "optical" and has no other specialty, then "A Electric Machine" and the like, which have a different development tendency and specialize in "electricity · electronic", can be discovered.

In addition, because the relationship between the developing entities of one technical field and those of another is shown, the relationship between technical fields can be analyzed. For example, for "battery" and "ceramics", which are adjacent in FIG. 4:

(a) The possibility that working on both will lead to an existing business can be found, and it can be judged whether to enter that business and whether technology development is needed in order to enter it; or

(b) The possibility of applying each field's technologies to the other can be found, even if the fields seem technically unrelated.

Although FIG. 4 illustrates an example in which one of the two types of attributes is a human attribute and the other is a technical field attribute, the present invention is not limited to this, and both of the two types of attributes may be technical field attributes; for example, one may be a technology classification and the other a technology element. Also, one may be an IPC main classification (section, class), and the other an IPC subclassification (group, subgroup) or the like.

As described above, according to the present embodiment, a company can itself grasp the results of the technological development carried out in its own research and development organization and the current status of its technological asset portfolio, and can obtain objective guidelines, which can contribute to the company's technology development investment decisions.

Also, as described above, by applying the method of the present invention to various combinations of attributes of the technical documents, the current state of the development system of a specific company can be analyzed more precisely from multiple angles. In addition, based on the analysis results obtained, the company's decision-making on the direction of future development can be supported more effectively.

<Second Embodiment>

Next, a second embodiment of the present invention will be described. The hardware configuration of the technical document attribute relevance analysis support apparatus according to the second embodiment is the same as the hardware configuration (FIG. 1) in the first embodiment, and thus the description thereof will be omitted.

FIG. 5 is a flowchart showing the operation procedure of the processing device 1 in the relevance analysis support device of the second embodiment.

The main features of the second embodiment lie in the portion corresponding to the processing up to the generation of the first and second vector groups in the first embodiment. That is, in the second embodiment, the task words and the solution words included in the documents are used as the attributes X and Y of the technical documents, and the rate of change in the number of documents for each combination of a task word and a solution word is used as the score that becomes a vector component. The process of arranging the generated vector groups is substantially the same as in the first embodiment. The operation procedure of the second embodiment is described in detail below.

<4-1. Acquisition of technical document group>

First, based on the acquisition condition for the analysis target document group input from the input device 2, the data acquisition unit 110 acquires the technical document group to be analyzed (step S210). The technical documents to be acquired may be of any type, such as patent documents or technical papers, but patent documents are particularly preferable because they are written in a format from which the task words and solution words described below can be extracted by computer processing. The acquisition condition for the analysis target document group may be specified by, for example, an IPC code, or a predetermined number of documents having the highest similarity to a specific technical document may be acquired.

[0084] <4-2. Selection of task words and solution words>

Next, the data acquisition unit 110 extracts candidates for “task words” and “solution words” from each document of the acquired analysis target document group (step S211). For example, if the summary or another part of each document contains an item such as “task” or “solution”, the words of that part are extracted. Also, for example, when each document includes a description such as “The subject of the present invention is ...”, the words are extracted from that description.

Next, the data acquisition unit 110 selects the “task words” and the “solution words” to be used for the analysis from among the extracted candidates (step S212). One selection method is, for example, to calculate for each candidate the document frequency in the analysis target document group (DF: the number of documents hit when the analysis target document group is searched by that index word) and to select a predetermined number of top-ranked candidates (for example, 100 words each), but other methods may be used.
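The DF-based selection of step S212 can be sketched as follows. This is only an illustration under stated assumptions: the function name `select_by_document_frequency` and the input layout (one list of candidate words per document) are hypothetical and do not come from the specification, and ties are broken alphabetically purely for reproducibility.

```python
from collections import Counter


def select_by_document_frequency(doc_candidates, top_n=100):
    """Return the top_n candidate words ranked by document frequency (DF).

    doc_candidates: one iterable of candidate words per document (assumed layout).
    """
    df = Counter()
    for words in doc_candidates:
        # A word is counted once per document, however often it occurs there.
        df.update(set(words))
    # Highest DF first; alphabetical order breaks ties (an arbitrary choice).
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


docs = [["heat", "cost", "heat"], ["cost", "noise"], ["cost", "heat"]]
selected = select_by_document_frequency(docs, top_n=2)
```

The same function would be called once for the task word candidates and once for the solution word candidates.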

[0086] <4-3. Calculation of factor loading>

Next, the data acquisition unit 110 performs factor analysis using the selected task words to calculate the factor loading of each task word (step S213). Specifically, this is performed as follows. Let I be the number of documents in the analysis target document group, and let each document be denoted by i (i = 1, 2, ..., I). Let G be the number of selected task words, and let each task word be denoted by g (g = 1, 2, ..., G). For each of the I documents i, the weighting amount $z_{ig}$ of each task word g is calculated. This yields the following data of I rows and G columns. Let Z be the I-by-G matrix whose elements are $z_{ig}$.

[Table 7]

              Index term 1   Index term 2   ...   Index term G
Document 1    z_11           z_12           ...   z_1G
Document 2    z_21           z_22           ...   z_2G
Document 3    z_31           z_32           ...   z_3G
...           ...            ...            ...   ...
Document I    z_I1           z_I2           ...   z_IG

Here, the “weighting amount” means a quantity given in each document to each predetermined task word, and it is preferable to use, for example, TFIDF. For a given index term, TFIDF is the value obtained by multiplying the index term frequency (TF: the number of occurrences of the task word in a document) by the reciprocal of the document frequency (DF: the number of documents in a predetermined document group in which the task word appears) or by the logarithm of that reciprocal (IDF: inverse document frequency). A high TFIDF value is calculated for a task word that is used many times in the document for which the weighting amount is calculated and that is used in few documents of the predetermined document group.
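As a minimal sketch of how the I-by-G weighting matrix Z could be filled with TFIDF values: the function name `tfidf_matrix`, the token-list input format, and the particular IDF form log(I / DF) are assumptions made for illustration; the specification only states that TF is multiplied by the reciprocal of DF or its logarithm.

```python
import math


def tfidf_matrix(docs, terms):
    """Build Z (one row per document, one column per term) with z_ig = TF * IDF.

    docs:  list of token lists, one per document (assumed input format).
    terms: the selected task words (or solution words).
    IDF is taken here as log(I / DF), one common variant.
    """
    n_docs = len(docs)
    # DF: number of documents in which each term appears at least once.
    df = {t: sum(1 for d in docs if t in d) for t in terms}
    Z = []
    for d in docs:
        row = []
        for t in terms:
            tf = d.count(t)  # TF: occurrences of the term in this document
            idf = math.log(n_docs / df[t]) if df[t] else 0.0
            row.append(tf * idf)
        Z.append(row)
    return Z


docs = [["heat", "heat", "cost"], ["cost"], ["cost", "noise"]]
terms = ["heat", "cost", "noise"]
Z = tfidf_matrix(docs, terms)
```

Note that with this IDF variant a term appearing in every document (here "cost") receives weight 0, which matches the idea that widely used terms carry little discriminating power.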

Next, with each document i regarded as a subject, each task word g as an observed variable, and each weighting amount $z_{ig}$ as a response by the subject, the factor loadings of the factor analysis are calculated.

Specifically, let H be the number of factors, let each factor be denoted by h (h = 1, 2, ..., H), and let $a_{gh}$ be the factor loading of each task word g for each factor h. Also, let $f_{ih}$ be the factor score of each document i for each factor h. Then the factor loading matrix A, whose elements are the factor loadings $a_{gh}$ (G rows, H columns), and the factor score matrix F, whose elements are the factor scores $f_{ih}$ (I rows, H columns), are as follows.

[Table 8]

[Table 9]

Next, let E be the residual matrix of I rows and G columns, and solve

$$Z = F A^{t} + E$$

(where $A^{t}$ is the transpose of A) to find the factor loading matrix A. For the factor scores $f_{ih}$, which are the elements of the factor score matrix F, and the residuals $e_{ig}$, which are the elements of the residual matrix E, assume that (1) the factor scores are standardized to mean 0 and standard deviation 1, (2) the correlation between factor scores is 0, (3) the correlation between residuals is 0, and (4) the correlation between each factor score and each residual is 0. Then, in general,

$$R = A A^{t} + V$$

(where R is the correlation matrix between the observed variables and V is the variance-covariance matrix of the residuals) is known to hold. Therefore, the factor loadings are determined from the following equation.

$$A A^{t} = R - V$$

Next, let R - V = R*. To calculate R*, the correlation matrix R is first calculated from the values of the elements $z_{ig}$ of the matrix Z, and R* is then estimated by replacing the diagonal elements of the correlation matrix with estimated communalities (for example, the SMC method or the RMAX method can be used to estimate the communalities). Then, since R* = $A A^{t}$, the factor loading matrix A is calculated from this R* matrix to obtain the factor loadings (methods include, for example, the principal factor method, the least-squares method, and the maximum likelihood method).
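The calculation above can be illustrated with a small NumPy sketch that uses SMC communality estimates and a single (non-iterated) principal factor extraction. This is one of several possible combinations named in the text, not necessarily the one used by the apparatus; the function name `principal_factor_loadings` and the synthetic data are hypothetical.

```python
import numpy as np


def principal_factor_loadings(Z, n_factors):
    """Estimate the G x H factor loading matrix A from an I x G data matrix Z.

    Sketch: R* is R with its diagonal replaced by SMC communality estimates,
    and A is taken from the leading eigenpairs of R* so that A A^t ~ R*.
    """
    R = np.corrcoef(Z, rowvar=False)  # G x G correlation matrix of observed variables
    # SMC: squared multiple correlation of each variable with all the others.
    smc = 1.0 - 1.0 / np.diag(np.linalg.pinv(R))
    R_star = R.copy()
    np.fill_diagonal(R_star, smc)
    eigvals, eigvecs = np.linalg.eigh(R_star)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]
    vals = np.clip(eigvals[order], 0.0, None)  # guard against small negative values
    A = eigvecs[:, order] * np.sqrt(vals)
    return A, vals


# Synthetic example: 200 documents, 6 terms driven by 2 latent factors.
rng = np.random.default_rng(0)
true_loadings = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                          [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
scores = rng.normal(size=(200, 2))
Z = scores @ true_loadings.T + 0.3 * rng.normal(size=(200, 6))
A, eigenvalues = principal_factor_loadings(Z, n_factors=2)
```

The returned eigenvalues can serve as the basis for selecting factors, and the rows of A give the loadings of each task word on each factor.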

[0091] Then, in order to find more meaningful factors, it is desirable to perform an operation called factor rotation. Methods of rotating the factor axes include orthogonal rotations such as Varimax, Quartimax, Equamax, Parsimax, Orthomax, and orthogonal Procrustes, and oblique rotations such as Promax, Oblimin, Harris-Kaiser, and oblique Procrustes.

The data acquisition unit 110 also performs factor analysis on the “solution words” to calculate the factor loading of each solution word (step S214). The method of calculating the factor loadings is the same as that described for the “task words”.
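As an illustration of one of the orthogonal rotations listed above, the following is a sketch of the widely used SVD-based iteration for Varimax. The function name `varimax` and its parameters are illustrative assumptions; the specification does not prescribe a particular rotation algorithm.

```python
import numpy as np


def varimax(A, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a G x H loading matrix A orthogonally (gamma=1.0 gives Varimax)."""
    A = np.asarray(A, dtype=float)
    n_rows, n_cols = A.shape
    rotation = np.eye(n_cols)
    criterion = 0.0
    for _ in range(max_iter):
        previous = criterion
        L = A @ rotation
        # Gradient-like term of the orthomax criterion.
        target = L ** 3 - (gamma / n_rows) * L @ np.diag(np.sum(L ** 2, axis=0))
        u, s, vt = np.linalg.svd(A.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix
        criterion = np.sum(s)
        if previous != 0.0 and criterion / previous < 1.0 + tol:
            break
    return A @ rotation


unrotated = np.array([[0.7, 0.5], [0.6, 0.6], [0.5, -0.6], [0.6, -0.5]])
rotated = varimax(unrotated)
```

Because the rotation is orthogonal, the communality of each word (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors changes.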

[0093] <4-4. Selection of Factors>

Next, the data acquisition unit 110 selects a predetermined number of factors (hereinafter referred to as “task factors” and “solution factors”) from the factors obtained as a result of the factor analysis of the task words and of the solution words (steps S215, S216). For example, based on the eigenvalue of each factor, a predetermined number of factors with the highest eigenvalues are selected. The number of factors to be selected is arbitrary. Here, p task factors and q solution factors are selected.

In terms of the first embodiment, the second embodiment selects “task factor” and “solution factor” as the two types of attributes X and Y, and selects the p task factors with the highest eigenvalues and the q solution factors with the highest eigenvalues as the value ranges of the attributes.

[0094] <4-5. Determination of Attribution Factors of Task Words and Solution Words>

Next, the data acquisition unit 110 determines an attribution factor of each task word and each solution word (steps S217 and S218).

For example, among the factor loadings of a certain task word (or solution word) g for the respective factors (excluding the factors not selected in the factor selection), when the factor loading $a_{gh}$ for a certain factor h is the largest, the factor h is determined to be the attribution factor of the task word (or solution word) g. In this case, one task word (or solution word) belongs to only one factor. However, the number of factors to which one task word (or solution word) belongs is not limited to one.

In addition, a lower limit may be set for the factor loading, and when the maximum factor loading $a_{gh}$ of a certain task word (or solution word) g is less than the lower limit, the task word (or solution word) g may, as a matter of course, be treated as belonging to no factor.
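The factor selection by eigenvalue and the determination of attribution factors can be sketched together as follows. The helper names `select_top_factors` and `assign_attribution_factors` are hypothetical, and the raw (signed) loading is compared as the text describes; whether absolute values should be used instead is left open here.

```python
import numpy as np


def select_top_factors(eigenvalues, n):
    """Return the indices of the n factors with the largest eigenvalues."""
    order = np.argsort(np.asarray(eigenvalues, dtype=float))[::-1][:n]
    return sorted(order.tolist())


def assign_attribution_factors(A, selected, lower_limit=None):
    """For each word g, return the selected factor h with the largest loading a_gh.

    Returns None for a word whose largest loading is below lower_limit,
    i.e. the word is treated as belonging to no factor.
    """
    A = np.asarray(A, dtype=float)
    sub = A[:, selected]  # ignore factors that were not selected
    best = np.argmax(sub, axis=1)
    result = []
    for g, b in enumerate(best):
        if lower_limit is not None and sub[g, b] < lower_limit:
            result.append(None)
        else:
            result.append(selected[b])
    return result


loadings = [[0.8, 0.1, 0.3],
            [0.2, 0.7, 0.9],
            [0.1, 0.2, 0.05]]
selected = select_top_factors([2.5, 0.4, 1.3], n=2)
attribution = assign_attribution_factors(loadings, selected, lower_limit=0.3)
```

In this toy example the second factor is dropped for its low eigenvalue, so the second word is attributed to the third factor even though its loading on the dropped factor is also high.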

<4-6. Matrix Creation>

Next, the score calculation unit 120 counts the number of relevant technical documents for each combination of a task word and a solution word whose attribution factors have been determined (step S220). For example, an AND search is performed for documents that contain, in the document or its summary part, both one task word and one solution word whose attribution factors have been determined, and the number of hit documents is taken as the number of relevant technical documents.
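The AND-search count of step S220 can be sketched as below, assuming each document has already been reduced to the set of words found in it (or in its summary). The function name `count_cooccurrence` and the sample words are illustrative only.

```python
def count_cooccurrence(docs, task_words, solution_words):
    """Count, for every (task word, solution word) pair, the documents containing both."""
    doc_sets = [set(d) for d in docs]
    return {
        (x, y): sum(1 for words in doc_sets if x in words and y in words)
        for x in task_words
        for y in solution_words
    }


docs = [
    {"miniaturization", "etching"},
    {"miniaturization", "lithography"},
    {"power", "etching"},
    {"miniaturization", "etching", "lithography"},
]
counts = count_cooccurrence(docs, ["miniaturization", "power"], ["etching", "lithography"])
```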

Next, the score calculation unit 120 sums up the number of documents for each combination of a task factor and a solution factor (step S221). For example, the numbers of relevant technical documents are totaled over all combinations of one of the task words belonging to a certain task factor and one of the solution words belonging to a certain solution factor. For example, assuming that three task words $Xg_1$, $Xg_2$, $Xg_3$ belong to a certain task factor and two solution words $Yg_1$, $Yg_2$ belong to a certain solution factor, the sum of

the number of relevant technical documents for ($Xg_1$, $Yg_1$),

the number of relevant technical documents for ($Xg_1$, $Yg_2$),

the number of relevant technical documents for ($Xg_2$, $Yg_1$),

the number of relevant technical documents for ($Xg_2$, $Yg_2$),

the number of relevant technical documents for ($Xg_3$, $Yg_1$), and

the number of relevant technical documents for ($Xg_3$, $Yg_2$)

is the number of documents for the combination of that task factor and that solution factor.

[0097] The method of counting the number of documents for each combination of factors is not limited to this. For example, the combination of factors to which each document belongs may be determined based on the factor score $f_{ih}$ of each document i for each factor h calculated by the factor analysis described above, and the number of documents may be counted on that basis.

When the number of documents for the combination of each task factor and each solution factor has been calculated in this way, there are p × q combinations of the p task factors and the q solution factors, so a document number matrix of p rows and q columns is obtained.
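The aggregation of step S221 into a p-row, q-column document number matrix can be sketched as follows. The function name `document_number_matrix` and the dictionary-based inputs are assumptions for illustration; task factors are placed on rows and solution factors on columns here simply to match the "p rows and q columns" wording.

```python
import numpy as np


def document_number_matrix(pair_counts, task_factor_of, solution_factor_of, p, q):
    """Sum per-word-pair document counts into a p x q task/solution factor matrix.

    pair_counts:        {(task word, solution word): number of documents}
    task_factor_of:     {task word: factor index 0..p-1, or None if unassigned}
    solution_factor_of: {solution word: factor index 0..q-1, or None if unassigned}
    """
    M = np.zeros((p, q), dtype=int)
    for (x, y), n in pair_counts.items():
        j = task_factor_of.get(x)
        k = solution_factor_of.get(y)
        if j is None or k is None:
            continue  # words without an attribution factor are skipped
        M[j, k] += n
    return M


pair_counts = {("a", "u"): 2, ("a", "v"): 1, ("b", "u"): 3, ("c", "v"): 4, ("d", "u"): 9}
task_factor_of = {"a": 0, "b": 0, "c": 1, "d": None}
solution_factor_of = {"u": 0, "v": 1}
M = document_number_matrix(pair_counts, task_factor_of, solution_factor_of, p=2, q=2)
```

Because the pair counts are simply added, a document containing several word pairs of the same factor combination contributes more than once, which follows the summation described in the text.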

This document number matrix shows how many technical documents exist for each combination of a task factor and a solution factor. It helps to grasp what kinds of tasks and solutions are the focus in a certain technical field, to identify multiple tasks (uses) that a technology can address by focusing on a particular solution factor (one row or column of the matrix), and to find multiple solutions to a task by focusing on a particular task factor (one row or column of the matrix).

FIG. 6 shows an example of the document number matrix generated in the second embodiment. This document number matrix was obtained by extracting a predetermined number of patent documents most similar to a certain patent document i relating to “semiconductor devices and their manufacturing methods”, and performing factor analysis on each of the task words and the solution words by the method described above. The meaning of each factor, as interpreted by the analyst from the task word group and the solution word group included in each task factor and each solution factor, is written in the margin of the matrix.

First, look at the matrix vertically. Summing the number of patent documents along the vertical axis reveals the main tasks of the document group to be analyzed. In this example, the counts for task factors 1 and 2 are large. Therefore, in the group of documents similar to patent document i relating to “semiconductor devices and their manufacturing methods”, the main tasks can be said to be miniaturization and manufacturing control. Furthermore, when the average filing year is calculated for each column, it can be seen that although the count for task factor 3 is small, relatively new patent documents are concentrated there. In other words, the major tasks are shifting from miniaturization and manufacturing control to power consumption. It can be inferred that the trend is moving from stationary applications such as personal computers to battery-driven applications such as portable terminals.

Next, look at the matrix horizontally. For task factor 1, the number of patent documents is large for solution factors 1 and 2. In other words, lithography and etching are the main solutions for miniaturization. In addition, for task factor 2, the number of patent documents is large for solution factor 2. That is, etching can also be an effective solution for manufacturing control. Also, for example, various analyses can be made by looking at the applicant composition of each solution factor for task factor 1, or by focusing on a particular cell and following its year-by-year transition.

As described above, when one of the attributes is a task factor and the other is a solution factor, the task factor represents a disadvantage that may arise in some application, and the solution factor represents a technology that can eliminate it. The application can therefore be inferred from the task factor, and the technology from the solution factor.

In addition, it is possible to analyze each company's technology strategy for the same task by aggregating each solution factor for a certain task by company.

Each element (number of documents) of the document number matrix of p rows and q columns may be used as a score σ to generate the first and second vector groups as in the first embodiment, and arranging the vectors on that basis makes it possible to analyze the state of concentration and dispersion of the task factors and the solution factors. In the second embodiment, however, the vector groups are generated as follows.

[0101] <4-7. Creation of change rate matrix>

The score calculation unit 120 classifies each element of the document number matrix of p rows and q columns by predetermined period (step S222). For example, in the case of patent documents, classification by filing year or by groups of several years can be considered. Preferably, the documents are classified into two periods, before and after a predetermined time.

Next, the score calculation unit 120 calculates, for each element of the document number matrix of p rows and q columns, the increase/decrease rate of the number of technical documents based on the classification by predetermined period (step S223). If the classification is into two periods, one increase/decrease rate is calculated for each element of the document number matrix, so a change rate matrix of p rows and q columns is generated. If the classification is into T periods (T ≥ 3), (T − 1) change rate matrices of p rows and q columns may be generated, one for each pair of adjacent periods, or a single matrix of average rates of change may be generated. The change rate matrix generated in this way makes it possible to detect changes in the trends of tasks and solutions. For example, focusing on a specific solution factor (one row or column of the matrix) reveals changes in the applications of that technology, and focusing on a specific task factor (one row or column of the matrix) reveals changes in the solutions to that task.
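A minimal sketch of the two-period case follows. The specification does not give a formula for the increase/decrease rate, so the definition used here, (later − earlier) / earlier, is an assumption, as are the function name `change_rate_matrix` and the choice to mark cells with no earlier documents as NaN.

```python
import numpy as np


def change_rate_matrix(earlier, later):
    """Element-wise increase/decrease rate between two p x q document number matrices.

    Assumed definition: (later - earlier) / earlier.
    Cells whose earlier count is 0 have no defined rate and are set to NaN.
    """
    earlier = np.asarray(earlier, dtype=float)
    later = np.asarray(later, dtype=float)
    rate = np.full(earlier.shape, np.nan)
    nonzero = earlier != 0
    rate[nonzero] = (later[nonzero] - earlier[nonzero]) / earlier[nonzero]
    return rate


earlier_counts = [[10, 0], [4, 5]]
later_counts = [[15, 3], [2, 5]]
rates = change_rate_matrix(earlier_counts, later_counts)
```

For T periods, the same function could be applied to each pair of adjacent periods to obtain the (T − 1) matrices mentioned above. How undefined cells should be treated before vectors are generated is a design decision that the text leaves open.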

<4-8. Generation of vector etc.>

The subsequent processing is the same as in the first embodiment. The first and second vector group generation units 130 and 140 use each element (increase/decrease rate) of this increase/decrease rate matrix of p rows and q columns as a score σ to generate the first and second vector groups (steps S230 and S240).

Then, the first and second vector relation calculation units 150 and 160 respectively calculate the relation between the vectors (steps S250 and S260), and the first and second vector arrangement units 170 and 180 respectively arrange the vectors (steps S271 to S278, S281 to S288). As for the first and second vector groups, in the second embodiment the q-dimensional vectors relating to the p task factors are referred to as “task factor gazette-count increase/decrease rate vectors”, and the p-dimensional vectors relating to the q solution factors are referred to as “solution factor gazette-count increase/decrease rate vectors”. In the second embodiment, the first and second clusters are referred to as the “task factor cluster” and the “solution factor cluster”, respectively.

In this way, by arranging the vectors on the change rate matrix, it is possible to analyze the state of concentration or variance regarding the trend of the problem factor and the solution factor.
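The arrangement of vectors can be illustrated with the following sketch, which starts a cluster from the two most related vectors and then repeatedly attaches the outside vector most related to either end vector next to that end. Cosine similarity is used as the relevance measure purely as an assumption, since this section does not define the measure; the function names `cosine` and `arrange_vectors` are hypothetical, and the threshold-based cancellation and re-clustering steps are omitted.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity, used here as an assumed relevance measure."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return 0.0 if nu == 0 or nv == 0 else float(np.dot(u, v) / (nu * nv))


def arrange_vectors(vectors):
    """Return an ordering of vector indices that places related vectors side by side."""
    n = len(vectors)
    if n < 2:
        return list(range(n))
    sim = np.array([[cosine(vectors[a], vectors[b]) for b in range(n)] for a in range(n)])
    np.fill_diagonal(sim, -np.inf)  # a vector is never paired with itself
    a, b = np.unravel_index(np.argmax(sim), sim.shape)
    order = [int(a), int(b)]        # initial cluster: the most related pair
    remaining = set(range(n)) - set(order)
    while remaining:
        # Best (similarity, end position, candidate) over both end vectors.
        _, pos, r = max(
            (sim[end, r], pos, r)
            for pos, end in ((0, order[0]), (1, order[-1]))
            for r in remaining
        )
        if pos == 0:
            order.insert(0, r)
        else:
            order.append(r)
        remaining.remove(r)
    return order


vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.95, 0.05], [0.1, 0.9]])
order = arrange_vectors(vectors)
```

Applied to the rows of the change rate matrix this would order one attribute axis, and applied to its columns (the transposed matrix) it would order the other, so that cells with similar trends end up adjacent.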

In addition, if each element of the matrix is the rate of change in the number of documents or the like, the temporal transition of the task factors (applications) and the solution factors (technologies) can be grasped in detail. In particular, by highlighting notable changes in the matrix, the task factors (applications) and solution factors (technologies) that are changing can be visualized so that they can be grasped quickly. In addition, it may be possible to find factors whose document counts are increasing.

In addition, if there is an increasing trend in a particular solution factor (technology) for a certain problem factor (application), it can be understood that the mainstream technology of the application has changed. Similarly, it is possible to catch signs of changing the use of a given technology. This means that the technology that is the seeds can be diverted to the new needs, and can be used as a basis for formulating technology development strategies based on the seeds.

<5. Other Embodiments>

The present invention is not limited to the embodiments described above and can be variously modified within the scope of its gist.

For example, in the first embodiment, the attributes arranged on the axes of the matrix were described for the case where one is a human attribute and the other is a technical attribute, and the applicant was given as an example of the human attribute. However, this is only an example. Other human information, such as the inventors, may be used as the human attribute. In this case as well, the same functions and effects as in the first embodiment can be obtained.

In the second embodiment, the case where the number of documents is used as the score that forms each element of the matrix and the case where the increase/decrease rate of the number of documents or the like is used were described, but the invention is not limited to these. Any score corresponding to the data of the technical documents may be used as the score that forms each element of the matrix.

Also, only one matrix may be generated for the technical document group to be analyzed, or each element of a certain matrix may be classified, for example, by predetermined period, so that a plurality of matrices, one for each predetermined period, are generated.

When a plurality of matrices are generated by dividing into a matrix for each predetermined period, following the patent documents in each matrix element by filing year makes it possible to grasp the trend of the document group to be analyzed (for example, the trend of the technology for a certain application). Furthermore, for example, when one of the attributes is the task factor and the other is the solution factor, it is possible to organize several applications, the technologies that constitute them, and the major tasks, and to grasp comprehensively when and which solutions were mainstream.

Further, in the increase/decrease rate relevance matrix creation process (see FIG. 5) described in the second embodiment above, after the processing of S221, the matrix is classified for each predetermined period (S222), the increase/decrease rate of the number of corresponding gazettes is calculated for each combination of a task factor and a solution factor (S223), and then the processing of S230 to S277 (or S240 to S287) is performed; however, the procedure is not particularly limited to this. For example, the processing of S222 and S223 may be performed not after S221 but after the processing of S277 (or S287). In this way, by generating a relevance matrix between the task factors and the solution factors that constitute the matrix axes, similar task factors and similar solution factors come to be arranged side by side. As a result, the means for solving problems in the technical field to be analyzed are consolidated and integrated, and it becomes possible to classify them into several applications, the technologies that constitute them, and the major tasks.

Furthermore, since the rate of change is calculated for each element of the relevance matrix, it is possible to understand which tasks in the field are attracting increasing attention and which technologies are being intensively applied to solve those tasks.

Claims


[1] A technical document attribute relevance analysis support apparatus comprising:

data acquisition means for acquiring data of a technical document group including a plurality of technical documents each having at least two types of attributes;

score calculation means for calculating a score according to data of the technical documents belonging to each combination of a first attribute X and a second attribute Y among the at least two types of attributes;

first vector group generation means for generating vectors based on the scores belonging to each column of a matrix arrangement in which the scores are arranged in a matrix with the first attribute X on the horizontal axis and the second attribute Y on the vertical axis;

second vector group generation means for generating vectors based on the scores belonging to each row of the matrix arrangement;

second vector relevancy calculation means for calculating the mutual relevance of the vectors generated by the second vector group generation means; and

second vector arrangement means for arranging vectors having high relevance closer to each other among the vectors generated by the second vector group generation means.

[2] A technical document attribute relevancy analysis supporting device according to claim 1;

A technical document attribute relevancy analysis supporting device, wherein one of the first attribute X and the second attribute Y is a human attribute of each technical document, and the other is a technical field attribute of each technical document.

[3] The technical document attribute relevance analysis support device according to claim 1 or claim 2, wherein the score calculation means calculates the score based on the number of technical documents for which the combination ($X_j$, $Y_k$) of a value $X_j$ (j = 1, 2, ..., p) of the first attribute X and a value $Y_k$ (k = 1, 2, ..., q) of the second attribute Y is identical.

[4] The technical document attribute relevance analysis support device according to claim 1 or claim 2, wherein the score calculation means calculates the score by weighting each of the technical documents for which the combination ($X_j$, $Y_k$) of a value $X_j$ (j = 1, 2, ..., p) of the first attribute X and a value $Y_k$ (k = 1, 2, ..., q) of the second attribute Y is identical, and summing the weights.

[5] The technical document attribute relevance analysis support device according to any one of claims 1 to 4,

wherein the first vector group generation means or the second vector group generation means generates a vector whose components include the logarithm of each score belonging to each column or each row of the matrix arrangement.

[6] A technical document attribute relevance analysis support device according to any one of claims 1 to 5,

The first vector arrangement means

second cluster expansion means that selects, as a joining vector, the vector that, among the vectors generated by the second vector group generation means other than those in the cluster, has the highest relevance to either of the end vectors located at both ends of the vectors forming the cluster generated by the second cluster generation means, and that sequentially expands the cluster by adding the joining vector to the cluster so that the joining vector is adjacent to the end vector with which it has the highest relevance.

[7] A technical document attribute relevancy analysis supporting device according to claim 6;

wherein the first cluster generation means or the second cluster generation means selects the two vectors having the highest relevance from the vector group generated by the first vector group generation means or the vector group generated by the second vector group generation means, respectively.

[8] The technical document attribute relevancy analysis supporting device according to claim 6 or 7, wherein the first vector arranging unit is:

the second vector arrangement means comprises: second cluster expansion cancellation determination means that stops the selection of the joining vector and the expansion of the cluster by the second cluster expansion means when none of the relevances between the end vectors located at both ends of the vectors forming the cluster generated by the second cluster generation means and the vectors, among those generated by the second vector group generation means, other than those in the cluster exceeds a predetermined threshold;

and second cluster re-expansion means that selects, as a joining vector, the vector that, among the vectors generated by the second vector group generation means other than those in the cluster generated by the second cluster generation means and other than those in the other cluster, has the highest relevance to either of the end vectors located at both ends of the vectors forming the other cluster generated by the second cluster regeneration means, and that sequentially expands the other cluster by adding the joining vector to the other cluster so that the joining vector is adjacent to the end vector with which it has the highest relevance.

[9] A technical document attribute relevancy analysis supporting device according to any one of claims 1 to 8, comprising:

display means for displaying the distribution of the scores arranged in a matrix, based on the arrangement by the first vector arrangement means and the second vector arrangement means, using a pattern or a color according to each score.

[10] A technical document attribute relevance analysis support method comprising:

a data acquisition step of acquiring data of a technical document group including a plurality of technical documents each having at least two types of attributes;

a score calculation step of calculating a score according to data of the technical documents belonging to each combination of a first attribute X and a second attribute Y among the at least two types of attributes;

a first vector group generation step of generating vectors based on the scores belonging to each column of a matrix arrangement in which the scores are arranged in a matrix with the first attribute X on the abscissa and the second attribute Y on the ordinate;

a first vector relevancy calculation step of calculating the mutual relevance of the vectors generated by the first vector group generation step;

a first vector arrangement step of arranging vectors having high relevance closer to each other among the vectors generated by the first vector group generation step;

a second vector group generation step of generating vectors based on the scores belonging to each row of the matrix arrangement;

a second vector relevancy calculation step of calculating the mutual relevance of the vectors generated by the second vector group generation step; and

a second vector arrangement step of arranging vectors having high relevance closer to each other among the vectors generated by the second vector group generation step.

[11] A technical document attribute relevance analysis support program that causes a computer to execute:

a data acquisition step of acquiring data of a technical document group including a plurality of technical documents each having at least two types of attributes;

a score calculation step of calculating a score according to data of the technical documents belonging to each combination of a first attribute X and a second attribute Y among the at least two types of attributes;

a first vector group generation step of generating vectors based on the scores belonging to each column of a matrix arrangement in which the scores are arranged in a matrix with the first attribute X on the abscissa and the second attribute Y on the ordinate;

a first vector relevancy calculation step of calculating the mutual relevance of the vectors generated by the first vector group generation step; and

a second vector arrangement step of arranging vectors having high relevance closer to each other among the vectors generated by the second vector group generation step.