WO2021024966A1

WO2021024966A1 - Company similarity calculation server and company similarity calculation method

Info

Publication number: WO2021024966A1
Application number: PCT/JP2020/029577
Authority: WO
Inventors: 阿部諒馬; 老沼隆史; 岩下博洋
Original assignee: Ｖａｎｄｄｄ株式会社
Priority date: 2019-08-08
Filing date: 2020-07-31
Publication date: 2021-02-11
Also published as: JP2021026689A; JP7418781B2

Abstract

Provided are a company similarity calculation server and a company similarity calculation method which can accurately calculate the similarity between companies. The company similarity calculation server, which calculates company similarity between a first company that is a reference and a second company other than the first company, is characterized by including a similarity calculation means which: calculates business similarity on the basis of first information pertaining to the first company and second information pertaining to the second company and on the basis of a word pertaining to a business, which is extracted from the first information, and a word pertaining to a business, which is extracted from the second information; calculates business community similarity on the basis of a first business community pertaining to a business community to which the first company belongs and a second business community pertaining to a business community to which the second company belongs; and calculates business category similarity on the basis of a first business category pertaining to a business category to which the first company belongs and a second business category pertaining to a business category to which the second company belongs, wherein the company similarity is calculated on the basis of the business similarity, the business community similarity, and the business category similarity.

Description

Company similarity calculation server and company similarity calculation method

[Related application]
This application claims priority to Japanese Patent Application No. 2019-146489, entitled "Corporate Similarity Calculation Server and Corporate Similarity Calculation Method," filed on August 8, 2019, the disclosure of which is incorporated herein by reference in its entirety.
The present invention relates to a company similarity calculation server and a company similarity calculation method.

As background art in this technical field, there is Japanese Patent Application Laid-Open No. 2012-118612 (Patent Document 1). This publication states: "A marketing proposal support method in a management server connected via a network to a data server storing financial information of customer companies and to a user terminal acquires a customer list from the user terminal, extracts at least one characteristic evaluation index from the acquired customer list, searches the data server for at least one similar company based on the characteristic evaluation index, and narrows down the at least one company found based on the degree of similarity."

Japanese Unexamined Patent Publication No. 2012-118612

However, when the similarity between companies is calculated, the information on the companies is not classified into information on the business, information on the industry, and information on the business type, so there is a risk that the similarity between the companies cannot be calculated accurately.

Therefore, the present invention provides a mechanism capable of accurately calculating the similarity between companies based on the similarity of information about the business, the similarity of information about the industry, and the similarity of information about the business type of each company.

In order to solve the above problems, for example, the configuration described in the claims is adopted. The present application includes a plurality of means for solving the above problems; one example is as follows.
A company similarity calculation server that calculates the company similarity between a first company serving as a reference and a second company other than the first company, the server comprising:
similarity calculation means for calculating the company similarity between the first company and the second company based on first information about the first company and second information about the second company; and
output means for outputting the calculated company similarity,
wherein the similarity calculation means:
calculates a business similarity based on words related to the business conducted by the first company and words related to the business conducted by the second company;
calculates an industry similarity based on a first industry, which is the industry to which the first company belongs, and a second industry, which is the industry to which the second company belongs;
calculates a business type similarity based on a first business type, which is the business type to which the first company belongs, and a second business type, which is the business type to which the second company belongs; and
calculates the company similarity based on the business similarity, the industry similarity, and the business type similarity.

According to the present invention, it is possible to accurately calculate the degree of similarity between companies based on the degree of similarity of information about business, the degree of similarity of information about industry, and the degree of similarity of information about business type in each company.
Issues, configurations and effects other than those described above will be clarified by the following description of the embodiments.

FIG. 1 is an example of a configuration diagram of the entire company similarity calculation system 1.
FIG. 2 is an example of the hardware configuration of the company similarity calculation server 101.
FIG. 3 is an example of the hardware configuration of the user terminal 102.
FIG. 4 is an example of the hardware configuration of the administrator terminal 103.
FIG. 5 is an example of the reference company word information 500.
FIG. 6 is an example of the reference company classification information 600.
FIG. 7 is an example of the similarity information 700.
FIG. 8 is an example of the target company basic information 800.
FIG. 9 is an example of the target company word information 900.
FIG. 10 is an example of the target company classification information 1000.
FIG. 11 is an example of the industry word classification dictionary information 1100.
FIG. 12 is an example of the business type word classification dictionary information 1200.
FIG. 13 is an example of the industry similarity matrix information 1300.
FIG. 14 is an example of the business type similarity matrix information 1400.
FIG. 15 is an example of the industry/business type similarity setting information 1500.
FIG. 16 is an example of the similarity calculation flow 1600 carried out by the similarity calculation module 211.
FIG. 17 is an example of the business word extraction flow 1700 carried out by the similarity calculation module 211.
FIG. 18 is an example of the business word similarity calculation flow 1800 carried out by the similarity calculation module 211.
FIG. 19 is an example of the industry word extraction flow 1900 carried out by the similarity calculation module 211.
FIG. 20 is an example of the industry similarity calculation flow 2000 carried out by the similarity calculation module 211.
FIG. 21 is an example of the business type word extraction flow 2100 carried out by the similarity calculation module 211.
FIG. 22 is an example of the business type similarity calculation flow 2200 carried out by the similarity calculation module 211.
FIG. 23 is an example of the company similarity calculation flow 2300 carried out by the similarity calculation module 211.
FIG. 24 is an example of the similarity output flow 2400 carried out by the similarity calculation module 211.
FIG. 25 is an example of the similar company display flow 2500 carried out by the similar company display module 212.
FIG. 26 is an example of the screen 2600 for starting the extraction of similar companies similar to the first company.
FIG. 27 is an example of the screen 2700 for displaying similar companies similar to the first company.

Information on the degree of similarity between companies (hereinafter sometimes referred to as the company similarity) is used in various situations. For example, when planning an M&A (merger and acquisition) between companies, the company similarity is used to extract candidates for the counterparty company (hereinafter sometimes referred to as a matching destination company). In this case, more efficient M&A can be realized by accurately calculating the company similarity.

However, conventional approaches did not consider calculating the similarity between companies based on the similarity of information about the business, the similarity of information about the industry, and the similarity of information about the business type of each company, and there was a risk that the company similarity could not be calculated accurately.

Therefore, in order to solve the problem, the present embodiment adopts the system or method described below. As a result, the degree of similarity between companies can be calculated accurately.
Hereinafter, embodiments will be described.

In this embodiment, an example of a company similarity calculation system 1 that calculates the similarity between companies in order to extract candidates for matching destination companies when planning an M&A between companies will be described.
FIG. 1 is an example of a configuration diagram of the entire company similarity calculation system 1.
The company similarity calculation system 1 includes a plurality of user terminals 102 and a plurality of administrator terminals 103, each of which is connected to the company similarity calculation server 101 via a network. The network may be wired or wireless, and each terminal can send and receive information via the network.

Each terminal of the company similarity calculation system 1 and the company similarity calculation server 101 may be, for example, a mobile terminal such as a smartphone, a tablet, a mobile phone, or a personal digital assistant (PDA), or a wearable terminal of a glasses type, a wristwatch type, or a clothing type. It may also be a stationary or portable computer, or a server located in the cloud or on a network. Functionally, it may be a VR (virtual reality) terminal, an AR (augmented reality) terminal, or an MR (mixed reality) terminal. Alternatively, it may be a combination of a plurality of these terminals. For example, a combination of one smartphone and one wearable terminal can logically function as one terminal. Further, an information processing terminal other than these may be used.

Each terminal of the company similarity calculation system 1 and the company similarity calculation server 101 include: a processor that executes an operating system, applications, programs, and the like; a main storage device such as a RAM (Random Access Memory); an auxiliary storage device such as an IC card, a hard disk drive, an SSD (Solid State Drive), or a flash memory; a communication control unit such as a network card, a wireless communication module, or a mobile communication module; an input device such as a touch panel, a keyboard, a mouse, voice input, or input by motion detection using a camera unit; and an output device such as a monitor or a display. The output device may be a device or a terminal that transmits information for output to an external monitor, display, printer, device, or the like.

Various programs and applications (modules) are stored in the main storage device, and each functional element of the entire system is realized by the processor executing these programs and applications. Each of these modules may instead be implemented in hardware, for example by integrating it into a circuit. Further, each module may be an independent program or application, or may be implemented as a subprogram or a function within one integrated program or application.
In this specification, each module is described as the subject that performs processing, but in reality, a processor that runs the various programs, applications, and the like (modules) executes the processing.

Various databases (DB) are stored in the auxiliary storage device. A "database" is a functional element (storage unit) that stores a data set so that it can handle arbitrary data operations (for example, extraction, addition, deletion, and overwriting) from a processor or an external computer. The method of implementing the database is not limited, and may be, for example, a database management system, spreadsheet software, or a text file such as XML or JSON. When implemented in a database management system, it may be a relational database management system (RDBMS) or a non-relational one (non-RDBMS).

The user terminal 102 is a terminal used by a person who uses the company similarity information. The users include not only those who use the company similarity information by themselves, but also those who provide the information to other companies.
The administrator terminal 103 is a terminal used by the administrator of the company similarity calculation system 1.
The company similarity calculation server 101 receives input of various information necessary for making a determination from each of the above terminals and the like, and stores these in the auxiliary storage device 202.

FIG. 2 is an example of the hardware configuration of the company similarity calculation server 101.
The company similarity calculation server 101 is composed of, for example, a server arranged on the cloud.
The main storage device 201 stores the programs and applications of the similarity calculation module 211, the similar company display module 212, and the management module 213, and each functional element of the company similarity calculation server 101 is realized by the processor 203 executing these programs and applications.

The similarity calculation module 211 calculates the similarity between companies based on information about the companies. Details will be described later, but for example, it calculates the similarity between a reference company (hereinafter sometimes referred to as the first company) and a company whose similarity to the first company is to be calculated (hereinafter sometimes referred to as the second company).
The similar company display module 212 displays information on similar companies on the user terminal 102 and the administrator terminal 103. Details will be described later, but for example, it displays information on a plurality of second companies similar to the first company and the similarity between the first company and each second company.

The management module 213 manages the company similarity calculation system 1. Specifically, the management module 213 manages the operation information of the company similarity calculation server, the user information using the company similarity calculation system 1, and the like.

The auxiliary storage device 202 includes a survey history database 207 (hereinafter sometimes referred to as the survey history DB), a target company database 208 (hereinafter sometimes referred to as the target company DB), and a dictionary database 209 (hereinafter sometimes referred to as the dictionary DB). In the present embodiment, each database includes at least one table that stores at least one piece of information.

The survey history DB 207 includes reference company word information 500, reference company classification information 600, and similarity information 700.
The target company DB 208 includes target company basic information 800, target company word information 900, and target company classification information 1000.
The dictionary DB 209 includes business word dictionary information 1050, industry word classification dictionary information 1100, business type word classification dictionary information 1200, industry similarity matrix information 1300, business type similarity matrix information 1400, and industry/business type similarity setting information 1500.

FIG. 3 is an example of the hardware configuration of the user terminal 102.
The user terminal 102 is composed of, for example, a stationary computer.
The similar company display module 311 is stored in the main storage device 301, and each functional element of the user terminal 102 is realized by the processor 303 executing this program or application.
The user terminal data 321 of the auxiliary storage device 302 stores information related to the user.

FIG. 4 is an example of the hardware configuration of the administrator terminal 103.
The administrator terminal 103 is composed of, for example, a stationary computer.
The management module 411 is stored in the main storage device 401, and each functional element of the administrator terminal 103 is realized by executing these programs and applications by the processor.

The management module 411 manages the company similarity calculation system 1.
The administrator terminal data 421 of the auxiliary storage device 402 stores information for managing the company similarity calculation system 1.

FIGS. 5, 6, and 7 are examples of the tables stored in the survey history DB 207 of the auxiliary storage device 202 of the company similarity calculation server 101.

FIG. 5 is an example of the reference company word information 500.
The reference company word information 500 stores the word information extracted from the information about the first company.
The reference company word information 500 has information such as a case ID 501, a company ID 502, a reference company name 503, a business word 504, an industry word 505, and a business type word 506.

The case ID 501 is a unique ID generated when the company similarity calculation server receives, from the user terminal, a request to calculate the similarity between the first company as the reference company and at least one second company. In chronological order, a later case is given a larger case ID value than an earlier case.
The company ID 502 is a unique ID generated for each company. In other words, one company has one company ID.
The reference company name 503 is the name of the reference company (first company).
The business word 504 is information on words related to the business of the first company, extracted from the information about the first company.
The industry word 505 is information on words related to the industry of the first company, extracted from the information about the first company.
The business type word 506 is information on words related to the business type of the first company, extracted from the information about the first company.
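As an illustration of the fields described above, one row of the reference company word information 500 and the increasing case ID could be modeled roughly as follows. This is a minimal sketch: the class name, field names, and the `open_case` helper are hypothetical, and the specification does not say how IDs are actually generated.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical ID source: every new request gets a larger case ID than earlier ones.
_case_ids = count(start=1)


@dataclass
class ReferenceCompanyWords:
    """One row of the reference company word information 500 (names are assumptions)."""
    case_id: int
    company_id: str
    reference_company_name: str
    business_words: list[str] = field(default_factory=list)
    industry_words: list[str] = field(default_factory=list)
    business_type_words: list[str] = field(default_factory=list)


def open_case(company_id: str, name: str) -> ReferenceCompanyWords:
    """Create a row for a new similarity request with a fresh, increasing case ID."""
    return ReferenceCompanyWords(next(_case_ids), company_id, name)


first_case = open_case("C001", "A Co., Ltd.")
second_case = open_case("C001", "A Co., Ltd.")
```

The same company ID appears in both rows because a company keeps one company ID across cases, while each request receives its own case ID.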

FIG. 6 is an example of the reference company classification information 600.
The reference company classification information 600 stores information on the industry and business type to which the reference company (first company) belongs.
The reference company classification information 600 has information such as a case ID 601, a company ID 602, a reference company name 603, an industry 604, and a business type 605.
The industry 604 is information on the industry to which the reference company (first company) belongs.
The business type 605 is information on the business type to which the reference company (first company) belongs.

FIG. 7 is an example of the similarity information 700.
The similarity information 700 stores information on the degree of similarity between the reference company (first company) and the target company (second company).
The similarity information 700 has information such as a case ID 701, a company ID 702, a reference company name 703, and a similarity 704.
The similarity 704 is information on the similarity between the reference company (first company) and a plurality of target companies (second company).

FIGS. 8, 9, and 10 are examples of the tables stored in the target company DB 208 of the auxiliary storage device 202 of the company similarity calculation server 101.

FIG. 8 is an example of the target company basic information 800.
The target company basic information 800 stores company information about the target company (second company).
The target company basic information 800 has information such as company ID 801, target company name 802, company information 803, market capitalization 804, net income 805, and price-earnings ratio 806.
The target company name 802 is the name of the target company (second company).
The company information 803 is character string information about the target company (second company). It may be information that is substantially linked to character string information about the target company (second company), for example, a company URL (Uniform Resource Locator).

FIG. 9 is an example of the target company word information 900. The target company word information 900 stores word information extracted from information about the target company (second company).
The target company word information 900 has information such as a company ID 901, a target company name 902, a business word 903, an industry word 904, and a business type word 905.
The business word 903 is information on words related to the business of the second company, extracted from the information about the second company.
The industry word 904 is information on words related to the industry of the second company, extracted from the information about the second company.
The business type word 905 is information on words related to the business type of the second company, extracted from the information about the second company.

FIG. 10 is an example of the target company classification information 1000.
The target company classification information 1000 stores information on the industry and business type to which the target company (second company) belongs.
The target company classification information 1000 has information such as a company ID 1001, a target company name 1002, an industry 1003, and a business type 1004.
Industry 1003 is information on the industry to which the target company (second company) belongs.
The business type 1004 is information on the business type to which the target company (second company) belongs.

FIGS. 11, 12, 13, 14, and 15 are examples of the tables stored in the dictionary DB 209 of the auxiliary storage device 202 of the company similarity calculation server 101.
The business word dictionary information 1050 is also stored in the dictionary DB 209. The business word dictionary information 1050 stores word information related to the business.

FIG. 11 is an example of the industry word classification dictionary information 1100.
The industry word classification dictionary information 1100 stores information on words and classifications related to the industry.
The industry word classification dictionary information 1100 has information such as upper industry 1101, lower industry 1102, and industry word 1103.
The industry word 1103 is information on words related to the industry.
The lower industry 1102 is information on the classification of the lower industry associated with the industry word 1103.
The upper industry 1101 is information on the classification of the upper industry associated with the lower industry 1102. The upper industry 1101 is a broader concept than the lower industry 1102.
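The three-level relationship described above (industry word, lower industry, upper industry) can be pictured as a lookup table. The following is a minimal sketch only: the rows are invented sample values, and `classify_industry` is a hypothetical helper, not the extraction or classification flow of the embodiment.

```python
# Hypothetical rows of the industry word classification dictionary information 1100:
# (upper industry 1101, lower industry 1102, industry word 1103). Values are illustrative.
INDUSTRY_DICTIONARY = [
    ("Manufacturing", "Food", "confectionery"),
    ("Manufacturing", "Machinery", "machine tool"),
    ("Information and communications", "Software", "SaaS"),
]

# Index the rows by industry word so that a word resolves to its classification.
WORD_TO_INDUSTRY = {word: (upper, lower) for upper, lower, word in INDUSTRY_DICTIONARY}


def classify_industry(words):
    """Return the (upper, lower) pairs matched by the words, in order, without duplicates."""
    found = []
    for word in words:
        pair = WORD_TO_INDUSTRY.get(word)
        if pair is not None and pair not in found:
            found.append(pair)
    return found


result = classify_industry(["SaaS", "unknown", "confectionery", "SaaS"])
```

Words that are not in the dictionary are simply ignored in this sketch; how the embodiment treats them is not stated in this part of the description.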

FIG. 12 is an example of the business type word classification dictionary information 1200.
The business type word classification dictionary information 1200 stores information on words and classifications related to the business type.
The business type word classification dictionary information 1200 has information such as an upper business type 1201, a lower business type 1202, and a business type word 1203.
The business type word 1203 is information on words related to the business type.
The lower business type 1202 is information on the classification of the lower business type associated with the business type word 1203.
The upper business type 1201 is information on the classification of the upper business type associated with the lower business type 1202. The upper business type 1201 is a broader concept than the lower business type 1202.

Note that the full set of business words stored in the business word dictionary information 1050, the full set of industry words stored in the industry word classification dictionary information 1100, and the full set of business type words stored in the business type word classification dictionary information 1200 are set so that they do not match completely (that is, they differ from one another). However, some specific business words stored in the business word dictionary information 1050, some specific industry words stored in the industry word classification dictionary information 1100, and some specific business type words stored in the business type word classification dictionary information 1200 may be the same.
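The constraint above can be expressed with set operations. The sketch below assumes that "do not match completely" means no two dictionaries hold exactly the same set of words; both function names and the sample words are hypothetical.

```python
def dictionaries_are_distinct(business, industry, business_type):
    """True when no two dictionaries hold exactly the same set of words (assumed reading)."""
    sets = [set(business), set(industry), set(business_type)]
    return all(sets[i] != sets[j] for i in range(3) for j in range(i + 1, 3))


def shared_words(business, industry, business_type):
    """Words that appear in more than one dictionary, which the text permits."""
    b, i, t = set(business), set(industry), set(business_type)
    return (b & i) | (b & t) | (i & t)


# Illustrative word lists only.
business_words = ["cloud", "logistics", "retail"]
industry_words = ["retail", "food"]
business_type_words = ["wholesale", "retail"]

distinct = dictionaries_are_distinct(business_words, industry_words, business_type_words)
overlap = shared_words(business_words, industry_words, business_type_words)
```

Here the dictionaries differ as wholes even though the word "retail" appears in all three.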

FIG. 13 is an example of the industry similarity matrix information 1300.
The industry similarity matrix information 1300 stores the similarity between one combination of upper industry and lower industry and another combination of upper industry and lower industry.
The industry similarity matrix information 1300 has information such as upper industry 1301, lower industry 1302, and similarity 1303.
In the industry similarity matrix information 1300, the elements belonging to the columns (upper industry and lower industry) and the elements belonging to the rows (upper industry and lower industry) correspond to each other.
The similarity 1303 stores, at the intersection of a column and a row, information on the similarity between the upper industry and lower industry of that column and the upper industry and lower industry of that row.
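A lookup against such a matrix can be sketched as follows. The industry pairs and every similarity value are invented for illustration, and the `industry_similarity` helper is hypothetical; how the actual values are set is governed by rules described later in the specification.

```python
# Row and column elements are the same list of (upper industry, lower industry) pairs.
FOOD = ("Manufacturing", "Food")
MACHINERY = ("Manufacturing", "Machinery")
SOFTWARE = ("Information and communications", "Software")
INDUSTRIES = [FOOD, MACHINERY, SOFTWARE]

# Illustrative values only; the cell at [row][column] plays the role of similarity 1303.
SIMILARITY_1303 = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

# Map each industry pair to its row/column position.
INDEX = {industry: position for position, industry in enumerate(INDUSTRIES)}


def industry_similarity(row_industry, column_industry):
    """Read the similarity stored at the intersection of the given row and column."""
    return SIMILARITY_1303[INDEX[row_industry]][INDEX[column_industry]]


same_upper = industry_similarity(FOOD, MACHINERY)
different_upper = industry_similarity(FOOD, SOFTWARE)
```

The sample values are symmetric and put 1.0 on the diagonal, but that is a choice made for this sketch rather than something stated in this part of the description.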

FIG. 14 is an example of the business type similarity matrix information 1400.
The business type similarity matrix information 1400 stores the similarity between one combination of upper business type and lower business type and another combination of upper business type and lower business type.
The business type similarity matrix information 1400 has information such as the upper business type 1401, the lower business type 1402, and the similarity 1403.
In the business type similarity matrix information 1400, the elements belonging to the columns (upper business type and lower business type) and the elements belonging to the rows (upper business type and lower business type) correspond to each other.
The similarity 1403 stores, at the intersection of a column and a row, information on the similarity between the upper business type and lower business type of that column and the upper business type and lower business type of that row.

FIG. 15 is an example of the industry/business type similarity setting information 1500.
The industry/business type similarity setting information 1500 stores information on rules for setting the similarity 1303 of the industry similarity matrix information 1300 and the similarity 1403 of the business type similarity matrix information 1400. Details will be described later.

FIGS. 16 to 25 show the flow of various processes executed by the company similarity calculation system 1.
FIG. 16 is an example of the similarity calculation flow 1600 carried out by the similarity calculation module 211.
The similarity calculation flow 1600 is a flow for calculating the similarity between the first company and the second company and outputting the calculated similarity.

The similarity calculation module 211 acquires information about the first company (hereinafter sometimes referred to as the first information) and information about the second company (hereinafter sometimes referred to as the second information) (step 1601).
This step will be described with reference to FIG. 26. The example screens in this embodiment are displayed on the output device 305 of the user terminal 102.

FIG. 26 is an example of the screen 2600 for starting the extraction of similar companies similar to the first company.
In the example shown in FIG. 26, if candidates for a matching destination company are registered for A Co., Ltd., which is the first company, those candidates are displayed. However, in the example shown in FIG. 26, no matching destination company is registered for A Co., Ltd., so the screen is one for starting the extraction of similar companies similar to the first company in order to identify candidates for the matching destination company.
By extracting similar companies similar to the first company, the user can refer to the extracted similar companies and easily decide which company should be a candidate for the matching destination company.

In the example shown in FIG. 26, URL 2601 (https://www.a ...), which is information (first information) about A Co., Ltd., the first company, is displayed. Here, the information about A Co., Ltd. (first information) is information stored in advance as information about A Co., Ltd. That is, the information about A Co., Ltd. (including the URL information of A Co., Ltd.) is already stored in the target company basic information 800. Therefore, in the example shown in FIG. 26, the URL 2601, which is the first information, is the company information (not shown) of A Co., Ltd. stored in the company information 803 of the target company basic information 800. As another example, the first information may be received from the user terminal 102.

In the example shown in FIG. 26, when "automatically extract from the following URL" 2602 is selected, the similarity calculation module 211 acquires the first information and the second information (step 1601).
That is, the similarity calculation module 211 acquires the first information when "automatically extract from the following URL" 2602 is selected.
Further, in the present embodiment, when "automatically extract from the following URL" 2602 is selected, the similarity calculation module 211 acquires, as the second information, the company information of all companies stored in the company information 803 of the target company basic information 800 (excluding the company with the same company ID as the first company). That is, in the present embodiment, for one first company, the similarity calculation module 211 calculates the similarity of every company stored in the target company basic information 800 (excluding the first company itself).
As another embodiment, the similarity calculation module 211 may acquire, as the second information, the company information of specific companies stored in the company information 803 of the target company basic information 800, or company information received from the user terminal 102.
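The selection of the second information in the present embodiment, every stored company except the one sharing the first company's ID, can be sketched as a simple filter. The `TargetCompany` class, its field names, the helper name, and the sample URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TargetCompany:
    """A reduced row of the target company basic information 800 (names are assumptions)."""
    company_id: str
    target_company_name: str
    company_information: str  # for example, a company URL


def select_second_information(first_company_id, companies):
    """Map company ID to company information for every company except the first company."""
    return {
        company.company_id: company.company_information
        for company in companies
        if company.company_id != first_company_id
    }


stored = [
    TargetCompany("C001", "A Co., Ltd.", "https://a.example/"),
    TargetCompany("C002", "B Co., Ltd.", "https://b.example/"),
    TargetCompany("C003", "C Co., Ltd.", "https://c.example/"),
]
second_information = select_second_information("C001", stored)
```

The alternative embodiment, in which only specific companies are compared, would amount to passing a shorter `companies` list to the same function.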

The similarity calculation module 211 extracts words related to the business from the first information and the second information (step 1602). Details will be described later.
The similarity calculation module 211 calculates the business similarity based on the words related to the business conducted by the first company and the words related to the business conducted by the second company (step 1603). Details will be described later.

The similarity calculation module 211 extracts words related to the industry from the first information and the second information (step 1604). Details will be described later.
The similarity calculation module 211 calculates the industry similarity based on the similarity between the industry to which the first company belongs and the industry to which the second company belongs (step 1605). Details will be described later.

The similarity calculation module 211 extracts words related to the business type from the first information and the second information (step 1606). Details will be described later.
The similarity calculation module 211 calculates the business type similarity based on the similarity between the business type to which the first company belongs and the business type to which the second company belongs (step 1607). Details will be described later.

The similarity calculation module 211 calculates the company similarity based on the business similarity, the industry similarity, and the business type similarity (step 1608). Details will be described later.
The similarity calculation module 211 outputs the similarity of a plurality of second companies based on the order of the company similarity (step 1609). Details will be described later.
As a result, the similarity calculation flow 1600 executed by the similarity calculation module 211 ends.
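Step 1609 can be pictured with the short sketch below. It assumes that "order of the company similarity" means descending order, and the function name `rank_second_companies`, the company IDs, and the similarity values are hypothetical; how the company similarity itself is computed in step 1608 is not shown here.

```python
def rank_second_companies(company_similarity):
    """Return (company ID, company similarity) pairs, most similar first."""
    return sorted(company_similarity.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical company similarity values for three second companies.
ranking = rank_second_companies({"X1": 0.41, "X2": 0.93, "X3": 0.78})
print(ranking)  # [('X2', 0.93), ('X3', 0.78), ('X1', 0.41)]
```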

FIG. 17 is an example of the business word extraction flow 1700 implemented by the similarity calculation module 211.
The business word extraction flow 1700 is a flow for extracting words related to the business from the first information and the second information, and is a detailed flow of step 1602 of the similarity calculation flow 1600.

The similarity calculation module 211 extracts a word group from the first information which is information about the first company and the second information which is information about the second company (step 1701).
Specifically, the similarity calculation module 211 uses morphological analysis to decompose the first information, which is character string information about the first company (in the present embodiment, the character string information at the link destination of the company URL), into words, which are the smallest units that carry meaning (hereinafter, the group of words obtained by decomposing the first information may be referred to as a first word group).
Similarly, the similarity calculation module 211 uses morphological analysis to decompose the second information, which is character string information about the second company (in the present embodiment, the character string information at the link destination of the company URL), into words, which are the smallest units that carry meaning (hereinafter, the group of words obtained by decomposing the second information may be referred to as a second word group).

The similarity calculation module 211 collates the first word group with the business word dictionary information 1050, and collates the second word group with the business word dictionary information 1050 (step 1702).
The business word dictionary information 1050 of the dictionary DB 209 stores information on words related to the business (hereinafter, may be referred to as business words).

The similarity calculation module 211 outputs the business words included in the first word group (hereinafter, may be referred to as first business words) and the business words included in the second word group (hereinafter, may be referred to as second business words) (step 1703).
The similarity calculation module 211 outputs the number of times the first business word appears in the first information and the number of times the second business word appears in the second information (step 1704).

Specific examples of steps 1703 and 1704 will be described with reference to FIG. 5. An example in which the project ID shown in FIG. 5 is M1 (the reference company name is A Co., Ltd.) will be described.
Assume that the first word group extracted from the character string information (first information) about A Co., Ltd. contains the words "house" and "maintenance", and that the business word dictionary information 1050 includes the business words "house" and "maintenance". In this case, the similarity calculation module 211 outputs (stores) "house" and "maintenance", which are common to the first word group and the business word dictionary information 1050, to the business word 504 of the reference company word information 500 as the first business words in the first information (step 1703).

Further, the similarity calculation module 211 also outputs (stores) the number of times "house" and "maintenance" appear in the character string information about A Co., Ltd. to the business word 504 of the reference company word information 500 (step 1704). That is, the similarity calculation module 211 outputs "house", which appears most frequently in the character string information (first information) about A Co., Ltd. (7 appearances), to word 1 of the business word 504, and outputs "maintenance", which appears next most frequently (6 appearances), to word 2 of the business word 504. The similarity calculation module 211 likewise outputs the remaining words, in descending order of appearances after "maintenance", to word 3 and onward (not shown) of the business word 504.

Assume instead that the first word group extracted from the character string information about A Co., Ltd. contains the word "new construction" and that the business word dictionary information 1050 does not include the business word "new construction". In this case, because "new construction" is not common to the first word group and the business word dictionary information 1050, the similarity calculation module 211 does not output (store) it to the business word 504 of the reference company word information 500.
The example described with reference to FIG. 5 concerns the first company, but the same applies to the second company, except that the second business words and their numbers of appearances are output (stored) to the business word 903 of the target company word information 900. As another embodiment, the similarity calculation module 211 may store the first business words and their numbers of appearances and the second business words and their numbers of appearances in the same database.
As a result, the business word extraction flow 1700 executed by the similarity calculation module 211 ends.
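Steps 1702 to 1704 can be sketched as follows. This is a minimal illustration, assuming the morphological analysis of step 1701 has already produced a list of words; the function name `extract_business_words` and the token list are hypothetical, while the counts for "house" and "maintenance" follow the FIG. 5 example.

```python
from collections import Counter

def extract_business_words(word_group, business_dictionary):
    """Keep only dictionary words and order them by number of appearances."""
    counts = Counter(word for word in word_group if word in business_dictionary)
    # most_common() lists the highest count first: word 1, word 2, ...
    return counts.most_common()

# Hypothetical first word group standing in for the output of step 1701.
first_word_group = ["house"] * 7 + ["maintenance"] * 6 + ["new construction"] * 3
# Hypothetical excerpt of business word dictionary information 1050.
business_dictionary = {"house", "maintenance"}

first_business_words = extract_business_words(first_word_group, business_dictionary)
print(first_business_words)  # [('house', 7), ('maintenance', 6)]
```

"new construction" is dropped because it is absent from the dictionary, which mirrors the case described above.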

FIG. 18 is an example of the business word similarity calculation flow 1800 carried out by the similarity calculation module 211.
The business word similarity calculation flow 1800 is a flow for calculating the business similarity based on the words related to the business conducted by the first company (first business words) and the words related to the business conducted by the second company (second business words), and is a detailed flow of step 1603 of the similarity calculation flow 1600.

The similarity calculation module 211 acquires all the first business words stored in the business word 504 in FIG. 5 (hereinafter, may be referred to as the first business word group) and all the second business words stored in the business word 903 (hereinafter, may be referred to as the second business word group) (step 1801).
The similarity calculation module 211 vectorizes the first business word group and vectorizes the second business word group (step 1802).

Specifically, for example, the similarity calculation module 211 can vectorize the first business word group and the second business word group by using tf-idf. In this case, the similarity calculation module 211 also acquires information on the number of appearances of each business word included in the first business word group.
As another example, the similarity calculation module 211 may use a technique for vectorizing character string information, such as Bag of Words, LSA (Latent Semantic Analysis), word2vec, or Doc2Vec, to vectorize the character string information about the first company (first information) or the character string information about the second company (second information).

The similarity calculation module 211 calculates the similarity between the vector information in the first business word group and the vector information in the second business word group (step 1803).
For example, the similarity calculation module 211 can calculate the similarity between the vector information in the first business word group and the vector information in the second business word group by calculating the cosine similarity.
The similarity calculation module 211 outputs the calculated similarity between the vector information in the first business word group and the vector information in the second business word group as the business similarity (step 1804).
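The cosine similarity of step 1803 can be sketched as follows. As a simplification, this example weights each business word by its raw number of appearances rather than by tf-idf, so it is not the vectorization named in the embodiment; the function name `cosine_similarity` and the counts for the second company are hypothetical.

```python
import math

def cosine_similarity(vector_a, vector_b):
    """Cosine similarity between two sparse word-weight vectors given as dicts."""
    dot = sum(weight * vector_b.get(word, 0) for word, weight in vector_a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in vector_a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in vector_b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a company with no business words is treated as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)

first_vector = {"house": 7, "maintenance": 6}                    # counts from the FIG. 5 example
second_vector = {"house": 5, "maintenance": 4, "renovation": 1}  # hypothetical counts

business_similarity = cosine_similarity(first_vector, second_vector)
print(round(business_similarity, 3))
```

Because all weights are non-negative, the result lies between 0 and 1, with 1 meaning the two vectors point in the same direction.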

Specific examples of steps 1803 and 1804 will be described with reference to FIGS. 5, 7, and 9. In this example, the business similarity between A Co., Ltd. (reference company name 503), which is the first company shown in FIG. 5, and Z Co., Ltd. (target company name 902), which is one of the second companies shown in FIG. 9, is calculated and output to the similarity 704 in FIG. 7.

As step 1803, the similarity calculation module 211 calculates the cosine similarity between the vector information obtained by vectorizing the information stored in the business word 504 of A Co., Ltd. in FIG. 5 and the vector information obtained by vectorizing the information stored in the business word 903 of Z Co., Ltd. in FIG. 9. In this case, the similarity calculation module 211 calculates the cosine similarity as, for example, 0.960.

As step 1804, the similarity calculation module 211 outputs (stores) the company ID of Z Co., Ltd. and the business similarity (cosine similarity) associated with that company ID to the similarity 704 of the similarity information 700.
The similarity calculation module 211 also outputs the business similarity between A Co., Ltd., which is the first company, and each of the other second companies to the similarity 704, but at this point the entries need not be stored in order of business similarity. Although the details will be described later, the business similarity of Z Co., Ltd. (company ID: C0001) is stored in similarity 1 of the similarity 704.
As a result, the business word similarity calculation flow 1800 executed by the similarity calculation module 211 ends.

FIG. 19 is an example of the industry word extraction flow 1900 carried out by the similarity calculation module 211.
The industry word extraction flow 1900 is a flow for extracting words related to the industry from the first information and the second information, and is a detailed flow of step 1604 of the similarity calculation flow 1600.

The similarity calculation module 211 extracts a word group from the first information which is information about the first company and the second information which is information about the second company (step 1901). This step is the same as step 1701, and the word groups extracted by the similarity calculation module 211 in step 1701 can be reused.

The similarity calculation module 211 collates the first word group with the industry word classification dictionary information 1100, and collates the second word group with the industry word classification dictionary information 1100 (step 1902).
The industry word classification dictionary information 1100 of the dictionary DB 209 stores information on words related to the industry (hereinafter, may be referred to as industry words).

The similarity calculation module 211 outputs the industry words included in the first word group (hereinafter, may be referred to as first industry words) and the industry words included in the second word group (hereinafter, may be referred to as second industry words) (step 1903).
The similarity calculation module 211 outputs the number of times the first industry word appears in the first information and the number of times the second industry word appears in the second information (step 1904).

Specific examples of steps 1902 to 1904 will be described with reference to FIGS. 5 and 11. An example in which the project ID shown in FIG. 5 is M1 (the reference company name is A Co., Ltd.) will be described.
Assume that the first word group extracted from the character string information (first information) about A Co., Ltd. includes the words "new construction" and "house". In this case, the similarity calculation module 211 searches whether the words "new construction" and "house" included in the first word group are stored in the industry word 1103 of the industry word classification dictionary information 1100 (step 1902).

In the present embodiment, the words "new construction" 1111 and "house" 1112 are stored in the industry word 1103 of the industry word classification dictionary information 1100. Because the words "new construction" and "house" included in the first word group therefore match "new construction" 1111 and "house" 1112 in the industry word 1103, the similarity calculation module 211 outputs (stores) them to the industry word 505 of the reference company word information 500 as the first industry words (step 1903).

Further, the similarity calculation module 211 also outputs (stores) the number of times "new construction" and "house" appear in the character string information about A Co., Ltd. to the industry word 505 of the reference company word information 500 (step 1904). That is, the similarity calculation module 211 outputs "new construction", which appears most frequently in the character string information (first information) about A Co., Ltd. (10 appearances), to word 1 of the industry word 505, and outputs "house", which appears next most frequently (7 appearances), to word 2 of the industry word 505. The similarity calculation module 211 likewise outputs the remaining words, in descending order of appearances after "house", to word 3 and onward (not shown) of the industry word 505.

Assume instead that the first word group extracted from the character string information about A Co., Ltd. includes the word "construction case" and that the industry word 1103 of the industry word classification dictionary information 1100 does not include the industry word "construction case". In this case, because "construction case" is not common to the first word group and the industry word 1103 of the industry word classification dictionary information 1100, the similarity calculation module 211 does not output (store) it to the industry word 505 of the reference company word information 500.
The example described with reference to FIG. 5 concerns the first company, but the same applies to the second company, except that the second industry words and their numbers of appearances are output (stored) to the industry word 904 of the target company word information 900. As another embodiment, the similarity calculation module 211 may store the first industry words and their numbers of appearances and the second industry words and their numbers of appearances in the same database.
As a result, the industry word extraction flow 1900 executed by the similarity calculation module 211 ends.

FIG. 20 is an example of the industry similarity calculation flow 2000 implemented by the similarity calculation module 211.
The industry similarity calculation flow 2000 is a flow for calculating the industry similarity based on the similarity between the industry to which the first company belongs and the industry to which the second company belongs, and is a detailed flow of step 1605 of the similarity calculation flow 1600.

The similarity calculation module 211 acquires the first industry words that appear a predetermined number of times or more from the industry word 505 of the reference company word information 500, and acquires the second industry words that appear a predetermined number of times or more from the industry word 904 of the target company word information 900 (step 2001).

A specific example of step 2001 will be described with reference to FIGS. 5 and 9. An example in which the reference company name shown in FIG. 5 is A Co., Ltd. will be described. The industry words of A Co., Ltd. are "new construction" (10 appearances) and "house" (7 appearances). In the present embodiment, the similarity calculation module 211 acquires the first industry words whose number of appearances is 80% or more (8 or more) of that of "new construction" (10), the word with the maximum number of appearances. That is, the similarity calculation module 211 acquires only the industry word "new construction". In the present embodiment, the threshold is a predetermined ratio (80%) of the number of appearances of the most frequent word, but the predetermined ratio can be set arbitrarily.

An example in which the target company name shown in FIG. 9 is Z Co., Ltd. will be described. The industry words of Z Co., Ltd. are "spatial design" (5 appearances) and "house" (4 appearances). In the present embodiment, the similarity calculation module 211 acquires the second industry words whose number of appearances is 80% or more (4 or more) of that of "spatial design" (5), the word with the maximum number of appearances. That is, the similarity calculation module 211 acquires the industry words "spatial design" and "house".
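The threshold rule of step 2001 can be sketched as follows. The function name `select_frequent_words` and its `min_percent` parameter are hypothetical; the counts are those of the A Co., Ltd. and Z Co., Ltd. examples. The comparison uses integer arithmetic so that a word sitting exactly on the 80% boundary is kept.

```python
def select_frequent_words(counts, min_percent=80):
    """Return words whose count is at least min_percent % of the maximum count."""
    if not counts:
        return []
    max_count = max(counts.values())
    return [
        word
        for word, count in counts.items()
        if count * 100 >= min_percent * max_count
    ]

first_industry_words = select_frequent_words({"new construction": 10, "house": 7})
second_industry_words = select_frequent_words({"spatial design": 5, "house": 4})
print(first_industry_words)   # ['new construction']
print(second_industry_words)  # ['spatial design', 'house']
```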

The similarity calculation module 211 acquires, from the industry word classification dictionary information 1100, the industry classification associated with the first industry words acquired in step 2001 (hereinafter, may be referred to as a first industry) and the industry classification associated with the second industry words acquired in step 2001 (hereinafter, may be referred to as a second industry) (step 2002).

A specific example of step 2002 will be described with reference to FIGS. 5 and 11. An example in which the reference company name shown in FIG. 5 is A Co., Ltd. will be described. The similarity calculation module 211 acquires the first industry corresponding to the first industry word "new construction" acquired in step 2001. Specifically, the similarity calculation module 211 acquires "architecture" 1113 stored in the lower industry 1102 and "construction" 1114 stored in the upper industry 1101, which correspond to "new construction" 1111 of the industry word classification dictionary information 1100 in FIG. 11. The similarity calculation module 211 stores the acquired first industry in industry 1 of the industry 604 of the reference company classification information 600 shown in FIG. 6 and, when there are a plurality of first industries, stores the others in industry 2 and onward.

Similarly, a specific example of step 2002 will be described with reference to FIGS. 9 and 11. An example in which the target company name shown in FIG. 9 is Z Co., Ltd. will be described. The similarity calculation module 211 acquires the second industry corresponding to the second industry words "spatial design" and "house" acquired in step 2001. Specifically, the similarity calculation module 211 acquires "architecture" 1113 stored in the lower industry 1102 and "construction" 1114 stored in the upper industry 1101, which correspond to "spatial design" 1115 and "house" 1112 of the industry word classification dictionary information 1100 in FIG. 11. The similarity calculation module 211 stores the acquired second industry in industry 1 of the industry 1003 of the target company classification information 1000 shown in FIG. 10 and, when there are a plurality of second industries, stores the others in industry 2 and onward.
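The lookup of step 2002 can be sketched as follows. The dictionary below is a hypothetical excerpt containing only the three industry words named in the examples, and the function name `lookup_industries` is an assumption. Collapsing duplicate industries into one entry is also an assumption, made so that Z Co., Ltd. ends up with a single sub-industry as in the later example.

```python
# Hypothetical excerpt of industry word classification dictionary information 1100:
# industry word -> (upper industry, lower industry)
INDUSTRY_DICTIONARY = {
    "new construction": ("construction", "architecture"),
    "house": ("construction", "architecture"),
    "spatial design": ("construction", "architecture"),
}

def lookup_industries(industry_words, dictionary=INDUSTRY_DICTIONARY):
    """Map industry words to unique (upper, lower) industries in first-seen order."""
    industries = []
    for word in industry_words:
        industry = dictionary.get(word)
        if industry is not None and industry not in industries:
            industries.append(industry)  # industry 1, industry 2, ...
    return industries

first_industries = lookup_industries(["new construction"])
second_industries = lookup_industries(["spatial design", "house"])
print(first_industries)   # [('construction', 'architecture')]
print(second_industries)  # [('construction', 'architecture')]
```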

The similarity calculation module 211 acquires the similarity between the first industry and the second industry acquired in step 2002 based on the industry similarity matrix information 1300 (step 2003).
A specific example of step 2003 will be described with reference to FIG. 13. An example of acquiring the similarity between the first industry of A Co., Ltd. (lower industry "architecture" and upper industry "construction") and the second industry of Z Co., Ltd. (lower industry "architecture" and upper industry "construction") will be described.
In the industry similarity matrix information 1300 in FIG. 13, the similarity calculation module 211 acquires the similarity (10) associated with the intersection 1315 of the upper industry "construction" 1312 and lower industry "architecture" 1311 of A Co., Ltd., which belong to the column, and the upper industry "construction" 1314 and lower industry "architecture" 1313 of Z Co., Ltd., which belong to the row.

The similarity calculation module 211 calculates the industry similarity between the first company and the second company based on the similarity associated with the intersection of the industry similarity matrix information 1300 (step 2004).
When the first company has only one sub-industry and the second company has only one sub-industry, as in the example of A Co., Ltd. as the first company and Z Co., Ltd. as the second company (that is, when the industry similarity matrix information 1300 has one intersection), the similarity associated with that intersection is the industry similarity between the first company and the second company. That is, in the example of A Co., Ltd. and Z Co., Ltd. described in step 2003 above, the industry similarity is 10.

When at least one of the first industry of the first company and the second industry of the second company includes a plurality of sub-industries, the industry similarity matrix information 1300 has a plurality of intersections (that is, a plurality of similarities). In that case, the industry similarity can be calculated by performing a predetermined calculation on the similarities associated with those intersections. Specific examples will be described with reference to FIGS. 6, 10 and 13. An example of acquiring the similarity between the first industry of H Co., Ltd. stored in the reference company classification information 600 in FIG. 6 and the second industry of V Co., Ltd. (company ID: C0005) stored in the target company classification information 1000 in FIG. 10 will be described.

The similarity calculation module 211 acquires, as the first industry of H Co., Ltd., the lower industry "polymer" with the upper industry "chemical / petroleum / material" and the lower industry "inorganic material" with the upper industry "chemical / petroleum / material" from the industry 604 of the reference company classification information 600.
The similarity calculation module 211 acquires, as the second industry of V Co., Ltd. (company ID: C0005), the lower industry "polymer" with the upper industry "chemical / petroleum / material" and the lower industry "household goods" with the upper industry "chemical / petroleum / material" from the industry 1003 of the target company classification information 1000.

In the industry similarity matrix information 1300 of FIG. 13, the similarity calculation module 211 acquires the similarities associated with the four intersections 1320, 1321, 1322, and 1323 of the sub-industries "polymer" 1316 and "inorganic material" 1317 of H Co., Ltd., which belong to the column, and the sub-industries "polymer" 1318 and "household goods" 1319 of V Co., Ltd., which belong to the row.
That is, the similarity calculation module 211 acquires the similarity "8" associated with the intersection 1320, the similarity "10" associated with the intersection 1321, the similarity "6" associated with the intersection 1322, and the similarity "8" associated with the intersection 1323.

The similarity calculation module 211 can calculate the industry similarity by, for example, calculating a similarity for each first sub-industry and then taking the average of those values. The similarity for each first sub-industry is calculated, for example, as the average of two values: the maximum and the mean of the plurality of similarities associated with that first sub-industry of the first company belonging to the column. When the column industry and the row industry are exactly the same, the industry similarity is the maximum.

The similarity for each first sub-industry can be calculated specifically as follows. Among the similarities associated with the intersections for the sub-industry "polymer" 1316 of H Co., Ltd. belonging to the column, the similarity calculation module 211 acquires "9", which is the average of the similarity "8" associated with the intersection 1320 and the similarity "10" associated with the intersection 1321, and "10", which is the maximum similarity and is associated with the intersection 1321. The similarity calculation module 211 then sets "9.5", which is the average of the acquired "9" and "10", as the similarity for the sub-industry "polymer" 1316 of H Co., Ltd.

Likewise, among the similarities associated with the intersections for the sub-industry "inorganic material" 1317 of H Co., Ltd. belonging to the column, the similarity calculation module 211 acquires "7", which is the average of the similarity "6" associated with the intersection 1322 and the similarity "8" associated with the intersection 1323, and "8", which is the maximum similarity and is associated with the intersection 1323. The similarity calculation module 211 then sets "7.5", which is the average of the acquired "7" and "8", as the similarity for the sub-industry "inorganic material" 1317 of H Co., Ltd. Calculating the similarity for each first sub-industry in this way makes it possible to give extra weight to the maximum similarity among the similarities associated with the plurality of intersections.

Then, the similarity calculation module 211 calculates "8.5", which is the average of the similarity "9.5" for the sub-industry "polymer" 1316 of H Co., Ltd. and the similarity "7.5" for the sub-industry "inorganic material" 1317 of H Co., Ltd., as the industry similarity. The similarity calculation module 211 stores the calculated industry similarity in the similarity 704 of the similarity information 700 in FIG. 7 in the next step 2005. As another example, the average of all the similarities associated with the plurality of intersections may be used as the industry similarity.
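The calculation of step 2004 for H Co., Ltd. and V Co., Ltd. can be reproduced with the sketch below. The function name `industry_similarity` and the dictionary representation of the industry similarity matrix information 1300 are assumptions. The text does not state which row each intersection lies on, so the row assignments in the comments are assumed; the result does not depend on them.

```python
def industry_similarity(matrix, first_sub_industries, second_sub_industries):
    """Average, over the first sub-industries, of (maximum + mean) / 2."""
    per_sub_industry = []
    for first in first_sub_industries:
        values = [matrix[(first, second)] for second in second_sub_industries]
        mean = sum(values) / len(values)
        per_sub_industry.append((max(values) + mean) / 2)
    return sum(per_sub_industry) / len(per_sub_industry)

# (column sub-industry of H Co., Ltd., row sub-industry of V Co., Ltd.) -> similarity
matrix = {
    ("polymer", "household goods"): 8,             # intersection 1320 (row assumed)
    ("polymer", "polymer"): 10,                    # intersection 1321 (row assumed)
    ("inorganic material", "polymer"): 6,          # intersection 1322 (row assumed)
    ("inorganic material", "household goods"): 8,  # intersection 1323 (row assumed)
}

score = industry_similarity(
    matrix,
    ["polymer", "inorganic material"],
    ["polymer", "household goods"],
)
print(score)  # 8.5
```

With a single intersection whose similarity is 10, the same function returns 10.0, which agrees with the one-to-one case of A Co., Ltd. and Z Co., Ltd.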

The similarity calculation module 211 outputs the calculated industry similarity (step 2005).
A specific example of step 2005 will be described with reference to FIG. 7. An example of A Co., Ltd. as the first company and Z Co., Ltd. as the second company will be described. To put the industry similarity 10 calculated in step 2004 described above on the same scale as the business similarity, the similarity calculation module 211 divides it by 10 and outputs (stores) the resulting value 1.00 to the similarity 704 of the similarity information 700 in association with Z Co., Ltd. (company ID of the target company: C0001). As another embodiment, the values associated with the intersections of the industry similarity matrix information 1300 may already be the adjusted similarities to be output.
As a result, the industry similarity calculation flow 2000 executed by the similarity calculation module 211 ends.

FIG. 21 is an example of the business format word extraction flow 2100 implemented by the similarity calculation module 211.
The business format word extraction flow 2100 is a flow for extracting words related to the business type from the first information and the second information, and is a detailed flow of step 1606 of the similarity calculation flow 1600.

The similarity calculation module 211 extracts a word group from the first information which is information about the first company and the second information which is information about the second company (step 2101). This step is the same as steps 1701 and 1901, and the word group extracted by the similarity calculation module 211 in step 1701 or 1901 can be used.

The similarity calculation module 211 collates the first word group with the business type word classification dictionary information 1200, and collates the second word group with the business type word classification dictionary information 1200 (step 2102).
The business type word classification dictionary information 1200 of the dictionary DB 209 stores information on words related to the business type (hereinafter, may be referred to as business type words).

The similarity calculation module 211 outputs the business type words included in the first word group (hereinafter, may be referred to as first business type words) and the business type words included in the second word group (hereinafter, may be referred to as second business type words) (step 2103).
The similarity calculation module 211 outputs the number of times the first business type word appears in the first information and the number of times the second business type word appears in the second information (step 2104).

Specific examples of steps 2102 to 2104 will be described with reference to FIGS. 5 and 12. An example in which the project ID shown in FIG. 5 is M1 (the reference company name is A Co., Ltd.) will be described.
Assume that the first word group extracted from the character string information (first information) about A Co., Ltd. includes the words "construction example" and "maintenance". In this case, the similarity calculation module 211 searches whether the words "construction example" and "maintenance" included in the first word group are stored in the business type word 1203 of the business type word classification dictionary information 1200 (step 2102).

In the present embodiment, the words "construction example" 1211 and "maintenance" 1212 are stored in the business type word 1203 of the business type word classification dictionary information 1200. Because the words "construction example" and "maintenance" included in the first word group therefore match "construction example" 1211 and "maintenance" 1212 in the business type word 1203, the similarity calculation module 211 outputs (stores) them to the business type word 506 of the reference company word information 500 as the first business type words (step 2103).

Further, the similarity calculation module 211 also outputs (stores) the number of times "construction example" and "maintenance" appear in the character string information about A Co., Ltd. to the business type word 506 of the reference company word information 500 (step 2104). That is, the similarity calculation module 211 outputs "construction example", which appears most frequently in the character string information (first information) about A Co., Ltd. (7 appearances), to word 1 of the business type word 506, and outputs "maintenance", which appears next most frequently (6 appearances), to word 2 of the business type word 506. The similarity calculation module 211 likewise outputs the remaining words, in descending order of appearances after "maintenance", to word 3 and onward (not shown) of the business type word 506.

Next, consider a case where the first word group extracted from the character string information about A Co., Ltd. includes the word "house", while the business format word 1203 of the business format word classification dictionary information 1200 does not include "house". In this case, because the word "house" is not common to the first word group and the business format word 1203 of the business format word classification dictionary information 1200, the similarity calculation module 211 does not output (store) the word "house" in the business format word 506 of the reference company word information 500.
Further, the example described with reference to FIG. 5 concerns the first company, but the same applies to the second company, except that the second business format words and their numbers of appearances are output (stored) in the business format word 905 of the target company word information 900. As another embodiment, the similarity calculation module 211 may store the first business format words and their numbers of appearances, and the second business format words and their numbers of appearances, in the same database.
As a result, the business format word extraction flow 2100 executed by the similarity calculation module 211 ends.
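The extraction flow 2100 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `extract_business_format_words`, the set `dictionary_words`, and the sample word list (including the count for "house") are hypothetical; only the appearance counts 7 and 6 and the exclusion of "house" are taken from the description above.

```python
from collections import Counter

def extract_business_format_words(word_group, dictionary_words):
    """Count only the words that are registered in the dictionary and
    return (word, count) pairs in descending order of appearance count."""
    counts = Counter(word for word in word_group if word in dictionary_words)
    return counts.most_common()

# Hypothetical stand-in for the business format word 1203 of the dictionary.
dictionary_words = {"construction example", "maintenance", "construction record"}

# Hypothetical first word group for A Co., Ltd.; "house" is not in the dictionary.
first_word_group = (
    ["construction example"] * 7 + ["maintenance"] * 6 + ["house"] * 9
)

first_business_format_words = extract_business_format_words(
    first_word_group, dictionary_words
)
print(first_business_format_words)
```

The first pair corresponds to word 1 of the business format word 506, the second pair to word 2, and so on; "house" is dropped because it has no dictionary entry.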

FIG. 22 is an example of the business format similarity calculation flow 2200 implemented by the similarity calculation module 211.
The business type similarity calculation flow 2200 is a flow for calculating the business type similarity based on the similarity between the business type to which the first company belongs and the business type to which the second company belongs, and is a detailed flow of step 1607 in FIG.

The similarity calculation module 211 acquires the first business format words that appear a predetermined number of times or more from the business format word 506 of the reference company word information 500, and acquires the second business format words that appear a predetermined number of times or more from the business format word 905 of the target company word information 900 (step 2201).

A specific example of step 2201 will be described with reference to FIGS. 5 and 9. An example in which the reference company name shown in FIG. 5 is A Co., Ltd. will be described. The business format words for A Co., Ltd. are "construction example" (7 appearances) and "maintenance" (6 appearances). In the present embodiment, the similarity calculation module 211 acquires the first business format words whose number of appearances is 80% or more (5.6 or more) of the number of appearances (7) of "construction example", the word with the maximum number of appearances. That is, the similarity calculation module 211 acquires the business format words "construction example" and "maintenance". In the present embodiment, the threshold is set to a predetermined ratio (80%) of the number of appearances of the most frequent word, but the predetermined ratio can be set arbitrarily.

An example in which the target company name shown in FIG. 9 is Z Co., Ltd. will be described. The business format words for Z Co., Ltd. are "construction record" (8 appearances) and "maintenance" (4 appearances). In the present embodiment, the similarity calculation module 211 acquires the second business format words whose number of appearances is 80% or more (6.4 or more) of the number of appearances (8) of "construction record", the word with the maximum number of appearances. That is, the similarity calculation module 211 acquires only the business format word "construction record".
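The threshold selection of step 2201 can be illustrated with a short sketch. The function name `select_frequent_words` and the dictionary representation of the appearance counts are assumptions made for this example; the 80% ratio and the counts are those given above.

```python
def select_frequent_words(counts, ratio=0.8):
    """Keep the words whose appearance count is at least `ratio` times
    the count of the most frequent word."""
    if not counts:
        return []
    threshold = max(counts.values()) * ratio
    return [word for word, n in counts.items() if n >= threshold]

# A Co., Ltd.: threshold is 7 * 0.8 = 5.6, so both words are kept.
a_words = select_frequent_words({"construction example": 7, "maintenance": 6})

# Z Co., Ltd.: threshold is 8 * 0.8 = 6.4, so "maintenance" (4) is dropped.
z_words = select_frequent_words({"construction record": 8, "maintenance": 4})

print(a_words, z_words)
```

Using a ratio relative to the most frequent word, rather than a fixed count, keeps the selection independent of how much text was collected for each company.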

The similarity calculation module 211 acquires, from the business format word classification dictionary information 1200, the business format classification associated with the first business format words acquired in step 2201 (hereinafter sometimes referred to as the first business format) and the business format classification associated with the second business format words acquired in step 2201 (hereinafter sometimes referred to as the second business format) (step 2202).

A specific example of step 2202 will be described with reference to FIGS. 5 and 12. An example in which the reference company name shown in FIG. 5 is A Co., Ltd. will be described. The similarity calculation module 211 acquires the first business formats corresponding to the first business format words "construction example" and "maintenance" acquired in step 2201. Specifically, the similarity calculation module 211 acquires "construction" 1213 stored in the lower business format 1202 and "manufacturing / processing" 1214 stored in the upper business format 1201, which correspond to "construction example" 1211 in the business format word classification dictionary information 1200 of FIG. 12. Further, the similarity calculation module 211 acquires "maintenance / maintenance" 1215 stored in the lower business format 1202 and "management" 1216 stored in the upper business format 1201, which correspond to "maintenance" 1212 in the business format word classification dictionary information 1200 of FIG. 12.
The similarity calculation module 211 stores the acquired information on the first business type in the business type 1 and the business type 2 of the business type 605 of the reference company classification information 600 shown in FIG.

Similarly, a specific example of step 2202 will be described with reference to FIGS. 9 and 12. An example in which the target company name shown in FIG. 9 is Z Co., Ltd. will be described. The similarity calculation module 211 acquires the second business format corresponding to the second business format word "construction record" acquired in step 2201. Specifically, the similarity calculation module 211 acquires "construction" 1213 stored in the lower business format 1202 and "manufacturing / processing" 1214 stored in the upper business format 1201, which correspond to "construction record" 1217 in the business format word classification dictionary information 1200 of FIG. 12. The similarity calculation module 211 stores the acquired second business format in business type 1 of the business type 1004 of the target company classification information 1000 shown in FIG. 10; when there are a plurality of second business formats, the remaining ones are stored in business type 2 and onward.
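The dictionary lookup of step 2202 amounts to mapping each selected word to a pair of lower and upper classifications. The sketch below assumes an in-memory dictionary as a stand-in for the business format word classification dictionary information 1200; the names `BUSINESS_FORMAT_DICTIONARY` and `classify` are hypothetical, and dropping duplicate pairs is an assumption of this example rather than something the description states.

```python
# Hypothetical in-memory form of the dictionary information 1200:
# word -> (lower business format, upper business format)
BUSINESS_FORMAT_DICTIONARY = {
    "construction example": ("construction", "manufacturing / processing"),
    "construction record": ("construction", "manufacturing / processing"),
    "maintenance": ("maintenance / maintenance", "management"),
}

def classify(words, dictionary=BUSINESS_FORMAT_DICTIONARY):
    """Map each word to its (lower, upper) classification pair,
    keeping the input order and dropping duplicate pairs (assumption)."""
    formats = []
    for word in words:
        pair = dictionary[word]
        if pair not in formats:
            formats.append(pair)
    return formats

first_formats = classify(["construction example", "maintenance"])   # A Co., Ltd.
second_formats = classify(["construction record"])                  # Z Co., Ltd.
print(first_formats, second_formats)
```

The first entry of each result corresponds to business type 1, the second to business type 2, and so on.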

The similarity calculation module 211 acquires the similarity between the first business format and the second business format acquired in step 2202 based on the business format similarity matrix information 1400 (step 2203).
A specific example of step 2203 will be described with reference to FIG. The following describes an example of acquiring the similarity between the first business formats of A Co., Ltd. (lower business format "construction" with upper business format "manufacturing / processing", and lower business format "maintenance / maintenance" with upper business format "management") and the second business format of Z Co., Ltd. (lower business format "construction" with upper business format "manufacturing / processing").

In the business format similarity matrix information 1400 shown in FIG., the similarity calculation module 211 acquires the similarities associated with the two intersections 1417 and 1418 between the business formats of A Co., Ltd. belonging to the columns (lower business format "construction" 1411 with its upper business format, and lower business format "maintenance / maintenance" 1413 with upper business format "management" 1414) and the upper business format "manufacturing / processing" 1416 and lower business format "construction" 1415 of Z Co., Ltd. belonging to the row. That is, the similarity calculation module 211 acquires the similarity "10" associated with the intersection 1417 and the similarity "7" associated with the intersection 1418.

The similarity calculation module 211 calculates the business type similarity between the first company and the second company based on the similarity associated with the intersection of the business type similarity matrix information 1400 (step 2204).
If the first company has only one lower business format and the second company has only one lower business format (that is, when there is one intersection in the business type similarity matrix information 1400), the similarity calculation module 211 can use the similarity associated with that intersection as the business type similarity between the first company and the second company.

Assume instead that at least one of the first business format of the first company and the second business format of the second company includes a plurality of lower business formats, so that there are a plurality of intersections (a plurality of similarities) in the business type similarity matrix information 1400. In this case, the similarity calculation module 211 calculates the business type similarity using the similarities associated with the plurality of intersections.
For example, the similarity calculation module 211 calculates a similarity for each of the plurality of first lower business formats and takes the average of those similarities as the business type similarity. As in the industry similarity calculation flow 2000 described above, the similarity for each first lower business format is calculated, for example, as the average of the maximum value and the mean value of the plurality of similarities associated with that first lower business format of the first company belonging to the column. When the business type of the column and the business type of the row are completely the same, the business type similarity takes the maximum value.

An example of calculating the business type similarity between A Co., Ltd. as the first company and Z Co., Ltd. as the second company will be described. In this example, only one similarity is associated with each first lower business format of the first company belonging to the column, so that single similarity is used as the similarity for that first lower business format. That is, the similarity calculation module 211 acquires the similarity "10" 1417 for the lower business format "construction" 1411 of A Co., Ltd. and the similarity "7" 1418 for "maintenance / maintenance" 1413 of A Co., Ltd.
Then, the similarity calculation module 211 calculates "8.5", the average of the similarity "10" 1417 for the lower business format "construction" 1411 of A Co., Ltd. and the similarity "7" 1418 for "maintenance / maintenance" 1413 of A Co., Ltd., as the business type similarity.
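The aggregation of step 2204 can be sketched as below. The function names and the dictionary keyed by the first company's lower business format are assumptions for illustration; the aggregation rule (average of the maximum and the mean per lower business format, then the average over all lower business formats) follows the example given in the description, which itself presents this rule only as one possibility.

```python
from statistics import mean

def per_format_similarity(similarities):
    """Similarity for one first lower business format: the average of the
    maximum and the mean of the similarities found in its column."""
    return (max(similarities) + mean(similarities)) / 2

def business_type_similarity(similarities_by_format):
    """Average the per-format similarities over all first lower business formats."""
    return mean(per_format_similarity(s) for s in similarities_by_format.values())

# A Co., Ltd. (columns) against Z Co., Ltd. (one row): one intersection per column.
score = business_type_similarity({
    "construction": [10],               # intersection 1417
    "maintenance / maintenance": [7],   # intersection 1418
})
print(score)        # 8.5
print(score / 10)   # 0.85, the value on the 0-to-1 scale
```

With a single similarity per column, the maximum and the mean coincide, so the per-format value is that similarity itself, as stated above.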

The similarity calculation module 211 outputs the calculated business type similarity (step 2205).
A specific example of step 2205 will be described with reference to FIG. An example of A Co., Ltd. as the first company and Z Co., Ltd. as the second company will be described. To put the business type similarity 8.5 calculated in step 2204 on the same scale as the business similarity score, the similarity calculation module 211 divides it by 10 and outputs (stores) the resulting value 0.85 in the similarity 704 of the similarity information 700 in association with Z Co., Ltd. (company ID of the target company: C0001). As another embodiment, the similarity after this adjustment may be associated with the intersections of the business type similarity matrix information 1400.
As a result, the business format similarity calculation flow 2200 executed by the similarity calculation module 211 ends.

Here, the similarity stored at the intersection of the lower industry and the upper industry in the column of the industry similarity matrix and the lower industry and the upper industry in the row is set based on a predetermined rule. The predetermined rule will be described with reference to FIG.
In the industry type similarity setting information 1500 of FIG. 15, cases are distinguished according to whether the upper classification of the column (upper industry) and the upper classification of the row (upper industry) in the industry similarity matrix are the same, of high similarity, or of medium or low similarity, and according to whether the lower classification of the column (lower industry) and the lower classification of the row (lower industry) are the same, of high similarity, of medium similarity, or of low similarity. The similarity corresponding to each case is stored.
Here, "same" means that the industries are completely the same, "high similarity" means that the industries are highly likely to be similar, "medium similarity" means that the industries are the next most likely to be similar after "high similarity", and "low similarity" means that the industries are the next most likely to be similar after "medium similarity".

The industry type similarity setting information 1500 has information such as the similarity setting rule 1501.
The similarity setting rule 1501, located in the bottom row of the industry type similarity setting information 1500, stores the similarity corresponding to each of the above-mentioned cases.
Based on the rules shown in the industry type similarity setting information 1500, the similarity calculation module 211 associates a similarity with each intersection between the lower industry and upper industry in a column (first company) of the industry similarity matrix and the lower industry and upper industry in a row (second company).

Specifically, for example, the similarity calculation module 211 can set the similarity stored at the intersection as follows, from highest to lowest.
When the lower industry of the column and the lower industry of the row are the same: 10.
When the lower industries of the column and the row have high similarity, and the upper industries of the column and the row are the same: 9.
When the lower industries of the column and the row have medium similarity, and the upper industries of the column and the row are the same: 8.
When the lower industries of the column and the row have high similarity, and the upper industries of the column and the row have high similarity: 7.
When the lower industries of the column and the row have medium similarity, and the upper industries of the column and the row have high similarity: 6.
When the lower industries of the column and the row have high similarity, and the upper industries of the column and the row have medium or low similarity: 5.
When the lower industries of the column and the row have medium similarity, and the upper industries of the column and the row have medium or low similarity: 4.
When the lower industries of the column and the row have low similarity, and the upper industries of the column and the row are the same: 3.
When the lower industries of the column and the row have low similarity, and the upper industries of the column and the row have high similarity: 2.
When the lower industries of the column and the row have low similarity, and the upper industries of the column and the row have medium or low similarity: 1.
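The ten cases above can be expressed as a small lookup. This is only a sketch: the relation labels, the function name `intersection_similarity`, and the table layout are assumptions made for the example, and how two industries are judged "same", "high", "medium", or "low" is outside its scope because that judgment is made in advance.

```python
# Relation labels between two classifications (illustrative names).
SAME, HIGH, MEDIUM, LOW = "same", "high", "medium", "low"
MEDIUM_OR_LOW = "medium_or_low"

# (lower relation, upper relation) -> similarity stored at the intersection
RULE_TABLE = {
    (HIGH, SAME): 9,
    (MEDIUM, SAME): 8,
    (HIGH, HIGH): 7,
    (MEDIUM, HIGH): 6,
    (HIGH, MEDIUM_OR_LOW): 5,
    (MEDIUM, MEDIUM_OR_LOW): 4,
    (LOW, SAME): 3,
    (LOW, HIGH): 2,
    (LOW, MEDIUM_OR_LOW): 1,
}

def intersection_similarity(lower_relation, upper_relation):
    """Return the similarity for one intersection of the matrix."""
    if lower_relation == SAME:
        return 10  # identical lower industries always score the maximum
    # Medium and low upper relations are treated as one case in the rules.
    upper_key = upper_relation if upper_relation in (SAME, HIGH) else MEDIUM_OR_LOW
    return RULE_TABLE[(lower_relation, upper_key)]

print(intersection_similarity(HIGH, HIGH))   # 7
```

Keeping the rule as data makes it straightforward to reuse the same function for the business category similarity matrix with a different table.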

It should be noted that it is necessary to determine in advance whether the sub-industries are "same", "high similarity", "medium similarity" or "low similarity". Similarly, it is necessary to determine in advance whether the upper industries are “same”, “high similarity”, “medium similarity” or “low similarity”.

In addition, the similarity corresponding to the intersection of the lower and upper business categories in the column of the business category similarity matrix and the lower and upper business categories in the row is also set based on the predetermined rules as described above.

FIG. 23 is an example of the company similarity calculation flow 2300 implemented by the similarity calculation module 211.
The company similarity calculation flow 2300 is a flow for calculating the company similarity based on the business similarity, the industry similarity, and the business type similarity, and is a detailed flow of step 1608 in FIG.

The similarity calculation module 211 acquires information on business similarity, industry similarity, and business type similarity from the similarity information 700 (step 2301).
The similarity calculation module 211 calculates the company similarity by adding the business similarity, the industry similarity, and the business type similarity at a predetermined ratio (step 2302).
The similarity calculation module 211 outputs the calculated company similarity (step 2303).

Specific examples of steps 2301 to 2303 will be described with reference to FIG. An example of A corporation as a first company and Z corporation as a second company (company ID of the target company: C0001) will be described.
From the similarity 704 of the similarity information 700, the similarity calculation module 211 acquires the business similarity (0.960), the industry similarity (1.00), and the business type similarity (0.850) between the first company A Co., Ltd. and the second company (company ID of the target company: C0001) (step 2301).

The similarity calculation module 211 adds the respective similarities at the following ratio.
Business similarity : Industry similarity : Business type similarity = 3 : 5 : 2
The ratio of the industry similarity is the highest, and the ratio of the business type similarity is the lowest.
That is, the similarity calculation module 211 calculates 0.960 × 0.3 + 1.00 × 0.5 + 0.850 × 0.2 = 0.958 as the company similarity (step 2302).
The predetermined ratio may be any ratio, and the importance of each similarity can be set by adjusting the ratio.
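The weighted addition of step 2302 can be checked with a short sketch. The function name `company_similarity` and the normalization of the 3 : 5 : 2 ratio into weights that sum to 1 are assumptions of this example; they reproduce the value 0.958 given above.

```python
def company_similarity(business, industry, business_type, ratio=(3, 5, 2)):
    """Weighted sum of the three similarities; the ratio is given in the order
    business : industry : business type and is normalized to sum to 1."""
    total = sum(ratio)
    w_business, w_industry, w_type = (r / total for r in ratio)
    return business * w_business + industry * w_industry + business_type * w_type

# A Co., Ltd. versus Z Co., Ltd. (company ID: C0001)
score = company_similarity(0.960, 1.00, 0.850)
print(round(score, 3))
```

Changing `ratio` shifts the importance of each similarity; for example, `ratio=(1, 1, 1)` would weight the three equally.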

The similarity calculation module 211 outputs (stores) the calculated company similarity (0.958) in the similarity 704 of the similarity information 700 in association with Z Co., Ltd. (company ID of the target company: C0001) (step 2303).
As a result, the company similarity calculation flow 2300 executed by the similarity calculation module 211 ends.

FIG. 24 is an example of the similarity output flow 2400 implemented by the similarity calculation module 211.
The similarity output flow 2400 is a flow for outputting the similarity of a plurality of second companies based on the order of the company similarity, and is a detailed flow of step 1609 in FIG.

The similarity calculation module 211 acquires the company similarity in the plurality of second companies (step 2401).
The similarity calculation module 211 determines the order of the plurality of second companies based on the acquired company similarity in the plurality of second companies (step 2402).
The similarity calculation module 211 outputs (stores) information on the order of the plurality of determined second companies (step 2403).

Specific examples of steps 2401 to 2403 will be described with reference to FIG. The example of A Co., Ltd. as the first company will be described.
The similarity calculation module 211 acquires the company similarity between the first company A Co., Ltd. and the plurality of second companies from the similarity 704 of the similarity information 700 in FIG. 7 (step 2401). In the present embodiment, the company similarity between all the second companies and A Co., Ltd. stored in the target company basic information 800 is acquired from the similarity 704 of the similarity information 700. Note that FIG. 7 shows only the similarity of the three companies.

The similarity calculation module 211 ranks the second companies based on the company similarity between each of the second companies and A Co., Ltd.; for example, the top three can be determined as follows (step 2402). First place is the company with company ID C0001 (Z Co., Ltd.), with a company similarity of 0.958; second place is the company with company ID C0080, with a company similarity of 0.927; and third place is the company with company ID C0087, with a company similarity of 0.810.

The similarity calculation module 211 outputs (stores) the information on each similarity of the first-ranked company, Z Co., Ltd. (company ID: C0001), to similarity 1 of the similarity 704 of the similarity information 700; the information on each similarity of the second-ranked company (company ID: C0080) to similarity 2 of the similarity 704; and the information on each similarity of the third-ranked company (company ID: C0087) to similarity 3 of the similarity 704 (step 2403). The similarity calculation module 211 outputs (stores) the fourth and subsequent ranks to similarity 4 and onward of the similarity 704 of the similarity information 700.
As a result, the similarity output flow 2400 executed by the similarity calculation module 211 ends.
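Steps 2401 to 2403 amount to sorting the second companies by company similarity in descending order. The following sketch assumes a plain dictionary from company ID to company similarity; the function name `rank_companies` is hypothetical, and the three values are those of the example above.

```python
def rank_companies(similarity_by_company_id):
    """Return (company ID, company similarity) pairs, highest similarity first."""
    return sorted(
        similarity_by_company_id.items(),
        key=lambda item: item[1],
        reverse=True,
    )

ranking = rank_companies({"C0080": 0.927, "C0087": 0.810, "C0001": 0.958})

# Position 1 goes to similarity 1, position 2 to similarity 2, and so on.
for position, (company_id, similarity) in enumerate(ranking, start=1):
    print(position, company_id, similarity)
```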

FIG. 25 is an example of the similar company display flow 2500 implemented by the similar company display module 212.
The similar company display flow 2500 is a flow for displaying a second company similar to the first company.

The similar company display module 212 acquires the business similarity, the industry similarity, the business type similarity, and the company similarity of the second companies with the highest company similarity (step 2501).
Based on the order of company similarity, the similar company display module 212 displays, on the user terminal 102, information about the second companies and a chart including axes of business similarity, industry similarity, and business type similarity (step 2502).
The chart generated and displayed by the similar company display module 212 is not limited to the radar chart, and may be, for example, a chart in which column charts (bar graphs) of each degree of similarity are grouped for each company.

Specific examples of steps 2501 and 2502 will be described with reference to FIGS. 7, 8 and 27. The example of A Co., Ltd. as the first company will be described.
From similarity 1, similarity 2, and similarity 3 in the row of the first company A Co., Ltd. in the similarity 704 of the similarity information 700 of FIG. 7, the similar company display module 212 acquires the business similarity, the industry similarity, the business type similarity, and the company similarity of the second companies stored there (Z Co., Ltd. with company ID C0001, the company with company ID C0080, and the company with company ID C0087). It also acquires the information stored in the target company basic information 800 of FIG. 8 for those second companies (step 2501).

FIG. 27 is an example of a screen 2700 for displaying a similar company similar to the first company. More specifically, FIG. 27 shows three companies similar to A Co., Ltd. as the first company. Note that FIG. 27 is a screen displayed after "Automatically extract from the following URL" 2602 in FIG. 26 is selected.
The similar company display module 212 displays information on each company in order of company similarity for the company with company ID C0001, the company with company ID C0080, and the company with company ID C0087. That is, the similar company display module 212 displays the information of Z Co., Ltd. (company ID: C0001), which has the highest company similarity, in the upper part; the information of R Co., Ltd. (company ID: C0080), which has the next highest company similarity, in the middle part; and the information of G Co., Ltd. (company ID: C0087), which has the third highest company similarity, in the lower part.

As shown in FIG. 27, the similar company display module 212 displays the name, market capitalization, net income, and price-earnings ratio of each second company acquired from the target company basic information 800 of FIG. 8.
Further, as shown in FIG. 27, the similar company display module 212 displays a radar chart in which the business similarity, the industry similarity, and the business type similarity acquired from the similarity information 700 are plotted on the axis of business similarity, the axis of industry similarity, and the axis of business type similarity, respectively.
As a result, the similar company display flow 2500 executed by the similar company display module 212 ends.

In addition, the similar company display module 212 may generate and display a chart that includes, in addition to or in place of the axis of business similarity, the axis of industry similarity, and the axis of business type similarity, an axis of similarity regarding the industry, an axis of similarity regarding the business form, or an axis of similarity regarding the business structure.
The similarity calculation module 211 can calculate the similarity regarding the industry, the similarity regarding the business form, or the similarity regarding the business structure by the same method as the method for calculating the industry similarity or the business type similarity.

That is, the similarity calculation module 211 extracts words related to the industry, words related to the business form, or words related to the business structure from the first information or the second information, using industry word classification dictionary information that stores words related to the industry and information on their classification, business form word classification dictionary information that stores words related to the business form and information on their classification, or business structure word classification dictionary information that stores words related to the business structure and information on their classification.

Next, the similarity calculation module 211 identifies the industry, business form, or business structure to which the first company or the second company belongs and which corresponds to the extracted words, using the industry word classification dictionary information, the business form word classification dictionary information, or the business structure word classification dictionary information.
Next, the similarity calculation module 211 calculates the similarity regarding the industry, the similarity regarding the business form, or the similarity regarding the business structure by using the industry similarity matrix, the business form similarity matrix, or the business structure similarity matrix.

The present invention is not limited to the above-mentioned examples, and includes various modifications. For example, the above-described embodiment has been described in detail in order to explain the present invention in an easy-to-understand manner, and is not necessarily limited to those having all the described configurations. Further, it is possible to replace a part of the configuration of one embodiment with the configuration of another embodiment, and it is also possible to add the configuration of another embodiment to the configuration of one embodiment. Further, it is possible to add / delete / replace a part of the configuration of each embodiment with another configuration.

Further, each of the above configurations, functions, processing units, processing means, etc. may be realized by hardware by designing a part or all of them by, for example, an integrated circuit. Further, each of the above configurations, functions, and the like may be realized by software by the processor interpreting and executing a program that realizes each function. Information such as programs, tables, and files that realize each function can be stored in a memory, a hard disk, a recording device such as an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, or a DVD.

In addition, the control lines and information lines indicate those that are considered necessary for explanation, and do not necessarily indicate all the control lines and information lines in the product. In practice, it can be considered that almost all configurations are interconnected.
It should be noted that the above-described embodiment discloses at least the configuration described in the claims.

1 ... Corporate similarity calculation system, 101 ... Corporate similarity calculation server, 102 ... User terminal, 103 ... Administrator terminal, 201 ... Main storage device, 202 ... Auxiliary storage device, 203 ... Processor, 211 ... Similarity calculation module, 212 ... Similar company display module, 213 ... Management module

Claims

A company similarity calculation server that calculates a company similarity between a first company serving as a reference and a second company other than the first company, the server comprising:
similarity calculation means for calculating the company similarity between the first company and the second company based on first information about the first company and second information about the second company; and
output means for outputting the calculated company similarity,
wherein the similarity calculation means
calculates a business similarity based on words related to the business conducted by the first company and words related to the business conducted by the second company,
calculates an industry similarity based on a first industry related to the industry to which the first company belongs and a second industry related to the industry to which the second company belongs,
calculates a business type similarity based on a first business type related to the business type to which the first company belongs and a second business type related to the business type to which the second company belongs, and
calculates the company similarity based on the business similarity, the industry similarity, and the business type similarity.
The company similarity calculation server according to claim 1, wherein the similarity calculation means:
extracts words related to industries from the first information, and defines as the first industry at least one industry associated with at least one first industry word that appears more than a predetermined number of times among the extracted words;
extracts words related to industries from the second information, and defines as the second industry at least one industry associated with at least one second industry word that appears more than a predetermined number of times among the extracted words; and
calculates the industry similarity.
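By way of a non-limiting illustration, the word-frequency step recited above can be sketched as follows. The dictionary `INDUSTRY_WORDS`, the function `assign_industries`, the whitespace tokenization, and the inclusive threshold `min_count` are all assumptions introduced for this sketch and do not appear in the claims.

```python
from collections import Counter

# Hypothetical dictionary associating industry words with industries.
INDUSTRY_WORDS = {
    "loan": "finance",
    "bank": "finance",
    "clinic": "healthcare",
    "hospital": "healthcare",
}


def assign_industries(text, min_count=2):
    """Return the industries whose associated words occur often enough.

    The inclusive comparison (>= min_count) is an assumption; the claim only
    refers to a predetermined number of occurrences.
    """
    tokens = text.lower().split()
    # Count only the tokens that are registered as industry words.
    counts = Counter(t for t in tokens if t in INDUSTRY_WORDS)
    return {INDUSTRY_WORDS[w] for w, n in counts.items() if n >= min_count}


first_info = "bank loan bank deposit clinic"
print(assign_industries(first_info))  # {'finance'}
```

Here only "bank" reaches the threshold, so the first company is assigned the single industry "finance".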
The company similarity calculation server according to claim 1, wherein the similarity calculation means:
extracts words related to industries from the first information, defines as a first lower industry at least one industry associated with at least one first industry word that appears more than a predetermined number of times among the extracted words, and defines as a first upper industry an industry that is a broader concept than the first lower industry and is associated with the first lower industry;
extracts words related to industries from the second information, defines as a second lower industry at least one industry associated with at least one second industry word that appears more than a predetermined number of times among the extracted words, and defines as a second upper industry an industry that is a broader concept than the second lower industry and is associated with the second lower industry; and
calculates the industry similarity by calculating the similarity between the first lower industry and first upper industry, on the one hand, and the second lower industry and second upper industry, on the other.
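As a non-limiting sketch of the lower industry and upper industry relationship described above, each lower industry may be paired with the broader industry associated with it. The mapping `UPPER_OF`, its entries, and the function `with_upper` are hypothetical and serve only to illustrate the pairing.

```python
# Hypothetical hierarchy: each lower industry maps to its broader upper industry.
UPPER_OF = {
    "online banking": "finance",
    "insurance": "finance",
    "telemedicine": "healthcare",
}


def with_upper(lower_industries):
    """Pair each lower industry with its associated upper industry."""
    return sorted((lower, UPPER_OF[lower]) for lower in lower_industries)


print(with_upper({"insurance", "telemedicine"}))
# [('insurance', 'finance'), ('telemedicine', 'healthcare')]
```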
The company similarity calculation server according to claim 3, wherein
an information acquisition means acquires an industry similarity matrix used for calculating the industry similarity,
in the industry similarity matrix,
the elements belonging to the columns and the elements belonging to the rows correspond to each other, each element including a lower industry, which is information about an industry, and an upper industry, which is a broader concept than the lower industry, and
each intersection of a column and a row is associated with the similarity between the lower industry and upper industry of the column and the lower industry and upper industry of the row, and
the similarity calculation means,
using the industry similarity matrix,
calculates the industry similarity by acquiring the similarity associated with at least one column relating to the first lower industry and the first upper industry
and at least one row relating to the second lower industry and the second upper industry.
The company similarity calculation server according to claim 4, wherein
a plurality of similarities are acquired, each associated with a column corresponding to at least one lower industry and upper industry relating to the first lower industry and the first upper industry and with a row corresponding to at least one lower industry and upper industry relating to the second lower industry and the second upper industry, and
the industry similarity is calculated based on the acquired plurality of similarities.
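The matrix lookup described above may be sketched, without limitation, as follows. The matrix contents, the names `MATRIX` and `industry_similarity`, and in particular the use of the maximum to combine the acquired similarities are assumptions; the claims state only that the industry similarity is based on the acquired similarities.

```python
from itertools import product

# Each matrix element is a (lower industry, upper industry) pair.
A = ("online banking", "finance")
B = ("insurance", "finance")
C = ("telemedicine", "healthcare")

# Hypothetical matrix keyed by (column element, row element); values are
# illustrative only.
MATRIX = {
    (A, A): 1.0, (A, B): 0.6, (A, C): 0.1,
    (B, A): 0.6, (B, B): 1.0, (B, C): 0.2,
    (C, A): 0.1, (C, B): 0.2, (C, C): 1.0,
}


def industry_similarity(first_elements, second_elements):
    """Acquire the similarity at every column/row intersection and combine them.

    Combining with max() is an assumption made for this sketch.
    """
    values = [
        MATRIX[(col, row)]
        for col, row in product(first_elements, second_elements)
    ]
    return max(values)


print(industry_similarity([A], [B, C]))  # 0.6
```

An average or a weighted combination of the acquired values would fit the same structure equally well.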
The company similarity calculation server according to claim 4 or 5, wherein the similarity associated with a column and a row of the industry similarity matrix decreases in the following order:
a case where the lower industry in the column and the lower industry in the row are the same;
a case where the similarity between the lower industry in the column and the lower industry in the row is a high similarity, and the upper industry in the column and the upper industry in the row are the same;
a case where the similarity between the lower industry in the column and the lower industry in the row is the high similarity, and the similarity between the upper industry in the column and the upper industry in the row is the high similarity;
a case where the similarity between the lower industry in the column and the lower industry in the row is a low similarity that is lower than the high similarity, and the upper industry in the column and the upper industry in the row are the same; and
a case where the similarity between the lower industry in the column and the lower industry in the row is the low similarity, and the similarity between the upper industry in the column and the upper industry in the row is the low similarity.
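The five cases above fix only a relative order. A minimal sketch of one way to encode that order is given below; the numeric scores, the relation labels "same", "high", and "low", and the function `cell_similarity` are hypothetical, and combinations not listed in the claim are left undefined.

```python
# Hypothetical scores keyed by (lower-industry relation, upper-industry relation).
# Only the descending order is taken from the text; the values are placeholders.
TIER_SCORES = {
    ("same", "same"): 1.0,
    ("high", "same"): 0.8,
    ("high", "high"): 0.6,
    ("low", "same"): 0.4,
    ("low", "low"): 0.2,
}


def cell_similarity(lower_relation, upper_relation):
    """Return the score of a matrix cell, or None for an unlisted combination."""
    if lower_relation == "same":
        # Identical lower industries form the top tier regardless of the
        # upper-industry relation passed in.
        return TIER_SCORES[("same", "same")]
    return TIER_SCORES.get((lower_relation, upper_relation))


print(cell_similarity("high", "same") > cell_similarity("high", "high"))  # True
```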
The company similarity calculation server according to any one of claims 1 to 6, wherein the similarity calculation means:
extracts words related to business types from the first information, and defines as the first business type at least one business type associated with at least one first business type word that appears more than a predetermined number of times among the extracted words;
extracts words related to business types from the second information, and defines as the second business type at least one business type associated with at least one second business type word that appears more than a predetermined number of times among the extracted words; and
calculates the business type similarity.
The company similarity calculation server according to any one of claims 1 to 6, wherein the similarity calculation means:
extracts words related to business types from the first information, defines as a first lower business type at least one business type associated with at least one first business type word that appears more than a predetermined number of times among the extracted words, and defines as a first upper business type a business type that is a broader concept than the first lower business type and is associated with the first lower business type;
extracts words related to business types from the second information, defines as a second lower business type at least one business type associated with at least one second business type word that appears more than a predetermined number of times among the extracted words, and defines as a second upper business type a business type that is a broader concept than the second lower business type and is associated with the second lower business type; and
calculates the business type similarity by calculating the similarity between the first lower business type and first upper business type, on the one hand, and the second lower business type and second upper business type, on the other.
The company similarity calculation server according to claim 8, wherein
an information acquisition means acquires a business type similarity matrix used for calculating the business type similarity,
in the business type similarity matrix,
the elements belonging to the columns and the elements belonging to the rows correspond to each other, each element including a lower business type, which is information about a business type, and an upper business type, which is a broader concept than the lower business type, and
each intersection of a column and a row is associated with the similarity between the lower business type and upper business type of the column and the lower business type and upper business type of the row, and
the similarity calculation means,
using the business type similarity matrix,
calculates the business type similarity by acquiring the similarity associated with at least one column relating to the first lower business type and the first upper business type
and at least one row relating to the second lower business type and the second upper business type.
The company similarity calculation server according to claim 9, wherein
a plurality of similarities are acquired, each associated with a column corresponding to at least one lower business type and upper business type relating to the first lower business type and the first upper business type and with a row corresponding to at least one lower business type and upper business type relating to the second lower business type and the second upper business type, and
the business type similarity is calculated based on the acquired plurality of similarities.
The company similarity calculation server according to claim 9 or 10, wherein the similarity associated with a column and a row of the business type similarity matrix decreases in the following order:
a case where the lower business type in the column and the lower business type in the row are the same;
a case where the similarity between the lower business type in the column and the lower business type in the row is a high similarity, and the upper business type in the column and the upper business type in the row are the same;
a case where the similarity between the lower business type in the column and the lower business type in the row is the high similarity, and the similarity between the upper business type in the column and the upper business type in the row is the high similarity;
a case where the similarity between the lower business type in the column and the lower business type in the row is a low similarity that is lower than the high similarity, and the upper business type in the column and the upper business type in the row are the same; and
a case where the similarity between the lower business type in the column and the lower business type in the row is the low similarity, and the similarity between the upper business type in the column and the upper business type in the row is the low similarity.
The company similarity calculation server according to any one of claims 1 to 11, wherein the similarity calculation means:
extracts from the first information at least one first business word, which is a word related to the business conducted by the first company, and vectorizes the first business word;
extracts from the second information at least one second business word, which is a word related to the business conducted by the second company, and vectorizes the second business word; and
calculates the business similarity by calculating the similarity between the vectorized first business word and the vectorized second business word.
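As a non-limiting illustration of comparing vectorized business words, the sketch below substitutes a simple word-count vector and cosine similarity. Both choices are assumptions made for brevity: the claims do not name the vectorization technique or the similarity measure, and a learned word embedding could be used instead. The names `vectorize` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def vectorize(words):
    """Represent a list of business words as a sparse count vector."""
    return Counter(words)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as sharing nothing
    return dot / (norm_u * norm_v)


first = vectorize(["payments", "lending", "payments"])
second = vectorize(["payments", "insurance"])
print(round(cosine(first, second), 3))  # 0.632
```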
The company similarity calculation server according to any one of claims 1 to 12, wherein the similarity calculation means
calculates the company similarity by calculating a value obtained by adding the business similarity, the business type similarity, and the industry similarity at a predetermined ratio.
The company similarity calculation server according to claim 13, wherein, in the predetermined ratio,
the ratio of the business similarity is the highest and the ratio of the business type similarity is the lowest.
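Adding the three similarities at a predetermined ratio amounts to a weighted sum. The sketch below is illustrative only: the weight values in `WEIGHTS` are placeholders, chosen so that the business similarity carries the largest share, and the ordering of the remaining two weights is an assumption.

```python
# Hypothetical ratio; the values are placeholders that sum to 1.0.
WEIGHTS = {"business": 0.5, "industry": 0.3, "business_type": 0.2}


def company_similarity(business, industry, business_type, weights=WEIGHTS):
    """Combine the three component similarities as a weighted sum."""
    return (
        weights["business"] * business
        + weights["industry"] * industry
        + weights["business_type"] * business_type
    )


print(round(company_similarity(0.8, 0.6, 1.0), 2))  # 0.78
```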
The company similarity calculation server according to any one of claims 1 to 14, wherein the output means
outputs the company similarity
together with the business similarity,
the business type similarity,
and the industry similarity.
The company similarity calculation server according to any one of claims 1 to 15, wherein the output means
outputs a chart including an axis of the business similarity,
an axis of the business type similarity,
and an axis of the industry similarity.
The company similarity calculation server according to claim 16, wherein the output means:
displays the business similarity calculated by the similarity calculation means on the axis of the business similarity;
displays the industry similarity calculated by the similarity calculation means on the axis of the industry similarity; and
displays the business type similarity calculated by the similarity calculation means on the axis of the business type similarity.
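The form of the chart is not specified in the claims. Purely as a stand-in, the sketch below renders one text row per axis, assuming each similarity lies between 0 and 1; the function `chart_rows` and its `width` parameter are hypothetical.

```python
def chart_rows(business, industry, business_type, width=20):
    """Render one text bar per similarity axis (values assumed in [0, 1])."""
    axes = [
        ("business", business),
        ("industry", industry),
        ("business type", business_type),
    ]
    return [
        f"{name:<13} {'#' * round(value * width)} {value:.2f}"
        for name, value in axes
    ]


for row in chart_rows(0.8, 0.5, 1.0):
    print(row)
```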
The company similarity calculation server according to any one of claims 1 to 17, wherein,
when there are a plurality of second companies,
the output means outputs the similarities of the plurality of second companies based on the order of the company similarities calculated by the similarity calculation means.
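Outputting a plurality of second companies in order of company similarity can be sketched as a simple sort. Descending order, the company names, and the function `rank_similar_companies` are assumptions made for illustration.

```python
def rank_similar_companies(scores):
    """Sort (company, company similarity) pairs from most to least similar."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_similar_companies(
    {"Company B": 0.42, "Company C": 0.78, "Company D": 0.61}
)
print(ranked)
# [('Company C', 0.78), ('Company D', 0.61), ('Company B', 0.42)]
```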
A company similarity calculation method performed by a company similarity calculation server that calculates a company similarity between a first company serving as a reference and a second company other than the first company, the method comprising:
calculating a business similarity based on words related to the business conducted by the first company and words related to the business conducted by the second company;
calculating an industry similarity based on a first industry relating to the industry to which the first company belongs and a second industry relating to the industry to which the second company belongs;
calculating a business type similarity based on a first business type relating to the business type to which the first company belongs and a second business type relating to the business type to which the second company belongs;
calculating the company similarity based on the business similarity, the industry similarity, and the business type similarity; and
outputting the calculated company similarity.
A program for causing the company similarity calculation server to execute each step of the company similarity calculation method according to claim 19.