WO2021024966A1 - Company similarity calculation server and company similarity calculation method - Google Patents

Company similarity calculation server and company similarity calculation method

Publication number
WO2021024966A1
Authority
WO
WIPO (PCT)
Prior art keywords
similarity
business
industry
company
information
Prior art date
Application number
PCT/JP2020/029577
Other languages
French (fr)
Japanese (ja)
Inventor
阿部諒馬
老沼隆史
岩下博洋
Original Assignee
Vanddd株式会社
Priority date
Filing date
Publication date
Application filed by Vanddd株式会社
Publication of WO2021024966A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising

Definitions

  • Patent Document 1: Japanese Patent Application Laid-Open No. 2012-118612
  • Patent Document 1 describes a "marketing proposal support method in a management server connected via a network to a user terminal and to a data server storing financial information of customer companies, in which a customer list is acquired from the user terminal, at least one characteristic evaluation index is extracted from the customer list, at least one similar company is searched for in the data server based on the characteristic evaluation index, and the retrieved companies are narrowed down based on their degree of similarity".
  • However, the information on the companies is not classified into information on the business, information on the industry, or information on the business format, so there is a risk that the similarity between the companies cannot be calculated accurately.
  • Accordingly, the present invention provides a mechanism capable of accurately calculating the similarity between companies based on the similarity of information about businesses, the similarity of information about industries, and the similarity of information about business formats in each company.
  • The present application includes a plurality of means for solving the above problems. To give one example, it is a company similarity calculation server that calculates the company similarity between a reference first company and a second company other than the first company. The server comprises a similarity calculation means for calculating the company similarity between the first company and the second company based on first information about the first company and second information about the second company, and an output means for outputting the calculated company similarity.
  • The similarity calculation means calculates a business similarity based on the words related to the business conducted by the first company and the words related to the business conducted by the second company.
  • It calculates an industry similarity based on a first industry, which is the industry to which the first company belongs, and a second industry, which is the industry to which the second company belongs.
  • It calculates a business type similarity based on a first business type, which is the business type to which the first company belongs, and a second business type, which is the business type to which the second company belongs.
  • It then calculates the company similarity based on the business similarity, the industry similarity, and the business type similarity.
  • According to the present invention, it is possible to accurately calculate the degree of similarity between companies based on the similarity of information about the business, the similarity of information about the industry, and the similarity of information about the business type of each company. Issues, configurations, and effects other than those described above will be clarified by the following description of the embodiments.
  • The degree of similarity between companies is used in various situations. For example, when planning an M&A (merger and acquisition) between companies, the company similarity is used to extract candidates for the counterpart company (hereinafter sometimes referred to as a matching destination company). In this case, a more efficient M&A can be realized by accurately calculating the degree of similarity between companies.
  • the present embodiment adopts the system or method described below. As a result, the degree of similarity between companies can be calculated accurately.
  • embodiments will be described.
  • FIG. 1 is an example of a configuration diagram of the entire company similarity calculation system 1.
  • the company similarity calculation system 1 includes a plurality of user terminals 102 and a plurality of administrator terminals 103, each of which is connected to the company similarity calculation server 101 via a network.
  • the network may be wired or wireless, and each terminal can send and receive information via the network.
  • Each terminal of the company similarity calculation system 1 and the company similarity calculation server 101 may be, for example, a mobile terminal such as a smartphone, a tablet, a mobile phone, or a personal digital assistant (PDA), or a wearable terminal of a glasses type, a wristwatch type, or a clothing type. It may also be a stationary or portable computer, or a server located in the cloud or on a network. In terms of function, it may be a VR (virtual reality) terminal, an AR (augmented reality) terminal, or an MR (mixed reality) terminal. Alternatively, it may be a combination of a plurality of these terminals. For example, a combination of one smartphone and one wearable terminal can logically function as one terminal. An information processing terminal other than these may also be used.
  • Each terminal of the company similarity calculation system 1 and the company similarity calculation server 101 has a processor that executes an operating system, applications, programs, and the like; a main storage device such as RAM (Random Access Memory); an auxiliary storage device such as an IC card, a hard disk drive, an SSD (Solid State Drive), or flash memory; a communication control unit such as a network card, a wireless communication module, or a mobile communication module; an input device such as a touch panel, a keyboard, a mouse, voice input, or input by detecting motion with a camera unit; and an output device such as a monitor or a display.
  • the output device may be a device or a terminal for transmitting information for output to an external monitor, display, printer, device, or the like.
  • Each module is stored in the main storage device as a program or application, and each functional element of the entire system is realized by the processor executing these programs and applications.
  • Each of these modules may also be implemented in hardware, for example by integrating them.
  • Each module may be an independent program or application, or may be implemented as a subprogram or a function that forms part of one integrated program or application.
  • In this description, each module is described as the subject that performs processing, but in reality the processing is executed by the processor that runs the various programs, applications, and the like (the modules).
  • a “database” is a functional element (storage unit) that stores a data set so that it can handle arbitrary data operations (for example, extraction, addition, deletion, overwriting, etc.) from a processor or an external computer.
  • the method of implementing the database is not limited, and may be, for example, a database management system, spreadsheet software, or a text file such as XML or JSON.
  • When a database management system is used, it may be a relational database (RDBMS) or a non-relational database (non-RDBMS).
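  • As a rough illustration of the point that a database here is any storage unit supporting extraction, addition, deletion, and overwriting, the following sketch keeps records in a single JSON text file. The class name `JsonFileDatabase`, its method names, and the record layout are assumptions made for this example and do not appear in the specification.

```python
import json
from pathlib import Path


class JsonFileDatabase:
    """Tiny key-value store kept in one JSON text file (illustrative only)."""

    def __init__(self, path):
        self._path = Path(path)
        if not self._path.exists():
            self._save({})

    def _load(self):
        return json.loads(self._path.read_text(encoding="utf-8"))

    def _save(self, records):
        self._path.write_text(
            json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
        )

    def extract(self, key):
        """Return the record stored under key, or None if it is absent."""
        return self._load().get(key)

    def add(self, key, record):
        """Store a new record; refuse to replace an existing key."""
        records = self._load()
        if key in records:
            raise KeyError(f"{key!r} already exists; use overwrite()")
        records[key] = record
        self._save(records)

    def overwrite(self, key, record):
        """Replace (or create) the record stored under key."""
        records = self._load()
        records[key] = record
        self._save(records)

    def delete(self, key):
        """Remove the record stored under key, if any."""
        records = self._load()
        records.pop(key, None)
        self._save(records)
```

  • A file-backed store like this is only practical for small data sets; a database management system would be the more likely choice for the target company data.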
  • the user terminal 102 is a terminal used by a person who uses the company similarity information.
  • the users include not only those who use the company similarity information by themselves, but also those who provide the information to other companies.
  • the administrator terminal 103 is a terminal used by the administrator of the company similarity calculation system 1.
  • the company similarity calculation server 101 receives input of various information necessary for making a determination from each of the above terminals and the like, and stores these in the auxiliary storage device 202.
  • FIG. 2 is an example of the hardware configuration of the company similarity calculation server 101.
  • the company similarity calculation server 101 is composed of, for example, a server arranged on the cloud.
  • The main storage device 201 stores the programs and applications of the similarity calculation module 211, the similar company display module 212, and the management module 213, and each functional element of the company similarity calculation server 101 is realized by the processor 203 executing these programs and applications.
  • The similarity calculation module 211 calculates the similarity between companies based on information related to the companies. Details will be described later, but for example, it calculates the degree of similarity between a reference company (hereinafter sometimes referred to as a first company) and a company for which the degree of similarity with the first company is calculated (hereinafter sometimes referred to as a second company).
  • The similar company display module 212 displays information on similar companies on the user terminal 102 and the administrator terminal 103. Details will be described later, but for example, it displays information on a plurality of second companies similar to the first company, together with the degree of similarity between the first company and each second company.
  • the management module 213 manages the company similarity calculation system 1. Specifically, the management module 213 manages the operation information of the company similarity calculation server, the user information using the company similarity calculation system 1, and the like.
  • the auxiliary storage device 202 includes a survey history database 207 (hereinafter, may be referred to as a survey history DB), a target company database 208 (hereinafter, may be referred to as a target company DB), and a dictionary database 209 (hereinafter, may be referred to as a dictionary DB).
  • Each database holds at least one information table, and each table stores at least one piece of information.
  • the survey history DB 207 includes reference company word information 500, reference company classification information 600, and similarity information 700.
  • the target company DB 208 includes target company basic information 800, target company word information 900, and target company classification information 1000.
  • The dictionary DB 209 includes business word dictionary information 1050, industry word classification dictionary information 1100, business type word classification dictionary information 1200, industry similarity matrix information 1300, business type similarity matrix information 1400, and industry and business type similarity setting information 1500.
  • FIG. 3 is an example of the hardware configuration of the user terminal 102.
  • the user terminal 102 is composed of, for example, a stationary computer.
  • The similar company display module 311 is stored in the main storage device 301, and each functional element of the user terminal 102 is realized by the processor 303 executing these programs and applications.
  • the user terminal data 321 of the auxiliary storage device 302 stores information related to the user.
  • FIG. 4 is an example of the hardware configuration of the administrator terminal 103.
  • the administrator terminal 103 is composed of, for example, a stationary computer.
  • the management module 411 is stored in the main storage device 401, and each functional element of the administrator terminal 103 is realized by executing these programs and applications by the processor.
  • the management module 411 manages the company similarity calculation system 1.
  • the administrator terminal data 421 of the auxiliary storage device 402 stores information for managing the company similarity calculation system 1.
  • FIG. 5 is an example of the reference company word information 500.
  • the reference company word information 500 stores the word information extracted from the information about the first company.
  • the reference company word information 500 has information such as a case ID 501, a company ID 502, a reference company name 503, a business word 504, an industry word 505, and a business type word 506.
  • The case ID 501 is a unique ID generated when the company similarity calculation server receives, from a user terminal, a request to calculate the similarity between the first company (the reference company) and at least one second company. Later cases are given larger case ID values than earlier cases in the time series.
  • The company ID 502 is a unique ID generated for each company. In other words, one company has one company ID.
  • The reference company name 503 is the name of the reference company (first company).
  • the business word 504 is information on a word related to the business of the first company, which is extracted from the information about the business of the first company.
  • the industry word 505 is the information of the word about the industry of the first company extracted from the information about the first company.
  • The business type word 506 is information on words related to the business type of the first company, extracted from the information about the first company.
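  • The fields of the reference company word information 500 can be pictured as a simple record type. The sketch below is illustrative only: the "M<number>" case ID format is inferred from the example case ID M1 used later in the description, and the company ID format and the `(word, count)` layout are assumptions.

```python
from dataclasses import dataclass, field
from itertools import count

_case_numbers = count(1)  # later requests receive larger numbers


def new_case_id() -> str:
    """Return the next case ID; the "M<number>" format is an assumption."""
    return f"M{next(_case_numbers)}"


@dataclass
class ReferenceCompanyWordInfo:
    """One row of the reference company word information 500."""

    case_id: str
    company_id: str
    reference_company_name: str
    # Each entry is (word, number of appearances), most frequent first.
    business_words: list = field(default_factory=list)
    industry_words: list = field(default_factory=list)
    business_type_words: list = field(default_factory=list)
```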
  • FIG. 6 is an example of the reference company classification information 600.
  • The reference company classification information 600 stores information on the industry and business type to which the reference company (first company) belongs.
  • The reference company classification information 600 has information such as a case ID 601, a company ID 602, a reference company name 603, an industry 604, and a business type 605.
  • The industry 604 is information on the industry to which the reference company (first company) belongs.
  • The business type 605 is information on the business type to which the reference company (first company) belongs.
  • FIG. 7 is an example of the similarity information 700.
  • the similarity information 700 stores information on the degree of similarity between the reference company (first company) and the target company (second company).
  • the similarity information 700 has information such as a case ID 701, a company ID 702, a reference company name 703, and a similarity 704.
  • the similarity 704 is information on the similarity between the reference company (first company) and a plurality of target companies (second company).
  • FIG. 8 is an example of the target company basic information 800.
  • the target company basic information 800 stores company information about the target company (second company).
  • the target company basic information 800 has information such as company ID 801, target company name 802, company information 803, market capitalization 804, net income 805, and price-earnings ratio 806.
  • the target company name 802 is the name of the target company (second company).
  • The company information 803 is character string information about the target company (second company). It may instead be information that is effectively linked to such character string information, for example a company URL (Uniform Resource Locator).
  • FIG. 9 is an example of the target company word information 900.
  • the target company word information 900 stores word information extracted from information about the target company (second company).
  • the target company word information 900 has information such as a company ID 901, a target company name 902, a business word 903, an industry word 904, and a business type word 905.
  • The business word 903 is information on words related to the business of the second company, extracted from the information about the second company.
  • The industry word 904 is information on words related to the industry of the second company, extracted from the information about the second company.
  • The business type word 905 is information on words related to the business type of the second company, extracted from the information about the second company.
  • FIG. 10 is an example of the target company classification information 1000.
  • the target company classification information 1000 stores information on the industry and business type to which the target company (second company) belongs.
  • the target company classification information 1000 has information such as a company ID 1001, a target company name 1002, an industry 1003, and a business type 1004.
  • Industry 1003 is information on the industry to which the target company (second company) belongs.
  • the business type 1004 is information on the business type to which the target company (second company) belongs.
  • FIG. 11, FIG. 12, FIG. 13, FIG. 14, and FIG. 15 are examples of each table stored in the dictionary DB 209 of the auxiliary storage device 202 of the company similarity calculation server 101.
  • the business word dictionary information 1050 is also stored in the dictionary DB 209.
  • the business word dictionary information 1050 stores word information related to the business.
  • FIG. 11 is an example of the industry word classification dictionary information 1100.
  • the industry word classification dictionary information 1100 stores information on words and classifications related to the industry.
  • the industry word classification dictionary information 1100 has information such as upper industry 1101, lower industry 1102, and industry word 1103.
  • the industry word 1103 is information on words related to the industry.
  • The lower industry 1102 is information on the classification of the lower industry associated with the industry word 1103.
  • The upper industry 1101 is information on the classification of the upper industry associated with the lower industry 1102.
  • The upper industry 1101 is a broader concept than the lower industry 1102.
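  • The word-to-classification relationship described for the industry word classification dictionary information 1100 can be sketched as a lookup table. All dictionary rows below are hypothetical examples, and the names `IndustryClass`, `classify_industry_word`, and `lower_industries_of` are introduced only for illustration.

```python
from typing import NamedTuple, Optional


class IndustryClass(NamedTuple):
    upper_industry: str
    lower_industry: str


# Hypothetical dictionary rows: industry word -> classification.
INDUSTRY_WORD_DICTIONARY = {
    "housing": IndustryClass("Construction", "Residential construction"),
    "condominium": IndustryClass("Real estate", "Real estate sales"),
    "brokerage": IndustryClass("Real estate", "Real estate brokerage"),
}


def classify_industry_word(word: str) -> Optional[IndustryClass]:
    """Look up the lower industry for a word, and with it the upper industry."""
    return INDUSTRY_WORD_DICTIONARY.get(word)


def lower_industries_of(upper_industry: str) -> set:
    """Collect the lower industries grouped under one (broader) upper industry."""
    return {
        entry.lower_industry
        for entry in INDUSTRY_WORD_DICTIONARY.values()
        if entry.upper_industry == upper_industry
    }
```

  • The business type word classification dictionary information 1200 has the same shape, with business types in place of industries.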
  • FIG. 12 is an example of the business type word classification dictionary information 1200.
  • the business type word classification dictionary information 1200 stores information on words and classifications related to the business type.
  • the business type word classification dictionary information 1200 has information such as a high-level business type 1201, a low-level business type 1202, and a business type word 1203.
  • The business type word 1203 is information on words related to the business type.
  • The lower business type 1202 is information on the classification of the lower business type associated with the business type word 1203.
  • The upper business type 1201 is information on the classification of the upper business type associated with the lower business type 1202.
  • The upper business type 1201 is a broader concept than the lower business type 1202.
  • The complete set of business words stored in the business word dictionary information 1050, the complete set of industry words stored in the industry word classification dictionary information 1100, and the complete set of business type words stored in the business type word classification dictionary information 1200 are arranged so that they do not match exactly (they differ from one another). However, some specific business words stored in the business word dictionary information 1050 may be the same as some specific industry words stored in the industry word classification dictionary information 1100 or some specific business type words stored in the business type word classification dictionary information 1200.
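  • One way to read this constraint is that the three dictionaries must differ as whole sets while individual words may overlap. The following sketch checks exactly that reading; the function name and the return layout are assumptions made for illustration.

```python
from itertools import combinations


def check_dictionaries(business_words, industry_words, business_type_words):
    """Return (all_distinct, shared) for the three word dictionaries.

    all_distinct is True when no two dictionaries contain exactly the same
    set of words; shared maps each pair of names to the words they share.
    """
    dictionaries = {
        "business": set(business_words),
        "industry": set(industry_words),
        "business_type": set(business_type_words),
    }
    all_distinct = True
    shared = {}
    for (name_a, words_a), (name_b, words_b) in combinations(dictionaries.items(), 2):
        if words_a == words_b:
            all_distinct = False
        shared[(name_a, name_b)] = words_a & words_b
    return all_distinct, shared
```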
  • FIG. 13 is an example of the industry similarity matrix information 1300.
  • The industry similarity matrix information 1300 stores the similarity between one combination of upper industry and lower industry and another combination of upper industry and lower industry.
  • The industry similarity matrix information 1300 has information such as upper industry 1301, lower industry 1302, and similarity 1303.
  • The elements of the columns (upper industry and lower industry) correspond to the elements of the rows (upper industry and lower industry).
  • The similarity 1303 stores, at the intersection of a column and a row, the similarity between the upper and lower industry of that column and the upper and lower industry of that row.
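  • A matrix of this kind can be sketched as a mapping keyed by pairs of (upper industry, lower industry) combinations. The sketch assumes the matrix is symmetric, that an identical row and column have similarity 1.0, and that missing cells default to 0.0; none of these choices is stated in this part of the description, and the industry names in the usage below are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Category = Tuple[str, str]  # (upper industry, lower industry)


class IndustrySimilarityMatrix:
    """Symmetric lookup of the similarity between two industry categories."""

    def __init__(self) -> None:
        self._cells: Dict[FrozenSet[Category], float] = {}

    def set(self, row: Category, column: Category, similarity: float) -> None:
        # A frozenset key makes (row, column) and (column, row) the same cell.
        self._cells[frozenset((row, column))] = similarity

    def get(self, row: Category, column: Category) -> float:
        if row == column:
            return 1.0  # assumption: identical categories are fully similar
        return self._cells.get(frozenset((row, column)), 0.0)


matrix = IndustrySimilarityMatrix()
sales = ("Real estate", "Real estate sales")
brokerage = ("Real estate", "Real estate brokerage")
matrix.set(sales, brokerage, 0.6)
```

  • The same structure would serve for the business type similarity matrix information 1400.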
  • FIG. 14 is an example of the business type similarity matrix information 1400.
  • The business type similarity matrix information 1400 stores the similarity between one combination of upper business type and lower business type and another combination of upper business type and lower business type.
  • The business type similarity matrix information 1400 has information such as upper business type 1401, lower business type 1402, and similarity 1403.
  • The elements of the columns (upper business type and lower business type) correspond to the elements of the rows (upper business type and lower business type).
  • The similarity 1403 stores, at the intersection of a column and a row, the similarity between the upper and lower business type of that column and the upper and lower business type of that row.
  • FIG. 15 is an example of the industry and business type similarity setting information 1500.
  • The industry and business type similarity setting information 1500 stores information on the rules for setting the similarity 1303 of the industry similarity matrix information 1300 and the similarity 1403 of the business type similarity matrix information 1400. Details will be described later.
  • FIG. 16 is an example of the similarity calculation flow 1600 carried out by the similarity calculation module 211.
  • the similarity calculation flow 1600 is a flow for calculating the similarity between the first company and the second company and outputting the calculated similarity.
  • The similarity calculation module 211 acquires information about the first company (hereinafter sometimes referred to as the first information) and information about the second company (hereinafter sometimes referred to as the second information) (step 1601). This step is described below using an example screen. The example screens in this embodiment are displayed on the output device 305 of the user terminal 102.
  • FIG. 26 is an example of the screen 2600 for starting the extraction of similar companies similar to the first company.
  • The screen relates to displaying candidates for the matching destination company. Because no matching destination company is registered for A Co., Ltd., screen 2600 is the screen for starting the extraction of companies similar to the first company in order to identify such candidates.
  • URL2601 https://www.a (7), which is information (first information) about A Co., Ltd., which is the first company, is displayed.
  • The information about A Co., Ltd. (first information) is information stored in advance as information about A Co., Ltd. That is, the information about A Co., Ltd. (including its URL information) is already stored in the target company basic information 800. Therefore, in the example shown in FIG. 26, the URL 2601, which is the first information, is the company information (not shown) of A Co., Ltd. stored in the company information 803 of the target company basic information 800.
  • the first information may be received from the user terminal 102.
  • In the present embodiment, the similarity calculation module 211 acquires the first information and the second information when "automatically extract from the following URL" 2602 is selected (step 1601). That is, when 2602 is selected, the similarity calculation module 211 acquires the first information, and it also acquires, as the second information, the company information of all companies stored in the company information 803 of the target company basic information 800, excluding the entry whose company ID is the same as that of the first company.
  • This means that, for one first company, the similarity calculation module 211 calculates the similarity of every company stored in the target company basic information 800 (excluding the first company itself). As another embodiment, the similarity calculation module 211 may acquire, as the second information, the company information stored in the company information 803 for specific companies in the target company basic information 800, or company information received from the user terminal 102.
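  • The acquisition in step 1601 amounts to picking the first company's entry and treating every other stored entry as second information. A minimal sketch, assuming the company information 803 is available as a mapping from company ID to company information; the IDs and URLs in the usage are hypothetical placeholders.

```python
def acquire_first_information(company_information, first_company_id):
    """Return the stored company information of the first company."""
    return company_information[first_company_id]


def acquire_second_information(company_information, first_company_id):
    """Return the information of every stored company except the first company.

    company_information maps company ID -> company information (e.g. a URL).
    """
    return {
        company_id: info
        for company_id, info in company_information.items()
        if company_id != first_company_id
    }


stored = {
    "C001": "https://example.com/a",
    "C002": "https://example.com/b",
    "C003": "https://example.com/c",
}
second = acquire_second_information(stored, "C001")
```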
  • the similarity calculation module 211 extracts words related to the business from the first information and the second information (step 1602). Details will be described later.
  • the similarity calculation module 211 calculates the business similarity based on the words related to the business conducted by the first company and the words related to the business conducted by the second company (step 1603). Details will be described later.
  • the similarity calculation module 211 extracts words related to the industry from the first information and the second information (step 1604). Details will be described later.
  • the similarity calculation module 211 calculates the industry similarity based on the similarity between the industry to which the first company belongs and the industry to which the second company belongs (step 1605). Details will be described later.
  • the similarity calculation module 211 extracts words related to the business format from the first information and the second information (step 1606). Details will be described later.
  • the similarity calculation module 211 calculates the business type similarity based on the similarity between the business type to which the first company belongs and the business type to which the second company belongs (step 1607). Details will be described later.
  • the similarity calculation module 211 calculates the company similarity based on the business similarity, the industry similarity, and the business type similarity (step 1608). Details will be described later.
  • the similarity calculation module 211 outputs the similarity of a plurality of second companies based on the order of the company similarity (step 1609). Details will be described later. As a result, the similarity calculation flow 1600 executed by the similarity calculation module 211 ends.
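  • Steps 1608 and 1609 can be sketched as combining three component similarities into one score and sorting the second companies by it. This part of the description does not state how the three values are combined, so the weighted mean with equal weights used below is purely an assumption, as are the function names and the sample values.

```python
def company_similarity(business, industry, business_type, weights=(1.0, 1.0, 1.0)):
    """Combine three similarities into one score (weighted mean; an assumption)."""
    w_business, w_industry, w_type = weights
    total = w_business + w_industry + w_type
    return (
        w_business * business + w_industry * industry + w_type * business_type
    ) / total


def rank_second_companies(component_scores):
    """Return [(company ID, company similarity)] sorted from most to least similar.

    component_scores maps company ID -> (business, industry, business type).
    """
    ranked = [
        (company_id, company_similarity(*parts))
        for company_id, parts in component_scores.items()
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


ranking = rank_second_companies(
    {"C002": (0.2, 0.3, 0.1), "C003": (0.9, 0.8, 0.7)}
)
```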
  • FIG. 17 is an example of the business word extraction flow 1700 implemented by the similarity calculation module 211.
  • The business word extraction flow 1700 is a flow for extracting words related to the business from the first information and the second information, and is the detailed flow of step 1602 of the similarity calculation flow 1600.
  • The similarity calculation module 211 extracts a word group from the first information, which is information about the first company, and from the second information, which is information about the second company (step 1701). Specifically, it uses morphological analysis to decompose the first information, which is character string information about the first company (in the present embodiment, the character string information at the link destination of the company URL), into words, the smallest units that carry meaning (hereinafter, the group of words obtained by decomposing the first information may be referred to as the first word group).
  • Likewise, the similarity calculation module 211 uses morphological analysis to decompose the second information, which is character string information about the second company (in the present embodiment, the character string information at the link destination of the company URL), into words, the smallest units that carry meaning (hereinafter, the group of words obtained by decomposing the second information may be referred to as the second word group).
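  • Step 1701 can be sketched as a function that turns character string information into a word group. The sketch below is a stand-in, not morphological analysis: it simply splits lower-cased English text on non-alphanumeric characters. Japanese text, which has no spaces between words, would need a real morphological analyzer in place of this function.

```python
import re


def extract_word_group(text: str) -> list:
    """Split text into lower-case words (simplified stand-in for morphological analysis)."""
    return re.findall(r"[a-z0-9]+", text.lower())


first_word_group = extract_word_group("House maintenance, and house repair.")
```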
  • the similarity calculation module 211 collates the first word group with the business word dictionary information 1050, and collates the second word group with the business word dictionary information 1050 (step 1702).
  • the business word dictionary information 1050 of the dictionary DB 209 stores information on words related to the business (hereinafter, may be referred to as business words).
  • The similarity calculation module 211 outputs the business words included in the first word group (hereinafter sometimes referred to as first business words) and the business words included in the second word group (hereinafter sometimes referred to as second business words) (step 1703).
  • the similarity calculation module 211 outputs the number of times the first business word appears in the first information and the number of times the second business word appears in the second information (step 1704).
  • steps 1703 and 1704 will be described with reference to FIG. An example in which the project ID shown in FIG. 5 is M1 (the standard company name is A Co., Ltd.) will be described.
  • the first word group extracted from the character string information (first information) about A Co., Ltd. contains the words "house” and “maintenance”, and the business word dictionary information 1050 includes "house” and "maintenance”. It is assumed that the business word of is included.
  • the similarity calculation module 211 assumes that "house” and "maintenance" common to the first word group and the business word dictionary information 1050 are the first business words in the first information. Is output (memorized) to the business word 504 of the reference company word information 500 (step 1703).
  • the similarity calculation module 211 also outputs (memorizes) the number of times "house” and "maintenance” appear in the character string information related to A Co., Ltd. in the business word 504 of the standard company word information 500 (step 1704). That is, the similarity calculation module 211 outputs the most frequently appearing "house” (the number of appearances is 7) among the character string information (first information) related to A Co., Ltd. to word 1 of the business word 504, and then outputs it to word 1. "Maintenance” (the number of appearances is 6), which has been frequently used, is output to word 2 of the business word 504. The similarity calculation module 211 also outputs other words that appear most frequently after "maintenance” after word 3 (not shown) of business word 504.
  • the similarity calculation module 211 does not output the word "new construction” to the business word 504 of the standard company word information 500 because the word "new construction" is not common between the first word group and the business word dictionary information 1050. (I don't remember).
  • the example described with reference to FIG. 5 is an example of the first company, but even in the case of the second company, the information of the second business word and the information of the number of appearances are the target companies. It is the same as the example described with reference to FIG.
  • the similarity calculation module 211 may store the information of the first business word and the number of appearances, and the information of the information second business word and the number of appearances in the same database. .. As a result, the business word extraction flow 1700 executed by the similarity calculation module 211 ends.
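  • As a rough illustration of steps 1703 and 1704, the following Python sketch keeps only the words of a word group that also appear in a business word dictionary and ranks them by number of appearances. The function name `extract_dictionary_words`, the use of a `set` as the dictionary, and the sample word list are assumptions made for illustration; the source does not specify how the word group or the business word dictionary information 1050 is represented.

```python
from collections import Counter


def extract_dictionary_words(words, dictionary):
    """Return (word, appearances) pairs for words found in the dictionary,
    ordered from most to least frequent."""
    counts = Counter(word for word in words if word in dictionary)
    return counts.most_common()


# Hypothetical word group for A Co., Ltd.; the counts follow the example in the text.
first_word_group = ["house"] * 7 + ["maintenance"] * 6 + ["new construction"] * 10
# Hypothetical stand-in for the business word dictionary information 1050.
business_word_dictionary = {"house", "maintenance"}

business_words = extract_dictionary_words(first_word_group, business_word_dictionary)
print(business_words)
```

  • In this sketch the first pair corresponds to word 1 of the business word 504 and the second pair to word 2; "new construction" is dropped because it is not in the dictionary.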
  • FIG. 18 is an example of the business word similarity calculation flow 1800 carried out by the similarity calculation module 211.
  • The business word similarity calculation flow 1800 calculates the business similarity based on words related to the business conducted by the first company (first business words) and words related to the business conducted by the second company (second business words), and is a detailed flow of step 1603.
  • The similarity calculation module 211 acquires all the first business words stored in the business word 504 in FIG. 5 (hereinafter sometimes referred to as the first business word group) and all the second business words stored in the business word 903 (hereinafter sometimes referred to as the second business word group) (step 1801). The similarity calculation module 211 then vectorizes the first business word group and the second business word group (step 1802).
  • For example, the similarity calculation module 211 can vectorize the first business word group and the second business word group by using tf-idf. In this case, the similarity calculation module 211 also acquires the number of appearances of each business word included in the first business word group. As another example, the similarity calculation module 211 may use a technique for vectorizing character string information, such as Bag of Words, LSA (Latent Semantic Analysis), word2vec, or Doc2Vec, to vectorize the character string information about the first company (first information) and the character string information about the second company (second information).
  • The similarity calculation module 211 calculates the similarity between the vector of the first business word group and the vector of the second business word group (step 1803), for example as their cosine similarity. The similarity calculation module 211 outputs the calculated similarity as the business similarity (step 1804).
  • Specific examples of steps 1803 and 1804 will be described with reference to FIGS. 5, 7, and 9.
  • This example calculates the business similarity between A Co., Ltd. (reference company name 503), which is the first company shown in FIG. 5, and Z Co., Ltd. (target company name 902), which is one of the second companies shown in FIG. 9, and outputs it to the similarity 704 in FIG. 7.
  • The similarity calculation module 211 calculates the cosine similarity between the vector obtained from the information stored in the business word 504 of A Co., Ltd. in FIG. 5 and the vector obtained from the information stored in the business word 903 of Z Co., Ltd. in FIG. 9. In this case, the similarity calculation module 211 obtains, for example, a cosine similarity of 0.960.
  • The similarity calculation module 211 outputs (stores) the company ID of Z Co., Ltd. and the business similarity (cosine similarity) associated with that company ID to the similarity 704 of the similarity information 700.
  • The similarity calculation module 211 also outputs the business similarity between A Co., Ltd., the first company, and each of the other second companies to the similarity 704; at this point, the business similarities need not be stored in order of magnitude. Although the details will be described later, the business similarity of Z Co., Ltd. (company ID: C0001) is stored in similarity 1 of the similarity 704. This completes the business word similarity calculation flow 1800 executed by the similarity calculation module 211.
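  • Steps 1802 and 1803 can be sketched in Python as follows. For simplicity the sketch vectorizes each business word group by raw number of appearances (a Bag of Words style vector) rather than by tf-idf weights, so it is a simplified stand-in and not the weighting described above. The function name, the word counts for the second company, and the handling of an empty word group are assumptions; the result is therefore not expected to reproduce the value 0.960 given in the text.

```python
import math


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two word -> appearance-count mappings."""
    vocabulary = sorted(set(counts_a) | set(counts_b))
    vec_a = [counts_a.get(word, 0) for word in vocabulary]
    vec_b = [counts_b.get(word, 0) for word in vocabulary]
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a company with no business words has no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


# Counts for A Co., Ltd. follow the example in the text.
first_business_words = {"house": 7, "maintenance": 6}
# Hypothetical counts for a second company, invented for illustration.
second_business_words = {"house": 5, "maintenance": 3, "renovation": 2}

business_similarity = cosine_similarity(first_business_words, second_business_words)
print(round(business_similarity, 3))
```

  • Cosine similarity depends only on the direction of the two vectors, so companies whose business words appear in similar proportions score close to 1 regardless of how much text was collected for each.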
  • FIG. 19 is an example of the industry word extraction flow 1900 carried out by the similarity calculation module 211.
  • The industry word extraction flow 1900 extracts words related to the industry from the first information and the second information, and is a detailed flow of step 1604.
  • The similarity calculation module 211 extracts a word group from the first information, which is information about the first company, and from the second information, which is information about the second company (step 1901).
  • This step is the same as step 1701, and the word groups extracted by the similarity calculation module 211 in step 1701 can be reused.
  • The similarity calculation module 211 collates the first word group with the industry word classification dictionary information 1100, and collates the second word group with the industry word classification dictionary information 1100 (step 1902).
  • The industry word classification dictionary information 1100 of the dictionary DB 209 stores information on words related to the industry (hereinafter sometimes referred to as industry words).
  • The similarity calculation module 211 outputs the industry words included in the first word group (hereinafter sometimes referred to as first industry words) and the industry words included in the second word group (hereinafter sometimes referred to as second industry words) (step 1903).
  • The similarity calculation module 211 also outputs the number of times each first industry word appears in the first information and the number of times each second industry word appears in the second information (step 1904).
  • Specific examples of steps 1902 to 1904 will be described with reference to FIGS. 5 and 11.
  • The first word group extracted from the character string information (first information) about A Co., Ltd. includes the words "new construction" and "house".
  • The similarity calculation module 211 searches whether the words "new construction" and "house" included in the first word group are stored in the industry word 1103 of the industry word classification dictionary information 1100 (step 1902).
  • The words "new construction" 1111 and "house" 1112 are stored in the industry word 1103 of the industry word classification dictionary information 1100. Because the words "new construction" and "house" included in the first word group match "new construction" 1111 and "house" 1112 in the industry word 1103, the similarity calculation module 211 outputs (stores) them to the industry word 505 of the reference company word information 500 as first industry words (step 1903).
  • The similarity calculation module 211 also outputs (stores) the number of times "new construction" and "house" appear in the character string information about A Co., Ltd. to the industry word 505 of the reference company word information 500 (step 1904). That is, the similarity calculation module 211 outputs "new construction", which appears most frequently in the character string information (first information) about A Co., Ltd. (10 appearances), to word 1 of the industry word 505, and outputs "house", which appears next most frequently (7 appearances), to word 2 of the industry word 505. Words that appear less frequently than "house" are output to word 3 onward (not shown) of the industry word 505.
  • Assume that the first word group extracted from the character string information about A Co., Ltd. includes the word "construction case", and that the industry word 1103 of the industry word classification dictionary information 1100 does not include the industry word "construction case".
  • Because the word "construction case" is not common to the first word group and the industry word 1103 of the industry word classification dictionary information 1100, the similarity calculation module 211 does not output (store) it to the industry word 505 of the reference company word information 500.
  • The example described with reference to FIG. 5 concerns the first company, but the same applies to the second company: the second industry words and their numbers of appearances are output to the industry word 904 of the target company word information 900 in the same manner as described with reference to FIG. 5.
  • The similarity calculation module 211 may store the first industry words and their numbers of appearances and the second industry words and their numbers of appearances in the same database. This completes the industry word extraction flow 1900 executed by the similarity calculation module 211.
  • FIG. 20 is an example of the industry similarity calculation flow 2000 implemented by the similarity calculation module 211.
  • The industry similarity calculation flow 2000 calculates the industry similarity based on the similarity between the industry to which the first company belongs and the industry to which the second company belongs, and is a detailed flow of step 1605.
  • The similarity calculation module 211 acquires the first industry words that appear at least a predetermined number of times from the industry word 505 of the reference company word information 500, and acquires the second industry words that appear at least a predetermined number of times from the industry word 904 of the target company word information 900 (step 2001).
  • A specific example of step 2001 will be described with reference to FIGS. 5 and 9.
  • The industry words of A Co., Ltd. are "new construction" (10 appearances) and "house" (7 appearances).
  • The similarity calculation module 211 acquires the first industry words whose number of appearances is at least 80% (8 or more) of the number of appearances (10) of "new construction", the most frequent word. That is, the similarity calculation module 211 acquires only the industry word "new construction".
  • In this example the threshold is a predetermined ratio (80%) of the number of appearances of the most frequent word, but the predetermined ratio can be set arbitrarily.
  • Similarly, for Z Co., Ltd., the similarity calculation module 211 acquires the second industry words whose number of appearances is at least 80% (4 or more) of the number of appearances (5) of "spatial design", the most frequent word. That is, the similarity calculation module 211 acquires the industry words "spatial design" and "housing".
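  • The selection in step 2001 can be sketched in Python as follows: keep every word whose number of appearances is at least a given percentage of the count of the most frequent word. The function name `words_above_ratio` and the `percent` parameter are hypothetical, and the count of 4 for "housing" is an assumption, since the text states only that it meets the threshold of 4.

```python
def words_above_ratio(counts, percent=80):
    """Keep words appearing at least `percent`% as often as the most frequent word."""
    if not counts:
        return []
    max_count = max(counts.values())
    # Integer comparison avoids floating-point rounding at the boundary
    # (for example, 80% of 5 appearances is exactly 4).
    return [word for word, count in counts.items() if count * 100 >= percent * max_count]


# Counts for the first company follow the example in the text.
first_industry_words = words_above_ratio({"new construction": 10, "house": 7})
# The count for "housing" is assumed for illustration.
second_industry_words = words_above_ratio({"spatial design": 5, "housing": 4})

print(first_industry_words)
print(second_industry_words)
```

  • Using a ratio of the maximum count rather than a fixed count keeps the selection comparable between companies with very different amounts of collected text.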
  • The similarity calculation module 211 acquires, from the industry word classification dictionary information 1100, the industry classification associated with the first industry words acquired in step 2001 (hereinafter sometimes referred to as the first industry) and the industry classification associated with the second industry words acquired in step 2001 (hereinafter sometimes referred to as the second industry) (step 2002).
  • A specific example of step 2002 for the first company will be described with reference to FIGS. 5 and 11.
  • The similarity calculation module 211 acquires the first industry corresponding to the first industry word "new construction" acquired in step 2001.
  • Specifically, the similarity calculation module 211 acquires "architecture" 1113 stored in the lower industry 1102 and "construction" 1114 stored in the upper industry 1101, which correspond to "new construction" 1111 in the industry word classification dictionary information 1100 in FIG. 11.
  • The similarity calculation module 211 stores the acquired first industry in industry 1 of the industry 604 of the reference company classification information 600 shown in FIG. 6; when there are a plurality of first industries, they are stored in industry 2 onward.
  • Next, a specific example of step 2002 for the second company will be described with reference to FIGS. 9 and 11.
  • The similarity calculation module 211 acquires the second industry corresponding to the second industry words "spatial design" and "housing" acquired in step 2001.
  • Specifically, the similarity calculation module 211 acquires "architecture" 1113 stored in the lower industry 1102 and "construction" 1114 stored in the upper industry 1101, which correspond to "spatial design" 1115 and "house" 1112 in the industry word classification dictionary information 1100 in FIG. 11.
  • The similarity calculation module 211 stores the acquired second industry in industry 1 of the industry 1003 of the target company classification information 1000 shown in FIG. 10; when there are a plurality of second industries, they are stored in industry 2 onward.
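  • The lookup in step 2002 can be pictured as a mapping from industry words to (upper industry, lower industry) pairs. The Python sketch below is only an illustration: the dictionary excerpt contains just the three entries mentioned in the text, and the function name `classify` and the tuple layout are assumptions rather than the actual structure of the industry word classification dictionary information 1100.

```python
# Hypothetical excerpt of the industry word classification dictionary information 1100:
# industry word -> (upper industry, lower industry).
INDUSTRY_WORD_DICTIONARY = {
    "new construction": ("construction", "architecture"),
    "house": ("construction", "architecture"),
    "spatial design": ("construction", "architecture"),
}


def classify(industry_words, dictionary):
    """Return the distinct (upper, lower) industries for the given words,
    in order of first appearance (industry 1, industry 2, ...)."""
    industries = []
    for word in industry_words:
        entry = dictionary.get(word)
        if entry is not None and entry not in industries:
            industries.append(entry)
    return industries


first_industries = classify(["new construction"], INDUSTRY_WORD_DICTIONARY)
second_industries = classify(["spatial design", "house"], INDUSTRY_WORD_DICTIONARY)
print(first_industries, second_industries)
```

  • Because both words of the second company map to the same classification, only one industry is stored for it in this sketch.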
  • The similarity calculation module 211 acquires the similarity between the first industry and the second industry acquired in step 2002 based on the industry similarity matrix information 1300 (step 2003).
  • A specific example of step 2003 will be described with reference to FIG. 13.
  • An example will be described of acquiring the similarity between the first industry of A Co., Ltd. (lower industry "architecture", upper industry "construction") and the second industry of Z Co., Ltd. (lower industry "architecture", upper industry "construction").
  • In the industry similarity matrix information 1300 in FIG. 13, the similarity calculation module 211 acquires the similarity (10) associated with the intersection 1315 of the column for A Co., Ltd. (upper industry "construction" 1312, lower industry "architecture" 1311) and the row for Z Co., Ltd. (upper industry "construction" 1314, lower industry "architecture" 1313).
  • The similarity calculation module 211 calculates the industry similarity between the first company and the second company based on the similarity associated with the intersection of the industry similarity matrix information 1300 (step 2004).
  • When the first company has only one lower industry and the second company has only one lower industry (that is, when there is a single intersection in the industry similarity matrix information 1300), the similarity associated with that intersection is the industry similarity between the first company and the second company. In the example of A Co., Ltd. as the first company and Z Co., Ltd. as the second company described for step 2003 above, the industry similarity is 10.
  • When at least one of the first industry of the first company and the second industry of the second company includes a plurality of lower industries, there are a plurality of intersections in the industry similarity matrix information 1300 (that is, a plurality of similarities). In that case, the industry similarity can be calculated by performing a predetermined calculation on the similarities associated with those intersections. Specific examples will be described with reference to FIGS. 6, 10 and 13.
  • An example will be described of acquiring the similarity between the first industry of H Co., Ltd. stored in the reference company classification information 600 in FIG. 6 and the second industry of V Co., Ltd. (company ID: C0005) stored in the target company classification information 1000 in FIG. 10.
  • The similarity calculation module 211 acquires, from the industry 604 of the reference company classification information 600, the first industries of H Co., Ltd.: lower industry "polymer" with upper industry "chemical / petrochemical / material", and lower industry "inorganic material" with upper industry "chemical / petrochemical / material".
  • The similarity calculation module 211 acquires, from the industry 1003 of the target company classification information 1000, the second industries of V Co., Ltd. (company ID: C0005): lower industry "polymer" with upper industry "chemical / petroleum / material", and lower industry "household goods" with upper industry "chemical / petroleum / material".
  • In the industry similarity matrix information 1300 of FIG. 13, the similarity calculation module 211 acquires the similarities associated with the four intersections 1320, 1321, 1322, and 1323 of the columns for the lower industries "polymer" 1316 and "inorganic material" 1317 of H Co., Ltd. and the rows for the lower industries "polymer" 1318 and "household goods" 1319 of V Co., Ltd. That is, the similarity calculation module 211 acquires the similarity "8" associated with the intersection 1320, the similarity "10" associated with the intersection 1321, the similarity "6" associated with the intersection 1322, and the similarity "8" associated with the intersection 1323.
  • The similarity calculation module 211 can calculate the industry similarity by, for example, calculating a similarity for each first lower industry and then taking the average of those per-lower-industry similarities.
  • For example, for each first lower industry of the first company belonging to a column, the similarity calculation module 211 calculates the similarity for that lower industry as the average of the maximum and the mean of the plurality of similarities associated with it. When the column industry and the row industry are exactly the same, the similarity takes the maximum value.
  • The similarity for each first lower industry can be calculated specifically as follows.
  • For the lower industry "polymer" 1316 of H Co., Ltd. belonging to the column, the similarity calculation module 211 obtains "9", the mean of the similarity "8" associated with the intersection 1320 and the similarity "10" associated with the intersection 1321, and "10", the maximum similarity (associated with the intersection 1321).
  • The similarity calculation module 211 then sets "9.5", the average of the obtained "9" and "10", as the similarity for the lower industry "polymer" 1316 of H Co., Ltd.
  • For the lower industry "inorganic material" 1317 of H Co., Ltd. belonging to the column, the similarity calculation module 211 obtains "7", the mean of the similarity "6" associated with the intersection 1322 and the similarity "8" associated with the intersection 1323, and "8", the maximum similarity (associated with the intersection 1323). The similarity calculation module 211 then sets "7.5", the average of the obtained "7" and "8", as the similarity for the lower industry "inorganic material" 1317 of H Co., Ltd.
  • Calculating the similarity for each first lower industry in this way gives extra weight to the maximum similarity among the similarities associated with the plurality of intersections.
  • The similarity calculation module 211 calculates "8.5", the average of the similarity "9.5" for the lower industry "polymer" 1316 and the similarity "7.5" for the lower industry "inorganic material" 1317 of H Co., Ltd., as the industry similarity.
  • The similarity calculation module 211 stores the calculated industry similarity in the similarity 704 of the similarity information 700 in FIG. 7 in the subsequent step 2005. As another example, the average of all the similarities associated with the plurality of intersections may be used as the industry similarity.
  • The similarity calculation module 211 outputs the calculated industry similarity (step 2005).
  • A specific example of step 2005 will be described with reference to FIG. 7, using A Co., Ltd. as the first company and Z Co., Ltd. as the second company.
  • The industry similarity of 10 is divided by 10, and the resulting value of 1.00 is output to the similarity 704 of the similarity information 700.
  • Alternatively, the similarity in its output form may be associated with the intersections of the industry similarity matrix information 1300.
  • This completes the industry similarity calculation flow 2000 executed by the similarity calculation module 211.
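  • The calculation in steps 2004 and 2005 for H Co., Ltd. and V Co., Ltd. can be checked with the short Python sketch below. For each first lower industry (sub-industry) it averages the maximum and the mean of the similarities at its intersections, then averages those results, and finally divides by 10 for output. The function names and the dictionary of per-industry similarity lists are assumptions made for illustration; the scale of 10 is taken from the A Co., Ltd. and Z Co., Ltd. example, where 10 becomes 1.00.

```python
def lower_industry_similarity(similarities):
    """Average of the maximum and the mean of the similarities
    associated with one first lower industry."""
    mean = sum(similarities) / len(similarities)
    return (max(similarities) + mean) / 2


def industry_similarity(similarities_by_lower_industry):
    """Average of the per-lower-industry similarities."""
    per_industry = [
        lower_industry_similarity(values)
        for values in similarities_by_lower_industry.values()
    ]
    return sum(per_industry) / len(per_industry)


def normalize(similarity, scale=10):
    """Divide by the matrix scale before output (10 -> 1.00 in the text)."""
    return similarity / scale


# Similarities at intersections 1320/1321 ("polymer") and 1322/1323 ("inorganic material").
h_versus_v = {"polymer": [8, 10], "inorganic material": [6, 8]}

print(lower_industry_similarity(h_versus_v["polymer"]))             # 9.5
print(lower_industry_similarity(h_versus_v["inorganic material"]))  # 7.5
print(industry_similarity(h_versus_v))                              # 8.5
print(normalize(10))                                                # 1.0
```

  • The alternative mentioned in the text, the plain average of all four similarities, would give (8 + 10 + 6 + 8) / 4 = 8.0 for the same data, so the max-weighted variant scores this pair slightly higher.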
  • FIG. 21 is an example of the business format word extraction flow 2100 implemented by the similarity calculation module 211.
  • The business format word extraction flow 2100 extracts words related to the business format from the first information and the second information, and is a detailed flow of step 1606.
  • The similarity calculation module 211 extracts a word group from the first information, which is information about the first company, and from the second information, which is information about the second company (step 2101).
  • This step is the same as steps 1701 and 1901, and the word groups extracted by the similarity calculation module 211 in step 1701 or 1901 can be reused.
  • The similarity calculation module 211 collates the first word group with the business format word classification dictionary information 1200, and collates the second word group with the business format word classification dictionary information 1200 (step 2102).
  • The business format word classification dictionary information 1200 of the dictionary DB 209 stores information on words related to the business format (hereinafter sometimes referred to as business format words).
  • The similarity calculation module 211 outputs the business format words included in the first word group (hereinafter sometimes referred to as first business format words) and the business format words included in the second word group (hereinafter sometimes referred to as second business format words) (step 2103). The similarity calculation module 211 also outputs the number of times each first business format word appears in the first information and the number of times each second business format word appears in the second information (step 2104).
  • Specific examples of steps 2102 to 2104 will be described with reference to FIGS. 5 and 12.
  • The first word group extracted from the character string information (first information) about A Co., Ltd. includes the words "construction example" and "maintenance".
  • The similarity calculation module 211 searches whether the words "construction example" and "maintenance" included in the first word group are stored in the business format word 1203 of the business format word classification dictionary information 1200 (step 2102).
  • The words "construction example" 1211 and "maintenance" 1212 are stored in the business format word 1203 of the business format word classification dictionary information 1200. Because the words "construction example" and "maintenance" included in the first word group match "construction example" 1211 and "maintenance" 1212 in the business format word 1203, the similarity calculation module 211 outputs (stores) them to the business format word 506 of the reference company word information 500 as first business format words (step 2103).
  • The similarity calculation module 211 also outputs (stores) the number of times "construction example" and "maintenance" appear in the character string information about A Co., Ltd. to the business format word 506 of the reference company word information 500 (step 2104). That is, the similarity calculation module 211 outputs "construction example", which appears most frequently in the character string information (first information) about A Co., Ltd. (7 appearances), to word 1 of the business format word 506, and outputs "maintenance", which appears next most frequently (6 appearances), to word 2 of the business format word 506. Words that appear less frequently than "maintenance" are output to word 3 onward (not shown) of the business format word 506.
  • Assume that the first word group extracted from the character string information about A Co., Ltd. includes the word "house", and that the business format word 1203 of the business format word classification dictionary information 1200 does not include the business format word "house".
  • Because the word "house" is not common to the first word group and the business format word 1203 of the business format word classification dictionary information 1200, the similarity calculation module 211 does not output (store) it to the business format word 506 of the reference company word information 500.
  • The example described with reference to FIG. 5 concerns the first company, but the same applies to the second company: the second business format words and their numbers of appearances are output to the business format word 905 of the target company word information 900 in the same manner.
  • The similarity calculation module 211 may store the first business format words and their numbers of appearances and the second business format words and their numbers of appearances in the same database. This completes the business format word extraction flow 2100 executed by the similarity calculation module 211.
  • FIG. 22 is an example of the business format similarity calculation flow 2200 implemented by the similarity calculation module 211.
  • The business format similarity calculation flow 2200 calculates the business format similarity based on the similarity between the business format to which the first company belongs and the business format to which the second company belongs, and is a detailed flow of step 1607.
  • The similarity calculation module 211 acquires the first business format words that appear at least a predetermined number of times from the business format word 506 of the reference company word information 500, and acquires the second business format words that appear at least a predetermined number of times from the business format word 905 of the target company word information 900 (step 2201).
  • A specific example of step 2201 will be described with reference to FIGS. 5 and 9.
  • The business format words of A Co., Ltd. are "construction example" (7 appearances) and "maintenance" (6 appearances).
  • The similarity calculation module 211 acquires the first business format words whose number of appearances is at least 80% (5.6 or more) of the number of appearances (7) of "construction example", the most frequent word. That is, the similarity calculation module 211 acquires the business format words "construction example" and "maintenance".
  • In this example the threshold is a predetermined ratio (80%) of the number of appearances of the most frequent word, but the predetermined ratio can be set arbitrarily.
  • The business format words of Z Co., Ltd. are "construction record" (8 appearances) and "maintenance" (4 appearances).
  • The similarity calculation module 211 acquires the second business format words whose number of appearances is at least 80% (6.4 or more) of the number of appearances (8) of "construction record", the most frequent word. That is, the similarity calculation module 211 acquires only the business format word "construction record".
  • The similarity calculation module 211 acquires, from the business format word classification dictionary information 1200, the business format classification associated with the first business format words acquired in step 2201 (hereinafter sometimes referred to as the first business format) and the business format classification associated with the second business format words acquired in step 2201 (hereinafter sometimes referred to as the second business format) (step 2202).
  • The similarity calculation module 211 acquires the first business formats corresponding to the first business format words "construction example" and "maintenance" acquired in step 2201. Specifically, the similarity calculation module 211 acquires "construction" 1213 stored in the lower business format 1202 and "manufacturing / processing" 1214 stored in the upper business format 1201, which correspond to "construction example" 1211 in the business format word classification dictionary information 1200 in FIG. 12. Further, the similarity calculation module 211 acquires the lower business format "maintenance / maintenance" and the upper business format "management", which correspond to "maintenance" 1212 in the business format word classification dictionary information 1200 in FIG. 12.
  • The similarity calculation module 211 stores the acquired first business formats in business format 1 and business format 2 of the business format 605 of the reference company classification information 600 shown in FIG. 6.
  • Next, a specific example of step 2202 for the second company will be described with reference to FIGS. 9 and 12.
  • The similarity calculation module 211 acquires the second business format corresponding to the second business format word "construction record" acquired in step 2201. Specifically, the similarity calculation module 211 acquires "construction" 1213 stored in the lower business format 1202 and "manufacturing / processing" 1214 stored in the upper business format 1201, which correspond to "construction record" 1217 in the business format word classification dictionary information 1200 in FIG. 12.
  • The similarity calculation module 211 stores the acquired second business format in business format 1 of the business format 1004 of the target company classification information 1000 shown in FIG. 10; when there are a plurality of second business formats, they are stored in business format 2 onward.
  • the similarity calculation module 211 acquires the similarity between the first business format and the second business format acquired in step 2202 based on the business format similarity matrix information 1400 (step 2203).
  • a specific example of step 2203 will be described with reference to FIG.
  • the first business format of A Co., Ltd. lower business format "construction” and higher business format “manufacturing / processing” and lower business format “maintenance / maintenance” and higher business format “management”
  • the second business format of Z Co., Ltd. lower business format
  • An example of acquiring the degree of similarity with "construction” and the higher-level business format "manufacturing / processing" will be described.
  • the similarity calculation module 211 is used in the business format similarity matrix information 1400 shown in FIG. It is associated with two intersections 1417 and 1418 of "management" 1414 and lower format “maintenance / maintenance” 1413, and upper format “manufacturing / processing” 1416 and lower format “construction” 1415 of Z Co., Ltd. belonging to the bank. Get the similarity. That is, the similarity calculation module 211 acquires the similarity "10" associated with the intersection 1417 and the similarity "7" associated with the intersection 1418, respectively.
  • the similarity calculation module 211 calculates the business type similarity between the first company and the second company based on the similarity associated with the intersection of the business type similarity matrix information 1400 (step 2204). If the first company has only one sub-business format and the second company has only one sub-business format (that is, when there is only one intersection in the business type similarity matrix information 1400), the similarity calculation module 211 can use the similarity associated with that intersection as the business type similarity between the first company and the second company.
  • when there are a plurality of intersections, the similarity calculation module 211 calculates the business type similarity using the similarities associated with the plurality of intersections.
  • for example, the similarity calculation module 211 calculates a similarity for each of the plurality of first sub-business categories and uses the average of the similarities over all the first sub-business categories as the business type similarity.
  • for example, among the plurality of similarities associated with one first sub-business category of the first company belonging to the column, the similarity calculation module 211 takes the average of the maximum similarity and the mean similarity as the similarity for that first sub-business category. When the business type of the column and the business type of the row are completely the same, the business type similarity is the maximum value.
  • the similarity calculation module 211 acquires the similarity "10" 1417 for the subordinate business format "construction" 1411 of A Co., Ltd. and the similarity "7" 1418 for "maintenance / maintenance" 1413 of A Co., Ltd.
  • the similarity calculation module 211 calculates the average of the similarity "10" 1417 for the subordinate business format "construction" 1411 of A Co., Ltd. and the similarity "7" 1418 for "maintenance / maintenance" 1413 of A Co., Ltd., obtaining "8.5" as the business type similarity.
  • the similarity calculation module 211 outputs the calculated business type similarity (step 2205).
  • a specific example of step 2205 will be described with reference to FIG. An example of A Co., Ltd. as a first company and Z Co., Ltd. as a second company will be described.
  • to align the scale with the business similarity score, the similarity calculation module 211 divides the business type similarity 8.5 calculated in step 2204 described above by 10 and outputs (stores) the resulting value, 0.85, to the similarity information 700.
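  • steps 2204 and 2205 can be sketched as follows, using the values from the example of A Co., Ltd. and Z Co., Ltd. The function names and the shape of the `intersections` input are assumptions made for illustration; the per-category rule (average of the maximum and the mean) and the division by 10 follow the description above.

```python
from statistics import mean

# Similarities read from the matrix intersections, keyed by the first
# company's sub-business category (column). Each list holds the similarities
# against the second company's sub-business categories (rows).
intersections = {
    "construction": [10],               # intersection 1417
    "maintenance / maintenance": [7],   # intersection 1418
}


def category_similarity(similarities):
    """Average of the maximum and the mean similarity for one column category."""
    return (max(similarities) + mean(similarities)) / 2


def business_type_similarity(per_category):
    """Mean of the per-category similarities over all first sub-business categories."""
    return mean(category_similarity(values) for values in per_category.values())


raw_score = business_type_similarity(intersections)  # step 2204
scaled_score = raw_score / 10                        # step 2205: match the 0-1 scale
```

  • with a single intersection, `category_similarity` returns the intersection value itself, which matches the single-intersection case described for step 2204.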
  • the similarity after the output may be associated with the intersection of the business type similarity matrix information 1400.
  • the business format similarity calculation flow 2200 executed by the similarity calculation module 211 ends.
  • the similarity stored at the intersection of the lower industry and the upper industry in the column of the industry similarity matrix and the lower industry and the upper industry in the row is set based on a predetermined rule.
  • the predetermined rule will be described with reference to FIG.
  • cases are divided according to whether the upper classification of the columns (upper industry) and the upper classification of the rows (upper industry) in the industry similarity matrix are the same, of high similarity, of medium similarity, or of low similarity, and according to whether the subclassification of the columns (lower industry) and the subclassification of the rows (lower industry) are the same, of high similarity, of medium similarity, or of low similarity. The similarity corresponding to each case is stored.
  • "same" means that the industries are completely the same.
  • "high similarity" means that the industries are likely to be similar.
  • "medium similarity" means that the industries are the next most likely to be similar after "high similarity".
  • "low similarity" means that the industries are the next most likely to be similar after "medium similarity".
  • the industry type similarity setting information 1500 has information such as the similarity setting rule 1501.
  • the similarity setting rule 1501 stores the similarity corresponding to each of the above-mentioned cases in the bottom line of the industry type similarity setting information 1500.
  • based on the rules shown in the industry format similarity setting information 1500, the similarity calculation module 211 associates a similarity with each intersection of the lower industry and upper industry in a column (first company) of the industry similarity matrix and the lower industry and upper industry in a row (second company).
  • the similarity calculation module 211 can set the similarity in the following descending order.
  • when the lower industry in the column and the lower industry in the row are the same (the similarity stored at the intersection is 10).
  • when the lower industry in the column and the lower industry in the row have high similarity, and the upper industry in the column and the upper industry in the row are the same (the similarity stored at the intersection is 9).
  • when the lower industry in the column and the lower industry in the row have medium similarity, and the upper industry in the column and the upper industry in the row are the same (the similarity stored at the intersection is 8).
  • when the lower industry in the column and the lower industry in the row have high similarity, and the upper industry in the column and the upper industry in the row have high similarity (the similarity stored at the intersection is 7).
  • when the lower industry in the column and the lower industry in the row have medium similarity, and the upper industry in the column and the upper industry in the row have high similarity (the similarity stored at the intersection is 6).
  • when the lower industry in the column and the lower industry in the row have high similarity, and the upper industry in the column and the upper industry in the row have medium or low similarity (the similarity stored at the intersection is 5).
  • when the lower industry in the column and the lower industry in the row have medium similarity, and the upper industry in the column and the upper industry in the row have medium or low similarity (the similarity stored at the intersection is 4).
  • the similarity corresponding to the intersection of the lower and upper business categories in the column of the business category similarity matrix and the lower and upper business categories in the row is also set based on the predetermined rules as described above.
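  • the predetermined rule can be sketched as a lookup on the pair of relations (lower classification, upper classification). This is an illustrative sketch only: the function name and the relation labels are hypothetical, the source's "low and medium" is read here as "medium or low", and combinations the text does not list (for example, lower classifications of low similarity) return `None` rather than a guessed value.

```python
SAME, HIGH, MEDIUM, LOW = "same", "high", "medium", "low"

# (lower relation, upper relation) -> similarity stored at the intersection.
RULES = {
    (HIGH, SAME): 9,
    (MEDIUM, SAME): 8,
    (HIGH, HIGH): 7,
    (MEDIUM, HIGH): 6,
    (HIGH, MEDIUM): 5,
    (HIGH, LOW): 5,
    (MEDIUM, MEDIUM): 4,
    (MEDIUM, LOW): 4,
}


def intersection_similarity(lower_relation, upper_relation):
    """Return the similarity for one intersection, or None if the case is not listed."""
    if lower_relation == SAME:
        return 10  # identical lower classifications always score the maximum
    return RULES.get((lower_relation, upper_relation))
```

  • for instance, `intersection_similarity("high", "high")` gives 7, the value at intersection 1418 in the business type example.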
  • FIG. 23 is an example of the company similarity calculation flow 2300 implemented by the similarity calculation module 211.
  • the company similarity calculation flow 2300 is a flow for calculating the company similarity based on the business similarity, the industry similarity, and the business type similarity, and is a detailed flow of step 1608 in FIG.
  • the similarity calculation module 211 acquires information on business similarity, industry similarity, and business type similarity from the similarity information 700 (step 2301).
  • the similarity calculation module 211 calculates the company similarity by adding the business similarity, the industry similarity, and the business type similarity at a predetermined ratio (step 2302).
  • the similarity calculation module 211 outputs the calculated company similarity (step 2303).
  • steps 2301 to 2303 will be described with reference to FIG.
  • An example of A corporation as a first company and Z corporation as a second company (company ID of the target company: C0001) will be described.
  • the similarity calculation module 211 acquires, from the similarity 704 of the similarity information 700, the business similarity (0.960), the industry similarity (1.00), and the business type similarity (0.850) between the first company, A Co., Ltd., and the second company (company ID of the target company: C0001) (step 2301).
  • the similarity calculation module 211 adds the respective similarities at the following ratio.
  • business similarity : industry similarity : business type similarity = 3 : 5 : 2
  • in this ratio, the weight of the industry similarity is the highest and the weight of the business type similarity is the lowest. With these weights, the similarity calculation module 211 calculates a value of 0.958 as the company similarity (step 2302).
  • the predetermined ratio may be any ratio, and the importance of each similarity can be set by adjusting the ratio.
  • the similarity calculation module 211 outputs (stores) the calculated company similarity (0.958) to the similarity 704 of the similarity information 700 in association with Z Co., Ltd. (company ID of the target company: C0001) (step 2303). As a result, the company similarity calculation flow 2300 executed by the similarity calculation module 211 ends.
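  • step 2302 can be sketched as a weighted sum. The function name and the normalisation of the ratio by its total are assumptions; with the ratio 3 : 5 : 2 they give weights of 0.3, 0.5, and 0.2, which reproduce the value 0.958 in the example (0.960 × 0.3 + 1.00 × 0.5 + 0.850 × 0.2).

```python
def company_similarity(business, industry, business_type, ratio=(3, 5, 2)):
    """Weighted sum of the three similarities; weights are the ratio normalised to 1."""
    total = sum(ratio)
    weights = [part / total for part in ratio]
    scores = (business, industry, business_type)
    return sum(weight * score for weight, score in zip(weights, scores))


result = company_similarity(0.960, 1.00, 0.850)
```

  • changing `ratio` changes the relative importance of each similarity, as the description notes for the predetermined ratio.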
  • FIG. 24 is an example of the similarity output flow 2400 implemented by the similarity calculation module 211.
  • the similarity output flow 2400 is a flow for outputting the similarity of a plurality of second companies based on the order of the company similarity, and is a detailed flow of step 1609 in FIG.
  • the similarity calculation module 211 acquires the company similarity in the plurality of second companies (step 2401).
  • the similarity calculation module 211 determines the order of the plurality of second companies based on the acquired company similarity in the plurality of second companies (step 2402).
  • the similarity calculation module 211 outputs (stores) information on the order of the plurality of determined second companies (step 2403).
  • the similarity calculation module 211 acquires the company similarity between the first company A Co., Ltd. and the plurality of second companies from the similarity 704 of the similarity information 700 in FIG. 7 (step 2401).
  • the company similarity between all the second companies and A Co., Ltd. stored in the target company basic information 800 is acquired from the similarity 704 of the similarity information 700. Note that FIG. 7 shows only the similarity of the three companies.
  • the similarity calculation module 211 ranks the second companies based on the company similarity between all the second companies and A Co., Ltd., and for example, the top three can be determined as follows (step 2402).
  • the first place is the company with company ID C0001 (Z Co., Ltd.), with a company similarity of 0.958.
  • the second place is the company with company ID C0080, with a company similarity of 0.927.
  • the third place is the company with company ID C0087, with a company similarity of 0.810.
  • the similarity calculation module 211 outputs (stores) the information on each similarity of Z Co., Ltd. (company ID C0001), ranked first, to similarity 1 of the similarity 704 of the similarity information 700; the information on each similarity of the company with company ID C0080, ranked second, to similarity 2; and the information on each similarity of the company with company ID C0087, ranked third, to similarity 3 (step 2403).
  • the similarity calculation module 211 outputs (stores) the fourth and subsequent ranks to the similarity 4 and later of the similarity 704 of the similarity information 700. As a result, the similarity output flow 2400 executed by the similarity calculation module 211 ends.
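  • steps 2401 to 2403 can be sketched as a sort followed by assignment to numbered slots. The function names and the slot names `similarity_1`, `similarity_2`, and so on are assumptions that mirror similarity 1, similarity 2, ... of the similarity 704; the company IDs and values are those of the example.

```python
company_similarities = {"C0087": 0.810, "C0001": 0.958, "C0080": 0.927}


def rank_second_companies(similarities):
    """Return (company_id, company_similarity) pairs in descending order (step 2402)."""
    return sorted(similarities.items(), key=lambda item: item[1], reverse=True)


def store_ranking(ranked):
    """Assign the ranked companies to similarity_1, similarity_2, ... (step 2403)."""
    return {
        f"similarity_{rank}": {"company_id": company_id, "company_similarity": score}
        for rank, (company_id, score) in enumerate(ranked, start=1)
    }


ranked = rank_second_companies(company_similarities)
stored = store_ranking(ranked)
```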
  • FIG. 25 is an example of the similar company display flow 2500 implemented by the similar company display module 212.
  • the similar company display flow 2500 is a flow for displaying a second company similar to the first company.
  • the similar company display module 212 acquires the business similarity, the industry similarity, the business type similarity, and the company similarity of the second companies with the highest company similarity (step 2501).
  • the similar company display module 212 displays, on the user terminal 102, information about the second companies and a chart including axes of business similarity, industry similarity, and business type similarity, in the order of company similarity (step 2502).
  • the chart generated and displayed by the similar company display module 212 is not limited to the radar chart, and may be, for example, a chart in which column charts (bar graphs) of each degree of similarity are grouped for each company.
  • steps 2501 and 2502 will be described with reference to FIGS. 7, 8 and 27.
  • the example of A Co., Ltd. as the first company will be described.
  • the similar company display module 212 acquires the business similarity, the industry similarity, the business type similarity, and the company similarity of the second companies (Z Co., Ltd. with company ID C0001, the company with company ID C0080, and the company with company ID C0087) stored in similarity 1, similarity 2, and similarity 3 of the row of the first company, A Co., Ltd., in the similarity 704 of the similarity information 700 of FIG.
  • the similar company display module 212 also acquires the information on the acquired second companies that is stored in the target company basic information 800 of FIG. 8 (step 2501).
  • FIG. 27 is an example of a screen 2700 for displaying a similar company similar to the first company. More specifically, FIG. 27 shows three companies similar to A Co., Ltd. as the first company. Note that FIG. 27 is a screen displayed after "Automatically extract from the following URL" 2602 in FIG. 26 is selected.
  • the similar company display module 212 displays information on each company based on the order of the company similarity of the company with the company ID of C0001, the company with the company ID of C0080, and the company with the company ID of C0087. That is, the similar company display module 212 displays the information of Z Co., Ltd.
  • the similar company display module 212 displays the name, market capitalization, net income, and price-earnings ratio of each second company acquired from the target company basic information 800 of FIG. Further, as shown in FIG. 27, the similar company display module 212 displays a radar chart in which the business similarity acquired from the similarity information 700 is plotted on the business similarity axis, the industry similarity on the industry similarity axis, and the business type similarity on the business type similarity axis. As a result, the similar company display flow 2500 executed by the similar company display module 212 ends.
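  • as one way to picture the radar chart, the three similarities can be converted into polygon vertices on evenly spaced axes. This is only a geometric sketch under assumptions: the function name, the axis order, and the choice to start from the top and proceed clockwise are not taken from the source, and actual drawing is left to whatever charting component is used.

```python
import math

AXES = ("business similarity", "industry similarity", "business type similarity")


def radar_vertices(scores, axes=AXES):
    """Return one (x, y) vertex per axis for scores in the range 0 to 1."""
    count = len(axes)
    vertices = []
    for index, axis in enumerate(axes):
        # First axis points straight up; later axes follow clockwise.
        angle = math.pi / 2 - 2 * math.pi * index / count
        radius = scores[axis]
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices


vertices = radar_vertices({
    "business similarity": 0.960,
    "industry similarity": 1.00,
    "business type similarity": 0.850,
})
```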
  • in addition to or in place of the axis of business similarity, the axis of industry similarity, and the axis of business type similarity, the similar company display module 212 may generate and display a chart that includes an axis of similarity regarding the industry, an axis of similarity regarding the business form, or an axis of similarity regarding the business structure.
  • the similarity calculation module 211 can calculate the similarity regarding the industry, the similarity regarding the business form, or the similarity regarding the business structure by the same method as the method for calculating the industry similarity or the business type similarity.
  • the similarity calculation module 211 extracts, from the first information or the second information, words related to the industry, words related to the business form, or words related to the business structure, using industry word classification dictionary information that stores industry-related words and their classifications, business form word classification dictionary information that stores business-form-related words and their classifications, or business structure word classification dictionary information that stores business-structure-related words and their classifications.
  • the industry, business form, or business structure to which the first company or the second company belongs is determined by looking up the extracted word in the industry word classification dictionary information, the business form word classification dictionary information, or the business structure word classification dictionary information.
  • the similarity calculation module 211 calculates the similarity regarding the industry, the similarity regarding the business form, or the similarity regarding the business structure by using the industry similarity matrix, the business form similarity matrix, or the business structure similarity matrix.
  • the present invention is not limited to the above-mentioned examples, and includes various modifications.
  • the above-described embodiment has been described in detail in order to explain the present invention in an easy-to-understand manner, and is not necessarily limited to those having all the described configurations.
  • it is possible to replace a part of the configuration of one embodiment with the configuration of another embodiment and it is also possible to add the configuration of another embodiment to the configuration of one embodiment.
  • each of the above configurations, functions, processing units, processing means, etc. may be realized by hardware by designing a part or all of them by, for example, an integrated circuit. Further, each of the above configurations, functions, and the like may be realized by software by the processor interpreting and executing a program that realizes each function. Information such as programs, tables, and files that realize each function can be stored in a memory, a hard disk, a recording device such as an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, or a DVD.
  • control lines and information lines indicate those that are considered necessary for explanation, and do not necessarily indicate all the control lines and information lines in the product. In practice, it can be considered that almost all configurations are interconnected. It should be noted that the above-described embodiment discloses at least the configuration described in the claims.

Abstract

Provided are a company similarity calculation server and a company similarity calculation method which can accurately calculate the similarity between companies. The company similarity calculation server, which calculates company similarity between a first company that is a reference and a second company other than the first company, is characterized by including a similarity calculation means which: calculates business similarity on the basis of first information pertaining to the first company and second information pertaining to the second company and on the basis of a word pertaining to a business, which is extracted from the first information, and a word pertaining to a business, which is extracted from the second information; calculates business community similarity on the basis of a first business community pertaining to a business community to which the first company belongs and a second business community pertaining to a business community to which the second company belongs; and calculates business category similarity on the basis of a first business category pertaining to a business category to which the first company belongs and a second business category pertaining to a business category to which the second company belongs, wherein the company similarity is calculated on the basis of the business similarity, the business community similarity, and the business category similarity.

Description

Company similarity calculation server and company similarity calculation method
[Related application]
This application claims priority to Japanese Patent Application No. 2019-146489, entitled "Company Similarity Calculation Server and Company Similarity Calculation Method" and filed on August 8, 2019, the disclosure of which is incorporated herein by reference in its entirety.
The present invention relates to a company similarity calculation server and a company similarity calculation method.
As background art in this technical field, there is Japanese Patent Application Laid-Open No. 2012-118612 (Patent Document 1). This publication states that "a marketing proposal support method in a management server connected via a network to a user terminal and to a data server storing financial information of customer companies acquires a customer list from the user terminal, extracts at least one characteristic evaluation index from the customer list, searches the data server for at least one similar company based on the characteristic evaluation index, and narrows down the at least one retrieved company based on the degree of similarity."
Patent Document 1: Japanese Patent Application Laid-Open No. 2012-118612
However, in calculating the similarity between companies, company information has not been classified into information on the business, information on the industry, or information on the business type, so the similarity between companies may not be calculated accurately.
Therefore, the present invention provides a mechanism that can accurately calculate the similarity between companies based on the similarity of information on the business, the similarity of information on the industry, and the similarity of information on the business type of each company.
In order to solve the above problems, for example, the configuration described in the claims is adopted. The present application includes a plurality of means for solving the above problems. One example is
a company similarity calculation server that calculates the company similarity between a first company serving as a reference and a second company other than the first company, the server comprising:
a similarity calculation means for calculating the company similarity between the first company and the second company based on first information about the first company and second information about the second company; and
an output means for outputting the calculated company similarity,
wherein the similarity calculation means
calculates a business similarity based on words related to the business conducted by the first company and words related to the business conducted by the second company,
calculates an industry similarity based on a first industry, which is the industry to which the first company belongs, and a second industry, which is the industry to which the second company belongs,
calculates a business type similarity based on a first business type, which is the business type to which the first company belongs, and a second business type, which is the business type to which the second company belongs, and
calculates the company similarity based on the business similarity, the industry similarity, and the business type similarity.
According to the present invention, the similarity between companies can be calculated accurately based on the similarity of information on the business, the similarity of information on the industry, and the similarity of information on the business type of each company.
Problems, configurations, and effects other than those described above will be clarified by the following description of the embodiments.
An example of a configuration diagram of the entire company similarity calculation system 1.
An example of the hardware configuration of the company similarity calculation server 101.
An example of the hardware configuration of the user terminal 102.
An example of the hardware configuration of the administrator terminal 103.
An example of the standard company word information 500.
An example of the standard company classification information 600.
An example of the similarity information 700.
An example of the target company basic information 800.
An example of the target company word information 900.
An example of the target company classification information 1000.
An example of the industry word classification dictionary information 1100.
An example of the business type word classification dictionary information 1200.
An example of the industry similarity matrix information 1300.
An example of the business type similarity matrix information 1400.
An example of the industry format similarity setting information 1500.
An example of the similarity calculation flow 1600 implemented by the similarity calculation module 211.
An example of the business word extraction flow 1700 implemented by the similarity calculation module 211.
An example of the business word similarity calculation flow 1800 implemented by the similarity calculation module 211.
An example of the industry word extraction flow 1900 implemented by the similarity calculation module 211.
An example of the industry similarity calculation flow 2000 implemented by the similarity calculation module 211.
An example of the business type word extraction flow 2100 implemented by the similarity calculation module 211.
An example of the business type similarity calculation flow 2200 implemented by the similarity calculation module 211.
An example of the company similarity calculation flow 2300 implemented by the similarity calculation module 211.
An example of the similarity output flow 2400 implemented by the similarity calculation module 211.
An example of the similar company display flow 2500 implemented by the similar company display module 212.
An example of the screen 2600 for starting the extraction of similar companies similar to the first company.
An example of the screen 2700 for displaying similar companies similar to the first company.
Information on the degree of similarity between companies (hereinafter sometimes referred to as company similarity) is used in various situations. For example, when planning M&A (mergers and acquisitions) between companies, company similarity is used to extract candidates for the counterpart company (hereinafter sometimes referred to as the matching destination company). In this case, calculating the company similarity accurately makes more efficient M&A possible.
Conventionally, however, no consideration has been given to calculating the similarity between companies based on the similarity of information on the business, the similarity of information on the industry, and the similarity of information on the business type of each company, so the company similarity may not be calculated accurately.
Therefore, in order to solve this problem, the present embodiment adopts the system or method described below. As a result, the similarity between companies can be calculated accurately.
Hereinafter, embodiments will be described.
This embodiment describes an example of a company similarity calculation system 1 that calculates the similarity between companies in order to extract candidates for matching destination companies when planning M&A between companies.
FIG. 1 is an example of a configuration diagram of the entire company similarity calculation system 1.
The company similarity calculation system 1 includes a plurality of user terminals 102 and a plurality of administrator terminals 103, each of which is connected to the company similarity calculation server 101 via a network. The network may be wired or wireless, and each terminal can send and receive information via the network.
Each terminal of the company similarity calculation system 1 and the company similarity calculation server 101 may be, for example, a mobile terminal such as a smartphone, a tablet, a mobile phone, or a personal digital assistant (PDA), or a wearable terminal such as a glasses type, wristwatch type, or clothing type. It may also be a stationary or portable computer, or a server located in the cloud or on a network. In terms of function, it may be a VR (virtual reality) terminal, an AR terminal, or an MR (mixed reality) terminal. Alternatively, it may be a combination of a plurality of these terminals. For example, a combination of one smartphone and one wearable terminal can logically function as one terminal. An information processing terminal other than these may also be used.
Each terminal of the company similarity calculation system 1, and the company similarity calculation server 101, includes: a processor that executes an operating system, applications, programs, and the like; a main storage device such as RAM (Random Access Memory); an auxiliary storage device such as an IC card, hard disk drive, SSD (Solid State Drive), or flash memory; a communication control unit such as a network card, wireless communication module, or mobile communication module; an input device such as a touch panel, keyboard, mouse, voice input, or motion-detection input based on images captured by a camera unit; and an output device such as a monitor or display. The output device may instead be a device or terminal that transmits information for output to an external monitor, display, printer, or other equipment.
The main storage device stores various programs and applications (modules), and each functional element of the overall system is realized when the processor executes them. Each of these modules may instead be implemented in hardware, for example through integration. Each module may be an independent program or application, or may be implemented as a subprogram or function within a single integrated program or application.
In this specification, each module is described as the subject that performs processing, but in practice the processing is executed by the processor that runs the programs and applications (modules).
The auxiliary storage device stores various databases (DBs). A "database" is a functional element (storage unit) that stores a data set so that it can support arbitrary data operations (for example, extraction, addition, deletion, and overwriting) from the processor or an external computer. The implementation of the database is not limited: it may be, for example, a database management system, spreadsheet software, or a text file such as XML or JSON. When implemented with a database management system, it may be a relational database (RDBMS) or a non-relational database (non-RDBMS).
The user terminal 102 is a terminal used by a person who uses the company similarity information. Such users include not only those who use the company similarity information themselves but also those who provide it to other companies.
The administrator terminal 103 is a terminal used by, for example, an administrator of the company similarity calculation system 1.
The company similarity calculation server 101 accepts, from the terminals described above and other sources, the input of various information needed to make a determination, and stores it in the auxiliary storage device 202.
FIG. 2 is an example of the hardware configuration of the company similarity calculation server 101.
The company similarity calculation server 101 is configured as, for example, a server located in the cloud.
The main storage device 201 stores the programs and applications of the similarity calculation module 211, the similar company display module 212, and the management module 213, and each functional element of the company similarity calculation server 101 is realized when the processor 203 executes them.
The similarity calculation module 211 calculates the similarity between companies based on information related to the companies. As described in detail later, it calculates, for example, the similarity between a reference company (hereinafter sometimes called the first company) and a company whose similarity to the first company is to be calculated (hereinafter sometimes called the second company).
The similar company display module 212 displays information on mutually similar companies on the user terminal 102 and the administrator terminal 103. As described in detail later, it displays, for example, information on a plurality of second companies similar to the first company, together with the similarity between the first company and each of those second companies.
The management module 213 manages the company similarity calculation system 1. Specifically, it manages, among other things, the operating information of the company similarity calculation server and information on the users of the company similarity calculation system 1.
The auxiliary storage device 202 includes a survey history database 207 (hereinafter sometimes called the survey history DB), a target company database 208 (hereinafter sometimes called the target company DB), and a dictionary database 209 (hereinafter sometimes called the dictionary DB). In this embodiment, each database holds at least one item of information.
The survey history DB 207 includes reference company word information 500, reference company classification information 600, and similarity information 700.
The target company DB 208 includes target company basic information 800, target company word information 900, and target company classification information 1000.
The dictionary DB 209 includes business word dictionary information 1050, industry word classification dictionary information 1100, business type word classification dictionary information 1200, industry similarity matrix information 1300, business type similarity matrix information 1400, and industry and business type similarity setting information 1500.
FIG. 3 is an example of the hardware configuration of the user terminal 102.
The user terminal 102 is configured as, for example, a stationary computer.
The main storage device 301 stores the similarity company display module 311, and each functional element of the user terminal 102 is realized when the processor 303 executes its programs and applications.
The user terminal data 321 in the auxiliary storage device 302 stores information related to the user.
FIG. 4 is an example of the hardware configuration of the administrator terminal 103.
The administrator terminal 103 is configured as, for example, a stationary computer.
The main storage device 401 stores the management module 411, and each functional element of the administrator terminal 103 is realized when the processor executes its programs and applications.
The management module 411 manages the company similarity calculation system 1.
The administrator terminal data 421 in the auxiliary storage device 402 stores information for managing the company similarity calculation system 1.
FIGS. 5, 6, and 7 are examples of the tables stored in the survey history DB 207 in the auxiliary storage device 202 of the company similarity calculation server 101.
FIG. 5 is an example of the reference company word information 500.
The reference company word information 500 stores words extracted from the information about the first company.
The reference company word information 500 has information such as a case ID 501, a company ID 502, a reference company name 503, a business word 504, an industry word 505, and a business type word 506.
The case ID 501 is a unique ID generated when the company similarity calculation server receives, from a user terminal, a request to calculate the similarity between the first company, as the reference company, and at least one second company. Case IDs are assigned in chronological order, so that a later case receives a larger numeric value than an earlier one.
The company ID 502 is a unique ID generated for each company. In other words, one company has one company ID.
The reference company name 503 is the name of the reference company (first company).
The business word 504 is a word relating to the business of the first company, extracted from the information about the first company.
The industry word 505 is a word relating to the industry of the first company, extracted from the information about the first company.
The business type word 506 is a word relating to the business type of the first company, extracted from the information about the first company.
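As a minimal sketch of the record described above, one row of the reference company word information 500 and a case ID issuer whose numeric part increases over time could look as follows. The field names, the `CaseIdIssuer` class, the company ID "C1", and the "M" + number format (modeled on the "M1" example of FIG. 5) are assumptions for illustration, not details taken from the specification.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ReferenceCompanyWords:
    """One row of the reference company word information 500 (illustrative field names)."""
    case_id: str                      # case ID 501
    company_id: str                   # company ID 502
    reference_company_name: str       # reference company name 503
    business_words: list[str] = field(default_factory=list)       # business word 504
    industry_words: list[str] = field(default_factory=list)       # industry word 505
    business_type_words: list[str] = field(default_factory=list)  # business type word 506


class CaseIdIssuer:
    """Issues case IDs so that a later request gets a larger number (hypothetical helper)."""

    def __init__(self, start: int = 1) -> None:
        self._counter = count(start)

    def next_id(self) -> str:
        # Assumed format: "M" followed by an increasing integer.
        return f"M{next(self._counter)}"


issuer = CaseIdIssuer()
row = ReferenceCompanyWords(issuer.next_id(), "C1", "A Co., Ltd.", ["house", "maintenance"])
later = issuer.next_id()  # a subsequent request receives a larger number
```

A counter is only one way to satisfy the ordering requirement; a timestamp-derived value would serve equally well.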
FIG. 6 is an example of the reference company classification information 600.
The reference company classification information 600 stores the industry and business type to which the reference company (first company) belongs.
The reference company classification information 600 has information such as a case ID 601, a company ID 602, a reference company name 603, an industry 604, and a business type 605.
The industry 604 is the industry to which the reference company (first company) belongs.
The business type 605 is the business type to which the reference company (first company) belongs.
FIG. 7 is an example of the similarity information 700.
The similarity information 700 stores the similarity between the reference company (first company) and the target companies (second companies).
The similarity information 700 has information such as a case ID 701, a company ID 702, a reference company name 703, and a similarity 704.
The similarity 704 is the similarity between the reference company (first company) and each of a plurality of target companies (second companies).
FIGS. 8, 9, and 10 are examples of the tables stored in the target company DB 208 in the auxiliary storage device 202 of the company similarity calculation server 101.
FIG. 8 is an example of the target company basic information 800.
The target company basic information 800 stores company information about the target company (second company).
The target company basic information 800 has information such as a company ID 801, a target company name 802, company information 803, a market capitalization 804, a net income for the current period 805, and a price-earnings ratio 806.
The target company name 802 is the name of the target company (second company).
The company information 803 is character string information about the target company (second company). It may be any information that is in substance linked to such character string information, for example a company URL (Uniform Resource Locator).
FIG. 9 is an example of the target company word information 900.
The target company word information 900 stores words extracted from the information about the target company (second company).
The target company word information 900 has information such as a company ID 901, a target company name 902, a business word 903, an industry word 904, and a business type word 905.
The business word 903 is a word relating to the business of the second company, extracted from the information about the second company.
The industry word 904 is a word relating to the industry of the second company, extracted from the information about the second company.
The business type word 905 is a word relating to the business type of the second company, extracted from the information about the second company.
FIG. 10 is an example of the target company classification information 1000.
The target company classification information 1000 stores the industry and business type to which the target company (second company) belongs.
The target company classification information 1000 has information such as a company ID 1001, a target company name 1002, an industry 1003, and a business type 1004.
The industry 1003 is the industry to which the target company (second company) belongs.
The business type 1004 is the business type to which the target company (second company) belongs.
FIGS. 11 to 15 are examples of the tables stored in the dictionary DB 209 in the auxiliary storage device 202 of the company similarity calculation server 101.
The dictionary DB 209 also stores the business word dictionary information 1050, which holds words related to business.
FIG. 11 is an example of the industry word classification dictionary information 1100.
The industry word classification dictionary information 1100 stores industry-related words and their classifications.
The industry word classification dictionary information 1100 has information such as an upper industry 1101, a lower industry 1102, and an industry word 1103.
The industry word 1103 is a word related to an industry.
The lower industry 1102 is the lower-level industry classification associated with the industry word 1103.
The upper industry 1101 is the upper-level industry classification associated with the lower industry 1102. The upper industry 1101 is a broader concept than the lower industry 1102.
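The word-to-classification relationship described above can be sketched as a simple lookup from an industry word to its lower and upper industry. The rows below are hypothetical and do not reproduce FIG. 11; the names `IndustryEntry` and `classify` are likewise assumptions for illustration.

```python
from typing import NamedTuple, Optional


class IndustryEntry(NamedTuple):
    upper_industry: str  # upper industry 1101 (broader concept)
    lower_industry: str  # lower industry 1102
    industry_word: str   # industry word 1103


# Hypothetical dictionary rows, for illustration only.
INDUSTRY_DICTIONARY = [
    IndustryEntry("Construction", "Housing", "house"),
    IndustryEntry("Construction", "Housing", "renovation"),
    IndustryEntry("Construction", "Civil engineering", "bridge"),
]


def classify(word: str) -> Optional[tuple[str, str]]:
    """Return (upper industry, lower industry) for an industry word, or None if unknown."""
    for entry in INDUSTRY_DICTIONARY:
        if entry.industry_word == word:
            return entry.upper_industry, entry.lower_industry
    return None
```

The same shape would apply to the business type word classification dictionary information 1200, with business type words and upper and lower business types in place of the industry fields.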
FIG. 12 is an example of the business type word classification dictionary information 1200.
The business type word classification dictionary information 1200 stores business-type-related words and their classifications.
The business type word classification dictionary information 1200 has information such as an upper business type 1201, a lower business type 1202, and a business type word 1203.
The business type word 1203 is a word related to a business type.
The lower business type 1202 is the lower-level business type classification associated with the business type word 1203.
The upper business type 1201 is the upper-level business type classification associated with the lower business type 1202. The upper business type 1201 is a broader concept than the lower business type 1202.
Note that the full set of business-related words stored in the business word dictionary information 1050, the full set of industry-related words stored in the industry word classification dictionary information 1100, and the full set of business-type-related words stored in the business type word classification dictionary information 1200 are kept from being completely identical to one another (they differ). However, particular words among the business-related words stored in the business word dictionary information 1050, the industry-related words stored in the industry word classification dictionary information 1100, and the business-type-related words stored in the business type word classification dictionary information 1200 may be the same.
FIG. 13 is an example of the industry similarity matrix information 1300.
The industry similarity matrix information 1300 stores the similarity between one upper industry and lower industry and another upper industry and lower industry.
The industry similarity matrix information 1300 has information such as an upper industry 1301, a lower industry 1302, and a similarity 1303.
In the industry similarity matrix information 1300, the elements along the columns (upper industries and lower industries) correspond to the elements along the rows (upper industries and lower industries).
The similarity 1303 stores, at each intersection of a column and a row, the similarity between the upper industry and lower industry of that column and the upper industry and lower industry of that row.
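The row-and-column structure described above can be sketched as a nested mapping keyed by (upper industry, lower industry) pairs. The industry names and similarity values are hypothetical; in the described system the values would follow the rules held in the industry and business type similarity setting information 1500.

```python
Industry = tuple[str, str]  # (upper industry, lower industry)

# Hypothetical matrix: outer key = row element, inner key = column element.
INDUSTRY_SIMILARITY: dict[Industry, dict[Industry, float]] = {
    ("Construction", "Housing"): {
        ("Construction", "Housing"): 1.0,
        ("Construction", "Civil engineering"): 0.5,
    },
    ("Construction", "Civil engineering"): {
        ("Construction", "Housing"): 0.5,
        ("Construction", "Civil engineering"): 1.0,
    },
}


def industry_similarity(row: Industry, column: Industry) -> float:
    """Read the similarity stored at the intersection of a row and a column."""
    return INDUSTRY_SIMILARITY[row][column]
```

Storing every cell explicitly avoids assuming that the matrix is symmetric, which the text does not state.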
FIG. 14 is an example of the business type similarity matrix information 1400.
The business type similarity matrix information 1400 stores the similarity between one upper business type and lower business type and another upper business type and lower business type.
The business type similarity matrix information 1400 has information such as an upper business type 1401, a lower business type 1402, and a similarity 1403.
In the business type similarity matrix information 1400, the elements along the columns (upper business types and lower business types) correspond to the elements along the rows (upper business types and lower business types).
The similarity 1403 stores, at each intersection of a column and a row, the similarity between the upper business type and lower business type of that column and the upper business type and lower business type of that row.
FIG. 15 is an example of the industry and business type similarity setting information 1500.
The industry and business type similarity setting information 1500 stores the rules for setting the similarity 1303 of the industry similarity matrix information 1300 and the similarity 1403 of the business type similarity matrix information 1400. Details are described later.
FIGS. 16 to 25 show the flows of the various processes performed by the company similarity calculation system 1.
FIG. 16 is an example of the similarity calculation flow 1600 performed by the similarity calculation module 211.
The similarity calculation flow 1600 calculates the similarity between the first company and the second company and outputs the calculated similarity.
The similarity calculation module 211 acquires information about the first company (hereinafter sometimes called the first information) and information about the second company (hereinafter sometimes called the second information) (step 1601).
This step is explained with reference to FIG. 26. The example screens in this embodiment are displayed on the output device 305 of the user terminal 102.
FIG. 26 is an example of a screen 2600 for starting the extraction of companies similar to the first company.
In the example shown in FIG. 26, if A Co., Ltd., the first company, has candidate matching partner companies, those candidates are displayed. In this example, however, no matching partner company is registered for A Co., Ltd., so the screen is one for starting the extraction of companies similar to the first company in order to identify candidate matching partner companies.
Extracting companies similar to the first company makes it easier for the user, by referring to the extracted similar companies, to decide which of them should be a candidate matching partner company.
In the example shown in FIG. 26, a URL 2601 (https://www.a‥), which is information about A Co., Ltd., the first company (the first information), is displayed. This information about A Co., Ltd. has been stored in advance as information about A Co., Ltd. That is, the information about A Co., Ltd. (including its URL) is already stored in the target company basic information 800. In the example shown in FIG. 26, therefore, the URL 2601 serving as the first information is the company information (not shown) of A Co., Ltd. stored in the company information 803 of the target company basic information 800. As another example, the first information may be received from the user terminal 102.
In the example shown in FIG. 26, when "Automatically extract from the following URL" 2602 is selected, the similarity calculation module 211 is regarded as having acquired the first information and the second information (step 1601).
That is, the similarity calculation module 211 acquires the first information when "Automatically extract from the following URL" 2602 is selected.
In this embodiment, at the point when "Automatically extract from the following URL" 2602 is selected, the similarity calculation module 211 also acquires, as the second information, the company information of every company stored in the company information 803 of the target company basic information 800 (excluding any entry whose company ID is the same as that of the first company). In other words, in this embodiment the similarity calculation module 211 calculates, for one first company, the similarity of every company stored in the target company basic information 800 (excluding the first company itself).
In another embodiment, the similarity calculation module 211 may acquire, as the second information, only a specific part of the company information stored in the company information 803 of the target company basic information 800, or company information received from the user terminal 102.
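The selection of the second information in this embodiment (every stored company except the one sharing the first company's ID) can be sketched as a filter over the stored company information. The dictionary layout, the company IDs, and the placeholder strings are assumptions for illustration; the function name `select_second_information` is hypothetical.

```python
def select_second_information(company_info: dict[str, str], first_company_id: str) -> dict[str, str]:
    """Return the company information of every stored company except the first company."""
    return {
        company_id: info
        for company_id, info in company_info.items()
        if company_id != first_company_id  # exclude the entry with the same company ID
    }


# Hypothetical contents: company ID 801 -> company information 803.
COMPANY_INFO = {
    "C1": "text about A Co., Ltd.",
    "C2": "text about B Co., Ltd.",
    "C3": "text about C Co., Ltd.",
}

second_information = select_second_information(COMPANY_INFO, first_company_id="C1")
```

The alternative embodiment, in which only part of the stored company information is used, would amount to applying a further condition inside the same filter.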
The similarity calculation module 211 extracts business-related words from the first information and the second information (step 1602). Details are described later.
The similarity calculation module 211 calculates a business similarity based on the words relating to the business conducted by the first company and the words relating to the business conducted by the second company (step 1603). Details are described later.
The similarity calculation module 211 extracts industry-related words from the first information and the second information (step 1604). Details are described later.
The similarity calculation module 211 calculates an industry similarity based on the similarity between the industry to which the first company belongs and the industry to which the second company belongs (step 1605). Details are described later.
The similarity calculation module 211 extracts business-type-related words from the first information and the second information (step 1606). Details are described later.
The similarity calculation module 211 calculates a business type similarity based on the similarity between the business type to which the first company belongs and the business type to which the second company belongs (step 1607). Details are described later.
The similarity calculation module 211 calculates a company similarity based on the business similarity, the industry similarity, and the business type similarity (step 1608). Details are described later.
The similarity calculation module 211 outputs the similarities of a plurality of second companies based on the ranking of the company similarities (step 1609). Details are described later.
This completes the similarity calculation flow 1600 executed by the similarity calculation module 211.
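Steps 1608 and 1609 can be sketched as combining three component similarities into one company similarity and then ordering the second companies by it. The passage above only says the company similarity is "based on" the three values, so the unweighted mean used here is an assumption, as are the descending order, the sample scores, and all names in the code.

```python
from dataclasses import dataclass


@dataclass
class Similarities:
    business: float       # business similarity (step 1603)
    industry: float       # industry similarity (step 1605)
    business_type: float  # business type similarity (step 1607)


def company_similarity(s: Similarities) -> float:
    # Assumption: an unweighted mean stands in for the combining rule,
    # which is not given in this passage.
    return (s.business + s.industry + s.business_type) / 3


def rank(candidates: dict[str, Similarities]) -> list[tuple[str, float]]:
    """Order second companies by company similarity, highest first (step 1609)."""
    scored = [(name, company_similarity(s)) for name, s in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical component similarities for two second companies.
ranking = rank({
    "B Co., Ltd.": Similarities(0.2, 0.4, 0.3),
    "C Co., Ltd.": Similarities(0.8, 0.9, 0.7),
})
```

Any other monotone combination, such as a weighted sum, could be substituted in `company_similarity` without changing the ranking step.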
FIG. 17 is an example of the business word extraction flow 1700 performed by the similarity calculation module 211.
The business word extraction flow 1700 extracts business-related words from the first information and the second information, and is the detailed flow of step 1602 in FIG. 16.
The similarity calculation module 211 extracts word groups from the first information, which is information about the first company, and the second information, which is information about the second company (step 1701).
Specifically, the similarity calculation module 211 uses morphological analysis to break down the first information, which is character string information about the first company (in this embodiment, the character string information at the link destination of the company URL), into words, the smallest meaningful units (hereinafter, the group of words obtained by breaking down the first information is sometimes called the first word group).
Similarly, the similarity calculation module 211 uses morphological analysis to break down the second information, which is character string information about the second company (in this embodiment, the character string information at the link destination of the company URL), into words, the smallest meaningful units (hereinafter, the group of words obtained by breaking down the second information is sometimes called the second word group).
The similarity calculation module 211 collates the first word group against the business word dictionary information 1050, and likewise collates the second word group against the business word dictionary information 1050 (step 1702).
The business word dictionary information 1050 of the dictionary DB 209 stores information on business-related words (hereinafter sometimes referred to as business words).
The similarity calculation module 211 outputs the business words contained in the first word group (hereinafter sometimes referred to as first business words) or the business words contained in the second word group (hereinafter sometimes referred to as second business words) (step 1703).
The similarity calculation module 211 outputs the number of times each first business word appears in the first information and the number of times each second business word appears in the second information (step 1704).
Specific examples of steps 1703 and 1704 are described with reference to FIG. 5, taking the case shown in FIG. 5 whose project ID is M1 (reference company name: A Co., Ltd.).
Suppose that the first word group extracted from the character string information about A Co., Ltd. (the first information) contains the words "house" and "maintenance", and that the business word dictionary information 1050 contains the business words "house" and "maintenance". In this case, the similarity calculation module 211 treats "house" and "maintenance", which are common to the first word group and the business word dictionary information 1050, as the first business words in the first information, and outputs (stores) them in the business word 504 of the reference company word information 500 (step 1703).
Furthermore, the similarity calculation module 211 also outputs (stores) in the business word 504 of the reference company word information 500 the number of times "house" and "maintenance" appear in the character string information about A Co., Ltd. (step 1704). That is, from the character string information about A Co., Ltd. (the first information), the similarity calculation module 211 outputs "house", which appears most often (7 occurrences), to word 1 of the business word 504, and outputs "maintenance", which appears next most often (6 occurrences), to word 2 of the business word 504. The similarity calculation module 211 also outputs the words that appear next most often after "maintenance" to word 3 (not shown) and onward of the business word 504.
Suppose instead that the first word group extracted from the character string information about A Co., Ltd. contains the word "new construction", while the business word dictionary information 1050 does not contain the business word "new construction". In this case, because "new construction" is not common to the first word group and the business word dictionary information 1050, the similarity calculation module 211 does not output (store) "new construction" in the business word 504 of the reference company word information 500.
The example described with reference to FIG. 5 concerns the first company. The second company is handled in the same way, except that the information on the second business words and their numbers of occurrences is output (stored) in the business word 903 of the target company word information 900. In another embodiment, the similarity calculation module 211 may store the information on the first business words and their numbers of occurrences, and the information on the second business words and their numbers of occurrences, in the same database.
This completes the business word extraction flow 1700 executed by the similarity calculation module 211.
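Steps 1702 to 1704 can be illustrated with a minimal Python sketch. It assumes the morphological analysis of step 1701 has already produced a list of tokens; the function name `extract_business_words` and the sample token list are hypothetical and are not part of the described embodiment. The count of 10 for "new construction" is borrowed from the industry word example purely for illustration.

```python
from collections import Counter


def extract_business_words(tokens, business_dictionary):
    """Collate tokens against the dictionary (step 1702) and return the
    matching business words with their occurrence counts, most frequent
    first (steps 1703 and 1704)."""
    counts = Counter(t for t in tokens if t in business_dictionary)
    return counts.most_common()


# Hypothetical first word group for A Co., Ltd. (assumed output of step 1701).
tokens = ["house"] * 7 + ["maintenance"] * 6 + ["new construction"] * 10
# Hypothetical excerpt of the business word dictionary information 1050.
business_dictionary = {"house", "maintenance"}

ranked = extract_business_words(tokens, business_dictionary)
# ranked[0] corresponds to word 1 of the business word 504, ranked[1] to word 2.
print(ranked)
```

Because "new construction" is absent from the assumed dictionary, it is dropped even though it appears in the token list, which mirrors the behavior described for the business word 504.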
FIG. 18 shows an example of the business word similarity calculation flow 1800 performed by the similarity calculation module 211.
The business word similarity calculation flow 1800 calculates the business similarity based on the words related to the business conducted by the first company (first business words) and the words related to the business conducted by the second company (second business words), and is the detailed flow of step 1603 in FIG. 16.
The similarity calculation module 211 acquires all the first business words stored in the business word 504 in FIG. 5 (hereinafter sometimes referred to as the first business word group) and all the second business words stored in the business word 903 (hereinafter sometimes referred to as the second business word group) (step 1801).
The similarity calculation module 211 vectorizes the first business word group and vectorizes the second business word group (step 1802).
Specifically, for example, the similarity calculation module 211 can vectorize the first business word group and the second business word group individually using tf-idf (Term Frequency-Inverse Document Frequency). In this case, the similarity calculation module 211 also acquires information on the number of occurrences of each business word included in the first business word group.
As another example, the similarity calculation module 211 may vectorize the character string information about the first company (the first information), and may vectorize the character string information about the second company (the second information), using a technique for vectorizing character string information such as Bag of Words, LSA (Latent Semantic Analysis), word2vec, or Doc2Vec.
The similarity calculation module 211 calculates the similarity between the vector information of the first business word group and the vector information of the second business word group (step 1803).
For example, the similarity calculation module 211 can obtain this similarity by calculating the cosine similarity between the two sets of vector information.
The similarity calculation module 211 outputs the calculated similarity between the vector information of the first business word group and the vector information of the second business word group as the business similarity (step 1804).
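Steps 1802 and 1803 can be sketched as follows, under stated assumptions. The idf weights are passed in as a plain mapping because the source does not say how they are obtained; the function names, the weights, and the counts for the second company are all hypothetical. The sketch uses sparse dictionaries rather than any particular vectorization library.

```python
import math


def tfidf_vector(counts, idf):
    """Weight each occurrence count by an idf value (step 1802).
    Words missing from the idf mapping get a neutral weight of 1.0 (assumption)."""
    return {word: count * idf.get(word, 1.0) for word, count in counts.items()}


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors held as dicts (step 1803)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # No business words on one side: treat as no similarity.
    return dot / (norm_u * norm_v)


# Counts for A Co., Ltd. follow the FIG. 5 example; the counts for the
# second company and the idf weights are invented for illustration.
first_counts = {"house": 7, "maintenance": 6}
second_counts = {"house": 4, "renovation": 3}
idf = {"house": 1.0, "maintenance": 1.5, "renovation": 1.5}

business_similarity = cosine_similarity(
    tfidf_vector(first_counts, idf), tfidf_vector(second_counts, idf)
)
print(round(business_similarity, 3))
```

Since every component is non-negative, the result lies between 0 and 1, which is the scale that the industry similarity is later adjusted to match.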
Specific examples of steps 1803 and 1804 are described with reference to FIGS. 5, 7, and 9. In this example, the business similarity between A Co., Ltd. (reference company name 503), the first company shown in FIG. 5, and Z Co., Ltd. (target company name 902), one of the second companies shown in FIG. 9, is calculated and output to the similarity 704 in FIG. 7.
In step 1803, the similarity calculation module 211 calculates the cosine similarity between the vector information obtained by vectorizing the information stored in the business word 504 for A Co., Ltd. in FIG. 5 and the vector information obtained by vectorizing the information stored in the business word 903 for Z Co., Ltd. in FIG. 9. In this case, the similarity calculation module 211 calculates the cosine similarity as, for example, 0.960.
In step 1804, the similarity calculation module 211 outputs (stores) the company ID of Z Co., Ltd. and, in association with that company ID, the business similarity (cosine similarity) in the similarity 704 of the similarity information 700.
The similarity calculation module 211 also outputs to the similarity 704 the business similarities between A Co., Ltd., the first company, and the other second companies, but at this point they need not be stored in order of business similarity. As described in detail later, the business similarity of Z Co., Ltd. (company ID: C0001) is stored in similarity 1 of the similarity 704.
This completes the business word similarity calculation flow 1800 executed by the similarity calculation module 211.
FIG. 19 shows an example of the industry word extraction flow 1900 performed by the similarity calculation module 211.
The industry word extraction flow 1900 extracts industry-related words from the first information and the second information, and is the detailed flow of step 1604 in FIG. 16.
The similarity calculation module 211 extracts word groups from the first information, which is information about the first company, and from the second information, which is information about the second company (step 1901). This step is the same as step 1701, and the word groups that the similarity calculation module 211 extracted in step 1701 can be reused.
The similarity calculation module 211 collates the first word group against the industry word classification dictionary information 1100, and likewise collates the second word group against the industry word classification dictionary information 1100 (step 1902).
The industry word classification dictionary information 1100 of the dictionary DB 209 stores information on industry-related words (hereinafter sometimes referred to as industry words).
The similarity calculation module 211 outputs the industry words contained in the first word group (hereinafter sometimes referred to as first industry words) or the industry words contained in the second word group (hereinafter sometimes referred to as second industry words) (step 1903).
The similarity calculation module 211 outputs the number of times each first industry word appears in the first information and the number of times each second industry word appears in the second information (step 1904).
Specific examples of steps 1902 to 1904 are described with reference to FIGS. 5 and 11, taking the case shown in FIG. 5 whose project ID is M1 (reference company name: A Co., Ltd.).
Suppose that the first word group extracted from the character string information about A Co., Ltd. (the first information) contains the words "new construction" and "house". In this case, the similarity calculation module 211 searches whether the words "new construction" and "house" in the first word group are stored in the industry word 1103 of the industry word classification dictionary information 1100 (step 1902).
In the present embodiment, the words "new construction" 1111 and "house" 1112 are stored in the industry word 1103 of the industry word classification dictionary information 1100. Accordingly, the similarity calculation module 211 determines that the words "new construction" and "house" in the first word group match "new construction" 1111 and "house" 1112 in the industry word 1103 of the industry word classification dictionary information 1100, and outputs (stores) "new construction" and "house" as first industry words in the industry word 505 of the reference company word information 500 (step 1903).
Furthermore, the similarity calculation module 211 also outputs (stores) in the industry word 505 of the reference company word information 500 the number of times "new construction" and "house" appear in the character string information about A Co., Ltd. (step 1904). That is, from the character string information about A Co., Ltd. (the first information), the similarity calculation module 211 outputs "new construction", which appears most often (10 occurrences), to word 1 of the industry word 505, and outputs "house", which appears next most often (7 occurrences), to word 2 of the industry word 505. The similarity calculation module 211 also outputs the words that appear next most often after "house" to word 3 (not shown) and onward of the industry word 505.
Suppose instead that the first word group extracted from the character string information about A Co., Ltd. contains the word "construction case", while the industry word 1103 of the industry word classification dictionary information 1100 does not contain the industry word "construction case". In this case, because "construction case" is not common to the first word group and the industry word 1103 of the industry word classification dictionary information 1100, the similarity calculation module 211 does not output (store) "construction case" in the industry word 505 of the reference company word information 500.
The example described with reference to FIG. 5 concerns the first company. The second company is handled in the same way, except that the information on the second industry words and their numbers of occurrences is output (stored) in the industry word 904 of the target company word information 900. In another embodiment, the similarity calculation module 211 may store the information on the first industry words and their numbers of occurrences, and the information on the second industry words and their numbers of occurrences, in the same database.
This completes the industry word extraction flow 1900 executed by the similarity calculation module 211.
FIG. 20 shows an example of the industry similarity calculation flow 2000 performed by the similarity calculation module 211.
The industry similarity calculation flow 2000 calculates the industry similarity based on the similarity between the industry to which the first company belongs and the industry to which the second company belongs, and is the detailed flow of step 1605 in FIG. 16.
The similarity calculation module 211 acquires, from the industry word 505 of the reference company word information 500, the first industry words that appear at least a predetermined number of times, and acquires, from the industry word 904 of the target company word information 900, the second industry words that appear at least a predetermined number of times (step 2001).
A specific example of step 2001 is described with reference to FIGS. 5 and 9, starting with the reference company A Co., Ltd. shown in FIG. 5. The industry words for A Co., Ltd. are "new construction" (10 occurrences), followed by "house" (7 occurrences), and so on. In the present embodiment, the similarity calculation module 211 acquires the first industry words whose number of occurrences is at least 80% (8 or more) of the number of occurrences (10) of "new construction", the most frequent word. That is, the similarity calculation module 211 acquires only the industry word "new construction". Although the present embodiment uses a predetermined ratio of 80% of the number of occurrences of the most frequent word as the threshold, this ratio can be set arbitrarily.
Next, consider the target company Z Co., Ltd. shown in FIG. 9. The industry words for Z Co., Ltd. are "spatial design" (5 occurrences), followed by "house" (4 occurrences), and so on. In the present embodiment, the similarity calculation module 211 acquires the second industry words whose number of occurrences is at least 80% (4 or more) of the number of occurrences (5) of "spatial design", the most frequent word. That is, the similarity calculation module 211 acquires the industry words "spatial design" and "house".
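The threshold rule of step 2001 can be expressed as a short Python sketch. The function name `select_dominant_words` and the parameter `ratio_percent` are hypothetical; integer arithmetic is used so that the 80% boundary case (4 of 5) is not affected by floating-point rounding.

```python
def select_dominant_words(word_counts, ratio_percent=80):
    """Return the words whose occurrence count is at least ratio_percent
    percent of the largest count, in their original order (step 2001)."""
    if not word_counts:
        return []
    max_count = max(word_counts.values())
    return [
        word
        for word, count in word_counts.items()
        if count * 100 >= max_count * ratio_percent
    ]


# Occurrence counts taken from the A Co., Ltd. and Z Co., Ltd. examples.
first_industry_counts = {"new construction": 10, "house": 7}
second_industry_counts = {"spatial design": 5, "house": 4}

print(select_dominant_words(first_industry_counts))   # ['new construction']
print(select_dominant_words(second_industry_counts))  # ['spatial design', 'house']
```

Making the threshold relative to the most frequent word, rather than using a fixed count, lets the same setting work for companies whose web pages differ greatly in length.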
The similarity calculation module 211 acquires, from the industry word classification dictionary information 1100, the industry classification associated with the first industry words acquired in step 2001 (hereinafter sometimes referred to as the first industry) and the industry classification associated with the second industry words acquired in step 2001 (hereinafter sometimes referred to as the second industry) (step 2002).
A specific example of step 2002 is described with reference to FIGS. 5 and 11, starting with the reference company A Co., Ltd. shown in FIG. 5. The similarity calculation module 211 acquires the first industry corresponding to the first industry word "new construction" acquired in step 2001. Specifically, the similarity calculation module 211 acquires "architecture" 1113 stored in the lower industry 1102 and "construction" 1114 stored in the upper industry 1101, both of which correspond to "new construction" 1111 in the industry word classification dictionary information 1100 of FIG. 11. The similarity calculation module 211 stores the acquired first industry information in industry 1 of the industry 604 of the reference company classification information 600 shown in FIG. 6 and, when there are multiple first industries, in industry 2 and onward.
Similarly, a specific example of step 2002 for the target company Z Co., Ltd. shown in FIG. 9 is described with reference to FIGS. 9 and 11. The similarity calculation module 211 acquires the second industry corresponding to the second industry words "spatial design" and "house" acquired in step 2001. Specifically, the similarity calculation module 211 acquires "architecture" 1113 stored in the lower industry 1102 and "construction" 1114 stored in the upper industry 1101, both of which correspond to "spatial design" 1115 and "house" 1112 in the industry word classification dictionary information 1100 of FIG. 11. The similarity calculation module 211 stores the acquired second industry information in industry 1 of the industry 1003 of the target company classification information 1000 shown in FIG. 10 and, when there are multiple second industries, in industry 2 and onward.
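The lookup in step 2002 can be sketched as a dictionary access. The structure below, which maps each industry word to an (upper industry, lower industry) pair, is an assumed in-memory representation of the industry word classification dictionary information 1100; the function name `lookup_industries` is hypothetical, and duplicate classifications are collapsed as one possible reading of how several words map to a single industry.

```python
def lookup_industries(industry_words, classification_dictionary):
    """Return the distinct (upper industry, lower industry) pairs associated
    with the given industry words, in first-seen order (step 2002)."""
    industries = []
    for word in industry_words:
        entry = classification_dictionary.get(word)
        if entry is not None and entry not in industries:
            industries.append(entry)
    return industries


# Assumed excerpt of the industry word classification dictionary information 1100.
classification_dictionary = {
    "new construction": ("construction", "architecture"),
    "house": ("construction", "architecture"),
    "spatial design": ("construction", "architecture"),
}

first_industries = lookup_industries(["new construction"], classification_dictionary)
second_industries = lookup_industries(["spatial design", "house"], classification_dictionary)
# The first element would be stored as industry 1, later elements as industry 2 onward.
print(first_industries, second_industries)
```

With this reading, Z Co., Ltd. ends up with a single lower industry even though two industry words were selected, which matches the single intersection used for A Co., Ltd. and Z Co., Ltd. in step 2003.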
Based on the industry similarity matrix information 1300, the similarity calculation module 211 acquires the similarity between the first industry and the second industry acquired in step 2002 (step 2003).
A specific example of step 2003 is described with reference to FIG. 13. In this example, the similarity between the first industry of A Co., Ltd. (lower industry "architecture" and upper industry "construction") and the second industry of Z Co., Ltd. (lower industry "architecture" and upper industry "construction") is acquired.
The similarity calculation module 211 acquires the similarity (10) associated with the intersection 1315 in the industry similarity matrix information 1300 of FIG. 13 between the upper industry "construction" 1312 and lower industry "architecture" 1311 of A Co., Ltd., which belong to the columns, and the upper industry "construction" 1314 and lower industry "architecture" 1313 of Z Co., Ltd., which belong to the rows.
The similarity calculation module 211 calculates the industry similarity between the first company and the second company based on the similarity associated with the intersection in the industry similarity matrix information 1300 (step 2004).
When the first company has only one lower industry and the second company has only one lower industry (that is, when there is a single intersection in the industry similarity matrix information 1300), as in the example of A Co., Ltd. as the first company and Z Co., Ltd. as the second company, the similarity associated with that intersection is the industry similarity between the first company and the second company. In the example of A Co., Ltd. and Z Co., Ltd. described for step 2003 above, the industry similarity is therefore 10.
When at least one of the first industry of the first company and the second industry of the second company includes multiple lower industries, so that there are multiple intersections (multiple similarities) in the industry similarity matrix information 1300, the industry similarity can be calculated by performing a predetermined calculation on the similarities associated with those intersections. A specific example is described with reference to FIGS. 6, 10, and 13. In this example, the industry similarity is obtained between the first industry of H Co., Ltd., stored in the reference company classification information 600 in FIG. 6, and the second industry of V Co., Ltd. (company ID: C0005), stored in the target company classification information 1000 in FIG. 10.
The similarity calculation module 211 acquires, from the industry 604 of the reference company classification information 600, the first industries of H Co., Ltd.: the lower industry "polymer" with the upper industry "chemical/petroleum/materials", and the lower industry "inorganic material" with the upper industry "chemical/petroleum/materials".
The similarity calculation module 211 acquires, from the industry 1003 of the target company classification information 1000, the second industries of V Co., Ltd. (company ID: C0005): the lower industry "polymer" with the upper industry "chemical/petroleum/materials", and the lower industry "household goods" with the upper industry "chemical/petroleum/materials".
In the industry similarity matrix information 1300 of FIG. 13, the similarity calculation module 211 acquires the similarities associated with each of the four intersections 1320, 1321, 1322, and 1323 between the lower industries "polymer" 1316 and "inorganic material" 1317 of H Co., Ltd., which belong to the columns, and the lower industries "polymer" 1318 and "household goods" 1319 of V Co., Ltd., which belong to the rows.
That is, the similarity calculation module 211 acquires the similarity "8" associated with the intersection 1320, the similarity "10" associated with the intersection 1321, the similarity "6" associated with the intersection 1322, and the similarity "8" associated with the intersection 1323.
For example, the similarity calculation module 211 can calculate the industry similarity by calculating a similarity for each first lower industry and then averaging the similarities calculated for all the first lower industries. The similarity for each first lower industry is calculated, for example, as the average of two values: the maximum of the similarities associated with that first lower industry of the first company, which belongs to a column, and the average of those similarities. When the column industry and the row industry are exactly the same, the industry similarity takes its maximum value.
Concretely, the similarity for each first lower industry can be calculated as follows. For the lower industry "polymer" 1316 of H Co., Ltd., which belongs to a column, the similarity calculation module 211 takes the similarities associated with its intersections and acquires "9", the average of the similarity "8" at the intersection 1320 and the similarity "10" at the intersection 1321, together with "10", the maximum similarity, which is associated with the intersection 1321. The similarity calculation module 211 then takes "9.5", the average of "9" and "10", as the similarity for the lower industry "polymer" 1316 of H Co., Ltd.
For the lower industry "inorganic material" 1317 of H Co., Ltd., which belongs to a column, the similarity calculation module 211 likewise acquires "7", the average of the similarity "6" at the intersection 1322 and the similarity "8" at the intersection 1323, together with "8", the maximum similarity, which is associated with the intersection 1323. The similarity calculation module 211 then takes "7.5", the average of "7" and "8", as the similarity for the lower industry "inorganic material" 1317 of H Co., Ltd. Calculating the similarity for each first lower industry in this way gives extra weight to the maximum of the similarities associated with the multiple intersections.
 そして、類似度算出モジュール211は、H株式会社の下位業界「高分子」1316における類似度「9.5」と、H株式会社の「無機材料」1317における類似度「7.5」と、の平均値である「8.5」を業界類似度として算出する。なお、類似度算出モジュール211は、算出した業界類似度は次のステップ2005に従い図7の類似度情報700の類似度704に記憶する
 なお、他の例として、複数の交点に対応付けられた全ての類似度の平均値を業界類似度としてもよい。
Then, the similarity calculation module 211 has a similarity "9.5" in the lower industry "polymer" 1316 of H Co., Ltd. and a similarity "7.5" in "inorganic material" 1317 of H Co., Ltd. The average value of "8.5" is calculated as the industry similarity. The similarity calculation module 211 stores the calculated industry similarity in the similarity 704 of the similarity information 700 in FIG. 7 according to the next step 2005. As another example, all associated with a plurality of intersections. The average value of the similarity of the above may be used as the industry similarity.
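The aggregation described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names and the dictionary layout are assumptions, and only the scores (8, 10, 6, 8) come from the H Co., Ltd. example.

```python
from statistics import mean

def sub_industry_similarity(scores):
    """Average of the mean and the maximum of one sub-industry's intersection scores."""
    return mean([mean(scores), max(scores)])

def industry_similarity(scores_by_sub_industry):
    """Average the per-sub-industry similarities over all first sub-industries."""
    return mean(sub_industry_similarity(s) for s in scores_by_sub_industry.values())

# Intersection scores from the H Co., Ltd. example (intersections 1320 to 1323).
# The dictionary layout is a hypothetical stand-in for the matrix lookup.
h_scores = {"polymer": [8, 10], "inorganic material": [6, 8]}

polymer = sub_industry_similarity(h_scores["polymer"])
inorganic = sub_industry_similarity(h_scores["inorganic material"])
overall = industry_similarity(h_scores)

# Alternative mentioned in the text: plain average over every intersection.
flat_average = mean(v for s in h_scores.values() for v in s)

print(polymer, inorganic, overall, flat_average)
```

Averaging the mean with the maximum pulls each sub-industry score toward its best match, which is why the result (8.5) exceeds the plain average over all four intersections (8).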
 The similarity calculation module 211 outputs the calculated industry similarity (step 2005).
 A specific example of step 2005 is described with reference to FIG. 7, using A Co., Ltd. as the first company and Z Co., Ltd. as the second company. To put the industry similarity of 10 calculated in step 2004 on the same scale as the business similarity score, the similarity calculation module 211 divides it by 10 and outputs (stores) the resulting 1.00 in similarity 704 of the similarity information 700, in association with Z Co., Ltd. (company ID of the target company: C0001). As another embodiment, the similarity after this output may be associated with the intersection in the industry similarity matrix information 1300.
 This ends the industry similarity calculation flow 2000 executed by the similarity calculation module 211.
 FIG. 21 shows an example of the business format word extraction flow 2100 performed by the similarity calculation module 211.
 The business format word extraction flow 2100 extracts words relating to business formats from the first information and the second information, and is the detailed flow of step 1606 in FIG. 16.
 The similarity calculation module 211 extracts word groups from the first information, which is information about the first company, and from the second information, which is information about the second company (step 2101). This step is the same as steps 1701 and 1901, so the word groups that the similarity calculation module 211 extracted in step 1701 or 1901 can be reused.
 The similarity calculation module 211 collates the first word group with the business format word classification dictionary information 1200, and collates the second word group with the business format word classification dictionary information 1200 (step 2102).
 The business format word classification dictionary information 1200 in the dictionary DB 209 stores information on words relating to business formats (hereinafter sometimes referred to as business format words).
 The similarity calculation module 211 outputs the business format words contained in the first word group (hereinafter sometimes referred to as first business format words) or the business format words contained in the second word group (hereinafter sometimes referred to as second business format words) (step 2103).
 The similarity calculation module 211 outputs the number of times each first business format word appears in the first information and the number of times each second business format word appears in the second information (step 2104).
 A specific example of steps 2102 to 2104 is described with reference to FIGS. 5 and 12, using the case in FIG. 5 where the project ID is M1 (reference company name: A Co., Ltd.).
 Assume that the first word group extracted from the character string information about A Co., Ltd. (the first information) contains the words "construction example" and "maintenance". In that case, the similarity calculation module 211 searches whether the words "construction example" and "maintenance" in the first word group are stored in business format words 1203 of the business format word classification dictionary information 1200 (step 2102).
 In this embodiment, business format words 1203 of the business format word classification dictionary information 1200 stores the words "construction example" 1211 and "maintenance" 1212. The similarity calculation module 211 therefore determines that the words "construction example" and "maintenance" in the first word group match "construction example" 1211 and "maintenance" 1212 in business format words 1203, and outputs (stores) them as first business format words in business format word 506 of the reference company word information 500 (step 2103).
 The similarity calculation module 211 also outputs (stores) the number of times "construction example" and "maintenance" appear in the character string information about A Co., Ltd. in business format word 506 of the reference company word information 500 (step 2104). Specifically, the similarity calculation module 211 outputs "construction example", which appears most often in the character string information about A Co., Ltd. (7 occurrences), to word 1 of business format word 506, and outputs "maintenance", which appears next most often (6 occurrences), to word 2 of business format word 506. The similarity calculation module 211 also outputs the other words that appear next most often after "maintenance" to word 3 (not shown) and onward of business format word 506.
 Now assume that the first word group extracted from the character string information about A Co., Ltd. contains the word "house", and that business format words 1203 of the business format word classification dictionary information 1200 does not contain "house". In that case, because "house" is not common to the first word group and business format words 1203, the similarity calculation module 211 does not output (store) "house" in business format word 506 of the reference company word information 500.
 The example described with reference to FIG. 5 concerns the first company. The second company is handled in the same way, except that the second business format words and their occurrence counts are output (stored) in business format word 905 of the target company word information 900. As another embodiment, the similarity calculation module 211 may store the first business format words and their occurrence counts and the second business format words and their occurrence counts in the same database.
 This ends the business format word extraction flow 2100 executed by the similarity calculation module 211.
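Steps 2102 to 2104 amount to a dictionary filter followed by a frequency count. The sketch below is a hedged illustration: `extract_format_words`, the dictionary contents, and the token stream are hypothetical stand-ins, and only the counts for "construction example" (7) and "maintenance" (6) come from the A Co., Ltd. example.

```python
from collections import Counter

def extract_format_words(tokens, format_dictionary):
    """Return (word, count) pairs for tokens found in the dictionary, most frequent first."""
    counts = Counter(t for t in tokens if t in format_dictionary)
    return counts.most_common()

# Hypothetical stand-in for business format words 1203 in dictionary information 1200.
format_dictionary = {"construction example", "maintenance", "construction record"}

# Hypothetical token stream for A Co., Ltd.; "house" is frequent but not in the dictionary.
tokens = ["construction example"] * 7 + ["maintenance"] * 6 + ["house"] * 9

ranked = extract_format_words(tokens, format_dictionary)
print(ranked)
```

Here "house" is dropped even though it is the most frequent token, because only words present in the dictionary count as business format words.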
 FIG. 22 shows an example of the business format similarity calculation flow 2200 performed by the similarity calculation module 211.
 The business format similarity calculation flow 2200 calculates the business format similarity based on the similarity between the business format to which the first company belongs and the business format to which the second company belongs, and is the detailed flow of step 1607 in FIG. 16.
 The similarity calculation module 211 acquires the first business format words that appear at least a predetermined number of times from business format word 506 of the reference company word information 500, and acquires the second business format words that appear at least a predetermined number of times from business format word 905 of the target company word information 900 (step 2201).
 A specific example of step 2201 is described with reference to FIGS. 5 and 9, starting with the case in FIG. 5 where the reference company name is A Co., Ltd. The business format words for A Co., Ltd. are "construction example" (7 occurrences), followed by "maintenance" (6 occurrences). In this embodiment, the similarity calculation module 211 acquires the first business format words whose occurrence count is at least 80% of the count of the most frequent word, "construction example" (7 occurrences), that is, at least 5.6. The similarity calculation module 211 therefore acquires both "construction example" and "maintenance". Although this embodiment uses a predetermined proportion (80%) of the highest occurrence count as the threshold, the proportion can be set arbitrarily.
 Next, consider the case in FIG. 9 where the target company name is Z Co., Ltd. The business format words for Z Co., Ltd. are "construction record" (8 occurrences), followed by "maintenance" (4 occurrences). In this embodiment, the similarity calculation module 211 acquires the second business format words whose occurrence count is at least 80% of the count of the most frequent word, "construction record" (8 occurrences), that is, at least 6.4. The similarity calculation module 211 therefore acquires only "construction record".
 The similarity calculation module 211 acquires, from the business format word classification dictionary information 1200, the business format classification associated with the first business format words acquired in step 2201 (hereinafter sometimes referred to as the first business format) and the business format classification associated with the second business format words acquired in step 2201 (hereinafter sometimes referred to as the second business format) (step 2202).
 A specific example of step 2202 is described with reference to FIGS. 5 and 12, starting with the case in FIG. 5 where the reference company name is A Co., Ltd. The similarity calculation module 211 acquires the first business formats corresponding to the first business format words "construction example" and "maintenance" acquired in step 2201. Specifically, for "construction example" 1211 in the business format word classification dictionary information 1200 of FIG. 12, the similarity calculation module 211 acquires "construction" 1213 stored in lower business format 1202 and "manufacturing/processing" 1214 stored in upper business format 1201. For "maintenance" 1212, it acquires "servicing/upkeep" 1215 stored in lower business format 1202 and "management" 1216 stored in upper business format 1201.
 The similarity calculation module 211 stores the acquired first business formats in business format 1 and business format 2 of business format 605 of the reference company classification information 600 shown in FIG. 6.
 Similarly, a specific example of step 2202 is described with reference to FIGS. 9 and 12 for the case in FIG. 9 where the target company name is Z Co., Ltd. The similarity calculation module 211 acquires the second business format corresponding to the second business format word "construction record" acquired in step 2201. Specifically, for "construction record" 1217 in the business format word classification dictionary information 1200 of FIG. 12, the similarity calculation module 211 acquires "construction" 1213 stored in lower business format 1202 and "manufacturing/processing" 1214 stored in upper business format 1201. The similarity calculation module 211 stores the acquired second business format in business format 1 of business format 1004 of the target company classification information 1000 shown in FIG. 10; if there are multiple second business formats, the others are stored in business format 2 and onward.
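Steps 2201 and 2202 can be sketched as a relative-frequency filter followed by a dictionary lookup. The names below (`select_frequent_words`, `FORMAT_CLASSIFICATION`, `formats_for`) and the English labels for the business formats are illustrative assumptions; the counts and the 80% proportion are taken from the A Co., Ltd. and Z Co., Ltd. examples.

```python
def select_frequent_words(word_counts, ratio=0.8):
    """Keep words whose count is at least `ratio` times the highest count."""
    threshold = ratio * max(word_counts.values())
    return [w for w, c in word_counts.items() if c >= threshold]

# Hypothetical excerpt of dictionary information 1200:
# business format word -> (lower business format, upper business format).
FORMAT_CLASSIFICATION = {
    "construction example": ("construction", "manufacturing/processing"),
    "construction record": ("construction", "manufacturing/processing"),
    "maintenance": ("servicing/upkeep", "management"),
}

def formats_for(word_counts, ratio=0.8):
    """Look up the classification of every word that passes the frequency filter."""
    return [FORMAT_CLASSIFICATION[w] for w in select_frequent_words(word_counts, ratio)]

# Occurrence counts from FIG. 5 (A Co., Ltd.) and FIG. 9 (Z Co., Ltd.).
a_formats = formats_for({"construction example": 7, "maintenance": 6})
z_formats = formats_for({"construction record": 8, "maintenance": 4})

print(a_formats)
print(z_formats)
```

For A Co., Ltd. the threshold is 5.6, so both words pass; for Z Co., Ltd. it is 6.4, so "maintenance" (4 occurrences) is excluded and only one business format remains.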
 Based on the business format similarity matrix information 1400, the similarity calculation module 211 acquires the similarity between the first business format and the second business format acquired in step 2202 (step 2203).
 A specific example of step 2203 is described with reference to FIG. 14. The example acquires the similarity between the first business formats of A Co., Ltd. (lower business format "construction" with upper business format "manufacturing/processing", and lower business format "servicing/upkeep" with upper business format "management") and the second business format of Z Co., Ltd. (lower business format "construction" with upper business format "manufacturing/processing").
 In the business format similarity matrix information 1400 of FIG. 14, the upper business format "manufacturing/processing" 1412 with lower business format "construction" 1411 and the upper business format "management" 1414 with lower business format "servicing/upkeep" 1413 of A Co., Ltd. belong to the columns, and the upper business format "manufacturing/processing" 1416 with lower business format "construction" 1415 of Z Co., Ltd. belongs to the row. The similarity calculation module 211 acquires the similarities associated with the two resulting intersections 1417 and 1418, namely the similarity "10" associated with intersection 1417 and the similarity "7" associated with intersection 1418.
 The similarity calculation module 211 calculates the business format similarity between the first company and the second company based on the similarities associated with the intersections in the business format similarity matrix information 1400 (step 2204).
 When the first company has only one lower business format and the second company has only one lower business format (that is, when the business format similarity matrix information 1400 yields a single intersection), the similarity calculation module 211 can use the similarity associated with that intersection as the business format similarity between the first company and the second company.
 Now assume that at least one of the first business format of the first company and the second business format of the second company includes multiple lower business formats, so that the business format similarity matrix information 1400 yields multiple intersections (multiple similarities). In that case, the similarity calculation module 211 calculates the business format similarity using the similarities associated with those intersections.
 For example, the similarity calculation module 211 calculates a similarity for each first lower business format and then averages the similarities calculated for all the first lower business formats to obtain the business format similarity. As in the industry similarity calculation flow 2000 described above, the similarity for each first lower business format is calculated, for example, as the average of the maximum and the mean of the similarities associated with that first lower business format of the first company, which belongs to the column. When the business format of the column and the business format of the row are completely identical, the business format similarity takes the maximum value.
 The following example calculates the business format similarity between A Co., Ltd. as the first company and Z Co., Ltd. as the second company. In this example, only one similarity is associated with each first lower business format of the first company belonging to the column, so that single similarity is the similarity for that first lower business format. That is, the similarity calculation module 211 acquires the similarity "10" 1417 for the lower business format "construction" 1411 of A Co., Ltd. and the similarity "7" 1418 for the lower business format "servicing/upkeep" 1413 of A Co., Ltd.
 The similarity calculation module 211 then calculates "8.5", the average of the similarity "10" 1417 for "construction" 1411 and the similarity "7" 1418 for "servicing/upkeep" 1413, as the business format similarity.
 The similarity calculation module 211 outputs the calculated business format similarity (step 2205).
 A specific example of step 2205 is described with reference to FIG. 7, using A Co., Ltd. as the first company and Z Co., Ltd. as the second company. To put the business format similarity of 8.5 calculated in step 2204 on the same scale as the business similarity score, the similarity calculation module 211 divides it by 10 and outputs (stores) the resulting 0.85 in similarity 704 of the similarity information 700, in association with Z Co., Ltd. (company ID of the target company: C0001). As another embodiment, the similarity after this output may be associated with the intersection in the business format similarity matrix information 1400.
 This ends the business format similarity calculation flow 2200 executed by the similarity calculation module 211.
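The whole of flow 2200 from matrix lookup to the stored value can be sketched as below. The `FORMAT_MATRIX` dictionary is a hypothetical two-entry excerpt of the business format similarity matrix information 1400, keyed by lower business formats only for brevity, and the function name is an assumption; the scores 10 and 7 and the division by 10 come from the A Co., Ltd. and Z Co., Ltd. example.

```python
from statistics import mean

# Hypothetical excerpt of matrix information 1400:
# (column lower format, row lower format) -> similarity at the intersection.
FORMAT_MATRIX = {
    ("construction", "construction"): 10,      # intersection 1417
    ("servicing/upkeep", "construction"): 7,   # intersection 1418
}

def format_similarity(first_formats, second_formats, matrix=FORMAT_MATRIX):
    """Per first lower format: average of mean and max; then average over first formats."""
    per_first = []
    for first in first_formats:
        scores = [matrix[(first, second)] for second in second_formats]
        per_first.append(mean([mean(scores), max(scores)]))
    return mean(per_first)

raw = format_similarity(["construction", "servicing/upkeep"], ["construction"])
stored = raw / 10  # rescaled to match the 0-to-1 business similarity score

print(raw, stored)
```

With a single row format, each column format has exactly one intersection, so the mean and the maximum coincide and the per-format similarity is just that intersection's value.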
 The similarity stored at each intersection of a lower industry and upper industry in a column of the industry similarity matrix with a lower industry and upper industry in a row is set according to a predetermined rule. This rule is described with reference to FIG. 15.
 The industry/business format similarity setting information 1500 in FIG. 15 divides cases according to whether the upper classification (upper industry) of the column and the upper classification (upper industry) of the row in the industry similarity matrix are the same, of high similarity, of medium similarity, or of low similarity, and according to whether the lower classification (lower industry) of the column and the lower classification (lower industry) of the row are the same, of high similarity, of medium similarity, or of low similarity. A similarity is stored for each case.
 Here, "same" means that the industries are completely identical, "high similarity" means that the industries are highly likely to be similar, "medium similarity" means that the industries are the next most likely to be similar after "high similarity", and "low similarity" means that the industries are the next most likely to be similar after "medium similarity".
 The industry/business format similarity setting information 1500 contains information such as the similarity setting rule 1501.
 The similarity setting rule 1501 stores, in the bottom row of the industry/business format similarity setting information 1500, the similarity for each of the cases described above.
 Based on the rule shown in the industry/business format similarity setting information 1500, the similarity calculation module 211 associates a similarity with each intersection of a lower industry and upper industry in a column (first company) of the industry similarity matrix with a lower industry and upper industry in a row (second company).
 Specifically, for example, the similarity calculation module 211 can assign similarities in the following descending order.
 The lower industry of the column and the lower industry of the row are the same (similarity stored at the intersection: 10).
 The lower industries of the column and the row are of high similarity, and the upper industries of the column and the row are the same (similarity stored at the intersection: 9).
 The lower industries of the column and the row are of medium similarity, and the upper industries of the column and the row are the same (similarity stored at the intersection: 8).
 The lower industries of the column and the row are of high similarity, and the upper industries of the column and the row are of high similarity (similarity stored at the intersection: 7).
 The lower industries of the column and the row are of medium similarity, and the upper industries of the column and the row are of high similarity (similarity stored at the intersection: 6).
 The lower industries of the column and the row are of high similarity, and the upper industries of the column and the row are of low or medium similarity (similarity stored at the intersection: 5).
 The lower industries of the column and the row are of medium similarity, and the upper industries of the column and the row are of low or medium similarity (similarity stored at the intersection: 4).
 The lower industries of the column and the row are of low similarity, and the upper industries of the column and the row are the same (similarity stored at the intersection: 3).
 The lower industries of the column and the row are of low similarity, and the upper industries of the column and the row are of high similarity (similarity stored at the intersection: 2).
 The lower industries of the column and the row are of low similarity, and the upper industries of the column and the row are of low or medium similarity (similarity stored at the intersection: 1).
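The ten cases above form a small lookup table. The sketch below encodes them in Python; the function name and the string labels "same", "high", "medium", and "low" are illustrative assumptions, while the scores are those listed in the text.

```python
def intersection_score(lower, upper):
    """Score for one intersection; lower/upper are "same", "high", "medium", or "low"."""
    if lower == "same":
        # Identical lower industries always score 10, whatever the upper relation.
        return 10
    # Low and medium upper-industry similarity are treated alike in the rule list.
    upper_rank = {"same": 0, "high": 1, "medium": 2, "low": 2}[upper]
    table = {
        "high":   [9, 7, 5],
        "medium": [8, 6, 4],
        "low":    [3, 2, 1],
    }
    return table[lower][upper_rank]

print(intersection_score("high", "same"))    # lower: high, upper: same
print(intersection_score("medium", "low"))   # lower: medium, upper: low or medium
```

Note that the ordering is not purely lexicographic on the lower relation: a medium lower relation with the same upper industry (8) outranks a high lower relation with a merely highly similar upper industry (7).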
 Whether two lower industries are "same", "high similarity", "medium similarity", or "low similarity" must be determined in advance. Likewise, whether two upper industries are "same", "high similarity", "medium similarity", or "low similarity" must be determined in advance.
 The similarity associated with each intersection of a lower business format and upper business format in a column of the business format similarity matrix with a lower business format and upper business format in a row is also set according to the predetermined rule described above.
 FIG. 23 shows an example of the company similarity calculation flow 2300 performed by the similarity calculation module 211.
 The company similarity calculation flow 2300 calculates the company similarity based on the business similarity, the industry similarity, and the business format similarity, and is the detailed flow of step 1608 in FIG. 16.
 The similarity calculation module 211 acquires the business similarity, the industry similarity, and the business format similarity from the similarity information 700 (step 2301).
 The similarity calculation module 211 calculates the company similarity by adding together the business similarity, the industry similarity, and the business format similarity in a predetermined ratio (step 2302).
 The similarity calculation module 211 outputs the calculated company similarity (step 2303).
 A specific example of steps 2301 to 2303 is described with reference to FIG. 7, using A Co., Ltd. as the first company and Z Co., Ltd. (company ID of the target company: C0001) as the second company.
 From similarity 704 of the similarity information 700, the similarity calculation module 211 acquires the business similarity (0.960), the industry similarity (1.00), and the business format similarity (0.850) between A Co., Ltd., the first company, and the second company (company ID of the target company: C0001) (step 2301).
The similarity calculation module 211 adds the respective similarities together at the following ratio.
Business similarity : industry similarity : business type similarity = 3 : 5 : 2
The ratio of the business type similarity is the highest, and the ratio of the business similarity is the lowest.
That is, the similarity calculation module 211 calculates a value of 0.958 as the company similarity (step 2302).
The predetermined ratio may be any ratio, and adjusting it sets the relative importance of each similarity.
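The arithmetic of step 2302 can be sketched in Python as a weighted sum. The function name `company_similarity` is hypothetical, and normalizing the ratio so that the weights sum to 1 is an assumption; it is the reading under which the listed ratio 3:5:2 and the example values reproduce the 0.958 figure.

```python
def company_similarity(business, industry, business_type, ratio=(3, 5, 2)):
    """Weighted sum of the three similarities.

    ratio is given as business : industry : business type and is
    normalized so the weights sum to 1 (an assumption of this sketch).
    """
    total = sum(ratio)
    weights = [r / total for r in ratio]
    scores = (business, industry, business_type)
    return sum(w * s for w, s in zip(weights, scores))


# Values acquired in step 2301 for A Co., Ltd. and Z Co., Ltd. (C0001).
score = company_similarity(0.960, 1.00, 0.850)
print(round(score, 3))  # 0.958
```

Passing a different `ratio` tuple changes the importance given to each similarity, which is the adjustment the text describes.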
The similarity calculation module 211 outputs (stores) the calculated company similarity (0.958) to the similarity 704 of the similarity information 700 in association with Z Co., Ltd. (company ID of the target company: C0001) (step 2303).
This completes the company similarity calculation flow 2300 executed by the similarity calculation module 211.
FIG. 24 is an example of the similarity output flow 2400 executed by the similarity calculation module 211.
The similarity output flow 2400 outputs the similarities of a plurality of second companies based on the ranking of their company similarities, and is the detailed flow of step 1609 in FIG. 16.
The similarity calculation module 211 acquires the company similarities of the plurality of second companies (step 2401).
The similarity calculation module 211 determines the ranking of the plurality of second companies based on the acquired company similarities (step 2402).
The similarity calculation module 211 outputs (stores) information on the determined ranking of the plurality of second companies (step 2403).
Specific examples of steps 2401 to 2403 are described with reference to FIG. 7, using A Co., Ltd. as the first company.
The similarity calculation module 211 acquires, from the similarity 704 of the similarity information 700 in FIG. 7, the company similarities between A Co., Ltd., the first company, and the plurality of second companies (step 2401). In the present embodiment, the company similarities between A Co., Ltd. and all the second companies stored in the target company basic information 800 are acquired from the similarity 704 of the similarity information 700. Note that FIG. 7 shows the similarities of only three companies.
The similarity calculation module 211 ranks the second companies based on the company similarity between each of them and A Co., Ltd.; the top three, for example, are determined as follows (step 2402). First place is the company with company ID C0001 (Z Co., Ltd.), with a company similarity of 0.958; second place is the company with company ID C0080, with a company similarity of 0.927; and third place is the company with company ID C0087, with a company similarity of 0.810.
The similarity calculation module 211 outputs (stores) the information on each similarity of the first-ranked Z Co., Ltd. (company ID C0001) to similarity 1 of the similarity 704 of the similarity information 700, the information on each similarity of the second-ranked company (company ID C0080) to similarity 2, and the information on each similarity of the third-ranked company (company ID C0087) to similarity 3 (step 2403). The similarity calculation module 211 likewise outputs (stores) the fourth and subsequent ranks to similarity 4 onward of the similarity 704 of the similarity information 700.
This completes the similarity output flow 2400 executed by the similarity calculation module 211.
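Steps 2401 to 2403 amount to sorting the second companies by company similarity in descending order and storing them in rank slots. A minimal Python sketch follows; the function name `rank_second_companies` and the dictionary input are assumptions, and the text does not say how ties are broken (here, input order is kept because Python's sort is stable).

```python
def rank_second_companies(company_similarities):
    """Return (rank, company_id, similarity) tuples, highest similarity first."""
    ordered = sorted(
        company_similarities.items(),
        key=lambda item: item[1],
        reverse=True,
    )
    return [(rank, cid, sim) for rank, (cid, sim) in enumerate(ordered, start=1)]


# Company similarities to A Co., Ltd. taken from the example in the text.
similarities = {"C0087": 0.810, "C0001": 0.958, "C0080": 0.927}
for rank, cid, sim in rank_second_companies(similarities):
    # Rank n corresponds to the "similarity n" slot of the similarity 704.
    print(f"similarity {rank}: {cid} ({sim:.3f})")
```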
FIG. 25 is an example of the similar company display flow 2500 executed by the similar company display module 212.
The similar company display flow 2500 displays second companies that are similar to the first company.
The similar company display module 212 acquires the business similarity, the industry similarity, the business type similarity, and the company similarity of the second companies with the highest company similarities (step 2501).
The similar company display module 212 displays on the user terminal 102, based on the ranking of company similarity, information about the second companies and a chart including axes for the business similarity, the industry similarity, and the business type similarity (step 2502).
The chart generated and displayed by the similar company display module 212 is not limited to a radar chart and may be, for example, a chart in which column charts (bar graphs) of the respective similarities are grouped by company.
Specific examples of steps 2501 and 2502 are described with reference to FIGS. 7, 8, and 27, using A Co., Ltd. as the first company.
The similar company display module 212 acquires the business similarity, the industry similarity, the business type similarity, and the company similarity of the second companies (Z Co., Ltd. with company ID C0001, the company with company ID C0080, and the company with company ID C0087) stored in similarity 1, similarity 2, and similarity 3 of the row for A Co., Ltd., the first company, in the similarity 704 of the similarity information 700 in FIG. 7. It also acquires the information on those second companies stored in the target company basic information 800 in FIG. 8 (step 2501).
FIG. 27 is an example of a screen 2700 for displaying companies similar to the first company. More specifically, FIG. 27 displays three companies similar to A Co., Ltd., the first company. FIG. 27 is the screen displayed after "Automatically extract from the following URL" 2602 in FIG. 26 is selected.
The similar company display module 212 displays the information on each company based on the ranking of company similarity among Z Co., Ltd. (company ID C0001), the company with company ID C0080, and the company with company ID C0087. That is, the similar company display module 212 displays the information on Z Co., Ltd. (company ID: C0001), which has the highest company similarity, in the upper section; the information on the company with company ID C0080 (R Co., Ltd.), which has the next highest company similarity, in the middle section; and the information on the company with company ID C0087 (G Co., Ltd.), which has the next highest company similarity after that, in the lower section.
As shown in FIG. 27, the similar company display module 212 displays, for each second company, the name, market capitalization, net income for the current period, and price-earnings ratio acquired from the target company basic information 800 in FIG. 8.
Also as shown in FIG. 27, the similar company display module 212 displays a radar chart in which the business similarity acquired from the similarity information 700 is plotted on the business similarity axis, the industry similarity acquired from the similarity information 700 is plotted on the industry similarity axis, and the business type similarity acquired from the similarity information 700 is plotted on the business type similarity axis.
This completes the similar company display flow 2500 executed by the similar company display module 212.
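To illustrate how three similarity values map onto a three-axis radar chart, the following Python sketch converts one value per axis into polygon vertices. The axis order, the function name `radar_vertices`, and the choice to point the first axis upward are assumptions; the text does not specify how the chart in FIG. 27 is rendered.

```python
import math

# Axis labels in the order used by this sketch (an assumption).
AXES = ("business similarity", "industry similarity", "business type similarity")


def radar_vertices(values):
    """Map one value per axis to (x, y) polygon vertices.

    Axes are spaced evenly around the origin, the first axis points up,
    and each value is the distance from the origin along its axis.
    """
    n = len(values)
    vertices = []
    for k, value in enumerate(values):
        angle = math.pi / 2 - 2 * math.pi * k / n  # clockwise from the top
        vertices.append((value * math.cos(angle), value * math.sin(angle)))
    return vertices


# Similarities of Z Co., Ltd. (C0001) to A Co., Ltd. from the example.
z_values = (0.960, 1.00, 0.850)
for axis, (x, y) in zip(AXES, radar_vertices(z_values)):
    print(f"{axis}: ({x:.3f}, {y:.3f})")
```

The resulting vertices can be passed to any plotting library as a closed polygon, one polygon per second company.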
In addition to, or in place of, the business similarity axis, the industry similarity axis, and the business type similarity axis, the similar company display module 212 may generate and display a chart including an axis of similarity regarding the industry category, an axis of similarity regarding the business form, or an axis of similarity regarding the business structure.
The similarity calculation module 211 can calculate the similarity regarding the industry category, the similarity regarding the business form, or the similarity regarding the business structure by the same method as that used to calculate the industry similarity or the business type similarity.
That is, the similarity calculation module 211 extracts words related to the industry category, words related to the business form, or words related to the business structure from the first information or the second information, using industry category word classification dictionary information that stores words and classifications related to industry categories, business form word classification dictionary information that stores words and classifications related to business forms, or business structure word classification dictionary information that stores words and classifications related to business structures.
Next, using the industry category word classification dictionary information, the business form word classification dictionary information, or the business structure word classification dictionary information, the similarity calculation module 211 identifies the industry category, business form, or business structure to which the first company or the second company corresponding to the extracted words belongs.
Next, the similarity calculation module 211 calculates the similarity regarding the industry category, the similarity regarding the business form, or the similarity regarding the business structure using an industry category similarity matrix, a business form similarity matrix, or a business structure similarity matrix.
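The three stages above (dictionary-based word extraction, classification, and matrix lookup) can be sketched in Python as follows. Everything concrete here is hypothetical: the dictionary entries, the matrix values, the whitespace tokenization (Japanese text would need a morphological analyzer), the occurrence threshold `min_count` (modeled on the "predetermined number of occurrences" in claims 2 and 7), and the use of the maximum to combine several matrix entries.

```python
from collections import Counter

# Hypothetical word classification dictionary: word -> classification.
WORD_TO_CATEGORY = {"retail": "Retail", "store": "Retail", "cloud": "Software"}

# Hypothetical similarity matrix keyed by an unordered pair of classifications.
MATRIX = {
    frozenset({"Retail"}): 1.0,
    frozenset({"Software"}): 1.0,
    frozenset({"Retail", "Software"}): 0.2,
}


def categories(text, min_count=2):
    """Classify a company from dictionary words that occur at least min_count times."""
    counts = Counter(w for w in text.lower().split() if w in WORD_TO_CATEGORY)
    return {WORD_TO_CATEGORY[w] for w, c in counts.items() if c >= min_count}


def category_similarity(cats_a, cats_b):
    """Look up every classification pair and keep the highest value (an assumption)."""
    values = [MATRIX[frozenset({a, b})] for a in cats_a for b in cats_b]
    return max(values) if values else 0.0


first = categories("retail store retail store")
second = categories("cloud cloud retail")
print(first, second, category_similarity(first, second))
```

The same structure applies to each of the three kinds of similarity; only the dictionary and the matrix change.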
The present invention is not limited to the embodiments described above and includes various modifications. For example, the above embodiments are described in detail to explain the present invention in an easy-to-understand manner, and the invention is not necessarily limited to configurations having all of the described elements. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations may be added, deleted, or substituted.
Each of the above configurations, functions, processing units, processing means, and the like may be realized partly or wholly in hardware, for example by designing them as an integrated circuit. Each of the above configurations, functions, and the like may also be realized in software, with a processor interpreting and executing a program that realizes each function. Information such as programs, tables, and files that realize each function can be placed in a memory, in a recording device such as a hard disk or SSD (Solid State Drive), or on a recording medium such as an IC card, SD card, or DVD.
The control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines of a product are necessarily shown. In practice, almost all configurations may be considered to be interconnected.
The embodiments described above disclose at least the configurations recited in the claims.
1 ... Company similarity calculation system, 101 ... Company similarity calculation server, 102 ... User terminal, 103 ... Administrator terminal, 201 ... Main storage device, 202 ... Auxiliary storage device, 203 ... Processor, 211 ... Similarity calculation module, 212 ... Similar company display module, 213 ... Management module

Claims (20)

  1.  A company similarity calculation server that calculates a company similarity between a reference first company and a second company other than the first company, the server comprising:
     similarity calculation means for calculating the company similarity between the first company and the second company based on first information about the first company and second information about the second company; and
     output means for outputting the calculated company similarity,
     wherein the similarity calculation means
     calculates a business similarity based on words related to the business conducted by the first company and words related to the business conducted by the second company,
     calculates an industry similarity based on a first industry relating to the industry to which the first company belongs and a second industry relating to the industry to which the second company belongs,
     calculates a business type similarity based on a first business type relating to the business type to which the first company belongs and a second business type relating to the business type to which the second company belongs, and
     calculates the company similarity based on the business similarity, the industry similarity, and the business type similarity.
  2.  The company similarity calculation server according to claim 1, wherein the similarity calculation means
     extracts words related to industries from the first information and sets, as the first industry, at least one industry associated with at least one first industry word that appears a predetermined number of times or more among the extracted words,
     extracts words related to industries from the second information and sets, as the second industry, at least one industry associated with at least one second industry word that appears a predetermined number of times or more among the extracted words, and
     calculates the industry similarity.
  3.  The company similarity calculation server according to claim 1, wherein the similarity calculation means
     extracts words related to industries from the first information, sets, as a first lower industry, at least one industry associated with at least one first industry word that appears a predetermined number of times or more among the extracted words, and sets, as a first upper industry, an industry that is a broader concept than the first lower industry and is associated with the first lower industry,
     extracts words related to industries from the second information, sets, as a second lower industry, at least one industry associated with at least one second industry word that appears a predetermined number of times or more among the extracted words, and sets, as a second upper industry, an industry that is a broader concept than the second lower industry and is associated with the second lower industry, and
     calculates the industry similarity by calculating the similarity between the first lower industry and the first upper industry, on the one hand, and the second lower industry and the second upper industry, on the other.
  4.  The company similarity calculation server according to claim 3, wherein
     information acquisition means acquires an industry similarity matrix used to calculate the industry similarity,
     in the industry similarity matrix,
     elements belonging to columns and elements belonging to rows correspond to each other, the elements including a lower industry, which is information about an industry, and an upper industry, which is a broader concept than the lower industry, and
     each intersection of a column and a row is associated with the similarity between the lower industry and the upper industry of the column and the lower industry and the upper industry of the row, and
     the similarity calculation means,
     using the industry similarity matrix,
     calculates the industry similarity by acquiring the similarity associated with at least one column related to the first lower industry and the first upper industry, and
     at least one row related to the second lower industry and the second upper industry.
  5.  The company similarity calculation server according to claim 4, wherein
     a plurality of similarities are acquired that are associated with the columns corresponding to at least one lower industry and upper industry related to the first lower industry and the first upper industry, and
     with the rows corresponding to at least one lower industry and upper industry related to the second lower industry and the second upper industry, and
     the industry similarity is calculated based on the acquired plurality of similarities.
  6.  The company similarity calculation server according to claim 4 or 5, wherein the similarity associated with a column and a row of the industry similarity matrix is higher in the order of the following cases:
     the case where the lower industry of the column and the lower industry of the row are the same;
     the case where the similarity between the lower industry of the column and the lower industry of the row is a high similarity, at which they are likely to be similar, and the upper industry of the column and the upper industry of the row are the same;
     the case where the similarity between the lower industry of the column and the lower industry of the row is the high similarity, and the similarity between the upper industry of the column and the upper industry of the row is the high similarity;
     the case where the similarity between the lower industry of the column and the lower industry of the row is a low similarity, which is lower than the high similarity, and the upper industry of the column and the upper industry of the row are the same; and
     the case where the similarity between the lower industry of the column and the lower industry of the row is the low similarity, and the similarity between the upper industry of the column and the upper industry of the row is the low similarity.
  7.  The company similarity calculation server according to any one of claims 1 to 6, wherein the similarity calculation means
     extracts words related to business types from the first information and sets, as the first business type, at least one business type associated with at least one first business type word that appears a predetermined number of times or more among the extracted words,
     extracts words related to business types from the second information and sets, as the second business type, at least one business type associated with at least one second business type word that appears a predetermined number of times or more among the extracted words, and
     calculates the business type similarity.
  8.  The company similarity calculation server according to any one of claims 1 to 6, wherein the similarity calculation means
     extracts words related to business types from the first information, sets, as a first lower business type, at least one business type associated with at least one first business type word that appears a predetermined number of times or more among the extracted words, and sets, as a first upper business type, a business type that is a broader concept than the first lower business type and is associated with the first lower business type,
     extracts words related to business types from the second information, sets, as a second lower business type, at least one business type associated with at least one second business type word that appears a predetermined number of times or more among the extracted words, and sets, as a second upper business type, a business type that is a broader concept than the second lower business type and is associated with the second lower business type, and
     calculates the business type similarity by calculating the similarity between the first lower business type and the first upper business type, on the one hand, and the second lower business type and the second upper business type, on the other.
  9.  The company similarity calculation server according to claim 8, wherein
     information acquisition means acquires a business type similarity matrix used to calculate the business type similarity,
     in the business type similarity matrix,
     elements belonging to columns and elements belonging to rows correspond to each other, the elements including a lower business type, which is information about a business type, and an upper business type, which is a broader concept than the lower business type, and
     each intersection of a column and a row is associated with the similarity between the lower business type and the upper business type of the column and the lower business type and the upper business type of the row, and
     the similarity calculation means,
     using the business type similarity matrix,
     calculates the business type similarity by acquiring the similarity associated with at least one column related to the first lower business type and the first upper business type, and
     at least one row related to the second lower business type and the second upper business type.
  10.  The company similarity calculation server according to claim 9, wherein
     a plurality of similarities are acquired that are associated with the columns corresponding to at least one lower business type and upper business type related to the first lower business type and the first upper business type, and
     with the rows corresponding to at least one lower business type and upper business type related to the second lower business type and the second upper business type, and
     the business type similarity is calculated based on the acquired plurality of similarities.
  11.  The company similarity calculation server according to claim 9 or 10, wherein the similarity associated with a column and a row of the business type similarity matrix is higher in the order of the following cases:
     the case where the lower business type of the column and the lower business type of the row are the same;
     the case where the similarity between the lower business type of the column and the lower business type of the row is a high similarity, at which they are likely to be similar, and the upper business type of the column and the upper business type of the row are the same;
     the case where the similarity between the lower business type of the column and the lower business type of the row is the high similarity, and the similarity between the upper business type of the column and the upper business type of the row is the high similarity;
     the case where the similarity between the lower business type of the column and the lower business type of the row is a low similarity, which is lower than the high similarity, and the upper business type of the column and the upper business type of the row are the same; and
     the case where the similarity between the lower business type of the column and the lower business type of the row is the low similarity, and the similarity between the upper business type of the column and the upper business type of the row is the low similarity.
  12.  The company similarity calculation server according to any one of claims 1 to 11, wherein the similarity calculation means
     extracts from the first information at least one first business word, which is a word related to the business conducted by the first company, and vectorizes the first business word,
     extracts from the second information at least one second business word, which is a word related to the business conducted by the second company, and vectorizes the second business word, and
     calculates the business similarity by calculating the similarity between the vectorized first business word and the vectorized second business word.
  13.  The company similarity calculation server according to any one of claims 1 to 12, wherein the similarity calculation means
     calculates the company similarity by calculating a value obtained by adding together the business similarity, the business type similarity, and the industry similarity, each at a predetermined ratio.
  14.  The company similarity calculation server according to claim 13, wherein, in the predetermined ratio, the ratio of the business type similarity is the highest and the ratio of the business similarity is the lowest.
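The weighted combination in claims 13 and 14 can be sketched as follows. The claims only require that the business type similarity carries the highest ratio and the business similarity the lowest; the specific weights (0.5, 0.3, 0.2) and the function name are hypothetical values chosen for illustration.

```python
# Hypothetical weights (assumption): the claims fix only the ordering
# business type > industry > business, not the actual values.
DEFAULT_WEIGHTS = {"business_type": 0.5, "industry": 0.3, "business": 0.2}


def company_similarity(business, business_type, industry, weights=DEFAULT_WEIGHTS):
    """Add the three similarities together, each at its predetermined ratio."""
    return (
        weights["business"] * business
        + weights["business_type"] * business_type
        + weights["industry"] * industry
    )


# Example: business 0.5, business type 0.8, industry 0.6
score = company_similarity(business=0.5, business_type=0.8, industry=0.6)
```

Because the weights sum to 1, the combined score stays in the same range as the three input similarities.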
  15.  The company similarity calculation server according to any one of claims 1 to 14, wherein the output means outputs:
     the business similarity;
     the business type similarity;
     the industry similarity; and
     the company similarity.
  16.  The company similarity calculation server according to any one of claims 1 to 15, wherein the output means outputs a chart including:
     an axis of the business similarity;
     an axis of the business type similarity; and
     an axis of the industry similarity.
  17.  The company similarity calculation server according to claim 16, wherein the output means:
     displays the business similarity calculated by the similarity calculation means on the axis of the business similarity;
     displays the industry similarity calculated by the similarity calculation means on the axis of the industry similarity; and
     displays the business type similarity calculated by the similarity calculation means on the axis of the business type similarity.
  18.  The company similarity calculation server according to any one of claims 1 to 17, wherein, when there are a plurality of the second companies, the output means outputs the similarities of the plurality of second companies based on the ranking of the company similarities calculated by the similarity calculation means.
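The ranked output in claim 18 can be sketched as a simple sort. The claim does not say how the ranking is presented, so descending order by company similarity is an assumption here, and the company names and scores are hypothetical.

```python
def rank_by_company_similarity(scores):
    """Return (company, score) pairs sorted from most to least similar."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical second companies and their company similarity to the first company.
example_scores = {"Company B": 0.68, "Company C": 0.91, "Company D": 0.35}
ranking = rank_by_company_similarity(example_scores)
```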
  19.  A company similarity calculation method in a company similarity calculation server that calculates a company similarity between a first company serving as a reference and a second company other than the first company, the method comprising:
     calculating a business similarity based on words related to the business conducted by the first company and words related to the business conducted by the first company;
     calculating an industry similarity based on a first industry, which relates to the industry to which the first company belongs, and a second industry, which relates to the industry to which the second company belongs;
     calculating a business type similarity based on a first business type, which relates to the business type to which the first company belongs, and a second business type, which relates to the business type to which the second company belongs;
     calculating the company similarity based on the business similarity, the industry similarity, and the business type similarity; and
     outputting the calculated company similarity.
  20.  A program for causing a company similarity calculation server to execute each step of the company similarity calculation method according to claim 19.

PCT/JP2020/029577 2019-08-08 2020-07-31 Company similarity calculation server and company similarity calculation method WO2021024966A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019146489A JP7418781B2 (en) 2019-08-08 2019-08-08 Company similarity calculation server and company similarity calculation method
JP2019-146489 2019-08-08

Publications (1)

Publication Number Publication Date
WO2021024966A1 (en) 2021-02-11

Family

ID=74503847

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/029577 WO2021024966A1 (en) 2019-08-08 2020-07-31 Company similarity calculation server and company similarity calculation method

Country Status (2)

Country Link
JP (1) JP7418781B2 (en)
WO (1) WO2021024966A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008287328A (en) * 2007-05-15 2008-11-27 Ntt Data Corp Evaluation device, method, and computer program
JP2016071798A (en) * 2014-10-01 2016-05-09 富士ゼロックス株式会社 Information processor and information processing program
JP6489340B1 (en) * 2018-06-28 2019-03-27 嘉久 塩川 Comparison target company selection system

Also Published As

Publication number Publication date
JP2021026689A (en) 2021-02-22
JP7418781B2 (en) 2024-01-22

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20849425; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.05.2022))
122 Ep: PCT application non-entry in European phase (Ref document number: 20849425; Country of ref document: EP; Kind code of ref document: A1)