JP6941802B1

JP6941802B1 - Search system, search method and search program

Info

Publication number: JP6941802B1
Application number: JP2021111716A
Authority: JP
Inventors: 琢也大迫; 康一郎佐野
Original assignee: LINKERS CORPORATION
Current assignee: LINKERS CORPORATION
Priority date: 2021-07-05
Filing date: 2021-07-05
Publication date: 2021-09-29
Anticipated expiration: 2041-07-05
Also published as: JP2023008284A

Abstract

PROBLEM TO BE SOLVED: To reduce the burden on a user searching for a business partner and to effectively list candidate companies suitable as business partners. SOLUTION: A management server 20 of a search system 1 includes a feature word database 223 and a learning model database 224. The feature word database 223 stores, for each company, a plurality of feature words representing the characteristics of the company together with their scores. The learning model database 224 stores, for each feature word, a distributed representation of the feature word obtained by machine learning, as a learning model. When the management server 20 acquires a keyword input from an external input device, it searches the feature word database 223 for companies having a feature word corresponding to the keyword. It then calculates the similarity between a distributed representation based on the feature words of each such company and a distributed representation based on the feature word corresponding to the keyword. Finally, it generates a company list including the feature words, their scores, and the similarity. [Selected drawing] Fig. 3

Description

The present invention relates to a search system, a search method, and a search program that generate a company list for the purpose of searching for a business partner.

In recent years, product development in companies has often drawn not only on in-house technology but also on the excellent technology of outside companies. For example, many large and mid-sized companies have small and medium-sized enterprises and venture companies with excellent technology as business partners.
When searching the Internet for a new business partner with which there has been no previous transaction, the search user enters a keyword in a Web browser displayed on the screen of a personal computer or the like. The companies that match the keyword are then listed as business partner candidates. Most of the keywords used here are words related to products and materials, such as merchandise and its characteristics. However, because they are selected based on the search user's experience and knowledge, differences in users' skill and experience often show up in the search results. To address this, a technique is known that extracts keyword technologies related to an input keyword from company information, calculates the similarity between the two, and then searches for companies (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-133367

However, the companies listed by a conventional keyword search, arrived at through the user's trial and error, are not ranked at the time of listing by whether they are suitable as business partners. Consequently, unless the user checks and evaluates the company information of every listed candidate, a more suitable candidate company may be overlooked. Considerable labor and personnel cost are therefore incurred after the list is produced.

The present invention has been made in view of these circumstances. Its object is to provide a search system, a search method, and a search program that can reduce the burden on the user in searching for a business partner and effectively list candidate companies suitable as business partners.

To solve the above problems, the present invention provides a search system that generates a company list for the purpose of searching for a business partner. The search system comprises: feature word storage means for storing, for each company, a plurality of feature words representing the characteristics of the company together with a score based on the frequency of appearance of each feature word; learning model storage means for storing, for each feature word, as a learning model, a distributed representation of the feature word obtained by machine learning that predicts one feature word from a plurality of feature words; acquisition means for acquiring a keyword input from an input device; search means for searching the feature word storage means for companies having a feature word corresponding to the keyword; similarity calculation means for calculating the similarity between a distributed representation based on the plurality of feature words of each company found by the search means and a distributed representation based on the feature word corresponding to the keyword; and list generation means for generating a company list including the plurality of feature words of each company found by the search means, the scores of those feature words, and the similarity.

Further, in the above search system according to the present invention, the similarity calculation means calculates the similarity between a composite distributed representation and the distributed representation of the feature word corresponding to the keyword. The composite distributed representation is obtained by combining the distributed representations of the plurality of feature words of the company found by the search means.
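As a rough illustration of the independent claim combined with the composite-representation variant above, the following Python sketch does three things. It searches an in-memory feature word table, builds a composite vector for each hit company, and assembles a company list with feature words, scores, and similarity. This is a minimal sketch under stated assumptions, not the patented implementation. The dictionary layouts, the names `generate_company_list` and `composite_vector`, the exact-match rule for a "corresponding" feature word, the element-wise mean as the combination rule, cosine similarity as the measure, and the descending sort are all hypothetical choices.

```python
from dataclasses import dataclass
from math import sqrt

# Hypothetical stand-ins: feature word DB 223 maps company ID -> {feature word: score};
# learning model DB 224 maps feature word -> embedding vector.
FeatureWordDB = dict[str, dict[str, float]]
LearningModelDB = dict[str, list[float]]


@dataclass
class CompanyListEntry:
    company_id: str
    feature_words: dict[str, float]  # feature word -> score
    similarity: float


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is empty or all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def composite_vector(words: list[str], model: LearningModelDB) -> list[float]:
    """Composite representation: element-wise mean of the words' embeddings (assumed rule)."""
    vectors = [model[w] for w in words if w in model]
    if not vectors:
        return []
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def generate_company_list(keyword: str, fw_db: FeatureWordDB,
                          model: LearningModelDB) -> list[CompanyListEntry]:
    # Search step: companies holding the keyword itself as a feature word
    # (treating "corresponding" as exact match is an assumption).
    hits = {cid: words for cid, words in fw_db.items() if keyword in words}
    query = model.get(keyword, [])
    entries = [
        CompanyListEntry(cid, words, cosine(composite_vector(list(words), model), query))
        for cid, words in hits.items()
    ]
    # Sorting by similarity is a presentation choice, not stated in the claim.
    return sorted(entries, key=lambda e: e.similarity, reverse=True)
```

Because cosine similarity ignores vector length, summing the embeddings instead of averaging them would give the same similarity values.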

Further, in the above search system according to the present invention, the similarity calculation means first uses the learning model to calculate feature words similar to the feature word corresponding to the keyword. It then calculates the similarity between a first vector and a second vector. The first vector contains the feature word corresponding to the keyword, a plurality of feature words similar to it, and their scores. The second vector contains the feature words held by each company and their scores.
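The expansion step described above might look like the sketch below. The text here does not say how the scores in the first vector are assigned. This sketch therefore assumes, hypothetically, that the keyword's own feature word scores 1.0 and that each neighbour is scored by its cosine similarity to that word in the learning model. `build_first_vector` and `top_k` are illustrative names.

```python
from math import sqrt


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_first_vector(keyword_word: str, model: dict[str, list[float]],
                       top_k: int = 3) -> dict[str, float]:
    """Sparse first vector: the keyword's feature word plus its top_k nearest neighbours.

    Assumed scoring: 1.0 for the keyword word, cosine similarity for each neighbour.
    """
    query = model[keyword_word]
    neighbours = sorted(
        ((w, cosine(query, v)) for w, v in model.items() if w != keyword_word),
        key=lambda pair: pair[1],
        reverse=True,
    )[:top_k]
    first = {keyword_word: 1.0}
    first.update(neighbours)  # add (word, similarity) pairs
    return first
```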

Furthermore, in the above search system according to the present invention, the similarity calculation means calculates the similarity based on the sum of the products of the scores of the same feature words contained in both the first vector and the second vector.
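Read literally, this score-product similarity is a dot product restricted to feature words that appear in both sparse vectors. The sketch below assumes that both vectors are plain `{feature word: score}` dictionaries. The function names are hypothetical, and any normalization the actual system may apply on top of the raw sum is not modeled.

```python
def score_product_similarity(first: dict[str, float], second: dict[str, float]) -> float:
    """Sum of score products over feature words present in both sparse vectors."""
    shared = first.keys() & second.keys()
    return sum(first[w] * second[w] for w in shared)


def rank_companies(first: dict[str, float],
                   companies: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """(company ID, similarity) pairs, most similar first."""
    scored = [(cid, score_product_similarity(first, words)) for cid, words in companies.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Feature words that appear in only one of the two vectors contribute nothing. A company therefore scores well only when its high-scoring words overlap with the keyword and its expanded neighbours.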

Furthermore, in the above search system according to the present invention, the similarity calculation means forms the distributed representation based on the plurality of feature words of the company found by the search means so that feature words with higher scores are weighted more heavily than those with lower scores. It then calculates the similarity between this weighted representation and the distributed representation based on the feature word corresponding to the keyword.
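One straightforward way to weight higher-scoring feature words more heavily is a score-weighted average of their embeddings. The sketch below assumes that rule, using the scores themselves as linear weights. The text does not fix the weighting function, and `weighted_composite` is a hypothetical name.

```python
from math import sqrt


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def weighted_composite(company_words: dict[str, float],
                       model: dict[str, list[float]]) -> list[float]:
    """Score-weighted mean of the embeddings of a company's feature words (assumed rule)."""
    total: list[float] = []
    weight_sum = 0.0
    for word, score in company_words.items():
        vec = model.get(word)
        if vec is None:
            continue  # the word has no embedding in the learning model
        if not total:
            total = [0.0] * len(vec)
        for i, x in enumerate(vec):
            total[i] += score * x
        weight_sum += score
    return [x / weight_sum for x in total] if weight_sum else total
```

Passing equal scores (for example `{w: 1.0 for w in company}`) reduces this to the plain mean, which makes it easy to compare the weighted and unweighted variants side by side.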

Furthermore, to solve the above problems, the present invention provides a search method in a search server that generates a company list for the purpose of searching for a business partner. The method comprises: an acquisition step of acquiring a keyword input from an input device; a search step of searching for companies having a feature word corresponding to the keyword in a feature word database that stores, for each company, a plurality of feature words representing the characteristics of the company together with a score based on the frequency of appearance of each feature word; a similarity calculation step of referring to a learning model database and calculating the similarity between a distributed representation based on the plurality of feature words of each company found in the search step and a distributed representation based on the feature word corresponding to the keyword, the learning model database storing, for each feature word, as a learning model, a distributed representation of the feature word obtained by machine learning that predicts one feature word from a plurality of feature words; and a list generation step of generating a company list including the plurality of feature words of each company found in the search step, the scores of those feature words, and the similarity.

Furthermore, to solve the above problems, a computer-readable search program according to the present invention causes a search server that generates a company list for the purpose of searching for a business partner to execute: an acquisition procedure of acquiring a keyword input from an input device; a search procedure of searching for companies having a feature word corresponding to the keyword in a feature word database that stores, for each company, a plurality of feature words representing the characteristics of the company together with a score based on the frequency of appearance of each feature word; a similarity calculation procedure of referring to a learning model database and calculating the similarity between a distributed representation based on the plurality of feature words of each company found by the search procedure and a distributed representation based on the feature word corresponding to the keyword, the learning model database storing, for each feature word, as a learning model, a distributed representation of the feature word obtained by machine learning that predicts one feature word from a plurality of feature words; and a list generation procedure of generating a company list including the plurality of feature words of each company found by the search procedure, the scores of those feature words, and the similarity.

According to the present invention, it is possible to reduce the burden on the user in searching for a business partner and to effectively list candidate companies suitable as business partners. For example, the user can be given a user-friendly company list containing two indicators. The score shows how well each feature word represents the characteristics of the company, and the similarity shows how closely the company is related to the input keyword.

FIG. 1 is a configuration diagram of the search system 1 according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the user terminal 10 in the search system 1 according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of the management server 20 in the search system 1 according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of the data structure of the feature word database 223 according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of the data structure of the learning model database 224 according to an embodiment of the present invention.
FIG. 6 is a flowchart for explaining the overall processing in the management server 20 of the search system 1 according to an embodiment of the present invention.
FIG. 7 is a flowchart for explaining the details of the process of generating new keyword candidates (feature words) (step S12) in the management server 20 of the search system 1 according to an embodiment of the present invention.
FIG. 8 is a diagram showing an example of how the candidate company list according to an embodiment of the present invention is displayed on the user terminal 10.
FIG. 9 is a flowchart for explaining the details of the similarity calculation process (step S15) in the management server 20 of the search system 1 according to an embodiment of the present invention.
FIG. 10 is a diagram showing an example of how the candidate company list including similarity according to an embodiment of the present invention is displayed on the user terminal 10.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a configuration diagram of the search system 1 according to an embodiment of the present invention. As shown in FIG. 1, the search system 1 according to the present embodiment is composed of a plurality of user terminals 10 and a management server 20, which are communicably connected via a wired or wireless network 30 such as the Internet. In the following, it is assumed that the search system 1 has been introduced at a company that provides a company matching service, including the search for business partner companies (hereinafter, the "management company"). It is further assumed that the search for business partner candidate companies and the various other processes described in detail below are performed as a SaaS (Software as a Service) cloud service.

The user terminal 10 is, for example, the personal computer of a manager of the department in the management company that searches for business partners, or of a person in charge belonging to that department. The management company may receive a request to search for a business partner from a client company (hereinafter, the "ordering company"). In that case, the person in charge at the management company interviews the person in charge at the ordering company about the technology and other capabilities it expects of a business partner, and considers appropriate keywords for the search. The user of the user terminal 10 (the person in charge at the management company or another staff member) then searches for candidate companies using the selected keywords and creates a list of business partner candidate companies. Specifically, the user of the user terminal 10 accesses a website managed by the management server 20 and enters a keyword. The management server 20 searches for candidate companies based on the entered keyword and generates a list of business partner candidate companies. This list can then be displayed on the website accessed from the user terminal 10.

Here, the user terminal 10 need not be installed only at the management company. It may also be installed at the ordering company that receives the business partner search service, and at companies that are candidates to receive orders and with which the ordering company wishes to hold meetings based on the candidate company list (hereinafter, "candidate companies"). In this case, the person in charge at the ordering company can upload its own company information and information related to the business partner search (details of the search, questions to the candidate companies, and the like) from its user terminal 10 to the management server 20. That person can also browse the candidate company list retrieved by the user at the management company. Likewise, the person in charge at a candidate company can upload its own company information and information related to the business partner search (appeal statements to the ordering company, answers to questions from the ordering company, and the like) from its user terminal 10 to the management server 20.

FIG. 2 is a block diagram showing the configuration of the user terminal 10 in the search system 1 according to an embodiment of the present invention. As shown in FIG. 2, the user terminal 10 includes:
- an input unit 11, such as a keyboard or touch panel, with which the user enters keywords, commands, and the like;
- a display unit 12, such as a monitor, that displays data such as the entered keywords and the candidate company list generated by the management server 20;
- a processing unit 13 that performs various processes; and
- a communication unit 14 that communicates with the management server 20 and other devices (not shown) via the network 30.

The processing unit 13 is composed of a CPU, a ROM storing programs, a RAM, and the like, and performs various processes according to the programs stored in the ROM. In the present embodiment, the user terminal 10 is described as a personal computer, but it may instead be a communication terminal such as a smartphone or tablet.

FIG. 3 is a block diagram showing the configuration of the management server 20 in the search system 1 according to an embodiment of the present invention. As shown in FIG. 3, the management server 20 includes:
- a communication unit 21 that communicates with the user terminals 10 and other external devices (not shown) via the network 30;
- a storage unit 22 that stores information such as programs and various data;
- a search unit 23 that searches for business partner candidate companies based on the keyword entered at the user terminal 10;
- a similarity calculation unit 24 that calculates the similarity between the entered keyword and the characteristics of the companies found; and
- a list generation unit 25 that generates a list of business partner candidate companies using the calculated similarity.

The management server 20 further includes:
- a database construction unit 26 that acquires, from the user terminals 10 and other external devices (not shown), information on the technology and other attributes of candidate companies and technical tag information that indicates each company's technological and other characteristics by tagging, and stores them in the storage unit 22 as databases;
- a feature word generation unit 27 that generates feature words indicating the characteristics of each company; and
- a learning model generation unit 28 that generates a learning model of the feature words.

The storage unit 22 includes:
- a company database 221 that stores information on the technology and other attributes of candidate companies;
- a technical tag database 222 that stores the technical tag information;
- a feature word database 223 that stores the feature words; and
- a learning model database 224 that stores the learning model.

In the present embodiment, the management server 20 implements the processing functions of the units described above (the communication unit 21 through the learning model generation unit 28) on a single server, but these functions may instead be distributed across a plurality of servers. Each server may also be configured as a single computer or as a plurality of physically separate computers.

The database construction unit 26 acquires company information from the user terminal 10 or another external device (not shown). This information covers the company profile, business profile, owned technology, and the like of ordering companies and order candidate companies, and is selectively extracted from company websites published on the Internet and from paid or free databases (hereinafter, "public company information"). The database construction unit 26 stores this information in the storage unit 22 in association with a company ID assigned to each company, thereby newly constructing the company database 221 or updating its stored information. For example, the public company information includes published information such as the corporate name (company name), head office location (address), a description of the company profile, a description of the business profile, a description of the owned technology, and the address (URL) of the company website. The public company information also includes text drafted by a person in charge at the management company, based on the published information, that describes the company profile, business profile, or owned technology.

The database construction unit 26 also acquires the following information from the user terminal 10 or another external device (not shown) (hereinafter, "non-public company information"):
- entry information on past and current business partner search projects handled by the management company (information on candidacies from order candidate companies and additional information such as the appeal statements submitted with them);
- information on questions and answers exchanged between ordering companies and candidate companies during past and current business partner searches; and
- information on technologies held by candidate companies that can be provided to outside companies (seed technologies).

It stores this information in the storage unit 22 in association with the company ID, thereby newly constructing the company database 221 or updating its stored information.
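To make the two kinds of company information concrete, here is a hypothetical record layout for the company database 221. Records are keyed by company ID, and an upsert either creates a record or replaces an existing one. The field names, the replace-on-update policy, and the `combined_text` helper are illustrative assumptions. The helper concatenates the text fields in the way the feature word generation unit 27 is described as combining them.

```python
from dataclasses import dataclass, field


@dataclass
class CompanyRecord:
    """One row of a hypothetical company database 221, keyed by company ID."""
    company_id: str
    # Public company information
    name: str = ""
    address: str = ""
    company_profile: str = ""
    business_profile: str = ""
    technology_description: str = ""
    website_url: str = ""
    # Non-public company information
    project_entries: list[str] = field(default_factory=list)  # candidacies, appeal statements
    qa_history: list[tuple[str, str]] = field(default_factory=list)  # (question, answer)
    seed_technologies: list[str] = field(default_factory=list)

    def combined_text(self) -> str:
        """Join the public and non-public description fields into one text blob."""
        parts = [
            self.company_profile,
            self.business_profile,
            self.technology_description,
            *self.project_entries,
            *(q + " " + a for q, a in self.qa_history),
            *self.seed_technologies,
        ]
        return " ".join(p for p in parts if p)


def upsert(db: dict[str, CompanyRecord], record: CompanyRecord) -> None:
    """Create the record if the company ID is new; otherwise replace it (assumed policy)."""
    db[record.company_id] = record
```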

Further, the database construction unit 26 acquires technical tag information from the user terminal 10 or another external device (not shown). This information includes:
- tag information on international and domestic standards for products and services held by the company, such as ISO (registered trademark) (hereinafter, "standard tags");
- tag information on apparatus and equipment for processing and producing products (hereinafter, "equipment tags"); and
- tag information on awards received from national and local governments, industry organizations, and the like (hereinafter, "award tags").

It stores this information in the storage unit 22, thereby newly constructing the technical tag database (technical tag dictionary) 222 or updating its stored information.
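A minimal in-memory sketch of the technical tag database 222 follows. It assumes that tags are attached per company ID and grouped into the three tag types named above. The class and method names are hypothetical.

```python
from collections import defaultdict
from enum import Enum
from typing import Optional


class TagType(Enum):
    STANDARD = "standard"    # e.g. ISO (registered trademark) certifications
    EQUIPMENT = "equipment"  # processing / production equipment
    AWARD = "award"          # awards from governments or industry bodies


class TechnicalTagDB:
    """Hypothetical technical tag database: company ID -> set of (tag type, label)."""

    def __init__(self) -> None:
        self._tags: dict[str, set[tuple[TagType, str]]] = defaultdict(set)

    def add(self, company_id: str, tag_type: TagType, label: str) -> None:
        self._tags[company_id].add((tag_type, label))

    def tags_of(self, company_id: str, tag_type: Optional[TagType] = None) -> list[str]:
        """Sorted labels for one company, optionally filtered by tag type."""
        return sorted(label for t, label in self._tags.get(company_id, set())
                      if tag_type is None or t == tag_type)

    def companies_with(self, tag_type: TagType, label: str) -> list[str]:
        """Sorted IDs of companies carrying the given tag."""
        return sorted(cid for cid, tags in self._tags.items() if (tag_type, label) in tags)
```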

The database construction unit 26 performs two kinds of maintenance: acquiring public and non-public company information and storing it in or updating the company database 221, and acquiring technical tag information and storing it in or updating the technical tag database 222. Both may be performed together with the keyword search and candidate company list generation described later, or independently of them. For example, the database construction unit 26 may acquire public company information, non-public company information, and technical tag information from external devices or the like periodically (such as once every few months) or at any chosen time, and update the information stored in the company database 221 and the technical tag database 222. The public and non-public company information in the company database 221 and the technical tag information in the technical tag database 222 are stored as text data. This makes the feature word generation process, the learning model generation process, and the other processes described later easier to carry out.

The feature word generation unit 27 acquires the public company information and non-public company information of each company from the company database 221 and generates combined text data. It applies morphological analysis and unnecessary-word removal to the combined text data and generates the resulting technology-related terms as feature words. It then stores the feature words in the storage unit 22 in association with the company ID, thereby newly storing or updating the information in the feature word database 223. FIG. 4 is a diagram showing an example of the data structure of the feature word database 223 according to an embodiment of the present invention. In the present embodiment, the feature word database 223 stores, for each company, the feature words generated by the feature word generation unit 27 in association with their scores. For each company, only the highest-scoring feature words are stored, together with their scores. In the present embodiment, 500 feature words are stored per company as an example.
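The generation step can be approximated as follows, for illustration only. Real Japanese text would first go through a morphological analyzer (MeCab is one widely used option). This sketch makes three substitutions instead:
- it assumes space-separated English text and a simple regular-expression tokenizer;
- a hand-made stopword set stands in for unnecessary-word removal;
- an optional vocabulary stands in for the manufacturing dictionary.

`extract_feature_words`, `keep_top`, and the stopword list are hypothetical.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical stopword list standing in for unnecessary-word removal.
STOPWORDS = {"the", "and", "of", "a", "an", "for", "with", "we", "our", "in", "to"}


def extract_feature_words(combined_text: str, stopwords: set[str] = STOPWORDS,
                          vocabulary: Optional[set[str]] = None) -> Counter:
    """Tokenize, drop stopwords, and optionally keep only words in a domain vocabulary."""
    tokens = re.findall(r"[a-z0-9][a-z0-9-]*", combined_text.lower())
    kept = (t for t in tokens
            if t not in stopwords and (vocabulary is None or t in vocabulary))
    return Counter(kept)


def keep_top(scores: dict[str, float], n: int = 500) -> dict[str, float]:
    """Keep the n highest-scoring feature words (500 per company in the embodiment)."""
    top = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]
    return dict(top)
```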

Next, the feature word score will be described. In the present embodiment, the score of a feature word for a given company is obtained in two steps. First, take the ratio of the frequency with which the feature word appears among the feature words generated from that company's combined text data to the frequency with which it appears in all the text data stored in the company database 221. Second, multiply that ratio by a predetermined coefficient. The present embodiment is intended in particular for effective business partner searches centered on the manufacturing industry. Many words specific to the manufacturing industry are therefore stored as feature words, so the search system includes a dictionary for the manufacturing industry.
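Under one reading of the definition above, the score of feature word w for company c is score_c(w) = k × n_c(w) / N(w), where:
- n_c(w) is the number of occurrences of w in company c's combined text;
- N(w) is its number of occurrences across all text in the company database 221;
- k is the predetermined coefficient.

The sketch below computes this from per-company counts. The value of k (100 here) and the function name are assumptions.

```python
from collections import Counter


def feature_word_scores(company_counts: dict[str, Counter],
                        coefficient: float = 100.0) -> dict[str, dict[str, float]]:
    """score_c(w) = coefficient * n_c(w) / N(w) for every company c and word w."""
    # N(w): occurrences of each word summed over all companies' combined texts.
    corpus_counts: Counter = Counter()
    for counts in company_counts.values():
        corpus_counts.update(counts)
    return {
        company_id: {w: coefficient * n / corpus_counts[w] for w, n in counts.items()}
        for company_id, counts in company_counts.items()
    }
```

Under this reading, a word that occurs in only one company's text always receives the maximum score k, however rarely it occurs.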

The learning model generation unit 28 works on feature words contained in published technical literature of each manufacturing company, such as patent documents, corporate research and development reports, and technology-related news (hereinafter, "technical documents"). For these feature words, it performs deep learning that predicts one feature word from a plurality of surrounding feature words and generates a distributed representation that expresses each feature word as a high-dimensional vector. It then stores these distributed representations in the learning model database 224 as a learning model. FIG. 5 is a diagram showing an example of the data structure of the learning model database 224 according to an embodiment of the present invention. In the present embodiment, the learning model generation unit 28 generates the distributed representation of each feature word based on the feature words stored in the feature word database 223. The learning model database 224 stores each such representation, feature word by feature word, as numerical values in vector-dimension order. As described later, in the present embodiment each feature word is represented by a high-dimensional vector of at least several hundred dimensions (specifically, a 500-dimensional vector), so 500 numbers are stored per feature word. However, the number of vector dimensions is not limited to this, and the system may be designed so that it can be changed as appropriate. The details of the process of generating the learning model database 224 will be described later.

With reference to the drawings, the following describes how the management server 20 according to an embodiment of the present invention generates a list of business partner candidate companies by keyword search. In the following embodiment, the search system 1 is deployed at a management company that provides a company matching service. When an ordering company places an order for a company search service, a user at the management company runs a keyword search on an in-house user terminal 10. The management server 20 then carries out a series of search processes that generate a list of candidate companies.

FIG. 6 is a flowchart of the overall processing in the management server 20 of the search system 1 according to an embodiment of the present invention. Step S11 covers newly building, and updating the stored information of, the company database 221, the technical tag database 222, the feature word database 223, and the learning model database 224 in the storage unit 22. This step can run independently of the keyword search performed by the user of the user terminal 10: at the same time as the search, before it, or after it. For ease of explanation, however, the following embodiment treats it as being performed before the keyword search.

[Building and updating the company database 221]
Before any keyword search from the user terminal 10, the database construction unit 26 of the management server 20 acquires the public and private company information of candidate companies. The information comes from the user terminal 10 or another external device (not shown). For each company, the unit either stores the information in the company database 221 as new entries or updates the existing entries.

In the present embodiment, the public company information is technical and other information about each company, acquired as text data from sources accessible over the Internet, such as the company's website, press releases, and news articles. When published company information is in a format such as HTML, it can be converted into text data by removing unnecessary parts such as headers and footers with a known method. The acquisition timing is not particularly limited. The data can be uploaded to the management server 20 from the administrator's user terminal 10 or another external device (not shown) continuously, periodically, or at any chosen time. The database construction unit 26 then stores the data in the company database 221 as new entries or updates the stored information.
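The patent does not say which "known method" is used to strip headers and footers from HTML. The sketch below is one simple possibility built on Python's standard `html.parser` module. It assumes that boilerplate lives inside `header`, `footer`, `nav`, `script` and `style` elements; that tag set, and the names `BodyTextExtractor` and `html_to_text`, are hypothetical.

```python
from html.parser import HTMLParser

# Elements treated as boilerplate (an illustrative assumption, not from the patent).
SKIP_TAGS = {"header", "footer", "nav", "script", "style"}

class BodyTextExtractor(HTMLParser):
    """Collects text content, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)

def html_to_text(html: str) -> str:
    """Return the page's non-boilerplate text joined by single spaces."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)
```

Real company sites often mark up navigation in other ways, for example with `div` class names. A production extractor would therefore need more rules than this tag list.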

In the present embodiment, the private company information is entered by the person in charge at each candidate company. That person accesses the management server 20 from their own user terminal 10 and enters the data during a period set for each business partner search project, and the data is uploaded to the management server 20. For example, after a business partner search project has been presented to candidate companies, a one-month period can be set. Only during that period are answers from prospective suppliers to the ordering company's questions (that is, private company information) accepted and data entry enabled. Alternatively, the period may be the entire duration of each business partner search project. Entry may also be allowed at any time with no fixed period, with the administrator able to enable or disable entry whenever they choose. This process is controlled by the database construction unit 26 of the management server 20. When private company information entered at a user terminal 10 is uploaded, it is stored in the company database 221 as a new entry or used to update the existing information. The private company information entered includes, for example:
- the company's original information, such as the candidate company's latest technologies and statements promoting its technical capabilities to the ordering company;
- answers to questions from the ordering company.

[Building and updating the technical tag database 222]
Also before any keyword search from the user terminal 10, the database construction unit 26 of the management server 20 acquires technical tag information from the user terminal 10 or another external device (not shown). This information is the basis for tagging the latent technical features and other attributes of each company. The unit stores it in the technical tag database 222 as new entries or updates the existing entries. In the present embodiment, the technical tag information is acquired, and the technical tag database 222 is built, at the same time as, before, or after the company database 221 is built. As described above, the technical tag information includes standard tags, equipment tags, and award history tags.

The generated candidate company list can display each company ID together with its technical tag information as supplementary information. The technical tag items cover the company's international standards, facilities and equipment, and award history. By referring to them, the user can narrow down candidate companies and compare their latent technical capabilities at a glance. For example, suppose a company's standard tag field lists various international standards or it has an award history. At a glance, such a company appears to care about the quality of its products and services, and its technical capability seems high enough to have been recognized by other organizations. Likewise, if a company's equipment tag field lists various device and facility names, a glance at those names suggests whether it can handle prototyping, mass production, and similar work. As another example, suppose a business partner must have a clean room. "Clean room" may be among a company's feature words, but if its score is low it may not be displayed on the screen of the user terminal 10 when the company is listed. Even then, the user can confirm at a glance from the equipment tag that the company has a clean room. Because international standards, award history, and similar items are displayed, the user can also infer the company's production system and reliability. Even a user without specialized knowledge can identify a company's characteristics simply by glancing at its technical tags.

[Building and updating the feature word database 223]
The feature word generation unit 27 of the management server 20 generates feature words and their scores for each company from the company information (text data) in the company database 221. It stores them in the feature word database 223 as new entries or updates the existing entries.

The feature word generation process in the feature word generation unit 27 is described in detail here. In the present embodiment, feature words are generated from the company information and other data stored in the company database 221, and stored in the feature word database 223, before processing such as the keyword search described later. This process could instead run at the time of the keyword search. In the present embodiment, however, it is independent of the keyword search: feature words are stored in the feature word database 223 in advance, and the stored feature words are updated successively and independently.

The feature word generation unit 27 first retrieves the company information as text data (public and private company information) from the company database 221. It then removes unnecessary words from that information and extracts feature words. Specifically, it performs morphological analysis, which splits the text into morphemes and identifies each one. For this it uses a technical dictionary (not shown) listing technical terms and an attribute dictionary (not shown) listing grammatical and other attributes of words, among other resources. Technical terms are then extracted from the result. At this stage, synonyms and spelling variants (for example, the Japanese 「モーター」 and 「モータ」, both meaning "motor") are merged into a single word. For unnecessary word removal, an unnecessary word dictionary (not shown) is prepared, and particles, non-technical nouns, and similar words are removed from the segmented text. The words remaining after removal are treated as technical terms, and a feature quantity (the score described later) is calculated for them.

Some extracted technical terms contain abstract words that have become generic through common use in describing technical features in business partner searches, for example "device" in "○○ device". The feature word generation unit 27 removes such abstract words. It stores what remains (for example, the "○○" part of "○○ device") in the feature word database 223 as a feature word, either as a new entry or as an update to the existing information.
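The filtering steps above can be sketched in a few lines. This sketch does not perform the morphological analysis itself. It assumes the text has already been segmented into tokens, for example by a separate Japanese morphological analyzer. The tiny dictionaries below are hypothetical stand-ins for the patent's technical, unnecessary-word, and variant dictionaries, which are not shown.

```python
# Hypothetical stand-ins for the dictionaries described in the text.
VARIANTS = {"モーター": "モータ"}                # spelling variant -> canonical form
STOP_WORDS = {"の", "を", "we", "company"}       # particles and non-technical words
TECH_TERMS = {"モータ", "resin molding", "laser processing device"}  # technical dictionary
ABSTRACT_SUFFIXES = (" device",)                 # generic words stripped from a term

def extract_feature_words(tokens):
    """Turn pre-segmented tokens into feature words (illustrative sketch)."""
    features = []
    for tok in tokens:
        tok = VARIANTS.get(tok, tok)     # merge synonyms and spelling variants
        if tok in STOP_WORDS:            # unnecessary-word removal
            continue
        if tok not in TECH_TERMS:        # keep technical terms only
            continue
        for suffix in ABSTRACT_SUFFIXES:  # drop abstract words such as "device"
            if tok.endswith(suffix) and len(tok) > len(suffix):
                tok = tok[: -len(suffix)]
        features.append(tok)
    return features
```

Given the tokens `["we", "モーター", "の", "laser processing device", "resin molding", "schedule"]`, the sketch keeps `モータ` (the merged variant), `laser processing` (with "device" stripped), and `resin molding`. It drops `schedule` because that word is not in the technical dictionary.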

In the present embodiment, each feature word is given a scalar feature quantity called a score, which indicates how characteristic that word is of the text. This embodiment uses the TF-IDF (Term Frequency-Inverse Document Frequency) value as the score:
- the TF-IDF value is the product of the TF value and the IDF value;
- the TF value is the frequency with which a word appears in a company's text data;
- the IDF value is the logarithm of the reciprocal of the proportion of all companies whose text data contains the word.

The score indicates how important a word is within a document, and so can be used to identify what characterizes the document. Put simply, a word that appears many times in one document but in few other documents is an important word that characterizes that document. If words were simply sorted by descending TF-IDF value, however, non-technical words would also rank near the top. Therefore, as described above, a technical term dictionary is prepared. Only the scores of words in that dictionary are multiplied by a factor, and the words are then stored in the feature word database 223.
The score measure and its dimensionality are not limited to the above, and may be changed as appropriate for the nature and purpose of the system being implemented.
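The TF-IDF scoring with a dictionary boost can be sketched as below. The sketch makes these assumptions:
- TF is taken as the word's count divided by the company's total token count;
- IDF is the natural log of (number of companies / number of companies whose text contains the word), with no smoothing;
- the boost factor is 10.

None of these choices is specified in the passage, and `tfidf_scores` is a hypothetical name.

```python
import math
from collections import Counter

def tfidf_scores(docs, tech_dictionary, factor=10.0):
    """docs: {company_id: [feature-word tokens]} -> {company_id: {word: score}}."""
    n_companies = len(docs)
    # Document frequency: the number of companies whose text contains each word.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    scores = {}
    for company_id, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        company_scores = {}
        for word, n in counts.items():
            tf = n / total                            # relative frequency in this company's text
            idf = math.log(n_companies / df[word])    # log of the inverse document-frequency ratio
            score = tf * idf
            if word in tech_dictionary:
                score *= factor                       # boost technical-dictionary terms only
            company_scores[word] = score
        scores[company_id] = company_scores
    return scores
```

Any word that appears in every company's text, such as "delivery" in the test data, gets an IDF of log(1) = 0. Its score is therefore zero however often it appears, which matches the rule quoted above. For comparison, library implementations such as scikit-learn's `TfidfVectorizer` apply IDF smoothing by default, so their numbers will differ from this unsmoothed form.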

[Building and updating the learning model database 224]
The learning model generation unit 28 of the management server 20 performs deep learning in which each feature word in technical documents on the manufacturing industry is predicted from the feature words around it. This produces a learning model (trained model) that represents each feature word as a high-dimensional distributed vector. For each feature word, the unit stores the model in the learning model database 224 as a new entry or updates the stored information.

For the high-dimensional distributed representation of feature words, a word embedding can be used: characters and words are embedded in a vector space, and each is treated as a point in that space. In the present embodiment, the word vector space is an embedding space trained with Word2Vec. The vector space representation is not limited to Word2Vec; other vector space models such as Doc2Vec or TF-IDF may be used instead.

The deep learning in the learning model generation unit 28 uses a multilayer neural network with an input layer, a hidden (intermediate) layer, and an output layer. Each word in a sentence is treated in turn as the "center word", and the words around it are its "context words". The input layer receives the context words converted to one-hot vectors, and the output layer receives the center word converted to a one-hot vector. The network is trained to predict the center word from its context words. The present embodiment uses, for example, the CBOW (Continuous Bag of Words) method, a supervised learning method that predicts a center word from the words around it. Through this training, the network learns which words are likely to appear around a given word. Once training is complete, a distributed representation of each word is obtained: it is the weight matrix from the input layer to the hidden layer. The hidden layer typically has between 100 and 1,000 dimensions; the present embodiment uses 500.
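The CBOW training described above can be illustrated with a deliberately small pure-Python sketch, built under the following assumptions:
- the input is tokenized sentences of feature words;
- the context window is one word on each side;
- the hidden layer averages the context words' vectors;
- the output is a full softmax over the vocabulary.

`train_cbow` and all of its parameters are hypothetical. `dim` is kept small for speed, instead of the 500 dimensions used in the embodiment. After training, row `i` of the input-to-hidden matrix `W_in` is the distributed representation of `vocab[i]`.

```python
import math
import random

def train_cbow(sentences, dim=8, window=1, lr=0.1, epochs=200, seed=0):
    """Train a tiny CBOW model with a full softmax output layer.

    Returns (vocab, W_in, losses). W_in[i] (input-to-hidden weights) is the
    learned vector for vocab[i]; losses holds the mean cross-entropy per epoch.
    """
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    W_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]
    W_out = [[rng.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(dim)]

    # (context word indices, center word index) pairs from a sliding window.
    pairs = []
    for s in sentences:
        ids = [index[w] for w in s]
        for pos, center in enumerate(ids):
            lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
            ctx = [ids[j] for j in range(lo, hi) if j != pos]
            if ctx:
                pairs.append((ctx, center))

    losses = []
    for _ in range(epochs):
        total = 0.0
        for ctx, center in pairs:
            # Hidden layer: average of the context words' rows of W_in
            # (equivalent to multiplying their one-hot vectors by W_in).
            h = [sum(W_in[c][k] for c in ctx) / len(ctx) for k in range(dim)]
            # Output layer: a score for every vocabulary word, then softmax.
            z = [sum(h[k] * W_out[k][v] for k in range(dim)) for v in range(V)]
            zmax = max(z)
            exps = [math.exp(x - zmax) for x in z]
            norm = sum(exps)
            p = [x / norm for x in exps]
            total -= math.log(p[center])
            # Cross-entropy gradient w.r.t. the scores: p minus one-hot(center).
            g = p[:]
            g[center] -= 1.0
            # Backpropagate to the hidden layer before W_out is modified.
            gh = [sum(g[v] * W_out[k][v] for v in range(V)) for k in range(dim)]
            for k in range(dim):
                for v in range(V):
                    W_out[k][v] -= lr * h[k] * g[v]
            for c in ctx:
                for k in range(dim):
                    W_in[c][k] -= lr * gh[k] / len(ctx)
        losses.append(total / len(pairs))
    return vocab, W_in, losses
```

In practice, a library implementation would normally replace hand-written training like this. One example is gensim's `Word2Vec`, with `sg=0` to select CBOW and `vector_size=500`. The sketch is meant only to show where the distributed representation comes from.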

As described above, the search system 1 according to the present embodiment is trained on technical documents: technology-related public literature such as patent documents, corporate research and development reports, and technology news. Two kinds of documents are excluded from training. The first is documents in which technical terms are likely to be misused, such as general news articles. The second is documents full of jargon and expressions that only specialists in the field understand, such as academic papers. Restricting the training documents in this way tailors the machine learning so that the system is well suited to finding business partners in the manufacturing industry, the so-called monozukuri ("making things") field.

The learning model generation process may run immediately before the similarity calculation process described later. In the present embodiment, however, it is independent of the similarity calculation. The learning model is generated in advance and stored in the learning model database 224, and the stored model is updated periodically (for example, once a quarter).

The system may also hold several trained models, so that the learning model database can be chosen to suit the user's purpose. This makes it possible, for example, to handle the different nuances of the feature word "motor" in the automotive industry and in the materials and metals industry. The manufacturing industry may also be divided into several narrower sectors, each with its own database. Alternatively, databases may be provided for several industries including manufacturing. Either way, the database used can be chosen according to the content of the business partner search.

[Presenting keyword candidates (feature words)]
When business partner candidates are listed by keyword search, the user usually has to enter several different keywords, at once or one after another. The search results are narrowed down in this way until suitable candidates are listed. In the present embodiment, the user therefore enters a first keyword at the user terminal 10. The system then lists companies that meet either of two conditions: their company information in the company database 221 contains text matching the keyword, or the feature word database 223 holds a feature word for them that corresponds to the keyword (step S12). The management server 20 generates this list of candidate companies from the entered keyword and includes several feature words for each candidate. Through the user terminal 10, these feature words suggest keyword candidates the user can use to narrow down the candidates in a later search. FIG. 7 is a flowchart detailing the process (step S12) of generating new keyword candidates (feature words) in the management server 20 of the search system 1 according to an embodiment of the present invention.

The user of the search system 1 enters a keyword at the user terminal 10, which sends it to the management server 20 as a search instruction. The present embodiment uses, as its example, the case where "resin molding" is entered as the first keyword. The management server 20 acquires the keyword "resin molding" entered at the user terminal 10 through the communication unit 21 (step S121).

Next, the search unit 23 searches the feature word database 223 in the storage unit 22 for companies that have the feature word "resin molding", identical to the acquired keyword. It then acquires the company IDs of the matching companies (step S122). In the present embodiment, for example, the company IDs "5398, 36588, 34589, …" of the companies that have the feature word "resin molding" shown in FIG. 4 match. The IDs of companies without that feature word do not match.

Next, the list generation unit 25 generates a candidate company list (step S123). For each company whose ID the search unit 23 found, the list contains that company's highest-scoring feature words and their scores. The list is displayed on the screen of a user terminal 10 that has accessed the website managed by the management server 20. FIG. 8 shows an example of the candidate company list displayed on the user terminal 10 according to an embodiment of the present invention. As shown in FIG. 8, the user terminal 10 displays the entered keyword "resin molding" and, with their scores, other feature words that could serve as keywords, such as "liquid crystal", "sheet metal", and "infrared". The user can then pick another feature word, "prototype", as a second keyword to combine with "resin molding" in a search.

The company list generated in step S123 can carry more than company IDs and feature words when it is presented on the screen of the user terminal 10. For the user's convenience, the list generation unit 25 may refer to the company database 221 and add the company name, address, URL, business overview, and other details associated with each company ID.

The first keyword ("resin molding" in the example above) and another displayed feature word ("prototype" in the example above) may or may not be similar. The user of the user terminal 10 simply picks a second keyword from the displayed feature words to suit the search purpose, for example the perspectives from which to search. Whether the keywords are similar to each other is not considered at this point.

In the present embodiment, as an example, the 10 highest-scoring feature words of each company are displayed in descending order of score. This lets the user discover unexpected keyword candidates in the first search. The feature word identical to the user's first keyword, "resin molding", may score too low to be among a company's top 10. In that case it is not displayed, and the company's 10 highest-scoring other feature words are shown instead.
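Steps S122 and S123 can be sketched as follows. The feature word database 223 is replaced here by a plain dictionary mapping each company ID to that company's `{feature word: score}` map. The function names, the company IDs "A" and "B", and the scores are made up for illustration and are not taken from FIG. 4 or FIG. 8.

```python
def find_companies(feature_db, keyword):
    """Step S122 sketch: IDs of companies that hold the keyword as a feature word."""
    return [cid for cid, words in feature_db.items() if keyword in words]

def candidate_list(feature_db, company_ids, top_n=10):
    """Step S123 sketch: each company's top_n feature words, highest score first."""
    rows = []
    for cid in company_ids:
        ranked = sorted(feature_db[cid].items(), key=lambda kv: kv[1], reverse=True)
        rows.append((cid, ranked[:top_n]))
    return rows
```

With `top_n=1`, company "B" in the test data shows only "sheet metal", even though it was found through "resin molding". This is the case described above, where the searched keyword falls outside a company's displayed top words.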

The feature words displayed on the user terminal 10 can also be limited to reduce clutter on the display screen. One option is to set how many feature words are shown per company, such as the top X by score, where X is a natural number. Another is to show every feature word whose score is at or above a threshold.
Through the above processing, the user decides to list business partner candidates using the two keywords "resin molding" and "prototype".

As a variation on the above embodiment, step S122 can also search raw text. The search unit 23 may search each company's text data (public and private company information) stored in the company database 221 in the storage unit 22 for text identical to the acquired keyword, and acquire the company IDs of the matching companies. In the present embodiment, each prospective supplier keeps at most 500 feature words, so rarely occurring, low-scoring feature words are not retained. A keyword entered at the user terminal 10 may therefore be missing from the feature word database 223 even though it appears in the text data of the company database 221. With this variation, promising keyword candidates for later searches can still be presented in that case. As in step S123 above, the list generation unit 25 generates a company list for the prospective suppliers whose text the search unit 23 found to match the entered keyword. The list contains the feature words held by those suppliers, but the keyword itself is not among them. Even so, the user of the user terminal 10 can easily pick, from the listed feature words, one or more keywords that seem suitable for the next search.
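The variation above, which matches on either stored feature words or raw company text, can be sketched with two hypothetical in-memory stand-ins:
- `feature_db` stands in for database 223 and maps each company ID to its `{feature word: score}` map;
- `text_db` stands in for database 221 and maps each company ID to its combined text.

A plain substring test is assumed for "identical text". The function name and sample data are illustrative only.

```python
def search_companies(keyword, feature_db, text_db):
    """Return IDs of companies whose feature words or combined text contain the keyword.

    Every company is assumed to have an entry in text_db (the company database);
    feature_db may lack words that were pruned by the per-company word limit.
    """
    hits = []
    for cid, text in text_db.items():
        if keyword in feature_db.get(cid, {}) or keyword in text:
            hits.append(cid)
    return hits
```

A plain substring test also matches longer words that merely contain the keyword. A real system would more likely match against the same morpheme segmentation used to build the feature words.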

After the processing so far, the databases hold the following:
- the company database 221 stores company information, including public and private company information;
- the technical tag database 222 stores technical tag information, including equipment tags, standard tags, and award history tags;
- the feature word database 223 stores each company's feature words and their scores;
- the learning model database 224 stores the trained model of the feature words.

When the user of the user terminal 10 enters one keyword, the management server 20 performs the processing described above and other keyword candidates appear on the user terminal 10. The user can then choose more suitable keywords for searching for business partner candidates.

The following embodiment describes how the user enters two keywords and runs an AND search to obtain a list of business partner candidate companies. For example, the user of the user terminal 10 types the two keywords into the input unit 11 separated by a space, as in 「樹脂成型 試作」 ("resin molding" and "prototype").

When the management server 20 acquires the two keywords entered at the user terminal 10 (step S13), the search unit 23 proceeds as in step S122 above. It searches the feature word database 223 for companies whose feature words are identical to the acquired keywords and acquires their company IDs (step S14). Here, for example, the search looks for companies that have feature words identical to both entered keywords (resin molding and prototype). In the present embodiment, the company IDs "5398, 34589, …" of the companies that have both feature words "resin molding" and "prototype" shown in FIG. 4 match. The company IDs "36588, …" do not match: those companies have the feature word "resin molding" but not "prototype", which is assumed not to be among their 500 stored feature words.
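The AND search of step S14 amounts to a subset test on each company's stored feature words. The sketch below assumes the query string is split on whitespace. `str.split()` with no arguments also splits on the full-width space U+3000 often typed in Japanese input. Both function names and the sample data are hypothetical.

```python
def parse_query(query):
    """Split a query such as "樹脂成型 試作" into keywords (half- or full-width spaces)."""
    return query.split()

def and_search(feature_db, keywords):
    """Step S14 sketch: IDs of companies whose stored feature words include every keyword."""
    required = set(keywords)
    return [cid for cid, words in feature_db.items() if required.issubset(words)]
```

Iterating over a dictionary yields its keys, so `required.issubset(words)` checks the keywords against each company's feature words directly.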

[Similarity calculation processing]
Next, the similarity calculation unit 24 calculates the similarity between the two acquired keywords and each retrieved company (step S15). FIG. 9 is a flowchart detailing the similarity calculation process (step S15) in the management server 20 of the search system 1 according to an embodiment of the present invention.

For each retrieved company, the similarity calculation unit 24 acquires the distributed representations (high-dimensional vector representations) of the company's feature words from the learning model database 224 and generates a composite distributed representation (step S151). In this embodiment, the composite distributed representation is obtained by adding together the high-dimensional vectors assigned to the feature words held by the company. Specifically, for each retrieved company, the distributed representations of all of its feature words (500 per company in this embodiment) are summed. The distributed representations may be summed without weighting; alternatively, because higher-scoring feature words can be regarded as representing the company's characteristics more strongly, they may be given larger weights than lower-scoring feature words before summation.
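The composite representation of step S151 can be sketched as a vector sum. This assumes embeddings are plain lists of floats keyed by feature word, and it implements the optional weighting by simply multiplying each vector by its score; the source says only that higher-scoring words get a higher weight, so this particular weighting, and the names `company_composite` and `weight_by_score`, are illustrative assumptions.

```python
from typing import Dict, List

Vector = List[float]


def company_composite(
    feature_scores: Dict[str, float],
    embeddings: Dict[str, Vector],
    weight_by_score: bool = False,
) -> Vector:
    """Sum the distributed representations of one company's feature words.

    With weight_by_score=True each vector is scaled by the word's score, so
    higher-scoring feature words contribute more (one possible weighting).
    Feature words without a trained vector are skipped.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for word, score in feature_scores.items():
        vec = embeddings.get(word)
        if vec is None:
            continue
        weight = score if weight_by_score else 1.0
        for i, x in enumerate(vec):
            total[i] += weight * x
    return total


# Toy 2-dimensional embeddings; real distributed representations are high-dimensional.
emb = {"resin molding": [1.0, 0.0], "prototype": [0.0, 1.0]}
scores = {"resin molding": 0.5, "prototype": 0.25}
print(company_composite(scores, emb))                        # [1.0, 1.0]
print(company_composite(scores, emb, weight_by_score=True))  # [0.5, 0.25]
```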

Next, the similarity calculation unit 24 acquires from the learning model database 224 the distributed representations of the feature words corresponding to the two keywords "resin molding" and "prototype" entered on the user terminal 10, and likewise generates a composite distributed representation from them (step S152). The two distributed representations may simply be summed, or the feature word corresponding to the first keyword may be given a larger weight than the feature words corresponding to the second and any later keywords (when three or more keywords are entered).
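Step S152 can be sketched the same way for the input keywords. The sketch assumes a single multiplier, the hypothetical parameter `first_weight`, applied only to the first keyword's vector; the source leaves the actual weighting scheme open.

```python
from typing import Dict, List

Vector = List[float]


def keyword_composite(
    keywords: List[str],
    embeddings: Dict[str, Vector],
    first_weight: float = 1.0,
) -> Vector:
    """Combine the distributed representations of the input keywords' feature words.

    first_weight > 1.0 emphasizes the first keyword over the second and later ones;
    first_weight = 1.0 is a plain sum.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    missing = [k for k in keywords if k not in embeddings]
    if missing:
        raise KeyError(f"no trained distributed representation for: {missing}")
    total = [0.0] * len(embeddings[keywords[0]])
    for idx, kw in enumerate(keywords):
        weight = first_weight if idx == 0 else 1.0
        for i, x in enumerate(embeddings[kw]):
            total[i] += weight * x
    return total


emb = {"resin molding": [1.0, 0.0], "prototype": [0.0, 1.0]}
print(keyword_composite(["resin molding", "prototype"], emb))                    # [1.0, 1.0]
print(keyword_composite(["resin molding", "prototype"], emb, first_weight=2.0))  # [2.0, 1.0]
```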

The similarity calculation unit 24 then calculates the similarity between the company's composite vector generated in step S151 and the keywords' composite vector generated in step S152 (step S153). In this embodiment, cosine similarity is used as the similarity measure.
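For reference, the cosine similarity used in step S153 is the dot product of the two composite vectors divided by the product of their lengths. A dependency-free sketch follows; returning 0.0 for an all-zero vector is an assumption the source does not address.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; returns 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


company_vec = [1.0, 1.0]   # e.g. output of step S151
keyword_vec = [2.0, 1.0]   # e.g. output of step S152
print(round(cosine_similarity(company_vec, keyword_vec), 4))  # 0.9487
```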

The list generation unit 25 then generates a company list containing the names of the companies retrieved by the search unit 23 and the similarity of each company calculated by the similarity calculation unit 24 (step S16). The generated company list is output in a format that can be displayed in the Web browser of the user terminal 10 that accessed the management server 20. FIG. 10 shows an example of a candidate company list including similarities, as displayed on the user terminal 10 according to an embodiment of the present invention. As shown in FIG. 10, the user terminal 10 displays each candidate company name together with its similarity. As soon as the candidates are listed, the similarity therefore indicates whether each company is likely to be suitable as a business partner. Unlike conventional approaches, the user does not need to check every listed candidate: checking only the top-ranked companies, or only those whose similarity is at or above a threshold, reduces the effort and cost after the list is produced. The search results (output information) can also be standardized, and additional company information reflecting the user's search purpose can be provided. For the user's convenience, when displaying the company list as shown in FIG. 8, the list generation unit 25 may refer to the company database 221 and include in the list the company name, address, URL, business outline, and other information associated with each company ID.
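Step S16 and the "top-ranked or above-threshold" review can be sketched as a sort plus optional filters. The dataclass `CompanyListEntry`, the parameters `threshold` and `top_n`, and the company names are hypothetical; the address, URL, and business outline from the company database 221 could be joined in the same way as the name.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CompanyListEntry:
    company_id: int
    name: str
    similarity: float


def build_company_list(
    similarities: Dict[int, float],
    company_names: Dict[int, str],
    threshold: Optional[float] = None,
    top_n: Optional[int] = None,
) -> List[CompanyListEntry]:
    """Rank candidates by similarity (descending), optionally keeping only
    those at or above `threshold` and/or only the first `top_n`."""
    entries = [
        CompanyListEntry(cid, company_names.get(cid, f"(unknown {cid})"), sim)
        for cid, sim in similarities.items()
        if threshold is None or sim >= threshold
    ]
    entries.sort(key=lambda e: e.similarity, reverse=True)
    return entries if top_n is None else entries[:top_n]


sims = {5398: 0.81, 34589: 0.64, 40001: 0.30}
names = {5398: "Company A", 34589: "Company B", 40001: "Company C"}
for e in build_company_list(sims, names, threshold=0.5):
    print(e.name, e.similarity)
# Company A 0.81
# Company B 0.64
```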

If the ordering company's requirements for the business partner search include items such as held standards, held equipment, or whether a specific award has been received, these can be treated as important factors in the search: after the candidate companies are listed, the list generation unit 25 can refer to the technical tag information and exclude from the list any company that does not hold the corresponding technical tags. Alternatively, companies that hold the equipment or other items named in the requirements can be regarded as better matches for the project and displayed higher in the list.
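Both options described above (excluding companies without the required technical tags, or promoting companies that hold them) can be sketched as a post-processing step on an already ranked list. The function name, the `mode` parameter, and the tag strings are hypothetical examples.

```python
from typing import Dict, Iterable, List, Set


def apply_tag_requirements(
    ranked_ids: List[int],
    company_tags: Dict[int, Set[str]],
    required_tags: Iterable[str],
    mode: str = "exclude",
) -> List[int]:
    """Apply technical-tag requirements to an already ranked candidate list.

    mode="exclude": drop companies missing any required tag.
    mode="promote": move companies holding all required tags to the front,
    keeping the original order within each group.
    """
    required = set(required_tags)

    def satisfies(cid: int) -> bool:
        return required <= company_tags.get(cid, set())

    if mode == "exclude":
        return [cid for cid in ranked_ids if satisfies(cid)]
    if mode == "promote":
        return [c for c in ranked_ids if satisfies(c)] + [c for c in ranked_ids if not satisfies(c)]
    raise ValueError(f"unknown mode: {mode}")


tags = {1: {"ISO 9001"}, 2: set(), 3: {"ISO 9001", "5-axis machining center"}}
print(apply_tag_requirements([1, 2, 3], tags, ["ISO 9001"]))                  # [1, 3]
print(apply_tag_requirements([1, 2, 3], tags, ["ISO 9001"], mode="promote"))  # [1, 3, 2]
```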

[Acquisition of private company information and storage in the company database 221]
Used as a rating on its own, the similarity described above may not be reliable enough for a business matching system, because it is based only on keywords drawn from public information and similar sources. In this embodiment, therefore, information entered for project cases in past business partner searches, together with additional information such as the fact that an ordering company and a candidate supplier actually proceeded to a meeting, is stored in the company database 221 as private company information, and feature words are generated from combined text data that includes this information. This yields a more reliable search system.

As described above, private company information, and not only public company information, plays an important role in the search for business partner candidate companies in the search system 1 according to this embodiment.
The following describes in detail how the database construction unit 26 acquires private company information and stores it in the company database 221.

In this embodiment, during a predetermined period only, the user terminal of an ordering company or the user terminal 10 of a candidate supplier can access the management server 20 of the managing company's search system 1 to enter or upload private company information. For example, once a business partner search project starts, the person in charge at a candidate supplier can enter answers to the project from the user terminal 10 through a Web browser interface managed by the management server 20, but only during the one-month period in which suppliers are being recruited. Alternatively, input may be allowed for the entire duration of each business partner search project, or at any time without a fixed period, with the administrator enabling or disabling input as needed. The database construction unit 26 stores the entered answer information and similar data in the company database 221 as private company information, either as new records or as updates to existing ones.
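The input window and the "store new or update existing" behavior can be sketched as follows. Everything here is a hypothetical model for illustration: the `ProjectWindow` dataclass, the `admin_override` flag standing in for the administrator's control, the project ID format, and the dictionary used in place of the company database 221.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class ProjectWindow:
    """Input period for one business partner search project (hypothetical model)."""
    start: date
    end: date                               # e.g. end of the one-month recruitment period
    admin_override: Optional[bool] = None   # True/False forces open/closed; None uses the dates

    def is_open(self, today: date) -> bool:
        if self.admin_override is not None:
            return self.admin_override
        return self.start <= today <= self.end


def store_private_answer(
    private_db: Dict[int, Dict[str, str]],
    company_id: int,
    project_id: str,
    answer: str,
    window: ProjectWindow,
    today: date,
) -> bool:
    """Store (or overwrite) a company's answer for a project if input is open."""
    if not window.is_open(today):
        return False
    private_db.setdefault(company_id, {})[project_id] = answer  # new record or update
    return True


db: Dict[int, Dict[str, str]] = {}
w = ProjectWindow(start=date(2021, 7, 1), end=date(2021, 7, 31))
print(store_private_answer(db, 5398, "P-001", "Resin prototypes in two weeks.", w, date(2021, 7, 15)))  # True
print(store_private_answer(db, 5398, "P-001", "Late answer.", w, date(2021, 8, 2)))                     # False
```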

Public company information is generally limited to material such as business outlines and service descriptions, and often says little about a company's core technical capabilities. By contrast, the answers submitted in individual business partner search projects contain proposals, including self-promotional statements from candidate suppliers tailored to the project's recruitment outline, and therefore provide more specialized technical information about each company that reflects its particular circumstances as well as technology and development trends. Feature words generated from this information (private company information) can therefore be given larger weights than feature words generated from public company information, so that the search system makes more effective use of information obtained in similar past business partner searches. Specifically, the system may give larger weights to feature words corresponding to text that is repeated many times in the answer information.
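One simple way to realize this source-dependent weighting is to count each occurrence in private answer text as worth more than an occurrence in public text before scores are computed. The multiplier `private_weight=2.0` and the function name are assumptions; the source does not give a concrete weighting formula. Because the weight applies per occurrence, terms repeated many times in the answers accumulate proportionally more weight.

```python
from collections import Counter
from typing import Dict, Iterable


def source_weighted_counts(
    public_terms: Iterable[str],
    private_terms: Iterable[str],
    private_weight: float = 2.0,
) -> Dict[str, float]:
    """Term counts where each occurrence in private answer text counts
    `private_weight` times as much as an occurrence in public text."""
    counts: Dict[str, float] = {}
    for term, n in Counter(public_terms).items():
        counts[term] = counts.get(term, 0.0) + n
    for term, n in Counter(private_terms).items():
        counts[term] = counts.get(term, 0.0) + private_weight * n
    return counts


public = ["resin molding", "prototype"]
private = ["prototype", "prototype", "mold"]
print(source_weighted_counts(public, private))
# {'resin molding': 1.0, 'prototype': 5.0, 'mold': 2.0}
```

These weighted counts could then replace raw term frequencies when the feature word scores are computed.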

In this way, the search system 1 according to this embodiment does not treat all company information, such as public information, as equivalent text data; instead, it performs a smarter search that takes into account the circumstances and trends of past business partner searches. The results of past searches can thus be used effectively, reducing the user's burden in searching for business partners and effectively identifying candidate companies suitable as business partners.

[Other processing]
In the embodiment described above, the similarity considered is that between the keyword entered for the company search (the keyword acquired by the management server 20 in step S121) and the feature words held by the candidate companies, which are retrieved in step S122 based on that keyword and whose similarity to it is calculated in step S123. Keywords in the neighborhood of the input keyword (keywords similar to it) are not considered. Another embodiment is therefore possible in which, instead of the similarity calculation process (step S123), the similarity between the neighborhood keywords of the input keyword and the feature words held by the candidate companies is calculated.

Specifically, for each company retrieved by the search, the similarity calculation unit 24 calculates the cosine similarity between a vector whose basis consists of keywords highly similar to the distributed representation of the feature word corresponding to the keyword entered on the user terminal 10 and acquired by the management server 20 (vector 1), and a vector whose basis consists of the company's feature words (vector 2).

Vector 1 is built as follows. The similarity calculation unit 24 finds the feature words in the learning model whose distributed representations have high cosine similarity to that of the feature word corresponding to the acquired keyword (for example, "resin molding"). It then takes the feature word corresponding to the keyword together with several other feature words similar to it (with high cosine similarity); vector 1 is the vector composed of these feature words and their respective scores. When two or more keywords are entered, vector 1 is generated by, for example, combining the distributed representations of the corresponding feature words. In this combination, the first keyword may be regarded as the central, more important keyword, and the distributed representation of its feature word may be given a larger weight than those of the feature words corresponding to the second and later keywords.
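A minimal sketch of building vector 1 as a sparse `{feature word: score}` mapping. Two scoring choices here are assumptions, since the source only says the vector consists of feature words "and their respective scores": each input keyword gets score 1.0, and each neighboring feature word gets its cosine similarity to the (optionally first-keyword-weighted) combined keyword representation. The names `build_vector1`, `top_n`, and `first_weight` are hypothetical; word2vec libraries offer similar nearest-neighbor queries.

```python
import math
from typing import Dict, List

Vector = List[float]


def _cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def build_vector1(
    keywords: List[str],
    embeddings: Dict[str, Vector],
    top_n: int = 3,
    first_weight: float = 1.0,
) -> Dict[str, float]:
    """Sparse vector 1: each input keyword with score 1.0, plus the top_n other
    feature words closest to the combined keyword representation, scored by
    their cosine similarity to it."""
    dim = len(embeddings[keywords[0]])
    query = [0.0] * dim
    for idx, kw in enumerate(keywords):
        weight = first_weight if idx == 0 else 1.0
        for i, x in enumerate(embeddings[kw]):
            query[i] += weight * x
    vec = {kw: 1.0 for kw in keywords}
    neighbors = sorted(
        ((w, _cosine(query, v)) for w, v in embeddings.items() if w not in vec),
        key=lambda p: (-p[1], p[0]),  # highest similarity first, ties by word
    )
    for word, sim in neighbors[:top_n]:
        vec[word] = sim
    return vec


emb = {
    "resin molding": [1.0, 0.0],
    "injection molding": [0.9, 0.1],
    "mold": [0.7, 0.7],
    "software": [0.0, 1.0],
}
v1 = build_vector1(["resin molding"], emb, top_n=2)
print(sorted(v1))  # ['injection molding', 'mold', 'resin molding']
```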

Vector 2, whose basis consists of each company's feature words, is a matrix (company matrix) whose elements are pairs of a company's feature word and its TF-IDF value. In this embodiment, each company holds 500 feature words as shown in FIG. 4, and the company matrix is the vector composed of those feature words and their respective scores.
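Vector 2 can be sketched as each company's top-k terms with TF-IDF scores. The sketch uses one common TF-IDF variant (term count divided by the company's total term count, times log(N / df)); the source does not say which variant is used, and libraries such as scikit-learn apply smoothed formulas by default. The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_top_k(docs: Dict[int, List[str]], k: int = 500) -> Dict[int, Dict[str, float]]:
    """Vector 2 for every company: its k highest-scoring terms with TF-IDF scores.

    TF = term count / total terms in the company's text; IDF = log(N / df),
    where N is the number of companies and df the number containing the term.
    A term held by every company therefore scores 0.
    """
    n_docs = len(docs)
    df: Counter = Counter()
    for terms in docs.values():
        df.update(set(terms))
    vectors: Dict[int, Dict[str, float]] = {}
    for cid, terms in docs.items():
        tf = Counter(terms)
        total = len(terms)
        scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        ranked = sorted(scores.items(), key=lambda p: (-p[1], p[0]))[:k]
        vectors[cid] = dict(ranked)
    return vectors


docs = {
    5398: ["resin molding", "resin molding", "prototype"],
    40001: ["prototype", "software"],
}
v2 = tfidf_top_k(docs, k=1)
print({t: round(s, 4) for t, s in v2[5398].items()})  # {'resin molding': 0.4621}
```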

The similarity calculation unit 24 then calculates the similarity between vector 1 and vector 2. Products are taken only between the scores of the same feature word appearing in both vectors, and products between different feature words are ignored; these products are summed, and the similarity is determined from the magnitude of the sum.
Candidate company list generation through the cosine similarity calculation described earlier (step S153) does not weight the feature words of each company, and is therefore effective when a broad list of companies is wanted. Candidate company list generation through the similarity calculation using vectors based on each company's feature words, on the other hand, is effective when looking for companies with a higher likelihood of a good match.
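The shared-feature-word product sum amounts to a dot product of two sparse vectors over the feature-word basis. A sketch follows; normalizing by both vector lengths (giving cosine similarity, as the text mentions for vector 1 and vector 2) is offered as an option, and the function name and `normalize` flag are hypothetical.

```python
import math
from typing import Dict


def shared_term_similarity(
    v1: Dict[str, float],
    v2: Dict[str, float],
    normalize: bool = True,
) -> float:
    """Similarity of two sparse {feature word: score} vectors.

    Only feature words present in both vectors contribute a product; products
    between different feature words are ignored. With normalize=True the sum is
    divided by both vector lengths (cosine similarity over the feature-word
    basis); with normalize=False the raw sum of products is returned.
    """
    shared = v1.keys() & v2.keys()
    dot = sum((v1[t] * v2[t] for t in shared), 0.0)
    if not normalize:
        return dot
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return 0.0 if n1 == 0.0 or n2 == 0.0 else dot / (n1 * n2)


vector1 = {"resin molding": 1.0, "injection molding": 0.5}
vector2 = {"resin molding": 0.4, "prototype": 0.9}
print(shared_term_similarity(vector1, vector2, normalize=False))  # 0.4
```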

The list generation unit 25 may include a company in the company list only when a predetermined condition is satisfied, for example when the similarity calculated by the similarity calculation unit 24 is at or above a predetermined threshold.
If the feature word or learning model corresponding to a keyword entered on the user terminal 10 is not stored in the storage unit 22, the management server 20 can be configured to return an error to the user terminal 10. In that case, the keyword that caused the error can be stored as a feature word with priority, and a learning model for it can be generated. For example, if an input keyword is not registered in the learning model, additional training is performed using a corpus (text) containing that keyword as input.
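The unknown-keyword handling can be sketched as a lookup that raises an error and queues the keyword, plus a filter that collects corpus texts containing queued keywords for additional training. The class `KeywordLookup`, its `pending` set, and the substring-based corpus filter are hypothetical; the actual retraining step (for example, updating a word2vec model) is not shown.

```python
from typing import Dict, Iterable, List, Set

Vector = List[float]


class KeywordLookup:
    """Looks up keyword vectors and remembers unknown keywords for retraining."""

    def __init__(self, embeddings: Dict[str, Vector]) -> None:
        self.embeddings = embeddings
        self.pending: Set[str] = set()  # keywords queued for priority additional training

    def get(self, keyword: str) -> Vector:
        vec = self.embeddings.get(keyword)
        if vec is None:
            self.pending.add(keyword)
            # Stands in for the error returned to the user terminal 10.
            raise KeyError(f"keyword not registered in learning model: {keyword!r}")
        return vec

    def retraining_corpus(self, sentences: Iterable[str]) -> List[str]:
        """Select corpus texts that contain any pending keyword (substring match)."""
        return [s for s in sentences if any(k in s for k in self.pending)]


lookup = KeywordLookup({"resin molding": [1.0, 0.0]})
try:
    lookup.get("ultrasonic welding")
except KeyError as err:
    print("error:", err)
print(lookup.retraining_corpus([
    "We offer ultrasonic welding of resin parts.",
    "We develop accounting software.",
]))
# ['We offer ultrasonic welding of resin parts.']
```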

In the processing of step S12 described above, an AND operation may also be performed on two or more input keywords to suggest third and subsequent keywords to the user; the AND operation can be performed in the same way as described above. Likewise, although step S13 performs an AND search on two keywords, three or more keywords can be handled on the same principle, and, depending on the search purpose, an OR search, which matches companies holding any one of the keywords, may be performed instead.
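Generalizing the search to any number of keywords and to OR mode is straightforward with an inverted index from feature word to company IDs. The sketch assumes an in-memory index; `build_inverted_index`, `keyword_search`, and the `mode` strings are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_inverted_index(db: Dict[int, Iterable[str]]) -> Dict[str, Set[int]]:
    """Map each feature word to the set of company IDs that hold it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for cid, words in db.items():
        for w in words:
            index[w].add(cid)
    return dict(index)


def keyword_search(index: Dict[str, Set[int]], keywords: List[str], mode: str = "AND") -> List[int]:
    """AND: companies holding every keyword; OR: companies holding at least one."""
    hit_sets = [index.get(k, set()) for k in keywords]
    if not hit_sets:
        return []
    if mode == "AND":
        hits = set.intersection(*hit_sets)
    elif mode == "OR":
        hits = set.union(*hit_sets)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(hits)


db = {
    5398: {"resin molding", "prototype"},
    34589: {"prototype", "resin molding", "mold"},
    36588: {"resin molding", "injection molding"},
}
idx = build_inverted_index(db)
print(keyword_search(idx, ["resin molding", "prototype", "mold"]))    # [34589]
print(keyword_search(idx, ["prototype", "injection molding"], "OR"))  # [5398, 34589, 36588]
```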

In this embodiment, an example has been described in which all or some of the functions provided by the management server 20 are implemented as a SaaS cloud service; these functions may instead be provided through PaaS (Platform as a Service), IaaS (Infrastructure as a Service), or an API (Application Programming Interface).

The configurations of the search system 1, the user terminal 10, and the management server 20 described in this embodiment are examples and may be modified without departing from the scope of the present invention. The processing flow of the management server 20 and other components is likewise an example; unnecessary processing steps may be removed, new steps added, and steps reordered without departing from the scope of the present invention.

1 Search system
10 User terminal
20 Management server
21 Communication unit
22 Storage unit
23 Search unit
24 Similarity calculation unit
25 List generation unit
26 Database construction unit
27 Feature word generation unit
28 Learning model generation unit
30 Network
221 Company database
222 Technical tag database
223 Feature word database
224 Learning model database

Claims

A search system that generates a company list for the purpose of searching for business partners, comprising:
a feature word storage means for storing, for each company, a plurality of feature words representing the characteristics of the company together with scores based on the appearance frequency of the feature words;
a learning model storage means for storing, for each feature word and as a learning model, a distributed representation of the feature word obtained by machine learning that predicts one feature word from the plurality of feature words;
an acquisition means for acquiring a keyword input from an input device;
a search means for searching the feature word storage means for a company having a feature word corresponding to the keyword;
a similarity calculation means for calculating the similarity between a distributed representation based on the plurality of feature words held by the company retrieved by the search means and a distributed representation based on the feature word corresponding to the keyword; and
a list generation means for generating a company list including the plurality of feature words of the company retrieved by the search means and the scores and similarity of those feature words.

The search system according to claim 1, wherein the similarity calculation means calculates the similarity between a composite distributed representation, obtained by combining the distributed representations of the plurality of feature words held by the company retrieved by the search means, and the distributed representation of the feature word corresponding to the keyword.

The search system according to claim 1, wherein the similarity calculation means calculates, based on the learning model, feature words similar to the feature word corresponding to the keyword, and calculates the similarity between a first vector including the feature word corresponding to the keyword, a plurality of feature words similar to that feature word, and their scores, and a second vector including the feature words held by each company and their scores.

The search system according to claim 3, wherein the similarity calculation means calculates the similarity based on the sum of the products of the scores of identical feature words contained in both the first vector and the second vector.

The search system according to any one of claims 1 to 4, wherein the similarity calculation means calculates the similarity between a distributed representation obtained by weighting, among the plurality of feature words held by the company retrieved by the search means, feature words with higher scores more heavily than feature words with lower scores, and the distributed representation based on the feature word corresponding to the keyword.

A search method in a search server that generates a company list for the purpose of searching for business partners, comprising:
an acquisition step of acquiring a keyword input from an input device;
a search step of searching, for a company having a feature word corresponding to the keyword, a feature word database that stores, for each company, a plurality of feature words representing the characteristics of the company together with scores based on the appearance frequency of the feature words;
a similarity calculation step of calculating, with reference to a learning model database that stores, for each feature word and as a learning model, a distributed representation of the feature word obtained by machine learning that predicts one feature word from the plurality of feature words, the similarity between the distributed representation based on the plurality of feature words held by the company retrieved in the search step and the distributed representation based on the feature word corresponding to the keyword; and
a list generation step of generating a company list including the plurality of feature words of the company retrieved in the search step and the scores and similarity of those feature words.

A computer-readable search program that causes a search server, which generates a company list for the purpose of searching for business partners, to execute:
an acquisition procedure of acquiring a keyword input from an input device;
a search procedure of searching, for a company having a feature word corresponding to the keyword, a feature word database that stores, for each company, a plurality of feature words representing the characteristics of the company together with scores based on the appearance frequency of the feature words;
a similarity calculation procedure of calculating, with reference to a learning model database that stores, for each feature word and as a learning model, a distributed representation of the feature word obtained by machine learning that predicts one feature word from the plurality of feature words, the similarity between the distributed representation based on the plurality of feature words held by the company retrieved by the search procedure and the distributed representation based on the feature word corresponding to the keyword; and
a list generation procedure of generating a company list including the plurality of feature words of the company retrieved by the search procedure and the scores and similarity of those feature words.