JP6751188B1

JP6751188B1 - Information processing apparatus, information processing method, and information processing program

Info

Publication number: JP6751188B1
Application number: JP2019144071A
Authority: JP
Inventors: 了輔城下; 勇太大場; 櫻井　努; 努櫻井
Original assignee: DMG Mori Co Ltd
Current assignee: DMG Mori Co Ltd
Priority date: 2019-08-05
Filing date: 2019-08-05
Publication date: 2020-09-02
Anticipated expiration: 2039-08-05
Also published as: JP2021026515A

Abstract

PROBLEM TO BE SOLVED: To provide an information processing apparatus, an information processing method, and an information processing program that find similar documents more efficiently. SOLUTION: An information processing apparatus 100 includes: a first calculation unit 101 that calculates, as a first similarity, the distance between a point obtained by summing the vectors representing the meanings of the words included in each of a plurality of documents stored in a document database and a point obtained by summing the vectors representing the meanings of the words included in a newly acquired document; a second calculation unit 102 that calculates, as a distance, the difference between the characters included in each of the plurality of documents and the characters included in the newly acquired document, and calculates the closeness of that distance as a second similarity; and a selection unit 103 that selects, from the plurality of documents, a document similar to the newly acquired document based on the first similarity and the second similarity. [Selected drawing] FIG. 1

Description

The present invention relates to an information processing apparatus, an information processing method, and an information processing program.

In the above technical field, Patent Document 1 discloses a technique in which a degree of importance is assigned to each word included in a document, the features of the document are expressed by a multidimensional vector whose elements are the degrees of importance of the words, and the similarity is judged from the angle between the vectors of the documents.

JP 2015-219799 A

However, the technique described in the above document could not find similar documents efficiently.

An object of the present invention is to provide a technique that solves the above problem.

To achieve the above object, an information processing apparatus according to the present invention comprises:
a receiving unit that receives, from a user, document data as a search target including words likely to appear in the option field of a machine tool specification;
a calculation unit that calculates a first document vector by vectorizing the meaning of each of a plurality of words included in the document data and summing the resulting vectors;
a specification database that stores option field data, which represents text freely written in the option field of a past specification, in association with a second document vector obtained by vectorizing the meaning of each word included in the option field data and summing the resulting vectors;
a first calculation unit that calculates the cosine distance between the first document vector and the second document vector as a first similarity;
a second calculation unit that compares a character string included in the option field data with a character string included in the document data and calculates a distance representing the difference between the character strings as a second similarity;
a selection unit that selects, from the specification database, a similar specification including option field data similar to the document data, based on the first similarity and the second similarity; and
a transmission unit that transmits the similar specification to the user.
To achieve the above object, another information processing apparatus according to the present invention comprises:
an acquisition unit that acquires specification data including a machine type, a schedule, and a free-input option field;
a first calculation unit that (a) calculates, as the first similarity of a first option, the cosine distance between a first vector, obtained by summing the vectors representing the meanings of the words included in the option field of each of a plurality of pieces of specification data stored in a document database, and a second vector, obtained by summing the vectors representing the meanings of the words included in the first option written in the option field of the specification data acquired by the acquisition unit, and (b) calculates, as the first similarity of a second option, the cosine distance between the first vector and a third vector, obtained by summing the vectors representing the meanings of the words included in the second option written in the option field of the specification data acquired by the acquisition unit;
a second calculation unit that (c) calculates, as a distance, the difference between the character string included in the option field of each of the plurality of pieces of specification data stored in the document database and the character string included in the first option, and calculates the closeness of that distance as the second similarity of the first option, and (d) calculates, as a distance, the difference between the character string included in the option field of each of the plurality of pieces of specification data stored in the document database and the character string included in the second option, and calculates the closeness of that distance as the second similarity of the second option; and
a selection unit that (e) selects, from the plurality of specifications stored in the document database, a document similar to the first option based on the first similarity and the second similarity of the first option, and (f) selects, from the plurality of specifications stored in the document database, a document similar to the second option based on the first similarity and the second similarity of the second option.

To achieve the above object, an information processing method according to the present invention includes:
a receiving step in which a receiving unit receives, from a user, document data as a search target including words likely to appear in the option field of a machine tool specification;
a document vector calculation step in which a document vector calculation unit calculates a first document vector by vectorizing the meaning of each of a plurality of words included in the document data and summing the resulting vectors;
a first calculation step in which a first calculation unit calculates, as a first similarity, the cosine distance between the first document vector and a second document vector, using a specification database that stores option field data, which represents text freely written in the option field of a past specification, in association with the second document vector obtained by vectorizing the meaning of each word included in the option field data and summing the resulting vectors;
a second calculation step in which a second calculation unit compares a character string included in the option field data with a character string included in the document data and calculates a distance representing the difference between the character strings as a second similarity;
a selection step in which a selection unit selects, from the specification database, a similar specification including option field data similar to the document data, based on the first similarity and the second similarity; and
a transmission step in which a transmission unit transmits the similar specification to the user.

To achieve the above object, an information processing program according to the present invention causes a computer to execute:
a first calculation step of calculating, as a first similarity, the distance between a point obtained by summing the vectors representing the meanings of the words included in each of a plurality of documents stored in a document database and a point obtained by summing the vectors representing the meanings of the words included in a newly acquired document;
a second calculation step of calculating, as a distance, the difference between the characters included in each of the plurality of documents and the characters included in the newly acquired document, and calculating the closeness of that distance as a second similarity; and
a selection step of selecting, from the plurality of documents, a document similar to the newly acquired document based on the first similarity and the second similarity.

According to the present invention, similar documents can be found more efficiently.

A diagram illustrating the internal configuration of the information processing apparatus according to the first embodiment of the present invention.
A diagram outlining the operation of the information processing apparatus according to the second embodiment of the present invention.
A diagram illustrating an example of the similarity search algorithm of the information processing apparatus according to the second embodiment of the present invention.
A block diagram showing the configuration of the information processing apparatus according to the second embodiment of the present invention.
A diagram illustrating an example of the content of a reply mail returned by the information processing apparatus according to the second embodiment of the present invention.
A diagram showing an example of the vectorization table held by the information processing apparatus according to the second embodiment of the present invention.
A diagram showing an example of the vector conversion table held by the information processing apparatus according to the second embodiment of the present invention.
A block diagram showing the hardware configuration of the information processing apparatus according to the second embodiment of the present invention.
A flowchart illustrating the processing procedure of the information processing apparatus according to the second embodiment of the present invention.

Embodiments of the present invention are described in detail below by way of example with reference to the drawings. However, the configurations, numerical values, processing flows, functional elements, and the like described in the following embodiments are merely examples; they may be freely modified or changed, and they are not intended to limit the technical scope of the present invention to the following description.

[First Embodiment]
An information processing apparatus according to the first embodiment of the present invention is described with reference to FIG. 1. FIG. 1 is a diagram illustrating the internal configuration of the information processing apparatus according to this embodiment.

The information processing apparatus 100 includes a calculation unit 101, a calculation unit 102, and a selection unit 103. The calculation unit 101 calculates, as a similarity 111, the distance between a point 162, obtained by summing the vectors representing the meanings of the words included in each of a plurality of documents 161 stored in a document database 160, and a point 152, obtained by summing the vectors representing the meanings of the words included in a newly acquired document 150. The calculation unit 102 calculates, as a distance, the difference between characters 163 included in each of the plurality of documents 161 and characters 153 included in the newly acquired document 150, and calculates the closeness of that distance as a similarity 121. The selection unit 103 selects, from the plurality of documents 141, a document 170 similar to the newly acquired document 150 based on the similarity 111 and the similarity 121.

According to this embodiment, similar documents can be found more efficiently.

[Second Embodiment]
Next, an information processing apparatus according to the second embodiment of the present invention is described with reference to FIGS. 2A to 6. FIG. 2A is a diagram outlining the operation of an information processing apparatus 200 according to this embodiment. For example, when a machine tool seller 210 receives a request for a quotation or a design for a machine tool from a prospective purchaser, the seller creates a document 250 (a design document or a specification) based on that request and transmits it to the information processing apparatus 200. The information processing apparatus 200 searches a database 260 for specifications 261 similar to the received document 250 and presents the machine tool specifications 261 of similar past cases to the machine tool seller 210.

By referring to the specifications 261 of similar past cases, the machine tool seller 210 can immediately find out, when preparing a quotation or when designing a machine tool after actually receiving an order, which parts were installed, what the price was, and so on. The machine tool seller 210 can therefore prepare quotations and designs easily, and the time required for preparing a quotation, designing, and ordering parts can be greatly reduced.

The machine tool specifications 250 and 261 have entry fields for the machine tool type, the schedule (negotiation start date, meeting date, order date), options, and so on. The option field is a so-called free-input field in which information specifying the various specifications of the machine tool is entered to suit each customer's individual circumstances. In other words, the option field contains the customer's requests in a free format. For example, one machine tool seller 210 enters part names, quantities, and the like as a bulleted list, while another machine tool seller 210 writes out the customer's requests in prose. The option field thus allows a great deal of freedom; depending on the machine tool seller 210, entries may use short names, abbreviations, symbols, and the like, and there is no fixed format.

The content of the option field is important when preparing a quotation or a design document for a machine tool. For example, even for the same machine tool model, the content of the quotation and the design differs greatly depending on which options are set. The machine tool seller 210 therefore searches for similar past cases based on the content of the option field.

However, when the machine tool seller 210 thinks up keywords and searches with them, the search does not return the results the seller wants unless those keywords are appropriate.

The machine tool seller 210 therefore creates an outgoing mail 250 containing sentences, terms, words, symbols, numerical values, and the like that are likely to appear in the option field of past specifications, and sends it to the information processing apparatus 200. On receiving the outgoing mail 250, the information processing apparatus 200 searches the database 260 for specifications whose option field entries are similar, based on the text and other content of the outgoing mail 250, and extracts them. The information processing apparatus 200 then returns the extracted specifications to the machine tool seller 210 as a reply mail 230. The machine tool specifications are stored and managed in the database 260, which may be inside or outside the information processing apparatus 200.

When searching the database 260, the information processing apparatus 200 calculates, for every specification stored in the database 260, its similarity to the option field content desired by the machine tool seller 210, and extracts the specifications with high similarity. As shown in FIG. 3C, the information processing apparatus 200 calculates the similarity using two measures: the cosine distance and the Levenshtein distance. The cosine-distance method yields a high similarity for words and phrases with similar meanings; it is a meaning-based method.

The Levenshtein-distance method calculates the similarity from how similar the character strings themselves are; it is a character-based method. For example, it yields a high similarity when two character strings differ by three characters or fewer and a low similarity when they differ by four characters or more. The information processing apparatus 200 searches for similar past cases by combining these two methods. For example, the information processing apparatus 200 scores the cosine-distance similarity out of 50 points and the Levenshtein-distance similarity out of 50 points, and takes their sum (out of 100 points) as the overall similarity.
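The scoring scheme described above can be sketched as follows. This is a minimal illustration, not the implementation disclosed in the patent: the function names, the linear mapping of the cosine similarity onto 50 points, and the concrete "low" value of 10 points are assumptions. The source specifies only the two 50-point scales and the three-character boundary.

```python
def cosine_points(cos_sim: float) -> float:
    """Map a cosine similarity onto a 0-50 point scale (linear mapping is an assumption)."""
    clipped = max(0.0, min(1.0, cos_sim))
    return 50.0 * clipped


def levenshtein_points(distance: int, high: float = 50.0, low: float = 10.0) -> float:
    """Give a high score for a difference of 3 characters or fewer, a low score otherwise.

    The boundary of 3 characters comes from the text; the value of `low` is hypothetical.
    """
    return high if distance <= 3 else low


def overall_score(cos_sim: float, distance: int) -> float:
    """Overall similarity out of 100 points: the sum of the two 50-point scores."""
    return cosine_points(cos_sim) + levenshtein_points(distance)
```

Under these assumptions, a candidate with cosine similarity 0.5 whose text differs by four characters would receive 25 + 10 = 35 points.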

The information processing apparatus 200 compares each calculated similarity with a predetermined threshold and determines that specifications whose similarity exceeds the threshold are similar cases. The information processing apparatus 200 attaches the specifications determined to be similar cases to the reply mail 230 and returns it to the machine tool seller 210. This shows, for example, that for the option field content of a given machine tool model, there were past specifications, that is, similar past cases, such as those attached to the reply mail 230.
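The threshold comparison can be sketched as below. The function name, the dictionary of scores keyed by a specification ID, and the ordering of the results from highest to lowest score are assumptions made for illustration; the source states only that specifications scoring above a predetermined threshold are treated as similar cases.

```python
from typing import Dict, List


def select_similar(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the IDs of specifications whose overall similarity exceeds the threshold.

    Results are ordered from highest to lowest score (the ordering is an assumption).
    """
    hits = [(spec_id, score) for spec_id, score in scores.items() if score > threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [spec_id for spec_id, _ in hits]
```

For example, with hypothetical scores `{"S1": 82.0, "S2": 40.0, "S3": 91.5}` and a threshold of 70, the function returns `["S3", "S1"]`.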

FIG. 3A is a block diagram showing the configuration of the information processing apparatus according to this embodiment. FIG. 3B is a diagram illustrating an example of the content of a reply mail returned by the information processing apparatus according to this embodiment. The information processing apparatus 200 includes a calculation unit 301, a calculation unit 302, a selection unit 303, and a transmission/reception unit 304. First, consider one of the plurality of specifications 241 stored in the database 260. The calculation unit 301 obtains the vectors representing the meanings of all the words included in that specification 241, sums them, and obtains the resulting point.

Next, the calculation unit 301 performs the same calculation for all the remaining specifications stored in the database 260 and obtains, for each specification, the point resulting from the sum of its vectors.

Next, the calculation unit 301 obtains the word vectors representing the meanings of the words included in the newly acquired document 250, sums them, and obtains the coordinates of the point indicating the document vector that results from the sum.
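Summing word vectors into a document vector can be sketched as follows. This is an illustration under stated assumptions: the text is taken to be already split into words (the source does not say how words are extracted), the word-to-vector table is a plain dictionary, and words missing from the table are skipped. The function name and the sample words are hypothetical.

```python
from typing import Dict, List, Sequence


def document_vector(words: Sequence[str],
                    word_vectors: Dict[str, List[float]],
                    dim: int) -> List[float]:
    """Sum the vectors of the given words component by component.

    Words that have no entry in `word_vectors` are skipped (an assumption;
    the source does not describe how unknown words are handled).
    """
    total = [0.0] * dim
    for word in words:
        vec = word_vectors.get(word)
        if vec is None:
            continue
        for i, value in enumerate(vec):
            total[i] += value
    return total
```

With a toy table `{"coolant": [1.0, 0.0], "pump": [0.0, 2.0]}`, the words `["coolant", "pump", "unknown"]` sum to `[1.0, 2.0]`.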

The calculation unit 301 then calculates, as a similarity 311, the cosine distance between the point 262, which is the document vector of a specification 261, and the point 252, which is the document vector of the document 250. The calculated similarity 311 is temporarily stored in the calculation unit 301.

The cosine-distance similarity 311 is what is called the cosine similarity: a similarity measure used to compare documents in the vector space model, indicating how closely the directions of two document vectors agree. In this sense, a cosine distance close to 1 indicates that the documents are similar, and a value close to 0 indicates that they are not.
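The cosine similarity of two document vectors is their dot product divided by the product of their lengths. A minimal sketch follows; the function name is illustrative, and returning 0.0 for a zero-length vector is an assumption, since the source does not address that case.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b.

    Returns 0.0 if either vector has zero length (an assumption).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction give a value of 1, and orthogonal vectors give 0, which matches the interpretation in the text.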

The calculation unit 302 calculates, as a distance, the difference between a character string 263 included in each of the plurality of specifications 261 stored in the database 260 and a character string 253 included in the newly acquired document 250. The calculation unit 302 calculates the closeness of that distance as a similarity 321.

The distance calculated by the calculation unit 302 is called the Levenshtein distance, a type of distance that indicates how different two character strings are.
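The Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The sketch below uses the standard dynamic-programming formulation with two rows; it is a generic textbook version, not code taken from the patent, and the function name is illustrative.

```python
def levenshtein(s: str, t: str) -> int:
    """Return the Levenshtein (edit) distance between strings s and t."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string as the inner loop to save memory
    previous = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, char_s in enumerate(s, start=1):
        current = [i]
        for j, char_t in enumerate(t, start=1):
            substitution_cost = 0 if char_s == char_t else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]
```

For instance, "kitten" and "sitting" are three edits apart, so under the example rule in the text they would still count as highly similar.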

The selection unit 303 adds the similarity 311 calculated by the calculation unit 301 and the similarity 321 calculated by the calculation unit 302 to generate a similarity 331. Based on the generated similarity 331, the selection unit 303 selects, from the specifications 261 stored in the database 260, specifications similar to the document 250 as similar specifications 270.

The similarity 331, generated by adding the similarity 311 and the similarity 321, combines a meaning-based similarity with a character-based similarity. Combining the two in a balanced way makes it possible to extract similar specifications 270 that resemble the document 250.

The transmission/reception unit 304 attaches the selected similar specification 270 to a reply mail and returns it to the machine tool seller 210. Specifically, the reply mail has content such as that shown in FIG. 3B. Instead of attaching the selected similar specification 270, the transmission/reception unit 304 may send the user a reply mail containing the address of a link to the similar specification 270.
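Composing such a reply mail with attachments can be sketched with Python's standard `email` package. Everything specific here is hypothetical: the function name, the sender address, the subject line, the body text, and the generic attachment type are placeholders, and the actual mail layout of FIG. 3B is not reproduced. The sketch only builds the message and does not send it.

```python
from email.message import EmailMessage
from typing import Iterable, Tuple


def build_reply(to_addr: str, attachments: Iterable[Tuple[str, bytes]]) -> EmailMessage:
    """Build a reply mail carrying the selected similar specifications as attachments."""
    msg = EmailMessage()
    msg["From"] = "search@example.com"         # hypothetical sender address
    msg["To"] = to_addr
    msg["Subject"] = "Similar specifications"  # hypothetical subject line
    msg.set_content("The attached specifications were judged similar to your request.")
    for filename, data in attachments:
        # A generic binary type is used because the source does not state the file format.
        msg.add_attachment(data, maintype="application",
                           subtype="octet-stream", filename=filename)
    return msg
```

To follow the alternative mentioned in the text, the body passed to `set_content` could carry link addresses instead, with no attachments added.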

The apparatus may also receive feedback from the machine tool seller 210 on whether the selected similar specification 270 was what the seller wanted, and learn from it whether the selection results were good or bad.

Although a machine tool specification has been used as the example here, the newly acquired document 250 received by the information processing apparatus 200 is not limited to a specification; it may be, for example, a purchase order for parts or an inquiry from a user about a product failure.

FIG. 4A is a diagram showing an example of a vectorization table held by the information processing apparatus according to this embodiment. The specification ID (identifier) 411 identifies a specification and is uniquely assigned to each specification stored in the database 260. Each time a new specification is created, it is stored in the database 260. The option field 412 is text that freely expresses the customer's requests, such as which parts were added and what the price was. The document vector 413 is the sum of the vectors of the words included in the specification (document). Calculating the document vector of each specification's option field in advance in this way makes it possible to identify specifications with close similarity quickly and reliably by obtaining the cosine distance (cosine similarity) between document vectors. The vectorization table 401 may additionally store, for example, the machine tool model, the schedule, and the type of NC device. The schedule includes, for example, the negotiation start date, the negotiation meeting date, and the order date. The vectorization table 401 may be sorted by model.
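A table of this kind, with document vectors computed in advance, can be sketched as a list of records that is ranked against a query vector at search time. The class and function names, the sample option texts, and the two-dimensional vectors are all hypothetical; only the three columns (specification ID 411, option field 412, document vector 413) come from the text.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SpecRow:
    spec_id: str                   # specification ID 411
    option_text: str               # option field 412
    doc_vector: Tuple[float, ...]  # document vector 413, computed in advance


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 for a zero-length vector (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_rows(query_vector: Sequence[float], table: List[SpecRow]) -> List[str]:
    """Return specification IDs ordered from most to least similar to the query."""
    ranked = sorted(table,
                    key=lambda row: _cosine(query_vector, row.doc_vector),
                    reverse=True)
    return [row.spec_id for row in ranked]
```

Because the stored vectors are reused across searches, only the query document needs to be vectorized when a new request arrives.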

FIG. 4B is a diagram showing an example of a vector conversion table held by the information processing apparatus according to this embodiment. The vector conversion table 402 stores a vector 422 in association with a word 421. The words 421 are words included in machine tool specifications and include abbreviations, foreign words, technical terms, and the like in addition to general terms. The vector 422 is the vector of an individual word 421 and shows what vector results when that word is vectorized. In other words, the vector conversion table 402 represents the correspondence between words and vectors. Because the information processing apparatus 200 calculates the cosine distance for each word in advance, it can easily calculate the cosine distance for the option field of a specification by referring to the vector conversion table 402.

FIG. 5 is a block diagram showing the hardware configuration of the information processing apparatus according to the present embodiment. The CPU (Central Processing Unit) 510 is a processor for arithmetic control, and realizes the functional components of the information processing apparatus 200 of FIG. 3 by executing programs. The CPU 510 may have a plurality of processors and execute different programs, modules, tasks, threads, and the like in parallel. The ROM (Read Only Memory) 520 stores fixed data such as initial data and programs, as well as other programs. The network interface 530 communicates with other devices via a network. The CPU 510 is not limited to a single CPU; it may be a plurality of CPUs or may include a GPU (Graphics Processing Unit) for image processing. The network interface 530 desirably has a CPU independent of the CPU 510 and writes transmission/reception data to, or reads it from, an area of the RAM (Random Access Memory) 540. It is also desirable to provide a DMAC (Direct Memory Access Controller) (not shown) that transfers data between the RAM 540 and the storage 550. The CPU 510 recognizes that data has been received in or transferred to the RAM 540 and processes that data. The CPU 510 also prepares the processing result in the RAM 540 and leaves the subsequent transmission or transfer to the network interface 530 or the DMAC.

The RAM 540 is a random access memory that the CPU 510 uses as a work area for temporary storage. An area for storing the data needed to realize the present embodiment is reserved in the RAM 540. The acquired document 541 is a document that the machine tool seller 210 has transmitted to the information processing apparatus 200 in order to search for similar specifications. The cosine distance 542 is a distance indicating how similar the acquired document 250 is to the plurality of specifications 261 stored in the database 260. The Levenshtein distance 543 is a distance indicating how similar, on a character basis, the characters in the acquired document 250 are to the characters of the plurality of specifications 261 stored in the database 260. The similarity 544 is the ratio of similarity between the acquired document 250 and the plurality of specifications 261 stored in the database 260, determined based on the cosine distance 542 and the Levenshtein distance 543. The similar specification 545 is a document similar to the acquired document 250, selected based on the similarity 544.
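
For reference, the Levenshtein distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. A minimal sketch using the standard two-row dynamic programming formulation is shown below; the function name `levenshtein` is chosen for this example, and the specification does not say which implementation the calculation module uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn string a into string b."""
    # previous[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning the first i characters of a into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]
```

A smaller distance means the two strings are closer on a character basis, so a distance of 0 means the strings are identical.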

The transmission/reception data 546 is data transmitted and received via the network interface 530. The RAM 540 also has an application execution area 547 for executing various application modules.

The storage 550 stores a database, various parameters, and the following data or programs needed to realize the present embodiment. The storage 550 stores the vectorization table 401 and the conversion table 402. The vectorization table 401 is the table shown in FIG. 4A, which stores the specification ID 411 in association with the document vector 413 and other items. The conversion table 402 is the table shown in FIG. 4B, which stores the words 421 in association with the vectors 422.

The storage 550 further stores a calculation module 551, a calculation module 552, a selection module 553, and a transmission/reception module 554. The calculation module 551 calculates the cosine distance. The calculation module 552 calculates the Levenshtein distance. The selection module 553 selects a similar specification 270 that is similar to the newly acquired document 250, based on the cosine distance and the Levenshtein distance. The transmission/reception module 554 receives the document 250 by e-mail and returns the similar specification 270 by e-mail. These modules 551 to 554 are read by the CPU 510 into the application execution area 547 of the RAM 540 and executed. The control program 555 is a program for controlling the information processing apparatus 200 as a whole.

The input/output interface 560 handles input/output data exchanged with input/output devices. A display unit 561 and an operation unit 562 are connected to the input/output interface 560. A storage medium 564 may also be connected to the input/output interface 560. In addition, a speaker 563 serving as an audio output unit, a microphone (not shown) serving as an audio input unit, or a GPS position determination unit may be connected. Note that programs and data relating to general-purpose functions and other realizable functions of the information processing apparatus 200 are not shown in the RAM 540 or the storage 550 in FIG. 5.

FIG. 6 is a flowchart for explaining the processing procedure of the information processing apparatus 200 according to the present embodiment. This flowchart is executed by the CPU 510 of FIG. 5 using the RAM 540, and realizes the functional components of the information processing apparatus 200 of FIG. 3.

In step S601, the information processing apparatus 200 receives from the user a document containing keywords and the like for searching for the past specifications the user wants to find. In step S603, the information processing apparatus 200 calculates the cosine distance between the document received from the user and a specification stored in the database 260. In step S605, the information processing apparatus 200 calculates the Levenshtein distance between the document received from the user and the specification stored in the database 260. In step S607, the information processing apparatus 200 sums the calculated cosine distance and the calculated Levenshtein distance to obtain the similarity of the stored specification. In step S609, the information processing apparatus 200 determines whether the similarity has been calculated for all the specifications stored in the database 260. If the similarity has not yet been calculated for all the stored specifications (NO in step S609), the information processing apparatus 200 repeats the processing from step S603 onward. If the similarity has been calculated for all the stored specifications (YES in step S609), the information processing apparatus 200 proceeds to step S611. In step S611, the information processing apparatus 200 attaches the specifications whose similarity is equal to or higher than a predetermined threshold to an e-mail as similar specifications and transmits the e-mail to the user.
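
The loop of steps S603 to S611 can be sketched as follows. This is only an illustration under stated assumptions: the text says the two distances are summed, but it does not say how they are scaled, so the sketch converts both measures into scores where a larger value means "more similar" (cosine similarity, and one minus the length-normalized Levenshtein distance) before adding them. That normalization, the tuple layout of `stored_specs`, the threshold value, and all function names are hypothetical. Receiving and sending e-mail (S601 and the transmission in S611) is omitted.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)

def levenshtein(a, b):
    """Minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]

def char_similarity(a, b):
    """Assumed normalization: 1.0 for identical strings, 0.0 when every character differs."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def select_similar(query_text, query_vec, stored_specs, threshold):
    """stored_specs: iterable of (spec_id, option_text, document_vector) tuples."""
    selected = []
    for spec_id, text, vec in stored_specs:         # repeat until S609 answers YES
        first = cosine_similarity(query_vec, vec)   # S603: word-meaning similarity
        second = char_similarity(query_text, text)  # S605: character-level similarity
        score = first + second                      # S607: combine by summation
        if score >= threshold:                      # S611: keep scores at or above threshold
            selected.append((spec_id, score))
    return sorted(selected, key=lambda pair: pair[1], reverse=True)

specs = [
    ("SPEC-001", "coolant tank 200L", [1.0, 0.0]),
    ("SPEC-002", "hydraulic chuck", [0.0, 1.0]),
]
result = select_similar("coolant tank 300L", [1.0, 0.0], specs, threshold=1.5)
```

In this toy data, only the first specification clears the threshold: its vector points in the same direction as the query and its text differs from the query by a single character.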

According to the present embodiment, because similarity is determined by combining the cosine distance and the Levenshtein distance, past specifications similar to the document the user is looking for can be found quickly and reliably.

[Other Embodiments]
The present invention has been described above with reference to the embodiments, but the present invention is not limited to those embodiments. Various changes that a person skilled in the art can understand may be made to the configuration and details of the present invention within its technical scope. A system or apparatus that combines, in any way, the separate features included in the respective embodiments is also included in the technical scope of the present invention.

The present invention may be applied to a system composed of a plurality of devices, or to a single apparatus. The present invention is also applicable when an information processing program that realizes the functions of the embodiments is supplied to a system or apparatus and executed by a built-in processor. Accordingly, a program installed on a computer in order to realize the functions of the present invention on that computer, a medium storing the program, a WWW (World Wide Web) server from which the program is downloaded, and a processor that executes the program are all included in the technical scope of the present invention. In particular, at least a non-transitory computer readable medium storing a program that causes a computer to execute the processing steps included in the above-described embodiments is included in the technical scope of the present invention.

Claims

An information processing apparatus comprising:
an acquisition unit that acquires specification data including a machine type, a schedule, and a freely enterable option column;
a first calculation unit that (a) calculates, as a first similarity of a first option, the cosine distance between a first vector, which is the integration result of vectors representing the meanings of the words included in each option column of a plurality of specification data stored in a document database, and a second vector, which is the integration result of vectors representing the meanings of the words included in the first option described in the option column of the specification data acquired by the acquisition unit, and (b) calculates, as a first similarity of a second option, the cosine distance between the first vector and a third vector, which is the integration result of vectors representing the meanings of the words included in the second option described in the option column of the specification data acquired by the acquisition unit;
a second calculation unit that (c) calculates, as a distance, the difference between the character string included in each option column of the plurality of specification data stored in the document database and the character string included in the first option, and calculates the closeness of that distance as a second similarity of the first option, and (d) calculates, as a distance, the difference between the character string included in each option column of the plurality of specification data stored in the document database and the character string included in the second option, and calculates the closeness of that distance as a second similarity of the second option; and
a selection unit that (e) selects a document similar to the first option from the plurality of specifications stored in the document database based on the first similarity and the second similarity of the first option, and (f) selects a document similar to the second option from the plurality of specifications stored in the document database based on the first similarity and the second similarity of the second option.

An information processing apparatus comprising:
a receiving unit that receives, from a user, document data as a search target, the document data including words expected to be described in the option column of a machine tool specification;
a calculation unit that calculates a first document vector by vectorizing and integrating the meanings of a plurality of words included in the document data;
a specification database that stores, in association with each other, option column data representing a document freely described in the option column of a past specification and a second document vector obtained by vectorizing and integrating the meanings of the words included in the option column data;
a first calculation unit that calculates the cosine distance between the first document vector and the second document vector as a first similarity;
a second calculation unit that compares the character string included in the option column data with the character string included in the document data and calculates, as a second similarity, a distance representing the difference between the character strings;
a selection unit that selects, from the specification database, a similar specification including option column data similar to the document data, based on the first similarity and the second similarity; and
a transmission unit that transmits the similar specification to the user.

The information processing apparatus according to claim 1, wherein the distance indicating the difference between the character strings is a Levenshtein distance.