JP2020154514A

JP2020154514A - Learning device, learning method, retrieval device, retrieval method and program

Info

Publication number: JP2020154514A
Application number: JP2019050872A
Authority: JP
Inventors: 淳遠山; Jun Tooyama; 旅人福寿; Tabito Fukuju; 亮祐刈屋; Ryosuke Kariya
Original assignee: IVIS Inc; NTT Data Corp
Current assignee: IVIS Inc; NTT Data Group Corp
Priority date: 2019-03-19
Filing date: 2019-03-19
Publication date: 2020-09-24

Abstract

To perform highly accurate retrieval even for a new search sentence for which no feedback has been obtained. SOLUTION: A management device 100 comprises: a data acquisition unit 11 that acquires data; a feature vector generation unit 12 that generates a feature vector; a similar data acquisition unit 13 that acquires similar data similar to the data; an autoencoder 14 consisting of an encoder that generates a lower-dimensional intermediate vector from the vector and a decoder that returns the intermediate vector to a vector of the original dimensionality; and an update unit 15 that updates a parameter of the autoencoder 14. The update unit 15 performs first update processing, which updates the parameter so as to reduce a first error between the input and the output of the autoencoder 14, and second update processing, which updates the parameter so as to reduce a second error between the intermediate vectors generated by inputting the vector of each piece of similar data to the encoder. SELECTED DRAWING: Figure 1

Description

The present invention relates to a learning device, a learning method, a search device, a search method, and a program.

A technique called relevance feedback is known as a way to improve search accuracy when searching documents and other information for the information that is needed. This technique presents search results to the user and obtains feedback on whether each result is good (what the user wants) or bad, thereby progressively improving the accuracy of the final search results. For example, Patent Document 1 discloses an information retrieval method that implements relevance feedback taking the relatedness between words into account, shortening search time and enabling highly accurate retrieval.

Patent Document 1: JP-A-2000-242646

The information retrieval method disclosed in Patent Document 1 enables highly accurate retrieval by applying relevance feedback. However, it cannot perform a highly accurate search for a new search sentence for which no feedback has been obtained.

The present invention has been made in view of the above circumstances, and its object is to provide a learning device, a learning method, a search device, a search method, and a program for performing a highly accurate search even for a new search sentence for which no feedback has been obtained.

To achieve the above object, a learning device according to a first aspect of the present invention comprises:
reference data acquisition means for acquiring reference data;
feature vector generation means for generating a feature vector from input data;
similar data acquisition means for acquiring similar data that is similar to the reference data;
an autoencoder consisting of an encoder that, given the feature vector, generates an intermediate vector having fewer dimensions than the feature vector, and a decoder that, given the intermediate vector, generates an output vector having the same number of dimensions as the feature vector; and
update means for updating parameters of the autoencoder,
wherein the update means performs:
a first update process of updating the parameters of the autoencoder so as to reduce a first error, which is the error between a first feature vector generated by inputting the reference data to the feature vector generation means and the output vector generated by inputting the first feature vector to the autoencoder; and
a second update process of updating the parameters of the autoencoder so as to reduce a second error, which is the error between a first intermediate vector generated by inputting the first feature vector to the encoder of the autoencoder and a second intermediate vector generated by inputting, to the encoder of the autoencoder, a second feature vector generated by inputting the similar data to the feature vector generation means.

The update means may alternately repeat the first update process and the second update process, thereby updating the parameters of the autoencoder so that both the first error and the second error are reduced.
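
The alternating schedule described above can be sketched as follows. This is only an illustration, not the implementation of the embodiment: the function name `alternate_updates`, the plain gradient-descent step, the learning rate, and the two toy quadratic errors are all assumptions chosen to keep the example small.

```python
def alternate_updates(params, grad_first, grad_second, lr=0.1, rounds=100):
    """Alternate gradient steps on two error functions.

    grad_first and grad_second are hypothetical callables that return the
    gradient of the first and second error with respect to params.
    """
    params = list(params)
    for _ in range(rounds):
        # First update process: one step that reduces the first error.
        g1 = grad_first(params)
        params = [p - lr * g for p, g in zip(params, g1)]
        # Second update process: one step that reduces the second error.
        g2 = grad_second(params)
        params = [p - lr * g for p, g in zip(params, g2)]
    return params


# Toy stand-ins (assumptions): first error (p - 1)^2, second error (p - 3)^2.
grad_a = lambda p: [2.0 * (p[0] - 1.0)]
grad_b = lambda p: [2.0 * (p[0] - 3.0)]
result = alternate_updates([0.0], grad_a, grad_b)
```

Because the two toy errors have different minima, the parameter settles at a value between them rather than at either minimum, which is the kind of compromise an alternating schedule produces.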

The similar data acquisition means may acquire a plurality of pieces of the similar data for one piece of the reference data, and the update means may perform the second update process using the plurality of pieces of similar data.

The update means may calculate the first error as a squared error and calculate the second error from a cosine similarity.
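
As a minimal sketch of these two error measures: the text says only that the second error is calculated from the cosine similarity, so converting it to an error as `1 - cosine similarity` is an assumption made here, as are the function names and the handling of zero vectors.

```python
import math


def squared_error(x, y):
    """First error: sum of squared element-wise differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_error(u, v):
    """Second error (assumption): 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)
```

Under this assumed definition, two intermediate vectors pointing in the same direction have a second error of zero regardless of their lengths.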

The update means may perform the first update process and the second update process so as to minimize the mean of the sum of squares, or the harmonic mean, of the first error and the second error.
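
The two combined measures named above might be computed as in the sketch below. Reading the first measure as the mean of the two squared errors is an interpretation of the translated term, and the function names and the zero-sum guard are assumptions.

```python
def mean_of_squares(first_error, second_error):
    """Mean of the squared first and second errors (assumed reading)."""
    return (first_error ** 2 + second_error ** 2) / 2.0


def harmonic_mean(first_error, second_error):
    """Harmonic mean of two non-negative errors (0.0 if both are zero)."""
    total = first_error + second_error
    if total == 0.0:
        return 0.0
    return 2.0 * first_error * second_error / total
```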

To achieve the above object, a learning method according to a second aspect of the present invention is
a method of training an autoencoder consisting of an encoder that, given a feature vector, generates an intermediate vector having fewer dimensions than the feature vector, and a decoder that, given the intermediate vector, generates an output vector having the same number of dimensions as the feature vector, the method comprising:
a first feature vector generation step of generating a first feature vector from reference data;
a second feature vector generation step of generating a second feature vector from similar data that is similar to the reference data; and
an update step of updating parameters of the autoencoder,
wherein the update step performs:
a first update process of updating the parameters of the autoencoder so as to reduce a first error, which is the error between the first feature vector generated in the first feature vector generation step and the output vector generated by inputting the first feature vector to the autoencoder; and
a second update process of updating the parameters of the autoencoder so as to reduce a second error, which is the error between a first intermediate vector generated by inputting the first feature vector to the encoder of the autoencoder and a second intermediate vector generated by inputting the second feature vector to the encoder of the autoencoder.

To achieve the above object, a search device according to a third aspect of the present invention comprises:
an autoencoder whose parameters have been updated by the learning method according to the second aspect of the present invention;
feature vector generation means for generating a feature vector from input data;
searched intermediate vector group generation means for generating in advance, from a searched data group that is a set of searched data (the data to be searched), a searched intermediate vector group that is a set of searched intermediate vectors, each searched intermediate vector being the intermediate vector generated by inputting, to the encoder of the autoencoder, a searched feature vector generated by inputting the searched data to the feature vector generation means;
search data acquisition means for acquiring search data that is the subject of the search; and
search means for searching the searched data group for searched data similar to the search data, based on the cosine similarity between the searched intermediate vectors and a search intermediate vector, the search intermediate vector being the intermediate vector generated by inputting, to the encoder of the autoencoder, a search feature vector generated by inputting the search data to the feature vector generation means.

To achieve the above object, a search method according to a fourth aspect of the present invention is
a search method using an autoencoder whose parameters have been updated by the learning method according to the second aspect of the present invention, the method comprising:
a searched intermediate vector group generation step of generating in advance, from a searched data group that is a set of searched data (the data to be searched), a searched intermediate vector group that is a set of searched intermediate vectors, each searched intermediate vector being the intermediate vector generated by inputting, to the encoder of the autoencoder, a searched feature vector generated from the searched data;
a search feature vector generation step of generating a search feature vector from search data that is the subject of the search; and
a search step of searching the searched data group for searched data similar to the search data, based on the cosine similarity between the searched intermediate vectors and a search intermediate vector, the search intermediate vector being the intermediate vector generated by inputting the search feature vector to the encoder of the autoencoder.
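
The search flow described above (encode the searched data once in advance, then rank by cosine similarity against the encoded search data) can be sketched as follows. The names `build_index` and `search`, the `top_k` parameter, and the `featurize` and `encode` callables are hypothetical stand-ins for the feature vector generation and the trained encoder. The text does not specify how many results are returned.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_index(documents, featurize, encode):
    """Precompute the intermediate vector of every searched item.

    documents: list of (doc_id, text) pairs.
    """
    return [(doc_id, encode(featurize(text))) for doc_id, text in documents]


def search(query_text, index, featurize, encode, top_k=3):
    """Return the ids of the top_k indexed items most similar to the query."""
    query_vec = encode(featurize(query_text))
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy stand-ins (assumptions): count two characters, identity "encoder".
toy_featurize = lambda text: [text.count("a"), text.count("b")]
toy_encode = lambda vec: vec
toy_index = build_index([(1, "aaa"), (2, "bbb"), (3, "aab")],
                        toy_featurize, toy_encode)
hits = search("aa", toy_index, toy_featurize, toy_encode, top_k=2)
```

Precomputing the index means that only the search data has to pass through the encoder at query time.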

To achieve the above object, a program according to a fifth aspect of the present invention is
a program for causing a computer to function as:
reference data acquisition means for acquiring reference data;
feature vector generation means for generating a feature vector from input data;
similar data acquisition means for acquiring similar data that is similar to the reference data;
an autoencoder consisting of an encoder that, given the feature vector, generates an intermediate vector having fewer dimensions than the feature vector, and a decoder that, given the intermediate vector, generates an output vector having the same number of dimensions as the feature vector; and
update means for updating parameters of the autoencoder,
wherein the update means performs:
a first update process of updating the parameters of the autoencoder so as to reduce a first error, which is the error between a first feature vector generated by inputting the reference data to the feature vector generation means and the output vector generated by inputting the first feature vector to the autoencoder; and
a second update process of updating the parameters of the autoencoder so as to reduce a second error, which is the error between a first intermediate vector generated by inputting the first feature vector to the encoder of the autoencoder and a second intermediate vector generated by inputting, to the encoder of the autoencoder, a second feature vector generated by inputting the similar data to the feature vector generation means.

According to the present invention, it is possible to provide a learning device, a learning method, a search device, a search method, and a program for performing a highly accurate search even for a new search sentence for which no feedback has been obtained.

FIG. 1 is a functional block diagram of the project management system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a ticket input screen according to the embodiment.
FIG. 3 is a diagram showing an example of the ticket DB according to the embodiment.
FIG. 4 is a diagram showing an example of a feedback input screen according to the embodiment.
FIG. 5 is a diagram showing an example of the feedback DB according to the embodiment.
FIG. 6 is a diagram illustrating the autoencoder according to the embodiment.
FIG. 7 is a hardware configuration diagram of the project management system according to the embodiment.
FIG. 8 is a flowchart of the learning process according to the embodiment.
FIG. 9 is a diagram illustrating the errors of the autoencoder in the learning process according to the embodiment.
FIG. 10 is a flowchart of the search process according to the embodiment.

Hereinafter, an embodiment in which the learning device of the present invention is applied to a project management system for software development and the like will be described with reference to the drawings. In the drawings, the same or corresponding parts are denoted by the same reference numerals.

The project management system 1 according to the embodiment manages the progress, issues, and the like of a project such as software development. As shown in FIG. 1, the project management system 1 includes a management device 100 and a terminal 200.

In the project management system 1, when an issue to be resolved arises, a person in charge of software development or the like enters a ticket such as that shown in FIG. 2 through the terminal 200. Entered tickets are assigned ticket numbers in the order of entry and stored in the storage unit 16 of the management device 100, forming a ticket DB (database) such as that shown in FIG. 3. Once the ticket DB has been formed, persons in charge and managers can manage the progress, issues, and the like of the project by checking the tickets registered in it.

Information on similar issues that arose in the past can be helpful when a person in charge or a manager resolves a newly arisen issue. The management device 100 therefore has a function of learning the information of previously entered tickets and searching for past tickets (similar tickets) that describe issues similar to the issue described in a newly entered ticket. The configuration of the management device 100, which learns past ticket information and searches for similar tickets, is described below.

As shown in FIG. 1, the management device 100 includes a control unit 10, a storage unit 16, and a communication unit 17.

The control unit 10 includes a CPU (Central Processing Unit) and the like. By executing a program stored in the storage unit 16, the control unit 10 realizes the functions of the units described later (the data acquisition unit 11, the feature vector generation unit 12, the similar data acquisition unit 13, the autoencoder 14, and the update unit 15).

The storage unit 16 includes a ROM (Read Only Memory), a RAM (Random Access Memory), and the like, and stores the ticket DB, the feedback DB described later, the program executed by the CPU of the control unit 10, and necessary data.

The communication unit 17 is a device (such as a network card) for data communication with other equipment. The management device 100 transmits and receives data to and from the terminal 200 and other devices via the communication unit 17.

Next, the functions of the control unit 10 will be described. The control unit 10 realizes the functions of the data acquisition unit 11, the feature vector generation unit 12, the similar data acquisition unit 13, the autoencoder 14, and the update unit 15.

The data acquisition unit 11 acquires the title and body text of each ticket from the ticket DB, which consists of the large number of tickets registered in the management device 100. In the learning process described later, the data acquisition unit 11 acquires the title and body text of a reference ticket (the reference data) and thus functions as the reference data acquisition means. The title and body text of a ticket are one example of reference data; the data acquisition unit 11 may use other information as reference data. For example, the reference data may also include the ticket's priority and the like, or conversely may consist of the title alone or the body text alone.

The feature vector generation unit 12 generates a feature vector from input data and functions as the feature vector generation means. Specifically, when a ticket is given as input data, the feature vector generation unit 12 segments the title and body text of the ticket into words (word-separated text) and generates a feature vector from the frequency of each word appearing in that text. In the present embodiment, the feature vector is a BoW (Bag of Words) vector. A BoW vector is a vector whose number of dimensions equals the number of distinct words (for example, 20,000) and whose element in each dimension is the frequency with which the corresponding word appears in the word-separated text.
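
A BoW vector of this kind can be sketched as below. The sketch assumes the text has already been segmented into a list of word tokens (for Japanese this would normally require a morphological analyzer, which is not shown). The names `build_vocabulary` and `bow_vector` and the choice to ignore out-of-vocabulary words are assumptions.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Assign one dimension index to each distinct word, in first-seen order."""
    vocab = {}
    for tokens in tokenized_docs:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bow_vector(tokens, vocab):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    vector = [0] * len(vocab)
    for token, count in counts.items():
        index = vocab.get(token)
        if index is not None:  # words outside the vocabulary are ignored
            vector[index] = count
    return vector


vocab = build_vocabulary([["login", "error", "login"], ["timeout"]])
vec = bow_vector(["login", "login", "timeout", "unknown"], vocab)
```

With a vocabulary of about 20,000 words, each ticket yields a vector of about 20,000 dimensions in which most elements are zero.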

The similar data acquisition unit 13 receives feedback from persons in charge and the like and thereby acquires information on tickets, among the large number of tickets registered in the ticket DB, whose titles and body text are similar to each other. The similar data acquisition unit 13 functions as the similar data acquisition means. Specifically, the similar data acquisition unit 13 takes one of the tickets registered in the management device 100 (for example, the most recently entered ticket) as the reference ticket and extracts a predetermined number (for example, 10) of tickets similar to it. To do so, the feature vector generation unit 12 generates a feature vector for the reference ticket and for each of the other tickets, the cosine similarity between the reference ticket's feature vector and each other ticket's feature vector is computed, and the predetermined number (for example, 10) of other tickets are extracted in descending order of cosine similarity to the reference ticket.
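
The candidate extraction step might look like the following sketch, which ranks the other tickets by the cosine similarity of their feature vectors to that of the reference ticket and keeps the top `n`. The function name `top_similar` and the `(ticket_id, feature_vector)` pair layout are assumptions.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_similar(reference_vec, candidates, n=10):
    """Return the ids of the n candidates most similar to the reference.

    candidates: list of (ticket_id, feature_vector) pairs.
    """
    scored = [(cosine_similarity(reference_vec, vec), ticket_id)
              for ticket_id, vec in candidates]
    # Sort by similarity only, highest first; ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ticket_id for _, ticket_id in scored[:n]]


others = [(101, [1, 0, 0]), (102, [1, 1, 0]), (103, [0, 0, 1])]
shortlist = top_similar([2, 0, 0], others, n=2)
```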

Then, as shown in FIG. 4, the extracted tickets are displayed on the display unit 23 of the terminal 200 used by the person in charge or the like, who gives feedback on whether each extracted ticket really is similar to the reference ticket. The similar data acquisition unit 13 registers the received feedback information in the storage unit 16 of the management device 100, forming a feedback DB such as that shown in FIG. 5. Feedback information on similar tickets may be registered at any time; for example, each time a new ticket is registered, the similar data acquisition unit 13 may take that new ticket as the reference ticket, receive feedback on similar tickets, and register it in the feedback DB.

As shown in FIG. 6, the autoencoder 14 is a neural network formed by connecting an encoder 141 and a decoder 142, each of which is itself a neural network. In the autoencoder 14, an input vector 143 input to the encoder 141 is converted into an intermediate vector 144 having fewer dimensions than the input vector 143. The intermediate vector 144 is then input to the decoder 142 and converted into an output vector 145 having the same number of dimensions as the input vector 143.

The autoencoder 14 is obtained by training the neural networks of the encoder 141 and the decoder 142 so that the error between the input vector 143 and the output vector 145 (the first error 146) becomes small, that is, so that the output vector 145 reconstructs the input vector 143. The neural network of the autoencoder 14 is trained by the learning process described later, which updates the network's internal parameters (such as the weight of each connection).

In the present embodiment, the number of dimensions of the input vector 143 input to the autoencoder 14 (encoder 141) and of the output vector 145 output from the autoencoder 14 (decoder 142) equals the number of dimensions of the feature vector generated by the feature vector generation unit 12, for example 20,000. The intermediate vector 144 of the autoencoder 14 has fewer dimensions than that feature vector, for example 1,000. The neural network of the decoder 142 and the neural network of the encoder 141 may be neural networks that are transposes of each other.
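
The forward pass of an autoencoder whose decoder is the transpose of its encoder can be sketched as follows. The single-layer structure, the tanh activation, the absence of bias terms, the random initialization, and the class name `TiedAutoencoder` are assumptions, since the text does not specify them. Tiny dimensions stand in for the 20,000 and 1,000 given as examples.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


class TiedAutoencoder:
    """Encoder weight W (hidden x input); the decoder reuses W transposed."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
                        for _ in range(hidden_dim)]

    def encode(self, x):
        # Input vector -> lower-dimensional intermediate vector.
        return [math.tanh(v) for v in matvec(self.weights, x)]

    def decode(self, z):
        # Intermediate vector -> output vector of the input's dimensionality.
        return matvec(transpose(self.weights), z)

    def forward(self, x):
        z = self.encode(x)
        return z, self.decode(z)


autoencoder = TiedAutoencoder(input_dim=6, hidden_dim=2)
intermediate, output = autoencoder.forward([1, 0, 2, 0, 0, 1])
```

Sharing one weight matrix between the encoder and the decoder roughly halves the number of parameters to be trained. The weights here are untrained, so the output is not yet a reconstruction of the input.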

The update unit 15 updates the internal parameters of the autoencoder 14 (such as the weight of each connection in the neural network) so that the errors of the autoencoder 14 (the first error 146 and the like) become small. The update unit 15 functions as the update means.

The terminal 200 is an information terminal such as a PC (Personal Computer). As shown in FIG. 1, the terminal 200 includes a control unit 20, a storage unit 21, a communication unit 22, a display unit 23, and an input unit 24, and is communicably connected to the management device 100.

The control unit 20 includes a CPU and the like. By executing a program stored in the storage unit 21, the control unit 20 realizes functions such as a Web browser for accessing the management device 100.

The storage unit 21 includes a ROM, a RAM, and the like, and stores the program executed by the CPU of the control unit 20 and necessary data.

The communication unit 22 is a device (such as a network card) for data communication with other equipment. The terminal 200 transmits and receives data to and from the management device 100 and the like via the communication unit 22.

The display unit 23 is a display such as a liquid crystal display or an organic EL (Electro-Luminescence) display, and displays ticket information and the like received from the management device 100.

The input unit 24 includes a keyboard, a mouse, a touch panel, and the like, and accepts input such as tickets and feedback from the person in charge or the like.

Although FIG. 1 shows only one terminal 200, the project management system 1 may include a plurality of terminals 200, any of which can communicate with the management device 100. Conversely, if the management device 100 itself includes a display unit and an input unit, the project management system 1 need not include the terminal 200. In that case, ticket information and the like are displayed on the display unit of the management device 100, and the input unit of the management device 100 accepts input such as tickets and feedback from the person in charge or the like.

Next, the hardware configuration of the project management system 1 will be described with reference to FIG. 7. The management device 100 includes a CPU (Central Processing Unit) 101, a RAM (Random Access Memory) 102, a ROM (Read Only Memory) 103, a hard disk drive 104, and a network card 105.

The CPU 101 reads a program stored in the hard disk drive 104 into the RAM 102 and executes it, thereby functioning as the control unit 10 and realizing the functions of the units described above.

The RAM 102 is a volatile memory used as a work area for the CPU 101 and as an area for storing variables used by the programs that the CPU 101 executes.

The ROM 103 is a non-volatile memory that stores a control program for the basic operation of the management device 100 executed by the CPU 101, a BIOS (Basic Input Output System), and the like.

The hard disk drive 104 stores the various information held by the management device 100 and the programs executed by the CPU 101. The RAM 102, the ROM 103, and the hard disk drive 104 constitute the storage unit 16.

The network card 105 is an interface to a communication line and is communicably connected to the network card 205 of the terminal 200, described later. The network card 105 and the network card 205 may be connected directly or via a network such as the Internet, an intranet, a VPN (Virtual Private Network), or a LAN (Local Area Network). The network card 105 constitutes the communication unit 17.

The terminal 200 includes a CPU 201, a RAM 202, a ROM 203, a hard disk drive 204, a network card 205, a display 206, and a keyboard 207.

The CPU 201 reads the program stored in the hard disk drive 204 into the RAM 202 and executes it, thereby functioning as the control unit 20 and executing the various processes of the terminal 200.

The RAM 202 is a volatile memory used as a work area for the CPU 201 and as an area for storing variables used by the programs that the CPU 201 executes.

The ROM 203 is a non-volatile memory that stores a control program for the basic operation of the terminal 200 executed by the CPU 201, a BIOS, and the like.

The hard disk drive 204 stores the various information held by the terminal 200 and the programs executed by the CPU 201. The RAM 202, the ROM 203, and the hard disk drive 204 constitute the storage unit 21.

The network card 205 is an interface to a communication line and is communicably connected to the network card 105 of the management device 100. The network card 205 and the network card 105 may be connected directly, or via a network such as the Internet, an intranet, a VPN, or a LAN. The network card 205 constitutes the communication unit 22.

Next, the learning process, which learns information on past tickets so that similar tickets can be found by the search process described later, will be described with reference to FIG. 8. The learning process starts when an administrator or the like of the project management system 1 instructs the management device 100 to start it. At this time, the management device 100 operates as a learning device. It is assumed that, at the start of the learning process, the ticket DB stored in the storage unit 16 of the management device 100 holds as many ticket information entries as there are tickets, and the feedback DB holds as many feedback information entries as there are feedbacks.

First, the control unit 10 of the management device 100 initializes the variable TN, which specifies a ticket number, to 1 (step S101). Next, the data acquisition unit 11 acquires from the ticket DB the title and body (referred to as sentence_TN) of the ticket whose ticket number is TN (step S102).

Then, the feature vector generation unit 12 generates a BoW_TN vector from sentence_TN (step S103). Here, the BoW_TN vector is also called the first feature vector, and step S103 is also called the first feature vector generation step. Next, the control unit 10 inputs the BoW_TN vector to the autoencoder 14 and calculates the first error 146 (step S104). As shown in FIG. 9, the first error 146 is the error between the input vector 143 and the output vector 145 that the autoencoder 14 outputs when the BoW_TN vector is input to it as the input vector 143. The control unit 10 calculates the first error 146 as a squared error (the sum of the squared differences between corresponding elements of the two vectors).
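Steps S103 and S104 can be sketched roughly as follows. This is only an illustration: the whitespace tokenizer, the fixed vocabulary, and the names `bow_vector` and `squared_error` are assumptions not taken from the description (Japanese ticket text would need a morphological analyzer instead of `split()`), and the autoencoder output is replaced by a hand-written stand-in vector.

```python
from collections import Counter

def bow_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]

def squared_error(input_vector, output_vector):
    """First error 146: sum of squared element-wise differences."""
    return sum((x - y) ** 2 for x, y in zip(input_vector, output_vector))

# Hypothetical vocabulary and ticket text, for illustration only.
vocabulary = ["login", "error", "timeout", "database"]
bow_tn = bow_vector("Login error after database timeout error", vocabulary)

# Stand-in for the output vector 145; a trained decoder would produce this.
reconstructed = [1.0, 1.5, 1.0, 0.5]
first_error = squared_error(bow_tn, reconstructed)
```

Here `bow_tn` plays the role of the input vector 143, and `first_error` is the quantity that step S105 tries to reduce.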

Then, the update unit 15 updates the parameters (connection weights and the like) of the neural network of the autoencoder 14 so that the calculated first error 146 becomes smaller (step S105). Next, the control unit 10 adds 1 to the variable TN (step S106).

Then, the control unit 10 determines whether the value of the variable TN is larger than the number of tickets stored in the ticket DB (step S107). If the value of the variable TN is less than or equal to the number of tickets (step S107; No), the process returns to step S102.

If the value of the variable TN is larger than the number of tickets (step S107; Yes), the control unit 10 initializes the variable FN, which specifies a feedback number, to 1 (step S108). Then, the similar data acquisition unit 13 acquires from the feedback DB the reference ticket number (referred to as FN1) and the similar ticket number (referred to as FN2) of the feedback whose feedback number is FN, and acquires from the ticket DB the title and body (referred to as sentence_FN1) of the ticket whose ticket number is FN1 (the reference ticket) and the title and body (referred to as sentence_FN2) of the ticket whose ticket number is FN2 (the similar ticket) (step S109).

Next, the feature vector generation unit 12 generates a BoW_FN1 vector from sentence_FN1 and a BoW_FN2 vector from sentence_FN2 (step S110). The BoW_FN1 vector is also called the first feature vector, and the BoW_FN2 vector is also called the second feature vector. Step S110 consists of a first feature vector generation step of generating the BoW_FN1 vector from sentence_FN1 and a second feature vector generation step of generating the BoW_FN2 vector from sentence_FN2.

Next, the control unit 10 calculates the second error 147 based on the intermediate vector 144 and the intermediate vector 144' obtained by inputting the BoW_FN1 vector and the BoW_FN2 vector, respectively, to the encoder 141 of the autoencoder 14 (step S111). The intermediate vector 144 obtained here is also called the first intermediate vector, and the intermediate vector 144' is also called the second intermediate vector.

As shown in FIG. 9, the second error 147 is the error between the intermediate vector 144, obtained by inputting the BoW_FN1 vector to the encoder 141 of the autoencoder 14 as the input vector 143, and the intermediate vector 144', obtained by inputting the BoW_FN2 vector to the encoder 141 as the input vector 143'. The control unit 10 calculates the second error 147 as the cosine similarity (the inner product of the two vectors divided by the length (L2 norm) of each vector, that is, by the product of the two norms).

Cosine similarity is used here because, as described later, the intermediate vector 144 is used to calculate the similarity between sentences. For the second error 147, therefore, it is better to increase the similarity between the intermediate vectors 144 and 144' (to bring the cosine similarity closer to 1) than to reduce the absolute value of an error. To make the second error 147 behave like an ordinary error (where a smaller value means higher similarity), it may instead be calculated as the reciprocal of the cosine similarity (1 / cosine similarity) or as 1 minus the cosine similarity (1 - cosine similarity).
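The cosine similarity and the two error-like variants mentioned above can be written out as a short sketch. The function names, the `form` parameter, and the handling of all-zero vectors are assumptions made for illustration; the description does not say how a zero-length vector is treated.

```python
import math

def cosine_similarity(u, v):
    """Inner product divided by the product of the two L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat an all-zero vector as dissimilar
    return dot / (norm_u * norm_v)

def second_error(u, v, form="one_minus"):
    """Second error 147 in a form where smaller means more similar."""
    cos = cosine_similarity(u, v)
    if form == "one_minus":
        return 1.0 - cos   # 0 when the vectors point the same way
    if form == "reciprocal":
        return 1.0 / cos   # 1 when they point the same way; undefined at cos == 0
    raise ValueError(f"unknown form: {form}")

same_direction = second_error([1.0, 2.0], [2.0, 4.0])   # close to 0
orthogonal = second_error([1.0, 0.0], [0.0, 3.0])       # 1 - 0
```

The `one_minus` form stays between 0 and 2 for any pair of non-zero vectors, which is one reason it may be easier to combine with the first error than the reciprocal form.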

Returning to FIG. 8, the update unit 15 updates the parameters of the neural network of the autoencoder 14 so that the calculated second error 147 becomes smaller (so that the cosine similarity approaches 1) (step S112). Step S105 is the first update process, and step S112 is the second update process. Steps S105 and S112 are also called update steps.
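To make the two update processes concrete, the following is a minimal sketch on a tiny linear autoencoder with hand-derived gradients. Everything specific here is an assumption: the description does not fix the network architecture, the optimizer, or the learning rate, and the second error is taken in its 1 - cosine similarity form. The names `first_update` and `second_update` are hypothetical.

```python
import math

def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def norm(vector):
    return math.sqrt(sum(v * v for v in vector))

def first_update(enc, dec, x, lr):
    """One gradient step on the squared reconstruction error (S104, S105).
    Returns the error measured before the step."""
    h = matvec(enc, x)                                  # intermediate vector
    r = [o - xi for o, xi in zip(matvec(dec, h), x)]    # output minus input
    # Gradient with respect to h, taken before dec changes: 2 * dec^T r
    grad_h = [2 * sum(dec[i][j] * r[i] for i in range(len(r)))
              for j in range(len(h))]
    for i in range(len(dec)):
        for j in range(len(h)):
            dec[i][j] -= lr * 2 * r[i] * h[j]
    for j in range(len(enc)):
        for k in range(len(x)):
            enc[j][k] -= lr * grad_h[j] * x[k]
    return sum(v * v for v in r)

def second_update(enc, x1, x2, lr):
    """One gradient step on 1 - cos(h1, h2) (S111, S112).
    Only the encoder affects this error. Returns the error before the step."""
    h1, h2 = matvec(enc, x1), matvec(enc, x2)
    n1, n2 = norm(h1), norm(h2)
    cos = sum(a * b for a, b in zip(h1, h2)) / (n1 * n2)
    # d(1 - cos)/dh1 = -(h2 / (n1 n2) - cos * h1 / n1^2); symmetric for h2
    g1 = [-(b / (n1 * n2) - cos * a / (n1 * n1)) for a, b in zip(h1, h2)]
    g2 = [-(a / (n1 * n2) - cos * b / (n2 * n2)) for a, b in zip(h1, h2)]
    for j in range(len(enc)):
        for k in range(len(x1)):
            enc[j][k] -= lr * (g1[j] * x1[k] + g2[j] * x2[k])
    return 1.0 - cos

# Hypothetical starting weights and feature vectors.
enc = [[0.5, 0.1, 0.0], [0.0, 0.2, 0.4]]     # encoder: 3 -> 2 dimensions
dec = [[0.3, 0.0], [0.1, 0.1], [0.0, 0.3]]   # decoder: 2 -> 3 dimensions
x_ref, x_sim = [1.0, 2.0, 0.0], [1.0, 1.0, 1.0]
first_error = first_update(enc, dec, x_ref, lr=0.05)
second_error = second_update(enc, x_ref, x_sim, lr=0.1)
```

In this sketch the second update leaves the decoder untouched because the second error depends only on the encoder output; a practical implementation would more likely rely on an automatic differentiation library than on hand-written gradients.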

Next, the control unit 10 adds 1 to the variable FN (step S113). Then, the control unit 10 determines whether the value of the variable FN is larger than the number of feedbacks stored in the feedback DB (step S114). If the value of the variable FN is less than or equal to the number of feedbacks (step S114; No), the process returns to step S109.

If the value of the variable FN is larger than the number of feedbacks (step S114; Yes), the control unit 10 determines whether the error obtained by integrating the first error 146 and the second error 147 (the integrated error) is the smallest value observed since the learning process started (step S115). The integrated error is an index value such that reducing it reduces the first error 146 and the second error 147 overall. For example, when the second error 147 is calculated as 1 minus the cosine similarity (1 - cosine similarity), the integrated error can be a mean of the first error 146 and the second error 147 (any one of the harmonic mean, the square-sum mean, the arithmetic mean, the geometric mean, and so on).
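The candidate means for the integrated error can be compared with a small sketch. The function name and the `kind` parameter are hypothetical, and the square-sum mean is interpreted here as the root mean square, which is an assumption about the intended formula.

```python
import math

def integrated_error(first_error, second_error, kind="arithmetic"):
    """Combine the two non-negative errors into a single index value."""
    a, b = first_error, second_error
    if kind == "arithmetic":
        return (a + b) / 2
    if kind == "geometric":
        return math.sqrt(a * b)
    if kind == "harmonic":
        return 2 * a * b / (a + b) if a + b > 0 else 0.0
    if kind == "square_sum":
        # Assumption: read "square-sum mean" as the root mean square.
        return math.sqrt((a * a + b * b) / 2)
    raise ValueError(f"unknown kind: {kind}")

# Hypothetical error values, for illustration only.
values = {kind: integrated_error(2.0, 8.0, kind)
          for kind in ("arithmetic", "geometric", "harmonic", "square_sum")}
```

For the same pair of errors the harmonic mean is dominated by the smaller error and the square-sum mean by the larger one, so the choice affects which error the stopping criterion is most sensitive to.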

If the integrated error is the minimum so far (step S115; Yes), the control unit 10 saves the current parameters of the neural network of the autoencoder 14 in the storage unit 16 (step S116) and returns to step S101.

If the integrated error is not the minimum (step S115; No), the control unit 10 determines whether the loop from step S101 to step S116 has been repeated a predetermined number of times (for example, 100 times) without the integrated error reaching a new minimum (step S117).

If the loop has not yet been repeated the predetermined number of times (step S117; No), the process returns to step S101. If it has (step S117; Yes), the update unit 15 sets the parameters of the neural network of the autoencoder 14 to the parameters saved in the storage unit 16 in step S116 (step S118) and ends the learning process.
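The stopping logic of steps S115 to S118 resembles what is commonly called early stopping with patience. The sketch below is one possible reading: the counter is assumed to reset whenever a new minimum is reached, `run_epoch` is a hypothetical callback standing in for steps S101 to S114, and `max_epochs` is a safety cap that does not appear in the description.

```python
import copy

def train_with_patience(run_epoch, params, patience=100, max_epochs=10_000):
    """run_epoch(params) updates params in place and returns the integrated error."""
    best_error = float("inf")
    best_params = copy.deepcopy(params)
    rounds_without_minimum = 0
    for _ in range(max_epochs):
        error = run_epoch(params)                 # S101 to S114
        if error < best_error:                    # S115: new minimum?
            best_error = error
            best_params = copy.deepcopy(params)   # S116: save parameters
            rounds_without_minimum = 0
        else:
            rounds_without_minimum += 1
            if rounds_without_minimum >= patience:  # S117
                break
    return best_params, best_error               # S118: restore the saved parameters

# Scripted errors instead of real training, for illustration only.
scripted_errors = iter([5.0, 3.0, 4.0, 2.0, 6.0, 7.0, 8.0])

def fake_epoch(params):
    params["epoch"] += 1    # stands in for a real parameter update
    return next(scripted_errors)

best_params, best_error = train_with_patience(fake_epoch, {"epoch": 0}, patience=3)
```

With the scripted errors, the minimum occurs in the fourth round, and the loop stops after three further rounds without improvement, returning the parameters saved at that fourth round.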

As a result of the learning process described above, the parameters of the neural network of the encoder 141 of the autoencoder 14 are updated so that, when the original sentences of the feature vectors input to the encoder 141 are similar, the intermediate vectors 144 output by the encoder 141 are also similar.

In the learning process described above (FIG. 8), the parameters are updated (step S105) each time an error is calculated from one piece of ticket information (step S104). Instead, so-called mini-batch learning may be performed, in which the first error is calculated over multiple pieces of ticket information (the batch size) before the parameters are updated. Similarly, for the feedback information, the second error may be calculated over multiple pieces of feedback information (the batch size) before the parameters are updated. This avoids parameter updates (which would degrade search accuracy) driven by abnormal errors calculated from a few unusual data items.
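The effect of mini-batching on an unusual data item can be illustrated with a short sketch. The helper names and the error values are hypothetical, and averaging the errors within a batch is an assumption; the description only says that the error is calculated over a batch before updating.

```python
def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def mean(values):
    return sum(values) / len(values)

# Hypothetical per-ticket first errors; 9.0 is an outlier.
per_ticket_errors = [0.2, 0.4, 9.0, 0.4]
batch_errors = [mean(batch) for batch in batches(per_ticket_errors, batch_size=2)]
```

Updating once per batch means the outlier contributes an averaged error of about 4.7 rather than driving a parameter update with 9.0 on its own.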

In addition, when the number of tickets stored in the ticket DB and the number of feedbacks stored in the feedback DB differ considerably (for example, when the number of tickets is at least twice the number of feedbacks), the number of tickets and the number of feedbacks may be adjusted by using the smaller set of information repeatedly, so that the number of parameter updates in step S105 and the number of parameter updates in step S112 are roughly the same.

Ideally, both the first error and the second error become small, but in practice the two errors often end up in a tug-of-war. In that case, when there are more tickets than feedbacks, parameter updates tend to favor minimizing the first error, and when there are more feedbacks than tickets, they tend to favor minimizing the second error. The adjustment described above prevents such biased parameter updates and allows both the first error and the second error to be reduced in a well-balanced manner.
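One simple way to equalize the number of updates is to cycle through the smaller collection until it matches the larger one. This is a sketch under that assumption; the function name `balance` and the sample data are hypothetical, and the description does not prescribe a particular repetition scheme.

```python
from itertools import cycle, islice

def balance(tickets, feedbacks):
    """Repeat the shorter list so both lists drive the same number of updates."""
    target = max(len(tickets), len(feedbacks))
    return (list(islice(cycle(tickets), target)),
            list(islice(cycle(feedbacks), target)))

# Hypothetical ticket numbers and (reference, similar) feedback pairs.
tickets = [1, 2, 3, 4, 5]
feedbacks = [(1, 4), (2, 5)]
balanced_tickets, balanced_feedbacks = balance(tickets, feedbacks)
```

After balancing, one pass performs five first updates and five second updates, with each feedback pair reused two or three times.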

Next, the search process, which searches for similar tickets using the autoencoder 14 trained by the learning process described above, will be described with reference to FIG. 10. The search process starts when a person in charge or the like presents the ticket for which similar tickets are sought (the search ticket) and instructs the management device 100 to start the search process. At this time, the management device 100 operates as a search device. It is assumed that, before the search process starts, the autoencoder 14 of the management device 100 has already been trained (its neural network parameters updated) by the learning process described above (FIG. 8), and that the ticket DB stored in the storage unit 16 holds as many ticket information entries as there are tickets. Because the data stored in the ticket DB is a collection of tickets to be searched (searched data), it is also called the searched data group.

First, the control unit 10 of the management device 100 initializes the variable TN, which specifies a ticket number, to 1 (step S201). Next, the data acquisition unit 11 acquires from the ticket DB the title and body (referred to as sentence_TN) of the ticket whose ticket number is TN (step S202).

Then, the feature vector generation unit 12 generates a BoW_TN vector from sentence_TN (step S203). Because the BoW_TN vector is the feature vector of a ticket to be searched (searched data), it is also called the searched feature vector. Next, the control unit 10 inputs the BoW_TN vector to the encoder 141 of the autoencoder 14, obtains the intermediate vector 144, and saves this intermediate vector in the storage unit 16 as the feature quantity V_TN (step S204). Because the feature quantity V_TN is the intermediate vector of a ticket to be searched (searched data), it is also called the searched intermediate vector. Step S204 is also called the searched intermediate vector generation step. When executing step S204, the control unit 10 functions as searched intermediate vector generation means.

Then, the control unit 10 adds 1 to the variable TN (step S205). Next, the control unit 10 determines whether the value of the variable TN is larger than the number of tickets stored in the ticket DB (step S206). If the value of the variable TN is less than or equal to the number of tickets (step S206; No), the process returns to step S202. As a result, the storage unit 16 holds V_TN for TN from 1 to the number of tickets (the searched intermediate vector group). Steps S201 to S206 are also called the searched intermediate vector group generation step. When executing steps S201 to S206, the control unit 10 functions as searched intermediate vector group generation means.

If the value of the variable TN is larger than the number of tickets (step S206; Yes), the data acquisition unit 11 acquires the title and body (referred to as sentence_S) of the search ticket presented by the person in charge or the like when the start of the search process was instructed (step S207). When executing step S207, the data acquisition unit 11 functions as search data acquisition means. Then, the feature vector generation unit 12 generates a BoW_S vector from sentence_S (step S208). Because the BoW_S vector is the feature vector of the search ticket (search data), it is also called the search feature vector, and step S208 is also called the search feature vector generation step.

Next, the control unit 10 inputs the BoW_S vector to the encoder 141 of the autoencoder 14 and obtains the output intermediate vector 144 as the feature quantity V_S (step S209).

Then, the control unit 10 calculates the cosine similarity between the feature quantity V_S and each feature quantity V_TN (TN = 1 to the number of tickets) stored in the storage unit 16, and extracts a predetermined number (for example, 10) of tickets ticket_TN in descending order of cosine similarity (the tickets corresponding to the feature quantities V_TN that have a high cosine similarity to the feature quantity V_S) (step S210). When executing step S210, the control unit 10 functions as search means. Step S210 is also called the search step.
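Step S210 amounts to ranking the stored intermediate vectors by cosine similarity to the query vector and keeping the top few. A minimal sketch follows; the dictionary layout (ticket number mapped to V_TN), the function names, and the sample vectors are assumptions for illustration.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_u * norm_v)

def search_similar(query_vector, stored_vectors, top_k=10):
    """Return ticket numbers ordered by descending cosine similarity to the query."""
    scored = [(cosine_similarity(query_vector, vector), ticket_number)
              for ticket_number, vector in stored_vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ticket_number for _, ticket_number in scored[:top_k]]

# Hypothetical searched intermediate vector group (ticket number -> V_TN).
stored_vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0], 4: [-1.0, 0.0]}
v_s = [1.0, 0.0]   # hypothetical feature quantity V_S of the search ticket
similar_tickets = search_similar(v_s, stored_vectors, top_k=2)
```

This linear scan is adequate for a small ticket DB; for a very large one, an approximate nearest-neighbor index could replace it, although the description does not discuss that.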

Then, the control unit 10 outputs the extracted similar tickets (step S211) and ends the search process. The similar tickets are output by, for example, displaying them on the display unit 23 of the terminal 200 that is accessing the management device 100 (for example, in a Web browser running on the terminal 200).

Of the search process described above, steps S201 to S206 generate the intermediate vectors of the tickets to be searched (searched data) (the searched intermediate vector group generation process). Once this process has been executed after the learning process (FIG. 8), it does not need to be executed again in subsequent search processes. When the searched intermediate vector group generation process has been performed and the searched intermediate vector group is stored in the storage unit 16, the search process skips steps S201 to S206 and starts from step S207.

The search process has been described above. As explained, training the autoencoder 14 in the learning process based on the information in the ticket DB and the feedback DB makes it possible to perform a highly accurate similar-ticket search, even for a new search ticket for which no feedback has been obtained, by using the similarity between the intermediate vectors output by the encoder 141. Once a person in charge or an administrator obtains information on past tickets similar to a new ticket, they can refer to the countermeasures and the like taken for those past tickets and use them as a reference for future work.

(Modifications)
The present invention is not limited to the embodiment described above, and various other modifications are possible. For example, the project management system 1 need not have all the technical features shown in the above embodiment, and may have only part of the configuration described in the above embodiment, provided that it can solve at least one problem in the prior art. In addition, the following modifications may be combined with one another at least in part.

For example, in the embodiment described above, feedback is entered, as shown in FIG. 4, by ticking check boxes among the predetermined number of tickets extracted by the management device 100. However, feedback input is not limited to this. For example, a person in charge, an administrator, or the like may be allowed to feed back information on similar tickets they found themselves, even if the management device 100 did not extract those tickets. Feedback may also be allowed to include not only the information that tickets are similar but also information on how similar they are (for example, "high similarity", "medium similarity", "low similarity").

When the degree of similarity can also be fed back, the feedback DB (FIG. 5) may also hold the similarity information, and the degree of the parameter update in step S112 of the learning process (FIG. 8) may be adjusted according to the degree of similarity. This allows the management device 100 to perform learning that better reflects the feedback information and, in the search process (FIG. 10), to output search results that are more convincing to the person in charge and the administrator.
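One way to let the similarity grade influence step S112 is to scale the learning rate of the second update by a per-grade weight. The sketch below assumes exactly that; the weight values, the grade labels as dictionary keys, and the feedback tuple layout are all hypothetical, since the description does not specify how the degree of update is adjusted.

```python
# Hypothetical weights: stronger feedback drives a larger update.
SIMILARITY_WEIGHTS = {"high": 1.0, "medium": 0.5, "low": 0.2}

def scaled_learning_rate(base_lr, similarity_label):
    """Learning rate for the second update, scaled by the feedback grade."""
    return base_lr * SIMILARITY_WEIGHTS[similarity_label]

# Hypothetical feedback entries: (reference ticket, similar ticket, grade).
feedback = [(12, 57, "high"), (12, 80, "low")]
rates = [scaled_learning_rate(0.1, label) for _, _, label in feedback]
```

An alternative with a similar effect would be to multiply the second error itself by the weight before computing the gradient.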

Further, the embodiment described above concerned the management device 100, which, for tickets in project management, learns feature quantities for searching for similar tickets and searches for similar tickets based on the learned feature quantities. However, the scope of application of the present invention is not limited to similar-ticket search. In general, the learning process and the search process described above can be applied whenever a document similar to a given document is searched for in a system that stores document data together with feedback information on the similarity between documents. For example, if the present invention is applied to a Q&A system in which someone answers a question posted on a network, past questions similar to a new question become searchable. By referring to the answers to those past questions, the person who asked the new question can also obtain useful information.

The management device 100 and the terminal 200 can be realized using an ordinary computer rather than a dedicated device. For example, the management device 100 and the terminal 200 that execute the processing described above may be configured by installing a program for executing any of the above processing on a computer from a recording medium storing that program. One management device 100 or terminal 200 may also be configured by multiple computers operating in cooperation.

The method of supplying the program to the computer is arbitrary. For example, the program may be supplied via a communication line, a communication network, a communication system, or the like.

When an OS (Operating System) provides some of the functions described above, only the parts other than those provided by the OS need to be provided by the program.

1 ... project management system, 10, 20 ... control unit, 11 ... data acquisition unit, 12 ... feature vector generation unit, 13 ... similar data acquisition unit, 14 ... autoencoder, 15 ... update unit, 16, 21 ... storage unit, 17, 22 ... communication unit, 23 ... display unit, 24 ... input unit, 100 ... management device, 101, 201 ... CPU, 102, 202 ... RAM, 103, 203 ... ROM, 104, 204 ... hard disk drive, 105, 205 ... network card, 141 ... encoder, 142 ... decoder, 143, 143' ... input vector, 144, 144' ... intermediate vector, 145 ... output vector, 146 ... first error, 147 ... second error, 200 ... terminal, 206 ... display, 207 ... keyboard

Claims

A learning device comprising:
reference data acquisition means for acquiring reference data;
feature vector generation means for generating a feature vector from input data;
similar data acquisition means for acquiring similar data that is similar to the reference data;
an autoencoder consisting of an encoder that, when a feature vector is input, generates an intermediate vector having fewer dimensions than the feature vector, and a decoder that, when the intermediate vector is input, generates an output vector having the same number of dimensions as the feature vector; and
update means for updating parameters of the autoencoder,
wherein the update means performs:
a first update process of updating the parameters of the autoencoder so as to reduce a first error, which is an error between a first feature vector generated by inputting the reference data into the feature vector generation means and an output vector generated by inputting the first feature vector into the autoencoder; and
a second update process of updating the parameters of the autoencoder so as to reduce a second error, which is an error between a first intermediate vector generated by inputting the first feature vector into the encoder of the autoencoder and a second intermediate vector generated by inputting, into the encoder of the autoencoder, a second feature vector generated by inputting the similar data into the feature vector generation means.

The learning device according to claim 1,
wherein the update means alternately repeats the first update process and the second update process, thereby updating the parameters of the autoencoder so that both the first error and the second error become small.

The learning device according to claim 1 or 2,
wherein the similar data acquisition means acquires a plurality of pieces of the similar data for one piece of the reference data, and
the update means performs the second update process using the plurality of pieces of similar data.

The learning device according to any one of claims 1 to 3,
wherein the update means calculates the first error as a squared error and calculates the second error from a cosine similarity.

The update means performs the first update process and the second update process so that the value of the square sum average or the harmonic mean of the first error and the second error is minimized.
The learning device according to any one of claims 1 to 4.
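
The two ways of combining the errors can be written out as below. The reading of the sum-of-squares average as the arithmetic mean of the two squared errors is an interpretation of the translated claim, not something the text confirms, and both function names are hypothetical.

```python
def mean_of_squares(e1, e2):
    # Assumed reading of the sum-of-squares average: (e1^2 + e2^2) / 2.
    return (e1 * e1 + e2 * e2) / 2.0

def harmonic_mean(e1, e2):
    # Harmonic mean of two non-negative errors: 2 * e1 * e2 / (e1 + e2).
    total = e1 + e2
    if total == 0.0:
        return 0.0  # both errors are zero
    return 2.0 * e1 * e2 / total
```

For errors of 1 and 3, the first function gives 5 and the second gives 1.5. The mean of squares is dominated by the larger error, whereas the harmonic mean is dominated by the smaller one, so the choice affects which error the training emphasizes.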

A learning method for an autoencoder consisting of an encoder that, when a feature vector is input, generates an intermediate vector having fewer dimensions than the feature vector, and a decoder that, when the intermediate vector is input, generates an output vector having the same number of dimensions as the feature vector, the learning method comprising:
A first feature vector generation step of generating a first feature vector from reference data,
A second feature vector generation step of generating a second feature vector from similar data similar to the reference data, and
An update step of updating the parameters of the autoencoder,
wherein the update step performs:
A first update process of updating the parameters of the autoencoder so as to reduce a first error, which is the error between the first feature vector generated in the first feature vector generation step and the output vector generated by inputting the first feature vector into the autoencoder, and
A second update process of updating the parameters of the autoencoder so as to reduce a second error, which is the error between a first intermediate vector generated by inputting the first feature vector into the encoder of the autoencoder and a second intermediate vector generated by inputting the second feature vector into the encoder of the autoencoder.

A search device comprising:
An autoencoder whose parameters have been updated by the learning method according to claim 6,
A feature vector generation means for generating a feature vector from input data,
A searched intermediate vector group generation means for generating in advance, from a searched data group that is a set of searched data to be searched, a searched intermediate vector group that is a set of searched intermediate vectors, each searched intermediate vector being the intermediate vector generated by inputting, into the encoder of the autoencoder, a searched feature vector generated by inputting the searched data into the feature vector generation means,
A search data acquisition means for acquiring search data to be searched for, and
A search means for searching the searched data group for searched data similar to the search data, based on the cosine similarity between each searched intermediate vector and a search intermediate vector, which is the intermediate vector generated by inputting, into the encoder of the autoencoder, a search feature vector generated by inputting the search data into the feature vector generation means.
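
The search flow of this claim, with intermediate vectors precomputed for the data to be searched and then ranked by cosine similarity against the query's intermediate vector, can be sketched as follows. The sketch assumes a linear encoder given as a weight matrix `W_enc`; that matrix, the function names, and the `top_k` parameter are hypothetical and not taken from this publication.

```python
import numpy as np

def build_index(W_enc, searched_features):
    # Precompute one intermediate vector per item to be searched.
    # searched_features has shape (n_items, n_features).
    W = np.asarray(W_enc, dtype=float)
    return np.asarray(searched_features, dtype=float) @ W.T

def search(W_enc, index, query_feature, top_k=3):
    # Encode the query, then rank indexed items by cosine similarity.
    W = np.asarray(W_enc, dtype=float)
    q = W @ np.asarray(query_feature, dtype=float)
    norms = np.linalg.norm(index, axis=1) * np.linalg.norm(q)
    # Zero-norm vectors get similarity 0 instead of a division by zero.
    sims = (index @ q) / np.where(norms > 0, norms, 1.0)
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order]
```

`build_index` is run once over the searched data group, and `search` is run per query, returning pairs of item index and similarity in descending order of similarity. Because comparison happens in the lower-dimensional intermediate space, each query costs one encoder pass plus one matrix-vector product over the index.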

A search method using an autoencoder whose parameters have been updated by the learning method according to claim 6, the search method comprising:
A searched intermediate vector group generation step of generating in advance, from a searched data group that is a set of searched data to be searched, a searched intermediate vector group that is a set of searched intermediate vectors, each searched intermediate vector being the intermediate vector generated by inputting a searched feature vector generated from the searched data into the encoder of the autoencoder,
A search feature vector generation step of generating a search feature vector from search data to be searched for, and
A search step of searching the searched data group for searched data similar to the search data, based on the cosine similarity between each searched intermediate vector and a search intermediate vector, which is the intermediate vector generated by inputting the search feature vector into the encoder of the autoencoder.

A program for causing a computer to function as:
Reference data acquisition means for acquiring reference data,
Feature vector generation means for generating a feature vector from input data,
Similar data acquisition means for acquiring similar data similar to the reference data,
An autoencoder consisting of an encoder that, when the feature vector is input, generates an intermediate vector having fewer dimensions than the feature vector, and a decoder that, when the intermediate vector is input, generates an output vector having the same number of dimensions as the feature vector, and
An update means for updating the parameters of the autoencoder,
wherein the update means performs:
A first update process of updating the parameters of the autoencoder so as to reduce a first error, which is the error between a first feature vector generated by inputting the reference data into the feature vector generation means and the output vector generated by inputting the first feature vector into the autoencoder, and
A second update process of updating the parameters of the autoencoder so as to reduce a second error, which is the error between a first intermediate vector generated by inputting the first feature vector into the encoder of the autoencoder and a second intermediate vector generated by inputting, into the encoder of the autoencoder, a second feature vector generated by inputting the similar data into the feature vector generation means.