JP2020027182A

JP2020027182A - Learning data generation method, learning method, and evaluation device

Info

Publication number: JP2020027182A
Application number: JP2018152116A
Authority: JP
Inventors: 継河合; Kei Kawai
Original assignee: Crystal Method Co Ltd
Current assignee: Crystal Method Co Ltd
Priority date: 2018-08-10
Filing date: 2018-08-10
Publication date: 2020-02-20
Anticipated expiration: 2038-08-10
Also published as: JP6452061B1

Abstract

To provide a learning data generation method, a learning method, and an evaluation device with which learning data can be acquired easily. SOLUTION: A learning data generation method for pseudo-generating sound data used as learning data for machine learning includes: an acquisition step of acquiring a reference image, obtained by extracting part of a spectrogram converted from sound data, and a training image, obtained by deleting part of the reference image; a first database generation step of generating a first database by machine learning using the reference image and the training image as a pair of input data; a generation step of generating a pseudo image based on a new reference image or a new training image by referring to the first database; and a conversion step of converting the pseudo image into pseudo sound data. SELECTED DRAWING: Figure 1

Description

The present invention relates to a learning data generation method, a learning method, and an evaluation device.

Conventionally, devices such as the evaluation device of Patent Literature 1 have been proposed as techniques for evaluating sounds such as utterances.

Patent Literature 1 discloses an utterance evaluation device comprising: an input unit that receives a speech signal of a speaker's free utterance; a feature extraction unit that extracts features used for evaluation from the input speech signal; a feature evaluation unit that compares the extracted features with reference features stored in advance; and an output unit that outputs the comparison result. The features used for the evaluation include at least the clarity of articulation, which is represented by the obstruent-to-sonorant ratio of the input speech signal. The feature extraction unit obtains this ratio using means for dividing the input speech signal into a plurality of segments and means for classifying the resulting segments into obstruents and sonorants.

Patent Literature 1: JP-A-2015-68897

Techniques for evaluating sound, such as that of Patent Literature 1, may refer to a database (for example, a classifier) generated by machine learning. Machine learning requires an enormous amount of learning data to achieve high accuracy. Sound in particular involves many parameters, such as frequency and observation time, so learning data for sound tends to require more files and more memory than learning data for text or images. Acquiring such learning data therefore takes enormous time and cost, and making learning data easy to acquire is an open problem. The technique disclosed in Patent Literature 1 does not readily solve this problem.

The present invention was devised in view of the above problem, and its object is to provide a learning data generation method, a learning method, and an evaluation device with which learning data can be acquired easily.

A learning data generation method according to a first invention pseudo-generates sound data to be used as learning data for machine learning, and comprises: an acquisition step of acquiring a reference image, obtained by extracting part of a spectrogram converted from sound data for learning, and a training image, obtained by deleting part of the reference image; a first database generation step of generating a first database by machine learning using the reference image and the training image as a pair of input data; a generation step of generating a pseudo image based on a new reference image or a new training image by referring to the first database; and a conversion step of converting the pseudo image into pseudo sound data.

In a learning data generation method according to a second invention, in the first invention, the generation step generates a plurality of pseudo images for one new reference image or one new training image, and the plurality of pseudo images are each converted into different pseudo sound data.

In a learning data generation method according to a third invention, in the first or second invention, the first database generation step generates the first database based on machine learning.

In a learning data generation method according to a fourth invention, in the first or second invention, the first database generation step generates the first database based on generative machine learning.

In a learning data generation method according to a fifth invention, in any one of the first to fourth inventions, the sound data includes connector sounds and ambient environmental sounds.

In a learning data generation method according to a sixth invention, in any one of the first to fifth inventions, the conversion step adds noise generated from random numbers to the pseudo sound data converted from the pseudo image using an inverse short-time Fourier transform.

A learning method according to a seventh invention performs machine learning using, as learning data, the pseudo sound data generated by the learning data generation method of any one of the first to sixth inventions, and comprises a second database generation step of generating a second database by machine learning using the pseudo sound data and evaluation data linked to the pseudo sound data as a pair of input data.

An evaluation device according to an eighth invention evaluates evaluation target sound data using the second database generated by the learning method of the seventh invention, and comprises: an acquisition unit that acquires the evaluation target sound data; and an evaluation unit that refers to the second database and generates an evaluation result based on the evaluation target sound data.

An evaluation device according to a ninth invention evaluates evaluation target sound data and comprises: a first acquisition unit that acquires a reference image, obtained by extracting part of a spectrogram converted from sound data for learning, and a training image, obtained by deleting part of the reference image; a first database generation unit that generates a first database by machine learning using the reference image and the training image as a pair of input data; a pseudo image generation unit that refers to the first database and generates a pseudo image based on a new reference image or a new training image; a conversion unit that converts the pseudo image into pseudo sound data; a second database generation unit that generates a second database by machine learning using the pseudo sound data and evaluation data linked to the pseudo sound data as a pair of input data; a second acquisition unit that acquires the evaluation target sound data; and an evaluation unit that refers to the second database and generates an evaluation result based on the evaluation target sound data.

A learning data generation method according to a tenth invention pseudo-generates sound data to be used as learning data for machine learning, and comprises: an acquisition step of acquiring reference data based on sound data for learning and training data obtained by deleting part of the reference data; a first database generation step of generating a first database by machine learning using the reference data and the training data as a pair of input data; and a generation step of generating pseudo data based on new reference data or new training data by referring to the first database.

In a learning data generation method according to an eleventh invention, in the tenth invention, the generation step generates a plurality of pieces of pseudo data for one piece of new reference data or one piece of new training data.

A learning method according to a twelfth invention performs machine learning using, as learning data, the pseudo data generated by the learning data generation method of the tenth or eleventh invention, and comprises a second database generation step of generating a second database by machine learning using the pseudo data and evaluation data linked to the pseudo data as a pair of input data.

An evaluation device according to a thirteenth invention evaluates evaluation target sound data using the second database generated by the learning method of the twelfth invention, and comprises: an acquisition unit that acquires the evaluation target sound data; and an evaluation unit that refers to the second database and generates an evaluation result based on the evaluation target sound data.

According to the first to eighth inventions, the generation step generates a pseudo image based on a new reference image or a new training image, and the conversion step converts the pseudo image into pseudo sound data. Pseudo sound data can therefore be used as learning data even when little sound data is available for learning. As a result, learning data for machine learning can be acquired easily, reducing the time and cost of acquiring it.

In particular, according to the eighth invention, the evaluation unit refers to the second database and generates an evaluation result based on the evaluation target sound data. Even when little sound data is available for learning, referring to a second database generated by machine learning with pseudo sound data makes it possible to improve the accuracy of the evaluation result.

According to the ninth invention, the pseudo image generation unit generates a pseudo image based on a new reference image or a new training image, and the conversion unit converts the pseudo image into pseudo sound data. Pseudo sound data can therefore be used as learning data even when little sound data is available for learning. As a result, learning data for machine learning can be acquired easily, reducing the time and cost of acquiring it.

Also according to the ninth invention, the evaluation unit refers to the second database and generates an evaluation result based on the evaluation target sound data. Even when little sound data is available for learning, referring to a second database generated by machine learning with pseudo sound data makes it possible to improve the accuracy of the evaluation result.

According to the tenth to thirteenth inventions, the generation step generates pseudo data based on new reference data or new training data. Pseudo data can therefore be used as learning data even when little sound data is available for learning. As a result, learning data for machine learning can be acquired easily, reducing the time and cost of acquiring it.

In particular, according to the thirteenth invention, the evaluation unit refers to the second database and generates an evaluation result based on the evaluation target sound data. Even when little sound data is available for learning, referring to a second database generated by machine learning with pseudo sound data makes it possible to improve the accuracy of the evaluation result.

FIG. 1A is a schematic diagram showing an example of an application of the evaluation device according to the present embodiment, FIG. 1B is a schematic diagram showing an outline of the learning method according to the present embodiment, and FIG. 1C is a schematic diagram showing an outline of the learning data generation method according to the present embodiment.

FIG. 2A is a schematic diagram showing an example of sound data, FIG. 2B is a schematic diagram showing an example of a spectrogram, FIG. 2C is a schematic diagram showing an example of a reference image, and FIG. 2D is a schematic diagram showing an example of a training image.

FIG. 3 is a flowchart showing an example of the learning data generation method according to the present embodiment.

FIGS. 4A to 4C are schematic diagrams showing the relationship among sound data, a spectrogram, a reference image, and a training image.

FIG. 5 is a flowchart showing an example of the learning method according to the present embodiment.

FIG. 6A is a schematic diagram showing an example of the configuration of the evaluation device according to the present embodiment, and FIG. 6B is a schematic diagram showing an example of the functions of the evaluation device according to the present embodiment.

FIG. 7 is a flowchart showing an example of the operation of the evaluation device according to the present embodiment.

FIG. 8 is a schematic diagram showing an example of sound data and training data according to the present embodiment.

Hereinafter, examples of a learning data generation method, a learning method, and an evaluation device according to an embodiment of the present invention will be described with reference to the drawings.

An example of the learning data generation method, the learning method, and the evaluation device 1 according to the present embodiment will be described with reference to FIG. 1. FIG. 1A is a schematic diagram showing an example of an application of the evaluation device 1 according to the present embodiment, FIG. 1B is a schematic diagram showing an outline of the learning method according to the present embodiment, and FIG. 1C is a schematic diagram showing an outline of the learning data generation method according to the present embodiment.

As shown in FIG. 1A, for example, the evaluation device 1 according to the present embodiment acquires sound data (evaluation target sound data) and outputs an evaluation result for the sound data. The evaluation device 1 is installed, for example, in a factory, where it is used to evaluate whether a specific sound is present or for shipping inspection of products; it can also be used, for example, to evaluate spaces in which environmental sound or noise must be controlled. An electronic device such as a personal computer (PC) is used as the evaluation device 1.

The evaluation target sound data acquired by the evaluation device 1 is generated from sound collected by a sound collection device such as a microphone (not shown). The evaluation target sound data includes, for example, machine sounds such as the operating sounds of equipment in a factory, air-conditioning sounds, and connector sounds, as well as ambient environmental sounds and voices such as human speech. As shown in FIG. 2A, for example, the evaluation target sound data is represented as amplitude over time.

The evaluation device 1 refers to an evaluation database (second database) and outputs an evaluation result for the sound data. For example, the evaluation device 1 outputs, as the evaluation result, whether the evaluation target sound data falls within (OK) or outside (NG) a range such as a standard. The content of the evaluation result can be set arbitrarily when the evaluation database is generated.

As shown in FIG. 1B, for example, the learning method according to the present embodiment generates the evaluation database by machine learning using artificially generated pseudo sound data as learning data. The learning data includes a plurality of pairs, each consisting of pseudo sound data and evaluation data linked to it; it may also include a plurality of pairs, each consisting of sound data generated from sound collected by a sound collection device such as a microphone and evaluation data linked to that sound data. The proportion of pseudo sound data in the learning data is arbitrary.
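Since the proportion of pseudo sound data is left open, a minimal sketch of drawing a learning set with a chosen pseudo-data ratio may help. It assumes both pools are plain Python lists of (sound, evaluation) pairs; `build_training_set` and its parameters are hypothetical names, not part of the source.

```python
import random


def build_training_set(real_pairs, pseudo_pairs, pseudo_ratio, size, seed=0):
    """Draw `size` (sound, evaluation) pairs for training the evaluation database.

    round(size * pseudo_ratio) pairs come from the pseudo sound pool and the
    rest from recorded sound; the ratio itself is a free design choice.
    """
    rng = random.Random(seed)
    n_pseudo = round(size * pseudo_ratio)
    n_real = size - n_pseudo
    # Sample without replacement from each pool, then shuffle the combined set.
    batch = rng.sample(pseudo_pairs, n_pseudo) + rng.sample(real_pairs, n_real)
    rng.shuffle(batch)
    return batch
```

With few recordings available, a high `pseudo_ratio` lets the pseudo pool make up the bulk of the set while the recorded pairs still anchor it to real sound.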

As shown in FIG. 2A, for example, the pseudo sound data and the sound data are represented as amplitude over time, like the evaluation target sound data described above. The evaluation data may indicate an evaluation result expressed as a binary value such as "OK" or "NG", or may indicate the result of evaluating features of the linked sound data. The content of the evaluation data can be set arbitrarily by a user or the like.

The learning method generates the evaluation database using, for example, machine learning with a neural network as the model. The evaluation database is generated using, for example, machine learning with a CNN (Convolutional Neural Network) or an autoencoder as the model, although any model may be used. The evaluation database stores, for example, the degree of association between the pseudo sound data and the evaluation data. The degree of association indicates how strongly the pseudo sound data and the evaluation data are connected; for example, a higher degree of association can be taken to mean a stronger connection. The degree of association may be expressed in three or more levels, such as a percentage, or in two levels.
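The source does not say how the degree of association is computed. One common way to turn a network's raw per-label scores into a percentage-style degree is a softmax, with a threshold giving the two-level variant. The sketch below assumes that approach; `association_degrees`, `binarize`, and `evaluate` are hypothetical helpers.

```python
import math


def association_degrees(scores: dict[str, float]) -> dict[str, float]:
    """Softmax over raw scores, expressed as a percentage per evaluation label."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - m) for label, v in scores.items()}
    total = sum(exps.values())
    return {label: 100.0 * e / total for label, e in exps.items()}


def binarize(degrees: dict[str, float], threshold: float = 50.0) -> dict[str, bool]:
    """Two-level degree of association: strong (True) or weak (False)."""
    return {label: d >= threshold for label, d in degrees.items()}


def evaluate(degrees: dict[str, float]) -> str:
    """Report the evaluation label with the highest degree of association."""
    return max(degrees, key=degrees.get)
```

For example, raw scores `{"OK": 2.0, "NG": 0.0}` map to roughly 88% OK and 12% NG, so `evaluate` reports "OK".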

As shown in FIG. 1C, for example, the learning data generation method according to the present embodiment acquires a sample image, generates pseudo images for the sample image, and converts the pseudo images into pseudo sound data. The method refers to a generation database to generate the pseudo images for the sample image, and can generate one or more pseudo images for a single sample image.

The learning data generation method generates the generation database (first database) using, for example, machine learning with a neural network as the model. The generation database is generated using machine learning with a GAN (Generative Adversarial Network) or an autoencoder as the model; in particular, it may be generated using machine learning with pix2pix, a type of conditional GAN, as the model.

When generating the generation database, the learning data generation method uses a plurality of pairs, each consisting of a reference image and a training image, as learning data. The reference image is obtained by extracting part of a spectrogram converted from sound data for learning, and the training image is obtained by deleting part of the reference image.

As shown in FIG. 2A, for example, the sound data for learning used in the learning data generation method is of the same kind as the evaluation target sound data described above. As shown in FIG. 2B, for example, the spectrogram represents intensity (amplitude) over a time axis and a frequency axis, and is converted from the sound data using, for example, a Fourier transform (such as a short-time Fourier transform). In the learning data generation method, the spectrogram is handled as image data; for example, one pixel corresponds to a range of 0.064 sec × 15.624 Hz.
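The pixel size follows from the STFT settings: one column spans the hop length divided by the sample rate, and one row spans the sample rate divided by the FFT length. The source does not give these settings. The sketch below assumes 16 kHz audio with a 1024-point FFT and a 1024-sample hop (non-overlapping frames), which yields 0.064 s × 15.625 Hz, within 0.001 Hz of the stated figure. `pixel_resolution` is a hypothetical helper.

```python
def pixel_resolution(sample_rate_hz: float, n_fft: int, hop: int) -> tuple[float, float]:
    """Seconds per spectrogram column and Hz per spectrogram row for an STFT."""
    seconds_per_column = hop / sample_rate_hz
    hz_per_row = sample_rate_hz / n_fft
    return seconds_per_column, hz_per_row


# Assumed settings (not stated in the source): 16 kHz audio, 1024-point FFT,
# 1024-sample hop, i.e. frames that do not overlap.
sec_per_px, hz_per_px = pixel_resolution(16_000, 1024, 1024)
print(sec_per_px, hz_per_px)  # -> 0.064 15.625
```

With a 50% overlap (hop of 512 samples) the same FFT would give 0.032 s per column, so the stated 0.064 s suggests either non-overlapping frames or a different sample rate.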

As shown in FIG. 2C, for example, the reference image is an image obtained by extracting part of the spectrogram (the part enclosed by the broken-line frame in FIG. 2B). For example, an image covering a range of 0.512 sec × 1280 Hz of the spectrogram is used as the reference image.

As shown in FIG. 2D, for example, the training image is an image obtained by deleting part of the reference image (the broken-line part in FIG. 2C). For example, an image from which a range of 0.512 sec × 125 Hz has been deleted is used as the training image.
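At the resolution given above, a 0.512 sec × 1280 Hz reference image is 8 columns by about 82 rows, and the deleted 0.512 sec × 125 Hz region is a full-width band 8 rows high. The sketch below cuts such a pair out of a (frequency, time) array. It assumes 15.625 Hz per row and represents "deletion" by setting pixels to zero; both are assumptions, and `make_reference_and_training` is a hypothetical name.

```python
import numpy as np

SEC_PER_PX = 0.064  # time span of one pixel, as stated in the text
HZ_PER_PX = 15.625  # assumed exact bin width (16 kHz / 1024); the text gives 15.624 Hz


def to_pixels(extent: float, per_px: float) -> int:
    """Convert a physical extent (seconds or Hz) into a whole number of pixels."""
    return int(round(extent / per_px))


def make_reference_and_training(spec, f0, t0, ref_sec=0.512, ref_hz=1280.0,
                                del_row=0, del_hz=125.0):
    """Cut a reference image out of a (frequency, time) spectrogram at pixel
    offset (f0, t0), then delete a full-width band starting at `del_row`."""
    rows = to_pixels(ref_hz, HZ_PER_PX)    # 1280 Hz -> 82 rows (81.92 rounded)
    cols = to_pixels(ref_sec, SEC_PER_PX)  # 0.512 s -> 8 columns
    reference = spec[f0:f0 + rows, t0:t0 + cols].copy()
    training = reference.copy()
    band = to_pixels(del_hz, HZ_PER_PX)    # 125 Hz -> 8 rows
    training[del_row:del_row + band, :] = 0.0  # zeroing stands in for "deleted"
    return reference, training
```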

Because the generation database is generated by machine learning using reference images and training images as pairs of input data, it can be used to generate pseudo data for input data (for example, a sample image). This generation of pseudo data can be realized with image completion techniques that use a model having two networks, a generator and a discriminator, such as a GAN. In other words, training the model with the reference image as the ground-truth image for the training image improves the accuracy with which pseudo images are generated.
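For reference, pix2pix, the model named in this document, is trained in its original formulation (Isola et al.) with a conditional GAN loss plus an L1 reconstruction term. Reading it in this document's terms, on the assumption that the standard objective is used unchanged: x is the training image, y is the reference image serving as ground truth, z is the generator's noise (implemented as dropout in pix2pix), and G(x, z) is the pseudo image.

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right]
  + \mathbb{E}_{x,z}\left[\log\left(1 - D\left(x, G(x, z)\right)\right)\right]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right]

G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
```

The L1 term pulls the completion toward the reference image pixel by pixel, while the adversarial term discourages blurry fills; the original paper uses λ = 100.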

An image of the same kind as the reference image or the training image is used as the sample image in the learning data generation method. That is, the sample image is either a reference image (new reference image) obtained by extracting part of a spectrogram converted from sound data acquired as a sample, or a training image (new training image) obtained by deleting part of such a reference image. In either case, a plurality of pseudo images can be generated from one sample image.
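A generator produces several different pseudo images from one sample because it consumes fresh randomness on each call. The sketch below illustrates only that one-to-many behavior. `fill_band_stub` is a hypothetical, deliberately simple stand-in for the trained generation database: it is not pix2pix. It fills the deleted band by interpolating between the rows just outside it and applies a random gain per pixel.

```python
import numpy as np


def fill_band_stub(training, band_start, band_rows, rng):
    """Hypothetical stand-in for the trained generator (NOT pix2pix).

    Fills the deleted band by linear interpolation between the rows bordering
    it, then applies a random per-pixel gain, so each call differs.
    """
    out = training.astype(float).copy()
    end = band_start + band_rows
    lo = out[band_start - 1] if band_start > 0 else out[end]
    hi = out[end] if end < out.shape[0] else lo
    for i in range(band_rows):
        a = (i + 1) / (band_rows + 1)  # interpolation weight toward the upper row
        gain = rng.uniform(0.8, 1.2, size=lo.shape)
        out[band_start + i] = ((1 - a) * lo + a * hi) * gain
    return out


def generate_pseudo_images(sample, band_start, band_rows, n, seed=0):
    """Produce n pseudo images from one sample image, drawing new randomness each time."""
    rng = np.random.default_rng(seed)
    return [fill_band_stub(sample, band_start, band_rows, rng) for _ in range(n)]
```

In a real pix2pix generator the variation comes from dropout kept active at inference time rather than from an explicit gain, but the calling pattern (one sample in, many pseudo images out) is the same.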

The pseudo sound data generated by the learning data generation method is converted from the pseudo image using, for example, an inverse Fourier transform (such as an inverse short-time Fourier transform). In this way, pseudo sound data can be obtained for sound data acquired as a sample.
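A spectrogram image holds only magnitudes, so an inverse STFT needs a phase from somewhere, and the source does not say where. The sketch below assumes the phase of the sample's own spectrogram is reused (Griffin-Lim phase estimation would be an alternative). It uses a periodic Hann window with 50% overlap and adds noise from a seeded random generator, as the sixth invention describes; the Gaussian distribution and all function names are assumptions.

```python
import numpy as np

N_FFT, HOP = 1024, 512  # assumed frame length and 50% hop


def _window(n):
    # Periodic Hann window: copies shifted by n/2 sum to exactly 1.
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def stft(x, n_fft=N_FFT, hop=HOP):
    """Complex STFT of shape (n_fft // 2 + 1, n_frames)."""
    x = np.asarray(x, dtype=float)
    pad_end = n_fft // 2 + (-len(x)) % hop  # make the padded body a multiple of hop
    x = np.pad(x, (n_fft // 2, pad_end))
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames * _window(n_fft)[:, None], axis=0)


def istft(spec, length, n_fft=N_FFT, hop=HOP):
    """Inverse STFT by overlap-add (exact for hop == n_fft // 2 with the window above)."""
    frames = np.fft.irfft(spec, n=n_fft, axis=0)
    out = np.zeros(n_fft + hop * (frames.shape[1] - 1))
    for i in range(frames.shape[1]):
        out[i * hop:i * hop + n_fft] += frames[:, i]
    return out[n_fft // 2:n_fft // 2 + length]


def pseudo_image_to_sound(pseudo_mag, phase, length, noise_std=0.0, seed=None):
    """Attach a borrowed phase to a generated magnitude image, invert it, add noise."""
    audio = istft(pseudo_mag * np.exp(1j * phase), length)
    rng = np.random.default_rng(seed)
    return audio + rng.normal(0.0, noise_std, size=audio.shape)
```

Usage: with `S = stft(sample)`, replace `np.abs(S)` by the generated pseudo image and call `pseudo_image_to_sound(pseudo, np.angle(S), len(sample), noise_std=0.01)`.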

The learning data generation method may also generate the generation database by machine learning that uses, as a pair of data, reference data consisting of the sound data for learning without conversion to an image, and training data obtained by deleting part of the reference data. In this case, for example, data obtained by extracting the amplitude over a specific period of the sound data is used as the reference data, and data obtained by deleting the amplitude over a specific period of the reference data is used as the training data. Here, the "amplitude" may be an analog value, a digital value, or an image display value. In this case, sample data of the same kind as the reference data or the training data is used instead of the sample image described above.
For example, FIG. 8 is a schematic diagram showing an example of sound data and training data according to the present embodiment. FIG. 8A shows an example of sound data for learning, and FIG. 8B shows training data obtained by deleting part of the sound data.
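For the waveform variant, the pair is built directly on sample values: the reference data is a time segment of the recording, and the training data is the same segment with the amplitude over a sub-period removed. A minimal sketch follows, assuming digital samples in a NumPy array and that deletion means setting the amplitude to zero; `make_waveform_pair` and its parameters are hypothetical.

```python
import numpy as np


def make_waveform_pair(sound, sample_rate, start_s, length_s, gap_start_s, gap_len_s):
    """Reference = a segment of the raw waveform; training = the same segment
    with the amplitude over a sub-period set to zero.

    gap_start_s is measured from the start of the extracted segment.
    """
    s0 = int(round(start_s * sample_rate))
    n = int(round(length_s * sample_rate))
    reference = np.asarray(sound[s0:s0 + n], dtype=float).copy()
    training = reference.copy()
    g0 = int(round(gap_start_s * sample_rate))
    g = int(round(gap_len_s * sample_rate))
    training[g0:g0 + g] = 0.0  # zeroing stands in for "deleting" the amplitude
    return reference, training
```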

In addition to having functions for executing the learning data generation method and the learning method described above, the evaluation device 1 may acquire, for example, at least one of a generation database, pseudo sound data, and an evaluation database generated by another terminal or the like.

(Learning data generation method)
Next, an example of the learning data generation method according to the present embodiment will be described with reference to FIG. 3. FIG. 3 is a flowchart showing an example of the learning data generation method according to the present embodiment. The following describes the operation using reference images and training images; the operation using reference data and training data is the same, so its description is omitted.

<Acquisition step S110>
First, reference information and training information are acquired (acquisition step S110). In acquisition step S110, a reference image, obtained by extracting part of a spectrogram converted from learning sound data that serves as the reference for evaluation, and a training image, obtained by deleting part of the reference image, are acquired. Besides acquiring reference information and training information generated in advance, acquisition step S110 may use an electronic device such as the evaluation device 1 that acquires sound data generated from sounds collected by a sound collection device, converts the sound data into a spectrogram (FIG. 4(a)), extracts a reference image from the spectrogram (FIG. 4(b)), and obtains a training image by deleting part of the reference image (FIG. 4(c)). In this case, the range of the spectrogram from which the reference image is taken and the range of the reference image that is deleted may be set in advance.
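The spectrogram route of step S110 can be sketched as follows. This is a minimal Python illustration under assumptions not stated in the patent: the spectrogram is the magnitude of `scipy.signal.stft` with its default Hann window, the preset ranges are hypothetical `(start, stop)` index pairs, and the "deleted part" is a band of time frames set to zero.

```python
import numpy as np
from scipy.signal import stft


def sound_to_spectrogram(sound, fs, nperseg=256):
    """Magnitude spectrogram (frequency bins x time frames) via an STFT."""
    _, _, Z = stft(sound, fs=fs, nperseg=nperseg)
    return np.abs(Z)


def make_image_pair(spec, f_range, t_range, mask_t_range):
    """Cut a reference image out of `spec` and mask part of it.

    f_range, t_range: (start, stop) bin/frame ranges of the reference image.
    mask_t_range:     (start, stop) frames, relative to the reference image,
                      that are deleted (set to zero) in the training image.
    """
    (f0, f1), (t0, t1) = f_range, t_range
    reference = spec[f0:f1, t0:t1].copy()
    m0, m1 = mask_t_range
    training = reference.copy()
    training[:, m0:m1] = 0.0
    return reference, training
```

With one second of a 440 Hz tone sampled at 16 kHz, `sound_to_spectrogram` yields 129 frequency bins, and `make_image_pair(spec, (0, 64), (0, 50), (10, 20))` returns a 64 x 50 reference image and a training image whose frames 10 to 19 are zero. Log scaling or normalization before feeding an image model would be a further, separate choice.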

<First database generation step S120>
Next, a generation database (first database) is generated (first database generation step S120). In step S120, the generation database is generated by machine learning using a reference image and a training image as a pair of input data. The input data contains multiple such reference/training pairs (for example, about 1000); for one reference image, several training images with different deleted regions may each be included as a separate pair. In step S120, the generation database is generated by machine learning using, for example, pix2pix as the model.
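The pairing scheme of step S120, in which one reference image contributes several pairs with different deleted regions, might be assembled as below. This is a hypothetical Python sketch: the deleted region is assumed to be a zeroed time band of fixed width whose start frame is drawn at random without repetition, and the `(training, reference)` order is chosen to match an image-to-image model such as pix2pix that maps input to target. Training the model itself is not shown.

```python
import numpy as np


def build_training_pairs(reference_images, masks_per_image, mask_width, rng):
    """Return (training_image, reference_image) pairs for step S120.

    Each reference image yields `masks_per_image` training images whose
    deleted time band (zeroed columns, width `mask_width`) starts at a
    different, randomly chosen frame.
    """
    pairs = []
    for ref in reference_images:
        n_starts = ref.shape[1] - mask_width + 1
        if masks_per_image > n_starts:
            raise ValueError("not enough distinct deletion positions")
        starts = rng.choice(n_starts, size=masks_per_image, replace=False)
        for s in starts:
            training = ref.copy()
            training[:, s:s + mask_width] = 0.0
            pairs.append((training, ref))
    return pairs
```

With this scheme, for instance, 250 reference images with 4 masks each would give the roughly 1000 pairs mentioned above.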

<Generation step S130>
Next, a pseudo image is generated (generation step S130). In step S130, the generation database is referenced to generate a pseudo image based on a sample image (a new reference image or a new training image). A reference image or training image used in step S120 may serve as the sample image, as may one that was not used in step S120. In either case, the sample image can be obtained in the same way as the reference image or training image acquired in acquisition step S110.

In generation step S130, a plurality of pseudo images may, for example, be generated from one sample image. The pseudo images are then each converted into different pseudo sound data, so multiple pieces of pseudo sound data can be generated from one piece of sound data. This makes it easy to secure the learning data needed for machine learning even when little learning data is available.
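Drawing several pseudo images from one sample image only requires that the generator be stochastic and receive a fresh random stream per call. The Python sketch below assumes a hypothetical `generator(image, rng)` callable standing in for the trained generation database (in pix2pix, randomness typically comes from dropout); `toy_generator` is a placeholder for testing only and is not a model of the patent's method.

```python
import numpy as np


def generate_pseudo_images(sample, generator, k, seed=0):
    """Draw k pseudo images for one sample image.

    `generator(image, rng)` is a hypothetical stand-in for inference with the
    trained generation database; it must take its randomness from `rng` so
    that each call can return a different image.
    """
    child_seeds = np.random.SeedSequence(seed).spawn(k)  # independent streams
    return [generator(sample, np.random.default_rng(s)) for s in child_seeds]


def toy_generator(image, rng):
    """Placeholder only: fill the zeroed (deleted) cells with random values."""
    out = np.array(image, dtype=float)
    hole = out == 0
    out[hole] = rng.random(int(hole.sum()))
    return out
```

Each returned image would then go through the conversion step separately, which is what turns one recording into several pieces of pseudo sound data.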

<Conversion step S140>
Next, the pseudo image is converted into pseudo sound data (conversion step S140). In step S140, noise generated from random numbers may, for example, be added to the pseudo sound data. This brings the pseudo sound data closer to sound that is actually recorded, improving its quality as learning data.
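Claim 6 states that the pseudo image is converted into sound by an inverse short-time Fourier transform and that random-number noise is then added. A magnitude-only spectrogram image carries no phase, so some phase must be supplied before inverting. The Python sketch below assumes the Griffin-Lim iteration for that purpose (a technique swapped in here, not named in the patent) and white Gaussian noise at a chosen signal-to-noise ratio (also an assumption); the STFT parameters must match those used to make the image.

```python
import numpy as np
from scipy.signal import istft, stft


def _fit(Z, shape):
    """Crop or zero-pad a complex STFT so that it matches the target shape."""
    out = np.zeros(shape, dtype=complex)
    f, t = min(shape[0], Z.shape[0]), min(shape[1], Z.shape[1])
    out[:f, :t] = Z[:f, :t]
    return out


def magnitude_to_sound(mag, fs, nperseg=256, n_iter=32, seed=0):
    """Griffin-Lim: estimate a phase for `mag`, then invert with an ISTFT."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
        _, _, Z = stft(x, fs=fs, nperseg=nperseg)
        phase = np.exp(1j * np.angle(_fit(Z, mag.shape)))  # keep phase only
    _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
    return x


def add_random_noise(x, snr_db, seed=0):
    """Add white Gaussian noise so that the result has roughly `snr_db` SNR."""
    rng = np.random.default_rng(seed)
    noise_power = np.mean(x ** 2) / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
```

More iterations generally give a more consistent phase at higher cost; the noise level (here 20 dB SNR in the usage below) is a tunable choice, since the patent does not specify one.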

Performing the steps above completes the learning data generation method of the present embodiment. When reference data and training data are used instead of images, conversion step S140 may be omitted.

(Learning method)
Next, an example of the learning method in the present embodiment will be described with reference to FIG. 5, a flowchart illustrating the method.

<Second database generation step S210>
In the learning method of the present embodiment, an evaluation database (second database) is generated (second database generation step S210). In step S210, the evaluation database is generated by machine learning using pseudo sound data and the evaluation data associated with it as a pair of input data. The input data contains multiple pseudo sound data/evaluation data pairs, and may also contain multiple pairs of evaluation data and sound data generated from sounds collected by a sound collection device. In step S210, the evaluation database is generated by machine learning using, for example, a CNN (convolutional neural network) as the model.
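Before a CNN can be trained in step S210, the pseudo pairs and any recorded pairs must be stacked into arrays. The Python sketch below shows only that assembly step, under assumptions: every sound clip has the same length, the evaluation data are simple labels (the integers in the usage below are hypothetical), and the model itself is omitted.

```python
import numpy as np


def build_evaluation_dataset(pseudo_pairs, real_pairs=(), seed=0):
    """Stack (sound, evaluation) pairs into shuffled arrays for step S210.

    pseudo_pairs: (pseudo sound data, evaluation data) pairs from step S140.
    real_pairs:   optional (recorded sound data, evaluation data) pairs.
    All sound arrays are assumed to have the same length.
    Returns X (n, length), y (n,), and a boolean mask marking pseudo samples.
    """
    pseudo, real = list(pseudo_pairs), list(real_pairs)
    pairs = pseudo + real
    if not pairs:
        raise ValueError("no training pairs")
    X = np.stack([np.asarray(sound, dtype=np.float32) for sound, _ in pairs])
    y = np.asarray([label for _, label in pairs])
    is_pseudo = np.array([True] * len(pseudo) + [False] * len(real))
    order = np.random.default_rng(seed).permutation(len(pairs))  # shuffle once
    return X[order], y[order], is_pseudo[order]
```

Keeping the `is_pseudo` mask makes it possible, for instance, to validate the trained model on recorded sound only; that is a design suggestion, not something the patent states.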

Performing the step above completes the learning method of the present embodiment.

(Evaluation device 1)
Next, an example of the evaluation device 1 in the present embodiment will be described with reference to FIG. 6. FIG. 6(a) is a schematic diagram showing an example of the configuration of the evaluation device 1, and FIG. 6(b) is a schematic diagram showing an example of its functions.

As shown in FIG. 6(a), the evaluation device 1 includes, for example, a housing 10, a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, a storage unit 104, and I/Fs 105 to 107. The components 101 to 107 are connected by an internal bus 110.

The CPU 101 controls the entire evaluation device 1. The ROM 102 stores the operation code of the CPU 101. The RAM 103 is a work area used while the CPU 101 operates. The storage unit 104 stores various information such as sound data; a data storage device such as a hard disk drive (HDD), a solid state drive (SSD), or a floppy disk is used for it. The evaluation device 1 may also include a GPU (Graphics Processing Unit), not shown, which enables faster arithmetic processing.

The I/F 105 is an interface for exchanging various information with other terminals, dedicated servers, and the like via a public communication network 2 such as the Internet. The I/F 106 is an interface for exchanging information with the input part 108. A keyboard, for example, is used as the input part 108, and a user of the evaluation device 1 enters various information or control commands for the evaluation device 1 through it. The I/F 107 is an interface for exchanging various information with the output part 109, which outputs the information stored in the storage unit 104, the processing status of the evaluation device 1, and the like. A display, which may be a touch panel, is used as the output part 109.

FIG. 6(b) is a schematic diagram showing an example of the functions of the evaluation device 1. The evaluation device 1 includes an acquisition unit 11, a database generation unit 12, a pseudo image generation unit 13, a conversion unit 14, an evaluation unit 15, an output unit 16, and a storage unit 17. These functions are realized by the CPU 101 executing a program stored in the storage unit 104 or the like, using the RAM 103 as a work area, and may, for example, be controlled by artificial intelligence.

<Acquisition unit 11>
The acquisition unit 11 acquires the evaluation target sound data. It may also acquire, for example, sound data (pseudo sound data) and evaluation data for generating the evaluation database, as well as sound data, reference information, training information, sample images, and the like for generating pseudo sound data.

The acquisition unit 11 may, for example, include a first acquisition unit and a second acquisition unit that acquire different information. In this case, the first acquisition unit acquires the reference image and the training image, and the second acquisition unit acquires the evaluation target sound data.

<Database generation unit 12>
The database generation unit 12 generates at least one of the generation database and the evaluation database, using the methods described above. When both databases are generated by another terminal or the like, the evaluation device 1 need not include the database generation unit 12.

The database generation unit 12 may, for example, include a first database generation unit and a second database generation unit that generate different databases. In this case, the first database generation unit generates the generation database, and the second database generation unit generates the evaluation database.

<Pseudo image generation unit 13>
The pseudo image generation unit 13 refers to the generation database and generates a pseudo image based on a sample image (a new reference image or a new training image), using the method described above. When the evaluation database is generated by another terminal or the like, the evaluation device 1 need not include the pseudo image generation unit 13.

<Conversion unit 14>
The conversion unit 14 converts the pseudo image into pseudo sound data and, for example, adds noise generated from random numbers to the pseudo sound data, using the method described above. When the evaluation database is generated by another terminal or the like, the evaluation device 1 need not include the conversion unit 14.

<Evaluation unit 15>
The evaluation unit 15 refers to the evaluation database and generates an evaluation result based on the evaluation target sound data. The method of generating the evaluation result is as described above.

<Output unit 16>
The output unit 16 outputs the evaluation result and the like to the output part 109 and other destinations. For example, it transmits the evaluation result to another terminal via the public communication network 2.

<Storage unit 17>
The storage unit 17 stores the various information acquired by the acquisition unit 11, the evaluation results generated by the evaluation unit 15, and the like in the storage unit 104, and retrieves that information from the storage unit 104 as needed.
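The division into functional units can be pictured as plain function composition. The Python sketch below is a hypothetical wiring, not the patent's software: each unit is an injected callable, storage is a dictionary standing in for the storage units 17 and 104, and units 12 to 14 are left out because they are only needed when learning is carried out on the device itself.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class EvaluationDevice:
    """Hypothetical wiring of the functional units in FIG. 6(b)."""

    acquire: Callable[[], Any]       # acquisition unit 11: returns sound data
    evaluate: Callable[[Any], Any]   # evaluation unit 15: queries the evaluation database
    output: Callable[[Any], None]    # output unit 16: display, network, etc.
    stored: Dict[str, List[Any]] = field(default_factory=dict)  # stands in for units 17/104

    def store(self, key: str, value: Any) -> None:
        """Storage unit 17: keep a value under `key` for later retrieval."""
        self.stored.setdefault(key, []).append(value)

    def run_once(self) -> Any:
        """Acquire one piece of sound data, evaluate it, store and output the result."""
        sound = self.acquire()
        self.store("sound", sound)
        result = self.evaluate(sound)
        self.store("result", result)
        self.output(result)
        return result
```

Injecting the units as callables keeps the evaluation model swappable, for example when the evaluation database is produced on another terminal.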

(Operation of the evaluation device 1)
Next, an example of the operation of the evaluation device 1 in the present embodiment will be described with reference to FIG. 7, a flowchart illustrating that operation.

<Evaluation data acquisition step S310>
First, evaluation target sound data is acquired (evaluation data acquisition step S310). The acquisition unit 11 acquires, as evaluation target sound data, for example sound data generated from sounds collected by a sound collection device, in the same format as the learning sound data described above. The acquisition unit 11 saves the acquired evaluation target sound data in the storage unit 104, for example via the storage unit 17.

<Evaluation result generation step S320>
Next, an evaluation result based on the evaluation target sound data is generated (evaluation result generation step S320). The evaluation unit 15 refers to the evaluation database and generates the evaluation result. The evaluation unit 15 may generate one evaluation result for each piece of evaluation target sound data, or one evaluation result covering several pieces together.
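The two output modes of step S320 (one result per clip, or one result for several clips) can be sketched as follows. This is a minimal Python illustration: `score_fn` is a hypothetical stand-in for querying the evaluation database, and combining clips by the mean score is an assumption, since the patent does not say how several clips are merged into one result.

```python
from typing import Callable, List, Sequence, Tuple


def evaluate_clips(clips: Sequence, score_fn: Callable[[object], float]) -> Tuple[List[float], float]:
    """Score each clip, then combine the scores into one overall result.

    score_fn stands in for a query against the evaluation database
    (hypothetical). Combining by the mean is an assumption.
    """
    if len(clips) == 0:
        raise ValueError("need at least one clip")
    per_clip = [float(score_fn(c)) for c in clips]  # one result per clip
    overall = sum(per_clip) / len(per_clip)         # one result for all clips
    return per_clip, overall
```

For example, with three clips scoring 0.25, 0.75, and 0.5, the per-clip results are those three scores and the combined result is 0.5.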

Performing the steps above completes the operation of the evaluation device 1 in the present embodiment. When the learning data generation method or the learning method is carried out on the evaluation device 1, it can be performed using the database generation unit 12 and the other units described above.

According to the present embodiment, generation step S130 generates a pseudo image based on a sample image (a new reference image or a new training image), and conversion step S140 converts the pseudo image into pseudo sound data. Pseudo sound data can therefore serve as learning data even when little sound data is available, which makes learning data for machine learning easy to obtain and reduces the time and cost of acquiring it.

Also according to the present embodiment, the evaluation unit 15 generates an evaluation result based on the evaluation target sound data by referring to the evaluation database (second database). Because that database is generated by machine learning on pseudo sound data, the accuracy of the evaluation result can be improved even when little sound data is available as learning data.

Also according to the present embodiment, the pseudo image generation unit 13 generates a pseudo image based on a new reference image or a new training image, and the conversion unit 14 converts the pseudo image into pseudo sound data. Pseudo sound data can therefore serve as learning data even when little sound data is available, which makes learning data for machine learning easy to obtain and reduces the time and cost of acquiring it.

Also according to the present embodiment, the evaluation unit 15 generates an evaluation result based on the evaluation target sound data by referring to the evaluation database. Because that database is generated by machine learning on pseudo sound data, the accuracy of the evaluation result can be improved even when little sound data is available as learning data.

Also according to the present embodiment, a plurality of pseudo images are each converted into different pseudo sound data, so multiple pieces of pseudo sound data can be generated from one piece of sound data. This makes it easy to secure the learning data needed for machine learning even when little learning data is available.

Also according to the present embodiment, first database generation step S120 generates the generation database based on a GAN (generative adversarial network). Pseudo data can therefore be generated more easily than with other learning models.
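For reference, pix2pix, the model named for step S120, is a conditional GAN. Its published objective combines an adversarial term with an L1 reconstruction term, as written below. Mapping the symbols onto this patent is an assumption: x would be the input (presumably the training image with its deleted region), y the target (the reference image), z the generator's noise (supplied through dropout in pix2pix), G the generator, D the discriminator, and λ a weighting constant.

```latex
G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)

\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right]
  + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_{1}\right]
```

The L1 term pushes the generated image toward the reference image pixel by pixel, while the adversarial term encourages realistic local texture, which suits filling in a deleted region of a spectrogram.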

Also according to the present embodiment, conversion step S140 may add noise generated from random numbers to the pseudo sound data. This brings the pseudo sound data closer to sound that is actually recorded, improving its quality as learning data.

Also according to the present embodiment, generation step S130 generates pseudo data based on sample data (new reference data or new training data). Pseudo data can therefore serve as learning data even when little sound data is available, which makes learning data for machine learning easy to obtain and reduces the time and cost of acquiring it.

Although an embodiment of the present invention has been described, it is presented as an example and is not intended to limit the scope of the invention. This novel embodiment can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. Such embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

1: Evaluation device
2: Public communication network
10: Housing
11: Acquisition unit
12: Database generation unit
13: Pseudo image generation unit
14: Conversion unit
15: Evaluation unit
16: Output unit
17: Storage unit
101: CPU
102: ROM
103: RAM
104: Storage unit
105: I/F
106: I/F
107: I/F
108: Input part
109: Output part
110: Internal bus
S110: Acquisition step
S120: First database generation step
S130: Generation step
S140: Conversion step
S210: Second database generation step
S310: Evaluation data acquisition step
S320: Evaluation result generation step

Claims

A learning data generation method for artificially generating sound data used as learning data for machine learning, the method comprising:
an acquisition step of acquiring a reference image obtained by extracting part of a spectrogram converted from learning sound data, and a training image obtained by deleting part of the reference image;
a first database generation step of generating a first database by machine learning using the reference image and the training image as a pair of input data;
a generation step of generating, with reference to the first database, a pseudo image based on a new reference image or a new training image; and
a conversion step of converting the pseudo image into pseudo sound data.

The learning data generation method according to claim 1, wherein the generation step generates a plurality of the pseudo images for one new reference image or one new training image, and the plurality of pseudo images are each converted into different pseudo sound data.

The learning data generation method according to claim 1, wherein the first database generation step generates the first database based on machine learning.

The learning data generation method according to claim 1 or 2, wherein the first database generation step generates the first database based on generative machine learning.

The learning data generation method according to any one of claims 1 to 4, wherein the sound data includes a connector sound and a surrounding environment sound.

The learning data generation method according to any one of claims 1 to 5, wherein the conversion step adds noise generated from random numbers to the pseudo sound data converted from the pseudo image by an inverse short-time Fourier transform.

A learning method for performing machine learning using, as learning data, the pseudo sound data generated by the learning data generation method according to any one of claims 1 to 6, the learning method comprising:
a second database generation step of generating a second database by machine learning using the pseudo sound data and evaluation data associated with the pseudo sound data as a pair of input data.

An evaluation device that evaluates evaluation target sound data using the second database generated by the learning method according to claim 7, the evaluation device comprising:
an acquisition unit that acquires the evaluation target sound data; and
an evaluation unit that generates an evaluation result based on the evaluation target sound data with reference to the second database.

An evaluation device for evaluating evaluation target sound data, comprising:
a first acquisition unit that acquires a reference image obtained by extracting part of a spectrogram converted from learning sound data, and a training image obtained by deleting part of the reference image;
a first database generation unit that generates a first database by machine learning using the reference image and the training image as a pair of input data;
a pseudo image generation unit that generates, with reference to the first database, a pseudo image based on a new reference image or a new training image;
a conversion unit that converts the pseudo image into pseudo sound data;
a second database generation unit that generates a second database by machine learning using the pseudo sound data and evaluation data associated with the pseudo sound data as a pair of input data;
a second acquisition unit that acquires the evaluation target sound data; and
an evaluation unit that generates an evaluation result based on the evaluation target sound data with reference to the second database.

A learning data generation method for artificially generating sound data used as learning data for machine learning, the method comprising:
an acquisition step of acquiring reference data based on learning sound data, and training data obtained by deleting part of the reference data;
a first database generation step of generating a first database by machine learning using the reference data and the training data as a pair of input data; and
a generation step of generating, with reference to the first database, pseudo data based on new reference data or new training data.

The learning data generation method according to claim 10, wherein the generation step generates a plurality of pieces of the pseudo data for one piece of new reference data or new training data.

A learning method for performing machine learning using, as learning data, the pseudo data generated by the learning data generation method according to claim 10 or 11, the learning method comprising:
a second database generation step of generating a second database by machine learning using the pseudo data and evaluation data associated with the pseudo data as a pair of input data.

An evaluation device that evaluates evaluation target sound data using the second database generated by the learning method according to claim 12, the evaluation device comprising:
an acquisition unit that acquires the evaluation target sound data; and
an evaluation unit that generates an evaluation result based on the evaluation target sound data with reference to the second database.