JP7252156B2

JP7252156B2 - LEARNING DATA GENERATION DEVICE AND LEARNING DATA GENERATION METHOD

Info

Publication number: JP7252156B2
Application number: JP2020033344A
Authority: JP
Inventors: 貫太郎三宅; 誠由高瀬; 康充野中; 伊織山崎
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2020-02-28
Filing date: 2020-02-28
Publication date: 2023-04-04
Anticipated expiration: 2040-02-28
Also published as: JP2021135896A

Description

The present invention relates to a learning data generation device and a learning data generation method.

To implement a machine learning system, effective learning data must be prepared in order to ensure the accuracy of the machine learning model.

As a technique for generating learning data, Patent Literature 1, for example, describes a device that generates learning data for training a determination device that uses a neural network. The learning data generation device changes the data values of collected time-series data, changes the time interval between the data points of the time-series data, adds distortion to the time-series data, and adds noise to the time-series data.

Patent Literature 2 describes a technique that, when only a small amount of learning data is available, processes the learning data to generate data that contributes to improved learning. Specifically, a neural network learning device extracts features from the learning data using the neural network being trained, generates adversarial features from the extracted features using the neural network being trained, calculates a recognition result of the neural network using the learning data and the adversarial features, and trains the neural network so that the recognition result approaches the desired output.

Patent Literature 3 describes an anomaly detection system configured to promptly detect an abnormal state of a monitored object. The anomaly detection system collects observation data on the monitored object and stores it as time-series observation data, classifies the observation data as either training data or verification data, identifies the model parameters of a linear state-space model of the monitored object based on the training data, calculates an estimate of the probability distribution of the state variables of the monitored object using the model parameters and the verification data as input, and calculates the degree of abnormality of the monitored object based on the estimate. Each time observation data is collected, the system adds the newly collected observation data to the time-series observation data and, if the number of data items in the time-series observation data exceeds a threshold, discards the oldest collected observation data.

Patent Literature 1: JP 2019-87106 A
Patent Literature 2: WO 2018/167900
Patent Literature 3: JP 2019-191836 A

Non-Patent Literature 1: R. B. Cleveland and 3 others, "STL: a seasonal-trend decomposition procedure based on loess", [online], 1990, Journal of Official Statistics, [retrieved January 31, 2020], Internet <URL:https://www.wessa.net/download/stl.pdf>

To implement a machine learning system that performs inference processing such as predictive diagnosis or anomaly detection based on time-series data, the accuracy of the machine learning model that performs the inference processing must be ensured, which requires preparing effective learning data efficiently. In addition, to ensure the accuracy of the machine learning model, time-series data covering the period required for that purpose must be prepared as learning data.

However, neither Patent Literature 1 nor Patent Literature 2 discloses any technique for generating time-series data covering a required period. The technique described in Patent Literature 3 requires collecting observation data on the monitored object and therefore cannot handle cases where no observation data is available, for example when a machine learning system is first introduced.

An object of the present invention is to provide a learning data generation device and a learning data generation method capable of efficiently providing learning data with appropriate content for a required period.

One aspect of the present invention for achieving the above object is a learning data generation device that is configured using an information processing device and generates learning data used for training a machine learning model, the device comprising: an artificial data generation unit that generates artificial data, which is time-series data covering a period corresponding to a requested period, by concatenating a plurality of pieces of replicated data, each a copy of source data that is time-series data for a predetermined number of cycles, and by adding noise to each piece of replicated data; and a learning data generation unit that generates learning data using the artificial data.

Other problems disclosed in the present application and their solutions will become apparent from the description of embodiments and the drawings.

According to the present invention, learning data with appropriate content for a required period can be provided efficiently.

FIG. 1 is a diagram showing a schematic configuration of a machine learning system.
FIG. 2 is an example of the configuration of an information processing device.
FIG. 3 is a diagram showing the main functions of the learning data generation device.
FIG. 4 is a flowchart illustrating the learning data generation processing.
FIG. 5 is a flowchart illustrating details of the artificial data generation processing.
FIG. 6 is a diagram schematically showing data generated in the course of the artificial data generation processing.
FIG. 7 is an example of source data.
FIG. 8 is an example of artificial data.
FIG. 9 is a flowchart illustrating details of the learning data period setting processing.
FIG. 10 is an example of observation data.
FIG. 11 is a flowchart illustrating details of the learning data generation processing.
FIG. 12 is an example of learning data.
FIG. 13 is a flowchart illustrating the artificial data generation processing in the second embodiment.
FIG. 14 is a diagram schematically showing data generated in the course of the artificial data generation processing in the second embodiment.
FIG. 15 is an example of source data.
FIG. 16 is an example of intermediate data.
FIG. 17 is an example of replication source data.
FIG. 18 is an example of artificial data.

An embodiment of the present invention is described below with reference to the drawings. In the following description, components having the same or similar functions may be given the same reference numerals, and redundant description may be omitted. The letter "S" prefixed to a reference numeral denotes a processing step. The term "learning data" used below is synonymous with "training data." Learning data used for supervised machine learning includes label information, but for simplicity this embodiment omits description and examples of labels. A period may be specified by date and time, by date only, or by time only.

[First embodiment]
FIG. 1 shows a schematic configuration of an information processing system (hereinafter, "machine learning system 1") to which the learning data generation device 100 described as the first embodiment is applied. As shown in the figure, the machine learning system 1 includes an inference device 2 and a learning data generation device 100.

The inference device 2 has two functions: a learning processing unit 21 that trains a machine learning model 23 using learning data 114, which is time-series data, and an inference processing unit 22 that performs inference processing using the machine learning model 23. The inference processing unit 22 performs inference processing by inputting observation data 113, which is time-series data, into the machine learning model 23, and outputs the result as an inference result 7. The machine learning model 23 performs, for example, inference processing for predictive diagnosis, anomaly detection, or the like based on time-series data.

The learning data generation device 100 generates the learning data 114 based on source data 111 and the observation data 113, both of which are time-series data. The generated learning data 114 is input to the inference device 2 via communication or a recording medium.

FIG. 2 shows an example of an information processing device 10 used to configure the inference device 2 and the learning data generation device 100. As shown in the figure, the example information processing device 10 includes a processor 11, a main storage device 12, an auxiliary storage device 13, an input device 14, an output device 15, and a communication device 16. These are communicably connected via a communication means such as a bus.

The information processing device 10 may be realized using virtual information processing resources provided by means of virtualization technology, process space separation technology, or the like, such as a virtual server provided by a cloud system. All or part of the functions of the information processing device 10 may be realized by, for example, a service that a cloud system provides via an API (Application Programming Interface) or the like. The learning data generation device 100 may also be configured using a plurality of communicably connected information processing devices 10. Software such as an operating system, a file system, and a DBMS (DataBase Management System) (relational database, NoSQL, etc.) may be installed on the information processing device 10.

The processor 11 is configured using, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), an AI (Artificial Intelligence) chip, an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or the like.

The main storage device 12 is a device that stores programs and data, for example a ROM (Read Only Memory), a RAM (Random Access Memory), or a nonvolatile memory (NVRAM (Non Volatile RAM)).

The auxiliary storage device 13 is, for example, an SSD (Solid State Drive), a hard disk drive, an optical storage device (CD (Compact Disc), DVD (Digital Versatile Disc), etc.), a storage system, a reader/writer for recording media such as IC cards, SD cards, and optical recording media, or a storage area of a virtual server. Programs and data can be read into the auxiliary storage device 13 via a recording-medium reading device or the communication device 16. Programs and data stored in the auxiliary storage device 13 are read into the main storage device 12 as needed.

The input device 14 is an interface that accepts input from outside, for example a keyboard, a mouse, a touch panel, a card reader, or a voice input device. The output device 15 is an interface that outputs various kinds of information such as processing progress and processing results.

The output device 15 is, for example, a display device that visualizes the above information (liquid crystal monitor, LCD (Liquid Crystal Display), projector, etc.), a device that renders the above information as sound (an audio output device such as a speaker), or a device that renders the above information as text (a printing device, etc.).

The input device 14 and the output device 15 constitute a user interface. The information processing device 10 may also be configured to exchange information with other devices (smartphones, tablets, notebook computers, various portable information terminals, etc.) via the communication device 16.

The communication device 16 realizes communication with other devices. It is a wireless or wired communication interface that realizes communication with other devices via a communication network, for example a NIC (Network Interface Card), a wireless communication module, a USB (Universal Serial Bus) module, or a serial communication module. Next, the functions of each device are described.

FIG. 3 shows the main functions of the learning data generation device 100. As shown in the figure, the learning data generation device 100 includes a storage unit 110, an observation data acquisition unit 120, a source data acquisition unit 130, an artificial data generation unit 140, a learning data period setting unit 150, a learning data generation unit 160, and a learning data output unit 170. These functions are realized by the processor 11 of the information processing device 10 constituting the learning data generation device 100 reading and executing programs stored in the main storage device 12 of the information processing device 10, or by hardware (FPGA, ASIC, AI chip, etc.) of the information processing device 10.

Of the above functions, the storage unit 110 stores and manages the source data 111, artificial data 112, the observation data 113, and the learning data 114. The storage unit 110 stores and manages each of these as, for example, a database table provided by a DBMS or a file provided by a file system.

The source data 111 is data used to generate the artificial data 112. The artificial data 112 is data used to generate the learning data 114. The observation data 113 is, for example, data input to the machine learning model 23 after the machine learning system 1 has started production operation. The learning data 114 is data generated based on the artificial data 112 and the observation data 113, and is used for learning (training) of the machine learning model 23.

Of the functions shown in FIG. 3, the observation data acquisition unit 120 acquires the observation data 113 from the inference device 2 via communication or a recording medium. The storage unit 110 stores the observation data 113 acquired by the observation data acquisition unit 120.

The source data acquisition unit 130 acquires or generates the source data 111. For example, the source data acquisition unit 130 acquires the source data 111 from the user via the user interface, or generates the source data 111 based on the observation data 113. The user may also generate the source data 111 by editing the observation data 113 via the user interface. The storage unit 110 stores the source data 111 acquired or generated by the source data acquisition unit 130.

The artificial data generation unit 140 generates the artificial data 112 based on the source data 111. The storage unit 110 stores the artificial data 112 generated by the artificial data generation unit 140.

The learning data period setting unit 150 performs processing related to setting the period of the learning data 114 (from the start point to the end point of the learning data; hereinafter, "learning data period"). For example, the learning data period setting unit 150 accepts information on the setting of the learning data period from the user via the user interface.

The learning data generation unit 160 generates the learning data 114 based on the artificial data 112 and the observation data 113.

The learning data output unit 170 outputs the learning data 114 generated by the learning data generation unit 160. The output learning data 114 is input to the inference device 2 via communication or a recording medium.

FIG. 4 is a sequence diagram illustrating the processing that the learning data generation device 100 performs when generating the learning data 114 (hereinafter, "learning data generation processing S400"). The learning data generation processing S400 is described below with reference to the figure. At the start of the processing shown in the figure, the storage unit 110 is assumed to already store the observation data 113 acquired by the observation data acquisition unit 120 and the source data 111 acquired or generated by the source data acquisition unit 130.

As shown in the figure, the artificial data generation unit 140 first reads the source data 111 stored in the storage unit 110 (S411).

Next, the artificial data generation unit 140 accepts, via the user interface, the length of the period of the learning data 114 that the user wants to generate (hereinafter, "requested period"), the number of cycles contained in the source data 111 (hereinafter, "number of cycles"), and the variance σ^2 used to generate the noise added to the artificial data 112 (S412).

Next, the artificial data generation unit 140 performs processing that generates the artificial data 112 based on the source data 111 it has read and on the accepted requested period, number of cycles, and variance σ^2 (hereinafter, "artificial data generation processing S413") (S413).

Next, the learning data period setting unit 150 reads the observation data 113 stored in the storage unit 110 (S421).

Next, the learning data period setting unit 150 acquires the period of the source data 111 read by the artificial data generation unit 140 in S411, the period of the artificial data 112 generated by the artificial data generation unit 140 in S413, and the period of the observation data 113 read in S421 (S422).

Next, the learning data period setting unit 150 acquires the requested period accepted by the artificial data generation unit 140 in S412 (S423).

Next, the learning data period setting unit 150 performs processing that sets the learning data period based on the periods acquired in S422 and the requested period acquired in S423 (hereinafter, "learning data period setting processing S424") (S424).

Next, the learning data generation unit 160 generates the learning data 114 for the learning data period set by the learning data period setting processing S424, based on the source data 111 read by the artificial data generation unit 140 in S411, the artificial data 112 generated by the artificial data generation unit 140 in S413, and the observation data 113 read by the learning data period setting unit 150 in S421 (S431).

The learning data output unit 170 then outputs the generated learning data 114. The output learning data 114 is transmitted (provided) to the learning processing unit 21 of the inference device 2 via communication or a recording medium.

FIG. 5 is a flowchart detailing the artificial data generation processing S413 shown in FIG. 4. FIG. 6 schematically shows the data generated in the course of the artificial data generation processing S413. The artificial data generation unit 140 generates the artificial data 112 by replicating the source data 111 for the number of cycles corresponding to the requested period accepted in S412, replacing the dates and times of the source data 111 with appropriate dates and times, and adding noise to the observed values. The artificial data generation processing S413 is described below with reference to FIGS. 5 and 6.

FIG. 7 shows an example of the source data 111. The artificial data generation processing S413 is described below using the source data 111 shown in the figure as an example. As shown in the figure, the example source data 111 contains a plurality of entries (records), each having the items date and time 701 and observed value 702.

Of the above items, date and time 701 holds the date and time at which the value of observed value 702 was acquired. The value of date and time 701 is also used as an identifier that uniquely identifies each entry. Observed value 702 holds an observed value. Time-series data may contain categorical variable information, but unless otherwise noted, observed values are assumed to be quantitative variable information. The observed value 702 is, for example, a value obtained directly from a sensor device or the like (raw data), or a value obtained by processing values from a plurality of observation targets (arithmetic operations, aggregation, statistical processing, etc.). Such a value is, for example, the traffic volume or the operating rate when the observation target is an information communication system. It may also be, for example, a "total traffic volume" obtained by summing the values of two observation targets, "upstream traffic volume" and "downstream traffic volume." It may also be calculated from an observed value at one point in time and an observed value at another point in time, for example the difference between the previous traffic volume and the current traffic volume (the change in traffic volume over time).
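The derived observed values described above can be illustrated with a short Python sketch. The `Entry` record, its field names, and the sample traffic figures are hypothetical and chosen only for illustration; the description itself specifies only that each entry has a date and time and an observed value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Entry:
    """One record of time-series data (field names are illustrative)."""
    date_time: datetime  # corresponds to date and time 701
    value: float         # corresponds to observed value 702


def total_traffic(upstream, downstream):
    """Sum the values of two observation targets point by point."""
    return [u + d for u, d in zip(upstream, downstream)]


def time_change(values):
    """Difference between each value and the one recorded just before it."""
    return [cur - prev for prev, cur in zip(values, values[1:])]


# Hypothetical sample values recorded at 10-minute intervals.
start = datetime(2019, 11, 15, 0, 0, 0)
upstream = [10.0, 12.0, 9.0]
downstream = [30.0, 28.0, 35.0]

totals = total_traffic(upstream, downstream)
entries = [Entry(start + i * timedelta(minutes=10), v) for i, v in enumerate(totals)]
changes = time_change(totals)
```

Either `totals` or `changes` could serve as the observed value of an entry; which one suits a given model depends on the inference task.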

The example source data 111 consists of information recorded at 10-minute intervals from 00:00:00 on November 15, 2019 to 00:00:00 on November 22, 2019, and is the one cycle of data shown in FIG. 6(A). In the following description, the cycle of the source data 111 accepted in S412 is assumed to be one week, with 1008 entries per cycle. A requested period of 28 weeks is also assumed to have been accepted in S412.
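The figure of 1008 entries per cycle follows from the one-week cycle and the 10-minute interval (7 days × 24 hours × 6 entries per hour). A minimal Python check is sketched below; the function name is hypothetical, and the sketch assumes the cycle is a half-open interval, so that the entry at 00:00:00 on November 22 is counted as the start of the next cycle.

```python
from datetime import datetime, timedelta


def entries_per_cycle(start, end, interval):
    """Number of entries in the half-open interval [start, end)."""
    return (end - start) // interval


count = entries_per_cycle(
    datetime(2019, 11, 15, 0, 0, 0),
    datetime(2019, 11, 22, 0, 0, 0),
    timedelta(minutes=10),
)
```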

As shown in FIG. 5, the artificial data generation unit 140 first obtains the smallest multiple of the one-cycle period of the source data 111 that is at least as long as the requested period accepted in S412 (hereinafter, "minimum number of cycles") (S501).

Next, the artificial data generation unit 140 subtracts the number of cycles contained in the source data 111 from the minimum number of cycles, divides the result by the number of cycles contained in the source data 111, rounds the quotient up to the nearest integer, and uses that value as the number of replications (S502). The number of cycles contained in the source data 111 is subtracted from the minimum number of cycles obtained in S501 so that the source data 111 itself, which is the replication source, is not counted among the replicas. Because the example source data 111 contains one cycle and the requested period is 28 weeks, the number of replications in this example is 27. The number of replications may also be obtained in other ways, for example by accepting a value specified by the user via the user interface.
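The calculations in S501 and S502 can be sketched in Python as follows. The function names are hypothetical, and clamping the result at zero when the source data already covers the requested period is an assumption that the description does not address.

```python
from datetime import timedelta


def minimum_cycle_count(requested_period, cycle_period):
    """S501: smallest number of whole cycles covering the requested period."""
    whole, remainder = divmod(requested_period, cycle_period)
    return whole + (1 if remainder else 0)


def replication_count(min_cycles, source_cycles):
    """S502: ceil((min_cycles - source_cycles) / source_cycles)."""
    remaining = min_cycles - source_cycles
    # Ceiling division on integers; never below zero (assumption).
    return max(0, -(-remaining // source_cycles))


# Values from the example: one-week cycle, 28-week requested period,
# source data containing a single cycle.
min_cycles = minimum_cycle_count(timedelta(weeks=28), timedelta(weeks=1))
copies = replication_count(min_cycles, source_cycles=1)
```

With a requested period that is not a whole number of weeks, such as 190 days, `minimum_cycle_count` rounds up to the next whole cycle.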

Next, the artificial data generation unit 140 generates copies of the source data 111 (hereinafter, "replicated data"), as many as the number of replications obtained in S502 (S503).

Next, the artificial data generation unit 140 assigns natural numbers, starting from 1, to the pieces of replicated data in order. The storage unit 110 stores the number assigned to each piece of replicated data (hereinafter, "replication number") in association with that piece of replicated data (S504).

Next, the artificial data generation unit 140 generates data in which the pieces of replicated data are concatenated in time series in reverse order of their assigned replication numbers (hereinafter, "primary artificial data") (S505).
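Steps S503 to S505 can be sketched in Python as follows. The function name and the representation of each entry as a `(replication number, value)` pair are hypothetical simplifications; dates and times are left out here because they are assigned in later steps.

```python
def build_primary_artificial_data(source_values, copies):
    """Replicate, number, and concatenate the source data (S503 to S505)."""
    # S503 and S504: make the copies and number them 1, 2, ..., copies.
    numbered = [(number, list(source_values)) for number in range(1, copies + 1)]

    # S505: concatenate in reverse order of replication number.
    primary = []
    for number, replica in sorted(numbered, key=lambda item: item[0], reverse=True):
        primary.extend((number, value) for value in replica)
    return primary


# Hypothetical three-entry source data replicated three times.
primary = build_primary_artificial_data([1.0, 2.0, 3.0], copies=3)
```

Because a larger replication number is later shifted further into the past, concatenating in reverse order appears to be what puts the entries in chronological order.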

続いて、人工データ生成部１４０は、Ｓ５０５で生成した一次人工データの各エントリに、生成元データ１１１の各エントリの日時７０１の値を複製したデータ（以下、「参照元日時」と称する。）を付与する（Ｓ５０６）。 Subsequently, the artificial data generation unit 140 adds, to each entry of the primary artificial data generated in S505, data obtained by copying the value of the date and time 701 of the corresponding entry of the generation source data 111 (hereinafter referred to as the “reference source date and time”) (S506).

続いて、人工データ生成部１４０は、付与した一次人工データの各エントリの参照元日時を、基準とする日時（同図では例えば日時ｔ）から遡った値に更新する（日時ｔから生成元データ１１１の周期と各エントリの複製番号とを乗算することにより得られる日時分遡る）ことにより、各エントリの日時を生成する（Ｓ５０７）。例えば、複製番号２７の複製データにおける２０１９年１１月１５日０時０分０秒の変更後の日時は、２７週分遡った２０１９年５月１０日０時０分０秒になる。Ｓ５０７を実行することにより生成される一次人工データは、図６（Ｂ）のようになる。 Subsequently, the artificial data generation unit 140 generates the date and time of each entry by updating the reference source date and time added to each entry of the primary artificial data to a value that goes back from a reference date and time (for example, date and time t in the figure); specifically, it goes back from date and time t by the amount obtained by multiplying the period of the generation source data 111 by the replication number of the entry (S507). For example, the date and time 00:00:00 on November 15, 2019 in the replicated data with replication number 27 becomes 00:00:00 on May 10, 2019, which is 27 weeks earlier. The primary artificial data generated by executing S507 is as shown in FIG. 6(B).
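The date shift of S507 can be illustrated with a short sketch. The function name is hypothetical, and the sketch assumes the period of the generation source data is available as a `timedelta` (one week in the example of the text).

```python
from datetime import datetime, timedelta


def shifted_datetime(reference_source_dt, cycle_period, replication_number):
    """Go back from the reference source date/time by period x replication number."""
    return reference_source_dt - cycle_period * replication_number


# Example from the text: replication number 27, one-week period.
new_dt = shifted_datetime(datetime(2019, 11, 15, 0, 0, 0), timedelta(weeks=1), 27)
print(new_dt)  # 2019-05-10 00:00:00
```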

続いて、人工データ生成部１４０は、図４のＳ４１２で受け付けた分散σ＾２を用いて人工データの期間の白色雑音を生成する（Ｓ５０８）。Ｓ５０８を実行することにより生成される白色雑音は図６（Ｃ）のようになる。 Subsequently, the artificial data generation unit 140 generates white noise for the period of the artificial data using the variance σ^2 received in S412 of FIG. 4 (S508). The white noise generated by executing S508 is as shown in FIG. 6(C).

続いて、人工データ生成部１４０は、一次人工データに対して、Ｓ５０８で生成した白色雑音を変動値として付与することにより、人工データ１１２を生成する（Ｓ５０９）。Ｓ５０９を実行することにより生成される人工データ１１２は、図６（Ｄ）のようになる。 Subsequently, the artificial data generation unit 140 generates the artificial data 112 by adding the white noise generated in S508 as a variation value to the primary artificial data (S509). The artificial data 112 generated by executing S509 is as shown in FIG. 6(D).
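A minimal sketch of S508 and S509 follows. It assumes zero-mean Gaussian white noise, which the text does not state explicitly for this step; the function name and the `seed` argument are hypothetical and exist only to make the example reproducible.

```python
import math
import random


def add_white_noise(reference_values, variance, seed=None):
    """Return (observed, noise) where observed = reference value + white noise.

    Assumption: the white noise is zero-mean Gaussian with the given variance.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(variance)
    # S508: one independent noise sample per entry of the primary artificial data.
    noise = [rng.gauss(0.0, sigma) for _ in reference_values]
    # S509: add the noise to each entry as its variation value.
    observed = [value + e for value, e in zip(reference_values, noise)]
    return observed, noise


observed, noise = add_white_noise([10.0, 20.0, 30.0], variance=4.0, seed=0)
```

Keeping the noise alongside the observed values mirrors the variation value 814 and reference source observed value 813 that the artificial data stores for each entry.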

図８に、人工データ生成処理Ｓ４１３により生成される人工データ１１２の一例を示す。同図に示すように、例示する人工データ１１２は、日時８１１、観測値８１２、参照元観測値８１３、変動値８１４、参照元日時８１５、及び複製番号８１６の各項目を有する複数のエントリを含む。 FIG. 8 shows an example of the artificial data 112 generated by the artificial data generation processing S413. As shown in the figure, the artificial data 112 illustrated includes a plurality of entries each having a date/time 811, an observed value 812, a reference source observed value 813, a variation value 814, a reference source date/time 815, and a replication number 816. .

上記項目のうち、日時８１１には、Ｓ５０７で生成された日時が設定される。尚、日時８１１の値は、各エントリを一意に識別する識別子としても用いられる。観測値８１２には、Ｓ５０９で生成された人工データ１１２の当該日時における観測値が設定される。参照元観測値８１３には、当該日時に対応する、生成元データ１１１の観測値７０２が設定される。変動値８１４には、Ｓ５０８で生成された、当該日時に対応する白色雑音の値が設定される。参照元日時８１５には、当該日時に対応する、生成元データ１１１の日時７０１が設定される。参照元日時８１５の値は、当該エントリが、当該参照元日時８１５の値の日時の生成元データ１１１のエントリに基づくものであることを示す。複製番号８１６には、Ｓ５０４で割り当てられた複製番号が設定される。 Among the above items, the date and time generated in S507 is set in the date and time 811. The value of the date and time 811 is also used as an identifier that uniquely identifies each entry. The observed value of the artificial data 112 generated in S509 at that date and time is set in the observed value 812. The observed value 702 of the generation source data 111 corresponding to that date and time is set in the reference source observed value 813. The white noise value generated in S508 for that date and time is set in the variation value 814. The date and time 701 of the generation source data 111 corresponding to that date and time is set in the reference source date and time 815. The value of the reference source date and time 815 indicates that the entry is based on the entry of the generation source data 111 whose date and time equals that value. The replication number assigned in S504 is set in the replication number 816.

図９は、図４に示した学習データ期間設定処理Ｓ４２４を説明するフローチャートである。以下、同図とともに学習データ期間設定処理Ｓ４２４について説明する。尚、Ｓ４２３で取得した要求期間に重なる期間の観測データ１１３が既に取得されている場合、人工データ生成部１４０は、観測データ１１３を優先して学習データ１１４として採用されるように学習データ期間を設定する。 FIG. 9 is a flowchart for explaining the learning data period setting process S424 shown in FIG. 4. The learning data period setting process S424 is described below with reference to the figure. Note that if observation data 113 for a period overlapping the requested period acquired in S423 has already been acquired, the artificial data generation unit 140 sets the learning data period so that the observation data 113 is preferentially adopted as the learning data 114.

図１０は、以下の説明で用いる観測データ１１３の一例である。同図に示すように、例示する観測データ１１３は、日時１０１１と観測値１０１２の各項目を有する複数のエントリを含む。上記項目のうち日時１０１１には、当該エントリの観測値が取得された日時が設定される。観測値１０１２には、観測対象から実際に取得した観測値が設定される。 FIG. 10 is an example of observation data 113 used in the following description. As shown in the figure, the illustrated observation data 113 includes a plurality of entries having date and time 1011 and observation value 1012 items. In the date and time 1011 of the above items, the date and time when the observation value of the entry was acquired is set. The observed value 1012 is set with an observed value actually obtained from the observation target.

図９に示すように、まず学習データ期間設定部１５０は、記憶部１１０が観測データ１１３を記憶しているか否か（学習データ生成装置１００が観測データ１１３を取得しているか否か）を確認する（Ｓ９０１）。記憶部１１０が観測データ１１３を記憶している場合（Ｓ９０１：ＹＥＳ）、学習データ期間設定部１５０は、観測データ１１３の期間の終了時点を、学習データ期間の終了時点t_endとして設定する（Ｓ９０２）。その後、処理はＳ９０４に進む。一方、記憶部１１０が観測データ１１３を記憶していない場合（Ｓ９０１：ＮＯ）、学習データ期間設定部１５０は、生成元データ１１１の期間の終了時点を、学習データ期間の終了時点t_endとして設定する（Ｓ９０３）。その後、処理はＳ９０４に進む。 As shown in FIG. 9, the learning data period setting unit 150 first checks whether the storage unit 110 stores the observation data 113 (that is, whether the learning data generation device 100 has acquired the observation data 113) (S901). If the storage unit 110 stores the observation data 113 (S901: YES), the learning data period setting unit 150 sets the end time of the period of the observation data 113 as the end time t_end of the learning data period (S902). The process then proceeds to S904. On the other hand, if the storage unit 110 does not store the observation data 113 (S901: NO), the learning data period setting unit 150 sets the end time of the period of the generation source data 111 as the end time t_end of the learning data period (S903). The process then proceeds to S904.

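Ｓ９０４では、学習データ期間設定部１５０は、Ｓ９０２又はＳ９０３で設定した学習データ期間の終了時点t_endから、要求期間（人工データ生成部１４０がＳ４１２で取得した要求期間）だけ過去に遡った日時（以下、「仮開始時点tmp_Tstart」と称する。）を取得する。 In S904, the learning data period setting unit 150 obtains the date and time reached by going back from the end time t_end of the learning data period set in S902 or S903 by the requested period (the requested period acquired by the artificial data generation unit 140 in S412) (hereinafter referred to as the “provisional start time tmp_Tstart”).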

続いて、学習データ期間設定部１５０は、人工データ１１２の期間、生成元データ１１１の期間、及び観測データ１１３の期間と、仮開始時点tmp_Tstartとを比較する（Ｓ９０５）。仮開始時点tmp_Tstartが人工データ１１２の期間中である場合（Ｓ９０５：人工データの期間中）、学習データ期間の開始時点t_startに仮開始時点tmp_Tstartを設定する（Ｓ９０６）。一方、仮開始時点tmp_Tstartが、生成元データ１１１の期間中か観測データ１１３の期間中である場合（Ｓ９０５：生成元データｏｒ観測データの期間中）、学習データ期間の開始時点t_startに生成元データ１１１の開始時点を設定する（Ｓ９０７）。 Subsequently, the learning data period setting unit 150 compares the provisional start time tmp_Tstart with the period of the artificial data 112, the period of the generation source data 111, and the period of the observation data 113 (S905). If the provisional start time tmp_Tstart falls within the period of the artificial data 112 (S905: within the artificial data period), the provisional start time tmp_Tstart is set as the start time t_start of the learning data period (S906). On the other hand, if the provisional start time tmp_Tstart falls within the period of the generation source data 111 or the period of the observation data 113 (S905: within the generation source data or observation data period), the start time of the generation source data 111 is set as the start time t_start of the learning data period (S907).

以上の処理により、学習データ期間の開始時点t_startと終了時点t_endが設定され、学習データ期間の設定が完了する。尚、Ｓ９０２、Ｓ９０３、及びＳ９０７の処理により、学習データ生成処理Ｓ４３１において、観測データ１１３又は生成元データ１１１が人工データ１１２よりも優先して学習データ１１４として採用されるようになる。 By the above processing, the start time t_start and the end time t_end of the learning data period are set, and the setting of the learning data period is completed. By the processing of S902, S903, and S907, in the learning data generation processing S431, the observation data 113 or the original data 111 are adopted as the learning data 114 with priority over the artificial data 112.
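The period-setting flow of S901 to S907 can be sketched as below. This is an illustration under assumptions: each period is represented as a hypothetical `(start, end)` tuple with inclusive bounds, and the case where the provisional start time falls in none of the three periods, which the text does not describe, is reported as an error.

```python
from datetime import datetime, timedelta


def set_learning_period(requested, source, artificial, observation=None):
    """Return (t_start, t_end); periods are (start, end) datetime tuples."""
    # S901-S903: prefer the end of the observation data when it exists.
    t_end = observation[1] if observation is not None else source[1]
    # S904: provisional start time, going back by the requested period.
    tmp_t_start = t_end - requested
    # S905-S907: choose the start time depending on where tmp_t_start falls.
    if artificial[0] <= tmp_t_start <= artificial[1]:
        return tmp_t_start, t_end
    in_source = source[0] <= tmp_t_start <= source[1]
    in_observation = (observation is not None
                      and observation[0] <= tmp_t_start <= observation[1])
    if in_source or in_observation:
        return source[0], t_end
    # Not covered by the text; treated as an error in this sketch.
    raise ValueError("provisional start time is outside all known periods")


source = (datetime(2019, 11, 15, 0, 0), datetime(2019, 11, 21, 23, 50))
artificial = (datetime(2019, 5, 10, 0, 0), datetime(2019, 11, 14, 23, 50))
t_start, t_end = set_learning_period(timedelta(weeks=20), source, artificial)
```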

図１１は、図４に示した学習データ生成処理Ｓ４３１を説明するフローチャートである。以下、同図とともに学習データ生成処理Ｓ４３１について説明する。 FIG. 11 is a flowchart for explaining the learning data generation processing S431 shown in FIG. The learning data generation processing S431 will be described below with reference to FIG.

まず学習データ生成部１６０は、記憶部１１０から、学習データ期間設定処理Ｓ４２４により設定された学習データ期間に重なる期間の、観測データ１１３、生成元データ１１１、及び人工データ１１２を取得する（Ｓ１１０１～Ｓ１１０３）。 First, the learning data generation unit 160 acquires from the storage unit 110 the observation data 113, the source data 111, and the artificial data 112 of the period overlapping with the learning data period set by the learning data period setting process S424 (S1101 to S1103).

続いて、学習データ生成部１６０は、取得した生成元データ１１１と取得した人工データ１１２を時系列方向に連結（人工データ１１２、生成元データ１１１の時系列順に連結）した中間連結データを生成する（Ｓ１１０４）。 Subsequently, the learning data generation unit 160 generates intermediate concatenated data by concatenating the acquired generation source data 111 and the acquired artificial data 112 in the time-series direction (in chronological order, the artificial data 112 followed by the generation source data 111) (S1104).

続いて、学習データ生成部１６０は、中間連結データと、取得した観測データ１１３を時系列方向に連結して学習データ１１４を生成する（Ｓ１１０５）。尚、学習データ期間の全期間に対応する観測データ１１３が存在する場合、学習データ１１４は全て観測データ１１３によるものとなる。また学習データ期間の一部の期間に観測データ１１３が重なる場合、学習データ１１４の全期間のうち、学習データ期間の開始時点から観測データ１１３の開始時点までは中間連結データによるものとなり、観測データ１１３の開始時点から学習データ期間の終了時点までは観測データ１１３によるものとなる。このように観測データ１１３が存在する場合は観測データ１１３が学習データ１１４として優先的に採用されるので、機械学習システム１の本番運用が開始された後、実際に取得されたデータである観測データ１１３のみを学習データ１１４として用いて学習する運用状態に早期に移行することができる。 Subsequently, the learning data generation unit 160 generates the learning data 114 by concatenating the intermediate concatenated data and the acquired observation data 113 in the time-series direction (S1105). Note that when observation data 113 exists for the entire learning data period, the learning data 114 consists entirely of the observation data 113. When the observation data 113 overlaps only part of the learning data period, the learning data 114 consists of the intermediate concatenated data from the start time of the learning data period to the start time of the observation data 113, and of the observation data 113 from the start time of the observation data 113 to the end time of the learning data period. Because the observation data 113 is thus preferentially adopted as the learning data 114 whenever it exists, after production operation of the machine learning system 1 starts, it is possible to move early to an operational state in which learning uses only the observation data 113, which is actually acquired data, as the learning data 114.
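The assembly in S1101 to S1105 can be sketched as follows. This is one possible reading, not the definitive procedure: each dataset is assumed to be a list of hypothetical `(timestamp, value)` pairs, and priority for observation data is realized by dropping intermediate entries from the first in-period observation timestamp onward.

```python
def build_learning_data(period, artificial, source, observation):
    """Concatenate artificial, source, and observation data for the period.

    Each dataset is a list of (timestamp, value) pairs; period is (start, end).
    """
    t_start, t_end = period

    def in_period(entries):
        return [(ts, v) for ts, v in entries if t_start <= ts <= t_end]

    # S1101-S1103: keep only the entries overlapping the learning data period.
    obs = sorted(in_period(observation))
    # S1104: intermediate data = artificial data followed by source data.
    intermediate = sorted(in_period(artificial) + in_period(source))
    # S1105: observation data takes over from its first timestamp onward.
    if obs:
        obs_start = obs[0][0]
        intermediate = [(ts, v) for ts, v in intermediate if ts < obs_start]
    return intermediate + obs
```

Timestamps only need to be comparable, so the same function works with `datetime` objects or plain integers.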

図１２に学習データ生成処理Ｓ４３１により生成される学習データ１１４の一例を示す。例示する学習データ１１４は、日時１２０１及び観測値１２０２の各項目を有する複数のエントリを含む。上記項目のうち、日時１２０１には、人工データ１１２の日時８１１、生成元データ１１１の日時７０１、及び観測データ１１３の日時１０１１のいずれかの値に基づく日時が設定される。観測値１２０２には、人工データ１１２の観測値８１２、生成元データ１１１の観測値７０２、及び観測データ１１３の観測値１０１２のいずれかに基づく観測値が設定される。 FIG. 12 shows an example of the learning data 114 generated by the learning data generation processing S431. The illustrated learning data 114 includes a plurality of entries having date/time 1201 and observation value 1202 items. Among the above items, the date and time 1201 is set based on any of the values of the date and time 811 of the artificial data 112 , the date and time 701 of the source data 111 , and the date and time 1011 of the observation data 113 . The observed value 1202 is set to an observed value based on any one of the observed value 812 of the artificial data 112 , the observed value 702 of the generator data 111 , and the observed value 1012 of the observed data 113 .

以上に説明したように、第１実施形態の学習データ生成装置１００によれば、機械学習モデル２３の精度を確保するために必要な期間の学習データを用意することが難しい場合でも、上記期間について有効な学習データをユーザの手を煩わせることなく効率よく生成して提供することができる。 As described above, according to the learning data generation device 100 of the first embodiment, even if it is difficult to prepare learning data for a period necessary to ensure the accuracy of the machine learning model 23, Effective learning data can be efficiently generated and provided without troubling the user.

また学習データ生成装置１００は、各複製データに個別に雑音を付加した人工データ１１２を用いて学習データ１１４を生成するので、機械学習モデル２３の過学習の抑制効果が期待される多様性を有する学習データ１１４を生成することができ、機械学習モデル２３の推論精度を向上することができる。また人工データ１１２に白色雑音を付加することで、実際の変動に近い変動を再現することができ、例えば、観測データが正規分布に従うことを前提として機能する機械学習モデル２３の推論精度を高めることができる。 In addition, since the learning data generation device 100 generates the learning data 114 using the artificial data 112 in which noise is added individually to each replicated data, it can generate learning data 114 with a diversity that is expected to suppress overfitting of the machine learning model 23, and the inference accuracy of the machine learning model 23 can be improved. Also, adding white noise to the artificial data 112 makes it possible to reproduce fluctuations close to actual fluctuations, which can, for example, increase the inference accuracy of a machine learning model 23 that operates on the premise that the observation data follows a normal distribution.

また図９に示したように、学習データ期間設定処理Ｓ４２４において、観測データ１１３が学習データ１１４に優先して採用されるように学習データ期間が設定されるので、機械学習システム１の本番運用が開始された後は、実際に取得されたデータである観測データ１１３のみを学習データ１１４として用いて学習する運用状態に早期に移行することができる。このため、本番運用の開始後、推論装置２の推論精度を早期に向上することができる。 Further, as shown in FIG. 9, in the learning data period setting process S424 the learning data period is set so that the observation data 113 is preferentially adopted as the learning data 114. Therefore, after production operation of the machine learning system 1 starts, it is possible to move early to an operational state in which learning uses only the observation data 113, which is actually acquired data, as the learning data 114. As a result, the inference accuracy of the inference device 2 can be improved early after production operation starts.

また学習データ生成装置１００は、生成元データ１１１よりも過去の期間の人工データ１１２を生成することが可能であり、新たに取得される観測データ１１３の期間と重ならないように人工データ１１２を生成することができ、例えば、人工データ１１２を観測データ１１３で置換するといった煩雑な処理を発生させないようにすることができる。 In addition, the learning data generation device 100 can generate artificial data 112 of a period earlier than the generation source data 111, and generates the artificial data 112 so as not to overlap with the period of the newly acquired observation data 113. For example, it is possible to avoid complicated processing such as replacing the artificial data 112 with the observed data 113 .

尚、以上では、生成元データ１１１よりも過去の期間の人工データ１１２を生成する場合を例示したが、生成元データ１１１よりも未来の期間の人工データ１１２を生成してもよい。これにより、例えば、現実の振る舞いを最もよく反映していると考えられる時期における過去の時系列データを生成元データ１１１として用いて所望の未来の時期の学習データ１１４を生成することができる。尚、この場合、例えば、図５のＳ５０４において未来の期間の人工データ１１２とする各複製データに－１から始まる負の整数を複製番号を割り当て、過去の期間の各複製データに割り当てた正の複製番号と負の複製番号の絶対値との合計が図５のＳ５０２で取得した複製回数と一致するようにする。そのようにすることで、Ｓ５０７で複製データの期間に複製番号を乗算した値を基準とする日時に加算するだけで、日時（期間）情報を容易に算出することができる。 The above description illustrates the case of generating the artificial data 112 for a period earlier than the generation source data 111, but the artificial data 112 may also be generated for a period later than the generation source data 111. This makes it possible, for example, to generate learning data 114 for a desired future period using, as the generation source data 111, past time-series data from a period considered to best reflect actual behavior. In this case, for example, in S504 of FIG. 5, negative integers starting from -1 are assigned as replication numbers to the replicated data to be used as the artificial data 112 for the future period, and the sum of the positive replication number assigned to the replicated data for the past period and the absolute value of the negative replication number is made to match the number of replications acquired in S502 of FIG. 5. By doing so, the date and time (period) information can be calculated easily in S507 simply by adding, to the reference date and time, the value obtained by multiplying the period of the replicated data by the replication number.

［第２実施形態］ [Second embodiment]
続いて、第２実施形態について説明する。第２実施形態の学習データ生成装置１００は、生成元データ１１１を分解することにより得られる構成要素（後述するトレンド、周期変動、及び残差）に基づき人工データ１１２を生成する。尚、第２実施形態の機械学習システム１の基本的な構成並びに機械学習システム１において実行される処理の流れは、図１乃至図４とともに説明した第１実施形態の機械学習システム１と基本的に共通するが、人工データ生成部１４０の機能の一部が異なる。以下では、第１実施形態と異なる部分を中心として説明する。 Next, a second embodiment will be described. The learning data generation device 100 of the second embodiment generates the artificial data 112 based on components (a trend, a periodic variation, and a residual, described later) obtained by decomposing the generation source data 111. The basic configuration of the machine learning system 1 of the second embodiment and the flow of processing executed in the machine learning system 1 are basically the same as those of the machine learning system 1 of the first embodiment described with FIGS. 1 to 4, but part of the function of the artificial data generation unit 140 differs. The following description focuses on the portions that differ from the first embodiment.

図１３は、第２実施形態として示す人工データ生成処理Ｓ４１３を説明するフローチャートである。また図１４は、人工データ生成処理Ｓ４１３の実行過程で生成されるデータを模式的に示した図である。また図１５は、以下の説明で用いる生成元データ１１１の一例である。以下、これらの図を参照しつつ、第２実施形態の人工データ生成処理Ｓ４１３について詳述する。 FIG. 13 is a flowchart for explaining artificial data generation processing S413 shown as the second embodiment. FIG. 14 is a diagram schematically showing data generated in the process of executing the artificial data generation process S413. FIG. 15 is an example of generation source data 111 used in the following description. The artificial data generation processing S413 of the second embodiment will be described in detail below with reference to these figures.

図１４（Ａ）に示すように、例示する生成元データ１１１は、小周期Ｔｐ（＝１日）と大周期Ｔ（＝７日）を有する、２０１９年１１月１５日０時０分０秒から２０１９年１１月２２日２３時５０分０秒までの１０分間隔の８日分のデータ（８回の小周期Ｔｐ（大周期７日×１＋小周期１日））からなる。尚、以下の説明において、生成元データ１１１の開始時点をｔとする。また以下の説明において、図４のＳ４１２で受け付けた要求期間は２８週とする。また図４のＳ４１２において、生成元データ１１１の周期数として１周期（大周期１回分）を受け付けているものとする。 As shown in FIG. 14A, the example generation source data 111 has a small period Tp (=1 day) and a large period T (=7 days), 00:00:00 on November 15, 2019. to 23:50:00 on Nov. 22, 2019 for 8 days (eight small periods Tp (large period 7 days x 1 + small period 1 day)) at 10-minute intervals. In the following description, t is the starting point of the generation source data 111 . Also, in the following description, the request period received in S412 of FIG. 4 is assumed to be 28 weeks. Also, in S412 of FIG. 4, it is assumed that one cycle (one large cycle) is received as the number of cycles of the generation source data 111 .

図１３に示すように、まず人工データ生成部１４０は、ユーザインタフェースを介して小周期Ｔｐ（１日）と大周期Ｔ（７日）の入力を受け付ける（Ｓ１３０１）。 As shown in FIG. 13, the artificial data generation unit 140 first receives input of the small period Tp (1 day) and the large period T (7 days) via the user interface (S1301).

続いて、人工データ生成部１４０は、Ｓ４１２で受け付けた要求期間（２８週）以上の期間となる、生成元データ１１１の１周期の期間の倍数の最小値（最小周期数）を求める（Ｓ１３０２）。 Subsequently, the artificial data generation unit 140 obtains the minimum value (minimum number of cycles) of the multiple of the period of one cycle of the generation source data 111, which is longer than the requested period (28 weeks) received in S412 (S1302). .

続いて、人工データ生成部１４０は、Ｓ１３０２で求めた最小周期数から生成元データ１１１に含まれている周期数を減じた値を求め、求めた値を生成元データ１１１に含まれている周期数で割った値を小数点以下切り上げ、更に１を加算して得られる値を複製回数とする（Ｓ１３０３）。尚、１を加算するのは、生成元データ１１１について後述するトレンドを移動平均により求めることに起因して生じる時間差（後述するＴｐ／２）により、生成した人工データ１１２の期間が要求期間を満たさなくなる可能性があるからである。本例の場合、生成元データ１１１の周期数が１であり、Ｓ４１２で受け付けた要求期間が２８週であるので、複製回数として２８が得られる。 Subsequently, the artificial data generation unit 140 subtracts the number of cycles included in the generation source data 111 from the minimum number of cycles obtained in S1302, divides the result by the number of cycles included in the generation source data 111, rounds the quotient up to the nearest integer, and adds 1; the resulting value is used as the number of replications (S1303). The 1 is added because the time difference (Tp/2, described later) that arises from obtaining the trend (described later) of the generation source data 111 by a moving average could otherwise leave the period of the generated artificial data 112 short of the requested period. In this example, the number of cycles in the generation source data 111 is 1 and the requested period received in S412 is 28 weeks, so 28 is obtained as the number of replications.
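The count used in the second embodiment differs from the first only by the added margin of one copy. A minimal sketch, with a hypothetical function name and the periods given in days as an assumption:

```python
import math


def replication_count_with_margin(requested_days, large_period_days, cycles_in_source):
    """Number of replications for the second embodiment (S1302-S1303)."""
    # S1302: smallest number of large periods covering the requested period.
    min_cycles = math.ceil(requested_days / large_period_days)
    # S1303: same calculation as the first embodiment ...
    base = math.ceil((min_cycles - cycles_in_source) / cycles_in_source)
    # ... plus one copy to compensate for the Tp/2 lost at the ends of the trend.
    return base + 1


# Example from the text: 28 weeks (196 days), large period of 7 days, 1 cycle.
print(replication_count_with_margin(196, 7, 1))  # 28
```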

続いて、人工データ生成部１４０は、小周期Ｔｐを変動周期として、生成元データ１１１を構成要素（トレンド、周期変動、残差）に分解する（Ｓ１３０４）。ここでトレンドとは、時系列データにおける長期的な変動を表す要素（Trend component）のことをいう。また周期変動とは、時系列データにおいて一定期間ごとに周期的に現れる要素（Seasonal component）のことをいう。また残差とは、時系列データにおいて、トレンドと周期変動を除くことにより残る細かな変動要素（Residual component）のことをいう。本実施形態は、上記分解を非特許文献１に記載されているＳＴＬ（Seasonal-Trend Decomposition Procedure Based on Loess）を用いて行うものとするが、上記分解の方法は必ずしも限定されない。 Subsequently, the artificial data generation unit 140 decomposes the generation source data 111 into components (trend, periodic variation, and residual) using the small period Tp as the variation period (S1304). Here, the trend is the component representing long-term variation in time-series data (trend component). The periodic variation is the component that appears periodically at regular intervals in time-series data (seasonal component). The residual is the fine variation that remains in time-series data after the trend and the periodic variation are removed (residual component). In this embodiment, the decomposition is performed using STL (Seasonal-Trend Decomposition Procedure Based on Loess) described in Non-Patent Literature 1, but the decomposition method is not necessarily limited to this.
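To make the three components concrete, the sketch below uses a classical additive decomposition with a centered moving average. This is deliberately a simpler technique than STL, swapped in because the text allows other decomposition methods; it is not the method of Non-Patent Literature 1. The function name is hypothetical, and `period` is the number of samples in one small period Tp.

```python
def decompose(values, period):
    """Classical additive decomposition: values = trend + seasonal + residual.

    Trend and residual are None for half a period at both ends.
    """
    n = len(values)
    half = period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = values[i - half:i + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: the two end points get half weight (2 x period MA).
            total -= 0.5 * (window[0] + window[-1])
        trend[i] = total / period
    # Seasonal component: mean detrended value for each phase of the period.
    sums, counts = [0.0] * period, [0] * period
    for i in range(n):
        if trend[i] is not None:
            sums[i % period] += values[i] - trend[i]
            counts[i % period] += 1
    phase_mean = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    offset = sum(phase_mean) / period  # center the seasonal component on zero
    seasonal = [phase_mean[i % period] - offset for i in range(n)]
    residual = [None if trend[i] is None else values[i] - trend[i] - seasonal[i]
                for i in range(n)]
    return trend, seasonal, residual
```

With 10-minute data and Tp = 1 day, `period` would be 144, and the moving average leaves 72 entries (Tp/2) without a trend value at each end.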

図１４（Ｂ）は、図１４（Ａ）の生成元データ１１１を分解することにより得られる構成要素である。同図において、（Ｂ－１）はトレンド、（Ｂ－２）は周期変動、（Ｂ－３）は残差である。 FIG. 14(B) shows components obtained by decomposing the generator data 111 of FIG. 14(A). In the figure, (B-1) is the trend, (B-2) is the periodic variation, and (B-3) is the residual.

図１６に、Ｓ１３０４で得られるデータ（以下、「中間データ１６００」と称する。）を示す。同図に示すように、中間データ１６００は、日時１６０１、観測値１６０２、トレンド１６０３、周期変動１６０４、及び残差１６０５の各項目を有する複数のエントリを含む。同図において、「－」は、データが欠落していることを示す。日時１６０１及び観測値１６０２は、生成元データ１１１における日時１２０１及び観測値１２０２に対応する。トレンド１６０３、周期変動１６０４、及び残差１６０５には夫々、Ｓ１３０４で得られた、観測値１６０２の構成要素であるトレンド、周期変動、及び残差を示す値が設定される。尚、トレンド１６０３と残差１６０５は、いずれも期間の両端において、ＳＴＬを実行する際に指定した小周期の半分の期間（＝Ｔｐ／２）の値が欠落する。本例では、２０１９年１１月１５日０時０分０秒から２０１９年１１月１５日１１時５０分０秒までの期間と、２０１９年１１月２２日１２時０分０秒から、２０１９年１１月２２日２３時５０分０秒までの期間においてトレンド１６０３と残差１６０５の値が欠落している。 FIG. 16 shows the data obtained in S1304 (hereinafter referred to as "intermediate data 1600"). As shown in the figure, the intermediate data 1600 includes a plurality of entries having date/time 1601, observed value 1602, trend 1603, periodic variation 1604, and residual 1605 items. In the figure, "-" indicates lack of data. The date and time 1601 and the observed value 1602 correspond to the date and time 1201 and the observed value 1202 in the source data 111 . A trend 1603, a periodic fluctuation 1604, and a residual 1605 are set to values indicating the trend, periodic fluctuation, and residual, which are the components of the observed value 1602 obtained in S1304. Note that both the trend 1603 and the residual 1605 lack values for half the period (=Tp/2) of the short period specified when executing the STL at both ends of the period. In this example, the period from 00:00:00 on November 15, 2019 to 11:50:00 on November 15, 2019 and from 12:00:00 on November 22, 2019 to the year 2019 The values of trend 1603 and residual 1605 are missing in the period up to 23:50:00 on November 22nd.

図１３に戻り、続いて、人工データ生成部１４０は、中間データ１６００のトレンド１６０３の値が存在する（欠落していない）日時について、同じ日時１６０１のトレンド１６０３と周期変動１６０４の合計値（以下、「複製元観測値」と称する。）を求める（Ｓ１３０５）。Ｓ１３０５の処理は、図１４では（Ｂ－１）に示すトレンドと（Ｂ－２）に示す周期変動とを合成する処理に相当する。当該処理を実行することにより、図１４（Ｃ）に示すデータ（以下、「複製元データ１７００」と称する。）が得られる。 Returning to FIG. 13, the artificial data generation unit 140 then obtains, for each date and time at which a value of the trend 1603 of the intermediate data 1600 exists (is not missing), the sum of the trend 1603 and the periodic variation 1604 at the same date and time 1601 (hereinafter referred to as the “replication source observed value”) (S1305). The process of S1305 corresponds, in FIG. 14, to combining the trend shown in (B-1) and the periodic variation shown in (B-2). Executing this process yields the data shown in FIG. 14(C) (hereinafter referred to as the “replication source data 1700”).

図１７に複製元データ１７００の一例を示す。同図に示すように、複製元データ１７００は、日時１７０１、観測値１７０２、トレンド１７０３、周期変動１７０４、残差１７０５、及び複製元観測値１７０６の各項目を有する複数のエントリを含む。上記項目のうち、日時１７０１には、中間データ１６００のエントリのうち、トレンド１６０３の値を有するエントリの日時１６０１の値が設定される。観測値１７０２には、中間データ１６００のエントリのうち、日時１７０１の値に対応する観測値１６０２の値が設定される。トレンド１７０３には、中間データ１６００のエントリのうち、日時１７０１の値に対応するトレンド１６０３の値が設定される。周期変動１７０４には、中間データ１６００のエントリのうち、日時１７０１の値に対応する周期変動１６０４の値が設定される。残差１７０５には、中間データ１６００のエントリのうち、日時１７０１の値に対応する残差１６０５の値が設定される。複製元観測値１７０６には、中間データ１６００のエントリのうち、日時１７０１の値に対応するトレンド１６０３の値と日時１７０１の値に対応する周期変動１６０４の値とを合計した値が設定される。 FIG. 17 shows an example of the replication source data 1700. As shown in the figure, the replication source data 1700 includes a plurality of entries each having the items date and time 1701, observed value 1702, trend 1703, periodic variation 1704, residual 1705, and replication source observed value 1706. Among these items, the date and time 1701 is set to the value of the date and time 1601 of each entry of the intermediate data 1600 that has a value of the trend 1603. The observed value 1702 is set to the value of the observed value 1602 of the intermediate data 1600 entry corresponding to the date and time 1701. The trend 1703 is set to the value of the trend 1603 of the intermediate data 1600 entry corresponding to the date and time 1701. The periodic variation 1704 is set to the value of the periodic variation 1604 of the intermediate data 1600 entry corresponding to the date and time 1701. The residual 1705 is set to the value of the residual 1605 of the intermediate data 1600 entry corresponding to the date and time 1701. The replication source observed value 1706 is set to the sum of the value of the trend 1603 and the value of the periodic variation 1604 of the intermediate data 1600 entry corresponding to the date and time 1701.

図１３に戻り、続いて、人工データ生成部１４０は、Ｓ１３０３で求めた複製回数だけ複製元データ１７００を複製する（Ｓ１３０６）。以下、複製された各データのことを「複製データ」と称する。 Returning to FIG. 13, the artificial data generation unit 140 then replicates the replication source data 1700 the number of times obtained in S1303 (S1306). Hereinafter, each replicated piece of data is referred to as “replicated data”.

続いて、人工データ生成部１４０は、１から始まる自然数を、生成した各複製データに順に割り当て、記憶部１１０が、各複製データに割り当てられた番号（以下、「複製番号」と称する。）を複製データの夫々に対応づけて記憶する（Ｓ１３０７）。尚、第１実施形態で述べたのと同様に、当該処理において１から始まる自然数とは別に－１から始まる負の整数を複製番号として割り当てることにより生成元データ１１１よりも未来の期間における人工データ１１２を生成してもよい。この場合、第１実施形態の場合と同様に、正の複製番号と負の複製番号の絶対値との合計値がＳ１３０３で取得した複製回数と一致するようにする。 Subsequently, the artificial data generation unit 140 assigns natural numbers starting from 1 to the generated replicated data in order, and the storage unit 110 stores the number assigned to each replicated data (hereinafter referred to as the “replication number”) in association with that replicated data (S1307). As described in the first embodiment, artificial data 112 for a period later than the generation source data 111 may also be generated by assigning, in this process, negative integers starting from -1 as replication numbers in addition to the natural numbers starting from 1. In this case, as in the first embodiment, the sum of the positive replication number and the absolute value of the negative replication number is made to match the number of replications acquired in S1303.

続いて、人工データ生成部１４０は、割り当てた複製番号の逆順に、複製データを時系列方向に連結していくことにより一次人工データを生成する（Ｓ１３０８）。 Subsequently, the artificial data generation unit 140 generates the primary artificial data by concatenating the replicated data in the time-series direction in reverse order of the assigned replication numbers (S1308).

続いて、人工データ生成部１４０は、一次人工データのうち、複製番号が１で日時がｔ＋Ｔからｔ＋Ｔ＋Ｔｐ／２の期間に該当するエントリを削除する（Ｓ１３０９）。 Subsequently, the artificial data generation unit 140 deletes the entry whose copy number is 1 and whose date and time is from t+T to t+T+Tp/2 from the primary artificial data (S1309).

続いて、人工データ生成部１４０は、一次人工データの各エントリに対して、生成元データ１１１の各エントリの日時７０１の値を複製したデータ（以下、「参照元日時」と称する。）を付与する（Ｓ１３１０）。 Subsequently, the artificial data generation unit 140 gives each entry of the primary artificial data data obtained by duplicating the value of the date and time 701 of each entry of the generation source data 111 (hereinafter referred to as “reference date and time”). (S1310).

続いて、人工データ生成部１４０は、一次人工データの各エントリの参照元日時を、基準とする日時から遡った値に更新することにより各エントリの日時を生成する（Ｓ１３１１）。この処理により、例えば、複製番号２の複製データにおける２０１９年１１月１５日１２時０分０秒の変更後の日時は、２週分遡った２０１９年１１月１日１２時０分０秒となる。 Subsequently, the artificial data generation unit 140 generates the date and time of each entry by updating the reference source date and time of each entry of the primary artificial data to a value that goes back from the reference date and time (S1311). As a result, for example, the date and time 12:00:00 on November 15, 2019 in the replicated data with replication number 2 becomes 12:00:00 on November 1, 2019, which is two weeks earlier.

続いて、人工データ生成部１４０は、複製データを連結する際の境界となる時点における、境界の前後の複製データの周期変動の差分ｄを求める。具体的には、人工データ生成部１４０は、周期変動について、ｔ＋Ｔｐ／２の時点のエントリの値と当該時点から一つ前の時点のエントリの値との差分ｄを求める（Ｓ１３１２）。例えば、図１６の中間データ１６００の例では、ｔ＋Ｔｐ／２は２０１９年１１月１５日１２時０分０秒であるため、同日時の周期変動１６０４として１５２が得られる。また同日時の一つ前の時点である２０１９年１１月１５日１１時５０分０秒の周期変動１６０４として１５１が得られる。このため、本例では差分ｄとして１が得られる。 Subsequently, the artificial data generation unit 140 obtains the difference d in periodic fluctuations of the duplicated data before and after the boundary at the time of the boundary when connecting the duplicated data. Specifically, the artificial data generator 140 obtains the difference d between the value of the entry at the point of time t+Tp/2 and the value of the entry one point before (S1312). For example, in the example of the intermediate data 1600 in FIG. 16, t+Tp/2 is 12:00:00 on November 15, 2019, so 152 is obtained as the periodic variation 1604 on the same date and time. Also, 151 is obtained as the periodic variation 1604 at 11:50:00 on November 15, 2019, which is one point before the same date and time. Therefore, in this example, 1 is obtained as the difference d.

続いて、人工データ生成部１４０は、一次人工データの各エントリに対して、差分ｄと各エントリの複製番号との積として求められる値を、一次人工データの各エントリの観測値に反映（例えば、加算又は減算）する（Ｓ１３１３）。即ち、短期間のデータから取得されるトレンド（差分ｄ）が要求期間において継続していたと仮定した場合における一次人工データを生成する。当該処理の実行後、一次人工データは図１４（Ｄ）のようになる。 Subsequently, the artificial data generation unit 140 reflects the value obtained as the product of the difference d and the replication number of each entry for each entry of the primary artificial data in the observed value of each entry of the primary artificial data (for example, , addition or subtraction) (S1313). That is, primary artificial data are generated assuming that the trend (difference d) obtained from short-term data continued during the requested period. After executing the process, the primary artificial data becomes as shown in FIG. 14(D).
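Steps S1312 and S1313 can be sketched as below. The entry layout (dictionaries with `observed` and `replication_number` keys) and the `sign` argument are hypothetical; the text only says the product is reflected by, for example, addition or subtraction, so the default of subtraction is an assumption.

```python
def apply_boundary_trend(entries, seasonal_at_boundary, seasonal_before_boundary, sign=-1):
    """Reflect d x replication number in the observed value of each entry."""
    # S1312: difference of the periodic variation across the join boundary.
    d = seasonal_at_boundary - seasonal_before_boundary
    # S1313: shift each entry by d multiplied by its replication number.
    return [dict(e, observed=e["observed"] + sign * d * e["replication_number"])
            for e in entries]


# Values from the text: 152 at t + Tp/2 and 151 one sample earlier, so d = 1.
entries = [{"observed": 100.0, "replication_number": 2}]
print(apply_boundary_trend(entries, 152, 151))
```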

続いて、人工データ生成部１４０は、Ｓ１３０４で得られた残差の分散ｓ＾２を求める（Ｓ１３１４）。 Subsequently, the artificial data generation unit 140 obtains the variance s^2 of the residuals obtained in S1304 (S1314).

続いて、人工データ生成部１４０は、上記分散ｓ＾２を有する一次人工データの期間に対応する期間について白色雑音を生成する（Ｓ１３１５）。当該処理を実行することにより生成される白色雑音は図１４（Ｅ）のようになる。 Subsequently, the artificial data generation unit 140 generates white noise having the variance s^2 for a period corresponding to the period of the primary artificial data (S1315). The white noise generated by this process is as shown in FIG. 14(E).

続いて、人工データ生成部１４０は、一次人工データに対して、各エントリの観測値を複製した値（以下「参照元観測値」と称する。）を生成する（Ｓ１３１６）。 Subsequently, the artificial data generating unit 140 generates a value obtained by duplicating the observed value of each entry (hereinafter referred to as “reference source observed value”) for the primary artificial data (S1316).

続いて、人工データ生成部１４０は、一次人工データに対して、Ｓ１３１５で生成した白色雑音を変動値として付与する（Ｓ１３１７）。 Subsequently, the artificial data generator 140 gives the white noise generated in S1315 as a variation value to the primary artificial data (S1317).

続いて、人工データ生成部１４０は、人工データ１１２の各エントリの観測値に、夫々の参照元観測値に夫々の変動値を加算した値を設定して人工データを生成する（Ｓ１３１８）。当該処理を実行することにより生成される人工データ１１２は、図１４（Ｆ）のようになる。 Subsequently, the artificial data generation unit 140 generates artificial data by setting the observed value of each entry in the artificial data 112 to a value obtained by adding each variation value to each reference source observed value (S1318). The artificial data 112 generated by executing the process is as shown in FIG. 14(F).
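Steps S1314 to S1318 can be sketched as follows. The sketch assumes the population variance of the residuals and zero-mean Gaussian white noise; the text specifies neither choice. The function name and `seed` argument are hypothetical.

```python
import math
import random
import statistics


def generate_artificial(reference_values, residuals, seed=None):
    """Return (observed, noise, s2) for the primary artificial data."""
    # S1314: variance of the residuals, skipping the missing values at the ends.
    s2 = statistics.pvariance([r for r in residuals if r is not None])
    # S1315, S1317: white noise with variance s2, one sample per entry.
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, math.sqrt(s2)) for _ in reference_values]
    # S1318: observed value = reference source observed value + variation value.
    observed = [ref + e for ref, e in zip(reference_values, noise)]
    return observed, noise, s2


observed, noise, s2 = generate_artificial([10.0, 11.0, 12.0],
                                          [None, 1.0, -1.0, 1.0, -1.0, None],
                                          seed=1)
```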

FIG. 18 shows an example of the artificial data 112. As shown in the figure, the artificial data 112 includes a plurality of entries, each having the items date/time 1801, observed value 1802, reference source observed value 1803, variation value 1804, reference source date/time 1805, and replication number 1806. The date/time 1801 is set to the date and time generated in S1311; its value also serves as an identifier that uniquely identifies each entry. The observed value 1802 is set to the observed value obtained in S1318. The reference source observed value 1803 is set to the reference source observed value generated in S1316. The variation value 1804 is set to the white noise value assigned in S1317. The reference source date/time 1805 is set to the date and time assigned in S1310; it corresponds to the date/time 701 of the source data 111 and indicates that the entry is based on the entry of the source data 111 having that date/time 701. The replication number 1806 is set to the replication number assigned in S1307.
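
The entry layout of FIG. 18 and the relationship among its items (S1316 to S1318) can be illustrated with a small Python record. The class name, field names, and sample values are hypothetical; only the six items and the rule that the observed value equals the reference source observed value plus the variation value come from the description.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ArtificialEntry:
    date_time: datetime            # item 1801, also the unique key of the entry
    observed: float                # item 1802
    reference_observed: float      # item 1803, noise-free copy (S1316)
    variation: float               # item 1804, white-noise sample (S1317)
    reference_date_time: datetime  # item 1805, date/time 701 in the source data
    replication_number: int        # item 1806


def make_entry(date_time, reference_observed, variation,
               reference_date_time, replication_number):
    """Build one entry; observed = reference_observed + variation (S1318)."""
    return ArtificialEntry(
        date_time=date_time,
        observed=reference_observed + variation,
        reference_observed=reference_observed,
        variation=variation,
        reference_date_time=reference_date_time,
        replication_number=replication_number,
    )


# Hypothetical entry: the first copy of a source sample, placed one week earlier.
entry = make_entry(
    date_time=datetime(2020, 1, 1, 9, 0),
    reference_observed=20.5,
    variation=-0.25,
    reference_date_time=datetime(2020, 1, 8, 9, 0),
    replication_number=1,
)
```

Keeping the reference source observed value and the variation value alongside the observed value means each artificial sample can be traced back to its source entry and to the noise that was added to it.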

As described above, the learning data generation device 100 of the second embodiment decomposes the source data 111, which is time-series data containing two cycles, into a trend, a periodic variation, and a residual. It generates noise-free replication source data based on the trend and the periodic variation, generates white noise based on the variance s^2 obtained from the residual, and generates artificial data from the replication source data and the white noise. This makes it possible to generate learning data that reproduces a variation process close to the one that actually occurs, and training the machine learning model 23 with this data can improve the inference accuracy of the inference device 2.

The learning data generation device 100 also obtains the difference d (a short-period trend) in the periodic variation of the replicated data before and after each time point that forms a boundary where replicated data are concatenated, and generates the artificial data 112 by concatenating a plurality of replicated data while changing the observed values at each boundary by the product of the replication number and the difference d. This makes it possible to generate learning data 114 that takes a long-term trend into account, so the machine learning model 23 can be trained accurately.

Although embodiments of the present invention have been described above, the present invention is not limited to these embodiments and includes various modifications. For example, the embodiments above describe the configurations in detail in order to explain the present invention clearly, and the invention is not necessarily limited to implementations having all of the described configurations. Part of the configuration of each embodiment may also have other configurations added to it, may be deleted, or may be replaced with other configurations.

Each of the configurations, functions, processing units, processing means, and the like described above may be realized partly or wholly in hardware, for example by designing them as an integrated circuit. They can also be realized by program code of software that implements the functions shown in the embodiments. In this case, a storage medium on which the program code is recorded is provided to an information processing device (computer), and a processor of that information processing device reads the program code stored in the storage medium. The program code read from the storage medium then itself implements the functions of the embodiments above, and the program code and the storage medium storing it constitute the present invention. Storage media that can be used to supply such program code include, for example, hard disks, SSDs (Solid State Drives), optical disks, magneto-optical disks, CD-Rs, flexible disks, CD-ROMs, DVD-ROMs, magnetic tapes, non-volatile memory cards, and ROMs.

In the embodiments above, the control lines and information lines shown are those considered necessary for the explanation, and they do not necessarily represent all of the control lines and information lines of a product. All of the configurations may be interconnected. In addition, although various kinds of information have been illustrated above in table form, this information may be managed in forms other than tables.

1 machine learning system, 2 inference device, 21 learning processing unit, 22 inference processing unit, 23 machine learning model, 100 learning data generation device, 110 storage unit, 112 artificial data, 111 source data, 113 observed data, 114 learning data, 120 observed data acquisition unit, 130 source data acquisition unit, 140 artificial data generation unit, 150 learning data period setting unit, 160 learning data generation unit, 170 learning data output unit, S400 learning data generation processing, S413 artificial data generation processing, S424 learning data period setting processing, S431 learning data generation processing

Claims

A learning data generation device that is configured using an information processing device and generates learning data used for training a machine learning model, the device comprising:
an artificial data generation unit that generates artificial data, which is time-series data for a period corresponding to a requested period, by concatenating a plurality of replicated data, each being a copy of source data that is time-series data for a predetermined period, and adding noise to each of the replicated data; and
a learning data generation unit that generates the learning data using the artificial data.

The learning data generation device according to claim 1,
wherein the requested period is a period received from a user via a user interface.

The learning data generation device according to claim 1,
wherein the artificial data generation unit generates the artificial data for a period earlier than the source data, and
the learning data generation unit generates the learning data by concatenating the artificial data to the source data in time series.

The learning data generation device according to claim 1,
wherein the artificial data generation unit generates the artificial data for a period later than the source data, and
the learning data generation unit generates the learning data by concatenating the artificial data to the source data in time series.

The learning data generation device according to claim 1 or 2,
wherein the learning data generation unit generates the learning data by concatenating, to the artificial data, observed data, which is time-series data actually input to the machine learning model during actual operation of a machine learning system that performs inference processing using the machine learning model.

The learning data generation device according to claim 5,
wherein the learning data generation unit generates the learning data by adopting the observed data in preference to the artificial data.

The learning data generation device according to claim 6,
further comprising a learning data period setting unit that sets a learning data period, which is the period from a start time to an end time of the learning data,
wherein the learning data period setting unit sets the end time of the learning data period to the latest time of the observed data, and
sets the start time to a time that precedes the end time by the requested period.

The learning data generation device according to claim 1,
wherein the artificial data generation unit adds noise to each of the replicated data individually.

The learning data generation device according to claim 1,
wherein the noise is white noise.

The learning data generation device according to claim 1,
wherein the artificial data generation unit decomposes the source data into components consisting of a trend, a periodic variation, and a residual, and generates the artificial data based on the trend and the periodic variation among those components.

The learning data generation device according to claim 10,
wherein the artificial data generation unit generates the noise based on the variance of the residual and adds the generated noise to the generated artificial data.

The learning data generation device according to claim 10,
wherein the artificial data generation unit obtains, at a time point that forms a boundary when the replicated data are concatenated, a difference in the periodic variation of the replicated data before and after the boundary, and generates the artificial data by concatenating a plurality of the replicated data while reflecting the difference.

A learning data generation method in which an information processing device executes:
a step of generating artificial data, which is time-series data for a period corresponding to a requested period, by concatenating a plurality of replicated data, each being a copy of source data that is time-series data for a predetermined period, and adding noise to each of the replicated data; and
a step of generating, using the artificial data, learning data used for training a machine learning model.

The learning data generation method according to claim 13,
further comprising a step in which the information processing device generates the learning data by concatenating, to the artificial data, observed data, which is time-series data actually input to the machine learning model during actual operation of a machine learning system that performs inference processing using the machine learning model.

The learning data generation method according to claim 14,
further comprising a step in which the information processing device generates the learning data by adopting the observed data in preference to the artificial data.

The learning data generation method according to claim 13,
further comprising a step in which the information processing device decomposes the source data into components consisting of a trend, a periodic variation, and a residual, and generates the artificial data based on the trend and the periodic variation among those components.

The learning data generation method according to claim 16,
further comprising a step in which the information processing device generates the noise based on the variance of the residual and adds the generated noise to the generated artificial data.

The learning data generation method according to claim 16,
further comprising a step in which the information processing device obtains, at a time point that forms a boundary when the replicated data are concatenated, a difference in the periodic variation of the replicated data before and after the boundary, and generates the artificial data by concatenating the replicated data while sequentially adding the difference.
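
For illustration only, the combination of observed data with artificial data and the setting of the learning data period recited above can be sketched in Python. The function names, the representation of each series as a dictionary keyed by timestamp, the daily sampling, and the inclusive period bounds are all assumptions of this sketch and are not taken from the claims.

```python
from datetime import datetime, timedelta


def learning_period(observed, requested_period):
    """End at the latest observed time; start one requested period earlier."""
    end = max(observed)  # latest timestamp among the observed data
    return end - requested_period, end


def build_learning_data(artificial, observed, requested_period):
    """Merge two {datetime: value} series, letting observed values override
    artificial ones at the same timestamp, then clip to the learning period."""
    start, end = learning_period(observed, requested_period)
    merged = {**artificial, **observed}  # later mapping wins: observed has priority
    return {t: v for t, v in sorted(merged.items()) if start <= t <= end}


# Hypothetical daily series: ten artificial points, the last two also observed.
t0 = datetime(2020, 1, 1)
artificial = {t0 + timedelta(days=i): float(i) for i in range(10)}
observed = {t0 + timedelta(days=8): 100.0, t0 + timedelta(days=9): 101.0}
learning = build_learning_data(artificial, observed, timedelta(days=4))
```

In this example the learning data period runs from day 5 to day 9, and the observed values replace the artificial values on days 8 and 9.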