WO2023242943A1 - Data generation device, data generation method, program, and machine learning system - Google Patents
Data generation device, data generation method, program, and machine learning system
- Publication number
- WO2023242943A1 (international application PCT/JP2022/023750)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- environment
- production environment
- normal
- abnormal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
Definitions
- One aspect of the present invention relates to a data generation device, a data generation method, a program, and a machine learning system that generate learning data for a machine learning model that detects an abnormality such as a failure.
- The system is ultimately developed with the aim of applying it to a production environment (the target production environment).
- It is essential to perform verification work in an environment that simulates the production environment, such as a verification environment or a staging environment.
- AI: Artificial Intelligence
- MLOps: Machine Learning Operations
- This invention was made in view of the above circumstances and aims to provide a technology that can generate, at low cost, the learning data needed to train an AI model.
- In one aspect, a data generation device includes a data classification unit and a data generation unit.
- The data classification unit classifies existing data, collected from an environment applicable to verification of the production environment, into normal data and abnormal data.
- The data generation unit generates abnormal data of the production environment based on the normal data of the production environment and the classified normal data and abnormal data.
- FIG. 1 is a diagram showing an example of a system to which a data generation device according to an embodiment of the present invention is applied.
- FIG. 2 is a block diagram showing an example of the learning data generation device 3 shown in FIG. 1.
- FIG. 3 is a functional block diagram showing an example of the learning data generation device 3 shown in FIG. 2.
- FIG. 4 is a functional block diagram illustrating an example of a machine learning system according to an embodiment.
- FIG. 5 is a diagram for explaining the flow of data exchanged between the learning data generation device 3, the existing environment 80, the AI model 2, and the production environment 70.
- FIG. 6 is a diagram illustrating an example of a processing procedure of the machine learning system according to the embodiment.
- FIG. 7 is a diagram illustrating an example of changes in metrics values in existing data.
- FIG. 8 is a diagram for explaining calculation of abnormality data in the production environment.
- FIG. 1 is a diagram showing an example of a system to which a data generation device according to an embodiment of the present invention is applied.
- The system shown in FIG. 1 operates a target system 1, such as a network system, using an AI model 2.
- This system further includes a learning data generation device 3.
- The learning data generation device 3 generates learning data for training the AI model 2.
- FIG. 2 is a block diagram showing an example of the learning data generation device 3 shown in FIG. 1.
- The learning data generation device 3 is a computer that includes a processor 10, a memory 20, a storage 30, an input/output interface (I/F) 40, and a bus 45 that interconnects them.
- The input/output I/F 40 establishes communication links between the learning data generation device 3 and the target system 1 and AI model 2, and exchanges various data with them.
- The processor 10 is an arithmetic device such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit), and implements the processing functions of the embodiment according to a program loaded from the storage 30 into the memory 20.
- FIG. 3 is a functional block diagram showing an example of the learning data generation device 3 shown in FIG. 2.
- The memory 20 is a semiconductor memory such as a ROM (Read Only Memory) or a RAM (Random Access Memory).
- The storage 30 is a non-volatile storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive). In addition to basic software such as an OS (Operating System), it stores programs for realizing the processing according to the embodiment.
- The program 34 can be installed in the learning data generation device 3.
- The storage 30 includes an existing data storage unit 31, a calculation data storage unit 32, and a learning data storage unit 33 as the storage areas needed to implement the embodiment.
- The existing data storage unit 31 stores existing data collected from an environment applicable to verification of the production environment.
- The existing data may be, for example, metric values such as CPU usage rate and memory usage rate.
- An environment applicable to verification of the production environment is an environment, such as a verification environment or a staging environment, whose equipment and configuration are similar to those of the production environment. Such environments may be collectively referred to as similar environments.
- In other words, existing data is data collected from an environment similar to the production environment.
- The calculation data storage unit 32 stores the abnormal data of the production environment generated by the processor 50.
- The learning data storage unit 33 stores the normal data of the production environment and the abnormal data of the production environment generated by the processor 50 as learning data for the machine learning model (AI model 2).
- The processor 50 includes a data classification unit 51, a data generation unit 52, and a data management unit 53 as processing functions according to the embodiment.
- The data classification unit 51, the data generation unit 52, and the data management unit 53 are realized by the processor 10 executing a program loaded into the memory 20.
- The program 34 includes instructions that cause the processor 50 to function as the data classification unit 51, the data generation unit 52, and the data management unit 53.
- The data classification unit 51 classifies the existing data stored in the existing data storage unit 31 into normal data and abnormal data, that is, into data collected while the existing environment was operating normally and data collected while it was in an abnormal state.
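The text does not say how the data classification unit 51 decides which samples are normal and which are abnormal. A minimal sketch, assuming each existing-data sample is a timestamped metric value and that the time windows during which a failure was present in the existing environment are known (for example, from failure-injection records); `Sample`, `classify_existing_data`, and `fault_windows` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One existing-data point: a timestamp (seconds) and a metric value (e.g. memory usage %)."""
    timestamp: float
    value: float


def classify_existing_data(samples, fault_windows):
    """Split samples into (normal, abnormal) lists.

    Hypothetical rule: a sample is abnormal if its timestamp falls inside any
    half-open [start, end) fault window; otherwise it is normal.
    """
    normal, abnormal = [], []
    for s in samples:
        in_fault = any(start <= s.timestamp < end for start, end in fault_windows)
        (abnormal if in_fault else normal).append(s)
    return normal, abnormal


if __name__ == "__main__":
    data = [Sample(0, 41.0), Sample(10, 40.5), Sample(20, 78.2), Sample(30, 42.1)]
    normal, abnormal = classify_existing_data(data, fault_windows=[(15, 25)])
    print([s.value for s in normal])    # [41.0, 40.5, 42.1]
    print([s.value for s in abnormal])  # [78.2]
```

A real deployment could instead label samples with an anomaly-detection rule or operator annotations; the window-based rule is only the simplest option when failure times are recorded.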
- The data generation unit 52 generates abnormal data of the production environment based on the normal data of the production environment, the normal data of the existing environment, and the abnormal data of the existing environment. For example, the data generation unit 52 generates the production-environment abnormal data by mapping the existing-environment abnormal data onto the production environment, based on the relationship between the existing-environment normal data and the production-environment normal data.
- Data mapping generally refers to the process of resolving differences between the data of different systems, for example different databases, and integrating that data. Data mapping as used in the embodiment is described later.
- The data management unit 53 stores the normal data of the production environment and the generated abnormal data of the production environment in the learning data storage unit 33 and manages them as learning data for the AI model 2.
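A minimal sketch of the learning set the data management unit 53 might assemble, assuming a label convention the text does not state (0 for normal, 1 for abnormal) and one metric value per sample; `build_learning_set` is a hypothetical name.

```python
def build_learning_set(prod_normal, prod_abnormal_generated):
    """Pair each metric value with a label: 0 for normal, 1 for (generated) abnormal.

    prod_normal: metric values measured in the production environment.
    prod_abnormal_generated: abnormal values produced by the data generation unit.
    """
    normal_rows = [(v, 0) for v in prod_normal]
    abnormal_rows = [(v, 1) for v in prod_abnormal_generated]
    return normal_rows + abnormal_rows


if __name__ == "__main__":
    learning_set = build_learning_set([50.0, 51.2], [88.4])
    print(learning_set)  # [(50.0, 0), (51.2, 0), (88.4, 1)]
```

Keeping measured normal data and generated abnormal data in one labeled list makes it easy to feed the set to the model repeatedly.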
- FIG. 4 is a functional block diagram illustrating an example of a machine learning system according to an embodiment.
- This system includes a mirror environment forming device 60, which reproduces the production environment, and the learning data generation device 3. The AI model 2 is trained by repeatedly feeding it learning data generated by either the mirror environment forming device 60 or the learning data generation device 3, and is then deployed to the production environment 70.
- The mirror environment forming device 60 includes a production environment reproduction unit 11, an event generation unit 12, a recovery unit 13, and a data management unit 14.
- The production environment reproduction unit 11 acquires information from the production environment and, based on that information, forms a mirror environment corresponding to the production environment of the target system 1 (FIG. 1).
- The event generation unit 12 artificially generates, in the mirror environment, a first operating state that simulates normal operation and a second operating state that corresponds to operation when a failure occurs.
- The recovery unit 13 performs processing to recover the mirror environment from the second operating state, and processing that causes the event generation unit 12 to generate the second operating state in the mirror environment again.
- The data management unit 14 has a function of selecting an operation mode. The embodiment assumes three modes: mode A, mode B, and mode C.
- Mode A applies when there is neither an existing environment nor existing data. In this mode, the mirror environment is used.
- Mode B applies when there is an existing environment but no existing data. In this mode, the existing environment is used as the production environment reproduction unit 11 of the mirror environment.
- Mode C applies when there is no existing environment but existing data is available. In this mode, the normal data of the production environment is used as the normal learning data, the abnormal data of the production environment generated by the learning data generation device 3 is used as the abnormal learning data, and the AI model 2 is trained on both.
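The three modes depend only on whether an existing environment and existing data are available, so the selection can be sketched as a small function. This is a sketch under stated assumptions: `Mode` and `select_mode` are hypothetical names, and because the description defines no mode for the case where both an existing environment and existing data are present, the sketch raises an error there rather than guessing.

```python
from enum import Enum


class Mode(Enum):
    A = "mirror environment"             # no existing environment, no existing data
    B = "existing environment + mirror"  # existing environment, no existing data
    C = "learning data generation"       # no existing environment, existing data available


def select_mode(has_existing_env: bool, has_existing_data: bool) -> Mode:
    """Map the two availability flags to an operating mode (hypothetical helper)."""
    if not has_existing_env and not has_existing_data:
        return Mode.A
    if has_existing_env and not has_existing_data:
        return Mode.B
    if not has_existing_env and has_existing_data:
        return Mode.C
    # The description does not define a mode for "both present"; fail loudly here.
    raise ValueError("no mode is defined for an existing environment with existing data")


if __name__ == "__main__":
    print(select_mode(False, True))  # Mode.C
```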
- In FIG. 4, the production environment 70, the AI model 2 to be trained, and the existing data are first identified.
- The data classification unit 51 of the learning data generation device 3 classifies the existing data into normal data and abnormal data.
- The data generation unit 52 compares the classified normal data with the normal data of the production environment and calculates the difference between them.
- The data generation unit 52 applies the calculated difference to the classified abnormal data to generate abnormal data for the production environment.
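The two steps above (compute a difference from normal data, then apply it to abnormal data) can be sketched as an additive baseline shift. This is an assumption: the text says only that a difference is calculated and substituted into the abnormal data. The sketch takes the difference between the mean production-environment normal value and the mean existing-environment normal value and adds it to every existing-environment abnormal sample; `map_abnormal_to_production` is a hypothetical name.

```python
from statistics import mean


def map_abnormal_to_production(existing_normal, production_normal, existing_abnormal):
    """Generate production-environment abnormal data by shifting existing abnormal data.

    Assumed mapping: difference = mean(production_normal) - mean(existing_normal),
    and each generated value = existing abnormal value + difference.
    """
    difference = mean(production_normal) - mean(existing_normal)
    return [x + difference for x in existing_abnormal]


if __name__ == "__main__":
    generated = map_abnormal_to_production(
        existing_normal=[40.0, 42.0],     # e.g. memory usage % in the similar environment
        production_normal=[50.0, 52.0],   # measured in production
        existing_abnormal=[75.0, 80.0],   # collected during failures in the similar environment
    )
    print(generated)  # [85.0, 90.0]
```

A ratio-based shift would be another plausible reading when environments differ mainly in capacity; the additive form follows the text's wording of a "difference".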
- The data management unit 53 sets initial values related to machine learning and acquires the data needed for the functions of the learning data generation device 3. It also acquires the normal data of the production environment and the generated abnormal data of the production environment and trains the AI model 2 with them. The trained AI model 2 is then transferred to the production environment 70, where it can be used to operate the network system.
- The learning data generation device 3 therefore makes it possible to generate learning data even when the production environment cannot be reproduced.
- FIG. 5 is a diagram for explaining the flow of data exchanged between the learning data generation device 3, the existing environment 80, the AI model 2, and the production environment 70.
- In FIG. 5, solid lines indicate data exchange within the learning data generation device 3.
- Dotted lines indicate data transfer between the learning data generation device 3 and external components.
- From existing information (a), the data management unit 53 sets, as initial values, information such as whether an existing environment and existing data are available. The data management unit 53 also saves the production environment normal data (d). In addition, the required number and types of data are set as initial values based on the selected AI model 2 (c).
- The data classification unit 51 classifies the existing data of the existing environment into existing-environment normal data and existing-environment abnormal data (b).
- The classified existing-environment normal data is saved for use in the calculation.
- The data generation unit 52 obtains the existing-environment normal data (e) and the production-environment normal data (f). It then calculates the difference (j) between the existing-environment normal data (h) and the production-environment normal data (i), and applies this difference to the existing-environment abnormal data (g) to generate abnormal data of the production environment.
- The generated abnormal data (k) is stored in the data management unit 53.
- The abnormal learning data and normal learning data are repeatedly fed to the AI model 2 (l) for training, and the training result (the trained model) is used in the production environment 70 (m).
- FIG. 6 is a diagram illustrating an example of a processing procedure of the machine learning system according to the embodiment.
- The data management unit 53 acquires from the AI model 2 information indicating the type and amount of learning data that the AI model 2 requires for training (step S10).
- The type of learning data is, for example, normal data, abnormal data, or both.
- As the amount of data, for example, the collection time or the number of collections may be used.
- The data management unit 53 refers to the acquired information and determines which of mode A, mode B, and mode C applies (step S11). In mode A, the mirror environment is used (step S12) and the AI model 2 is trained (step S24). In mode B, the existing environment is used (step S13), then the mirror environment is used (step S12), and the AI model 2 is trained (step S24).
- In mode C, the learning data generation device 3 uses the production environment normal data and the existing environment data (step S14). That is, the learning data generation device 3 stores the production environment normal data in the learning data storage unit 33 (FIG. 3) for use as learning data (steps S40 and S23).
- The learning data generation device 3 classifies the existing environment data into abnormal data and normal data (step S20) and provides the existing-environment abnormal data to the data generation unit 52 (step S21). It also provides the existing-environment normal data to the data generation unit 52 (step S30) and calculates the difference between the production-environment normal data and the existing-environment normal data (step S31). This difference is given to the data generation unit 52 (step S21).
- FIG. 8 is a diagram for explaining calculation of abnormality data in the production environment.
- Two examples (Example 1 and Example 2) were simulated and compared.
- For both examples, a similar environment and a production environment were built in a container orchestration environment with the same virtual layer.
- In Example 1, the number of network devices constituting the physical layer is the same in both environments, but the device specifications differ.
- In Example 2, the network devices constituting the physical layer are the same, but their number differs.
- A failure can be induced, for example, by activating a kill container, and a metric value such as the memory usage rate is then acquired.
- Although FIG. 8 shows an example in which each value is acquired only once, the average of values acquired multiple times may also be used. FIG. 8 also shows the obtained values and the error calculations.
- Abnormal data in the production environment can be calculated, for example, using equation (1).
- The error can be calculated, for example, using equation (2).
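The text refers to equations (1) and (2) without giving them. A minimal sketch of one plausible reading, consistent with the difference-based mapping described earlier: the generated value is the existing abnormal value shifted by the normal-data difference, and the error is the relative deviation from a production abnormal value that was actually measured. Both formulas, the function names, and the readings below are assumptions for illustration, not values from FIG. 8. Following the note on FIG. 8, each input may be a single reading or the mean of several.

```python
from statistics import mean


def generated_production_abnormal(existing_normal, existing_abnormal, production_normal):
    """Assumed form of equation (1): shift the existing abnormal value by the normal-data difference."""
    return existing_abnormal + (production_normal - existing_normal)


def relative_error_percent(generated, measured):
    """Assumed form of equation (2): |generated - measured| / measured, in percent."""
    return abs(generated - measured) / measured * 100.0


if __name__ == "__main__":
    # Hypothetical memory-usage readings (%); each input is the mean of repeated acquisitions.
    e_n = mean([40.0, 42.0])   # similar environment, normal
    e_a = mean([71.0, 69.0])   # similar environment, during failure
    p_n = mean([50.0, 50.0])   # production, normal
    p_a = 82.0                 # production, during failure (used only to score the estimate)

    g = generated_production_abnormal(e_n, e_a, p_n)
    print(g)                                          # 79.0
    print(round(relative_error_percent(g, p_a), 2))   # 3.66
```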
- In summary, a production environment, an AI model to be trained, and existing data from a similar environment are prepared.
- The existing data of the similar environment is classified into normal data and abnormal data.
- The classified normal data is compared with the normal data of the production environment, and the difference is calculated.
- The calculated difference is applied to the classified abnormal data, and abnormal data for the production environment is generated by data mapping.
- The abnormal data of the production environment obtained in this way is repeatedly fed to the AI model, together with the normal data of the production environment, for training. Finally, the trained AI model is transferred to the production environment and put into use.
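The text allows any model for the AI model 2 (DNN, CNN, RNN, and so on). As a stand-in only, the sketch below trains a one-feature logistic-regression detector on production normal data plus generated abnormal data, making repeated full passes (epochs) over the set. It substitutes a much simpler model for the networks named in the text; `train_logistic` and all values are hypothetical.

```python
import math


def train_logistic(values, labels, epochs=200, lr=0.5):
    """Fit a 1-D logistic-regression detector by full-batch gradient descent.

    Values are standardized first; the returned predict() applies the same scaling.
    Labels: 0 = normal, 1 = abnormal.
    """
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values)) or 1.0
    xs = [(v - mu) / sigma for v in values]
    w, b = 0.0, 0.0
    for _ in range(epochs):  # the learning set is fed repeatedly, one pass per epoch
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability of "abnormal"
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)

    def predict(value):
        """Return 1 (abnormal) or 0 (normal) for a raw metric value."""
        z = w * (value - mu) / sigma + b  # z > 0 is equivalent to probability > 0.5
        return 1 if z > 0 else 0

    return predict


if __name__ == "__main__":
    normal = [50.0, 51.0, 52.0]               # production-environment normal data
    generated_abnormal = [85.0, 88.0, 90.0]   # output of the difference-based mapping
    predict = train_logistic(normal + generated_abnormal, [0] * 3 + [1] * 3)
    print([predict(v) for v in (45.0, 95.0)])  # [0, 1]
```

Standardizing the single feature keeps the gradient steps well scaled; a production model would use many metrics and a held-out check against real failures.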
- A mirror environment requires the production environment itself to be reproduced.
- In contrast, the embodiment uses existing data already collected from an existing environment (a similar environment such as a staging environment or a verification environment), such as metric values like CPU usage rate and memory usage rate.
- This makes it possible to obtain learning data without reproducing the production environment or artificially generating failures. It can therefore also handle cases where little or no data is available.
- Data mapping based on normal data can unify data across different environments.
- Any type of AI model may be applied, such as a DNN (Deep Neural Network), a CNN (Convolutional Neural Network), or an RNN (Recurrent Neural Network).
- The present invention is not limited to the above embodiments as they are; at the implementation stage, it can be embodied by modifying the constituent elements without departing from its spirit.
- Various inventions can be formed by appropriately combining the components disclosed in the above embodiments. For example, some components may be omitted from all the components shown in an embodiment. Furthermore, components from different embodiments may be combined as appropriate.
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
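| PCT/JP2022/023750 WO2023242943A1 (ja) | 2022-06-14 | 2022-06-14 | Data generation device, data generation method, program, and machine learning system |
| JP2024527945A JP7747205B2 (ja) | 2022-06-14 | 2022-06-14 | Data generation device, data generation method, program, and machine learning system |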
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/023750 WO2023242943A1 (ja) | 2022-06-14 | 2022-06-14 | Data generation device, data generation method, program, and machine learning system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023242943A1 (ja) | 2023-12-21 |
Family ID: 89192654
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/023750 WO2023242943A1 (ja), ceased | 2022-06-14 | 2022-06-14 | Data generation device, data generation method, program, and machine learning system |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP7747205B2 |
| WO (1) | WO2023242943A1 |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2020136888A (ja) * | 2019-02-19 | 2020-08-31 | 日本電信電話株式会社 | Detection device and detection method |
| JP7015405B1 (ja) * | 2021-04-27 | 2022-02-02 | 東京エレクトロンデバイス株式会社 | Learning model generation method, program, information processing device, and learning data generation method |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6910053B1 (en) * | 1999-06-18 | 2005-06-21 | Sap Aktiengesellschaft | Method for data maintenance in a network of partially replicated database systems |
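| JP6843450B1 (ja) * | 2019-11-08 | 2021-03-17 | リーダー電子株式会社 | Training data generation method, method for generating a trained model, device, recording medium, program, and information processing device |
| JP7428819B2 (ja) * | 2020-09-25 | 2024-02-06 | ファナック株式会社 | Model creation device for visual inspection, and visual inspection device |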
2022
- 2022-06-14 JP JP2024527945A (patent JP7747205B2): Active
- 2022-06-14 WO PCT/JP2022/023750 (patent WO2023242943A1): Ceased
Non-Patent Citations (1)
| Title |
|---|
| LEE LI; KAZUYO AKASHI; HARUHISA NOZUE; KENICHI TAYAMA: "B-14-13 A Study of Building a Mirror Environment for AI Training Data Generation", Proceedings of the 2021 IEICE Communications Society Conference (2), September 14-17, 2021, IEICE, JP, 31 August 2021 (2021-08-31) - 17 September 2021 (2021-09-17), p. 187, XP009551233 * |
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2023242943A1 | 2023-12-21 |
| JP7747205B2 (ja) | 2025-10-01 |
Similar Documents
| Publication | Title |
|---|---|
| JP7809028B2 (ja) | Computer system, computer-implemented method, program, and computer-implemented system (learning causal relationships) |
| US20220083415A1 (en) | Storage Network with Enhanced Data Access Performance |
| WO2019116354A1 (en) | Training of artificial neural networks using safe mutations based on output gradients |
| JP2023534696A (ja) | Anomaly detection in network topology |
| WO2022074796A1 (ja) | Evaluation method, evaluation device, and evaluation program |
| US20050204333A1 (en) | Integrated system-of-systems modeling environment and related methods |
| WO2024255436A1 (zh) | Method, apparatus, device, and medium for modeling nodes in a multi-element heterogeneous computing system |
| KR102934196B1 (ko) | Apparatus and method for generating a link prediction model for a knowledge graph based on zero-shot learning |
| WO2018143019A1 (ja) | Information processing device, information processing method, and program recording medium |
| CN114186609A (zh) | Model training method and apparatus |
| Fredericks | Automatically hardening a self-adaptive system against uncertainty |
| JP6649294B2 (ja) | State determination device, state determination method, and program |
| Romankevich et al. | Fault-tolerant multiprocessor systems reliability estimation using statistical experiments with GL-models |
| CN119032360A (zh) | Machine learning-based monitoring focus engine |
| JP7747205B2 (ja) | Data generation device, data generation method, program, and machine learning system |
| CN114528131A (zh) | Reliability analysis method and fault-tolerant device for I/O interfaces of intelligent mobile systems |
| JP5244750B2 (ja) | Test case generation device and method thereof |
| CN112953781A (zh) | Particle swarm-based method and device for recovering virtual service failures under network slicing |
| JP7820891B2 (ja) | New features for black-box machine learning models |
| US20230185791A1 (en) | Prioritized data cleaning |
| Li et al. | An Efficient Universal Generating Function‐Based Analyzing Approach for Multistate System with Imperfect Coverage Failure |
| US20250317361A1 (en) | Methods And Systems For Discrete Event Network Simulation |
| US12602302B2 (en) | Machine learning model training with hardware error simulation |
| Vitui | Automation And Intelligence In IT Operation Management: Machine Learning for Capacity Planning and Load Testing Optimization |
| JP7643634B1 (ja) | Information processing device, information processing method, and program |
Legal Events
| Code | Title | Description |
|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22946763; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | WIPO information: entry into national phase | Ref document number: 2024527945; Country of ref document: JP |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 22946763; Country of ref document: EP; Kind code of ref document: A1 |