WO2022079919A1 - Detection program, detection method, and detection device - Google Patents

Detection program, detection method, and detection device

Info

Publication number
WO2022079919A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
machine learning
pair
learning model
distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2020/039191
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
寛彰 金月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to PCT/JP2020/039191 (WO2022079919A1, ja)
Priority to JP2022556825A (JP7424507B2, ja)
Publication of WO2022079919A1 (ja)
Priority to US18/187,740 (US12591808B2, en)
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • The present invention relates to a detection program and the like for detecting accuracy deterioration of a machine learning model in operation.
  • Since a machine learning model makes determinations and classifications according to the training data learned at the time of system development, the accuracy of the machine learning model deteriorates when the tendency of the input data changes during system operation.
  • FIG. 21 is a diagram for explaining the deterioration of the machine learning model due to the change in the tendency of the input data.
  • The machine learning model described here classifies the input data into one of the first class, the second class, and the third class, and is assumed to be trained in advance on the training data before system operation.
  • Distribution 1A shows the distribution of the input data at the initial stage of system operation.
  • Distribution 1B shows the distribution of the input data at the time when time T1 has elapsed from the initial stage of system operation.
  • Distribution 1C shows the distribution of the input data at the time when time T2 has elapsed from the initial stage of system operation. It is assumed that the tendency (feature amount, etc.) of the input data changes with the passage of time. For example, if the input data is an image, the tendency of the input data changes depending on the season and the time zone even if the image is taken of the same subject.
  • the determination boundary 3 indicates the boundary between the model application areas 3a to 3c.
  • the model application area 3a is an area in which training data belonging to the first class is distributed.
  • the model application area 3b is an area in which training data belonging to the second class is distributed.
  • the model application area 3c is an area in which training data belonging to the third class is distributed.
  • The star marks are input data belonging to the first class, and it is correct that they are classified into the model application area 3a when they are input to the machine learning model.
  • The triangle marks are input data belonging to the second class, and it is correct that they are classified into the model application area 3b when they are input to the machine learning model.
  • The circle marks are input data belonging to the third class, and it is correct that they are classified into the model application area 3c when they are input to the machine learning model.
  • In distribution 1A, all input data is distributed in the correct model application area. That is, the input data of the star marks is located in the model application area 3a, the input data of the triangle marks is located in the model application area 3b, and the input data of the circle marks is located in the model application area 3c.
  • T² statistic (Hotelling's T-square)
  • The data group of the input data and the normal data (training data) is subjected to principal component analysis, and the T² statistic of the input data is calculated.
  • The T² statistic is the sum of the squares of the distances from the origin of each standardized principal component to the data.
  • The accuracy deterioration of the machine learning model is detected based on the change in the distribution of the T² statistic of the input data group.
  • The T² statistic of the input data group corresponds to the percentage of outlier data.
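  • The T² calculation described above can be sketched as follows. This is a minimal illustration, not the conventional technique's actual implementation: the function names `fit_pca` and `t2_statistic`, the synthetic data, and the small `eps` guard are all assumptions made for the example.

```python
import numpy as np

def fit_pca(train):
    """Fit PCA on the normal (training) data.

    Returns the mean, the principal axes (columns), and the variance
    along each axis, sorted from largest to smallest variance.
    """
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order], eigvals[order]

def t2_statistic(x, mean, components, variances, eps=1e-12):
    """Sum of squared, standardized principal-component scores per row of x."""
    scores = (x - mean) @ components
    return np.sum(scores ** 2 / (variances + eps), axis=1)

# Hypothetical data: 500 three-dimensional training points.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 3))
mean, comps, var = fit_pca(train)

inlier = train.mean(axis=0, keepdims=True)   # a point at the training mean
outlier = np.full((1, 3), 8.0)               # a point far from the training data
t2_in = t2_statistic(inlier, mean, comps, var)[0]
t2_out = t2_statistic(outlier, mean, comps, var)[0]
```

  • A point at the training mean gets a T² of zero, while a point far from the training data gets a large value, so a shift in the distribution of T² over an input data group reflects a growing share of outliers.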
  • the above-mentioned conventional technique has a problem that it cannot detect a change in the distribution of data that may cause a deterioration in the accuracy of the machine learning model.
  • the detection program causes the computer to perform the following processing.
  • The computer inputs a second plurality of data into a second machine learning model, which is generated by machine learning based on a first plurality of data and a first result output from a first machine learning model in response to the input of the first plurality of data.
  • The computer acquires the second result output from the second machine learning model in response to the input of the second plurality of data.
  • The computer detects a difference between the distribution of the first plurality of data and the distribution of the second plurality of data, based on a comparison between a threshold value and a value calculated from the second result and the gradient of the loss function of the second machine learning model.
  • FIG. 1 is a diagram for explaining a reference technique.
  • FIG. 2 is a diagram showing an example of accuracy deterioration prediction.
  • FIG. 3 is a diagram showing an example of concept drift.
  • FIG. 4 is a diagram for explaining the basic mechanism of the inspector model.
  • FIG. 5 is a diagram for explaining a problem of the reference technique.
  • FIG. 6 is a diagram for explaining the problem of statistical testing.
  • FIG. 7 is a diagram for explaining the processing of the detection device according to the present embodiment.
  • FIG. 8 is a diagram for explaining knowledge distillation.
  • FIG. 9 is a functional block diagram showing the configuration of the detection device according to the present embodiment.
  • FIG. 10 is a diagram showing an example of the data structure of the training data set.
  • FIG. 11 is a diagram for explaining an example of an operating model.
  • FIG. 12 is a diagram showing an example of the data structure of the pseudo sample table.
  • FIG. 13 is a diagram showing an example of the data structure of the distillation data table.
  • FIG. 14 is a diagram showing an example of the data structure of the operation data set table.
  • FIG. 15 is a diagram for explaining a determination boundary of a feature space according to the present embodiment.
  • FIG. 16 is a diagram showing the distribution of score differences according to hyperparameters.
  • FIG. 17 is a flowchart showing a processing procedure of the detection device according to the present embodiment.
  • FIG. 18 is a diagram (1) showing the nature of the determination boundary of each machine learning model.
  • FIG. 19 is a diagram (2) showing the nature of the determination boundary of each machine learning model.
  • FIG. 20 is a diagram showing an example of a hardware configuration of a computer that realizes the same functions as the detection device according to the present embodiment.
  • FIG. 21 is a diagram for explaining the deterioration of the machine learning model due to the change in the tendency of the input data.
  • the reference technique for detecting the deterioration of the accuracy of the machine learning model will be described.
  • the accuracy deterioration of the machine learning model is detected by using a plurality of monitors that narrow the model application area under different conditions.
  • the monitor will be referred to as an "inspector model”.
  • FIG. 1 is a diagram for explaining a reference technique.
  • the machine learning model 10 is a machine learning model generated by executing machine learning using training data.
  • In the reference technique, the deterioration of the accuracy of the machine learning model 10 is detected.
  • the training data is used when training the parameters of the machine learning model 10, and the correct answer label is associated with the training data.
  • the inspector models 11A, 11B, and 11C have different decision boundaries because the model application area is narrowed under different conditions.
  • the inspector models 11A to 11C are trained by modifying the training data in some way and using the modified training data.
  • the output results may differ even if the same input data is input.
  • the accuracy deterioration of the machine learning model 10 is detected based on the difference in the output results of the inspector models 11A to 11C.
  • inspector models 11A to 11C are shown, but accuracy deterioration may be detected by using another inspector model.
  • DNN (Deep Neural Network)
  • FIG. 2 is a diagram showing an example of accuracy deterioration prediction.
  • the vertical axis of the graph in FIG. 2 is the axis corresponding to the accuracy, and the horizontal axis is the axis corresponding to the time.
  • the accuracy decreases with the passage of time, and at time t1, the accuracy falls below the permissible limit of accuracy.
  • For example, in the reference technique, at time t1, deterioration in accuracy (beyond the permissible limit) is detected.
  • FIG. 3 is a diagram showing an example of concept drift.
  • The vertical axis of FIG. 3 corresponds to the first feature amount, and the horizontal axis corresponds to the second feature amount.
  • The distribution A1 of the first data may change to the distribution A2. Since the original machine learning model 10 was trained with the distribution of the first data being the distribution A1, the accuracy decreases with the passage of time, and retraining is required.
  • Data that causes concept drift includes spam emails, electricity demand forecasts, stock price forecasts, poker hand strategic procedures, images, etc.
  • an image has different features depending on the season and time zone, even if the subject is the same.
  • a plurality of inspector models 11A to 11C are trained in order to detect the deterioration of the accuracy of the machine learning model 10. Then, in order to train the plurality of inspector models 11A to 11C, it is essential that the machine learning model 10 and the training data used at the time of training the machine learning model 10 can be modified in some way.
  • the machine learning model 10 is required to be a specific machine learning model, such as the machine learning model 10 being a model for calculating the degree of certainty.
  • FIG. 4 is a diagram for explaining the basic mechanism of the inspector model.
  • the inspector model is created by performing machine learning on the decision boundary 5 that is the boundary between the distribution A1 of the training data belonging to the first class and the distribution B of the training data belonging to the second class.
  • The danger area 5a of the decision boundary 5 is monitored, and when the number of operational data contained in the danger area 5a increases (or decreases), the deterioration of accuracy is detected.
  • FIG. 5 is a diagram for explaining a problem of the reference technique. For example, if a method of monitoring the danger area 5a using an inspector model or the like is used, case 1 or case 2 is obtained.
  • The answer of the data itself may have changed, but even if the inspector model explained in the reference technique is used, there is no change in the results output from each inspector model. For example, the inspector model classifies the data into the first class, no matter how far the data is from the decision boundary, as long as the data is contained in the domain of the first class.
  • the reference technique implicitly assumes that the answer to the data has not changed.
  • Statistical tests include Student's t-test, the Kolmogorov-Smirnov test, a method using the L2 distance, a method using the cosine distance, a method using the KL distance, a method using the Wasserstein distance, and the like.
  • FIG. 6 is a diagram for explaining the problem of statistical testing.
  • the data group 6a and 6a-1 located in the feature space including the x-axis, the y-axis, and the z-axis will be described.
  • Since the determination boundary 7 is located on the xy plane, a change in the z-axis direction is irrelevant to the classification result.
  • In the statistical test, when the data group 6b-1 moves in the z-axis direction and changes to the data group 6b-2 with the passage of time, such a change is detected and false detection occurs.
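  • The false detection described above can be reproduced with a small sketch. It assumes a hand-written two-sample Kolmogorov-Smirnov statistic and a hypothetical classifier `classify` whose decision depends only on x and y; the data, the shift of 3.0, and all names are illustrative assumptions, not values from the source.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a = np.sort(a)
    b = np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def classify(points):
    """Hypothetical model whose decision uses x and y only (z is ignored)."""
    return (points[:, 0] + points[:, 1] > 0).astype(int)

rng = np.random.default_rng(0)
before = rng.normal(size=(1000, 3))   # columns are x, y, z
after = before.copy()
after[:, 2] += 3.0                    # the data moves along the z axis only

ks_per_axis = [ks_statistic(before[:, k], after[:, k]) for k in range(3)]
labels_changed = bool(np.any(classify(before) != classify(after)))
```

  • The test reports a large change on the z axis even though no classification result has changed, which is the kind of false detection the passage attributes to statistical tests.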
  • FIG. 7 is a diagram for explaining the processing of the detection device according to the present embodiment.
  • FIG. 7 describes the processing of the detection device in the learning phase and the operation phase.
  • the detection device takes the training data set 141 as an input and executes machine learning of the operation model 50.
  • the training data set 141 includes a plurality of training data, and the plurality of training data are given correct answer labels.
  • The detection device executes machine learning of the inspector model 55 using, as inputs, the training data set 141 and the output obtained when the training data set 141 is input to the trained operation model 50.
  • The detection device executes machine learning of the inspector model 55 by knowledge distillation (KD: Knowledge Distillation).
  • FIG. 8 is a diagram for explaining knowledge distillation.
  • a Student model 7B that mimics the output value of the Teacher model 7A is constructed.
  • the Teacher model 7A corresponds to the operating model 50 of FIG.
  • the Student model 7B corresponds to the inspector model 55 of FIG.
  • NN (Neural Network)
  • the detection device trains the parameters of the Teacher model 7A (executes machine learning by the error back propagation method) so that the output result of the Teacher model 7A when the training data 6 is input approaches the correct answer label "dog". Further, the detection device trains the parameters of the Student model 7B so that the output result of the Student model 7B when the training data 6 is input approaches the output result of the Teacher model 7A when the training data 6 is input.
  • the output of the Teacher model 7A is called "Soft Target”.
  • the correct label for the training data is called "Hard Target”.
  • the method of training the Teacher model 7A using the training data 6 and the hard target and training the Student model 7B using the training data 6 and the soft target is called knowledge distillation.
  • the detector trains the Teacher model 7A and the Student model 7B in the same manner for the other training data.
  • the detection device performs machine learning of the inspector model 55 using the training data set 141 and the soft target output from the operating model 50.
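  • The teacher/student procedure described above can be sketched as follows. To stay self-contained, the sketch swaps the neural networks for linear softmax classifiers trained by plain gradient descent; the function `train_softmax_regression`, the synthetic two-class data, and the step count and learning rate are all assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_regression(x, targets, steps=1000, lr=0.1):
    """Fit a linear softmax classifier to target distributions (cross-entropy, gradient descent)."""
    n, d = x.shape
    w = np.zeros((d, targets.shape[1]))
    b = np.zeros(targets.shape[1])
    for _ in range(steps):
        grad = (softmax(x @ w + b) - targets) / n   # gradient of the loss w.r.t. the logits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b

# Hypothetical training data: two well-separated classes in two dimensions.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-2, 1, size=(100, 2)), rng.normal(2, 1, size=(100, 2))])
labels = np.repeat([0, 1], 100)
hard_targets = np.eye(2)[labels]                     # correct labels ("Hard Target")

# Teacher (operating model): trained on the hard targets.
teacher_w, teacher_b = train_softmax_regression(x, hard_targets)
soft_targets = softmax(x @ teacher_w + teacher_b)    # teacher output ("Soft Target")

# Student (inspector model): trained to reproduce the teacher's output.
student_w, student_b = train_softmax_regression(x, soft_targets)
student_out = softmax(x @ student_w + student_b)

teacher_accuracy = float(np.mean(soft_targets.argmax(axis=1) == labels))
agreement = float(np.mean(student_out.argmax(axis=1) == soft_targets.argmax(axis=1)))
```

  • The student never sees the correct labels; it only learns from the teacher's outputs, which is why its decision boundary tracks the teacher's.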
  • the detection device inputs a plurality of operational data included in the operational data set C0 into the inspector model 55, and acquires the result output from the inspector model 55.
  • The detection device compares, with a threshold value, the value calculated based on the result output from the inspector model 55 and the gradient of the loss function of the inspector model 55, and detects concept drift.
  • The value calculated based on the result output from the inspector model 55 and the gradient of the loss function of the inspector model 55 indicates the distance from the determination boundary, which is the boundary of the model application area.
  • In the following, the value calculated based on the result output from the inspector model 55 and the gradient of the loss function of the inspector model 55 is referred to as an "evaluation value".
  • When the evaluation value is equal to or higher than the threshold value, it means that the operational data set input to the inspector model 55 is far from the decision boundary and concept drift has occurred.
  • The detection device detects concept drift when the evaluation value becomes equal to or higher than the threshold value.
  • the operation data set C0 is input to the operation model 50, and the class to which the data of the operation data set C0 belongs is predicted.
  • When the detection device detects drift, it re-executes machine learning of the operation model 50 using a new training data set.
  • The detection device uses knowledge distillation to perform machine learning of the inspector model 55, which is a monitor of the operation model 50.
  • The detection device inputs the operation data set C0 into the inspector model 55, compares with the threshold value the evaluation value calculated based on the result output from the inspector model 55 and the gradient of the loss function of the inspector model 55, and detects concept drift. As a result, even if the data distribution changes in the direction away from the determination boundary with the passage of time, the change in the data distribution can be detected, and the accuracy deterioration of the operating model 50 can be detected.
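  • The threshold comparison can be illustrated with a deliberately simplified sketch. This passage does not give the formula for the evaluation value, so the sketch substitutes a stand-in: for a linear model with score w·x + b, the distance of a point from the decision boundary is |w·x + b| / ‖w‖, where w is the gradient of the score with respect to the input. The linear model, the averaging over the data set, the threshold of 3.0, and all names are assumptions, not the method of the embodiment.

```python
import numpy as np

def boundary_distance(x, w, b):
    """Distance of each row of x from the hyperplane w.x + b = 0."""
    return np.abs(x @ w + b) / np.linalg.norm(w)

def detect_drift(x, w, b, threshold):
    """Stand-in evaluation value: mean boundary distance of a data set.

    Drift is reported when the value is equal to or higher than the threshold.
    """
    evaluation_value = float(np.mean(boundary_distance(x, w, b)))
    return evaluation_value, evaluation_value >= threshold

# Hypothetical linear inspector and operational data sets.
w = np.array([1.0, 1.0])
b = 0.0
rng = np.random.default_rng(0)
initial = rng.normal(loc=1.0, scale=0.5, size=(200, 2))
drifted = initial + 3.0   # the same data after moving away from the boundary

value_initial, drift_initial = detect_drift(initial, w, b, threshold=3.0)
value_drifted, drift_drifted = detect_drift(drifted, w, b, threshold=3.0)
```

  • Both data sets stay on the same side of the boundary, so the predicted class does not change, yet the stand-in evaluation value rises above the threshold for the drifted set.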
  • FIG. 9 is a functional block diagram showing the configuration of the detection device according to the present embodiment.
  • the detection device 100 includes a communication unit 110, an input unit 120, an output unit 130, a storage unit 140, and a control unit 150.
  • the communication unit 110 executes data communication with an external device (not shown) via a network.
  • the communication unit 110 receives the training data set 141 and the like, which will be described later, from an external device.
  • the input unit 120 is a device or interface for inputting data.
  • the input unit 120 is a mouse, a keyboard, and the like.
  • the output unit 130 is a display or the like that displays a screen.
  • the storage unit 140 is an example of a storage device that stores data, a program executed by the control unit 150, and the like, and is, for example, a hard disk or a memory.
  • the storage unit 140 has a training data set 141, an operation model data 142, a pseudo sample table 143, a distillation data table 144, an inspector model data 145, and an operation data set table 146.
  • the training data set 141 includes a plurality of training data.
  • FIG. 10 is a diagram showing an example of the data structure of the training data set. As shown in FIG. 10, this training data set associates a record number, training data, and a correct answer label.
  • the record number is a number that identifies the pair of the training data and the correct answer label.
  • the training data corresponds to mail spam data, electricity demand forecast, stock price forecast, poker hand data, image data, etc., and includes features of multiple dimensions.
  • the correct label is information that uniquely identifies the first class or the second class.
  • the operating model data 142 is the data of the operating model 50 (machine learning model).
  • the operating model 50 of the present embodiment classifies the input data into a plurality of classes by a predetermined classification algorithm.
  • the operating model 50 will be described as NN.
  • FIG. 11 is a diagram for explaining an example of an operation model.
  • the operating model 50 has a neural network structure, and has an input layer 50a, a hidden layer 50b, and an output layer 50c.
  • the input layer 50a, the hidden layer 50b, and the output layer 50c have a structure in which a plurality of nodes are connected by edges.
  • the hidden layer 50b and the output layer 50c have a function called an activation function and a bias value, and weights are set on the edges.
  • the bias value and weight are referred to as "parameters”.
  • the pseudo sample table 143 holds a plurality of pseudo samples generated based on the training data set 141.
  • FIG. 12 is a diagram showing an example of the data structure of the pseudo sample table. As shown in FIG. 12, this pseudo sample table 143 associates a sample number with a pseudo sample. The sample number is information for identifying a pseudo sample. The pseudo sample is data obtained by scaling the features of the training data.
  • the distillation data table 144 stores an output result (soft target) when each pseudo sample of the pseudo sample table 143 is input to the operating model 50.
  • FIG. 13 is a diagram showing an example of the data structure of the distillation data table. As shown in FIG. 13, the distillation data table 144 associates a sample number with a pseudo sample and a soft target. The description of the sample number and the pseudo sample is the same as the description of the sample number and the pseudo sample given in FIG.
  • the soft target is an output result when a pseudo sample is input to the operating model 50. For example, the soft target is one of a plurality of classes.
  • the inspector model data 145 is the data of the inspector model 55.
  • the inspector model 55 has a neural network structure, and has an input layer, a hidden layer, and an output layer in the same manner as the operating model 50 described with reference to FIG. Parameters are set in the inspector model 55.
  • the parameters of the inspector model 55 are trained by knowledge distillation.
  • the operational data set table 146 has an operational data set that is added over time.
  • FIG. 14 is a diagram showing an example of the data structure of the operation data set table.
  • the operation data set table 146 has data identification information and an operation data set.
  • the data identification information is information that identifies an operational data set.
  • the operational data set contains a plurality of operational data. Operational data corresponds to email spam data, electricity demand forecasts, stock price forecasts, poker hand data, image data, and the like.
  • the control unit 150 is a processing unit that controls the entire detection device 100, and has a generation unit 151, a calculation unit 152, an acquisition unit 153, and a detection unit 154.
  • the control unit 150 is, for example, a processor or the like.
  • the generation unit 151 executes a process of generating the operation model data 142, a process of generating the pseudo sample table 143, a process of generating the distillation data table 144, and a process of generating the inspector model data 145.
  • The generation unit 151 takes the training data set 141 as an input and executes machine learning of the operation model 50. For example, the generation unit 151 trains the parameters of the operation model 50 so that, when the training data of the training data set is input to the input layer of the operation model 50, the output result of the output layer approaches the correct answer label of the input training data. For example, the generation unit 151 executes machine learning by the error back propagation method. The generation unit 151 registers the data (operation model data 142) of the trained operation model 50 in the storage unit 140.
  • FIG. 15 is a diagram for explaining the determination boundary of the feature space according to the present embodiment.
  • the feature space 30 is a visualization of each training data of the training data set 141.
  • The horizontal axis of the feature space 30 corresponds to the axis of the first feature amount, and the vertical axis corresponds to the axis of the second feature amount.
  • Each training data is shown on two axes, but the training data is assumed to be multidimensional data.
  • The correct answer label corresponding to the training data marked with a circle is referred to as "first class", and the correct answer label corresponding to the training data marked with a triangle is referred to as "second class".
  • the feature space 30 is classified into a model application area 31A and a model application area 31B by the determination boundary 31.
  • When the operating model 50 is an NN, the probability of the first class and the probability of the second class are output. If the probability of the first class is higher than that of the second class, the data is classified into the first class. If the probability of the second class is higher than that of the first class, the data is classified into the second class.
  • FIG. 15 describes a case where the correct answer label of the training data is the "first class” or the "second class", but the correct answer label of another class may be given.
  • n model application areas are set in the feature space 30.
  • When the operation model 50 is an NN, the probability of each class is output when the operation data is input to the operation model 50.
  • the generation unit 151 executes data conversion for each training data included in the training data set 141.
  • The generation unit 151 executes data conversion (Min-Max Scaling) so that the value of the feature amount in each dimension of the training data is 0 or more and less than 1.
  • The generation unit 151 randomly selects, from the training data after data conversion, training data in which the value of the feature amount in each dimension is -m or more and less than 1 + m. Here, "m" is a margin, and an arbitrary real number is set in advance.
  • the training data after data conversion randomly selected by the generation unit 151 by the above processing is referred to as a "pseudo sample".
  • the range of the feature quantity values of the pseudo sample is defined by the equation (1). Let n be the number of dimensions of the feature quantity.
  • the generation unit 151 associates the sample number, the pseudo sample, and the correct answer label, and registers them in the pseudo sample table 143.
  • the correct label of the pseudo sample is the correct label of the training data before data conversion corresponding to the pseudo sample.
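  • One literal reading of the pseudo sample steps above can be sketched as follows. The helper names `min_max_scale` and `select_pseudo_samples`, the synthetic data, and the margin value are assumptions. Note that standard Min-Max scaling maps the largest value to exactly 1, whereas the text states "less than 1"; the sketch keeps the standard form.

```python
import numpy as np

def min_max_scale(data):
    """Scale each feature dimension to the range [0, 1] (constant columns map to 0)."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return (data - lo) / span

def select_pseudo_samples(scaled, m, count, rng):
    """Randomly pick rows whose features all lie in [-m, 1 + m).

    Returns the chosen row indices (so labels can be carried over) and the rows.
    """
    in_range = np.all((scaled >= -m) & (scaled < 1.0 + m), axis=1)
    candidates = np.flatnonzero(in_range)
    chosen = rng.choice(candidates, size=min(count, candidates.size), replace=False)
    return chosen, scaled[chosen]

# Hypothetical training data: 50 records with 4 feature dimensions and binary labels.
rng = np.random.default_rng(0)
train = rng.uniform(-5, 20, size=(50, 4))
labels = rng.integers(0, 2, size=50)

scaled = min_max_scale(train)
idx, pseudo = select_pseudo_samples(scaled, m=0.1, count=10, rng=rng)
pseudo_labels = labels[idx]   # each pseudo sample keeps the label of its source record
```

  • Returning the row indices alongside the samples is a design choice made here so that each pseudo sample can be registered with the correct label of the training data it came from.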
  • the generation unit 151 inputs the pseudo sample of the pseudo sample table 143 into the operation model 50, and acquires the output result (soft target) of the operation model 50.
  • the generation unit 151 registers the sample number, the pseudo sample, and the soft target in the distillation data table 144.
  • the generation unit 151 acquires a soft target and registers it in the distillation data table 144 by repeatedly executing the above processing for each pseudo sample in the pseudo sample table 143.
  • the pseudo data set is defined by the equation (2).
  • the symbol shown at position a1 in the equation (2) is referred to as "D hat”.
  • the symbol shown in a2 of the equation (2) is expressed as "x hat”.
  • the D hat represents a pseudo data set.
  • the x-hat indicates a pseudo sample.
  • f (x hat) is a soft target output from the operating model 50.
  • The symbol shown at position a3 in the equation (2) is referred to as "script X".
  • The script X indicates an input space.
  • The generation unit 151 acquires the distillation data table 144 and trains the parameters of the inspector model 55 based on the distillation data table 144. For example, the generation unit 151 trains the parameters of the inspector model 55 so that when the pseudo sample of the distillation data table 144 is input to the input layer of the inspector model 55, the output result of the output layer approaches the soft target. For example, the generation unit 151 executes machine learning by the error back propagation method. The generation unit 151 registers the data (inspector model data 145) of the trained inspector model 55 in the storage unit 140.
  • The process of training the inspector model 55 by the generation unit 151 corresponds to finding the parameters θ2* that minimize the loss function shown in the equation (3).
  • f(X; θ1) corresponds to the output of the operating model 50, and X corresponds to the D hat.
  • θ1 indicates the parameters of the operating model 50, which are already trained.
  • g(X; θ2) corresponds to the output of the inspector model 55, and X corresponds to the D hat.
  • θ2 indicates the parameters of the inspector model 55, which are the parameters to be trained.
  • When the generation unit 151 is notified by the detection unit 154, described later, that drift has been detected, the generation unit 151 re-executes the machine learning of the operation model 50 and the inspector model 55. For example, the generation unit 151 acquires the latest training data set 141 from an external device and retrains the operating model 50 and the inspector model 55 using the latest training data set 141.
  • the calculation unit 152 calculates hyperparameters for scaling the output of the inspector model 55 using the temperatured softmax.
  • the output gi of the inspector model 55 using the temperatured softmax when the data i is input is defined by the equation (4).
  • zi is the output of the inspector model 55 when the data i is input, and indicates the output of the inspector model 55 using a normal softmax.
  • T indicates a hyperparameter.
  • the output of the inspector model 55 using the temperatured softmax is referred to as a score.
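  • The temperatured softmax can be sketched as follows. Equation (4) is not reproduced in this text, so the sketch assumes the standard form g_i = exp(z_i / T) / Σ_j exp(z_j / T) and treats z as the pre-softmax outputs (logits) of the inspector model; the function name and the example values are illustrative.

```python
import numpy as np

def softmax_with_temperature(z, T=1.0):
    """Softmax of z / T; a larger T gives a flatter (less confident) distribution."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([4.0, 1.0])                        # hypothetical inspector output
sharp = softmax_with_temperature(logits, T=1.0)      # ordinary softmax
soft = softmax_with_temperature(logits, T=10.0)      # scaled by the hyperparameter T
```

  • Raising T keeps the ranking of the classes but pulls the scores toward each other, which is what makes T usable for scaling the inspector output.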
  • The calculation unit 152 selects a pair of pseudo samples from the pseudo sample table 143.
  • The two pseudo samples of the pair are referred to as a first pseudo sample and a second pseudo sample.
  • The calculation unit 152 inputs the first pseudo sample into the inspector model 55 using the softmax with temperature and calculates a first score.
  • The calculation unit 152 inputs the second pseudo sample into the inspector model 55 using the softmax with temperature and calculates a second score.
  • The calculation unit 152 calculates the absolute value of the difference between the first score and the second score as the score difference.
  • The calculation unit 152 selects different pairs of pseudo samples from the pseudo sample table 143 and repeatedly executes the process of calculating the score difference.
  • The calculation unit 152 searches for a hyperparameter such that the maximum of the plurality of score differences is less than the threshold Ths.
  • The threshold Ths is set in advance.
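The search described above can be sketched as follows. Several details are assumptions made only for illustration: the score is taken to be the temperature-softmax probability of the first class, all pairs of pseudo samples are compared, and the candidate temperatures are scanned in ascending order. The names `score`, `max_score_difference`, and `search_temperature` are hypothetical.

```python
import itertools
import numpy as np

def score(logits, temperature):
    """Score of each sample: temperature-softmax probability of class 0 (an assumption)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return (e / e.sum(axis=-1, keepdims=True))[..., 0]

def max_score_difference(logits, temperature):
    """Largest absolute score difference over all pairs of pseudo samples."""
    scores = score(logits, temperature)
    return max(abs(a - b) for a, b in itertools.combinations(scores, 2))

def search_temperature(logits, threshold, candidates):
    """Return the smallest candidate whose maximum score difference is below the threshold.

    Returns None when no candidate satisfies the condition.
    """
    for t in sorted(candidates):
        if max_score_difference(logits, t) < threshold:
            return t
    return None
```

Scanning from the smallest candidate upward is one possible design choice: it picks the least flattening that still keeps every score difference under the preset threshold.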
  • FIG. 16 is a diagram showing the distribution of score differences according to the hyperparameter.
  • The horizontal axis of graph G1-1 corresponds to the score difference.
  • The vertical axis of graph G1-1 corresponds to the frequency.
  • In graph G1-1, the frequency is slightly elevated for score differences of 0 to 0.1 and is concentrated at score differences of 0.9 to 1.0.
  • The horizontal axis of graph G2-1 corresponds to the score difference, and the vertical axis of graph G2-1 corresponds to the frequency.
  • In graph G2-1, the score differences are distributed more evenly than in graph G1-1.
  • When the calculation unit 152 finds a hyperparameter such that the maximum score difference is less than the threshold Ths, the relationship between the score difference and the frequency approaches that of graph G2-1 and, as shown in graph G2-2, the distance from the decision boundary can be classified finely.
  • The calculation unit 152 outputs information on the calculated (searched) hyperparameter to the detection unit 154.
  • The detection unit 154 inputs operation data to the inspector model 55, calculates the distance from the decision boundary, and, based on that distance, detects a difference between the distribution of the training data set 141 and the distribution of the operation data set.
  • The detection unit 154 detects the difference between the distribution of the training data set 141 and the distribution of the operation data set as a drift and notifies the generation unit 151 that the drift has been detected.
  • The detection unit 154 selects operation data i and operation data j, which form a pair of operation data, from the operation data set.
  • g_i(x_t) is the output result of the "inspector model 55 using the softmax with temperature" when the operation data i is input.
  • g_j(x_t) is the output result of the "inspector model 55 using the softmax with temperature" when the operation data j is input.
  • ∇x g_i(x_t) is the output result when the operation data i is input to the partially differentiated "inspector model 55 using the softmax with temperature", and corresponds to the gradient of the loss function at the operation data i. ∇x g_j(x_t) is the output result when the operation data j is input to the partially differentiated "inspector model 55 using the softmax with temperature", and corresponds to the gradient of the loss function at the operation data j.
  • The denominator of equation (5) is the q-norm of the difference between the gradient of the loss function at the operation data i and the gradient of the loss function at the operation data j.
  • The q-norm is the dual norm of the p-norm, and the relationship of equation (6) holds between p and q.
  • The p-norm is given by equation (7).
  • The detection unit 154 reselects a pair of operation data from the operation data set and repeatedly executes the process of calculating the distance d based on the reselected pair and equation (5).
  • The detection unit 154 calculates the average of the plurality of calculated distances d.
  • The average of the plurality of distances d corresponds to the above-mentioned evaluation value.
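The distance calculation can be sketched under explicit assumptions. Equations (5) to (7) are not reproduced in this text, so the sketch assumes the common first-order margin form |g_i(x) - g_j(x)| / ||∇x g_i(x) - ∇x g_j(x)||_q with 1/p + 1/q = 1, and it reads i and j as indices of two inspector outputs evaluated at an input x; that reading, the linear-softmax inspector, and all function names are hypothetical. Gradients are estimated by central differences to keep the example self-contained.

```python
import numpy as np

def inspector(x, w, b, temperature):
    """Hypothetical inspector: linear layer followed by a softmax with temperature."""
    z = (np.asarray(x, dtype=float) @ w + b) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def input_gradient(x, k, w, b, temperature, eps=1e-5):
    """Central-difference estimate of the gradient of output k with respect to x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for n in range(x.size):
        step = np.zeros_like(x)
        step[n] = eps
        upper = inspector(x + step, w, b, temperature)[k]
        lower = inspector(x - step, w, b, temperature)[k]
        grad[n] = (upper - lower) / (2.0 * eps)
    return grad

def boundary_distance(x, i, j, w, b, temperature, p=2.0):
    """Assumed form of the distance: output gap divided by the q-norm of the gradient gap."""
    q = p / (p - 1.0)  # dual exponent, from 1/p + 1/q = 1 (requires p > 1)
    g = inspector(x, w, b, temperature)
    gap = input_gradient(x, i, w, b, temperature) - input_gradient(x, j, w, b, temperature)
    return abs(g[i] - g[j]) / np.linalg.norm(gap, ord=q)

def evaluation_value(samples, w, b, temperature, i=0, j=1, p=2.0):
    """Average of the distances over the given samples."""
    return float(np.mean([boundary_distance(x, i, j, w, b, temperature, p) for x in samples]))
```

For a two-class linear inspector this quantity is close to the geometric distance to the boundary for points near it, and it grows as a point moves away. The denominator can approach zero where the outputs saturate, so a practical implementation would need to guard against that case.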
  • Based on the evaluation value and the threshold Thσ, the detection unit 154 detects the difference between the distribution of the training data set 141 and the distribution of the operation data set as a drift.
  • The detection unit 154 may notify an external device that the drift has been detected.
  • When a plurality of operation data sets are registered in the operation data set table 146, the detection unit 154 repeatedly executes the above processing for each operation data set.
  • The detection unit 154 executes the following processing to calculate the threshold Thσ.
  • The detection unit 154 selects a pair of pseudo samples from the pseudo sample table 143 and repeatedly executes the process of calculating the distance d based on the selected pair and equation (5).
  • The detection unit 154 calculates the standard deviation of the plurality of calculated distances d and sets that standard deviation as the above-mentioned threshold Thσ.
  • The classification unit 155 identifies the class to which operation data belongs by inputting the operation data of the operation data set into the operating model 50.
  • The classification unit 155 classifies a plurality of operation data into a plurality of classes by repeatedly executing the above processing for the other operation data of the operation data set.
  • FIG. 17 is a flowchart showing the processing procedure of the detection device according to the present embodiment.
  • The detection device 100 repeatedly executes the process of FIG. 17 every time a new operation data set is registered in the operation data set table 146.
  • The generation unit 151 of the detection device 100 inputs the training data set 141 and executes machine learning of the operating model 50 (step S101).
  • The generation unit 151 generates the pseudo sample table 143 based on the training data set 141 (step S102).
  • The generation unit 151 generates the distillation data table 144 by inputting the pseudo samples of the pseudo sample table 143 into the operating model 50 (step S103).
  • The generation unit 151 executes, by knowledge distillation, machine learning of the inspector model 55 that imitates the operating model 50 (step S104).
  • The calculation unit 152 of the detection device 100 calculates the hyperparameter of the inspector model 55 (step S105).
  • The acquisition unit 153 of the detection device 100 inputs pairs of operation data from the operation data set into the inspector model 55 and acquires the output results of the inspector model 55 (step S106).
  • The detection unit 154 of the detection device 100 determines, based on equation (5), whether or not a drift has been detected (step S107).
  • When the detection unit 154 detects a drift (step S108, Yes), the process returns to step S101. When the detection unit 154 does not detect a drift (step S108, No), the process proceeds to step S109.
  • The classification unit 155 of the detection device 100 inputs the operation data into the operating model 50 and classifies the operation data into classes (step S109).
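The control flow of steps S101 to S109 can be outlined as a short sketch. The three callables and the `max_retrains` safeguard are hypothetical placeholders introduced for illustration; the description itself does not specify a retry limit or these interfaces.

```python
def run_cycle(train_models, detect_drift, classify, operation_data_set, max_retrains=3):
    """Outline of the FIG. 17 flow with caller-supplied (hypothetical) callables.

    train_models():                   returns (operating_model, inspector), covering S101 to S105
    detect_drift(inspector, data):    returns True when a drift is detected, covering S106 to S108
    classify(operating_model, data):  returns the classification result, covering S109
    """
    operating_model, inspector = train_models()
    retrains = 0
    while detect_drift(inspector, operation_data_set):
        if retrains >= max_retrains:
            # Safeguard added for this sketch only; not part of the described procedure.
            raise RuntimeError("drift still detected after retraining")
        operating_model, inspector = train_models()  # return to S101
        retrains += 1
    return classify(operating_model, operation_data_set)
```

Passing the drift check in as a callable keeps the outline independent of how the evaluation value is compared with the threshold.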
  • The detection device 100 uses knowledge distillation to perform machine learning of the inspector model 55, which serves as a monitor of the operating model 50.
  • The detection device 100 inputs an operation data set to the inspector model 55, compares the evaluation value, which is calculated from the output of the inspector model 55 and the gradient of the loss function of the inspector model 55, with a threshold, and thereby detects concept drift. As a result, even if the data distribution moves away from the decision boundary over time, the change in the data distribution can be detected, and deterioration in the accuracy of the operating model 50 can be detected.
  • The detection device 100 selects different pairs of pseudo samples and repeatedly executes the process of calculating the score difference.
  • The detection device 100 calculates a hyperparameter such that the maximum of the plurality of score differences is less than the threshold Ths. This makes it possible to quantify the distance from the decision boundary in graded steps.
  • The detection device 100 calculates an evaluation value based on equation (5), compares it with the threshold Thσ, and detects a change in the data distribution. Further, the detection device 100 calculates the threshold Thσ using the training data set. This makes it possible to detect the drift with high accuracy.
  • The operation data of the operation data set is input to the operating model 50, and the operation data is classified into a plurality of classes. Therefore, before a drift occurs, the operation data can be appropriately classified into a plurality of classes by the operating model 50.
  • When the detection device 100 detects a drift, it re-executes the machine learning of the operating model 50 with a new training data set. As a result, an operating model 50 that accommodates the drift can be generated again.
  • FIG. 18 is a diagram (1) showing the nature of the decision boundary of each machine learning model.
  • In FIG. 18, a training data set 15 is used to perform machine learning on a support vector machine (Soft-Margin SVM), a random forest (Random Forest), and an NN, respectively.
  • The distribution obtained when the data set is input to the trained support vector machine is the distribution 20A, and each data item is classified into the first class or the second class at the decision boundary 21A.
  • The distribution obtained when the data set is input to the trained random forest is the distribution 20B, and each data item is classified into the first class or the second class at the decision boundary 21B.
  • The distribution obtained when the data set is input to the trained NN is the distribution 20C, and each data item is classified into the first class or the second class at the decision boundary 21C.
  • FIG. 19 is a diagram (2) showing the nature of the decision boundary of each machine learning model.
  • FIG. 19 shows an example of training a plurality of types of machine learning models using a certain training data set 35.
  • Nearest Neighbors, RBF SVM, Gaussian Process, Random Forest, Neural Net, Gradient Boosting Tree, and Naive Bayes are shown as the machine learning models.
  • The distribution obtained when the data set is input to the trained Nearest Neighbors is the distribution 40A. Each data item is classified into the first class or the second class at the decision boundary 41A.
  • The distribution of the trained Nearest Neighbors inspector model is the distribution 42A, and each data item is classified into the first class or the second class at the decision boundary 43A.
  • G42A indicates the distance from the decision boundary calculated based on the trained Nearest Neighbors inspector model. In G42A, contour lines of the same color indicate the same distance. The inspector model is an NN.
  • The distribution obtained when the data set is input to the trained RBF SVM is the distribution 40B. Each data item is classified into the first class or the second class at the decision boundary 41B.
  • The distribution of the trained RBF SVM inspector model is the distribution 42B, and each data item is classified into the first class or the second class at the decision boundary 43B.
  • G42B indicates the distance from the decision boundary calculated based on the trained RBF SVM inspector model. The inspector model is an NN.
  • The distribution obtained when the data set is input to the trained Gaussian Process is the distribution 40C. Each data item is classified into the first class or the second class at the decision boundary 41C.
  • The distribution of the trained Gaussian Process inspector model is the distribution 42C, and each data item is classified into the first class or the second class at the decision boundary 43C.
  • G42C indicates the distance from the decision boundary calculated based on the trained Gaussian Process inspector model. The inspector model is an NN.
  • The distribution obtained when the data set is input to the trained Random Forest is the distribution 40D. Each data item is classified into the first class or the second class at the decision boundary 41D.
  • The distribution of the trained Random Forest inspector model is the distribution 42D, and each data item is classified into the first class or the second class at the decision boundary 43D.
  • G42D indicates the distance from the decision boundary calculated based on the trained Random Forest inspector model. The inspector model is an NN.
  • The distribution obtained when the data set is input to the trained Neural Net is the distribution 40E. Each data item is classified into the first class or the second class at the decision boundary 41E.
  • The distribution of the trained Neural Net inspector model is the distribution 42E, and each data item is classified into the first class or the second class at the decision boundary 43E.
  • G42E indicates the distance from the decision boundary calculated based on the trained Neural Net inspector model. The inspector model is an NN.
  • The distribution obtained when the data set is input to the trained Gradient Boosting Tree is the distribution 40F. Each data item is classified into the first class or the second class at the decision boundary 41F.
  • The distribution of the trained Gradient Boosting Tree inspector model is the distribution 42F, and each data item is classified into the first class or the second class at the decision boundary 43F.
  • G42F indicates the distance from the decision boundary calculated based on the trained Gradient Boosting Tree inspector model. The inspector model is an NN.
  • The distribution obtained when the data set is input to the trained Naive Bayes is the distribution 40G. Each data item is classified into the first class or the second class at the decision boundary 41G.
  • The distribution of the trained Naive Bayes inspector model is the distribution 42G, and each data item is classified into the first class or the second class at the decision boundary 43G.
  • G42G indicates the distance from the decision boundary calculated based on the trained Naive Bayes inspector model. The inspector model is an NN.
  • As described above, the distance from the decision boundary can be calculated approximately using the inspector model regardless of the architecture of the machine learning model.
  • FIG. 20 is a diagram showing an example of the hardware configuration of a computer that realizes the same functions as the detection device according to the present embodiment.
  • The computer 200 has a CPU 201 that executes various arithmetic processes, an input device 202 that receives data input from a user, and a display 203. The computer 200 also has a reading device 204 that reads programs and the like from a storage medium, and an interface device 205 that exchanges data with an external device or the like via a wired or wireless network. The computer 200 further has a RAM 206 that temporarily stores various information, and a hard disk device 207. The devices 201 to 207 are connected to a bus 208.
  • The hard disk device 207 has a generation program 207a, a calculation program 207b, an acquisition program 207c, a detection program 207d, and a classification program 207e.
  • The CPU 201 reads out the generation program 207a, the calculation program 207b, the acquisition program 207c, the detection program 207d, and the classification program 207e and loads them into the RAM 206.
  • The generation program 207a functions as the generation process 206a.
  • The calculation program 207b functions as the calculation process 206b.
  • The acquisition program 207c functions as the acquisition process 206c.
  • The detection program 207d functions as the detection process 206d.
  • The classification program 207e functions as the classification process 206e.
  • The processing of the generation process 206a corresponds to the processing of the generation unit 151.
  • The processing of the calculation process 206b corresponds to the processing of the calculation unit 152.
  • The processing of the acquisition process 206c corresponds to the processing of the acquisition unit 153.
  • The processing of the detection process 206d corresponds to the processing of the detection unit 154.
  • The processing of the classification process 206e corresponds to the processing of the classification unit 155.
  • The programs 207a to 207e do not necessarily have to be stored in the hard disk device 207 from the beginning.
  • For example, each program may be stored in a "portable physical medium" such as a flexible disk (FD), a CD-ROM, a DVD, a magneto-optical disk, or an IC card that is inserted into the computer 200, and the computer 200 may then read out and execute each of the programs 207a to 207e.
  • 100 Detection device
  • 110 Communication unit
  • 120 Input unit
  • 130 Output unit
  • 140 Storage unit
  • 141 Training data set
  • 142 Operation model data
  • 143 Pseudo sample table
  • 144 Distillation data table
  • 145 Inspector model data
  • 146 Operation data set table
  • 150 Control unit
  • 151 Generation unit
  • 152 Calculation unit
  • 153 Acquisition unit
  • 154 Detection unit
  • 155 Classification unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Image Analysis (AREA)
PCT/JP2020/039191 2020-10-16 2020-10-16 Detection program, detection method, and detection device Ceased WO2022079919A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/JP2020/039191 WO2022079919A1 (ja) 2020-10-16 2020-10-16 Detection program, detection method, and detection device
JP2022556825A JP7424507B2 (ja) 2020-10-16 2020-10-16 Detection program, detection method, and detection device
US18/187,740 US12591808B2 (en) 2020-10-16 2023-03-22 Computer-readable recording medium storing detection program, detection method, and detection device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/039191 WO2022079919A1 (ja) 2020-10-16 2020-10-16 Detection program, detection method, and detection device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/187,740 Continuation US12591808B2 (en) 2020-10-16 2023-03-22 Computer-readable recording medium storing detection program, detection method, and detection device

Publications (1)

Publication Number Publication Date
WO2022079919A1 (ja) 2022-04-21

Family

ID=81209051

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/039191 Ceased WO2022079919A1 (ja) 2020-10-16 2020-10-16 Detection program, detection method, and detection device

Country Status (3)

Country Link
US (1) US12591808B2
JP (1) JP7424507B2
WO (1) WO2022079919A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024047758A1 (ja) * 2022-08-30 2024-03-07 Fujitsu Limited Training data distribution estimation program, device, and method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021079458A1 (ja) * 2019-10-24 2021-04-29 Fujitsu Limited Detection method, detection program, and information processing device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170330109A1 (en) * 2016-05-16 2017-11-16 Purepredictive, Inc. Predictive drift detection and correction

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012042521A2 (en) 2010-09-27 2012-04-05 Carmel-Haifa University Economic Corporation Ltd Detecting change points in data streams
US11030536B2 (en) 2015-04-16 2021-06-08 Siemens Aktiengesellschaft Method and apparatus for operating an automation system and accounting for concept drift
JP7024515B2 (ja) 2018-03-09 2022-02-24 Fujitsu Limited Learning program, learning method, and learning device
WO2020076309A1 (en) * 2018-10-09 2020-04-16 Hewlett-Packard Development Company, L.P. Categorization to related categories
JP7592737B2 (ja) * 2020-03-06 2024-12-02 BostonGene Corporation Determination of tissue characteristics using multiplexed immunofluorescence imaging

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CERQUEIRA, VITOR ET AL.: "Unsupervised Concept Drift Detection Using a Student-Teacher Approach", LECTURE NOTES IN COMPUTER SCIENCE, vol. 12323, 15 October 2020 (2020-10-15), pages 190 - 204, Retrieved from the Internet <URL:https://link.springer.com/chapter/10.1007/978-3-030-61527-7_13> [retrieved on 20201127], DOI: 10.1007/978-3-030-61527-7_13 *
ISHIDA TSUTOMU, KINGETSU HIROAKI, YOKOTA YASUTO, OKAWA YOSHIHIRO, KOBAYASHI KENICHI, NAKAZAWA KATSUHITO: "Evaluation of Concept Drift Detection Methods for Unlabeled Data in Operation", PROCEEDINGS OF 34TH ANNUAL CONFERENCE, 1 June 2020 (2020-06-01), pages 1 - 4, XP055932932 *
ISHII, ASUKA ET AL.: "Examination of optimization of temperature parameters in Knowledge Distillation", IPSJ SIG TECHNICAL REPORT: COMPUTER VISION AND IMAGE MEDIA (CVIM), vol. 2019 -CV, no. 10, pages 1 - 5, ISSN: 2188-8701, Retrieved from the Internet <URL:https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_uri&item_id=194906&file_id=l&file_no=l> [retrieved on 20190301] *
LU JIE; LIU ANJIN; DONG FAN; GU FENG; GAMA JOAO; ZHANG GUANGQUAN: "Learning under Concept Drift: A Review", IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, IEEE SERVICE CENTRE , LOS ALAMITOS , CA, US, vol. 31, no. 12, 1 December 2019 (2019-12-01), US , pages 2346 - 2363, XP011754680, ISSN: 1041-4347, DOI: 10.1109/TKDE.2018.2876857 *

Also Published As

Publication number Publication date
US20230222392A1 (en) 2023-07-13
JP7424507B2 (ja) 2024-01-30
US12591808B2 (en) 2026-03-31
JPWO2022079919A1 2022-04-21

Similar Documents

Publication Publication Date Title
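KR101711936B1 (ko) Generalized pattern recognition for fault diagnosis in machine condition monitoring
WO2023115761A1 (zh) Event detection method and apparatus based on a temporal knowledge graph
JP6472621B2 (ja) Classifier construction method, image classification method, and image classification device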
US20220188707A1 (en) Detection method, computer-readable recording medium, and computing system
JP7400827B2 (ja) Detection method, detection program, and information processing device
US20220215294A1 (en) Detection method, computer-readable recording medium, and computng system
US20230045330A1 (en) Multi-term query subsumption for document classification
JP2017102906A (ja) Information processing device, information processing method, and program
US20220222581A1 (en) Creation method, storage medium, and information processing apparatus
US12591808B2 (en) Computer-readable recording medium storing detection program, detection method, and detection device
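JP2017102865A (ja) Information processing device, information processing method, and program
JP2020177430A (ja) Information processing device, information processing method, and program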
US20220327394A1 (en) Learning support apparatus, learning support methods, and computer-readable recording medium
US20220230027A1 (en) Detection method, storage medium, and information processing apparatus
JP2013080395A (ja) Misclassification detection device, method, and program
US20220222545A1 (en) Generation method, non-transitory computer-readable storage medium, and information processing device
US20220222579A1 (en) Deterioration detection method, non-transitory computer-readable storage medium, and information processing device
US20240233349A9 (en) Integrated model generation method, image inspection system, image inspection model generation device, image inspection model generation program, and image inspection device
JP5640796B2 (ja) Name identification support processing device, method, and program
US20220237459A1 (en) Generation method, computer-readable recording medium storing generation program, and information processing apparatus
US20220215272A1 (en) Deterioration detection method, computer-readable recording medium storing deterioration detection program, and information processing apparatus
US20220222580A1 (en) Deterioration detection method, non-transitory computer-readable storage medium, and information processing device
KR20240039407A (ko) System and application for measuring the robustness of an AI model for malware variant analysis
US20220237463A1 (en) Generation method, computer-readable recording medium storing generation program, and information processing apparatus
US20220222582A1 (en) Generation method, computer-readable recording medium storing generation program, and information processing apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20957748

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022556825

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20957748

Country of ref document: EP

Kind code of ref document: A1