EP4182859A1

EP4182859A1 - Generating noisy copies of training data in a method for detecting anomalies

Info

Publication number: EP4182859A1
Application number: EP21755801.4A
Authority: EP
Inventors: Roman MOSCOVIZ
Original assignee: Suez International SAS
Current assignee: Suez International SAS
Priority date: 2020-07-16
Filing date: 2021-07-15
Publication date: 2023-05-24
Also published as: FR3112634B1; ZA202301444B; WO2022013503A1; FR3112634A1

Abstract

Computer-implemented method (1) for detecting anomalies in a dataset, implementing an unsupervised machine learning module, comprising a step of generating (10-10'') a plurality of noisy copies of all or some of the data of the training dataset, each noisy copy being obtained based on at least one noise generation parameter, for each noisy copy, a step of training (12-12'') said machine learning module based on said associated noisy training dataset, and a step of determining (14, 14') the noisy training dataset exhibiting a maximum detection performance.

Description

Title of the invention:

GENERATION OF COPIES OF NOISED TRAINING DATA IN AN ANOMALY DETECTION PROCESS

[1] The invention relates to a method for detecting anomalies.

[2] More particularly, the invention relates to an automatic learning method for the detection of abnormal data in a data set.

[3] Anomaly detection is a well-known subject, especially in the field of data mining, with many industrial applications such as sorting objects on a recycling chain, monitoring measurement sensors or any other plant supervision application.

[4] It is known to implement machine learning methods to detect so-called anomalous data in a data set. However, a known problem in anomaly detection is that the data generally available is highly unbalanced.

[5] In other words, the training data is generally composed of a large majority of "normal" data, while the anomaly data is generally weakly represented. This then raises a relatively important concern about the robustness of the training of the machine learning modules, in particular when they are supervised.

[6] In order to reduce the effect of data imbalance, methods are known such as the SMOTE method described in the scientific publication "SMOTE: Synthetic Minority Over-sampling Technique", Nitesh V. Chawla et al., Journal of Artificial Intelligence Research 16 (2002) 321-357.

[7] The SMOTE method tends to densify sparsely represented data areas so as to reduce the imbalance of the data set.

[8] However, the techniques known from the prior art are generally not satisfactory. Supervised learning methods are indeed sensitive to data balancing, and the training dataset imbalance between normal and abnormal data makes them unsuitable for this purpose.

[9] Known unsupervised learning methods tend to be relatively conservative. In other words, these methods are developed assuming that anomalous data is present in the training datasets, which makes them relatively inflexible in a context where there are few or no examples of anomalous data. This does not make it possible to obtain satisfactory control over the rate of false positives.

[10] However, in the context of the invention, the training datasets include few or no examples of anomalous data, and there is a strong constraint on the accuracy of the detections obtained.

[11] There is therefore a need for a machine learning method for anomaly detection that resolves the preceding problems.

[12] To this end, a computer-implemented method for detecting anomalies in a data set is proposed, implementing an unsupervised machine learning module, and comprising the provision of a labeled training data set comprising at least one training datum representative of a normal state of the analyzed data.

[13] The process includes:

[14] A step of generating a plurality of noisy copies of all or part of the data of the training data set, each noisy copy being obtained from at least one noise generation parameter, said noise generation parameter comprising a maximum noise amplitude to be added to all or part of the data of the training data set;

[15] For each noisy copy, a step of forming a set of noisy training data, comprising said set of training data and said at least one associated noisy copy,

[16] For each noisy copy, a step of training said automatic learning module according to said set of associated noisy training data, for example by a cross-validation method,

[17] For each noisy copy, a step of calculating the detection performance of said machine learning module as a function of the false positive rate obtained by implementing the machine learning module on a set of labeled test data, comprising at least one training datum representative of a normal state of the analyzed data;

[18] A step of determining the noisy training data set with maximum detection performance; and

[19] The implementation of the machine learning module trained from said determined noisy training data set.

[20] Thus, a method is obtained for improving an unsupervised machine learning module to detect anomalies by adding noise to the training data in a controlled manner. The invention thereby makes it possible to limit false positives even though there are not necessarily examples of abnormal data in the training data.

[21] The at least one noise generation parameter, which includes a maximum noise amplitude to be added to all or part of the data of the training data set, constitutes a simple and particularly robust parameter for obtaining distinct noisy copies in a controlled manner.

[22] In particular, the step of generating a noisy copy may comprise adding noise to only part of the training data set, for example by random selection of the data to be noised, so as to ensure a completely random distribution of the noisy data.

[23] In particular, for each noisy copy the noise parameter can have a different value but can also take identical values for all or part of the noisy copies. Indeed, the noise being added randomly, the same noise parameter will generate different noisy copies so that their performance may vary.

[24] Advantageously and in a non-limiting way, the method implements in parallel the steps of generating a plurality of noisy copies, and for each noisy copy generated, the method implements in parallel, the steps of constitution, training, and calculation, the determining step determining the set of noisy training data having a maximum detection performance among the performances calculated by the parallel calculation steps.

[25] Thus, by implementing the method according to a grid search method, one can relatively quickly determine the optimal noise parameter for noising the training data.

[26] According to an advantageous alternative for implementing the method, said generation, constitution, training and calculation steps are implemented iteratively, for example incrementally or dichotomously, so that at each iteration the detection performance of said automatic learning module calculated is compared with the detection performance of said automatic learning module calculated at the previous iteration.

[27] Thus, an iterative implementation allows a relatively lightweight approach, especially in terms of memory footprint, to determine the optimal noise parameter value.

[28] According to a particular implementation, said generation, constitution, training and calculation steps are implemented incrementally, so that at each iteration the noise generation parameter value is incremented by a predetermined step, and if the performance calculated at the previous iteration is lower than that calculated at the current iteration, a new iteration of the generation, constitution, training and calculation steps is carried out.

[29] Advantageously and in a non-limiting manner, the set of labeled training data also includes at least one example of anomalous data. Thus, one can obtain training data making the machine learning module even more robust, with even better control over false positives, without diminishing the anomaly detection performance.

[30] Advantageously and in a non-limiting way, the step of calculating the detection performance of said machine learning module is also a function of the rate of false negatives obtained by the implementation of the machine learning module on at least one example of abnormal data. Thus, it is possible to improve the calculation of the performance of the training of the machine learning module from the noisy data set, taking into account not only the false positives, in other words the erroneous detections of anomalies, but also the false negatives, in other words the undetected anomalies.

[31] Advantageously and in a non-limiting manner, said step of calculating the detection performance comprises calculating an average between the rate of false positives and the rate of false negatives. This is a simple and relatively reliable method for calculating overall performance.

[32] Advantageously and in a non-limiting way, said average is weighted according to a predetermined coefficient. Here the weighting makes it possible to give more weight either to the false-positives or to the false-negatives in the overall calculation of the performance, in order to take into account the most important criterion of the two, in particular with regard to the technical field for which the process is implemented.

[33] Advantageously and in a non-limiting way, the determination step further comprises a comparison of the maximum detection performance determined with a target performance value, so that if the maximum detection performance is lower than the target performance value, a new implementation of the method is carried out with new values of noise generation parameters. Thus, if the maximum performance detected is not satisfactory, the method according to the invention can be relaunched with new noisy copies, for example by changing the noise parameters, so as to continue the search for a set of noisy data showing satisfactory performance.

[34]

[35] Advantageously and in a non-limiting way, said at least one noise generation parameter comprises a statistical noise distribution, such as an additive white Gaussian noise or a colored noise. Thus, the noise can be varied not only in terms of its amplitude but also in terms of its type of generation. This contributes to the detection of an optimal noise parameterization.

[36] Advantageously and in a non-limiting manner, the machine learning module comprises one module among: a one-class support vector machine, a decision tree, a forest of decision trees, a method of the k nearest neighbors or an auto-encoder. These modules are well known to those skilled in the art and are particularly efficient. In addition, the method presents effective results for all these modules.

[37] The invention also relates to a method for classifying a material, to be classified in a list of predetermined materials, such as a list of plastic resins, each material being associated with at least one measurement parameter, such as an absorption spectrum; the classification method comprising, for each material of the predetermined list of materials, the implementation of the anomaly detection method as described previously.

[38] The invention also relates to a computer program comprising program code instructions for the execution of the steps of the anomaly detection method as described previously and/or for the execution of the preceding classification method, when said program is run on a computer.

[39] Other features and advantages of the invention will become apparent on reading the description given below of two particular embodiments of the invention, given by way of indication but not limitation, with reference to the appended drawings in which:

[40] [Fig. 1] is a flowchart of a first embodiment of the method according to the invention;

[41] [Fig. 2] is a flowchart of a second embodiment of the method according to the invention; and

[42] [Fig. 3] is a schematic representation of the noisy and non-noisy training datasets implemented by the invention.

[43] According to a first embodiment of the invention, with reference to Figure 1, method 1 is implemented to detect non-recyclable plastic resins in a polyethylene recycling chain.

[44] Although the description of the two embodiments 1, 1' is based on this particular example of use of the method 1, the invention is not limited to this single technical field, but is adaptable to any set of data in which undesirable elements must be identified, for any application of data mining or data processing.

[45] The invention can be implemented in any technical field, whether in the field of biology, for example to carry out the detection of bacteria, in astrophysics or in any other technical field where anomaly detection is to be performed.

[46] Anomaly detection method 1 implements an unsupervised machine learning module.

[47] Any type of unsupervised machine learning module can be implemented. Indeed, as will emerge from the description, the invention is based in particular on the automatic optimization of the training data. The method makes it possible in this respect to optimize the implementation of any type of automatic learning module.

[48] Known machine learning modules include in particular one-class support vector machines (one-class SVM), decision trees, decision tree forests, in particular Isolation Forests, auto-encoders, and the K-Nearest Neighbors method. All these methods are well known to those skilled in the art, and their particular implementation is not discussed here.

[49] It is clear that for those skilled in the art, all these methods are known to be trained on the basis of training data, so as to then be able, in an exploitation phase, to provide, depending on an input data, a characterization of the data.

[50] In the context of the invention, the objective is not to classify the data among a set of classes, as is done in a classification process, but to determine whether or not each input datum belongs to the single class sought, in other words whether it is an anomaly with respect to the other data.

[51] In this example implementation of the invention, we implement a one-class SVM module.

[52] The one-class SVM is particularly useful for detecting observations or measurements whose characteristics differ significantly from the expected data. This data is commonly referred to in the technical field as outliers.

[53] The one-class SVM is described in a first implementation in the publication "Support Vector Method for Novelty Detection", Bernhard Schölkopf et al., NIPS'99: Proceedings of the 12th International Conference on Neural Information Processing Systems, November 1999, Pages 582-588, which is incorporated herein by reference.

[54] In this implementation, the module looks for a hyperplane, like the classic SVM, with the difference that this hyperplane aims to separate all the data from the origin in the space in which the data is projected, while maximizing the distance from the hyperplane to the origin, so as to reduce the margin between the data and the hyperplane.

[55] Another known implementation of the one-class SVM is that described in "Support Vector Data Description", Tax, D. and Duin, R., 2004, Machine Learning, 54(1), pages 45-66, also incorporated by reference.

[56] In this particular implementation of the one-class SVM, the method separates the space by a hypersphere, seeking the smallest hypersphere that encompasses all the training data.
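For reference, the two formulations cited above are usually written as the optimization problems below. The notation is that commonly used for the cited publications and does not come from the present description: n is the number of training data, Φ the feature map, ν and C regularization parameters, ξ_i slack variables, ρ the hyperplane offset, and R and a the radius and center of the hypersphere.

```latex
% One-class SVM (hyperplane separating the data from the origin)
\[
\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\lVert w\rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad\text{s.t.}\quad
\langle w, \Phi(x_i)\rangle \ge \rho - \xi_i,\qquad \xi_i \ge 0
\]

% Support vector data description (smallest enclosing hypersphere)
\[
\min_{R,\,a,\,\xi}\ R^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad
\lVert x_i - a\rVert^{2} \le R^{2} + \xi_i,\qquad \xi_i \ge 0
\]
```

In both cases the slack variables allow a few training data to fall outside the learned boundary, which is why the choice of training data directly affects where that boundary lies.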

[57] It is well known that unsupervised methods perform, in general, subdivisions or separations of space depending on the density of the training data, which makes it possible to highlight the outliers automatically.

[58] However, a well-known limitation of these algorithms is the risk of obtaining false positives, in other words the machine learning module erroneously detects data as "abnormal", and the risk of false negatives, in other words the module detects data as "normal" when it is actually "abnormal".

[59] In the context of the example described, these two cases are problematic:

[60] - In the event of a false positive, a recyclable material is removed because it is considered non-recyclable; the negative impact is twofold, due both to a loss of recycled materials and to the increase in the operational cost of processing the non-recoverable flow. If, for example, the detection error is around 5 to 10%, this affects the tonnages sorted per year.

[61] - In the event of a false negative, a non-recyclable material is considered recyclable and then pollutes all the sorted materials. In this case, the impact occurs during the over-sorting steps upstream of the production lines for recycled materials or on the actual production.

[62] In the first case, over-sorting steps can eliminate non-recyclable materials; which will lead to an increase in waste treatment costs, ranging, for example, from €80 to €150 per ton.

[63] In the second case, the impact relates to the quality of the recycled material produced if an incompatible material is mixed with a modification of the expected properties; this has the consequence of not being able to market the recycled materials on the expected markets and of sending them to markets with lower added value, with a loss of between €100 and €300 per tonne, for example.

[64] In this regard, the invention aims to reduce the risk of all false detections in the case of the implementation of unsupervised machine learning modules.

[65] Method 1 uses multiple datasets.

[66] First, a labeled training data set 20 representative of the normal state of the analyzed data is provided (step 9).

[67] Figure 3 is a schematic representation of a training data set 20. Each datum in Figure 3 is a point of the plane, in other words a two-dimensional vector, so that the training data set 20 can be represented in a plane. The reference 21 represents the noisy copy 21 of the training data. Because the training data 20 is superimposed on the noisy data 21, not all the data of the noisy copy 21 can be seen in FIG. 3. The data may also have dimensions greater than 2 or 3.

[68] Although not required, the training data set 20 may also include examples of anomalies. This is particularly advantageous because these examples of anomalies make it possible to obtain a more robust machine learning module by further improving control over false positive detection.

[69] In our example embodiment, polyethylene objects to be recycled were subjected to near-infrared (NIR) characterization, so that each object is associated with an absorption spectrum that characterizes it.

[70] This is again only given within the framework of the example implementation of the method, but the labeled data can include any characteristic desired by the person skilled in the art, and is not limited to spectra or absorbance values.

[71] A set of labeled test data representative of the normal state of the analyzed data is also provided.

[72] This set of test data, which is a classic element of machine learning methods, makes it possible to calculate the performance of a trained machine learning module, by identifying the rate of correct detections made on a set of data whose component data we already know.

[73] Method 1 according to the invention then trains the unsupervised automatic learning module, also from a set of noisy training data. The objective is to optimally calibrate the added noise to improve the robustness of the learning module.

[74] The noise added to the training data 20 is determined according to at least one noise parameter, in particular according to a random value limited in amplitude.

[75] Indeed, it is well known that an input data of a machine learning module, whether training, test or data to be analyzed, comprises a plurality of numerical values, usually organized as a vector or matrix.

[76] In our example, as explained above, a labeled datum associates an absorption spectrum with a material, here polyethylene.

[77] The absorption spectrum is then an N-dimensional vector of absorbance values, N being the number of sample values in the spectrum.

[78] In our example, the analysis spectrum being obtained by a so-called near infrared method, the spectrum is for example obtained between 780 nm and 2500 nm. A number N of sampling points of the spectrum obtained during the analysis of the object is then defined, forming input data associated with the object.

[79] In general, one then proceeds to a normalization of the data vector so as to simplify the data processing, although this step is not obligatory.
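As an illustration of the sampling and optional normalization described above, here is a minimal sketch. It assumes evenly spaced sampling points and normalization to unit Euclidean norm; neither choice is specified by the description, and the function names are hypothetical.

```python
import math

def wavelength_grid(n_points, start_nm=780.0, stop_nm=2500.0):
    """Return n_points evenly spaced wavelengths (assumption: uniform sampling)."""
    step = (stop_nm - start_nm) / (n_points - 1)
    return [start_nm + i * step for i in range(n_points)]

def normalize(spectrum):
    """Scale an absorbance vector to unit Euclidean norm (one possible choice)."""
    norm = math.sqrt(sum(a * a for a in spectrum))
    return [a / norm for a in spectrum] if norm > 0.0 else list(spectrum)

grid = wavelength_grid(5)       # sampling points between 780 nm and 2500 nm
unit = normalize([3.0, 4.0])    # toy two-value "spectrum"
```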

[80] To add noise to such a vector, by way of example, a noise vector whose norm is less than or equal to a noise threshold value is added to this data vector. Thus, noise threshold is understood to mean the maximum value in amplitude of the generated noise.
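The bounded-norm noise described above can be sketched as follows. This is only one way to do it: the direction is drawn from a Gaussian and the norm is drawn uniformly between 0 and the noise threshold, both of which are assumptions, as are the function names.

```python
import math
import random

def bounded_noise(dim, max_norm, rng):
    """Random direction (Gaussian) scaled to a norm drawn in [0, max_norm]."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in direction))
    target = rng.uniform(0.0, max_norm)   # never exceeds the noise threshold
    scale = target / norm if norm > 0.0 else 0.0
    return [v * scale for v in direction]

def add_noise(datum, max_norm, rng):
    """Return a noisy version of `datum`; the input is left unchanged."""
    noise = bounded_noise(len(datum), max_norm, rng)
    return [x + n for x, n in zip(datum, noise)]

rng = random.Random(0)
spectrum = [0.12, 0.40, 0.33, 0.08]       # toy absorbance values
noisy = add_noise(spectrum, max_norm=0.05, rng=rng)
```

Whatever the seed, the Euclidean distance between `spectrum` and `noisy` stays at or below `max_norm`.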

[81] The noise generation method for the different data can be a method of generating an additive white Gaussian noise or a colored noise, in other words a noise whose power spectral density is not constant over the spectrum. This choice can be made by a person skilled in the art.

[82] We can also provide different noisy copies with different types of noise, which also increases the probability of obtaining maximum performance, this however requiring more computation time and memory footprint.

[83] Thus from an input datum, one can obtain a second noisy input datum, the noise of which is controlled in amplitude in the N-dimensional space of the datum.

[84] The generation step 10 of a noisy copy 21 does not necessarily involve noising all the data of the training data set, but may involve noising only part of this set, for example by random selection of the data to be noised, so as to ensure a completely random distribution of the noisy data.

[85] Noising only part of the data of the training data set 20 is particularly relevant when this training data set 20 includes a very large number of data, which would impose particularly long training times. This choice must be made by those skilled in the art as a trade-off between the necessary calculation time, depending on the available computing performance, and the need to maximize the amount of input data.

[86] The amount of noisy input data may also be increased a posteriori if the detection performance, as calculated in step 13 described later, is not sufficient.

[87] In this first embodiment, the search for the optimal noise value is carried out by a grid search method in which a plurality of noise parameters are used in parallel.

[88] In other words, several noisy copies 21 of the training data set 20 are generated during several parallel 10-10” generation steps.

[89] The number of noisy copies 21 generated is determined by the person skilled in the art according to the available calculation time, the desired search granularity and the desired training speed.

[90] Each generation step 10-10” is independent, and each noisy copy 21 obtained is produced according to a noise threshold value different from that of the other noisy copies, or according to a different noise mode (white noise, colored noise).

[91] Thus, a plurality of noisy copies 21 are obtained in parallel.

[92] For each noisy copy 21, one then proceeds to a step 11-11” of constitution of a noisy training data set 22, comprising the training data set 20 and its noisy copy 21. Also, one obtains as many noisy training data sets 22 as there are distinct noisy copies 21.
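The constitution of a noisy training data set from a partially noised copy might look like the sketch below. The `fraction` parameter, the function name and the Gaussian direction with uniformly drawn norm are illustrative assumptions, not features stated in the description.

```python
import math
import random

def noisy_copy(data, max_norm, fraction, rng):
    """Return noisy versions of a randomly selected share of `data`.

    Each noise vector has a Euclidean norm of at most `max_norm`.
    """
    count = max(1, round(fraction * len(data)))
    selected = rng.sample(data, count)          # random selection of data to noise
    copy = []
    for datum in selected:
        direction = [rng.gauss(0.0, 1.0) for _ in datum]
        norm = math.sqrt(sum(v * v for v in direction)) or 1.0
        scale = rng.uniform(0.0, max_norm) / norm
        copy.append([x + scale * d for x, d in zip(datum, direction)])
    return copy

rng = random.Random(1)
training_set = [[rng.random(), rng.random()] for _ in range(10)]      # set 20
copy_21 = noisy_copy(training_set, max_norm=0.05, fraction=0.5, rng=rng)
noisy_training_set = training_set + copy_21                           # set 22
```

The original data is kept untouched in the combined set, so the noisy copy only adds points around it.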

[93] We then proceed, for each set of noisy training data 22, to a training step 12-12” of a machine learning module. This step is well known to those skilled in the art who implement an automatic learning module.

[94] In particular, it is possible to train the automatic learning module using cross-validation methods known to the technical field.

[95] In particular, training can be carried out by so-called hold-out cross-validation, in which the training data is divided into two samples: a first training sub-sample comprising 80% of the training data set and a second test sub-sample made of the remaining 20%, not to be confused with the test data set of calculation step 13. With cross-validation methods, training sub-steps based on the training sub-sample are repeated several times and the hyper-parameters of the machine learning module are adjusted depending on the error obtained at each repetition of the training sub-steps; the calculated error can be a performance score of the model on the test sub-sample, such as the mean squared error. For training, the training sub-steps are therefore repeated a plurality of times, for example between 20 and 50 times, in particular 30 times, so as to cause the hyper-parameters of the model to converge to an optimal value as a function of the set of noisy training data 22. However, this order of magnitude is given solely by way of example and depends in particular on the calculation time available for the training.
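A repeated 80/20 hold-out procedure of this kind can be sketched as follows. The hyper-parameter selection rule (lowest mean error over the repetitions) and the toy error function are assumptions made for illustration; the description does not specify how the hyper-parameters are adjusted.

```python
import random

def hold_out_splits(data, repetitions=30, train_share=0.8, seed=0):
    """Yield (train, validation) pairs from repeated random splits."""
    rng = random.Random(seed)
    cut = round(train_share * len(data))
    for _ in range(repetitions):
        shuffled = data[:]
        rng.shuffle(shuffled)
        yield shuffled[:cut], shuffled[cut:]

def select_hyperparameter(data, candidates, error_fn, **split_options):
    """Pick the candidate with the lowest mean validation error."""
    splits = list(hold_out_splits(data, **split_options))
    def mean_error(candidate):
        return sum(error_fn(candidate, tr, va) for tr, va in splits) / len(splits)
    return min(candidates, key=mean_error)

def toy_error(k, train, validation):
    """Hypothetical error: share of validation data flagged as abnormal."""
    center = sum(train) / len(train)
    spread = max(abs(x - center) for x in train)
    return sum(abs(x - center) > k * spread for x in validation) / len(validation)

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(50)]
candidates = [0.5, 1.0, 1.5]
best = select_hyperparameter(data, candidates, toy_error, repetitions=30)
```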

[96] Thus, we obtain as many trained machine learning modules as there are noisy training data sets 22.

[97] Then, once the training steps 12-12” are completed, we proceed, for each trained machine learning module, to a calculation step 13-13” of the performance of the corresponding machine learning module.

[98] The performance of the machine learning module is calculated by implementing the trained module with the labeled test data set.

[99] We then determine the rate of data wrongly considered to be abnormal, in other words the rate of false positives.

[100] This calculation step 13 can be refined, according to an alternative implementation, by also taking into account the number of false negatives, in other words the number of abnormal data wrongly considered as normal. This is possible only when the test dataset also includes abnormal data. It is therefore advantageous to have a set of test data also including examples of anomalous data.

[101] In this alternative case, the performance can then be estimated as an average of the false-negative and false-positive rates, possibly weighted according to the importance given to each type of error.

[102] Also, such a weighting must be carried out in direct connection with the technical field for which the process is implemented.

[103] In our example of object sorting, a false negative is particularly problematic in that it allows unwanted material to pass through the recycling chain. Also, in this implementation alternative, and taking into account the technical field, a weighting increasing the importance of the detection of false negatives is then preferred.
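The weighted performance calculation discussed above could be sketched as below. Expressing the performance as one minus the weighted average of the two error rates, and the `fn_weight` parameter itself, are assumptions made so that a higher value means a better module.

```python
def error_rates(predicted_anomaly, is_anomaly):
    """Return (false_positive_rate, false_negative_rate) from boolean lists."""
    normals = [p for p, a in zip(predicted_anomaly, is_anomaly) if not a]
    anomalies = [p for p, a in zip(predicted_anomaly, is_anomaly) if a]
    fp_rate = sum(normals) / len(normals) if normals else 0.0
    fn_rate = sum(not p for p in anomalies) / len(anomalies) if anomalies else 0.0
    return fp_rate, fn_rate

def detection_performance(fp_rate, fn_rate, fn_weight=0.5):
    """One minus the weighted average of both error rates (higher is better)."""
    return 1.0 - ((1.0 - fn_weight) * fp_rate + fn_weight * fn_rate)

# Toy labeled test set: four normal data followed by two abnormal data.
labels = [False, False, False, False, True, True]
predicted = [False, True, False, False, True, False]
fp_rate, fn_rate = error_rates(predicted, labels)
# A weight above 0.5 penalizes false negatives more, as preferred for sorting.
score = detection_performance(fp_rate, fn_rate, fn_weight=0.7)
```

With `fn_weight=0.5` this reduces to the plain average of the two rates.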

[104] Once the performance has been calculated for each set of noisy training data 22, we proceed to a step of determining 14 the set of noisy training data 22 that maximizes the performance criterion.
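The grid search of this first embodiment can be sketched end to end as follows. A toy centroid-and-radius detector stands in for the one-class SVM, the candidates are evaluated sequentially rather than in parallel, and the performance is taken as one minus the false positive rate; all three are simplifying assumptions, as are the names used.

```python
import math
import random

def make_noisy_copy(data, max_norm, rng):
    """Noisy version of every datum, with noise norm at most max_norm."""
    copy = []
    for datum in data:
        direction = [rng.gauss(0.0, 1.0) for _ in datum]
        norm = math.sqrt(sum(v * v for v in direction)) or 1.0
        scale = rng.uniform(0.0, max_norm) / norm
        copy.append([x + scale * d for x, d in zip(datum, direction)])
    return copy

class RadiusDetector:
    """Toy stand-in for a one-class model: flags data far from the centroid."""
    def fit(self, data):
        dim = len(data[0])
        self.center = [sum(d[i] for d in data) / len(data) for i in range(dim)]
        self.radius = max(math.dist(d, self.center) for d in data)
        return self

    def is_anomaly(self, datum):
        return math.dist(datum, self.center) > self.radius

def false_positive_rate(model, normal_test):
    return sum(model.is_anomaly(d) for d in normal_test) / len(normal_test)

def grid_search(train, normal_test, thresholds, seed=0):
    """Train one module per noise threshold and keep the best performer."""
    results = []
    for i, max_norm in enumerate(thresholds):
        rng = random.Random(seed + i)
        noisy_set = train + make_noisy_copy(train, max_norm, rng)
        model = RadiusDetector().fit(noisy_set)
        performance = 1.0 - false_positive_rate(model, normal_test)
        results.append((performance, max_norm, model))
    return max(results, key=lambda r: r[0])

rng = random.Random(42)
train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
normal_test = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
thresholds = [0.1, 0.5, 1.0]
best_performance, best_threshold, best_model = grid_search(train, normal_test, thresholds)
```

Since each candidate is independent, the loop body could be dispatched to parallel workers. Note that a criterion based on false positives alone tends to favor larger noise, which is one reason the weighted variant with false negatives can be useful.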

[105] At this stage of method 1 according to a particular implementation of the first embodiment, it is possible to continue the search for a better performance criterion if, for example, the maximum performance criterion obtained during the determination step 14 is less than a desired minimum performance criterion value, or if it is suspected, for example, of having obtained a local maximum which does not correspond to the maximum that can actually be obtained.

[106] In this case, method 1 can be restarted at steps 10-10” by creating new noisy data sets, for example noisy data sets generated from noise threshold values presenting a refined granularity around the noise threshold that made it possible to obtain the maximum performance during the determination step, in order to seek an even better maximum value.

[107] Returning to the first implementation of the first embodiment, once the set of noisy training data 22 having the maximum detection performance has been determined, also called the determined set of noisy training data 22, we implement the machine learning module trained from this noisy training data set 22 to detect anomalies. In other words, in our embodiment, we can implement this machine learning module to detect non-recyclable plastic resins in a polyethylene recycling chain.

[108] According to another embodiment of the invention, with reference to FIG. 2, method 1' differs from the previous method 1 in that it is implemented incrementally, and not according to the grid search mode implemented previously.

[109] The incremental method described below is an example of iterative implementation of the invention. However, the invention is not limited to this single incremental method and can be adapted to other methods, such as a dichotomous approach or a heuristic method to perform an iterative search for optimum.

[110] Also, in detail, the method 1′ first proceeds to a generation step 10 of a first noisy copy 21, from a first noise threshold value.

[111] We proceed with this noisy copy 21 to the step of constitution 11 of a noisy training data set 22 which is the combination of the noisy copy 21 and the training data set 20.

[112] Once this set of noisy training data 22 has been constituted, the machine learning module is then trained from this set of noisy training data 22, and a calculation step 13 of the performance of said trained machine learning module is carried out.

[113] These steps of constitution 11, training 12 and calculation 13 of the performance are carried out in the same way as for the first embodiment of the invention.

[114] At the end of the first iteration, the method necessarily returns to step 10 to generate a new noisy copy. This new noisy copy is generated from a new noise threshold value.

[115] Also, according to this example of implementation, the noise threshold value is, during the initialization of the process 1′, initialized to a minimum value, which corresponds to the smallest added noise. At each iteration of method 1′, the noise threshold is incremented by one step, determined according to the desired granularity.

[116] Indeed, the lower the noise threshold increment step, the more precise the search for maximum performance will be, but the greater the number of iterations. On the other hand, a large increment step will make it possible to obtain a maximum relatively quickly, but risks not making it possible to find the optimal maximum performance.

[117] Also, as an example, for a noise vector with a norm in the range ]0, 1], we can define an increment of 1/1000, with a norm of 0.001 as the first noise threshold value and an increase in the noise threshold at each iteration, so that at each iteration a noise with an increasingly large amplitude is generated.

[118] However, this is given by way of example and the noise iteration step, the noise threshold values, in other words the noise amplitude, can be freely adapted by those skilled in the art.

[119] From the second iteration, when the performance of the machine learning module is calculated during the step 13 of calculating performance, a step 14′ of determining the set of noisy training data presenting a maximum detection performance is carried out. This determination step 14' is here implemented at each iteration.

[120] During this determination step 14′, the performance of the machine learning module obtained at this iteration is compared to the performance of the machine learning module of the previous iteration.

[121] At this stage, if the previous iteration presented a better performance, we can consider that a maximum performance has been reached.

[122] In this case, it is determined 14' that the noise threshold value of the previous iteration was optimal, and one can then proceed to the implementation step 15 of the machine learning module trained from the noisy copy obtained for the noise threshold value of the previous iteration.

[123] On the other hand, if it is observed, during the determination step, that the performance of the current iteration is superior to the performance of the previous iteration, performance is still increasing, and one then proceeds to a new iteration of the method, returning to step 10 of generation of a noisy copy.
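The comparison between successive iterations described above can be sketched as follows, as a minimal illustration rather than the claimed implementation. The name `search_first_maximum` and the `evaluate` callable, which stands in for steps 10 to 13 (generation, constitution, training and performance calculation), are hypothetical. Treating equal performance as a stop condition is an assumption, since the description does not address that case.

```python
def search_first_maximum(thresholds, evaluate):
    """Increase the noise threshold until performance stops improving.

    `evaluate(threshold)` is a hypothetical callable returning the detection
    performance obtained for that threshold (higher is better).
    Returns the threshold and performance of the last improving iteration.
    """
    previous_threshold, previous_score = None, None
    for threshold in thresholds:
        score = evaluate(threshold)
        # Step 14': compare with the previous iteration.
        if previous_score is not None and score <= previous_score:
            # The previous iteration was better: treat it as the maximum.
            return previous_threshold, previous_score
        previous_threshold, previous_score = threshold, score
    # The schedule was exhausted while performance was still increasing.
    return previous_threshold, previous_score

# Toy performance curve peaking at a threshold of 0.3 (illustrative only).
best_threshold, best_score = search_first_maximum(
    [0.1, 0.2, 0.3, 0.4, 0.5], lambda t: -(t - 0.3) ** 2
)
```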

[124] However, when it is determined during the determination step 14' that a performance maximum has been found, it may be a local maximum and not the optimal performance maximum. Also, according to a particular implementation of this second embodiment, it is possible to carry out a plurality of new iterations, for example for a predetermined number of iterations, in order to check whether a new improvement in performance can be obtained for the subsequent iterations. This prevents the process from stopping at a local maximum.
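The variant in which a predetermined number of additional iterations is carried out after a candidate maximum can be sketched as follows. This is a sketch under stated assumptions: the name `search_with_patience`, the `patience` parameter and the `evaluate` callable (standing in for steps 10 to 13) are hypothetical and do not appear in the description.

```python
def search_with_patience(thresholds, evaluate, patience=3):
    """Keep iterating after a candidate maximum for up to `patience` iterations.

    `evaluate(threshold)` is a hypothetical callable returning the detection
    performance for that threshold (higher is better).
    """
    best_threshold, best_score = None, float("-inf")
    stale = 0  # iterations since the last improvement
    for threshold in thresholds:
        score = evaluate(threshold)
        if score > best_score:
            best_threshold, best_score = threshold, score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` iterations in a row
    return best_threshold, best_score

# Toy curve with a local maximum at 2 and a higher maximum at 5.
scores = {1: 0.5, 2: 0.7, 3: 0.6, 4: 0.65, 5: 0.9, 6: 0.8, 7: 0.7, 8: 0.6, 9: 0.95}
result = search_with_patience(range(1, 10), scores.get, patience=3)
```

In this toy example a patience of 1 stops at the local maximum at 2, whereas a patience of 3 continues to the higher maximum at 5. The value at 9 is still missed, which illustrates that the number of additional iterations trades computation time against the risk of stopping too early.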

[125] These two embodiments are given by way of example of the invention, and the invention is not limited solely to these implementations. In particular, any type of unsupervised learning module can be implemented by the method. The different algorithmic approaches described can also be freely adapted by those skilled in the art, insofar as a search for maximum performance of the machine learning module by generating different noisy copies is implemented.

[126] The invention described here makes it possible to obtain an anomaly detection method implementing unsupervised automatic learning modules having an operation optimized by the addition of controlled noise in the training data.

[127] The invention can also be used for classification purposes. In particular, with reference to the example technical field set out in the present description, but in a non-limiting manner, the invention may relate to a method for classifying a material into a predetermined list of materials, such as a list of plastic resins.

[128] In this regard, each material is associated with at least one measurement parameter, such as an absorption spectrum, as explained above.

[129] The classification process then implements, for each material from the predetermined list of materials, a process 1, 1' for detecting anomalies as described above. Implementing several anomaly detection processes in this way makes it possible to perform a relatively efficient and robust classification.
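Purely as an illustration, a classification built from one anomaly detector per material can be sketched as follows. The class `CentroidDetector` is a toy stand-in for a machine learning module trained according to process 1, 1' and is not taken from the description; the function name `classify`, the two-value measurement vectors and the decision rule (a sample is assigned to every material whose detector does not flag it as anomalous) are likewise assumptions.

```python
import math

class CentroidDetector:
    """Toy stand-in for a trained anomaly detector (hypothetical).

    A sample is considered normal if it lies within `radius` of the
    centroid of the training samples.
    """
    def __init__(self, training_samples, radius):
        n = len(training_samples)
        self.centroid = [sum(column) / n for column in zip(*training_samples)]
        self.radius = radius

    def is_normal(self, sample):
        return math.dist(sample, self.centroid) <= self.radius

def classify(sample, detectors):
    """Return the materials whose detector accepts the sample as normal."""
    return [name for name, det in detectors.items() if det.is_normal(sample)]

# One detector per material of the predetermined list (illustrative data).
detectors = {
    "PET": CentroidDetector([[0.1, 0.9], [0.2, 0.8]], radius=0.2),
    "PVC": CentroidDetector([[0.9, 0.1], [0.8, 0.2]], radius=0.2),
}
match = classify([0.15, 0.8], detectors)
no_match = classify([0.5, 0.5], detectors)
```

A sample rejected by every detector belongs to none of the listed materials, and a sample accepted by several detectors would require a tie-breaking rule that the description does not specify.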

Claims

[Claim 1] A computer-implemented method (1, 1') of detecting anomalies in a data set, implementing an unsupervised machine learning module, and comprising providing (9) a labeled set of training data (20) comprising at least one training data item representative of a normal state of the analyzed data, characterized in that the method comprises: a step of generating (10-10”) a plurality of noisy copies (21) of all or part of the data of the training data set (20), each noisy copy (21) being obtained from at least one noise generation parameter, said noise generation parameter comprising a maximum noise amplitude to be added to all or part of the data of the training data set; for each noisy copy (21), a step of constituting (11-11”) a set of noisy training data (22) comprising said set of training data (20) and said at least one associated noisy copy (21); for each noisy copy, a step of training (12-12”) said machine learning module as a function of said associated set of noisy training data (22); for each noisy copy, a step of calculating (13-13”) the detection performance of said machine learning module as a function of the false positive rate obtained by implementing the machine learning module on a set of labeled test data comprising at least one training data item representative of a normal state of the analyzed data; a step of determining (14, 14') the set of noisy training data exhibiting maximum detection performance; and implementing (15) the machine learning module trained from said determined noisy training data set.

[Claim 2] Method (1) according to claim 1, characterized in that it implements in parallel the steps of generating (10-10”) a plurality of noisy copies (21), and for each noisy copy generated, the method (1) implements in parallel, the steps of constitution (11-11”), training (12-12”), and calculation (13-13”), the step of determination (14) determining the set of noisy training data having a maximum detection performance among the performances calculated by the parallel calculation steps (13-13”).

[Claim 3] Method (1') according to claim 1, characterized in that said generation (10), constitution (11), training (12) and calculation (13) steps are implemented iteratively, for example incrementally or dichotomously, so that at each iteration, during the determination step (14′), the calculated detection performance of said machine learning module is compared with the detection performance of said machine learning module calculated at the previous iteration.

[Claim 4] Method (1, 1') according to any one of Claims 1 to 3, characterized in that the set of labeled training data (20) also comprises at least one example of anomalous data.

[Claim 5] Method (1, 1') according to claim 4, characterized in that the step of calculating (13-13”) the detection performance of said machine learning module is also a function of the rate of false negatives obtained by implementing the machine learning module on at least one example of anomalous data.

[Claim 6] Method (1, 1') according to claim 5, characterized in that said step of calculating (13-13”) the detection performance comprises calculating an average between the rate of false positives and the rate of false negatives.

[Claim 7] Method (1, 1') according to claim 6, characterized in that said average is weighted according to a predetermined coefficient.

[Claim 8] A method (1, 1') according to any one of claims 1 to 7, characterized in that the determining step (14) further comprises comparing the determined maximum detection performance with a target performance value, so that if the maximum detection performance is lower than the target performance value, a new implementation of the method (1, 1') is carried out with new noise generation parameter values.

[Claim 9] A method (1, 1') according to any one of claims 1 to 8, characterized in that said at least one noise generation parameter comprises a statistical distribution of noise, such as additive white Gaussian noise or colored noise.

[Claim 10] A method (1, 1') according to any one of claims 1 to 9, characterized in that the machine learning module comprises one of: a one-class support vector machine, a decision tree, a forest of decision trees, a k-nearest neighbor method or an auto-encoder.

[Claim 11] Method for classifying a material into a list of predetermined materials, such as a list of plastic resins, each material being associated with at least one measurement parameter, such as an absorption spectrum; the classification method comprising, for each material of the predetermined list of materials, the implementation of the method (1, 1') for detecting anomalies according to any one of claims 1 to 10.

[Claim 12] Computer program comprising program code instructions for carrying out the steps of the method (1, 1') according to any one of claims 1 to 10 and/or for carrying out the method according to claim 11, when said program is executed on a computer.