WO2020168796A1 - Data augmentation method based on high-dimensional spatial sampling - Google Patents

Data augmentation method based on high-dimensional spatial sampling

Info

Publication number
WO2020168796A1
WO2020168796A1 (PCT/CN2019/125431)
Authority
WO
WIPO (PCT)
Prior art keywords
training
data set
data
classifier
sampling
Prior art date
Application number
PCT/CN2019/125431
Other languages
French (fr)
Chinese (zh)
Inventor
王卡风
须成忠
曹廷荣
熊超
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院
Publication of WO2020168796A1 publication Critical patent/WO2020168796A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present invention relates to the technical field of data enhancement and, more specifically, to a method that enhances data by lifting the training set to a higher-dimensional space and then generating new samples there by Monte Carlo sampling.
  • in machine learning and deep learning, accuracy is generally improved through data augmentation or by adjusting the classification and regression algorithms.
  • data enhancement is an important branch of machine-learning and deep-learning research, and obtaining sufficient, effective data is a key means of reaching high accuracy. In practice the data are often insufficient, or the original data contain many invalid, redundant samples; in such cases more data must be found or the original data must be enhanced effectively. Real problems may involve many data categories but too few samples, which is a major obstacle; one solution is to enhance the original data so as to obtain more data suited to the task.
  • to make the fullest use of the training data, the training data are generally "expanded" through a series of random transformations so that the machine-learning model never sees exactly the same training sample twice; this helps prevent overfitting and thereby improves test accuracy.
  • two recent data enhancement methods are described below. The first is the AutoAugment method: the paper "AutoAugment: Learning Augmentation Policies from Data" by Ekin D. Cubuk et al. learns, through a model, an augmentation policy suited to the current task. It uses reinforcement learning to find the best image-transformation strategy from the data themselves and learns different combinations of augmentation operations for different tasks; it is a search over an existing set of image operations applied to the original images. In essence, however, it does not differ from commonly used algorithms (rotation, affine transforms, and so on): neither the sampling space nor the sampling dimension changes.
  • the second is GAN-based enhancement: generative adversarial networks (GANs) learn the data distribution through a model and randomly generate images consistent with the distribution of the training set, but this method cannot directly improve classifier accuracy.
  • in view of these problems, the present invention proposes a technical solution that lifts the training set to a higher dimension, generates new samples from the lifted data set by Monte Carlo sampling, and jointly optimizes this with the choice of machine-learning algorithm and the tuning of its hyperparameters, thereby improving accuracy, as follows:
  • the present invention provides a data enhancement method based on high-dimensional space sampling.
  • the method first divides the data set to be enhanced into a training set and a test set.
  • the sampler performs sampling on the first data set by using a Monte Carlo method to obtain a second data set
  • the training model further includes a Metropolis-Hastings corrector.
  • the "sampling device uses the Monte Carlo method to sample on the first data set to obtain a second data set"
  • the steps include:
  • the Metropolis-Hastings corrector determines whether the candidate sample conforms to the distribution properties consistent with the first data set by setting an acceptance/rejection ratio, wherein the acceptance/rejection ratio ranges from 0.8 to 1.4.
  • in step S1, the step of "mapping the training set from the low-dimensional space P to the high-dimensional space D to obtain the first data set" includes:
  • the dimensionality of the training set is increased through the dictionary matrix and the dimensionality increase operator to obtain the first data set.
  • the dictionary matrix is generated randomly or trained with the KSVD algorithm on the training set, and the dimension-raising operator is any one of the LASSO function, convolution, or encoding.
  • the Monte Carlo method is the stochastic gradient Langevin dynamics (SGLD) sampling method or the stochastic gradient Hamiltonian Monte Carlo (sgHMC) sampling method.
  • the classifier is selected from any one of a support vector machine algorithm, a random forest algorithm, or a convolutional neural network algorithm.
  • a dimension-raising operator or a dimension-reducing operator is used to bring the training set, the second data set, and the test set into the same dimensional space, and the raising/reducing operator pair is any one of convolution/deconvolution, encoding/decoding, or the LASSO function.
  • in step S5, the step of "inputting the dimension-controlled training set and the second data set into the classifier for training" includes:
  • after the dimension-controlled training set and the second data set are combined, they are input into a classifier for training.
  • the dimension-controlled training set and the second data set are combined at a ratio of 4:1 to 7:1.
  • the method provided by the present invention samples the data in a higher dimension; lifting the dimension with the LASSO function removes the restriction of sampling in only the original data dimensions, achieves the goal of data enhancement, and at the same time avoids the curse of dimensionality and reduces the resources consumed by sampling.
  • the performance of subsequent classifiers has also been significantly improved.
  • the new samples generated by this method are better suited to classification by the classifier.
  • Fig. 1 is a flowchart of a method for realizing data enhancement by sampling in a high-dimensional space provided by an embodiment of the present invention.
  • Fig. 2 is a design flow chart of a gradient estimator provided by an embodiment of the present invention.
  • Fig. 3 is an implementation flowchart of a sampling algorithm after dimension lifting with compressed sensing according to an embodiment of the present invention.
  • Fig. 4 is a design flow chart of the Metropolis-Hastings corrector provided by an embodiment of the present invention.
  • Fig. 5 is a design flowchart for training a training model provided by an embodiment of the present invention.
  • the present invention provides a data enhancement method based on high-dimensional space sampling.
  • inspired by compressed sensing, the method assumes that every sample is a low-dimensional measurement of some high-dimensional sparse vector and that a continuous probability distribution exists in that high-dimensional space; new samples are drawn from this continuous distribution, and these new high-dimensional samples are more favorable for classification.
  • FIG. 1 is a flowchart of a method for realizing data enhancement by sampling in a high-dimensional space according to an embodiment of the present invention. The present invention will be explained in detail below with reference to FIG. 1.
  • This method first divides the data set that needs to be enhanced into a training set and a test set, and specifically includes the following steps:
  • Step S1 Map the training set from the low-dimensional space P to the high-dimensional space D to obtain a first data set.
  • this step further includes S11, randomly generating a compressed-sensing dictionary matrix, or training a dictionary matrix with the KSVD algorithm on the training set; and S12, using the dictionary matrix generated in step S11 together with the dimension-raising operator to lift the training set and obtain the first data set.
  • the dimension-raising operator can be any of the LASSO function, convolution, or encoding, and is preferably the LASSO function; lifting with LASSO not only removes the restriction of sampling in only the original data dimensions, achieving data enhancement, but also avoids the curse of dimensionality and reduces the resources consumed by sampling.
  • Step S2 build an initial training model, which includes a sampler and a classifier.
  • the sampler used in the training model performs sampling based on the Monte Carlo method.
  • the Monte Carlo methods that can be used include the stochastic gradient Langevin dynamics (SGLD) sampling method, the stochastic gradient Hamiltonian Monte Carlo (sgHMC) sampling method, and the like.
  • the classifiers used in the training model include support vector machines (SVM), random forests and other shallow learning algorithms, and convolutional neural networks (CNN) and other deep learning algorithms.
  • a Metropolis-Hastings corrector can be added to the training model; the corrector judges whether a drawn sample conforms to the distribution of the first data set (or of the training set before dimension lifting): if it does, the sample is accepted, otherwise it is rejected. Adding a Metropolis-Hastings corrector helps collect samples that meet the requirements.
  • Step S3 the sampler performs sampling on the first data set by using the Monte Carlo method to obtain the second data set.
  • the sampler used in the present invention includes a gradient estimator. Please refer to Fig. 2.
  • Fig. 2 is a design flowchart of the gradient estimator provided by an embodiment of the present invention. Its principle is as follows: a small mini-batch of size S is first drawn at random from the original data set, the stochastic gradient g_m of the initial value X_0 is computed on that mini-batch, and the value of the next candidate sample X_T is then obtained from g_m. Based on this gradient estimator, the embodiment of the present invention provides a concrete sampling algorithm whose implementation is shown in Fig. 3.
  • step S31, draw an initial value X_0 on the first data set using independent, identically distributed white noise; step S32, in the sampler with the gradient estimator, iterate T times from the initial value X_0 to find the next candidate sample X_T; step S33, use the Metropolis-Hastings corrector to judge whether X_T conforms to the distribution of the first data set and so decide whether to accept X_T as a new valid sample; when the result is yes, add the current candidate sample to the second data set and return to step S31; when the result is no, replace the current candidate sample with a new initial sample and return to step S32.
  • after K rounds, K random samples X_1, X_2, X_3, ..., X_K have been drawn from the D-dimensional distribution; these samples form the second data set in the high-dimensional space.
  • the Metropolis-Hastings corrector determines whether the candidate sample conforms to the distribution properties consistent with the first data set by setting an acceptance/rejection ratio.
  • the acceptance/rejection ratio ranges from 0.8 to 1.4.
  • the implementation of the Metropolis-Hastings corrector is shown in Fig. 4: first, the negative log-density and its derivative are evaluated for X_0 and X_T on the entire data set; then the transition probability from X_0 to X_T and the transition probability from X_T to X_0 are computed, and the ratio θ of the two probability values is obtained; a number ε is then drawn uniformly from (0, 1) and εd is compared with θ, where d is the preset acceptance/rejection ratio value: if εd < θ, X_T is accepted, otherwise it is rejected.
  • Step S4 controlling the training set, the second data set and the test set to be in the same dimensional space.
  • in this step, a dimension-raising or dimension-reducing operator is used to bring the training set, the second data set, and the test set into the same dimensional space, so as to obtain data in the dimensionality the classifier requires. Specifically, the dimension-raising operator lifts the training set and the test set so that all three data sets are distributed in the D-dimensional space, or the dimension-reducing operator reduces the second data set so that all three data sets are distributed in the P-dimensional space.
  • throughout the invention, the dimension-raising/dimension-reducing operators used form a pair of algorithms; any pair such as convolution/deconvolution, encoder/decoder, or LASSO operators can be chosen.
  • Step S5: input the dimension-controlled training set and the second data set into the classifier to train the training model.
  • the training result is evaluated by the accuracy obtained during training, and training ends when the accuracy saturates and no longer rises; during training, the accuracy is fed back to tune the sampler's burn-in steps and sampling-interval steps, the corrector's acceptance/rejection ratio, and the classifier algorithm and its corresponding hyperparameters. The specific training process is shown in Fig. 5.
  • the dimension-controlled training set may first be input into the classifier for training and, after that training completes, the dimension-controlled second data set is then input into the classifier to continue training; alternatively, the dimension-controlled training set and second data set may be merged and then input into the classifier for training.
  • the dimension-controlled training set and the second data set are merged at a ratio of 4:1 to 7:1.
  • Step S6 Evaluate the performance of the trained training model by using the dimensional controlled test set.
  • the data enhancement method provided by the present invention not only removes the restriction of sampling in only the original data dimensions but also brings a clear improvement in downstream classifier performance, and the new samples it generates are better suited to classification.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Character Discrimination (AREA)

Abstract

A data augmentation method based on high-dimensional spatial sampling. The method first divides a data set to be augmented into a training set and a test set, and comprises: S1, mapping the training set from a low-dimensional space P to a high-dimensional space D to obtain a first data set; S2, building a training model comprising a sampler and a classifier; S3, the sampler sampling the first data set with a Monte Carlo method to obtain a second data set; S4, controlling the training set, the second data set, and the test set to be in the same dimensional space; S5, inputting the dimension-controlled training set and second data set into the classifier to train the training model; and S6, evaluating the performance of the trained training model with the dimension-controlled test set. The method is free of the restriction of sampling only in the original data dimensions, and the new samples it generates are better suited to classification by a classifier.

Description

A data augmentation method based on high-dimensional spatial sampling
Technical Field
The present invention relates to the technical field of data augmentation and, more specifically, to a method that augments data by lifting the training set to a higher-dimensional space and then generating new samples there by Monte Carlo sampling.
Background Art
In machine learning and deep learning, accuracy is generally improved through data augmentation or by adjusting the classification and regression algorithms. Data augmentation is an important branch of machine-learning and deep-learning research, and obtaining sufficient, effective data is a key means of reaching high accuracy. In practice the data are often insufficient, or the original data contain many invalid, redundant samples; in such cases more data must be found or the original data must be augmented effectively. Real problems may involve many data categories but too few samples, which is a major obstacle to solving the problem; one solution is to augment the original data so as to obtain more data suited to the task. To make the fullest possible use of the training data, it is generally "expanded" through a series of random transformations so that the machine-learning model never sees exactly the same training sample twice; this helps prevent overfitting and thereby improves test accuracy. Two recent data augmentation methods are described below. The first is AutoAugment: the paper "AutoAugment: Learning Augmentation Policies from Data" by Ekin D. Cubuk et al. learns, through a model, an augmentation policy suited to the current task. It uses reinforcement learning to find the best image-transformation strategy from the data themselves and learns different combinations of augmentation operations for different tasks; it is a search over an existing set of image operations applied to the original images. In essence, however, it does not differ from commonly used algorithms (rotation, affine transforms, and so on): neither the sampling space nor the sampling dimension changes. The second is the GAN-based method: generative adversarial networks (GANs) learn the data distribution through a model and randomly generate images consistent with the distribution of the training set, but this method cannot directly improve classifier accuracy.
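As a minimal illustration of the kind of label-preserving random transformation referred to above (a sketch only, not tied to any particular cited method), the Python snippet below randomly flips and shifts an image array; the specific operations and their ranges are assumptions chosen for the example.

```python
import numpy as np

def random_transform(image, rng=None):
    """Apply a simple random flip and small shift - the kind of label-preserving
    transformation classically used to "expand" a training set."""
    rng = rng or np.random.default_rng()
    out = image.copy()
    if rng.random() < 0.5:                    # random horizontal flip
        out = np.fliplr(out)
    dy, dx = rng.integers(-2, 3, size=2)      # small random translation
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    return out
```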
Summary of the Invention
In view of the above problems, the present invention proposes a technical solution that lifts the training set to a higher dimension, generates new samples from the lifted data set by Monte Carlo sampling, and jointly optimizes this with the choice of machine-learning algorithm and the tuning of its hyperparameters, thereby improving machine-learning accuracy. The solution is as follows:
The present invention provides a data augmentation method based on high-dimensional space sampling. The method first divides the data set to be augmented into a training set and a test set, and specifically includes:
S1, mapping the training set from a low-dimensional space P to a high-dimensional space D to obtain a first data set;
S2, building a training model, the training model including a sampler and a classifier;
S3, the sampler sampling the first data set with a Monte Carlo method to obtain a second data set;
S4, controlling the training set, the second data set, and the test set to be in the same dimensional space;
S5, inputting the dimension-controlled training set and second data set into the classifier and training the training model;
S6, evaluating the performance of the trained training model with the dimension-controlled test set.
Preferably, the training model further includes a Metropolis-Hastings corrector, and in step S3 the step in which "the sampler samples the first data set with a Monte Carlo method to obtain a second data set" includes:
S31, randomly selecting a sample from the first data set as an initial sample;
S32, performing T iterations on the initial sample to obtain a candidate sample;
S33, using the Metropolis-Hastings corrector to judge whether the candidate sample conforms to the distribution of the first data set; when the result is yes, adding the current candidate sample to the second data set and returning to step S31; when the result is no, replacing the current candidate sample with a new initial sample and returning to step S32.
More preferably, the Metropolis-Hastings corrector judges whether the candidate sample conforms to the distribution of the first data set by setting an acceptance/rejection ratio, the acceptance/rejection ratio ranging from 0.8 to 1.4.
Preferably, in step S1 the step of "mapping the training set from the low-dimensional space P to the high-dimensional space D to obtain the first data set" includes:
lifting the dimensionality of the training set with a dictionary matrix and a dimension-raising operator to obtain the first data set.
More preferably, the dictionary matrix is generated randomly or trained with the KSVD algorithm on the training set, and the dimension-raising operator is any one of the LASSO function, convolution, or encoding.
Preferably, the Monte Carlo method is the stochastic gradient Langevin dynamics sampling method or the stochastic gradient Hamiltonian Monte Carlo sampling method.
Preferably, the classifier is any one of a support vector machine algorithm, a random forest algorithm, or a convolutional neural network algorithm.
Preferably, a dimension-raising operator or a dimension-reducing operator is used to control the training set, the second data set, and the test set to be in the same dimensional space, the raising/reducing operator pair being any one of convolution/deconvolution, encoding/decoding, or the LASSO function.
Preferably, in step S5 the step of "inputting the dimension-controlled training set and second data set into the classifier for training" includes:
first inputting the dimension-controlled training set into the classifier for training and, after that training completes, inputting the dimension-controlled second data set into the classifier to continue training; or
merging the dimension-controlled training set and the second data set and then inputting them into the classifier for training.
More preferably, the dimension-controlled training set and the second data set are merged at a ratio of 4:1 to 7:1.
Compared with the prior art, the method provided by the present invention samples the data in a higher dimension. Lifting the dimension with the LASSO function removes the restriction of sampling in only the original data dimensions and achieves the goal of data augmentation, while also avoiding the curse of dimensionality and reducing the resources consumed by sampling. The performance of the downstream classifier is also clearly improved, and experiments confirm that the new samples generated by this method are better suited to classification.
Description of the Drawings
Fig. 1 is a flowchart of the method for data augmentation by sampling in a high-dimensional space provided by an embodiment of the present invention.
Fig. 2 is a design flowchart of the gradient estimator provided by an embodiment of the present invention.
Fig. 3 is an implementation flowchart of a sampling algorithm after dimension lifting with compressed sensing according to an embodiment of the present invention.
Fig. 4 is a design flowchart of the Metropolis-Hastings corrector provided by an embodiment of the present invention.
Fig. 5 is a design flowchart of the training of the training model provided by an embodiment of the present invention.
Detailed Description
To make the purpose, technical solution, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the application and are not intended to limit it.
The present invention provides a data augmentation method based on high-dimensional space sampling. Inspired by compressed sensing, the method assumes that every sample is a low-dimensional measurement of some high-dimensional sparse vector and that a continuous probability distribution exists in that high-dimensional space; new samples are drawn from this continuous distribution, and these new high-dimensional samples are more favorable for classification. Please refer to Fig. 1, a flowchart of the method for data augmentation by sampling in a high-dimensional space provided by an embodiment of the present invention; the invention is explained in detail below with reference to Fig. 1.
The method first divides the data set to be augmented into a training set and a test set, and specifically includes the following steps:
Step S1: map the training set from the low-dimensional space P to the high-dimensional space D to obtain the first data set. This step further includes S11, randomly generating a compressed-sensing dictionary matrix, or training a dictionary matrix with the KSVD algorithm on the training set; and S12, using the dictionary matrix generated in step S11 together with the dimension-raising operator to lift the training set and obtain the first data set. According to some embodiments of the invention, the dimension-raising operator may be any of the LASSO function, convolution, or encoding, preferably the LASSO function, which not only removes the restriction of sampling in only the original data dimensions and achieves data augmentation, but also avoids the curse of dimensionality and reduces the resources consumed by sampling.
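A minimal Python sketch of the LASSO-based lift in step S1, assuming a randomly generated dictionary A of shape (P, D); the helper name lift_with_lasso, the alpha penalty, and the toy sizes are illustrative assumptions rather than the exact embodiment.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lift_with_lasso(X_low, dictionary, alpha=0.01):
    """Map each P-dimensional row of X_low to a D-dimensional sparse code z
    such that X_low[i] is approximately dictionary @ z (dictionary shape: (P, D))."""
    n = X_low.shape[0]
    D = dictionary.shape[1]
    X_high = np.zeros((n, D))
    for i in range(n):
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(dictionary, X_low[i])   # solves min ||x - A z||^2 + alpha * ||z||_1
        X_high[i] = lasso.coef_
    return X_high

# Illustrative usage with a random dictionary lifting P = 64 to D = 256
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 256)) / np.sqrt(64)
X_train_low = rng.standard_normal((10, 64))
X_train_high = lift_with_lasso(X_train_low, A)   # the "first data set"
```

A dictionary trained with KSVD on the training set, which the text also allows, would simply take the place of the random A.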
Step S2: build the initial training model, which includes a sampler and a classifier. The sampler used in the training model samples with a Monte Carlo method; according to some embodiments of the invention, the usable Monte Carlo methods include stochastic gradient Langevin dynamics (SGLD) sampling, stochastic gradient Hamiltonian Monte Carlo (sgHMC) sampling, and the like. The classifiers used in the training model include shallow learning algorithms such as support vector machines (SVM) and random forests, as well as deep learning algorithms such as convolutional neural networks (CNN). According to other embodiments of the invention, a Metropolis-Hastings corrector can also be added to the training model; the corrector judges whether a drawn sample conforms to the distribution of the first data set (or of the training set before dimension lifting): if it does, the sample is accepted, otherwise it is rejected. Adding a Metropolis-Hastings corrector helps collect samples that meet the requirements.
Step S3: the sampler samples the first data set with the Monte Carlo method to obtain the second data set. The sampler used in the invention includes a gradient estimator; please refer to Fig. 2, a design flowchart of the gradient estimator. Its principle is as follows: a small mini-batch of size S is first drawn at random from the original data set, the stochastic gradient g_m of the initial value X_0 is computed on that mini-batch, and the value of the next candidate sample X_T is then obtained from g_m. Based on this gradient estimator, an embodiment of the invention provides a concrete sampling algorithm, shown in Fig. 3: step S31, draw an initial value X_0 on the first data set using independent, identically distributed white noise; step S32, in the sampler with the gradient estimator, iterate T times from X_0 to find the next candidate sample X_T; step S33, use the Metropolis-Hastings corrector to judge whether X_T conforms to the distribution of the first data set and so decide whether to accept X_T as a new valid sample; when the result is yes, add the current candidate sample to the second data set and return to step S31; when the result is no, replace the current candidate sample with a new initial sample and return to step S32. After K rounds, K random samples X_1, X_2, X_3, ..., X_K have been drawn from the D-dimensional distribution; these samples form the second data set in the high-dimensional space.
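The sketch below shows one stochastic gradient Langevin dynamics (SGLD) update loop matching the description of the gradient estimator; the callable grad_neg_log_density (the density model over the lifted data), the step size, the iteration count, and the batch size are assumptions made for illustration, not the exact sampler of the disclosure.

```python
import numpy as np

def sgld_candidate(x0, X_high, grad_neg_log_density, step=1e-3, T=50,
                   batch_size=64, rng=None):
    """Run T stochastic gradient Langevin dynamics updates starting from x0.

    grad_neg_log_density(x, batch) must return an estimate of grad U(x),
    where U is the negative log-density of the lifted (first) data set,
    computed on the given mini-batch (this plays the role of g_m)."""
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    for _ in range(T):
        idx = rng.choice(len(X_high), size=min(batch_size, len(X_high)), replace=False)
        g = grad_neg_log_density(x, X_high[idx])     # stochastic gradient g_m
        noise = rng.normal(scale=np.sqrt(step), size=x.shape)
        x = x - 0.5 * step * g + noise               # Langevin update
    return x                                          # candidate sample X_T
```

In practice grad_neg_log_density could be, for example, the gradient of a kernel density estimate fitted to the first data set; the text leaves this choice to the gradient estimator of Fig. 2.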
In step S33, the Metropolis-Hastings corrector judges whether the candidate sample conforms to the distribution of the first data set by setting an acceptance/rejection ratio; according to some embodiments of the invention, the acceptance/rejection ratio ranges from 0.8 to 1.4. Further, the implementation of the Metropolis-Hastings corrector is shown in Fig. 4: first, the negative log-density and its derivative are evaluated for X_0 and X_T on the entire data set; then the transition probability from X_0 to X_T and the transition probability from X_T to X_0 are computed, and the ratio θ of the two probability values is obtained; a number ε is then drawn uniformly from (0, 1) and εd is compared with θ, where d is the preset acceptance/rejection ratio value: if εd < θ, X_T is accepted, otherwise it is rejected.
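A sketch of the acceptance test of Fig. 4: here θ is taken as the standard Metropolis-Hastings ratio combining the densities with the two transition probabilities (one reading of the text), and neg_log_density and log_transition are assumed callables, while the scaled comparison ε·d < θ with d in the 0.8 to 1.4 range follows the description.

```python
import numpy as np

def mh_accept(x0, xT, neg_log_density, log_transition, d=1.0, rng=None):
    """Metropolis-Hastings-style correction with a tunable accept/reject ratio d.

    neg_log_density(x): negative log-density U(x) evaluated on the full data set.
    log_transition(a, b): log probability of proposing b when starting from a."""
    rng = rng or np.random.default_rng()
    log_theta = (-neg_log_density(xT) + log_transition(xT, x0)
                 - (-neg_log_density(x0) + log_transition(x0, xT)))
    theta = float(np.exp(np.clip(log_theta, -50.0, 50.0)))  # ratio theta, clipped for stability
    eps = rng.uniform(0.0, 1.0)              # random number between 0 and 1
    return eps * d < theta                   # accept X_T when eps*d < theta
```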
Step S4: control the training set, the second data set, and the test set to be in the same dimensional space. In this step, a dimension-raising or dimension-reducing operator is used to bring the training set, the second data set, and the test set into the same dimensional space, so as to obtain data in the dimensionality the classifier requires. Specifically, the dimension-raising operator lifts the training set and the test set so that all three data sets are distributed in the D-dimensional space, or the dimension-reducing operator reduces the second data set so that all three data sets are distributed in the P-dimensional space. Throughout the invention, the dimension-raising/dimension-reducing operators used form a pair of algorithms; any pair such as convolution/deconvolution, encoder/decoder, or LASSO operators can be chosen.
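Continuing the earlier sketch, step S4 can be realized either by mapping the D-dimensional samples back to P dimensions with the same (assumed) dictionary A, or by lifting the training and test sets with lift_with_lasso; all variable names here are carried over from the illustrative code above.

```python
def reduce_with_dictionary(X_high, dictionary):
    """Inverse of the LASSO lift: reconstruct P-dimensional data as z @ A.T."""
    return X_high @ dictionary.T            # dictionary A has shape (P, D)

# Option 1: bring the sampled second data set down to the P-dimensional space
# X_sampled_low = reduce_with_dictionary(X_sampled_high, A)
# Option 2: lift the training and test sets up to D dimensions instead
# X_train_high = lift_with_lasso(X_train_low, A)
# X_test_high  = lift_with_lasso(X_test_low, A)
```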
Step S5: input the dimension-controlled training set and the second data set into the classifier and train the training model. In this step, the training result is evaluated by the accuracy obtained during training, and training ends when the accuracy saturates and no longer rises; during training, the accuracy is fed back to tune the sampler's burn-in steps and sampling-interval steps, the corrector's acceptance/rejection ratio, and the classifier algorithm and its corresponding hyperparameters. The specific training process is shown in Fig. 5. According to some embodiments of the invention, in this step the dimension-controlled training set may first be input into the classifier for training and, after that training completes, the dimension-controlled second data set is then input into the classifier to continue training; alternatively, the dimension-controlled training set and second data set may be merged and then input into the classifier for training. According to other embodiments of the invention, the dimension-controlled training set and the second data set are merged at a ratio of 4:1 to 7:1.
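A sketch of step S5 under the merge option, using scikit-learn's SVC as one of the classifiers named in the text; the 5:1 mix (within the stated 4:1 to 7:1 range) and the assumption that each generated sample carries a label y_sampled (how labels are attached to generated samples is not specified in this passage, e.g. sampling could be done per class) are illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

def train_with_augmentation(X_train, y_train, X_sampled, y_sampled,
                            mix_ratio=5, rng=None):
    """Merge the original and generated data at roughly mix_ratio:1 and fit an SVM."""
    rng = rng or np.random.default_rng()
    n_keep = min(len(X_sampled), max(1, len(X_train) // mix_ratio))
    idx = rng.choice(len(X_sampled), size=n_keep, replace=False)
    X_merged = np.vstack([X_train, X_sampled[idx]])
    y_merged = np.concatenate([y_train, y_sampled[idx]])
    clf = SVC(kernel="rbf", C=1.0)          # one of the classifiers named in the text
    clf.fit(X_merged, y_merged)
    return clf

# Step S6 (evaluation on the dimension-controlled test set) would then be, e.g.:
# model = train_with_augmentation(X_train, y_train, X_sampled, y_sampled)
# accuracy = model.score(X_test, y_test)
```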
Step S6: evaluate the performance of the trained training model with the dimension-controlled test set.
Experiments confirm that the data augmentation method provided by the present invention not only removes the restriction of sampling in only the original data dimensions but also brings a clear improvement in downstream classifier performance, and the new samples it generates are better suited to classification.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principle of the invention, and such improvements and refinements should also be regarded as falling within the protection scope of the invention.

Claims (10)

  1. A data augmentation method based on high-dimensional space sampling, the method first dividing the data set to be augmented into a training set and a test set, characterized in that the method comprises:
    S1, mapping the training set from a low-dimensional space P to a high-dimensional space D to obtain a first data set;
    S2, building a training model, the training model comprising a sampler and a classifier;
    S3, the sampler sampling the first data set with a Monte Carlo method to obtain a second data set;
    S4, controlling the training set, the second data set, and the test set to be in the same dimensional space;
    S5, inputting the dimension-controlled training set and second data set into the classifier and training the training model;
    S6, evaluating the performance of the trained training model with the dimension-controlled test set.
  2. The method of claim 1, characterized in that the training model further comprises a Metropolis-Hastings corrector, and in step S3 the step in which "the sampler samples the first data set with a Monte Carlo method to obtain a second data set" comprises:
    S31, randomly selecting a sample from the first data set as an initial sample;
    S32, performing T iterations on the initial sample to obtain a candidate sample;
    S33, using the Metropolis-Hastings corrector to judge whether the candidate sample conforms to the distribution of the first data set; when the result is yes, adding the current candidate sample to the second data set and returning to step S31; when the result is no, replacing the current candidate sample with a new initial sample and returning to step S32.
  3. The method of claim 2, characterized in that the Metropolis-Hastings corrector judges whether the candidate sample conforms to the distribution of the first data set by setting an acceptance/rejection ratio, wherein the acceptance/rejection ratio ranges from 0.8 to 1.4.
  4. The method of claim 1, characterized in that in step S1 the step of "mapping the training set from the low-dimensional space P to the high-dimensional space D to obtain the first data set" comprises:
    lifting the dimensionality of the training set with a dictionary matrix and a dimension-raising operator to obtain the first data set.
  5. The method of claim 4, characterized in that the dictionary matrix is generated randomly or trained with the KSVD algorithm on the training set, and the dimension-raising operator is any one of the LASSO function, convolution, or encoding.
  6. The method of claim 1, characterized in that the Monte Carlo method is the stochastic gradient Langevin dynamics sampling method or the stochastic gradient Hamiltonian Monte Carlo sampling method.
  7. The method of claim 1, characterized in that the classifier is any one of a support vector machine algorithm, a random forest algorithm, or a convolutional neural network algorithm.
  8. The method of claim 1 or 4, characterized in that a dimension-raising operator or a dimension-reducing operator is used to control the training set, the second data set, and the test set to be in the same dimensional space, the raising/reducing operator being any pair of convolution/deconvolution, encoding/decoding, or LASSO operators.
  9. The method of claim 1, characterized in that in step S5 the step of "inputting the dimension-controlled training set and second data set into the classifier for training" comprises:
    first inputting the dimension-controlled training set into the classifier for training and, after that training completes, inputting the dimension-controlled second data set into the classifier to continue training; or
    merging the dimension-controlled training set and the second data set and then inputting them into the classifier for training.
  10. The method of claim 9, characterized in that the dimension-controlled training set and the second data set are merged at a ratio of 4:1 to 7:1.
PCT/CN2019/125431 2019-02-19 2019-12-14 Data augmentation method based on high-dimensional spatial sampling WO2020168796A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910123936.6A CN109886333A (en) 2019-02-19 2019-02-19 A kind of data enhancement methods based on higher dimensional space sampling
CN201910123936.6 2019-02-19

Publications (1)

Publication Number Publication Date
WO2020168796A1 true WO2020168796A1 (en) 2020-08-27

Family

ID=66928457

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/125431 WO2020168796A1 (en) 2019-02-19 2019-12-14 Data augmentation method based on high-dimensional spatial sampling

Country Status (2)

Country Link
CN (1) CN109886333A (en)
WO (1) WO2020168796A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886333A (en) * 2019-02-19 2019-06-14 深圳先进技术研究院 A kind of data enhancement methods based on higher dimensional space sampling
CN111027717A (en) * 2019-12-11 2020-04-17 支付宝(杭州)信息技术有限公司 Model training method and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140324742A1 (en) * 2013-04-30 2014-10-30 Hewlett-Packard Development Company, L.P. Support vector machine
CN106407664A (en) * 2016-08-31 2017-02-15 深圳市中识创新科技有限公司 Domain self-adaptive method and device of breathing gas diagnosis system
WO2018187950A1 (en) * 2017-04-12 2018-10-18 邹霞 Facial recognition method based on kernel discriminant analysis
CN109214401A (en) * 2017-06-30 2019-01-15 清华大学 SAR image classification method and device based on stratification autocoder
CN108921123A (en) * 2018-07-17 2018-11-30 重庆科技学院 A kind of face identification method based on double data enhancing
CN109886333A (en) * 2019-02-19 2019-06-14 深圳先进技术研究院 A kind of data enhancement methods based on higher dimensional space sampling

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183300A (en) * 2020-09-23 2021-01-05 厦门大学 AIS radiation source identification method and system based on multi-level sparse representation
CN112183300B (en) * 2020-09-23 2024-03-22 厦门大学 AIS radiation source identification method and system based on multi-level sparse representation
CN113626414A (en) * 2021-08-26 2021-11-09 国家电网有限公司 Data dimension reduction and denoising method for high-dimensional data set
CN117655118A (en) * 2024-01-29 2024-03-08 太原科技大学 Strip steel plate shape control method and device with multiple modes fused
CN117655118B (en) * 2024-01-29 2024-04-19 太原科技大学 Strip steel plate shape control method and device with multiple modes fused

Also Published As

Publication number Publication date
CN109886333A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
WO2020168796A1 (en) Data augmentation method based on high-dimensional spatial sampling
CN108564129B (en) Trajectory data classification method based on generation countermeasure network
Gao et al. Balanced semisupervised generative adversarial network for damage assessment from low‐data imbalanced‐class regime
WO2022121289A1 (en) Methods and systems for mining minority-class data samples for training neural network
US20180253640A1 (en) Hybrid architecture system and method for high-dimensional sequence processing
CN107690663B (en) Whitening neural network layer
JP6244059B2 (en) Face image verification method and face image verification system based on reference image
JP7266674B2 (en) Image classification model training method, image processing method and apparatus
CN110929679B (en) GAN-based unsupervised self-adaptive pedestrian re-identification method
CN110288030A (en) Image-recognizing method, device and equipment based on lightweight network model
CN108446676B (en) Face image age discrimination method based on ordered coding and multilayer random projection
JP2015095212A (en) Identifier, identification program, and identification method
US20220036231A1 (en) Method and device for processing quantum data
CN110543906B (en) Automatic skin recognition method based on Mask R-CNN model
CN112766399B (en) Self-adaptive neural network training method for image recognition
US11544532B2 (en) Generative adversarial network with dynamic capacity expansion for continual learning
CN109740695A (en) Image-recognizing method based on adaptive full convolution attention network
CN116668327A (en) Small sample malicious flow classification increment learning method and system based on dynamic retraining
CN112085086A (en) Multi-source transfer learning method based on graph convolution neural network
Xue et al. Research on edge detection operator of a convolutional neural network
CN109101984B (en) Image identification method and device based on convolutional neural network
CN111652264B (en) Negative migration sample screening method based on maximum mean value difference
CN111242176B (en) Method and device for processing computer vision task and electronic system
CN108573275B (en) Construction method of online classification micro-service
CN116976461A (en) Federal learning method, apparatus, device and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19916310

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19916310

Country of ref document: EP

Kind code of ref document: A1
