CN114004364A - Sampling optimization method and device, electronic equipment and storage medium - Google Patents

Sampling optimization method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN114004364A
CN114004364A (application No. CN202111269562.2A)
Authority
CN
China
Prior art keywords
training
sampling
probability
image sample
micro
Prior art date
Legal status
Pending
Application number
CN202111269562.2A
Other languages
Chinese (zh)
Inventor
李楚鸣
刘宇
王晓刚
Current Assignee
Shanghai Sensetime Technology Development Co Ltd
Original Assignee
Shanghai Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Sensetime Technology Development Co Ltd filed Critical Shanghai Sensetime Technology Development Co Ltd
Priority to CN202111269562.2A priority Critical patent/CN114004364A/en
Publication of CN114004364A publication Critical patent/CN114004364A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification

Abstract

The present disclosure provides a sampling optimization method, apparatus, device and storage medium. The sampling optimization method comprises: acquiring a training sample data set, wherein the training sample data set comprises image samples with labels; performing initial sampling on the training sample data set, training a target model based on the sampling result to obtain a pre-trained target model, and determining features of each image sample based on the pre-trained target model; constructing a probability function based on the features of each image sample; and determining a target sampling probability for each image sample based on the probability function and the pre-trained target model. According to the embodiments of the present application, the sampling probability of training samples can be optimized for target models in different scenarios, thereby improving training efficiency and model performance.

Description

Sampling optimization method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a sampling optimization method and apparatus, an electronic device, and a storage medium.
Background
With the development of computer technology, deep learning models have gradually come into wide use. During the training of a deep learning model, adjusting the sampling probability of the training samples is an important means of improving training efficiency and model performance. For example, importance sampling speeds up training by increasing the sampling probability of samples with large loss values.
However, existing sampling methods (such as importance sampling) are mainly suited to scenarios containing many easy samples. How to optimize the sampling probability of training samples for deep learning training tasks in different scenarios is therefore an important problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the disclosure at least provides a sampling optimization method, a sampling optimization device, electronic equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a sampling optimization method, including:
acquiring a training sample data set, wherein the training sample data set comprises image samples with labels;
carrying out initial sampling on the training sample data set, training a target model based on a sampling result to obtain a pre-trained target model, and determining the characteristics of each image sample based on the pre-trained target model;
constructing a probability function based on the features of each image sample;
determining a target sampling probability for each image sample based on the probability function and the pre-trained target model.
In the embodiment of the disclosure, a target model is pre-trained to obtain a pre-trained target model, the features of each image sample are determined based on the pre-trained target model, a probability function is then constructed based on those features, and the target sampling probability of each image sample is determined based on the probability function and the pre-trained target model. In this way, the sampling probability of training samples can be optimized for target models in different scenarios, thereby improving the training efficiency and performance of the model.
In a possible implementation manner, the determining of the target sampling probability of each image sample based on the probability function and the pre-trained target model includes:
performing a random hyper-parameter search on the probability function through Bayesian optimization to determine initially optimized hyper-parameters, and obtaining an initially optimized sampling probability of each image sample based on the initially optimized hyper-parameters;
sampling the training sample data set based on the initially optimized sampling probability of each image sample, and performing micro-training on the pre-trained target model based on the sampling result to obtain a micro-training result, wherein micro-training refers to training whose number of iterations is less than a preset total number of training iterations;
determining the target sampling probability of each image sample based on the micro-training results and the preset total number of training iterations.
In this embodiment, Bayesian optimization is combined with micro-training, which greatly reduces the computing resources consumed in searching for the sampling probability and improves the efficiency of sampling-probability optimization.
In a possible implementation manner, the determining of the target sampling probability of each image sample based on the micro-training results and the total number of training iterations includes:
performing a hyper-parameter search on the probability function through Bayesian optimization based on the micro-training result to obtain suboptimal hyper-parameters, and obtaining a suboptimal sampling probability of each image sample based on the suboptimal hyper-parameters;
sampling the training sample data set based on the suboptimal sampling probability of each image sample, and performing the micro-training on the pre-trained target model based on the sampling result to obtain a micro-training result;
and repeating the above steps until the cumulative number of iterations across all micro-training runs is greater than or equal to the preset total number of training iterations, and determining the target sampling probability of each image sample according to the micro-training result obtained from each micro-training run.
In this embodiment, when the cumulative number of micro-training iterations reaches the preset total number of training iterations, the target sampling probability of each image sample is determined according to the micro-training result obtained from each micro-training run, which avoids the resource waste caused by excessive training while still obtaining a good sampling probability.
In a possible implementation manner, the determining of the target sampling probability of each image sample according to the micro-training result obtained from each micro-training run includes:
and determining the sampling probability of each image sample corresponding to the best micro-training result in each micro-training result as the target sampling probability of each image sample.
According to the first aspect, in a possible implementation, the initially sampling the training sample data set includes:
the initial sampling is performed on the training sample data set with the same sampling probability.
According to the first aspect, in one possible implementation, the features of each image sample include a loss function value of the image sample and/or a density value of the image sample.
According to the first aspect, in a possible implementation manner, the probability function includes a piecewise linear function and a linear weighting function of the respective features of each image sample, and the hyper-parameters in the probability function include the weight coefficients of the respective features of each image sample, the segment endpoints of the piecewise linear function, and the function values at those segment endpoints.
According to the first aspect, in one possible implementation, the method further comprises:
and sampling the training sample data set according to the target sampling probability of each image sample, and training the target model based on the sampling result to obtain a trained target model.
In the embodiment of the disclosure, the target model is trained by adopting the target sampling probability, so that the training efficiency and the model performance of the target model can be improved.
In a second aspect, an embodiment of the present disclosure provides a sampling optimization apparatus, including:
a sample acquisition module, configured to acquire a training sample data set, wherein the training sample data set comprises image samples with labels;
the model training module is used for carrying out initial sampling on the training sample data set, training a target model based on a sampling result to obtain a pre-trained target model, and determining the characteristics of each image sample based on the pre-trained target model;
a function construction module for constructing a probability function based on the characteristics of each image sample;
and the probability optimization module is used for determining the target sampling probability of each image sample based on the probability function and the pre-trained target model.
According to the second aspect, in a possible implementation, the probability optimization module is specifically configured to:
performing a random hyper-parameter search on the probability function through Bayesian optimization to determine initially optimized hyper-parameters, and obtaining an initially optimized sampling probability of each image sample based on the initially optimized hyper-parameters;
sampling the training sample data set based on the initially optimized sampling probability of each image sample, and performing micro-training on the pre-trained target model based on the sampling result to obtain a micro-training result, wherein micro-training refers to training whose number of iterations is less than a preset total number of training iterations;
determining the target sampling probability of each image sample based on the micro-training results and the preset total number of training iterations.
According to the second aspect, in a possible implementation, the probability optimization module is specifically configured to:
performing a hyper-parameter search on the probability function through Bayesian optimization based on the micro-training result to obtain suboptimal hyper-parameters, and obtaining a suboptimal sampling probability of each image sample based on the suboptimal hyper-parameters;
sampling the training sample data set based on the suboptimal sampling probability of each image sample, and performing the micro-training on the pre-trained target model based on the sampling result to obtain a micro-training result;
and repeating the above steps until the cumulative number of iterations across all micro-training runs is greater than or equal to the preset total number of training iterations, and determining the target sampling probability of each image sample according to the micro-training result obtained from each micro-training run.
According to the second aspect, in a possible implementation, the probability optimization module is specifically configured to:
and determining the sampling probability of each image sample corresponding to the best micro-training result in each micro-training result as the target sampling probability of each image sample.
According to a second aspect, in a possible implementation, the model training module is specifically configured to:
the initial sampling is performed on the training sample data set with the same sampling probability.
According to the second aspect, in one possible implementation, the features of each image sample include a loss function value of the image sample and/or a density value of the image sample.
According to the second aspect, in a possible implementation, the probability function includes a piecewise linear function and a linear weighting function of the respective features of each image sample, and the hyper-parameters in the probability function include the weight coefficients of the respective features of each image sample, the segment endpoints of the piecewise linear function, and the function values at those segment endpoints.
According to the second aspect, in a possible implementation, the model training module is further configured to:
and sampling the training sample data set according to the target sampling probability of each image sample, and training the target model based on the sampling result to obtain a trained target model.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate via the bus when the electronic device is running, and the machine-readable instructions, when executed by the processor, perform the sampling optimization method according to any one of the embodiments of the first aspect and the first aspect.
In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the sampling optimization method as described in the first aspect and any implementation manner of the first aspect.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required by the embodiments are briefly described below. The drawings herein, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those of ordinary skill in the art may derive further related drawings from them without inventive effort.
Fig. 1 illustrates a flow chart of a sampling optimization method provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating a method for determining a target sampling probability based on a probability function according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a method for determining a target sampling probability based on a micro-training result according to an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating another sampling optimization method provided by the embodiments of the present disclosure;
fig. 5 is a schematic structural diagram illustrating a sampling optimization apparatus provided in an embodiment of the present disclosure;
fig. 6 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, as presented in the figures, is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the disclosure. All other embodiments obtained by a person skilled in the art based on the embodiments of the disclosure without creative effort shall fall within the protection scope of the disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an associative relationship, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
First, terms used in the embodiments of the present application are introduced and explained:
artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The basic artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is the science of how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to perform machine vision tasks such as identification, tracking and measurement of targets, and performs further image processing so that the processed image is more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning and inductive learning.
Decision intelligence comprises fully tuned training configurations, efficient algorithm implementations and a library of pre-trained models, and can help researchers and engineers quickly get started with reinforcement learning, verify ideas, and build production-service baseline models.
Deep Learning (DL) is a newer research direction within machine learning; it was introduced to bring machine learning closer to its original goal: artificial intelligence.
Deep learning learns the intrinsic laws and levels of representation of sample data, and the information obtained in this learning process greatly helps the interpretation of data such as text, images and sounds. Its ultimate goal is to enable machines to have human-like analysis and learning capabilities and to recognize data such as text, images and sounds.
In the training process of a deep learning model, adjusting the sampling probability of the training samples is an important means of improving training efficiency and model performance. For example, importance sampling speeds up training by increasing the sampling probability of samples with large loss values.
Research shows that existing sampling methods (such as importance sampling) are mainly suited to scenarios containing many easy samples. How to optimize the sampling probability of training samples for deep learning training tasks in different scenarios is therefore an important problem that urgently needs to be solved.
Based on the above research, the present disclosure provides a sampling optimization method, comprising: acquiring a training sample data set, wherein the training sample data set comprises image samples with labels; performing initial sampling on the training sample data set, training a target model based on the sampling result to obtain a pre-trained target model, and determining features of each image sample based on the pre-trained target model; constructing a probability function based on the features of each image sample; and determining a target sampling probability for each image sample based on the probability function and the pre-trained target model.
In the embodiment of the disclosure, a target model is pre-trained to obtain a pre-trained target model, the features of each image sample are determined based on the pre-trained target model, a probability function is then constructed based on those features, and the target sampling probability of each image sample is determined based on the probability function and the pre-trained target model. In this way, the sampling probability of training samples can be optimized for target models in different scenarios, thereby improving the training efficiency and performance of the model.
In addition, the sampling optimization method may be applied to a terminal, to a server, or to an implementation environment consisting of a terminal and a server. The sampling optimization method may also be implemented as software running on a terminal or server, such as an application program with a sampling optimization function.
The terminal may be, but is not limited to, a smartphone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The server may be an independent physical server, a server cluster or distributed system composed of a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. In some possible implementations, the sampling optimization method may be implemented by a processor calling computer-readable instructions stored in a memory.
Referring to fig. 1, a flowchart of a sampling optimization method provided by an embodiment of the present disclosure is shown; the sampling optimization method includes the following steps S101 to S104:
S101, a training sample data set is obtained, wherein the training sample data set comprises image samples with labels.
The training sample data set includes a plurality of labeled image samples. Specifically, a labeled image sample refers to image data whose detection result is known. The type of the training sample data set may also differ across application scenarios; taking a liveness detection scenario as an example, the image samples may include labeled face image data, where labeled face image data refers to face image data whose face liveness detection result is known.
Optionally, an image sample may be a target-region image (e.g., a face region) extracted after a target image is captured by an image acquisition device or an electronic device; it may also be a target image from the Internet or a third-party application (e.g., image processing software), or a target image stored in advance in a database, which is not limited in this application.
S102, initially sampling the training sample data set, training a target model based on a sampling result to obtain a pre-trained target model, and determining the characteristics of each image sample based on the pre-trained target model.
The target model is the model to be trained, and it may differ across application scenarios: for example, a face recognition model applied to a liveness detection scenario, or a human-body feature detection model applied to an illegal-behavior detection scenario; it may also be a model to be trained in other scenarios, which is not limited herein.
For example, the initial sampling may be performed on the training sample data set with the same sampling probability for every sample; that is, the sampling probability used for the initial sampling is not yet optimized, the target model is trained in a conventional manner to obtain a pre-trained target model, and the features of each image sample are then determined based on the pre-trained target model.
The features of each image sample include a loss function value of the image sample and/or a density value of the image sample. Specifically, the loss function value of an image sample refers to the difference between the output of the target model and the label during training; the density value of an image sample refers to the proportion of samples of its kind in the training sample data set. For example, if the training sample data set includes 10 image samples, of which 4 are class-A image samples, 3 are class-B image samples and 3 are class-C image samples, the density value of each class-A image sample is 4/10.
In other embodiments, the features of each image sample may also include other features. For example, in classification model training, an entropy may be computed from the per-class probabilities of the classification output; such an entropy-style feature (e.g., a cross-entropy) measures the dissimilarity between two probability distributions. A sketch of the feature computation is given below.
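As a minimal illustration (assuming a PyTorch-style classifier trained with cross-entropy; the function and variable names below are hypothetical, not part of the disclosure), the loss and density features described above could be computed from the pre-trained model as follows:

```python
import torch
import torch.nn.functional as F
from collections import Counter

def compute_features(model, images, labels):
    """Return the per-sample (loss value, density value) features."""
    model.eval()
    with torch.no_grad():
        logits = model(images)  # forward pass of the pre-trained target model
        # per-sample loss: difference between the model output and the label
        losses = F.cross_entropy(logits, labels, reduction="none")
    # density value: proportion of samples in the data set sharing the label
    counts = Counter(labels.tolist())
    densities = torch.tensor([counts[y] / len(labels) for y in labels.tolist()])
    return losses, densities
```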
And S103, constructing a probability function based on the characteristics of each image sample.
Illustratively, after the features of each image sample are obtained, the sampling probability of each image sample may be modeled as a hyper-parameterized function of the features of the image sample. Specifically, the sampling probability function p(x) of an image sample x is defined as p(x) = H(g(x)), where H is a piecewise linear function and g is a linear weighting function of the features of the image sample x: if each image sample x has n features f1, …, fn, then g(x) = w1·f1(x) + w2·f2(x) + … + wn·fn(x). The hyper-parameters of the sampling probability function p(x) are: the weight coefficients w of the respective features, the segment endpoints of the piecewise linear function H, and the function values at those endpoints.
Illustratively, when g(x) ranges from 0 to 1, p(x) = g(x) + 1; and when g(x) ranges from 1 to 2, p(x) = 2·g(x).
Thus, in some embodiments, the probability function comprises a piecewise linear function and a linear weighting function of the respective features of each image sample, and the hyper-parameters in the probability function comprise the weight coefficients of the respective features of each image sample, the segment endpoints of the piecewise linear function, and the function values at those segment endpoints.
It should be noted that, since the features of each image sample in the sampling probability function are fixed, the sampling probability changes whenever the hyper-parameters in the probability function change; therefore, different sampling probabilities can be explored by searching over different hyper-parameters of the probability function, as illustrated in the sketch below.
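A minimal sketch of such a probability function, assuming NumPy's np.interp as the piecewise linear function H; the endpoint values below reproduce the worked example above (p(x) = g(x) + 1 on [0, 1] and p(x) = 2·g(x) on [1, 2]):

```python
import numpy as np

def g(features, weights):
    """Linear weighting g(x) = w1*f1(x) + w2*f2(x) + ... + wn*fn(x)."""
    return float(np.dot(weights, features))

def H(value, endpoints, endpoint_values):
    """Piecewise linear function given by segment endpoints and their values."""
    return float(np.interp(value, endpoints, endpoint_values))

def sampling_probability(features, weights, endpoints, endpoint_values):
    """p(x) = H(g(x)); weights, endpoints and endpoint values are the hyper-parameters."""
    return H(g(features, weights), endpoints, endpoint_values)

# Worked example: H maps 0 -> 1, 1 -> 2, 2 -> 4,
# i.e. p = g + 1 on [0, 1] and p = 2*g on [1, 2].
print(sampling_probability([0.5, 0.2], weights=[1.0, 1.0],
                           endpoints=[0.0, 1.0, 2.0],
                           endpoint_values=[1.0, 2.0, 4.0]))  # g = 0.7, p = 1.7
```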
And S104, determining the target sampling probability of each image sample based on the probability function and the pre-trained target model.
For example, the training sample data set may be sampled according to the sampling probability given by the probability function, the pre-trained target model may be micro-trained based on the sampling result, and whether the current sampling probability is good may be verified from the micro-training result. Here, micro-training refers to training whose number of iterations is less than the preset total number of training iterations.
In the embodiment of the disclosure, the target model is pre-trained to obtain a pre-trained target model, the features of each image sample are determined based on the pre-trained target model, a probability function is then constructed based on those features, and the target sampling probability of each image sample is determined based on the probability function and the pre-trained target model. In this way, the sampling probability of training samples can be optimized for target models in different scenarios, thereby improving the training efficiency and performance of the model.
Referring to fig. 2, for the above S104, determining the target sampling probability of each image sample based on the probability function and the pre-trained target model may include the following S1041 to S1043:
S1041, performing a random hyper-parameter search on the probability function through Bayesian optimization to determine initially optimized hyper-parameters, and obtaining an initially optimized sampling probability of each image sample based on the initially optimized hyper-parameters.
Bayesian optimization finds the optimum of an objective function by building a surrogate function from past evaluations of the objective. The Bayesian approach differs from random or grid search in that it consults previous evaluation results when choosing the next set of hyper-parameters to try, which improves efficiency.
Specifically, Bayesian optimization maintains a statistical model of the mapping from hyper-parameter values to the objective evaluated on a validation set; intuitively, it assumes this mapping is a smooth but noisy function. The goal is to collect observations so as to evaluate the machine learning model as few times as possible while revealing as much information as possible about the function, and in particular about the location of its optimum. Bayesian optimization relies on a very general prior over functions which, when combined with the observed hyper-parameter values and corresponding outputs, yields a distribution over functions. The method makes observations (experimental runs) by iteratively choosing hyper-parameters in a way that balances exploration (hyper-parameters with the most uncertain results) and exploitation (hyper-parameters expected to give good results). A toy illustration follows.
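By way of illustration only (not part of the disclosed method), the following snippet runs Bayesian optimization on a toy one-dimensional objective with the scikit-optimize library; the objective function and search range are stand-ins chosen for the example:

```python
from skopt import gp_minimize

# Toy objective standing in for "micro-training result as a function of a
# hyper-parameter"; gp_minimize fits a Gaussian-process surrogate to past
# evaluations and picks each next point by trading off exploration
# (uncertain regions) and exploitation (regions expected to score well).
result = gp_minimize(
    func=lambda x: (x[0] - 0.3) ** 2,  # smaller is better
    dimensions=[(0.0, 1.0)],           # search range of one hyper-parameter
    n_calls=15,                        # total number of objective evaluations
    random_state=0,
)
print(result.x, result.fun)            # best hyper-parameter and its objective value
```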
It can be understood that, in the first hyper-parameter search over the probability function, no previous evaluations are available for reference, so a random hyper-parameter search may be performed on the probability function to obtain the initially optimized sampling probability of each image sample.
S1042, sampling the training sample data set based on the initially optimized sampling probability of each image sample, and performing micro-training on the pre-trained target model based on the sampling result to obtain a micro-training result; micro-training refers to training whose number of iterations is less than the preset total number of training iterations.
After the initially optimized sampling probability of each image sample is obtained, the training sample data set can be sampled based on it, and the pre-trained target model can be micro-trained on the sampling result to obtain a micro-training result. Then, according to the obtained micro-training result, a hyper-parameter search is performed on the probability function through Bayesian optimization to obtain suboptimal hyper-parameters, a suboptimal sampling probability of each image sample is obtained based on the suboptimal hyper-parameters, and the suboptimal sampling probability is in turn verified by micro-training.
Micro-training refers to training the target model for a small number of iterations; for example, if 1000 training iterations are preset in total, a micro-training run may consist of only 10 iterations.
S1043, determining the target sampling probability of each image sample based on the micro-training results and the total number of training iterations.
For example, referring to fig. 3, for step S1043, determining the target sampling probability of each image sample based on the micro-training results and the total number of training iterations may include the following steps S10431 to S10433:
S10431, based on the micro-training result, performing a hyper-parameter search on the probability function through Bayesian optimization to obtain suboptimal hyper-parameters, and obtaining a suboptimal sampling probability of each image sample based on the suboptimal hyper-parameters.
S10432, sampling the training sample data set based on the suboptimal sampling probability of each image sample, and performing the micro-training on the pre-trained target model based on the sampling result to obtain a micro-training result.
S10433, repeating the above steps until the cumulative number of iterations across all micro-training runs is greater than or equal to the preset total number of training iterations, and determining the target sampling probability of each image sample according to the micro-training result obtained from each micro-training run.
For example, after the suboptimal sampling probability of each image sample is obtained, it may be verified: the pre-trained target model is micro-trained based on the suboptimal sampling probability of each image sample, and the sampling probability of each image sample corresponding to the best micro-training result among all micro-training results is then determined as the target sampling probability of each image sample. The best micro-training result is the one for which the loss between the output of the micro-trained pre-trained target model and the labels of the image samples is smallest; for example, if the target model is a classification model, the highest classification accuracy indicates the best micro-training result.
In the embodiment of the disclosure, the suboptimal sampling probability of each image sample is obtained through Bayesian optimization, and the pre-trained target model is then micro-trained based on that probability; in other words, Bayesian optimization is combined with micro-training, which greatly reduces the computing resources consumed in searching for the sampling probability and improves the efficiency of sampling-probability optimization.
It is to be understood that, in other embodiments, the target sampling probability of each image sample may also be determined from the micro-training result during the repeated micro-training process: if the training result meets a preset requirement, the current suboptimal sampling probability of each image sample is determined to be the target sampling probability of each image sample; if it does not, the process returns to the step of searching the hyper-parameters of the probability function through Bayesian optimization (S10431) until the training result meets the preset requirement, yielding the target sampling probability of each image sample. A sketch of the overall search loop is given below.
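A non-limiting sketch of the search loop of S1041 to S1043, using scikit-optimize's ask/tell interface as one possible Bayesian optimizer; micro_train, the search dimensions and the budget bookkeeping are hypothetical stand-ins, and a smaller micro-training loss is assumed to mean a better result:

```python
from skopt import Optimizer

def search_sampling_probability(dimensions, micro_train, micro_iters, total_iters):
    """dimensions: search space of the probability-function hyper-parameters.
    micro_train(hyper_params) -> validation loss after a short training run."""
    optimizer = Optimizer(dimensions)      # the first ask() is effectively a random draw
    trials, iterations_spent = [], 0
    while iterations_spent < total_iters:  # stop once the cumulative micro-training
        hyper_params = optimizer.ask()     # iterations reach the preset total
        loss = micro_train(hyper_params)   # micro-training result (smaller is better)
        optimizer.tell(hyper_params, loss) # feed the result back to the surrogate
        trials.append((loss, hyper_params))
        iterations_spent += micro_iters    # e.g. 10 iterations per micro-training run
    best_loss, best_params = min(trials)   # hyper-parameters of the best micro-training result
    return best_params
```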
Referring to fig. 4, a flowchart of another sampling optimization method provided by an embodiment of the present disclosure is shown; it differs from the sampling method of fig. 1 in that it further includes the following S105:
S105, sampling the training sample data set according to the target sampling probability of each image sample, and training the target model based on the sampling result to obtain a trained target model.
Illustratively, after the target sampling probability is obtained through Bayesian optimization, the target model can be fully trained using that probability (for example, as sketched below) to obtain a trained target model, thereby improving the training efficiency and model performance of the target model.
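As one possible realization (an assumption on our part, not mandated by the disclosure), PyTorch's WeightedRandomSampler can draw training samples in proportion to the optimized per-sample probabilities during the final full training:

```python
from torch.utils.data import DataLoader, WeightedRandomSampler

def make_loader(dataset, target_probs, batch_size=64):
    """Build a DataLoader that samples each item with its target sampling probability."""
    sampler = WeightedRandomSampler(
        weights=target_probs,      # target sampling probability of each image sample
        num_samples=len(dataset),  # number of draws per epoch
        replacement=True,
    )
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```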
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation; the specific execution order of the steps should be determined by their function and possible inherent logic.
Based on the same technical concept, an embodiment of the present disclosure further provides a sampling optimization apparatus corresponding to the sampling optimization method. Since the principle by which the apparatus solves the problem is similar to that of the sampling optimization method described above, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again.
Referring to fig. 5, a schematic diagram of a sampling optimization apparatus 500 according to an embodiment of the present disclosure is shown, where the sampling optimization apparatus 500 includes:
a sample obtaining module 501, configured to obtain a training sample data set, where the training sample data set includes image samples with labels;
a model training module 502, configured to perform initial sampling on the training sample data set, train a target model based on a sampling result, obtain a pre-trained target model, and determine a feature of each image sample based on the pre-trained target model;
a function construction module 503, configured to construct a probability function based on the features of each image sample;
a probability optimization module 504, configured to determine a target sampling probability for each image sample based on the probability function and the pre-trained target model.
In a possible implementation manner, the probability optimization module 504 is specifically configured to:
performing a random hyper-parameter search on the probability function through Bayesian optimization to determine initially optimized hyper-parameters, and obtaining an initially optimized sampling probability of each image sample based on the initially optimized hyper-parameters;
sampling the training sample data set based on the initially optimized sampling probability of each image sample, and performing micro-training on the pre-trained target model based on the sampling result to obtain a micro-training result, wherein micro-training refers to training whose number of iterations is less than a preset total number of training iterations;
determining the target sampling probability of each image sample based on the micro-training results and the preset total number of training iterations.
In a possible implementation manner, the probability optimization module 504 is specifically configured to:
performing a hyper-parameter search on the probability function through Bayesian optimization based on the micro-training result to obtain suboptimal hyper-parameters, and obtaining a suboptimal sampling probability of each image sample based on the suboptimal hyper-parameters;
sampling the training sample data set based on the suboptimal sampling probability of each image sample, and performing the micro-training on the pre-trained target model based on the sampling result to obtain a micro-training result;
and repeating the above steps until the cumulative number of iterations across all micro-training runs is greater than or equal to the preset total number of training iterations, and determining the target sampling probability of each image sample according to the micro-training result obtained from each micro-training run.
In a possible implementation manner, the probability optimization module 504 is specifically configured to:
and determining the sampling probability of each image sample corresponding to the best micro-training result in each micro-training result as the target sampling probability of each image sample.
In a possible implementation, the model training module 502 is specifically configured to:
the initial sampling is performed on the training sample data set with the same sampling probability.
In a possible embodiment, the features of each image sample include a loss function value of the image sample and/or a density value of the image sample.
In a possible implementation, the probability function includes a piecewise linear function and a linear weighting function of the respective features of each image sample, and the hyper-parameters in the probability function include the weight coefficients of the respective features of each image sample, the segment endpoints of the piecewise linear function, and the function values at those segment endpoints.
In a possible implementation, the model training module 502 is further configured to:
and sampling the training sample data set according to the target sampling probability of each image sample, and training the target model based on the sampling result to obtain a trained target model.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Based on the same technical concept, an embodiment of the disclosure also provides an electronic device. Referring to fig. 6, a schematic structural diagram of an electronic device 700 provided in the embodiment of the present disclosure includes a processor 701, a memory 702 and a bus 703. The memory 702 is used for storing execution instructions and includes an internal memory 7021 and an external memory 7022; the internal memory 7021 temporarily stores operation data of the processor 701 and data exchanged with the external memory 7022 (such as a hard disk), and the processor 701 exchanges data with the external memory 7022 via the internal memory 7021.
In this embodiment, the memory 702 is specifically configured to store application program codes for executing the scheme of the present application, and is controlled by the processor 701 to execute. That is, when the electronic device 700 is operated, the processor 701 and the memory 702 communicate with each other through the bus 703, so that the processor 701 executes the application program code stored in the memory 702, thereby executing the method described in any of the foregoing embodiments.
The memory 702 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 701 may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present disclosure may be implemented or executed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 700. In other embodiments of the present application, the electronic device 700 may include more or fewer components than shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The embodiments of the present disclosure also provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the sampling optimization method in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute steps of the sampling optimization method in the foregoing method embodiments, which may be referred to specifically for the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the art may still modify, or readily conceive of changes to, the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features, within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and shall all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (11)

1. A method for sampling optimization, comprising:
acquiring a training sample data set, wherein the training sample data set comprises image samples with labels;
carrying out initial sampling on the training sample data set, training a target model based on a sampling result to obtain a pre-trained target model, and determining the characteristics of each image sample based on the pre-trained target model;
constructing a probability function based on the features of each image sample;
determining a target sampling probability for each image sample based on the probability function and the pre-trained target model.
2. The method of claim 1, wherein the probability function comprises a hyper-parameter, and wherein determining the target sampling probability for each image sample based on the probability function and the pre-trained target model comprises:
performing a random hyper-parameter search on the probability function through Bayesian optimization to determine initially optimized hyper-parameters, and obtaining an initially optimized sampling probability of each image sample based on the initially optimized hyper-parameters;
sampling the training sample data set based on the initially optimized sampling probability of each image sample, and performing micro-training on the pre-trained target model based on the sampling result to obtain a micro-training result, wherein micro-training refers to training whose number of iterations is less than a preset total number of training iterations;
determining the target sampling probability of each image sample based on the micro-training results and the preset total number of training iterations.
3. The method of claim 2, wherein determining the target sampling probability of each image sample based on the micro-training results and the total number of training iterations comprises:
performing a hyper-parameter search on the probability function through Bayesian optimization based on the micro-training result to obtain suboptimal hyper-parameters, and obtaining a suboptimal sampling probability of each image sample based on the suboptimal hyper-parameters;
sampling the training sample data set based on the suboptimal sampling probability of each image sample, and performing the micro-training on the pre-trained target model based on the sampling result to obtain a micro-training result;
and repeating the above steps until the cumulative number of iterations across all micro-training runs is greater than or equal to the preset total number of training iterations, and determining the target sampling probability of each image sample according to the micro-training result obtained from each micro-training run.
4. The method of claim 3, wherein determining the target sampling probability for each image sample according to the micro-training result obtained from each micro-training comprises:
and determining the sampling probability of each image sample corresponding to the best micro-training result in each micro-training result as the target sampling probability of each image sample.
5. The method according to any of claims 1 to 4, wherein said initially sampling said set of training sample data comprises:
the initial sampling is performed on the training sample data set with the same sampling probability.
6. The method of any of claims 1 to 5, wherein the features of each image sample comprise a loss function value of the image sample and/or a density value of the image sample.
7. The method according to any one of claims 1 to 4, wherein the probability function comprises a piecewise linear function and a linear weighting function of the respective features of each image sample, and the hyper-parameters in the probability function comprise weight coefficients of the respective features of each image sample, segment endpoints of the piecewise linear function, and function values at those segment endpoints.
8. The method of any of claims 1-7, further comprising:
and sampling the training sample data set according to the target sampling probability of each image sample, and training the target model based on the sampling result to obtain a trained target model.
9. A sampling optimization apparatus, characterized in that the apparatus comprises:
a sample acquisition module, configured to acquire a training sample data set, wherein the training sample data set comprises image samples with labels;
the model training module is used for carrying out initial sampling on the training sample data set, training a target model based on a sampling result to obtain a pre-trained target model, and determining the characteristics of each image sample based on the pre-trained target model;
a function construction module for constructing a probability function based on the characteristics of each image sample;
and the probability optimization module is used for determining the target sampling probability of each image sample based on the probability function and the pre-trained target model.
10. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the sampling optimization method of any one of claims 1-8.
11. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, is adapted to carry out the sample optimization method according to any one of claims 1 to 8.
CN202111269562.2A 2021-10-29 2021-10-29 Sampling optimization method and device, electronic equipment and storage medium Pending CN114004364A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111269562.2A CN114004364A (en) 2021-10-29 2021-10-29 Sampling optimization method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111269562.2A CN114004364A (en) 2021-10-29 2021-10-29 Sampling optimization method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114004364A 2022-02-01

Family

ID=79925012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111269562.2A Pending CN114004364A (en) 2021-10-29 2021-10-29 Sampling optimization method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114004364A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115512391A (en) * 2022-09-29 2022-12-23 珠海视熙科技有限公司 Target detection model training method, device and equipment for data adaptive resampling
CN115512391B (en) * 2022-09-29 2023-07-21 珠海视熙科技有限公司 Target detection model training method, device and equipment for data self-adaptive resampling
CN117194791A (en) * 2023-09-18 2023-12-08 上海鱼尔网络科技有限公司 Sampling method, sampling device, computer equipment and storage medium for recommendation algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination