WO2022174805A1 - Model training and image processing method, apparatus, electronic device and storage medium - Google Patents

Model training and image processing method, apparatus, electronic device and storage medium

Info

Publication number
WO2022174805A1
WO2022174805A1 (PCT/CN2022/076751)
Authority
WO
WIPO (PCT)
Prior art keywords
sample
training
model
samples
difficult
Prior art date
Application number
PCT/CN2022/076751
Other languages
English (en)
French (fr)
Inventor
马东宇
朱烽
赵瑞
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2022174805A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G06V 40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Definitions

  • the present application relates to computer technology, in particular to a model training and image processing method, apparatus, electronic device and storage medium.
  • neural network models are usually trained using training sample sets.
  • some specific scenarios may include only a small amount of sample data, so the constructed training sample set may lack relevant samples for those scenarios;
  • the sample data included in the set is therefore unbalanced, so the model cannot learn the relevant information of those specific scenarios well and performs poorly in them.
  • the face image data included in the above face image set may be unbalanced: only a small amount of image data may be included for certain scenes, such as children or faces wearing masks, so the face recognition model cannot learn the relevant face recognition information for such scenes well and performs poorly in specific scenes such as children or mask wearing.
  • the present application discloses at least one model training method, the method including:
  • inputting several training samples into a model to obtain a loss value corresponding to each training sample, wherein the training samples include training samples of multiple sample types;
  • storing at least part of the training samples in sample sets corresponding to the sample types to which they belong, and training the model based on the training samples included in the sample sets.
  • determining, based on the loss value, the sample type to which at least part of the training samples belong includes:
  • determining difficult samples among the training samples based on the loss value, and determining the sample type to which each difficult sample belongs according to the sample features of that difficult sample.
  • the above method further includes:
  • a sample set corresponding to each feature center is established.
  • each sample type corresponds to M reference images, where M is a positive integer;
  • inputting the reference images corresponding to each sample type into the model to obtain the feature center corresponding to each sample type includes:
  • taking a weighted average of the M reference features corresponding to each sample type to obtain the feature center corresponding to that sample type.
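The weighted-average step above can be sketched as follows. This is a minimal illustration, not the application's implementation: the feature vectors and the type names ("child", "mask") are invented for the example, and a real model would produce the M reference features by forward-propagating the M reference images.

```python
import numpy as np

def feature_center(reference_features, weights=None):
    """Combine the M reference features of one sample type into a feature center.

    reference_features: array of shape (M, D), one feature per reference image.
    weights: optional length-M weights; a plain average is used when omitted.
    """
    feats = np.asarray(reference_features, dtype=float)
    if weights is None:
        weights = np.ones(len(feats))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()   # normalise so the result is a weighted mean
    return weights @ feats              # shape (D,)

# One feature center per sample type (illustrative 2-D features, M = 2)
centers = {
    name: feature_center(feats)
    for name, feats in {
        "child": [[1.0, 0.0], [0.8, 0.2]],
        "mask":  [[0.0, 1.0], [0.2, 0.8]],
    }.items()
}
```

With uniform weights this reduces to the ordinary mean of the M reference features.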
  • determining the sample type to which the difficult sample belongs according to the sample features of the difficult sample includes:
  • the above method further includes:
  • storing at least part of the training samples in the sample sets corresponding to the sample types to which they belong includes:
  • determining difficult samples among the several training samples based on the loss value includes:
  • determining the N loss values with the largest values among the loss values corresponding to the training samples, where N is a positive integer;
  • determining the training samples corresponding to the N loss values as the difficult samples.
  • alternatively, determining difficult samples among the several training samples based on the loss value includes:
  • when the loss value corresponding to a training sample reaches a second preset threshold, determining that training sample as a difficult sample.
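Both selection strategies can be sketched in a few lines. The loss values and thresholds below are illustrative only; in the application they come from the model's loss function and from experience, respectively.

```python
def topn_difficult(losses, n):
    """First strategy: indices of the N samples with the largest loss values."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:n]

def threshold_difficult(losses, second_threshold):
    """Second strategy: indices whose loss reaches (>=) the second preset threshold."""
    return [i for i, loss in enumerate(losses) if loss >= second_threshold]

# Illustrative per-sample losses from one training pass
losses = [0.2, 1.5, 0.9, 3.1, 0.1]
```

For example, with N = 2 the first strategy picks the samples with losses 3.1 and 1.5, while a second preset threshold of 0.9 selects every sample whose loss is at least 0.9.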
  • the above method further includes:
  • when the number of stored difficult samples reaches the first preset threshold and the stored difficult samples are input into the model for training, the difficult samples corresponding to the P loss values with the largest values obtained in this training are stored in the sample sets corresponding to the sample types to which those difficult samples belong.
  • the above method further includes:
  • inputting several training samples into the model to obtain the loss value corresponding to each training sample includes:
  • training the model based on the training samples included in the sample set includes:
  • the first preset threshold is the number of samples included in the batch data.
  • the above method further includes:
  • before the training samples are used to train the model, the model is pre-trained with pre-training samples, wherein the pre-training samples include pre-training samples of multiple sample types.
  • the present application also discloses an image processing method, the method comprising:
  • the above image processing model includes a model trained based on the model training method shown in any of the foregoing embodiments.
  • the present application also discloses a model training device, the device comprising: an input module for inputting several training samples into a model to obtain a loss value corresponding to each training sample; wherein, the training samples include training samples of multiple sample types;
  • a determination module configured to update the model parameters of the above-mentioned model according to the above-mentioned loss value, and determine the difficult samples in each training sample based on the above-mentioned loss value;
  • an update and determination module configured to update the model parameters of the above-mentioned model according to the above-mentioned loss value, and determine the sample type to which at least some of the above-mentioned training samples belong to based on the above-mentioned loss value;
  • the storage and training module is configured to store the at least part of the training samples in sample sets corresponding to the sample types to which they belong, and to train the model based on the training samples included in the sample sets.
  • the present application also discloses an image processing device, the device comprising:
  • the acquisition module is used to acquire the target image
  • an image processing module configured to perform image processing on the above-mentioned target image through an image processing model to obtain an image processing result corresponding to the above-mentioned target image
  • the above image processing model includes a model trained based on the model training method shown in any of the foregoing embodiments.
  • the application also discloses an electronic device, the device comprising:
  • a processor;
  • a memory for storing processor-executable instructions;
  • the above processor is configured to invoke the executable instructions stored in the memory to implement the above model training method or image processing method.
  • the present application also discloses a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is used to implement the aforementioned model training method or image processing method.
  • the above solution can determine the difficult samples based on the loss value obtained from the training.
  • the training samples can be classified and stored, the model can be trained based on the difficult samples in the sample sets, and the model parameters can be updated. On the one hand, difficult samples are screened out and trained on while the training samples are used to train the model, so there is no need to construct a separate training set of difficult samples for independent training, which reduces developers' workload; on the other hand, various types of difficult samples can be used to train the model, increasing the number of times each type of difficult sample optimizes the model, so the model performs better in each specific scenario.
  • FIG. 2 is a flowchart of a model training method shown in this application;
  • FIG. 3 is a schematic flowchart of a model training method shown in this application;
  • FIG. 5 is a schematic diagram of the internal structure of a memory unit shown in this application;
  • FIG. 6 is a schematic structural diagram of a model training device shown in this application;
  • FIG. 7 is a schematic diagram of a hardware structure of an electronic device shown in this application.
  • FIG. 1 is a schematic flowchart of a traditional model training method shown in this application. It should be noted that the description of the process shown in FIG. 1 is only a schematic description of the process of the model training method, and fine-tuning can be performed in practical applications.
  • S102 (not shown) usually needs to be executed first to prepare training samples during model training.
  • the above training samples can usually be a collection of multiple face images annotated with human objects.
  • the original images can usually be labeled with ground truth by means of manual labeling or machine-assisted labeling.
  • image annotation software can be used to annotate the human object indicated by the face included in the original image, thereby obtaining several training samples. It should be noted that, when the true value is labeled, one-hot coding and other methods may be used for labeling, and this application does not limit the specific labeling method.
  • S104 may be executed to generate batch data from the above-mentioned several training samples by random sampling in each training process. After the above batch data is obtained, the batch data can be input into the above model for training.
  • the above batch data may specifically include several training samples.
  • the above-mentioned training samples need to be input into the above-mentioned model for training during the current round of iterative training, so as to update the parameters of the above-mentioned model.
  • the above training samples may be face images marked with true values.
  • this application does not specifically limit the number of samples included in the batch data.
  • a single-data model training scheme can also be used, and this scheme can refer to the batch data scheme, which will not be described in detail here.
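As a sketch of step S104, batch data can be drawn from the training samples by random sampling. Everything here is illustrative: the sample names stand in for labelled face images, and the batch size is arbitrary since the application does not limit it.

```python
import random

random.seed(0)  # fixed seed so the example is deterministic

# Stand-ins for face images annotated with ground truth (hypothetical names)
training_samples = [f"face_{i}" for i in range(10)]
BATCH_SIZE = 4

def make_batch(samples, batch_size):
    """S104: generate one batch of training data by random sampling."""
    return random.sample(samples, batch_size)

batch = make_batch(training_samples, BATCH_SIZE)
```

`random.sample` draws without replacement, so one batch never contains the same training sample twice; across training iterations the same sample can of course reappear.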
  • the forward propagation method can be used in the above model to obtain the feature map corresponding to each training sample, and the face recognition result corresponding to each training sample obtained by this training is output through the connected classifier.
  • S106 may be executed: the face recognition results obtained in this training and the true values annotated on the corresponding training samples are input into the preset loss function to calculate the loss value corresponding to each training sample.
  • the above-mentioned preset loss function may be a loss function commonly used in the field of face recognition, which is not particularly limited here.
  • S108 may be executed, and the above-mentioned model parameters are updated by back-propagating the gradient through the gradient descent method.
  • the above gradient descent method may be Stochastic Gradient Descent (SGD), Batch Gradient Descent (BGD), or Mini-Batch Gradient Descent (MBGD), which is not particularly limited here.
  • the above-mentioned S102-S108 may be repeatedly performed until the above-mentioned model converges.
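The S102-S108 loop can be illustrated with a toy model. This is only a sketch: a logistic-regression classifier on synthetic data stands in for the face recognition network, binary cross-entropy stands in for the preset loss function, and all sizes and learning rates are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# S102: prepare labelled training samples (synthetic stand-ins for face images)
X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = (X @ w_true > 0).astype(float)

w = np.zeros(3)           # the "model parameters" to be adjusted
lr = 0.5
for step in range(200):   # repeat S104-S108 until the model converges
    idx = rng.choice(len(X), size=16, replace=False)   # S104: random batch
    xb, yb = X[idx], y[idx]
    p = 1.0 / (1.0 + np.exp(-(xb @ w)))                # forward propagation
    # S106: per-sample loss (binary cross-entropy against the true values)
    loss = -(yb * np.log(p + 1e-9) + (1 - yb) * np.log(1 - p + 1e-9))
    # S108: back-propagate the gradient and update the parameters
    grad = xb.T @ (p - yb) / len(xb)
    w -= lr * grad

accuracy = np.mean(((X @ w) > 0) == (y > 0.5))
```

Here a fixed iteration count stands in for a convergence condition; the application leaves the specific convergence condition open.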
  • the above is the traditional model training method. It is not difficult to find that, since the face image data included in the training samples may be unbalanced, only a small amount of image data may be included for specific scenes such as children or mask wearing (i.e. difficult samples), so the face recognition model cannot learn the relevant face recognition information for such scenes well, and the model performs poorly in specific scenes such as children or mask wearing.
  • the present application proposes a model training method.
  • in this model training method, in the process of using training samples to train the model, difficult samples are selected from the training sample set, batch data is formed with the difficult-sample type as the dimension, and the model learns from them in a concentrated way. On the one hand, difficult samples are screened out and trained on, so there is no need to build a separate training set of difficult samples for independent training, which reduces developers' workload; on the other hand, various types of difficult samples can be used to train the model, increasing the number of times each type of difficult sample optimizes the model, so the model performs better in specific scenarios.
  • FIG. 2 is a method flowchart of a model training method shown in this application. As shown in Figure 2, the above method may include:
  • S202: Input several training samples into the model to obtain a loss value corresponding to each training sample, wherein the training samples include training samples of multiple sample types.
  • S204: Update the model parameters of the model according to the loss value, and determine, based on the loss value, the sample type to which at least part of the training samples belong.
  • S206: Store the at least part of the training samples in sample sets corresponding to the sample types to which they belong, and train the model based on the training samples included in the sample sets.
  • the above model training method can be applied to electronic devices.
  • the above-mentioned electronic device may execute the above-mentioned model training method by carrying a software system corresponding to the model training method.
  • the types of the above electronic devices may be notebook computers, computers, servers, mobile phones, PAD terminals, etc., which are not particularly limited in this application.
  • the above model training method can be executed only by the terminal device or the server device alone, or can be executed by the terminal device and the server device in cooperation.
  • the above model training methods can be integrated on the client side.
  • after receiving a model training request, the terminal device equipped with the client can provide computing power through its own hardware environment to execute the above model training method.
  • the above model training method can be integrated into the system platform.
  • the server device equipped with the system platform can provide computing power through its own hardware environment to execute the above model training method.
  • the above model training method can be divided into two tasks: constructing a training sample set and performing model training based on the training sample set.
  • the construction of the training sample set can be integrated in the client and carried on the terminal device.
  • the model training task can be integrated on the server and carried on the server device.
  • the above terminal device may initiate a model training request to the above server device after constructing the training sample set.
  • the server device may, in response to the request, perform training on the model based on the training sample set.
  • the execution subject is an electronic device (hereinafter referred to as a device) as an example for description.
  • the above model may be a model constructed based on a neural network.
  • the above-mentioned models can be models of different structures and uses.
  • the above model may be a face recognition model constructed based on a convolutional network (hereinafter referred to as a "model").
  • the above model may be an image processing model constructed based on an LSTM (Long Short-Term Memory) network.
  • the above-mentioned model may be a human body recognition model constructed based on a convolutional network, and so on. The embodiments are described below by taking the field of face recognition as an example.
  • the above-mentioned model parameters specifically refer to various parameters that need to be adjusted in the above-mentioned model. It can be understood that training the model is actually a process of continuously adjusting the above model parameters. When the model converges, it is considered that the above model parameters are adjusted optimally.
  • model convergence means that the model reaches a certain preset convergence condition during the training process. It is understandable that the model convergence can be considered to have completed this training.
  • the present application does not specifically limit the specific conditions for model convergence.
  • before the training samples are used to train the model, the model may be pre-trained with pre-training samples, wherein the pre-training samples include pre-training samples of multiple sample types. This can speed up model convergence and improve model training efficiency.
  • At least part of the training samples in the above-mentioned several training samples may refer to difficult samples.
  • the above-mentioned difficult samples specifically refer to training samples with large loss values (ie, difficult-to-learn samples) that appear in the training process. It is understandable that difficult samples can usually represent data in infrequent scenarios. Therefore, the prediction of difficult samples by a model trained on data in common scenarios is usually inaccurate. It can be seen that in this application, it is feasible to determine the difficult samples through the loss value obtained by the model training.
  • difficult samples can be specific types of image data such as face images wearing masks, children's face images, and elderly face images.
  • the difficult samples in the several training samples can be determined based on the loss value. Then, the sample type to which the above difficult sample belongs is determined according to the sample characteristics corresponding to the above difficult sample.
  • FIG. 3 is a schematic flowchart of a model training method shown in this application. It should be noted that the description of the process shown in FIG. 3 is only a schematic description of the process of the model training method, and fine-tuning may be performed in practical applications. FIG. 3 does not show the process of updating model parameters by backpropagation.
  • the memory unit shown in FIG. 3 is a virtual unit that can be implemented in code: it stores difficult samples and outputs the stored difficult samples when their number reaches a first preset threshold.
  • the first preset threshold may be a value set according to experience.
  • the size of the above-mentioned first preset threshold may be the same as the number of samples included in the batch data.
  • the above-mentioned memory unit may include a sample set, a counter and an output subunit.
  • the above-mentioned sample set can be used to store difficult samples.
  • the aforementioned counter can be used to indicate the number of difficult samples stored in the memory unit.
  • the above-mentioned output subunit is used to obtain and output the stored difficult samples from the sample set.
  • the above-described sample set may include a linear data structure, such as in the form of a queue. It can be understood that, when the sample set is in the form of a queue, the maximum capacity corresponding to the queue can be set as the above-mentioned first preset threshold. At this time, when the queue data is full, it can be considered that the number of stored difficult samples has reached the first preset threshold. Of course, at this time, the above-mentioned counter may not necessarily be included in the above-mentioned memory unit.
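The queue-form sample set can be sketched with a bounded queue, where "the queue is full" plays the role of the counter. The threshold value and sample labels below are illustrative; in the application the threshold would typically equal the batch size.

```python
from collections import deque

FIRST_PRESET_THRESHOLD = 4   # illustrative; typically the batch size

# Queue-form sample set whose maximum capacity is the first preset threshold
sample_set = deque(maxlen=FIRST_PRESET_THRESHOLD)

def store(sample):
    """Store one difficult sample; return a full batch once the queue fills."""
    sample_set.append(sample)
    if len(sample_set) == FIRST_PRESET_THRESHOLD:   # queue full: threshold reached
        batch = list(sample_set)
        sample_set.clear()       # the output subunit empties the set after output
        return batch
    return None

outputs = [store(s) for s in ["a", "b", "c", "d", "e"]]
```

Because fullness of the queue itself signals that the threshold is reached, no separate counter is needed, matching the remark above.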
  • S302 (not shown) needs to be executed first to prepare training samples.
  • the above training samples can usually be a collection of multiple face images annotated with human objects.
  • the original images can usually be labeled with ground truth by means of manual labeling or machine-assisted labeling.
  • image annotation software can be used to annotate the human object indicated by the face included in the original image, thereby obtaining several training samples. It should be noted that, when constructing a training sample, one-hot encoding and other methods may be used for construction, and this application does not limit the specific method of constructing a training sample.
  • training samples including multiple sample types may be randomly sampled.
  • the above sample type is specifically used to indicate the scene type to which the sample belongs.
  • in the field of face recognition, when a sample image includes a child's face, the sample can be considered to belong to the child sample type;
  • when the sample image includes an elderly person's face, the sample can be considered to belong to the elderly sample type;
  • when the sample image includes a face wearing a mask, the sample can be considered to belong to the mask-wearing sample type. Sampling in this way ensures that the training samples include various types of samples, improving the training effect.
  • S202 may be executed to input the several training samples into the model to obtain the loss value corresponding to each training sample.
  • S304 may be executed to construct batch data based on several training samples, and input the above batch data into the model for training.
  • batch data is generated from the above-mentioned several training samples by random sampling. After the above batch data is obtained, the batch data can be input into the above model for training.
  • the forward propagation method can be used in the above model to obtain the feature map corresponding to each training sample, and the face recognition result corresponding to each training sample obtained by this training is output through the connected classifier.
  • this application does not specifically limit the number of samples included in the batch data.
  • a single-data model training scheme can also be used, and this scheme can refer to the batch data scheme, which will not be described in detail here.
  • S306 may be executed: the face recognition results obtained in this training and the true values annotated on the corresponding training samples are input into the preset loss function to calculate the loss value corresponding to each training sample.
  • the above-mentioned preset loss function may be a loss function commonly used in the field of face recognition, which is not particularly limited here.
  • S204 may be executed, the model parameters of the above-mentioned model are updated according to the above-mentioned loss value, and the difficult sample in each training sample is determined based on the above-mentioned loss value.
  • on the one hand, S308 (not shown in the figure) can be performed to update the model parameters by back-propagating the gradient through the gradient descent method; on the other hand, S310 can be performed to determine, based on the loss value corresponding to each sample, the difficult samples included in the training samples.
  • N loss values with larger values may be determined among the loss values corresponding to each training sample.
  • the above N is a positive integer.
  • the loss values corresponding to each training sample can be sorted in descending order. After the sorting is completed, the top N loss values may be determined as N loss values with larger numerical values. It should be noted here that the above N may be a numerical value set according to experience. The present application does not specifically limit the numerical value of N.
  • the training samples corresponding to the above N loss values respectively may be determined as the above difficult samples.
  • the above second preset threshold may be a value set according to experience. Reaching the second preset threshold covers both being greater than and being equal to the threshold.
  • the above-mentioned second preset threshold may be a reference line for measuring whether the training sample is a difficult sample. If the loss value corresponding to any training sample reaches the above-mentioned second preset threshold, the training sample is determined as the above-mentioned difficult sample.
  • S204 may be continued to determine the sample type of the above-mentioned difficult sample.
  • the sample type of the above difficult sample is specifically used to indicate the scene type to which the difficult sample belongs.
  • the difficult sample belongs to the type of child sample.
  • the image included in the difficult sample is the face of the elderly, it can be considered that the difficult sample belongs to the type of the elderly sample.
  • the image included in the difficult sample is a face wearing a mask, it can be considered that the difficult sample belongs to the type of sample wearing a mask.
  • the feature of the difficult sample extracted by the above model can be compared with the feature center of each sample type, and the sample type corresponding to the feature center that matches can be determined as the sample type of the difficult sample.
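One plausible way to realise this comparison, assuming cosine similarity as the matching criterion (the application does not fix one), is to assign the difficult sample to the type whose feature center is most similar to its feature. The centers and features below are invented 2-D examples.

```python
import numpy as np

def match_sample_type(sample_feature, centers):
    """Return the sample type whose feature center best matches the feature.

    centers: mapping from sample-type name to feature-center vector.
    Similarity criterion here: cosine similarity (an assumption).
    """
    f = np.asarray(sample_feature, dtype=float)
    best_type, best_sim = None, -np.inf
    for name, center in centers.items():
        c = np.asarray(center, dtype=float)
        sim = (f @ c) / (np.linalg.norm(f) * np.linalg.norm(c))
        if sim > best_sim:
            best_type, best_sim = name, sim
    return best_type

# Illustrative feature centers, e.g. obtained from each type's reference images
centers = {"child": [0.9, 0.1], "mask": [0.1, 0.9]}
```

Any other proximity measure (e.g. Euclidean distance, as in G06F 18/22 matching criteria) could be substituted without changing the overall flow.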
  • the above sample set can be used to store difficult samples.
  • the reference image corresponding to each sample type can be input into the above model to obtain the feature center corresponding to each sample type; wherein, the above-mentioned feature center is used to determine the sample type to which the difficult sample belongs. Then a sample set corresponding to each feature center is established.
  • when the model is trained based on the difficult samples in the sample sets, it may be determined whether the number of difficult samples in each sample set reaches a first preset threshold. If so, the difficult samples in that sample set are input into the model for training and the model parameters are updated; otherwise, difficult samples continue to accumulate.
  • S206 may be executed to store the above-mentioned difficult samples in a sample set corresponding to the sample type to which the above-mentioned difficult samples belong, and train the above-mentioned model based on the above-mentioned difficult samples in the above-mentioned sample set.
  • the above-mentioned difficult samples may be stored in the above-mentioned memory unit.
  • the above-mentioned memory unit may determine whether the number of stored difficult samples reaches the above-mentioned first preset threshold periodically or after each difficult sample is received. If it is reached, the stored difficult samples are input into the above model for training, and the above model parameters are updated. If not reached, no action is performed.
  • the first preset threshold is the number of samples included in the batch data.
  • S312 may be executed to construct the stored difficult samples into batch data, and input the above model for training to update the above model parameters.
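The classify-store-and-emit behaviour of the memory unit (S206/S312) can be sketched as follows. This is a simplified model under stated assumptions: the threshold, type names, and sample labels are illustrative, and the returned batch stands in for the batch data that would be fed back into the model for training.

```python
from collections import defaultdict

FIRST_PRESET_THRESHOLD = 3   # illustrative; typically the batch size

class MemoryUnit:
    """Classify and store difficult samples per sample type; when any type's
    sample set reaches the first preset threshold, emit it as one batch."""

    def __init__(self):
        self.sample_sets = defaultdict(list)   # one sample set per sample type

    def store(self, sample, sample_type):
        group = self.sample_sets[sample_type]
        group.append(sample)
        if len(group) >= FIRST_PRESET_THRESHOLD:
            batch = list(group)    # output subunit: take the stored samples
            group.clear()          # the set empties once its contents are output
            return batch           # S312: caller trains the model on this batch
        return None

mem = MemoryUnit()
emitted = [mem.store(s, t) for s, t in
           [("img1", "mask"), ("img2", "child"), ("img3", "mask"), ("img4", "mask")]]
```

Because each batch contains difficult samples of a single type, the model trains on each infrequent scenario in a concentrated way, which is the point of forming batch data with the sample type as the dimension.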
  • in this way, the model can perform the computation conveniently.
  • the above scheme can determine the difficult samples based on the loss values obtained in training. After the difficult samples are determined, the training samples can be classified and stored, the model can be trained based on the difficult samples in the sample sets, and the model parameters can be updated. On the one hand, difficult samples are screened out and trained on while the training samples are used to train the model, so there is no need to construct a separate training set of difficult samples for independent training, which reduces developers' workload; on the other hand, various types of difficult samples can be used to train the model, increasing the number of times each type of difficult sample optimizes the model, so the model performs better in each specific scenario.
  • in order to enable the model to learn relevant sample information in various specific scenarios and thereby improve its performance in those scenarios, when performing S206 to store difficult samples, the sample type to which each difficult sample belongs (i.e. the scene to which it belongs) may be determined first. After the sample type is determined, the difficult samples are classified and stored.
  • the above sample type is specifically used to indicate the scene type to which the difficult sample belongs.
  • if the image included in the difficult sample is a child's face, it can be considered that the difficult sample belongs to the child sample type.
  • if the image included in the difficult sample is an elderly person's face, it can be considered that the difficult sample belongs to the elderly sample type.
  • if the image included in the difficult sample is a face wearing a mask, it can be considered that the difficult sample belongs to the mask-wearing sample type.
  • FIG. 4 is a schematic flowchart of a model training method shown in the present application. It should be noted that the description of the process shown in FIG. 4 is only a schematic description of the process of the model training method, and fine-tuning can be performed in practical applications. FIG. 4 does not show the process of updating model parameters by backpropagation.
  • the memory unit shown in FIG. 4 is specifically a virtual unit, which can be implemented by specific code: it classifies and stores difficult samples, and when the number of difficult samples of any type reaches a first preset threshold, the difficult samples in that set are input into the above model for training, and the above model parameters are updated.
  • the above-mentioned memory unit may include several sample sets corresponding to the sample types, counters and output subunits.
  • the above-mentioned several sample sets corresponding to the sample types are used to store various types of difficult sample data.
  • the above counters can be used to indicate the number of difficult samples stored in each sample set.
  • the above-mentioned output subunit is used to obtain and output the stored difficult samples from the set of samples that meet the conditions.
  • the above-described sample set may include a linear data structure, such as in the form of a queue. It can be understood that, when the sample set is in the form of a queue, the maximum capacity corresponding to the queue can be set as the above-mentioned first preset threshold. At this time, when the queue data is full, it can be considered that the number of stored difficult samples has reached the first preset threshold. Of course, at this time, the above-mentioned counter may not necessarily be included in the above-mentioned memory unit.
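The queue-backed sample set described above can be sketched as follows in Python. This is a minimal illustration, assuming the maximum capacity equals the first preset threshold (the batch size); the class and method names are assumptions for the sketch.

```python
from collections import deque

class SampleSet:
    """A per-type store for difficult samples, backed by a queue whose
    maximum capacity equals the first preset threshold (the batch size)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.queue = deque()

    def add(self, sample):
        self.queue.append(sample)

    def is_full(self):
        # a full queue means the number of stored difficult samples
        # has reached the first preset threshold
        return len(self.queue) >= self.batch_size

    def drain(self):
        """Pop all stored samples to build one batch of training data."""
        batch = list(self.queue)
        self.queue.clear()
        return batch
```

With this form, no separate counter is needed: checking `is_full()` plays the role of the counter described in the memory unit.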
  • FIG. 5 is a schematic diagram of the internal structure of a memory unit shown in the present application. It should be noted that the internal schematic shown in FIG. 5 is only a schematic illustration, and fine-tuning can be performed in practical applications.
  • the above-mentioned memory unit may include various sample types.
  • the sample types included in the memory unit can be preset according to actual business requirements. For example, when the business requirements need to improve the face recognition ability of the model for the elderly and children, the above-mentioned memory unit can set the sample type of the elderly and the sample type of children. For another example, when the business requirements need to improve the face recognition ability of the model for the elderly, children and people wearing masks, the above memory unit can set the sample type of the elderly, the sample type of children and the sample type of wearing masks. The following description will be given by taking as an example that the memory unit includes the elderly sample type and the child sample type.
  • the above-mentioned memory unit may also include a normal sample type.
  • the above-mentioned normal sample type is used to store difficult samples in general scenarios (ie, non-specific scenarios).
  • the training samples include three types: the elderly, adults, and children.
  • adults correspond to the conventional (normal) scene.
  • therefore, the memory unit may also include difficult sample data representing adults in general scenarios.
  • the first sample type may indicate a normal type; the second sample type may indicate an elderly type; and the third sample type may indicate a child type.
  • a corresponding sample set may be created in the memory unit for each sample type.
  • the above sample set is in the form of a queue.
  • the maximum capacity corresponding to each queue may be set to the above-mentioned first preset threshold (batch data size). When the data of any queue is full, it can be considered that the number of difficult samples stored in the queue has reached the above-mentioned first preset threshold.
  • the first sample type queue can be used to store normal-type difficult samples; the second sample type queue can be used to store elderly-type difficult samples; the third sample type queue can be used to store child-type difficult samples.
  • the feature center corresponding to each sample type can also be determined.
  • the reference image corresponding to each sample type may be input into the above model to obtain the feature center corresponding to each sample type.
  • the above-mentioned feature center is specifically used to determine the sample type to which the difficult sample belongs.
  • the feature centers may be characterized in the form of feature vectors.
  • the sample type to which the above difficult sample belongs can be determined by finding the feature center that is most similar to the sample feature corresponding to the difficult sample.
  • the sample features specifically refer to the features obtained after performing convolution operations and pooling operations on difficult samples.
  • sample features may be characterized in the form of feature vectors.
  • face images belonging to each sample type may be selected first.
  • the memory unit includes an elderly sample type and a child sample type
  • a child face image and an elderly face image can be selected as reference images.
  • the reference image corresponding to each sample type can be input into the above model for forward propagation to obtain the feature center corresponding to each sample type.
  • M reference images may be selected for each sample type.
  • M is a positive integer. It can be understood that, in some examples, the number of reference images selected for each sample type may also differ. For example, 10 images are selected for the elderly type and 8 images for the child type. The following description assumes the same number of reference images is selected for each sample type.
  • the first reference image set may include M reference images of normal type; the second reference image set may include M reference images of elderly type; the third reference image set may include M reference images of child type image.
  • the M reference images corresponding to each sample type may be input into the above model to obtain M reference features corresponding to each sample type.
  • the above-mentioned reference features may include features obtained by performing operations such as convolution and pooling on the reference image.
  • the aforementioned reference features can be characterized in the form of feature vectors.
  • the M reference features corresponding to each sample type are obtained, the M reference features corresponding to each sample type are weighted and averaged to obtain a feature center corresponding to each sample type.
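The weighted average above can be sketched in a few lines of Python. The function name is an assumption; with all weights set to 1 (as the application suggests) this reduces to a plain mean of the M reference features.

```python
import numpy as np

def feature_center(reference_features, weights=None):
    """Weighted average of the M reference features of one sample type.

    reference_features: array of shape (M, D) -- feature vectors
    produced by the model for the M reference images of one type.
    """
    feats = np.asarray(reference_features, dtype=float)
    if weights is None:
        # the application allows all weights to be 1
        weights = np.ones(len(feats))
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * feats).sum(axis=0) / weights.sum()
```

For two reference features `[1, 0]` and `[3, 2]` with equal weights, the feature center is `[2, 1]`.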
  • the above-mentioned M is an empirical threshold, which is not particularly limited here.
  • the above-mentioned reference feature is a feature map obtained by performing feature extraction on the reference image through the above-mentioned model (for example, several convolution operations).
  • weight used in the above weighted average is not particularly limited in this application.
  • the above weight may be 1.
  • the sample types included in the above-mentioned memory unit may not be predetermined.
  • a clustering algorithm such as K-MEANS can be used to cluster the obtained difficult samples to obtain the sample types included in the memory unit.
  • the sample features of each of the above-mentioned difficult samples obtained by the above-mentioned model can be compared to obtain the similarity of the above-mentioned difficult samples. Then, based on the similarity of the above-mentioned difficult samples, the categories to which the different sample data in the above-mentioned difficult samples belong are classified.
  • the above-mentioned difficult samples may include several unknown sample types, and the above-mentioned clustering algorithm can reasonably classify the difficult samples to obtain several sample types.
  • the feature center of each of the above-mentioned sample sets can be obtained by calculating the average similarity of the difficult samples in the sample set corresponding to each category. Therefore, when new difficult samples are obtained, the sample features of the newly obtained difficult samples can be compared for similarity with the feature centers of each of the above-mentioned sample sets, and the newly obtained difficult samples can be stored in the sample sets of their corresponding categories.
  • in this way, manual determination of the sample types can be avoided: unsupervised clustering is performed according to the actual distribution of the difficult samples, so as to obtain sample types that better fit the actual situation, thereby improving the prediction effect of the model.
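The unsupervised grouping described above can be illustrated with a minimal K-MEANS over difficult-sample features: the cluster labels define the sample types, and the cluster means can serve as feature centers. This is a self-contained sketch, not the patent's implementation; the function name and interface are assumptions.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Minimal K-MEANS: returns (labels, centers) for the given
    difficult-sample feature vectors."""
    feats = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # initialize centers from k distinct samples
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # assign each sample to its nearest center
        dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned samples
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return labels, centers
```

On two well-separated groups of features, the two clusters recover the two underlying sample types regardless of which samples seed the centers.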
  • the first feature center can be the feature center corresponding to the normal type; the second feature center can be the feature center corresponding to the elderly type; the third feature center can be the feature center corresponding to the child type.
  • the sample type to which the target difficult sample belongs can be determined through the three feature centers.
  • S404 may be continued to construct batch data based on several training samples, and input the above batch data into the model for training.
  • S406 may be executed: the face recognition results obtained by training and the true values corresponding to the training samples are input into the preset loss function to calculate the loss value corresponding to each training sample.
  • on the one hand, S408 (not shown in the figure) can be executed, and the above-mentioned model parameters can be updated by back-propagating the gradient through the gradient descent method; on the other hand, S410 can be executed, and the difficult samples included in the above training samples are determined based on the loss value corresponding to each training sample obtained after forward propagation.
  • S412 may be executed to determine the sample type to which the above difficult sample belongs.
  • the similarity between the above-mentioned sample features and each feature center can be determined through a similarity calculation scheme such as cosine distance or Mahalanobis distance.
  • a similarity calculation scheme such as cosine distance or Mahalanobis distance.
  • the highest similarity among the above-mentioned similarities can be determined, and the sample type corresponding to the feature center with the highest similarity is determined as the sample type to which the above-mentioned difficult sample belongs.
  • the above determined similarities may be sorted in descending order, and the first similarity may be determined as the highest similarity.
  • the feature center corresponding to the highest similarity may be determined by querying the maintained correspondence.
  • the sample type corresponding to the feature center may be determined as the sample type to which the above-mentioned difficult sample belongs.
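The assignment step above — compute similarities against each feature center, take the highest, and return its sample type — can be sketched as follows, using cosine similarity (one of the similarity schemes the application names). The function name and dictionary interface are illustrative assumptions.

```python
import numpy as np

def assign_sample_type(sample_feature, feature_centers):
    """Return the sample type whose feature center is most similar
    (by cosine similarity) to the difficult sample's feature vector.

    feature_centers: dict mapping sample type -> feature center vector.
    """
    f = np.asarray(sample_feature, dtype=float)
    best_type, best_sim = None, -np.inf
    for sample_type, center in feature_centers.items():
        c = np.asarray(center, dtype=float)
        # cosine similarity between sample feature and feature center
        sim = f @ c / (np.linalg.norm(f) * np.linalg.norm(c))
        if sim > best_sim:
            best_type, best_sim = sample_type, sim
    return best_type
```

A Mahalanobis-distance variant would follow the same structure, replacing only the similarity expression.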
  • S414 may be executed to store the above-mentioned difficult sample in a sample set corresponding to the sample type to which the above-mentioned difficult sample belongs.
  • the above-mentioned difficult samples may be stored in a queue corresponding to the sample type to which the above-mentioned difficult samples belong.
  • the image data corresponding to the difficult sample can be inserted into the child type queue (ie, the third sample type queue).
  • S416 may be executed, and the difficult samples in the sample set are input into the above-mentioned model for training, and the above-mentioned model parameters are updated.
  • when any sample set queue included in the above-mentioned memory unit is full, it can be considered that the number of difficult samples stored in the queue has reached the first preset threshold.
  • the difficult samples stored in the above queue can be extracted to construct batch data. After the batch data is constructed, the batch data can be input into the above model for training, and the model parameters can be updated.
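The dispatch logic above — when any per-type store reaches the first preset threshold, drain it into batch data and run one extra training pass — can be sketched as follows. `train_step` is a hypothetical stand-in for the forward pass, back-propagation, and parameter update; the memory unit is modeled as a plain dict for illustration.

```python
def flush_full_sample_sets(memory_unit, batch_size, train_step):
    """Drain every full per-type sample set into batch data and
    train on it.

    memory_unit: dict mapping sample type -> list of stored samples
    batch_size: the first preset threshold
    train_step: callable taking one batch (stand-in for one
                forward/backward pass and parameter update)
    """
    trained_types = []
    for sample_type, samples in memory_unit.items():
        if len(samples) >= batch_size:
            batch = samples[:batch_size]
            del samples[:batch_size]  # remove the drained samples
            train_step(batch)  # one extra optimization pass on this type
            trained_types.append(sample_type)
    return trained_types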
  • the model can be trained for a variety of specific types of difficult samples.
  • the trained model can have better performance in a variety of scenarios of this specific type; on the other hand, it is not necessary to establish training samples for multiple types separately, reducing the workload of developers.
  • the difficult samples corresponding to the P loss values with larger values are stored in the sample set corresponding to the sample type to which each difficult sample belongs.
  • P is a positive integer set according to experience.
  • specifically, the sample types of the difficult samples corresponding to the P loss values can be determined, and the difficult samples corresponding to the P loss values are stored in the sample sets corresponding to the sample types to which they belong.
  • the loss value corresponding to each difficult sample obtained in this training can also be calculated.
  • the difficult samples corresponding to the P larger loss values are stored in the sample set corresponding to the sample type to which each difficult sample belongs. Therefore, the difficult samples with larger loss values can be stored multiple times and used to train the model multiple times, thereby increasing the number of times this type of difficult sample optimizes the model, so that the model performs better on this type of difficult sample.
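The re-storage step above can be sketched as follows: after training on a batch of difficult samples, the P samples with the largest fresh loss values are put back into the sample sets of their types, so the hardest samples are trained on repeatedly. The function name and the `type_of` callable are illustrative assumptions.

```python
def restore_top_p(samples, losses, p, memory_unit, type_of):
    """Put the p hardest samples of this training pass back into the
    per-type sample sets of the memory unit.

    samples:     difficult samples just trained on
    losses:      loss value of each sample from this training pass
    p:           empirically chosen positive integer
    memory_unit: dict mapping sample type -> list of stored samples
    type_of:     callable returning the sample type of a sample
    """
    # indices ordered by loss, largest first
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    for i in order[:p]:
        memory_unit.setdefault(type_of(samples[i]), []).append(samples[i])
```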
  • the present application also proposes an image processing method.
  • This method can be applied to any electronic device.
  • This method performs image processing by using the image processing model trained by the training method shown in any of the foregoing embodiments, so that the above-mentioned image processing model performs well not only in conventional scenarios but also in various specific scenarios, improving image processing.
  • the above method may include:
  • a target image is acquired and image processing is performed on the above target image through an image processing model to obtain an image processing result corresponding to the above target image.
  • the above-mentioned target image can be any image that needs to be processed.
  • the above-mentioned target image may be an image containing a face object.
  • the above-mentioned image processing model can be any model that needs to perform image processing.
  • the above-mentioned image processing model may be a face recognition model.
  • the present application further provides a model training device.
  • FIG. 6 is a schematic structural diagram of a model training apparatus shown in the present application.
  • the above-mentioned apparatus 600 may include: an input module 610 for inputting several training samples into a model to obtain a loss value corresponding to each training sample; wherein, the above-mentioned training samples include training samples of multiple sample types;
  • an update and determination module 620 configured to update the model parameters of the above-mentioned model according to the above-mentioned loss value, and determine the sample type to which at least some of the above-mentioned training samples belong to the above-mentioned several training samples based on the above-mentioned loss value;
  • the storage and training module 630 is configured to store the above at least part of the training samples into a sample set corresponding to the sample type to which they belong, and to train the above model based on the training samples included in the above sample set.
  • the above updating and determining module 620 includes:
  • a first determination module configured to determine the difficult samples in the above-mentioned several training samples based on the above-mentioned loss value
  • the second determining module is configured to determine the sample type to which the above difficult sample belongs according to the sample characteristics corresponding to the above difficult sample.
  • the above-mentioned apparatus 600 further includes:
  • a sample set corresponding to each feature center is established.
  • each sample type corresponds to M reference images respectively; wherein, the above-mentioned M is a positive integer; the above-mentioned establishment module is specifically used for:
  • the M reference features corresponding to each sample type are weighted and averaged respectively to obtain the feature center corresponding to each sample type.
  • the above-mentioned updating and determining module 620 is specifically used for:
  • the above-mentioned apparatus 600 further includes:
  • the classification module compares the sample features of each of the above-mentioned difficult samples obtained through the above-mentioned model, and obtains the similarity of the above-mentioned difficult samples;
  • the above-mentioned updating and determining module 620 is specifically used for:
  • the above-mentioned updating and determining module 620 is specifically used for:
  • determining the N loss values with larger values among the loss values corresponding to each training sample; wherein, the above N is a positive integer;
  • the training samples corresponding to the above N loss values are determined as the above difficult samples.
  • the above-mentioned updating and determining module 620 is specifically used for:
  • the training sample is determined as the above-mentioned difficult sample.
  • the above-mentioned apparatus 600 further includes:
  • the storage module is configured to, after the number of stored difficult samples reaches the first preset threshold and the stored difficult samples are input into the above-mentioned model for training, store the difficult samples corresponding to the P loss values with larger values among the loss values obtained in this training into the sample set corresponding to the sample type to which each difficult sample belongs.
  • the above-mentioned apparatus 600 further includes:
  • the batch processing module constructs batch data based on the above training samples before inputting several training samples into the model;
  • a number of training samples are input into the model above, and the loss value corresponding to each training sample is obtained, including:
  • the above-mentioned storage and training module 630 is specifically used for:
  • the first preset threshold is the number of samples included in the batch data.
  • the above-mentioned apparatus 600 further includes:
  • the pre-training module uses the pre-training samples to pre-train the model before using the above-mentioned training samples for model training; wherein, the above-mentioned pre-training samples include pre-training samples of multiple sample types.
  • the present application also proposes an image processing apparatus, and the above-mentioned apparatus may include:
  • the acquisition module is used to acquire the target image
  • the image processing module is configured to perform image processing on the above target image by using an image processing model to obtain an image processing result corresponding to the above target image.
  • the above-mentioned image processing model includes a model obtained based on the model training method shown in any of the foregoing embodiments.
  • model training apparatus or the image processing apparatus shown in this application can be applied to electronic devices.
  • an electronic device which may include: a processor;
  • memory for storing processor-executable instructions
  • the above-mentioned processor is configured to call the executable instructions stored in the above-mentioned memory to implement the model training method or the image processing method as shown above.
  • FIG. 7 is a schematic diagram of a hardware structure of an electronic device shown in this application.
  • the electronic device may include a processor for executing instructions, a network interface for making network connections, a memory for storing operating data for the processor, and a model training device or image processing device for storing Non-volatile memory for the corresponding instruction.
  • the embodiments of the foregoing apparatus may be implemented by software, or may be implemented by hardware or a combination of software and hardware.
  • as a device in a logical sense, it is formed by the processor of the electronic device where the device is located reading the corresponding computer program instructions from the non-volatile memory into the memory for execution.
  • the electronic device where the apparatus is located in the embodiment may also include other hardware, which is not described in detail here.
  • the corresponding instructions of the model training apparatus or the image processing apparatus may also be directly stored in the memory, which is not limited herein.
  • the present application proposes a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is used to execute the model training method or the image processing method as shown above.
  • one or more embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • Embodiments of the subject matter and functional operations described in this application can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware (which can include the structures disclosed in this application and their structural equivalents), or in a combination of one or more of them.
  • Embodiments of the subject matter described in this application may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus.
  • alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode and transmit information to a suitable receiver apparatus for execution by a data processing apparatus.
  • the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.
  • the processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows described above can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, eg, an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
  • a computer suitable for the execution of a computer program may include, for example, a general and/or special purpose microprocessor, or any other type of central processing unit.
  • the central processing unit will receive instructions and data from read only memory and/or random access memory.
  • the basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operably coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, to receive data therefrom, send data thereto, or both.
  • the computer does not have to have such a device.
  • the computer may be embedded in another device such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory may be supplemented by or incorporated in special purpose logic circuitry.


Abstract

This application discloses a model training method, an image processing method, apparatuses, an electronic device and a storage medium. The method may include: inputting several training samples into a model to obtain a loss value corresponding to each training sample, where the training samples include training samples of multiple sample types; updating the model parameters of the model according to the loss values, and determining, based on the loss values, the sample type to which at least some of the training samples belong; and storing the at least some training samples into sample sets corresponding to the sample types to which they belong, and training the model based on the training samples included in the sample sets.

Description

Model training and image processing method, apparatus, electronic device and storage medium
Cross-reference to related applications
This disclosure claims priority to Chinese patent application No. 202110198534.X filed on February 22, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to computer technology, and in particular to model training and image processing methods, apparatuses, electronic devices and storage media.
Background
In the field of neural networks, a training sample set is usually used to train a neural network model. In practice, certain specific scenarios may include only a small amount of sample data, so the constructed training sample set may lack relevant samples for such specific scenarios. As a result, when a neural network is trained on such a training sample set, the sample data included in the set is imbalanced, so the model cannot learn the relevant information of certain specific scenarios well and performs poorly in those scenarios.
For example, in the field of face recognition, a face recognition model often needs to be trained on a set of face images (the training sample set), in the expectation that the model will achieve good face recognition results.
However, because the face image data included in the above set may be imbalanced, specific scenarios such as children or mask wearing may include only a small amount of image data. As a result, the face recognition model cannot learn well how to recognize faces in such specific scenarios and performs poorly on children, mask wearers, and the like.
Summary
In view of this, this application discloses at least a model training method, the method including:
inputting several training samples into a model to obtain a loss value corresponding to each training sample, where the training samples include training samples of multiple sample types;
updating the model parameters of the model according to the loss values, and determining, based on the loss values, the sample type to which at least some of the training samples belong;
storing the at least some training samples into sample sets corresponding to the sample types to which they belong, and training the model based on the training samples included in the sample sets.
In some illustrated embodiments, determining, based on the loss values, the sample type to which at least some of the training samples belong includes:
determining difficult samples among the training samples based on the loss values;
determining the sample type to which a difficult sample belongs according to the sample feature corresponding to the difficult sample.
In some illustrated embodiments, the method further includes:
inputting the reference images corresponding to each sample type into the model to obtain the feature center corresponding to each sample type, where the feature centers are used to determine the sample type to which a difficult sample belongs;
establishing a sample set corresponding to each feature center.
In some illustrated embodiments, each sample type corresponds to M reference images, where M is a positive integer;
inputting the reference images corresponding to each sample type into the model to obtain the feature center corresponding to each sample type includes:
inputting the M reference images corresponding to each sample type into the model to obtain M reference features corresponding to each sample type;
performing a weighted average of the M reference features corresponding to each sample type to obtain the feature center corresponding to each sample type.
In some illustrated embodiments, determining the sample type to which a difficult sample belongs according to the sample feature corresponding to the difficult sample includes:
determining the similarity between the sample feature obtained after the difficult sample is input into the model and each feature center;
determining the highest similarity among the similarities, and determining the sample type corresponding to the feature center with the highest similarity as the sample type to which the difficult sample belongs.
In some illustrated embodiments, the method further includes:
comparing the sample features of each difficult sample obtained through the model to obtain the similarities of the difficult samples;
classifying the categories to which different sample data among the difficult samples belong, based on the similarities of the difficult samples.
In some illustrated embodiments, storing the at least some training samples into sample sets corresponding to the sample types to which they belong includes:
calculating the average similarity of the difficult samples in the sample set corresponding to each category to obtain the feature center of each sample set;
comparing the sample feature of a newly obtained difficult sample for similarity with the feature center of each sample set, and storing the newly obtained difficult sample into the sample set of its corresponding category.
In some illustrated embodiments, determining difficult samples among the training samples based on the loss values includes:
determining the N loss values with larger values among the loss values corresponding to the training samples, where N is a positive integer;
determining the training samples corresponding to the N loss values as the difficult samples.
In some illustrated embodiments, determining difficult samples among the training samples based on the loss values includes:
determining whether the loss value corresponding to each training sample reaches a second preset threshold;
if the loss value corresponding to any training sample reaches the second preset threshold, determining that training sample as a difficult sample.
In some illustrated embodiments, the method further includes:
after the number of stored difficult samples reaches a first preset threshold and the stored difficult samples are input into the model for training, storing the difficult samples corresponding to the P loss values with larger values among the loss values corresponding to the difficult samples obtained in this training into the sample sets corresponding to the sample types to which the difficult samples belong.
In some illustrated embodiments, the method further includes:
before inputting several training samples into the model, constructing batch data based on the training samples;
inputting several training samples into the model to obtain the loss value corresponding to each training sample includes:
inputting the batch data into the model to obtain the loss value corresponding to each training sample in the batch data.
In some illustrated embodiments, training the model based on the training samples included in the sample sets includes:
determining whether the number of difficult samples in each sample set reaches the first preset threshold;
if it does, inputting the difficult samples in that sample set into the model for training and updating the model parameters; otherwise, continuing to accumulate difficult samples.
In some illustrated embodiments, the first preset threshold is the number of samples included in the batch data.
In some illustrated embodiments, the method further includes:
before using the training samples for model training, pre-training the model with pre-training samples, where the pre-training samples include pre-training samples of multiple sample types.
This application further discloses an image processing method, the method including:
acquiring a target image;
performing image processing on the target image through an image processing model to obtain an image processing result corresponding to the target image;
where the image processing model includes a model trained by the model training method of any of the foregoing embodiments.
This application further discloses a model training apparatus, the apparatus including: an input module configured to input several training samples into a model to obtain a loss value corresponding to each training sample, where the training samples include training samples of multiple sample types;
a determining module configured to update the model parameters of the model according to the loss values and determine difficult samples among the training samples based on the loss values;
an updating and determining module configured to update the model parameters of the model according to the loss values and determine, based on the loss values, the sample type to which at least some of the training samples belong;
a storage and training module configured to store the at least some training samples into sample sets corresponding to the sample types to which they belong, and to train the model based on the training samples included in the sample sets.
This application further discloses an image processing apparatus, the apparatus including:
an acquiring module configured to acquire a target image;
an image processing module configured to perform image processing on the target image through an image processing model to obtain an image processing result corresponding to the target image;
where the image processing model includes a model trained by the model training method of any of the foregoing embodiments.
This application further discloses an electronic device, the device including:
a processor;
a memory for storing instructions executable by the processor;
where the processor is configured to invoke the executable instructions stored in the memory to implement the model training method or the image processing method described above.
This application further discloses a computer-readable storage medium storing a computer program, where the computer program is used to implement the model training method or the image processing method described above.
In this application, during model training, the above scheme can determine difficult samples based on the loss values obtained from training. After the difficult samples are determined, the training samples can be stored by category, and the model can be trained and its parameters updated based on the difficult samples in the sample sets. Therefore, on the one hand, difficult samples can be screened out and trained on during the process of training the model with the training samples, so there is no need to construct a separate training set for the difficult samples for independent training, reducing developers' workload; on the other hand, various types of difficult samples can be used to train the model, increasing the number of times each type of difficult sample optimizes the model, so that the model performs better in each specific scenario.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit this application.
Brief Description of the Drawings
In order to more clearly explain the technical solutions in one or more embodiments of this application or in the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments recorded in one or more embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a conventional model training method shown in this application;
FIG. 2 is a method flowchart of a model training method shown in this application;
FIG. 3 is a schematic flowchart of a model training method shown in this application;
FIG. 4 is a schematic flowchart of a model training method shown in this application;
FIG. 5 is a schematic diagram of the internal structure of a memory unit shown in this application;
FIG. 6 is a schematic structural diagram of a model training apparatus shown in this application;
FIG. 7 is a schematic diagram of the hardware structure of an electronic device shown in this application.
Detailed Description
Exemplary embodiments will be described in detail below, examples of which are shown in the drawings. When the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application; rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
The terms used in this application are only for the purpose of describing particular embodiments and are not intended to limit this application. The singular forms "a", "the above" and "the" used in this application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used herein refers to and includes any or all possible combinations of one or more of the associated listed items. It should also be understood that the word "if" as used herein may, depending on the context, be interpreted as "when", "upon", or "in response to determining".
在介绍本申请实施例前,先介绍传统的模型训练方法。以下以人脸识别领域为例进行实施例说明。
请参见图1,图1为本申请示出的一种传统模型训练方法的流程示意图。需要说明的是,图1示出的流程说明仅为针对模型训练方法流程的示意性说明,在实际应用中可以进行微调。
如图1所示,在进行模型训练时通常需要先执行S102(图未绘示),准备训练样本。
在人脸识别领域中,上述训练样本通常可以是标注了人员对象的多张人脸图像的集合。在准备上述训练样本时,通常可以采用人工标注或机器辅助标注的方式对原始图像进行真值标注。例如,在获取到原始图像后,可以使用图像标注软件对原始图像中包括的人脸指示的人员对象进行标注,从而得到若干训练样本。需要说明的是,在真值标注时可以采用one-hot编码等方式进行标注,本申请不对标注的具体方式进行限定。
在得到若干训练样本后,可以执行S104,在每次训练过程中通过随机采样的方式,从上述若干训练样本中生成批处理数据。在得到上述批处理数据后,可以将该批处理数据输入上述模型中进行训练。
上述批处理数据,具体可以包括若干训练样本。其中,上述训练样本需要在本轮迭代训练过程中被输入上述模型进行训练,以对上述模型进行参数更新。可以理解的是,在人脸识别领域中,上述训练样本可以是被标注了真值的人脸图像。需要说明的是,本申请不对批处理数据包括的样本数量进行特别限定。本申请中也可以采用单数据进行模型训练的方案,该方案可以参照批处理数据方案,在此不作详述。
在本步骤中,在上述模型中可以采用前向传播的方式得到与各训练样本对应的特征图,并通过连接的分类器输出本次训练得到的与各训练样本分别对应的人脸识别结果。
在得到上述人脸识别结果后,可以执行S106,将训练得到的各人脸识别结果与各识别结果分别对应的训练样本对应的真值输入预设的损失函数中计算各训练样本对应的损失值。
其中,上述预设的损失函数可以是在人脸识别领域中常用的损失函数,在此不作特别限定。
在得到各训练样本对应的损失值后,可以执行S108,通过梯度下降法,对梯度进行反向传播更新上述模型参数。
其中,上述梯度下降法可以是随机梯度下降法(Stochastic Gradient Descent,SGD),批量梯度下降法(Batch Gradient Descent,BGD),或小批量梯度下降法(Mini-Batch Gradient Descent,MBGD),在此不作特别限定。
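结合上文，梯度下降法的单步参数更新可以用如下 Python 草图示意（函数名 sgd_step 与学习率取值均为本文举例的假设，实际训练中该步骤通常由深度学习框架自动完成）：

```python
def sgd_step(params, grads, lr=0.1):
    """随机梯度下降（SGD）的单步参数更新示意：p ← p - lr * g。"""
    return [p - lr * g for p, g in zip(params, grads)]
```

批量梯度下降与小批量梯度下降的区别仅在于 grads 由多少个训练样本的梯度平均得到，更新公式本身一致。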
在执行完一次训练后,可以重复执行上述S102-S108,直至上述模型收敛。
以上即为传统的模型训练方法,不难发现,由于上述训练样本中包括的人脸图像数据可能并不均衡,针对诸如小孩或戴口罩等特定场景可能仅包括少量的图像数据(即困难样本),从而导致该人脸识别模型不能很好的学习到在该类特定场景下对人脸进行识别的相关信息,使该模型在小孩或戴口罩等特定场景下表现较差。
为了提升诸如人脸识别模型针对某些特定场景的表现性能,在传统技术中通常需要构建与特定场景相关的特定类型的训练样本,并对已经训练好的模型继续进行若干次训练,从而达到对该模型进行微调的效果。
不难发现相关技术中对模型的训练不仅需要针对不同的场景进行样本构建,而且还需要对该模型进行多次独立训练,可见,相关技术相对繁琐,对开发人员很不友好。
基于此，本申请提出一种模型训练方法。该方法通过在利用训练样本对模型进行训练过程中，从训练样本集中筛选出困难样本，并以困难样本类型为维度形成批处理数据，对模型进行集中训练学习，从而一方面，可以在训练过程中筛选出困难样本并针对该困难样本进行训练，从而无需针对困难样本单独构建训练集进行独立训练，减少开发人员工作量；另一方面，可以利用各种类型的困难样本对模型进行训练，从而增加各类型的困难样本对模型的优化次数，使得模型在各特定场景下表现更好。
请参见图2,图2为本申请示出的一种模型训练方法的方法流程图。如图2所示,上述方法可以包括:
S202,将若干训练样本输入模型,得到各训练样本对应的损失值;其中,上述训练样本包括多个样本类型的训练样本;
S204,根据上述损失值更新上述模型的模型参数,并基于上述损失值确定上述若干训练样本中至少部分训练样本所属的样本类型;
S206,将上述至少部分训练样本分别存储至与其所属的样本类型对应的样本集合中,以及基于上述样本集合包括的训练样本对上述模型进行训练。
上述模型训练方法可以应用于电子设备中。其中,上述电子设备可以通过搭载与模型训练方法对应的软件系统执行上述模型训练方法。本申请实施例中,上述电子设备的类型可以是笔记本电脑,计算机,服务器,手机,PAD终端等,在本申请中不作特别限定。
可以理解的是,上述模型训练方法既可以仅通过终端设备或服务端设备单独执行,也可以通过终端设备与服务端设备配合执行。
例如,上述模型训练方法可以集成于客户端。搭载该客户端的终端设备在接收到模型训练请求后,可以通过自身硬件环境提供算力执行上述模型训练方法。
又例如,上述模型训练方法可以集成于系统平台。搭载该系统平台的服务端设备在接收到模型训练请求后,可以通过自身硬件环境提供算力执行上述模型训练方法。
还例如,上述模型训练方法可以分为构建训练样本集与基于训练样本集进行模型训练两个任务。其中,构建训练样本集可以集成于客户端并搭载于终端设备。模型训练任务可以集成于服务端并搭载于服务端设备。上述终端设备可以在构建训练样本集后向上述服务端设备发起模型训练请求。上述服务端设备在接收到上述模型训练请求后,可以响应于上述请求基于上述训练样本集对上述模型进行训练。
以下以执行主体为电子设备(以下简称设备)为例进行说明。
上述模型,可以是基于神经网络构建的模型。在不同领域中,上述模型可以为不同结构与用途的模型。例如,在人脸识别领域中,上述模型可以为基于卷积网络构建的人脸识别模型(以下简称“模型”)。又例如,在自动驾驶领域中,上述模型可以为基于LSTM(长短期记忆模型)构建的图像处理模型。还例如,上述模型可以为基于卷积网络构建的人体识别模型等等。以下以人脸识别领域为例进行实施例说明。
上述模型参数具体是指上述模型中需要被调整的各类参数。可以理解的是，对模型进行训练实际是不断调整上述模型参数的过程。当模型收敛时，即认为上述模型参数被调整至最优。
上述模型收敛是指模型在训练过程中达到某种预设的收敛条件。可以理解的是,模型收敛可以认为已经完成了本次训练。本申请不对模型收敛的具体条件进行特别限定。
在一些实施例中，在利用上述训练样本进行模型训练前，可以利用预训练样本对该模型进行预训练；其中，上述预训练样本包括多个样本类型的预训练样本。由此可以加快模型收敛速度，提高模型训练效率。
上述若干训练样本中至少部分训练样本,可以是指困难样本。上述困难样本,具体是指在训练过程中出现的损失值较大的训练样本(即难学的样本)。可以理解的是,困难样本通常可以代表不经常出现的场景下的数据,因此,经过常见场景下的数据训练得到的模型针对困难样本的预测通常是不准确的。可见,在本申请中通过模型训练得到的损失值来确定困难样本是可以实施的。
例如,在人脸识别领域中,困难样本可以是戴口罩的人脸图像,小孩人脸图像,老人人脸图像等特定类型的图像数据。
此时在基于上述损失值确定上述若干训练样本中至少部分训练样本所属的样本类型时,可以基于上述损失值确定上述若干训练样本中的困难样本。然后再根据上述困难样本对应的样本特征确定上述困难样本所属的样本类型。
以下结合人脸识别领域进行本申请实施例的说明。
请参见图3,图3为本申请示出的一种模型训练方法的流程示意图。需要说明的是,图3示出的流程说明仅为针对模型训练方法流程的示意性说明,在实际应用中可以进行微调。图3并未示出反向传播更新模型参数的过程。
图3中示出的记忆单元具体是一个虚拟单元，该记忆单元可以通过搭载具体的代码实现：存储困难样本，以及当困难样本数量达到第一预设阈值时，输出存储的困难样本。上述第一预设阈值具体可以是根据经验设定的数值。在一些例子中，为了便于模型进行运算，上述第一预设阈值的大小可以与批处理数据包括的样本数量相同。
上述记忆单元中可以包括样本集合,计数器以及输出子单元。其中,上述样本集合可以用于存储困难样本。上述计数器可以用于指示记忆单元中存储的困难样本数量。上述输出子单元用于从样本集合中获取存储的困难样本并进行输出。
在一些例子中,为了方便数据存取,上述样本集合可以包括诸如队列形式的线性数据结构。可以理解的是,当样本集合为队列形式时,可以将队列对应的最大容量设置为上述第一预设阈值。此时,当队列数据存满后,即可认为存储的困难样本数量达到了第一预设阈值。当然,此时上述记忆单元中可以不必包括上述计数器。
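作为一种示意性实现（类名与容量取值均为本文举例的假设，并非本申请限定的实现方式），上述队列形式的样本集合可以基于 Python 标准库的 collections.deque 构建：

```python
from collections import deque

class SampleQueue:
    """以队列形式存储困难样本的样本集合（示意实现）。"""

    def __init__(self, capacity):
        # 队列最大容量即第一预设阈值
        self.capacity = capacity
        self.queue = deque(maxlen=capacity)

    def push(self, sample):
        self.queue.append(sample)

    def is_full(self):
        # 队列存满即认为存储的困难样本数量达到第一预设阈值
        return len(self.queue) == self.capacity

    def pop_all(self):
        # 取出全部困难样本用于构造批处理数据，并清空队列
        batch = list(self.queue)
        self.queue.clear()
        return batch
```

如正文所述，采用此种队列实现时，记忆单元中可以不必单独维护计数器。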
在进行模型训练时,需要先执行S302(图未绘示),准备训练样本。
在人脸识别领域中,上述训练样本通常可以是标注了人员对象的多张人脸图像的集合。在准备上述训练样本时,通常可以采用人工标注或机器辅助标注的方式对原始图像进行真值标注。例如,在获取到原始图像后,可以使用图像标注软件对原始图像中包括的人脸指示的人员对象进行标注,从而得到若干训练样本。需要说明的是,在构建训练样本时,可以采用one-hot编码等方式进行构建,本申请不对构建训练样本的具体方式进行限定。
其中,在准备训练样本时可以随机采样包括多个样本类型的训练样本。
上述样本类型,具体用于指示样本所属的场景类型。例如,在人脸识别领域中,当样本图像包括小孩人脸时,可以认为该样本为小孩样本类型。当样本图像包括老人人脸时,可以认为该样本属于老人样本类型。当样本图像包括戴口罩人脸时,可以认为该图像属于戴口罩样本类型。由此保证训练样本包括多种类型的训练样本,提升训练效果。
在得到若干训练样本后,可以执行S202,将若干训练样本输入模型,得到各训练样本对应的损失值。
其中,在上述训练过程中,在执行S202之前可以先执行S304,基于若干训练样本构建批处理数据,并将上述批处理数据输入模型进行训练。
具体地,在每次训练过程中通过随机采样的方式,从上述若干训练样本中生成批处理数据。在得到上述批处理数据后,可以将该批处理数据输入上述模型中进行训练。
在本步骤中,在上述模型中可以采用前向传播的方式得到与各训练样本对应的特征图,并通过连接的分类器输出本次训练得到的与各训练样本分别对应的人脸识别结果。
需要说明的是,本申请不对批处理数据包括的样本数量进行特别限定。本申请中也可以采用单数据进行模型训练的方案,该方案可以参照批处理数据方案,在此不作详述。
在得到上述人脸识别结果后,可以执行S306,将训练得到的各人脸识别结果与各识别结果分别对应的训练样本对应的真值输入预设的损失函数中计算各训练样本对应的损失值。
其中,上述预设的损失函数可以是在人脸识别领域中常用的损失函数,在此不作特别限定。
在得到各训练样本对应的损失值后,可以执行S204,根据上述损失值更新上述模型的模型参数,并基于上述损失值确定各训练样本中的困难样本。
具体地,一方面,可以执行S308(图未绘示),通过梯度下降法,对梯度进行反向传播更新上述模型参数;另一方面,可以执行S310,基于前向传播之后得到的与各训练样本对应的损失值,确定上述各训练样本中包括的困难样本。
在确定困难样本时,在一些例子中,可以确定各训练样本对应的损失值中,数值较大的N个损失值。其中,上述N为正整数。例如,可以将各训练样本对应的损失值按照从大到小的顺序排序。在排序完成后,可以将排在前N位的损失值确定为数值较大的N个损失值。在此需要说明的是,上述N可以是根据经验设定的数值。本申请不对N的数值进行特别限定。
在确定数值较大的N个损失值后,可以将与上述N个损失值分别对应的训练样本确定为上述困难样本。
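上述“按损失值从大到小排序并将前N位对应的训练样本确定为困难样本”的过程，可以用如下草图示意（函数名与示例数据均为假设）：

```python
def select_hard_samples(samples, losses, n):
    """按损失值从大到小排序，取前 N 个样本作为困难样本（示意实现）。"""
    ranked = sorted(zip(samples, losses), key=lambda p: p[1], reverse=True)
    return [sample for sample, _ in ranked[:n]]
```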
在另一些例子中,确定各训练样本对应的损失值是否达到第二预设阈值。
其中,上述第二预设阈值可以是根据经验设定的数值。达到上述第二预设阈值至少包括大于或大于等于上述第二预设阈值两种情况。上述第二预设阈值可以是衡量训练样本是否为困难样本的参考线。若任一训练样本对应的损失值达到上述第二预设阈值,则将该训练样本确定为上述困难样本。
在确定困难样本之后,可以继续执行S204,确定上述困难样本的样本类型。
上述困难样本的样本类型,具体用于指示困难样本所属的场景类型。例如,在人脸识别领域中,当困难样本包括的图像为小孩人脸时,可以认为该困难样本属于小孩样本类型。当困难样本包括的图像为老人人脸时,可以认为该困难样本属于老人样本类型。当困难样本包括的图像为戴口罩人脸时,可以认为该困难样本属于戴口罩样本类型。
在确定困难样本的样本类型时,可以将通过上述模型提取的困难样本的特征中心,与通过上述模型提取的各样本类型的特征中心进行比对,并将匹配的特征中心对应的样本类型确定为上述困难样本的样本类型。
上述样本集合,可以用于存储困难样本。在一些例子中,可以分别将与各样本类型对应的基准图像输入上述模型得到与各样本类型对应的特征中心;其中,上述特征中心用于确定困难样本所属的样本类型。然后建立与各特征中心分别对应的样本集合。
在基于上述样本集合中的困难样本对上述模型进行训练时，可以确定各样本集合中困难样本数量是否达到第一预设阈值。如果达到，则将该样本集合中的困难样本输入上述模型进行训练，更新上述模型参数；否则，继续累加困难样本。
请继续参见图3,在确定困难样本之后,可以执行S206,将上述困难样本存储至与上述困难样本所属样本类型对应的样本集合,以及基于上述样本集合中的困难样本对上述模型进行训练。
在一些例子中,可以将上述困难样本存储至上述记忆单元中。
上述记忆单元可以定期或在每收到一个困难样本后,确定存储的困难样本的数量是否达到上述第一预设阈值。如果达到,则将存储的困难样本输入上述模型进行训练,更新上述模型参数。如果未达到,则不执行任何动作。
在一些例子中，上述第一预设阈值为上述批处理数据所包括的样本数量。在将存储的困难样本输入上述模型进行训练，更新上述模型参数时，可以执行S312，将存储的困难样本构造为批处理数据，输入上述模型进行训练，更新上述模型参数。
由于上述记忆单元输出的困难样本数量与训练模型时构建的批处理数据包括的样本数量一致,因此可以便于模型进行计算。
在对模型参数完成一次更新后,可以重复执行上述S304-S312的步骤直至上述模型收敛。
在模型训练过程中,上述方案可以基于训练得到的损失值确定困难样本。在确定困难样本后可以分类存储上述训练样本,并基于上述样本集合中的困难样本对上述模型进行训练,更新上述模型参数。因此,一方面可以在利用训练样本对模型进行训练过程中筛选出困难样本并针对该困难样本进行训练,从而无需针对困难样本单独构建训练集进行独立训练,减少开发人员工作量;另一方面,可以利用各种类型的困难样本对模型进行训练,从而增加各类型的困难样本对模型的优化次数,使得模型在各特定场景下表现更好。
在一些实施例中,为了使模型可以学习到多种特定场景下的相关样本信息,从而提升模型在多种场景下的表现性能,在执行S206存储困难样本时,可以先确定上述困难样本所属样本类型(即所属场景)。在确定困难样本所属样本类型后,对上述困难样本进行分类存储。
其中,上述样本类型,具体用于指示困难样本所属的场景类型。例如,在人脸识别领域中,当困难样本包括的图像为小孩人脸时,可以认为该困难样本属于小孩样本类型。当困难样本包括的图像为老人人脸时,可以认为该困难样本属于老人样本类型。当困难样本包括的图像为戴口罩人脸时,可以认为该困难样本属于戴口罩样本类型。
请参见图4,图4为本申请示出的一种模型训练方法的流程示意图。需要说明的是,图4示出的流程说明仅为针对模型训练方法流程的示意性说明,在实际应用中可以进行微调。图4并未示出反向传播更新模型参数的过程。
图4中示出的记忆单元具体是一个虚拟单元,该记忆单元可以通过搭载具体的代码实现:分类存储困难样本,以及当任一类型的困难样本数量达到第一预设阈值时,将该样本集合中的困难样本输入上述模型进行训练,更新上述模型参数。
在上述记忆单元中可以包括若干与样本类型对应的样本集合,计数器以及输出子单元。
其中,上述若干与样本类型对应的样本集合,用于存储各种类型的困难样本数据。上述计数器可以用于指示各样本集合中存储的困难样本数量。上述输出子单元用于从满足条件的样本集合中获取存储的困难样本并进行输出。
在一些例子中,为了方便数据存取,上述样本集合可以包括诸如队列形式的线性数据结构。可以理解的是,当样本集合为队列形式时,可以将队列对应的最大容量设置为上述第一预设阈值。此时,当队列数据存满后,即可认为存储的困难样本数量达到了第一预设阈值。当然,此时上述记忆单元中可以不必包括上述计数器。
在模型训练中,通常需要对记忆单元进行初始化处理。以下通过介绍记忆单元内部结构介绍对记忆单元的初始化过程。
请参见图5,图5为本申请示出的一种记忆单元内部结构示意图。需要说明的是,图5示出的内部示意仅为一种示意性说明,在实际应用中可以进行微调。
如图5所示,上述记忆单元中可以包括多种样本类型。其中,记忆单元包括的样本类型可以根据实际业务需求进行预先设置。例如,当业务需求需要提升模型对老人和小孩的人脸识别能力时,上述记忆单元中可以设定老人样本类型与小孩样本类型。又例如,当业务需求需要提升模型对老人、小孩以及戴口罩人员的人脸识别能力时,上述记忆单元中可以设定老人样本类型、小孩样本类型以及戴口罩样本类型。以下以记忆单元中包括老人样本类型与小孩样本类型为例进行说明。
需要说明的是,由于上述若干训练样本中大量的数据仍然为常规场景下的样本数据,因此,为了提升记忆单元兼容性使模型学习到各类困难样本的相关信息,上述记忆单元中还可以包括正常样本类型。其中,上述正常样本类型用于存储常规场景下(即非特定场景)的困难样本。
例如,当训练样本包括老人,成年人,小孩这三种类型时,成年人就是常规场景下的类型。此时,在记忆单元中除了包括老人类型与小孩类型的困难样本数据外,还可以包括代表成年人这一类常规场景下的困难样本数据。
请继续参见图5,其中第一样本类型可以指示正常类型;第二样本类型可以指示老人类型;第三样本类型可以指示小孩类型。
在上述方案中,由于常规场景下的困难样本数据也被存储起来,因此,可以增加该类场景下的困难样本对模型的优化次数,使得模型对于该类场景下的困难样本表现更好。
在确定记忆单元中包括的样本类型后,可以在记忆单元中为各样本类型创建对应的样本集合。其中,上述样本集合为队列形式。每个队列对应的最大容量可以设置为上述第一预设阈值(批处理数据大小)。当任一队列数据存满后,即可认为该队列存储的困难样本数量达到了上述第一预设阈值。
请继续参见图5，其中第一样本类型队列可以用于存储正常类型的困难样本；第二样本类型队列可以用于存储老人类型的困难样本；第三样本类型队列可以用于存储小孩类型的困难样本。
在确定记忆单元中包括的样本类型后,还可以确定各样本类型对应的特征中心。
在确定各样本类型对应的特征中心时,可以分别将与各样本类型对应的基准图像输入上述模型得到与各样本类型对应的特征中心。
其中，上述特征中心具体用于确定困难样本所属的样本类型。在一些例子中，可以以特征向量的形式表征上述特征中心。
可以理解的是,通过确定与困难样本对应的样本特征最相似的特征中心,即可确定上述困难样本所属的样本类型。
上述样本特征，具体指对困难样本进行诸如卷积操作，池化操作后得到的特征。在一些例子中，可以使用特征向量的形式表征上述样本特征。
在此步骤中,可以先选取属于各样本类型的人脸图像。例如,当记忆单元中包括老人样本类型与小孩样本类型时,可以选取一张小孩人脸图像与一张老人人脸图像作为基准图像。
在确定基准图像后,可以将各样本类型对应的基准图像输入上述模型中进行前向传播得到与各样本类型对应的特征中心。
在一些例子中，为了确定更为精准的特征中心，在选取基准图像时，可以针对各样本类型分别选取M个基准图像。其中，上述M为正整数。可以理解的是，在一些例子中，各样本类型选取的基准图像的数量也可以是不一样的。例如，老人类型选取10张，小孩类型选取8张。以下以各样本类型选取的基准图像数量相同为例进行说明。
请继续参见图5,其中第一基准图像集合可以包括正常类型的M个基准图像;第二基准图像集合可以包括老人类型的M个基准图像;第三基准图像集合可以包括小孩类型的M个基准图像。
此时,在确定特征中心时,可以分别将与各样本类型对应的上述M个基准图像输入上述模型得到与各样本类型对应的M个基准特征。
上述基准特征,可以包括对基准图像进行诸如卷积操作,池化操作后得到的特征。在一些例子中,可以通过特征向量的形式表征上述基准特征。
在得到各样本类型对应的M个基准特征之后,再分别对各样本类型对应的M个基准特征进行加权平均,得到与各样本类型对应的特征中心。
其中,上述M为经验阈值,在此不作特别限定。
上述基准特征为通过上述模型对基准图像进行特征提取(例如若干次卷积运算)得到的特征图。
需要说明的是,上述加权平均使用的权重在本申请中不作特别限定。例如,上述权重可以为1。
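上述“对各样本类型对应的M个基准特征进行加权平均，得到特征中心”的计算可以用如下草图示意（当权重取1时即为算术平均；函数名与输入数据均为假设的示例）：

```python
def feature_center(features, weights=None):
    """对 M 个等长的基准特征向量加权平均，得到特征中心（示意实现）。"""
    m = len(features)
    dim = len(features[0])
    if weights is None:
        weights = [1.0] * m          # 如正文所述，权重可以取 1
    total = sum(weights)
    # 按维度求加权平均
    return [sum(w * f[i] for w, f in zip(weights, features)) / total
            for i in range(dim)]
```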
在一些例子中,可能并不确定上述记忆单元中包括的样本类型。此时可以采用诸如K-MEANS聚类算法,对得到的困难样本进行聚类,得到记忆单元中包括的样本类型。
具体地,可以对通过上述模型获得的每个上述困难样本的样本特征进行比对,得到上述困难样本的相似度。然后基于上述困难样本的相似度,将上述困难样本中不同样本数据所属类别进行分类。
上述困难样本可能包括未知的若干样本类型,通过上述聚类算法可以将困难样本进行合理分类,得到若干样本类型。
在确定上述记忆单元中包括的样本类型后,可以通过计算每种类别对应的样本集合中困难样本的平均相似度,得到每种上述样本集合的特征中心。从而可以在获得新的困难样本时,将新获得的困难样本的样本特征与每种上述样本集合的特征中心进行相似度比对,并将新获得的困难样本存储到其对应类别的上述样本集合中。
在上述例子中公开的样本类型确定方法中,可以避免由人工确定样本类型,而是根据困难样本的实际情形进行无监督方式的聚类,得到更贴合实际的困难样本的样本类型,进而提升模型预测效果。
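作为一种极简的示意（并非标准K-MEANS实现，阈值取值与输入特征均为假设），可以按相似度阈值对困难样本特征进行贪心分组，以说明“基于相似度将困难样本分类”的思路：

```python
import math

def cosine_similarity(a, b):
    """计算两个特征向量的余弦相似度。"""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cluster_by_similarity(features, threshold):
    """相似度达到阈值的样本归入同一类别，否则新建类别（示意实现）。"""
    clusters = []  # 每个类别保存其成员特征列表，首个成员作为该类别的代表
    for f in features:
        for c in clusters:
            if cosine_similarity(f, c[0]) >= threshold:
                c.append(f)
                break
        else:
            clusters.append([f])
    return clusters
```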
请继续参见图5，其中第一特征中心可以是正常类型对应的特征中心；第二特征中心可以是老人类型对应的特征中心；第三特征中心可以是小孩类型对应的特征中心。通过三类特征中心即可确定目标困难样本所属的样本类型。
请继续参见图4,在完成记忆单元初始化后,可以开始对模型的正式训练。在进行模型训练时,需要先执行S402(图未绘示),准备训练样本。
在确定若干训练样本后,可以继续执行S404,基于若干训练样本构建批处理数据,并将上述批处理数据输入模型进行训练。
在得到上述批处理数据包括的各训练样本对应的预测结果后,可以执行S406,将训练得到的各人脸识别结果与各识别结果分别对应的训练样本对应的真值输入预设的损失函数中计算各训练样本对应的损失值。
在得到各训练样本对应的损失值后,一方面,可以执行S408(图未绘示),通过梯度下降法,对梯度进行反向传播更新上述模型参数;另一方面,可以执行S410,基于前向传播之后得到的与各训练样本对应的损失值,确定上述各训练样本中包括的困难样本。
在确定困难样本后,可以执行S412,确定上述困难样本所属的样本类型。
在本步骤中,可以确定上述困难样本输入上述模型后得到的样本特征,与各特征中心之间的相似度。
例如,在本申请中可以通过诸如余弦距离或马氏距离等相似度计算方案,确定上述样本特征与各特征中心之间的相似度。在计算出上述样本特征与各特征中心之间的相似度时,还可以维护特征中心与依据该特征中心计算出的相似度的对应关系。
在确定上述样本特征与各特征中心之间的相似度后,可以确定上述相似度中的最高相似度,并将上述最高相似度对应的特征中心所对应的样本类型确定为上述困难样本所属的样本类型。
例如,在本申请中可以将确定的上述相似度按照从大到小的顺序进行排序,并将在首位的相似度确定为最高相似度。在确定上述最高相似度后,可以通过查询维护的上述对应关系,确定与上述最高相似度对应的特征中心。在确定与上述最高相似度对应的特征中心后,可以将该特征中心对应的样本类型确定为上述困难样本所属的样本类型。
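上述“计算样本特征与各特征中心的相似度并将最高相似度对应的样本类型确定为困难样本所属类型”的匹配过程，可以用基于余弦相似度的如下草图示意（样本类型名称与特征取值均为假设的示例）：

```python
import math

def cosine_similarity(a, b):
    """计算两个特征向量的余弦相似度。"""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_sample_type(sample_feature, centers):
    """centers 为 {样本类型: 特征中心} 的映射；返回相似度最高的样本类型。"""
    sims = {t: cosine_similarity(sample_feature, c) for t, c in centers.items()}
    return max(sims, key=sims.get)
```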
在确定上述困难样本所属的样本类型后，可以执行S414，将上述困难样本存储至与上述困难样本所属样本类型对应的样本集合中。
在本步骤中,可以将上述困难样本存储至与上述困难样本所属样本类型对应的队列中。
请继续参见图5,假设目标困难样本所属的样本类型为小孩类型,则可以将该困难样本对应的图像数据插入小孩类型队列中(即第三样本类型队列)。
当任一类型的困难样本数量达到第一预设阈值时,可以执行S416,将该样本集合中的困难样本输入上述模型进行训练,更新上述模型参数。
在本步骤中,如果上述记忆单元包括的任一样本数据集队列已经存满,即可以认为该队列存储的困难样本数量已达到第一预设阈值。此时,可以将上述队列中存储的困难样本提取出来构造批处理数据。在批处理数据构造完成后,可以将该批处理数据输入上述模型进行训练,更新该模型参数。
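将分类存储与“存满即输出批处理数据”结合起来，记忆单元的整体行为可以用如下草图示意（类名、方法名均为假设，输入模型训练的步骤以返回批处理数据代替）：

```python
class MemoryUnit:
    """按样本类型分类存储困难样本的记忆单元（示意实现）。"""

    def __init__(self, sample_types, batch_size):
        # batch_size 即第一预设阈值（批处理数据大小）
        self.batch_size = batch_size
        self.queues = {t: [] for t in sample_types}

    def store(self, sample_type, sample):
        self.queues[sample_type].append(sample)

    def ready_batches(self):
        # 任一类型的困难样本数量达到第一预设阈值时，
        # 将该样本集合中的困难样本取出，构造批处理数据
        batches = []
        for t, q in self.queues.items():
            if len(q) >= self.batch_size:
                batches.append((t, q[:self.batch_size]))
                del q[:self.batch_size]
        return batches
```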
在上述方案中，由于可以灵活设置多种样本类型，并且可以将困难样本进行分类存储和分类训练，因此，一方面可以使模型训练时针对多种特定类型的困难样本进行针对性训练，从而使训练的模型可以在多种特定类型场景下有较好的性能；另一方面可以无需专门针对多种类型分别建立训练样本，减少开发人员工作量。
在一些实施例中,为了针对困难样本进行多次训练,在存储的困难样本数量达到第一预设阈值并将存储的困难样本输入上述模型进行训练之后,还可以将本次训练得到的与各困难样本对应的损失值中,数值较大的P个损失值分别对应的困难样本存储至与各困难样本所属样本类型对应的样本集合中。
其中,P为根据经验设定的正整数。
在本步骤中,在存储的困难样本数量达到第一预设阈值并将存储的困难样本输入上述模型进行训练得到各困难样本对应的损失值后,可以确定各损失值中数值较大的P个损失值。
在确定上述P个损失值后,可以确定该P个损失值分别对应的困难样本的样本类型,并将该P个损失值分别对应的困难样本存储至与各困难样本所属样本类型对应的样本集合中。
在上述方案中,由于在存储的困难样本数量达到第一预设阈值并将存储的困难样本输入上述模型进行训练之后,还可以将本次训练得到的与各困难样本对应的损失值中,数值较大的P个损失值分别对应的困难样本存储至与各困难样本所属样本类型对应的样本集合中,因此可以将损失值较大的困难样本多次存储并对模型进行多次训练,从而增加通过该类困难样本对模型的优化次数,使得模型对于该类困难样本表现更好。
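上述“将本次困难样本训练中损失值较大的P个样本重新存回对应样本类型的样本集合”的过程可以用如下草图示意（其中 type_of 为假设的类型判定函数，样本与损失值均为示例数据）：

```python
def restore_top_p(hard_samples, losses, p, memory, type_of):
    """将损失值最大的 P 个困难样本重新存入对应类型的样本集合（示意实现）。

    memory 为 {样本类型: 样本列表} 的映射；type_of(sample) 返回样本所属类型。
    """
    ranked = sorted(zip(hard_samples, losses), key=lambda x: x[1], reverse=True)
    for sample, _ in ranked[:p]:
        memory.setdefault(type_of(sample), []).append(sample)
    return memory
```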
本申请还提出一种图像处理方法。该方法可以应用于任意电子设备。该方法通过利用前述任一实施例示出的训练方法训练得到的图像处理模型进行图像处理,从而可以保证上述图像处理模型除了在常规场景下表现优异,还可以在不同的特定场景下表现优异,进而提升图像处理效果。
具体地,上述方法可以包括:
获取目标图像并通过图像处理模型对上述目标图像进行图像处理,得到与上述目标图像对应的图像处理结果。
上述目标图像,可以是需要进行图像处理的任意图像。例如在人脸识别场景中,上述目标图像可以是包含人脸对象的图像。上述图像处理模型可以是任意需要进行图像处理的模型。例如,在人脸识别场景中,上述图像处理模型可以是人脸识别模型。
与上述任一实施例相对应的,本申请还提出一种模型训练装置。
请参见图6,图6为本申请示出的一种模型训练装置的结构示意图。
如图6所示,上述装置600可以包括:输入模块610,用于将若干训练样本输入模型,得到各训练样本对应的损失值;其中,上述训练样本包括多个样本类型的训练样本;
更新与确定模块620,用于根据上述损失值更新上述模型的模型参数,并基于上述损失值确定上述若干训练样本中至少部分训练样本所属的样本类型;
存储与训练模块630,用于将上述至少部分训练样本分别存储至与其所属的样本类型对应的样本集合中,以及基于上述样本集合包括的训练样本对上述模型进行训练。
在示出的一些实施例中,上述更新与确定模块620,包括:
第一确定模块,用于基于上述损失值确定上述若干训练样本中的困难样本;
第二确定模块,用于根据上述困难样本对应的样本特征确定上述困难样本所属的样本类型。
在示出的一些实施例中,上述装置600还包括:
建立模块,分别将与各样本类型对应的基准图像输入上述模型得到与各样本类型对应的特征中心;其中,上述特征中心用于确定困难样本所属的样本类型;
建立与各特征中心分别对应的样本集合。
在示出的一些实施例中,各样本类型分别对应M个基准图像;其中,上述M为正整数;上述建立模块具体用于:
分别将与各样本类型对应的上述M个基准图像输入上述模型得到与各样本类型对应的M个基准特征;
分别对各样本类型对应的M个基准特征进行加权平均,得到与各样本类型对应的特征中心。
在示出的一些实施例中,上述更新与确定模块620具体用于:
确定上述困难样本输入上述模型后得到的样本特征,与各特征中心之间的相似度;
确定上述相似度中的最高相似度,并将上述最高相似度对应的特征中心所对应的样本类型确定为上述困难样本所属的样本类型。
在示出的一些实施例中,上述装置600还包括:
分类模块,对通过上述模型获得的每个上述困难样本的样本特征进行比对,得到上述困难样本的相似度;
基于上述困难样本的相似度,将上述困难样本中不同样本数据所属类别进行分类。
在示出的一些实施例中,上述更新与确定模块620具体用于:
计算每种类别对应的样本集合中困难样本的平均相似度,得到每种上述样本集合的特征中心;
将新获得的困难样本的样本特征与每种上述样本集合的特征中心进行相似度比对,并将新获得的困难样本存储到其对应类别的上述样本集合中。
在示出的一些实施例中,上述更新与确定模块620具体用于:
确定各训练样本对应的损失值中,数值较大的N个损失值;其中,上述N为正整数;
将与上述N个损失值分别对应的训练样本确定为上述困难样本。
在示出的一些实施例中,上述更新与确定模块620具体用于:
确定各训练样本对应的损失值是否达到第二预设阈值;
若任一训练样本对应的损失值达到上述第二预设阈值,则将该训练样本确定为上述困难样本。
在示出的一些实施例中,上述装置600还包括:
存储模块,在存储的困难样本数量达到第一预设阈值并将存储的困难样本输入上述模型进行训练之后,将本次训练得到的与各困难样本对应的损失值中,数值较大的P个损失值分别对应的困难样本存储至与各困难样本所属样本类型对应的样本集合中。
在示出的一些实施例中,上述装置600还包括:
批处理模块,在将若干训练样本输入模型之前,基于上述训练样本构建批处理数据;
上述将若干训练样本输入模型,得到各训练样本对应的损失值,包括:
将上述批处理数据输入模型,得到批处理数据中各训练样本对应的损失值。
在示出的一些实施例中,上述存储与训练模块630具体用于:
确定各样本集合中困难样本数量是否达到第一预设阈值;
如果达到,则将该样本集合中的困难样本输入上述模型进行训练,更新上述模型参数;否则,继续累加困难样本。
在示出的一些实施例中,上述第一预设阈值为上述批处理数据所包括的样本数量。
在示出的一些实施例中,上述装置600还包括:
预训练模块,在利用上述训练样本进行模型训练前,利用预训练样本对该模型进行预训练;其中,上述预训练样本包括多个样本类型的预训练样本。
本申请还提出一种图像处理装置,上述装置可以包括:
获取模块,用于获取目标图像;
图像处理模块,用于通过图像处理模型对上述目标图像进行图像处理,得到与上述目标图像对应的图像处理结果。
其中,上述图像处理模型包括基于前述任一实施例示出的模型训练方法得到的模型。
本申请示出的模型训练装置或图像处理装置的实施例可以应用于电子设备上。
相应地,本申请公开了一种电子设备,该设备可以包括:处理器;
用于存储处理器可执行指令的存储器,
其中,上述处理器被配置为调用上述存储器中存储的可执行指令,实现如上述示出的模型训练方法或图像处理方法。
请参见图7,图7为本申请示出的一种电子设备的硬件结构示意图。
如图7所示,该电子设备可以包括用于执行指令的处理器,用于进行网络连接的网络接口,用于为处理器存储运行数据的内存,以及用于存储模型训练装置或图像处理装置对应指令的非易失性存储器。
其中,上述装置的实施例可以通过软件实现,也可以通过硬件或者软硬件结合的方式实现。以软件实现为例,作为一个逻辑意义上的装置,是通过其所在电子设备的处理器将非易失性存储器中对应的计算机程序指令读取到内存中运行形成的。从硬件层面而言,除了图7所示的处理器、内存、网络接口、以及非易失性存储器之外,实施例中装置所在的电子设备通常根据该电子设备的实际功能,还可以包括其他硬件,对此不再赘述。
可以理解的是,为了提升处理速度,模型训练装置或图像处理装置对应指令也可以直接存储于内存中,在此不作限定。
本申请提出一种计算机可读存储介质,上述存储介质存储有计算机程序,上述计算机程序用于执行如前示出的模型训练方法或图像处理方法。
本领域技术人员应明白，本申请一个或多个实施例可提供为方法、系统或计算机程序产品。因此，本申请一个或多个实施例可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且，本申请一个或多个实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质（可以包括但不限于磁盘存储器、CD-ROM、光学存储器等）上实施的计算机程序产品的形式。
本申请中的“和/或”表示至少具有两者中的其中一个,例如,“A和/或B”可以包括三种方案:A、B、以及“A和B”。
本申请中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于数据处理设备实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
上述对本申请特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的行为或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中,多任务处理和并行处理也是可以的或者可能是有利的。
本申请中描述的主题及功能操作的实施例可以在以下中实现:数字电子电路、有形体现的计算机软件或固件、可以包括本申请中公开的结构及其结构性等同物的计算机硬件、或者它们中的一个或多个的组合。本申请中描述的主题的实施例可以实现为一个或多个计算机程序,即编码在有形非暂时性程序载体上以被数据处理装置执行或控制数据处理装置的操作的计算机程序指令中的一个或多个模块。可替代地或附加地,程序指令可以被编码在人工生成的传播信号上,例如机器生成的电、光或电磁信号,该信号被生成以将信息编码并传输到合适的接收机装置以由数据处理装置执行。计算机存储介质可以是机器可读存储设备、机器可读存储基板、随机或串行存取存储器设备、或它们中的一个或多个的组合。
本申请中描述的处理及逻辑流程可以由执行一个或多个计算机程序的一个或多个可编程计算机执行,以通过根据输入数据进行操作并生成输出来执行相应的功能。上述处理及逻辑流程还可以由专用逻辑电路—例如FPGA(现场可编程门阵列)或ASIC(专用集成电路)来执行,并且装置也可以实现为专用逻辑电路。
适合用于执行计算机程序的计算机可以包括,例如通用和/或专用微处理器,或任何其他类型的中央处理单元。通常,中央处理单元将从只读存储器和/或随机存取存储器接收指令和数据。计算机的基本组件可以包括用于实施或执行指令的中央处理单元以及用于存储指令和数据的一个或多个存储器设备。通常,计算机还将可以包括用于存储数据的一个或多个大容量存储设备,例如磁盘、磁光盘或光盘等,或者计算机将可操作地与此大容量存储设备耦接以从其接收数据或向其传送数据,抑或两种情况兼而有之。然而,计算机不是必须具有这样的设备。此外,计算机可以嵌入在另一设备中,例如移动电话、个人数字助理(PDA)、移动音频或视频播放器、游戏操纵台、全球定位系统(GPS)接收机、或例如通用串行总线(USB)闪存驱动器的便携式存储设备,仅举几例。
适合于存储计算机程序指令和数据的计算机可读介质可以包括所有形式的非易失性存储器、媒介和存储器设备,例如可以包括半导体存储器设备(例如EPROM、EEPROM和闪存设备)、磁盘(例如内部硬盘或可移动盘)、磁光盘以及CD ROM和DVD-ROM盘。处理器和存储器可由专用逻辑电路补充或并入专用逻辑电路中。
虽然本申请包含许多具体实施细节，但是这些不应被解释为限制任何公开的范围或所要求保护的范围，而是主要用于描述特定公开的具体实施例的特征。本申请内在多个实施例中描述的某些特征也可以在单个实施例中被组合实施。另一方面，在单个实施例中描述的各种特征也可以在多个实施例中分开实施或以任何合适的子组合来实施。此外，虽然特征可以如上所述在某些组合中起作用并且甚至最初如此要求保护，但是来自所要求保护的组合中的一个或多个特征在一些情况下可以从该组合中去除，并且所要求保护的组合可以指向子组合或子组合的变型。
类似地,虽然在附图中以特定顺序描绘了操作,但是这不应被理解为要求这些操作以所示的特定顺序执行或顺次执行、或者要求所有例示的操作被执行,以实现期望的结果。在某些情况下,多任务和并行处理可能是有利的。此外,上述实施例中的各种系统模块和组件的分离不应被理解为在所有实施例中均需要这样的分离,并且应当理解,所描述的程序组件和系统通常可以一起集成在单个软件产品中,或者封装成多个软件产品。
由此,主题的特定实施例已被描述。其他实施例在所附权利要求书的范围以内。在某些情况下,权利要求书中记载的动作可以以不同的顺序执行并且仍实现期望的结果。此外,附图中描绘的处理并非必需所示的特定顺序或顺次顺序,以实现期望的结果。在某些实现中,多任务和并行处理可能是有利的。
以上仅为本申请一个或多个实施例的较佳实施例而已,并不用以限制本申请一个或多个实施例,凡在本申请一个或多个实施例的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本申请一个或多个实施例保护的范围之内。

Claims (19)

  1. 一种模型训练方法,其特征在于,所述方法包括:
    将若干训练样本输入模型,得到各训练样本对应的损失值;其中,所述训练样本包括多个样本类型的训练样本;
    根据所述损失值更新所述模型的模型参数,并基于所述损失值确定所述若干训练样本中至少部分训练样本所属的样本类型;
    将所述至少部分训练样本分别存储至与其所属的样本类型对应的样本集合中,以及基于所述样本集合包括的训练样本对所述模型进行训练。
  2. 根据权利要求1所述的方法,其特征在于,所述基于所述损失值确定所述若干训练样本中至少部分训练样本所属的样本类型,包括:
    基于所述损失值确定所述若干训练样本中的困难样本;
    根据所述困难样本对应的样本特征确定所述困难样本所属的样本类型。
  3. 根据权利要求2所述的方法,其特征在于,所述方法还包括:
    分别将与各样本类型对应的基准图像输入所述模型得到与各样本类型对应的特征中心;其中,所述特征中心用于确定困难样本所属的样本类型;
    建立与各特征中心分别对应的样本集合。
  4. 根据权利要求3所述的方法,其特征在于,各样本类型分别对应M个基准图像;其中,所述M为正整数;
    所述分别将与各样本类型对应的基准图像输入所述模型得到与各样本类型对应的特征中心,包括:
    分别将与各样本类型对应的所述M个基准图像输入所述模型得到与各样本类型对应的M个基准特征;
    分别对各样本类型对应的M个基准特征进行加权平均,得到与各样本类型对应的特征中心。
  5. 根据权利要求3或4所述的方法,其特征在于,所述根据所述困难样本对应的样本特征确定所述困难样本所属的样本类型,包括:
    确定所述困难样本输入所述模型后得到的样本特征,与各特征中心之间的相似度;
    确定所述相似度中的最高相似度,并将所述最高相似度对应的特征中心所对应的样本类型确定为所述困难样本所属的样本类型。
  6. 根据权利要求2所述的方法,其特征在于,所述方法还包括:
    对通过所述模型获得的每个所述困难样本的样本特征进行比对,得到所述困难样本的相似度;
    基于所述困难样本的相似度,将所述困难样本中不同样本数据所属类别进行分类。
  7. 根据权利要求6所述的方法,其特征在于,所述将所述至少部分训练样本分别存储至与其所属的样本类型对应的样本集合中,包括:
    计算每种类别对应的样本集合中困难样本的平均相似度,得到每种所述样本集合的特征中心;
    将新获得的困难样本的样本特征与每种所述样本集合的特征中心进行相似度比对,并将新获得的困难样本存储到其对应类别的所述样本集合中。
  8. 根据权利要求2-7任一所述的方法,其特征在于,所述基于所述损失值确定所述若干训练样本中的困难样本,包括:
    确定各训练样本对应的损失值中,数值较大的N个损失值;其中,所述N为正整数;
    将与所述N个损失值分别对应的训练样本确定为所述困难样本。
  9. 根据权利要求2-7任一所述的方法,其特征在于,所述基于所述损失值确定所述若干训练样本中的困难样本,包括:
    确定各训练样本对应的损失值是否达到第二预设阈值;
    若任一训练样本对应的损失值达到所述第二预设阈值,则将该训练样本确定为所述困难样本。
  10. 根据权利要求2-9任一所述的方法,其特征在于,所述方法还包括:
    在存储的困难样本数量达到第一预设阈值并将存储的困难样本输入所述模型进行训练之后,将本次训练得到的与各困难样本对应的损失值中,数值较大的P个损失值分别对应的困难样本存储至与各困难样本所属样本类型对应的样本集合中。
  11. 根据权利要求2-10任一所述的方法,其特征在于,所述方法还包括:
    在将若干训练样本输入模型之前,基于所述训练样本构建批处理数据;
    所述将若干训练样本输入模型,得到各训练样本对应的损失值,包括:
    将所述批处理数据输入模型,得到所述批处理数据中各训练样本对应的损失值。
  12. 根据权利要求11所述的方法,其特征在于,所述基于所述样本集合包括的训练样本对所述模型进行训练,包括:
    确定各样本集合中困难样本数量是否达到第一预设阈值;
    如果达到,则将该样本集合中的困难样本输入所述模型进行训练,更新所述模型参数;否则,继续累加困难样本。
  13. 根据权利要求12所述的方法,其特征在于,所述第一预设阈值为所述批处理数据所包括的样本数量。
  14. 根据权利要求1-13任一所述的方法,其特征在于,所述方法还包括:
    在利用所述训练样本进行模型训练前,利用预训练样本对该模型进行预训练;其中,所述预训练样本包括多个样本类型的预训练样本。
  15. 一种图像处理方法,其特征在于,所述方法包括:
    获取目标图像;
    通过图像处理模型对所述目标图像进行图像处理,得到与所述目标图像对应的图像处理结果;
    其中,所述图像处理模型包括基于权利要求1-14任一所述的模型训练方法得到的模型。
  16. 一种模型训练装置,其特征在于,所述装置包括:
    输入模块,用于将若干训练样本输入模型,得到各训练样本对应的损失值;其中,所述训练样本包括多个样本类型的训练样本;
    更新与确定模块,用于根据所述损失值更新所述模型的模型参数,并基于所述损失值确定所述若干训练样本中至少部分训练样本所属的样本类型;
    存储与训练模块,用于将所述至少部分训练样本分别存储至与其所属的样本类型对应的样本集合中,以及基于所述样本集合包括的训练样本对所述模型进行训练。
  17. 一种图像处理装置,其特征在于,所述装置包括:
    获取模块,用于获取目标图像;
    图像处理模块,用于通过图像处理模型对所述目标图像进行图像处理,得到与所述目标图像对应的图像处理结果;
    其中,所述图像处理模型包括基于权利要求1-14任一所述的模型训练方法得到的模型。
  18. 一种电子设备,其特征在于,所述设备包括:
    处理器;
    用于存储所述处理器可执行指令的存储器;
    其中,所述处理器被配置为调用所述存储器中存储的可执行指令,实现如权利要求1至14中任一项所述模型训练方法或权利要求15所述的图像处理方法。
  19. 一种计算机可读存储介质，其特征在于，所述存储介质存储有计算机程序，所述计算机程序用于执行如权利要求1至14中任一项所述的模型训练方法或权利要求15所述的图像处理方法。
PCT/CN2022/076751 2021-02-22 2022-02-18 模型训练与图像处理方法、装置、电子设备和存储介质 WO2022174805A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110198534.X 2021-02-22
CN202110198534.XA CN112733808A (zh) 2021-02-22 2021-02-22 模型训练与图像处理方法、装置、电子设备和存储介质

Publications (1)

Publication Number Publication Date
WO2022174805A1 true WO2022174805A1 (zh) 2022-08-25

Family

ID=75596874

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/076751 WO2022174805A1 (zh) 2021-02-22 2022-02-18 模型训练与图像处理方法、装置、电子设备和存储介质

Country Status (2)

Country Link
CN (1) CN112733808A (zh)
WO (1) WO2022174805A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733808A (zh) * 2021-02-22 2021-04-30 深圳市商汤科技有限公司 模型训练与图像处理方法、装置、电子设备和存储介质
CN113360696B (zh) * 2021-06-23 2024-09-13 七腾机器人有限公司 图像配对方法、装置、设备以及存储介质
CN115700838A (zh) * 2021-07-29 2023-02-07 脸萌有限公司 用于图像识别模型的训练方法及其装置、图像识别方法
CN114596637B (zh) * 2022-03-23 2024-02-06 北京百度网讯科技有限公司 图像样本数据增强训练方法、装置及电子设备
CN115828162B (zh) * 2023-02-08 2023-07-07 支付宝(杭州)信息技术有限公司 一种分类模型训练的方法、装置、存储介质及电子设备

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180247107A1 (en) * 2015-09-30 2018-08-30 Siemens Healthcare Gmbh Method and system for classification of endoscopic images using deep decision networks
CN109816092A (zh) * 2018-12-13 2019-05-28 北京三快在线科技有限公司 深度神经网络训练方法、装置、电子设备及存储介质
CN110443241A (zh) * 2019-07-29 2019-11-12 北京迈格威科技有限公司 车牌识别模型训练方法、车牌识别方法及装置
CN111368525A (zh) * 2020-03-09 2020-07-03 深圳市腾讯计算机系统有限公司 信息搜索方法、装置、设备及存储介质
CN111523621A (zh) * 2020-07-03 2020-08-11 腾讯科技(深圳)有限公司 图像识别方法、装置、计算机设备和存储介质
CN111814835A (zh) * 2020-06-12 2020-10-23 理光软件研究所(北京)有限公司 计算机视觉模型的训练方法、装置、电子设备和存储介质
CN112733808A (zh) * 2021-02-22 2021-04-30 深圳市商汤科技有限公司 模型训练与图像处理方法、装置、电子设备和存储介质

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490177A (zh) * 2017-06-02 2019-11-22 腾讯科技(深圳)有限公司 一种人脸检测器训练方法及装置
CN111259967B (zh) * 2020-01-17 2024-03-08 北京市商汤科技开发有限公司 图像分类及神经网络训练方法、装置、设备及存储介质


Also Published As

Publication number Publication date
CN112733808A (zh) 2021-04-30

Similar Documents

Publication Publication Date Title
WO2022174805A1 (zh) 模型训练与图像处理方法、装置、电子设备和存储介质
CN107545889B (zh) 适用于模式识别的模型的优化方法、装置及终端设备
US20180075347A1 (en) Efficient training of neural networks
WO2022068195A1 (zh) 跨模态的数据处理方法、装置、存储介质以及电子装置
CN105930834B (zh) 基于球哈希二值编码的人脸识别方法及装置
US20210065011A1 (en) Training and application method apparatus system and stroage medium of neural network model
WO2022156331A1 (zh) 知识蒸馏和图像处理方法、装置、电子设备和存储介质
CN113255714A (zh) 图像聚类方法、装置、电子设备及计算机可读存储介质
CN111340057B (zh) 一种分类模型训练的方法及装置
CN111695458A (zh) 一种视频图像帧处理方法及装置
WO2023020214A1 (zh) 检索模型的训练和检索方法、装置、设备及介质
CN111339443A (zh) 用户标签确定方法、装置、计算机设备及存储介质
CN111291827A (zh) 图像聚类方法、装置、设备及存储介质
WO2023231753A1 (zh) 一种神经网络的训练方法、数据的处理方法以及设备
CN114118196A (zh) 用于训练用于图像分类的模型的方法和设备
CN112668482B (zh) 人脸识别训练方法、装置、计算机设备及存储介质
CN103984921B (zh) 一种用于人体动作识别的三轴特征融合方法
CN112149699A (zh) 用于生成模型的方法、装置和用于识别图像的方法、装置
CN112348079A (zh) 数据降维处理方法、装置、计算机设备及存储介质
CN115705694A (zh) 用于分割任务的无监督学习的系统和方法
CN113705598A (zh) 数据分类方法、装置及电子设备
CN113140012A (zh) 图像处理方法、装置、介质及电子设备
CN114155388B (zh) 一种图像识别方法、装置、计算机设备和存储介质
CN115774854A (zh) 一种文本分类方法、装置、电子设备和存储介质
CN116306987A (zh) 基于联邦学习的多任务学习方法及相关设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22755578

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22755578

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14-02-2024)