CN112990432A - Target recognition model training method and device and electronic equipment - Google Patents

Info

Publication number
CN112990432A
Authority
CN
China
Prior art keywords
training
sample
current
loss function
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110242083.5A
Other languages
Chinese (zh)
Other versions
CN112990432B (en)
Inventor
张梦琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202110242083.5A
Publication of CN112990432A
Application granted
Publication of CN112990432B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a target recognition model training method, a target recognition model training device and electronic equipment, wherein a training sample set and a fitting image set are obtained, samples in a current training sample subset are input into an initial model, and a first feature vector and a prediction label of each sample are obtained; performing feature extraction on the current fitting image through a second intermediate layer of the initial image classification model to obtain a second feature vector corresponding to the current fitting image; calculating a characteristic loss function value according to a first characteristic vector corresponding to the positive sample and a second characteristic vector corresponding to the current fitting image; calculating a cross entropy loss function value according to the prediction label and the real label corresponding to each sample; and performing back propagation training based on the characteristic loss function value and the cross entropy loss function value to obtain a target recognition model. According to the method and the device, the target recognition model which can recognize whether the image contains the target or not can be trained, and the recognition accuracy and the recall rate of the target recognition model are improved.

Description

Target recognition model training method and device and electronic equipment
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a method and an apparatus for training a target recognition model, and an electronic device.
Background
The current image classification tasks are mainly divided into traditional image classification tasks and fine-grained image classification tasks. In an image recognition scene which only needs to recognize whether a certain target exists in an image and does not need to recognize detailed information such as the type, the position and the like of the target, if a traditional image classification task is adopted for model training, the characteristics of a key small target are easily ignored, and the recognition capability of a model is poor; if the fine-grained classification task is used for model training, the training process and the obtained model are too complex, and the recognition efficiency is influenced.
Disclosure of Invention
The application aims to provide a target recognition model training method, a device and electronic equipment, wherein a characteristic loss function value can be calculated through characteristic extraction of a fitting image, a reverse gradient propagation training is carried out on an initial image classification model through the characteristic loss function value and a cross entropy loss function value, a target recognition model capable of recognizing whether a target is contained in the image can be trained, and the recognition accuracy and the recall rate of the target recognition model are improved.
In a first aspect, an embodiment of the present application provides a target recognition model training method, where the method is applied to an electronic device, and an initial image classification model is prestored in the electronic device; the method comprises the following steps: acquiring a training sample set and a fitting image set; the samples in the training sample set comprise positive samples and negative samples, and the images in the fitting image set are images in which the area ratio of the target is greater than a set threshold; determining a training sample subset and a current fitting image corresponding to each training round based on the training sample set and the fitting image set, and executing the following operations for each training round: inputting samples in the current training sample subset into the initial image classification model to obtain a first feature vector and a prediction label of each sample; the first feature vector is a vector output by a first intermediate layer of the initial image classification model; performing feature extraction on the current fitting image through a second intermediate layer of the initial image classification model to obtain a second feature vector corresponding to the current fitting image; calculating a feature loss function value of the current round of training according to the first feature vectors corresponding to the positive samples in the current training sample subset and the second feature vector corresponding to the current fitting image; calculating a cross entropy loss function value of the current round of training according to the prediction label and the real label corresponding to each sample in the current training sample subset; and determining a total loss value of the current round of training based on the feature loss function value and the cross entropy loss function value of the current round of training, performing reverse gradient propagation training on the initial image classification model according to the total loss value of the current round of training, and stopping the training when the number of training rounds reaches a preset number or the total loss value converges to a preset convergence threshold, to obtain the target recognition model.
Further, the initial image classification model comprises a convolutional neural network, an attention structure, a fusion module and a classifier which are connected in sequence; the fusion module is the first intermediate layer; the step of inputting samples in the current training sample subset into the initial image classification model to obtain a first feature vector and a prediction label of each sample includes: inputting the samples in the current training sample subset into the convolutional neural network to obtain an original feature map corresponding to each sample; inputting the original feature map corresponding to each sample into the attention structure to obtain an attention map corresponding to each sample; inputting the original feature map and the attention map corresponding to each sample into the fusion module to obtain the first feature vector corresponding to each sample; and inputting the first feature vector corresponding to each sample into the classifier to obtain the prediction label corresponding to each sample.
Further, the step of inputting the original feature map and the attention map corresponding to each sample into the fusion module to obtain the first feature vector corresponding to each sample includes performing the following operations for the original feature map and the attention map of each sample: spatially normalizing the attention map corresponding to the sample through a softmax function to obtain a value corresponding to each pixel in the attention map; and, taking the value corresponding to each pixel in the attention map as a weight, performing a weighted summation over the original feature map corresponding to the sample to obtain the first feature vector corresponding to the sample.
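The fusion described above can be sketched in plain Python. This is only an illustration under stated assumptions: the attention map is taken to be a single-channel H x W grid with the same spatial size as the C x H x W feature map, and the names softmax_spatial and fuse are hypothetical, not taken from the application.

```python
import math

def softmax_spatial(attention):
    """Normalize an H x W attention map so that all of its values sum to 1."""
    flat = [a for row in attention for a in row]
    peak = max(flat)  # subtract the maximum for numerical stability
    exps = [math.exp(a - peak) for a in flat]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(feature_map, attention):
    """Attention-weighted sum of a C x H x W feature map, giving a length-C vector.

    Assumption: one attention weight per pixel, shared by every channel.
    """
    weights = softmax_spatial(attention)
    fused = []
    for channel in feature_map:
        flat = [x for row in channel for x in row]
        fused.append(sum(w * x for w, x in zip(weights, flat)))
    return fused

# Toy example: 2 channels on a 2 x 2 grid, attention peaked at the top-left pixel.
feature_map = [[[1.0, 0.0], [0.0, 0.0]],
               [[5.0, 2.0], [2.0, 2.0]]]
attention = [[10.0, 0.0], [0.0, 0.0]]
vector = fuse(feature_map, attention)
```

Because the weights sum to 1, a uniform attention map reduces the fusion to ordinary global average pooling, while a peaked map makes the vector follow the features at the attended pixels.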
Further, the second intermediate layer is the convolutional neural network; the step of performing feature extraction on the current fitting image through the second intermediate layer of the initial image classification model to obtain the second feature vector corresponding to the current fitting image includes: inputting the current fitting image into the convolutional neural network to obtain the second feature vector corresponding to the current fitting image.
Further, the step of calculating the feature loss function value of the training in this round according to the first feature vector corresponding to the positive sample in the current training sample subset and the second feature vector corresponding to the current fitting image includes: calculating a first characteristic loss function value corresponding to each positive sample according to a first characteristic vector corresponding to each positive sample in the current training sample subset and a second characteristic vector corresponding to the current fitting image; and carrying out mean value calculation on the first characteristic loss function values corresponding to the positive samples to obtain the characteristic loss function values of the training of the round.
Further, the step of calculating the first feature loss function value corresponding to each positive sample according to the first feature vector corresponding to each positive sample in the current training sample subset and the second feature vector corresponding to the current fitting image includes: calculating a first characteristic loss function value for the positive sample by the following equation:
$$L_2 = \mathrm{MSE}(v_1, v_2)$$
wherein $L_2$ represents the first feature loss function value of the positive sample; $\mathrm{MSE}(\cdot)$ represents the mean square error function; $v_1$ represents the first feature vector corresponding to the positive sample; and $v_2$ represents the second feature vector corresponding to the current fitting image.
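As a minimal sketch of this feature loss, the following plain-Python code computes the mean square error between each positive sample's first feature vector and the second feature vector, then averages over the positive samples. It assumes both vectors have the same length; the function names mse and feature_loss are hypothetical.

```python
def mse(u, v):
    """Mean square error between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def feature_loss(positive_vectors, fitting_vector):
    """Average of the per-sample MSE values over the positive samples of one round."""
    if not positive_vectors:
        raise ValueError("at least one positive sample is required")
    per_sample = [mse(v, fitting_vector) for v in positive_vectors]
    return sum(per_sample) / len(per_sample)

# Two positive samples compared against one fitting-image vector.
round_feature_loss = feature_loss([[1.0, 2.0], [1.0, 4.0]], [1.0, 4.0])
```

Minimizing this value pulls the attention-pooled features of positive samples toward the features of an image that is almost entirely target.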
Further, the step of calculating the cross entropy loss function value of the current training according to the prediction label and the real label corresponding to each sample in the current training sample subset includes: calculating a first cross entropy loss function value corresponding to each sample according to a prediction label, a real label and a cross entropy loss function corresponding to each sample in the current training sample subset; and carrying out mean value calculation on the first cross entropy loss function values corresponding to the samples to obtain cross entropy loss function values of the training of the current round.
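The cross entropy step can be illustrated as follows. The sketch assumes a two-class setting in which the classifier outputs the probability that a sample contains the target and the real label is 1 (contains the target) or 0; the function names and the clipping constant eps are hypothetical.

```python
import math

def cross_entropy(prob_positive, true_label, eps=1e-12):
    """Binary cross entropy for one sample; the probability is clipped to avoid log(0)."""
    p = min(max(prob_positive, eps), 1.0 - eps)
    return -math.log(p) if true_label == 1 else -math.log(1.0 - p)

def batch_cross_entropy(probs, labels):
    """Mean of the per-sample cross entropy values over the training sample subset."""
    losses = [cross_entropy(p, y) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)

# One confident correct prediction, one confident wrong one, one uncertain one.
round_ce = batch_cross_entropy([0.9, 0.8, 0.5], [1, 0, 1])
```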
Further, the step of determining the total loss value of the current round of training based on the characteristic loss function value and the cross entropy loss function value of the current round of training includes: and summing the characteristic loss function value and the cross entropy loss function value of the training of the current round to obtain the total loss value of the training of the current round.
Further, the attention structure includes three convolution layers; each convolution layer is followed by a batch normalization (BN) layer and a linear connection unit.
Further, the method further comprises: every preset number of training rounds, predicting a specified image by using the target recognition model obtained by the current training, the specified image being an unlabeled target-related image; and if the confidence of the prediction result exceeds a preset threshold, adding the specified image to the training sample set for model training.
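One possible reading of this step is sketched below. The text does not say how confidence is measured or which label the added image receives, so the sketch assumes that confidence is the larger of the two class probabilities and that the image is added with its predicted label; pseudo_label and the 0.9 threshold are hypothetical.

```python
def pseudo_label(predictions, threshold=0.9):
    """Select unlabeled images whose prediction confidence exceeds the threshold.

    predictions: list of (image_id, probability that the image contains the target).
    Returns (image_id, predicted_label) pairs ready to be added to the training set.
    """
    selected = []
    for image_id, prob in predictions:
        confidence = max(prob, 1.0 - prob)  # assumption: confidence of the predicted class
        if confidence > threshold:
            selected.append((image_id, 1 if prob >= 0.5 else 0))
    return selected

training_set = [("img_001", 1), ("img_002", 0)]
unlabeled = [("new_a", 0.97), ("new_b", 0.60), ("new_c", 0.02)]
training_set.extend(pseudo_label(unlabeled))
```

In a full training loop this selection would run only when the round index is a multiple of the preset interval.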
Further, the method further comprises: acquiring an image to be identified; and inputting the image to be recognized into the target recognition model to obtain a recognition result corresponding to the image to be recognized.
In a second aspect, an embodiment of the present application further provides a target recognition model training apparatus, where the apparatus is applied to an electronic device, and an initial image classification model is prestored in the electronic device; the apparatus comprises: an image set acquisition module, used for acquiring a training sample set and a fitting image set; the samples in the training sample set comprise positive samples and negative samples, and the images in the fitting image set are images in which the area ratio of the target is greater than a set threshold; and a model training module, used for determining a training sample subset and a current fitting image corresponding to each training round based on the training sample set and the fitting image set, and executing the following operations for each training round: inputting samples in the current training sample subset into the initial image classification model to obtain a first feature vector and a prediction label of each sample; the first feature vector is a vector output by a first intermediate layer of the initial image classification model; performing feature extraction on the current fitting image through a second intermediate layer of the initial image classification model to obtain a second feature vector corresponding to the current fitting image; calculating a feature loss function value of the current round of training according to the first feature vectors corresponding to the positive samples in the current training sample subset and the second feature vector corresponding to the current fitting image; calculating a cross entropy loss function value of the current round of training according to the prediction label and the real label corresponding to each sample in the current training sample subset; and determining a total loss value of the current round of training based on the feature loss function value and the cross entropy loss function value of the current round of training, performing reverse gradient propagation training on the initial image classification model according to the total loss value of the current round of training, and stopping the training when the number of training rounds reaches a preset number or the total loss value converges to a preset convergence threshold, to obtain the target recognition model.
In a third aspect, an embodiment of the present application further provides an electronic device, which includes a processor and a memory, where the memory stores computer-executable instructions that can be executed by the processor, and the processor executes the computer-executable instructions to implement the method in the first aspect.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method of the first aspect.
In the target recognition model training method provided by the embodiment of the application, a training sample set and a fitting image set are obtained first; the samples in the training sample set comprise positive samples and negative samples, and the images in the fitting image set are images in which the area ratio of the target is greater than a set threshold; a training sample subset and a current fitting image corresponding to each training round are determined based on the training sample set and the fitting image set, and the following operations are executed for each training round: inputting samples in the current training sample subset into the initial image classification model to obtain a first feature vector and a prediction label of each sample; the first feature vector is a vector output by a first intermediate layer of the initial image classification model; performing feature extraction on the current fitting image through a second intermediate layer of the initial image classification model to obtain a second feature vector corresponding to the current fitting image; calculating a feature loss function value of the current round of training according to the first feature vectors corresponding to the positive samples in the current training sample subset and the second feature vector corresponding to the current fitting image; calculating a cross entropy loss function value of the current round of training according to the prediction label and the real label corresponding to each sample in the current training sample subset; and determining a total loss value of the current round of training based on the feature loss function value and the cross entropy loss function value of the current round of training, performing reverse gradient propagation training on the initial image classification model according to the total loss value of the current round of training, and stopping the training when the number of training rounds reaches a preset number or the total loss value converges to a preset convergence threshold, to obtain the target recognition model. According to the method and the device, the feature loss function value can be calculated through feature extraction of the fitting image, and the initial image classification model is subjected to reverse gradient propagation training through the feature loss function value and the cross entropy loss function value, so that a target recognition model which can recognize whether an image carries a target can be trained, and the recognition accuracy and the recall rate of the target recognition model are improved.
Drawings
In order to more clearly illustrate the detailed description of the present application or the technical solutions in the prior art, the drawings needed to be used in the detailed description of the present application or the prior art description will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a target recognition model training method according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of a target recognition model training process according to an embodiment of the present disclosure;
Fig. 3 is a flowchart of a target identification method according to an embodiment of the present application;
Fig. 4 is a block diagram illustrating a structure of a target recognition model training apparatus according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions of the present application will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The current image classification tasks are mainly divided into traditional image classification tasks and fine-grained image classification tasks. In the traditional image classification task, no matter how large the proportion of an important judgment area in an image in the whole image is, only the characteristics of the whole image are extracted at one glance, and then classification is carried out; in fine-grained image classification, the discriminable area in an image to be classified is usually only a small area in the image, so that an area of an object of interest is usually obtained first, and then the object is subjected to fine classification in a plurality of classes with small differences.
Fine-grained image classification is further divided into strongly supervised learning and weakly supervised learning. Strongly supervised learning requires additional annotation boxes to be supplied to the network so that the network can learn the position information of the target, which is similar to a target detection task. In weakly supervised learning, the network locates the discriminative region through unsupervised learning, and then pays particular attention to the feature differences of that region to identify the category of the target.
In an image recognition scene which only needs to recognize whether a certain target exists in an image and does not need to recognize detailed information such as the type, the position and the like of the target, if a traditional image classification task is adopted for model training, the characteristics of a key small target are easily ignored, and the recognition capability of a model is poor; if the fine-grained classification task is used for model training, the training process and the obtained model are too complex, and the recognition efficiency is influenced.
Based on this, the embodiment of the application provides a method and a device for training a target recognition model, and an electronic device, wherein a feature loss function value can be calculated through feature extraction of a fitted image, and a reverse gradient propagation training is performed on an initial image classification model through the feature loss function value and a cross entropy loss function value, so that the target recognition model capable of recognizing whether an image carries a target or not can be trained, and the recognition accuracy and the recall rate of the target recognition model are improved.
For the convenience of understanding the present embodiment, a method for training a target recognition model disclosed in the embodiments of the present application will be described in detail first.
Fig. 1 is a flowchart of a target recognition model training method according to an embodiment of the present disclosure. The method is applied to an electronic device in which an initial image classification model is prestored; the initial image classification model may be implemented in various ways, which are not limited here. The target may be a gun, a knife, or another article. The target recognition model trained by the method provided in this embodiment can quickly determine whether an image contains or carries the target. The target recognition model training method specifically includes the following steps:
Step S11, acquiring a training sample set and a fitting image set.
The samples in the training sample set comprise positive samples and negative samples; the positive samples are images containing the target, and the negative samples are images not containing the target. The images in the fitting image set are images in which the area ratio of the target is greater than a set threshold, for example pure samples containing only the target, or images in which the target occupies more than a certain proportion of the image area, such as 95%; the threshold can be adjusted according to actual conditions.
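Selecting the fitting image set by area ratio might look like the sketch below. The text does not state how the target's area is measured, so the sketch assumes that a pixel count of the target (for example from a mask or an annotation box) is available for each candidate image; the field names and function names are hypothetical.

```python
def target_area_ratio(target_pixels, width, height):
    """Fraction of the image area covered by the target."""
    return target_pixels / (width * height)

def build_fitting_set(candidates, threshold=0.95):
    """Keep the candidate images whose target area ratio is greater than the threshold."""
    return [
        c["name"]
        for c in candidates
        if target_area_ratio(c["target_pixels"], c["width"], c["height"]) > threshold
    ]

candidates = [
    {"name": "knife_closeup.jpg", "target_pixels": 9800, "width": 100, "height": 100},
    {"name": "street_scene.jpg", "target_pixels": 5000, "width": 100, "height": 100},
    {"name": "pure_target.jpg", "target_pixels": 10000, "width": 100, "height": 100},
]
fitting_set = build_fitting_set(candidates, threshold=0.95)
```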
Step S12, determining a training sample subset and a current fitting image corresponding to each training round based on the training sample set and the fitting image set, and executing the following operations for each training round until the number of training rounds reaches a preset number or the total loss value converges to a preset convergence threshold, so as to obtain the target recognition model.
During model training, a training sample subset and a current fitting image corresponding to current training need to be determined from a training sample set and a fitting image set, for example, 20 images are selected from the training sample set as samples in the training sample subset corresponding to the current training, and one fitting image is randomly selected from the fitting image set as the current fitting image. And then, executing a model training process of the following five steps, and stopping training until the training round reaches a preset number (such as 100 times) or the total loss value converges to a preset convergence threshold value to obtain the target recognition model.
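The round structure described above can be sketched as a small driver loop. The per-round work (forward pass, loss calculation, parameter update) is passed in as a callable named run_round, which is a hypothetical placeholder. The sketch also assumes that a subset is drawn at random and that "converges to a preset convergence threshold" means the total loss falls to or below that threshold; neither detail is fixed by the text.

```python
import random

def train(sample_set, fitting_set, run_round,
          batch_size=20, max_rounds=100, loss_threshold=1e-3, seed=0):
    """Run training rounds until the round limit or the loss threshold is reached.

    run_round(subset, fitting_image) must perform one round and return its total loss.
    Returns the list of total loss values, one per executed round.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(max_rounds):
        subset = rng.sample(sample_set, min(batch_size, len(sample_set)))
        fitting_image = rng.choice(fitting_set)  # one fitting image per round
        total_loss = run_round(subset, fitting_image)
        history.append(total_loss)
        if total_loss <= loss_threshold:  # assumed convergence criterion
            break
    return history

# Stand-in for a real training round: the loss simply halves each time it is called.
state = {"loss": 1.0}
def fake_round(subset, fitting_image):
    state["loss"] /= 2.0
    return state["loss"]

history = train(list(range(200)), ["fit_a", "fit_b"], fake_round)
```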
The following five steps are performed for each round of training:
Step S121, inputting samples in the current training sample subset into the initial image classification model to obtain a first feature vector and a prediction label of each sample; the first feature vector is a vector output by the first intermediate layer of the initial image classification model.
The first feature vector may be obtained in multiple ways, and the first intermediate layer used to extract it differs between initial image classification models of different structures. In this embodiment, the first intermediate layer may be a fusion module, which fuses the feature map extracted by the neural network with the attention map extracted by the attention structure and outputs the first feature vector of the sample.
On the basis of obtaining the first feature vector of the sample, the classifier may further output a classification result, that is, a prediction label of the sample, for example, the label includes Y and N, Y indicates that the sample is an image containing the target, and N indicates that the sample is an image not containing the target.
Step S122, performing feature extraction on the current fitting image through a second intermediate layer of the initial image classification model to obtain a second feature vector corresponding to the current fitting image.
The second intermediate layer and the first intermediate layer are located at different positions in the structure of the initial image classification model; when the current fitting image is input into the initial image classification model, the second feature vector is output by the second intermediate layer.
Step S123, calculating a feature loss function value of the training in the current round according to the first feature vector corresponding to the positive sample in the current training sample subset and the second feature vector corresponding to the current fitting image.
The feature loss function value can be calculated by substituting the two kinds of feature vectors into a preset feature loss function. If there is only one positive sample, the first feature vector corresponding to that positive sample and the second feature vector corresponding to the current fitting image are substituted directly into the preset feature loss function. Generally there are multiple positive samples, so the feature loss function value of each positive sample is calculated separately, and the average of these values is taken as the feature loss function value of the current round of training.
Step S124, calculating a cross entropy loss function value of the training in the current round according to the predicted label and the real label corresponding to each sample in the current training sample subset.
Similarly, the calculation of the cross entropy loss function value can also be realized by adopting a preset calculation formula, and the average value of the cross entropy loss function values corresponding to a plurality of samples can be taken as the cross entropy loss function value of the training in the current round.
Step S125, determining a total loss value of the current round of training based on the feature loss function value and the cross entropy loss function value of the current round of training, and performing reverse gradient propagation training on the initial image classification model according to the total loss value of the current round of training.
In this step, the characteristic loss function value and the cross entropy loss function value of the current round of training are added to obtain a total loss value of the current round of training, and then the initial image classification model is subjected to inverse gradient propagation training through the total loss value.
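A minimal sketch of how the two terms combine into the total loss of one round is given below. The text does not cover a round whose subset happens to contain no positive samples; the sketch assumes the feature term is simply zero in that case. The function name total_round_loss is hypothetical.

```python
def total_round_loss(positive_feature_losses, cross_entropy_losses):
    """Total loss of one round: mean feature loss plus mean cross entropy loss.

    positive_feature_losses: one value per positive sample in the subset (may be empty).
    cross_entropy_losses: one value per sample in the subset.
    """
    ce_term = sum(cross_entropy_losses) / len(cross_entropy_losses)
    if positive_feature_losses:
        feature_term = sum(positive_feature_losses) / len(positive_feature_losses)
    else:
        feature_term = 0.0  # assumption: no positive samples, so no feature term
    return feature_term + ce_term

loss_with_positives = total_round_loss([2.0, 4.0], [0.5, 1.5])
loss_without_positives = total_round_loss([], [0.5, 1.5])
```

The two terms are added with equal weight, as the text states; any weighting factor would be a further design choice not described here.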
Through a certain number of times of cyclic training processes, an ideal target recognition model can be obtained finally. According to the target recognition model training method provided by the embodiment of the application, the feature vector extraction of the fitting image is added, so that the feature loss function value can be calculated, the reverse gradient propagation training is carried out on the initial image classification model through the feature loss function value and the cross entropy loss function value, the target recognition model which can identify whether the image carries the target or not can be trained, and the recognition accuracy and the recall rate of the target recognition model are improved.
A preferred embodiment is described below, in which an attention mechanism is added to the training process of the target recognition model. As shown in fig. 2, in this embodiment the initial image classification model includes a convolutional neural network, an attention structure, a fusion module, and a classifier, which are connected in sequence; the fusion module is the first intermediate layer and outputs the first feature vector of a sample.
The specific model training process is as follows:
(1) The following feature extraction steps are performed simultaneously on the current training sample subset and the corresponding current fitting image.
The feature extraction process for the current training sample subset is as follows:
A. Inputting the samples in the current training sample subset into the convolutional neural network to obtain an original feature map corresponding to each sample.
In the embodiment of the present application, ResNet50 (a residual network) is used to extract the feature maps of the samples in the current training sample subset; other mainstream convolutional neural networks, such as VGG or ResNet152, may be used instead. The network is initialized with model parameters trained on the ImageNet image database, and only the last fully connected layer needs to be modified so that it outputs a binary classification of whether a sample carries the target. All input samples are first scaled to 224 x 224. In the embodiment of the present application, the output of the last convolutional layer of the ResNet50 model is taken as the original feature map Vs of each sample in the current training sample subset.
B. Inputting the original feature map corresponding to each sample into the attention structure to obtain an attention map corresponding to each sample; the attention structure comprises three convolution layers, each followed by a BN (batch normalization) layer and a rectified linear unit (ReLU).
After the feature map Vs is obtained from ResNet50, it is input to the attention structure, which learns the attention map Vatt. The attention structure consists of three convolutional layers: the first uses 1024 convolution kernels of size 1 x 1, the second uses 512 convolution kernels of size 3 x 3, and the third uses 1 convolution kernel of size 1 x 1. Each convolution is followed by a BN layer and a rectified linear unit. The BN layer serves three main purposes: it accelerates the training and convergence of the network; it mitigates exploding and vanishing gradients; and it helps prevent overfitting.
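The layer sizes given above can be checked with a short shape trace. The following is a minimal sketch rather than the implementation of the embodiment. It assumes the standard ResNet50 output of 2048 channels at 7 x 7 for a 224 x 224 input, and it assumes stride 1 with padding 1 on the 3 x 3 layer so that the attention map keeps the spatial size of the feature map; the padding and stride are not stated in the application. The function names are illustrative, and bias and BN parameters are not counted.

```python
def conv_output(h, w, kernel, padding=0, stride=1):
    """Spatial output size of a convolution (standard formula)."""
    out_h = (h + 2 * padding - kernel) // stride + 1
    out_w = (w + 2 * padding - kernel) // stride + 1
    return out_h, out_w


# (output channels, kernel size, padding) for the three attention layers.
# Kernel counts and sizes come from the text; padding values are assumptions.
ATTENTION_LAYERS = [(1024, 1, 0), (512, 3, 1), (1, 1, 0)]


def trace_attention_shapes(in_channels=2048, h=7, w=7):
    """Return the (channels, height, width) after each layer and the
    total number of convolution weights (biases and BN excluded)."""
    shapes = []
    weights = 0
    channels = in_channels
    for out_channels, kernel, padding in ATTENTION_LAYERS:
        h, w = conv_output(h, w, kernel, padding)
        weights += channels * out_channels * kernel * kernel
        shapes.append((out_channels, h, w))
        channels = out_channels
    return shapes, weights


shapes, weights = trace_attention_shapes()
print(shapes)   # one shape tuple per attention layer
print(weights)  # total convolution weights
```

Under these assumptions the last layer yields a single-channel map with one value per spatial position of Vs, which is what the fusion step needs in order to weight each position.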
C. And inputting the original feature map and the attention map corresponding to each sample into a fusion module to obtain a first feature vector corresponding to each sample.
Specifically, the following operations are performed for the original feature map and the attention map corresponding to each sample: spatially normalizing the attention map corresponding to the sample through a softmax function to obtain a value corresponding to each pixel in the attention map; and taking the value corresponding to each pixel in the attention map as a weight value, and performing a weighted summation over the original feature map corresponding to the sample to obtain the first feature vector corresponding to the sample.
The softmax function described above is as follows:
$$a_{i,j} = \frac{\exp(\tilde{a}_{i,j})}{\sum_{m,n} \exp(\tilde{a}_{m,n})}$$
wherein $a_{i,j}$ is the value at position (i, j) in the attention map Vatt after spatial normalization, that is, the weight value at position (i, j) in the original feature map; and $\tilde{a}_{i,j}$ is the value at position (i, j) in the attention map Vatt before normalization.
The first feature vector is calculated as follows:
$$v_1 = \sum_{i,j} x_{i,j}\, a_{i,j}$$
wherein $v_1$ represents the first feature vector corresponding to the sample; $x_{i,j}$ represents the feature vector at position (i, j) in the original feature map Vs; and $a_{i,j}$ is the value at position (i, j) in the attention map Vatt after spatial normalization, that is, the weight value at position (i, j) in the original feature map.
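The fusion step can be illustrated with a small worked example. The following is a minimal sketch in plain Python, not the implementation of the embodiment: the function names `spatial_softmax` and `fuse` and the toy 2 x 2 feature map are assumptions made for illustration only.

```python
import math


def spatial_softmax(att):
    """Normalize a 2-D attention map so that all positions sum to 1."""
    peak = max(v for row in att for v in row)  # subtract max for stability
    exps = [[math.exp(v - peak) for v in row] for row in att]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def fuse(feature_map, att):
    """Weighted sum of the channel vectors feature_map[i][j] using the
    softmax-normalized attention values as weights."""
    weights = spatial_softmax(att)
    channels = len(feature_map[0][0])
    v1 = [0.0] * channels
    for i, row in enumerate(feature_map):
        for j, x in enumerate(row):
            for c in range(channels):
                v1[c] += weights[i][j] * x[c]
    return v1


# Toy 2 x 2 feature map with 2 channels per position (illustrative values).
feature_map = [[[1.0, 0.0], [0.0, 1.0]],
               [[2.0, 2.0], [4.0, 0.0]]]
uniform_att = [[0.0, 0.0], [0.0, 0.0]]   # equal attention everywhere
v1 = fuse(feature_map, uniform_att)
print(v1)
```

With a uniform attention map every position receives the weight 1/4, so the first feature vector reduces to the plain spatial average of the feature map; a learned attention map shifts the weight toward the positions that matter for the target.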
The feature extraction process for the current fitted image is as follows:
A. Inputting the current fitting image into the convolutional neural network to obtain a second feature vector corresponding to the current fitting image. The convolutional neural network is the second intermediate layer of the initial image classification model.
The same deep convolutional neural network, ResNet50, is used to extract the features of the current fitting image. Here the last fully connected layer of the network model is removed, and the features of the last convolutional layer are extracted as the feature vector, giving the second feature vector $v_2$ corresponding to the current fitting image.
(2) Inputting the first feature vector corresponding to each sample into the classifier to obtain a prediction label corresponding to each sample.
The first feature vector $v_1$ corresponding to each sample is used to learn a binary linear classifier for target recognition:
$$\hat{y} = W v_1 + b$$
wherein W and b are the parameters of the linear classifier. Inputting the first feature vector $v_1$ corresponding to each sample into the classifier yields the prediction label $\hat{y}$ corresponding to that sample.
(3) Calculating the feature loss function value corresponding to the current round of training, shown as Loss2 in fig. 2.
In order to train the attention structure, the embodiment of the application calculates a feature fitting loss, that is, the loss between the second feature vector $v_2$ of the fitting image and the first feature vector $v_1$ used for classification. The feature loss function value corresponding to the current round of training is calculated by the following steps.
A. Calculating a first feature loss function value corresponding to each positive sample according to the first feature vector corresponding to each positive sample in the current training sample subset and the second feature vector corresponding to the current fitting image.
Specifically, the first feature loss function value of a positive sample is calculated by the following formula:
$$L_2 = \mathrm{MSE}(v_1^{+}, v_2)$$
wherein $L_2$ represents the first feature loss function value of the positive sample; MSE() represents the mean square error function; $v_1^{+}$ represents the first feature vector corresponding to the positive sample; and $v_2$ represents the second feature vector corresponding to the current fitting image.
B. Averaging the first feature loss function values corresponding to the positive samples to obtain the feature loss function value of the current round of training.
For example, if the training sample subset of the current round includes 20 images, 7 of which are positive samples, the average of the first feature loss function values corresponding to the 7 positive samples is calculated to obtain the feature loss function value of the current round of training.
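Steps A and B above can be sketched as follows. This is an illustrative example only: the function names, the label convention (1 for a positive sample, 0 for a negative sample), and the behavior when a subset contains no positive sample are assumptions that the application does not specify.

```python
def mse(u, v):
    """Mean square error between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def feature_loss(first_vectors, labels, v2):
    """Average MSE between v2 and the first feature vector of every
    positive sample; negative samples are ignored."""
    per_positive = [mse(v1, v2)
                    for v1, y in zip(first_vectors, labels) if y == 1]
    if not per_positive:
        return 0.0  # assumption: no positives contributes no feature loss
    return sum(per_positive) / len(per_positive)


# Three samples, the first and third of which are positive.
first_vectors = [[1.0, 2.0], [0.0, 0.0], [3.0, 2.0]]
labels = [1, 0, 1]
v2 = [1.0, 0.0]
loss2 = feature_loss(first_vectors, labels, v2)
print(loss2)
```

In this example the two positive samples give per-sample losses of 2.0 and 4.0, and the negative sample does not enter the average.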
(4) Calculating the cross entropy loss function value corresponding to the current round of training, shown as Loss1 in fig. 2.
A. Calculating a first cross entropy loss function value corresponding to each sample according to the prediction label and the real label corresponding to each sample in the current training sample subset and the cross entropy loss function.
The cross entropy loss of the prediction label $\hat{y}$ with respect to the real label y is calculated, that is, training minimizes the cross entropy loss between $\hat{y}$ and y, given by the formula:
$$L_1 = \mathrm{CrossEntropy}(\hat{y}, y)$$
where CrossEntropy() is the cross entropy loss function. The first cross entropy loss function value corresponding to each sample is calculated through this function.
B. Averaging the first cross entropy loss function values corresponding to the samples to obtain the cross entropy loss function value of the current round of training.
Continuing the above example, if the training sample subset of the current round includes 20 images, the average of the first cross entropy loss function values corresponding to the 20 samples is calculated to obtain the cross entropy loss function value of the current round of training.
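The averaging in steps A and B can be sketched as follows. The application does not state the exact form of the cross entropy function, so this example assumes the common binary form applied to a predicted probability of the positive class; the function names and the clipping constant `eps` are likewise illustrative assumptions.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Cross entropy for one sample; p is the predicted probability of
    the positive class and y is the real label (0 or 1)."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def cross_entropy_loss(probs, labels):
    """Average of the per-sample cross entropy values over all samples,
    positive and negative alike."""
    losses = [binary_cross_entropy(p, y) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# Two maximally uncertain predictions, one positive and one negative label.
loss1 = cross_entropy_loss([0.5, 0.5], [1, 0])
print(loss1)
```

Unlike the feature loss, which averages over positive samples only, this loss averages over every sample in the current training sample subset.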
(5) Calculating the total loss value corresponding to the current round of training, shown as Loss total in fig. 2.
The feature loss function value and the cross entropy loss function value of the current round of training are summed to obtain the total loss value of the current round of training.
The final loss function of the model is:
$$L = L_1 + L_2$$
wherein $L_1$ is the cross entropy loss function value of the current round of training and $L_2$ is the feature loss function value of the current round of training.
(6) Performing back propagation training based on the total loss value of the current round of training obtained by the above calculation.
Steps (1) to (6) are repeated to train the target recognition model.
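The repetition of steps (1) to (6), together with the stopping conditions of the method (a preset number of training rounds, or convergence of the total loss value to a preset convergence threshold), can be sketched as a simple control loop. This is a hedged outline only: `run_round` is a hypothetical callable standing in for one full round of steps (1) to (6), and convergence is interpreted here as the total loss falling to or below the threshold, which is an assumption.

```python
def train(run_round, max_rounds, convergence_threshold):
    """Repeat training rounds until the round limit is reached or the
    total loss is at or below the convergence threshold.

    run_round(round_index) is a hypothetical callable that performs one
    round (feature extraction, loss calculation, back propagation) and
    returns (cross_entropy_loss, feature_loss) for that round.
    """
    history = []
    for round_index in range(max_rounds):
        loss1, loss2 = run_round(round_index)
        total = loss1 + loss2          # total loss = L1 + L2
        history.append(total)
        if total <= convergence_threshold:
            break
    return history


# Stand-in for real training: both losses halve every round.
def fake_round(round_index):
    return 1.0 / 2 ** round_index, 0.5 / 2 ** round_index


history = train(fake_round, max_rounds=10, convergence_threshold=0.4)
print(history)
```

In a real implementation `run_round` would update the model parameters; here it only returns decreasing numbers so that the stopping logic can be observed.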
In addition, the samples in the training sample set need to be labeled manually before training, that is, divided into positive samples and negative samples. Because data labeling is costly, little labeled training data is available when the initial image classification model is trained. To improve the generalization capability of the model, the embodiment of the application further adopts semi-supervised training, adding a large amount of unlabeled data related to the target to the training.
Namely: during model training, every preset number of training rounds, the target recognition model obtained by the current training is used to predict a specified image, where the specified image is an unlabeled image related to the target; if the confidence of the prediction result exceeds a preset threshold, the specified image is added to the training sample set for model training.
In practical application, a threshold value k can be set. The trained target recognition model is first loaded to predict the unlabeled data, and images whose confidence is greater than the threshold value k are automatically selected and added to the training. The model automatically reselects unlabeled data once every n training epochs, and the threshold value k is adjusted by observing the amount of selected data and the test results during model training. Fine-tuning the model in this way can improve its accuracy and generalization capability.
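The selection of unlabeled images by confidence can be sketched as follows. This is a minimal illustration under stated assumptions: `predict` is a hypothetical callable returning a predicted label and a confidence for an image, and the example assumes that the predicted label is used as the label of a selected image, which the application does not state explicitly. The file names and scores are made up for the example.

```python
def select_confident(unlabeled, predict, k):
    """Split unlabeled images into those predicted with confidence
    greater than k (returned with their predicted labels) and the rest.

    predict(image) is a hypothetical callable returning
    (predicted_label, confidence).
    """
    selected, remaining = [], []
    for image in unlabeled:
        label, confidence = predict(image)
        if confidence > k:
            selected.append((image, label))
        else:
            remaining.append(image)
    return selected, remaining


# Stand-in model: a fixed positive-class probability per image name.
scores = {"a.jpg": 0.97, "b.jpg": 0.60, "c.jpg": 0.02}


def fake_predict(image):
    p = scores[image]
    return (1 if p >= 0.5 else 0), max(p, 1.0 - p)


selected, remaining = select_confident(list(scores), fake_predict, k=0.9)
print(selected)   # images added to the training sample set
print(remaining)  # images left for a later reselection
```

Calling such a function once every n epochs, with the images in `remaining` passed in again, mirrors the periodic reselection described above; raising or lowering k trades the amount of added data against the risk of adding wrongly labeled images.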
In the target recognition model training method provided by the embodiment of the application, the cross entropy loss of the model prediction is calculated, and at the same time the fitting loss between the attention-weighted feature vector and the feature vector of the fitting image is calculated to train the attention structure directly, thereby improving the accuracy of model recognition. In addition, a semi-supervised training method that selects unlabeled images during training improves the generalization capability of the model without increasing the labeling cost.
Further, an embodiment of the present application further provides a target identification method, as shown in fig. 3, the method includes the steps of:
Step S302, acquiring an image to be recognized;
Step S304, inputting the image to be recognized into the target recognition model to obtain a recognition result corresponding to the image to be recognized.
The target recognition model is obtained by training the target recognition model training method in the previous embodiment, and the image to be recognized is input to the target recognition model to obtain the recognition result corresponding to the image to be recognized, that is, the prediction label is obtained through the extraction process of the first feature vector in the previous embodiment and the prediction of the classifier, and the prediction label can represent whether the image to be recognized is the image containing the target. For a specific identification process, reference may be made to the above embodiment, which is not described herein again.
Based on the method embodiment, the embodiment of the application also provides a target recognition model training device, which is applied to electronic equipment, wherein an initial image classification model is prestored in the electronic equipment; referring to fig. 4, the apparatus includes:
an image set obtaining module 41, configured to obtain a training sample set and a fitting image set; the samples in the training sample set comprise positive samples and negative samples, and the images in the fitting image set are images of which the area occupation ratio of the target is greater than a set threshold value; and the model training module 42 is configured to determine a training sample subset and a current fitting image corresponding to each training round based on the training sample set and the fitting image set, and perform the following operations for each training round until the training round reaches a preset number of times or a total loss value converges to a preset convergence threshold value, so as to stop training and obtain a target recognition model.
The model training module 42 includes: the system comprises a feature extraction and identification module 421, a loss value calculation module 422 and a back propagation training module 423, wherein the feature extraction and identification module 421 is used for inputting samples in a current training sample subset into an initial image classification model to obtain a first feature vector and a prediction label of each sample; the first feature vector is a vector output by a first middle layer of the initial image classification model; performing feature extraction on the current fitting image through a second intermediate layer of the initial image classification model to obtain a second feature vector corresponding to the current fitting image; the loss value calculating module 422 is configured to calculate a loss function value of the feature of the training in the current round according to a first feature vector corresponding to each positive sample in the current training sample subset and a second feature vector corresponding to the current fitting image; calculating a cross entropy loss function value of the training of the current round according to a prediction label and a real label corresponding to each sample in the current training sample subset; determining a total loss value of the training of the current round based on the characteristic loss function value and the cross entropy loss function value of the training of the current round; the back propagation training module 423 is configured to perform back gradient propagation training on the initial image classification model according to the total loss value of the training in the current round.
Further, the initial image classification model comprises a convolutional neural network, an attention structure, a fusion module and a classifier which are connected in sequence; the fusion module is the first intermediate layer; the feature extraction and identification module 421 is further configured to: input the samples in the current training sample subset into the convolutional neural network to obtain an original feature map corresponding to each sample; input the original feature map corresponding to each sample into the attention structure to obtain an attention map corresponding to each sample; input the original feature map and the attention map corresponding to each sample into the fusion module to obtain a first feature vector corresponding to each sample; and input the first feature vector corresponding to each sample into the classifier to obtain a prediction label corresponding to each sample.
Further, the feature extraction and identification module 421 is further configured to perform the following operations for the original feature map and the attention map corresponding to each sample: spatially normalize the attention map corresponding to the sample through a softmax function to obtain a value corresponding to each pixel in the attention map; and take the value corresponding to each pixel in the attention map as a weight value, and perform a weighted summation over the original feature map corresponding to the sample to obtain the first feature vector corresponding to the sample.
Further, the second intermediate layer is a convolutional neural network; the feature extraction and identification module 421 is further configured to: and inputting the current fitting image into the convolutional neural network to obtain a second feature vector corresponding to the current fitting image.
Further, the loss value calculation module 422 is further configured to: calculate a first feature loss function value corresponding to each positive sample according to the first feature vector corresponding to each positive sample in the current training sample subset and the second feature vector corresponding to the current fitting image; and average the first feature loss function values corresponding to the positive samples to obtain the feature loss function value of the current round of training.
Further, the loss value calculation module 422 is further configured to calculate the first feature loss function value of a positive sample by the following formula:
$$L_2 = \mathrm{MSE}(v_1^{+}, v_2)$$
wherein $L_2$ represents the first feature loss function value of the positive sample; MSE() represents the mean square error function; $v_1^{+}$ represents the first feature vector corresponding to the positive sample; and $v_2$ represents the second feature vector corresponding to the current fitting image.
Further, the loss value calculation module 422 is further configured to: calculating a first cross entropy loss function value corresponding to each sample according to a prediction label, a real label and a cross entropy loss function corresponding to each sample in the current training sample subset; and carrying out mean value calculation on the first cross entropy loss function values corresponding to the samples to obtain cross entropy loss function values of the training of the current round.
Further, the loss value calculation module 422 is further configured to: and summing the characteristic loss function value and the cross entropy loss function value of the training of the current round to obtain the total loss value of the training of the current round.
Further, the attention structure includes three convolution layers; each convolution layer is followed by a BN layer and a rectified linear unit (ReLU).
Further, the model training module 42 is further configured to: during model training, every preset number of training rounds, use the target recognition model obtained by the current training to predict a specified image, where the specified image is an unlabeled image related to the target; and, if the confidence of the prediction result exceeds a preset threshold, add the specified image to the training sample set for model training.
Further, the above apparatus further comprises: the image recognition module is used for acquiring an image to be recognized; and inputting the image to be recognized into the target recognition model to obtain a recognition result corresponding to the image to be recognized.
The implementation principle and technical effects of the target recognition model training device provided in the embodiment of the present application are the same as those of the foregoing target recognition model training method embodiment. For brevity, where the device embodiment does not mention something, reference may be made to the corresponding content of the method embodiment.
An electronic device is further provided in the embodiments of the present application, as shown in fig. 5, which is a schematic structural diagram of the electronic device, where the electronic device includes a processor 51 and a memory 50, the memory 50 stores computer-executable instructions capable of being executed by the processor 51, and the processor 51 executes the computer-executable instructions to implement the method.
In the embodiment shown in fig. 5, the electronic device further comprises a bus 52 and a communication interface 53, wherein the processor 51, the communication interface 53 and the memory 50 are connected by the bus 52.
The Memory 50 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 53 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used. The bus 52 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 52 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 5, but this does not indicate only one bus or one type of bus.
The processor 51 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 51. The Processor 51 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in a memory, and the processor 51 reads information in the memory and performs the steps of the method of the previous embodiment in combination with hardware thereof.
Embodiments of the present application further provide a computer-readable storage medium, where computer-executable instructions are stored, and when the computer-executable instructions are called and executed by a processor, the computer-executable instructions cause the processor to implement the method, and specific implementation may refer to the foregoing method embodiments, and is not described herein again.
The method, the apparatus, and the computer program product for training a target recognition model provided in the embodiments of the present application include a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the method described in the foregoing method embodiments, and specific implementations may refer to the method embodiments and are not described herein again.
Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present application.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In the description of the present application, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present application. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present application, and are used for illustrating the technical solutions of the present application, but not limiting the same, and the scope of the present application is not limited thereto, and although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application, and are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A target recognition model training method is characterized in that the method is applied to electronic equipment, and an initial image classification model is prestored in the electronic equipment; the method comprises the following steps:
acquiring a training sample set and a fitting image set; the samples in the training sample set comprise positive samples and negative samples, and the images in the fitting image set are images of which the area ratio of the target is greater than a set threshold value;
determining a training sample subset and a current fitting image corresponding to each training round based on the training sample set and the fitting image set, and executing the following operations for each training round:
inputting samples in a current training sample subset into the initial image classification model to obtain a first feature vector and a prediction label of each sample; wherein the first feature vector is a vector output by a first intermediate layer of the initial image classification model;
performing feature extraction on the current fitting image through a second intermediate layer of the initial image classification model to obtain a second feature vector corresponding to the current fitting image;
calculating a characteristic loss function value of the training of the current round according to a first characteristic vector corresponding to the positive samples in the current training sample subset and a second characteristic vector corresponding to the current fitting image;
calculating a cross entropy loss function value of the training in the current round according to the prediction label and the real label corresponding to each sample in the current training sample subset;
and determining a total loss value of the training of the current round based on the characteristic loss function value and the cross entropy loss function value of the training of the current round, performing reverse gradient propagation training on the initial image classification model according to the total loss value of the training of the current round, and stopping the training until the training round reaches a preset number of times or the total loss value converges to a preset convergence threshold value to obtain a target recognition model.
2. The method of claim 1, wherein the initial image classification model comprises a convolutional neural network, an attention structure, a fusion module, and a classifier connected in sequence; the fusion module is the first intermediate layer;
inputting the samples in the current training sample subset into the initial image classification model to obtain a first feature vector and a prediction label of each sample, wherein the steps comprise:
inputting the samples in the current training sample subset into the convolutional neural network to obtain an original feature map corresponding to each sample;
inputting the original feature map corresponding to each sample into the attention structure to obtain an attention map corresponding to each sample;
inputting the original feature map and the attention map corresponding to each sample into the fusion module to obtain a first feature vector corresponding to each sample;
and inputting the first feature vector corresponding to each sample into the classifier to obtain a prediction label corresponding to each sample.
3. The method according to claim 2, wherein the step of inputting the original feature map and the attention map corresponding to each sample into the fusion module to obtain the first feature vector corresponding to each sample comprises:
for each original feature map and attention map corresponding to the sample, the following operations are performed:
spatially normalizing the attention map corresponding to the sample through a softmax function to obtain a value corresponding to each pixel in the attention map;
and taking the value corresponding to each pixel in the attention map as a weight value, and carrying out weighted summation on the original feature map corresponding to the sample to obtain a first feature vector corresponding to the sample.
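As a non-limiting illustration, the fusion step of claim 3 may be sketched in NumPy as follows. The tensor shapes are assumptions not stated in the claims: the original feature map is taken as `(C, H, W)` and the attention map as a single-channel `(H, W)` array of raw scores. The function name `fuse_attention` is hypothetical.

```python
import numpy as np


def fuse_attention(feature_map: np.ndarray, attention_map: np.ndarray) -> np.ndarray:
    """Softmax-normalize the attention map over space, then use it as
    per-pixel weights for a weighted sum of the feature map.

    feature_map:   (C, H, W) original feature map (assumed shape)
    attention_map: (H, W) raw attention scores (assumed shape)
    returns:       (C,) first feature vector
    """
    channels, height, width = feature_map.shape
    scores = attention_map.reshape(-1)
    scores = scores - scores.max()          # subtract max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()       # spatial softmax: weights sum to 1
    # Weighted sum over the H*W pixel positions, one value per channel.
    return feature_map.reshape(channels, height * width) @ weights


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 4, 4))
attention_map = rng.normal(size=(4, 4))
first_feature_vector = fuse_attention(feature_map, attention_map)
```

With a constant attention map, every pixel receives the same weight and the result reduces to global average pooling of the feature map.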
4. The method of claim 2, wherein the second intermediate layer is the convolutional neural network;
the step of extracting the features of the current fitting image through a second intermediate layer of the initial image classification model to obtain a second feature vector corresponding to the current fitting image comprises the following steps:
and inputting the current fitting image into the convolutional neural network to obtain a second feature vector corresponding to the current fitting image.
5. The method of claim 1, wherein the step of calculating the feature loss function value of the current round of training according to the first feature vector corresponding to the positive samples in the current training sample subset and the second feature vector corresponding to the current fitting image comprises:
calculating a first feature loss function value corresponding to each positive sample in the current training sample subset according to the first feature vector corresponding to each positive sample and the second feature vector corresponding to the current fitting image;
and averaging the first feature loss function values corresponding to the positive samples to obtain the feature loss function value of the current round of training.
6. The method of claim 5, wherein the step of calculating the first feature loss function value corresponding to each positive sample in the current training sample subset according to the first feature vector corresponding to each positive sample and the second feature vector corresponding to the current fitting image comprises:
calculating the first feature loss function value of the positive sample by the following equation:
$$L_2 = \mathrm{MSE}(v_1, v_2)$$
wherein $L_2$ represents the first feature loss function value of the positive sample; $\mathrm{MSE}(\cdot)$ represents the mean square error function; $v_1$ represents the first feature vector corresponding to the positive sample; and $v_2$ represents the second feature vector corresponding to the current fitting image.
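As a non-limiting illustration, claims 5 and 6 may be sketched as follows: a mean square error is computed between the first feature vector of each positive sample and the second feature vector of the current fitting image, and the per-sample values are then averaged. The function names and the boolean mask `is_positive` are hypothetical, and returning zero when a subset contains no positive sample is an assumption the claims do not address.

```python
import numpy as np


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean square error between two vectors of equal length."""
    return float(np.mean((a - b) ** 2))


def feature_loss(first_vectors: np.ndarray,
                 is_positive: np.ndarray,
                 second_vector: np.ndarray) -> float:
    """Average the per-positive-sample MSE against the fitting-image vector.

    first_vectors: (N, D) first feature vectors of all samples in the subset
    is_positive:   (N,) boolean mask marking the positive samples
    second_vector: (D,) second feature vector of the current fitting image
    """
    per_sample = [mse(v, second_vector)
                  for v, positive in zip(first_vectors, is_positive)
                  if positive]
    if not per_sample:
        return 0.0  # assumption: a subset without positives contributes nothing
    return float(np.mean(per_sample))


first_vectors = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])
is_positive = np.array([True, False, True])
second_vector = np.array([1.0, 2.0])
loss_value = feature_loss(first_vectors, is_positive, second_vector)
```

Negative samples are skipped entirely, so only positive samples are pulled toward the fitting-image features.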
7. The method of claim 1, wherein the step of calculating the cross-entropy loss function value of the current round of training according to the prediction label and the real label corresponding to each sample in the current training sample subset comprises:
calculating a first cross entropy loss function value corresponding to each sample according to a prediction label, a real label and a cross entropy loss function corresponding to each sample in the current training sample subset;
and carrying out mean value calculation on the first cross entropy loss function values corresponding to the samples to obtain cross entropy loss function values of the training of the current round.
8. The method of claim 1, wherein the step of determining a total loss value for the current round of training based on the feature loss function values and the cross-entropy loss function values for the current round of training comprises:
summing the feature loss function value and the cross-entropy loss function value of the current round of training to obtain the total loss value of the current round of training.
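As a non-limiting illustration, claims 7 and 8 may be sketched as follows: a cross-entropy value is computed per sample and averaged, and the result is added to the feature loss. It is assumed here that the prediction label is available as a vector of class probabilities and the real label as an integer class index; the claims do not fix either representation. The function names and the example feature loss value are hypothetical.

```python
import numpy as np


def cross_entropy_loss(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean per-sample cross entropy.

    probs:  (N, K) predicted class probabilities (assumed representation)
    labels: (N,) integer index of the real class of each sample
    """
    eps = 1e-12                                       # guards against log(0)
    picked = probs[np.arange(len(labels)), labels]    # probability of the real class
    per_sample = -np.log(np.clip(picked, eps, 1.0))
    return float(per_sample.mean())


def total_loss(feature_loss_value: float, ce_loss_value: float) -> float:
    """Unweighted sum of the two loss terms, as recited in claim 8."""
    return feature_loss_value + ce_loss_value


probs = np.array([[0.5, 0.5], [0.25, 0.75]])
labels = np.array([0, 1])
ce_value = cross_entropy_loss(probs, labels)
total = total_loss(1.25, ce_value)   # 1.25 is an arbitrary example feature loss
```

The sum is unweighted because claim 8 recites a plain summation; a weighting coefficient between the two terms would be a departure from the claim wording.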
9. The method of claim 2, wherein the attention structure comprises three convolutional layers, and each convolutional layer is followed by a BN layer and a linear connection unit.
10. The method of claim 1, further comprising:
during model training, at every preset number of training rounds, predicting a specified image using the target recognition model obtained from the current training, wherein the specified image is an unlabeled target-related image;
and if the confidence of the prediction result exceeds a preset threshold, adding the specified image to the training sample set to perform model training.
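As a non-limiting illustration, the selection step of claim 10 may be sketched as follows. The confidence is assumed to be the highest predicted class probability, and the predicted class is assumed to serve as the label of an image added to the training sample set; the claims specify neither. The function name `select_confident` is hypothetical.

```python
import numpy as np


def select_confident(unlabeled_probs: np.ndarray, threshold: float):
    """Pick the unlabeled images whose prediction confidence exceeds the threshold.

    unlabeled_probs: (M, K) predicted class probabilities for M unlabeled images
    returns: (indices of the selected images, their predicted classes)
    """
    confidence = unlabeled_probs.max(axis=1)          # assumed confidence measure
    predicted = unlabeled_probs.argmax(axis=1)
    keep = np.flatnonzero(confidence > threshold)     # strictly "exceeds"
    return keep, predicted[keep]


unlabeled_probs = np.array([[0.95, 0.05],
                            [0.60, 0.40],
                            [0.10, 0.90]])
selected, pseudo_labels = select_confident(unlabeled_probs, threshold=0.9)
```

The comparison is strict, so an image whose confidence merely equals the threshold is not added.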
11. The method of claim 1, further comprising:
acquiring an image to be identified;
and inputting the image to be recognized into the target recognition model to obtain a recognition result corresponding to the image to be recognized.
12. A target recognition model training device, wherein the device is applied to an electronic device in which an initial image classification model is pre-stored; the device comprises:
an image set acquisition module, configured to acquire a training sample set and a fitting image set; the samples in the training sample set comprise positive samples and negative samples, and the images in the fitting image set are images in which the area ratio of the target is greater than a set threshold;
a model training module, configured to determine, based on the training sample set and the fitting image set, a training sample subset and a current fitting image corresponding to each round of training, and perform the following operations for each round of training: inputting the samples in the current training sample subset into the initial image classification model to obtain a first feature vector and a prediction label of each sample, wherein the first feature vector is a vector output by a first intermediate layer of the initial image classification model; performing feature extraction on the current fitting image through a second intermediate layer of the initial image classification model to obtain a second feature vector corresponding to the current fitting image; calculating a feature loss function value of the current round of training according to the first feature vector corresponding to the positive samples in the current training sample subset and the second feature vector corresponding to the current fitting image; calculating a cross-entropy loss function value of the current round of training according to the prediction label and the real label corresponding to each sample in the current training sample subset; and determining a total loss value of the current round of training based on the feature loss function value and the cross-entropy loss function value of the current round, performing gradient back-propagation training on the initial image classification model according to the total loss value, and stopping the training when the number of training rounds reaches a preset number or the total loss value converges to a preset convergence threshold, to obtain a target recognition model.
13. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor, the processor executing the computer-executable instructions to implement the method of any of claims 1 to 11.
14. A computer-readable storage medium having stored thereon computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method of any of claims 1 to 11.
CN202110242083.5A 2021-03-04 2021-03-04 Target recognition model training method and device and electronic equipment Active CN112990432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110242083.5A CN112990432B (en) 2021-03-04 2021-03-04 Target recognition model training method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112990432A (en) 2021-06-18
CN112990432B (en) 2023-10-27

Family

ID=76352849

Country Status (1)

Country Link
CN (1) CN112990432B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681774A (en) * 2018-05-11 2018-10-19 电子科技大学 Based on the human body target tracking method for generating confrontation network negative sample enhancing
WO2019184124A1 (en) * 2018-03-30 2019-10-03 平安科技(深圳)有限公司 Risk-control model training method, risk identification method and apparatus, and device and medium
CN111046959A (en) * 2019-12-12 2020-04-21 上海眼控科技股份有限公司 Model training method, device, equipment and storage medium
US20200285896A1 (en) * 2019-03-09 2020-09-10 Tongji University Method for person re-identification based on deep model with multi-loss fusion training strategy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王博威; 潘宗序; 胡玉新; 马闻: "SAR target recognition based on a Siamese CNN with few samples", 雷达科学与技术 (Radar Science and Technology), no. 06 *

Also Published As

Publication number Publication date
CN112990432B (en) 2023-10-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant