WO2023123847A1 - Model training method and apparatus, image processing method and apparatus, and device, storage medium and computer program product


Info

Publication number: WO2023123847A1
Authority: WO (WIPO, PCT)
Prior art keywords: model, sequence, target, image, predicted
Application number: PCT/CN2022/095298
Other languages: French (fr), Chinese (zh)
Inventors: 金国强, 杨帆, 孙明珊, 刘亚坤, 李韡, 暴天鹏, 吴立威
Original Assignee: 上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.)
Application filed by 上海商汤智能科技有限公司
Publication of WO2023123847A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74: Image or video pattern matching; proximity measures in feature spaces
    • G06V 10/764: Using classification, e.g. of video objects
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting

Definitions

  • The present disclosure relates to, but is not limited to, the field of artificial intelligence, and in particular to a model training method, an image processing method, and corresponding apparatuses, devices, storage media and computer program products.
  • Target detection is an important problem in the fields of computer vision and industrial inspection. Target detection uses algorithms to obtain the position and corresponding classification of targets of interest in an image. Compared with image classification, target detection is a prediction-intensive computer vision task: during the training of a target detection model, the labeling requirements are higher, so the labeling cost is also higher.
  • Embodiments of the present disclosure provide a model training method, an image processing method, and corresponding apparatuses, devices, storage media and computer program products.
  • An embodiment of the present disclosure provides a model training method, executed by a computer device. The method includes:
  • acquiring a first augmented image and a second augmented image obtained by respectively performing augmentation processing on a first image sample;
  • using a first model to be trained, performing target detection on the first augmented image to obtain at least one first detection result including a first predicted object sequence, and using a second model, performing target detection on the second augmented image to obtain at least one second detection result including a second predicted object sequence;
  • matching each first predicted object sequence with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship;
  • updating the model parameters of the first model at least once based on each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, to obtain the trained first model.
  • An embodiment of the present disclosure provides an image processing method, executed by a computer device. The method includes:
  • acquiring an image to be processed, and using a trained fourth model to perform target detection on the image to be processed to obtain a third detection result; the fourth model includes at least one of the following: the first model obtained by the above model training method, and the third model obtained by the above model training method.
  • An embodiment of the present disclosure provides a model training device, the device comprising:
  • the first acquisition part is configured to acquire a first augmented image and a second augmented image obtained by respectively augmenting the first image sample;
  • the first detection part is configured to use the first model to be trained to perform target detection on the first augmented image to obtain at least one first detection result including a first predicted object sequence, and to use the second model to perform target detection on the second augmented image to obtain at least one second detection result including a second predicted object sequence;
  • the first matching part is configured to match each first predicted object sequence with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship;
  • the first update part is configured to update the model parameters of the first model at least once based on each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, to obtain the trained first model.
  • An embodiment of the present disclosure provides an image processing device, including:
  • the third acquiring part is configured to acquire the image to be processed;
  • the second detection part is configured to use the trained fourth model to perform target detection on the image to be processed to obtain a third detection result; the fourth model includes at least one of the following: the first model obtained by the above model training method, and the third model obtained by the above model training method.
  • An embodiment of the present disclosure provides a computer device, including a memory and a processor, where the memory stores a computer program that can run on the processor, and when the processor executes the program, part or all of the steps of the above method are implemented.
  • An embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, part or all of the steps in the above method are implemented.
  • An embodiment of the present disclosure provides a computer program, including computer-readable code; when the computer-readable code is run in a computer device, a processor in the computer device executes part or all of the steps of the above method.
  • An embodiment of the present disclosure provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, part or all of the steps of the above method are implemented.
  • In the embodiments of the present disclosure, the first augmented image and the second augmented image obtained by respectively augmenting the first image sample are acquired; using the first model to be trained, target detection is performed on the first augmented image to obtain at least one first detection result including a first predicted object sequence, and using the second model, target detection is performed on the second augmented image to obtain at least one second detection result including a second predicted object sequence; each first predicted object sequence is matched with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship; and, based on each such pair, the model parameters of the first model are updated at least once to obtain the trained first model.
  • In this way, a sequence-level self-supervised training process for the target detection model can be realized, and the overall network structure of the target detection model can be trained, so that the performance of the entire target detection model can be effectively improved and the labeling cost of the training process can be reduced.
  • FIG. 1 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of an implementation flow of a model training method provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of an implementation flow of a model training method provided by an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of an implementation flow of a model training method provided by an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of an implementation flow of a model training method provided by an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of an implementation flow of an image processing method provided by an embodiment of the present disclosure.
  • FIG. 7A is a schematic diagram of an implementation process of model training based on a pre-training method provided by an embodiment of the present disclosure
  • FIG. 7B is a schematic diagram of an implementation architecture of a model training method provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of the composition and structure of a model training device provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of the composition and structure of an image processing device provided by an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of a hardware entity of a computer device provided by an embodiment of the present disclosure.
  • References to "some embodiments" describe a subset of all possible embodiments; it should be understood that "some embodiments" may refer to the same subset or different subsets of all possible embodiments, and these can be combined with each other without conflict.
  • The term "first/second/third" is only used to distinguish similar objects and does not imply a specific ordering of the objects. It should be understood that, where permitted, the specific order or sequence of "first/second/third" may be interchanged, so that the embodiments of the disclosure described herein can be implemented in an order other than that illustrated or described herein.
  • a self-supervised training algorithm can be used to help improve the performance of the target detection model by using unlabeled data.
  • However, the self-supervised training algorithms in the related art are mainly applied to image classification tasks and treat the entire image as a whole, which is not suitable for the prediction-intensive task of target detection. Moreover, the self-supervised training algorithms in the related art can usually only pre-train the parameters of some of the networks in the target detection model, for example, only the parameters of the backbone network, so the performance improvement of the final target detection model is limited.
  • An embodiment of the present disclosure provides a model training method, which can be executed by a processor of a computer device.
  • Here, computer device refers to a device with data processing capability, such as a server, notebook computer, tablet computer, desktop computer, smart TV, set-top box, or mobile device (such as a mobile phone, portable video player, personal digital assistant, dedicated messaging device, or portable game device).
  • Fig. 1 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 1, the method includes the following steps S101 to S104:
  • Step S101 acquiring a first augmented image and a second augmented image obtained by respectively performing augmentation processing on a first image sample.
  • the first image sample may be any suitable image containing at least one object.
  • The objects contained in the first image sample can be determined according to the actual application scene, and may include but are not limited to at least one of objects such as people, human body parts, animals, animal limbs, plants, flowers, leaves, stones, clouds, and fences.
  • Here, the augmentation processing performed on the first image sample may include but is not limited to at least one of random scaling, random cropping, random flipping, random resizing, color dithering, grayscale processing, Gaussian blurring, random erasing, and the like.
  • the first augmented image and the second augmented image may be obtained by performing different augmentation processes on the same first image sample, or may be obtained by performing the same augmentation process on the same first image sample.
  • those skilled in the art may use appropriate augmentation processing on the first image sample to obtain the first augmented image and the second augmented image according to actual conditions, which are not limited by the embodiments of the present disclosure.
  • Step S102 using the first model to be trained, perform target detection on the first augmented image to obtain at least one first detection result including a first predicted object sequence, and using the second model, perform target detection on the second augmented image to obtain at least one second detection result including a second predicted object sequence.
  • the first model can be any suitable model for object detection based on sequence characteristics, such as Vision Transformer (ViT), Transformer-based object detection model (Detection Transformer, DETR), deformable DETR, etc.
  • the first model can transform the target detection problem into a prediction problem of the feature sequence set, so as to output at least one first detection result including the first predicted object sequence.
  • the first prediction target sequence may be obtained after the first model performs sequence encoding and sequence decoding on the first augmented image.
  • Each first predicted object sequence may represent a predicted object in the first image sample.
  • those skilled in the art may use any suitable sequence encoding method and sequence decoding method to process the first augmented image according to the actual situation to obtain at least one first prediction object sequence, which is not limited in this embodiment of the present disclosure.
  • the first model may be a deformable DETR.
  • The first predicted object sequence in the first detection result may be the predicted object sequence output by the decoder in the transformer, or may be the mapped predicted object sequence obtained after mapping processing, such as dimension transformation, is performed on the predicted object sequence output by the decoder.
  • the first detection result may include a first predicted object sequence, a first object region and a first object category corresponding to the first predicted object sequence.
  • the first predicted object sequence may represent a predicted object, and the first object area and the first object category corresponding to the first predicted object sequence may respectively represent the predicted location area and predicted category of the predicted object.
  • the second model may have the same network structure as the first model, or may have a different network structure from the first model, which is not limited here.
  • The process of using the second model to perform target detection on the second augmented image corresponds to the process of using the first model to perform target detection on the first augmented image; for implementation, reference may be made to the latter process.
  • the second prediction target sequence may be obtained after the second model performs sequence encoding and sequence decoding on the second augmented image. Each second sequence of predictors may represent a predictor in the first image sample.
  • Correspondingly, the second predicted object sequence in the second detection result may be the predicted object sequence output by the decoder in the transformer, or may be the mapped predicted object sequence obtained after mapping processing, such as dimension transformation, is performed on the predicted object sequence output by the decoder.
  • the second detection result may include a second predicted object sequence, a second object region and a second object category corresponding to the second predicted object sequence.
  • the second predicted object sequence may represent a predicted object, and the second object area and the second object category corresponding to the second predicted object sequence may respectively represent the predicted location area and predicted category of the predicted object.
  • Step S103 matching each of the first predictor sequences and each of the second predictor sequences to obtain at least one pair of the first predictor sequence and the second predictor sequence having a target matching relationship.
  • first sequence of prediction objects and the second sequence of prediction objects having a target matching relationship may represent the same prediction object in the first image sample.
  • those skilled in the art may use any suitable matching method to match each first sequence of prediction objects with each second sequence of prediction objects according to actual conditions, which is not limited here.
  • During implementation, the output timing of each first predicted object sequence and of each second predicted object sequence can be determined, and a first predicted object sequence and a second predicted object sequence with the same output timing can be determined as a pair having the target matching relationship, so that at least one pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship can be obtained.
  • In some embodiments, bipartite graph matching can be used to match each first predicted object sequence with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship.
  • any suitable manner may be used to calculate the matching loss used in the bipartite graph matching process, which is not limited here.
  • In some implementations, the matching loss used in the bipartite graph matching process may be determined based on at least one of the following: the similarity between each pair of mutually matched first and second predicted object sequences; the intersection-over-union between the first object region and the second object region respectively corresponding to each pair of mutually matched first and second predicted object sequences; and the focal loss between the first object category and the second object category respectively corresponding to each pair of mutually matched first and second predicted object sequences.
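  • As an illustration only (not part of the original disclosure), the following sketch shows how such a bipartite matching could be implemented with the Hungarian algorithm via scipy.optimize.linear_sum_assignment, combining a sequence-similarity term with an intersection-over-union term; the function name, the weights, and the omission of the focal-loss term are assumptions, and a category term could be added to the cost analogously.

```python
# Illustrative only: bipartite matching of the two models' predictions with the
# Hungarian algorithm. Names, weights, and the cost terms kept are assumptions.
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment
from torchvision.ops import box_iou

def match_sequences(seq1, seq2, boxes1, boxes2, w_sim=1.0, w_iou=1.0):
    """seq1, seq2: [N, D] predicted object sequences from the two models;
    boxes1, boxes2: [N, 4] corresponding object regions as (x1, y1, x2, y2)."""
    # Pairwise cosine similarity between sequences; higher similarity = lower cost.
    sim = F.normalize(seq1, dim=-1) @ F.normalize(seq2, dim=-1).T   # [N, N]
    iou = box_iou(boxes1, boxes2)                                   # [N, N]
    cost = -(w_sim * sim + w_iou * iou)     # negate: the solver minimizes cost
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    # Each returned (i, j) pair is treated as having the target matching relationship.
    return list(zip(rows.tolist(), cols.tolist()))
```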
  • Step S104 based on each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, update the model parameters of the first model at least once to obtain the trained first model.
  • an appropriate parameter update algorithm is used to update the model parameters of the first model, and after the update, each pair of the first prediction object sequence and the second prediction object sequence with the target matching relationship is re-determined, Based on each re-determined pair of the first predictor sequence and the second predictor sequence having the target matching relationship, it is determined whether the model parameters of the first model need to be continuously updated. If it is determined that the model parameters of the first model do not need to be continuously updated, the finally updated first model is determined as the trained first model.
  • During implementation, a target loss value can be determined based on each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship; when the target loss value does not satisfy a preset condition, the model parameters of the first model are updated; when the target loss value satisfies the preset condition or the number of updates to the model parameters of the first model reaches a set threshold, updating is stopped and the finally updated first model is determined as the trained first model.
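  • To make steps S101 to S104 concrete, here is a minimal illustrative training-step sketch (not from the disclosure). It assumes the match_sequences helper sketched earlier, a sequence_similarity_loss helper like the one sketched later in this description, and models that return dicts with "sequences" and "boxes" entries.

```python
# Illustrative training step for S101-S104; all helper and key names are assumptions.
import torch

def train_step(model1, model2, optimizer, image, augment1, augment2):
    view1, view2 = augment1(image), augment2(image)      # S101: two augmented views
    det1 = model1(view1)                                 # S102: first detection results
    with torch.no_grad():                                # second model supplies targets only
        det2 = model2(view2)
    pairs = match_sequences(det1["sequences"], det2["sequences"],   # S103: matching
                            det1["boxes"], det2["boxes"])
    loss = sequence_similarity_loss(det1["sequences"], det2["sequences"], pairs)
    optimizer.zero_grad()
    loss.backward()                                      # S104: update the first model
    optimizer.step()
    return loss.item()
```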
  • In some embodiments, the first model includes a feature extraction network and a converter network; in the above step S102, using the first model to be trained to perform target detection on the first augmented image to obtain at least one first detection result including the first predicted object sequence includes the following steps S111 to S112:
  • Step S111 using the feature extraction network of the first model to perform feature extraction on the first augmented image to obtain image feature information.
  • the feature extraction network can be any suitable network capable of extracting image features, such as a convolutional neural network, a recurrent neural network, a converter-based feature extraction network, and the like.
  • those skilled in the art may use an appropriate feature extraction network in the first model according to actual conditions to obtain image feature information, which is not limited here.
  • Step S112 using the converter network of the first model to perform prediction processing on the image feature information to obtain at least one sequence of first prediction objects.
  • the converter network may include an encoder network and a decoder network.
  • those skilled in the art may use an appropriate converter network in the first model according to actual conditions to perform prediction processing on the image feature information, which is not limited here.
  • During implementation, the image feature information can be position-encoded and then input into the encoder network, and the encoder network performs feature encoding processing on the position-encoded image feature information to obtain at least one encoded feature sequence; using the decoder network, each encoded feature sequence is identified to obtain context identification information corresponding to at least one predicted object, and feature decoding processing is performed on each encoded feature sequence according to each piece of context identification information to obtain at least one first predicted object sequence.
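  • A minimal DETR-style sketch of such a converter network is shown below; it is an assumption-laden illustration, not the patent's exact architecture, with learned object queries standing in for the per-object context identification information.

```python
# Illustrative DETR-style converter network (a sketch, not the patent's design).
import torch
import torch.nn as nn

class ConverterNetwork(nn.Module):
    def __init__(self, d_model=256, num_queries=100, max_positions=10000):
        super().__init__()
        self.transformer = nn.Transformer(d_model=d_model, nhead=8,
                                          num_encoder_layers=6, num_decoder_layers=6)
        self.query_embed = nn.Embedding(num_queries, d_model)  # learned object queries
        self.pos_embed = nn.Parameter(torch.randn(max_positions, 1, d_model))

    def forward(self, features):
        # features: [S, B, d_model] flattened image feature information.
        src = features + self.pos_embed[: features.size(0)]    # position encoding
        tgt = self.query_embed.weight.unsqueeze(1).repeat(1, features.size(1), 1)
        # Output: [num_queries, B, d_model] first predicted object sequences.
        return self.transformer(src, tgt)
```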
  • In this way, the first model includes a feature extraction network and a converter network, so that, based on the sequence characteristics of the converter network, a sequence-level self-supervised training process can be realized for a target detection model based on the converter network, and the overall network structure of such a model can be trained; thus the performance of the entire target detection model can be effectively improved and the labeling cost of the training process can be reduced.
  • the first model further includes a first feed-forward neural network; the above step S112 may include the following steps S121 to S122:
  • Step S121 using the converter network of the first model to perform prediction processing on the image feature information to obtain at least one feature sequence
  • Step S122 using the first feed-forward neural network to map each feature sequence to a target dimension to obtain at least one first sequence of predicted objects.
  • the first feedforward neural network may be any suitable feedforward neural network capable of mapping the feature sequence to the target dimension, which is not limited here.
  • Target dimensions can be pre-set. During implementation, those skilled in the art can set appropriate target dimensions according to actual business scenarios.
  • For example, if the feature sequence output by the converter network is a 256-dimensional feature, the 256-dimensional feature sequence can be mapped to a 512-dimensional first predicted object sequence through the first feedforward neural network.
  • Correspondingly, in the second model, the feature sequence output by the converter network is mapped to the target dimension through a feedforward neural network to obtain the second predicted object sequence.
  • In this way, the detection performance of the first model can be improved by presetting an appropriate target dimension. For example, the detection accuracy of the first model can be improved by setting a higher target dimension; as another example, the detection efficiency of the first model can be improved by setting a lower target dimension.
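  • A minimal sketch of such a first feedforward neural network, assuming the 256-to-512 mapping of the example above; the hidden layer and activation are assumptions:

```python
# Illustrative first feed-forward neural network: 256-dim feature sequence to
# the 512-dim target dimension (layer layout is an assumption).
import torch.nn as nn

first_ffn = nn.Sequential(
    nn.Linear(256, 512),
    nn.ReLU(),
    nn.Linear(512, 512),   # output: first predicted object sequence, dim 512
)
```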
  • In some embodiments, the first detection result further includes a first object region and a first object category, and the first model further includes a second feedforward neural network and a third feedforward neural network; in the above step S102, using the first model to be trained to perform target detection on the first augmented image to obtain at least one first detection result including the first predicted object sequence further includes:
  • Step S131 for each feature sequence, using the second feedforward neural network to perform region prediction on the feature sequence to obtain a first object region, and using the third feedforward neural network to perform category prediction on the feature sequence to obtain a first object category.
  • the second feedforward neural network may be any suitable feedforward neural network capable of area prediction, which is not limited here.
  • the second feedforward neural network can be used to predict the position area of the predicted object represented by the feature sequence in the first augmented image, and the obtained first object area can be a detection frame of the predicted object.
  • the third feedforward neural network may be any suitable feedforward neural network capable of category prediction, which is not limited here.
  • the object category of the predicted object represented by the feature sequence can be predicted by using the third feedforward neural network to obtain the first object category.
  • During implementation, the number of outputs of the third feedforward neural network may be determined according to the number of object categories to be detected in the actual business scenario, which is not limited here.
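  • The following sketch illustrates possible second and third feedforward neural networks in the DETR style; the layer counts and the d_model and num_classes values are assumptions, not the patent's specification:

```python
# Illustrative second and third feed-forward networks (DETR-style heads).
import torch.nn as nn

d_model, num_classes = 256, 80   # assumed feature width and category count

# Second feed-forward network: region prediction as a normalized box.
box_head = nn.Sequential(
    nn.Linear(d_model, d_model), nn.ReLU(),
    nn.Linear(d_model, d_model), nn.ReLU(),
    nn.Linear(d_model, 4), nn.Sigmoid(),   # (cx, cy, w, h) in [0, 1]
)
# Third feed-forward network: category prediction; its output count follows the
# number of object categories to detect (one extra "no object" class assumed).
class_head = nn.Linear(d_model, num_classes + 1)
```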
  • the second model has the same network structure as the first model.
  • the process of performing target detection on the second augmented image by using the second model may refer to the process of performing target detection on the first augmented image by using the first model.
  • the above step S101 may include the following steps S141 to S142:
  • Step S141 performing first image augmentation processing on the first image sample to obtain a first augmented image
  • Step S142 performing a second image augmentation process on the first image sample to obtain a second augmented image.
  • the first image augmentation processing and the second image augmentation processing may adopt the same augmentation processing manner, or may adopt different augmentation processing manners, which are not limited here.
  • the first image augmentation process includes at least one of the following: color dithering, grayscale processing, Gaussian blur, and random erasure;
  • the second image augmentation process includes at least one of the following: random scaling, random cropping, random flipping, random resizing.
  • In the embodiments of the present disclosure, the first augmented image and the second augmented image are obtained by performing the first image augmentation processing and the second image augmentation processing, respectively, on the same first image sample.
  • Compared with the image disturbance caused by the random scaling, random cropping, random flipping and random resizing included in the second image augmentation process, the image disturbance caused by the color dithering, grayscale processing, Gaussian blur and random erasing included in the first image augmentation process is stronger. This makes the target detection task of the first model harder than that of the second model, thereby improving the learning ability of the trained first model and mitigating the model collapse that can occur when the first model and the second model have the same learning ability.
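  • An illustrative implementation of the two pipelines with torchvision is sketched below; every parameter value is an assumption:

```python
# Illustrative sketch of the two augmentation pipelines (values are assumptions).
import torchvision.transforms as T

# First image augmentation: stronger appearance disturbance for the first model.
first_augment = T.Compose([
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),
    T.ToTensor(),
    T.RandomErasing(p=0.5),          # operates on the tensor image
])
# Second image augmentation: geometric disturbance for the second model.
second_augment = T.Compose([
    T.RandomResizedCrop(size=800, scale=(0.5, 1.0)),  # random scaling + cropping
    T.RandomHorizontalFlip(p=0.5),                    # random flipping
    T.ToTensor(),
])
```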
  • FIG. 2 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 2, the method includes the following steps S201 to S206:
  • Step S201 acquiring a first augmented image and a second augmented image obtained by respectively performing augmentation processing on a first image sample.
  • Step S202 use the first model to be trained to perform object detection on the first augmented image, obtain at least one first detection result including the first predicted object sequence, and use the second model to perform object detection on the second augmented image Target detection is performed on the wide image, and at least one second detection result including the second predicted object sequence is obtained.
  • Step S203 matching each of the first predictor sequences and each of the second predictor sequences to obtain at least one pair of the first predictor sequence and the second predictor sequence having a target matching relationship.
  • the above steps S201 to S203 respectively correspond to the above steps S101 to S103, and the implementation of the above steps S101 to S103 can be referred to for implementation.
  • Step S204 based on the similarity between each pair of the first prediction object sequence and the second prediction object sequence having the target matching relationship, determine a target loss value.
  • any suitable similarity loss function can be used to determine the similarity loss between each pair of the first prediction object sequence and the second prediction object sequence with the target matching relationship, and based on each similarity loss, the target loss value can be determined .
  • The similarity loss function may include but is not limited to at least one of the absolute value loss function, the least squares error loss function, the cosine loss function, the BYOL (Bootstrap Your Own Latent) algorithm, the Momentum Contrast (MoCo) algorithm, and the like.
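  • For illustration, a BYOL-style negative-cosine similarity loss over matched sequence pairs might look as follows; the helper name and the averaging are assumptions:

```python
# Illustrative negative-cosine similarity loss over matched pairs (BYOL-style).
import torch
import torch.nn.functional as F

def sequence_similarity_loss(seq1, seq2, pairs):
    """seq1, seq2: [N, D] predicted object sequences; pairs: matched (i, j) indices."""
    loss = seq1.new_zeros(())
    for i, j in pairs:
        # 2 - 2*cos(a, b): zero when the matched sequences point the same way.
        loss = loss + 2.0 - 2.0 * F.cosine_similarity(seq1[i], seq2[j], dim=0)
    return loss / max(len(pairs), 1)
```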
  • Step S205 if the target loss value does not meet the preset condition, update the model parameters of the first model to obtain an updated first model.
  • the preset conditions may include, but are not limited to, the target loss value being smaller than a set loss value threshold, the change of the target loss value converging, and the like.
  • the preset conditions may be set according to actual conditions, which are not limited here.
  • the way to update the model parameters of the first model can be determined according to the actual situation, and can include but not limited to at least one of gradient descent method, momentum update method, Newton momentum method, etc., which is not limited here.
  • Step S206 based on the updated first model, determine the trained first model.
  • the updated first model may be determined as the trained first model.
  • the updated first model may be continuously updated, and the finally updated first model may be determined as the trained first model.
  • In the embodiments of the present disclosure, the target loss value is determined based on the similarity between each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship; when the target loss value does not satisfy the preset condition, the model parameters of the first model are updated to obtain an updated first model, and the trained first model is determined based on the updated first model. In this way, the model parameters of the first model can be updated at least once whenever the target loss value does not satisfy the preset condition. Because the target loss value is determined based on the similarity between each pair of matched predicted object sequences, the consistency of the predicted object sequences obtained by the trained first model and the second model for different augmented images of the same image sample can be improved, thereby improving the detection performance of the trained first model.
  • the above step S205 may include the following step S211:
  • Step S211 when the target loss value does not satisfy the preset condition, updating the model parameters of the first model and the model parameters of the second model respectively, to obtain the updated first model and the updated second model.
  • Here, both the model parameters of the first model and the model parameters of the second model may be updated when the target loss value does not satisfy the preset condition, so as to realize contrastive learning between the first model and the second model.
  • the way to update the model parameters of the second model can be determined according to the actual situation, and can include but not limited to at least one of gradient descent method, momentum update method, Newton momentum method, etc., which is not limited here.
  • the model parameter updating methods of the first model and the second model may be the same or different, which is not limited here.
  • step S206 may include the following step S212:
  • Step S212 Determine the trained first model based on the updated first model and the updated second model.
  • During implementation, after the first model and the second model are updated, a new target loss value can be determined, and whether to continue updating the updated first model is determined by judging whether the new target loss value satisfies the preset condition.
  • If the new target loss value satisfies the preset condition, it can be determined not to continue updating the updated first model, and the updated first model can be determined as the trained first model; if the new target loss value does not satisfy the preset condition, the updated first model can be updated further, and the finally updated first model can be determined as the trained first model.
  • In the embodiments of the present disclosure, the model parameters of the second model are also updated, so that the learning capabilities of the first model and the second model can be mutually enhanced, thereby improving the performance of the trained target detection model.
  • the above step S211 may include the following steps S221 to S222:
  • Step S221 based on the current model parameters of the first model, perform momentum update on the model parameters of the second model to obtain an updated second model.
  • the current model parameters of the first model and the current model parameters of the second model may be weighted and summed to obtain an updated second model.
  • During implementation, the following Formula (1) can be used to perform momentum update on the model parameters of the second model:
  • θ_{m+1} = k·θ_m + (1 - k)·θ_o    (1)
  • where θ_m and θ_o are the current model parameters of the second model and of the first model, respectively, θ_{m+1} denotes the model parameters of the updated second model, and k is the set momentum coefficient.
  • k may be a value greater than or equal to 0.9 and less than 1, for example, k is 0.995.
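  • A minimal sketch of this momentum update, assuming PyTorch models and k = 0.995 as in the example above:

```python
# Illustrative momentum (EMA) update of the second model per Formula (1);
# gradients flow only through the first model.
import torch

@torch.no_grad()
def momentum_update(first_model, second_model, k=0.995):
    for theta_o, theta_m in zip(first_model.parameters(), second_model.parameters()):
        theta_m.mul_(k).add_((1.0 - k) * theta_o)  # theta_{m+1} = k*theta_m + (1-k)*theta_o
```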
  • step S222 the current model parameters of the first model are updated in a gradient update manner to obtain an updated first model.
  • any suitable gradient update algorithm may be used to update the current model parameters of the first model, which is not limited in this embodiment of the present disclosure.
  • the gradient update algorithm may include, but is not limited to, at least one of batch gradient descent, stochastic gradient descent, and mini-batch gradient descent.
  • In the embodiments of the present disclosure, momentum update is performed on the model parameters of the second model to obtain the updated second model, and the current model parameters of the first model are updated by means of gradient update to obtain the updated first model. In this way, the first model and the second model are updated at different rates, which can mitigate model collapse and improve the performance of the trained target detection model.
  • the above step S212 may include the following steps S231 to S235:
  • Step S231 determining the first augmented image and the second augmented image obtained after augmenting the next first image sample as the current first augmented image and the current second augmented image, respectively.
  • The next first image sample may be the same image as the current first image sample, or a different image.
  • Step S232 using the currently updated first model to perform object detection on the current first augmented image, obtaining at least one first detection result including the first predicted object sequence, and using the currently updated second model, Object detection is performed on the current second augmented image to obtain at least one second detection result including a second predicted object sequence.
  • Step S233 matching each of the first predictor sequences and each of the second predictor sequences to obtain at least one pair of the first predictor sequence and the second predictor sequence having a target matching relationship.
  • Step S234 based on the similarity between each pair of the first prediction target sequence and the second prediction target sequence having a target matching relationship, determine the current target loss value.
  • steps S231 to S234 respectively correspond to the above steps S201 to S204, and for implementation, reference may be made to the implementation manners of the above steps S201 to S204.
  • Step S235 when the current target loss value satisfies the preset condition, or the number of times the model parameters of the first model have been updated reaches a number threshold, determining the currently updated first model as the trained first model.
  • the number of times threshold may be preset by the user according to the actual situation, or may be a default value.
  • step S212 may further include the following steps S241 to S242:
  • Step S241 when the current target loss value does not satisfy the preset condition, updating the model parameters of the first model and the model parameters of the second model respectively, to obtain the first model after the next update and the second model after the next update.
  • Step S242 based on the first model after the next update and the second model after the next update, determine the first model after training.
  • In this way, when the current target loss value does not satisfy the preset condition, the model parameters of the first model and of the second model can be updated again, and the trained first model can be determined based on the first model and the second model after the next update, so that the performance of the trained first model can be improved through continuous iterative updating.
  • FIG. 3 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 3, the method includes the following steps S301 to S310:
  • Step S301 acquiring a first augmented image and a second augmented image obtained by respectively performing augmentation processing on a first image sample.
  • Step S302 using the first model to be trained, perform target detection on the first augmented image to obtain at least one first detection result, and using the second model, perform target detection on the second augmented image to obtain at least one second detection result including a second predicted object sequence; the first detection result includes a first predicted object sequence and a first object region and a first object category corresponding to the first predicted object sequence.
  • Step S303 matching each of the first predictor sequences and each of the second predictor sequences to obtain at least one pair of the first predictor sequence and the second predictor sequence having a target matching relationship.
  • step S301 to step S303 respectively correspond to the above step S101 to step S103, and the implementation manner of the above step S101 to step S103 can be referred to for implementation.
  • Step S304 acquiring at least one candidate object in the first image sample, each of the candidate objects having a candidate object area and a candidate object category.
  • At least one candidate object in the first image sample may be randomly determined, or may be obtained by performing object detection on the first image sample through any suitable unsupervised algorithm, which is not limited here.
  • the unsupervised detection algorithm may include but not limited to at least one of a sliding window method, a candidate region algorithm, a selective search algorithm, and the like.
  • Here, the candidate object region of a candidate object is the predicted position region of the candidate object in the first image sample, and the candidate object category of the candidate object is its predicted category.
  • the candidate object category of the candidate object can be used as a pseudo-label of the candidate object region of the candidate object.
  • In some embodiments, the above step S304 may include: performing target detection on the first image sample in an unsupervised manner to obtain at least one predicted object region and a pseudo-label of each predicted object region, where the pseudo-label of a predicted object region is used to characterize the predicted object category of that region; and, for each predicted object region, using the predicted object region as a candidate object region and the pseudo-label of the predicted object region as a candidate object category, to obtain a candidate object.
  • any suitable unsupervised algorithm may be used to implement the unsupervised target detection on the first image sample. In this way, the labeling cost in the training process of the target detection model can be reduced.
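  • For illustration, unsupervised candidates could be generated with selective search as sketched below (requires opencv-contrib-python); assigning every region the same placeholder category is an assumption made for the sketch:

```python
# Illustrative unsupervised candidate generation with selective search.
import cv2

def generate_candidates(image_bgr, max_regions=100):
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image_bgr)
    ss.switchToSelectiveSearchFast()
    regions = ss.process()[:max_regions]   # candidate object regions, (x, y, w, h)
    # Each region's pseudo-label stands in for its candidate object category.
    return [{"region": tuple(map(int, r)), "category": "object"} for r in regions]
```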
  • Step S305 based on the first object region and the first object category corresponding to each first predicted object sequence, and the candidate object region and the candidate object category of each candidate object, matching each first predicted object sequence with each candidate object to obtain at least one pair of a first predicted object sequence and a candidate object having a target matching relationship.
  • the first predicted object sequence and the candidate object having a target matching relationship may represent the same predicted object in the first image sample.
  • those skilled in the art may use any suitable matching manner to match each first predicted object sequence with each candidate object according to actual conditions, which is not limited here.
  • bipartite graph matching may be used to match each first predicted object sequence and each candidate object to obtain at least one pair of the first predicted object sequence and the candidate object having a target matching relationship.
  • any suitable manner may be used to calculate the matching loss used in the bipartite graph matching process, which is not limited here.
  • In some implementations, the matching loss used in the bipartite graph matching process may be determined based on at least one of the following: the intersection-over-union between the first object region and the candidate object region respectively corresponding to each pair of mutually matched first predicted object sequence and candidate object; and the focal loss between the first object category and the candidate object category respectively corresponding to each pair of mutually matched first predicted object sequence and candidate object.
  • Step S306 based on the similarity between each pair of the first prediction object sequence and the second prediction object sequence having the target matching relationship, determine a first loss value.
  • any suitable similarity loss function may be used to determine the first loss value between each pair of the first prediction object sequence and the second prediction object sequence having the target matching relationship, which is not limited in this embodiment of the present disclosure.
  • the similarity loss between each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship may be determined, and each similarity loss may be accumulated to obtain the first loss value.
  • In some implementations, the first loss value may be determined as shown in the following Formula (2):
  • L_1 = Σ_{i=1}^{N} ℓ_sim(s_i, s̃_i)    (2)
  • where N is the number of pairs of the first predicted object sequence and the second predicted object sequence having the target matching relationship, and N is a positive integer; s_i is the i-th first predicted object sequence, s̃_i is the second predicted object sequence having a target matching relationship with s_i, and ℓ_sim(s_i, s̃_i) is the similarity loss between them.
  • Step S307 based on each pair of the first predicted object sequence and the candidate object having the target matching relationship, determine a second loss value.
  • any suitable loss function may be used to determine the second loss value between each pair of the first predicted object sequence and the candidate object having the target matching relationship, which is not limited in this embodiment of the present disclosure.
  • the loss function may include but not limited to at least one of a similarity loss function, a focus loss function, an intersection loss function, a generalized intersection loss function, and the like.
  • Step S308 Determine a target loss value based on the first loss value and the second loss value.
  • the target loss value may be determined based on the first loss value and the second loss value in an appropriate manner according to actual conditions, which is not limited in this embodiment of the present disclosure.
  • For example, the sum of the first loss value and the second loss value can be determined as the target loss value, the average of the first loss value and the second loss value can be determined as the target loss value, or the first loss value and the second loss value can be weighted and summed with different weights to obtain the target loss value.
  • Step S309 if the target loss value does not meet the preset condition, update the model parameters of the first model to obtain an updated first model.
  • Step S310 based on the updated first model, determine the trained first model.
  • steps S309 to S310 correspond to the above-mentioned steps S205 to S206 respectively, and for implementation, reference may be made to the implementation manners of the above-mentioned steps S205 to S206.
  • step S304 may be performed before step S301, step S304 may be performed after step S306, and step S307 may be performed after step S302 and before step S303; this is not limited in this embodiment of the present disclosure.
  • In the embodiments of the present disclosure, the first loss value is determined based on the similarity between each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, the second loss value is determined based on each pair of the first predicted object sequence and the candidate object having the target matching relationship, and the target loss value is determined based on the first loss value and the second loss value. Since the candidate object category of each candidate object can be used as a pseudo-label for the candidate object region of that candidate object, the second loss value determined based on each pair of the first predicted object sequence and the candidate object having the target matching relationship can provide objective supervision for the object localization ability of the first model, thereby improving the object localization ability of the trained first model and further improving its detection accuracy.
  • the above step S307 may include the following steps S321 to S322:
  • Step S321 for each pair of the first predicted object sequence and the candidate object having the target matching relationship, determining a first sub-loss value based on the first object region corresponding to the first predicted object sequence and the candidate object region of the candidate object, and determining a second sub-loss value based on the first object category corresponding to the first predicted object sequence and the candidate object category of the candidate object.
  • any suitable loss function can be used to determine the first sub-loss value between the first object region and the candidate object region, and the second sub-loss value between the first object category and the candidate object category.
  • For example, an intersection-over-union loss function or a generalized intersection-over-union loss function can be used to determine the first sub-loss value between the first object region and the candidate object region, and a focal loss function can be used to determine the second sub-loss value between the first object category and the candidate object category.
  • Step S322 Determine a second loss value based on each of the first sub-loss values and each of the second sub-loss values.
  • the second loss value may be determined based on the first sub-loss value and the second sub-loss value in an appropriate manner according to actual conditions, which is not limited in this embodiment of the present disclosure.
  • For example, the sum of each first sub-loss value and each second sub-loss value may be determined as the second loss value, the average of these sub-loss values may be determined as the second loss value, or the first sub-loss values and the second sub-loss values may be weighted and summed with different weights to obtain the second loss value.
  • In some implementations, the target loss value can be obtained by weighted summation of each first sub-loss value, each second sub-loss value, and the similarity loss between each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship.
  • In some implementations, the target loss value can be determined as shown in the following Formula (3):
  • L = Σ_{i=1}^{N} [ λ_sim·ℓ_sim(s_i, s̃_i) + λ_cls·ℓ_focal(c_i, ĉ_i) + λ_box·ℓ_giou(b_i, b̂_i) ]    (3)
  • where N is the number of pairs of the first predicted object sequence and the second predicted object sequence having the target matching relationship, and N is a positive integer; s_i is the first predicted object sequence, s̃_i is the second predicted object sequence having a target matching relationship with s_i, and ℓ_sim(s_i, s̃_i) is the similarity loss between them; c_i is the first object category corresponding to the first predicted object sequence s_i, ĉ_i is the candidate object category of the candidate object having a target matching relationship with s_i, and ℓ_focal(c_i, ĉ_i) is the focal loss between the first object category c_i and the candidate object category ĉ_i; b_i is the first object region corresponding to the first predicted object sequence s_i, b̂_i is the candidate object region of the candidate object having a target matching relationship with s_i, and ℓ_giou(b_i, b̂_i) is the loss between the first object region b_i and the candidate object region b̂_i calculated using the generalized intersection-over-union loss function; λ_sim, λ_cls and λ_box are weighting coefficients.
  • In the embodiments of the present disclosure, for each pair of the first predicted object sequence and the candidate object having the target matching relationship, a first sub-loss value is determined based on the first object region corresponding to the first predicted object sequence and the candidate object region of the candidate object, and a second sub-loss value is determined based on the first object category corresponding to the first predicted object sequence and the candidate object category of the candidate object; the second loss value is then determined based on each first sub-loss value and each second sub-loss value.
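  • For illustration, a Formula (3)-style target loss could be computed as sketched below using torchvision's focal and generalized-IoU losses; the helper names, weights and box format are assumptions:

```python
# Illustrative Formula (3)-style target loss over matched pairs.
import torch
import torch.nn.functional as F
from torchvision.ops import sigmoid_focal_loss, generalized_box_iou_loss

def target_loss(seq1, seq2, cls_logits, cls_pseudo, boxes_pred, boxes_cand,
                w_sim=1.0, w_cls=1.0, w_box=1.0):
    """Tensors are aligned along dim 0 by their target matching relationships:
    seq1, seq2: [N, D]; cls_logits, cls_pseudo (one-hot): [N, C];
    boxes_pred, boxes_cand: [N, 4] as (x1, y1, x2, y2)."""
    l_sim = (2.0 - 2.0 * F.cosine_similarity(seq1, seq2, dim=-1)).sum()   # similarity
    l_cls = sigmoid_focal_loss(cls_logits, cls_pseudo, reduction="sum")   # focal
    l_box = generalized_box_iou_loss(boxes_pred, boxes_cand, reduction="sum")  # GIoU
    return w_sim * l_sim + w_cls * l_cls + w_box * l_box
```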
  • FIG. 4 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 4, the method includes the following steps S401 to S404:
  • Step S401 acquiring a first augmented image and a second augmented image obtained by respectively performing augmentation processing on a first image sample.
  • Step S402 using the first model to be trained, perform target detection on the first augmented image to obtain at least one first detection result, and using the second model, perform target detection on the second augmented image to obtain at least one second detection result; the first detection result includes a first predicted object sequence and a first object region and a first object category corresponding to the first predicted object sequence; the second detection result includes a second predicted object sequence and a second object region and a second object category corresponding to the second predicted object sequence.
  • step S401 to step S402 respectively correspond to the above step S101 to step S102, and the implementation manner of the above step S101 to step S102 can be referred to for implementation.
  • the second object area may be obtained by predicting the position area of the predicted object represented by the second predicted object sequence in the second augmented image, and may be a detection frame of the predicted object.
  • the second object category may be obtained by predicting the object category of the predicted object represented by the second sequence of predicted objects.
  • Step S403 based on the first object region and the first object category corresponding to each of the first predicted object sequences, and the second object region and the second object category corresponding to each of the second predicted object sequences, for each The first predictor sequence and each of the second predictor sequences perform bipartite graph matching to obtain at least one pair of the first predictor sequence and the second predictor sequence having a target matching relationship.
  • During implementation, any suitable bipartite graph matching algorithm can be used to match each first predicted object sequence with each second predicted object sequence, to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship.
  • the bipartite graph matching algorithm used may include, but is not limited to, at least one of the Hungarian matching algorithm, the maximum-flow matching algorithm, and the like.
  • any suitable manner may be used to calculate the matching loss used in the bipartite graph matching process, which is not limited here.
  • the matching loss used in the bipartite graph matching process may be determined based on at least one of the following: the similarity between the first predicted object sequence and the second predicted object sequence in each matched pair, the intersection-over-union between the first object region and the second object region corresponding to each matched pair, and the focal loss between the first object category and the second object category corresponding to each matched pair, etc.; a sketch of such a matching procedure follows.
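  • As a hedged illustration of this matching step, the sketch below builds a pairwise cost matrix from the three quantities just listed and minimizes it with the Hungarian algorithm; the weights and matrix names are assumptions, not taken from the present disclosure:

```python
from scipy.optimize import linear_sum_assignment

def bipartite_match(sim, iou, cls_cost, w_sim=1.0, w_iou=1.0, w_cls=1.0):
    # sim, iou, cls_cost: (N1, N2) matrices holding, for every candidate pair
    # of a first and a second predicted object sequence, the sequence
    # similarity, the box intersection-over-union and the focal class cost.
    # Higher similarity / IoU should lower the cost, hence the minus signs.
    cost = -w_sim * sim - w_iou * iou + w_cls * cls_cost
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return list(zip(rows.tolist(), cols.tolist()))
```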
  • Step S404 based on each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, update the model parameters of the first model at least once to obtain the trained first model.
  • the above step S404 corresponds to the above step S104, and may be implemented with reference to the implementation manner of step S104.
  • the above step S403 may include the following steps S411 to S413:
  • Step S411, based on each of the first predicted object sequences and each of the second predicted object sequences, determine at least one candidate sequence pair set; each candidate sequence pair set includes at least one pair of the first predicted object sequence and the second predicted object sequence having a candidate matching relationship.
  • any suitable manner may be used to perform one-to-one matching on each first predictor sequence and each second predictor sequence to obtain at least one candidate sequence pair set, which is not limited in this embodiment of the present disclosure.
  • at least one random match may be performed on each first predictor sequence and each second predictor sequence to obtain at least one candidate sequence pair set.
  • Step S412, for each candidate sequence pair set, determine the matching loss of the candidate sequence pair set based on the first object region and the first object category corresponding to the first predicted object sequence, and the second object region and the second object category corresponding to the second predicted object sequence, in each pair of the first predicted object sequence and the second predicted object sequence having a candidate matching relationship in the candidate sequence pair set.
  • any suitable manner may be used to calculate the matching loss of the set of candidate sequence pairs.
  • the focal loss between the first object category and the second object category respectively corresponding to each pair of mutually matched first predicted object sequence and second predicted object sequence may be used to determine the matching loss of the candidate sequence pair set.
  • the matching loss of the candidate sequence pair set can be calculated in the manner shown in the following Formula (4):

    \(\mathcal{L}_{\mathrm{match}} = \sum_{i=1}^{N} \left[ \mathcal{L}_{\mathrm{sim}}\left(s_i, \hat{s}_i\right) + \mathcal{L}_{\mathrm{iou}}\left(b_i, \hat{b}_i\right) + \mathcal{L}_{\mathrm{focal}}\left(c_i, \hat{c}_i\right) \right]\)  (4)

  • where N is the number of pairs of the first predicted object sequence and the second predicted object sequence that match each other in the candidate sequence pair set, and N is a positive integer; \(\mathcal{L}_{\mathrm{match}}\) denotes the Hungarian matching loss over the at least one pair of mutually matched first and second predicted object sequences in the set; bi is the first object region corresponding to the first predicted object sequence in the i-th pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, and \(\hat{s}_i\), \(\hat{b}_i\), \(\hat{c}_i\) denote the second predicted object sequence, second object region and second object category of that pair, while si and ci denote the first predicted object sequence and the first object category; the three terms are the similarity loss, the intersection-over-union loss between object regions, and the focal loss between object categories described above.
  • Step S413, determining each pair of the first predicted object sequence and the second predicted object sequence having a candidate matching relationship in the candidate sequence pair set with the smallest matching loss among the at least one candidate sequence pair set as the at least one pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship.
  • matching each first predicted object sequence and each second predicted object sequence by bipartite graph matching can improve the accuracy of the determined target matching relationships between the at least one pair of first predicted object sequences and second predicted object sequences, thereby improving the detection accuracy of the trained first model.
  • FIG. 5 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 5, the method includes the following steps S501 to S506:
  • Step S501 acquiring a first augmented image and a second augmented image obtained by respectively performing augmentation processing on a first image sample.
  • Step S502, use the first model to be trained to perform target detection on the first augmented image to obtain at least one first detection result including the first predicted object sequence, and use the second model to perform target detection on the second augmented image to obtain at least one second detection result including the second predicted object sequence.
  • Step S503 matching each of the first predictor sequences and each of the second predictor sequences to obtain at least one pair of the first predictor sequence and the second predictor sequence having a target matching relationship.
  • Step S504 based on each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, update the model parameters of the first model at least once to obtain the trained first model.
  • the above-mentioned steps S501 to S504 correspond to the above-mentioned steps S101 to S104 respectively, and for implementation, reference may be made to the implementation manners of the above-mentioned steps S101 to S104.
  • Step S505 based on the trained first model, determine an initial third model.
  • the feedforward neural network in the trained first model may be adjusted according to an actual target detection scenario, and the adjusted first model may be determined as the initial third model.
  • the first model includes a feature extraction network, a converter network, and a first feedforward neural network, a second feedforward neural network and a third feedforward neural network connected to the converter network; the first feedforward neural network, the second feedforward neural network and the third feedforward neural network are respectively used to output the first predicted object sequence, the first object region corresponding to the first predicted object sequence, and the first object category corresponding to the first predicted object sequence;
  • the adjusted first model is determined as the initial third model.
  • Step S506 based on at least one second image sample, update the model parameters of the third model to obtain the trained third model.
  • the second image sample may have label information or may not have label information.
  • those skilled in the art may determine an appropriate second image sample according to an actual target detection scene, which is not limited here.
  • the model parameters of the third model may be fine-tuned and trained based on at least one second image sample to obtain the trained third model.
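  • As a hedged sketch of steps S505 to S506, the code below derives an initial third model from the trained first model by replacing the category head and then fine-tunes it on labeled second image samples; the attribute name cls_head, the detection_loss helper and all hyper-parameters are illustrative assumptions, not part of the present disclosure:

```python
import torch
import torch.nn as nn

def build_and_finetune_third_model(trained_first_model, second_sample_loader,
                                   detection_loss, num_task_classes=20):
    # Derive the initial third model: replace the category head so that it
    # matches the new label space; other layers keep pre-trained parameters.
    model = trained_first_model
    hidden_dim = model.cls_head.in_features          # assumed attribute name
    model.cls_head = nn.Linear(hidden_dim, num_task_classes + 1)  # +1 background

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # small fine-tuning LR
    for images, targets in second_sample_loader:     # second image samples
        loss = detection_loss(model(images), targets)  # supervised task loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model  # trained third model
```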
  • an initial third model is determined based on the trained first model, and model parameters of the third model are updated based on at least one second image sample to obtain a trained third model.
  • the model parameters of the trained first model can be transferred to other target detection models to be applied to various target detection scenarios, which can improve the training efficiency of the third model and the detection accuracy of the trained third model.
  • FIG. 6 is a schematic diagram of an implementation flow of an image processing method provided by an embodiment of the present disclosure. As shown in FIG. 6, the method includes the following steps S601 to S602:
  • Step S601 acquiring an image to be processed
  • Step S602, using the trained fourth model to perform target detection on the image to be processed to obtain a third detection result; wherein the fourth model includes at least one of the following: the first model obtained by using the model training method described in the above embodiments, and the third model obtained by using the model training method described in the above embodiments.
  • the image to be processed can be any suitable image to be detected.
  • those skilled in the art can select an appropriate image to be processed according to the actual application scenario, which is not limited by the embodiments of the present disclosure.
  • by maintaining the consistency between the first predicted object sequence and the second predicted object sequence obtained after the first model and the second model respectively process the first augmented image and the second augmented image of the same image sample, a sequence-level self-supervised training process of the target detection model is realized, and the overall network structure of the target detection model can be trained, so that the performance of the entire target detection model can be effectively improved. Therefore, performing target detection on the image to be processed based on at least one of the first model and the third model obtained by the model training method described in the above embodiments can improve the accuracy of target detection.
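  • A minimal inference sketch for this image processing method follows; the output keys and the 0.5 confidence threshold are illustrative assumptions for a DETR-style detector:

```python
import torch

def detect(model, image_to_process, score_threshold=0.5):
    # model: the fourth model (the trained first model or third model).
    model.eval()
    with torch.no_grad():
        outputs = model(image_to_process.unsqueeze(0))  # add a batch dimension
    # Assumed output keys; low-confidence queries are discarded by thresholding.
    keep = outputs["scores"] > score_threshold
    return outputs["boxes"][keep], outputs["labels"][keep]
```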
  • An embodiment of the present disclosure provides a pre-training method for a self-supervised target detection model based on Transformer sequence consistency. The method can be executed by a processor of a computer device; it can use unlabeled data to train the overall network structure of the target detection model and, based on the sequence characteristics of the Transformer, can simultaneously realize the self-supervised representation learning process for both the object region regression and the object category in target detection.
  • FIG. 7A is a schematic diagram of an implementation process of model training based on a pre-training method provided by an embodiment of the present disclosure. As shown in FIG. 7A, the method may include the following steps S701 to S703:
  • Step S701, acquire at least one candidate object in the first image sample in an unsupervised manner, where each candidate object has a candidate object area and a candidate object category.
  • any suitable unsupervised detection algorithm may be used to detect the target object in the first image sample to obtain at least one candidate object.
  • a selective search algorithm may be employed to unsupervisedly obtain at least one candidate object with a high recall rate from the first image sample.
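  • A hedged sketch of such candidate generation with OpenCV's selective-search implementation (from opencv-contrib) follows; the mode and the number of kept regions are assumptions:

```python
import cv2  # requires opencv-contrib-python

def candidate_objects(image_bgr, max_regions=1000):
    # Selective search yields class-agnostic, high-recall region proposals.
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image_bgr)        # first image sample as a BGR array
    ss.switchToSelectiveSearchFast()  # fast mode trades quality for recall/speed
    rects = ss.process()              # candidate object regions as (x, y, w, h)
    return rects[:max_regions]        # keep a high-recall subset as candidates
```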
  • Step S702 using the pre-training method of the self-supervised target detection model based on Transformer sequence consistency to pre-train the first model.
  • the model training architecture as shown in FIG. 7B can be used to realize the pre-training method of the self-supervised target detection model based on Transformer sequence consistency.
  • the model training architecture includes the first model 10 and the second model 20, where the first model 10 and the second model 20 have the same network structure: each includes a convolutional neural network (Convolutional Neural Network, CNN) 11 or 21, a Transformer encoder 12 or 22, a Transformer decoder 13 or 23, and a feedforward neural network (Feed-Forward Network, FFN) 14 or 24; the feedforward neural network may comprise the first feedforward neural network, the second feedforward neural network and the third feedforward neural network;
  • the inputs of the first model 10 and the second model 20 are respectively the first augmented image and the second augmented image obtained after augmenting the first image sample 30, where the perturbation applied to the first augmented image input to the first model 10 contains more color-level perturbations.
  • the processes by which the first model 10 and the second model 20 perform target detection on the first augmented image and the second augmented image respectively are the same. Taking the first model 10 performing target detection on the first augmented image as an example: after the convolutional neural network 11 extracts the features of the first augmented image, a position code 40 is added to the extracted features, and the Transformer encoder 12 and the Transformer decoder 13 process the features with the position code added, so that at least one feature sequence 31 representing a predicted object can be obtained.
  • The first feedforward neural network, the second feedforward neural network and the third feedforward neural network then process each feature sequence 31; for each feature sequence 31, the first predicted object sequence Prj1 output by the first feedforward neural network, the first object region Bx1 corresponding to the first predicted object sequence output by the second feedforward neural network, and the first object category Cls1 corresponding to the first predicted object sequence output by the third feedforward neural network can be obtained. Correspondingly, after the second augmented image is processed by the second model 20, the feature sequence 32, the second predicted object sequence Prj2, the second object region Bx2 corresponding to the second predicted object sequence, and the second object category Cls2 corresponding to the second predicted object sequence can be obtained.
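  • To make this data flow concrete, the following is a minimal DETR-style sketch of one such model; the layer sizes, query count and class count are illustrative assumptions, not fixed by the present disclosure:

```python
import torch
import torch.nn as nn
import torchvision

class SequenceDetector(nn.Module):
    # Minimal CNN + Transformer encoder/decoder + three FFN heads,
    # mirroring the structure of the first model 10 in Fig. 7B.
    def __init__(self, hidden_dim=256, num_queries=100, num_classes=91):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # feature map
        self.proj = nn.Conv2d(2048, hidden_dim, kernel_size=1)
        self.transformer = nn.Transformer(hidden_dim, batch_first=True)
        self.queries = nn.Embedding(num_queries, hidden_dim)
        # Learned position code (assumes H*W <= 2500 feature tokens).
        self.pos_code = nn.Parameter(torch.randn(1, 2500, hidden_dim))
        self.seq_head = nn.Linear(hidden_dim, hidden_dim)   # first FFN: object sequence
        self.box_head = nn.Linear(hidden_dim, 4)            # second FFN: object region
        self.cls_head = nn.Linear(hidden_dim, num_classes)  # third FFN: object category

    def forward(self, images):
        feats = self.proj(self.cnn(images))                 # (B, C, H, W)
        tokens = feats.flatten(2).transpose(1, 2)           # (B, H*W, C)
        tokens = tokens + self.pos_code[:, : tokens.size(1)]
        queries = self.queries.weight.unsqueeze(0).expand(images.size(0), -1, -1)
        hs = self.transformer(tokens, queries)              # encode + decode
        return self.seq_head(hs), self.box_head(hs).sigmoid(), self.cls_head(hs)
```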
  • at least one first predicted object sequence Prj1 and at least one second predicted object sequence Prj2 can be matched using a bipartite graph matching algorithm to obtain at least one pair of the first predicted object sequence and the second predicted object sequence having a target matching relationship (such as the first predicted object sequence corresponding to the first object region Bx1-1 and the second predicted object sequence corresponding to the second object region Bx2-1, and so on), and an absolute value loss function may be used to calculate the similarity loss between each matched pair.
  • based on the similarity losses, the target loss value can be determined, and based on the target loss value, the network parameters of the first model 10 and the network parameters of the second model 20 are updated, so as to improve the consistency of the Transformer feature sequences of augmented images obtained by applying different augmentation processes to the same image sample;
  • the bipartite graph matching algorithm is a set-based matching method
  • the input of the bipartite graph matching algorithm is at least one first prediction object sequence and at least one second prediction object sequence respectively output by the first model 10 and the second model 20, And the confidence degree of the first object region and the first object category corresponding to each first predicted object sequence, and the confidence degree of the second object region and the second object category corresponding to each second predicted object sequence.
  • the bipartite graph matching algorithm can find better sequence matching pairs (that is, the first predicted object sequence and the second predicted object sequence having the target matching relationship), which brings more beneficial information to the self-supervised learning of the first model and ultimately improves the efficiency and accuracy of the self-supervised learning.
  • the target loss value considered in the process of updating the network parameters of the first model 10 and the network parameters of the second model 20 may also include the difference between the first object region corresponding to each of the at least one first predicted object sequence output by the first model and the candidate object region of the at least one candidate object, and the difference between the first object category corresponding to each first predicted object sequence and the candidate object category of each candidate object.
  • the bipartite graph matching algorithm can be used to match the first object region and the first object category corresponding to each first predicted object sequence with the candidate object region and the candidate object category of each candidate object; the generalized intersection-over-union loss function is then used to determine the first sub-loss value between the first object region and the candidate object region corresponding to each pair of the first predicted object sequence and the candidate object having the target matching relationship, and the focal loss function is used to determine the second sub-loss value between the first object category and the candidate object category corresponding to each such pair. Based on each first sub-loss value, each second sub-loss value and the similarity loss between each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, the target loss value may be determined; a sketch follows.
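  • The following hedged sketch assembles such a target loss, reusing the focal_loss and giou_loss sketches given earlier; the equal weighting of the three terms is an assumption:

```python
def target_loss(prj1, prj2, first_boxes, cand_boxes, first_logits, cand_labels):
    # prj1 / prj2: matched first / second predicted object sequences;
    # the remaining arguments are predictions matched to candidate objects.
    loss_sim = (prj1 - prj2).abs().mean()                  # similarity loss (L1)
    loss_region = giou_loss(first_boxes, cand_boxes)       # first sub-loss values
    loss_category = focal_loss(first_logits, cand_labels)  # second sub-loss values
    return loss_sim + loss_region + loss_category
```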
  • Step S703 migrating the pre-trained first model to the target detection task.
  • during migration, the first feedforward neural network in the trained first model can be removed, the number of output categories of the third feedforward neural network in the first model can be adjusted according to the actual target detection task, and the adjusted first model can be determined as the initial third model; the model parameters of the third model are then fine-tuned to obtain a third model that can be used for the target detection task.
  • FIG. 8 is a schematic diagram of the composition and structure of a model training device provided by an embodiment of the present disclosure.
  • the model training device 800 includes: a first acquiring part 810, a first detecting part 820, a first matching part 830 and a first updating part 840, wherein: the first acquiring part 810 is configured to acquire the first augmented image and the second augmented image obtained after augmenting the first image sample respectively; the first detecting part 820 is configured to use the first model to be trained to perform target detection on the first augmented image to obtain at least one first detection result including the first predicted object sequence, and to use the second model to perform target detection on the second augmented image to obtain at least one second detection result including the second predicted object sequence; the first matching part 830 is configured to match each first predicted object sequence with each second predicted object sequence to obtain at least one pair of the first predicted object sequence and the second predicted object sequence having a target matching relationship; the first updating part 840 is configured to update the model parameters of the first model at least once based on each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, to obtain the trained first model.
  • the first updating part is further configured to: determine the target loss value based on the similarity between each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship; when the target loss value does not satisfy the preset condition, update the model parameters of the first model to obtain an updated first model; and determine the trained first model based on the updated first model.
  • the first updating part is further configured to: update the model parameters of the first model and the model parameters of the second model respectively when the target loss value does not meet the preset condition, and obtain the updated The first model and the updated second model; based on the updated first model and the updated second model, the trained first model is determined.
  • the first updating part is further configured to: perform a momentum update on the model parameters of the second model based on the current model parameters of the first model to obtain the updated second model; and update the current model parameters of the first model to obtain the updated first model.
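  • A minimal sketch of such a momentum update (an exponential moving average; the coefficient 0.999 is an illustrative choice) could look as follows:

```python
import torch

@torch.no_grad()
def momentum_update(first_model, second_model, m=0.999):
    # The second model's parameters track an exponential moving average
    # of the first model's parameters.
    for p1, p2 in zip(first_model.parameters(), second_model.parameters()):
        p2.mul_(m).add_(p1, alpha=1.0 - m)
```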
  • the first updating part is further configured to: determine the first augmented image and the second augmented image obtained after augmenting the next first image sample as the current first augmented image and the current second augmented image respectively; use the currently updated first model to perform target detection on the current first augmented image to obtain at least one first detection result including the first predicted object sequence, and use the currently updated second model to perform target detection on the current second augmented image to obtain at least one second detection result including the second predicted object sequence; match each first predicted object sequence with each second predicted object sequence to obtain at least one pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship; determine the current target loss value based on the similarity between each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship; and, when the current target loss value satisfies the preset condition or the number of times the model parameters of the first model have been updated reaches a number threshold, determine the currently updated first model as the trained first model.
  • the first updating part is further configured to: when the current target loss value does not meet the preset condition, respectively update the model parameters of the first model and the model parameters of the second model for the next time , to obtain the first model after the next update and the second model after the next update; based on the first model after the next update and the second model after the next update, determine the first model after training.
  • the first detection result further includes a first object area and a first object category corresponding to the first predicted object sequence in the first detection result;
  • the device further includes: a second acquisition part, configured to acquire at least one candidate object in the first image sample, each candidate object having a candidate object area and a candidate object category;
  • a second matching part, configured to match each first predicted object sequence and each candidate object based on the first object area and the first object category corresponding to each first predicted object sequence, and the candidate object area and candidate object category of each candidate object, to obtain at least one pair of the first predicted object sequence and the candidate object having a target matching relationship;
  • the first updating part is further configured to: determine a first loss value based on the similarity between each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship; determine a second loss value based on each pair of the first predicted object sequence and the candidate object having the target matching relationship; and determine the target loss value based on the first loss value and the second loss value.
  • the first update part is further configured to: for each pair of the first predicted object sequence and the candidate object having the target matching relationship, based on the first object region corresponding to the first predicted object sequence and the candidate object The candidate object area, determine a first sub-loss value, and determine a second sub-loss value based on the first object category corresponding to the first predicted object sequence and the candidate object category of the candidate object; based on each first sub-loss The loss value and each second sub-loss value determine the second loss value.
  • the second acquisition part is further configured to: use an unsupervised method to perform object detection on the first image sample to obtain at least one predicted object region and a pseudo-label of each predicted object region; each predicted object region The pseudo-label of is used to represent the prediction object category of the prediction object region; for each prediction object region, the prediction object region is used as a candidate object region, and the pseudo-label of the prediction object region is used as a candidate object category to obtain a candidate object.
  • the first detection result further includes the first object region and the first object category corresponding to the first predicted object sequence in the first detection result, and the second detection result further includes the second object region and the second object category corresponding to the second predicted object sequence in the second detection result; the first matching part is further configured to: perform bipartite graph matching on each first predicted object sequence and each second predicted object sequence based on the first object region and the first object category corresponding to each first predicted object sequence, and the second object region and the second object category corresponding to each second predicted object sequence, to obtain at least one pair of the first predicted object sequence and the second predicted object sequence having a target matching relationship.
  • the first matching part is further configured to: determine at least one candidate sequence pair set based on each first predictor sequence and each second predictor sequence; each candidate sequence pair set includes at least one For the first prediction target sequence and the second prediction target sequence with a candidate matching relationship; for each candidate sequence pair set, based on each pair of the first prediction target sequence and the second prediction target sequence with a candidate matching relationship in the candidate sequence pair set The first object region and the first object category corresponding to the first predicted object sequence in the object sequence, and the second object region and the second object category corresponding to the second predicted object sequence, determine the matching loss of the candidate sequence pair set; at least Each pair of the first prediction object sequence and the second prediction object sequence with a candidate matching relationship in the candidate sequence pair set with the smallest matching loss in a candidate sequence pair set is determined as at least one pair of first prediction objects with a target matching relationship sequence and the second predictor sequence.
  • the first model includes a feature extraction network and a converter network; the first detection part is further configured to: use the feature extraction network of the first model to perform feature extraction on the first augmented image to obtain image feature information ; Using the converter network of the first model to perform prediction processing on the image feature information to obtain at least one sequence of first prediction objects.
  • the first model further includes a first feed-forward neural network; the first detection part is further configured to: use the converter network of the first model to predict image feature information to obtain at least one feature sequence; Using the first feed-forward neural network, each feature sequence is mapped to the target dimension to obtain at least one first sequence of predicted objects.
  • the first detection result also includes the first object region and the first object category
  • the first model also includes a second feedforward neural network and a third feedforward neural network
  • the first detection part is further configured to: for each feature sequence, use the second feedforward neural network to perform area prediction on the feature sequence to obtain the first object area, and use the third feedforward neural network to perform category prediction on the feature sequence to obtain the first object category.
  • the second model has the same network structure as the first model.
  • the first acquisition part is further configured to: perform first image augmentation processing on the first image sample to obtain the first augmented image, and perform second image augmentation processing on the first image sample to obtain the second augmented image.
  • the first image augmentation processing includes at least one of the following: color dithering, grayscale processing, Gaussian blur, random erasing; the second image augmentation processing includes at least one of the following: random scaling, random cropping, random flipping, random resizing.
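  • Illustrative torchvision pipelines matching these two augmentation families could look as follows (in a real detection setting the geometric transforms must also remap the object boxes; all parameters are assumptions):

```python
from torchvision import transforms as T

first_image_augmentation = T.Compose([
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),   # color dithering
    T.RandomGrayscale(p=0.2),            # grayscale processing
    T.GaussianBlur(kernel_size=23),      # Gaussian blur
    T.ToTensor(),
    T.RandomErasing(p=0.25),             # random erasing (operates on tensors)
])
second_image_augmentation = T.Compose([
    T.RandomResizedCrop(800, scale=(0.5, 1.0)),  # random scaling and cropping
    T.RandomHorizontalFlip(p=0.5),               # random flipping
    T.ToTensor(),
])
```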
  • the apparatus further includes: a determining part, configured to determine an initial third model based on the trained first model; and a second updating part, configured to update the model parameters of the third model based on at least one second image sample, to obtain the trained third model.
  • FIG. 9 is a schematic diagram of the composition and structure of an image processing device provided by an embodiment of the present disclosure.
  • the image processing device 900 includes: a third acquisition part 910 and a second detection part 920, wherein: the third acquisition part 910 is configured to acquire the image to be processed; the second detection part 920 is configured to use the trained fourth model to perform target detection on the image to be processed to obtain a third detection result; the fourth model includes at least one of the following: the first model obtained by using the model training method described in the above embodiments, and the third model obtained by using the model training method described in the above embodiments.
  • a "part" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a unit, a module or a non-modular one.
  • if the above model training method or image processing method is implemented in the form of a software functional part and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the essence of the technical solutions of the embodiments of the present disclosure, or the part contributing to the related art, can be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes various media that can store program codes, such as a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a magnetic disk, or an optical disk.
  • embodiments of the present disclosure are not limited to any specific combination of hardware and software.
  • An embodiment of the present disclosure provides a computer device, including a memory and a processor, the memory stores a computer program that can run on the processor, and the processor implements the steps in the above method when executing the program.
  • An embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps in the above method are implemented.
  • the computer readable storage medium may be transitory or non-transitory.
  • An embodiment of the present disclosure provides a computer program product.
  • the computer program product includes a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, part or all of the steps of the above-mentioned method are implemented.
  • the computer program product can be realized by hardware, software or a combination thereof.
  • in some implementations, the computer program product is embodied as a computer-readable storage medium, and in other implementations, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).
  • FIG. 10 is a schematic diagram of a hardware entity of a computer device in an embodiment of the present disclosure.
  • the hardware entity of the computer device 1000 includes: a processor 1001, a communication interface 1002, and a memory 1003, wherein:
  • the processor 1001 usually controls the overall operation of the computer device 1000;
  • the communication interface 1002 can enable the computer device to communicate with other terminals or servers through the network;
  • the memory 1003 is configured to store instructions and applications executable by the processor 1001, and can also cache data to be processed or already processed by the processor 1001 and each part of the computer device 1000 (for example, image data, audio data, voice communication data and video communication data), which can be implemented by flash memory (FLASH) or random access memory (Random Access Memory, RAM).
  • Data transmission can be performed between the processor 1001 , the communication interface 1002 and the memory 1003 through the bus 1004 .
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the parts is only a logical function division.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or parts may be electrical, mechanical, or in other forms.
  • the parts described above as separate components may or may not be physically separated, and the parts shown as parts may or may not be physical parts; they may be located in one place or distributed over multiple network units; some or all of the parts may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional part in each embodiment of the present disclosure may be fully integrated into one processing part, or each part may serve separately as one part, or two or more parts may be integrated into one part; the above integrated part can be implemented in the form of hardware, or in the form of hardware plus software functional parts.
  • if the above-mentioned integrated part of the present disclosure is implemented in the form of a software functional part and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes various media capable of storing program codes such as removable storage devices, ROMs, magnetic disks or optical disks.


Abstract

Provided in the embodiments of the present disclosure are a model training method and apparatus, an image processing method and apparatus, and a device, a storage medium and a computer program product. The model training method comprises: acquiring a first augmented image and a second augmented image; performing target detection on the first augmented image by using a first model so as to obtain at least one first detection result that comprises first predictive object sequences, and performing target detection on the second augmented image by using a second model so as to obtain at least one second detection result that comprises second predictive object sequences; matching each first predictive object sequence with each second predictive object sequence to obtain at least one pair of first predictive object sequence and second predictive object sequence that have a target matching relationship; and on the basis of each pair of first predictive object sequence and second predictive object sequence that have a target matching relationship, updating model parameters of the first model at least once to obtain a trained first model.

Description

Model training and image processing method, apparatus, device, storage medium and computer program product

Cross-Reference to Related Applications

The embodiments of the present disclosure are based on the Chinese patent application with application number 202111667489.4, filed by Shanghai Shangtang Intelligent Technology Co., Ltd. on December 31, 2021 and entitled "Model training and image processing method, apparatus, device, and storage medium", and claim the priority of that Chinese patent application, the entire contents of which are hereby incorporated into the present disclosure by reference.

Technical Field

The present disclosure relates to, but is not limited to, the field of artificial intelligence, and in particular relates to a model training and image processing method, apparatus, device, storage medium and computer program product.

Background

Target detection is an important problem in fields such as computer vision and industrial inspection; it uses algorithms to obtain the positions and corresponding classifications of targets of interest in an image. Compared with image classification, target detection is a prediction-intensive computer vision task; the training process of a target detection model has higher labeling requirements, so the labeling cost is also higher.

Summary of the Invention

In view of this, embodiments of the present disclosure provide a model training and image processing method, apparatus, device, storage medium and computer program product.

The technical solutions of the embodiments of the present disclosure are implemented as follows:
An embodiment of the present disclosure provides a model training method, the method being executed by a computer device, the method comprising:

acquiring a first augmented image and a second augmented image obtained by respectively performing augmentation processing on a first image sample;

performing target detection on the first augmented image by using a first model to be trained to obtain at least one first detection result including a first predicted object sequence, and performing target detection on the second augmented image by using a second model to obtain at least one second detection result including a second predicted object sequence;

matching each first predicted object sequence and each second predicted object sequence to obtain at least one pair of the first predicted object sequence and the second predicted object sequence having a target matching relationship;

updating the model parameters of the first model at least once based on each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, to obtain the trained first model.

An embodiment of the present disclosure provides an image processing method, the method being executed by a computer device and comprising:

acquiring an image to be processed;

performing target detection on the image to be processed by using a trained fourth model to obtain a third detection result; wherein the fourth model includes at least one of the following: the first model obtained by the above model training method, and the third model obtained by the above model training method.

An embodiment of the present disclosure provides a model training apparatus, the apparatus comprising:

a first acquiring part, configured to acquire a first augmented image and a second augmented image obtained by respectively augmenting a first image sample;

a first detecting part, configured to perform target detection on the first augmented image by using a first model to be trained to obtain at least one first detection result including a first predicted object sequence, and to perform target detection on the second augmented image by using a second model to obtain at least one second detection result including a second predicted object sequence;

a first matching part, configured to match each first predicted object sequence and each second predicted object sequence to obtain at least one pair of the first predicted object sequence and the second predicted object sequence having a target matching relationship;

a first updating part, configured to update the model parameters of the first model at least once based on each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, to obtain the trained first model.

An embodiment of the present disclosure provides an image processing apparatus, comprising:

a third acquiring part, configured to acquire an image to be processed;

a second detecting part, configured to perform target detection on the image to be processed by using a trained fourth model to obtain a third detection result; wherein the fourth model includes at least one of the following: the first model obtained by the above model training method, and the third model obtained by the above model training method.
An embodiment of the present disclosure provides a computer device, including a memory and a processor, the memory storing a computer program executable on the processor, and the processor implementing part or all of the steps of the above method when executing the program.

An embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, part or all of the steps of the above method are implemented.

An embodiment of the present disclosure provides a computer program, including computer-readable code; when the computer-readable code runs in a computer device, a processor in the computer device executes part or all of the steps for implementing the above method.

An embodiment of the present disclosure provides a computer program product, the computer program product including a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, part or all of the steps of the above method are implemented.

In the embodiments of the present disclosure, a first augmented image and a second augmented image obtained by respectively augmenting a first image sample are acquired; target detection is performed on the first augmented image by using a first model to be trained to obtain at least one first detection result including a first predicted object sequence, and target detection is performed on the second augmented image by using a second model to obtain at least one second detection result including a second predicted object sequence; each first predicted object sequence is matched with each second predicted object sequence to obtain at least one pair of the first predicted object sequence and the second predicted object sequence having a target matching relationship; and the model parameters of the first model are updated at least once based on each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, to obtain the trained first model. In this way, by maintaining the consistency between the first predicted object sequence and the second predicted object sequence obtained after the first model and the second model respectively process the first augmented image and the second augmented image of the same image sample, a sequence-level self-supervised training process of the target detection model is realized, and the overall network structure of the target detection model can be trained, thereby effectively improving the performance of the entire target detection model and reducing the labeling cost in the training process of the target detection model.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.
Description of the Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments of the present disclosure are described below. The accompanying drawings here are incorporated into and constitute a part of the description; they show embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure.

FIG. 1 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of the implementation flow of an image processing method provided by an embodiment of the present disclosure;

FIG. 7A is a schematic diagram of the implementation process of model training based on a pre-training method provided by an embodiment of the present disclosure;

FIG. 7B is a schematic diagram of the implementation architecture of a model training method provided by an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of the composition and structure of a model training device provided by an embodiment of the present disclosure;

FIG. 9 is a schematic diagram of the composition and structure of an image processing device provided by an embodiment of the present disclosure;

FIG. 10 is a schematic diagram of a hardware entity of a computer device provided by an embodiment of the present disclosure.
Detailed Description

In order to make the purpose, technical solutions and advantages of the present disclosure clearer, the technical solutions of the present disclosure are described in detail below in conjunction with the accompanying drawings and embodiments. The described embodiments should not be regarded as limiting the present disclosure; all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present disclosure.

In the following description, "some embodiments" describes a subset of all possible embodiments, but it can be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and the embodiments may be combined with each other without conflict. The terms "first/second/third" are only used to distinguish similar objects and do not represent a specific ordering of the objects; it can be understood that, where permitted, "first/second/third" may be interchanged in a specific order or sequence, so that the embodiments of the present disclosure described herein can be implemented in an order other than that illustrated or described herein.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which the present disclosure belongs. The terms used herein are only for the purpose of describing the present disclosure and are not intended to limit the present disclosure.

To address the problem of high labeling cost in the training process of target detection models in the related art, a self-supervised training algorithm can be adopted, using unlabeled data to help improve the performance of the target detection model. However, self-supervised training algorithms in the related art are mainly applied to image classification tasks and treat the whole image as a single entity, which is not suitable for a prediction-intensive task such as target detection; moreover, self-supervised training algorithms in the related art can usually only pre-train the parameters of part of the networks in the target detection model, for example only the parameters of the backbone network, and therefore bring limited performance improvement to the final overall target detection model.
本公开实施例提供一种模型训练方法,该方法可以由计算机设备的处理器执行。其中,计算机设备指的可以是服务器、笔记本电脑、平板电脑、台式计算机、智能电视、机顶盒、移动设备(例如移动电话、便携式视频播放器、个人数字助理、专用消息设备、便携式游戏设备)等具备数据处理能力的设备。图1为本公开实施例提供的一种模型训练方法的实现流程示意图,如图1所示,该方法包括如下步骤S101至步骤S104:An embodiment of the present disclosure provides a model training method, which can be executed by a processor of a computer device. Among them, computer equipment refers to servers, notebook computers, tablet computers, desktop computers, smart TVs, set-top boxes, mobile devices (such as mobile phones, portable video players, personal digital assistants, dedicated messaging devices, portable game devices), etc. Devices with data processing capabilities. Fig. 1 is a schematic diagram of the implementation flow of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 1, the method includes the following steps S101 to S104:
步骤S101,获取分别对第一图像样本进行增广处理后得到的第一增广图像和第二增广图像。Step S101 , acquiring a first augmented image and a second augmented image obtained by respectively performing augmentation processing on a first image sample.
这里,第一图像样本可以是任意合适的包含至少一个对象的图像。第一图像样本中包含的对象可以根据实际应用场景确定,例如可以包括但不限于人、人体部位、动物、动物肢体、植物、花朵、树叶、石头、云朵、围栏等对象中的至少一种。Here, the first image sample may be any suitable image containing at least one object. The objects contained in the first image sample can be determined according to the actual application scene, for example, may include but not limited to at least one of objects such as people, human body parts, animals, animal limbs, plants, flowers, leaves, stones, clouds, and fences.
对第一图像样本进行的增广处理可以包括但不限于随机缩放、随机裁剪、随机翻转、随机调整尺寸、颜色抖动、灰度处理、高斯模糊、随机擦除等中的至少一种。第一增广图像和第二增广图像可以是对同一第一图像样本分别进行不同增广处理后得到的,可以是对同一第一图像样本分别进行相同的增广处理后得到的。在实施时,本领域技术人员可以根据实际情况,对第一图像样本采用合适的增广处理得到第一增广图像和第二增广图像,本公开实施例并不限定。The augmentation processing performed on the first image sample may include but not limited to at least one of random scaling, random cropping, random flipping, random resizing, color dithering, grayscale processing, Gaussian blurring, random erasing, and the like. The first augmented image and the second augmented image may be obtained by performing different augmentation processes on the same first image sample, or may be obtained by performing the same augmentation process on the same first image sample. During implementation, those skilled in the art may use appropriate augmentation processing on the first image sample to obtain the first augmented image and the second augmented image according to actual conditions, which are not limited by the embodiments of the present disclosure.
步骤S102,利用待训练的第一模型,对所述第一增广图像进行目标检测,得到至少一个包括第一预测对象序列的第一检测结果,并利用第二模型,对所述第二增广图像进行目标检测,得到至少一个包括第二预测对象序列的第二检测结果。Step S102, use the first model to be trained to perform object detection on the first augmented image, obtain at least one first detection result including the first predicted object sequence, and use the second model to perform object detection on the second augmented image Target detection is performed on the wide image, and at least one second detection result including the second predicted object sequence is obtained.
Here, the first model may be any suitable model that performs target detection based on sequence characteristics, such as a Vision Transformer (ViT), a transformer-based detection model (Detection Transformer, DETR), or Deformable DETR. The first model can transform the target detection problem into a prediction problem over a set of feature sequences, and can therefore output at least one first detection result including a first predicted object sequence. The first predicted object sequence may be obtained after the first model performs sequence encoding and sequence decoding on the first augmented image. Each first predicted object sequence may represent one predicted object in the first image sample. In implementation, those skilled in the art may use any suitable sequence encoding and sequence decoding methods to process the first augmented image according to the actual situation to obtain at least one first predicted object sequence, which is not limited by the embodiments of the present disclosure.

In some implementations, the first model may be Deformable DETR. The first predicted object sequence in the first detection result may be the predicted object sequence output by the decoder in the transformer, or a mapped predicted object sequence obtained by applying mapping processing, such as dimension transformation, to the predicted object sequence output by the decoder in the transformer.

In some implementations, the first detection result may include a first predicted object sequence, and a first object region and a first object category corresponding to the first predicted object sequence. The first predicted object sequence may represent a predicted object, and the corresponding first object region and first object category may respectively represent the predicted location region and the predicted category of that predicted object.
The second model may have the same network structure as the first model, or a different network structure, which is not limited here. The process of performing target detection on the second augmented image with the second model corresponds to the process of performing target detection on the first augmented image with the first model, and the latter may be referred to during implementation. The second predicted object sequence may be obtained after the second model performs sequence encoding and sequence decoding on the second augmented image. Each second predicted object sequence may represent one predicted object in the first image sample.

In some implementations, where the second model is a transformer-based target detection model, the second predicted object sequence in the second detection result may be the predicted object sequence output by the decoder in the transformer, or a mapped predicted object sequence obtained by applying mapping processing, such as dimension transformation, to the predicted object sequence output by the decoder in the transformer.

In some implementations, the second detection result may include a second predicted object sequence, and a second object region and a second object category corresponding to the second predicted object sequence. The second predicted object sequence may represent a predicted object, and the corresponding second object region and second object category may respectively represent the predicted location region and the predicted category of that predicted object.
Step S103: matching each first predicted object sequence with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship.

Here, a first predicted object sequence and a second predicted object sequence having a target matching relationship may represent the same predicted object in the first image sample. In implementation, those skilled in the art may use any suitable matching method to match each first predicted object sequence with each second predicted object sequence according to the actual situation, which is not limited here.

In some implementations, the output order of each first predicted object sequence and the output order of each second predicted object sequence may be determined, and a first predicted object sequence and a second predicted object sequence with the same output order may be determined as a pair having the target matching relationship, thereby obtaining at least one pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship.
In some implementations, bipartite graph matching may be used to match each first predicted object sequence with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship. In implementation, the matching loss used in the bipartite graph matching process may be computed in any suitable manner, which is not limited here. For example, the matching loss may be determined based on at least one of the following: the similarity between each pair of mutually matched first and second predicted object sequences; the intersection-over-union between the first object region and the second object region respectively corresponding to each pair of mutually matched first and second predicted object sequences; and the focal loss between the first object category and the second object category respectively corresponding to each pair of mutually matched first and second predicted object sequences.
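As one hedged sketch of this matching step, the cost matrix below uses only the (negative) cosine similarity between the two sets of sequences and is solved with the Hungarian algorithm from scipy; region IoU and class focal terms could be mixed into the same cost matrix, and the function name and shapes are assumptions.

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def match_sequences(seq1, seq2):
    """seq1: (N, D) first predicted object sequences; seq2: (M, D) second ones.
    Returns index pairs (i, j) treated as having the target matching relationship."""
    sim = F.normalize(seq1, dim=-1) @ F.normalize(seq2, dim=-1).T  # (N, M)
    rows, cols = linear_sum_assignment((-sim).detach().cpu().numpy())
    return list(zip(rows.tolist(), cols.tolist()))
```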
Step S104: updating the model parameters of the first model at least once based on each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship, to obtain the trained first model.

Here, in some implementations, whether the model parameters of the first model need to be updated may be determined based on each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship. When the model parameters of the first model need to be updated, a suitable parameter update algorithm is used to update them; after the update, each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship is re-determined, and whether the model parameters of the first model need to be further updated is determined based on the re-determined pairs. When it is determined that the model parameters of the first model do not need further updating, the finally updated first model is determined as the trained first model.

For example, a target loss value may be determined based on each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship, and the model parameters of the first model may be updated when the target loss value does not satisfy a preset condition. When the target loss value satisfies the preset condition, or the number of times the model parameters of the first model have been updated reaches a set threshold, updating of the model parameters of the first model is stopped, and the finally updated first model is determined as the trained first model.
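The update-and-stop logic of step S104 can be sketched as the schematic loop below; model_1, model_2, matcher, compute_target_loss, the augmentations, and both thresholds are placeholders standing in for components described elsewhere in this disclosure, with assumed values.

```python
max_updates = 10_000     # assumed threshold on the number of parameter updates
loss_threshold = 1e-3    # assumed preset condition on the target loss value

for step, image in enumerate(data_loader):
    view1, view2 = augment_1(image), augment_2(image)    # step S101
    results1 = model_1(view1)                            # step S102
    results2 = model_2(view2)
    pairs = matcher(results1, results2)                  # step S103
    loss = compute_target_loss(pairs)                    # step S104
    if loss.item() < loss_threshold or step + 1 >= max_updates:
        break    # the finally updated model_1 is the trained first model
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```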
In the embodiments of the present disclosure, a first augmented image and a second augmented image, each obtained by performing augmentation processing on a first image sample, are acquired; the first model to be trained is used to perform target detection on the first augmented image to obtain at least one first detection result including a first predicted object sequence, and a second model is used to perform target detection on the second augmented image to obtain at least one second detection result including a second predicted object sequence; each first predicted object sequence is matched with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship; and the model parameters of the first model are updated at least once based on each such pair to obtain the trained first model. In this way, by maintaining consistency between the first predicted object sequences and the second predicted object sequences obtained after the first model and the second model respectively process the first and second augmented images of the same image sample, a sequence-level self-supervised training process for a target detection model can be realized, and the overall network structure of the target detection model can be trained. The performance of the entire target detection model can thus be effectively improved, and the labeling cost in the training process of the target detection model can be reduced.
In some embodiments, the first model includes a feature extraction network and a transformer network; using the first model to be trained to perform target detection on the first augmented image to obtain at least one first detection result including a first predicted object sequence, as described in the above step S102, includes the following steps S111 to S112:

Step S111: using the feature extraction network of the first model to perform feature extraction on the first augmented image to obtain image feature information.

Here, the feature extraction network may be any suitable network capable of extracting image features, such as a convolutional neural network, a recurrent neural network, or a transformer-based feature extraction network. In implementation, those skilled in the art may use a suitable feature extraction network in the first model according to the actual situation to obtain the image feature information, which is not limited here.
Step S112: using the transformer network of the first model to perform prediction processing on the image feature information to obtain at least one first predicted object sequence.

Here, the transformer network may include an encoder network and a decoder network. In implementation, those skilled in the art may use a suitable transformer network in the first model according to the actual situation to perform prediction processing on the image feature information, which is not limited here.
In some implementations, the image feature information may be position-encoded and then input to the encoder network to obtain at least one encoded feature sequence resulting from the encoder network performing feature encoding on the position-encoded image feature information; the decoder network may then be used to perform recognition processing on each encoded feature sequence to obtain context identification information corresponding to at least one predicted object, and to perform feature decoding on each encoded feature sequence according to each piece of context identification information to obtain at least one first predicted object sequence.
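A minimal, self-contained sketch of such a transformer network is given below: flattened image features plus a positional encoding pass through an encoder, and learned object queries (one possible reading of the "context identification information") are decoded into prediction sequences. All sizes and the use of learned queries are assumptions.

```python
import torch
import torch.nn as nn

class MiniDetectionTransformer(nn.Module):
    def __init__(self, dim=256, num_queries=100, nhead=8, num_layers=2):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead, batch_first=True), num_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead, batch_first=True), num_layers)
        self.queries = nn.Embedding(num_queries, dim)   # learned object queries

    def forward(self, feats, pos):
        # feats, pos: (batch, H*W, dim) flattened image features and positions
        memory = self.encoder(feats + pos)              # encoded feature sequences
        q = self.queries.weight.unsqueeze(0).expand(feats.size(0), -1, -1)
        return self.decoder(q, memory)                  # (batch, num_queries, dim)

feats = torch.randn(2, 1024, 256)   # e.g. a 32x32 feature map, flattened
pos = torch.randn(2, 1024, 256)     # placeholder positional encoding values
sequences = MiniDetectionTransformer()(feats, pos)      # predicted object sequences
```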
In the above embodiment, the first model includes a feature extraction network and a transformer network. In this way, based on the sequence characteristics of the transformer network, a sequence-level self-supervised training process for a transformer-based target detection model can be realized, and the overall network structure of the transformer-based target detection model can be trained. The performance of the entire target detection model can thus be effectively improved, and the labeling cost in the training process of the target detection model can be reduced.
In some embodiments, the first model further includes a first feed-forward neural network; the above step S112 may include the following steps S121 to S122:

Step S121: using the transformer network of the first model to perform prediction processing on the image feature information to obtain at least one feature sequence;

Step S122: using the first feed-forward neural network to map each feature sequence to a target dimension to obtain at least one first predicted object sequence.
Here, the first feed-forward neural network may be any suitable feed-forward neural network capable of mapping a feature sequence to the target dimension, which is not limited here.

The target dimension may be preset. In implementation, those skilled in the art may set a suitable target dimension according to the actual business scenario.

For example, if the feature sequence output by the transformer network is a 256-dimensional feature, the first feed-forward neural network may map this 256-dimensional feature sequence to a 512-dimensional first predicted object sequence.
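Following the 256-to-512 example above, a first feed-forward neural network could be sketched as below; the hidden width is an assumption.

```python
import torch.nn as nn

projection_head = nn.Sequential(   # maps each feature sequence to the target dimension
    nn.Linear(256, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, 512),          # target dimension
)
# predicted_sequences = projection_head(sequences)   # (..., 256) -> (..., 512)
```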
In the above embodiment, the feature sequence output by the transformer network is mapped to the target dimension through the first feed-forward neural network to obtain the first predicted object sequence. In this way, the detection performance of the first model can be improved by presetting a suitable target dimension. For example, setting a higher target dimension can improve the detection accuracy of the first model; setting a lower target dimension can improve the detection efficiency of the first model.
In some embodiments, the first detection result further includes a first object region and a first object category, and the first model further includes a second feed-forward neural network and a third feed-forward neural network; using the first model to be trained to perform target detection on the first augmented image to obtain at least one first detection result including a first predicted object sequence, as described in the above step S102, further includes:

Step S131: for each feature sequence, using the second feed-forward neural network to perform region prediction on the feature sequence to obtain a first object region, and using the third feed-forward neural network to perform category prediction on the feature sequence to obtain a first object category.
Here, the second feed-forward neural network may be any suitable feed-forward neural network capable of region prediction, which is not limited here. In some implementations, the second feed-forward neural network may be used to predict the location region, in the first augmented image, of the predicted object represented by the feature sequence, and the resulting first object region may be a detection box of the predicted object.

The third feed-forward neural network may be any suitable feed-forward neural network capable of category prediction, which is not limited here. In some implementations, the third feed-forward neural network may be used to predict the object category of the predicted object represented by the feature sequence to obtain the first object category. In implementation, the number of outputs of the third feed-forward neural network may be determined according to the number of object categories to be detected in the actual business scenario, which is not limited here.
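A hedged sketch of the second and third feed-forward neural networks follows; the three-layer box head, sigmoid box parameterization, feature size, and category count are illustrative assumptions.

```python
import torch.nn as nn

dim, num_classes = 256, 91   # assumed feature size and number of categories

box_head = nn.Sequential(             # second FFN: region prediction
    nn.Linear(dim, dim), nn.ReLU(inplace=True),
    nn.Linear(dim, dim), nn.ReLU(inplace=True),
    nn.Linear(dim, 4), nn.Sigmoid(),  # normalized (cx, cy, w, h) detection box
)
class_head = nn.Linear(dim, num_classes + 1)   # third FFN: categories + "no object"
```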
In some embodiments, the second model has the same network structure as the first model. In implementation, the process of performing target detection on the second augmented image with the second model may refer to the process of performing target detection on the first augmented image with the first model.
In some embodiments, the above step S101 may include the following steps S141 to S142:

Step S141: performing first image augmentation processing on the first image sample to obtain the first augmented image;

Step S142: performing second image augmentation processing on the first image sample to obtain the second augmented image.

In implementation, the first image augmentation processing and the second image augmentation processing may use the same augmentation processing method or different augmentation processing methods, which is not limited here.
In some embodiments, the first image augmentation processing includes at least one of the following: color jitter, grayscale processing, Gaussian blur, and random erasing; the second image augmentation processing includes at least one of the following: random scaling, random cropping, random flipping, and random resizing.

In the above embodiment, the first augmented image and the second augmented image are obtained by performing the first image augmentation processing and the second image augmentation processing on the first image sample respectively. Compared with the image perturbation brought by the random scaling, random cropping, random flipping, and random resizing included in the second image augmentation processing, the image perturbation brought by the color jitter, grayscale processing, Gaussian blur, and random erasing included in the first image augmentation processing is stronger. This makes target detection more difficult for the first model than for the second model, which can improve the learning ability of the trained first model and mitigate the model collapse caused by the first model and the second model having the same learning ability.
An embodiment of the present disclosure provides a model training method, which may be executed by a processor of a computer device. Fig. 2 is a schematic flowchart of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 2, the method includes the following steps S201 to S206:
Step S201: acquiring a first augmented image and a second augmented image, each obtained by performing augmentation processing on a first image sample.

Step S202: using the first model to be trained to perform target detection on the first augmented image to obtain at least one first detection result including a first predicted object sequence, and using a second model to perform target detection on the second augmented image to obtain at least one second detection result including a second predicted object sequence.

Step S203: matching each first predicted object sequence with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship.

Here, the above steps S201 to S203 respectively correspond to the foregoing steps S101 to S103, and the implementations of the foregoing steps S101 to S103 may be referred to during implementation.
Step S204: determining a target loss value based on the similarity between each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship.

Here, any suitable similarity loss function may be used to determine the similarity loss between each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship, and the target loss value may be determined based on each similarity loss. The similarity loss function may include, but is not limited to, at least one of an absolute-value loss function, a least-squares-error loss function, a cosine loss function, the BYOL (Bootstrap Your Own Latent) algorithm, the Momentum Contrast (MoCo) algorithm, and the like.
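One possible similarity loss, sketched below, is the BYOL-style negative cosine similarity between a matched pair of sequences; this particular form is an assumption, as the text allows several alternatives.

```python
import torch
import torch.nn.functional as F

def similarity_loss(s1, s2):
    """s1, s2: (N, D) matched first/second predicted object sequences."""
    s1 = F.normalize(s1, dim=-1)
    s2 = F.normalize(s2, dim=-1)
    return (2.0 - 2.0 * (s1 * s2).sum(dim=-1)).mean()  # 0 when perfectly aligned
```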
Step S205: updating the model parameters of the first model to obtain an updated first model when the target loss value does not satisfy a preset condition.

Here, the preset condition may include, but is not limited to, the target loss value being smaller than a set loss-value threshold, the change of the target loss value converging, and the like. In implementation, the preset condition may be set according to the actual situation, which is not limited here.

The manner of updating the model parameters of the first model may be determined according to the actual situation, and may include, but is not limited to, at least one of gradient descent, momentum update, the Newton momentum method, and the like, which is not limited here.
Step S206: determining the trained first model based on the updated first model.

Here, in some implementations, the updated first model may be determined as the trained first model.

In some implementations, the updated first model may continue to be updated, and the finally updated first model may be determined as the trained first model.
In the embodiments of the present disclosure, the target loss value is determined based on the similarity between each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship; when the target loss value does not satisfy the preset condition, the model parameters of the first model are updated to obtain an updated first model; and the trained first model is determined based on the updated first model. In this way, the model parameters of the first model can be updated at least once when the target loss value does not satisfy the preset condition. Since the target loss value is determined based on the similarity between each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship, the consistency between the predicted object sequences obtained by the trained first model and the second model when processing different augmented images of the same image sample can be improved, and the performance of the trained target detection model can thus be improved.
In some embodiments, the above step S205 may include the following step S211:

Step S211: when the target loss value does not satisfy the preset condition, separately updating the model parameters of the first model and the model parameters of the second model to obtain an updated first model and an updated second model.

Here, when the target loss value does not satisfy the preset condition, the model parameters of both the first model and the second model may be updated, realizing contrastive learning between the first model and the second model.

The manner of updating the model parameters of the second model may be determined according to the actual situation, and may include, but is not limited to, at least one of gradient descent, momentum update, the Newton momentum method, and the like, which is not limited here. In implementation, the parameter update methods of the first model and the second model may be the same or different, which is not limited here.
The above step S206 may include the following step S212:

Step S212: determining the trained first model based on the updated first model and the updated second model.

In some implementations, a new target loss value may be determined based on the updated first model and the updated second model, and whether to continue updating the updated first model may be decided by judging whether the new target loss value satisfies the preset condition. When the new target loss value satisfies the preset condition, it may be determined that the updated first model needs no further updating, and the updated first model may be determined as the trained first model; when the new target loss value does not satisfy the preset condition, the updated first model may continue to be updated, and the finally updated first model may be determined as the trained first model.

In the above embodiment, while the model parameters of the first model are updated, the model parameters of the second model are also updated, so that the learning abilities of the first model and the second model reinforce each other, and the performance of the trained target detection model can thus be improved.
In some embodiments, the above step S211 may include the following steps S221 to S222:

Step S221: performing a momentum update on the model parameters of the second model based on the current model parameters of the first model to obtain an updated second model.

Here, in implementation, those skilled in the art may use any suitable momentum update method according to the actual situation to perform the momentum update on the model parameters of the second model based on the current model parameters of the first model, which is not limited by the embodiments of the present disclosure.
In some implementations, a weighted sum of the current model parameters of the first model and the current model parameters of the second model may be computed based on set weights to obtain the updated second model. For example, the momentum update of the model parameters of the second model may be performed using the following formula (1):

$$\Theta_{m+1} = k \cdot \Theta_m + (1 - k) \cdot \Theta_o \qquad (1)$$

where $\Theta_m$ and $\Theta_o$ are the current model parameters of the second model and the current model parameters of the first model respectively, $\Theta_{m+1}$ denotes the updated model parameters of the second model, and $k$ is a set momentum coefficient. In some implementations, $k$ may be a value greater than or equal to 0.9 and less than 1; for example, $k$ is 0.995.
Step S222: updating the current model parameters of the first model by means of a gradient update to obtain an updated first model.

Here, any suitable gradient update algorithm may be used to update the current model parameters of the first model, which is not limited by the embodiments of the present disclosure. For example, the gradient update algorithm may include, but is not limited to, at least one of batch gradient descent, stochastic gradient descent, mini-batch gradient descent, and the like.

In the above embodiment, the model parameters of the second model are momentum-updated based on the current model parameters of the first model to obtain the updated second model, and the current model parameters of the first model are updated by means of a gradient update to obtain the updated first model. In this way, the first model and the second model are updated at different rates, which can mitigate model collapse and improve the performance of the trained target detection model.
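The asymmetric update of steps S221 and S222 can be sketched as follows: the second model receives a momentum (EMA) copy of the first model's parameters per formula (1), while only the first model's parameters are updated by gradient descent. The sketch assumes the two models share an identical parameter layout.

```python
import torch

@torch.no_grad()
def momentum_update(model_1, model_2, k=0.995):
    # formula (1): Θ_{m+1} = k · Θ_m + (1 - k) · Θ_o
    for p_o, p_m in zip(model_1.parameters(), model_2.parameters()):
        p_m.mul_(k).add_(p_o, alpha=1.0 - k)

# Per training step (the optimizer wraps model_1's parameters only):
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
#   momentum_update(model_1, model_2, k=0.995)
```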
In some embodiments, the above step S212 may include the following steps S231 to S235:

Step S231: determining the first augmented image and the second augmented image, each obtained by performing augmentation processing on the next first image sample, as the current first augmented image and the current second augmented image respectively.

Here, the next first image sample may be the same image as the current first image sample, or an image different from the current first image sample.
Step S232: using the currently updated first model to perform target detection on the current first augmented image to obtain at least one first detection result including a first predicted object sequence, and using the currently updated second model to perform target detection on the current second augmented image to obtain at least one second detection result including a second predicted object sequence.

Step S233: matching each first predicted object sequence with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship.

Step S234: determining a current target loss value based on the similarity between each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship.

Here, the above steps S231 to S234 respectively correspond to the foregoing steps S201 to S204, and the implementations of the foregoing steps S201 to S204 may be referred to during implementation.

Step S235: determining the currently updated first model as the trained first model when the current target loss value satisfies the preset condition or the number of times the model parameters of the first model have been updated reaches a count threshold.

Here, the count threshold may be preset by the user according to the actual situation, or may be a default value.
In some embodiments, the above step S212 may further include the following steps S241 to S242:

Step S241: when the current target loss value does not satisfy the preset condition, performing a next update on the model parameters of the first model and the model parameters of the second model respectively to obtain a next-updated first model and a next-updated second model.

Step S242: determining the trained first model based on the next-updated first model and the next-updated second model.

In the above embodiment, when the target loss value does not satisfy the preset condition, the model parameters of the first model and of the second model can be updated again, and the trained first model is determined based on the next-updated first model and the next-updated second model. The performance of the trained first model can thus be improved through continuous iterative updating.
An embodiment of the present disclosure provides a model training method, which may be executed by a processor of a computer device. Fig. 3 is a schematic flowchart of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 3, the method includes the following steps S301 to S310:
Step S301: acquiring a first augmented image and a second augmented image, each obtained by performing augmentation processing on a first image sample.

Step S302: using the first model to be trained to perform target detection on the first augmented image to obtain at least one first detection result, and using a second model to perform target detection on the second augmented image to obtain at least one second detection result including a second predicted object sequence; the first detection result includes a first predicted object sequence, and a first object region and a first object category corresponding to the first predicted object sequence.

Step S303: matching each first predicted object sequence with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship.

Here, the above steps S301 to S303 respectively correspond to the foregoing steps S101 to S103, and the implementations of the foregoing steps S101 to S103 may be referred to during implementation.
Step S304: acquiring at least one candidate object in the first image sample, each candidate object having a candidate object region and a candidate object category.

Here, the at least one candidate object in the first image sample may be determined randomly, or may be obtained by performing target detection on the first image sample with any suitable unsupervised algorithm, which is not limited here. For example, the unsupervised detection algorithm may include, but is not limited to, at least one of a sliding-window method, a region-proposal algorithm, a selective search algorithm, and the like.

The candidate object region of a candidate object is the predicted location region of the candidate object in the first image sample, and the candidate object category of a candidate object is the predicted category of that candidate object. The candidate object category of a candidate object may serve as a pseudo-label for the candidate object region of that candidate object.

In some embodiments, the above step S304 may include: performing target detection on the first image sample in an unsupervised manner to obtain at least one predicted object region and a pseudo-label of each predicted object region, where the pseudo-label of each predicted object region is used to represent the predicted object category of that predicted object region; and, for each predicted object region, taking the predicted object region as a candidate object region and the pseudo-label of the predicted object region as a candidate object category, thereby obtaining one candidate object. Here, any suitable unsupervised algorithm may be used to realize the unsupervised target detection on the first image sample. In this way, the labeling cost in the training process of the target detection model can be reduced.
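As one concrete unsupervised option among those listed, the sketch below obtains candidate object regions with selective search from opencv-contrib; assigning each region a pseudo-label (for example, a single generic "object" category) would follow separately, and the top-k cutoff is an assumption.

```python
import cv2  # requires the opencv-contrib-python package for ximgproc

def selective_search_proposals(image_bgr, top_k=30):
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image_bgr)
    ss.switchToSelectiveSearchFast()
    boxes = ss.process()          # (N, 4) candidate regions as (x, y, w, h)
    return boxes[:top_k]
```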
Step S305: matching each first predicted object sequence with each candidate object based on the first object region and first object category corresponding to each first predicted object sequence and the candidate object region and candidate object category of each candidate object, to obtain at least one pair of a first predicted object sequence and a candidate object having a target matching relationship.

Here, a first predicted object sequence and a candidate object having a target matching relationship may represent the same predicted object in the first image sample. In implementation, those skilled in the art may use any suitable matching method to match each first predicted object sequence with each candidate object according to the actual situation, which is not limited here.

In some implementations, bipartite graph matching may be used to match each first predicted object sequence with each candidate object to obtain at least one pair of a first predicted object sequence and a candidate object having the target matching relationship. In implementation, the matching loss used in the bipartite graph matching process may be computed in any suitable manner, which is not limited here. For example, the matching loss may be determined based on at least one of the following: the intersection-over-union between the first object region and the candidate object region respectively corresponding to each mutually matched pair of a first predicted object sequence and a candidate object; and the focal loss between the first object category and the candidate object category respectively corresponding to each mutually matched pair of a first predicted object sequence and a candidate object.
Step S306: determining a first loss value based on the similarity between each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship.

Here, any suitable similarity loss function may be used to determine the first loss value between each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship, which is not limited by the embodiments of the present disclosure.
In some implementations, the similarity loss between each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship may be determined, and the similarity losses may be accumulated to obtain the first loss value. For example, the first loss value may be determined as shown in the following formula (2):

$$\mathcal{L}_{e} = \sum_{i=1}^{N} \ell(s_i, \hat{s}_i) \qquad (2)$$

where $N$ is the number of pairs of first predicted object sequences and second predicted object sequences having the target matching relationship, $N$ being a positive integer; $s_i$ is a first predicted object sequence; $\hat{s}_i$ is the second predicted object sequence having the target matching relationship with $s_i$; $\ell(\cdot,\cdot)$ is the similarity loss algorithm; and $\mathcal{L}_{e}$ is the determined first loss value.
Step S307: determining a second loss value based on each pair of a first predicted object sequence and a candidate object having the target matching relationship.

Here, any suitable loss function may be used to determine the second loss value between each pair of a first predicted object sequence and a candidate object having the target matching relationship, which is not limited by the embodiments of the present disclosure. The loss function may include, but is not limited to, at least one of a similarity loss function, a focal loss function, an intersection-over-union (IoU) loss function, a generalized IoU loss function, and the like.

Step S308: determining a target loss value based on the first loss value and the second loss value.

Here, the target loss value may be determined based on the first loss value and the second loss value in a suitable manner according to the actual situation, which is not limited by the embodiments of the present disclosure. For example, the sum of the first loss value and the second loss value may be determined as the target loss value; the average of the first loss value and the second loss value may be determined as the target loss value; or a weighted sum of the first loss value and the second loss value with different weights may be taken as the target loss value.
Step S309: updating the model parameters of the first model to obtain an updated first model when the target loss value does not satisfy a preset condition.

Step S310: determining the trained first model based on the updated first model.

Here, the above steps S309 to S310 respectively correspond to the foregoing steps S205 to S206, and the implementations of the foregoing steps S205 to S206 may be referred to during implementation.

It should be noted that the execution order of the steps is not limited to the order shown in Fig. 3. For example, step S304 may be executed before step S301 or after step S306, and step S307 may be executed after step S302 and before step S303; this is not limited by the embodiments of the present disclosure.

In the embodiments of the present disclosure, the first loss value is determined based on the similarity between each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship; the second loss value is determined based on each pair of a first predicted object sequence and a candidate object having the target matching relationship; and the target loss value is determined based on the first loss value and the second loss value. Since the candidate object category of each candidate object can serve as a pseudo-label for the candidate object region of that candidate object, the second loss value, determined based on each pair of a first predicted object sequence and a candidate object having the target matching relationship, can provide objective supervision for the first model's ability to localize predicted objects, thereby improving the object localization ability of the trained first model and, in turn, its detection accuracy.
In some embodiments, the above step S307 may include the following steps S321 to S322:

Step S321: for each pair of a first predicted object sequence and a candidate object having the target matching relationship, determining a first sub-loss value based on the first object region corresponding to the first predicted object sequence and the candidate object region of the candidate object, and determining a second sub-loss value based on the first object category corresponding to the first predicted object sequence and the candidate object category of the candidate object.

Here, any suitable loss function may be used to determine the first sub-loss value between the first object region and the candidate object region, and the second sub-loss value between the first object category and the candidate object category, which is not limited by the embodiments of the present disclosure. For example, an IoU loss function or a generalized IoU loss function may be used to determine the first sub-loss value between the first object region and the candidate object region, and a focal loss function may be used to determine the second sub-loss value between the first object category and the candidate object category.
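A sketch of these two sub-losses using torchvision helpers is given below; boxes are assumed to be (x1, y1, x2, y2) tensors aligned pair-wise, and categories are assumed to be logits scored against one-hot pseudo-labels.

```python
import torch
from torchvision.ops import generalized_box_iou, sigmoid_focal_loss

def region_sub_loss(pred_boxes, cand_boxes):
    # first sub-loss: 1 - GIoU for each matched region pair
    giou = torch.diag(generalized_box_iou(pred_boxes, cand_boxes))
    return (1.0 - giou).mean()

def category_sub_loss(pred_logits, cand_onehot):
    # second sub-loss: focal loss between predicted and candidate categories
    return sigmoid_focal_loss(pred_logits, cand_onehot, reduction='mean')
```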
Step S322: determining the second loss value based on each first sub-loss value and each second sub-loss value.

Here, the second loss value may be determined based on the first sub-loss values and the second sub-loss values in a suitable manner according to the actual situation, which is not limited by the embodiments of the present disclosure. For example, the sum of a first sub-loss value and a second sub-loss value may be determined as the second loss value; the average of a first sub-loss value and a second sub-loss value may be determined as the second loss value; or a weighted sum of the first sub-loss value and the second sub-loss value with different weights may be taken as the second loss value.
In some embodiments, the target loss value may be obtained by performing a weighted summation over each first sub-loss value, each second sub-loss value, and the similarity loss between each pair of a first predicted object sequence and a second predicted object sequence having the target matching relationship. For example, the target loss value may be determined as shown in the following formula (3):

$$\mathcal{L}(y, \hat{y}) = \sum_{i=1}^{N} \left[ \lambda_f \, \mathcal{L}_{focal}(c_i, \hat{c}_i) + \mathbb{1}_{\{c_i \neq \varnothing\}} \left( \lambda_b \, \mathcal{L}_{giou}(b_i, \hat{b}_i) + \lambda_e \, \ell(s_i, \hat{s}_i) \right) \right] \qquad (3)$$

where $N$ is the number of pairs of first predicted object sequences and second predicted object sequences having the target matching relationship, $N$ being a positive integer; $s_i$ is a first predicted object sequence, $\hat{s}_i$ is the second predicted object sequence having the target matching relationship with $s_i$, and $\ell(s_i, \hat{s}_i)$ is the similarity loss between them; $c_i$ is the first object category corresponding to $s_i$, $\hat{c}_i$ is the candidate object category of the candidate object having the target matching relationship with $s_i$, and $\mathcal{L}_{focal}(c_i, \hat{c}_i)$ is the second sub-loss value between $c_i$ and $\hat{c}_i$, computed with the focal loss function; the indicator $\mathbb{1}_{\{c_i \neq \varnothing\}}$ takes the value 0 when $c_i$ is empty and 1 otherwise; $b_i$ is the first object region corresponding to $s_i$, $\hat{b}_i$ is the candidate object region of the candidate object having the target matching relationship with $s_i$, and $\mathcal{L}_{giou}(b_i, \hat{b}_i)$ is the first sub-loss value between $b_i$ and $\hat{b}_i$, computed with the generalized IoU loss function; $\lambda_f$, $\lambda_b$, and $\lambda_e$ are the weights of the focal term, the generalized IoU term, and the similarity term respectively; and $\mathcal{L}(y, \hat{y})$ is the target loss value between the first predicted object sequences $y$ and the second predicted object sequences $\hat{y}$.
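As a concrete reading of formula (3), the sketch below evaluates the target loss for N already-matched pairs aligned along dimension 0; the box format, one-hot class encoding, and weight values are assumptions, not values fixed by this disclosure.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import generalized_box_iou, sigmoid_focal_loss

def target_loss(c, c_hat, b, b_hat, s, s_hat, is_object,
                lam_f=2.0, lam_b=5.0, lam_e=1.0):   # assumed weights
    """c, c_hat: (N, C) class logits and one-hot pseudo-labels; b, b_hat: (N, 4)
    boxes; s, s_hat: (N, D) matched sequences; is_object: (N,) 0/1 indicator."""
    focal = sigmoid_focal_loss(c, c_hat, reduction='none').sum(dim=-1)
    giou = 1.0 - torch.diag(generalized_box_iou(b, b_hat))
    sim = 2.0 - 2.0 * F.cosine_similarity(s, s_hat, dim=-1)
    return (lam_f * focal + is_object * (lam_b * giou + lam_e * sim)).sum()
```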
In the above embodiment, for each pair of a first predicted object sequence and a candidate object having the target matching relationship, a first sub-loss value is determined based on the first object region corresponding to the first predicted object sequence and the candidate object region of the candidate object, and a second sub-loss value is determined based on the first object category corresponding to the first predicted object sequence and the candidate object category of the candidate object; the second loss value is then determined based on each first sub-loss value and each second sub-loss value. In this way, object region regression and self-supervised representation learning of object categories in the detection performed by the first model can be realized at the same time, so that the detection accuracy of the trained first model can be improved.
An embodiment of the present disclosure provides a model training method, which may be executed by a processor of a computer device. Fig. 4 is a schematic flowchart of a model training method provided by an embodiment of the present disclosure. As shown in Fig. 4, the method includes the following steps S401 to S404:
Step S401: acquiring a first augmented image and a second augmented image, each obtained by performing augmentation processing on a first image sample.

Step S402: using the first model to be trained to perform target detection on the first augmented image to obtain at least one first detection result, and using a second model to perform target detection on the second augmented image to obtain at least one second detection result; the first detection result includes a first predicted object sequence, and a first object region and a first object category corresponding to the first predicted object sequence, and the second detection result includes a second predicted object sequence, and a second object region and a second object category corresponding to the second predicted object sequence.

Here, the above steps S401 to S402 respectively correspond to the foregoing steps S101 to S102, and the implementations of the foregoing steps S101 to S102 may be referred to during implementation.

Among them, the second object region may be obtained by predicting the location region, in the second augmented image, of the predicted object represented by the second predicted object sequence, and may be a detection box of the predicted object. The second object category may be obtained by predicting the object category of the predicted object represented by the second predicted object sequence.
步骤S403,基于每一所述第一预测对象序列对应的第一对象区域和第一对象类别、以及每一所述第二预测对象序列对应的第二对象区域和第二对象类别,对每一所述第一预测对 象序列和每一所述第二预测对象序列进行二分图匹配,得到至少一对具有目标匹配关系的第一预测对象序列和第二预测对象序列。Step S403, based on the first object region and the first object category corresponding to each of the first predicted object sequences, and the second object region and the second object category corresponding to each of the second predicted object sequences, for each The first predictor sequence and each of the second predictor sequences perform bipartite graph matching to obtain at least one pair of the first predictor sequence and the second predictor sequence having a target matching relationship.
Here, any suitable bipartite graph matching algorithm may be used to match each first predicted object sequence with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship. For example, the bipartite graph matching algorithm may include, but is not limited to, at least one of the Hungarian matching algorithm, a maximum-flow matching algorithm, and the like. In implementation, the matching loss used in the bipartite graph matching process may be calculated in any suitable manner, which is not limited here. For example, the matching loss may be determined based on at least one of the following: the similarity between each mutually matched pair of first and second predicted object sequences; the intersection-over-union between the first object region and the second object region respectively corresponding to each matched pair; and the focal loss between the first object category and the second object category respectively corresponding to each matched pair. A minimal sketch of such a matching step is given below.
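The Hungarian matching algorithm named above can be realized with `scipy.optimize.linear_sum_assignment`. The cost construction below (cosine dissimilarity only) is one possible choice among the signals listed above, made here for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def bipartite_match(prj1, prj2):
    # prj1: (N1, D) first predicted object sequences; prj2: (N2, D) second
    # predicted object sequences. The cost here is 1 - cosine similarity;
    # box IoU and class focal-loss terms could be added to the same matrix.
    a = prj1 / np.linalg.norm(prj1, axis=1, keepdims=True)
    b = prj2 / np.linalg.norm(prj2, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return list(zip(rows.tolist(), cols.tolist()))
```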
Step S404: update the model parameters of the first model at least once based on each pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship, to obtain the trained first model.
Here, step S404 corresponds to the aforementioned step S104 and can be implemented with reference to the implementation of step S104.
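Putting steps S401 to S404 together, one training step might look like the following sketch, which reuses the `bipartite_match` helper above. `augment_color`, `augment_geometry`, `target_loss` and the `"prj"` output key are hypothetical names standing in for the operations described in this disclosure, not functions it defines:

```python
import torch

def train_step(model_1, model_2, optimizer, image):
    view_1 = augment_color(image)       # S401: first augmented image
    view_2 = augment_geometry(image)    # S401: second augmented image
    out_1 = model_1(view_1)             # S402: first detection results
    with torch.no_grad():               # the second model's branch carries
        out_2 = model_2(view_2)         # no gradient (stop-gradient design)
    pairs = bipartite_match(out_1["prj"].detach().cpu().numpy(),
                            out_2["prj"].cpu().numpy())          # S403
    loss = target_loss(out_1, out_2, pairs)                      # S404
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```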
In some embodiments, the above step S403 may include the following steps S411 to S413:
Step S411: determine at least one candidate sequence pair set based on each first predicted object sequence and each second predicted object sequence; each candidate sequence pair set includes at least one pair of a first predicted object sequence and a second predicted object sequence having a candidate matching relationship.
Here, each first predicted object sequence may be matched one-to-one with each second predicted object sequence in any suitable manner to obtain at least one candidate sequence pair set, which is not limited in the embodiments of the present disclosure. For example, at least one round of random matching may be performed between the first predicted object sequences and the second predicted object sequences to obtain at least one candidate sequence pair set.
Step S412: for each candidate sequence pair set, determine the matching loss of the candidate sequence pair set based on, for each pair of a first predicted object sequence and a second predicted object sequence having a candidate matching relationship in the set, the first object region and first object category corresponding to the first predicted object sequence and the second object region and second object category corresponding to the second predicted object sequence.
Here, the matching loss of a candidate sequence pair set may be calculated in any suitable manner.
In some implementations, the matching loss of a candidate sequence pair set may be determined based on the intersection-over-union between the first object region and the second object region respectively corresponding to each mutually matched pair of first and second predicted object sequences in the set, and the focal loss between the first object category and the second object category respectively corresponding to each such pair.
For example, the matching loss of a candidate sequence pair set can be calculated as shown in the following formula (4):
$$\mathcal{L}_{Hungarian}(y,\hat{y}) = \sum_{i=1}^{N}\left[-\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}}\,\mathcal{L}_{GIoU}\left(b_i, b_{\hat{\sigma}(i)}\right)\right] \tag{4}$$

where $N$ is the number of pairs of first predicted object sequences and second predicted object sequences having a target matching relationship, and $N$ is a positive integer; $\mathcal{L}_{Hungarian}$ denotes the Hungarian matching loss; $\hat{\sigma}$ denotes the at least one mutually matched pair of a first predicted object sequence and a second predicted object sequence in the candidate sequence pair set; $c_i$ is the second object category corresponding to the second predicted object sequence in the $i$-th pair of first and second predicted object sequences having a target matching relationship; $\hat{p}_{\hat{\sigma}(i)}(c_i)$ is the confidence that the first object category of the first predicted object sequence having a target matching relationship with that second predicted object sequence is $c_i$; $\mathbb{1}_{\{c_i \neq \varnothing\}}$ takes 0 when $c_i$ is empty and 1 when $c_i$ is not empty; $b_i$ is the first object region corresponding to the first predicted object sequence in the $i$-th pair of first and second predicted object sequences having a target matching relationship; $b_{\hat{\sigma}(i)}$ is the second object region of the second predicted object sequence having a target matching relationship with that first predicted object sequence; and $\mathcal{L}_{GIoU}(b_i, b_{\hat{\sigma}(i)})$ is the loss value between the first object region $b_i$ and the second object region $b_{\hat{\sigma}(i)}$, calculated with the generalized intersection-over-union (GIoU) loss function.
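For illustration, formula (4) could be evaluated as in the following sketch. It assumes class probabilities (rather than logits) and a reserved `EMPTY_CLASS` index for the empty category; both are assumptions made here, not definitions from the disclosure:

```python
import torch
from torchvision.ops import generalized_box_iou

EMPTY_CLASS = 0  # hypothetical index reserved for the empty category

def hungarian_matching_loss(probs_1, boxes_1, labels_2, boxes_2, pairs):
    # probs_1: (N1, K) class probabilities of the first predicted object
    # sequences; labels_2, boxes_2: categories and regions of the matched
    # second predicted object sequences; pairs: [(i, sigma_i), ...].
    total = probs_1.new_zeros(())
    for i, j in pairs:
        c = labels_2[j]
        total = total - torch.log(probs_1[i, c] + 1e-8)
        if c != EMPTY_CLASS:  # the indicator term drops the box loss
            giou = generalized_box_iou(boxes_1[i:i + 1], boxes_2[j:j + 1])[0, 0]
            total = total + (1.0 - giou)
    return total
```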
Step S413: determine each pair of a first predicted object sequence and a second predicted object sequence having a candidate matching relationship in the candidate sequence pair set with the smallest matching loss among the at least one candidate sequence pair set as at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship.
In the embodiments of the present disclosure, matching each first predicted object sequence with each second predicted object sequence by bipartite graph matching can improve the accuracy of the determined target matching relationships between the at least one pair of first and second predicted object sequences, and thus the detection accuracy of the trained first model.
An embodiment of the present disclosure provides a model training method, which can be executed by a processor of a computer device. FIG. 5 is a schematic flowchart of a model training method provided by an embodiment of the present disclosure. As shown in FIG. 5, the method includes the following steps S501 to S506:
Step S501: acquire a first augmented image and a second augmented image obtained by respectively performing augmentation processing on a first image sample.
Step S502: perform target detection on the first augmented image using the first model to be trained to obtain at least one first detection result including a first predicted object sequence, and perform target detection on the second augmented image using the second model to obtain at least one second detection result including a second predicted object sequence.
Step S503: match each first predicted object sequence with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship.
Step S504: update the model parameters of the first model at least once based on each pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship, to obtain the trained first model.
Here, steps S501 to S504 correspond to the aforementioned steps S101 to S104 respectively, and can be implemented with reference to the implementations of steps S101 to S104.
Step S505: determine an initial third model based on the trained first model.
Here, in some implementations, the feed-forward neural network in the trained first model may be adjusted according to the actual target detection scenario, and the adjusted first model may be determined as the initial third model.
In some implementations, the first model includes a feature extraction network, a transformer network, and a first feed-forward neural network, a second feed-forward neural network and a third feed-forward neural network connected to the transformer network; the first, second and third feed-forward neural networks are respectively used to output the first predicted object sequence, the first object region corresponding to the first predicted object sequence, and the first object category corresponding to the first predicted object sequence. The first feed-forward neural network may be removed from the trained first model, the third feed-forward neural network may be adjusted according to the actual target detection scenario, and the adjusted first model may be determined as the initial third model; a sketch of this adjustment follows.
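A minimal sketch of that head adjustment, assuming the heads are exposed as `projection_head` and `class_head` attributes (hypothetical names) and that the classification head is a single linear layer:

```python
import torch.nn as nn

def build_third_model(first_model, num_classes):
    # Remove the first feed-forward neural network (the projection head used
    # only during self-supervised pre-training).
    first_model.projection_head = nn.Identity()
    # Re-size the third feed-forward neural network (the classification head)
    # to the downstream category count, plus one "empty / no object" slot.
    in_dim = first_model.class_head.in_features
    first_model.class_head = nn.Linear(in_dim, num_classes + 1)
    return first_model
```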
Step S506: update the model parameters of the third model based on at least one second image sample to obtain the trained third model.
Here, the second image sample may carry annotation information or may be unannotated. In implementation, those skilled in the art may determine suitable second image samples according to the actual target detection scenario, which is not limited here.
In some implementations, the model parameters of the third model may be fine-tuned based on the at least one second image sample to obtain the trained third model, for example as sketched below.
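A hedged sketch of such fine-tuning on annotated second image samples; `detection_criterion` is a hypothetical supervised detection loss, and the optimizer settings are assumptions:

```python
import torch

def finetune(third_model, loader, epochs=10, lr=1e-5):
    opt = torch.optim.AdamW(third_model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, targets in loader:
            loss = detection_criterion(third_model(images), targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return third_model
```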
In the embodiments of the present disclosure, an initial third model is determined based on the trained first model, and the model parameters of the third model are updated based on at least one second image sample to obtain the trained third model. In this way, the model parameters of the trained first model can be transferred to other target detection models for application in a variety of target detection scenarios, which improves the training efficiency of the third model and the detection accuracy of the trained third model.
An embodiment of the present disclosure provides an image processing method, which can be executed by a processor of a computer device. FIG. 6 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure. As shown in FIG. 6, the method includes the following steps S601 to S602:
Step S601: acquire an image to be processed.
Step S602: perform target detection on the image to be processed using a trained fourth model to obtain a third detection result; the fourth model includes at least one of the following: a first model obtained by the model training method described in the above embodiments, and a third model obtained by the model training method described in the above embodiments.
Here, the image to be processed may be any suitable image on which target detection is to be performed. In implementation, those skilled in the art may select a suitable image to be processed according to the actual application scenario, which is not limited in the embodiments of the present disclosure.
In the embodiments of the present disclosure, the model training method described in the above embodiments realizes a sequence-level self-supervised training process for the target detection model by keeping consistent the first predicted object sequences and the second predicted object sequences obtained by the first model and the second model respectively processing the first augmented image and the second augmented image of the same image sample, and it can train the entire network structure of the target detection model, thereby effectively improving the performance of the whole target detection model. Therefore, performing target detection on the image to be processed based on at least one of the first model and the third model obtained by the model training method described in the above embodiments can improve the accuracy of target detection.
An embodiment of the present disclosure provides a pre-training method for a self-supervised target detection model based on Transformer sequence consistency, which can be executed by a processor of a computer device. The method can train the entire network structure of the target detection model with unlabeled data and, based on the sequence characteristics of the Transformer, simultaneously realize the object region regression in detection and the self-supervised representation learning of object categories. FIG. 7A is a schematic flowchart of model training based on the pre-training method provided by an embodiment of the present disclosure. As shown in FIG. 7A, the method may include the following steps S701 to S703:
Step S701: acquire at least one candidate object in a first image sample in an unsupervised manner, each candidate object having a candidate object region and a candidate object category.
In implementation, any suitable unsupervised detection algorithm may be used to detect target objects in the first image sample to obtain the at least one candidate object. For example, a selective search algorithm may be used to obtain, without supervision, at least one candidate object with a high recall rate from the first image sample.
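As one concrete option (an illustration, not a requirement of the method), the selective search implementation shipped with opencv-contrib-python can produce such high-recall proposals:

```python
import cv2

def candidate_regions(image_bgr, max_proposals=100):
    # Selective search over the first image sample; returns (x, y, w, h)
    # boxes that can serve as candidate object regions.
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image_bgr)
    ss.switchToSelectiveSearchFast()
    return ss.process()[:max_proposals]
```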
Step S702: pre-train the first model using the pre-training method for a self-supervised target detection model based on Transformer sequence consistency.
In some implementations, the pre-training method for a self-supervised target detection model based on Transformer sequence consistency can be realized with the model training architecture shown in FIG. 7B. As shown in FIG. 7B, the architecture includes a first model 10 and a second model 20 with identical network structures, each containing a convolutional neural network (CNN) 11 or 21, a Transformer encoder 12 or 22, a Transformer decoder 13 or 23, and feed-forward networks (FFN) 14 or 24; the feed-forward networks may include a first feed-forward neural network, a second feed-forward neural network and a third feed-forward neural network. During training, the inputs of the first model 10 and the second model 20 are respectively a first augmented image and a second augmented image obtained by augmenting a first image sample 30, where the perturbation of the first augmented image input to the first model 10 contains more color-level disturbance.
The first model 10 and the second model 20 perform target detection on the first and second augmented images in the same way. Taking the first model 10 as an example: after the convolutional neural network 11 extracts features from the first augmented image, a positional encoding 40 is added to the extracted features, and the Transformer encoder 12 and Transformer decoder 13 process the position-encoded features to produce at least one feature sequence 31 representing a predicted object. The first, second and third feed-forward neural networks then process each feature sequence 31; for each feature sequence 31, the first feed-forward neural network outputs a first predicted object sequence Prj1, the second feed-forward neural network outputs the first object region Bx1 corresponding to that first predicted object sequence, and the third feed-forward neural network outputs the first object category Cls1 corresponding to that first predicted object sequence. Correspondingly, processing the second augmented image with the second model 20 yields feature sequences 32, second predicted object sequences Prj2, second object regions Bx2 and second object categories Cls2. For the outputs of the first model 10 and the second model 20, a bipartite graph matching algorithm can be used to match the at least one first predicted object sequence Prj1 with the at least one second predicted object sequence Prj2 to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship (for example, the first predicted object sequence corresponding to first object region Bx1-1 and the second predicted object sequence corresponding to second object region Bx2-1, the first predicted object sequence corresponding to first object region Bx1-4 and the second predicted object sequence corresponding to second object region Bx2-2, the first predicted object sequence corresponding to first object region Bx1-4 and the second predicted object sequence corresponding to second object region Bx2-3, and the first predicted object sequence corresponding to first object region Bx1-4 and the second predicted object sequence corresponding to second object region Bx2-4). Then, based on the at least one matched pair of first and second predicted object sequences, the similarity loss is calculated with an absolute-value (L1) loss function; from this similarity loss, a target loss value can be determined, based on which the network parameters of the first model 10 and the second model 20 are updated so as to improve the consistency of the Transformer feature sequences of differently augmented views of the same image sample. The network parameters of the first model 10 can be updated by gradient update, while the update of the network parameters of the second model 20 adopts a stop-gradient design and is a momentum update based on the current network parameters of the first model 10.
The bipartite graph matching algorithm here is a set-based matching method. Its inputs are the at least one first predicted object sequence and the at least one second predicted object sequence output by the first model 10 and the second model 20, together with the first object region and first object category confidence corresponding to each first predicted object sequence and the second object region and second object category confidence corresponding to each second predicted object sequence. Compared with order-based one-to-one sequence matching, the bipartite graph matching algorithm can find better sequence matching pairs (i.e., first and second predicted object sequences having a target matching relationship) and bring more useful information to the self-supervised learning of the first model, ultimately improving the efficiency and accuracy of self-supervised learning.
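The momentum update with stop-gradient mentioned above is commonly implemented as an exponential moving average; a minimal sketch follows, in which the momentum value 0.999 is an assumption rather than a value fixed by the disclosure:

```python
import torch

@torch.no_grad()  # stop-gradient: the second model never receives gradients
def momentum_update(model_1, model_2, m=0.999):
    # The second model's parameters track an exponential moving average of
    # the first model's current parameters.
    for p1, p2 in zip(model_1.parameters(), model_2.parameters()):
        p2.data.mul_(m).add_(p1.data, alpha=1.0 - m)
```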
In some implementations, the target loss value considered when updating the network parameters of the first model 10 and the second model 20 may further include the difference between the first object regions corresponding to the at least one first predicted object sequence output by the first model and the candidate object regions of the at least one candidate object, and the difference between the first object category corresponding to each first predicted object sequence and the candidate object category of each candidate object. In implementation, a bipartite graph matching algorithm may be used to match the first object region and first object category corresponding to each first predicted object sequence with the candidate object region and candidate object category of each candidate object; then a generalized intersection-over-union function is used to determine the first sub-loss value between the first object region and the candidate object region corresponding to each pair of a first predicted object sequence and a candidate object having a target matching relationship, and a focal loss function is used to determine the second sub-loss value between the first object category and the candidate object category corresponding to each such pair. Based on each first sub-loss value, each second sub-loss value and the similarity loss between each pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship, the target loss value can be determined.
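Combining the pieces above, the target loss value might be assembled as in the following sketch; the weights stand in for λ_f, λ_b and λ_e and are assumptions made for illustration:

```python
import torch.nn.functional as F

def target_loss_value(prj1, prj2, focal_loss, giou_loss,
                      lambda_f=2.0, lambda_b=5.0, lambda_e=1.0):
    # prj1, prj2: matched first / second predicted object sequences, (N, D).
    # focal_loss, giou_loss: scalar sub-losses against the candidate objects.
    sim_loss = F.l1_loss(prj1, prj2)  # absolute-value similarity loss
    return lambda_f * focal_loss + lambda_b * giou_loss + lambda_e * sim_loss
```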
Step S703: migrate the pre-trained first model to a target detection task.
Here, according to the target detection tasks in different target detection scenarios (for example, at least one application scenario among industrial quality inspection, industrial patrol inspection, medical scene detection, autonomous driving, and the like), the first feed-forward neural network may be removed from the trained first model, the number of output categories of the third feed-forward neural network may be adjusted according to the actual target detection task, and the adjusted first model may be determined as the initial third model; the model parameters of the third model are then fine-tuned to obtain a third model usable for the target detection task.
FIG. 8 is a schematic structural diagram of a model training apparatus provided by an embodiment of the present disclosure. As shown in FIG. 8, the model training apparatus 800 includes a first acquisition part 810, a first detection part 820, a first matching part 830 and a first updating part 840, wherein: the first acquisition part 810 is configured to acquire a first augmented image and a second augmented image obtained by respectively performing augmentation processing on a first image sample; the first detection part 820 is configured to perform target detection on the first augmented image using the first model to be trained to obtain at least one first detection result including a first predicted object sequence, and to perform target detection on the second augmented image using the second model to obtain at least one second detection result including a second predicted object sequence; the first matching part 830 is configured to match each first predicted object sequence with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship; and the first updating part 840 is configured to update the model parameters of the first model at least once based on each pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship, to obtain the trained first model.
In some embodiments, the first updating part is further configured to: determine a target loss value based on the similarity between each pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship; update the model parameters of the first model when the target loss value does not satisfy a preset condition, to obtain an updated first model; and determine the trained first model based on the updated first model.
In some embodiments, the first updating part is further configured to: update the model parameters of the first model and the model parameters of the second model respectively when the target loss value does not satisfy the preset condition, to obtain an updated first model and an updated second model; and determine the trained first model based on the updated first model and the updated second model.
In some embodiments, the first updating part is further configured to: perform a momentum update on the model parameters of the second model based on the current model parameters of the first model to obtain the updated second model; and update the current model parameters of the first model by gradient update to obtain the updated first model.
In some embodiments, the first updating part is further configured to: determine the first augmented image and the second augmented image obtained by respectively augmenting the next first image sample as the current first augmented image and the current second augmented image; perform target detection on the current first augmented image using the currently updated first model to obtain at least one first detection result including a first predicted object sequence, and perform target detection on the current second augmented image using the currently updated second model to obtain at least one second detection result including a second predicted object sequence; match each first predicted object sequence with each second predicted object sequence to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship; determine the current target loss value based on the similarity between each such matched pair; and determine the currently updated first model as the trained first model when the current target loss value satisfies the preset condition or the number of updates to the model parameters of the first model reaches a count threshold.
In some embodiments, the first updating part is further configured to: perform a next update on the model parameters of the first model and the model parameters of the second model respectively when the current target loss value does not satisfy the preset condition, to obtain the next-updated first model and the next-updated second model; and determine the trained first model based on the next-updated first model and the next-updated second model.
In some embodiments, the first detection result further includes a first object region and a first object category corresponding to the first predicted object sequence in the first detection result; the apparatus further includes: a second acquisition part, configured to acquire at least one candidate object in the first image sample, each candidate object having a candidate object region and a candidate object category; and a second matching part, configured to match each first predicted object sequence with each candidate object based on the first object region and first object category corresponding to each first predicted object sequence and the candidate object region and candidate object category of each candidate object, to obtain at least one pair of a first predicted object sequence and a candidate object having a target matching relationship. The first updating part is further configured to: determine a first loss value based on the similarity between each pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship; determine a second loss value based on each pair of a first predicted object sequence and a candidate object having a target matching relationship; and determine the target loss value based on the first loss value and the second loss value.
In some embodiments, the first updating part is further configured to: for each pair of a first predicted object sequence and a candidate object having a target matching relationship, determine a first sub-loss value based on the first object region corresponding to the first predicted object sequence and the candidate object region of the candidate object, and determine a second sub-loss value based on the first object category corresponding to the first predicted object sequence and the candidate object category of the candidate object; and determine the second loss value based on each first sub-loss value and each second sub-loss value.
In some embodiments, the second acquisition part is further configured to: perform target detection on the first image sample in an unsupervised manner to obtain at least one predicted object region and a pseudo-label of each predicted object region, each pseudo-label representing the predicted object category of the corresponding predicted object region; and, for each predicted object region, take the predicted object region as a candidate object region and its pseudo-label as a candidate object category to obtain a candidate object.
In some embodiments, the first detection result further includes a first object region and a first object category corresponding to the first predicted object sequence in the first detection result, and the second detection result further includes a second object region and a second object category corresponding to the second predicted object sequence in the second detection result; the first matching part is further configured to: perform bipartite graph matching between each first predicted object sequence and each second predicted object sequence based on the first object region and first object category corresponding to each first predicted object sequence and the second object region and second object category corresponding to each second predicted object sequence, to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship.
In some embodiments, the first matching part is further configured to: determine at least one candidate sequence pair set based on each first predicted object sequence and each second predicted object sequence, each candidate sequence pair set including at least one pair of a first predicted object sequence and a second predicted object sequence having a candidate matching relationship; for each candidate sequence pair set, determine the matching loss of the set based on, for each pair having a candidate matching relationship in the set, the first object region and first object category corresponding to the first predicted object sequence and the second object region and second object category corresponding to the second predicted object sequence; and determine each pair having a candidate matching relationship in the candidate sequence pair set with the smallest matching loss among the at least one candidate sequence pair set as at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship.
In some embodiments, the first model includes a feature extraction network and a transformer network; the first detection part is further configured to: perform feature extraction on the first augmented image using the feature extraction network of the first model to obtain image feature information; and perform prediction processing on the image feature information using the transformer network of the first model to obtain at least one first predicted object sequence.
In some embodiments, the first model further includes a first feed-forward neural network; the first detection part is further configured to: perform prediction processing on the image feature information using the transformer network of the first model to obtain at least one feature sequence; and map each feature sequence to a target dimension using the first feed-forward neural network to obtain at least one first predicted object sequence.
In some embodiments, the first detection result further includes a first object region and a first object category, and the first model further includes a second feed-forward neural network and a third feed-forward neural network; the first detection part is further configured to: for each feature sequence, perform region prediction on the feature sequence using the second feed-forward neural network to obtain the first object region, and perform category prediction on the feature sequence using the third feed-forward neural network to obtain the first object category.
In some embodiments, the second model has the same network structure as the first model.
In some embodiments, the first acquisition part is further configured to: perform first image augmentation processing on the first image sample to obtain the first augmented image; and perform second image augmentation processing on the first image sample to obtain the second augmented image.
In some embodiments, the first image augmentation processing includes at least one of: color jitter, grayscale processing, Gaussian blur, and random erasing; the second image augmentation processing includes at least one of: random scaling, random cropping, random flipping, and random resizing.
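For illustration, the two augmentation pipelines might be built with torchvision as below; the specific magnitudes, probabilities and the 800-pixel crop size are assumptions, not values specified by the disclosure:

```python
from torchvision import transforms

first_aug = transforms.Compose([   # color-level perturbations
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4, hue=0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.25),
])
second_aug = transforms.Compose([  # geometry-level perturbations
    transforms.RandomResizedCrop(800, scale=(0.5, 1.0)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])
```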
In some embodiments, the apparatus further includes: a determining part, configured to determine an initial third model based on the trained first model; and a second updating part, configured to update the model parameters of the third model based on at least one second image sample to obtain the trained third model.
FIG. 9 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure. As shown in FIG. 9, the image processing apparatus 900 includes a third acquisition part 910 and a second detection part 920, wherein: the third acquisition part 910 is configured to acquire an image to be processed; and the second detection part 920 is configured to perform target detection on the image to be processed using a trained fourth model to obtain a third detection result, the fourth model including at least one of: a first model obtained by the model training method described in the above embodiments, and a third model obtained by the model training method described in the above embodiments.
The description of the above apparatus embodiments is similar to that of the above method embodiments, and has beneficial effects similar to those of the method embodiments. For technical details not disclosed in the apparatus embodiments of the present disclosure, please refer to the description of the method embodiments of the present disclosure.
In the embodiments of the present disclosure and other embodiments, a "part" may be part of a circuit, part of a processor, or part of a program or software, and the like; it may of course also be a unit, and may be modular or non-modular.
It should be noted that, in the embodiments of the present disclosure, if the above model training method or image processing method is implemented in the form of software functional parts and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present disclosure, in essence or in the part contributing to the related art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. Thus, the embodiments of the present disclosure are not limited to any specific combination of hardware and software.
An embodiment of the present disclosure provides a computer device, including a memory and a processor. The memory stores a computer program executable on the processor, and the processor implements the steps of the above method when executing the program.
An embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above method are implemented. The computer-readable storage medium may be transitory or non-transitory.
An embodiment of the present disclosure provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, part or all of the steps of the above method are implemented. The computer program product may be implemented by hardware, software, or a combination thereof. In some embodiments, the computer program product is embodied as a computer-readable storage medium; in other embodiments, it is embodied as a software product, such as a software development kit (SDK).
It should be pointed out here that the above descriptions of the storage medium, computer program product and device embodiments are similar to the description of the above method embodiments, and have beneficial effects similar to those of the method embodiments. For technical details not disclosed in the storage medium, computer program product and device embodiments of the present disclosure, please refer to the description of the method embodiments of the present disclosure.
It should be noted that FIG. 10 is a schematic diagram of a hardware entity of a computer device in an embodiment of the present disclosure. As shown in FIG. 10, the hardware entity of the computer device 1000 includes a processor 1001, a communication interface 1002, and a memory 1003, wherein: the processor 1001 generally controls the overall operation of the computer device 1000; the communication interface 1002 enables the computer device to communicate with other terminals or servers through a network; and the memory 1003 is configured to store instructions and applications executable by the processor 1001, and may also cache data to be processed or already processed by the processor 1001 and the parts of the computer device 1000 (for example, image data, audio data, voice communication data and video communication data), and may be implemented by flash memory (FLASH) or random access memory (RAM). Data can be transferred among the processor 1001, the communication interface 1002 and the memory 1003 through a bus 1004.
It should be understood that reference throughout the specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic related to the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of "in one embodiment" or "in an embodiment" in various places throughout the specification do not necessarily refer to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present disclosure, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present disclosure. The serial numbers of the above embodiments of the present disclosure are for description only and do not represent the superiority or inferiority of the embodiments. It should be noted that, herein, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or apparatus including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus including that element.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the parts is only a logical functional division, and there may be other divisions in actual implementation; for example, multiple parts or components may be combined, or may be integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices or parts, and may be electrical, mechanical or in other forms.
The parts described above as separate components may or may not be physically separated, and the components shown as parts may or may not be physical parts; they may be located in one place or distributed over multiple network parts. Some or all of the parts may be selected according to actual needs to achieve the purpose of the solutions of the embodiments. In addition, the functional parts in the embodiments of the present disclosure may all be integrated into one processing part, or each part may serve as a single part separately, or two or more parts may be integrated into one part; the integrated part may be implemented in the form of hardware, or in the form of hardware plus software functional parts.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a magnetic disk or an optical disc.
Alternatively, if the above integrated part of the present disclosure is implemented in the form of a software functional part and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present disclosure, in essence or in the part contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a magnetic disk or an optical disc.
The above are only embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present disclosure, which shall all be covered by the protection scope of the present disclosure.

Claims (24)

  1. A model training method, the method comprising:
    acquiring a first augmented image and a second augmented image obtained by separately performing augmentation processing on a first image sample;
    performing target detection on the first augmented image by using a first model to be trained, to obtain at least one first detection result comprising a first predicted object sequence, and performing target detection on the second augmented image by using a second model, to obtain at least one second detection result comprising a second predicted object sequence;
    matching each first predicted object sequence with each second predicted object sequence, to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship;
    updating model parameters of the first model at least once based on each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, to obtain the trained first model.
  2. The method according to claim 1, wherein updating the model parameters of the first model at least once based on each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, to obtain the trained first model, comprises:
    determining a target loss value based on a similarity between each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship;
    in a case where the target loss value does not satisfy a preset condition, updating the model parameters of the first model to obtain an updated first model;
    determining the trained first model based on the updated first model.
  3. The method according to claim 2, wherein, in the case where the target loss value does not satisfy the preset condition, updating the model parameters of the first model to obtain the updated first model comprises:
    in the case where the target loss value does not satisfy the preset condition, separately updating the model parameters of the first model and model parameters of the second model to obtain an updated first model and an updated second model;
    and wherein determining the trained first model based on the updated first model comprises:
    determining the trained first model based on the updated first model and the updated second model.
  4. The method according to claim 3, wherein separately updating the model parameters of the first model and the model parameters of the second model to obtain the updated first model and the updated second model comprises:
    performing a momentum update on the model parameters of the second model based on current model parameters of the first model, to obtain the updated second model;
    updating the current model parameters of the first model by means of a gradient update, to obtain the updated first model.
  5. The method according to claim 3 or 4, wherein determining the trained first model based on the updated first model and the updated second model comprises:
    determining, as a current first augmented image and a current second augmented image respectively, a first augmented image and a second augmented image obtained by separately performing augmentation processing on a next first image sample;
    performing target detection on the current first augmented image by using the currently updated first model, to obtain at least one first detection result comprising a first predicted object sequence, and performing target detection on the current second augmented image by using the currently updated second model, to obtain at least one second detection result comprising a second predicted object sequence;
    matching each first predicted object sequence with each second predicted object sequence, to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship;
    determining a current target loss value based on the similarity between each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship;
    in a case where the current target loss value satisfies the preset condition or the number of times the model parameters of the first model have been updated reaches a threshold, determining the currently updated first model as the trained first model.
  6. The method according to claim 5, wherein determining the trained first model based on the updated first model and the updated second model further comprises:
    in a case where the current target loss value does not satisfy the preset condition, performing a next update on the model parameters of the first model and the model parameters of the second model respectively, to obtain a next-updated first model and a next-updated second model;
    determining the trained first model based on the next-updated first model and the next-updated second model.
  7. The method according to any one of claims 2 to 6, wherein the first detection result further comprises a first object region and a first object category corresponding to the first predicted object sequence in the first detection result, and the method further comprises:
    acquiring at least one candidate object in the first image sample, each candidate object having a candidate object region and a candidate object category;
    matching each first predicted object sequence with each candidate object based on the first object region and the first object category corresponding to each first predicted object sequence and on the candidate object region and the candidate object category of each candidate object, to obtain at least one pair of a first predicted object sequence and a candidate object having a target matching relationship;
    wherein determining the target loss value based on the similarity between each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship comprises:
    determining a first loss value based on the similarity between each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship;
    determining a second loss value based on each pair of the first predicted object sequence and the candidate object having the target matching relationship;
    determining the target loss value based on the first loss value and the second loss value.
  8. The method according to claim 7, wherein determining the second loss value based on each pair of the first predicted object sequence and the candidate object having the target matching relationship comprises:
    for each pair of the first predicted object sequence and the candidate object having the target matching relationship, determining a first sub-loss value based on the first object region corresponding to the first predicted object sequence and the candidate object region of the candidate object, and determining a second sub-loss value based on the first object category corresponding to the first predicted object sequence and the candidate object category of the candidate object;
    determining the second loss value based on each first sub-loss value and each second sub-loss value.
  9. The method according to claim 7 or 8, wherein acquiring the at least one candidate object in the first image sample, each candidate object having a candidate object region and a candidate object category, comprises:
    performing target detection on the first image sample in an unsupervised manner, to obtain at least one predicted object region and a pseudo-label of each predicted object region, the pseudo-label of each predicted object region representing a predicted object category of the predicted object region;
    for each predicted object region, taking the predicted object region as a candidate object region and taking the pseudo-label of the predicted object region as a candidate object category, to obtain one candidate object.
  10. The method according to any one of claims 1 to 9, wherein the first detection result further comprises a first object region and a first object category corresponding to the first predicted object sequence in the first detection result, and the second detection result further comprises a second object region and a second object category corresponding to the second predicted object sequence in the second detection result;
    wherein matching each first predicted object sequence with each second predicted object sequence, to obtain the at least one pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, comprises:
    performing bipartite graph matching between each first predicted object sequence and each second predicted object sequence based on the first object region and the first object category corresponding to each first predicted object sequence and on the second object region and the second object category corresponding to each second predicted object sequence, to obtain the at least one pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship.
  11. The method according to claim 10, wherein performing the bipartite graph matching between each first predicted object sequence and each second predicted object sequence based on the first object region and the first object category corresponding to each first predicted object sequence and on the second object region and the second object category corresponding to each second predicted object sequence, to obtain the at least one pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, comprises:
    determining at least one candidate sequence pair set based on each first predicted object sequence and each second predicted object sequence, each candidate sequence pair set comprising at least one pair of a first predicted object sequence and a second predicted object sequence having a candidate matching relationship;
    for each candidate sequence pair set, determining a matching loss of the candidate sequence pair set based on, for each pair having the candidate matching relationship in the set, the first object region and the first object category corresponding to the first predicted object sequence and the second object region and the second object category corresponding to the second predicted object sequence;
    determining each pair of the first predicted object sequence and the second predicted object sequence having the candidate matching relationship in the candidate sequence pair set with the smallest matching loss among the at least one candidate sequence pair set as the at least one pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship.
  12. The method according to any one of claims 1 to 11, wherein the first model comprises a feature extraction network and a transformer network;
    wherein performing target detection on the first augmented image by using the first model to be trained, to obtain the at least one first detection result comprising the first predicted object sequence, comprises:
    performing feature extraction on the first augmented image by using the feature extraction network of the first model, to obtain image feature information;
    performing prediction processing on the image feature information by using the transformer network of the first model, to obtain at least one first predicted object sequence.
  13. The method according to claim 12, wherein the first model further comprises a first feed-forward neural network;
    wherein performing prediction processing on the image feature information by using the transformer network of the first model, to obtain the at least one first predicted object sequence, comprises:
    performing prediction processing on the image feature information by using the transformer network of the first model, to obtain at least one feature sequence;
    mapping each feature sequence to a target dimension by using the first feed-forward neural network, to obtain the at least one first predicted object sequence.
  14. The method according to claim 13, wherein the first detection result further comprises a first object region and a first object category, and the first model further comprises a second feed-forward neural network and a third feed-forward neural network;
    wherein performing target detection on the first augmented image by using the first model to be trained, to obtain the at least one first detection result comprising the first predicted object sequence, further comprises:
    for each feature sequence, performing region prediction on the feature sequence by using the second feed-forward neural network to obtain a first object region, and performing category prediction on the feature sequence by using the third feed-forward neural network to obtain a first object category.
  15. The method according to any one of claims 12 to 14, wherein the second model has the same network structure as the first model.
  16. The method according to any one of claims 1 to 15, wherein acquiring the first augmented image and the second augmented image obtained by separately performing augmentation processing on the first image sample comprises:
    performing first image augmentation processing on the first image sample, to obtain the first augmented image;
    performing second image augmentation processing on the first image sample, to obtain the second augmented image.
  17. The method according to claim 16, wherein
    the first image augmentation processing comprises at least one of: color jitter, grayscale processing, Gaussian blur, or random erasing; and
    the second image augmentation processing comprises at least one of: random scaling, random cropping, random flipping, or random resizing.
  18. The method according to any one of claims 1 to 17, further comprising:
    determining an initial third model based on the trained first model;
    updating model parameters of the third model based on at least one second image sample, to obtain a trained third model.
  19. An image processing method, comprising:
    acquiring an image to be processed;
    performing target detection on the image to be processed by using a trained fourth model, to obtain a third detection result, wherein the fourth model comprises at least one of: a first model obtained by the model training method according to any one of claims 1 to 17, or a third model obtained by the model training method according to claim 18.
  20. A model training apparatus, comprising:
    a first acquisition part configured to acquire a first augmented image and a second augmented image obtained by separately performing augmentation processing on a first image sample;
    a first detection part configured to perform target detection on the first augmented image by using a first model to be trained, to obtain at least one first detection result comprising a first predicted object sequence, and to perform target detection on the second augmented image by using a second model, to obtain at least one second detection result comprising a second predicted object sequence;
    a first matching part configured to match each first predicted object sequence with each second predicted object sequence, to obtain at least one pair of a first predicted object sequence and a second predicted object sequence having a target matching relationship;
    a first update part configured to update model parameters of the first model at least once based on each pair of the first predicted object sequence and the second predicted object sequence having the target matching relationship, to obtain the trained first model.
  21. An image processing apparatus, comprising:
    a third acquisition part configured to acquire an image to be processed;
    a second detection part configured to perform target detection on the image to be processed by using a trained fourth model, to obtain a third detection result, wherein the fourth model comprises at least one of: a first model obtained by the model training method according to any one of claims 1 to 17, or a third model obtained by the model training method according to claim 18.
  22. A computer device, comprising a memory and a processor, wherein the memory stores a computer program executable on the processor, and the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 19.
  23. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 19.
  24. A computer program product, comprising a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when read and executed by a computer, implements the steps of the method according to any one of claims 1 to 19.
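
Editorial note: the sketches below illustrate the techniques recited in the claims; none of them is the patent's reference implementation. The first sketch shows the training loop of claims 1 to 6 in PyTorch: a student (the "first model") and a momentum teacher (the "second model", same structure per claim 15) each detect objects in a different augmented view of one sample, a similarity loss over matched predicted object sequences drives a gradient update of the student, and the teacher receives the momentum update of claim 4. The helper names (`augment_1`, `sequence_matching_loss`, `loader`) and hyperparameters (momentum 0.999, AdamW, the stopping thresholds) are illustrative assumptions.

```python
# A minimal sketch of the self-supervised training loop, assuming the helpers
# named above are supplied by the caller.
import copy
import torch

def momentum_update(student: torch.nn.Module, teacher: torch.nn.Module,
                    momentum: float = 0.999) -> None:
    """Momentum update of claim 4: teacher <- m * teacher + (1 - m) * student."""
    with torch.no_grad():
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(momentum).add_(p_s.detach(), alpha=1.0 - momentum)

def train(student, loader, augment_1, augment_2, sequence_matching_loss,
          max_updates=10_000, loss_threshold=1e-3):
    teacher = copy.deepcopy(student)        # same network structure, claim 15
    for p in teacher.parameters():
        p.requires_grad_(False)             # the teacher is never back-propagated
    optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
    num_updates = 0
    for image in loader:
        view_1, view_2 = augment_1(image), augment_2(image)   # claim 16
        seqs_1 = student(view_1)            # first predicted object sequences
        with torch.no_grad():
            seqs_2 = teacher(view_2)        # second predicted object sequences
        loss = sequence_matching_loss(seqs_1, seqs_2)  # matched-pair similarity, claim 2
        if loss.item() < loss_threshold or num_updates >= max_updates:
            break                           # stopping criterion of claim 5
        optimizer.zero_grad()
        loss.backward()                     # gradient update of the student
        optimizer.step()
        momentum_update(student, teacher)   # momentum update of the teacher
        num_updates += 1
    return student
```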
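The next sketch corresponds to the candidate-object generation of claim 9: class-agnostic region proposals from an unsupervised detector, each paired with a pseudo-label standing in for a predicted object category. Selective search (via opencv-contrib-python) and k-means over a cheap colour descriptor are assumed choices for illustration only; the claim does not name a particular unsupervised method or feature.

```python
# A sketch of unsupervised candidate-object generation, under the assumptions
# stated above (requires opencv-contrib-python and scikit-learn).
import cv2
import numpy as np
from sklearn.cluster import KMeans

def candidate_objects(image_bgr: np.ndarray, max_regions: int = 50,
                      num_pseudo_classes: int = 8):
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image_bgr)
    ss.switchToSelectiveSearchFast()
    boxes = ss.process()[:max_regions]       # (x, y, w, h) region proposals

    # Cheap per-region descriptor: mean colour of the crop, an illustrative
    # stand-in for learned features.
    feats = np.stack([
        image_bgr[y:y + h, x:x + w].reshape(-1, 3).mean(0)
        for (x, y, w, h) in boxes
    ])
    pseudo_labels = KMeans(n_clusters=num_pseudo_classes, n_init=10).fit_predict(feats)
    # Each (region, pseudo-label) pair is one candidate object in the sense of claim 9.
    return list(zip([tuple(b) for b in boxes], pseudo_labels.tolist()))
```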
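The following sketch illustrates the bipartite graph matching of claims 10 and 11. Minimising the summed per-pair cost over all one-to-one assignments (done here with the Hungarian algorithm) is equivalent to selecting the candidate sequence-pair set with the smallest matching loss. Box format (cx, cy, w, h), the L1 region term, and the cost weights are illustrative assumptions.

```python
# A sketch of sequence matching via the Hungarian algorithm, assuming the
# inputs described in the docstring.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_sequences(boxes_1, probs_1, boxes_2, labels_2,
                    w_box: float = 5.0, w_cls: float = 1.0):
    """Return index pairs (i, j) with a target matching relationship.

    boxes_1: (N, 4) object regions of the first predicted object sequences
    probs_1: (N, C) class probabilities of the first sequences
    boxes_2: (M, 4) object regions of the second predicted object sequences
    labels_2: (M,) integer class indices of the second sequences
    """
    # Region term: L1 distance between every first/second box pair.
    cost_box = np.abs(boxes_1[:, None, :] - boxes_2[None, :, :]).sum(-1)  # (N, M)
    # Category term: negative probability the first sequence assigns to the
    # second sequence's class.
    cost_cls = -probs_1[:, labels_2]                                      # (N, M)
    cost = w_box * cost_box + w_cls * cost_cls
    rows, cols = linear_sum_assignment(cost)  # globally cheapest one-to-one assignment
    return list(zip(rows.tolist(), cols.tolist()))
```

A fuller DETR-style cost would typically add a generalized-IoU term to the region component; the L1-plus-class form here is a simplification.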
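The next sketch mirrors the detector structure of claims 12 to 14: a feature-extraction backbone, a transformer network that turns image feature information into feature sequences, and three feed-forward heads (sequence embedding at a target dimension, region prediction, category prediction). The sizes (d_model 256, 100 queries, a ResNet-50 backbone) are assumptions in the spirit of DETR-style detectors, not values taken from the patent.

```python
# A sketch of a sequence-producing detector, under the size assumptions above.
import torch
import torch.nn as nn
import torchvision

class SequenceDetector(nn.Module):
    def __init__(self, num_classes: int, d_model: int = 256,
                 num_queries: int = 100, seq_dim: int = 128):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # feature extraction network
        self.input_proj = nn.Conv2d(2048, d_model, kernel_size=1)
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
        self.query_embed = nn.Embedding(num_queries, d_model)
        self.seq_head = nn.Linear(d_model, seq_dim)           # first FFN: map to target dimension
        self.box_head = nn.Linear(d_model, 4)                 # second FFN: region prediction
        self.cls_head = nn.Linear(d_model, num_classes + 1)   # third FFN: category prediction

    def forward(self, images: torch.Tensor):
        feats = self.input_proj(self.backbone(images))        # (B, d_model, H, W)
        b, c, h, w = feats.shape
        memory = feats.flatten(2).transpose(1, 2)             # (B, H*W, d_model)
        queries = self.query_embed.weight.unsqueeze(0).expand(b, -1, -1)
        feature_seqs = self.transformer(memory, queries)      # (B, num_queries, d_model)
        return (self.seq_head(feature_seqs),                  # predicted object sequences
                self.box_head(feature_seqs).sigmoid(),        # object regions in [0, 1]
                self.cls_head(feature_seqs))                  # object category logits

# Usage: detector = SequenceDetector(num_classes=80)
#        seqs, boxes, logits = detector(torch.randn(2, 3, 224, 224))
```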
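Finally, a sketch of the two augmentation pipelines of claims 16 and 17, built with torchvision. The claims only name the operation families; the exact magnitudes and probabilities below are illustrative assumptions.

```python
# Two augmentation pipelines producing the first and second augmented views.
import torchvision.transforms as T

# First image augmentation: color jitter, grayscale, Gaussian blur, random erasing.
augment_1 = T.Compose([
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),
    T.ToTensor(),
    T.RandomErasing(p=0.5),   # operates on tensors, hence after ToTensor
])

# Second image augmentation: RandomResizedCrop covers random scaling, cropping,
# and resizing in one step; flipping is applied separately.
augment_2 = T.Compose([
    T.RandomResizedCrop(size=224, scale=(0.5, 1.0)),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
])
```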
PCT/CN2022/095298 2021-12-31 2022-05-26 Model training method and apparatus, image processing method and apparatus, and device, storage medium and computer program product WO2023123847A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111667489.4A CN114359592A (en) 2021-12-31 2021-12-31 Model training and image processing method, device, equipment and storage medium
CN202111667489.4 2021-12-31

Publications (1)

Publication Number Publication Date
WO2023123847A1 (en)

Family

ID=81104446

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/095298 WO2023123847A1 (en) 2021-12-31 2022-05-26 Model training method and apparatus, image processing method and apparatus, and device, storage medium and computer program product

Country Status (2)

Country Link
CN (1) CN114359592A (en)
WO (1) WO2023123847A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359592A (en) * 2021-12-31 2022-04-15 上海商汤智能科技有限公司 Model training and image processing method, device, equipment and storage medium
CN117077541B (en) * 2023-10-11 2024-01-09 北京芯联心科技发展有限公司 Efficient fine adjustment method and system for parameters of medical model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6226388B1 (en) * 1999-01-05 2001-05-01 Sharp Labs Of America, Inc. Method and apparatus for object tracking for automatic controls in video devices
CN105224623A (en) * 2015-09-22 2016-01-06 北京百度网讯科技有限公司 The training method of data model and device
CN113570398A (en) * 2021-02-02 2021-10-29 腾讯科技(深圳)有限公司 Promotion data processing method, model training method, system and storage medium
CN114359592A (en) * 2021-12-31 2022-04-15 上海商汤智能科技有限公司 Model training and image processing method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN114359592A (en) 2022-04-15

Similar Documents

Publication Publication Date Title
CN109389091B (en) Character recognition system and method based on combination of neural network and attention mechanism
US20230196117A1 (en) Training method for semi-supervised learning model, image processing method, and device
CN109840531B (en) Method and device for training multi-label classification model
Kaymak et al. A brief survey and an application of semantic image segmentation for autonomous driving
CN111523621B (en) Image recognition method and device, computer equipment and storage medium
WO2023123847A1 (en) Model training method and apparatus, image processing method and apparatus, and device, storage medium and computer program product
KR101865102B1 (en) Systems and methods for visual question answering
WO2019228358A1 (en) Deep neural network training method and apparatus
WO2019100724A1 (en) Method and device for training multi-label classification model
CN112528780B (en) Video motion segmentation by hybrid temporal adaptation
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
CN109522945B (en) Group emotion recognition method and device, intelligent device and storage medium
Zhao et al. Looking wider for better adaptive representation in few-shot learning
CN112507990A (en) Video time-space feature learning and extracting method, device, equipment and storage medium
CN114495129B (en) Character detection model pre-training method and device
CN114266897A (en) Method and device for predicting pox types, electronic equipment and storage medium
Liu et al. Learning explicit shape and motion evolution maps for skeleton-based human action recognition
CN114462290A (en) Method and device for generating pre-training artificial intelligence model
CN114091594A (en) Model training method and device, equipment and storage medium
CN116503876A (en) Training method and device of image recognition model, and image recognition method and device
Sun et al. Two-stage deep regression enhanced depth estimation from a single RGB image
CN113435531B (en) Zero sample image classification method and system, electronic equipment and storage medium
CN116503670A (en) Image classification and model training method, device and equipment and storage medium
CN116704433A (en) Self-supervision group behavior recognition method based on context-aware relationship predictive coding
WO2023115891A1 (en) Spiking encoding method and system, and electronic device and storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22913139

Country of ref document: EP

Kind code of ref document: A1