CN113869361A - Model training method, target detection method and related device - Google Patents


Info

Publication number
CN113869361A
Authority
CN
China
Prior art keywords
input image
training
image
information corresponding
feature
Prior art date
Legal status
Pending
Application number
CN202110963178.6A
Other languages
Chinese (zh)
Inventor
陈海波
罗志鹏
Current Assignee
Shenyan Technology Beijing Co ltd
Original Assignee
Shenyan Technology Beijing Co ltd
Priority date
Filing date
Publication date
Application filed by Shenyan Technology Beijing Co ltd
Priority to CN202110963178.6A
Publication of CN113869361A
Legal status: Pending

Classifications

    • G06F 18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/2415 Pattern recognition: classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/045 Neural networks: combinations of networks
    • G06N 3/047 Neural networks: probabilistic or stochastic networks
    • G06N 3/08 Neural networks: learning methods


Abstract

The application provides a model training method, a target detection method and a related device. The model training method is used for training a preset deep neural network that comprises a prediction module; the prediction module uses Cascade RCNN and uses CBNet as the feature extraction network of the Cascade RCNN. The model training method comprises the following steps: acquiring a training data set, wherein each training datum in the training data set comprises a training image and label detection information corresponding to the training image, the label detection information comprising label classification information and label bounding box information corresponding to the training image; and training the preset deep neural network with the training data set to obtain a target detection model. The target detection model obtained by this model training method is more stable, has higher accuracy and is applicable to a wider range of scenarios.

Description

Model training method, target detection method and related device
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a model training method, a target detection method, and a related apparatus.
Background
Target detection is currently a very active research direction in the field of computer vision and an important link in fields such as unmanned driving.
The prior art CN110942000A discloses a method for detecting unmanned vehicle targets based on deep learning. The method samples a target object by generating a three-dimensional template of the target object and generates candidate frames for an input image by combining the generated three-dimensional template with an object sampling strategy; extracts features from the generated candidate frames to construct an objective function; based on the obtained objective function, trains its weights and preliminarily detects the target object with a structured support vector machine classifier; improves the region candidate network to construct an efficient HRPN network; and trains a Fast RCNN detection model based on the constructed HRPN network, where the preliminary detection results obtained by the structured support vector machine classifier are input into the network for training, and after training the model parameter information and structure information are saved for target detection.
However, the above target detection method still has a critical problem: its stability is poor and its accuracy is low, which limits its range of application.
Disclosure of Invention
The application aims to provide a model training method, a target detection method and a related device, so that a target detection model obtained through training is more stable, higher in accuracy and wide in application range.
The purpose of the application is realized by adopting the following technical scheme:
in a first aspect, the present application provides a model training method for training a preset deep neural network, where the preset deep neural network includes a prediction module, the prediction module uses Cascade RCNN and uses CBNet as the feature extraction network of the Cascade RCNN, and the model training method includes: acquiring a training data set, wherein each training datum in the training data set comprises a training image and label detection information corresponding to the training image, and the label detection information corresponding to the training image comprises label classification information and label bounding box information corresponding to the training image; and training the preset deep neural network with the training data set to obtain a target detection model.
The beneficial effect of this technical scheme is that CBNet is used as the feature extraction network of the Cascade RCNN; CBNet has a stronger feature extraction capability and higher precision, and can therefore be applied to more scenes. A target detection model trained by this method is more stable, more accurate and more widely applicable when executing target detection tasks.
In some optional embodiments, the preset deep neural network further includes a data augmentation module, and the training the preset deep neural network with the training data set to obtain a target detection model includes: inputting at least one training image into the data augmentation module to obtain an augmented image corresponding to the at least one training image; taking at least one training image and its corresponding label detection information as a source domain, taking the augmented image corresponding to at least one training image as an augmented domain, and training the preset deep neural network with the source domain and the augmented domain so as to reduce the data distribution difference between the augmented domain and the source domain; acquiring label detection information of the augmented image corresponding to at least one training image; acquiring a target domain; taking the augmented image corresponding to at least one training image and the label detection information corresponding to the augmented image as a new augmented domain, and training the preset deep neural network with the new augmented domain and the target domain to reduce the data distribution difference between the augmented domain and the target domain; and taking the trained preset deep neural network as the target detection model.
The training images may be training images obtained under specific weather conditions, with occlusions, under road congestion, or the like. The advantage of this technical scheme is that data augmentation of the training images by the data augmentation module diversifies the training data set as much as possible, so that the trained target detection model has strong generalization capability, is also suitable for training images obtained under specific weather, occlusion or road congestion conditions, and has a wide application range; training the preset deep neural network with the source domain and the augmented domain reduces the data distribution difference between the augmented domain and the source domain; and training the preset deep neural network with the new augmented domain and the target domain reduces the data distribution difference between the augmented domain and the target domain.
In some optional embodiments, the data augmentation module is a generator, the preset deep neural network further includes a feature extraction module, a gradient inversion layer, and a domain discriminator, and the training of the preset deep neural network includes: inputting an input image into the feature extraction module to obtain first feature information and second feature information corresponding to the input image, wherein the input image has corresponding label detection information or does not have corresponding label detection information; inputting first feature information corresponding to the input image into the prediction module to obtain prediction detection information corresponding to the input image, wherein the prediction detection information corresponding to the input image comprises prediction classification information and prediction bounding box information corresponding to the input image; when the input image has corresponding label detection information, training the prediction module based on the label detection information and the prediction detection information corresponding to the input image; inputting the first feature information and second feature information corresponding to the input image into the gradient inversion layer to obtain gradient inversion information corresponding to the input image; inputting the gradient inversion information corresponding to the input image into the domain discriminator to obtain domain discrimination information corresponding to the input image; and training the generator and the domain discriminator in an adversarial learning manner based on the domain discrimination information corresponding to the input image.
The advantage of this technical scheme is that the first feature information and the second feature information corresponding to the input image are obtained by the feature extraction module and sent to the gradient inversion layer and the domain discriminator, the generator and the domain discriminator are trained by adversarial learning, domain-invariant features are learned adversarially, and the adversarial robustness of the preset deep neural network is improved.
In some alternative embodiments, the prediction module comprises a feature extraction network and a double-head structure; the inputting the first feature information corresponding to the input image into the prediction module to obtain the prediction detection information corresponding to the input image includes: inputting the first feature information corresponding to the input image into the feature extraction network to obtain feature extraction information corresponding to the input image; and inputting the feature extraction information corresponding to the input image into the double-head structure to obtain the prediction detection information corresponding to the input image.
The technical scheme has the advantages that the first feature information corresponding to the input image is input into the feature extraction network to obtain the feature extraction information corresponding to the input image, the feature extraction information corresponding to the input image obtained by the feature extraction network is input into the double-head structure, and the prediction detection information corresponding to the input image can be obtained.
In some optional embodiments, the inputting the first feature information corresponding to the input image into the feature extraction network to obtain the feature extraction information corresponding to the input image includes: scaling the longer of the width and the height of the input image to a preset length value, and scaling the shorter of the width and the height of the input image to any value within a preset length range; determining a plurality of input images including the input image; taking the maximum value of the short sides of the plurality of input images as a reference value and padding the short sides of the remaining input images to the reference value; and inputting the plurality of input images into the feature extraction network as a batch to obtain feature extraction information corresponding to the plurality of input images, wherein the feature extraction information corresponding to the plurality of input images comprises the feature extraction information corresponding to the input image.
The advantage of this technical scheme is that spatial-level image enhancement can be performed on a plurality of images in the data set in batch form to remove image noise without damaging the structural information of the original images.
In some optional embodiments, the feature extraction network comprises Stage1, Stage2, Stage3, Stage4, Stage1_1, Stage2_2, Stage3_3, Stage4_4 and a first up-sampling unit to a third up-sampling unit, and the inputting the first feature information corresponding to the input image into the feature extraction network to obtain the feature extraction information corresponding to the input image includes: inputting the first feature information corresponding to the input image into Stage1 to obtain a feature map F1 corresponding to the input image; inputting the feature map F1 into Stage1_1 to obtain a feature map F2; inputting the feature map F1 into Stage2 to obtain a feature map F3; adding the feature map F3 and the feature map F2 and inputting the sum into Stage2_2 to obtain a feature map F4; inputting the feature map F3 into Stage3 to obtain a feature map F5; adding the feature map F5 and the feature map F4 and inputting the sum into Stage3_3 to obtain a feature map F6; inputting the feature map F5 into Stage4 to obtain a feature map F7; adding the feature map F7 and the feature map F6 and inputting the sum into Stage4_4 to obtain a feature map F8, and taking the feature map F8 as a fusion feature M3 corresponding to the input image; inputting the feature map F8 into the third up-sampling unit to obtain an up-sampling result of the feature map F8, and adding this up-sampling result and the feature map F6 to obtain a fusion feature M2 corresponding to the input image; inputting the fusion feature M2 into the second up-sampling unit to obtain an up-sampling result of the fusion feature M2, and adding this up-sampling result and the feature map F4 to obtain a fusion feature M1 corresponding to the input image; inputting the fusion feature M1 into the first up-sampling unit to obtain an up-sampling result of the fusion feature M1, and adding this up-sampling result and the feature map F2 to obtain a fusion feature M0 corresponding to the input image; and taking the fusion features M3, M2, M1 and M0 corresponding to the input image as the feature extraction information corresponding to the input image.
The technical scheme has the beneficial effect that a plurality of fusion features corresponding to the input image are used as feature extraction information corresponding to the input image.
In some optional embodiments, the double-head structure includes a convolution layer, a first-stage network and a second-stage network, the first-stage network includes a bounding box extraction unit, a two-class network and a first-stage regression network, the second-stage network includes a first multi-classification network to a third multi-classification network and a first regression network to a third regression network, and the inputting the feature extraction information corresponding to the input image into the double-head structure to obtain the prediction detection information corresponding to the input image includes: inputting the feature extraction information corresponding to the input image into the convolution layer to obtain a convolution result corresponding to the input image; inputting the convolution result corresponding to the input image into the bounding box extraction unit to obtain first-stage bounding box information corresponding to the input image; acquiring second-stage bounding box information corresponding to the input image by using the first-stage bounding box information corresponding to the input image, the two-class network and the first-stage regression network; acquiring first bounding box information corresponding to the input image by using the second-stage bounding box information corresponding to the input image, the first multi-classification network and the first regression network; acquiring second bounding box information corresponding to the input image by using the first bounding box information corresponding to the input image, the second multi-classification network and the second regression network; and acquiring the prediction detection information corresponding to the input image by using the second bounding box information corresponding to the input image, the third multi-classification network and the third regression network.
The advantage of this technical scheme is that the classification task usually needs more image semantic information while the regression task needs more spatial information; the double-head structure takes these different requirements into account, so its effect is more pronounced.
In a second aspect, the present application provides a target detection method, including: acquiring an image to be detected; inputting the image to be detected into a target detection model to obtain the corresponding prediction detection information of the image to be detected; the target detection model is obtained by training by using any one of the model training methods.
The technical scheme has the advantages that the image to be detected is input into the target detection model, and the prediction detection information corresponding to the image to be detected can be accurately and stably obtained.
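As a minimal illustration of this flow (the checkpoint path, image path and loading convention are placeholders, not part of the application), running a trained target detection model on an image to be detected might look like:

```python
import torch
from torchvision.io import read_image

model = torch.load("target_detection_model.pt")   # hypothetical saved target detection model
model.eval()

image = read_image("image_to_be_detected.jpg").float() / 255.0   # image to be detected
with torch.no_grad():
    prediction = model([image])   # prediction detection information: classes + bounding boxes
print(prediction)
```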
In a third aspect, the present application provides a model training apparatus for training a preset deep neural network, where the preset deep neural network includes a prediction module, the prediction module uses Cascade RCNN and uses CBNet as a feature extraction network of the Cascade RCNN, the model training apparatus includes:
the training data set part is used for acquiring a training data set, each piece of training data in the training data set comprises a training image and label detection information corresponding to the training image, and the label detection information corresponding to the training image comprises label classification information and label boundary box information corresponding to the training image;
and the model training part is used for training the preset deep neural network by using the training data set to obtain a target detection model.
In some optional embodiments, the preset deep neural network further includes a data augmentation module, and the model training part includes:
the augmented image module is used for inputting at least one training image into the data augmented module to obtain an augmented image corresponding to the at least one training image;
the first training module is used for taking at least one training image and its corresponding label detection information as a source domain, taking the augmented image corresponding to at least one training image as an augmented domain, and training the preset deep neural network with the source domain and the augmented domain so as to reduce the data distribution difference between the augmented domain and the source domain;
the annotation information acquisition module is used for acquiring annotation detection information of the augmented image corresponding to at least one training image;
the target domain acquiring module is used for acquiring a target domain;
the second training module is used for taking an augmented image corresponding to at least one training image and label detection information corresponding to the augmented image as a new augmented domain, and training the preset deep neural network by using the new augmented domain and the target domain so as to reduce the data distribution difference between the augmented domain and the target domain;
and the target detection module is used for taking the trained preset deep neural network as the target detection model.
In some optional embodiments, the data augmentation module is a generator, the preset deep neural network further includes a feature extraction module, a gradient inversion layer, and a domain discriminator, and the first training module and the second training module each include:
the characteristic extraction submodule is used for inputting an input image into the characteristic extraction module to obtain first characteristic information and second characteristic information corresponding to the input image, wherein the input image has corresponding label detection information or does not have corresponding label detection information;
the first prediction sub-module is used for inputting first feature information corresponding to the input image into the prediction module to obtain prediction detection information corresponding to the input image, wherein the prediction detection information corresponding to the input image comprises prediction classification information and prediction boundary box information corresponding to the input image;
the first training sub-module is used for training the prediction module based on the label detection information and the prediction detection information corresponding to the input image when the input image has the corresponding label detection information;
the gradient inversion submodule is used for inputting first characteristic information and second characteristic information corresponding to the input image into the gradient inversion layer to obtain gradient inversion information corresponding to the input image;
the domain identification submodule is used for inputting the gradient inversion information corresponding to the input image into the domain identifier to obtain domain identification information corresponding to the input image;
and the adversarial learning sub-module is used for training the generator and the domain discriminator by adversarial learning based on the domain discrimination information corresponding to the input image.
In some alternative embodiments, the prediction module comprises a feature extraction network and a dual-headed structure;
the first prediction sub-module includes:
the feature extraction unit is used for inputting first feature information corresponding to the input image into the feature extraction network to obtain feature extraction information corresponding to the input image;
and the double-head structure unit is used for inputting the feature extraction information corresponding to the input image into the double-head structure to obtain the prediction detection information corresponding to the input image.
In some optional embodiments, the feature extraction unit includes:
the image scaling subunit is used for scaling the long sides in the width and the height of the input image to preset length values and scaling the short sides in the width and the height of the input image to any value in a preset length range;
an image determining subunit configured to determine a plurality of input images including the input image;
an image filling subunit, configured to fill the short edges of the remaining input images to a reference value, where the reference value is a maximum value of the short edges in the plurality of input images;
and the batch input subunit is used for inputting the plurality of input images into the feature extraction network in a batch mode to obtain feature extraction information corresponding to the plurality of input images, wherein the feature extraction information corresponding to the plurality of input images comprises the feature extraction information corresponding to the input images.
In some optional embodiments, the feature extraction network comprises Stage1, Stage2, Stage3, Stage4, Stage1_1, Stage2_2, Stage3_3, Stage4_4 and first to third up-sampling units, and the feature extraction unit includes:
a first feature map subunit, configured to input first feature information corresponding to the input image to Stage1, so as to obtain a feature map F1 corresponding to the input image;
the second feature map subunit is configured to input the feature map F1 corresponding to the input image to Stage1_1, so as to obtain a feature map F2 corresponding to the input image;
a third feature map subunit, configured to input a feature map F1 corresponding to the input image into Stage2, so as to obtain a feature map F3 corresponding to the input image;
a fourth feature map subunit, configured to add the feature map F3 and the feature map F2 corresponding to the input image, and input the result to Stage2_2, so as to obtain a feature map F4 corresponding to the input image;
a fifth feature map subunit, configured to input the feature map F3 corresponding to the input image into Stage3, so as to obtain a feature map F5 corresponding to the input image;
a sixth feature map subunit, configured to add the feature map F5 and the feature map F4 corresponding to the input image, and input the result to Stage3_3 to obtain a feature map F6 corresponding to the input image;
a seventh feature map subunit, configured to input the feature map F5 corresponding to the input image into Stage4, so as to obtain a feature map F7 corresponding to the input image;
an eighth feature map subunit, configured to add the feature map F7 and the feature map F6 corresponding to the input image, and input the result to Stage4_4 to obtain a feature map F8 corresponding to the input image, and use the feature map F8 corresponding to the input image as the fusion feature M3 corresponding to the input image;
a third sampling subunit, configured to input the feature map F8 corresponding to the input image into the third upsampling unit, to obtain an upsampling result of the feature map F8 corresponding to the input image, and add the upsampling result of the feature map F8 corresponding to the input image and the feature map F6 corresponding to the input image, to obtain a fused feature M2 corresponding to the input image;
a second sampling subunit, configured to input the fusion feature M2 corresponding to the input image into a second upsampling unit, to obtain an upsampling result of the fusion feature M2 corresponding to the input image, and add the upsampling result of the fusion feature M2 corresponding to the input image and the feature map F4 corresponding to the input image, to obtain a fusion feature M1 corresponding to the input image;
a first sampling sub-unit, configured to input the fusion feature M1 corresponding to the input image into the first upsampling unit, to obtain an upsampling result of the fusion feature M1 corresponding to the input image, and add the upsampling result of the fusion feature M1 corresponding to the input image and the feature map F2 corresponding to the input image, to obtain a fusion feature M0 corresponding to the input image;
and the feature information subunit is used for taking the fusion feature M3 corresponding to the input image, the fusion feature M2 corresponding to the input image, the fusion feature M1 corresponding to the input image and the fusion feature M0 corresponding to the input image as feature extraction information corresponding to the input image.
In some optional embodiments, the double-head structure comprises a convolution layer, a first-stage network and a second-stage network, the first-stage network comprising a bounding box extraction unit, a two-class network and a first-stage regression network, the second-stage network comprising a first multi-classification network to a third multi-classification network and a first regression network to a third regression network, and the double-head structure unit comprises:
a convolution subunit, configured to input feature extraction information corresponding to the input image into the convolution layer, so as to obtain a convolution result corresponding to the input image;
the first bounding box subunit is used for inputting the convolution result corresponding to the input image into the bounding box extraction unit to obtain first-stage bounding box information corresponding to the input image;
the second bounding box subunit is configured to obtain second-stage bounding box information corresponding to the input image by using the first-stage bounding box information corresponding to the input image, the two-class network, and the first-stage regression network;
a first information subunit, configured to obtain first bounding box information corresponding to the input image by using second-stage bounding box information corresponding to the input image, the first multi-classification network, and the first regression network;
a second information subunit, configured to obtain second bounding box information corresponding to the input image by using the first bounding box information corresponding to the input image, the second multi-classification network, and the second regression network;
and the information prediction subunit is configured to acquire prediction detection information corresponding to the input image by using the second bounding box information corresponding to the input image, the third multi-classification network, and the third regression network.
In a fourth aspect, the present application provides an object detection apparatus comprising:
the image module to be detected is used for acquiring an image to be detected;
the image prediction module is used for inputting the image to be detected into a target detection model to obtain prediction detection information corresponding to the image to be detected;
the target detection model is obtained by training by using any one of the model training methods.
In a fifth aspect, the present application provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of any one of the above model training methods or the above target detection method when executing the computer program.
In a sixth aspect, the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program or an object detection model;
the computer program when executed by a processor implementing the steps of any of the above described model training methods or the steps of the above described target detection methods;
the target detection model is obtained by training by any one of the model training methods.
Drawings
The present application is further described below with reference to the drawings and examples.
FIG. 1 is a schematic flow chart diagram illustrating a model training method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart illustrating a process for obtaining a target detection model according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of training a preset deep neural network according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of adapting to the augmented image through the existing annotation detection information according to an embodiment of the present application;
fig. 5 is a schematic flowchart of obtaining the prediction detection information according to an embodiment of the present application;
fig. 6 is a schematic flowchart of obtaining feature extraction information of a plurality of input images according to an embodiment of the present application;
fig. 7 is a schematic flowchart of obtaining feature extraction information according to an embodiment of the present application;
fig. 8 is a schematic flow chart of upsampling using CARAFE according to an embodiment of the present application;
fig. 9 is a schematic flowchart of obtaining feature extraction information according to an embodiment of the present application;
fig. 10 is a schematic flowchart illustrating a process of obtaining prediction detection information corresponding to an input image according to an embodiment of the present application;
FIG. 11 is a schematic flow chart of a dual head structure provided by an embodiment of the present application;
FIG. 12 is a schematic flow chart of another dual head structure provided by an embodiment of the present application;
FIG. 13 is a schematic flow chart of another dual head structure provided by an embodiment of the present application;
FIG. 14 is a schematic flow chart of another dual head structure provided by an embodiment of the present application;
fig. 15 is a schematic flowchart of a target detection method according to an embodiment of the present application;
FIG. 16 is a schematic flow chart illustrating a further method for detecting an object according to an embodiment of the present disclosure;
FIG. 17 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;
FIG. 18 is a schematic structural diagram of a model training section according to an embodiment of the present disclosure;
FIG. 19 is a schematic structural diagram of an adversarial training module according to an embodiment of the present application;
FIG. 20 is a block diagram of a first prediction sub-module according to an embodiment of the present disclosure;
fig. 21 is a schematic structural diagram of a feature extraction unit provided in an embodiment of the present application;
fig. 22 is a schematic structural diagram of another feature extraction unit provided in an embodiment of the present application;
FIG. 23 is a schematic structural diagram of a dual-head structural unit provided in an embodiment of the present application;
fig. 24 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present application;
fig. 25 is a block diagram of an electronic device according to an embodiment of the present application;
fig. 26 is a schematic structural diagram of a program product for implementing a model training method or an object detection method according to an embodiment of the present application.
Detailed Description
The present application is further described with reference to the accompanying drawings and the detailed description, and it should be noted that, in the present application, the embodiments or technical features described below may be arbitrarily combined to form a new embodiment without conflict.
The terms "first," "second," "third," "fourth," "fifth," "sixth," "seventh," "eighth," "ninth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, an embodiment of the present application provides a model training method, configured to train a preset deep neural network, where the preset deep neural network includes a prediction module, the prediction module uses Cascade RCNN and uses CBNet as a feature extraction network of the Cascade RCNN, and the model training method includes steps S101 to S102.
Step S101: acquiring a training data set, wherein each training data in the training data set comprises a training image and label detection information corresponding to the training image, and the label detection information corresponding to the training image comprises label classification information and label bounding box information corresponding to the training image.
Step S102: and training the preset deep neural network by using the training data set to obtain a target detection model.
The kind of target detection model obtained by training the preset deep neural network is not limited; it may be, for example, a target detection model for a driving scene, a target detection model for an unmanned aerial vehicle, and the like. The preset deep neural network includes a prediction module that uses Cascade RCNN and uses CBNet as the feature extraction network of the Cascade RCNN.
In a specific application scenario, the preset deep neural network is trained and the resulting target detection model is a target detection model for a driving scene. First, a training data set for training the preset deep neural network is acquired; each training datum in the training data set comprises a training driving scene image and label detection information corresponding to it, and the label detection information comprises obstacle classification information, obstacle bounding box information, road marking information and road marking bounding box information. The obstacle classification information marks the type of obstacle in the corresponding training driving scene image, and the obstacle bounding box information marks the bounding box of the obstacle in that image. The road marking information marks the type of road marking in the corresponding training driving scene image, and the road marking bounding box information marks the bounding box of the road marking in that image. The preset deep neural network is then trained with the training data set to obtain a target detection model for the driving scene. Compared with the original feature extraction network of Cascade RCNN, CBNet has a stronger feature extraction capability and higher precision, and can therefore be applied to more scenes.
Therefore, the target detection model obtained by training by the method is used for executing the target detection task, and is more stable, higher in accuracy and wide in application range.
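To make the flow of steps S101 to S102 concrete, the following is a minimal PyTorch-style sketch of the training loop. The constructor build_cascade_rcnn_with_cbnet and the DrivingSceneDataset class are assumed placeholders standing in for the preset deep neural network (Cascade RCNN with a CBNet feature extraction network) and for a data set of training images with label detection information; they are not part of the application.

```python
import torch
from torch.utils.data import DataLoader

# Assumed placeholders: a Cascade RCNN detector whose backbone is CBNet, and a dataset
# yielding (image, {"labels": ..., "boxes": ...}) pairs of training data.
from detector import build_cascade_rcnn_with_cbnet
from data import DrivingSceneDataset

def train_target_detection_model(num_epochs=12, lr=0.02):
    dataset = DrivingSceneDataset("train")                 # step S101: training data set
    loader = DataLoader(dataset, batch_size=2, shuffle=True,
                        collate_fn=lambda batch: tuple(zip(*batch)))
    model = build_cascade_rcnn_with_cbnet(num_classes=dataset.num_classes)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=1e-4)
    model.train()
    for epoch in range(num_epochs):                        # step S102: train the network
        for images, targets in loader:
            losses = model(images, targets)                # classification + bounding box losses
            loss = sum(losses.values())                    # summed over all cascade stages
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model                                           # the target detection model
```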
Referring to fig. 2, in some optional embodiments, the preset deep neural network further includes a data augmentation module, and the step S102 may include steps S201 to S206.
S201: and inputting at least one training image into the data augmentation module to obtain an augmentation image corresponding to at least one training image. The corresponding augmented images are obtained through the training image input data augmentation module, and training images similar to the training images but different from the training images can be generated, so that the scale of the training data set is enlarged.
S202: and taking at least one training image and corresponding label detection information thereof as a source domain, taking at least one augmented image corresponding to the training image as an augmented area, and training the preset deep neural network by using the source domain and the augmented area so as to reduce the data distribution difference between the augmented area and the source domain.
S203: and acquiring label detection information of the augmented image corresponding to at least one training image.
S204: and acquiring the target domain.
S205: and taking at least one augmented image corresponding to the training image and the corresponding label detection information thereof as a new augmented domain, and training the preset deep neural network by using the new augmented domain and the target domain so as to reduce the data distribution difference between the augmented domain and the target domain.
S206: and taking the trained preset deep neural network as the target detection model.
The training images may be training images obtained under specific weather conditions, with occlusions, under road congestion, or the like. Therefore, data augmentation of the training images by the data augmentation module diversifies the training data set as much as possible, so that the trained target detection model has strong generalization capability, is also suitable for training images obtained under specific weather, occlusion or road congestion conditions, and has a wide application range; training the preset deep neural network with the source domain and the augmented domain reduces the data distribution difference between the augmented domain and the source domain; and training the preset deep neural network with the new augmented domain and the target domain reduces the data distribution difference between the augmented domain and the target domain.
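A minimal sketch of this two-stage adaptation schedule is given below. The helpers augment, pseudo_label and adapt are assumptions standing in for the data augmentation module, the acquisition of label detection information for augmented images, and one round of training that reduces the data distribution difference between two domains; only the ordering of steps S201 to S206 is illustrated.

```python
def train_with_domain_adaptation(model, source_images, source_labels, target_images):
    # S201: generate an augmented image for every training image.
    augmented_images = [augment(img) for img in source_images]

    # S202: source domain = training images + label detection information;
    # augmented domain = augmented images. Align the augmented domain with the source domain.
    adapt(model, labeled=(source_images, source_labels), unlabeled=augmented_images)

    # S203: acquire label detection information for the augmented images
    # (hypothetically, e.g. by reusing the source labels or running the current model).
    augmented_labels = [pseudo_label(model, img) for img in augmented_images]

    # S204 + S205: the labelled augmented images form a new augmented domain;
    # align it with the (unlabelled) target domain.
    adapt(model, labeled=(augmented_images, augmented_labels), unlabeled=target_images)

    # S206: the trained network is the target detection model.
    return model
```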
Referring to fig. 3, in some alternative embodiments, the data augmentation module may be a generator, the preset deep neural network may further include a feature extraction module, a gradient inversion layer, and a domain discriminator, and the steps of training the preset deep neural network in steps S202 and S205 may include steps S301 to S306. The gradient inversion layer is a gradient reversal layer, abbreviated as GRL.
Step S301: inputting an input image into the feature extraction module to obtain first feature information and second feature information corresponding to the input image, wherein the input image has corresponding label detection information or does not have corresponding label detection information.
Step S302: and inputting the first characteristic information corresponding to the input image into the prediction module to obtain prediction detection information corresponding to the input image, wherein the prediction detection information corresponding to the input image comprises prediction classification information and prediction bounding box information corresponding to the input image.
Step S303: and when the input image has corresponding label detection information, training the prediction module based on the label detection information and the prediction detection information corresponding to the input image.
Step S304: and inputting the first characteristic information and the second characteristic information corresponding to the input image into the gradient inversion layer to obtain the gradient inversion information corresponding to the input image.
Step S305: and inputting the gradient inversion information corresponding to the input image into the domain discriminator to obtain the domain discrimination information corresponding to the input image.
Step S306: training the generator and the domain discriminator in a counterlearning manner based on domain discrimination information corresponding to the input image.
In a specific application scenario, the generator is a generator G learned through a cycle-consistent generative adversarial network (CycleGAN), and the domain discriminator is the discriminator Dcycle in the CycleGAN. Both the input images with label detection information and the input images without label detection information are passed through the feature extraction module to obtain the first feature information and the second feature information corresponding to each input image.
On one hand, the first feature information corresponding to the input image is input into the prediction module to obtain the prediction detection information corresponding to the input image, and the prediction module is trained through the annotation detection information and the prediction detection information corresponding to the input image. On the other hand, the obtained first feature information and second feature information are input into the gradient inversion layer to obtain the gradient inversion information corresponding to the input image. The gradient inversion information corresponding to the input image is input into the domain discriminator to obtain the domain discrimination information corresponding to the input image; based on the obtained domain discrimination information, the generator and the domain discriminator are trained by adversarial learning, domain-invariant features are learned adversarially, and the adversarial robustness of the preset deep neural network is improved.
Therefore, the first feature information and the second feature information corresponding to the input image are obtained by the feature extraction module and sent to the gradient inversion layer and the domain discriminator, the generator and the domain discriminator are trained by adversarial learning, domain-invariant features are learned adversarially, and the adversarial robustness of the preset deep neural network is improved.
Referring to fig. 4, in a specific application scenario, a source image is first transformed using a generator G learned through CycleGAN to generate a composite image. Thereafter, the labeled source domain is used and a first phase adaptation to the synthesized domain is performed. This is followed by a second stage of adaptation, which takes the labeled synthetic domain and aligns the synthetic domain features with the target distribution. In addition, the weight w is obtained from the discriminator Dcycle in the CycleGAN to balance the quality of the synthesized image in the detection loss, and the purpose of adapting to the augmented image through the existing label detection information is achieved.
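The gradient reversal layer (GRL) used here is small enough to show directly. The sketch below is a generic PyTorch implementation of a GRL together with a toy domain discriminator; the layer sizes, the lambda coefficient and the way the weight w from Dcycle scales the loss are illustrative assumptions rather than the exact configuration of this embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # reversed gradient flows back to the generator

class DomainDiscriminator(nn.Module):
    """Toy discriminator that predicts which domain a feature map comes from."""
    def __init__(self, in_channels, lam=1.0):
        super().__init__()
        self.lam = lam
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, 1))

    def forward(self, features):
        reversed_feats = GradientReversal.apply(features, self.lam)   # gradient reversal layer
        return self.net(reversed_feats)                               # domain discrimination logit

def domain_adversarial_loss(discriminator, features, domain_label, w=1.0):
    # w: per-image weight obtained from the CycleGAN discriminator Dcycle (scalar placeholder).
    logit = discriminator(features)
    target = torch.full_like(logit, float(domain_label))              # 0 = source, 1 = target
    return w * F.binary_cross_entropy_with_logits(logit, target)
```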
Referring to fig. 5, in some alternative embodiments, the prediction module may include a feature extraction network and a dual-headed structure, and the step S302 may include steps S401 to S402.
Step S401: and inputting the first feature information corresponding to the input image into the feature extraction network to obtain the feature extraction information corresponding to the input image.
Step S402: and inputting the feature extraction information corresponding to the input image into the double-head structure to obtain the prediction detection information corresponding to the input image.
Therefore, the first feature information corresponding to the input image is input into the feature extraction network to obtain the feature extraction information corresponding to the input image, and the feature extraction information corresponding to the input image obtained by the feature extraction network is input into the double-head structure to obtain the prediction detection information corresponding to the input image.
Referring to fig. 6, in some alternative embodiments, the step S401 may include steps S501 to S504.
Step S501: and scaling the long sides in the width and the height of the input image to preset length values, and scaling the short sides in the width and the height of the input image to any value in a preset length range.
Step S502: a plurality of input images including the input image is determined.
Step S503: and filling the short sides of the rest input images to the reference value by taking the maximum value of the short sides in the plurality of input images as the reference value.
Step S504: inputting the plurality of input images into the feature extraction network in a batch mode to obtain feature extraction information corresponding to the plurality of input images, wherein the feature extraction information corresponding to the plurality of input images comprises the feature extraction information corresponding to the input images.
In a specific application, taking the maximum value of the short sides of the plurality of input images as the reference value, the short sides of the remaining input images are padded up to the reference value, which can be expressed as:
S_base = Si + padding_i, i.e. padding_i = S_base − Si, where S_base = max(Si)
The plurality of input images are images randomly sampled from the data set. For each sampled image Ii, its own width Ii_w and height Ii_h are compared; the longer side max(Ii_w, Ii_h) is scaled to L and the shorter side min(Ii_w, Ii_h) is scaled to S, where S is chosen randomly between S1 and S2. When the sampled images Ii (i = 1, 2, 3 … n) are fed to the feature extraction network as a batch, the long side of every image in the batch is L and the short sides need to be made uniform in size: taking the maximum value max(Si) of the short sides Si (i = 1, 2, 3 … n) of the images in the whole batch as the reference S_base, every image whose short side is not the maximum is padded until its short side equals the reference S_base. Here L is 2048 and the short-side range S1–S2 is 768–1080.
Therefore, spatial-level image enhancement can be performed on a plurality of images in the data set in batch form, removing image noise without damaging the structural information of the original images.
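Under the assumption of PyTorch-style tensors, the resizing and padding rule of steps S501 to S504 can be written as a small collate routine; the concrete values L = 2048 and S1 to S2 = 768 to 1080 are the ones quoted above, and the helper names are illustrative.

```python
import random
import torch
import torch.nn.functional as F

L, S1, S2 = 2048, 768, 1080          # long-side target and short-side range from the text

def resize_keep_orientation(img):    # img: C x H x W tensor
    h, w = img.shape[-2:]
    s = random.randint(S1, S2)       # random short-side target in [S1, S2]
    new_h, new_w = (L, s) if h >= w else (s, L)   # long side -> L, short side -> s
    return F.interpolate(img[None], size=(new_h, new_w),
                         mode="bilinear", align_corners=False)[0]

def collate_batch(images):
    resized = [resize_keep_orientation(img) for img in images]
    # With a common long side L, padding every image to the per-dimension batch maximum
    # amounts to padding the short sides up to the reference value S_base = max(Si).
    max_h = max(img.shape[-2] for img in resized)
    max_w = max(img.shape[-1] for img in resized)
    padded = [F.pad(img, (0, max_w - img.shape[-1], 0, max_h - img.shape[-2]))
              for img in resized]
    return torch.stack(padded)       # one batch ready for the feature extraction network
```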
Referring to fig. 7-9, in some alternative embodiments, the feature extraction network may include Stage1, Stage2, Stage3, Stage4, Stage1_1, Stage2_2, Stage3_3, Stage4_4 and a first up-sampling unit to a third up-sampling unit, and the step S401 may include steps S601 to S612.
Each of Stage1, Stage2, Stage3, Stage4, Stage1_1, Stage2_2, Stage3_3 and Stage4_4 may have the same structure as the others or a different structure.
S601: inputting the first feature information corresponding to the input image into Stage1 to obtain a feature map F1 corresponding to the input image.
S602: inputting the feature map F1 corresponding to the input image into Stage1_1 to obtain a feature map F2 corresponding to the input image.
S603: inputting the feature map F1 corresponding to the input image into Stage2 to obtain a feature map F3 corresponding to the input image.
S604: adding the feature map F3 and the feature map F2 corresponding to the input image and inputting the sum into Stage2_2 to obtain a feature map F4 corresponding to the input image.
S605: inputting the feature map F3 corresponding to the input image into Stage3 to obtain a feature map F5 corresponding to the input image.
S606: adding the feature map F5 and the feature map F4 corresponding to the input image and inputting the sum into Stage3_3 to obtain a feature map F6 corresponding to the input image.
S607: inputting the feature map F5 corresponding to the input image into Stage4 to obtain a feature map F7 corresponding to the input image.
S608: adding the feature map F7 and the feature map F6 corresponding to the input image and inputting the sum into Stage4_4 to obtain a feature map F8 corresponding to the input image, and taking the feature map F8 as a fusion feature M3 corresponding to the input image.
S609: inputting the feature map F8 corresponding to the input image into the third up-sampling unit to obtain an up-sampling result of the feature map F8, and adding this up-sampling result and the feature map F6 to obtain a fusion feature M2 corresponding to the input image.
S610: inputting the fusion feature M2 corresponding to the input image into the second up-sampling unit to obtain an up-sampling result of the fusion feature M2, and adding this up-sampling result and the feature map F4 to obtain a fusion feature M1 corresponding to the input image.
S611: inputting the fusion feature M1 corresponding to the input image into the first up-sampling unit to obtain an up-sampling result of the fusion feature M1, and adding this up-sampling result and the feature map F2 to obtain a fusion feature M0 corresponding to the input image.
S612: taking the fusion feature M3, the fusion feature M2, the fusion feature M1 and the fusion feature M0 corresponding to the input image as the feature extraction information corresponding to the input image.
In a specific application scenario, any input image Ii (i = 1, 2, 3 … n) of the plurality of images in the data set passes through Stage1 to generate a feature map F1. F1 serves as the input feature of Stage1_1, which lies laterally beside Stage1, and passing F1 through Stage1_1 generates the feature map F2. F1 passes through Stage2 to generate the feature map F3; F3 and F2 are added to obtain the input feature of Stage2_2, which lies laterally beside Stage2, and passing it through Stage2_2 generates the feature map F4. F3 passes through Stage3 to generate the feature map F5; F5 and F4 are added to obtain the input feature of Stage3_3, which lies laterally beside Stage3, and passing it through Stage3_3 generates the feature map F6. F5 passes through Stage4 to generate the feature map F7; F7 and F6 are added to obtain the input feature of Stage4_4, which lies laterally beside Stage4, and passing it through Stage4_4 generates the feature map F8. F2, F4, F6 and F8 produced by the above process are extracted. F8 is up-sampled to form a feature map with the same size and the same number of channels as F6, and the two are added to obtain M2, which fuses the Stage4_4 and Stage3_3 stages; M2 is up-sampled to form a feature map with the same size and the same number of channels as F4, and the two are added to obtain M1, which fuses the Stage3_3 and Stage2_2 stages; M1 is up-sampled to form a feature map with the same size and the same number of channels as F2, and the two are added to obtain M0, which fuses the Stage2_2 and Stage1_1 stages; F8 is directly output as M3.
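The forward pass just described can be sketched as follows. The StageX modules are placeholders for the backbone blocks of the two parallel branches, and upsample stands for the up-sampling units (e.g. the CARAFE operator discussed next); the sketch assumes, as the text does, that each pair of added feature maps has matching size and channel count.

```python
import torch.nn as nn
import torch.nn.functional as F

class FusionBackbone(nn.Module):
    """Minimal sketch of the Stage1..Stage4 / Stage1_1..Stage4_4 composition with
    top-down fusion into M0..M3; the stage modules themselves are assumed given."""
    def __init__(self, stages, parallel_stages, upsample=None):
        super().__init__()
        self.stages = nn.ModuleList(stages)              # Stage1, Stage2, Stage3, Stage4
        self.parallel = nn.ModuleList(parallel_stages)   # Stage1_1, Stage2_2, Stage3_3, Stage4_4
        self.up = upsample or (lambda x: F.interpolate(x, scale_factor=2, mode="nearest"))

    def forward(self, x):
        s1, s2, s3, s4 = self.stages
        p1, p2, p3, p4 = self.parallel
        f1 = s1(x)
        f2 = p1(f1)
        f3 = s2(f1)
        f4 = p2(f3 + f2)
        f5 = s3(f3)
        f6 = p3(f5 + f4)
        f7 = s4(f5)
        f8 = p4(f7 + f6)
        m3 = f8                      # M3 is F8 itself
        m2 = self.up(f8) + f6        # fuse the Stage4_4 and Stage3_3 stages
        m1 = self.up(m2) + f4        # fuse the Stage3_3 and Stage2_2 stages
        m0 = self.up(m1) + f2        # fuse the Stage2_2 and Stage1_1 stages
        return m0, m1, m2, m3        # feature extraction information
```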
In a specific application scenario, as shown in fig. 8, the up-sampling method employs CARAFE. For an input feature map of shape H × W × C, a 1 × 1 convolution is first used to compress the number of channels of the input feature map to H × W × Cm; this channel-compressing convolution reduces the amount of computation in the subsequent steps.
For the compressed input feature map, a kencoder × kencoder convolution layer is used to predict the up-sampling kernel, with the number of input channels set to Cm and the number of output channels set to σ²·kup². The channel dimension is then expanded into the spatial dimension, which yields an up-sampling kernel of shape σH × σW × kup². The resulting up-sampling kernel is normalized with softmax so that the convolution kernel weights sum to 1.
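A minimal sketch of this CARAFE-style kernel prediction and reassembly is given below; the scheme follows the description above, while the numeric defaults for Cm, kencoder, kup and σ are example values chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CarafeUpsample(nn.Module):
    """Minimal CARAFE-style up-sampler sketch (kernel prediction + reassembly)."""

    def __init__(self, channels, c_mid=64, k_encoder=3, k_up=5, sigma=2):
        super().__init__()
        self.k_up, self.sigma = k_up, sigma
        # 1x1 conv compresses channels to Cm to reduce the later computation.
        self.compress = nn.Conv2d(channels, c_mid, kernel_size=1)
        # k_encoder x k_encoder conv predicts sigma^2 * k_up^2 kernel weights per position.
        self.encoder = nn.Conv2d(c_mid, sigma * sigma * k_up * k_up,
                                 kernel_size=k_encoder, padding=k_encoder // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        s, k = self.sigma, self.k_up
        # Predict and normalise the reassembly kernels.
        kernels = self.encoder(self.compress(x))        # B x (s^2 k^2) x H x W
        kernels = F.pixel_shuffle(kernels, s)           # B x k^2 x sH x sW
        kernels = F.softmax(kernels, dim=1)             # weights at each location sum to 1

        # k x k neighbourhood of every source location, replicated over its
        # s x s output block (nearest-neighbour expansion).
        patches = F.unfold(x, kernel_size=k, padding=k // 2)   # B x (C k^2) x (H W)
        patches = patches.view(b, c * k * k, h, w)
        patches = F.interpolate(patches, scale_factor=s, mode="nearest")
        patches = patches.view(b, c, k * k, h * s, w * s)

        out = (patches * kernels.unsqueeze(1)).sum(dim=2)      # B x C x sH x sW
        return out


y = CarafeUpsample(channels=64)(torch.randn(1, 64, 32, 32))    # -> 1 x 64 x 64 x 64
```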
Thus, a plurality of fusion features corresponding to the input image are used as feature extraction information corresponding to the input image.
Referring to fig. 10 to 14, in some alternative embodiments, the dual-headed structure includes a convolution layer, a first-stage network and a second-stage network; the first-stage network includes a bounding box extraction unit, a two-class classification network and a first-stage regression network, and the second-stage network includes first to third multi-classification networks and first to third regression networks. The step S402 may include steps S701 to S706.
S701: and inputting the feature extraction information corresponding to the input image into the convolution layer to obtain a convolution result corresponding to the input image.
S702: and inputting the convolution result corresponding to the input image into the boundary box extraction unit to obtain the first-stage boundary box information corresponding to the input image.
S703: and acquiring second-stage boundary box information corresponding to the input image by using the first-stage boundary box information corresponding to the input image, the two-classification network and the first-stage regression network.
S704: and acquiring first bounding box information corresponding to the input image by using the second-stage bounding box information corresponding to the input image, the first multi-classification network and the first regression network.
S705: and acquiring second bounding box information corresponding to the input image by using the first bounding box information corresponding to the input image, the second multi-classification network and the second regression network.
S706: and acquiring the prediction detection information corresponding to the input image by using the second bounding box information corresponding to the input image, the third multi-classification network and the third regression network.
In a specific application scenario, as shown in fig. 11 to 14, a 3 × 3 convolution is applied to the fusion features M0, M1, M2 and M3 corresponding to the input image, and the results are then fed into the first-stage network and the second-stage network respectively, where the first-stage network is an RPN (Region Proposal Network) and the second-stage network is a Cascade RCNN. In the first stage, the convolution result corresponding to the input image is input into the bounding box extraction unit to obtain the first-stage bounding box information corresponding to the input image: a number of anchors with fixed sizes and fixed aspect ratios are set manually as predicted bounding boxes, and bounding box information (proposals) with higher confidence is then screened out of these anchors by a classification network and a regression network to serve as the bounding boxes of the second stage. The classification network here is a binary classification network that only predicts the probability that a target is present in an anchor, while the regression network predicts the offset, i.e., the deviation between an anchor that may contain a target and the ground-truth bounding box of that target. Similarly, the second-stage network uses the bounding box information (proposals) as predicted bounding boxes and then screens out the final bounding boxes from the proposals through a classification network and a regression network. The number of classes of the multi-classification network depends on the number of classes to be detected in the data set, and the regression network predicts the offsets between all proposals and the ground-truth bounding boxes.
The second stage uses a three-level cascade network for prediction, in which the first, second and third multi-classification networks are FC-heads and the first, second and third regression networks are Conv-heads. The output of the first-level network is the first bounding box information (proposals1), which serves as the input bounding box information of the second-level network; the output of the second-level network is the second bounding box information (proposals2), which serves as the input bounding box information of the third-level network; and the output of the third-level network is the prediction detection information corresponding to the input image. The FC-head is used as the classification network and the Conv-head is used as the regression network.
Since the classification task usually needs more semantic image information while the regression task needs more spatial information, adopting the double-head structure takes these different requirements into account, and the detection effect is improved accordingly.
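As a purely illustrative sketch of one cascade stage with this double-head pairing (RoI feature extraction, anchor generation, proposal sampling and loss computation are omitted, and all layer sizes and the class count are assumptions, not values from the patent):

```python
import torch
import torch.nn as nn


class DoubleHeadStage(nn.Module):
    """One cascade stage pairing an FC classification head with a Conv regression head.

    Input is assumed to be per-RoI features of shape (num_rois, channels, 7, 7).
    """

    def __init__(self, channels=256, num_classes=80):
        super().__init__()
        # FC-head: used for classification (benefits from global, semantic reasoning).
        self.fc_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * 7 * 7, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),
        )
        self.cls_score = nn.Linear(1024, num_classes + 1)    # +1 for background

        # Conv-head: used for regression (preserves spatial information).
        self.conv_head = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.bbox_delta = nn.Linear(channels, 4)              # box regression offsets

    def forward(self, roi_feats):
        cls_logits = self.cls_score(self.fc_head(roi_feats))
        box_deltas = self.bbox_delta(self.conv_head(roi_feats))
        return cls_logits, box_deltas


# Three such stages applied in cascade: the boxes refined by one stage are used
# (after RoI feature re-extraction) as the proposals fed to the next stage.
stages = nn.ModuleList(DoubleHeadStage() for _ in range(3))
```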
Referring to fig. 15, an embodiment of the present application further provides an object detection method, which includes steps S11 to S12.
S11: and acquiring an image to be detected. In some embodiments, the image to be detected may comprise any one of the following: a monitoring image; a traffic image stored in a camera storage device; an image transmitted back by an aircraft.
S12: and inputting the image to be detected into a target detection model to obtain the prediction detection information corresponding to the image to be detected. The target detection model is trained by using any one of the model training methods described above.
Therefore, the image to be detected is input into the target detection model, and the prediction detection information corresponding to the image to be detected can be accurately and stably obtained.
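A minimal usage sketch of this step (the file names and the saved-model format here are hypothetical, not part of the disclosure):

```python
import torch
from torchvision.io import read_image

# Hypothetical file names; `detector` is assumed to be the trained target
# detection model produced by the training method described above.
detector = torch.load("target_detection_model.pt", map_location="cpu")
detector.eval()

image = read_image("image_to_detect.jpg").float() / 255.0    # C x H x W, values in [0, 1]
with torch.no_grad():
    prediction = detector(image.unsqueeze(0))                # predicted boxes / scores / classes
```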
In one embodiment, the flow of the target detection method is shown in FIG. 16. First, data are obtained; the data may be images of driving scenes or images to be detected collected by unmanned aerial vehicles. Data augmentation (image enhancement) is performed on the images to be detected to remove their noise. The images to be detected, which may include labeled images and unlabeled images, are input into an encoder to extract the corresponding first feature information (featL) and second feature information (featU) of the images to be detected. The first feature information (featL) is applied to a double-headed Cascade RCNN (Cascade RCNN with Double-Head) to learn supervised object detection using the detector network and to obtain the prediction detection information corresponding to the image to be detected, while both the first feature information (featL) and the second feature information (featU) are forwarded to a gradient inversion layer (GRL) and a Domain Discriminator to learn domain-invariant features in an adversarial manner.
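A minimal sketch of the gradient inversion layer (GRL) and Domain Discriminator mentioned above is shown below; the layer sizes, the single-logit output and the λ weighting are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Gradient inversion layer (GRL): identity in the forward pass, negated gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class DomainDiscriminator(nn.Module):
    """Small classifier predicting which domain a feature vector comes from."""

    def __init__(self, feat_dim=256, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 1),    # one logit: source/augmented domain vs. target domain
        )

    def forward(self, features):
        reversed_feats = GradientReversal.apply(features, self.lambd)
        return self.classifier(reversed_feats)
```

During training, a binary cross-entropy loss on the discriminator output trains the discriminator to tell the feature domains apart, while the reversed gradient pushes the feature extractor toward features the discriminator cannot separate, i.e., domain-invariant features.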
Referring to fig. 17, an embodiment of the present application further provides a model training apparatus, and a specific implementation manner of the model training apparatus is consistent with the implementation manner and the achieved technical effect described in the embodiment of the model training method, and details of a part of the implementation manner and the achieved technical effect are not repeated.
The model training device is used for training a preset deep neural network, the preset deep neural network comprises a prediction module, the prediction module uses Cascade RCNN and uses CBNet as a feature extraction network of the Cascade RCNN, and the model training device comprises: a training data set part 101, configured to obtain a training data set, where each piece of training data in the training data set includes a training image and label detection information corresponding to the training image, and the label detection information corresponding to the training image includes label classification information and label bounding box information corresponding to the training image; and a model training part 102, configured to train the preset deep neural network by using the training data set, so as to obtain a target detection model.
Referring to fig. 18, in some optional embodiments, the preset deep neural network may further include a data augmentation module, and the model training part 102 may include: an augmented image module 201, configured to input at least one training image into the data augmentation module to obtain an augmented image corresponding to the at least one training image; a first training module 202, configured to train the preset deep neural network by using at least one of the training images and corresponding label detection information thereof as a source domain and using an augmented image corresponding to at least one of the training images as an augmented domain, so as to reduce a data distribution difference between the augmented domain and the source domain; the annotation information acquisition module 203 is configured to acquire annotation detection information of an augmented image corresponding to at least one of the training images; a target domain obtaining module 204, configured to obtain a target domain; a second training module 205, configured to use an augmented image corresponding to at least one of the training images and label detection information corresponding to the at least one training image as a new augmented domain, and train the preset deep neural network by using the new augmented domain and the target domain to reduce a data distribution difference between the augmented domain and the target domain; and the target detection module 206 is configured to use the trained preset deep neural network as the target detection model.
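Only as an outline of the two training phases carried out by these modules (every function name below is a hypothetical placeholder rather than an API defined by the patent):

```python
def train_target_detection_model(model, source_images, source_labels, target_images,
                                 augment, label_augmented, train_adversarial):
    """Hypothetical outline of the two-phase training; all helpers are placeholders."""
    # Phase 1: source domain (labelled training images) vs. augmented domain.
    augmented_images = [augment(img) for img in source_images]
    train_adversarial(model,
                      labelled=(source_images, source_labels),
                      unlabelled=augmented_images)

    # Obtain annotation detection information for the augmented images
    # (e.g. by carrying the source labels over to their augmented versions).
    augmented_labels = [label_augmented(lbl) for lbl in source_labels]

    # Phase 2: the now-labelled augmented domain vs. the (unlabelled) target domain.
    train_adversarial(model,
                      labelled=(augmented_images, augmented_labels),
                      unlabelled=target_images)
    return model
```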
Referring to fig. 19, in some alternative embodiments, the data augmentation module may be a generator, and the preset deep neural network may further include a feature extraction module, a gradient inversion layer, and a domain discriminator; the first training module 202 and the second training module 205 may comprise an adversarial training module, and the adversarial training module includes: a feature extraction sub-module 301, configured to input an input image into the feature extraction module to obtain first feature information and second feature information corresponding to the input image, where the input image has corresponding annotation detection information or does not have corresponding annotation detection information; a first prediction sub-module 302, configured to input the first feature information corresponding to the input image into the prediction module to obtain prediction detection information corresponding to the input image, where the prediction detection information corresponding to the input image includes prediction classification information and prediction bounding box information corresponding to the input image; a first training sub-module 303, configured to train the prediction module based on the annotation detection information and the prediction detection information corresponding to the input image when the input image has the corresponding annotation detection information; a gradient inversion sub-module 304, configured to input the first feature information and the second feature information corresponding to the input image into the gradient inversion layer to obtain gradient inversion information corresponding to the input image; a domain discrimination sub-module 305, configured to input the gradient inversion information corresponding to the input image into the domain discriminator to obtain domain discrimination information corresponding to the input image; and an adversarial learning sub-module 306, configured to train the generator and the domain discriminator in an adversarial learning manner based on the domain discrimination information corresponding to the input image.
Referring to fig. 20, in some alternative embodiments, the prediction module may include a feature extraction network and a dual-headed structure; the first prediction sub-module 302 may include: a feature extraction unit 401, configured to input first feature information corresponding to the input image into the feature extraction network, so as to obtain feature extraction information corresponding to the input image; a double-headed structure unit 402, configured to input feature extraction information corresponding to the input image into the double-headed structure, to obtain prediction detection information corresponding to the input image.
Referring to fig. 21, in some alternative embodiments, the feature extraction unit 401 may include: an image scaling subunit 501, configured to scale the longer of the width and the height of the input image to a preset length value, and to scale the shorter of the width and the height of the input image to any value within a preset length range; an image determination subunit 502, configured to determine a plurality of input images including the input image; an image padding subunit 503, configured to pad the short sides of the remaining input images to a reference value, the reference value being the maximum value of the short sides in the plurality of input images; and a batch input subunit 504, configured to input the plurality of input images into the feature extraction network in a batch manner to obtain feature extraction information corresponding to the plurality of input images, where the feature extraction information corresponding to the plurality of input images includes the feature extraction information corresponding to the input image.
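A rough sketch of this multi-scale resizing and batching scheme is given below; the preset length value and the preset length range used here are example numbers, not values specified in the patent.

```python
import random
import torch
import torch.nn.functional as F


def resize_and_batch(images, long_side=1333, short_range=(640, 800)):
    """Scale the longer side to a preset length and the shorter side to a random
    value in a preset range, then pad up to the batch maximum and stack."""
    resized = []
    for img in images:                                    # img: C x H x W tensor
        _, h, w = img.shape
        short = random.randint(*short_range)
        # Note: following the literal description, the two sides are scaled
        # independently, so the aspect ratio is not necessarily preserved.
        new_h, new_w = (long_side, short) if h >= w else (short, long_side)
        resized.append(F.interpolate(img[None], size=(new_h, new_w),
                                     mode="bilinear", align_corners=False)[0])

    # Pad every image up to the largest height/width in the batch.
    max_h = max(img.shape[1] for img in resized)
    max_w = max(img.shape[2] for img in resized)
    padded = [F.pad(img, (0, max_w - img.shape[2], 0, max_h - img.shape[1]))
              for img in resized]
    return torch.stack(padded)                            # B x C x max_h x max_w
```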
Referring to FIG. 22, in some alternative embodiments, the feature extraction network may include Stage1, Stage2, Stage3, Stage4, Stage1_1, Stage2_2, Stage3_3, Stage4_4 and first to third up-sampling units, and the feature extraction unit 401 may include: a first feature map subunit 601, configured to input the first feature information corresponding to the input image into Stage1 to obtain a feature map F1 corresponding to the input image; a second feature map subunit 602, configured to input the feature map F1 corresponding to the input image into Stage1_1 to obtain a feature map F2 corresponding to the input image; a third feature map subunit 603, configured to input the feature map F1 corresponding to the input image into Stage2 to obtain a feature map F3 corresponding to the input image; a fourth feature map subunit 604, configured to add the feature map F3 corresponding to the input image and the feature map F2 and input the result into Stage2_2 to obtain a feature map F4 corresponding to the input image; a fifth feature map subunit 605, configured to input the feature map F3 corresponding to the input image into Stage3 to obtain a feature map F5 corresponding to the input image; a sixth feature map subunit 606, configured to add the feature map F5 corresponding to the input image and the feature map F4 and input the result into Stage3_3 to obtain a feature map F6 corresponding to the input image; a seventh feature map subunit 607, configured to input the feature map F5 corresponding to the input image into Stage4 to obtain a feature map F7 corresponding to the input image; an eighth feature map subunit 608, configured to add the feature map F7 corresponding to the input image and the feature map F6 and input the result into Stage4_4 to obtain a feature map F8 corresponding to the input image, and to take the feature map F8 corresponding to the input image as a fusion feature M3 corresponding to the input image; a third sampling subunit 609, configured to input the feature map F8 corresponding to the input image into the third up-sampling unit to obtain an up-sampled feature map F8 corresponding to the input image, and to add the up-sampled feature map F8 corresponding to the input image and the feature map F6 corresponding to the input image to obtain a fusion feature M2 corresponding to the input image; a second sampling subunit 610, configured to input the fusion feature M2 corresponding to the input image into the second up-sampling unit to obtain an up-sampled fusion feature M2 corresponding to the input image, and to add the up-sampled fusion feature M2 corresponding to the input image and the feature map F4 corresponding to the input image to obtain a fusion feature M1 corresponding to the input image; a first sampling subunit 611, configured to input the fusion feature M1 corresponding to the input image into the first up-sampling unit to obtain an up-sampled fusion feature M1 corresponding to the input image, and to add the up-sampled fusion feature M1 corresponding to the input image and the feature map F2 corresponding to the input image to obtain a fusion feature M0 corresponding to the input image; and a feature information subunit 612, configured to take the fusion feature M3 corresponding to the input image, the fusion feature M2 corresponding to the input image, the fusion feature M1 corresponding to the input image and the fusion feature M0 corresponding to the input image as the feature extraction information corresponding to the input image.
Referring to fig. 23, in some alternative embodiments, the dual-headed structure may include a convolutional layer, a first-stage network and a second-stage network, the first-stage network may include a bounding box extraction unit, a two-class classification network and a first-stage regression network, the second-stage network may include first to third multi-classification networks and first to third regression networks, and the dual-headed structure unit 402 may include: a convolution subunit 701, configured to input feature extraction information corresponding to the input image into the convolution layer, so as to obtain a convolution result corresponding to the input image; a first bounding box subunit 702, configured to input the convolution result corresponding to the input image into the bounding box extraction unit, so as to obtain first-stage bounding box information corresponding to the input image; a second bounding box subunit 703, configured to obtain second-stage bounding box information corresponding to the input image by using the first-stage bounding box information corresponding to the input image, the two-class classification network, and the first-stage regression network; a first information subunit 704, configured to obtain first bounding box information corresponding to the input image by using the second-stage bounding box information corresponding to the input image, the first multi-classification network, and the first regression network; a second information subunit 705, configured to obtain second bounding box information corresponding to the input image by using the first bounding box information corresponding to the input image, the second multi-classification network, and the second regression network; and an information prediction subunit 706, configured to obtain prediction detection information corresponding to the input image by using the second bounding box information corresponding to the input image, the third multi-classification network, and the third regression network.
Referring to fig. 24, an embodiment of the present application further provides a target detection apparatus, and a specific implementation manner of the target detection apparatus is consistent with the implementation manner and the achieved technical effect described in the embodiment of the target detection method, and a part of the content is not described again.
The object detection device includes: the image module to be detected 11 is used for acquiring an image to be detected; the image prediction module 12 is configured to input the image to be detected into a target detection model, so as to obtain prediction detection information corresponding to the image to be detected; wherein, the target detection model is obtained by training by using any one of the model training methods.
Referring to fig. 25, an embodiment of the present application further provides an electronic device 200, where the electronic device 200 includes at least one memory 210, at least one processor 220, and a bus 230 connecting different platform systems.
The memory 210 may include readable media in the form of volatile memory, such as Random Access Memory (RAM)211 and/or cache memory 212, and may further include Read Only Memory (ROM) 213.
The memory 210 further stores a computer program, and the computer program can be executed by the processor 220, so that the processor 220 executes the steps of the model training method or the target detection method in the embodiment of the present application, and a specific implementation manner of the method is consistent with the implementation manner and the achieved technical effect described in the embodiment of the model training method or the target detection method, and details of some of the contents are not repeated.
Memory 210 may also include a utility 214 having at least one program module 215, such program modules 215 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Accordingly, the processor 220 may execute the computer programs described above, and may execute the utility 214.
Bus 230 may be a local bus representing one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or any other type of bus structure.
The electronic device 200 may also communicate with one or more external devices 240, such as a keyboard, pointing device, bluetooth device, etc., and may also communicate with one or more devices capable of interacting with the electronic device 200, and/or with any devices (e.g., routers, modems, etc.) that enable the electronic device 200 to communicate with one or more other computing devices. Such communication may be through input-output interface 250. Also, the electronic device 200 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 260. The network adapter 260 may communicate with other modules of the electronic device 200 via the bus 230. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 200, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms, to name a few.
The embodiment of the present application further provides a computer-readable storage medium, and a specific implementation manner of the computer-readable storage medium is consistent with the implementation manner and the achieved technical effect described in the embodiment of the model training method or the target detection method, and some contents are not repeated.
The computer-readable storage medium is used for storing a computer program or an object detection model; the computer program, when executed, implements the steps of a model training method or a target detection method in embodiments of the present application; the target detection model is obtained by training by any one of the model training methods.
Fig. 26 shows a program product 300 provided by the present embodiment for implementing the above-described model training method or the target detection method, which may employ a portable compact disc read only memory (CD-ROM) and include program codes, and may be executed on a terminal device, such as a personal computer. However, the program product 300 of the present invention is not so limited, and in this application, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. Program product 300 may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the C language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
While the present application is described in terms of various aspects, including exemplary embodiments, the principles of the invention should not be limited to the disclosed embodiments, but are also intended to cover various modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Claims (12)

1. A model training method for training a preset deep neural network, the preset deep neural network including a prediction module using Cascade RCNN and using CBNet as a feature extraction network of Cascade RCNN, the model training method comprising:
acquiring a training data set, wherein each training data in the training data set comprises a training image and label detection information corresponding to the training image, and the label detection information corresponding to the training image comprises label classification information and label bounding box information corresponding to the training image;
and training the preset deep neural network by using the training data set to obtain a target detection model.
2. The model training method of claim 1, wherein the preset deep neural network further comprises a data augmentation module, and the training of the preset deep neural network with the training data set to obtain the target detection model comprises:
inputting at least one training image into the data augmentation module to obtain an augmentation image corresponding to the at least one training image;
taking at least one training image and the corresponding label detection information thereof as a source domain, taking an augmented image corresponding to at least one training image as an augmented domain, and training the preset deep neural network by using the source domain and the augmented domain so as to reduce the data distribution difference between the augmented domain and the source domain;
acquiring label detection information of an augmented image corresponding to at least one training image;
acquiring a target domain;
taking an augmented image corresponding to at least one training image and label detection information corresponding to the augmented image as a new augmented domain, and training the preset deep neural network by using the new augmented domain and the target domain to reduce the data distribution difference between the augmented domain and the target domain;
and taking the trained preset deep neural network as the target detection model.
3. The model training method of claim 2, wherein the data augmentation module is a generator, the preset deep neural network further comprises a feature extraction module, a gradient inversion layer and a domain discriminator, and the training of the preset deep neural network comprises:
inputting an input image into the feature extraction module to obtain first feature information and second feature information corresponding to the input image, wherein the input image has corresponding label detection information or does not have corresponding label detection information;
inputting first feature information corresponding to the input image into the prediction module to obtain prediction detection information corresponding to the input image, wherein the prediction detection information corresponding to the input image comprises prediction classification information and prediction bounding box information corresponding to the input image;
when the input image has corresponding label detection information, training the prediction module based on the label detection information and the prediction detection information corresponding to the input image;
inputting first characteristic information and second characteristic information corresponding to the input image into the gradient inversion layer to obtain gradient inversion information corresponding to the input image;
inputting the gradient inversion information corresponding to the input image into the domain discriminator to obtain domain discrimination information corresponding to the input image;
training the generator and the domain discriminator in a counterlearning manner based on domain discrimination information corresponding to the input image.
4. The model training method of claim 3, wherein the prediction module comprises a feature extraction network and a dual-headed structure;
the inputting the first feature information corresponding to the input image into the prediction module to obtain the prediction detection information corresponding to the input image includes:
inputting first feature information corresponding to the input image into the feature extraction network to obtain feature extraction information corresponding to the input image;
and inputting the feature extraction information corresponding to the input image into the double-head structure to obtain the prediction detection information corresponding to the input image.
5. The model training method according to claim 4, wherein the inputting first feature information corresponding to the input image into the feature extraction network to obtain feature extraction information corresponding to the input image comprises:
the long sides in the width and the height of the input image are zoomed to preset length values, and the short sides in the width and the height of the input image are zoomed to any value in a preset length range;
determining a plurality of input images including the input image;
filling the short sides of the remaining input images to a reference value by taking the maximum value of the short sides in the plurality of input images as the reference value;
inputting the plurality of input images into the feature extraction network in a batch mode to obtain feature extraction information corresponding to the plurality of input images, wherein the feature extraction information corresponding to the plurality of input images comprises the feature extraction information corresponding to the input images.
6. The model training method of claim 4, wherein the feature extraction network comprises Stage1, Stage2, Stage3, Stage4, Stage1_1, Stage2_2, Stage3_3, Stage4_4 and first to third up-sampling units, and wherein the step of inputting the first feature information corresponding to the input image into the feature extraction network to obtain the feature extraction information corresponding to the input image comprises the steps of:
inputting the first feature information corresponding to the input image into Stage1 to obtain a feature map F1 corresponding to the input image;
inputting the feature map F1 corresponding to the input image into Stage1_1 to obtain a feature map F2 corresponding to the input image;
inputting the feature map F1 corresponding to the input image into Stage2 to obtain a feature map F3 corresponding to the input image;
adding the feature map F3 corresponding to the input image and the feature map F2 and inputting the result into Stage2_2 to obtain a feature map F4 corresponding to the input image;
inputting the feature map F3 corresponding to the input image into Stage3 to obtain a feature map F5 corresponding to the input image;
adding the feature map F5 corresponding to the input image and the feature map F4 and inputting the result into Stage3_3 to obtain a feature map F6 corresponding to the input image;
inputting the feature map F5 corresponding to the input image into Stage4 to obtain a feature map F7 corresponding to the input image;
adding the feature map F7 corresponding to the input image and the feature map F6 and inputting the result into Stage4_4 to obtain a feature map F8 corresponding to the input image, and taking the feature map F8 corresponding to the input image as a fusion feature M3 corresponding to the input image;
inputting the feature map F8 corresponding to the input image into the third up-sampling unit to obtain an up-sampled feature map F8 corresponding to the input image, and adding the up-sampled feature map F8 corresponding to the input image and the feature map F6 corresponding to the input image to obtain a fusion feature M2 corresponding to the input image;
inputting the fusion feature M2 corresponding to the input image into the second up-sampling unit to obtain an up-sampled fusion feature M2 corresponding to the input image, and adding the up-sampled fusion feature M2 corresponding to the input image and the feature map F4 corresponding to the input image to obtain a fusion feature M1 corresponding to the input image;
inputting the fusion feature M1 corresponding to the input image into the first up-sampling unit to obtain an up-sampled fusion feature M1 corresponding to the input image, and adding the up-sampled fusion feature M1 corresponding to the input image and the feature map F2 corresponding to the input image to obtain a fusion feature M0 corresponding to the input image;
and taking the fusion feature M3 corresponding to the input image, the fusion feature M2 corresponding to the input image, the fusion feature M1 corresponding to the input image and the fusion feature M0 corresponding to the input image as the feature extraction information corresponding to the input image.
7. The model training method of claim 4, wherein the double-headed structure comprises a convolutional layer, a first-stage network and a second-stage network, the first-stage network comprises a bounding box extraction unit, a two-class network and a first-stage regression network, the second-stage network comprises a first multi-class network to a third multi-class network and a first regression network to a third regression network, and the inputting the feature extraction information corresponding to the input image into the double-headed structure to obtain the prediction detection information corresponding to the input image comprises:
inputting the feature extraction information corresponding to the input image into the convolution layer to obtain a convolution result corresponding to the input image;
inputting the convolution result corresponding to the input image into the bounding box extraction unit to obtain first-stage bounding box information corresponding to the input image;
acquiring second-stage boundary box information corresponding to the input image by using the first-stage boundary box information corresponding to the input image, the two-classification network and the first-stage regression network;
acquiring first bounding box information corresponding to the input image by utilizing second-stage bounding box information corresponding to the input image, the first multi-classification network and the first regression network;
acquiring second bounding box information corresponding to the input image by using the first bounding box information corresponding to the input image, the second multi-classification network and the second regression network;
and acquiring the prediction detection information corresponding to the input image by using the second bounding box information corresponding to the input image, the third multi-classification network and the third regression network.
8. An object detection method, characterized in that the object detection method comprises:
acquiring an image to be detected;
inputting the image to be detected into a target detection model to obtain the corresponding prediction detection information of the image to be detected;
wherein the object detection model is trained by the model training method according to any one of claims 1 to 7.
9. A model training apparatus for training a preset deep neural network including a prediction module using Cascade RCNN and using CBNet as a feature extraction network of Cascade RCNN, the model training apparatus comprising:
the training data set part is used for acquiring a training data set, each piece of training data in the training data set comprises a training image and label detection information corresponding to the training image, and the label detection information corresponding to the training image comprises label classification information and label boundary box information corresponding to the training image;
and the model training part is used for training the preset deep neural network by using the training data set to obtain a target detection model.
10. An object detection apparatus, characterized in that the object detection apparatus comprises:
the image module to be detected is used for acquiring an image to be detected;
the image prediction module is used for inputting the image to be detected into a target detection model to obtain prediction detection information corresponding to the image to be detected;
wherein the object detection model is trained by the model training method according to any one of claims 1 to 7.
11. An electronic device, characterized in that the electronic device comprises a memory storing a computer program and a processor implementing the steps of the model training method according to any one of claims 1 to 7 or the steps of the object detection method according to claim 8 when the computer program is executed.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program or an object detection model;
the computer program when being executed by a processor performs the steps of the model training method of any one of claims 1 to 7 or the steps of the object detection method of claim 8;
the object detection model is trained by using the model training method of any one of claims 1 to 7.
CN202110963178.6A 2021-08-20 2021-08-20 Model training method, target detection method and related device Pending CN113869361A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110963178.6A CN113869361A (en) 2021-08-20 2021-08-20 Model training method, target detection method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110963178.6A CN113869361A (en) 2021-08-20 2021-08-20 Model training method, target detection method and related device

Publications (1)

Publication Number Publication Date
CN113869361A true CN113869361A (en) 2021-12-31

Family

ID=78987994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110963178.6A Pending CN113869361A (en) 2021-08-20 2021-08-20 Model training method, target detection method and related device

Country Status (1)

Country Link
CN (1) CN113869361A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN109977918A (en) * 2019-04-09 2019-07-05 华南理工大学 A kind of target detection and localization optimization method adapted to based on unsupervised domain
WO2021244079A1 (en) * 2020-06-02 2021-12-09 苏州科技大学 Method for detecting image target in smart home environment
CN111898668A (en) * 2020-07-24 2020-11-06 佛山市南海区广工大数控装备协同创新研究院 Small target object detection method based on deep learning
CN111814754A (en) * 2020-08-18 2020-10-23 深延科技(北京)有限公司 Single-frame image pedestrian detection method and device for night scene
CN111814755A (en) * 2020-08-18 2020-10-23 深延科技(北京)有限公司 Multi-frame image pedestrian detection method and device for night motion scene
CN112215255A (en) * 2020-09-08 2021-01-12 深圳大学 Training method of target detection model, target detection method and terminal equipment
CN112365497A (en) * 2020-12-02 2021-02-12 上海卓繁信息技术股份有限公司 High-speed target detection method and system based on Trident Net and Cascade-RCNN structures

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAN-KAI HSU ET AL.: "Progressive Domain Adaptation for Object Detection", Workshop on Applications of Computer Vision, IEEE, 31 December 2020 (2020-12-31), pages 749-757 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067370A (en) * 2022-01-17 2022-02-18 北京新氧科技有限公司 Neck shielding detection method and device, electronic equipment and storage medium
CN114764874A (en) * 2022-04-06 2022-07-19 北京百度网讯科技有限公司 Deep learning model training method, object recognition method and device
CN114764874B (en) * 2022-04-06 2023-04-07 北京百度网讯科技有限公司 Deep learning model training method, object recognition method and device
CN117115170A (en) * 2023-10-25 2023-11-24 安徽大学 Self-adaptive SAR ship detection method and system in unsupervised domain
CN117115170B (en) * 2023-10-25 2024-01-12 安徽大学 Self-adaptive SAR ship detection method and system in unsupervised domain

Similar Documents

Publication Publication Date Title
CN113869361A (en) Model training method, target detection method and related device
CN108304835B (en) character detection method and device
CN108229341B (en) Classification method and device, electronic equipment and computer storage medium
CN111931664A (en) Mixed note image processing method and device, computer equipment and storage medium
CN109886330B (en) Text detection method and device, computer readable storage medium and computer equipment
CN115731533B (en) Vehicle-mounted target detection method based on improved YOLOv5
US20180285689A1 (en) Rgb-d scene labeling with multimodal recurrent neural networks
CN113159091B (en) Data processing method, device, electronic equipment and storage medium
CN113095346A (en) Data labeling method and data labeling device
CN112528961B (en) Video analysis method based on Jetson Nano
CN110533046B (en) Image instance segmentation method and device, computer readable storage medium and electronic equipment
KR102497361B1 (en) Object detecting system and method
CN115658955B (en) Cross-media retrieval and model training method, device, equipment and menu retrieval system
CN112749666A (en) Training and motion recognition method of motion recognition model and related device
CN115019314A (en) Commodity price identification method, device, equipment and storage medium
CN115131634A (en) Image recognition method, device, equipment, storage medium and computer program product
CN110796003B (en) Lane line detection method and device and electronic equipment
CN112288702A (en) Road image detection method based on Internet of vehicles
CN113762455A (en) Detection model training method, single character detection method, device, equipment and medium
CN108596068B (en) Method and device for recognizing actions
JP2023036795A (en) Image processing method, model training method, apparatus, electronic device, storage medium, computer program, and self-driving vehicle
CN113191364B (en) Vehicle appearance part identification method, device, electronic equipment and medium
CN113762292B (en) Training data acquisition method and device and model training method and device
CN114666656A (en) Video clipping method, video clipping device, electronic equipment and computer readable medium
CN114708429A (en) Image processing method, image processing device, computer equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination