CN112561053B - Image processing method, training method and device of pre-training model and electronic equipment - Google Patents

Image processing method, training method and device of pre-training model and electronic equipment

Info

Publication number
CN112561053B
CN112561053B CN202011249923.2A
Authority
CN
China
Prior art keywords
training
image
model
image processing
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011249923.2A
Other languages
Chinese (zh)
Other versions
CN112561053A (en)
Inventor
李超 (Li Chao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011249923.2A priority Critical patent/CN112561053B/en
Publication of CN112561053A publication Critical patent/CN112561053A/en
Priority to JP2021178829A priority patent/JP7267379B2/en
Application granted granted Critical
Publication of CN112561053B publication Critical patent/CN112561053B/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image processing method, a training method and device of a pre-training model, and an electronic device, and relates to the technical fields of deep learning and computer vision. The specific implementation scheme is as follows: a trained pre-training model is acquired, where the pre-training model has been trained with multiple frames of training images so that, for the image features output by the trained model, the difference between the first image feature distance and the second image feature distance is minimized. A corresponding image processing model is then generated from this general pre-training model according to the target image processing task, which improves the efficiency of generating the image processing model corresponding to the target processing task. The generated image processing model is used to execute the target image processing task on the target image; because the image processing model corresponds to the target image processing task, both the effect and the efficiency of image processing are improved.

Description

Image processing method, training method and device of pre-training model and electronic equipment
Technical Field
The application relates to the technical field of image processing, in particular to the technical fields of deep learning and computer vision, and provides an image processing method, a training method and device of a pre-training model, and an electronic device.
Background
Neural-network-based image processing technology has been developed for many years: according to the image processing requirements, a trained image processing model is used to perform image processing and recognition. However, different image processing tasks have different requirements, and a single fixed image processing model cannot meet the requirements of every scene. How to improve the image processing effect is therefore a technical problem to be solved urgently.
Disclosure of Invention
The application provides an image processing method for improving an image processing effect, a training method and device of a pre-training model and electronic equipment.
According to an aspect of the present application, there is provided an image processing method including:
acquiring a trained pre-training model, wherein the pre-training model is trained by adopting multiple frames of training images, so that, for the image features output by the trained pre-training model, the difference between the first image feature distance and the second image feature distance is minimized; wherein the first image feature distance is a distance between image features of training images extracted from the same video clip, and the second image feature distance is a distance between image features of training images extracted from different video clips;
Generating an image processing model for executing the target image processing task according to the pre-training model;
and executing a target image processing task on the target image by adopting the image processing model.
According to another aspect of the present application, there is provided a training method of a pre-training model, including:
acquiring a plurality of video clips;
extracting multi-frame training images from the plurality of video clips to obtain a training set; at least two frames of training images are extracted from each video clip;
performing a multi-round training process on a pre-training model for image feature extraction using the training set; wherein, each round training process includes: selecting each training image extracted from at least two video clips from the training set; inputting each training image selected in the round into the pre-training model to obtain output image characteristics; according to the image characteristics of each training image selected by the round, determining a first image characteristic distance between training images belonging to the same video segment, and determining a second image characteristic distance between training images belonging to different video segments, and according to the first image characteristic distance and the second image characteristic distance, adjusting model parameters of the pre-training model so as to minimize the difference between the first image characteristic distance and the second image characteristic distance.
According to another aspect of the present application, there is provided an image processing apparatus including:
the acquisition module is used for acquiring a trained pre-training model, the pre-training model being trained with multiple frames of training images so that, for the image features output by the trained pre-training model, the difference between the first image feature distance and the second image feature distance is minimized; wherein the first image feature distance is a distance between image features of training images extracted from the same video clip, and the second image feature distance is a distance between image features of training images extracted from different video clips;
the generation module is used for generating an image processing model for executing a target image processing task according to the pre-training model;
and the processing module is used for executing a target image processing task on the target image by adopting the image processing model.
According to another aspect of the present application, there is provided a training apparatus for a pre-training model, including: the acquisition module is used for acquiring a plurality of video clips;
the extraction module is used for extracting multi-frame training images from the plurality of video clips so as to obtain a training set; at least two frames of training images are extracted from each video clip;
The training module is used for executing a multi-round training process on the pre-training model for image feature extraction by adopting the training set; wherein, each round training process includes: selecting each training image extracted from at least two video clips from the training set; inputting each training image selected in the round into the pre-training model to obtain output image characteristics; according to the image characteristics of each training image selected by the round, determining a first image characteristic distance between training images belonging to the same video segment, and determining a second image characteristic distance between training images belonging to different video segments, and according to the first image characteristic distance and the second image characteristic distance, adjusting model parameters of the pre-training model so as to minimize the difference between the first image characteristic distance and the second image characteristic distance.
According to another aspect of the present application, there is provided an electronic apparatus including:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method of the first aspect or the training method of the pre-training model of the second aspect.
According to another aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the image processing method of the first aspect, or to perform the training method of the pre-training model of the other aspect.
According to another aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, performs the image processing method of the first aspect, or performs the training method of the pre-training model of the second aspect.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the present application;
FIG. 2 is a flowchart of another image processing method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an image processing model according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a training method of a pre-training model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a training device for pre-training a model according to an embodiment of the present application;
fig. 7 is a block diagram of an electronic device according to an embodiment of the application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The image processing method, the training method and the training device of the pre-training model and the electronic equipment of the embodiment of the application are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present application.
As shown in fig. 1, the method comprises the steps of:
Step 101, acquiring a trained pre-training model, wherein the pre-training model is trained by adopting multiple frames of training images, so that, for the image features output by the trained pre-training model, the difference between the first image feature distance and the second image feature distance is minimized; wherein the first image feature distance is a distance between image features of training images extracted from the same video clip, and the second image feature distance is a distance between image features of training images extracted from different video clips.
The pre-training model in this embodiment is trained by deep learning, which performs better on large data sets than other machine learning methods. Multiple frames of training images extracted from a plurality of video clips are used as the training set and input into the pre-training model, and the parameters of the pre-training model are adjusted iteratively until the result output by the model satisfies a preset threshold, at which point training ends. A general pre-training model is thus generated from a large amount of image data, which improves the efficiency of subsequently generating a target image processing model from this general model.
The training method of the pre-training model will be described in detail in the following embodiments of the training method of the pre-training model, which will not be described in detail in this embodiment.
Step 102, generating an image processing model for executing the target image processing task according to the pre-training model.
The target image processing task comprises an image classification task, a target detection task or an object identification task.
In the present application, after the pre-training model is generated, it serves as a pre-generated general-purpose model, so the image processing model corresponding to the target image processing task can be generated quickly from the image set corresponding to that task, which improves the efficiency of generating the image processing model corresponding to the target image processing task.
The image processing model may be a convolutional neural network (CNN) or a deep neural network (DNN); this embodiment is not limited in this respect.
And step 103, executing a target image processing task on the target image by adopting the image processing model.
Because the image processing model in this embodiment is derived from a general pre-training model obtained through pre-training, generating the image processing model corresponding to the target image processing task is efficient.
In the image processing method of the embodiment of the application, a trained pre-training model is obtained, where the pre-training model has been trained with multiple frames of training images so that, for the image features output by the trained model, the difference between the first image feature distance and the second image feature distance is minimized. A corresponding image processing model is then generated from the general pre-training model according to the target image processing task, which improves the efficiency of generating the image processing model corresponding to the target processing task, and the generated image processing model is used to execute the target image processing task on the target image.
In the above embodiment, to improve the efficiency of image processing, the image processing model corresponding to the target image processing task is generated according to the target image processing task and the pre-training model. As another possible implementation, the pre-training model is spliced with a network layer corresponding to the target processing task and then trained to obtain the corresponding image processing model, which improves the efficiency of generating the image processing model and the effect of image processing.
For this reason, based on the above embodiment, another image processing method is provided in the present embodiment, and fig. 2 is a schematic flow chart of another image processing method provided in the embodiment of the present application, as shown in fig. 2, the step 102 includes the following steps:
step 201, a network layer corresponding to a target image processing task is acquired.
In the application, the acquired network layer and the target image processing task have a corresponding relationship.
In one scenario, the target image processing task is an image classification task, and the corresponding network layer is a classification layer used to classify the target image, for example to determine the vehicle class (car, SUV, and the like) of the vehicles contained in the image to be classified.
In another scenario, the target image processing task is a target detection task, and the corresponding network layer is a detection network used to identify a target object contained in the target image, for example to determine, for a target image to be processed, whether the image contains an obstacle, or whether multiple images contain the same target object.
In still another scenario, the target image processing task is an object recognition task, and the corresponding network layer is used to recognize objects in the image, for example to recognize the types of objects contained in different areas of the target image to be processed, or the types of objects contained in the image as a whole.
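As a non-limiting illustration of the task-to-network-layer correspondence described above, the sketch below selects a head module by task type. It is only an assumption about how an embodiment might be organized: the helper name `make_task_head`, the parameters `feature_dim` and `num_classes`, and the concrete head architectures are all hypothetical and stand in for whatever classification layer, detection network, or recognition head a real implementation would use.

```python
import torch.nn as nn

def make_task_head(task: str, feature_dim: int, num_classes: int) -> nn.Module:
    """Hypothetical helper: return a network layer matching the target image processing task."""
    if task == "classification":
        # classification layer: map image features to class scores (e.g. car, SUV, ...)
        return nn.Linear(feature_dim, num_classes)
    if task == "detection":
        # detection head sketch: predict one box (x, y, w, h) plus an objectness score
        return nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU(), nn.Linear(256, 5))
    if task == "recognition":
        # object recognition head: scores over object types present in the image
        return nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU(), nn.Linear(256, num_classes))
    raise ValueError(f"unknown target image processing task: {task}")
```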
Step 202, splicing the pre-training model with a network layer, wherein the input of the network layer is the image characteristic output by the pre-training model, and the output of the network layer is the target image task processing result.
In this embodiment, after the general pre-training model is generated, it is spliced with the network layer corresponding to the target image processing task; as shown in fig. 3, the trained pre-training model is spliced with the network layer to obtain the image processing model to be trained. The image features output by the pre-training model are input into the network layer, and the output of the network layer is the target image task processing result.
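A minimal sketch of the splicing in step 202, assuming a PyTorch-style setting in which the pre-training model is a backbone module whose forward pass returns image features; the class name `ImageProcessingModel` and the assumption that the backbone outputs a flat feature vector are illustrative only.

```python
import torch.nn as nn

class ImageProcessingModel(nn.Module):
    """Pre-training model spliced with the network layer for the target task (sketch)."""

    def __init__(self, backbone: nn.Module, head: nn.Module):
        super().__init__()
        self.backbone = backbone  # trained general pre-training model
        self.head = head          # network layer corresponding to the target image processing task

    def forward(self, images):
        features = self.backbone(images)  # image features output by the pre-training model
        return self.head(features)        # target image task processing result
```

For example, `ImageProcessingModel(pretrained_backbone, make_task_head("classification", 2048, 10))` would splice the backbone with a hypothetical 10-class classification layer.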
Step 203, training the spliced pre-training model and the network layer by adopting a training set of the target image processing task to obtain an image processing model.
In this embodiment, for each target image processing task, the spliced pre-training model and network layer are trained with the training set corresponding to that task in order to quickly obtain the corresponding image processing model. In other words, the trained image processing model corresponds to the target image processing task: training is performed after splicing the pre-trained general model with the corresponding network layer. As one possible implementation, the parameters of the network layer can be adjusted mainly according to the requirements of the target image processing task, which improves the training efficiency of the corresponding image processing model while meeting the processing requirements of different target image processing tasks and of different scenes.
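The training of the spliced model described in step 203 could look roughly like the sketch below, assuming an image classification task with a standard cross-entropy loss; the data loader `task_loader`, the choice of optimizer and learning rate, and the option of freezing the backbone so that mainly the network-layer parameters are adjusted are all assumptions, not prescriptions of the patent.

```python
import torch
import torch.nn as nn

def finetune(model: nn.Module, task_loader, epochs: int = 10, freeze_backbone: bool = True) -> nn.Module:
    """Train the spliced pre-training model + network layer on the target-task training set (sketch).

    `model` is assumed to be the spliced model from the previous sketch (backbone + head).
    """
    if freeze_backbone:
        # one possible implementation: mainly adjust the network-layer parameters
        for p in model.backbone.parameters():
            p.requires_grad = False
    criterion = nn.CrossEntropyLoss()  # assumes an image classification task
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=1e-3)
    for _ in range(epochs):
        for images, labels in task_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```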
In the image processing method of this embodiment, the general pre-trained model is spliced with the corresponding network layer, where the input of the network layer is the image features output by the pre-training model and the output of the network layer is the target image task processing result, and the spliced model is then trained. This improves the efficiency of generating the image processing model and the effect of image processing.
In order to implement the above embodiment, the present embodiment provides a training method of a pre-training model.
Fig. 4 is a flow chart of a training method of a pre-training model according to an embodiment of the present application, as shown in fig. 4, the method includes the following steps:
in step 401, a plurality of video clips are acquired.
In one possible implementation manner of the embodiment of the present application, at least one video may be acquired, and each video is randomly segmented into a plurality of video segments.
In one possible implementation, in order to obtain more video clips, multiple videos may be acquired, and each video is segmented according to the content difference between adjacent image frames to obtain a plurality of video clips per video. In this way, the content of the frames within each segmented video clip changes continuously, which improves the continuity of frames within a clip.
In another possible implementation of the embodiment of the application, a single video may be acquired and segmented according to the content difference between adjacent image frames to obtain a plurality of video clips; here too, the content of the frames within each segmented clip changes continuously, improving the continuity of frames within a clip.
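A minimal sketch of content-difference-based segmentation, assuming OpenCV for decoding and a mean absolute grey-level difference between adjacent frames as the change measure; the threshold value and the change measure itself are assumptions, since the patent does not fix a particular criterion.

```python
import cv2
import numpy as np

def split_into_clips(video_path: str, diff_threshold: float = 30.0):
    """Split a video into clips wherever adjacent frames differ strongly in content (sketch)."""
    cap = cv2.VideoCapture(video_path)
    clips, current, prev = [], [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if prev is not None and np.abs(gray - prev).mean() > diff_threshold:
            # large content difference between adjacent frames: start a new video clip
            clips.append(current)
            current = []
        current.append(frame)
        prev = gray
    if current:
        clips.append(current)
    cap.release()
    return clips
```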
As shown in fig. 3, A, B, ..., N are video clips that are different from each other.
In one scenario, the different video clips may be split from one video clip. In another scenario, the different video segments may be segmented from multiple video segments. The flexible setting can be specifically performed according to the requirements of the training scene, and the embodiment is not limited.
Step 402, extracting multi-frame training images from a plurality of video clips to obtain a training set; at least two frames of training images are extracted from each video clip.
In this embodiment, the training set is composed of training images extracted from a plurality of video clips. As one possible implementation, a certain number of training image frames are randomly extracted from each video clip, and the extracted frames of all video clips form the training set. At least two frames of training images are extracted from each video clip.
As another possible implementation, in order to improve the training effect, the same number of training image frames is extracted from each video clip. This makes the distribution of frames across video clips in the training set more uniform, so that when the training set is used to train the pre-training model, each video clip carries the same weight when the model parameters are determined, which improves the training effect of the pre-training model.
As shown in fig. 3, A, B and N are different video clips, and in this embodiment 2 frames are extracted from each video clip as training images for illustration: A1 and A2 are two frames in video clip A, B1 and B2 are two frames in video clip B, and N1 and N2 are two frames in video clip N.
For example, a video X is divided into 3 video clips, namely video clips A, B and C (in fig. 3, N corresponds to C), and two frames are extracted from each video clip for illustration.
In video clip A, the two extracted frames are A1 and A2, which are consecutive. In video clip B, the two extracted frames are B1 and B2, which are consecutive. In video clip C, the two extracted frames are C1 and C2, which are consecutive. The image frames A1, A2, B1, B2, C1 and C2 then constitute the training set.
It should be noted that, in practical application, the number of training images in the training set is not limited to the 6 frames described in this embodiment and may be set flexibly according to the accuracy requirements of training.
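A sketch of building the training set from the clips, keeping the clip index with each frame so that intra-clip and inter-clip pairs can be formed later; the choice of two consecutive frames per clip mirrors the A1/A2, B1/B2, C1/C2 example above, and the sampling strategy is an assumption.

```python
import random

def build_training_set(clips, frames_per_clip: int = 2):
    """Extract the same number of (consecutive) frames from every clip (sketch).

    Returns a list of (clip_index, frame) pairs; the clip index records which
    video clip each training image was extracted from.
    """
    training_set = []
    for clip_idx, clip in enumerate(clips):
        # assumes each clip holds at least `frames_per_clip` frames
        start = random.randint(0, max(0, len(clip) - frames_per_clip))
        for frame in clip[start:start + frames_per_clip]:
            training_set.append((clip_idx, frame))
    return training_set
```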
Step 403, performing a multi-round training process on the pre-training model for image feature extraction by using the training set. Each round of training proceeds as follows: training images extracted from at least two video clips are selected from the training set; the training images selected for the round are input into the pre-training model to obtain the output image features; according to the image features of the training images selected for the round, a first image feature distance between training images belonging to the same video clip and a second image feature distance between training images belonging to different video clips are determined; and the model parameters of the pre-training model are adjusted according to the first image feature distance and the second image feature distance so as to minimize the difference between them. The pre-training model obtained in this way can identify the association relationship between different video clips and thus serve as a general pre-training model.
In this embodiment, the training set is used to perform a multi-round training process on the pre-training model. In each round, the training effect is evaluated from the recognition result and the parameters of the pre-training model are adjusted accordingly, until the model converges and can accurately generate the image features of the training images.
It should be understood that because the training set contains video clips belonging to the same video as well as video clips belonging to different videos, the training images selected in each round are extracted from at least two video clips that may belong to the same video or to different videos. The extracted training images can therefore be used to learn the association relationship between different video clips, so that the model can serve as a general pre-training model with improved robustness.
In the training method of the pre-training model, at least two frames of training images are extracted from each of the acquired video clips to obtain multiple frames of training images forming a training set, and the training set is used to perform a multi-round training process on the pre-training model for image feature extraction. In each round, image features are obtained from the training images; a first image feature distance is computed between images belonging to the same video clip and a second image feature distance between images belonging to different video clips; and the parameters of the pre-training model are adjusted continuously so that the difference between the first and second image feature distances is minimized. This realizes the training of a general pre-training model and improves the reliability of the image features produced by the pre-training model.
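One round of the training process could be sketched as follows, assuming two training images per selected clip and a PyTorch model whose forward pass returns image features. The helpers `intra_clip_distance` and `inter_clip_distance` are sketched after the corresponding distance formulas below, and the loss `dist_in - dist_inter` is one plausible reading of "minimizing the difference between the first and second image feature distances" (pull intra-clip features together, push inter-clip features apart); the patent itself does not spell out the loss function.

```python
import torch

def training_round(model, batch, optimizer):
    """One round of pre-training (sketch).

    `batch` maps each selected video clip to a tensor of its training images,
    e.g. {clip_id: images of shape (2, C, H, W)}.
    """
    features = {cid: model(frames) for cid, frames in batch.items()}
    dist_in = intra_clip_distance(features)      # first image feature distance
    dist_inter = inter_clip_distance(features)   # second image feature distance
    loss = dist_in - dist_inter                  # assumed reading of the training objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```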
Based on the above embodiments, this embodiment provides another training method of the pre-training model, which illustrates how to determine the first image feature distance between training images belonging to the same video clip so as to improve the precision of the first-image-feature-distance calculation. Specifically, this may be implemented by the following steps:
For the training images input into the pre-training model in the current round, the intra-class feature distances between the image features of different training images belonging to the same video clip are determined, and the sum of the intra-class feature distances over the at least two video clips selected from the training set is determined to obtain the first image feature distance. The first image feature distance thus indicates the association relationship between the image features of different training images belonging to the same video clip.
In one possible implementation of the embodiment of the application, for example, the selected training images i1 and i2 belong to the same video clip i. The training images i1 and i2 are input into the pre-training model to obtain the image features of each training image, denoted h_i1 and h_i2 respectively. The intra-class feature distance d(h_i1, h_i2) between the image features of the training images belonging to the same video clip i is then calculated, and the sum of the intra-class feature distances over the at least two video clips selected from the training set in the current round is determined to obtain the first image feature distance dist(in), which can be realized by the following formula:
dist(in) = Σ_{i=1}^{n} d(h_i1, h_i2)
wherein i indexes the video clips, i.e. i is a natural number from 1 to n, and n is greater than or equal to 2.
In another possible implementation of the embodiment of the application, in order to meet the requirements of different scenes, the image features of different training images belonging to the same video clip may be classified, that is, divided into different categories, to realize refined feature recognition; for example, image features belonging to the person category, to buildings, or to the nose category are determined. Then, for different training images, the per-category feature distances are calculated between the image features of any two training images corresponding to the same category, and the feature distances over all categories are summed to obtain the intra-class feature distance within the same video clip. The sum of the intra-class feature distances over the at least two video clips selected from the training set in the current round is then determined to obtain the first image feature distance. This refines the calculation of the first image feature distance and improves its accuracy.
It should be noted that the image feature distance may be calculated as a Euclidean distance or a cosine distance.
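A sketch of the first image feature distance dist(in) that matches the formula above, using the Euclidean distance mentioned in the note (a cosine distance could be substituted); the dictionary layout of `features` is an assumption carried over from the training-round sketch.

```python
import torch

def intra_clip_distance(features: dict) -> torch.Tensor:
    """First image feature distance dist(in): sum of intra-clip feature distances (sketch).

    `features` maps each clip id to a (2, D) tensor holding the image features
    h_i1 and h_i2 of the two training images extracted from that clip.
    """
    total = torch.tensor(0.0)
    for h in features.values():
        total = total + torch.dist(h[0], h[1])  # Euclidean distance d(h_i1, h_i2)
    return total
```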
Based on the above embodiments, this embodiment provides another training method of the pre-training model, which illustrates how to determine the second image feature distance between training images belonging to different video clips so as to improve the precision of the second-image-feature-distance calculation. Specifically, this may be implemented by the following steps:
For the training images input into the pre-training model in the current round, the inter-class feature distances between the image features of different training images belonging to different video clips are determined, and the sum of the inter-class feature distances over the at least two video clips selected from the training set is determined to obtain the second image feature distance. The second image feature distance thus indicates the association relationship between the image features of training images that do not belong to the same video clip.
In one possible implementation of the embodiment of the application, for example, the selected training images i1 and i2 belong to the same video clip i, and the training images j1 and j2 belong to the same video clip j. The training images i1 and i2 are input into the pre-training model to obtain image features denoted h_i1 and h_i2 respectively, and the training images j1 and j2 are input into the pre-training model to obtain image features denoted h_j1 and h_j2 respectively. The inter-class feature distances between the image features of training images belonging to the different video clips i and j are then calculated, and the sum of the inter-class feature distances over the at least two video clips selected from the training set in the current round is determined to obtain the second image feature distance dist(inter), which can be realized by the following formula:
dist(inter) = Σ_{i≠j} [ d(h_i1, h_j1) + d(h_i1, h_j2) + d(h_i2, h_j1) + d(h_i2, h_j2) ]
wherein i and j index different video clips, n is greater than or equal to 2, and d(h_i1, h_j1), d(h_i1, h_j2), d(h_i2, h_j1) and d(h_i2, h_j2) are the inter-class feature distances between the image features of the training images in the different video clips i and j.
In this embodiment, two training images selected from each video clip are used as an example; in practical application, the number of training images selected from each video clip can be set flexibly according to the training requirements, and this embodiment is not limited in this respect.
In another possible implementation of the embodiment of the application, in order to meet the requirements of different scenes, the image features of different training images belonging to different video clips may be classified, that is, divided into different categories, to realize refined feature recognition; for example, image features belonging to the person category, to buildings, or to the nose category are determined. Then, for training images belonging to different video clips, the per-category feature distances are calculated between the image features of any two training images corresponding to the same category, and the feature distances over all categories are summed to obtain the inter-class feature distance between training images belonging to different video clips. The sum of the inter-class feature distances over the at least two video clips selected from the training set in the current round is then determined to obtain the second image feature distance. This refines the calculation of the second image feature distance and improves its accuracy.
It should be noted that the image feature distance may be calculated as a Euclidean distance or a cosine distance.
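A matching sketch of the second image feature distance dist(inter): every pair of distinct clips contributes the four cross-clip distances named in the formula above; again the Euclidean distance is used and the `features` layout is the same assumed dictionary as before.

```python
import itertools
import torch

def inter_clip_distance(features: dict) -> torch.Tensor:
    """Second image feature distance dist(inter): sum of inter-clip feature distances (sketch)."""
    total = torch.tensor(0.0)
    for hi, hj in itertools.combinations(features.values(), 2):
        # d(h_i1,h_j1) + d(h_i1,h_j2) + d(h_i2,h_j1) + d(h_i2,h_j2) for clips i != j
        for a in hi:
            for b in hj:
                total = total + torch.dist(a, b)
    return total
```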
In order to achieve the above embodiments, the present application also provides an image processing apparatus.
Fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
As shown in fig. 5, includes: an acquisition module 51, a generation module 52 and a processing module 53.
The obtaining module 51 is configured to obtain a trained pre-training model, where the pre-training model is trained with multiple frames of training images so that, for the image features output by the trained pre-training model, the difference between a first image feature distance and a second image feature distance is minimized, where the first image feature distance is a distance between image features of training images extracted from the same video clip, and the second image feature distance is a distance between image features of training images extracted from different video clips.
The generating module 52 is configured to generate an image processing model for performing the target image processing task according to the pre-training model.
A processing module 53 for performing a target image processing task on the target image using the image processing model.
Further, in one possible implementation manner of the embodiment of the present application, the generating module 52 is specifically configured to: and acquiring a network layer corresponding to the target image processing task, splicing the pre-training model with the network layer, wherein the input of the network layer is the image characteristics output by the pre-training model, the output of the network layer is the target image task processing result, and training the spliced pre-training model and the network layer by adopting a training set of the target image processing task to obtain the image processing model.
In one possible implementation of an embodiment of the present application, the target image processing task includes an image classification task, a target detection task, or an object recognition task.
It should be noted that the foregoing explanation of the embodiment of the image processing method is also applicable to the image processing apparatus of this embodiment, and the principle is the same, and will not be repeated here.
In the image processing device provided by the embodiment of the application, a trained pre-training model is obtained, where the pre-training model has been trained with multiple frames of training images so that, for the image features output by the trained model, the difference between the first image feature distance and the second image feature distance is minimized. A corresponding image processing model is then generated from the general pre-training model according to the target image processing task, which improves the efficiency of generating the image processing model corresponding to the target processing task, and the generated image processing model is used to execute the target image processing task on the target image.
In order to implement the above embodiment, the present embodiment provides a training device for a pre-training model.
Fig. 6 is a schematic structural diagram of a training device for pre-training a model according to an embodiment of the present application. As shown in fig. 6, the apparatus includes: an acquisition module 61, an extraction module 62 and a training module 63.
The obtaining module 61 is configured to obtain a plurality of video clips.
The extracting module 62 is configured to extract a plurality of training images from a plurality of video segments to obtain a training set, where at least two training images are extracted from each video segment.
The training module 63 is configured to perform a plurality of training processes on the pre-training model for image feature extraction using a training set, where each training process includes: selecting each extracted training image from at least two video clips from the training set; inputting each training image selected in the round into a pre-training model to obtain output image characteristics; according to the image characteristics of each training image selected by the round, determining a first image characteristic distance between training images belonging to the same video segment, and determining a second image characteristic distance between training images belonging to different video segments, and according to the first image characteristic distance and the second image characteristic distance, adjusting model parameters of a pre-training model so as to minimize the difference between the first image characteristic distance and the second image characteristic distance.
In one possible implementation of the embodiment of the present application, the training module 63 is specifically configured to: inputting training images of the pre-training model in the training process of the round, and determining intra-class feature distances among image features of different training images belonging to the same video segment; and determining the sum of the feature distances in the class for at least two video clips selected from the training set in the training process of the round, so as to obtain the first image feature distance.
In one possible implementation of the embodiment of the present application, the training module 63 is specifically configured to:
inputting training images of the pre-training model in the training process of the round, and determining the inter-class feature distance between the image features of different training images belonging to different video clips; and determining the sum of the feature distances between the classes for at least two video clips selected from the training set in the training process of the round, so as to obtain the second image feature distance.
In a possible implementation manner of the embodiment of the present application, the number of frames of the training image extracted from each video clip is the same.
In one possible implementation manner of the embodiment of the present application, the obtaining module 61 is specifically configured to: acquiring a plurality of videos; and performing segmentation processing according to the content difference between adjacent image frames in each video to obtain a plurality of video fragments of each video.
In the training device of the pre-training model, at least two frames of training images are extracted from each of the acquired video clips to obtain multiple frames of training images forming a training set, and the training set is used to perform a multi-round training process on the pre-training model for image feature extraction. In each round, image features are obtained from the training images; a first image feature distance is computed between images belonging to the same video clip and a second image feature distance between images belonging to different video clips; and the parameters of the pre-training model are adjusted continuously so that the difference between the first and second image feature distances is minimized. This realizes the training of a general pre-training model and improves the reliability of the image features produced by the pre-training model.
In order to achieve the above embodiments, an embodiment of the present application provides an electronic device, including:
at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method of the foregoing method embodiment or the training method of the foregoing method embodiment.
In order to achieve the above-described embodiments, embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the image processing method described in the above-described method embodiments or to execute the training method described in the above-described method embodiments.
In order to achieve the above embodiments, embodiments of the present application provide a computer program product, which includes a computer program that, when executed by a processor, performs the image processing method described in the foregoing method embodiment, or performs the training method of the pre-training model described in the foregoing method embodiment.
According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.
Fig. 7 is a block diagram of an electronic device according to an embodiment of the application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 7, the electronic device includes: one or more processors 701, a memory 702, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 701 is illustrated in fig. 7.
Memory 702 is a non-transitory computer readable storage medium provided by the present application. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the image processing method provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to execute the image processing method provided by the present application.
The memory 702 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules (e.g., the acquisition module 51, the generation module 52, and the processing module 53 shown in fig. 5) corresponding to the image processing method according to the embodiment of the present application. The processor 701 executes various functional applications of the server and data processing, i.e., implements the image processing method in the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 702.
Memory 702 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device of the image processing method, and the like. In addition, the memory 702 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 702 may optionally include memory remotely located relative to the processor 701, which may be connected to the image processing method's electronics via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the image processing method may further include: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or otherwise, in fig. 7 by way of example.
The input device 703 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device of the image processing method, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, a joystick, etc. The output device 704 may include a display apparatus, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASIC (application specific integrated circuit), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility of traditional physical hosts and VPS services.
According to the technical scheme provided by the embodiment of the application, a trained pre-training model is obtained, where the pre-training model has been trained with multiple frames of training images so that, for the image features output by the trained model, the difference between the first image feature distance and the second image feature distance is minimized. A corresponding image processing model is then generated from the general pre-training model according to the target image processing task, which improves the efficiency of generating the image processing model corresponding to the target processing task, and the generated image processing model is used to execute the target image processing task on the target image.
It should be noted that, the electronic device may also implement the training method of the pre-training model of the present application, and the principle is the same, and will not be described herein again.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed embodiments are achieved, and are not limited herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (16)

1. An image processing method, comprising:
acquiring a trained pre-training model, wherein the pre-training model is trained by adopting multiple frames of training images, so that, for the image features output by the trained pre-training model, the difference between the first image feature distance and the second image feature distance is minimized; wherein the first image feature distance is a distance between image features of training images extracted from the same video clip, and the second image feature distance is a distance between image features of training images extracted from different video clips;
generating, according to the pre-training model, an image processing model for executing a target image processing task;
executing the target image processing task on a target image using the image processing model;
wherein the number of training image frames extracted from each video clip is the same.
2. The image processing method of claim 1, wherein the generating, according to the pre-training model, an image processing model for executing a target image processing task comprises:
acquiring a network layer corresponding to the target image processing task;
splicing the pre-training model with the network layer; wherein the input of the network layer is the image features output by the pre-training model, and the output of the network layer is the processing result of the target image processing task;
and training the spliced pre-training model and the network layer by adopting the training set of the target image processing task to obtain the image processing model.
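As a non-limiting sketch of the splicing and training described in claim 2 above (assuming a PyTorch-style framework; pretrained_backbone, task_train_loader, and all hyper-parameters are hypothetical placeholders, not specified by the claim):

    import torch
    import torch.nn as nn

    def build_and_finetune(pretrained_backbone, task_train_loader,
                           num_features=16, num_classes=10, epochs=1):
        # Network layer corresponding to the target image processing task.
        head = nn.Linear(num_features, num_classes)
        # Splice: the backbone's output (image features) feeds the network layer,
        # whose output is the processing result of the target task.
        model = nn.Sequential(pretrained_backbone, head)
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
        criterion = nn.CrossEntropyLoss()
        # Train the spliced model on the training set of the target task.
        for _ in range(epochs):
            for images, labels in task_train_loader:
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()
        return model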
3. The image processing method according to claim 1 or 2, wherein the target image processing task includes an image classification task, a target detection task, or an object recognition task.
4. A training method of a pre-training model, comprising:
acquiring a plurality of video clips;
extracting multi-frame training images from the plurality of video clips to obtain a training set; at least two frames of training images are extracted from each video clip;
performing a multi-round training process on a pre-training model for image feature extraction using the training set; wherein each round of training includes: selecting, from the training set, training images extracted from at least two video clips; inputting the training images selected in the round into the pre-training model to obtain output image features; determining, according to the image features of the training images selected in the round, a first image feature distance between training images belonging to the same video clip and a second image feature distance between training images belonging to different video clips; and adjusting model parameters of the pre-training model according to the first image feature distance and the second image feature distance so as to minimize the difference between the first image feature distance and the second image feature distance;
wherein the number of training image frames extracted from each video clip is the same.
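A minimal sketch of one round of the training process of claim 4 above, under stated assumptions: a PyTorch-style framework, Euclidean feature distances, and reading "minimize the difference" as minimizing (first distance minus second distance); model, optimizer, and the frame batches are hypothetical placeholders.

    import torch

    def training_round(model, optimizer, clip_a_frames, clip_b_frames):
        # clip_a_frames / clip_b_frames: the same number of frames from two different clips.
        feats_a = model(clip_a_frames)   # image features of frames from clip A
        feats_b = model(clip_b_frames)   # image features of frames from clip B

        # First image feature distance: frames belonging to the same video clip.
        first = torch.cdist(feats_a, feats_a).sum() + torch.cdist(feats_b, feats_b).sum()
        # Second image feature distance: frames belonging to different video clips.
        second = torch.cdist(feats_a, feats_b).sum()

        loss = first - second            # adjust model parameters to minimize the difference
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()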
5. The training method of claim 4, wherein the determining a first image feature distance between training images belonging to the same video clip comprises:
determining, for the training images input into the pre-training model in the current round of training, intra-class feature distances between image features of different training images belonging to the same video clip;
and determining the sum of the intra-class feature distances over the at least two video clips selected from the training set in the current round of training, so as to obtain the first image feature distance.
6. The training method of claim 4, wherein the determining a second image feature distance between training images belonging to different video clips comprises:
determining, for the training images input into the pre-training model in the current round of training, inter-class feature distances between image features of different training images belonging to different video clips;
and determining the sum of the inter-class feature distances over the at least two video clips selected from the training set in the current round of training, so as to obtain the second image feature distance.
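As an illustrative reading of claims 5 and 6 above (a NumPy sketch; the Euclidean distance and the dictionary layout are assumptions, not specified by the claims):

    import numpy as np

    def intra_and_inter_distance_sums(features_by_clip):
        # features_by_clip: {clip_id: [feature vectors of that clip's training images]}
        clips = list(features_by_clip)
        intra = 0.0  # sum of intra-class feature distances -> first image feature distance
        inter = 0.0  # sum of inter-class feature distances -> second image feature distance
        for c in clips:
            feats = features_by_clip[c]
            for i in range(len(feats)):
                for j in range(i + 1, len(feats)):
                    intra += np.linalg.norm(feats[i] - feats[j])
        for a in range(len(clips)):
            for b in range(a + 1, len(clips)):
                for fa in features_by_clip[clips[a]]:
                    for fb in features_by_clip[clips[b]]:
                        inter += np.linalg.norm(fa - fb)
        return intra, inter

    # Tiny example: two clips, two frames each.
    feats = {"clip_0": [np.array([0.0, 0.0]), np.array([0.1, 0.0])],
             "clip_1": [np.array([1.0, 1.0]), np.array([1.1, 1.0])]}
    print(intra_and_inter_distance_sums(feats))  # small intra-class sum, larger inter-class sum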
7. The training method of any of claims 4-6, wherein the acquiring a plurality of video clips comprises:
acquiring a plurality of videos;
and performing segmentation processing according to the content difference between adjacent image frames in each video to obtain a plurality of video clips of each video.
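A possible sketch of the segmentation of claim 7 above (NumPy; the mean-absolute-pixel-difference measure and the threshold value are assumptions, not specified by the claim):

    import numpy as np

    def segment_video(frames, threshold=30.0):
        # frames: list of decoded frames as uint8 arrays of identical shape.
        clips, current = [], [frames[0]]
        for prev, cur in zip(frames, frames[1:]):
            diff = np.abs(cur.astype(np.float32) - prev.astype(np.float32)).mean()
            if diff > threshold:      # large content difference -> start a new video clip
                clips.append(current)
                current = []
            current.append(cur)
        clips.append(current)
        return clips  # a plurality of video clips of the video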
8. An image processing apparatus comprising:
the acquisition module is used for acquiring a trained pre-training model, wherein the pre-training model is trained with multiple frames of training images so that the image features output by the trained pre-training model minimize the difference between the first image feature distance and the second image feature distance; wherein the first image feature distance is a distance between image features of training images extracted from the same video clip, and the second image feature distance is a distance between image features of training images extracted from different video clips;
the generation module is used for generating an image processing model for executing a target image processing task according to the pre-training model;
the processing module is used for executing a target image processing task on a target image by adopting the image processing model;
wherein the number of training image frames extracted from each video clip is the same.
9. The image processing device according to claim 8, wherein the generation module is specifically configured to:
acquiring a network layer corresponding to the target image processing task;
splicing the pre-training model with the network layer; wherein the input of the network layer is the image features output by the pre-training model, and the output of the network layer is the processing result of the target image processing task;
and training the spliced pre-training model and the network layer by adopting the training set of the target image processing task to obtain the image processing model.
10. The image processing apparatus according to claim 8 or 9, wherein the target image processing task includes an image classification task, a target detection task, or an object recognition task.
11. A training device for a pre-training model, comprising:
the acquisition module is used for acquiring a plurality of video clips;
the extraction module is used for extracting multi-frame training images from the plurality of video clips so as to obtain a training set; at least two frames of training images are extracted from each video clip;
the training module is used for performing a multi-round training process on the pre-training model for image feature extraction using the training set; wherein each round of training includes: selecting, from the training set, training images extracted from at least two video clips; inputting the training images selected in the round into the pre-training model to obtain output image features; determining, according to the image features of the training images selected in the round, a first image feature distance between training images belonging to the same video clip and a second image feature distance between training images belonging to different video clips; and adjusting model parameters of the pre-training model according to the first image feature distance and the second image feature distance so as to minimize the difference between the first image feature distance and the second image feature distance;
wherein the number of training image frames extracted from each video clip is the same.
12. The training device according to claim 11, wherein the training module is specifically configured to:
determining, for the training images input into the pre-training model in the current round of training, intra-class feature distances between image features of different training images belonging to the same video clip;
and determining the sum of the intra-class feature distances over the at least two video clips selected from the training set in the current round of training, so as to obtain the first image feature distance.
13. The training device according to claim 11, wherein the training module is specifically configured to:
determining, for the training images input into the pre-training model in the current round of training, inter-class feature distances between image features of different training images belonging to different video clips;
and determining the sum of the inter-class feature distances over the at least two video clips selected from the training set in the current round of training, so as to obtain the second image feature distance.
14. The training device according to any of claims 11-13, wherein the acquisition module is specifically configured to:
acquiring a plurality of videos; and performing segmentation processing according to the content difference between adjacent image frames in each video to obtain a plurality of video clips of each video.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method of any one of claims 1-3 or the training method of the pre-training model of any one of claims 4-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the image processing method of any one of claims 1-3, or to perform the training method of the pre-training model of any one of claims 4-7.
CN202011249923.2A 2020-11-10 2020-11-10 Image processing method, training method and device of pre-training model and electronic equipment Active CN112561053B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011249923.2A CN112561053B (en) 2020-11-10 2020-11-10 Image processing method, training method and device of pre-training model and electronic equipment
JP2021178829A JP7267379B2 (en) 2020-11-10 2021-11-01 Image processing method, pre-trained model training method, device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011249923.2A CN112561053B (en) 2020-11-10 2020-11-10 Image processing method, training method and device of pre-training model and electronic equipment

Publications (2)

Publication Number Publication Date
CN112561053A CN112561053A (en) 2021-03-26
CN112561053B true CN112561053B (en) 2023-09-08

Family

ID=75041883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011249923.2A Active CN112561053B (en) 2020-11-10 2020-11-10 Image processing method, training method and device of pre-training model and electronic equipment

Country Status (2)

Country Link
JP (1) JP7267379B2 (en)
CN (1) CN112561053B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11501105B2 (en) * 2018-03-02 2022-11-15 Zoox, Inc. Automatic creation and updating of maps
CN116664966B (en) * 2023-03-27 2024-02-20 北京鹰之眼智能健康科技有限公司 Infrared image processing system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008136466A1 (en) 2007-05-01 2008-11-13 Dep Co., Ltd. Dynamic image editing device
US10970823B2 (en) * 2018-07-06 2021-04-06 Mitsubishi Electric Research Laboratories, Inc. System and method for detecting motion anomalies in video
CN109523526B (en) 2018-11-08 2021-10-22 腾讯科技(深圳)有限公司 Tissue nodule detection and model training method, device, equipment and system thereof

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108235770A (en) * 2017-12-29 2018-06-29 深圳前海达闼云端智能科技有限公司 image identification method and cloud system
CN109063719A (en) * 2018-04-23 2018-12-21 湖北工业大学 A kind of image classification method of co-ordinative construction similitude and category information
CN109101602A (en) * 2018-08-01 2018-12-28 腾讯科技(深圳)有限公司 Image encrypting algorithm training method, image search method, equipment and storage medium
CN109165566A (en) * 2018-08-01 2019-01-08 中国计量大学 A kind of recognition of face convolutional neural networks training method based on novel loss function
US10551846B1 (en) * 2019-01-25 2020-02-04 StradVision, Inc. Learning method and learning device for improving segmentation performance to be used for detecting road user events using double embedding configuration in multi-camera system and testing method and testing device using the same
CN111488879A (en) * 2019-01-25 2020-08-04 斯特拉德视觉公司 Method and apparatus for improving segmentation performance using dual-embedding configuration
CN109918532A (en) * 2019-03-08 2019-06-21 苏州大学 Image search method, device, equipment and computer readable storage medium
CN110188641A (en) * 2019-05-20 2019-08-30 北京迈格威科技有限公司 Image recognition and the training method of neural network model, device and system
CN111242026A (en) * 2020-01-13 2020-06-05 中国矿业大学 Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN111553428A (en) * 2020-04-30 2020-08-18 北京百度网讯科技有限公司 Method, device, equipment and readable storage medium for training discriminant model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Beyond sharing weights for deep domain adaptation; Artem Rozantsev; IEEE Transactions on Pattern Analysis and Machine Intelligence; full text *

Also Published As

Publication number Publication date
JP7267379B2 (en) 2023-05-01
CN112561053A (en) 2021-03-26
JP2022006189A (en) 2022-01-12

Similar Documents

Publication Publication Date Title
CN111753727B (en) Method, apparatus, device and readable storage medium for extracting structured information
CN111914628B (en) Training method and device of face recognition model
CN111783647B (en) Training method of face fusion model, face fusion method, device and equipment
KR20220113829A (en) Vehicle tracking methods, devices and electronic devices
CN110659600B (en) Object detection method, device and equipment
CN110717933B (en) Post-processing method, device, equipment and medium for moving object missed detection
CN111709873B (en) Training method and device for image conversion model generator
CN112561053B (en) Image processing method, training method and device of pre-training model and electronic equipment
CN111539347B (en) Method and device for detecting target
EP4080470A2 (en) Method and apparatus for detecting living face
CN112288699B (en) Method, device, equipment and medium for evaluating relative definition of image
JP7389824B2 (en) Object identification method and device, electronic equipment and storage medium
CN111783639A (en) Image detection method and device, electronic equipment and readable storage medium
CN111738263A (en) Target detection method and device, electronic equipment and storage medium
CN111444819B (en) Cut frame determining method, network training method, device, equipment and storage medium
CN111640123A (en) Background-free image generation method, device, equipment and medium
US20220004812A1 (en) Image processing method, method for training pre-training model, and electronic device
CN112508004A (en) Character recognition method and device, electronic equipment and storage medium
CN112561879A (en) Ambiguity evaluation model training method, image ambiguity evaluation method and device
CN111862031A (en) Face synthetic image detection method and device, electronic equipment and storage medium
CN111950345A (en) Camera identification method and device, electronic equipment and storage medium
CN111709288A (en) Face key point detection method and device and electronic equipment
CN112560772B (en) Face recognition method, device, equipment and storage medium
CN111932530B (en) Three-dimensional object detection method, device, equipment and readable storage medium
CN111783644B (en) Detection method, detection device, detection equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant