CN114758130B - Image processing and model training method, device, equipment and storage medium - Google Patents

Image processing and model training method, device, equipment and storage medium

Info

Publication number
CN114758130B
Authority
CN
China
Prior art keywords
image
feature
segmentation
loss function
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210423894.XA
Other languages
Chinese (zh)
Other versions
CN114758130A (en)
Inventor
谷祎
孙准
王晓迪
韩树民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210423894.XA priority Critical patent/CN114758130B/en
Publication of CN114758130A publication Critical patent/CN114758130A/en
Application granted granted Critical
Publication of CN114758130B publication Critical patent/CN114758130B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The disclosure provides an image processing and model training method, an image processing and model training device, equipment and a storage medium, relates to the technical field of artificial intelligence, in particular to the technical field of deep learning, image processing and computer vision, and can be applied to scenes such as OCR, face recognition and the like. The training method of the image feature extraction model comprises the following steps: performing segmentation processing on first image features of the first image to obtain first segmentation features; performing segmentation processing on second image features of a second image to obtain second segmentation features, wherein the first image and the second image are positive samples; constructing a total loss function based on the first image feature, the second image feature, the first segmentation feature, and the second segmentation feature; and training an image feature extraction model based on the total loss function. The present disclosure may improve performance of an image feature extraction model.

Description

Image processing and model training method, device, equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, image processing and computer vision, which can be applied to scenes such as optical character recognition (Optical Character Recognition, OCR), face recognition and the like, and particularly relates to an image processing and model training method, an image processing and model training device, equipment and a storage medium.
Background
In order for a computer to "understand" an image, and thus obtain "vision," the underlying step of image processing is to subject the image to feature extraction processing to convert the image to a representation or description of a non-image, which is referred to as an image feature.
With the development of deep learning technology, an image feature extraction model may be used to perform feature extraction processing on an input image to output corresponding image features.
Disclosure of Invention
The present disclosure provides an image processing and model training method, apparatus, device and storage medium.
According to an aspect of the present disclosure, there is provided a training method of an image feature extraction model, including: performing segmentation processing on first image features of the first image to obtain first segmentation features; performing segmentation processing on second image features of the second image to obtain second segmentation features; constructing a total loss function based on the first image feature, the second image feature, the first segmentation feature, and the second segmentation feature; and training an image feature extraction model based on the total loss function.
According to another aspect of the present disclosure, there is provided a training apparatus of an image feature extraction model, including: the first acquisition module is used for carrying out segmentation processing on first image features of the first image so as to obtain first segmentation features; the second acquisition module is used for carrying out segmentation processing on second image features of the second image so as to obtain second segmentation features; a building module for building a total loss function based on the first image feature, the second image feature, the first segmentation feature, and the second segmentation feature; and the training module is used for training an image feature extraction model based on the total loss function.
According to another aspect of the present disclosure, there is provided an image processing method including: acquiring an image to be processed; extracting image characteristics of the image to be processed based on preset parameters; the preset parameters are obtained based on a first image feature of a first image sample, a second image feature of a second image sample, and a first segmentation feature and a second segmentation feature, wherein the first segmentation feature is obtained after segmentation processing of the first image feature, the second segmentation feature is obtained after segmentation processing of the second image feature, and the first image sample and the second image sample are positive samples; and acquiring a processing result of the image to be processed based on the image characteristics.
According to another aspect of the present disclosure, there is provided an image processing apparatus including: the acquisition module is used for acquiring the image to be processed; the extraction module is used for extracting the image characteristics of the image to be processed based on preset parameters; the preset parameters are obtained based on a first image feature of a first image sample, a second image feature of a second image sample, and a first segmentation feature and a second segmentation feature, wherein the first segmentation feature is obtained after segmentation processing of the first image feature, the second segmentation feature is obtained after segmentation processing of the second image feature, and the first image sample and the second image sample are positive samples; and the processing module is used for acquiring a processing result of the image to be processed based on the image characteristics.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to any one of the above aspects.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the above aspects.
According to the technical scheme, the performance of the image feature extraction model can be improved.
It should be understood that the description in this section is not intended to identify key or critical elements of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become readily apparent from the following description.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a training method for an image feature extraction model provided by an embodiment of the present disclosure;
FIG. 2 is a schematic illustration of an application scenario of a training method for implementing an image feature extraction model of an embodiment of the present disclosure;
FIG. 3 is a flow chart of another training method for an image feature extraction model provided by an embodiment of the present disclosure;
FIG. 4 is a framework diagram corresponding to FIG. 3;
FIG. 5 is a diagram comparing image features extracted by embodiments of the present disclosure with related art;
FIG. 6 is a block diagram of a training device for an image feature extraction model provided by an embodiment of the present disclosure;
FIG. 7 is a flowchart of an image processing method provided by an embodiment of the present disclosure;
FIG. 8 is a block diagram of an image processing apparatus provided in an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of an electronic device for implementing the training method of an image feature extraction model or the image processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the related art, when an image feature extraction model is trained, the granularity is coarse, and the performance is not ideal.
In order to improve the performance of the image feature extraction model, the present disclosure provides the following embodiments.
Fig. 1 is a flowchart of a training method of an image feature extraction model according to an embodiment of the disclosure, where, as shown in fig. 1, the method of the embodiment includes:
101. the first image feature of the first image is segmented to obtain a first segmented feature.
102. And carrying out segmentation processing on second image features of a second image to obtain second segmentation features, wherein the first image and the second image are positive samples of each other.
103. A total loss function is constructed based on the first image feature, the second image feature, the first segmentation feature, and the second segmentation feature.
104. And training an image feature extraction model based on the total loss function.
The execution body of the embodiment may be a training device of the image feature extraction model, where the training device may be located in an electronic device, and the electronic device may be a user terminal or a server.
The first image and the second image are a pair of samples for model training, and are positive samples of each other. They can be obtained from an existing sample set or generated by a preset generation method.
The first image features are image features of the first image and can be obtained by performing feature extraction processing on the first image; the second image feature is an image feature of the second image, and can be obtained by performing feature extraction processing on the second image.
The feature obtained by subjecting the first image feature to the segmentation process may be referred to as a first segmentation feature, and the feature obtained by subjecting the second image feature to the segmentation process may be referred to as a second segmentation feature.
Since the segmented features (first and second segmented features) are obtained by segmenting the image features (first and second image features), in general, the dimensions of the segmented features are smaller than those of the image features, and thus, the image features may be regarded as coarse-grained features, and the segmented features may be regarded as fine-grained features.
For example, an image feature is generally a feature map, and thus the feature map can be divided (split) into a plurality of blocks (patches), each patch serving as a group of segmentation features.
Specifically, for example, if the dimension of the feature map is N×N, then after the segmentation process the dimension of each segmentation feature may be M×M, where M and N are both positive integers and M < N; in particular, if the feature map is divided into 4 blocks, M = N/2.
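For illustration only, the even split of an N×N feature map into four M×M blocks described above might be sketched in PyTorch as follows; the channel count, map size and function name are assumptions of this sketch, not part of the claimed method.

```python
# Sketch only: evenly split a (C, N, N) feature map into four (C, M, M) blocks, M = N/2.
import torch

def split_feature_map(feature_map: torch.Tensor, blocks_per_side: int = 2):
    """feature_map: (C, N, N) -> list of blocks_per_side**2 blocks of shape (C, M, M)."""
    _, n, _ = feature_map.shape
    m = n // blocks_per_side                       # M = N / 2 when divided into 4 blocks
    blocks = []
    for row in range(blocks_per_side):
        for col in range(blocks_per_side):
            blocks.append(feature_map[:, row * m:(row + 1) * m, col * m:(col + 1) * m])
    return blocks

feature_map = torch.randn(256, 8, 8)               # assumed example: C=256, N=8
q1, q2, q3, q4 = split_feature_map(feature_map)    # four fine-grained segmentation features
```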
After the first image feature, the second image feature, the first segmentation feature, and the second segmentation feature are obtained, a total loss function may be constructed based on the features, and an image feature extraction model may be trained based on the total loss function.
The model training process is the process of adjusting model parameters. For example, the model initially has initial parameters; the initial parameters may be adjusted based on the total loss function until a preset number of iterations is reached, at which point the final model parameters are obtained, i.e., the final model is generated.
Since the image features correspond to whole images, they can be considered instance-level and coarse-grained; the segmentation features are obtained by segmenting the image features, and can be regarded as patch-level and fine-grained; thus, features of multiple levels can be obtained.
In this embodiment, since the first image feature and the second image feature are of sample level, and the first segmentation feature and the second segmentation feature are of block level, a total loss function can be built based on features of multiple levels, so as to train an image feature extraction model.
In order to better understand the embodiments of the present disclosure, application scenarios of the embodiments of the present disclosure are described below.
Fig. 2 is a schematic diagram of an application scenario of a training method for implementing an image feature extraction model of an embodiment of the present disclosure.
As shown in fig. 2, the system architecture corresponding to the application environment may include a user terminal 201 and a server 202, where the user terminal 201 and the server 202 communicate using a communication network. The user terminal 201 may include: personal computer (Personal Computer, PC), mobile device, smart home device, wearable device, etc., mobile device includes cell phone, portable computer, tablet computer, etc., smart home device includes smart speaker, smart television, etc., and wearable device includes smart watch, smart glasses, etc. The server 202 may be a local server or a cloud server, may be a single server or a cluster server, etc. The communication network may be a wired (e.g., fiber optic) and/or wireless (e.g., wifi) communication link.
The user terminal faces to the user, and can acquire data submitted by the user and/or display the data to the user. For example, the user may select an image sample to be processed through a user interface of the user terminal, and the user terminal transmits the image sample to the server.
The server provides various services, for example, during a training phase, the server may train an image feature extraction model based on image samples sent by the user terminal. In the application stage, the server may perform image feature extraction processing on the image to be processed sent by the user terminal based on the generated image feature extraction model, so as to obtain image features of the image to be processed, and perform subsequent procedures based on the image features, for example, performing OCR, face recognition, and the like based on the image features.
In addition, the image processing process may also be performed at the user terminal side, for example, the server may also send the image feature extraction model generated after training to the user terminal. In the application stage, the user terminal may extract image features using the image feature extraction model, perform subsequent processes based on the image features, such as OCR, face recognition, and the like, based on the image features.
The training method of the image feature extraction model according to the embodiment of the present disclosure is described below with reference to the above application scenario:
generally, the model training process is performed on the server side, i.e., the server may receive image samples sent by the user terminal, train image feature extraction models based on the image samples.
The first image and the second image are image samples trained as a model, and may be sent to the server by the user terminal. Alternatively, the server may generate the first image and the second image based on a certain image sample transmitted from the user terminal.
The first image and the second image may be positive samples of each other, where positive samples refer to samples that are semantically similar. For example, if the first image is a kitten image, the second image is also a kitten image (possibly at a different angle), and a third image is a puppy image, then the first image and the second image are positive samples of each other, and the third image is a negative sample.
The image feature extraction model may also be referred to as an encoder (encoder).
After the server obtains the first image and the second image, an initial encoder may be used to extract image features of the first image and the second image, respectively, so as to obtain the first image feature and the second image feature.
The initial encoder may be randomly initialized or may be an existing pre-trained model.
After the first image feature and the second image feature are obtained, segmentation processing may be performed to obtain the first segmentation features and the second segmentation features, respectively. The segmentation processing may specifically be an even division: for example, the number of blocks may be preset, for example 4 blocks, and the first image feature and the second image feature may then each be evenly divided into 4 blocks, each block being a group of segmentation features.
It will be appreciated that the above segmentation processing may be performed one or more times. For example, if the first image feature is feature A, feature A may be divided to obtain feature A1, where feature A1 is a group of segmentation features; feature A1 may then be further divided to obtain feature A2, so that image features of more levels can be obtained. For simplicity of explanation, a single division is taken as an example below.
After the image features (first image feature, second image feature, first segmentation feature, second segmentation feature) are obtained, a total loss function may be constructed based on the image features.
Wherein a first loss function may be constructed based on the first image feature and the second image feature; constructing a second loss function based on the first segmentation feature and the second segmentation feature; a total loss function is constructed based on the first loss function and the second loss function.
In the above example, the first image feature and the second image feature are used to construct the first loss function, so that a loss corresponding to features of one level (for example, the sample level) is obtained; the first segmentation feature and the second segmentation feature are used to construct the second loss function, so that a loss corresponding to features of another level (for example, the block level) is obtained; and the total loss function is constructed based on the first loss function and the second loss function. The total loss function is therefore built on features of multiple levels and can contain information of multiple levels, so that a model with better performance can be trained.
Self-supervised learning is a machine learning paradigm that can obtain a model with good performance without large-scale manually annotated data. Contrast learning is a form of self-supervised learning whose core idea is to pull positive samples closer together and push positive samples farther away from negative samples.
In computer vision, image features can be better extracted by adopting an image feature extraction model obtained by contrast learning.
For the contrast learning algorithm, a third image feature may also be obtained, where the third image feature refers to an image feature of a negative sample of the first image, for example, in the above example, the first image and the second image are both cat images, and the dog image may be taken as the negative sample, and correspondingly, the third image feature refers to an image feature of the dog image.
Accordingly, a contrast learning algorithm may be employed to construct a first loss function based on the first image feature, the second image feature, and the third image feature.
In the above example, the first loss function is constructed by adopting the contrast learning algorithm, so that the excellent performance of the contrast learning algorithm can be utilized, and the contrast learning algorithm is a self-supervision algorithm, so that the data do not need to be marked, the workload of manual marking can be reduced, and the training efficiency can be improved; in addition, the contrast learning algorithm is adopted to compare the difference between the positive and negative samples, and an image feature extraction model with better accuracy can be obtained based on the difference.
The total loss function may be a sum of a first loss function constructed based on coarse-grain level features (first image features, second image features, and third image features) of the positive and negative samples and a second loss function constructed based on fine-grain level features (first segmentation features and second segmentation features) of the positive samples. Thus, the total loss function will contain multiple levels of features.
In addition, one core idea of contrast learning is to pull positive samples closer together, which is reflected in both the first loss function and the second loss function; the other core idea is to push positive and negative samples farther apart, which is reflected in the first loss function.
When the image feature extraction model is trained based on the total loss function, the gradient of the total loss function may be calculated, and the model parameters of the image feature extraction model may be updated by means of a back propagation (BP) algorithm until the preset number of iterations is reached.
The first image feature and the second image feature are the image features of the first image and the second image, respectively. When extracting the image features, the first image and the second image may share the same image feature extraction model; in this case, the BP algorithm may be used to update the model parameters of the shared image feature extraction model until the final image feature extraction model is generated. Alternatively:
When extracting image features, the first image and the second image may adopt different image feature extraction models, for example, the first image adopts the first image feature extraction model to extract the first image features, and the second image adopts the second image feature extraction model to extract the second image features. At this time, the BP algorithm may be used to update the model parameters of one of the image feature extraction models, for example, the BP algorithm may be used to update the model parameters of the first image feature extraction model, and the model parameters of the other image feature extraction model may be updated in other manners. Accordingly, one of the first image feature extraction model and the second image feature extraction model may be used as a final image feature extraction model, for example, the finally generated first image feature extraction model may be used as a final image feature extraction model.
The following describes an example in which the first image and the second image respectively use different image feature extraction models to extract image features, and a contrast learning algorithm is used to construct a first loss function.
Since the image feature extraction model may also be referred to as an encoder, the different image feature extraction models described above may be referred to as a first encoder and a second encoder, respectively, and then the first encoder generated in the training stage may be selected as the encoder used in the application stage, i.e. the final image feature extraction model is the first encoder.
The structure of the image feature extraction model (the first encoder and the second encoder) may be an existing structure, such as MoCo v2.
Fig. 3 is a flowchart of another training method of an image feature extraction model provided by an embodiment of the present disclosure, and fig. 4 is a system architecture diagram corresponding to fig. 3. As shown in fig. 3, and with reference to the architecture shown in fig. 4, the method of this embodiment includes:
301. a first image and a second image are acquired, the first image and the second image being positive samples of each other.
Here, positive samples refer to semantically similar samples, and negative samples refer to semantically dissimilar samples (which may manifest as semantic features that are approximately orthogonal to each other).
Specifically, positive and negative samples can be obtained by manually collecting or labeling from an existing image sample set.
Alternatively, for positive samples, they may be obtained based on the same image.
Specifically, two different data enhancement processes may be performed on the same image to obtain the first image and the second image, respectively.
Wherein the same image may be an image in a sample set of existing images.
Typical data enhancement methods in computer vision include flipping (Flip), rotation (Rotate), scaling (Scale), random cropping or zero padding (Random Crop or Pad), color jittering (Color Jitter), noise (Noise), and the like.
In addition, the two different data enhancement processing manners of the present embodiment may further include a processing manner of keeping the image unchanged.
For example, for the same image (referred to as the original image), the original image may be kept unchanged and used as the first image, while another image obtained by flipping the original image may be used as the second image.
Referring to fig. 4, the first image and the second image may be two images of kittens at different angles.
In the above example, by performing different data enhancement processing on the same image to obtain the first image and the second image, multiple first images and second images which are positive samples can be obtained when the number of samples is small, so that the sample size required by model training is reduced, enough positive samples can be obtained on the basis of the small sample size through different data enhancement processing, model training on the basis of small samples is completed, various costs required by collecting samples with large data size are reduced, and training efficiency is improved.
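As a non-limiting sketch of the two data enhancement pipelines described above (one view kept essentially unchanged, the other flipped and color-jittered), torchvision transforms could be used as follows; the specific transforms, image size and file path are assumptions of this example.

```python
# Sketch only: two different data enhancement pipelines applied to the same original image
# to obtain a pair of positive samples (first image / second image).
from PIL import Image
from torchvision import transforms

augment_first = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),                          # view 1: original image kept unchanged
])
augment_second = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=1.0),         # view 2: flipped ...
    transforms.ColorJitter(0.4, 0.4, 0.4),          # ... and color-jittered
    transforms.ToTensor(),
])

original = Image.open("kitten.jpg").convert("RGB")  # hypothetical sample image path
first_image = augment_first(original)
second_image = augment_second(original)             # positive sample of first_image
```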
302. And adopting a first encoder to perform feature extraction processing on the input first image so as to obtain first image features.
Wherein, referring to fig. 4, the input of the first encoder is a first image, the output is a first image feature, and the first encoder performs image feature extraction processing on the first image.
303. And carrying out segmentation processing on the first image feature to obtain a first segmentation feature.
Generally, the image feature is a feature map (feature map), as shown in fig. 4.
That is, the first image feature is a first image feature map, the second image feature is a second image feature map, and the first image feature map and the second image feature map have the same dimensions. Segmenting the first image feature of the first image to obtain the first segmentation features includes: evenly dividing the first image feature map into a first number of blocks, and using the first number of blocks as the first segmentation features. Segmenting the second image feature of the second image to obtain the second segmentation features includes: evenly dividing the second image feature map into a second number of blocks, and using the second number of blocks as the second segmentation features; the first number is the same as the second number.
For example, referring to fig. 4, the first image feature may be evenly divided into 4 segments, each segment being a set of first segmentation features.
Since the image features are typically feature maps, fine-grained segmentation features can be easily obtained by means of average segmentation.
The above describes the processing flow of the first image, and the processing flow of the second image is similar, that is, may further include:
304. and adopting the second encoder to perform feature extraction processing on the input second image so as to obtain the second image feature.
305. And performing segmentation processing on the second image features of the second image to obtain second segmentation features.
There is no timing constraint between steps 302-303 and steps 304-305.
In addition, it will be appreciated that in acquiring image features, other general steps may be included in addition to the processing of the encoder, such as pooling of the feature map output by the encoder and multi-layer perceptron (Multi Layer Perceptron, MLP) processing. Wherein pooling may specifically be average pooling (avg pooling).
After the above general steps such as average pooling and MLP processing, the features in feature-map form can be converted into vector form. For example, referring to fig. 4, the first image feature is denoted by q, the second image feature is denoted by k, the first segmentation features are denoted by q1 to q4, the second segmentation features are denoted by k1 to k4, and q, k, q1 to q4, and k1 to k4 are all in vector form.
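A minimal sketch of these general steps (average pooling followed by an MLP, producing vector features such as q and k) might look as follows in PyTorch; the channel and output dimensions are assumptions of the sketch.

```python
# Sketch only: average pooling + MLP head that turns an encoder feature map into a vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    def __init__(self, in_channels: int = 256, hidden_dim: int = 256, out_dim: int = 128):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # average pooling over the feature map
        self.mlp = nn.Sequential(nn.Linear(in_channels, hidden_dim),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(hidden_dim, out_dim))

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        # feature_map: (B, C, H, W) -> normalised vector (B, out_dim), e.g. q or k
        v = self.pool(feature_map).flatten(1)
        return F.normalize(self.mlp(v), dim=1)

head = ProjectionHead()
q = head(torch.randn(1, 256, 8, 8))                  # vector form of an image feature
```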
In the above example, the first encoder and the second encoder are used to perform the image feature extraction processing on the first image and the second image, so that the first encoder and the second encoder can be adjusted by different adjustment modes. By using two encoders, the first encoder and the second encoder can use respective suitable model parameter adjustment modes to improve respective performances.
306. A third image feature of a negative sample of the first image is acquired.
The third image feature may be acquired from a storage queue (memory queue).
Specifically, the third image feature may be obtained in advance and stored in the storage queue, so that it can be acquired from the storage queue. The number of negative samples may be one or more.
Regarding the acquisition of the third image feature, image features that have already been generated for previous samples may be used as image features of negative samples of the current sample, that is, as the third image feature.
Specifically, the model training process is a process of multiple iterations, with different samples taken for each iteration process. For example, the first image and the second image in the first iteration process are respectively represented by A1 and B1, so that the first image feature corresponding to A1 and the second image feature corresponding to B1 can be obtained in the first iteration process, and at this time, the second image feature corresponding to B1 can be stored in a storage queue; the first image and the second image in the second iteration process are respectively represented by A2 and B2, so that the first image feature corresponding to A2 and the second image feature corresponding to B2 can be obtained in the second iteration process, and at this time, the third image feature corresponding to A2 can be selected as the second image feature of B1 generated in the first iteration process. That is, the image features of the positive samples in the previous iteration process may be taken as the image features of the negative samples in the subsequent iteration process.
The length of the storage queue is a set integer value. Assuming that the length of the storage queue is K (K is a positive integer) and the current iteration is the T-th iteration (T is a positive integer), the second image features generated in the (T-K)-th to (T-1)-th iterations can be used as the third image features of the current iteration. It will be appreciated that a preset initial value may be used if the second image features of one or more previous iterations have not yet been generated, for example when the current iteration is the first iteration.
Accordingly, the second image feature generated by the current iterative process may be stored in a storage queue as the third image feature of the subsequent iterative process. For example, referring to fig. 4, the currently generated second image feature k may be stored in a store queue.
In the above example, by acquiring the third image feature from the storage queue, the third image feature that has been generated in advance can be acquired, and the third image feature does not have to be generated in real time, so that the efficiency of acquiring the third image feature can be improved, and the amount of computation can be reduced.
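The storage queue described above can be sketched as a fixed-length first-in-first-out buffer of previously generated second image features; the queue length, feature dimension and the use of random preset initial values are assumptions of this sketch.

```python
# Sketch only: fixed-length storage queue whose entries serve as third image features
# (negative-sample features) in later iterations.
import torch
import torch.nn.functional as F

class FeatureQueue:
    def __init__(self, length: int = 4096, dim: int = 128):
        # preset initial values, used before enough second image features have been generated
        self.buffer = F.normalize(torch.randn(length, dim), dim=1)
        self.ptr = 0

    def negatives(self) -> torch.Tensor:
        """Return the K stored features, used as the third image features."""
        return self.buffer.clone()

    def enqueue(self, k: torch.Tensor) -> None:
        """Store the current iteration's second image features (B, dim), overwriting the oldest entries."""
        b = k.shape[0]
        idx = (torch.arange(b) + self.ptr) % self.buffer.shape[0]
        self.buffer[idx] = k.detach()
        self.ptr = int((self.ptr + b) % self.buffer.shape[0])
```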
307. And adopting a contrast learning algorithm to construct a first loss function based on the first image feature, the second image feature and the third image feature.
The core idea of contrast learning is to pull positive samples closer together and push negative samples farther away, so a first loss function that characterizes this idea can be chosen.
For example, the calculation formula of the first loss function may be:
L1 = -log [ exp(q·k/τ) / ( exp(q·k/τ) + Σ_{i=1..K} exp(q·k_i^-/τ) ) ]
where L1 is the first loss function; q is the first image feature, k is the second image feature, and τ is a preset temperature parameter; K is the length of the storage queue, i.e., the number of negative samples; and k_i^- is the image feature of the i-th negative sample, i.e., the third image feature.
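For illustration, the first loss function above can be computed in its usual cross-entropy form (the positive pair as class 0 against the K queued negatives); batching and the temperature value are assumptions of this sketch.

```python
# Sketch only: first loss function L1 in the form given above.
# q, k: (B, D) vectors of the positive pair; negatives: (K, D) third image features; tau = τ.
import torch
import torch.nn.functional as F

def first_loss(q: torch.Tensor, k: torch.Tensor, negatives: torch.Tensor, tau: float = 0.07):
    pos = torch.einsum("bd,bd->b", q, k).unsqueeze(1)    # q·k, shape (B, 1)
    neg = torch.einsum("bd,kd->bk", q, negatives)        # q·k_i^-, shape (B, K)
    logits = torch.cat([pos, neg], dim=1) / tau
    labels = torch.zeros(q.shape[0], dtype=torch.long)   # the positive sits at index 0
    # cross-entropy at index 0 equals -log( exp(q·k/τ) / Σ exp(·/τ) ), i.e. L1 above
    return F.cross_entropy(logits, labels)
```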
308. A second loss function is constructed based on the first segmentation feature and the second segmentation feature.
The second loss function is a loss function capable of characterizing the pulling of the positive samples closer together.
For example, the second loss function is KL (Kullback-Leibler) divergence.
Since the segmentation features (the first segmentation features and the second segmentation features) comprise multiple groups, for example the four groups shown in fig. 4, when calculating the KL divergence, a KL divergence may be calculated between each group of first segmentation features and the corresponding group of second segmentation features to obtain multiple KL divergences, and the multiple KL divergences are then summed to obtain the second loss function.
Based on the four groups of segmentation features shown in fig. 4, the calculation formula of the second loss function may be:
L2 = L_KL(q1, k1) + L_KL(q2, k2) + L_KL(q3, k3) + L_KL(q4, k4)
where L2 is the second loss function, and L_KL(qi, ki) is the KL divergence between the i-th group of first segmentation features and the i-th group of second segmentation features, i = 1, 2, 3, 4.
The calculation formula of L_KL(qi, ki) is:
L_KL(qi, ki) = Σ_{x=1..M} qi(x) · log( qi(x) / ki(x) )
where M is the dimension of the first segmentation feature qi and of the second segmentation feature ki, qi(x) is the element of the x-th dimension of qi, and ki(x) is the element of the x-th dimension of ki.
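A sketch of the second loss function follows; normalising each block vector with a softmax so that qi and ki form distributions is an assumption of this example, since the text only specifies the KL-divergence form.

```python
# Sketch only: second loss function L2 as the sum of KL divergences over the four groups
# of segmentation features (q1..q4 against k1..k4).
import torch
import torch.nn.functional as F

def second_loss(q_blocks, k_blocks):
    """q_blocks, k_blocks: lists of four (M,) block vectors."""
    total = torch.tensor(0.0)
    for qi, ki in zip(q_blocks, k_blocks):
        p_q = F.softmax(qi, dim=0)                       # qi(x) treated as a distribution
        p_k = F.softmax(ki, dim=0)                       # ki(x) treated as a distribution
        total = total + torch.sum(p_q * torch.log(p_q / p_k))   # L_KL(qi, ki)
    return total

q_blocks = [torch.randn(128) for _ in range(4)]
k_blocks = [torch.randn(128) for _ in range(4)]
L2 = second_loss(q_blocks, k_blocks)
```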
There is no timing constraint between steps 307 and 308.
309. A total loss function is constructed based on the first loss function and the second loss function.
The calculation formula of the total loss function may be:
L=L1+L2;
where L is the total loss function, L1 is the first loss function, and L2 is the second loss function.
It will be appreciated that this embodiment takes the case where the first image and the second image are positive samples of each other as an example. In other examples, the first image and the second image may be negative samples of each other; in that case, when the first loss function is constructed, the image features of a positive sample of the first image may be obtained, and the first loss function is then constructed by contrast learning based on the image features of the positive and negative samples. For the second loss function, however, if the first image and the second image are negative samples of each other, the above loss function that pulls positive samples closer cannot be used; instead, since the two are negative samples, their distance needs to be pushed farther apart, so a loss function capable of characterizing the pushing apart of the two can be used.
310. And adjusting model parameters of the first encoder based on the total loss function to obtain adjusted model parameters of the first encoder.
Wherein, as shown in fig. 4, a BP algorithm may be used to update the model parameters of the first encoder with the gradient values of the total loss function.
311. And obtaining the adjusted model parameters of the second encoder based on the model parameters of the second encoder before adjustment and the adjusted model parameters of the first encoder.
Wherein, for the first encoder, a normal BP algorithm can be adopted to adjust the model parameters.
For the second encoder, the original parameters of the second encoder may be largely preserved.
Specifically, a weighted summation operation may be performed on the model parameters before adjustment of the second encoder and the model parameters after adjustment of the first encoder, so as to obtain the model parameters after adjustment of the second encoder; the model parameters before adjustment of the second encoder correspond to first weight values, the model parameters after adjustment of the first encoder correspond to second weight values, and the first weight values are larger than the second weight values.
For example, the first weight value is 0.9, and the second weight value is 0.1.
In the above example, the model parameters of the first encoder are adjusted based on the total loss function, and the information of each iteration process can be continuously learned; the model parameters of the second encoder are determined based on the parameters of the first encoder and the original parameters of the second encoder, so that the information of the first encoder can be learned and the information of the second encoder can be reserved; therefore, the first encoder and the second encoder can learn information of different aspects respectively, so that the whole training process contains more information as a whole, and the performance of a final image feature extraction model is improved.
In the above example, for the second encoder, the weight value corresponding to the model parameter of the second encoder is relatively large during the weighting operation, so that the model parameter of the second encoder is slowly changed, so that the stability can be improved, and the good performance of the second encoder can be maintained.
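The weighted-summation update of the second encoder can be sketched as a momentum-style parameter copy; the weight values 0.9/0.1 are taken from the example above, and the encoder objects themselves are assumed to be ordinary PyTorch modules.

```python
# Sketch only: adjusted second-encoder parameters = 0.9 * old second-encoder parameters
#                                                  + 0.1 * adjusted first-encoder parameters.
import torch

@torch.no_grad()
def update_second_encoder(first_encoder, second_encoder,
                          first_weight: float = 0.9, second_weight: float = 0.1):
    for p_second, p_first in zip(second_encoder.parameters(), first_encoder.parameters()):
        p_second.data = first_weight * p_second.data + second_weight * p_first.data
```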
Steps 301 to 311 show the flow of one iteration; when the model is trained, multiple iterations can be executed until the preset number of iterations is reached.
Then, the first encoder reaching the preset iteration number can be used as a final image feature extraction model for the application stage.
In order to better understand the performance of the image feature extraction model generated in this embodiment, a comparison chart is given in fig. 5.
As shown in fig. 5, (a) is an image whose features are to be extracted, (b) shows the image features of the image in (a) extracted using a related-art model that does not consider multi-level features, and (c) shows the image features of the image in (a) extracted using the model generated with multi-level features in the embodiment of the present disclosure.
As can be seen from fig. 5, with the image features extracted by the model of the embodiment of the present disclosure, the distinction between the image features of the target (the bathtub) and the image features of the background (such as the ground) is stronger; that is, the model of this embodiment has a stronger ability to capture key information and a stronger feature expression capability. It will be appreciated that, limited by the grayscale representation required for the drawings, a color image would in practice show the above distinction even more clearly.
In addition, table 1 gives a comparison of some performance parameters of the model of the presently disclosed embodiments with the related art model based on the sample set Imagenet 100.
TABLE 1
In Table 1, L1 corresponds to the performance of the related art, and L1+L2 corresponds to the performance of the embodiment of the present disclosure. As shown in Table 1, the performance of the embodiment of the present disclosure (the last two rows) is better, and the performance improves further as the number of iterations increases (the last row uses a larger number of iterations).
Fig. 6 is a block diagram of a training apparatus for an image feature extraction model according to an embodiment of the present disclosure. As shown in fig. 6, the training apparatus 600 of the image feature extraction model includes: a first obtaining module 601, a second obtaining module 602, a construction module 603 and a training module 604.
The first obtaining module 601 is configured to perform segmentation processing on a first image feature of a first image to obtain a first segmentation feature; the second obtaining module 602 is configured to perform segmentation processing on a second image feature of a second image to obtain a second segmented feature, where the first image and the second image are positive samples; a construction module 603 is configured to construct a total loss function based on the first image feature, the second image feature, the first segmentation feature, and the second segmentation feature; the training module 604 is configured to train the image feature extraction model based on the total loss function.
In this embodiment, since the first image feature and the second image feature are of sample level, and the first segmentation feature and the second segmentation feature are of block level, the total loss function can be constructed based on the features of multiple levels, so as to train the image feature extraction model, and since the features of multiple levels are referred to during model training, the performance of the image feature extraction model can be improved.
In some embodiments, the construction module 603 is further configured to: construct a first loss function based on the first image feature and the second image feature; construct a second loss function based on the first segmentation feature and the second segmentation feature; and construct a total loss function based on the first loss function and the second loss function.
In this embodiment, a first loss function is constructed by using a first image feature and a second image feature, so that a first loss function corresponding to a feature of one level (for example, a sample level) can be obtained, a second loss function corresponding to a feature of another level (for example, a block level) can be obtained by using a first segmentation feature and a second segmentation feature, and a total loss function is constructed based on the first loss function and the second loss function, so that the total loss function is constructed based on a feature of multiple levels, and can contain information of multiple levels, so that a model with better performance can be trained.
In some embodiments, the first image and the second image are positive samples of each other; the apparatus 600 further comprises: a third acquisition module for acquiring a third image feature of a negative sample of the first image; and the construction module 603 is further configured to construct the first loss function based on the first image feature, the second image feature, and the third image feature by using a contrast learning algorithm.
In this embodiment, the first loss function is constructed by adopting the contrast learning algorithm, so that the excellent performance of the contrast learning algorithm can be utilized, and an image feature extraction model with better performance can be obtained.
In some embodiments, the third acquisition module is further configured to acquire the third image feature from a preset storage queue.
In this embodiment, by acquiring the third image feature from the storage queue, the third image feature that has been generated in advance can be acquired, and the third image feature does not need to be generated in real time, so that the efficiency of acquiring the third image feature can be improved, and the operation amount can be reduced.
In some embodiments, the image feature extraction model is the first encoder or the second encoder; the apparatus 600 further comprises: a first encoding module for performing feature extraction processing on the input first image by using the first encoder to obtain the first image feature; and a second encoding module for performing feature extraction processing on the input second image by using the second encoder to obtain the second image feature.
In this embodiment, the first encoder and the second encoder are used to perform image feature extraction processing on the first image and the second image, so that different adjustment modes can be used to adjust the first encoder and the second encoder. By using two encoders, the first encoder and the second encoder can use respective suitable model parameter adjustment modes to improve respective performances.
In some embodiments, the training module 604 is further configured to: adjusting model parameters of the first encoder based on the total loss function to obtain adjusted model parameters of the first encoder; and obtaining the adjusted model parameters of the second encoder based on the model parameters of the second encoder before adjustment and the adjusted model parameters of the first encoder.
In this embodiment, the model parameters of the first encoder are adjusted based on the total loss function, so that the information of each iteration process can be continuously learned; the model parameters of the second encoder are determined based on the parameters of the first encoder and the original parameters of the second encoder, so that the information of the first encoder can be learned and the information of the second encoder can be reserved; therefore, the first encoder and the second encoder can learn information of different aspects respectively, so that the whole training process contains more information as a whole, and the performance of a final image feature extraction model is improved.
In some embodiments, the training module 604 is further configured to: perform a weighted summation operation on the model parameters of the second encoder before adjustment and the adjusted model parameters of the first encoder to obtain the adjusted model parameters of the second encoder; wherein the model parameters of the second encoder before adjustment correspond to a first weight value, the adjusted model parameters of the first encoder correspond to a second weight value, and the first weight value is larger than the second weight value.
In this embodiment, for the second encoder, the weight value corresponding to the model parameter of the second encoder is relatively large during the weighting operation, so that the model parameter of the second encoder is slowly changed, so that the stability can be improved, and the good performance of the second encoder can be maintained.
In some embodiments, the first image feature is a first image feature map, the second image feature is a second image feature map, and the first image feature map and the second image feature map have the same dimensions; the first obtaining module 601 is further configured to: evenly divide the first image feature map into a first number of blocks, and use the first number of blocks as the first segmentation features; the second obtaining module 602 is further configured to: evenly divide the second image feature map into a second number of blocks, and use the second number of blocks as the second segmentation features; wherein the first number is the same as the second number.
In this embodiment, since the image features are generally feature maps, fine-grained segmentation features can be easily obtained by means of average segmentation.
In some embodiments, the apparatus 600 further comprises: a data enhancement module for performing two different data enhancement processes on the same image to obtain the first image and the second image.
In this embodiment, by performing different data enhancement processing on the same image to obtain the first image and the second image, multiple first images and second images which are positive samples can be obtained when the number of samples is small, so that the required sample size is reduced, model training on the basis of small samples is realized, and training efficiency is improved.
The above describes a model training process by which an image feature extraction model can be obtained. The image feature extraction model may be used in an image processing process.
Fig. 7 is a flowchart of an image processing method according to an embodiment of the present disclosure, and as shown in fig. 7, the image processing method may include:
701. and acquiring an image to be processed.
702. Extracting image characteristics of the image to be processed based on preset parameters; the preset parameters are obtained based on a first image feature of a first image sample, a second image feature of a second image sample, and a first segmentation feature and a second segmentation feature, wherein the first segmentation feature is obtained after segmentation processing is performed on the first image feature, the second segmentation feature is obtained after segmentation processing is performed on the second image feature, and the first image sample and the second image sample are positive samples.
703. And acquiring a processing result of the image to be processed based on the image characteristics.
In this embodiment, since the segmentation feature is obtained by performing segmentation processing on the image feature, the image feature may be considered as a coarse-grained feature, and the segmentation feature may be a fine-grained feature; because the preset parameters are determined based on the image features and the segmentation features, the preset parameters can be considered to contain various levels of information, so that when the features of the image to be processed are extracted by adopting the preset parameters, the various levels of features can be extracted, and further, when the processing results are obtained based on the various levels of features, more accurate processing results can be obtained. Therefore, the image processing method improves the accuracy of image processing.
In some embodiments, the preset parameters may refer to model parameters of an image feature extraction model, which may be generated during a training phase, and the training process of the image feature extraction model may be referred to in the related embodiments above.
In the application stage, the input of the image feature extraction model is an image to be processed, and the output is the image feature of the image to be processed.
Image processing may be applied to various relevant scenarios, such as OCR, face recognition, object detection, etc.
Taking face recognition as an example, the image to be processed may be a face image. Accordingly, the image features may be image features of a face image.
Based on the difference of application scenes, the image features can be input into a model of a related downstream task for processing so as to output a processing result.
Taking face recognition as an example, face recognition can be regarded as a classification task; thus, the image features can be input into a classification model, and the output of the classification model is the face recognition result, i.e., a determination of whose face image it is among a plurality of candidates. The specific structure of the classification model may be implemented using various related techniques, for example a fully connected network.
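As a small sketch of this downstream step, the extracted image feature could be fed into a fully connected classifier; the feature dimension and the number of candidate identities are assumptions of the example.

```python
# Sketch only: face recognition as a classification task over the extracted image feature.
import torch
import torch.nn as nn

num_candidates = 1000                       # assumed number of candidate identities
classifier = nn.Linear(128, num_candidates) # simple fully connected classification model

image_feature = torch.randn(1, 128)         # feature output by the trained extraction model
logits = classifier(image_feature)
predicted_identity = logits.argmax(dim=1)   # face recognition result among the candidates
```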
Further, the determining process of the preset parameter may include: the preset parameters can be initialized randomly to obtain initial values; then, a first loss function is constructed based on the first image feature and the second image feature, a second loss function is constructed based on the first segmentation feature and the second segmentation feature, a total loss function is constructed based on the first loss function and the second loss function, the preset parameters are continuously adjusted from an initial value by adopting the total loss function until the preset iteration times are reached, and the preset parameters when the preset iteration times are reached are used as preset parameters finally adopted in an application stage.
Unlike the conventional manner of constructing the loss function based on only the image features, the loss function employed in this embodiment may be referred to as a total loss function, which is determined based on the first loss function and the second loss function, instead of including only the first loss function. Because the first loss function can reflect coarse-grained information and the second loss function can reflect fine-grained information, the total loss function contains coarse-grained information and fine-grained information, namely contains various levels of information, and the preset parameters adjusted based on the information containing the various levels also contain the various levels of information, so that the extraction capability of the preset parameters for image features can be improved, and the accuracy of image processing is improved.
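A highly simplified sketch of this parameter-determination loop is given below; the encoder architecture, batch construction and the placeholder losses are assumptions of the sketch (the actual first and second loss functions are given earlier), and only the overall iterate-and-adjust structure is illustrated.

```python
# Sketch only: randomly initialised parameters adjusted with a total loss L = L1 + L2
# for a preset number of iterations; the losses here are stand-ins for the forms given above.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                        nn.Flatten(), nn.Linear(16, 128))   # randomly initialised parameters
optimizer = torch.optim.SGD(encoder.parameters(), lr=0.03)
preset_iterations = 100

for step in range(preset_iterations):
    first_image = torch.randn(8, 3, 224, 224)        # stand-in batch of positive pairs
    second_image = torch.randn(8, 3, 224, 224)
    q, k = encoder(first_image), encoder(second_image)
    l1 = (1 - torch.cosine_similarity(q, k)).mean()  # placeholder first loss (pull positives closer)
    l2 = torch.zeros(())                             # placeholder second loss over block features
    total_loss = l1 + l2                             # L = L1 + L2
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()                                 # back-propagation parameter adjustment
```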
Fig. 8 is a block diagram of an image processing apparatus according to an embodiment of the present disclosure, and as shown in fig. 8, the image processing apparatus may include: an acquisition module 801, an extraction module 802, and a processing module 803.
The acquisition module 801 is used for acquiring an image to be processed; the extracting module 802 is configured to extract image features of the image to be processed based on preset parameters; the preset parameters are obtained based on a first image feature of a first image sample, a second image feature of a second image sample, and a first segmentation feature and a second segmentation feature, wherein the first segmentation feature is obtained after segmentation processing of the first image feature, the second segmentation feature is obtained after segmentation processing of the second image feature, and the first image sample and the second image sample are positive samples; the processing module 803 is configured to obtain a processing result of the image to be processed based on the image feature.
In this embodiment, the segmentation features are obtained by performing segmentation processing on the image features, so the image features may be regarded as coarse-grained features and the segmentation features as fine-grained features. Because the preset parameters are determined based on both the image features and the segmentation features, the preset parameters can be considered to contain information at multiple levels. When the features of the image to be processed are extracted with these preset parameters, features at multiple levels can be extracted, and a more accurate processing result can then be obtained from them. The image processing apparatus therefore improves the accuracy of image processing.
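A minimal sketch of the inference flow these modules implement (the function and argument names are illustrative assumptions):

```python
import torch

def process_image(image: torch.Tensor, feature_extractor, downstream_model):
    """Hypothetical end-to-end flow: acquire an image, extract its features with the trained
    (preset) parameters, and obtain the processing result from a downstream model."""
    with torch.no_grad():
        features = feature_extractor(image)   # extraction module: uses the preset parameters
        result = downstream_model(features)   # processing module: e.g. a classification head
    return result
```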
It is to be understood that, in the embodiments of the present disclosure, the same or similar content in different embodiments may be cross-referenced.
It can be understood that "first", "second", etc. in the embodiments of the present disclosure are only used for distinguishing, and do not indicate the importance level, the time sequence, etc.
There is no required temporal order between a "first" step and a "second" step; that is, the first step may be executed before the second step, the second step may be executed before the first step, or the two steps may be executed in parallel.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic device 900 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. Electronic device 900 may also represent various forms of mobile apparatuses such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the electronic device 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the electronic device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the electronic device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the electronic device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, for example, the training method of the image feature extraction model or the image processing method. For example, in some embodiments, the training method of the image feature extraction model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method of the image feature extraction model or of the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the training method of the image feature extraction model or the image processing method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. Such program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (18)

1. A training method of an image feature extraction model, comprising:
performing segmentation processing on first image features of the first image to obtain first segmentation features;
performing segmentation processing on second image features of a second image to obtain second segmentation features, wherein the first image and the second image are positive samples;
constructing a total loss function based on the first image feature, the second image feature, the first segmentation feature, and the second segmentation feature;
training an image feature extraction model based on the total loss function;
wherein said constructing a total loss function based on said first image feature, said second image feature, said first segmentation feature, and said second segmentation feature comprises:
constructing a first loss function based on the first and second image features and a third image feature of a negative sample of the first image; the third image features are acquired from a preset storage queue, and the third image features are image features generated by positive samples of a previous iteration process of the current iteration process;
constructing a second loss function based on the first segmentation feature and the second segmentation feature;
constructing the total loss function based on the first loss function and the second loss function;
the first segmentation features and the second segmentation features each comprise a plurality of groups, the numbers of groups are the same, and a first segmentation feature and a second segmentation feature of a same group correspond to a same position; the second loss function is constructed based on a plurality of parameters, the number of the plurality of parameters being the number of groups, each parameter being constructed based on a first segmentation feature and a second segmentation feature of a same group.
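As a hedged illustration of the preset storage queue recited above (the class, its fixed size, and the random initialization are assumptions in the spirit of common contrastive-learning practice, not limitations of the claim):

```python
import torch

class FeatureQueue:
    """Hypothetical fixed-size queue holding image features generated by positive samples of
    earlier iterations; its contents serve as third image features (negatives) later on."""
    def __init__(self, feature_dim: int = 256, size: int = 4096):
        self.buffer = torch.randn(size, feature_dim)  # assumed random initialization
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, features: torch.Tensor):
        # features: (B, feature_dim), e.g. the positive-sample features of the current iteration
        n = features.size(0)
        idx = (self.ptr + torch.arange(n)) % self.buffer.size(0)
        self.buffer[idx] = features.detach()
        self.ptr = int((self.ptr + n) % self.buffer.size(0))

    def negatives(self) -> torch.Tensor:
        return self.buffer  # third image features for constructing the first loss function
```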
2. The method of claim 1, wherein,
the constructing a first loss function based on the first image feature and the second image feature comprises:
adopting a contrastive learning algorithm to construct the first loss function based on the first image feature, the second image feature, and the third image feature.
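A minimal contrastive-loss sketch under assumed conventions (an InfoNCE-style formulation with a temperature hyperparameter; these specifics are assumptions and not recited in the claims):

```python
import torch
import torch.nn.functional as F

def contrastive_first_loss(q, k_pos, queue, temperature=0.07):
    """Hypothetical InfoNCE-style loss: q and k_pos are the first and second image features of a
    positive pair; queue holds third image features (negatives) from earlier iterations."""
    q = F.normalize(q, dim=-1)          # (B, D)
    k_pos = F.normalize(k_pos, dim=-1)  # (B, D)
    queue = F.normalize(queue, dim=-1)  # (K, D)

    l_pos = (q * k_pos).sum(dim=-1, keepdim=True)      # (B, 1) positive logits
    l_neg = q @ queue.t()                               # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive at index 0
    return F.cross_entropy(logits, labels)
```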
3. The method of claim 1, wherein,
the image feature extraction model is a first encoder or a second encoder;
the method further comprises the steps of:
performing feature extraction processing on the input first image by adopting the first encoder to obtain the first image feature;
and adopting the second encoder to perform feature extraction processing on the input second image so as to obtain the second image feature.
4. The method of claim 3, wherein said training an image feature extraction model based on said total loss function comprises:
adjusting model parameters of the first encoder based on the total loss function to obtain adjusted model parameters of the first encoder;
and obtaining the adjusted model parameters of the second encoder based on the model parameters of the second encoder before adjustment and the adjusted model parameters of the first encoder.
5. The method of claim 4, wherein the obtaining adjusted model parameters of the second encoder based on the pre-adjusted model parameters of the second encoder and the adjusted model parameters of the first encoder comprises:
performing a weighted summation operation on the model parameters of the second encoder before adjustment and the adjusted model parameters of the first encoder, to obtain the adjusted model parameters of the second encoder;
wherein the model parameters of the second encoder before adjustment correspond to a first weight value, the adjusted model parameters of the first encoder correspond to a second weight value, and the first weight value is larger than the second weight value.
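This weighted update is how a momentum (exponential moving average) encoder is commonly maintained; a sketch under that assumption, with an illustrative momentum coefficient:

```python
import torch

@torch.no_grad()
def update_second_encoder(second_encoder, first_encoder, momentum: float = 0.99):
    """Hypothetical momentum update: the first weight (momentum) exceeds the second (1 - momentum)."""
    for p2, p1 in zip(second_encoder.parameters(), first_encoder.parameters()):
        p2.data = momentum * p2.data + (1.0 - momentum) * p1.data
```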
6. The method according to any one of claims 1 to 5, wherein,
the first image feature is a first image feature map, the second image feature is a second image feature map, and the dimensions of the first image feature map and the second image feature map are the same;
the segmenting the first image feature of the first image to obtain a first segmentation feature comprises: dividing the first image feature map evenly into a first number of blocks, and taking the first number of blocks as the first segmentation features;
the segmenting the second image feature of the second image to obtain a second segmentation feature comprises: dividing the second image feature map evenly into a second number of blocks, and taking the second number of blocks as the second segmentation features; wherein the first number is the same as the second number.
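A sketch of evenly splitting a feature map into blocks and forming one second-loss term per group (the 2x2 grid and the cosine-distance form are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def split_into_blocks(feature_map: torch.Tensor, grid: int = 2):
    """Evenly split a (B, C, H, W) feature map into grid*grid blocks (H and W assumed divisible by grid)."""
    B, C, H, W = feature_map.shape
    blocks = feature_map.unfold(2, H // grid, H // grid).unfold(3, W // grid, W // grid)
    return blocks.reshape(B, C, grid * grid, H // grid, W // grid).unbind(dim=2)

def second_loss(map1: torch.Tensor, map2: torch.Tensor, grid: int = 2) -> torch.Tensor:
    """One parameter per group: same-position blocks of the two maps are compared and averaged."""
    terms = [
        1 - F.cosine_similarity(b1.flatten(1), b2.flatten(1), dim=-1).mean()
        for b1, b2 in zip(split_into_blocks(map1, grid), split_into_blocks(map2, grid))
    ]
    return torch.stack(terms).mean()
```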
7. The method of any of claims 1-5, further comprising:
performing two different data augmentation processes on the same image, respectively, to obtain the first image and the second image.
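A sketch of producing the positive pair via two different augmentations (the specific transforms and parameters are assumptions, using torchvision-style APIs):

```python
from torchvision import transforms

# Two hypothetical augmentation pipelines applied to the same source image.
augment_a = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
augment_b = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ColorJitter(brightness=0.4, contrast=0.4),
    transforms.ToTensor(),
])

def make_positive_pair(image):
    """The two augmented views serve as the first image and the second image (a positive pair)."""
    return augment_a(image), augment_b(image)
```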
8. A training device for an image feature extraction model, comprising:
the first acquisition module is used for carrying out segmentation processing on first image features of the first image so as to obtain first segmentation features;
the second acquisition module is used for carrying out segmentation processing on second image features of a second image so as to obtain second segmentation features, and the first image and the second image are positive samples;
a building module for building a total loss function based on the first image feature, the second image feature, the first segmentation feature, and the second segmentation feature;
the training module is used for training an image feature extraction model based on the total loss function;
wherein the building module is further used for:
constructing a first loss function based on the first and second image features and a third image feature of a negative sample of the first image; the third image features are acquired from a preset storage queue, and the third image features are image features generated by positive samples of a previous iteration process of the current iteration process;
constructing a second loss function based on the first segmentation feature and the second segmentation feature;
constructing the total loss function based on the first loss function and the second loss function;
the first segmentation features and the second segmentation features each comprise a plurality of groups, the numbers of groups are the same, and a first segmentation feature and a second segmentation feature of a same group correspond to a same position; the second loss function is constructed based on a plurality of parameters, the number of the plurality of parameters being the number of groups, each parameter being constructed based on a first segmentation feature and a second segmentation feature of a same group.
9. The apparatus of claim 8, wherein,
the building module is further used for:
adopting a contrastive learning algorithm to construct the first loss function based on the first image feature, the second image feature, and the third image feature.
10. The apparatus of claim 8, wherein,
the image feature extraction model is a first encoder or a second encoder;
the apparatus further comprises:
the first coding module is used for carrying out feature extraction processing on the input first image by adopting the first coder so as to obtain the first image feature;
and the second coding module is used for carrying out feature extraction processing on the input second image by adopting the second coder so as to obtain the second image features.
11. The apparatus of claim 10, wherein the training module is further used for:
adjusting model parameters of the first encoder based on the total loss function to obtain adjusted model parameters of the first encoder;
and obtaining the adjusted model parameters of the second encoder based on the model parameters of the second encoder before adjustment and the adjusted model parameters of the first encoder.
12. The apparatus of claim 11, wherein the training module is further used for:
performing a weighted summation operation on the model parameters of the second encoder before adjustment and the adjusted model parameters of the first encoder, to obtain the adjusted model parameters of the second encoder;
wherein the model parameters of the second encoder before adjustment correspond to a first weight value, the adjusted model parameters of the first encoder correspond to a second weight value, and the first weight value is larger than the second weight value.
13. The apparatus according to any one of claims 8-12, wherein,
the first image feature is a first image feature map, the second image feature is a second image feature map, and the dimensions of the first image feature map and the second image feature map are the same;
the first acquisition module is further configured to: divide the first image feature map evenly into a first number of blocks, and take the first number of blocks as the first segmentation features;
the second acquisition module is further configured to: divide the second image feature map evenly into a second number of blocks, and take the second number of blocks as the second segmentation features; wherein the first number is the same as the second number.
14. The apparatus of any of claims 8-12, further comprising:
and the data augmentation module is used for respectively performing two different data augmentation processes on the same image, so as to obtain the first image and the second image.
15. An image processing method, comprising:
acquiring an image to be processed;
extracting image features of the image to be processed based on preset parameters; the preset parameters are obtained based on a first image feature of a first image sample, a second image feature of a second image sample, a first segmentation feature, and a second segmentation feature, wherein the first segmentation feature is obtained after segmentation processing of the first image feature, the second segmentation feature is obtained after segmentation processing of the second image feature, and the first image sample and the second image sample are positive samples; wherein the preset parameters are trained with a total loss function, the total loss function being constructed based on a first loss function and a second loss function, the first loss function being constructed based on the first and second image features and a third image feature of a negative sample of the first image; the third image features are acquired from a preset storage queue, and the third image features are image features generated by positive samples of a previous iteration process of the current iteration process; the first segmentation features and the second segmentation features each comprise a plurality of groups, the numbers of groups are the same, and a first segmentation feature and a second segmentation feature of a same group correspond to a same position; the second loss function is constructed based on a plurality of parameters, the number of the plurality of parameters being the number of groups, each parameter being constructed based on a first segmentation feature and a second segmentation feature of a same group;
and acquiring a processing result of the image to be processed based on the image features.
16. An image processing apparatus comprising:
the acquisition module is used for acquiring the image to be processed;
the extraction module is used for extracting image features of the image to be processed based on preset parameters; the preset parameters are obtained based on a first image feature of a first image sample, a second image feature of a second image sample, a first segmentation feature, and a second segmentation feature, wherein the first segmentation feature is obtained after segmentation processing of the first image feature, the second segmentation feature is obtained after segmentation processing of the second image feature, and the first image sample and the second image sample are positive samples; wherein the preset parameters are trained with a total loss function, the total loss function being constructed based on a first loss function and a second loss function, the first loss function being constructed based on the first and second image features and a third image feature of a negative sample of the first image; the third image features are acquired from a preset storage queue, and the third image features are image features generated by positive samples of a previous iteration process of the current iteration process; the first segmentation features and the second segmentation features each comprise a plurality of groups, the numbers of groups are the same, and a first segmentation feature and a second segmentation feature of a same group correspond to a same position; the second loss function is constructed based on a plurality of parameters, the number of the plurality of parameters being the number of groups, each parameter being constructed based on a first segmentation feature and a second segmentation feature of a same group;
and the processing module is used for acquiring a processing result of the image to be processed based on the image features.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7, 15.
18. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-7, 15.
CN202210423894.XA 2022-04-21 2022-04-21 Image processing and model training method, device, equipment and storage medium Active CN114758130B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210423894.XA CN114758130B (en) 2022-04-21 2022-04-21 Image processing and model training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114758130A CN114758130A (en) 2022-07-15
CN114758130B true CN114758130B (en) 2023-12-22

Family

ID=82331901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210423894.XA Active CN114758130B (en) 2022-04-21 2022-04-21 Image processing and model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114758130B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116884077B * 2023-09-04 2023-12-08 Shanghai Renyimen Technology Co Ltd Face image category determining method and device, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378632A (en) * 2021-04-28 2021-09-10 Nanjing University Unsupervised domain pedestrian re-identification algorithm based on pseudo label optimization
CN114020950A (en) * 2021-11-03 2022-02-08 Beijing Baidu Netcom Science and Technology Co Ltd Training method, device and equipment of image retrieval model and storage medium
CN114049516A (en) * 2021-11-09 2022-02-15 Beijing Baidu Netcom Science and Technology Co Ltd Training method, image processing method, device, electronic device and storage medium
CN114186622A (en) * 2021-11-30 2022-03-15 Beijing Dajia Internet Information Technology Co Ltd Image feature extraction model training method, image feature extraction method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yu Zhao et al., "Joint patch and instance discrimination learning for unsupervised person re-identification," Image and Vision Computing, 2020, pp. 1-10. *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant