CN114758130A - Image processing and model training method, device, equipment and storage medium

Image processing and model training method, device, equipment and storage medium

Info

Publication number
CN114758130A
Authority
CN
China
Prior art keywords
image
feature
image feature
encoder
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210423894.XA
Other languages
Chinese (zh)
Other versions
CN114758130B (en)
Inventor
谷祎
孙准
王晓迪
韩树民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210423894.XA priority Critical patent/CN114758130B/en
Publication of CN114758130A publication Critical patent/CN114758130A/en
Application granted granted Critical
Publication of CN114758130B publication Critical patent/CN114758130B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an image processing and model training method, device, equipment and storage medium, relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, image processing and computer vision, and can be applied to scenes such as optical character recognition (OCR) and face recognition. The training method of the image feature extraction model includes the following steps: performing segmentation processing on a first image feature of a first image to obtain a first segmentation feature; performing segmentation processing on a second image feature of a second image to obtain a second segmentation feature, where the first image and the second image are positive samples of each other; constructing a total loss function based on the first image feature, the second image feature, the first segmentation feature, and the second segmentation feature; and training an image feature extraction model based on the total loss function. The performance of the image feature extraction model can thereby be improved.

Description

Image processing and model training method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, in particular to the fields of deep learning, image processing, and computer vision technologies, which may be applied to scenes such as optical character recognition (OCR) and face recognition, and more particularly to a method, an apparatus, a device, and a storage medium for image processing and model training.
Background
In order to enable a computer to "understand" an image and thereby obtain "vision", a basic step of image processing is to perform feature extraction on the image to convert it into non-image representations or descriptions; these representations or descriptions are the image features.
With the development of the deep learning technology, an image feature extraction model may be used to perform feature extraction processing on an input image to output corresponding image features.
Disclosure of Invention
The disclosure provides an image processing and model training method, device, equipment and storage medium.
According to an aspect of the present disclosure, there is provided a training method of an image feature extraction model, including: performing segmentation processing on a first image feature of a first image to obtain a first segmentation feature; performing segmentation processing on a second image feature of the second image to obtain a second segmentation feature; constructing a total loss function based on the first image feature, the second image feature, the first segmentation feature, and the second segmentation feature; and training an image feature extraction model based on the total loss function.
According to another aspect of the present disclosure, there is provided a training apparatus for an image feature extraction model, including: the first acquisition module is used for carrying out segmentation processing on first image characteristics of the first image so as to obtain first segmentation characteristics; the second acquisition module is used for segmenting second image characteristics of the second image to obtain second segmentation characteristics; a construction module for constructing a total loss function based on the first image feature, the second image feature, the first segmentation feature and the second segmentation feature; and the training module is used for training an image feature extraction model based on the total loss function.
According to another aspect of the present disclosure, there is provided an image processing method including: acquiring an image to be processed; extracting image features of the image to be processed based on preset parameters; the preset parameters are obtained based on a first image feature of a first image sample, a second image feature of a second image sample, and a first segmentation feature and a second segmentation feature, wherein the first segmentation feature is obtained after segmentation processing is performed on the first image feature, the second segmentation feature is obtained after segmentation processing is performed on the second image feature, and the first image sample and the second image sample are positive samples; and acquiring a processing result of the image to be processed based on the image characteristics.
According to another aspect of the present disclosure, there is provided an image processing apparatus including: the acquisition module is used for acquiring an image to be processed; the extraction module is used for extracting the image characteristics of the image to be processed based on preset parameters; the preset parameters are obtained based on a first image feature of a first image sample, a second image feature of a second image sample, and a first segmentation feature and a second segmentation feature, wherein the first segmentation feature is obtained after segmentation processing is performed on the first image feature, the second segmentation feature is obtained after segmentation processing is performed on the second image feature, and the first image sample and the second image sample are positive samples; and the processing module is used for acquiring a processing result of the image to be processed based on the image characteristics.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to any one of the above aspects.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of the above aspects.
According to the technical solutions of the present disclosure, the performance of the image feature extraction model can be improved.
It should be understood that the statements in this section are not intended to identify key or critical elements of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a flowchart of a training method of an image feature extraction model according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an application scenario of a training method for implementing an image feature extraction model according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of another training method for an image feature extraction model provided by the embodiment of the disclosure;
FIG. 4 is a frame diagram corresponding to FIG. 3;
FIG. 5 is a comparison of image features extracted by embodiments of the present disclosure and related techniques;
FIG. 6 is a block diagram of a training apparatus for an image feature extraction model according to an embodiment of the present disclosure;
fig. 7 is a flowchart of an image processing method provided by an embodiment of the present disclosure;
fig. 8 is a structural diagram of an image processing apparatus provided in an embodiment of the present disclosure;
fig. 9 is a schematic diagram of an electronic device for implementing a training method or an image processing method of an image feature extraction model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the related art, an image feature extraction model is generally trained at the sample level (instance level); the granularity is coarse, and the resulting performance is not ideal.
In order to improve the performance of the image feature extraction model, the present disclosure provides the following embodiments.
Fig. 1 is a flowchart of a training method for an image feature extraction model provided in an embodiment of the present disclosure, and as shown in fig. 1, the method of the present embodiment includes:
101. Segmentation processing is performed on a first image feature of a first image to obtain a first segmentation feature.
102. Segmentation processing is performed on a second image feature of a second image to obtain a second segmentation feature, where the first image and the second image are positive samples of each other.
103. A total loss function is constructed based on the first image feature, the second image feature, the first segmentation feature, and the second segmentation feature.
104. An image feature extraction model is trained based on the total loss function.
The execution subject of this embodiment may be a training apparatus for the image feature extraction model; the training apparatus may be located in an electronic device, and the electronic device may be a user terminal, a server, or the like.
The first image and the second image are a pair of samples for model training and are positive samples of each other. They may be obtained from an existing sample set, or generated in a preset generation manner.
The first image feature is an image feature of the first image and can be obtained by performing feature extraction processing on the first image; the second image feature is an image feature of the second image, and may be obtained by performing feature extraction processing on the second image.
The feature obtained by performing segmentation processing on the first image feature may be referred to as a first segmentation feature, and the feature obtained by performing segmentation processing on the second image feature may be referred to as a second segmentation feature.
Since the segmentation features (the first segmentation feature and the second segmentation feature) are obtained by segmenting the image features (the first image feature and the second image feature), in general, the dimension of the segmentation features is smaller than that of the image features, and therefore, the image features can be regarded as coarse-grained features, and the segmentation features can be regarded as fine-grained features.
For example, an image feature is generally a feature map, so the feature map may be split into a plurality of patches, with each patch serving as a set of segmentation features.
Specifically, if the dimension of the feature map is N × N, the dimension of a segmentation feature after the segmentation process may be M × M, where M and N are both positive integers and M < N; in particular, when the feature map is divided into 4 blocks, M equals N/2.
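As a minimal sketch of this splitting step (assuming a PyTorch tensor of shape C × N × N and the 4-block case; the tensor layout and function name are illustrative assumptions, not details taken from the patent):

```python
import torch

def split_into_patches(feature_map: torch.Tensor):
    """Evenly split a C x N x N feature map into 4 patches of size C x (N/2) x (N/2)."""
    c, n, _ = feature_map.shape
    m = n // 2  # with 4 blocks, M = N / 2
    return [
        feature_map[:, :m, :m],  # top-left block
        feature_map[:, :m, m:],  # top-right block
        feature_map[:, m:, :m],  # bottom-left block
        feature_map[:, m:, m:],  # bottom-right block
    ]
```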
After the first image feature, the second image feature, the first segmentation feature and the second segmentation feature are obtained, a total loss function can be constructed based on the features, and an image feature extraction model can be trained based on the total loss function.
The model training process is the process of adjusting the model parameters. For example, the model has initial parameters at the beginning, and the initial parameters may be adjusted based on the total loss function until a preset number of iterations is reached; the model parameters at that point are the final model parameters, i.e., the final model is generated.
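A minimal sketch of such an iteration loop is shown below; the names model, total_loss_fn and data_loader and the use of an SGD optimizer are placeholders assumed for illustration rather than components specified by the patent:

```python
import itertools
import torch

def train(model, total_loss_fn, data_loader, num_iterations, lr=0.03):
    """Adjust the model's initial parameters with the total loss until the
    preset number of iterations is reached, then return the final model."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    batches = itertools.cycle(data_loader)
    for _ in range(num_iterations):
        first_image, second_image = next(batches)  # a pair of positive samples
        loss = total_loss_fn(model, first_image, second_image)
        optimizer.zero_grad()
        loss.backward()   # gradient of the total loss
        optimizer.step()  # parameter update via back-propagation
    return model
```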
Because the image features correspond to whole images, they can be regarded as instance-level and coarse-grained; the segmentation features are obtained by segmenting the image features, so they can be regarded as patch-level and fine-grained. In this way, features at multiple levels are obtained.
In this embodiment, the first image feature and the second image feature are at the sample level, and the first segmentation feature and the second segmentation feature are at the block level, so that the total loss function can be constructed based on features at multiple levels to train the image feature extraction model; because features at multiple levels are referred to during model training, the performance of the image feature extraction model can be improved.
For better understanding of the embodiments of the present disclosure, application scenarios of the embodiments of the present disclosure are explained as follows.
Fig. 2 is a schematic diagram of an application scenario of a training method for implementing an image feature extraction model according to an embodiment of the present disclosure.
As shown in fig. 2, the system architecture corresponding to the application environment may include a user terminal 201 and a server 202, and the user terminal 201 and the server 202 communicate with each other through a communication network. The user terminal 201 may include a personal computer (PC), a mobile device, a smart home device, a wearable device, and the like; the mobile device includes, for example, a mobile phone, a laptop, or a tablet computer; the smart home device includes, for example, a smart speaker or a smart television; the wearable device includes, for example, a smart watch or smart glasses. The server 202 may be a local server or a cloud server, and may be a single server or a server cluster. The communication network may be a wired (e.g., optical fiber) and/or wireless (e.g., Wi-Fi) communication link.
The user terminal faces the user and can acquire data submitted by the user and/or display the data to the user. For example, the user may select an image sample to be processed through a user interface of the user terminal, and the user terminal sends the image sample to the server.
The server provides various services, for example, in the training stage, the server may train an image feature extraction model based on the image samples sent by the user terminal. In the application stage, the server may perform image feature extraction processing on the to-be-processed image sent by the user terminal based on the generated image feature extraction model to obtain an image feature of the to-be-processed image, and perform subsequent processes based on the image feature, such as OCR, face recognition and the like based on the image feature.
In addition, the image processing process may also be executed on the user terminal side, for example, the server may also send the image feature extraction model generated after training to the user terminal. In the application stage, the user terminal may extract image features by using the image feature extraction model, and perform subsequent processes based on the image features, for example, performing OCR, face recognition, and the like based on the image features.
With reference to the application scenarios, the training method of the image feature extraction model according to the embodiment of the present disclosure is described as follows:
generally, the model training process is performed on the server side, that is, the server may receive an image sample sent by the user terminal, and train the image feature extraction model based on the image sample.
The first image and the second image are image samples trained as a model, and can be sent to a server by the user terminal. Alternatively, the server may generate the first image and the second image based on a certain image sample transmitted from the user terminal.
The first image and the second image may be positive samples of each other; positive samples refer to samples with similar semantics. For example, if the first image is a kitten image, the second image is also a kitten image (the two images may be taken at different angles), and a third image is a puppy image, then the first image and the second image are positive samples of each other, while the third image is a negative sample of them.
The image feature extraction model may also be referred to as an encoder (encoder).
After the server obtains the first image and the second image, an initial encoder may be used to extract image features of the first image and the second image, respectively, so as to obtain the first image feature and the second image feature.
The initial encoder may be randomly initialized or may be an existing pre-trained model.
After the first image feature and the second image feature are obtained, segmentation processing may be performed to obtain the first segmentation feature and the second segmentation feature, respectively. The segmentation processing may specifically be an even partitioning: for example, the number of blocks may be preset, and if the number of blocks is 4, the first image feature and the second image feature may each be evenly divided into 4 blocks, with each block being a set of segmentation features.
It is to be understood that the above segmentation processing may be performed one or more times. For example, the first image feature, feature a, may be segmented into features a1 (each being a set of segmentation features), and the features a1 may then be further segmented to obtain features a2, so that image features at even more levels can be obtained. For simplicity of description, a single segmentation is taken as an example below.
After the image features (first image feature, second image feature, first segmentation feature, second segmentation feature) are obtained, a total loss function may be constructed based on the image features.
Wherein a first loss function may be constructed based on the first image feature and the second image feature; constructing a second loss function based on the first segmentation feature and the second segmentation feature; a total loss function is constructed based on the first loss function and the second loss function.
In the above example, constructing the first loss function from the first image feature and the second image feature yields a loss corresponding to one level of features (for example, the sample level), and constructing the second loss function from the first segmentation feature and the second segmentation feature yields a loss corresponding to another level of features (for example, the block level). The total loss function is then constructed based on the first loss function and the second loss function, so it is built from features at multiple levels and can contain information at multiple levels; therefore, a model with better performance can be trained.
Self-supervised learning is a branch of machine learning that can produce models with good performance without large-scale manually labeled data. Contrastive learning is a type of self-supervised learning; its core idea is to pull positive samples closer together and push positive and negative samples farther apart.
In computer vision, image features can be better extracted by using an image feature extraction model obtained through contrastive learning.
For the contrastive learning algorithm, a third image feature may also be obtained, where the third image feature is an image feature of a negative sample of the first image. For example, in the above example, if the first image and the second image are both kitten images, the puppy image may be used as the negative sample, and correspondingly the third image feature is an image feature of the puppy image.
Accordingly, a contrastive learning algorithm may be employed to construct the first loss function based on the first image feature, the second image feature, and the third image feature.
In the above example, constructing the first loss function with a contrastive learning algorithm takes advantage of the strong performance of contrastive learning. Moreover, because contrastive learning is a self-supervised algorithm, the data does not need to be labeled, which reduces the workload of manual labeling and improves training efficiency. In addition, the contrastive learning algorithm compares the difference between positive and negative samples, and an image feature extraction model with better accuracy can be obtained based on this difference.
The total loss function may be a sum of a first loss function constructed based on coarse-grained level features (the first image feature, the second image feature, and the third image feature) of the positive and negative samples and a second loss function constructed based on fine-grained level features (the first segmentation feature and the second segmentation feature) of the positive samples. Thus, the total loss function will contain multiple levels of features.
In addition, one of the core ideas of contrastive learning is to shorten the distance between positive samples, which is embodied in both the first loss function and the second loss function; the other core idea of contrastive learning is to lengthen the distance between positive samples and negative samples, which is embodied in the first loss function.
When the image feature extraction model is trained based on the total loss function, the method may include calculating the gradient of the total loss function and updating the model parameters of the image feature extraction model by using a back-propagation (BP) algorithm until a preset number of iterations is reached.
The first image feature and the second image feature are image features of the first image and the second image, respectively. When the image features are extracted, the first image and the second image may share the same image feature extraction model; in this case, the BP algorithm may be used to update the model parameters of the shared image feature extraction model until the final image feature extraction model is generated. Alternatively:
When the image features are extracted, the first image and the second image may use different image feature extraction models, for example, the first image uses the first image feature extraction model to extract the first image features, and the second image uses the second image feature extraction model to extract the second image features. At this time, the model parameter of one of the image feature extraction models may be updated by using a BP algorithm, for example, the model parameter of the first image feature extraction model may be updated by using a BP algorithm, and the model parameter of the other image feature extraction model may be updated in other manners. Accordingly, one of the first image feature extraction model and the second image feature extraction model may be used as a final image feature extraction model, for example, the finally generated first image feature extraction model may be used as the final image feature extraction model.
The following description will take an example in which the first image and the second image respectively use different image feature extraction models to perform image feature extraction, and a contrast learning algorithm is used to construct the first loss function.
Since the image feature extraction model may also be referred to as an encoder, the different image feature extraction models may be referred to as a first encoder and a second encoder, respectively, and then the first encoder generated in the training stage may be selected as the encoder used in the application stage, that is, the final image feature extraction model is the first encoder.
The structure of the image feature extraction model (the first encoder and the second encoder) may adopt an existing structure, such as that of MoCo v2.
Fig. 3 is a flowchart of another training method for an image feature extraction model according to an embodiment of the present disclosure, and fig. 4 is a system architecture diagram corresponding to fig. 3. As shown in fig. 3, and with reference to the architecture shown in fig. 4, the method of this embodiment includes:
301. A first image and a second image are acquired, where the first image and the second image are positive samples of each other.
Positive samples refer to samples with similar semantics, and negative samples refer to samples with dissimilar semantics (which can be expressed as their semantic features being mutually orthogonal).
Specifically, positive and negative samples can be obtained by manually collecting or labeling from an existing image sample set.
Alternatively, for positive samples, they may be obtained based on the same image.
Specifically, two different data enhancement processes may be performed on the same image, respectively, to obtain the first image and the second image.
Wherein the same image may be an image in a sample set of existing images.
In computer vision, typical data enhancement methods include flipping, rotation, scaling, random cropping or zero padding, color jittering, adding noise, and the like.
In addition, the two different data enhancement processing manners of this embodiment may further include a processing manner that keeps the image unchanged.
For example, for the same image (referred to as the original image), the original image may be kept unchanged and used as the first image, and the second image may be obtained by flipping the original image.
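A sketch of this positive-pair construction, assuming torchvision-style transforms (the particular transforms, image size and file name are illustrative assumptions, not requirements of the patent):

```python
from PIL import Image
from torchvision import transforms

# Two different data enhancement pipelines; one simply keeps the image unchanged.
keep_unchanged = transforms.Compose([transforms.Resize((224, 224)),
                                     transforms.ToTensor()])
flip = transforms.Compose([transforms.Resize((224, 224)),
                           transforms.RandomHorizontalFlip(p=1.0),
                           transforms.ToTensor()])

original = Image.open("kitten.jpg")      # hypothetical original image
first_image = keep_unchanged(original)   # first image: original kept as-is
second_image = flip(original)            # second image: flipped view, a positive sample of the first
```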
Referring to fig. 4, the first image and the second image may be two images of kittens at different angles.
In the above example, the first image and the second image are obtained by applying different data enhancement processes to the same image, so that a variety of positive-sample pairs can be obtained even when the number of samples is small, which reduces the amount of samples required for model training. Sufficient positive samples can still be obtained from a small number of samples through different data enhancement processes, so model training on a small sample set can be completed, the overhead of collecting large amounts of data is reduced, and training efficiency is improved.
302. Feature extraction processing is performed on the input first image by using a first encoder to obtain the first image feature.
Referring to fig. 4, the input of the first encoder is a first image, the output is a first image feature, and the first encoder performs image feature extraction processing on the first image.
303. Segmentation processing is performed on the first image feature to obtain the first segmentation feature.
Generally, the image features are feature maps (feature maps), as shown in fig. 4.
That is, the first image feature is a first image feature map, the second image feature is a second image feature map, and the first image feature map and the second image feature map have the same dimensions. Performing segmentation processing on the first image feature of the first image to obtain the first segmentation feature includes: evenly dividing the first image feature map into a first number of blocks and taking the first number of blocks as the first segmentation features. Performing segmentation processing on the second image feature of the second image to obtain the second segmentation feature includes: evenly dividing the second image feature map into a second number of blocks and taking the second number of blocks as the second segmentation features. The first number is the same as the second number.
For example, referring to fig. 4, the first image feature may be equally divided into 4 segments, each segment as a set of first segmentation features.
Since the image features are generally feature maps, fine-grained segmentation features can be obtained simply by adopting an average blocking mode.
The above describes the processing flow of the first image, and the processing flow of the second image is similar, that is, the method may further include:
304. Feature extraction processing is performed on the input second image by using the second encoder to obtain the second image feature.
305. Segmentation processing is performed on the second image feature of the second image to obtain the second segmentation feature.
Steps 302-303 and steps 304-305 have no timing restriction relationship.
In addition, it is understood that, when acquiring the image features, besides the processing of the encoder, other general steps may be included, for example, the feature map output by the encoder may be subjected to pooling and Multi Layer Perceptron (MLP) processing. Wherein the pooling may be specifically average pooling (avg pooling).
After the above general steps, such as average pooling and MLP processing, the map-form features can be converted into vector form. For example, referring to fig. 4, the first image feature is represented by q, the second image feature by k, the first segmentation features by q1 to q4, and the second segmentation features by k1 to k4, where q, k, q1 to q4, and k1 to k4 are all in vector form.
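A minimal sketch of this pooling-plus-MLP step is given below (the channel and output dimensions are assumed for illustration and are not specified by the patent); the same head can be applied to each patch of the feature map to obtain q1 to q4 and k1 to k4:

```python
import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    """Average-pool a feature map and project it with a small MLP,
    converting a map-form feature into a vector such as q or k."""
    def __init__(self, in_channels=2048, hidden=2048, out_dim=128):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # average pooling
        self.mlp = nn.Sequential(nn.Linear(in_channels, hidden),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(hidden, out_dim))

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        v = self.pool(feature_map).flatten(1)                # (B, C) pooled vector
        return nn.functional.normalize(self.mlp(v), dim=1)   # unit-length feature vector
```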
In the above example, the first encoder and the second encoder are respectively adopted to perform the image feature extraction processing on the first image and the second image, which may facilitate the subsequent adjustment of the first encoder and the second encoder respectively by adopting different adjustment manners. By adopting two encoders, the first encoder and the second encoder can adopt respective suitable model parameter adjustment modes to improve respective performances.
306. A third image feature of a negative sample of the first image is obtained.
Wherein the third image feature may be obtained from a memory queue.
Specifically, the third image feature may be obtained in advance and stored in the storage queue, so that the third image feature may be obtained from the storage queue. The number of negative examples may be one or more.
Regarding the acquisition of the third image feature, an image feature that has already been generated for a previous sample may be used as the image feature of a negative sample of the current sample, i.e., as the third image feature.
Specifically, the model training process is a process of multiple iterations, and the samples used in each iteration are different. For example, if the first image and the second image of the first iteration are denoted A1 and B1, the first iteration obtains the first image feature corresponding to A1 and the second image feature corresponding to B1, and the second image feature corresponding to B1 may be stored in the storage queue. If the first image and the second image of the second iteration are denoted A2 and B2, the second iteration obtains the first image feature corresponding to A2 and the second image feature corresponding to B2; at this time, the second image feature of B1 generated in the first iteration may be selected as the third image feature for A2. That is, the image features of positive samples in earlier iterations can be used as the image features of negative samples in later iterations.
Since the length of the storage queue is a set integer value, if the current iteration process is the T-th (T is a positive integer) iteration process, and the length of the storage queue is K (K is a positive integer), the second image features generated by the (T-K) th to (T-1) th iteration processes can be used as the third image features of the current iteration process. It will be appreciated that the preset initial value may be used if the second image feature of the previous iteration or iterations is not generated, for example, if the current iteration is the first iteration.
Accordingly, the second image feature generated by the current iterative process may be stored in the storage queue as a third image feature of a subsequent iterative process. For example, referring to fig. 4, the currently generated second image feature k may be stored in a storage queue.
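A sketch of such a storage queue is given below, in the style of a fixed-length FIFO of previously generated second image features; the queue length, feature dimension and class name are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

class FeatureQueue:
    """Fixed-length FIFO of previously generated second image features,
    reused as the negative (third) image features of later iterations."""
    def __init__(self, dim=128, length=4096):
        self.queue = F.normalize(torch.randn(length, dim), dim=1)  # preset initial values
        self.ptr = 0

    def negatives(self) -> torch.Tensor:
        return self.queue  # the K stored third image features

    def enqueue(self, k: torch.Tensor) -> None:
        n = k.shape[0]
        idx = (self.ptr + torch.arange(n)) % self.queue.shape[0]
        self.queue[idx] = k.detach()                 # store the current k for later iterations
        self.ptr = (self.ptr + n) % self.queue.shape[0]
```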
In the above example, by acquiring the third image feature from the storage queue, the third image feature that has been generated in advance may be acquired, and it is not necessary to generate the third image feature in real time, which may improve the efficiency of acquiring the third image feature and reduce the amount of computation.
307. A first loss function is constructed based on the first image feature, the second image feature and the third image feature by using a contrastive learning algorithm.
The core idea of contrastive learning is to pull positive samples closer together and push positive and negative samples farther apart; therefore, a first loss function that characterizes this idea can be selected.
For example, the first loss function may be calculated as:
L1 = -log( exp(q·k/τ) / ( exp(q·k/τ) + Σ_{i=1..K} exp(q·k_i⁻/τ) ) )
where L1 is the first loss function; q is the first image feature, k is the second image feature, and τ is a preset parameter; K is the length of the storage queue, i.e., the number of negative samples; and k_i⁻ is the image feature of the i-th negative sample, i.e., the third image feature.
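A sketch of this loss under the assumption that q, k and the queued negatives are L2-normalized vectors (the temperature value 0.07 is an assumed default, not taken from the patent):

```python
import torch
import torch.nn.functional as F

def first_loss(q, k, negatives, tau=0.07):
    """Contrastive loss that pulls q towards its positive k and pushes it
    away from the K queued negatives, following the formula above."""
    # q: (B, D), k: (B, D), negatives: (K, D)
    l_pos = (q * k).sum(dim=1, keepdim=True) / tau   # (B, 1) positive logits
    l_neg = q @ negatives.t() / tau                  # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1)        # the positive sits at index 0
    labels = torch.zeros(q.shape[0], dtype=torch.long)
    return F.cross_entropy(logits, labels)           # equals -log(exp(l_pos) / sum(exp))
```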
308. Constructing a second loss function based on the first segmentation feature and the second segmentation feature.
The second loss function is a loss function that characterizes pulling the positive samples closer together.
For example, the second loss function is KL (Kullback-Leibler) divergence.
Since there are multiple groups of segmentation features (first segmentation features and second segmentation features), for example the four groups shown in fig. 4, the KL divergence may be calculated between the first segmentation feature and the second segmentation feature of each group, yielding a plurality of KL divergences; these KL divergences are then summed to obtain the second loss function.
Based on the four sets of segmentation features shown in fig. 4, the second loss function may be calculated as:
L2 = L_KL(q1, k1) + L_KL(q2, k2) + L_KL(q3, k3) + L_KL(q4, k4)
where L2 is the second loss function, and L_KL(qi, ki) is the KL divergence between the i-th group of first segmentation features and the i-th group of second segmentation features, i = 1, 2, 3, 4.
The calculation formula of L_KL(qi, ki) is:
L_KL(qi, ki) = Σ_{x=1..M} qi(x) · log( qi(x) / ki(x) )
where M is the dimension of the first segmented feature qi and the second segmented feature ki, qi (x) is the element of the x-th dimension of qi, and ki (x) is the element of the x-th dimension of ki.
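A sketch of the second loss is given below. Note that the softmax normalization is an assumption added here so that each patch feature forms a valid probability distribution; the patent's formula operates directly on the feature elements:

```python
import torch
import torch.nn.functional as F

def second_loss(q_patches, k_patches, eps=1e-8):
    """Sum of KL divergences between corresponding patch features,
    i.e. L2 = sum_i L_KL(qi, ki) over e.g. the four pairs (q1,k1)..(q4,k4)."""
    loss = 0.0
    for qi, ki in zip(q_patches, k_patches):
        p = F.softmax(qi, dim=-1)   # assumed normalization (see note above)
        r = F.softmax(ki, dim=-1)
        loss = loss + (p * torch.log((p + eps) / (r + eps))).sum()
    return loss

# Total loss of this embodiment: L = L1 + L2
# total = first_loss(q, k, queue.negatives()) + second_loss([q1, q2, q3, q4], [k1, k2, k3, k4])
```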
Steps 307 and 308 have no timing restriction relationship.
309. Constructing a total loss function based on the first loss function and the second loss function.
The calculation formula of the total loss function may be:
L=L1+L2;
where L is the total loss function, L1 is the first loss function, and L2 is the second loss function.
It is to be understood that the first image and the second image are positive samples of each other in this embodiment. In other examples, the first image and the second image may instead be negative samples of each other. In that case, when the first loss function is constructed, the image feature of a positive sample of the first image may be obtained, and the first loss function may then be constructed through contrastive learning based on the image features of the positive and negative samples. For the second loss function, if the first image and the second image are negative samples of each other, a loss function that pulls positive samples closer cannot be used; instead, because the distance between the first image and the second image needs to be increased, a loss function that characterizes pushing the first image and the second image apart can be used.
310. Adjusting model parameters of the first encoder based on the total loss function to obtain adjusted model parameters of the first encoder.
Wherein, as shown in fig. 4, the BP algorithm can be adopted to update the model parameters of the first encoder by using the gradient values of the total loss function.
311. Obtaining the adjusted model parameters of the second encoder based on the model parameters of the second encoder before adjustment and the adjusted model parameters of the first encoder.
For the first encoder, a normal BP algorithm may be adopted to adjust the model parameters.
For the second encoder, the original parameters of the second encoder may be mostly preserved.
Specifically, a weighted summation operation may be performed on the model parameters of the second encoder before adjustment and the model parameters of the first encoder after adjustment, so as to obtain the adjusted model parameters of the second encoder; the model parameter before adjustment of the second encoder corresponds to a first weight value, the model parameter after adjustment of the first encoder corresponds to a second weight value, and the first weight value is greater than the second weight value.
For example, the first weighted value is 0.9, and the second weighted value is 0.1.
In the above example, the model parameters of the first encoder are adjusted based on the total loss function, and information of each iteration process can be continuously learned; the model parameters of the second encoder are determined based on the parameters of the first encoder and the original parameters of the second encoder, so that the information of the first encoder can be learned and the information of the second encoder can be reserved; therefore, the first encoder and the second encoder can respectively learn information in different aspects, so that the whole training process contains more information on the whole, and the performance of the final image feature extraction model is improved.
In the above example, for the second encoder, during the weighting operation, the weight value corresponding to the model parameter of the second encoder is large, so that the model parameter of the second encoder is changed slowly, the stability can be improved, and the good performance of the second encoder can be maintained.
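A minimal sketch of this parameter update, using the example weights 0.9 and 0.1 given above (the function name and the use of PyTorch parameters are illustrative assumptions):

```python
import torch

@torch.no_grad()
def update_second_encoder(second_encoder, first_encoder,
                          first_weight=0.9, second_weight=0.1):
    """Weighted sum of the second encoder's pre-adjustment parameters and the
    first encoder's adjusted parameters; the larger weight on the second encoder
    keeps its parameters changing slowly."""
    for p2, p1 in zip(second_encoder.parameters(), first_encoder.parameters()):
        p2.data = first_weight * p2.data + second_weight * p1.data
```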
Steps 301 to 311 describe the flow of one iteration; when the model is trained, this iteration process may be executed multiple times until the preset number of iterations is reached.
The first encoder obtained when the preset number of iterations is reached may then be used as the final image feature extraction model in the application stage.
To better understand the performance of the image feature extraction model generated by the present embodiment, a comparison graph is given in fig. 5.
As shown in fig. 5, (a) is the image whose features are to be extracted, (b) shows the image features of the image in (a) extracted using a related-art model that does not consider multi-level features, and (c) shows the image features of the image in (a) extracted using a model generated with multi-level features according to the embodiment of the present disclosure.
As can be seen from fig. 5, the image features extracted by the model of the embodiment of the present disclosure show a stronger distinction between the features of the target (the bathtub) and the features of the background (such as the ground); that is, the model of this embodiment has a stronger ability to capture key information and a stronger feature expression capability. It will be appreciated that, although the drawings are limited to grayscale representation, a color image would reflect the above distinction more clearly in practice.
In addition, table 1 shows a comparison of some performance parameters of the model of the disclosed embodiments with the related art model based on the sample set Imagenet 100.
TABLE 1
In Table 1, L1 corresponds to the performance parameters of the related art, and L1 + L2 corresponds to the performance parameters of the embodiment of the present disclosure. As shown in Table 1, the performance parameters of the embodiment of the present disclosure (the last two rows) are better, and the performance improves as the number of iterations increases (the last row uses more iterations).
Fig. 6 is a structural diagram of a training apparatus for an image feature extraction model according to an embodiment of the present disclosure. As shown in fig. 6, the training apparatus 600 for the image feature extraction model includes: a first obtaining module 601, a second obtaining module 602, a construction module 603, and a training module 604.
The first obtaining module 601 is configured to perform segmentation processing on a first image feature of a first image to obtain a first segmentation feature; the second obtaining module 602 is configured to segment a second image feature of a second image to obtain a second segmented feature, where the first image and the second image are positive samples; the construction module 603 is configured to construct a total loss function based on the first image feature, the second image feature, the first segmentation feature and the second segmentation feature; the training module 604 is configured to train an image feature extraction model based on the total loss function.
In this embodiment, because the first image feature and the second image feature are at a sample level and the first segmentation feature and the second segmentation feature are at a block level, a total loss function can be constructed based on features at multiple levels to train an image feature extraction model, and because the features at multiple levels are referred to during model training, the performance of the image feature extraction model can be improved.
In some embodiments, the building module 603 is further configured to: constructing a first loss function based on the first image feature and the second image feature; constructing a second loss function based on the first segmentation feature and the second segmentation feature; a total loss function is constructed based on the first loss function and the second loss function.
In this embodiment, a first loss function corresponding to one level of features (for example, a sample level) may be obtained by constructing a first loss function through the first image features and the second image features, a second loss function corresponding to another level of features (for example, a block level) may be obtained by constructing a second loss function through the first segmentation features and the second segmentation features, and a total loss function is constructed based on the first loss function and the second loss function, so that the total loss function is constructed based on multiple levels of features, and may include multiple levels of information, and thus a model with better performance may be trained.
In some embodiments, the first image and the second image are positive samples of each other; the apparatus 600 further comprises: a third obtaining module, configured to obtain a third image feature of a negative sample of the first image; the building module 603 is further configured to: and constructing a first loss function based on the first image characteristic, the second image characteristic and the third image characteristic by adopting a contrast learning algorithm.
In this embodiment, the first loss function is constructed by using the contrast learning algorithm, and the superior performance of the contrast learning algorithm can be utilized, so that an image feature extraction model with better performance can be obtained.
In some embodiments, the third obtaining module is further configured to: and acquiring the third image characteristic from a preset storage queue.
In this embodiment, by acquiring the third image feature from the storage queue, the third image feature that has been generated in advance may be acquired, and the third image feature does not need to be generated in real time, so that the acquisition efficiency of the third image feature may be improved, and the amount of computation may be reduced.
In some embodiments, the image feature extraction model is a first encoder or a second encoder; the apparatus 600 further comprises: the first encoding module is used for performing feature extraction processing on the input first image by adopting the first encoder to obtain the first image feature; and the second coding module is used for performing feature extraction processing on the input second image by adopting the second coder so as to obtain the second image feature.
In this embodiment, the first encoder and the second encoder are respectively adopted to perform image feature extraction processing on the first image and the second image, so that the first encoder and the second encoder can be conveniently adjusted respectively in a subsequent different adjustment mode. By adopting two encoders, the first encoder and the second encoder can adopt respective suitable model parameter adjustment modes to improve respective performances.
In some embodiments, the training module 604 is further configured to: adjusting model parameters of the first encoder based on the total loss function to obtain adjusted model parameters of the first encoder; and obtaining the adjusted model parameters of the second encoder based on the model parameters of the second encoder before adjustment and the adjusted model parameters of the first encoder.
In this embodiment, the model parameter of the first encoder is adjusted based on the total loss function, and information of each iteration process can be continuously learned; the model parameters of the second encoder are determined based on the parameters of the first encoder and the original parameters of the second encoder, so that the information of the first encoder can be learned and the information of the second encoder can be reserved; therefore, the first encoder and the second encoder can respectively learn information in different aspects, so that the whole training process contains more information on the whole, and the performance of the final image feature extraction model is improved.
In some embodiments, the training module is further to: performing weighted summation operation on the model parameters of the second encoder before adjustment and the model parameters of the first encoder after adjustment to obtain the model parameters of the second encoder after adjustment; the model parameter before adjustment of the second encoder corresponds to a first weight value, the model parameter after adjustment of the first encoder corresponds to a second weight value, and the first weight value is greater than the second weight value.
In this embodiment, for the second encoder, during the weighting operation, the weight value corresponding to the model parameter of the second encoder is large, so that the model parameter of the second encoder is changed slowly, the stability can be improved, and the good performance of the second encoder can be maintained.
In some embodiments, the first image feature is a first image feature map, the second image feature is a second image feature map, and the first image feature map and the second image feature map have the same dimension; the first obtaining module 601 is further configured to: averagely dividing the first image feature map into a first number of blocks, and taking the first number of blocks as the first division features; the second obtaining module 602 is further configured to: averagely dividing the second image feature map into a second number of blocks, and taking the second number of blocks as the second division features; wherein the first number is the same as the second number.
In this embodiment, since the image features are generally feature maps, fine-grained segmentation features can be obtained simply by using an average blocking method.
In some embodiments, the apparatus 600 further comprises: a data enhancement module for respectively performing two different data enhancement processes on the same image to obtain the first image and the second image.
In this embodiment, different data enhancement processing is performed on the same image to obtain the first image and the second image, so that the first image and the second image which are positive samples can be obtained when the number of samples is small, the required sample size is reduced, model training on the basis of a small sample is realized, and training efficiency is improved.
The above describes a model training process, and through the training process, an image feature extraction model can be obtained. The image feature extraction model can be used in an image processing process.
Fig. 7 is a flowchart of an image processing method provided in an embodiment of the present disclosure, and as shown in fig. 7, the image processing method may include:
701. and acquiring an image to be processed.
702. Extracting image features of the image to be processed based on preset parameters; the preset parameters are obtained based on a first image feature of a first image sample, a second image feature of a second image sample, and a first segmentation feature and a second segmentation feature, the first segmentation feature is obtained after the first image feature is segmented, the second segmentation feature is obtained after the second image feature is segmented, and the first image sample and the second image sample are positive samples.
703. And acquiring a processing result of the image to be processed based on the image characteristics.
In this embodiment, since the segmentation features are obtained by performing segmentation processing on the image features, the image features may be regarded as coarse-grained features and the segmentation features as fine-grained features. The preset parameters are determined based on both the image features and the segmentation features, so they can be considered to contain information at multiple levels. Therefore, when the preset parameters are adopted to extract the features of the image to be processed, features at multiple levels can be extracted, and a more accurate processing result can be obtained when the processing result is acquired based on these multi-level features. The image processing method thus improves the accuracy of image processing.
In some embodiments, the preset parameters may refer to model parameters of an image feature extraction model, the image feature extraction model may be generated in a training stage, and reference may be made to the related embodiments above for a training process of the image feature extraction model.
In the application stage, the input of the image feature extraction model is the image to be processed, and the output is the image features of the image to be processed.
Image processing may be applied to a variety of related scenarios, such as OCR, face recognition, object detection, and the like.
Taking face recognition as an example, the image to be processed may be a face image. Accordingly, the image feature may be an image feature of a face image.
Based on different application scenes, the image features can be input into a model of a related downstream task for processing so as to output a processing result.
Taking face recognition as an example, face recognition can be regarded as a classification task. Therefore, the image features may be input into a classification model, and the output of the classification model is the face recognition result, i.e., a determination of which of a plurality of candidates the face image belongs to. The specific structure of the classification model can be implemented with various related technologies, such as a fully connected network.
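As an illustrative sketch of such a downstream head (the feature dimension, number of candidates and class name are assumptions, not values given by the patent):

```python
import torch
import torch.nn as nn

class FaceClassifier(nn.Module):
    """A fully connected classification head that maps an extracted image
    feature to scores over the candidate identities."""
    def __init__(self, feature_dim=128, num_candidates=1000):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_candidates)

    def forward(self, image_feature: torch.Tensor) -> torch.Tensor:
        return self.fc(image_feature)  # logits over the candidate identities

# usage: predicted_identity = FaceClassifier()(image_feature).argmax(dim=-1)
```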
Further, the process of determining the preset parameters may include: randomly initializing the preset parameters to obtain initial values; then constructing a first loss function based on the first image feature and the second image feature, constructing a second loss function based on the first segmentation feature and the second segmentation feature, and constructing a total loss function based on the first loss function and the second loss function; and continuously adjusting the preset parameters from the initial values by using the total loss function until a preset number of iterations is reached, the preset parameters at that point being the preset parameters finally adopted in the application stage.
Unlike the conventional approach in which the loss function is constructed based only on the image features, the loss function used in this embodiment may be referred to as a total loss function, which is determined based on both the first loss function and the second loss function rather than the first loss function alone. Because the first loss function reflects coarse-grained information and the second loss function reflects fine-grained information, the total loss function contains both coarse-grained and fine-grained information, that is, information of multiple levels. The preset parameters adjusted based on this multi-level information likewise contain information of multiple levels, which improves their ability to extract image features and thus improves the accuracy of image processing.
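In some embodiments (as recited in claims 3 and 4 below), the first loss function is constructed with a contrastive learning algorithm, using a third image feature of a negative sample acquired from a preset storage queue. One common contrastive formulation (InfoNCE-style) is sketched below in PyTorch-style Python; the temperature value and the exact form of the loss are assumptions rather than requirements of this disclosure.

import torch
import torch.nn.functional as F

def contrastive_first_loss(feat_1, feat_2, negative_queue, temperature=0.07):
    # feat_1, feat_2: (B, D) coarse-grained features of the positive pair.
    # negative_queue: (K, D) third image features of negative samples, read from
    # a preset storage queue.
    q = F.normalize(feat_1, dim=1)
    k_pos = F.normalize(feat_2, dim=1)
    k_neg = F.normalize(negative_queue, dim=1)

    pos_logits = (q * k_pos).sum(dim=1, keepdim=True)        # (B, 1) similarity to the positive
    neg_logits = q @ k_neg.t()                                # (B, K) similarities to the negatives
    logits = torch.cat([pos_logits, neg_logits], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long)   # the positive sits at index 0
    return F.cross_entropy(logits, labels)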
Fig. 8 is a structural diagram of an image processing apparatus provided in an embodiment of the present disclosure, and as shown in fig. 8, the image processing apparatus may include: an acquisition module 801, an extraction module 802 and a processing module 803.
The obtaining module 801 is configured to obtain an image to be processed; the extraction module 802 is configured to extract image features of the image to be processed based on preset parameters; the preset parameters are obtained based on a first image feature of a first image sample, a second image feature of a second image sample, and a first segmentation feature and a second segmentation feature, wherein the first segmentation feature is obtained after segmentation processing is performed on the first image feature, the second segmentation feature is obtained after segmentation processing is performed on the second image feature, and the first image sample and the second image sample are positive samples; the processing module 803 is configured to obtain a processing result of the image to be processed based on the image feature.
In this embodiment, since the segmentation features are obtained by performing segmentation processing on the image features, the image features may be regarded as coarse-grained features and the segmentation features as fine-grained features. Because the preset parameters are determined based on both the image features and the segmentation features, they can be considered to contain information of multiple levels. Therefore, when the preset parameters are adopted to extract the features of the image to be processed, features of multiple levels can be extracted, and when the processing result is obtained based on these multi-level features, a more accurate processing result can be obtained. In this way, the accuracy of image processing is improved.
It is to be understood that in the disclosed embodiments, the same or similar contents in different embodiments may be mutually referred to.
It is to be understood that "first", "second", and the like in the embodiments of the present disclosure are only used for distinguishing, and do not indicate the degree of importance, the sequence, and the like.
For example, saying that a first step and a second step have no timing limitation relationship means that the first step may be performed before the second step, the second step may be performed before the first step, or the two steps may be performed in parallel.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. The electronic device 900 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device 900 may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the electronic device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the electronic device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the electronic device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, and the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the electronic device 900 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the various methods and processes described above, such as the training method of the image feature extraction model or the image processing method. For example, in some embodiments, the training method of the image feature extraction model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method of the image feature extraction model or of the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the training method of the image feature extraction model or the image processing method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system that overcomes the defects of difficult management and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. A training method of an image feature extraction model comprises the following steps:
performing segmentation processing on a first image feature of a first image to obtain a first segmentation feature;
segmenting second image features of a second image to obtain second segmentation features, wherein the first image and the second image are positive samples;
constructing a total loss function based on the first image feature, the second image feature, the first segmentation feature, and the second segmentation feature;
and training an image feature extraction model based on the total loss function.
2. The method of claim 1, wherein said constructing a total loss function based on said first image feature, said second image feature, said first segmentation feature, and said second segmentation feature comprises:
constructing a first loss function based on the first image feature and the second image feature;
constructing a second loss function based on the first segmentation feature and the second segmentation feature;
constructing the total loss function based on the first loss function and the second loss function.
3. The method of claim 2, further comprising:
acquiring a third image feature of a negative sample of the first image;
constructing a first loss function based on the first image feature and the second image feature, comprising:
and constructing the first loss function based on the first image characteristic, the second image characteristic and the third image characteristic by adopting a contrast learning algorithm.
4. The method of claim 3, wherein said obtaining a third image feature of a negative sample of the first image comprises:
and acquiring the third image feature from a preset storage queue.
5. The method of claim 1, wherein,
the image feature extraction model is a first encoder or a second encoder;
the method further comprises the following steps:
performing feature extraction processing on the input first image by using the first encoder to obtain the first image feature;
and performing feature extraction processing on the input second image by adopting the second encoder to obtain the second image feature.
6. The method of claim 5, wherein training an image feature extraction model based on the total loss function comprises:
adjusting model parameters of the first encoder based on the total loss function to obtain adjusted model parameters of the first encoder;
and obtaining the adjusted model parameters of the second encoder based on the model parameters of the second encoder before adjustment and the adjusted model parameters of the first encoder.
7. The method of claim 6, wherein the obtaining adjusted model parameters for the second encoder based on the pre-adjusted model parameters for the second encoder and the adjusted model parameters for the first encoder comprises:
performing weighted summation operation on the model parameters of the second encoder before adjustment and the model parameters of the first encoder after adjustment to obtain the model parameters of the second encoder after adjustment;
the model parameters of the second encoder before adjustment correspond to a first weight value, the model parameters of the first encoder after adjustment correspond to a second weight value, and the first weight value is greater than the second weight value.
8. The method of any one of claims 1-7,
the first image feature is a first image feature map, the second image feature is a second image feature map, and the dimensions of the first image feature map and the second image feature map are the same;
the segmenting the first image feature of the first image to obtain a first segmentation feature includes: evenly dividing the first image feature map into a first number of blocks, and taking the first number of blocks as the first segmentation feature;
the segmenting the second image feature of the second image to obtain a second segmentation feature includes: evenly dividing the second image feature map into a second number of blocks, and taking the second number of blocks as the second segmentation feature; wherein the first number is the same as the second number.
9. The method of any of claims 1-7, further comprising:
and respectively carrying out two different data enhancement processes on the same image to obtain the first image and the second image.
10. A training device of an image feature extraction model, comprising:
the first acquisition module is used for segmenting a first image feature of a first image to obtain a first segmentation feature;
the second acquisition module is used for segmenting a second image feature of a second image to obtain a second segmentation feature, and the first image and the second image are positive samples;
a construction module for constructing a total loss function based on the first image feature, the second image feature, the first segmentation feature and the second segmentation feature;
and the training module is used for training an image feature extraction model based on the total loss function.
11. The apparatus of claim 10, wherein the construction module is further configured to:
constructing a first loss function based on the first image feature and the second image feature;
constructing a second loss function based on the first segmentation feature and the second segmentation feature;
constructing the total loss function based on the first loss function and the second loss function.
12. The apparatus of claim 11, further comprising:
a third obtaining module, configured to obtain a third image feature of a negative sample of the first image;
the construction module is further configured to:
and constructing the first loss function based on the first image feature, the second image feature and the third image feature by adopting a contrastive learning algorithm.
13. The apparatus of claim 12, wherein the third obtaining module is further configured to:
and acquiring the third image feature from a preset storage queue.
14. The apparatus of claim 10, wherein,
the image feature extraction model is a first encoder or a second encoder;
the device further comprises:
the first encoding module is used for performing feature extraction processing on the input first image by adopting the first encoder to obtain the first image feature;
and the second encoding module is used for performing feature extraction processing on the input second image by adopting the second encoder to obtain the second image feature.
15. The apparatus of claim 14, wherein the training module is further configured to:
adjusting model parameters of the first encoder based on the total loss function to obtain adjusted model parameters of the first encoder;
and obtaining the adjusted model parameters of the second encoder based on the model parameters of the second encoder before adjustment and the adjusted model parameters of the first encoder.
16. The apparatus of claim 15, wherein the training module is further configured to:
performing weighted summation operation on the model parameters of the second encoder before adjustment and the model parameters of the first encoder after adjustment to obtain the model parameters of the second encoder after adjustment;
the model parameter of the second encoder before adjustment corresponds to a first weight value, the model parameter of the first encoder after adjustment corresponds to a second weight value, and the first weight value is larger than the second weight value.
17. The apparatus of any one of claims 10-16,
the first image feature is a first image feature map, the second image feature is a second image feature map, and the dimensions of the first image feature map and the dimensions of the second image feature map are the same;
the first acquisition module is further configured to: evenly divide the first image feature map into a first number of blocks, and take the first number of blocks as the first segmentation feature;
the second acquisition module is further configured to: evenly divide the second image feature map into a second number of blocks, and take the second number of blocks as the second segmentation feature; wherein the first number is the same as the second number.
18. The apparatus of any of claims 10-16, further comprising:
and the data enhancement module is used for respectively carrying out two different data enhancement treatments on the same image so as to obtain the first image and the second image.
19. An image processing method, comprising:
acquiring an image to be processed;
extracting image features of the image to be processed based on preset parameters; the preset parameters are obtained based on a first image feature of a first image sample, a second image feature of a second image sample, and a first segmentation feature and a second segmentation feature, wherein the first segmentation feature is obtained after segmentation processing is performed on the first image feature, the second segmentation feature is obtained after segmentation processing is performed on the second image feature, and the first image sample and the second image sample are positive samples;
and acquiring a processing result of the image to be processed based on the image features.
20. An image processing apparatus comprising:
the acquisition module is used for acquiring an image to be processed;
the extraction module is used for extracting the image features of the image to be processed based on preset parameters; the preset parameters are obtained based on a first image feature of a first image sample, a second image feature of a second image sample, and a first segmentation feature and a second segmentation feature, wherein the first segmentation feature is obtained after segmentation processing is performed on the first image feature, the second segmentation feature is obtained after segmentation processing is performed on the second image feature, and the first image sample and the second image sample are positive samples;
and the processing module is used for acquiring a processing result of the image to be processed based on the image features.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9, 19.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9, 19.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9, 19.
CN202210423894.XA 2022-04-21 2022-04-21 Image processing and model training method, device, equipment and storage medium Active CN114758130B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210423894.XA CN114758130B (en) 2022-04-21 2022-04-21 Image processing and model training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114758130A true CN114758130A (en) 2022-07-15
CN114758130B CN114758130B (en) 2023-12-22

Family

ID=82331901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210423894.XA Active CN114758130B (en) 2022-04-21 2022-04-21 Image processing and model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114758130B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378632A (en) * 2021-04-28 2021-09-10 南京大学 Unsupervised domain pedestrian re-identification algorithm based on pseudo label optimization
CN114020950A (en) * 2021-11-03 2022-02-08 北京百度网讯科技有限公司 Training method, device and equipment of image retrieval model and storage medium
CN114049516A (en) * 2021-11-09 2022-02-15 北京百度网讯科技有限公司 Training method, image processing method, device, electronic device and storage medium
CN114186622A (en) * 2021-11-30 2022-03-15 北京达佳互联信息技术有限公司 Image feature extraction model training method, image feature extraction method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yu Zhao et al., "Joint patch and instance discrimination learning for unsupervised person re-identification", Image and Vision Computing, pages 1-10 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116884077A (en) * 2023-09-04 2023-10-13 上海任意门科技有限公司 Face image category determining method and device, electronic equipment and storage medium
CN116884077B (en) * 2023-09-04 2023-12-08 上海任意门科技有限公司 Face image category determining method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114758130B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN114020950B (en) Training method, device, equipment and storage medium for image retrieval model
CN113379627B (en) Training method of image enhancement model and method for enhancing image
CN113343803A (en) Model training method, device, equipment and storage medium
CN113792854A (en) Model training and word stock establishing method, device, equipment and storage medium
CN114065863B (en) Federal learning method, apparatus, system, electronic device and storage medium
CN112580733B (en) Classification model training method, device, equipment and storage medium
CN113627536B (en) Model training, video classification method, device, equipment and storage medium
CN115147680B (en) Pre-training method, device and equipment for target detection model
CN112785493A (en) Model training method, style migration method, device, equipment and storage medium
CN115496970A (en) Training method of image task model, image recognition method and related device
CN113902696A (en) Image processing method, image processing apparatus, electronic device, and medium
CN110633717A (en) Training method and device for target detection model
CN114781654A (en) Federal transfer learning method, device, computer equipment and medium
CN113033408B (en) Data queue dynamic updating method and device, electronic equipment and storage medium
CN114693934A (en) Training method of semantic segmentation model, video semantic segmentation method and device
CN114758130A (en) Image processing and model training method, device, equipment and storage medium
CN115170919B (en) Image processing model training and image processing method, device, equipment and storage medium
CN114841341B (en) Image processing model training and image processing method, device, equipment and medium
CN116363429A (en) Training method of image recognition model, image recognition method, device and equipment
CN114330576A (en) Model processing method and device, and image recognition method and device
CN113344213A (en) Knowledge distillation method, knowledge distillation device, electronic equipment and computer readable storage medium
CN115294396B (en) Backbone network training method and image classification method
CN114550236B (en) Training method, device, equipment and storage medium for image recognition and model thereof
US20230237780A1 (en) Method, device, and computer program product for data augmentation
CN113362304B (en) Training method of definition prediction model and method for determining definition level

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant