CN114266937A - Model training method, image processing method, device, equipment and storage medium

Info

Publication number
CN114266937A
Application number
CN202111560350.XA
Authority
CN (China)
Prior art keywords
image, visible light, loss function, sample, infrared image
Legal status
Pending
Other languages
Chinese (zh)
Inventors
时阳, 孙涛, 李超
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd

Abstract

The disclosure provides a model training method, an image processing method, a device, equipment and a storage medium, and relates to the fields of deep learning and computer vision. The specific implementation scheme is as follows: acquiring a training sample pair set, wherein each training sample pair in the set comprises a sample visible light image and a corresponding sample infrared image; inputting each sample visible light image into a first generator to obtain a first infrared image, and inputting the first infrared image into a second generator to obtain a first visible light image; inputting each corresponding sample infrared image into the second generator to obtain a second visible light image, and inputting the second visible light image into the first generator to obtain a second infrared image; determining a first target loss function or a second target loss function based on the input sample images and the obtained images; and training the first generator and the second generator according to the first target loss function or the second target loss function. This implementation can improve the efficiency and accuracy of conversion between visible light images and infrared images.

Description

Model training method, image processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, in particular to the fields of deep learning and computer vision, and more particularly to a model training method, an image processing method, an apparatus, a device, and a storage medium.
Background
With the rapid development of computer technology and the internet industry, the era of artificial intelligence has quietly arrived, and computer vision processing technology has been developing and maturing continuously. Face recognition and analysis tasks are especially prominent and are widely applied in payment, security, human-computer interaction, and other fields. Depending on the illumination conditions in the natural environment, these tasks acquire a visible light image as input through an optical camera when illumination is good, and acquire an infrared image as input through an infrared camera when illumination is poor; the quantity and richness of data in the two modalities play a vital role in the performance of face-related tasks. Tasks that focus on facial details usually require accurate data annotation of those details, but infrared images have a blurry visual appearance, a low signal-to-noise ratio, and poor resolution, so annotating them directly is difficult and costly. Therefore, migrating the optical image to the infrared style with an image style transfer method, and using the optical image's annotations as the annotation result for the infrared image, becomes an effective solution.
Disclosure of Invention
The present disclosure provides a model training method, an image processing method, an apparatus, a device and a storage medium.
According to a first aspect, there is provided a model training method comprising: acquiring a training sample pair set, wherein each training sample pair in the set comprises a sample visible light image and a corresponding sample infrared image; inputting each sample visible light image into a first generator to obtain a first infrared image, and inputting the first infrared image into a second generator to obtain a first visible light image, wherein the first generator converts visible light images into infrared images and the second generator converts infrared images into visible light images; inputting each corresponding sample infrared image into the second generator to obtain a second visible light image, and inputting the second visible light image into the first generator to obtain a second infrared image; determining a first target loss function or a second target loss function based on the input sample images and the obtained images; and training the first generator and the second generator according to the first target loss function or the second target loss function.
According to a second aspect, there is provided an image processing method comprising: acquiring a target visible light image or a target infrared image; generating a processed infrared image corresponding to the target visible light image by using a trained first generator or generating a processed visible light image corresponding to the target infrared image by using a trained second generator, wherein the first generator and the second generator are trained by the method described in the first aspect.
According to a third aspect, there is provided a model training apparatus comprising: a sample pair set acquisition unit configured to acquire a training sample pair set, wherein each training sample pair in the set comprises a sample visible light image and a corresponding sample infrared image; a first image generation unit configured to input each sample visible light image into a first generator to obtain a first infrared image, and input the first infrared image into a second generator to obtain a first visible light image, wherein the first generator converts visible light images into infrared images and the second generator converts infrared images into visible light images; a second image generation unit configured to input each corresponding sample infrared image into the second generator to obtain a second visible light image, and input the second visible light image into the first generator to obtain a second infrared image; a loss function determination unit configured to determine a first target loss function or a second target loss function based on the input sample images and the obtained images; and a generator training unit configured to train the first generator and the second generator according to the first target loss function or the second target loss function.
According to a fourth aspect, there is provided an image processing apparatus comprising: a target image acquisition unit configured to acquire a target visible light image or a target infrared image; a target image processing unit configured to generate a processed infrared image corresponding to the target visible light image by using a trained first generator or generate a processed visible light image corresponding to the target infrared image by using a trained second generator, wherein the first generator and the second generator are obtained by the apparatus as described in the third aspect.
According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the first aspect or the method as described in the second aspect.
According to a sixth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described in the first aspect or the method as described in the second aspect.
According to a seventh aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method as described in the first aspect or the method as described in the second aspect.
The image processing technology provided by the present disclosure facilitates conversion between visible light images and infrared images, and improves image processing precision and speed.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a model training method according to the present disclosure;
FIG. 3 is a flow diagram of another embodiment of a model training method according to the present disclosure;
FIG. 4 is a schematic diagram of the structure of the feature extraction model in the embodiment shown in FIG. 3;
FIG. 5 is a flow diagram for one embodiment of an image processing method according to the present disclosure;
FIG. 6 is a flow diagram of another embodiment of an image processing method according to the present disclosure;
FIG. 7 is a schematic diagram of an application scenario of the model training method and the image processing method according to the present disclosure;
FIG. 8 is a schematic block diagram of one embodiment of a model training apparatus according to the present disclosure;
FIG. 9 is a schematic block diagram of one embodiment of an image processing apparatus according to the present disclosure;
FIG. 10 is a block diagram of an electronic device for implementing the model training method and the image processing method of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which the model training method, the image processing method, or embodiments for the model training apparatus, the image processing apparatus, of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a video playing application, an image processing application, and the like, may be installed on the terminal devices 101, 102, and 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices including, but not limited to, smartphones, tablet computers, in-vehicle computers, laptop portable computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above and implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, such as a background server providing models on the terminal devices 101, 102, 103. The background server may train the initial model with sample images (including visible light images and infrared images), obtain the first generator and the second generator, and feed back the first generator and the second generator to the terminal devices 101, 102, 103.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the model training method provided by the embodiment of the present disclosure is generally executed by the server 105, and the image processing method may be executed by the terminal devices 101, 102, and 103, or may be executed by the server 105. Accordingly, the model training apparatus is generally provided in the server 105, and the image processing apparatus may be provided in the terminal devices 101, 102, and 103, or may be provided in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a model training method according to the present disclosure is shown. The model training method of the embodiment comprises the following steps:
step 201, a training sample pair set is obtained.
In this embodiment, an executing body (e.g., the server 105 shown in fig. 1) of the model training method may first obtain a set of training sample pairs. The training sample pair set may include a plurality of training sample pairs, each of which includes a sample visible light image and a corresponding sample infrared image. Here, "corresponding" means that the sample visible light image and the sample infrared image contain the same object, are shot from the same angle, and the object occupies the same pixels in both; the two can be understood as different expressions of the same image.
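Purely as an illustration of such paired data (not part of the disclosure), the sample set could be wrapped as a dataset as follows; the directory layout, matched file names, and fixed 256×256 size are assumptions:

```python
# A minimal PyTorch sketch of a paired visible/infrared dataset. The
# directory layout (matching filenames under visible/ and infrared/) and
# the 256x256 size are assumptions, not part of the disclosure.
import os

from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms as T

class PairedVisIrDataset(Dataset):
    def __init__(self, root):
        self.vis_dir = os.path.join(root, "visible")
        self.ir_dir = os.path.join(root, "infrared")
        self.names = sorted(os.listdir(self.vis_dir))
        # Identical resizing for both modalities preserves the pixel-level
        # correspondence that the training sample pairs rely on.
        self.tf = T.Compose([T.Resize((256, 256)), T.ToTensor()])

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        name = self.names[i]
        vis = Image.open(os.path.join(self.vis_dir, name)).convert("RGB")
        ir = Image.open(os.path.join(self.ir_dir, name)).convert("RGB")
        return self.tf(vis), self.tf(ir)  # one training sample pair
```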
Step 202, inputting each sample visible light image into a first generator to obtain a first infrared image, and inputting the first infrared image into a second generator to obtain a first visible light image.
In this embodiment, the first generator and the second generator may be connected in sequence. The executing body may first input each sample visible light image into the first generator, obtain the first infrared image output by the first generator, and then input the obtained first infrared image into the second generator to obtain the first visible light image. Here, the first generator is configured to process the input visible light image to obtain an infrared image, and the second generator is configured to process the input infrared image to obtain a visible light image. That is, the first generator converts visible light images into infrared images, and the second generator converts infrared images into visible light images. In this embodiment, the infrared image obtained by the first generator from the sample visible light image is referred to as the first infrared image, and the visible light image obtained by the second generator from the first infrared image is referred to as the first visible light image. The first generator and the second generator may be obtained from a cycle-consistent generative adversarial network (CycleGAN).
Step 203, inputting each corresponding sample infrared image into the second generator to obtain a second visible light image, and inputting the second visible light image into the first generator to obtain a second infrared image.
The executing body can also input each sample infrared image corresponding to a sample visible light image into the second generator, obtaining the second visible light image output by the second generator, and then input each obtained second visible light image into the first generator to obtain a second infrared image. It can be understood that in this step the second generator and the first generator are connected in sequence. In this embodiment, the visible light image obtained by the second generator from the sample infrared image is referred to as the second visible light image, and the infrared image obtained by the first generator from the second visible light image is referred to as the second infrared image.
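The two cascaded passes of step 202 and step 203 can be sketched as follows (an illustrative sketch only; the tiny convolutional generators are placeholders for whatever CycleGAN generator architectures are actually used):

```python
# Illustrative sketch of steps 202 and 203. The tiny convolutional stacks
# are placeholders for the actual CycleGAN generator architectures.
import torch
import torch.nn as nn

def make_generator():
    return nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
    )

G1 = make_generator()  # first generator: visible light -> infrared
G2 = make_generator()  # second generator: infrared -> visible light

sample_vis = torch.randn(1, 3, 256, 256)  # sample visible light image
sample_ir = torch.randn(1, 3, 256, 256)   # corresponding sample infrared image

# Step 202: visible -> first infrared image -> first visible light image
first_ir = G1(sample_vis)
first_vis = G2(first_ir)

# Step 203: infrared -> second visible light image -> second infrared image
second_vis = G2(sample_ir)
second_ir = G1(second_vis)
```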
Step 204, determining a first target loss function or a second target loss function based on the input sample image and the obtained image.
The executing body may determine the first target loss function or the second target loss function based on the input sample images and the obtained images, after the first generator and the second generator have produced their respective images for the sample visible light image and the sample infrared image. Here, the first target loss function may be used mainly for training the first generator, and the second target loss function mainly for training the second generator. It can be understood that the parameters of the second generator may also be adjusted during training with the first target loss function, to optimize the performance of the first generator when it is used together with the second generator; similarly, the parameters of the first generator may be adjusted during training with the second target loss function, to optimize the performance of the second generator when it is used together with the first generator.
Specifically, for the first target loss function, the executing body may consider the pixel-level differences between the sample visible light image and each of the first visible light image and the second visible light image, and construct the first target loss function from these differences. For the second target loss function, it may consider the pixel-level differences between the sample infrared image and each of the first infrared image and the second infrared image, and construct the second target loss function accordingly.
Step 205, training the first generator and the second generator according to the first target loss function or the second target loss function.
After constructing the first target loss function or the second target loss function, the executing body may use it to train the first generator and the second generator simultaneously. Alternatively, the executing body may adjust only the parameters of the first generator using the first target loss function, and only the parameters of the second generator using the second target loss function.
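The two training regimes described above can be expressed through the choice of optimizer; a sketch (the Adam optimizer and the learning rate are assumptions):

```python
# Sketch of the two training regimes described above; the Adam optimizer
# and learning rate are assumptions. G1 and G2 are placeholder generators.
import itertools

import torch
import torch.nn as nn

G1 = nn.Conv2d(3, 3, 3, padding=1)
G2 = nn.Conv2d(3, 3, 3, padding=1)

# Option A: one optimizer over both generators, so a backward pass of
# either target loss adjusts the parameters of G1 and G2 simultaneously.
opt_joint = torch.optim.Adam(
    itertools.chain(G1.parameters(), G2.parameters()), lr=2e-4)

# Option B: separate optimizers, so the first target loss adjusts only G1
# and the second target loss adjusts only G2.
opt_g1 = torch.optim.Adam(G1.parameters(), lr=2e-4)
opt_g2 = torch.optim.Adam(G2.parameters(), lr=2e-4)
```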
According to the model training method provided by this embodiment of the disclosure, the parameters of the first generator and the second generator can be adjusted at the same time, realizing conversion between visible light images and infrared images and improving image processing efficiency.
With continued reference to FIG. 3, a flow 300 of another embodiment of a model training method according to the present disclosure is shown. As shown in fig. 3, the method of the present embodiment may include the following steps:
step 301, a training sample pair set is obtained.
In this embodiment, a training sample pair in the training sample pair set includes a sample visible light image and a corresponding sample infrared image.
Step 302, inputting each sample visible light image into the first generator to obtain a first infrared image, and inputting the first infrared image into the second generator to obtain a first visible light image.
Step 303, inputting each corresponding sample infrared image into the second generator to obtain a second visible light image, and inputting the second visible light image into the first generator to obtain a second infrared image.
Step 304, a first loss function is determined according to the sample visible light image and the first infrared image.
In this embodiment, the executing body may comprehensively consider multiple factors when determining the loss functions. Specifically, the difference between the sample visible light image and the first infrared image can be treated as a discriminator loss, determined in the same way as the existing discriminator loss of a generative adversarial network.
Step 305, determining a second loss function according to the sample visible light image and the first visible light image.
The executing body may also take the difference between the sample visible light image and the first visible light image into account: in general, if the first generator and the second generator perform well enough, cycle consistency implies that this difference should be small. Here, the executing body may calculate differences between the sample visible light image and the first visible light image in several respects to determine the second loss function. For example, it may calculate the pixel-level difference between the two images as well as the differences between their features at different depths.
Step 306, determining a third loss function according to the sample visible light image and the second visible light image.
In addition, in order to make the cycle-adversarial generation network pay more attention to the details of each object in the image, and building on the paired infrared and visible light training data, the executing body may also construct a third loss function that accounts for the pixel-level reconstruction error between the original image and the generated image of the same modality, i.e., the pixel-level error between the sample visible light image and the second visible light image. Specifically, the executing body may calculate the differences between corresponding pixels of the sample visible light image and the second visible light image, and add the differences to obtain the third loss function.
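Under the pairings fixed in steps 304 to 306, the three visible-light-side terms might be written as follows (a sketch only: the infrared discriminator d_ir, the least-squares GAN form, and the L1 distances are assumptions; the disclosure fixes only which images enter each term):

```python
# Sketch of the three visible-light-side loss terms. The infrared
# discriminator d_ir, the least-squares GAN form, and the L1 distances are
# assumptions; the disclosure fixes only which images enter each term.
import torch
import torch.nn as nn

mse, l1 = nn.MSELoss(), nn.L1Loss()

def first_loss(d_ir, first_ir):
    # Adversarial term on the generated first infrared image, in the style
    # of an ordinary GAN discriminator loss.
    pred = d_ir(first_ir)
    return mse(pred, torch.ones_like(pred))

def second_loss(sample_vis, first_vis):
    # Cycle-consistency term: visible -> infrared -> visible should come
    # back close to the original sample visible light image.
    return l1(first_vis, sample_vis)

def third_loss(sample_vis, second_vis):
    # Paired same-modality reconstruction: because the training data are
    # paired, G2(sample_ir) should reproduce the sample visible light image
    # pixel by pixel; per-pixel differences are accumulated.
    return l1(second_vis, sample_vis)
```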
Step 307, determining a fourth loss function based on the sample visible light image, the sample infrared image, the first visible light image, the second infrared image and the pre-trained feature extraction model.
In addition, the execution subject may also determine a fourth loss function based on the input sample visible light image, the sample infrared image, and the finally output first visible light image, the second infrared image. Specifically, the executing agent may input each of the images into a feature extraction model trained in advance, extract features of each of the images using the feature extraction model, and determine the fourth loss function based on a difference between the features.
In some optional implementations of this embodiment, the execution subject may determine the fourth loss function by the following specific steps: respectively extracting the characteristics of the sample visible light image, the sample infrared image, the first visible light image and the second infrared image by using a characteristic extraction model to obtain corresponding first characteristics, second characteristics, third characteristics and fourth characteristics; respectively determining a first sub-loss function and a second sub-loss function according to the first characteristic to the fourth characteristic; and determining a fourth loss function according to the first sub-loss function and the second sub-loss function.
In this implementation manner, the execution subject may respectively extract the features of the sample visible light image, the sample infrared image, the first visible light image, and the second infrared image by using the feature extraction model, so as to obtain the corresponding first feature, second feature, third feature, and fourth feature. The first feature, the second feature, the third feature, and the fourth feature may include a feature vector, a feature map, and the like. The execution body may divide the four features of the first feature to the fourth feature into two groups to obtain a first sub-loss function and a second sub-loss function. Specifically, the executing entity may randomly divide the four features into two groups, or divide two features corresponding to the visible light image into one group, divide two features corresponding to the infrared image into another group, and determine the first sub-loss function and the second sub-loss function in the two groups respectively. Then, the two sub-loss functions are weighted to obtain a fourth loss function.
In some optional implementations of this embodiment, the executing body may extract the first feature, the second feature, the third feature, and the fourth feature of each image as follows: using a plurality of feature extraction layers of the feature extraction model, respectively extract the features of the sample visible light image, the sample infrared image, the first visible light image, and the second infrared image, obtaining a first feature, a second feature, a third feature, and a fourth feature that each comprise a plurality of feature maps.
In this implementation, the feature extraction model includes a plurality of feature extraction layers, which extract features at a plurality of different depths from the input image. The specific structure of the feature extraction model is shown in fig. 4. In fig. 4, the input image I enters the model on the left side, and feature A, feature B, feature C, and feature D are obtained in sequence on the right side; these are feature maps at different depths. The sample visible light image, the sample infrared image, the first visible light image, and the second infrared image can each be used as the input image of the feature extraction model, whose feature extraction layers extract features of different depths in sequence. In addition, the features of different depths are fed both into their corresponding feature extraction layers and into the sequentially connected feature extraction layers, and the output of each corresponding layer is fused with the output of each sequentially connected layer, so that the first feature, the second feature, the third feature, and the fourth feature are finally obtained.
In some optional implementations of this embodiment, the execution subject may determine the first sub-loss function and the second sub-loss function by the following specific steps: determining a first sub-loss function according to each feature map in the feature map set of the first feature and the corresponding feature map in the feature map set of the third feature; and determining a second sub-loss function according to each feature map in the feature map set of the second feature and the corresponding feature map in the feature map set of the fourth feature.
In this implementation, the first feature, the second feature, the third feature, and the fourth feature each include a feature map with different depths. The execution subject may calculate a first sub-loss function from a difference between feature maps of the same depth in the first feature and the third feature. And calculating the difference between the feature maps with the same depth in the second feature and the fourth feature to obtain a second sub-loss function.
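A perceptual-style sketch of step 307 and the sub-loss grouping above (the truncated, frozen VGG16 is purely a stand-in for the feature extraction model of fig. 4; the cut points, L1 distance, and weights are assumptions):

```python
# Perceptual-style sketch of the fourth loss. The truncated, frozen VGG16
# is purely a stand-in for the feature extraction model of fig. 4; the cut
# points, L1 distance, and weights are assumptions.
import torch.nn as nn
import torchvision.models as models

class MultiDepthExtractor(nn.Module):
    def __init__(self, cut_points=(4, 9, 16, 23)):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
        self.stages = nn.ModuleList()
        prev = 0
        for c in cut_points:
            self.stages.append(nn.Sequential(*vgg[prev:c]))
            prev = c
        for p in self.parameters():
            p.requires_grad_(False)  # the extractor is pre-trained and frozen

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # feature maps at increasing depths (A, B, C, D)
        return feats

l1 = nn.L1Loss()

def depth_matched_loss(feats_a, feats_b):
    # Compare feature maps of the same depth and accumulate the differences.
    return sum(l1(a, b) for a, b in zip(feats_a, feats_b))

def fourth_loss(extractor, sample_vis, sample_ir, first_vis, second_ir,
                w1=1.0, w2=1.0):
    f1, f2 = extractor(sample_vis), extractor(sample_ir)  # first, second features
    f3, f4 = extractor(first_vis), extractor(second_ir)   # third, fourth features
    sub1 = depth_matched_loss(f1, f3)  # visible-light group: first vs third
    sub2 = depth_matched_loss(f2, f4)  # infrared group: second vs fourth
    return w1 * sub1 + w2 * sub2       # weighted fourth loss
```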
Step 308, determining a fifth loss function according to the sample infrared image and the second visible light image.
Step 309, determining a sixth loss function according to the sample infrared image and the second infrared image.
Step 310, determining a seventh loss function according to the sample infrared image and the first infrared image.
In this embodiment, the fifth loss function, the sixth loss function, and the seventh loss function are determined in a similar manner to the first loss function, the second loss function, and the third loss function, except that the fifth loss function, the sixth loss function, and the seventh loss function are for infrared images, and the first loss function, the second loss function, and the third loss function are for visible light images.
Step 311, determining a first target loss function according to at least one of the first to fourth loss functions; determining a second target loss function according to at least one of the fourth to seventh loss functions.
The executing body may determine the first target loss function based on at least one of the first to fourth loss functions; specifically, it may weight the first, second, third, and fourth loss functions to obtain the first target loss function. Likewise, it may determine the second target loss function using at least one of the fourth to seventh loss functions, i.e., weight the fourth, fifth, sixth, and seventh loss functions to obtain the second target loss function.
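The weighting itself is then a plain linear combination; a sketch (the weight values are assumptions, since the disclosure states only that the terms are weighted):

```python
# Sketch of the weighted combinations; the weight values are illustrative
# assumptions, since the disclosure states only that the terms are weighted.
def first_target_loss(loss1, loss2, loss3, loss4, w=(1.0, 10.0, 10.0, 1.0)):
    # First target loss: weighted first to fourth losses (visible side).
    return w[0] * loss1 + w[1] * loss2 + w[2] * loss3 + w[3] * loss4

def second_target_loss(loss4, loss5, loss6, loss7, w=(1.0, 1.0, 10.0, 10.0)):
    # Second target loss: weighted fourth to seventh losses (infrared side).
    return w[0] * loss4 + w[1] * loss5 + w[2] * loss6 + w[3] * loss7
```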
Step 312, training the first generator and the second generator according to the first target loss function or the second target loss function.
In some specific scenarios, the visible light image and the infrared image may be face images, so that conversion between the visible light and infrared styles of face images can be realized.
According to the model training method provided by the embodiment of the disclosure, the detail characteristics of the object can be emphasized when the visible light image and the infrared image are converted, and the accuracy of style conversion is improved.
Referring to fig. 5, a flow 500 of one embodiment of an image processing method according to the present disclosure is shown. As shown in fig. 5, the method of the present embodiment may include the following steps:
step 501, acquiring a target visible light image or a target infrared image.
In this embodiment, the execution subject may acquire the target visible light image or the target infrared image in various ways. Here, the target visible light image may be a face image to be processed. The target infrared image may be an image to be annotated.
Step 502, generating a processed infrared image corresponding to the target visible light image by using the trained first generator, or generating a processed visible light image corresponding to the target infrared image by using the trained second generator.
The executing body can process the target visible light image by using the trained first generator, and the obtained image can be recorded as the processed infrared image. Similarly, the executing body can also generate a processed visible light image corresponding to the target infrared image by using the trained second generator.
The first generator and the second generator can be trained by the method shown in fig. 2 or fig. 3.
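At inference time, each direction reduces to a single forward pass through the corresponding trained generator; a sketch (image loading, the 256×256 size, and the [0, 1] output range are assumptions):

```python
# Inference sketch: each direction is one forward pass through the trained
# generator. Image loading, the 256x256 size, and [0, 1] output range are
# assumptions.
import torch
from PIL import Image
import torchvision.transforms as T
import torchvision.transforms.functional as TF

tf = T.Compose([T.Resize((256, 256)), T.ToTensor()])

@torch.no_grad()
def visible_to_infrared(g1, path):
    # g1: the trained first generator (visible light -> infrared).
    x = tf(Image.open(path).convert("RGB")).unsqueeze(0)
    y = g1(x).squeeze(0).clamp(0, 1)
    return TF.to_pil_image(y)  # the processed infrared image

@torch.no_grad()
def infrared_to_visible(g2, path):
    # g2: the trained second generator (infrared -> visible light).
    x = tf(Image.open(path).convert("RGB")).unsqueeze(0)
    y = g2(x).squeeze(0).clamp(0, 1)
    return TF.to_pil_image(y)  # the processed visible light image
```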
The image processing method provided by the embodiment of the disclosure can realize conversion between the infrared image and the visible light image, and improve conversion efficiency.
With continued reference to fig. 6, a flow 600 of another embodiment of an image processing method according to the present disclosure is shown. As shown in fig. 6, the method of the present embodiment may include the following steps:
step 601, acquiring a target visible light image and annotation information of the target visible light image.
In this embodiment, the annotation information of the target visible light image may be used to annotate different objects in the target visible light image. The annotation information may include annotation boxes.
Step 602, generating a processed infrared image corresponding to the target visible light image by using the trained first generator.
Step 603, determining the annotation information of the processed infrared image according to the annotation information.
The executing body can determine the pixels occupied by the annotation information in the processed infrared image from the pixels occupied by the annotation information in the target visible light image, thereby determining the annotation information of the processed infrared image. In this way, the infrared image can be annotated easily, and the annotation accuracy is improved.
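Because the style transfer preserves object positions pixel for pixel, the annotation geometry carries over directly; a sketch (the (x1, y1, x2, y2) box format and the resize-induced scaling are assumptions):

```python
# Sketch of annotation transfer: style transfer leaves object positions
# unchanged, so box geometry carries over, scaled only if the processed
# infrared image was resized. The (x1, y1, x2, y2) format is an assumption.
def transfer_boxes(boxes, vis_size, ir_size):
    """Map annotation boxes from the target visible light image onto the
    processed infrared image. vis_size and ir_size are (width, height)."""
    sx = ir_size[0] / vis_size[0]
    sy = ir_size[1] / vis_size[1]
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy)
            for (x1, y1, x2, y2) in boxes]
```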
Step 604, performing model training by using the annotated processed infrared image.
After determining the labeling information of the processed infrared image, the executing subject can perform model training by using the labeled processed infrared image to obtain a model specially used for processing the infrared image.
According to the image processing method provided by this embodiment of the disclosure, the trained generator can process an annotated visible light image to obtain an annotated infrared image, so that infrared images can be conveniently annotated and the annotation accuracy is improved.
Fig. 7 is a schematic diagram illustrating an application scenario of the model training method and the image processing method of the present disclosure. In the application scenario of fig. 7, the server 701 obtains a trained first generator and a trained second generator by using steps 201 to 205. The first generator and the second generator are then transmitted to the terminal 702. The terminal 702 can perform image conversion using the first generator and the second generator described above, and realize easy conversion between the visible light image and the infrared image.
With further reference to fig. 8, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a model training apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 8, the model training apparatus 800 of the present embodiment includes: a sample pair set acquisition unit 801, a first image generation unit 802, a second image generation unit 803, a loss function determination unit 804, and a generator training unit 805.
A sample pair set acquisition unit 801 configured to acquire a training sample pair set. The training sample pairs in the set of training sample pairs include a sample visible light image and a corresponding sample infrared image.
The first image generating unit 802 is configured to input the visible light image of each sample into the first generator to obtain a first infrared image, and then input the first infrared image into the second generator to obtain a first visible light image. The first generator is used for converting the visible light image into the infrared image, and the second generator is used for converting the infrared image into the visible light image.
The second image generation unit 803 is configured to input the corresponding sample infrared image to the second generator to obtain a second visible light image, and input the second visible light image to the first generator to obtain a second infrared image.
A loss function determination unit 804 configured to determine a first target loss function or a second target loss function based on the input sample image and the resulting image.
A generator training unit 805 configured to train the first generator and the second generator according to the first target loss function or the second target loss function.
In some optional implementations of this embodiment, the loss function determining unit 804 may be further configured to: determine a first loss function according to the sample visible light image and the first infrared image; determine a second loss function according to the sample visible light image and the first visible light image; determine a third loss function according to the sample visible light image and the second visible light image; determine a fourth loss function based on the sample visible light image, the sample infrared image, the first visible light image, the second infrared image, and a pre-trained feature extraction model; and determine a first target loss function based on at least one of the first to fourth loss functions.
In some optional implementations of this embodiment, the loss function determining unit 804 may be further configured to: determining a fifth loss function according to the sample infrared image and the second visible light image; determining a sixth loss function according to the sample infrared image and the second infrared image; determining a seventh loss function according to the sample infrared image and the first infrared image; determining a second target loss function according to at least one of the fourth to seventh loss functions.
In some optional implementations of this embodiment, the loss function determining unit 804 may be further configured to: respectively extracting the characteristics of the sample visible light image, the sample infrared image, the first visible light image and the second infrared image by using a characteristic extraction model to obtain corresponding first characteristics, second characteristics, third characteristics and fourth characteristics; respectively determining a first sub-loss function and a second sub-loss function according to the first characteristic to the fourth characteristic; and determining a fourth loss function according to the first sub-loss function and the second sub-loss function.
In some optional implementations of this embodiment, the feature extraction model includes a plurality of feature extraction layers. The loss function determination unit 804 may be further configured to: and respectively extracting the characteristics of the sample visible light image, the sample infrared image, the first visible light image and the second infrared image by utilizing a plurality of characteristic extraction layers of the characteristic extraction model to obtain a first characteristic, a second characteristic, a third characteristic and a fourth characteristic which respectively comprise a plurality of characteristic graphs.
In some optional implementations of this embodiment, the loss function determining unit 804 may be further configured to: determining a first sub-loss function according to each feature map in the feature map set of the first feature and the corresponding feature map in the feature map set of the third feature; and determining a second sub-loss function according to each feature map in the feature map set of the second feature and the corresponding feature map in the feature map set of the fourth feature.
It should be understood that units 801 to 805 described in the model training apparatus 800 correspond to respective steps in the method described with reference to fig. 2. Thus, the operations and features described above with respect to the model training method are equally applicable to the apparatus 800 and the units included therein, and are not described in detail here.
With further reference to fig. 9, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an image processing apparatus, which corresponds to the method embodiment shown in fig. 5, and which is particularly applicable in various electronic devices.
As shown in fig. 9, the image processing apparatus 900 of the present embodiment includes: an object image acquisition unit 901 and an object image processing unit 902.
A target image acquisition unit 901 configured to acquire a target visible light image or a target infrared image;
a target image processing unit 902 configured to generate a processed infrared image corresponding to the target visible light image by using a trained first generator, or generate a processed visible light image corresponding to the target infrared image by using a trained second generator, wherein the first generator and the second generator are obtained by the apparatus described in fig. 8.
In some optional implementations of this embodiment, the apparatus 900 may further include: an annotation information acquisition unit and an annotation information determination unit.
An annotation information acquisition unit configured to acquire the annotation information of the target visible light image.
An annotation information determination unit configured to determine the annotation information of the processed infrared image according to the annotation information.
In some optional implementations of this embodiment, the apparatus 900 may further include: a model training unit configured to perform model training using the annotated processed infrared image.
It should be understood that units 901 to 902 recited in the image processing apparatus 900 correspond to respective steps in the method described with reference to fig. 5, respectively. Thus, the operations and features described above for the image processing method are equally applicable to the apparatus 900 and the units included therein, and are not described in detail here.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to an embodiment of the present disclosure.
FIG. 10 shows a block diagram of an electronic device 1000 that performs a model training method, an image processing method, according to an embodiment of the disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the electronic device 1000 includes a processor 1001 that can perform various appropriate actions and processes in accordance with a computer program stored in a read-only memory (ROM) 1002 or a computer program loaded from a memory 1008 into a random access memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the electronic device 1000 can also be stored. The processor 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
A number of components in the electronic device 1000 are connected to the I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a memory 1008 such as a magnetic disk, optical disk, or the like; and a communication unit 1009 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1009 allows the electronic device 1000 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The processor 1001 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of processor 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various application specific Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 1001 performs the various methods and processes described above, such as a model training method, an image processing method. For example, in some embodiments, the model training method, the image processing method, may be implemented as a computer software program tangibly embodied in a machine-readable storage medium, such as the memory 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto electronic device 1000 via ROM 1002 and/or communications unit 1009. When the computer program is loaded into the RAM 1003 and executed by the processor 1001, one or more steps of the model training method, the image processing method described above may be performed. Alternatively, in other embodiments, the processor 1001 may be configured to perform the model training method, the image processing method, by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code described above may be packaged as a computer program product. These program code or computer program products may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor 1001, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the defects of difficult management and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions of the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (21)

1. A model training method, comprising:
acquiring a training sample pair set, wherein a training sample pair in the training sample pair set comprises a sample visible light image and a corresponding sample infrared image;
inputting each sample visible light image into a first generator to obtain a first infrared image, and inputting the first infrared image into a second generator to obtain a first visible light image, wherein the first generator is used for converting the visible light image into the infrared image, and the second generator is used for converting the infrared image into the visible light image;
inputting the corresponding infrared image of each sample into the second generator to obtain a second visible light image, and inputting the second visible light image into the first generator to obtain a second infrared image;
determining a first target loss function or a second target loss function based on the input sample image and the obtained image;
training the first generator and the second generator according to the first target loss function or the second target loss function.
2. The method of claim 1, wherein determining a first or second target loss function based on the input sample image and the resulting image comprises:
determining a first loss function according to the sample visible light image and the first infrared image;
determining a second loss function according to the sample visible light image and the first visible light image;
determining a third loss function according to the sample visible light image and the second visible light image;
determining a fourth loss function based on the sample visible light image, the sample infrared image, the first visible light image, the second infrared image, and a pre-trained feature extraction model;
determining the first target loss function according to at least one of the first to fourth loss functions.
3. The method of claim 2, wherein determining the first or second target loss function based on the input sample image and the resulting image comprises:
determining a fifth loss function according to the sample infrared image and the second visible light image;
determining a sixth loss function according to the sample infrared image and the second infrared image;
determining a seventh loss function according to the sample infrared image and the first infrared image;
determining the second target loss function according to at least one of the fourth to seventh loss functions.
4. The method of claim 2, wherein determining the fourth loss function based on the sample visible light image, the sample infrared image, the first visible light image, the second infrared image, and the pre-trained feature extraction model comprises:
respectively extracting features of the sample visible light image, the sample infrared image, the first visible light image and the second infrared image by using the feature extraction model to obtain a corresponding first feature, second feature, third feature and fourth feature;
determining a first sub-loss function and a second sub-loss function respectively according to the first feature to the fourth feature;
and determining the fourth loss function according to the first sub-loss function and the second sub-loss function.
5. The method of claim 4, wherein the feature extraction model comprises a plurality of feature extraction layers; and
the extracting features of the sample visible light image, the sample infrared image, the first visible light image and the second infrared image by using the feature extraction model to obtain the corresponding first feature, second feature, third feature and fourth feature comprises:
extracting features of the sample visible light image, the sample infrared image, the first visible light image and the second infrared image respectively by using the plurality of feature extraction layers of the feature extraction model to obtain a corresponding first feature, second feature, third feature and fourth feature, each comprising a plurality of feature maps.
6. The method of claim 5, wherein determining the first sub-loss function and the second sub-loss function respectively according to the first feature to the fourth feature comprises:
determining the first sub-loss function according to each feature map in the feature map set of the first feature and the corresponding feature map in the feature map set of the third feature;
and determining the second sub-loss function according to each feature map in the feature map set of the second feature and the corresponding feature map in the feature map set of the fourth feature.
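Claims 4-6 together describe a multi-layer perceptual loss: the pre-trained feature extraction model yields one feature map per feature extraction layer, and each sub-loss compares corresponding maps across an image pair. The sketch below assumes a frozen VGG16 backbone and L1 distances between feature maps; neither choice is specified by the claims.

    import torch
    import torch.nn.functional as F
    from torchvision.models import vgg16, VGG16_Weights

    class FeatureExtractor(torch.nn.Module):
        # Pre-trained feature extraction model with a plurality of feature
        # extraction layers (claim 5); VGG16 is an assumed backbone, and
        # inputs are assumed replicated to three channels where needed.
        def __init__(self, layer_ids=(3, 8, 15, 22)):
            super().__init__()
            self.backbone = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval()
            self.layer_ids = set(layer_ids)
            for p in self.backbone.parameters():
                p.requires_grad_(False)

        def forward(self, x):
            feature_maps = []
            for i, layer in enumerate(self.backbone):
                x = layer(x)
                if i in self.layer_ids:
                    feature_maps.append(x)
            return feature_maps  # one feature map per selected layer

    def fourth_loss(extractor, sample_vis, sample_ir, first_vis, second_ir):
        f1 = extractor(sample_vis)   # first feature (a set of feature maps)
        f2 = extractor(sample_ir)    # second feature
        f3 = extractor(first_vis)    # third feature
        f4 = extractor(second_ir)    # fourth feature
        # First sub-loss: each map of the first feature against the
        # corresponding map of the third feature (claim 6).
        sub1 = sum(F.l1_loss(a, b) for a, b in zip(f1, f3))
        # Second sub-loss: second feature against fourth feature.
        sub2 = sum(F.l1_loss(a, b) for a, b in zip(f2, f4))
        return sub1 + sub2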
7. An image processing method comprising:
acquiring a target visible light image or a target infrared image;
generating a processed infrared image corresponding to the target visible light image using a trained first generator or generating a processed visible light image corresponding to the target infrared image using a trained second generator, wherein the first generator and the second generator are trained by the method of any one of claims 1-6.
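At inference time only one generator per direction is needed. A minimal sketch, assuming the input has already been preprocessed into a normalized (1, C, H, W) tensor:

    import torch

    @torch.no_grad()
    def translate(image, generator):
        # Apply a trained generator to a target image (claim 7);
        # preprocessing and denormalization are assumed to happen elsewhere.
        generator.eval()
        return generator(image)

    # processed_ir  = translate(target_vis, G_vis2ir)   # visible -> infrared
    # processed_vis = translate(target_ir, G_ir2vis)    # infrared -> visible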
8. The method of claim 7, wherein the method further comprises:
acquiring annotation information of the target visible light image;
and determining annotation information of the processed infrared image according to the acquired annotation information.
9. The method of claim 8, wherein the method further comprises:
and performing model training by using the annotated processed infrared image.
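The point of claims 8 and 9 is that style transfer changes appearance but not geometry, so pixel-aligned annotations made on the visible light image carry over unchanged to the generated infrared image, sidestepping manual annotation of low-contrast infrared data. A minimal sketch under that assumption (the label format and names are hypothetical):

    import torch

    @torch.no_grad()
    def build_ir_training_pair(target_vis, vis_annotations, G_vis2ir):
        # The generator alters modality and style, not spatial layout, so
        # pixel-aligned labels (boxes, masks, landmarks) remain valid.
        processed_ir = G_vis2ir(target_vis)
        return processed_ir, vis_annotations

    # The resulting (infrared image, annotations) pairs can then be used to
    # train a downstream infrared face-analysis model (claim 9).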
10. A model training apparatus comprising:
a sample pair set acquisition unit configured to acquire a training sample pair set, wherein each training sample pair in the training sample pair set includes a sample visible light image and a corresponding sample infrared image;
a first image generation unit configured to input each sample visible light image into a first generator to obtain a first infrared image, and input the first infrared image into a second generator to obtain a first visible light image, wherein the first generator is used for converting a visible light image into an infrared image, and the second generator is used for converting an infrared image into a visible light image;
a second image generation unit configured to input each corresponding sample infrared image into the second generator to obtain a second visible light image, and input the second visible light image into the first generator to obtain a second infrared image;
a loss function determination unit configured to determine a first target loss function or a second target loss function based on the input sample images and the generated images;
a generator training unit configured to train the first generator and the second generator according to the first target loss function or the second target loss function.
11. The apparatus of claim 10, wherein the loss function determination unit is further configured to:
determining a first loss function according to the sample visible light image and the first infrared image;
determining a second loss function according to the sample visible light image and the first visible light image;
determining a third loss function according to the sample visible light image and the second visible light image;
determining a fourth loss function based on the sample visible light image, the sample infrared image, the first visible light image, the second infrared image, and a pre-trained feature extraction model;
determining the first target loss function according to at least one of the first to fourth loss functions.
12. The apparatus of claim 11, wherein the loss function determination unit is further configured to:
determining a fifth loss function according to the sample infrared image and the second visible light image;
determining a sixth loss function according to the sample infrared image and the second infrared image;
determining a seventh loss function according to the sample infrared image and the first infrared image;
determining the second target loss function according to at least one of the fourth to seventh loss functions.
13. The apparatus of claim 11, wherein the loss function determination unit is further configured to:
respectively extracting features of the sample visible light image, the sample infrared image, the first visible light image and the second infrared image by using the feature extraction model to obtain a corresponding first feature, second feature, third feature and fourth feature;
determining a first sub-loss function and a second sub-loss function respectively according to the first feature to the fourth feature;
and determining the fourth loss function according to the first sub-loss function and the second sub-loss function.
14. The apparatus of claim 13, wherein the feature extraction model comprises a plurality of feature extraction layers; and
the loss function determination unit is further configured to:
extracting features of the sample visible light image, the sample infrared image, the first visible light image and the second infrared image respectively by using the plurality of feature extraction layers of the feature extraction model to obtain a corresponding first feature, second feature, third feature and fourth feature, each comprising a plurality of feature maps.
15. The apparatus of claim 14, wherein the loss function determination unit is further configured to:
determining the first sub-loss function according to each feature map in the feature map set of the first feature and the corresponding feature map in the feature map set of the third feature;
and determining the second sub-loss function according to each feature map in the feature map set of the second feature and the corresponding feature map in the feature map set of the fourth feature.
16. An image processing apparatus comprising:
a target image acquisition unit configured to acquire a target visible light image or a target infrared image;
a target image processing unit configured to generate a processed infrared image corresponding to the target visible light image using a trained first generator or generate a processed visible light image corresponding to the target infrared image using a trained second generator, wherein the first generator and the second generator are trained by the apparatus of any one of claims 10-15.
17. The apparatus of claim 16, wherein the apparatus further comprises:
an annotation information acquisition unit configured to acquire annotation information of the target visible light image;
an annotation information determination unit configured to determine annotation information of the processed infrared image according to the acquired annotation information.
18. The apparatus of claim 16, wherein the apparatus further comprises:
a model training unit configured to perform model training using the annotated processed infrared image.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6 or to perform the method of any one of claims 7-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6 or the method of any one of claims 7-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-6 or the method of any of claims 7-9.
CN202111560350.XA 2021-12-20 2021-12-20 Model training method, image processing method, device, equipment and storage medium Pending CN114266937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111560350.XA CN114266937A (en) 2021-12-20 2021-12-20 Model training method, image processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114266937A 2022-04-01

Family

ID=80827946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111560350.XA Pending CN114266937A (en) 2021-12-20 2021-12-20 Model training method, image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114266937A (en)

Patent Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135574A (en) * 2018-02-09 2019-08-16 北京世纪好未来教育科技有限公司 Neural network training method, image generating method and computer storage medium
CN110110576A (en) * 2019-01-03 2019-08-09 北京航空航天大学 A kind of traffic scene thermal infrared semanteme generation method based on twin semantic network
CN112199976A (en) * 2019-07-08 2021-01-08 中国移动通信集团浙江有限公司 Certificate picture generation method and device
US20210027113A1 (en) * 2019-07-22 2021-01-28 Raytheon Company Cross-modality automatic target recognition
CN110706152A (en) * 2019-09-25 2020-01-17 中山大学 Face illumination migration method based on generation of confrontation network
CN110633698A (en) * 2019-09-30 2019-12-31 上海依图网络科技有限公司 Infrared picture identification method, equipment and medium based on loop generation countermeasure network
CN111242844A (en) * 2020-01-19 2020-06-05 腾讯科技(深圳)有限公司 Image processing method, image processing apparatus, server, and storage medium
CN111476783A (en) * 2020-04-13 2020-07-31 腾讯科技(深圳)有限公司 Image processing method, device and equipment based on artificial intelligence and storage medium
CN111814655A (en) * 2020-07-03 2020-10-23 浙江大华技术股份有限公司 Target re-identification method, network training method thereof and related device
CN111862274A (en) * 2020-07-21 2020-10-30 有半岛(北京)信息科技有限公司 Training method for generating confrontation network, and image style migration method and device
CN112215255A (en) * 2020-09-08 2021-01-12 深圳大学 Training method of target detection model, target detection method and terminal equipment
CN112347850A (en) * 2020-09-30 2021-02-09 新大陆数字技术股份有限公司 Infrared image conversion method, living body detection method, device and readable storage medium
CN112488243A (en) * 2020-12-18 2021-03-12 北京享云智汇科技有限公司 Image translation method
CN113111700A (en) * 2021-02-24 2021-07-13 浙江大华技术股份有限公司 Training method of image generation model, electronic device and storage medium
CN112967178A (en) * 2021-03-08 2021-06-15 烟台艾睿光电科技有限公司 Image conversion method, device, equipment and storage medium
CN113283444A (en) * 2021-03-30 2021-08-20 电子科技大学 Heterogeneous image migration method based on generation countermeasure network
CN113516582A (en) * 2021-04-12 2021-10-19 浙江大学 Network model training method and device for image style migration, computer equipment and storage medium
CN113658051A (en) * 2021-06-25 2021-11-16 南京邮电大学 Image defogging method and system based on cyclic generation countermeasure network
CN113362251A (en) * 2021-06-27 2021-09-07 东南大学 Anti-network image defogging method based on double discriminators and improved loss function

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
TONG SU ET AL.: "ES-CycleGAN: An Improved CycleGAN for VI-to-IR Translation", 2021 40th Chinese Control Conference (CCC) *
LIU, ZHUO: "Research on Daytime Colorization Algorithms for Infrared Images Based on Generative Adversarial Networks", China Master's Theses Full-text Database, Information Science and Technology *
LI, WEI: "The Fusion of Audio Music and Computers: Audio Music Technology", 31 January 2020 *
WANG, LEI: "Research on Heterogeneous Image Transfer and Enhancement Based on Generative Adversarial Networks", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071369A (en) * 2022-12-13 2023-05-05 哈尔滨理工大学 Infrared image processing method and device
CN116071369B (en) * 2022-12-13 2023-07-14 哈尔滨理工大学 Infrared image processing method and device
CN116682070A (en) * 2023-08-03 2023-09-01 武汉工程大学 Infrared video detection method and system for dangerous gas leakage under complex scene

Similar Documents

Publication Publication Date Title
CN113033537B (en) Method, apparatus, device, medium and program product for training a model
CN113643412B (en) Virtual image generation method and device, electronic equipment and storage medium
KR20220125712A (en) Image processing method, text recognition method and device
CN114266937A (en) Model training method, image processing method, device, equipment and storage medium
CN113407850B (en) Method and device for determining and acquiring virtual image and electronic equipment
CN111539897A (en) Method and apparatus for generating image conversion model
CN113627536A (en) Model training method, video classification method, device, equipment and storage medium
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN113052962B (en) Model training method, information output method, device, equipment and storage medium
CN114817612A (en) Method and related device for calculating multi-modal data matching degree and training calculation model
CN113657411A (en) Neural network model training method, image feature extraction method and related device
CN112862934A (en) Method, apparatus, device, medium, and product for processing animation
CN112529180A (en) Method and apparatus for model distillation
CN112785493A (en) Model training method, style migration method, device, equipment and storage medium
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN113139463B (en) Method, apparatus, device, medium and program product for training a model
CN113408304B (en) Text translation method and device, electronic equipment and storage medium
CN112560848B (en) Training method and device for POI (Point of interest) pre-training model and electronic equipment
CN115457365A (en) Model interpretation method and device, electronic equipment and storage medium
CN113792876A (en) Backbone network generation method, device, equipment and storage medium
CN114529649A (en) Image processing method and device
CN112529181A (en) Method and apparatus for model distillation
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN113781653A (en) Object model generation method and device, electronic equipment and storage medium
CN112733879A (en) Model distillation method and device for different scenes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination