CN116664820A - Image processing method, device, electronic equipment and storage medium - Google Patents

Image processing method, device, electronic equipment and storage medium Download PDF

Info

Publication number
CN116664820A
CN116664820A (application CN202310583027.7A)
Authority
CN
China
Prior art keywords
image
processed
sample
loss function
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310583027.7A
Other languages
Chinese (zh)
Inventor
刘浪涛 (Liu Langtao)
陈承隆 (Chen Chenglong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202310583027.7A
Publication of CN116664820A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image processing method, an image processing device, electronic equipment, and a storage medium, and belongs to the technical field of artificial intelligence. The image processing method comprises the following steps: acquiring a first sample image data set and a second sample image data set; training an initial difference detection model based on the first sample image data set and the second sample image data set to obtain a target difference detection model, wherein the target difference detection model is used for outputting difference images of the first sample image data set and the second sample image data set; inputting a first image to be processed and a second image to be processed into the target difference detection model to obtain a target difference image; and performing image processing on the first image to be processed or the second image to be processed based on the target difference image to obtain a target image.

Description

Image processing method, device, electronic equipment and storage medium
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to an image processing method, an image processing device, electronic equipment and a storage medium.
Background
Because image sensors of different types differ in design scheme and manufacturing process, their response characteristics to light are inconsistent, which leads to differences in the signal acquisition process. As a result, the original images that image sensors of different types output to the image signal processing (Image Signal Processing, ISP) unit differ considerably in color, brightness, definition, and the like. These original differences between the images cannot always be eliminated as the images pass through the ISP pipeline (Pipeline), so the imaging effects of image sensors of different types ultimately differ, producing the so-called problem of abnormal consistency of the multi-camera effect.
In the related art, in order to solve the above-mentioned problems, the parameters of ISP units corresponding to different types of image sensors are usually adjusted in a targeted manner to eliminate the differences between images when the images pass through the ISP units.
However, this method requires manual parameter adjustment, makes it difficult to ensure the consistency of the multi-camera effect, and consumes a large amount of human resources.
Disclosure of Invention
The embodiment of the application aims to provide an image processing method, an image processing device, electronic equipment and a storage medium, which can ensure the consistency of images acquired by different camera modules.
In a first aspect, an embodiment of the present application provides an image processing method, including:
acquiring a first sample image data set and a second sample image data set, wherein the first sample image data set is image data acquired through a first camera module, the second sample image data set is image data acquired through a second camera module under the same shooting scene, and the angle of view of the first camera module is different from that of the second camera module, or the photosensitivity of the first camera module is different from that of the second camera module;
training an initial difference detection model based on the first sample image data set and the second sample image data set to obtain a target difference detection model, wherein the target difference detection model is used for outputting difference images of the first sample image data set and the second sample image data set;
Inputting a first image to be processed and a second image to be processed into a target difference detection model to obtain a target difference image, wherein the first image to be processed is an image acquired by a first camera module, and the second image to be processed is an image acquired by a second camera module under the same shooting scene;
performing image processing on the first to-be-processed image or the second to-be-processed image based on the target difference image to obtain a target image;
when the initial difference detection model is trained, image channel splitting is carried out on a first sample image in the first sample image data set and a second sample image in the second sample image data set, first loss function values of different image channels are respectively calculated, and total loss function values of the initial difference detection model are calculated based on the first loss function values of the different image channels.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
the acquisition module is used for acquiring a first sample image data set and a second sample image data set, wherein the first sample image data set is image data acquired through a first camera module, the second sample image data set is image data acquired through a second camera module under the same shooting scene, and the angle of view of the first camera module is different from the angle of view of the second camera module or the photosensitive performance of the first camera module is different from the photosensitive performance of the second camera module;
The training module is used for training the initial difference detection model based on the first sample image data set and the second sample image data set to obtain a target difference detection model, and the target difference detection model is used for outputting difference images of the first sample image data set and the second sample image data set;
the detection module is used for inputting a first image to be processed and a second image to be processed into the target difference detection model to obtain a target difference image, wherein the first image to be processed is an image acquired by the first camera module, and the second image to be processed is an image acquired by the second camera module under the same shooting scene;
the processing module is used for carrying out image processing on the first image to be processed or the second image to be processed based on the target difference image to obtain a target image;
when the initial difference detection model is trained, image channel splitting is carried out on a first sample image in the first sample image data set and a second sample image in the second sample image data set, first loss function values of different image channels are respectively calculated, and total loss function values of the initial difference detection model are calculated based on the first loss function values of the different image channels.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor perform the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to implement the method according to the first aspect.
According to the embodiment of the application, an initial difference detection model can be trained with a first sample image data set and a second sample image data set acquired by camera modules with different angles of view or different photosensitivity to obtain a target difference detection model. A first image to be processed and a second image to be processed, acquired by the different camera modules under the same shooting scene, are then input into the target difference detection model to obtain a target difference image, and the first image to be processed or the second image to be processed is subjected to image processing based on the target difference image to obtain a target image. The target image thus obtained is consistent in shooting effect with the second image to be processed or the first image to be processed. Therefore, the shooting effects of images acquired by different camera modules can be made consistent through image processing, the differences between the different camera modules are compensated, and the consistency of the multi-camera effect is ensured without consuming a large amount of human resources. When the initial difference detection model is trained, image channel splitting is performed on a first sample image in the first sample image data set and a second sample image in the second sample image data set, first loss function values of different image channels are calculated respectively, and the total loss function value of the initial difference detection model is calculated based on the first loss function values of the different image channels. In this way, the accuracy of model training can be improved by improving the accuracy of the loss function value, which further improves the consistency of the shooting effect between the target image and the second image to be processed or the first image to be processed.
Drawings
FIG. 1 is one of the flowcharts of an image processing method shown in accordance with an exemplary embodiment;
FIG. 2 is a schematic diagram illustrating a process for generating first weight information according to an exemplary embodiment;
FIG. 3 is a second flowchart illustrating a method of image processing according to an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating a training process of a discrepancy detection model, according to an exemplary embodiment;
FIG. 5 is a schematic diagram illustrating a channel splitting process according to an example embodiment;
fig. 6 is a block diagram showing a configuration of an image processing apparatus according to an exemplary embodiment;
FIG. 7 is a block diagram of an electronic device, according to an example embodiment;
fig. 8 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.
The terms "first", "second", and the like in the description and in the claims are used to distinguish between similar objects and not necessarily to describe a particular sequence or chronological order. It is to be understood that the terms so used may be interchanged where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described herein. In addition, the objects identified by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more than one. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
As described in the background, in the related art, in order to solve the problem of abnormal consistency of the multi-camera effect, the parameters of the ISP units corresponding to different types of image sensors are usually adjusted in a targeted manner to eliminate the differences between images as they pass through the ISP units. Specifically, the ISP may include noise reduction (Noise Reduction, NR), optical black correction (Optical Black, OB), lens shading correction (Lens Shading Correction, LSC), automatic white balance (Auto White Balance, AWB), color correction (Color Correction, CC), local tone mapping (Local Tone Mapping, LTM), global tone mapping (Global Tone Mapping, GTM), and the like. However, this method requires manual parameter adjustment, makes it difficult to ensure the consistency of the multi-camera effect, and consumes a large amount of human resources.
In addition, in the related art, in order to solve the problem of abnormal consistency of the multi-camera effect, a mapping relationship between different image sensors can be calculated through a calibration algorithm. The mapping relationship is usually a mapping matrix and is applied to a certain module at the back end; for example, a color correction matrix (Color Correction Matrix, CCM) can be used in the color correction module to eliminate differences in color correction. However, correcting a single module, limited by the differences in the other modules, does not greatly improve the consistency of the multi-camera effect, and calibrating and correcting multiple modules consumes a large amount of human resources.
Aiming at the problems in the related art, the embodiment of the application provides an image processing method, which can train an initial difference detection model through a first sample image data set and a second sample image data set which are acquired by camera modules based on different angles of view or different photosensitivity to obtain a target difference detection model, then input a first to-be-processed image and a second to-be-processed image which are acquired by different camera modules under the same shooting scene into the target difference detection model to obtain a target difference image, and then perform image processing on the first to-be-processed image or the second to-be-processed image based on the target difference image to obtain the target image. The thus obtained target image is consistent with the photographing effect of the second image to be processed or the first image to be processed. Therefore, the shooting effect of the images acquired based on different shooting modules can be consistent through image processing, the difference between the different shooting modules is made up, the consistency of the multi-shooting effect is ensured, and a large amount of human resources are not required to be consumed. When training the initial difference detection model, splitting image channels of a first sample image in the first sample image data set and a second sample image in the second sample image data set, respectively calculating first loss function values of different image channels, and calculating total loss function values of the initial difference detection model based on the first loss function values of different image channels. Therefore, the accuracy of model training can be improved by improving the accuracy of the loss function value, and the consistency of the shooting effect of the target image and the second image to be processed or the first image to be processed is further improved.
The image processing method, the device, the electronic equipment and the storage medium provided by the embodiment of the application are described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating an image processing method according to an exemplary embodiment. The execution subject of the method may be an electronic device, and the execution subject is not limited in this application.
As shown in fig. 1, the image processing method may include the steps of:
step 110, a first sample image dataset and a second sample image dataset are acquired.
Here, the first sample image data set may be image data acquired through the first camera module, and the second sample image data set may be image data acquired through the second camera module under the same shooting scene.
The field angle of View (FOV) of the first camera module may be different from that of the second camera module, or the photosensitivity of the first camera module may be different from that of the second camera module.
The difference between the photosensitivity of the first camera module and the photosensitivity of the second camera module may be caused by the difference between the hardware designs of the first camera module and the second camera module, for example, the difference between materials of complementary metal oxide semiconductors (Complementary Metal Oxide Semiconductor, CMOS) or the difference between the sizes of photosensitive members, or the difference between the sizes of pixel units.
For example, the first sample image in the first sample image data set and the second sample image in the second sample image data set may be collected by gold (Golden) modules of different models, and may be an original image file (RAW), or may be an image of other formats, which is not limited herein.
Step 120, training the initial difference detection model based on the first sample image dataset and the second sample image dataset to obtain a target difference detection model.
Here, the target difference detection model may be used to output a difference image of the first sample image dataset and the second sample image dataset.
Specifically, when training the initial difference detection model, image channel splitting may be performed on a first sample image in the first sample image data set and a second sample image in the second sample image data set, first loss function values of different image channels are calculated respectively, and a total loss function value of the initial difference detection model is calculated based on the first loss function values of the different image channels.
Illustratively, the initial discrepancy detection model may be UNet and the target discrepancy detection model may be UNet.
Then, the model parameters of the initial difference detection model can be adjusted according to the total loss function value. In the early stage of training, the total loss function value keeps decreasing; at the end of training, because the model has been fitted to its limit, the total loss function value usually fluctuates around a certain value. When the total loss function value can no longer be reduced, training can be stopped and the target difference detection model is obtained.
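For illustration only, the following is a minimal sketch of such a plateau-based training loop in PyTorch; the model interface, the data loader, and the `compute_total_loss` helper (sketched further below) are assumptions made for this example rather than the implementation described in the application.

```python
import torch

def train_difference_model(model, loader, compute_total_loss,
                           max_epochs=100, patience=5, min_delta=1e-4, lr=1e-4):
    """Sketch of the training loop: stop once the total loss stops decreasing."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_loss, stale_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for first_sample, second_sample in loader:
            optimizer.zero_grad()
            diff = model(first_sample, second_sample)   # sample difference image
            target_sample = diff + second_sample        # fuse the difference with the second sample
            loss = compute_total_loss(first_sample, target_sample)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        epoch_loss /= max(len(loader), 1)
        if best_loss - epoch_loss > min_delta:
            best_loss, stale_epochs = epoch_loss, 0
        else:
            stale_epochs += 1                 # loss only fluctuates around a certain value
        if stale_epochs >= patience:          # treat the plateau as the model being fully fitted
            break
    return model
```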
Illustratively, the first sample image and the second sample image may each be in RGGB Bayer format, and each may be split into four channels: an R channel, a GR channel, a GB channel, and a B channel.
In an alternative embodiment, the step 110 may specifically include:
performing difference detection on a first sample image in the first sample image data set and a corresponding second sample image in the second sample image data set by using an initial difference detection model to obtain a sample difference image;
fusing the sample difference image and the second sample image to obtain a target sample image;
splitting a target sample image into first single-channel images corresponding to the four image channels respectively, and splitting the first sample image into second single-channel images corresponding to the four image channels respectively;
respectively determining a loss function value according to a first single-channel image and a second single-channel image corresponding to each of the four image channels, and obtaining a first loss function value corresponding to each of the four image channels;
determining an average value of the first loss function values corresponding to the four image channels respectively as a total loss function value;
and adjusting model parameters of the initial difference detection model according to the total loss function value, and training to obtain the target difference detection model.
Here, the first single-channel image and the second single-channel image may each be an image containing only single-channel information.
For example, as shown in fig. 2, an image E and an image F may be input to an initial UNet, the image E is a first sample image, the image F is a second sample image, the difference detection is performed on the image E and the image F by using the initial UNet to obtain an image G, that is, a sample difference image, and then the image G and the image F are superimposed to obtain an image H, that is, a target sample image, and then the total loss function value is determined according to the image E and the image H.
Illustratively, taking the image channel splitting of the first sample image as an example, the image channel splitting process is described as follows. As shown in fig. 3, the image E may be in RGGB Bayer format. The image E may be split along the R channel to obtain an R channel image 310, along the GR channel to obtain a GR channel image 320, along the GB channel to obtain a GB channel image 330, and along the B channel to obtain a B channel image 340. The R channel image 310, the GR channel image 320, the GB channel image 330, and the B channel image 340 may then be linearly interpolated to obtain an R channel image 311, a GR channel image 321, a GB channel image 331, and a B channel image 341, that is, the second single-channel images corresponding to the four image channels.
The specific process of splitting the target sample image into the first single-channel image corresponding to the plurality of image channels is the same as the specific process of splitting the first sample image into the second single-channel image corresponding to the plurality of image channels, and will not be described herein.
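As an illustration only, the following sketch splits an RGGB Bayer RAW tensor into its four channel planes and interpolates each plane back to the full resolution; the function and variable names are assumptions made for this example, not taken from the application.

```python
import torch
import torch.nn.functional as F

def split_bayer_rggb(raw):
    """Split an RGGB Bayer image (H, W) into R, GR, GB, B planes and
    bilinearly interpolate each plane back to the original resolution."""
    r  = raw[0::2, 0::2]    # top-left pixel of each 2x2 block
    gr = raw[0::2, 1::2]    # green pixel on the red row
    gb = raw[1::2, 0::2]    # green pixel on the blue row
    b  = raw[1::2, 1::2]    # bottom-right pixel of each 2x2 block
    h, w = raw.shape
    planes = []
    for plane in (r, gr, gb, b):
        plane = plane[None, None].float()                   # (1, 1, H/2, W/2)
        plane = F.interpolate(plane, size=(h, w),
                              mode="bilinear", align_corners=False)
        planes.append(plane[0, 0])
    return planes   # [R, GR, GB, B] single-channel images at full resolution
```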
Then, the loss function value can be determined according to the first single-channel image and the second single-channel image corresponding to each of the plurality of image channels, so as to obtain first loss function values corresponding to the plurality of image channels, and then an average value of the first loss function values corresponding to the plurality of image channels is determined as a total loss function value.
For example, the calculation formula of the total loss function value may be as follows:

Loss = (Loss_R + Loss_GR + Loss_GB + Loss_B) / 4

where Loss is the total loss function value, Loss_R is the first loss function value corresponding to the R image channel, Loss_GR is the first loss function value corresponding to the GR image channel, Loss_GB is the first loss function value corresponding to the GB image channel, and Loss_B is the first loss function value corresponding to the B image channel.
In this way, by individually determining the first loss function value corresponding to each image channel and performing model training using the average value of the first loss function values corresponding to a plurality of image channels as the total loss function, it is possible to more accurately measure whether the model has completed training.
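A minimal sketch of this per-channel aggregation, reusing the hypothetical `split_bayer_rggb` helper above; the per-channel loss (sketched in the next subsection) can be pre-bound, for example with `functools.partial`, to match the two-argument form used in the training-loop sketch.

```python
def compute_total_loss(first_sample, target_sample, channel_loss):
    """Total loss: average of the per-channel (first) loss values over the
    R, GR, GB and B channels, i.e. Loss = (Loss_R + Loss_GR + Loss_GB + Loss_B) / 4."""
    first_planes  = split_bayer_rggb(first_sample)
    target_planes = split_bayer_rggb(target_sample)
    per_channel = [channel_loss(f, t) for f, t in zip(first_planes, target_planes)]
    return sum(per_channel) / len(per_channel)
```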
In an optional embodiment, the determining the loss function value according to the first single-channel image and the second single-channel image corresponding to each of the four image channels respectively, to obtain the first loss function value corresponding to each of the four image channels may specifically include:
performing downsampling on the first single-channel image for M times to obtain M third single-channel images with different image sizes, performing downsampling on the second single-channel image for M times to obtain M fourth single-channel images with different image sizes, wherein the downsampling multiples of the first single-channel image and the second single-channel image are the same, and M is an integer greater than 1;
respectively determining a loss function value according to a fifth single-channel image and a sixth single-channel image of each image size of M+1 image sizes to obtain second loss function values respectively corresponding to the M+1 image sizes, wherein the fifth single-channel image is any one of the M third single-channel images and the first single-channel images, and the sixth single-channel image is any one of the M fourth single-channel images and the second single-channel images;
an average value of the second loss function values corresponding to the m+1 image sizes, respectively, is determined as the first loss function value.
Here, the multiple by which the first single-channel image and the second single-channel image are downsampled may be the same. M may be an integer greater than 1. The fifth single-channel image may be any one of the M third single-channel images and the first single-channel image, and the sixth single-channel image may be any one of the M fourth single-channel images and the second single-channel image.
Since the amount of information available for the loss measurement is insufficient at a single image size, the first single-channel image and the second single-channel image can each be subjected to M Gaussian downsampling operations.
For example, the first single-channel image may be subjected to three gaussian downsampling to obtain downsampled images 1/2, 1/4 and 1/8 times that of the original image, i.e., the third single-channel image. And the second single-channel image can be subjected to three Gaussian downsampling to obtain downsampled images which are 1/2, 1/4 and 1/8 times of the original image, namely a fourth single-channel image. Then respectively calculating a loss function value between two original pictures, a loss function value between two 1/2 times of downsampled images, a loss function value between two 1/4 times of downsampled images and a loss function value between two 1/8 times of downsampled images, and taking the average value of the four loss function values as a first loss function value corresponding to the image channel.
Taking the first loss function value corresponding to the R image channel as an example, the calculation formula of the first loss function value may be as follows:

Loss_R = (Loss_1 + Loss_1/2 + Loss_1/4 + Loss_1/8) / 4

where Loss_R is the first loss function value corresponding to the R image channel, Loss_1 is the second loss function value between the two original images, Loss_1/2 is the second loss function value between the two 1/2-size downsampled images, Loss_1/4 is the second loss function value between the two 1/4-size downsampled images, and Loss_1/8 is the second loss function value between the two 1/8-size downsampled images.
In this way, by calculating the second loss function value for each of the single-channel images of the plurality of image sizes and taking the average value thereof as the first loss function value of the image channel, the information amount of the loss measurement can be enriched, thereby improving the accuracy of model training.
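For illustration, here is a sketch of this multi-scale loss using a simple binomial-kernel approximation of Gaussian downsampling; the helper names, the kernel choice, and the expected tensor shapes are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

# 5x5 binomial kernel as a simple stand-in for a Gaussian filter
_K1 = torch.tensor([1., 4., 6., 4., 1.]) / 16.0
_K2 = (_K1[:, None] @ _K1[None, :])[None, None]   # shape (1, 1, 5, 5)

def gaussian_downsample(img):
    """Blur a (1, 1, H, W) tensor with the binomial kernel, then keep every second pixel."""
    img = F.conv2d(img, _K2.to(img.dtype), padding=2)
    return img[..., ::2, ::2]

def channel_loss(first_plane, target_plane, scale_loss, m=3):
    """First loss value of one channel: average of the per-scale (second) losses over the
    original size and m downsampled sizes, e.g.
    Loss_R = (Loss_1 + Loss_1/2 + Loss_1/4 + Loss_1/8) / 4."""
    losses = [scale_loss(first_plane, target_plane)]
    f, t = first_plane[None, None], target_plane[None, None]
    for _ in range(m):
        f, t = gaussian_downsample(f), gaussian_downsample(t)
        losses.append(scale_loss(f[0, 0], t[0, 0]))
    return sum(losses) / len(losses)
```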
In an optional embodiment, the determining the loss function value according to the fifth single-channel image and the sixth single-channel image at each of the m+1 image sizes, to obtain the second loss function value corresponding to each of the m+1 image sizes may specifically include:
determining a third loss function value corresponding to at least two loss functions respectively according to the fifth single-channel image and the sixth single-channel image;
And carrying out weighted summation on at least two third loss function values according to preset weights respectively corresponding to the at least two loss functions to obtain second loss function values.
Here, for each image size, since the image resolution is different, the image feature information contained therein is also different, and the characteristics of the different loss functions are also different, so that the second loss function value corresponding to each image size can be determined jointly by using at least two loss functions. Specifically, a preset weight may be assigned to each loss function according to its characteristics at different image sizes.
Illustratively, the at least two loss functions may be at least two of mean square error (Mean Squared Error, MSE), structural similarity index (Structural Similarity Index Measurement, SSIM), and learning perceived image block similarity (Learned Perceptual Image Patch Similarity, LPIPS).
For example, analysis of three loss functions of MSE, SSIM and LPIPS shows that MSE is mean square error information, and is relatively sensitive to information in a high-resolution image, and the measurement is relatively accurate, so that when the image size is large at a high resolution, a higher weight can be given, and the weight is gradually reduced at a low resolution, that is, the preset weight corresponding to MSE can be positively correlated with the image size.
The SSIM is a structural similarity index, and is also accurate when the resolution is high, i.e. the image size is large, so that higher weight can be given when the image size is large, and the weight is gradually reduced when the resolution is low, i.e. the preset weight corresponding to the SSIM can be positively correlated with the image size.
The LPIPS is a visual perceptibility index, and can be well measured under various resolutions, namely under various image sizes, so that the preset weight of the LPIPS can be gradually increased along with the reduction of the image size so as to make up for the disadvantages of the MSE and the SSIM before.
Illustratively, taking the second loss function value corresponding to the original image size as an example, the calculation formula of the second loss function value may be as follows:
Loss_1 = α*Loss_MSE + β*Loss_SSIM + γ*Loss_LPIPS

where Loss_1 is the second loss function value between the two original images, Loss_MSE is the third loss function value corresponding to MSE, Loss_SSIM is the third loss function value corresponding to SSIM, Loss_LPIPS is the third loss function value corresponding to LPIPS, and α, β, and γ may be the preset weights.
Illustratively, when the image size is, from large to small, 1, 1/2, 1/4, and 1/8 of the original size, the preset weights (α, β, γ) allocated in sequence may be (0.2, 0.4, 0.4), (0.2, 0.3, 0.5), (0.1, 0.3, 0.6), and (0.1, 0.2, 0.7).
Therefore, the training precision of the model can be measured more comprehensively by adopting at least two loss functions, so that a target difference detection model with higher precision can be obtained conveniently.
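A sketch of one per-scale loss with these three terms is given below. It assumes the third-party packages pytorch-msssim (for SSIM) and lpips (for LPIPS), converts SSIM to a loss as 1 − SSIM (one common choice), and repeats the single-channel plane to three channels for LPIPS; the `scale` argument would need to be threaded through the earlier `channel_loss` sketch to select the per-scale weights. None of these package or design choices are stated in the application.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim   # third-party SSIM implementation (assumed here)
import lpips                      # third-party LPIPS implementation (assumed here)

_lpips_fn = lpips.LPIPS(net="alex")   # the backbone choice is arbitrary for this sketch

# Example per-scale weights (alpha, beta, gamma) for image sizes 1, 1/2, 1/4 and 1/8,
# following the example values given above.
SCALE_WEIGHTS = {1.0: (0.2, 0.4, 0.4), 0.5: (0.2, 0.3, 0.5),
                 0.25: (0.1, 0.3, 0.6), 0.125: (0.1, 0.2, 0.7)}

def scale_loss(first_plane, target_plane, scale=1.0, data_range=1.0):
    """Second loss value at one image size: weighted sum of MSE, SSIM and LPIPS terms."""
    alpha, beta, gamma = SCALE_WEIGHTS[scale]
    a = first_plane[None, None]            # (1, 1, H, W)
    b = target_plane[None, None]
    loss_mse = F.mse_loss(a, b)
    loss_ssim = 1.0 - ssim(a, b, data_range=data_range)   # 1 - SSIM as a loss term
    # LPIPS expects 3-channel input roughly in [-1, 1]
    rgb_a = (a.repeat(1, 3, 1, 1) / data_range) * 2 - 1
    rgb_b = (b.repeat(1, 3, 1, 1) / data_range) * 2 - 1
    loss_lpips = _lpips_fn(rgb_a, rgb_b).mean()
    return alpha * loss_mse + beta * loss_ssim + gamma * loss_lpips
```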
In an alternative embodiment, the step 120 may specifically include:
acquiring sample weight information;
and inputting sample weight information, a first sample image in the first sample image data set and a second sample image corresponding to the second sample image data set into an initial difference detection model, extracting difference characteristics respectively corresponding to a plurality of image areas corresponding to the first sample image and the second sample image by the initial difference detection model, and weighting the difference characteristics according to weights respectively corresponding to the plurality of image areas to obtain a target sample image.
In an alternative embodiment, the initial difference detection model may include N feature extraction layers, the number of sample weight information may be N, and N may be an integer greater than 1, where different feature extraction layers may correspond to different image sizes, and different image sizes may correspond to different sample weight information;
the acquiring sample weight information may specifically include:
for N feature extraction layers, acquiring weights corresponding to a plurality of image areas under the image size corresponding to each feature extraction layer respectively to obtain N sample weight information;
Extracting, by the initial difference detection model, difference features corresponding to a plurality of image areas corresponding to the first sample image and the second sample image, and weighting the difference features according to weights corresponding to the plurality of image areas, to obtain a target sample image, including:
extracting difference features of the first sample image and the second sample image, which correspond to a plurality of image areas under N image sizes, respectively by N feature extraction layers in the initial difference detection model;
and weighting the difference features of the plurality of image areas respectively extracted by the N feature extraction layers according to the N sample weight information to obtain a target sample image.
In an optional embodiment, the obtaining weights corresponding to the plurality of image areas under the image size corresponding to each feature extraction layer for the N feature extraction layers to obtain N sample weight information specifically may include:
according to the similarity between a plurality of corresponding image areas in the first sample image and the second sample image, determining weights corresponding to the plurality of corresponding image areas under the original image size, and obtaining first sample weight information;
performing N-1 times of downsampling on the first sample weight information, and determining weights corresponding to a plurality of image areas under N-1 image sizes to obtain N-1 second sample weight information;
The first sample weight information and the N-1 second sample weight information are determined as N sample weight information.
The above process of acquiring the sample weight and determining the target sample image based on the sample weight is the same as the process of acquiring the weight information and determining the target difference image based on the weight information, and the detailed process is referred to below, and will not be repeated here.
And 130, inputting the first to-be-processed image and the second to-be-processed image into a target difference detection model to obtain a target difference image.
Here, the first image to be processed may be an image acquired by the first camera module, and the second image to be processed may be an image acquired by the second camera module in the same shooting scene.
Specifically, the difference features between the first to-be-processed image and the second to-be-processed image can be extracted by using the target difference detection model, so as to obtain a target difference image.
In an alternative embodiment, the step 130 may specifically include:
acquiring a second image to be processed and a third image to be processed;
and cutting out an area corresponding to the second to-be-processed image from the third to-be-processed image according to the second to-be-processed image to obtain a first to-be-processed image.
Here, the second image to be processed and the third image to be processed may be acquired by different camera modules. The FOV of the third image to be processed may be greater than the FOV of the second image to be processed. The third image to be processed may include an area corresponding to the second image to be processed. The first image to be processed may be an area corresponding to the second image to be processed in the third image to be processed.
Specifically, a mature matching algorithm may be adopted to determine an area corresponding to the second to-be-processed image from the third to-be-processed image, and cut out the area corresponding to the second to-be-processed image, and then the cut-out image is restored to be consistent with the resolution of the second to-be-processed image through sampling, so as to obtain the first to-be-processed image.
The matching algorithm may be a Scale-invariant feature transform (SIFT) based matching algorithm.
For example, the image a, that is, the second image to be processed, and the image B, that is, the third image to be processed, may be acquired by two Golden modules of different models, and both the image a and the image B may be original RAW images (Pure RAW), and due to the difference in FOV of the two camera modules, the field of view information of the image B and the image a is different, and the FOV of the image B is greater than that of the image a, and the image B includes an area corresponding to the image a. Then, the image B and the image a can be matched by adopting a SIFT-based matching algorithm, the region corresponding to the image a in the image B is determined, the region corresponding to the image a is cut out, and the resolution of the region is restored to be consistent with the resolution of the image a through sampling, so that an image C, namely a first image to be processed, is obtained.
Therefore, the difference in effect between the first image to be processed and the second image to be processed, namely the target difference image, can be determined more accurately; and since the first image to be processed is a part of the third image to be processed, the target difference image can also accurately represent the difference in effect between the second image to be processed and the third image to be processed.
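As an illustration of this matching-and-cropping step, the following sketch uses OpenCV's SIFT features with a RANSAC homography; the use of OpenCV, the ratio-test threshold, and the assumption that the RAW images have been converted to 8-bit grayscale for feature matching are choices made for this example, not details given in the application.

```python
import cv2
import numpy as np

def crop_matching_region(image_a, image_b):
    """Locate the region of image_b (larger FOV) that corresponds to image_a,
    crop it, and resample it back to image_a's resolution.
    Both inputs are assumed to be 8-bit grayscale images."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(image_a, None)
    kp_b, des_b = sift.detectAndCompute(image_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_a, des_b, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]   # Lowe's ratio test
    src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = image_a.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    x, y, cw, ch = cv2.boundingRect(cv2.perspectiveTransform(corners, H))
    crop = image_b[y:y + ch, x:x + cw]
    return cv2.resize(crop, (w, h), interpolation=cv2.INTER_LINEAR)
```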
In an alternative embodiment, the step 130 may specifically include:
acquiring weight information;
and inputting the weight information, the first to-be-processed image and the second to-be-processed image into a target difference detection model, extracting difference characteristics corresponding to a plurality of image areas corresponding to the first to-be-processed image and the second to-be-processed image respectively by the target difference detection model, and carrying out weighting treatment on the difference characteristics according to the weights corresponding to the plurality of image areas respectively to obtain a target difference image.
Here, the difference between the different image areas in the first image to be processed and the second image to be processed is different in size, so different weights may be assigned to the different image areas, and thus weights corresponding to the plurality of image areas, respectively, may be included in the weight information. The weight information may be manually set empirically, or may be determined according to the first image to be processed and the second image to be processed, which is not limited herein. The weight information may be a weight table, for example.
Illustratively, the weight table may include the weights respectively corresponding to the plurality of image areas. After the weight table, the image C, and the image A are all input into UNet, UNet can extract the difference features respectively corresponding to the plurality of corresponding image areas in the image C and the image A, and the weight table is then applied to the difference features by matrix dot multiplication to obtain the target difference image.
In this way, by weighting the difference features respectively corresponding to the plurality of image areas in the first to-be-processed image and the second to-be-processed image respectively corresponding to the plurality of image areas, different weights can be allocated to different image areas, and the accuracy of the target difference image is improved.
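A minimal sketch of this weighting step, treating the "matrix dot multiplication" as an element-wise (Hadamard) product broadcast over the feature channels; the shapes and names are assumptions made for this example.

```python
def apply_weight_table(diff_features, weight_table):
    """Weight difference features by the weight table with an element-wise product.
    diff_features: (N, C, H, W) tensor; weight_table: (H, W) tensor of per-region weights."""
    return diff_features * weight_table[None, None]   # broadcast over batch and channel dims
```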
In an alternative embodiment, the target difference detection model may include N feature extraction layers, where the number of weight information may be N, and N may be an integer greater than 1, where different feature extraction layers may correspond to different image sizes, and different image sizes may correspond to different weight information;
the acquiring weight information may specifically include:
for N feature extraction layers, acquiring weights corresponding to a plurality of image areas under the image size corresponding to each feature extraction layer respectively to obtain N weight information;
Extracting, by the target difference detection model, difference features corresponding to a plurality of image areas corresponding to the first to-be-processed image and the second to-be-processed image, and weighting the difference features according to weights corresponding to the plurality of image areas, to obtain a target difference image, including:
extracting difference features of the first to-be-processed image and the second to-be-processed image, which correspond to a plurality of image areas under N image sizes, by N feature extraction layers in the target difference detection model;
and weighting the difference features of the plurality of image areas respectively extracted by the N feature extraction layers according to the N weight information to obtain a target difference image.
Here, the feature extraction layers may include at least a stitching layer, and the image size corresponding to each feature extraction layer may refer to the size of the difference feature output by the stitching layer in the feature extraction layer.
Since the feature extraction layer may further include a downsampling layer, the downsampling layer may be used to downsample the input features, different feature extraction layers may correspond to different image sizes, and N feature extraction layers correspond to N image sizes.
The difference feature of each image size needs the same size weight information, so that different image sizes correspond to different weight information, and N image sizes correspond to N weight information.
Because the feature extraction layer may further include a convolution layer, the convolution layer may be configured to extract difference features corresponding to a plurality of image areas in the first to-be-processed image and the second to-be-processed image, respectively.
Specifically, the weight information under N image sizes may be acquired, and the N weight information may be input to N feature extraction layers in the target difference detection model, respectively. And extracting difference features of the first to-be-processed image and the second to-be-processed image, which correspond to the plurality of image areas respectively under the N image sizes, through the N feature extraction layers, and then carrying out weighting processing on the difference features of the plurality of image areas respectively extracted by the N feature extraction layers according to the N weight information to obtain a target difference image.
For example, the target variance detection model may include four feature extraction layers. Weight information under four image sizes can be acquired, and the four weight information is respectively input to the four feature extraction layers. And extracting the difference features of the image C and the image A, which correspond to the image areas under the four image sizes, through the four feature extraction layers, and then carrying out weighting processing on the difference features of the image areas extracted by the four feature extraction layers according to the four weight information to obtain the target difference image.
Therefore, the feature extraction layers corresponding to the image sizes can be guided a priori based on the weight information corresponding to the image sizes, the accuracy of the difference features is further improved, and a more accurate target difference image is obtained.
In an optional embodiment, the obtaining weights corresponding to the plurality of image areas under the image size corresponding to each feature extraction layer for the N feature extraction layers to obtain N weight information may specifically include:
according to the similarity between a plurality of corresponding image areas in the first image to be processed and the second image to be processed, determining weights corresponding to the plurality of corresponding image areas under the original image size, and obtaining first weight information;
performing N-1 times of downsampling on the first weight information, and determining weights corresponding to a plurality of image areas under N-1 image sizes to obtain N-1 second weight information;
the first weight information and the N-1 second weight information are determined as N weight information.
Here, since the object difference detection model is used to extract the difference feature, it is necessary to pay more attention to the difference between the images, so the region with lower similarity needs to be assigned with a larger weight, and the region with higher similarity needs to be assigned with a smaller weight, that is, the similarity corresponding to each image region and the weight corresponding thereto may be inversely related.
Specifically, SSIM calculation may be performed on a plurality of image areas corresponding to the first image to be processed and the second image to be processed through a sliding window. For any image region, the greater the SSIM value, the higher the similarity; the smaller the SSIM value, the lower the similarity. According to the SSIM values corresponding to the different image areas, the weights corresponding to the different image areas can be determined. For any image region, the larger the SSIM value, the smaller the weight; the smaller the SSIM value, the greater the weight. Thus, first weight information can be obtained.
Then, the first weight information may be downsampled N-1 times to obtain N-1 second weight information, where the N-1 second weight information may be weight information corresponding to N-1 image sizes, respectively.
Thus, the first weight information and N-1 pieces of second weight information, that is, N pieces of weight information, are obtained.
Illustratively, as shown in fig. 4, the corresponding image areas in image C and image A may be traversed by a sliding window, for example: c_1 and a_1, c_2 and a_2, ..., c_(n-1) and a_(n-1), c_n and a_n. SSIM measurement is performed on each pair, and the weights respectively corresponding to the image areas are determined from the SSIM values, for example: c_1 and a_1 correspond to weight w_1, c_2 and a_2 correspond to weight w_2, c_(n-1) and a_(n-1) correspond to weight w_(n-1), and c_n and a_n correspond to weight w_n, thereby obtaining weight table 1, that is, the first weight information.
In this way, according to the similarity between the corresponding multiple image areas in the first to-be-processed image and the second to-be-processed image, the weights corresponding to the multiple image areas under the original image size are determined to obtain the first weight information, the weights corresponding to the multiple image areas can be more accurately determined, and then the second weight information corresponding to the multiple image sizes can be quickly obtained by performing multiple downsampling on the first weight information.
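For illustration only, a sketch of building the weight tables with a sliding-window SSIM (using scikit-image) and repeated downsampling; the window size, the use of 1 − SSIM as the weight (one possible inverse-relation mapping), and the assumption that the image dimensions are multiples of the window are all choices made for this example.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim
from skimage.transform import resize

def build_weight_tables(image_c, image_a, window=32, n_levels=4, data_range=1.0):
    """Sliding-window SSIM between corresponding regions of image_c and image_a;
    lower similarity -> larger weight (here simply 1 - SSIM).
    Assumes the image dimensions are multiples of the window size."""
    h, w = image_a.shape
    table = np.zeros((h, w), dtype=np.float32)
    for y in range(0, h, window):
        for x in range(0, w, window):
            c_win = image_c[y:y + window, x:x + window]
            a_win = image_a[y:y + window, x:x + window]
            s = ssim(c_win, a_win, data_range=data_range)
            table[y:y + window, x:x + window] = 1.0 - s
    tables = [table]                                   # weight table 1 (first weight information)
    for _ in range(n_levels - 1):                      # N-1 downsampled tables (second weight information)
        prev = tables[-1]
        tables.append(resize(prev, (prev.shape[0] // 2, prev.shape[1] // 2),
                             anti_aliasing=True).astype(np.float32))
    return tables
```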
In some examples, UNet includes a convolutional layer, a downsampling layer, a splice layer, and an upsampling layer. Taking UNet as an example, the specific process of step 130 may be as shown in fig. 5.
In step 1301, image C and image a are input to the convolution layer, and difference extraction is performed on image a and image C by the convolution layer, so as to obtain difference feature 501.
In step 1302, the difference feature 501 is input to a downsampling layer, and the downsampling layer downsamples the difference feature 501 to obtain the difference feature 502.
In step 1303, the difference feature 502 is input to the convolution layer, and the difference feature 503 is obtained by performing difference extraction on the difference feature 502 by the convolution layer.
In step 1304, the difference feature 503 is input to a downsampling layer, and the downsampling layer downsamples the difference feature 503 to obtain a difference feature 504.
In step 1305, the difference feature 504 is input to the convolution layer, and the difference feature 504 is subjected to difference extraction by the convolution layer, so as to obtain a difference feature 505.
In step 1306, the difference feature 505 is input to a downsampling layer, and the downsampling layer downsamples the difference feature 505 to obtain a difference feature 506.
In step 1307, the difference feature 506 is input to the convolution layer, and the difference feature 506 is subjected to difference extraction by the convolution layer to obtain a difference feature 507.
Step 1308, the difference feature 507 is input to a downsampling layer, and the downsampling layer downsamples the difference feature 507 to obtain a difference feature 508.
In step 1309, the difference feature 508 is input to the convolution layer, and the difference feature 508 is extracted by the convolution layer to obtain a difference feature 509.
Step 1310, inputting the difference feature 509 and the difference feature 507 into a splicing layer, and splicing the difference feature 509 and the difference feature 507 by the splicing layer to obtain a difference feature 510.
In step 1311, SSIM metrics are performed on image C and image a, resulting in weight table 1.
Step 1312, three downsamples are performed on weight table 1 to obtain weight table 2, weight table 3 and weight table 4.
It should be noted that, the embodiment of the present application does not limit the sequence of steps 1301-1310 and steps 1311-1312.
In step 1313, the difference feature 510 and the weight table 4 are subjected to matrix dot multiplication to obtain the difference feature 511.
In step 1314, the difference feature 511 is input to the convolution layer, and the difference feature 511 is extracted by the convolution layer to obtain the difference feature 512.
In step 1315, the difference feature 512 is input to the upsampling layer, the upsampling layer upsamples the difference feature 512, the upsampled difference feature and the difference feature 505 are input to the stitching layer, and the stitching layer stitches the upsampled difference feature and the difference feature 505 to obtain the difference feature 513.
In step 1316, the difference feature 513 and the weight table 3 are subjected to matrix dot multiplication to obtain a difference feature 514.
In step 1317, the difference feature 514 is input to the convolution layer, and the difference feature 514 is extracted by the convolution layer to obtain the difference feature 515.
In step 1318, the difference feature 515 is input to the upsampling layer, the upsampling layer upsamples the difference feature 515, the upsampled difference feature and the difference feature 502 are input to the stitching layer, and the stitching layer stitches the upsampled difference feature and the difference feature 502 to obtain the difference feature 516.
In step 1319, the difference feature 516 and the weight table 2 are subjected to matrix dot multiplication to obtain the difference feature 517.
In step 1320, the difference feature 517 is input to the convolution layer, and the difference feature 517 is subjected to difference extraction by the convolution layer to obtain a difference feature 518.
In step 1321, the difference feature 518 is input to the upsampling layer, the upsampling layer upsamples the difference feature 518, the upsampled difference feature and the difference feature 501 are input to the stitching layer, and the stitching layer stitches the upsampled difference feature and the difference feature 501 to obtain a difference feature 519.
In step 1322, the difference feature 519 and the weight table 1 are subjected to matrix dot multiplication to obtain a difference feature 520.
In step 1323, the difference feature 520 is input to the convolution layer, and the difference feature 520 is subjected to difference extraction by the convolution layer to obtain the target difference image 521.
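For ease of understanding, a minimal sketch of the weighted UNet-style difference extraction described in steps 1301 to 1323 is given below. The class name WeightedDiffNet, the channel widths, the 3x3 convolution kernels, the average-pooling downsampling and the bilinear upsampling are illustrative assumptions; only the overall pattern of encoding, splicing with skip features, matrix dot multiplication with the weight tables, and decoding is taken from the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def _block(i: int, o: int) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(i, o, 3, padding=1), nn.ReLU(inplace=True))

class WeightedDiffNet(nn.Module):
    """Encoder-decoder that multiplies each decoder stage by the SSIM weight table of matching size."""
    def __init__(self, in_ch: int = 8, base: int = 32, out_ch: int = 4):
        super().__init__()
        self.enc = nn.ModuleList([_block(in_ch, base), _block(base, base * 2), _block(base * 2, base * 4)])
        self.bottom = _block(base * 4, base * 4)
        self.dec = nn.ModuleList([_block(base * 8, base * 2), _block(base * 4, base), _block(base * 2, base)])
        self.head = nn.Conv2d(base, out_ch, 3, padding=1)  # outputs the target difference image

    def forward(self, x: torch.Tensor, weights: list) -> torch.Tensor:
        # weights[0] is the full-resolution weight table; weights[k] is its k-th 2x downsample.
        skips = []
        for enc in self.enc:                               # difference extraction + downsampling
            x = enc(x)
            skips.append(x)
            x = F.avg_pool2d(x, 2)
        x = self.bottom(x)
        x = x * F.interpolate(weights[-1], size=x.shape[-2:], mode="bilinear", align_corners=False)
        for i, dec in enumerate(self.dec):                 # upsample, splice, weight, extract
            skip = skips[-(i + 1)]
            x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
            x = torch.cat([x, skip], dim=1)                # stitching layer
            w = F.interpolate(weights[len(self.dec) - 1 - i], size=x.shape[-2:],
                              mode="bilinear", align_corners=False)
            x = x * w                                      # matrix dot multiplication with the weight table
            x = dec(x)
        return self.head(x)
```

For example, the two inputs can be concatenated along the channel dimension and passed together with the weight tables produced by the previous sketch: WeightedDiffNet()(torch.cat([image_c, image_a], dim=1), weight_tables(ssim_weight_map(image_c, image_a))).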
In some examples, the dimensions of image C and image a may not conform to the original input of UNet, and thus image C and image a may be downsampled or upsampled prior to inputting image C and image a to UNet, i.e., prior to step 1301, to conform the dimensions of image C and image a to the original input of UNet.
For example, the sizes of the image C and the image a are 4080×3060, and the original input of UNet is 572×572, so a downsampling layer may be added at the front end of the input to downsample the image C and the image a, so that the sizes of the image C and the image a are consistent with the original input of UNet.
Further, since the target difference image needs to be fused with the image a later, the size of the target difference image needs to be the same as the original size of the image a. In some examples, the original output of UNet may not be the same as the original size of the image a, and thus the size of the target difference image 521 output by UNet may be different from the original size of the image a. In this case, the target difference image 521 may be downsampled or upsampled after it is output by UNet, i.e., after step 1323, so that the size of the target difference image is the same as the original size of the image a.
For example, since the size of the target difference image 521 output by UNet is 388×388 and the original size of the image a is 4080×3060, an up-sampling layer may be added at the output end to upsample the target difference image 521, so that the size of the target difference image is the same as the original size of the image a.
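As an illustration of the above size adaptation, the following sketch resamples the two inputs to an assumed native input size and resamples the network output back to the original size; bilinear interpolation and the 572×572 size are illustrative choices taken from the classical UNet example above.

```python
import torch
import torch.nn.functional as F

def run_with_resize(model, image_c, image_a, weights, net_in=(572, 572)):
    """Resample the inputs to the network's native size, then restore the output to the original size."""
    orig_hw = image_a.shape[-2:]                                      # e.g. 3060 x 4080
    c = F.interpolate(image_c, size=net_in, mode="bilinear", align_corners=False)
    a = F.interpolate(image_a, size=net_in, mode="bilinear", align_corners=False)
    # The weight tables are assumed to be built at the resized working resolution.
    diff = model(torch.cat([c, a], dim=1), weights)                   # e.g. 388 x 388 for classical UNet
    return F.interpolate(diff, size=orig_hw, mode="bilinear", align_corners=False)
```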
Step 140, performing image processing on the first to-be-processed image or the second to-be-processed image based on the target difference image to obtain a target image.
Here, the first to-be-processed image may be subjected to image processing based on the target difference image to obtain a target image whose shooting effect is consistent with that of the second to-be-processed image, or the second to-be-processed image may be subjected to image processing based on the target difference image to obtain a target image whose shooting effect is consistent with that of the first to-be-processed image.
Specifically, the target difference image and the first to-be-processed image or the second to-be-processed image may be superimposed to obtain the target image, for example, pixel values of corresponding positions of the target difference image and the first to-be-processed image or the second to-be-processed image may be added to obtain the target image.
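As a simple illustration of this superposition, assuming the images are normalized floating-point tensors, the fusion reduces to a pixel-wise addition followed by clipping to the valid range; the clipping range is an assumption of the sketch.

```python
def fuse(image_to_process, target_difference):
    """Add the target difference image to the image to be processed, pixel by pixel."""
    return (image_to_process + target_difference).clamp(0.0, 1.0)
```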
Therefore, the initial difference detection model can be trained through a first sample image data set and a second sample image data set which are acquired by camera modules with different angles of view or different photosensitive performances to obtain a target difference detection model; then a first to-be-processed image and a second to-be-processed image which are acquired by different camera modules under the same shooting scene are input into the target difference detection model to obtain a target difference image; and then the first to-be-processed image or the second to-be-processed image is subjected to image processing based on the target difference image, so that the target image can be obtained. The target image thus obtained is consistent with the shooting effect of the second to-be-processed image or the first to-be-processed image. Therefore, the shooting effects of the images acquired by different camera modules can be made consistent through image processing, the difference between different camera modules is compensated, the consistency of the multi-camera shooting effect is ensured, and a large amount of human resources does not need to be consumed. When the initial difference detection model is trained, image channel splitting is performed on a first sample image in the first sample image data set and a second sample image in the second sample image data set, first loss function values of different image channels are respectively calculated, and the total loss function value of the initial difference detection model is calculated based on the first loss function values of the different image channels. Therefore, the accuracy of model training can be improved by improving the accuracy of the loss function value, and the consistency of the shooting effect of the target image with the second to-be-processed image or the first to-be-processed image is further improved.
The target difference detection model adopted by the image processing method provided by the embodiment of the application can effectively judge the similarity of two paths of input, thereby accurately measuring the effect difference between different types of camera modules and generating a difference image. According to the image processing method provided by the embodiment of the application, the original difference between different camera modules can be accurately separated by compensating with the difference image, and effective image information can be maintained while the difference is eliminated. Original RAW data of different camera modules can be sent to the target difference detection model, and the target difference image predicted by the target difference detection model is applied to the RAW image acquired by one camera module to obtain a RAW image with a small difference from the RAW image acquired by the other camera module. Therefore, RAW images acquired by different camera modules can be directly sent to an ISP for processing, so that the consistency of the ISP input is ensured. In this way, parameter multiplexing among different camera modules can be realized, and the debugging workload is greatly reduced; meanwhile, the method has high precision and strong robustness and is suitable for various types of images. In addition, the problem of difference among different camera modules is effectively solved, the stability and consistency of the camera modules are improved, the method can be widely applied to various camera modules, and a new technological breakthrough is brought to the photographic equipment industry.
It should be noted that, the application scenario described in the foregoing embodiment of the present application is for more clearly describing the technical solution of the embodiment of the present application, and does not constitute a limitation on the technical solution provided by the embodiment of the present application, and as a person of ordinary skill in the art can know, with the appearance of a new application scenario, the technical solution provided by the embodiment of the present application is also applicable to similar technical problems.
The execution subject of the image processing method provided by the embodiment of the application can be an image processing apparatus. In the embodiment of the present application, the image processing apparatus is described by taking the image processing apparatus performing the image processing method as an example.
Based on the same inventive concept, the application also provides an image processing device. An image processing apparatus according to an embodiment of the present application will be described in detail with reference to fig. 6.
Fig. 6 is a block diagram showing a configuration of an image processing apparatus according to an exemplary embodiment.
As shown in fig. 6, the image processing apparatus 600 may include:
an obtaining module 601, configured to obtain a first sample image dataset and a second sample image dataset, where the first sample image dataset is image data acquired by a first camera module, and the second sample image dataset is image data acquired by a second camera module under the same shooting scene, and an angle of view of the first camera module is different from an angle of view of the second camera module, or a photosensitive performance of the first camera module is different from a photosensitive performance of the second camera module;
The training module 602 is configured to train the initial difference detection model based on the first sample image dataset and the second sample image dataset to obtain a target difference detection model, where the target difference detection model is configured to output a difference image of the first sample image dataset and the second sample image dataset;
the detection module 603 is configured to input a first image to be processed and a second image to be processed into a target difference detection model to obtain a target difference image, where the first image to be processed is an image acquired by the first camera module, and the second image to be processed is an image acquired by the second camera module under the same shooting scene;
a processing module 604, configured to perform image processing on the first to-be-processed image or the second to-be-processed image based on the target difference image, to obtain a target image;
when the initial difference detection model is trained, image channel splitting is carried out on a first sample image in the first sample image data set and a second sample image in the second sample image data set, first loss function values of different image channels are respectively calculated, and total loss function values of the initial difference detection model are calculated based on the first loss function values of the different image channels.
The image processing apparatus 600 will be described in detail below, specifically as follows:
in one embodiment, training module 602 may include:
the detection sub-module is used for carrying out difference detection on a first sample image in the first sample image data set and a corresponding second sample image in the second sample image data set by using an initial difference detection model to obtain a sample difference image;
the fusion sub-module is used for fusing the sample difference image and the second sample image to obtain a target sample image;
the splitting sub-module is used for splitting the target sample image into first single-channel images corresponding to the four image channels respectively and splitting the first sample image into second single-channel images corresponding to the four image channels respectively;
the first determining sub-module is used for respectively determining loss function values according to a first single-channel image and a second single-channel image corresponding to each image channel in the four image channels to obtain first loss function values corresponding to the four image channels respectively;
the second determining sub-module is used for determining an average value of the first loss function values corresponding to the four image channels respectively as a total loss function value;
and the training sub-module is used for adjusting model parameters of the initial difference detection model according to the total loss function value and training to obtain the target difference detection model.
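For illustration, a minimal training-step sketch corresponding to the above sub-modules is given below, assuming 4-channel packed RAW tensors (for example an RGGB layout), an externally supplied per-channel loss function, and a standard gradient-descent optimizer; none of these concrete choices are prescribed by the embodiment of the present application.

```python
import torch

def train_step(model, optimizer, first_sample, second_sample, weights, per_channel_loss):
    # Difference detection: predict the sample difference image from the two sample images.
    sample_diff = model(torch.cat([first_sample, second_sample], dim=1), weights)
    # Fusion: superimpose the sample difference image on the second sample image.
    target_sample = second_sample + sample_diff
    # Channel splitting and per-channel (first) loss function values.
    channel_losses = []
    for ch in range(4):
        first_single = target_sample[:, ch:ch + 1]        # first single-channel image
        second_single = first_sample[:, ch:ch + 1]        # second single-channel image
        channel_losses.append(per_channel_loss(first_single, second_single))
    # Total loss function value: average of the four first loss function values.
    total_loss = torch.stack(channel_losses).mean()
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```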
In one embodiment, the first determining sub-module may include:
the downsampling unit is used for performing M times of downsampling on the first single-channel image to obtain M third single-channel images with different image sizes, and performing M times of downsampling on the second single-channel image to obtain M fourth single-channel images with different image sizes, wherein the downsampling multiples of the first single-channel image and the downsampling multiples of the second single-channel image are the same, and M is an integer greater than 1;
a first determining unit, configured to determine a loss function value according to a fifth single-channel image and a sixth single-channel image of each of the m+1 image sizes, to obtain a second loss function value corresponding to each of the m+1 image sizes, where the fifth single-channel image is any one of the M third single-channel images and the first single-channel image, and the sixth single-channel image is any one of the M fourth single-channel images and the second single-channel image;
and a second determining unit configured to determine an average value of second loss function values corresponding to the m+1 image sizes, respectively, as the first loss function value.
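A minimal sketch of this multi-scale first loss function value follows; average pooling with a factor of 2 is an assumed downsampling operator, and M = 3 is an illustrative value.

```python
import torch
import torch.nn.functional as F

def first_loss_value(first_single, second_single, second_loss_fn, m: int = 3):
    """Average the second loss function values computed at the original size and at M downsampled sizes."""
    losses = [second_loss_fn(first_single, second_single)]      # original image size
    for _ in range(m):                                           # M downsamples with the same factor
        first_single = F.avg_pool2d(first_single, 2)
        second_single = F.avg_pool2d(second_single, 2)
        losses.append(second_loss_fn(first_single, second_single))
    return torch.stack(losses).mean()                            # average over M + 1 image sizes
```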
In one embodiment, the first determining unit may include:
a first determining subunit, configured to determine third loss function values corresponding to at least two loss functions respectively according to the fifth single-channel image and the sixth single-channel image;
And the calculating subunit is used for carrying out weighted summation on at least two third loss function values according to preset weights respectively corresponding to the at least two loss functions to obtain a second loss function value.
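For illustration, the following sketch combines two third loss function values with preset weights; the choice of L1 and an SSIM-based term, and the weights 0.8 and 0.2, are assumptions of the sketch (the SSIM map reuses the sketch given for steps 1311 and 1312).

```python
import torch.nn.functional as F

def second_loss_value(first_single, second_single, w_l1: float = 0.8, w_ssim: float = 0.2):
    """Weighted sum of two third loss function values."""
    l1 = F.l1_loss(first_single, second_single)
    ssim = ssim_weight_map(first_single, second_single).mean()   # from the earlier SSIM sketch
    return w_l1 * l1 + w_ssim * (1.0 - ssim)
```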
In one embodiment, the detection module 603 may include:
the acquisition sub-module is used for acquiring weight information, wherein the weight information comprises weights respectively corresponding to the plurality of image areas;
the extraction sub-module is used for inputting the weight information, the first to-be-processed image and the second to-be-processed image into the target difference detection model, extracting difference features corresponding to a plurality of corresponding image areas in the first to-be-processed image and the second to-be-processed image respectively by the target difference detection model, and carrying out weighting processing on the difference features according to the weights corresponding to the image areas respectively to obtain the target difference image.
In one embodiment, the target difference detection model includes N feature extraction layers, where the number of weight information is N, and N is an integer greater than 1, where different feature extraction layers correspond to different image sizes, and different image sizes correspond to different weight information;
the acquiring sub-module may include:
the acquisition unit is used for acquiring weights corresponding to a plurality of image areas under the image size corresponding to each feature extraction layer aiming at the N feature extraction layers to obtain N weight information;
The extraction sub-module may include:
the extraction unit is used for extracting difference features of the first to-be-processed image and the second to-be-processed image, which correspond to a plurality of image areas under N image sizes, by N feature extraction layers in the target difference detection model;
and the processing unit is used for carrying out weighting processing on the difference features of the plurality of image areas respectively extracted by the N feature extraction layers according to the N weight information to obtain a target difference image.
In one embodiment, the acquiring unit may include:
the second determining subunit is used for determining weights corresponding to the plurality of image areas under the original image size according to the similarity between the plurality of image areas corresponding to the first image to be processed and the second image to be processed, so as to obtain first weight information;
the third determining subunit is used for performing N-1 times of downsampling on the first weight information, determining weights corresponding to a plurality of image areas under N-1 image sizes, and obtaining N-1 second weight information;
and a fourth determining subunit configured to determine the first weight information and the N-1 second weight information as N weight information.
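Combining the two sketches given earlier, the N pieces of weight information can be illustrated as follows, where N is the assumed number of feature extraction layers; the SSIM similarity measure and the average-pooling downsampling remain illustrative assumptions.

```python
def n_weight_information(image_c, image_a, n: int):
    """First weight information from the SSIM similarity, plus N - 1 downsampled second weight information."""
    first_weight = ssim_weight_map(image_c, image_a)      # weights at the original image size
    return weight_tables(first_weight, levels=n - 1)      # N weight information in total
```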
Therefore, the initial difference detection model can be trained through a first sample image data set and a second sample image data set which are acquired by camera modules with different angles of view or different photosensitive performances to obtain a target difference detection model; then a first to-be-processed image and a second to-be-processed image which are acquired by different camera modules under the same shooting scene are input into the target difference detection model to obtain a target difference image; and then the first to-be-processed image or the second to-be-processed image is subjected to image processing based on the target difference image, so that the target image can be obtained. The target image thus obtained is consistent with the shooting effect of the second to-be-processed image or the first to-be-processed image. Therefore, the shooting effects of the images acquired by different camera modules can be made consistent through image processing, the difference between different camera modules is compensated, the consistency of the multi-camera shooting effect is ensured, and a large amount of human resources does not need to be consumed. When the initial difference detection model is trained, image channel splitting is performed on a first sample image in the first sample image data set and a second sample image in the second sample image data set, first loss function values of different image channels are respectively calculated, and the total loss function value of the initial difference detection model is calculated based on the first loss function values of the different image channels. Therefore, the accuracy of model training can be improved by improving the accuracy of the loss function value, and the consistency of the shooting effect of the target image with the second to-be-processed image or the first to-be-processed image is further improved.
The image processing device in the embodiment of the application can be an electronic device, or can be a component in the electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be other devices than a terminal. By way of example, the electronic device may be a mobile phone, tablet computer, notebook computer, palm computer, vehicle-mounted electronic device, mobile internet appliance (Mobile Internet Device, MID), augmented reality (augmented reality, AR)/Virtual Reality (VR) device, robot, wearable device, ultra-mobile personal computer (ultra-mobile personal computer, UMPC), netbook or personal digital assistant (personal digital assistant, PDA), etc., but may also be a server, network attached storage (Network Attached Storage, NAS), personal computer (personal computer, PC), television (TV), teller machine or self-service machine, etc., and the embodiments of the present application are not limited in particular.
The image processing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an ios operating system, or other possible operating systems, and the embodiment of the present application is not limited specifically.
The image processing device provided by the embodiment of the present application can implement each process implemented by the embodiments of the methods of fig. 1 to 5, and achieve the same technical effects, so that repetition is avoided, and no further description is provided herein.
Optionally, as shown in fig. 7, the embodiment of the present application further provides an electronic device 700, including a processor 701 and a memory 702, where the memory 702 stores a program or an instruction that can be executed on the processor 701, and the program or the instruction implements each step of the above-mentioned image processing method embodiment when executed by the processor 701, and the steps achieve the same technical effects, so that repetition is avoided, and no further description is given here.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device.
Fig. 8 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 800 includes, but is not limited to: radio frequency unit 801, network module 802, audio output unit 803, input unit 804, sensor 805, display unit 806, user input unit 807, interface unit 808, memory 809, and processor 810.
Those skilled in the art will appreciate that the electronic device 800 may also include a power source (e.g., a battery) for powering the various components, which may be logically connected to the processor 810 by a power management system to perform functions such as managing charge, discharge, and power consumption by the power management system. The electronic device structure shown in fig. 8 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than shown, or may combine certain components, or may be arranged in different components, which are not described in detail herein.
The processor 810 is configured to obtain a first sample image data set and a second sample image data set, where the first sample image data set is image data acquired by a first camera module, and the second sample image data set is image data acquired by a second camera module under the same shooting scene, and a field angle of the first camera module is different from a field angle of the second camera module, or a photosensitive performance of the first camera module is different from a photosensitive performance of the second camera module;
training an initial difference detection model based on the first sample image data set and the second sample image data set to obtain a target difference detection model, wherein the target difference detection model is used for outputting difference images of the first sample image data set and the second sample image data set;
inputting a first image to be processed and a second image to be processed into a target difference detection model to obtain a target difference image, wherein the first image to be processed is an image acquired by a first camera module, and the second image to be processed is an image acquired by a second camera module under the same shooting scene;
performing image processing on the first to-be-processed image or the second to-be-processed image based on the target difference image to obtain a target image;
When the initial difference detection model is trained, image channel splitting is carried out on a first sample image in the first sample image data set and a second sample image in the second sample image data set, first loss function values of different image channels are respectively calculated, and total loss function values of the initial difference detection model are calculated based on the first loss function values of the different image channels.
Therefore, the initial difference detection model can be trained through a first sample image data set and a second sample image data set which are acquired by camera modules with different angles of view or different photosensitive performances to obtain a target difference detection model; then a first to-be-processed image and a second to-be-processed image which are acquired by different camera modules under the same shooting scene are input into the target difference detection model to obtain a target difference image; and then the first to-be-processed image or the second to-be-processed image is subjected to image processing based on the target difference image, so that the target image can be obtained. The target image thus obtained is consistent with the shooting effect of the second to-be-processed image or the first to-be-processed image. Therefore, the shooting effects of the images acquired by different camera modules can be made consistent through image processing, the difference between different camera modules is compensated, the consistency of the multi-camera shooting effect is ensured, and a large amount of human resources does not need to be consumed. When the initial difference detection model is trained, image channel splitting is performed on a first sample image in the first sample image data set and a second sample image in the second sample image data set, first loss function values of different image channels are respectively calculated, and the total loss function value of the initial difference detection model is calculated based on the first loss function values of the different image channels. Therefore, the accuracy of model training can be improved by improving the accuracy of the loss function value, and the consistency of the shooting effect of the target image with the second to-be-processed image or the first to-be-processed image is further improved.
In some embodiments, the processor 810 is further configured to perform a difference detection on the first sample image in the first sample image data set and the corresponding second sample image in the second sample image data set using the initial difference detection model to obtain a sample difference image;
fusing the sample difference image and the second sample image to obtain a target sample image;
splitting a target sample image into first single-channel images corresponding to the four image channels respectively, and splitting the first sample image into second single-channel images corresponding to the four image channels respectively;
respectively determining a loss function value according to a first single-channel image and a second single-channel image corresponding to each of the four image channels, and obtaining a first loss function value corresponding to each of the four image channels;
determining an average value of the first loss function values corresponding to the four image channels respectively as a total loss function value;
and adjusting model parameters of the initial difference detection model according to the total loss function value, and training to obtain the target difference detection model.
In this way, by individually determining the first loss function value corresponding to each image channel and performing model training using the average value of the first loss function values corresponding to a plurality of image channels as the total loss function, it is possible to more accurately measure whether the model has completed training.
In some embodiments, the processor 810 is further configured to downsample the first single-channel image M times to obtain M third single-channel images with different image sizes, downsample the second single-channel image M times to obtain M fourth single-channel images with different image sizes, the downsampled multiple of the first single-channel image and the downsampled multiple of the second single-channel image are the same, and M is an integer greater than 1;
respectively determining a loss function value according to a fifth single-channel image and a sixth single-channel image of each image size of M+1 image sizes to obtain second loss function values respectively corresponding to the M+1 image sizes, wherein the fifth single-channel image is any one of the M third single-channel images and the first single-channel images, and the sixth single-channel image is any one of the M fourth single-channel images and the second single-channel images;
an average value of the second loss function values corresponding to the m+1 image sizes, respectively, is determined as the first loss function value.
In this way, by calculating the second loss function value for each of the single-channel images of the plurality of image sizes and taking the average value thereof as the first loss function value of the image channel, the information amount of the loss measurement can be enriched, thereby improving the accuracy of model training.
In some embodiments, the processor 810 is further configured to determine a third loss function value corresponding to at least two loss functions according to the fifth single-channel image and the sixth single-channel image, respectively;
and carrying out weighted summation on at least two third loss function values according to preset weights respectively corresponding to the at least two loss functions to obtain second loss function values.
Therefore, the training precision of the model can be measured more comprehensively by adopting at least two loss functions, so that a target difference detection model with higher precision can be obtained conveniently.
In some embodiments, the processor 810 is further configured to obtain weight information, where the weight information includes weights corresponding to the plurality of image areas respectively;
and inputting the weight information, the first to-be-processed image and the second to-be-processed image into a target difference detection model, extracting difference characteristics corresponding to a plurality of image areas corresponding to the first to-be-processed image and the second to-be-processed image respectively by the target difference detection model, and carrying out weighting treatment on the difference characteristics according to the weights corresponding to the plurality of image areas respectively to obtain a target difference image.
In this way, by weighting the difference features respectively corresponding to the plurality of corresponding image areas in the first to-be-processed image and the second to-be-processed image, different weights can be allocated to different image areas, and the accuracy of the target difference image is improved.
In some embodiments, the target difference detection model includes N feature extraction layers, the number of weight information is N, N is an integer greater than 1, wherein different feature extraction layers correspond to different image sizes, and different image sizes correspond to different weight information; the processor 810 is further configured to obtain weights corresponding to the plurality of image areas under the image size corresponding to each feature extraction layer for the N feature extraction layers, to obtain N weight information;
extracting difference features of the first to-be-processed image and the second to-be-processed image, which correspond to a plurality of image areas under N image sizes, by N feature extraction layers in the target difference detection model;
and weighting the difference features of the plurality of image areas respectively extracted by the N feature extraction layers according to the N weight information to obtain a target difference image.
Therefore, the feature extraction layers corresponding to the image sizes can be guided a priori based on the weight information corresponding to the image sizes, the accuracy of the difference features is further improved, and a more accurate target difference image is obtained.
In some embodiments, the processor 810 is further configured to determine weights corresponding to the plurality of image areas under the original image size according to the similarity between the corresponding plurality of image areas in the first to-be-processed image and the second to-be-processed image, so as to obtain first weight information;
Performing N-1 times of downsampling on the first weight information, and determining weights corresponding to a plurality of image areas under N-1 image sizes to obtain N-1 second weight information;
the first weight information and the N-1 second weight information are determined as N weight information.
In this way, according to the similarity between the corresponding multiple image areas in the first to-be-processed image and the second to-be-processed image, the weights corresponding to the multiple image areas under the original image size are determined to obtain the first weight information, the weights corresponding to the multiple image areas can be more accurately determined, and then the second weight information corresponding to the multiple image sizes can be quickly obtained by performing multiple downsampling on the first weight information.
It should be appreciated that in embodiments of the present application, the input unit 804 may include a graphics processor (Graphics Processing Unit, GPU) 8041 and a microphone 8042, the graphics processor 8041 processing image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 806 may include a display panel 8061, and the display panel 8061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 807 includes at least one of a touch panel 8071 and other input devices 8072. Touch panel 8071, also referred to as a touch screen. The touch panel 8071 may include two parts, a touch detection device and a touch controller. Other input devices 8072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein.
The memory 809 can be used to store software programs as well as various data. The memory 809 may mainly include a first storage area storing programs or instructions and a second storage area storing data, wherein the first storage area may store an operating system, application programs or instructions (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like. Further, the memory 809 may include volatile memory or nonvolatile memory, or the memory 809 may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), or a direct Rambus RAM (DRRAM). The memory 809 in embodiments of the application includes, but is not limited to, these and any other suitable types of memory.
The processor 810 may include one or more processing units; optionally, the processor 810 integrates an application processor that primarily processes operations involving an operating system, user interface, application programs, etc., and a modem processor that primarily processes wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor 810.
The embodiment of the application also provides a readable storage medium, on which a program or an instruction is stored, which when executed by a processor, implements each process of the above image processing method embodiment, and can achieve the same technical effects, and in order to avoid repetition, a detailed description is omitted here.
Wherein the processor is a processor in the electronic device described in the above embodiment. The readable storage medium includes computer readable storage medium such as read-only memory, random access memory, magnetic disk or optical disk.
The embodiment of the application further provides a chip, which comprises a processor and a communication interface, wherein the communication interface is coupled with the processor, and the processor is used for running programs or instructions to realize the processes of the embodiment of the image processing method, and can achieve the same technical effects, so that repetition is avoided, and the description is omitted here.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-on-chip chips, chip systems, or system-on-chip chips, etc.
Embodiments of the present application provide a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement the respective processes of the above-described image processing method embodiments, and achieve the same technical effects, and for avoiding repetition, a detailed description is omitted herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.

Claims (10)

1. An image processing method, the method comprising:
acquiring a first sample image data set and a second sample image data set, wherein the first sample image data set is image data acquired through a first camera module, the second sample image data set is image data acquired through a second camera module under the same shooting scene, and the angle of view of the first camera module is different from the angle of view of the second camera module, or the photosensitive performance of the first camera module is different from the photosensitive performance of the second camera module;
training an initial difference detection model based on the first sample image data set and the second sample image data set to obtain a target difference detection model, wherein the target difference detection model is used for outputting difference images of the first sample image data set and the second sample image data set;
inputting a first image to be processed and a second image to be processed into the target difference detection model to obtain a target difference image, wherein the first image to be processed is an image acquired by the first camera module, and the second image to be processed is an image acquired by the second camera module under the same shooting scene;
Performing image processing on the first to-be-processed image or the second to-be-processed image based on the target difference image to obtain a target image;
when training the initial difference detection model, splitting image channels of a first sample image in the first sample image data set and a second sample image in the second sample image data set, respectively calculating first loss function values of different image channels, and calculating total loss function values of the initial difference detection model based on the first loss function values of the different image channels.
2. The method of claim 1, wherein training the initial variance detection model based on the first sample image dataset and the second sample image dataset to obtain a target variance detection model comprises:
performing difference detection on a first sample image in the first sample image data set and a corresponding second sample image in the second sample image data set by using an initial difference detection model to obtain a sample difference image;
fusing the sample difference image and the second sample image to obtain a target sample image;
Splitting the target sample image into first single-channel images corresponding to four image channels respectively, and splitting the first sample image into second single-channel images corresponding to the four image channels respectively;
respectively determining loss function values according to the first single-channel image and the second single-channel image corresponding to each image channel in the four image channels to obtain first loss function values corresponding to the four image channels respectively;
determining an average value of the first loss function values respectively corresponding to the four image channels as the total loss function value;
and adjusting model parameters of the initial difference detection model according to the total loss function value, and training to obtain the target difference detection model.
3. The method according to claim 2, wherein determining the loss function value according to the first single-channel image and the second single-channel image corresponding to each of the four image channels, respectively, to obtain a first loss function value corresponding to each of the four image channels, includes:
performing downsampling on the first single-channel image for M times to obtain M third single-channel images with different image sizes, performing downsampling on the second single-channel image for M times to obtain M fourth single-channel images with different image sizes, wherein the downsampling multiples of the first single-channel image and the downsampling multiples of the second single-channel image are the same, and M is an integer greater than 1;
Respectively determining a loss function value according to a fifth single-channel image and a sixth single-channel image of each image size of M+1 image sizes, so as to obtain second loss function values respectively corresponding to the M+1 image sizes, wherein the fifth single-channel image is any one of the M third single-channel images and the first single-channel images, and the sixth single-channel image is any one of the M fourth single-channel images and the second single-channel images;
and determining an average value of second loss function values corresponding to the M+1 image sizes as the first loss function value.
4. A method according to claim 3, wherein determining the loss function value from the fifth single channel image and the sixth single channel image at each of the m+1 image sizes, respectively, to obtain the second loss function value corresponding to the m+1 image sizes, respectively, comprises:
determining third loss function values corresponding to at least two loss functions respectively according to the fifth single-channel image and the sixth single-channel image;
and carrying out weighted summation on at least two third loss function values according to preset weights respectively corresponding to the at least two loss functions to obtain the second loss function value.
5. The method according to claim 1, wherein inputting the first to-be-processed image and the second to-be-processed image into the target difference detection model to obtain a target difference image includes:
acquiring weight information, wherein the weight information comprises weights respectively corresponding to a plurality of image areas;
and inputting the weight information, the first to-be-processed image and the second to-be-processed image into the target difference detection model, extracting difference characteristics corresponding to a plurality of corresponding image areas in the first to-be-processed image and the second to-be-processed image respectively by the target difference detection model, and carrying out weighting processing on the difference characteristics according to the weights corresponding to the plurality of image areas respectively to obtain the target difference image.
6. The method of claim 5, wherein the target difference detection model includes N feature extraction layers, the number of weight information being N, N being an integer greater than 1, wherein different feature extraction layers correspond to different image sizes, and different image sizes correspond to different weight information;
the obtaining weight information includes:
for the N feature extraction layers, acquiring weights corresponding to a plurality of image areas under the image size corresponding to each feature extraction layer respectively to obtain N weight information;
Extracting, by the target difference detection model, difference features corresponding to a plurality of image areas corresponding to the first to-be-processed image and the second to-be-processed image, and performing weighting processing on the difference features according to weights corresponding to the plurality of image areas, to obtain the target difference image, including:
extracting difference features of a plurality of image areas of the first to-be-processed image and the second to-be-processed image respectively corresponding to the N image sizes by N feature extraction layers in the target difference detection model;
and weighting the difference features of the plurality of image areas respectively extracted by the N feature extraction layers according to the N weight information to obtain the target difference image.
7. The method of claim 6, wherein the obtaining, for the N feature extraction layers, weights corresponding to a plurality of image areas under an image size corresponding to each feature extraction layer, to obtain N weight information includes:
according to the similarity between a plurality of corresponding image areas in the first image to be processed and the second image to be processed, determining weights corresponding to the plurality of corresponding image areas under the original image size, and obtaining first weight information;
Performing N-1 times of downsampling on the first weight information, and determining weights corresponding to a plurality of image areas under N-1 image sizes to obtain N-1 second weight information;
and determining the first weight information and the N-1 second weight information as the N weight information.
8. An image processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a first sample image data set and a second sample image data set, wherein the first sample image data set is image data acquired through a first camera module, the second sample image data set is image data acquired through a second camera module under the same shooting scene, and the angle of view of the first camera module is different from the angle of view of the second camera module, or the photosensitive performance of the first camera module is different from the photosensitive performance of the second camera module;
the training module is used for training an initial difference detection model based on the first sample image data set and the second sample image data set to obtain a target difference detection model, and the target difference detection model is used for outputting difference images of the first sample image data set and the second sample image data set;
The detection module is used for inputting a first image to be processed and a second image to be processed into the target difference detection model to obtain a target difference image, wherein the first image to be processed is an image acquired by the first camera module, and the second image to be processed is an image acquired by the second camera module under the same shooting scene;
the processing module is used for carrying out image processing on the first image to be processed or the second image to be processed based on the target difference image to obtain a target image;
when training the initial difference detection model, splitting image channels of a first sample image in the first sample image data set and a second sample image in the second sample image data set, respectively calculating first loss function values of different image channels, and calculating total loss function values of the initial difference detection model based on the first loss function values of the different image channels.
9. An electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the image processing method of any of claims 1-7.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the image processing method according to any of claims 1-7.
CN202310583027.7A 2023-05-22 2023-05-22 Image processing method, device, electronic equipment and storage medium Pending CN116664820A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310583027.7A CN116664820A (en) 2023-05-22 2023-05-22 Image processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310583027.7A CN116664820A (en) 2023-05-22 2023-05-22 Image processing method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116664820A true CN116664820A (en) 2023-08-29

Family

ID=87712907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310583027.7A Pending CN116664820A (en) 2023-05-22 2023-05-22 Image processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116664820A (en)

Similar Documents

Publication Publication Date Title
AU2019326496B2 (en) Method for capturing images at night, apparatus, electronic device, and storage medium
EP2987134B1 (en) Generation of ghost-free high dynamic range images
US20200160493A1 (en) Image filtering based on image gradients
US9591237B2 (en) Automated generation of panning shots
DE102019106252A1 (en) Method and system for light source estimation for image processing
US8077218B2 (en) Methods and apparatuses for image processing
CN108304821B (en) Image recognition method and device, image acquisition method and device, computer device and non-volatile computer-readable storage medium
CN108668093B (en) HDR image generation method and device
CN108833785B (en) Fusion method and device of multi-view images, computer equipment and storage medium
CN113992861B (en) Image processing method and image processing device
CN111402258A (en) Image processing method, image processing device, storage medium and electronic equipment
US10489897B2 (en) Apparatus and methods for artifact detection and removal using frame interpolation techniques
CN107395991B (en) Image synthesis method, image synthesis device, computer-readable storage medium and computer equipment
CN109040596B (en) Method for adjusting camera, mobile terminal and storage medium
CN107509044A (en) Image combining method, device, computer-readable recording medium and computer equipment
CN110728644B (en) Image generation method and device, electronic equipment and readable storage medium
WO2023151511A1 (en) Model training method and apparatus, image moire removal method and apparatus, and electronic device
CN113132695B (en) Lens shading correction method and device and electronic equipment
CN107704798A (en) Image weakening method, device, computer-readable recording medium and computer equipment
CN106773453B (en) Camera exposure method and device and mobile terminal
CN112508820A (en) Image processing method and device and electronic equipment
CN116664820A (en) Image processing method, device, electronic equipment and storage medium
US11798146B2 (en) Image fusion architecture
US11803949B2 (en) Image fusion architecture with multimode operations
CN117479025A (en) Video processing method, video processing device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination