CN112037142B - Image denoising method, device, computer and readable storage medium - Google Patents

Image denoising method, device, computer and readable storage medium

Info

Publication number
CN112037142B
Authority
CN
China
Prior art keywords
image
denoising
feature
monocular
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010857685.7A
Other languages
Chinese (zh)
Other versions
CN112037142A (en)
Inventor
张凯皓 (Zhang Kaihao)
罗文寒 (Luo Wenhan)
刘威 (Liu Wei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010857685.7A
Publication of CN112037142A
Application granted
Publication of CN112037142B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/70: Denoising; Smoothing
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20212: Image combination
    • G06T 2207/20221: Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the application discloses an image denoising method, an image denoising device, a computer, and a readable storage medium. The method can use technologies such as computer vision and deep learning from the field of artificial intelligence, and comprises the following steps: obtaining a binocular image; denoising and semantically segmenting the monocular images in the binocular image to generate an initial denoising feature map and a semantic information feature map of each monocular image; fusing the initial denoising feature map and the semantic information feature map of each monocular image to obtain a fusion feature of each monocular image; and denoising each monocular image based on the fusion features to obtain a denoising image of each monocular image. By adopting the method and the device, the information loss during denoising of binocular images can be reduced, and the image denoising effect can be improved.

Description

Image denoising method, device, computer and readable storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an image denoising method, an image denoising device, a computer, and a readable storage medium.
Background
When noise information exists in an image, it degrades the user's viewing experience, and in some special scenarios (such as vehicles or surveillance) it can interfere with normal operation. When capturing images or videos with a mobile phone, a digital camera, or an on-vehicle camera, weather conditions (such as rainy, foggy, or snowy days) may introduce noise information (such as rainwater, fog, or snowflakes) into the captured images, which makes denoising particularly important. At present, denoising is generally performed on a monocular image (an image with noise information) through a deep learning network to obtain the denoised image; with this approach, predicting the regions covered by noise information in the monocular image causes considerable information loss, so the restoration effect of the image is poor.
Disclosure of Invention
The embodiment of the application provides an image denoising method, an image denoising device, a computer and a readable storage medium, which can reduce information loss when denoising binocular images and improve the denoising effect of the binocular images.
In one aspect, an embodiment of the present application provides an image denoising method, including:
obtaining a binocular image;
denoising and semantically segmenting the monocular images in the binocular image to generate an initial denoising feature map and a semantic information feature map of each monocular image;
fusing the initial denoising feature map and the semantic information feature map of each monocular image to obtain fusion features of each monocular image;
and denoising each monocular image based on the fusion features to obtain a denoising image of each monocular image (a minimal pipeline sketch follows).
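For illustration only (not part of the claims), the following minimal sketch shows how the four claimed steps could be orchestrated; all module names (`encoder`, `proc_model`, `fusion1`, `fusion2`) are assumptions, not identifiers from the patent:

```python
# Hypothetical orchestration of the claimed pipeline (PyTorch-style).
def denoise_binocular(img_left, img_right, encoder, proc_model, fusion1, fusion2):
    # Step 1: the binocular image is the pair (img_left, img_right).
    feats = [encoder(img) for img in (img_left, img_right)]

    # Step 2: per-image initial denoising feature map and semantic feature map.
    denoise_maps = [proc_model(f, mode="denoise") for f in feats]
    semantic_maps = [proc_model(f, mode="semantic") for f in feats]

    # Step 3: fuse the two maps of each monocular image into a fusion feature.
    fused = [fusion1(d, s) for d, s in zip(denoise_maps, semantic_maps)]

    # Step 4: cross-image feature transfer yields both denoised images.
    return fusion2(fused[0], fused[1])
```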
Wherein the binocular image comprises a monocular image K_i, where i is a positive integer less than or equal to the number of monocular images included in the binocular image;
denoising and semantically segmenting the monocular images in the binocular image to generate an initial denoising feature map and a semantic information feature map of each monocular image comprises:
inputting the monocular image K_i into an image encoder, and extracting monocular image features of the monocular image K_i based on the image encoder;
inputting the monocular image features of the monocular image K_i into an image processing model, starting a denoising processing mode in the image processing model, and denoising the monocular image features of the monocular image K_i based on the denoising processing mode to generate the initial denoising feature map of the monocular image K_i;
and switching the image processing model from the denoising processing mode to a semantic extraction mode, and semantically segmenting the monocular image features of the monocular image K_i based on the semantic extraction mode to generate the semantic information feature map of the monocular image K_i (a sketch of such a dual-mode model follows).
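A minimal sketch of such a dual-mode model, assuming a shared convolutional trunk with one head per mode (layer sizes and the mode-switch mechanism are assumptions for illustration):

```python
import torch.nn as nn

class DualModeProcessor(nn.Module):
    """Shared trunk; `mode` switches between denoising and semantic heads."""
    def __init__(self, channels=64, num_classes=21):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.denoise_head = nn.Conv2d(channels, channels, 3, padding=1)  # initial denoising feature map
        self.semantic_head = nn.Conv2d(channels, num_classes, 1)         # semantic information feature map

    def forward(self, x, mode="denoise"):
        h = self.trunk(x)  # x: monocular image features from the image encoder
        return self.denoise_head(h) if mode == "denoise" else self.semantic_head(h)
```

Sharing the trunk across both modes is one plausible reading of "switching" the same model between modes; two separate networks would also satisfy the claim wording.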
Wherein the binocular image comprises a monocular image K_i, where i is a positive integer less than or equal to the number of monocular images included in the binocular image;
fusing the initial denoising feature map and the semantic information feature map of each monocular image to obtain the fusion feature of each monocular image comprises:
extracting, based on a first image fusion model, denoised image features of the initial denoising feature map of the monocular image K_i, and extracting semantic image features of the semantic information feature map of the monocular image K_i;
and performing feature stitching on the denoised image features and the semantic image features of the monocular image K_i to obtain a monocular stitching feature of the monocular image K_i, and iterating on the monocular stitching feature of the monocular image K_i based on a convolution layer of the first image fusion model to generate the fusion feature of the monocular image K_i (see the sketch below).
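A sketch of the first image fusion model as these claims describe it: concatenate (stitch) the two feature maps, then refine iteratively through a convolution layer. Channel widths and the iteration count are assumptions:

```python
import torch
import torch.nn as nn

class FirstFusionModel(nn.Module):
    def __init__(self, d_ch=64, s_ch=21, out_ch=64, iters=3):
        super().__init__()
        self.proj = nn.Conv2d(d_ch + s_ch, out_ch, 3, padding=1)
        self.iter_conv = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.iters = iters

    def forward(self, denoise_map, semantic_map):
        x = torch.cat([denoise_map, semantic_map], dim=1)  # feature stitching
        x = torch.relu(self.proj(x))
        for _ in range(self.iters):            # iterate via the convolution layer
            x = torch.relu(self.iter_conv(x))
        return x                               # fusion feature of monocular image K_i
```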
The binocular image comprises a first monocular image and a second monocular image, and the fusion characteristics of the monocular images comprise a first fusion characteristic of the first monocular image and a second fusion characteristic of the second monocular image;
denoising each monocular image based on the fusion features to obtain a denoising image of each monocular image comprises:
and carrying out feature transfer on the second fusion feature to the first fusion feature, generating a first denoising image of the first monocular image according to the transferred first fusion feature, carrying out feature transfer on the first fusion feature to the second fusion feature, and generating a second denoising image of the second monocular image according to the transferred second fusion feature.
Wherein performing feature transfer from the second fusion feature to the first fusion feature, generating a first denoising image of the first monocular image according to the transferred first fusion feature, performing feature transfer from the first fusion feature to the second fusion feature, and generating a second denoising image of the second monocular image according to the transferred second fusion feature comprises:
performing feature stitching on the first fusion feature and the second fusion feature to obtain a binocular stitching feature;
inputting the binocular stitching feature into a second image fusion model, performing feature transfer to the first fusion feature in the binocular stitching feature using the second fusion feature based on a convolution layer of the second image fusion model, performing feature transfer to the second fusion feature in the binocular stitching feature using the first fusion feature, and generating a third fusion feature;
and dividing the third fusion feature into a first color channel feature and a second color channel feature based on the second image fusion model, generating the first denoising image of the first monocular image according to the first color channel feature, and generating the second denoising image of the second monocular image according to the second color channel feature; the first color channel feature is the transferred first fusion feature, and the second color channel feature is the transferred second fusion feature.
Wherein generating the third fusion feature comprises:
acquiring the image visual difference (disparity) between the first monocular image and the second monocular image, and determining a position association relation between the first monocular image and the second monocular image according to the image visual difference; the position association relation is used for representing the association between pixel points in the first monocular image and pixel points in the second monocular image that correspond to the same geographic position;
establishing, based on the position association relation and a convolution layer of the second image fusion model, a feature element association relation between the first fusion feature and the second fusion feature in the binocular stitching feature;
and performing, according to the feature element association relation, feature transfer to the first fusion feature in the binocular stitching feature using the second fusion feature and feature transfer to the second fusion feature in the binocular stitching feature using the first fusion feature, to generate the third fusion feature (a disparity-warp sketch follows).
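One way to realize the position association relation is to warp one image's features onto the other using the disparity and let a convolution layer merge the aligned features. This is a hedged sketch only; the patent does not specify warping, and all shapes are assumptions (features (B, C, H, W), disparity (B, 1, H, W) in pixels):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_by_disparity(feat, disparity):
    # Shift each pixel horizontally by its disparity so that pixels depicting
    # the same geographic position line up across the two views.
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid_x = (xs.to(feat) + disparity.squeeze(1)) / (w - 1) * 2 - 1
    grid_y = (ys.to(feat) / (h - 1) * 2 - 1).expand_as(grid_x)
    return F.grid_sample(feat, torch.stack([grid_x, grid_y], dim=-1),
                         align_corners=True)

class CrossTransfer(nn.Module):
    """Mutual feature transfer between the two fusion features."""
    def __init__(self, ch=64):
        super().__init__()
        self.merge = nn.Conv2d(2 * ch, ch, 3, padding=1)  # shared for brevity

    def forward(self, f_first, f_second, disp_fs, disp_sf):
        new_first = self.merge(torch.cat(
            [f_first, warp_by_disparity(f_second, disp_fs)], dim=1))
        new_second = self.merge(torch.cat(
            [f_second, warp_by_disparity(f_first, disp_sf)], dim=1))
        return new_first, new_second
```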
Wherein generating the first denoising image of the first monocular image according to the first color channel feature and generating the second denoising image of the second monocular image according to the second color channel feature comprises (a channel-split sketch follows these steps):
dividing the third fusion feature into a first color channel feature and a second color channel feature based on the second image fusion model;
feature superposition is carried out on at least two monochromatic channel features in the first color channel features, and a first denoising image of the first monocular image is generated;
and performing feature superposition on at least two monochromatic channel features in the second color channel features to generate a second denoising image of the second monocular image.
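A sketch of the channel split and superposition, assuming the third fusion feature carries six channels, the first three being the transferred first fusion feature (the R, G, B monochromatic channel features of the first image) and the last three the transferred second fusion feature:

```python
def split_into_denoised_images(third_fusion_feature):
    # third_fusion_feature: (B, 6, H, W); the 3+3 layout is an assumption.
    first_color = third_fusion_feature[:, :3]    # first color channel feature
    second_color = third_fusion_feature[:, 3:6]  # second color channel feature
    # Superposing the monochromatic channel features along the channel axis
    # yields an RGB image per view; clamp to the displayable range.
    return first_color.clamp(0, 1), second_color.clamp(0, 1)
```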
Wherein the method further comprises:
acquiring a noise image sample and a denoising image sample of the noise image sample; the noise image sample is an image obtained by adding noise information into the denoising image sample;
inputting the noise image sample into an initial image processing model in the denoising processing mode, obtaining a predicted denoising image output by the initial image processing model, and adjusting the initial image processing model in the denoising processing mode based on a first loss function between the predicted denoising image and the denoising image sample, to obtain an image processing model capable of executing the denoising processing mode (a minimal training-loop sketch follows).
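A minimal training-loop sketch for the denoising mode, assuming the first loss function is an L1 distance (the patent does not name the loss) and that `model` wraps the encoder and the processing model end to end:

```python
import torch
import torch.nn.functional as F

def train_denoise_mode(model, loader, epochs=10, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for noisy, clean in loader:              # noisy = clean + added noise
            pred = model(noisy, mode="denoise")  # predicted denoising image
            loss = F.l1_loss(pred, clean)        # "first loss function" (assumed L1)
            opt.zero_grad()
            loss.backward()
            opt.step()
```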
Wherein the method further comprises:
acquiring a noise image sample and a semantic image sample of the noise image sample; the semantic image sample is an image obtained by carrying out semantic segmentation on a denoising image sample corresponding to the noise image sample;
inputting the noise image sample into an initial image processing model in the semantic extraction mode, obtaining a predicted semantic image output by the initial image processing model, and adjusting the initial image processing model in the semantic extraction mode based on a second loss function between the predicted semantic image and the semantic image sample, to obtain an image processing model capable of executing the semantic extraction mode (a corresponding sketch follows).
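The semantic extraction mode can be trained the same way; the second loss function is assumed here to be a pixel-wise cross-entropy against labels obtained by segmenting the clean sample:

```python
import torch
import torch.nn.functional as F

def train_semantic_mode(model, loader, epochs=10, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for noisy, seg_labels in loader:           # labels: (B, H, W) class ids
            logits = model(noisy, mode="semantic")
            loss = F.cross_entropy(logits, seg_labels)  # "second loss function" (assumed)
            opt.zero_grad()
            loss.backward()
            opt.step()
```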
Wherein the method further comprises:
acquiring a first denoising image sample and a second denoising image sample in a binocular image sample, adding noise information to the first denoising image sample to generate a first noise image sample, and adding noise information to the second denoising image sample to generate a second noise image sample;
acquiring a first initial denoising sample image and a first initial semantic sample image of a first noise image sample, and acquiring a second initial denoising sample image and a second initial semantic sample image of a second noise image sample;
training the first initial image fusion model and the second initial image fusion model based on the first initial denoising sample diagram, the first initial semantic sample diagram, the second initial denoising sample diagram and the second initial semantic sample diagram to obtain a first image fusion model and a second image fusion model; the output of the first initial image fusion model is the input of the second initial image fusion model.
The training of the first initial image fusion model and the second initial image fusion model based on the first initial denoising sample diagram, the first initial semantic sample diagram, the second initial denoising sample diagram and the second initial semantic sample diagram to obtain the first image fusion model and the second image fusion model comprises the following steps:
Inputting the first initial denoising sample graph and the first initial semantic sample graph into a first initial image fusion model to obtain first fusion sample characteristics of a first noise image sample;
inputting the second initial denoising sample graph and the second initial semantic sample graph into the first initial image fusion model to obtain second fusion sample features of the second noise image sample;
inputting the first fusion sample characteristics and the second fusion sample characteristics into a second initial image fusion model to obtain a first prediction denoising image and a second prediction denoising image;
acquiring a third loss function between the first predicted denoising image and the first denoising image sample, and acquiring a fourth loss function between the second predicted denoising image and the second denoising image sample;
and adjusting the first initial image fusion model and the second initial image fusion model based on a comprehensive loss function combining the third loss function and the fourth loss function, to obtain the first image fusion model and the second image fusion model (a joint-training sketch follows).
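A sketch of this joint training, where the comprehensive loss is assumed to be the sum of the third and fourth losses (both taken as L1 here; the patent does not fix their form):

```python
import torch
import torch.nn.functional as F

def train_fusion_models(fusion1, fusion2, loader, epochs=10, lr=1e-4):
    params = list(fusion1.parameters()) + list(fusion2.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for den1, sem1, den2, sem2, clean1, clean2 in loader:
            f1 = fusion1(den1, sem1)           # first fusion sample feature
            f2 = fusion1(den2, sem2)           # second fusion sample feature
            pred1, pred2 = fusion2(f1, f2)     # first/second predicted denoising images
            loss = (F.l1_loss(pred1, clean1)   # third loss function (assumed L1)
                    + F.l1_loss(pred2, clean2))  # fourth loss function (assumed L1)
            opt.zero_grad()
            loss.backward()
            opt.step()
```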
In one aspect, an embodiment of the present application provides an image denoising apparatus, including:
the image acquisition module is used for acquiring binocular images;
the image processing module is used for carrying out denoising processing and semantic segmentation on the monocular images in the binocular images to generate an initial denoising feature map and a semantic information feature map of each monocular image;
The image fusion module is used for fusing the initial denoising feature image and the semantic information feature image of each monocular image to obtain fusion features of each monocular image;
and the image denoising module is used for denoising each monocular image based on the fusion features to obtain a denoising image of each monocular image.
Wherein the binocular image comprises a monocular image K_i, where i is a positive integer less than or equal to the number of monocular images included in the binocular image;
the image processing module comprises:
a first feature extraction unit, configured to input the monocular image K_i into an image encoder and extract monocular image features of the monocular image K_i based on the image encoder;
an image denoising unit, configured to input the monocular image features of the monocular image K_i into an image processing model, start a denoising processing mode in the image processing model, and denoise the monocular image features of the monocular image K_i based on the denoising processing mode to generate the initial denoising feature map of the monocular image K_i;
and a semantic extraction unit, configured to switch the image processing model from the denoising processing mode to a semantic extraction mode, and semantically segment the monocular image features of the monocular image K_i based on the semantic extraction mode to generate the semantic information feature map of the monocular image K_i.
Wherein the binocular image comprises a monocular image K_i, where i is a positive integer less than or equal to the number of monocular images included in the binocular image;
the image fusion module comprises:
a second feature extraction unit, configured to extract, based on the first image fusion model, denoised image features of the initial denoising feature map of the monocular image K_i and semantic image features of the semantic information feature map of the monocular image K_i;
and a feature processing unit, configured to perform feature stitching on the denoised image features and the semantic image features of the monocular image K_i to obtain a monocular stitching feature of the monocular image K_i, and to iterate on the monocular stitching feature of the monocular image K_i based on a convolution layer of the first image fusion model to generate the fusion feature of the monocular image K_i.
The binocular image comprises a first monocular image and a second monocular image, and the fusion characteristics of the monocular images comprise a first fusion characteristic of the first monocular image and a second fusion characteristic of the second monocular image;
the image denoising module is specifically used for:
and carrying out feature transfer on the second fusion feature to the first fusion feature, generating a first denoising image of the first monocular image according to the transferred first fusion feature, carrying out feature transfer on the first fusion feature to the second fusion feature, and generating a second denoising image of the second monocular image according to the transferred second fusion feature.
The image denoising module comprises:
the feature splicing unit is used for carrying out feature splicing on the first fusion feature and the second fusion feature to obtain binocular splicing features;
the feature transfer unit is used for inputting the binocular stitching feature into the second image fusion model, performing feature transfer to the first fusion feature in the binocular stitching feature using the second fusion feature based on the convolution layer of the second image fusion model, performing feature transfer to the second fusion feature in the binocular stitching feature using the first fusion feature, and generating a third fusion feature;
the image generation unit is used for dividing the third fusion feature into a first color channel feature and a second color channel feature based on the second image fusion model, generating a first denoising image of the first monocular image according to the first color channel feature, and generating a second denoising image of the second monocular image according to the second color channel feature; the first color channel feature is a transferred first fusion feature and the second color channel feature is a transferred second fusion feature.
Wherein, this characteristic transfer unit includes:
the association determination subunit is used for acquiring the image visual difference between the first monocular image and the second monocular image and determining the position association relation between the first monocular image and the second monocular image according to the image visual difference; the position association relation is used for representing the association between pixel points in the first monocular image and pixel points in the second monocular image that correspond to the same geographic position;
the relation establishing subunit is used for establishing, based on the position association relation and the convolution layer of the second image fusion model, a feature element association relation between the first fusion feature and the second fusion feature in the binocular stitching feature;
and the feature generation subunit is used for performing, according to the feature element association relation, feature transfer to the first fusion feature in the binocular stitching feature using the second fusion feature and feature transfer to the second fusion feature in the binocular stitching feature using the first fusion feature, to generate the third fusion feature.
Wherein the image generation unit includes:
the feature division subunit is used for dividing the third fusion feature into a first color channel feature and a second color channel feature based on the second image fusion model;
the feature superposition subunit is used for carrying out feature superposition on at least two monochromatic channel features in the first color channel features to generate a first denoising image of the first monocular image;
the feature superposition subunit is further configured to perform feature superposition on at least two monochromatic channel features in the second color channel features, and generate a second denoising image of the second monocular image.
Wherein the apparatus further comprises:
the first sample acquisition module is used for acquiring noise image samples and denoising image samples of the noise image samples; the noise image sample is an image obtained by adding noise information into the denoising image sample;
The first model training module is used for inputting the noise image sample into an initial image processing model in the denoising processing mode, obtaining a predicted denoising image output by the initial image processing model, and adjusting the initial image processing model in the denoising processing mode based on a first loss function between the predicted denoising image and the denoising image sample, to obtain an image processing model capable of executing the denoising processing mode.
Wherein the apparatus further comprises:
the second sample acquisition module is used for acquiring noise image samples and semantic image samples of the noise image samples; the semantic image sample is an image obtained by carrying out semantic segmentation on a denoising image sample corresponding to the noise image sample;
the second model training module is used for inputting the noise image sample into the initial image processing model in the semantic extraction mode, obtaining a predicted semantic image output by the initial image processing model, and adjusting the initial image processing model in the semantic extraction mode based on a second loss function between the predicted semantic image and the semantic image sample, to obtain an image processing model capable of executing the semantic extraction mode.
Wherein the apparatus further comprises:
The third sample acquisition module is used for acquiring a first denoising image sample and a second denoising image sample in a binocular image sample, adding noise information to the first denoising image sample to generate a first noise image sample, and adding noise information to the second denoising image sample to generate a second noise image sample;
the sample conversion module is used for acquiring a first initial denoising sample image and a first initial semantic sample image of a first noise image sample and acquiring a second initial denoising sample image and a second initial semantic sample image of a second noise image sample;
the third model training module is used for training the first initial image fusion model and the second initial image fusion model based on the first initial denoising sample diagram, the first initial semantic sample diagram, the second initial denoising sample diagram and the second initial semantic sample diagram to obtain the first image fusion model and the second image fusion model; the output of the first initial image fusion model is the input of the second initial image fusion model.
Wherein the third model training module comprises:
the first feature acquisition unit is used for inputting the first initial denoising sample image and the first initial semantic sample image into a first initial image fusion model to obtain first fusion sample features of a first noise image sample;
The second feature acquisition unit is used for inputting the second initial denoising sample graph and the second initial semantic sample graph into the first initial image fusion model to obtain second fusion sample features of the second noise image sample;
the model prediction unit is used for inputting the first fusion sample characteristics and the second fusion sample characteristics into a second initial image fusion model to obtain a first prediction denoising image and a second prediction denoising image;
the loss acquisition unit is used for acquiring a third loss function between the first prediction denoising image and the first denoising image sample and acquiring a fourth loss function between the second prediction denoising image and the second denoising image sample;
the model adjustment unit is used for adjusting the first initial image fusion model and the second initial image fusion model based on the comprehensive loss function corresponding to the third loss function and the fourth loss function to obtain the first image fusion model and the second image fusion model.
In one aspect, a computer device is provided, including a processor, a memory, and an input/output interface;
the processor is respectively connected with the memory and the input/output interface, wherein the input/output interface is used for receiving data and outputting data, the memory is used for storing a computer program, and the processor is used for calling the computer program to execute the image denoising method in one aspect of the embodiment of the application.
An aspect of the present application provides a computer-readable storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, perform the image denoising method in an aspect of the present application.
In one aspect, the present application provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in the various alternatives in an aspect of the embodiments of the present application.
Implementation of the embodiment of the application has the following beneficial effects:
the embodiment of the application obtains a binocular image; denoises and semantically segments the monocular images in the binocular image to generate an initial denoising feature map and a semantic information feature map of each monocular image; fuses the initial denoising feature map and the semantic information feature map of each monocular image to obtain a fusion feature of each monocular image; and denoises each monocular image based on the fusion features to obtain a denoising image of each monocular image. Through the above process, after each monocular image is preliminarily denoised, its semantic information is extracted, and the preliminary denoising result (namely the initial denoising feature map) is fused with the semantic information (namely the semantic information feature map), which reduces the information loss produced when the monocular images in the binocular image are denoised. Further, feature fusion can be performed between the two images of the binocular image, that is, each monocular image is denoised again based on the information in both images, which further reduces the information loss during denoising of the binocular image, perfects the denoising results, and improves the denoising effect on binocular images.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a network architecture diagram for image denoising according to an embodiment of the present application;
fig. 2 is a schematic diagram of a simple scene of denoising a binocular image according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a binocular image architecture according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for denoising an image according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a network structure for feature delivery between converged features according to an embodiment of the present application;
FIG. 6 is a flowchart of a denoising method for binocular images according to an embodiment of the present application;
FIG. 7 is a diagram of a network structure for preliminary denoising and semantic extraction of images according to an embodiment of the present application;
FIG. 8a is a diagram of a fusion feature generation network architecture provided in an embodiment of the present application;
FIG. 8b is a schematic illustration of a feature iteration provided by an embodiment of the present application;
fig. 9 is a schematic view of a denoising image generation scene provided in an embodiment of the present application;
fig. 10 is a schematic diagram of an image denoising apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In the embodiment of the application, the denoising processing can be performed on the binocular image based on the technologies of computer vision, deep learning and the like in the artificial intelligence field, so that the accuracy of a denoising result in a scene where the binocular image is denoised is improved.
Among these, artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence research, namely the study of the design principles and implementation methods of various intelligent machines, aims to give machines the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive subject covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning. The present application mainly relates to computer vision (e.g., identifying the images included in a binocular image) and machine learning/deep learning (e.g., performing feature extraction or feature fusion on the first monocular image and the second monocular image included in a binocular image).
Computer vision (CV) technology is a science that studies how to make a machine "see"; more specifically, it uses cameras and computers in place of human eyes to perform machine vision tasks such as recognizing, tracking, and measuring targets, and further performs graphics processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. For example, a computer device replaces the human eye to acquire and process a binocular image. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems capable of acquiring information from images or multidimensional data; in this application, it mainly underpins an artificial intelligence system capable of denoising binocular images. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Deep learning (DL) is a new research direction in the field of machine learning (ML). Deep learning learns the inherent regularities and representation hierarchies of sample data, and the information obtained during such learning helps interpret data such as text, images, and sounds. It is a complex machine learning approach whose results in speech and image recognition far surpass earlier related technologies; deep learning generally involves techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
As artificial intelligence technology is researched and advances, it has been studied and applied in a variety of fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart medicine, and smart customer service. It is believed that with the advance of technology, artificial intelligence will be applied in still more fields and show increasingly important value, for example in the field of image denoising in the present application.
The scheme provided by the embodiment of the application relates to the technologies of computer vision, deep learning and the like in the field of artificial intelligence, and is specifically described by the following embodiments:
Specifically, referring to fig. 1, fig. 1 is a network architecture diagram for image denoising provided in an embodiment of the present application, where the embodiment of the present application may be implemented by a computer device, where the computer device may be composed of a server and a terminal device; the computer device may also be a server or a terminal device, without limitation. The embodiment of the application can be applied to any device capable of generating or acquiring binocular images, such as a mobile phone, a digital camera or a video camera.
It is understood that the computer device or user device mentioned in the embodiments of the present application includes, but is not limited to, a terminal device or a server. In other words, the computer device or the user device may be a server or a terminal device, or may be a system formed by the server and the terminal device. The above-mentioned terminal device may be an electronic device, including but not limited to a mobile phone, a tablet computer, a desktop computer, a notebook computer, a palm computer, an augmented Reality/Virtual Reality (AR/VR) device, a head mounted display, a wearable device, a smart speaker, a digital camera, a camera, and other mobile internet devices (mobile internet device, MID) with network access capability, etc. The servers mentioned above may be independent physical servers, or may be server clusters or distributed systems formed by a plurality of physical servers, or may be cloud servers that provide cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
In this embodiment of the present application, as shown in fig. 1, a network connection may exist between a computer device 101 and a terminal device, where the computer device 101 may obtain a binocular image sent by the terminal device, perform denoising processing on the binocular image, obtain a denoising image corresponding to the binocular image, and send the denoising image to the terminal device corresponding to the binocular image. The terminal devices may include a terminal device 102a, a terminal device 102b, and a terminal device 102c. Optionally, the process of denoising the binocular image implemented in the present application may be integrated in a terminal device, and at this time, after the terminal device generates or obtains the binocular image, the process of denoising the binocular image implemented in the embodiment of the present application may be invoked to denoise the binocular image. The binocular image mentioned in the embodiment of the present application includes a first monocular image and a second monocular image, where the first monocular image and the second monocular image are images with noise information. The first monocular image may be an image corresponding to left vision, and the second monocular image may be an image corresponding to right vision.
Alternatively, the process of denoising the binocular image implemented by the embodiment of the present application may be referred to as binocular image processing artificial intelligence. Taking the terminal device 102a as an example, assuming that the terminal device 102a is a mobile phone, in one possible application scenario using the embodiment of the present application, the terminal device 102a takes a photo (i.e. a binocular image) through an application program of a camera, noise information exists in the photo due to environmental reasons, the terminal device 102a sends the photo to the computer device 101, the computer device 101 obtains a first monocular image and a second monocular image in the photo, and denoising processing is performed on the first monocular image and the second monocular image based on the binocular image processing artificial intelligence implemented in the embodiment of the present application, so as to obtain a first denoised image of the first monocular image and a second denoised image of the second monocular image. Or, the terminal device 102a takes a photo (i.e. a binocular image) through the application program of the camera, and due to the environmental reasons, the photo has noise information, the terminal device 102a invokes the binocular image processing artificial intelligence integrated in the camera, and performs denoising processing on the photo, so as to obtain a first denoised image and a second denoised image. The binocular image processing artificial intelligence may be integrated into a terminal device, or may be deployed in a background server (i.e., the computer device 101), which is not limited herein.
Further, referring to fig. 2, fig. 2 is a schematic diagram of a simple scene of denoising a binocular image according to an embodiment of the present application. As shown in fig. 2, a first monocular image 201 and a second monocular image 202 of the binocular image are acquired; the first monocular image 201 and the second monocular image 202 depict the same shooting scene, and an image visual difference exists between them. The same shooting scene refers to scenes at the same geographic position. Preliminary denoising and semantic segmentation are performed on the first monocular image 201 to generate an initial denoising feature map and a semantic information feature map of the first monocular image 201, and the two are fused to obtain the fusion feature of the first monocular image 201, recorded as the first fusion feature; preliminary denoising and semantic segmentation are likewise performed on the second monocular image 202 to generate an initial denoising feature map and a semantic information feature map of the second monocular image 202, and the two are fused to obtain the fusion feature of the second monocular image 202, recorded as the second fusion feature. By fusing the preliminary denoising result (i.e., the initial denoising feature map) of the first or second monocular image with its semantic information (i.e., the semantic information feature map), part of the image information lost in the preliminary denoising result can be repaired. Each monocular image is then denoised again based on the first fusion feature and the second fusion feature: the second fusion feature is transferred to the first fusion feature and the first fusion feature to the second fusion feature, so that a first denoising image 2011 of the first monocular image 201 is generated according to the transferred first fusion feature, and a second denoising image 2021 of the second monocular image 202 according to the transferred second fusion feature. By transferring features between the first fusion feature and the second fusion feature, restoring the second monocular image with features of the first monocular image and the first monocular image with features of the second monocular image, the image information loss produced when denoising the binocular image can be further reduced, improving the accuracy of the denoising result and the binocular image denoising effect.
Referring to fig. 3, fig. 3 is a schematic diagram of a binocular image architecture according to an embodiment of the present application. As shown in fig. 3, taking an object 301 as a reference, assume that the shooting scene where the object 301 is located is shot by a digital camera having a shooting angle corresponding to left vision, referred to as the left camera 302, and a shooting angle corresponding to right vision, referred to as the right camera 303. The distance between the object 301 and the digital camera is L, and the distance between the left camera 302 and the right camera 303 is d. A left viewing angle α exists between the left camera 302 and the object 301, namely the angle between the side formed by the left camera 302 and the object 301 and the side formed by the left camera 302 and the right camera 303; a right viewing angle β exists between the right camera 303 and the object 301, namely the angle between the side formed by the right camera 303 and the object 301 and the side formed by the left camera 302 and the right camera 303. When the shooting scene of the object 301 is shot by the digital camera, the scene is acquired by the left camera 302 to obtain a first monocular image 3021, in which the first image position information of the object 301 is (x1, y1), and by the right camera 303 to obtain a second monocular image 3031, in which the second image position information of the object 301 is (x2, y2). It can be seen that, for the same shooting scene, the first monocular image and the second monocular image display the same object at the same geographic position captured under different vision, and the pixel position of the object in the first monocular image differs from its pixel position in the second monocular image. The first monocular image 3021 and the second monocular image 3031 form a binocular image; the standard stereo relation sketched below makes this offset concrete.
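For intuition, under a rectified pinhole-camera assumption with focal length f (an illustration only; the patent states no camera model), similar triangles relate the horizontal offset of the object's two image positions to its depth L and the baseline d:

```latex
% Rectified stereo under an assumed pinhole model with focal length f:
% the object at depth L projects to horizontal coordinates x_1 (left
% camera) and x_2 (right camera), and similar triangles give
\[
    x_1 - x_2 \;=\; \frac{f\,d}{L},
\]
% so a wider baseline d or a closer object (smaller L) yields a larger
% pixel offset between the first and second monocular images.
```

This offset is the image visual difference (disparity) that the second image fusion model later uses to associate pixels across the two monocular images.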
Further, referring to fig. 4, fig. 4 is a flowchart of a method for denoising an image according to an embodiment of the present application. As shown in fig. 4, the image denoising process includes the steps of:
step S401, acquiring a binocular image.
In an embodiment of the present application, a computer device acquires a binocular image, wherein the binocular image includes a monocular image K_i, where i is a positive integer less than or equal to the number of monocular images included in the binocular image; for example, if the binocular image includes 2 monocular images, then i is 1 or 2, the monocular image K_1 is recorded as the first monocular image, and the monocular image K_2 as the second monocular image. After acquiring the binocular image, the computer device obtains the first monocular image and the second monocular image in the binocular image, both of which are images with noise information. The noise information may be produced by environmental causes, and the environmental information corresponding to those causes is referred to as noise environment information, which may include one or at least two environment types, such as the weather environment or the surrounding environment (such as light). Optionally, when the weather is rainy, foggy, or snowy, noise information may exist in the captured binocular image; that is, rainy, foggy, and snowy days all belong to noise environment information. For example, the noise information corresponding to a rainy day is rainwater, that corresponding to a foggy day is fog, and that corresponding to a snowy day is snowflakes.
Optionally, after the computer device obtains the binocular image, shooting environment information of the binocular image may be obtained; when the shooting environment information belongs to noise environment information, the first monocular image and the second monocular image in the binocular image are obtained, and when it does not, this step is not triggered and the binocular image can be stored directly. Alternatively, whether noise information exists in the binocular image can be detected through a model: the binocular image is input into a noise detection model for recognition to obtain a noise label with a noise probability value and a noise-free label with a noise-free probability value. If the noise probability value is greater than the noise-free probability value, noise information exists in the binocular image, and the first monocular image and the second monocular image in the binocular image are obtained; if the noise probability value is smaller than the noise-free probability value, it is determined that no noise information exists in the binocular image, and this step is not triggered. If the two probability values are equal, whether this step is triggered can be configured as required: if all binocular images with noise information should be denoised as far as possible, it can be determined that noise information exists when the values are equal; if mistakenly denoising clean binocular images should be avoided, it can be determined that no noise information exists when the values are equal. Alternatively, whether noise information exists in the binocular image may be detected based on a noise threshold, which is not limited here (a sketch of this gating logic follows).
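A sketch of this gating decision; the tie-breaking policy argument is an assumption introduced to capture the two configurations described above:

```python
def should_denoise(noise_prob, clean_prob, denoise_on_tie=True):
    # Gate for step S401: run the denoising pipeline only when the noise
    # detection model favors the noise label.
    if noise_prob > clean_prob:
        return True
    if noise_prob < clean_prob:
        return False
    return denoise_on_tie  # equal probabilities: configurable policy
```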
Step S402, denoising and semantic segmentation are carried out on the monocular images in the binocular image, and an initial denoising feature map and a semantic information feature map of each monocular image are generated.
In the embodiment of the application, the monocular image K_i is denoised to obtain an initial denoising feature map of the monocular image K_i, and the monocular image K_i is semantically segmented to obtain a semantic information feature map of the monocular image K_i, where i is 1 or 2: the monocular image K_1 is the first monocular image and the monocular image K_2 is the second monocular image. The initial denoising feature map of the first monocular image can be recorded as a first initial denoising feature map, which represents the first initial denoising result obtained after the first monocular image is preliminarily denoised; the semantic information feature map of the first monocular image is recorded as a first semantic information feature map, which characterizes the first semantic information of the first monocular image. Likewise, the initial denoising feature map of the second monocular image can be recorded as a second initial denoising feature map, which represents the second initial denoising result obtained after the second monocular image is preliminarily denoised; the semantic information feature map of the second monocular image is recorded as a second semantic information feature map, which characterizes the second semantic information of the second monocular image.
Specifically, the monocular image K_i is input into an image encoder, and monocular image features of the monocular image K_i are extracted based on the image encoder. The monocular image features of the monocular image K_i are input into an image processing model, a denoising processing mode is started in the image processing model, and the monocular image features of the monocular image K_i are denoised based on the denoising processing mode to generate the initial denoising feature map of the monocular image K_i. The image processing model is then switched from the denoising processing mode to a semantic extraction mode, and the monocular image features of the monocular image K_i are semantically segmented based on the semantic extraction mode to generate the semantic information feature map of the monocular image K_i. Through this process, the first initial denoising feature map and the first semantic information feature map of the first monocular image (i.e., monocular image K_1) are obtained, and the second initial denoising feature map and the second semantic information feature map of the second monocular image (i.e., monocular image K_2) are obtained.
Step S403, fusing the initial denoising feature map and the semantic information feature map of each monocular image to obtain fusion features of each monocular image.
In the embodiment of the application, based on the first image fusion model, the denoised image features of the initial denoising feature map of the monocular image K_i are extracted, and the semantic image features of the semantic information feature map of the monocular image K_i are extracted. Feature stitching is performed on the denoised image features and the semantic image features of the monocular image K_i to obtain the monocular stitching feature of the monocular image K_i, and the monocular stitching feature of the monocular image K_i is iterated on based on a convolution layer of the first image fusion model to generate the fusion feature of the monocular image K_i. The monocular stitching feature of the first monocular image (i.e., monocular image K_1) is recorded as the first monocular stitching feature, and the fusion feature of the first monocular image as the first fusion feature; the monocular stitching feature of the second monocular image (i.e., monocular image K_2) is recorded as the second monocular stitching feature, and the fusion feature of the second monocular image as the second fusion feature. By fusing the initial denoising feature map and the semantic information feature map of each monocular image, the semantic information of the monocular image K_i can supplement the information of the monocular image K_i, reducing the information loss produced during its preliminary denoising and restoring as much of its image information as possible, which in turn improves the image denoising effect on the binocular image to a certain extent.
And step S404, denoising each monocular image based on the fusion characteristics to obtain a denoising image of each monocular image.
In the embodiment of the application, feature transfer is performed multiple times between the first fusion feature of the first monocular image and the second fusion feature of the second monocular image: the second fusion feature transfers features to the first fusion feature, and the first fusion feature transfers features to the second fusion feature, each round of feature transfer being a mutual transfer between the two. This yields a transferred first fusion feature and a transferred second fusion feature; a first denoising image of the first monocular image is generated according to the transferred first fusion feature, and a second denoising image of the second monocular image is generated according to the transferred second fusion feature. For example, referring to fig. 5, fig. 5 is a schematic diagram of a network structure for feature transfer between fusion features according to an embodiment of the present application. As shown in fig. 5, assume there are n feature transfers between the first fusion feature and the second fusion feature, where n is a positive integer; n may be an empirical value or may be set by a worker, which is not limited herein. Specifically, a first fusion feature 501 and a second fusion feature 502 are obtained; the second fusion feature 502 transfers features to the first fusion feature 501 for the first time to obtain the first-transfer first fusion feature 5011, and the first fusion feature 501 transfers features to the second fusion feature 502 for the first time to obtain the first-transfer second fusion feature 5021; …; the (n-1)th-transfer second fusion feature 502(n-1) transfers features to the (n-1)th-transfer first fusion feature 501(n-1) to obtain the nth-transfer first fusion feature 501n, i.e. the transferred first fusion feature, and the (n-1)th-transfer first fusion feature 501(n-1) transfers features to the (n-1)th-transfer second fusion feature 502(n-1) to obtain the nth-transfer second fusion feature 502n, i.e. the transferred second fusion feature. The first denoising image of the first monocular image may then be generated according to the transferred first fusion feature 501n, and the second denoising image of the second monocular image according to the transferred second fusion feature 502n.
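As an illustration only, the following PyTorch-style sketch shows one way such n rounds of mutual feature transfer could be organized; the module name TransferBlock, the channel counts, and the concatenate-and-convolve transfer rule are assumptions for illustration, not details taken from the application.

```python
import torch
import torch.nn as nn

class TransferBlock(nn.Module):
    # Hypothetical transfer step: the receiver feature absorbs the
    # sender feature via concatenation followed by a 3x3 convolution.
    def __init__(self, channels: int):
        super().__init__()
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, receiver: torch.Tensor, sender: torch.Tensor) -> torch.Tensor:
        return self.mix(torch.cat([receiver, sender], dim=1))

def mutual_transfer(f1, f2, blocks_1, blocks_2):
    # n rounds; within each round both transfers read the features of
    # the previous round, so the transfer is mutual, as in fig. 5.
    for b1, b2 in zip(blocks_1, blocks_2):
        f1, f2 = b1(f1, f2), b2(f2, f1)
    return f1, f2

n, c = 3, 64
blocks_1 = nn.ModuleList(TransferBlock(c) for _ in range(n))
blocks_2 = nn.ModuleList(TransferBlock(c) for _ in range(n))
f1 = torch.randn(1, c, 32, 32)   # first fusion feature (501)
f2 = torch.randn(1, c, 32, 32)   # second fusion feature (502)
f1_n, f2_n = mutual_transfer(f1, f2, blocks_1, blocks_2)  # 501n, 502n
```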
Optionally, the triggering modes for acquiring the first monocular image and the second monocular image of the binocular image and denoising them include, but are not limited to, the following cases:
1. The image denoising configuration information comprises an image denoising start state, original image retention information, and the like, where the image denoising start state comprises an operating state and an idle state. After the computer device acquires the binocular image, it acquires its image denoising configuration information; if the image denoising start state in the configuration information is the operating state, steps S401 to S404 are triggered to denoise the binocular image. Optionally, after the computer device obtains the binocular image, steps S401 to S404 are triggered to denoise the binocular image only when noise information exists in the binocular image and the image denoising start state in the image denoising configuration information is the operating state.
The order of execution between detecting whether noise information exists in the binocular image and detecting the image denoising start state in the image denoising configuration information is not limited: the two processes may run in parallel, noise detection may be performed first and then the start state checked, or the start state checked first and then noise detection performed. With this pre-configuration, denoising does not need to be triggered manually each time a binocular image is acquired, which improves processing efficiency; and detecting the binocular image first to determine whether denoising is actually needed reduces invalid workload and saves resources to a certain extent.
Alternatively, the original image retention information may be retention processing information or deletion processing information. After the computer equipment performs denoising processing on the binocular image by executing the steps S401 to S404, acquiring original image retention information in the image denoising configuration information, and if the original image retention information is retention processing information, storing a first monocular image and a second monocular image in the binocular image by the computer equipment, wherein the first monocular image, the second monocular image, the first denoising image and the second denoising image exist in the computer equipment; if the original image retention information is deletion processing information, deleting the first monocular image and the second monocular image in the binocular image by the computer equipment, wherein at the moment, a first denoising image and a second denoising image exist in the computer equipment.
Optionally, when the image denoising configuration information is not set by the user, an image denoising start state, original image retention information and the like in the image denoising configuration information may be set as default values, for example, the image denoising start state is set as default state, the original image retention information is set as default processing information, where the default state may be any one of an operation state and an idle state, and the default processing information may be any one of retention processing information and deletion processing information, where the default value may be set by a developer.
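A minimal sketch of this configuration-driven triggering logic; all field and function names here are illustrative assumptions, not identifiers from the application.

```python
from dataclasses import dataclass

@dataclass
class DenoiseConfig:
    # Image denoising configuration information: start state
    # ("operating"/"idle") and original-image retention policy.
    start_state: str = "operating"   # default state (assumption)
    retention: str = "retain"        # "retain" or "delete" (assumption)

def run_denoising_pipeline(binocular_image):
    # Placeholder for steps S401-S404.
    return binocular_image

def maybe_denoise(binocular_image, config: DenoiseConfig, has_noise: bool):
    # Trigger steps S401-S404 only when noise exists in the binocular
    # image and the denoising start state is the operating state.
    if not (has_noise and config.start_state == "operating"):
        return binocular_image, None
    denoised_pair = run_denoising_pipeline(binocular_image)
    # When retention is "delete", only the denoised pair is kept.
    original = binocular_image if config.retention == "retain" else None
    return original, denoised_pair
```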
2. The computer device obtains a binocular image and detects whether noise information exists in it. If noise information exists, a denoising prompt message is displayed; when the computer device obtains a confirmation operation for the denoising prompt message, it responds to the confirmation operation and executes steps S401 to S404 for the binocular image. When the computer device obtains a cancel operation for the denoising prompt message, the process ends and the binocular image is not processed. By letting the user confirm whether to trigger the steps in fig. 4 for a specific binocular image, the triggering accuracy of denoising can be improved, reducing the occurrence of false denoising. For example, if the user deliberately shoots a scene in the rain, this mode avoids erroneously denoising the captured rain-scene image.
3. After the binocular image is acquired, the computer device may directly save it. When the computer device acquires a denoising request for a certain binocular image, it retrieves the binocular image requested by the denoising request and performs steps S401 to S404 on it to denoise the binocular image.
The above is a few alternative triggering manners for each step in fig. 4, and other manners that may trigger performing each step in fig. 4 may also be applied in the present application, which is not limited herein.
Optionally, after the first denoising image and the second denoising image corresponding to the binocular image are obtained, a first monocular name of the first monocular image and a second monocular name of the second monocular image may be obtained; the first denoising name of the first denoising image is determined according to the first monocular name, and the second denoising name of the second denoising image according to the second monocular name. For example, when the first and second monocular images in the binocular image are deleted after the denoising processing, the first monocular name may be used directly as the first denoising name and the second monocular name as the second denoising name. Optionally, the first monocular name may be adjusted to generate the first denoising name, and the second monocular name adjusted to generate the second denoising name; for example, if the first monocular name is "Taishan day map", the first denoising name may be "Taishan day denoising map", the adjustment manner not being limited herein. Optionally, the first denoising name and the second denoising name may be generated according to the image name generating mode of the computer device, and the application does not exclude other name generating modes.
In the embodiment of the application, a binocular image is acquired; denoising processing and semantic segmentation are performed on the monocular images in the binocular image to generate an initial denoising feature map and a semantic information feature map of each monocular image; the initial denoising feature map and the semantic information feature map of each monocular image are fused to generate fusion features of each monocular image; and each monocular image is denoised based on the fusion features to obtain a denoising image of each monocular image. This process improves denoising in two respects. First, the semantic information of each monocular image (i.e., the semantic information feature map) is fused into the preliminary denoising result of that monocular image (i.e., the initial denoising feature map). Second, the two images of the binocular image (i.e., the first monocular image and the second monocular image) mutually complement each other's image information. As a result, after denoising a binocular image containing noise information, the obtained denoising images retain more image information; that is, the information loss caused by denoising the binocular image is reduced, the denoising result of the binocular image is perfected, and the denoising effect of the binocular image is improved.
Further, referring to fig. 6, fig. 6 is a flowchart of a denoising method for binocular images according to an embodiment of the present application. As shown in fig. 6, the process of denoising the binocular image includes the following steps:
step S601, a first monocular image and a second monocular image in the binocular image are acquired.
In the embodiment of the application, the computer device acquires a binocular image; if noise information exists in the binocular image, the first monocular image and the second monocular image in the binocular image are acquired. The first monocular image and the second monocular image are two images obtained by shooting the same scene from different viewpoints, that is, an image visual difference exists between them. Details can be seen in the detailed description of step S401 in fig. 4 and are not repeated here.
Step S602, denoising processing and semantic segmentation are carried out on the first monocular image.
In the embodiment of the application, denoising is performed on a first monocular image to obtain a first initial denoising feature map of the first monocular image, semantic segmentation is performed on the first monocular image to obtain a first semantic information feature map of the first monocular image. Specifically, inputting a first monocular image into an image encoder, and extracting first monocular image features of the first monocular image based on the image encoder; inputting the first monocular image characteristics into an image processing model, starting a denoising processing mode in the image processing model, denoising the first monocular image characteristics based on the denoising processing mode, and generating a first initial denoising characteristic diagram of the first monocular image; switching the image processing model from a denoising processing mode to a semantic extraction mode, and performing semantic segmentation on the first monocular image features based on the semantic extraction mode to generate a first semantic information feature map of the first monocular image.
Specifically, referring to fig. 7, fig. 7 is a network structure diagram for image preliminary denoising and semantic extraction according to an embodiment of the present application. As shown in fig. 7, the computer device inputs a first monocular image 701 into an image encoder 702 to extract a first monocular image feature of the first monocular image 701, the image encoder 702 may be a convolutional neural network (Convolutional Neural Networks, CNN) or a network of at least two CNNs, the image encoder 702 may be configured to extract a feature of the image, and the monocular image feature may be a feature map. For example, the image encoder 702 may be a model comprising 7 layers of CNN, with a step size of 1 using a 3*3 convolution kernel.
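For illustration, a minimal sketch of such an encoder E, assuming 7 convolutional layers with 3*3 kernels and stride 1 as in the example above; the channel width and the activation function are assumptions.

```python
import torch
import torch.nn as nn

def make_image_encoder(in_ch: int = 3, width: int = 64, layers: int = 7) -> nn.Sequential:
    # 7 convolutional layers, 3x3 kernels, stride 1 (padding 1 keeps
    # the spatial size), as in the example encoder E described above.
    mods = []
    ch = in_ch
    for _ in range(layers):
        mods += [nn.Conv2d(ch, width, kernel_size=3, stride=1, padding=1),
                 nn.ReLU(inplace=True)]
        ch = width
    return nn.Sequential(*mods)

encoder = make_image_encoder()
feat = encoder(torch.randn(1, 3, 128, 128))   # monocular image feature map
```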
The computer device inputs the first monocular image feature into the image processing model 703, obtains a first execution condition 7031, starts a denoising processing mode in the image processing model 703 based on the first execution condition 7031, performs denoising processing on the first monocular image feature based on the denoising processing mode, obtains a first preliminary denoising result of the first monocular image 701 (i.e., 704 in fig. 7 is the first preliminary denoising result), and converts the first preliminary denoising result into a first initial denoising feature map 7051. The computer device acquires the second execution condition 7032, switches the image processing model 703 from the denoising processing mode to the semantic extraction mode based on the second execution condition 7032, performs semantic segmentation on the first monocular image feature based on the semantic extraction mode, obtains the first semantic information of the first monocular image 701 (i.e., 704 in fig. 7 is the first semantic information), and converts the first semantic information into the first semantic information feature map 7052. Alternatively, the execution condition may be acquired based on a condition start bit, specifically, if the first execution condition 7031 is set to 0, the second execution condition 7032 is set to 1, and there is a condition start bit in the image processing model 703, and the condition start bit is configured to be a value 0 of the first execution condition 7031, so as to start the denoising processing mode of the image processing model 703; the condition initiation bit is configured to a value of 1 for the second execution condition 7032 to initiate the semantic extraction mode of the image processing model 703. Alternatively, the image processing model 703 may be directly triggered based on the first execution condition or the second execution condition without updating the image processing model 703.
The first execution condition 7031 and the second execution condition 7032 may be regarded as a switch of the image processing model 703: when the switch points to the first execution condition 7031, the denoising processing mode in the image processing model 703 is started, and when the switch points to the second execution condition 7032, the semantic extraction mode is started. The order of execution between the generation of the first initial denoising feature map and the generation of the first semantic information feature map is not limited. The example above generates the first initial denoising feature map first and then the first semantic information feature map; when the first semantic information feature map is generated first, the image processing model 703 is switched from the semantic extraction mode to the denoising processing mode after the first semantic information feature map is generated.
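A sketch of the condition start bit mechanism, assuming for illustration that the condition selects between two output heads on a shared trunk; all names, layer shapes, and the class count are hypothetical.

```python
import torch
import torch.nn as nn

T_DE, T_SEG = 0, 1   # first / second execution conditions (condition start bit)

class ImageProcessingModel(nn.Module):
    # Hypothetical two-mode model G: a shared trunk with a denoising
    # head and a semantic head, selected by the condition start bit.
    def __init__(self, ch: int = 64, num_classes: int = 21):
        super().__init__()
        self.trunk = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.denoise_head = nn.Conv2d(ch, ch, 3, padding=1)
        self.semantic_head = nn.Conv2d(ch, num_classes, 3, padding=1)

    def forward(self, feat: torch.Tensor, condition: int) -> torch.Tensor:
        h = self.trunk(feat)
        # condition start bit = 0 -> denoising mode, 1 -> semantic mode
        return self.denoise_head(h) if condition == T_DE else self.semantic_head(h)

g = ImageProcessingModel()
feat = torch.randn(1, 64, 128, 128)
init_denoise_map = g(feat, T_DE)    # initial denoising feature map
semantic_map = g(feat, T_SEG)       # semantic information feature map
```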
Step S603, performing feature fusion on the first preliminary denoising result and the first semantic information of the first monocular image, and generating a first fusion feature of the first monocular image.
In the embodiment of the application, feature fusion is performed on the first initial denoising feature map and the first semantic information feature map to generate the first fusion feature. Specifically, denoising image features of the first initial denoising feature map are extracted based on the first image fusion model and recorded as first denoising image features, and semantic image features of the first semantic information feature map are extracted and recorded as first semantic image features; feature stitching is performed on the first denoising image features and the first semantic image features to obtain the monocular stitching feature of the first monocular image, recorded as the first monocular stitching feature, and the first monocular stitching feature is iterated by the convolution layer of the first image fusion model to generate the first fusion feature. Feature stitching may directly concatenate the first denoising image features with the first semantic image features; for example, if the first denoising image features are (0,1,4,0,5) and the first semantic image features are (1,3,0,1,2), feature stitching yields the first monocular stitching feature (0,1,4,0,5,1,3,0,1,2). Iterating the first monocular stitching feature by the convolution layer of the first image fusion model amounts to convolving the first monocular stitching feature to obtain a first convolution feature, convolving the first convolution feature to obtain a second convolution feature, and so on until the first fusion feature is obtained.
For example, referring to fig. 8a, fig. 8a is a network architecture diagram for generating fusion features according to an embodiment of the present application. As shown in fig. 8a, the first initial denoising feature map 7051 and the first semantic information feature map 7052 obtained in fig. 7 are input into a first image fusion model 802; first denoising image features 8011 of the first initial denoising feature map 7051 and first semantic image features 8012 of the first semantic information feature map 7052 are extracted based on the first image fusion model 802, and feature stitching is performed on the first denoising image features 8011 and the first semantic image features 8012 to obtain the first monocular stitching feature 8021. The first monocular stitching feature 8021 is iterated by the convolution layer of the first image fusion model 802 to generate the first fusion feature 803. In fig. 8a, the joining symbol is used to represent feature stitching (concatenation).
For example, referring to fig. 8b, fig. 8b is a schematic illustration of feature iteration provided in an embodiment of the present application. As shown in fig. 8b, taking one feature point as an example, feature expansion is performed on the feature point 8041 to generate a feature map 8042; feature expansion is further performed on the feature map 8042 to generate a feature map 8043; feature expansion on the feature map 8043 generates a feature map 8044; feature expansion on the feature map 8044 generates a feature map 8045, and so on. Specifically, the convolution layer of the first image fusion model convolves the first monocular stitching feature 8021: feature expansion is performed on the first monocular stitching feature 8021 to generate a feature map 8022, feature expansion is performed on the feature map 8022 to generate a feature map 8023, …, until the first fusion feature 803 is generated. Feature expansion is mainly implemented based on feature prediction, feature combination, and the like on the corresponding feature map, and is not described in detail here.
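The concatenate-then-iterate structure of the first image fusion model might be sketched as follows; the branch layers, the number of iterations, and the channel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FirstImageFusionModel(nn.Module):
    # Sketch: extract features from the initial denoising map and the
    # semantic map, concatenate them (feature stitching), then iterate
    # convolution layers to produce the fusion feature.
    def __init__(self, ch: int = 64, iterations: int = 4):
        super().__init__()
        self.denoise_branch = nn.Conv2d(ch, ch, 3, padding=1)
        self.semantic_branch = nn.Conv2d(ch, ch, 3, padding=1)
        self.iterate = nn.ModuleList(
            nn.Conv2d(2 * ch if i == 0 else ch, ch, 3, padding=1)
            for i in range(iterations))

    def forward(self, init_denoise_map, semantic_map):
        d = self.denoise_branch(init_denoise_map)   # denoising image features
        s = self.semantic_branch(semantic_map)      # semantic image features
        x = torch.cat([d, s], dim=1)                # monocular stitching feature
        for conv in self.iterate:                   # convolve repeatedly
            x = torch.relu(conv(x))
        return x                                    # fusion feature

fuse = FirstImageFusionModel()
fusion_feature = fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```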
Step S604, denoising and semantic segmentation are performed on the second monocular image.
In the embodiment of the application, denoising is performed on the second monocular image to obtain a second initial denoising feature map of the second monocular image, and semantic segmentation is performed on the second monocular image to obtain a second semantic information feature map of the second monocular image. Specifically, inputting the second monocular image into an image encoder, and extracting second monocular image features of the second monocular image based on the image encoder; inputting the second monocular image characteristics into an image processing model, starting a denoising processing mode (started based on the first execution condition) in the image processing model, denoising the second monocular image characteristics based on the denoising processing mode, and generating a second initial denoising characteristic diagram of the second monocular image; and switching the image processing model from a denoising processing mode to a semantic extraction mode (the semantic extraction mode is started based on a second execution condition), and performing semantic segmentation on the second monocular image features based on the semantic extraction mode to generate a second semantic information feature map of the second monocular image. Specifically, the generating process of the second initial denoising feature map may refer to the generating process of the first initial denoising feature map in step S602, and the generating process of the second semantic information feature map may refer to the generating process of the first semantic information feature map in step S602, which is not described herein.
Step S605, feature fusion is carried out on the second preliminary denoising result of the second monocular image and the second semantic information, and second fusion features of the second monocular image are generated.
In the embodiment of the application, denoising image features of the second initial denoising feature map are extracted based on the first image fusion model and recorded as second denoising image features, and semantic image features of the second semantic information feature map are extracted and recorded as second semantic image features; feature stitching is performed on the second denoising image features and the second semantic image features to obtain the monocular stitching feature of the second monocular image, recorded as the second monocular stitching feature, and the second monocular stitching feature is iterated by the convolution layer of the first image fusion model to generate the second fusion feature of the second monocular image. Feature stitching may directly concatenate the second denoising image features with the second semantic image features. Iterating the second monocular stitching feature by the convolution layer of the first image fusion model amounts to convolving the second monocular stitching feature to obtain its first convolution feature, convolving that first convolution feature to obtain its second convolution feature, and so on until the second fusion feature is obtained.
Further, the generating process of the second fusion feature may refer to the generating process of the first fusion feature in step S603, which is not described herein.
Step S606, according to the first fusion feature and the second fusion feature, a first denoising image of the first monocular image and a second denoising image of the second monocular image are generated.
In this embodiment of the present application, feature stitching is performed on the first fusion feature and the second fusion feature to obtain a binocular stitching feature; optionally, the first fusion feature may be directly concatenated with the second fusion feature. The binocular stitching feature is input into a second image fusion model; based on the convolution layer of the second image fusion model, the second fusion feature transfers features to the first fusion feature in the binocular stitching feature, and the first fusion feature transfers features to the second fusion feature in the binocular stitching feature, generating a third fusion feature. The generation process of the third fusion feature may refer to the detailed description of step S404 in fig. 4, and the generating network structure of the third fusion feature may refer to fig. 5; they are not described here again.
Further, dividing the third fusion feature into a first color channel feature and a second color channel feature based on the second image fusion model, generating a first denoising image of the first monocular image according to the first color channel feature, and generating a second denoising image of the second monocular image according to the second color channel feature; the first color channel features are transferred first fusion features, and the second color channel features are transferred second fusion features.
Specifically, the process of generating the third fusion feature based on feature delivery is specifically as follows:
An image visual difference between the first monocular image and the second monocular image is acquired, and a positional association relationship between the first monocular image and the second monocular image is determined according to the image visual difference; the positional association relationship is used to associate the pixel points in the first monocular image with the pixel points in the second monocular image that represent the same geographic position. For example, as shown in fig. 3, the image visual difference between the first monocular image 3021 and the second monocular image 3031 may be determined based on the distance d between the left camera 302 and the right camera 303 of the digital camera, the distance L between the object 301 and the digital camera, the left viewing angle α, the right viewing angle β, and the like, and the positional association relationship between the first monocular image 3021 and the second monocular image 3031, such as the association between the first image position information (x1, y1) and the second image position information (x2, y2), may be determined according to the image visual difference. In other words, when an association exists between a first pixel point in the first monocular image and a second pixel point in the second monocular image, the first pixel point and the second pixel point indicate the same geographic position. Further, based on the positional association relationship and the convolution layer of the second image fusion model, a feature element association relationship between the first fusion feature and the second fusion feature in the binocular stitching feature is established. According to the feature element association relationship, the second fusion feature transfers features to the first fusion feature in the binocular stitching feature, and the first fusion feature transfers features to the second fusion feature in the binocular stitching feature, generating the third fusion feature; optionally, multiple rounds of feature transfer may be carried out between the first fusion feature and the second fusion feature in the binocular stitching feature. For example, suppose the binocular stitching feature is (0,1,4,0,5,4,1,5,1,2), where (0,1,4,0,5) is the first fusion feature and (4,1,5,1,2) is the second fusion feature, and feature element associations exist between the third bit of the first fusion feature and the first bit of the second fusion feature, between the fourth bit of the first fusion feature and the second bit of the second fusion feature, and between the fifth bit of the first fusion feature and the third bit of the second fusion feature; after feature transfer, the third fusion feature (0,1,4,0.5,5,4,0.5,5,1,2) is generated.
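The numerical example above can be reproduced with a few lines; merging associated feature elements by averaging is an assumption that is consistent with the numbers shown (0 and 1 become 0.5, while equal elements are unchanged).

```python
# Toy reproduction of the example above: associated elements of the
# two fusion features are merged by averaging (an assumption).
first = [0, 1, 4, 0, 5]
second = [4, 1, 5, 1, 2]
# (index in first, index in second) pairs derived from the image
# visual difference: bit3<->bit1, bit4<->bit2, bit5<->bit3 (1-based).
associations = [(2, 0), (3, 1), (4, 2)]

transferred_first = list(first)
transferred_second = list(second)
for i, j in associations:
    avg = (first[i] + second[j]) / 2
    transferred_first[i] = avg    # second feature transfers to first
    transferred_second[j] = avg   # first feature transfers to second

third_fusion = transferred_first + transferred_second
print(third_fusion)   # [0, 1, 4.0, 0.5, 5.0, 4.0, 0.5, 5.0, 1, 2]
```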
When the first denoising image and the second denoising image are generated according to the third fusion characteristic, the method can be realized through the following steps:
The third fusion feature is divided into a first color channel feature and a second color channel feature based on the second image fusion model. Feature superposition is performed on the at least two monochromatic channel features in the first color channel feature to generate the first denoising image of the first monocular image; feature superposition is performed on the at least two monochromatic channel features in the second color channel feature to generate the second denoising image of the second monocular image. For example, take a third fusion feature representing three-channel color images, the three channels being the red channel (R channel), the green channel (G channel), and the blue channel (B channel). The third fusion feature is then a six-channel feature, i.e. it includes six monochromatic channel features: a first red channel feature, a first green channel feature, a first blue channel feature, a second red channel feature, a second green channel feature, and a second blue channel feature. The first red, first green, and first blue channel features are superposed to generate the first denoising image of the first monocular image; the second red, second green, and second blue channel features are superposed to generate the second denoising image of the second monocular image.
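A sketch of the channel split and superposition, assuming the third fusion feature is a 6-channel tensor whose first three channels belong to the first view and last three to the second view; reading "feature superposition" as stacking the three monochromatic channel features into one three-channel image is also an assumption.

```python
import torch

third_fusion = torch.rand(1, 6, 128, 128)   # third fusion feature (assumed shape)

first_color = third_fusion[:, 0:3]    # first red/green/blue channel features
second_color = third_fusion[:, 3:6]   # second red/green/blue channel features

# Superpose the monochromatic channel features into RGB images.
first_denoised = torch.stack([first_color[:, 0], first_color[:, 1],
                              first_color[:, 2]], dim=1)    # first denoising image
second_denoised = torch.stack([second_color[:, 0], second_color[:, 1],
                               second_color[:, 2]], dim=1)  # second denoising image
```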
For example, referring to fig. 9, fig. 9 is a schematic view of a denoising image generation scene according to an embodiment of the present application. As shown in fig. 9, a first fusion feature 803 is generated from the first initial denoising feature map 7051 and the first semantic information feature map 7052, and a second fusion feature 902 is generated from the second initial denoising feature map 9011 and the second semantic information feature map 9012. Feature stitching is performed on the first fusion feature 803 and the second fusion feature 902 to generate a binocular stitching feature, the binocular stitching feature is input into the second image fusion model 903, feature transfer is performed between the first fusion feature 803 and the second fusion feature 902 based on a convolution layer of the second image fusion model 903, a third fusion feature is obtained, a first denoising image 9041 of the first monocular image is generated according to a first color channel feature in the third fusion feature, and a second denoising image 9042 of the second monocular image is generated according to a second color channel feature in the third fusion feature.
In this application, the execution sequence of the processing procedure of the first monocular image (i.e., step S602 and step S603) and the processing procedure of the second monocular image (i.e., step S604 and step S605) is not limited, in other words, step S604 and step S605 may be executed first to process the second monocular image, and then step S602 and step S603 may be executed to process the first monocular image, where the execution sequence of the two steps does not affect the implementation of the scheme in this application.
The embodiment of the application can be applied to any device capable of acquiring binocular images. For example, the binocular image processing artificial intelligence realized by the embodiment of the application may be integrated in an unmanned vehicle. When the vehicle runs on a road, it can obtain the current weather environment; if the current weather belongs to a noisy weather environment, then when the vehicle collects surrounding environment information, it acquires binocular images from its cameras, denoises them based on the binocular image processing artificial intelligence to obtain the denoising image of each monocular image, and displays the current road condition based on these denoising images. The vehicle can then plan its driving route based on the denoising images, which can improve the route planning accuracy of unmanned vehicles in noisy weather environments such as rainy days. Alternatively, the binocular image processing artificial intelligence may be integrated in a navigation system or the like; after the navigation system obtains the denoising images, it directly displays the current road condition based on them, so that the driver of the vehicle can see clear road conditions, improving safety and the like.
In combination with the embodiment shown in fig. 4, a binocular image is acquired; denoising processing and semantic segmentation are performed on the monocular images in the binocular image to generate the initial denoising feature map and semantic information feature map of each monocular image, and the initial denoising feature map and the semantic information feature map of each monocular image are fused to obtain the fusion features of each monocular image; each monocular image is then denoised based on the fusion features to obtain its denoising image. This process improves denoising in two respects: first, the semantic information of each monocular image (such as the first and second monocular images) is fused into the preliminary denoising result of that image; second, the two images of the binocular image mutually complement each other's information. Hence, after denoising a binocular image containing noise information, the obtained denoising images retain more image information; that is, the information loss caused by denoising is reduced, the denoising result is perfected, and the denoising effect is improved. Moreover, a deep learning network (i.e., each model) is introduced at each stage of the application, which makes it convenient to encapsulate each part of the functionality; the deep learning networks can continuously learn and adjust to optimize their own effect, so the denoising effect on binocular images can be further improved.
The model architecture used in the present application mainly includes two parts. One part performs preliminary denoising and semantic segmentation on the first monocular image and the second monocular image, and includes an image encoder and an image processing model, where the image encoder is denoted as E, the image processing model as G, and the execution condition of the image processing model G as T; the first execution condition may be denoted as T_de and the second execution condition as T_seg. The other part performs feature fusion on the results output by the image processing model and performs feature fusion between the first fusion feature of the first monocular image and the second fusion feature of the second monocular image; it includes the first image fusion model and the second image fusion model. The image processing model and the initial image processing model are both denoted by G mainly because the image processing model is obtained by adjusting the initial image processing model and is not a completely different model.
The training process of the image processing model in the denoising processing mode is as follows:
A noise image sample and a denoising image sample of the noise image sample are obtained; the noise image sample is an image obtained by adding noise information to the denoising image sample, the denoising image sample thus being the ideal image obtained by denoising the noise image sample. The noise image sample is input into the initial image processing model in the denoising processing mode, the predicted denoising image output by the initial image processing model is obtained, and the initial image processing model in the denoising processing mode is adjusted based on a first loss function between the predicted denoising image and the denoising image sample to obtain an image processing model capable of executing the denoising processing mode. The noise image sample may be input into the image encoder E, and the output of the image encoder E input into the initial image processing model in the denoising processing mode, to train the initial image processing model. Alternatively, an initial image encoder may be acquired, and the initial image encoder and the initial image processing model in the denoising processing mode trained simultaneously based on the noise image sample and the denoising image sample, to obtain the image encoder E and an image processing model G capable of executing the denoising processing mode. The first loss function can be represented by formula (1):
L_1 = ||I_c - σ_de(G(E(I_noise), T_de))||_2        (1)
wherein I_c is the denoising image sample, I_noise is the noise image sample, and T_de is the first execution condition representing the denoising processing mode; E(I_noise) is the output obtained after the noise image sample is input into the image encoder E; G(E(I_noise), T_de) is the result of inputting the output of the image encoder E into the initial image processing model in the denoising processing mode; and σ_de represents a pixel-based reconstruction function, a normalization (softmax) function, or the like.
The output result of the image processing model may be denoted as P, so that P = G(E(I), T), where I is a noise image sample and T is the execution condition of the image processing model; this formula uniformly describes the data flow of the image processing model in both modes.
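Reusing the hypothetical encoder and two-mode model sketched above, a single training step under formula (1) might be organized as follows; the Adam optimizer, the learning rate, and realizing σ_de as a small convolutional reconstruction head are all assumptions.

```python
import torch
import torch.nn as nn

E = make_image_encoder()                    # image encoder E (sketched earlier)
G = ImageProcessingModel()                  # two-mode model G (sketched earlier)
sigma_de = nn.Conv2d(64, 3, 3, padding=1)   # pixel reconstruction head (assumption)
opt = torch.optim.Adam(
    list(E.parameters()) + list(G.parameters()) + list(sigma_de.parameters()),
    lr=1e-4)

def train_denoise_step(I_noise: torch.Tensor, I_c: torch.Tensor) -> float:
    # L_1 = || I_c - sigma_de(G(E(I_noise), T_de)) ||_2, formula (1).
    P = sigma_de(G(E(I_noise), T_DE))
    loss = torch.norm(I_c - P, p=2)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

loss = train_denoise_step(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```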
The training process of the image processing model in the semantic extraction mode is as follows:
A noise image sample and a semantic image sample of the noise image sample are obtained; the semantic image sample is an image obtained by semantically segmenting the denoising image sample corresponding to the noise image sample, and may also be obtained by manual labeling. The noise image sample is input into the initial image processing model in the semantic extraction mode, the predicted semantic image output by the initial image processing model is obtained, and the initial image processing model in the semantic extraction mode is adjusted based on a second loss function between the predicted semantic image and the semantic image sample to obtain an image processing model capable of executing the semantic extraction mode. The second loss function can be represented by formula (2):
L_2 = ||Î_seg - σ_h(G(E(I_noise), T_seg))||_2        (2)
wherein Î_seg is the semantic image sample, I_seg = σ_h(G(E(I_noise), T_seg)) represents the predicted semantic image, and σ_h may be a softmax function or the like.
The training process for the first image fusion model and the second image fusion model is as follows:
A first denoising image sample and a second denoising image sample in a binocular image sample are acquired; noise information is added to the first denoising image sample to generate a first noise image sample of the first denoising image sample, and noise information is added to the second denoising image sample to generate a second noise image sample of the second denoising image sample. A first initial denoising sample map and a first initial semantic sample map of the first noise image sample are acquired, and a second initial denoising sample map and a second initial semantic sample map of the second noise image sample are acquired. The first initial image fusion model and the second initial image fusion model are trained based on the first initial denoising sample map, the first initial semantic sample map, the second initial denoising sample map, and the second initial semantic sample map to obtain the first image fusion model and the second image fusion model; the output of the first initial image fusion model is the input of the second initial image fusion model.
Further, training the first initial image fusion model and the second initial image fusion model based on the first initial denoising sample map, the first initial semantic sample map, the second initial denoising sample map and the second initial semantic sample map to obtain the first image fusion model and the second image fusion model, wherein the process specifically comprises the following steps:
The first initial denoising sample map and the first initial semantic sample map are input into the first initial image fusion model to obtain the first fusion sample feature of the first noise image sample; the second initial denoising sample map and the second initial semantic sample map are input into the first initial image fusion model to obtain the second fusion sample feature of the second noise image sample. The first fusion sample feature and the second fusion sample feature are input into the second initial image fusion model to obtain a first predicted denoising image and a second predicted denoising image. A third loss function between the first predicted denoising image and the first denoising image sample is obtained, and a fourth loss function between the second predicted denoising image and the second denoising image sample. The first initial image fusion model and the second initial image fusion model are adjusted based on the comprehensive loss function corresponding to the third loss function and the fourth loss function to obtain the first image fusion model and the second image fusion model.
Wherein the third loss function can be represented by formula (3):
L_3 = ||Î_c1 - I_c1||_2        (3)
wherein Î_c1 represents the first predicted denoising image and I_c1 represents the first denoising image sample.
Wherein the fourth loss function can be represented by formula (4):
L_4 = ||Î_c2 - I_c2||_2        (4)
wherein Î_c2 represents the second predicted denoising image and I_c2 represents the second denoising image sample.
Wherein, according to the third loss function and the fourth loss function, a comprehensive loss function is obtained, and the comprehensive loss function can be represented by a formula (5):
L = λ_1·L_3 + λ_2·L_4        (5)
wherein λ_1 is a first weight representing the importance of the first monocular image in the binocular image, and λ_2 is a second weight representing the importance of the second monocular image in the binocular image. The comprehensive loss function may be further represented by formula (6):
L = λ_1·||Î_c1 - I_c1||_2 + λ_2·||Î_c2 - I_c2||_2        (6)
Optionally, when the first monocular image and the second monocular image are equally important in the binocular image, the first weight λ_1 and the second weight λ_2 may be the same. Optionally, the relative sizes of the first weight λ_1 and the second weight λ_2 may be adjusted as needed.
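Formulas (5) and (6) translate directly into a few lines; the equal default weights follow the equal-importance case mentioned above.

```python
import torch

def combined_loss(pred1, target1, pred2, target2, lam1=0.5, lam2=0.5):
    # L = lam1 * L_3 + lam2 * L_4, with L_3 and L_4 the L2 norms
    # between each predicted denoising image and its denoising sample.
    L3 = torch.norm(pred1 - target1, p=2)
    L4 = torch.norm(pred2 - target2, p=2)
    return lam1 * L3 + lam2 * L4
```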
Further, referring to fig. 10, fig. 10 is a schematic diagram of an image denoising apparatus according to an embodiment of the present application. The image denoising apparatus may be a computer program (including program code) running in a computer device; for example, the image denoising apparatus is application software. The apparatus can be used to execute the corresponding steps in the method provided by the embodiments of the application. As shown in fig. 10, the image denoising apparatus 1000 may be used in the computer device in the embodiment corresponding to fig. 4, and specifically the apparatus may include: an image acquisition module 11, an image processing module 12, an image fusion module 13, and an image denoising module 14.
An image acquisition module 11 for acquiring a binocular image;
the image processing module 12 is used for carrying out denoising processing and semantic segmentation on the monocular images in the binocular images to generate an initial denoising feature map and a semantic information feature map of each monocular image;
the image fusion module 13 is used for fusing the initial denoising feature image and the semantic information feature image of each monocular image to obtain fusion features of each monocular image;
the image denoising module 14 is configured to denoise each monocular image based on the fusion feature, so as to obtain a denoised image of each monocular image.
Wherein the binocular image comprises monocular images K_i, where i is a positive integer and i is less than or equal to the number of monocular images included in the binocular image;
the image processing module 12 includes:
a first feature extraction unit 121, configured to input the monocular image K_i into an image encoder and extract monocular image features of the monocular image K_i based on the image encoder;
an image denoising unit 122, configured to input the monocular image features of the monocular image K_i into an image processing model, start a denoising processing mode in the image processing model, and denoise the monocular image features of the monocular image K_i based on the denoising processing mode to generate the initial denoising feature map of the monocular image K_i;
a semantic extraction unit 123, configured to switch the image processing model from the denoising processing mode to a semantic extraction mode, and perform semantic segmentation on the monocular image features of the monocular image K_i based on the semantic extraction mode to generate the semantic information feature map of the monocular image K_i.
Wherein the binocular image comprises monocular images K_i, where i is a positive integer and i is less than or equal to the number of monocular images included in the binocular image;
the image fusion module 13 includes:
a second feature extraction unit 131, configured to extract, based on the first image fusion model, denoising image features of the initial denoising feature map of the monocular image K_i and semantic image features of the semantic information feature map of the monocular image K_i;
a feature processing unit 132, configured to perform feature stitching on the denoising image features and the semantic image features of the monocular image K_i to obtain the monocular stitching feature of the monocular image K_i, and iterate the monocular stitching feature of the monocular image K_i by the convolution layer of the first image fusion model to generate the fusion feature of the monocular image K_i.
The binocular image comprises a first monocular image and a second monocular image, and the fusion characteristics of the monocular images comprise a first fusion characteristic of the first monocular image and a second fusion characteristic of the second monocular image;
The image denoising module 14 is specifically configured to:
and carrying out feature transfer on the second fusion feature to the first fusion feature, generating a first denoising image of the first monocular image according to the transferred first fusion feature, carrying out feature transfer on the first fusion feature to the second fusion feature, and generating a second denoising image of the second monocular image according to the transferred second fusion feature.
The image denoising module 14 includes:
the feature stitching unit 141 is configured to stitch the first fusion feature and the second fusion feature to obtain a binocular stitching feature;
the feature transfer unit 142 is configured to input the binocular stitching feature into a second image fusion model, perform feature transfer to a first fusion feature of the binocular stitching features by using the second fusion feature, and perform feature transfer to a second fusion feature of the binocular stitching features by using the first fusion feature based on a convolution layer of the second image fusion model, so as to generate a third fusion feature;
an image generating unit 143, configured to divide the third fusion feature into a first color channel feature and a second color channel feature based on the second image fusion model, generate a first denoising image of the first monocular image according to the first color channel feature, and generate a second denoising image of the second monocular image according to the second color channel feature; the first color channel feature is a transferred first fusion feature and the second color channel feature is a transferred second fusion feature.
Wherein the feature transfer unit 142 includes:
an association determination subunit 1421, configured to obtain the image visual difference between the first monocular image and the second monocular image and determine the positional association relationship between the first monocular image and the second monocular image according to the image visual difference; the positional association relationship is used to associate the pixel points in the first monocular image with the pixel points in the second monocular image that represent the same geographic position;
a relationship establishing subunit 1422, configured to establish a feature element association relationship between the first fusion feature and the second fusion feature in the binocular stitching feature based on the position association relationship and the convolution layer of the second image fusion model;
the feature generation subunit 1423 is configured to perform feature transfer to a first fusion feature in the binocular stitching features by using the second fusion feature according to the association relationship of feature elements, and perform feature transfer to a second fusion feature in the binocular stitching features by using the first fusion feature, so as to generate a third fusion feature.
Wherein the image generating unit 143 includes:
a feature partitioning sub-unit 1431 for partitioning the third fusion feature into a first color channel feature and a second color channel feature based on the second image fusion model;
A feature stacking subunit 1432, configured to perform feature stacking on at least two monochromatic channel features in the first color channel features, and generate a first denoising image of the first monocular image;
the feature stacking subunit 1432 is further configured to perform feature stacking on at least two monochromatic channel features in the second color channel features to generate a second denoised image of the second monocular image.
Wherein the apparatus 1000 further comprises:
a first sample acquiring module 15, configured to acquire a noise image sample and a denoised image sample of the noise image sample; the noise image sample is an image obtained by adding noise information into the denoising image sample;
the first model training module 16 is configured to input the noise image sample into an initial image processing model in a denoising processing mode, obtain a predicted denoising image output by the initial image processing model, and adjust the initial image processing model in the denoising processing mode based on a first loss function between the predicted denoising image and the denoising image sample, so as to obtain an image processing model in an executable denoising processing mode.
Wherein the apparatus 1000 further comprises:
a second sample acquiring module 17, configured to acquire a noise image sample and a semantic image sample of the noise image sample; the semantic image sample is an image obtained by carrying out semantic segmentation on a denoising image sample corresponding to the noise image sample;
The second model training module 18 is configured to input the noise image sample into an initial image processing model in a semantic extraction mode, obtain a predicted semantic image output by the initial image processing model, and adjust the initial image processing model in the semantic extraction mode based on a second loss function between the predicted semantic image and the semantic image sample, so as to obtain an image processing model in an executable semantic extraction mode.
Wherein the apparatus 1000 further comprises:
a third sample acquiring module 19, configured to acquire a first denoised image sample and a second denoised image sample in the binocular image samples, add noise information to the first denoised image sample, generate a first noise image sample of the first denoised image sample, add noise information to the second denoised image sample, and generate a second noise image sample of the second denoised image sample;
the sample conversion module 20 is configured to obtain a first initial denoising sample map and a first initial semantic sample map of a first noise image sample, and obtain a second initial denoising sample map and a second initial semantic sample map of a second noise image sample;
the third model training module 21 is configured to train the first initial image fusion model and the second initial image fusion model based on the first initial denoising sample map, the first initial semantic sample map, the second initial denoising sample map, and the second initial semantic sample map, so as to obtain the first image fusion model and the second image fusion model; the output of the first initial image fusion model is the input of the second initial image fusion model.
Wherein the third model training module 21 comprises:
a first feature obtaining unit 211, configured to input a first initial denoising sample map and a first initial semantic sample map into a first initial image fusion model, so as to obtain first fusion sample features of a first noise image sample;
a second feature obtaining unit 212, configured to input the second initial denoising sample map and the second initial semantic sample map into the first initial image fusion model to obtain the second fusion sample feature of the second noise image sample;
the model prediction unit 213 is configured to input the first fused sample feature and the second fused sample feature into a second initial image fusion model, so as to obtain a first predicted denoising image and a second predicted denoising image;
a loss obtaining unit 214, configured to obtain a third loss function between the first predicted denoising image and the first denoising image sample, and obtain a fourth loss function between the second predicted denoising image and the second denoising image sample;
the model adjustment unit 215 is configured to adjust the first initial image fusion model and the second initial image fusion model based on the integrated loss function corresponding to the third loss function and the fourth loss function, so as to obtain the first image fusion model and the second image fusion model.
Wherein the monocular images K_i are the images composing the binocular image: the monocular image K_1 in the binocular image is the first monocular image, and the monocular image K_2 in the binocular image is the second monocular image.
The embodiment of the application provides an image denoising device, which acquires a binocular image; performs denoising and semantic segmentation on each monocular image in the binocular image to generate an initial denoising feature map and a semantic information feature map of each monocular image; fuses the initial denoising feature map and the semantic information feature map of each monocular image to obtain fusion features of each monocular image; and denoises each monocular image based on the fusion features to obtain a denoising image of each monocular image. Through the above process, after each monocular image is preliminarily denoised, its semantic information is extracted and fused with the preliminary denoising result, which reduces the information loss when denoising the monocular images in the binocular image. Further, because the two monocular images in the binocular image partially coincide in image content, feature fusion can be carried out between them; that is, the preliminary denoising result of each image is supplemented with information from the other image, which further reduces information loss, perfects the denoising result, and improves the denoising effect on the binocular image. Furthermore, a deep learning network (that is, each model) is introduced into each stage of the application, which conveniently encapsulates each function; because the network can continuously learn and adjust, the denoising effect on the binocular image can be further improved.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 11, the computer device in the embodiment of the present application may include: one or more processors 1101, memory 1102, and an input-output interface 1103. The processor 1101, memory 1102, and input-output interface 1103 are connected by a bus 1104. The memory 1102 is used for storing a computer program, which includes program instructions, and the input/output interface 1103 is used for receiving data and outputting data, such as for data interaction between models; the processor 1101 is configured to execute program instructions stored in the memory 1102, and perform the following operations:
obtaining a binocular image;
denoising and semantically segmenting the monocular images in the binocular image to generate an initial denoising feature map and a semantic information feature map of each monocular image;
fusing the initial denoising feature map and the semantic information feature map of each monocular image to obtain fusion features of each monocular image;
and denoising each monocular image based on the fusion characteristics respectively to obtain a denoising image of each monocular image.
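Read as an inference pipeline, the four operations above could be sketched as follows; all module interfaces (`encoder`, `processor`, `fusion1`, `fusion2`, and the `mode` switch) are hypothetical stand-ins, not the patent's actual components:

```python
import torch

@torch.no_grad()
def denoise_binocular(encoder, processor, fusion1, fusion2, left, right):
    """left, right: the two monocular images of a binocular pair, (1, 3, H, W)."""
    per_eye = []
    for eye in (left, right):
        feats = encoder(eye)                  # monocular image features
        processor.mode = "denoise"
        init_denoise = processor(feats)       # initial denoising feature map
        processor.mode = "semantic"
        semantic = processor(feats)           # semantic information feature map
        per_eye.append(fusion1(init_denoise, semantic))  # per-eye fusion feature
    # Second-stage fusion lets each eye borrow information from the other.
    return fusion2(per_eye[0], per_eye[1])    # (left, right) denoising images
```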
In some possible embodiments, the processor 1101 may be a central processing unit (CPU), or another general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 1102 may include read-only memory and random access memory, and provides instructions and data to the processor 1101 and the input-output interface 1103. A portion of the memory 1102 may also include non-volatile random access memory. For example, the memory 1102 may also store device type information.
In a specific implementation, the computer device may execute, through its built-in functional modules, the implementation manners provided by the steps in fig. 4; for details, refer to the implementation manners provided by the steps in fig. 4, which are not described herein again.
Embodiments of the present application provide a computer device, comprising: a processor, an input-output interface, and a memory. The processor obtains the computer instructions in the memory and executes the steps of the method shown in fig. 4 to perform the image denoising operation. The embodiment of the application realizes the acquisition of a binocular image; denoising and semantic segmentation of each monocular image in the binocular image to generate an initial denoising feature map and a semantic information feature map of each monocular image; fusion of the initial denoising feature map and the semantic information feature map of each monocular image to obtain fusion features of each monocular image; and denoising of each monocular image based on the fusion features to obtain a denoising image of each monocular image. Through the above process, after each monocular image is preliminarily denoised, its semantic information is extracted and fused with the preliminary denoising result, which reduces the information loss when denoising the monocular images in the binocular image. Further, because the monocular images included in the binocular image partially coincide in image content, feature fusion can be carried out between the two images; that is, the preliminary denoising results are supplemented with information from both images, which further reduces information loss, perfects the denoising result, and improves the denoising effect on the binocular image. Furthermore, a deep learning network (that is, each model) is introduced into each stage of the application, which conveniently encapsulates each function; because the network can continuously learn and adjust, the denoising effect on the binocular image can be further improved.
The embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program includes program instructions which, when executed by a processor, implement the image denoising method provided by the steps in fig. 4 or fig. 6; for details, refer to the implementation manners provided by those steps, which are not described herein again. In addition, the description of the beneficial effects of the same method is omitted. For technical details not disclosed in the embodiments of the computer-readable storage medium of the present application, refer to the description of the method embodiments of the present application. As an example, the program instructions may be deployed to be executed on one computer device, or on multiple computer devices located at one site or distributed across multiple sites and interconnected by a communication network.
The computer-readable storage medium may be an internal storage unit of the image denoising apparatus or the computer device provided in any of the foregoing embodiments, for example, a hard disk or memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the methods provided in various alternative modes in fig. 4 or fig. 6, thereby implementing denoising processing on the binocular image, reducing information loss when denoising processing is performed on the binocular image, perfecting denoising processing results on the binocular image, and improving denoising processing effects on the binocular image.
The terms "first", "second", and the like in the description, claims, and drawings of the embodiments of the present application are used to distinguish different objects, not to describe a particular order. Furthermore, the term "include" and any variation thereof are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, article, or device that comprises a list of steps or modules is not limited to the listed steps or modules, but may alternatively include other steps or modules not listed or inherent to such process, method, apparatus, article, or device.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in this description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The methods and related devices provided in the embodiments of the present application are described with reference to the method flowcharts and/or structural diagrams provided in the embodiments of the present application. Each flow and/or block of the method flowcharts and/or structural diagrams, and combinations of flows and/or blocks therein, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams.
The foregoing disclosure is only illustrative of the preferred embodiments of the present application and is not intended to limit the scope of the claims; equivalent variations made according to the claims of the present application still fall within the scope of the application.

Claims (12)

1. A method of denoising an image, the method comprising:
obtaining a binocular image; the binocular image comprises a first monocular image and a second monocular image;
denoising and semantically segmenting the monocular images in the binocular image to generate an initial denoising feature map and a semantic information feature map of each monocular image;
fusing the initial denoising feature map and the semantic information feature map of each monocular image to obtain fusion features of each monocular image; the fusion features of the monocular images comprise a first fusion feature of the first monocular image and a second fusion feature of the second monocular image;
performing feature stitching on the first fusion feature and the second fusion feature to obtain binocular stitching features;
inputting the binocular stitching features into a second image fusion model, carrying out feature transfer on the first fusion feature in the binocular stitching features by adopting the second fusion feature based on a convolution layer of the second image fusion model, carrying out feature transfer on the second fusion feature in the binocular stitching features by adopting the first fusion feature, and generating a third fusion feature;
dividing the third fusion feature into a first color channel feature and a second color channel feature based on the second image fusion model; the first color channel feature is a transferred first fusion feature, and the second color channel feature is a transferred second fusion feature;
performing feature superposition on at least two monochromatic channel features in the first color channel features to generate a first denoising image of the first monocular image;
and performing feature superposition on at least two monochromatic channel features in the second color channel features to generate a second denoising image of the second monocular image.
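As a rough, non-authoritative illustration of the stitching, cross-transfer, channel-split, and superposition steps of claim 1: the sketch below assumes 32-channel per-eye fusion features and treats the stacking of the three monochromatic channel maps as the superposition; the class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class BinocularFusion(nn.Module):
    """Hypothetical sketch of the second image fusion model in claim 1."""
    def __init__(self, feat_ch=32):
        super().__init__()
        # One convolution over the stitched features lets each eye's fusion
        # feature transfer information to the other; its 6 output channels
        # form the third fusion feature (3 color channels per eye).
        self.transfer = nn.Conv2d(2 * feat_ch, 6, kernel_size=3, padding=1)

    def forward(self, fuse1, fuse2):
        # Feature stitching: concatenate the two eyes' fusion features.
        stitched = torch.cat([fuse1, fuse2], dim=1)
        third = self.transfer(stitched)             # third fusion feature
        # Divide into the first / second color channel features.
        color1, color2 = third[:, :3], third[:, 3:]
        # Superposing (stacking) the monochromatic channel maps yields each
        # eye's denoising image.
        return color1, color2
```

With per-eye fusion features of shape (N, 32, H, W), `BinocularFusion()(f1, f2)` would return two (N, 3, H, W) denoising images.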
2. The method of claim 1, wherein the binocular image comprises a monocular image Ki, i is a positive integer, and i is less than or equal to the number of monocular images included in the binocular image;
the denoising and semantically segmenting the monocular images in the binocular image to generate an initial denoising feature map and a semantic information feature map of each monocular image comprises:
inputting the monocular image Ki into an image encoder, and extracting monocular image features of the monocular image Ki based on the image encoder;
inputting the monocular image features of the monocular image Ki into an image processing model, starting a denoising processing mode in the image processing model, denoising the monocular image features of the monocular image Ki based on the denoising processing mode, and generating the initial denoising feature map of the monocular image Ki;
switching the image processing model from the denoising processing mode to a semantic extraction mode, performing semantic segmentation on the monocular image features of the monocular image Ki based on the semantic extraction mode, and generating the semantic information feature map of the monocular image Ki.
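One way such a mode-switchable image processing model could be realized is a shared trunk with a denoising head and a semantic head; this sketch is an assumption for illustration only, with arbitrary channel counts and class numbers:

```python
import torch.nn as nn

class DualModeProcessor(nn.Module):
    """Shared trunk with two heads; `mode` selects which feature map is produced."""
    def __init__(self, in_ch=32, num_classes=21):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.denoise_head = nn.Conv2d(64, in_ch, kernel_size=3, padding=1)
        self.semantic_head = nn.Conv2d(64, num_classes, kernel_size=3, padding=1)
        self.mode = "denoise"

    def forward(self, feats):
        h = self.trunk(feats)
        if self.mode == "denoise":
            return self.denoise_head(h)   # initial denoising feature map
        return self.semantic_head(h)      # semantic information feature map
```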
3. The method of claim 1, wherein the binocular image comprises a monocular image Ki, i is a positive integer, and i is less than or equal to the number of monocular images included in the binocular image;
the fusing the initial denoising feature map and the semantic information feature map of each monocular image to obtain the fusion features of each monocular image comprises:
extracting denoised image features of the initial denoising feature map of the monocular image Ki and semantic image features of the semantic information feature map of the monocular image Ki based on a first image fusion model;
performing feature stitching on the denoised image features and the semantic image features of the monocular image Ki to obtain monocular stitching features of the monocular image Ki, and iterating the monocular stitching features of the monocular image Ki based on the convolution layer of the first image fusion model to generate the fusion features of the monocular image Ki.
4. The method of claim 1, wherein the carrying out feature transfer on the first fusion feature in the binocular stitching features by adopting the second fusion feature and carrying out feature transfer on the second fusion feature in the binocular stitching features by adopting the first fusion feature, based on the convolution layer of the second image fusion model, to generate a third fusion feature comprises:
acquiring an image visual difference between the first monocular image and the second monocular image, and determining a position association relationship between the first monocular image and the second monocular image according to the image visual difference; the position association relationship is used for representing an association relationship between a pixel point in the first monocular image and a pixel point in the second monocular image that correspond to the same geographic position;
establishing a feature element association relationship between the first fusion feature and the second fusion feature in the binocular stitching features based on the position association relationship and the convolution layer of the second image fusion model;
and carrying out, according to the feature element association relationship, feature transfer on the first fusion feature in the binocular stitching features by adopting the second fusion feature and feature transfer on the second fusion feature in the binocular stitching features by adopting the first fusion feature, to generate the third fusion feature.
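The disparity-driven transfer of claim 4 might be sketched as a horizontal alignment followed by mixing; the uniform integer disparity and the additive transfer below are gross simplifications chosen only to make the idea concrete:

```python
import torch

def transfer_with_disparity(fuse1, fuse2, disparity=4):
    """Align the two eyes' fusion features using a (hypothetical) uniform
    integer disparity, then let each eye absorb the other's aligned feature."""
    # Position association: pixels at the same geographic position differ by
    # `disparity` columns between the two views.
    aligned2 = torch.roll(fuse2, shifts=disparity, dims=-1)   # right -> left frame
    aligned1 = torch.roll(fuse1, shifts=-disparity, dims=-1)  # left -> right frame
    # Feature transfer along the established element association.
    out1 = fuse1 + aligned2   # first fusion feature, transferred
    out2 = fuse2 + aligned1   # second fusion feature, transferred
    return out1, out2
```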
5. The method of claim 2, wherein the method further comprises:
acquiring a noise image sample and a denoising image sample of the noise image sample; the noise image sample is an image obtained by adding noise information into the denoising image sample;
inputting the noise image sample into an initial image processing model in the denoising processing mode, obtaining a predicted denoising image output by the initial image processing model, and adjusting the initial image processing model in the denoising processing mode based on a first loss function between the predicted denoising image and the denoising image sample to obtain an image processing model capable of executing the denoising processing mode.
6. The method of claim 2, wherein the method further comprises:
acquiring a noise image sample and a semantic image sample of the noise image sample; the semantic image sample is an image obtained by carrying out semantic segmentation on a denoising image sample corresponding to the noise image sample;
inputting the noise image sample into an initial image processing model in the semantic extraction mode, acquiring a predicted semantic image output by the initial image processing model, and adjusting the initial image processing model in the semantic extraction mode based on a second loss function between the predicted semantic image and the semantic image sample to obtain an image processing model capable of executing the semantic extraction mode.
7. The method of claim 1, wherein the method further comprises:
acquiring a first denoising image sample and a second denoising image sample in a binocular image sample, adding noise information to the first denoising image sample to generate a first noise image sample of the first denoising image sample, and adding the noise information to the second denoising image sample to generate a second noise image sample of the second denoising image sample;
acquiring a first initial denoising sample map and a first initial semantic sample map of the first noise image sample, and acquiring a second initial denoising sample map and a second initial semantic sample map of the second noise image sample;
training a first initial image fusion model and a second initial image fusion model based on the first initial denoising sample map, the first initial semantic sample map, the second initial denoising sample map and the second initial semantic sample map to obtain a first image fusion model and a second image fusion model; the output of the first initial image fusion model is the input of the second initial image fusion model.
8. The method of claim 7, wherein the training a first initial image fusion model and a second initial image fusion model based on the first initial denoising sample map, the first initial semantic sample map, the second initial denoising sample map, and the second initial semantic sample map to obtain the first image fusion model and the second image fusion model comprises:
inputting the first initial denoising sample map and the first initial semantic sample map into the first initial image fusion model to obtain first fusion sample features of the first noise image sample;
inputting the second initial denoising sample map and the second initial semantic sample map into the first initial image fusion model to obtain second fusion sample features of the second noise image sample;
inputting the first fusion sample feature and the second fusion sample feature into the second initial image fusion model to obtain a first prediction denoising image and a second prediction denoising image;
acquiring a third loss function between the first predicted denoising image and the first denoising image sample, and acquiring a fourth loss function between the second predicted denoising image and the second denoising image sample;
and adjusting the first initial image fusion model and the second initial image fusion model based on an integrated loss function corresponding to the third loss function and the fourth loss function, to obtain the first image fusion model and the second image fusion model.
9. An image denoising apparatus, comprising:
the image acquisition module is used for acquiring binocular images; the binocular image comprises a first monocular image and a second monocular image;
the image processing module is used for carrying out denoising processing and semantic segmentation on the monocular images in the binocular image to generate an initial denoising feature map and a semantic information feature map of each monocular image;
the image fusion module is used for fusing the initial denoising feature map and the semantic information feature map of each monocular image to obtain fusion features of each monocular image; the fusion features of the monocular images comprise a first fusion feature of the first monocular image and a second fusion feature of the second monocular image;
the image denoising module is used for denoising each monocular image based on the fusion features to obtain a denoising image of each monocular image;
the image denoising module comprises:
the feature splicing unit is used for carrying out feature splicing on the first fusion feature and the second fusion feature to obtain binocular splicing features;
the feature transfer unit is used for inputting the binocular stitching features into a second image fusion model, carrying out feature transfer on the first fusion features in the binocular stitching features by adopting the second fusion features based on a convolution layer of the second image fusion model, carrying out feature transfer on the second fusion features in the binocular stitching features by adopting the first fusion features, and generating a third fusion feature;
the image generation unit is used for dividing the third fusion feature into a first color channel feature and a second color channel feature based on the second image fusion model, generating a first denoising image of the first monocular image according to the first color channel feature, and generating a second denoising image of the second monocular image according to the second color channel feature; the first color channel feature is a transferred first fusion feature, and the second color channel feature is a transferred second fusion feature;
Wherein the image generation unit includes:
a feature classification subunit configured to classify the third fusion feature into a first color channel feature and a second color channel feature based on the second image fusion model;
the feature superposition subunit is used for carrying out feature superposition on at least two monochromatic channel features in the first color channel features to generate a first denoising image of the first monocular image;
the feature superposition subunit is further configured to perform feature superposition on at least two monochromatic channel features in the second color channel features, and generate a second denoising image of the second monocular image.
10. The apparatus of claim 9, wherein the binocular image comprises a monocular image Ki, i is a positive integer, and i is less than or equal to the number of monocular images included in the binocular image;
the image processing module comprises:
a first feature extraction unit, configured to input the monocular image Ki into an image encoder and extract monocular image features of the monocular image Ki based on the image encoder;
an image denoising unit, configured to input the monocular image features of the monocular image Ki into an image processing model, start a denoising processing mode in the image processing model, denoise the monocular image features of the monocular image Ki based on the denoising processing mode, and generate the initial denoising feature map of the monocular image Ki;
a semantic extraction unit, configured to switch the image processing model from the denoising processing mode to a semantic extraction mode, perform semantic segmentation on the monocular image features of the monocular image Ki based on the semantic extraction mode, and generate the semantic information feature map of the monocular image Ki.
11. A computer device, comprising a processor, a memory, and an input-output interface;
the processor is connected to the memory and the input-output interface, respectively, wherein the input-output interface is used for receiving data and outputting data, the memory is used for storing a computer program, and the processor is used for calling the computer program to execute the method as claimed in any one of claims 1-8.
12. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method of any of claims 1-8.
CN202010857685.7A 2020-08-24 2020-08-24 Image denoising method, device, computer and readable storage medium Active CN112037142B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010857685.7A CN112037142B (en) 2020-08-24 2020-08-24 Image denoising method, device, computer and readable storage medium

Publications (2)

Publication Number Publication Date
CN112037142A CN112037142A (en) 2020-12-04
CN112037142B true CN112037142B (en) 2024-02-13

Family

ID=73580748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010857685.7A Active CN112037142B (en) 2020-08-24 2020-08-24 Image denoising method, device, computer and readable storage medium

Country Status (1)

Country Link
CN (1) CN112037142B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112614072B (en) * 2020-12-29 2022-05-17 北京航空航天大学合肥创新研究院 Image restoration method and device, image restoration equipment and storage medium
CN112801888A (en) * 2021-01-06 2021-05-14 杭州海康威视数字技术股份有限公司 Image processing method, image processing device, computer equipment and storage medium
CN112990215B (en) * 2021-03-04 2023-12-12 腾讯科技(深圳)有限公司 Image denoising method, device, equipment and storage medium
CN113229767B (en) * 2021-04-12 2022-08-19 佛山市顺德区美的洗涤电器制造有限公司 Method for processing image, processor, control device and household appliance

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102739922A (en) * 2011-03-22 2012-10-17 索尼公司 Image processing apparatus, image processing method, and program
CN104680496A (en) * 2015-03-17 2015-06-03 山东大学 Kinect deep image remediation method based on colorful image segmentation
WO2018177237A1 (en) * 2017-03-29 2018-10-04 腾讯科技(深圳)有限公司 Image processing method and device, and storage medium
KR102020464B1 (en) * 2018-03-12 2019-09-10 가천대학교 산학협력단 Color-mono Dual Camera Image Fusion Method, System and Computer-readable Medium
CN111340866A (en) * 2020-02-26 2020-06-26 腾讯科技(深圳)有限公司 Depth image generation method, device and storage medium
CN111476741A (en) * 2020-04-28 2020-07-31 北京金山云网络技术有限公司 Image denoising method and device, electronic equipment and computer readable medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120055102A (en) * 2010-11-23 2012-05-31 삼성전자주식회사 Image processing apparatus and image processing method
JP5076002B1 (en) * 2011-04-27 2012-11-21 株式会社東芝 Image processing apparatus and image processing method
FR3018147B1 (en) * 2014-03-03 2016-03-04 Sagem Defense Securite OPTIMIZED VIDEO DEBRISING FOR MULTI-SENSOR HETEROGENEOUS SYSTEM

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Denoising and hole-filling method in binocular stereo reconstruction; Han Kunfang et al.; Journal of Image and Graphics (中国图象图形学报); Vol. 16, No. 6; pp. 1057-1063 *

Also Published As

Publication number Publication date
CN112037142A (en) 2020-12-04

Similar Documents

Publication Publication Date Title
CN112037142B (en) Image denoising method, device, computer and readable storage medium
CN110135249B (en) Human behavior identification method based on time attention mechanism and LSTM (least Square TM)
CN111950424B (en) Video data processing method and device, computer and readable storage medium
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
CN112232293A (en) Image processing model training method, image processing method and related equipment
CN111401216A (en) Image processing method, model training method, image processing device, model training device, computer equipment and storage medium
CN114549369B (en) Data restoration method and device, computer and readable storage medium
CN111553267A (en) Image processing method, image processing model training method and device
CN111768438B (en) Image processing method, device, equipment and computer readable storage medium
CN114170516B (en) Vehicle weight recognition method and device based on roadside perception and electronic equipment
CN111709471B (en) Object detection model training method and object detection method and device
CN110457974B (en) Image superposition method and device, electronic equipment and readable storage medium
CN110781770B (en) Living body detection method, device and equipment based on face recognition
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
CN112954399B (en) Image processing method and device and computer equipment
CN110348463A (en) The method and apparatus of vehicle for identification
CN111652181B (en) Target tracking method and device and electronic equipment
CN115577768A (en) Semi-supervised model training method and device
WO2023279799A1 (en) Object identification method and apparatus, and electronic system
CN114677350A (en) Connection point extraction method and device, computer equipment and storage medium
CN111767839B (en) Vehicle driving track determining method, device, equipment and medium
CN104463962A (en) Three-dimensional scene reconstruction method based on GPS information video
CN112085680B (en) Image processing method and device, electronic equipment and storage medium
CN113704276A (en) Map updating method and device, electronic equipment and computer readable storage medium
CN117252947A (en) Image processing method, image processing apparatus, computer, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code; ref country code: HK; ref legal event code: DE; ref document number: 40035277
SE01 Entry into force of request for substantive examination
GR01 Patent grant