WO2019228456A1 - Image processing method, apparatus and device, and machine-readable storage medium - Google Patents

Image processing method, apparatus and device, and machine-readable storage medium Download PDF

Info

Publication number
WO2019228456A1
WO2019228456A1 (PCT/CN2019/089272)
Authority
WO
WIPO (PCT)
Prior art keywords
image
pixel
data format
neural network
intermediate image
Prior art date
Application number
PCT/CN2019/089272
Other languages
French (fr)
Chinese (zh)
Inventor
姜子伦
肖飞
范蒙
俞海
Original Assignee
杭州海康威视数字技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司 filed Critical 杭州海康威视数字技术股份有限公司
Publication of WO2019228456A1 publication Critical patent/WO2019228456A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing

Definitions

  • the present application relates to the field of image technology, and in particular, to an image processing method, apparatus, device, and machine-readable storage medium.
  • the original image in the first data format collected by the imaging device cannot usually be directly displayed or transmitted. Therefore, the original image in the first data format can also be converted into a target image in the second data format for display or transmission.
  • an ISP (Image Signal Processing) algorithm can be used to convert the original image into a target image.
  • the ISP algorithm handles image processing tasks such as brightness and color compensation and correction.
  • when the imaging device uses the ISP algorithm to convert the original image into the target image, defects of the ISP algorithm itself and the accumulated losses of each processing module cause the target image to lose original image information to a certain extent. If that loss is severe, it may be impossible to repair later.
  • moreover, an original image collected under poor lighting conditions exhibits heavy noise after being processed by the ISP algorithm.
  • the present disclosure provides an image processing method, apparatus, device, and machine-readable storage medium, which can effectively remove noise, improve the quality of the target image, and improve user experience.
  • the present disclosure provides an image processing method, which includes:
  • obtaining an original image in a first data format; converting the original image into an intermediate image in a second data format by using a neural network; and performing noise reduction processing on the intermediate image by using a cached image to obtain a target image in the second data format, where the cached image includes a target image corresponding to a previous frame of original image adjacent to the original image.
  • the present disclosure provides an image processing apparatus including:
  • An image processing module configured to obtain an original image in a first data format, and convert the original image into an intermediate image in a second data format using a neural network;
  • a video processing module configured to perform noise reduction processing on the intermediate image by using a cached image to obtain a target image in the second data format, where the cached image includes the target image corresponding to the previous frame of original image adjacent to the original image.
  • the present disclosure provides an image processing device including a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions that can be executed by the processor; the processor is configured to execute the machine-executable instructions to implement the method steps described above.
  • the present disclosure provides a machine-readable storage medium.
  • Computer instructions are stored on the machine-readable storage medium; when the computer instructions are executed, the above method steps are implemented.
  • after the original image in the first data format is converted into the intermediate image in the second data format by the neural network, the cached image can be used to perform noise reduction processing on the intermediate image to obtain the target image in the second data format. Since the cached image is the target image corresponding to the previous frame of original image adjacent to the original image, the two frames (i.e., the cached image and the intermediate image) are closely related in time and space.
  • the correlation between the two frames can be used to distinguish the signal and the noise in the image, and noise reduction processing on the intermediate image can effectively remove the noise. Therefore, noise can be effectively removed even under poor lighting conditions, so that noise in the image is effectively suppressed and the quality of the target image is improved.
  • the original image in the first data format is converted into the intermediate image in the second data format by a neural network, which reduces the loss of original image information in the intermediate image, so that the image can still be repaired later.
  • FIGS. 1A and 1B are schematic diagrams of a neural network in an embodiment of the present disclosure.
  • FIGS. 2A-2C are schematic diagrams of an offline training neural network in an embodiment of the present disclosure.
  • FIG. 3 is a flowchart of an image processing method according to an embodiment of the present disclosure.
  • 4A-4D are schematic diagrams of image processing in an embodiment of the present disclosure.
  • FIG. 5 is a structural diagram of an image processing apparatus in an embodiment of the present disclosure.
  • FIG. 6 is a hardware configuration diagram of an image processing apparatus in an embodiment of the present disclosure.
  • first, second, third, etc. may be used in the embodiments of the present disclosure to describe various kinds of information, and these descriptions are only used to distinguish the same type of information from each other.
  • For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
  • An embodiment of the present disclosure provides an image processing method, which can be applied to an image processing device.
  • the image processing device may be an imaging device, such as a video camera, and the type of the image processing device is not limited.
  • in this method, the original image in the first data format may be converted into an intermediate image in the second data format by using a neural network, and then a cached image is used to perform noise reduction processing on the intermediate image to obtain the target image in the second data format. That is, the target image is an image that has undergone noise reduction processing.
  • because the two frames of images (that is, the cached image and the intermediate image) are closely related in time and space, the cached image can be used to perform noise reduction processing on the intermediate image. This effectively removes noise even when the lighting conditions are poor, so that the noise in the image can be effectively suppressed.
  • the image collected by the image processing device may be an original image, and a data format of the original image may be a first data format.
  • the first data format is an original image format, and usually includes image data of one or more spectral bands.
  • the original image in the first data format cannot be directly displayed or transmitted; that is, an abnormality occurs if the original image in the first data format is directly displayed or transmitted.
  • the first data format may include a Bayer format.
  • the Bayer format is only an example, and the first data format is not limited to it; all raw-image data formats fall within the protection scope of the present disclosure.
  • after the image processing device uses the neural network to convert the original image, the intermediate image is obtained.
  • the intermediate image is the output image of the neural network, and is not the final target image.
  • the image processing device performs noise reduction processing on the intermediate image by using the cache image, the target image is obtained, that is, the final output image.
  • the data format of the intermediate image and the target image may be a second data format, and the second data format is any image format suitable for display or transmission. For example, when a target image in a second data format is displayed or transmitted, no abnormality occurs.
  • the second data format may include an RGB (Red Green Blue) format, a YUV (Luminance Chrominance) format, and the like.
  • All image formats suitable for display or transmission are within the protection scope of the present disclosure.
  • the following describes the neural network in the embodiment of the present disclosure, which can be used to convert an original image in a first data format into an intermediate image in a second data format.
  • the neural network can also be used to optimize the original image, for example by adjusting attributes of the original image such as its brightness, color, contrast, signal-to-noise ratio, and size; this optimization method is not limited.
  • the neural network in the present disclosure may include, but is not limited to, a convolutional neural network (CNN for short), a recurrent neural network (RNN for short), a fully connected network, etc.
  • in the following description, the convolutional neural network is taken as an example.
  • the structural units of the neural network in the present disclosure may include, but are not limited to, one or any combination of the following: a convolutional layer, a pooling layer, an excitation layer, a fully connected layer, and the like.
  • for example, the neural network may include: at least one convolutional layer, at least one pooling layer, and at least one fully connected layer.
  • the neural network may include: at least one convolutional layer and at least one excitation layer.
  • as shown in FIGS. 1A and 1B, there are two examples of the neural network used in this embodiment.
  • the neural network may be composed of several convolutional layers (Conv), several pooling layers (Pool), and a fully connected layer (FC). There are no restrictions on the number of convolution layers and the number of pooling layers.
  • the neural network may be composed of several convolutional layers and several excitation layers. There are no restrictions on the number of convolutional layers or the number of excitation layers.
  • the neural network used in the present disclosure for converting the original image in the first data format into the intermediate image in the second data format may also have other structures; this is not limited as long as the network includes at least one convolutional layer. It is not limited to FIG. 1A or FIG. 1B; the neural networks illustrated there are merely examples.
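  • As an illustration only, below is a minimal PyTorch-style sketch of a FIG. 1B-style structure (convolutional layers interleaved with excitation layers). The layer count, channel widths, and kernel sizes are assumptions for demonstration, as is presenting the raw input as a single-channel tensor; the patent fixes none of these details.

```python
# Illustrative sketch only: layer counts, channel widths, and kernel sizes
# are assumptions; the patent does not fix them.
import torch.nn as nn

class RawToRgbNet(nn.Module):
    """Maps a 1-channel raw (e.g., Bayer) image to a 3-channel image,
    using convolutional layers interleaved with excitation (ReLU) layers,
    in the spirit of the FIG. 1B structure."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # convolutional layer
            nn.ReLU(),                                    # excitation layer
            nn.Conv2d(32, 32, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),                                    # excitation layer
            nn.Conv2d(32, 3, kernel_size=3, padding=1),   # intermediate image output
        )

    def forward(self, raw):
        # raw: (N, 1, H, W) tensor in the first data format
        return self.body(raw)
```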
  • the image features are enhanced by performing a convolution operation on the image using a convolution kernel.
  • the convolution kernel can be a matrix of size m * n.
  • the input of the convolution layer and the convolution kernel are convolved to obtain the output of the convolution layer.
  • the convolution operation is actually a filtering process.
  • in the convolution operation, the pixel value f(x, y) at point (x, y) on the image is convolved with the convolution kernel w(x, y). For example, suppose a 4 * 4 convolution kernel is provided; it contains 16 values whose magnitudes can be configured as required. Sliding a 4 * 4 window over the image in order yields multiple 4 * 4 sliding windows, and convolving the kernel with each sliding window produces multiple convolution features.
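  • A minimal numpy sketch of this sliding-window convolution follows; the stride of 1, the valid (no-padding) output size, and the example kernel values are assumptions for illustration.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slides the kernel over the image (stride 1, no padding) and sums
    the element-wise products in each window, producing one convolution
    feature per sliding window."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = image[y:y + kh, x:x + kw]    # one 4 x 4 sliding window
            out[y, x] = np.sum(window * kernel)   # convolve kernel with window
    return out

kernel = np.ones((4, 4)) / 16.0   # the 16 values can be configured as required
features = conv2d_valid(np.random.rand(8, 8), kernel)
```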
  • the processing of the pooling layer is actually a process of downsampling. By performing operations such as maximizing, minimizing, and averaging multiple convolutional features output by the convolutional layer, the amount of calculation can be reduced and feature invariance can be maintained.
  • the principle of local image correlation can be used to sub-sample the image, which can reduce the amount of data processing and retain useful information.
  • for example, the following maximum-pooling formula can be used to pool the convolution features and obtain the pooled features:

    y^i_{j,k} = max_{0 ≤ m < s, 0 ≤ n < s} ( x^i_{j·s+m, k·s+n} )

    where s represents the window size (s * s) used during the pooling process, m and n are offsets within the pooling window, x^i_{j,k} denotes the convolution features output by the convolution layer for the i-th image, and y^i_{j,k} represents the feature obtained by pooling the i-th image.
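  • As a hedged illustration of this formula, the following numpy sketch performs non-overlapping maximum pooling; the window size s is a parameter and the example input size is an assumption.

```python
import numpy as np

def max_pool(x, s):
    """Non-overlapping s x s maximum pooling of one feature map,
    i.e. y[j, k] = max over the window x[j*s+m, k*s+n], 0 <= m, n < s."""
    h, w = x.shape
    h_out, w_out = h // s, w // s
    x = x[:h_out * s, :w_out * s]                  # drop edge remainders
    return x.reshape(h_out, s, w_out, s).max(axis=(1, 3))

pooled = max_pool(np.random.rand(8, 8), s=2)       # yields a 4 x 4 pooled map
```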
  • the activation function (such as a non-linear function) can be used to map the features of the pooling layer output, thereby introducing non-linear factors, so that the neural network can enhance the expression ability through non-linear combination.
  • the activation function of the excitation layer may include, but is not limited to, the ReLU (Rectified Linear Unit) function. Taking the ReLU function f(x) = max(0, x) as an example: in the output of the pooling layer, it sets all features x that are less than or equal to 0 to 0 and keeps features greater than 0 unchanged.
  • each node of the fully connected layer is connected to all nodes in the previous layer and is used to fully connect all features input to the fully connected layer to obtain a feature vector; the feature vector may include multiple features.
  • the fully connected layer may also use a 1 * 1 convolution layer, so that a fully convolutional network can be formed.
  • one or more convolutional layers, one or more pooling layers, one or more excitation layers, and one or more fully connected layers can be combined to construct a neural network.
  • the input of the neural network is the original image in the first data format
  • the output of the neural network is the intermediate image in the second data format. That is, after the original image in the first data format is input to the neural network and processed by the network's structural units (such as convolutional layers, pooling layers, excitation layers, and fully connected layers), the intermediate image in the second data format can be output.
  • the neural network can be trained offline, which mainly trains the neural network parameters, such as convolutional layer parameters (e.g., convolution kernel parameters), pooling layer parameters, and excitation layer parameters. There is no limitation on this; all parameters involved in the neural network are within the protection scope of this embodiment.
  • the neural network can fit the mapping relationship between input and output, that is, the mapping relationship between the original image in the first data format and the intermediate image in the second data format. In this way, when the input of the neural network is the original image in the first data format, after processing by the neural network, the output is the intermediate image in the second data format.
  • the following describes the process of offline training of neural networks in detail in combination with specific application scenarios.
  • two training methods are introduced.
  • the neural networks obtained by the two training methods may be referred to as a first neural network and a second neural network, respectively.
  • In the first training method, a training image in the first data format and a training image in the second data format may be collected, and the two may be stored in association to obtain an image data set. The image data set is output to the first neural network, and the first neural network uses the image data set to train each of its neural network parameters.
  • the process of obtaining the image data set can include:
  • Method 1: For the same frame of image, the imaging device A collects the training image A1 in the first data format, the imaging device B synchronously acquires the training image B1 in the second data format, and the training image A1 and the training image B1 are stored in association. Similarly, the imaging device A acquires the training image A2, the imaging device B acquires the training image B2, and the training image A2 and the training image B2 are stored in association.
  • the image data set may include the corresponding relationship between each group of training images An and training images Bn.
  • Method 2: The imaging device A collects the training image A1 in the first data format and processes the training image A1 (for example, white balance correction, color interpolation, curve mapping, etc.; the processing method is not limited) to obtain the training image A1' in the second data format, and the training image A1 and the training image A1' are stored in association.
  • the imaging device A collects the training image A2 in the first data format, and processes the training image A2 to obtain the training image A2 ′ in the second data format, and stores the training image A2 and the training image A2 ′ in association with each other.
  • the final image data set may include the correspondence between each group of training images An and training images An ′.
  • an image data set can be obtained, and the image data set includes a correspondence between a training image in a first data format and a training image in a second data format.
  • a pre-designed first neural network can be trained, that is, each neural network parameter is trained, so that the first neural network fits the mapping relationship between the image in the first data format and the image in the second data format.
  • the training method is not limited; for example, back propagation, resilient propagation, or conjugate gradient methods may be used.
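  • As a hedged sketch only, offline training with back propagation might look as follows in PyTorch; the loss function, optimizer, and data iteration are illustrative assumptions, since the patent names the training methods but fixes no such details.

```python
# Hedged sketch: the loss, optimizer, and data handling are assumptions.
import torch

def train_first_network(net, dataset, epochs=10):
    """dataset yields (raw, target) pairs: a first-data-format training
    image and the associated second-data-format training image."""
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
    loss_fn = torch.nn.L1Loss()
    for _ in range(epochs):
        for raw, target in dataset:
            pred = net(raw)                # intermediate image, second format
            loss = loss_fn(pred, target)
            optimizer.zero_grad()
            loss.backward()                # back propagation
            optimizer.step()
    return net
```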
  • the original image may be input into the trained first neural network to convert the original image in the first data format into the intermediate image in the second data format by the first neural network.
  • the first neural network can also be adjusted online. That is, the original image in the first data format and the target image in the second data format can be used to re-optimize the parameters of each neural network in the first neural network, and there is no limitation on the online adjustment process of the first neural network.
  • In the second training method, training images in the first data format and training images in the second data format may be collected, together with device parameters (that is, parameters of the imaging device that acquired the training images in the first data format). The training image in the first data format, the training image in the second data format, and the device parameters are stored in association to obtain an image data set, which is output to the second neural network; the second neural network uses the image data set to train each of its neural network parameters.
  • the process of obtaining the image data set may include, but is not limited to:
  • Method 1: For the same frame of image, the imaging device A acquires the training image A1 in the first data format along with the device parameter 1 of the imaging device A, and the imaging device B synchronously acquires the training image B1 in the second data format; then the training image A1, the device parameter 1, and the training image B1 are stored in association.
  • Similarly, the imaging device A acquires the training image A2 along with the device parameter 2, the imaging device B acquires the training image B2, and the training image A2, the device parameter 2, and the training image B2 are stored in association.
  • the image data set may include the correspondence between each group of training images An, device parameters n, and training images Bn.
  • the above device parameters may be fixed parameters that are not related to the environment (such as sensor sensitivity), or shooting parameters that are related to the environment (such as aperture size).
  • device parameters may include, but are not limited to, one or any combination of the following: sensor sensitivity, dynamic range, signal-to-noise ratio, pixel size, target surface size, resolution, frame rate, number of pixels, spectral response, photoelectric response, array mode, lens aperture diameter, focal length, aperture size, hood model, filter aperture, viewing angle, etc.; there are no restrictions on these.
  • Method 2: The imaging device 1 collects the training image set 1 (including a large number of training images) in the first data format and obtains the device parameter 1 of the imaging device 1; each training image in the training image set 1 is processed (for example, white balance correction, color interpolation, curve mapping, etc.; the processing is not limited) to obtain the training image set 1 in the second data format, and the training image set 1 in the first data format, the device parameter 1, and the training image set 1 in the second data format are stored in association. Similarly, the imaging device 2 collects the training image set 2 in the first data format, acquires the device parameter 2 of the imaging device 2, and processes each training image in the training image set 2 to obtain the training image set 2 in the second data format. The final image data set may include the correspondence between each group: the training image set K in the first data format, the device parameter K, and the training image set K in the second data format, as shown in FIG. 2C.
  • An image data set can be obtained through either of the above two methods, and the image data set includes the correspondence among a training image in the first data format, a device parameter, and a training image in the second data format. Based on the image data set, a pre-designed second neural network can be trained, that is, its neural network parameters are trained, so that the second neural network fits the mapping relationship among the image in the first data format, the device parameters, and the image in the second data format. The training method is not limited; for example, back propagation, resilient propagation, or conjugate gradient methods may be used. It should be noted that the mapping relationship fitted by the second neural network includes the device parameters.
  • after training, the original image and the device parameters of the device that collected the original image can be obtained, and the original image and the device parameters can be input to the trained second neural network, so that the second neural network converts the data format of the original image from the first data format to the second data format according to the device parameters, thereby obtaining an intermediate image in the second data format.
  • as shown in FIG. 3, the method may include the following steps.
  • Step 301 Obtain an original image in a first data format.
  • in step 301, a light signal in a first wavelength range may be sampled to obtain the original image; or a light signal in a second wavelength range may be sampled to obtain the original image; or light signals in both the first wavelength range and the second wavelength range may be sampled to obtain the original image.
  • the first wavelength range and the second wavelength range are merely examples and are not restrictive.
  • the first wavelength range may be a visible light wavelength range from 380 nm to 780 nm
  • the second wavelength range may be an infrared wavelength range from 780 nm to 2500 nm.
  • Step 302 Use the trained neural network to convert the original image into an intermediate image in a second data format.
  • in step 302, the original image may be input to the trained first neural network, and the data format of the original image is converted from the first data format to the second data format by the first neural network. Alternatively, the device parameters of the device that collected the original image may be obtained, and the original image and the device parameters are then input to the trained second neural network, so that the second neural network converts the original image in the first data format into the intermediate image in the second data format according to the device parameters.
  • Step 303: Use the cached image to perform noise reduction processing on the intermediate image in the second data format to obtain a target image in the second data format.
  • the buffer image includes a target image corresponding to a previous frame of the original image adjacent to the original image.
  • in addition, the cached image may then be updated to this target image, so that it serves as the cached image for the next frame of original image.
  • for the first frame, the acquisition method of the target image 1 is not limited; for example, the intermediate image 1 of the original image 1 can be directly determined as the target image 1.
  • then, the cached image (that is, the target image 1) is used to perform noise reduction processing on the intermediate image 2 to obtain the target image 2, and the cached image in the cache is updated to the target image 2; that is, the target image 1 is no longer the cached image. Next, the cached image (that is, the target image 2) is used to perform noise reduction processing on the intermediate image 3 to obtain the target image 3, and the cached image in the cache is updated to the target image 3; that is, the target image 2 is no longer the cached image.
  • by analogy, the target image corresponding to the previous frame of original image is used to continuously update the cached image in the cache, and the cached image is used to perform noise reduction processing on the intermediate image corresponding to the current frame of original image, thereby obtaining the target image corresponding to the current frame of original image.
  • using the cached image to perform noise reduction processing on the intermediate image to obtain the target image in the second data format includes, but is not limited to: obtaining a motion estimation value of each pixel in the intermediate image according to the intermediate image and the cached image; and converting the intermediate image into the target image in the second data format according to the motion estimation values.
  • in summary, after the original image in the first data format is converted into the intermediate image in the second data format by the neural network, the cached image can be used to perform noise reduction processing on the intermediate image to obtain the target image in the second data format. Since the cached image is the target image corresponding to the previous frame of original image adjacent to the original image, the two adjacent frames (i.e., the cached image and the intermediate image) are closely related in time and space. The correlation between the two adjacent frames can be used to distinguish the signal and the noise in the intermediate image, and noise reduction processing can be performed on the intermediate image to effectively remove the noise. Therefore, noise can be effectively removed even under poor lighting conditions, so that the noise in the target image can be effectively suppressed.
  • the above method flow may be executed by the image processing apparatus 100.
  • the image processing apparatus 100 may include three modules: an image processing module 101, a video processing module 102, and a training and learning module 103.
  • the image processing module 101 is configured to perform the foregoing steps 301 and 302, and the video processing module 102 is configured to perform the foregoing step 303.
  • the training and learning module 103 may be an offline module.
  • the neural network is trained and adjusted in advance using the image data set, and the trained and adjusted neural network is output to the image processing module 101.
  • for example, the training and learning module 103 may perform the offline training of the first neural network and the second neural network described in the above embodiments.
  • the image processing module 101 obtains a pre-adjusted neural network from the training and learning module 103, processes the original image of the inputted first data format based on the neural network, and outputs the intermediate image of the second data format.
  • the video processing module 102 receives a frame of intermediate image in the second data format output by the image processing module 101, performs noise reduction processing in combination with the information of the cached image stored in the cache to obtain the target image, and stores the processed target image in the cache as the cached image for the next frame. That is, the video processing module 102 may keep one frame of image in the cache as the cached image.
  • the video processing module 102 is composed of a motion estimation unit 201 and a time domain processing unit 202.
  • step S1 may be performed by the motion estimation unit 201
  • step S2 may be performed by the time domain processing unit 202
  • step 303 described above may be implemented by the motion estimation unit 201 and the time domain processing unit 202.
  • step S1 a motion estimation value of each pixel in the intermediate image is obtained according to the intermediate image and the cache image.
  • the cache image includes a target image corresponding to a previous frame of the original image adjacent to the original image.
  • step S2 the intermediate image is converted into a target image in a second data format according to the motion estimation value.
  • the above-mentioned video processing module 102 composed of the motion estimation unit 201 and the time-domain processing unit 202 is just an example.
  • the video processing module 102 may further include other units, such as a noise estimation unit, a spatial processing unit, and the like.
  • the noise estimation unit is configured to perform noise estimation on the image
  • the spatial processing unit is configured to perform spatial processing on the image, and there is no limitation on the processing process.
  • the video processing module 102 is composed of a motion estimation unit 201, a time-domain processing unit 202, a noise estimation unit 203, and a spatial-domain processing unit 204.
  • the aforementioned step 303 may be implemented by the aforementioned motion estimation unit 201, time domain processing unit 202, noise estimation unit 203, and spatial domain processing unit 204.
  • the video processing module 102 is composed of a motion estimation unit 201, a time domain processing unit 202, and a spatial domain processing unit 204.
  • the above-mentioned step 303 may be implemented by the above-mentioned motion estimation unit 201, time-domain processing unit 202, and space-domain processing unit 204.
  • step S1 is implemented through steps 4031 and 4032, which are specifically:
  • Step 4031 Obtain a correlation image according to the intermediate image and the cache image.
  • the correlation image is obtained from the pixel values of the corresponding positions of the intermediate image and the cache image according to a preset calculation method.
  • the preset calculation method may be a frame difference method, a convolution method, a cross-correlation method, etc., and there is no limitation on this.
  • Obtaining a correlation image according to the intermediate image and the cache image may include, but is not limited to:
  • Method 1: Use the frame difference method to calculate the correlation image; that is, the correlation image is obtained by taking the difference between the pixel values of corresponding pixels of the intermediate image and the cached image. For example, for each pixel in the correlation image, the first pixel value corresponding to that pixel in the intermediate image and the second pixel value corresponding to that pixel in the cached image are obtained, and the difference between the first pixel value and the second pixel value is determined as the pixel value of that pixel in the correlation image.
  • the size of the intermediate image is 6 * 4
  • the size of the cache image is 6 * 4
  • the size of the correlation image is 6 * 4.
  • 6 represents the number of pixels in the horizontal direction
  • 4 represents the number of pixels in the vertical direction.
  • 6 and 4 are just an example. In practical applications, the number of pixels in the horizontal direction is much larger than 6, and the number of pixels in the vertical direction is much larger than 4.
  • that is, the intermediate image, the cached image, and the correlation image are the same size.
  • the pixels of the intermediate image are A11-A16, A21-A26, A31-A36, A41-A46
  • the pixels of the cache image are B11-B16, B21-B26, B31-B36, B41- B46
  • the pixels of the correlation image are C11-C16, C21-C26, C31-C36, and C41-C46 in this order.
  • since the pixel value of each pixel of the intermediate image and the cached image is known, the pixel value of each pixel of the correlation image can be calculated as follows: for the pixel C11 in the correlation image, obtain the first pixel value (the pixel value of pixel A11) corresponding to C11 in the intermediate image and the second pixel value (the pixel value of pixel B11) corresponding to C11 in the cached image, and then determine the difference between the first pixel value and the second pixel value as the pixel value of pixel C11 in the correlation image. The other pixels in the correlation image are processed in the same way as pixel C11, and the details are not repeated.
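  • A minimal numpy sketch of Method 1 follows; whether the difference is signed or absolute is not specified by the patent, so a signed per-pixel difference is assumed.

```python
import numpy as np

def correlation_by_frame_difference(intermediate, cached):
    """Method 1: each correlation-image pixel is the difference between
    the corresponding pixels of the intermediate and cached images."""
    assert intermediate.shape == cached.shape   # same size, e.g. 6 x 4
    return intermediate - cached                # C[y, x] = A[y, x] - B[y, x]
```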
  • Method 2: Use the convolution method to calculate the correlation image; that is, convolve image blocks of the intermediate image and the cached image to obtain the correlation image. The size of the image block is preset, such as 3 * 3. For example, for each pixel in the correlation image, a first image block corresponding to that pixel is selected from the intermediate image, a second image block corresponding to that pixel is selected from the cached image, and the convolution value of the first image block and the second image block (that is, the convolution value of the two matrices) is determined as the pixel value of that pixel in the correlation image.
  • the pixels of the intermediate image are A11-A16, A21-A26, A31-A36, A41-A46
  • the pixels of the cache image are B11-B16, B21-B26, B31-B36, B41-B46. It is assumed that the pixels of the correlation image are C11-C16, C21-C26, C31-C36, and C41-C46 in this order.
  • since the pixel value of each pixel of the intermediate image and the cached image is known, the pixel value of each pixel of the correlation image can be calculated as follows: for the pixel C11 in the correlation image, a first image block corresponding to C11 is selected from the intermediate image; the first image block is a 3 * 3 matrix whose first row includes pixels A11, A12, and A13, whose second row includes pixels A21, A22, and A23, and whose third row includes pixels A31, A32, and A33. A second image block corresponding to C11 is selected from the cached image; the second image block is a 3 * 3 matrix whose first row includes pixels B11, B12, and B13, whose second row includes pixels B21, B22, and B23, and whose third row includes pixels B31, B32, and B33. The convolution value of the first image block and the second image block is then determined as the pixel value of pixel C11 in the correlation image; the other pixels are processed in the same way.
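  • A hedged numpy sketch of Method 2 follows; the 3 * 3 block size matches the example above, while the border handling (edge replication) is an assumption the patent does not specify.

```python
import numpy as np

def correlation_by_convolution(intermediate, cached, block=3):
    """Method 2: for each pixel, the correlation-image value is the
    convolution value (sum of element-wise products) of the image blocks
    around that pixel in the intermediate and cached images."""
    pad = block // 2
    a = np.pad(intermediate, pad, mode="edge")  # border handling: assumption
    b = np.pad(cached, pad, mode="edge")
    h, w = intermediate.shape
    out = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            blk_a = a[y:y + block, x:x + block]   # first image block
            blk_b = b[y:y + block, x:x + block]   # second image block
            out[y, x] = np.sum(blk_a * blk_b)     # convolution value of blocks
    return out
```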
  • Step 4032 Obtain a motion estimation value of each pixel in the intermediate image according to the correlation image.
  • the motion estimation value may be binary or continuous, and at least one of a smoothing process, a mapping process, and a threshold process may be adopted to obtain the motion estimation value of each pixel in the intermediate image.
  • the smoothing processing may include: an image filtering operation having a smoothing characteristic, such as an average filtering operation, a median filtering operation, a Gaussian filtering operation, and the like, and there is no limitation on this image filtering operation.
  • the mapping process may include a linear scaling operation and a panning operation.
  • the threshold processing may include: determining the motion estimation value according to the magnitude relationship between the pixel value and a threshold, that is, limiting the motion estimation value to the ranges divided by the threshold; there is no limitation on this.
  • the process of obtaining the motion estimation value of each pixel can include, but is not limited to:
  • Method 1: If the motion estimation value is binary, and smoothing and threshold processing are used to obtain the motion estimation value of each pixel in the intermediate image, then an average filtering operation and threshold processing can be performed on the correlation image to obtain the motion estimation value of each pixel in the intermediate image.
  • specifically, for each pixel in the intermediate image, a third image block corresponding to that pixel can be selected from the correlation image, and the third image block is average-filtered to obtain a pixel value corresponding to that pixel. If the pixel value is greater than the threshold, the motion estimation value of the pixel is determined to be a first value (such as 1); if the pixel value is not greater than the threshold, the motion estimation value of the pixel is determined to be a second value (such as 0).
  • the pixels of the intermediate image are A11-A16, A21-A26, A31-A36, A41-A46
  • the pixels of the correlation image are C11-C16, C21-C26, C31-C36, C41-C46.
  • since the correlation image is known (see step 4031), that is, the pixel value of each pixel of the correlation image is known, the motion estimation value of each pixel in the intermediate image can be calculated as follows:
  • for the pixel A11 in the intermediate image, a third image block corresponding to A11 can be selected from the correlation image. The third image block can be a 3 * 3 matrix (a matrix of another size is also possible; this is not limited) whose first row includes pixels C11, C12, and C13, whose second row includes pixels C21, C22, and C23, and whose third row includes pixels C31, C32, and C33.
  • an average filter may be performed on the 9 pixels of the third image block (that is, the average of the 9 pixel values is calculated), and the average filter result is the pixel value corresponding to the pixel A11.
  • if this pixel value is greater than the threshold, the motion estimation value of pixel A11 may be 1; if it is not greater than the threshold, the motion estimation value of pixel A11 may be 0.
  • the other pixels in the intermediate image are processed in the same way as pixel A11, which is not repeated here.
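  • A hedged sketch of Method 1 using scipy's mean filter follows; the 3 * 3 window and the threshold value are illustrative choices, not fixed by the patent.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def motion_estimate_binary(correlation, threshold):
    """Method 1: average-filter the correlation image (smoothing), then
    threshold it: 1 where the filtered value exceeds the threshold,
    0 otherwise (threshold processing)."""
    smoothed = uniform_filter(correlation, size=3)    # 3x3 mean filter
    return np.where(smoothed > threshold, 1.0, 0.0)
```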
  • Method 2: If the motion estimation value is continuous, and smoothing and mapping processing are used to obtain the motion estimation value of each pixel in the intermediate image, then a median filtering operation and a linear scaling operation can be performed on the correlation image to obtain the motion estimation value of each pixel in the intermediate image.
  • specifically, after median filtering and linear scaling of the correlation image, a filter value located in a specific interval (such as the interval 0-1) is obtained for each position. Then, for each pixel in the intermediate image, the filter value corresponding to that pixel is determined as the motion estimation value of the pixel; that is, the motion estimation value is also a value located in the specific interval (such as the interval 0-1).
  • the pixels of the intermediate image are A11-A16, A21-A26, A31-A36, A41-A46
  • the pixels of the correlation image are C11-C16, C21-C26, C31-C36, C41-C46. Since the pixel value of each pixel of the correlation image is known, the motion estimation value of each pixel in the intermediate image can be calculated as follows:
  • for the pixel A11 in the intermediate image, the filter value of the corresponding pixel C11 is 0.039; that is, the motion estimation value of pixel A11 is 0.039. For the pixel A12 in the intermediate image, the filter value of the corresponding pixel C12 is 0.196; that is, the motion estimation value of pixel A12 is 0.196. And so on.
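  • A hedged sketch of Method 2 follows; min-max scaling into [0, 1] is an assumption, since the patent only requires a linear scaling (and translation) into a specific interval.

```python
import numpy as np
from scipy.ndimage import median_filter

def motion_estimate_continuous(correlation):
    """Method 2: median-filter the correlation image (smoothing), then
    linearly scale the result into the interval [0, 1] (mapping)."""
    smoothed = median_filter(correlation, size=3)     # 3x3 median filter
    lo, hi = smoothed.min(), smoothed.max()
    if hi == lo:                                      # flat input: no spread
        return np.zeros_like(smoothed)
    return (smoothed - lo) / (hi - lo)                # e.g. 0.039, 0.196, ...
```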
  • step S2 can be implemented through steps 4041 and 4042.
  • the target image in the second data format can also be obtained by other methods, which is not limited, as long as the target image can be obtained based on the motion estimation value.
  • Step 4041 and step 4042 are specifically:
  • Step 4041 Obtain a low-noise image according to the intermediate image and the cached image.
  • the low-noise image is an image with a lower noise level than the intermediate image, and there is no restriction on its acquisition method. For example, the low-noise image may be the average image of the intermediate image and the cached image.
  • the pixels of the intermediate image are A11-A16, A21-A26, A31-A36, A41-A46
  • the pixels of the cache image are B11-B16, B21-B26, B31-B36, and B41-B46.
  • the pixels of the low-noise image are D11-D16, D21-D26, D31-D36, and D41-D46 in this order. Based on this, since the pixel value of each pixel point of the intermediate image and the cache image is known, the pixel value of each pixel point of the low-noise image can be calculated as follows:
  • for the pixel D11 in the low-noise image, obtain the pixel value corresponding to D11 in the intermediate image (the pixel value of pixel A11) and the pixel value corresponding to D11 in the cached image (the pixel value of pixel B11), and determine the average of these two pixel values as the pixel value of pixel D11 in the low-noise image. The other pixels are processed in the same way as pixel D11, and the details are not repeated.
  • Step 4042 Obtain a target image according to the intermediate image, the low-noise image, and the motion estimation value.
  • acquiring the target image according to the intermediate image, the low-noise image, and the motion estimation value may include, but is not limited to, determining the pixel value of a first pixel in the target image, where the first pixel is any pixel in the target image. The pixel value of the first pixel is determined as follows: according to the motion estimation value of the first pixel, determine a first weight of the first pixel in the intermediate image and a second weight of the first pixel in the low-noise image; then determine the pixel value of the first pixel in the target image according to the pixel value corresponding to the first pixel in the intermediate image and the first weight, and the pixel value corresponding to the first pixel in the low-noise image and the second weight. The target image is determined from the pixel values of all of its pixels.
  • for example, the first weight of the first pixel in the intermediate image may be A, and the second weight of the first pixel in the low-noise image may be (1-A). If the pixel value corresponding to the first pixel in the intermediate image is N and the pixel value corresponding to the first pixel in the low-noise image is M, then the pixel value of the first pixel in the target image is N * A + M * (1-A).
  • the above method is only an example, and the pixel value of the first pixel point in the target image may also be obtained by other methods, which is not limited.
  • the pixels of the intermediate image are A11-A16, A21-A26, A31-A36, A41-A46
  • the pixels of the low-noise image are D11-D16, D21-D26, D31-D36, D41-D46
  • the pixels of the target image are E11-E16, E21-E26, E31-E36, E41-E46. Since the pixel value of each pixel of the intermediate image and the low-noise image is known, the pixel value of each pixel in the target image can be calculated as follows:
  • for the pixel E11 in the target image, take the motion estimation value of the corresponding pixel A11 as the first weight and 1 minus that motion estimation value as the second weight, then compute the pixel value of A11 multiplied by the first weight plus the pixel value of D11 multiplied by the second weight; the result of this calculation is the pixel value of pixel E11. The other pixels are processed in the same way as pixel E11, which is not repeated.
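  • A minimal numpy sketch of steps 4041 and 4042 combined, directly following the formula N * A + M * (1 - A):

```python
import numpy as np

def temporal_denoise(intermediate, cached, motion):
    """Step 4041: the low-noise image is the average of the intermediate
    and cached images. Step 4042: each target pixel is N * A + M * (1 - A),
    with A the motion estimate, N the intermediate pixel value, and M the
    low-noise pixel value."""
    low_noise = (intermediate + cached) / 2.0
    return motion * intermediate + (1.0 - motion) * low_noise
```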
  • when the motion estimation value is larger, the first weight is larger and the second weight is smaller; that is, the pixel value of the intermediate image accounts for a larger proportion, and the pixel value of the low-noise image accounts for a smaller proportion.
  • for example, if the motion estimation value of pixel E11 is relatively large (such as greater than 0.5), the first weight is greater than the second weight. A large motion estimation value indicates that the pixel has changed significantly relative to the previous frame, that is, the pixel value of the current frame is more reliable. Because the first weight is greater than the second weight, the pixel value from the intermediate image of the current frame accounts for a larger proportion, which matches what a large motion estimation value requires. In this way, the pixel value of pixel E11 in the target image is more accurate.
  • the pixel value of the pixel point may include, but is not limited to, the gray value, brightness value, and chrominance value of the pixel point, and the type of the pixel value is not limited, and may be related to actual image processing.
  • in summary, after the original image in the first data format is converted into the intermediate image in the second data format by the neural network, the cached image can be used to perform noise reduction processing on the intermediate image to obtain the target image in the second data format. Since the cached image is the target image corresponding to the previous frame of original image adjacent to the original image, the two adjacent frames (i.e., the cached image and the intermediate image) are closely related in time and space. The correlation between the two adjacent frames can be used to distinguish the signal and the noise in the images, and the intermediate image is processed for noise reduction. Therefore, noise can be effectively removed even under poor lighting conditions, so that the noise in the target image can be effectively suppressed.
  • an embodiment of the present disclosure also proposes an image processing device. As shown in FIG. 5, it is a structural diagram of the image processing device.
  • the image processing device includes:
  • An image processing module 501 configured to obtain an original image in a first data format, and use a neural network to convert the original image into an intermediate image in a second data format;
  • the video processing module 502 is configured to perform noise reduction processing on the intermediate image by using a cached image to obtain a target image in the second data format, where the cached image includes the target image corresponding to the previous frame of original image adjacent to the original image.
  • when the image processing module 501 uses a neural network to convert the original image into an intermediate image in the second data format, the image processing module 501 is specifically configured to: input the original image into a first neural network to convert the original image in the first data format into an intermediate image in the second data format by the first neural network; or obtain the device parameters of the device that collected the original image, and input the original image and the device parameters into a second neural network to convert the original image in the first data format into an intermediate image in the second data format according to the device parameters by the second neural network.
  • when the video processing module 502 obtains the motion estimation value of each pixel in the intermediate image according to the intermediate image and the cached image, the video processing module 502 is specifically configured to: obtain a correlation image according to the intermediate image and the cached image; and obtain the motion estimation value of each pixel in the intermediate image according to the correlation image.
  • the video processing module 502 obtains a motion estimation value of each pixel in the intermediate image by using at least one of a smoothing process, a mapping process, and a threshold process.
  • when the video processing module 502 converts the intermediate image into the target image in the second data format according to the motion estimation value, the video processing module 502 is specifically configured to: obtain a low-noise image according to the intermediate image and the cached image; and obtain the target image according to the intermediate image, the low-noise image, and the motion estimation value.
  • when the video processing module 502 obtains the target image according to the intermediate image, the low-noise image, and the motion estimation value, the video processing module 502 is specifically configured to: determine the pixel value of a first pixel in the target image, where the first pixel is any pixel in the target image, and the pixel value of the first pixel is determined as follows: according to the motion estimation value corresponding to the first pixel, determine a first weight of the first pixel in the intermediate image and a second weight of the first pixel in the low-noise image; determine the pixel value of the first pixel in the target image according to the pixel value corresponding to the first pixel in the intermediate image and the first weight, and the pixel value corresponding to the first pixel in the low-noise image and the second weight; and determine the target image according to the pixel values of all pixels of the target image.
  • the schematic diagram of the hardware architecture of the image processing device provided by the embodiment of the present disclosure can be specifically shown in FIG. 6, and includes: a processor 601 and a machine-readable storage medium 602, wherein: the machine-readable storage medium 602 stores machine-executable instructions that can be executed by the processor 601; the processor 601 is configured to execute machine-executable instructions to implement the image processing method disclosed in the above examples of the present disclosure.
  • the machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device, and may contain or store information, such as executable instructions, data, and so on.
  • the machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard drive), a solid-state drive, any type of storage disc (such as an optical disc or DVD), or a similar storage medium, or a combination thereof.
  • the system, device, module, or unit described in the foregoing embodiments may be specifically implemented by a computer chip or entity, or a product with a certain function.
  • a typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce a computer-implemented process; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

Abstract

Provided in the present disclosure are an image processing method, apparatus and device, and a machine-readable storage medium. The method comprises: acquiring an original image of a first data format; using a neural network to convert the original image into an intermediate image of a second data format; and using a buffered image to denoise the intermediate image to obtain a target image of the second data format, wherein the buffered image comprises a target image corresponding to a previous-frame original image adjacent to the original image.

Description

Image processing method, apparatus, device, and machine-readable storage medium

Cross-Reference to Related Applications

This patent application claims priority to Chinese patent application No. 201810556530.2, filed on May 31, 2018 and entitled "Image processing method, apparatus, device, and machine-readable storage medium", the entire content of which is incorporated herein by reference.

Technical Field

The present application relates to the field of image technology, and in particular, to an image processing method, apparatus, device, and machine-readable storage medium.

Background

The original image in the first data format collected by an imaging device usually cannot be directly displayed or transmitted. Therefore, the original image in the first data format may be converted into a target image in a second data format for display or transmission. For example, an ISP (Image Signal Processing) algorithm can be used to convert the original image into the target image; the ISP algorithm handles image processing tasks such as brightness and color compensation and correction.

When the imaging device uses the ISP algorithm to convert the original image into the target image, defects of the ISP algorithm itself and the accumulated losses of each processing module cause the target image to lose original image information to a certain extent. If that loss is severe, it may be impossible to repair later. Moreover, an original image collected under poor lighting conditions exhibits heavy noise after being processed by the ISP algorithm.
Summary of the Invention

The present disclosure provides an image processing method, apparatus, device, and machine-readable storage medium, which can effectively remove noise, improve the quality of the target image, and improve user experience.

The present disclosure provides an image processing method, which includes:

obtaining an original image in a first data format;

converting the original image into an intermediate image in a second data format by using a neural network; and

performing noise reduction processing on the intermediate image by using a cached image to obtain a target image in the second data format, where the cached image includes a target image corresponding to a previous frame of original image adjacent to the original image.

The present disclosure provides an image processing apparatus, which includes:

an image processing module, configured to obtain an original image in a first data format and convert the original image into an intermediate image in a second data format by using a neural network; and

a video processing module, configured to perform noise reduction processing on the intermediate image by using a cached image to obtain a target image in the second data format, where the cached image includes a target image corresponding to a previous frame of original image adjacent to the original image.

The present disclosure provides an image processing device, including a processor and a machine-readable storage medium. The machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor is configured to execute the machine-executable instructions to implement the method steps described above.

The present disclosure provides a machine-readable storage medium storing computer instructions; when the computer instructions are executed, the above method steps are implemented. As can be seen from the above technical solutions, in the embodiments of the present disclosure, after the original image in the first data format is converted into the intermediate image in the second data format by a neural network, the cached image can be used to perform noise reduction processing on the intermediate image to obtain the target image in the second data format. Since the cached image is the target image corresponding to the previous frame of original image adjacent to the original image, the two frames (i.e., the cached image and the intermediate image) are closely related in time and space. The correlation between the two frames can be used to distinguish the signal and the noise in the image, and noise reduction processing on the intermediate image can effectively remove the noise. Therefore, noise can be effectively removed even under poor lighting conditions, so that the noise in the image is effectively suppressed and the quality of the target image is improved. Moreover, in the above manner, the original image in the first data format is converted into the intermediate image in the second data format by a neural network, which reduces the loss of original image information in the intermediate image, so that the image can still be repaired later.
Brief Description of the Drawings

FIG. 1A and FIG. 1B are schematic diagrams of a neural network in an embodiment of the present disclosure.

FIG. 2A to FIG. 2C are schematic diagrams of offline training of a neural network in an embodiment of the present disclosure.

FIG. 3 is a flowchart of an image processing method in an embodiment of the present disclosure.

FIG. 4A to FIG. 4D are schematic diagrams of image processing in an embodiment of the present disclosure.

FIG. 5 is a structural diagram of an image processing apparatus in an embodiment of the present disclosure.

FIG. 6 is a hardware structure diagram of an image processing device in an embodiment of the present disclosure.
Detailed Description

The terminology used in the embodiments of the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The singular forms "a", "said", and "the" used in the present disclosure and the claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, and so on may be used in the embodiments of the present disclosure to describe various kinds of information, these descriptions are only used to distinguish information of the same type from each other. For example, without departing from the scope of the present disclosure, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information. In addition, depending on the context, the word "if" may be interpreted as "when", "while", or "in response to determining".
An embodiment of the present disclosure provides an image processing method, which can be applied to an image processing device. The image processing device may be an imaging device, such as a camera; the type of the image processing device is not limited.

In the embodiments of the present disclosure, after the original image in the first data format is obtained, a neural network may be used to convert the original image into an intermediate image in the second data format, and a cached image may then be used to perform noise reduction processing on the intermediate image, thereby obtaining the target image in the second data format. That is, the target image is an image that has undergone noise reduction. In this way, two frames (the cached image and the intermediate image) are used to denoise the intermediate image, which can effectively remove noise even under poor lighting conditions, so that noise in the image is effectively suppressed, the quality of the target image is improved, and the user experience is improved.

To explain the present disclosure more clearly, the following concepts are briefly described first:
1. The first data format and the original image.

The image collected by the image processing device may be an original image, and the data format of the original image may be the first data format. The first data format is a raw image format, which usually contains image data of one or more spectral bands. An original image in the first data format cannot be displayed or transmitted directly; that is, an anomaly occurs when an original image in the first data format is displayed or transmitted.

For example, the first data format may include the Bayer format. Of course, the Bayer format is only an example and the first data format is not limited to it; all raw image data formats fall within the protection scope of the present disclosure.

2. The second data format, the intermediate image, and the target image.

After the image processing device converts the original image using the neural network, an intermediate image is obtained. The intermediate image is the output image of the neural network, not the final target image. Only after the image processing device performs noise reduction processing on the intermediate image using the cached image is the target image, i.e., the final output image, obtained.

The data formats of the intermediate image and the target image may both be the second data format, which is any image format suitable for display or transmission. For example, no anomaly occurs when a target image in the second data format is displayed or transmitted.

For example, the second data format may include the RGB (Red Green Blue) format, the YUV (luminance-chrominance) format, and the like. Of course, the RGB and YUV formats are just examples and the second data format is not limited to them; all image formats suitable for display or transmission fall within the protection scope of the present disclosure.
The following describes the neural network in the embodiments of the present disclosure, which can be used to convert an original image in the first data format into an intermediate image in the second data format. During the conversion, the neural network can also optimize the original image, for example by adjusting its attributes such as brightness, color, contrast, signal-to-noise ratio, and size; the optimization method is not limited here.

The neural network in the present disclosure may include, but is not limited to, a convolutional neural network (CNN), a recurrent neural network (RNN), a fully connected network, and the like. In this embodiment, a convolutional neural network is taken as an example.

The structural units of the neural network in the present disclosure may include, but are not limited to, one or any combination of the following: a convolutional layer, a pooling layer, an activation layer, a fully connected layer, and the like. The structural units included in the neural network are not limited, as long as at least one convolutional layer is included. For example, in one example, the neural network may include at least one convolutional layer, at least one pooling layer, and at least one fully connected layer. Alternatively, in another example, the neural network may include at least one convolutional layer and at least one activation layer.

For example, FIG. 1A and FIG. 1B show two examples of the neural network used in this embodiment.

In FIG. 1A, the neural network may be composed of several convolutional layers (Conv), several pooling layers (Pool), and one fully connected layer (FC). The numbers of convolutional layers and pooling layers are not limited.

In FIG. 1B, the neural network may be composed of several convolutional layers and several activation layers. Neither the number of convolutional layers nor the number of activation layers is limited.

Of course, the neural network used in the present disclosure for converting the original image in the first data format into the intermediate image in the second data format may also have other structures, without limitation, as long as it includes at least one convolutional layer; it is not limited to FIG. 1A or FIG. 1B, which are merely examples.
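As a concrete illustration, the following is a minimal PyTorch sketch of a FIG. 1B-style structure (alternating convolutional and activation layers). The layer count, channel widths, and kernel sizes are illustrative assumptions, not values fixed by this disclosure.

```python
# Minimal sketch of a FIG. 1B-style network: alternating convolution and
# activation layers. Layer count, channel widths, and kernel sizes are
# illustrative assumptions, not values specified by this disclosure.
import torch
import torch.nn as nn

class RawToRGBNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # raw (Bayer) plane in
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1),   # RGB intermediate image out
        )

    def forward(self, raw):
        return self.layers(raw)

net = RawToRGBNet()
intermediate = net(torch.randn(1, 1, 64, 64))  # e.g. a 64x64 raw patch
```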
The algorithm and function of each computing layer in the neural network are described below.

In the convolutional layer, image features are enhanced by performing a convolution operation on the image with a convolution kernel. The convolution kernel may be a matrix of size m*n; convolving the input of the convolutional layer with the kernel yields the output of the convolutional layer. The convolution operation is in fact a filtering process: the pixel value f(x, y) at point (x, y) of the image is convolved with the convolution kernel w(x, y). For example, given a 4*4 convolution kernel containing 16 values whose magnitudes can be configured as needed, the kernel is slid across the image to obtain a series of 4*4 sliding windows, and convolving the kernel with each sliding window yields a set of convolution features. These convolution features are the output of the convolutional layer and are provided to the pooling layer.
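The sliding-window filtering described above can be sketched as follows, assuming "valid" borders and a stride of 1; the 8*8 image and the random 4*4 kernel values are placeholders.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the elementwise products in
    each window (the filtering process described above; 'valid' borders,
    stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

features = conv2d_valid(np.random.rand(8, 8), np.random.rand(4, 4))  # 4*4 kernel
```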
The processing in the pooling layer is essentially a down-sampling process. By taking the maximum, the minimum, or the average of the convolution features output by the convolutional layer, the amount of computation is reduced while feature invariance is maintained. In the pooling layer, the principle of local image correlation can be used to sub-sample the image, which reduces the amount of data to process while retaining useful information. In one example, the following max-pooling formula may be used to pool the convolution features and obtain the pooled features:
$$y_{j,k}^{i} = \max_{0 \le m < s,\; 0 \le n < s} x_{\,j \cdot s + m,\; k \cdot s + n}^{i}$$
where s denotes the side length of the pooling window (an s*s window), m and n index positions within the window, i indicates the i-th image, x^i denotes the convolution features output by the convolutional layer for the i-th image, and y^i_{j,k} denotes the feature at position (j, k) obtained by pooling the i-th image.
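A minimal sketch of the max-pooling formula above, assuming non-overlapping s*s windows (any rows or columns left over when the dimensions are not divisible by s are dropped):

```python
import numpy as np

def max_pool(feature, s):
    """Non-overlapping s*s max pooling, matching the formula above:
    the output at (j, k) is the maximum of the (j, k)-th s*s window."""
    h, w = feature.shape
    blocks = feature[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s)
    return blocks.max(axis=(1, 3))

y = max_pool(np.random.rand(8, 8), 2)  # y[j, k] = max over the (j, k)-th 2*2 window
```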
In the activation layer that follows the pooling layer, an activation function (such as a non-linear function) can be used to map the features output by the pooling layer, thereby introducing non-linear factors so that the neural network can strengthen its expressive power through non-linear combinations. The activation function of the activation layer may include, but is not limited to, the ReLU (Rectified Linear Unit) function. Taking the following ReLU function as an example, it sets all features x output by the pooling layer that are less than or equal to 0 to 0, while features greater than 0 remain unchanged:
$$f(x) = \max(0, x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases}$$
In the fully connected layer, every node is connected to all nodes of the previous layer and is used to fully connect all features input to this layer, thereby obtaining a feature vector that may include multiple features. Furthermore, the fully connected layer may also be implemented as a 1*1 convolutional layer, which makes the network fully convolutional.

In practical applications, one or more convolutional layers, one or more pooling layers, one or more activation layers, and one or more fully connected layers can be combined according to different requirements to construct the neural network.

In this embodiment, the input of the neural network is the original image in the first data format, and the output of the neural network is the intermediate image in the second data format. That is, after the original image in the first data format is input to the neural network and processed by the structural units in the network (such as the convolutional, pooling, activation, and fully connected layers), the intermediate image in the second data format is output.

To achieve the above function, the neural network can be trained offline; this mainly means training the neural network parameters, such as the convolutional layer parameters (e.g., convolution kernel parameters), pooling layer parameters, and activation layer parameters. No limitation is imposed here: all parameters involved in the neural network fall within the protection scope of this embodiment. By training the parameters, the neural network can fit the mapping between input and output, i.e., the mapping between the original image in the first data format and the intermediate image in the second data format. Then, when the input of the neural network is an original image in the first data format, the output of the neural network after processing is an intermediate image in the second data format.
The process of offline training of the neural network is described in detail below with reference to specific application scenarios. In this embodiment, two training methods are introduced; for convenience of distinction, the neural networks obtained by the two training methods are called the first neural network and the second neural network, respectively.

As shown in FIG. 2A, to train the first neural network offline, training images in the first data format and training images in the second data format can be collected and stored in association to obtain an image data set. The image data set is output to the first neural network, which uses it to train the neural network parameters of the first neural network.

The process of obtaining the image data set may include:

Method 1: For the same frame, imaging device A collects training image A1 in the first data format while imaging device B synchronously collects training image B1 in the second data format; training image A1 and training image B1 are then stored in association. Similarly, imaging device A collects training image A2 and imaging device B collects training image B2, which are stored in association. By analogy, the final image data set may include the correspondence between each pair of training images An and Bn.

Method 2: Imaging device A collects training image A1 in the first data format and processes it (e.g., white balance correction, color interpolation, curve mapping; the processing method is not limited) to obtain training image A1' in the second data format; training image A1 and training image A1' are stored in association. Similarly, imaging device A collects training image A2 in the first data format and processes it to obtain training image A2' in the second data format, storing A2 and A2' in association. By analogy, the final image data set may include the correspondence between each pair of training images An and An'.
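A minimal Python sketch of the associated storage in Method 2; capture_raw and isp_process are hypothetical stand-ins for the acquisition and processing steps, and the image sizes and pair count are illustrative assumptions.

```python
import numpy as np

def capture_raw():
    # Hypothetical stand-in for imaging device A capturing a first-data-format
    # (e.g. Bayer) training image.
    return np.random.randint(0, 1024, size=(64, 64), dtype=np.uint16)

def isp_process(raw):
    # Hypothetical stand-in for white balance correction, color interpolation,
    # curve mapping, etc., yielding a second-data-format (RGB) counterpart.
    return np.stack([raw, raw, raw], axis=-1).astype(np.float32) / 1023.0

dataset = []
for _ in range(100):                         # 100 pairs as an illustrative size
    raw = capture_raw()                      # training image An
    dataset.append((raw, isp_process(raw)))  # stored in association with An'
```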
Clearly, either of the above two methods yields an image data set that includes the correspondence between training images in the first data format and training images in the second data format. Based on this image data set, the pre-designed first neural network can be trained, i.e., the neural network parameters are trained so that the first neural network fits the mapping between images in the first data format and images in the second data format. The training method is not limited; back propagation, resilient propagation, conjugate gradient, and similar methods may be used.
After the training of the first neural network is completed, the original image can be input into the trained first neural network so that the first neural network converts the original image in the first data format into an intermediate image in the second data format. Moreover, after training is completed, the first neural network can also be adjusted online. That is, the original image in the first data format and the target image in the second data format can be used to re-optimize the neural network parameters of the first neural network; the online adjustment process of the first neural network is not limited here.

As shown in FIG. 2B, to train the second neural network offline, training images in the first data format and training images in the second data format can be collected, and the device parameters (i.e., the parameters of the imaging device that collects the training images in the first data format) can be obtained. The training images in the first data format, the training images in the second data format, and the device parameters are stored in association to obtain an image data set, which is output to the second neural network; the second neural network uses this image data set to train its neural network parameters.

The process of obtaining the image data set may include, but is not limited to:

Method 1: For the same frame, imaging device A collects training image A1 in the first data format and its device parameters 1 are obtained, while imaging device B synchronously collects training image B1 in the second data format; training image A1, device parameters 1, and training image B1 are then stored in association. Similarly, imaging device A collects training image A2 and device parameters 2 are obtained, imaging device B collects training image B2, and training image A2, device parameters 2, and training image B2 are stored in association. By analogy, the final image data set may include the correspondence between each group of training image An, device parameters n, and training image Bn.

The above device parameters may be fixed parameters independent of the environment (such as sensor sensitivity) or shooting parameters related to the environment (such as aperture size). For example, the device parameters may include, but are not limited to, one or any combination of the following: sensor sensitivity, dynamic range, signal-to-noise ratio, pixel size, target surface size, resolution, frame rate, number of pixels, spectral response, photoelectric response, array pattern, lens aperture diameter, focal length, aperture size, hood model, filter diameter, field of view, and so on; no limitation is imposed here.

Method 2: Imaging device 1 collects training image set 1 (including a large number of training images) in the first data format, its device parameters 1 are obtained, and each training image in training image set 1 is processed (e.g., white balance correction, color interpolation, curve mapping; not limited) to obtain training image set 1 in the second data format; the first-data-format training image set 1, device parameters 1, and second-data-format training image set 1 are stored in association. Similarly, imaging device 2 collects training image set 2 in the first data format, its device parameters 2 are obtained, and each training image in training image set 2 is processed to obtain training image set 2 in the second data format; the first-data-format training image set 2, device parameters 2, and second-data-format training image set 2 are stored in association. By analogy, the final image data set may include the correspondence between each group's first-data-format training image set K, device parameters K, and second-data-format training image set K, as shown in FIG. 2C.

Either of the above two methods yields an image data set that includes the correspondence between training images in the first data format, device parameters, and training images in the second data format. Based on this image data set, the pre-designed second neural network can be trained, i.e., the neural network parameters are trained so that the second neural network fits the mapping between images in the first data format plus device parameters and images in the second data format. The training method is not limited; back propagation, resilient propagation, conjugate gradient, and similar methods may be used. Note that the mapping fitted by the second neural network includes the device parameters.

After the training of the second neural network is completed, the original image and the device parameters of the device that collected it can be obtained and input into the trained second neural network, so that the second neural network converts the data format of the original image from the first data format to the second data format according to the device parameters, thereby obtaining an intermediate image in the second data format.
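One possible way to feed the device parameters into the second neural network is to broadcast them to constant feature planes and concatenate them with the raw input as extra channels, as in the sketch below. This fusion scheme is an assumption; the disclosure does not fix how the device parameters enter the network.

```python
import torch
import torch.nn as nn

class RawToRGBWithParams(nn.Module):
    """Sketch of a second-network-style model. The device parameters are
    broadcast to constant per-pixel planes and concatenated with the raw
    image as extra input channels; this fusion scheme and the layer sizes
    are assumptions for illustration."""
    def __init__(self, num_params=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1 + num_params, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, raw, params):  # raw: (N,1,H,W), params: (N,num_params)
        n, _, h, w = raw.shape
        planes = params.view(n, -1, 1, 1).expand(n, params.shape[1], h, w)
        return self.body(torch.cat([raw, planes], dim=1))

net = RawToRGBWithParams()
out = net(torch.randn(2, 1, 64, 64), torch.randn(2, 4))  # e.g. sensitivity, aperture, ...
```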
Under the above application scenarios, the image processing method is described below with reference to several specific embodiments.

FIG. 3 is a schematic flowchart of the image processing method, which may include the following steps.

Step 301: Obtain an original image in the first data format.

In one example, an optical signal in a first wavelength range may be sampled to obtain the original image; or an optical signal in a second wavelength range may be sampled to obtain the original image; or optical signals in both the first wavelength range and the second wavelength range may be sampled to obtain the original image. The first and second wavelength ranges are merely examples and are not restrictive; for example, the first wavelength range may be the visible-light range of 380 nm to 780 nm, and the second wavelength range may be the infrared range of 780 nm to 2500 nm.

Step 302: Use the trained neural network to convert the original image into an intermediate image in the second data format. Referring to the above embodiments, the original image may be input into the trained first neural network so that the first neural network converts the data format of the original image from the first data format to the second data format; or the device parameters of the device that collected the original image in the first data format are obtained first, and then the original image and the device parameters are input into the trained second neural network so that the second neural network converts the original image in the first data format into an intermediate image in the second data format according to the device parameters.

Step 303: Perform noise reduction processing on the intermediate image in the second data format using a cached image to obtain a target image in the second data format, where the cached image includes the target image corresponding to the previous frame of original image adjacent to the current original image. After step 303, the cached image used for the next frame of original image may be updated to this target image.

For example, for the first frame of original image 1, after target image 1 of original image 1 is obtained (the way target image 1 is obtained is not limited; for example, intermediate image 1 of original image 1 may be taken directly as target image 1), the cached image in the cache is updated to target image 1. For the second frame of original image 2, after intermediate image 2 of original image 2 is obtained, the cached image (i.e., target image 1) is used to denoise intermediate image 2 to obtain target image 2, and the cached image in the cache is updated to target image 2, so that target image 1 is no longer the cached image. For the third frame of original image 3, after intermediate image 3 of original image 3 is obtained, the cached image (i.e., target image 2) is used to denoise intermediate image 3 to obtain target image 3, and the cached image in the cache is updated to target image 3, so that target image 2 is no longer the cached image.

By analogy, the target image corresponding to the previous frame of original image is used to continually update the cached image in the cache, so that the intermediate image corresponding to the current frame of original image can be denoised to obtain the target image corresponding to the current frame.
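The cache update described above can be sketched as a simple loop; net and denoise stand for the neural-network conversion (step 302) and the noise reduction (step 303), and are passed in as functions.

```python
def process_stream(raw_frames, net, denoise):
    """Sketch of the per-frame cache update: the target image of each frame
    becomes the cached image used to denoise the next frame."""
    cached = None
    targets = []
    for raw in raw_frames:
        intermediate = net(raw)                 # step 302
        # First frame: no cached image exists yet, so the intermediate image
        # is taken directly as the target image (one option the text allows).
        target = intermediate if cached is None else denoise(intermediate, cached)
        targets.append(target)
        cached = target                         # update the cache for the next frame
    return targets
```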
利用缓存图像对中间图像进行降噪处理,得到第二数据格式的目标图像,包括但不限于:根据中间图像和缓存图像获取中间图像中的每个像素点的运动估计值;根据运动估计值将中间图像转换为第二数据格式的目标图像。The buffer image is used to perform noise reduction processing on the intermediate image to obtain a target image in a second data format, including but not limited to: obtaining a motion estimation value of each pixel in the intermediate image according to the intermediate image and the buffer image; The intermediate image is converted into a target image in a second data format.
由以上技术方案可见,本公开实施例中,在利用神经网络将第一数据格式的原始图像转换为第二数据格式的中间图像后,可以利用缓存图像对中间图像进行降噪处理,得到第二数据格式的目标图像。由于缓存图像是与原始图像相邻的上一帧原始图像对应的目标图像,相邻两帧图像(即缓存图像和中间图像)在时间和空间上紧密相关。可以利 用相邻两帧图像的相关性来区分中间图像中的信号和噪声,对中间图像进行降噪处理,从而可以有效去除噪声。因此在光照条件较差时也能够有效去除噪声,使得目标图像中的噪声可以得到有效抑制。As can be seen from the above technical solutions, in the embodiment of the present disclosure, after the original image in the first data format is converted into the intermediate image in the second data format by using a neural network, the buffer image can be used to perform noise reduction processing on the intermediate image to obtain the second The target image in the data format. Since the cached image is the target image corresponding to the previous frame of the original image adjacent to the original image, the two adjacent frames (ie, the cached image and the intermediate image) are closely related in time and space. You can use the correlation between two adjacent frames to distinguish the signal and noise in the intermediate image, and perform noise reduction processing on the intermediate image to effectively remove the noise. Therefore, noise can be effectively removed even in poor lighting conditions, so that the noise in the target image can be effectively suppressed.
在一个实施例中,上述方法流程可以由图像处理装置100执行,如图4A所示,该图像处理装置100可以包含3个模块:图像处理模块101、视频处理模块102和训练学习模块103。其中,图像处理模块101用于执行上述步骤301和步骤302,视频处理模块102用于执行上述步骤303。In one embodiment, the above method flow may be executed by the image processing apparatus 100. As shown in FIG. 4A, the image processing apparatus 100 may include three modules: an image processing module 101, a video processing module 102, and a training and learning module 103. The image processing module 101 is configured to perform the foregoing steps 301 and 302, and the video processing module 102 is configured to perform the foregoing step 303.
其中,训练学习模块103可以是离线模块,预先利用图像数据集对神经网络进行训练调整,再将训练调整后的神经网络输出到图像处理模块101,可以执行上述实施例中介绍的第一神经网络和第二神经网络的离线训练。The training and learning module 103 may be an offline module. The neural network is trained and adjusted in advance using the image data set, and the trained and adjusted neural network is output to the image processing module 101. The first neural network described in the above embodiment may be executed. And offline training of the second neural network.
图像处理模块101从训练学习模块103获取预先调整好的神经网络,基于该神经网络对输入的第一数据格式的原始图像进行处理,输出第二数据格式的中间图像。The image processing module 101 obtains a pre-adjusted neural network from the training and learning module 103, processes the original image of the inputted first data format based on the neural network, and outputs the intermediate image of the second data format.
视频处理模块102接收图像处理模块101输出的一帧第二数据格式的中间图像,结合缓存中存储的缓存图像的信息进行降噪处理,得到目标图像,并将处理后的目标图像存入缓存中作为下一帧所对应的缓存图像,其中,视频处理模块102可以在缓存中记录一帧图像作为缓存图像。The video processing module 102 receives a frame of intermediate image in the second data format output by the image processing module 101, performs noise reduction processing in combination with the information of the cached image stored in the cache, obtains the target image, and stores the processed target image in the cache. As a cache image corresponding to the next frame, the video processing module 102 may record a frame of the image in the cache as a cache image.
在一个实施例中,如图4B所示,视频处理模块102由运动估计单元201和时域处理单元202构成。在这种情况下,可由运动估计单元201执行步骤S1,由时域处理单元202执行步骤S2,由运动估计单元201和时域处理单元202实现上述步骤303。In one embodiment, as shown in FIG. 4B, the video processing module 102 is composed of a motion estimation unit 201 and a time domain processing unit 202. In this case, step S1 may be performed by the motion estimation unit 201, step S2 may be performed by the time domain processing unit 202, and step 303 described above may be implemented by the motion estimation unit 201 and the time domain processing unit 202.
步骤S1,根据该中间图像和缓存图像获取该中间图像中的每个像素点的运动估计值。其中,缓存图像包括与该原始图像相邻的上一帧原始图像对应的目标图像。In step S1, a motion estimation value of each pixel in the intermediate image is obtained according to the intermediate image and the cache image. The cache image includes a target image corresponding to a previous frame of the original image adjacent to the original image.
步骤S2,根据所述运动估计值将中间图像转换为第二数据格式的目标图像。In step S2, the intermediate image is converted into a target image in a second data format according to the motion estimation value.
当然,上述由运动估计单元201和时域处理单元202组成的视频处理模块102只是一个示例,在实际应用中,视频处理模块102还可以包括其它单元,如噪声估计单元、空域处理单元等,噪声估计单元用于对图像进行噪声估计,而空域处理单元用于对图像进行空域处理,对此处理过程不做限制。Of course, the above-mentioned video processing module 102 composed of the motion estimation unit 201 and the time-domain processing unit 202 is just an example. In practical applications, the video processing module 102 may further include other units, such as a noise estimation unit, a spatial processing unit, and the like. The estimation unit is configured to perform noise estimation on the image, and the spatial processing unit is configured to perform spatial processing on the image, and there is no limitation on the processing process.
在另一个实施例中,如图4C所示,视频处理模块102由运动估计单元201、时域处理单元202、噪声估计单元203和空域处理单元204构成。在这种情况下,可由上述运动估计单元201、时域处理单元202、噪声估计单元203和空域处理单元204实现上述步骤303。或者,如图4D所示,视频处理模块102由运动估计单元201、时域处理单元202和空域处理单元204构成。在这种情况下,可由上述运动估计单元201、时域处理单元202和空域处理单元204实现上述步骤303。In another embodiment, as shown in FIG. 4C, the video processing module 102 is composed of a motion estimation unit 201, a time-domain processing unit 202, a noise estimation unit 203, and a spatial-domain processing unit 204. In this case, the aforementioned step 303 may be implemented by the aforementioned motion estimation unit 201, time domain processing unit 202, noise estimation unit 203, and spatial domain processing unit 204. Alternatively, as shown in FIG. 4D, the video processing module 102 is composed of a motion estimation unit 201, a time domain processing unit 202, and a spatial domain processing unit 204. In this case, the above-mentioned step 303 may be implemented by the above-mentioned motion estimation unit 201, time-domain processing unit 202, and space-domain processing unit 204.
In one embodiment, step S1 is implemented through steps 4031 and 4032, specifically:

Step 4031: Obtain a correlation image according to the intermediate image and the cached image. The correlation image is computed from the pixel values at corresponding positions of the intermediate image and the cached image according to a preset calculation method, which may be a frame-difference method, a convolution method, a cross-correlation method, or the like, without limitation.

Obtaining the correlation image according to the intermediate image and the cached image may include, but is not limited to:

Method 1: Compute the correlation image with the frame-difference method, i.e., take the difference of the pixel values of the intermediate image and the cached image at each pixel to obtain the correlation image. For example, for each pixel of the correlation image, obtain the first pixel value of that pixel in the intermediate image and the second pixel value of that pixel in the cached image, and take the difference between the first pixel value and the second pixel value as the pixel value of that pixel in the correlation image.

For example, suppose the intermediate image, the cached image, and the correlation image are all of size 6*4, where 6 is the number of pixels in the horizontal direction and 4 the number in the vertical direction. Of course, 6 and 4 are just an example: in practical applications the number of horizontal pixels is far greater than 6 and the number of vertical pixels far greater than 4, without limitation, but the intermediate image, the cached image, and the correlation image have the same size.

Further, suppose the pixels of the intermediate image are A11-A16, A21-A26, A31-A36, A41-A46, the pixels of the cached image are B11-B16, B21-B26, B31-B36, B41-B46, and the pixels of the correlation image are C11-C16, C21-C26, C31-C36, C41-C46.

In this scenario, since the pixel value of every pixel of the intermediate image and of the cached image is known, the pixel value of each pixel of the correlation image can be computed as follows: for pixel C11 of the correlation image, obtain the first pixel value of C11 in the intermediate image (the pixel value of A11) and the second pixel value of C11 in the cached image (the pixel value of B11), and then take the difference between the first and second pixel values as the pixel value of C11 in the correlation image. The other pixels of the correlation image are processed in the same way as C11, so the details are not repeated.
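A minimal NumPy sketch of Method 1, computing the signed per-pixel difference (e.g., C11 = A11 - B11); casting to a signed type to avoid underflow with unsigned inputs is an implementation choice.

```python
import numpy as np

def correlation_frame_diff(intermediate, cached):
    """Method 1: per-pixel difference between the intermediate image and the
    cached image; the result has the same size as both inputs."""
    return intermediate.astype(np.int32) - cached.astype(np.int32)
```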
Method 2: Compute the correlation image with the convolution method, i.e., convolve image blocks of the intermediate image and the cached image to obtain the correlation image, the block size being preset, e.g., 3*3. For example, for each pixel of the correlation image, select the first image block corresponding to that pixel from the intermediate image, select the second image block corresponding to that pixel from the cached image, and take the convolution value of the first image block with the second image block (i.e., the convolution value of the two matrices) as the pixel value of that pixel in the correlation image.

For example, suppose the pixels of the intermediate image are A11-A16, A21-A26, A31-A36, A41-A46, the pixels of the cached image are B11-B16, B21-B26, B31-B36, B41-B46, and the pixels of the correlation image are C11-C16, C21-C26, C31-C36, C41-C46.

In this scenario, since the pixel value of every pixel of the intermediate image and of the cached image is known, the pixel value of each pixel of the correlation image can be computed as follows: for pixel C11 of the correlation image, select from the intermediate image the first image block corresponding to C11, a 3*3 matrix whose first row contains pixels A11, A12, and A13, second row A21, A22, and A23, and third row A31, A32, and A33; select from the cached image the second image block corresponding to C11, a 3*3 matrix whose first row contains pixels B11, B12, and B13, second row B21, B22, and B23, and third row B31, B32, and B33. Then compute the convolution value of the first image block with the second image block; this convolution value is the pixel value of C11. The other pixels of the correlation image are processed in the same way as C11, so the details are not repeated.
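A minimal NumPy sketch of Method 2 for single-channel images, taking the "convolution value" of co-located blocks as the sum of their elementwise products; replicate padding at the borders is an assumption, since the disclosure does not specify how edge pixels are handled.

```python
import numpy as np

def correlation_patch_conv(intermediate, cached, s=3):
    """Method 2: for each pixel, take the s*s block around it in both images
    and use the sum of their elementwise products as the correlation value."""
    pad = s // 2
    a = np.pad(intermediate.astype(np.float64), pad, mode='edge')
    b = np.pad(cached.astype(np.float64), pad, mode='edge')
    h, w = intermediate.shape
    corr = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            corr[y, x] = np.sum(a[y:y + s, x:x + s] * b[y:y + s, x:x + s])
    return corr
```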
Of course, the above Method 1 and Method 2 are just two examples of obtaining the correlation image; other methods may also be used, such as computing the correlation image with the cross-correlation method, without limitation.

Step 4032: Obtain the motion estimation value of each pixel in the intermediate image according to the correlation image, where the motion estimation values may be binarized or continuous.

For example, at least one of smoothing, mapping, and thresholding may be used to obtain the motion estimation value of each pixel in the intermediate image. The smoothing may include image filtering operations with smoothing characteristics, such as mean filtering, median filtering, and Gaussian filtering; the filtering operation is not limited. The mapping may include linear scaling and translation operations. The thresholding may include determining the motion estimation value according to the relationship between the pixel value and a threshold, confining the motion estimation value to within the range delimited by the threshold; no limitation is imposed here.

The process of obtaining the motion estimation value of each pixel may include, but is not limited to:

Method 1: If the motion estimation values are binarized and obtained through smoothing and thresholding, the correlation image may be subjected to mean filtering and thresholding to obtain the motion estimation value of each pixel in the intermediate image.

For example, for each pixel of the intermediate image, a third image block corresponding to that pixel may be selected from the correlation image and mean-filtered to obtain the pixel value corresponding to that pixel; if this pixel value is greater than the threshold, the motion estimation value of the pixel is determined to be a first value (e.g., 1), and if it is not greater than the threshold, the motion estimation value is determined to be a second value (e.g., 0).

For example, suppose the pixels of the intermediate image are A11-A16, A21-A26, A31-A36, A41-A46, and the pixels of the correlation image are C11-C16, C21-C26, C31-C36, C41-C46. Since the correlation image is known (see step 4031), the pixel value of each of its pixels is known, and the motion estimation value of each pixel in the intermediate image can be computed as follows:

For pixel A11 of the intermediate image, a third image block corresponding to A11 may be selected from the correlation image; the third image block may be a 3*3 matrix (or a matrix of another size, without limitation) whose first row contains pixels C11, C12, and C13, second row C21, C22, and C23, and third row C31, C32, and C33. Then the 9 pixels of the third image block are mean-filtered (i.e., the mean of the 9 pixel values is computed), and the filtering result is the pixel value corresponding to A11. If this pixel value is greater than the threshold, the motion estimation value of A11 may be 1; if it is not greater than the threshold, the motion estimation value of A11 may be 0. The other pixels of the intermediate image are processed in the same way as A11, so the details are not repeated.
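A minimal sketch of Method 1, assuming a 3*3 mean-filter window; taking the absolute value first (useful when the correlation image is a signed frame difference) is also an assumption.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def motion_estimate_binary(corr, threshold):
    """Method 1: 3*3 mean filtering of the correlation image followed by
    thresholding; 1 marks a moving pixel, 0 a static one."""
    smoothed = uniform_filter(np.abs(corr).astype(np.float64), size=3)
    return np.where(smoothed > threshold, 1.0, 0.0)
```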
Method 2: If the motion estimation values are continuous and obtained through smoothing and mapping, the correlation image may be subjected to median filtering and linear scaling to obtain the motion estimation value of each pixel in the intermediate image.

For example, after median filtering and linear scaling are applied to the pixel value of each pixel of the correlation image, a filtered value within a specific interval (such as the interval [0, 1]) is obtained; then, for each pixel of the intermediate image, the corresponding filtered value within that interval is obtained and determined to be the motion estimation value of that pixel, i.e., the motion estimation value is also a value within the specific interval (such as [0, 1]).

For example, suppose the pixels of the intermediate image are A11-A16, A21-A26, A31-A36, A41-A46, and the pixels of the correlation image are C11-C16, C21-C26, C31-C36, C41-C46. Since the pixel value of each pixel of the correlation image is known, the motion estimation value of each pixel in the intermediate image can be computed as follows:

First, median-filter the pixel value of each pixel in the correlation image and linearly scale the filtered pixel values into the range [0, 1] to obtain filtered values within the interval [0, 1]. Supposing the pixel values range from 0 to 255, the pixel value of each pixel can be divided by 255 to obtain a filtered value within [0, 1]. For example, dividing the pixel value 10 of pixel C11 by 255 gives the filtered value 0.039, dividing the pixel value 50 of pixel C12 by 255 gives the filtered value 0.196, and so on. Then, pixel A11 of the intermediate image corresponds to the filtered value 0.039 of pixel C11, i.e., the motion estimation value of A11 is 0.039; pixel A12 corresponds to the filtered value 0.196 of pixel C12, i.e., the motion estimation value of A12 is 0.196; and so on.
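A minimal sketch of Method 2, assuming a 3*3 median-filter window and a 0-255 input range for the linear scaling:

```python
import numpy as np
from scipy.ndimage import median_filter

def motion_estimate_continuous(corr, max_value=255.0):
    """Method 2: median filtering of the correlation image followed by
    linear scaling into [0, 1] (dividing by the assumed 0-255 range)."""
    smoothed = median_filter(np.abs(corr).astype(np.float64), size=3)
    return np.clip(smoothed / max_value, 0.0, 1.0)
```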
当然,上述方式一和方式二只是获取中间图像中的每个像素点的运动估计值的两个示例,还可以采用其它方式获取运动估计值,对此不做限制。Of course, the above manners 1 and 2 are only two examples of acquiring the motion estimation value of each pixel in the intermediate image, and other manners can also be used to acquire the motion estimation value, which is not limited.
在一个实施例中,可通过步骤4041和步骤4042来实现上述步骤S2,当然,还可以采用其它方式获取第二数据格式的目标图像,对此不做限制,只要能够根据运动估计值得到目标图像即可。步骤4041和步骤4042具体为:In one embodiment, the above step S2 can be implemented through steps 4041 and 4042. Of course, the target image in the second data format can also be obtained by other methods, which is not limited as long as the target image can be obtained based on the motion estimation value. Just fine. Step 4041 and step 4042 are specifically:
步骤4041,根据该中间图像和该缓存图像获取低噪图像,低噪图像是噪声程度低于中间图像的图像,对其获取方式不做限制,例如低噪图像可以是中间图像和缓存图像的均值图像。Step 4041: Obtain a low-noise image according to the intermediate image and the cached image. The low-noise image is an image with a lower noise level than the intermediate image, and there is no restriction on the acquisition method. For example, the low-noise image may be the average of the intermediate image and the cached image. image.
例如,假设中间图像的像素点依次是A11-A16、A21-A26、A31-A36、A41-A46,假设缓存图像的像素点依次是B11-B16、B21-B26、B31-B36、B41-B46,假设低噪图像的像素点依次是D11-D16、D21-D26、D31-D36、D41-D46。基于此,由于中间图像和缓存图像的每个像素点的像素值为已知,计算低噪图像的每个像素点的像素值可以采用如下方式:For example, assume that the pixels of the intermediate image are A11-A16, A21-A26, A31-A36, A41-A46, and the pixels of the cache image are B11-B16, B21-B26, B31-B36, and B41-B46. Assume that the pixels of the low-noise image are D11-D16, D21-D26, D31-D36, and D41-D46 in this order. Based on this, since the pixel value of each pixel point of the intermediate image and the cache image is known, the pixel value of each pixel point of the low-noise image can be calculated as follows:
针对低噪图像中的像素点D11,获取像素点D11在中间图像对应的像素值(像素点A11的像素值)、像素点D11在缓存图像对应的像素值(像素点B11的像素值),将上述两个像素值的均值确定为像素点D11在低噪图像的像素值。对于低噪图像中的其它像素点,处理方式参见像素点D11,不再重复赘述。For the pixel point D11 in the low-noise image, obtain the pixel value corresponding to the pixel point D11 in the intermediate image (the pixel value of the pixel point A11) and the pixel value corresponding to the pixel point D11 in the cache image (the pixel value of the pixel point B11). The average value of the above two pixel values is determined as the pixel value of the pixel point D11 in the low-noise image. For other pixels in the low-noise image, refer to pixel D11 for the processing method, and the details will not be repeated.
Step 4042: obtain the target image according to the intermediate image, the low-noise image, and the motion estimation values.
In an example, obtaining the target image according to the intermediate image, the low-noise image, and the motion estimation values may include, but is not limited to: determining the pixel value of a first pixel in the target image, the first pixel being any pixel in the target image, where the pixel value of the first pixel is determined in the following manner: determining a first weight of the first pixel in the intermediate image and a second weight of the first pixel in the low-noise image according to the motion estimation value of the first pixel; determining the pixel value of the first pixel in the target image according to the pixel value corresponding to the first pixel in the intermediate image and the first weight, and the pixel value corresponding to the first pixel in the low-noise image and the second weight; and determining the target image according to the pixel values of all pixels of the target image.
Assuming the motion estimation value of the first pixel is A, the first weight of the first pixel in the intermediate image may be A, and the second weight of the first pixel in the low-noise image may be (1-A). Assuming further that the pixel value corresponding to the first pixel in the intermediate image is N and the pixel value corresponding to the first pixel in the low-noise image is M, the pixel value of the first pixel in the target image is N*A + M*(1-A). Of course, this is merely an example; the pixel value of the first pixel in the target image may also be obtained in other manners, and no limitation is imposed in this regard.
For example, suppose the pixels of the intermediate image are, in order, A11-A16, A21-A26, A31-A36, A41-A46; the pixels of the low-noise image are, in order, D11-D16, D21-D26, D31-D36, D41-D46; and the pixels of the target image are, in order, E11-E16, E21-E26, E31-E36, E41-E46. Since the pixel value of every pixel in the intermediate image and the low-noise image is known, the pixel value of each pixel in the target image may be calculated as follows:
For pixel E11 in the target image, determine the first weight as the motion estimation value of E11 and the second weight as 1 minus the motion estimation value of E11, and then compute: pixel value of A11 * first weight + pixel value of D11 * second weight; the result of this calculation is the pixel value of E11. The other pixels in the target image are processed in the same way as E11, and the details are not repeated here.
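Likewise, the per-pixel weighted blend of step 4042 may be sketched as follows, assuming the motion estimation values have already been normalized to the range [0, 1] per pixel; the helper name blend_target_image is again hypothetical.

```python
import numpy as np

def blend_target_image(intermediate: np.ndarray,
                       low_noise: np.ndarray,
                       motion: np.ndarray) -> np.ndarray:
    """Blend the intermediate and low-noise images per pixel.

    A pixel with a large motion estimate (close to 1) takes its value mostly
    from the current intermediate image; a pixel with a small motion estimate
    (close to 0) takes its value mostly from the temporally averaged low-noise
    image, i.e. target = motion * N + (1 - motion) * M.
    """
    motion = np.clip(motion.astype(np.float32), 0.0, 1.0)
    target = (motion * intermediate.astype(np.float32)
              + (1.0 - motion) * low_noise.astype(np.float32))
    return target.astype(intermediate.dtype)
```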
As can be seen from the above embodiments, a larger motion estimation value means a larger first weight and a smaller second weight, i.e., the pixel value of the intermediate image takes a larger proportion and the pixel value of the low-noise image a smaller one; conversely, a smaller motion estimation value means a smaller first weight and a larger second weight, i.e., the pixel value of the intermediate image takes a smaller proportion and the pixel value of the low-noise image a larger one.
Based on this principle, when the motion estimation value of pixel E11 is relatively large (e.g., greater than 0.5), the first weight is greater than the second weight. A relatively large motion estimation value indicates that pixel E11 has changed significantly compared with the previous frame, i.e., the pixel value of the current frame is more reliable. Clearly, since the first weight is greater than the second weight, the pixel value of the intermediate image of the current frame takes the larger proportion, which matches the case of a large motion estimation value; in this way, the pixel value of E11 in the target image is more accurate.
In the above embodiments, the pixel value of a pixel may include, but is not limited to, the gray value, luminance value, or chrominance value of the pixel; the type of pixel value is not limited and may depend on the actual image processing.
It can be seen from the above technical solutions that, in the embodiments of the present disclosure, after the original image in the first data format is converted into the intermediate image in the second data format by using a neural network, the cached image can be used to perform noise reduction processing on the intermediate image to obtain the target image in the second data format. Since the cached image is the target image corresponding to the previous frame of original image adjacent to the original image, the two adjacent frames (i.e., the cached image and the intermediate image) are closely correlated in time and space. This correlation between adjacent frames can be used to distinguish signal from noise in the images and to perform noise reduction processing on the intermediate image. Noise can therefore be effectively removed even under poor lighting conditions, so that noise in the target image is effectively suppressed.
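To make the recursion explicit, the following non-authoritative sketch runs the noise-reduction stage over a stream of intermediate images; the names denoise_stream and estimate_motion are hypothetical, the manner of obtaining the motion estimation values is left to the caller (the disclosure permits several), and the pass-through handling of the first frame is an assumption not stated in the text.

```python
import numpy as np

def denoise_stream(frames, estimate_motion):
    """Recursive temporal denoising over a sequence of intermediate images.

    `frames` yields intermediate images (second data format) produced by the
    neural network; `estimate_motion(intermediate, cached)` must return
    per-pixel motion estimation values in [0, 1]. The first frame has no
    cached image and is passed through unchanged.
    """
    cached = None
    for intermediate in frames:
        if cached is None:
            target = intermediate  # no previous target image available yet
        else:
            inter_f = intermediate.astype(np.float32)
            low_noise = (inter_f + cached.astype(np.float32)) / 2.0   # step 4041
            motion = np.clip(estimate_motion(intermediate, cached), 0.0, 1.0)
            target = (motion * inter_f                                 # step 4042
                      + (1.0 - motion) * low_noise).astype(intermediate.dtype)
        cached = target  # the target image becomes the cached image for the next frame
        yield target
```

Because each target image is fed back as the cached image for the next frame, noise suppression accumulates over static regions, while regions with large motion estimates continue to follow the current frame.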
Based on the same concept as the above method, an embodiment of the present disclosure further provides an image processing apparatus. As shown in FIG. 5, which is a structural diagram of the image processing apparatus, the image processing apparatus includes:
an image processing module 501, configured to obtain an original image in a first data format and convert the original image into an intermediate image in a second data format by using a neural network; and
a video processing module 502, configured to perform noise reduction processing on the intermediate image by using a cached image to obtain a target image in the second data format, wherein the cached image includes a target image corresponding to a previous frame of original image adjacent to the original image.
When using a neural network to convert the original image into the intermediate image in the second data format, the image processing module 501 is specifically configured to:
input the original image into a first neural network, so that the first neural network converts the original image in the first data format into the intermediate image in the second data format; or
obtain device parameters of the device that captures the original image in the first data format; and
input the original image and the device parameters into a second neural network, so that the second neural network converts the original image in the first data format into the intermediate image in the second data format according to the device parameters.
When obtaining the motion estimation value of each pixel in the intermediate image according to the intermediate image and the cached image, the video processing module 502 is specifically configured to: obtain a correlation image according to the intermediate image and the cached image; and obtain the motion estimation value of each pixel in the intermediate image according to the correlation image.
The video processing module 502 obtains the motion estimation value of each pixel in the intermediate image by using at least one of smoothing processing, mapping processing, and threshold processing.
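By way of illustration only, one plausible combination of smoothing, mapping, and threshold processing over a correlation image is sketched below; the absolute-difference correlation image, the Gaussian smoothing, and the linear-mapping constants low and high are all assumptions, since the disclosure does not fix the concrete operators.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_motion(intermediate: np.ndarray, cached: np.ndarray,
                    sigma: float = 1.5,
                    low: float = 0.05, high: float = 0.5) -> np.ndarray:
    """Hypothetical per-pixel motion estimate in [0, 1].

    1. Correlation image: absolute difference between the two frames.
    2. Smoothing: Gaussian filtering to suppress noise-only differences.
    3. Mapping + thresholding: linearly rescale between `low` and `high`
       and clip, so small differences map to 0 and large ones to 1.
    """
    diff = np.abs(intermediate.astype(np.float32) - cached.astype(np.float32))
    diff /= max(float(diff.max()), 1e-6)        # normalize to [0, 1]
    smoothed = gaussian_filter(diff, sigma=sigma)
    motion = (smoothed - low) / (high - low)    # linear mapping
    return np.clip(motion, 0.0, 1.0)            # threshold to [0, 1]
```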
When converting the intermediate image into the target image in the second data format according to the motion estimation values, the video processing module 502 is specifically configured to: obtain a low-noise image according to the intermediate image and the cached image; and obtain the target image according to the intermediate image, the low-noise image, and the motion estimation values.
When obtaining the target image according to the intermediate image, the low-noise image, and the motion estimation values, the video processing module 502 is specifically configured to:
determine a pixel value of a first pixel in the target image, the first pixel being any pixel in the target image, where the pixel value of the first pixel is determined in the following manner: determining, according to the motion estimation value corresponding to the first pixel, a first weight of the first pixel in the intermediate image and a second weight of the first pixel in the low-noise image; and determining the pixel value of the first pixel in the target image according to the pixel value corresponding to the first pixel in the intermediate image and the first weight, and the pixel value corresponding to the first pixel in the low-noise image and the second weight; and
determine the target image according to the pixel values of all pixels of the target image.
In terms of hardware, a schematic diagram of the hardware architecture of the image processing device provided by the embodiments of the present disclosure may be as shown in FIG. 6, which includes a processor 601 and a machine-readable storage medium 602, wherein the machine-readable storage medium 602 stores machine-executable instructions executable by the processor 601, and the processor 601 is configured to execute the machine-executable instructions to implement the image processing method disclosed in the above examples of the present disclosure.
The machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device, and may contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (such as a hard disk drive), a solid-state drive, any type of storage disk (such as an optical disc or a DVD), a similar storage medium, or a combination thereof.
The system, apparatus, module, or unit described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described with its functions divided into various units. Of course, when implementing the present disclosure, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present disclosure. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or the other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Furthermore, these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or the other programmable device to produce computer-implemented processing; the instructions executed on the computer or the other programmable device thereby provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
The above are merely embodiments of the present disclosure and are not intended to limit the present disclosure. Those skilled in the art may make various modifications and variations to the present disclosure. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall fall within the scope of the claims of the present disclosure.

Claims (15)

  1. An image processing method, comprising:
    obtaining an original image in a first data format;
    converting the original image into an intermediate image in a second data format by using a neural network; and
    performing noise reduction processing on the intermediate image by using a cached image to obtain a target image in the second data format, wherein the cached image comprises a target image corresponding to a previous frame of original image adjacent to the original image.
  2. The method according to claim 1, wherein the neural network comprises:
    at least one convolutional layer, at least one pooling layer, and at least one fully connected layer; or
    at least one convolutional layer and at least one activation layer.
  3. The method according to claim 1, wherein converting the original image into the intermediate image in the second data format by using the neural network comprises:
    inputting the original image into a first neural network, so that the first neural network converts the original image in the first data format into the intermediate image in the second data format; or
    obtaining device parameters of a device that captures the original image in the first data format; and
    inputting the original image and the device parameters into a second neural network, so that the second neural network converts the original image in the first data format into the intermediate image in the second data format according to the device parameters.
  4. The method according to claim 1, wherein performing noise reduction processing on the intermediate image by using the cached image to obtain the target image in the second data format comprises:
    obtaining a motion estimation value of each pixel in the intermediate image according to the intermediate image and the cached image; and
    converting the intermediate image into the target image in the second data format according to the motion estimation values.
  5. The method according to claim 4, wherein the motion estimation value of each pixel in the intermediate image is obtained by using at least one of smoothing processing, mapping processing, and threshold processing.
  6. The method according to claim 4, wherein converting the intermediate image into the target image in the second data format according to the motion estimation values comprises:
    obtaining a low-noise image according to the intermediate image and the cached image; and
    obtaining the target image according to the intermediate image, the low-noise image, and the motion estimation values.
  7. The method according to claim 6, wherein obtaining the target image according to the intermediate image, the low-noise image, and the motion estimation values comprises:
    determining a pixel value of a first pixel in the target image, the first pixel being any pixel in the target image,
    wherein the pixel value of the first pixel is determined in the following manner:
    determining, according to the motion estimation value corresponding to the first pixel, a first weight of the first pixel in the intermediate image and a second weight of the first pixel in the low-noise image; and
    determining the pixel value of the first pixel in the target image according to the pixel value corresponding to the first pixel in the intermediate image and the first weight, and the pixel value corresponding to the first pixel in the low-noise image and the second weight; and
    determining the target image according to the pixel values of all pixels of the target image.
  8. An image processing apparatus, comprising:
    an image processing module, configured to obtain an original image in a first data format and convert the original image into an intermediate image in a second data format by using a neural network; and
    a video processing module, configured to perform noise reduction processing on the intermediate image by using a cached image to obtain a target image in the second data format, wherein the cached image comprises a target image corresponding to a previous frame of original image adjacent to the original image.
  9. The apparatus according to claim 8, wherein, when using a neural network to convert the original image into the intermediate image in the second data format, the image processing module is specifically configured to:
    input the original image into a first neural network, so that the first neural network converts the original image in the first data format into the intermediate image in the second data format; or
    obtain device parameters of a device that captures the original image in the first data format; and
    input the original image and the device parameters into a second neural network, so that the second neural network converts the original image in the first data format into the intermediate image in the second data format according to the device parameters.
  10. The apparatus according to claim 8, wherein, when performing noise reduction processing on the intermediate image by using the cached image to obtain the target image in the second data format, the video processing module is specifically configured to:
    obtain a motion estimation value of each pixel in the intermediate image according to the intermediate image and the cached image; and
    convert the intermediate image into the target image in the second data format according to the motion estimation values.
  11. The apparatus according to claim 10, wherein
    the video processing module obtains the motion estimation value of each pixel in the intermediate image by using at least one of smoothing processing, mapping processing, and threshold processing.
  12. The apparatus according to claim 10, wherein,
    when converting the intermediate image into the target image in the second data format according to the motion estimation values, the video processing module is specifically configured to:
    obtain a low-noise image according to the intermediate image and the cached image; and
    obtain the target image according to the intermediate image, the low-noise image, and the motion estimation values.
  13. The apparatus according to claim 12, wherein, when obtaining the target image according to the intermediate image, the low-noise image, and the motion estimation values, the video processing module is specifically configured to:
    determine a pixel value of a first pixel in the target image, the first pixel being any pixel in the target image,
    wherein the pixel value of the first pixel is determined in the following manner:
    determining, according to the motion estimation value corresponding to the first pixel, a first weight of the first pixel in the intermediate image and a second weight of the first pixel in the low-noise image; and
    determining the pixel value of the first pixel in the target image according to the pixel value corresponding to the first pixel in the intermediate image and the first weight, and the pixel value corresponding to the first pixel in the low-noise image and the second weight; and
    determine the target image according to the pixel values of all pixels of the target image.
  14. An image processing device, comprising a processor and a machine-readable storage medium, wherein the machine-readable storage medium stores machine-executable instructions executable by the processor, and the processor is configured to execute the machine-executable instructions to implement the method steps according to any one of claims 1-7.
  15. A machine-readable storage medium having computer instructions stored thereon, wherein, when the computer instructions are executed, the method steps according to any one of claims 1-7 are implemented.
PCT/CN2019/089272 2018-05-31 2019-05-30 Image processing method, apparatus and device, and machine-readable storage medium WO2019228456A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810556530.2 2018-05-31
CN201810556530.2A CN110555808B (en) 2018-05-31 2018-05-31 Image processing method, device, equipment and machine-readable storage medium

Publications (1)

Publication Number Publication Date
WO2019228456A1 true WO2019228456A1 (en) 2019-12-05

Family

ID=68698716

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089272 WO2019228456A1 (en) 2018-05-31 2019-05-30 Image processing method, apparatus and device, and machine-readable storage medium

Country Status (2)

Country Link
CN (1) CN110555808B (en)
WO (1) WO2019228456A1 (en)

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN111402153B (en) * 2020-03-10 2023-06-13 上海富瀚微电子股份有限公司 Image processing method and system
CN111476730B (en) * 2020-03-31 2024-02-13 珠海格力电器股份有限公司 Image restoration processing method and device
CN111583142B (en) * 2020-04-30 2023-11-28 深圳市商汤智能传感科技有限公司 Image noise reduction method and device, electronic equipment and storage medium
CN113724142B (en) * 2020-05-26 2023-08-25 杭州海康威视数字技术股份有限公司 Image Restoration System and Method
TWI828160B (en) * 2022-05-24 2024-01-01 國立陽明交通大學 Analyzing method of cilia images

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
EP1865460B1 (en) * 2005-03-31 2011-05-04 Nikon Corporation Image processing method
CN102769722B (en) * 2012-07-20 2015-04-29 上海富瀚微电子股份有限公司 Time-space domain hybrid video noise reduction device and method
CN105163102B (en) * 2015-06-30 2017-04-05 北京空间机电研究所 A kind of real time imaging auto white balance system and method based on FPGA
CN107767343B (en) * 2017-11-09 2021-08-31 京东方科技集团股份有限公司 Image processing method, processing device and processing equipment
CN107730474B (en) * 2017-11-09 2022-02-22 京东方科技集团股份有限公司 Image processing method, processing device and processing equipment
CN107767408B (en) * 2017-11-09 2021-03-12 京东方科技集团股份有限公司 Image processing method, processing device and processing equipment

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN106803090A (en) * 2016-12-05 2017-06-06 中国银联股份有限公司 A kind of image-recognizing method and device
CN107424184A (en) * 2017-04-27 2017-12-01 厦门美图之家科技有限公司 A kind of image processing method based on convolutional neural networks, device and mobile terminal
CN107220951A (en) * 2017-05-31 2017-09-29 广东欧珀移动通信有限公司 Facial image noise-reduction method, device, storage medium and computer equipment

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN112001912A (en) * 2020-08-27 2020-11-27 北京百度网讯科技有限公司 Object detection method and device, computer system and readable storage medium
CN112001912B (en) * 2020-08-27 2024-04-05 北京百度网讯科技有限公司 Target detection method and device, computer system and readable storage medium
CN115661156A (en) * 2022-12-28 2023-01-31 成都数联云算科技有限公司 Image generation method, image generation device, storage medium, equipment and computer program product
CN116385307A (en) * 2023-04-11 2023-07-04 任成付 Picture information filtering effect identification system
CN116385307B (en) * 2023-04-11 2024-05-03 衡阳市欣嘉传媒有限公司 Picture information filtering effect identification system

Also Published As

Publication number Publication date
CN110555808B (en) 2022-05-31
CN110555808A (en) 2019-12-10

Similar Documents

Publication Publication Date Title
WO2019228456A1 (en) Image processing method, apparatus and device, and machine-readable storage medium
US9615039B2 (en) Systems and methods for reducing noise in video streams
KR102574141B1 (en) Image display method and device
US9262807B2 (en) Method and system for correcting a distorted input image
WO2021047345A1 (en) Image noise reduction method and apparatus, and storage medium and electronic device
GB2592835A (en) Configurable convolution engine for interleaved channel data
US8315474B2 (en) Image processing device and method, and image sensing apparatus
WO2020152521A1 (en) Systems and methods for transforming raw sensor data captured in low-light conditions to well-exposed images using neural network architectures
WO2021082883A1 (en) Main body detection method and apparatus, and electronic device and computer readable storage medium
CN108876753A (en) Optional enhancing is carried out using navigational figure pairing growth exposure image
US9282253B2 (en) System and method for multiple-frame based super resolution interpolation for digital cameras
CN107071234A (en) A kind of camera lens shadow correction method and device
US8861846B2 (en) Image processing apparatus, image processing method, and program for performing superimposition on raw image or full color image
CN106134180B (en) Image processing apparatus and method, storage being capable of recording mediums by the image processing program that computer is temporarily read, photographic device
WO2021035524A1 (en) Image processing method and apparatus, electronic device, and computer-readable storage medium
CN113784014B (en) Image processing method, device and equipment
Zhao et al. End-to-end denoising of dark burst images using recurrent fully convolutional networks
WO2019228450A1 (en) Image processing method, device, and equipment, and readable medium
CN114390188B (en) Image processing method and electronic equipment
CN113744355B (en) Pulse signal processing method, device and equipment
CN110930440A (en) Image alignment method and device, storage medium and electronic equipment
US11195247B1 (en) Camera motion aware local tone mapping
US11270412B2 (en) Image signal processor, method, and system for environmental mapping
CN104776919B (en) Infrared focal plane array ribbon Nonuniformity Correction system and method based on FPGA
JP4664259B2 (en) Image correction apparatus and image correction method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19811302

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19811302

Country of ref document: EP

Kind code of ref document: A1
