WO2021196050A1 - Image processing method and apparatus based on a neural network - Google Patents

Image processing method and apparatus based on a neural network

Info

Publication number
WO2021196050A1
WO2021196050A1 PCT/CN2020/082634 CN2020082634W WO2021196050A1 WO 2021196050 A1 WO2021196050 A1 WO 2021196050A1 CN 2020082634 W CN2020082634 W CN 2020082634W WO 2021196050 A1 WO2021196050 A1 WO 2021196050A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
neural network
processed
images
frame
Prior art date
Application number
PCT/CN2020/082634
Other languages
English (en)
Chinese (zh)
Inventor
李蒙
郑成林
胡慧
陈海
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2020/082634 priority Critical patent/WO2021196050A1/fr
Priority to CN202080099095.0A priority patent/CN115335852A/zh
Publication of WO2021196050A1 publication Critical patent/WO2021196050A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the embodiments of the present application relate to the field of image processing technology, and in particular, to a neural network-based image processing method and device.
  • the mobile terminal performs image signal processing (ISP) on the image signal.
  • the main function of the ISP is to post-process the image signal output by the front-end image sensor, so that images captured under different optical conditions can better restore the details of the scene.
  • the ISP processing flow is shown in Figure 1.
  • the natural scene 101 passes through the lens 102 to obtain a Bayer image, which then passes through the sensor 103 and photoelectric conversion 104 to become an analog electrical signal 105; after noise reduction and analog-to-digital (A/D) conversion 106, a digital electrical signal (i.e., a raw image) 107 is obtained, which then enters the digital signal processing chip 100.
  • the steps in the digital signal processing chip 100 are the core steps of ISP processing.
  • the digital signal processing chip 100 generally includes modules such as black level compensation (BLC) 108, lens shading correction 109, bad pixel correction (BPC) 110, demosaicing 111, Bayer-domain noise reduction (denoise) 112, auto white balance (AWB) 113, Ygamma 114, auto exposure (AE) 115, auto focus (AF) (not shown in Figure 1), color correction (CC) 116, gamma correction 117, color gamut conversion 118, color denoising/detail enhancement 119, color enhancement (CE) 120, formatter 121, and input/output (I/O) control 122.
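  • as an illustration of this flow (a minimal sketch with identity placeholders for each stage, not an implementation of the patent's ISP), the stage ordering of chip 100 can be written as:

      def identity(image):
          # Placeholder for a real ISP stage; returns the image unchanged.
          return image

      # Stage order taken from the description of chip 100 above (AF omitted,
      # as in Figure 1); every stage body here is a placeholder.
      ISP_STAGES = [
          ("black_level_compensation_BLC", identity),    # 108
          ("lens_shading_correction", identity),         # 109
          ("bad_pixel_correction_BPC", identity),        # 110
          ("demosaic", identity),                        # 111
          ("bayer_denoise", identity),                   # 112
          ("auto_white_balance_AWB", identity),          # 113
          ("ygamma", identity),                          # 114
          ("auto_exposure_AE", identity),                # 115
          ("color_correction_CC", identity),             # 116
          ("gamma_correction", identity),                # 117
          ("color_gamut_conversion", identity),          # 118
          ("color_denoise_detail_enhance", identity),    # 119
          ("color_enhance_CE", identity),                # 120
          ("format_output", identity),                   # 121
      ]

      def run_isp(raw_image):
          # Apply each ISP stage in order to the digital raw image 107.
          for _name, stage in ISP_STAGES:
              raw_image = stage(raw_image)
          return raw_image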
  • ISP based on deep learning has achieved certain results in the application of many tasks.
  • the ISP based on deep learning will process the image data through a neural network and then output it.
  • the processing complexity of such a neural network is generally very high; the expected effect can be achieved, but scenarios that require real-time processing generally face problems such as high energy consumption and long running time.
  • therefore, the neural-network-based ISP needs to be further optimized.
  • This application provides a neural network-based image processing method and device, in order to optimize the neural network-based image signal processing performance.
  • a neural network-based image processing method uses a first neural network and a second neural network to process multiple frames of to-be-processed images and output second images.
  • the steps of the method are as follows: input multiple frames of to-be-processed images into the first neural network for calculation to obtain a first image; input multiple image groups into multiple second neural networks for calculation respectively to obtain multiple frames of second images, where each image group includes the first image and one frame among the multiple frames of images to be processed.
  • the first image is obtained after the multiple frames of images to be processed undergo the first neural network operation; that is, the image features common to the multiple frames of images to be processed are obtained.
  • the first image and one frame of the image to be processed then undergo a second neural network operation to obtain a second image, so that multiple frames of second images are obtained respectively. Since the first neural network and the second neural network jointly process the multiple frames of images to be processed, and the first image is reused in the processing of each second neural network, the computational complexity of the second neural network is reduced while the quality of image processing is ensured.
  • inputting multiple image groups into multiple second neural networks to perform calculations includes: inputting one frame of the multiple frames of images to be processed into the second neural network for calculation to obtain a third image; and merging the first image and the third image to obtain the second image.
  • the first neural network includes an image output layer and multiple feature map output layers; the image output layer outputs the first image, the feature map output layers output multiple intermediate feature maps, and the multiple intermediate feature maps participate in the operation of the second neural network to obtain the third image.
  • the complexity of the second neural network is lower than the complexity of the first neural network.
  • the multiple frames of images to be processed include multiple frames of temporally adjacent images.
  • the processing capability of the first neural network on the static region of the image is greater than that of the second neural network.
  • the first neural network is used to process the static area of the image to be processed.
  • the second neural network is used to process the motion area of the image to be processed.
  • the first neural network and the second neural network form an image processing system, and the image processing system is used to reduce noise and eliminate mosaic effects on the image to be processed.
  • a neural network-based image processing device can be a mobile terminal, a device in a mobile terminal (such as a chip, a chip system, or a circuit), or a device that can be used together with the mobile terminal.
  • the device may include modules corresponding one-to-one to the methods/operations/steps/actions described in the first aspect.
  • the modules may be hardware circuits, software, or hardware circuits combined with software.
  • the device processes multiple frames of to-be-processed images to obtain a second image.
  • the device may include an arithmetic module.
  • an arithmetic module, configured to input multiple frames of images to be processed into a first neural network for operation to obtain a first image; the arithmetic module is further configured to input multiple image groups into multiple second neural networks respectively for operation to obtain multiple frames of second images, where each image group includes the first image and one frame of the multiple frames of images to be processed.
  • the arithmetic module is configured to: input one frame of the images to be processed into the second neural network for calculation to obtain a third image; and merge the first image and the third image to obtain the second image.
  • the first neural network includes an image output layer and multiple feature map output layers; the image output layer outputs the first image, the feature map output layers output multiple intermediate feature maps, and the multiple intermediate feature maps participate in the operation of the second neural network to obtain the third image.
  • the complexity of the second neural network is lower than the complexity of the first neural network.
  • the multiple frames of images to be processed include multiple frames of temporally adjacent images.
  • the processing capability of the first neural network on the static region of the image is greater than that of the second neural network.
  • the first neural network is used to process the static area of the image to be processed.
  • the second neural network is used to process the motion area of the image to be processed.
  • the first neural network and the second neural network constitute an image processing system, and the image processing system is used to reduce noise and eliminate mosaic effects on the image to be processed.
  • an embodiment of the present application provides an image processing device based on a neural network.
  • the device includes a processor, and the processor is used to call a set of programs, instructions, or data to execute the method described in the first aspect or any possible design of the first aspect.
  • the device may also include a memory for storing programs, instructions or data called by the processor.
  • the memory is coupled with the processor, and when the processor executes the instructions or data stored in the memory, it can implement the method described in the first aspect or any possible design.
  • an embodiment of the present application provides a chip system, which includes a processor and may also include a memory, for implementing the method described in the first aspect or any one of the possible designs of the first aspect.
  • the chip system can be composed of chips, and can also include chips and other discrete devices.
  • the embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores computer-readable instructions; when the instructions are run on a computer, the method described in the first aspect or any possible design of the first aspect is executed.
  • the embodiments of the present application also provide a computer program product containing instructions, which, when run on a computer, causes the computer to execute the method described in the first aspect or any possible design of the first aspect.
  • FIG. 1 is a schematic diagram of an ISP processing flow in the prior art
  • FIG. 2 is a schematic structural diagram of a system architecture provided by an embodiment of this application.
  • FIG. 3 is a schematic diagram of the principle of a neural network provided by an embodiment of the application.
  • FIG. 4 is a flowchart of a neural network-based image processing method provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of an implementation manner of image processing provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of an implementation manner of image processing provided by an embodiment of the application.
  • FIG. 7 is a schematic diagram of an implementation manner of image processing provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of an RGrGbB image processing process provided by an embodiment of the application.
  • FIG. 9a is one of the schematic structural diagrams of the first neural network provided by an embodiment of this application.
  • FIG. 9b is the second schematic diagram of the structure of the first neural network provided by an embodiment of the application.
  • FIG. 10a is the first of the schematic structural diagrams of the second neural network provided by an embodiment of this application.
  • FIG. 10b is the second of the schematic structural diagrams of the second neural network provided by an embodiment of this application.
  • FIG. 11a is a schematic structural diagram of a first neural network and a second neural network provided by an embodiment of this application;
  • FIG. 11b is a schematic diagram of the structure of the first neural network and the second neural network provided by an embodiment of the application;
  • FIG. 12 is a schematic structural diagram of an image processing device based on a neural network provided by an embodiment of the application.
  • FIG. 13 is a schematic structural diagram of an image processing device based on a neural network provided by an embodiment of the application.
  • words such as "exemplary" or "for example" are used to present examples, instances, or illustrations. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be construed as preferable or advantageous over other embodiments or designs; rather, such words are used to present related concepts in a concrete manner.
  • the image processing method and device based on neural network (NN) provided by the embodiments of this application can be applied to electronic equipment.
  • the electronic equipment may be a mobile device such as a mobile terminal, a mobile station (MS), or user equipment (UE), or may be a fixed device such as a fixed telephone or a desktop computer, or a video monitor.
  • the electronic device is an image acquisition and processing device with image signal acquisition and processing functions, and has an ISP processing function.
  • the electronic device can also optionally have a wireless connection function to provide users with a handheld device with voice and/or data connectivity, or be another processing device connected to a wireless modem.
  • the electronic device can be a mobile phone (or "cellular" phone) or a computer with a mobile terminal; it can also be a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile device, and of course it can also be a wearable device (such as a smart watch or a smart bracelet), a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a point of sale (POS), and so on.
  • the following takes the electronic device as a mobile terminal as an example for description.
  • FIG. 2 is a schematic diagram of an optional hardware structure of the mobile terminal 200 according to an embodiment of the application.
  • the mobile terminal 200 mainly includes a chipset and peripheral devices.
  • Components such as USB interface, memory, display screen, battery/mains power, earphone/speaker, antenna, sensor, etc. can be understood as peripheral devices.
  • the arithmetic processor, RAM, I/O, display interface, ISP, sensor interface, baseband and other components in the chipset can form a system-on-a-chip (SOC), which is the main part of the chipset.
  • SOC system-on-a-chip
  • the components in the SOC can all be integrated into a complete chip, or part of the components in the SOC can be integrated, and the other parts are not integrated.
  • the baseband communication module in the SOC may also not be integrated with the other parts and may instead form an independent part.
  • the components in the SOC can be connected to each other through a bus or other connecting lines.
  • the PMU, voice codec, RF, etc. outside the SOC usually contain analog circuit parts, so they often remain outside the SOC and are not integrated into it.
  • the PMU is used to connect to the mains or battery to supply power to the SOC, and the mains can be used to charge the battery.
  • the voice codec is used as the sound codec unit to connect with earphones or speakers to realize the conversion between natural analog voice signals and digital voice signals that can be processed by the SOC.
  • the short-range module can include wireless fidelity (WiFi) and Bluetooth, and can also optionally include infrared, near field communication (NFC), radio (FM), or global positioning system (GPS) modules, etc.
  • the RF is connected with the baseband communication module in the SOC to realize conversion between the air-interface RF signal and the baseband signal, that is, mixing; for a mobile phone, receiving is down-conversion and sending is up-conversion.
  • baseband is used for baseband communication, including one or more of a variety of communication modes, and for processing wireless communication protocols, including protocol layers such as the physical layer (layer 1), medium access control (MAC) (layer 2), and radio resource control (RRC) (layer 3); it can support various cellular communication standards, such as long term evolution (LTE) communication or 5G new radio (NR) communication.
  • the sensor interface is an interface between the SOC and an external sensor, and is used to collect and process data from at least one external sensor.
  • the external sensor may be, for example, an accelerometer, a gyroscope, a control sensor, an image sensor, and so on.
  • the arithmetic processor can be a general-purpose processor, such as a central processing unit (CPU), or one or more integrated circuits, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), microprocessors, or one or more field programmable gate arrays (FPGAs), etc.
  • the arithmetic processor can include one or more cores, and can selectively schedule other units.
  • RAM can store some intermediate data during calculation or processing, such as intermediate calculation data of CPU and baseband.
  • ISP is used to process the data collected by the image sensor.
  • I/O is used for the SOC to interact with various external interfaces, such as the universal serial bus (USB) interface for data transmission.
  • the memory can be a chip or a group of chips.
  • the display screen can be a touch screen, which is connected to the bus through a display interface.
  • the display interface can be used for data processing before image display, such as blending of multiple layers to be displayed, buffering of display data, or control and adjustment of screen brightness.
  • the mobile terminal 200 involved in the embodiment of the present application includes an image sensor, which can collect external signals such as light from the outside, and process and convert the external signals into sensor signals, that is, electrical signals.
  • the sensor signal can be a static image signal or a dynamic video image signal.
  • the image sensor may be a camera, for example.
  • the mobile terminal 200 involved in the embodiments of the present application further includes an image signal processor.
  • the image sensor collects sensor signals and transmits them to the image signal processor.
  • the image signal processor obtains the sensor signal and can perform image signal processing on it, in order to obtain an image signal whose sharpness, color, brightness, and other aspects conform to the characteristics of the human eye.
  • the image signal processor involved in the embodiment of the present application may be one or a group of chips, that is, it may be integrated or independent.
  • the image signal processor included in the mobile terminal 200 may be an integrated ISP chip integrated in the arithmetic processor.
  • the mobile terminal 200 involved in the embodiments of the present application has the function of taking photos or recording videos.
  • the neural network-based image processing method provided in the embodiments of the present application mainly focuses on how to perform image signal processing based on the neural network.
  • a neural network is used to process the multi-frame images to be processed.
  • a neural network (NN) is a network structure that imitates the behavioral characteristics of animal neural networks to perform information processing.
  • the neural network can be composed of neural units. A neural unit can be an arithmetic unit that takes x_s and an intercept of 1 as inputs, and the output of the arithmetic unit can be as shown in formula (1):

      h_{W,b}(x) = f(W^T x) = f( sum_{s=1..n} W_s * x_s + b )    (1)

  • where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next convolutional layer, and the activation function can be a sigmoid function.
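  • a minimal sketch of such a neural unit in Python, with sigmoid chosen as the activation function as suggested above:

      import math

      def sigmoid(z):
          # Activation function f, introducing nonlinearity into the network.
          return 1.0 / (1.0 + math.exp(-z))

      def neural_unit(x, W, b):
          # Output of one neural unit: f(sum_s W_s * x_s + b), per formula (1).
          return sigmoid(sum(W_s * x_s for W_s, x_s in zip(W, x)) + b)

      # Example with n = 3 inputs.
      print(neural_unit(x=[0.5, -1.0, 2.0], W=[0.1, 0.4, -0.2], b=0.3))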
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • the neural network 300 has N processing layers, where N ≥ 3 and N is a natural number.
  • the first layer of the neural network is the input layer 301, which is responsible for receiving input signals.
  • the last layer of the neural network is the output layer 303, which outputs the processing results of the neural network.
  • the layers other than the first and last layers are intermediate layers 304; these intermediate layers together form the hidden layer 302.
  • each intermediate layer in the hidden layer can receive an input signal and output a signal, and the hidden layer is responsible for processing the input signal.
  • Each layer represents a logic level of signal processing. Through multiple layers, data signals can be processed by multiple levels of logic.
  • the input signal of the neural network may be a signal in various forms such as a voice signal, a text signal, an image signal, and a temperature signal.
  • the processed image signals may be various sensor signals, such as a landscape signal captured by a camera (image sensor), an image signal of a community environment captured by a video monitoring device, or a face image signal acquired by an access control system.
  • the input signals of the neural network also include various other engineering signals that can be processed by computers, which are not listed here one by one. If the neural network is used for deep learning of the image signal, the image quality can be improved.
  • a deep neural network (DNN), also known as a multi-layer neural network, is divided according to the positions of different layers: the layers inside the DNN fall into three categories, namely the input layer, hidden layers, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and all the layers in between are hidden layers.
  • the layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1th layer.
  • although the DNN looks complicated, the work of each layer is just the simple linear relationship y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, the number of coefficients W and offset vectors b is also large.
  • these parameters are defined in the DNN as follows, taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • in general, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as W^L_{jk}.
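  • the per-layer operation and the indexing convention can be sketched as follows (NumPy; the text's 1-based indices become 0-based in code):

      import numpy as np

      def dnn_layer(x, W, b):
          # One fully connected layer: y = alpha(W x + b), with sigmoid as alpha.
          return 1.0 / (1.0 + np.exp(-(W @ x + b)))

      # Layer L-1 has 4 neurons and layer L has 2 neurons, so W has shape (2, 4).
      # W[1, 3] is then the coefficient W^L_{jk} with j = 2, k = 4 from the text.
      x = np.ones(4)                # output of layer L-1
      W = np.full((2, 4), 0.1)      # weight matrix of layer L
      b = np.zeros(2)               # offset vector of layer L
      print(dnn_layer(x, W, b))     # output vector of layer L (2 values)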
  • Convolutional neural network (convolutional neuron network, CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • in the convolutional layer, a neuron may be connected to only some of the neurons in the adjacent layers.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernels.
  • weight sharing can be understood as meaning that the way image information is extracted is independent of position.
  • the convolution kernel can be initialized in the form of a matrix of random size. In the training process of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
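  • a quick PyTorch check (an illustration, not part of the patent) makes the effect of weight sharing concrete: the convolution's parameter count depends only on the kernel, not on the image size:

      import torch
      import torch.nn as nn

      conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
      # 16 kernels of size 1x3x3 plus 16 biases = 160 parameters, whatever the input.
      print(sum(p.numel() for p in conv.parameters()))     # 160

      small = conv(torch.randn(1, 1, 32, 32))      # the same shared weights...
      large = conv(torch.randn(1, 1, 1024, 1024))  # ...applied at every position
      print(small.shape, large.shape)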
  • the neural network in the embodiment of the present application may be a convolutional neural network, and of course, it may also be another type of neural network, such as a recurrent neural network (recurrent neural network, RNN).
  • the images in the embodiments of the present application may be static images (or referred to as static pictures) or dynamic images (or referred to as dynamic pictures).
  • that is, the images in the present application may be videos or dynamic pictures, and may also be static pictures or photos.
  • in the following description, static images and dynamic images are collectively referred to as images.
  • the method is executed by an image processing device based on a neural network.
  • the neural network-based image processing device may be any device with image processing functions; for example, the method may be executed by the mobile terminal 200 shown in FIG. 2, by a device related to the mobile terminal, or by part of the equipment included in the mobile terminal.
  • multiple neural networks are used for image processing, for example, two neural networks are used to process the image to be processed, and the two neural networks are denoted as the first neural network and the second neural network.
  • the first neural network and the second neural network conform to the above description of the neural network.
  • the neural network-based image processing method provided by the embodiment of the present application includes the following steps.
  • S401 Input multiple frames of to-be-processed images into a first neural network for calculation to obtain a first image.
  • S402 Input multiple image groups into multiple second neural networks to perform operations to obtain multiple frames of second images, where each image group includes the first image and one frame of the multiple frames of images to be processed.
  • n is an integer greater than or equal to 2.
  • each second neural network receives the first image and one frame of the image to be processed, and each second neural network outputs one frame of the second image; that is, the first second neural network outputs the first frame of the second image, the second second neural network outputs the second frame of the second image, and so on, until the n-th second neural network outputs the n-th frame of the second image.
  • the first image is obtained after the multiple frames of images to be processed undergo the first neural network operation; that is, the image features common to the multiple frames of images to be processed are obtained.
  • the first image and one frame of the image to be processed then undergo a second neural network operation to obtain a second image, so that multiple frames of second images are obtained respectively. Since the first neural network and the second neural network jointly process the multiple frames of images to be processed, and the first image is reused in the processing of each second neural network, the computational complexity of the second neural network is reduced while the quality of image processing is ensured.
  • the first neural network is used to process static regions of multiple frames of images to be processed.
  • the first image may be an image of the static area shared by the multiple frames of images to be processed. Since the features of the static region account for a high proportion of the network complexity, the static-region features are processed first through the first neural network, and the processing result is input into the second neural network as an intermediate result; the complexity required of the second neural network is thereby reduced. Through the combined use of the two neural networks, processing multiple frames of images can achieve lower complexity than using a single neural network.
  • the second neural network is used to process the motion regions of multiple frames of images to be processed.
  • each second neural network receives the first image and a frame of image to be processed. Further, each second neural network outputs a frame of the third image, and merges the first image and the third image to obtain the second image.
  • each second neural network outputs one frame of the second image; that is, the first second neural network outputs the first frame of the second image, the second second neural network outputs the second frame of the second image, and so on, until the n-th second neural network outputs the n-th frame of the second image.
  • merging the first image and the third image may be, for example, performing a matrix addition operation on the first image and the third image to obtain the second image.
  • the first image is an image of the static area processed by the first neural network, and the third image is an image of the moving area processed by the second neural network.
  • combining the first image and the third image, that is, combining the processed static-area image with the processed moving-area image, yields a complete second image.
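  • the data flow of steps S401/S402 with merge-by-addition can be sketched as follows; the two network bodies here are small stand-in convolution stacks, not the architectures of Figures 9 and 10:

      import torch
      import torch.nn as nn

      n, C = 4, 4   # n frames to be processed, C channels per frame (e.g. R, Gr, Gb, B)

      # Stand-in first neural network: sees all n frames jointly.
      first_nn = nn.Sequential(
          nn.Conv2d(n * C, 32, 3, padding=1), nn.ReLU(),
          nn.Conv2d(32, C, 3, padding=1),
      )
      # Stand-in second neural networks: one per frame.
      second_nns = [nn.Sequential(
          nn.Conv2d(C, 16, 3, padding=1), nn.ReLU(),
          nn.Conv2d(16, C, 3, padding=1),
      ) for _ in range(n)]

      to_process = [torch.randn(1, C, 64, 64) for _ in range(n)]

      # S401: the first image captures features common to all frames (static area).
      first_image = first_nn(torch.cat(to_process, dim=1))

      # S402: each second network produces a third image (moving area) from one
      # frame; matrix addition of first and third images yields each second image.
      second_images = [first_image + net(frame)
                       for net, frame in zip(second_nns, to_process)]
      print(len(second_images), second_images[0].shape)   # 4 frames of second images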
  • it should be noted that the first neural network and/or the second neural network do not divide the image to be processed into static regions and/or moving regions, nor do they recognize such regions; they process the image to be processed as a whole.
  • rather, the characteristics of the first neural network lead it to process the image areas with static-area characteristics in the image to be processed, while the characteristics of the second neural network lead it to process the image areas with moving-area characteristics in the image to be processed and in the intermediate image output by the first neural network.
  • in other words, the first neural network applies higher-intensity processing to image areas with static-area characteristics and lower-intensity processing to image areas with moving-area characteristics, while the second neural network applies higher-intensity processing to areas with moving-area characteristics and lower-intensity processing to areas with static-area characteristics in the image to be processed and the intermediate image processed by the first neural network.
  • the first image may be an image processed by the first neural network in a static area
  • the third image may be an image processed by the second neural network in a moving area.
  • the difference from the implementation manner described in FIG. 6 is that the second neural network also receives the intermediate feature map output by the first neural network.
  • the intermediate feature map is used to participate in the operation of the second neural network to obtain the third image.
  • a frame of to-be-processed image and an intermediate feature map may be subjected to vector splicing or vector addition to obtain the to-be-processed image matrix, and the to-be-processed image matrix may be input to the second neural network for operation to obtain the third image.
  • the vector stitching of a frame of image to be processed and the intermediate feature map can be regarded as the internal processing process of the second neural network.
  • the input to the second neural network is an overall matrix, that is, the matrix of images to be processed.
  • the second neural network also receives a multi-frame intermediate feature map output by the first neural network.
  • a frame of the image to be processed and the first frame of the intermediate feature maps can be vector-spliced or vector-added to obtain an image matrix to be processed, and the image matrix to be processed can be input to the second neural network for operation to obtain an intermediate feature map of the second neural network.
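  • vector splicing here is concatenation along the channel (feature map) dimension, and vector addition is element-wise; a small sketch with illustrative channel counts:

      import torch

      # 16 intermediate feature maps from a feature map output layer of the
      # first neural network, and 16 feature maps inside a second neural network.
      intermediate_fmap = torch.randn(1, 16, 64, 64)
      second_nn_fmap = torch.randn(1, 16, 64, 64)

      spliced = torch.cat([second_nn_fmap, intermediate_fmap], dim=1)  # 32 channels
      added = second_nn_fmap + intermediate_fmap                       # 16 channels
      print(spliced.shape, added.shape)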
  • the first neural network includes one image output layer and multiple feature map output layers.
  • the image output layer outputs the first image.
  • the feature map output layer outputs multiple intermediate feature maps.
  • in the drawings, no specific distinction is made among the first image, the second image, and the third image, including their color and texture.
  • the multiple frames of images to be processed include multiple frames of temporally adjacent images.
  • multiple frames of temporally adjacent images include multiple frames of temporally continuous images.
  • correspondingly, the processed images are also multiple frames.
  • the second images obtained through multiple second neural networks are multiple frames, and each frame of the image to be processed corresponds to one frame of the second image.
  • Each frame of the second image corresponds to the first image and one frame of the third image.
  • the format of the image to be processed may be a red-green-blue (RGB) format, a luminance-chrominance (YUV) format, or a Bayer format; this is not limited in this application.
  • the number of multi-frame to-be-processed images is 4 frames, and the 4 frames of to-be-processed images are input to the first neural network for calculation to obtain the first image.
  • one first image is obtained corresponding to the 4 frames of images to be processed.
  • the 4 image groups are respectively input to 4 second neural networks to perform operations to obtain 4 second images respectively, and each image group includes the first image and one of the 4 images to be processed.
  • the first frame of the image to be processed corresponds to the first frame of the second image, the second frame corresponds to the second frame of the second image, the third frame corresponds to the third frame of the second image, and the fourth frame corresponds to the fourth frame of the second image.
  • the first frame of image to be processed is input to the first second neural network for calculation to obtain the first frame of the third image, and the first image and the first frame of the third image are combined to obtain the first frame of second image.
  • the second frame of image to be processed is input to the second second neural network for operation to obtain the second frame of the third image, and the first image and the second frame of the third image are combined to obtain the second frame of the second image.
  • the third frame of image to be processed is input to the third second neural network for operation to obtain the third frame of the third image, and the first image and the third frame of the third image are combined to obtain the third frame of the second image.
  • the fourth frame of image to be processed is input to the fourth second neural network for operation to obtain the fourth frame of the third image, and the first image and the fourth frame of the third image are combined to obtain the fourth frame of the second image.
  • the first neural network and the second neural network can be combined into an image processing system, and the image processing system is used to process the image to be processed to improve the quality of the image or video.
  • the processing may include operations such as noise reduction and elimination of mosaic effects.
  • the complexity of the first neural network is higher than the complexity of the second neural network.
  • the processing capability of the first neural network on the static area of the image is greater than that of the second neural network.
  • in some technologies, multiple frames of images are synthesized into one frame of output through a neural network to improve image or video quality.
  • such a neural network requires high complexity, and in a video scene a high processing speed is required.
  • for example, implementing video processing on a mobile terminal requires a video with a resolution of 8K to reach a processing speed of 30 frames/s, that is, a frame rate of 30.
  • if a single neural network is used to synthesize multiple frames of images into one frame for output, it faces problems of computational complexity and computing-resource consumption and incurs a large time delay; simply reducing the complexity of the neural network and using a lower-complexity network affects the quality of the image or video.
  • in this application, the first neural network handles the computation-heavy processing shared across the multiple frames of images, while each second neural network handles the lighter per-frame processing and outputs one of the multiple processed frames.
  • the integrated computing power of the first neural network and the second neural networks is thus allocated across multiple frames of images, so that the processing complexity per frame is reduced compared with the above-mentioned solution while the quality of the image or video is guaranteed.
  • in the following, the first image is an image of a static area and the third image is an image of a moving area; the first neural network is used to process the static area of the multiple frames of images to be processed, and the second neural network is used to process the moving area of the multiple frames of images to be processed.
  • as an example, the first neural network and the second neural network are convolutional neural networks, the image to be processed has 4 frames, and the second image has 4 frames.
  • the format of the image to be processed is a Bayer-format image, specifically the RGrGbB format; one frame of an RGrGbB-format image includes 4 channels (R, Gr, Gb, B).
  • the image processing system includes a first neural network and a second neural network.
  • the 4 frames of images to be processed are input as 16 channels, namely (R1, Gr1, Gb1, B1, R2, Gr2, Gb2, B2, R3, Gr3, Gb3, B3, R4, Gr4, Gb4, B4).
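  • one common way to obtain these channels (an assumption here; the patent does not prescribe the packing code) is to unshuffle each 2x2 RGrGbB block into 4 planes and concatenate the 4 frames:

      import torch
      import torch.nn.functional as F

      # Four temporally adjacent Bayer frames, each a single-channel RGrGbB mosaic.
      bayer_frames = [torch.randn(1, 1, 128, 128) for _ in range(4)]

      # pixel_unshuffle(2) turns every 2x2 mosaic block into 4 channels (R, Gr, Gb, B).
      planes = [F.pixel_unshuffle(frame, 2) for frame in bayer_frames]

      # Channel order becomes (R1, Gr1, Gb1, B1, ..., R4, Gr4, Gb4, B4).
      packed = torch.cat(planes, dim=1)
      print(packed.shape)   # torch.Size([1, 16, 64, 64])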
  • exemplarily, the architecture of the first neural network is shown in Figures 9a and 9b. Because the drawing of the first neural network is too large for one page, it is split into two parts, shown in Figure 9a and Figure 9b respectively; together they form the architecture of the first neural network, and the "add" at the end of Figure 9a connects to the first layer in Figure 9b.
  • the convolutional layer is represented by a rectangular box.
  • Conv2d represents a 2-dimensional convolution
  • bias represents the bias term
  • 1x1/3x3 represents the size of the convolution kernel
  • stride represents the step size
  • _32_16 represents the numbers of input and output feature maps: the layer takes 32 feature maps as input and outputs 16 feature maps.
  • Split represents the split layer, which splits the feature maps in the channel dimension. For example, Split 2 splits 32 input feature maps into two sets of 16 feature maps.
  • concat represents the concatenation (skip-connection) layer, which merges images in the feature map dimension; for example, two images of 16 feature maps each are merged into one image of 32 feature maps.
  • add represents a matrix addition operation.
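  • this notation maps directly onto common deep-learning primitives; a sketch in PyTorch (padding chosen here to keep the spatial size, an assumption not stated in the figures):

      import torch
      import torch.nn as nn

      x = torch.randn(1, 32, 64, 64)            # 32 input feature maps

      # "Conv2d bias 3x3 stride 1 _32_16": 32 -> 16 feature maps with a 3x3 kernel.
      conv = nn.Conv2d(32, 16, kernel_size=3, stride=1, bias=True, padding=1)
      y = conv(x)                               # (1, 16, 64, 64)

      # "Split 2": split 32 feature maps into two groups of 16 on the channel dim.
      a, b = torch.split(x, 16, dim=1)          # each (1, 16, 64, 64)

      # "concat": merge two 16-feature-map tensors into one with 32 feature maps.
      merged = torch.cat([a, b], dim=1)         # (1, 32, 64, 64)

      # "add": element-wise matrix addition of same-shaped tensors.
      summed = a + b                            # (1, 16, 64, 64)
      print(y.shape, merged.shape, summed.shape)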
  • the first neural network shown in FIG. 9a and FIG. 9b is a typical convolutional neural network, which can handle the static area of multiple frames of images to be processed well. Assume that the 4 frames of images to be processed are input to the first neural network, and the first neural network outputs the first image.
  • the convolutional layer of the first neural network may also adopt a group convolution (group convolution) operation.
  • group convolution is a special convolutional layer. Suppose the previous layer outputs N channels and the group convolution uses M groups: the grouped convolutional layer first divides the N channels into M parts, each group corresponding to N/M channels, and the convolution of each group is performed independently; after completion, the output feature maps are vector-concatenated (concat) together as the output channels of this layer.
  • the group convolution operation can achieve the same or a similar technical effect as the branch mode.
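  • in PyTorch-style pseudocode the equivalence looks like this, assuming N channels and M groups:

      import torch
      import torch.nn as nn

      N, M = 32, 4                        # N channels divided into M groups
      x = torch.randn(1, N, 64, 64)

      # Grouped convolution: each group of N/M channels is convolved independently.
      gconv = nn.Conv2d(N, N, kernel_size=3, padding=1, groups=M)
      print(gconv(x).shape)               # (1, 32, 64, 64)

      # The same structure by hand: split into M parts, convolve each, then concat.
      branch = [nn.Conv2d(N // M, N // M, 3, padding=1) for _ in range(M)]
      parts = torch.split(x, N // M, dim=1)
      manual = torch.cat([conv(p) for conv, p in zip(branch, parts)], dim=1)
      print(manual.shape)                 # (1, 32, 64, 64), structurally equivalent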
  • exemplarily, the architecture of the 4 second neural networks that process the 4 frames of images to be processed is shown in Fig. 10a and Fig. 10b.
  • the first image output in Fig. 10a is input to the first layer in Fig. 10b.
  • the convolutional layer is represented by a rectangular box.
  • for the specific notation, refer to the explanation of Fig. 9a and Fig. 9b.
  • the first frame of image to be processed is input to the first second neural network to obtain the first frame of the third image, and the first image and the first frame of the third image are combined to obtain the first frame of the second image .
  • the second frame of image to be processed is input to the second second neural network to obtain the second frame of the third image, and the first image and the second frame of the third image are combined to obtain the second frame of the second image.
  • the third frame of the image to be processed is input to the third second neural network to obtain the third frame of the third image, and the first image and the third frame of the third image are combined to obtain the third frame of the second image .
  • the fourth frame of image to be processed is input to the fourth second neural network to obtain the fourth frame of the third image, and the first image and the fourth frame of the third image are combined to obtain the fourth frame of the second image.
  • optionally, the first neural network can also output intermediate feature maps, which are input to the second neural network; the intermediate feature maps participate in the calculation of the second neural network to obtain the third image.
  • the intermediate feature map output by the second convolutional layer in the first neural network is used for vector stitching with the image output by the first convolutional layer of the second neural network to obtain a processed image.
  • the intermediate feature map output by the fourth convolution layer in the first neural network is used for vector stitching with the intermediate feature map output by the third convolution layer of the second neural network to obtain a processed image.
  • the neural network model needs to be trained.
  • the training data can include training images and ground truth images.
  • the collected training images are processed by the first neural network, and the output image is compared with the true-value image of the first image until the network converges; the training of the first neural network model is then completed.
  • network convergence here may mean, for example, that the difference between the output image and the true-value image of the first image is smaller than a set first threshold.
  • next, with the parameters obtained by training the first neural network fixed, the second neural network processes the collected training images to obtain output images.
  • each output image is compared with the true-value image of the third image until the network converges; the training of the second neural network model is then completed.
  • here, network convergence may mean that the difference between the output image and the true-value image of the third image is less than a set second threshold.
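  • a sketch of this two-stage schedule follows; the L1 loss, Adam optimizer, and the frame-slicing in stage 2 are assumptions for illustration, since the embodiment only specifies comparing outputs with ground-truth images until the difference falls below a threshold:

      import torch
      import torch.nn as nn

      def train_two_stage(first_nn, second_nn, loader, thr1=1e-3, thr2=1e-3):
          # Stage 1: train the first NN against the first-image ground truth;
          # stage 2: fix its parameters, train the second NN against its truth.
          loss_fn = nn.L1Loss()                                   # assumed loss
          opt1 = torch.optim.Adam(first_nn.parameters(), lr=1e-4)
          for frames, gt_first, gt_third in loader:
              loss = loss_fn(first_nn(frames), gt_first)
              opt1.zero_grad(); loss.backward(); opt1.step()
              if loss.item() < thr1:                              # "network converges"
                  break

          for p in first_nn.parameters():                         # fix stage-1 weights
              p.requires_grad = False

          opt2 = torch.optim.Adam(second_nn.parameters(), lr=1e-4)
          for frames, gt_first, gt_third in loader:
              one_frame = frames[:, :4]       # illustrative: first frame's channels
              loss = loss_fn(second_nn(one_frame), gt_third)
              opt2.zero_grad(); loss.backward(); opt2.step()
              if loss.item() < thr2:
                  break
          return first_nn, second_nn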
  • an image processing system is formed by a first neural network and a second neural network, and the image processing system is used to process multiple frames of to-be-processed images, and output multiple frames of processed images.
  • the complexity of the second neural network is lower than the complexity of the first neural network.
  • compared with schemes in some technologies that process multiple frames into one frame through a basic network, the computation amount of the image processing system for each frame of the image to be processed is reduced to a certain extent; in turn, the image-processing time delay can be reduced while the quality of the image or video is guaranteed.
  • the computing power of the two neural networks for processing multiple frames of images to be processed will be illustrated below with examples.
  • suppose the first neural network and the second neural networks output 4 processed frames, whereas one frame is output after basic-network processing.
  • the first neural network is shown in Figures 9a and 9b
  • the second neural network is shown in Figures 10a and 10b.
  • the calculation amount of the first neural network is about the same as that of the basic network, which is about 12000 MAC.
  • the calculation process of the network complexity of the basic network is as follows:
  • the calculation amount of the second neural network is much lower, assumed here to be about 4000.
  • the multi-frame-input, multi-frame-output scheme performed by the first neural network and the second neural network can reduce the amount of calculation, thereby reducing the delay of image processing, and can meet the image-processing delay requirements of video scenarios.
  • the network computing-power requirement of a video with a resolution of 8 thousand (K) pixels and a frame rate of 30 frames per second is about 50000 MAC.
  • if the embodiment of this application outputs 8 frames, the calculation amount required by the image processing system basically meets the network computing-power requirement of 8K 30-fps video.
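  • using the figures quoted above, the accounting can be checked directly (units as given in the source; the per-network costs are the assumed values from this example):

      first_nn_mac = 12000    # roughly the cost of the basic one-frame-out network
      second_nn_mac = 4000    # assumed cost of each second neural network
      budget_8k30 = 50000     # stated requirement for 8K-resolution, 30 fps video

      # 8 output frames: one first-NN pass plus 8 second-NN passes.
      print(first_nn_mac + 8 * second_nn_mac)        # 44000, within the 50000 budget

      # Per-frame cost in the 4-frame example, versus ~12000 per frame if the
      # basic network processed every frame on its own.
      print((first_nn_mac + 4 * second_nn_mac) / 4)  # 7000.0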
  • the neural network-based image processing device may include a hardware structure and/or a software module, and implement the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module.
  • whether a certain function among the above-mentioned functions is executed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and the design constraints of the technical solution.
  • an embodiment of the present application also provides a neural network-based image processing apparatus 1200.
  • the neural network-based image processing apparatus 1200 may be a mobile terminal or any device with image processing functions.
  • the neural network-based image processing device 1200 may include modules corresponding one-to-one to the methods/operations/steps/actions in the foregoing method embodiments; the modules may be realized by hardware circuits, by software, or by hardware circuits combined with software.
  • the neural network-based image processing device 1200 may include an arithmetic module 1201.
  • the arithmetic module 1201 is configured to input multiple frames of to-be-processed images into a first neural network for calculation to obtain a first image, and to input multiple image groups into multiple second neural networks to perform calculations to obtain multiple frames of second images respectively, where each image group includes the first image and one frame of the multiple frames of images to be processed.
  • the division of modules in the embodiments of this application is illustrative and is only a logical function division; in actual implementation, there may be other division methods.
  • the functional modules in the various embodiments of this application can be integrated into one processing module, can each exist alone physically, or two or more modules can be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules.
  • an embodiment of the present application also provides an image processing device 1300 based on a neural network.
  • the neural network-based image processing device 1300 includes a processor 1301.
  • the processor 1301 is used to call a set of programs so that the foregoing method embodiments are executed.
  • the neural network-based image processing device 1300 further includes a memory 1302, and the memory 1302 is configured to store program instructions and/or data executed by the processor 1301.
  • the memory 1302 is coupled with the processor 1301.
  • the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units or modules, and may be in electrical, mechanical or other forms, and is used for information exchange between devices, units or modules.
  • the processor 1301 may operate in cooperation with the memory 1302.
  • the processor 1301 may execute program instructions stored in the memory 1302.
  • the memory 1302 may be included in the processor 1301.
  • the neural network-based image processing device 1300 may be a chip system.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the processor 1301 is configured to input multiple frames of to-be-processed images into a first neural network for operation to obtain a first image, and to input multiple image groups into multiple second neural networks for operation to obtain multiple frames of second images respectively, where each image group includes the first image and one frame of the multiple frames of images to be processed.
  • the processor 1301 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or by a combination of hardware and software modules in the processor.
  • the memory 1302 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as random-access memory (RAM).
  • the memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory in the embodiment of the present application may also be a circuit or any other device capable of realizing a storage function for storing program instructions and/or data.
  • An embodiment of the present application also provides a chip including a processor, which is used to support the neural network-based image processing device to implement the functions involved in the foregoing method embodiments.
  • the chip is connected to a memory or the chip includes a memory, and the memory is used to store the necessary program instructions and data of the device.
  • the embodiment of the present application provides a computer-readable storage medium that stores a computer program, and the computer program includes instructions for executing the foregoing method embodiments.
  • the embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the foregoing method embodiments.
  • this application can be provided as methods, systems, or computer program products. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • these computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device; the instruction device implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment then provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Picture Signal Circuits (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present application relates to the technical field of image processing, and discloses a neural network-based image processing method and apparatus, used to guarantee image quality while reducing image processing time. The method comprises: inputting multiple frames of images to be processed into a first neural network for calculation to obtain a first image; and respectively inputting multiple image groups into multiple second neural networks for calculation to respectively obtain multiple frames of second images, each image group comprising the first image and one of the multiple frames of images to be processed.
PCT/CN2020/082634 2020-03-31 2020-03-31 Image processing method and apparatus based on a neural network WO2021196050A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/082634 WO2021196050A1 (fr) 2020-03-31 2020-03-31 Image processing method and apparatus based on a neural network
CN202080099095.0A CN115335852A (zh) 2020-03-31 2020-03-31 Image processing method and apparatus based on a neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/082634 WO2021196050A1 (fr) 2020-03-31 2020-03-31 Image processing method and apparatus based on a neural network

Publications (1)

Publication Number Publication Date
WO2021196050A1 (fr)

Family

ID=77927278

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/082634 WO2021196050A1 (fr) 2020-03-31 2020-03-31 Image processing method and apparatus based on a neural network

Country Status (2)

Country Link
CN (1) CN115335852A (fr)
WO (1) WO2021196050A1 (fr)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10255663B2 (en) * 2016-11-11 2019-04-09 Kabushiki Kaisha Toshiba Image processing device, image processing method, computer program product
CN107808365A (zh) * 2017-09-30 2018-03-16 广州智慧城市发展研究院 一种基于卷积神经网络自压缩的图像降噪方法及系统
CN108197623A (zh) * 2018-01-19 2018-06-22 百度在线网络技术(北京)有限公司 用于检测目标的方法和装置
CN108737750A (zh) * 2018-06-07 2018-11-02 北京旷视科技有限公司 图像处理方法、装置及电子设备
CN109886892A (zh) * 2019-01-17 2019-06-14 迈格威科技有限公司 图像处理方法、图像处理装置以及存储介质

Also Published As

Publication number Publication date
CN115335852A (zh) 2022-11-11

Similar Documents

Publication Publication Date Title
US11430209B2 (en) Image signal processing method, apparatus, and device
US10136110B2 (en) Low-light image quality enhancement method for image processing device and method of operating image processing system performing the method
US10417771B2 (en) Fast MRF energy optimization for solving scene labeling problems
US11153506B2 (en) Application processor including multiple camera serial interfaces receiving image signals from multiple camera modules
CN107431770A (zh) Adaptive linear luma domain video pipeline architecture
WO2020062312A1 (fr) Signal processing device and signal processing method
JP2020042774A (ja) Artificial intelligence inference arithmetic device
CN111951171A (zh) HDR image generation method and apparatus, readable storage medium, and terminal device
WO2021196050A1 (fr) Image processing method and apparatus based on a neural network
US20230388623A1 Composite image signal processor
KR20130018899A (ko) Single-pipeline stereo image capture
WO2021179147A1 (fr) Image processing method and apparatus based on a neural network
CN111598781B Image super-resolution method based on a hybrid high-order attention network
US11941789B2 Tone mapping and tone control integrations for image processing
CN114363693B Image quality adjustment method and apparatus
CN116205806B Image enhancement method and electronic device
CN109688333B Color image acquisition method, apparatus and device, and storage medium
CN116012262B Image processing method, model training method, and electronic device
US20230388673A1 Patch-based image sensor
CN117768774A Image processor, image processing method, photographing apparatus, and electronic device
CN115619679A Image processing method and apparatus, device, storage medium, and computer program
KR20240035992A (ko) Saliency-based super-resolution
CN116228554A Image restoration method and apparatus, and computer storage medium
KR20240047283A (ko) Artificial-intelligence-based image processing method and electronic device supporting same
CN115801987A Video frame interpolation method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20928798

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20928798

Country of ref document: EP

Kind code of ref document: A1