WO2021196050A1 - Neural network-based image processing method and apparatus - Google Patents

Neural network-based image processing method and apparatus

Info

Publication number
WO2021196050A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
neural network
processed
images
frame
Prior art date
Application number
PCT/CN2020/082634
Other languages
French (fr)
Chinese (zh)
Inventor
李蒙
郑成林
胡慧
陈海
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to PCT/CN2020/082634 priority Critical patent/WO2021196050A1/en
Priority to CN202080099095.0A priority patent/CN115335852A/en
Publication of WO2021196050A1 publication Critical patent/WO2021196050A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the embodiments of the present application relate to the field of image processing technology, and in particular, to a neural network-based image processing method and device.
  • the mobile terminal performs image signal processing (ISP) on the image signal.
  • the main function of the ISP is to perform post-processing on the image signal output by the front-end image sensor. Through ISP processing, images captured under different optical conditions can better restore the details of the scene.
  • the ISP processing flow is shown in Figure 1.
  • the natural scene 101 passes through the lens 102 to form a Bayer image, which is converted by the sensor 103 through photoelectric conversion 104 into an analog electrical signal 105; after noise reduction and analog-to-digital (A/D) conversion 106, a digital electrical signal (i.e., a raw image) 107 is obtained, which then enters the digital signal processing chip 100.
  • the steps in the digital signal processing chip 100 are the core steps of ISP processing.
  • the digital signal processing chip 100 generally includes modules such as black level compensation (BLC) 108, lens shading correction 109, bad pixel correction (BPC) 110, demosaicing 111, Bayer-domain noise reduction (denoise) 112, auto white balance (AWB) 113, Ygamma 114, auto exposure (AE) 115, auto focus (AF) (not shown in Figure 1), color correction (CC) 116, gamma correction 117, color gamut conversion 118, color denoising/detail enhancement 119, color enhancement (CE) 120, formatter 121, and input/output (I/O) control 122.
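As a concrete illustration of one of these stages, black level compensation can be sketched as subtracting a fixed dark offset from the raw values, clipping at zero, and rescaling. This is a simplified sketch, not the chip's actual implementation; the 10-bit white level (1023), the offset of 64, and the sample values are hypothetical.

```python
import numpy as np

def black_level_compensation(raw, black_level=64, white_level=1023):
    """Subtract the sensor's black level offset from a raw image and
    rescale so the maximum representable value is preserved.

    The offset and the 10-bit range are illustrative; real sensors
    report their own black and white levels.
    """
    corrected = np.clip(raw.astype(np.float64) - black_level, 0.0, None)
    return corrected * (white_level / (white_level - black_level))

raw = np.array([[60, 64, 100],
                [500, 1023, 64]])
out = black_level_compensation(raw)
```

Values at or below the black level map to zero, while a full-scale pixel stays at full scale after rescaling.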
  • deep-learning-based ISP has achieved certain results in many tasks.
  • a deep-learning-based ISP processes the image data through a neural network and then outputs it.
  • however, the processing complexity of the neural network is generally very high; although the expected results can be achieved, scenarios that require real-time processing generally face problems such as high energy consumption and long running time.
  • therefore, neural-network-based ISP needs to be further optimized.
  • This application provides a neural network-based image processing method and device, in order to optimize the neural network-based image signal processing performance.
  • a neural network-based image processing method uses a first neural network and a second neural network to process multiple frames of to-be-processed images and output second images.
  • the steps of the method are as follows: input the multiple frames of to-be-processed images into the first neural network for computation to obtain a first image; input multiple image groups into multiple second neural networks for computation respectively to obtain multiple frames of second images, where each image group includes the first image and one frame among the multiple frames of images to be processed.
  • the first image is obtained after the multiple frames of images to be processed undergo the first neural network operation; that is, it captures the image features common to the multiple frames of images to be processed.
  • the first image and one frame of the image to be processed then undergo a second neural network operation to obtain a second image, so that multiple frames of second images are obtained respectively. Because the first neural network and the second neural network jointly process the multiple frames of images to be processed, applying the first image to the processing of the second neural network reduces the computational complexity of the second neural network while ensuring the quality of image processing.
  • inputting multiple image groups into multiple second neural networks for computation includes: inputting one frame of the multiple frames of images to be processed into a second neural network for computation to obtain a third image; and merging the first image and the third image to obtain the second image.
  • the first neural network includes an image output layer and multiple feature map output layers, the image output layer outputs the first image, the feature map output layer outputs multiple intermediate feature maps, and the multiple intermediate feature maps are used to participate in the second The operation of the neural network to obtain the third image.
  • the complexity of the second neural network is lower than the complexity of the first neural network.
  • the multiple frames of images to be processed include multiple frames of temporally adjacent images.
  • the processing capability of the first neural network on the static region of the image is greater than that of the second neural network.
  • the first neural network is used to process the static area of the image to be processed.
  • the second neural network is used to process the motion area of the image to be processed.
  • the first neural network and the second neural network form an image processing system, and the image processing system is used to perform noise reduction and demosaicing on the image to be processed.
  • a neural network-based image processing device may be a mobile terminal, a device in a mobile terminal (such as a chip, a chip system, or a circuit), or a device that can be used in conjunction with the mobile terminal.
  • the device may include modules that correspond one-to-one to the methods/operations/steps/actions described in the first aspect.
  • the modules may be hardware circuits, software, or hardware circuits combined with software.
  • the device processes multiple frames of to-be-processed images to obtain a second image.
  • the device may include an arithmetic module.
  • an arithmetic module, configured to input multiple frames of images to be processed into a first neural network for operation to obtain a first image; the arithmetic module is further configured to input multiple image groups into multiple second neural networks respectively for operation to obtain multiple frames of second images, where each image group includes the first image and one frame of the multiple frames of images to be processed.
  • the arithmetic module is configured to: input one frame of the images to be processed into the second neural network for operation to obtain a third image; and merge the first image and the third image to obtain the second image.
  • the first neural network includes an image output layer and multiple feature map output layers, the image output layer outputs the first image, the feature map output layer outputs multiple intermediate feature maps, and the multiple intermediate feature maps are used to participate in the second The operation of the neural network to obtain the third image.
  • the complexity of the second neural network is lower than the complexity of the first neural network.
  • the multiple frames of images to be processed include multiple frames of temporally adjacent images.
  • the processing capability of the first neural network on the static region of the image is greater than that of the second neural network.
  • the first neural network is used to process the static area of the image to be processed.
  • the second neural network is used to process the motion area of the image to be processed.
  • the first neural network and the second neural network constitute an image processing system, and the image processing system is used to perform noise reduction and demosaicing on the image to be processed.
  • an embodiment of the present application provides an image processing device based on a neural network.
  • the device includes a processor, and the processor is configured to call a set of programs, instructions, or data to execute the method described in the first aspect or any one of the possible designs of the first aspect.
  • the device may also include a memory for storing programs, instructions or data called by the processor.
  • the memory is coupled with the processor, and when the processor executes the instructions or data stored in the memory, it can implement the method described in the first aspect or any possible design.
  • an embodiment of the present application provides a chip system, which includes a processor and may also include a memory, for implementing the method described in the first aspect or any one of the possible designs of the first aspect.
  • the chip system can be composed of chips, and can also include chips and other discrete devices.
  • the embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores computer-readable instructions. When the computer-readable instructions are run on a computer, the method described in the first aspect or any one of the possible designs of the first aspect is executed.
  • the embodiments of the present application also provide a computer program product containing instructions, which, when run on a computer, causes the computer to execute the method described in the first aspect or any possible design of the first aspect.
  • FIG. 1 is a schematic diagram of an ISP processing flow in the prior art
  • FIG. 2 is a schematic structural diagram of a system architecture provided by an embodiment of this application.
  • FIG. 3 is a schematic diagram of the principle of a neural network provided by an embodiment of the application.
  • FIG. 4 is a flowchart of a neural network-based image processing method provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of an implementation manner of image processing provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of an implementation manner of image processing provided by an embodiment of the application.
  • FIG. 7 is a schematic diagram of an implementation manner of image processing provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of an RGrGbB image processing process provided by an embodiment of the application.
  • FIG. 9a is one of the schematic structural diagrams of the first neural network provided by an embodiment of this application.
  • FIG. 9b is the second schematic diagram of the structure of the first neural network provided by an embodiment of the application.
  • FIG. 10a is one of the schematic structural diagrams of the second neural network provided by an embodiment of this application.
  • FIG. 10b is the second of the schematic structural diagrams of the second neural network provided by an embodiment of this application;
  • FIG. 11a is a schematic structural diagram of a first neural network and a second neural network provided by an embodiment of this application;
  • FIG. 11b is a schematic diagram of the structure of the first neural network and the second neural network provided by an embodiment of the application;
  • FIG. 12 is a schematic structural diagram of an image processing device based on a neural network provided by an embodiment of the application.
  • FIG. 13 is a schematic structural diagram of an image processing device based on a neural network provided by an embodiment of the application.
  • words such as “exemplary” or “for example” are used to present examples or illustrations. Any embodiment or design described as “exemplary” or “for example” in the embodiments of the present application should not be construed as more preferable or advantageous than other embodiments or designs. Rather, such words are used to present related concepts in a concrete manner.
  • the image processing method and device based on neural network (NN) provided by the embodiments of this application can be applied to electronic equipment.
  • the electronic equipment may be a mobile device such as a mobile terminal (mobile terminal), a mobile station (MS), or user equipment (UE); it may also be a fixed device, such as a fixed telephone or a desktop computer, or a video monitor.
  • the electronic device is an image acquisition and processing device with image signal acquisition and processing functions, and has an ISP processing function.
  • the electronic device can also optionally have a wireless connection function to provide users with a handheld device with voice and/or data connectivity, or other processing devices connected to a wireless modem.
  • the electronic device may be a mobile phone (also called a “cellular” phone) or a computer with a mobile terminal; it may also be a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile device, or of course a wearable device (such as a smart watch or smart bracelet), a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a point of sale (POS) terminal, etc.
  • the following takes the electronic device as a mobile terminal as an example for description.
  • FIG. 2 is a schematic diagram of an optional hardware structure of the mobile terminal 200 according to an embodiment of the application.
  • the mobile terminal 200 mainly includes a chipset and peripheral devices.
  • Components such as USB interface, memory, display screen, battery/mains power, earphone/speaker, antenna, sensor, etc. can be understood as peripheral devices.
  • the arithmetic processor, RAM, I/O, display interface, ISP, sensor interface, baseband and other components in the chipset can form a system-on-a-chip (SOC), which is the main part of the chipset.
  • the components in the SOC can all be integrated into a complete chip, or part of the components in the SOC can be integrated, and the other parts are not integrated.
  • the baseband communication module in the SOC may not be integrated with the other parts and may instead form an independent part.
  • the components in the SOC can be connected to each other through a bus or other connecting lines.
  • the PMU, voice codec, RF, and other components outside the SOC usually contain analog circuitry, so they are often kept outside the SOC rather than integrated into it.
  • the PMU is used to connect to the mains or battery to supply power to the SOC, and the mains can be used to charge the battery.
  • the voice codec is used as the sound codec unit to connect with earphones or speakers to realize the conversion between natural analog voice signals and digital voice signals that can be processed by the SOC.
  • the short-range module can include wireless fidelity (WiFi) and Bluetooth, and can also optionally include infrared, near field communication (NFC), radio (FM), or global positioning system (GPS) modules, etc.
  • the RF is connected with the baseband communication module in the SOC to realize the conversion between the air interface RF signal and the baseband signal, that is, mixing. For mobile phones, receiving is down-conversion, and sending is up-conversion.
  • the baseband is used for baseband communication, supporting one or more of a variety of communication modes and processing wireless communication protocols, including the physical layer (layer 1), medium access control (MAC, layer 2), radio resource control (RRC, layer 3), and other protocol layers. It can support various cellular communication standards, such as long term evolution (LTE) communication or 5G new radio (NR) communication.
  • the sensor interface is an interface between the SOC and an external sensor, and is used to collect and process data from at least one external sensor.
  • the external sensor may be, for example, an accelerometer, a gyroscope, a control sensor, an image sensor, and so on.
  • the arithmetic processor may be a general-purpose processor, such as a central processing unit (CPU); one or more integrated circuits, such as one or more application specific integrated circuits (ASICs); one or more digital signal processors (DSPs) or microprocessors; or one or more field programmable gate arrays (FPGAs), etc.
  • the arithmetic processor can include one or more cores, and can selectively schedule other units.
  • RAM can store some intermediate data during calculation or processing, such as intermediate calculation data of CPU and baseband.
  • ISP is used to process the data collected by the image sensor.
  • I/O is used for the SOC to interact with various external interfaces, such as the universal serial bus (USB) interface for data transmission.
  • the memory can be a chip or a group of chips.
  • the display screen can be a touch screen, which is connected to the bus through a display interface.
  • the display interface can be used for data processing before image display, such as superimposing multiple layers to be displayed, buffering display data, or controlling and adjusting screen brightness.
  • the mobile terminal 200 involved in the embodiment of the present application includes an image sensor, which can collect external signals such as light from the outside, and process and convert the external signals into sensor signals, that is, electrical signals.
  • the sensor signal can be a static image signal or a dynamic video image signal.
  • the image sensor may be a camera, for example.
  • the mobile terminal 200 involved in the embodiments of the present application further includes an image signal processor.
  • the image sensor collects sensor signals and transmits them to the image signal processor.
  • the image signal processor obtains the sensor signal and can perform image signal processing on it, in order to obtain an image signal whose sharpness, color, brightness, and other characteristics are in line with the characteristics of the human eye.
  • the image signal processor involved in the embodiment of the present application may be one or a group of chips, that is, it may be integrated or independent.
  • the image signal processor included in the mobile terminal 200 may be an integrated ISP chip integrated in the arithmetic processor.
  • the mobile terminal 200 involved in the embodiments of the present application has the function of taking photos or recording videos.
  • the neural network-based image processing method provided in the embodiments of the present application mainly focuses on how to perform image signal processing based on the neural network.
  • a neural network is used to process the multi-frame images to be processed.
  • a neural network is a network structure that imitates the behavioral characteristics of animal neural networks to process information.
  • the neural network can be composed of neural units. A neural unit can be an arithmetic unit that takes x_s and an intercept of 1 as inputs, and the output of the arithmetic unit can be as shown in formula (1):

    h_{W,b}(x) = f(W^T x) = f(\sum_{s=1}^{n} W_s x_s + b)    (1)

  • where s = 1, 2, ..., n, and n is a natural number greater than 1;
  • W_s is the weight of x_s;
  • b is the bias of the neural unit;
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
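The single neural unit described above can be sketched directly; the input values, weights, and bias below are arbitrary illustrative numbers.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation function f."""
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(x, w, b):
    """One neural unit: the weighted sum of the inputs x_s with
    weights W_s, plus the bias b, passed through the activation f."""
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # inputs x_s
w = np.array([0.1, 0.4, 0.2])    # weights W_s
b = 0.05                         # bias b
y = neural_unit(x, w, b)         # output lies in (0, 1)
```

Because the sigmoid squashes its argument, the unit's output is always strictly between 0 and 1, which is why it can serve directly as the input of a following layer.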
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • the neural network 300 has N processing layers, where N ≥ 3 and N is a natural number.
  • the first layer of the neural network is the input layer 301, which is responsible for receiving input signals.
  • the last layer of the neural network is the output layer 303, which outputs the processing results of the neural network.
  • the layers other than the first and last layers are the intermediate layers 304. These intermediate layers together form the hidden layer 302. Each intermediate layer of the hidden layer can receive input signals and output signals, and the hidden layer is responsible for processing the input signals.
  • Each layer represents a logic level of signal processing. Through multiple layers, data signals can be processed by multiple levels of logic.
  • the input signal of the neural network may be a signal in various forms such as a voice signal, a text signal, an image signal, and a temperature signal.
  • the processed image signals may be various sensor signals, such as landscape signals captured by a camera (image sensor), image signals of a community environment captured by a video monitoring device, and face signals acquired by an access control system.
  • the input signals of the neural network include various other engineering signals that can be processed by computers, which are not listed here one by one. If the neural network is used for deep learning of the image signal, the image quality can be improved.
  • a deep neural network (DNN) is also known as a multi-layer neural network.
  • the DNN is divided according to the positions of different layers: the layers inside the DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers.
  • the layers are fully connected; that is to say, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer.
  • the work of each layer can be expressed as y = α(Wx + b), where x is the input vector, y is the output vector, b is the bias vector, W is the weight matrix (also called the coefficients), and α() is the activation function.
  • each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has a large number of layers, the number of coefficient matrices W and bias vectors b is also large.
  • the definition of these parameters in a DNN is as follows, taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as W_{24}^3, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.
  • in summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as W_{jk}^L.
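The per-layer operation y = α(Wx + b) and the W_{jk}^L indexing can be sketched as follows; the layer sizes (4 → 3 → 2) and random weights are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Run a fully connected DNN: each layer computes y = f(W @ x + b).

    layers is a list of (W, b) pairs.  W[j, k] plays the role of the
    coefficient W_jk from neuron k of the previous layer to neuron j
    of the current layer.
    """
    for W, b in layers:
        x = sigmoid(W @ x + b)
    return x

rng = np.random.default_rng(0)
# A small DNN with layer sizes 4 -> 3 -> 2.
layers = [
    (rng.standard_normal((3, 4)), rng.standard_normal(3)),
    (rng.standard_normal((2, 3)), rng.standard_normal(2)),
]
y = forward(np.ones(4), layers)
```

Each weight matrix has one row per output neuron and one column per input neuron, so the number of parameters grows with both layer width and depth, as the text notes.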
  • Convolutional neural network (convolutional neuron network, CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • in a convolutional layer, a neuron may be connected to only some of the neurons in adjacent layers.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel.
  • weight sharing can be understood as meaning that the way image information is extracted is independent of location.
  • the convolution kernel can be initialized in the form of a matrix of random size. In the training process of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
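Weight sharing can be illustrated with a single kernel slid over an image: every output position reuses the same kernel weights, so the parameter count is independent of the image size. This sketch uses plain numpy loops rather than a deep learning framework; the 4×4 image and 3×3 averaging kernel are illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as used in CNNs).
    The same kernel weights are shared at every spatial position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0          # 3x3 averaging kernel
feature_map = conv2d(image, kernel)     # shape (2, 2)
```

Only the nine kernel weights are learned, however large the input image; this is the reduction in connections (and overfitting risk) described above.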
  • the neural network in the embodiment of the present application may be a convolutional neural network, and of course, it may also be another type of neural network, such as a recurrent neural network (recurrent neural network, RNN).
  • the images in the embodiments of the present application may be static images (or referred to as static pictures) or dynamic images (or referred to as dynamic pictures).
  • the images in the present application may be videos or dynamic pictures, or they may be static pictures or photos.
  • static images or dynamic images are collectively referred to as images.
  • the method is executed by an image processing device based on a neural network.
  • the neural network-based image processing device may be any device or apparatus with image processing functions; for example, the method is executed by the mobile terminal 200 shown in FIG. 2, by a device related to the mobile terminal, or by part of the equipment included in the mobile terminal.
  • multiple neural networks are used for image processing, for example, two neural networks are used to process the image to be processed, and the two neural networks are denoted as the first neural network and the second neural network.
  • the first neural network and the second neural network conform to the above description of the neural network.
  • the neural network-based image processing method provided by the embodiment of the present application includes the following steps.
  • S401 Input multiple frames of to-be-processed images into a first neural network for calculation to obtain a first image.
  • S402 Input multiple image groups into multiple second neural networks to perform operations to obtain multiple frames of second images, each image group includes the first image and one frame of the multiple frames of images to be processed.
  • n is an integer greater than or equal to 2.
  • each second neural network receives the first image and one frame of the image to be processed, and outputs one frame of the second image; that is, the first second neural network outputs the first frame of the second image, the second second neural network outputs the second frame of the second image, and so on, until the n-th second neural network outputs the n-th frame of the second image.
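The data flow of S401 and S402 can be sketched with placeholder networks. Here first_nn and second_nn are stand-ins invented to make the flow runnable (a per-pixel mean and a blend), not the patent's trained networks; the frame count and image sizes are arbitrary.

```python
import numpy as np

def first_nn(frames):
    """Placeholder for the first neural network: extracts features
    common to all frames (here, simply the per-pixel mean)."""
    return np.mean(frames, axis=0)

def second_nn(first_image, frame):
    """Placeholder for one second neural network: refines a single
    frame using the shared first image."""
    return 0.5 * first_image + 0.5 * frame

# n = 3 frames of 2 x 2 to-be-processed images (synthetic data).
frames = [np.full((2, 2), float(i)) for i in range(3)]

# S401: run all frames through the first neural network together.
first_image = first_nn(frames)

# S402: form n image groups (first image + one frame each) and run
# each group through its own second neural network.
second_images = [second_nn(first_image, f) for f in frames]
```

The shared first image is computed once, so each second network only handles one frame plus the shared result, which is the source of the complexity reduction described above.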
  • the first image is obtained after the multiple frames of images to be processed undergo the first neural network operation; that is, it captures the image features common to the multiple frames of images to be processed.
  • the first image and one frame of the image to be processed then undergo a second neural network operation to obtain a second image, so that multiple frames of second images are obtained respectively. Because the first neural network and the second neural network jointly process the multiple frames of images to be processed, applying the first image to the processing of the second neural network reduces the computational complexity of the second neural network while ensuring the quality of image processing.
  • the first neural network is used to process static regions of multiple frames of images to be processed.
  • the first image may be an image of the static area shared by the multiple frames of images to be processed. Because the features of the static region account for a high proportion of the network complexity, these features are processed first by the first neural network, and the processing result is input into the second neural network as an intermediate result, which lowers the complexity required of the second neural network. Through the combined use of the two neural networks, multiple frames of images can be processed with lower complexity than with a single neural network.
  • the second neural network is used to process the motion regions of multiple frames of images to be processed.
  • each second neural network receives the first image and a frame of image to be processed. Further, each second neural network outputs a frame of the third image, and merges the first image and the third image to obtain the second image.
  • each second neural network outputs one frame of the second image; that is, the first second neural network outputs the first frame of the second image, the second second neural network outputs the second frame of the second image, and so on, until the n-th second neural network outputs the n-th frame of the second image.
  • merging the first image and the third image can also be regarded as superimposing the two, for example, performing a matrix addition operation on the first image and the third image to obtain the second image.
  • the first image is an image of the static area processed by the first neural network, and the third image is an image of the moving area processed by the second neural network. Combining the first image and the third image, that is, combining the processed image of the static area with the processed image of the moving area, yields a complete second image.
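The merge step described above can be sketched as a per-pixel matrix addition of the static-area result and the motion-area result; the 2×2 arrays below are synthetic values chosen only for illustration.

```python
import numpy as np

# First image: static-area result from the first neural network.
first_image = np.array([[10.0, 20.0],
                        [30.0, 40.0]])

# Third image: motion-area result from one second neural network,
# here treated as a residual to be added on top of the static result.
third_image = np.array([[1.0, -2.0],
                        [0.5, 0.0]])

# Merging by element-wise matrix addition yields the complete
# second image for this frame.
second_image = first_image + third_image
```

Because the addition is element-wise, the two networks' outputs must share the same spatial resolution for the merge to be defined.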
  • the first neural network and/or the second neural network do not divide or recognize the image to be processed as static regions and/or moving regions, but process the image to be processed as a whole.
  • the characteristics of the first neural network will make it process the image areas with static-area characteristics in the image to be processed
  • the characteristics of the second neural network will make it process the image areas with moving-area characteristics in the image to be processed and in the intermediate image processed by the first neural network.
  • the characteristics of the first neural network will cause it to perform higher-intensity processing on image areas with static-area characteristics in the image to be processed and lower-intensity processing on image areas with moving-area characteristics, while the characteristics of the second neural network will cause it to perform higher-intensity processing on image areas with moving-area characteristics in the image to be processed and in the intermediate image processed by the first neural network, and lower-intensity processing on image areas with static-area characteristics.
  • the first image may be an image processed by the first neural network in a static area
  • the third image may be an image processed by the second neural network in a moving area.
  • the difference from the implementation manner described in FIG. 6 is that the second neural network also receives the intermediate feature map output by the first neural network.
  • the intermediate feature map is used to participate in the operation of the second neural network to obtain the third image.
  • a frame of to-be-processed image and an intermediate feature map may be subjected to vector splicing or vector addition to obtain the to-be-processed image matrix, and the to-be-processed image matrix may be input to the second neural network for operation to obtain the third image.
  • the vector stitching of a frame of image to be processed and the intermediate feature map can be regarded as the internal processing process of the second neural network.
  • the input to the second neural network is an overall matrix, that is, the matrix of images to be processed.
  • the second neural network also receives a multi-frame intermediate feature map output by the first neural network.
  • a frame of image to be processed and the intermediate feature map of the first frame can be vector spliced or vector added to obtain an image matrix to be processed, and the image matrix to be processed can be input to the second neural network for operation to obtain the intermediate feature map of the second neural network.
  • the first neural network includes one image output layer and multiple feature map output layers.
  • the image output layer outputs the first image.
  • the feature map output layer outputs multiple intermediate feature maps.
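The vector splicing and vector addition described above can be sketched as follows (NumPy; the channel counts are illustrative assumptions, not taken from the patent):

```python
import numpy as np

frame = np.zeros((4, 8, 8))          # one frame to be processed (4 Bayer channels)
intermediate = np.zeros((16, 8, 8))  # an intermediate feature map from the first network

# Vector splicing: concatenate along the channel dimension, giving 4 + 16 = 20 channels.
spliced = np.concatenate([frame, intermediate], axis=0)

# Vector addition requires matching shapes, e.g. adding two 4-channel tensors.
added = frame + np.ones((4, 8, 8))
```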
  • no specific distinction is made between the first image, the second image, and the third image.
  • no specific distinction is made between the color and texture of the first image, the second image, and the third image.
  • the multiple frames of images to be processed include multiple frames of temporally adjacent images.
  • multiple frames of temporally adjacent images include multiple frames of temporally continuous images.
  • correspondingly, the processed images are also multiple frames.
  • the second images obtained through multiple second neural networks are multiple frames, and each frame of the image to be processed corresponds to one frame of the second image.
  • Each frame of the second image corresponds to the first image and one frame of the third image.
  • the format of the image to be processed may be a red-green-blue (RGB) format, a luminance-chrominance (YUV) format, or a Bayer format; the format is not limited in this application.
  • the number of multi-frame to-be-processed images is 4 frames, and the 4 frames of to-be-processed images are input to the first neural network for calculation to obtain the first image.
  • the first image is obtained corresponding to 4 frames of images to be processed.
  • the 4 image groups are respectively input to 4 second neural networks to perform operations to obtain 4 second images respectively, and each image group includes the first image and one of the 4 images to be processed.
  • the first frame of the image to be processed corresponds to the first frame of the second image
  • the second frame of image to be processed corresponds to the second frame of the second image
  • the third frame of image to be processed corresponds to the third frame of the second image
  • the fourth frame of the image to be processed corresponds to the fourth frame of the second image.
  • the first frame of image to be processed is input to the first second neural network for calculation to obtain the first frame of the third image, and the first image and the first frame of the third image are combined to obtain the first frame of second image.
  • the second frame of image to be processed is input to the second second neural network for operation to obtain the second frame of the third image, and the first image and the second frame of the third image are combined to obtain the second frame of the second image.
  • the third frame of image to be processed is input to the third second neural network for operation to obtain the third frame of the third image, and the first image and the third frame of the third image are combined to obtain the third frame of the second image.
  • the fourth frame of image to be processed is input to the fourth second neural network for operation to obtain the fourth frame of the third image, and the first image and the fourth frame of the third image are combined to obtain the fourth frame of the second image.
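The four steps above can be sketched as one loop. The two stand-in functions below are placeholders for the real networks (a simple average and a residual), used only to show the data flow, not the patent's actual layers:

```python
import numpy as np

def first_network(frames):
    # Placeholder for the first (shared) network: all frames in, one first image out.
    return sum(frames) / len(frames)

def second_network(frame, first_image):
    # Placeholder for one second (per-frame) network: outputs a third image.
    return frame - first_image

frames = [np.full((4, 8, 8), float(i)) for i in range(4)]

first_image = first_network(frames)            # shared by all image groups
second_images = []
for frame in frames:                           # one image group per frame
    third_image = second_network(frame, first_image)
    second_images.append(first_image + third_image)   # merge -> second image
```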
  • the first neural network and the second neural network can be combined into an image processing system, and the image processing system is used to process the images to be processed to improve the quality of the image or video.
  • the processing process can include processing such as noise reduction and elimination of mosaic effects.
  • the complexity of the first neural network is higher than the complexity of the second neural network.
  • the processing capability of the first neural network on the static area of the image is greater than that of the second neural network.
  • multiple frames of images are often synthesized into one frame of output through a neural network to improve image or video quality.
  • such a neural network requires high complexity, and in a video scene a high processing speed is also required.
  • implementing video processing on a mobile terminal requires that a video with a resolution of 8K reach a processing speed of 30 frames/s, that is, a frame rate of 30.
  • when a single neural network is used to synthesize multiple frames of images into one frame for output, it faces problems of computational complexity and computational resource consumption and incurs a large time delay. Simply reducing the complexity of the neural network and using a lower-complexity network will degrade the quality of the image or video.
  • the first neural network is used to handle the computationally heavy processing shared among the multiple frames of images
  • the second neural network is used to handle the lighter per-frame processing within the multiple frames of images and to output the multiple frames of processed images
  • the combined computing power of the first neural network and the second neural network is allocated across the multiple frames of images, so the processing complexity of each frame is reduced compared with the above-mentioned solution, while the quality of the image or video is guaranteed.
  • the first image is an image of a static area
  • the third image is an image of a moving area
  • the first neural network is used to process the static area of the multi-frame image to be processed
  • the second neural network is used to process the motion of the multi-frame image to be processed
  • the first neural network and the second neural network are convolutional neural networks as an example.
  • the image to be processed is 4 frames
  • the second image is 4 frames.
  • the format of the image to be processed is a Bayer format image, in particular the image format is an RGrGbB format, and one frame of an RGrGbB format image includes 4 channels (R, Gr, Gb, B).
  • the image processing system includes a first neural network and a second neural network.
  • the 16 channels include (R1, Gr1, Gb1, B1, R2, Gr2, Gb2, B2, R3, Gr3, Gb3, B3, R4, Gr4, Gb4, B4).
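Packing the 4 RGrGbB frames into the 16-channel input of the first neural network can be sketched as a channel-wise concatenation (shapes are illustrative):

```python
import numpy as np

# Four Bayer frames, each with the 4 channels (R, Gr, Gb, B).
frames = [np.zeros((4, 8, 8)) for _ in range(4)]

# 16-channel input ordered (R1, Gr1, Gb1, B1, R2, ..., Gb4, B4).
network_input = np.concatenate(frames, axis=0)
```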
  • Exemplarily, the architecture of the first neural network is shown in Figures 9a and 9b. Because the drawing of the first neural network is too large, it is split into two parts, shown in Figure 9a and Figure 9b respectively; together, Figures 9a and 9b form the architecture of the first neural network. The add layer at the end of Figure 9a connects to the first layer in Figure 9b.
  • the convolutional layer is represented by a rectangular box.
  • Conv2d represents a 2-dimensional convolution
  • bias represents the bias term
  • 1x1/3x3 represents the size of the convolution kernel
  • stride represents the step size
  • _32_16 represents the numbers of input and output feature maps
  • 32 means that the number of feature maps input to the layer is 32
  • 16 means that the number of feature maps output by the layer is 16.
  • Split represents the split layer, which splits the feature maps in the channel dimension.
  • Split 2 means splitting into two in the feature-map dimension; for example, 32 input feature maps become two groups of 16 feature maps after this operation.
  • concat represents the concatenation (skip-connection) layer, which merges tensors in the feature-map dimension; for example, two tensors of 16 feature maps are merged into one tensor of 32 feature maps.
  • add represents a matrix addition operation.
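The Split, concat and add layers described above correspond to the following tensor operations (NumPy sketch; 32 feature maps of hypothetical size 8x8):

```python
import numpy as np

x = np.arange(32 * 8 * 8, dtype=float).reshape(32, 8, 8)  # 32 feature maps

# Split 2: divide the 32 feature maps into two groups of 16 along the channel axis.
a, b = np.split(x, 2, axis=0)

# concat: merge two 16-feature-map tensors back into one 32-feature-map tensor.
y = np.concatenate([a, b], axis=0)

# add: element-wise matrix addition of two equal-shaped tensors.
z = a + b
```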
  • the first neural network shown in FIG. 9a and FIG. 9b is a typical convolutional neural network, which can handle the static areas of the multiple frames of images to be processed well. The 4 frames of images to be processed are input to the first neural network, and the first neural network outputs the first image.
  • the convolutional layer of the first neural network may also adopt a group convolution (group convolution) operation.
  • group convolution is a special kind of convolutional layer.
  • suppose the previous layer outputs N channels and the number of groups of the group convolution is M.
  • the group convolutional layer first divides the N channels into M groups of N/M channels each; the convolution of each group is performed independently, and after completion the output feature maps are vector-concatenated (concat) together as the output channels of this layer.
  • the group convolution operation can obtain the same or a similar technical effect as the branch structure.
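A naive NumPy sketch of the group convolution just described (N = 8 input channels, M = 2 groups; all sizes are illustrative, and the convolution is a minimal "valid" implementation rather than the patent's actual layers):

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' 2-D convolution: x is (Cin, H, W), w is (Cout, Cin, k, k)."""
    cout, cin, k, _ = w.shape
    _, h, width = x.shape
    out = np.zeros((cout, h - k + 1, width - k + 1))
    for o in range(cout):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[o])
    return out

def group_conv2d(x, group_weights):
    """Split the N input channels into M groups, convolve each group
    independently, then concat the outputs along the channel axis."""
    m = len(group_weights)                   # number of groups M
    groups = np.split(x, m, axis=0)          # M groups of N/M channels each
    outs = [conv2d(g, w) for g, w in zip(groups, group_weights)]
    return np.concatenate(outs, axis=0)

x = np.ones((8, 6, 6))                                        # N = 8 channels
group_weights = [np.ones((2, 4, 3, 3)) for _ in range(2)]     # M = 2 groups
y = group_conv2d(x, group_weights)                            # (4, 4, 4) output
```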
  • Exemplarily, the architecture of the 4 second neural networks that process the 4 images to be processed is shown in Fig. 10a and Fig. 10b.
  • the first image output in Fig. 10a is input to the first layer in Fig. 10b.
  • the convolutional layer is represented by a rectangular box.
  • for the specific notation, refer to the explanation of Fig. 9a and Fig. 9b.
  • the first frame of image to be processed is input to the first second neural network to obtain the first frame of the third image, and the first image and the first frame of the third image are combined to obtain the first frame of the second image.
  • the second frame of image to be processed is input to the second second neural network to obtain the second frame of the third image, and the first image and the second frame of the third image are combined to obtain the second frame of the second image.
  • the third frame of the image to be processed is input to the third second neural network to obtain the third frame of the third image, and the first image and the third frame of the third image are combined to obtain the third frame of the second image.
  • the fourth frame of image to be processed is input to the fourth second neural network to obtain the fourth frame of the third image, and the first image and the fourth frame of the third image are combined to obtain the fourth frame of the second image.
  • the first neural network can also output intermediate feature maps, which are input to the second neural network and participate in the calculation of the second neural network to obtain the third image.
  • the intermediate feature map output by the second convolutional layer in the first neural network is used for vector stitching with the image output by the first convolutional layer of the second neural network to obtain a processed image.
  • the intermediate feature map output by the fourth convolution layer in the first neural network is used for vector stitching with the intermediate feature map output by the third convolution layer of the second neural network to obtain a processed image.
  • the neural network model needs to be trained.
  • the training data can include training images and ground truth images.
  • the output image is compared with the true value image of the first image until the network converges, and the training of the first neural network model is completed.
  • the so-called network convergence may mean, for example, that the difference between the output image and the true value image of the first image is smaller than the set first threshold.
  • the parameters of the first neural network obtained by training are then fixed, and the true value image of the third image is used to train the second neural network: the collected training images are processed to obtain an output image.
  • the output image is compared with the true value image of the third image until the network converges, and the training of the second neural network model is completed.
  • network convergence may mean that the difference between the output image and the true value image of the third image is less than the set second threshold.
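The two-stage training described above (train the first network until its output is close enough to the first image's ground truth, freeze its parameters, then train the second network against the third image's ground truth) can be sketched with toy linear "networks". Everything here, from the data to the thresholds, is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 4))        # toy "training images" (one row per sample)
gt_first = x @ np.full(4, 0.5)          # ground truth for the first image
gt_third = x @ np.full(4, -0.2)         # ground truth for the third image

def train(inputs, target, threshold, lr=0.05, max_steps=50_000):
    """Gradient descent on a linear model until the loss falls below the
    convergence threshold (the 'network convergence' criterion above)."""
    w = np.zeros(inputs.shape[1])
    for _ in range(max_steps):
        err = inputs @ w - target
        if np.mean(err ** 2) < threshold:
            break
        w -= lr * 2 * inputs.T @ err / len(inputs)
    return w

# Stage 1: train the first network until the difference to the ground truth
# of the first image is below the first threshold, then freeze w1.
w1 = train(x, gt_first, threshold=1e-4)

# Stage 2: with w1 fixed, the second network sees the frame plus the frozen
# first-network output, and is trained against the third image's ground truth.
inputs2 = np.column_stack([x, x @ w1])
w2 = train(inputs2, gt_third, threshold=1e-4)
```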
  • an image processing system is formed by a first neural network and a second neural network, and the image processing system is used to process multiple frames of to-be-processed images, and output multiple frames of processed images.
  • the complexity of the second neural network is lower than the complexity of the first neural network.
  • the calculation amount of the image processing system for each frame of the image to be processed is reduced to a certain extent compared with the scheme of processing multiple frames of images into one frame through the basic network in some technologies. In turn, the image processing time delay can be reduced, and the quality of the image or video can be guaranteed.
  • the computing power of the two neural networks for processing multiple frames of images to be processed will be illustrated below with examples.
  • the processed image output by the first neural network and the second neural network is 4 frames.
  • a frame is output after basic network processing.
  • the first neural network is shown in Figures 9a and 9b
  • the second neural network is shown in Figures 10a and 10b.
  • the calculation amount of the first neural network is about the same as that of the basic network, which is about 12000 MAC.
  • the calculation process of the network complexity of the basic network is as follows:
  • the calculation amount of each second neural network is approximately 4000 MAC; here it is assumed to be 4000.
  • the multi-frame input and multi-frame output schemes performed by the first neural network and the second neural network can reduce the amount of calculation, thereby reducing the delay of image processing, and can meet the requirements of video in video scenarios.
  • regarding the image processing delay requirement: the network computing power requirement of a video with a resolution of 8 thousand (K) pixels and a frame rate of 30 frames per second is about 50000 MAC.
  • when the embodiment of this application outputs 8 frames, the amount of calculation required by the image processing system can basically meet the network computing power requirement of 8K 30-fps video.
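Using the figures quoted above as assumptions (first network about 12000 MAC, each second network about 4000 MAC, 4 frames per pass), the per-frame saving works out as follows:

```python
# All numbers are the illustrative figures from the text (units: MAC, as given).
basic_network_mac = 12000    # baseline: multi-frame in, one frame out
first_network_mac = 12000    # roughly the same as the basic network
second_network_mac = 4000    # per second-network pass
frames = 4

# Baseline cost: one full basic-network pass per output frame.
baseline_per_frame = basic_network_mac

# Proposed scheme: one shared first-network pass plus one second-network
# pass per frame, amortized over the 4 output frames.
total = first_network_mac + frames * second_network_mac
per_frame = total / frames
```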
  • the neural network-based image processing device may include a hardware structure and/or a software module, and implement the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module.
  • whether a certain function among the above-mentioned functions is executed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
  • an embodiment of the present application also provides a neural network-based image processing apparatus 1200.
  • the neural network-based image processing apparatus 1200 may be a mobile terminal or any device with image processing functions.
  • the neural network-based image processing device 1200 may include modules that perform one-to-one correspondence of the methods/operations/steps/actions in the foregoing method embodiments.
  • the modules may be implemented by hardware circuits, by software, or by hardware circuits combined with software.
  • the neural network-based image processing device 1200 may include an arithmetic module 1201.
  • the arithmetic module 1201 is configured to input multiple frames of to-be-processed images into a first neural network for calculation to obtain a first image; and input multiple image groups into multiple second neural networks to perform calculations to obtain multiple frames of second images respectively , wherein each of the image groups includes one frame of the first image and the multiple frames of images to be processed.
  • the division of modules in the embodiments of this application is illustrative and is only a division by logical function; there may be other division methods in actual implementation.
  • the functional modules in the various embodiments of this application may be integrated into one processing module, may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules.
  • an embodiment of the present application also provides an image processing device 1300 based on a neural network.
  • the neural network-based image processing device 1300 includes a processor 1301.
  • the processor 1301 is used to call a set of programs so that the foregoing method embodiments are executed.
  • the neural network-based image processing device 1300 further includes a memory 1302, and the memory 1302 is configured to store program instructions and/or data executed by the processor 1301.
  • the memory 1302 is coupled with the processor 1301.
  • the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units or modules, and may be in electrical, mechanical or other forms, and is used for information exchange between devices, units or modules.
  • the processor 1301 may operate in cooperation with the memory 1302.
  • the processor 1301 may execute program instructions stored in the memory 1302.
  • the memory 1302 may be included in the processor 1301.
  • the neural network-based image processing device 1300 may be a chip system.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the processor 1301 is configured to input multiple frames of to-be-processed images into a first neural network for operation to obtain a first image; and input multiple image groups into multiple second neural networks for operation to obtain multiple frames of second images respectively , wherein each of the image groups includes one frame of the first image and the multiple frames of images to be processed.
  • the processor 1301 may be a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and may implement or execute the The disclosed methods, steps and logic block diagrams.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in combination with the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • the memory 1302 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM).
  • the memory is any other medium that can be used to carry or store desired program codes in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory in the embodiment of the present application may also be a circuit or any other device capable of realizing a storage function for storing program instructions and/or data.
  • An embodiment of the present application also provides a chip including a processor, which is used to support the neural network-based image processing device to implement the functions involved in the foregoing method embodiments.
  • the chip is connected to a memory or the chip includes a memory, and the memory is used to store the necessary program instructions and data of the communication device.
  • the embodiment of the present application provides a computer-readable storage medium that stores a computer program, and the computer program includes instructions for executing the foregoing method embodiments.
  • the embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the foregoing method embodiments.
  • this application can be provided as methods, systems, or computer program products. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device.
  • the device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Picture Signal Circuits (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The present application relates to the technical field of image processing, and disclosed are a neural network-based image processing method and apparatus, for use in ensuring image quality and reducing image processing delay. The method comprises: inputting multiple frames of images to be processed into a first neural network for calculation to obtain a first image; and respectively inputting multiple image groups into multiple second neural networks for calculation to respectively obtain multiple frames of second images, each image group comprising the first image and one of the multiple frames of images to be processed.

Description

一种基于神经网络的图像处理方法及装置Image processing method and device based on neural network 技术领域Technical field
本申请实施例涉及图像处理技术领域,尤其涉及一种基于神经网络的图像处理方法及装置。The embodiments of the present application relate to the field of image processing technology, and in particular, to a neural network-based image processing method and device.
背景技术Background technique
随着科学技术的发展,手机、平板电脑等具有拍照和视频录制功能的移动终端已被人们广泛使用。移动终端在拍照或视频录制过程中,对图像信号进行图像信号处理(image Signal processing,ISP)。With the development of science and technology, mobile terminals with camera and video recording functions such as mobile phones and tablet computers have been widely used by people. In the process of photographing or video recording, the mobile terminal performs image signal processing (ISP) on the image signal.
ISP主要作用是对前端图像传感器输出的图像信号进行后期处理。依赖于ISP,在不同的光学条件下得到的图像才能较好的还原现场细节。ISP处理流程如图1所示,自然景物101通过镜头(lens)102获得贝尔(bayer)图像,然后通过传感器103、光电转换104得到模拟电信号105,进一步通过消噪和数模转换(A/D)106获得数字电信号(即原始图像(raw image))107,接下来会进入数字信号处理芯片100中。在数字信号处理芯片100中的步骤是ISP处理的核心步骤,数字信号处理芯片100一般包含黑电平矫正(black level compensation,BLC)108、镜头阴影矫正(lens shading correction)109、坏点矫正(bad pixel correction,BPC)110、去马赛克(demosaic)111、拜耳域降噪(denoise)112、自动白平衡(auto white balance,AWB)113、Ygamma114、自动曝光(auto exposure,AE)115、自动对焦(auto focus,AF)(图1中未示出)、色彩矫正(color correction,CC)116、伽玛(gamma)矫正117、色域转换118、色彩去噪/细节增强119、色彩增强(color enhance,CE)120、编织器(formater)121、输入输出(input/output,I/O)控制122等模块。The main function of ISP is to perform post-processing on the image signal output by the front-end image sensor. Depending on the ISP, the images obtained under different optical conditions can better restore the details of the scene. The ISP processing flow is shown in Figure 1. The natural scene 101 obtains the Bayer image through the lens 102, and then obtains the analog electrical signal 105 through the sensor 103 and the photoelectric conversion 104, and further passes noise reduction and digital-to-analog conversion (A/ D) 106 obtains a digital electrical signal (ie raw image) 107, and then enters the digital signal processing chip 100. The steps in the digital signal processing chip 100 are the core steps of ISP processing. The digital signal processing chip 100 generally includes black level compensation (BLC) 108, lens shading correction 109, and dead pixel correction ( bad pixel correction, BPC) 110, demosaic (demosaic) 111, Bayer domain noise reduction (denoise) 112, auto white balance (AWB) 113, Ygamma 114, auto exposure (AE) 115, auto focus (auto focus, AF) (not shown in Figure 1), color correction (CC) 116, gamma correction 117, color gamut conversion 118, color denoising/detail enhancement 119, color enhancement (color Enhance (CE) 120, formater (formater) 121, input/output (input/output, I/O) control 122 and other modules.
目前,深度学习的应用越来越广泛,基于深度学习的ISP,在很多任务的应用中取得一定的效果。基于深度学习的ISP,会将图像数据经过神经网络进行处理后输出,但是神经网络的处理复杂度一般会很高,在非实时处理场景下,可以达到预计目的,但在需要实时处理的场景中,一般存在能耗、运行时间等问题。At present, the application of deep learning is becoming more and more extensive. ISP based on deep learning has achieved certain results in the application of many tasks. The ISP based on deep learning will process the image data through a neural network and then output it. However, the processing complexity of the neural network is generally very high. In non-real-time processing scenarios, the expected purpose can be achieved, but in scenarios that require real-time processing , Generally there are problems such as energy consumption and running time.
因此基于神经网络的ISP需要进一步优化。Therefore, ISP based on neural network needs to be further optimized.
发明内容Summary of the invention
本申请提供一种基于神经网络的图像处理方法及装置,以期优化基于神经网络的图像信号处理性能。This application provides a neural network-based image processing method and device, in order to optimize the neural network-based image signal processing performance.
第一方面,提供一种基于神经网络的图像处理方法,采用第一神经网络和第二神经网络对多帧待处理图像进行处理,输出第二图像。该方法的步骤如下所述:将多帧待处理图像输入第一神经网络进行运算,以获得第一图像;将多个图像组分别输入多个第二神经网络进行运算,以分别获得多帧第二图像,其中,每个图像组包括第一图像和多帧待处理图像中的一帧图像。In a first aspect, a neural network-based image processing method is provided, which uses a first neural network and a second neural network to process multiple frames of to-be-processed images, and output a second image. The steps of the method are as follows: input multiple frames of to-be-processed images into the first neural network for calculation to obtain the first image; input multiple image groups into multiple second neural networks for calculation respectively to obtain multiple frames of the first neural network. Two images, where each image group includes the first image and one frame of images among the multiple frames of images to be processed.
本申请提供的基于神经网络的图像处理方法,将多帧待处理图像经过第一神经网 络运算后获得的第一图像,即得到多帧待处理图像共有的图像特征。将第一图像和一帧待处理图像经过一个第二神经网络运算后获得第二图像,以分别获得多帧第二图像。由于分别利用第一神经网络和第二神经网络分别对多帧待处理图像进行处理,将第一图像运用到第二神经网络的处理过程中,减小第二神经网络的计算复杂度,并能够保证图像处理质量。In the neural network-based image processing method provided in this application, the first image obtained after multiple frames of images to be processed is subjected to a first neural network operation, that is, the image characteristics common to the multiple frames of images to be processed are obtained. The first image and a frame of image to be processed are subjected to a second neural network operation to obtain a second image, so as to obtain multiple frames of second images respectively. Since the first neural network and the second neural network are used to process multiple frames of images to be processed, the first image is applied to the processing of the second neural network, which reduces the computational complexity of the second neural network and can Ensure the quality of image processing.
在一种可能的实现方式中,将多个图像组分别输入多个第二神经网络进行运算,包括:将多帧待处理图像中的一帧图像输入第二神经网络进行运算,以获得第三图像;合并第一图像和第三图像,以获得第二图像。In a possible implementation manner, inputting multiple image groups into multiple second neural networks to perform calculations includes: inputting one frame of the multiple frames of images to be processed into the second neural network to perform calculations, so as to obtain the third neural network. Image; merge the first image and the third image to obtain the second image.
Optionally, the first neural network includes one image output layer and multiple feature map output layers; the image output layer outputs the first image, the feature map output layers output multiple intermediate feature maps, and the multiple intermediate feature maps are used to participate in the operation of the second neural network to obtain the third image.
Optionally, the complexity of the second neural network is lower than that of the first neural network.
Optionally, the multiple frames of to-be-processed images include multiple temporally adjacent frames.
Optionally, the first neural network has a greater capability of processing static regions of an image than the second neural network.
In a possible design, the first neural network is used to process static regions of the to-be-processed images.
In a possible design, the second neural network is used to process motion regions of the to-be-processed images.
In a possible design, the first neural network and the second neural network form an image processing system, and the image processing system is used to perform noise reduction and mosaic-effect elimination on the to-be-processed images.
According to a second aspect, a neural network-based image processing apparatus is provided. The apparatus may be a mobile terminal, an apparatus in a mobile terminal (for example, a chip, a chip system, or a circuit), or an apparatus that can be used in cooperation with a mobile terminal. In one design, the apparatus may include modules that correspond one-to-one to the methods/operations/steps/actions described in the first aspect; the modules may be implemented as hardware circuits, as software, or as hardware circuits combined with software. The apparatus processes multiple frames of to-be-processed images to obtain second images. In one design, the apparatus may include an operation module. For example: the operation module is configured to input multiple frames of to-be-processed images into a first neural network for operation to obtain a first image; the operation module is further configured to input multiple image groups into multiple second neural networks respectively for operation to obtain multiple frames of second images respectively, where each image group includes the first image and one frame of the multiple frames of to-be-processed images.
In a possible implementation, the operation module is configured to: input one frame of the multiple frames of to-be-processed images into a second neural network for operation to obtain a third image; and merge the first image and the third image to obtain a second image.
Optionally, the first neural network includes one image output layer and multiple feature map output layers; the image output layer outputs the first image, the feature map output layers output multiple intermediate feature maps, and the multiple intermediate feature maps are used to participate in the operation of the second neural network to obtain the third image.
Optionally, the complexity of the second neural network is lower than that of the first neural network.
Optionally, the multiple frames of to-be-processed images include multiple temporally adjacent frames.
Optionally, the first neural network has a greater capability of processing static regions of an image than the second neural network.
In a possible design, the first neural network is used to process static regions of the to-be-processed images.
In a possible design, the second neural network is used to process motion regions of the to-be-processed images.
In a possible design, the first neural network and the second neural network form an image processing system, and the image processing system is used to perform noise reduction and mosaic-effect elimination on the to-be-processed images.
For the beneficial effects of the second aspect, reference may be made to the corresponding effects of the first aspect; details are not repeated here.
According to a third aspect, an embodiment of this application provides a neural network-based image processing apparatus. The apparatus includes a processor, and the processor is configured to invoke a set of programs, instructions, or data to perform the method described in the first aspect or any possible design of the first aspect. The apparatus may further include a memory configured to store the programs, instructions, or data invoked by the processor. The memory is coupled to the processor, and when the processor executes the programs, instructions, or data stored in the memory, the method described in the first aspect or any possible design thereof can be implemented.
According to a fourth aspect, an embodiment of this application provides a chip system. The chip system includes a processor and may further include a memory, and is configured to implement the method described in the first aspect or any possible design of the first aspect. The chip system may consist of a chip, or may include a chip and other discrete components.
According to a fifth aspect, an embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are run on a computer, the method described in the first aspect or any possible design of the first aspect is performed.
According to a sixth aspect, an embodiment of this application further provides a computer program product containing instructions. When the computer program product is run on a computer, the computer is caused to perform the method described in the first aspect or any possible design of the first aspect.
Description of the drawings
Figure 1 is a schematic diagram of an ISP processing flow in the prior art;
Figure 2 is a schematic structural diagram of a system architecture provided by an embodiment of this application;
Figure 3 is a schematic diagram of the principle of a neural network provided by an embodiment of this application;
Figure 4 is a flowchart of a neural network-based image processing method provided by an embodiment of this application;
Figure 5 is a schematic diagram of an implementation of image processing provided by an embodiment of this application;
Figure 6 is a schematic diagram of an implementation of image processing provided by an embodiment of this application;
Figure 7 is a schematic diagram of an implementation of image processing provided by an embodiment of this application;
Figure 8 is a schematic diagram of an RGrGbB image processing procedure provided by an embodiment of this application;
Figure 9a is a first schematic structural diagram of a first neural network provided by an embodiment of this application;
Figure 9b is a second schematic structural diagram of a first neural network provided by an embodiment of this application;
Figure 10a is a first schematic structural diagram of a second neural network provided by an embodiment of this application;
Figure 10b is a second schematic structural diagram of a second neural network provided by an embodiment of this application;
Figure 11a is a schematic structural diagram of a first neural network and a second neural network provided by an embodiment of this application;
Figure 11b is a schematic structural diagram of a first neural network and a second neural network provided by an embodiment of this application;
Figure 12 is a schematic structural diagram of a neural network-based image processing apparatus provided by an embodiment of this application;
Figure 13 is a schematic structural diagram of a neural network-based image processing apparatus provided by an embodiment of this application.
Detailed description of embodiments
The terms "first", "second", and "third" in the specification, claims, and accompanying drawings of this application are used to distinguish different objects, not to define a specific order.
In the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, an illustration, or a description. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as being preferred over or more advantageous than other embodiments or designs. Rather, words such as "exemplary" or "for example" are intended to present related concepts in a specific manner.
The neural network (NN)-based image processing method and apparatus provided in the embodiments of this application can be applied to an electronic device. The electronic device may be a mobile device such as a mobile terminal, a mobile station (MS), or user equipment (UE); it may also be a fixed device such as a fixed telephone or a desktop computer, or a video monitor. The electronic device is an image acquisition and processing device with image signal acquisition and processing functions, and has an ISP processing function. The electronic device may also optionally have a wireless connection function, as a handheld device that provides a user with voice and/or data connectivity, or as another processing device connected to a wireless modem. For example, the electronic device may be a mobile phone (or "cellular" phone) or a computer with a mobile terminal; it may also be a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile apparatus; it may of course also be a wearable device (such as a smart watch or a smart band), a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a point-of-sale (POS) terminal, and so on. In the following embodiments of this application, the description takes a mobile terminal as the electronic device by way of example.
Figure 2 is a schematic diagram of an optional hardware structure of a mobile terminal 200 according to an embodiment of this application.
As shown in Figure 2, the mobile terminal 200 mainly includes a chipset and peripheral devices. The components in the solid-line box in Figure 2, namely the power management unit (PMU), voice codec, short-range module, radio frequency (RF) module, computing processor, random-access memory (RAM), input/output (I/O), display interface, image signal processor (ISP), sensor interface (sensor hub), and baseband communication module, form a chip or chipset. Components such as the USB interface, memory, display screen, battery/mains power, earphone/speaker, antenna, and sensors can be understood as peripheral devices. The computing processor, RAM, I/O, display interface, ISP, sensor interface, baseband, and other components in the chipset can form a system-on-a-chip (SOC), which is the main part of the chipset. The components in the SOC may all be integrated into one complete chip, or some components in the SOC may be integrated while others are not; for example, the baseband communication module in the SOC may be kept separate from the other parts as an independent component. The components in the SOC can be connected to each other through a bus or other connecting lines. The PMU, voice codec, RF module, and the like outside the SOC usually include analog circuit parts, so they are often kept outside the SOC and are not integrated with it.
In Figure 2, the PMU is connected to the mains or a battery to supply power to the SOC, and the mains can be used to charge the battery. The voice codec, as the sound coding/decoding unit, is connected to an earphone or a speaker to convert between natural analog voice signals and digital voice signals that the SOC can process. The short-range module may include wireless fidelity (WiFi) and Bluetooth, and may optionally include infrared, near field communication (NFC), radio (FM), or global positioning system (GPS) modules. The RF module is connected to the baseband communication module in the SOC to convert between air-interface RF signals and baseband signals, that is, frequency mixing; for a mobile phone, receiving is down-conversion and sending is up-conversion. Both the short-range module and the RF module may have one or more antennas for signal transmission or reception. The baseband is used for baseband communication, includes one or more of multiple communication modes, and performs wireless communication protocol processing, including the processing of protocol layers such as the physical layer (layer 1), medium access control (MAC) (layer 2), and radio resource control (RRC) (layer 3); it can support various cellular communication standards, such as long term evolution (LTE) communication or 5G new radio (NR) communication. The sensor interface is the interface between the SOC and external sensors, and is used to collect and process data from at least one external sensor; an external sensor may be, for example, an accelerometer, a gyroscope, a control sensor, or an image sensor. The computing processor may be a general-purpose processor, for example, a central processing unit (CPU), or one or more integrated circuits, for example, one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), one or more microprocessors, or one or more field programmable gate arrays (FPGAs). The computing processor may include one or more cores and may selectively schedule other units. The RAM can store intermediate data produced during calculation or processing, such as intermediate calculation data of the CPU and the baseband. The ISP is used to process the data collected by the image sensor. The I/O is used for the SOC to interact with various external interfaces, for example, a universal serial bus (USB) interface used for data transmission. The memory may be one chip or a group of chips. The display screen may be a touch screen and is connected to the bus through the display interface; the display interface may perform data processing before image display, such as blending multiple layers to be displayed, buffering display data, or controlling and adjusting screen brightness.
The mobile terminal 200 involved in the embodiments of this application includes an image sensor. The image sensor can collect external signals such as light, and process and convert the external signals into a sensor signal, that is, an electrical signal. The sensor signal may be a static image signal or a dynamic video image signal. The image sensor may be, for example, a camera.
The mobile terminal 200 involved in the embodiments of this application further includes an image signal processor. The image sensor transmits the collected sensor signal to the image signal processor, and the image signal processor performs image signal processing on the sensor signal to obtain an image signal whose sharpness, color, brightness, and other aspects all conform to the characteristics of the human eye.
It can be understood that the image signal processor involved in the embodiments of this application may be one chip or a group of chips, that is, it may be integrated or independent. For example, the image signal processor included in the mobile terminal 200 may be an ISP chip integrated in the computing processor.
The mobile terminal 200 involved in the embodiments of this application has the function of taking photos or recording videos.
The neural network-based image processing method provided in the embodiments of this application mainly describes how to perform image signal processing based on a neural network.
For a better understanding of the solutions in the embodiments of this application, the terms involved in the embodiments of this application are explained first.
(1) Neural network
In the embodiments of this application, a neural network is used to process the multiple frames of images to be processed. A neural network is a network structure that processes information by imitating the behavioral characteristics of animal neural networks, and is also referred to simply as a network.
A neural network may be composed of neural units. A neural unit may be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit may be as shown in formula (1):

$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s} x_{s} + b\right)$  (1)

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, which introduces nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by connecting many such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to a local receptive field of the previous layer to extract features of the local receptive field, and the local receptive field may be a region composed of several neural units.
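As a rough illustration (not part of the claimed method), the single-neuron computation of formula (1) can be sketched in Python; the sigmoid here is chosen only as one example of the activation function f:

```python
import math

def sigmoid(z):
    # example activation function f: maps the weighted sum into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(x, w, b):
    # formula (1): output = f(sum over s of W_s * x_s + b)
    z = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b
    return sigmoid(z)

# illustrative inputs, weights, and bias (arbitrary values)
out = neuron_output(x=[0.5, -1.0, 2.0], w=[0.3, 0.8, -0.5], b=0.1)
```

Here the weighted sum is 0.15 - 0.8 - 1.0 + 0.1 = -1.55, so the output is sigmoid(-1.55), a value between 0 and 1 that could feed the next layer.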
As shown in Figure 3, which is a schematic diagram of the principle of a neural network, the neural network 300 has N processing layers, where N ≥ 3 and N is a natural number. The first layer of the neural network is the input layer 301, which is responsible for receiving the input signal, and the last layer is the output layer 303, which outputs the processing result of the neural network. The layers other than the first and last layers are intermediate layers 304, and these intermediate layers together form the hidden layer 302. Each intermediate layer in the hidden layer can both receive and output signals, and the hidden layer is responsible for processing the input signal. Each layer represents a logic level of signal processing; through multiple layers, a data signal can be processed by multiple levels of logic.
In some feasible embodiments, the input signal of the neural network may take various forms, such as a voice signal, a text signal, an image signal, or a temperature signal. In this embodiment, the processed image signal may be any of various sensor signals, such as a landscape signal captured by a camera (image sensor), an image signal of a community environment captured by a monitoring device, or a facial signal of a human face acquired by an access control system. The input signals of the neural network also include various other computer-processable engineering signals, which are not listed one by one here. Performing deep learning on the image signal with the neural network can improve image quality.
(2) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers. Dividing the DNN according to the positions of the different layers, the layers inside the DNN can be classified into three categories: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
Although a DNN looks complicated, the work of each layer is actually not complicated. Simply put, each layer computes the linear relationship expression y = α(Wx + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, the numbers of coefficients W and offset vectors b are also large. These parameters are defined in the DNN as follows, taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as $W^{3}_{24}$, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer. In summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as $W^{L}_{jk}$.
It should be noted that the input layer has no W parameter. In a deep neural network, more hidden layers enable the network to better portray complex situations in the real world. In theory, a model with more parameters has higher complexity and greater "capacity", which means that it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
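As a rough illustration (not part of the claimed method), the layer-by-layer computation y = α(Wx + b) and the indexing convention $W^{L}_{jk}$ can be sketched as follows; the layer sizes, random weights, and the choice of ReLU as α() are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # one possible choice of the activation function alpha()
    return np.maximum(z, 0.0)

# hypothetical three-layer DNN; layer sizes are illustrative only
sizes = [5, 4, 3]  # input layer, one hidden layer, output layer
weights = [rng.standard_normal((sizes[i + 1], sizes[i]))
           for i in range(len(sizes) - 1)]
biases = [rng.standard_normal(sizes[i + 1]) for i in range(len(sizes) - 1)]

def forward(x):
    # every layer after the input computes y = alpha(W x + b);
    # the input layer itself has no W parameter
    for W, b in zip(weights, biases):
        x = relu(W @ x + b)
    return x

# weights[1][j, k] plays the role of W^L_jk: the coefficient from
# neuron k of layer L-1 to neuron j of layer L (L = output layer here)
y = forward(np.ones(5))
```

Training would adjust `weights` and `biases`; here they stay fixed, since only the forward data flow is being shown.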
(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers, and the feature extractor can be regarded as a filter. A convolutional layer is a layer of neurons in the convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, a neuron may be connected only to some neurons of the neighboring layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as meaning that the way image information is extracted is independent of location. A convolution kernel can be initialized as a matrix of random size, and during the training of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, a direct benefit of weight sharing is that it reduces the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
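As a rough illustration (not part of the claimed method) of weight sharing, the sketch below slides one kernel over a small image to produce one feature plane; the image contents and the 3x3 averaging kernel are illustrative assumptions only:

```python
import numpy as np

def conv2d(image, kernel):
    # the single shared kernel slides over the whole image: the same
    # weights are applied at every position (weight sharing), so the way
    # image information is extracted does not depend on location
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0        # a 3x3 averaging kernel
feature_map = conv2d(image, kernel)   # one feature plane, shape (4, 4)
```

In a trained CNN the kernel entries would be learned rather than fixed, and a convolutional layer would hold several such kernels, one per feature plane.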
The neural network in the embodiments of this application may be a convolutional neural network, and may of course also be another type of neural network, such as a recurrent neural network (RNN).
It should be understood that an image in the embodiments of this application may be a static image (also called a static picture) or a dynamic image (also called a dynamic picture). For example, an image in this application may be a video or a dynamic picture, or an image in this application may be a static picture or a photo. For ease of description, in the following embodiments of this application, static images and dynamic images are collectively referred to as images.
The neural network-based image processing method provided by the embodiments of this application is introduced below. The method is performed by a neural network-based image processing apparatus. The neural network-based image processing apparatus may be any apparatus or device with an image processing function. For example, the method is performed by the mobile terminal 200 shown in Figure 2, by a device related to the mobile terminal, or by some of the components included in the mobile terminal.
In the embodiments of this application, multiple neural networks are used for image processing. For example, two neural networks, denoted as the first neural network and the second neural network, are used to process the images to be processed. The first neural network and the second neural network conform to the above description of neural networks.
As shown in Figure 4, the neural network-based image processing method provided by the embodiment of this application includes the following steps.
S401: Input multiple frames of to-be-processed images into the first neural network for operation to obtain a first image.
S402: Input multiple image groups into multiple second neural networks respectively for operation to obtain multiple frames of second images respectively, where each image group includes the first image and one frame of the multiple frames of to-be-processed images.
As shown in Figure 5, n frames of to-be-processed images are taken as an example, where n is an integer greater than or equal to 2. The n frames of to-be-processed images are input into the first neural network to obtain the first image. The first image and the first frame of the to-be-processed images are input into the first second neural network, the first image and the second frame are input into the second second neural network, and so on, until the first image and the n-th frame are input into the n-th second neural network. It can be understood that each second neural network receives the first image and one frame of the to-be-processed images, and each second neural network outputs one frame of a second image: the first second neural network outputs the first frame of the second images, the second second neural network outputs the second frame, and so on, until the n-th second neural network outputs the n-th frame of the second images.
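The data flow of S401/S402 and Figure 5 can be sketched as follows. The two network functions below are toy stand-ins (simple arithmetic, not learned networks, and not the claimed networks) used only to show how one first-network pass feeds all n second-network passes:

```python
import numpy as np

def first_network(frames):
    # toy stand-in for the heavier first neural network: averaging the
    # frames is a crude proxy for extracting content shared by all frames
    # (e.g. the static region); the real network is learned
    return np.mean(frames, axis=0)

def second_network(frame, first_image):
    # toy stand-in for one lightweight second neural network, which
    # combines a single to-be-processed frame with the first image
    return 0.5 * frame + 0.5 * first_image

def process(frames):
    # S401: run the first network once over all n frames
    first_image = first_network(frames)
    # S402: run one second network per image group (first image + one frame)
    return [second_network(f, first_image) for f in frames]

frames = [np.full((4, 4), float(k)) for k in range(3)]  # n = 3 toy frames
second_images = process(frames)
```

The point of the structure is that the expensive shared computation happens once in `first_network`, while each `second_network` call only handles one frame plus the precomputed first image.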
Through the method shown in FIG. 4, the first image is obtained by passing the multiple frames of to-be-processed images through the first neural network, that is, the image features common to the multiple frames are obtained. The first image and one frame of the to-be-processed images are then passed through one second neural network to obtain a second image, so that multiple frames of second images are obtained respectively. Because the first neural network and the second neural networks are used separately to process the multiple frames of to-be-processed images, and the first image is reused in the processing of each second neural network, the computational complexity of the second neural networks is reduced while the image processing quality is preserved.
For example, the first neural network is used to process the static region of the multiple frames of to-be-processed images. The first image may be an image of the static region shared by the multiple frames of to-be-processed images. Because the features of the static region account for a high proportion of the network complexity, the static-region features are processed first by the first neural network, and the processing result is fed into each second neural network as an intermediate result, so the complexity required of the second neural networks is reduced. By using the two neural networks in combination, a lower complexity can be achieved than with a single neural network when processing multiple frames of images.
Optionally, the second neural network is used to process the moving region of the multiple frames of to-be-processed images.
Some optional designs of the neural-network-based image processing method provided in the embodiments of the present application are described below.
In a possible implementation, as shown in FIG. 6, the n frames of to-be-processed images are input into the first neural network to obtain the first image; the first image and the 1st frame are input into the 1st second neural network, the first image and the 2nd frame are input into the 2nd second neural network, and so on, until the first image and the nth frame are input into the nth second neural network. It can be understood that each second neural network receives the first image and one frame of the to-be-processed images. Further, each second neural network outputs one frame of a third image, and the first image and the third image are merged to obtain a second image. Each second neural network thus yields one frame of a second image: the 1st second neural network yields the 1st frame of the second images, the 2nd yields the 2nd frame, and so on, until the nth yields the nth frame.
Optionally, merging the first image and the third image may also be regarded as combining the first image and the third image, for example performing a matrix addition on the first image and the third image to obtain the second image. For example, if the first image is the image of the static region processed by the first neural network, and the third image is the image of the moving region processed by a second neural network, then merging the first image and the third image means merging the processed static-region image with the processed moving-region image to obtain a complete second image.
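The merge step described above can be illustrated as an element-wise matrix addition; the channel count and pixel values below are hypothetical:

```python
import numpy as np

# Hypothetical 4-channel (R, Gr, Gb, B) images of size 2x2.
first_image = np.full((4, 2, 2), 10.0)  # processed static-region image
third_image = np.full((4, 2, 2), 5.0)   # processed moving-region image

# Merging the first image and the third image = matrix addition,
# yielding the complete second image.
second_image = first_image + third_image
assert (second_image == 15.0).all()
```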
In a possible implementation, the first neural network and/or the second neural network do not divide or recognize the to-be-processed images into a static region and/or a moving region, but process each to-be-processed image as a whole. For example, the characteristics of the first neural network itself cause it to process the image areas with static-region features in the to-be-processed images, while the characteristics of the second neural network itself cause it to process the image areas with moving-region features in the to-be-processed images and in the intermediate images processed by the first neural network. As another example, the characteristics of the first neural network itself cause it to apply higher-intensity processing to image areas with static-region features in the to-be-processed images and lower-intensity processing to image areas with moving-region features, while the characteristics of the second neural network itself cause it to apply higher-intensity processing to image areas with moving-region features, and lower-intensity processing to image areas with static-region features, in the to-be-processed images and the intermediate images processed by the first neural network. Correspondingly, the first image may be the image of the static region after processing by the first neural network, and the third image may be the image of the moving region after processing by the second neural network. It should be understood that, for convenience of description, the foregoing implementations are also briefly described as the first neural network processing static-region images and the second neural network processing moving-region images.
In another possible implementation, as shown in FIG. 7, the difference from the implementation of FIG. 6 is that the second neural network also receives an intermediate feature map output by the first neural network. The intermediate feature map participates in the computation of the second neural network to obtain the third image.
For example, one frame of the to-be-processed images and the intermediate feature map may be vector-spliced or vector-added to obtain a to-be-processed image matrix, which is then input into the second neural network for computation to obtain the third image. It can be understood that the vector splicing of a frame with the intermediate feature map may be regarded as an internal processing step of the second neural network; what is input into the second neural network is a single overall matrix, namely the to-be-processed image matrix.
Optionally, the second neural network also receives multiple intermediate feature maps output by the first neural network. One frame of the to-be-processed images and the first intermediate feature map may be vector-spliced or vector-added to obtain the to-be-processed image matrix, which is input into the second neural network for computation to obtain an intermediate feature map of the second neural network. The intermediate feature map of the second neural network and an intermediate feature map of the first neural network are then vector-spliced or matrix-added, and the result is processed by the remaining network layers of the second neural network to obtain the third image.
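The vector splicing of one to-be-processed frame with an intermediate feature map (concatenation along the channel dimension) can be sketched as follows; the channel counts are hypothetical:

```python
import numpy as np

frame = np.zeros((4, 8, 8))         # one to-be-processed frame, 4 channels
feature_map = np.zeros((16, 8, 8))  # intermediate feature map from the first network

# Vector splicing: join along the channel axis to form the single
# overall matrix that is fed into the second neural network.
image_matrix = np.concatenate([frame, feature_map], axis=0)
assert image_matrix.shape == (20, 8, 8)
```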
Optionally, the first neural network includes one image output layer and multiple feature-map output layers. The image output layer outputs the first image, and the feature-map output layers output the multiple intermediate feature maps.
In the embodiments of the present application, no specific distinction is made between the first image, the second image, and the third image. For example, no specific distinction is made between features such as the color and texture of the first image, the second image, and the third image.
In the embodiments of the present application, the multiple frames of to-be-processed images include multiple temporally adjacent frames. Optionally, the temporally adjacent frames include multiple temporally consecutive frames. After the multiple frames of to-be-processed images are processed by the multiple second neural networks, the processed images also comprise the corresponding number of frames. For example, the second images obtained through the multiple second neural networks comprise multiple frames, each frame of the to-be-processed images corresponding to one frame of the second images, and each frame of the second images corresponding to the first image and one frame of the third images.
In the embodiments of the present application, optionally, the format of the to-be-processed images may be a red-green-blue (RGB) format, a luminance-chrominance (YUV) format, or a Bayer format; this is not limited in the present application.
For example, the number of to-be-processed frames is 4, and the 4 frames of to-be-processed images are input into the first neural network for computation to obtain the first image; that is, the first image is obtained from the 4 frames of to-be-processed images together. Four image groups are respectively input into 4 second neural networks for computation to obtain 4 frames of second images, where each image group includes the first image and one of the 4 to-be-processed frames. For example, the 1st to-be-processed frame corresponds to the 1st frame of the second images, the 2nd to-be-processed frame to the 2nd frame of the second images, the 3rd to-be-processed frame to the 3rd frame of the second images, and the 4th to-be-processed frame to the 4th frame of the second images.
Optionally, the 1st to-be-processed frame is input into the 1st second neural network for computation to obtain the 1st frame of the third images, and the first image and the 1st frame of the third images are merged to obtain the 1st frame of the second images. The 2nd to-be-processed frame is input into the 2nd second neural network for computation to obtain the 2nd frame of the third images, which is merged with the first image to obtain the 2nd frame of the second images. The 3rd to-be-processed frame is input into the 3rd second neural network for computation to obtain the 3rd frame of the third images, which is merged with the first image to obtain the 3rd frame of the second images. The 4th to-be-processed frame is input into the 4th second neural network for computation to obtain the 4th frame of the third images, which is merged with the first image to obtain the 4th frame of the second images.
In the embodiments of the present application, the first neural network and the second neural networks may be combined into an image processing system, and the image processing system is used to process the to-be-processed images to improve image or video quality. The processing may include operations such as noise reduction and mosaic-effect removal.
In general, the complexity of the first neural network is higher than that of the second neural network; for example, the first neural network's capability of processing the static region of an image is greater than that of the second neural network.
In some technologies, multiple frames of images are often synthesized by a neural network into a single output frame to improve image or video quality. However, such a neural network requires very high complexity, and video scenarios demand very high processing speed. For example, real-time video processing on a mobile terminal requires that video with a resolution of 8K be processed at 30 frames/s, that is, a frame rate of 30. Under such processing-speed requirements, if a neural network is used to synthesize multiple frames into one output frame, large computational complexity and resource consumption must be faced, and a long delay is incurred. Conversely, if the complexity of the neural network is blindly reduced and a lower-complexity network is used, the image or video quality suffers.
In the embodiments of the present application, the first neural network handles the computationally heavy processing shared across the multiple frames, while the second neural networks handle the computationally lighter processing of each individual frame and output the multiple processed frames. The combined computation of the first neural network and the second neural networks is thus amortized over the multiple frames, so that the processing complexity per frame is reduced compared with the above solution while the image or video quality is preserved. For example, the first image is an image of the static region and the third image is an image of the moving region: the first neural network processes the static region of the multiple frames of to-be-processed images, and the second neural networks process the moving region. Through the joint processing of the two neural networks, the image processing system provided by the present application achieves lower complexity during image processing while ensuring image or video quality, improving the applicability of deep learning technology in the field of image signal processing.
The following description takes the case where the first neural network and the second neural networks are convolutional neural networks as an example. Assume that there are 4 frames of to-be-processed images and 4 frames of second images. The format of the to-be-processed images is a Bayer-format image, specifically the RGrGbB format; one frame of an RGrGbB-format image includes 4 channels (R, Gr, Gb, B). After the 4 to-be-processed frames pass through the image processing system, 4 processed frames are output. The image processing system includes the first neural network and the second neural networks.
As shown in FIG. 8, the 4 consecutive RGrGbB frames to be processed are split into 4*4=16 channels: (R1, Gr1, Gb1, B1, R2, Gr2, Gb2, B2, R3, Gr3, Gb3, B3, R4, Gr4, Gb4, B4). The 4 consecutive RGrGbB frames are input into the first neural network to obtain the first image (4 channels (R, Gr, Gb, B)).
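The split of the 4 consecutive RGrGbB frames into 4*4=16 channels can be sketched as follows; the spatial size is hypothetical:

```python
import numpy as np

# 4 hypothetical frames, each with 4 Bayer channels (R, Gr, Gb, B).
frames = [np.zeros((4, 8, 8)) for _ in range(4)]

# Stack the per-frame channels in order:
# (R1, Gr1, Gb1, B1, R2, ..., R4, Gr4, Gb4, B4).
network_input = np.concatenate(frames, axis=0)
assert network_input.shape == (16, 8, 8)  # 4 frames x 4 channels = 16 channels
```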
The 1st RGrGbB frame (4 channels (R1, Gr1, Gb1, B1)) is input into the 1st second neural network to obtain the 1st frame of the third images (4 channels (R1', Gr1', Gb1', B1')); the first image (R, Gr, Gb, B) and the 1st frame of the third images (4 channels (R1', Gr1', Gb1', B1')) are merged to obtain the 1st frame of the second images (4 channels (R1'', Gr1'', Gb1'', B1'')).

The 2nd RGrGbB frame (4 channels (R2, Gr2, Gb2, B2)) is input into the 2nd second neural network to obtain the 2nd frame of the third images (4 channels (R2', Gr2', Gb2', B2')); the first image (R, Gr, Gb, B) and the 2nd frame of the third images (4 channels (R2', Gr2', Gb2', B2')) are merged to obtain the 2nd frame of the second images (4 channels (R2'', Gr2'', Gb2'', B2'')).

The 3rd RGrGbB frame (4 channels (R3, Gr3, Gb3, B3)) is input into the 3rd second neural network to obtain the 3rd frame of the third images (4 channels (R3', Gr3', Gb3', B3')); the first image (R, Gr, Gb, B) and the 3rd frame of the third images (4 channels (R3', Gr3', Gb3', B3')) are merged to obtain the 3rd frame of the second images (4 channels (R3'', Gr3'', Gb3'', B3'')).

The 4th RGrGbB frame (4 channels (R4, Gr4, Gb4, B4)) is input into the 4th second neural network to obtain the 4th frame of the third images (4 channels (R4', Gr4', Gb4', B4')); the first image (R, Gr, Gb, B) and the 4th frame of the third images (4 channels (R4', Gr4', Gb4', B4')) are merged to obtain the 4th frame of the second images (4 channels (R4'', Gr4'', Gb4'', B4'')).
Exemplarily, the architecture of the first neural network is shown in FIG. 9a and FIG. 9b. Because the drawing of the first neural network is too large, the first neural network is split into two parts, shown in FIG. 9a and FIG. 9b respectively; FIG. 9a and FIG. 9b together form the architecture of the first neural network. The add layer at the end of FIG. 9a connects to the first layer in FIG. 9b.
In FIG. 9a and FIG. 9b, convolutional layers are represented by rectangular boxes. A label such as "Conv2d+bias stride=2 3x3_16_32" in a rectangular box denotes a convolutional layer, where Conv2d denotes a two-dimensional convolution, bias denotes the bias term, 1x1 or 3x3 denotes the convolution kernel size, stride denotes the stride, and a suffix such as _32_16 denotes the numbers of input and output feature maps: 32 means that 32 feature maps are input to the layer, and 16 means that 16 feature maps are output by the layer.
split denotes a split layer, which splits the feature maps along the channel dimension. For example, split 2 splits an image along the feature-map dimension: an input of 32 feature maps becomes two images of 16 feature maps each after this operation.
concat denotes a skip-connection concatenation layer, which merges images along the feature-map dimension, for example merging two images of 16 feature maps each into one image of 32 feature maps.
add denotes a matrix addition operation.
The first neural network shown in FIG. 9a and FIG. 9b is a typical convolutional neural network and handles the static region of multiple frames of to-be-processed images well. Assume that 4 frames of to-be-processed images are input into the first neural network, and the first neural network outputs the first image.
Optionally, instead of a typical convolutional neural network, a multi-branch neural network may also be used. The convolutional layers of the first neural network may also adopt a group convolution operation. Group convolution is a special convolutional layer: assume that the previous layer outputs N feature maps, that is, the number of channels is N (in other words, the previous layer has N convolution kernels), and assume that the number of groups of the group convolution is M. The operation of the group convolution layer is to first divide the N channels into M parts; each group corresponds to N/M channels, the convolution of each group is performed independently, and after completion the output feature maps are vector-spliced (concatenated) together as the output channels of this layer. The group convolution operation can achieve the same or similar technical effect as the multi-branch approach.
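The group convolution described above (N channels split into M groups, each group convolved independently, outputs concatenated) can be sketched as follows — assuming, purely for illustration, a 1x1 convolution implemented as a per-group channel mix:

```python
import numpy as np

def group_conv_1x1(x, weights, groups):
    # x: (N, H, W) feature maps; weights: one (out_c, N // groups) matrix
    # per group. Each group of channels is convolved (here: 1x1, i.e. a
    # channel mix) independently, and the group outputs are concatenated
    # along the channel axis, like the concat layer described above.
    n = x.shape[0]
    per_group = n // groups
    outs = []
    for g in range(groups):
        xs = x[g * per_group:(g + 1) * per_group]         # this group's channels
        w = weights[g]                                    # (out_c, per_group)
        outs.append(np.tensordot(w, xs, axes=([1], [0]))) # mix the group's channels
    return np.concatenate(outs, axis=0)

x = np.ones((8, 4, 4))                         # N = 8 input channels
weights = [np.ones((2, 4)) for _ in range(2)]  # M = 2 groups, 4 in / 2 out each
y = group_conv_1x1(x, weights, groups=2)
assert y.shape == (4, 4, 4)  # 2 groups x 2 output channels = 4 channels
```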
Exemplarily, the architecture in which 4 second neural networks process 4 frames of to-be-processed images is shown in FIG. 10a and FIG. 10b. The first image output in FIG. 10a is input into the first layer in FIG. 10b. In FIG. 10a and FIG. 10b, convolutional layers are represented by rectangular boxes; for the specific notation, refer to the explanation of FIG. 9a and FIG. 9b.
As shown in FIG. 10a, the 1st to-be-processed frame is input into the 1st second neural network to obtain the 1st frame of the third images, and the first image and the 1st frame of the third images are merged to obtain the 1st frame of the second images.

The 2nd to-be-processed frame is input into the 2nd second neural network to obtain the 2nd frame of the third images, and the first image and the 2nd frame of the third images are merged to obtain the 2nd frame of the second images.

As shown in FIG. 10b, the 3rd to-be-processed frame is input into the 3rd second neural network to obtain the 3rd frame of the third images, and the first image and the 3rd frame of the third images are merged to obtain the 3rd frame of the second images.

The 4th to-be-processed frame is input into the 4th second neural network to obtain the 4th frame of the third images, and the first image and the 4th frame of the third images are merged to obtain the 4th frame of the second images.
Optionally, as shown in FIG. 11a and FIG. 11b, the first neural network may also output intermediate feature maps, which are input into the second neural network; the intermediate feature maps participate in the computation of the second neural network to obtain the third image. For example, the intermediate feature map output by the second convolutional layer of the first neural network is vector-spliced with the image output by the first convolutional layer of the second neural network to obtain a processed image. As another example, the intermediate feature map output by the fourth convolutional layer of the first neural network is vector-spliced with the intermediate feature map output by the third convolutional layer of the second neural network to obtain a processed image.
In the embodiments of the present application, before the first neural network and the second neural network are used, the neural network models need to be trained. During the training of the neural networks, the training data may include training images and ground-truth images.
When training the model of the first neural network: first, the collected training images are processed against the ground-truth image of the first image, and an image is obtained and output. The output image is compared with the ground-truth image of the first image until the network converges, completing the training of the model of the first neural network. Network convergence here may mean, for example, that the difference between the output image and the ground-truth image of the first image is smaller than a set first threshold.
The parameters of the first neural network obtained by training on the first image are then fixed, the collected training images are processed against the ground-truth image of the third image, and an image is obtained and output. The output image is compared with the ground-truth image of the third image until the network converges, completing the training of the model of the second neural network. Network convergence here may mean that the difference between the output image and the ground-truth image of the third image is smaller than a set second threshold.
In the embodiments of the present application, the first neural network and the second neural networks form an image processing system, which is used to process multiple frames of to-be-processed images and output multiple processed frames. The complexity of the second neural network is lower than that of the first neural network. Compared with solutions in some technologies that process multiple frames into one frame through a basic network, the computation required by the image processing system for each to-be-processed frame is reduced to a certain extent, which in turn reduces the image processing delay while ensuring image or video quality. The computing power of the two neural networks for processing multiple frames of to-be-processed images is illustrated below with an example. Assume that there are 4 frames of to-be-processed images, that the first neural network and the second neural networks output 4 processed frames, and that the basic network outputs one frame after processing. The first neural network is shown in FIG. 9a and FIG. 9b, and the second neural network in FIG. 10a and FIG. 10b.
The computation amount of the first neural network is approximately the same as that of the basic network, about 12000 MAC. For example, the network complexity of the basic network is calculated as follows:
(23*32*1*1+32*16*3*3)/4 # 1336
+16*32*3*3/16 # 288
+(32*32*3*3)/16 # 576
+32*64*3*3/64 # 288
+(64*96*3*3+(48*48*3*3*2+96*96*1*1*1)*2+96*64*3*3+32*32*3*3*2+64*64*1*1*1+64*64*3*3*1)/64 # 4240
+(64*32*2*2)/16+(concat) # 512
+(64*32*3*3)/16 # 1152
+(32*16*2*2)/4+(concat) # 512
+(32*16*3*3)/4 # 1152
+(16*16*3*3+16*16*3*3+16*4*3*3)/4 # 1296
= 11352
The network complexity of the second neural network is calculated as follows:
(4*16*3*3)/4 # 144
+16*32*3*3/16 # 288
+(32*32*3*3)/16 # 576
+32*64*3*3/64 # 288
+(64*64*3*3)/64 # 576
+(64*32*2*2)/16+(concat) # 512
+(64*32*3*3)/16 # 1152
+(32*16*2*2)/4+(concat) # 512
+(32*4*3*3)/4 # 288
= 4336
It can be seen that the computation amount of the second neural network is about 4000; assume it is 4000.
Then, when 4 to-be-processed frames are input and 4 processed frames are output simultaneously, the computation amount of the image processing system is (4000*4+12000)/4=7000; when 8 to-be-processed frames are input and 8 processed frames are output simultaneously, it is (4000*8+12000)/8=5500; when 16 to-be-processed frames are input and 16 processed frames are output simultaneously, it is (4000*16+12000)/16=4750. All of these are smaller than the computing power of 12000 required to process multiple frames into one frame through the basic network. It can be seen that the multi-frame-in, multi-frame-out solution using the first neural network and the second neural networks provided by the embodiments of the present application can reduce the computation amount and thereby the image processing delay, meeting the delay requirements of video scenarios. The network computing power required for video with a resolution of 8 thousand (K) pixels and a frame rate of 30 frames per second is about 50000 MAC; when 8 frames are output, the computation amount of the image processing system of the embodiments of the present application can basically meet the network computing power requirement of 8K 30-fps video.
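The per-frame computation figures above can be verified arithmetically, using the rounded per-network costs from the text (12000 MAC for the first neural network, 4000 MAC for one second neural network):

```python
FIRST_NN = 12000   # rounded cost of the first neural network (computed: 11352)
SECOND_NN = 4000   # rounded cost of one second neural network (computed: 4336)

def per_frame_cost(n_frames):
    # The n second networks each run once and the first network runs once;
    # the total is amortized over the n output frames.
    return (SECOND_NN * n_frames + FIRST_NN) / n_frames

assert per_frame_cost(4) == 7000
assert per_frame_cost(8) == 5500
assert per_frame_cost(16) == 4750
assert all(per_frame_cost(n) < 12000 for n in (4, 8, 16))  # below the basic network
```

Note that (4000*16+12000)/16 evaluates to 4750 exactly; as n grows, the per-frame cost approaches the 4000-MAC cost of one second network.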
需要说明的是,本申请中的各个应用场景中的举例仅仅表现了一些可能的实现方式,是为了对本申请的方法更好的理解和说明。本领域技术人员可以根据本申请提供的方法,得到一些演变形式的举例。It should be noted that the examples in the various application scenarios of this application merely show some possible implementations, and are intended to provide a better understanding and description of the methods of this application. Those skilled in the art can derive some evolved examples from the methods provided in this application.
为了实现上述本申请实施例提供的方法中的各功能,基于神经网络的图像处理装置可以包括硬件结构和/或软件模块,以硬件结构、软件模块、或硬件结构加软件模块的形式来实现上述各功能。上述各功能中的某个功能以硬件结构、软件模块、还是硬件结构加软件模块的方式来执行,取决于技术方案的特定应用和设计约束条件。In order to implement the functions in the methods provided in the foregoing embodiments of the present application, the neural network-based image processing apparatus may include a hardware structure and/or a software module, and implement the foregoing functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a certain one of the foregoing functions is executed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
如图12所示,基于同一技术构思,本申请实施例还提供了一种基于神经网络的图像处理装置1200,该基于神经网络的图像处理装置1200可以是移动终端或任意具有图像处理功能的设备。一种设计中,该基于神经网络的图像处理装置1200可以包括执行上述方法实施例中各方法/操作/步骤/动作所一一对应的模块,该模块可以是硬件电路,也可是软件,也可以是硬件电路结合软件实现。一种设计中,该基于神经网络的图像处理装置1200可以包括运算模块1201。As shown in FIG. 12, based on the same technical concept, an embodiment of the present application further provides a neural network-based image processing apparatus 1200. The neural network-based image processing apparatus 1200 may be a mobile terminal or any device with an image processing function. In one design, the neural network-based image processing apparatus 1200 may include modules corresponding one-to-one to the methods/operations/steps/actions in the foregoing method embodiments; each module may be a hardware circuit, software, or a hardware circuit combined with software. In one design, the neural network-based image processing apparatus 1200 may include an arithmetic module 1201.
运算模块1201用于将多帧待处理图像输入第一神经网络进行运算,以获得第一图像;以及将多个图像组分别输入多个第二神经网络进行运算,以分别获得多帧第二图像,其中,每个所述图像组包括所述第一图像和所述多帧待处理图像中的一帧图像。The arithmetic module 1201 is configured to input multiple frames of images to be processed into a first neural network for operation to obtain a first image; and to input multiple image groups into multiple second neural networks respectively for operation to respectively obtain multiple frames of second images, wherein each of the image groups includes the first image and one frame of the multiple frames of images to be processed.
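The data flow handled by arithmetic module 1201 can be sketched as follows. The concrete first and second neural networks are not specified in this passage, so simple pixelwise functions stand in for them here (a placeholder mean and blend, an assumption made purely for illustration); each "frame" is a flat list of pixel values.

```python
# Placeholder stand-ins for the first and second neural networks; the real
# networks are not specified here, so simple pixelwise functions are used
# purely to illustrate the multi-frame-in, multi-frame-out data flow.

def first_network(frames):
    # Fuse all input frames into one "first image" (placeholder: pixel mean).
    n = len(frames)
    return [sum(pixels) / n for pixels in zip(*frames)]

def second_network(first_image, frame):
    # Process one image group (first image + one input frame) into one
    # "second image" (placeholder: pixelwise blend).
    return [0.5 * a + 0.5 * b for a, b in zip(first_image, frame)]

def process(frames):
    first_image = first_network(frames)   # one shared first-network pass
    # One second-network pass per image group, one output frame per group.
    return [second_network(first_image, f) for f in frames]

frames = [[0.0, 2.0], [2.0, 4.0]]  # two tiny 2-pixel "frames"
print(process(frames))             # [[0.5, 2.5], [1.5, 3.5]]
```

Note that `process` returns exactly as many output frames as it receives, which is the multi-frame-input, multi-frame-output property the embodiment relies on.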
本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,另外,在本申请各个实施例中的各功能模块可以集成在一个处理器中,也可以是单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。The division of modules in the embodiments of this application is illustrative and is merely a logical function division; in actual implementation, there may be other division manners. In addition, the functional modules in the embodiments of this application may be integrated into one processor, may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
基于同一技术构思,如图13所示,本申请实施例还提供一种基于神经网络的图像处理装置1300。该神经网络的图像处理装置1300包括处理器1301。该处理器1301用于调用一组程序,以使得上述方法实施例被执行。该神经网络的图像处理装置1300还包括存储器1302,存储器1302用于存储处理器1301执行的程序指令和/或数据。存储器1302和处理器1301耦合。本申请实施例中的耦合是装置、单元或模块之间的间接耦合或通信连接,可以是电性,机械或其它的形式,用于装置、单元或模块之间的信息交互。处理器1301可能和存储器1302协同操作。处理器1301可能执行存储器1302中存储的程序指令。存储器1302可以包括于处理器1301中。Based on the same technical concept, as shown in FIG. 13, an embodiment of the present application also provides an image processing device 1300 based on a neural network. The image processing device 1300 of the neural network includes a processor 1301. The processor 1301 is used to call a group of programs to enable the foregoing method embodiments to be executed. The image processing device 1300 of the neural network further includes a memory 1302, and the memory 1302 is configured to store program instructions and/or data executed by the processor 1301. The memory 1302 is coupled with the processor 1301. The coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units or modules, and may be in electrical, mechanical or other forms, and is used for information exchange between devices, units or modules. The processor 1301 may operate in cooperation with the memory 1302. The processor 1301 may execute program instructions stored in the memory 1302. The memory 1302 may be included in the processor 1301.
该基于神经网络的图像处理装置1300可以为芯片系统。本申请实施例中,芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。The neural network-based image processing device 1300 may be a chip system. In the embodiments of the present application, the chip system may be composed of chips, or may include chips and other discrete devices.
处理器1301用于将多帧待处理图像输入第一神经网络进行运算,以获得第一图像;以及将多个图像组分别输入多个第二神经网络进行运算,以分别获得多帧第二图像,其中,每个所述图像组包括所述第一图像和所述多帧待处理图像中的一帧图像。The processor 1301 is configured to input multiple frames of images to be processed into a first neural network for operation to obtain a first image; and to input multiple image groups into multiple second neural networks respectively for operation to respectively obtain multiple frames of second images, wherein each of the image groups includes the first image and one frame of the multiple frames of images to be processed.
处理器1301可以是通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。The processor 1301 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of the present application may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
存储器1302可以是非易失性存储器,比如硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD)等,还可以是易失性存储器(volatile memory),例如随机存取存储器(random-access memory,RAM)。存储器是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。本申请实施例中的存储器还可以是电路或者其它任意能够实现存储功能的装置,用于存储程序指令和/或数据。The memory 1302 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or may be a volatile memory, such as a random-access memory (RAM). The memory is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory in the embodiments of the present application may also be a circuit or any other apparatus capable of implementing a storage function, configured to store program instructions and/or data.
本申请上述方法实施例所描述的各个操作和功能中的部分或全部,可以用芯片或集成电路来完成。Part or all of the various operations and functions described in the foregoing method embodiments of the present application may be completed by chips or integrated circuits.
本申请实施例还提供一种芯片,包括处理器,用于支持该基于神经网络的图像处理装置实现上述方法实施例所涉及的功能。在一种可能的设计中,该芯片与存储器连接或者该芯片包括存储器,该存储器用于保存该图像处理装置必要的程序指令和数据。An embodiment of the present application further provides a chip including a processor, configured to support the neural network-based image processing apparatus in implementing the functions involved in the foregoing method embodiments. In a possible design, the chip is connected to a memory, or the chip includes a memory, and the memory is used to store the program instructions and data necessary for the image processing apparatus.
本申请实施例提供了一种计算机可读存储介质,存储有计算机程序,该计算机程序包括用于执行上述方法实施例的指令。The embodiment of the present application provides a computer-readable storage medium that stores a computer program, and the computer program includes instructions for executing the foregoing method embodiments.
本申请实施例提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述方法实施例。The embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the foregoing method embodiments.
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of the present application can be provided as methods, systems, or computer program products. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。This application is described with reference to flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of this application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
尽管已描述了本申请的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例作出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请范围的所有变更和修改。Although the preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic creative concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present application.
显然,本领域的技术人员可以对本申请实施例进行各种改动和变型而不脱离本申请实施例的精神和范围。这样,倘若本申请实施例的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。Obviously, those skilled in the art can make various changes and modifications to the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. In this way, if these modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application also intends to include these modifications and variations.

Claims (19)

  1. 一种基于神经网络的图像处理方法,其特征在于,包括:A neural network-based image processing method, which is characterized in that it includes:
    将多帧待处理图像输入第一神经网络进行运算,以获得第一图像;Input multiple frames of to-be-processed images into the first neural network for calculation to obtain the first image;
    将多个图像组分别输入多个第二神经网络进行运算,以分别获得多帧第二图像,其中,每个所述图像组包括所述第一图像和所述多帧待处理图像中的一帧图像。inputting multiple image groups into multiple second neural networks respectively for operation, to respectively obtain multiple frames of second images, wherein each of the image groups includes the first image and one frame of the multiple frames of images to be processed.
  2. 根据权利要求1所述的方法,其特征在于,所述将多个图像组分别输入多个第二神经网络进行运算,包括:The method according to claim 1, wherein said inputting multiple image groups into multiple second neural networks for calculation respectively comprises:
    将所述多帧待处理图像中的一帧图像输入所述第二神经网络进行运算,以获得第三图像;Input one frame of the multiple frames of images to be processed into the second neural network for calculation to obtain a third image;
    合并所述第一图像和所述第三图像,以获得所述第二图像。Combining the first image and the third image to obtain the second image.
  3. 根据权利要求1所述的方法,其特征在于,所述第一神经网络包括一个图像输出层和多个特征图输出层,所述图像输出层输出所述第一图像,所述特征图输出层输出多个中间特征图,所述多个中间特征图用于参与所述第二神经网络的运算,以获得第三图像。The method according to claim 1, wherein the first neural network comprises an image output layer and a plurality of feature map output layers, the image output layer outputs the first image, and the feature map output layers output a plurality of intermediate feature maps, the plurality of intermediate feature maps being used to participate in the operation of the second neural network to obtain a third image.
  4. 根据权利要求1-3中任一项所述的方法,其特征在于,所述多帧待处理图像包括多帧时域邻近的图像。The method according to any one of claims 1 to 3, wherein the multiple frames of to-be-processed images include multiple frames of temporally adjacent images.
  5. 根据权利要求1-4中任一项所述的方法,其特征在于,所述第一神经网络用于处理所述待处理图像的静止区域。The method according to any one of claims 1 to 4, wherein the first neural network is used to process the static area of the image to be processed.
  6. 根据权利要求1-5中任一项所述的方法,其特征在于,所述第二神经网络用于处理所述待处理图像的运动区域。The method according to any one of claims 1 to 5, wherein the second neural network is used to process the motion area of the image to be processed.
  7. 根据权利要求1-6中任一项所述的方法,其特征在于,所述第一神经网络对图像静止区域的处理能力大于第二神经网络。The method according to any one of claims 1 to 6, wherein the processing capability of the first neural network on the static area of the image is greater than that of the second neural network.
  8. 根据权利要求1-7中任一项所述的方法,其特征在于,所述第一神经网络和所述第二神经网络组成图像处理系统,所述图像处理系统用于对所述待处理图像进行降噪和消除马赛克效应处理。The method according to any one of claims 1-7, wherein the first neural network and the second neural network form an image processing system, and the image processing system is configured to perform noise reduction and mosaic-effect removal on the image to be processed.
  9. 一种基于神经网络的图像处理装置,其特征在于,包括:An image processing device based on neural network, characterized in that it comprises:
    运算模块,用于将多帧待处理图像输入第一神经网络进行运算,以获得第一图像;An arithmetic module, configured to input multiple frames of to-be-processed images into the first neural network for calculation to obtain the first image;
    所述运算模块,还用于将多个图像组分别输入多个第二神经网络进行运算,以分别获得多帧第二图像,其中,每个所述图像组包括所述第一图像和所述多帧待处理图像中的一帧图像。The arithmetic module is further configured to input multiple image groups into multiple second neural networks respectively for operation, to respectively obtain multiple frames of second images, wherein each of the image groups includes the first image and one frame of the multiple frames of images to be processed.
  10. 根据权利要求9所述的装置,其特征在于,所述运算模块用于:The device according to claim 9, wherein the computing module is used for:
    将所述多帧待处理图像中的一帧图像输入所述第二神经网络进行运算,以获得第三图像;Input one frame of the multiple frames of images to be processed into the second neural network for calculation to obtain a third image;
    合并所述第一图像和所述第三图像,以获得所述第二图像。Combining the first image and the third image to obtain the second image.
  11. 根据权利要求9所述的装置,其特征在于,所述第一神经网络包括一个图像输出层和多个特征图输出层,所述图像输出层输出所述第一图像,所述特征图输出层输出多个中间特征图,所述多个中间特征图用于参与所述第二神经网络的运算,以获得第三图像。The apparatus according to claim 9, wherein the first neural network comprises an image output layer and a plurality of feature map output layers, the image output layer outputs the first image, and the feature map output layers output a plurality of intermediate feature maps, the plurality of intermediate feature maps being used to participate in the operation of the second neural network to obtain a third image.
  12. 根据权利要求9-11中任一项所述的装置,其特征在于,所述多帧待处理图像包括多帧时域邻近的图像。The apparatus according to any one of claims 9-11, wherein the multiple frames of images to be processed comprise multiple frames of temporally adjacent images.
  13. 根据权利要求9-12中任一项所述的装置,其特征在于,所述第一神经网络用于处理所述待处理图像的静止区域。The device according to any one of claims 9-12, wherein the first neural network is used to process the static area of the image to be processed.
  14. 根据权利要求9-13中任一项所述的装置,其特征在于,所述第二神经网络用于处理所述待处理图像的运动区域。The device according to any one of claims 9-13, wherein the second neural network is used to process the motion area of the image to be processed.
  15. 根据权利要求9-14中任一项所述的装置,其特征在于,所述第一神经网络对图像静止区域的处理能力大于第二神经网络。The device according to any one of claims 9-14, wherein the processing capability of the first neural network on the static area of the image is greater than that of the second neural network.
  16. 根据权利要求9-15中任一项所述的装置,其特征在于,所述第一神经网络和所述第二神经网络组成图像处理系统,所述图像处理系统用于对所述待处理图像进行降噪和消除马赛克效应处理。The apparatus according to any one of claims 9-15, wherein the first neural network and the second neural network form an image processing system, and the image processing system is configured to perform noise reduction and mosaic-effect removal on the image to be processed.
  17. 一种芯片,其特征在于,所述芯片与存储器相连,用于读取并执行所述存储器中存储的软件程序,以实现如权利要求1-8中任一项所述的方法。A chip, characterized in that the chip is connected to a memory, and is used to read and execute a software program stored in the memory to implement the method according to any one of claims 1-8.
  18. 一种基于神经网络的图像处理装置,其特征在于,包括处理器和存储器,所述处理器用于运行一组程序,以使得如权利要求1-8中任一项的方法被执行。An image processing device based on a neural network, which is characterized by comprising a processor and a memory, and the processor is used to run a set of programs to enable the method according to any one of claims 1 to 8 to be executed.
  19. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机可读指令,当所述计算机可读指令在基于神经网络的图像处理装置上运行时,使得所述基于神经网络的图像处理装置执行权利要求1-8任一项所述的方法。A computer-readable storage medium, wherein computer-readable instructions are stored in the computer-readable storage medium, and when the computer-readable instructions are run on a neural network-based image processing apparatus, the neural network-based image processing apparatus is caused to execute the method according to any one of claims 1-8.
PCT/CN2020/082634 2020-03-31 2020-03-31 Neural network-based image processing method and apparatus WO2021196050A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/082634 WO2021196050A1 (en) 2020-03-31 2020-03-31 Neural network-based image processing method and apparatus
CN202080099095.0A CN115335852A (en) 2020-03-31 2020-03-31 Image processing method and device based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/082634 WO2021196050A1 (en) 2020-03-31 2020-03-31 Neural network-based image processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2021196050A1 true WO2021196050A1 (en) 2021-10-07

Family

ID=77927278

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/082634 WO2021196050A1 (en) 2020-03-31 2020-03-31 Neural network-based image processing method and apparatus

Country Status (2)

Country Link
CN (1) CN115335852A (en)
WO (1) WO2021196050A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107808365A (en) * 2017-09-30 2018-03-16 广州智慧城市发展研究院 One kind is based on the self-compressed image denoising method of convolutional neural networks and system
CN108197623A (en) * 2018-01-19 2018-06-22 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN108737750A (en) * 2018-06-07 2018-11-02 北京旷视科技有限公司 Image processing method, device and electronic equipment
US10255663B2 (en) * 2016-11-11 2019-04-09 Kabushiki Kaisha Toshiba Image processing device, image processing method, computer program product
CN109886892A (en) * 2019-01-17 2019-06-14 迈格威科技有限公司 Image processing method, image processing apparatus and storage medium

Also Published As

Publication number Publication date
CN115335852A (en) 2022-11-11

Similar Documents

Publication Publication Date Title
US11430209B2 (en) Image signal processing method, apparatus, and device
US10136110B2 (en) Low-light image quality enhancement method for image processing device and method of operating image processing system performing the method
US10417771B2 (en) Fast MRF energy optimization for solving scene labeling problems
US10348978B2 (en) Processor selecting between image signals in response to illuminance condition, image processing device including same, and related method for image processing
CN107431770A (en) Adaptive line brightness domain video pipeline framework
WO2020062312A1 (en) Signal processing device and signal processing method
CN112202986A (en) Image processing method, image processing apparatus, readable medium and electronic device thereof
CN113850367A (en) Network model training method, image processing method and related equipment thereof
JP2020042774A (en) Artificial intelligence inference computing device
CN115049783B (en) Model determining method, scene reconstruction model, medium, equipment and product
CN111951171A (en) HDR image generation method and device, readable storage medium and terminal equipment
CN117768774A (en) Image processor, image processing method, photographing device and electronic device
WO2021196050A1 (en) Neural network-based image processing method and apparatus
US20230388623A1 (en) Composite image signal processor
KR20130018899A (en) Single pipeline stereo image capture
WO2021179147A1 (en) Image processing method and apparatus based on neural network
CN111598781B (en) Image super-resolution method based on hybrid high-order attention network
US11941789B2 (en) Tone mapping and tone control integrations for image processing
CN114363693B (en) Image quality adjusting method and device
CN116205806B (en) Image enhancement method and electronic equipment
CN109688333B (en) Color image acquisition method, device, equipment and storage medium
WO2024025224A1 (en) Method and system for generation of a plurality of portrait effects in an electronic device
WO2024130715A1 (en) Video processing method, video processing apparatus and readable storage medium
US20240292112A1 (en) Image signal processor, operating method thereof, and application processor including the image signal processor
WO2023283855A1 (en) Super resolution based on saliency

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20928798

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20928798

Country of ref document: EP

Kind code of ref document: A1