CN111951168A - Image processing method, image processing apparatus, storage medium, and electronic device - Google Patents


Info

Publication number: CN111951168A (application CN202010866139.XA; granted as CN111951168B)
Authority: CN (China)
Prior art keywords: image, network, processing, processed, generate
Other languages: Chinese (zh)
Inventor: 颜海强
Assignee (current and original): Oppo Chongqing Intelligent Technology Co Ltd
Legal status: Active, granted (the legal status listed is an assumption and is not a legal conclusion)

Classifications

    • G06T 3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/253 Fusion techniques of extracted features
    • G06T 5/70 Denoising; Smoothing
    • G06T 5/73 Deblurring; Sharpening
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Image Processing (AREA)

Abstract

The disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, and relates to the technical field of image processing. The image processing method includes: acquiring an image to be processed under a preset task; processing the image to be processed by using a generative adversarial network (GAN) under the preset task to generate an intermediate image corresponding to the image to be processed; and performing convolution processing on the intermediate image at two or more different scales by using an image enhancement network, and generating a target image from the convolved data. The method and the apparatus can break through the content limitation of the image to be processed during image processing, thereby improving the image processing effect and the image quality.

Description

Image processing method, image processing apparatus, storage medium, and electronic device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device.
Background
At present, most image processing is based on a prior model or algorithm: data analysis and processing are performed on the original image to obtain the processed image. For example, image denoising filters the original image, and image defogging removes scattering components from the original image. Such processing is confined to the content of the original image, so the available information is limited, which affects the image processing effect.
Disclosure of Invention
The present disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, so as to improve the image processing effect at least to some extent.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided an image processing method including: acquiring an image to be processed under a preset task; processing the image to be processed by using a generative adversarial network under the preset task to generate an intermediate image corresponding to the image to be processed; and performing convolution processing on the intermediate image at two or more different scales by using an image enhancement network, and generating a target image from the convolved data.
According to a second aspect of the present disclosure, there is provided an image processing apparatus including: an image acquisition module configured to acquire an image to be processed under a preset task; a first processing module configured to process the image to be processed by using a generative adversarial network under the preset task to generate an intermediate image corresponding to the image to be processed; and a second processing module configured to perform convolution processing on the intermediate image at two or more different scales by using an image enhancement network and to generate a target image from the convolved data.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method of the first aspect described above and its possible implementations.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the image processing method of the first aspect and possible embodiments thereof described above via execution of the executable instructions.
The technical scheme of the disclosure has the following beneficial effects:
on the one hand, by processing the image to be processed with a generative adversarial network to perform the preset task, a general image processing problem can be converted into an image-to-image conversion problem. This removes the dependence of image processing on a prior model or algorithm, breaks through the content limitation of the image to be processed, and exploits the knowledge and information the network has learned from the data set, thereby improving the image processing effect and yielding a higher-quality target image. On the other hand, the multi-scale convolution processing of the image enhancement network enriches the detail information and features of the target image, which alleviates problems such as detail loss and color distortion that the generative adversarial network may introduce, further improving image quality.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is apparent that the drawings in the following description are only some embodiments of the present disclosure, and that other drawings can be obtained from those drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a schematic configuration diagram of an electronic apparatus in the present exemplary embodiment;
fig. 2 shows a flowchart of an image processing method in the present exemplary embodiment;
fig. 3 shows a flowchart of generating an intermediate image in the present exemplary embodiment;
FIG. 4 illustrates a flow chart for processing an image using a generation network in the present exemplary embodiment;
fig. 5 shows a schematic structural diagram of one generation network in the present exemplary embodiment;
FIG. 6 shows a flowchart for processing an image using a discriminative network in the present exemplary embodiment;
fig. 7 is a schematic diagram showing the structure of a discrimination network in the present exemplary embodiment;
FIG. 8 illustrates a flow chart for training the generation network, the discrimination network, and the image enhancement network in the present exemplary embodiment;
FIG. 9 illustrates a flow chart for processing an image using an image enhancement network in the present exemplary embodiment;
FIG. 10 is a diagram illustrating the structure of an image enhancement network in the exemplary embodiment;
fig. 11 shows a schematic configuration diagram of an image processing apparatus of the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Taking image defogging in image processing as an example, in the related art, an image defogging method estimates atmospheric light and a propagation map based on a prior physical scattering model, and further separates fog components. The defogging result depends heavily on the estimation precision of atmospheric light and a propagation map, and the original image has the problems of fogging and unsharpness, so that the propagation map is easy to be inaccurate, and the defogging quality is difficult to ensure.
In view of one or more of the above problems, exemplary embodiments of the present disclosure first provide an image processing method. The image processing method can be executed in an electronic device, which generally includes a processor and a memory; the memory stores executable instructions of the processor as well as application data such as images, and the processor executes the executable instructions to realize data processing. Such electronic devices include, but are not limited to: smart phones, tablet computers, game machines, desktop computers, notebook computers, televisions, electronic photo frames, personal digital assistants (PDAs), navigation devices, wearable devices, unmanned aerial vehicles, servers, and the like.
The structure of the electronic device is exemplarily described below by taking the mobile terminal 100 in fig. 1 as an example. It will be appreciated by those skilled in the art that the configuration of figure 1 can also be applied to fixed type devices, in addition to components specifically intended for mobile purposes. In other embodiments, mobile terminal 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. The interfacing relationship between the components is only schematically illustrated and does not constitute a structural limitation of the mobile terminal 100. In other embodiments, the mobile terminal 100 may also interface differently than shown in fig. 1, or a combination of multiple interfaces.
As shown in fig. 1, the mobile terminal 100 may specifically include: the mobile terminal includes a processor 110, an internal memory 121, an external memory interface 122, a USB interface 130, a charging management Module 140, a power management Module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication Module 150, a wireless communication Module 160, an audio Module 170, a speaker 171, a receiver 172, a microphone 173, an earphone interface 174, a sensor Module 180, a display 190, a camera Module 191, an indicator 192, a motor 193, a button 194, a Subscriber Identity Module (SIM) card interface 195, and the like.
Processor 110 may include one or more processing units, such as: the Processor 110 may include an Application Processor (AP), a modem Processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, an encoder, a decoder, a Digital Signal Processor (DSP), a baseband Processor, and/or a Neural Network Processor (NPU), and the like. The different processing units may be separate devices or may be integrated into one or more processors.
The encoder may encode (i.e., compress) image or video data to form code stream data; the decoder may decode (i.e., decompress) the code stream data of an image or video to restore the image or video data. The mobile terminal 100 may support one or more encoders and decoders, and can thus process images or video in a variety of encoding formats, such as image formats like JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), and BMP (Bitmap), and video formats like MPEG (Moving Picture Experts Group), H.264, and HEVC (High Efficiency Video Coding).
The NPU processes calculation work such as image feature extraction, image classification, image identification and the like by deploying a neural network and utilizing the neural network. In some embodiments, the neural network may also be deployed in the AP.
In some implementations, the processor 110 may include one or more interfaces, such as an Inter-Integrated Circuit (I2C) interface, an Inter-IC Sound (I2S) interface, a Pulse Code Modulation (PCM) interface, a Universal Asynchronous Receiver/Transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a General-Purpose Input/Output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface. The processor connects with the other components of the mobile terminal 100 through these different interfaces.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the mobile terminal 100, to connect an earphone and play audio through it, or to connect the mobile terminal 100 to other electronic devices such as a computer or a peripheral device.
The charging management module 140 is configured to receive charging input from a charger. The charging management module 140 may also supply power to the device through the power management module 141 while charging the battery 142.
The power management module 141 is used for connecting the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140, supplies power to various parts of the mobile terminal 100, and may also be used to monitor the state of the battery.
The wireless communication function of the mobile terminal 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the mobile terminal 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied on the mobile terminal 100.
The Wireless Communication module 160 may provide Wireless Communication solutions including a Wireless Local Area Network (WLAN) (e.g., a Wireless Fidelity (Wi-Fi) network), Bluetooth (BT), a Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like, which are applied to the mobile terminal 100. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
In some embodiments, the antenna 1 of the mobile terminal 100 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the mobile terminal 100 can communicate with a network and other devices through wireless communication technology. The wireless communication technology may include Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (Code Division Multiple Access, CDMA), Wideband Code Division Multiple Access (WCDMA), Time Division-Code Division Multiple Access (TD-SCDMA), Long Term Evolution (Long Term Evolution, LTE), New air interface (New Radio, NR), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc.
The mobile terminal 100 implements a display function through the GPU, the display screen 190, the application processor, and the like. The GPU is used to perform mathematical and geometric calculations to achieve graphics rendering and to connect the display screen 190 and the application processor. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information. The mobile terminal 100 may include one or more display screens 190 for displaying images, video, and the like.
The mobile terminal 100 may implement a photographing function through the ISP, the camera module 191, the encoder, the decoder, the GPU, the display screen 190, the application processor, and the like.
The camera module 191 is used to capture still images or videos, collect optical signals through the photosensitive element, and convert the optical signals into electrical signals. The ISP is used to process the data fed back by the camera module 191 and convert the electrical signal into a digital image signal.
The external memory interface 122 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the mobile terminal 100.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (e.g., images, videos) created during use of the mobile terminal 100, and the like. The processor 110 executes various functional applications of the mobile terminal 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The mobile terminal 100 may implement an audio function through the audio module 170, the speaker 171, the receiver 172, the microphone 173, the earphone interface 174, and the application processor. Such as music playing, recording, etc. The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. The speaker 171 converts an audio electric signal into a sound signal. The receiver 172 is used for converting the audio electrical signal into a sound signal. The microphone 173 converts a sound signal into an electrical signal. The earphone interface 174 is used to connect a wired earphone.
The sensor module 180 may include a depth sensor 1801, a pressure sensor 1802, a gyroscope sensor 1803, a barometric pressure sensor 1804, and the like. The depth sensor 1801 is used to acquire depth information of a scene. The pressure sensor 1802 is used for sensing a pressure signal, and can convert the pressure signal into an electrical signal to implement functions such as pressure touch control. The gyro sensor 1803 may be used to determine a motion gesture of the mobile terminal 100, and may be used to photograph scenes such as anti-shake, navigation, and motion sensing games. The air pressure sensor 1804 is used to measure air pressure, which can be used to assist in positioning and navigation by calculating altitude. Further, according to actual needs, sensors having other functions, such as a magnetic sensor, an acceleration sensor, and a distance sensor, may be provided in the sensor module 180.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The motor 193 can generate vibration prompts, such as incoming calls, alarm clocks, receiving messages, etc., and can also be used for touch vibration feedback, etc.
The keys 194 include a power-on key, a volume key, and the like. The keys 194 may be mechanical keys. Or may be touch keys. The mobile terminal 100 may receive a key input, and generate a key signal input related to user setting and function control of the mobile terminal 100.
The mobile terminal 100 may support one or more SIM card interfaces 195 for connecting SIM cards, so that the mobile terminal 100 interacts with a network through the SIM cards to implement functions such as communication and data communication.
An image processing method according to an exemplary embodiment of the present disclosure is described below with reference to fig. 2. As shown in fig. 2, the image processing method may include the following steps S210 to S230:
step S210, acquiring a to-be-processed image under a preset task.
The preset task refers to a predetermined task for processing some aspect of the image, and may include image defogging, image denoising, image deblurring, image super-resolution reconstruction, or any combination thereof, such as image defogging combined with image super-resolution reconstruction. The image to be processed is usually an image that is defective in some respect and needs to be optimized by performing the processing of the preset task. For example, if the preset task is image defogging, the image to be processed may be an image captured in a hazy environment that is degraded by particle scattering and therefore needs defogging.
Step S220, processing the image to be processed by using a generative adversarial network under the preset task to generate an intermediate image corresponding to the image to be processed.
In the exemplary embodiment, a generative adversarial network (GAN) can be trained for each task, so as to convert the image processing problem into an image-to-image conversion problem. Taking image defogging as an example, the related art adopts a prior physical scattering model; its essence is to estimate the fog-related data in the original image and then remove the influence of the fog to obtain a defogged image. A generative adversarial network, by contrast, can learn the characteristics of all aspects of the image to be processed, including the characteristics of both the foggy and the fog-free parts, and then combine them with the knowledge the network has learned from other images in the data set to generate a new defogged image, thereby breaking through the limitation of the content of the image to be processed. Processing with a generative adversarial network is therefore quite different from the related-art processing.
A generative adversarial network is generally composed of two parts: a generation network and a discrimination network. In an alternative embodiment, as shown with reference to fig. 3, step S220 may include the following steps S310 to S330:
Step S310, processing the image to be processed by using a generation network under the preset task to generate an image to be discriminated;
Step S320, determining whether the image to be discriminated is a real image by using a discrimination network under the preset task;
Step S330, when the image to be discriminated is determined to be a real image, determining the image to be discriminated as the intermediate image corresponding to the image to be processed.
The image to be discriminated is a new image generated by the generation network based on the image to be processed. It needs to be judged by the discrimination network to determine whether it reaches the level of a real image. It should be noted that the discrimination network may actually output the probability that the image to be discriminated is a real image, which is then measured against a preset probability threshold (e.g., 70% or 80%, determined according to actual needs and experience). When the probability that the image to be discriminated is a real image is greater than the probability threshold, the image to be discriminated is determined to be a real image, which indicates that the generated image is valid, and it is determined as the intermediate image corresponding to the image to be processed. Conversely, when the discrimination network determines that the image to be discriminated is not a real image, the generated image is invalid; two exemplary ways of handling this case are provided below:
iterating an image to be distinguished by generating a network, specifically: inputting the image to be distinguished into a generation network to generate a new image to be distinguished, distinguishing whether the image is a real image or not through the distinguishing network, finishing iteration when the image is determined to be the real image to obtain an intermediate image, and inputting the generated new image to be distinguished into the generation network again to continue iteration when the image is determined to be a non-real image.
And secondly, optimizing the image to be processed, such as up-sampling, image sharpening, filtering and the like, inputting the image to be processed into the generation network again to generate a new image to be distinguished, and distinguishing whether the image is a real image or not through the distinguishing network.
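A minimal sketch of the first, accept-or-iterate way is given below, assuming a generation network `gen` and a discrimination network `disc` such as those sketched later in this description; the 0.8 probability threshold, the iteration cap, and the single-image batch are illustrative assumptions.

```python
def generate_intermediate(gen, disc, to_process, prob_threshold=0.8, max_iters=5):
    """Run the generation network until the discrimination network accepts its output as real."""
    candidate = gen(to_process)
    for _ in range(max_iters):
        # disc returns the probability that the candidate is a real image (batch of one assumed).
        if disc(to_process, candidate).item() > prob_threshold:
            return candidate               # valid: use as the intermediate image
        candidate = gen(candidate)         # otherwise iterate through the generation network
    return candidate
```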
As can be seen from the above, the quality of the image to be discriminated output by the generation network has an important influence on the scheme. In an alternative embodiment, the structure of the generation network can be designed so that features of the image to be processed are extracted from multiple aspects, producing a high-quality image to be discriminated. As shown in fig. 4, step S310 may include the following steps S410 to S430:
step S410, processing the image to be processed by utilizing a first branch network of a generating network to generate a first characteristic image, wherein the first branch network comprises at least one convolution layer;
step S420, processing the image to be processed by utilizing a second branch network of the generating network to generate a second characteristic image, wherein the second branch network comprises at least one residual block;
step S430, processing the image obtained by superposing the first characteristic image and the second characteristic image by using a third branch network of the generating network to generate an image to be distinguished, wherein the third branch network comprises at least one deconvolution layer.
The first branch network is mainly used to extract global features of the image to be processed, the second branch network is mainly used to extract local features of the image to be processed, and the third branch network is mainly used to fuse the two kinds of features.
Fig. 5 shows a schematic structure of a generation network in which the first and second branch networks are two parallel branches. The first branch network is mainly built from convolution layers; a certain number of convolution layers and matching pooling layers can be provided as required. In fig. 5, for example, two convolution layers of different scales are provided, each followed by a pooling layer, and the first branch network extracts global features from the image to be processed to obtain the corresponding first feature image. The second branch network is mainly built from residual blocks; a certain number of residual blocks and other matching intermediate layers can be provided as required. In fig. 5, for example, a down-sampling layer, two convolution layers, three residual blocks, and a deconvolution layer are provided: the down-sampling layer reduces the image to be processed to half scale, the convolution layers extract local features from the reduced image, the residual blocks combine the local features, and the deconvolution layer deconvolves the feature image to a specific size, so that the generated second feature image has the same size as the first feature image (generally smaller than the image to be processed). The third branch network is mainly built from deconvolution layers, so that the finally generated image to be discriminated has the same size as the image to be processed. In fig. 5, for example, three residual blocks and one deconvolution layer are provided: the first feature image and the second feature image are superimposed (for example, added pixel by pixel at corresponding positions) to fuse the global and local features, the residual blocks combine the features, and the deconvolution finally produces the image to be discriminated. Through this three-branch design, the image to be discriminated learns richer features from the image to be processed, retains more detail information, and achieves higher image quality.
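For illustration, a minimal PyTorch sketch of this three-branch generation network follows. The channel counts, strides, and kernel sizes are assumptions (the description fixes only the branch roles and layer types), and instance normalization is used inside the residual blocks in line with the joint-training note later in this description.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """3x3 conv -> InstanceNorm -> ReLU -> 3x3 conv -> InstanceNorm, with a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class Generator(nn.Module):
    """Three-branch generation network sketch: global branch + local (residual) branch -> fusion branch."""
    def __init__(self, in_ch=3, feat=64):
        super().__init__()
        # First branch: convolution + pooling layers, extracting global features at 1/4 scale (assumed).
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, feat, 5, padding=2), nn.ReLU(inplace=True), nn.AvgPool2d(2),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True), nn.AvgPool2d(2),
        )
        # Second branch: down-sample to 1/2 scale, two convolutions, three residual blocks,
        # then a deconvolution so its output matches the first branch (1/4 scale).
        self.branch2 = nn.Sequential(
            nn.AvgPool2d(2),                                    # 1/2 scale
            nn.Conv2d(in_ch, feat, 3, stride=2, padding=1),     # 1/4 scale
            nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1),      # 1/8 scale
            nn.ReLU(inplace=True),
            ResidualBlock(feat), ResidualBlock(feat), ResidualBlock(feat),
            nn.ConvTranspose2d(feat, feat, 4, stride=2, padding=1),  # back to 1/4 scale
        )
        # Third branch: residual blocks on the fused features, deconvolution back to full size.
        self.branch3 = nn.Sequential(
            ResidualBlock(feat), ResidualBlock(feat), ResidualBlock(feat),
            nn.ConvTranspose2d(feat, in_ch, 4, stride=4),       # 1/4 -> full resolution
            nn.Tanh(),                                          # assumes images normalized to [-1, 1]
        )

    def forward(self, x):
        fused = self.branch1(x) + self.branch2(x)   # pixel-wise superposition of the two feature images
        return self.branch3(fused)

# g = Generator(); y = g(torch.randn(1, 3, 256, 256))  # y has shape (1, 3, 256, 256)
```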
Furthermore, by designing the structure of the discrimination network, the probability that the image to be discriminated is real can be predicted more accurately. As shown in fig. 6, step S320 may include the following steps S610 and S620:
Step S610, processing the image to be processed and the image to be discriminated by using at least two convolution layers of different scales in the discrimination network to generate a third feature image corresponding to the image to be processed and a fourth feature image corresponding to the image to be discriminated;
Step S620, processing the third feature image and the fourth feature image by using at least one fully connected layer in the discrimination network, and outputting the probability distribution that the image to be discriminated is a real image, so as to determine whether the image to be discriminated is a real image.
The discrimination network generally has two input channels, which receive the image to be processed and the image to be discriminated respectively. By learning and comparing the features of the two images, the discrimination network recognizes whether the image to be discriminated is a plausible derivative of the image to be processed, which is the basis of its authenticity prediction. The whole discrimination network can thus be divided into two parts: a feature extraction part and a feature comparison part. Step S610 is the processing of the feature extraction part, which extracts features from the image to be processed and the image to be discriminated respectively and generates the corresponding third and fourth feature images; step S620 is the processing of the feature comparison part, which fuses and compares the third and fourth feature images and finally outputs the probability distribution that the image to be discriminated is a real image.
Fig. 7 shows a schematic structure of a discrimination network. In the feature extraction part, the image to be processed and the image to be discriminated are each processed by m convolution layers of different scales (m > 2), connected in series. For example, four convolution layers are provided, each using 4 × 4 convolution kernels, with channel numbers of 64, 128, 256, and 512 in turn, and all convolution layers use the LReLU (Leaky Rectified Linear Unit) activation function; other intermediate layers such as pooling layers or residual blocks may of course also be provided. In the feature comparison part, the third feature image and the fourth feature image are fused and reduced in dimensionality by n fully connected layers (n ≥ 2), for example a 64-dimensional fully connected layer 1 and a 2-dimensional fully connected layer 2, and finally the probability distribution of whether the image to be discriminated is a real image is output. The specific values of m and n can be determined according to actual requirements and experience.
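A matching PyTorch sketch of the two-input discrimination network is given below. The 4 × 4 kernels, the 64/128/256/512 channel progression, the LReLU activations, and the 64- and 2-dimensional fully connected layers follow the example above; sharing the convolution weights between the two inputs, the stride-2 convolutions, and the global average pooling before the fully connected layers are assumptions made to keep the sketch small.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Feature extraction (shared convs applied to each input) + feature comparison (fully connected layers)."""
    def __init__(self, in_ch=3):
        super().__init__()
        layers, ch = [], in_ch
        for out_ch in (64, 128, 256, 512):            # four 4x4 conv layers, stride 2, LReLU
            layers += [nn.Conv2d(ch, out_ch, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = out_ch
        self.features = nn.Sequential(*layers)        # applied to both inputs (weight sharing assumed)
        self.pool = nn.AdaptiveAvgPool2d(1)           # assumption: global pooling before the FC layers
        self.fc = nn.Sequential(
            nn.Linear(512 * 2, 64), nn.LeakyReLU(0.2, inplace=True),   # 64-dim fully connected layer 1
            nn.Linear(64, 2),                                          # 2-dim fully connected layer 2
        )

    def forward(self, to_process, to_discriminate):
        f3 = self.pool(self.features(to_process)).flatten(1)       # third feature image
        f4 = self.pool(self.features(to_discriminate)).flatten(1)  # fourth feature image
        logits = self.fc(torch.cat([f3, f4], dim=1))               # fuse, then compare
        return torch.softmax(logits, dim=1)[:, 1]                  # probability that the input is real

# d = Discriminator(); p = d(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
```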
Exemplary embodiments of the present disclosure also provide a training method for the generation network and the discrimination network, which, referring to fig. 8, includes the following steps:
step S810, constructing an initial generating network and an initial judging network;
step S830, inputting the sample initial image into a generation network to obtain a corresponding sample generation image;
Step S840, respectively inputting the label image corresponding to the sample initial image and the sample generation image into the discrimination network, to obtain a first probability distribution that the sample generation image is a real image and a second probability distribution that the label image is a real image;
step S860, updating parameters of the generated network according to the distance between the label image and the sample generation image and the first probability distribution;
step S870, updates the parameters of the discrimination network according to the first probability distribution and the second probability distribution.
After the structures of the generation network and the discrimination network are determined, the network parameters can be assigned by random initialization, or existing values can be adopted, to obtain the initial generation network and the initial discrimination network. The sample initial images may come from a pre-established data set. Taking image defogging as an example, a large number of real images (usually clear pictures) are selected as label images, denoted B; fog is synthesized on the label images to obtain the sample initial images to be defogged, denoted A; one A and the corresponding B form a group of samples, and a large number of such samples form the data set. During training, A is input into the generation network to obtain the corresponding sample generation image, denoted G(A); A and G(A) are input into the discrimination network to obtain the first probability distribution D(A, G(A)) that G(A) is a real image, and A and B are input into the discrimination network to obtain the second probability distribution D(A, B) that B is a real image. A first loss function Loss1 for the generation network and a second loss function Loss2 for the discrimination network are constructed:
Loss1=Dist(G(A),B)-log(D(A,G(A))) (1)
Loss2=log(D(A,B))+log(1-D(A,G(A))) (2)
Dist(G(A), B) represents the distance between G(A) and B, which may be measured, for example, by the Euclidean distance between the two images, or by an index such as the sum of absolute differences between the two images. The parameters of the generation network and the discrimination network are optimized and updated through their respective loss functions, so as to realize training.
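The two loss functions can be written down almost verbatim. The sketch below assumes the Generator and Discriminator classes from the earlier sketches, uses the L1 distance (mean absolute difference) for Dist(G(A), B), and minimizes the negative of formula (2) for the discrimination network, which is equivalent to maximizing it; the small epsilon inside the logarithms is a numerical guard.

```python
import torch
import torch.nn.functional as F

eps = 1e-8  # keeps the logarithms finite

def train_step(gen, disc, opt_g, opt_d, A, B):
    """One joint update on a batch: A = sample initial images, B = label images."""
    # Update the generation network with Loss1 = Dist(G(A), B) - log(D(A, G(A))).
    fake = gen(A)
    loss_g = F.l1_loss(fake, B) - torch.log(disc(A, fake) + eps).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Update the discrimination network by maximizing
    # Loss2 = log(D(A, B)) + log(1 - D(A, G(A))), i.e. minimizing its negative.
    fake = gen(A).detach()
    loss_d = -(torch.log(disc(A, B) + eps).mean() +
               torch.log(1.0 - disc(A, fake) + eps).mean())
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    return loss_g.item(), loss_d.item()
```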
Step S230, performing convolution processing on the intermediate image at two or more different scales by using an image enhancement network, and generating a target image from the convolved data.
The image enhancement network is used for enhancing the intermediate image, and can further mine image characteristics under local views of different scales through multi-scale convolution, so that image detail information is improved.
In an alternative embodiment, referring to fig. 9, step S230 may include the following steps S910 to S930:
Step S910, processing the intermediate image by using a pre-extraction layer in the image enhancement network to generate a fifth feature image;
Step S920, respectively processing the fifth feature image by using at least two convolution layers of different scales in the image enhancement network to generate a sixth feature image corresponding to each convolution layer;
Step S930, splicing the fifth feature image with the up-sampled sixth feature images, performing feature processing, and generating the target image.
The pre-extraction layer can include a certain number of convolution layers and pooling layers and is used to extract global features from the intermediate image; convolution layers of different scales then process these features in parallel to extract local features at different scales; finally, the fifth feature image representing the global features and the sixth feature images representing the local features at different scales are spliced and comprehensively processed to generate the target image.
Fig. 10 shows a schematic structure of an image enhancement network. The pre-extraction layer may include several convolution layers and pooling layers; for example, two convolution layers and one average pooling layer may be provided to extract global features from the intermediate image, yielding the fifth feature image. The fifth feature image is then down-sampled to 1/32, 1/16, 1/8, and 1/4 scale through down-sampling layers of different scales, for example, to obtain feature images at four scales. Each of these is then reduced in dimensionality by a 1 × 1 convolution (with the kernel depth matching the number of channels of the feature image), so that single-channel images (or RGB three-channel images; the disclosure is not limited in this respect) at different scales, i.e. the sixth feature images, are obtained after the reduction. Since the sixth feature images differ in size from each other and from the fifth feature image, they are converted to the same size as the fifth feature image through up-sampling at different scales. The fifth feature image is spliced with each sixth feature image, and the target image is output after subsequent processing by convolution layers.
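The multi-scale enhancement described above resembles a pyramid-pooling design and can be sketched as follows. The 1/32, 1/16, 1/8, and 1/4 scales and the 1 × 1 dimension-reduction convolutions follow the text; the channel counts of the pre-extraction layer, the bilinear interpolation for the down- and up-sampling layers, and the final output layers are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnhanceNet(nn.Module):
    def __init__(self, in_ch=3, feat=64, scales=(1/32, 1/16, 1/8, 1/4)):
        super().__init__()
        self.scales = scales
        # Pre-extraction layer: two convolutions and one average pooling -> fifth feature image.
        self.pre = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.AvgPool2d(2),
        )
        # One 1x1 convolution per scale, reducing to a single channel (the sixth feature images).
        self.reduce = nn.ModuleList([nn.Conv2d(feat, 1, 1) for _ in scales])
        # Final convolutions on the spliced features -> target image at the input resolution.
        self.out = nn.Sequential(
            nn.Conv2d(feat + len(scales), feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(feat, in_ch, 4, stride=2, padding=1),
        )

    def forward(self, x):
        f5 = self.pre(x)                      # fifth feature image (global features)
        h, w = f5.shape[-2:]
        branches = []
        for s, conv in zip(self.scales, self.reduce):
            small = F.interpolate(f5, scale_factor=s, mode='bilinear', align_corners=False)
            small = conv(small)               # 1x1 convolution, dimensionality reduction
            branches.append(F.interpolate(small, size=(h, w), mode='bilinear', align_corners=False))
        fused = torch.cat([f5] + branches, dim=1)   # splice fifth and up-sampled sixth feature images
        return self.out(fused)

# net = EnhanceNet(); y = net(torch.randn(1, 3, 256, 256))  # y: (1, 3, 256, 256)
```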
Compared with the intermediate image, the target image has a better visual effect in terms of detail and color. Taking image defogging as an example, the intermediate image produced by the generative adversarial network may be regarded as a coarsely defogged image, and the target image further processed by the image enhancement network may be regarded as a finely defogged image.
In the exemplary embodiment, the image enhancement network may be trained alone or jointly with the above-described generative adversarial network. One embodiment of joint training is provided below; with continued reference to fig. 8, the training process of the image enhancement network may include the following steps:
step S820, constructing an initial image enhancement network;
step S850, inputting the sample generation image into an image enhancement network to obtain a corresponding sample enhancement image;
and step S880, updating parameters of the image enhancement network through the label image and the sample enhancement image.
The above training process may use the data set of the generative adversarial network. For example, the sample generation image, denoted G(A), is input into the image enhancement network to obtain the corresponding sample enhancement image, denoted Enhance(G(A)); a loss function is constructed from the deviation between Enhance(G(A)) and the label image B, and the parameters of the image enhancement network are updated according to this loss function, so as to realize training.
Further, the loss function of the image enhancement network may combine a reconstruction loss and a perceptual loss: the reconstruction loss is the L1 loss (a loss based on the L1 norm, usually the absolute difference) between the label image and the sample enhancement image, and the perceptual loss is obtained by extracting feature images from the label image and the sample enhancement image respectively through a VGG network (a classical visual convolutional neural network) and calculating the L1 loss between the two feature images.
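A sketch of this combined loss follows, assuming a pretrained VGG-16 from torchvision (version 0.13 or later) as the feature extractor; the cut at relu3_3 and the weight on the perceptual term are assumptions, and the ImageNet input normalization usually applied before VGG is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class EnhanceLoss(nn.Module):
    def __init__(self, perceptual_weight=0.1):
        super().__init__()
        # VGG-16 features up to relu3_3 (assumed cut point), frozen.
        self.vgg = vgg16(weights="DEFAULT").features[:16].eval()
        for p in self.vgg.parameters():
            p.requires_grad = False
        self.w = perceptual_weight

    def forward(self, enhanced, label):
        recon = F.l1_loss(enhanced, label)                           # reconstruction (L1) loss
        perceptual = F.l1_loss(self.vgg(enhanced), self.vgg(label))  # L1 between VGG feature images
        return recon + self.w * perceptual

# criterion = EnhanceLoss()
# loss = criterion(enhance_net(g_output), label_image)
```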
It should be noted that the exemplary embodiment may train a common image enhancement network for different tasks, for example using the same image enhancement network for image defogging, image deblurring, and so on; or it may train a dedicated image enhancement network for each task, for example separate image enhancement networks for image defogging and for image deblurring. Generally, a dedicated image enhancement network has a better processing effect but requires higher resource overhead for network construction, training, and storage. Which scheme to adopt can be determined according to the actual situation, and the disclosure is not limited in this respect.
As can be seen from steps S810 to S880 shown in fig. 8, in practical applications the same data set may be used to jointly train the generation network, the discrimination network, and the image enhancement network, with the same normalization manner used during training; for example, the three networks are updated synchronously using Instance Normalization, so as to improve training efficiency and shorten training time.
In an optional implementation, after the target image is generated, an optimal target image can be obtained by iterating on the target image, as follows:
inputting the target image into an image enhancement network to update the target image;
acquiring evaluation data of a target image;
when the evaluation data of the target image is converged, determining the current target image;
when the evaluation data of the target image is not converged, the target image is input to the image enhancement network again to update the target image.
The evaluation data may be the result of manually scoring the visual effect of the image, or the result of machine evaluation. An image quality threshold may be set for the evaluation data: when the evaluation data of the target image reaches or exceeds the threshold, the target image meets the requirement and is determined as the final output image; otherwise, the target image is input into the image enhancement network for further enhancement until the evaluation data reaches the threshold. Alternatively, the target image is cyclically input into the image enhancement network for repeated enhancement until the evaluation data can no longer be improved, at which point the optimal target image is considered obtained. Convergence of the evaluation data may mean that the difference between the evaluation data obtained after two consecutive updates of the target image is smaller than a convergence threshold (usually a small value, related to the practical application), which indicates that the target image can no longer be obviously improved by iterating the image enhancement network; the target image at this point is output as the final image.
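The iterative refinement can be expressed as a simple loop; `evaluate` stands for whatever manual or automatic quality scoring is used and is purely hypothetical here, as are the convergence threshold and the iteration cap.

```python
def refine(enhance_net, target, evaluate, conv_threshold=1e-3, max_iters=10):
    """Repeatedly re-enhance the target image until its evaluation score stops improving."""
    prev_score = evaluate(target)          # hypothetical quality-scoring function
    for _ in range(max_iters):
        target = enhance_net(target)       # feed the target image back into the enhancement network
        score = evaluate(target)           # acquire evaluation data of the updated target image
        if abs(score - prev_score) < conv_threshold:
            break                          # evaluation data has converged: keep the current target image
        prev_score = score
    return target
```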
In summary, in the exemplary embodiment, on the one hand, by processing the image to be processed with a generative adversarial network to perform the preset task, a general image processing problem can be converted into an image-to-image conversion problem, which removes the dependence of image processing on a prior model or algorithm, breaks through the content limitation of the image to be processed, exploits the knowledge and information the network has learned from the data set, improves the image processing effect, and yields a higher-quality target image. On the other hand, the multi-scale convolution processing of the image enhancement network enriches the detail information and features of the target image, which alleviates problems such as detail loss and color distortion that the generative adversarial network may introduce, further improving image quality.
Exemplary embodiments of the present disclosure also provide an image processing apparatus. As shown in fig. 11, the image processing apparatus 1100 may include:
an image obtaining module 1110, configured to obtain an image to be processed under a preset task;
the first processing module 1120 is configured to process the to-be-processed image by using a generation countermeasure network under a preset task, and generate an intermediate image corresponding to the to-be-processed image;
the second processing module 1130 is configured to perform convolution processing on the intermediate image in at least two different scales by using the image enhancement network, and generate a target image according to data after the convolution processing.
In an alternative embodiment, the first processing module 1120 includes:
the generation processing unit is used for processing the image to be processed by using a generation network under the preset task to generate an image to be discriminated;
the discrimination processing unit is used for determining whether the image to be discriminated is a real image by using a discrimination network under the preset task;
and the intermediate image determining unit is used for determining the image to be discriminated as the intermediate image corresponding to the image to be processed when the image to be discriminated is determined to be a real image.
In an alternative embodiment, the generation processing unit is configured to:
processing the image to be processed by using a first branch network of the generation network to generate a first feature image, wherein the first branch network includes at least one convolution layer;
processing the image to be processed by using a second branch network of the generation network to generate a second feature image, wherein the second branch network includes at least one residual block;
and processing the image obtained by superimposing the first feature image and the second feature image by using a third branch network of the generation network to generate the image to be discriminated, wherein the third branch network includes at least one deconvolution layer.
In an alternative embodiment, the discrimination processing unit is configured to:
processing the image to be processed and the image to be discriminated by using at least two convolution layers of different scales in the discrimination network to generate a third feature image corresponding to the image to be processed and a fourth feature image corresponding to the image to be discriminated;
and processing the third feature image and the fourth feature image by using at least one fully connected layer in the discrimination network, and outputting the probability distribution that the image to be discriminated is a real image, so as to determine whether the image to be discriminated is a real image.
In an alternative embodiment, the first processing module 1120 further includes a generative adversarial network training unit configured to:
constructing an initial generating network and an initial judging network;
inputting the sample initial image into the generation network to obtain a corresponding sample generation image;
respectively inputting the label image corresponding to the sample initial image and the sample generation image into the discrimination network to obtain a first probability distribution that the sample generation image is a real image and a second probability distribution that the label image is a real image;
updating parameters of a generation network according to the distance between the label image and the sample generation image and the first probability distribution;
and updating the parameters of the discrimination network according to the first probability distribution and the second probability distribution.
In an alternative embodiment, the second processing module 1130 includes an image enhancement network training unit configured to:
constructing an initial image enhancement network;
inputting the sample generation image into an image enhancement network to obtain a corresponding sample enhancement image;
and updating parameters of the image enhancement network through the label image and the sample enhancement image.
In an alternative embodiment, the second processing module 1130 is configured to:
processing the intermediate image by utilizing a pre-extraction layer in the image enhancement network to generate a fifth characteristic image;
respectively processing the fifth feature image by using at least two convolution layers of different scales in the image enhancement network to generate a sixth feature image corresponding to each convolution layer;
and splicing the fifth characteristic image and the sixth characteristic images subjected to up-sampling, and performing characteristic processing to generate a target image.
In an alternative embodiment, the second processing module 1130 is configured to:
inputting the target image into an image enhancement network to update the target image;
acquiring evaluation data of a target image;
when the evaluation data of the target image is converged, determining the current target image;
when the evaluation data of the target image is not converged, the target image is input to the image enhancement network again to update the target image.
In an alternative embodiment, the predetermined task includes any one or more of: defogging an image, denoising the image, deblurring the image and reconstructing super-resolution of the image.
The specific details of each part in the above device have been described in detail in the method part embodiments, and thus are not described again.
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product including program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the terminal device, for example, any one or more of the steps in fig. 2 may be performed.
The program product may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the following claims.

Claims (12)

1. An image processing method, comprising:
acquiring an image to be processed under a preset task;
processing the image to be processed by utilizing a generative adversarial network under the preset task to generate an intermediate image corresponding to the image to be processed;
and performing convolution processing on the intermediate image in at least two different scales by using an image enhancement network, and generating a target image according to data after the convolution processing.
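Read as a whole, the two-stage pipeline of claim 1 amounts to the short sketch below, in which generator stands for the generation network of the trained generative adversarial network and enhancer for the multi-scale image enhancement network; both names and the inference-only setting are assumptions of the sketch.

    import torch

    def process_image(image_to_be_processed, generator, enhancer):
        with torch.no_grad():
            intermediate = generator(image_to_be_processed)   # generative adversarial network stage
            target = enhancer(intermediate)                   # multi-scale enhancement stage
        return target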
2. The method according to claim 1, wherein processing the image to be processed by using the generative adversarial network under the preset task to generate the intermediate image corresponding to the image to be processed comprises:
processing the image to be processed by using a generation network under the preset task to generate an image to be distinguished;
determining whether the image to be distinguished is a real image by using a discrimination network under the preset task;
and when the image to be distinguished is determined to be a real image, determining that the image to be distinguished is an intermediate image corresponding to the image to be processed.
3. The method according to claim 2, wherein processing the image to be processed by using the generation network under the preset task to generate the image to be distinguished comprises:
processing the image to be processed by using a first branch network of the generation network to generate a first feature image, wherein the first branch network comprises at least one convolution layer;
processing the image to be processed by using a second branch network of the generation network to generate a second feature image, wherein the second branch network comprises at least one residual block;
and processing an image obtained by superposing the first feature image and the second feature image by using a third branch network of the generation network to generate the image to be distinguished, wherein the third branch network comprises at least one deconvolution layer.
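One illustrative reading of the three-branch generation network of claim 3 is sketched below; since the claim only requires at least one convolution layer, one residual block and one deconvolution layer, the layer counts, channel widths, strides and activations shown are assumptions.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1))

        def forward(self, x):
            return x + self.body(x)

    class ThreeBranchGenerator(nn.Module):
        def __init__(self, channels=3, feat=64):
            super().__init__()
            self.branch1 = nn.Sequential(                       # at least one convolution layer
                nn.Conv2d(channels, feat, 3, stride=2, padding=1), nn.ReLU())
            self.branch2 = nn.Sequential(                       # at least one residual block
                nn.Conv2d(channels, feat, 3, stride=2, padding=1), ResidualBlock(feat))
            self.branch3 = nn.Sequential(                       # at least one deconvolution layer
                nn.ConvTranspose2d(feat, channels, 4, stride=2, padding=1), nn.Tanh())

        def forward(self, image_to_be_processed):
            first = self.branch1(image_to_be_processed)         # first feature image
            second = self.branch2(image_to_be_processed)        # second feature image
            return self.branch3(first + second)                 # superpose, then deconvolve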
4. The method according to claim 2, wherein determining whether the image to be distinguished is a real image by using the discrimination network under the preset task comprises:
processing the image to be processed and the image to be distinguished by utilizing at least two convolution layers with different scales in the discrimination network to generate a third feature image corresponding to the image to be processed and a fourth feature image corresponding to the image to be distinguished;
and processing the third feature image and the fourth feature image by using at least one fully connected layer in the discrimination network, and outputting a probability distribution that the image to be distinguished is a real image, so as to determine whether the image to be distinguished is a real image.
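A non-authoritative sketch of the discrimination network of claim 4 follows; the two particular strides, the adaptive pooling size and the fully connected widths are assumptions, the claim requiring only convolution layers of at least two different scales followed by at least one fully connected layer.

    import torch
    import torch.nn as nn

    class Discriminator(nn.Module):
        def __init__(self, channels=3, feat=32):
            super().__init__()
            # at least two convolution layers with different scales
            self.conv_s2 = nn.Conv2d(channels, feat, 3, stride=2, padding=1)
            self.conv_s4 = nn.Conv2d(channels, feat, 3, stride=4, padding=1)
            self.pool = nn.AdaptiveAvgPool2d(8)
            self.fc = nn.Sequential(                     # at least one fully connected layer
                nn.Linear(2 * feat * 8 * 8 * 2, 128), nn.ReLU(),
                nn.Linear(128, 1), nn.Sigmoid())         # probability of being a real image

        def features(self, image):
            a = self.pool(torch.relu(self.conv_s2(image)))
            b = self.pool(torch.relu(self.conv_s4(image)))
            return torch.cat([a, b], dim=1).flatten(1)

        def forward(self, image_to_be_processed, image_to_be_distinguished):
            third = self.features(image_to_be_processed)          # third feature image
            fourth = self.features(image_to_be_distinguished)     # fourth feature image
            return self.fc(torch.cat([third, fourth], dim=1))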
5. The method of claim 2, wherein the generation network and the discrimination network are trained by:
constructing an initial generation network and an initial discrimination network;
inputting a sample initial image into the generation network to obtain a corresponding sample generation image;
respectively inputting the label image corresponding to the sample initial image and the sample generation image into the discrimination network to obtain a first probability distribution that the sample generation image is a real image and a second probability distribution that the label image is a real image;
updating parameters of the generation network by a distance between the label image and the sample generation image, and the first probability distribution;
and updating parameters of the discrimination network with the first probability distribution and the second probability distribution.
6. The method of claim 5, wherein the image enhancement network is trained by:
constructing an initial image enhancement network;
inputting the sample generation image into the image enhancement network to obtain a corresponding sample enhancement image;
and updating the parameters of the image enhancement network through the label image and the sample enhancement image.
7. The method of claim 1, wherein performing convolution processing on the intermediate image in at least two different scales by using the image enhancement network and generating the target image according to the convolution-processed data comprises:
processing the intermediate image by utilizing a pre-extraction layer in the image enhancement network to generate a fifth feature image;
processing the fifth feature image by using at least two convolution layers with different scales in the image enhancement network respectively to generate a sixth feature image corresponding to each convolution layer;
and splicing the fifth feature image with the up-sampled sixth feature images, and performing feature processing to generate the target image.
8. The method of claim 1, wherein after generating the target image, the method further comprises:
inputting the target image into the image enhancement network to update the target image;
acquiring evaluation data of the target image;
determining the current target image when the evaluation data of the target image has converged;
and when the evaluation data of the target image has not converged, inputting the target image into the image enhancement network again to update the target image.
9. The method according to any one of claims 1 to 8, wherein the preset task includes any one or more of: image defogging, image denoising, image deblurring, and image super-resolution reconstruction.
10. An image processing apparatus characterized by comprising:
the image acquisition module is used for acquiring an image to be processed under a preset task;
the first processing module is used for processing the image to be processed by utilizing a generative adversarial network under the preset task to generate an intermediate image corresponding to the image to be processed;
and the second processing module is used for performing convolution processing on the intermediate image in at least two different scales by using the image enhancement network and generating a target image according to data after the convolution processing.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 9.
12. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1 to 9 via execution of the executable instructions.
CN202010866139.XA 2020-08-25 2020-08-25 Image processing method, image processing apparatus, storage medium, and electronic device Active CN111951168B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010866139.XA CN111951168B (en) 2020-08-25 2020-08-25 Image processing method, image processing apparatus, storage medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010866139.XA CN111951168B (en) 2020-08-25 2020-08-25 Image processing method, image processing apparatus, storage medium, and electronic device

Publications (2)

Publication Number Publication Date
CN111951168A (en) 2020-11-17
CN111951168B CN111951168B (en) 2023-04-07

Family

ID=73366304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010866139.XA Active CN111951168B (en) 2020-08-25 2020-08-25 Image processing method, image processing apparatus, storage medium, and electronic device

Country Status (1)

Country Link
CN (1) CN111951168B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977932A (en) * 2017-12-28 2018-05-01 北京工业大学 Face image super-resolution reconstruction method based on a generative adversarial network with discriminable attribute constraints
CN108537743A (en) * 2018-03-13 2018-09-14 杭州电子科技大学 Face image enhancement method based on a generative adversarial network
CN109191388A (en) * 2018-07-27 2019-01-11 上海爱优威软件开发有限公司 Dark image processing method and system
CN109272455A (en) * 2018-05-17 2019-01-25 西安电子科技大学 Weakly supervised image defogging method based on a generative adversarial network
CN110428378A (en) * 2019-07-26 2019-11-08 北京小米移动软件有限公司 Image processing method, device and storage medium
WO2019214381A1 (en) * 2018-05-09 2019-11-14 腾讯科技(深圳)有限公司 Video deblurring method and apparatus, and storage medium and electronic apparatus
CN110675328A (en) * 2019-08-02 2020-01-10 北京巨数数字技术开发有限公司 Low-illumination image enhancement method and device based on a conditional generative adversarial network
CN111080527A (en) * 2019-12-20 2020-04-28 北京金山云网络技术有限公司 Image super-resolution method and device, electronic equipment and storage medium
CN111242865A (en) * 2020-01-10 2020-06-05 南京航空航天大学 Fundus image enhancement method based on a generative adversarial network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
段然 (Duan Ran): "Research on Image Super-Resolution Based on Generative Adversarial Networks", 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology Series) *

Also Published As

Publication number Publication date
CN111951168B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN111598776B (en) Image processing method, image processing device, storage medium and electronic apparatus
CN111694978B (en) Image similarity detection method and device, storage medium and electronic equipment
CN111429517A (en) Relocation method, relocation device, storage medium and electronic device
CN111784614A (en) Image denoising method and device, storage medium and electronic equipment
CN111161176B (en) Image processing method and device, storage medium and electronic equipment
WO2022206202A1 (en) Image beautification processing method and apparatus, storage medium, and electronic device
WO2023284401A1 (en) Image beautification processing method and apparatus, storage medium, and electronic device
US20240005628A1 (en) Bidirectional compact deep fusion networks for multimodality visual analysis applications
CN111768351A (en) Image denoising method, image denoising device, storage medium and electronic device
CN113409203A (en) Image blurring degree determining method, data set constructing method and deblurring method
CN113706440A (en) Image processing method, image processing device, computer equipment and storage medium
CN113343895B (en) Target detection method, target detection device, storage medium and electronic equipment
CN114494942A (en) Video classification method and device, storage medium and electronic equipment
CN113240598A (en) Face image deblurring method, face image deblurring device, medium and equipment
CN113763931A (en) Waveform feature extraction method and device, computer equipment and storage medium
CN113455013B (en) Electronic device for processing image and image processing method thereof
CN113031813A (en) Instruction information acquisition method and device, readable storage medium and electronic equipment
CN111951168B (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN113627314A (en) Face image blur detection method and device, storage medium and electronic equipment
CN113743517A (en) Model training method, image depth prediction method, device, equipment and medium
CN114119413A (en) Image processing method and device, readable medium and mobile terminal
CN111859001B (en) Image similarity detection method and device, storage medium and electronic equipment
CN113362260A (en) Image optimization method and device, storage medium and electronic equipment
CN116917930A (en) Method and apparatus for correcting image based on image compression quality in electronic device
CN111783962A (en) Data processing method, data processing apparatus, storage medium, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant