CN113888432A - Image enhancement method and device for image enhancement

Image enhancement method and device for image enhancement

Info

Publication number
CN113888432A
CN113888432A (application number CN202111166910.3A)
Authority
CN
China
Prior art keywords
image
network
feature extraction
generative
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111166910.3A
Other languages
Chinese (zh)
Inventor
宋明辉
张俊
谢泽华
周泽南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN202111166910.3A
Publication of CN113888432A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The embodiments of the application disclose an image enhancement method, an image enhancement apparatus, and a device for image enhancement. One embodiment of the method comprises: inputting an original image into a local feature extraction network and a global feature extraction network in a pre-trained image enhancement model to obtain the local features and global features of the original image, respectively; and inputting the local features and the global features into a feature processing network in the image enhancement model to obtain a first enhanced image corresponding to the original image. This embodiment improves the image enhancement effect.

Description

Image enhancement method and device for image enhancement
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to an image enhancement method and device and a device for image enhancement.
Background
Image enhancement is an image processing technique that applies certain operations to an original image to selectively highlight features of interest or suppress unwanted features, so that the image better matches the desired visual response characteristics.
In the prior art, the same feature extraction network is generally used to extract both the local features and the global features of an image, and image enhancement is performed based on the extracted features. The enhancement result produced in this way degrades when the input image is large.
Disclosure of Invention
The embodiment of the application provides an image enhancement method and device and a device for image enhancement, so as to solve the technical problem of poor image enhancement effect in the prior art.
In a first aspect, an embodiment of the present application provides an image enhancement method, the method comprising: inputting an original image into a local feature extraction network and a global feature extraction network in a pre-trained image enhancement model, respectively, to obtain local features and global features of the original image; and inputting the local features and the global features into a feature processing network in the image enhancement model to obtain a first enhanced image corresponding to the original image.
In a second aspect, an embodiment of the present application provides an image enhancement apparatus, comprising: a feature extraction unit configured to input an original image into a local feature extraction network and a global feature extraction network in a pre-trained image enhancement model, respectively, to obtain local features and global features of the original image; and a first image enhancement unit configured to input the local features and the global features into a feature processing network in the image enhancement model to obtain a first enhanced image corresponding to the original image.
In a third aspect, an embodiment of the present application provides an apparatus for image enhancement, comprising a memory, one or more processors, and one or more programs, where the one or more programs are stored in the memory and, when executed by the one or more processors, implement the method described in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable medium on which a computer program is stored, which when executed by a processor, implements the method as described in the first aspect above.
According to the image enhancement method, the image enhancement apparatus and the device for image enhancement provided by the embodiments of the application, an image enhancement model comprising a local feature extraction network, a global feature extraction network and a feature processing network is trained in advance, so that the local features and global features of an original image can be extracted by the local feature extraction network and the global feature extraction network, and the extracted local features and global features can then be processed by the feature processing network to obtain a first enhanced image, i.e. the enhanced version of the original image. Because the local features and the global features are extracted by different feature extraction networks, the local feature extraction does not have to shrink the input image to satisfy the constraints of global feature extraction when the image is large; this improves the accuracy of the local features and the quality of the generated enhanced image.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow diagram of one embodiment of an image enhancement method according to the present application;
FIG. 2 is a schematic diagram of the structure of a local feature extraction network and a global feature extraction network in an image enhancement model according to the application;
FIG. 3 is a flow diagram of yet another embodiment of an image enhancement method according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of an image enhancement apparatus according to the present application;
FIG. 5 is a schematic block diagram of an apparatus for image enhancement according to the present application;
FIG. 6 is a schematic diagram of a server in accordance with some embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to fig. 1, a flow 100 of one embodiment of an image enhancement method according to the present application is shown. The image enhancement method can be operated in various electronic devices including but not limited to: a server, a smart phone, a tablet computer, an e-book reader, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, a vehicle computer, a desktop computer, a set-top box, a smart tv, a wearable device, and so on.
The image enhancement method in this embodiment may include the steps of:
step 101, respectively inputting an original image into a local feature extraction network and a global feature extraction network in a pre-trained image enhancement model, and respectively obtaining a local feature and a global feature of the original image.
In this embodiment, the executing entity of the image enhancement method (e.g., an electronic device such as a server) may store a pre-trained image enhancement model, and the image enhancement model may process an image input into it and output an enhanced image. The image enhancement model may be obtained by pre-training an image processing network such as a GAN (Generative Adversarial Network) or a CNN (Convolutional Neural Network) with a machine learning method.
In this embodiment, the image enhancement model may include a local feature extraction network, a global feature extraction network, and a feature processing network. The local feature extraction network can be used to extract local features of the image, i.e. features of local regions. The global feature extraction network can be used to extract global features of the image. The feature processing network can be used to process (e.g., by convolution, pooling, deconvolution, unpooling, etc.) the extracted local and global features to obtain a new image, which is the enhanced version of the original image.
As an example, fig. 2 shows a schematic structural diagram of a local feature extraction network and a global feature extraction network in an image enhancement model. As shown in fig. 2, the global feature extraction network includes an input layer, a convolutional block (which may include one or more convolutional layers), and an output layer. The input to the global feature extraction network may be an image that has been downsampled (e.g., by convolution, pooling, etc.) to a fixed size (e.g., 512 x 512).
The local feature extraction network may include an input layer, at least one convolution block (each convolution block may include one or more convolution layers), and an output layer. The input of the local feature extraction network is the image at its original size. In practice, if the image size is smaller than 512, enhancement can be performed using only the local feature extraction network. If the image size is larger than 512, the global features output by the global feature extraction network can be fused with the intermediate feature maps produced by the local feature extraction network by element-wise addition, as sketched below.
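A minimal PyTorch-style sketch of this two-branch design is given below. The 512 x 512 downsampling size and the element-wise-add fusion come from the description above; the layer counts, channel widths and kernel sizes are illustrative assumptions, not the patented architecture.

```python
import torch.nn as nn
import torch.nn.functional as F

class GlobalBranch(nn.Module):
    """Global feature extraction network: works on a copy downsampled to a fixed size."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.SELU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.SELU(),
            nn.AdaptiveAvgPool2d(1),  # collapse to a per-channel global descriptor
        )

    def forward(self, x):
        # Downsample the input to the fixed size used by the global branch (512 x 512 here).
        x = F.interpolate(x, size=(512, 512), mode="bilinear", align_corners=False)
        return self.body(x)  # shape: (N, C, 1, 1)

class LocalBranch(nn.Module):
    """Local feature extraction network: works on the image at its original size."""
    def __init__(self, channels=64):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.SELU())
        self.block2 = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.SELU())

    def forward(self, x, global_feat=None):
        h = self.block1(x)
        if global_feat is not None:
            # Fuse the global descriptor into the intermediate local feature map
            # by broadcasting and element-wise addition, as described above.
            h = h + global_feat
        return self.block2(h)
```

For a small input (smaller than 512), only `LocalBranch` would be applied; for a larger input, `LocalBranch(img, GlobalBranch()(img))` fuses the two scales before the feature processing network.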
In this embodiment, the executing entity may input the original image into the local feature extraction network and the global feature extraction network of the pre-trained image enhancement model, respectively, to obtain the local features and global features of the original image. Because the local features and the global features are extracted by two feature extraction networks with different parameters, the local feature extraction does not have to shrink the input image to satisfy the constraints of global feature extraction when the image is large; this improves the accuracy of the local features and the quality of the generated enhanced image.
In some optional implementations of this embodiment, the image enhancement model is obtained by training a GAN. Compared with a plain CNN, a GAN can learn more realistic image characteristics and therefore yields a better enhancement effect. The GAN includes a generator network for processing the input image (the generator may be a convolutional neural network, e.g. any structure containing convolutional layers, pooling layers, unpooling layers and deconvolution layers, and may downsample and then upsample the input), and a discriminator network used to judge whether an input image was produced by the generator network. In practice, the discriminator network may also be a convolutional neural network (e.g. a structure ending in a fully connected layer, where the fully connected layer implements the classification; a minimal discriminator sketch is given below). It should be noted that the image output by the generator network can be represented as an RGB three-channel matrix. The process of obtaining the image enhancement model by training the GAN is described in sub-steps S11 and S12 below:
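For orientation, a minimal PyTorch-style discriminator of the kind described here (convolutional layers followed by a fully connected classification layer) might look as follows; the depth and channel widths are illustrative assumptions rather than the patented design.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Judges whether a 3-channel image was produced by the generator network."""
    def __init__(self, channels=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, 4, stride=2, padding=1), nn.SELU(),
            nn.Conv2d(channels, channels * 2, 4, stride=2, padding=1), nn.SELU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(channels * 2, 1)  # real-vs-generated score (logit)

    def forward(self, img):
        h = self.features(img).flatten(1)
        return self.classifier(h)
```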
in sub-step S11, a sample set is obtained.
Here, the sample set may include a large number of sample images and, for each sample image, a label image obtained by image-enhancing that sample image. In practice, the sample set may be obtained in a variety of ways. For example, an existing sample set may be obtained from another server used to store samples (e.g., a database server) over a wired or wireless connection. As another example, a user may collect samples through a terminal device, so that the executing entity receives the samples collected by the terminal device and stores them locally, thereby generating the sample set. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX (Worldwide Interoperability for Microwave Access) connection, a ZigBee connection, a UWB (ultra wideband) connection, and other wireless connections now known or developed in the future.
Optionally, the sample images in the sample set may be obtained by expanding the images in an existing image set. Specifically, an existing image set may be used as the initial image set, and the initial images in the initial image set may be preprocessed to obtain expanded images. The initial images and the expanded images are then combined to obtain the sample images. The preprocessing may include, but is not limited to, at least one of: flipping, scaling, cropping, translation, rotation and interpolation (a sketch is given below). This alleviates the shortage of training data for supervised learning and improves the generalization of the model.
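A minimal sketch of such an expansion step, assuming torchvision-style transforms on PIL images; the specific operations and parameters below are illustrative, not those fixed by the patent.

```python
import random
from torchvision import transforms
from torchvision.transforms import functional as TF

# Candidate preprocessing operations named in the description:
# flipping, scaling, cropping, translation, rotation, interpolation.
augmentations = [
    lambda im: TF.hflip(im),
    lambda im: TF.resize(im, [int(im.height * 0.8), int(im.width * 0.8)]),
    lambda im: transforms.RandomCrop(min(im.height, im.width) // 2)(im),
    lambda im: TF.affine(im, angle=0, translate=(10, 10), scale=1.0, shear=0),
    lambda im: TF.rotate(im, angle=15),
]

def expand_image_set(initial_images):
    """Return the initial images plus one randomly preprocessed copy of each."""
    expanded = []
    for im in initial_images:
        op = random.choice(augmentations)
        expanded.append(op(im))
    # Combine initial and expanded images to form the sample images.
    return list(initial_images) + expanded
```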
In sub-step S12, the GAN is trained based on the sample set, and the generator network in the trained generative adversarial network is determined as the image enhancement model.
Here, the generator network of the GAN includes the local feature extraction network, the global feature extraction network and the feature processing network to be trained. Based on the sample set, the executing entity may train the GAN with a machine learning method and determine the generator network of the trained GAN as the image enhancement model.
In practice, training may follow the conventional GAN training procedure, in which the generator network and the discriminator network are trained alternately (see the sketch below). Specifically, the generator network may first be fixed while the discriminator network is optimized so that it can accurately distinguish real data from generated data; the discriminator network is then fixed while the generator network is improved so that the discriminator network can no longer tell whether an input image was produced by the generator network. These two steps are iterated until convergence. At that point the images produced by the generator network are close to the label images, and the discriminator network can no longer reliably distinguish real data from generated data (i.e. its accuracy approaches 50%).
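A hedged sketch of this alternating procedure is shown below. `generator` and `discriminator` are the hypothetical modules sketched earlier, the loss here is a plain cross-entropy GAN loss, and the optimizer settings are assumptions; the multi-task loss described later could be substituted for the generator term.

```python
import torch
import torch.nn.functional as F

def train_gan(generator, discriminator, loader, epochs=10, lr=1e-4, device="cuda"):
    """Alternate discriminator and generator updates.
    `loader` yields (sample_image, label_image) pairs."""
    g_opt = torch.optim.Adam(generator.parameters(), lr=lr)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=lr)
    for _ in range(epochs):
        for sample, label in loader:
            sample, label = sample.to(device), label.to(device)

            # 1) Fix the generator, optimise the discriminator so that it can
            #    distinguish real (label) images from generated images.
            with torch.no_grad():
                fake = generator(sample)
            d_real = discriminator(label)
            d_fake = discriminator(fake)
            d_loss = (
                F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
                + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
            )
            d_opt.zero_grad()
            d_loss.backward()
            d_opt.step()

            # 2) Fix the discriminator, improve the generator so that the
            #    discriminator can no longer tell its output from real data.
            out = discriminator(generator(sample))
            g_loss = F.binary_cross_entropy_with_logits(out, torch.ones_like(out))
            g_opt.zero_grad()
            g_loss.backward()
            g_opt.step()
```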
Optionally, the activation function in the GAN may be the scaled exponential linear unit (SELU) activation function. Using SELU improves convergence during model training: the output of each layer stays close to a fixed distribution, which avoids exploding or vanishing gradients.
Optionally, in sub-step S12, the GAN may be trained by collaboratively training the global feature extraction network and the local feature extraction network, as follows.
First, the global feature extraction network is trained based on the sample set.
Here, the following training steps may be performed iteratively until a preset stopping condition is met (e.g. the number of training iterations reaches a preset number): first, the sample images in the sample set are input into the global feature extraction network and the local feature extraction network respectively, and the loss value of the generative adversarial network is determined based on the image produced by the generator network, the input sample image, the label image corresponding to the input sample image, and the discrimination result output by the discriminator network of the generative adversarial network. Then, the parameters of the local feature extraction network are fixed, and the parameters of the global feature extraction network are updated based on the loss value (a sketch of this freezing step is given below). It should be noted that, besides the parameters of the global feature extraction network, the parameters of the feature processing network and of the discriminator network may also be updated based on the loss value.
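A minimal sketch of the parameter-freezing step, assuming PyTorch modules with the hypothetical names `local_net`, `global_net`, `feature_net` and `discriminator`, and a hypothetical `compute_loss` callable that runs the forward pass and returns the loss value:

```python
def global_branch_step(batch, local_net, global_net, feature_net, discriminator,
                       optimizer, compute_loss):
    """One iteration in which the local feature extraction network is fixed and the
    global branch (plus the feature processing and discriminator networks) is updated."""
    # Freeze the local branch before the forward pass so it receives no gradient.
    for p in local_net.parameters():
        p.requires_grad = False
    for net in (global_net, feature_net, discriminator):
        for p in net.parameters():
            p.requires_grad = True

    loss = compute_loss(batch, local_net, global_net, feature_net, discriminator)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The phase that trains the local feature extraction network would use the same pattern with the roles of `local_net` and `global_net` swapped.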
The loss value is the value of a loss function. A loss function is a non-negative real-valued function that represents the difference between a predicted result and the real result; in general, the smaller the loss value, the better the robustness of the model. The loss function may be chosen according to actual requirements.
Optionally, multi-task learning may be employed to train the global feature extraction network. In this case, the loss value can be determined jointly from several loss functions. For example, the image generated by the generator network and the label image corresponding to the input sample image may be input into a preset mean square error (MSE) loss function to obtain a first loss value. In addition, the image generated by the generator network may be input into a trained feature extraction model (e.g. a VGG model), and the feature map output by a target layer of that model (e.g. the Stage 4/5 layer), together with the image generated by the generator network, may be input into a preset perceptual loss function to obtain a second loss value; the perceptual loss function may use the same computation principle as the MSE loss. In addition, the image generated by the generator network and the input sample image may be input into a preset identity loss function to obtain a third loss value; the identity loss function may likewise use the MSE computation principle, and it ensures that the model does not modify the input image so heavily that the picture content changes. In addition, the discrimination result output by the discriminator network of the generative adversarial network may be input into a preset GAN loss function to obtain a fourth loss value; the GAN loss may use the computation principle of cross-entropy loss, hinge loss, or the like. Finally, the loss value of the generative adversarial network can be determined by combining the first, second, third and fourth loss values, e.g. by adding them or taking a weighted sum (see the sketch below). Training with a multi-task combination of MSE loss, perceptual loss, identity loss and GAN loss improves the stability of GAN training.
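A hedged sketch of such a combined loss follows. It assumes a pretrained VGG16 from torchvision as the feature extraction model, equal weights for the four terms, and the common formulation of the perceptual term as an MSE between VGG features of the generated and label images; the patent does not fix the weights or the exact perceptual-loss formulation, so these are assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# A mid-level layer of a pretrained VGG16 serves as the "target layer" feature extractor.
vgg_features = vgg16(weights="IMAGENET1K_V1").features[:23].eval()
for p in vgg_features.parameters():
    p.requires_grad = False

def generator_loss(generated, sample, label, d_out, weights=(1.0, 1.0, 1.0, 1.0)):
    """Combine MSE, perceptual, identity and GAN losses into a weighted sum."""
    w1, w2, w3, w4 = weights
    mse = F.mse_loss(generated, label)                                        # first loss value
    perceptual = F.mse_loss(vgg_features(generated), vgg_features(label))     # second loss value
    identity = F.mse_loss(generated, sample)                                  # third loss value
    gan = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))   # fourth loss value
    return w1 * mse + w2 * perceptual + w3 * identity + w4 * gan
```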
Second, the local feature extraction network is trained based on the sample set.
Here, the following training steps may be performed iteratively: the sample images in the sample set are input into the global feature extraction network and the local feature extraction network respectively, and the loss value of the generative adversarial network is determined based on the image produced by the generator network, the input sample image, the label image corresponding to the input sample image, and the discrimination result output by the discriminator network of the generative adversarial network; the parameters of the global feature extraction network are then fixed, and the parameters of the local feature extraction network are updated based on the loss value. It should be noted that, besides the parameters of the local feature extraction network, the parameters of the feature processing network and of the discriminator network may also be updated based on the loss value. These steps mirror those used to train the global feature extraction network, and the loss value is determined in the same way, so the details are not repeated here.
Third, the generative adversarial network is trained as a whole based on the sample set.
Here, the overall training is the alternating training of the generator network and the discriminator network of the GAN described above and is not repeated here.
Through the collaborative training of the global feature extraction network and the local feature extraction network, the two networks better learn to extract the global and local features of an image, and the features at the two scales remain compatible with each other.
Step 102, inputting the local features and the global features into a feature processing network in the image enhancement model to obtain a first enhanced image corresponding to the original image.
In this embodiment, the executing entity may input the local features and the global features into the feature processing network of the image enhancement model to obtain the first enhanced image corresponding to the original image. In practice, the feature processing network may be any convolutional neural network structure containing convolutional layers, pooling layers, unpooling layers and deconvolution layers, which may downsample and then upsample the features to finally output a new image (a minimal sketch follows). The output new image is the first enhanced image.
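A minimal sketch of such a feature processing network (downsampling followed by upsampling and a final three-channel output); the layer choices here are illustrative assumptions rather than the patented ones.

```python
import torch
import torch.nn as nn

class FeatureProcessingNet(nn.Module):
    """Processes the fused local/global features and outputs an RGB enhanced image."""
    def __init__(self, channels=64):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(channels, channels * 2, 3, stride=2, padding=1), nn.SELU(),
            nn.Conv2d(channels * 2, channels * 2, 3, padding=1), nn.SELU(),
        )
        self.up = nn.Sequential(
            nn.ConvTranspose2d(channels * 2, channels, 4, stride=2, padding=1), nn.SELU(),
            nn.Conv2d(channels, 3, 3, padding=1),  # RGB three-channel output
        )

    def forward(self, features):
        return torch.sigmoid(self.up(self.down(features)))  # enhanced image in [0, 1]
```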
In the method provided by the above embodiment of the application, an image enhancement model comprising a local feature extraction network, a global feature extraction network and a feature processing network is trained in advance, so that the local features and global features of the original image can be extracted by the local feature extraction network and the global feature extraction network, and the extracted local features and global features can then be processed by the feature processing network to obtain the first enhanced image, i.e. the enhanced version of the original image. Because the local features and the global features are extracted by two feature extraction networks with different parameters, the local feature extraction does not have to shrink the input image to satisfy the constraints of global feature extraction when the image is large; this improves the accuracy of the local features and the quality of the generated enhanced image.
With further reference to fig. 3, a flow 300 of yet another embodiment of an image enhancement method is shown. The flow 300 of the image enhancement method comprises the following steps:
step 301, inputting the original image into a local feature extraction network and a global feature extraction network in a pre-trained image enhancement model, respectively, to obtain a local feature and a global feature of the original image.
Step 301 in this embodiment corresponds to step 101 in the embodiment of fig. 1, and is not described herein again.
Step 302, inputting the local features and the global features into a feature processing network in the image enhancement model to obtain a first enhanced image corresponding to the original image.
Step 302 in this embodiment corresponds to step 102 in the embodiment of fig. 1, and is not described herein again.
Step 303, performing filtering processing on the first enhanced image to obtain a filtered image.
In this embodiment, the executing entity may perform filtering, such as mean filtering, on the first enhanced image to obtain a filtered image. Mean filtering is a typical linear filtering algorithm: a template is defined around each target pixel of the image, consisting of the target pixel itself and its 8 surrounding neighbors (a 3x3 filtering template), and the original pixel value is replaced by the average of all pixels in the template. Mean filtering denoises the image.
Step 304, performing a weighted average of the first enhanced image and the filtered image to obtain a second enhanced image.
In this embodiment, the executing entity may take a weighted average of the first enhanced image and the filtered image to obtain a second enhanced image, where the weight coefficient of the filtered image is negative. The filtered image mainly retains the low-frequency information of the image (each pixel value is close to the local average), so summing with a negative coefficient removes this low-frequency component and enhances the high-frequency information of the input image. Filtering the first enhanced image and taking a weighted average of the filtered image and the first enhanced image, as sketched below, further improves the enhancement effect of the first enhanced image output by the image enhancement model.
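A minimal sketch of these two post-processing steps (mean filtering followed by a weighted average in which the filtered image carries a negative coefficient); the 3x3 kernel and the example weights are assumptions, since the patent does not fix them.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def post_enhance(first_enhanced, w_orig=1.5, w_filtered=-0.5):
    """Mean-filter the first enhanced image, then take a weighted average in which
    the filtered (low-frequency) image has a negative weight, boosting high-frequency detail."""
    img = first_enhanced.astype(np.float32)          # H x W x C array
    filtered = uniform_filter(img, size=(3, 3, 1))   # 3x3 mean filter applied per channel
    second_enhanced = w_orig * img + w_filtered * filtered  # weights sum to 1.0
    return np.clip(second_enhanced, 0, 255).astype(np.uint8)
```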
As can be seen from fig. 3, compared with the embodiment shown in fig. 1, the flow 300 of the image enhancement method in this embodiment additionally involves filtering the first enhanced image and taking a weighted average of the filtered image and the first enhanced image. Because an additional image enhancement algorithm is introduced on top of the image enhancement model, the image enhancement effect is further improved.
With further reference to fig. 4, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an image enhancement apparatus, which corresponds to the embodiment of the method shown in fig. 1, and which is particularly applicable to various electronic devices.
As shown in fig. 4, the image enhancement apparatus 400 of the present embodiment includes: a feature extraction unit 401, configured to input an original image into a local feature extraction network and a global feature extraction network in a pre-trained image enhancement model, respectively, to obtain a local feature and a global feature of the original image, respectively; a first image enhancement unit 402, configured to input the local features and the global features into a feature processing network in the image enhancement model, so as to obtain a first enhanced image corresponding to the original image.
In some optional implementations of this embodiment, the apparatus further includes a second image enhancement unit configured to: filtering the first enhanced image to obtain a filtered image; and carrying out weighted average on the first enhanced image and the filtered image to obtain a second enhanced image.
In some optional implementations of this embodiment, the image enhancement model is obtained through the following training steps: obtaining a sample set, wherein the sample set comprises sample images and label images obtained by enhancing the sample images; and training a generative adversarial network based on the sample set, and determining the generator network in the trained generative adversarial network as the image enhancement model, wherein the generator network comprises the local feature extraction network, the global feature extraction network and the feature processing network.
In some optional implementations of this embodiment, the sample images in the sample set are obtained by: preprocessing the initial images in an initial image set to obtain expanded images, wherein the preprocessing comprises at least one of: flipping, scaling, cropping, translation, rotation and interpolation; and combining the initial images and the expanded images to obtain the sample images.
In some optional implementations of this embodiment, training the generative adversarial network based on the sample set includes: training the global feature extraction network based on the sample set; training the local feature extraction network based on the sample set; and training the generative adversarial network as a whole based on the sample set.
In some optional implementations of this embodiment, training the global feature extraction network based on the sample set includes iteratively performing the following training steps: inputting the sample images in the sample set into the global feature extraction network and the local feature extraction network, and determining a loss value of the generative adversarial network based on an image generated by the generator network, the input sample image, the label image corresponding to the input sample image, and the discrimination result output by the discriminator network in the generative adversarial network; and fixing the parameters of the local feature extraction network, and updating the parameters of the global feature extraction network based on the loss value.
In some optional implementations of this embodiment, training the local feature extraction network based on the sample set includes iteratively performing the following training steps: inputting the sample images in the sample set into the global feature extraction network and the local feature extraction network, and determining a loss value of the generative adversarial network based on an image generated by the generator network, the input sample image, the label image corresponding to the input sample image, and the discrimination result output by the discriminator network in the generative adversarial network; and fixing the parameters of the global feature extraction network, and updating the parameters of the local feature extraction network based on the loss value.
In some optional implementations of this embodiment, determining the loss value of the generative adversarial network based on the image generated by the generator network, the input sample image, the label image corresponding to the input sample image, and the discrimination result output by the discriminator network in the generative adversarial network includes: inputting the image generated by the generator network and the label image corresponding to the input sample image into a preset mean square error loss function to obtain a first loss value; inputting the image generated by the generator network into a trained feature extraction model, and inputting the feature map output by a target layer of the feature extraction model and the image generated by the generator network into a preset perceptual loss function to obtain a second loss value; inputting the image generated by the generator network and the input sample image into a preset identity loss function to obtain a third loss value; inputting the discrimination result output by the discriminator network in the generative adversarial network into a preset generative adversarial network loss function to obtain a fourth loss value; and determining the loss value of the generative adversarial network based on the first loss value, the second loss value, the third loss value and the fourth loss value.
In some optional implementations of this embodiment, the activation function in the generative adversarial network is a scaled exponential linear unit activation function.
According to the apparatus provided by the embodiment of the application, an image enhancement model comprising a local feature extraction network, a global feature extraction network and a feature processing network is trained in advance, so that the local features and global features of the original image can be extracted by the local feature extraction network and the global feature extraction network, and the extracted local features and global features can then be processed by the feature processing network to obtain the first enhanced image, i.e. the enhanced version of the original image. Because the local features and the global features are extracted by two feature extraction networks with different parameters, the local feature extraction does not have to shrink the input image to satisfy the constraints of global feature extraction when the image is large; this improves the accuracy of the local features and the quality of the generated enhanced image.
Fig. 5 is a block diagram illustrating an apparatus 500 for image enhancement according to an exemplary embodiment, where the apparatus 500 may be an intelligent terminal or a server. For example, the apparatus 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, the apparatus 500 may include one or more of the following components: processing component 502, memory 504, power component 506, multimedia component 508, audio component 510, input/output (I/O) interface 512, sensor component 514, and communication component 516.
The processing component 502 generally controls overall operation of the device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing elements 502 may include one or more processors 520 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operations at the apparatus 500. Examples of such data include instructions for any application or method operating on device 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 506 provides power to the various components of the device 500. The power components 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 500.
The multimedia component 508 includes a screen that provides an output interface between the device 500 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of the touch or slide action but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 500 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, audio component 510 includes a Microphone (MIC) configured to receive external audio signals when apparatus 500 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, audio component 510 further includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 514 includes one or more sensors for providing various aspects of status assessment for the apparatus 500. For example, the sensor assembly 514 may detect the open/closed state of the apparatus 500 and the relative positioning of components such as its display and keypad; it may also detect a change in position of the apparatus 500 or of a component of the apparatus 500, the presence or absence of user contact with the apparatus 500, the orientation or acceleration/deceleration of the apparatus 500, and a change in temperature of the apparatus 500. The sensor assembly 514 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate communication between the apparatus 500 and other devices in a wired or wireless manner. The apparatus 500 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the aforementioned communication component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 504 comprising instructions, executable by the processor 520 of the apparatus 500 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 6 is a schematic diagram of a server in some embodiments of the present application. The server 600 may vary significantly due to configuration or performance, and may include one or more Central Processing Units (CPUs) 622 (e.g., one or more processors) and memory 632, one or more storage media 630 (e.g., one or more mass storage devices) storing applications 642 or data 644. Memory 632 and storage medium 630 may be, among other things, transient or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 622 may be configured to communicate with the storage medium 630 and execute a series of instruction operations in the storage medium 630 on the server 600.
The server 600 may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input-output interfaces 658, one or more keyboards 656, and/or one or more operating systems 641, such as Windows Server, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
A non-transitory computer readable storage medium having instructions therein which, when executed by a processor of an apparatus (smart terminal or server), enable the apparatus to perform a method of image enhancement, the method comprising: respectively inputting an original image into a local feature extraction network and a global feature extraction network in a pre-trained image enhancement model to respectively obtain local features and global features of the original image; and inputting the local features and the global features into a feature processing network in the image enhancement model to obtain a first enhanced image corresponding to the original image.
Optionally, the device is configured to execute the one or more programs by the one or more processors, the one or more programs including instructions for: filtering the first enhanced image to obtain a filtered image; and performing a weighted average of the first enhanced image and the filtered image to obtain a second enhanced image.
Optionally, the image enhancement model is obtained through the following training steps: obtaining a sample set, wherein the sample set comprises sample images and label images obtained by enhancing the sample images; and training a generative adversarial network based on the sample set, and determining the generator network in the trained generative adversarial network as the image enhancement model, wherein the generator network comprises the local feature extraction network, the global feature extraction network and the feature processing network.
Optionally, the sample images in the sample set are obtained by: preprocessing the initial images in an initial image set to obtain expanded images, wherein the preprocessing comprises at least one of: flipping, scaling, cropping, translation, rotation and interpolation; and combining the initial images and the expanded images to obtain the sample images.
Optionally, training the generative adversarial network based on the sample set includes: training the global feature extraction network based on the sample set; training the local feature extraction network based on the sample set; and training the generative adversarial network as a whole based on the sample set.
Optionally, training the global feature extraction network based on the sample set includes iteratively performing the following training steps: respectively inputting the sample images in the sample set into the global feature extraction network and the local feature extraction network, and determining a loss value of the generative adversarial network based on an image generated by the generator network, the input sample image, the label image corresponding to the input sample image, and the discrimination result output by the discriminator network in the generative adversarial network; and fixing the parameters of the local feature extraction network, and updating the parameters of the global feature extraction network based on the loss value.
Optionally, training the local feature extraction network based on the sample set includes iteratively performing the following training steps: respectively inputting the sample images in the sample set into the global feature extraction network and the local feature extraction network, and determining a loss value of the generative adversarial network based on an image generated by the generator network, the input sample image, the label image corresponding to the input sample image, and the discrimination result output by the discriminator network in the generative adversarial network; and fixing the parameters of the global feature extraction network, and updating the parameters of the local feature extraction network based on the loss value.
Optionally, determining the loss value of the generative adversarial network based on the image generated by the generator network, the input sample image, the label image corresponding to the input sample image, and the discrimination result output by the discriminator network in the generative adversarial network includes: inputting the image generated by the generator network and the label image corresponding to the input sample image into a preset mean square error loss function to obtain a first loss value; inputting the image generated by the generator network into a trained feature extraction model, and inputting the feature map output by a target layer of the feature extraction model and the image generated by the generator network into a preset perceptual loss function to obtain a second loss value; inputting the image generated by the generator network and the input sample image into a preset identity loss function to obtain a third loss value; inputting the discrimination result output by the discriminator network in the generative adversarial network into a preset generative adversarial network loss function to obtain a fourth loss value; and determining the loss value of the generative adversarial network based on the first loss value, the second loss value, the third loss value and the fourth loss value.
Optionally, the activation function in the generative adversarial network is a scaled exponential linear unit activation function.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
The foregoing detailed description has provided a method and apparatus for image enhancement and an apparatus for image enhancement, and the principles and embodiments of the present application have been described herein using specific examples, which are provided only to help understand the method and core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (15)

1. A method of image enhancement, the method comprising:
respectively inputting an original image into a local feature extraction network and a global feature extraction network in a pre-trained image enhancement model to respectively obtain local features and global features of the original image;
and inputting the local features and the global features into a feature processing network in the image enhancement model to obtain a first enhanced image corresponding to the original image.
2. The method of claim 1, wherein after obtaining the first enhanced image corresponding to the original image, the method further comprises:
filtering the first enhanced image to obtain a filtered image;
and carrying out weighted average on the first enhanced image and the filtered image to obtain a second enhanced image.
3. The method of claim 1, wherein the image enhancement model is trained by:
obtaining a sample set, wherein the sample set comprises a sample image and a label image which is obtained by enhancing the sample image;
training a generative adversarial network based on the sample set, and determining the generator network in the trained generative adversarial network as the image enhancement model, wherein the generator network comprises the local feature extraction network, the global feature extraction network and the feature processing network.
4. The method of claim 3, wherein the obtaining of the sample image in the sample set is performed by:
preprocessing initial images in an initial image set to obtain expanded images, wherein the preprocessing comprises at least one of: flipping, scaling, cropping, translation, rotation and interpolation;
and combining the initial images and the expanded images to obtain the sample images.
5. The method of claim 3, wherein training the generative adversarial network based on the sample set comprises:
training the global feature extraction network based on the sample set;
training the local feature extraction network based on the sample set;
and training the generative adversarial network as a whole based on the sample set.
6. The method of claim 5, wherein training the global feature extraction network based on the sample set comprises:
the following training steps are performed iteratively:
respectively inputting the sample images in the sample set into the global feature extraction network and the local feature extraction network, and determining a loss value of the generative adversarial network based on an image generated by the generator network, the input sample image, a label image corresponding to the input sample image, and a discrimination result output by a discriminator network in the generative adversarial network;
and fixing the parameters of the local feature extraction network, and updating the parameters of the global feature extraction network based on the loss value.
7. The method of claim 5, wherein training the local feature extraction network based on the sample set comprises:
the following training steps are performed iteratively:
respectively inputting the sample images in the sample set into the global feature extraction network and the local feature extraction network, and determining a loss value of the generative adversarial network based on an image generated by the generator network, the input sample image, a label image corresponding to the input sample image, and a discrimination result output by a discriminator network in the generative adversarial network;
and fixing the parameters of the global feature extraction network, and updating the parameters of the local feature extraction network based on the loss value.
8. The method according to claim 6 or 7, wherein determining the loss value of the generative adversarial network based on the image generated by the generator network, the input sample image, the label image corresponding to the input sample image, and the discrimination result output by the discriminator network in the generative adversarial network comprises:
inputting the image generated by the generator network and the label image corresponding to the input sample image into a preset mean square error loss function to obtain a first loss value;
inputting the image generated by the generator network into a trained feature extraction model, and inputting a feature map output by a target layer of the feature extraction model and the image generated by the generator network into a preset perceptual loss function to obtain a second loss value;
inputting the image generated by the generator network and the input sample image into a preset identity loss function to obtain a third loss value;
inputting the discrimination result output by the discriminator network in the generative adversarial network into a preset generative adversarial network loss function to obtain a fourth loss value;
and determining the loss value of the generative adversarial network based on the first loss value, the second loss value, the third loss value and the fourth loss value.
9. The method of claim 3, wherein the activation function in the generative adversarial network is a scaled exponential linear unit activation function.
10. An image enhancement apparatus, characterized in that the apparatus comprises:
the image enhancement model comprises a feature extraction unit, a feature extraction unit and a feature extraction unit, wherein the feature extraction unit is configured to input an original image into a local feature extraction network and a global feature extraction network in a pre-trained image enhancement model respectively to obtain local features and global features of the original image respectively;
and the first image enhancement unit is configured to input the local features and the global features into a feature processing network in the image enhancement model to obtain a first enhanced image corresponding to the original image.
11. The apparatus according to claim 10, characterized in that the apparatus further comprises a second image enhancement unit configured to:
filtering the first enhanced image to obtain a filtered image;
and carrying out weighted average on the first enhanced image and the filtered image to obtain a second enhanced image.
12. The apparatus of claim 10, wherein the image enhancement model is trained by:
obtaining a sample set, wherein the sample set comprises a sample image and a label image which is obtained by enhancing the sample image;
training a generative adversarial network based on the sample set, and determining the generator network in the trained generative adversarial network as the image enhancement model, wherein the generator network comprises the local feature extraction network, the global feature extraction network and the feature processing network.
13. The apparatus of claim 12, wherein the sample images in the sample set are obtained by:
preprocessing initial images in an initial image set to obtain expanded images, wherein the preprocessing comprises at least one of: flipping, scaling, cropping, translation, rotation and interpolation;
and combining the initial images and the expanded images to obtain the sample images.
14. An apparatus for image enhancement, comprising a memory, one or more processors, and one or more programs, wherein the one or more programs are stored in the memory and, when executed by the one or more processors, perform the steps of the method of any one of claims 1-9.
15. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-9.
CN202111166910.3A 2021-09-30 2021-09-30 Image enhancement method and device for image enhancement Pending CN113888432A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111166910.3A CN113888432A (en) 2021-09-30 2021-09-30 Image enhancement method and device for image enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111166910.3A CN113888432A (en) 2021-09-30 2021-09-30 Image enhancement method and device for image enhancement

Publications (1)

Publication Number Publication Date
CN113888432A true CN113888432A (en) 2022-01-04

Family

ID=79005191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111166910.3A Pending CN113888432A (en) 2021-09-30 2021-09-30 Image enhancement method and device for image enhancement

Country Status (1)

Country Link
CN (1) CN113888432A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115018737A (en) * 2022-08-04 2022-09-06 四川迪晟新达类脑智能技术有限公司 Infrared thermal image enhancement method and device
CN115018737B (en) * 2022-08-04 2023-02-21 四川迪晟新达类脑智能技术有限公司 Infrared thermal image enhancement method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination