CN113518243A

CN113518243A - Image processing method and device

Info

Publication number: CN113518243A
Application number: CN202010278793.9A
Authority: CN
Inventors: 潘澄; 刘健威
Original assignee: TCL Technology Group Co Ltd
Current assignee: TCL Technology Group Co Ltd
Priority date: 2020-04-10
Filing date: 2020-04-10
Publication date: 2021-10-19

Abstract

The application belongs to the technical field of image processing and provides an image processing method comprising the following steps: acquiring one frame of target image from multiple frames of input images; performing sub-pixel alignment processing on the multiple frames of input images according to a preset sub-pixel alignment algorithm to obtain one frame of sub-pixel aligned image; and supplementing the sub-pixel information in the sub-pixel aligned image into the target image to obtain one frame of sub-pixel-restored image. Because the sub-pixel information of the single sub-pixel aligned image, obtained by sub-pixel alignment of the multi-frame input images, is supplemented into the target image, the method and device, compared with existing schemes, relax the conditions imposed on the input images, simplify the processing steps, are less sensitive to differences between the input frames, retain more of the original image information, and improve sub-pixel-level super-resolution.

Description

Image processing method and device

Technical Field

The present application relates to the field of image processing technologies, and in particular, to an image processing method and apparatus.

Background

A pixel is the smallest unit of an image and determines the physical resolution of the image. Points lying between two adjacent physical pixels are called sub-pixels. In image processing, the image resolution generally needs to be increased to obtain a high-definition image, and filling in sub-pixels between the pixels is important when doing so.

Sub-pixel filling can retain the original information of an image to a high degree, restore pixels that are close to the real scene, and improve image quality, thereby enhancing the resolution of the image at the sub-pixel level. Sub-pixel-level resolution enhancement is currently one of the most important and widely applied topics in image research.

In the prior art, a conventional method restores sub-pixels while aligning multiple frames of images, and it generally requires three conditions: 1. multiple input frames are required; 2. the input frames must contain image aliasing; 3. the input aliased frames must be sampled at different sub-pixel positions. These conditions limit how much sub-pixel-level super-resolution can be improved. If the difference between frames is too large, for example when a moving object is photographed, aligning the sub-pixels of multiple frames can blur the image and reduce image quality. Moreover, when a photo is taken, whether the last two conditions are satisfied is highly uncertain, so the actual improvement in resolution is poor.

Disclosure of Invention

The embodiments of the present application provide an image processing method and device, which can solve the problem that sub-pixel-level super-resolution in the prior art gives poor results.

In a first aspect, an embodiment of the present application provides an image processing method, where the method includes:

acquiring a frame of target image from a plurality of frames of input images;

performing sub-pixel alignment processing on the multiple frames of input images according to a preset sub-pixel alignment algorithm to obtain a frame of sub-pixel aligned image;

and supplementing the sub-pixel information in the sub-pixel aligned image into the target image for restoration to obtain one frame of sub-pixel-restored image.

In a possible implementation manner of the first aspect, the step of obtaining a frame of target image from among multiple frames of input images is:

acquiring the image with the highest definition among the multiple frames of input images as the target image.

In a possible implementation manner of the first aspect, the step of obtaining the image with the highest definition from the multiple frames of input images is:

acquiring the image with the largest image size among the multiple frames of input images as the image with the highest definition.
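The text does not say how "image size" is measured. A minimal sketch, assuming it means the encoded (compressed) byte size of each frame, on the reasoning that a sharper frame keeps more high-frequency detail and compresses less, is shown below. As a swapped-in alternative not taken from the source, the sketch also ranks raw frames by the variance of a 4-neighbour Laplacian, a common sharpness measure. All function names are hypothetical.

```python
from typing import List, Sequence

Image = List[List[float]]


def pick_target_by_encoded_size(encoded_frames: Sequence[bytes]) -> int:
    """Index of the frame with the largest encoded size (ties -> earliest frame)."""
    if not encoded_frames:
        raise ValueError("at least one input frame is required")
    # max() returns the first maximal index, so ties resolve to the earliest frame
    return max(range(len(encoded_frames)), key=lambda i: len(encoded_frames[i]))


def laplacian_variance(img: Image) -> float:
    """Variance of a 4-neighbour Laplacian over interior pixels (higher = sharper)."""
    responses = [
        img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1] - 4 * img[r][c]
        for r in range(1, len(img) - 1)
        for c in range(1, len(img[0]) - 1)
    ]
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((v - mean) ** 2 for v in responses) / len(responses)


def pick_target_by_sharpness(frames: Sequence[Image]) -> int:
    """Alternative (assumed) definition criterion: largest Laplacian variance."""
    if not frames:
        raise ValueError("at least one input frame is required")
    return max(range(len(frames)), key=lambda i: laplacian_variance(frames[i]))
```

For example, `pick_target_by_encoded_size([b"a" * 10, b"b" * 25, b"c" * 7])` selects index 1. The byte-size proxy is cheap because it needs no decoding, but it is also affected by noise, which increases encoded size without adding real detail.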

In a possible implementation manner of the first aspect, the step of supplementing the sub-pixel information in the sub-pixel aligned image into the target image for restoration to obtain a frame of image with restored sub-pixels includes:

supplementing the sub-pixel information in the sub-pixel aligned image into the target image for restoration to obtain one frame of sub-pixel-restored super-resolution image.

In a possible implementation manner of the first aspect, the supplementing sub-pixel information in the sub-pixel aligned image into the target image includes:

inputting the target image into a preset neural network model, and training the preset neural network model with the sub-pixel aligned image as the training target.

In a possible implementation manner of the first aspect, the training the preset neural network model includes:

calculating a loss function of the preset neural network model;

and adjusting parameters of the preset neural network model according to the loss function, and training the preset neural network model again until the loss function reaches a preset value.
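The compute-loss, adjust-parameters, retrain cycle above can be sketched as follows. This is a toy stand-in, not the preset neural network model: the "model" is a hypothetical per-pixel additive correction applied to the target image, the loss is a plain mean squared error against the sub-pixel aligned image, and parameters are adjusted by gradient steps until the loss reaches the preset value.

```python
from typing import List, Tuple

Image = List[List[float]]


def train_until_threshold(
    target: Image,
    aligned: Image,
    lr: float = 0.5,
    preset_loss: float = 1e-4,
    max_iters: int = 1000,
) -> Tuple[Image, float, int]:
    """Return (trained output, final loss, number of parameter updates)."""
    h, w = len(target), len(target[0])
    # Hypothetical per-pixel parameters standing in for network weights.
    bias = [[0.0] * w for _ in range(h)]

    def forward() -> Image:
        return [[target[r][c] + bias[r][c] for c in range(w)] for r in range(h)]

    def mse(out: Image) -> float:
        return sum((out[r][c] - aligned[r][c]) ** 2 for r in range(h) for c in range(w)) / (h * w)

    for step in range(max_iters):
        out = forward()
        loss = mse(out)
        if loss <= preset_loss:  # loss has reached the preset value: stop training
            return out, loss, step
        # Gradient step on 0.5 * sum of squared errors; its gradient is (out - aligned).
        for r in range(h):
            for c in range(w):
                bias[r][c] -= lr * (out[r][c] - aligned[r][c])
    out = forward()
    return out, mse(out), max_iters
```

With `lr=0.5` each residual halves per update, so the toy converges geometrically; a real network would replace `forward` and the update rule while keeping the same stop-at-preset-loss structure.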

In a possible implementation manner of the first aspect, the step of calculating the loss function of the preset neural network model includes:

acquiring the spatial difference and the structural difference of the trained target image and the sub-pixel alignment image;

and calculating a loss function of the preset neural network model according to the spatial difference and the structural difference.
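The text does not fix how the spatial and structural differences are combined. A minimal sketch, assuming the spatial difference is the mean absolute pixel difference, the structural difference is 1 − SSIM computed over the whole image as a single block, and the two are mixed by a hypothetical weight `alpha`, is shown below. The SSIM constants use the usual defaults for intensities in [0, 1]; none of these choices are confirmed by the source.

```python
from typing import List, Tuple

Image = List[List[float]]


def _paired(x: Image, y: Image) -> Tuple[List[float], List[float]]:
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    if len(xs) != len(ys) or not xs:
        raise ValueError("images must be non-empty and the same size")
    return xs, ys


def spatial_difference(x: Image, y: Image) -> float:
    """Mean absolute difference (1/N) * sum_p |x(p) - y(p)|."""
    xs, ys = _paired(x, y)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


def global_ssim(x: Image, y: Image, c1: float = 1e-4, c2: float = 9e-4) -> float:
    """SSIM over the whole image as one block; c1=(0.01*L)^2, c2=(0.03*L)^2 with L=1."""
    xs, ys = _paired(x, y)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((a - mx) * (a - mx) for a in xs) / n
    vy = sum((b - my) * (b - my) for b in ys) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


def combined_loss(x: Image, y: Image, alpha: float = 0.5) -> float:
    """Hypothetical S = alpha * (1 - SSIM) + (1 - alpha) * spatial difference."""
    return alpha * (1.0 - global_ssim(x, y)) + (1.0 - alpha) * spatial_difference(x, y)
```

Identical images give a loss of 0; in practice SSIM is usually evaluated over sliding local blocks and averaged, which this single-block sketch omits for brevity.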

In a possible implementation manner of the first aspect, the calculation formula of the loss function S is:

where

SSIM(p) denotes the structural difference term between the trained target image and the sub-pixel aligned image, and

p denotes an image pixel, P denotes the set of image pixels, x(p) denotes the trained target image, y(p) denotes the sub-pixel aligned image, N denotes the total number of pixels, μ_x denotes the mean of an image block of the trained target image, μ_y denotes the mean of an image block of the sub-pixel aligned image, σ_x² denotes the variance of an image block of the trained target image, σ_y² denotes the variance of an image block of the sub-pixel aligned image, σ_xy denotes the covariance of the two image blocks, and C is a constant.
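The formula for S itself does not survive in this text. For reference only, the standard structural similarity (SSIM) index built from the symbols defined above is given below, together with one plausible spatial-difference term over the pixel set P; both are assumptions, not a reproduction of the patent's formula. Note that the common SSIM definition uses two stabilizing constants C_1 and C_2, whereas the text above names a single constant C, and a structural difference is commonly taken as 1 − SSIM.

```latex
\mathrm{SSIM}(p) = \frac{(2\mu_x\mu_y + C_1)\,(2\sigma_{xy} + C_2)}
                        {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)},
\qquad
D_{\text{spatial}} = \frac{1}{N}\sum_{p \in P} \bigl| x(p) - y(p) \bigr|
```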

In a possible implementation manner of the first aspect, after the obtaining a target image of one frame from among a plurality of frames of input images, the method further includes:

enlarging the target image by a preset factor.
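The text does not name an interpolation method for enlarging the target image. A minimal sketch, assuming bilinear interpolation with an integer preset factor and the simple mapping output index i to input coordinate i / factor (clamped at the last row and column), is shown below; the function name is hypothetical.

```python
from typing import List

Image = List[List[float]]


def upscale_bilinear(img: Image, factor: int) -> Image:
    """Enlarge img by an integer factor using bilinear interpolation."""
    if not isinstance(factor, int) or factor < 1:
        raise ValueError("factor must be a positive integer")
    h, w = len(img), len(img[0])
    out: Image = []
    for r in range(h * factor):
        y = min(r / factor, h - 1)          # source row coordinate, clamped
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0                          # vertical interpolation weight
        row = []
        for c in range(w * factor):
            x = min(c / factor, w - 1)      # source column coordinate, clamped
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0                      # horizontal interpolation weight
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

For instance, enlarging `[[0.0, 2.0], [4.0, 6.0]]` by 2 gives a first row of `[0.0, 1.0, 2.0, 2.0]`. Enlarging first puts the target image on the same grid as a sub-pixel aligned image of higher resolution, so the two can be compared position by position.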

In a possible implementation manner of the first aspect, the performing sub-pixel alignment processing on the multiple frames of input images to obtain a frame of sub-pixel aligned image includes:

selecting a frame of image with the highest definition from the multi-frame input images as a reference frame;

acquiring the positional relationship of objects between the reference frame and each of the other frames of the multi-frame input images;

converting the coordinate system of each other frame to be consistent with the coordinate system of the reference frame according to the positional relationship of the objects;

and performing sub-pixel alignment fusion of the other frames with the reference frame, using the reference frame as the reference image, to obtain the sub-pixel aligned image.
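The coordinate conversion and fusion steps above can be sketched as a shift-and-add fusion. This is a simplified sketch under stated assumptions: the positional relationship of each frame is reduced to a hypothetical pure translation `(dy, dx)` that maps its pixel `(r, c)` to reference coordinates `(r + dy, c + dx)`, samples are accumulated onto a grid `factor` times finer than the reference, and grid positions that receive no sample fall back to the nearest reference pixel.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def shift_and_add(frames: Sequence[Image], offsets: Sequence[Tuple[float, float]], factor: int) -> Image:
    """frames[0] is the reference frame and offsets[0] should be (0, 0)."""
    if len(frames) != len(offsets) or not frames:
        raise ValueError("need one offset per frame")
    h, w = len(frames[0]), len(frames[0][0])
    H, W = h * factor, w * factor
    acc = [[0.0] * W for _ in range(H)]
    cnt = [[0] * W for _ in range(H)]
    for img, (dy, dx) in zip(frames, offsets):
        for r in range(h):
            for c in range(w):
                # Convert into reference coordinates, then onto the finer grid.
                R = round((r + dy) * factor)
                C = round((c + dx) * factor)
                if 0 <= R < H and 0 <= C < W:
                    acc[R][C] += img[r][c]
                    cnt[R][C] += 1
    out = [[0.0] * W for _ in range(H)]
    for R in range(H):
        for C in range(W):
            if cnt[R][C]:
                out[R][C] = acc[R][C] / cnt[R][C]  # average all samples landing here
            else:
                # No sample at this sub-pixel position: use the nearest reference pixel.
                out[R][C] = frames[0][min(R // factor, h - 1)][min(C // factor, w - 1)]
    return out
```

A frame offset by half a pixel horizontally fills exactly the odd columns of a 2x grid that the reference leaves empty, which is how multiple frames contribute sub-pixel information to the aligned image.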

In a possible implementation manner of the first aspect, before the obtaining a target image of one frame from among multiple input images, the method further includes:

acquiring a preset number of frames of captured images;

and cropping each frame of captured image to a preset crop size, and using the cropped images as the input images.
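A minimal sketch of this preparation step is shown below. The text specifies neither where the crop window sits nor what happens when fewer frames are captured; this sketch assumes a centred crop window and rejects too few frames. `center_crop` and `prepare_inputs` are hypothetical names.

```python
from typing import List, Sequence

Image = List[List[float]]


def center_crop(img: Image, crop_h: int, crop_w: int) -> Image:
    """Cut a crop_h x crop_w window from the centre of img."""
    h, w = len(img), len(img[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop size exceeds image size")
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


def prepare_inputs(shots: Sequence[Image], preset_frames: int, crop_h: int, crop_w: int) -> List[Image]:
    """Take the preset number of captured frames and crop each to the preset size."""
    if len(shots) < preset_frames:
        raise ValueError("fewer captured frames than the preset frame number")
    return [center_crop(s, crop_h, crop_w) for s in shots[:preset_frames]]
```

Cropping every frame with the same window keeps the frames' coordinate systems consistent with each other, which simplifies the later alignment.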

In a second aspect, an embodiment of the present application provides an image processing apparatus, including:

the acquisition module is used for acquiring a frame of target image from a plurality of frames of input images;

the sub-pixel alignment module is used for carrying out sub-pixel alignment processing on the multiple frames of input images according to a preset sub-pixel alignment algorithm to obtain a frame of sub-pixel alignment image;

and the image restoration module is used for supplementing the sub-pixel information in the sub-pixel aligned image into the target image for restoration to obtain one frame of sub-pixel-restored image.
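The three modules of the apparatus can be sketched as a small composition in which each module is an injected callable. This is illustrative only: the class and field names are hypothetical, and any concrete acquisition, alignment, or restoration routine (for example the sketches elsewhere in this description) could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = List[List[float]]


@dataclass
class ImageProcessingApparatus:
    acquire: Callable[[Sequence[Image]], Image]   # acquisition module
    align: Callable[[Sequence[Image]], Image]     # sub-pixel alignment module
    restore: Callable[[Image, Image], Image]      # image restoration module

    def process(self, frames: Sequence[Image]) -> Image:
        target = self.acquire(frames)             # one frame of target image
        aligned = self.align(frames)              # one frame of sub-pixel aligned image
        return self.restore(target, aligned)      # sub-pixel-restored image
```

Keeping the modules as separate callables mirrors the module split of the second aspect and lets each one be replaced independently.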

In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the image processing method according to any one of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the image processing method according to any one of the first aspect.

In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the image processing method according to any one of the first aspect.

For the beneficial effects of the second to fifth aspects, reference may be made to the related description of the first aspect; details are not repeated here.

Compared with the prior art, the embodiments of the present application have the following advantages: compared with existing schemes that restore sub-pixels while aligning multiple frames of images, the method relaxes the conditions imposed on the input images, simplifies the steps, is less sensitive to differences between the input frames, retains more of the original image information, and improves sub-pixel-level super-resolution.

Drawings

To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art may derive other drawings from them without creative effort.

Fig. 1 is a schematic structural diagram of a mobile phone to which an image processing method according to an embodiment of the present application is applied;

Fig. 2 is a schematic diagram of a software architecture suitable for an image processing method according to an embodiment of the present application;

Fig. 3 is a flowchart illustrating an image processing method according to an embodiment of the present application;

Fig. 4 is a flowchart illustrating an image processing method according to another embodiment of the present application;

Fig. 5 is an exemplary diagram of an image processing apparatus according to an embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".

Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.

Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.

The image processing method provided by the embodiment of the application can be applied to terminal devices such as a mobile phone, a tablet personal computer, a wearable device, a vehicle-mounted device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), and the like, and the embodiment of the application does not limit the specific type of the terminal device at all.

For example, the terminal device may be a station (ST) in a WLAN, such as a cellular phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA) device, a handheld device with wireless communication capability, a computing device or other processing device connected to a wireless modem, a vehicle-mounted device, an Internet-of-Vehicles terminal, a computer, a laptop, a handheld communication device, a handheld computing device, a satellite wireless device, a wireless modem card, a television set-top box (STB), customer premises equipment (CPE), and/or another device for communicating over a wireless system or a next-generation communication system, for example a mobile terminal in a 5G network or a mobile terminal in a future evolved Public Land Mobile Network (PLMN).

By way of example and not limitation, when the terminal device is a wearable device, the wearable device may be a general term for everyday wearable items, such as glasses, watches, and bracelets, that are intelligently designed and developed using wearable technology. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothing or accessories. A wearable device is not only a hardware device; it also provides powerful functions through software support, data interaction, and cloud interaction. In a broad sense, wearable smart devices include full-featured, larger devices that can implement complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on a single application function and must be used together with another device such as a smartphone, such as various smart bracelets and smart jewelry for monitoring physical signs.

Take the terminal device as a mobile phone as an example. Fig. 1 is a block diagram illustrating a partial structure of a mobile phone according to an embodiment of the present disclosure. Referring to fig. 1, the cellular phone includes: a Radio Frequency (RF) circuit 110, a memory 120, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a wireless fidelity (WiFi) module 170, a processor 180, and a power supply 190. Those skilled in the art will appreciate that the handset configuration shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

The following describes each component of the mobile phone in detail with reference to fig. 1:

the RF circuit 110 may be used to receive and transmit signals during information transmission and reception or during a call; in particular, after receiving downlink information from a base station, it passes the information to the processor 180 for processing, and it also transmits uplink data to the base station. Typically, the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 110 may communicate with networks and other devices via wireless communication, which may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, and Short Messaging Service (SMS).

The memory 120 may be used to store software programs and modules, and the processor 180 executes the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created through use of the mobile phone (such as audio data or a phonebook). Further, the memory 120 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The input unit 130 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile phone 100. Specifically, the input unit 130 may include a touch panel 131 and other input devices 132. The touch panel 131, also referred to as a touch screen, may collect touch operations performed by a user on or near it (for example, operations performed on or near the touch panel 131 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected apparatus according to a preset program. Optionally, the touch panel 131 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection apparatus, converts it into touch point coordinates, sends the coordinates to the processor 180, and can receive and execute commands sent by the processor 180. In addition, the touch panel 131 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 131, the input unit 130 may include other input devices 132, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and power keys), a trackball, a mouse, a joystick, and the like.

The display unit 140 may be used to display information input by a user or information provided to the user and various menus of the mobile phone. The Display unit 140 may include a Display panel 141, and optionally, the Display panel 141 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch panel 131 can cover the display panel 141, and when the touch panel 131 detects a touch operation on or near the touch panel 131, the touch operation is transmitted to the processor 180 to determine the type of the touch event, and then the processor 180 provides a corresponding visual output on the display panel 141 according to the type of the touch event. Although the touch panel 131 and the display panel 141 are shown as two separate components in fig. 1 to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 131 and the display panel 141 may be integrated to implement the input and output functions of the mobile phone.

The handset 100 may also include at least one sensor 150, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 141 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 141 and/or the backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.

The audio circuit 160, a speaker 161, and a microphone 162 may provide an audio interface between the user and the mobile phone. The audio circuit 160 may convert received audio data into an electrical signal and transmit it to the speaker 161, which converts it into a sound signal for output; conversely, the microphone 162 converts a collected sound signal into an electrical signal, which the audio circuit 160 receives and converts into audio data. The audio data is then output to the processor 180 for processing and afterwards sent, for example, to another mobile phone via the RF circuit 110, or output to the memory 120 for further processing.

WiFi belongs to short-distance wireless transmission technology, and the mobile phone can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 170, and provides wireless broadband Internet access for the user. Although fig. 1 shows the WiFi module 170, it is understood that it does not belong to the essential constitution of the handset 100, and can be omitted entirely as needed within the scope not changing the essence of the invention.

The processor 180 is a control center of the mobile phone, connects various parts of the entire mobile phone by using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 120 and calling data stored in the memory 120, thereby integrally monitoring the mobile phone. Alternatively, processor 180 may include one or more processing units; preferably, the processor 180 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 180.

The handset 100 also includes a power supply 190 (e.g., a battery) for powering the various components, which may preferably be logically connected to the processor 180 via a power management system, such that the power management system may be used to manage charging, discharging, and power consumption.

Although not shown, the handset 100 may also include a camera. Optionally, the position of the camera on the mobile phone 100 may be front-located or rear-located, which is not limited in this embodiment of the application.

Optionally, the mobile phone 100 may include a single camera, a dual camera, or a triple camera, which is not limited in this embodiment.

For example, the cell phone 100 may include three cameras, one being a main camera, one being a wide camera, and one being a tele camera.

Optionally, when the mobile phone 100 includes a plurality of cameras, the plurality of cameras may be all front-mounted, all rear-mounted, or a part of the cameras front-mounted and another part of the cameras rear-mounted, which is not limited in this embodiment of the present application.

In addition, although not shown, the mobile phone 100 may further include a bluetooth module or the like, which is not described herein.

Fig. 2 is a schematic diagram of a software structure of the mobile phone 100 according to the embodiment of the present application. Taking the operating system of the mobile phone 100 as an Android system as an example, in some embodiments, the Android system is divided into four layers, which are an application layer, an application Framework (FWK) layer, a system layer and a hardware abstraction layer, and the layers communicate with each other through a software interface.

As shown in fig. 2, the application layer may be a series of application packages, which may include short message, calendar, camera, video, navigation, gallery, call, and other applications.

The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer may include some predefined functions, such as functions for receiving events sent by the application framework layer.

As shown in FIG. 2, the application framework layers may include a window manager, a resource manager, and a notification manager, among others.

The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, and the like. The content provider is used to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, phone books, and the like.

The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.

The notification manager enables an application to display notification information in the status bar. It can convey notification-type messages that disappear automatically after a short stay without user interaction, for example notifications of download completion or message alerts. The notification manager may also present notifications in the top status bar of the system in the form of a graph or scrolling text, such as notifications of applications running in the background, or on the screen in the form of a dialog window; for example, text may be prompted in the status bar, a prompt tone may sound, the electronic device may vibrate, or an indicator light may flash.

The application framework layer may further include:

a view system, which includes visual controls such as controls for displaying text and controls for displaying pictures. The view system may be used to build applications. A display interface may consist of one or more views; for example, a display interface including a short-message notification icon may include a view for displaying text and a view for displaying pictures.

The phone manager is used to provide the communication functions of the mobile phone 100, for example management of call status (connected, hung up, and so on).

The system layer may include a plurality of functional modules. For example: a sensor service module, a physical state identification module, a three-dimensional graphics processing library (such as OpenGL ES), and the like.

The sensor service module is used for monitoring sensor data uploaded by various sensors in a hardware layer and determining the physical state of the mobile phone 100;

the physical state recognition module is used for analyzing and recognizing user gestures, human faces and the like;

the three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.

The system layer may further include:

the surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.

The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.

The hardware abstraction layer is a layer between hardware and software. The hardware abstraction layer may include a display driver, a camera driver, a sensor driver, etc. for driving the relevant hardware of the hardware layer, such as a display screen, a camera, a sensor, etc.

The following embodiments may be implemented on the cellular phone 100 having the above-described hardware structure/software structure. The following embodiment will take the mobile phone 100 as an example to explain the image processing method provided in the embodiment of the present application.

Fig. 3 shows a schematic flowchart of an image processing method provided in an embodiment of the present application, which may be applied to the mobile phone 100 described above by way of example and not limitation, and the image processing method specifically includes steps S101 to S103.

S101, acquiring a frame of target image from a plurality of frames of input images.

In a preferred embodiment, the multiple frames of input images may have the same resolution while differing in definition or imaging quality. In a preferred embodiment, the captured images may also undergo image processing such as cropping, denoising, and scaling before being used as input images. In addition, the input images may be images in the most original format captured by the mobile phone, for example Bayer-format images; this ensures that the input images undergo no data format conversion and retain the most complete original image information, which improves the final sub-pixel-level super-resolution. It should also be understood that too many input images reduce processing efficiency and delay the output of the final super-resolution image, so the phone may lag after the shutter is pressed and the photographing experience suffers; too few input images limit the available input data, so the best resolution improvement cannot be achieved.

S102, performing sub-pixel alignment processing on the multiple frames of input images according to a preset sub-pixel alignment algorithm to obtain one frame of sub-pixel aligned image.

Specifically, the preset sub-pixel alignment algorithm includes, but is not limited to, an optical flow method, a block matching method, or a feature-point-based block matching method. Sub-pixel alignment is an image processing process that fills the sub-pixel information of the other images into the corresponding sub-pixel positions of a reference frame, so the resulting sub-pixel aligned image contains the sub-pixel information of every input frame, and more comprehensive sub-pixel information can subsequently be supplemented into the target image. Sub-pixel alignment preserves the original image information to a high degree and restores pixels close to the real scene, thereby improving image quality.
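To make the alignment idea concrete, the following Python/NumPy sketch estimates, with sub-pixel precision, how far one frame is displaced relative to a reference frame. It substitutes FFT phase correlation with parabolic peak refinement for the optical flow and block matching methods named above, and it assumes a purely global translation between frames; the function name `estimate_shift` is hypothetical.

```python
import numpy as np


def estimate_shift(ref, img):
    """Estimate (dy, dx) such that img[y, x] ~= ref[y - dy, x - dx].

    Sketch only: assumes a single global translation between the two frames.
    """
    # Normalised cross-power spectrum; its inverse FFT peaks at the displacement.
    cross = np.fft.fft2(img) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12
    corr = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)

    shift = []
    for axis, p in enumerate(peak):
        n = corr.shape[axis]
        before, after = list(peak), list(peak)
        before[axis] = (p - 1) % n
        after[axis] = (p + 1) % n
        c_minus, c_0, c_plus = corr[tuple(before)], corr[peak], corr[tuple(after)]
        # Fit a parabola through the peak and its two neighbours for sub-pixel refinement.
        denom = c_minus - 2.0 * c_0 + c_plus
        offset = 0.5 * (c_minus - c_plus) / denom if denom != 0 else 0.0
        s = p + offset
        # Peaks past the midpoint correspond to negative (wrapped-around) shifts.
        if s > n / 2:
            s -= n
        shift.append(float(s))
    return tuple(shift)
```

Parabolic refinement of the correlation peak is a simple approximation; it is biased for some image content, and an optical flow method would additionally handle locally varying motion.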

It should be noted that steps S101 and S102 may be executed sequentially or concurrently. When a multi-core processor is available, to increase image processing efficiency, it is preferable to execute the screening of the highest-definition image and the sub-pixel alignment in parallel.
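Running S101 and S102 concurrently can be sketched with Python's standard `concurrent.futures` module. The callables `select_target` and `align` are hypothetical placeholders for the target-selection and alignment routines. Threads only run in parallel when the work releases the GIL (as many NumPy operations do); for pure-Python CPU-bound work, `ProcessPoolExecutor` would be the closer analogue of spreading the two steps over multiple cores.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Sequence, Tuple


def run_s101_s102_in_parallel(
    frames: Sequence[Any],
    select_target: Callable[[Sequence[Any]], Any],
    align: Callable[[Sequence[Any]], Any],
) -> Tuple[Any, Any]:
    """Run target selection (S101) and sub-pixel alignment (S102) as independent tasks."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        target_future = pool.submit(select_target, frames)
        aligned_future = pool.submit(align, frames)
        # .result() blocks until each task finishes and re-raises any exception it hit.
        return target_future.result(), aligned_future.result()
```

Because the two steps only read the input frames, they need no synchronisation beyond waiting for both results.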

S103, supplementing the sub-pixel information in the sub-pixel aligned image into the target image for restoration to obtain one frame of sub-pixel-restored image.

In a specific implementation, each item of sub-pixel information in the sub-pixel aligned image may be supplemented, one by one, into the corresponding sub-pixels of the target image according to the sub-pixel position correspondence between the sub-pixel aligned image and the target image, thereby forming a sub-pixel-restored image. Alternatively, a neural network model such as FSRCNN, SRCNN, or RESNET may be trained on the target image, with the sub-pixel aligned image as the training target, so that the sub-pixel information of the trained target image approaches that of the sub-pixel aligned image. After a certain number of training iterations, when the sub-pixel information of the trained target image fully matches that of the sub-pixel aligned image, or the match rate exceeds a threshold, the neural network model is considered optimal; the trained target image is then closest to the sub-pixel aligned image and is output as the sub-pixel-restored image frame.
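The one-by-one supplementation can be sketched as a masked copy. This is one hypothetical interpretation: it assumes the target image has already been resampled onto the sub-pixel aligned image's grid, and that a boolean mask marks the sub-pixel positions where the aligned image actually carries information. The function name `supplement_subpixels` is illustrative.

```python
import numpy as np


def supplement_subpixels(upscaled_target, aligned, observed):
    """Copy observed sub-pixel values from the aligned image into the target.

    upscaled_target: target image on the high-resolution (sub-pixel) grid.
    aligned:         sub-pixel aligned image on the same grid.
    observed:        boolean mask, True where `aligned` holds real sub-pixel data.
    """
    if not (upscaled_target.shape == aligned.shape == observed.shape):
        raise ValueError("inputs must share the same high-resolution shape")
    # Use the aligned sample where one exists; otherwise keep the target's value.
    return np.where(observed, aligned, upscaled_target)
```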

In a possible implementation, the target image may be the highest-definition image among the multiple frames of input images; that is, obtaining one frame of target image from the multiple frames of input images may consist of selecting the highest-definition image among them as the target image. The sub-pixel aligned image is then fused with the highest-definition image, which preserves the original image information to a higher degree, improves the sub-pixel-level super-resolution effect, and finally yields the sub-pixel-restored super-resolution image. In a possible implementation, the file size of each input frame may be read from its attributes. At equal resolution, an image with higher definition has better imaging quality and a larger file size, occupying more memory, so the highest-definition image can be selected according to the file size of each input frame.
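Selecting the highest-definition frame by file size reduces to an arg-max over sizes. The sketch below works on in-memory encoded frames; for files on disk, `os.path.getsize(path)` would supply the same quantity. The heuristic is only meaningful when the frames share resolution and encoder settings, as the paragraph above assumes; the function name is hypothetical.

```python
from typing import Sequence


def select_highest_definition(encoded_frames: Sequence[bytes]) -> int:
    """Return the index of the frame with the largest encoded size.

    At equal resolution and encoder settings, a larger encoded size is used
    as a proxy for more image detail (higher definition).
    """
    if not encoded_frames:
        raise ValueError("need at least one frame")
    # max() returns the first index among ties.
    return max(range(len(encoded_frames)), key=lambda i: len(encoded_frames[i]))
```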

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

In summary, in the image processing method of this embodiment, one frame of target image is obtained from multiple frames of input images, and the multiple frames are sub-pixel aligned to obtain one frame of sub-pixel aligned image whose sub-pixel information is supplemented into the target image for restoration, yielding a sub-pixel-restored image.

Fig. 4 shows a schematic flowchart of an image processing method provided in another embodiment of the present application, which may be applied to the mobile phone 100 described above by way of example and not limitation, and specifically includes steps S201 to S206.

Step S201, acquiring a preset number of frames of captured images.

The preset number of frames is preferably 4 to 6; that is, 4 to 6 captured images are selected each time for sub-pixel-level super-resolution improvement. In a specific implementation, a picture library may be built from pictures actually captured by the mobile phone, and all pictures in the library are then divided into groups of 4 to 6. For example, a dataset of 499 pictures in total may be built from pictures actually captured by a TCL T1 mobile phone, with 4 to 6 pictures per group. Sub-pixel-level super-resolution enhancement can be applied to each group, and the frame with the best result across the groups can finally be selected as the final super-resolution image.
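Dividing a picture library into groups of 4 to 6 so that every picture is used can be done with balanced consecutive groups. The sketch below is one hypothetical way (as few groups as possible, sizes differing by at most one), not necessarily the grouping used for the dataset above. For 499 pictures it produces 84 groups: 79 of six pictures and 5 of five.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_groups(items: Sequence[T], min_size: int = 4, max_size: int = 6) -> List[Sequence[T]]:
    """Split items into consecutive groups whose sizes lie in [min_size, max_size]."""
    n = len(items)
    if n == 0:
        return []
    # Fewest groups that can hold n items without exceeding max_size.
    num_groups = math.ceil(n / max_size)
    base, extra = divmod(n, num_groups)
    if base < min_size:
        raise ValueError(f"{n} items cannot be split into groups of {min_size}-{max_size}")
    groups, start = [], 0
    for k in range(num_groups):
        # The first `extra` groups take one additional item.
        size = base + (1 if k < extra else 0)
        groups.append(items[start:start + size])
        start += size
    return groups
```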

Step S202, cropping each frame of captured image according to a preset cropping size, and taking the cropped images as input images to obtain multiple frames of input images.

The preset cropping size may be w × h, where w is the image width and h is the image height. To give the input images a uniform size and improve the training efficiency of the subsequent model, this step crops each captured frame to w × h, so every input frame has the uniform size w × h. When cropping, it is preferable to keep the central area of the captured image and cut away the edge areas, but the present application is not limited to this; in other embodiments, another designated area of the captured image may be kept and the rest cut away, as determined by the type, pixel information, and actual size of the captured image. Because this sub-pixel super-resolution method can be deployed in terminal products such as mobile phones, preparing data on terminal devices of different models helps improve its effect.
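Keeping the central w × h area, as preferred above, amounts to a centred slice. This NumPy sketch works for grayscale (H × W) and multi-channel (H × W × C) arrays; the function name `center_crop` is hypothetical.

```python
import numpy as np


def center_crop(image: np.ndarray, w: int, h: int) -> np.ndarray:
    """Keep the central w x h region of an H x W (x C) image, discarding the edges."""
    H, W = image.shape[:2]
    if w > W or h > H:
        raise ValueError("crop size exceeds image size")
    top = (H - h) // 2
    left = (W - w) // 2
    return image[top:top + h, left:left + w]
```

When H − h or W − w is odd, the extra row or column is removed from the bottom or right edge.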

Step S203, obtaining a frame of image with the highest definition from the plurality of frames of input images.

In a specific implementation, the step of obtaining the image with the highest definition from the multiple frames of input images may be:

To make the reference frame easy to obtain later, the highest-definition input image determined in this step can be marked among the multiple frames of input images. When the reference frame is needed, the marked frame is simply retrieved without repeating the definition screening, which improves efficiency.

Step S204, upscaling the highest-definition image by a preset factor, and taking the upscaled highest-definition image as the input image of a preset neural network model.

The preset factor may be 2x, 4x, 8x, 16x, and so on, and is preferably 4x, i.e., the highest-definition image is magnified 4 times. Since each input image is w × h, the magnified highest-definition image is 4w × 4h. Upscaling the highest-definition image makes it easier for the subsequent neural network model to extract sub-pixel information and to train on the image, improving model training efficiency.
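The size arithmetic (w × h becoming 4w × 4h) can be checked with the simplest possible upscaler. The interpolation method is not specified above, so this sketch assumes nearest-neighbour replication via `np.repeat`; bicubic interpolation is the more common choice as input to SRCNN-style networks. The function name is hypothetical.

```python
import numpy as np


def upscale_nearest(image: np.ndarray, factor: int = 4) -> np.ndarray:
    """Upscale an H x W (x C) image by an integer factor using nearest-neighbour replication."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    # Repeat every row, then every column, `factor` times.
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)
```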

Step S205, performing sub-pixel alignment processing on the multiple frames of input images according to a preset sub-pixel alignment algorithm to obtain a frame of sub-pixel aligned image.

In this embodiment, the preset sub-pixel alignment algorithm is an optical flow method, and step S205 may be implemented by the following sub-steps:

selecting a frame of image with the highest definition from the multiple frames of input images as a reference frame (the same frame as the image with the highest definition obtained in step S203);

In other embodiments, the preset sub-pixel alignment algorithm may also be an alignment algorithm based on block matching or based on feature point block matching.
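Fusing the other frames into the reference frame's coordinate system can be sketched as a simple shift-and-add onto an upscaled grid. This is a named substitute for the optical-flow fusion, not the method itself: it assumes each frame's global shift (dy, dx) relative to the reference is already known (meaning frame[y, x] ≈ reference[y − dy, x − dx]), places each sample at the nearest fine-grid position, and averages samples that land on the same position. `shift_and_add` is a hypothetical name.

```python
import numpy as np


def shift_and_add(frames, shifts, scale):
    """Fuse frames onto a grid `scale` times finer than the reference frame.

    frames: list of H x W arrays; frames[0] is normally the reference, shift (0, 0).
    shifts: list of (dy, dx) per frame, in reference-frame pixels.
    Returns (fused image, boolean mask of fine-grid positions that received samples).
    """
    h, w = frames[0].shape
    acc = np.zeros((h * scale, w * scale))
    count = np.zeros_like(acc)
    ys, xs = np.mgrid[0:h, 0:w]
    for frame, (dy, dx) in zip(frames, shifts):
        # Convert to reference coordinates, then to the nearest fine-grid index.
        hy = np.rint((ys - dy) * scale).astype(int)
        hx = np.rint((xs - dx) * scale).astype(int)
        inside = (hy >= 0) & (hy < h * scale) & (hx >= 0) & (hx < w * scale)
        np.add.at(acc, (hy[inside], hx[inside]), frame[inside])
        np.add.at(count, (hy[inside], hx[inside]), 1)
    fused = np.zeros_like(acc)
    # Average where at least one sample landed; leave empty positions at zero.
    np.divide(acc, count, out=fused, where=count > 0)
    return fused, count > 0
```

For example, with scale 2, a frame shifted by half a pixel to the left, (0, −0.5), fills the odd columns that the reference frame leaves empty.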

Step S206, inputting the upscaled highest-definition image into the preset neural network model, and training the preset neural network model with the sub-pixel aligned image as the training target to obtain one frame of sub-pixel-restored super-resolution image.

In a specific implementation, the preset neural network model may be, but is not limited to, FSRCNN, SRCNN, or RESNET. Preferably, the FSRCNN network is used as the pre-trained model; the network has 5 layers and 3937 parameters, and its small size allows it to run efficiently on a mobile phone.

Specifically, training the preset neural network model may be implemented by the following sub-steps:

calculating a loss function of the preset neural network model;

In some alternative embodiments, the step of calculating the loss function of the preset neural network model may specifically include:

acquiring the spatial-domain difference and the structural difference between the trained highest-definition image and the sub-pixel aligned image;

More specifically, the formula for calculating the loss function S may be:

where one term measures the spatial difference between the trained highest-definition image and the sub-pixel aligned image as an average per-pixel difference, and SSIM(p) measures the structural difference between the two images. In these terms, p denotes an image pixel, P denotes the set of image pixels, x(p) denotes the trained highest-definition image, y(p) denotes the sub-pixel aligned image, N denotes the total number of pixels, μ_x and μ_y denote the means of corresponding image blocks of the trained highest-definition image and of the sub-pixel aligned image, σ_x² and σ_y² denote the variances of those image blocks, σ_xy denotes the covariance of the image blocks, and C is a constant.
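A minimal NumPy sketch of the two terms follows. The exact way S combines them is not given here, so the sketch assumes: the spatial term is the mean absolute difference (1/N)·Σ|x(p) − y(p)|; SSIM is computed in its standard form with two stabilising constants c1 and c2 (the text names a single constant C) on non-overlapping blocks and then averaged; and S = alpha·L1 + (1 − alpha)·(1 − SSIM) with a hypothetical weight alpha. All function names are illustrative.

```python
import numpy as np


def l1_term(x, y):
    """Spatial difference: mean absolute per-pixel difference over all N pixels."""
    return float(np.mean(np.abs(x - y)))


def ssim_blocks(x, y, block=8, data_range=1.0):
    """Mean SSIM over non-overlapping block x block patches (standard two-constant form)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    h, w = x.shape
    scores = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            bx = x[i:i + block, j:j + block]
            by = y[i:i + block, j:j + block]
            mx, my = bx.mean(), by.mean()            # mu_x, mu_y
            vx, vy = bx.var(), by.var()              # sigma_x^2, sigma_y^2
            cov = ((bx - mx) * (by - my)).mean()     # sigma_xy
            scores.append(((2 * mx * my + c1) * (2 * cov + c2))
                          / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
    if not scores:
        raise ValueError("image smaller than one block")
    return float(np.mean(scores))


def loss_s(x, y, alpha=0.5, block=8):
    """Hypothetical combined loss: weighted spatial term plus (1 - SSIM) structural term."""
    return alpha * l1_term(x, y) + (1 - alpha) * (1 - ssim_blocks(x, y, block))
```

Using 1 − SSIM turns "larger SSIM is better" into a quantity that, like the spatial term, decreases as the two images become more similar, so both can be minimised together.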

Averaging the per-pixel difference ensures that the network output is accurate in image content. For structure, a larger SSIM means that the structural information of the trained highest-definition image is closer to that of the sub-pixel aligned image, i.e., the sub-pixel information is better supplemented. Therefore, to balance the spatial difference and the structural difference, the loss function S of the model is defined to combine both terms.

The model is optimized iteratively with the goal of bringing the loss function S to a preset value: if S for the current, sub-optimal result has not reached the preset value, the model parameters are modified and training is iterated a number of times until S reaches the preset value. When S reaches the preset value, it is deemed to have reached its minimum, and the neural network model is considered optimized. It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence; the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
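The "adjust parameters until the loss reaches a preset value" loop can be sketched generically. The plain gradient-descent update, the learning rate `lr`, and the `max_iters` safeguard (so training stops even if the preset value is never reached) are assumptions for illustration; an actual FSRCNN would be trained with a deep-learning framework's optimizer rather than this hand-written loop.

```python
import numpy as np


def train_until_threshold(loss_fn, grad_fn, params, threshold, lr=0.1, max_iters=10_000):
    """Repeat gradient-descent updates until loss_fn(params) <= threshold.

    Returns (params, final loss, number of updates performed).
    """
    loss = loss_fn(params)
    iters = 0
    # Keep modifying the parameters while the loss has not reached the preset value.
    while loss > threshold and iters < max_iters:
        params = params - lr * grad_fn(params)
        loss = loss_fn(params)
        iters += 1
    return params, loss, iters


if __name__ == "__main__":
    # Toy problem: fit a 2-element parameter vector to a fixed target.
    target = np.array([3.0, -1.0])
    params, loss, iters = train_until_threshold(
        loss_fn=lambda p: float(np.sum((p - target) ** 2)),
        grad_fn=lambda p: 2.0 * (p - target),
        params=np.zeros(2),
        threshold=1e-6,
    )
    print(f"loss={loss:.2e} after {iters} updates")
```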

Compared with the first embodiment, this embodiment designs a super-resolution neural network training method that can learn sub-pixel-level alignment. An end-to-end neural network adaptively learns the associations between image pixels, so sub-pixel-level super-resolution can be performed from a single frame: the model infers the sub-pixel information by itself without multi-frame input and restores the real details of the image. Because the model takes a single input image, it avoids the blur, jagged-edge (aliasing), and similar effects that multiple frames may introduce; the method is therefore highly robust and can achieve a better resolution improvement than traditional super-resolution methods.

Fig. 5 shows a block diagram of an image processing apparatus according to an embodiment of the present application, which corresponds to the image processing method described in the above embodiment, and only shows a part related to the embodiment of the present application for convenience of description.

Referring to fig. 5, the apparatus includes the following modules:

an obtaining module 11, configured to obtain a frame of target image from a plurality of input images;

the sub-pixel alignment module 12 is configured to perform sub-pixel alignment processing on the multiple frames of input images according to a preset sub-pixel alignment algorithm to obtain a frame of sub-pixel aligned image;

and an image restoration module 13, configured to supplement the sub-pixel information in the sub-pixel aligned image into the target image for restoration, so as to obtain one frame of sub-pixel-restored image.
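The module structure above can be sketched as a small composition in which each module is a pluggable callable. The class and field names are hypothetical and only mirror the roles of the obtaining module 11, the sub-pixel alignment module 12, and the image restoration module 13.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class ImageProcessingApparatus:
    obtain: Callable[[Sequence[Any]], Any]   # role of obtaining module 11
    align: Callable[[Sequence[Any]], Any]    # role of sub-pixel alignment module 12
    restore: Callable[[Any, Any], Any]       # role of image restoration module 13

    def process(self, frames: Sequence[Any]) -> Any:
        target = self.obtain(frames)          # one frame of target image
        aligned = self.align(frames)          # one frame of sub-pixel aligned image
        return self.restore(target, aligned)  # sub-pixel-restored image
```

Injecting the three behaviours keeps the apparatus independent of whether restoration uses direct supplementation or a trained network.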

In some embodiments of the present invention, the obtaining module 11 is further configured to:

In some embodiments of the present invention, the image restoration module 13 is further configured to:

In some embodiments of the present invention, the image restoration module 13 may include:

and the neural network unit is used for inputting the target image into a preset neural network model, and training the preset neural network model by taking the sub-pixel alignment image as a training target.

In some embodiments of the invention, the neural network unit may further include:

a loss function calculating subunit, configured to calculate a loss function of the preset neural network model;

and the model training subunit is used for adjusting the parameters of the preset neural network model according to the loss function and training the preset neural network model again until the loss function reaches a preset value.

In some embodiments of the invention, the loss function calculation subunit is further configured to:

and acquiring the spatial difference and the structural difference of the trained target image and the sub-pixel alignment image, and calculating the loss function of the preset neural network model according to the spatial difference and the structural difference.

In some embodiments of the present invention, the calculation formula of the loss function S may specifically be:


In some embodiments of the invention, the apparatus may further comprise:

and an image upscaling module, configured to upscale the target image by a preset factor.

In some embodiments of the present invention, the sub-pixel alignment module 12 may include:

a reference frame selecting unit, configured to select the highest-definition image from the multiple frames of input images as a reference frame;

a positional relationship confirmation unit configured to acquire an object positional relationship between a frame other than the reference frame among the plurality of frames of input images and the reference frame;

the coordinate conversion unit is used for converting the coordinate system of the other frame to be consistent with the coordinate system of the reference frame according to the object position relation;

and the sub-pixel alignment unit is used for performing sub-pixel alignment fusion on the other frames and the reference frame by taking the reference frame as a reference image to obtain the sub-pixel alignment image.

In some embodiments of the invention, the apparatus may further comprise:

the image acquisition module is used for acquiring shot images with preset frame numbers;

and the image cutting module is used for cutting each frame of the shot image according to a preset cutting size and taking the cut shot image as the input image.

It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

An embodiment of the present application further provides a terminal device, where the terminal device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.

The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.

An embodiment of the present application further provides a computer program product which, when run on a mobile terminal, causes the mobile terminal to implement the steps in the above method embodiments.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing apparatus/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims

1. An image processing method, characterized in that the method comprises:

acquiring a frame of target image from a plurality of frames of input images;

2. The image processing method according to claim 1, wherein the step of obtaining a target image of one frame from among a plurality of frames of input images is:

3. The image processing method according to claim 2, wherein the step of obtaining the image with the highest definition from the plurality of frames of input images comprises:

4. The image processing method according to claim 2, wherein the step of supplementing the sub-pixel information in the sub-pixel alignment image into the target image for restoration to obtain a frame of sub-pixel restored image comprises:

5. The image processing method of claim 1, wherein said supplementing sub-pixel information in said sub-pixel aligned image into said target image comprises:

6. The image processing method of claim 5, wherein the training of the pre-set neural network model comprises:

calculating a loss function of the preset neural network model;

7. The image processing method of claim 6, wherein the step of calculating the loss function of the preset neural network model comprises:

8. The image processing method according to claim 7, wherein the loss function S is calculated by the formula:

where

p denotes an image pixel, P denotes the set of image pixels, x(p) denotes the trained target image, y(p) denotes the sub-pixel aligned image, N denotes the total number of pixels, μ_x denotes the mean of an image block of the trained target image, μ_y denotes the mean of an image block of the sub-pixel aligned image, σ_x² denotes the variance of an image block of the trained target image, σ_y² denotes the variance of an image block of the sub-pixel aligned image, σ_xy denotes the covariance of the image blocks, and C is a constant.

9. The image processing method according to claim 5, further comprising, after said acquiring a target image of one frame from among a plurality of frames of input images:

upscaling the target image by a preset factor, and taking the upscaled target image as the input image of the preset neural network model.

10. The image processing method according to claim 1, wherein performing sub-pixel alignment processing on the plurality of frames of input images to obtain a frame of sub-pixel aligned image comprises:

11. The image processing method according to claim 1 or 9, further comprising, before said acquiring a target image of one frame from among a plurality of frame input images:

acquiring a shot image with a preset frame number;

12. An image processing apparatus, characterized in that the apparatus comprises:

and an image restoration module, configured to supplement the sub-pixel information in the sub-pixel aligned image into the target image for restoration, so as to obtain one frame of sub-pixel-restored image.

13. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 11 when executing the computer program.

14. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 11.