CN111833285A - Image processing method, image processing device and terminal equipment

Image processing method, image processing device and terminal equipment

Info

Publication number
CN111833285A
Authority
CN
China
Prior art keywords
image
processed
video
frame
key frame
Prior art date
Legal status
Granted
Application number
CN202010717686.1A
Other languages
Chinese (zh)
Other versions
CN111833285B (en)
Inventor
李兴龙 (Li Xinglong)
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010717686.1A
Publication of CN111833285A
Application granted
Publication of CN111833285B
Legal status: Active

Classifications

    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06N3/045 Neural networks; combinations of networks
    • G06N3/08 Neural networks; learning methods
    • G06T2207/10016 Image acquisition modality: video; image sequence
    • G06T2207/20081 Special algorithmic details: training; learning
    • G06T2207/20084 Special algorithmic details: artificial neural networks [ANN]


Abstract

The application provides an image processing method, which comprises the following steps: acquiring a video to be processed, wherein the video to be processed comprises a plurality of video frames to be processed captured of the same static scene; determining a key frame from the video to be processed; for each key frame, aligning the video frames to be processed associated with the key frame to the key frame; and obtaining a true-value image corresponding to the key frame according to the aligned video frames to be processed associated with the key frame, wherein the definition of the true-value image is higher than that of the key frame. By this method, a low-quality image close to the real scene and a corresponding high-quality image can be obtained.

Description

Image processing method, image processing device and terminal equipment
Technical Field
The present application belongs to the field of image processing technologies, and in particular, relates to an image processing method, an image processing apparatus, a terminal device, and a computer-readable storage medium.
Background
At present, in some application scenarios, restoration of images and videos can be realized by a machine learning algorithm and the like without changing hardware devices, so as to improve the resolution of the images and videos and improve the quality of low-quality images and videos.
Before restoring images and videos with a machine learning algorithm, the algorithm needs to be trained on a specific training set containing low-quality images and corresponding high-quality images. Currently, such a training set can be obtained by adding Gaussian white noise to a high-quality image to generate a corresponding noisy low-quality image. However, the low-quality images obtained by adding Gaussian white noise differ, in blur characteristics such as the noise distribution pattern, from the low-quality images to be restored in a real scene. Therefore, a machine learning algorithm trained on the existing training set cannot effectively process low-quality images from real scenes, resulting in a poor restoration effect for images and videos. A method is therefore needed that can obtain low-quality images close to the real scene together with corresponding high-quality images.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, a terminal device and a computer readable storage medium, which can acquire a low-quality image close to a real scene and a corresponding high-quality image.
In a first aspect, an embodiment of the present application provides an image processing method, including:
acquiring a video to be processed, wherein the video to be processed comprises a plurality of video frames to be processed captured of the same static scene;
determining a key frame from the video to be processed;
for each key frame, aligning a video frame to be processed associated with the key frame to the key frame;
and obtaining a truth-value image corresponding to the key frame according to the aligned video frame to be processed associated with the key frame, wherein the definition of the truth-value image is higher than that of the key frame.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
an acquisition module, configured to acquire a video to be processed, wherein the video to be processed comprises a plurality of video frames to be processed captured of the same static scene;
the determining module is used for determining key frames from the video to be processed;
an alignment module, configured to align, for each key frame, a to-be-processed video frame associated with the key frame to the key frame;
and the processing module is used for obtaining a truth-value image corresponding to the key frame according to the aligned video frame to be processed associated with the key frame, wherein the definition of the truth-value image is higher than that of the key frame.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, a display, and a computer program stored in the memory and executable on the processor, where the processor implements the image processing method according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the image processing method according to the first aspect.
In a fifth aspect, the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to execute the image processing method described above in the first aspect.
Compared with the prior art, the embodiments of the application have the following advantages: in the embodiments of the application, a video to be processed can be acquired, where the video to be processed includes a plurality of video frames to be processed captured of the same static scene, so that a plurality of correlated images of the same scene can be acquired, allowing information to be extracted subsequently by combining these correlated images. Key frames are then determined from the video to be processed; for each key frame, the video frames to be processed associated with the key frame are aligned to the key frame, and a true-value image corresponding to the key frame is obtained from the aligned video frames to be processed associated with the key frame. Through the alignment, the aligned video frames associated with the key frame take the key frame as reference, so the obtained true-value image also takes the key frame as reference, and the definition of the true-value image is higher than that of the key frame. The true-value image can therefore serve as a high-quality image corresponding to the key frame, with the key frame as the low-quality image. Moreover, the key frame is captured of a static scene in a real shooting scene rather than obtained by adding white noise or the like, so its blur characteristics, such as the noise distribution pattern, are similar to those of the low-quality images to be processed in subsequent real application scenarios. Therefore, according to the embodiments of the application, low-quality images and high-quality images close to the real scene can be acquired, so that, in some exemplary application scenarios, these acquired images can be used to train a machine learning algorithm, improving the performance of the algorithm and thereby the restoration effect for low-quality images and videos.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 2 is an exemplary diagram of obtaining a truth image of a key frame according to an embodiment of the present application;
fig. 3 is an exemplary diagram for obtaining a matching point pair in a first image region and a second image region according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The image processing method provided by the embodiments of the application can be applied to terminal devices such as servers, desktop computers, mobile phones, tablet computers, wearable devices, vehicle-mounted devices, Augmented Reality (AR)/Virtual Reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and Personal Digital Assistants (PDA); the embodiments of the application do not limit the specific type of the terminal device.
Fig. 1 shows a flowchart of an image processing method provided in an embodiment of the present application, which may be applied to a terminal device.
At present, in some application scenarios, restoration of low-quality images and videos can be achieved through a machine learning algorithm and the like without changing hardware devices, so as to improve the resolution of the images and videos and improve the quality of the low-quality images and videos.
Before a machine learning algorithm such as a convolutional neural network is applied to restore images and videos, the algorithm needs to be trained on a specific training set comprising low-quality images and corresponding high-quality images. Ideally, the image characteristics (such as noise characteristics) of the training data, the final test data, and the application data belong to the same distribution, so that the performance of the machine learning algorithm can be effectively improved.
Currently, one common way of generating the low-quality images and corresponding high-quality images in such a training set is to down-sample an original high-resolution image to obtain a corresponding low-resolution image; another common way is to add Gaussian white noise to a high-quality image to generate a corresponding noisy low-quality image. However, blur characteristics such as the noise distribution pattern of the generated low-quality images differ from those of the low-quality images to be restored in a real scene, so a machine learning algorithm trained on such a training set cannot accurately identify the features of low-quality images in real application scenarios and cannot restore images and videos accurately.
In the embodiments of the application, a video to be processed can be acquired, yielding a plurality of video frames to be processed captured of the same static scene, and a key frame is determined from the video to be processed, where the key frame can serve as a low-quality image obtained by real shooting. Then, for each key frame, the video frames to be processed associated with the key frame are aligned to the key frame, and a true-value image corresponding to the key frame is obtained from the aligned video frames. By combining the video frames associated with the key frame, a high-quality true-value image can be obtained, so low-quality and high-quality images close to the real scene can be acquired. In some exemplary application scenarios, these images can be used to train a machine learning algorithm, improving its performance and thus the restoration effect for low-quality images and videos.
It should be noted that, in the embodiments of the application, the specific use of the obtained low-quality and high-quality images is not limited to serving as a training set for a machine learning algorithm; they may also be used in other applications.
Specifically, as shown in fig. 1, the image processing method may include:
step S101, a video to be processed is obtained, wherein the video to be processed comprises a plurality of video frames to be processed which are obtained by shooting aiming at the same static scene.
In the embodiment of the present application, the video to be processed may be a video obtained by shooting a specific static scene. For example, the video to be processed may be obtained by fixing a shooting device (e.g., a mobile phone, a camera, etc.) on a tripod or the like and then recording a video of the static scene. The plurality of video frames to be processed can serve as a low-quality video actually captured by the shooting device in the real scene, and at least some of these frames can serve as low-quality images.
In some embodiments, in order to reduce the quality of the images in the video to be processed, the zoom factor of the shooting device may be increased before shooting, for example set to 10, thereby reducing the quality of the video to be processed. During shooting, the shooting parameters of the device may be left unchanged, so that the shooting parameters are the same across the video to be processed, which facilitates extracting matching information from the plurality of correlated images of the same scene for fusion and other operations.
In the embodiments of the application, for example, every image frame in the video to be processed may be taken as a video frame to be processed. Alternatively, among the image frames of the video to be processed, the image quality of each frame can be evaluated according to at least one of parameters such as variance, information entropy, and structural similarity, so that video frames whose image quality meets a preset quality condition can be screened out as the video frames to be processed. The format of the video frames to be processed may be determined according to actual requirements; for example, a video frame to be processed may be an RGB image in bmp format.
In some embodiments, after acquiring the video to be processed, the method further includes:
acquiring a plurality of original video frames from the video to be processed;
for each original video frame, calculating a blur parameter of the original video frame according to a discrete parameter, an information entropy, and/or a structural similarity of the original video frame, wherein the discrete parameter comprises the variance or standard deviation of the original video frame;
and determining the video frames to be processed according to the blur parameter of each original video frame.
In the embodiment of the present application, the discrete parameter of the original video frame may indicate a discrete degree of pixel values of the original video frame. In general, the larger the discrete parameter, the better the image quality of the corresponding original video frame. The information entropy of the original video frame can indicate the information amount of the original video frame, and therefore, the information entropy of the original video frame can also be used for judging the image quality such as the definition of the original video frame. The structural similarity of the original video frame may be gradient-based structural similarity (GSSIM), edge-based structural similarity (ESSIM), or the like. The structural similarity of the original video frame can also be used for judging the image quality of the original video frame.
The image quality of the original video frame can be determined according to one parameter of the discrete parameter, the information entropy and the structural similarity of the original video frame, or the image quality of the original video frame can be determined by combining at least two parameters of the discrete parameter, the information entropy and the structural similarity, so that the image quality of the original video frame can be more comprehensively evaluated.
In some embodiments, said calculating, for each original video frame, a blur parameter of the original video frame according to a discrete parameter, an information entropy and/or a structural similarity of the original video frame comprises:
for each original video frame, calculating the blur parameter of the original video frame according to a preset formula, wherein the preset formula is:

BP = λ1·V_f + λ2·E_f + λ3·G_f

where BP is the blur parameter of the original video frame, V_f is the discrete parameter of the original video frame, E_f is the information entropy of the original video frame, G_f is the structural similarity of the original video frame, λ1 is a first preset weight, λ2 is a second preset weight, and λ3 is a third preset weight;
the determining the video frames to be processed according to the blur parameter of each original video frame comprises:
for each original video frame, if the blur parameter of the original video frame is within a preset parameter interval, taking the original video frame as a video frame to be processed.
In the embodiment of the present application, the values of the first preset weight, the second preset weight, and the third preset weight may be predetermined in advance according to experience, test, application requirements, and the like. Through the preset formula, the image quality of the original video frame can be quantitatively evaluated more comprehensively from different dimensions.
The preset parameter interval can also be obtained in advance through experimental statistics. For example, the blur parameter being in the preset parameter interval may mean that the blur parameter of the original video frame is greater than a first preset parameter threshold. In this case, an original video frame that meets the condition can be considered to have a low degree of blur and to contain a certain amount of information, so it can be used as a video frame to be processed. In addition, in some examples, being in the preset parameter interval may mean that the blur parameter is greater than a first preset parameter threshold and smaller than a second preset parameter threshold; in that case, the blur degree of a qualifying original video frame can be considered moderate, neither excessively clear nor excessively blurred, and thus closer to a really captured low-quality image.
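As an illustration of this screening step, the following is a minimal sketch, assuming the variance as the discrete parameter, the Shannon entropy of the grayscale histogram as the information entropy, a mean gradient magnitude standing in for the structural-similarity term, and illustrative weights and interval bounds (the application leaves all of these to experiment and application requirements):

```python
import cv2
import numpy as np

def blur_parameter(frame_bgr, lam1=0.4, lam2=0.3, lam3=0.3):
    """BP = λ1·V_f + λ2·E_f + λ3·G_f for one original video frame.
    V_f: variance of the grayscale pixel values (the discrete parameter).
    E_f: Shannon entropy of the grayscale histogram.
    G_f: mean gradient magnitude, an assumed proxy for GSSIM/ESSIM.
    The weights are illustrative placeholders."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    v_f = float(gray.var())
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    e_f = float(-(p * np.log2(p)).sum())
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    g_f = float(np.hypot(gx, gy).mean())
    return lam1 * v_f + lam2 * e_f + lam3 * g_f

def screen_frames(original_frames, lo=50.0, hi=5000.0):
    """Keep the frames whose blur parameter falls inside the preset
    parameter interval (lo, hi); the bounds are illustrative."""
    return [f for f in original_frames if lo < blur_parameter(f) < hi]
```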
And step S102, determining a key frame from the video to be processed.
In the embodiment of the present application, the number of the key frames is not limited herein. The key frame may be determined in a variety of ways. For example, a video frame located at a specific frame number (such as a first frame or an intermediate frame or a last frame) in the video to be processed may be selected as the key frame; or, extracting key frames from the video to be processed at preset frame number intervals; in addition, the key frame may also be acquired only from the video frame to be processed.
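As a small sketch of interval-based key-frame selection (the interval of 60 frames is an assumption, chosen here to match the worked example later in this description):

```python
def pick_key_frame_indices(num_frames, interval=60):
    """Extract key frames at a preset frame-number interval. Selecting
    a fixed position (first, middle, or last frame) is an equally
    valid rule."""
    return list(range(interval - 1, num_frames, interval))

# 170 frames -> indices 59 and 119, i.e. the 60th and 120th frames,
# matching the I_60 / I_120 example given below.
assert pick_key_frame_indices(170) == [59, 119]
```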
Step S103, aiming at each key frame, aligning the video frame to be processed associated with the key frame to the key frame.
In the embodiment of the present application, there may be a plurality of specific association manners between the to-be-processed video frame associated with the key frame and the key frame. For example, the video frame to be processed associated with the key frame may be one or more frames of video frames to be processed adjacent to the key frame; alternatively, the video frame to be processed associated with the key frame may also be a video frame to be processed in which the contained feature points match with the feature points in the key frame, or the like.
In some embodiments, for any video frame to be processed associated with the key frame, the alignment may be performed according to feature information of mutual matching between the key frame and the video frame to be processed associated with the key frame. For example, according to the feature information matched with each other, transformation matrices such as homography matrices between the key frame and the to-be-processed video frame associated with the key frame may be established, and then the to-be-processed video frame associated with the key frame may be aligned to the key frame according to the corresponding transformation matrices.
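A sketch of this feature-based alignment, assuming ORB features and OpenCV as the toolkit (the description only requires mutually matching feature information and a transformation matrix such as a homography):

```python
import cv2
import numpy as np

def align_to_key_frame(frame, key_frame):
    """Warp `frame` onto `key_frame` using a homography estimated from
    mutually matched features. ORB and the RANSAC threshold of 5.0 are
    assumed choices, not fixed by the application."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(frame, None)
    kp2, des2 = orb.detectAndCompute(key_frame, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = key_frame.shape[:2]
    return cv2.warpPerspective(frame, H, (w, h))
```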
Step S104, obtaining a truth-value image corresponding to the key frame according to the aligned video frame to be processed associated with the key frame, wherein the definition of the truth-value image is higher than that of the key frame.
In the embodiment of the application, after the aligned to-be-processed video frame associated with the key frame is acquired, image fusion may be performed according to a preset fusion rule to obtain the true value image. For example, a plurality of aligned to-be-processed video frames associated with the key frame may be merged. In addition, the aligned to-be-processed video frames associated with the key frames can be fused with the key frames. The specific mode of image fusion may be a fusion mode based on an average value, or a weighted fusion mode. For example, the weight of each aligned to-be-processed video frame associated with the key frame may be determined according to a parameter for evaluating image quality, such as a blur parameter, and for example, a higher weight may be assigned to a to-be-processed video frame with higher definition and lower noise intensity.
In some embodiments, the obtaining a truth image corresponding to the key frame according to the aligned to-be-processed video frame associated with the key frame may include:
generating a fusion image according to the aligned video frame to be processed associated with the key frame;
and performing specified optimization processing on the fused image to obtain a true value image corresponding to the key frame.
In the embodiment of the application, after the fused image is obtained, the fused image may be used as an initial true value image, and in some cases, some blurring problems may occur in the fused image, so that the image definition may be further improved by the specified optimization processing. Illustratively, the specified optimization process may include a deblurring process, a denoising process, an edge enhancement process, and the like.
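A minimal sketch of the fusion and of one possible specified optimization, assuming plain or weighted averaging for the fusion and unsharp masking for the edge-enhancement step (both are assumptions; any fusion rule and deblurring or denoising method satisfying the description would do):

```python
import cv2
import numpy as np

def fuse_aligned_frames(aligned_frames, weights=None):
    """Average or weighted-average fusion of aligned colour frames
    (stacked as N x H x W x C) into an initial true-value image."""
    stack = np.stack([f.astype(np.float64) for f in aligned_frames])
    if weights is None:
        fused = stack.mean(axis=0)
    else:
        w = np.asarray(weights, dtype=np.float64)
        fused = (stack * w[:, None, None, None]).sum(axis=0) / w.sum()
    return np.clip(fused, 0, 255).astype(np.uint8)

def sharpen(fused, amount=1.5, sigma=3.0):
    """Unsharp masking as one assumed form of the specified
    optimization processing; parameters are illustrative."""
    blurred = cv2.GaussianBlur(fused, (0, 0), sigma)
    return cv2.addWeighted(fused, 1.0 + amount, blurred, -amount, 0)
```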
In the embodiment of the present application, the true value image is obtained with reference to the key frame, and the definition of the true value image is higher than that of the key frame, so the true value image can serve as a high-quality image corresponding to the key frame, with the key frame as the low-quality image. Moreover, the key frame is captured of a static scene in a real shooting scene rather than obtained by adding white noise or the like, so its blur characteristics, such as the noise distribution pattern, are similar to those of the low-quality images to be processed in subsequent real application scenarios. Therefore, low-quality images and high-quality images close to the real scene can be acquired through the embodiments of the application.
In some specific applications, the obtained low-quality image and high-quality image close to the real scene can be used for training a machine learning algorithm so as to improve the performance of the machine learning algorithm, and further improve the restoration effect of the low-quality image and the video. In addition, in some applications, a specific fuzzy video-clear image training set may also be generated according to the to-be-processed video and the true value image, so as to train a machine learning algorithm for processing a video. Because the fuzzy characteristics such as the noise distribution mode of the fuzzy video in the specific fuzzy video-clear image training set are similar to the video acquired in the real application scene, the performance of the corresponding machine learning algorithm can be improved.
In some embodiments, after determining a key frame from the video to be processed, the method further includes:
for any key frame, acquiring video frames to be cut from the M video frames to be processed adjacent to the key frame;
extracting a first image block from the key frame;
for each video frame to be cut, extracting a second image block from the video frame to be cut, wherein the position of the second image block in the video frame to be cut is the position of the first image block in the key frame shifted by a preset value along a corresponding preset offset direction, and the size of the second image block is the same as that of the first image block;
obtaining training video data according to the first image blocks and the second image blocks;
after obtaining a true value image corresponding to the key frame according to the aligned to-be-processed video frame associated with the key frame, the method further includes:
extracting a truth value image block corresponding to the first image block from the truth value image;
and establishing a fuzzy video-clear image training set according to the training video data and the true value image blocks.
At present, in a video restoration task based on a machine learning algorithm, a video can be regarded as a sequence of single-frame images, so each video frame is input into the machine learning algorithm separately to obtain restored video frames, which are combined to form the restored video. Alternatively, in one input, a plurality of consecutive video frames of the video may be fed to the machine learning algorithm, which outputs a processed key frame; all processed key frames are then combined to form the restored video.
By the embodiment of the application, the fuzzy video-clear image training set close to a real application scene can be obtained and applied to training the corresponding machine learning algorithm in some application scenes, so that the performance of the machine learning algorithm is improved, and the recovery effect of the low-quality video is improved.
The number of the video frames to be cut, and the frame-number difference between the video frames to be cut and the key frame, can be determined according to the actual scene requirements. Illustratively, the key frame may be the 60th video frame I_60 in the video to be processed; then the adjacent video frames I_59 and I_61 before and after the key frame I_60 may be selected as the video frames to be cut, or the 55th to 59th video frames I_55 to I_59 in the video to be processed may be selected as the video frames to be cut. The selection rule for the video frames to be cut can therefore vary.
The position and size of the first image block in the key frame may be preset. The preset offset direction may be determined according to the time order of the key frame and each video frame to be cut within the video to be processed; in that case, the obtained second image blocks approximate the way the device moves during shooting. For example, the second image block in the 59th video frame I_59 may be shifted to the right and that in the 61st video frame I_61 may be shifted to the left, so as to form a motion offset together with the first image block in the 60th video frame I_60. In addition, in some scenarios, the preset offset direction may be random, to simulate a video captured while the shooting device shakes randomly in a real shooting scene. The preset value can be a random number generated within a certain value interval, or can be preset by the relevant personnel.
Specifically, the first image block may be obtained by clipping in the key frame, the second image blocks may be obtained by clipping from each video frame to be clipped, and the first image block and each second image block may be combined to obtain a set of training video data according to a specific time sequence. For example, the specific timing may be a timing of the key frame and each to-be-cropped video frame in the to-be-processed video.
In the embodiment of the application, after the training video data and the true-value image blocks are acquired, the true-value image blocks can be used as true-value labels of the training video data and used for training a machine learning algorithm for processing a video in some scenes. Because the fuzzy characteristics such as the noise distribution mode of the fuzzy video in the specific fuzzy video-clear image training set are similar to the video acquired in the real application scene, the performance of the corresponding machine learning algorithm can be improved.
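The cropping described above might be sketched as follows; the block size, maximum offset, and the use of random offsets are assumptions for illustration (the description also allows offset directions determined by the frames' time order). The corresponding true-value image block would be cut from the true-value image at the same position and size as the first image block:

```python
import numpy as np

def make_training_clip(key_frame, frames_to_crop, top_left, size,
                       max_offset=4, rng=None):
    """Cut the first image block from the key frame, then cut offset
    second blocks of the same size from the adjacent frames to imitate
    camera motion. `max_offset` and the random offsets are illustrative."""
    rng = rng or np.random.default_rng()
    y, x = top_left
    h, w = size
    first_block = key_frame[y:y + h, x:x + w]
    second_blocks = []
    for frame in frames_to_crop:
        dy, dx = rng.integers(-max_offset, max_offset + 1, size=2)
        yy = int(np.clip(y + dy, 0, frame.shape[0] - h))
        xx = int(np.clip(x + dx, 0, frame.shape[1] - w))
        second_blocks.append(frame[yy:yy + h, xx:xx + w])
    # Combine in the frames' time order to form one training video sample.
    return first_block, second_blocks
```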
In some embodiments, after obtaining a true value image corresponding to the key frame according to the aligned to-be-processed video frame associated with the key frame, the method further includes:
and establishing a fuzzy-clear image training set according to the key frame and the truth value image in the video to be processed.
In the embodiment of the present application, since the number of the key frames in the video frame to be processed may be more than one, there may be more than one group of training image pairs formed by the key frames and the true value images. The truth image may be labeled as a truth value of a corresponding key frame.
In some embodiments, said aligning, for each key frame, the video frame to be processed associated with the key frame to the key frame comprises:
for each key frame, determining at least one image group associated with the key frame, wherein each image group comprises a plurality of video frames to be processed;
aiming at each image group, respectively aligning each video frame to be processed in the image group to the key frame;
the obtaining a truth-value image corresponding to the key frame according to the aligned to-be-processed video frame associated with the key frame includes:
for each image group, fusing the aligned video frames to be processed in the image group to obtain a first image corresponding to the image group;
aligning each obtained first image to the key frame respectively;
and fusing the aligned first images to obtain a truth value image of the key frame.
In the embodiment of the present application, the dividing manner of each image group may be various. In some examples, the plurality of to-be-processed video frames in each image group may be consecutive video frames, thereby improving the relevance of the to-be-processed video frames in one image group. The modes of aligning each to-be-processed video frame in the image group to the key frame and aligning each obtained first image to the key frame may be the same or different. For example, the alignment may be performed according to feature information of two frames of video frames matching with each other. Illustratively, transformation matrices such as a homography matrix and an affine transformation between two frames of video frames can be established according to the mutually matched characteristic information, and then alignment is performed according to the corresponding transformation matrices.
A specific implementation of the embodiments of the present application is described below as a specific example.
For example, as shown in fig. 2, the video frames to be processed may be acquired from the video to be processed and sorted according to their time order in the video. If 170 video frames to be processed are acquired, the 60th and 120th of them can be selected as key frames, denoted I_60 and I_120 respectively. The video frames to be processed associated with the key frame I_60 can be the 10th to 110th video frames to be processed, and the video frames to be processed associated with the key frame I_120 are the 70th to 170th video frames to be processed.
For the key frame I_60, the associated 10th to 110th video frames to be processed can be divided into 10 image groups, each containing 10 consecutive video frames to be processed. For each image group, the video frames to be processed in the group are each aligned to the key frame and then fused to obtain a first image corresponding to the group, so that 10 first images can be obtained. Then, the obtained 10 first images are each aligned to the key frame and fused to obtain the final true-value image.
In the embodiment of the application, image fusion can be respectively carried out on each group of image groups, and then image fusion is carried out on each obtained first image, at the moment, the image quality can be continuously improved progressively through image fusion operation of two levels, so that the image quality of the finally obtained true value image is better.
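Putting the pieces together, the two-level fusion could be sketched as follows, reusing the hypothetical align_to_key_frame and fuse_aligned_frames helpers from the earlier sketches:

```python
def truth_image_for_key_frame(key_frame, associated_frames, group_size=10):
    """Two-level fusion: split the associated frames into consecutive
    groups, fuse each aligned group into a first image, then align and
    fuse the first images into the final true-value image."""
    groups = [associated_frames[i:i + group_size]
              for i in range(0, len(associated_frames), group_size)]
    first_images = []
    for group in groups:
        aligned = [align_to_key_frame(f, key_frame) for f in group]
        first_images.append(fuse_aligned_frames(aligned))
    aligned_first = [align_to_key_frame(img, key_frame) for img in first_images]
    return fuse_aligned_frames(aligned_first)
```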
In some embodiments, for each image group, aligning each to-be-processed video frame in the image group to the key frame respectively includes:
aiming at each video frame to be processed in each image group, dividing the video frame to be processed into a plurality of first image areas;
for any first image area, determining an image area to be searched corresponding to the first image area in the key frame;
searching a second image area with the highest similarity with the first image area in the image area to be searched, wherein the size of the second image area is the same as that of the first image area;
acquiring a matching point pair in the first image area and the second image area, wherein two feature points in the matching point pair are respectively located in the first image area and the second image area;
after at least one group of matching point pairs between the video frame to be processed and the key frame is obtained, aligning the video frame to be processed to the key frame according to the at least one group of matching point pairs.
In this embodiment of the application, the size of the image area to be searched may be larger than the size of the first image area. The number of matching point pairs can be determined according to actual requirements; accordingly, how many times the operation of determining an image area to be searched for a first image area in the key frame, searching it for the second image area with the highest similarity, and acquiring a matching point pair from the first and second image areas is performed may also be determined according to actual requirements.
In this embodiment of the application, the video frame to be processed may include a plurality of first image areas obtained by division and, in some examples, other regions as well. For example, the video frame to be processed may be divided into a plurality of first image areas of a preset size (such as the 5 × 5 areas used in the example below). In that case, there may be an edge region with a width of less than 5 pixels at the edge of the video frame to be processed, and such an edge region cannot be divided into first image areas.
After at least one set of matching point pairs between the video frame to be processed and the key frame is obtained, a transformation matrix may be generated according to the at least one set of matching point pairs, so as to align the video frame to be processed to the key frame. For example, the transformation matrix may be a homography matrix between two images calculated by a Random Sample Consensus (RANSAC) algorithm.
A specific implementation of the embodiments of the present application is described below as a specific example.
Illustratively, as shown in fig. 3, suppose that in the video frame to be processed the center coordinate of a first image area is (x_r, y_r) and the size of the first image area is 5 × 5. An image area to be searched corresponding to the first image area is determined in the key frame as (x_r − 25 ~ x_r + 25, y_r − 25 ~ y_r + 25). Within the image area to be searched, the second image area with the highest similarity to the first image area is searched for with a step size of 1 pixel; let its center coordinate be (x_t, y_t). Then (x_r, y_r) and (x_t, y_t) are taken as a matching point pair between the first image area and the second image area. After the video frame to be processed has been traversed, a plurality of groups of matching point pairs are obtained, so that a homography matrix between the video frame to be processed and the key frame can be calculated from them through the Random Sample Consensus (RANSAC) algorithm, and the video frame to be processed can then be aligned to the key frame according to the homography matrix, yielding the aligned video frame to be processed.
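A direct, unoptimized sketch of this block-matching alignment; the sum of absolute differences is an assumed similarity measure (the description only requires the highest similarity), and the exhaustive search is written for clarity rather than speed:

```python
import cv2
import numpy as np

def block_match_point_pairs(frame, key_frame, block=5, radius=25):
    """For each block x block first image area centred at (x_r, y_r) in
    the frame to be processed, search a +/- radius window in the key
    frame with a 1-pixel step for the most similar area, and record the
    centre pair ((x_r, y_r), (x_t, y_t))."""
    g1 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    g2 = cv2.cvtColor(key_frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    half = block // 2
    h, w = g1.shape
    pairs = []
    for yr in range(half, h - half, block):
        for xr in range(half, w - half, block):
            patch = g1[yr - half:yr + half + 1, xr - half:xr + half + 1]
            best_sad, best_xy = None, None
            for yt in range(max(half, yr - radius), min(h - half - 1, yr + radius) + 1):
                for xt in range(max(half, xr - radius), min(w - half - 1, xr + radius) + 1):
                    cand = g2[yt - half:yt + half + 1, xt - half:xt + half + 1]
                    sad = float(np.abs(patch - cand).sum())
                    if best_sad is None or sad < best_sad:
                        best_sad, best_xy = sad, (xt, yt)
            pairs.append(((xr, yr), best_xy))
    return pairs

def homography_from_pairs(pairs):
    """Estimate the frame-to-key-frame homography from the matching
    point pairs with RANSAC; the threshold of 3.0 is illustrative."""
    src = np.float32([p for p, _ in pairs]).reshape(-1, 1, 2)
    dst = np.float32([q for _, q in pairs]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H
```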
In the embodiments of the application, a video to be processed can be acquired, where the video to be processed includes a plurality of video frames to be processed captured of the same static scene, so that a plurality of correlated images of the same scene can be acquired, allowing information to be extracted subsequently by combining these correlated images. Key frames are then determined from the video to be processed; for each key frame, the video frames to be processed associated with the key frame are aligned to the key frame, and a true-value image corresponding to the key frame is obtained from the aligned video frames to be processed associated with the key frame. Through the alignment, the aligned video frames associated with the key frame take the key frame as reference, so the obtained true-value image also takes the key frame as reference, and the definition of the true-value image is higher than that of the key frame. The true-value image can therefore serve as a high-quality image corresponding to the key frame, with the key frame as the low-quality image. Moreover, the key frame is captured of a static scene in a real shooting scene rather than obtained by adding white noise or the like, so its blur characteristics, such as the noise distribution pattern, are similar to those of the low-quality images to be processed in subsequent real application scenarios. Therefore, according to the embodiments of the application, low-quality images and high-quality images close to the real scene can be acquired, so that, in some exemplary application scenarios, these acquired images can be used to train a machine learning algorithm, improving the performance of the algorithm and thereby the restoration effect for low-quality images and videos.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 4 shows a block diagram of an image processing apparatus provided in an embodiment of the present application, which corresponds to the image processing method described in the above embodiment, and only shows portions related to the embodiment of the present application for convenience of description.
Referring to fig. 4, the image processing apparatus 4 includes:
an obtaining module 401, configured to obtain a video to be processed, where the video to be processed includes a plurality of video frames to be processed captured of the same static scene;
a determining module 402, configured to determine a key frame from the video to be processed;
an alignment module 403, configured to, for each key frame, align a to-be-processed video frame associated with the key frame to the key frame;
a processing module 404, configured to obtain a true value image corresponding to the key frame according to the aligned to-be-processed video frame associated with the key frame, where a definition of the true value image is higher than that of the key frame.
Optionally, the image processing apparatus 4 further includes:
the second acquisition module is used for acquiring a plurality of original video frames from the video to be processed;
the calculation module is used for calculating, for each original video frame, a blur parameter of the original video frame according to a discrete parameter, an information entropy, and/or a structural similarity of the original video frame, wherein the discrete parameter comprises the variance or standard deviation of the original video frame;
and the second determining module is used for determining the video frames to be processed according to the blur parameter of each original video frame.
Optionally, the calculation module is specifically configured to:
for each original video frame, calculating the blur parameter of the original video frame according to a preset formula, wherein the preset formula is:

BP = λ1·V_f + λ2·E_f + λ3·G_f

where BP is the blur parameter of the original video frame, V_f is the discrete parameter of the original video frame, E_f is the information entropy of the original video frame, G_f is the structural similarity of the original video frame, λ1 is a first preset weight, λ2 is a second preset weight, and λ3 is a third preset weight;
the second determining module is specifically configured to:
and, for each original video frame, if the blur parameter of the original video frame is within a preset parameter interval, taking the original video frame as a video frame to be processed.
Optionally, the alignment module 403 specifically includes:
the first determining unit is used for determining at least one image group associated with each key frame, wherein each image group comprises a plurality of video frames to be processed;
the first alignment unit is used for aligning each video frame to be processed in each image group to the key frame;
the processing module 404 specifically includes:
the first fusion unit is used for fusing the aligned video frames to be processed in each image group to obtain a first image corresponding to the image group;
the second alignment unit is used for aligning each obtained first image to the key frame respectively;
and the second fusion unit is used for fusing the aligned first images to obtain a truth value image of the key frame.
Optionally, the first alignment unit specifically includes:
the dividing unit is used for dividing each video frame to be processed in each image group to obtain a plurality of first image areas;
a determining subunit, configured to determine, in the key frame, an image area to be searched corresponding to any first image area;
the searching subunit is configured to search, in the image area to be searched, a second image area with the highest similarity to the first image area, where the size of the second image area is the same as the size of the first image area;
a matching subunit, configured to obtain a pair of matching points in the first image region and the second image region, where two feature points in the pair of matching points are located in the first image region and the second image region, respectively;
and the aligning subunit is configured to align the video frame to be processed with the key frame according to at least one group of matching point pairs after the at least one group of matching point pairs between the video frame to be processed and the key frame is acquired.
Optionally, the image processing apparatus 4 further includes:
the third acquisition module is used for acquiring a video frame to be cut from M frames of video frames to be processed adjacent to any key frame;
a third determining module, configured to extract the first image block from the key frame;
a fourth determining module, configured to extract, for each to-be-cut video frame, a second image block from the to-be-cut video frame, where a position of the second image block in the to-be-cut video frame is a position of the first image block in the key frame after the first image block is shifted by a preset value along a corresponding preset shift direction, and a size of the second image block is the same as a size of the first image block;
the second processing module is used for obtaining training video data according to the first image blocks and the second image blocks;
the extraction module is used for extracting a true value image block corresponding to the first image block from the true value image;
and the third processing module is used for establishing a fuzzy video-clear image training set according to the training video data and the true value image blocks.
Optionally, the image processing apparatus 4 further includes:
and the fourth processing module is used for establishing a fuzzy-clear image training set according to the key frame and the true value image in the video to be processed.
In the embodiments of the application, a video to be processed can be acquired, where the video to be processed includes a plurality of video frames to be processed captured of the same static scene, so that a plurality of correlated images of the same scene can be acquired, allowing information to be extracted subsequently by combining these correlated images. Key frames are then determined from the video to be processed; for each key frame, the video frames to be processed associated with the key frame are aligned to the key frame, and a true-value image corresponding to the key frame is obtained from the aligned video frames to be processed associated with the key frame. Through the alignment, the aligned video frames associated with the key frame take the key frame as reference, so the obtained true-value image also takes the key frame as reference, and the definition of the true-value image is higher than that of the key frame. The true-value image can therefore serve as a high-quality image corresponding to the key frame, with the key frame as the low-quality image. Moreover, the key frame is captured of a static scene in a real shooting scene rather than obtained by adding white noise or the like, so its blur characteristics, such as the noise distribution pattern, are similar to those of the low-quality images to be processed in subsequent real application scenarios. Therefore, according to the embodiments of the application, low-quality images and high-quality images close to the real scene can be acquired, so that, in some exemplary application scenarios, these acquired images can be used to train a machine learning algorithm, improving the performance of the algorithm and thereby the restoration effect for low-quality images and videos.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned functions may be distributed as different functional units and modules according to needs, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 5, the terminal device 5 of this embodiment includes: at least one processor 50 (only one is shown in fig. 5), a memory 51, and a computer program 52 stored in the memory 51 and executable on the at least one processor 50, wherein the steps in any of the above-described embodiments of the image processing method are implemented when the processor 50 executes the computer program 52.
The terminal device 5 may be a server, a mobile phone, a wearable device, an Augmented Reality (AR)/Virtual Reality (VR) device, a desktop computer, a notebook computer, a palmtop computer, or another computing device. The terminal device may include, but is not limited to, a processor 50 and a memory 51. Those skilled in the art will appreciate that fig. 5 is merely an example of the terminal device 5 and does not constitute a limitation of the terminal device 5, which may include more or fewer components than those shown, or combine some of the components, or use different components; for example, it may also include input devices, output devices, network access devices, etc. The input device may include a keyboard, a touch pad, a fingerprint sensor (for collecting fingerprint information of a user and direction information of the fingerprint), a microphone, a camera, and the like, and the output device may include a display, a speaker, and the like.
The Processor 50 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or an internal memory of the terminal device 5. In other embodiments, the memory 51 may also be an external storage device of the terminal device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the terminal device 5. Further, the memory 51 may include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used for storing an operating system, application programs, a Boot Loader, data, and other programs, such as the program code of the above computer program. The memory 51 may also be used to temporarily store data that has been output or is to be output.
In addition, although not shown, the terminal device 5 may further include network connection modules, such as a Bluetooth module, a Wi-Fi module, a cellular network module, and the like, which are not described herein again.
When the processor 50 executes the computer program 52 to implement the steps in any of the above image processing method embodiments, the video to be processed is acquired, key frames are determined, the associated video frames to be processed are aligned, and the corresponding truth-value images are obtained in the manner described above, with the same technical effects: low-quality and high-quality images close to real scenes are acquired and can be used to train a machine learning algorithm, improving the restoration effect on low-quality images and videos. The details are not repeated here.
The embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps in the above method embodiments.
The embodiments of the present application further provide a computer program product which, when run on a terminal device, causes the terminal device to implement the steps in the above method embodiments.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the above embodiments can be implemented by a computer program; the computer program can be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, executable file form, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing apparatus/terminal device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable media may not be electrical carrier signals or telecommunication signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/device and method may be implemented in other ways. For example, the above-described apparatus/device embodiments are merely illustrative, and for example, the division of the above modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. An image processing method, comprising:
acquiring a video to be processed, wherein the video to be processed comprises a plurality of video frames to be processed shot for the same static scene;
determining a key frame from the video to be processed;
for each key frame, aligning a video frame to be processed associated with the key frame to the key frame;
and obtaining a truth-value image corresponding to the key frame according to the aligned video frame to be processed associated with the key frame, wherein the definition of the truth-value image is higher than that of the key frame.
2. The image processing method according to claim 1, further comprising, after acquiring the video to be processed:
acquiring a plurality of original video frames from the video to be processed;
for each original video frame, calculating a blur parameter of the original video frame according to a discrete parameter, an information entropy and/or a structural similarity of the original video frame, wherein the discrete parameter comprises a variance or a standard deviation of the original video frame;
and determining the video frames to be processed according to the blur parameter of each original video frame.
3. The image processing method according to claim 2, wherein the calculating, for each original video frame, a blur parameter of the original video frame according to a discrete parameter, an information entropy and/or a structural similarity of the original video frame comprises:
for each original video frame, calculating the blur parameter of the original video frame according to a preset formula, wherein the preset formula is:
BP = λ1Vf + λ2Ef + λ3Gf
wherein BP is the blur parameter of the original video frame, Vf is the discrete parameter of the original video frame, Ef is the information entropy of the original video frame, Gf is the structural similarity of the original video frame, λ1 is a first preset weight, λ2 is a second preset weight, and λ3 is a third preset weight;
and the determining the video frames to be processed according to the blur parameter of each original video frame comprises:
for each original video frame, if the blur parameter of the original video frame is within a preset parameter interval, taking the original video frame as a video frame to be processed.
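By way of a hedged illustration of claims 2 and 3 above, the sketch below computes the blur parameter as the weighted sum BP = λ1Vf + λ2Ef + λ3Gf and screens frames by a preset parameter interval. Using the grayscale variance as the discrete parameter, the histogram entropy as Ef, the structural similarity of the frame against a re-blurred copy of itself as Gf, and all of the weight and interval values are assumptions made for this example; the claims fix only the weighted-sum form.

    import cv2
    import numpy as np
    from skimage.metrics import structural_similarity

    def blur_parameter(frame_bgr, lam=(0.4, 0.3, 0.3)):
        """Compute BP = lam1*Vf + lam2*Ef + lam3*Gf for one frame."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        vf = gray.var()  # discrete parameter Vf (here: variance)
        hist = np.bincount(gray.ravel(), minlength=256) / gray.size
        ef = -np.sum(hist[hist > 0] * np.log2(hist[hist > 0]))  # entropy Ef
        # Gf: similarity of the frame to a blurred copy of itself; an already
        # blurry frame changes little under extra blur, so a high Gf
        # indicates low sharpness.
        gf = structural_similarity(gray, cv2.GaussianBlur(gray, (9, 9), 0),
                                   data_range=255)
        return lam[0] * vf + lam[1] * ef + lam[2] * gf

    def select_frames(original_frames, interval=(10.0, 500.0)):
        """Keep originals whose blur parameter lies in the preset interval."""
        lo, hi = interval
        return [f for f in original_frames if lo <= blur_parameter(f) <= hi]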
4. The image processing method according to claim 1, wherein the aligning, for each key frame, the video frames to be processed associated with the key frame to the key frame comprises:
for each key frame, determining at least one image group associated with the key frame, wherein each image group comprises a plurality of video frames to be processed;
for each image group, aligning each video frame to be processed in the image group to the key frame;
and the obtaining a truth-value image corresponding to the key frame according to the aligned video frames to be processed associated with the key frame comprises:
for each image group, fusing the aligned video frames to be processed in the image group to obtain a first image corresponding to the image group;
aligning each obtained first image to the key frame;
and fusing the aligned first images to obtain the truth-value image of the key frame.
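The two-stage fusion of claim 4 can be sketched as follows, reusing the align_to_key helper from the earlier sketch. The fixed group size and the plain averaging at both stages are illustrative assumptions; the claim requires only per-group alignment and fusion followed by a second alignment and fusion of the per-group results.

    import numpy as np

    def truth_image_grouped(key, frames, group_size=4):
        """Fuse frames per group, then align and fuse the group results."""
        groups = [frames[i:i + group_size]
                  for i in range(0, len(frames), group_size)]
        first_images = []
        for group in groups:
            # stage 1: align each frame of the group to the key frame, fuse
            aligned = [align_to_key(f, key) for f in group]
            first_images.append(
                np.mean([a.astype(np.float32) for a in aligned], axis=0))
        # stage 2: re-align each first image to the key frame and fuse them
        realigned = [align_to_key(f.astype(np.uint8), key)
                     for f in first_images]
        fused = np.mean([r.astype(np.float32) for r in realigned], axis=0)
        return np.clip(fused, 0, 255).astype(np.uint8)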
5. The image processing method according to claim 4, wherein the aligning, for each image group, each video frame to be processed in the image group to the key frame comprises:
for each video frame to be processed in each image group, dividing the video frame to be processed into a plurality of first image areas;
for any first image area, determining an image area to be searched in the key frame corresponding to the first image area;
searching, in the image area to be searched, for a second image area with the highest similarity to the first image area, wherein the size of the second image area is the same as that of the first image area;
acquiring a matching point pair from the first image area and the second image area, wherein the two feature points of the matching point pair are located in the first image area and the second image area, respectively;
and after at least one set of matching point pairs between the video frame to be processed and the key frame is obtained, aligning the video frame to be processed to the key frame according to the at least one set of matching point pairs.
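A sketch of the block matching in claim 5, assuming OpenCV: the video frame to be processed is divided into first image areas, each area is matched within a padded search area of the key frame by normalized cross-correlation, the centres of the matched blocks serve as the matching point pair, and a homography is fitted to the collected pairs. The block size, the search margin, the correlation score, and the RANSAC homography fit are illustrative choices, not requirements of the claim.

    import cv2
    import numpy as np

    def block_match_align(frame, key, block=64, margin=16):
        """Align `frame` to `key` from block-wise matching point pairs."""
        fg = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        kg = cv2.cvtColor(key, cv2.COLOR_BGR2GRAY)
        h, w = fg.shape
        src_pts, dst_pts = [], []
        for y in range(0, h - block + 1, block):
            for x in range(0, w - block + 1, block):
                patch = fg[y:y + block, x:x + block]   # first image area
                y0, y1 = max(0, y - margin), min(h, y + block + margin)
                x0, x1 = max(0, x - margin), min(w, x + block + margin)
                search = kg[y0:y1, x0:x1]              # image area to be searched
                res = cv2.matchTemplate(search, patch, cv2.TM_CCOEFF_NORMED)
                _, _, _, best = cv2.minMaxLoc(res)     # second image area offset
                src_pts.append([x + block / 2, y + block / 2])
                dst_pts.append([x0 + best[0] + block / 2,
                                y0 + best[1] + block / 2])
        src = np.float32(src_pts).reshape(-1, 1, 2)
        dst = np.float32(dst_pts).reshape(-1, 1, 2)
        # fit a homography to the matching point pairs and warp the frame
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
        return cv2.warpPerspective(frame, H, (w, h))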
6. The image processing method according to any one of claims 1 to 5, further comprising, after determining the key frames from the video to be processed:
for any key frame, acquiring video frames to be cropped from the M video frames to be processed adjacent to the key frame;
extracting a first image block from the key frame;
for each video frame to be cropped, extracting a second image block from the video frame to be cropped, wherein the position of the second image block in the video frame to be cropped is the position of the first image block in the key frame shifted by a preset value along a corresponding preset shift direction, and the size of the second image block is the same as that of the first image block;
and obtaining training video data according to the first image block and the second image blocks;
and after obtaining the truth-value image corresponding to the key frame according to the aligned video frames to be processed associated with the key frame, the method further comprises:
extracting a truth-value image block corresponding to the first image block from the truth-value image;
and establishing a blurred-video/sharp-image training set according to the training video data and the truth-value image block.
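The training-pair construction of claim 6 can be sketched as below. The block position and size, the per-frame shift pattern (emulating small camera displacements across the M adjacent frames), and the dictionary layout of the returned sample are assumptions made for this example.

    import numpy as np

    def make_training_sample(key, neighbors, truth, top=100, left=100,
                             size=128, shift=2):
        """Cut one blurred-video / sharp-image training pair.

        neighbors: the M video frames to be cropped adjacent to the key frame.
        shift:     preset offset in pixels, grown per neighbouring frame.
        """
        first = key[top:top + size, left:left + size]       # first image block
        # preset shift directions, cycled over the M neighbouring frames
        directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]
        seconds = []
        for i, frame in enumerate(neighbors):
            dy, dx = directions[i % len(directions)]
            t = top + dy * shift * (i + 1)
            l = left + dx * shift * (i + 1)
            seconds.append(frame[t:t + size, l:l + size])   # second image block
        video_blocks = np.stack([first] + seconds)          # training video data
        truth_block = truth[top:top + size, left:left + size]
        return {"video": video_blocks, "truth": truth_block}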
7. The image processing method according to any one of claims 1 to 5, further comprising, after obtaining the truth-value image corresponding to the key frame according to the aligned video frames to be processed associated with the key frame:
establishing a blurred/sharp image training set according to the key frames in the video to be processed and the truth-value images.
8. An image processing apparatus characterized by comprising:
an acquisition module, configured to acquire a video to be processed, wherein the video to be processed comprises a plurality of video frames to be processed shot for the same static scene;
a determining module, configured to determine key frames from the video to be processed;
an alignment module, configured to align, for each key frame, the video frames to be processed associated with the key frame to the key frame;
and a processing module, configured to obtain a truth-value image corresponding to the key frame according to the aligned video frames to be processed associated with the key frame, wherein the definition of the truth-value image is higher than that of the key frame.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the image processing method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the image processing method according to any one of claims 1 to 7.
CN202010717686.1A 2020-07-23 2020-07-23 Image processing method, image processing device and terminal equipment Active CN111833285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010717686.1A CN111833285B (en) 2020-07-23 2020-07-23 Image processing method, image processing device and terminal equipment

Publications (2)

Publication Number Publication Date
CN111833285A true CN111833285A (en) 2020-10-27
CN111833285B CN111833285B (en) 2024-07-05

Family

ID=72925846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010717686.1A Active CN111833285B (en) 2020-07-23 2020-07-23 Image processing method, image processing device and terminal equipment

Country Status (1)

Country Link
CN (1) CN111833285B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105100688A (en) * 2014-05-12 2015-11-25 索尼公司 Image processing method, image processing device and monitoring system
US20160042519A1 (en) * 2014-08-07 2016-02-11 National Taiwan University Method and image processing device for efficient image depth map generation
CN109905711A (en) * 2019-02-28 2019-06-18 深圳英飞拓智能技术有限公司 A kind of processing method of image, system and terminal device
CN110070511A (en) * 2019-04-30 2019-07-30 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801890A (en) * 2021-01-08 2021-05-14 北京奇艺世纪科技有限公司 Video processing method, device and equipment
CN112801890B (en) * 2021-01-08 2023-07-25 北京奇艺世纪科技有限公司 Video processing method, device and equipment
CN114972809A (en) * 2021-02-19 2022-08-30 株式会社理光 Method, apparatus, and computer-readable storage medium for video processing
CN112733823A (en) * 2021-03-31 2021-04-30 南昌虚拟现实研究院股份有限公司 Method and device for extracting key frame for gesture recognition and readable storage medium
CN113409203A (en) * 2021-06-10 2021-09-17 Oppo广东移动通信有限公司 Image blurring degree determining method, data set constructing method and deblurring method

Also Published As

Publication number Publication date
CN111833285B (en) 2024-07-05

Similar Documents

Publication Publication Date Title
CN111833285B (en) Image processing method, image processing device and terminal equipment
Gani et al. A robust copy-move forgery detection technique based on discrete cosine transform and cellular automata
CN106560840B (en) A kind of image information identifying processing method and device
CN110688524B (en) Video retrieval method and device, electronic equipment and storage medium
CN108230333B (en) Image processing method, image processing apparatus, computer program, storage medium, and electronic device
CN109116129B (en) Terminal detection method, detection device, system and storage medium
CN111131688B (en) Image processing method and device and mobile terminal
CN111028276A (en) Image alignment method and device, storage medium and electronic equipment
CN114187333A (en) Image alignment method, image alignment device and terminal equipment
CN112950640A (en) Video portrait segmentation method and device, electronic equipment and storage medium
CN114119964A (en) Network training method and device, and target detection method and device
Liu et al. Reference-guided texture and structure inference for image inpainting
CN112235598B (en) Video structured processing method and device and terminal equipment
CN111222446B (en) Face recognition method, face recognition device and mobile terminal
CN111311526B (en) Video enhancement method, video enhancement device and terminal equipment
CN115690845A (en) Motion trail prediction method and device
CN115439386A (en) Image fusion method and device, electronic equipment and storage medium
CN112669346B (en) Pavement emergency determination method and device
Kaur et al. A comparative review of various techniques for image splicing detection and localization
US11423597B2 (en) Method and system for removing scene text from images
CN111178340B (en) Image recognition method and training method of image recognition model
CN114299105A (en) Image processing method, image processing device, computer equipment and storage medium
CN114861904A (en) Image training data generation method and device, terminal equipment and storage medium
CN111754411A (en) Image noise reduction method, image noise reduction device and terminal equipment
CN114373153B (en) Video imaging optimization system and method based on multi-scale array camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant