CN111583138B - Video enhancement method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111583138B
CN111583138B (application CN202010344386.3A)
Authority
CN
China
Prior art keywords
image block
video
motion vector
video frame
image blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010344386.3A
Other languages
Chinese (zh)
Other versions
CN111583138A (en)
Inventor
张弓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010344386.3A
Publication of CN111583138A
Application granted
Publication of CN111583138B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a video enhancement method and apparatus, an electronic device, and a storage medium, and relates to the technical field of image processing. The method comprises the following steps: acquiring a plurality of video frames of a video to be processed and dividing each video frame into a plurality of image blocks; for each video frame, selecting a reference frame for the video frame from the plurality of video frames, and selecting from the reference frame a preset number of target image blocks that match a single image block in the video frame; inputting the single image block and the preset number of target image blocks into a video processing model to determine the enhanced image block corresponding to that image block; and stitching the enhanced image blocks corresponding to the image blocks of each video frame to obtain an enhanced video frame, and determining the enhanced video based on the enhanced video frames corresponding to the plurality of video frames. The present disclosure can improve the effect of video enhancement.

Description

Video enhancement method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technology, and in particular, to a video enhancement method, a video enhancement apparatus, an electronic device, and a computer readable storage medium.
Background
Image enhancement technology purposefully emphasizes the whole or local characteristics of an image: an originally unclear image is made clear, or features of interest are emphasized, which improves image quality and information content, strengthens image interpretation and recognition, and meets the needs of certain special analyses. In existing video enhancement methods, multiple frames of images may be input into a neural network for video enhancement processing; however, the video enhancement effect of this approach is poor.
Disclosure of Invention
An object of the present disclosure is to provide a video enhancement method, a video enhancement apparatus, an electronic device, and a computer-readable storage medium, which overcome, at least to some extent, the problem of poor video enhancement caused by the limitations and disadvantages of the related art.
According to a first aspect of the present disclosure, there is provided a video enhancement method, comprising:
acquiring a plurality of video frames in a video to be processed, and dividing each video frame into a plurality of image blocks;
for each video frame, selecting a reference frame of the video frame from the plurality of video frames, and selecting a preset number of target image blocks matched with single image blocks in the video frame from the reference frame;
inputting a single image block and the preset number of target image blocks in the video frame into a video processing model, and determining an enhanced image block corresponding to the image block;
splicing the enhancement image blocks corresponding to the image blocks in each video frame to obtain an enhancement video frame of the video frame, and determining an enhancement video based on the enhancement video frames corresponding to the video frames.
According to a second aspect of the present disclosure, there is provided a video enhancement apparatus comprising:
the video frame dividing module is used for acquiring a plurality of video frames in the video to be processed and dividing each video frame into a plurality of image blocks;
the target image block selection module is used for selecting a reference frame of each video frame from the plurality of video frames and selecting a preset number of target image blocks matched with single image blocks in the video frame from the reference frame;
the image block enhancement processing module is used for inputting a single image block and the preset number of target image blocks in the video frame into a video processing model and determining an enhanced image block corresponding to the image block;
the enhanced video determining module is used for splicing the enhanced image blocks corresponding to the image blocks in each video frame to obtain enhanced video frames of the video frame, and determining enhanced video based on the enhanced video frames corresponding to the video frames.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the video enhancement method described above via execution of the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the video enhancement method described above.
Exemplary embodiments of the present disclosure may have some or all of the following advantages:
in the video enhancement method provided by an example embodiment of the present disclosure, target image blocks matching an image block in a video frame are selected from reference frames of the video frame, and the image block and the target image blocks are input into a video processing model together. On the one hand, more features related to the image block can be extracted, so that processing these additional features with the neural network improves the enhancement effect on the image block and therefore on the video. On the other hand, combining the selected target image blocks with the video processing model, that is, combining a traditional algorithm with a neural network, makes full use of the advantages of both, further improving the image enhancement effect and thus the video enhancement effect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 shows a schematic diagram of an electronic device in an embodiment of the disclosure;
FIG. 2 illustrates a computer-readable storage medium for implementing a video enhancement method;
FIG. 3 illustrates a flow chart of a video enhancement method in an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a reference frame of a video frame;
FIG. 5 illustrates a flow chart of a method of selecting a target image block from a reference frame in an embodiment of the disclosure;
FIG. 6 illustrates a flowchart of a training method for a video processing model in an embodiment of the present disclosure;
fig. 7 shows a schematic structural diagram of a video enhancement device in an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 shows a schematic structural diagram of an electronic device in an embodiment of the disclosure. It should be noted that the electronic device 100 shown in fig. 1 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.
As shown in fig. 1, the electronic device 100 includes a central processor 101 that can perform various appropriate actions and processes according to a program stored in a read-only memory 102 or a program loaded from a storage section 108 into a random access memory 103. In the random access memory 103, various programs and data required for the system operation are also stored. The central processing unit 101, the read only memory 102, and the random access memory 103 are connected to each other via a bus 104. An input/output interface 105 is also connected to the bus 104.
The following components are connected to the input/output interface 105: an input section 106 including a keyboard, a mouse, and the like; an output section 107 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage section 108 including a hard disk or the like; and a communication section 109 including a network interface card such as a local area network card, a modem, and the like. The communication section 109 performs communication processing via a network such as the internet. The drive 110 is also connected to the input/output interface 105 as needed. A removable medium 111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 110 as needed, so that a computer program read out therefrom is installed into the storage section 108 as needed.
In particular, according to embodiments of the present disclosure, the processes described below with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 109, and/or installed from the removable medium 111. The computer program, when executed by the central processor 101, performs the various functions defined in the method and apparatus of the present application.
It should be noted that the computer readable storage medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable storage medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, radio frequency, and the like, or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
As another aspect, the present application also provides a computer-readable storage medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. Referring to fig. 2, the computer-readable storage medium carries one or more computer programs 200 that, when executed by the electronic device, cause the electronic device to implement the methods described in the embodiments below. For example, the electronic device may implement the steps shown in fig. 3, and so on.
Referring to fig. 3, fig. 3 shows a flowchart of a video enhancement method in an embodiment of the disclosure, which may include the steps of:
in step S310, a plurality of video frames in the video to be processed are acquired, and each video frame is divided into a plurality of image blocks.
Step S320, for each video frame, selecting a reference frame of the video frame from a plurality of video frames, and selecting a preset number of target image blocks from the reference frame, wherein the target image blocks are matched with a single image block in the video frame.
Step S330, inputting the single image block and the preset number of target image blocks in the video frame into a video processing model, and determining the enhanced image block corresponding to the image block.
Step S340, splicing the enhancement image blocks corresponding to the image blocks in each video frame to obtain an enhancement video frame of the video frame, and determining an enhancement video based on the enhancement video frames corresponding to the video frames.
In the video enhancement method of the embodiment of the disclosure, target image blocks matching an image block in a video frame are selected from the reference frames of the video frame, and the image block and the target image blocks are input into the video processing model together. On the one hand, more features related to the image block can be extracted, so that processing these additional features with the neural network improves the enhancement effect on the image block and therefore on the video. On the other hand, combining the selected target image blocks with the video processing model, that is, combining a traditional algorithm with a neural network, makes full use of the advantages of both, further improving the image enhancement effect and thus the video enhancement effect.
The video enhancement method of the embodiments of the present disclosure is described in more detail below:
in step S310, a plurality of video frames in the video to be processed are acquired, and each video frame is divided into a plurality of image blocks.
In the embodiment of the present disclosure, the video to be processed may be a video stored on a terminal device, or a video obtained from the Internet. A video is composed of successive frames of images, and a plurality of video frames of the video to be processed can be acquired, where each video frame is one image. For example, for a video to be processed that contains 30 frames per second, a 20 s video contains 600 video frames.
For each video frame, the video frame may be divided into a plurality of image blocks, and the enhancement processing is performed on the entire video to be processed by performing the enhancement processing on each image block. The division manner of the image blocks is not particularly limited in the present disclosure, and for example, the video frame may be divided into a plurality of rectangular image blocks according to the size of the video frame.
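The division step above can be sketched in NumPy. The block size (64×64 here) and the edge-replication padding for boundary tiles are assumptions, since the patent leaves the division manner open:

```python
import numpy as np

def split_into_blocks(frame, block_h, block_w):
    """Divide a frame (H x W x C array) into a grid of rectangular blocks.

    Edge blocks are padded by edge replication so every block has the
    same size, which keeps downstream batch processing simple.
    """
    h, w = frame.shape[:2]
    pad_h = (-h) % block_h
    pad_w = (-w) % block_w
    padded = np.pad(frame, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    blocks = []
    for y in range(0, padded.shape[0], block_h):
        for x in range(0, padded.shape[1], block_w):
            blocks.append(padded[y:y + block_h, x:x + block_w])
    return blocks

frame = np.zeros((120, 200, 3), dtype=np.uint8)
blocks = split_into_blocks(frame, 64, 64)  # padded to 128 x 256 -> 2 x 4 grid
```

Stitching the enhanced blocks back together (step S340) is the inverse of this tiling, cropping the padding off again.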
In step S320, for each video frame, a reference frame of the video frame is selected from the plurality of video frames, and a preset number of target image blocks matching a single image block in the video frame are selected from the reference frames.
Because of the continuity of video frames, adjacent video frames typically contain the same image information. Thus, for each video frame, one or more video frames adjacent to that video frame can be chosen as its reference frames. For example, several video frames before and several video frames after the video frame may be selected and used as its reference frames. Referring to fig. 4, fig. 4 shows a schematic diagram of the reference frames of a video frame: for the current video frame F_cur, the frames F_(cur-m) to F_(cur+n) are selected, giving n + m reference frames.
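A minimal sketch of this reference-frame selection, assuming the window is simply clamped at the start and end of the sequence (a boundary detail the text does not specify):

```python
def select_reference_frames(frames, cur, m, n):
    """Pick the m frames before and n frames after frames[cur] as
    reference frames, clamping at the sequence boundaries."""
    start = max(0, cur - m)
    stop = min(len(frames), cur + n + 1)
    return [frames[i] for i in range(start, stop) if i != cur]

frames = list(range(10))  # stand-in for decoded video frames
refs = select_reference_frames(frames, cur=5, m=2, n=2)
edge_refs = select_reference_frames(frames, cur=0, m=2, n=2)
```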
Then, a target image block matched with a single image block in the video frame is selected from the reference frame, so that not only the characteristics in the image block but also the characteristics in the target image block can be extracted to perform enhancement processing on the image block. It is apparent that the effect of image enhancement can be improved compared to simply extracting the features in the image block. The number of the selected target image blocks may be a preset number, and the preset number may be 2, 3, 4, 5, or the like, which is not limited herein.
In the embodiment of the disclosure, a preset number of target image blocks matched with a single image block in the video frame may be selected from the reference frame through motion estimation, and of course, the target image blocks may also be selected through other methods. The basic idea of motion estimation is to divide each video frame into a plurality of non-overlapping image blocks, consider that the displacement amounts of all pixels in the image blocks are the same, then determine a target image block matched with the current image block according to a certain matching criterion within a given specific search range from each image block to a reference frame, and the relative displacement between the target image block and the current image block is the motion vector.
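The basic block-matching idea described above can be illustrated with an exhaustive (full) search that minimises the sum of absolute differences (SAD). The search radius and the SAD criterion are illustrative choices, not mandated by the text:

```python
import numpy as np

def full_search(block, ref, top_left, radius):
    """Exhaustive block matching: scan a (2*radius+1)^2 window around
    top_left in the reference frame and return the motion vector
    (dy, dx) minimising SAD, together with that SAD."""
    bh, bw = block.shape
    y0, x0 = top_left
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + bh > ref.shape[0] or x + bw > ref.shape[1]:
                continue  # candidate falls outside the reference frame
            sad = np.abs(ref[y:y + bh, x:x + bw].astype(np.int32)
                         - block.astype(np.int32)).sum()
            if sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad

ref = np.zeros((32, 32), dtype=np.uint8)
ref[10:18, 12:20] = 255                  # bright 8x8 patch in the reference
cur_block = ref[10:18, 12:20].copy()     # current block: the same patch
best_mv, best_sad = full_search(cur_block, ref, top_left=(8, 10), radius=4)
```

The candidate-motion-vector scheme in fig. 5 exists precisely to avoid this brute-force scan; the full search is shown only to make the matching criterion concrete.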
In one implementation of the present disclosure, a method for selecting a target image block from a reference frame by motion estimation may be seen in fig. 5, including the steps of:
step S510, selecting, for a single image block in the video frame, an optimal motion vector and a suboptimal motion vector from the acquired candidate motion vectors, wherein the image block determined based on the optimal/suboptimal motion vector has an optimal/suboptimal matching degree with the image block.
In the embodiment of the disclosure, the candidate motion vector may be a motion vector set according to a spatiotemporal neighborhood rule. The spatio-temporal neighborhood rule refers to that, for any image block, there may be a certain correlation between a motion vector corresponding to an image block adjacent to the image block in the time dimension or adjacent to the image block in the space dimension and a motion vector corresponding to the image block. A motion vector for the image block may be determined based on the correlation.
For a single image block in the video frame, the candidate motion vectors may include:
1) A first motion vector corresponding to an image block adjacent to the image block in the video frame.
In the video frame, a single image block may have a plurality of adjacent image blocks, where adjacency refers to the spatial dimension. For those neighboring image blocks whose motion vectors have already been determined, their motion vectors serve as first motion vectors; that is, there may be a plurality of first motion vectors.
2) Second motion vectors corresponding to the image blocks at the same position in the previous L video frames of the video frame, where L is a positive integer.
In the embodiment of the disclosure, the image blocks at the same position in the previous L video frames are adjacent to the image block in the time dimension. Likewise, there may be a plurality of second motion vectors.
3) Third and fourth motion vectors obtained by adding corresponding random motion vectors to the first and second motion vectors, respectively.
It should be noted that the random motion vector may be an offset vector set according to actual situations, and a maximum range of the random motion vector may be set. After each first motion vector and each second motion vector are added to the corresponding random motion vector, respectively, a third motion vector and a fourth motion vector can be obtained. Wherein the random motion vector corresponding to the first motion vector and the random motion vector corresponding to the second motion vector may be different.
Thus, the candidate motion vector may be a set of the above-described first motion vector, second motion vector, third motion vector, and fourth motion vector. Optionally, the candidate motion vector may further include: a motion vector of 0 in both coordinate axis directions; and an average of the first motion vector, the second motion vector, the third motion vector, and the fourth motion vector. It will be appreciated that the candidate motion vectors for image blocks in the same video frame may be different.
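The assembly of this candidate set can be sketched as follows; the jitter range `max_offset` and the fixed random seed are assumptions made for reproducibility, not values from the patent:

```python
import random

def candidate_motion_vectors(first_mvs, second_mvs, max_offset=2, rng=None):
    """Assemble the candidate set: first MVs (spatial neighbours),
    second MVs (co-located blocks in previous frames), third/fourth MVs
    (randomly perturbed copies of the first/second), the zero vector,
    and the average of the first-to-fourth MVs."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    def jitter(mv):
        return (mv[0] + rng.randint(-max_offset, max_offset),
                mv[1] + rng.randint(-max_offset, max_offset))
    third = [jitter(mv) for mv in first_mvs]    # perturbed spatial MVs
    fourth = [jitter(mv) for mv in second_mvs]  # perturbed temporal MVs
    base = list(first_mvs) + list(second_mvs) + third + fourth
    avg = (round(sum(mv[0] for mv in base) / len(base)),
           round(sum(mv[1] for mv in base) / len(base)))
    return base + [(0, 0), avg]

cands = candidate_motion_vectors([(1, 2)], [(3, -1)])
```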
The selection method of the optimal motion vector and the suboptimal motion vector can be as follows: and obtaining a candidate image block corresponding to the image block according to each motion vector in the candidate motion vectors. Then, the matching degree between the image block and all candidate image blocks is calculated, and the motion vectors corresponding to the optimal matching degree value and the suboptimal value are respectively selected as the optimal motion vector and the suboptimal motion vector.
The method for calculating the matching degree of the image block and each candidate image block may be to calculate the matching error of the image block and each candidate image block; or, calculating Euclidean distance between the image block and each candidate image block; alternatively, the texture gradient of the image block and each candidate image block is calculated, etc. And then, determining the matching degree according to the matching error or the Euclidean distance or the texture gradient, wherein the matching error, the Euclidean distance and the texture gradient are in negative correlation with the matching degree.
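A sketch of such a matching-degree function: since the matching error and Euclidean distance are negatively correlated with the matching degree, one simple design (an assumption, not prescribed by the text) is to negate the raw error so that a larger score always means a better match:

```python
import numpy as np

def matching_degree(block_a, block_b, metric="sad"):
    """Return a matching score where larger means a better match; the
    raw error is negatively correlated with the matching degree, so it
    is negated."""
    a = block_a.astype(np.float64)
    b = block_b.astype(np.float64)
    if metric == "sad":          # mean absolute error as the matching error
        err = np.abs(a - b).mean()
    elif metric == "euclidean":  # root-mean-square (Euclidean) distance
        err = np.sqrt(((a - b) ** 2).mean())
    else:
        raise ValueError(metric)
    return -err

a = np.full((8, 8), 100.0)
same = matching_degree(a, a)       # identical blocks: best possible score
diff = matching_degree(a, a + 10)  # uniformly offset block: worse score
```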
Step S520, determining a rectangular search window according to the optimal motion vector and the suboptimal motion vector.
After the optimal and suboptimal motion vectors are determined, a rectangular search window can be constructed from them so that more motion vectors can be selected while the image blocks located by those motion vectors still match the image block closely. The rectangular search window defines the selection range of the motion vectors.
In one implementation of the present disclosure, the optimal motion vector and the suboptimal motion vector may be subtracted to obtain a vector difference; the vector difference is taken as a diagonal vector to construct a rectangular search window.
For the same image block, the optimal and suboptimal motion vectors share the same start point, which may be, for example, the center point of the image block, and their end points form two diagonal vertices of the rectangular search window. Since the start points coincide and the end point can be any point inside the rectangular search window, more motion vectors can be selected here in steps of pixel-level or sub-pixel-level preset precision.
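A sketch of enumerating the motion vectors inside the rectangular search window at integer-pixel steps; sub-pixel steps would additionally require interpolating the reference frame, which is omitted here:

```python
def search_window_mvs(best_mv, second_mv, step=1):
    """Enumerate candidate motion vectors (dy, dx) inside the rectangle
    whose two diagonal corners are the endpoints of the optimal and
    suboptimal motion vectors (their difference is the window's
    diagonal vector)."""
    y_lo, y_hi = sorted((best_mv[0], second_mv[0]))
    x_lo, x_hi = sorted((best_mv[1], second_mv[1]))
    return [(y, x)
            for y in range(y_lo, y_hi + 1, step)
            for x in range(x_lo, x_hi + 1, step)]

mvs = search_window_mvs((1, 1), (3, 4))  # 3 x 4 window of candidates
```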
In step S530, a plurality of search image blocks in the reference frame are determined according to the rectangular search window and the image block, and the matching degree between each search image block and the image block is calculated.
In the embodiment of the disclosure, since more motion vectors can be selected according to the rectangular search window, a plurality of search image blocks can be determined according to more motion vectors and the image blocks. And selecting a target image block by calculating the matching degree of each searching image block and the image block.
Similarly, a matching error between each search image block and the image block can be calculated, and the obtained matching error is taken as a matching degree; or calculating the Euclidean distance between each search image block and the image block, and taking the obtained Euclidean distance as the matching degree; or calculating the texture gradient of each search image block and the image block, and taking the obtained texture gradient as the matching degree.
Step S540, selecting a preset number of target image blocks from the search image blocks according to the matching degree.
In the embodiment of the disclosure, a higher matching degree indicates a better match between image blocks, so the matching degrees can be ranked in descending order and the search image blocks with the top preset number of matching degrees selected as the target image blocks. Alternatively, all candidate image blocks satisfying a preset matching degree may be determined first, and a preset number of them then selected arbitrarily as target image blocks. The number of candidate image blocks satisfying the preset matching degree may differ between reference frames.
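The descending-sort selection of step S540 can be sketched as:

```python
def select_target_blocks(search_blocks, scores, k):
    """Rank search image blocks by matching degree (descending) and keep
    the top k as target image blocks for the video processing model."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [search_blocks[i] for i in order[:k]]

blocks = ["b0", "b1", "b2", "b3"]           # stand-ins for image blocks
targets = select_target_blocks(blocks, [0.2, 0.9, 0.5, 0.7], k=2)
```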
It should be noted that, in addition to the method shown in fig. 5, the target image blocks may be selected directly from the candidate image blocks; for example, the preset number of candidate image blocks with the highest matching degrees may be selected as the target image blocks. The present disclosure does not limit the method for selecting the target image blocks.
In step S330, a single image block and a preset number of target image blocks in the video frame are input into a video processing model, and an enhanced image block corresponding to the image block is determined.
In the embodiment of the disclosure, the video processing model is a model trained in advance through a neural network. The training method of the video processing model can be seen in fig. 6, and includes the following steps:
in step S610, a plurality of sample data are acquired.
In embodiments of the present disclosure, a video processing model may be trained whose input is an image block of any video frame together with the image blocks in a reference frame of that video frame that match it, and whose output is the enhanced image block of the image block. The video processing model is trained from a large amount of sample data, where each sample may include: a sample image block in a sample video frame, a preset number of image blocks in a reference frame of the sample video frame that match the sample image block, and an enhanced image block corresponding to the sample image block.
Specifically, the present disclosure may obtain sample video frames from a plurality of sample videos; for any sample video frame, a reference frame may be selected from the sample video in which the sample video frame is located, and image blocks matching an image block in the sample video frame may be selected from the reference frame. This process is similar to the processing of steps S310 to S320. The difference is that an enhanced image block corresponding to each image block must also be acquired. The enhanced image block corresponding to each image block is obtained by performing image enhancement processing on that image block; the method for determining the enhanced image block may be any method from the related art, such as gray-level histogram processing, and is not limited herein.
The image enhancement process may include: image resolution scaling and image processing. The image resolution may be scaled up or down relative to the original resolution, the scaling ratio being whatever ratio the output requires. The image processing may include effect processing and morphological processing of the image. Effect processing comprises one of, or a combination of, image enhancement, sharpening, smoothing, denoising, deblurring, defogging, repair, and the like; if a combination is adopted, the processing order may be specified arbitrarily. Morphological processing comprises one of, or a combination of, image cropping, stitching, and the like.
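As an illustration of resolution scaling, a nearest-neighbor sketch is given below. This is one of many possible scaling methods; the disclosure does not prescribe the interpolation, so the choice of nearest-neighbor here is an assumption:

```python
def scale_nearest(frame, ratio):
    """Nearest-neighbor resolution scaling of a frame (list of pixel rows)
    by the given ratio (>1 enlarges, <1 shrinks)."""
    h, w = len(frame), len(frame[0])
    new_h, new_w = int(h * ratio), int(w * ratio)
    # Each output pixel samples the nearest source pixel.
    return [[frame[int(r / ratio)][int(c / ratio)] for c in range(new_w)]
            for r in range(new_h)]
```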
It should be noted that, in the model training stage, the number of image blocks selected to match a sample image block may be a preset number. The preset number in step S320 is consistent with the preset number here: however many matched image blocks are selected in the model training stage, the same number of target image blocks is selected in the model usage stage.
Step S620, training the neural network according to the mapping relation between the plurality of sample image blocks, the corresponding preset number of matching image blocks and the plurality of enhancement image blocks to obtain a video processing model.
In embodiments of the present disclosure, the network used in training may be a convolutional neural network or another network. During training, the loss function, which measures the degree of inconsistency between predicted and true values, is repeatedly evaluated and the network parameter values are updated by gradient descent according to the back-propagation principle. Training is complete once the value of the loss function meets the requirement, e.g., falls below a preset threshold, yielding the video processing model. The preset threshold may be set according to the actual situation and is not limited herein.
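The "update until the loss meets the requirement" loop can be illustrated with a toy gradient-descent sketch on a one-parameter linear model. This is a stand-in only: the actual model in the disclosure is a (convolutional) neural network, and the sample data, learning rate, and threshold below are assumptions:

```python
def train_until_converged(samples, lr=0.1, threshold=1e-4, max_steps=10000):
    """Fit y = w * x to (x, y) samples by gradient descent, stopping
    once the mean-squared loss drops below the preset threshold."""
    w = 0.0
    loss = float('inf')
    for _ in range(max_steps):
        # Loss measures the inconsistency between predictions and truth.
        loss = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        if loss < threshold:
            break
        # Gradient of the loss w.r.t. w (back-propagation, trivially).
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
    return w, loss
```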
After the video processing model is obtained, a single image block and a preset number of target image blocks in a video frame are input into the video processing model, and an enhanced image block corresponding to the image block can be determined.
In step S340, the enhanced image blocks corresponding to the image blocks in each video frame are spliced to obtain an enhanced video frame of the video frame, and the enhanced video is determined based on the enhanced video frames corresponding to the video frames.
For each video frame, the above process is performed on every image block in the frame, so that an enhanced image block corresponding to each image block, and hence an enhanced video frame corresponding to the video frame, is obtained. Performing this for each of the plurality of video frames yields their corresponding enhanced video frames, from which the enhanced video is obtained.
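The block-stitching step can be sketched as follows for non-overlapping square blocks traversed in row-major order (an assumption; the disclosure does not fix the block layout or traversal order):

```python
def stitch_blocks(blocks, frame_w, frame_h, block_size):
    """Reassemble non-overlapping enhanced blocks (row-major order)
    into a full frame, represented as a list of pixel rows."""
    cols = frame_w // block_size
    frame = [[0] * frame_w for _ in range(frame_h)]
    for idx, block in enumerate(blocks):
        bx = (idx % cols) * block_size   # left edge of this block
        by = (idx // cols) * block_size  # top edge of this block
        for r in range(block_size):
            for c in range(block_size):
                frame[by + r][bx + c] = block[r][c]
    return frame
```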
Example 1
In the motion estimation stage, the motion vectors of 3 image blocks adjacent in the spatial dimension are selected as candidate motion vectors, the video frames immediately before and after the current video frame serve as the 2 reference frames, and each image block selects 1 matching block in each of them. The data composed of the 2 matching blocks and the 1 current image block is used as the input, the super-resolution image corresponding to the current image block is used as the target output, and these are fed into a variant of EDVR (Video Restoration with Enhanced Deformable Convolutional Networks) for training, yielding a converged variant EDVR model. Super-resolution processing of image blocks can then be performed with this EDVR model.
Example two
In the motion estimation stage, 100 motion vectors may be selected as candidate motion vectors according to the method in step S510, and the 3 video frames preceding and the 4 video frames following the current video frame are used as reference frames, giving 7 reference frames. Each image block may choose 5 matching blocks across the 7 reference frames. The data composed of the 5 matching blocks and the 1 current image block is used as the input, the restoration-processed image corresponding to the current image block is used as the target output, and these are fed into the variant EDVR for training, yielding a converged variant EDVR model. Restoration processing of image blocks can then be performed with this EDVR model.
According to the video enhancement method of the embodiments of the disclosure, the image blocks of the current video frame can be traversed by a motion estimation method, a preset number of matching blocks selected from the reference frames for each image block, and the image block, the preset number of matching blocks, and the target enhanced image block assembled into a data set for neural network training. Finally, image enhancement processing is performed on each image block with the video processing model obtained through training. The method fully exploits the strengths of both the traditional algorithm and the convolutional neural network, effectively avoiding image errors such as the artifacts produced by purely traditional multi-frame image enhancement or by purely multi-frame convolutional-neural-network enhancement algorithms, thereby improving the image enhancement effect. At the same time, because the multi-frame strategy uses motion estimation, the computational complexity is lower than that of a multi-frame convolutional neural network.
It should be noted that although the steps of the methods in the present disclosure are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
Further, in this example embodiment, there is also provided a video enhancement apparatus 700, as shown in fig. 7, including:
the video frame dividing module 710 is configured to obtain a plurality of video frames in the video to be processed, and divide each video frame into a plurality of image blocks;
the target image block selecting module 720 is configured to select, for each video frame, a reference frame of the video frame from a plurality of video frames, and select a preset number of target image blocks from the reference frame, where the target image blocks are matched with a single image block in the video frame;
the image block enhancement processing module 730 is configured to input a single image block and a preset number of target image blocks in the video frame into the video processing model, and determine an enhanced image block corresponding to the image block;
the enhanced video determining module 740 is configured to splice the enhanced image blocks corresponding to the image blocks in each video frame to obtain an enhanced video frame of the video frame, and determine an enhanced video based on the enhanced video frames corresponding to the video frames.
In one exemplary embodiment of the present disclosure, the target image block selection module selects a preset number of target image blocks from the reference frame that match a single image block in the video frame by motion estimation.
In one exemplary embodiment of the present disclosure, the target image block selection module enables selecting a preset number of target image blocks from a reference frame that match a single image block in the video frame by motion estimation by:
selecting an optimal motion vector and a suboptimal motion vector from the acquired candidate motion vectors for a single image block in the video frame, wherein the matching degree between the image block determined based on the optimal/suboptimal motion vector and the image block is optimal/suboptimal;
determining a rectangular search window according to the optimal motion vector and the suboptimal motion vector, wherein the rectangular search window is used for determining a selection range of the motion vector;
determining a plurality of search image blocks in a reference frame according to the rectangular search window and the image blocks, and respectively calculating the matching degree of each search image block and the image block;
and selecting a preset number of target image blocks from the search image blocks according to the matching degrees.
In one exemplary embodiment of the present disclosure, a candidate motion vector includes:
for a single image block in the video frame, a first motion vector corresponding to an image block adjacent to the image block in the video frame;
second motion vectors corresponding to image blocks at the same position in the previous L video frames of the video frame, where L is a positive integer;
and a third motion vector and a fourth motion vector obtained by adding the first motion vector and the second motion vector to the corresponding random motion vectors, respectively.
In an exemplary embodiment of the present disclosure, the candidate motion vector further includes:
a motion vector of 0 in both coordinate axis directions; and
an average of the first motion vector, the second motion vector, the third motion vector, and the fourth motion vector.
In an exemplary embodiment of the present disclosure, the target image block selection module performs calculating the matching degree of each search image block to the image block by:
calculating the matching error of each search image block and the image block, and taking the obtained matching error as the matching degree; or
calculating the Euclidean distance between each search image block and the image block, and taking the obtained Euclidean distance as the matching degree; or
calculating the texture gradient of each search image block and the image block, and taking the obtained texture gradient as the matching degree.
In one exemplary embodiment of the present disclosure, the target image block selection module determines a rectangular search window from the optimal motion vector and the suboptimal motion vector by:
subtracting the optimal motion vector from the suboptimal motion vector to obtain a vector difference value;
the vector difference is taken as a diagonal vector to construct a rectangular search window.
In an exemplary embodiment of the present disclosure, the video enhancement apparatus of the embodiment of the present disclosure further includes:
a sample data acquisition module for acquiring a plurality of sample data, each sample data comprising: a sample image block in a sample video frame, a preset number of image blocks matched with the sample image block in a reference frame of the sample video frame, and an enhanced image block corresponding to the sample image block;
and the network training module is used for carrying out neural network training according to the mapping relations between the plurality of sample image blocks, the corresponding preset number of matched image blocks and the plurality of enhanced image blocks to obtain a video processing model.
Specific details of each module or unit in the above apparatus have been described in the corresponding method, and thus are not described herein.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following its general principles and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (9)

1. A method of video enhancement, comprising:
acquiring a plurality of video frames in a video to be processed, and dividing each video frame into a plurality of image blocks;
selecting a reference frame of the video frame from the plurality of video frames for each video frame, and selecting an optimal motion vector and a suboptimal motion vector from the acquired candidate motion vectors for a single image block in the video frame, wherein the matching degree of the image block determined based on the optimal/suboptimal motion vector and the single image block is optimal/suboptimal;
determining a rectangular search window according to the optimal motion vector and the suboptimal motion vector, wherein the rectangular search window is used for determining a selection range of the motion vector;
determining a plurality of search image blocks in the reference frame according to the rectangular search window and the single image block, and respectively calculating the matching degree of each search image block and the single image block;
sorting the matching degrees from large to small, and selecting a searching image block corresponding to a preset number of matching degrees as a target image block matched with the single image block;
inputting a single image block and the preset number of target image blocks in the video frame into a video processing model, and determining an enhanced image block corresponding to the single image block;
splicing the enhancement image blocks corresponding to the image blocks in each video frame to obtain an enhancement video frame of the video frame, and determining an enhancement video based on the enhancement video frames corresponding to the video frames.
2. The method of claim 1, wherein the candidate motion vectors for a single image block in the video frame comprise:
a first motion vector corresponding to an image block adjacent to the single image block in the video frame;
second motion vectors corresponding to image blocks at the same position in the previous L video frames of the video frame, where L is a positive integer;
and adding the first motion vector and the second motion vector to corresponding random motion vectors respectively to obtain a third motion vector and a fourth motion vector.
3. The method of claim 2, wherein the candidate motion vector further comprises:
a motion vector of 0 in both coordinate axis directions; and
average values of the first motion vector, the second motion vector, the third motion vector, and the fourth motion vector.
4. The method according to claim 1, wherein the calculating the matching degree between each of the search image blocks and the single image block includes:
calculating the matching error of each search image block and the single image block, and determining the matching degree according to the obtained matching error; or
calculating the Euclidean distance between each search image block and the single image block, and determining the matching degree according to the obtained Euclidean distance; or
calculating the texture gradient of each search image block and the single image block, and determining the matching degree according to the obtained texture gradient.
5. The method of claim 1, wherein said determining a rectangular search window from said optimal motion vector and said sub-optimal motion vector comprises:
subtracting the optimal motion vector from the suboptimal motion vector to obtain a vector difference value;
and taking the vector difference value as a diagonal vector to construct a rectangular search window.
6. The method of claim 1, wherein prior to said inputting the single image block and the predetermined number of target image blocks in the video frame into the video processing model, the method further comprises:
acquiring a plurality of sample data, each sample data comprising: a sample image block in a sample video frame, a preset number of image blocks matched with the sample image block in a reference frame of the sample video frame, and an enhanced image block corresponding to the sample image block;
and training the neural network according to the mapping relation between the plurality of sample image blocks, the corresponding preset number of matching image blocks and the plurality of enhancement image blocks to obtain the video processing model.
7. A video enhancement device, comprising:
the video frame dividing module is used for acquiring a plurality of video frames in the video to be processed and dividing each video frame into a plurality of image blocks;
the target image block selecting module is used for selecting a reference frame of the video frame from the plurality of video frames for each video frame, and selecting a preset number of target image blocks matched with single image blocks in the video frame from the reference frame through motion estimation by the following steps:
selecting an optimal motion vector and a suboptimal motion vector from the acquired candidate motion vectors for a single image block in the video frame, wherein the degree of matching between the image block determined based on the optimal/suboptimal motion vector and the single image block is optimal/suboptimal;
determining a rectangular search window according to the optimal motion vector and the suboptimal motion vector, wherein the rectangular search window is used for determining a selection range of the motion vector;
determining a plurality of search image blocks in a reference frame according to the rectangular search window and the single image block, and respectively calculating the matching degree of each search image block and the single image block;
selecting a preset number of target image blocks from the search image blocks according to the matching degree;
the image block enhancement processing module is used for inputting a single image block and the preset number of target image blocks in the video frame into a video processing model and determining an enhanced image block corresponding to the single image block;
the enhanced video determining module is used for splicing the enhanced image blocks corresponding to the image blocks in each video frame to obtain enhanced video frames of the video frame, and determining enhanced video based on the enhanced video frames corresponding to the video frames.
8. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any one of claims 1-6 via execution of the executable instructions.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any one of claims 1-6.
CN202010344386.3A 2020-04-27 2020-04-27 Video enhancement method and device, electronic equipment and storage medium Active CN111583138B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010344386.3A CN111583138B (en) 2020-04-27 2020-04-27 Video enhancement method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010344386.3A CN111583138B (en) 2020-04-27 2020-04-27 Video enhancement method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111583138A CN111583138A (en) 2020-08-25
CN111583138B true CN111583138B (en) 2023-08-29

Family

ID=72112555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010344386.3A Active CN111583138B (en) 2020-04-27 2020-04-27 Video enhancement method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111583138B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763296A (en) * 2021-04-28 2021-12-07 腾讯云计算(北京)有限责任公司 Image processing method, apparatus and medium
CN116797453A (en) * 2022-03-17 2023-09-22 北京字节跳动网络技术有限公司 Super-resolution method and device for video
CN114567814A (en) * 2022-04-28 2022-05-31 阿里巴巴达摩院(杭州)科技有限公司 Video processing method, video rendering method, processor and storage medium
CN115511755B (en) * 2022-11-22 2023-03-10 杭州雄迈集成电路技术股份有限公司 Video stream image self-adaptive enhancement method and system
CN118351039A (en) * 2023-01-12 2024-07-16 京东方科技集团股份有限公司 Video quality enhancement method, model training method and equipment
CN117237383B (en) * 2023-11-15 2024-02-02 山东智赢门窗科技有限公司 Intelligent door and window control method and system based on indoor environment
CN117812273B (en) * 2024-02-29 2024-05-28 浙江华创视讯科技有限公司 Image restoration method, device and storage medium in video transmission

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107172322A (en) * 2017-06-16 2017-09-15 北京飞识科技有限公司 A kind of vedio noise reduction method and apparatus
CN107403413A (en) * 2017-04-14 2017-11-28 杭州当虹科技有限公司 A kind of video multiframe denoising and Enhancement Method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050207663A1 (en) * 2001-07-31 2005-09-22 Weimin Zeng Searching method and system for best matching motion vector
US20150110190A1 (en) * 2013-10-21 2015-04-23 Sony Corporation Method and apparatus for motion estimation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107403413A (en) * 2017-04-14 2017-11-28 杭州当虹科技有限公司 A kind of video multiframe denoising and Enhancement Method
CN107172322A (en) * 2017-06-16 2017-09-15 北京飞识科技有限公司 A kind of vedio noise reduction method and apparatus

Also Published As

Publication number Publication date
CN111583138A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
CN111583138B (en) Video enhancement method and device, electronic equipment and storage medium
CN111598796B (en) Image processing method and device, electronic equipment and storage medium
US9940719B2 (en) Alignment of an ordered stack of images from a specimen
CN110163237B (en) Model training and image processing method, device, medium and electronic equipment
KR20100139030A (en) Method and apparatus for super-resolution of images
CN110889809B (en) Image processing method and device, electronic equipment and storage medium
CN112862877A (en) Method and apparatus for training image processing network and image processing
CN115880435B (en) Image reconstruction method, model training method, device, electronic equipment and medium
CN116109824A (en) Medical image and pixel-level label generation method and device based on diffusion model
CN112967207A (en) Image processing method and device, electronic equipment and storage medium
CN111179276A (en) Image processing method and device
CN115068140A (en) Tooth model acquisition method, device, equipment and medium
CN108010052A (en) Method for tracking target and system, storage medium and electric terminal in complex scene
CN117011137B (en) Image stitching method, device and equipment based on RGB similarity feature matching
CN103618904B (en) Motion estimation method and device based on pixels
CN113014745B (en) Video image noise reduction method and device, storage medium and electronic equipment
Mun et al. Guided image filtering based disparity range control in stereo vision
CN108447107B (en) Method and apparatus for generating video
CN112991188B (en) Image processing method and device, storage medium and electronic equipment
CN115100360B (en) Image generation method and device, storage medium and electronic equipment
JP2980810B2 (en) Motion vector search method and apparatus
Awati et al. Digital image inpainting based on median diffusion and directional median filtering
CN116486197B (en) Training method of image detection model, image detection method and image labeling method
US11620752B2 (en) Image-guided depth sampling and reconstruction
CN116309160B (en) Image resolution restoration method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant