WO2023082685A1 - Video enhancement method, apparatus, computer device and storage medium - Google Patents

Video enhancement method, apparatus, computer device and storage medium

Info

Publication number
WO2023082685A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
frame
aggregation
reference frame
timing
Application number
PCT/CN2022/105653
Other languages
English (en)
French (fr)
Inventor
周昆
李文博
卢丽莹
蒋念娟
沈小勇
吕江波
Original Assignee
深圳思谋信息科技有限公司
上海思谋科技有限公司
Priority date
2021-11-11
Application filed by 深圳思谋信息科技有限公司 and 上海思谋科技有限公司
Publication of WO2023082685A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 — Geometric image transformations in the plane of the image
    • G06T 3/40 — Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 3/14 — Transformations for image registration, e.g. adjusting or mapping for alignment of images
    • G06T 5/00 — Image enhancement or restoration
    • G06T 5/70 — Denoising; Smoothing
    • G06T 5/73 — Deblurring; Sharpening
    • G06T 2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 — Image acquisition modality
    • G06T 2207/10016 — Video; Image sequence

Definitions

  • The present application relates to the technical field of video processing, and in particular to a video enhancement method, apparatus, computer device, and storage medium.
  • Video super-resolution aims to reconstruct low-resolution image sequences into high-resolution images. With the increase of network bandwidth, the demand for high-definition images is growing rapidly. Today, video super-resolution technology has been successfully applied in many fields, such as mobile phone photography, high-definition remastering of old film and television content, and intelligent surveillance.
  • In conventional techniques, a neural network is generally used to directly learn the nonlinear mapping from low-resolution images to high-resolution images in order to reconstruct high-resolution images. However, images obtained in this way are prone to erroneous signals such as artifacts and noise, and it is difficult to reconstruct high-quality images.
  • In a first aspect, a video enhancement method is provided, including:
  • acquiring consecutive video frames, where the consecutive video frames include a reference frame and time-series frames adjacent to the reference frame;
  • extracting feature information of the reference frame and feature information of each of the time-series frames; taking the feature information of the reference frame as reference frame information of the reference frame, and aligning the feature information of each of the time-series frames to obtain time-series frame information of each of the time-series frames;
  • aggregating each piece of the time-series frame information according to the reference frame information to obtain aggregation information of each of the time-series frames; and
  • reconstructing a target video frame of the reference frame according to the reference frame information and each piece of the aggregation information, where the image quality of the target video frame is higher than the image quality of the reference frame.
  • In some embodiments, aligning the feature information of each of the time-series frames to obtain the time-series frame information of each of the time-series frames includes:
  • taking the reference frame as the alignment target, and aligning the feature information of each of the time-series frames based on historical motion information of that feature information, to obtain the time-series frame information of each of the time-series frames.
  • In some embodiments, the above alignment includes:
  • if an intermediate frame lies between a time-series frame and the reference frame, taking the intermediate frame as the alignment target and aligning the feature information of the time-series frame based on the historical motion information of that feature information, to obtain initial alignment information of the time-series frame; and
  • taking the reference frame as the alignment target and, based on historical motion information of the initial alignment information, re-aligning the initial alignment information to obtain the time-series frame information of the time-series frame.
  • In some embodiments, aggregating each piece of the time-series frame information according to the reference frame information to obtain the aggregation information of each of the time-series frames includes:
  • determining a first aggregation weight and a second aggregation weight of each piece of the time-series frame information according to the reference frame information and each piece of the time-series frame information;
  • aggregating each piece of the time-series frame information according to its first aggregation weight to obtain initial aggregation information of each piece of the time-series frame information; and
  • re-aggregating the initial aggregation information of each piece of the time-series frame information according to its second aggregation weight to obtain the aggregation information of each of the time-series frames.
  • In some embodiments, the first aggregation weight of each piece of the time-series frame information is obtained by:
  • separately obtaining difference information between each piece of the time-series frame information and the reference frame information, and determining the first aggregation weight of each piece of the time-series frame information according to that difference information.
  • In some embodiments, the second aggregation weight of each piece of the time-series frame information is obtained by:
  • obtaining an average of all the time-series frame information, obtaining the distance between each piece of the time-series frame information and the average, and determining the second aggregation weight of each piece of the time-series frame information according to that distance.
  • In some embodiments, reconstructing the target video frame of the reference frame according to the reference frame information and each piece of the aggregation information includes:
  • concatenating the reference frame information and each piece of the aggregation information to obtain concatenated information, and convolving the concatenated information to obtain the target video frame of the reference frame.
  • In some embodiments, the concatenating includes: inputting the reference frame information and each piece of the aggregation information into an information reconstruction model, and concatenating them through the information reconstruction model to obtain the concatenated information.
  • In some embodiments, the difference information between the time-series frame information and the reference frame information is obtained by computing the cosine distance between the time-series frame information and the reference frame information.
  • In a second aspect, a video enhancement apparatus is provided, including:
  • a video frame acquisition module configured to acquire consecutive video frames, where the consecutive video frames include a reference frame and time-series frames adjacent to the reference frame;
  • an information extraction module configured to extract feature information of the reference frame and feature information of each of the time-series frames, take the feature information of the reference frame as reference frame information of the reference frame, and align the feature information of each of the time-series frames to obtain time-series frame information of each of the time-series frames;
  • an information aggregation module configured to aggregate each piece of the time-series frame information according to the reference frame information to obtain aggregation information of each of the time-series frames; and
  • a video frame reconstruction module configured to reconstruct a target video frame of the reference frame according to the reference frame information and each piece of the aggregation information, where the image quality of the target video frame is higher than the image quality of the reference frame.
  • In some embodiments, the information extraction module is configured to take the reference frame as the alignment target and align the feature information of each time-series frame based on the historical motion information of that feature information, to obtain the time-series frame information of each time-series frame.
  • In some embodiments, the information extraction module is configured to, if an intermediate frame lies between a time-series frame and the reference frame, take the intermediate frame as the alignment target and align the feature information of the time-series frame based on its historical motion information to obtain the initial alignment information of the time-series frame; and take the reference frame as the alignment target and, based on the historical motion information of the initial alignment information, re-align the initial alignment information to obtain the time-series frame information of the time-series frame.
  • In some embodiments, the information aggregation module is configured to determine the first aggregation weight and the second aggregation weight of each piece of time-series frame information according to the reference frame information and each piece of time-series frame information; aggregate each piece of time-series frame information according to its first aggregation weight to obtain the initial aggregation information of each piece of time-series frame information; and re-aggregate the initial aggregation information according to the second aggregation weight to obtain the aggregation information of each time-series frame.
  • In some embodiments, the information aggregation module is further configured to separately obtain the difference information between each piece of time-series frame information and the reference frame information, and determine the first aggregation weight of each piece of time-series frame information according to that difference information.
  • In some embodiments, the information aggregation module is further configured to obtain the average of all time-series frame information, obtain the distance between each piece of time-series frame information and the average, and determine the second aggregation weight of each piece of time-series frame information according to that distance.
  • In some embodiments, the video frame reconstruction module is configured to concatenate the reference frame information and each piece of aggregation information to obtain concatenated information, and convolve the concatenated information to obtain the target video frame of the reference frame.
  • In some embodiments, the video frame reconstruction module is configured to input the reference frame information and each piece of aggregation information into an information reconstruction model, and concatenate them through the information reconstruction model to obtain the concatenated information.
  • In some embodiments, the difference information between the time-series frame information and the reference frame information is obtained by computing the cosine distance between the time-series frame information and the reference frame information.
  • In a third aspect, a computer device is provided, including a memory and a processor, the memory storing a computer program; when executing the computer program, the processor implements the following steps:
  • acquiring consecutive video frames, where the consecutive video frames include a reference frame and time-series frames adjacent to the reference frame;
  • extracting feature information of the reference frame and feature information of each of the time-series frames; taking the feature information of the reference frame as reference frame information of the reference frame, and aligning the feature information of each of the time-series frames to obtain time-series frame information of each of the time-series frames;
  • aggregating each piece of the time-series frame information according to the reference frame information to obtain aggregation information of each of the time-series frames; and
  • reconstructing a target video frame of the reference frame according to the reference frame information and each piece of the aggregation information, where the image quality of the target video frame is higher than the image quality of the reference frame.
  • In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the same steps as above are implemented: acquiring consecutive video frames including a reference frame and adjacent time-series frames; extracting and aligning feature information to obtain reference frame information and time-series frame information; aggregating each piece of time-series frame information according to the reference frame information to obtain aggregation information; and reconstructing a target video frame of the reference frame whose image quality is higher than that of the reference frame.
  • In a fifth aspect, a computer program product is provided, including a computer program; when the computer program is executed by a processor, the same steps as above are implemented: acquiring consecutive video frames including a reference frame and adjacent time-series frames; extracting and aligning feature information to obtain reference frame information and time-series frame information; aggregating each piece of time-series frame information according to the reference frame information to obtain aggregation information; and reconstructing a target video frame of the reference frame whose image quality is higher than that of the reference frame.
  • In the embodiments of the present application, the feature information of each time-series frame adjacent to the reference frame is aligned and aggregated, and the reference frame information is combined with the aggregation information of each time-series frame, so that the reconstructed video frame has a higher signal-to-noise ratio and structural similarity and a more realistic visual appearance, improving the image quality of the reconstructed video frame. This avoids the drawback of directly learning, with a neural network, the nonlinear mapping from low-resolution images to high-resolution images, which yields images prone to erroneous signals such as artifacts and noise and makes it difficult to reconstruct high-quality images.
  • Fig. 1 is a schematic flowchart of a video enhancement method in one embodiment;
  • Fig. 2 is a schematic flowchart of motion alignment in one embodiment;
  • Fig. 3 is a schematic flowchart of adaptive information re-aggregation in one embodiment;
  • Fig. 4 is a schematic flowchart of a video enhancement method in another embodiment;
  • Fig. 5 is a schematic flowchart of a video enhancement method for temporal alignment in one embodiment;
  • Fig. 6 is a structural block diagram of a video enhancement apparatus in one embodiment;
  • Fig. 7 is an internal structure diagram of a computer device in one embodiment.
  • In one embodiment, a video enhancement method is provided. This embodiment is described by taking the application of the method to a server as an example. It can be understood that the method can also be applied to a terminal, or to a system including a terminal and a server and implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
  • Step S101: acquire consecutive video frames, where the consecutive video frames include a reference frame and time-series frames adjacent to the reference frame.
  • A video is composed of many still pictures, called video frames; for example, one second of video includes at least 24 video frames.
  • Consecutive video frames refer to multiple consecutive low-resolution video frames, such as consecutive low-resolution frames of a driving vehicle captured by a surveillance camera; the method is suitable for scenes with fast-moving objects. The reference frame is a frame of reference significance among the consecutive video frames, such as the middle frame. Consecutive video frames may also refer to consecutive frames that require video deblurring and video denoising.
  • Specifically, the server acquires the consecutive video frames that require video enhancement, determines a reference frame among them, and takes the video frames adjacent to the reference frame as time-series frames.
  • For example, the server takes five consecutive low-resolution video frames as input. Among these five frames, the third frame is the reference frame, corresponding to the final output high-resolution video frame, while the other four frames are time-series frames adjacent to the reference frame.
  • Step S102: extract the feature information of the reference frame and of each time-series frame; take the feature information of the reference frame as the reference frame information of the reference frame, and align the feature information of each time-series frame to obtain the time-series frame information of each time-series frame.
  • The feature information of the reference frame refers to the image features of the reference frame, and the feature information of a time-series frame refers to the image features of that time-series frame; both can be extracted with a feature extraction model.
  • Aligning the feature information of each time-series frame means motion-aligning the feature information of each time-series frame to the reference frame information of the reference frame. Note that if an intermediate frame lies between a time-series frame and the reference frame, a progressive motion alignment strategy is adopted: the time-series frame is first aligned to the intermediate frame and then to the reference frame.
  • The time-series frame information of a time-series frame refers to the information obtained by motion-aligning its feature information.
  • Specifically, the server inputs the reference frame and each time-series frame into a pre-trained feature extraction model, which performs feature extraction on them to obtain the feature information of the reference frame and of each time-series frame. The feature information of the reference frame is taken as the reference frame information of the reference frame. The feature information of each time-series frame is then motion-aligned to the reference frame information, and the resulting alignment information serves as the time-series frame information of each time-series frame.
  • Step S103: aggregate each piece of time-series frame information according to the reference frame information to obtain the aggregation information of each time-series frame.
  • The aggregation information of a time-series frame refers to the information obtained by re-aggregating its time-series frame information.
  • Specifically, the server inputs the reference frame information and each piece of time-series frame information into an information aggregation model, which aggregates each piece of time-series frame information based on the reference frame information to obtain the aggregation information of each time-series frame. The information aggregation model is a network model for aggregating the time-series frame information of time-series frames.
  • Step S104: reconstruct the target video frame of the reference frame according to the reference frame information and each piece of aggregation information, where the image quality of the target video frame is higher than the image quality of the reference frame.
  • Here, higher image quality means that the image resolution of the target video frame is higher than that of the reference frame, with a higher signal-to-noise ratio and structural similarity and a more realistic visual appearance.
  • Specifically, the server inputs the reference frame information and the aggregation information of each time-series frame into an information reconstruction model, which performs convolution on them to obtain a high-quality video frame of the reference frame as the target video frame, e.g. a high-quality version of a vehicle-driving video frame.
  • Note that if the consecutive video frames are frames requiring video deblurring and video denoising, the target video frame may also refer to the video frame after deblurring and denoising.
  • In the above video enhancement method, consecutive video frames including a reference frame and adjacent time-series frames are acquired. The feature information of the reference frame and of each time-series frame is extracted; the feature information of the reference frame is taken as the reference frame information, and the feature information of each time-series frame is aligned to obtain the time-series frame information of each time-series frame. Each piece of time-series frame information is then aggregated according to the reference frame information to obtain the aggregation information of each time-series frame. Finally, the target video frame of the reference frame, whose image quality is higher than that of the reference frame, is reconstructed from the reference frame information and the aggregation information. In this way, the reconstructed video frame has a higher signal-to-noise ratio and structural similarity and a more realistic visual appearance, improving the image quality of the reconstructed video frame, and the drawback of directly learning the nonlinear mapping from low-resolution to high-resolution images with a neural network, which yields images prone to artifacts, noise, and other erroneous signals, is avoided.
  • In one embodiment, the above step S102 of aligning the feature information of each time-series frame to obtain the time-series frame information specifically includes: taking the reference frame as the alignment target, and aligning the feature information of each time-series frame based on the historical motion information of that feature information, to obtain the time-series frame information of each time-series frame.
  • Here, the historical motion information refers to three kinds of motion information: continuity (C-Prop), uniqueness (U-Prop), and transferability (T-Prop).
  • Specifically, the server adopts a progressive motion alignment strategy: with the reference frame as the alignment target and the historical motion information of each time-series frame's feature information as a known condition, it motion-aligns the feature information of each time-series frame; the resulting alignment information serves as the time-series frame information of each time-series frame. Using the historical motion information as a known condition helps the alignment of the current time-series frame.
  • Further, this can be implemented as follows: if an intermediate frame lies between a time-series frame and the reference frame, the intermediate frame is taken as the alignment target, and the feature information of the time-series frame is aligned based on its historical motion information to obtain the initial alignment information of the time-series frame. The reference frame is then taken as the alignment target, and the initial alignment information is re-aligned based on the historical motion information of the initial alignment information to obtain the time-series frame information of the time-series frame.
  • For example, referring to Fig. 2, A denotes one alignment task; A contains multiple units a, each being an alignment unit. The subscripts of A1 and A2 denote the indices "1" and "2" of the neighboring frames, and an arrow between two alignment units denotes information being passed from one unit to the other. Both $a_1^{1\to 0}$ and $a_2^{1\to 0}$ denote aligning the information at time "1" to time "0"; their subscripts "1" and "2" indicate that their signals come from video frame "1" and video frame "2", respectively. M denotes a motion vector, and C-Prop, U-Prop, and T-Prop denote the three kinds of motion information: continuity, uniqueness, and transferability.
  • Two temporally adjacent alignment units within the same task, such as $a_2^{+2\to+1}$ and $a_2^{+1\to 0}$ in A2, define the first propagation rule "C" (motion continuity). Units of two adjacent alignment tasks, such as $a_1^{+1\to 0}$ in A1 and $a_2^{+1\to 0}$ in A2, have the same alignment start and end times, i.e. "+1" → "0", but belong to different tasks, so their source information comes from time-series frames "+1" and "+2", respectively; this defines the second propagation rule "U" for motion alignment information. Based on the two rules above, a third propagation rule "T" is derived.
  • In one embodiment, the above step S103 of aggregating each piece of time-series frame information according to the reference frame information specifically includes: determining the first aggregation weight and the second aggregation weight of each piece of time-series frame information according to the reference frame information and each piece of time-series frame information; aggregating each piece of time-series frame information according to its first aggregation weight to obtain its initial aggregation information; and re-aggregating the initial aggregation information according to the second aggregation weight to obtain the aggregation information of each time-series frame.
  • Here, the first aggregation weight is the accuracy aggregation weight, such as $W_k$ in Fig. 3; the second aggregation weight is the consistency aggregation weight, such as $C_k$ in Fig. 3.
  • In a specific implementation, the first aggregation weight of each piece of time-series frame information is obtained as follows: the difference information between each piece of time-series frame information and the reference frame information is obtained separately, and the first aggregation weight is determined from that difference information. For example, the server obtains the difference information between each piece of time-series frame information and the reference frame information, and looks up a preset correspondence between difference information and first aggregation weights to obtain the first aggregation weight of each piece of time-series frame information.
  • In a specific implementation, the second aggregation weight of each piece of time-series frame information is obtained as follows: the average of all time-series frame information is obtained; the distance between each piece of time-series frame information and the average is obtained; and the second aggregation weight is determined from that distance. For example, the server first computes the average of all time-series frame information, then obtains the square-root distance between each piece of time-series frame information and the average as its distance to the average, and finally looks up a preset correspondence between distances and second aggregation weights to obtain the second aggregation weight of each piece of time-series frame information.
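  • A minimal sketch of how such a consistency weight could be computed, assuming, as the detailed description later suggests, an element-wise square-root distance to the temporal mean turned into a weight by a decaying exponential (array shapes and names are illustrative):

```python
import numpy as np

def consistency_weights(frame_feats):
    """frame_feats: list of aligned float feature maps, each of shape (H, W, C).
    Returns one weight map C_k per frame: small where a frame deviates from
    the temporal mean (inconsistent), close to 1 where it agrees."""
    mean_feat = np.mean(frame_feats, axis=0)        # average time-series information
    weights = []
    for feat in frame_feats:
        dist = np.sqrt((feat - mean_feat) ** 2)     # element-wise square-root distance
        weights.append(np.exp(-dist))               # exp(-(*)) turns distance into a weight
    return weights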
  • In the accuracy-based strategy, for each position of a piece of time-series frame information, a 3×3 patch is taken, the reference frame information at the same position is taken out, and the reference information is multiplied with each element of the corresponding patch; the products are normalized (e.g. with softmax) to obtain the 3×3 patch weights $W_k$, and the weights are multiplied with the patch and summed, yielding a new value. This new value is the pixel value obtained by accuracy-based information re-aggregation; computing it at all positions generates the initial aggregation information. Note that the difference between the time-series frame information and the reference frame information is obtained by computing the cosine distance (vector dot product): the larger the value, the smaller the difference between the time-series frame information and the reference frame information, and the larger the weight.
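  • A minimal sketch of this accuracy-based re-aggregation for a single-channel feature map, with softmax normalization over each 3×3 neighborhood as the example normalizer (function and variable names are assumptions, not the patent's code):

```python
import numpy as np

def accuracy_reaggregate(frame_feat, ref_feat):
    """frame_feat, ref_feat: aligned float feature maps of shape (H, W).
    For every position, correlate the reference value with a 3x3 patch of
    the time-series feature, softmax the correlations into weights W_k,
    and sum the weighted patch into the re-aggregated pixel value."""
    H, W = frame_feat.shape
    padded = np.pad(frame_feat, 1, mode="edge")
    out = np.zeros_like(frame_feat)
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + 3, x:x + 3]
            scores = ref_feat[y, x] * patch         # dot-product similarities
            w = np.exp(scores - scores.max())
            w /= w.sum()                            # softmax -> 3x3 weights W_k
            out[y, x] = (w * patch).sum()           # weighted sum -> new pixel value
    return out                                      # initial aggregation information
```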
  • In this embodiment, the time-series frame information of each time-series frame is aggregated according to its first and second aggregation weights to obtain the aggregation information of each time-series frame, which can filter out inaccurate temporal information while enhancing accurate, reliable temporal information.
  • In one embodiment, the above step S104 of reconstructing the target video frame of the reference frame specifically includes: concatenating the reference frame information and each piece of aggregation information to obtain concatenated information, and convolving the concatenated information to obtain the target video frame of the reference frame.
  • Specifically, the server inputs the reference frame information and each piece of aggregation information into an information reconstruction model, which concatenates them to obtain the concatenated information and applies a series of convolutions to obtain a high-quality video frame as the target video frame of the reference frame.
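  • A minimal PyTorch-style sketch of such a reconstruction head; the pixel-shuffle upsampler and channel sizes are assumptions for illustration, since the patent only specifies concatenation followed by convolutions:

```python
import torch
import torch.nn as nn

class ReconstructionHead(nn.Module):
    """Concatenate reference-frame information with the aggregation
    information of the time-series frames, then convolve and upsample."""
    def __init__(self, feat_ch=64, n_neighbors=4, scale=4):
        super().__init__()
        in_ch = feat_ch * (1 + n_neighbors)            # reference + aggregated neighbors
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.upsample = nn.Sequential(
            nn.Conv2d(feat_ch, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),                    # low-res features -> high-res RGB
        )

    def forward(self, ref_info, agg_infos):
        x = torch.cat([ref_info, *agg_infos], dim=1)   # concatenated information
        return self.upsample(self.body(x))             # target video frame
```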
  • In this embodiment, reconstructing from the reference frame information and each piece of aggregation information helps produce a high-quality target video frame, avoiding the drawback of directly learning the nonlinear mapping from low-resolution to high-resolution images with a neural network, which yields images prone to erroneous signals such as artifacts and noise and makes it difficult to reconstruct high-quality images.
  • In another embodiment, as shown in Fig. 4, another video enhancement method is provided, described by taking its application to a server as an example and including the following steps:
  • Step S401: acquire consecutive video frames, where the consecutive video frames include a reference frame and time-series frames adjacent to the reference frame.
  • Step S402: extract the feature information of the reference frame and of each time-series frame, and take the feature information of the reference frame as the reference frame information of the reference frame.
  • Step S403: take the reference frame as the alignment target, and align the feature information of each time-series frame based on the historical motion information of that feature information, to obtain the time-series frame information of each time-series frame.
  • Step S404: separately obtain the difference information between each piece of time-series frame information and the reference frame information, and determine the first aggregation weight of each piece of time-series frame information according to that difference information.
  • Step S405: obtain the average of all time-series frame information; obtain the distance between each piece of time-series frame information and the average; and determine the second aggregation weight of each piece of time-series frame information according to that distance.
  • Step S406: aggregate each piece of time-series frame information according to its first aggregation weight to obtain the initial aggregation information of each piece of time-series frame information.
  • Step S407: re-aggregate the initial aggregation information of each piece of time-series frame information according to its second aggregation weight to obtain the aggregation information of each time-series frame.
  • Step S408: concatenate the reference frame information and each piece of aggregation information to obtain concatenated information; convolve the concatenated information to obtain the target video frame of the reference frame.
  • In the above video enhancement method, by aligning and aggregating the feature information of the time-series frames adjacent to the reference frame and combining the reference frame information with the aggregation information of each time-series frame, the reconstructed video frame has a higher signal-to-noise ratio and structural similarity and a more realistic visual appearance, improving the image quality of the reconstructed video frame. This avoids the drawback of directly learning the nonlinear mapping from low-resolution to high-resolution images with a neural network, which yields images prone to artifacts, noise, and other erroneous signals and makes high-quality reconstruction difficult.
  • As shown in Fig. 5, the embodiments of the present application further propose a video enhancement method for temporal alignment. Unlike previous methods that directly perform motion estimation on distant neighboring frames, this method adopts a progressive alignment strategy. The alignment strategy makes full use of historical motion information, so that long-distance inter-frame alignment can be achieved more accurately and more reliable temporal information can be obtained. Meanwhile, to filter out unreliable alignment information, the embodiments of the present application propose an information aggregation strategy based on the consistency and accuracy of temporal information. With the proposed strategies, the method can enhance the weight of reliable alignment information while eliminating unreliable alignment information. The images generated by this method have a higher signal-to-noise ratio and structural similarity and a more realistic visual appearance; the method can effectively handle video blur and noise and increase video resolution, generating high-quality video pictures. The details are as follows:
  • First, the information of each video frame is extracted by a feature extractor, and the extracted information is initially aligned by a progressive motion aligner. The different pieces of alignment information are then aggregated by an information aggregator, and finally the aggregated information is processed by a reconstructor to reconstruct a high-quality video frame.
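  • A high-level sketch of this four-stage pipeline; the module interfaces are placeholders, since the patent names the stages but does not fix their implementations:

```python
def enhance(reference, neighbors, extractor, aligner, aggregator, reconstructor):
    """Four-stage pipeline: extract -> progressively align -> aggregate -> reconstruct.
    All four callables are placeholders for the patent's modules."""
    ref_info = extractor(reference)
    neighbor_feats = [extractor(f) for f in neighbors]
    # Progressive motion alignment: each neighbor is aligned toward the
    # reference, reusing historical motion information along the way.
    aligned = aligner(neighbor_feats, ref_info)
    # Re-aggregation with accuracy (W_k) and consistency (C_k) weights.
    aggregated = aggregator(aligned, ref_info)
    # Concatenate and convolve into the high-quality target frame.
    return reconstructor(ref_info, aggregated)
```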
  • Motion alignment is an important component of video restoration tasks. The flow of the motion alignment module proposed in the embodiments of the present application is shown on the left of Fig. 2: for the information of different frames, a progressive alignment strategy is adopted, solving the difficulty of direct long-distance alignment. At the same time, historical alignment information is fully considered: as shown on the right of Fig. 2, three kinds of related historical motion information are defined, "C", "U", and "T". Each time the current alignment step is performed, the historical motion signals are used as known conditions to help the current alignment. Through this progressive alignment scheme, the relationships between the motions of different frames are fully exploited, so that temporal alignment can be achieved accurately.
  • For video restoration tasks, the importance of each piece of aligned time-series frame information differs, and the alignment module inevitably introduces a certain amount of error. To better eliminate this error while giving different pieces of time-series frame information adaptive aggregation weights, an effective information re-aggregation module is proposed in the embodiments of the present application. As shown in Fig. 3, for given neighboring time-series frame information, two strategies are used to realize adaptive aggregation: (1) an accuracy-based information re-aggregation strategy: as shown in Fig. 3(a), for each piece of time-series frame information, the difference between it and the reference frame information is computed, and the aggregation weight based on information accuracy is computed from that difference; (2) a consistency-based information aggregation strategy: as shown in Fig. 3(b), for each piece of time-series frame information, the distance between it and the average time-series frame information is computed, and the aggregation weight based on information consistency is computed from that distance. Based on these two weights, inaccurate temporal information can be filtered out and accurate, reliable temporal information can be enhanced.
  • The above video enhancement method for temporal alignment can achieve the following technical effects: (1) it breaks through the limitation that related video restoration methods can only handle one specific task, and can handle three different video problems in a single framework while generating higher-quality video frames; compared with related video restoration methods, it achieves the best results on video deblurring, video denoising, and video super-resolution tasks; (2) it overcomes the drawback of related techniques that, for fast-moving objects, inter-frame information is hard to align and aggregate, making it difficult to reconstruct high-quality images; at the same time, it avoids the bias of related techniques in aggregating effective information, which causes the generated images to suffer from erroneous signals such as artifacts and noise.
  • In one embodiment, as shown in Fig. 6, a video enhancement apparatus is provided, including:
  • a video frame acquisition module 610, configured to acquire consecutive video frames, where the consecutive video frames include a reference frame and time-series frames adjacent to the reference frame;
  • an information extraction module 620, configured to extract the feature information of the reference frame and of each time-series frame, take the feature information of the reference frame as the reference frame information of the reference frame, and align the feature information of each time-series frame to obtain the time-series frame information of each time-series frame;
  • an information aggregation module 630, configured to aggregate each piece of time-series frame information according to the reference frame information to obtain the aggregation information of each time-series frame; and
  • a video frame reconstruction module 640, configured to reconstruct the target video frame of the reference frame according to the reference frame information and each piece of aggregation information, where the image quality of the target video frame is higher than the image quality of the reference frame.
  • In some embodiments, the information extraction module 620 is configured to take the reference frame as the alignment target and align the feature information of each time-series frame based on the historical motion information of that feature information, to obtain the time-series frame information of each time-series frame.
  • In some embodiments, the information extraction module 620 is configured to, if an intermediate frame lies between a time-series frame and the reference frame, take the intermediate frame as the alignment target and align the feature information of the time-series frame based on its historical motion information to obtain the initial alignment information of the time-series frame; and take the reference frame as the alignment target and, based on the historical motion information of the initial alignment information, re-align the initial alignment information to obtain the time-series frame information of the time-series frame.
  • In some embodiments, the information aggregation module 630 is configured to determine the first aggregation weight and the second aggregation weight of each piece of time-series frame information according to the reference frame information and each piece of time-series frame information; aggregate each piece of time-series frame information according to its first aggregation weight to obtain the initial aggregation information of each piece of time-series frame information; and re-aggregate the initial aggregation information according to the second aggregation weight to obtain the aggregation information of each time-series frame.
  • In some embodiments, the information aggregation module 630 is further configured to separately obtain the difference information between each piece of time-series frame information and the reference frame information, and determine the first aggregation weight of each piece of time-series frame information according to that difference information.
  • In some embodiments, the information aggregation module 630 is further configured to obtain the average of all time-series frame information, obtain the distance between each piece of time-series frame information and the average, and determine the second aggregation weight of each piece of time-series frame information according to that distance.
  • In some embodiments, the video frame reconstruction module 640 is configured to concatenate the reference frame information and each piece of aggregation information to obtain concatenated information, and convolve the concatenated information to obtain the target video frame of the reference frame.
  • In some embodiments, the video frame reconstruction module 640 is configured to input the reference frame information and each piece of aggregation information into an information reconstruction model, and concatenate them through the information reconstruction model to obtain the concatenated information.
  • In some embodiments, the difference information between the time-series frame information and the reference frame information is obtained by computing the cosine distance between the time-series frame information and the reference frame information.
  • Each module in the above video enhancement apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
  • In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores data such as the feature information of the reference frame, the feature information of each time-series frame, the time-series frame information of each time-series frame, the aggregation information of each time-series frame, and the target video frame. The network interface of the computer device communicates with external terminals via a network connection. The computer program, when executed by a processor, implements a video enhancement method.
  • Those skilled in the art can understand that Fig. 7 is only a block diagram of part of the structure related to the embodiments of the present application and does not limit the computer device to which the embodiments are applied; the computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
  • In one embodiment, a computer device is provided, including a memory and a processor, where a computer program is stored in the memory and the processor implements the steps in the above method embodiments when executing the computer program.
  • In one embodiment, a computer-readable storage medium is provided, storing a computer program that implements the steps in the above method embodiments when executed by a processor.
  • In one embodiment, a computer program product or computer program is provided, including computer instructions stored on a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps in the above method embodiments.
  • Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical memory. Volatile memory may include random access memory (RAM) or external cache memory. RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Television Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a video enhancement method, apparatus, computer device, and storage medium. The method includes: acquiring consecutive video frames, the consecutive video frames including a reference frame and time-series frames adjacent to the reference frame; extracting feature information of the reference frame and of each time-series frame, taking the feature information of the reference frame as reference frame information of the reference frame, and aligning the feature information of each time-series frame to obtain time-series frame information of each time-series frame; aggregating each piece of time-series frame information according to the reference frame information to obtain aggregation information of each time-series frame; and reconstructing a target video frame of the reference frame according to the reference frame information and each piece of aggregation information, the image quality of the target video frame being higher than the image quality of the reference frame. With this method, the reconstructed video frames have a higher signal-to-noise ratio and structural similarity and a more realistic visual appearance, improving the image quality of the reconstructed video frames.

Description

Video enhancement method, apparatus, computer device and storage medium
This application claims priority to Chinese patent application No. 202111330266.9, entitled "Video enhancement method, apparatus, computer device and storage medium", filed with the China National Intellectual Property Administration on November 11, 2021, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application relates to the technical field of video processing, and in particular to a video enhancement method, apparatus, computer device, and storage medium.
BACKGROUND
Video super-resolution aims to reconstruct low-resolution image sequences into high-resolution images. With the increase of network bandwidth, the demand for high-definition images is growing rapidly. Today, video super-resolution technology has been successfully applied in many fields, such as mobile phone photography, high-definition remastering of old film and television content, and intelligent surveillance.
In conventional techniques, a neural network is generally used to directly learn the nonlinear mapping from low-resolution images to high-resolution images in order to reconstruct high-resolution images. However, images obtained in this way are prone to erroneous signals such as artifacts and noise, and it is difficult to reconstruct high-quality images.
SUMMARY
In view of this, it is necessary to provide, for the above technical problem, a video enhancement method, apparatus, computer device, and storage medium capable of improving the image quality of reconstructed images.
In a first aspect, a video enhancement method is provided, including:
acquiring consecutive video frames, where the consecutive video frames include a reference frame and time-series frames adjacent to the reference frame;
extracting feature information of the reference frame and feature information of each of the time-series frames; taking the feature information of the reference frame as reference frame information of the reference frame, and aligning the feature information of each of the time-series frames to obtain time-series frame information of each of the time-series frames;
aggregating each piece of the time-series frame information according to the reference frame information to obtain aggregation information of each of the time-series frames; and
reconstructing a target video frame of the reference frame according to the reference frame information and each piece of the aggregation information, where the image quality of the target video frame is higher than the image quality of the reference frame.
In some embodiments, aligning the feature information of each of the time-series frames to obtain the time-series frame information of each of the time-series frames includes:
taking the reference frame as the alignment target, and aligning the feature information of each of the time-series frames based on historical motion information of that feature information, to obtain the time-series frame information of each of the time-series frames.
In some embodiments, the above alignment includes:
if an intermediate frame lies between a time-series frame and the reference frame, taking the intermediate frame as the alignment target and aligning the feature information of the time-series frame based on the historical motion information of that feature information, to obtain initial alignment information of the time-series frame; and
taking the reference frame as the alignment target and, based on historical motion information of the initial alignment information, re-aligning the initial alignment information to obtain the time-series frame information of the time-series frame.
In some embodiments, aggregating each piece of the time-series frame information according to the reference frame information to obtain the aggregation information of each of the time-series frames includes:
determining a first aggregation weight and a second aggregation weight of each piece of the time-series frame information according to the reference frame information and each piece of the time-series frame information;
aggregating each piece of the time-series frame information according to its first aggregation weight to obtain initial aggregation information of each piece of the time-series frame information; and
re-aggregating the initial aggregation information of each piece of the time-series frame information according to its second aggregation weight to obtain the aggregation information of each of the time-series frames.
In some embodiments, the first aggregation weight of each piece of the time-series frame information is obtained by:
separately obtaining difference information between each piece of the time-series frame information and the reference frame information; and
determining the first aggregation weight of each piece of the time-series frame information according to that difference information.
In some embodiments, the second aggregation weight of each piece of the time-series frame information is obtained by:
obtaining an average of all the time-series frame information;
obtaining the distance between each piece of the time-series frame information and the average; and
determining the second aggregation weight of each piece of the time-series frame information according to that distance.
In some embodiments, reconstructing the target video frame of the reference frame according to the reference frame information and each piece of the aggregation information includes:
concatenating the reference frame information and each piece of the aggregation information to obtain concatenated information; and
convolving the concatenated information to obtain the target video frame of the reference frame.
In some embodiments, the concatenating includes:
inputting the reference frame information and each piece of the aggregation information into an information reconstruction model, and concatenating them through the information reconstruction model to obtain the concatenated information.
In some embodiments, the difference information between the time-series frame information and the reference frame information is obtained by computing the cosine distance between the time-series frame information and the reference frame information.
In a second aspect, a video enhancement apparatus is provided, including:
a video frame acquisition module configured to acquire consecutive video frames, where the consecutive video frames include a reference frame and time-series frames adjacent to the reference frame;
an information extraction module configured to extract feature information of the reference frame and feature information of each of the time-series frames, take the feature information of the reference frame as reference frame information of the reference frame, and align the feature information of each of the time-series frames to obtain time-series frame information of each of the time-series frames;
an information aggregation module configured to aggregate each piece of the time-series frame information according to the reference frame information to obtain aggregation information of each of the time-series frames; and
a video frame reconstruction module configured to reconstruct a target video frame of the reference frame according to the reference frame information and each piece of the aggregation information, where the image quality of the target video frame is higher than the image quality of the reference frame.
In some embodiments, the information extraction module is configured to take the reference frame as the alignment target and align the feature information of each time-series frame based on the historical motion information of that feature information, to obtain the time-series frame information of each time-series frame.
In some embodiments, the information extraction module is configured to, if an intermediate frame lies between a time-series frame and the reference frame, take the intermediate frame as the alignment target and align the feature information of the time-series frame based on its historical motion information to obtain the initial alignment information of the time-series frame; and take the reference frame as the alignment target and, based on the historical motion information of the initial alignment information, re-align the initial alignment information to obtain the time-series frame information of the time-series frame.
In some embodiments, the information aggregation module is configured to determine the first aggregation weight and the second aggregation weight of each piece of time-series frame information according to the reference frame information and each piece of time-series frame information; aggregate each piece of time-series frame information according to its first aggregation weight to obtain the initial aggregation information of each piece of time-series frame information; and re-aggregate the initial aggregation information according to the second aggregation weight to obtain the aggregation information of each time-series frame.
In some embodiments, the information aggregation module is further configured to separately obtain the difference information between each piece of time-series frame information and the reference frame information, and determine the first aggregation weight of each piece of time-series frame information according to that difference information.
In some embodiments, the information aggregation module is further configured to obtain the average of all time-series frame information, obtain the distance between each piece of time-series frame information and the average, and determine the second aggregation weight of each piece of time-series frame information according to that distance.
In some embodiments, the video frame reconstruction module is configured to concatenate the reference frame information and each piece of aggregation information to obtain concatenated information, and convolve the concatenated information to obtain the target video frame of the reference frame.
In some embodiments, the video frame reconstruction module is configured to input the reference frame information and each piece of aggregation information into an information reconstruction model, and concatenate them through the information reconstruction model to obtain the concatenated information.
In some embodiments, the difference information between the time-series frame information and the reference frame information is obtained by computing the cosine distance between the time-series frame information and the reference frame information.
In a third aspect, a computer device is provided, including a memory and a processor, the memory storing a computer program; when executing the computer program, the processor implements the following steps:
acquiring consecutive video frames, where the consecutive video frames include a reference frame and time-series frames adjacent to the reference frame;
extracting feature information of the reference frame and feature information of each of the time-series frames; taking the feature information of the reference frame as reference frame information of the reference frame, and aligning the feature information of each of the time-series frames to obtain time-series frame information of each of the time-series frames;
aggregating each piece of the time-series frame information according to the reference frame information to obtain aggregation information of each of the time-series frames; and
reconstructing a target video frame of the reference frame according to the reference frame information and each piece of the aggregation information, where the image quality of the target video frame is higher than the image quality of the reference frame.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the following steps are implemented:
acquiring consecutive video frames, where the consecutive video frames include a reference frame and time-series frames adjacent to the reference frame;
extracting feature information of the reference frame and feature information of each of the time-series frames; taking the feature information of the reference frame as reference frame information of the reference frame, and aligning the feature information of each of the time-series frames to obtain time-series frame information of each of the time-series frames;
aggregating each piece of the time-series frame information according to the reference frame information to obtain aggregation information of each of the time-series frames; and
reconstructing a target video frame of the reference frame according to the reference frame information and each piece of the aggregation information, where the image quality of the target video frame is higher than the image quality of the reference frame.
In a fifth aspect, a computer program product is provided, including a computer program; when the computer program is executed by a processor, the following steps are implemented:
acquiring consecutive video frames, where the consecutive video frames include a reference frame and time-series frames adjacent to the reference frame;
extracting feature information of the reference frame and feature information of each of the time-series frames; taking the feature information of the reference frame as reference frame information of the reference frame, and aligning the feature information of each of the time-series frames to obtain time-series frame information of each of the time-series frames;
aggregating each piece of the time-series frame information according to the reference frame information to obtain aggregation information of each of the time-series frames; and
reconstructing a target video frame of the reference frame according to the reference frame information and each piece of the aggregation information, where the image quality of the target video frame is higher than the image quality of the reference frame.
In the embodiments of the present application, the feature information of the time-series frames adjacent to the reference frame is aligned and aggregated, and the reference frame information is combined with the aggregation information of each time-series frame, so that the reconstructed video frame has a higher signal-to-noise ratio and structural similarity and a more realistic visual appearance, improving the image quality of the reconstructed video frame. This avoids the drawback of directly learning, with a neural network, the nonlinear mapping from low-resolution images to high-resolution images, which yields images prone to erroneous signals such as artifacts and noise and makes it difficult to reconstruct high-quality images.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a schematic flowchart of a video enhancement method in one embodiment;
Fig. 2 is a schematic flowchart of motion alignment in one embodiment;
Fig. 3 is a schematic flowchart of adaptive information re-aggregation in one embodiment;
Fig. 4 is a schematic flowchart of a video enhancement method in another embodiment;
Fig. 5 is a schematic flowchart of a video enhancement method for temporal alignment in one embodiment;
Fig. 6 is a structural block diagram of a video enhancement apparatus in one embodiment;
Fig. 7 is an internal structure diagram of a computer device in one embodiment.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the embodiments of the present application, not to limit them.
In one embodiment, as shown in Fig. 1, a video enhancement method is provided. This embodiment is described by taking the application of the method to a server as an example. It can be understood that the method can also be applied to a terminal, or to a system including a terminal and a server and implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
Step S101: acquire consecutive video frames, where the consecutive video frames include a reference frame and time-series frames adjacent to the reference frame.
A video is composed of many still pictures, called video frames; for example, one second of video includes at least 24 video frames.
Consecutive video frames refer to multiple consecutive low-resolution video frames, such as consecutive low-resolution frames of a driving vehicle captured by a surveillance camera; the method is suitable for scenes with fast-moving objects. The reference frame is a frame of reference significance among the consecutive video frames, such as the middle frame of the consecutive video frames.
Note that consecutive video frames may also refer to consecutive video frames that require video deblurring and video denoising.
Specifically, the server acquires the consecutive video frames that require video enhancement, determines a reference frame among them, and takes the video frames adjacent to the reference frame as time-series frames.
For example, the server takes five consecutive low-resolution video frames as input. Among these five frames, the third frame is the reference frame, corresponding to the final output high-resolution video frame, while the other four frames are time-series frames adjacent to the reference frame.
Step S102: extract the feature information of the reference frame and the feature information of each time-series frame; take the feature information of the reference frame as the reference frame information of the reference frame, and align the feature information of each time-series frame to obtain the time-series frame information of each time-series frame.
The feature information of the reference frame refers to the image features of the reference frame, and the feature information of a time-series frame refers to the image features of that time-series frame; both can be extracted with a feature extraction model.
Aligning the feature information of each time-series frame means motion-aligning the feature information of each time-series frame to the reference frame information of the reference frame. Note that if an intermediate frame lies between a time-series frame and the reference frame, a progressive motion alignment strategy is adopted: the time-series frame is first aligned to the intermediate frame and then to the reference frame.
The time-series frame information of a time-series frame refers to the information obtained by motion-aligning its feature information.
Specifically, the server inputs the reference frame and each time-series frame into a pre-trained feature extraction model, which performs feature extraction on them to obtain the feature information of the reference frame and of each time-series frame. The feature information of the reference frame is taken as the reference frame information of the reference frame. The feature information of each time-series frame is then motion-aligned to the reference frame information, and the resulting alignment information serves as the time-series frame information of each time-series frame.
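A minimal sketch of a pre-trained feature extractor of the kind this step assumes; the patent does not fix an architecture, so a small residual convolutional stack is used here purely as an illustration:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Map an RGB frame to a feature map; one shared extractor is applied
    to the reference frame and to every time-series frame."""
    def __init__(self, feat_ch=64, n_blocks=3):
        super().__init__()
        self.head = nn.Conv2d(3, feat_ch, 3, padding=1)
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
            ) for _ in range(n_blocks)
        )

    def forward(self, frame):                  # frame: (N, 3, H, W)
        x = self.head(frame)
        for block in self.blocks:
            x = x + block(x)                   # residual refinement
        return x                               # feature information, (N, feat_ch, H, W)
```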
Step S103: aggregate each piece of time-series frame information according to the reference frame information to obtain the aggregation information of each time-series frame.
The aggregation information of a time-series frame refers to the information obtained by re-aggregating its time-series frame information.
Specifically, the server inputs the reference frame information and each piece of time-series frame information into an information aggregation model, which aggregates each piece of time-series frame information based on the reference frame information to obtain the aggregation information of each time-series frame. The information aggregation model is a network model for aggregating the time-series frame information of time-series frames.
Step S104: reconstruct the target video frame of the reference frame according to the reference frame information and each piece of aggregation information, where the image quality of the target video frame is higher than the image quality of the reference frame.
Here, higher image quality means that the image resolution of the target video frame is higher than that of the reference frame, with a higher signal-to-noise ratio and structural similarity and a more realistic visual appearance.
Specifically, the server inputs the reference frame information and the aggregation information of each time-series frame into an information reconstruction model, which performs convolution on them to obtain a high-quality video frame of the reference frame as the target video frame, e.g. a high-quality version of a vehicle-driving video frame.
Note that if the consecutive video frames require video deblurring and video denoising, the target video frame may also refer to the video frame after deblurring and denoising.
In the above video enhancement method, consecutive video frames including a reference frame and adjacent time-series frames are acquired; the feature information of the reference frame and of each time-series frame is extracted, the feature information of the reference frame is taken as the reference frame information, and the feature information of each time-series frame is aligned to obtain the time-series frame information; each piece of time-series frame information is then aggregated according to the reference frame information to obtain the aggregation information of each time-series frame; and finally the target video frame of the reference frame, whose image quality is higher than that of the reference frame, is reconstructed from the reference frame information and the aggregation information. In this way, by aligning and aggregating the feature information of the time-series frames adjacent to the reference frame and combining the reference frame information with the aggregation information of each time-series frame, the reconstructed video frame has a higher signal-to-noise ratio and structural similarity and a more realistic visual appearance, improving the image quality of the reconstructed video frame. This avoids the drawback of directly learning the nonlinear mapping from low-resolution to high-resolution images with a neural network, which yields images prone to artifacts, noise, and other erroneous signals and makes high-quality reconstruction difficult.
In one embodiment, the above step S102 of aligning the feature information of each time-series frame to obtain the time-series frame information specifically includes: taking the reference frame as the alignment target, and aligning the feature information of each time-series frame based on the historical motion information of that feature information, to obtain the time-series frame information of each time-series frame.
Here, the historical motion information refers to three kinds of motion information: continuity (C-Prop), uniqueness (U-Prop), and transferability (T-Prop).
Specifically, the server adopts a progressive motion alignment strategy: with the reference frame as the alignment target and the historical motion information of each time-series frame's feature information as a known condition, it motion-aligns the feature information of each time-series frame; the resulting alignment information serves as the time-series frame information of each time-series frame. Using the historical motion information as a known condition helps the alignment of the current time-series frame.
Further, this can be implemented as follows: if an intermediate frame lies between a time-series frame and the reference frame, the intermediate frame is taken as the alignment target, and the feature information of the time-series frame is aligned based on its historical motion information to obtain the initial alignment information of the time-series frame. The reference frame is then taken as the alignment target, and the initial alignment information is re-aligned based on the historical motion information of the initial alignment information to obtain the time-series frame information of the time-series frame.
For example, referring to Fig. 2, A denotes one alignment task; A contains multiple units a, each being an alignment unit. The subscripts of A1 and A2 denote the indices "1" and "2" of the neighboring frames, and an arrow between two alignment units denotes information being passed from one unit to the other. Both $a_1^{1\to 0}$ and $a_2^{1\to 0}$ denote aligning the information at time "1" to time "0"; their subscripts "1" and "2" indicate that their signals come from video frame "1" and video frame "2", respectively. M denotes a motion vector, e.g. $M(a_1^{1\to 0})$; C-Prop, U-Prop, and T-Prop denote the three kinds of motion information: continuity, uniqueness, and transferability.
In a specific implementation, referring to Fig. 2, suppose there are five consecutive frames numbered "-2", "-1", "0", "+1", "+2". The goal of motion alignment is to align the four neighboring frames "-2", "-1", "+1", "+2" to the reference frame "0", so the four alignment tasks are defined as $A_{-2}$, $A_{-1}$, $A_1$, $A_2$. By definition, $A_1$ denotes the alignment task "+1" → "0"; no intermediate frame lies between "+1" and "0", so this task has a single alignment unit $a_1^{+1\to 0}$. $A_2$ denotes the alignment task "+2" → "0"; an intermediate frame "+1" lies between "+2" and "0", so $A_2$ contains two alignment units, $a_2^{+2\to+1}$ ("+2" → "+1") and $a_2^{+1\to 0}$ ("+1" → "0"). The two alignment units $a_2^{+2\to+1}$ and $a_2^{+1\to 0}$ contained in $A_2$ are temporally adjacent, which defines the motion-continuity propagation rule "C": the motion of $a_2^{+2\to+1}$ is passed to $a_2^{+1\to 0}$ (see the corresponding formula in Fig. 2). In two adjacent alignment tasks, units such as $a_1^{+1\to 0}$ in $A_1$ and $a_2^{+1\to 0}$ in $A_2$ have the same alignment start and end times, i.e. "+1" → "0", but belong to tasks $A_1$ and $A_2$, so their source information comes from time-series frames "+1" and "+2", respectively; this defines the second propagation rule "U" for motion alignment information (see Fig. 2). Based on the two rules above, a third propagation rule "T" is derived (see Fig. 2). Referring to Fig. 2, this can be summarized as: $A_1$: ("+1" → "0"); $A_2$: ("+2" → "+1", "+1" → "0"); $A_3$: and so on.
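To make the task decomposition concrete, the following sketch (the bookkeeping and names are illustrative assumptions, not the patent's code) enumerates the alignment units of each task and the historical motion each unit can reuse under the "C" and "U" rules:

```python
def alignment_units(offsets=(-2, -1, 1, 2)):
    """For each task A_k ("k" -> "0"), list its chain of alignment units and
    the historical motion available to each unit via the propagation rules."""
    tasks = {}
    for k in offsets:
        step = -1 if k > 0 else 1
        chain = [(t, t + step) for t in range(k, 0, step)]   # e.g. +2->+1, +1->0
        tasks[k] = chain
    return tasks

for k, chain in alignment_units().items():
    for i, (src, dst) in enumerate(chain):
        hints = []
        if i > 0:
            hints.append("C: motion of the previous unit in this task")
        if abs(src) < abs(k):
            hints.append("U: motion of the same interval in task A_%+d" % src)
        print(f"A_{k:+d}: align {src:+d} -> {dst:+d}; reuse: {hints or ['none']}")
```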
In this way, a progressive alignment strategy is adopted for the information of different frames, solving the difficulty of direct long-distance alignment. At the same time, historical alignment information is fully considered, namely the three kinds of related historical motion information "C", "U", and "T"; each time the current alignment step is performed, the historical motion signals are used as known conditions to help the current alignment.
In this embodiment, through the progressive alignment scheme, the relationships between the motions of different frames are fully exploited, so that temporal alignment can be achieved accurately and the obtained time-series frame information is more accurate, while the difficulty of direct long-distance alignment is solved.
In one embodiment, the above step S103 of aggregating each piece of time-series frame information according to the reference frame information specifically includes: determining the first aggregation weight and the second aggregation weight of each piece of time-series frame information according to the reference frame information and each piece of time-series frame information; aggregating each piece of time-series frame information according to its first aggregation weight to obtain its initial aggregation information; and re-aggregating the initial aggregation information according to the second aggregation weight to obtain the aggregation information of each time-series frame.
Here, the first aggregation weight is the accuracy aggregation weight, such as $W_k$ in Fig. 3; the second aggregation weight is the consistency aggregation weight, such as $C_k$ in Fig. 3.
In a specific implementation, the first aggregation weight of each piece of time-series frame information is obtained as follows: the difference information between each piece of time-series frame information and the reference frame information is obtained separately, and the first aggregation weight is determined from that difference information. For example, the server obtains the difference information between each piece of time-series frame information and the reference frame information, and looks up a preset correspondence between difference information and first aggregation weights to obtain the first aggregation weight of each piece of time-series frame information.
In a specific implementation, the second aggregation weight of each piece of time-series frame information is obtained as follows: the average of all time-series frame information is obtained; the distance between each piece of time-series frame information and the average is obtained; and the second aggregation weight is determined from that distance. For example, the server first computes the average of all time-series frame information, then obtains the square-root distance between each piece of time-series frame information and the average as its distance to the average, and finally looks up a preset correspondence between distances and second aggregation weights to obtain the second aggregation weight of each piece of time-series frame information.
For example, referring to Fig. 3, there are two aggregation strategies, an accuracy-based information re-aggregation strategy and a consistency-based information aggregation strategy; F denotes time-series frame information and P denotes an image patch.
For strategy (a) in Fig. 3, accuracy-based information re-aggregation: starting from a piece of time-series frame information $F_k$, a 3×3 patch is taken at an arbitrary position, the reference frame information at the same position is taken out, and the reference information is multiplied with each element of the corresponding patch. The products are then normalized (e.g. with softmax), yielding the weights $W_k$ of the 3×3 patch. Finally, the 3×3 weights are multiplied with the 3×3 patch and summed, yielding a new value. This new value is the pixel value obtained by accuracy-based information re-aggregation; after all positions are computed, the initial aggregation information $\tilde{F}_k$ is generated. Note that the difference between the time-series frame information and the reference frame information is obtained by computing the cosine distance (vector dot product): the larger the value, the smaller the difference between the time-series frame information and the reference frame information, and the larger the weight.
For strategy (b) in Fig. 3, consistency-based information aggregation: all neighboring time-series frame information is first averaged, yielding the average time-series frame information $\bar{F}$. The square-root distance between each piece of neighboring time-series frame information and the average is computed element-wise, and a new weight map $C_k$ is obtained through the exponential function $\exp(-(\cdot))$. Note that the larger the square-root distance (reflecting dissimilarity), the stronger the discontinuity of that time-series frame's information, and the smaller its weight should be.
Finally, the outputs of the two strategies are combined by element-wise multiplication, $\hat{F}_k = C_k \odot \tilde{F}_k$, which yields the re-aggregated information $\hat{F}_k$ of the time-series frame.
Note that, based on these two weights, inaccurate temporal information can be filtered out while accurate, reliable temporal information is enhanced. When the temporal information is inaccurate, the weight $W_k$ is correspondingly small, so the degree of aggregation is small, filtering out the inaccurate temporal information. Likewise, when the temporal information is discontinuous, $C_k$ is small and the degree of aggregation is also small, so discontinuous, i.e. inaccurate, temporal information can equally be filtered out. Conversely, only when both $C_k$ and $W_k$ are large is their product large, so the product can also enhance accurate, reliable temporal information. The embodiments of the present application combine these two measures to realize information re-aggregation.
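A small numeric sketch (values invented for illustration) of how the two weights gate information multiplicatively: a feature keeps a high weight only when both $W_k$ and $C_k$ are large:

```python
import numpy as np

# Invented toy values at one spatial position for three neighboring frames:
W = np.array([0.8, 0.7, 0.1])   # accuracy weights: frame 3 disagrees with the reference
C = np.array([0.9, 0.2, 0.9])   # consistency weights: frame 2 deviates from the mean
F = np.array([1.0, 1.0, 1.0])   # re-aggregated feature values before gating

F_hat = C * (W * F)             # element-wise combination of both strategies
print(F_hat)                    # [0.72 0.14 0.09]: only frame 1 keeps a high weight
```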
In this embodiment, the time-series frame information of each time-series frame is aggregated according to its first and second aggregation weights to obtain the aggregation information of each time-series frame, which can filter out inaccurate temporal information while enhancing accurate, reliable temporal information.
In one embodiment, the above step S104 of reconstructing the target video frame of the reference frame according to the reference frame information and each piece of aggregation information specifically includes: concatenating the reference frame information and each piece of aggregation information to obtain concatenated information, and convolving the concatenated information to obtain the target video frame of the reference frame.
Specifically, the server inputs the reference frame information and each piece of aggregation information into an information reconstruction model, which concatenates them to obtain the concatenated information and applies a series of convolutions to obtain a high-quality video frame as the target video frame of the reference frame.
In this embodiment, reconstructing from the reference frame information and each piece of aggregation information helps produce a high-quality target video frame, avoiding the drawback of directly learning the nonlinear mapping from low-resolution to high-resolution images with a neural network, which yields images prone to erroneous signals such as artifacts and noise and makes it difficult to reconstruct high-quality images.
In one embodiment, as shown in Figure 4, another video enhancement method is provided. Taking the application of this method to a server as an example, it includes the following steps:

Step S401: obtain consecutive video frames, where the consecutive video frames include a reference frame and timing frames adjacent to the reference frame.

Step S402: extract the feature information of the reference frame and the feature information of each timing frame, and use the feature information of the reference frame as the reference frame information of the reference frame.

Step S403: taking the reference frame as the alignment target, align the feature information of each timing frame based on the historical motion information of that feature information, obtaining the timing frame information of each timing frame.

Step S404: separately obtain the difference information between each piece of timing frame information and the reference frame information, and determine the first aggregation weight of each piece of timing frame information according to that difference information.

Step S405: obtain the average of the pieces of timing frame information, obtain the distance between each piece of timing frame information and the average, and determine the second aggregation weight of each piece of timing frame information according to that distance.

Step S406: aggregate each piece of timing frame information according to its first aggregation weight to obtain the initial aggregation information of each piece of timing frame information.

Step S407: aggregate the initial aggregation information of each piece of timing frame information again according to its second aggregation weight to obtain the aggregation information of each timing frame.

Step S408: splice the reference frame information and each piece of aggregation information to obtain spliced information, and convolve the spliced information to obtain the target video frame of the reference frame.

In the above video enhancement method, by aligning and aggregating the feature information of the timing frames adjacent to the reference frame, and combining the reference frame information with the aggregation information of each timing frame, the reconstructed video frame has a higher signal-to-noise ratio and structural similarity as well as a more realistic visual effect, thereby improving the image quality of the reconstructed video frame. This avoids the defect that directly learning the nonlinear mapping from low-resolution images to high-resolution images with a neural network tends to produce images with false signals such as artifacts and noise, making it difficult to reconstruct a high-quality image.
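Steps S401 to S408 can be wired together as in the following sketch, where extractor, aligner, aggregator and reconstructor are hypothetical sub-modules standing in for the components described above (for example, the sketches given earlier); only the data flow between the stages follows the method.

```python
import torch.nn as nn

class VideoEnhancer(nn.Module):
    """Hypothetical end-to-end wiring of steps S401-S408."""
    def __init__(self, extractor, aligner, aggregator, reconstructor):
        super().__init__()
        self.extractor = extractor          # S402: per-frame feature extraction
        self.aligner = aligner              # S403: progressive motion alignment
        self.aggregator = aggregator        # S404-S407: two-stage weighted aggregation
        self.reconstructor = reconstructor  # S408: splice + convolve

    def forward(self, frames, ref_idx):
        feats = [self.extractor(f) for f in frames]            # S402
        ref_info = feats[ref_idx]
        timing_infos = self.aligner(feats, ref_idx)            # S403
        agg_infos = [self.aggregator(t, ref_info, timing_infos)
                     for t in timing_infos]                    # S404-S407
        return self.reconstructor(ref_info, agg_infos)         # S408
```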
In one embodiment, as shown in Figure 5, an embodiment of the present application further proposes a video enhancement method for temporal alignment. Unlike previous methods that directly perform motion estimation on long-distance neighboring frames, this embodiment adopts a progressive alignment strategy. This alignment strategy makes full use of historical motion information, so that long-distance inter-frame alignment can be achieved more accurately and more reliable temporal information can be obtained. At the same time, in order to filter out unreliable alignment information, this embodiment proposes an information aggregation strategy based on the consistency and accuracy of temporal information. Through the proposed strategies, the method of this embodiment can eliminate unreliable alignment information while enhancing the weight of reliable alignment information. The images generated by this method have a higher signal-to-noise ratio and structural similarity, and a more realistic visual effect. The method can effectively handle video blur and noise and improve video resolution, thereby generating high-quality video pictures. The details are as follows.

First, a feature extractor extracts the information of each video frame; next, a progressive motion aligner performs preliminary alignment on the extracted information. Then an information aggregator aggregates the different pieces of aligned information, and finally a reconstructor computes over the aggregated information and reconstructs a high-quality video frame.

Motion alignment is an important component of video restoration tasks. The flow of the motion alignment module proposed in this embodiment is shown in the left diagram of Figure 2: a progressive alignment strategy is adopted for the information of different frames, solving the difficulty of direct long-distance alignment. At the same time, historical alignment information is fully considered: as shown in the right diagram of Figure 2, three kinds of related historical motion information are defined, "C", "U" and "T". Each time the current alignment step is executed, the historical motion signals serve as known conditions that assist the current alignment. Through this progressive alignment scheme, the relationships between the motions of different frames are fully exploited, so temporal alignment can be achieved accurately.

For video restoration tasks, the importance of each piece of aligned timing frame information differs, and the alignment module inevitably introduces a certain amount of error. To better eliminate the error produced by the alignment module while giving adaptive aggregation weights to different pieces of timing frame information, this embodiment proposes an effective information re-aggregation module. As shown in Figure 3, for a given piece of neighboring timing frame information, this embodiment adopts two strategies to achieve adaptive aggregation: (1) an accuracy-based information re-aggregation strategy: as shown in (a) of Figure 3, for each piece of timing frame information, the difference between that timing frame information and the reference frame information is computed, and an aggregation weight based on information accuracy is computed from the difference; (2) a consistency-based information aggregation strategy: as shown in (b) of Figure 3, for each piece of timing frame information, the distance between that timing frame information and the average timing frame information is measured, and an aggregation weight based on information consistency is computed from the magnitude of the distance. Based on these two weights, inaccurate temporal information can be filtered out while accurate, reliable temporal information is enhanced.

The above video enhancement method for temporal alignment can achieve the following technical effects: (1) the method breaks through the limitation that related video restoration methods can only handle one specific task; it can handle three different video problems simultaneously within a single framework while generating higher-quality video frames; compared with related video restoration methods, this method achieves the best results on the video deblurring task, the video denoising task and the video super-resolution task; (2) it overcomes the defect that related techniques struggle to align and aggregate inter-frame information for fast-moving objects and thus struggle to reconstruct high-quality images, and at the same time avoids the defect that related techniques aggregate effective information with bias, causing the generated images to contain false signals such as artifacts and noise.
It should be understood that although the steps in the flowcharts of Figures 1-5 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Figures 1-5 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in Figure 6, a video enhancement apparatus is provided, including:

a video frame obtaining module 610, configured to obtain consecutive video frames, where the consecutive video frames include a reference frame and timing frames adjacent to the reference frame;

an information extraction module 620, configured to extract the feature information of the reference frame and the feature information of each timing frame, use the feature information of the reference frame as the reference frame information of the reference frame, and align the feature information of each timing frame to obtain the timing frame information of each timing frame;

an information aggregation module 630, configured to aggregate each piece of timing frame information according to the reference frame information to obtain the aggregation information of each timing frame; and

a video frame reconstruction module 640, configured to reconstruct the target video frame of the reference frame according to the reference frame information and each piece of aggregation information, where the image quality of the target video frame is higher than the image quality of the reference frame.

In some embodiments, the information extraction module 620 is specifically configured to take the reference frame as the alignment target and align the feature information of each timing frame based on the historical motion information of that feature information, obtaining the timing frame information of each timing frame.

In some embodiments, the information extraction module 620 is specifically configured to: if an intermediate frame is present between a timing frame and the reference frame, take the intermediate frame as the alignment target and align the feature information of the timing frame based on the historical motion information of that feature information, obtaining initial alignment information of the timing frame; and take the reference frame as the alignment target and align the initial alignment information again based on the historical motion information of the initial alignment information, obtaining the timing frame information of the timing frame.

In some embodiments, the information aggregation module 630 is specifically configured to determine the first aggregation weight and the second aggregation weight of each piece of timing frame information according to the reference frame information and each piece of timing frame information; aggregate each piece of timing frame information according to its first aggregation weight to obtain the initial aggregation information of each piece of timing frame information; and aggregate the initial aggregation information again according to the second aggregation weight to obtain the aggregation information of each timing frame.

In some embodiments, the information aggregation module 630 is further configured to separately obtain the difference information between each piece of timing frame information and the reference frame information, and determine the first aggregation weight of each piece of timing frame information according to that difference information.

In some embodiments, the information aggregation module 630 is further configured to obtain the average of the pieces of timing frame information, obtain the distance between each piece of timing frame information and the average, and determine the second aggregation weight of each piece of timing frame information according to that distance.

In some embodiments, the video frame reconstruction module 640 is specifically configured to splice the reference frame information and each piece of aggregation information to obtain spliced information, and convolve the spliced information to obtain the target video frame of the reference frame.

In some embodiments, the video frame reconstruction module 640 is specifically configured to input the reference frame information and each piece of aggregation information into an information reconstruction model, and splice them through the information reconstruction model to obtain the spliced information.

In some embodiments, the difference information between the timing frame information and the reference frame information is obtained by computing the cosine distance between the timing frame information and the reference frame information.

For specific limitations of the video enhancement apparatus, reference may be made to the limitations of the video enhancement method above, which are not repeated here. Each module in the above video enhancement apparatus may be implemented wholly or partly by software, hardware or a combination thereof. The above modules may be embedded in or independent of a processor in a computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Figure 7. The computer device includes a processor, a memory and a network interface connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data such as the feature information of the reference frame, the feature information of each timing frame, the timing frame information of each timing frame, the aggregation information of each timing frame, and the target video frame. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements a video enhancement method.

Those skilled in the art can understand that the structure shown in Figure 7 is only a block diagram of part of the structure related to the solution of the embodiments of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is further provided, including a memory and a processor, where the memory stores a computer program and the processor implements the steps in the above method embodiments when executing the computer program.

In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps in the above method embodiments.

In one embodiment, a computer program product or computer program is provided. The computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps in the above method embodiments.

A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing relevant hardware through a computer program, which may be stored in a non-volatile computer-readable storage medium; when executed, the computer program may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided in the present application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory or optical memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as the combinations of these technical features are not contradictory, they should be considered within the scope of this specification.

The above embodiments express only several implementations of the embodiments of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that, for a person of ordinary skill in the art, several variations and improvements can be made without departing from the concept of the embodiments of the present application, all of which fall within the protection scope of the embodiments of the present application. Therefore, the protection scope of the patent of the embodiments of the present application shall be subject to the appended claims.

Claims (20)

  1. A video enhancement method, characterized by comprising:
    obtaining consecutive video frames, wherein the consecutive video frames comprise a reference frame and timing frames adjacent to the reference frame;
    extracting feature information of the reference frame and feature information of each of the timing frames; using the feature information of the reference frame as reference frame information of the reference frame, and aligning the feature information of each of the timing frames to obtain timing frame information of each of the timing frames;
    aggregating each piece of the timing frame information according to the reference frame information to obtain aggregation information of each of the timing frames; and
    reconstructing a target video frame of the reference frame according to the reference frame information and each piece of the aggregation information, wherein an image quality of the target video frame is higher than an image quality of the reference frame.
  2. The method according to claim 1, characterized in that aligning the feature information of each of the timing frames to obtain the timing frame information of each of the timing frames comprises:
    taking the reference frame as an alignment target, aligning the feature information of each of the timing frames based on historical motion information of the feature information of each of the timing frames, to obtain the timing frame information of each of the timing frames.
  3. The method according to claim 2, characterized in that taking the reference frame as the alignment target, aligning the feature information of each of the timing frames based on the historical motion information of the feature information of each of the timing frames, to obtain the timing frame information of each of the timing frames, comprises:
    if an intermediate frame is present between the timing frame and the reference frame, taking the intermediate frame as the alignment target, and aligning the feature information of the timing frame based on the historical motion information of the feature information of the timing frame, to obtain initial alignment information of the timing frame;
    taking the reference frame as the alignment target, and aligning the initial alignment information again based on historical motion information of the initial alignment information, to obtain the timing frame information of the timing frame.
  4. The method according to claim 1, characterized in that aggregating each piece of the timing frame information according to the reference frame information to obtain the aggregation information of each of the timing frames comprises:
    determining a first aggregation weight and a second aggregation weight of each piece of the timing frame information according to the reference frame information and each piece of the timing frame information;
    aggregating each piece of the timing frame information according to the first aggregation weight of each piece of the timing frame information, to obtain initial aggregation information of each piece of the timing frame information;
    aggregating the initial aggregation information of each piece of the timing frame information again according to the second aggregation weight of each piece of the timing frame information, to obtain the aggregation information of each of the timing frames.
  5. The method according to claim 4, characterized in that the first aggregation weight of each piece of the timing frame information is obtained in the following manner:
    separately obtaining difference information between each piece of the timing frame information and the reference frame information;
    determining the first aggregation weight of each piece of the timing frame information according to the difference information between each piece of the timing frame information and the reference frame information.
  6. The method according to claim 4, characterized in that the second aggregation weight of each piece of the timing frame information is obtained in the following manner:
    obtaining an average of the pieces of the timing frame information;
    obtaining a distance between each piece of the timing frame information and the average;
    determining the second aggregation weight of each piece of the timing frame information according to the distance between each piece of the timing frame information and the average.
  7. The method according to claim 1, characterized in that reconstructing the target video frame of the reference frame according to the reference frame information and each piece of the aggregation information comprises:
    splicing the reference frame information and each piece of the aggregation information to obtain spliced information;
    convolving the spliced information to obtain the target video frame of the reference frame.
  8. The method according to claim 7, characterized in that splicing the reference frame information and each piece of the aggregation information to obtain the spliced information comprises:
    inputting the reference frame information and each piece of the aggregation information into an information reconstruction model, and splicing the reference frame information and each piece of the aggregation information through the information reconstruction model to obtain the spliced information.
  9. The method according to claim 5, characterized in that the difference information between the timing frame information and the reference frame information is obtained by computing a cosine distance between the timing frame information and the reference frame information.
  10. A video enhancement apparatus, characterized by comprising:
    a video frame obtaining module, configured to obtain consecutive video frames, wherein the consecutive video frames comprise a reference frame and timing frames adjacent to the reference frame;
    an information extraction module, configured to extract feature information of the reference frame and feature information of each of the timing frames, use the feature information of the reference frame as reference frame information of the reference frame, and align the feature information of each of the timing frames to obtain timing frame information of each of the timing frames;
    an information aggregation module, configured to aggregate each piece of the timing frame information according to the reference frame information to obtain aggregation information of each of the timing frames;
    a video frame reconstruction module, configured to reconstruct a target video frame of the reference frame according to the reference frame information and each piece of the aggregation information, wherein an image quality of the target video frame is higher than an image quality of the reference frame.
  11. The apparatus according to claim 10, characterized in that the information extraction module is specifically configured to take the reference frame as an alignment target and align the feature information of each timing frame based on historical motion information of the feature information of each timing frame, to obtain the timing frame information of each timing frame.
  12. The apparatus according to claim 11, characterized in that the information extraction module is specifically configured to: if an intermediate frame is present between a timing frame and the reference frame, take the intermediate frame as the alignment target and align the feature information of the timing frame based on the historical motion information of the feature information of the timing frame, to obtain initial alignment information of the timing frame; and take the reference frame as the alignment target and align the initial alignment information again based on historical motion information of the initial alignment information, to obtain the timing frame information of the timing frame.
  13. The apparatus according to claim 10, characterized in that the information aggregation module is specifically configured to determine a first aggregation weight and a second aggregation weight of each piece of timing frame information according to the reference frame information and each piece of timing frame information; aggregate each piece of timing frame information according to the first aggregation weight of each piece of timing frame information, to obtain initial aggregation information of each piece of timing frame information; and aggregate the initial aggregation information of each piece of timing frame information again according to the second aggregation weight of each piece of timing frame information, to obtain the aggregation information of each timing frame.
  14. The apparatus according to claim 13, characterized in that the information aggregation module is further configured to separately obtain difference information between each piece of timing frame information and the reference frame information, and determine the first aggregation weight of each piece of timing frame information according to the difference information between each piece of timing frame information and the reference frame information.
  15. The apparatus according to claim 13, characterized in that the information aggregation module is further configured to obtain an average of the pieces of timing frame information, obtain a distance between each piece of timing frame information and the average, and determine the second aggregation weight of each piece of timing frame information according to the distance between each piece of timing frame information and the average.
  16. The apparatus according to claim 10, characterized in that the video frame reconstruction module is specifically configured to splice the reference frame information and each piece of aggregation information to obtain spliced information, and convolve the spliced information to obtain the target video frame of the reference frame.
  17. The apparatus according to claim 16, characterized in that the video frame reconstruction module is specifically configured to input the reference frame information and each piece of the aggregation information into an information reconstruction model, and splice the reference frame information and each piece of the aggregation information through the information reconstruction model to obtain the spliced information.
  18. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 9.
  19. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
  20. A computer program product, characterized in that the computer program product comprises a computer program, and the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
PCT/CN2022/105653 2021-11-11 2022-07-14 Video enhancement method and apparatus, computer device and storage medium WO2023082685A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111330266.9A CN113781312B (zh) 2021-11-11 2021-11-11 Video enhancement method and apparatus, computer device and storage medium
CN202111330266.9 2021-11-11

Publications (1)

Publication Number Publication Date
WO2023082685A1 (zh)

Family

ID=78873738

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/105653 WO2023082685A1 (zh) 2021-11-11 2022-07-14 Video enhancement method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN113781312B (zh)
WO (1) WO2023082685A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781312B (zh) * 2021-11-11 2022-03-25 深圳思谋信息科技有限公司 Video enhancement method and apparatus, computer device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2105881A1 (en) * 2008-03-25 2009-09-30 Panasonic Corporation Fast reference frame selection for reconstruction of a high-resolution frame from low-resolution frames
CN111047516A (zh) * 2020-03-12 2020-04-21 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机设备和存储介质
CN111784570A (zh) * 2019-04-04 2020-10-16 Tcl集团股份有限公司 一种视频图像超分辨率重建方法及设备
CN112584158A (zh) * 2019-09-30 2021-03-30 复旦大学 视频质量增强方法和系统
CN112700392A (zh) * 2020-12-01 2021-04-23 华南理工大学 一种视频超分辨率处理方法、设备及存储介质
CN113781312A (zh) * 2021-11-11 2021-12-10 深圳思谋信息科技有限公司 视频增强方法、装置、计算机设备和存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082428A1 (en) * 2016-09-16 2018-03-22 Qualcomm Incorporated Use of motion information in video data to track fast moving objects
CN108495130B (zh) * 2017-03-21 2021-04-20 腾讯科技(深圳)有限公司 Video encoding and decoding method and apparatus, terminal, server and storage medium
CN110070511B (zh) * 2019-04-30 2022-01-28 北京市商汤科技开发有限公司 Image processing method and apparatus, electronic device and storage medium
US11526970B2 (en) * 2019-09-04 2022-12-13 Samsung Electronics Co., Ltd System and method for video processing with enhanced temporal consistency
CN110830808A (zh) * 2019-11-29 2020-02-21 合肥图鸭信息科技有限公司 Video frame reconstruction method, apparatus and terminal device
CN112348766B (zh) * 2020-11-06 2023-04-18 天津大学 Progressive feature-flow deep fusion network for surveillance video enhancement

Also Published As

Publication number Publication date
CN113781312A (zh) 2021-12-10
CN113781312B (zh) 2022-03-25

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22891494
Country of ref document: EP
Kind code of ref document: A1