WO2024082933A1 - Video processing method, device, electronic device and storage medium - Google Patents

Video processing method, device, electronic device and storage medium

Info

Publication number
WO2024082933A1
WO2024082933A1 PCT/CN2023/121354 CN2023121354W WO2024082933A1 WO 2024082933 A1 WO2024082933 A1 WO 2024082933A1 CN 2023121354 W CN2023121354 W CN 2023121354W WO 2024082933 A1 WO2024082933 A1 WO 2024082933A1
Authority
WO
WIPO (PCT)
Prior art keywords
processed
frames
video
interlaced
model
Application number
PCT/CN2023/121354
Other languages
English (en)
French (fr)
Inventor
张珂新
赖水长
赵世杰
Original Assignee
抖音视界有限公司
Application filed by 抖音视界有限公司
Publication of WO2024082933A1 publication Critical patent/WO2024082933A1/zh


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N5/00 Details of television systems
    • H04N5/14 Picture signal circuitry for video frequency region
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265 Mixing

Definitions

  • the embodiments of the present disclosure relate to the field of video processing technology, for example, to a video processing method, device, electronic device and storage medium.
  • When displaying interlaced video on an existing display interface, it needs to be de-interlaced so that the complete picture can be displayed.
  • In the related art, de-interlacing is usually performed on interlaced videos to remove the combing artifacts in interlaced video.
  • However, the de-interlacing effect of this method is poor. For example, in scenes with moving objects, the combed areas are relatively blurred, details are easily lost, and combing artifacts remain in the picture.
  • the present disclosure provides a video processing method, device, electronic device and storage medium, so as to effectively restore the video picture; in particular, for video pictures of motion scenes, a more significant restoration effect can be achieved.
  • an embodiment of the present disclosure provides a video processing method, the method comprising:
  • acquiring at least three interlaced frames to be processed, wherein the interlaced frames to be processed are determined based on two adjacent video frames to be processed; inputting the at least three interlaced frames to be processed into a pre-trained image fusion model to obtain at least two target video frames corresponding to the at least three interlaced frames to be processed, wherein the image fusion model includes a feature processing sub-model and a motion perception sub-model; and determining a target video based on the at least two target video frames.
  • an embodiment of the present disclosure further provides a video processing device, the device comprising:
  • a to-be-processed interlaced frame acquisition module, configured to acquire at least three interlaced frames to be processed; wherein the interlaced frames to be processed are determined based on two adjacent video frames to be processed;
  • a target video frame determination module configured to input the at least three interlaced frames to be processed into a pre-trained image fusion model to obtain at least two target video frames corresponding to the at least three interlaced frames to be processed; wherein the image fusion model includes a feature processing sub-model and a motion perception sub-model;
  • the target video determination module is configured to determine the target video based on the at least two target video frames.
  • an embodiment of the present disclosure further provides an electronic device, the electronic device comprising:
  • one or more processors;
  • a storage device configured to store one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the video processing method as described in any one of the embodiments of the present disclosure.
  • the embodiments of the present disclosure further provide a storage medium comprising computer executable instructions, which, when executed by a computer processor, are used to execute the video processing method as described in any one of the embodiments of the present disclosure.
  • FIG1 is a schematic flow chart of a video processing method provided by an embodiment of the present disclosure.
  • FIG2 is a schematic diagram of an image fusion model provided by an embodiment of the present disclosure.
  • FIG3 is a schematic diagram of a motion perception model provided by an embodiment of the present disclosure.
  • FIG4 is a schematic flow chart of a video processing method provided by an embodiment of the present disclosure.
  • FIG5 is a schematic diagram of a video frame to be processed provided by an embodiment of the present disclosure.
  • FIG6 is a schematic diagram of the structure of a video processing device provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure.
  • a prompt message is sent to the user to clearly prompt the user that the operation requested to be performed will require obtaining and using the user's personal information.
  • the user can autonomously choose whether to provide personal information to software or hardware such as an electronic device, application, server, or storage medium that performs the operation of the technical solution of the present disclosure according to the prompt message.
  • the prompt information may be sent to the user in the form of a pop-up window, in which the prompt information may be presented in text form.
  • the pop-up window may also carry a selection control for the user to select "agree" or "disagree" to provide personal information to the electronic device.
  • the first implementation method is to de-interlace the interlaced frames to be processed based on the YADIF de-interlacing algorithm, so as to remove the combing artifacts of the original video.
  • the second implementation method is to input multiple interlaced frames to be processed into the ST-Deint deep learning neural network model, which combines time-domain and spatial-domain information for prediction, and to process the interlaced frames to be processed based on the deep learning algorithm.
  • the third implementation method is to process the interlaced frames to be processed based on the deep learning model DIN, which first fills in the missing information and then fuses the inter-field content to obtain the processed video. Similar to the second implementation method, this method can only achieve a rough restoration for motion scenes, and the detail restoration effect is not good; for example, the moving combed areas are prone to blurring and loss of detail. From the above it can be seen that the video processing methods in the related art still suffer from a poor display effect of the output video.
  • In the embodiments of the present disclosure, the interlaced frames to be processed can be processed based on an image fusion model that includes multiple sub-models, so that missing details, combing artifacts and blurred pictures in the output target video are avoided.
  • FIG1 is a flow chart of a video processing method provided by an embodiment of the present disclosure.
  • the embodiment of the present disclosure is applicable to the situation where feature information is supplemented for an original video using interlaced scanning so that the obtained target video can be fully displayed on an existing display device.
  • the method can be executed by a video processing device, which can be implemented in the form of software and/or hardware, for example, by an electronic device, which can be a mobile terminal, a PC or a server, etc.
  • the technical solution provided by the embodiment of the present disclosure can be executed based on a client, can be executed based on a server, or can be executed based on the cooperation of a client and a server.
  • the method comprises:
  • S110: Acquire at least three interlaced frames to be processed; wherein the interlaced frame to be processed is determined based on two adjacent video frames to be processed.
  • the device for executing the video processing method can be integrated into an application software that supports the video processing function, and the software can be installed in an electronic device, for example, the electronic device can be a mobile terminal or a PC.
  • the application software can be any type of software for image/video processing; the specific applications are not enumerated here, as long as image/video processing can be achieved. It can also be a specially developed application program that implements video processing and displays the output video within the software, or it can be integrated into a corresponding page, through which the user can process videos on a PC.
  • the user can shoot a video in real time based on the camera device of the mobile terminal, or actively upload a video based on the pre-developed control in the application software. Therefore, it can be understood that the real-time video captured by the application or the video actively uploaded by the user is the video to be processed. For example, based on the pre-written program, a plurality of video frames to be processed can be obtained.
  • early video display methods usually adopt an interlaced scanning method, that is, first scan the odd rows to obtain a video frame in which only the odd rows of pixels have rendered pixel values, and then scan the even rows to obtain a video frame in which only the even rows of pixels have rendered pixel values, and combine the two video frames to obtain a complete video frame.
  • This display method causes a large time interval between the display of two adjacent video frames, resulting in noticeable flicker, jagged edges, ghosting and other image quality problems.
  • the current video display usually adopts a progressive (line-by-line) scanning method
  • the interlaced video frames can be used as the video frames to be processed, that is, frames in which only the odd-numbered rows of pixels, or only the even-numbered rows of pixels, have rendered pixel values.
  • the video frame combined by two adjacent frames of video frames to be processed is the interlaced frame to be processed.
  • de-interlacing refers to filling the missing half-field information of the odd and even fields of the interlaced frames of two adjacent frames of images to restore the original frame size, and finally obtaining odd and even frames.
  • the video frame to be processed is a video frame in which only odd-numbered rows of pixels have rendered pixel values or even-numbered rows of pixels have rendered pixel values
  • two adjacent frames of the video frame to be processed can be combined to obtain an interlaced frame to be processed, so that the interlaced frame to be processed can be de-interlaced.
  • For example, I 1 , I 2 , I 3 , I 4 , I 5 and I 6 can be used as video frames to be processed; I 1 and I 2 can be combined to obtain an interlaced frame to be processed D 1 , I 3 and I 4 can be combined to obtain an interlaced frame to be processed D 2 , and I 5 and I 6 can be combined to obtain an interlaced frame to be processed D 3 .
  • the principle of making the interlaced frame to be processed can be expressed based on the following formula:
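  • (The formula itself is not reproduced in this text. A plausible form, consistent with the pairwise combination described above and assuming rows are counted from 1, is the following, where D_k denotes the k-th interlaced frame to be processed and I_{2k-1}, I_{2k} the two adjacent video frames to be processed:)

$$
D_k(i, j) =
\begin{cases}
I_{2k-1}(i, j), & i \text{ odd} \\
I_{2k}(i, j), & i \text{ even}
\end{cases}
\qquad k = 1, 2, 3, \ldots
$$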
  • the number of interlaced frames to be processed may be three or more than three, and this is not specifically limited in the embodiments of the present disclosure.
  • the number of interlaced frames to be processed corresponds to the number of video frames of the original video
  • the number of interlaced frames to be processed input into the model can be three frames or more than three frames.
  • the interlaced frames to be processed can be input into a pre-trained image fusion model.
  • the image fusion model can be a deep learning neural network model including multiple sub-models.
  • the image fusion model includes a feature processing sub-model and a motion perception sub-model.
  • the number of interlaced frames to be processed is at least three frames.
  • the feature processing submodel can be a neural network model including multiple convolution modules.
  • the feature processing submodel can be used to extract, fuse and perform other processing on the features in the interlaced frames to be processed.
  • the feature processing submodel can include multiple 3D convolution layers, so that the feature processing submodel can not only process the time domain feature information of multiple frames, but also process the spatial feature information, thereby strengthening the information interaction between the interlaced frames to be processed.
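  • As a rough illustration of this idea only (PyTorch is used here for concreteness; layer counts, channel widths and kernel sizes are assumptions, not the patent's architecture), a 3D-convolutional feature block can be sketched as follows, with the interlaced frames to be processed stacked along a temporal axis so that each convolution mixes information across frames as well as within each frame:

```python
import torch
import torch.nn as nn

class FeatureBlock3D(nn.Module):
    """Minimal sketch of a feature-processing block built from 3D convolutions.

    Input:  (N, C, T, H, W) - a batch of T interlaced frames stacked along time.
    Output: (N, F, T, H, W) - per-frame feature maps in which temporal (inter-frame)
    and spatial (intra-frame) information has already been mixed by the 3D kernels.
    """

    def __init__(self, in_channels: int = 3, features: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_channels, features, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(features, features, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

# Three interlaced frames to be processed, each with 3 channels and size 64x64.
frames = torch.randn(1, 3, 3, 64, 64)
print(FeatureBlock3D()(frames).shape)  # torch.Size([1, 32, 3, 64, 64])
```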
  • the motion perception submodel may be a neural network model for perceiving inter-frame motion.
  • the motion perception submodel may be composed of at least one convolutional network, a network including a backward warping function, and a residual network.
  • the backward warping function may realize the mapping between images.
  • the motion perception submodel may be used to process the feature information between frames, thereby making the inter-frame content more continuous and also achieving the effect of complementing each other's details.
  • the interlaced frames to be processed can be processed based on the multiple sub-models in the image fusion model, so as to obtain at least two target video frames corresponding to the interlaced frames to be processed.
  • the image fusion model includes multiple sub-models
  • the multiple sub-models in the model can be used in turn to perform the corresponding processing on the interlaced frames to be processed, thereby outputting at least two target video frames corresponding to the interlaced frames to be processed.
  • the image fusion model includes multiple sub-models, and the arrangement order of the multiple sub-models can be arranged according to the data input and output order.
  • the image fusion model includes a feature processing sub-model, a motion perception sub-model, and a 2D convolution layer, wherein the 2D convolution layer may be a neural network layer that performs feature processing on only the height and width of the data.
  • determining the arrangement order of multiple sub-models in the image fusion model based on the data input and output order can enable the image fusion model to not only process the feature information of the interlaced frame to be processed, but also perceive the motion between multiple interlaced frames to be processed, so as to make the content between frames more continuous and achieve the effect of detail supplementation.
  • the solution adopted by the related technology is to split the interlaced frame to be processed into odd and even rows, that is, to halve the H dimension of the interlaced frame to be processed. For example, if the matrix of the interlaced frame to be processed is (H × W × C), then the matrix after the odd-even row split is (H/2 × W × C). Such a solution may cause the objects in the interlaced frame to be processed to be structurally deformed, thereby affecting the visual effect of the target video frame.
  • the processing in the embodiments of the present disclosure can be understood as follows: in addition to splitting the interlaced frame to be processed into odd and even rows, it is also split into odd and even columns, that is, a dual feature-processing branch is adopted. This ensures that, when the interlaced frame to be processed is processed based on the image fusion model, both the overall structural feature information and the high-frequency detail feature information of the interlaced frame to be processed can be handled, as illustrated in the sketch below.
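  • A minimal sketch of the two kinds of split (tensor layout, function names and the stacking of the four sub-images along the channel axis are assumptions made for illustration):

```python
import torch

def split_odd_even_rows(x: torch.Tensor):
    """Related-art style split: (N, C, H, W) -> two (N, C, H/2, W) tensors holding
    the odd-numbered and even-numbered rows (rows counted from 1)."""
    return x[:, :, 0::2, :], x[:, :, 1::2, :]

def split_odd_even_rows_and_cols(x: torch.Tensor) -> torch.Tensor:
    """Dual split by row and column parity: (N, C, H, W) -> (N, 4*C, H/2, W/2).
    Each of the four sub-images keeps the aspect ratio of the original frame,
    so object structures are not stretched along a single axis."""
    a = x[:, :, 0::2, 0::2]  # odd rows,  odd columns  (counted from 1)
    b = x[:, :, 0::2, 1::2]  # odd rows,  even columns
    c = x[:, :, 1::2, 0::2]  # even rows, odd columns
    d = x[:, :, 1::2, 1::2]  # even rows, even columns
    return torch.cat([a, b, c, d], dim=1)

frame = torch.randn(1, 3, 8, 8)
print(split_odd_even_rows(frame)[0].shape)        # torch.Size([1, 3, 4, 8])
print(split_odd_even_rows_and_cols(frame).shape)  # torch.Size([1, 12, 4, 4])
```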
  • the feature processing sub-model includes a first feature extraction branch and a second feature extraction branch; the output of the first feature extraction branch is the input of the first motion perception sub-model in the motion perception sub-model, and the output of the second feature extraction branch is the input of the second motion perception sub-model in the motion perception sub-model; the output of the first motion perception sub-model and the output of the second motion perception sub-model are the input of the 2D convolution layer, so that the 2D convolution layer outputs the target video frames.
  • the first feature extraction branch can be a neural network model for processing the structural features of the interlaced frame to be processed.
  • the first feature extraction branch includes a structural feature extraction network and a structural feature fusion network.
  • the structural feature extraction network can be composed of at least one convolutional network, so that at least one convolutional network can process the interlaced frame to be processed according to a preset structural splitting ratio to obtain structural features corresponding to the interlaced frame to be processed.
  • the structural feature fusion network can be a neural network of a U-Net structure stacked by at least one 3D convolutional layer. It should be noted that the convolution kernels of at least one 3D convolutional layer can be the same value or different values, and the present embodiment does not specifically limit this.
  • the structural feature fusion network can be used to strengthen the information interaction between frames, so that not only the spatial features of the interlaced frame to be processed can be processed, but also the time domain features between multiple frames can be strengthened.
  • the second feature extraction branch may be a neural network model for processing detail features of the interlaced frame to be processed.
  • the second feature extraction branch includes a detail feature extraction network and a detail feature fusion network.
  • the detail feature extraction network may be composed of at least one convolutional layer, so that at least one convolutional layer can process the interlaced frame to be processed according to a preset detail splitting ratio to obtain detail features corresponding to the interlaced frame to be processed.
  • the detail feature fusion network may be a neural network of a U-Net structure stacked by at least one 3D convolutional layer. It should be noted that the convolution kernels of at least one 3D convolutional layer may be the same value or different values, and the disclosed embodiment does not specifically limit this.
  • the interlaced frames to be processed are respectively input into the first feature extraction branch and the second feature extraction branch.
  • After the interlaced frames to be processed are processed by the structural feature extraction network and the structural feature fusion network in the first feature extraction branch, the result can be input into the first motion perception sub-model.
  • After the interlaced frames to be processed are processed by the detail feature extraction network and the detail feature fusion network in the second feature extraction branch, the result can be input into the second motion perception sub-model.
  • After the model input is processed by the first motion perception sub-model, the result can be input into the 2D convolution layer.
  • After the model input is processed by the second motion perception sub-model, the result is input into the 2D convolution layer so that the 2D convolution layer can output the target video frames.
  • the image fusion model can process the feature information of the interlaced frames to be processed and perceive the motion between the interlaced frames to be processed, so that the content between frames is more continuous and the effect of detail supplementation is achieved.
  • the interlaced frame to be processed is input into the image fusion model, and it can be processed based on multiple sub-models in the model, so as to obtain the target video frame corresponding to the interlaced frame to be processed.
  • the following is a detailed description of the process of the image fusion model processing the interlaced frame to be processed in conjunction with Figure 2.
  • Inputting the at least three interlaced frames to be processed into a pre-trained image fusion model to obtain at least two target video frames corresponding to the at least three interlaced frames to be processed includes: performing equal-proportion feature extraction on the at least three interlaced frames to be processed based on the structural feature extraction network to obtain structural features corresponding to the interlaced frames to be processed, and performing odd-even field feature extraction on the at least three interlaced frames to be processed based on the detail feature extraction network to obtain detail features corresponding to the interlaced frames to be processed; processing the structural features based on the structural feature fusion network to obtain a first inter-frame feature map between two adjacent interlaced frames to be processed, and processing the detail features based on the detail feature fusion network to obtain a second inter-frame feature map between two adjacent interlaced frames to be processed; processing the first inter-frame feature map based on the first motion perception sub-model to obtain a first fused feature map, and processing the second inter-frame feature map based on the second motion perception sub-model to obtain a second fused feature map; and processing the first fused feature map and the second fused feature map based on the 2D convolution layer to obtain the at least two target video frames.
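  • A structural sketch of this forward pass is given below (PyTorch; three interlaced frames in, two target frames out). The channel widths, layer counts and the concrete fusion operations are illustrative assumptions, and the motion-perception sub-models are reduced to simple stand-ins here; a more detailed sketch of the motion-perception step follows further below:

```python
import torch
import torch.nn as nn

class MotionPerceptionStub(nn.Module):
    """Stand-in for a motion-perception sub-model: it takes the two maps of an
    inter-frame feature map and returns a single fused feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, map_a: torch.Tensor, map_b: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([map_a, map_b], dim=1))

class ImageFusionModelSketch(nn.Module):
    """Sketch of the described wiring: two feature-extraction branches (structural
    and detail), one motion-perception sub-model per branch, and a final 2D
    convolution that emits the target video frames."""
    def __init__(self, in_channels: int = 3, channels: int = 32, out_frames: int = 2):
        super().__init__()
        self.out_frames = out_frames
        # First branch: structural feature extraction + 3D-convolutional fusion.
        self.structural = nn.Sequential(
            nn.Conv3d(in_channels, channels, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        # Second branch: detail (odd/even-field) feature extraction + 3D-convolutional fusion.
        self.detail = nn.Sequential(
            nn.Conv3d(in_channels, channels, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        self.motion_1 = MotionPerceptionStub(channels)  # first motion-perception sub-model
        self.motion_2 = MotionPerceptionStub(channels)  # second motion-perception sub-model
        # Final 2D convolution producing out_frames target frames of in_channels each.
        self.out_conv = nn.Conv2d(2 * channels, out_frames * in_channels, kernel_size=3, padding=1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        n, c, t, h, w = frames.shape        # t = 3 interlaced frames to be processed
        s = self.structural(frames)         # structural features, (N, F, T, H, W)
        d = self.detail(frames)             # detail features,     (N, F, T, H, W)
        # Inter-frame feature maps, approximated here as fused features of the
        # adjacent pairs (D1, D2) and (D2, D3).
        fused_1 = self.motion_1(s[:, :, :2].mean(dim=2), s[:, :, 1:].mean(dim=2))
        fused_2 = self.motion_2(d[:, :, :2].mean(dim=2), d[:, :, 1:].mean(dim=2))
        out = self.out_conv(torch.cat([fused_1, fused_2], dim=1))
        return out.view(n, self.out_frames, c, h, w)  # (N, 2, C, H, W): two target frames

model = ImageFusionModelSketch()
print(model(torch.randn(1, 3, 3, 64, 64)).shape)  # torch.Size([1, 2, 3, 64, 64])
```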
  • the structural feature may be a feature for reflecting the overall structural information of the interlaced frame to be processed.
  • the detail feature may be a feature for reflecting the detail information of the interlaced frame to be processed.
  • the detail feature may be a high-frequency feature, which is a higher-order feature than the structural feature.
  • the interlaced frame to be processed is input into the image fusion model, and the interlaced frame to be processed can be subjected to proportional dimensionality reduction processing based on the structural feature extraction network to obtain the structural features corresponding to the interlaced frame to be processed.
  • the detail feature extraction network performs an odd-even field splitting process on the interlaced frame to be processed to obtain detail features corresponding to the interlaced frame to be processed. For the structural features, feature fusion processing can be performed based on the structural feature fusion network, fusing the structural features of two adjacent interlaced frames to be processed, so that a fusion feature map between the two adjacent interlaced frames to be processed can be obtained, that is, the first inter-frame feature map. At the same time, for the detail features, fusion can be performed based on the detail feature fusion network, fusing the detail features of two adjacent interlaced frames to be processed, so that a fusion feature map between the two adjacent interlaced frames to be processed can be obtained, that is, the second inter-frame feature map.
  • the first inter-frame feature map is input into the first motion perception sub-model, and the first inter-frame feature map is processed based on the first motion perception sub-model to obtain the first fused feature map.
  • the second inter-frame feature map is input into the second motion perception sub-model, and the second inter-frame feature map is processed based on the second motion perception sub-model to obtain the second fused feature map.
  • the first fused feature map and the second fused feature map are input into the 2D convolution layer, and the fused feature map is processed based on the 2D convolution layer to obtain at least two target video frames corresponding to the interlaced frame to be processed.
  • the dual feature processing branch enables the image fusion model to process both the overall structural feature information of the interlaced frame to be processed and the detailed feature information of the interlaced frame to be processed.
  • the 3D convolution layer can be used to strengthen the information interaction between frames.
  • the motion perception sub-model can be used to perceive the motion between frames and perform feature alignment, so that the content between frames is more continuous, thereby improving the display effect of the target video frame.
  • the first inter-frame feature map may include a first feature map and a second feature map.
  • the first inter-frame feature map is processed based on the first motion perception sub-model, and the first feature map and the second feature map may be processed separately based on the first motion perception sub-model to obtain a first fused feature map.
  • the processing process of the first inter-frame feature map by the first motion perception sub-model may be specifically described below in conjunction with FIG. 3 .
  • the first inter-frame feature map is processed based on the first motion perception sub-model to obtain a first fused feature map, including: based on the convolutional network in the first motion perception sub-model, the first feature map and the second feature map are processed respectively to obtain a first optical flow map and a second optical flow map; based on the distortion network in the first motion perception sub-model, the first optical flow map and the second optical flow map are mapped and processed to obtain an offset; based on the first optical flow map, the second optical flow map and the offset, the first fused feature map is determined.
  • an optical flow map can represent the speed and direction of movement of each pixel between two adjacent frames of an image.
  • Optical flow is the instantaneous velocity, on the observation imaging plane, of the pixel movement of a moving object in space. It is a method of finding the correspondence between the previous frame and the current frame by using the change of pixels in the time domain of an image sequence and the correlation between adjacent frames, thereby calculating the motion information of the object between adjacent frames.
  • the distortion network can be a network containing a backward warping function, which can realize the mapping between images.
  • the offset can be data obtained after mapping based on the optical flow maps, and is used to represent the displacement offset of the features.
  • the first inter-frame feature map is input into the first motion perception sub-model, and the first feature map and the second feature map can be processed based on the convolutional network, respectively, so as to obtain a first optical flow map for characterizing the motion speed and motion direction of the pixels in the two adjacent interlaced frames to be processed corresponding to the first feature map, and a second optical flow map for characterizing the motion speed and motion direction of the pixels in the two adjacent interlaced frames to be processed corresponding to the second feature map.
  • the first optical flow map and the second optical flow map are mapped based on the distortion network to obtain the offset corresponding to the first optical flow map and the offset corresponding to the second optical flow map.
  • the first optical flow map, the second optical flow map and the offsets corresponding to the two optical flow maps are fused to obtain the first fused feature map.
  • determining the first fused feature map based on the first optical flow map, the second optical flow map and the offset includes: performing residual processing on the first optical flow map and the offset to obtain a first feature map to be spliced; performing residual processing on the second optical flow map and the offset to obtain a second feature map to be spliced; and splicing the first feature map to be spliced and the second feature map to be spliced to obtain the first fused feature map.
  • residual processing may be performed on the first optical flow map and the offset to align the optical flow features in the first optical flow map, thereby obtaining the first feature map to be spliced after feature alignment; at the same time, the second optical flow map and the offset are subjected to residual processing to align the optical flow features in the second optical flow map, thereby obtaining the second feature map to be spliced after feature alignment.
  • the first feature map to be spliced and the second feature map to be spliced are spliced to obtain the first fused feature map.
  • the processing of the second inter-frame feature map based on the second motion perception sub-model is the same as the processing of the first inter-frame feature map based on the first motion perception sub-model, and is not described in detail here.
  • the process of processing the first inter-frame feature map by the first motion perception sub-model is described by taking three interlaced frames to be processed as an example.
  • D 1 , D 2 , and D 3 can be used as interlaced frames to be processed, and these three interlaced frames to be processed are input into the first feature extraction branch to obtain the first feature map and the second feature map, which can be represented by F 1 and F 2.
  • F 1 and F 2 are input into the first motion perception sub-model, and F 1 and F 2 are processed based on the convolution layer to obtain the first optical flow map IF 1 and the second optical flow map IF 2 .
  • IF 1 and IF 2 are mapped based on the distortion network to obtain the offset, and IF 1 and the offset are subjected to residual processing to obtain the first feature map to be spliced.
  • Residual processing is performed on IF 2 and the offset to obtain the second feature map to be spliced. Finally, after the first feature map to be spliced and the second feature map to be spliced are spliced, the first fused feature map F full can be obtained.
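  • The motion-perception step just described can be sketched as follows (PyTorch). The backward warp uses grid sampling; the way the offset is derived from the warped features, the shared flow and residual networks and all sizes are assumptions made for illustration rather than the patent's exact operations:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def backward_warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a feature map with a (dx, dy) flow field.
    feat: (N, C, H, W); flow: (N, 2, H, W) in pixel units."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat)       # (2, H, W) base grid
    coords = base.unsqueeze(0) + flow                           # sampling positions
    # Normalise to [-1, 1]; grid_sample expects (N, H, W, 2) ordered as (x, y).
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)

class MotionPerceptionSketch(nn.Module):
    """Hedged sketch of the motion-perception sub-model: a convolutional network
    predicts an optical-flow map for each of the two feature maps, a warping
    ("distortion") step derives an offset, residual processing combines the offset
    with each flow map, and the two aligned maps are spliced (concatenated)."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.flow_net = nn.Conv2d(channels, 2, kernel_size=3, padding=1)   # per-map flow
        self.residual = nn.Sequential(
            nn.Conv2d(channels + 2, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, feat_1: torch.Tensor, feat_2: torch.Tensor) -> torch.Tensor:
        flow_1 = self.flow_net(feat_1)                   # first optical-flow map
        flow_2 = self.flow_net(feat_2)                   # second optical-flow map
        # Backward warping of one feature map by the other's flow yields an offset,
        # taken here as the displacement between warped and original features.
        offset = backward_warp(feat_2, flow_1) - feat_2
        aligned_1 = feat_1 + self.residual(torch.cat([feat_1 + offset, flow_1], dim=1))
        aligned_2 = feat_2 + self.residual(torch.cat([feat_2 + offset, flow_2], dim=1))
        return torch.cat([aligned_1, aligned_2], dim=1)  # spliced first fused feature map

mp = MotionPerceptionSketch()
print(mp(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64)).shape)
# torch.Size([1, 64, 64, 64])
```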
  • S130 Determine a target video based on at least two target video frames.
  • the target video frames can be spliced, so as to obtain a target video composed of multiple continuous target video frames.
  • determining the target video based on at least two target video frames includes: splicing the at least two target video frames in the time domain to obtain the target video.
  • the application can splice multiple video frames according to the timestamps corresponding to the target video frames to obtain the target video. It can be understood that by splicing multiple frames and generating a target video, the processed images can be displayed in a clear and coherent form.
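  • A minimal sketch of this time-domain splicing (the helper name, the in-memory representation and the use of NumPy are assumptions; in practice the ordered frames would typically be handed to a video encoder):

```python
import numpy as np

def splice_in_time_domain(frames_with_timestamps):
    """Splice target video frames into a target video by ordering them on their
    timestamps and stacking them along the time axis.
    frames_with_timestamps: iterable of (timestamp_seconds, frame) pairs, where each
    frame is an (H, W, C) array. Returns a (T, H, W, C) array of ordered frames."""
    ordered = sorted(frames_with_timestamps, key=lambda item: item[0])
    return np.stack([frame for _, frame in ordered], axis=0)

# Two target frames arriving out of order are reordered before splicing.
f0 = np.zeros((4, 4, 3), dtype=np.uint8)
f1 = np.ones((4, 4, 3), dtype=np.uint8)
video = splice_in_time_domain([(0.04, f1), (0.00, f0)])
print(video.shape)  # (2, 4, 4, 3)
```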
  • the video can be played directly to display the processed video screen on the display interface, or the target video can be stored in a specific space according to a preset path.
  • the embodiments of the present disclosure do not specifically limit this.
  • the at least three interlaced frames to be processed can be input into a pre-trained image fusion model to obtain at least two target video frames corresponding to the at least three interlaced frames to be processed, and finally, based on the at least two target video frames, the target video is determined.
  • the restoration effect of the video picture can be effectively improved. For example, for the video picture of a motion scene, a more significant restoration effect can also be achieved.
  • combing artifacts and loss of detail are avoided, the quality and clarity of the video picture are improved, and the user experience is enhanced.
  • FIG4 is a flow chart of a video processing method provided by an embodiment of the present disclosure.
  • a plurality of video frames to be processed in the original video can be processed to obtain an interlaced frame to be processed.
  • the exemplary implementation method thereof can refer to the technical solution of this embodiment.
  • the technical terms identical or corresponding to the above-mentioned embodiment are not repeated here.
  • the method comprises the following steps:
  • S210 Acquire a plurality of to-be-processed video frames corresponding to the original video.
  • the two to-be-processed video frames include an odd-numbered video frame and an even-numbered video frame, and the odd-numbered video frame and the even-numbered video frame are determined based on the order of the to-be-processed video frames in the original video.
  • the original video may be a video spliced from interlaced scanned video frames.
  • the original video may be a video captured in real time by a terminal device, or a video pre-stored in a storage space by an application software, or a video uploaded to a server or client by a user based on a pre-set video upload control, etc., which is not specifically limited in this embodiment of the present disclosure.
  • the original video may be early video footage.
  • the odd-numbered video frames may be the video frames whose position in the arrangement order of the original video is an odd number; in these frames the odd-numbered rows of pixels have rendering pixel values and can be rendered and displayed on the display interface, while the pixel values of the even-numbered rows of pixels may be preset values, so that those rows are displayed as black (empty) rows in the display interface.
  • the even-numbered video frames may be the video frames whose position in the arrangement order of the original video is an even number; in these frames the even-numbered rows of pixels have rendering pixel values and can be rendered and displayed on the display interface, while the pixel values of the odd-numbered rows of pixels may be preset values, so that those rows are displayed as black (empty) rows in the display interface.
  • Figure 5a can be an odd video frame
  • Figure 5b can be an even video frame.
  • the odd video frame only scans and samples the odd rows. Therefore, in Figure 5a, only the pixels in the odd rows have rendering pixel values, which can be blue, and are rendered and displayed in the display interface, while the pixel values of the pixels in the even rows can be preset values, which can be black. When the pixels in the even rows are displayed in the display interface, they may therefore appear as black (empty) rows; similarly, the even video frame only scans and samples the even rows.
  • the original video can be parsed based on a pre-written program to obtain multiple video frames to be processed. For example, starting from the first video frame to be processed, two adjacent video frames to be processed are fused to obtain an interlaced frame to be processed.
  • fusing two adjacent video frames to be processed to obtain an interlaced frame to be processed includes: extracting odd-numbered line data from an odd-numbered video frame and even-numbered line data from an even-numbered video frame; and fusing the odd-numbered line data and the even-numbered line data to obtain an interlaced frame to be processed.
  • the odd-numbered row data may be pixel point information in the odd-numbered row.
  • the even-numbered row data may be pixel point information in the even-numbered row.
  • the pixel point information of the odd-numbered row may be sampled first to obtain an odd-numbered video frame, and then the pixel point information of the even-numbered row may be sampled to obtain an even-numbered video frame, and the pixel point sampling information of the odd-numbered row in the odd-numbered video frame may be used as the odd-numbered row data, and the pixel point sampling information of the even-numbered row in the even-numbered video frame may be used as the even-numbered row data.
  • the odd-numbered row data of the odd-numbered video frame can be extracted, and the even-numbered row data of the even-numbered video frame can be extracted.
  • the odd-numbered row data and the even-numbered row data are fused to obtain the interlaced frame to be processed.
  • the interlaced frame to be processed containing both the pixel information of the odd-numbered row pixels and the pixel information of the even-numbered row pixels can be obtained, so that the target video frame that meets the user's needs can be obtained by processing the interlaced frame to be processed.
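  • A minimal sketch of this extraction-and-fusion step (the function name and the NumPy array layout are assumptions made for illustration):

```python
import numpy as np

def weave_interlaced_frame(odd_frame: np.ndarray, even_frame: np.ndarray) -> np.ndarray:
    """Fuse two adjacent to-be-processed video frames into one interlaced frame:
    odd-numbered rows are taken from the odd video frame and even-numbered rows
    from the even video frame (rows counted from 1, so indices 0, 2, 4, ... are
    the odd-numbered rows). Both inputs are (H, W, C) arrays of the full frame size."""
    interlaced = np.empty_like(odd_frame)
    interlaced[0::2] = odd_frame[0::2]    # odd-numbered line data
    interlaced[1::2] = even_frame[1::2]   # even-numbered line data
    return interlaced

# Example: an "odd" frame of all 1s and an "even" frame of all 2s.
odd = np.full((6, 4, 3), 1, dtype=np.uint8)
even = np.full((6, 4, 3), 2, dtype=np.uint8)
print(weave_interlaced_frame(odd, even)[:, 0, 0])  # [1 2 1 2 1 2]
```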
  • S240 Input at least three interlaced frames to be processed into a pre-trained image fusion model to obtain at least two target video frames corresponding to the at least three interlaced frames to be processed.
  • S250 Determine a target video based on at least two target video frames.
  • the technical solution of the disclosed embodiment obtains multiple to-be-processed video frames corresponding to the original video, fuses two adjacent to-be-processed video frames to obtain to-be-processed interlaced frames, then obtains at least three to-be-processed interlaced frames, and inputs the at least three to-be-processed interlaced frames into a pre-trained image fusion model to obtain at least two target video frames corresponding to the at least three to-be-processed interlaced frames, and finally, determines the target video based on the at least two target video frames.
  • the restoration effect of the video picture can be effectively improved. For example, for the video picture of a motion scene, a more significant restoration effect can also be achieved.
  • combing artifacts and loss of detail are avoided, the quality and clarity of the video picture are improved, and the user experience is improved.
  • FIG6 is a schematic diagram of the structure of a video processing device provided by an embodiment of the present disclosure. As shown in FIG6 , the device includes: a to-be-processed interlaced frame acquisition module 310 , a target video frame determination module 320 , and a target video determination module 330 .
  • the to-be-processed interlaced frame acquisition module 310 is configured to acquire at least three to-be-processed interlaced frames; wherein the to-be-processed interlaced frames are determined based on two adjacent to-be-processed video frames;
  • the target video frame determination module 320 is configured to input the at least three interlaced frames to be processed into a pre-trained image fusion model to obtain at least two target video frames corresponding to the at least three interlaced frames to be processed; wherein the image fusion model includes a feature processing sub-model and a motion perception sub-model;
  • the target video determination module 330 is configured to determine the target video based on the at least two target video frames.
  • the device further includes: a to-be-processed video frame acquisition module and a to-be-processed video frame processing module.
  • a module for acquiring video frames to be processed configured to acquire a plurality of video frames to be processed corresponding to the original video before acquiring at least three interlaced frames to be processed;
  • the processing module of the video frames to be processed is configured to fuse two adjacent video frames to be processed to obtain the interlaced frame to be processed; wherein the two video frames to be processed include an odd video frame and an even video frame, and the odd video frame and the even video frame are determined based on the order of the video frames to be processed in the original video.
  • the to-be-processed video frame processing module includes: a data extraction unit and a data processing unit.
  • a data extraction unit configured to extract odd-numbered line data in the odd-numbered video frame and even-numbered line data in the even-numbered video frame;
  • the data processing unit is configured to obtain the to-be-processed interlaced frame by fusing the odd-numbered line data and the even-numbered line data.
  • the image fusion model includes a feature processing sub-model, a motion perception sub-model and a 2D convolution layer.
  • the feature processing sub-model includes a first feature extraction branch and a second feature extraction branch;
  • the output of the first feature extraction branch is the input of the first motion perception sub-model in the motion perception sub-model
  • the output of the second feature extraction branch is the input of the second motion perception sub-model in the motion perception sub-model
  • the output of the first motion perception sub-model and the output of the second motion perception sub-model are inputs to the 2D convolutional layer, so that the 2D convolutional layer outputs a target video frame.
  • the first feature extraction branch includes a structural feature extraction network and a structural feature fusion network
  • the second feature extraction branch includes a detail feature extraction network and a detail feature fusion network
  • the target video frame determination module 320 includes: an equal-proportional feature extraction submodule, an odd-even field feature extraction submodule, a structural feature processing submodule, a detail feature processing submodule, a first fusion feature map determination submodule, a second fusion feature map determination submodule and a target video frame determination submodule.
  • an equal-proportional feature extraction submodule configured to perform equal-proportional feature extraction on the at least three interlaced frames to be processed based on the structural feature extraction network to obtain structural features corresponding to the interlaced frames to be processed;
  • an odd-even field feature extraction submodule configured to extract odd-even field features from the at least three interlaced frames to be processed based on the detail feature extraction network, to obtain detail features corresponding to the interlaced frames to be processed;
  • a structural feature processing submodule configured to process the structural feature based on the structural feature fusion network to obtain a first inter-frame feature map between two adjacent interlaced frames to be processed;
  • a detail feature processing submodule configured to process the detail feature based on the detail feature fusion network to obtain a second inter-frame feature map between two adjacent interlaced frames to be processed;
  • a first fused feature map determining submodule is configured to process the first inter-frame feature map based on a first motion perception submodel to obtain a first fused feature map
  • a second fused feature map determining submodule configured to process the second inter-frame feature map based on a second motion perception submodel to obtain a second fused feature map
  • the target video frame determination submodule is configured to process the first fused feature map and the second fused feature map based on the 2D convolutional layer to obtain the at least two target video frames.
  • the first fused feature map determination submodule includes: a feature map processing unit, an optical flow map mapping processing unit and a first fused feature map determination unit.
  • a feature map processing unit configured to process the first feature map and the second feature map respectively based on the convolutional network in the first motion perception sub-model to obtain a first optical flow map and a second optical flow map;
  • an optical flow map mapping processing unit configured to map and process the first optical flow map and the second optical flow map based on the distortion network in the first motion perception sub-model to obtain an offset
  • the first fused feature map determining unit is configured to determine the first fused feature map based on the first optical flow map, the second optical flow map and the offset.
  • the first fused feature map determining unit is configured to perform residual processing on the first optical flow map and the offset to obtain a first feature map to be spliced; perform residual processing on the second optical flow map and the offset to obtain a second feature map to be spliced; and splice the first feature map to be spliced and the second feature map to be spliced to obtain the first fused feature map.
  • the target video determination module 330 is configured to splice the at least two target video frames in the time domain to obtain the target video.
  • the at least three interlaced frames to be processed can be input into a pre-trained image fusion model to obtain at least two target video frames corresponding to the at least three interlaced frames to be processed, and finally, based on the at least two target video frames, the target video is determined.
  • the restoration effect of the video picture can be effectively improved. For example, for the video picture of a motion scene, a more significant restoration effect can also be achieved.
  • combing artifacts and loss of detail are avoided, the quality and clarity of the video picture are improved, and the user experience is enhanced.
  • the video processing device provided in the embodiments of the present disclosure can execute the video processing method provided in any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
  • FIG. 7 is a schematic diagram of the structure of an electronic device (e.g., a terminal device or a server) provided by an embodiment of the present disclosure.
  • the terminal device in the embodiment of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), etc., and fixed terminals such as digital TVs, desktop computers, etc.
  • the electronic device shown in FIG7 is merely an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 500 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503.
  • In the RAM 503, various programs and data required for the operation of the electronic device 500 are also stored.
  • the processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504.
  • An input/output (I/O) interface 505 is also connected to the bus 504.
  • the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 507 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 508 including, for example, a magnetic tape, a hard disk, etc.; and communication devices 509.
  • the communication device 509 may allow the electronic device 500 to communicate wirelessly or wired with other devices to exchange data.
  • Although FIG. 7 shows an electronic device 500 with a variety of devices, it should be understood that it is not required to implement or provide all of the devices shown; more or fewer devices may alternatively be implemented or provided.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication device 509, or installed from the storage device 508, or installed from the ROM 502.
  • the electronic device provided by the embodiment of the present disclosure and the video processing method provided by the above embodiment belong to the same concept.
  • the technical details not fully described in this embodiment can be referred to the above embodiment, and this embodiment has the same beneficial effects as the above embodiment.
  • the embodiments of the present disclosure provide a computer storage medium on which a computer program is stored.
  • the program is executed by a processor, the video processing method provided by the above embodiments is implemented.
  • the computer-readable medium disclosed above may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in combination with an instruction execution system, apparatus or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which a computer-readable program code is carried.
  • This propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above.
  • the computer readable signal medium may also be any computer readable medium other than a computer readable storage medium, which may send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device.
  • the program code contained on the computer readable medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and server may communicate using any currently known or future developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network).
  • Examples of communication networks include a local area network ("LAN”), a wide area network ("WAN”), an internet (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future developed network.
  • the computer-readable medium may be included in the electronic device, or may exist independently without being incorporated into the electronic device.
  • the computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is caused to: acquire at least three interlaced frames to be processed, wherein the interlaced frames to be processed are determined based on two adjacent video frames to be processed; input the at least three interlaced frames to be processed into a pre-trained image fusion model to obtain at least two target video frames corresponding to the at least three interlaced frames to be processed, wherein the image fusion model includes a feature processing sub-model and a motion perception sub-model; and determine a target video based on the at least two target video frames.
  • Computer program code for performing operations of the present disclosure may be written in one or more programming languages, or a combination thereof, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function.
  • the functions noted in the blocks may also occur in an order different from that noted in the accompanying drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or hardware.
  • the name of a unit does not limit the unit itself in some cases.
  • the first acquisition unit may also be described as a "unit for acquiring at least two Internet Protocol addresses".
  • exemplary types of hardware logic components include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the foregoing.
  • a more specific example of a machine-readable storage medium may include an electrical connection based on one or more lines, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • Example 1 provides a video processing method, the method comprising:
  • acquiring at least three interlaced frames to be processed, wherein the interlaced frames to be processed are determined based on two adjacent video frames to be processed; inputting the at least three interlaced frames to be processed into a pre-trained image fusion model to obtain at least two target video frames corresponding to the at least three interlaced frames to be processed, wherein the image fusion model includes a feature processing sub-model and a motion perception sub-model; and determining a target video based on the at least two target video frames.
  • Example 2 provides a video processing method, and before obtaining at least three interlaced frames to be processed, the method further includes:
  • acquiring a plurality of to-be-processed video frames corresponding to the original video; and fusing two adjacent to-be-processed video frames to obtain the interlaced frame to be processed; wherein the two video frames to be processed include an odd video frame and an even video frame, and the odd video frame and the even video frame are determined based on the order of the video frames to be processed in the original video.
  • Example 3 provides a video processing method, wherein fusing two adjacent video frames to be processed to obtain the interlaced frame to be processed includes:
  • the odd-numbered line data in the odd video frame and the even-numbered line data in the even video frame are extracted;
  • the interlaced frame to be processed is obtained by fusing the odd-numbered line data and the even-numbered line data.
  • Example 4 provides a video processing method, wherein the image fusion model includes a feature processing sub-model, a motion perception sub-model and a 2D convolution layer.
  • Example 5 provides a video processing method, further comprising:
  • the feature processing sub-model includes a first feature extraction branch and a second feature extraction branch;
  • the output of the first feature extraction branch is the input of the first motion perception sub-model in the motion perception sub-model, and
  • the output of the second feature extraction branch is the input of the second motion perception sub-model in the motion perception sub-model;
  • the output of the first motion perception sub-model and the output of the second motion perception sub-model are inputs to the 2D convolutional layer, so that the 2D convolutional layer outputs a target video frame.
  • Example 6 provides a video processing method, wherein the first feature extraction branch includes a structural feature extraction network and a structural feature fusion network, and the second feature extraction branch includes a detail feature extraction network and a detail feature fusion network.
  • Example 7 provides a video processing method, wherein the step of inputting the at least three interlaced frames to be processed into a pre-trained image fusion model to obtain at least two target video frames corresponding to the at least three interlaced frames to be processed includes:
  • equal-ratio feature extraction is performed on the at least three interlaced frames to be processed based on the structural feature extraction network to obtain structural features corresponding to the interlaced frames to be processed; and odd-even field feature extraction is performed on the at least three interlaced frames to be processed based on the detail feature extraction network to obtain detail features corresponding to the interlaced frames to be processed;
  • the structural features are processed based on the structural feature fusion network to obtain a first inter-frame feature map between two adjacent interlaced frames to be processed; and the detail features are processed based on the detail feature fusion network to obtain a second inter-frame feature map between two adjacent interlaced frames to be processed;
  • the first inter-frame feature map is processed based on the first motion perception sub-model to obtain a first fused feature map; and the second inter-frame feature map is processed based on the second motion perception sub-model to obtain a second fused feature map;
  • the first fused feature map and the second fused feature map are processed based on the 2D convolution layer to obtain the at least two target video frames.
  • Example 8 provides a video processing method, wherein the first inter-frame feature map includes a first feature map and a second feature map, and processing the first inter-frame feature map based on the first motion perception sub-model to obtain a first fused feature map includes:
  • the first feature map and the second feature map are respectively processed based on the convolution network in the first motion perception sub-model to obtain a first optical flow map and a second optical flow map;
  • the first optical flow map and the second optical flow map are mapped based on the distortion network in the first motion perception sub-model to obtain an offset;
  • the first fused feature map is determined based on the first optical flow map, the second optical flow map, and the offset.
  • Example 9 provides a video processing method, wherein determining the first fused feature map based on the first optical flow map, the second optical flow map, and the offset includes:
  • residual processing is performed on the first optical flow map and the offset to obtain a first feature map to be spliced;
  • residual processing is performed on the second optical flow map and the offset to obtain a second feature map to be spliced;
  • the first fused feature map is obtained by splicing the first feature map to be spliced and the second feature map to be spliced.
  • Example 10 provides a video processing method, wherein determining a target video based on the at least two target video frames includes:
  • the at least two target video frames are spliced in the time domain to obtain the target video.
  • Example 11 provides a video processing device, the device comprising:
  • a to-be-processed interlaced frame acquisition module is configured to acquire at least three to-be-processed interlaced frames; wherein the to-be-processed interlaced frames are determined based on two adjacent to-be-processed video frames;
  • the target video frame determination module is configured to input the at least three interlaced frames to be processed into a pre-trained image fusion model to obtain at least two target video frames corresponding to the at least three interlaced frames to be processed; wherein the image fusion model includes a feature processing sub-model and a motion perception sub-model;
  • the target video determination module is configured to determine the target video based on the at least two target video frames.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

本公开实施例提供了一种视频处理方法、装置、电子设备及存储介质。其中,该方法包括:获取至少三个待处理交错帧;其中,待处理交错帧是基于相邻两个待处理视频帧确定的;将至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与至少三个待处理交错帧所对应的至少两个目标视频帧;其中,图像融合模型中包括特征处理子模型以及运动感知子模型;基于至少两个目标视频帧,确定目标视频。

Description

视频处理方法、装置、电子设备及存储介质
本申请要求在2022年10月21日提交中国专利局、申请号为202211294643.2的中国专利申请的优先权,该申请的全部内容通过引用结合在本申请中。
技术领域
本公开实施例涉及视频处理技术领域,例如涉及一种视频处理方法、装置、电子设备及存储介质。
背景技术
随着网络技术的不断发展,在对图像进行扫描显示时,为了提高图像显示效果,越来越多的用户采用逐行扫描的方式对图像进行扫描显示。
然而,对于之前基于隔行扫描方式所生成的早期视频,由于两幅图像显示的时间间隔较大,会存在图像画面闪烁较大、出现齿纹、假象等图像质量问题,这样产生的视频叫做交错视频。在将交错视频显示在现有的显示界面上时,需要通过去交错处理后,才能显示完整视频。
目前,通常采用对交错视频进行反交错处理,以去除交错视频中的拉丝效应,但是,这种方式的去交错效果并不好,例如在运动物体场景中,拉丝区域都比较模糊,容易出现细节丢失、画面拉丝等情况。
发明内容
本公开提供一种视频处理方法、装置、电子设备及存储介质,以实现对视频画面进行有效恢复的效果,例如对于运动场景的视频画面,可以达到较为显著的恢复效果。
第一方面,本公开实施例提供了一种视频处理方法,该方法包括:
获取至少三个待处理交错帧;其中,所述待处理交错帧是基于相邻两个待处理视频帧确定的;
将所述至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与所述至少三个待处理交错帧所对应的至少两个目标视频帧;其中,所述图像融合模型中包括特征处理子模型以及运动感知子模型;
基于所述至少两个目标视频帧,确定目标视频。
第二方面,本公开实施例还提供了一种视频处理装置,该装置包括:
待处理交错帧获取模块,设置为获取至少三个待处理交错帧;其中,所述待处理交错帧是基于相邻两个待处理视频帧确定的;
目标视频帧确定模块,设置为将所述至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与所述至少三个待处理交错帧所对应的至少两个目标视频帧;其中,所述图像融合模型中包括特征处理子模型以及运动感知子模型;
目标视频确定模块,设置为基于所述至少两个目标视频帧,确定目标视频。
第三方面,本公开实施例还提供了一种电子设备,所述电子设备包括:
一个或多个处理器;
存储装置,设置为存储一个或多个程序,
当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如本公开实施例任一所述的视频处理方法。
第四方面,本公开实施例还提供了一种包含计算机可执行指令的存储介质,所述计算机可执行指令在由计算机处理器执行时用于执行如本公开实施例任一所述的视频处理方法。
附图说明
贯穿附图中，相同或相似的附图标记表示相同或相似的元素。应当理解附图是示意性的，元件和元素不一定按照比例绘制。
图1是本公开实施例所提供的一种视频处理方法流程示意图;
图2是本公开实施例所提供的图像融合模型的示意图;
图3是本公开实施例所提供的运动感知模型的示意图;
图4是本公开实施例所提供的一种视频处理方法流程示意图;
图5是本公开实施例所提供的一种待处理视频帧的示意图;
图6是本公开实施例所提供的一种视频处理装置结构示意图;
图7是本公开实施例所提供的一种电子设备的结构示意图。
具体实施方式
应当理解,本公开的方法实施方式中记载的多个步骤可以按照不同的顺序执行,和/或并行执行。此外,方法实施方式可以包括附加的步骤和/或省略执行示出的步骤。本公开的范围在此方面不受限制。
本文使用的术语“包括”及其变形是开放性包括,即“包括但不限于”。术语“基于”是“至少部分地基于”。术语“一个实施例”表示“至少一个实施例”;术语“另一实施例”表示“至少一个另外的实施例”;术语“一些实施例”表示“至少一些实施例”。其他术语的相关定义将在下文描述中给出。
需要注意,本公开中提及的“第一”、“第二”等概念仅用于对不同的装置、模块或单元进行区分,并非用于限定这些装置、模块或单元所执行的功能的顺序或者相互依存关系。
需要注意,本公开中提及的“一个”、“多个”的修饰是示意性而非限制性的,本领域技术人员应当理解,除非在上下文另有明确指出,否则应该理解为“一个或多个”。
本公开实施方式中的多个装置之间所交互的消息或者信息的名称仅用于说明性的目的,而并不是用于对这些消息或信息的范围进行限制。
可以理解的是,在使用本公开多个实施例公开的技术方案之前,均应当依据相关法律法规通过恰当的方式对本公开所涉及个人信息的类型、使用范围、使用场景等告知用户并获得用户的授权。
例如,在响应于接收到用户的主动请求时,向用户发送提示信息,以明确地提示用户,其请求执行的操作将需要获取和使用到用户的个人信息。从而,使得用户可以根据提示信息来自主地选择是否向执行本公开技术方案的操作的电子设备、应用程序、服务器或存储介质等软件或硬件提供个人信息。
例如,响应于接收到用户的主动请求,向用户发送提示信息的方式例如可以是弹窗的方式,弹窗中可以以文字的方式呈现提示信息。此外,弹窗中还可以承载供用户选择“同意”或者“不同意”向电子设备提供个人信息的选择控件。
可以理解的是,上述通知和获取用户授权过程仅是示意性的,不对本公开的实现方式构成限定,其它满足相关法律法规的方式也可应用于本公开的实现方式中。
可以理解的是,本技术方案所涉及的数据(包括但不限于数据本身、数据的获取或使用)应当遵循相应法律法规及相关规定的要求。
在介绍本技术方案之前,可以先对应用场景进行示例性说明。在应用基于原始视频生成相应的目标视频时,通常采用的方式有三种,第一种实施方式可以为基于去隔行算法(YADIF)对待处理交错帧进行反交错处理,以去除原始视频的拉丝效应,然而,此方法对于运动物体场景的恢复效果不佳,有细节缺失、画面拉丝等情况;第二种实施方式可以为将多个待处理交错帧输入至结合时域空域信息预测的ST-Deint深度学习神经网络模型中,基于深度学习算法对待处理交错帧进行处理,此方法对运动场景仅能做到大致恢复的效果,并且,细节恢复 效果不佳,还是会存在画面拉丝的情况;第三种实施方式可以为基于深度学习模型DIN对待处理交错帧进行处理,以使待处理交错帧先填充缺失信息再融合场间内容,以得到处理后的视频,与第二种实施方式相同,此方法对运动场景也仅能做到大致恢复的效果,并且,对细节恢复效果不佳,例如运动拉丝区域,容易出现模糊与细节丢失的情况。基于上述可知,相关技术中的视频处理方式仍然存在输出视频显示效果不佳的情况,此时,基于本公开实施例的技术方案,即可基于包括多个子模型的图像融合模型对待处理交错帧进行处理,从而避免输出的目标视频出现细节缺失、画面拉丝以及画面模糊等情况。
图1是本公开实施例所提供的一种视频处理方法流程示意图,本公开实施例适用于对采用隔行扫描的原始视频进行特征信息补充,以使得到的目标视频可以在现有的显示设备上进行完整显示的情形,该方法可以由视频处理装置来执行,该装置可以通过软件和/或硬件的形式实现,例如,通过电子设备来实现,该电子设备可以是移动终端、PC端或服务器等。本公开实施例所提供的技术方案可以基于客户端执行,也可以基于服务端执行,还可以基于客户端和服务端配合执行。
如图1所示,所述方法包括:
S110、获取至少三个待处理交错帧。
其中,待处理交错帧是基于相邻两个待处理视频帧确定的。
首先需要说明的是,执行本公开实施例提供的视频处理方法的装置,可以集成在支持视频处理功能的应用软件中,且该软件可以安装至电子设备中,例如,电子设备可以是移动终端或者PC端等。应用软件可以是对图像/视频处理的一类软件,其具体的应用软件在此不再一一赘述,只要可以实现图像/视频处理即可。还可以是专门研发的应用程序,来实现视频处理并将输出的视频进行展示的软件中,亦或是集成在相应的页面中,用户可以通过PC端中集成的页面来实现对特效视频的处理。
在本实施例中,用户可以基于移动终端的摄像装置实时拍摄视频,或者,基于应用软件内预先开发的控件主动上传视频,因此可以理解,应用所获取的实时拍摄的视频或用户主动上传的视频即为待处理视频。例如,基于预先编写的程序对待处理视频进行解析,即可得到多个待处理视频帧。本领域技术人员应当理解,由于带宽以及视频装置处理速度的限制,早期的视频显示方式通常采用隔行扫描的方式,即,首先扫描奇数行得到仅奇数行像素点存在渲染像素值的视频帧,然后扫描偶数行得到仅偶数行像素点存在渲染像素值的视频帧,将两个视频帧组合起来得到完整的视频帧。这种显示方式会造成相邻两个视频帧显示的时间间隔较大,从而导致视频帧画面闪烁较大、出现齿纹、假象等图像质量问题,同时,由于当前在进行视频显示时通常采用逐行扫描的方式,因此,基于现有的视频显示装置对早期视频进行显示时,可以在经过去交错处理后,得到完整视频,可以将经过隔行扫描的视频帧作为待处理视频帧,即仅奇数行像素点存在渲染像素值或偶数行像素点存在渲染像素值的视频帧,相邻两帧待处理视频帧组合起来的视频帧为待处理交错帧。其中,去交错处理是指将相邻两帧图像的交错帧的奇数场和偶数场分别填补缺失的半场信息,以恢复至原帧大小,最后得到奇偶两帧。
在本实施例中,由于待处理视频帧为仅奇数行像素点存在渲染像素值或偶数行像素点存在渲染像素值的视频帧,因此,在对待处理视频帧进行去交错处理时,可以将相邻两帧待处理视频帧组合起来,得到待处理交错帧,从而可以对待处理交错帧进行去交错处理。
示例性的，可以将I1、I2、I3、I4、I5、I6作为待处理视频帧，对相邻两个待处理视频帧进行融合处理时，可以将I1和I2组合在一起，得到一帧待处理交错帧D1，将I3和I4组合在一起，得到一帧待处理交错帧D2，将I5和I6组合在一起，得到一帧待处理交错帧D3。待处理交错帧的制作原理可以基于如下公式表示：

$$D_k(i,\cdot)=\begin{cases}I_{2k-1}(i,\cdot), & i\ \text{为奇数行}\\ I_{2k}(i,\cdot), & i\ \text{为偶数行}\end{cases}$$

其中，$D_k$ 为第 $k$ 帧待处理交错帧，$I_{2k-1}$ 和 $I_{2k}$ 为相邻的奇数视频帧和偶数视频帧，$i$ 为行序号。
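A minimal sketch of this weaving step, assuming PyTorch tensors of shape (C, H, W); the helper name `weave_fields` and the 0-based row indexing are illustrative choices, not taken from the disclosure:

```python
import torch

def weave_fields(odd_frame: torch.Tensor, even_frame: torch.Tensor) -> torch.Tensor:
    """Combine two adjacent field frames (C, H, W) into one interlaced frame.

    odd_frame carries valid pixels only on odd rows (1st, 3rd, ... = index 0, 2, ...),
    even_frame carries valid pixels only on even rows (2nd, 4th, ... = index 1, 3, ...).
    """
    interlaced = torch.empty_like(odd_frame)
    interlaced[:, 0::2, :] = odd_frame[:, 0::2, :]   # odd rows come from the odd frame
    interlaced[:, 1::2, :] = even_frame[:, 1::2, :]  # even rows come from the even frame
    return interlaced

# e.g. D1 = weave_fields(I1, I2); D2 = weave_fields(I3, I4); D3 = weave_fields(I5, I6)
```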
需要说明的是,待处理交错帧的数量可以为三个,也可以为三个以上,本公开实施例对此不作具体限定。
也就是说,待处理交错帧的数量是与原始视频的视频帧数量相对应的,输入至模型中的待处理交错帧的数量可以是三帧也可以是大于三帧。
S120、将至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与至少三个待处理交错帧所对应的至少两个目标视频帧。
在本实施例中,在得到至少三个待处理交错帧后,即可将待处理交错帧输入至预先训练好的图像融合模型中。其中,图像融合模型可以是包括多个子模型的深度学习神经网络模型。图像融合模型中包括特征处理子模型以及运动感知子模型。
还需要说明的,在基于图像融合模型处理的过程中,需要结合相邻两个待处理交错帧之间的光流图来确定最终的目标视频,因此待处理交错帧的数量最少为三个帧。
其中,特征处理子模型可以是包含多个卷积模块的神经网络模型。特征处理子模型可以用于对待处理交错帧中的特征进行提取、融合以及其他处理。在本实施例中,特征处理子模型中可以包括多个3D卷积层,从而可以使特征处理子模型不仅可以处理多帧的时域特征信息,还可以处理空间特征信息,加强了待处理交错帧之间的信息交互。
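To illustrate why a 3D convolution can strengthen the information interaction between frames, a small sketch follows; the tensor layout (batch, channels, frames, height, width) and the channel counts are assumptions for illustration only:

```python
import torch
import torch.nn as nn

# Stack the interlaced frames to be processed along a "frame" (time) dimension.
frames = torch.randn(1, 3, 3, 256, 256)   # (batch, channels, 3 frames, H, W)

# A 3x3x3 kernel mixes information across adjacent frames (time) and neighbouring pixels (space).
conv3d = nn.Conv3d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
features = conv3d(frames)                  # -> (1, 16, 3, 256, 256)
```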
其中,运动感知子模型可以为用于感知帧间运动情况的神经网络模型。运动感知子模型可以由至少一个卷积网络、包含反变换(Backward Warping)函数的网络以及残差网络组成。Backward Warping函数可以实现图像与图像之间的映射。在实际应用中,由于相邻两帧之间的帧间内容具有较强的时空关联性,因此,在对待处理交错帧进行处理时,可以采用运动感知子模型对帧间的特征信息进行处理,从而使得帧间内容更具连续性,同时还可以达到互相补足细节的作用。
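Backward warping is commonly realised with a sampling grid; the sketch below uses `torch.nn.functional.grid_sample` as one possible implementation (the disclosure only names the Backward Warping function, so the exact formulation here is an assumption):

```python
import torch
import torch.nn.functional as F

def backward_warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `feat` (N, C, H, W) with an optical-flow field (N, 2, H, W).

    Each output pixel is sampled from `feat` at the position it flowed from,
    which is what makes the warping "backward".
    """
    _, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat.device)    # (2, H, W), x first then y
    coords = base.unsqueeze(0) + flow                              # displaced sampling positions
    norm_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0              # normalise to [-1, 1]
    norm_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((norm_x, norm_y), dim=-1)                   # (N, H, W, 2)
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)
```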
在本实施例中,将待处理交错帧输入至图像融合模型后,即可基于图像融合模型中的多个子模型对待处理视频帧进行处理,从而可以得到与待处理交错帧相对应的至少两个目标视频帧。
在实际应用中,由于图像融合模型包括多个子模型,因此,在基于图像融合模型对待处理交错帧进行处理,可以依次通过模型中多个子模型对待处理视频帧进行相应的处理,从而输出与待处理交错帧相对应的至少两个目标视频帧。
需要说明的是,图像融合模型中包括多个子模型,其排列顺序可以根据数据输入输出顺序对多个子模型进行排列。
例如,图像融合模型中包括特征处理子模型、运动感知子模型以及2D卷积层。其中,2D卷积层可以为仅对数据的高度和宽度进行特征处理的神经网络层。
在本实施例中,基于数据输入输出顺序确定图像融合模型中多个子模型的排列顺序可以使得图像融合模型既可以对待处理交错帧的特征信息进行处理,又可以对多个待处理交错帧之间的运动情况进行感知,以使帧间内容更具连续性,同时达到细节补充的效果。
需要说明的是,当将待处理交错帧输入至图像融合模型进行处理时,相关技术所采用的方案是对待处理交错帧进行奇偶行的拆分,即将待处理交错帧的H维度减半,示例性的,若待处理交错帧的矩阵为(H×W×C),则经过奇偶行拆分后的矩阵为(2/H×W×C)。这样方案可能会导致待处理交错帧中的对象在结构上出现形变,从而影响目标视频帧的视觉效果。而本公开实施例的处理过程可以理解为,对待处理交错帧进行奇偶行拆分的基础上,也对其进行奇偶列的拆分,即采用双特征处理分支,从而保证在基于图像融合模型对待处理交错帧进行处理时,可以既对待处理交错帧的整体结构特征信息进行处理,还可以对待处理交错帧的高频细节特征信息进行处理。
基于此,在上述技术方案的基础上,还包括:特征处理子模型中包括第一特征提取分支和第二特征提取分支;第一特征提取分支的输出为运动感知子模型中第一运动感知子模型的输入,第二特征提取分支的输出为运动感知子模型中第二运动感知子模型的输入;第一运动感知子模型的输出和第二运动感知子模型的输出为2D卷积层的输入,以使2D卷积层输出目 标视频帧。
在本实施例中,第一特征提取分支可以为用于对待处理交错帧的结构特征进行处理的神经网络模型。例如,第一特征提取分支包括结构特征提取网络和结构特征融合网络。其中,结构特征提取网络可以由至少一个卷积网络组成,以使至少一个卷积网络可以依据预先设置的结构拆分比例对待处理交错帧进行处理,得到与待处理交错帧相对应的结构特征。结构特征融合网络可以为由至少一个3D卷积层堆叠而成的U-Net结构的神经网络。需要说明的是,至少一个3D卷积层的卷积核可以为相同值,也可以为不同值,本公开实施例对此不作具体限定。由于待处理交错帧的数量为至少三个,因此,结构特征融合网络可以用于对帧间信息交互进行加强,从而使得不仅可以对待处理交错帧的空间特征进行处理,还可以对多帧之间的时域特征进行加强。
在本实施例中,第二特征提取分支可以为用于对待处理交错帧的细节特征进行处理的神经网络模型。例如,第二特征提取分支包括细节特征提取网络和细节特征融合网络。其中,细节特征提取网络可以由至少一个卷积层组成,以使至少一个卷积层可以根据预先设置的细节拆分比例对待处理交错帧进行处理,得到与待处理交错帧相对应的细节特征。细节特征融合网络可以为由至少一个3D卷积层堆叠而成的U-Net结构的神经网络。需要说明的是,至少一个3D卷积层的卷积核可以为相同值,也可以为不同值,本公开实施例对此不作具体限定。还需说明的是,细节特征融合网络与结构特征融合网络的网络结构可以是相同的,并且,这两种网络所要达到的效果也是相同的,均是多个对待处理交错帧的帧间信息进行加强的效果。下面可以结合图2对图像融合模型中多个子模型的数据输入输出进行具体说明。
示例性的,参见图2,将待处理交错帧分别输入至第一特征提取分支和第二特征提取分支,经过第一特征提取分支中的结构特征提取网络和结构特征融合网络对待处理交错帧处理后,可以输入至第一运动感知子模型中,同时,经过第二特征提取分支中的细节特征提取网络和细节特征融合网络对待处理交错帧进行处理后,可以输入至第二运动感知子模型中,例如,经过第一运动感知子模型对模型输入进行处理后,可以输入至2D卷积层中,同时,经过第二运动感知子模型对模型输入进行处理后,输入至2D卷积层中,以使2D卷积层可以输出目标视频帧。这样可以使得图像融合模型既可以对待处理交错帧的特征信息进行处理,又可以对待处理交错帧之间的运动情况进行感知,以使帧间内容更具连续性,同时达到细节补充的效果。
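The data flow of Fig. 2 can be summarised in a rough skeleton; every module, channel count and name below is a placeholder standing in for the structure/detail branches, motion perception sub-models and 2D convolution layer described above, not the trained model itself:

```python
import torch
import torch.nn as nn

class ImageFusionModelSketch(nn.Module):
    """Illustrative skeleton of the dual-branch fusion model described in Fig. 2."""

    def __init__(self, structure_branch, detail_branch, motion_model_1, motion_model_2, out_channels=3):
        super().__init__()
        self.structure_branch = structure_branch   # structure feature extraction + fusion network
        self.detail_branch = detail_branch         # detail feature extraction + fusion network
        self.motion_model_1 = motion_model_1       # first motion perception sub-model
        self.motion_model_2 = motion_model_2       # second motion perception sub-model
        self.head = nn.Conv2d(128, out_channels, kernel_size=3, padding=1)  # 2D conv layer (128 is a placeholder)

    def forward(self, interlaced_frames):                          # (N, C, T, H, W), T >= 3
        inter_feat_1 = self.structure_branch(interlaced_frames)    # first inter-frame feature maps
        inter_feat_2 = self.detail_branch(interlaced_frames)       # second inter-frame feature maps
        fused_1 = self.motion_model_1(inter_feat_1)                # first fused feature map
        fused_2 = self.motion_model_2(inter_feat_2)                # second fused feature map
        return self.head(torch.cat((fused_1, fused_2), dim=1))     # target video frames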
在实际应用中,将待处理交错帧输入至图像融合模型中,即可基于模型中的多个子模型对其进行处理,从而可以得到与待处理交错帧相对应目标视频帧。下面继续结合图2对图像融合模型对待处理交错帧进行处理的过程进行具体的说明。
参见图2所示,将至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与至少三个待处理交错帧所对应的至少两个目标视频帧,包括:基于结构特征提取网络对至少三个待处理交错帧进行等比例特征提取,得到与待处理交错帧所对应的结构特征;以及,基于细节特征提取网络对至少三个待处理交错帧进行奇偶场特征提取,得到与待处理交错帧所对应的细节特征;基于结构特征融合网络对结构特征进行处理,得到相邻两个待处理交错帧之间的第一帧间特征图;以及,基于细节特征融合网络对细节特征进行处理,得到相邻两个待处理交错帧之间的第二帧间特征图;基于第一运动感知子模型对第一帧间特征图处理,得到第一融合特征图;以及,基于第二运动感知子模型对第二帧间特征图处理,得到第二融合特征图;基于2D卷积层对第一融合特征图以及第二融合特征图进行处理,得到至少两个目标视频帧。
在本实施例中,结构特征可以为用于体现待处理交错帧的整体结构信息的特征。细节特征可以为用于体现待处理交错帧的细节信息的特征。细节特征可以为高频特征,是比结构特征更高位的特征。
在示例实施中，将待处理交错帧输入至图像融合模型中，可以基于结构特征提取网络对待处理交错帧进行等比例的降维处理，得到与待处理交错帧相对应的结构特征，同时，基于细节特征提取网络对待处理交错帧进行奇偶场的拆分处理，以得到与待处理交错帧相对应的细节特征；例如，对于结构特征，可以基于结构特征融合网络对结构特征进行特征融合处理，将相邻两个待处理交错帧的结构特征进行融合，从而可以得到相邻两个待处理交错帧之间的融合特征图，即为第一帧间特征图，同时，对于细节特征，可以基于细节特征融合网络对细节特征进行融合处理，将相邻两个待处理交错帧的细节特征进行融合，从而可以得到相邻两个待处理交错帧之间的融合特征图，即为第二帧间特征图；然后，将第一帧间特征图输入至第一运动感知子模型中，基于第一运动感知子模型对第一帧间特征图进行特征补齐处理，即可得到第一融合特征图，同时，将第二帧间特征图输入至第二运动感知子模型中，基于第二运动感知子模型对第二帧间特征图进行特征补齐处理，即可得到第二融合特征图，最后，将第一融合特征图和第二融合特征图输入至2D卷积层中，基于2D卷积层对融合特征图进行处理，即可得到与待处理交错帧相对应的至少两个目标视频帧。采用双特征处理分支，可以使图像融合模型既可以对待处理交错帧的整体结构特征信息进行处理，又可以对待处理交错帧的细节特征信息进行处理，并且，采用3D卷积层，可以加强帧间的信息交互，采用运动感知子模型可以对帧间的运动情况进行感知，并进行特征对齐，使得帧间内容更具连续性，从而可以提高目标视频帧的显示效果。
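One informal way to picture the two extraction steps, equal-ratio (structure) extraction versus odd/even-field (detail) extraction, is with plain tensor operations; this is purely illustrative, since the disclosure realises both steps with convolutional layers:

```python
import torch
import torch.nn.functional as F

def structure_input(frame: torch.Tensor) -> torch.Tensor:
    """Equal-ratio downsampling keeps the overall structure of the interlaced frame."""
    return F.avg_pool2d(frame, kernel_size=2)            # (N, C, H/2, W/2)

def detail_input(frame: torch.Tensor) -> torch.Tensor:
    """Odd/even field split keeps the high-frequency detail of each field separately."""
    odd_field = frame[:, :, 0::2, :]                      # rows coming from the odd frame
    even_field = frame[:, :, 1::2, :]                     # rows coming from the even frame
    return torch.cat((odd_field, even_field), dim=1)      # (N, 2C, H/2, W)
```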
需要说明的是,由于第一帧间特征图是基于相邻两个待处理交错帧的结构特征经过特征融合处理后得到的,当待处理交错帧为至少三个时,第一帧间特征图可以包括第一特征图和第二特征图。在实际应用中,基于第一运动感知子模型对第一帧间特征图进行处理,可以是基于第一运动感知子模型分别对第一特征图和第二特征图进行处理,从而得到第一融合特征图。下面可以结合图3对第一运动感知子模型对第一帧间特征图的处理过程进行具体说明。
参见图3所示,基于第一运动感知子模型对第一帧间特征图处理,得到第一融合特征图,包括:基于第一运动感知子模型中的卷积网络分别对第一特征图和第二特征图进行处理,得到第一光流图和第二光流图;基于第一运动感知子模型中的畸变网络对第一光流图和第二光流图映射处理,得到偏移量;基于第一光流图、第二光流图以及偏移量,确定第一融合特征图。
本领域技术人员可以理解,光流图可以表示相邻两帧图像中每个像素点的运动速度和运动方向。光流是空间运动物体在观察成像平面上的像素运动的瞬时速度,是利用图像序列中像素在时间域上的变化以及相连帧之间的相关性来找到上一帧与当前帧之间存在的对应关系,从而计算出相邻帧之间物体的运动信息的一种方法。畸变网络可以为包含反变换(Backward Warping)函数的网络,可以实现图像与图像之间的映射。偏移量可以为基于光流图映射之后得到的,用于表示特征位移偏移的数据。
在示例实施中,将第一帧间特征图输入至第一运动感知子模型中,可以基于卷积网络分别对第一特征图和第二特征图进行处理,从而可以得到用于表征与第一特征图相对应的相邻两个待处理交错帧中像素点的运动速度和运动方向的第一光流图,以及用于表征与第二特征图相对应的相邻两个待处理交错帧中像素点的运动速度和运动方向的第二光流图,例如,基于畸变网络对第一光流图和第二光流图进行映射处理,可以得到与第一光流图相对应的偏移量以及与第二光流图相对应的偏移量,最后,对第一光流图、第二光流图以及这两个光流图相对应的偏移量进行融合处理,即可得到第一融合特征图。这样可以对相邻两个待处理交错帧之间的像素点运动情况进行感知,并通过特征对齐,使得帧与帧之间的特征内容更具连续性,从而可以提高运动物体场景的恢复效果。
继续参见图3所示,基于第一光流图、第二光流图以及偏移量,确定第一融合特征图,包括:对第一光流图和偏移量残差处理,得到第一待拼接特征图;对第二光流图和偏移量残差处理,得到第二待拼接特征图;通过对第一待拼接特征图和第二待拼接特征图拼接处理,得到第一融合特征图。
在示例实施例中,在得到第一光流图、第二光流图以及偏移量之后,可以对第一光流图和偏移量进行残差处理,以使第一光流图中多个光流特征进行对齐,进而得到特征对齐之后 的第一待拼接特征图,同时,对第二光流图和偏移量进行残差处理,以使第二光流图中多个光流特征进行对齐,进而得到特征对齐之后的第二待拼接特征图,例如,对第一待拼接特征图和第二待拼接特征图进行拼接处理,从而可以得到第一融合特征图。这样可以实现待处理交错帧的偏移特征的特征对齐,以使待处理交错帧之间更具有连续性,同时还可以达到互相补足细节的效果。
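A rough sketch of this processing chain (flow estimation, mapping via backward warping to get the offset, residual alignment, then splicing); the tiny flow and residual networks, the channel counts, and the exact wiring are one plausible reading of the description, not a definitive implementation. The `warp_fn` argument is meant to be something like the `backward_warp` sketch given earlier:

```python
import torch
import torch.nn as nn

class MotionPerceptionSketch(nn.Module):
    """Illustrative only: aligns and splices the two parts of an inter-frame feature map."""

    def __init__(self, warp_fn, channels=64):
        super().__init__()
        self.warp_fn = warp_fn                                            # e.g. the backward_warp sketch above
        self.flow_net = nn.Conv2d(channels, 2, kernel_size=3, padding=1)  # convolution network -> optical flow
        self.res_net = nn.Conv2d(4, 2, kernel_size=3, padding=1)          # residual alignment

    def forward(self, feat_1, feat_2):
        flow_1 = self.flow_net(feat_1)                                    # first optical-flow map
        flow_2 = self.flow_net(feat_2)                                    # second optical-flow map
        offset = self.warp_fn(flow_1, flow_2)                             # distortion network -> offset
        to_splice_1 = flow_1 + self.res_net(torch.cat((flow_1, offset), dim=1))  # first map to be spliced
        to_splice_2 = flow_2 + self.res_net(torch.cat((flow_2, offset), dim=1))  # second map to be spliced
        return torch.cat((to_splice_1, to_splice_2), dim=1)               # first fused feature map
```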
需要说明的是,基于第二运动感知子模型对第二帧间特征图的处理过程与基于第一运动感知子模型对第一帧间特征图的处理过程相同,本公开实施例在此不再具体赘述。
示例性的，以待处理交错帧为三帧为例来对第一运动感知子模型对第一帧间特征图的处理过程进行示例性说明。可以将D1、D2、D3作为待处理交错帧，将这三帧待处理交错帧输入至第一特征提取分支中，可以得到第一特征图和第二特征图，可以用F1、F2表示，例如，将F1和F2输入至第一运动感知子模型中，基于卷积层分别对F1和F2进行处理，即可得到第一光流图IF1和第二光流图IF2，然后，基于畸变网络分别对IF1和IF2进行映射处理，可以得到偏移量；将IF1和偏移量进行残差处理，得到第一待拼接特征图；将IF2和偏移量进行残差处理，得到第二待拼接特征图；最后，对第一待拼接特征图和第二待拼接特征图进行拼接处理，即可得到第一融合特征图Ffull。
S130、基于至少两个目标视频帧,确定目标视频。
在本实施例中,在得到目标视频帧后,即可对目标视频帧进行拼接,从而可以得到多帧连续目标视频帧组成的目标视频。
例如,基于至少两个目标视频帧,确定目标视频,包括:对至少两个目标视频帧在时域上拼接处理,得到目标视频。
在本实施例中,由于目标视频帧携带有相应的时间戳,因此,在基于图像融合模型对待处理交错帧进行处理,并输出相应的目标视频帧后,应用即可按照目标视频帧对应的时间戳,对多个视频帧进行拼接,从而得到目标视频,可以理解,通过将多帧画面进行拼接并生成目标视频,可以使处理后的画面以清晰、连贯的形式展示出来。
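Assembling the output frames into the target video by timestamp could look like the following sketch; the `(timestamp, frame)` pair representation is an assumption for illustration:

```python
import torch

def splice_in_time(stamped_frames):
    """stamped_frames: list of (timestamp, frame) pairs, each frame shaped (C, H, W)."""
    ordered = sorted(stamped_frames, key=lambda item: item[0])     # sort by timestamp
    return torch.stack([frame for _, frame in ordered], dim=0)     # (T, C, H, W) target video tensor
```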
本领域技术人员应当理解,在应用确定目标视频后,既可以直接播放该视频,以将处理后的视频画面展示于显示界面上,也可以按照预设路径将目标视频存储至特定的空间内,本公开实施例对此不作具体限定。
本公开实施例的技术方案,在获取至少三个待处理交错帧后,可以将至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与至少三个待处理交错帧所对应的至少两个目标视频帧,最后,基于至少两个目标视频帧,确定目标视频,当交错视频在现有的显示设备上进行显示时,可以有效提升视频画面的恢复效果,例如对于运动场景的视频画面,也可以达到较为显著的恢复效果,同时,避免了画面拉丝以及细节丢失等情况,提高了视频画面的画面质量和清晰度,提升了用户的使用体验。
图4是本公开实施例所提供的一种视频处理方法流程示意图,在前述实施例的基础上,可以通过对原始视频中多个待处理视频帧进行处理,从而得到待处理交错帧,其示例实施方式可以参见本实施例技术方案。其中,与上述实施例相同或者相应的技术术语在此不再赘述。
如图4所示,该方法包括如下步骤:
S210、获取与原始视频相对应的多个待处理视频帧。
其中,两个待处理视频帧中包括一个奇数视频帧和一个偶数视频帧,奇数视频帧和偶数视频帧是基于待处理视频帧在原始视频中的顺序确定的。
在本实施例中,原始视频可以为经过隔行扫描的视频帧拼接而成的视频。原始视频可以为基于终端设备实时拍摄得到的视频,也可以是应用软件在存储空间中预先存储的视频,还可以是用户基于预先设置的视频上传控件上传至服务端或客户端的视频等,本公开实施例对此不作具体限定。示例性的,原始视频可以为早期影像视频。奇数视频帧可以为在原始视频 中的排列顺序所对应的数字为奇数,且奇数行像素点存在渲染像素值,可以在显示界面上渲染显示,偶数行像素点的像素值可能为预设数值,在显示界面中以黑洞的形式进行显示的视频帧,相应的,偶数视频帧可以为在原始视频中的排列顺序所对应的数字为偶数,且偶数行像素点存在渲染像素值,可以在显示界面上渲染显示,奇数行像素点的像素值可能为预设数值,在显示界面中以黑洞的形式进行显示的视频帧。
示例性的,如图5所示,其中,图5a可以为奇数视频帧,图5b可以为偶数视频帧,奇数视频帧是仅对奇数行进行扫描采样,因此,图5a中仅奇数行像素点的像素值为渲染像素值,可以为蓝色,在显示界面中进行渲染显示,而偶数行像素点的像素值可以为预设数值,可以为黑色,此时,当偶数行像素点在显示界面中进行显示时,可能会以黑洞的形式显示在显示界面中;同理,偶数视频帧是仅对偶数行进行扫描采样,因此,图5b中仅偶数行像素点的像素值为渲染像素值,可以在显示界面中进行渲染显示,而奇数行像素点的像素值可以为预设数值,此时,当奇数行像素点在显示界面中进行显示时,可能会以黑洞的形式显示在显示界面中。
S220、对相邻两个待处理视频帧融合处理,得到待处理交错帧。
在实际应用中,在得到原始视频后,可以基于预先编写的程序对原始视频进行解析,即可得到多个待处理视频帧,例如,从第一帧待处理视频帧开始,将相邻两个待处理视频帧进行融合处理,从而可以得到待处理交错帧。
需要说明的是,由于奇数视频帧中仅奇数行的数据包含像素点,偶数视频帧中仅偶数行的数据包含像素点,因此,在对相邻两个待处理视频帧进行融合处理时,可以分别提取奇数视频帧中存在像素点的数据以及偶数视频帧中存在像素点的数据,从而可以得到同时包含奇数行像素点和偶数行像素点的待处理交错帧。
例如,对相邻两个待处理视频帧融合处理,得到待处理交错帧,包括:提取奇数视频帧中的奇数行数据以及偶数视频帧中的偶数行数据;通过对奇数行数据以及偶数行数据融合处理,得到待处理交错帧。
在本实施例中,奇数行数据可以为处于奇数行的像素点信息。偶数行数据可以为处于偶数行的像素点信息。本领域技术人员应当理解,在基于隔行扫描的方式将原始视频显示在显示界面上时,可以首先对奇数行的像素点信息进行采样,以得到奇数视频帧,然后,对偶数行的像素点信息进行采样,得到偶数视频帧,可以将奇数视频帧中奇数行的像素点采样信息作为奇数行数据,偶数视频帧中偶数行的像素点采样信息作为偶数行数据。
在实际应用中,在得到多个待处理视频帧后,对于相邻两个待处理视频帧,可以对奇数视频帧的奇数行数据进行提取,对偶数视频帧的偶数行数据进行提取,例如,将奇数行数据以及偶数行数据进行融合处理,即可得到待处理交错帧。这样可以得到既包含奇数行像素点的像素信息,又包含偶数行像素点的像素信息的待处理交错帧,从而可以通过对待处理交错帧进行处理后,得到符合用户需求的目标视频帧。
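Run over a whole sequence, the per-pair fusion of odd-row and even-row data might look like this sketch, which reuses the hypothetical `weave_fields` helper from the earlier example and assumes the field frames are already ordered odd, even, odd, even, ...:

```python
def build_interlaced_frames(frames):
    """frames: list of field frames [I1, I2, I3, ...] in their order in the original video."""
    interlaced = []
    for odd_frame, even_frame in zip(frames[0::2], frames[1::2]):  # pairs (I1, I2), (I3, I4), ...
        interlaced.append(weave_fields(odd_frame, even_frame))     # odd rows + even rows woven together
    return interlaced
```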
S230、获取至少三个待处理交错帧。
S240、将至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与至少三个待处理交错帧所对应的至少两个目标视频帧。
S250、基于至少两个目标视频帧,确定目标视频。
本公开实施例的技术方案,通过获取与原始视频相对应的多个待处理视频帧,对相邻两个待处理视频帧融合处理,得到待处理交错帧,然后,获取至少三个待处理交错帧,并将至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与至少三个待处理交错帧所对应的至少两个目标视频帧,最后,基于至少两个目标视频帧,确定目标视频,当交错视频在现有的显示设备上进行显示时,可以有效提升视频画面的恢复效果,例如对于运动场景的视频画面,也可以达到较为显著的恢复效果,同时,避免了画面拉丝以及细节丢失等情况,提高了视频画面的画面质量和清晰度,提升了用户的使用体验。
图6是本公开实施例所提供的一种视频处理装置结构示意图,如图6所示,所述装置包括:待处理交错帧获取模块310、目标视频帧确定模块320以及目标视频确定模块330。
其中,待处理交错帧获取模块310,设置为获取至少三个待处理交错帧;其中,所述待处理交错帧是基于相邻两个待处理视频帧确定的;
目标视频帧确定模块320,设置为将所述至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与所述至少三个待处理交错帧所对应的至少两个目标视频帧;其中,所述图像融合模型中包括特征处理子模型以及运动感知子模型;
目标视频确定模块330,设置为基于所述至少两个目标视频帧,确定目标视频。
在上述技术方案的基础上,所述装置还包括:待处理视频帧获取模块和待处理视频帧处理模块。
待处理视频帧获取模块,设置为在所述获取至少三个待处理交错帧之前,获取与原始视频相对应的多个待处理视频帧;
待处理视频帧处理模块,设置为对相邻两个待处理视频帧融合处理,得到所述待处理交错帧;其中,两个所述待处理视频帧中包括一个奇数视频帧和一个偶数视频帧,奇数视频帧和偶数视频帧是基于待处理视频帧在所述原始视频中的顺序确定的。
在上述技术方案的基础上,待处理视频帧处理模块包括:数据提取单元和数据处理单元。
数据提取单元,设置为提取所述奇数视频帧中的奇数行数据以及所述偶数视频帧中的偶数行数据;
数据处理单元,设置为通过对所述奇数行数据以及所述偶数行数据融合处理,得到所述待处理交错帧。
在上述技术方案的基础上,所述图像融合模型中包括特征处理子模型、运动感知子模型以及2D卷积层。
在上述技术方案的基础上,所述特征处理子模型中包括第一特征提取分支和第二特征提取分支;
所述第一特征提取分支的输出为所述运动感知子模型中第一运动感知子模型的输入,所述第二特征提取分支的输出为所述运动感知子模型中第二运动感知子模型的输入;
所述第一运动感知子模型的输出和所述第二运动感知子模型的输出为所述2D卷积层的输入,以使所述2D卷积层输出目标视频帧。
在上述技术方案的基础上,所述第一特征提取分支包括结构特征提取网络和结构特征融合网络,所述第二特征提取分支包括细节特征提取网络和细节特征融合网络。
在上述技术方案的基础上,目标视频帧确定模块320包括:等比例特征提取子模块、奇偶场特征提取子模块、结构特征处理子模块、细节特征处理子模块、第一融合特征图确定子模块、第二融合特征图确定子模块以及目标视频帧确定子模块。
等比例特征提取子模块,设置为基于所述结构特征提取网络对所述至少三个待处理交错帧进行等比例特征提取,得到与所述待处理交错帧所对应的结构特征;以及,
奇偶场特征提取子模块,设置为基于所述细节特征提取网络对所述至少三个待处理交错帧进行奇偶场特征提取,得到与所述待处理交错帧所对应的细节特征;
结构特征处理子模块,设置为基于所述结构特征融合网络对所述结构特征进行处理,得到相邻两个待处理交错帧之间的第一帧间特征图;以及,
细节特征处理子模块,设置为基于所述细节特征融合网络对所述细节特征进行处理,得到相邻两个待处理交错帧之间的第二帧间特征图;
第一融合特征图确定子模块,设置为基于第一运动感知子模型对所述第一帧间特征图处理,得到第一融合特征图;以及,
第二融合特征图确定子模块,设置为基于第二运动感知子模型对所述第二帧间特征图处理,得到第二融合特征图;
目标视频帧确定子模块,设置为基于所述2D卷积层对所述第一融合特征图以及所述第 二融合特征图进行处理,得到所述至少两个目标视频帧。
在上述技术方案的基础上,第一融合特征图确定子模块包括:特征图处理单元、光流图映射处理单元以及第一融合特征图确定单元。
特征图处理单元,设置为基于所述第一运动感知子模型中的卷积网络分别对所述第一特征图和所述第二特征图进行处理,得到第一光流图和第二光流图;
光流图映射处理单元,设置为基于所述第一运动感知子模型中的畸变网络对所述第一光流图和所述第二光流图映射处理,得到偏移量;
第一融合特征图确定单元,设置为基于所述第一光流图、第二光流图以及所述偏移量,确定所述第一融合特征图。
在上述技术方案的基础上,第一融合特征图确定单元,设置为对所述第一光流图和所述偏移量残差处理,得到第一待拼接特征图;对所述第二光流图和所述偏移量残差处理,得到第二待拼接特征图;通过对所述第一待拼接特征图和所述第二待拼接特征图拼接处理,得到所述第一融合特征图。
在上述技术方案的基础上,目标视频确定模块330,设置为对所述至少两个目标视频帧在时域上拼接处理,得到所述目标视频。
本公开实施例的技术方案,在获取至少三个待处理交错帧后,可以将至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与至少三个待处理交错帧所对应的至少两个目标视频帧,最后,基于至少两个目标视频帧,确定目标视频,当交错视频在现有的显示设备上进行显示时,可以有效提升视频画面的恢复效果,例如对于运动场景的视频画面,也可以达到较为显著的恢复效果,同时,避免了画面拉丝以及细节丢失等情况,提高了视频画面的画面质量和清晰度,提升了用户的使用体验。
本公开实施例所提供的视频处理装置可执行本公开任意实施例所提供的视频处理方法,具备执行方法相应的功能模块和有益效果。
值得注意的是,上述装置所包括的多个单元和模块只是按照功能逻辑进行划分的,但并不局限于上述的划分,只要能够实现相应的功能即可;另外,多个功能单元的具体名称也只是为了便于相互区分,并不用于限制本公开实施例的保护范围。
图7是本公开实施例所提供的一种电子设备的结构示意图。下面参考图7,其示出了适于用来实现本公开实施例的电子设备(例如图7中的终端设备或服务器)500的结构示意图。本公开实施例中的终端设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图7示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图7所示,电子设备500可以包括处理装置(例如中央处理器、图形处理器等)501,其可以根据存储在只读存储器(ROM)502中的程序或者从存储装置508加载到随机访问存储器(RAM)503中的程序而执行多种适当的动作和处理。在RAM 503中,还存储有电子设备500操作所需的多种程序和数据。处理装置501、ROM 502以及RAM 503通过总线504彼此相连。编辑/输出(I/O)接口505也连接至总线504。
通常,以下装置可以连接至I/O接口505:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置506;包括例如液晶显示器(LCD)、扬声器、振动器等的输出装置507;包括例如磁带、硬盘等的存储装置508;以及通信装置509。通信装置509可以允许电子设备500与其他设备进行无线或有线通信以交换数据。虽然图7示出了具有多种装置的电子设备500,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计 算机程序可以通过通信装置509从网络上被下载和安装,或者从存储装置508被安装,或者从ROM 502被安装。在该计算机程序被处理装置501执行时,执行本公开实施例的方法中限定的上述功能。
本公开实施例提供的电子设备与上述实施例提供的视频处理方法属于同一构思,未在本实施例中详尽描述的技术细节可参见上述实施例,并且本实施例与上述实施例具有相同的有益效果。
本公开实施例提供了一种计算机存储介质,其上存储有计算机程序,该程序被处理器执行时实现上述实施例所提供的视频处理方法。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任意合适的组合。
在一些实施方式中,客户端、服务器可以利用诸如HTTP(HyperText Transfer Protocol,超文本传输协议)之类的任何当前已知或未来研发的网络协议进行通信,并且可以与任意形式或介质的数字数据通信(例如,通信网络)互连。通信网络的示例包括局域网(“LAN”),广域网(“WAN”),网际网(例如,互联网)以及端对端网络(例如,ad hoc端对端网络),以及任何当前已知或未来研发的网络。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:
获取至少三个待处理交错帧;其中,所述待处理交错帧是基于相邻两个待处理视频帧确定的;
将所述至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与所述至少三个待处理交错帧所对应的至少两个目标视频帧;其中,所述图像融合模型中包括特征处理子模型以及运动感知子模型;
基于所述至少两个目标视频帧,确定目标视频。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括但不限于面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在 用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开多种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,单元的名称在某种情况下并不构成对该单元本身的限定,例如,第一获取单元还可以被描述为“获取至少两个网际协议地址的单元”。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上系统(SOC)、复杂可编程逻辑设备(CPLD)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
根据本公开的一个或多个实施例,【示例一】提供了一种视频处理方法,该方法包括:
获取至少三个待处理交错帧;其中,所述待处理交错帧是基于相邻两个待处理视频帧确定的;
将所述至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与所述至少三个待处理交错帧所对应的至少两个目标视频帧;其中,所述图像融合模型中包括特征处理子模型以及运动感知子模型;
基于所述至少两个目标视频帧,确定目标视频。
根据本公开的一个或多个实施例,【示例二】提供了一种视频处理方法,在所述获取至少三个待处理交错帧之前该方法还包括:
获取与原始视频相对应的多个待处理视频帧;
对相邻两个待处理视频帧融合处理,得到所述待处理交错帧;
其中,两个所述待处理视频帧中包括一个奇数视频帧和一个偶数视频帧,奇数视频帧和偶数视频帧是基于待处理视频帧在所述原始视频中的顺序确定的。
根据本公开的一个或多个实施例,【示例三】提供了一种视频处理方法,所述对相邻两个待处理视频帧进行融合处理,得到所述待处理交错帧,包括:
提取所述奇数视频帧中的奇数行数据以及所述偶数视频帧中的偶数行数据;
通过对所述奇数行数据以及所述偶数行数据融合处理,得到所述待处理交错帧。
根据本公开的一个或多个实施例,【示例四】提供了一种视频处理方法,其中所述图像融合模型中包括特征处理子模型、运动感知子模型以及2D卷积层。
根据本公开的一个或多个实施例,【示例五】提供了一种视频处理方法,还包括:
所述特征处理子模型中包括第一特征提取分支和第二特征提取分支;
所述第一特征提取分支的输出为所述运动感知子模型中第一运动感知子模型的输入,所述第二特征提取分支的输出为所述运动感知子模型中第二运动感知子模型的输入;
所述第一运动感知子模型的输出和所述第二运动感知子模型的输出为所述2D卷积层的输入,以使所述2D卷积层输出目标视频帧。
根据本公开的一个或多个实施例,【示例六】提供了一种视频处理方法,其中,所述第一特征提取分支包括结构特征提取网络和结构特征融合网络,所述第二特征提取分支包括细节特征提取网络和细节特征融合网络。
根据本公开的一个或多个实施例,【示例七】提供了一种视频处理方法,其中,所述将所述至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与所述至少三个待处理交错帧所对应的至少两个目标视频帧,包括:
基于所述结构特征提取网络对所述至少三个待处理交错帧进行等比例特征提取,得到与所述待处理交错帧所对应的结构特征;以及,
基于所述细节特征提取网络对所述至少三个待处理交错帧进行奇偶场特征提取,得到与所述待处理交错帧所对应的细节特征;
基于所述结构特征融合网络对所述结构特征进行处理,得到相邻两个待处理交错帧之间的第一帧间特征图;以及,
基于所述细节特征融合网络对所述细节特征进行处理,得到相邻两个待处理交错帧之间的第二帧间特征图;
基于第一运动感知子模型对所述第一帧间特征图处理,得到第一融合特征图;以及,
基于第二运动感知子模型对所述第二帧间特征图处理,得到第二融合特征图;
基于所述2D卷积层对所述第一融合特征图以及所述第二融合特征图进行处理,得到所述至少两个目标视频帧。
根据本公开的一个或多个实施例,【示例八】提供了一种视频处理方法,其中,所述第一帧间特征图包括第一特征图和第二特征图,所述基于第一运动感知子模型对所述第一帧间特征图进行处理,得到第一融合特征图,包括:
基于所述第一运动感知子模型中的卷积网络分别对所述第一特征图和所述第二特征图进行处理,得到第一光流图和第二光流图;
基于所述第一运动感知子模型中的畸变网络对所述第一光流图和所述第二光流图映射处理,得到偏移量;
基于所述第一光流图、第二光流图以及所述偏移量,确定所述第一融合特征图。
根据本公开的一个或多个实施例,【示例九】提供了一种视频处理方法,其中,所述基于所述第一光流图、第二光流图以及所述偏移量,确定所述第一融合特征图,包括:
对所述第一光流图和所述偏移量残差处理,得到第一待拼接特征图;
对所述第二光流图和所述偏移量残差处理,得到第二待拼接特征图;
通过对所述第一待拼接特征图和所述第二待拼接特征图拼接处理,得到所述第一融合特征图。
根据本公开的一个或多个实施例,【示例十】提供了一种视频处理方法,其中,所述基于所述至少两个目标视频帧,确定目标视频,包括:
对所述至少两个目标视频帧在时域上拼接处理,得到所述目标视频。
根据本公开的一个或多个实施例,【示例十一】提供了一种视频处理装置,该装置包括:
待处理交错帧获取模块,设置为获取至少三个待处理交错帧;其中,所述待处理交错帧是基于相邻两个待处理视频帧确定的;
目标视频帧确定模块,设置为将所述至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与所述至少三个待处理交错帧所对应的至少两个目标视频帧;其中,所述 图像融合模型中包括特征处理子模型以及运动感知子模型;
目标视频确定模块,设置为基于所述至少两个目标视频帧,确定目标视频。
此外,虽然采用特定次序描绘了多种操作,但是这不应当理解为要求这些操作以所示出的特定次序或以顺序次序执行来执行。在一定环境下,多任务和并行处理可能是有利的。同样地,虽然在上面论述中包含了若干具体实现细节,但是这些不应当被解释为对本公开的范围的限制。在单独的实施例的上下文中描述的某些特征还可以组合地实现在单个实施例中。相反地,在单个实施例的上下文中描述的多种特征也可以单独地或以任何合适的子组合的方式实现在多个实施例中。

Claims (13)

  1. 一种视频处理方法,包括:
    获取至少三个待处理交错帧;其中,所述待处理交错帧是基于相邻两个待处理视频帧确定的;
    将所述至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与所述至少三个待处理交错帧所对应的至少两个目标视频帧;其中,所述图像融合模型中包括特征处理子模型以及运动感知子模型;
    基于所述至少两个目标视频帧,确定目标视频。
  2. 根据权利要求1所述的方法,在所述获取至少三个待处理交错帧之前,还包括:
    获取与原始视频相对应的多个待处理视频帧;
    对相邻两个待处理视频帧进行融合处理,得到所述待处理交错帧;
    其中,两个所述待处理视频帧中包括一个奇数视频帧和一个偶数视频帧,所述奇数视频帧和所述偶数视频帧是基于所述待处理视频帧在所述原始视频中的顺序确定的。
  3. 根据权利要求2所述的方法,其中,所述对相邻两个待处理视频帧进行融合处理,得到所述待处理交错帧,包括:
    提取所述奇数视频帧中的奇数行数据以及所述偶数视频帧中的偶数行数据;
    通过对所述奇数行数据以及所述偶数行数据进行融合处理,得到所述待处理交错帧。
  4. 根据权利要求1所述的方法,其中,所述图像融合模型还包括2D卷积层。
  5. 根据权利要求4所述的方法,还包括:
    所述特征处理子模型中包括第一特征提取分支和第二特征提取分支;
    所述第一特征提取分支的输出为所述运动感知子模型中第一运动感知子模型的输入,所述第二特征提取分支的输出为所述运动感知子模型中第二运动感知子模型的输入;
    所述第一运动感知子模型的输出和所述第二运动感知子模型的输出为所述2D卷积层的输入,以使所述2D卷积层输出目标视频帧。
  6. 根据权利要求5所述的方法,其中,所述第一特征提取分支包括结构特征提取网络和结构特征融合网络,所述第二特征提取分支包括细节特征提取网络和细节特征融合网络。
  7. 根据权利要求6所述的方法,其中,所述将所述至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与所述至少三个待处理交错帧所对应的至少两个目标视频帧,包括:
    基于所述结构特征提取网络对所述至少三个待处理交错帧进行等比例特征提取,得到与所述待处理交错帧所对应的结构特征;
    基于所述细节特征提取网络对所述至少三个待处理交错帧进行奇偶场特征提取,得到与所述待处理交错帧所对应的细节特征;
    基于所述结构特征融合网络对所述结构特征进行处理,得到相邻两个待处理交错帧之间的第一帧间特征图;以及,基于所述细节特征融合网络对所述细节特征进行处理,得到相邻两个待处理交错帧之间的第二帧间特征图;
    基于所述第一运动感知子模型对所述第一帧间特征图进行处理,得到第一融合特征图;以及,基于所述第二运动感知子模型对所述第二帧间特征图进行处理,得到第二融合特征图;
    基于所述2D卷积层对所述第一融合特征图以及所述第二融合特征图进行处理,得到所述至少两个目标视频帧。
  8. 根据权利要求7所述的方法,其中,所述第一帧间特征图包括第一特征图和第二特征图,所述基于第一运动感知子模型对所述第一帧间特征图进行处理,得到第一融合特征图,包括:
    基于所述第一运动感知子模型中的卷积网络分别对所述第一特征图和所述第二特征图进行处理,得到第一光流图和第二光流图;
    基于所述第一运动感知子模型中的畸变网络对所述第一光流图和所述第二光流图进行映射处理,得到偏移量;
    基于所述第一光流图、第二光流图以及所述偏移量,确定所述第一融合特征图。
  9. 根据权利要求8所述的方法,其中,所述基于所述第一光流图、第二光流图以及所述偏移量,确定所述第一融合特征图,包括:
    对所述第一光流图和所述偏移量进行残差处理,得到第一待拼接特征图;
    对所述第二光流图和所述偏移量进行残差处理,得到第二待拼接特征图;
    通过对所述第一待拼接特征图和所述第二待拼接特征图进行拼接处理,得到所述第一融合特征图。
  10. 根据权利要求1所述的方法,其中,所述基于所述至少两个目标视频帧,确定目标视频,包括:
    对所述至少两个目标视频帧在时域上进行拼接处理,得到所述目标视频。
  11. 一种视频处理装置,包括:
    待处理交错帧获取模块,设置为获取至少三个待处理交错帧;其中,所述待处理交错帧是基于相邻两个待处理视频帧确定的;
    目标视频帧确定模块,设置为将所述至少三个待处理交错帧输入至预先训练得到的图像融合模型中,得到与所述至少三个待处理交错帧所对应的至少两个目标视频帧;其中,所述图像融合模型中包括特征处理子模型以及运动感知子模型;
    目标视频确定模块,设置为基于所述至少两个目标视频帧,确定目标视频。
  12. 一种电子设备,包括:
    一个或多个处理器;
    存储装置,设置为存储一个或多个程序,
    当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-10中任一所述的视频处理方法。
  13. 一种包含计算机可执行指令的存储介质,所述计算机可执行指令在由计算机处理器执行时用于执行如权利要求1-10中任一所述的视频处理方法。
PCT/CN2023/121354 2022-10-21 2023-09-26 视频处理方法、装置、电子设备及存储介质 WO2024082933A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211294643.2 2022-10-21
CN202211294643.2A CN115633144A (zh) 2022-10-21 2022-10-21 视频处理方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2024082933A1 true WO2024082933A1 (zh) 2024-04-25

Family

ID=84907105

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/121354 WO2024082933A1 (zh) 2022-10-21 2023-09-26 视频处理方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN115633144A (zh)
WO (1) WO2024082933A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115633144A (zh) * 2022-10-21 2023-01-20 抖音视界有限公司 视频处理方法、装置、电子设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108134938A (zh) * 2016-12-01 2018-06-08 中兴通讯股份有限公司 视频扫描方式检测、纠正方法、及视频播放方法和装置
KR101979584B1 (ko) * 2017-11-21 2019-05-17 에스케이 텔레콤주식회사 디인터레이싱 방법 및 장치
CN112218081A (zh) * 2020-09-03 2021-01-12 深圳市捷视飞通科技股份有限公司 视频图像去隔行的方法和装置、电子设备及存储介质
CN112750094A (zh) * 2020-12-30 2021-05-04 合肥工业大学 一种视频处理方法及系统
US20220014708A1 (en) * 2020-07-10 2022-01-13 Disney Enterprises, Inc. Deinterlacing via deep learning
CN115633144A (zh) * 2022-10-21 2023-01-20 抖音视界有限公司 视频处理方法、装置、电子设备及存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108134938A (zh) * 2016-12-01 2018-06-08 中兴通讯股份有限公司 视频扫描方式检测、纠正方法、及视频播放方法和装置
KR101979584B1 (ko) * 2017-11-21 2019-05-17 에스케이 텔레콤주식회사 디인터레이싱 방법 및 장치
US20220014708A1 (en) * 2020-07-10 2022-01-13 Disney Enterprises, Inc. Deinterlacing via deep learning
CN112218081A (zh) * 2020-09-03 2021-01-12 深圳市捷视飞通科技股份有限公司 视频图像去隔行的方法和装置、电子设备及存储介质
CN112750094A (zh) * 2020-12-30 2021-05-04 合肥工业大学 一种视频处理方法及系统
CN115633144A (zh) * 2022-10-21 2023-01-20 抖音视界有限公司 视频处理方法、装置、电子设备及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG CHONG: "Video Deinterlacing Method Based on Optical Flow Method", INFORMATIZATION RESEARCH, vol. 39, no. 1, 20 March 2013 (2013-03-20), pages 52 - 57, XP093160126 *

Also Published As

Publication number Publication date
CN115633144A (zh) 2023-01-20

Similar Documents

Publication Publication Date Title
WO2024082933A1 (zh) 视频处理方法、装置、电子设备及存储介质
EP4053784A1 (en) Image processing method and apparatus, electronic device, and storage medium
US10121227B1 (en) Method and system of reconstructing videos by using super-resolution algorithm
CN114331820A (zh) 图像处理方法、装置、电子设备及存储介质
US11785195B2 (en) Method and apparatus for processing three-dimensional video, readable storage medium and electronic device
US11893770B2 (en) Method for converting a picture into a video, device, and storage medium
WO2024104248A1 (zh) 虚拟全景图的渲染方法、装置、设备及存储介质
WO2024037556A1 (zh) 图像处理方法、装置、设备及存储介质
CN111738951B (zh) 图像处理方法及装置
CN116527748A (zh) 一种云渲染交互方法、装置、电子设备及存储介质
WO2024056020A1 (zh) 一种双目图像的生成方法、装置、电子设备及存储介质
WO2023231918A1 (zh) 图像处理方法、装置、电子设备及存储介质
US8599240B2 (en) Super-resolution from 3D (3D to 2D conversion) for high quality 2D playback
CN113535645B (zh) 共享文档的展示方法、装置、电子设备及存储介质
CN113891057A (zh) 视频的处理方法、装置、电子设备和存储介质
CN113706385A (zh) 一种视频超分辨率方法、装置、电子设备及存储介质
CN108683842B (zh) 一种全景相机及输出全景视频的方法和装置
JP5327176B2 (ja) 画像処理方法及び画像処理装置
CN108471530B (zh) 用于检测视频的方法和设备
CN116760991A (zh) 码流信息生成方法和装置
Monteagudo et al. AI-based telepresence for broadcast applications
CN113592734B (zh) 图像处理方法、装置和电子设备
CN116939130A (zh) 一种视频生成方法、装置、电子设备和存储介质
CN116302268A (zh) 媒体内容的展示方法、装置、电子设备和存储介质
CN115953432A (zh) 基于图像的运动预测方法、装置、电子设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23878926

Country of ref document: EP

Kind code of ref document: A1