WO2023160617A9 - Video frame interpolation processing method, video frame interpolation processing apparatus, and readable storage medium - Google Patents

Video frame interpolation processing method, video frame interpolation processing apparatus, and readable storage medium

Info

Publication number
WO2023160617A9
WO2023160617A9 PCT/CN2023/077905 CN2023077905W
Authority
WO
WIPO (PCT)
Prior art keywords
video frame
frame
video
sub
image
Prior art date
Application number
PCT/CN2023/077905
Other languages
English (en)
French (fr)
Other versions
WO2023160617A1 (zh)
Inventor
孙梦笛
朱丹
Original Assignee
BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co., Ltd.
Publication of WO2023160617A1
Publication of WO2023160617A9

Links

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/234381: Reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements, by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • H04N 21/435: Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N 21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440281: Reformatting operations of video signals for household redistribution, storage or real-time display, by altering the temporal resolution, e.g. by frame skipping
    • H04N 21/4884: Data services, e.g. news ticker, for displaying subtitles

Definitions

  • Embodiments of the present disclosure relate to a video frame interpolation processing method, a video frame interpolation processing device, and a non-transitory readable storage medium.
  • Video processing is a typical application of artificial intelligence, and video frame interpolation is a representative technique within it. Frame interpolation synthesizes smoothly transitioning intermediate frames from the preceding and following frames of a video, making playback smoother and thereby improving the user's viewing experience.
  • For example, frame interpolation can be used to convert a 24 fps video into a 48 fps video, so that the video appears clearer and smoother to the viewer.
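The 24-to-48 conversion above amounts to placing one synthesized frame in the middle of every source-frame interval. A minimal sketch of the timestamp arithmetic (the function name is illustrative, not from the disclosure):

```python
def interpolated_timestamps(fps_in, fps_out, n_frames):
    """Presentation timestamps (seconds) after frame-rate multiplication.

    Assumes fps_out is an integer multiple of fps_in, so each source
    interval receives (fps_out // fps_in - 1) synthesized frames.
    """
    factor = fps_out // fps_in
    step = 1.0 / fps_out
    total = (n_frames - 1) * factor + 1  # the last source frame has no successor
    return [round(i * step, 6) for i in range(total)]
```

For 3 source frames at 24 fps converted to 48 fps, this yields 5 output timestamps spaced 1/48 s apart.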
  • At least one embodiment of the present disclosure provides a video frame interpolation processing method, including: acquiring a first video frame and a second video frame of a video; obtaining, based on the first video frame and the second video frame, a first comparison result between the first video frame and the second video frame; and determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame.
  • the first video frame and the second video frame are adjacent in the time domain, and the first video frame is a forward frame of the second video frame.
  • the first comparison result indicates whether there is a picture switch between the first video frame and the second video frame.
  • the picture switching includes subtitle switching and/or scene switching.
  • Obtaining the first comparison result between the first video frame and the second video frame includes: determining whether subtitle switching exists between the first video frame and the second video frame based on whether the subtitle content of the two frames is the same.
  • Determining whether subtitle switching exists based on whether the subtitle content of the first and second video frames is the same includes: obtaining the audio segment corresponding to the first video frame; obtaining, based on the audio segment, the start video frame and end video frame corresponding to that segment; and determining, based on the start video frame and the end video frame, whether subtitle switching exists between the first video frame and the second video frame.
  • Determining whether subtitle switching exists based on the start video frame and the end video frame includes: in response to the second video frame lying between the start video frame and the end video frame, determining that no subtitle switching exists between the first video frame and the second video frame; and in response to the second video frame not lying between them, determining that subtitle switching exists between the first video frame and the second video frame.
  • Determining whether subtitle switching exists based on whether the subtitle content is the same may alternatively include: obtaining first recognized text content of the first video frame; obtaining second recognized text content of the second video frame; and, in response to the first recognized text content being the same as the second recognized text content, determining that no subtitle switching exists between the first video frame and the second video frame.
  • Determining whether subtitle switching exists between the first and second video frames further includes: in response to the first recognized text content differing from the second recognized text content, acquiring a first sub-image of the first video frame and a second sub-image of the second video frame, and determining, based on the first sub-image and the second sub-image, whether subtitle switching exists between the first video frame and the second video frame.
  • the first sub-image corresponds to the first subtitle content of the first video frame;
  • the second sub-image corresponds to the second subtitle content of the second video frame.
  • Determining whether subtitle switching exists based on the first sub-image and the second sub-image includes: determining a first similarity between the first sub-image and the second sub-image; in response to the first similarity being greater than a first threshold, determining that no subtitle switching exists between the first video frame and the second video frame; and in response to the first similarity not being greater than the first threshold, determining that subtitle switching exists between the first video frame and the second video frame.
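A sketch of this check under assumed choices: zero-mean normalized correlation as the first similarity and 0.9 as the first threshold, neither of which is fixed by the disclosure:

```python
import numpy as np

def subtitle_switch(first_sub, second_sub, first_threshold=0.9):
    """Return True when a subtitle switch is detected.

    first_sub / second_sub: grayscale sub-images of the same shape,
    cropped from the subtitle regions of the two frames.
    """
    a = first_sub.astype(np.float64).ravel()
    b = second_sub.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    similarity = 1.0 if denom == 0 else float(np.dot(a, b) / denom)
    # "not greater than the first threshold" -> switch exists
    return similarity <= first_threshold
```

Identical crops give a similarity of 1.0 and therefore no switch; strongly dissimilar crops fall below the threshold and report a switch.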
  • Obtaining the first comparison result between the first video frame and the second video frame may also include: determining whether scene switching exists between the first video frame and the second video frame based on whether the scenes of the two frames are the same.
  • Determining whether the scene switch exists includes: obtaining a second similarity between the first video frame and the second video frame; in response to the second similarity being greater than a second threshold, determining that no scene switching exists between the first video frame and the second video frame; and in response to the second similarity not being greater than the second threshold, determining that scene switching exists between the first video frame and the second video frame.
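A sketch of the scene check under assumed choices (histogram intersection as the second similarity, 0.8 as the second threshold; the disclosure does not fix either):

```python
import numpy as np

def scene_switch(first_frame, second_frame, second_threshold=0.8):
    """Return True when a scene switch is detected between two frames.

    Frames are grayscale arrays with values in [0, 1].
    """
    h1, _ = np.histogram(first_frame, bins=32, range=(0.0, 1.0))
    h2, _ = np.histogram(second_frame, bins=32, range=(0.0, 1.0))
    h1 = h1 / max(h1.sum(), 1)
    h2 = h2 / max(h2.sum(), 1)
    second_similarity = float(np.minimum(h1, h2).sum())  # histogram intersection
    return second_similarity <= second_threshold
```

Two identical frames overlap fully and report no switch; frames with disjoint intensity distributions report a switch.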
  • Determining whether to insert a frame between the first video frame and the second video frame based on the first comparison result includes: in response to the first comparison result indicating that no picture switching exists between the first video frame and the second video frame, determining to insert a frame between them; and in response to the first comparison result indicating that picture switching exists, determining not to insert a frame between the first video frame and the second video frame.
  • The method provided by at least one embodiment of the present disclosure further includes: setting a first frame-insertion flag, and, in response to picture switching between the first video frame and the second video frame, modifying the first frame-insertion flag into a second frame-insertion flag.
  • The method further includes: in response to picture switching between the first video frame and the second video frame, acquiring a fourth video frame; obtaining, based on the second video frame and the fourth video frame, a second comparison result between them; and determining, based on the second comparison result, whether to insert a frame between the second video frame and the fourth video frame.
  • The fourth video frame and the second video frame are adjacent in the time domain, and the second video frame is a forward frame of the fourth video frame; the second comparison result indicates whether picture switching exists between the second video frame and the fourth video frame.
  • Determining whether to insert a frame between the second video frame and the fourth video frame based on the second comparison result includes: in response to the second comparison result indicating that no picture switching exists between the second video frame and the fourth video frame, inserting multiple video frames between them, where the number of inserted frames is based on the second frame-insertion flag.
  • Determining whether to insert a frame between the second video frame and the fourth video frame based on the second comparison result further includes: in response to the second comparison result indicating that picture switching exists between the second video frame and the fourth video frame, determining not to insert a video frame between them, and modifying the second frame-insertion flag into a third frame-insertion flag, where the third frame-insertion flag indicates the number of frames for the next insertion.
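One way the flag bookkeeping could work; the encoding below (the flag stores how many frames the next clean interval should receive) is an assumption made for illustration, not the disclosure's definition:

```python
def next_flag_and_insert_count(flag, picture_switch):
    """Return (updated_flag, frames_to_insert_now).

    flag: number of frames the current interval owes.
    picture_switch: True when this pair of frames shows a picture switch.
    """
    if picture_switch:
        return flag + 1, 0   # defer: never interpolate across a switch
    return 1, flag           # insert the owed frames, reset to the default
```

Starting from flag = 1, a pair with a switch yields (2, 0) — nothing inserted, one frame deferred — and the following clean pair yields (1, 2), inserting two frames to keep the overall frame count on target.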
  • The method further includes: in response to a third video frame being inserted between the first video frame and the second video frame, obtaining the first sub-image of the first video frame and a third sub-image of the third video frame, and determining, based on the first sub-image and the third sub-image, whether to replace the third video frame with the first video frame.
  • the first sub-image corresponds to the first subtitle content in the first video frame
  • the third sub-image corresponds to the third subtitle content in the third video frame.
  • Determining whether to replace the third video frame with the first video frame based on the first sub-image and the third sub-image includes: obtaining the pixel value of a first pixel in the first sub-image; setting, based on that pixel value, the pixel value of a third pixel of the third sub-image; and determining, based on the first sub-image and the set third sub-image, whether to replace the third video frame with the first video frame. The pixel value of the first pixel is greater than a third threshold, and the relative position of the third pixel in the third sub-image is the same as the relative position of the first pixel in the first sub-image.
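A sketch of the pixel transfer just described, assuming grayscale sub-images with values in [0, 1] and an illustrative third threshold of 0.6:

```python
import numpy as np

def transfer_subtitle_pixels(first_sub, third_sub, third_threshold=0.6):
    """Return a copy of third_sub with bright subtitle pixels pasted in.

    Pixels of first_sub whose value exceeds the third threshold (likely
    subtitle strokes) are copied onto the same relative positions of
    third_sub; the result can then be compared against first_sub.
    """
    out = third_sub.copy()
    mask = first_sub > third_threshold
    out[mask] = first_sub[mask]
    return out
```

Only positions that are bright in the first sub-image change; the rest of the interpolated sub-image is left untouched.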
  • At least one embodiment of the present disclosure also provides a video frame insertion processing device, including: an acquisition module, a comparison module and an operation module.
  • the acquisition module is configured to acquire the first video frame and the second video frame of the video.
  • the first video frame and the second video frame are adjacent in the time domain, and the first video frame is a forward frame of the second video frame.
  • the comparison module is configured to obtain a first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame.
  • the first comparison result indicates whether there is a picture switching between the first video frame and the second video frame.
  • the operating module is configured to determine whether to insert a frame between the first video frame and the second video frame based on the first comparison result.
  • At least one embodiment of the present disclosure also provides a video frame insertion processing device, including: a processor and a memory.
  • The memory includes one or more computer program modules, which are stored in the memory and configured to be executed by the processor; the one or more computer program modules include instructions for performing the video frame interpolation processing method of any of the above embodiments.
  • At least one embodiment of the present disclosure also provides a non-transitory readable storage medium having computer instructions stored thereon.
  • When the computer instructions are executed by a processor, the video frame interpolation processing method of any of the above embodiments is performed.
  • Figure 1 is a schematic diagram of a video frame insertion method provided by at least one embodiment of the present disclosure
  • Figure 2 is a schematic flowchart of a video frame insertion processing method provided by at least one embodiment of the present disclosure
  • Figure 3 is a flow chart of a method for determining subtitle switching provided by at least one embodiment of the present disclosure
  • Figure 4 is a schematic flowchart of a text recognition method provided by at least one embodiment of the present disclosure
  • Figure 5 is a schematic flowchart of another method for determining whether subtitles are switched, provided by at least one embodiment of the present disclosure
  • Figure 6 is a schematic block diagram of yet another method for determining whether subtitles are switched, provided by at least one embodiment of the present disclosure
  • Figure 7 is a schematic diagram of another video frame insertion processing method provided by at least one embodiment of the present disclosure.
  • Figure 8 is a schematic flow chart of a post-processing method provided by at least one embodiment of the present disclosure.
  • Figure 9 is a schematic diagram of another video frame insertion processing method provided by at least one embodiment of the present disclosure.
  • Figure 10 is a schematic block diagram of yet another video frame insertion processing method provided by at least one embodiment of the present disclosure.
  • Figure 11 is a schematic block diagram of a video frame insertion processing device provided by at least one embodiment of the present disclosure.
  • Figure 12 is a schematic block diagram of another video frame insertion processing device provided by at least one embodiment of the present disclosure.
  • Figure 13 is a schematic block diagram of yet another video frame insertion processing device provided by at least one embodiment of the present disclosure.
  • Figure 14 is a schematic block diagram of a non-transitory readable storage medium provided by at least one embodiment of the present disclosure
  • Figure 15 is a schematic block diagram of an electronic device provided by at least one embodiment of the present disclosure.
  • Figure 1 is a schematic diagram of a video frame insertion method provided by at least one embodiment of the present disclosure.
  • Video frame interpolation typically synthesizes an intermediate frame between two consecutive frames of a video, which increases the frame rate and enhances visual quality.
  • video frame interpolation technology can also support various applications such as slow-motion generation, video compression, and training data generation for video motion deblurring.
  • video frame insertion can use optical flow prediction algorithms to predict intermediate frames and insert them between two frames.
  • Optical flow, analogous to the flow of light, is a way of representing the direction of motion of objects in an image, commonly visualized through color.
  • the optical flow prediction algorithm usually predicts a certain frame in the middle based on the two frames of video before and after. After inserting the predicted image, the video will look smoother.
  • For two consecutive input frames, the intermediate flow information is estimated through a network; a rough result is obtained by backward-warping the input frames, and this result is fed into a fusion network together with the input frames and the intermediate flow information to finally produce the intermediate frame.
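A minimal numpy sketch of the backward-warping step in this pipeline; the intermediate flows are assumed to come from an estimation network and are passed in directly, nearest-neighbor sampling stands in for bilinear interpolation, and plain averaging stands in for the fusion network:

```python
import numpy as np

def backward_warp(frame, flow):
    """Warp `frame` (H, W) by per-pixel `flow` (H, W, 2) = (dy, dx)."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

def rough_intermediate(frame0, frame1, flow_t0, flow_t1):
    """Average the two warped inputs into a rough middle frame, which a
    fusion network would then refine."""
    return 0.5 * backward_warp(frame0, flow_t0) + 0.5 * backward_warp(frame1, flow_t1)
```

With zero flow the warp is the identity, so the "intermediate" frame degenerates to the average of the two inputs.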
  • At least one embodiment of the present disclosure provides a video frame interpolation processing method, including: obtaining a first video frame and a second video frame of a video; obtaining, based on the first video frame and the second video frame, a first comparison result between them; and determining, based on the first comparison result, whether to interpolate a frame between the first video frame and the second video frame.
  • the first video frame and the second video frame are adjacent in the time domain, and the first video frame is a forward frame of the second video frame.
  • the first comparison result indicates whether there is a picture switching between the first video frame and the second video frame.
  • At least one embodiment of the present disclosure also provides a video frame interpolation processing device and a non-transitory readable storage medium corresponding to the above video frame interpolation processing method.
  • In this way, the obvious deformation artifacts caused by picture switching during frame insertion can be avoided, ensuring the smoothness of the video and thereby improving the user's viewing experience.
  • FIG. 2 is a schematic flowchart of a video frame insertion processing method provided by at least one embodiment of the present disclosure.
  • At least one embodiment of the present disclosure provides a video frame insertion processing method 10, as shown in Figure 2.
  • The video frame interpolation processing method 10 can be applied to any scenario that requires frame interpolation, for example, to video products and services such as TV series, movies, documentaries, advertisements, and music videos; it can also be applied in other contexts, and the embodiments of the present disclosure are not limited in this regard.
  • the video frame insertion processing method 10 may include the following steps S101 to S103.
  • Step S101: Obtain the first video frame and the second video frame of the video.
  • the first video frame and the second video frame are adjacent in the time domain, and the first video frame is a forward frame of the second video frame.
  • Step S102: Based on the first video frame and the second video frame, obtain a first comparison result between the first video frame and the second video frame.
  • the first comparison result indicates whether there is a picture switching between the first video frame and the second video frame.
  • Step S103: Determine whether to insert a frame between the first video frame and the second video frame based on the first comparison result.
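Steps S101 to S103 can be sketched as a driver loop; `compare_frames` and `synthesize_middle` are placeholders standing in for the comparison and interpolation stages, not names from the disclosure:

```python
def interpolate_video(frames, compare_frames, synthesize_middle):
    """Yield output frames, inserting a middle frame only between pairs
    with no picture switch (steps S101-S103)."""
    for first, second in zip(frames, frames[1:]):   # S101: adjacent pair
        yield first
        if not compare_frames(first, second):        # S102: no switch detected
            yield synthesize_middle(first, second)   # S103: insert a frame
    if frames:
        yield frames[-1]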
  • first video frame and second video frame are used to refer to any two temporally consecutive or adjacent frames in a video or video frame sequence.
  • image or video frame “First video frame” is used to refer to the previous frame of two temporally adjacent images, and “second video frame” is used to refer to the later frame of two temporally adjacent images.
  • Image “No. "Three video frames” is used to refer to an intermediate frame or intervening frame inserted between two temporally adjacent frames of images.
  • First video frame “Second video frame” and “Third video frame” Neither is limited to a specific frame of images, nor is it limited to a specific order.
  • the "first comparison result” refers to the comparison result between two adjacent frames of images in the video, and is not limited to a specific A certain comparison result is not limited to a specific order. It should also be noted that the embodiment of the present disclosure uses the forward frames of two adjacent frames as a reference, and may also use the backward frames of two adjacent frames as a reference. The frame is used as a reference, as long as the frame insertion processing method of the entire video is consistent.
  • the adjacent first video frame and the second video frame may be compared to determine Whether there is a screen switch between the first video frame and the second video frame.
  • step S103 it may be determined based on a first comparison result of the first video frame and the second video frame whether to perform a frame interpolation operation between the first video frame and the second video frame.
  • the frame interpolation operation may be to calculate the intermediate frame/insertion frame based on the adjacent first video frame and the second video frame through the optical flow prediction method.
  • step S103 may include, in response to the first comparison result indicating that there is no picture switching between the first video frame and the second video frame, determining whether the first video frame and the second video frame are switched. Insert frames in between. In response to the first comparison result indicating that there is a picture switching between the first video frame and the second video frame, it is determined not to interpolate a frame between the first video frame and the second video frame.
  • the frame interpolation operation is selectively performed based on the comparison results between adjacent video frames, thereby effectively avoiding the video frame interpolation processing due to
  • the obvious deformation problem caused by screen switching ensures the smoothness of the video, thereby improving the user's viewing experience.
  • the picture switching between the first video frame and the second video frame may include subtitle switching, may include scene switching, etc., and the embodiments of the present disclosure will not limit this.
  • the subtitle in the first video frame is "Where are you going" and in the second The caption in the video frame reads "I'm getting ready for school.” If the subtitles in the first video frame are different from the subtitles in the second video frame, it can be regarded that a subtitle switching occurs between the first video frame and the second video. It should be noted that the embodiments of the present disclosure do not limit the subtitle content.
  • the scene in the first video frame is in a shopping mall
  • the scene in the second video frame is in a school. If the scene in the first video frame is different from the scene in the second video frame, then it can be considered that the scene in the first video frame is in a shopping mall.
  • a scene switch occurs between the video frame and the second video frame.
  • the scenes in each video frame may include any scenes such as shopping malls, schools, scenic spots, etc., and the embodiments of the present disclosure do not limit this.
  • obtaining the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame may include: based on the first video frame and the second video frame. Whether the subtitle content of the first video frame and the second video frame are the same determines whether there is subtitle switching between the first video frame and the second video frame.
  • the start and end of each sentence of the audio of the video can be located to obtain two video frames corresponding to the audio, Mark according to the time information of the corresponding audio frame to determine whether the corresponding subtitles are segmented.
  • FIG. 3 is an example flowchart of a method for determining subtitle switching provided by at least one embodiment of the present disclosure.
  • determining whether there is a subtitle switching between the first video frame and the second video frame based on whether the subtitle content of the first video frame and the second video frame is the same may include the following step S201 Go to S203, as shown in Figure 3.
  • S203 Based on the start video frame and the end video frame, determine whether there is subtitle switching between the first video frame and the second video frame.
  • start video frame and “end video frame” are used to refer to two video frames determined based on the time information of the corresponding audio segment.
  • start video frame and “End Video Frame” are not limited to a specific video frame, nor are they limited to a specific order.
  • the corresponding audio data can be input into the speech recognition system for speech segmentation, and the speech recognition result and corresponding time information can be obtained.
  • the time information includes the start time and end time of the corresponding audio segment. Based on the speech recognition result and corresponding time information, the audio segment corresponding to the first video frame can be obtained.
  • step S202 according to the recognized time information of the corresponding audio segment, the start video frame and the end video frame corresponding to the audio segment may be determined.
  • step S203 may include: in response to the second video frame being between the start video frame and the end video frame, determining that there is no gap between the first video frame and the second video frame. There is a subtitle switch, and in response to the second video frame not being between the start video frame and the end video frame, it is determined that there is a subtitle switch between the first video frame and the second video frame.
  • a video includes a sequence of video frames, for example, including temporally adjacent video frame 1, video frame 2, video frame 3, video frame 4, video frame 5...
  • For example, assume the first video frame is video frame 2, and the audio segment corresponding to the first video frame is "Where are you going". Based on the time information of the audio segment (for example, the start time and end time of the sentence), the start video frame is determined to be video frame 1 and the end video frame is determined to be video frame 4.
  • the subtitles displayed on the screen from video frame 1 to video frame 4 are all "Where are you going", that is, the same subtitle content is displayed.
  • If the second video frame is video frame 3, which is between video frame 1 and video frame 4, then there is no subtitle switch between the first video frame and the second video frame. If the second video frame is video frame 5, which is not between video frame 1 and video frame 4, then a subtitle switch occurs between the first video frame and the second video frame.
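Assuming frames are indexed in time order, the rule above can be sketched as a small helper (the function name and indices are illustrative, not part of the disclosure):

```python
def has_subtitle_switch(second_frame_idx, start_frame_idx, end_frame_idx):
    """Decide whether a subtitle switch occurs between the first and second
    video frames, based on the start/end video frames of the audio segment
    corresponding to the first video frame (illustrative helper)."""
    # No switch as long as the second frame still falls inside the
    # [start video frame, end video frame] range of the audio segment.
    return not (start_frame_idx <= second_frame_idx <= end_frame_idx)

# Example from the text: the segment "Where are you going" spans
# video frames 1..4, and the first video frame is video frame 2.
print(has_subtitle_switch(3, 1, 4))  # video frame 3 -> False (no switch)
print(has_subtitle_switch(5, 1, 4))  # video frame 5 -> True (switch)
```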
  • In order to determine whether subtitle switching occurs between adjacent video frames, text recognition can also be used in addition to judging by audio.
  • a text recognition algorithm is used to obtain the subtitle content displayed on the first video frame and the second video frame, and after comparison, it is determined whether a subtitle switching has occurred between the first video frame and the second video frame. It should be noted that the embodiments of the present disclosure do not place specific restrictions on the text recognition algorithm, as long as the text content can be recognized.
  • Figure 4 is a schematic flowchart of a text recognition method provided by at least one embodiment of the present disclosure.
  • the coordinates of the text can also be obtained.
  • the obtained text coordinates may be the coordinates of the four vertex positions of the upper left, lower left, upper right, and lower right of a complete subtitle.
  • For example, text detection can be performed on the input image (which may also be a single video frame) to determine the area where the text is located; each character is then segmented separately, and a text classifier (for example, one based on text feature vectors or on neural networks) classifies each character (if the confidence is greater than a certain threshold, the character is considered recognized). Finally, the text recognition result and its coordinates are output. It should be noted that the embodiments of the present disclosure do not limit the specific operations of the text recognition method, and any effective text recognition method can be used.
  • determining whether subtitle switching occurs between adjacent frames of the video may include: obtaining the first recognized text content of the first video frame, obtaining the second recognized text content of the second video frame, and in response to the first recognized text content and the second recognized text content being the same, determining that there is no subtitle switching between the first video frame and the second video frame.
  • The terms "first recognized text content" and "second recognized text content" are used to refer to the recognized text content obtained by performing a text recognition operation on the corresponding video frame. The "first recognized text content" and "second recognized text content" are not limited to specific text content, nor are they limited to a specific order.
  • the scope of application of the text recognition operation can be set in advance. Since the display position of subtitles in the video screen is usually fixed, the approximate area where the subtitles are located can be set in advance.
  • FIG. 5 is a schematic flowchart of another method for determining subtitle switching provided by at least one embodiment of the present disclosure.
  • the video frame insertion processing method 10 may include the following steps S301-S303, as shown in FIG. 5 .
  • Step S301 In response to the first recognized text content being different from the second recognized text content, obtain the first sub-image of the first video frame.
  • the first sub-image corresponds to first subtitle content of the first video frame.
  • Step S302 Obtain the second sub-image of the second video frame, where the second sub-image corresponds to the second subtitle content of the second video frame.
  • Step S303 Based on the first sub-image and the second sub-image, determine whether there is a subtitle switching between the first video frame and the second video frame.
  • The terms "first subtitle content" and "second subtitle content" are respectively used to refer to the subtitle content displayed in the corresponding video frame. The "first subtitle content" and "second subtitle content" are not limited to specific subtitle content, nor are they limited to a specific order.
  • first sub-image “second sub-image” and “third sub-image” are respectively used to refer to the image of the area where the subtitles are located in the corresponding video frame.
  • the "first sub-image”, “second sub-image” and “third sub-image” are not limited to specific images, nor are they limited to a specific order.
  • a text recognition operation is performed on a certain video frame, and the coordinates of the subtitles in the video frame are identified (for example, the four vertices of the upper left, lower left, upper right, and lower right of a complete subtitle The coordinates of the position), based on the coordinates, the area where the subtitles are located in the video frame can be obtained, thereby obtaining the sub-image of the video frame corresponding to the subtitle content.
  • step S303 may include: determining a first similarity between the first sub-image and the second sub-image based on the first sub-image and the second sub-image; in response to the first similarity being greater than the first threshold, determining that there is no subtitle switch between the first video frame and the second video frame; and in response to the first similarity being not greater than the first threshold, determining that there is a subtitle switch between the first video frame and the second video frame.
  • first similarity is used to refer to the image similarity between the subtitle sub-images of two adjacent video frames.
  • second similarity is used to refer to the image similarity between two adjacent video frames.
  • The "first similarity" and "second similarity" are not limited to specific similarity values, nor are they limited to a specific order.
  • The values of the "first threshold", "second threshold", and "third threshold" can be set according to actual needs. The "first threshold", "second threshold", and "third threshold" are not limited to specific values, nor are they limited to a specific order.
  • image similarity between two images may be calculated using various methods. For example, through cosine similarity algorithm, histogram algorithm, perceptual hashing algorithm, algorithm based on mutual information, etc.
  • the embodiments of the present disclosure do not limit the method of calculating image similarity, and can be selected according to actual needs.
  • For example, structural similarity (SSIM) may be used. SSIM is a full-reference image quality evaluation index that measures image similarity from three aspects: luminance, contrast, and structure. The formula for calculating SSIM (reconstructed here in its standard form) is as follows:
  • SSIM(x, y) = ((2μ_x μ_y + c_1)(2σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)(σ_x² + σ_y² + c_2))
  • where μ_x represents the average value of x, μ_y represents the average value of y, σ_x² represents the variance of x, σ_y² represents the variance of y, σ_xy represents the covariance of x and y, c_1 = (k_1 L)² and c_2 = (k_2 L)² are constants used to stabilize the division, and L represents the dynamic range of pixel values.
  • Structural similarity ranges from -1 to 1. The larger the value, the smaller the image distortion. When the two images are exactly the same, the value of SSIM is equal to 1.
  • the "first threshold” may be set to 0.6 or 0.8. It should be noted that the embodiments of the present disclosure do not limit the value of the “first threshold” and can be set according to actual requirements.
  • FIG. 6 is a schematic block diagram of yet another method for determining whether subtitles are switched, provided by at least one embodiment of the present disclosure.
  • the first recognized text content T 0 and the second recognized text content T 1 are obtained, as well as the corresponding coordinates C 0 and C 1 . Then, the text similarity between the first recognized text content T 0 and the second recognized text content T 1 is calculated to determine whether they are the same.
  • If the similarity is greater than a certain threshold, it is deemed that the first recognized text content T 0 and the second recognized text content T 1 are the same, that is, the subtitles are not switched. If the similarity is not greater than the threshold, the similarity between the first sub-image corresponding to subtitle area Z 0 in the first video frame I 0 and the second sub-image corresponding to subtitle area Z 1 in the second video frame I 1 is further determined. For example, as shown in Figure 6, it is determined whether the SSIM of the recognized images within the ranges of coordinate C 0 and coordinate C 1 (i.e., the above-mentioned first sub-image and second sub-image) is greater than a threshold (for example, 0.8). If the SSIM is greater than the threshold, it indicates that the subtitles are not switched; if the SSIM is not greater than the threshold, it indicates that the subtitles have been switched.
  • the embodiments of the present disclosure do not limit the method of calculating text similarity. For example, Euclidean distance, Manhattan distance, cosine similarity and other methods can be used to calculate text similarity. It should also be noted that the embodiments of the present disclosure do not specifically limit the threshold of text similarity, and can be set according to actual needs.
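As one concrete option among the measures mentioned above, text similarity between two recognized subtitle strings can be sketched as cosine similarity over character counts (an illustrative example only; the disclosure does not fix a particular method):

```python
import math
from collections import Counter

def text_cosine_similarity(a, b):
    """Cosine similarity between two strings, using character-count
    vectors (illustrative; word- or embedding-based vectors also work)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one empty string
    return dot / (norm_a * norm_b)

print(text_cosine_similarity("where are you going", "where are you going"))
```

Two identical recognition results yield a similarity of (approximately) 1, which is then compared against the text-similarity threshold.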
  • picture switching may include scene switching in addition to subtitle switching.
  • step S102 may include: determining whether there is a scene switch between the first video frame and the second video frame based on whether the scenes of the first video frame and the second video frame are the same.
  • When a scene switch occurs, the image similarity (such as the SSIM value) between the two frames will be significantly reduced. Therefore, scene segmentation can be achieved by calculating image similarity.
  • determining whether a scene switch occurs between two adjacent video frames may include the following steps: obtaining the second similarity between the first video frame and the second video frame; in response to the second similarity being greater than the second threshold, determining that there is no scene switch between the first video frame and the second video frame; and in response to the second similarity being not greater than the second threshold, determining that there is a scene switch between the first video frame and the second video frame.
  • For example, the second similarity may be structural similarity (SSIM), or the similarity between pictures (i.e., video frames) may be calculated using, for example, a perceptual hashing algorithm or a histogram algorithm; the embodiments of the present disclosure do not limit the algorithm for calculating image similarity.
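The threshold rule above can be sketched as a small decision helper that accepts any image-similarity function (names and the example threshold are illustrative):

```python
def has_scene_switch(frame_a, frame_b, similarity_fn, second_threshold=0.8):
    """Detect a scene switch between two adjacent video frames: a switch
    is declared when the second similarity is NOT greater than the
    second threshold (0.8 is only an example value)."""
    return similarity_fn(frame_a, frame_b) <= second_threshold

# Stand-in similarity functions; in practice similarity_fn could be
# SSIM, a histogram comparison, or a perceptual hash distance.
print(has_scene_switch([1, 2], [1, 2], lambda a, b: 1.0))  # False: same scene
print(has_scene_switch([1, 2], [9, 9], lambda a, b: 0.3))  # True: scene switch
```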
  • For example, take 2x frame interpolation as an example: frames are interpolated from 30 fps (frames per second) to 60 fps, that is, the number of frames transmitted per second increases from 30 to 60. If a picture switch occurs, the frame interpolation operation is no longer performed between the current two frames; instead, two frames are inserted the next time, so that the total number of frames remains unchanged. However, if two consecutive picture switches occur, the frame interpolation operation is skipped twice; if only two frames were inserted the next time, the overall video would have fewer frames.
  • FIG. 7 is a schematic diagram of another video frame insertion processing method provided by at least one embodiment of the present disclosure.
  • the video frame insertion processing method 10, in addition to steps S101-S103, may include: setting a first frame insertion flag; and, in response to a picture switch between the first video frame and the second video frame, modifying the first frame insertion flag into a second frame insertion flag.
  • The terms "first frame insertion flag", "second frame insertion flag", and "third frame insertion flag" refer to frame insertion flags at different time points or different stages, which are used to indicate how many consecutive picture switches have occurred in the video.
  • the "first frame insertion flag”, the “second frame insertion flag” and the “third frame insertion flag” are not limited to specific values, nor are they limited to a specific order.
  • the video includes a sequence of video frames, for example, including temporally adjacent video frame 1, video frame 2, video frame 3, video frame 4, video frame 5...
  • First, a frame insertion flag is set; for example, the frame insertion flag Flag is initialized to (0, 0). Two adjacent video frames are then input, for example, a first video frame and a second video frame; assume that the first video frame is video frame 2 and the second video frame is video frame 3.
  • Whether there is a picture switch (subtitle switch or scene switch) between video frame 2 and video frame 3 is determined by the method described in the above embodiments. If there is a picture switch between video frame 2 and video frame 3, the frame insertion flag Flag is changed from (0, 0) to (0, 1).
  • a value "1” is appended to the interpolated frame flag Flag (0,0), and the previous value "0” is popped up, that is, updated The subsequent inserted frame mark is (0,1).
  • a value "0” is added to the frame insertion flag Flag(0,0), and the previous value "0” pops up, that is, the updated frame insertion flag is (0,0).
  • the frame insertion flag can also be initialized to other values, such as (1,1), (0,0,0), etc., and the embodiments of the present disclosure do not limit this.
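The push/pop behavior of the frame insertion flag described above can be sketched with a fixed-length deque (a minimal sketch; the length of 2 matches the at-most-two-consecutive-switches example, and other initial values such as (1, 1) or (0, 0, 0) would simply change the constructor):

```python
from collections import deque

def make_flag():
    # Initialize the frame insertion flag Flag to (0, 0).
    return deque([0, 0], maxlen=2)

def update_flag(flag, picture_switch):
    # Push 1 on a picture switch (0 otherwise); because the deque has a
    # fixed maximum length, the oldest value is popped automatically.
    flag.append(1 if picture_switch else 0)
    return tuple(flag)

flag = make_flag()
print(update_flag(flag, True))   # (0, 1): one picture switch
print(update_flag(flag, True))   # (1, 1): two consecutive switches
print(update_flag(flag, False))  # (1, 0): no switch this time
```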
  • For example, a fourth video frame is acquired in response to a picture switch between the first video frame and the second video frame. Based on the second video frame and the fourth video frame, a second comparison result between the second video frame and the fourth video frame is obtained, and it is determined based on the second comparison result whether to insert a frame between the second video frame and the fourth video frame.
  • the fourth video frame and the second video frame are adjacent in the time domain, and the second video frame is a forward frame of the fourth video frame.
  • the second comparison result indicates whether there is a picture switch between the second video frame and the fourth video frame.
  • determining whether to insert a frame between the second video frame and the fourth video frame based on the second comparison result includes: in response to the second comparison result indicating that there is no picture switch between the second video frame and the fourth video frame, inserting multiple video frames between the second video frame and the fourth video frame, where the number of inserted frames is determined based on the second frame insertion flag.
  • determining whether to insert a frame between the second video frame and the fourth video frame based on the second comparison result includes: in response to the second comparison result indicating that there is a picture switch between the second video frame and the fourth video frame, determining not to insert a frame between the second video frame and the fourth video frame, and modifying the second frame insertion flag into a third frame insertion flag.
  • the third frame insertion flag is used to indicate the frame number of the next frame insertion.
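One plausible reading of "the number of inserted frames is based on the flag" for 2x interpolation is that each picture switch recorded in the flag defers one frame to the next interpolation opportunity. This is an illustrative sketch of that reading, not the exact rule of the disclosure:

```python
def frames_to_insert(flag):
    """Number of frames to insert at the next interpolation opportunity
    under 2x interpolation: one regular interpolated frame plus one
    deferred frame per picture switch recorded in the flag
    (illustrative interpretation)."""
    return 1 + sum(flag)

print(frames_to_insert((0, 0)))  # 1: normal 2x interpolation
print(frames_to_insert((0, 1)))  # 2: one interpolation was skipped
print(frames_to_insert((1, 1)))  # 3: two consecutive switches skipped
```

This keeps the total frame count of the video unchanged despite skipped interpolations.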
  • the "fourth video frame” is used to refer to the subsequent frame of image that is temporally adjacent to the "second video frame”.
  • the fourth video frame is not limited to a specific frame of image, nor is it Subject to a specific order.
  • "Second comparison result” is used to refer to the comparison result between two adjacent frames of images in the video (the second video frame and the fourth video frame). It is not limited to a specific comparison result, nor is it limited to a specific comparison result. Limited to a specific order.
  • the video includes a sequence of video frames, for example, including temporally adjacent video frame 1, video frame 2, video frame 3, video frame 4, video frame 5...
  • the first video frame is Video frame 1
  • the second video frame is video frame 2
  • the fourth video frame is video frame 3.
  • two adjacent video frames namely video frame 2 and video frame 3 are input, and whether there is a picture switch (subtitle switch or scene switch) between video frame 2 and video frame 3 is determined by the method provided in the above embodiment. For example, if it is determined that there is no picture switching between video frame 2 and video frame 3, a frame interpolation operation is performed between video frame 2 and video frame 3.
  • At this time, the frame insertion flag Flag is (0, 1), indicating that a picture switch occurred previously (that is, no frame was inserted between video frame 1 and video frame 2). If a picture switch is also detected between video frame 2 and video frame 3, the frame interpolation operation is not performed between video frame 2 and video frame 3, and the frame insertion flag Flag is changed from (0, 1) to (1, 1); that is, a value "1" is pushed into the frame insertion flag Flag (0, 1) and the oldest value "0" is popped.
  • the frame insertion flag Flag(1,1) can indicate that two consecutive screen switches have occurred in the video frame sequence. For example, there is a picture switch between video frame 1 and video frame 2, and there is still a picture switch between video frame 2 and video frame 3. For example, through similar operations, continue to compare video frame 3 and video frame 4.
  • It should be noted that the above embodiment of the present disclosure takes at most two consecutive picture switches as an example, and accordingly initializes the frame insertion flag to (0, 0). The embodiments of the present disclosure do not limit this, and the flag can be set according to actual needs.
  • FIG. 8 is a schematic flow chart of a frame insertion post-processing method provided by at least one embodiment of the present disclosure.
  • the video frame insertion processing method 10 further includes the following steps S401-S403, as shown in FIG. 8 .
  • Step S401 In response to inserting a third video frame between the first video frame and the second video frame, obtain a first sub-image of the first video frame.
  • the first sub-image corresponds to first subtitle content in the first video frame.
  • Step S402 Obtain the third sub-image of the third video frame.
  • the third sub-image corresponds to third subtitle content in the third video frame.
  • Step S403 Based on the first sub-image and the third sub-image, determine whether to replace the third video frame with the first video frame.
  • step S403 may include: obtaining the pixel value of the first pixel in the first sub-image; setting the pixel value of the third pixel in the third sub-image based on the pixel value of the first pixel in the first sub-image; and determining, based on the first sub-image and the set third sub-image, whether to replace the third video frame with the first video frame.
  • the pixel value of the first pixel is greater than the third threshold, and the relative position of the third pixel in the third sub-image is the same as the relative position of the first pixel in the first sub-image.
  • The relative position of the third pixel in the third sub-image being the same as the relative position of the first pixel in the first sub-image can be understood as follows: taking the upper left corner vertex of the first sub-image as the coordinate origin, the position coordinates of the first pixel in that coordinate system are the same as the position coordinates of the third pixel in the coordinate system whose origin is the upper left corner vertex of the third sub-image.
  • FIG. 9 is a schematic diagram of another video frame insertion processing method provided by at least one embodiment of the present disclosure.
  • For example, after a third video frame is inserted between the first video frame and the second video frame, further post-processing can be performed.
  • For example, the pixels (i.e., the first pixels) whose values are greater than a certain threshold (i.e., the third threshold) can be selected from the first sub-image of the first video frame (i.e., the area corresponding to the recognized coordinate C 0 ). For example, the third threshold is set to 220, where the pixel value range is generally 0-255. The values of the first pixels are assigned to the pixels (i.e., the third pixels) located at the same positions in the third sub-image (i.e., the area corresponding to the recognized coordinate C t ). For example, in Figure 9, the third sub-image after assignment is denoted as C t '.
  • If the interpolated subtitles are deformed, the deformation will usually significantly exceed the range of the original characters. Therefore, by comparing the first sub-image and the assigned third sub-image, it can be determined whether the interpolated subtitles are obviously deformed.
  • For example, the first sub-image and the assigned third sub-image are compared: the pixel values of corresponding pixels of the two sub-images are subtracted to determine the pixel difference. If the number of pixels whose difference is greater than a certain threshold (for example, 150) exceeds another threshold (for example, 30), it is considered that the interpolated subtitles are obviously deformed, and the deformed interpolated frame (i.e., the third video frame) is replaced with the first video frame. The embodiments of the present disclosure do not limit the specific threshold values. In this way, deformation problems caused by large movements in the subtitle background can be avoided.
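Using the example thresholds stated above (220 for bright subtitle pixels, 150 for the per-pixel difference, 30 for the pixel count), the post-processing check can be sketched as follows; the sub-images are given as flat lists of grayscale values, and all names and the exact comparison are illustrative:

```python
def should_replace_interpolated_frame(first_sub, third_sub,
                                      third_threshold=220,
                                      diff_threshold=150,
                                      count_threshold=30):
    """Decide whether the interpolated subtitle region looks deformed,
    in which case the third video frame is replaced by the first."""
    # Step 1 (assignment): copy bright subtitle pixels of the first
    # sub-image into the third sub-image at the same relative positions.
    assigned = [p if p > third_threshold else q
                for p, q in zip(first_sub, third_sub)]
    # Step 2 (comparison): count pixels whose difference still exceeds
    # diff_threshold; deformed subtitles spill outside the original
    # character strokes, producing many large differences.
    deformed = sum(1 for p, q in zip(first_sub, assigned)
                   if abs(p - q) > diff_threshold)
    return deformed > count_threshold
```

For example, a dark subtitle background in the first sub-image facing many bright pixels in the interpolated sub-image yields a large count and triggers the replacement.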
  • Figure 10 is a schematic block diagram of a video frame insertion processing method provided by at least one embodiment of the present disclosure.
  • a video frame insertion processing method provided by at least one embodiment of the present disclosure can not only solve the deformation problem caused by scene switching and subtitle switching, but can also, through post-processing after frame interpolation, solve the obvious deformation problem caused by large movements in the subtitle background.
  • the operations in each block of the method in Figure 10 have been described in detail above and will not be described again here.
  • With the video frame insertion processing method 10 provided by at least one embodiment of the present disclosure, it is possible to solve the obvious deformation problems caused by picture switching and by large movements of the subtitle background during frame interpolation, thereby ensuring the fluency of the video and improving the user's viewing experience.
  • The execution order of the steps of the video frame insertion processing method 10 is not limited. Although the execution process of each step is described above in a specific order, this does not constitute a limitation on the embodiments of the present disclosure.
  • Each step in the video frame insertion processing method 10 can be executed serially or in parallel, which can be determined according to actual requirements.
  • the video frame insertion processing method 10 may also include more or fewer steps, which are not limited by the embodiments of the present disclosure.
  • At least one embodiment of the present disclosure also provides a video frame interpolation processing device.
  • the video frame interpolation processing device can selectively perform frame interpolation processing according to the comparison results between adjacent video frames, thereby effectively avoiding the obvious deformation problem caused by picture switching during frame interpolation, ensuring the smoothness of the video and improving the user's viewing experience.
  • Figure 11 is a schematic block diagram of a video frame insertion processing device provided by at least one embodiment of the present disclosure.
  • the video frame insertion processing device 80 includes an acquisition module 801, a comparison module 802, and an operation module 803.
  • the acquisition module 801 is configured to acquire the first video frame and the second video frame of the video.
  • the first video frame and the second video frame are adjacent in the time domain, and the first video frame is a forward frame of the second video frame.
  • The acquisition module 801 can implement step S101. For its specific implementation method, please refer to the relevant description of step S101, which will not be repeated here.
  • the comparison module 802 is configured to obtain a first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame. The first comparison result indicates whether there is a picture switch between the first video frame and the second video frame.
  • the comparison module 802 can implement step S102, and the specific implementation method can refer to the relevant description of step S102, which will not be described again here.
  • the operation module 803 is configured to determine whether to insert a frame between the first video frame and the second video frame based on the first comparison result.
  • The operation module 803 can implement step S103. For its specific implementation method, reference can be made to the relevant description of step S103, which will not be repeated here.
  • acquisition module 801, comparison module 802 and operation module 803 can be implemented by software, hardware, firmware or any combination thereof.
  • they can be implemented as acquisition circuit 801, comparison circuit 802 and operation circuit 803 respectively.
  • the disclosed embodiments are not limited to their specific implementations.
  • video frame interpolation processing device 80 provided by the embodiment of the present disclosure can implement the foregoing video frame interpolation processing method 10, and can also achieve similar technical effects to the foregoing video frame interpolation processing method 10, which will not be described again here.
  • the video frame insertion processing device 80 may include more or fewer circuits or units, and the connection relationship between the various circuits or units is not limited and can be determined according to actual needs.
  • the specific construction method of each circuit is not limited. It can be composed of analog devices according to the circuit principle, or it can be composed of digital chips, or it can be constructed in other suitable ways.
  • Figure 12 is a schematic block diagram of another video frame insertion processing device provided by at least one embodiment of the present disclosure.
  • At least one embodiment of the present disclosure also provides a video frame insertion processing device 90.
  • the video frame insertion processing device 90 includes a processor 910 and a memory 920 .
  • Memory 920 includes one or more computer program modules 921.
  • One or more computer program modules 921 are stored in the memory 920 and configured to be executed by the processor 910.
  • For example, the one or more computer program modules 921 include instructions for performing the video frame interpolation processing method 10 provided by at least one embodiment of the present disclosure, and when executed by the processor 910, these instructions can perform one or more steps of the video frame interpolation processing method 10.
  • Memory 920 and processor 910 may be interconnected by a bus system and/or other forms of connection mechanisms (not shown).
  • the processor 910 may be a central processing unit (CPU), a digital signal processor (DSP), or other forms of processing units with data processing capabilities and/or program execution capabilities, such as a field programmable gate array (FPGA), etc.;
  • the central processing unit (CPU) can be of an X86 or ARM architecture, etc.
  • the processor 910 can be a general-purpose processor or a special-purpose processor, and can control other components in the video frame insertion processing device 90 to perform desired functions.
  • memory 920 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), etc.
  • Non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disk read-only memory (CD-ROM), USB memory, flash memory, and the like.
  • One or more computer program modules 921 can be stored on a computer-readable storage medium, and the processor 910 can run one or more computer program modules 921 to implement various functions of the video frame insertion processing device 90 .
  • FIG. 13 is a schematic block diagram of yet another video frame insertion processing device 300 provided by at least one embodiment of the present disclosure.
  • Terminal devices in embodiments of the present disclosure may include, but are not limited to, mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (Tablets), PMPs (Portable Multimedia Players), vehicle-mounted terminals (such as Mobile terminals such as car navigation terminals) and fixed terminals such as digital TVs, desktop computers, etc.
  • the video frame insertion processing device 300 shown in FIG. 13 is only an example and should not impose any restrictions on the functions and scope of use of the embodiments of the present disclosure.
  • the video frame insertion processing device 300 includes a processing device (such as a central processing unit, a graphics processor, etc.) 301, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303.
  • In the RAM 303, various programs and data required for the operation of the computer system are also stored.
  • Processing device 301, ROM 302 and RAM 303 are connected via bus 304.
  • An input/output (I/O) interface 305 is also connected to bus 304.
  • the following components may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; including, for example, a liquid crystal display (LCD), speakers, vibration an output device 307 such as a processor; a storage device 308 including a magnetic tape, a hard disk, etc.; and a communication device 309 including a network interface card such as a LAN card, a modem, etc.
  • the communication device 309 can allow the video frame insertion processing device 300 to perform wireless or wired communication with other devices to exchange data, and to perform communication processing via a network such as the Internet.
  • Driver 310 is also connected to I/O interface 305 as needed.
  • FIG. 13 illustrates the video frame insertion processing apparatus 300 including various devices, it should be understood that implementation or inclusion of all illustrated devices is not required. More or fewer means may alternatively be implemented or included.
  • the video frame insertion processing device 300 may further include a peripheral interface (not shown in the figure) and the like.
  • the peripheral interface can be various types of interfaces, such as a USB interface, a lightning interface, etc.
  • the communication device 309 may communicate via wireless communication with networks and other devices, the networks being, for example, the Internet, an intranet, and/or a wireless network such as a cellular telephone network, a wireless local area network (LAN), and/or a metropolitan area network (MAN).
  • wireless communication can use any of a variety of communication standards, protocols and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wi-Fi (e.g., based on the IEEE 802.11a, IEEE 802.11b, IEEE 802.11g and/or IEEE 802.11n standards), voice over Internet Protocol (VoIP), Wi-MAX, protocols for email, instant messaging and/or short message service (SMS), or any other suitable communication protocol.
  • the video frame insertion processing device 300 can be any device such as a mobile phone, a tablet computer, a notebook computer, an e-book reader, a game console, a television, a digital photo frame, a navigator, etc., or any combination of a data processing apparatus and hardware; the embodiments of the present disclosure do not limit this.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from a network via the communication device 309, or installed from the storage device 308, or installed from the ROM 302. When the computer program is executed by the processing device 301, the video frame insertion processing method 10 disclosed in the embodiments of the present disclosure is executed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer-readable storage media can include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
  • the computer-readable medium may be included in the video frame insertion processing device 300; it may also exist independently without being assembled into the video frame insertion processing device 300.
  • Figure 14 is a schematic block diagram of a non-transitory readable storage medium provided by at least one embodiment of the present disclosure.
  • Embodiments of the present disclosure also provide a non-transitory readable storage medium.
  • computer instructions 111 are stored on the non-transitory readable storage medium 140. When executed by a processor, the computer instructions 111 perform one or more steps of the video frame insertion processing method 10 as described above.
  • the non-transitory readable storage medium 140 may be any combination of one or more computer-readable storage media. For example, one computer-readable storage medium contains computer-readable program code for obtaining the first video frame and the second video frame of a video, another computer-readable storage medium contains computer-readable program code for obtaining a first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame, and yet another computer-readable storage medium contains computer-readable program code for determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame.
  • each of the above program codes can also be stored in the same computer-readable medium, and embodiments of the present disclosure do not limit this.
  • when the program code is read by a computer, the computer can execute the program code stored in the computer storage medium to perform, for example, the video frame interpolation processing method 10 provided by any embodiment of the present disclosure.
  • the storage medium may include a memory card of a smartphone, a storage component of a tablet computer, a hard drive of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disk read-only memory (CD-ROM), flash memory, or any combination of the above storage media, and may also be another suitable storage medium.
  • the readable storage medium may also be the memory 920 in FIG. 12. For related descriptions, reference may be made to the foregoing content, which will not be described again here.
  • Embodiments of the present disclosure also provide an electronic device.
  • Figure 15 is a schematic block diagram of an electronic device according to at least one embodiment of the present disclosure.
  • the electronic device 120 may include the video frame insertion processing device 80/90/300 as described above.
  • the electronic device 120 can implement the video frame insertion processing method 10 provided by any embodiment of the present disclosure.
  • the term “plurality” refers to two or more than two, unless expressly limited otherwise.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Systems (AREA)

Abstract

A video frame interpolation processing method and apparatus, and a storage medium. The video frame interpolation processing method includes: (S101) obtaining a first video frame and a second video frame of a video; (S102) obtaining, based on the first video frame and the second video frame, a first comparison result between the first video frame and the second video frame; and (S103) determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame. The first video frame and the second video frame are adjacent in the time domain, and the first video frame is the forward frame of the second video frame. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame. By comparing adjacent video frames and selectively performing the frame interpolation operation, the method effectively avoids the obvious deformation caused by picture switches during frame interpolation, ensures the smoothness of the video, and thus improves the user's viewing experience.

Description

Video frame interpolation processing method, video frame interpolation processing apparatus, and readable storage medium — Technical Field
Embodiments of the present disclosure relate to a video frame interpolation processing method, a video frame interpolation processing apparatus, and a non-transitory readable storage medium.
Background
Video processing is a typical application of artificial intelligence, and video frame interpolation is a typical technique in video processing. It aims to synthesize smoothly transitioning intermediate video frames from the preceding and following frames of a video, so that video playback is more fluent and the user's viewing experience is improved. For example, a 24 fps video can be converted into a 48 fps video through frame interpolation, so that the video looks clearer and smoother to the viewer.
Summary
At least one embodiment of the present disclosure provides a video frame interpolation processing method, including: obtaining a first video frame and a second video frame of a video; obtaining, based on the first video frame and the second video frame, a first comparison result between the first video frame and the second video frame; and determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame. The first video frame and the second video frame are adjacent in the time domain, and the first video frame is the forward frame of the second video frame. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame.
For example, in a method provided by at least one embodiment of the present disclosure, the picture switch includes a subtitle switch and/or a scene switch.
For example, in a method provided by at least one embodiment of the present disclosure, obtaining the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame includes: determining whether the subtitle switch exists between the first video frame and the second video frame based on whether the subtitle content of the first video frame and that of the second video frame are the same.
For example, in a method provided by at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on whether their subtitle contents are the same includes: obtaining an audio segment corresponding to the first video frame; obtaining, based on the audio segment, a start video frame and an end video frame corresponding to the audio segment; and determining, based on the start video frame and the end video frame, whether the subtitle switch exists between the first video frame and the second video frame.
For example, in a method provided by at least one embodiment of the present disclosure, determining, based on the start video frame and the end video frame, whether the subtitle switch exists between the first video frame and the second video frame includes: in response to the second video frame being between the start video frame and the end video frame, determining that the subtitle switch does not exist between the first video frame and the second video frame; and in response to the second video frame not being between the start video frame and the end video frame, determining that the subtitle switch exists between the first video frame and the second video frame.
For example, in a method provided by at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on whether their subtitle contents are the same includes: obtaining first recognized text content of the first video frame; obtaining second recognized text content of the second video frame; and in response to the first recognized text content and the second recognized text content being the same, determining that the subtitle switch does not exist between the first video frame and the second video frame.
For example, in a method provided by at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on whether their subtitle contents are the same further includes: in response to the first recognized text content and the second recognized text content being different: obtaining a first sub-image of the first video frame; obtaining a second sub-image of the second video frame; and determining, based on the first sub-image and the second sub-image, whether the subtitle switch exists between the first video frame and the second video frame. The first sub-image corresponds to first subtitle content of the first video frame, and the second sub-image corresponds to second subtitle content of the second video frame.
For example, in a method provided by at least one embodiment of the present disclosure, determining, based on the first sub-image and the second sub-image, whether the subtitle switch exists between the first video frame and the second video frame includes: determining, based on the first sub-image and the second sub-image, a first similarity between the first sub-image and the second sub-image; in response to the first similarity being greater than a first threshold, determining that the subtitle switch does not exist between the first video frame and the second video frame; and in response to the first similarity being not greater than the first threshold, determining that the subtitle switch exists between the first video frame and the second video frame.
For example, in a method provided by at least one embodiment of the present disclosure, obtaining the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame includes: determining whether the scene switch exists between the first video frame and the second video frame based on whether the scene of the first video frame and that of the second video frame are the same.
For example, in a method provided by at least one embodiment of the present disclosure, determining whether the scene switch exists between the first video frame and the second video frame based on whether their scenes are the same includes: obtaining a second similarity between the first video frame and the second video frame; in response to the second similarity being greater than a second threshold, determining that the scene switch does not exist between the first video frame and the second video frame; and in response to the second similarity being not greater than the second threshold, determining that the scene switch exists between the first video frame and the second video frame.
For example, in a method provided by at least one embodiment of the present disclosure, determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame includes: in response to the first comparison result indicating that the picture switch does not exist between the first video frame and the second video frame, determining to insert a frame between the first video frame and the second video frame; and in response to the first comparison result indicating that the picture switch exists between the first video frame and the second video frame, determining not to insert a frame between the first video frame and the second video frame.
For example, a method provided by at least one embodiment of the present disclosure further includes: setting a first interpolation flag; and in response to the picture switch existing between the first video frame and the second video frame, modifying the first interpolation flag into a second interpolation flag.
For example, a method provided by at least one embodiment of the present disclosure further includes: in response to the picture switch existing between the first video frame and the second video frame, obtaining a fourth video frame; obtaining, based on the second video frame and the fourth video frame, a second comparison result between the second video frame and the fourth video frame; and determining, based on the second comparison result, whether to insert frames between the second video frame and the fourth video frame. The fourth video frame and the second video frame are adjacent in the time domain, and the second video frame is the forward frame of the fourth video frame. The second comparison result indicates whether the picture switch exists between the second video frame and the fourth video frame.
For example, in a method provided by at least one embodiment of the present disclosure, determining, based on the second comparison result, whether to insert frames between the second video frame and the fourth video frame includes: in response to the second comparison result indicating that the picture switch does not exist between the second video frame and the fourth video frame, inserting multiple video frames between the second video frame and the fourth video frame. The number of the multiple video frames is based on the second interpolation flag.
For example, in a method provided by at least one embodiment of the present disclosure, determining, based on the second comparison result, whether to insert frames between the second video frame and the fourth video frame includes: in response to the second comparison result indicating that the picture switch exists between the second video frame and the fourth video frame, determining not to insert video frames between the second video frame and the fourth video frame; and modifying the second interpolation flag into a third interpolation flag, where the third interpolation flag is used to indicate the number of frames to be inserted next time.
For example, a method provided by at least one embodiment of the present disclosure further includes: in response to a third video frame being inserted between the first video frame and the second video frame, obtaining a first sub-image of the first video frame, obtaining a third sub-image of the third video frame, and determining, based on the first sub-image and the third sub-image, whether to replace the third video frame with the first video frame. The first sub-image corresponds to first subtitle content in the first video frame, and the third sub-image corresponds to third subtitle content in the third video frame.
For example, in a method provided by at least one embodiment of the present disclosure, determining, based on the first sub-image and the third sub-image, whether to replace the third video frame with the first video frame includes: obtaining pixel values of first pixels in the first sub-image; setting, based on the pixel values of the first pixels of the first sub-image, pixel values of third pixels of the third sub-image; and determining, based on the first sub-image and the set third sub-image, whether to replace the third video frame with the first video frame. The pixel values of the first pixels are greater than a third threshold, and the relative positions of the third pixels in the third sub-image are the same as the relative positions of the first pixels in the first sub-image.
At least one embodiment of the present disclosure further provides a video frame interpolation processing apparatus, including: an obtaining module, a comparison module, and an operation module. The obtaining module is configured to obtain a first video frame and a second video frame of a video. The first video frame and the second video frame are adjacent in the time domain, and the first video frame is the forward frame of the second video frame. The comparison module is configured to obtain, based on the first video frame and the second video frame, a first comparison result between the first video frame and the second video frame. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame. The operation module is configured to determine, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame.
At least one embodiment of the present disclosure further provides a video frame interpolation processing apparatus, including: a processor and a memory. The memory includes one or more computer program modules. The one or more computer program modules are stored in the memory and configured to be executed by the processor, and the one or more computer program modules include instructions for performing the video frame interpolation processing method of any of the above embodiments.
At least one embodiment of the present disclosure further provides a non-transitory readable storage medium storing computer instructions. When executed by a processor, the computer instructions perform the video frame interpolation processing method of any of the above embodiments.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of the present disclosure and are not limitations of the present disclosure.
FIG. 1 is a schematic diagram of a video frame interpolation method provided by at least one embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a video frame interpolation processing method provided by at least one embodiment of the present disclosure;
FIG. 3 is a flowchart of a method for determining a subtitle switch provided by at least one embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of a text recognition method provided by at least one embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of another method for determining whether subtitles switch provided by at least one embodiment of the present disclosure;
FIG. 6 is a schematic block diagram of yet another method for determining whether subtitles switch provided by at least one embodiment of the present disclosure;
FIG. 7 is a schematic diagram of another video frame interpolation processing method provided by at least one embodiment of the present disclosure;
FIG. 8 is a schematic flowchart of a post-processing method provided by at least one embodiment of the present disclosure;
FIG. 9 is a schematic diagram of another video frame interpolation processing method provided by at least one embodiment of the present disclosure;
FIG. 10 is a schematic block diagram of yet another video frame interpolation processing method provided by at least one embodiment of the present disclosure;
FIG. 11 is a schematic block diagram of a video frame interpolation processing apparatus provided by at least one embodiment of the present disclosure;
FIG. 12 is a schematic block diagram of another video frame interpolation processing apparatus provided by at least one embodiment of the present disclosure;
FIG. 13 is a schematic block diagram of yet another video frame interpolation processing apparatus provided by at least one embodiment of the present disclosure;
FIG. 14 is a schematic block diagram of a non-transitory readable storage medium provided by at least one embodiment of the present disclosure;
FIG. 15 is a schematic block diagram of an electronic device provided by at least one embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present disclosure. Based on the described embodiments of the present disclosure, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the protection scope of the present disclosure.
Flowcharts are used in the present disclosure to illustrate the operations performed by the system according to the embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed precisely in order. Instead, the steps may be processed in reverse order or simultaneously as needed. Meanwhile, other operations may be added to these processes, or one or more steps may be removed from them.
Unless otherwise defined, the technical or scientific terms used in the present disclosure shall have the ordinary meanings understood by a person with ordinary skill in the field to which the present disclosure belongs. The terms "first", "second", and similar words used in the present disclosure do not denote any order, quantity, or importance, but are only used to distinguish different components. Likewise, words such as "a", "an", or "the" do not denote a limitation of quantity, but rather the presence of at least one. Words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connect" or "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", etc. are only used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.
FIG. 1 is a schematic diagram of a video frame interpolation method provided by at least one embodiment of the present disclosure.
As shown in FIG. 1, video frame interpolation typically synthesizes an intermediate frame between two consecutive frames of a video, to increase the frame rate and enhance visual quality. In addition, video frame interpolation can support various applications, such as slow-motion generation, video compression, and training-data generation for video motion deblurring. For example, video frame interpolation can use an optical-flow prediction algorithm to predict an intermediate frame and insert it between two frames. Optical flow, like the flow of light, is a way of representing the direction of motion of objects in an image by color. An optical-flow prediction algorithm usually predicts an intermediate frame from the two frames before and after it; after the predicted image is inserted, the video looks smoother. For example, as shown in FIG. 1, a network estimates intermediate flow information from two consecutive input frames, a coarse result is obtained by backward-warping the input frames, and this result is fed together with the input frames and the intermediate flow information into a fusion network, finally producing the intermediate frame.
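For contrast with the flow-based pipeline described above, the sketch below is the simplest possible interpolator: a per-pixel linear blend of the two input frames. This is a toy illustration only (correct only for static content, with no motion estimation); the frame layout as nested lists is an assumption for the example.

```python
def interpolate_midframe(frame0, frame1, t=0.5):
    """Naively blend two grayscale frames pixel-by-pixel into an intermediate frame.

    A real interpolator warps pixels along estimated optical flow; this toy
    version just linearly mixes co-located pixel values at time t in (0, 1).
    """
    assert len(frame0) == len(frame1)
    mid = []
    for row0, row1 in zip(frame0, frame1):
        mid.append([round((1 - t) * p0 + t * p1) for p0, p1 in zip(row0, row1)])
    return mid

# Two tiny 2x3 grayscale "frames"
f0 = [[0, 10, 20], [30, 40, 50]]
f1 = [[100, 110, 120], [130, 140, 150]]
print(interpolate_midframe(f0, f1))  # [[50, 60, 70], [80, 90, 100]]
```

A flow-based method replaces the co-located blend with a blend of motion-compensated samples, which is exactly where a scene or subtitle switch makes the estimated flow (and thus the warp) meaningless.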
At present, commonly used video frame interpolation algorithms cannot handle deformation problems well, for example, deformation caused by scene switches, subtitle switches, etc. Most video frame interpolation algorithms need to use information from the preceding and following frames of the video. When the subtitles/scene of the preceding and following frames switch, the optical flow information between them cannot be correctly estimated, so obvious deformation occurs.
To overcome at least the above technical problems, at least one embodiment of the present disclosure provides a video frame interpolation processing method, including: obtaining a first video frame and a second video frame of a video; obtaining, based on the first video frame and the second video frame, a first comparison result between the first video frame and the second video frame; and determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame. The first video frame and the second video frame are adjacent in the time domain, and the first video frame is the forward frame of the second video frame. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame.
Correspondingly, at least one embodiment of the present disclosure further provides a video frame interpolation processing apparatus and a non-transitory readable storage medium corresponding to the above video frame interpolation processing method.
The video frame interpolation processing method provided by at least one embodiment of the present disclosure can solve the obvious deformation problem caused by picture switches during frame interpolation, ensure the smoothness of the video, and thus improve the user's viewing experience.
The video frame interpolation processing method provided according to at least one embodiment of the present disclosure is described below in a non-limiting manner through several examples or embodiments. As described below, different features in these specific examples or embodiments may be combined with each other without conflicting, so as to obtain new examples or embodiments, which also fall within the protection scope of the present disclosure.
FIG. 2 is a schematic flowchart of a video frame interpolation processing method provided by at least one embodiment of the present disclosure.
At least one embodiment of the present disclosure provides a video frame interpolation processing method 10, as shown in FIG. 2. For example, the video frame interpolation processing method 10 can be applied to any scenario that requires video frame interpolation, for example, to various video products and services such as TV series, movies, documentaries, advertisements, and music videos, and can also be applied in other aspects; the embodiments of the present disclosure do not limit this. As shown in FIG. 2, the video frame interpolation processing method 10 may include the following steps S101 to S103.
Step S101: obtain a first video frame and a second video frame of a video. The first video frame and the second video frame are adjacent in the time domain, and the first video frame is the forward frame of the second video frame.
Step S102: obtain, based on the first video frame and the second video frame, a first comparison result between the first video frame and the second video frame. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame.
Step S103: determine, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame.
It should be noted that, in the embodiments of the present disclosure, the "first video frame" and "second video frame" refer to any two temporally consecutive or adjacent frames in a video or a video frame sequence. The "first video frame" refers to the earlier of the two temporally adjacent frames, the "second video frame" refers to the later of the two, and the "third video frame" refers to an intermediate or inserted frame inserted between the two temporally adjacent frames. The "first video frame", "second video frame", and "third video frame" are not limited to a specific frame, nor to a specific order. The "first comparison result" refers to the comparison result between two adjacent frames in a video and is not limited to a specific comparison result or a specific order. It should also be noted that the embodiments of the present disclosure use the forward frame of two adjacent frames as the reference; the backward frame may also be used as the reference, as long as this is consistent throughout the video frame interpolation processing method.
For example, in at least one embodiment of the present disclosure, for step S102, in order to avoid the deformation problem caused by a picture switch between the preceding and following frames of the video, the adjacent first video frame and second video frame may be compared to determine whether a picture switch exists between them.
For example, in at least one embodiment of the present disclosure, for step S103, whether to perform the frame interpolation operation between the first video frame and the second video frame may be determined based on the first comparison result between them. For example, in some examples, the frame interpolation operation may compute an intermediate/inserted frame based on the adjacent first and second video frames through an optical-flow prediction method.
It should be noted that the embodiments of the present disclosure do not specifically limit the method of obtaining the intermediate/inserted frame (i.e., the third video frame); various conventional frame interpolation methods may be used to obtain the third video frame. For example, the intermediate/inserted frame may be generated based on the two adjacent video frames, based on more adjacent frames, or based on one or more specific video frames; the present disclosure does not limit this, and it can be set according to the actual situation. For example, in at least one embodiment of the present disclosure, step S103 may include: in response to the first comparison result indicating that no picture switch exists between the first video frame and the second video frame, determining to insert a frame between them; and in response to the first comparison result indicating that a picture switch exists between the first video frame and the second video frame, determining not to insert a frame between them.
Therefore, in the video frame interpolation processing method 10 provided by at least one embodiment of the present disclosure, the frame interpolation operation is performed selectively according to the comparison result between adjacent video frames, thereby effectively avoiding the obvious deformation problem caused by picture switches during frame interpolation, ensuring the smoothness of the video, and improving the user's viewing experience.
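The selective decision of step S103 can be sketched as a single predicate. The `has_picture_switch` callback below is a hypothetical stand-in for the subtitle/scene comparisons of step S102; treating frames as plain labels is an illustration only.

```python
def should_interpolate(frame_a, frame_b, has_picture_switch):
    """Decide whether to synthesize a frame between two adjacent frames.

    has_picture_switch is any predicate returning True when a subtitle or
    scene switch is detected between the frames (step S102's comparison).
    """
    return not has_picture_switch(frame_a, frame_b)

# Toy predicate: treat frames as scene labels; a "switch" is any label change.
switch = lambda a, b: a != b
print(should_interpolate("scene1", "scene1", switch))  # True -> interpolate
print(should_interpolate("scene1", "scene2", switch))  # False -> skip
```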
For example, in at least one embodiment of the present disclosure, the picture switch between the first video frame and the second video frame may include a subtitle switch, a scene switch, etc.; the embodiments of the present disclosure do not limit this.
For example, in one example, the subtitle in the first video frame is "Where are you going", and the subtitle in the second video frame is "I am going to school". The subtitle in the first video frame and that in the second video frame are different, so a subtitle switch can be considered to have occurred between the first video frame and the second video frame. It should be noted that the embodiments of the present disclosure do not limit the subtitle content.
For another example, in one example, the scene in the first video frame is a shopping mall and the scene in the second video frame is a school; since the scenes of the two frames are different, a scene switch can be considered to have occurred between the first video frame and the second video frame. It should be noted that, in the embodiments of the present disclosure, the scene in each video frame may be any scene such as a shopping mall, a school, a scenic spot, etc.; the embodiments of the present disclosure do not limit this.
For example, in at least one embodiment of the present disclosure, for step S102, obtaining the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame may include: determining whether a subtitle switch exists between the first video frame and the second video frame based on whether their subtitle contents are the same.
For example, in at least one embodiment of the present disclosure, to determine whether a subtitle switch occurs between two adjacent frames, the start and end of each sentence in the audio of the video can be located, so as to obtain the two video frames corresponding to the audio segment and mark them according to the time information of the corresponding audio frames, thereby determining whether the corresponding subtitles are segmented.
FIG. 3 is an example flowchart of a method for determining a subtitle switch provided by at least one embodiment of the present disclosure.
For example, in at least one embodiment of the present disclosure, determining whether a subtitle switch exists between the first video frame and the second video frame based on whether their subtitle contents are the same may include the following steps S201 to S203, as shown in FIG. 3.
S201: obtain an audio segment corresponding to the first video frame.
S202: obtain, based on the audio segment, a start video frame and an end video frame corresponding to the audio segment.
S203: determine, based on the start video frame and the end video frame, whether a subtitle switch exists between the first video frame and the second video frame.
It should be noted that, in the embodiments of the present disclosure, the "start video frame" and "end video frame" refer to two video frames determined based on the time information of the corresponding audio segment; they are not limited to specific video frames or a specific order.
For example, in at least one embodiment of the present disclosure, for step S201, the corresponding audio data may be input into a speech recognition system for speech segmentation, obtaining the speech recognition result and the corresponding time information. For example, the time information includes the start time and end time of the corresponding audio segment. Based on the speech recognition result and the corresponding time information, the audio segment corresponding to the first video frame can be obtained.
For example, in at least one embodiment of the present disclosure, for step S202, the start video frame and end video frame corresponding to the audio segment can be determined according to the time information of the recognized audio segment.
It should be noted that the embodiments of the present disclosure do not limit the speech recognition method; any effective speech recognition method may be used.
For example, in at least one embodiment of the present disclosure, step S203 may include: in response to the second video frame being between the start video frame and the end video frame, determining that no subtitle switch exists between the first video frame and the second video frame; and in response to the second video frame not being between the start video frame and the end video frame, determining that a subtitle switch exists between them.
For example, in at least one example of the present disclosure, a video includes a sequence of video frames, for example, temporally adjacent video frame 1, video frame 2, video frame 3, video frame 4, video frame 5, and so on. Suppose the first video frame is video frame 2 and its corresponding audio segment is "Where are you going". According to the time information of the audio segment (for example, the start and end moments of a sentence), the start video frame corresponding to the audio segment is determined to be video frame 1, and the end video frame is determined to be video frame 4. In this case, the subtitles displayed on the pictures from video frame 1 to video frame 4 are all "Where are you going", i.e., the same subtitle content. For example, if the second video frame is video frame 3, which is between video frame 1 and video frame 4, then no subtitle switch exists between the first and second video frames. For another example, if the second video frame is video frame 5, which is not between video frame 1 and video frame 4, then a subtitle switch has occurred between the first and second video frames. Through the above operations, which video frames have undergone a subtitle switch can be determined from the audio corresponding to the video.
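The audio-based test above reduces to an interval check once the sentence's start/end frame indices are known. A minimal sketch (frame indices stand in for the start/end video frames located from the audio segment's time information):

```python
def subtitle_switch_by_audio(start_idx, end_idx, second_idx):
    """Return True if a subtitle switch exists, judged from the frame span
    [start_idx, end_idx] covered by one spoken sentence.

    Frames inside the span share one subtitle; a second video frame outside
    the span implies the sentence (and hence the subtitle) has changed.
    """
    return not (start_idx <= second_idx <= end_idx)

# Audio segment "Where are you going" spans frames 1..4 (example above).
print(subtitle_switch_by_audio(1, 4, 3))  # False: frame 3 inside the span
print(subtitle_switch_by_audio(1, 4, 5))  # True: frame 5 outside the span
```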
For example, in at least one embodiment of the present disclosure, besides the audio-based method, a text recognition method can also be used to determine whether a subtitle switch occurs between adjacent video frames. For example, in some examples, a text recognition algorithm is used to obtain the subtitle content displayed on the first video frame and the second video frame, and the two are compared to determine whether a subtitle switch has occurred between them. It should be noted that the embodiments of the present disclosure do not specifically limit the text recognition algorithm, as long as it can recognize text content.
FIG. 4 is a schematic flowchart of a text recognition method provided by at least one embodiment of the present disclosure.
For example, in at least one embodiment of the present disclosure, as shown in FIG. 4, besides the recognized text content, the coordinates of the text can also be obtained through the text recognition algorithm. For example, in some examples, the obtained text coordinates may be the coordinates of the four vertices (upper-left, lower-left, upper-right, lower-right) of a complete subtitle. For example, in some examples, text detection may be performed on the input image (which may also be a single video frame) to determine the region where the text is located; each character is then segmented individually, and a single-character classifier (for example, an algorithm based on text feature vector correlation, a neural-network-based algorithm, etc.) is used to classify each character (a character is accepted if its confidence is greater than a certain threshold); finally, the text recognition result and its coordinates are output. It should be noted that the embodiments of the present disclosure do not limit the specific operations of the text recognition method; any effective text recognition method may be used.
For example, in at least one embodiment of the present disclosure, determining whether a subtitle switch occurs between adjacent frames of the video (the first video frame and the second video frame) may include: obtaining first recognized text content of the first video frame; obtaining second recognized text content of the second video frame; and in response to the first recognized text content and the second recognized text content being the same, determining that no subtitle switch exists between the first video frame and the second video frame.
It should be noted that, in the embodiments of the present disclosure, the "first recognized text content" and "second recognized text content" refer to the recognized text content obtained by performing the text recognition operation on the corresponding video frame; they are not limited to specific text content or a specific order.
For example, in at least one embodiment of the present disclosure, in order to recognize subtitles more accurately, the region to which the text recognition operation is applied can be set in advance. Since the display position of subtitles in a video picture is usually fixed, the approximate region where the subtitles are located can be set in advance.
FIG. 5 is a schematic flowchart of another method for determining a subtitle switch provided by at least one embodiment of the present disclosure.
Generally, a text recognition algorithm cannot achieve 100% accuracy; for example, the result of character segmentation may not be completely accurate, causing other problems. For example, in some examples, characters at positions other than the subtitles are recognized, so that the character sequences recognized in the preceding and following frames cannot be matched. In order to determine more accurately whether the subtitles have switched, the video frame interpolation processing method 10 provided by the embodiments of the present disclosure may include the following steps S301 to S303, as shown in FIG. 5.
Step S301: in response to the first recognized text content and the second recognized text content being different, obtain a first sub-image of the first video frame. The first sub-image corresponds to the first subtitle content of the first video frame.
Step S302: obtain a second sub-image of the second video frame. The second sub-image corresponds to the second subtitle content of the second video frame.
Step S303: determine, based on the first sub-image and the second sub-image, whether a subtitle switch exists between the first video frame and the second video frame.
It should be noted that, in the embodiments of the present disclosure, the "first subtitle content" and "second subtitle content" respectively refer to the subtitle content displayed in the corresponding video frame; they are not limited to specific subtitle content or a specific order.
It should also be noted that, in the embodiments of the present disclosure, the "first sub-image", "second sub-image", and "third sub-image" respectively refer to the image of the region where the subtitles are located in the corresponding video frame; they are not limited to specific images or a specific order.
For example, in at least one embodiment of the present disclosure, a text recognition operation is performed on a certain video frame, and the coordinates of the subtitles in the video frame are recognized (for example, the coordinates of the four vertices of a complete subtitle: upper-left, lower-left, upper-right, lower-right). Based on these coordinates, the region where the subtitles are located in the video frame can be obtained, thereby obtaining the sub-image of the video frame corresponding to the subtitle content.
For example, in at least one embodiment of the present disclosure, step S303 may include: determining, based on the first sub-image and the second sub-image, a first similarity between them; in response to the first similarity being greater than a first threshold, determining that no subtitle switch exists between the first video frame and the second video frame; and in response to the first similarity being not greater than the first threshold, determining that a subtitle switch exists between them.
It should be noted that, in the embodiments of the present disclosure, the "first similarity" refers to the image similarity between the subtitle sub-images of two adjacent video frames, and the "second similarity" refers to the image similarity between two adjacent video frames. They are not limited to specific similarities or a specific order.
It should also be noted that the embodiments of the present disclosure do not limit the values of the "first threshold", "second threshold", and "third threshold"; they can be set according to actual needs. The "first threshold", "second threshold", and "third threshold" are not limited to specific values or a specific order.
For example, in the embodiments of the present disclosure, the image similarity between two images can be calculated by various methods, for example, a cosine similarity algorithm, a histogram algorithm, a perceptual hashing algorithm, a mutual-information-based algorithm, etc. The embodiments of the present disclosure do not limit the method of calculating image similarity; it can be chosen according to actual needs.
For example, in at least one embodiment of the present disclosure, the structural similarity (SSIM) algorithm can be used to calculate the similarity between two images. SSIM is a full-reference image quality assessment metric that measures image similarity from three aspects: luminance, contrast, and structure. The formula for calculating SSIM is as follows:
SSIM(x, y) = ((2μxμy + c1)(2σxy + c2)) / ((μx² + μy² + c1)(σx² + σy² + c2))
where μx denotes the mean of x, μy denotes the mean of y, σx² denotes the variance of x, σy² denotes the variance of y, and σxy denotes the covariance of x and y. c1 = (k1L)² and c2 = (k2L)² are constants used to maintain stability, L denotes the dynamic range of the pixel values, k1 = 0.01, and k2 = 0.03. Structural similarity ranges from -1 to 1; the larger the value, the smaller the image distortion. When two images are identical, the value of SSIM equals 1.
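The SSIM formula above can be sketched directly. This minimal version computes one global SSIM score over a whole grayscale image given as a flat pixel list; production implementations (e.g. scikit-image's `structural_similarity`) compute it over local windows and average.

```python
def ssim(x, y, L=255, k1=0.01, k2=0.03):
    """Global SSIM between two equal-size grayscale images (flat pixel lists),
    following the formula above term by term."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((p - mu_x) ** 2 for p in x) / n
    var_y = sum((p - mu_y) ** 2 for p in y) / n
    cov_xy = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / n
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

img = [10, 50, 90, 130, 170, 210]
print(ssim(img, img))                      # 1.0 for identical images
print(ssim(img, [255 - p for p in img]))   # much lower for a dissimilar image
```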
For example, in at least one embodiment of the present disclosure, the "first threshold" can be set to 0.6, or to 0.8. It should be noted that the embodiments of the present disclosure do not limit the value of the "first threshold"; it can be set according to actual needs.
FIG. 6 is a schematic block diagram of yet another method for determining whether subtitles switch provided by at least one embodiment of the present disclosure.
For example, in at least one embodiment of the present disclosure, as shown in FIG. 6, a text recognition operation is performed on the approximate subtitle region Z0 of the first video frame I0 and the approximate subtitle region Z1 of the second video frame I1, obtaining first recognized text content T0 and second recognized text content T1, together with the corresponding coordinates C0 and C1. Then, the text similarity between T0 and T1 is calculated to determine whether T0 and T1 are the same. If the similarity is greater than a certain threshold, T0 and T1 are considered the same, i.e., the subtitles have not switched. If the similarity is not greater than the threshold, the similarity between the first sub-image corresponding to the subtitle region Z0 in the first video frame I0 and the second sub-image corresponding to the subtitle region Z1 in the second video frame I1 is further determined. For example, as shown in FIG. 6, it is determined whether the SSIM of the images within the recognized coordinates C0 and C1 (i.e., the first and second sub-images) is greater than a threshold. If the SSIM is greater than the threshold (for example, 0.8), the subtitles have not switched; if the SSIM is not greater than the threshold (for example, 0.8), the subtitles have switched.
It should be noted that the embodiments of the present disclosure do not limit the method of calculating text similarity. For example, Euclidean distance, Manhattan distance, cosine similarity, etc. can be used. It should also be noted that the embodiments of the present disclosure do not specifically limit the threshold of text similarity, which can be set according to actual needs.
For example, in at least one embodiment of the present disclosure, besides a subtitle switch, the picture switch may also include a scene switch. For example, step S102 may include: determining whether a scene switch exists between the first video frame and the second video frame based on whether their scenes are the same.
For example, in at least one embodiment of the present disclosure, when the video involves a scene switch, the image similarity (for example, the SSIM value) between the two adjacent frames decreases obviously. Therefore, scene segmentation can be achieved by calculating image similarity.
For example, in at least one embodiment of the present disclosure, determining whether a scene switch occurs between two adjacent video frames may include the following steps: obtaining a second similarity between the first video frame and the second video frame; in response to the second similarity being greater than a second threshold, determining that no scene switch exists between them; and in response to the second similarity being not greater than the second threshold, determining that a scene switch exists between them.
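The scene-switch decision above is a plain threshold test on the inter-frame similarity score. A sketch (the default of 0.5 for the second threshold is an illustrative assumption, not a value fixed by the text):

```python
def scene_switch(similarity, second_threshold=0.5):
    """A low inter-frame similarity (e.g. a global SSIM score) is read as a
    scene switch; 'not greater than' the threshold counts as a switch."""
    return not (similarity > second_threshold)

print(scene_switch(0.92))  # False: frames look alike, same scene
print(scene_switch(0.18))  # True: similarity collapsed, scene switch
```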
For example, in at least one embodiment of the present disclosure, the second similarity may be structural similarity (SSIM), or the similarity between the images (i.e., video frames) may be calculated by, for example, a perceptual hashing algorithm, a histogram algorithm, etc.; the embodiments of the present disclosure do not limit the algorithm for calculating image similarity.
It should be noted that, in the embodiments of the present disclosure, the number of inserted frames takes 2x interpolation as an example, for example, interpolating from 30 fps (frames per second) to 60 fps, i.e., increasing the number of frames transmitted per second from 30 to 60. When a scene switch or subtitle switch is detected between two adjacent video frames, no interpolation is performed between the current two frames; to keep the frame count consistent, two frames will be inserted at the next interpolation. For another example, when scene switches or subtitle switches occur twice in a row, the interpolation operation is not performed twice; if only two frames were inserted at the next interpolation, the overall video would lack frames.
FIG. 7 is a schematic diagram of another video frame interpolation processing method provided by at least one embodiment of the present disclosure.
For example, in order to avoid the above frame shortage, in at least one embodiment of the present disclosure, besides steps S101 to S103, the video frame interpolation processing method 10 may include: setting a first interpolation flag; and
in response to a picture switch existing between the first video frame and the second video frame, modifying the first interpolation flag into a second interpolation flag.
It should be noted that, in the embodiments of the present disclosure, the "first interpolation flag", "second interpolation flag", and "third interpolation flag" refer to interpolation flags at different time points or stages, used to indicate how many consecutive picture switches exist in the video. They are not limited to specific values or a specific order.
For example, in some examples, suppose the video includes a sequence of video frames, for example, temporally adjacent video frame 1, video frame 2, video frame 3, video frame 4, video frame 5, and so on. For example, in one example, an interpolation flag is set, for example, the flag Flag is initialized to (0, 0). Two adjacent video frames (for example, the first video frame and the second video frame) are input; suppose the first video frame is video frame 2 and the second video frame is video frame 3. Whether a picture switch (subtitle switch or scene switch) exists between video frame 2 and video frame 3 is determined by the method described in the above embodiments. If a picture switch exists between video frame 2 and video frame 3, the flag Flag is modified from (0, 0) to (0, 1). For example, in some examples, when a picture switch is determined to occur between two adjacent video frames, a value "1" is appended to the flag Flag (0, 0) and the previous value "0" is popped, i.e., the updated flag is (0, 1). When no picture switch is determined to occur between two adjacent video frames, a value "0" is appended to the flag Flag (0, 0) and the previous value "0" is popped, i.e., the updated flag remains (0, 0).
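The append-and-pop flag walkthrough above maps naturally onto a two-element deque. The `frames_to_insert` mapping (1 normally, 2 after one skipped pair, 3 after two consecutive skips, for 2x interpolation) is a sketch consistent with the examples in the text, not a formula stated by it.

```python
from collections import deque

def update_flag(flag, switched):
    """Append 1 on a picture switch (else 0) and pop the oldest entry,
    mirroring the two-element Flag updates described above."""
    flag.append(1 if switched else 0)
    flag.popleft()
    return flag

def frames_to_insert(flag):
    """For 2x interpolation, each recently skipped pair owes one extra frame."""
    return 1 + sum(flag)

flag = deque([0, 0])                        # Flag initialized to (0, 0)
update_flag(flag, True)                     # one switch -> (0, 1)
print(list(flag), frames_to_insert(flag))   # [0, 1] 2
update_flag(flag, True)                     # second consecutive switch -> (1, 1)
print(list(flag), frames_to_insert(flag))   # [1, 1] 3
```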
It should be noted that the interpolation flag may also be initialized to other values, for example, (1, 1), (0, 0, 0), etc.; the embodiments of the present disclosure do not limit this.
For example, in at least one embodiment of the present disclosure, in response to a picture switch existing between the first video frame and the second video frame, a fourth video frame is obtained. Based on the second video frame and the fourth video frame, a second comparison result between them is obtained. Whether to insert frames between the second video frame and the fourth video frame is determined based on the second comparison result. The fourth video frame and the second video frame are adjacent in the time domain, and the second video frame is the forward frame of the fourth video frame. The second comparison result indicates whether a picture switch exists between the second video frame and the fourth video frame.
For example, in at least one embodiment of the present disclosure, determining, based on the second comparison result, whether to insert frames between the second video frame and the fourth video frame includes: in response to the second comparison result indicating that no picture switch exists between the second video frame and the fourth video frame, inserting multiple video frames between them. The number of the multiple video frames is based on the second interpolation flag.
For example, in at least one embodiment of the present disclosure, determining, based on the second comparison result, whether to insert frames between the second video frame and the fourth video frame includes: in response to the second comparison result indicating that a picture switch exists between the second video frame and the fourth video frame, determining not to insert frames between them; and modifying the second interpolation flag into a third interpolation flag. The third interpolation flag is used to indicate the number of frames to be inserted next time.
It should be noted that the "fourth video frame" refers to the frame temporally following and adjacent to the "second video frame"; it is not limited to a specific frame or a specific order. The "second comparison result" refers to the comparison result between two adjacent frames of the video (the second video frame and the fourth video frame); it is not limited to a specific comparison result or a specific order.
For example, in some examples, suppose the video includes a sequence of video frames, for example, temporally adjacent video frame 1, video frame 2, video frame 3, video frame 4, video frame 5, and so on. Suppose the first video frame is video frame 1, the second video frame is video frame 2, and the fourth video frame is video frame 3. As shown in FIG. 7, if video frame 1 and video frame 2 are input and a picture switch (subtitle switch or scene switch) is determined to exist between them, no interpolation is performed between video frame 1 and video frame 2, and the flag Flag is set to (0, 1). Then the next two adjacent frames, video frame 2 and video frame 3, are input, and whether a picture switch (subtitle switch or scene switch) exists between them is determined by the method provided in the above embodiments. For example, if no picture switch exists between video frame 2 and video frame 3, the interpolation operation is performed between them. In this case, the flag Flag is (0, 1), indicating that one picture switch has occurred (i.e., no frame was inserted between video frame 1 and video frame 2); to avoid frame shortage, two video frames need to be inserted between video frame 2 and video frame 3. For another example, if a picture switch still exists between video frame 2 and video frame 3, the interpolation operation is not performed between them. In this case, the flag Flag is modified from (0, 1) to (1, 1): for example, a value "1" is appended to the flag Flag (0, 1) and the previous value "0" is popped. The flag Flag (1, 1) indicates that picture switches have occurred twice in a row in the video frame sequence: for example, a picture switch exists between video frame 1 and video frame 2, and a picture switch still exists between video frame 2 and video frame 3. Through similar operations, video frame 3 and video frame 4 are then compared. If no picture switch exists between video frame 3 and video frame 4, interpolation can be performed. To avoid frame shortage, based on the flag (1, 1), three video frames need to be inserted between video frame 3 and video frame 4. In this way, the overall integrity of the video after interpolation is guaranteed.
It should be noted that, in practical applications, picture switches rarely occur in several consecutive pairs of adjacent video frames. Therefore, the above embodiments of the present disclosure take at most two consecutive picture switches as an example and initialize the interpolation flag to (0, 0). The embodiments of the present disclosure do not limit this; it can be set according to actual needs.
FIG. 8 is a schematic flowchart of a post-interpolation processing method provided by at least one embodiment of the present disclosure.
For example, in at least one embodiment of the present disclosure, the video frame interpolation processing method 10 further includes the following steps S401 to S403, as shown in FIG. 8.
Step S401: in response to a third video frame being inserted between the first video frame and the second video frame, obtain a first sub-image of the first video frame. The first sub-image corresponds to the first subtitle content in the first video frame.
Step S402: obtain a third sub-image of the third video frame. The third sub-image corresponds to the third subtitle content in the third video frame.
Step S403: determine, based on the first sub-image and the third sub-image, whether to replace the third video frame with the first video frame.
For example, in at least one embodiment of the present disclosure, step S403 may include: obtaining pixel values of first pixels in the first sub-image; setting, based on the pixel values of the first pixels of the first sub-image, pixel values of third pixels of the third sub-image; and determining, based on the first sub-image and the set third sub-image, whether to replace the third video frame with the first video frame. The pixel values of the first pixels are greater than a third threshold, and the relative positions of the third pixels in the third sub-image are the same as the relative positions of the first pixels in the first sub-image.
For example, in the embodiments of the present disclosure, a third pixel having the same relative position in the third sub-image as a first pixel has in the first sub-image can be understood as follows: taking the upper-left vertex of the first sub-image as the coordinate origin, the position coordinates of the first pixel in that coordinate system are the same as the position coordinates of the third pixel in the coordinate system whose origin is the upper-left vertex of the third sub-image.
As described in detail with reference to FIG. 9, the video frame interpolation processing method 10 including the operations shown in FIG. 8 can solve the deformation problem caused by large motion of the subtitle background during video frame interpolation. FIG. 9 is a schematic diagram of another video frame interpolation processing method provided by at least one embodiment of the present disclosure.
For example, in some examples, after the third video frame is inserted between the first video frame and the second video frame, in order to improve interpolation accuracy, whether the subtitles of the first video frame and the third video frame are the same (i.e., whether a subtitle switch occurs) can be determined, as shown in FIG. 9. For example, this can be determined by the method for determining whether a subtitle switch occurs between adjacent video frames provided in the above embodiments. For this part of the operations, reference may be made to the relevant description corresponding to FIG. 6, which will not be repeated here. For example, after it is determined by the method of FIG. 6 that no subtitle switch exists between the first video frame and the third video frame, further processing can be performed.
For example, in some examples, because the color of subtitles usually remains stable (for example, most subtitles are white), the pixels (i.e., first pixels) in the first sub-image of the first video frame (i.e., the region corresponding to the recognized coordinates C0) whose values are greater than a certain threshold (i.e., the third threshold) can be selected. For example, the third threshold is set to 220, and the pixel value range is generally 0-255. The values of the first pixels are assigned to the pixels (i.e., third pixels) at the same positions in the third sub-image (i.e., the region corresponding to the recognized coordinates Ct). For example, in FIG. 9, the third sub-image after assignment is denoted as Ct'. If there is large motion in the subtitle background, the resulting subtitle deformation usually clearly exceeds the range of the original characters. Therefore, by comparing the first sub-image with the assigned third sub-image, whether the interpolated subtitles have obvious deformation can be determined.
For example, in at least one embodiment of the present disclosure, the first sub-image and the assigned third sub-image are compared: the pixel values of each pair of corresponding pixels are subtracted, and it is determined whether the number of pixels whose absolute pixel difference exceeds a certain threshold (for example, 150) is greater than another threshold (for example, 30). If the number of pixels whose absolute difference exceeds 150 is greater than 30, the subtitles of the inserted third video frame are considered to have obvious deformation, and the first video frame is directly copied to replace the deformed inserted frame (i.e., the third video frame). Of course, the second video frame may also be used to replace the deformed inserted frame (i.e., the third video frame); the embodiments of the present disclosure do not limit this. In this way, the deformation problem caused by large motion of the subtitle background can be avoided.
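The assign-then-compare check above can be sketched as follows. The sub-images are modeled as flat grayscale pixel lists; the thresholds 220 / 150 / 30 follow the examples in the text, and the synthetic pixel data is illustrative only.

```python
def subtitle_deformed(sub1, sub3, third_threshold=220,
                      diff_threshold=150, count_threshold=30):
    """Post-interpolation deformation check on subtitle regions.

    sub1/sub3 are flat grayscale pixel lists of the subtitle regions of the
    first and inserted (third) frames. Bright pixels of sub1 (assumed to be
    subtitle strokes) are copied into sub3 at the same positions, then large
    per-pixel differences are counted; too many means the interpolated
    subtitle deformed and the frame should be replaced.
    """
    patched = list(sub3)
    for i, p in enumerate(sub1):
        if p > third_threshold:   # first pixels: bright subtitle strokes
            patched[i] = p        # assign into the third sub-image (Ct -> Ct')
    big_diffs = sum(1 for a, b in zip(sub1, patched) if abs(a - b) > diff_threshold)
    return big_diffs > count_threshold

# 40 background pixels that moved a lot, plus a few subtitle pixels.
sub1 = [30] * 40 + [255] * 5
sub3 = [200] * 40 + [40] * 5      # background shifted strongly; subtitle smeared
print(subtitle_deformed(sub1, sub3))  # True -> copy the first frame instead
```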
FIG. 10 is a schematic block diagram of a video frame interpolation processing method provided by at least one embodiment of the present disclosure.
As shown in FIG. 10, the video frame interpolation processing method provided by at least one embodiment of the present disclosure can not only solve the deformation problems caused by scene switches and subtitle switches, but can also solve, through post-processing after interpolation, the obvious deformation problem caused by large motion of the subtitle background. The operations in each block of the method in FIG. 10 have been described in detail above and will not be repeated here.
Therefore, the video frame interpolation processing method 10 provided by at least one embodiment of the present disclosure can solve the obvious deformation problems caused, during frame interpolation, by picture switches and by large motion of the subtitle background, thereby ensuring the smoothness of the video and improving the user's viewing experience.
It should also be noted that, in the embodiments of the present disclosure, the execution order of the steps of the video frame interpolation processing method 10 is not limited. Although the execution of the steps is described above in a specific order, this does not limit the embodiments of the present disclosure. The steps of the video frame interpolation processing method 10 may be executed serially or in parallel, depending on actual needs. For example, the video frame interpolation processing method 10 may also include more or fewer steps; the embodiments of the present disclosure do not limit this.
At least one embodiment of the present disclosure further provides a video frame interpolation processing apparatus, which can selectively perform interpolation processing according to the comparison result between adjacent video frames, thereby effectively avoiding the obvious deformation problem caused by picture switches during interpolation, ensuring the smoothness of the video, and improving the user's viewing experience.
FIG. 11 is a schematic block diagram of a video frame interpolation processing apparatus provided by at least one embodiment of the present disclosure.
For example, in at least one embodiment of the present disclosure, as shown in FIG. 11, the video frame interpolation processing apparatus 80 includes an obtaining module 801, a comparison module 802, and an operation module 803.
For example, in at least one embodiment of the present disclosure, the obtaining module 801 is configured to obtain a first video frame and a second video frame of a video. The first video frame and the second video frame are adjacent in the time domain, and the first video frame is the forward frame of the second video frame. For example, the obtaining module 801 can implement step S101; for its specific implementation, reference may be made to the relevant description of step S101, which will not be repeated here.
For example, in at least one embodiment of the present disclosure, the comparison module 802 is configured to obtain, based on the first video frame and the second video frame, a first comparison result between them. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame. For example, the comparison module 802 can implement step S102; for its specific implementation, reference may be made to the relevant description of step S102, which will not be repeated here.
For example, in at least one embodiment of the present disclosure, the operation module 803 is configured to determine, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame. For example, the operation module 803 can implement step S103; for its specific implementation, reference may be made to the relevant description of step S103, which will not be repeated here.
It should be noted that the obtaining module 801, the comparison module 802, and the operation module 803 may be implemented by software, hardware, firmware, or any combination thereof; for example, they may be implemented as an obtaining circuit 801, a comparison circuit 802, and an operation circuit 803, respectively. The embodiments of the present disclosure do not limit their specific implementations.
It should be understood that the video frame interpolation processing apparatus 80 provided by the embodiments of the present disclosure can implement the foregoing video frame interpolation processing method 10 and can also achieve technical effects similar to those of the foregoing method 10, which will not be repeated here.
It should be noted that, in the embodiments of the present disclosure, the video frame interpolation processing apparatus 80 may include more or fewer circuits or units, and the connection relationships between the circuits or units are not limited and may be determined according to actual needs. The specific construction of each circuit is not limited; each circuit may be composed of analog devices according to circuit principles, of digital chips, or in other suitable ways.
FIG. 12 is a schematic block diagram of another video frame interpolation processing apparatus provided by at least one embodiment of the present disclosure.
At least one embodiment of the present disclosure further provides a video frame interpolation processing apparatus 90. As shown in FIG. 12, the apparatus 90 includes a processor 910 and a memory 920. The memory 920 includes one or more computer program modules 921, which are stored in the memory 920 and configured to be executed by the processor 910. The one or more computer program modules 921 include instructions for performing the video frame interpolation processing method 10 provided by at least one embodiment of the present disclosure; when they are executed by the processor 910, one or more steps of the video frame interpolation processing method 10 can be performed. The memory 920 and the processor 910 may be interconnected by a bus system and/or another form of connection mechanism (not shown).
For example, the processor 910 may be a central processing unit (CPU), a digital signal processor (DSP), or another form of processing unit with data processing capability and/or program execution capability, such as a field-programmable gate array (FPGA); for example, the central processing unit (CPU) may be of an X86 or ARM architecture, etc. The processor 910 may be a general-purpose or special-purpose processor and may control other components in the video frame interpolation processing apparatus 90 to perform desired functions.
For example, the memory 920 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact disk read-only memory (CD-ROM), USB memory, flash memory, etc. One or more computer program modules 921 may be stored on the computer-readable storage medium, and the processor 910 may run them to realize various functions of the video frame interpolation processing apparatus 90. Various applications and various data, as well as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium. For the specific functions and technical effects of the apparatus 90, reference may be made to the description of the video frame interpolation processing method 10 above, which will not be repeated here.
FIG. 13 is a schematic block diagram of yet another video frame interpolation processing apparatus 300 provided by at least one embodiment of the present disclosure.
The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), vehicle-mounted terminals (for example, vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The video frame interpolation processing apparatus 300 shown in FIG. 13 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
For example, as shown in FIG. 13, in some examples, the video frame interpolation processing apparatus 300 includes a processing device (for example, a central processing unit, a graphics processor, etc.) 301, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 302 or a program loaded from a storage device 308 into random access memory (RAM) 303. Various programs and data required for the operation of the computer system are also stored in the RAM 303. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
For example, the following components may be connected to the I/O interface 305: an input device 306 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 307 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 308 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 309 including a network interface card such as a LAN card, a modem, etc. The communication device 309 may allow the video frame interpolation processing apparatus 300 to communicate wirelessly or by wire with other devices to exchange data, performing communication processing via a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 310 as needed, so that a computer program read therefrom is installed into the storage device 308 as needed. Although FIG. 13 shows the video frame interpolation processing apparatus 300 including various devices, it should be understood that it is not required to implement or include all of the illustrated devices; more or fewer devices may alternatively be implemented or included.
For example, the video frame interpolation processing apparatus 300 may further include a peripheral interface (not shown in the figure), etc. The peripheral interface may be various types of interfaces, such as a USB interface, a Lightning interface, etc. The communication device 309 may communicate via wireless communication with networks and other devices, the networks being, for example, the Internet, an intranet, and/or a wireless network such as a cellular telephone network, a wireless local area network (LAN), and/or a metropolitan area network (MAN). Wireless communication may use any of a variety of communication standards, protocols, and technologies, including but not limited to the Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wi-Fi (for example, based on the IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and/or IEEE 802.11n standards), voice over Internet Protocol (VoIP), Wi-MAX, protocols for email, instant messaging, and/or short message service (SMS), or any other suitable communication protocol.
For example, the video frame interpolation processing apparatus 300 may be any device such as a mobile phone, a tablet computer, a notebook computer, an e-book reader, a game console, a television, a digital photo frame, a navigator, etc., or any combination of a data processing apparatus and hardware; the embodiments of the present disclosure do not limit this.
For example, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 309, or installed from the storage device 308, or installed from the ROM 302. When the computer program is executed by the processing device 301, the video frame interpolation processing method 10 disclosed in the embodiments of the present disclosure is performed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: a wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be included in the above video frame interpolation processing apparatus 300, or may exist independently without being assembled into the apparatus 300.
FIG. 14 is a schematic block diagram of a non-transitory readable storage medium provided by at least one embodiment of the present disclosure.
The embodiments of the present disclosure further provide a non-transitory readable storage medium. FIG. 14 is a schematic block diagram of a non-transitory readable storage medium according to at least one embodiment of the present disclosure. As shown in FIG. 14, computer instructions 111 are stored on the non-transitory readable storage medium 140; when executed by a processor, the computer instructions 111 perform one or more steps of the video frame interpolation processing method 10 described above.
For example, the non-transitory readable storage medium 140 may be any combination of one or more computer-readable storage media. For example, one computer-readable storage medium contains computer-readable program code for obtaining a first video frame and a second video frame of a video, another contains computer-readable program code for obtaining, based on the first video frame and the second video frame, a first comparison result between them, and yet another contains computer-readable program code for determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame. Of course, the above program codes may also be stored in the same computer-readable medium; the embodiments of the present disclosure do not limit this.
For example, when the program code is read by a computer, the computer can execute the program code stored in the computer storage medium to perform, for example, the video frame interpolation processing method 10 provided by any embodiment of the present disclosure.
For example, the storage medium may include a memory card of a smartphone, a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disk read-only memory (CD-ROM), flash memory, or any combination of the above storage media, and may also be another suitable storage medium. For example, the readable storage medium may also be the memory 920 in FIG. 12; for related descriptions, reference may be made to the foregoing content, which will not be repeated here.
The embodiments of the present disclosure further provide an electronic device. FIG. 15 is a schematic block diagram of an electronic device according to at least one embodiment of the present disclosure. As shown in FIG. 15, the electronic device 120 may include the video frame interpolation processing apparatus 80/90/300 described above. For example, the electronic device 120 can implement the video frame interpolation processing method 10 provided by any embodiment of the present disclosure.
In the present disclosure, the term "plurality" refers to two or more than two, unless expressly limited otherwise.
Those skilled in the art will easily conceive of other embodiments of the present disclosure after considering the specification and practicing the disclosure disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed herein. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (20)

  1. A video frame interpolation processing method, comprising:
    obtaining a first video frame and a second video frame of a video, wherein the first video frame and the second video frame are adjacent in the time domain, and the first video frame is the forward frame of the second video frame;
    obtaining, based on the first video frame and the second video frame, a first comparison result between the first video frame and the second video frame, wherein the first comparison result indicates whether a picture switch exists between the first video frame and the second video frame;
    determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame.
  2. The method according to claim 1, wherein the picture switch comprises a subtitle switch and/or a scene switch.
  3. The method according to claim 2, wherein obtaining the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame comprises:
    determining whether the subtitle switch exists between the first video frame and the second video frame based on whether subtitle content of the first video frame and subtitle content of the second video frame are the same.
  4. The method according to claim 3, wherein determining whether the subtitle switch exists between the first video frame and the second video frame based on whether the subtitle content of the first video frame and that of the second video frame are the same comprises:
    obtaining an audio segment corresponding to the first video frame;
    obtaining, based on the audio segment, a start video frame and an end video frame corresponding to the audio segment;
    determining, based on the start video frame and the end video frame, whether the subtitle switch exists between the first video frame and the second video frame.
  5. The method according to claim 4, wherein determining, based on the start video frame and the end video frame, whether the subtitle switch exists between the first video frame and the second video frame comprises:
    in response to the second video frame being between the start video frame and the end video frame, determining that the subtitle switch does not exist between the first video frame and the second video frame;
    in response to the second video frame not being between the start video frame and the end video frame, determining that the subtitle switch exists between the first video frame and the second video frame.
  6. The method according to any one of claims 3-5, wherein determining whether the subtitle switch exists between the first video frame and the second video frame based on whether the subtitle content of the first video frame and that of the second video frame are the same comprises:
    obtaining first recognized text content of the first video frame;
    obtaining second recognized text content of the second video frame;
    in response to the first recognized text content and the second recognized text content being the same, determining that the subtitle switch does not exist between the first video frame and the second video frame.
  7. The method according to claim 6, wherein determining whether the subtitle switch exists between the first video frame and the second video frame based on whether the subtitle content of the first video frame and that of the second video frame are the same further comprises:
    in response to the first recognized text content and the second recognized text content being different:
    obtaining a first sub-image of the first video frame, wherein the first sub-image corresponds to first subtitle content of the first video frame;
    obtaining a second sub-image of the second video frame, wherein the second sub-image corresponds to second subtitle content of the second video frame;
    determining, based on the first sub-image and the second sub-image, whether the subtitle switch exists between the first video frame and the second video frame.
  8. The method according to claim 7, wherein determining, based on the first sub-image and the second sub-image, whether the subtitle switch exists between the first video frame and the second video frame comprises:
    determining, based on the first sub-image and the second sub-image, a first similarity between the first sub-image and the second sub-image;
    in response to the first similarity being greater than a first threshold, determining that the subtitle switch does not exist between the first video frame and the second video frame;
    in response to the first similarity being not greater than the first threshold, determining that the subtitle switch exists between the first video frame and the second video frame.
  9. The method according to any one of claims 2-8, wherein obtaining the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame comprises:
    determining whether the scene switch exists between the first video frame and the second video frame based on whether a scene of the first video frame and a scene of the second video frame are the same.
  10. The method according to claim 9, wherein determining whether the scene switch exists between the first video frame and the second video frame based on whether the scenes of the first video frame and the second video frame are the same comprises:
    obtaining a second similarity between the first video frame and the second video frame;
    in response to the second similarity being greater than a second threshold, determining that the scene switch does not exist between the first video frame and the second video frame;
    in response to the second similarity being not greater than the second threshold, determining that the scene switch exists between the first video frame and the second video frame.
  11. The method according to any one of claims 1-10, wherein determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame comprises:
    in response to the first comparison result indicating that the picture switch does not exist between the first video frame and the second video frame, determining to insert a frame between the first video frame and the second video frame;
    in response to the first comparison result indicating that the picture switch exists between the first video frame and the second video frame, determining not to insert a frame between the first video frame and the second video frame.
  12. The method according to any one of claims 1-11, further comprising:
    setting a first interpolation flag;
    in response to the picture switch existing between the first video frame and the second video frame, modifying the first interpolation flag into a second interpolation flag.
  13. The method according to claim 12, further comprising:
    in response to the picture switch existing between the first video frame and the second video frame, obtaining a fourth video frame, wherein the fourth video frame and the second video frame are adjacent in the time domain, and the second video frame is the forward frame of the fourth video frame;
    obtaining, based on the second video frame and the fourth video frame, a second comparison result between the second video frame and the fourth video frame, wherein the second comparison result indicates whether the picture switch exists between the second video frame and the fourth video frame;
    determining, based on the second comparison result, whether to insert frames between the second video frame and the fourth video frame.
  14. The method according to claim 13, wherein determining, based on the second comparison result, whether to insert frames between the second video frame and the fourth video frame comprises:
    in response to the second comparison result indicating that the picture switch does not exist between the second video frame and the fourth video frame, inserting multiple video frames between the second video frame and the fourth video frame, wherein the number of the multiple video frames is based on the second interpolation flag.
  15. The method according to claim 13, wherein determining, based on the second comparison result, whether to insert frames between the second video frame and the fourth video frame comprises:
    in response to the second comparison result indicating that the picture switch exists between the second video frame and the fourth video frame, determining not to insert frames between the second video frame and the fourth video frame; and
    modifying the second interpolation flag into a third interpolation flag, wherein the third interpolation flag is used to indicate the number of frames to be inserted next time.
  16. The method according to any one of claims 1-15, further comprising:
    in response to a third video frame being inserted between the first video frame and the second video frame, obtaining a first sub-image of the first video frame, wherein the first sub-image corresponds to first subtitle content in the first video frame;
    obtaining a third sub-image of the third video frame, wherein the third sub-image corresponds to third subtitle content in the third video frame;
    determining, based on the first sub-image and the third sub-image, whether to replace the third video frame with the first video frame.
  17. The method according to claim 16, wherein determining whether to replace the third video frame with the first video frame based on the first sub-image and the third sub-image comprises:
    acquiring a pixel value of a first pixel in the first sub-image, wherein the pixel value of the first pixel is greater than a third threshold;
    setting a pixel value of a third pixel of the third sub-image based on the pixel value of the first pixel of the first sub-image, wherein the relative position of the third pixel in the third sub-image is the same as the relative position of the first pixel in the first sub-image; and
    determining whether to replace the third video frame with the first video frame based on the first sub-image and the set third sub-image.
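The pixel-overlay step of claim 17 can be sketched as follows. The third-threshold value and the final cosine comparison are illustrative assumptions; the claim only requires copying above-threshold pixels of the first sub-image into the same relative positions of the third sub-image before comparing.

```python
import numpy as np

def overlay_bright_pixels(first_sub, third_sub, third_threshold=200):
    # Claim 17: pixels of the first sub-image whose value exceeds the third
    # threshold (likely subtitle strokes) are written to the same relative
    # positions of the third sub-image.
    adjusted = third_sub.copy()
    mask = first_sub > third_threshold
    adjusted[mask] = first_sub[mask]
    return adjusted

def should_replace_with_first(first_sub, third_sub,
                              third_threshold=200, similarity_threshold=0.9):
    # Compare the first sub-image with the adjusted third sub-image; if the
    # subtitle regions still disagree, replace the interpolated frame with
    # the first video frame (this final comparison rule is an assumption).
    adjusted = overlay_bright_pixels(first_sub, third_sub, third_threshold)
    a = first_sub.astype(np.float64).ravel()
    b = adjusted.astype(np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    similarity = float(a @ b / denom) if denom else 1.0
    return similarity <= similarity_threshold
```

The overlay makes the comparison insensitive to the subtitle itself being absent from the interpolated frame, while residual ghosting elsewhere still lowers the similarity and triggers replacement.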
  18. A video frame interpolation processing apparatus, comprising:
    an acquisition module configured to acquire a first video frame and a second video frame of a video, wherein the first video frame and the second video frame are temporally adjacent, and the first video frame is a forward frame of the second video frame;
    a comparison module configured to acquire a first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame, wherein the first comparison result indicates whether a picture switch exists between the first video frame and the second video frame; and
    an operation module configured to determine whether to interpolate a frame between the first video frame and the second video frame based on the first comparison result.
  19. A video frame interpolation processing apparatus, comprising:
    a processor; and
    a memory comprising one or more computer program modules,
    wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, and the one or more computer program modules comprise instructions for performing the video frame interpolation processing method according to any one of claims 1-17.
  20. A non-transitory readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, perform the video frame interpolation processing method according to any one of claims 1-17.
PCT/CN2023/077905 2022-02-25 2023-02-23 Video frame interpolation processing method, video frame interpolation processing apparatus and readable storage medium WO2023160617A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210178989.XA CN114554285A (zh) 2022-02-25 2022-02-25 Video frame interpolation processing method, video frame interpolation processing apparatus and readable storage medium
CN202210178989.X 2022-02-25

Publications (2)

Publication Number Publication Date
WO2023160617A1 WO2023160617A1 (zh) 2023-08-31
WO2023160617A9 WO2023160617A9 (zh) 2023-10-26

Family

ID=81680086

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/077905 WO2023160617A1 (zh) 2022-02-25 2023-02-23 Video frame interpolation processing method, video frame interpolation processing apparatus and readable storage medium

Country Status (2)

Country Link
CN (1) CN114554285A (zh)
WO (1) WO2023160617A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114554285A (zh) * 2022-02-25 2022-05-27 京东方科技集团股份有限公司 Video frame interpolation processing method, video frame interpolation processing apparatus and readable storage medium
CN116886961B (zh) * 2023-09-06 2023-12-26 中移(杭州)信息技术有限公司 Distributed live-streaming video frame interpolation method, device, system and storage medium

Family Cites Families (29)

Publication number Priority date Publication date Assignee Title
JP2003299000A (ja) * 2002-04-02 2003-10-17 Oojisu Soken:Kk Scene change detection method, scene change detection device, computer program and recording medium
KR20060102639A (ko) 2005-03-24 2006-09-28 주식회사 코난테크놀로지 Video playback system and method
JP4909165B2 (ja) * 2007-04-24 2012-04-04 ルネサスエレクトロニクス株式会社 Scene change detection device, encoding device and scene change detection method
US20140002732A1 (en) * 2012-06-29 2014-01-02 Marat R. Gilmutdinov Method and system for temporal frame interpolation with static regions excluding
CN103559214B (zh) * 2013-10-11 2017-02-08 中国农业大学 Automatic video generation method and device
JP6384119B2 (ja) * 2014-05-15 2018-09-05 ソニー株式会社 Receiving device, transmitting device, and data processing method
CN105227966A (zh) * 2015-09-29 2016-01-06 深圳Tcl新技术有限公司 Television playback control method, server and television playback control system
CN106210767B (zh) * 2016-08-11 2020-01-07 上海交通大学 Video frame rate up-conversion method and system for intelligently improving motion smoothness
CN106792071A (zh) * 2016-12-19 2017-05-31 北京小米移动软件有限公司 Subtitle processing method and device
CN106604125B (zh) * 2016-12-29 2019-06-14 北京奇艺世纪科技有限公司 Method and device for determining video subtitles
US11120293B1 (en) * 2017-11-27 2021-09-14 Amazon Technologies, Inc. Automated indexing of media content
CN111277895B (zh) * 2018-12-05 2022-09-27 阿里巴巴集团控股有限公司 Video frame interpolation method and device
CN109803175B (zh) * 2019-03-12 2021-03-26 京东方科技集团股份有限公司 Video processing method and apparatus, device, and storage medium
US10963702B1 (en) * 2019-09-10 2021-03-30 Huawei Technologies Co., Ltd. Method and system for video segmentation
CN112584196A (zh) * 2019-09-30 2021-03-30 北京金山云网络技术有限公司 Video frame interpolation method, apparatus and server
CN112584232A (zh) * 2019-09-30 2021-03-30 北京金山云网络技术有限公司 Video frame interpolation method, apparatus and server
CN110933485A (zh) * 2019-10-21 2020-03-27 天脉聚源(杭州)传媒科技有限公司 Video subtitle generation method, system, apparatus and storage medium
CN110708568B (zh) * 2019-10-30 2021-12-10 北京奇艺世纪科技有限公司 Video content abrupt-change detection method and apparatus
CN111182347B (zh) * 2020-01-07 2021-03-23 腾讯科技(深圳)有限公司 Video clip cutting method, apparatus, computer device and storage medium
CN111510758A (zh) * 2020-04-24 2020-08-07 怀化学院 Synchronization method and system for piano video teaching
CN111641829B (zh) * 2020-05-16 2022-07-22 Oppo广东移动通信有限公司 Video processing method and apparatus, system, storage medium and electronic device
CN111641828B (zh) * 2020-05-16 2022-03-18 Oppo广东移动通信有限公司 Video processing method and apparatus, storage medium and electronic device
CN112802469A (zh) * 2020-12-28 2021-05-14 出门问问(武汉)信息科技有限公司 Method and apparatus for acquiring training data for a speech recognition model
CN112735476A (zh) * 2020-12-29 2021-04-30 北京声智科技有限公司 Audio data annotation method and apparatus
CN112699787B (zh) * 2020-12-30 2024-02-20 湖南快乐阳光互动娱乐传媒有限公司 Method and apparatus for detecting advertisement insertion time points
CN113052169A (zh) * 2021-03-15 2021-06-29 北京小米移动软件有限公司 Video subtitle recognition method, apparatus, medium and electronic device
CN113691758A (zh) * 2021-08-23 2021-11-23 深圳市慧鲤科技有限公司 Frame interpolation method and apparatus, device, and medium
CN113766314B (zh) * 2021-11-09 2022-03-04 北京中科闻歌科技股份有限公司 Video segmentation method, apparatus, device, system and storage medium
CN114554285A (zh) * 2022-02-25 2022-05-27 京东方科技集团股份有限公司 Video frame interpolation processing method, video frame interpolation processing apparatus and readable storage medium

Also Published As

Publication number Publication date
WO2023160617A1 (zh) 2023-08-31
CN114554285A (zh) 2022-05-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 23759240; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 18560891; Country of ref document: US)