WO2015081776A1 - Method and apparatus for processing video pictures - Google Patents

Method and apparatus for processing video pictures

Info

Publication number
WO2015081776A1
WO2015081776A1 (application PCT/CN2014/089946; related: CN2014089946W)
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
key
video
picture
screen
Prior art date
Application number
PCT/CN2014/089946
Other languages
English (en)
French (fr)
Inventor
张婧
邵丹丹
徐振华
Original Assignee
百度在线网络技术(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Priority to JP2016535328A (published as patent JP6266109B2)
Priority to KR1020157035232A (published as patent KR101746165B1)
Priority to US14/392,326 (published as patent US9973793B2)
Publication of WO2015081776A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/414Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
    • H04N21/41407Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/61Network physical structure; Signal processing
    • H04N21/6106Network physical structure; Signal processing specially adapted to the downstream path of the transmission network
    • H04N21/6131Network physical structure; Signal processing specially adapted to the downstream path of the transmission network involving transmission via a mobile phone network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4332Content storage operation, e.g. storage operation in response to a pause request, caching operations by placing content in organized collections, e.g. local EPG data repository
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4333Processing operations in response to a pause request
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for processing a video picture.
  • the first problem is currently addressed by providing a smoothly transcoded format.
  • the second problem can be mitigated by reducing the frame rate, to a minimum of 24 frames/second. It can also be handled passively: after receiving a report from the user, the video resource is replaced, or the user is encouraged to switch video nodes to skip the stuttering clip. For the third problem, there is currently no solution.
  • the present invention aims to solve at least one of the above technical problems.
  • a first object of the present invention is to provide a method of processing a video picture.
  • the method sorts the key pictures, generates a picture library, and plays the key pictures in the picture library, thereby saving traffic and enabling the user to quickly and conveniently understand the video content.
  • a second object of the present invention is to provide a processing apparatus for a video picture.
  • a method for processing a video picture includes the following steps: obtaining information of a current video; capturing key pictures of the current video according to that information, where the key pictures include video frame pictures with complete subtitles; sorting the key pictures to generate a picture library; and receiving a play request and reading the corresponding key pictures from the picture library according to the play request for playing.
  • the method obtains information of the current video, intercepts key pictures of the current video according to that information, sorts the key pictures to generate a picture library, and reads the corresponding key pictures from the picture library for playing according to the play request. This saves traffic and ensures that the user can quickly preview the video when the network is slow, so that the user can quickly and conveniently follow the story. It also reduces the bounce rate when users encounter stuttering in mobile video, improving the user experience.
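The four-step method summarized above could be sketched roughly as follows. All function and field names are illustrative assumptions, not taken from the patent, and the capture step is only a placeholder:

```python
def capture_key_pictures(source_path, subtitle_path):
    # Placeholder for the interception step described later: a real
    # implementation would decode frames at the key subtitle time points.
    return [{"time": 3.5, "file": "frame_0002.jpg"},
            {"time": 1.0, "file": "frame_0001.jpg"}]

def process_video(video_info, play_request):
    """Sketch of the claimed four steps: obtain info, capture, sort, serve."""
    # Step 1: obtain information of the current video.
    source_path = video_info["video_source_path"]
    subtitle_path = video_info["subtitle_file_path"]
    # Step 2: capture key pictures (frames carrying complete subtitles).
    key_pictures = capture_key_pictures(source_path, subtitle_path)
    # Step 3: sort by playing time to form the picture library.
    library = sorted(key_pictures, key=lambda pic: pic["time"])
    # Step 4: read the pictures matching the play request from the library.
    start, end = play_request["start"], play_request["end"]
    return [pic for pic in library if start <= pic["time"] <= end]
```

The sorting in step 3 is what turns loose screenshots into the "comic strip" album ordered by playing time.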
  • a processing device for a video picture includes: an obtaining module, an intercepting module, a generating module, and a playing module.
  • the processing device of the embodiment of the present invention obtains information of the current video, intercepts key pictures of the current video according to that information, generates a picture library, and reads the corresponding key pictures from the picture library for playing according to the play request. This saves traffic and ensures that the user can quickly preview the video when the network is slow, so that the user can quickly and conveniently follow the story. It also reduces the bounce rate when users encounter stuttering in mobile video, improving the user experience.
  • a storage medium configured to store an application for performing a processing method of a video picture according to the first aspect of the present invention.
  • FIG. 1 is a flow chart of a method of processing a video picture according to an embodiment of the present invention
  • FIG. 2 is a flow chart of a video playing process in accordance with one embodiment of the present invention.
  • FIG. 3 is a flow chart of intercepting a key picture of a current video according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of frame completion on the key picture time point sequence according to another embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a processing apparatus of a video picture according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a video picture processing apparatus according to an embodiment of the present invention.
  • FIG. 7 is another schematic structural diagram of a processing apparatus for a video picture according to an embodiment of the present invention.
  • the present invention provides a video picture processing method.
  • FIG. 1 is a flow chart of a method of processing a video picture in accordance with one embodiment of the present invention. As shown in FIG. 1, the processing method of the video picture includes the following steps:
  • the current video information is obtained from the video resource library, and the information may include a video source path, a subtitle file path, and the like.
  • the key pictures of the current video can be intercepted as follows: first, a key picture time point sequence is obtained according to the information of the current video; after the sequence is obtained, frame-completion processing can be performed on it, and the time points in the sequence can also be offset-corrected; finally, the key pictures of the current video are intercepted according to the key picture time point sequence.
  • the frame-completion processing and the offset correction have no strict execution order: the offset correction may be performed after the frame completion, or the frame completion may be performed after the offset correction.
  • the frame completion is an optional step. If it is performed, the key pictures further include pictures added at intervals to fill gaps.
  • the intercepted key pictures of the current video are sorted according to their playing order in the video, and constitute the picture library of the current video's comic strip mode.
  • in the comic strip mode, the key story pictures intercepted from the video constitute a series of albums sorted by playing time, so as to meet the need of watching the key plot.
  • S104 Receive a play request, and read a corresponding key screen from the screen library according to the play request for playing.
  • the video content has a comic strip mode and a normal video mode. Users can choose to watch the video, or browse the story through the picture library of the comic strip mode; the two modes can be switched at any time.
  • the video or the key pictures are automatically preloaded, and when the preloaded amount falls below a preset threshold (for example, when it cannot support 5 seconds of continuous playback), the player automatically switches to the comic strip mode, giving priority to ensuring that the user follows the story.
  • the user can also manually switch back to video mode and continue waiting for the preload. For example, as shown in FIG. 2, the video file is divided into a plurality of units and is preloaded and played unit by unit. Specifically, the following steps are included:
  • step S202: when playback reaches 0.75 of the current unit, determine whether the next unit has been preloaded. If preloading is complete, perform step S203; if not, perform step S204.
  • step S204: switch to the comic strip mode and preload the comic strip of the next unit, so that the user can follow the story in time; then return to step S202, so that once the video is preloaded, playback can switch back to the video of the corresponding unit.
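The decision made at the 0.75-unit mark (steps S202/S204) amounts to a small mode-selection rule. A hedged sketch, where the 0.75 check point is from the text but the function name and return values are assumptions:

```python
def choose_mode(position_in_unit, next_unit_preloaded):
    """Decide playback mode when a unit is partway through playing.

    position_in_unit: fraction of the current unit already played (0.0-1.0).
    next_unit_preloaded: whether the next unit has finished preloading.
    """
    if position_in_unit < 0.75:
        return "video"           # no decision yet; keep playing the video
    if next_unit_preloaded:
        return "video"           # S203: continue normal video playback
    return "comic_strip"         # S204: fall back to the picture library
```

Once in comic strip mode, the same check can run periodically so playback returns to video as soon as the next unit finishes loading.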
  • when playing in the comic strip mode, playback may proceed automatically or be driven manually.
  • in the automatic play mode, the corresponding key pictures are read in order from the picture library according to the automatic play request, and the picture library is played automatically at predetermined time intervals, for example one picture every 3 seconds.
  • the playback speed can be preset by the user according to his own needs.
  • for example, when the preloaded amount cannot support 5 s of continuous playback, the corresponding key pictures are read from the picture library for playing; after playing for a period of time, if the preloaded amount can again support 5 s of continuous playback, reading key pictures from the picture library stops and normal video playback resumes.
  • the above comic strip mode saves traffic. For example, a standard-definition video with a duration of 30 minutes consumes more than 100 MB, while the comic strip mode requires only about 9 MB, saving about 90% of the traffic. As a result, users can be retained with very little traffic, which increases access frequency and user satisfaction.
  • a function to support user interaction is also provided. Users can interact with key storylines to create rich user-generated content. Moreover, advertisements in the form of videos and pictures can be inserted into the picture library as pictures to provide more information to the user. Users can also use the fragmentation time to watch the drama anytime and anywhere, like watching novels and watching pictures, without being strictly restricted by the environment and the network.
  • the method of the embodiment of the present invention obtains information of the current video, intercepts key pictures of the current video according to that information, sorts the key pictures to generate a picture library, and reads the corresponding key pictures from the picture library for playing according to the play request. This saves traffic and ensures that the user can quickly preview the video when the network is slow, so that the user can quickly and conveniently follow the story. It also reduces the bounce rate when users encounter stuttering in mobile video, improving the user experience.
  • this embodiment proposes a method for intercepting the key pictures of the current video. As shown in FIG. 3, the method may include the following steps:
  • the start and end time points of each spoken subtitle in the current video can be obtained through techniques such as network lookup, voice recognition, or image recognition, and one frame is intercepted within the time range of each spoken subtitle, ensuring that all subtitles can be completely read just by viewing the intercepted pictures.
  • the key picture time point sequence may be obtained from the subtitle file. Movie subtitles generally come either as graphic subtitle files or as text-format subtitle files. For a graphic subtitle file, its index file can be analyzed to obtain the sequence of "subtitle time ranges" with dialogue in the video; for text-format subtitle files such as .srt and .ass files, the "subtitle time range" sequence can be analyzed automatically by existing programs. Finally, a "key picture time point" sequence is generated from the intermediate value (or another value) of each "subtitle time range" in the sequence; the intermediate value or other chosen value must guarantee a video frame with the complete subtitle.
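For a text-format subtitle file, the derivation of key time points from the midpoints of "subtitle time ranges" could be sketched as below. The .srt timestamp regex and the midpoint choice follow the description above, but the function names are assumptions:

```python
import re

# Matches one .srt time range line, e.g. "00:00:01,000 --> 00:00:03,000".
SRT_RANGE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def key_time_points(srt_text):
    """Return one key time point per subtitle: the midpoint of its range."""
    points = []
    for match in SRT_RANGE.finditer(srt_text):
        start = _seconds(*match.groups()[:4])
        end = _seconds(*match.groups()[4:])
        # The midpoint of the display window should show the full subtitle.
        points.append((start + end) / 2)
    return points
```

A frame grabbed at each midpoint falls well inside the subtitle's display window, which is why the midpoint (rather than the start or end instant) is a safe choice.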
  • alternatively, speech analysis can be used: the voice parts are distinguished by voice recognition to obtain the start and end time points of each spoken subtitle, from which the key picture time points are derived.
  • image recognition may also be adopted: the video is converted into frames at predetermined time intervals, the frames with a complete subtitle in a specific area are identified by picture recognition, and the final picture frame sequence is obtained directly after de-duplication. The times corresponding to this frame sequence form the key picture time point sequence.
  • the predetermined interval is preferably greater than 1/24 second, because the frame rate of the video is 24 frames/s.
  • S302: Perform frame-completion processing on the key picture time point sequence.
  • this step is optional.
  • a "completion frame" is intercepted every 5 seconds when there is no speech, because even without dialogue there may be an action shot, and action shots also affect the user's understanding of the plot. A 1-minute video therefore yields about 15 screenshots, ensuring the continuity of the plot.
  • each image is about 20 KB in size, so the image files corresponding to one minute of the current video total about 300 KB.
  • the frame-completion process may be: determine whether the time interval between two adjacent key picture time points is greater than a predetermined value; if it is, obtain a new key picture time point between the two adjacent points and insert it into the key picture time point sequence. For example, when two adjacent "key picture time points" differ by more than 4 s, a screenshot at the intermediate time point is inserted between them; as shown in FIG. 4, the interval between 3.484 and 20.196 is greater than 4 s.
  • the ellipsis indicates that several subsequent key time point sequence objects are omitted.
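The midpoint-insertion rule above, applied once per over-wide gap as in the FIG. 4 example, could be sketched as follows (function name and rounding are assumptions):

```python
def complete_frames(points, max_gap=4.0):
    """Insert one midpoint between adjacent points more than max_gap apart."""
    points = sorted(points)
    out = [points[0]]
    for nxt in points[1:]:
        prev = out[-1]
        if nxt - prev > max_gap:
            # Insert a screenshot time at the intermediate value, as in FIG. 4.
            out.append(round((prev + nxt) / 2, 3))
        out.append(nxt)
    return out
```

With the values from the example, the 3.484-to-20.196 gap exceeds 4 s, so an intermediate time point at 11.84 is inserted between them.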
  • S303: Perform offset correction on the key picture time points in the sequence.
  • this step is also optional. Since subtitle files generally contain errors, the "key picture time points" obtained from the subtitle file need to be corrected automatically by an offset check. The correction program can determine the offset parameter by verifying the start times of the first 10 "subtitle time ranges": the parameter can be obtained automatically by comparing the start time points found by image recognition or voice recognition with the start times of the subtitles in the subtitle file, or by other methods.
  • the offset correction is then applied to the "key picture time point" sequence using the offset parameter.
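One plausible reading of the offset check is to average the difference between detected and file-declared start times over the first few subtitles, then shift every key time point by that amount. The averaging and the sample size handling are assumptions:

```python
def estimate_offset(subtitle_starts, detected_starts, sample=10):
    """Estimate a constant subtitle-file offset from the first `sample` pairs.

    subtitle_starts: start times declared in the subtitle file.
    detected_starts: start times found by voice/image recognition.
    """
    pairs = list(zip(subtitle_starts, detected_starts))[:sample]
    diffs = [detected - declared for declared, detected in pairs]
    return sum(diffs) / len(diffs)   # mean offset over the sample

def correct(points, offset):
    """Apply the offset to every key picture time point."""
    return [p + offset for p in points]
```

If the subtitle file runs half a second early everywhere, the estimated offset is +0.5 s and every capture time is pushed later by that amount.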
  • steps S302 and S303 have no strict execution order: the offset correction may be performed after the frame completion, or the frame completion may be performed after the offset correction.
  • the key pictures of the current video can be intercepted according to the key picture time point sequence using FFmpeg, an open-source, free, cross-platform audio/video processing tool.
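A minimal sketch of driving the FFmpeg command line to grab one frame per key time point; the output naming scheme and the JPEG format are assumptions, not from the patent:

```python
import subprocess

def ffmpeg_capture_cmd(video_path, time_point, out_path):
    """Build an FFmpeg command that extracts a single frame at time_point."""
    return [
        "ffmpeg", "-y",                # overwrite existing output silently
        "-ss", f"{time_point:.3f}",    # seek to the key time point (fast input seek)
        "-i", video_path,
        "-frames:v", "1",              # grab exactly one video frame
        out_path,
    ]

def capture_key_pictures(video_path, points):
    """Run FFmpeg once per key time point (requires ffmpeg on PATH)."""
    for i, t in enumerate(points):
        cmd = ffmpeg_capture_cmd(video_path, t, f"key_{i:04d}.jpg")
        subprocess.run(cmd, check=True)
```

Placing `-ss` before `-i` makes FFmpeg seek on the input, which is much faster than decoding from the start for each screenshot.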
  • the intercepted key pictures of the current video are sorted according to their playing order in the video, and constitute the picture library of the current video's comic strip mode.
  • in the comic strip mode, the key story pictures intercepted from the video constitute a series of albums sorted by playing time, so as to meet the need of watching the key plot.
  • the comic strip mode has outstanding advantages in the mobile environment: the loading traffic is small and loading is fast.
  • an HD video with a duration of 1 minute is about 20 MB, the SD version is about 4 MB, and the picture library in comic strip mode only needs about 300 KB. Assuming the user's network speed is 10 KB/s, loading the HD video requires about 34 minutes and loading the SD video about 7 minutes, while loading the picture library in comic strip mode takes only 30 seconds.
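The loading-time figures above can be checked with simple arithmetic at the assumed 10 KB/s link speed (the 1 MB = 1024 KB convention here is an assumption):

```python
SPEED_KB_S = 10                      # assumed network speed, KB per second

def load_minutes(size_kb):
    """Minutes needed to download size_kb kilobytes at SPEED_KB_S."""
    return size_kb / SPEED_KB_S / 60

hd_min = load_minutes(20 * 1024)     # ~20 MB HD video  -> about 34 minutes
sd_min = load_minutes(4 * 1024)      # ~4 MB SD video   -> about 7 minutes
comic_s = load_minutes(300) * 60     # 300 KB library   -> 30 seconds
```

The three results reproduce the 34-minute, 7-minute, and 30-second figures quoted in the text.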
  • the intercepted key pictures are thus more coherent, accurate, and comprehensive, enabling the user to preview the video quickly and to understand the story faster, more conveniently, and more completely, enhancing the user experience.
  • FIG. 5 is a schematic structural diagram of a processing apparatus of a video picture according to an embodiment of the present invention.
  • the processing device of the video picture includes: an obtaining module 100, an intercepting module 200, a generating module 300, and a playing module 400.
  • the obtaining module 100 is configured to obtain information of the current video.
  • the obtaining module 100 obtains information of the current video from the video resource library, and the information may include a video source path, a subtitle file path, and the like.
  • the intercepting module 200 is configured to intercept a key picture of the current video according to the information of the current video obtained by the obtaining module 100, where the key picture includes a video frame picture with complete subtitles.
  • the intercepting module 200 may obtain a key picture time point sequence according to the information of the current video obtained by the obtaining module 100. After obtaining the sequence, the intercepting module 200 may perform frame-completion processing on it, and may also perform offset correction on the time points in the sequence; finally, it intercepts the key pictures of the current video according to the key picture time point sequence. It should be noted that the frame-completion processing and the offset correction have no strict execution order: the offset correction may be performed after the frame completion, or the frame completion may be performed after the offset correction.
  • the generating module 300 is configured to sort the key pictures intercepted by the intercepting module 200 to generate a picture library.
  • the generating module 300 sorts the intercepted key pictures of the current video according to their playing order in the video to form the picture library of the current video's comic strip mode.
  • in the comic strip mode, the key story pictures intercepted from the video constitute a series of albums sorted by playing time, so as to meet the need of watching the key plot.
  • the playing module 400 is configured to receive a play request, and read the corresponding key pictures from the picture library generated by the generating module 300 according to the play request for playing.
  • the video content has a comic strip mode and a normal video mode. Users can choose to watch the video, or browse the story through the picture library of the comic strip mode; the two modes can be switched at any time.
  • the video or the key pictures are automatically preloaded, and when the preloaded amount falls below the preset threshold (for example, when it cannot support 5 s of continuous playback), the player automatically switches to the comic strip mode, giving priority to ensuring that the user follows the story. At the same time, the user can also manually switch back to video mode and continue waiting for the preload.
  • when playing in the comic strip mode, playback may proceed automatically or be driven manually.
  • in the automatic play mode, the playing module 400 reads the corresponding key pictures in order from the picture library according to the automatic play request, and plays the picture library automatically at one picture every 3 seconds. This ensures that the user can read the subtitles comfortably while roughly matching the playing time of a typical TV-drama video. It can be understood that the playback speed can also be preset by the user according to his own needs.
  • the playing module 400 can also read the corresponding key pictures from the picture library for playing according to an invoking request. For example, when the preloaded amount cannot support 5 s of continuous playback, the corresponding key pictures are read from the picture library for playing; after playing for a period of time, if the preloaded amount can again support 5 s of continuous playback, reading key pictures from the picture library stops and normal video playback resumes.
  • the above comic strip mode saves traffic. For example, a standard-definition video with a duration of 30 minutes consumes more than 100 MB, while the comic strip mode requires only about 9 MB, saving about 90% of the traffic. As a result, users can be retained with very little traffic, which increases access frequency and user satisfaction.
  • a function to support user interaction is also provided. Users can interact with key storylines to create rich user-generated content. Moreover, advertisements in the form of videos and pictures can be inserted into the picture library as pictures to provide more information to the user. Users can also use the fragmentation time to watch the drama anytime and anywhere, like watching novels and watching pictures, without being strictly restricted by the environment and the network.
  • the processing device of the embodiment of the present invention obtains information of the current video, intercepts key pictures of the current video according to that information, generates a picture library, and reads the corresponding key pictures from the picture library for playing according to the play request. This saves traffic and ensures that the user can quickly preview the video when the network is slow, so that the user can quickly and conveniently follow the story. It also reduces the bounce rate when users encounter stuttering in mobile video, improving the user experience.
  • FIG. 6 is a schematic structural diagram of a video picture processing apparatus according to an embodiment of the present invention
  • FIG. 7 is another schematic structural diagram of a video picture processing apparatus according to an embodiment of the present invention.
  • the processing device of the video picture includes: an obtaining module 100, an intercepting module 200, a generating module 300, and a playing module 400.
  • the intercepting module 200 specifically includes: a time point sequence obtaining unit 210, an intercepting unit 220, a frame completion unit 230, and a correction unit 240.
  • the time point sequence obtaining unit 210 is configured to obtain a key picture time point sequence according to the information of the current video.
  • the time point sequence obtaining unit 210 may acquire the start and end time points of each spoken subtitle in the current video through techniques such as network lookup, voice recognition, or image recognition, and intercept one frame within each spoken subtitle's time range, ensuring that all subtitles can be completely read just by viewing the intercepted pictures.
  • the time point sequence obtaining unit 210 may obtain the key picture time point sequence from the subtitle file. Movie subtitles are generally divided into graphic subtitle files and text-format subtitle files. For a graphic subtitle file, the unit 210 may obtain the "subtitle time range" sequence of the video by analyzing its index file; for text-format subtitle files such as .srt and .ass files, the unit 210 may analyze the "subtitle time range" sequence through an existing program.
  • the time point sequence obtaining unit 210 can also use speech analysis: the voice parts are distinguished by voice recognition to obtain the start and end time points of each spoken subtitle, from which the key picture time points are derived.
  • the time point sequence obtaining unit 210 may also adopt image recognition: the video is converted into consecutive frames at intervals of 0.5 s, the frames with a complete subtitle in a specific area are identified by picture recognition, and the final picture frame sequence is obtained directly; the times corresponding to this frame sequence form the key picture time point sequence.
  • the intercepting unit 220 is configured to intercept the key pictures of the current video according to the key picture time point sequence.
  • the frame completion unit 230 is configured to determine whether the time interval between two adjacent key picture time points in the sequence obtained by the time point sequence obtaining unit 210, or corrected by the correction unit 240, is greater than a predetermined value; if it is, a new key picture time point is obtained between the two adjacent points and inserted into the key picture time point sequence.
  • the 1 minute long video is about 15 screenshots to ensure the continuity of the plot.
  • Each image is 20k in size, and all the image files corresponding to the current video are 300k.
  • the complement frame unit 230 determines whether the time interval between two adjacent keyword screen time points is greater than a predetermined value, and if greater, obtains a new keyword screen time point between adjacent two keyword screen time points. And add the new keyword screen time point into the keyword screen time point sequence. For example, when two adjacent "keyword time points" differ by more than 5s, insert a screenshot of the intermediate value at the time point between them, insert 2 screenshots over 6s, and so on, to ensure that there is at least an average of 4s. A picture to ensure the continuity of the plot.
  • the correcting unit 240 is configured to perform offset correction on the key-subtitle time points in the sequence obtained by the time point sequence obtaining unit 210 or the complement frame unit 230.
  • since subtitle files generally contain errors, the "key-subtitle time points" obtained from the subtitle file need automatic correction through an offset check; the correcting unit 240 may determine the offset by verifying the start times of the first 10 "subtitle time ranges".
  • the correcting unit 240 then offset-corrects the "key-subtitle time point" sequence with the offset parameter.
  • the complement-frame processing by the complement frame unit 230 and the offset correction by the correcting unit 240 have no strict execution order: frames may be complemented before the offset correction, or the offset correction may be performed first.
  • by performing complement-frame processing on the key-subtitle time point sequence and offset correction on its key-subtitle time points, the video picture processing apparatus of the embodiments of the present invention makes the captured key pictures more coherent and accurate, so that the user can quickly preview the video and understand the story more quickly and conveniently, improving the user experience.
  • the present invention also provides a storage medium storing an application for performing the method for processing video pictures according to any embodiment of the present invention.
  • portions of the invention may be implemented in hardware, software, firmware or a combination thereof.
  • multiple steps or methods may be implemented with software or firmware stored in a memory and executed by a suitable instruction execution system.
  • for example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and so on.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)
  • Studio Circuits (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Databases & Information Systems (AREA)

Abstract

The present invention provides a method and an apparatus for processing video pictures. The method comprises: obtaining information of a current video; capturing key pictures of the current video according to the information, the key pictures including video frames bearing complete subtitles; sorting the key pictures to generate a picture library; and receiving a playing request and reading the corresponding key pictures from the picture library for playing according to the request. In the embodiments of the present invention, by obtaining the information of the current video, capturing its key pictures according to that information, sorting the key pictures to generate a picture library, and reading the corresponding key pictures from the library for playing according to a playing request, data is saved while ensuring that the user can preview the video even when the network stutters, so that the user understands the plot quickly, conveniently and completely. This reduces the bounce rate when mobile video stutters and improves the user experience.

Description

METHOD AND APPARATUS FOR PROCESSING VIDEO PICTURES
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201310646783.6, entitled "Method and apparatus for processing video pictures", filed by Baidu Online Network Technology (Beijing) Co., Ltd. on December 4, 2013.
Technical field
The present invention relates to the field of communication technology, and in particular to a method and an apparatus for processing video pictures.
Background
With the rapid development of terminal technology, applications have become increasingly diversified, and users increasingly install different applications on their terminals to assist with business, entertainment, daily life and other activities.
At present, many users like to watch online video on mobile terminals, but doing so consumes considerable mobile data: on average, each minute of mobile video consumes about 4 MB for standard definition and about 20 MB for high definition. For the majority of users with only about a hundred megabytes of data per month, this is a substantial barrier to watching online video.
In addition, affected by network speed, video resources, handset performance and other factors, users often encounter video stuttering while watching mobile video. When stuttering occurs, nearly half of the users leave the page or quit the product directly, so their viewing needs cannot be met.
Because the network environment strongly affects the transmission speed of video resources, users can watch online video only in quiet places with a good mobile network; in poor network environments such as the subway, viewing is usually impossible, and fragments of spare time are likewise unsuitable for viewing.
Watching online video therefore currently suffers from the following problems: 1. heavy consumption of mobile data; 2. severe video stuttering; 3. restrictions on viewing place and time.
The first problem is currently addressed by providing smoothly transcoded formats. The second can be mitigated by reducing the frame rate, down to a minimum of 24 frames per second, or by passively replacing the video resource after user reports, or by encouraging users to switch video nodes or skip the stuttering segment. There is as yet no solution to the third problem.
However, although smooth transcoding and frame-rate reduction improve data consumption and stuttering, the existing problems remain serious; user reports and node switching are passive measures that can only resolve stuttering after the fact.
Summary
The present invention aims to solve at least one of the above technical problems.
To this end, a first object of the present invention is to provide a method for processing video pictures. The method captures key pictures of the current video, sorts them to generate a picture library, and plays the key pictures from the picture library, thereby saving data and allowing the user to understand the video content quickly and conveniently.
A second object of the present invention is to provide an apparatus for processing video pictures.
To achieve the above objects, a method for processing video pictures according to embodiments of the first aspect of the present invention comprises the following steps: obtaining information of a current video; capturing key pictures of the current video according to the information, the key pictures including video frames bearing complete subtitles; sorting the key pictures to generate a picture library; and receiving a playing request and reading the corresponding key pictures from the picture library for playing according to the request.
With the method of the embodiments of the present invention, by obtaining the information of the current video, capturing its key pictures according to that information, sorting the key pictures to generate a picture library, and reading the corresponding key pictures from the library for playing according to a playing request, data is saved while ensuring that the user can quickly preview the video when the network stutters, so that the user understands the plot quickly and conveniently. This reduces the bounce rate when mobile video stutters and improves the user experience.
To achieve the above objects, an apparatus for processing video pictures according to embodiments of the second aspect of the present invention comprises: an obtaining module, a capturing module, a generating module and a playing module.
With the apparatus of the embodiments of the present invention, by obtaining the information of the current video, capturing its key pictures according to that information, sorting the key pictures to generate a picture library, and reading the corresponding key pictures from the library for playing according to a playing request, data is saved while ensuring that the user can quickly preview the video when the network stutters, so that the user understands the plot quickly and conveniently. This reduces the bounce rate when mobile video stutters and improves the user experience.
To achieve the above objects, a storage medium according to embodiments of the third aspect of the present invention stores an application for performing the method for processing video pictures according to the embodiments of the first aspect of the present invention.
Additional aspects and advantages of the present invention will be given in part in the following description, will in part become apparent from it, or will be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of a method for processing video pictures according to an embodiment of the present invention;
Fig. 2 is a flow chart of a video playing process according to an embodiment of the present invention;
Fig. 3 is a flow chart of capturing key pictures of the current video according to a specific embodiment of the present invention;
Fig. 4 is a schematic diagram of complementing frames in a key-subtitle time point sequence according to another embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an apparatus for processing video pictures according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an apparatus for processing video pictures according to a specific embodiment of the present invention;
Fig. 7 is another schematic structural diagram of an apparatus for processing video pictures according to a specific embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below, examples of which are shown in the accompanying drawings, where identical or similar reference numerals throughout denote identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary and serve only to explain the present invention; they shall not be construed as limiting it. On the contrary, the embodiments of the present invention cover all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
In the description of the present invention, it should be understood that the terms "first", "second" and the like are used for descriptive purposes only and shall not be understood as indicating or implying relative importance. It should also be noted that, unless otherwise explicitly specified and defined, the terms "joined" and "connected" shall be understood broadly: the connection may be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediary. A person of ordinary skill in the art may understand the specific meanings of these terms in the present invention according to the specific situation. Furthermore, in the description of the present invention, "a plurality of" means two or more unless otherwise stated.
Any process or method described in a flow chart or otherwise herein may be understood as representing a module, segment or portion of code comprising executable instructions of one or more steps for implementing a specific logical function or process, and the scope of the preferred embodiments of the present invention includes further implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The method and apparatus for processing video pictures according to embodiments of the present invention are described in detail below with reference to the accompanying drawings.
To solve the problems of heavy mobile data consumption, severe video stuttering, and restricted viewing place and time in current online video watching, the present invention provides a method for processing video pictures.
Fig. 1 is a flow chart of a method for processing video pictures according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
S101: obtain information of the current video.
The information of the current video is first obtained from a video resource library; it may include the video source path, the subtitle file path, and so on.
S102: capture key pictures of the current video according to its information, the key pictures including video frames bearing complete subtitles.
Capturing the key pictures of the current video may be done as follows. First, a key-subtitle time point sequence is obtained from the information of the current video. After the sequence has been obtained, it may be subjected to complement-frame processing, and the key-subtitle time points in it may be subjected to offset correction; finally, the key pictures of the current video are captured according to the key-subtitle time point sequence. Note that the complement-frame processing and the offset correction have no strict execution order: frames may be complemented before the offset correction, or the offset correction may be performed first. Both are optional steps; if complement-frame processing is performed, the key pictures also include the pictures complemented according to the time intervals.
S103: sort the key pictures to generate a picture library.
In this embodiment, the captured key pictures of the current video are sorted in their playing order in the video, forming a comic-strip-mode picture library of the current video. In the comic-strip mode, the key plot pictures of the video are captured to form a series of albums sorted by playing time, satisfying the need to watch the key plot.
S104: receive a playing request, and read the corresponding key pictures from the picture library for playing according to the request.
In this embodiment, the video content has a comic-strip mode and a normal video mode. The user may choose to watch the video, or browse the plot through the comic-strip-mode picture library; the two modes can be switched between freely. When video stuttering occurs, the player can switch to comic-strip mode automatically. Specifically, during video playback the video or the key pictures are preloaded automatically; when the preloaded amount falls below a preset threshold (for example, when it cannot support 5 s of continuous playback), the player switches automatically to comic-strip mode, giving priority to letting the user follow the plot. The user may also switch back to video mode manually and continue waiting for the preload. For example, as shown in Fig. 2, the video file is divided into multiple units, and is preloaded and played unit by unit. Specifically, the process includes the following steps:
S201: after one unit has been preloaded, play the video of the current unit.
S202: when playback reaches 0.75 of the unit, determine whether the next unit has finished preloading; if it has, execute step S203, otherwise execute step S204.
S203: continue playing the video of the next unit; the operation ends.
S204: switch to comic-strip mode and preload the comic strip of the next unit, so that the user can follow the plot in time; then return to step S202, so that when the video has finished preloading, the player can switch back to video mode and play the video of the corresponding unit.
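Steps S201 to S204 above amount to a small checkpoint decision at 0.75 of each unit. The sketch below is a minimal illustration of that decision, not the patent's implementation; the function name `choose_mode` and the unit bookkeeping are hypothetical.

```python
def choose_mode(play_pos, preloaded_units):
    """Decide the playback mode at the 0.75 checkpoint of the current unit.

    play_pos: playback position measured in units (e.g. 1.75 means 75%
              of the way through the second unit).
    preloaded_units: number of fully preloaded video units.
    Returns "video" to keep playing video, or "comic" to switch to the
    comic-strip (picture library) mode while the next unit preloads.
    """
    current_unit = int(play_pos) + 1          # 1-based index of the playing unit
    if play_pos - int(play_pos) >= 0.75:      # reached the S202 checkpoint
        if preloaded_units >= current_unit + 1:
            return "video"                    # next unit ready: stay in video mode
        return "comic"                        # fall back to comic-strip mode (S204)
    return "video"                            # before the checkpoint: keep playing
```

Per S202/S204, a player would call this repeatedly during playback and switch back to video mode once the pending unit finishes preloading.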
In this embodiment, playback in comic-strip mode may proceed automatically or manually. In automatic playback, the corresponding key pictures are read from the picture library in order according to an automatic playing request, and the picture library is played back automatically in order at a predetermined interval, for example one frame every 3 seconds. This ensures that the user can read the subtitles comfortably, and the playback speed matches that of normal video viewing, giving the user an experience consistent with watching the video. It is understood that the playback speed may be preset by the user according to the user's own needs. In manual or automatic playback, the corresponding key pictures may also be read from the picture library according to a call request. For example, when the preloaded amount cannot support 5 s of continuous playback, the corresponding key pictures are read from the picture library and played; after playing for a while, if the preloaded amount can again support 5 s of continuous playback, reading key pictures from the picture library can stop and normal video playback resumes.
The comic-strip mode saves data. For example, a 30-minute standard-definition video consumes over 100 MB, whereas the comic-strip mode needs only about 9 MB, saving 90% of the data. Users can thus follow a series with very little data, which increases visit frequency and user satisfaction.
Under each frame of the comic-strip mode, features supporting user interaction are also provided. Users can discuss the key plot with each other, creating rich user-generated content. Moreover, advertisements in both video and picture form can be inserted into the picture library as pictures, offering users more information. Users can also exploit fragments of spare time to follow a series anywhere and anytime, as if reading a novel or browsing pictures, without being strictly limited by environment or network.
With the method for processing video pictures of the embodiments of the present invention, by obtaining the information of the current video, capturing its key pictures according to that information, sorting the key pictures to generate a picture library, and reading the corresponding key pictures from the library for playing according to a playing request, data is saved while ensuring that the user can quickly preview the video when the network stutters, so that the user understands the plot quickly and conveniently. This reduces the bounce rate when mobile video stutters and improves the user experience.
To capture the key pictures of the current video so that the user can preview the video and understand the plot more quickly, conveniently and completely, this embodiment provides a method for capturing the key pictures of the current video. As shown in Fig. 3, the method may include the following steps:
S301: obtain a key-subtitle time point sequence according to the information of the current video.
In this embodiment, the start and end time points of each speech subtitle in the current video can be obtained via the network or by techniques such as speech recognition or image recognition, and one frame is captured at the end time point of each speech subtitle, ensuring that all subtitles can be read in full from the captured pictures alone.
Specifically, the key-subtitle time point sequence may be obtained from the subtitle file. Movie subtitles generally come as graphic subtitle files or text subtitle files. For a graphic subtitle file, the sequence of "subtitle time ranges" containing dialogue can be obtained by analyzing its index file; for a text subtitle file, such as one in .srt or .ass format, the "subtitle time range" sequence can be extracted automatically by an existing program. Finally, the "key-subtitle time point" sequence is generated from the middle value, or another value, of each "subtitle time range" member of this sequence; of course, the middle or other value is chosen so as to guarantee a video frame bearing a complete subtitle.
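For a text subtitle file, the "subtitle time range" extraction and the middle-value rule described above can be sketched as follows. This is an illustrative .srt parser written for this description; the patent only states that an existing program may be used.

```python
import re

# An .srt timing line looks like: 00:00:01,000 --> 00:00:03,000
SRT_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def to_seconds(h, m, s, ms):
    """Convert an hh:mm:ss,mmm timestamp to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def key_subtitle_times(srt_text):
    """Extract the 'subtitle time range' sequence from .srt text and return
    the key-subtitle time point sequence, here taken as the midpoint of each
    range, which guarantees a frame bearing the complete subtitle."""
    points = []
    for m in SRT_TIME.finditer(srt_text):
        start = to_seconds(*m.groups()[:4])
        end = to_seconds(*m.groups()[4:])
        points.append((start + end) / 2.0)
    return points
```

The midpoint is one choice; as the text notes, any value inside the range that yields a frame with the complete subtitle would do.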
Besides obtaining the key-subtitle time point sequence from the subtitle file, speech analysis may be used: the voiced parts are distinguished by speech recognition, and the start and end time points of each speech subtitle are obtained, yielding the key-subtitle time points. Image recognition may also be used: the video is converted into consecutive frames at a predetermined interval, the frames bearing a complete subtitle in a specific region are identified by picture recognition, and after deduplication the final picture-frame sequence is obtained directly; the times corresponding to these frames then form the key-subtitle time point sequence.
The predetermined interval is preferably greater than 1/24 second, since the sampling rate of such video is 24 frames per second.
S302: perform complement-frame processing on the key-subtitle time point sequence.
This step is optional. When the interval between two captured frames is too long, for example longer than a predetermined 5 seconds, a "complement frame" is captured every 5 seconds: although there is no speech in this period, action shots may occur, and action shots also affect the user's understanding of the plot. A 1-minute video therefore yields about 15 screenshots, ensuring the continuity of the plot; at about 20 KB per image, all image files for the current video total about 300 KB.
Specifically, the complement-frame process may be: determine whether the interval between two adjacent key-subtitle time points is greater than a predetermined value; if it is, obtain one or more new key-subtitle time points between the two adjacent points and insert them into the key-subtitle time point sequence. For example, when two adjacent "key-subtitle time points" differ by more than 4 s, a screenshot at an intermediate time point is inserted between them. As shown in Fig. 4, the interval between 3.484 and 20.196 is more than 4 times 4 s, so 4 frames need to be inserted between them (the frames marked "-tween" are the inserted frames); the interval between 20.196 and 28.887 is more than 2 times 4 s, so 2 frames are inserted between them, and so on, guaranteeing on average at least one picture every 4 s and thus the continuity of the plot.
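The complement-frame rule just described, including the Fig. 4 example (4 tween frames between 3.484 and 20.196, 2 between 20.196 and 28.887), can be sketched as follows. Evenly spaced insertion is one reasonable reading of "an intermediate time point"; the function name is hypothetical.

```python
def complement_frames(times, max_gap=4.0):
    """Insert evenly spaced extra time points ('-tween' frames) wherever two
    adjacent key-subtitle time points are more than max_gap seconds apart,
    so that on average there is at least one picture every max_gap seconds."""
    out = []
    for i, t in enumerate(times):
        out.append(t)
        if i + 1 < len(times):
            gap = times[i + 1] - t
            n = int(gap // max_gap)           # number of tween frames to insert
            if gap > max_gap and n > 0:
                step = gap / (n + 1)          # spread the tweens evenly
                out.extend(t + step * (k + 1) for k in range(n))
    return out
```

With the Fig. 4 values, a 16.712 s gap yields 4 tween frames and an 8.691 s gap yields 2, matching the example.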
An example of the result of implementing the above complement-frame process with an algorithm is as follows:
Figure PCTCN2014089946-appb-000001
Figure PCTCN2014089946-appb-000002
where the ellipsis indicates that several subsequent key time point sequence objects are omitted.
S303: perform offset correction on the key-subtitle time points in the key-subtitle time point sequence.
This step is also optional. Since subtitle files generally contain errors, the "key-subtitle time points" obtained from the subtitle file need to be corrected automatically through an offset check. The correction program may determine the offset parameter by verifying the start times of the first 10 "subtitle time ranges"; the offset parameter may be obtained automatically by comparing the start time points obtained by image or speech recognition with the start times of the subtitles in the subtitle file, or by other methods. The "key-subtitle time point" sequence is then offset-corrected with this parameter.
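One way to realize the offset check described in S303 is to average the discrepancy between the first n start times in the subtitle file and the start times obtained by recognition, then shift the whole sequence by that amount. This is a sketch under that assumption; the patent does not fix the exact estimator.

```python
def offset_correct(key_points, file_starts, detected_starts, n=10):
    """Estimate the subtitle-file timing error from the first n 'subtitle
    time range' start times and shift the whole key-subtitle sequence by it.

    file_starts:     start times read from the subtitle file.
    detected_starts: start times obtained by speech or image recognition.
    """
    pairs = list(zip(file_starts, detected_starts))[:n]
    # average discrepancy between recognized and file timings
    offset = sum(d - f for f, d in pairs) / len(pairs)
    return [t + offset for t in key_points]
```

A robust variant might use the median instead of the mean to resist a few misrecognized starts.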
Note that steps S302 and S303 have no strict execution order: frames may be complemented before the offset correction, or the offset correction may be performed first.
S304: capture the key pictures of the current video according to the key-subtitle time point sequence.
In this embodiment, the key pictures of the current video can be captured according to the key-subtitle time point sequence with FFmpeg, a free, open-source, cross-platform audio and video processing program.
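Capturing one frame per key-subtitle time point with FFmpeg can be done by seeking with `-ss` and taking a single frame with `-frames:v 1`. The sketch below only builds the command lines rather than running them; the file names and output pattern are hypothetical.

```python
def ffmpeg_capture_commands(video_path, time_points,
                            out_pattern="frame_{:04d}.jpg"):
    """Build one ffmpeg invocation per key-subtitle time point.
    Each command seeks to the time point and writes a single frame."""
    cmds = []
    for i, t in enumerate(time_points):
        cmds.append([
            "ffmpeg", "-ss", f"{t:.3f}",   # seek to the key time point
            "-i", video_path,
            "-frames:v", "1",              # capture exactly one frame
            out_pattern.format(i),
        ])
    return cmds
```

Each command list could then be passed to `subprocess.run` to produce the screenshots that make up the picture library.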
In this embodiment, the captured key pictures of the current video are sorted in their playing order in the video, forming the comic-strip-mode picture library of the current video. In the comic-strip mode, the key plot pictures of the video are captured to form a series of albums sorted by playing time, satisfying the need to watch the key plot. The comic-strip mode has outstanding advantages in a mobile environment: it loads little data and loads quickly. A 1-minute high-definition video is about 20 MB and the standard-definition version about 4 MB, while the comic-strip-mode picture library needs only about 300 KB. Assuming a network speed of 10 KB/s, loading the high-definition video takes about 34 minutes and the standard-definition video about 7 minutes, while loading the comic-strip-mode picture library takes only about 30 seconds.
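The loading-time comparison above is simple arithmetic over resource size and network speed; a quick check of the quoted figures:

```python
def load_seconds(size_kb, speed_kbps=10):
    """Seconds needed to load a resource of size_kb KB at speed_kbps KB/s."""
    return size_kb / speed_kbps

# 1-minute HD video ~20 MB, SD ~4 MB, comic-strip picture library ~300 KB
hd_minutes = load_seconds(20 * 1024) / 60   # about 34 minutes
sd_minutes = load_seconds(4 * 1024) / 60    # about 7 minutes
library_s = load_seconds(300)               # 30 seconds
```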
In the embodiments of the present invention, performing complement-frame processing on the key-subtitle time point sequence and offset correction on its key-subtitle time points makes the captured key pictures more coherent, accurate and complete, so that the user can quickly preview the video and understand the plot more quickly, conveniently and completely, improving the user experience.
Fig. 5 is a schematic structural diagram of an apparatus for processing video pictures according to an embodiment of the present invention. As shown in Fig. 5, the apparatus comprises: an obtaining module 100, a capturing module 200, a generating module 300 and a playing module 400.
The obtaining module 100 is configured to obtain information of the current video.
In this embodiment, the obtaining module 100 obtains the information of the current video from a video resource library; the information may include the video source path, the subtitle file path, and so on.
The capturing module 200 is configured to capture key pictures of the current video according to the information obtained by the obtaining module 100, the key pictures including video frames bearing complete subtitles.
In this embodiment, the capturing module 200 may obtain a key-subtitle time point sequence according to the information of the current video obtained by the obtaining module 100. After the sequence has been obtained, the capturing module 200 may perform complement-frame processing on it and offset correction on the key-subtitle time points in it, and finally captures the key pictures of the current video according to the key-subtitle time point sequence. Note that the complement-frame processing and the offset correction have no strict execution order: frames may be complemented before the offset correction, or the offset correction may be performed first.
The generating module 300 is configured to sort the key pictures captured by the capturing module 200 and generate a picture library.
In this embodiment, the generating module 300 sorts the captured key pictures of the current video in their playing order in the video, forming the comic-strip-mode picture library of the current video. In the comic-strip mode, the key plot pictures of the video are captured to form a series of albums sorted by playing time, satisfying the need to watch the key plot.
The playing module 400 is configured to receive a playing request and read the corresponding key pictures from the picture library generated by the generating module for playing according to the request.
In this embodiment, the video content has a comic-strip mode and a normal video mode. The user may choose to watch the video, or browse the plot through the comic-strip-mode picture library; the two modes can be switched between freely. When video stuttering occurs, the player can switch to comic-strip mode automatically. Specifically, during video playback the video or the key pictures are preloaded automatically; when the preloaded amount falls below a preset threshold (for example, when it cannot support 5 s of continuous playback), the player switches automatically to comic-strip mode, giving priority to letting the user follow the plot. The user may also switch back to video mode manually and continue waiting for the preload.
In this embodiment, playback in comic-strip mode may proceed automatically or manually. In automatic playback, the playing module 400 reads the corresponding key pictures from the picture library in order according to an automatic playing request and plays the picture library automatically in order at one frame every 3 seconds. This ensures that the user can read the subtitles comfortably while matching the playing time of a typical television drama. It is understood that the playback speed may be preset by the user according to the user's own needs. In manual or automatic playback, the playing module 400 may read the corresponding key pictures from the picture library according to a call request. For example, when the preloaded amount cannot support 5 s of continuous playback, the corresponding key pictures are read from the picture library and played; after playing for a while, if the preloaded amount can again support 5 s of continuous playback, reading key pictures from the picture library can stop and normal video playback resumes.
The comic-strip mode saves data. For example, a 30-minute standard-definition video consumes over 100 MB, whereas the comic-strip mode needs only about 9 MB, saving 90% of the data. Users can thus follow a series with very little data, which increases visit frequency and user satisfaction.
Under each frame of the comic-strip mode, features supporting user interaction are also provided. Users can discuss the key plot with each other, creating rich user-generated content. Moreover, advertisements in both video and picture form can be inserted into the picture library as pictures, offering users more information. Users can also exploit fragments of spare time to follow a series anywhere and anytime, as if reading a novel or browsing pictures, without being strictly limited by environment or network.
With the apparatus for processing video pictures of the embodiments of the present invention, by obtaining the information of the current video, capturing its key pictures according to that information, sorting the key pictures to generate a picture library, and reading the corresponding key pictures from the library for playing according to a playing request, data is saved while ensuring that the user can quickly preview the video when the network stutters, so that the user understands the plot quickly and conveniently. This reduces the bounce rate when mobile video stutters and improves the user experience.
Fig. 6 is a schematic structural diagram of an apparatus for processing video pictures according to a specific embodiment of the present invention; Fig. 7 is another schematic structural diagram of an apparatus for processing video pictures according to a specific embodiment of the present invention. As shown in Fig. 6 and Fig. 7, the apparatus comprises: an obtaining module 100, a capturing module 200, a generating module 300 and a playing module 400, wherein the capturing module 200 specifically comprises: a time point sequence obtaining unit 210, a capturing unit 220, a complement frame unit 230 and a correcting unit 240.
The time point sequence obtaining unit 210 is configured to obtain a key-subtitle time point sequence according to the information of the current video.
In this embodiment, the time point sequence obtaining unit 210 can obtain the start and end time points of each speech subtitle in the current video via the network or by techniques such as speech recognition or image recognition, and one frame is captured at the end time point of each speech subtitle, ensuring that all subtitles can be read in full from the captured pictures alone.
Specifically, the time point sequence obtaining unit 210 may obtain the key-subtitle time point sequence from the subtitle file. Movie subtitles generally come as graphic subtitle files or text subtitle files. For a graphic subtitle file, the time point sequence obtaining unit 210 can obtain the sequence of "subtitle time ranges" containing dialogue by analyzing its index file; for a text subtitle file, such as one in .srt or .ass format, the time point sequence obtaining unit 210 can extract the "subtitle time range" sequence automatically with an existing program. Finally, the "key-subtitle time point" sequence is generated from the middle value, or another value, of each "subtitle time range" member of this sequence; of course, the middle or other value is chosen so as to guarantee a video frame bearing a complete subtitle.
Besides obtaining the key-subtitle time point sequence from the subtitle file, the time point sequence obtaining unit 210 may use speech analysis, distinguishing the voiced parts by speech recognition and obtaining the start and end time points of each speech subtitle, thereby obtaining the key-subtitle time points. The time point sequence obtaining unit 210 may also use image recognition: the video is converted into consecutive frames at 0.5 s intervals, the frames bearing a complete subtitle in a specific region are identified by picture recognition, and after deduplication the final picture-frame sequence is obtained directly; the times corresponding to these frames then form the key-subtitle time point sequence.
The capturing unit 220 is configured to capture the key pictures of the current video according to the key-subtitle time point sequence.
The complement frame unit 230 is configured to determine whether the interval between two adjacent key-subtitle time points in the sequence obtained by the time point sequence obtaining unit 210, or corrected by the correcting unit 240, is greater than a predetermined value; if it is, a new key-subtitle time point is obtained between the two adjacent points and inserted into the key-subtitle time point sequence.
When the interval between two captured frames is too long, for example longer than a predetermined 5 seconds, a "complement frame" is captured every 5 seconds: although there is no speech in this period, action shots may occur, and action shots also affect the user's understanding of the plot. A 1-minute video therefore yields about 15 screenshots, ensuring the continuity of the plot; at about 20 KB per image, all image files for the current video total about 300 KB.
Specifically, the complement frame unit 230 determines whether the interval between two adjacent key-subtitle time points is greater than a predetermined value; if it is, it obtains one or more new key-subtitle time points between the two adjacent points and inserts them into the sequence. For example, when two adjacent "key-subtitle time points" differ by more than 5 s, a screenshot at the midpoint time is inserted between them; over 6 s, two screenshots are inserted, and so on, guaranteeing on average at least one picture every 4 s and thus the continuity of the plot.
The correcting unit 240 is configured to perform offset correction on the key-subtitle time points in the sequence obtained by the time point sequence obtaining unit 210 or the complement frame unit 230.
Since subtitle files generally contain errors, the "key-subtitle time points" obtained from the subtitle file need to be corrected automatically through an offset check. The correcting unit 240 may determine the offset parameter by verifying the start times of the first 10 "subtitle time ranges"; the offset parameter may be obtained automatically by comparing the start time points obtained by image or speech recognition with the start times of the subtitles in the subtitle file, or by other methods. The correcting unit 240 then offset-corrects the "key-subtitle time point" sequence with this parameter.
Note that the complement-frame processing by the complement frame unit 230 and the offset correction by the correcting unit 240 have no strict execution order: frames may be complemented before the offset correction, or the offset correction may be performed first.
With the apparatus for processing video pictures of the embodiments of the present invention, performing complement-frame processing on the key-subtitle time point sequence and offset correction on its key-subtitle time points makes the captured key pictures more coherent and accurate, so that the user can quickly preview the video and understand the plot more quickly and conveniently, improving the user experience.
To implement the above embodiments, the present invention also provides a storage medium storing an application for performing the method for processing video pictures according to any embodiment of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and so on.
In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" mean that the specific features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present invention. In this specification, schematic expressions of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.

Claims (13)

  1. A method for processing video pictures, comprising:
    obtaining information of a current video;
    capturing key pictures of the current video according to the information of the current video, the key pictures including video frames bearing complete subtitles;
    sorting the key pictures to generate a picture library; and
    receiving a playing request, and reading the corresponding key pictures from the picture library for playing according to the playing request.
  2. The method according to claim 1, wherein capturing the key pictures of the current video according to the information of the current video comprises:
    obtaining a key-subtitle time point sequence according to the information of the current video; and
    capturing the key pictures of the current video according to the key-subtitle time point sequence.
  3. The method according to claim 2, wherein after obtaining the key-subtitle time point sequence according to the information of the current video and before capturing the key pictures of the current video according to the key-subtitle time point sequence, the method further comprises:
    determining whether the interval between two adjacent key-subtitle time points is greater than a predetermined value, and if so, obtaining a new key-subtitle time point between the two adjacent key-subtitle time points and inserting the new key-subtitle time point into the key-subtitle time point sequence; and/or
    performing offset correction on the key-subtitle time points in the key-subtitle time point sequence.
  4. The method according to claim 2 or 3, wherein obtaining the key-subtitle time point sequence according to the information of the current video comprises:
    obtaining a subtitle time range sequence from the subtitle file of the current video, and generating the key-subtitle time point sequence from the subtitle time range sequence; or
    performing speech recognition on the current video to obtain a subtitle time range sequence, and generating the key-subtitle time point sequence from the subtitle time range sequence; or
    converting the current video into video frames, identifying, by image recognition, a sequence of video frames bearing complete subtitles in a predetermined region, deduplicating the video frame sequence, and taking the times corresponding to the deduplicated video frame sequence as the key-subtitle time point sequence.
  5. The method according to claim 1, wherein receiving a playing request and reading the corresponding key pictures from the picture library for playing according to the playing request comprises:
    receiving an automatic playing request, and reading the corresponding key pictures from the picture library in order for playing according to the automatic playing request; or
    receiving a call request, and reading the corresponding key pictures from the picture library for playing according to the call request.
  6. The method according to claim 5, wherein after reading the corresponding key pictures from the picture library for playing according to the call request, the method further comprises:
    receiving a stop-playing request, and stopping reading key pictures from the picture library according to the stop-playing request.
  7. An apparatus for processing video pictures, comprising:
    an obtaining module, configured to obtain information of a current video;
    a capturing module, configured to capture key pictures of the current video according to the information of the current video obtained by the obtaining module, the key pictures including video frames bearing complete subtitles;
    a generating module, configured to sort the key pictures captured by the capturing module and generate a picture library; and
    a playing module, configured to receive a playing request, and read the corresponding key pictures from the picture library generated by the generating module for playing according to the playing request.
  8. The apparatus according to claim 7, wherein the capturing module comprises:
    a time point sequence obtaining unit, configured to obtain a key-subtitle time point sequence according to the information of the current video; and
    a capturing unit, configured to capture the key pictures of the current video according to the key-subtitle time point sequence.
  9. The apparatus according to claim 8, wherein the capturing module further comprises a complement frame unit and a correcting unit located between the time point sequence obtaining unit and the capturing unit, wherein:
    the complement frame unit is configured to determine whether the interval between two adjacent key-subtitle time points in the key-subtitle time point sequence obtained by the time point sequence obtaining unit or corrected by the correcting unit is greater than a predetermined value, and if so, to obtain a new key-subtitle time point between the two adjacent key-subtitle time points and insert the new key-subtitle time point into the key-subtitle time point sequence; and/or
    the correcting unit is configured to perform offset correction on the key-subtitle time points in the key-subtitle time point sequence obtained by the time point sequence obtaining unit or the complement frame unit.
  10. The apparatus according to claim 8 or 9, wherein the time point sequence obtaining unit is specifically configured to:
    obtain a subtitle time range sequence from the subtitle file of the current video and generate the key-subtitle time point sequence from the subtitle time range sequence; or
    perform speech recognition on the current video to obtain a subtitle time range sequence and generate the key-subtitle time point sequence from the subtitle time range sequence; or
    convert the current video into video frames, identify, by image recognition, a sequence of video frames bearing complete subtitles in a predetermined region, deduplicate the video frame sequence, and take the times corresponding to the deduplicated video frame sequence as the key-subtitle time point sequence.
  11. The apparatus according to claim 7, wherein the playing module is specifically configured to:
    receive an automatic playing request and read the corresponding key pictures from the picture library in order for playing according to the automatic playing request; or
    receive a call request and read the corresponding key pictures from the picture library for playing according to the call request.
  12. The apparatus according to claim 11, wherein the playing module is further configured to:
    after reading the corresponding key pictures from the picture library for playing according to the call request, receive a stop-playing request and stop reading key pictures from the picture library according to the stop-playing request.
  13. A storage medium storing an application for performing the method for processing video pictures according to any one of claims 1 to 6.
PCT/CN2014/089946 2013-12-04 2014-10-30 Method and apparatus for processing video pictures WO2015081776A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2016535328A JP6266109B2 (ja) 2013-12-04 2014-10-30 動画画面の処理方法及び装置
KR1020157035232A KR101746165B1 (ko) 2013-12-04 2014-10-30 동영상 화면의 처리 방법 및 장치
US14/392,326 US9973793B2 (en) 2013-12-04 2014-10-30 Method and apparatus for processing video image

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310646783.6 2013-12-04
CN201310646783.6A CN103634605B (zh) 2013-12-04 2013-12-04 视频画面的处理方法及装置

Publications (1)

Publication Number Publication Date
WO2015081776A1 true WO2015081776A1 (zh) 2015-06-11

Family

ID=50215178

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/089946 WO2015081776A1 (zh) 2013-12-04 2014-10-30 视频画面的处理方法及装置

Country Status (5)

Country Link
US (1) US9973793B2 (zh)
JP (1) JP6266109B2 (zh)
KR (1) KR101746165B1 (zh)
CN (1) CN103634605B (zh)
WO (1) WO2015081776A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490101A (zh) * 2019-07-30 2019-11-22 平安科技(深圳)有限公司 一种图片截取方法、装置及计算机存储介质

Families Citing this family (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9190110B2 (en) 2009-05-12 2015-11-17 JBF Interlude 2009 LTD System and method for assembling a recorded composition
US11232458B2 (en) 2010-02-17 2022-01-25 JBF Interlude 2009 LTD System and method for data mining within interactive multimedia
US9009619B2 (en) 2012-09-19 2015-04-14 JBF Interlude 2009 Ltd—Israel Progress bar for branched videos
US9257148B2 (en) 2013-03-15 2016-02-09 JBF Interlude 2009 LTD System and method for synchronization of selectably presentable media streams
US10448119B2 (en) 2013-08-30 2019-10-15 JBF Interlude 2009 LTD Methods and systems for unfolding video pre-roll
CN103634605B (zh) * 2013-12-04 2017-02-15 百度在线网络技术(北京)有限公司 视频画面的处理方法及装置
US9653115B2 (en) 2014-04-10 2017-05-16 JBF Interlude 2009 LTD Systems and methods for creating linear video from branched video
US9792957B2 (en) 2014-10-08 2017-10-17 JBF Interlude 2009 LTD Systems and methods for dynamic video bookmarking
US11412276B2 (en) 2014-10-10 2022-08-09 JBF Interlude 2009 LTD Systems and methods for parallel track transitions
CN105635749B (zh) 2014-10-31 2017-03-22 广州市动景计算机科技有限公司 产生视频帧集合的方法和设备
CN104581407A (zh) * 2014-12-31 2015-04-29 北京奇艺世纪科技有限公司 一种视频预览的方法和装置
US10582265B2 (en) 2015-04-30 2020-03-03 JBF Interlude 2009 LTD Systems and methods for nonlinear video playback using linear real-time video players
US10460765B2 (en) 2015-08-26 2019-10-29 JBF Interlude 2009 LTD Systems and methods for adaptive and responsive video
US11164548B2 (en) * 2015-12-22 2021-11-02 JBF Interlude 2009 LTD Intelligent buffering of large-scale video
US11128853B2 (en) 2015-12-22 2021-09-21 JBF Interlude 2009 LTD Seamless transitions in large-scale video
CN105635849B (zh) * 2015-12-25 2018-06-05 网易传媒科技(北京)有限公司 多媒体文件播放时的文本显示方法和装置
US10462202B2 (en) 2016-03-30 2019-10-29 JBF Interlude 2009 LTD Media stream rate synchronization
US11856271B2 (en) 2016-04-12 2023-12-26 JBF Interlude 2009 LTD Symbiotic interactive video
US10218760B2 (en) 2016-06-22 2019-02-26 JBF Interlude 2009 LTD Dynamic summary generation for real-time switchable videos
CN106201713B (zh) * 2016-06-30 2019-10-22 宇龙计算机通信科技(深圳)有限公司 一种卡顿的处理方法及系统
CN106295592A (zh) * 2016-08-17 2017-01-04 北京金山安全软件有限公司 一种媒体文件字幕的识别方法、装置及电子设备
CN106454151A (zh) * 2016-10-18 2017-02-22 珠海市魅族科技有限公司 视频画面拼接方法及装置
CN108124164B (zh) * 2016-11-28 2021-10-26 广州方硅信息技术有限公司 一种视频播放的方法、系统、主播端设备及客户端设备
US11050809B2 (en) 2016-12-30 2021-06-29 JBF Interlude 2009 LTD Systems and methods for dynamic weighting of branched video paths
KR101924634B1 (ko) * 2017-06-07 2018-12-04 네이버 주식회사 콘텐츠 제공 서버, 콘텐츠 제공 단말 및 콘텐츠 제공 방법
CN107484018B (zh) * 2017-07-31 2019-05-17 维沃移动通信有限公司 一种视频截图方法、移动终端
CN109756767B (zh) * 2017-11-06 2021-12-14 腾讯科技(深圳)有限公司 预览数据播放方法、装置及存储介质
CN109936763B (zh) * 2017-12-15 2022-07-01 腾讯科技(深圳)有限公司 视频的处理及发布方法
US10257578B1 (en) 2018-01-05 2019-04-09 JBF Interlude 2009 LTD Dynamic library display for interactive videos
CN110198467A (zh) * 2018-02-27 2019-09-03 优酷网络技术(北京)有限公司 视频播放方法及装置
US11601721B2 (en) 2018-06-04 2023-03-07 JBF Interlude 2009 LTD Interactive video dynamic adaptation and user profiling
CN108833973B (zh) * 2018-06-28 2021-01-19 腾讯科技(深圳)有限公司 视频特征的提取方法、装置和计算机设备
CN112866785B (zh) 2018-08-17 2021-10-29 腾讯科技(深圳)有限公司 图片生成方法、装置、设备及存储介质
CN109672932B (zh) * 2018-12-29 2021-09-28 深圳Tcl新技术有限公司 辅助视力障碍者观看视频的方法、系统、设备及存储介质
CN109714644B (zh) * 2019-01-22 2022-02-25 广州虎牙信息科技有限公司 一种视频数据的处理方法、装置、计算机设备和存储介质
CN109859298B (zh) * 2019-03-05 2023-06-30 腾讯科技(深圳)有限公司 一种图像处理方法及其装置、设备和存储介质
CN109803180B (zh) * 2019-03-08 2022-05-20 腾讯科技(深圳)有限公司 视频预览图生成方法、装置、计算机设备及存储介质
US11011183B2 (en) * 2019-03-25 2021-05-18 Cisco Technology, Inc. Extracting knowledge from collaborative support sessions
CN112118494B (zh) * 2019-06-20 2022-09-20 腾讯科技(深圳)有限公司 一种视频数据处理方法、装置及存储介质
CN110784750B (zh) * 2019-08-13 2022-11-11 腾讯科技(深圳)有限公司 视频播放方法、装置及计算机设备
CN110602546A (zh) * 2019-09-06 2019-12-20 Oppo广东移动通信有限公司 视频生成方法、终端及计算机可读存储介质
US11490047B2 (en) 2019-10-02 2022-11-01 JBF Interlude 2009 LTD Systems and methods for dynamically adjusting video aspect ratios
CN111161392B (zh) * 2019-12-20 2022-12-16 苏宁云计算有限公司 一种视频的生成方法、装置及计算机系统
CN111104913B (zh) * 2019-12-23 2023-03-24 福州大学 一种基于结构及相似度的视频提取ppt方法
US12096081B2 (en) 2020-02-18 2024-09-17 JBF Interlude 2009 LTD Dynamic adaptation of interactive video players using behavioral analytics
US11245961B2 (en) 2020-02-18 2022-02-08 JBF Interlude 2009 LTD System and methods for detecting anomalous activities for interactive videos
US12047637B2 (en) 2020-07-07 2024-07-23 JBF Interlude 2009 LTD Systems and methods for seamless audio and video endpoint transitions
CN113766149A (zh) * 2020-08-28 2021-12-07 北京沃东天骏信息技术有限公司 字幕拼接图片的拼接方法、装置、电子设备和存储介质
US11625928B1 (en) * 2020-09-01 2023-04-11 Amazon Technologies, Inc. Language agnostic drift correction
CN112672090B (zh) * 2020-12-17 2023-04-18 深圳随锐视听科技有限公司 一种云视频会议中优化音视频效果的方法
US11882337B2 (en) 2021-05-28 2024-01-23 JBF Interlude 2009 LTD Automated platform for generating interactive videos
US11934477B2 (en) 2021-09-24 2024-03-19 JBF Interlude 2009 LTD Video player integration within websites

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070011706A1 (en) * 2005-07-11 2007-01-11 Inventec Corporation Video browsing system and method
CN101901619A (zh) * 2010-07-16 2010-12-01 复旦大学 一种基于视频内容缩影的增强用户体验的视频播放器
CN102364960A (zh) * 2011-11-04 2012-02-29 北京播思软件技术有限公司 移动数字电视画中画和频道缩略图的播放方法及移动终端
CN102685574A (zh) * 2011-03-09 2012-09-19 须泽中 从数字电视节目中自动抽取图像的系统及其应用
CN103634605A (zh) * 2013-12-04 2014-03-12 百度在线网络技术(北京)有限公司 视频画面的处理方法及装置

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3472659B2 (ja) 1995-02-20 2003-12-02 株式会社日立製作所 映像供給方法および映像供給システム
JPH11313048A (ja) 1998-04-24 1999-11-09 Kokusai Electric Co Ltd マルチメディア通信方法及び通信装置
KR100965471B1 (ko) 2004-11-02 2010-06-24 가부시키가이샤 테레비 아사히 데이터 비젼 자막 첨부 정지 화상 컨텐츠 작성 장치, 자막 첨부 정지화상 컨텐츠 작성 프로그램 및 자막 첨부 정지 화상 컨텐츠작성 시스템
JP2007336263A (ja) * 2006-06-15 2007-12-27 Fujifilm Corp 画像処理方法及び装置並びにプログラム
JP4846674B2 (ja) 2007-08-14 2011-12-28 日本放送協会 静止画抽出装置及び静止画抽出プログラム
JP5173337B2 (ja) 2007-09-18 2013-04-03 Kddi株式会社 要約コンテンツ生成装置およびコンピュータプログラム
CN101770701A (zh) * 2008-12-30 2010-07-07 北京新学堂网络科技有限公司 一种用于外语学习的电影连环画制作方法
JP5246948B2 (ja) * 2009-03-27 2013-07-24 Kddi株式会社 字幕ずれ補正装置、再生装置および放送装置
US8281231B2 (en) * 2009-09-11 2012-10-02 Digitalsmiths, Inc. Timeline alignment for closed-caption text using speech recognition transcripts
JP5232744B2 (ja) 2009-09-14 2013-07-10 Kddi株式会社 要約コンテンツを表示する表示装置、方法及びプログラム
US8332530B2 (en) * 2009-12-10 2012-12-11 Hulu Llc User interface including concurrent display of video program, histogram, and transcript
KR101289267B1 (ko) * 2009-12-22 2013-08-07 한국전자통신연구원 방송통신시스템에서 dtv 자막 처리 장치 및 방법
JP5677229B2 (ja) 2011-07-28 2015-02-25 日本放送協会 映像字幕検出装置およびそのプログラム
US20130080384A1 (en) * 2011-09-23 2013-03-28 Howard BRIGGS Systems and methods for extracting and processing intelligent structured data from media files
CN103020076B (zh) * 2011-09-23 2017-02-08 深圳市快播科技有限公司 一种播放器的视频文件动态预览方法和装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070011706A1 (en) * 2005-07-11 2007-01-11 Inventec Corporation Video browsing system and method
CN101901619A (zh) * 2010-07-16 2010-12-01 复旦大学 一种基于视频内容缩影的增强用户体验的视频播放器
CN102685574A (zh) * 2011-03-09 2012-09-19 须泽中 从数字电视节目中自动抽取图像的系统及其应用
CN102364960A (zh) * 2011-11-04 2012-02-29 北京播思软件技术有限公司 移动数字电视画中画和频道缩略图的播放方法及移动终端
CN103634605A (zh) * 2013-12-04 2014-03-12 百度在线网络技术(北京)有限公司 视频画面的处理方法及装置

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490101A (zh) * 2019-07-30 2019-11-22 平安科技(深圳)有限公司 一种图片截取方法、装置及计算机存储介质

Also Published As

Publication number Publication date
JP6266109B2 (ja) 2018-01-24
US20160277779A1 (en) 2016-09-22
JP2016531512A (ja) 2016-10-06
CN103634605A (zh) 2014-03-12
KR20160010507A (ko) 2016-01-27
CN103634605B (zh) 2017-02-15
KR101746165B1 (ko) 2017-06-12
US9973793B2 (en) 2018-05-15

Similar Documents

Publication Publication Date Title
WO2015081776A1 (zh) 视频画面的处理方法及装置
KR101193794B1 (ko) 비디오에서의 지연된 광고 삽입
US8516119B2 (en) Systems and methods for determining attributes of media items accessed via a personal media broadcaster
US9357267B2 (en) Synchronizing video content with extrinsic data
US9369780B2 (en) Methods and systems for detecting one or more advertisement breaks in a media content stream
US10735817B2 (en) Video playback method and apparatus, and computer readable storage medium
KR101145062B1 (ko) 비디오에서의 북마크
KR100803747B1 (ko) 요약 클립 생성 시스템 및 이를 이용한 요약 클립 생성방법
US11778286B2 (en) Systems and methods for summarizing missed portions of storylines
US10021433B1 (en) Video-production system with social-media features
US10749923B2 (en) Contextual video content adaptation based on target device
US10129592B2 (en) Audience measurement and feedback system
CN111405339A (zh) 一种分屏显示方法、电子设备及存储介质
CN110679153B (zh) 用于提供重新缓冲事件的时间放置的方法
WO2019149066A1 (zh) 视频播放方法、终端设备及存储介质
CN109194971A (zh) 一种为多媒体文件的生成方法及装置
US11653039B2 (en) Video stream batching
CN106507183B (zh) 视频名称的获取方法及装置
US12058409B2 (en) Metadata manipulation
US11196789B2 (en) Recording device and recording method
EP3032766A1 (en) Method and device for generating personalized video programs
Jeon et al. Video conversion scheme with animated image to enhance user experiences on mobile environments
KR20160036658A (ko) 비밀 광고를 위한 방법, 장치 및 시스템

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14867534

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
ENP Entry into the national phase

Ref document number: 20157035232

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 14392326

Country of ref document: US

ENP Entry into the national phase

Ref document number: 2016535328

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14867534

Country of ref document: EP

Kind code of ref document: A1