WO2014000703A1 - Video processing method, terminal and subtitle server - Google Patents

Video processing method, terminal and subtitle server

Info

Publication number
WO2014000703A1
Authority
WO
WIPO (PCT)
Prior art keywords
subtitle
video program
subtitles
stream
program
Prior art date
Application number
PCT/CN2013/078482
Other languages
English (en)
French (fr)
Inventor
郜文美
范姝男
吕小强
王雅辉
Original Assignee
华为终端有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为终端有限公司 filed Critical 华为终端有限公司
Priority to EP13809506.2A priority Critical patent/EP2852168A1/en
Publication of WO2014000703A1 publication Critical patent/WO2014000703A1/zh
Priority to US14/568,409 priority patent/US20150100981A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8126Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts
    • H04N21/8133Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts specifically related to the content, e.g. biography of the actors in a movie, detailed information about an article seen in a video program
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234336Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by media transcoding, e.g. video is transformed into a slideshow of still pictures or audio is converted into text
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/237Communication with additional data server
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43074Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of additional data with content streams on the same device, e.g. of EPG data or interactive icon with a TV program
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/437Interfacing the upstream path of the transmission network, e.g. for transmitting client requests to a VOD server
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440236Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4722End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting additional data associated with the content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/485End-user interface for client configuration
    • H04N21/4856End-user interface for client configuration for language selection, e.g. for the menu or subtitles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content

Definitions

  • Video processing method, terminal and subtitle server. This application claims priority to Chinese Patent Application No. 201210222137.2, filed with the Chinese Patent Office on June 29, 2012 and entitled "Video Processing Method, Terminal and Subtitle Server", which is incorporated herein by reference in its entirety.
  • The present invention relates to communications technologies, and in particular, to a video processing method, a terminal, and a subtitle server.
  • TV dramas, movies, and the like have gradually begun to treat subtitles as standard, but in reality many video programs are still not provided with subtitles, for example news programs, variety shows, and sports programs; for live broadcasts in particular, there is no time to edit subtitles while the program is being broadcast.
  • Many video programs on the Internet also have no subtitles, and some programs carry subtitles for only part of their duration. As a result, watching a video program without subtitles is laborious for viewers, and hearing-impaired viewers in particular cannot watch or enjoy such programs at all.
  • The present invention provides a video processing method, a terminal, and a subtitle server, so that subtitles can be obtained in real time for a video program.
  • A first aspect of the present invention provides a video processing method, including: receiving a video program stream corresponding to a video program; requesting, from a subtitle server, a subtitle corresponding to the video program stream; receiving the subtitle returned by the subtitle server; and displaying the video program and the subtitle.
  • In a possible implementation, the requesting, from the subtitle server, the subtitle corresponding to the video program stream includes: sending a subtitle acquisition request to the subtitle server, where the subtitle acquisition request is used to request the subtitle corresponding to the video program stream, and sending the audio stream in the video program stream to the subtitle server, so that the subtitle server performs speech-to-text conversion on the audio stream according to the subtitle acquisition request to generate the subtitle.
  • In another possible implementation, the received video program stream further includes a program identifier of the video program; and the requesting, from the subtitle server, the subtitle corresponding to the video program stream includes: sending a subtitle acquisition request to the subtitle server, where the subtitle acquisition request is used to request the subtitle corresponding to the video program stream and carries the program identifier, so that the subtitle server determines the subtitle according to the subtitle acquisition request and the program identifier.
  • In yet another possible implementation, after the subtitle acquisition request is sent to the subtitle server and before the subtitle returned by the subtitle server is received, the method further includes: receiving a connection failure response sent by the subtitle server, where the connection failure response indicates that the subtitle server failed to connect, according to the program identifier, to the program source used to generate the video program stream and therefore failed to acquire the audio stream; and, according to the connection failure response, sending the audio stream in the video program stream to the subtitle server, so that the subtitle server performs speech-to-text conversion on the audio stream to generate the subtitle.
  • In yet another possible implementation, after the video program stream corresponding to the video program is received and before the video program and the subtitle are displayed, the method further includes: buffering the received video program stream, at least until the subtitle corresponding to the video program stream returned by the subtitle server is received.
  • In yet another possible implementation, the subtitle corresponding to the video program stream returned by the subtitle server further includes a packet identifier of the audio packet corresponding to the subtitle; and after the subtitle is received and before the video program and the subtitle are displayed, the method further includes: synchronizing the subtitle with the audio stream according to the packet identifier of the audio packet, so that the video program and the subtitle are displayed synchronously.
  • A second aspect of the present invention provides a video processing method, including: receiving a request sent by a terminal for acquiring a subtitle corresponding to a video program stream of a video program; acquiring the subtitle corresponding to the video program stream according to the request; and returning the subtitle to the terminal, so that the terminal displays the video program and the subtitle.
  • In a possible implementation of the second aspect, the receiving the request sent by the terminal for acquiring the subtitle corresponding to the video program stream includes: receiving a subtitle acquisition request sent by the terminal, where the subtitle acquisition request is used to request the subtitle corresponding to the video program stream, and receiving the audio stream in the video program stream sent by the terminal; and the acquiring the subtitle corresponding to the video program stream includes: performing speech-to-text conversion on the audio stream according to the subtitle acquisition request to generate the subtitle.
  • In another possible implementation, the receiving the request sent by the terminal for acquiring the subtitle corresponding to the video program stream includes: receiving a subtitle acquisition request sent by the terminal, where the subtitle acquisition request is used to request the subtitle corresponding to the video program stream and carries a program identifier of the video program; and the acquiring the subtitle corresponding to the video program stream includes: acquiring the subtitle corresponding to the video program stream according to the subtitle acquisition request and the program identifier.
  • In yet another possible implementation, the acquiring the subtitle corresponding to the video program stream according to the subtitle acquisition request and the program identifier includes: determining, according to the subtitle acquisition request, whether the program source of the video program corresponding to the program identifier has already been connected; if it has been connected, performing the returning of the subtitle to the terminal; otherwise, establishing a connection with the program source, acquiring the audio stream in the video program stream, and performing speech-to-text conversion on the audio stream to generate the subtitle.
  • In yet another possible implementation, the method further includes: if establishing a connection with the program source fails, returning, to the terminal, a connection failure response indicating that connecting to the program source failed; and receiving the audio stream at the terminal that is sent by the terminal according to the connection failure response, and performing speech-to-text conversion on the audio stream to generate the subtitle.
  • In yet another possible implementation, the performing speech-to-text conversion on the audio stream to generate the subtitle includes: performing speech-to-text conversion on the audio stream to generate the subtitle corresponding to the video program stream, and setting, in the subtitle, a packet identifier of the audio packet corresponding to the subtitle, so that the terminal synchronizes the subtitle with the audio stream according to the packet identifier of the audio packet.
  • A third aspect of the present invention provides a terminal, including:
  • a program receiving unit, configured to receive a video program stream corresponding to a video program;
  • a real-time subtitle client, configured to request, from a subtitle server, a subtitle corresponding to the video program stream, and to receive the subtitle returned by the subtitle server;
  • a program presentation unit, configured to display the video program and the subtitle.
  • In a possible implementation of the third aspect, the real-time subtitle client is specifically configured to send a subtitle acquisition request to the subtitle server, where the subtitle acquisition request is used to request the subtitle corresponding to the video program stream, and to send the audio stream in the video program stream to the subtitle server, so that the subtitle server performs speech-to-text conversion on the audio stream according to the subtitle acquisition request to generate the subtitle.
  • In another possible implementation, the real-time subtitle client is specifically configured to send a subtitle acquisition request to the subtitle server, where the subtitle acquisition request is used to request the subtitle corresponding to the video program stream and carries the program identifier of the video program, so that the subtitle server determines the subtitle corresponding to the video program stream according to the subtitle acquisition request and the program identifier.
  • In yet another possible implementation, the real-time subtitle client is further configured to: after sending the subtitle acquisition request to the subtitle server, receive a connection failure response sent by the subtitle server, where the connection failure response indicates that the subtitle server failed to connect, according to the program identifier, to the program source used to generate the video program stream and therefore failed to acquire the audio stream; and, according to the connection failure response, send the audio stream in the video program stream to the subtitle server, so that the subtitle server performs speech-to-text conversion on the audio stream to generate the subtitle.
  • In yet another possible implementation, the real-time subtitle client is further configured to: after the video program stream corresponding to the video program is received, buffer the video program stream, at least until the subtitle corresponding to the video program stream returned by the subtitle server is received.
  • In yet another possible implementation, the real-time subtitle client is further configured to synchronize the subtitle with the audio stream according to the packet identifier of the audio packet included in the subtitle corresponding to the video program stream returned by the subtitle server, so that the program presentation unit displays the video program and the subtitle synchronously.
  • A fourth aspect of the present invention provides a subtitle server, including: a request receiving unit, configured to receive a request sent by a terminal for acquiring a subtitle corresponding to a video program stream;
  • a subtitle obtaining unit, configured to acquire the subtitle corresponding to the video program stream according to the request;
  • a subtitle sending unit, configured to return the subtitle to the terminal, so that the terminal displays the video program and the subtitle.
  • In a possible implementation of the fourth aspect, the request receiving unit is specifically configured to receive a subtitle acquisition request sent by the terminal, where the subtitle acquisition request is used to request the subtitle corresponding to the video program stream, and to receive the audio stream in the video program stream sent by the terminal; and the subtitle obtaining unit is specifically configured to perform speech-to-text conversion on the audio stream according to the subtitle acquisition request to generate the subtitle.
  • In another possible implementation, the request receiving unit is specifically configured to receive a subtitle acquisition request sent by the terminal, where the subtitle acquisition request is used to request the subtitle corresponding to the video program stream and carries the program identifier of the video program; and the subtitle obtaining unit is specifically configured to acquire the subtitle corresponding to the video program stream according to the subtitle acquisition request and the program identifier.
  • In yet another possible implementation, the subtitle obtaining unit includes: a determining subunit, configured to determine, according to the subtitle acquisition request, whether the program source of the video program corresponding to the program identifier has already been connected; the subtitle sending unit, configured to: when the determining result of the determining subunit is that the program source is connected, perform the returning of the subtitle to the terminal; an acquiring subunit, configured to: when the determining result of the determining subunit is that the program source is not connected, establish a connection with the program source and acquire the audio stream in the video program stream; and a converting subunit, configured to perform speech-to-text conversion on the audio stream to generate the subtitle.
  • In yet another possible implementation, the subtitle obtaining unit further includes: a feedback subunit, configured to: when the acquiring subunit fails to establish a connection with the program source, return, to the terminal, a connection failure response indicating that connecting to the program source failed; and the request receiving unit is further configured to receive the audio stream at the terminal that is sent by the terminal according to the connection failure response, so that the converting subunit performs speech-to-text conversion on the audio stream to generate the subtitle.
  • In yet another possible implementation, the converting subunit is further configured to: when performing speech-to-text conversion on the audio stream to generate the subtitle, set, in the subtitle, the packet identifier of the audio packet corresponding to the subtitle, so that the terminal synchronizes the subtitle with the audio stream according to the packet identifier of the audio packet.
  • FIG. 1 is a schematic flowchart of an embodiment of a video processing method according to the present invention;
  • FIG. 2 is a schematic flowchart of another embodiment of a video processing method according to the present invention;
  • FIG. 3 is a signaling diagram of still another embodiment of a video processing method according to the present invention;
  • FIG. 4 is a schematic structural diagram of a terminal embodiment of the present invention;
  • FIG. 5 is a schematic structural diagram of an embodiment of a subtitle server according to the present invention;
  • FIG. 6 is a schematic structural diagram of another embodiment of a subtitle server according to the present invention.
  • The video programs described in the embodiments of the present invention include video programs delivered in various manners, for example, digital television (DTV), Internet Protocol television (IPTV), China Mobile Multimedia Broadcasting (CMMB), terrestrial/satellite TV, cable TV, and Internet video; the terminals include various kinds of terminals, for example, a set-top box (STB), a smart TV, and a mobile terminal.
  • FIG. 1 is a schematic flowchart of a video processing method according to an embodiment of the present invention. The method may be performed by a terminal. As shown in FIG. 1, the video processing method in this embodiment may include:
  • 101. Receive a video program stream corresponding to a video program.
  • The video program may be a video program in any of the forms described above, and is usually provided by a video provider; for example, a service provider (SP) or a content provider (CP) may provide video programs.
  • The terminal receives the video program stream corresponding to the video program sent by the video provider; the video program stream includes a video stream (i.e., the picture data of the video program) and an audio stream (i.e., the sound data of the video program).
  • 102. Request, from a subtitle server, the subtitle corresponding to the video program stream.
  • When receiving the video program stream, the terminal does not present it immediately; for example, the video program stream may be buffered, and within that buffering time the terminal requests a subtitle server, for example a subtitle server in the cloud, to acquire the subtitle corresponding to the video program stream.
  • The subtitle corresponding to the video program stream actually means that the subtitle corresponds to the audio stream in the video program stream, for example it is obtained by performing speech-to-text conversion on the audio stream, and the subtitle is to be presented together with the video program.
  • In this embodiment, the cloud subtitle server may be, for example, a cloud subtitle server provided by a professional provider. Because the cloud usually has strong computing power, database storage capability, and so on, the speech database can be conveniently extended and the speech recognition algorithm can be upgraded, so the speech recognition accuracy of the cloud subtitle server is high, and subtitles with high recognition accuracy can be obtained quickly from the subtitle server in the cloud.
  • 103. Receive the subtitle corresponding to the video program stream returned by the subtitle server.
  • The subtitle server in the cloud may acquire the subtitle corresponding to the video program stream according to the terminal's request for acquiring the subtitle, and send the subtitle to the terminal.
  • There are many ways in which the subtitle server may obtain the subtitle; usually the audio stream is converted from speech to text, that is, speech recognition is performed, and the corresponding subtitle is obtained.
  • 104. Display the video program and the subtitle.
  • When the terminal receives the subtitle returned by the subtitle server in the cloud, the terminal presents the subtitle together with the video program corresponding to the video program stream, displaying a video program with subtitles.
  • In the video processing method of this embodiment, when receiving a video program stream, the terminal obtains the corresponding subtitle from the subtitle server and then presents the subtitle together with the video program stream, so that subtitles can be obtained in real time for a video program. For example, if the terminal receives a video program stream without subtitles, it can follow the method of this embodiment and automatically obtain the subtitle corresponding to that video program stream, which makes it easier for the user to watch the video program.
  • Optionally, in a specific implementation, a switch may be provided to control whether the real-time subtitle acquisition function is enabled, and the switch may be controlled by the user of the terminal. If real-time subtitle acquisition is not wanted, the function need not be enabled; if the user sees that a video program has no subtitles and wants to activate real-time subtitle acquisition, the function can be enabled through the switch, and the terminal then executes the subtitle acquisition process described in this embodiment.
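For illustration only, the following Python sketch shows one way the terminal-side flow of this embodiment could be organized (buffer the received video program stream, request the subtitle, then present both). The class and method names, and the idea of passing a program identifier or the audio data to the client, are assumptions made for this sketch and are not defined by the patent.

```python
# Illustrative sketch of the terminal-side flow of this embodiment; all names
# (RealTimeSubtitleTerminal, request_subtitles, show, ...) are invented here.
from collections import deque

class RealTimeSubtitleTerminal:
    def __init__(self, subtitle_client, presenter, enabled=True):
        self.subtitle_client = subtitle_client  # talks to the subtitle server
        self.presenter = presenter              # program presentation unit
        self.enabled = enabled                  # user-controlled switch
        self.buffer = deque()                   # buffered video program stream

    def on_video_program_stream(self, stream):
        if not self.enabled:
            # Real-time subtitles disabled: present the program immediately.
            self.presenter.show(stream, subtitles=None)
            return
        # Buffer instead of presenting, then ask the subtitle server for the
        # subtitle (by program identifier, or by uploading the audio stream).
        self.buffer.append(stream)
        subtitles = self.subtitle_client.request_subtitles(
            program_id=getattr(stream, "program_id", None),
            audio=getattr(stream, "audio", None),
        )
        self.presenter.show(self.buffer.popleft(), subtitles=subtitles)
```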
  • FIG. 2 is a schematic flowchart of another embodiment of a video processing method according to the present invention.
  • The method may be performed by a subtitle server; this embodiment takes a subtitle server in the cloud as an example. As shown in FIG. 2, the video processing method in this embodiment may include:
  • 201. Receive a request sent by a terminal for acquiring a subtitle corresponding to a video program stream that corresponds to a video program.
  • The subtitle server deployed in the cloud receives the request for acquiring the subtitle sent by the terminal, which requests the subtitle corresponding to the video program stream from the subtitle server.
  • 202. Acquire the subtitle corresponding to the video program stream according to the request.
  • There are several ways in which the subtitle server may acquire the subtitle; they are described in detail in the third embodiment. As a simple example, if the subtitle acquisition request sent by the terminal carries the audio stream, the subtitle server directly performs speech-to-text conversion on the audio stream to obtain the subtitle;
  • alternatively, the subtitle acquisition request sent by the terminal may carry only the program identifier of the video program, in which case the subtitle server can connect to the program source according to the program identifier to obtain the audio stream and then perform speech-to-text conversion to obtain the subtitle; or, on receiving the program identifier, the subtitle server may find that it has already stored the subtitle for the audio stream corresponding to that program identifier (for example, a subtitle temporarily stored while performing speech recognition for another terminal), and in that case it directly sends the stored subtitle to the terminal.
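A minimal sketch of this dispatch, assuming a speech-to-text function, a way to connect to the program source, and a simple in-memory cache; none of these names come from the patent.

```python
# Illustrative server-side dispatch for a subtitle acquisition request,
# following the cases described above; all names are assumptions.
class SubtitleServer:
    def __init__(self, speech_to_text, connect_to_program_source):
        self.speech_to_text = speech_to_text
        self.connect_to_program_source = connect_to_program_source
        self.subtitle_cache = {}  # program_id -> subtitles already generated

    def handle_request(self, program_id=None, audio_stream=None):
        # Case 1: the request itself carries the audio stream.
        if audio_stream is not None:
            return self.speech_to_text(audio_stream)
        # Case 2: subtitles for this program are already stored
        # (e.g. generated while serving another terminal).
        if program_id in self.subtitle_cache:
            return self.subtitle_cache[program_id]
        # Case 3: only a program identifier was sent; connect to the program
        # source, fetch the audio stream, then convert it.
        source_audio = self.connect_to_program_source(program_id)
        if source_audio is None:
            return None  # caller sends a connection failure response instead
        subtitles = self.speech_to_text(source_audio)
        self.subtitle_cache[program_id] = subtitles
        return subtitles
```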
  • 203. Return the subtitle to the terminal.
  • The subtitle server can deliver the subtitle to the terminal through the subtitle stream channel between the subtitle server and the terminal; if that channel has not been established beforehand, the subtitle server first needs to negotiate with the terminal to establish the subtitle stream channel and then deliver the subtitle. After receiving the subtitle, the terminal displays the video program and the subtitle simultaneously.
  • In the video processing method of this embodiment, the subtitle server in the cloud acquires the subtitle corresponding to the video program stream and returns it to the terminal, so that the terminal can display the subtitle together with the video program, achieving a video program with subtitles.
  • FIG. 3 is a signaling diagram of still another embodiment of a video processing method according to the present invention.
  • In FIG. 3, two terminals, terminal 1 and terminal 2, are illustrated. The structure of terminal 2 is shown: it includes a program presentation unit, a real-time subtitle client, and a program receiving unit, and may further include a video buffer (VPD) and an audio buffer (APD).
  • The specific functions of the foregoing units in terminal 2 are described in the fourth embodiment; this embodiment aims to explain the video processing method more clearly, and therefore describes how each unit of the terminal participates in the method.
  • Terminal 1 has the same structure as terminal 2, and its structure is not shown in FIG. 3.
  • As shown in FIG. 3, the method in this embodiment still takes a subtitle server in the cloud as an example, and may include:
  • 301. The program receiving unit on terminal 2 acquires a video program stream from the video program source.
  • The video program stream sent by the video program source to terminal 2 includes a video stream and an audio stream; the video stream refers to the picture data of the video program, and the audio stream refers to the sound data of the video program.
  • 302. The real-time subtitle client on terminal 2 receives the video program stream from the program receiving unit, and acquires the program identifier corresponding to the video program.
  • The real-time subtitle client is arranged between the program presentation unit and the program receiving unit, specifically between the audio/video buffers and the program presentation unit; the audio/video buffers comprise the video buffer VPD and the audio buffer APD.
  • After receiving the video program stream, the program receiving unit performs necessary processing such as decryption and descrambling, and then sends the processed video program stream to the audio/video buffers, which are mainly used to buffer the video program stream.
  • In this embodiment, whether the real-time subtitle client is enabled can be controlled by the user; for example, the user can use the remote control to open the real-time subtitle client on terminal 2 to request that the real-time subtitle function be turned on.
  • If the user has not opened the real-time subtitle client, the video program stream in the audio/video buffers is sent directly to the program presentation unit for display; if the user has opened the real-time subtitle client, the client anchors itself between the program presentation unit and the audio/video buffers, so that every video program stream reaches the real-time subtitle client before reaching the program presentation unit.
  • Specifically, the real-time subtitle client can take the output interfaces of the real VPD and APD as its own input and rename those interfaces, while creating new, substitute VPD and APD output interfaces as its own output, so that the video program stream subsequently received by the program presentation unit is obtained from the substitute VPD and APD output interfaces; the program presentation unit does not perceive the anchoring of the real-time subtitle client.
  • After the above processing, the video program stream in the audio/video buffers is sent to the real-time subtitle client.
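As a rough illustration of this interposition, the following sketch wraps the real buffer output behind a substitute object with the same read() interface; the interfaces themselves are assumptions, since the patent does not define them.

```python
# Conceptual sketch of the "anchoring": the presentation unit keeps calling
# read() on what it believes is the VPD/APD output, while the data actually
# passes through the real-time subtitle client first. Names are illustrative.
class SubstituteBufferOutput:
    def __init__(self, real_output, subtitle_client):
        self._real = real_output        # the renamed, real VPD/APD output
        self._client = subtitle_client  # real-time subtitle client

    def read(self):
        frame = self._real.read()           # data still comes from the real buffer
        return self._client.process(frame)  # but is processed by the client first
```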
  • The real-time subtitle client may also acquire the program identifier of the video program corresponding to the current video program stream from the program receiving unit, or the program identifier may be included in the video program stream received from the audio/video buffers.
  • The program identifier is, for example, a ProgramID (which can identify a video program) or a URL; for a DTV or IPTV program, the program identifier can be a ProgramID, while for Internet video it can be a URL.
  • 303. The real-time subtitle client on terminal 2 sends a subtitle acquisition request to the subtitle server in the cloud, where the subtitle acquisition request carries the program identifier.
  • The subtitle server in the cloud is used for subtitle acquisition. Because the cloud has strong computing power, the recognition algorithm can be easily upgraded to support accurate subtitle recognition, multi-language recognition, and multi-accent/dialect recognition, so real-time speech-to-text conversion is more accurate and the user experience is better.
  • The cloud subtitle server can be a subtitle server provided by a professional provider, and can provide real-time subtitles for video programs from all kinds of sources (for example, terrestrial/satellite TV, cable TV, IPTV, and Internet video).
  • The subtitle acquisition request sent by the real-time subtitle client to the cloud subtitle server may carry the program identifier (for example, a ProgramID or a URL) obtained in 302; the subtitle acquisition request may be carried in an HTTP message, and the message body may be implemented in XML. If the program identifier is carried in the subtitle acquisition request, execution continues at 304.
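The patent only states that the request may be carried in an HTTP message with an XML body; it does not define a schema. The following is therefore a purely hypothetical example, with an invented "/subtitles" endpoint and invented XML element names.

```python
# Hypothetical subtitle acquisition request carried in an HTTP POST with an
# XML body; endpoint and element names are invented for illustration.
import urllib.request

def send_subtitle_acquisition_request(server_url, program_id):
    body = (
        "<?xml version='1.0' encoding='UTF-8'?>"
        "<SubtitleAcquisitionRequest>"
        f"<ProgramID>{program_id}</ProgramID>"
        "</SubtitleAcquisitionRequest>"
    )
    req = urllib.request.Request(
        url=server_url + "/subtitles",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # response carries the subtitle data
        return resp.read()
```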
  • The real-time subtitle client can also send the audio stream in the video program stream to the subtitle server when sending the subtitle acquisition request, so that the subtitle server can jump to 309, that is, the subtitle server directly performs speech-to-text conversion on the audio stream to generate the corresponding subtitle.
  • 304. The cloud subtitle server determines whether it is already connected to the video program source of the video program corresponding to the program identifier.
  • After receiving the subtitle acquisition request sent by terminal 2, the cloud subtitle server determines, according to the program identifier carried in the subtitle acquisition request, whether it is already connected to the video program source of the video program corresponding to that program identifier.
  • If it is already connected, the subtitle server negotiates the subtitle stream channel with terminal 2 to deliver the subtitle and jumps to 310, where the subtitle is sent to terminal 2; in this case the subtitle server may extract the subtitle, according to the program identifier, from the subtitles it has already stored. If it is not connected, execution proceeds to 305.
  • 305. The cloud subtitle server establishes a connection with the video program source and obtains the audio stream.
  • When the cloud subtitle server determines that it is not connected to the video program source, it sends a connection request to the video program source according to the program identifier; the request may carry the program identifier, and the subtitle server obtains from the video program source the video program stream corresponding to the program identifier, or at least the audio stream in that video program stream. If the cloud subtitle server fails to establish a connection with the video program source, execution proceeds to 306.
  • If the video program is freely available, or the provider of the subtitle server has a prior cooperation agreement with the provider of the video program source that allows the subtitle server to freely acquire the video programs of that source, the video program source will send the video program stream, or just the audio stream, to the subtitle server.
  • 306. The cloud subtitle server returns a connection failure response to terminal 2.
  • The connection failure response is used to indicate that the cloud subtitle server failed to connect to the video program source; the server returns this connection failure response to terminal 2.
  • 307. Terminal 2 negotiates with the cloud subtitle server to establish a streaming media channel.
  • The real-time subtitle client on terminal 2 needs to negotiate with the cloud subtitle server to establish a streaming media channel; the streaming media channel includes an uplink audio stream channel (carried over RTP) and a downlink subtitle stream channel (carried over RTP or FLUTE).
  • The specific streaming media channel can be negotiated through an SDP offer/answer exchange.
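Purely as an illustration of what such an offer could look like (the patent does not specify codecs, ports, or addresses), an SDP offer from the terminal might describe a send-only audio stream and a receive-only text stream; a FLUTE-based subtitle channel would be signalled differently.

```python
# Illustrative SDP offer for the negotiation above: uplink audio over RTP
# (send-only from the terminal) and downlink subtitles as RTP text
# (receive-only at the terminal). All values are placeholders.
EXAMPLE_SDP_OFFER = """\
v=0
o=- 0 0 IN IP4 192.0.2.10
s=real-time subtitles
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 96
a=rtpmap:96 AMR/8000
a=sendonly
m=text 49172 RTP/AVP 98
a=rtpmap:98 t140/1000
a=recvonly
"""
```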
  • 308. Terminal 2 sends the audio stream to the cloud subtitle server.
  • The real-time subtitle client on terminal 2 sends the audio stream to the cloud subtitle server; the audio stream may be obtained from the audio buffer APD in front of the real-time subtitle client.
  • 309. The cloud subtitle server performs speech-to-text conversion on the audio stream to generate the subtitle, and sets a synchronization identifier in the subtitle.
  • When performing real-time speech-to-text conversion on the audio stream, the cloud subtitle server of this embodiment also sets a synchronization identifier in the subtitle; the synchronization identifier is specifically the packet identifier of the audio packet corresponding to the subtitle. For example, the subtitle server may insert, at the beginning of each subtitle sentence, the packet identifier (packet ID) of the audio packet corresponding to the first word of that sentence, so that terminal 2 can later synchronize the subtitle with the audio stream according to the packet identifier of the audio packet.
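A small sketch of this tagging step, assuming the speech recognizer reports, for each recognized sentence, the identifier of the audio packet in which the sentence starts; the data shapes are assumptions.

```python
# Illustrative tagging of subtitles with the packet ID of the audio packet in
# which each sentence starts; the recognizer interface is an assumption.
def tag_subtitles_with_packet_ids(recognized_sentences):
    """recognized_sentences: iterable of (start_packet_id, text) pairs."""
    tagged = []
    for packet_id, text in recognized_sentences:
        # The packet ID travels with the subtitle so the terminal can later
        # align the sentence with the corresponding audio packet.
        tagged.append({"packet_id": packet_id, "text": text})
    return tagged
```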
  • 310. The cloud subtitle server returns the subtitle to terminal 2, carrying the synchronization identifier.
  • The subtitle server sends the subtitles converted in real time to the real-time subtitle client on terminal 2 through the subtitle stream channel; the subtitle can be of text type, and the synchronization identifier set in 309 is also sent to the real-time subtitle client.
  • 311. Terminal 2 performs secondary buffering on the video program stream.
  • Because the subtitle is generated by the cloud subtitle server, by the time it reaches terminal 2 it is delayed relative to the video program stream originally received by terminal 2. Therefore, to keep the video picture and the subtitle synchronized, the real-time subtitle client of terminal 2 needs to buffer the received original video program stream (which can be called secondary buffering) so as to introduce a specific delay (for example, 10 seconds), delaying at least until the subtitle is received. This offsets the delay caused by subtitle generation and delivery, and ensures that the inherent delay of the subtitle does not cause the picture and the subtitle to fall out of sync.
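A minimal sketch of such a secondary buffer, assuming a fixed delay and a monotonic clock; the packet type and the exact delay policy are not specified by the patent.

```python
# Illustrative secondary buffer: hold the original video program stream for a
# fixed delay (e.g. 10 s) so that cloud-generated subtitles can catch up.
import time
from collections import deque

class SecondaryBuffer:
    def __init__(self, delay_seconds=10.0):
        self.delay = delay_seconds
        self._queue = deque()  # (arrival_time, packet)

    def push(self, packet):
        self._queue.append((time.monotonic(), packet))

    def pop_ready(self):
        """Return packets that have been buffered for at least `delay` seconds."""
        ready = []
        now = time.monotonic()
        while self._queue and now - self._queue[0][0] >= self.delay:
            ready.append(self._queue.popleft()[1])
        return ready
```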
  • 312. Terminal 2 synchronizes the subtitle with the audio stream according to the synchronization identifier.
  • The real-time subtitle client of terminal 2 can synchronize the subtitle with the audio stream according to the packet identifier of the audio packet carried in the subtitle; since the audio stream and the video stream are themselves synchronized, synchronizing the subtitle with the audio stream ensures that the subtitle is synchronized with the video picture during subsequent display.
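A sketch of this alignment, assuming subtitles arrive as entries tagged with the packet identifier of the audio packet where each sentence starts (as in the tagging sketch above); the data shapes are assumptions.

```python
# Illustrative alignment of subtitles with audio packets by packet identifier.
def synchronize(subtitles, audio_packets):
    """subtitles: list of {"packet_id": ..., "text": ...} entries;
    audio_packets: iterable of objects exposing a .packet_id attribute.
    Yields (audio_packet, subtitle_text_or_None) pairs for presentation."""
    starts = {s["packet_id"]: s["text"] for s in subtitles}
    for packet in audio_packets:
        # A subtitle sentence is attached to the packet where it starts; the
        # presentation unit keeps showing it until the next sentence begins.
        yield packet, starts.get(packet.packet_id)
```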
  • 313. The real-time subtitle client of terminal 2 sends the video program stream and the subtitle to the program presentation unit, so that the program presentation unit displays the video program and the subtitle simultaneously.
  • The real-time subtitle client of terminal 2 sends the video program stream with the subtitle superimposed on it to the program presentation unit, so the user now sees the video with subtitles.
  • 314. Terminal 1 requests real-time subtitles, and the request may carry the program identifier.
  • 315. The cloud subtitle server finds, according to the program identifier, that the subtitle of the video program corresponding to the program identifier is already being generated in real time.
  • 316. The cloud subtitle server negotiates the downlink subtitle stream channel with terminal 1; since terminal 1 does not use an uplink audio stream in this case, only a downlink subtitle stream channel needs to be negotiated.
  • 317. The cloud subtitle server delivers the subtitle to terminal 1, and may also carry the synchronization identifier; after receiving the subtitle, terminal 1 superimposes the subtitle on the video program stream according to the synchronization identifier for display.
  • In the video processing method of this embodiment, the subtitle server in the cloud acquires the subtitle corresponding to the video program stream and returns it to the terminal, so that the terminal can display the subtitle together with the video program, achieving a video program with subtitles.
  • FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • The terminal can perform the video processing method of any embodiment of the present invention.
  • This embodiment only briefly describes the structure of the terminal; its detailed structure and working principle are described in any of the method embodiments of the present invention.
  • The terminal in this embodiment may include: a program receiving unit 41, a real-time subtitle client 42, and a program presentation unit 43;
  • the program receiving unit 41 is configured to receive a video program stream corresponding to a video program, where the video program stream includes a video stream and an audio stream;
  • the real-time subtitle client 42 is configured to request, from a subtitle server, a subtitle corresponding to the video program stream, and to receive the subtitle returned by the subtitle server;
  • the program presentation unit 43 is configured to display the video program and the subtitle.
  • The real-time subtitle client 42 is specifically configured to send a subtitle acquisition request to the subtitle server, where the subtitle acquisition request is used to request the subtitle corresponding to the video program stream, and to send the audio stream in the video program stream to the subtitle server, so that the subtitle server performs speech-to-text conversion on the audio stream according to the subtitle acquisition request to generate the subtitle.
  • The real-time subtitle client 42 is specifically configured to send a subtitle acquisition request to the subtitle server, where the subtitle acquisition request is used to request the subtitle corresponding to the video program stream and carries the program identifier of the video program, so that the subtitle server determines the subtitle corresponding to the video program stream according to the subtitle acquisition request and the program identifier.
  • The real-time subtitle client 42 is further configured to: after sending the subtitle acquisition request to the subtitle server, receive a connection failure response sent by the subtitle server, where the connection failure response indicates that the subtitle server failed to connect, according to the program identifier, to the program source used to generate the video program stream and therefore failed to acquire the audio stream; and, according to the connection failure response, send the audio stream in the video program stream to the subtitle server, so that the subtitle server performs speech-to-text conversion on the audio stream to generate the subtitle.
  • The real-time subtitle client 42 is further configured to: after the video program stream corresponding to the video program is received, buffer the video program stream, at least until the subtitle corresponding to the video program stream returned by the subtitle server is received.
  • The real-time subtitle client 42 is further configured to synchronize the subtitle with the audio stream according to the packet identifier of the audio packet included in the subtitle corresponding to the video program stream returned by the subtitle server, so that the program presentation unit displays the video program and the subtitle synchronously.
  • This embodiment provides a subtitle server, which can perform the video processing method of any embodiment of the present invention.
  • This embodiment only briefly describes the structure of the subtitle server; its detailed structure and working principle are described in any of the method embodiments of the present invention.
  • The subtitle server of this embodiment may include: a request receiving unit 51, a subtitle obtaining unit 52, and a subtitle sending unit 53;
  • the request receiving unit 51 is configured to receive a request sent by a terminal for acquiring a subtitle corresponding to a video program stream, where the video program stream includes a video stream and an audio stream;
  • the subtitle obtaining unit 52 is configured to acquire the subtitle corresponding to the video program stream according to the request;
  • the subtitle sending unit 53 is configured to return the subtitle to the terminal, so that the terminal displays the video program and the subtitle.
  • FIG. 6 is a schematic structural diagram of another embodiment of a subtitle server according to the present invention. As shown in FIG. 6, the subtitle server of this embodiment builds on the structure shown in FIG. 5.
  • The request receiving unit 51 is specifically configured to receive a subtitle acquisition request sent by the terminal, where the subtitle acquisition request is used to request the subtitle corresponding to the video program stream, and to receive the audio stream in the video program stream sent by the terminal;
  • the subtitle obtaining unit 52 is specifically configured to perform speech-to-text conversion on the audio stream according to the subtitle acquisition request to generate the subtitle.
  • The request receiving unit 51 is specifically configured to receive a subtitle acquisition request sent by the terminal, where the subtitle acquisition request is used to request the subtitle corresponding to the video program stream and carries the program identifier of the video program;
  • the subtitle obtaining unit 52 is specifically configured to acquire the subtitle corresponding to the video program stream according to the subtitle acquisition request and the program identifier.
  • The subtitle obtaining unit 52 includes: a determining subunit 521, an obtaining subunit 522, and a converting subunit 523, and may further include a feedback subunit 524;
  • the determining subunit 521 is configured to determine, according to the subtitle acquisition request, whether the program source of the video program corresponding to the program identifier has already been connected;
  • the subtitle sending unit 53 is configured to: when the judgment result of the determining subunit is that the program source is already connected, perform the returning of the subtitle to the terminal; the subtitle sending unit 53 is further configured to: when a subtitle corresponding to the video program stream is already stored in the subtitle server itself, obtain the subtitle to be sent directly from the stored subtitles;
  • the obtaining subunit 522 is configured to: when the judgment result of the determining subunit is that the program source is not connected, establish a connection with the program source and acquire the audio stream in the video program stream;
  • the converting subunit 523 is configured to perform speech-to-text conversion on the audio stream to generate the subtitle; the converting subunit 523 may perform speech-to-text conversion on the audio stream obtained by the obtaining subunit 522 from the program source, or, when the audio stream is carried in the subtitle acquisition request received by the request receiving unit 51, directly perform speech-to-text conversion on that audio stream;
  • the feedback subunit 524 is configured to: when the obtaining subunit fails to establish a connection with the program source, return, to the terminal, a connection failure response indicating that connecting to the program source failed;
  • the request receiving unit 51 is further configured to receive the audio stream at the terminal that is sent by the terminal according to the connection failure response, so that the converting subunit performs speech-to-text conversion on the audio stream to generate the subtitle.
  • The converting subunit 523 is further configured to: when performing speech-to-text conversion on the audio stream to generate the subtitle, set, in the subtitle, the packet identifier of the audio packet corresponding to the subtitle, so that the terminal synchronizes the subtitle with the audio stream according to the packet identifier of the audio packet.
  • The aforementioned program can be stored in a computer-readable storage medium.
  • When the program is executed, the steps of the foregoing method embodiments are performed; and the foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Systems (AREA)

Abstract

The present invention provides a video processing method, a terminal, and a subtitle server. The method includes: receiving a video program stream corresponding to a video program; requesting, from a subtitle server, a subtitle corresponding to the video program stream; receiving the subtitle corresponding to the video program stream returned by the subtitle server; and displaying the video program and the subtitle. The present invention achieves real-time acquisition of subtitles for a video program.

Description

视频处理方法、 终端及字幕服务器 本申请要求于 2012 年 6 月 29 日提交中国专利局、 申请号为 201210222137.2、 名称为 "视频处理方法、 终端及字幕服务器" 的中国专利申 请的优先权, 其全部内容通过引用结合在本申请中。
技术领域 本发明涉及通信技术, 尤其涉及一种视频处理方法、 终端及字幕服务器。 背景技术 现在的电视剧、 电影等已经逐渐开始将字幕作为标配了,但是在现实情况 下, 还是有很多视频节目中并没有配置字幕, 例如新闻节目、 综艺节目、 体育 节目等, 尤其是现场直播的节目, 更来不及一边播出节目一边编辑字幕; 另外 还有很多互联网上的视频节目也是没有字幕的; 也有一些虽然有字幕,但并未 全程加配字幕等各种情况。这样对于观众来说,观看没有字幕的视频节目有些 费力, 尤其是对于听力有障碍的人士来说, 更无法观看没有字幕的视频节目, 无法享受到视频节目的乐趣。 发明内容 本发明提供一种视频处理方法、终端及字幕服务器, 以实现根据视频节目 实时获取字幕。
本发明的第一方面是提供一种视频处理方法, 包括:
接收与视频节目对应的视频节目流,向字幕服务器请求获取与所述视频节 目流对应的字幕;
接收所述字幕服务器返回的所述字幕; 并显示所述视频节目及所述字幕。 在一种可能的实现方式中,所述向字幕服务器请求获取与所述视频节目流 对应的字幕, 包括: 向所述字幕服务器发送字幕获取请求, 所述字幕获取请求 用于请求获取与所述视频节目流对应的字幕,并将所述视频节目流中的音频流 发送至所述字幕服务器,以使得所述字幕服务器根据所述字幕获取请求将所述 音频流进行语音文字转换生成所述字幕。
在另一种可能的实现方式中,接收的所述视频节目流中还包括所述视频节 目的节目标识; 所述向字幕服务器请求获取与所述视频节目流对应的字幕, 包 括: 向所述字幕服务器发送字幕获取请求, 所述字幕获取请求用于请求获取与 所述视频节目流对应的字幕, 所述字幕获取请求携带所述节目标识, 以使得所 述字幕服务器根据所述字幕获取请求和所述节目标识确定所述字幕。
在又一种可能的实现方式中, 在所述向字幕服务器发送字幕获取请求之 后, 接收所述字幕服务器返回的所述字幕之前, 还包括: 接收所述字幕服务器 发送的连接失败响应,所述连接失败响应用于表示所述字幕服务器根据所述节 目标识连接节目源获取所述音频流失败, 所述节目源用于产生所述视频节目 流; 根据所述连接失败响应,将所述视频节目流中的音频流发送至所述字幕服 务器, 以使得所述字幕服务器将所述音频流进行语音文字转换生成所述字幕。
在又一种可能的实现方式中,在所述接收与视频节目对应的视频节目流之 后, 显示所述视频节目以及所述字幕之前, 还包括: 将接收的与视频节目对应 的所述视频节目流进行緩冲存储,至少緩冲存储至所述接收所述字幕服务器返 回的与所述视频节目流对应的字幕时。
在又一种可能的实现方式中,接收的所述字幕服务器返回的与所述视频节 目流对应的字幕中还包括与所述字幕对应的音频包的包标识;则在所述接收所 述字幕服务器返回的与所述视频节目流对应的字幕之后,显示所述视频节目以 及所述字幕之前, 还包括: 根据所述音频包的包标识, 将所述字幕与所述音频 流进行同步, 以同步显示所述视频节目及所述字幕。
本发明的第二方面是提供一种视频处理方法, 包括: 求;
根据所述请求获取与所述视频节目流对应的字幕,并将所述字幕返回至所 述终端, 以使得所述终端显示所述视频节目及所述字幕。
在该第二方面的一种可能的实现方式中,所述接收终端发送的用于获取与 视频节目对应的视频节目流对应的字幕的请求, 包括: 接收终端发送的字幕获 取请求, 所述字幕获取请求用于请求获取与所述视频节目流对应的字幕, 并接 收所述终端发送的所述视频节目流中的音频流;所述获取与所述视频节目流对 应的字幕, 包括: 根据所述字幕获取请求将所述音频流进行语音文字转换生成 所述字幕。
在另一种可能的实现方式中,所述接收终端发送的用于获取与视频节目对 应的视频节目流对应的字幕的请求, 包括: 接收终端发送的字幕获取请求, 所 述字幕获取请求用于请求获取与所述视频节目流对应的字幕,所述字幕获取请 求携带所述视频节目的节目标识; 所述获取与所述视频节目流对应的字幕, 包 括: 根据所述字幕获取请求及所述节目标识获取与所述视频节目流对应的字 眷。
在又一种可能的实现方式中 ,所述根据所述字幕获取请求及所述节目标识 获取与所述视频节目流对应的字幕, 包括: 根据所述字幕获取请求判断是否已 经连接与所述节目标识对应的视频节目所在的所述节目源; 若已经连接, 则执 行所述将所述字幕返回至所述终端; 否则, 与所述节目源建立连接并获取所述 视频节目流中的音频流, 将音频流进行语音文字转换生成所述字幕。
在又一种可能的实现方式中, 所述方法还包括: 若与所述节目源建立连接 失败, 则向所述终端返回用于表示连接节目源失败的连接失败响应; 并接收所 述终端根据所述连接失败响应发送的所述终端处的音频流,将所述音频流进行 语音文字转换生成所述字幕。
在又一种可能的实现方式中 ,所述将所述音频流进行语音文字转换生成所 述字幕, 包括: 对所述音频流进行语音文字转换生成与所述视频节目流对应的 所述字幕, 并在所述字幕中设置与所述字幕对应的音频包的包标识, 以使得所 述终端根据所述音频包的包标识将所述字幕与所述音频流进行同步。
本发明的第三方面是提供一种终端, 包括:
节目接收单元, 用于接收与视频节目对应的视频节目流;
实时字幕客户端,用于向字幕服务器请求获取与所述视频节目流对应的字 幕, 并接收所述字幕服务器返回的所述字幕;
节目呈现单元, 用于显示所述视频节目及所述字幕。
在该第三方面的一种可能的实现方式中, 所述实时字幕客户端, 具体用于 向所述字幕服务器发送字幕获取请求,所述字幕获取请求用于请求获取与所述 视频节目流对应的字幕,并将所述视频节目流中的音频流发送至所述字幕服务 器,以使得所述字幕服务器根据所述字幕获取请求将所述音频流进行语音文字 转换生成所述字幕。
在另一种可能的实现方式中, 所述实时字幕客户端, 具体用于向所述字幕 服务器发送字幕获取请求,所述字幕获取请求用于获取与所述视频节目流对应 的字幕, 所述字幕获取请求携带所述视频节目的节目标识, 以使得所述字幕服 务器根据所述字幕获取请求和所述节目标识确定与所述视频节目流对应的字 眷。
在又一种可能的实现方式中, 所述实时字幕客户端,还用于在向字幕服务 器发送所述字幕获取请求之后,接收所述字幕服务器发送的连接失败响应, 所 述连接失败响应用于表示所述字幕服务器根据所述节目标识连接节目源获取 所述音频流失败, 所述节目源用于产生所述视频节目流; 以及, 根据所述连接 失败响应, 将所述视频节目流中的音频流发送至所述字幕服务器, 以使得所述 字幕服务器将所述音频流进行语音文字转换生成所述字幕。
在又一种可能的实现方式中, 所述实时字幕客户端,还用于在所述接收与 视频节目对应的视频节目流之后,将所述视频节目流进行緩冲存储, 至少緩冲 存储至所述接收所述字幕服务器返回的与所述视频节目流对应的字幕时。
在又一种可能的实现方式中, 所述实时字幕客户端,还用于根据接收的所 述字幕服务器返回的与视频节目流对应的字幕中包括的音频包的包标识,将所 述字幕与所述音频流进行同步,以使得节目呈现单元同步显示所述视频节目及 所述字幕。
本发明的第四方面是提供一种字幕服务器, 包括: 流中对应的字幕的请求;
字幕获取单元, 用于根据所述请求获取与所述视频节目流对应的字幕; 字幕发送单元, 用于将所述字幕返回至所述终端, 以使得所述终端显示所 述视频节目及所述字幕。
在该第四方面的一种可能的实现方式中, 所述请求接收单元, 具体用于接 收终端发送的字幕获取请求,所述字幕获取请求用于请求获取与所述视频节目 流对应的字幕, 并接收所述终端发送的所述视频节目流中的音频流; 所述字幕 获取单元,具体用于根据所述字幕获取请求将所述音频流进行语音文字转换生 成所述字幕。
在另一种可能的实现方式中, 所述请求接收单元, 具体用于接收终端发送 的字幕获取请求,所述字幕获取请求用于请求获取与所述视频节目流对应的字 幕, 所述字幕获取请求携带所述视频节目的节目标识; 所述字幕获取单元, 具 体用于根据所述字幕获取请求及所述节目标识获取与所述视频节目流对应的 字幕。
在又一种可能的实现方式中, 所述字幕获取单元包括: 判断子单元, 用于 根据所述字幕获取请求判断是否已经连接与所述节目标识对应的视频节目所 在的节目源; 所述字幕发送单元, 用于在所述判断子单元的判断结果是已经连 接时, 执行所述将所述字幕返回至所述终端; 获取子单元, 用于在所述判断子 单元的判断结果是未连接时,与所述节目源建立连接并获取所述视频节目流中 的音频流; 转换子单元, 用于将所述音频流进行语音文字转换生成所述字幕。
在又一种可能的实现方式中, 所述字幕获取单元还包括: 反馈子单元, 用 于在所述获取子单元与所述节目源建立连接失败时,向所述终端返回用于表示 连接节目源失败的连接失败响应; 所述请求接收单元,还用于接收所述终端根 据所述连接失败响应发送的所述终端处的音频流,以使得所述转换子单元将所 述音频流进行语音文字转换生成所述字幕。
在又一种可能的实现方式中, 所述转换子单元, 还用于在对所述音频流进行语音文字转换生成所述字幕时, 在所述字幕中设置与所述字幕对应的音频包的包标识, 以使得所述终端根据所述音频包的包标识将所述字幕与所述音频流进行同步。
本发明提供的视频处理方法、 终端及字幕服务器的技术效果是: 通过在接收到视频节目流时, 向字幕服务器获取与视频节目流对应的字幕, 并将该字幕与视频节目流对应的视频节目同时显示, 实现了根据视频节目实时获取字幕。
附图说明
图 1为本发明视频处理方法一实施例的流程示意图;
图 2为本发明视频处理方法另一实施例的流程示意图;
图 3为本发明视频处理方法又一实施例的信令示意图;
图 4为本发明终端实施例的结构示意图;
图 5为本发明字幕服务器一实施例的结构示意图;
图 6为本发明字幕服务器另一实施例的结构示意图。
具体实施方式
本发明各实施例中所述的视频节目, 包括多种方式下发的视频节目, 例如, 数字电视( Digital TV, 简称: DTV )、 交互式网络电视( Internet Protocol television, 简称: IPTV )、 中国移动多媒体广播( China Mobile Multimedia Broadcasting, 简称: CMMB )、 地面波 /卫星 TV、 有线电视、 Internet视频等; 所述的终端包括多种终端, 例如, 机顶盒( Set Top Box, 简称: STB )、 智能电视( SmartTV )、 移动终端等。
实施例一
图 1为本发明视频处理方法一实施例的流程示意图, 本方法可以是终端执 行, 如图 1所示, 本实施例的视频处理方法可以包括:
101、 接收与视频节目对应的视频节目流;
其中, 所述的视频节目可以是如上所述的多种形式的视频节目, 通常是由 视频提供商提供的, 比如某些服务提供商(Service Provider, 简称: SP )或者 内容提供商(Content Provider, 简称: CP )可以提供视频节目。 并且, 终端接 收的是视频提供商发送的该视频节目对应的视频节目流,该视频节目流包括视 频流(即视频节目中的画面数据 ) 、 以及音频流(即视频节目中的声音数据 )。
102、 向字幕服务器请求获取与所述视频节目流对应的字幕;
其中, 终端在接收到视频节目流时, 不会立即呈现, 比如可以对视频节目 流进行緩冲,并在该緩冲时间内向字幕服务器例如云端的字幕服务器请求获取 与视频节目流对应的字幕,该与视频节目流对应的字幕实际指的是该字幕是与 视频节目流中的音频流对应的, 例如是对音频流进行语音文字转换得到, 并且 该字幕要与视频节目一同呈现。
本实施例中,该云端的字幕服务器例如可以是由专业的提供商提供的云端 字幕服务器, 由于云端通常具有较强的计算能力、 数据库存储能力等, 能够很 方便的扩展语音数据库以及进行语音识别算法的升级,所以云端的字幕服务器 语音识别的准确率较高;从云端的字幕服务器可以快速获取到识别准确率较高 的字幕。
103、 接收所述字幕服务器返回的与所述视频节目流对应的字幕;
其中, 云端的字幕服务器可以根据终端的获取字幕的请求, 获取与视频节目流对应的字幕, 并将该字幕发送至终端。 该字幕服务器获取字幕的方式有很多种, 通常是将该音频流进行语音到文字的转换, 即进行语音识别, 得到对应的字幕。
104、 显示所述视频节目及所述字幕。
其中, 终端在接收到云端的字幕服务器返回的字幕时,将该字幕与视频节 目流对应的视频节目一起呈现, 显示为带有字幕的视频节目。
本实施例的视频处理方法,通过由终端在接收到视频节目流时去字幕服务 器获取对应的字幕,再将字幕与视频节目流一起呈现,使得可以实现根据视频 节目实时获取字幕;例如,该终端如果接收到一没有字幕的视频节目流,此时, 终端就可以按照本实施例所述的方法进行处理, 自动获取到该视频节目流对应 的字幕, 方便了用户对视频节目的观看。 可选的, 在具体实施中, 可以设置一 开关用于控制是否启动实时字幕的获取功能, 可以由终端的用户控制,如果不 想使用实时字幕获取, 则不必开启该功能即可; 如果在看到视频节目没有字幕 且希望启动实时字幕获取功能, 则通过该开关开启, 终端就可以执行本实施例 所述的字幕获取流程了。
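作为示意, 下面给出一个 Python 草图, 概括实施例一中终端侧的处理流程; 其中 RealtimeCaptionTerminal、get_captions、render 等名称均为说明用的假设接口, 并非本发明限定的具体实现:

# 示意性草图: 实施例一中终端侧的实时字幕获取流程(接口名均为假设, 仅作说明)
class RealtimeCaptionTerminal:
    def __init__(self, caption_server, enable_realtime_captions=True):
        self.caption_server = caption_server      # 字幕服务器(例如云端字幕服务器)的客户端句柄
        self.enabled = enable_realtime_captions   # 对应文中的"开关": 是否启动实时字幕获取功能

    def play(self, program_stream):
        """program_stream: 包含视频流、音频流, 还可能携带节目标识的视频节目流(此处假设为 dict)。"""
        if not self.enabled:
            self.render(program_stream, captions=None)   # 未开启实时字幕功能, 直接显示视频节目
            return
        # 对应 102: 向字幕服务器请求与该视频节目流对应的字幕(此处以携带节目标识为例)
        captions = self.caption_server.get_captions({"program_id": program_stream.get("program_id")})
        # 对应 103/104: 接收字幕, 并将视频节目与字幕一同显示
        self.render(program_stream, captions)

    def render(self, program_stream, captions):
        """调用节目呈现单元显示视频画面(以及叠加的字幕), 此处从略。"""
        pass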
实施例二
图 2为本发明视频处理方法另一实施例的流程示意图, 本方法可以是字幕服务器执行, 本实施例以云端的字幕服务器执行为例; 如图 2所示, 本实施例的视频处理方法可以包括:
201、 接收终端发送的用于获取与视频节目对应的视频节目流对应的字幕的请求;
其中,设置在云端的字幕服务器接收到终端发送的获取字幕的请求,请求 从该字幕服务器获取与视频节目流对应的字幕。
202、 根据所述请求获取与所述视频节目流对应的字幕;
其中, 字幕服务器获取字幕的方式有多种, 在实施例三中会详细说明; 简 单举例如下: 例如, 字幕服务器可能接收到的终端发送的获取字幕请求中就携 带有音频流,则字幕服务器直接将该音频流进行语音到文字的转换得到字幕即 可;
或者,字幕服务器可能接收到的终端发送的获取字幕请求中仅携带有视频 节目的节目标识,则字幕服务器可以根据该节目标识连接到节目源去获取该音 频流, 然后再进行语音到文字的转换得到字幕; 或者, 字幕服务器在接收到节 目标识时, 通过查看得到其自身已经存储有该节目标识对应的音频流的字幕 (该字幕可能是字幕服务器正在为另一终端进行语音识别而暂时存储), 则直 接将其存储的字幕发送至终端即可。
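作为示意, 上述 202 中字幕服务器获取字幕的几种方式可以用如下 Python 草图概括; 其中 speech_to_text、caption_cache、connect_program_source 等均为假设的接口名, 仅用于说明判断思路:

# 示意性草图: 字幕服务器根据请求内容选择获取字幕的方式(接口名均为假设)
def get_captions(request, speech_to_text, caption_cache, connect_program_source):
    """request: 终端发来的字幕获取请求(此处假设为 dict, 可能携带音频流或节目标识)。"""
    if "audio_stream" in request:
        # 请求中直接携带音频流: 对音频流做语音文字转换得到字幕
        return speech_to_text(request["audio_stream"])
    program_id = request["program_id"]
    if program_id in caption_cache:
        # 服务器已存储该节目标识对应的字幕(例如正在为另一终端做语音识别而暂存)
        return caption_cache[program_id]
    # 否则根据节目标识连接节目源获取音频流, 再做语音文字转换
    audio_stream = connect_program_source(program_id)
    return speech_to_text(audio_stream)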
203、 将所述字幕返回至所述终端;
其中, 字幕服务器可以通过其与终端之间的字幕流通道将字幕下发至终 端; 如果事先尚未建立该字幕流通道, 则字幕服务器需要首先与终端协商建立 字幕流通道再下发字幕。 终端在接收到该字幕之后,将同时显示所述视频节目 及所述字幕。
本实施例的视频处理方法,通过由云端的字幕服务器获取与视频节目流中 对应的字幕, 并将该字幕返回至终端,使得终端可以将字幕与视频节目一同显 示, 实现带字幕的视频节目。
实施例三
图 3为本发明视频处理方法又一实施例的信令示意图, 在该图 3中, 示出了两个终端即终端 1和终端 2, 其中, 在终端 2中示出了其结构, 包括节目呈现单元、 实时字幕客户端、 节目接收单元, 还可以包括视频緩冲区 VPD、 以及音频緩冲区 APD; 该终端 2中的上述各个单元的具体功能将在实施例四中说明, 本实施例是为了将视频处理方法说明得更加清楚, 所以对终端的各个单元如何参与该方法的流程进行了介绍。 终端 1与终端 2具有相同的结构, 在图 3中未显示出来。 如图 3所示, 本实施例的方法仍然以云端的字幕服务器为例, 该方法可以包括:
301、 终端 2上的节目接收单元从视频节目源获取视频节目流;
其中, 视频节目源向终端 2发送的视频节目流包括视频流和音频流, 该视频流指的是视频节目的画面数据, 音频流指的是视频节目的声音数据。
302、终端 2上的实时字幕客户端从节目接收单元接收该视频节目流, 并获 取与视频节目对应的节目标识; 其中, 实时字幕客户端设置在节目呈现单元与节目接收单元之间, 具体的 是设置在音视频緩冲区与节目呈现单元之间,该音视频緩冲区包括视频緩冲区
VPD、 音频緩冲区 APD。 节目接收单元在接收到视频节目流之后, 将进行必要 的处理例如解密和解扰, 然后会将处理后的视频节目流发向音视频緩冲区, 该 音视频緩冲区主要用于对视频节目流进行緩冲。
本实施例中, 实时字幕客户端可以由用户控制是否开启, 例如, 用户可以 使用遥控器开启终端 2上的实时字幕客户端, 以请求开启实时字幕的功能。 如 果用户没有开启该实时字幕客户端,则音视频緩冲区的视频节目流将直接发送 至节目呈现单元进行显示; 如果用户开启了该实时字幕客户端, 该实时字幕客 户端将执行锚定在节目呈现单元与音视频緩冲区之间的相关处理,以使得所有 的视频节目流在到达节目呈现单元前都要先到达实时字幕客户端。
具体的,实时字幕客户端可以将真实的 VPD和 APD的输出接口作为该实时 字幕客户端的输入, 将该输出接口更改为其他名称; 同时伪造一个新的 VPD 和 APD的输出接口,作为该实时字幕客户端的输出,使节目呈现单元后续接收 的视频节目流均从该伪造的 VPD和 APD输出接口获取,而节目呈现单元并未感 知实时字幕客户端的锚定。在经过上述处理后, 音视频緩冲区的视频节目流将 发送至实时字幕客户端。
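作为示意, 上述"锚定"处理本质上是一种对緩冲区输出接口的代理, 可以用如下 Python 草图说明该思路; 其中的类名与方法名均为假设, 并非本发明限定的具体实现:

# 示意性草图: 将实时字幕客户端"锚定"在音视频緩冲区与节目呈现单元之间的代理思路
class BufferOutputProxy:
    def __init__(self, real_vpd_output, real_apd_output, caption_client):
        self._real_vpd = real_vpd_output   # 真实的视频緩冲区(VPD)输出接口, 改由字幕客户端读取
        self._real_apd = real_apd_output   # 真实的音频緩冲区(APD)输出接口, 改由字幕客户端读取
        self._client = caption_client      # 实时字幕客户端

    # 节目呈现单元后续看到的"VPD/APD 输出接口"实际是下面两个方法,
    # 所有数据在到达呈现单元之前都会先经过实时字幕客户端(例如做二级緩冲、叠加字幕)
    def read_video(self):
        return self._client.process_video(self._real_vpd.read())

    def read_audio(self):
        return self._client.process_audio(self._real_apd.read())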
此外, 该实时字幕客户端还可以从节目接收单元获取当前的视频节目流对应的视频节目的节目标识, 或者从音视频緩冲区接收的视频节目流中就包括该节目标识。 该节目标识例如是 ProgramID (能标识视频节目即可)或 URL, 例如, 对于 DTV或 IPTV节目, 节目标识可以是 ProgramID; 对于 Internet视频, 节目标识可以是 URL。
303、 终端 2上的实时字幕客户端向云端的字幕服务器发送字幕获取请求, 该字幕获取请求携带节目标识;
其中, 本实施例是采用云端的字幕服务器进行字幕获取的, 由于云端具有较强的计算能力, 能够很方便的升级识别算法, 可以实现字幕的精确识别、 多种语言的识别、 多种口音 /方言的识别, 语音到文字的实时转换精确度较高, 从而使用户体验达到最佳。 该云端的字幕服务器可以是由专业提供商提供的字幕服务器, 可以针对所有来源的视频节目 (例如, 地面波 /卫星 TV、 有线电视、 IPTV、 Internet视频等) 均提供实时的字幕功能。 本实施例中, 实时字幕客户端向云端字幕服务器发送的字幕获取请求中可以携带在 302中得到的节目标识(例如, ProgramID 或者 URL ), 该字幕获取请求可以通过 HTTP消息承载, 消息体可以由 XML方式实现。 在字幕获取请求中携带该节目标识, 则继续执行 304。
可选的, 实时字幕客户端还可以在发送字幕获取请求时, 并将视频节目流 中的音频流也一并发送至字幕服务器, 这样就可以跳转至 309, 即字幕服务器 将直接根据该音频流进行语音文字转换生成对应的字幕。
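作为示意, 下面的 Python 草图展示如何构造一个通过 HTTP 承载、消息体为 XML 的字幕获取请求; 其中的服务器地址、XML 元素名等均为假设值, 本发明并未限定具体的消息格式:

# 示意性草图: 构造携带节目标识的字幕获取请求(HTTP 承载、XML 消息体), 字段名与地址均为假设
import urllib.request
import xml.etree.ElementTree as ET

def build_caption_request_body(program_id):
    root = ET.Element("CaptionRequest")                  # 假设的根元素名
    ET.SubElement(root, "ProgramID").text = program_id   # 携带节目标识(ProgramID 或 URL)
    return ET.tostring(root, encoding="utf-8")

def send_caption_request(server_url, program_id):
    req = urllib.request.Request(
        server_url,
        data=build_caption_request_body(program_id),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()   # 字幕服务器返回的应答

# 用法示例(假设的服务器地址与节目标识):
# send_caption_request("http://caption-server.example/captions", "ProgramID-0001")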
304、 云端字幕服务器判断是否已经连接与节目标识对应的视频节目所在 的视频节目源;
其中, 云端的字幕服务器接收到终端 2发送的字幕获取请求后, 将根据该 字幕获取请求中携带的节目标识,判断是否已经连接该节目标识对应的视频节 目所在的视频节目源。
如果已经连接视频节目源,表明该字幕服务器正在提供该节目标识对应的 视频节目的字幕服务(可能是正在为另一个终端提供) , 则与终端 2协商字幕 流通道下发字幕, 并跳转至 310 , 将字幕下发至终端 2即可, 此时, 如果终端 2 所请求的字幕是字幕服务器之前已经转换过的字幕,则字幕服务器可以根据节 目标识从其自身存储的字幕中提取。 如果没有连接, 则继续执行 305。
305、 云端字幕服务器与视频节目源建立连接, 并获取所述音频流; 本实施例中, 在 304中如果云端字幕服务器判断自身并没有连接视频节目 源,则根据节目标识向视频节目源发送连接请求,该请求中可以携带节目标识, 从视频节目源获取与节目标识对应的视频节目流,或者至少是获取该视频节目 流中的音频流。 否则, 若云端字幕服务器与视频节目源建立连接失败, 则继续 执行 306。
其中,如果所述的视频节目是可以免费获取的, 或者字幕服务器的提供商 事先与视频节目源的提供商之间有合作关系,允许字幕服务器自由获取视频节 目源的视频节目,那么该视频节目的视频节目流或者仅仅是音频流会发给字幕 服务器。
306、 云端字幕服务器向终端 2返回连接失败响应;
其中, 若云端字幕服务器连接视频节目源失败, 则向终端 2返回连接失败响应, 该连接失败响应用于表示连接视频节目源失败。
307、 终端 2与云端字幕服务器协商建立流媒体通道;
其中, 终端 2上的实时字幕客户端接收到连接失败响应后, 需要与云端的 字幕服务器协商建立流媒体通道; 该流媒体通道包括上行的音频流通道(RTP 承载) 、 下行的字幕流通道(RTP或 FLUTE承载); 具体的流媒体通道的协商 方法可以通过 SDP offer/answer方式协商。
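作为示意, 下面以 Python 字符串的形式给出一个极简的 SDP offer, 用于说明上行音频流通道(RTP 承载)协商内容的大致形态; 其中的 IP 地址、端口号与载荷类型均为假设值, 并非本发明规定的参数:

# 示意性草图: 极简的 SDP offer, 仅用于说明上行音频流通道的协商内容(参数均为假设值)
SDP_OFFER_UPLINK_AUDIO = """\
v=0
o=- 0 0 IN IP4 192.0.2.10
s=realtime caption audio uplink
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0
a=sendonly
"""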
308、 终端 2将音频流发送至云端字幕服务器;
其中,在 307中的流媒体通道协商成功后, 终端 2上的实时字幕客户端将音 频流发给云端字幕服务器;该音频流可以是实时字幕客户端之前从音频緩冲区 APD中获取的。
309、 云端字幕服务器将音频流进行语音文字转换生成所述字幕, 并在字 幕中设置同步标识;
其中, 为了使得后续字幕的显示与视频节目的画面精确同步,避免有提前 或延迟的现象,本实施例的云端字幕服务器在对音频流进行实时的语音到文字 转换时,还在字幕中设置了同步标识; 该同步标识具体是釆用与字幕对应的音 频包的包标识。
例如,字幕服务器可以在每句字幕的开头插入该句字幕第一个字对应的音 频包的包标识即 Packet ID , 这样后续终端 2就可以根据该音频包的包标识将字 幕与音频流进行同步。
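作为示意, 下面的 Python 草图说明"在每句字幕开头插入该句第一个字对应音频包的 Packet ID"的做法; 具体的字幕封装格式为假设, 仅用于说明同步标识的设置思路:

# 示意性草图: 在每句字幕开头插入其第一个字对应音频包的 Packet ID 作为同步标识
def attach_sync_ids(recognized_sentences):
    """recognized_sentences: [(first_audio_packet_id, text), ...] 语音识别得到的逐句结果。"""
    tagged = []
    for packet_id, text in recognized_sentences:
        # 例如生成 "[PID=1024] 识别出的一句字幕"
        tagged.append("[PID=%d] %s" % (packet_id, text))
    return tagged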
310、 云端字幕服务器将字幕返回至终端 2, 携带同步标识;
其中, 字幕服务器将实时转换的字幕通过字幕流通道下发给终端 2上的实 时字幕客户端, 该字幕可以是 text类型; 并将 309中设置的同步标识也一并发送 至实时字幕客户端。
311、 终端 2对视频节目流进行二级緩冲;
其中, 由于字幕是在云端字幕服务器生成的, 在到达终端 2后, 该字幕与 终端 2最初接收的视频节目流相比是有一定时延的; 因此, 为了保证视频画面 与字幕的同步, 终端 2的实时字幕客户端需要将接收的原始的视频节目流进行 一定的緩冲存储(可以称为二级緩冲) , 以便实现特定的延迟(例如 10秒) , 至少是延时至接收到字幕时,从而抵消字幕生成和下发产生的延迟,保证字幕 的固有延迟不会导致画面与字幕的不同步。
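作为示意, 下面的 Python 草图给出一种简单的二级緩冲(固定延迟)实现思路; 延迟时长与数据结构均为假设:

# 示意性草图: 终端侧对原始视频节目流做二级緩冲, 延迟固定时长(例如 10 秒)后再送去呈现
import collections
import time

class DelayBuffer:
    def __init__(self, delay_seconds=10):
        self.delay = delay_seconds
        self._queue = collections.deque()   # 存放 (入队时间, 数据包)

    def push(self, packet):
        self._queue.append((time.monotonic(), packet))

    def pop_ready(self):
        """取出所有已经緩冲满 delay_seconds 的数据包。"""
        ready = []
        now = time.monotonic()
        while self._queue and now - self._queue[0][0] >= self.delay:
            ready.append(self._queue.popleft()[1])
        return ready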
312、 终端 2根据同步标识将字幕与音频流进行同步; 其中, 终端 2的实时字幕客户端可以根据字幕中的音频包的包标识即 Packet ID, 将字幕与音频流进行同步, 而音频流和视频流本身就是同步的, 所 以经过上述的字幕与音频流的同步,就可以保证在后续显示时字幕与视频画面 的同步。
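作为示意, 下面的 Python 草图说明按字幕中携带的音频包 Packet ID 将字幕与音频流对齐的思路; 其中的数据结构均为假设:

# 示意性草图: 按字幕中携带的音频包 Packet ID 将字幕与音频流对齐
def align_captions(audio_packets, captions):
    """audio_packets: [(packet_id, audio_data), ...]; captions: [(packet_id, text), ...]"""
    caption_by_pid = dict(captions)
    timeline = []
    for packet_id, audio_data in audio_packets:
        # 音频流与视频流本身同步, 因此按音频包对齐字幕即可保证画面与字幕同步显示
        timeline.append((packet_id, audio_data, caption_by_pid.get(packet_id)))
    return timeline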
313、终端 2的实时字幕客户端将视频节目流以及字幕都发送至节目呈现单 元, 以使得节目呈现单元同时显示视频节目及字幕;
其中, 终端 2的实时字幕客户端将叠加有字幕的视频节目流发送给节目呈 现模块, 从而用户此时可以看到带有字幕的视频。
至此, 终端 2的流程已经结束; 如下的 314~317是假设在终端 2正在执行上述的字幕获取流程时, 又有另外的终端 1也请求获取实时字幕, 这种情况下对于终端 1的处理流程; 其中, 该终端 1的流程仅做简单说明, 其工作原理与终端 2的字幕获取流程基本相同, 具体可以参见前述步骤。
314、 终端 1请求获取实时字幕, 其中可以携带节目标识;
315、 云端字幕服务器根据节目标识, 发现已经正在对该节目标识对应的 视频节目进行实时的字幕生成;
316、 云端字幕服务器与终端 1协商下行的字幕流通道, 由于此时终端 1不 用上行音频流, 所以仅协商一个下行的字幕流通道即可。
317、 云端字幕服务器将字幕下发至终端 1 , 同样可以携带同步标识; 后续 终端 1在接收到该字幕后, 将根据该同步标识将字幕叠加在视频节目流中进行 显示。
本实施例的视频处理方法,通过由云端的字幕服务器获取与视频节目流对 应的字幕,并将该字幕返回至终端,使得终端可以将字幕与视频节目一同显示, 实现带字幕的视频节目。
实施例四
图 4为本发明终端实施例的结构示意图, 该终端可以执行本发明任意实施 例的视频处理方法, 本实施例仅对该终端的结构进行简单说明, 其详细的结构 和工作原理可以结合参见本发明任意方法实施例所述。
如图 4所示, 本实施例的终端可以包括: 节目接收单元 41、 实时字幕客户 端 42和节目呈现单元 43; 其中,
节目接收单元 41 , 用于接收与视频节目对应的视频节目流, 所述视频节目 流包括视频流、 音频流;
实时字幕客户端 42,用于向字幕服务器请求获取与所述视频节目流对应的 字幕, 并接收所述字幕服务器返回的所述字幕;
节目呈现单元 43 , 用于显示所述视频节目及所述字幕。
进一步的, 实时字幕客户端 42, 具体用于向所述字幕服务器发送字幕获取 请求, 所述字幕获取请求用于请求获取与所述视频节目流对应的字幕, 并将所 述视频节目流中的音频流发送至所述字幕服务器,以使得所述字幕服务器根据 所述字幕获取请求将所述音频流进行语音文字转换生成所述字幕。
进一步的, 实时字幕客户端 42, 具体用于向所述字幕服务器发送字幕获取 请求, 所述字幕获取请求用于请求获取与所述视频节目流对应的字幕, 所述字 幕获取请求携带所述视频节目的节目标识,以使得所述字幕服务器根据所述字 幕获取请求和所述节目标识确定与所述视频节目流对应的字幕。
进一步的, 实时字幕客户端 42,还用于在向字幕服务器发送字幕获取请求 之后,接收所述字幕服务器发送的连接失败响应, 所述连接失败响应用于表示 所述字幕服务器根据所述节目标识连接节目源获取所述音频流失败,所述节目 源用于产生所述视频节目流; 以及, 根据所述连接失败响应, 将所述视频节目 流中的音频流发送至所述字幕服务器,以使得所述字幕服务器将所述音频流进 行语音文字转换生成所述字幕。
进一步的, 实时字幕客户端 42,还用于在所述接收与视频节目对应的视频 节目流之后,将所述视频节目流进行緩冲存储, 至少緩冲存储至所述接收所述 字幕服务器返回的与所述视频节目流对应的字幕时。
进一步的, 实时字幕客户端 42,还用于根据接收的所述字幕服务器返回的 与视频节目流对应的字幕中包括的音频包的包标识,将所述字幕与所述音频流 进行同步, 以使得节目呈现单元同步显示所述视频节目及所述字幕。
实施例五
本实施例提供一种字幕服务器,该字幕服务器可以执行本发明任意实施例 的视频处理方法, 本实施例仅对该字幕服务器的结构简单说明, 其详细结构和 工作原理可以结合参见本发明任意方法实施例所述。
图 5为本发明字幕服务器一实施例的结构示意图, 如图 5所示, 本实施例的字幕服务器可以包括: 请求接收单元 51、 字幕获取单元 52和字幕发送单元 53; 其中,
请求接收单元 51, 用于接收终端发送的用于获取与视频节目对应的视频节目流对应的字幕的请求, 所述视频节目流包括视频流和音频流;
字幕获取单元 52 , 用于根据所述请求获取与所述视频节目流对应的字幕; 字幕发送单元 53 , 用于将所述字幕返回至所述终端, 以使得所述终端显示 所述视频节目及所述字幕。
图 6为本发明字幕服务器另一实施例的结构示意图, 如图 6所示, 本实施例 的字幕服务器在图 5所示结构的基础上,
进一步的, 请求接收单元 51 , 具体用于接收终端发送的字幕获取请求, 所 述字幕获取请求用于获取与所述视频节目流对应的字幕,并接收所述终端发送 的所述视频节目流中的音频流; 字幕获取单元 52 , 具体用于根据所述字幕获取 请求将所述音频流进行语音文字转换生成所述字幕。
进一步的, 请求接收单元 51 , 具体用于接收终端发送的字幕获取请求, 所 述字幕获取请求用于获取与所述视频节目流对应的字幕,所述字幕获取请求携 带所述视频节目的节目标识; 字幕获取单元 52 , 具体用于根据所述字幕获取请 求及所述节目标识获取与所述视频节目流对应的字幕。
进一步的, 字幕获取单元 52包括: 判断子单元 521、 获取子单元 522、 转换 子单元 523 , 还可以包括反馈子单元 524; 其中,
判断子单元 521 , 用于根据所述字幕获取请求判断是否已经连接与所述节 目标识对应的视频节目所在的节目源;
字幕发送单元 53 , 用于在所述判断子单元的判断结果是已经连接时,执行 所述将所述字幕返回至所述终端; 其中, 该字幕发送单元 53还用于在字幕服务 器自身已经存储有与视频节目流对应的字幕时,直接从存储的字幕中获取字幕 发送;
获取子单元 522 , 用于在所述判断子单元的判断结果是未连接时, 与所述 节目源建立连接并获取所述视频节目流中的音频流;
转换子单元 523, 用于将所述音频流进行语音文字转换生成所述字幕; 其中, 该转换子单元 523可以是将获取子单元 522从节目源获得的音频流进行语音文字转换, 或者是, 当请求接收单元 51接收到的字幕获取请求中携带有音频流时, 直接对该音频流进行语音文字转换;
反馈子单元 524, 用于在所述获取子单元与所述节目源建立连接失败时, 向所述终端返回用于表示连接节目源失败的连接失败响应;
请求接收单元 51 ,还用于接收所述终端根据所述连接失败响应发送的所述 终端处的音频流,以使得所述转换子单元将所述音频流进行语音文字转换生成 所述字幕。
进一步的, 转换子单元 523 , 还用于在对所述音频流进行语音文字转换生 成所述字幕时,在所述字幕中设置与所述字幕对应的音频包的包标识, 以使得 所述终端根据所述音频包的包标识将所述字幕与音频流进行同步。
本领域普通技术人员可以理解:实现上述各方法实施例的全部或部分步骤 可以通过程序指令相关的硬件来完成。前述的程序可以存储于一计算机可读取 存储介质中。 该程序在执行时, 执行包括上述各方法实施例的步骤; 而前述的 存储介质包括: ROM, RAM,磁碟或者光盘等各种可以存储程序代码的介质。
最后应说明的是: 以上各实施例仅用以说明本发明的技术方案, 而非对其 限制; 尽管参照前述各实施例对本发明进行了详细的说明, 本领域的普通技术 人员应当理解: 其依然可以对前述各实施例所记载的技术方案进行修改, 或者 对其中部分或者全部技术特征进行等同替换; 而这些修改或者替换, 并不使相 应技术方案的本质脱离本发明各实施例技术方案的范围。

Claims

权 利 要求 书
1、 一种视频处理方法, 其特征在于, 包括:
接收与视频节目对应的视频节目流,向字幕服务器请求获取与所述视频节 目流对应的字幕;
接收所述字幕服务器返回的所述字幕, 并显示所述视频节目及所述字幕。
2、 根据权利要求 1所述的视频处理方法, 其特征在于, 所述向字幕服务器 请求获取与所述视频节目流对应的字幕, 包括:
向所述字幕服务器发送字幕获取请求,所述字幕获取请求用于请求获取与 所述视频节目流对应的字幕,并将所述视频节目流中的音频流发送至所述字幕 服务器,以使得所述字幕服务器根据所述字幕获取请求将所述音频流进行语音 文字转换生成所述字幕。
3、 根据权利要求 1所述的视频处理方法, 其特征在于, 接收的所述视频节 目流中还包括所述视频节目的节目标识;
所述向字幕服务器请求获取与所述视频节目流对应的字幕, 包括: 向所述字幕服务器发送字幕获取请求,所述字幕获取请求用于请求获取与 所述视频节目流对应的字幕, 所述字幕获取请求携带所述节目标识, 以使得所 述字幕服务器根据所述字幕获取请求和所述节目标识确定所述字幕。
4、 根据权利要求 3所述的视频处理方法, 其特征在于, 在所述向字幕服务 器发送字幕获取请求之后,接收所述字幕服务器返回的所述字幕之前,还包括: 接收所述字幕服务器发送的连接失败响应,所述连接失败响应用于表示所 述字幕服务器根据所述节目标识连接节目源获取所述音频流失败,所述节目源 用于产生所述视频节目流;
根据所述连接失败响应,将所述视频节目流中的音频流发送至所述字幕服 务器, 以使得所述字幕服务器将所述音频流进行语音文字转换生成所述字幕。
5、 根据权利要求 1-4任一所述的视频处理方法, 其特征在于, 在所述接收 与视频节目对应的视频节目流之后,显示所述视频节目以及所述字幕之前,还 包括:
将接收的与视频节目对应的所述视频节目流进行緩冲存储,至少緩冲存储 至所述接收所述字幕服务器返回的与所述视频节目流对应的字幕时。
6、 根据权利要求 1-5任一所述的视频处理方法, 其特征在于, 接收的所述 字幕服务器返回的与所述视频节目流对应的字幕中还包括与所述字幕对应的 音频包的包标识; 则
在所述接收所述字幕服务器返回的与所述视频节目流对应的字幕之后,显 示所述视频节目以及所述字幕之前, 还包括: 根据所述音频包的包标识, 将所 述字幕与所述音频流进行同步, 以同步显示所述视频节目及所述字幕。
7、 一种视频处理方法, 其特征在于, 包括:
接收终端发送的用于获取与视频节目对应的视频节目流对应的字幕的请求;
根据所述请求获取与所述视频节目流对应的字幕, 并将所述字幕返回至所述终端, 以使得所述终端显示所述视频节目及所述字幕。
8、 根据权利要求 7所述的视频处理方法, 其特征在于, 所述接收终端发送的用于获取与视频节目对应的视频节目流对应的字幕的请求, 包括:
接收终端发送的字幕获取请求,所述字幕获取请求用于请求获取与所述视 频节目流对应的字幕, 并接收所述终端发送的所述视频节目流中的音频流; 所述获取与所述视频节目流对应的字幕, 包括:
根据所述字幕获取请求将所述音频流进行语音文字转换生成所述字幕。
9、 根据权利要求 7所述的视频处理方法, 其特征在于, 所述接收终端发送的用于获取与视频节目对应的视频节目流对应的字幕的请求, 包括:
接收终端发送的字幕获取请求,所述字幕获取请求用于请求获取与所述视 频节目流对应的字幕, 所述字幕获取请求携带所述视频节目的节目标识; 所述获取与所述视频节目流对应的字幕, 包括:
根据所述字幕获取请求及所述节目标识获取与所述视频节目流对应的字幕。
10、 根据权利要求 9所述的视频处理方法, 其特征在于, 所述根据所述字 幕获取请求及所述节目标识获取与所述视频节目流对应的字幕, 包括:
根据所述字幕获取请求判断是否已经连接与所述节目标识对应的视频节 目所在的所述节目源; 若已经连接, 则执行所述将所述字幕返回至所述终端; 否则, 与所述节目 源建立连接并获取所述视频节目流中的音频流,将所述音频流进行语音文字转 换生成所述字幕。
11、根据权利要求 10所述的视频处理方法,其特征在于,所述方法还包括: 若与所述节目源建立连接失败,则向所述终端返回用于表示连接节目源失 败的连接失败响应;并接收所述终端根据所述连接失败响应发送的所述终端处 的音频流, 将所述音频流进行语音文字转换生成所述字幕。
12、 根据权利要求 8或 10所述的视频处理方法, 其特征在于, 所述将所述 音频流进行语音文字转换生成所述字幕, 包括:
对所述音频流进行语音文字转换生成与所述视频节目流对应的所述字幕, 并在所述字幕中设置与所述字幕对应的音频包的包标识,以使得所述终端根据 所述音频包的包标识将所述字幕与所述音频流进行同步。
13、 一种终端, 其特征在于, 包括:
节目接收单元, 用于接收与视频节目对应的视频节目流;
实时字幕客户端,用于向字幕服务器请求获取与所述视频节目流对应的字 幕, 并接收所述字幕服务器返回的所述字幕;
节目呈现单元, 用于显示所述视频节目及所述字幕。
14、 根据权利要求 13所述的终端, 其特征在于,
所述实时字幕客户端, 具体用于向所述字幕服务器发送字幕获取请求, 所 述字幕获取请求用于请求获取与所述视频节目流对应的字幕,并将所述视频节 目流中的音频流发送至所述字幕服务器,以使得所述字幕服务器根据所述字幕 获取请求将所述音频流进行语音文字转换生成所述字幕。
15、 根据权利要求 13所述的终端, 其特征在于,
所述实时字幕客户端, 具体用于向所述字幕服务器发送字幕获取请求, 所 述字幕获取请求用于获取与所述视频节目流对应的字幕,所述字幕获取请求携 带所述视频节目的节目标识,以使得所述字幕服务器根据所述字幕获取请求和 所述节目标识确定与所述视频节目流对应的字幕。
16、 根据权利要求 15所述的终端, 其特征在于,
所述实时字幕客户端, 还用于在向字幕服务器发送所述字幕获取请求之 后,接收所述字幕服务器发送的连接失败响应, 所述连接失败响应用于表示所 述字幕服务器根据所述节目标识连接节目源获取所述音频流失败,所述节目源 用于产生所述视频节目流; 以及, 根据所述连接失败响应, 将所述视频节目流 中的音频流发送至所述字幕服务器,以使得所述字幕服务器将所述音频流进行 语音文字转换生成所述字幕。
17、 根据权利要求 13-16任一所述的终端, 其特征在于,
所述实时字幕客户端,还用于在所述接收与视频节目对应的视频节目流之 后,将所述视频节目流进行緩冲存储, 至少緩冲存储至所述接收所述字幕服务 器返回的与所述视频节目流对应的字幕时。
18、 根据权利要求 13-17任一所述的终端, 其特征在于,
所述实时字幕客户端,还用于根据接收的所述字幕服务器返回的与视频节 目流对应的字幕中包括的音频包的包标识, 将所述字幕与所述音频流进行同 步, 以使得节目呈现单元同步显示所述视频节目及所述字幕。
19、 一种字幕服务器, 其特征在于, 包括:
请求接收单元, 用于接收终端发送的用于获取与视频节目对应的视频节目流对应的字幕的请求;
字幕获取单元, 用于根据所述请求获取与所述视频节目流对应的字幕; 字幕发送单元, 用于将所述字幕返回至所述终端, 以使得所述终端显示所 述视频节目及所述字幕。
20、 根据权利要求 19所述的字幕服务器, 其特征在于,
所述请求接收单元, 具体用于接收终端发送的字幕获取请求, 所述字幕获 取请求用于请求获取与所述视频节目流对应的字幕,并接收所述终端发送的所 述视频节目流中的音频流;
所述字幕获取单元,具体用于根据所述字幕获取请求将所述音频流进行语 音文字转换生成所述字幕。
21、 根据权利要求 19所述的字幕服务器, 其特征在于,
所述请求接收单元, 具体用于接收终端发送的字幕获取请求, 所述字幕获 取请求用于请求获取与所述视频节目流对应的字幕,所述字幕获取请求携带所 述视频节目的节目标识;
所述字幕获取单元,具体用于根据所述字幕获取请求及所述节目标识获取 与所述视频节目流对应的字幕。
22、 根据权利要求 21所述的字幕服务器, 其特征在于, 所述字幕获取单元 包括:
判断子单元,用于根据所述字幕获取请求判断是否已经连接与所述节目标 识对应的视频节目所在的节目源;
所述字幕发送单元, 用于在所述判断子单元的判断结果是已经连接时,执 行所述将所述字幕返回至所述终端;
获取子单元, 用于在所述判断子单元的判断结果是未连接时, 与所述节目 源建立连接并获取所述视频节目流中的音频流;
转换子单元, 用于将所述音频流进行语音文字转换生成所述字幕。
23、 根据权利要求 22所述的字幕服务器, 其特征在于, 所述字幕获取单元 还包括:
反馈子单元, 用于在所述获取子单元与所述节目源建立连接失败时, 向所 述终端返回用于表示连接节目源失败的连接失败响应;
所述请求接收单元,还用于接收所述终端根据所述连接失败响应发送的所 述终端处的音频流,以使得所述转换子单元将所述音频流进行语音文字转换生 成所述字幕。
24、 根据权利要求 22所述的字幕服务器, 其特征在于,
所述转换子单元,还用于在对所述音频流进行语音文字转换生成所述字幕 时,在所述字幕中设置与所述字幕对应的音频包的包标识, 以使得所述终端根 据所述音频包的包标识将所述字幕与所述音频流进行同步。
PCT/CN2013/078482 2012-06-29 2013-06-29 视频处理方法、终端及字幕服务器 WO2014000703A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP13809506.2A EP2852168A1 (en) 2012-06-29 2013-06-29 Video processing method, terminal and caption server
US14/568,409 US20150100981A1 (en) 2012-06-29 2014-12-12 Video Processing Method, Terminal, and Caption Server

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210222137.2 2012-06-29
CN2012102221372A CN102802044A (zh) 2012-06-29 2012-06-29 视频处理方法、终端及字幕服务器

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/568,409 Continuation US20150100981A1 (en) 2012-06-29 2014-12-12 Video Processing Method, Terminal, and Caption Server

Publications (1)

Publication Number Publication Date
WO2014000703A1 true WO2014000703A1 (zh) 2014-01-03

Family

ID=47200995

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/078482 WO2014000703A1 (zh) 2012-06-29 2013-06-29 视频处理方法、终端及字幕服务器

Country Status (4)

Country Link
US (1) US20150100981A1 (zh)
EP (1) EP2852168A1 (zh)
CN (1) CN102802044A (zh)
WO (1) WO2014000703A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104871895A (zh) * 2015-04-30 2015-09-02 湖北省农业科学院果树茶叶研究所 一种富有甜柿中间砧快速育苗方法
CN106170986A (zh) * 2014-09-26 2016-11-30 株式会社阿斯台姆 节目输出装置、节目管理服务器、辅助信息管理服务器、节目和辅助信息的输出方法以及存储介质

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102802044A (zh) * 2012-06-29 2012-11-28 华为终端有限公司 视频处理方法、终端及字幕服务器
WO2015039888A1 (en) * 2013-09-20 2015-03-26 Koninklijke Kpn N.V. Correlating timeline information between media streams
US9860581B2 (en) 2013-09-20 2018-01-02 Koninklijke Kpn N.V. Correlating timeline information between media streams
CN104410924B (zh) * 2014-11-25 2018-03-23 广东欧珀移动通信有限公司 一种多媒体字幕显示方法及装置
KR101789221B1 (ko) * 2015-07-16 2017-10-23 네이버 주식회사 동영상 제공 장치, 동영상 제공 방법, 및 컴퓨터 프로그램
CN106454547B (zh) * 2015-08-11 2020-01-31 中国科学院声学研究所 一种实时字幕播出方法及系统
KR102407630B1 (ko) * 2015-09-08 2022-06-10 삼성전자주식회사 서버, 사용자 단말 및 이들의 제어 방법.
US9374536B1 (en) 2015-11-12 2016-06-21 Captioncall, Llc Video captioning communication system, devices and related methods for captioning during a real-time video communication session
US9525830B1 (en) 2015-11-12 2016-12-20 Captioncall Llc Captioning communication systems
CN107181986A (zh) * 2016-03-11 2017-09-19 百度在线网络技术(北京)有限公司 视频与字幕的匹配方法和装置
CN106128440A (zh) * 2016-06-22 2016-11-16 北京小米移动软件有限公司 一种歌词显示处理方法、装置、终端设备及系统
CN106792071A (zh) * 2016-12-19 2017-05-31 北京小米移动软件有限公司 字幕处理方法及装置
US9854324B1 (en) 2017-01-30 2017-12-26 Rovi Guides, Inc. Systems and methods for automatically enabling subtitles based on detecting an accent
CN108632681B (zh) * 2017-03-21 2020-04-03 华为技术有限公司 播放媒体流的方法、服务器及终端
US10397645B2 (en) * 2017-03-23 2019-08-27 Intel Corporation Real time closed captioning or highlighting method and apparatus
CN107146623B (zh) * 2017-04-07 2021-03-16 百度在线网络技术(北京)有限公司 基于人工智能的语音识别方法、装置和系统
CN107968956B (zh) * 2017-11-10 2020-08-28 深圳天珑无线科技有限公司 播放视频的方法、电子设备及具有存储功能的装置
CN108156480B (zh) * 2017-12-27 2022-01-04 腾讯科技(深圳)有限公司 一种视频字幕生成的方法、相关装置及系统
CN108449622B (zh) * 2018-03-26 2021-05-04 腾龙电子技术(上海)股份有限公司 一种混合数据源智能电视播放及交互系统
CN108573053B (zh) * 2018-04-24 2021-11-30 百度在线网络技术(北京)有限公司 信息推送方法、装置和系统
CN108600773B (zh) 2018-04-25 2021-08-10 腾讯科技(深圳)有限公司 字幕数据推送方法、字幕展示方法、装置、设备及介质
CN108833991A (zh) * 2018-06-29 2018-11-16 北京优酷科技有限公司 视频字幕显示方法及装置
CN108924598A (zh) * 2018-06-29 2018-11-30 北京优酷科技有限公司 视频字幕显示方法及装置
CN108924664B (zh) * 2018-07-26 2021-06-08 海信视像科技股份有限公司 一种节目字幕的同步显示方法及终端
US11178465B2 (en) * 2018-10-02 2021-11-16 Harman International Industries, Incorporated System and method for automatic subtitle display
CN109761126B (zh) * 2019-01-31 2023-06-30 深圳桥通物联科技有限公司 电梯轿厢内乘客对话的语音识别及内容显示的方法
CN113382291A (zh) * 2020-03-09 2021-09-10 海信视像科技股份有限公司 一种显示设备及流媒体播放方法
CN115547357B (zh) * 2022-12-01 2023-05-09 合肥高维数据技术有限公司 音视频伪造同步方法及其构成的伪造系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20020010781A (ko) * 2000-07-31 2002-02-06 윤종용 휴대용 무선 단말기에서의 음악파일 재생 방법
CN1859565A (zh) * 2005-05-01 2006-11-08 腾讯科技(深圳)有限公司 播放流媒体字幕的方法及其流媒体播放器
CN101518055A (zh) * 2006-09-21 2009-08-26 松下电器产业株式会社 字幕生成装置、字幕生成方法及字幕生成程序
CN101616181A (zh) * 2009-07-27 2009-12-30 腾讯科技(深圳)有限公司 一种上传和下载字幕文件的方法、系统和设备
CN102802044A (zh) * 2012-06-29 2012-11-28 华为终端有限公司 视频处理方法、终端及字幕服务器

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080284910A1 (en) * 2007-01-31 2008-11-20 John Erskine Text data for streaming video
US20080295040A1 (en) * 2007-05-24 2008-11-27 Microsoft Corporation Closed captions for real time communication
US20100265397A1 (en) * 2009-04-20 2010-10-21 Tandberg Television, Inc. Systems and methods for providing dynamically determined closed caption translations for vod content
CN102087668A (zh) * 2011-02-17 2011-06-08 天擎华媒(北京)科技有限公司 一种自动获取音视频字幕和歌词并快速定位检索及个性化显示的方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20020010781A (ko) * 2000-07-31 2002-02-06 윤종용 휴대용 무선 단말기에서의 음악파일 재생 방법
CN1859565A (zh) * 2005-05-01 2006-11-08 腾讯科技(深圳)有限公司 播放流媒体字幕的方法及其流媒体播放器
CN101518055A (zh) * 2006-09-21 2009-08-26 松下电器产业株式会社 字幕生成装置、字幕生成方法及字幕生成程序
CN101616181A (zh) * 2009-07-27 2009-12-30 腾讯科技(深圳)有限公司 一种上传和下载字幕文件的方法、系统和设备
CN102802044A (zh) * 2012-06-29 2012-11-28 华为终端有限公司 视频处理方法、终端及字幕服务器

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2852168A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106170986A (zh) * 2014-09-26 2016-11-30 株式会社阿斯台姆 节目输出装置、节目管理服务器、辅助信息管理服务器、节目和辅助信息的输出方法以及存储介质
EP3110164A4 (en) * 2014-09-26 2017-09-06 Astem Co., Ltd. Program output apparatus, program management server, supplemental information management server, method for outputting program and supplemental information, and recording medium
CN104871895A (zh) * 2015-04-30 2015-09-02 湖北省农业科学院果树茶叶研究所 一种富有甜柿中间砧快速育苗方法

Also Published As

Publication number Publication date
EP2852168A4 (en) 2015-03-25
US20150100981A1 (en) 2015-04-09
EP2852168A1 (en) 2015-03-25
CN102802044A (zh) 2012-11-28

Similar Documents

Publication Publication Date Title
WO2014000703A1 (zh) 视频处理方法、终端及字幕服务器
JP6783293B2 (ja) 複数のオーバーザトップストリーミングクライアントを同期させること
US10341714B2 (en) Synchronization of multiple audio assets and video data
EP3100458B1 (en) Method and apparatus for synchronizing the playback of two electronic devices
US20120066711A1 (en) Virtualized home theater service
US20140068691A1 (en) Method, system, and apparatus for acquiring comment information when watching a program
US20120230389A1 (en) Decoder and method at the decoder for synchronizing the rendering of contents received through different networks
EP3100457B1 (en) Method and apparatus for synchronizing playbacks at two electronic devices
WO2007128194A1 (fr) Procédé, appareil et système pour lire des données audio/vidéo
CN106464933B (zh) 用于远程控制对多媒体内容的渲染的设备和方法
EP2891323B1 (en) Rendering time control
WO2017071670A1 (zh) 音视频同步方法、装置及系统
US20120154679A1 (en) User-controlled synchronization of audio and video
EP3319331B1 (en) Transmission of audio streams
CN106412646B (zh) 一种实现同步播放的方法和装置
US9549223B2 (en) Server, client apparatus, data distribution method, and data distribution system
JP7253477B2 (ja) ストリームを同期させる方法及び生成する方法、並びに対応するコンピュータプログラム、記憶媒体、並びにレンダリングデバイス、実行デバイス、及び生成デバイス
EP2695389B1 (en) Processing media streams for synchronised output at multiple end points
JP5381434B2 (ja) コンテンツ処理装置
US20150095941A1 (en) Multilingual audio service supporting system and method therefor
KR102126224B1 (ko) 다국어 음성 서비스 제공 시스템 및 그 방법
JP2010232716A (ja) ビデオコンテンツ受信装置、コメント交換システム、コメントデータの生成方法およびプログラム
JP2022083357A (ja) アプリケーションエンジン、これを実装した情報通信端末装置、及び字幕表示制御方法並びに該方法を実行するためのコンピュータプログラム
CN117376593A (zh) 直播流的字幕处理方法、装置、存储介质及计算机设备
JP2010187256A (ja) 通信システム、並びに放送情報送信装置および通信装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13809506

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2013809506

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE