CN112188241A - Method and system for real-time subtitle generation of live stream - Google Patents

Method and system for real-time subtitle generation of live stream

Info

Publication number
CN112188241A
CN112188241A (application CN202011072549.3A)
Authority
CN
China
Prior art keywords
engine, real time, websocket, frame, audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011072549.3A
Other languages
Chinese (zh)
Inventor
唐杰
王遥远
李庆瑜
戴立言
Current Assignee (listing may be inaccurate)
SHANGHAI WONDERTEK SOFTWARE CO Ltd
Original Assignee
SHANGHAI WONDERTEK SOFTWARE CO Ltd
Priority date (assumed)
Filing date
Publication date
Application filed by SHANGHAI WONDERTEK SOFTWARE CO Ltd filed Critical SHANGHAI WONDERTEK SOFTWARE CO Ltd
Priority to CN202011072549.3A
Publication of CN112188241A

Classifications

    • H04N 21/235: Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N 21/2187: Live feed
    • H04N 21/233: Processing of audio elementary streams
    • H04N 21/23614: Multiplexing of additional data and video streams
    • H04N 21/242: Synchronization processes, e.g. processing of PCR [Program Clock References]
    (all under H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD])

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to the technical field of live audio and video streaming, and provides a method and system for generating subtitles for a live stream in real time. The method comprises: decoding a live source to obtain video frames and audio frames; establishing communication between an AI speech translation engine and a transcoding engine; the AI speech translation engine obtaining the audio frames from the transcoding engine, translating them in real time, and outputting the translated content; and the transcoding engine obtaining the translated content from the AI speech translation engine in real time and encapsulating it with the original video and audio frames, either by burning it into the video frames or by packaging it into subtitle frames, to output the live stream. Because the audio stream is translated into subtitles in real time during the broadcast, true synchronization of subtitles, audio, and video is achieved; subtitles in multiple languages can be output simultaneously; live streams with subtitles can be produced for different push protocols; and the player can freely select subtitles in different languages for display.

Description

Method and system for real-time subtitle generation of live stream
Technical Field
The invention relates to the technical field of live audio and video streaming, and in particular to a method and system for generating subtitles for a live stream in real time.
Background
With the development of the live streaming industry, problems such as image quality, latency, and audio-video synchronization have been optimized extensively, yet user requirements are still not fully met.
In some scenarios, such as sporting events, large conference reports, and online education and training, subtitles are generally produced in post-production: the live video is manually translated and the subtitles are burned into the video afterwards. The benefit that subtitles bring to users is therefore lost during the live broadcast itself.
Subtitles help hearing-impaired viewers understand the program content, and because many words are homophones, combining subtitle text with the audio makes the content clearer. In addition, subtitles can be used to translate foreign-language programs, so that viewers who do not understand the language can follow the content while hearing the original sound.
Real-time subtitle generation for live broadcast is not yet mature. In particular, synchronization of audio, video, and subtitles during live broadcast is problematic: subtitles that lead or lag the sound and picture result in a poor user experience.
Among existing push protocols, RTMP, for example, does not support external (plug-in) subtitles, so subtitles must be burned into the video. However, some users need to freely select subtitles in different languages during playback, which requires the subtitles to exist as a separate, plug-in stream.
Moreover, viewers may come from different countries, so subtitles in multiple languages need to be generated simultaneously to meet their requirements.
Disclosure of Invention
In view of the foregoing problems, an object of the present invention is to provide a method and system for generating subtitles for a live stream in real time. Addressing the shortcomings of existing live subtitle generation technology, the audio stream is translated in real time during the broadcast to generate subtitles, achieving true synchronization of subtitles, audio, and video; subtitles in multiple languages are output simultaneously; live streams with subtitles are produced for different push protocols; and the player can freely select subtitles in different languages for display.
The above object of the present invention is achieved by the following technical solutions:
A method for generating subtitles for a live stream in real time comprises the following steps:
S1: acquiring a live source, starting a decapsulation and decoding thread in a transcoding engine, and decoding the live source to obtain video frames and audio frames;
S2: establishing an AI speech translation engine and establishing communication between the AI speech translation engine and the transcoding engine;
S3: the AI speech translation engine obtaining the audio frames from the transcoding engine, translating them in real time, and outputting the translated content;
S4: the transcoding engine obtaining the translated content from the AI speech translation engine in real time and encapsulating it with the original video and audio frames to output a live stream, either by burning the translated content into the video frames or by packaging it into subtitle frames.
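The four steps S1 to S4 can be sketched as a minimal pipeline. Everything below is illustrative only: the `Frame` type, the stub `translate` function, and the tuple-based output are hypothetical stand-ins for the transcoding engine and the AI speech translation engine, not interfaces taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    kind: str      # "video" or "audio"
    pts: int       # presentation timestamp in milliseconds
    payload: bytes

def demux_decode(source):
    """S1: decapsulate and decode the live source into frames (stubbed)."""
    for frame in source:
        yield frame

def translate(audio_frame):
    """S3: stand-in for the AI speech translation engine."""
    return {"pts": audio_frame.pts, "text": f"[subtitle@{audio_frame.pts}ms]"}

def run_pipeline(source):
    """S2/S4: route audio frames to the translator and mux the results back."""
    out = []
    for frame in demux_decode(source):
        if frame.kind == "audio":
            cue = translate(frame)  # translated content reuses the audio PTS
            out.append(("subtitle", cue["pts"], cue["text"]))
        out.append((frame.kind, frame.pts, frame.payload))
    return out  # a real system re-encodes and pushes this as the live stream

src = [Frame("video", 0, b"v0"), Frame("audio", 0, b"a0"), Frame("video", 40, b"v1")]
stream = run_pipeline(src)
```

Because the subtitle tuple inherits the audio frame's timestamp, the muxed output carries all three synchronized tracks through a single transcoding pass, which is the core claim of the method.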
Further, communication between the AI speech translation engine and the transcoding engine is established through WebSocket, specifically:
establishing WebSocket server A and WebSocket client B on the transcoding engine;
establishing WebSocket client C and WebSocket server D on the AI speech translation engine;
WebSocket client C initiating an authentication request to WebSocket server A; once authentication succeeds, the connection is established and the AI speech translation engine obtains the audio frames from the transcoding engine in real time over the WebSocket;
WebSocket client B initiating an authentication request to WebSocket server D; once authentication succeeds, the connection is established and the transcoding engine obtains the translated content from the AI speech translation engine in real time over the WebSocket.
Further, an authentication request initiated by a WebSocket client (client B or client C) to a WebSocket server (server A or server D) specifically comprises the following steps:
the WebSocket client is preconfigured with an agreed key, and encrypts the agreed key with the MD5 algorithm to obtain a first MD5 digest;
the WebSocket client appends the first MD5 digest to the URL request as a parameter;
after receiving the client's request, the WebSocket server parses out the parameter-free URL and the first MD5 digest;
the WebSocket server encrypts the agreed key again with the MD5 algorithm to obtain a second MD5 digest;
the WebSocket server compares the first MD5 digest with the second; if they are equal, authentication succeeds, otherwise authentication fails.
Further, in step S4, outputting the live stream by burning the translated content into the video frames suits streaming media servers that do not support pushing a plug-in subtitle stream.
Further, in step S4, outputting the live stream by packaging the translated content into subtitle frames suits streaming media servers that support pushing a plug-in subtitle stream, so that the subtitles to display can be freely selected.
Further, the translated content, in one or more translation languages, is burned into the video frames or separately encapsulated into subtitle frames.
Further, step S1 further comprises:
the transcoding engine performing timestamp correction on the decoded video and audio frames, ensuring that the timestamps are aligned and increase monotonically.
Further, step S3 further comprises:
the AI speech translation engine attaching to the translated content the timestamp carried by the corresponding audio frame, so that after the translated content is encapsulated with the original video and audio frames, audio, video, and subtitles remain synchronized.
The invention also provides a system for executing the above method for generating subtitles for a live stream in real time, comprising:
a live stream decoding module, used to acquire a live source, start a decapsulation and decoding thread in the transcoding engine, and decode the live source into video frames and audio frames;
a communication establishing module, used to establish an AI speech translation engine and establish communication between the AI speech translation engine and the transcoding engine;
a translation module, used by the AI speech translation engine to obtain the audio frames from the transcoding engine, translate them in real time, and output the translated content;
an encapsulation module, used by the transcoding engine to obtain the translated content from the AI speech translation engine in real time and encapsulate it with the original video and audio frames to output a live stream, either by burning the translated content into the video frames or by packaging it into subtitle frames.
An electronic device comprising a processor and a memory, wherein the memory stores at least one instruction, program, code set, or instruction set which, when loaded and executed by the processor, implements the above method for generating subtitles for a live stream in real time.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the above-mentioned method for generating subtitles in real time from a live stream.
Compared with the prior art, the invention has at least one of the following beneficial effects:
(1) The method for generating subtitles for a live stream in real time comprises the following steps: S1: acquiring a live source, starting a decapsulation and decoding thread in a transcoding engine, and decoding the live source to obtain video frames and audio frames; S2: establishing an AI speech translation engine and establishing communication between the AI speech translation engine and the transcoding engine; S3: the AI speech translation engine obtaining the audio frames from the transcoding engine, translating them in real time, and outputting the translated content; S4: the transcoding engine obtaining the translated content from the AI speech translation engine in real time and encapsulating it with the original video and audio frames to output a live stream, either by burning the translated content into the video frames or by packaging it into subtitle frames. With this technical scheme, subtitles are generated in real time within a single transcoding task, and audio, video, and subtitles are kept synchronized.
(2) The translated content, in one or more translation languages, is burned into the video frames or separately encapsulated into subtitle frames. With this technical scheme, subtitles in multiple languages are output simultaneously by a single transcoding task, and the player can selectively display them.
(3) The live stream is output by encapsulating the translated content together with the original video and audio frames, either by burning it into the video frames or by packaging it into subtitle frames. This technical scheme is compatible with any streaming media server, whether or not it supports pushing a plug-in subtitle stream.
Drawings
Fig. 1 is an overall flowchart of the method for generating subtitles for a live stream in real time according to the present invention;
Fig. 2 is a schematic diagram of the transcoding engine and the AI speech translation engine establishing WebSocket communication according to the present invention;
Fig. 3 is a flowchart of a WebSocket client sending an authentication request to a WebSocket server according to the present invention;
Fig. 4 is a flowchart of producing, from the live source, a live stream with subtitles burned into the video according to the present invention;
Fig. 5 is a flowchart of producing, from the live source, a live stream with plug-in subtitles according to the present invention;
Fig. 6 is an overall structural diagram of the system for generating subtitles for a live stream in real time according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
First embodiment
As shown in fig. 1, the present embodiment provides a method for generating subtitles in real time in a live stream, including the following steps:
S1: Acquire a live source, start a decapsulation and decoding thread in the transcoding engine, and decode the live source to obtain video frames and audio frames.
Specifically, once the live source is acquired in real time it immediately enters the transcoding engine, where the decapsulation and decoding thread first decapsulates the live source and then decodes it to obtain video frames and audio frames.
Further, after the video and audio frames are decoded, their timestamps must be corrected to keep them consistent and to prevent audio and video from drifting out of sync during subsequent playback, which would hurt the user experience. Timestamp correction keeps the video and audio timestamps aligned and makes the timestamps increase monotonically from frame to frame.
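One way such a correction can work (the offset-and-clamp scheme below is an assumption; the patent does not specify the algorithm) is to rebase all streams to a common origin and force each stream's timestamps to be strictly increasing:

```python
class TimestampCorrector:
    """Rebase decoded frames to a common origin and keep each stream's
    PTS strictly increasing (a hypothetical correction scheme)."""

    def __init__(self):
        self.origin = None   # first PTS seen across all streams
        self.last = {}       # per-stream last emitted PTS

    def correct(self, stream, pts):
        if self.origin is None:
            self.origin = pts        # align both streams to the same zero point
        pts -= self.origin
        prev = self.last.get(stream)
        if prev is not None and pts <= prev:
            pts = prev + 1           # repair stalled or backwards timestamps
        self.last[stream] = pts
        return pts

c = TimestampCorrector()
video = [c.correct("video", p) for p in (1000, 1040, 1040, 1120)]
audio = [c.correct("audio", p) for p in (1000, 1020, 1040)]
```

The duplicated video PTS 1040 is nudged forward by one millisecond, while the audio track keeps its original spacing relative to the shared origin.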
S2: Establish an AI speech translation engine and establish communication between the AI speech translation engine and the transcoding engine.
Specifically, to avoid interfering with the transcoding process and occupying transcoding resources, the invention establishes a separate AI speech translation engine dedicated to speech translation. To translate the audio stream, communication between the AI speech translation engine and the transcoding engine must first be established.
In this embodiment, communication between the AI speech translation engine and the transcoding engine is established through WebSocket, as shown in Fig. 2, which specifically comprises the following steps:
S211: Establish WebSocket server A and WebSocket client B on the transcoding engine.
S212: Establish WebSocket client C and WebSocket server D on the AI speech translation engine.
S213: WebSocket client C sends an authentication request to WebSocket server A; once authentication succeeds, the connection is established and the AI speech translation engine obtains the audio frames from the transcoding engine in real time over the WebSocket.
S214: WebSocket client B initiates an authentication request to WebSocket server D; once authentication succeeds, the connection is established and the transcoding engine obtains the translated content from the AI speech translation engine in real time over the WebSocket.
Further, after communication between the transcoding engine and the AI speech translation engine is established and before any data is transmitted, the WebSocket clients (client B and client C) must authenticate against the WebSocket servers (server A and server D), as shown in Fig. 3, which specifically comprises the following steps:
S221: The WebSocket client is preconfigured with an agreed key and encrypts the agreed key with the MD5 algorithm to obtain a first MD5 digest.
S222: The WebSocket client appends the first MD5 digest to the URL request as a parameter.
S223: After receiving the client's request, the WebSocket server parses out the parameter-free URL and the first MD5 digest.
S224: The WebSocket server encrypts the agreed key again with the MD5 algorithm to obtain a second MD5 digest.
S225: The WebSocket server compares the first MD5 digest with the second; if they are equal, authentication succeeds, otherwise authentication fails.
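Steps S221 to S225 can be sketched with only the Python standard library. The parameter name `auth`, the URL shape, and the shared key are assumptions; note also that a static MD5 digest of a fixed key is replayable, so a production design would typically mix in a nonce or timestamp:

```python
import hashlib
from urllib.parse import urlsplit, parse_qs, urlencode

AGREED_KEY = "example-shared-secret"   # hypothetical pre-shared key

def client_build_url(base_url, key=AGREED_KEY):
    """S221/S222: hash the agreed key and append the digest as a URL parameter."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return f"{base_url}?{urlencode({'auth': digest})}"

def server_authenticate(url, key=AGREED_KEY):
    """S223 to S225: parse out the digest, recompute it, and compare."""
    parts = urlsplit(url)
    first = parse_qs(parts.query).get("auth", [""])[0]
    second = hashlib.md5(key.encode()).hexdigest()
    return first == second

url = client_build_url("ws://transcoder.example:9000/audio")
ok = server_authenticate(url)
bad = server_authenticate("ws://transcoder.example:9000/audio?auth=wrong")
```

Carrying the digest in the handshake URL lets the server reject an unauthorized peer before any audio or subtitle data flows over the WebSocket.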
S3: The AI speech translation engine obtains the audio frames from the transcoding engine, translates them in real time, and outputs the translated content.
Specifically, the translated content can be produced in one or more languages and then burned into the video frames or separately packaged into subtitle frames. During subsequent playback of the live broadcast, any one or more subtitle tracks can be selected for synchronized display, meeting the needs of different audiences.
Furthermore, the AI speech translation engine attaches to the translated content the timestamp carried by the corresponding audio frame, so that after the translated content is encapsulated with the original video and audio frames, audio, video, and subtitles remain synchronized.
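The timestamp reuse described above amounts to copying the audio frame's PTS onto the subtitle cue. A minimal sketch, in which the cue structure and the fixed two-second display duration are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class AudioFrame:
    pts_ms: int      # presentation timestamp of the decoded audio
    samples: bytes

@dataclass
class SubtitleCue:
    start_ms: int    # inherited from the audio frame's PTS
    end_ms: int
    text: str

def make_cue(frame: AudioFrame, translated_text: str,
             duration_ms: int = 2000) -> SubtitleCue:
    """Attach the audio frame's timestamp to the translated content so the
    subtitle lines up with the sound it was transcribed from."""
    return SubtitleCue(start_ms=frame.pts_ms,
                       end_ms=frame.pts_ms + duration_ms,
                       text=translated_text)

cue = make_cue(AudioFrame(pts_ms=12_000, samples=b""), "Hello, world")
```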
S4: The transcoding engine obtains the translated content from the AI speech translation engine in real time and encapsulates it with the original video and audio frames to output a live stream, either by burning the translated content into the video frames or by packaging it into subtitle frames.
Second embodiment
As shown in fig. 4, when the translated content is burned into the video frames and encapsulated with the original video and audio frames to output a live stream, the method suits streaming media servers that do not support pushing a plug-in subtitle stream. The overall steps are as follows:
(1) The live source is decapsulated and decoded by the transcoding engine into video frames and audio frames, whose timestamps are corrected so that they stay aligned and increase monotonically.
(2) The transcoding engine feeds the audio frames into the AI speech translation engine, which translates them and outputs the translated content in real time. The translated content reuses the timestamp carried by its audio frame and is aligned with the video frame's timestamp when burned in, keeping audio, video, and subtitles synchronized.
(3) The translated content is overlaid onto the video frames by the transcoding engine's burn-in module; one or more translation languages can be selected for burning in.
(4) The transcoding engine encodes and encapsulates the subtitle-overlaid video frames together with the audio frames, and outputs them as a live stream with subtitles burned into the video.
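As one concrete way to realize the burn-in step (an assumption for illustration; the patent does not name a tool), each timed cue could be rendered with an FFmpeg drawtext filter gated by an enable='between(t,start,end)' expression. The sketch below only constructs the filter string:

```python
def drawtext_filter(cues):
    """Build an FFmpeg -vf filter chain that burns timed cues into the video.
    cues: list of (start_s, end_s, text). The escaping here is simplified."""
    parts = []
    for start, end, text in cues:
        safe = text.replace("'", r"\'").replace(":", r"\:")
        parts.append(
            f"drawtext=text='{safe}':x=(w-text_w)/2:y=h-80:"
            f"fontsize=36:fontcolor=white:"
            f"enable='between(t,{start},{end})'"
        )
    return ",".join(parts)

vf = drawtext_filter([(0.0, 2.0, "Hello"), (2.0, 4.0, "World")])
# the string would then be passed on an assumed command line such as:
#   ffmpeg -i <live_in> -vf "<vf>" ... <push_url>
```

Because the enable window uses the same timeline as the corrected audio and video timestamps, each cue appears exactly over the speech it translates.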
Third embodiment
As shown in fig. 5, when the translated content is packaged into subtitle frames and encapsulated with the original video and audio frames to output a live stream, the method suits streaming media servers that support pushing a plug-in subtitle stream, so that the subtitles to display can be freely selected. The overall steps are as follows:
(1) The live source is decapsulated and decoded by the transcoding engine into video frames and audio frames, whose timestamps are corrected so that they stay aligned and increase monotonically.
(2) The transcoding engine feeds the audio frames into the AI speech translation engine, which translates them and outputs the translated content in real time. The translated content reuses the timestamp carried by its audio frame, so the subtitle frames it is packaged into are aligned with the corresponding audio and video frames, keeping audio, video, and subtitles synchronized.
(3) The translated content is packaged into subtitle frames by the transcoding engine; one or more translation languages can be selected, producing subtitle frames in one or more languages.
(4) The transcoding engine encodes and encapsulates the video frames, audio frames, and subtitle frames, and outputs them as a live stream with plug-in subtitles.
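For push targets that do carry separate subtitle tracks, packaging the translated content means serializing timed cues per language. The WebVTT format below is an assumed example (the patent does not name a subtitle format); the serializer reuses the audio-derived timestamps from step (2):

```python
def fmt_ts(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp HH:MM:SS.mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def to_webvtt(cues):
    """cues: list of (start_ms, end_ms, text) sharing the audio frames' timestamps."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{fmt_ts(start)} --> {fmt_ts(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)

track = to_webvtt([(0, 2000, "Hello"), (2000, 4000, "World")])
```

One such track would be generated per selected translation language, letting the player switch between plug-in subtitle streams freely.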
Fourth embodiment
As shown in fig. 6, this embodiment provides a system corresponding to the method of the first embodiment for generating subtitles for a live stream in real time, comprising:
a live stream decoding module 1, used to acquire a live source, start a decapsulation and decoding thread in the transcoding engine, and decode the live source into video frames and audio frames;
a communication establishing module 2, used to establish an AI speech translation engine and establish communication between the AI speech translation engine and the transcoding engine;
a translation module 3, used by the AI speech translation engine to obtain the audio frames from the transcoding engine, translate them in real time, and output the translated content;
an encapsulation module 4, used by the transcoding engine to obtain the translated content from the AI speech translation engine in real time and encapsulate it with the original video and audio frames to output a live stream, either by burning the translated content into the video frames or by packaging it into subtitle frames.
Other specific execution steps are the same as in the first embodiment and are not repeated here.
The present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a method for live streaming real-time subtitle generation as in the first embodiment.
The above description is only a preferred embodiment of the present invention, and the protection scope of the invention is not limited to the above embodiments; all technical solutions within the inventive concept fall within its protection scope. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the invention are also considered within its protection scope.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations are described, but any combination that contains no contradiction should be considered within the scope of this specification.
It should be noted that the above embodiments can be freely combined as necessary.

Claims (10)

1. A method for generating subtitles for a live stream in real time, characterized by comprising the following steps:
S1: acquiring a live broadcast source, starting a decapsulation and decoding thread through a transcoding engine, and decoding the live broadcast source to obtain video frames and audio frames;
S2: establishing an AI speech translation engine and establishing communication between the AI speech translation engine and the transcoding engine;
S3: the AI speech translation engine acquiring the audio frames from the transcoding engine, translating them in real time, and outputting translated content;
S4: the transcoding engine acquiring the translated content from the AI speech translation engine in real time and encapsulating it together with the original video frames and audio frames to output a live stream, in either of two modes: burning the translated content into the video frames, or filling and packaging the translated content into subtitle frames.
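Steps S1–S4 above can be sketched as a minimal pipeline. This is an illustrative sketch only, not the claimed implementation; the `Frame` class, `transcode_pipeline` function, and the toy translation callback are all hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    kind: str        # "audio", "video", or "subtitle"
    pts: int         # presentation timestamp
    payload: bytes

def transcode_pipeline(source_frames, translate):
    """Hypothetical sketch of claim 1: pass decoded frames through (S1),
    send audio frames to a translation engine (S3), and re-wrap the
    translated text as subtitle frames alongside the originals (S4)."""
    out = []
    for frame in source_frames:
        out.append(frame)                    # original video/audio kept
        if frame.kind == "audio":
            text = translate(frame.payload)  # S3: real-time translation
            # S4: pack the translation as a subtitle frame with the same pts
            out.append(Frame("subtitle", frame.pts, text.encode()))
    return out

frames = [Frame("video", 0, b"v0"), Frame("audio", 0, b"hello"),
          Frame("video", 40, b"v1"), Frame("audio", 40, b"world")]
# A trivial stand-in for the AI speech translation engine:
result = transcode_pipeline(frames, lambda pcm: pcm.decode().upper())
```

In a real system the translation step would be an asynchronous round trip to the engine over websocket rather than a synchronous callback.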
2. The method for generating subtitles for a live stream in real time according to claim 1, further comprising:
establishing the communication between the AI speech translation engine and the transcoding engine through websocket, specifically:
establishing a websocket server A and a websocket client B on the transcoding engine;
establishing a websocket client C and a websocket server D on the AI speech translation engine;
the websocket client C initiating an authentication request to the websocket server A; once authentication succeeds, the connection is established, and the AI speech translation engine acquires the audio frames from the transcoding engine in real time through websocket communication;
the websocket client B initiating an authentication request to the websocket server D; once authentication succeeds, the connection is established, and the transcoding engine acquires the translated content from the AI speech translation engine in real time through websocket communication.
3. The method for generating subtitles for a live stream in real time according to claim 2, wherein the authentication request initiated by a websocket client (the websocket client B or the websocket client C) to a websocket server (the websocket server A or the websocket server D) specifically comprises the following steps:
the websocket client is preconfigured with an agreed key, and encrypts the agreed key with the MD5 algorithm to obtain a first MD5 encryption key;
the websocket client appends the first MD5 encryption key to the URL request as a parameter;
after receiving the request from the websocket client, the websocket server parses out the parameter-free URL and the first MD5 encryption key;
the websocket server encrypts the agreed key with the MD5 algorithm to obtain a second MD5 encryption key;
the websocket server compares the first MD5 encryption key with the second MD5 encryption key; if they are equal, authentication succeeds, otherwise authentication fails.
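The handshake of claim 3 can be sketched with the standard library. The key value, URL, and function names here are hypothetical; MD5 is used only because the claim specifies it (it is not a recommended choice for new designs), and a real deployment would carry this over an actual websocket handshake:

```python
import hashlib
from urllib.parse import urlencode, urlparse, parse_qs

AGREED_KEY = "shared-secret"   # hypothetical pre-agreed key, known to both sides

def client_request(base_url: str) -> str:
    """Client side: append the MD5 digest of the agreed key as a URL parameter."""
    token = hashlib.md5(AGREED_KEY.encode()).hexdigest()   # first MD5 encryption key
    return base_url + "?" + urlencode({"token": token})

def server_authenticate(request_url: str) -> bool:
    """Server side: parse the token out of the request URL, recompute the
    MD5 digest of the agreed key, and compare the two digests."""
    received = parse_qs(urlparse(request_url).query).get("token", [""])[0]
    expected = hashlib.md5(AGREED_KEY.encode()).hexdigest()  # second MD5 encryption key
    return received == expected

url = client_request("ws://transcoder.local/ws")
ok = server_authenticate(url)        # succeeds: both digests match
```

The same check runs in both directions, since each engine hosts one server and one client.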
4. The method for generating subtitles for a live stream in real time according to claim 1, wherein in step S4, outputting the live stream by burning the translated content into the video frames before encapsulation with the original audio frames is suitable for streaming media servers that do not support pushing external subtitle streams.
5. The method for generating subtitles for a live stream in real time according to claim 1, wherein in step S4, outputting the live stream by filling and packaging the translated content into subtitle frames before encapsulation with the original video frames and audio frames allows free choice of any streaming media server capable of displaying subtitles from a pushed external subtitle stream.
6. The method for generating subtitles for a live stream in real time according to claim 1, further comprising:
burning the translated content into the video frames, or encapsulating it into independent subtitle frames, in one or more translation languages.
7. The method for generating subtitles for a live stream in real time according to claim 1, wherein step S1 further comprises:
the transcoding engine performing timestamp correction on the decoded video frames and audio frames, ensuring that the timestamps are aligned and continuously increasing.
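The timestamp correction of claim 7 can be sketched as follows. This is a minimal illustration under the assumption that frames arrive as (stream id, raw timestamp) pairs; the function name and rebasing policy are hypothetical:

```python
def correct_timestamps(frames):
    """Hypothetical sketch of claim 7: rebase each stream's timestamps to a
    common zero (alignment) and force them to be strictly increasing
    (continuity), so audio and video stay in step after decoding."""
    first_pts = {}   # per-stream timestamp of the first frame seen
    last_pts = {}    # per-stream last emitted timestamp
    corrected = []
    for stream, pts in frames:
        base = first_pts.setdefault(stream, pts)
        new_pts = pts - base                 # align streams to a common zero
        if stream in last_pts and new_pts <= last_pts[stream]:
            new_pts = last_pts[stream] + 1   # force continuous increase
        last_pts[stream] = new_pts
        corrected.append((stream, new_pts))
    return corrected

# A duplicated audio timestamp (940 twice) gets bumped to stay increasing:
raw = [("audio", 900), ("video", 900), ("audio", 940), ("audio", 940), ("video", 940)]
fixed = correct_timestamps(raw)
```

A production transcoder would work in the container's time base and also handle wraparound, but the alignment-plus-monotonicity invariant is the same.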
8. The method for generating subtitles for a live stream in real time according to claim 7, wherein step S3 further comprises:
the AI speech translation engine multiplexing the translated content with the timestamp carried by the corresponding audio frame, so that after the translated content is encapsulated with the original video frames and audio frames, audio, picture, and subtitles remain synchronized.
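The multiplexing of claim 8 amounts to reusing the source audio frame's timestamp on the translated text. A minimal sketch, with hypothetical names and a list-of-tuples frame representation:

```python
def attach_translation(audio_frames, translations):
    """Hypothetical sketch of claim 8: each piece of translated text is
    paired with the timestamp of the audio frame it was produced from,
    so subtitles line up with audio and picture after encapsulation."""
    subtitle_frames = []
    for (pts, _pcm), text in zip(audio_frames, translations):
        subtitle_frames.append((pts, text))   # reuse the audio frame's pts
    return subtitle_frames

audio = [(0, b"a0"), (40, b"a1"), (80, b"a2")]
subs = attach_translation(audio, ["hello", "world", "again"])
```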
9. A system for implementing the method for generating subtitles for a live stream in real time according to any one of claims 1 to 8, comprising:
a live stream decoding module, used for acquiring a live broadcast source, starting a decapsulation and decoding thread through a transcoding engine, and decoding the live broadcast source to obtain video frames and audio frames;
a communication establishing module, used for establishing an AI speech translation engine and establishing communication between the AI speech translation engine and the transcoding engine;
a translation module, used for the AI speech translation engine to acquire the audio frames from the transcoding engine, translate them in real time, and output translated content;
an encapsulation module, used for the transcoding engine to acquire the translated content from the AI speech translation engine in real time and encapsulate it together with the original video frames and audio frames to output a live stream, in either of two modes: burning the translated content into the video frames, or filling and packaging the translated content into subtitle frames.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method for generating subtitles for a live stream in real time according to any one of claims 1 to 8.
CN202011072549.3A 2020-10-09 2020-10-09 Method and system for real-time subtitle generation of live stream Pending CN112188241A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011072549.3A CN112188241A (en) 2020-10-09 2020-10-09 Method and system for real-time subtitle generation of live stream


Publications (1)

Publication Number Publication Date
CN112188241A true CN112188241A (en) 2021-01-05

Family

ID=73948261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011072549.3A Pending CN112188241A (en) 2020-10-09 2020-10-09 Method and system for real-time subtitle generation of live stream

Country Status (1)

Country Link
CN (1) CN112188241A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2106121A1 (en) * 2008-03-27 2009-09-30 Mundovision MGI 2000, S.A. Subtitle generation methods for live programming
CN103024503A (en) * 2012-07-05 2013-04-03 合一网络技术(北京)有限公司 System and method for achieving remote control through mobile communication equipment terminal
JP2015212731A (en) * 2014-05-01 2015-11-26 日本放送協会 Acoustic event recognition device and program
CN108063970A (en) * 2017-11-22 2018-05-22 北京奇艺世纪科技有限公司 A kind of method and apparatus for handling live TV stream
CN108401192A (en) * 2018-04-25 2018-08-14 腾讯科技(深圳)有限公司 Video stream processing method, device, computer equipment and storage medium
CN110381389A (en) * 2018-11-14 2019-10-25 腾讯科技(深圳)有限公司 A kind of method for generating captions and device based on artificial intelligence
CN111010614A (en) * 2019-12-26 2020-04-14 北京奇艺世纪科技有限公司 Method, device, server and medium for displaying live caption


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114040220A (en) * 2021-11-25 2022-02-11 京东科技信息技术有限公司 Live broadcasting method and device
WO2023093322A1 (en) * 2021-11-25 2023-06-01 京东科技信息技术有限公司 Live broadcast method and device
WO2023219556A1 (en) * 2022-05-13 2023-11-16 Song Peng A system and method to manage a plurality of language audio streams

Similar Documents

Publication Publication Date Title
WO2019205872A1 (en) Video stream processing method and apparatus, computer device and storage medium
CN105959772B (en) Streaming Media and the instant simultaneous display of subtitle, matched processing method, apparatus and system
JP5543504B2 (en) 3D still image service method and apparatus based on digital broadcasting
JP5903924B2 (en) Receiving apparatus and subtitle processing method
US9584837B2 (en) Receiving device and method of controlling the same, distribution device and distribution method, program, and distribution system
US20020154691A1 (en) System and process for compression, multiplexing, and real-time low-latency playback of networked audio/video bit streams
CN110708564B (en) Live transcoding method and system for dynamically switching video streams
US9860574B2 (en) Method and apparatus for transceiving broadcast signal
KR960032442A (en) Encoding / Decoding System of Image Information
CN106331853B (en) Multimedia de-encapsulation method and device
US20140208351A1 (en) Video processing apparatus, method and server
CN112188241A (en) Method and system for real-time subtitle generation of live stream
US20180109743A1 (en) Broadcasting signal transmission device, broadcasting signal reception device, broadcasting signal transmission method, and broadcasting signal reception method
CN110784730A (en) Live video data transmission method, device, equipment and storage medium
CN114040255A (en) Live caption generating method, system, equipment and storage medium
KR20130138213A (en) Methods for processing multimedia flows and corresponding devices
US7216288B2 (en) Dynamic scene description emulation for playback of audio/visual streams on a scene description based playback system
KR102518817B1 (en) Broadcast signal transmission apparatus, broadcast signal reception apparatus, broadcast signal transmission method, and broadcast signal reception method
CN113938470A (en) Method and device for playing RTSP data source by browser and streaming media server
WO2017092433A1 (en) Method and device for video real-time playback
CN116233490A (en) Video synthesis method, system, device, electronic equipment and storage medium
JP6455974B2 (en) Receiving machine
CN112055253B (en) Method and device for adding and multiplexing independent subtitle stream
JP7125692B2 (en) Broadcast service communication network distribution apparatus and method
JP4755717B2 (en) Broadcast receiving terminal device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210105