CN114040220A - Live broadcasting method and device - Google Patents

Live broadcasting method and device

Info

Publication number
CN114040220A
CN114040220A (application CN202111414398.XA)
Authority
CN
China
Prior art keywords
live
stream
source
subtitle data
live stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111414398.XA
Other languages
Chinese (zh)
Inventor
商红宾 (Shang Hongbin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd
Priority to CN202111414398.XA
Publication of CN114040220A
Priority to PCT/CN2022/124310 (WO2023093322A1)
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440236 Processing of video elementary streams involving reformatting operations by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services communicating with other users, e.g. chatting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/485 End-user interface for client configuration
    • H04N21/4856 End-user interface for client configuration for language selection, e.g. for the menu or subtitles

Abstract

The application discloses a live broadcasting method and device, and relates to the technical field of live broadcasting. In one embodiment, the method comprises: in response to detecting relevant information of a source live stream reported by a source station, translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data; encoding the subtitle data into the source live stream to obtain a target live stream; transcoding the target live stream in the source station to obtain a transcoded live stream; and performing live broadcast based on the transcoded live stream. This effectively improves the efficiency of generating live subtitles.

Description

Live broadcasting method and device
Technical Field
The application relates to the field of computer technology, in particular to the field of live broadcasting, and specifically provides a live broadcasting method and device.
Background
With the development of live streaming, the industry has made a qualitative leap: high-definition image quality, low latency, audio-video synchronization, and similar problems have been optimized to a great extent, yet user requirements are still not fully met.
In some scenarios, such as sporting events, large conference reports, and online education and training, live broadcasts need to be translated in real time and multi-language subtitles need to be added. The prior art implements live real-time translation and multi-language subtitles through offline manual simultaneous interpretation and subtitle machine equipment.
The above approach mainly has the following problems:
1. High cost: each live broadcast requires professional simultaneous interpreters and related hardware equipment;
2. Low efficiency: constrained by manpower and equipment, the approach is inefficient and cannot be used at scale.
Disclosure of Invention
The embodiment of the application provides a live broadcast method, a live broadcast device, live broadcast equipment and a storage medium.
According to a first aspect, an embodiment of the present application provides a live broadcasting method, including: in response to detecting relevant information of a source live stream reported by a source station, translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data; encoding the subtitle data into the source live stream to obtain a target live stream; transcoding the target live stream to obtain a transcoded live stream; and performing live broadcast based on the transcoded live stream.
In some embodiments, translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data includes: establishing communication between an AI speech translation engine and the live translation process; and translating the audio data in the source live stream in the source station based on the communication between the AI speech translation engine and the live translation process to obtain the subtitle data.
In some embodiments, establishing communication between the AI speech translation engine and the live translation process includes: establishing the communication through a WebSocket.
In some embodiments, encoding the subtitle data into the source live stream to obtain a target live stream includes: in response to obtaining the subtitle data, performing timestamp alignment between the subtitle data and the source live stream to obtain aligned subtitle data and source live stream; and merging and encoding the aligned subtitle data and source live stream to obtain the target live stream.
In some embodiments, the method further comprises: in response to detecting the relevant information of the source live stream reported by the source station, locally caching the source live stream based on the live translation process.
According to a second aspect, an embodiment of the present application provides a live broadcast apparatus, including: a translation module configured to, in response to detecting relevant information of a source live stream reported by a source station, translate audio data in the source live stream in the source station based on a live translation process to obtain subtitle data; an encoding module configured to encode the subtitle data into the source live stream to obtain a target live stream; a transcoding module configured to transcode the target live stream to obtain a transcoded live stream; and a live broadcast module configured to perform live broadcast based on the transcoded live stream.
In some embodiments, the translation module comprises: an establishing unit configured to establish communication between the AI speech translation engine and the live translation process; and the communication unit is configured to translate audio data in a source live stream in the source station based on the communication between the AI speech translation engine and the live translation process to obtain subtitle data.
In some embodiments, the establishing unit is further configured to establish communication between the AI speech translation engine and the live translation process through a WebSocket.
In some embodiments, the encoding module comprises: the alignment unit is configured to perform timestamp alignment on the subtitle data and the source live stream in response to the acquired subtitle data to obtain aligned subtitle data and source live stream; and the merging unit is configured to merge and encode the aligned subtitle data and the source live stream to obtain a target live stream.
In some embodiments, the apparatus further comprises: and the cache module is configured to respond to the detection of the related information of the source live stream reported by the source station and locally cache the source live stream based on the live translation process.
According to a third aspect, an embodiment of the present application provides an electronic device, which includes one or more processors and a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors implement the live broadcasting method as in any embodiment of the first aspect.
According to a fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the live broadcasting method as in any embodiment of the first aspect.
In response to detecting relevant information of a source live stream reported by a source station, the method translates audio data in the source live stream in the source station based on a live translation process to obtain subtitle data; encodes the subtitle data into the source live stream to obtain a target live stream; transcodes the target live stream to obtain a transcoded live stream; and performs live broadcast based on the transcoded live stream. This solves the high cost and low efficiency of generating translated subtitles in the prior art, improves the efficiency of generating translated subtitles for live streams in real time, avoids the resource consumption incurred when translated subtitles are generated during transcoding, and effectively reduces resource usage.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
FIG. 1 is an exemplary system architecture diagram to which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a live method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a live method according to the present application;
FIG. 4 is a flow diagram of another embodiment of a live method according to the present application;
FIG. 5 is a schematic diagram of one embodiment of a live device according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a server according to embodiments of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the live method of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. Various communication client applications, such as a live application, a communication application, and the like, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen, including but not limited to mobile phones and notebook computers. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above and implemented as multiple software programs or software modules (e.g., to provide live broadcast services) or as a single software program or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, for example, a server that, in response to detecting relevant information of a source live stream reported by a source station, translates audio data in the source live stream in the source station based on a live translation process to obtain subtitle data; encodes the subtitle data into the source live stream to obtain a target live stream; transcodes the target live stream to obtain a transcoded live stream; and performs live broadcast based on the transcoded live stream.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server 105 is software, it may be implemented as multiple software programs or software modules (for example, to provide a live broadcast service) or as a single software program or software module. No specific limitation is imposed here.
It should be noted that the live broadcast method provided by the embodiments of the present disclosure may be executed by the server 105, by the terminal devices 101, 102, and 103, or by the server 105 and the terminal devices 101, 102, and 103 in cooperation with each other. Accordingly, the parts (for example, units, sub-units, modules, sub-modules) of the live broadcast apparatus may all be provided in the server 105, may all be provided in the terminal devices 101, 102, and 103, or may be distributed between the server 105 and the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 shows a flow 200 of one embodiment of a live broadcasting method according to the present application. In this embodiment, the live broadcasting method includes the following steps:
Step 201, in response to detecting relevant information of a source live stream reported by a source station, translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data.
In this embodiment, an execution body (for example, the server 105 or the terminal devices 101, 102, and 103 shown in fig. 1) may detect, in real time or at preset time intervals, whether a source station has reported relevant information of a source live stream. If such information is detected, the execution body may issue a task to a preset live translation process to control it to pull the audio data of the source live stream in the source station and translate it to obtain subtitle data.
The relevant information of the source live stream may include the stream name, stream state, stream address, and the like of the source live stream, for example, vhost, app, name, ip, port, and so on.
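For illustration only, the reported stream information and the detection step might be sketched as follows; the field names mirror the examples above (vhost, app, name, ip, port), while the callback and dispatch function names are hypothetical and not interfaces defined by this application:

```python
from dataclasses import dataclass

@dataclass
class StreamInfo:
    vhost: str  # virtual host of the source live stream
    app: str    # application name, e.g. "live"
    name: str   # stream name
    ip: str     # source station address
    port: int   # source station port

def on_stream_published(info: StreamInfo) -> None:
    """Hypothetical callback fired when the source station reports a source live stream."""
    source_url = f"rtmp://{info.ip}:{info.port}/{info.app}/{info.name}"
    dispatch_translation_task(source_url)

def dispatch_translation_task(source_url: str) -> None:
    """Issue a task to the preset live translation process (implementation-specific)."""
    ...
```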
Here, the execution body may simultaneously obtain the video stream and the AAC (Advanced Audio Coding) audio stream of the source live stream, transcode the AAC audio stream into an Opus audio stream, and translate the Opus audio stream to obtain subtitle data.
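As a minimal sketch of this step, assuming ffmpeg is used for the audio transcode (ffmpeg is introduced later in this description) and with placeholder stream URLs:

```python
import subprocess

def start_audio_extraction(source_url: str, audio_out: str) -> subprocess.Popen:
    """Pull the source live stream, drop video, and transcode AAC audio to Opus."""
    cmd = [
        "ffmpeg",
        "-i", source_url,   # e.g. rtmp://source-station/live/<stream-name> (placeholder)
        "-vn",              # audio only: discard the video track
        "-c:a", "libopus",  # AAC -> Opus transcode for the speech translation engine
        "-f", "ogg",        # container for the Opus stream
        audio_out,          # placeholder output, e.g. a unix socket consumed downstream
    ]
    return subprocess.Popen(cmd)
```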
Here, translating refers to translating the content corresponding to the audio stream into one or more languages.
It should be noted that the execution body may directly control the live translation process to translate the audio data, or may create an AI (Artificial Intelligence) speech translation engine that communicates with the live translation process to translate the audio data and obtain translated subtitles; this application does not limit which approach is used.
In some optional implementations, the method further comprises: in response to detecting the relevant information of the source live stream reported by the source station, locally caching the source live stream based on the live translation process.
In this implementation, if the execution body detects relevant information of a source live stream reported by the source station, it may issue tasks to the preset live translation process to control the process to acquire and translate the audio data of the source live stream in the source station, and to locally cache the acquired source live stream.
Specifically, if the execution body detects the relevant information of the source live stream reported by the source station, it may issue two ffmpeg tasks to the preset live translation process: one instructing the live translation process to acquire the audio data of the source live stream in the source station for translation, and the other instructing it to locally cache the acquired source live stream, so as to keep the source live stream synchronized with the obtained subtitle data.
FFmpeg is an open-source suite of computer programs for recording and converting digital audio and video, and for turning them into streams.
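A companion sketch of the second task, caching the pulled source live stream locally without re-encoding; the paths and container choice are assumptions, not values from this application:

```python
import subprocess

def start_local_cache(source_url: str, cache_path: str) -> subprocess.Popen:
    """Locally cache the source live stream so it can later be merged,
    in sync, with the subtitle data produced by the translation task."""
    cmd = [
        "ffmpeg",
        "-i", source_url,  # same source live stream as the translation task
        "-c", "copy",      # stream copy: no re-encoding, minimal overhead
        "-f", "mpegts",    # a container suited to appending live segments
        cache_path,        # placeholder, e.g. /var/cache/live/<stream-name>.ts
    ]
    return subprocess.Popen(cmd)
```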
In this implementation, in response to the relevant information of the source live stream reported by the source station, the source live stream is locally cached based on a live translation process, the audio data in the source live stream is translated to obtain subtitle data, the subtitle data is encoded into the source live stream to obtain a target live stream, and transcoded live broadcast is performed based on the target live stream, which effectively improves the real-time performance of generating subtitles for the live stream.
Step 202, encoding the subtitle data into a source live stream to obtain a target live stream.
In this embodiment, after acquiring the subtitle data, the execution body may directly encode the subtitle data into the corresponding source live stream, or may first align the timestamps of the subtitle data with those of the source live stream and then encode them to obtain the target live stream, which is pushed to the source station.
In some optional implementations, encoding the subtitle data into the source live stream to obtain a target live stream includes: in response to obtaining the subtitle data, performing timestamp alignment between the subtitle data and the source live stream to obtain aligned subtitle data and source live stream; and merging and encoding the aligned subtitle data and source live stream to obtain the target live stream.
In this implementation, the execution body may monitor for subtitle data in real time and, in response to obtaining subtitle data, perform timestamp alignment between the subtitle data and the video frames and audio frames in the source live stream to obtain the aligned subtitle data and source live stream. Here, the subtitle data may reuse the timestamps carried by the audio frames.
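A simplified sketch of this alignment, under the assumption that each subtitle item carries a rough timestamp from the translation engine and is snapped to the nearest audio-frame timestamp; the data structures are illustrative, not this application's actual types:

```python
from bisect import bisect_left
from dataclasses import dataclass

@dataclass
class Subtitle:
    text: str
    pts: float  # presentation timestamp in seconds

def align_subtitles(subtitles: list[Subtitle], audio_pts: list[float]) -> list[Subtitle]:
    """Snap each subtitle's timestamp to the nearest audio-frame timestamp,
    so subtitles reuse the timestamps carried by the audio frames."""
    aligned = []
    for sub in subtitles:
        i = bisect_left(audio_pts, sub.pts)          # audio_pts must be sorted
        candidates = audio_pts[max(i - 1, 0):i + 1]  # neighbors around the insertion point
        nearest = min(candidates, key=lambda t: abs(t - sub.pts))
        aligned.append(Subtitle(sub.text, nearest))
    return aligned
```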
Further, the execution body merges and encodes the aligned subtitle data and source live stream to obtain the target live stream, that is, a video stream with subtitles.
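One way to realize this merge-and-encode step is to burn the aligned subtitles into the cached source live stream with ffmpeg's subtitles filter; this is an assumption about the encoding scheme, and the file paths and push URL are placeholders:

```python
import subprocess

def encode_target_stream(cached_stream: str, subtitle_file: str, push_url: str) -> subprocess.Popen:
    """Merge the aligned subtitles into the source live stream, producing the
    target live stream (a video stream with subtitles) pushed to the source station."""
    cmd = [
        "ffmpeg",
        "-re", "-i", cached_stream,           # read the locally cached source live stream
        "-vf", f"subtitles={subtitle_file}",  # render the aligned subtitles onto the frames
        "-c:v", "libx264",                    # re-encode video with the burned-in subtitles
        "-c:a", "copy",                       # audio passes through unchanged
        "-f", "flv", push_url,                # push the target live stream to the source station
    ]
    return subprocess.Popen(cmd)
```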
In this implementation, timestamp alignment between the subtitle data and the source live stream is performed in response to obtaining the subtitle data, and the aligned subtitle data and source live stream are merged and encoded into the target live stream, thereby ensuring that audio, picture, and subtitles stay synchronized.
Step 203, transcoding the target live stream to obtain a transcoded live stream for live broadcasting.
In this embodiment, after detecting that the target live stream has been generated and reported by the source station, the execution body may issue a task to a preset live transcoding process to control it to pull the target live stream from the source station and transcode it, obtaining a transcoded live stream.
Here, transcoding means converting the live stream in the source station, in the cloud, into transcoded streams with different encoding formats, resolutions, and bitrates, which are pushed to viewers to meet the playback requirements of various scenarios, such as different network environments and different terminal devices.
Specifically, the execution body may control the live transcoding process to convert the target live stream into transcoded streams of different definitions, so that users can select video streams of different bitrates according to their own network conditions, thereby ensuring playback fluency.
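An illustrative sketch of such a multi-definition transcode; the rendition ladder, encoder settings, and push URLs are assumptions, not values from this application:

```python
import subprocess

# Assumed rendition ladder: (suffix, resolution, video bitrate)
RENDITIONS = [
    ("1080p", "1920x1080", "4000k"),
    ("720p", "1280x720", "2500k"),
    ("480p", "854x480", "1200k"),
]

def transcode_renditions(target_url: str, push_base: str) -> list[subprocess.Popen]:
    """Convert the target live stream into transcoded streams of different
    definitions so viewers can pick a bitrate matching their network."""
    procs = []
    for suffix, size, bitrate in RENDITIONS:
        procs.append(subprocess.Popen([
            "ffmpeg", "-i", target_url,
            "-c:v", "libx264", "-s", size, "-b:v", bitrate,
            "-c:a", "aac",
            "-f", "flv", f"{push_base}_{suffix}",  # e.g. rtmp://source-station/live/stream_720p
        ]))
    return procs
```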
Step 204, performing live broadcast based on the transcoded live stream.
In this embodiment, after generating the transcoded live stream, the execution body may push it to the source station. The source station may then push the transcoded live stream to a CDN (Content Delivery Network) for end users to access.
A CDN is an intelligent virtual network built on top of the existing network. Relying on edge servers deployed in various locations and on the load balancing, content distribution, and scheduling modules of a central platform, it enables users to obtain the required content nearby, reducing network congestion and improving users' access response speed and hit rate.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the live method according to the present embodiment.
In the application scenario of fig. 3, the execution body 301 may detect, in real time or at preset intervals, whether a source station 302 (for example, a tp source station) has reported relevant information of a source live stream. If such information is detected, the execution body may issue a task to the live translation process 303 to control it to obtain the audio data 304 of the source live stream in the source station 302 and translate it to obtain subtitle data 305. The subtitle data 305 is encoded into the source live stream 306 to obtain a target live stream 307, which is pushed to the source station 302. In response to detecting that the target live stream has been pushed to the source station 302, the execution body may issue a task to the live transcoding process 308 to control it to pull the target live stream 307 from the source station 302 and perform transcoding 309, obtaining a transcoded live stream 310 (for example, a multi-definition live stream with subtitles), which is pushed to the source station 302 for live streaming.
According to the live broadcasting method provided by this embodiment, in response to detecting relevant information of a source live stream reported by a source station, audio data in the source live stream in the source station is translated based on a live translation process to obtain subtitle data; the subtitle data is encoded into the source live stream to obtain a target live stream; the target live stream is transcoded to obtain a transcoded live stream; and live broadcast is performed based on the transcoded live stream, effectively improving the efficiency of generating live-stream subtitles in real time.
With further reference to fig. 4, a flow 400 of another embodiment of a live broadcasting method is shown. The flow 400 of the live broadcasting method of this embodiment may include the following steps:
Step 401, in response to detecting relevant information of a source live stream reported by a source station, establishing communication between an AI speech translation engine and a live translation process.
In this embodiment, the execution body may detect, in real time or at preset time intervals, whether a source station has reported relevant information of a source live stream, and in response to detecting such information, establish communication between the AI speech translation engine and the live translation process.
Here, the execution body may establish communication between the AI speech translation engine and the live translation process using any existing or future communication protocol, such as HTTP or WebSocket.
In some optional implementations, establishing communication between the AI speech translation engine and the live translation process includes: establishing the communication through a WebSocket.
In this implementation, the execution body may establish communication between the AI speech translation engine and the live translation process using the WebSocket protocol.
WebSocket is a technology for arbitrary bidirectional data transmission between an application and a server. The WebSocket protocol is implemented on top of TCP (Transmission Control Protocol) and consists of an initial handshake followed by bidirectional transmission of data frames. Its purpose is to let the server avoid opening multiple HTTP (Hypertext Transfer Protocol) connections when a WebSocket application and a WebSocket server communicate frequently in both directions, thereby saving resources and improving working efficiency and resource utilization.
Specifically, the operation of the live translation process, controlled by the execution body, to translate the audio data in the source live stream may include: obtaining the AAC audio stream of the source live stream; transcoding it into an Opus audio stream and writing the Opus stream into a unix socket; reading the audio stream from the unix socket and sending it to the AI translation engine; and, meanwhile, establishing a WebSocket link with the AI translation engine, monitoring the translation task in real time, and obtaining the translation output.
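A rough asyncio sketch of this flow, assuming the third-party `websockets` package; the engine URL, unix-socket path, and message handling are assumptions rather than this application's actual interfaces:

```python
import asyncio
import websockets

ENGINE_WS_URL = "ws://ai-translation-engine/translate"  # hypothetical engine endpoint
AUDIO_SOCKET = "/tmp/live_opus.sock"                    # hypothetical unix socket written by ffmpeg

async def forward_audio(ws, reader: asyncio.StreamReader) -> None:
    """Read the Opus audio stream from the unix socket and send it to the engine."""
    while chunk := await reader.read(4096):
        await ws.send(chunk)

async def collect_subtitles(ws) -> None:
    """Receive translation results pushed back over the WebSocket link."""
    async for message in ws:
        handle_subtitle(message)  # e.g. queue the result for timestamp alignment

def handle_subtitle(message) -> None:
    ...  # hand off to the subtitle-encoding step

async def run_translation_session() -> None:
    async def on_connect(reader, writer):
        # For each audio producer, open a WebSocket link to the AI translation
        # engine and run the send/receive loops concurrently.
        async with websockets.connect(ENGINE_WS_URL) as ws:
            await asyncio.gather(forward_audio(ws, reader), collect_subtitles(ws))
    server = await asyncio.start_unix_server(on_connect, path=AUDIO_SOCKET)
    async with server:
        await server.serve_forever()
```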
In this implementation, communication between the AI speech translation engine and the live translation process is established through a WebSocket, the audio data in the source live stream in the source station is translated based on this communication to obtain subtitle data, and live broadcast is performed according to the subtitle data and the source live stream, effectively improving the efficiency of generating subtitles for live streams.
Step 402, translating the audio data in the source live stream based on the communication between the AI speech translation engine and the live translation process to obtain subtitle data.
In this embodiment, after establishing communication between the AI speech translation engine and the live translation process, the execution body may translate the audio data in the source live stream through this communication and obtain subtitle data from the translation result.
Step 403, encoding the subtitle data into the source live stream to obtain a target live stream.
In this embodiment, for the implementation details and technical effects of step 403, reference may be made to the description of step 202, which is not repeated here.
Step 404, transcoding the target live stream to obtain a transcoded live stream.
In this embodiment, for the implementation details and technical effects of step 404, reference may be made to the description of step 203, which is not repeated here.
Step 405, performing live broadcast based on the transcoded live stream.
In this embodiment, for the implementation details and technical effects of step 405, reference may be made to the description of step 204, which is not repeated here.
Compared with the embodiment corresponding to fig. 2, the flow 400 of the live broadcasting method in this embodiment highlights that, in response to detecting relevant information of a source live stream reported by a source station, communication between an AI speech translation engine and a live translation process is established; audio data in the source live stream in the source station is translated based on this communication to obtain subtitle data; a target live stream is obtained from the subtitle data and the source live stream; and transcoded live broadcast is performed based on the target live stream, so that the resources of the live translation process are not occupied, improving the validity and reliability of the obtained subtitle data and, in turn, of the resulting live stream with subtitles.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of a live broadcast apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 5, the live broadcasting apparatus 500 of the present embodiment includes: translation module 501, encoding module 502, transcoding module 503, and live module 504.
The translation module 501 may be configured to, in response to detecting relevant information of a source live stream reported by a source station, translate audio data in the source live stream in the source station based on a live translation process to obtain subtitle data.
The encoding module 502 may be configured to encode the subtitle data into the source live stream, resulting in a target live stream.
Transcoding module 503 may be configured to transcode the target live stream to obtain a transcoded live stream.
The live broadcast module 504 may be configured to push the transcoded live stream to the source station for live broadcasting.
In some optional aspects of this embodiment, the translation module includes: an establishing unit configured to establish communication between the AI speech translation engine and the live translation process; and the communication unit is configured to translate audio data in a source live stream in the source station based on the communication between the AI speech translation engine and the live translation process to obtain subtitle data.
In some optional aspects of this embodiment, the establishing unit is further configured to: and establishing communication between the AI speech translation engine and the live translation process through the websocket.
In some optional manners of this embodiment, the encoding module includes: the alignment unit is configured to perform timestamp alignment on the subtitle data and the source live stream in response to the acquired subtitle data to obtain aligned subtitle data and source live stream; and the merging unit is configured to merge and encode the aligned subtitle data and the source live stream to obtain a target live stream.
In some optional manners of this embodiment, the apparatus further includes: and the cache module is configured to respond to the detection of the related information of the source live stream reported by the source station and locally cache the source live stream based on the live translation process.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device 600 for the live broadcast method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only and are not meant to limit the implementations of the present application described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used with multiple memories, as desired. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the live broadcasting method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the live broadcasting method provided by the present application.
Memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., translation module 501, encoding module 502, transcoding module 503, and live module 504 shown in fig. 5) corresponding to the live broadcast method in the embodiments of the present application. The processor 601 executes various functional applications of the server and the live broadcast by running non-transitory software programs, instructions and modules stored in the memory 602, that is, the live broadcast method in the above method embodiment is implemented.
The memory 602 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created by the use of the electronic device for the live broadcast method, and the like. Further, the memory 602 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 602 optionally includes memories located remotely from the processor 601, and these remote memories may be connected over a network to the electronic device for the live broadcast method. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the live broadcast method may further include an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603, and the output device 604 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 6.
The input device 603 may receive input numeric or character information; examples include a touch screen, keypad, mouse, track pad, touch pad, pointing stick, one or more mouse buttons, track ball, and joystick. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, the efficiency of generating the live subtitles is improved.
It should be understood that steps may be reordered, added, or deleted in the various flows shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; this is not limited here as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A live broadcasting method, the method comprising:
in response to detecting relevant information of a source live stream reported by a source station, translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data;
encoding the subtitle data into the source live broadcast stream to obtain a target live broadcast stream;
transcoding the target live stream to obtain a transcoded live stream;
and carrying out live broadcast based on the transcoded live broadcast stream.
2. The method of claim 1, wherein the translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data comprises:
establishing communication between an AI speech translation engine and a live broadcast translation process;
and translating the audio data in the source live stream in the source station based on the communication between the AI speech translation engine and the live translation process to obtain subtitle data.
3. The method of claim 2, wherein establishing communication between the AI speech translation engine and the live translation process comprises:
and establishing communication between the AI speech translation engine and the live translation process through the websocket.
4. The method of claim 1, wherein encoding the subtitle data into the source live stream to obtain a target live stream comprises:
in response to obtaining the subtitle data, performing timestamp alignment between the subtitle data and the source live stream to obtain aligned subtitle data and source live stream;
and merging and encoding the aligned subtitle data and the source live stream to obtain a target live stream.
5. The method of claim 1, further comprising:
and responding to the detected relevant information of the source live broadcast stream reported by the source station, and locally caching the source live broadcast stream based on a live broadcast translation process.
6. A live broadcast apparatus, the apparatus comprising:
the translation module is configured to respond to the detection of relevant information of a source live stream reported by a source station, and translate audio data in the source live stream in the source station based on a live translation process to obtain subtitle data;
an encoding module configured to encode the subtitle data into the source live stream to obtain a target live stream;
the transcoding module is configured to transcode the target live stream to obtain a transcoded live stream;
a live broadcast module configured to live broadcast based on the transcoded live broadcast stream.
7. The apparatus of claim 6, wherein the translation module comprises:
an establishing unit configured to establish communication between the AI speech translation engine and the live translation process;
a communication unit configured to translate audio data in the source live stream in the source station based on communication between the AI speech translation engine and a live translation process to obtain subtitle data.
8. The apparatus of claim 7, wherein the establishing unit is further configured to:
and establishing communication between the AI speech translation engine and the live translation process through the websocket.
9. The apparatus of claim 6, wherein the encoding module comprises:
the alignment unit is configured to perform timestamp alignment on the subtitle data and the source live broadcast stream in response to the acquisition of the subtitle data, so as to obtain aligned subtitle data and source live broadcast stream;
and the merging unit is configured to merge and encode the aligned subtitle data and the source live stream to obtain a target live stream.
10. The apparatus of claim 6, the apparatus further comprising:
the cache module is configured to respond to the detection of the related information of the source live stream reported by the source station, and locally cache the source live stream based on the live translation process.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
CN202111414398.XA 2021-11-25 2021-11-25 Live broadcasting method and device Pending CN114040220A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111414398.XA CN114040220A (en) 2021-11-25 2021-11-25 Live broadcasting method and device
PCT/CN2022/124310 WO2023093322A1 (en) 2021-11-25 2022-10-10 Live broadcast method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111414398.XA CN114040220A (en) 2021-11-25 2021-11-25 Live broadcasting method and device

Publications (1)

Publication Number Publication Date
CN114040220A (en) 2022-02-11

Family

ID=80145558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111414398.XA Pending CN114040220A (en) 2021-11-25 2021-11-25 Live broadcasting method and device

Country Status (2)

Country Link
CN (1) CN114040220A (en)
WO (1) WO2023093322A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023093322A1 (en) * 2021-11-25 2023-06-01 京东科技信息技术有限公司 Live broadcast method and device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116527840A (en) * 2023-07-05 2023-08-01 卓望数码技术(深圳)有限公司 Live conference intelligent subtitle display method and system based on cloud edge collaboration

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002010138A (en) * 2000-06-20 2002-01-11 Nippon Telegr & Teleph Corp <Ntt> Method for processing information and device therefor
EP2106121A1 (en) * 2008-03-27 2009-09-30 Mundovision MGI 2000, S.A. Subtitle generation methods for live programming
CN108401192A (en) * 2018-04-25 2018-08-14 腾讯科技(深圳)有限公司 Video stream processing method, device, computer equipment and storage medium
CN111010614A (en) * 2019-12-26 2020-04-14 北京奇艺世纪科技有限公司 Method, device, server and medium for displaying live caption
CN111901615A (en) * 2020-06-28 2020-11-06 北京百度网讯科技有限公司 Live video playing method and device
CN112188241A (en) * 2020-10-09 2021-01-05 上海网达软件股份有限公司 Method and system for real-time subtitle generation of live stream
CN113596491A (en) * 2021-07-23 2021-11-02 深圳市通拓信息技术网络有限公司 Cross-border live broadcast system based on cloud server

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114040220A (en) * 2021-11-25 2022-02-11 京东科技信息技术有限公司 Live broadcasting method and device

Also Published As

Publication number Publication date
WO2023093322A1 (en) 2023-06-01

Similar Documents

Publication Publication Date Title
CN106412621B (en) Image display method and device, control method and relevant device between network direct broadcasting
US7499075B2 (en) Video conference choreographer
CN102883135B (en) Screen sharing and control method
US7991801B2 (en) Real-time dynamic and synchronized captioning system and method for use in the streaming of multimedia data
CN102761603B (en) Webpage flash video redirection method in VDI environment
WO2023093322A1 (en) Live broadcast method and device
US20160021149A1 (en) Methods and systems for dynamic adjustment of session parameters for effective video collaboration among heterogeneous devices
CN103763627B (en) A kind of method and system for realizing real-time video conference
US20110208837A1 (en) Method and system for data communications in cloud computing architecture
WO2020248649A1 (en) Audio and video data synchronous playback method, apparatus and system, electronic device and medium
KR102611151B1 (en) Live broadcast message transmission method, apparatus, electronic equipment and medium
CN110784525A (en) Cloud mobile phone control method, system and storage medium based on H5 webpage technology
CN104168453A (en) Method for implementing video monitoring stream media application system
CN105898506A (en) Method and system for multi-screen playing of media files
CN113225577A (en) Live stream processing method, device and system, electronic equipment and storage medium
US8255461B1 (en) Efficient transmission of changing images using image caching
CN110659330A (en) Data processing method, device and storage medium
CN114217996A (en) Sound mixing method and device
CN113542906A (en) RTSP video-based webpage plug-in-free playing method
Suga A comparison of bandwidth consumption between proprietary web conference services and BigBlueButton, an open source webinar system
US11777871B2 (en) Delivery of multimedia components according to user activity
CA3041692C (en) Multichannel video programming distributor stream controller
CN113259730A (en) Code rate adjustment method and device for live broadcast
CN111818046A (en) Method, device, equipment and storage medium for interacting information
CN114401254B (en) Streaming media service processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220211