CN114040220A - Live broadcasting method and device - Google Patents

Live broadcasting method and device

Info

Publication number
CN114040220A
CN114040220A (application CN202111414398.XA)
Authority
CN
China
Prior art keywords
live
stream
source
subtitle data
live stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111414398.XA
Other languages
Chinese (zh)
Inventor
商红宾 (Shang Hongbin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd
Priority to CN202111414398.XA
Publication of CN114040220A
Priority to PCT/CN2022/124310 (WO2023093322A1)
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440236 Processing of video elementary streams involving reformatting operations by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services communicating with other users, e.g. chatting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/485 End-user interface for client configuration
    • H04N21/4856 End-user interface for client configuration for language selection, e.g. for the menu or subtitles

Abstract

The application discloses a live broadcasting method and device, and relates to the technical field of live broadcasting. In one embodiment, the method comprises: in response to detecting relevant information of a source live stream reported by a source station, translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data; encoding the subtitle data into the source live stream to obtain a target live stream; transcoding the target live stream in the source station to obtain a transcoded live stream; and performing live broadcast based on the transcoded live stream. This effectively improves the efficiency of generating live subtitles.

Description

Live broadcasting method and device
Technical Field
The application relates to the field of computer technology, in particular to the field of live broadcasting, and specifically provides a live broadcasting method and device.
Background
With the development of live streaming, the industry has made a qualitative leap: high-definition image quality, low latency, audio-video synchronization, and similar problems have been optimized to a great extent, yet user requirements are still not fully met.
In some scenarios, such as sporting events, large conference reports, and online education and training, live broadcasts need to be translated in real time and multi-language subtitles need to be added. The prior art implements live real-time translation and multi-language subtitles through offline manual simultaneous interpretation and subtitle machine equipment.
The above approach mainly has the following problems:
1. High cost: each live broadcast requires professional simultaneous interpreters and related hardware equipment;
2. Low efficiency: constrained by manpower and equipment, the approach is inefficient and cannot be used at scale.
Disclosure of Invention
The embodiment of the application provides a live broadcast method, a live broadcast device, live broadcast equipment and a storage medium.
According to a first aspect, an embodiment of the present application provides a live broadcasting method, including: in response to detecting relevant information of a source live stream reported by a source station, translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data; encoding the subtitle data into the source live stream to obtain a target live stream; transcoding the target live stream to obtain a transcoded live stream; and performing live broadcast based on the transcoded live stream.
In some embodiments, translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data includes: establishing communication between an AI speech translation engine and the live translation process; and translating the audio data in the source live stream in the source station based on the communication between the AI speech translation engine and the live translation process to obtain the subtitle data.
In some embodiments, establishing communication between the AI speech translation engine and the live translation process includes: establishing the communication through a WebSocket.
In some embodiments, encoding the subtitle data into the source live stream to obtain a target live stream includes: in response to obtaining the subtitle data, performing timestamp alignment between the subtitle data and the source live stream to obtain aligned subtitle data and source live stream; and merging and encoding the aligned subtitle data and source live stream to obtain the target live stream.
In some embodiments, the method further comprises: in response to detecting the relevant information of the source live stream reported by the source station, locally caching the source live stream based on the live translation process.
According to a second aspect, an embodiment of the present application provides a live broadcast apparatus, including: a translation module configured to, in response to detecting relevant information of a source live stream reported by a source station, translate audio data in the source live stream in the source station based on a live translation process to obtain subtitle data; an encoding module configured to encode the subtitle data into the source live stream to obtain a target live stream; a transcoding module configured to transcode the target live stream to obtain a transcoded live stream; and a live broadcast module configured to perform live broadcast based on the transcoded live stream.
In some embodiments, the translation module comprises: an establishing unit configured to establish communication between the AI speech translation engine and the live translation process; and the communication unit is configured to translate audio data in a source live stream in the source station based on the communication between the AI speech translation engine and the live translation process to obtain subtitle data.
In some embodiments, the establishing unit is further configured to establish communication between the AI speech translation engine and the live translation process through a WebSocket.
In some embodiments, the encoding module comprises: the alignment unit is configured to perform timestamp alignment on the subtitle data and the source live stream in response to the acquired subtitle data to obtain aligned subtitle data and source live stream; and the merging unit is configured to merge and encode the aligned subtitle data and the source live stream to obtain a target live stream.
In some embodiments, the apparatus further comprises: and the cache module is configured to respond to the detection of the related information of the source live stream reported by the source station and locally cache the source live stream based on the live translation process.
According to a third aspect, an embodiment of the present application provides an electronic device, which includes one or more processors and a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors implement the live broadcasting method as in any embodiment of the first aspect.
According to a fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the live broadcasting method as in any embodiment of the first aspect.
In response to detecting relevant information of a source live stream reported by a source station, the method translates audio data in the source live stream in the source station based on a live translation process to obtain subtitle data; encodes the subtitle data into the source live stream to obtain a target live stream; transcodes the target live stream to obtain a transcoded live stream; and performs live broadcast based on the transcoded live stream. This solves the high cost and low efficiency of generating translated subtitles in the prior art, improves the efficiency of generating translated subtitles for live streams in real time, avoids the resource consumption incurred when translated subtitles are generated during transcoding, and effectively reduces resource usage.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
FIG. 1 is an exemplary system architecture diagram to which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a live method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a live method according to the present application;
FIG. 4 is a flow diagram of another embodiment of a live method according to the present application;
FIG. 5 is a schematic diagram of one embodiment of a live device according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a server according to embodiments of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the live method of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. Various communication client applications, such as a live application, a communication application, and the like, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen, including but not limited to mobile phones and notebook computers. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above and implemented as multiple software programs or software modules (e.g., to provide live broadcast services) or as a single software program or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, for example, a server that, in response to detecting relevant information of a source live stream reported by a source station, translates audio data in the source live stream in the source station based on a live translation process to obtain subtitle data; encodes the subtitle data into the source live stream to obtain a target live stream; transcodes the target live stream to obtain a transcoded live stream; and performs live broadcast based on the transcoded live stream.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server 105 is software, it may be implemented as multiple software programs or software modules (for example, to provide a live broadcast service) or as a single software program or software module. No specific limitation is imposed here.
It should be noted that the live broadcast method provided by the embodiments of the present disclosure may be executed by the server 105, by the terminal devices 101, 102, and 103, or by the server 105 and the terminal devices 101, 102, and 103 in cooperation with each other. Accordingly, the parts (for example, units, sub-units, modules, sub-modules) of the live broadcast apparatus may all be provided in the server 105, may all be provided in the terminal devices 101, 102, and 103, or may be distributed between the server 105 and the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 shows a flow 200 of one embodiment of a live broadcasting method according to the present application. In this embodiment, the live broadcasting method includes the following steps:
Step 201, in response to detecting relevant information of a source live stream reported by a source station, translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data.
In this embodiment, an execution body (for example, the server 105 or the terminal devices 101, 102, and 103 shown in fig. 1) may detect, in real time or at preset time intervals, whether a source station has reported relevant information of a source live stream. If such information is detected, the execution body may issue a task to a preset live translation process to control it to pull the audio data of the source live stream in the source station and translate it to obtain subtitle data.
The relevant information of the source live stream may include the stream name, stream state, stream address, and the like of the source live stream, for example, vhost, app, name, ip, port, and so on.
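For illustration only, the reported stream information and the detection step might be sketched as follows; the field names mirror the examples above (vhost, app, name, ip, port), while the callback and dispatch function names are hypothetical and not interfaces defined by this application:

```python
from dataclasses import dataclass

@dataclass
class StreamInfo:
    vhost: str  # virtual host of the source live stream
    app: str    # application name, e.g. "live"
    name: str   # stream name
    ip: str     # source station address
    port: int   # source station port

def on_stream_published(info: StreamInfo) -> None:
    """Hypothetical callback fired when the source station reports a source live stream."""
    source_url = f"rtmp://{info.ip}:{info.port}/{info.app}/{info.name}"
    dispatch_translation_task(source_url)

def dispatch_translation_task(source_url: str) -> None:
    """Issue a task to the preset live translation process (implementation-specific)."""
    ...
```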
Here, the execution body may simultaneously obtain the video stream and the AAC (Advanced Audio Coding) audio stream of the source live stream, transcode the AAC audio stream into an Opus audio stream, and translate the Opus audio stream to obtain subtitle data.
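As a minimal sketch of this step, assuming ffmpeg is used for the audio transcode (ffmpeg is introduced later in this description) and with placeholder stream URLs:

```python
import subprocess

def start_audio_extraction(source_url: str, audio_out: str) -> subprocess.Popen:
    """Pull the source live stream, drop video, and transcode AAC audio to Opus."""
    cmd = [
        "ffmpeg",
        "-i", source_url,   # e.g. rtmp://source-station/live/<stream-name> (placeholder)
        "-vn",              # audio only: discard the video track
        "-c:a", "libopus",  # AAC -> Opus transcode for the speech translation engine
        "-f", "ogg",        # container for the Opus stream
        audio_out,          # placeholder output, e.g. a unix socket consumed downstream
    ]
    return subprocess.Popen(cmd)
```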
Here, translating refers to translating the content corresponding to the audio stream into one or more languages.
It should be noted that the execution body may directly control the live translation process to translate the audio data, or may create an AI (Artificial Intelligence) speech translation engine that communicates with the live translation process to translate the audio data and obtain translated subtitles; this application does not limit which approach is used.
In some optional implementations, the method further comprises: in response to detecting the relevant information of the source live stream reported by the source station, locally caching the source live stream based on the live translation process.
In this implementation, if the execution body detects relevant information of a source live stream reported by the source station, it may issue tasks to the preset live translation process to control the process to acquire and translate the audio data of the source live stream in the source station, and to locally cache the acquired source live stream.
Specifically, if the execution body detects the relevant information of the source live stream reported by the source station, it may issue two ffmpeg tasks to the preset live translation process: one instructing the live translation process to acquire the audio data of the source live stream in the source station for translation, and the other instructing it to locally cache the acquired source live stream, so as to keep the source live stream synchronized with the obtained subtitle data.
FFmpeg is an open-source suite of computer programs for recording and converting digital audio and video, and for turning them into streams.
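A companion sketch of the second task, caching the pulled source live stream locally without re-encoding; the paths and container choice are assumptions, not values from this application:

```python
import subprocess

def start_local_cache(source_url: str, cache_path: str) -> subprocess.Popen:
    """Locally cache the source live stream so it can later be merged,
    in sync, with the subtitle data produced by the translation task."""
    cmd = [
        "ffmpeg",
        "-i", source_url,  # same source live stream as the translation task
        "-c", "copy",      # stream copy: no re-encoding, minimal overhead
        "-f", "mpegts",    # a container suited to appending live segments
        cache_path,        # placeholder, e.g. /var/cache/live/<stream-name>.ts
    ]
    return subprocess.Popen(cmd)
```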
In this implementation, in response to the relevant information of the source live stream reported by the source station, the source live stream is locally cached based on a live translation process, the audio data in the source live stream is translated to obtain subtitle data, the subtitle data is encoded into the source live stream to obtain a target live stream, and transcoded live broadcast is performed based on the target live stream, which effectively improves the real-time performance of generating subtitles for the live stream.
Step 202, encoding the subtitle data into a source live stream to obtain a target live stream.
In this embodiment, after acquiring the subtitle data, the execution body may directly encode the subtitle data into the corresponding source live stream, or may first align the timestamps of the subtitle data with those of the source live stream and then encode them to obtain the target live stream, which is pushed to the source station.
In some optional implementations, encoding the subtitle data into the source live stream to obtain a target live stream includes: in response to obtaining the subtitle data, performing timestamp alignment between the subtitle data and the source live stream to obtain aligned subtitle data and source live stream; and merging and encoding the aligned subtitle data and source live stream to obtain the target live stream.
In this implementation, the execution body may monitor for subtitle data in real time and, in response to obtaining subtitle data, perform timestamp alignment between the subtitle data and the video frames and audio frames in the source live stream to obtain the aligned subtitle data and source live stream. Here, the subtitle data may reuse the timestamps carried by the audio frames.
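A simplified sketch of this alignment, under the assumption that each subtitle item carries a rough timestamp from the translation engine and is snapped to the nearest audio-frame timestamp; the data structures are illustrative, not this application's actual types:

```python
from bisect import bisect_left
from dataclasses import dataclass

@dataclass
class Subtitle:
    text: str
    pts: float  # presentation timestamp in seconds

def align_subtitles(subtitles: list[Subtitle], audio_pts: list[float]) -> list[Subtitle]:
    """Snap each subtitle's timestamp to the nearest audio-frame timestamp,
    so subtitles reuse the timestamps carried by the audio frames."""
    aligned = []
    for sub in subtitles:
        i = bisect_left(audio_pts, sub.pts)          # audio_pts must be sorted
        candidates = audio_pts[max(i - 1, 0):i + 1]  # neighbors around the insertion point
        nearest = min(candidates, key=lambda t: abs(t - sub.pts))
        aligned.append(Subtitle(sub.text, nearest))
    return aligned
```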
Further, the execution body merges and encodes the aligned subtitle data and source live stream to obtain the target live stream, that is, a video stream with subtitles.
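One way to realize this merge-and-encode step is to burn the aligned subtitles into the cached source live stream with ffmpeg's subtitles filter; this is an assumption about the encoding scheme, and the file paths and push URL are placeholders:

```python
import subprocess

def encode_target_stream(cached_stream: str, subtitle_file: str, push_url: str) -> subprocess.Popen:
    """Merge the aligned subtitles into the source live stream, producing the
    target live stream (a video stream with subtitles) pushed to the source station."""
    cmd = [
        "ffmpeg",
        "-re", "-i", cached_stream,           # read the locally cached source live stream
        "-vf", f"subtitles={subtitle_file}",  # render the aligned subtitles onto the frames
        "-c:v", "libx264",                    # re-encode video with the burned-in subtitles
        "-c:a", "copy",                       # audio passes through unchanged
        "-f", "flv", push_url,                # push the target live stream to the source station
    ]
    return subprocess.Popen(cmd)
```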
In this implementation, timestamp alignment between the subtitle data and the source live stream is performed in response to obtaining the subtitle data, and the aligned subtitle data and source live stream are merged and encoded into the target live stream, thereby ensuring that audio, picture, and subtitles stay synchronized.
Step 203, transcoding the target live stream to obtain a transcoded live stream for live broadcasting.
In this embodiment, after detecting that the target live stream has been generated and reported by the source station, the execution body may issue a task to a preset live transcoding process to control it to pull the target live stream from the source station and transcode it, obtaining a transcoded live stream.
Here, transcoding means converting the live stream in the source station, in the cloud, into transcoded streams with different encoding formats, resolutions, and bitrates, which are pushed to viewers to meet the playback requirements of various scenarios, such as different network environments and different terminal devices.
Specifically, the execution body may control the live transcoding process to convert the target live stream into transcoded streams of different definitions, so that users can select video streams of different bitrates according to their own network conditions, thereby ensuring playback fluency.
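An illustrative sketch of such a multi-definition transcode; the rendition ladder, encoder settings, and push URLs are assumptions, not values from this application:

```python
import subprocess

# Assumed rendition ladder: (suffix, resolution, video bitrate)
RENDITIONS = [
    ("1080p", "1920x1080", "4000k"),
    ("720p", "1280x720", "2500k"),
    ("480p", "854x480", "1200k"),
]

def transcode_renditions(target_url: str, push_base: str) -> list[subprocess.Popen]:
    """Convert the target live stream into transcoded streams of different
    definitions so viewers can pick a bitrate matching their network."""
    procs = []
    for suffix, size, bitrate in RENDITIONS:
        procs.append(subprocess.Popen([
            "ffmpeg", "-i", target_url,
            "-c:v", "libx264", "-s", size, "-b:v", bitrate,
            "-c:a", "aac",
            "-f", "flv", f"{push_base}_{suffix}",  # e.g. rtmp://source-station/live/stream_720p
        ]))
    return procs
```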
Step 204, performing live broadcast based on the transcoded live stream.
In this embodiment, after generating the transcoded live stream, the execution body may push it to the source station. The source station may then push the transcoded live stream to a CDN (Content Delivery Network) for end users to access.
A CDN is an intelligent virtual network built on top of the existing network. Relying on edge servers deployed in various locations and on the load balancing, content distribution, and scheduling modules of a central platform, it enables users to obtain the required content nearby, reducing network congestion and improving users' access response speed and hit rate.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the live method according to the present embodiment.
In the application scenario of fig. 3, the execution body 301 may detect, in real time or at preset intervals, whether a source station 302 (for example, a tp source station) has reported relevant information of a source live stream. If such information is detected, the execution body may issue a task to the live translation process 303 to control it to obtain the audio data 304 of the source live stream in the source station 302 and translate it to obtain subtitle data 305. The subtitle data 305 is encoded into the source live stream 306 to obtain a target live stream 307, which is pushed to the source station 302. In response to detecting that the target live stream has been pushed to the source station 302, the execution body may issue a task to the live transcoding process 308 to control it to pull the target live stream 307 from the source station 302 and perform transcoding 309, obtaining a transcoded live stream 310 (for example, a multi-definition live stream with subtitles), which is pushed to the source station 302 for live streaming.
According to the live broadcasting method provided by this embodiment, in response to detecting relevant information of a source live stream reported by a source station, audio data in the source live stream in the source station is translated based on a live translation process to obtain subtitle data; the subtitle data is encoded into the source live stream to obtain a target live stream; the target live stream is transcoded to obtain a transcoded live stream; and live broadcast is performed based on the transcoded live stream, effectively improving the efficiency of generating live-stream subtitles in real time.
With further reference to fig. 4, a flow 400 of another embodiment of a live broadcasting method is shown. The flow 400 of the live broadcasting method of this embodiment may include the following steps:
Step 401, in response to detecting relevant information of a source live stream reported by a source station, establishing communication between an AI speech translation engine and a live translation process.
In this embodiment, the execution body may detect, in real time or at preset time intervals, whether a source station has reported relevant information of a source live stream, and in response to detecting such information, establish communication between the AI speech translation engine and the live translation process.
Here, the execution body may establish communication between the AI speech translation engine and the live translation process using any existing or future communication protocol, such as HTTP or WebSocket.
In some optional implementations, establishing communication between the AI speech translation engine and the live translation process includes: establishing the communication through a WebSocket.
In this implementation, the execution body may establish communication between the AI speech translation engine and the live translation process using the WebSocket protocol.
WebSocket is a technology for arbitrary bidirectional data transmission between an application and a server. The WebSocket protocol is implemented on top of TCP (Transmission Control Protocol) and consists of an initial handshake followed by bidirectional transmission of data frames. Its purpose is to let the server avoid opening multiple HTTP (Hypertext Transfer Protocol) connections when a WebSocket application and a WebSocket server communicate frequently in both directions, thereby saving resources and improving working efficiency and resource utilization.
Specifically, the operation of the live translation process, controlled by the execution body, to translate the audio data in the source live stream may include: obtaining the AAC audio stream of the source live stream; transcoding it into an Opus audio stream and writing the Opus stream into a unix socket; reading the audio stream from the unix socket and sending it to the AI translation engine; and, meanwhile, establishing a WebSocket link with the AI translation engine, monitoring the translation task in real time, and obtaining the translation output.
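A rough asyncio sketch of this flow, assuming the third-party `websockets` package; the engine URL, unix-socket path, and message handling are assumptions rather than this application's actual interfaces:

```python
import asyncio
import websockets

ENGINE_WS_URL = "ws://ai-translation-engine/translate"  # hypothetical engine endpoint
AUDIO_SOCKET = "/tmp/live_opus.sock"                    # hypothetical unix socket written by ffmpeg

async def forward_audio(ws, reader: asyncio.StreamReader) -> None:
    """Read the Opus audio stream from the unix socket and send it to the engine."""
    while chunk := await reader.read(4096):
        await ws.send(chunk)

async def collect_subtitles(ws) -> None:
    """Receive translation results pushed back over the WebSocket link."""
    async for message in ws:
        handle_subtitle(message)  # e.g. queue the result for timestamp alignment

def handle_subtitle(message) -> None:
    ...  # hand off to the subtitle-encoding step

async def run_translation_session() -> None:
    async def on_connect(reader, writer):
        # For each audio producer, open a WebSocket link to the AI translation
        # engine and run the send/receive loops concurrently.
        async with websockets.connect(ENGINE_WS_URL) as ws:
            await asyncio.gather(forward_audio(ws, reader), collect_subtitles(ws))
    server = await asyncio.start_unix_server(on_connect, path=AUDIO_SOCKET)
    async with server:
        await server.serve_forever()
```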
In this implementation, communication between the AI speech translation engine and the live translation process is established through a WebSocket, the audio data in the source live stream in the source station is translated based on this communication to obtain subtitle data, and live broadcast is performed according to the subtitle data and the source live stream, effectively improving the efficiency of generating subtitles for live streams.
Step 402, translating the audio data in the source live stream based on the communication between the AI speech translation engine and the live translation process to obtain subtitle data.
In this embodiment, after establishing communication between the AI speech translation engine and the live translation process, the execution body may translate the audio data in the source live stream through this communication and obtain subtitle data from the translation result.
Step 403, encoding the subtitle data into the source live stream to obtain a target live stream.
In this embodiment, for the implementation details and technical effects of step 403, reference may be made to the description of step 202, which is not repeated here.
Step 404, transcoding the target live stream to obtain a transcoded live stream.
In this embodiment, for the implementation details and technical effects of step 404, reference may be made to the description of step 203, which is not repeated here.
Step 405, performing live broadcast based on the transcoded live stream.
In this embodiment, for the implementation details and technical effects of step 405, reference may be made to the description of step 204, which is not repeated here.
Compared with the embodiment corresponding to fig. 2, the flow 400 of the live broadcasting method in this embodiment highlights that, in response to detecting relevant information of a source live stream reported by a source station, communication between an AI speech translation engine and a live translation process is established; audio data in the source live stream in the source station is translated based on this communication to obtain subtitle data; a target live stream is obtained from the subtitle data and the source live stream; and transcoded live broadcast is performed based on the target live stream, so that the resources of the live translation process are not occupied, improving the validity and reliability of the obtained subtitle data and, in turn, of the resulting live stream with subtitles.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of a live broadcast apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 5, the live broadcasting apparatus 500 of the present embodiment includes: translation module 501, encoding module 502, transcoding module 503, and live module 504.
The translation module 501 may be configured to, in response to detecting relevant information of a source live stream reported by a source station, translate audio data in the source live stream in the source station based on a live translation process to obtain subtitle data.
The encoding module 502 may be configured to encode the subtitle data into the source live stream, resulting in a target live stream.
Transcoding module 503 may be configured to transcode the target live stream to obtain a transcoded live stream.
The live broadcast module 504 may be configured to push the transcoded live stream to the source station for live broadcasting.
In some optional aspects of this embodiment, the translation module includes: an establishing unit configured to establish communication between the AI speech translation engine and the live translation process; and the communication unit is configured to translate audio data in a source live stream in the source station based on the communication between the AI speech translation engine and the live translation process to obtain subtitle data.
In some optional aspects of this embodiment, the establishing unit is further configured to: and establishing communication between the AI speech translation engine and the live translation process through the websocket.
In some optional manners of this embodiment, the encoding module includes: the alignment unit is configured to perform timestamp alignment on the subtitle data and the source live stream in response to the acquired subtitle data to obtain aligned subtitle data and source live stream; and the merging unit is configured to merge and encode the aligned subtitle data and the source live stream to obtain a target live stream.
In some optional manners of this embodiment, the apparatus further includes: and the cache module is configured to respond to the detection of the related information of the source live stream reported by the source station and locally cache the source live stream based on the live translation process.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device 600 for the live broadcast method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only and are not meant to limit the implementations of the present application described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used with multiple memories, as desired. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the live broadcasting method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the live broadcasting method provided by the present application.
Memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., translation module 501, encoding module 502, transcoding module 503, and live module 504 shown in fig. 5) corresponding to the live broadcast method in the embodiments of the present application. The processor 601 executes various functional applications of the server and the live broadcast by running non-transitory software programs, instructions and modules stored in the memory 602, that is, the live broadcast method in the above method embodiment is implemented.
The memory 602 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created by the use of the electronic device for the live broadcast method, and the like. Further, the memory 602 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 602 optionally includes memories located remotely from the processor 601, and these remote memories may be connected over a network to the electronic device for the live broadcast method. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the live broadcast method may further include an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603, and the output device 604 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 6.
The input device 603 may receive input numeric or character information; examples include a touch screen, keypad, mouse, track pad, touch pad, pointing stick, one or more mouse buttons, track ball, and joystick. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, the efficiency of generating the live subtitles is improved.
It should be understood that steps may be reordered, added, or deleted in the various flows shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; this is not limited here as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A live broadcasting method, the method comprising:
in response to detecting relevant information of a source live stream reported by a source station, translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data;
encoding the subtitle data into the source live broadcast stream to obtain a target live broadcast stream;
transcoding the target live stream to obtain a transcoded live stream;
and carrying out live broadcast based on the transcoded live broadcast stream.
2. The method of claim 1, wherein the translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data comprises:
establishing communication between an AI speech translation engine and a live broadcast translation process;
and translating the audio data in the source live stream in the source station based on the communication between the AI speech translation engine and the live translation process to obtain subtitle data.
3. The method of claim 2, wherein establishing communication between the AI speech translation engine and the live translation process comprises:
and establishing communication between the AI speech translation engine and the live translation process through the websocket.
4. The method of claim 1, wherein encoding the subtitle data into the source live stream to obtain a target live stream comprises:
in response to obtaining the subtitle data, performing timestamp alignment between the subtitle data and the source live stream to obtain aligned subtitle data and source live stream;
and merging and encoding the aligned subtitle data and the source live stream to obtain a target live stream.
5. The method of claim 1, further comprising:
and responding to the detected relevant information of the source live broadcast stream reported by the source station, and locally caching the source live broadcast stream based on a live broadcast translation process.
6. A live broadcast apparatus, the apparatus comprising:
the translation module is configured to respond to the detection of relevant information of a source live stream reported by a source station, and translate audio data in the source live stream in the source station based on a live translation process to obtain subtitle data;
an encoding module configured to encode the subtitle data into the source live stream to obtain a target live stream;
the transcoding module is configured to transcode the target live stream to obtain a transcoded live stream;
a live broadcast module configured to live broadcast based on the transcoded live broadcast stream.
7. The apparatus of claim 6, wherein the translation module comprises:
an establishing unit configured to establish communication between the AI speech translation engine and the live translation process;
a communication unit configured to translate audio data in the source live stream in the source station based on communication between the AI speech translation engine and a live translation process to obtain subtitle data.
8. The apparatus of claim 7, wherein the establishing unit is further configured to:
and establishing communication between the AI speech translation engine and the live translation process through the websocket.
9. The apparatus of claim 6, wherein the encoding module comprises:
the alignment unit is configured to perform timestamp alignment on the subtitle data and the source live broadcast stream in response to the acquisition of the subtitle data, so as to obtain aligned subtitle data and source live broadcast stream;
and the merging unit is configured to merge and encode the aligned subtitle data and the source live stream to obtain a target live stream.
10. The apparatus of claim 6, the apparatus further comprising:
the cache module is configured to respond to the detection of the related information of the source live stream reported by the source station, and locally cache the source live stream based on the live translation process.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
CN202111414398.XA 2021-11-25 2021-11-25 Live broadcasting method and device Pending CN114040220A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111414398.XA CN114040220A (en) 2021-11-25 2021-11-25 Live broadcasting method and device
PCT/CN2022/124310 WO2023093322A1 (en) 2021-11-25 2022-10-10 Live broadcast method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111414398.XA CN114040220A (en) 2021-11-25 2021-11-25 Live broadcasting method and device

Publications (1)

Publication Number Publication Date
CN114040220A (en) 2022-02-11

Family

ID=80145558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111414398.XA Pending CN114040220A (en) 2021-11-25 2021-11-25 Live broadcasting method and device

Country Status (2)

Country Link
CN (1) CN114040220A (en)
WO (1) WO2023093322A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023093322A1 (en) * 2021-11-25 2023-06-01 京东科技信息技术有限公司 Live broadcast method and device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116527840A (en) * 2023-07-05 2023-08-01 卓望数码技术(深圳)有限公司 Live conference intelligent subtitle display method and system based on cloud edge collaboration

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002010138A (en) * 2000-06-20 2002-01-11 Nippon Telegr & Teleph Corp <Ntt> Method for processing information and device therefor
EP2106121A1 (en) * 2008-03-27 2009-09-30 Mundovision MGI 2000, S.A. Subtitle generation methods for live programming
CN108401192A (en) * 2018-04-25 2018-08-14 腾讯科技(深圳)有限公司 Video stream processing method, device, computer equipment and storage medium
CN111010614A (en) * 2019-12-26 2020-04-14 北京奇艺世纪科技有限公司 Method, device, server and medium for displaying live caption
CN111901615A (en) * 2020-06-28 2020-11-06 北京百度网讯科技有限公司 Live video playing method and device
CN112188241A (en) * 2020-10-09 2021-01-05 上海网达软件股份有限公司 Method and system for real-time subtitle generation of live stream
CN113596491A (en) * 2021-07-23 2021-11-02 深圳市通拓信息技术网络有限公司 Cross-border live broadcast system based on cloud server

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114040220A (en) * 2021-11-25 2022-02-11 京东科技信息技术有限公司 Live broadcasting method and device

Also Published As

Publication number Publication date
WO2023093322A1 (en) 2023-06-01

Similar Documents

Publication Publication Date Title
CN106412621B (en) Image display method and device, control method and relevant device between network direct broadcasting
US7499075B2 (en) Video conference choreographer
CN102883135B (en) Screen sharing and control method
US7991801B2 (en) Real-time dynamic and synchronized captioning system and method for use in the streaming of multimedia data
CN102761603B (en) Webpage flash video redirection method in VDI environment
WO2023093322A1 (en) Live broadcast method and device
US20160021149A1 (en) Methods and systems for dynamic adjustment of session parameters for effective video collaboration among heterogeneous devices
CN103763627B (en) A kind of method and system for realizing real-time video conference
US20110208837A1 (en) Method and system for data communications in cloud computing architecture
WO2020248649A1 (en) Audio and video data synchronous playback method, apparatus and system, electronic device and medium
KR102611151B1 (en) Live broadcast message transmission method, apparatus, electronic equipment and medium
CN110784525A (en) Cloud mobile phone control method, system and storage medium based on H5 webpage technology
CN104168453A (en) Method for implementing video monitoring stream media application system
CN105898506A (en) Method and system for multi-screen playing of media files
CN113225577A (en) Live stream processing method, device and system, electronic equipment and storage medium
US8255461B1 (en) Efficient transmission of changing images using image caching
CN110659330A (en) Data processing method, device and storage medium
CN114217996A (en) Sound mixing method and device
CN113542906A (en) RTSP video-based webpage plug-in-free playing method
Suga A comparison of bandwidth consumption between proprietary web conference services and BigBlueButton, an open source webinar system
US11777871B2 (en) Delivery of multimedia components according to user activity
CA3041692C (en) Multichannel video programming distributor stream controller
CN113259730A (en) Code rate adjustment method and device for live broadcast
CN111818046A (en) Method, device, equipment and storage medium for interacting information
CN114401254B (en) Streaming media service processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220211