WO2023093322A1

WO2023093322A1 - Live broadcast method and device

Info

Publication number: WO2023093322A1
Application number: PCT/CN2022/124310
Authority: WO
Inventors: 商红宾
Original assignee: 京东科技信息技术有限公司
Priority date: 2021-11-25
Filing date: 2022-10-10
Publication date: 2023-06-01
Also published as: CN114040220A

Abstract

The present application relates to the technical field of live broadcast, and discloses a live broadcast method and device. A specific embodiment of the method comprises: in response to detecting related information of a source live broadcast stream reported by a source station, translating audio data in the source live broadcast stream in the source station on the basis of the live broadcast translation process to obtain subtitle data; encoding the subtitle data into the source live broadcast stream to obtain a target live broadcast stream; transcoding the target live broadcast stream in the source station to obtain a transcoded live broadcast stream; and performing live broadcast on the basis of the transcoded live broadcast stream.

Description

Live broadcast method and device

This application claims the priority of the Chinese patent application with application number 202111414398.X and titled "Live Streaming Method and Apparatus" filed on November 25, 2021, the entirety of which is incorporated herein by reference.

Technical field

The present disclosure relates to the field of computer technology, specifically to live broadcast technology, and in particular to a live broadcast method and device.

Background

As the live broadcast industry has developed, it has made a qualitative leap: issues such as high-definition picture quality, low latency, and audio-video synchronization have been heavily optimized. However, these improvements alone do not satisfy user needs.

In some scenarios, such as major sports events, large-scale conference reports, online education and training, etc., it is necessary to translate live broadcasts in real time and add multilingual subtitles.

Summary of the invention

Embodiments of the present disclosure provide a live broadcast method, a live broadcast device, an electronic device, and a storage medium.

According to a first aspect, an embodiment of the present disclosure provides a live broadcast method. The method includes: in response to detecting relevant information of a source live stream reported by a source station, translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data; encoding the subtitle data into the source live stream to obtain a target live stream; transcoding the target live stream to obtain a transcoded live stream; and performing live broadcast based on the transcoded live stream.

In some embodiments, translating the audio data in the source live stream in the source station based on the live translation process to obtain subtitle data includes: establishing communication between an AI speech translation engine and the live translation process; and translating the audio data in the source live stream in the source station, based on the communication between the AI speech translation engine and the live translation process, to obtain the subtitle data.

In some embodiments, establishing the communication between the AI speech translation engine and the live translation process includes: establishing the communication between the AI speech translation engine and the live translation process through websocket.

In some embodiments, encoding the subtitle data into the source live stream to obtain the target live stream includes: in response to obtaining the subtitle data, performing timestamp alignment on the subtitle data and the source live stream to obtain aligned subtitle data and an aligned source live stream; and merging and encoding the aligned subtitle data with the source live stream to obtain the target live stream.

In some embodiments, the method further includes: in response to detecting relevant information of the source live stream reported by the source station, locally caching the source live stream based on a live translation process.

According to a second aspect, an embodiment of the present disclosure provides a live broadcast device. The device includes: a translation module configured to, in response to detecting relevant information of a source live stream reported by a source station, translate audio data in the source live stream in the source station based on a live translation process to obtain subtitle data; an encoding module configured to encode the subtitle data into the source live stream to obtain a target live stream; a transcoding module configured to transcode the target live stream to obtain a transcoded live stream; and a live broadcast module configured to perform live broadcast based on the transcoded live stream.

In some embodiments, the translation module includes: an establishment unit configured to establish communication between an AI speech translation engine and the live translation process; and a communication unit configured to translate the audio data in the source live stream in the source station, based on the communication between the AI speech translation engine and the live translation process, to obtain the subtitle data.

In some embodiments, the establishment unit is further configured to: establish the communication between the AI speech translation engine and the live translation process through the websocket communication protocol.

In some embodiments, the encoding module includes: an alignment unit configured to, in response to obtaining the subtitle data, perform timestamp alignment on the subtitle data and the source live stream to obtain aligned subtitle data and an aligned source live stream; and a merging unit configured to merge and encode the aligned subtitle data with the source live stream to obtain the target live stream.

In some embodiments, the device further includes: a cache module configured to locally cache the source live stream based on the live translation process in response to detecting relevant information of the source live stream reported by the source station.

According to a third aspect, an embodiment of the present disclosure provides an electronic device. The electronic device includes one or more processors and a storage device on which one or more programs are stored; when the one or more programs are executed by the one or more processors, the one or more processors implement the live broadcast method according to any embodiment of the first aspect.

According to a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium, on which a computer program is stored, and when the program is executed by a processor, the live broadcast method according to any embodiment of the first aspect is implemented.

It should be understood that what is described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will be easily understood from the following description.

Description of drawings

FIG. 1 is an exemplary system architecture diagram in which the present disclosure can be applied;

Fig. 2 is a flow chart according to one embodiment of the live broadcast method of the present disclosure;

FIG. 3 is a schematic diagram of an application scenario of the live broadcast method according to the present disclosure;

Fig. 4 is a flow chart according to another embodiment of the live broadcast method of the present disclosure;

FIG. 5 is a schematic diagram of an embodiment of a live broadcast device according to the present disclosure;

FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a server according to an embodiment of the present disclosure.

Detailed description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that, in the case of no conflict, the embodiments in the present disclosure and the features in the embodiments can be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings and embodiments.

The related art implements real-time live broadcast translation and multilingual subtitles through offline human simultaneous interpretation combined with subtitle machine equipment.

The above method mainly has the following problems:

1. The cost is high, and each live broadcast requires professional simultaneous interpreters and related hardware equipment;

2. Low efficiency: because it is constrained by manpower and equipment, it is not suited to large-scale use.

FIG. 1 shows an exemplary system architecture 100 to which an embodiment of the live broadcast method of the present disclosure can be applied.

As shown in FIG. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

The terminal devices 101, 102, 103 interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, for example, live broadcast applications and communication applications.

The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to mobile phones and notebook computers. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above, and can be implemented as multiple pieces of software or software modules (for example, for providing live broadcast services) or as a single piece of software or software module. No specific limitation is made here.

The server 105 may be a server that provides various services. For example, in response to detecting the relevant information of the source live stream reported by the source station, the server translates the audio data in the source live stream in the source station based on the live translation process to obtain subtitle data; encodes the subtitle data into the source live stream to obtain the target live stream; transcodes the target live stream to obtain the transcoded live stream; and performs live broadcast based on the transcoded live stream.

It should be noted that the server 105 may be hardware or software. When the server 105 is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it can be implemented as multiple software or software modules (for example, for providing live broadcast services), or as a single software or software module. No specific limitation is made here.

It should be noted that the live broadcast method provided by the embodiments of the present disclosure may be executed by the server 105, by the terminal devices 101, 102, 103, or by the server 105 and the terminal devices 101, 102, 103 in cooperation with each other. Correspondingly, the parts (for example, the units, subunits, modules, and submodules) included in the live broadcast device may all be provided in the server 105, may all be provided in the terminal devices 101, 102, 103, or may be distributed between the server 105 and the terminal devices 101, 102, 103.

It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are only illustrative. According to the implementation needs, there can be any number of terminal devices, networks and servers.

FIG. 2 shows a schematic flowchart 200 of one embodiment of the live broadcast method of the present disclosure. In this embodiment, the live broadcast method includes the following steps:

Step 201, in response to detecting relevant information of a source live stream reported by a source station, translate the audio data in the source live stream in the source station based on a live translation process to obtain subtitle data.

In this embodiment, the execution subject (the server 105 or the terminal devices 101, 102, 103 shown in FIG. 1) can detect, in real time or at preset intervals, whether the source station has reported relevant information of a source live stream. If such information is detected, the execution subject can issue a task to a preset live translation process, controlling the live translation process to pull the audio data of the source live stream from the source station and translate it to obtain subtitle data.
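The detection step can be pictured as a check that runs on each polling cycle and issues one task per newly reported stream. The following is only a minimal sketch under stated assumptions: the function name `dispatch_new_streams`, the use of stream names as keys, and the `issue_task` callback are all hypothetical and are not taken from the disclosure.

```python
from typing import Callable, Iterable, List, Set


def dispatch_new_streams(
    reported: Iterable[str],
    seen: Set[str],
    issue_task: Callable[[str], None],
) -> List[str]:
    """Issue one translation task per stream name not seen before.

    `reported` stands in for whatever the source station reported in this
    polling cycle; `issue_task` stands in for handing a task to the live
    translation process. Both are assumptions made for illustration.
    """
    started = []
    for stream_name in reported:
        if stream_name in seen:
            continue  # a task was already issued for this stream
        seen.add(stream_name)
        issue_task(stream_name)
        started.append(stream_name)
    return started


seen_streams: Set[str] = set()
issued: List[str] = []
dispatch_new_streams(["room1", "room2"], seen_streams, issued.append)
dispatch_new_streams(["room1", "room3"], seen_streams, issued.append)
print(issued)
```

Calling the function once per interval approximates detection "at preset intervals"; an event-driven callback from the source station would serve the real-time case.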

Wherein, the relevant information of the source live stream may include stream name, stream status, stream address, etc. of the source live stream, for example, vhost, app, name, ip, port, etc.
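As an illustration of the fields listed above, the reported information could be held in a small record. This is a sketch only: the class name `SourceStreamInfo`, the `status` default, and the RTMP-style address layout are assumptions, since the disclosure names the fields but does not fix a protocol or address format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceStreamInfo:
    """Hypothetical record for the information a source station reports."""

    vhost: str
    app: str
    name: str
    ip: str
    port: int
    status: str = "publishing"  # assumed status value, not from the source

    def pull_url(self) -> str:
        # Assumed RTMP-style layout; other protocols would differ.
        return f"rtmp://{self.ip}:{self.port}/{self.app}/{self.name}?vhost={self.vhost}"


info = SourceStreamInfo(
    vhost="live.example.com", app="live", name="room42", ip="10.0.0.5", port=1935
)
print(info.pull_url())
```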

Here, the execution subject can obtain the video stream and the AAC (Advanced Audio Coding) audio stream of the source live stream at the same time, transcode the AAC audio stream into an Opus audio stream, and translate based on the Opus audio stream to obtain the subtitle data.

The translation refers to translating the content corresponding to the audio stream into one or more languages.

It should be pointed out that the execution subject can either directly control the live translation process to translate the audio data, or establish communication between an AI (Artificial Intelligence) speech translation engine and the live translation process so that the audio data is translated and the translated subtitles are obtained. The present disclosure does not limit this.

In some optional manners, the method further includes: in response to detecting relevant information of the source live stream reported by the source station, locally caching the source live stream based on the live translation process.

In this implementation, if the execution subject detects the relevant information of the source live stream reported by the source station, it can issue a task to the preset live translation process to control the live translation process to obtain the audio data of the source live stream in the source station and translate it, and at the same time to cache the obtained source live stream locally.

Specifically, if the execution subject detects the relevant information of the source live stream reported by the source station, it can issue two ffmpeg tasks to the preset live translation process: one ffmpeg task instructs the live translation process to obtain the audio data of the source live stream in the source station and translate it, and the other instructs the live translation process to cache the obtained source live stream locally, so as to keep the source live stream and the resulting subtitle data synchronized.
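The two ffmpeg tasks could, for example, be expressed as two command lines built for the same pull address. The sketch below only constructs the argument lists and does not run them. The helper names, the Ogg container for the Opus audio, the FLV cache format, and the example paths are assumptions; the `libopus` encoder is available only in ffmpeg builds that include it.

```python
from typing import List


def audio_task(pull_url: str, audio_out: str) -> List[str]:
    """Command that drops the video (-vn) and re-encodes the audio as Opus."""
    return ["ffmpeg", "-i", pull_url, "-vn", "-c:a", "libopus", "-f", "ogg", audio_out]


def cache_task(pull_url: str, cache_path: str) -> List[str]:
    """Command that copies the stream unchanged (-c copy) into a local file."""
    return ["ffmpeg", "-i", pull_url, "-c", "copy", "-f", "flv", cache_path]


url = "rtmp://10.0.0.5:1935/live/room42"  # hypothetical pull address
for cmd in (audio_task(url, "pipe:1"), cache_task(url, "/tmp/room42.flv")):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.Popen` to start the corresponding task; keeping the two tasks separate mirrors the description, in which one task feeds translation and the other preserves the original stream for later merging.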

Here, ffmpeg is a suite of open-source programs for recording, converting, and streaming digital audio and video.

In this implementation, in response to detecting the relevant information of the source live stream reported by the source station, the source live stream is cached locally based on the live translation process while the audio data in the source live stream is translated to obtain subtitle data; the subtitle data is encoded into the source live stream to obtain the target live stream, which is then transcoded for live broadcast. This effectively improves the real-time performance of generating subtitles for the live stream.

Step 202, encoding the subtitle data into the source live stream to obtain the target live stream.

In this embodiment, after obtaining the subtitle data, the execution subject can either directly encode the subtitle data into the corresponding source live stream, or first align the subtitle data with the timestamps of the source live stream and then encode it, to obtain the target live stream, and push the target live stream to the source station.

In some optional manners, encoding the subtitle data into the source live stream to obtain the target live stream includes: in response to obtaining the subtitle data, performing timestamp alignment on the subtitle data and the source live stream to obtain aligned subtitle data and an aligned source live stream; and merging and encoding the aligned subtitle data with the source live stream to obtain the target live stream.

In this implementation, the execution subject can monitor for subtitle data in real time and, in response to obtaining the subtitle data, can align the timestamps of the subtitle data with those of the video frames and audio frames in the source live stream to obtain the aligned subtitle data and source live stream. Here, the subtitle data may reuse the timestamps carried by the audio frames.

Further, the execution subject merges and encodes the aligned subtitle data and the source live stream to obtain the target live stream, that is, the video stream with subtitles.

In this implementation, in response to obtaining the subtitle data, the subtitle data and the source live stream are timestamp-aligned to obtain the aligned subtitle data and source live stream, and the aligned subtitle data is merged and encoded with the source live stream to obtain the target live stream. This keeps audio, video, and subtitles synchronized with one another.
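One way to picture subtitles reusing audio-frame timestamps is to snap each subtitle cue to the nearest preceding audio frame. This is a sketch under assumptions, not the disclosed algorithm: the `Cue` type, millisecond units, and the snapping rule are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def align_cue(cue: Cue, audio_pts_ms: List[int]) -> Cue:
    """Snap cue boundaries onto audio-frame timestamps.

    `audio_pts_ms` is assumed to be sorted and non-empty. Each boundary is
    moved to the latest audio timestamp that is not after it, so the cue
    carries timestamps that actually occur in the audio track.
    """

    def snap(t_ms: int) -> int:
        index = bisect_right(audio_pts_ms, t_ms) - 1
        return audio_pts_ms[max(index, 0)]

    return Cue(snap(cue.start_ms), snap(cue.end_ms), cue.text)


audio_pts = [0, 20, 40, 60, 80]  # hypothetical 20 ms audio frames
print(align_cue(Cue(25, 79, "hello"), audio_pts))
```

Because the video frames of the cached source stream share the same timeline as the audio frames, cues aligned this way can be merged with both tracks without a separate clock.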

Step 203, transcoding the target live stream to obtain the transcoded live stream for live broadcasting.

In this embodiment, after detecting information reported by the source station indicating that the target live stream has been generated, the execution subject can issue a task to a preset live transcoding process, controlling the live transcoding process to pull the target live stream from the source station and transcode it to obtain the transcoded live stream.

Here, transcoding means converting the live stream in the source station, in the cloud, into streams with different encoding formats, resolutions, and bit rates and pushing them to viewers, so as to meet playback requirements across different network environments and terminal devices.

Specifically, the execution subject can control the live transcoding process to convert the target live stream into transcoded streams of different resolutions, and users can choose a video stream with a bit rate suited to their own network conditions to ensure smooth playback.
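The choice among transcoded streams can be sketched as picking the highest rendition that fits within the measured bandwidth. The ladder values, the 0.8 headroom factor, and the function name below are illustrative assumptions; the disclosure does not specify particular resolutions or bit rates.

```python
from typing import List, Tuple

# Hypothetical rendition ladder: (label, video bit rate in kbit/s), highest first.
LADDER: List[Tuple[str, int]] = [
    ("1080p", 4500),
    ("720p", 2500),
    ("480p", 1200),
    ("360p", 800),
]


def pick_rendition(
    bandwidth_kbps: float,
    ladder: List[Tuple[str, int]] = LADDER,
    headroom: float = 0.8,
) -> str:
    """Return the highest rendition whose bit rate fits the usable bandwidth."""
    budget = bandwidth_kbps * headroom  # keep a margin for network jitter
    for label, kbps in ladder:
        if kbps <= budget:
            return label
    return ladder[-1][0]  # fall back to the lowest rendition


print(pick_rendition(4000))
```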

Step 204, perform live broadcast based on the transcoded live stream.

In this embodiment, after the execution subject generates the transcoded live stream, it can push the transcoded live stream to the source station. The source station can forward the transcoded live stream to a CDN (Content Delivery Network) for end users to access.

A CDN is an intelligent virtual network built on top of the existing network. Relying on edge servers deployed in various places, and through the load balancing, content distribution, scheduling, and other functional modules of a central platform, it enables users to obtain the required content from a nearby node, which reduces network congestion and improves access response speed and hit rate.

Continue to refer to FIG. 3 , which is a schematic diagram of an application scenario of the live broadcast method according to this embodiment.

In the application scenario of FIG. 3, the execution subject 301 can detect, in real time or at preset intervals, whether a source station 302 (for example, a tp source station) has reported relevant information of a source live stream. If such information is detected, the execution subject can issue a task to the live translation process 303, controlling the live translation process 303 to obtain the audio data 304 of the source live stream in the source station 302 and translate it to obtain subtitle data 305. The subtitle data 305 is encoded into the source live stream 306 to obtain the target live stream 307, and the target live stream 307 is pushed to the source station 302. In response to detecting that the target live stream has been pushed to the source station 302, the execution subject can issue a task to the live transcoding process 308, controlling the live transcoding process 308 to pull the target live stream 307 from the source station 302 and perform transcoding 309 to obtain the transcoded live stream 310 (for example, a multi-definition live stream with subtitles), and to push the transcoded live stream to the source station 302 for live broadcast.

In the live broadcast method of the present disclosure, in response to detecting the relevant information of the source live stream reported by the source station, the audio data in the source live stream in the source station is translated based on the live translation process to obtain subtitle data; the subtitle data is encoded into the source live stream to obtain the target live stream; the target live stream is transcoded to obtain the transcoded live stream; and live broadcast is performed based on the transcoded live stream. This effectively improves the efficiency of generating live stream subtitles in real time.

Further referring to FIG. 4, a process 400 of another embodiment of the live broadcast method is shown. The process 400 of the live broadcast method of this embodiment may include the following steps:

Step 401, in response to detecting the relevant information of the source live stream reported by the source station, establish communication between the AI speech translation engine and the live translation process.

In this embodiment, the execution subject can detect, in real time or at preset intervals, whether the source station has reported relevant information of a source live stream, and in response to detecting such information, establish communication between the AI speech translation engine and the live translation process.

Here, the execution subject can use a communication protocol from existing or future technology, such as http or websocket, to establish the communication between the AI speech translation engine and the live translation process.

In some optional ways, establishing the communication between the AI speech translation engine and the live translation process includes: establishing the communication between the AI speech translation engine and the live translation process through websocket.

In this implementation, the execution subject can use the websocket communication protocol to establish communication between the AI speech translation engine and the live translation process.

Here, websocket is a technology for arbitrary two-way data transmission between an application and a server. The websocket protocol is implemented on top of TCP (Transmission Control Protocol) and consists of an initial handshake followed by two-way transmission of multiple data frames. Its purpose is that, when a websocket application and a websocket server communicate frequently in both directions, the server can avoid opening multiple HTTP (HyperText Transfer Protocol) connections, which saves resources and improves efficiency and resource utilization.
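To make the initial handshake concrete: under RFC 6455, the server proves it understood the upgrade request by hashing the client's `Sec-WebSocket-Key` together with a fixed GUID and returning the result as `Sec-WebSocket-Accept`. The sketch below shows only that one computation, using the Python standard library; the function name `accept_key` is chosen here for illustration, and nothing in the disclosure describes the handshake at this level of detail.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455, section 1.3.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

After this exchange the same TCP connection carries data frames in both directions, which is what lets a translation engine return results continuously without a new HTTP request per result.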

Specifically, controlling the live translation process to translate the audio data in the source live stream may include: obtaining the AAC audio stream in the source live stream, transcoding it into an Opus audio stream, and writing it into a unix socket; then reading the audio stream from the unix socket and sending it to the AI translation engine, while establishing a websocket connection with the AI translation engine and monitoring the progress of the translation task in real time to obtain the translation output.
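The read-and-forward part of this flow can be sketched with the standard library alone. In the sketch below, `socket.socketpair()` stands in for the unix socket that the transcoder writes to, and the `send` callback stands in for sending over the websocket connection to the AI translation engine; both substitutions, the function name, and the chunk size are assumptions made so that the example is self-contained.

```python
import socket
from typing import Callable, List


def forward_audio(
    sock: socket.socket,
    send: Callable[[bytes], None],
    chunk_size: int = 4096,
) -> int:
    """Read audio bytes from `sock` until it closes and pass each chunk to `send`.

    Returns the total number of bytes forwarded.
    """
    total = 0
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:  # an empty read means the writer closed the socket
            break
        send(chunk)
        total += len(chunk)
    return total


writer, reader = socket.socketpair()  # stand-in for the transcoder's unix socket
writer.sendall(b"\x00" * 10000)       # stand-in for Opus audio bytes
writer.close()

forwarded: List[bytes] = []
count = forward_audio(reader, forwarded.append)
reader.close()
print(count)
```

In a real deployment the `send` callback would wrap a websocket client, and translation results arriving on the same connection would be collected separately.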

This implementation establishes the communication between the AI speech translation engine and the live translation process through websocket, translates the audio data in the source live stream in the source station based on that communication to obtain subtitle data, and performs live broadcast according to the subtitle data and the source live stream, which effectively improves the efficiency of generating subtitles for the live stream.

Step 402: Based on the communication between the AI speech translation engine and the live translation process, the audio data in the source live stream is translated to obtain subtitle data.

In this embodiment, after the execution subject establishes the communication between the AI speech translation engine and the live translation process, the audio data in the source live stream can be translated through that communication, and the subtitle data is obtained from the translation result.

Step 403, encoding the subtitle data into the source live stream to obtain the target live stream.

In this embodiment, for implementation details and technical effects of step 403, reference may be made to the description of step 202, which will not be repeated here.

Step 404, transcoding the target live stream to obtain the transcoded live stream.

In this embodiment, for implementation details and technical effects of step 404, reference may be made to the description of step 203, which will not be repeated here.

Step 405, perform live broadcast based on the transcoded live stream.

In this embodiment, for implementation details and technical effects of step 405, reference may be made to the description of step 204, which will not be repeated here.

Compared with the embodiment corresponding to FIG. 2, the process 400 of the live broadcast method in this embodiment highlights establishing communication between the AI speech translation engine and the live translation process, and translating the audio data in the source live stream in the source station based on that communication to obtain subtitle data. The target live stream is then obtained from the subtitle data and the source live stream and is transcoded for live broadcast. This approach does not occupy the resources of the live translation process and helps improve the validity and reliability of the obtained subtitle data, thereby improving the validity and reliability of the resulting subtitled live stream.

With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a live broadcast device. This device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied to various electronic devices.

As shown in FIG. 5 , the live broadcast device 500 of this embodiment includes: a translation module 501 , an encoding module 502 , a transcoding module 503 and a live broadcast module 504 .

The translation module 501 may be configured to, in response to detecting the relevant information of the source live stream reported by the source station, translate the audio data in the source live stream in the source station based on the live translation process to obtain subtitle data.

The encoding module 502 can be configured to encode the subtitle data into the source live stream to obtain the target live stream.

The transcoding module 503 may be configured to transcode the target live stream to obtain the transcoded live stream.

The live broadcast module 504 can be configured to push the transcoded live stream to the source station for live broadcast.
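The way the four modules chain together can be sketched as a small class whose stages are injected as callables. Everything here is hypothetical scaffolding: the class name, the method name `run`, and the string-based stub stages exist only to show the order of the data flow, not how any stage is implemented.

```python
from typing import Callable


class LiveBroadcastDevice:
    """Hypothetical composition of the four modules 501 to 504."""

    def __init__(
        self,
        translate: Callable[[str], str],
        encode: Callable[[str, str], str],
        transcode: Callable[[str], str],
        broadcast: Callable[[str], str],
    ) -> None:
        self.translate = translate
        self.encode = encode
        self.transcode = transcode
        self.broadcast = broadcast

    def run(self, source_stream: str) -> str:
        subtitles = self.translate(source_stream)        # translation module 501
        target = self.encode(source_stream, subtitles)   # encoding module 502
        transcoded = self.transcode(target)              # transcoding module 503
        return self.broadcast(transcoded)                # live broadcast module 504


device = LiveBroadcastDevice(
    translate=lambda s: f"subs({s})",
    encode=lambda s, subs: f"{s}+{subs}",
    transcode=lambda t: f"transcoded[{t}]",
    broadcast=lambda t: f"live:{t}",
)
print(device.run("src"))
```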

In some optional manners of this embodiment, the translation module includes: an establishment unit configured to establish communication between the AI speech translation engine and the live translation process; and a communication unit configured to translate the audio data in the source live stream in the source station, based on the communication between the AI speech translation engine and the live translation process, to obtain subtitle data.

In some optional manners of this embodiment, the establishment unit is further configured to: establish the communication between the AI speech translation engine and the live translation process through websocket.

In some optional manners of this embodiment, the encoding module includes: an alignment unit configured to, in response to obtaining the subtitle data, perform timestamp alignment on the subtitle data and the source live stream to obtain the aligned subtitle data and source live stream; and a merging unit configured to merge and encode the aligned subtitle data with the source live stream to obtain the target live stream.

In some optional manners of this embodiment, the device further includes: a caching module configured to locally cache the source live stream based on the live translation process in response to detecting the relevant information of the source live stream reported by the source station.

According to the embodiments of the present disclosure, the present disclosure also provides an electronic device and a readable storage medium.

FIG. 6 is a block diagram 600 of an electronic device for the live broadcast method according to an embodiment of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.

As shown in FIG. 6, the electronic device includes one or more processors 601, a memory 602, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and can be mounted on a common motherboard or otherwise as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory, to display graphical information of a GUI on an external input/output device such as a display device coupled to an interface. In other implementations, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, with each device providing some of the necessary operations (e.g., as a server array, a set of blade servers, or a multi-processor system). In FIG. 6, one processor 601 is taken as an example.

The memory 602 is the non-transitory computer-readable storage medium provided by the present disclosure. The memory stores instructions executable by at least one processor, so that the at least one processor executes the live broadcast method provided by the present disclosure. The non-transitory computer-readable storage medium of the present disclosure stores computer instructions, and the computer instructions are used to cause a computer to execute the live broadcast method provided by the present disclosure.

The memory 602, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the live broadcast method in the embodiments of the present disclosure (for example, the translation module 501, encoding module 502, transcoding module 503 and live broadcast module 504 shown in FIG. 5). The processor 601 runs the non-transitory software programs, instructions and modules stored in the memory 602 to execute the various functional applications of the server and the live broadcast processing, that is, to implement the live broadcast method in the above method embodiments.
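As a purely illustrative sketch, the four program modules named above can be pictured as functions chained in the order of the method. Everything in the code is an assumption made for illustration and is not taken from the disclosure: the `LiveStream` and `Subtitle` types, the fixed one-second audio chunk, the `translate` callback that stands in for a speech translation engine, and the `"720p"` profile label.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Subtitle:
    start_ms: int  # time at which the subtitle starts to be shown
    end_ms: int    # time at which the subtitle stops being shown
    text: str


@dataclass
class LiveStream:
    name: str
    audio_chunks: List[str]  # hypothetical stand-in for decoded audio data
    subtitles: List[Subtitle] = field(default_factory=list)
    profile: str = "source"  # hypothetical label for the encoding profile


def translation_module(stream: LiveStream,
                       translate: Callable[[str], str]) -> List[Subtitle]:
    """Sketch of module 501: translate audio data into subtitle data."""
    chunk_ms = 1000  # assumed fixed duration of each audio chunk
    return [
        Subtitle(i * chunk_ms, (i + 1) * chunk_ms, translate(chunk))
        for i, chunk in enumerate(stream.audio_chunks)
    ]


def encoding_module(stream: LiveStream,
                    subtitles: List[Subtitle]) -> LiveStream:
    """Sketch of module 502: encode subtitles into the source stream."""
    return LiveStream(stream.name, stream.audio_chunks,
                      list(subtitles), stream.profile)


def transcoding_module(stream: LiveStream, profile: str) -> LiveStream:
    """Sketch of module 503: produce a transcoded copy of the target stream."""
    return LiveStream(stream.name, stream.audio_chunks,
                      stream.subtitles, profile)


def live_broadcast_module(stream: LiveStream) -> List[str]:
    """Sketch of module 504: 'broadcast' by rendering one line per subtitle."""
    return [f"[{stream.profile}] {s.start_ms}-{s.end_ms}ms: {s.text}"
            for s in stream.subtitles]


def run_pipeline(source: LiveStream,
                 translate: Callable[[str], str],
                 profile: str = "720p") -> List[str]:
    """Chain the four modules in the order used by the method."""
    subtitles = translation_module(source, translate)
    target = encoding_module(source, subtitles)
    transcoded = transcoding_module(target, profile)
    return live_broadcast_module(transcoded)
```

Calling `run_pipeline(LiveStream("demo", ["hello", "world"]), str.upper)` uses `str.upper` as a placeholder translator. The source stream object is left unmodified, which mirrors the distinction the method draws between the source live stream and the target live stream.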

The memory 602 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device for live broadcast, and the like. In addition, the memory 602 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 602 may optionally include memories located remotely from the processor 601, and these remote memories may be connected to the electronic device for live broadcast through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device for the live broadcast method may further include an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603, and the output device 604 may be connected through a bus or in other ways. In FIG. 6, connection through a bus is taken as an example.

The input device 603 can receive input numeric or character information. Examples of the input device include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, and other input devices. The output device 604 may include a display device, an auxiliary lighting device (e.g., an LED), a tactile feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described herein can be realized in digital electronic circuitry, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor, and may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memories, and programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with the user, the systems and techniques described herein can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user. For example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).

The systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.

A computer system may include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs that run on the respective computers and have a client-server relationship to each other.

The technical solutions of the embodiments of the present disclosure help to improve the efficiency of generating live subtitles.
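The timestamp alignment between subtitle data and the source live stream, which the encoding step relies on, can be illustrated with a minimal sketch. The sketch assumes, hypothetically, that each frame of the cached source stream carries a presentation timestamp in milliseconds and that each subtitle cue covers a half-open interval from `start_ms` up to but not including `end_ms`, with no overlap between cues. The name `align_subtitles` and the tuple layout are illustrative only and do not come from the disclosure.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical cue layout: (start_ms, end_ms, text); cues do not overlap.
Cue = Tuple[int, int, str]


def align_subtitles(frame_pts_ms: List[int],
                    cues: List[Cue]) -> List[Optional[str]]:
    """Return, for each frame timestamp, the cue text shown at that time.

    A frame gets None when no cue covers its timestamp.
    """
    ordered = sorted(cues)                 # order cues by start time
    starts = [cue[0] for cue in ordered]
    aligned: List[Optional[str]] = []
    for pts in frame_pts_ms:
        # Index of the last cue whose start time is <= pts, or -1 if none.
        i = bisect_right(starts, pts) - 1
        if i >= 0 and pts < ordered[i][1]:
            aligned.append(ordered[i][2])
        else:
            aligned.append(None)
    return aligned
```

Because translated subtitles arrive later than the audio they describe, a sketch like this presupposes that the source frames are still available when the cues arrive, which is one reason a local cache of the source live stream is useful.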

It should be understood that the various forms of flow shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved; no limitation is imposed herein.

The specific implementation manners described above do not limit the protection scope of the present disclosure. It should be apparent to those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made depending on design requirements and other factors. Any modifications, equivalent replacements and improvements made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims

1. A live broadcast method, the method comprising:

in response to detecting relevant information of a source live stream reported by a source station, translating audio data in the source live stream in the source station based on a live translation process to obtain subtitle data;

encoding the subtitle data into the source live stream to obtain a target live stream;

transcoding the target live stream to obtain a transcoded live stream; and

performing live broadcast based on the transcoded live stream.

2. The method according to claim 1, wherein translating the audio data in the source live stream in the source station based on the live translation process to obtain subtitle data comprises:

establishing communication between an AI speech translation engine and the live translation process; and

translating, based on the communication between the AI speech translation engine and the live translation process, the audio data in the source live stream in the source station to obtain the subtitle data.

3. The method according to claim 2, wherein establishing the communication between the AI speech translation engine and the live translation process comprises:

establishing the communication between the AI speech translation engine and the live translation process through the WebSocket communication protocol.

4. The method according to claim 1, wherein encoding the subtitle data into the source live stream to obtain the target live stream comprises:

in response to obtaining the subtitle data, performing timestamp alignment on the subtitle data and the source live stream to obtain the aligned subtitle data and source live stream; and

merging and encoding the aligned subtitle data and source live stream to obtain the target live stream.

5. The method according to claim 1, the method further comprising:

in response to detecting the relevant information of the source live stream reported by the source station, locally caching the source live stream based on the live translation process.

6. A live broadcast device, the device comprising:

a translation module configured to, in response to detecting relevant information of a source live stream reported by a source station, translate audio data in the source live stream in the source station based on a live translation process to obtain subtitle data;

an encoding module configured to encode the subtitle data into the source live stream to obtain a target live stream;

a transcoding module configured to transcode the target live stream to obtain a transcoded live stream; and

a live broadcast module configured to perform live broadcast based on the transcoded live stream.

7. The device according to claim 6, wherein the translation module comprises:

an establishing unit configured to establish communication between an AI speech translation engine and the live translation process; and

a communication unit configured to translate, based on the communication between the AI speech translation engine and the live translation process, the audio data in the source live stream in the source station to obtain the subtitle data.

8. The device according to claim 7, wherein the establishing unit is further configured to:

establish the communication between the AI speech translation engine and the live translation process through the WebSocket communication protocol.

9. The device according to claim 6, wherein the encoding module comprises:

an alignment unit configured to, in response to obtaining the subtitle data, perform timestamp alignment on the subtitle data and the source live stream to obtain the aligned subtitle data and source live stream; and

a merging unit configured to merge and encode the aligned subtitle data and source live stream to obtain the target live stream.

10. The device according to claim 6, further comprising:

a caching module configured to, in response to detecting the relevant information of the source live stream reported by the source station, locally cache the source live stream based on the live translation process.

11. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, so that the at least one processor can execute the method according to any one of claims 1-5.

12. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the method according to any one of claims 1-5.