CN116112722B - Audio playing method and device, electronic equipment and storage medium

Info

Publication number
CN116112722B
Authority
CN
China
Prior art keywords
audio
terminal
playing
video
stream
Prior art date
Legal status
Active
Application number
CN202310391590.4A
Other languages
Chinese (zh)
Other versions
CN116112722A (en)
Inventor
郭晓
李向荣
刘杨
郑强
吕亚东
王栋
Current Assignee
CCTV New Media Culture Media (Beijing) Co., Ltd.
Original Assignee
CCTV New Media Culture Media (Beijing) Co., Ltd.
Priority date
Filing date
Publication date
Application filed by CCTV New Media Culture Media (Beijing) Co., Ltd.
Publication of CN116112722A
Application granted
Publication of CN116112722B
Status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43076 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of the same content streams on multiple devices, e.g. when family members are watching the same movie on different devices
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43079 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of additional data with content streams on multiple devices
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/64 Addressing
    • H04N21/6402 Address allocation for clients
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/835 Generation of protective data, e.g. certificates
    • H04N21/8352 Generation of protective data, e.g. certificates involving content or source identification data, e.g. Unique Material Identifier [UMID]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8547 Content authoring involving timestamps for synchronizing content

Abstract

The disclosure relates to an audio playing method and device, an electronic device, and a storage medium. The audio playing method includes: acquiring a video identity of a video stream being played by a second terminal; sending the video identity to a first service end and acquiring, from the first service end, an audio playing address of an audio stream corresponding to the video identity; and acquiring the audio stream from the first service end based on the audio playing address and playing it in synchronization with the video stream played by the second terminal, wherein the audio stream includes audio data and a standard timestamp shared with the video stream and is a spatial audio live stream generated based on an audio definition model. The audio playing method and device, electronic device, and storage medium of the disclosure solve the problem that a good audio listening experience cannot be obtained when video is played on a large screen, and enable collaborative playback among multiple terminals.

Description

Audio playing method and device, electronic equipment and storage medium
The present disclosure claims priority to Chinese patent application No. 202310141761.8, filed on February 17, 2023, the entire contents of which are incorporated herein by reference.
Technical Field
The disclosure relates to the technical field of multimedia and communication, and in particular to an audio playing method, an audio playing device, an electronic device, and a storage medium.
Background
With the development of information and communication technology, information can be delivered in many ways; in some outdoor or public environments, it is delivered by playing video on a large screen.
For example, at a car theater, a user may park near the screen and listen to the movie sound through outdoor loudspeakers or the in-car FM radio; as another example, a user outdoors may watch video content played on a city large screen.
However, when video is played on such large screens as car theater screens or city large screens, it may be difficult to obtain a good audio listening experience. At a car theater, the sound quality of the outdoor loudspeakers or the in-car FM radio is poor and cannot restore the original spatial audio effects of the movie; with a city large screen, the user may not be able to hear the video's audio at all in a noisy urban environment.
Disclosure of Invention
The disclosure provides an audio playing method, an audio playing device, an electronic device, and a storage medium, to at least solve the problem in the related art that a good audio listening experience may be difficult to obtain when video is played on a large screen. The technical scheme of the present disclosure is as follows:
According to a first aspect of embodiments of the present disclosure, there is provided an audio playing method applied to a first terminal, the method including: acquiring a video identity of a video stream being played by a second terminal, the video identity being a unique identity corresponding to the video stream; sending the video identity to a first service end and acquiring, from the first service end, an audio playing address of an audio stream corresponding to the video identity; and acquiring the audio stream from the first service end based on the audio playing address, and playing the audio stream in synchronization with the playing of the video stream by the second terminal, wherein the first service end is a server for pushing the video stream to the second terminal and pushing the audio stream to the first terminal, the audio stream includes audio data and a standard timestamp shared with the video stream, and the audio stream is a spatial audio live stream generated based on an audio definition model.
According to a second aspect of embodiments of the present disclosure, there is provided an audio playing method applied to a first service end, the method including: receiving, from a first terminal, a video identity of a video stream being played by a second terminal, the video identity being a unique identity corresponding to the video stream and having been acquired by the first terminal; sending to the first terminal an audio playing address of an audio stream corresponding to the video identity; and sending the audio stream to the first terminal based on the audio playing address so that the first terminal plays the audio stream in synchronization with the second terminal playing the video stream, wherein the first service end is a server for pushing the video stream to the second terminal and pushing the audio stream to the first terminal, the audio stream includes audio data and a standard timestamp shared with the video stream, and the audio stream is a spatial audio live stream generated based on an audio definition model.
According to a third aspect of embodiments of the present disclosure, there is provided an audio playing method applied to a third terminal, the method including: receiving audio playing information from a second service end, the audio playing information having been sent to the second service end by a first terminal and including an audio playing address, or the audio playing address together with the audio playing progress of the first terminal, where the audio playing address was acquired by the first terminal by obtaining a video identity of a video stream being played by a second terminal, sending the video identity to a first service end, and acquiring from the first service end the audio playing address of an audio stream corresponding to the video identity, the video identity being a unique identity corresponding to the video stream; and acquiring the audio stream from the first service end based on the audio playing information and playing it in synchronization with the second terminal playing the video stream, wherein the first service end is a server for pushing the video stream to the second terminal and pushing the audio stream to the first terminal, the audio stream includes audio data and a standard timestamp shared with the video stream, and the audio stream is a spatial audio live stream generated based on an audio definition model.
According to a fourth aspect of embodiments of the present disclosure, there is provided an audio playing device applied to a first terminal, the device including: an identity acquisition unit configured to acquire a video identity of a video stream being played by a second terminal, the video identity being a unique identity corresponding to the video stream; an identity sending unit configured to send the video identity to a first service end and acquire, from the first service end, an audio playing address of an audio stream corresponding to the video identity; and an audio stream acquisition unit configured to acquire the audio stream from the first service end based on the audio playing address and play the audio stream in synchronization with the second terminal playing the video stream, wherein the first service end is a server for pushing the video stream to the second terminal and pushing the audio stream to the first terminal, the audio stream includes audio data and a standard timestamp shared with the video stream, and the audio stream is a spatial audio live stream generated based on an audio definition model.
According to a fifth aspect of embodiments of the present disclosure, there is provided an audio playing device applied to a first service end, the audio playing device including: an identifier receiving unit configured to receive, from a first terminal, a video identifier of a video stream being played by a second terminal, where the video identifier is a unique identifier corresponding to the video stream; an address sending unit configured to send an audio play address of an audio stream corresponding to the video identity to the first terminal; an audio stream sending unit configured to send the audio stream to the first terminal based on the audio playing address, so that the first terminal plays the audio stream synchronously with the second terminal playing the video stream, wherein the first service end is a server for pushing the video stream to the second terminal and pushing the audio stream to the first terminal, the audio stream comprises audio data and a standard timestamp shared with the video stream, and the audio stream is a spatial audio live stream generated based on an audio definition model.
According to a sixth aspect of embodiments of the present disclosure, there is provided an audio playing device applied to a third terminal, the audio playing device including: an information receiving unit configured to receive audio playing information from a second service end, where the audio playing information is sent to the second service end by a first terminal, the audio playing information includes an audio playing address, or includes the audio playing address and an audio playing progress of the first terminal, the audio playing address is acquired by the first terminal, the first terminal acquires a video identity of a video stream being played by the second terminal, sends the video identity to the first service end, and acquires an audio playing address of an audio stream corresponding to the video identity from the first service end, where the video identity is a unique identity corresponding to the video stream; and an acquisition and play unit configured to acquire the audio stream from the first server based on the audio play information, and play the audio stream in synchronization with the video stream played by the second terminal, wherein the first server is a server for pushing the video stream to the second terminal and pushing the audio stream to the first terminal, the audio stream includes audio data and a standard timestamp shared with the video stream, and the audio stream is a spatial audio live stream generated based on an audio definition model.
According to a seventh aspect of embodiments of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing processor-executable instructions, wherein the instructions, when executed by the processor, cause the processor to perform an audio playing method according to an exemplary embodiment of the present disclosure.
According to an eighth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions which, when executed by a processor of a server, cause the server to perform the audio playing method according to the exemplary embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
the first terminal may send the video identity of the video stream played by the second terminal to the first service end and acquire, from the first service end, the audio playing address of the audio stream corresponding to that video stream. The audio stream can then be played on the first terminal, so the user can listen to it while watching the video stream played by the second terminal. This realizes collaborative playback among multiple terminals, so that the corresponding audio content can be heard clearly through the first terminal even when the video is played on a second terminal such as a large screen.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
Fig. 1 is a schematic diagram of an example of a multi-terminal audio and video playback scenario in accordance with an exemplary embodiment of the present disclosure.
Fig. 2 is a schematic flowchart of an example of an audio playing method according to an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic flowchart of steps in an audio playback method of acquiring an audio playback address according to an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic diagram of another example of a multi-terminal audio and video playback scenario in accordance with an exemplary embodiment of the present disclosure.
Fig. 5 is a schematic flow chart of steps of playing an audio stream in synchronization with a video stream in an audio playing method according to an exemplary embodiment of the present disclosure.
Fig. 6 is a schematic flow chart of an example of an audio-video stream synchronized playback method according to an exemplary embodiment of the present disclosure.
Fig. 7 is a schematic diagram of a manually adjusted audio-visual synchronization interface of a first terminal according to an exemplary embodiment of the present disclosure.
Fig. 8 is a schematic flowchart of steps of a terminal playing an audio stream in an audio playing method according to an exemplary embodiment of the present disclosure.
Fig. 9 is a schematic diagram illustrating a decoding flow of an ADM audio file at a third terminal according to an exemplary embodiment of the present disclosure.
Fig. 10 is a schematic diagram illustrating a decoding flow of an ADM audio file at a first terminal according to an exemplary embodiment of the present disclosure.
Fig. 11 is a schematic diagram of an overall flow of an audio playing method according to an exemplary embodiment of the present disclosure.
Fig. 12 is a schematic flowchart of another example of an audio playing method according to an exemplary embodiment of the present disclosure.
Fig. 13 is a schematic flowchart of another example of an audio-video stream synchronized playback method according to an exemplary embodiment of the present disclosure.
Fig. 14 is a schematic flowchart of still another example of an audio playing method according to an exemplary embodiment of the present disclosure.
Fig. 15 is a schematic block diagram of an example of an audio playback device according to an exemplary embodiment of the present disclosure.
Fig. 16 is a schematic block diagram of another example of an audio playback device according to an exemplary embodiment of the present disclosure.
Fig. 17 is a schematic block diagram of still another example of an audio playback device according to an exemplary embodiment of the present disclosure.
Fig. 18 is a schematic block diagram of an example of an audio-video stream synchronized playback device according to an exemplary embodiment of the present disclosure.
Fig. 19 is a schematic block diagram of another example of an audio-video stream synchronized playback device according to an exemplary embodiment of the present disclosure.
Fig. 20 is a schematic block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that, in this disclosure, "at least one of" covers three parallel cases: any one of the items, any combination of some of the items, or all of the items. For example, "including at least one of A and B" covers three cases: (1) including A; (2) including B; (3) including A and B. Likewise, "at least one of step one and step two is executed" covers three cases: (1) executing step one; (2) executing step two; (3) executing step one and step two.
In view of the foregoing, an audio playing method, an audio playing apparatus, an audio-video stream synchronized playing method, an audio-video stream synchronized playing apparatus, an electronic device, and a computer-readable storage medium according to exemplary embodiments of the present disclosure will be provided below with reference to the accompanying drawings.
It should be noted that, although the playing scenes of the automobile cinema and the city large screen are described herein as examples, it should be understood that the application scenes of the audio playing method, the audio playing device, the audio-video stream synchronous playing method, and the audio-video stream synchronous playing device according to the present disclosure are not limited thereto, and may be applied to any application scene where multiple terminals play video streams and audio streams respectively.
Fig. 1 is a schematic diagram of a multi-terminal audio and video playback scenario in accordance with an exemplary embodiment of the present disclosure.
As shown in fig. 1, the audio and video playing scenario may include a first terminal 110, a second terminal 120, and a first service end 210, where the first terminal 110 may be, for example, but not limited to, a portable terminal, such as a smart phone, a tablet, a notebook, a digital assistant, a wearable device, etc., and the first terminal 110 may include software running on a physical device, such as an application client, for playing audio. The second terminal 120 may be, for example, but not limited to, a city large screen, a car theater large screen, etc., and the second terminal 120 may include software running on a physical device, such as an application client, for playing video.
The first server 210 may be, for example, but not limited to, a server for pushing video streams and/or audio streams. Here, the first service end 210 may include a server that operates independently, may be a distributed server, or may be a server cluster formed by a plurality of servers.
In addition, any two of the first terminal, the second terminal, and the first service end may ensure normal data transmission by establishing a communication connection, for example through the network 300; the connection may use cellular data, Wi-Fi, Bluetooth, or any other manner, and the disclosure is not particularly limited in this respect.
According to the exemplary embodiment of the disclosure, the first terminal may acquire the video identity of the video stream being played by the second terminal, and then may send the video identity to the first server, and acquire the audio play address of the audio stream corresponding to the video identity from the first server. The first terminal may acquire an audio stream from the first server based on the audio play address, and play the audio stream in synchronization with the second terminal playing the video stream.
According to a first aspect of exemplary embodiments of the present disclosure, an audio playing method is provided, which may be applied to a first terminal, for example, as shown in fig. 1.
As shown in fig. 2, an audio playing method according to an exemplary embodiment of the present disclosure may include the steps of:
in step S210, a video identity of the video stream being played by the second terminal may be obtained.
Here, the second terminal may be, for example, a city large screen or a large screen at a car theater, but it is not limited thereto and may be any video playback terminal. A user of the first terminal may view video content played on the second terminal. The video stream played by the second terminal may be in any format and may include both picture and audio, or only a picture without audio; accordingly, the second terminal may have a display and a speaker, or only a display without a speaker. The video stream may be, for example, but not limited to, a movie, a television program, etc.
The video identity may be a unique identity corresponding to the video stream, which may be, for example, but not limited to, a two-dimensional code, etc.
As an example, a user of a first terminal may scan a two-dimensional code displayed on a second terminal, such as an automobile cinema screen or a city large screen, using a code scanning function of a client on the first terminal.
In step S220, the video identity may be sent to the first server, and the audio play address of the audio stream corresponding to the video identity is obtained from the first server.
In this step, the first terminal may send the acquired video identity to a first service end, where the first service end may be, for example, a server for pushing a video stream to the second terminal and pushing an audio stream to the first terminal. In an example, the first service end may include a single server that pushes both the video stream and the audio stream corresponding to it; in another example, the first service end may include a first sub-service end for pushing the video stream and a second sub-service end for pushing the audio stream corresponding to the video stream.
Here, the audio stream may be, for example, a spatial audio live stream generated based on an Audio Definition Model (ADM). A panoramic audio file may be used for this purpose, which can achieve a better audio playback effect.
In addition, the video stream and the audio stream described in the present disclosure may include, but are not limited to, live video programs and corresponding audio, live pre-recorded video programs and corresponding audio, and the like, which are not particularly limited by the present disclosure.
In response to receiving the video identity sent by the first terminal, the first service end can identify which video stream is being played on the second terminal, determine the audio stream corresponding to that video stream, and return the audio playing address of the audio stream to the first terminal. In the case that the first service end includes a first sub-service end and a second sub-service end, the first terminal may send the video identity to the first sub-service end, and the first sub-service end sends the audio playing address to the first terminal.
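For illustration, the lookup performed by the first service end can be sketched as follows. This is a minimal sketch under assumptions: the mapping table, the URL scheme, and the function name are hypothetical and not taken from the present disclosure.

```python
# Hypothetical service-end lookup: maps a video identity to the play
# address of its corresponding ADM spatial audio live stream.
AUDIO_STREAMS = {
    "video-20230217-001": "https://media.example.com/live/audio/20230217-001.adm",
}

def resolve_audio_play_address(video_identity: str) -> str | None:
    """Return the audio playing address for the video stream, if registered."""
    return AUDIO_STREAMS.get(video_identity)
```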
As an example, as shown in fig. 3, in this step S220, the first terminal may acquire an audio play address by:
in step S310, a first play request for obtaining play information of the second terminal may be generated based on the video identity, where the first play request includes the video identity and location information of the first terminal.
Here, the first terminal may generate the first play request based on the video identification of the video stream and the current location of the first terminal.
In step S320, the first play request may be sent to the first service end, so that the first service end determines, based on the first play request, a second terminal nearest to the first terminal.
Here, since the first play request includes the location information of the first terminal, the first service end can determine the second terminal nearest to the first terminal based on the request. In this way, when there are multiple second terminals, the service end can identify which second terminal the user of the first terminal is actually watching, so the audio stream can be pushed accurately.
In particular, in some scenarios there may be multiple second terminals playing the same video stream; their locations may be fixed, and the first service end may acquire or know these locations. The second terminals may play the same video stream synchronously or asynchronously; for example, in a city large screen application scenario, the same program may be played on multiple large screens across the city.
In the asynchronous case, the first service end needs to know which second terminal the user of the first terminal is currently watching before pushing the audio stream. Because the video identity corresponds to the video stream and is independent of any particular second terminal, the first service end must determine not only which video stream the user is watching (via the video identity) but also which second terminal is being watched, so that it can determine that terminal's playing progress and push an audio stream matching it. In the synchronous case, although the first service end pushes video streams with the same playing progress to all second terminals, network delay, communication failures, and the like may cause different second terminals to receive the video stream at different times, producing differences in playing progress; the first service end therefore still needs to know which second terminal the user is watching, so it can push an audio stream matched to that terminal's progress while accounting for its network delay and communication failures.
Here, since it is difficult for the first service end to directly determine which second terminal the user is watching, according to the exemplary embodiment of the present disclosure the first terminal sends its own current location information together with the video identity, so that the first service end can determine the second terminal the user is most likely, and most easily able, to watch from the first terminal's location. The first terminal therefore does not need to first acquire the second terminal's position and then send it to the first service end; this simplifies the first terminal's operation and lets the first service end determine the watched second terminal at a lower communication cost.
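As an illustration of how the first service end might pick the nearest second terminal from the first terminal's location, the following sketch uses great-circle distance; the disclosure does not specify a distance metric, so the haversine formula and the data shapes here are assumptions.

```python
import math

def nearest_screen(user_lat: float, user_lon: float, screens):
    """Pick the second terminal closest to the first terminal.

    `screens` is an iterable of (screen_id, lat, lon) tuples; the
    haversine metric is an assumption, the text only says "nearest".
    """
    def haversine_km(lat1, lon1, lat2, lon2):
        r = 6371.0  # mean Earth radius in km
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = p2 - p1, math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    return min(screens, key=lambda s: haversine_km(user_lat, user_lon, s[1], s[2]))
```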
In addition, in the case that the first service end includes a first sub-service end and a second sub-service end, the first play request may be sent to the first sub-service end, which pushes the video stream. Because the first sub-service end communicates with the second terminal to push the video stream to it and receive its feedback, the first terminal can request video playing information about the second terminal from the first sub-service end directly.
In step S330, video playback information may be received from the first service side.
The first service end may determine the video playing information of the corresponding second terminal according to the first play request and send it to the first terminal. Here, the video playing information may include an identification of the second terminal and an identification of the audio stream corresponding to the video stream, so that after receiving it the first terminal can determine both which second terminal the user is watching and which audio stream matches the video content being watched.
In addition, the video playing information may further include, but is not limited to, a name of the second terminal, a video time delay of playing the video stream by the second terminal, and the like. Here, the video time delay of the second terminal playing the video stream may be used by the first terminal or a third terminal (to be described later) to play the audio stream in synchronization with the second terminal playing the video stream, which will be described in detail later.
Taking the car theater / city large screen playing scene as an example, the first terminal may first obtain location permission, so that it can initiate to the first service end a first play request, carrying its own location information, for acquiring the current information of the car theater / city large screen. After receiving a valid request, the first service end may return the video playing information of the car theater / city large screen to the first terminal, which may include parameters such as the large-screen name, large-screen ID, audio-video delay time, and audio live stream ID.
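The video playing information listed above can be illustrated as a small record. The field names below are hypothetical; only the listed parameters (large-screen name and ID, audio-video delay time, audio live stream ID) come from the description.

```python
from dataclasses import dataclass

@dataclass
class VideoPlayInfo:
    """Video playing information returned by the first service end (sketch)."""
    screen_name: str       # name of the second terminal (large screen)
    screen_id: str         # identification of the second terminal
    video_delay_ms: int    # delay of the second terminal playing the video stream
    audio_stream_id: str   # identification of the corresponding audio live stream
```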
In step S340, a second play request for acquiring an audio play address may be generated based on the video play information.
Here, the second play request may include an identification of the second terminal and an identification of the audio stream.
In step S350, the second play request may be sent to the first service side, and the audio play address may be acquired from the first service side.
Specifically, the first terminal may send a second play request to the first service end to request the first service end to acquire an audio play address for playing the audio stream. Here, the audio stream may be a spatial audio live stream, and the audio play address may be a play address of the spatial audio live stream, for example.
In addition, in the case that the first service end includes a first sub-service end and a second sub-service end, the second play request may be sent to the second sub-service end, which pushes the audio stream and establishes communication with the first terminal for that purpose. The first and second play requests of the first terminal are thus handled by different sub-service ends, splitting the management of video playing and audio playing at the first service end and reducing its load. However, the exemplary embodiments of the present disclosure are not limited thereto: the first service end may also be a single service end that pushes both the video stream and the audio stream and receives both the first and second play requests.
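The two-request flow of steps S310 to S350 can be sketched from the first terminal's side as follows. The endpoints, field names, and use of the requests HTTP client are assumptions for illustration, not an API defined by the disclosure.

```python
import requests  # third-party HTTP client

BASE = "https://api.example.com"  # hypothetical first service end

def get_audio_play_address(video_identity: str, lat: float, lon: float) -> str:
    # First play request: video identity plus the first terminal's location,
    # so the service end can pick the nearest second terminal (steps S310-S330).
    info = requests.post(f"{BASE}/play-info", json={
        "videoId": video_identity, "lat": lat, "lon": lon,
    }).json()

    # Second play request: screen ID and audio stream ID, returning the
    # play address of the spatial audio live stream (steps S340-S350).
    resp = requests.post(f"{BASE}/audio-address", json={
        "screenId": info["screenId"], "audioStreamId": info["audioStreamId"],
    }).json()
    return resp["audioPlayAddress"]
```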
In step S230, an audio stream may be acquired from the first server based on the audio play address, and played in synchronization with the second terminal playing the video stream.
In this step, the first terminal may acquire the audio stream from the first service end based on the audio playing address, for example from the second sub-service end. The audio stream can thus be played on the first terminal while the video stream is played on the second terminal, realizing a scheme of watching the video and listening to the audio on different ends: even at a car theater or before a city large screen, the user can hear, through the first terminal, the sound corresponding to the video played on the large screen.
Furthermore, as described above, the audio stream according to the embodiments of the present disclosure may be a spatial audio live stream generated based on the Audio Definition Model (ADM) defined in Recommendation ITU-R BS.2076-1 of the ITU Radiocommunication Sector. The recommendation describes the structure of an audio metadata model that can accurately describe the format and content of an audio file, ensuring compatibility across systems, and specifies how a standard-defined panoramic audio file is generated. Such a spatial audio live stream is played and rendered differently from ordinary audio, which allows an audio stream played according to an exemplary embodiment of the present disclosure to achieve a better playback effect. Existing audio playback technology, however, lacks a design for rendering and playing spatial audio through sound systems such as vehicle-mounted terminals; the specific playing and rendering of the audio stream is therefore described in detail below.
The foregoing describes how the first terminal obtains the audio playing address from the video identity of the video stream played by the second terminal, acquires the corresponding audio stream, and thereby realizes playing video and audio on different ends. Building on this, according to an exemplary embodiment of the present disclosure, the video and audio playing scene may further include a third terminal and a second service end.
Specifically, as shown in fig. 4, the audio and video playing scene may include the first terminal 110, the second terminal 120, the third terminal 130, the first service end 210 and the second service end 220, where the first terminal 110, the second terminal 120 and the first service end 210 are described in detail above, and will not be repeated here. The third terminal 130 may be, for example, but not limited to, a vehicle-mounted terminal, which may also include software running on a physical device, such as an application client, for playing audio. The second server 220 may be, for example, but not limited to, a server for managing user information and user play information. Here, the second server 220 may include a server that operates independently, may be a distributed server, or may be a server cluster formed by a plurality of servers.
A communication connection may be established between any two of the first terminal, the second terminal, the third terminal, the first service end, and the second service end, for example through the network 310, to ensure normal data transmission; the connection may use cellular data, Wi-Fi, Bluetooth, or any other manner, and the disclosure is not particularly limited in this respect.
In this example, the first terminal may also send the audio playing information to the second service end, which forwards it to the third terminal, allowing the third terminal to acquire the audio stream from the first service end based on that information and play it in synchronization with the video stream played by the second terminal.
Here, the audio play information may include the above-described audio play address, or both the audio play address and the audio play progress of the first terminal. In addition, the audio playing information may further include the video playing information received by the first terminal from the first service end, so that the third terminal may obtain the playing condition of the second terminal.
Specifically, the first terminal, such as a portable terminal, may upload the obtained audio live stream address, its own audio playing progress (if playback has started), and the user account to the second service end for storage. After the audio playing client on the third terminal, such as a vehicle-mounted terminal, logs in with the same user account, the third terminal may obtain the audio live stream address and the first terminal's playing progress (if playback has started) from the second service end. Having obtained them through this mechanism, the third terminal can, like the first terminal, continue playing the audio stream in synchronization with the video stream played by the second terminal. The user can therefore choose to play the audio stream through either the first terminal or the third terminal, which supports audio playback both while walking and while driving and gives the user multiple multi-terminal playback options.
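A minimal sketch of this hand-off follows, with an in-memory dictionary standing in for the second service end; the function names and storage shape are assumptions.

```python
from typing import Optional

# Hypothetical stand-in for the second service end, which stores audio
# playing information keyed by user account.
HANDOFF_STORE: dict[str, dict] = {}

def upload_play_state(account: str, audio_address: str,
                      progress_ms: Optional[int]) -> None:
    """First terminal: persist the audio live stream address and, if
    playback has already started, the current playing progress."""
    HANDOFF_STORE[account] = {"address": audio_address, "progress_ms": progress_ms}

def fetch_play_state(account: str) -> Optional[dict]:
    """Third terminal: after logging in with the same account, retrieve
    the stored address and progress to continue synchronized playback."""
    return HANDOFF_STORE.get(account)
```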
According to the above audio playing scheme, the first terminal may send the video identity of the video stream played by the second terminal to the first service end and acquire, from the first service end, the audio playing address of the corresponding audio stream. The audio stream can then be played on the first terminal while the user watches the video stream played by the second terminal, realizing collaborative playback among multiple terminals, so that the corresponding audio content can be heard clearly through the first terminal even when the video is played on a second terminal such as a large screen.
Having described a scheme for playing an audio stream by a multi-terminal according to an exemplary embodiment of the present disclosure, a specific process of playing an audio stream in synchronization with a video stream and a specific scheme for playing and rendering an audio stream by a terminal will be described in detail. It should be noted that the synchronous play scheme and the audio stream play and render scheme described below may be applied to both the first terminal and the third terminal.
In an example, the audio stream includes audio data and a standard timestamp common to the video stream, and as shown in fig. 5, the step of playing the audio stream in synchronization with the second terminal playing the video stream may include the steps of:
In step S510, the audio stream may be decoded to obtain a standard time stamp.
Here, the standard time stamp is a time stamp added when the audio stream and the video stream are generated, and the audio stream and the video stream can share the standard time stamp, so that audio-visual synchronization of the video picture and the audio sound can be ensured when played according to the standard time stamp.
Specifically, in this step, upon acquiring the audio stream, the first terminal or the third terminal may decapsulate it to produce a data object containing the standard timestamp of the current data frame and the audio data, which is compressed data to be decoded. A corresponding audio decoder decodes the compressed data into pulse-code modulation (PCM) data, which enters a buffer queue together with the previously parsed standard timestamp.
In step S520, a difference process may be performed on the standard timestamp and the local timestamp to obtain an audio delay time.
In this step, the first terminal or the third terminal may perform timestamp alignment. Specifically, its audio decoder decodes the acquired audio stream, such as an ADM audio stream, continuously taking PCM data and the standard timestamp from the buffer queue, and the difference between the decoded standard timestamp and the local timestamp gives the audio delay time ΔT, that is, ΔT = standard timestamp of audio decoding - local timestamp.
In step S530, the audio stream may be played in synchronization with the second terminal playing the video stream based on the audio delay time.
In this step, the audio stream may be played after waiting for a time equal to the audio delay time in response to the audio delay time being greater than 0; audio data of a time equal to the audio delay time may be discarded from the audio stream in response to the audio delay time being less than 0, and the audio stream after the audio data is discarded may be played.
Specifically, if ΔT > 0, the audio decoding worker thread of the first terminal or the third terminal may sleep for a time equal to ΔT, which amounts to waiting for the video stream played by the second terminal. If ΔT < 0, audio data of a duration equal to |ΔT| may be discarded from the decoded buffer queue.
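A minimal sketch of this timestamp-alignment loop (steps S510 to S530) is given below. The queue, audio output, and clock interfaces are assumptions, and dropping whole late frames approximates discarding audio data of duration |ΔT|.

```python
import time

def sync_play_loop(frame_queue, audio_out, now_ms):
    """Align decoded audio frames with the shared standard timestamp.

    `frame_queue` yields (standard_ts_ms, pcm) pairs from the decoder's
    buffer queue; `audio_out.write(pcm)` renders PCM; `now_ms()` returns
    the local timestamp. All three interfaces are assumed for illustration.
    """
    for standard_ts_ms, pcm in frame_queue:
        delta_t_ms = standard_ts_ms - now_ms()  # audio delay time ΔT
        if delta_t_ms > 0:
            # Audio is ahead of the shared clock: wait ΔT, then play.
            time.sleep(delta_t_ms / 1000.0)
        elif delta_t_ms < 0:
            # Audio is behind: skip this frame, approximating the
            # discard of |ΔT| worth of audio data from the queue.
            continue
        audio_out.write(pcm)
```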
In another example, playing the audio stream synchronously with the video stream may be implemented according to the audio-video stream synchronous playing method as shown in fig. 6. Here, it should be noted that the audio/video stream synchronous playing method shown in fig. 6 may be used to implement playing of an audio stream synchronously with a video stream in the audio playing method shown in fig. 2, but is not limited thereto, and may be used for other scenes where a video stream is played synchronously with an audio stream or may be implemented separately.
The audio/video stream synchronous playing method shown in fig. 6 may be applied to the first terminal, and specifically, the method may include the following steps:
in step S610, an audio stream may be acquired from a first service end, where the audio stream corresponds to a video stream being played by a second terminal, the second terminal receives the video stream from the first service end, the audio stream includes audio data and a standard timestamp, and the video stream includes video data and a standard timestamp.
As an example, the first terminal may obtain the audio stream from the first service side by the method shown in fig. 2.
Specifically, as shown in fig. 2, the first terminal may obtain an audio stream from the first server in the following manner: step S210, obtaining a video identity of a video stream being played by a second terminal; step S220, the video identity is sent to a first server, and an audio playing address of an audio stream corresponding to the video identity is obtained from the first server; step S230, based on the audio playing address, the audio stream is obtained from the first server.
The specific embodiments of each step are described in detail herein before, and may be implemented in the same manner as described herein before, so that a detailed description thereof is omitted. However, the present exemplary embodiment is not limited thereto, and the first terminal may acquire an audio stream to be played in synchronization with the video stream in other manners.
In step S620, a video delay time of playing the video stream by the second terminal, which is determined by the second terminal, may be received from the first service terminal, where the video delay time is a difference between a standard timestamp and a local timestamp of the second terminal.
Specifically, the audio/video encoder at the first service end may add the standard timestamp to the video stream and the audio stream during encoding and then push them: the video stream to the second terminal, such as the city large screen or car theater player, and the audio stream to the first terminal through the identity-based mechanism described above. When the video decoder of the second terminal decodes the video stream, the second terminal acquires the standard timestamp and computes its difference from the second terminal's local timestamp to obtain the video delay time delayTime1. Through a timer, the first service end may continuously acquire delayTime1 from the second terminal according to a preset first timing task.
Meanwhile, through a timer and according to a predetermined second timing task, the first terminal may acquire the video delay time delayTime1 from the first service end once per synchronization processing period. Here, the time interval of the first timing task may be the same as that of the second timing task, so that the video delay time is relayed accurately from the second terminal through the first service end to the first terminal and the first terminal's decoding stays synchronized with the second terminal.
In step S630, an audio delay time for the first terminal to play the audio stream may be determined based on a difference between the standard time stamp and the local time stamp of the first terminal.
In this step, the audio decoder of the first terminal may decode the audio stream and continuously take PCM data and the standard timestamp from the decoded audio-frame buffer queue; the difference between the acquired standard timestamp and the local timestamp gives the audio delay time delayTime2, that is, delayTime2 = standard timestamp of audio decoding - local timestamp.
In step S640, the audio stream may be played in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time.
In this step, the first terminal may compute the difference between the audio delay time delayTime2 and the video delay time delayTime1 to obtain the audio-video asynchronous time ΔT, that is, ΔT = delayTime2 - delayTime1, where ΔT is the time to be adjusted.
Specifically, in response to the audio-video asynchronous time being greater than 0 (ΔT > 0), the audio stream may be played after waiting for a time equal to ΔT; for example, the audio decoding worker thread may sleep for that long. In response to the asynchronous time being less than 0 (ΔT < 0), audio data of a duration equal to |ΔT| may be discarded from the audio stream, for example from the buffered audio frames, and the remaining audio played. In this way the audio stream and the video stream are played synchronously, so the user hears the corresponding audio from the first terminal while watching the video stream played by the second terminal.
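The per-cycle adjustment of steps S630 and S640 reduces to one subtraction, sketched below with hypothetical names.

```python
def av_async_ms(audio_standard_ts_ms: int, audio_local_ts_ms: int,
                video_delay_ms: int) -> int:
    """ΔT = delayTime2 - delayTime1 (steps S630-S640).

    delayTime2 is the first terminal's audio delay (standard timestamp
    minus local timestamp); delayTime1 is the video delay reported by
    the second terminal via the first service end. ΔT > 0 means wait
    that long before playing; ΔT < 0 means discard that much audio.
    """
    delay_time2 = audio_standard_ts_ms - audio_local_ts_ms
    return delay_time2 - video_delay_ms
```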
Furthermore, according to an exemplary embodiment of the present disclosure, the time interval of the second timing task may be adjusted in real time during the reception of the video delay time from the first service end by the first terminal.
Specifically, under the condition that the audio delay time and the video delay time are subjected to difference processing to obtain audio-video asynchronous time, the video delay time can be received from the first service end at a first time interval in response to the audio-video asynchronous time meeting a first preset condition; receiving video delay time from the first service end at a second time interval in response to the audio-visual asynchronous time meeting a second preset condition; and receiving the video delay time from the first service end at a third time interval in response to the audio-visual asynchronous time meeting a third preset condition.
Here, the first preset condition may be: the sound-picture asynchronous time is in a first preset interval, or the frequency of the sound-picture asynchronous time outside the first preset interval is lower than the preset frequency within the preset time length; the second preset condition may be: the sound and picture asynchronous time obtained continuously twice is larger than or equal to the upper limit value of the first preset interval or smaller than or equal to the lower limit value of the first preset interval; the third preset condition may be: the sound and picture asynchronous time obtained continuously twice is larger than or equal to a first threshold value or smaller than or equal to a second threshold value, the first threshold value is larger than the upper limit value of a first preset interval, and the second threshold value is smaller than the lower limit value of the first preset interval.
Here, the first time interval may be greater than the second time interval, and the second time interval may be greater than the third time interval.
Specifically, during data transmission the absolute value of the audio-video asynchronous time ΔT may be unstable due to network jitter and similar causes. A large |ΔT| found at a given synchronization indicates that, between the previous synchronization and this one, the desynchronization of sound and picture at the first terminal was noticeable.
In this regard, the first preset interval may represent the range in which audio-video desynchronization is not easily perceived, for example -100 ms < ΔT < 25 ms. When the asynchronous time ΔT is greater than or equal to the upper limit of the first preset interval or less than or equal to its lower limit, for example ΔT ≥ 25 ms or ΔT ≤ -100 ms, the desynchronization is easily perceived; and when ΔT is greater than or equal to the first threshold or less than or equal to the second threshold, for example ΔT ≥ 90 ms or ΔT ≤ -185 ms, the desynchronization is obvious and may even be unacceptable.
To solve the above-described problems, a mechanism for adaptively adjusting the synchronization processing period may be added according to an exemplary embodiment of the present disclosure to adjust the time interval for acquiring the video delay time delayTime1 from the server.
For example, under normal conditions (-100 ms < ΔT < 25 ms, or only sporadically ΔT ≥ 25 ms or ΔT ≤ -100 ms), the period defaults to 20 s, i.e., the first terminal acquires the video delay time delayTime1 from the server every 20 s. If more than two consecutive synchronizations yield ΔT ≥ 25 ms or ΔT ≤ -100 ms, the period is adaptively adjusted to 10 s, i.e., delayTime1 is acquired every 10 s. If more than two consecutive synchronizations yield ΔT ≥ 90 ms or ΔT ≤ -185 ms, the period is adaptively adjusted to 5 s, i.e., delayTime1 is acquired every 5 s.
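The adaptive rule can be sketched as follows, using the thresholds and periods above; reading "more than two times continuously" as two consecutive out-of-range synchronizations is an interpretation of the original wording.

```python
def next_sync_period_s(prev_delta_ms: float, curr_delta_ms: float) -> float:
    """Choose the interval for fetching delayTime1 from the first service end."""
    def severe(d: float) -> bool:        # desynchronization obvious
        return d >= 90 or d <= -185

    def perceptible(d: float) -> bool:   # desynchronization easily noticed
        return d >= 25 or d <= -100

    if severe(prev_delta_ms) and severe(curr_delta_ms):
        return 5.0    # two consecutive severe cycles: fastest period
    if perceptible(prev_delta_ms) and perceptible(curr_delta_ms):
        return 10.0   # two consecutive perceptible cycles: faster period
    return 20.0       # normal, or only sporadic outliers: default period
```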
According to the above synchronous playing scheme, the first terminal can receive, from the first service end, the video delay time on the second terminal, and can synchronize the audio stream it plays with the video stream played by the second terminal based on the audio delay time on the first terminal and the video delay time on the second terminal. This realizes a synchronous audio-video playing scheme across multiple terminals: even when the video is played on a second terminal such as a large screen, the first terminal can listen to the corresponding audio content in synchronization, solving the problem that synchronized audio is difficult to obtain when video is played on a large screen.
In addition, according to the exemplary embodiment of the present disclosure, a first terminal manual compensation mechanism may be added on the basis of a system automatic synchronization mechanism.
Specifically, in response to receiving an audio wait instruction input by a user, the audio stream may be played after waiting for a duration corresponding to a wait time in the audio wait instruction; in response to receiving an audio discard instruction input by a user, audio data with a duration corresponding to a discard time in the audio discard instruction can be discarded from the audio stream, and the audio stream after the audio data is discarded is played.
Here, the audio waiting instruction may be an instruction to instruct the local audio playback to wait for a specified waiting time, and the audio discarding instruction may be an instruction to discard the audio data of the specified discarding time from the local audio playback.
For example, as shown in fig. 7, the audio waiting instruction and the audio discarding instruction may be input to the first terminal through a button control in an interactive interface of the first terminal, respectively.
Specifically, as shown in fig. 7, if the user subjectively feels that the video lags behind the audio, the "+" key can be clicked to manually adjust the audio playing progress by making the local audio playback wait, with the adjusted value being the waiting time required for local audio playback; if the user subjectively feels that the audio lags behind the video, the "-" key can be clicked to manually adjust the audio playing progress by discarding a certain duration of decoded audio data to be played, with the adjusted value being the duration of audio data to be discarded from local audio playback.
As an example, the waiting time and the discard time described above may be fixed and equal. In the example shown in fig. 7, the adjustment granularity is 50 ms, i.e., each click of the "+" key or the "-" key triggers a 50 ms wait or a 50 ms discard, respectively.
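For illustration only, a minimal Python sketch of these two controls follows; the player methods wait_before_play() and discard_decoded_audio() are assumed names for the behaviors described above.

STEP_S = 0.050  # 50 ms adjustment granularity per click, as in fig. 7

def on_plus_clicked(player):
    # Video is perceived to lag behind audio: hold local audio back one step.
    player.wait_before_play(STEP_S)

def on_minus_clicked(player):
    # Audio is perceived to lag behind video: drop one step of decoded audio.
    player.discard_decoded_audio(STEP_S)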
While several examples of implementing audio and video stream synchronous playback are described above, specific schemes for audio stream playback and rendering will be described below.
As an example, the accompanying sound for a second terminal such as a city large screen or an automobile cinema may be a 10-channel spatial audio file under the ADM (Audio Definition Model) standard, each channel carrying channel position information described with format metadata. The spatial audio file is stored in the first server for the first terminal to acquire. Based on such spatial audio files, a three-dimensional sound playing scheme may be employed to play the audio.
Specifically, as shown in fig. 8, the first terminal or the third terminal may play the audio stream by:
in step S810, a channel of a speaker for playing an audio stream may be determined.
For example, the audio renderer of the first or third terminal may first access the speaker driver of the current terminal to obtain the speaker type; different speaker types may have different channels. For instance, the speaker type of the first terminal may be stereo, while the speaker type of the third terminal, such as the vehicle-mounted client, may be 5.1-channel or 7.1-channel.
In step S820, the audio data parsed from the audio stream may be rendered in a channel direction, resulting in channel metadata of the audio data.
In step S830, spatial position information of each channel audio in the audio data may be determined based on the channel metadata.
In steps S820 and S830, the audio frames whose time stamps have been calibrated by the synchronous playing scheme may enter the renderer for channel azimuth rendering, and the original spatial position information of each channel may be recovered by parsing the format metadata of each channel.
In step S840, the channel audio may be converted according to the channel and spatial position information of the speaker, to obtain converted channel audio.
In step S850, an audio file adapted to the channel of the speaker may be generated based on the converted channel audio.
In steps S840 and S850, the channels containing spatial position information (for example, the original 10 channels) may be converted to the channels of the current device's speaker array by a channel conversion algorithm, completing the input-output channel conversion, so that a playable PCM file adapted to the device's speaker type may be generated.
In step S860, the audio file may be played through the channel of the speaker.
In this step, PCM file data may be written into a sound card buffer of the terminal for audio playback.
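For illustration only, the following Python sketch strings steps S810 to S860 together; the renderer and speaker objects and their methods are assumed names, not the disclosure's API.

def play_spatial_audio(renderer, speaker, audio_stream):
    layout = speaker.channel_layout()                    # S810: stereo, 5.1, 7.1, ...
    frames = renderer.parse_frames(audio_stream)         # timestamp-calibrated frames
    channel_meta = renderer.render_channel_azimuth(frames)        # S820
    positions = renderer.recover_spatial_positions(channel_meta)  # S830
    converted = renderer.convert_channels(frames, positions, layout)  # S840
    pcm_file = renderer.build_pcm(converted, layout)     # S850: playable PCM file
    speaker.write(pcm_file)                              # S860: sound-card buffer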
In the case where the audio stream is a spatial audio stream generated based on an ADM, playing based on the playing method of the exemplary embodiment of the present disclosure can obtain a better playing effect and listening experience. An example of a specific flow of decoding of an ADM audio file by the third terminal and the first terminal will be described in detail below with reference to fig. 9 and 10.
Fig. 9 is a schematic diagram illustrating a decoding flow of an ADM audio file at a third terminal according to an exemplary embodiment of the present disclosure.
Referring to fig. 9, an ADM rendering program in a microprocessor (MCU) in the third terminal first reads the ADM audio file, then parses and decodes it, restores and mixes the digital audio signal of each audio object (i.e., audio data) according to the ADM file description, and encodes the result into a digital audio stream (e.g., a Dolby AC3 digital audio stream). The encoded audio signal is then output to a digital sound card (e.g., a DAC sound card) via an Inter-IC Sound (I2S) bus interface, sent to a decoder (e.g., a Dolby decoder) via the sound card's digital audio transmission interface (e.g., an S/PDIF (Sony/Philips Digital Interface Format) interface), decoded by the decoder, and output via an output interface (e.g., an RCA interface) to an external stereo speaker system (for example, but not limited to, 5.1 channels, including the FL, FR, CEN, SL, and SR channels).
Here, the ADM audio stream is decoded and rendered by the third terminal, such as the vehicle-mounted terminal, which achieves better rendering and playing effects than existing vehicle-mounted playing modes; by playing stereo audio in the vehicle matched with the large-screen video picture, panoramic audio-video viewing with playback on different ends can be achieved.
Fig. 10 is a schematic diagram illustrating a decoding flow of an ADM audio file by a first terminal according to an exemplary embodiment of the present disclosure.
Referring to fig. 10, the ADM rendering program in the first terminal first reads the ADM audio file, then parses and decodes it, restores and mixes the digital audio signal of each audio object according to the ADM file description, and encodes the result into a digital audio stream (e.g., a WAV-format digital audio stream). The stream is then output to a virtual sound card within the mobile client, which decodes the digital audio stream and outputs it to a speaker or earpiece of the mobile client device (for example, but not limited to, left (L) and right (R) channels). Here, the virtual sound card may be, for example, an Android sound card or the like.
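For illustration only, a minimal Python sketch of the fig. 10 mobile-client flow; the renderer and virtual sound card objects and their methods are assumptions.

def decode_adm_on_mobile(renderer, virtual_sound_card, adm_path):
    adm = renderer.read_adm_file(adm_path)          # read the ADM audio file
    objects = renderer.parse_and_decode(adm)        # restore each audio object
    mixed = renderer.mix(objects, adm.format_metadata, layout="stereo")
    wav_stream = renderer.encode_wav(mixed)         # WAV-format digital stream
    virtual_sound_card.play(wav_stream)             # decode, output to speaker/earpiece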
According to an exemplary embodiment of the present disclosure, the client may further receive virtual scene position information of each audio data from the server, and generate an interactive interface based on the received virtual scene position information of each audio data, wherein an icon corresponding to each audio data is displayed at a virtual scene position corresponding to the audio data in the interactive interface.
Fig. 11 is a schematic diagram of an overall flow of an audio playing method according to an exemplary embodiment of the present disclosure.
As shown in fig. 11, the second terminal may play a video stream, which may, for example but not limited to, be pushed by a server. In the process of playing the video stream, a video identity for obtaining the audio stream corresponding to the video stream may be displayed in the video picture, for example, a two-dimensional code of the program being played in the video stream.
For the first terminal, the video identity may be obtained by, for example, scanning the two-dimensional code. Based on the video identity and the position information of the first terminal, the first terminal may request from the server (for example, the first server) the video playing information of the second terminal, for example, of a large screen. The first terminal may receive the video playing information of the second terminal from the server and parse the related information of the second terminal, where the video playing information may include, but is not limited to, the ID of the second terminal, the audio-video extension time, the audio stream ID, and the like.
The first terminal may initiate a request to the server to acquire a spatial audio stream based on the video play information, and receive an audio stream play address from the server. The first terminal may acquire the audio stream based on the audio stream play address, may parse the audio stream, align time stamps of the audio stream and the video stream based on the synchronous play method described in the above two examples, and may play the audio stream based on the above rendering and play scheme.
In addition, the first terminal may synchronize the obtained audio stream playing address to the server (for example, the second server), and may also synchronize the playing progress of the first terminal to the server.
For the server, the server may include, but is not limited to, the first server and the second server described above. The server may send the video playing information to the first terminal based on the acquisition request of the video playing information from the first terminal.
In addition, the server may further send the audio stream playing address to the first terminal based on the request for obtaining the audio stream from the first terminal. In addition, the server (e.g., the second server) may also store the user play record from the first terminal. In this way, the user play record may be transmitted to the third terminal in response to a request for acquiring the user play record (e.g., audio play progress) transmitted by the third terminal.
For the third terminal, it may be logged in to the same user account as the first terminal. The third terminal may send the account information of the user logged in on it to the server and request the play record of that user account, thereby obtaining the audio stream play address and the audio play progress from the server. The audio stream may then be acquired based on the play address and parsed starting from the audio frame at the recorded play progress; after the time stamps of the audio stream and the video stream are aligned based on the synchronous play methods described in the above two examples, the audio stream may be played based on the above rendering and playing schemes.
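For illustration only, a minimal Python sketch of this vehicle-client resume flow; the server endpoints and player helpers are assumed names.

def resume_on_vehicle_client(player, account, second_server, first_server):
    record = second_server.get_play_record(account)         # address + progress
    stream = first_server.fetch_audio(record.play_address)  # pull the audio stream
    stream.seek_to_frame(record.audio_progress)             # start at the recorded progress
    player.align_timestamps(stream)   # synchronous play methods described above
    player.render_and_play(stream)    # rendering and playing schemes described above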
The first terminal may be, but is not limited to, a mobile client; the second terminal may be, but is not limited to, an automobile cinema screen or a city large screen; the third terminal may be, but is not limited to, a vehicle-mounted client; and the first and second service ends may be, but are not limited to, servers.
According to the audio playing method and the audio-video stream synchronous playing method of the embodiments of the present disclosure, the problem that the accompanying sound cannot be heard when watching videos on, for example, a city large screen can be solved, and high-quality large-screen accompanying sound can be heard through a mobile phone or an in-vehicle listening space; likewise, the problem that the sound quality of the movie accompaniment outside the car is poor when watching an automobile cinema can be solved, and high-quality movie accompaniment can be heard through a mobile phone or an in-vehicle listening space.
According to the audio playing method and the audio-video stream synchronous playing method of the exemplary embodiments of the present disclosure, a mobile client is allowed to pull the ADM-format panoramic sound live stream of the movie played on the large screen by code scanning with a terminal such as a mobile phone, and the movie's spatial audio can be played on the mobile client by rendering the metadata of the ADM file and aligning the time stamps.
In addition, according to the audio playing method and the audio-video stream synchronous playing method of the exemplary embodiments of the present disclosure, rendering and playing of the movie/city large-screen panoramic sound accompaniment can be realized at the vehicle-mounted client through an account-based content playing synchronization mechanism between the mobile phone and the vehicle-mounted client. The original panoramic sound accompaniment of the film can be played using the vehicle-mounted multichannel speaker array (for example, 5.1 or 7.1 channels), greatly improving the viewing experience of the automobile cinema/city large screen.
In addition, according to the audio playing method and the audio and video stream synchronous playing method of the exemplary embodiment of the disclosure, rendering playing application of the spatial audio under the ADM standard in the vehicle-mounted sound system can be realized, and application scenes of combining the ADM spatial audio with the vehicle-mounted multichannel sound system are enriched.
In addition, when the audio and the video of an audio-video program are decoded and played at different ends respectively, it is ordinarily difficult to achieve a good audio-video synchronization effect. According to the audio playing method and the audio-video stream synchronous playing method of the exemplary embodiments of the present disclosure, a good audio-video synchronization effect can be achieved in this situation through a series of automatic and manual timestamp alignment mechanisms on the server and the client.
In addition, according to the audio playing method and the audio-video stream synchronous playing method of the exemplary embodiments of the present disclosure, when the audio stream obtained by a terminal such as a mobile phone through code scanning is played on either the mobile phone or the vehicle, it is aligned with the time stamps of the streaming media played on the large screen/cinema, thereby realizing audio-video synchronization across ends.
Fig. 12 is a schematic flowchart of another example of an audio playing method according to an exemplary embodiment of the present disclosure. The audio playing method may be applied to the first service end, as shown in fig. 12, and the audio playing method may include:
in step S1210, a video identification of a video stream being played by a second terminal acquired by a first terminal may be received from the first terminal.
In step S1220, an audio play address of an audio stream corresponding to the video identification may be transmitted to the first terminal.
In step S1230, the audio stream may be transmitted to the first terminal based on the audio play address for the first terminal to play the audio stream in synchronization with the second terminal playing the video stream.
As an example, step S1220 may include the steps of:
receiving a first play request generated by a first terminal based on a video identity from the first terminal, wherein the first play request is used for acquiring play information of a second terminal, and the first play request comprises the video identity and the position information of the first terminal;
determining a second terminal nearest to the first terminal based on the first play request;
transmitting video playing information to a first terminal, wherein the video playing information comprises an identity of a second terminal and playing parameters of a video stream;
Receiving a second play request generated by the first terminal based on the video play information from the first terminal, wherein the second play request is used for acquiring an audio play address, and the second play request comprises an identity of the second terminal and an identity of an audio stream;
and transmitting the audio playing address to the first terminal based on the second playing request.
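For illustration only, the following Python sketch shows how the first service end might handle these two requests; the request fields and the data model (screens with positions, an audio address index) are assumptions.

import math

def handle_first_play_request(req, screens):
    # Find the second terminal (large screen) nearest to the first terminal.
    nearest = min(screens, key=lambda s: math.dist(s.position, req.position))
    return {"screen_id": nearest.id, "play_params": nearest.play_params}

def handle_second_play_request(req, audio_index):
    # Map (second-terminal id, audio-stream id) to an audio playing address.
    return {"audio_play_address": audio_index[(req.screen_id, req.stream_id)]}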
As an example, the audio playing method may further include: receiving audio playing information sent to the third terminal by the second service end from the third terminal; and transmitting the audio stream to the third terminal based on the audio playing information so that the third terminal plays the audio stream synchronously with the video stream played by the second terminal. Here, the audio playing information is sent to the second server by the first terminal, and the audio playing information includes an audio playing address, or includes the audio playing address and an audio playing progress of the first terminal.
As an example, audio data and standard time stamps common to video streams may be included in the audio stream.
In this example, the step of playing the audio stream by the first terminal or the third terminal in synchronization with the playing of the video stream by the second terminal may include: decoding the audio stream to obtain a standard time stamp; performing difference processing on the standard time stamp and the local time stamp to obtain audio delay time; the audio stream is played in synchronization with the second terminal playing the video stream based on the audio delay time.
As an example, the step of playing the audio stream in synchronization with the playing of the video stream by the second terminal based on the audio delay time may include: in response to the audio delay time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time; in response to the audio delay time being less than 0, discarding from the audio stream audio data with a duration equal to the absolute value of the audio delay time, and playing the audio stream after the discard.
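For illustration only, a minimal Python sketch of this wait-or-discard alignment; the decoding and playback helpers are assumed names.

def sync_play(player, audio_stream, local_clock):
    standard_ts = player.decode_standard_timestamp(audio_stream)
    delay = standard_ts - local_clock.now()   # audio delay time
    if delay > 0:
        player.wait(delay)                    # hold playback by the delay
    elif delay < 0:
        audio_stream.discard(abs(delay))      # drop |delay| worth of audio data
    player.play(audio_stream)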
As an example, the first terminal or the third terminal plays the audio stream by: determining a channel of a speaker for playing an audio stream; rendering the audio data analyzed from the audio stream in the channel azimuth to obtain channel metadata of the audio data; determining spatial position information of audio of each channel in the audio data based on the channel metadata; according to the sound channel and space position information of the loudspeaker, carrying out sound channel conversion on the sound channel audio to obtain converted sound channel audio; generating an audio file adapted to the channel of the speaker based on the converted channel audio; the audio file is played through the channels of the speakers.
In this exemplary embodiment, the configuration and executable functions of the first terminal, the second terminal, the third terminal, the first server, the second server, and the like are the same as those of the embodiment described above with reference to fig. 1 to 11, and thus, a detailed description thereof will be omitted.
Fig. 13 is a schematic flowchart of another example of an audio-video stream synchronized playback method according to an exemplary embodiment of the present disclosure. The audio and video stream synchronous playing method may be applied to the first service end, and the audio and video stream synchronous playing method may be applied, for example, but not limited to, in the step of playing the audio stream synchronously with the video stream in the audio playing method described above with reference to fig. 12.
As shown in fig. 13, the audio/video stream synchronous playing method may include the following steps:
in step S1310, an audio stream may be transmitted to the first terminal, and a video stream may be transmitted to the second terminal, where the video stream corresponds to the audio stream, the audio stream includes audio data and a standard timestamp, and the video stream includes video data and a standard timestamp.
In step S1320, the video delay time of the second terminal playing the video stream determined by the second terminal may be transmitted to the first terminal, so that the first terminal plays the audio stream in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time of the first terminal playing the audio stream, where the video delay time is a difference between the standard timestamp and the local timestamp of the second terminal, and the audio delay time is a difference between the standard timestamp and the local timestamp of the first terminal.
As an example, the step of the first terminal playing the audio stream in synchronization with the second terminal playing the video stream, based on the video delay time and the audio delay time of the first terminal playing the audio stream, may include: performing difference processing on the audio delay time and the video delay time to obtain the audio-video asynchronous time; playing the audio stream after waiting for a time equal to the audio-video asynchronous time in response to the audio-video asynchronous time being greater than 0; and, in response to the audio-video asynchronous time being less than 0, discarding from the audio stream audio data with a duration equal to the absolute value of the audio-video asynchronous time, and playing the audio stream after the discard.
As an example, the audio-video stream synchronous playing method may further include: performing difference processing on the audio delay time and the video delay time to obtain the audio-video asynchronous time; sending the video delay time to the first terminal at a first time interval in response to the audio-video asynchronous time meeting a first preset condition; sending the video delay time to the first terminal at a second time interval in response to the audio-video asynchronous time meeting a second preset condition; and sending the video delay time to the first terminal at a third time interval in response to the audio-video asynchronous time meeting a third preset condition.
Here, the first preset condition is: the audio-video asynchronous time is within a first preset interval, or, within a preset duration, the number of times the audio-video asynchronous time falls outside the first preset interval is lower than a preset number of times. The second preset condition is: the audio-video asynchronous time obtained in two consecutive synchronization processings is greater than or equal to the upper limit of the first preset interval or less than or equal to its lower limit. The third preset condition is: the audio-video asynchronous time obtained in two consecutive synchronization processings is greater than or equal to a first threshold or less than or equal to a second threshold, where the first threshold is greater than the upper limit of the first preset interval and the second threshold is less than its lower limit.
Here, the first time interval is greater than the second time interval, and the second time interval is greater than the third time interval.
As an example, the audio/video stream synchronous playing method may further include: the first terminal responds to receiving an audio waiting instruction input by a user, and plays an audio stream after waiting for a duration corresponding to the waiting time in the audio waiting instruction; and the first terminal discards the audio data with the duration corresponding to the discarding time in the audio discarding instruction from the audio stream in response to receiving the audio discarding instruction input by the user, and plays the audio stream after discarding the audio data.
As an example, the audio/video stream synchronous playing method may further include: receiving a video identity of a video stream being played by a second terminal, which is acquired by a first terminal, from the first terminal; transmitting an audio play address of an audio stream corresponding to the video identity to a first terminal; the step of transmitting the audio stream to the first terminal is performed based on the audio play address.
In this exemplary embodiment, the configuration and executable functions of the first terminal, the second terminal, the third terminal, the first server, the second server, and the like are the same as those of the embodiment described above with reference to fig. 1 to 11, and thus, a detailed description thereof will be omitted.
Fig. 14 is a schematic flowchart of still another example of an audio playing method according to an exemplary embodiment of the present disclosure. The audio playing method may be applied to a third terminal.
As shown in fig. 14, the audio playing method may include the steps of:
in step S1410, audio playing information may be received from the second server. The audio playing information is sent to the second server by the first terminal and includes an audio playing address, or the audio playing address together with the audio playing progress of the first terminal. The audio playing address is acquired by the first terminal as follows: the first terminal acquires the video identity of the video stream being played by the second terminal, sends the video identity to the first server, and acquires from the first server the audio playing address of the audio stream corresponding to the video identity.
In step S1420, an audio stream may be acquired from the first server based on the audio play information, and played in synchronization with the second terminal playing the video stream.
As an example, the step of the first terminal sending the video identity to the first server, and obtaining, from the first server, an audio play address of an audio stream corresponding to the video stream includes:
generating a first playing request for acquiring playing information of the second terminal based on the video identity, wherein the first playing request comprises the video identity and the position information of the first terminal;
the first playing request is sent to a first service end, so that the first service end determines a second terminal nearest to the first terminal based on the first playing request;
receiving video playing information from a first service end, wherein the video playing information comprises an identity of a second terminal and an identity of an audio stream corresponding to the video stream;
generating a second playing request for acquiring an audio playing address based on the video playing information, wherein the second playing request comprises an identity of a second terminal and an identity of an audio stream;
and sending the second playing request to the first service end, and acquiring the audio playing address from the first service end.
As an example, audio data and standard time stamps common to video streams are included in the audio stream. In this example, the step of playing the audio stream by the first terminal or the third terminal in synchronization with the playing of the video stream by the second terminal may include: decoding the audio stream to obtain a standard time stamp; performing difference processing on the standard time stamp and the local time stamp to obtain audio delay time; the audio stream is played in synchronization with the second terminal playing the video stream based on the audio delay time.
As an example, the step of playing the audio stream in synchronization with the playing of the video stream by the second terminal based on the audio delay time may include: in response to the audio delay time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time; in response to the audio delay time being less than 0, discarding from the audio stream audio data with a duration equal to the absolute value of the audio delay time, and playing the audio stream after the discard.
As an example, the first terminal or the third terminal plays the audio stream by: determining a channel of a speaker for playing an audio stream; rendering the audio data analyzed from the audio stream in the channel azimuth to obtain channel metadata of the audio data; determining spatial position information of audio of each channel in the audio data based on the channel metadata; according to the sound channel and space position information of the loudspeaker, carrying out sound channel conversion on the sound channel audio to obtain converted sound channel audio; generating an audio file adapted to the channel of the speaker based on the converted channel audio; the audio file is played through the channels of the speakers.
In this exemplary embodiment, the configuration and executable functions of the first terminal, the second terminal, the third terminal, the first server, the second server, and the like are the same as those of the embodiment described above with reference to fig. 1 to 11, and thus, a detailed description thereof will be omitted.
Fig. 15 is a schematic block diagram of an example of an audio playback device according to an exemplary embodiment of the present disclosure. Referring to fig. 15, the audio playing device is applied to a first terminal, and includes:
the identifier obtaining unit 1510 is configured to obtain a video identifier of a video stream being played by the second terminal.
The identifier sending unit 1520 is configured to send the video identifier to the first server, and obtain an audio play address of the audio stream corresponding to the video identifier from the first server.
The audio stream acquisition unit 1530 is configured to acquire an audio stream from the first server based on the audio play address, and play the audio stream in synchronization with the second terminal playing the video stream.
As an example, the identity transmission unit 1520 is further configured to: generating a first playing request for acquiring playing information of the second terminal based on the video identity, wherein the first playing request comprises the video identity and the position information of the first terminal; the first playing request is sent to a first service end, so that the first service end determines a second terminal nearest to the first terminal based on the first playing request; receiving video playing information from a first service end, wherein the video playing information comprises an identity of a second terminal and an identity of an audio stream corresponding to the video stream; generating a second playing request for acquiring an audio playing address based on the video playing information, wherein the second playing request comprises an identity of a second terminal and an identity of an audio stream; and sending the second playing request to the first service end, and acquiring the audio playing address from the first service end.
As an example, the audio playing device is further configured to perform:
transmitting the audio playing information to the second server for the second server to transmit the audio playing information to the third terminal, allowing the third terminal to acquire the audio stream from the first server based on the audio playing information and play the audio stream in synchronization with the playing of the video stream by the second terminal,
the audio playing information comprises an audio playing address or comprises the audio playing address and the audio playing progress of the first terminal.
As an example, the audio stream includes audio data and a standard timestamp shared with the video stream, where the first terminal or the third terminal plays the audio stream in synchronization with the second terminal playing the video stream, including: decoding the audio stream to obtain a standard time stamp; performing difference processing on the standard time stamp and the local time stamp to obtain audio delay time; the audio stream is played in synchronization with the second terminal playing the video stream based on the audio delay time.
As an example, playing an audio stream in synchronization with playing a video stream by a second terminal based on an audio delay time includes: in response to the audio delay time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time; in response to the audio delay time being less than 0, discarding from the audio stream audio data with a duration equal to the absolute value of the audio delay time, and playing the audio stream after the discard.
As an example, the first terminal or the third terminal plays the audio stream by: determining a channel of a speaker for playing an audio stream; rendering the audio data analyzed from the audio stream in the channel azimuth to obtain channel metadata of the audio data; determining spatial position information of audio of each channel in the audio data based on the channel metadata; according to the sound channel and space position information of the loudspeaker, carrying out sound channel conversion on the sound channel audio to obtain converted sound channel audio; generating an audio file adapted to the channel of the speaker based on the converted channel audio; the audio file is played through the channels of the speakers.
Fig. 16 is a schematic block diagram of another example of an audio playback device according to an exemplary embodiment of the present disclosure. Referring to fig. 16, the audio playing device is applied to a first service end, and includes:
the identity receiving unit 1610 is configured to receive, from the first terminal, a video identity of a video stream being played by the second terminal, which is acquired by the first terminal.
The address transmitting unit 1620 is configured to transmit an audio play address of an audio stream corresponding to the video identification to the first terminal.
The audio stream transmitting unit 1630 is configured to transmit an audio stream to the first terminal based on the audio play address, so that the first terminal plays the audio stream in synchronization with the second terminal playing the video stream.
As an example, the address transmitting unit 1620 is further configured to: receiving a first play request generated by a first terminal based on a video identity from the first terminal, wherein the first play request is used for acquiring play information of a second terminal, and the first play request comprises the video identity and the position information of the first terminal; determining a second terminal nearest to the first terminal based on the first play request; transmitting video playing information to a first terminal, wherein the video playing information comprises an identity of a second terminal and playing parameters of a video stream; receiving a second play request generated by the first terminal based on the video play information from the first terminal, wherein the second play request is used for acquiring an audio play address, and the second play request comprises an identity of the second terminal and an identity of an audio stream; and transmitting the audio playing address to the first terminal based on the second playing request.
As an example, the audio playback apparatus is further configured to: receiving audio playing information sent to the third terminal by the second service end from the third terminal, wherein the audio playing information is sent to the second service end by the first terminal, and the audio playing information comprises an audio playing address or comprises the audio playing address and the audio playing progress of the first terminal; and transmitting the audio stream to the third terminal based on the audio playing information so that the third terminal plays the audio stream synchronously with the video stream played by the second terminal.
As an example, the audio stream includes audio data and a standard timestamp shared with the video stream, where the first terminal or the third terminal plays the audio stream in synchronization with the second terminal playing the video stream, including: decoding the audio stream to obtain a standard time stamp; performing difference processing on the standard time stamp and the local time stamp to obtain audio delay time; the audio stream is played in synchronization with the second terminal playing the video stream based on the audio delay time.
As an example, playing an audio stream in synchronization with playing a video stream by a second terminal based on an audio delay time includes: in response to the audio delay time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time; in response to the audio delay time being less than 0, discarding from the audio stream audio data with a duration equal to the absolute value of the audio delay time, and playing the audio stream after the discard.
As an example, the first terminal or the third terminal plays the audio stream by: determining a channel of a speaker for playing an audio stream; rendering the audio data analyzed from the audio stream in the channel azimuth to obtain channel metadata of the audio data; determining spatial position information of audio of each channel in the audio data based on the channel metadata; according to the sound channel and space position information of the loudspeaker, carrying out sound channel conversion on the sound channel audio to obtain converted sound channel audio; generating an audio file adapted to the channel of the speaker based on the converted channel audio; the audio file is played through the channels of the speakers.
Fig. 17 is a schematic block diagram of still another example of an audio playback device according to an exemplary embodiment of the present disclosure. Referring to fig. 17, the audio playing device is applied to a third terminal, which includes:
the information receiving unit 1710 is configured to receive audio playing information from the second server, where the audio playing information is sent to the second server by the first terminal, the audio playing information includes an audio playing address, or includes the audio playing address and an audio playing progress of the first terminal, the audio playing address is acquired by the first terminal, the first terminal acquires a video identity of a video stream being played by the second terminal, sends the video identity to the first server, and acquires an audio playing address of an audio stream corresponding to the video identity from the first server;
the acquisition and playback unit 1720 is configured to acquire an audio stream from the first server based on the audio playback information, and play the audio stream in synchronization with the second terminal playing the video stream.
As an example, the first terminal sends the video identity to the first server, and obtains an audio play address of an audio stream corresponding to the video stream from the first server, including: generating a first playing request for acquiring playing information of the second terminal based on the video identity, wherein the first playing request comprises the video identity and the position information of the first terminal; the first playing request is sent to a first service end, so that the first service end determines a second terminal nearest to the first terminal based on the first playing request; receiving video playing information from a first service end, wherein the video playing information comprises an identity of a second terminal and an identity of an audio stream corresponding to the video stream; generating a second playing request for acquiring an audio playing address based on the video playing information, wherein the second playing request comprises an identity of a second terminal and an identity of an audio stream; and sending the second playing request to the first service end, and acquiring the audio playing address from the first service end.
As an example, the audio stream includes audio data and a standard timestamp shared with the video stream, where the first terminal or the third terminal plays the audio stream in synchronization with the second terminal playing the video stream, including: decoding the audio stream to obtain a standard time stamp; performing difference processing on the standard time stamp and the local time stamp to obtain audio delay time; the audio stream is played in synchronization with the second terminal playing the video stream based on the audio delay time.
As an example, playing an audio stream in synchronization with playing a video stream by a second terminal based on an audio delay time includes: in response to the audio delay time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time; in response to the audio delay time being less than 0, discarding from the audio stream audio data with a duration equal to the absolute value of the audio delay time, and playing the audio stream after the discard.
As an example, the first terminal or the third terminal plays the audio stream by: determining a channel of a speaker for playing an audio stream; rendering the audio data analyzed from the audio stream in the channel azimuth to obtain channel metadata of the audio data; determining spatial position information of audio of each channel in the audio data based on the channel metadata; according to the sound channel and space position information of the loudspeaker, carrying out sound channel conversion on the sound channel audio to obtain converted sound channel audio; generating an audio file adapted to the channel of the speaker based on the converted channel audio; the audio file is played through the channels of the speakers.
Fig. 18 is a schematic block diagram of an example of an audio-video stream synchronized playback device according to an exemplary embodiment of the present disclosure. Referring to fig. 18, the audio and video stream synchronized playback apparatus is applied to a first terminal, and includes:
the obtaining unit 1810 is configured to obtain an audio stream from a first service end, where the audio stream corresponds to a video stream being played by a second terminal, the second terminal receives the video stream from the first service end, the audio stream includes audio data and a standard timestamp, and the video stream includes video data and a standard timestamp.
The receiving unit 1820 is configured to receive, from the first service end, a video delay time of playing the video stream by the second terminal, which is determined by the second terminal, wherein the video delay time is a difference between a standard timestamp and a local timestamp of the second terminal.
The determining unit 1830 is configured to determine an audio delay time for the first terminal to play the audio stream based on a difference between the standard time stamp and the local time stamp of the first terminal.
The playback unit 1840 is configured to play back the audio stream in synchronization with the playback of the video stream by the second terminal based on the video delay time and the audio delay time.
As an example, the playback unit 1840 is further configured to: performing difference processing on the audio delay time and the video delay time to obtain the audio-video asynchronous time; playing the audio stream after waiting for a time equal to the audio-video asynchronous time in response to the audio-video asynchronous time being greater than 0; and, in response to the audio-video asynchronous time being less than 0, discarding from the audio stream audio data with a duration equal to the absolute value of the audio-video asynchronous time, and playing the audio stream after the discard.
As an example, the audio-video stream synchronized playback apparatus is further configured to: performing difference processing on the audio delay time and the video delay time to obtain the audio-video asynchronous time; receiving the video delay time from the first service end at a first time interval in response to the audio-video asynchronous time meeting a first preset condition; receiving the video delay time from the first service end at a second time interval in response to the audio-video asynchronous time meeting a second preset condition; and receiving the video delay time from the first service end at a third time interval in response to the audio-video asynchronous time meeting a third preset condition. Here, the first preset condition is: the audio-video asynchronous time is within a first preset interval, or, within a preset duration, the number of times the audio-video asynchronous time falls outside the first preset interval is lower than a preset number of times. The second preset condition is: the audio-video asynchronous time obtained in two consecutive synchronization processings is greater than or equal to the upper limit of the first preset interval or less than or equal to its lower limit. The third preset condition is: the audio-video asynchronous time obtained in two consecutive synchronization processings is greater than or equal to a first threshold or less than or equal to a second threshold, where the first threshold is greater than the upper limit of the first preset interval and the second threshold is less than its lower limit. The first time interval is greater than the second time interval, which is greater than the third time interval.
As an example, the audio-video stream synchronized playback apparatus is further configured to: responding to the audio waiting instruction input by the user, and playing the audio stream after waiting for a duration corresponding to the waiting time in the audio waiting instruction; and in response to receiving an audio discarding instruction input by a user, discarding audio data with a duration corresponding to the discarding time in the audio discarding instruction from the audio stream, and playing the audio stream after discarding the audio data.
As an example, the obtaining unit 1810 is configured to obtain an audio stream from a first server by: acquiring a video identity of a video stream being played by a second terminal; the video identification is sent to a first server, and an audio playing address of an audio stream corresponding to the video identification is obtained from the first server; and acquiring an audio stream from the first server based on the audio playing address.
Fig. 19 is a schematic block diagram of another example of an audio-video stream synchronized playback device according to an exemplary embodiment of the present disclosure. Referring to fig. 19, the audio/video stream synchronous playing device is applied to a first service end, and includes:
the stream transmitting unit 1910 is configured to transmit an audio stream to the first terminal and a video stream to the second terminal, wherein the video stream corresponds to the audio stream, the audio stream includes audio data and a standard time stamp, and the video stream includes video data and a standard time stamp.
The time transmitting unit 1920 is configured to transmit the video delay time of the second terminal playing the video stream determined by the second terminal to the first terminal for the first terminal to play the audio stream in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time of the first terminal playing the audio stream, the video delay time being a difference between the standard time stamp and the local time stamp of the second terminal, and the audio delay time being a difference between the standard time stamp and the local time stamp of the first terminal.
As an example, the step of the first terminal playing the audio stream in synchronization with the second terminal playing the video stream, based on the video delay time and the audio delay time of the first terminal playing the audio stream, may include: performing difference processing on the audio delay time and the video delay time to obtain the audio-video asynchronous time; playing the audio stream after waiting for a time equal to the audio-video asynchronous time in response to the audio-video asynchronous time being greater than 0; and, in response to the audio-video asynchronous time being less than 0, discarding from the audio stream audio data with a duration equal to the absolute value of the audio-video asynchronous time, and playing the audio stream after the discard.
As an example, the audio-video stream synchronized playback apparatus is further configured to: performing difference processing on the audio delay time and the video delay time to obtain the audio-video asynchronous time; sending the video delay time to the first terminal at a first time interval in response to the audio-video asynchronous time meeting a first preset condition; sending the video delay time to the first terminal at a second time interval in response to the audio-video asynchronous time meeting a second preset condition; and sending the video delay time to the first terminal at a third time interval in response to the audio-video asynchronous time meeting a third preset condition. Here, the first preset condition is: the audio-video asynchronous time is within a first preset interval, or, within a preset duration, the number of times the audio-video asynchronous time falls outside the first preset interval is lower than a preset number of times. The second preset condition is: the audio-video asynchronous time obtained in two consecutive synchronization processings is greater than or equal to the upper limit of the first preset interval or less than or equal to its lower limit. The third preset condition is: the audio-video asynchronous time obtained in two consecutive synchronization processings is greater than or equal to a first threshold or less than or equal to a second threshold, where the first threshold is greater than the upper limit of the first preset interval and the second threshold is less than its lower limit. The first time interval is greater than the second time interval, which is greater than the third time interval.
As an example, the audio-video stream synchronized playback apparatus is further configured to: the first terminal responds to receiving an audio waiting instruction input by a user, and plays an audio stream after waiting for a duration corresponding to the waiting time in the audio waiting instruction; and the first terminal discards the audio data with the duration corresponding to the discarding time in the audio discarding instruction from the audio stream in response to receiving the audio discarding instruction input by the user, and plays the audio stream after discarding the audio data.
As an example, the audio-video stream synchronized playback apparatus is further configured to: receiving a video identity of a video stream being played by a second terminal, which is acquired by a first terminal, from the first terminal; transmitting an audio play address of an audio stream corresponding to the video identity to a first terminal; the step of transmitting the audio stream to the first terminal is performed based on the audio play address.
With respect to the audio playback apparatus and the audio-video stream synchronous playback apparatus in the above-described embodiments, the specific manner in which the respective units perform the operations has been described in detail in the embodiments regarding the method, and will not be described in detail here.
Fig. 20 is a block diagram of an electronic device, according to an example embodiment. As shown in fig. 20, the electronic device 1000 includes a processor 101 and a memory 102 for storing processor-executable instructions. Here, the processor executable instructions, when executed by the processor, cause the processor to perform the audio playing method or the audio-video stream synchronous playing method as described in the above exemplary embodiments.
By way of example, the electronic device 1000 need not be a single device, but may be any means or collection of circuits capable of executing the above-described instructions (or sets of instructions) alone or in combination. The electronic device 1000 may also be part of an integrated control system or system manager, or may be configured as a server that interfaces either locally or remotely (e.g., via wireless transmission).
In electronic device 1000, processor 101 may include a Central Processing Unit (CPU), a Graphics Processor (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example and not limitation, processor 101 may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
The processor 101 may execute instructions or code stored in the memory 102, wherein the memory 102 may also store data. The instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory 102 may be integrated with the processor 101, for example, RAM or flash memory disposed within an integrated circuit microprocessor or the like. In addition, the memory 102 may include a stand-alone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The memory 102 and the processor 101 may be operatively coupled or may communicate with each other, for example, through an I/O port, a network connection, etc., such that the processor 101 is able to read files stored in the memory 102.
In addition, the electronic device 1000 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 1000 may be connected to each other via buses and/or networks.
In an exemplary embodiment, a computer readable storage medium may also be provided, storing instructions that, when executed by a processor of a server, enable the server to perform the audio playing method or the audio-video stream synchronous playing method described in the above exemplary embodiments. The computer readable storage medium may be, for example, a memory including instructions; alternatively, it may be: read-only memory (ROM), random-access memory (RAM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, nonvolatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disk storage, hard disk drives (HDD), solid state disks (SSD), card memory (such as multimedia cards, Secure Digital (SD) cards, or eXtreme Digital (xD) cards), magnetic tape, floppy disks, magneto-optical data storage, hard disks, solid state disks, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the program. The computer programs in the above computer readable storage media can run in an environment deployed on computer devices such as clients, hosts, proxy devices, and servers; furthermore, in one example, the computer program and any associated data, data files, and data structures may be distributed across networked computer systems so that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
In an exemplary embodiment, a computer program product may also be provided, comprising computer instructions that, when executed by a processor, implement the audio playing method or the audio-video stream synchronous playing method described in the above exemplary embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following its general principles and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (22)

1. An audio playing method, applied to a first terminal, comprising:
acquiring a video identity of a video stream being played by a second terminal, wherein the video identity is a unique identity corresponding to the video stream;
sending the video identity to a first service end, and acquiring, from the first service end, an audio playing address of an audio stream corresponding to the video identity;
acquiring the audio stream from the first service end based on the audio playing address, and playing the audio stream in synchronization with the second terminal playing the video stream, wherein the first service end is a server for pushing the video stream to the second terminal and pushing the audio stream to the first terminal, the audio stream comprises audio data and a standard timestamp shared with the video stream, and the audio stream is a spatial audio live stream generated based on an audio definition model,
wherein playing the audio stream in synchronization with the playing of the video stream by the second terminal includes:
receiving, from the first service end at a time interval of a preset timing task, a video delay time, determined by the second terminal, of the second terminal playing the video stream, wherein the video delay time is the difference between the standard timestamp and a local timestamp of the second terminal;
determining an audio delay time of the first terminal playing the audio stream based on the difference between the standard timestamp and a local timestamp of the first terminal;
playing the audio stream in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time,
wherein, while the first terminal receives the video delay time from the first service end, the time interval of the timing task is adjusted in real time according to an audio-video asynchronous time, the audio-video asynchronous time being the difference between the audio delay time and the video delay time.
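For concreteness, a minimal Python sketch of the synchronization loop recited above. Everything named here — the injected callables, the interval-adjustment policy — is an illustrative assumption, not something the claim fixes:

    import time

    class SyncController:
        # Illustrative sketch of the claim-1 synchronization loop. The injected
        # callables and the interval-adjustment policy are assumptions, not
        # anything the claim fixes.

        def __init__(self, fetch_video_delay, read_audio_clock,
                     poll_interval=1.0, min_interval=0.2, max_interval=5.0):
            self.fetch_video_delay = fetch_video_delay  # queries the first service end
            self.read_audio_clock = read_audio_clock    # -> (standard_ts, local_ts), seconds
            self.poll_interval = poll_interval          # timing-task interval, adjusted below
            self.min_interval = min_interval
            self.max_interval = max_interval

        def step(self):
            # Video delay time: standard timestamp minus the second terminal's local
            # timestamp, computed on the second terminal and relayed by the service end.
            video_delay = self.fetch_video_delay()
            # Audio delay time: standard timestamp minus this terminal's local timestamp.
            standard_ts, local_ts = self.read_audio_clock()
            audio_delay = standard_ts - local_ts
            # Audio-video asynchronous time: difference of the two delay times.
            av_skew = audio_delay - video_delay
            # Adjust the timing-task interval in real time: poll faster while the
            # streams drift apart (this particular mapping is an assumed policy).
            self.poll_interval = min(self.max_interval,
                                     max(self.min_interval, 1.0 / (1.0 + abs(av_skew))))
            return av_skew

        def run(self, apply_correction):
            while True:
                apply_correction(self.step())  # e.g. wait or drop audio, as in claim 5
                time.sleep(self.poll_interval)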
2. The audio playing method according to claim 1, wherein sending the video identity to the first service end, and acquiring from the first service end the audio playing address of the audio stream corresponding to the video identity, includes:
generating a first playing request for acquiring playing information of the second terminal based on the video identity and the current position of the first terminal, wherein the first playing request comprises the video identity and the position information of the first terminal;
sending the first playing request to the first service end, so that the first service end determines the second terminal nearest to the first terminal based on the first playing request;
receiving video playing information from the first service end;
generating a second playing request for acquiring the audio playing address based on the video playing information, wherein the second playing request comprises an identity of the second terminal and an identity of the audio stream;
sending the second playing request to the first service end, and acquiring the audio playing address from the first service end,
wherein the video playing information comprises an identity of the second terminal, an identity of the audio stream corresponding to the video stream, a name of the second terminal, and the video delay time of the second terminal playing the video stream.
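A sketch of the two-request exchange in claim 2, assuming an HTTP transport. The endpoints and field names below are hypothetical; the claim itself does not fix a wire format:

    import requests  # assumed transport; the claim does not prescribe HTTP

    def fetch_audio_playing_address(base_url, video_id, position):
        # First playing request: the video identity plus the first terminal's
        # position; the first service end picks the nearest second terminal and
        # answers with the video playing information.
        video_play_info = requests.post(
            f"{base_url}/play-info",  # hypothetical endpoint
            json={"video_id": video_id, "position": position},
        ).json()
        # Assumed fields mirroring the claim: second-terminal identity, audio
        # stream identity, second-terminal name, and its video delay time.
        second_request = {
            "terminal_id": video_play_info["terminal_id"],
            "audio_stream_id": video_play_info["audio_stream_id"],
        }
        # Second playing request: answered with the audio playing address.
        reply = requests.post(f"{base_url}/audio-address",  # hypothetical endpoint
                              json=second_request).json()
        return reply["audio_playing_address"], video_play_info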
3. The audio playing method according to claim 1, wherein the first service end comprises a first sub-service end and a second sub-service end, the first sub-service end being used for pushing the audio stream corresponding to the video stream, and the second sub-service end being used for pushing the video stream to the second terminal.
4. The audio playing method according to claim 2, characterized in that the audio playing method further comprises:
transmitting audio playing information to a second service end, for the second service end to transmit the audio playing information to a third terminal, so that the third terminal acquires the audio stream from the first service end based on the audio playing information and plays the audio stream in synchronization with the second terminal playing the video stream,
wherein the audio playing information comprises the audio playing address; or the audio playing address and the audio playing progress of the first terminal; or the audio playing address, the audio playing progress of the first terminal, and the video playing information received by the first terminal from the first service end.
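The three allowed shapes of the shared audio playing information can be captured in a few lines; the field names are hypothetical:

    def build_share_payload(audio_playing_address,
                            audio_playing_progress=None,
                            video_playing_info=None):
        # Claim 4 allows the payload to carry the address alone, the address plus
        # the first terminal's playing progress, or both plus the video playing
        # information received from the first service end.
        payload = {"audio_playing_address": audio_playing_address}
        if audio_playing_progress is not None:
            payload["audio_playing_progress"] = audio_playing_progress
            if video_playing_info is not None:
                payload["video_playing_info"] = video_playing_info
        return payload  # handed to the second service end for relay to the third terminal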
5. The audio playing method according to claim 4, wherein the third terminal plays the audio stream in synchronization with the second terminal playing the video stream, comprising:
decoding the audio stream to obtain the standard timestamp;
taking the difference between the standard timestamp and the local timestamp to obtain an audio delay time;
playing the audio stream in synchronization with the second terminal playing the video stream based on the audio delay time,
wherein playing the audio stream in synchronization with the second terminal playing the video stream based on the audio delay time includes:
in response to the audio delay time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time;
in response to the audio delay time being less than 0, discarding from the audio stream audio data whose duration is equal to the absolute value of the audio delay time, and playing the audio stream after the audio data is discarded.
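The wait-or-discard branch reads directly as code; `stream` here is an assumed object exposing discard() and play():

    import time

    def align_and_play(audio_delay, stream):
        # Audio delay time greater than 0: wait for an equal time, then play.
        # Less than 0: discard audio data of equal duration, then play the rest.
        if audio_delay > 0:
            time.sleep(audio_delay)
        elif audio_delay < 0:
            stream.discard(seconds=-audio_delay)  # drop |audio_delay| seconds of audio data
        stream.play()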
6. The audio playing method according to claim 5, wherein the first terminal or the third terminal plays the audio stream by:
determining a channel of a speaker for playing the audio stream;
performing channel-orientation rendering on the audio data parsed from the audio stream to obtain channel metadata of the audio data;
determining spatial position information of the audio of each channel in the audio data based on the channel metadata;
performing channel conversion on the channel audio according to the channel of the speaker and the spatial position information to obtain converted channel audio;
generating an audio file adapted to the channel of the speaker based on the converted channel audio;
and playing the audio file through the channel of the speaker.
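The claim-6 pipeline, sketched with the rendering stages passed in as callables, since the claim names the steps but not any concrete renderer; every callable here is a hypothetical stand-in:

    from typing import Any, Callable

    def play_through_speaker_channels(
        audio_stream: Any,
        speaker_channels: list,
        parse_audio: Callable,          # parses the audio data out of the stream
        render_orientation: Callable,   # channel-orientation rendering -> channel metadata
        spatial_positions: Callable,    # channel metadata -> per-channel spatial positions
        convert_channels: Callable,     # channel conversion to the speaker layout
        build_audio_file: Callable,     # builds a file adapted to the speaker channels
        play_file: Callable,            # plays the file through the speaker channels
    ) -> None:
        # The stages run in the order the claim recites them.
        audio_data = parse_audio(audio_stream)
        metadata = render_orientation(audio_data)
        positions = spatial_positions(metadata)
        converted = convert_channels(audio_data, positions, speaker_channels)
        audio_file = build_audio_file(converted, speaker_channels)
        play_file(audio_file, speaker_channels)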
7. An audio playing method, applied to a first service end, comprising:
receiving, from a first terminal, a video identity, acquired by the first terminal, of a video stream being played by a second terminal, wherein the video identity is a unique identity corresponding to the video stream;
transmitting an audio playing address of an audio stream corresponding to the video identity to the first terminal;
transmitting the audio stream to the first terminal based on the audio playing address, so that the first terminal plays the audio stream in synchronization with the second terminal playing the video stream, wherein the first service end is a server for pushing the video stream to the second terminal and pushing the audio stream to the first terminal, the audio stream comprises audio data and a standard timestamp shared with the video stream, and the audio stream is a spatial audio live stream generated based on an audio definition model,
wherein the playing of the audio stream by the first terminal in synchronization with the playing of the video stream by the second terminal comprises:
the first terminal receives, from the first service end at a time interval of a preset timing task, a video delay time, determined by the second terminal, of the second terminal playing the video stream, wherein the video delay time is the difference between the standard timestamp and a local timestamp of the second terminal;
the first terminal determines an audio delay time of the first terminal playing the audio stream based on the difference between the standard timestamp and a local timestamp of the first terminal;
the first terminal plays the audio stream in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time,
and while the first terminal receives the video delay time from the first service end, the time interval of the timing task is adjusted in real time according to an audio-video asynchronous time, the audio-video asynchronous time being the difference between the audio delay time and the video delay time.
8. The audio playing method according to claim 7, wherein sending the audio playing address of the audio stream corresponding to the video identity to the first terminal includes:
receiving, from the first terminal, a first playing request generated by the first terminal based on the video identity and the current position of the first terminal, wherein the first playing request is used for acquiring playing information of the second terminal, and the first playing request comprises the video identity and position information of the first terminal;
determining the second terminal nearest to the first terminal based on the first play request;
transmitting video playing information to the first terminal;
receiving, from the first terminal, a second playing request generated by the first terminal based on the video playing information, wherein the second playing request is used for acquiring the audio playing address, and the second playing request comprises an identity of the second terminal and an identity of the audio stream;
transmitting the audio playing address to the first terminal based on the second playing request,
wherein the video playing information comprises an identity of the second terminal, an identity of the audio stream corresponding to the video stream, a name of the second terminal, and the video delay time of the second terminal playing the video stream.
9. The audio playing method according to claim 7, wherein the first service end comprises a first sub-service end and a second sub-service end, the first sub-service end being used for pushing the audio stream corresponding to the video stream, and the second sub-service end being used for pushing the video stream to the second terminal.
10. The audio playing method according to claim 8, characterized in that the audio playing method further comprises:
receiving, from a third terminal, audio playing information sent to the third terminal by a second service end, wherein the audio playing information is sent to the second service end by the first terminal, and the audio playing information comprises the audio playing address; or the audio playing address and the audio playing progress of the first terminal; or the audio playing address, the audio playing progress of the first terminal, and the video playing information received by the first terminal from the first service end;
and transmitting the audio stream to the third terminal based on the audio playing information, so that the third terminal plays the audio stream in synchronization with the second terminal playing the video stream.
11. The audio playing method according to claim 10, wherein the third terminal plays the audio stream in synchronization with the second terminal playing the video stream, comprising:
decoding the audio stream to obtain the standard timestamp;
taking the difference between the standard timestamp and the local timestamp to obtain an audio delay time;
playing the audio stream in synchronization with the second terminal playing the video stream based on the audio delay time,
wherein playing the audio stream in synchronization with the second terminal playing the video stream based on the audio delay time includes:
in response to the audio delay time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time;
in response to the audio delay time being less than 0, discarding from the audio stream audio data whose duration is equal to the absolute value of the audio delay time, and playing the audio stream after the audio data is discarded.
12. The audio playing method according to claim 11, wherein the first terminal or the third terminal plays the audio stream by:
determining a channel of a speaker for playing the audio stream;
performing channel-orientation rendering on the audio data parsed from the audio stream to obtain channel metadata of the audio data;
determining spatial position information of the audio of each channel in the audio data based on the channel metadata;
performing channel conversion on the channel audio according to the channel of the speaker and the spatial position information to obtain converted channel audio;
generating an audio file adapted to the channel of the speaker based on the converted channel audio;
and playing the audio file through the channel of the speaker.
13. An audio playing method, applied to a third terminal, comprising:
receiving audio playing information from a second service end, wherein the audio playing information is sent to the second service end by a first terminal, and the audio playing information comprises an audio playing address, or the audio playing address and the audio playing progress of the first terminal; the audio playing address is acquired by the first terminal, which acquires a video identity of a video stream being played by the second terminal, sends the video identity to a first service end, and acquires, from the first service end, the audio playing address of an audio stream corresponding to the video identity, wherein the video identity is a unique identity corresponding to the video stream;
acquiring the audio stream from the first service end based on the audio playing information, and playing the audio stream in synchronization with the second terminal playing the video stream, wherein the first service end is a server for pushing the video stream to the second terminal and pushing the audio stream to the first terminal, the audio stream comprises audio data and a standard timestamp shared with the video stream, and the audio stream is a spatial audio live stream generated based on an audio definition model,
wherein playing the audio stream in synchronization with the playing of the video stream by the second terminal includes:
receiving, from the first service end at a time interval of a preset timing task, a video delay time, determined by the second terminal, of the second terminal playing the video stream, wherein the video delay time is the difference between the standard timestamp and a local timestamp of the second terminal;
determining an audio delay time of the third terminal playing the audio stream based on the difference between the standard timestamp and a local timestamp of the third terminal;
playing the audio stream in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time,
and while the third terminal receives the video delay time from the first service end, the time interval of the timing task is adjusted in real time according to an audio-video asynchronous time, the audio-video asynchronous time being the difference between the audio delay time and the video delay time.
14. The audio playing method according to claim 13, wherein the first terminal sending the video identity to the first service end, and acquiring from the first service end the audio playing address of the audio stream corresponding to the video identity, includes:
generating a first playing request for acquiring playing information of the second terminal based on the video identity and the current position of the first terminal, wherein the first playing request comprises the video identity and the position information of the first terminal;
sending the first playing request to the first service end, so that the first service end determines the second terminal nearest to the first terminal based on the first playing request;
receiving video playing information from the first service end;
generating a second playing request for acquiring the audio playing address based on the video playing information, wherein the second playing request comprises an identity of the second terminal and an identity of the audio stream;
sending the second playing request to the first service end, and acquiring the audio playing address from the first service end,
wherein the video playing information comprises an identity of the second terminal, an identity of the audio stream corresponding to the video stream, a name of the second terminal, and the video delay time of the second terminal playing the video stream.
15. The audio playing method according to claim 13, wherein the first service end comprises a first sub-service end and a second sub-service end, the first sub-service end being used for pushing the audio stream corresponding to the video stream, and the second sub-service end being used for pushing the video stream to the second terminal.
16. The audio playing method according to claim 14, wherein the third terminal plays the audio stream in synchronization with the second terminal playing the video stream, comprising:
decoding the audio stream to obtain the standard timestamp;
taking the difference between the standard timestamp and the local timestamp to obtain an audio delay time;
playing the audio stream in synchronization with the second terminal playing the video stream based on the audio delay time,
wherein playing the audio stream in synchronization with the second terminal playing the video stream based on the audio delay time includes:
in response to the audio delay time being greater than 0, playing the audio stream after waiting for a time equal to the audio delay time;
in response to the audio delay time being less than 0, discarding from the audio stream audio data whose duration is equal to the absolute value of the audio delay time, and playing the audio stream after the audio data is discarded.
17. The audio playing method according to claim 16, wherein the first terminal or the third terminal plays the audio stream by:
determining a channel of a speaker for playing the audio stream;
performing channel-orientation rendering on the audio data parsed from the audio stream to obtain channel metadata of the audio data;
determining spatial position information of the audio of each channel in the audio data based on the channel metadata;
performing channel conversion on the channel audio according to the channel of the speaker and the spatial position information to obtain converted channel audio;
generating an audio file adapted to the channel of the speaker based on the converted channel audio;
and playing the audio file through the channel of the speaker.
18. An audio playing device applied to a first terminal, characterized in that the audio playing device comprises:
an identification acquisition unit configured to acquire a video identity of a video stream being played by a second terminal, wherein the video identity is a unique identity corresponding to the video stream;
an identification sending unit configured to send the video identity to a first service end and acquire, from the first service end, an audio playing address of an audio stream corresponding to the video identity;
an audio stream acquisition unit configured to acquire the audio stream from the first service end based on the audio playing address and play the audio stream in synchronization with the second terminal playing the video stream, wherein the first service end is a server for pushing the video stream to the second terminal and pushing the audio stream to the first terminal, the audio stream comprises audio data and a standard timestamp shared with the video stream, and the audio stream is a spatial audio live stream generated based on an audio definition model,
Wherein the audio stream acquisition unit is further configured to:
receive, from the first service end at a time interval of a preset timing task, a video delay time, determined by the second terminal, of the second terminal playing the video stream, wherein the video delay time is the difference between the standard timestamp and a local timestamp of the second terminal;
determine an audio delay time of the first terminal playing the audio stream based on the difference between the standard timestamp and a local timestamp of the first terminal;
play the audio stream in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time,
wherein, while the first terminal receives the video delay time from the first service end, the time interval of the timing task is adjusted in real time according to an audio-video asynchronous time, the audio-video asynchronous time being the difference between the audio delay time and the video delay time.
19. An audio playing device applied to a first service end, wherein the audio playing device comprises:
an identification receiving unit configured to receive, from a first terminal, a video identity of a video stream being played by a second terminal, wherein the video identity is a unique identity corresponding to the video stream;
an address sending unit configured to send an audio playing address of an audio stream corresponding to the video identity to the first terminal;
an audio stream transmission unit configured to transmit the audio stream to the first terminal based on the audio playing address, for the first terminal to play the audio stream in synchronization with the second terminal playing the video stream, wherein the first service end is a server for pushing the video stream to the second terminal and pushing the audio stream to the first terminal, the audio stream comprises audio data and a standard timestamp shared with the video stream, and the audio stream is a spatial audio live stream generated based on an audio definition model,
wherein the playing of the audio stream by the first terminal in synchronization with the playing of the video stream by the second terminal comprises:
the first terminal receives, from the first service end at a time interval of a preset timing task, a video delay time, determined by the second terminal, of the second terminal playing the video stream, wherein the video delay time is the difference between the standard timestamp and a local timestamp of the second terminal;
the first terminal determines an audio delay time of the first terminal playing the audio stream based on the difference between the standard timestamp and a local timestamp of the first terminal;
the first terminal plays the audio stream in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time,
and while the first terminal receives the video delay time from the first service end, the time interval of the timing task is adjusted in real time according to an audio-video asynchronous time, the audio-video asynchronous time being the difference between the audio delay time and the video delay time.
20. An audio playing device applied to a third terminal, characterized in that the audio playing device comprises:
an information receiving unit configured to receive audio playing information from a second service end, wherein the audio playing information is sent to the second service end by a first terminal, and the audio playing information comprises an audio playing address, or the audio playing address and the audio playing progress of the first terminal; the audio playing address is acquired by the first terminal, which acquires a video identity of a video stream being played by the second terminal, sends the video identity to a first service end, and acquires, from the first service end, the audio playing address of an audio stream corresponding to the video identity, wherein the video identity is a unique identity corresponding to the video stream;
an acquisition and playback unit configured to acquire the audio stream from the first service end based on the audio playing information and play the audio stream in synchronization with the second terminal playing the video stream, wherein the first service end is a server for pushing the video stream to the second terminal and pushing the audio stream to the first terminal, the audio stream comprises audio data and a standard timestamp shared with the video stream, and the audio stream is a spatial audio live stream generated based on an audio definition model,
wherein the acquisition and playback unit is further configured to:
receive, from the first service end at a time interval of a preset timing task, a video delay time, determined by the second terminal, of the second terminal playing the video stream, wherein the video delay time is the difference between the standard timestamp and a local timestamp of the second terminal;
determine an audio delay time of the third terminal playing the audio stream based on the difference between the standard timestamp and a local timestamp of the third terminal;
play the audio stream in synchronization with the second terminal playing the video stream based on the video delay time and the audio delay time,
wherein, while the third terminal receives the video delay time from the first service end, the time interval of the timing task is adjusted in real time according to an audio-video asynchronous time, the audio-video asynchronous time being the difference between the audio delay time and the video delay time.
21. An electronic device, the electronic device comprising:
a processor;
a memory for storing instructions executable by the processor,
wherein the instructions, when executed by the processor, cause the processor to perform the audio playing method according to any one of claims 1 to 17.
22. A computer readable storage medium, characterized in that instructions in the computer readable storage medium, when executed by a processor of a server, enable the server to perform the audio playing method according to any one of claims 1 to 17.
CN202310391590.4A 2023-02-17 2023-04-13 Audio playing method and device, electronic equipment and storage medium Active CN116112722B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310141761 2023-02-17
CN2023101417618 2023-02-17

Publications (2)

Publication Number Publication Date
CN116112722A (en) 2023-05-12
CN116112722B (en) 2023-06-27

Family

ID=86264179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310391590.4A Active CN116112722B (en) 2023-02-17 2023-04-13 Audio playing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116112722B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106792073A (en) * 2016-12-29 2017-05-31 北京奇艺世纪科技有限公司 Method, playback equipment and system that the audio, video data of striding equipment is synchronously played
CN114040237A (en) * 2021-09-30 2022-02-11 茂佳科技(广东)有限公司 Audio and video synchronous playing method, terminal, multimedia playing system and medium
CN114339290A (en) * 2021-12-30 2022-04-12 杭州当虹科技股份有限公司 Large screen management subsystem, large screen synchronous playing system and method
CN114885198A (en) * 2022-07-07 2022-08-09 中央广播电视总台 Mixed network-oriented accompanying sound and video collaborative presentation system
CN115426501A (en) * 2022-08-08 2022-12-02 浙江大华技术股份有限公司 Audio and video code stream time calibration method and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101547809B1 (en) * 2011-07-01 2015-08-27 돌비 레버러토리즈 라이쎈싱 코오포레이션 Synchronization and switchover methods and systems for an adaptive audio system

Also Published As

Publication number Publication date
CN116112722A (en) 2023-05-12

Similar Documents

Publication Publication Date Title
US11252444B2 (en) Video stream processing method, computer device, and storage medium
CN110784758B (en) Screen projection processing method and device, electronic equipment and computer program medium
RU2601446C2 (en) Terminal apparatus, server apparatus, information processing method, program and interlocked application feed system
EP3562163A1 (en) Audio-video synthesis method and system
EP2369836A2 (en) Object-based 3-dimensional audio service system using preset audio scenes
CN112261416A (en) Cloud-based video processing method and device, storage medium and electronic equipment
KR102469142B1 (en) Dynamic playback of transition frames while transitioning between media stream playbacks
US20060025998A1 (en) Information-processing apparatus, information-processing methods, recording mediums, and programs
US11956497B2 (en) Audio processing method and electronic device
US11683567B2 (en) Apparatus and method for providing audio description content
KR20050102858A (en) Interactive broadcasting system
KR101624904B1 (en) Apparatus and method for playing the multisound channel content using dlna in portable communication system
KR102090070B1 (en) Streaming server, client terminal and audio/video live streaming system using the same
CN112449208B (en) Voice processing method and device
CN112055227B (en) Cloud game interaction method, system, device, storage medium and electronic equipment
CN116112722B (en) Audio playing method and device, electronic equipment and storage medium
CN115942021B (en) Audio and video stream synchronous playing method and device, electronic equipment and storage medium
CN114554277B (en) Multimedia processing method, device, server and computer readable storage medium
CN116347140A (en) Video synchronous playing method and device, storage medium and electronic equipment
CN115550705A (en) Audio playing method and device
JP2021108441A (en) Device and method for broadcast service communication network distribution
CN112995573B (en) Video conference live broadcasting system and method
JP6508831B2 (en) Receiving device, receiving method, broadcast system and program
CN117376593A (en) Subtitle processing method and device for live stream, storage medium and computer equipment
CN114422840A (en) Audio track switching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant