CN111654736B - Method and device for determining audio and video synchronization error, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111654736B
Authority
CN
China
Prior art keywords
frame data
audio
live
video
stream
Prior art date: 2020-06-10
Legal status: Active
Application number
CN202010524709.7A
Other languages
Chinese (zh)
Other versions
CN111654736A (en)
Inventor
王伟
刘一卓
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date and filing date: 2020-06-10
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010524709.7A
Publication of CN111654736A: 2020-09-11
Application granted; publication of CN111654736B: 2022-05-31

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen

Abstract

The application discloses a method and an apparatus for determining audio and video synchronization errors, an electronic device, and a storage medium, relating to image processing and sound processing technologies in the multimedia and live broadcast fields. The specific implementation scheme is as follows: acquire a live stream address; acquire audio frame data and video frame data of a live data stream from a streaming media server according to the live stream address; determine the synchronization error between the audio and the video in the live data stream according to the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data; or, visually display the display time information corresponding to the audio frame data and to the video frame data, so that the synchronization error is determined from the displayed information. In this way, the synchronization error is either calculated directly from the display time information corresponding to the audio frame data and the video frame data of the live data stream or determined from the displayed result of that information, achieving the purpose of quantitatively presenting the synchronization error between the audio and the video in the live data stream.

Description

Method and device for determining audio and video synchronization error, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, in particular to a method and an apparatus for determining an audio and video synchronization error, an electronic device, and a storage medium, and to image processing and sound processing technologies in the multimedia and live broadcast fields.
Background
When watching live broadcasts on the major video, game, and e-commerce live streaming platforms, viewers often find that the anchor's sound and actions are out of sync in the live broadcast room. Sound and actions fall out of sync because a synchronization error arises between the time axes of the audio frames and the video frames when they are played. Solving the asynchronism problem therefore mainly involves locating the stage at which the synchronization error is introduced into the live data stream and quantifying that error. At present, however, whether the audio and the video are synchronized is judged by subjective evaluation, and the synchronization error between them cannot be quantified.
Disclosure of Invention
The embodiment of the application provides a method and a device for determining an audio and video synchronization error, electronic equipment and a storage medium, so as to achieve the purpose of quantifying the synchronization error.
In a first aspect, an embodiment of the present application provides a method for determining an audio and video synchronization error, including:
acquiring a live streaming address;
acquiring audio frame data and video frame data of a live data stream from a streaming media server according to the live stream address;
determining the synchronous error of the audio and the video in the live data stream according to the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data; or visually displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data, so as to determine the synchronization error of the audio and the video in the live data stream according to the display information.
In a second aspect, an embodiment of the present application further provides an apparatus for determining an audio/video synchronization error, including:
the UI interaction module is used for acquiring a live streaming address;
the WebSocket Server module is used for sending the live streaming address to the acquisition task module;
the acquisition task module is used for acquiring audio frame data and video frame data of the live stream from the streaming media server based on the live stream address;
the WebSocket Server module is also used for: determining the synchronous error of the audio and the video in the live data stream according to the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data;
the UI interaction module is further configured to: and visually displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data so as to determine the synchronous error of the audio and the video in the live data stream according to the display information.
In a third aspect, an embodiment of the present application further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the method for determining the audio-video synchronization error according to any embodiment of the present application.
In a fourth aspect, embodiments of the present application further provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method for determining an audio and video synchronization error according to any embodiment of the present application.
The embodiments in the above application have the following advantages or benefits. The display time information corresponding to the audio frame data and the video frame data in the live data stream is the timestamp delivered to the terminal player, and whether the two pieces of display time information are the same determines whether the live audio and video played in the terminal player are synchronized. Therefore, after the audio frame data and the video frame data of the live data stream are obtained, the synchronization error can either be calculated directly from the corresponding display time information or determined from a visual display of that information, achieving the purpose of quantitatively presenting the synchronization error between the audio and the video in the live data stream.
Other effects of the above alternatives will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1a is a schematic flow chart of a method for determining audio-video synchronization errors according to a first embodiment of the present application;
fig. 1b is a diagram illustrating an effect of performing contrast display on display time information corresponding to audio frame data and display time information corresponding to video frame data according to the first embodiment of the present application;
fig. 2a is a schematic flow chart of a method for determining audio-video synchronization errors according to a second embodiment of the present application;
fig. 2b is a schematic diagram of a live broadcast process according to a second embodiment of the present application;
fig. 3 is a process diagram of determining a synchronization error by an audio-video synchronization error determining apparatus according to a third embodiment of the present application;
fig. 4 is a schematic structural diagram of an audio-video synchronization error determination apparatus according to a fourth embodiment of the present application;
fig. 5 is a block diagram of an electronic device for implementing the method for determining audio/video synchronization error according to the embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1a is a schematic flowchart of a method for determining audio/video synchronization errors according to a first embodiment of the present application; the method is applicable to detecting the synchronization error between audio and video in a live data stream during live broadcasting. The method may be performed by an apparatus for determining audio/video synchronization errors, which may be implemented in software and/or hardware and is preferably configured in an electronic device, such as a server. Referring to fig. 1a, the method for determining the audio/video synchronization error specifically includes:
s101, acquiring a live streaming address.
And S102, acquiring audio frame data and video frame data of the live data stream from the streaming media server according to the live stream address.
The streaming media server is used for storing live streaming data; the live stream address is used to determine the storage location of the live stream data in the streaming server.
The reason the anchor's sound and actions fall out of sync during live broadcasting is as follows: when the audio frames and video frames of the live data stream are played, their time axes carry synchronization errors. Therefore, to quantify the synchronization error between audio and video in a live broadcast, the audio frame data and video frame data of the live data stream must be acquired. Optionally, the live stream address is obtained first, and the audio frame data and video frame data of the live data stream are then acquired from the streaming media server according to that address; the live stream address may be entered manually by a tester.
In the embodiment of the present application, when the synchronization error is determined from the acquired audio frame data and video frame data of the live data stream, it may be determined automatically by machine (see S103), or determined manually after the information is presented (see S104).
S103, determining the synchronous error of the audio and the video in the live data stream according to the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data.
In this embodiment, the data structure of a live stream includes timestamps: the DTS (Decoding Time Stamp) indicates the decoding time of a compressed frame, and the PTS (Presentation Time Stamp) indicates the display time of the original frame obtained after the compressed frame is decoded. For audio, the DTS and the PTS are the same. For video, B frames require bidirectional prediction and depend on the preceding and following frames, so the decoding order and the display order of video containing B frames differ, i.e., the DTS and the PTS differ; for video without B frames, the DTS and the PTS are the same. The time base is the unit of the timestamp: if a video frame has a DTS of 40 and a PTS of 160, and its time base is 1/1000 second, then the decoding time of this video frame is 40 ms and its display time is 160 ms. In the device, ffprobe can be used to analyze the frame data, where PKT_PTS is the PTS of a frame and the display time information (i.e., PKT_PTS_TIME) is the display time obtained by converting the frame's PTS through the time base, namely the timestamp delivered to the terminal player. Whether the live audio and video played in the terminal player are synchronized is therefore determined simply by judging whether the PKT_PTS_TIME corresponding to the audio frame data and the PKT_PTS_TIME corresponding to the video frame data are the same. That is, the synchronization error between the audio and the video in the live data stream can be calculated from the PKT_PTS_TIME corresponding to the audio frame data and the PKT_PTS_TIME corresponding to the video frame data.
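As an illustrative aid only, the following is a minimal sketch of extracting per-frame display time information with ffprobe, assuming FFmpeg is installed; note that the frame field is named pkt_pts_time in older FFmpeg releases and pts_time in newer ones, so the sketch checks both:

```python
import json
import subprocess

def frame_display_times(stream_url, stream_spec):
    """Return per-frame display times (seconds) for one stream.

    stream_spec: "a:0" for the first audio stream, "v:0" for video.
    The display time is the frame's PTS converted through the stream's
    time base, i.e. the timestamp the terminal player uses.
    """
    cmd = [
        "ffprobe", "-v", "quiet",
        "-select_streams", stream_spec,
        "-show_frames",
        "-of", "json",
        stream_url,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    times = []
    for frame in json.loads(out.stdout).get("frames", []):
        t = frame.get("pkt_pts_time") or frame.get("pts_time")
        if t is not None:
            times.append(float(t))
    return times

# Worked example of the time-base arithmetic from the text:
# a PTS of 160 with a time base of 1/1000 s gives 160 * (1/1000) = 0.160 s.
```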
In an optional implementation manner, determining a synchronization error between audio and video in the live data stream according to the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data includes S1031 to S1032:
and S1031, according to the arrival sequence of the audio frame data and the video frame data, pairing the audio frame data and the video frame data to obtain at least one audio and video frame pair.
S1032, for each audio and video frame pair, determining a difference value between display time information corresponding to audio frame data and display time information corresponding to video frame data contained in the audio and video frame pair, and determining a synchronization error of audio and video in the live broadcast data stream according to the difference value.
In the embodiment of the application, when the video frame data and the audio frame data are acquired from the streaming media server according to the live stream address, the streaming media server does not send them according to any fixed rule; for example, it may send several consecutive frames of audio frame data and then several consecutive frames of video frame data. Therefore, to determine whether the audio and the video in the live data stream are synchronized, the obtained audio frame data and video frame data need to be paired, that is, each piece of audio frame data must be matched with its corresponding video frame data. Optionally, the audio frame data and the video frame data are paired according to their arrival order; since the audio frame data and video frame data of the live data stream are continuously acquired from the streaming media server, at least one audio/video frame pair can be obtained after pairing by arrival order.
After each audio/video frame pair is obtained, it is judged whether the display time information corresponding to the audio frame data contained in the pair is the same as the display time information corresponding to the video frame data. If the two are the same, the audio and the video in the live data stream are determined to be synchronized; if they differ, the audio and the video are determined to be out of sync, i.e., a synchronization error exists between them. Optionally, to simplify the calculation of the synchronization error, the difference between the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data contained in the pair may be used directly as the synchronization error.
It should be noted that, by pairing the audio frame data and the video frame data and using the difference between the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data contained in each audio/video frame pair as the synchronization error, the synchronization error is quantified, and the process of calculating it is simple, convenient and accurate.
Further, a first buffer unit and a second buffer unit are preset in the apparatus for determining an audio/video synchronization error, so as to buffer audio frame data and video frame data acquired from the streaming media server.
In an optional embodiment, the pairing the audio frame data and the video frame data according to the arrival order of the audio frame data and the video frame data to obtain at least one audio/video frame pair includes: for the received current frame data sent by the streaming media server, if the current frame data is audio frame data, the audio frame data is paired with video frame data cached by a first cache unit; and if the current frame data is the video frame data, pairing the video frame data with the audio frame data cached by the second caching unit.
In the embodiment of the application, the first cache unit is used for caching recently arrived video frame data, and the second cache unit is used for caching recently arrived audio frame data, so that the current frame data and the video frame data cached by the first cache unit or the audio frame data cached by the second cache unit are paired according to the type of the newly arrived current frame data, the pairing accuracy is improved, and the audio and the video can be ensured to correspond.
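For illustration only, here is a minimal sketch of the pairing and error computation described above (S1031-S1032 together with the two-buffer refinement); the frame tuple format and the function name are hypothetical:

```python
def sync_errors(frames):
    """Pair audio and video frames in arrival order and return one
    synchronization error per audio/video frame pair, computed as the
    audio display time minus the video display time (PKT_PTS_TIME)."""
    last_video = None  # first buffer unit: most recently arrived video frame
    last_audio = None  # second buffer unit: most recently arrived audio frame
    errors = []
    for media_type, pts_time in frames:  # frames in arrival order
        if media_type == "audio":
            last_audio = pts_time
            if last_video is not None:
                errors.append(pts_time - last_video)   # audio - video
        else:  # "video"
            last_video = pts_time
            if last_audio is not None:
                errors.append(last_audio - pts_time)   # audio - video
    return errors

# An error of zero for every pair means the audio and video are synchronized.
```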
And S104, visually displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data, and determining the synchronization error of the audio and the video in the live data stream according to the display information.
In an optional embodiment, visually displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data includes S1041-S1042:
s1041, according to the arrival sequence of the audio frame data and the video frame data, pairing the audio frame data and the video frame data to obtain at least one audio and video frame pair.
For the specific pairing process, reference is made to the above contents, and details are not repeated here.
And S1042, for each audio and video frame pair, comparing and displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data contained in the audio and video frame pair.
For example, fig. 1b shows the effect of comparatively displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data, where the horizontal axis represents the audio/video frame pairs and the vertical axis represents the display time information (i.e., PKT_PTS_TIME). By comparatively displaying, for each audio/video frame pair, the display time information corresponding to the contained audio frame data and video frame data, a user can quickly and intuitively determine whether the audio and the video in the live data stream are synchronized simply by checking whether the audio-frame line and the video-frame line in fig. 1b overlap. To obtain the specific value of the synchronization error, the user can click any audio/video frame pair to view the display time information corresponding to the audio frame data and to the video frame data it contains, and the synchronization error can then be calculated from their difference.
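A minimal sketch of such a comparative display, assuming matplotlib as the charting layer (the application does not prescribe one):

```python
import matplotlib.pyplot as plt

def plot_pairs(pairs):
    """pairs: a list of (audio_time, video_time) tuples, one per
    audio/video frame pair, each value a PKT_PTS_TIME in seconds.
    Overlapping lines indicate synchronized audio and video; the
    vertical gap between the lines at any pair is the synchronization
    error for that pair."""
    idx = range(len(pairs))
    plt.plot(idx, [a for a, _ in pairs], label="audio frame PKT_PTS_TIME")
    plt.plot(idx, [v for _, v in pairs], label="video frame PKT_PTS_TIME")
    plt.xlabel("audio/video frame pair")
    plt.ylabel("display time (s)")
    plt.legend()
    plt.show()
```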
In the embodiment of the application, the display time information corresponding to the audio frame data and the video frame data in the live data stream is the timestamp delivered to the terminal player, and whether the two pieces of display time information are the same determines whether the live audio and video played in the terminal player are synchronized. Therefore, after the audio frame data and the video frame data of the live data stream are acquired, the synchronization error can either be calculated directly from the corresponding display time information or determined from a visual display of that information, achieving the purpose of quantitatively presenting the synchronization error between the audio and the video in the live data stream.
Fig. 2a is a schematic flow chart of a method for determining audio-video synchronization errors according to a second embodiment of the present application, and this embodiment is further optimized based on the above embodiments. As shown in fig. 2a, the method specifically includes the following steps:
s201, acquiring a live streaming address, wherein the live streaming address is a push streaming address or a pull streaming address.
S202, acquiring audio frame data and video frame data of the live data stream from the streaming media server according to the live stream address.
Referring to fig. 2b, which shows a schematic diagram of the live broadcast process, live broadcasting involves a push stream end and a pull stream end. When the anchor starts broadcasting, the push stream end opens a push stream to the streaming media server and pushes video and audio data (collectively, live stream data) to it (for example, to the rtmp module of Nginx); the play end (i.e., the pull stream end) obtains the audio and video frame data from the streaming media server through a given address and then plays the audio and video synchronously with a player, presenting them to the users watching the live broadcast.
Further, according to the above live broadcast process, when the audio and video the user finally sees are out of sync, the stage at which the error was introduced falls roughly into one of three stages: the push stream stage, the pull stream stage, and the play stage. Two key factors bear on the problem of audio and video desynchronization: 1. the timestamps in the source data stream must be correct; if the timestamps in the push stream or pull stream data source are wrong, no adjustment at playback will help. 2. Timestamp control at playback must be correct. Therefore, to determine the stage at which the synchronization error is introduced, the live stream data of the different stages is acquired for analysis. To improve the efficiency of this judgment, the apparatus for determining the audio/video synchronization error optionally detects the live stream data of the push stream stage and the pull stream stage synchronously and then determines by elimination whether the synchronization error is introduced in the play stage; that is, if no synchronization error exists in the push stream stage or the pull stream stage, the synchronization error was most likely introduced in the play stage.
Since the live broadcasting process includes two processes of push streaming and pull streaming, the live streaming address is also divided into a push streaming address and a pull streaming address.
If the obtained live stream address is a push stream address, the live data stream is correspondingly the live data stream pushed by the push stream end to the streaming media server; that is, the audio frame data and video frame data obtained according to this address are the frame data generated in the push stream stage. It should be noted that by acquiring the audio frame data and video frame data through the push stream address and then determining the synchronization error according to S103 or S104, detection of whether a synchronization error is introduced in the push stream stage is achieved.
If the obtained live stream address is a pull stream address, the live data stream is correspondingly the live data stream pulled by the pull stream end from the streaming media server; that is, the audio frame data and video frame data obtained according to this address are the frame data generated in the pull stream stage. It should be noted that by acquiring the audio frame data and video frame data through the pull stream address and then determining the synchronization error according to S103 or S104, detection of whether a synchronization error is introduced in the pull stream stage is achieved.
S203, determining the synchronous error of the audio and the video in the live data stream according to the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data.
And S204, visually displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data, and determining the synchronization error of the audio and the video in the live data stream according to the display information.
S205, when the obtained live stream address is a push stream address and the synchronization error between the audio and the video in the live data stream determined according to S202-S204 is not zero, determining that the stage at which the synchronization error is introduced is the push stream stage; or, when the obtained live stream address is a pull stream address and the synchronization error between the audio and the video in the live data stream determined according to S202-S204 is not zero, determining that the stage at which the synchronization error is introduced is the pull stream stage.
When it is known whether the live stream address is a push stream address or a pull stream address, the stage at which the synchronization error is introduced can be determined quickly and accurately simply by judging whether the calculated synchronization error is zero. For example, if the live stream address is a push stream address and the calculated synchronization error is not zero (i.e., a synchronization error exists), it is determined that the synchronization error is introduced in the push stream stage; conversely, if the calculated synchronization error is zero, no synchronization error is introduced in the push stream stage. In another alternative embodiment, a threshold may be preset, and if the synchronization error is smaller than the threshold, it is considered that there is no synchronization error, i.e., the audio and the video of the live data stream are considered synchronized.
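As a sketch of this stage-location logic (the tolerance threshold is a hypothetical parameter, not a value fixed by the application):

```python
def locate_error_stage(address_type, errors, threshold=0.0):
    """address_type: 'push' or 'pull', matching the type of the live
    stream address. errors: per-pair synchronization errors in seconds.
    Returns the stage in which the synchronization error was introduced,
    or None if this stage is clean. If both the push and pull stages are
    clean yet playback is still out of sync, the error was introduced in
    the play stage (determined by elimination)."""
    if all(abs(e) <= threshold for e in errors):
        return None  # audio and video are considered synchronized here
    return "push stage" if address_type == "push" else "pull stage"
```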
In this embodiment, depending on the type of the acquired live stream address (a push stream address or a pull stream address), combined with whether the determined value of the synchronization error is zero, it can be located quickly whether the synchronization error is introduced in the push stream stage or the pull stream stage. Further, combined with elimination, if it is judged that the synchronization error is introduced in neither the push stream stage nor the pull stream stage, yet the audio and the video in the live data stream are out of sync, the synchronization error is determined to be introduced in the play stage.
Fig. 3 is a process diagram of determining a synchronization error by an apparatus for determining audio/video synchronization errors according to a third embodiment of the present application; this embodiment is further optimized based on the foregoing embodiments. In the embodiment of the application, the apparatus for determining the audio and video synchronization error comprises a UI interaction module, a WebSocket Server module and an acquisition task module. As shown in fig. 3, the method for determining the audio and video synchronization error comprises the following steps:
A live stream address input by a user is acquired through the UI interaction module, and the UI interaction module sends a stream-open instruction carrying the live stream address to the WebSocket Server module. After receiving the instruction, the WebSocket Server module sends the live stream address to the acquisition task module.
The acquisition task module acquires the audio frame data and video frame data of the live data stream from the streaming media server based on the live stream address: if the live stream address is a push stream address, the live stream data pushed by the push stream end is acquired according to the push stream address; if the live stream address is a pull stream address, the live stream data to be pulled by the pull stream end is acquired according to the pull stream address. It should be noted that the acquisition task module acquires the audio and video at frame granularity.
After acquiring the audio frame data and video frame data of the live data stream, the acquisition task module transmits them to the WebSocket Server module, and the WebSocket Server module pairs the audio frame data and the video frame data according to their arrival order to obtain at least one audio/video frame pair. The WebSocket Server module sends each audio/video frame pair to the UI interaction module, which comparatively displays the display time information corresponding to the audio frame data and to the video frame data contained in each pair, so that the user can calculate the synchronization error from the displayed result. It should also be noted that, after the WebSocket Server module completes the pairing of the audio frame data and the video frame data, it may, for each audio/video frame pair, determine the difference between the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data contained in the pair, and then determine the synchronization error between the audio and the video in the live data stream from that difference.
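A highly simplified sketch of this module hand-off, assuming the Python websockets package; the message format and the collect_frames stub are hypothetical stand-ins for the acquisition task and the pairing logic sketched earlier:

```python
import asyncio
import json
import websockets

def collect_frames(live_stream_url):
    # Stub for the acquisition task module: in the device this acquires
    # audio/video frame data from the streaming media server (see the
    # ffprobe sketch above). Returns (media_type, pts_time) tuples in
    # arrival order.
    return [("audio", 0.040), ("video", 0.040), ("audio", 0.080)]

async def websocket_server(ws):
    """WebSocket Server module: receive the stream-open instruction
    carrying the live stream address from the UI interaction module,
    hand the address to the acquisition task, pair the returned frames
    by arrival order, and push each audio/video pair back to the UI."""
    msg = json.loads(await ws.recv())  # e.g. {"cmd": "open", "url": "..."}
    last = {"audio": None, "video": None}
    for media_type, pts_time in collect_frames(msg["url"]):
        last[media_type] = pts_time
        other = "video" if media_type == "audio" else "audio"
        if last[other] is not None:  # a complete audio/video frame pair
            await ws.send(json.dumps({"audio": last["audio"],
                                      "video": last["video"]}))

async def main():
    async with websockets.serve(websocket_server, "localhost", 8765):
        await asyncio.Future()  # serve until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```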
The embodiment of the application details the process of determining the synchronization error by the determining device of the audio and video synchronization error through each functional module, and realizes the quantitative display of the synchronization error.
Fig. 4 is a schematic structural diagram of an apparatus for determining audio/video synchronization errors according to a fourth embodiment of the present application, which is applicable to detecting synchronization error conditions of audio and video in a live broadcast data stream in a live broadcast process. As shown in fig. 4, the apparatus 400 specifically includes:
a UI interaction module 401, configured to obtain a live streaming address;
the WebSocket Server module 402 is used for sending the live stream address to the acquisition task module;
an acquisition task module 403, configured to acquire audio frame data and video frame data of the live data stream from the streaming media server based on the live stream address;
the WebSocket Server module 402 is further configured to: determining the synchronous error of the audio and the video in the live data stream according to the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data;
the UI interaction module 401 is further configured to: and visually displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data so as to determine the synchronous error of the audio and the video in the live data stream according to the display information.
On the basis of the foregoing embodiment, optionally, the WebSocket Server module includes:
the pairing unit, configured to pair the audio frame data and the video frame data according to their arrival order to obtain at least one audio/video frame pair;
and the error calculation unit is used for determining a difference value between display time information corresponding to audio frame data and display time information corresponding to video frame data contained in each audio and video frame pair, and determining a synchronization error of audio and video in the live broadcast data stream according to the difference value.
On the basis of the foregoing embodiment, optionally, the UI interaction module is specifically configured to:
and comparing and displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data contained in each audio and video frame pair.
On the basis of the foregoing embodiment, optionally, the pairing unit is specifically configured to:
for the received current frame data sent by the streaming media server, if the current frame data is audio frame data, the audio frame data is paired with video frame data cached by a first cache unit; if the current frame data is video frame data, the video frame data is matched with the audio frame data cached by the second caching unit;
the first buffer unit is used for buffering recently arrived video frame data, and the second buffer unit is used for buffering recently arrived audio frame data.
On the basis of the above embodiment, optionally, the live streaming address is a push streaming address; the live data stream is the live data stream pushed to the streaming media server by the stream pushing end.
On the basis of the foregoing embodiment, optionally, the apparatus further includes:
and the first error introduction stage determining module is used for determining that the introduction stage of the synchronous error is a stream pushing stage if the synchronous error of the audio and the video in the live broadcast data stream is determined not to be zero.
On the basis of the foregoing embodiment, optionally, the live streaming address is a pull streaming address; the live data stream is the live data stream pulled by the stream pulling end from the stream media server.
On the basis of the foregoing embodiment, optionally, the apparatus further includes:
and the second error introduction stage determining module is used for determining that the introduction stage of the synchronous error is a stream pulling stage if the synchronous error of the audio and the video in the live data stream is determined not to be zero.
The apparatus 400 for determining audio/video synchronization errors provided by the embodiment of the present application can execute the method for determining the audio and video synchronization error provided by any embodiment of the present application, and has the functional modules and beneficial effects corresponding to the executed method. For details not explicitly described in this embodiment, reference may be made to the description in any method embodiment of the present application.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 5, a block diagram of an electronic device for the method of determining an audio-video synchronization error according to the embodiment of the present application is provided. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only, and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors 501, a memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing some of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.
Memory 502 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor executes the method for determining the audio and video synchronization error provided by the application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for determining an audio-video synchronization error provided by the present application.
The memory 502, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for determining the audio/video synchronization error in the embodiment of the present application (for example, the UI interaction module 401, the WebSocket Server module 402, and the acquisition task module 403 shown in fig. 4). The processor 501 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory 502, that is, implements the method for determining audio and video synchronization errors in the above method embodiments.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of an electronic device that implements the method of determining an audio-visual synchronization error of the embodiment of the present application, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 502 may optionally include a memory remotely disposed from the processor 501, and these remote memories may be connected to an electronic device implementing the method for determining audio-video synchronization error of the embodiments of the present application through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for implementing the method for determining the audio and video synchronization error according to the embodiment of the present application may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of an electronic device implementing the method for determining an audio-video synchronization error according to the embodiment of the present application, such as an input device of a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, the display time information corresponding to the audio frame data and the video frame data in the live data stream is the timestamp delivered to the terminal player, and whether the two pieces of display time information are the same determines whether the live audio and video played in the terminal player are synchronized. Therefore, after the audio frame data and the video frame data of the live data stream are obtained, the synchronization error can either be calculated directly from the corresponding display time information or determined from a visual display of that information, achieving the purpose of quantitatively presenting the synchronization error between the audio and the video in the live data stream. Meanwhile, the stage at which the synchronization error is introduced can be located quickly according to the type of the acquired live stream address and the calculated value of the synchronization error.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (17)

1. A method for determining an audio and video synchronization error comprises the following steps:
acquiring a live streaming address;
acquiring audio frame data and video frame data of a live data stream from a streaming media server according to the live stream address;
determining the synchronous error of the audio and the video in the live data stream according to the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data; or visually displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data so as to determine the synchronization error of the audio and the video in the live data stream according to the display information;
the method for visually displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data comprises the following steps:
according to the arrival sequence of the audio frame data and the video frame data, pairing the audio frame data and the video frame data to obtain at least one audio/video frame pair;
and for each audio and video frame pair, comparing and displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data contained in the audio and video frame pair.
2. The method of claim 1, wherein determining a synchronization error between audio and video in the live data stream according to the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data comprises:
according to the arrival sequence of the audio frame data and the video frame data, pairing the audio frame data and the video frame data to obtain at least one audio and video frame pair;
and for each audio and video frame pair, determining a difference value between display time information corresponding to audio frame data and display time information corresponding to video frame data contained in the audio and video frame pair, and determining a synchronization error of audio and video in the live broadcast data stream according to the difference value.
3. The method of claim 1, wherein acquiring a live stream address, and acquiring audio frame data and video frame data of a live data stream from a streaming media server according to the live stream address, comprises:
acquiring a live streaming address input by a user through a UI (user interface) interaction module;
sending the live streaming address to an acquisition task module through a WebSocket Server module;
acquiring audio frame data and video frame data of a live data stream from a streaming media server through the acquisition task module based on the live stream address; and
and comparing and displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data in each audio and video frame pair through a UI (user interface) interaction module.
4. The method according to claim 1 or 2, wherein pairing the audio frame data and the video frame data according to the arrival order of the audio frame data and the video frame data to obtain at least one audio/video frame pair comprises:
for the received current frame data sent by the streaming media server, if the current frame data is audio frame data, the audio frame data is paired with video frame data cached by a first cache unit; if the current frame data is video frame data, the video frame data is matched with the audio frame data cached by the second caching unit;
the first buffer unit is used for buffering recently arrived video frame data, and the second buffer unit is used for buffering recently arrived audio frame data.
5. The method of any of claims 1-3, wherein the live-stream address is a push-stream address; the live data stream is the live data stream pushed to the streaming media server by the stream pushing end.
6. The method of claim 5, wherein the method further comprises:
and if the synchronous error of the audio and the video in the live broadcast data stream is determined not to be zero, determining that the introduction stage of the synchronous error is a stream pushing stage.
7. The method of any of claims 1-3, wherein the live-stream address is a pull-stream address; the live data stream is a live data stream pulled by the stream pulling end from the stream media server.
8. The method of claim 7, wherein the method further comprises:
and if the synchronous error of the audio and the video in the live broadcast data stream is determined not to be zero, determining that the introduction stage of the synchronous error is a stream pulling stage.
9. An apparatus for determining an audio-video synchronization error, comprising:
the UI interaction module is used for acquiring a live streaming address;
the WebSocket Server module is used for sending the live streaming address to the acquisition task module;
the acquisition task module is used for acquiring audio frame data and video frame data of the live data stream from the streaming media server based on the live stream address;
the WebSocket Server module is also used for: determining the synchronous error of the audio and the video in the live data stream according to the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data;
the UI interaction module is further configured to: displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data in a visual manner, so as to determine the synchronous error of the audio and the video in the live data stream according to the display information;
the WebSocket Server module comprises a pairing unit used for pairing the audio frame data and the video frame data according to the arrival sequence of the audio frame data and the video frame data to obtain at least one audio and video frame pair;
correspondingly, the UI interaction module is specifically configured to: and comparing and displaying the display time information corresponding to the audio frame data and the display time information corresponding to the video frame data contained in each audio and video frame pair.
10. The apparatus of claim 9, wherein the WebSocket Server module comprises:
and the error calculation unit is used for determining a difference value between display time information corresponding to audio frame data and display time information corresponding to video frame data contained in each audio and video frame pair, and determining a synchronization error of audio and video in the live broadcast data stream according to the difference value.
11. The apparatus according to claim 10, wherein the pairing unit is specifically configured to:
for the received current frame data sent by the streaming media server, if the current frame data is audio frame data, the audio frame data is paired with video frame data cached by a first cache unit; if the current frame data is video frame data, the video frame data is matched with the audio frame data cached by the second caching unit;
the first buffer unit is used for buffering recently arrived video frame data, and the second buffer unit is used for buffering recently arrived audio frame data.
12. The apparatus of any one of claims 9-10, wherein the live-stream address is a push-stream address, and the live data stream is the live data stream pushed to the streaming media server by a stream-pushing end.
13. The apparatus of claim 12, wherein the apparatus further comprises:
a first error-introduction-stage determining module, configured to determine that the stage at which the synchronization error was introduced is the stream-pushing stage if the synchronization error between the audio and the video in the live data stream is determined to be non-zero.
14. The apparatus of any one of claims 9-10, wherein the live-stream address is a pull-stream address, and the live data stream is a live data stream pulled from the streaming media server by a stream-pulling end.
15. The apparatus of claim 14, wherein the apparatus further comprises:
a second error-introduction-stage determining module, configured to determine that the stage at which the synchronization error was introduced is the stream-pulling stage if the synchronization error between the audio and the video in the live data stream is determined to be non-zero.
16. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for determining an audio and video synchronization error according to any one of claims 1-8.
17. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for determining an audio and video synchronization error according to any one of claims 1-8.
CN202010524709.7A 2020-06-10 2020-06-10 Method and device for determining audio and video synchronization error, electronic equipment and storage medium Active CN111654736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010524709.7A CN111654736B (en) 2020-06-10 2020-06-10 Method and device for determining audio and video synchronization error, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111654736A CN111654736A (en) 2020-09-11
CN111654736B (en) 2022-05-31

Family

ID=72349085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010524709.7A Active CN111654736B (en) 2020-06-10 2020-06-10 Method and device for determining audio and video synchronization error, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111654736B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113891132A (en) * 2021-10-25 2022-01-04 北京字节跳动网络技术有限公司 Audio and video synchronization monitoring method and device, electronic equipment and storage medium
CN115499677A (en) * 2022-09-20 2022-12-20 上海哔哩哔哩科技有限公司 Audio and video synchronization detection method and device based on live broadcast

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2562206C (en) * 2005-10-05 2012-07-10 Lg Electronics Inc. A method and digital broadcast transmitter for transmitting a digital broadcast signal
CN101662689B (en) * 2008-08-25 2011-07-06 华为技术有限公司 Method and system for switching interactive TV channels and method and device for sending audio and video streams
CN101888560A (en) * 2009-05-15 2010-11-17 成都宇达电通有限公司 PTS (Pseudo Terminal Slave) correction method
CN104618786B (en) * 2014-12-22 2018-01-05 深圳市腾讯计算机系统有限公司 Audio and video synchronization method and device
CN109040818B (en) * 2017-06-12 2021-04-27 武汉斗鱼网络科技有限公司 Audio and video synchronization method, storage medium, electronic equipment and system during live broadcasting
CN107801080A (en) * 2017-11-10 2018-03-13 普联技术有限公司 A kind of audio and video synchronization method, device and equipment
CN107968942B (en) * 2017-11-24 2021-06-04 网易(杭州)网络有限公司 Method and system for measuring audio and video time difference of live broadcast platform
CN108566558B (en) * 2018-04-24 2023-02-28 腾讯科技(深圳)有限公司 Video stream processing method and device, computer equipment and storage medium
CN108900859B (en) * 2018-08-17 2020-07-10 广州酷狗计算机科技有限公司 Live broadcasting method and system
CN109088887A (en) * 2018-09-29 2018-12-25 北京金山云网络技术有限公司 A kind of decoded method and device of Streaming Media
CN110062277A (en) * 2019-03-13 2019-07-26 北京河马能量体育科技有限公司 A kind of audio-video automatic synchronous method and synchronization system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106993239A (en) * 2017-03-29 2017-07-28 广州酷狗计算机科技有限公司 Method for information display during live
CN110366001A (en) * 2018-04-09 2019-10-22 腾讯科技(深圳)有限公司 The determination method and apparatus of video definition, storage medium, electronic device
CN110913259A (en) * 2019-12-11 2020-03-24 百度在线网络技术(北京)有限公司 Video playing method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant