CN114979718B - Method, apparatus, electronic device, and storage medium for synchronized continuous playback of audio and video

Method, apparatus, electronic device, and storage medium for synchronized continuous playback of audio and video

Info

Publication number
CN114979718B
Authority
CN
China
Prior art keywords
video
audio
frame
data
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210414021.2A
Other languages
Chinese (zh)
Other versions
CN114979718A (en)
Inventor
毕新维
何大红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan Chezhiyi Communication Information Technology Co ltd
Original Assignee
Hainan Chezhiyi Communication Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainan Chezhiyi Communication Information Technology Co., Ltd.
Priority to CN202210414021.2A
Publication of CN114979718A
Application granted
Publication of CN114979718B
Status: Active


Classifications

(All of the following classifications fall under H04N21/00, selective content distribution, e.g. interactive television or video on demand [VOD].)
    • H04N21/2368: Multiplexing of audio and video streams
    • H04N21/23608: Remultiplexing multiplex streams, e.g. involving modifying time stamps or remapping the packet identifiers
    • H04N21/4341: Demultiplexing of audio and video streams
    • H04N21/4344: Remultiplexing of multiplex streams, e.g. by modifying time stamps or remapping the packet identifiers
    • H04N21/4394: Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/8547: Content authoring involving timestamps for synchronizing content

Abstract

The application discloses a method, an apparatus, an electronic device, and a storage medium for synchronized continuous playback of audio and video. The method comprises the following steps: obtaining the video and audio call and output parameters; acquiring the video decoding timestamp and the audio decoding timestamp of the data frames in the audio-video data packets according to those parameters; acquiring the sending speed of the video data in the audio-video data packets according to the two decoding timestamps, and controlling the sending speed of the audio data accordingly; and, on that basis, achieving synchronization of the audio-video data packets by locking, frame dropping, frame padding, or time-base conversion. The application enables lightweight broadcast directing, remote scheduling through an HTTP interface, and automatic handling of abnormal live-streaming states without manual intervention; it supports audio-video data packets with different frame rates and keeps the output of single-channel or multi-channel audio-video data packets synchronized.

Description

Method, apparatus, electronic device, and storage medium for synchronized continuous playback of audio and video
Technical Field
The invention relates to the technical field of audio and video processing, and in particular to a method, an apparatus, an electronic device, and a storage medium for synchronized continuous playback of audio and video.
Background
Internet-based live-streaming scenarios, such as online gala broadcasts, livestream e-commerce, and online education, are gradually becoming a new trend. Stream pushing for live broadcasts is currently done mainly with cameras, mobile phones, large directing platforms, and other directing software.
Stream-pushing approaches in the prior art mainly fall into the following categories:
the first is single-channel stream pushing, generally FFmpeg-based software encoding or device hardware encoding. In mobile-phone streaming, for example, the camera captures the picture and the microphone captures the audio; the video pictures are encoded into an H.264 bitstream by hardware or software encoding, the audio PCM data are encoded into an AAC stream, and the packets are encapsulated with the RTMP protocol and transmitted over TCP. Mature live-streaming pushers also support adaptive-bitrate encoding and network-speed monitoring, but for large galas, important business conferences, and the like, single-channel pushing cannot satisfy multi-camera, multi-scene live broadcasting;
secondly, pushing with professional equipment or with directing software such as OBS. OBS, for example, supports Windows, macOS, Linux, and other operating systems, but this approach involves many settings that demand professional live-streaming knowledge, which is unfriendly to novices; it does not support automatic switching to a standby emergency channel or remote API operation, and it requires desktop software running on a graphical operating system under manual control.
Therefore, a method for synchronized continuous playback of audio and video is needed that guarantees audio-video synchronization while supporting input sources with different frame rates and single-channel or multi-channel switching.
Disclosure of Invention
Accordingly, the present invention provides a method, an apparatus, an electronic device, and a storage medium for synchronized continuous playback of audio and video, in an effort to solve, or at least alleviate, at least one of the problems above.
According to one aspect of the present invention, there is provided a method for synchronized continuous playback of audio and video. The method stores the video and audio of a read video stream into queues, keeps the decoded video and audio timestamps synchronized, maintains timestamp consistency when sending video, and achieves the same output data frame rate by locking, frame dropping, or frame padding. The method comprises the following steps:
obtaining the video and audio call and output parameters, which comprise: the video and audio call interface, the cover image path, the preview push address, the program output push address, and the main input source pull address;
acquiring the video decoding timestamp and the audio decoding timestamp of the data frames in the audio-video data packets according to the video and audio call and output parameters;
acquiring the sending speed of the video data in the audio-video data packets according to the video and audio decoding timestamps of the data frames, and controlling the sending speed of the audio data;
according to the sending speed of the video data in the audio-video data packets, controlling the sending speed of the audio data; and, when the sending frame rate of the audio data packets differs from that of the video data packets, synchronizing the audio and video sending frame rates by locking;
or, when the audio-video data packets have different call frame rates, achieving the same output frame rate by frame dropping or frame padding;
or, when the call frame rate and the output frame rate of the audio-video data packets differ, outputting the packets correctly by time-base conversion.
In another embodiment, the step of obtaining the video and audio call and output parameters comprises:
acquiring the video and audio call interface invoked by the user through the event notification library server;
setting the cover image path, the preview push address, the program output push address, and the main input source pull address according to that interface call;
having a first thread call the video coding framework to read the video stream pulled from the main input source according to the addresses so set;
obtaining the audio data packets and the video data packets of that video stream and storing them into an audio packet queue and a video packet queue respectively;
and starting a second thread to decode the video data packets and a third thread to decode the audio data packets.
In another embodiment, the step of acquiring the video and audio decoding timestamps of the data frames in the audio-video data packets comprises:
decoding the video data packets in the second thread, and acquiring the preview path preview_os and the output path target_os of the video stream;
obtaining the video frame data for preview_os and target_os, storing the preview_os frames into a preview video frame list, and storing the target_os frames into an output video frame list together with their video decoding timestamps;
recording the current video decoding timestamp of the frame being decoded and the first video decoding timestamp of the first decoded frame in the video data packets;
decoding the audio data packets in the third thread, obtaining the audio decoding timestamps, resampling the audio frame data, and reconstructing the audio frames;
storing the reconstructed audio frames into an audio frame list;
and deriving the audio push timestamp of the audio frame data from the audio decoding timestamps and the audio frame list.
In another embodiment, the step of reconstructing the audio frame data comprises:
acquiring the audio frame data and its resampling parameters, which match the output parameters and comprise: the sampling rate, the number of channels, and the audio sample format;
writing the resampled audio frame data into a first-in first-out queue;
waiting until the queue holds no fewer than 1024 samples;
and reading 1024 samples from the queue, thereby reconstructing one audio frame.
In another embodiment, the step of acquiring the sending speed of the video data and controlling the sending speed of the audio data according to the video and audio decoding timestamps comprises:
comparing the video decoding timestamp and the audio decoding timestamp of the data frames;
if the audio decoding timestamp is greater than or equal to the video decoding timestamp, putting the audio sender to sleep;
if the audio decoding timestamp is smaller than the video decoding timestamp, acquiring the sending speed of the video data;
and controlling the sending speed of the audio data according to the sending speed of the video data.
In another embodiment, when the sending frame rates of the audio data packets and the video data packets differ, the step of synchronizing the audio and video sending frame rates by locking comprises:
tracking the video frame sending information and video send timestamps, and the audio frame sending information and audio send timestamps, of the packet queues;
when sending a video frame, judging whether the video send timestamp is greater than the audio send timestamp;
if so, locking the video sending thread to wait for audio frames to be sent;
when sending an audio frame, judging whether the audio send timestamp is greater than the video send timestamp;
if so, locking the audio sending thread to wait for video frames to be sent.
In another embodiment, when multiple channels of audio-video data packets have different call frame rates, the step of achieving the same output frame rate by frame dropping or frame padding comprises:
acquiring the call frame rates of the multiple channels of audio-video data packets;
identifying the channel with the highest call frame rate among them;
recording, for each channel, the last frame fetched;
if any channel cannot yield a new frame, reusing the last frame recorded for that channel;
with this frame-fetching method, all channels produce the same output frame rate.
In another embodiment, when the call frame rate and the output frame rate of the audio-video data packets differ, the step of outputting the packets correctly by time-base conversion comprises:
acquiring the time-base conversion information at output time, which comprises the actual output timestamp of the last output frame and the timestamp of the current output frame;
confirming successful output of the audio-video data packet according to that information;
and performing the time-base conversion again after the packet is output successfully.
According to still another aspect of the present invention, there is disclosed an apparatus for synchronized continuous playback of audio and video. The apparatus stores the video and audio of a read video stream into queues, keeps the decoded video and audio timestamps synchronized, maintains timestamp consistency when sending video, and achieves the same output data frame rate by locking, frame dropping, or frame padding. The apparatus comprises: a data acquisition module, configured to obtain the video and audio call and output parameters, which comprise: the video and audio call interface, the cover image path, the preview push address, the program output push address, and the main input source pull address; a timestamp analysis module, configured to acquire the video decoding timestamp and the audio decoding timestamp of the data frames in the audio-video data packets according to those parameters; and a synchronization control module, configured to acquire the sending speed of the video data according to the two decoding timestamps and control the sending speed of the audio data, and on that basis to synchronize the audio and video sending frame rates by locking when they differ, or to achieve the same output frame rate by frame dropping or frame padding when the call frame rates differ, or to output the packets correctly by time-base conversion when the call frame rate and the output frame rate differ.
According to yet another aspect of the present application, there is provided a computing device comprising: one or more processors; a memory; and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the above methods of synchronized continuous playback of audio and video.
According to yet another aspect of the present application, there is provided a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to perform any of the above methods of synchronized continuous playback of audio and video.
According to the scheme of the application, the video and audio call and output parameters are obtained; the video and audio decoding timestamps of the data frames in the audio-video data packets are acquired according to those parameters; the sending speed of the video data is acquired according to the two decoding timestamps and the sending speed of the audio data is controlled accordingly; synchronization is then achieved by locking when the audio and video sending frame rates differ, by frame dropping or frame padding when the call frame rates differ, or by time-base conversion when the call frame rate and the output frame rate differ. The application enables lightweight broadcast directing, remote scheduling through an HTTP interface, and automatic handling of abnormal live-streaming states without manual intervention; it supports audio-video data packets with different frame rates and keeps the output of single-channel or multi-channel audio-video data packets synchronized.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which set forth the various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to fall within the scope of the claimed subject matter. The above, as well as additional objects, features, and advantages of the present disclosure will become more apparent from the following detailed description when read in conjunction with the accompanying drawings. Like reference numerals generally refer to like parts or elements throughout the present disclosure.
FIG. 1 illustrates a schematic block diagram of a computing device 100 according to one embodiment of the invention;
FIG. 2 illustrates a flow chart of a method 200 of synchronized continuous playback of audio and video according to one embodiment of the present invention; and
FIG. 3 illustrates a schematic structural diagram of an apparatus 300 for synchronized continuous playback of audio and video according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, computing device 100 typically includes a system memory 106 and one or more processors 104. The memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a first-level cache 110 and a second-level cache 112, a processor core 114, and registers 116. The example processor core 114 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory including, but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some implementations, the application 122 may be arranged to operate on an operating system with program data 124. In some embodiments, the computing device 100 is configured to perform a method 200 of audio-video synchronized playback, the method 200 being capable of maintaining the consistency of the time stamps during video transmission by storing video and audio in a read video stream in a queue, controlling the decoded video to be synchronized with the audio time stamps, and achieving the same output data frame rate by locking or frame dropping or frame supplementing, the program data 124 including instructions for performing the method 200.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to basic configuration 102 via bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices such as a display or speakers via one or more a/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 via one or more communication ports 164 over a network communication link.
The network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, program modules, and may include any information delivery media in a modulated data signal, such as a carrier wave or other transport mechanism. A "modulated data signal" may be a signal that has one or more of its data set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or special purpose network, and wireless media such as acoustic, radio Frequency (RF), microwave, infrared (IR) or other wireless media. The term computer readable media as used herein may include both storage media and communication media. In some embodiments, one or more programs are stored in a computer readable medium, the one or more programs including instructions for performing methods by which the computing device 100 performs the method 200 of audio video synchronized playback in accordance with embodiments of the present invention.
Computing device 100 may be implemented as part of a small-sized portable (or mobile) electronic device such as a cellular telephone, a Personal Digital Assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that may include any of the above functions. Computing device 100 may also be implemented as a personal computer including desktop and notebook computer configurations.
Fig. 2 illustrates a flow chart of a method 200 of synchronized continuous playback of audio and video according to one embodiment of the present invention. As shown in Fig. 2, the method 200 stores the video and audio of a read video stream into queues, keeps the decoded video and audio timestamps synchronized, maintains timestamp consistency when sending video, and achieves the same output data frame rate by locking, frame dropping, or frame padding. The method 200 begins at step S210 by obtaining the video and audio call and output parameters, which comprise: the video and audio call interface, the cover image path, the preview push address, the program output push address, and the main input source pull address.
Specifically, in this embodiment, video and audio calls are served through an HTTP server built on a lightweight, open-source, high-performance event notification library written in C (such as libevent). The library's HTTP server waits for calls to the video and audio data interfaces and applies the corresponding video and audio call and output parameter settings.
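By way of a non-limiting illustration only (the patent provides no source code): assuming the event notification library is libevent, a minimal control server for the call interface might look like the following C sketch; the port 8080 and the /api/start path are assumptions, and the handler body is a placeholder.

    #include <event2/event.h>
    #include <event2/http.h>
    #include <event2/buffer.h>

    /* Hypothetical handler: receives the audio-video call and would apply the
     * output parameters (cover image path, push/pull addresses, and so on). */
    static void on_call(struct evhttp_request *req, void *arg) {
        struct evbuffer *reply = evbuffer_new();
        /* ... parse the query string; set cover image path, preview push
         * address, program output push address, main input source pull
         * address ... */
        evbuffer_add_printf(reply, "{\"code\":0}");
        evhttp_send_reply(req, HTTP_OK, "OK", reply);
        evbuffer_free(reply);
    }

    int main(void) {
        struct event_base *base = event_base_new();
        struct evhttp *http = evhttp_new(base);
        evhttp_bind_socket(http, "0.0.0.0", 8080);   /* port is an assumption */
        evhttp_set_cb(http, "/api/start", on_call, NULL);
        event_base_dispatch(base);   /* wait for calls to the interface */
        return 0;
    }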
Specifically, in one embodiment of the present application, the step of obtaining the video and audio call and output parameters comprises:
acquiring the video and audio call interface invoked by the user through the event notification library server;
setting the cover image path, the preview push address, the program output push address, and the main input source pull address according to that interface call;
having the first thread call the video coding framework to read the video stream pulled from the main input source according to the addresses so set;
obtaining the audio data packets and the video data packets of that video stream and storing them into an audio packet queue and a video packet queue respectively;
and starting the second thread to decode the video data packets and the third thread to decode the audio data packets.
Specifically, the video coding framework called by the first thread is FFmpeg ("FF" stands for "Fast Forward"), a set of open-source computer programs that can record, convert, and stream digital audio and video, and that easily converts between many video formats. The first thread reads the pulled video stream by calling FFmpeg; the stream contains video data packets and audio data packets, which are stored into the video packet queue and the audio packet queue respectively, and the second and third threads started at this point each process one of the queues. The second thread takes a video data packet from the video packet queue and scales its width and height to the target size through the libyuv library, Google's open-source YUV image-processing library, which implements conversion, cropping, scaling, and rotation between various YUV formats; the third thread takes an audio data packet from the audio packet queue for decoding.
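A minimal sketch of the first thread's read-and-dispatch loop using the FFmpeg C API follows; packet_queue_put() and the queue handles are hypothetical stand-ins for the patent's packet queues, not named by the patent.

    #include <libavformat/avformat.h>

    extern void packet_queue_put(void *q, AVPacket *pkt); /* hypothetical helper */
    extern void *video_q, *audio_q;                       /* the two packet queues */

    static void read_main_input(const char *pull_url) {
        AVFormatContext *fmt = NULL;
        if (avformat_open_input(&fmt, pull_url, NULL, NULL) < 0) return;
        avformat_find_stream_info(fmt, NULL);
        int vi = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
        int ai = av_find_best_stream(fmt, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0);

        AVPacket pkt;
        while (av_read_frame(fmt, &pkt) >= 0) {   /* one demuxed packet at a time */
            if (pkt.stream_index == vi)
                packet_queue_put(video_q, av_packet_clone(&pkt)); /* video queue */
            else if (pkt.stream_index == ai)
                packet_queue_put(audio_q, av_packet_clone(&pkt)); /* audio queue */
            av_packet_unref(&pkt);
        }
        avformat_close_input(&fmt);
    }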
In step S220, the video decoding timestamp and the audio decoding timestamp of the data frames in the audio-video data packets are acquired according to the video and audio call and output parameters.
Specifically, after the second thread and the third thread decode the video and audio data packets respectively, the video decoding timestamps of the video packets and the audio decoding timestamps of the audio packets can be obtained. These decoding timestamps are the basic parameters that reflect whether the audio and video outputs are synchronized.
Specifically, in one embodiment of the present application, the step of acquiring the video and audio decoding timestamps of the data frames in the audio-video data packets comprises:
decoding the video data packets in the second thread, and acquiring the preview path preview_os and the output path target_os of the video stream;
obtaining the video frame data for preview_os and target_os, storing the preview_os frames into a preview video frame list, and storing the target_os frames into an output video frame list together with their video decoding timestamps;
recording the current video decoding timestamp of the frame being decoded and the first video decoding timestamp of the first decoded frame in the video data packets;
decoding the audio data packets in the third thread, obtaining the audio decoding timestamps, resampling the audio frame data, and reconstructing the audio frames;
storing the reconstructed audio frames into an audio frame list;
and deriving the audio push timestamp of the audio frame data from the audio decoding timestamps and the audio frame list.
Specifically, after the second thread decodes a video data packet, the frame's width and height are scaled through the libyuv library. The scaled video frames are output to a preview path, denoted preview_os, and an output path, denoted target_os. When preview_os is not empty and its corresponding preparation parameters are ready, the decoded frame is put into the preview video frame list; when target_os is not empty, its preparation parameters are ready, and the output list of the current frame matches the thread list corresponding to its output, the frame is put into the output video frame list, whose entries contain: the video frame data, the video decoding timestamp, and the list information. Denote the timestamp of the packet just decoded as the current video decoding timestamp, the timestamp of the first decoded frame as the first video decoding timestamp, the timestamp when the system started as the initial system timestamp, and the system's present timestamp as the current system timestamp. Subtracting the first video decoding timestamp from the current video decoding timestamp gives the stream-read time of the current frame, and subtracting the initial system timestamp from the current system timestamp gives the wall-clock time consumed decoding so far. If the stream-read time of the current frame is greater than the time consumed, packets are entering the video packet queue too fast and the reader must sleep.
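The sleep decision above amounts to comparing stream time against wall-clock time. A sketch under the same definitions (function and variable names are illustrative, not from the patent):

    #include <libavutil/time.h>         /* av_gettime, av_usleep */
    #include <libavutil/mathematics.h>  /* av_rescale_q */

    /* first_dts / cur_dts: first and current video decoding timestamps in the
     * stream's time base tb; start_us: initial system timestamp in microseconds. */
    static void throttle_reader(int64_t first_dts, int64_t cur_dts,
                                AVRational tb, int64_t start_us) {
        int64_t stream_us = av_rescale_q(cur_dts - first_dts, tb,
                                         (AVRational){1, 1000000});
        int64_t wall_us = av_gettime() - start_us;  /* decoding time consumed */
        if (stream_us > wall_us)                    /* reading too fast */
            av_usleep(stream_us - wall_us);         /* sleep off the lead */
    }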
The third thread extracts audio data packets from the audio packet queue for decoding, records the audio decoding timestamp at that moment as the current audio decoding timestamp, resamples the audio frame data, reconstructs the audio frames, and stores the reconstructed frames into the audio frame list.
Specifically, in one embodiment of the present application, the step of reconstructing the audio frame data comprises:
acquiring the audio frame data and its resampling parameters, which match the output parameters and comprise: the sampling rate, the number of channels, and the audio sample format;
writing the resampled audio frame data into a first-in first-out queue;
waiting until the queue holds no fewer than 1024 samples;
and reading 1024 samples from the queue, thereby reconstructing one audio frame.
Specifically, the audio frames are reconstructed with the swr_convert() function. The decoded audio is PCM data whose sampling rate, number of channels, and sample format must be converted to match the program (PGM) output parameters. The resampled PCM data are then put into a first-in first-out queue using an AVAudioFifo; when the queue length is greater than or equal to 1024 samples, the amount of data is enough to encode one frame of AAC audio, so 1024 samples are read from the queue, completing the reconstruction of one audio frame.
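A sketch of this resample-and-regroup step with libswresample and AVAudioFifo, assuming a pre-configured SwrContext and AAC's 1024-samples-per-frame grouping; buffer management is simplified (stereo planar float is assumed) and the encoder hand-off is elided.

    #include <libswresample/swresample.h>
    #include <libavutil/audio_fifo.h>

    #define AAC_FRAME_SAMPLES 1024  /* one AAC frame carries 1024 samples */

    /* swr: configured for the output sample rate / channels / sample format;
     * fifo: created with av_audio_fifo_alloc() for that same output format. */
    static void regroup_audio(SwrContext *swr, AVAudioFifo *fifo,
                              AVFrame *decoded, AVFrame *out /* 1024 samples */) {
        uint8_t *buf[8] = {0};
        /* Simplified: assumes the resampled count fits in nb_samples. */
        av_samples_alloc(buf, NULL, 2 /* assumed channels */,
                         decoded->nb_samples, AV_SAMPLE_FMT_FLTP, 0);
        int n = swr_convert(swr, buf, decoded->nb_samples,
                            (const uint8_t **)decoded->data, decoded->nb_samples);
        av_audio_fifo_write(fifo, (void **)buf, n);   /* push resampled PCM */

        while (av_audio_fifo_size(fifo) >= AAC_FRAME_SAMPLES) {
            av_audio_fifo_read(fifo, (void **)out->data, AAC_FRAME_SAMPLES);
            /* ... hand the reconstructed 1024-sample frame to the AAC encoder ... */
        }
        av_freep(&buf[0]);
    }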
In step S230, the sending speed of the video data in the audio-video data packets is acquired according to the video and audio decoding timestamps of the data frames, and the sending speed of the audio data is controlled accordingly.
Specifically, in one embodiment of the present application, the step of acquiring the sending speed of the video data and controlling the sending speed of the audio data according to the video and audio decoding timestamps comprises:
comparing the video decoding timestamp and the audio decoding timestamp of the data frames;
if the audio decoding timestamp is greater than or equal to the video decoding timestamp, putting the audio sender to sleep;
if the audio decoding timestamp is smaller than the video decoding timestamp, acquiring the sending speed of the video data;
and controlling the sending speed of the audio data according to the sending speed of the video data.
Specifically, if the audio frame lists for preview_os and target_os are not empty, their preparation parameters are ready, and the output list of the current audio frame matches the thread list corresponding to its output, then the current audio decoding timestamp at each output equals the cumulative first-in first-out queue length divided by the sampling rate used during audio reconstruction. If the current audio decoding timestamp is greater than or equal to the current video decoding timestamp, audio decoding is running too fast and must sleep: because the video frames must be output at the set speed, the audio sending speed follows the video output speed, and sleeping in this way keeps the audio and video packet outputs synchronized.
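A sketch of the audio-side pacing check; sent_samples, the seconds-based clocks, and the 10 ms nap are illustrative assumptions, not values from the patent.

    #include <libavutil/time.h>  /* av_usleep */
    #include <stdint.h>

    /* Audio pacing: the audio clock is derived from the samples pushed so far. */
    static void pace_audio(int64_t sent_samples, int sample_rate,
                           double current_video_ts /* seconds */) {
        double audio_ts = (double)sent_samples / sample_rate; /* audio clock */
        if (audio_ts >= current_video_ts)  /* audio is ahead of video */
            av_usleep(10 * 1000);          /* 10 ms nap; the caller retries */
    }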
Specifically, both preview_os and target_os send video and audio frames on their own new threads. The preview path preview_os drives the global preview monitor panel of the directing console; a thread dedicated to stitching composes the final picture and puts it into the preview_os video frame queue.
Step S240 is then executed: according to the sending speed of the video data in the audio-video data packets and the control of the audio sending speed, when the sending frame rates of the audio data packets and the video data packets differ, the audio and video sending frame rates are synchronized by locking;
or, when the channels of audio-video data packets have different call frame rates, the same output frame rate is achieved by frame dropping or frame padding;
or, when the call frame rate and the output frame rate of the audio-video data packets differ, the packets are output correctly by time-base conversion.
Specifically, in the actual transmission of the audio-video data packets there may be cases where the frame rate of the input data differs from the frame rate of the output data, where multiple input channels carry several different frame rates, or where the frame rate of the video data packets differs from that of the audio data packets within the same stream; each case requires synchronization processing.
Specifically, in one embodiment of the present application, when the sending frame rates of the audio data packets and the video data packets differ, the step of synchronizing the audio and video sending frame rates by locking comprises:
tracking the video frame sending information and video send timestamps, and the audio frame sending information and audio send timestamps, of the packet queues;
when sending a video frame, judging whether the video send timestamp is greater than the audio send timestamp;
if so, locking the video sending thread to wait for audio frames to be sent;
when sending an audio frame, judging whether the audio send timestamp is greater than the video send timestamp;
if so, locking the audio sending thread to wait for video frames to be sent.
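A sketch of the two-sided wait with a mutex and a condition variable. This is a simplified, deadlock-avoiding variant of the patent's mechanism: each sender publishes its pending timestamp before waiting, so the side that is behind always makes progress. All names are illustrative.

    #include <pthread.h>
    #include <stdint.h>

    static pthread_mutex_t sync_mu = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  sync_cv = PTHREAD_COND_INITIALIZER;
    static int64_t video_pending = INT64_MAX, audio_pending = INT64_MAX;

    /* Video sender: lock and wait while its timestamp runs ahead of audio. */
    void send_video_frame_synced(int64_t ts) {
        pthread_mutex_lock(&sync_mu);
        video_pending = ts;
        pthread_cond_broadcast(&sync_cv);
        while (ts > audio_pending)               /* audio still behind: wait */
            pthread_cond_wait(&sync_cv, &sync_mu);
        /* ... actually push the video frame here ... */
        video_pending = INT64_MAX;               /* nothing pending until next call */
        pthread_cond_broadcast(&sync_cv);
        pthread_mutex_unlock(&sync_mu);
    }

    /* Audio sender: symmetric wait on the video side. */
    void send_audio_frame_synced(int64_t ts) {
        pthread_mutex_lock(&sync_mu);
        audio_pending = ts;
        pthread_cond_broadcast(&sync_cv);
        while (ts > video_pending)               /* video still behind: wait */
            pthread_cond_wait(&sync_cv, &sync_mu);
        /* ... actually push the audio frame here ... */
        audio_pending = INT64_MAX;
        pthread_cond_broadcast(&sync_cv);
        pthread_mutex_unlock(&sync_mu);
    }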
Specifically, in one embodiment of the present application, the step of achieving the same output frame rate by frame dropping or frame padding comprises:
acquiring the call frame rates of the multiple channels of audio-video data packets;
identifying the channel with the highest call frame rate among them;
recording, for each channel, the last frame fetched;
if any channel cannot yield a new frame, reusing the last frame recorded for that channel;
with this frame-fetching method, all channels produce the same output frame rate.
Specifically, in one embodiment of the present application, when the call frame rate and the output frame rate of the audio-video data packets differ, the step of outputting the packets correctly by time-base conversion comprises:
acquiring the time-base conversion information at output time, which comprises the actual output timestamp of the last output frame and the timestamp of the current output frame;
confirming successful output of the audio-video data packet according to that information;
and performing the time-base conversion again after the packet is output successfully.
Specifically, for example, a sending thread takes video frames from the video frame list and obtains the current timestamp and list information of each frame, recording the list information of the last frame taken. If the last frame's list information is not that of the program (PGM) video frame list, a source switch has occurred; after a switch, all data in the old list must be cleared while ensuring that the first video frame and the first audio frame in the new list are synchronized. Denote the timestamp of the currently output video frame as the current video output timestamp and the timestamp of the first output frame as the first video output timestamp; the time this thread has spent pushing video so far is the current video output timestamp minus the first video output timestamp. The total duration spent pushing the previous stream, accumulated through calls to the avcodec_encode_video2() function, is recorded as the accumulated video output time, so the total video output duration of the current thread is the accumulated video output time plus the current video output timestamp. If the input source frame rate of the video packets is 30 but the output is 25, frames must be dropped; when a drop occurs during time-base conversion, the current video output timestamp must exceed the output timestamp recorded when the previous push completed, and after the packet is output successfully the time-base conversion is performed again. Because audio output and video output run on two independent threads, the synchronization of the audio-video packet outputs is achieved through locking.
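A sketch of the output-side time-base conversion with monotonicity enforcement after frame drops; in_tb, out_tb, and last_out_dts are illustrative names, and the +1 tick bump is one simple way to keep timestamps increasing.

    #include <libavcodec/avcodec.h>
    #include <libavutil/mathematics.h>

    /* Rescale a packet's timestamps from the input time base to the output
     * time base, keeping DTS strictly increasing across dropped frames. */
    static void convert_time_base(AVPacket *pkt, AVRational in_tb,
                                  AVRational out_tb, int64_t *last_out_dts) {
        pkt->pts      = av_rescale_q(pkt->pts,      in_tb, out_tb);
        pkt->dts      = av_rescale_q(pkt->dts,      in_tb, out_tb);
        pkt->duration = av_rescale_q(pkt->duration, in_tb, out_tb);
        if (pkt->dts <= *last_out_dts)          /* can happen after a frame drop */
            pkt->dts = pkt->pts = *last_out_dts + 1;
        *last_out_dts = pkt->dts;               /* redone for every output packet */
    }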
If there is a synchronization time difference between the output video frames and audio frames, that difference becomes the initial offset of the new stream after a switch. To eliminate its influence, the following method is used: record, for each channel, the first output video frame timestamp, the current video frame timestamp, and the channel's total video output duration. Assuming video output starts first, record the synchronization time difference; when audio output starts, compute the first output audio frame timestamp as the first output video frame timestamp minus the synchronization time difference. This keeps the video and audio output times of the new channel consistent at switch time.
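The alignment rule reduces to a single subtraction; a tiny sketch with illustrative names:

    #include <stdint.h>

    /* sync_diff: synchronization time difference recorded when video output
     * starts first on the new channel (illustrative variable names). */
    int64_t align_first_audio_ts(int64_t first_video_out_ts, int64_t sync_diff) {
        /* first audio timestamp = first video timestamp - recorded difference */
        return first_video_out_ts - sync_diff;
    }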
If a preview path preview_os aggregates multiple global preview channels, for example two channels where one runs at 25 fps, the other at 30 fps, and the output frame rate is 30, the two channels must be synchronized. The frames per second in the video and audio packet queues follow the input frame rate, because the decoding threads store each frame into the queue as soon as it is decoded, whereas reading happens at a fixed frame rate. If, say, 30 frames per second are taken from a 25 fps source, the missing frames would have to be made up from the following second, leaving the preview frames out of step with one another. Converting different input frame rates to a fixed output rate therefore requires frame dropping or frame padding, and in this embodiment the strategy is handled inside the queue: record the last frame each time a frame is fetched; always take the newest frame available; and if no new frame exists, reuse the last recorded frame. With this strategy the expected fixed number of frames can be taken every second regardless of the input frame rate, and once the queue layer handles it, no further dropping or padding is needed when outputting the video and audio frames.
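A non-limiting sketch of that queue-layer strategy in C; FrameQueue and queue_pop_newest() are hypothetical helpers, not named in the patent.

    #include <libavutil/frame.h>

    typedef struct FrameQueue {
        AVFrame *frames[64];
        int head, tail;      /* ring-buffer indices (details elided) */
        AVFrame *last;       /* last frame handed out */
    } FrameQueue;

    /* Hypothetical: drains the queue and returns the newest frame, or NULL. */
    extern AVFrame *queue_pop_newest(FrameQueue *q);

    /* Called at the fixed output rate (e.g. 30 times per second per channel). */
    AVFrame *fetch_fixed_rate(FrameQueue *q) {
        AVFrame *f = queue_pop_newest(q);  /* discards stale frames (dropping) */
        if (f) {
            av_frame_free(&q->last);
            q->last = av_frame_clone(f);   /* remember the last frame taken */
            return f;
        }
        /* no new frame this tick: repeat the last one (frame padding) */
        return q->last ? av_frame_clone(q->last) : NULL;
    }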
According to the scheme of the application, the video and audio call and output parameters are obtained; the video and audio decoding timestamps of the data frames in the audio-video data packets are acquired according to those parameters; the sending speed of the video data is acquired according to the two decoding timestamps and the sending speed of the audio data is controlled accordingly; synchronization is then achieved by locking when the audio and video sending frame rates differ, by frame dropping or frame padding when the call frame rates differ, or by time-base conversion when the call frame rate and the output frame rate differ. The application enables lightweight broadcast directing, remote scheduling through an HTTP interface, and automatic handling of abnormal live-streaming states without manual intervention; it supports audio-video data packets with different frame rates and keeps the output of single-channel or multi-channel audio-video data packets synchronized.
It should be understood that although the steps in the flowchart of Fig. 2 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in Fig. 2 may include multiple sub-steps or stages that are not necessarily completed at the same time but may be executed at different times; these sub-steps or stages need not be executed sequentially, and may alternate with other steps or with the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 3, an apparatus 300 for audio and video synchronous continuous broadcasting is provided. The apparatus 300 is based on a distributed data-storage cluster architecture; locking reliability is implemented through a watchdog mechanism, and reliable distributed locks are achieved by letting threads that fail to acquire a lock receive the lock-release message in time through a cluster management mechanism. The apparatus 300 includes a data acquisition module, a timestamp analysis module and a synchronization control module.
The data acquisition module is used for acquiring the video and audio call and output parameters, wherein the video and audio call and output parameters comprise the video-audio call interface, the cover map path, the preview push address, the program output push address and the main input source pull address. The timestamp analysis module is used for acquiring the video decoding timestamp and the audio decoding timestamp of the data frames in the audio-video data packets according to the video and audio call and output parameters. The synchronization control module is used for acquiring the sending speed of the video data in the audio-video data packets according to the two timestamps and controlling the sending speed of the audio data accordingly; when the sending frame rate of the audio data packets and the sending frame rate of the video data packets differ, the audio and video sending frame rates are synchronized by locking; or, when multiple paths of video and audio data packets have different calling frame rates, the same output frame rate is achieved by frame dropping or frame supplementing; or, when the calling frame rate and the output frame rate of the video and audio data packets differ, the packets are output correctly by time base conversion.
Specifically, in another embodiment of the present application, the data acquisition module is configured to acquire the video-audio call interface called by the user event notification library server; to set the cover map path, the preview push address, the program output push address and the main input source pull address according to that interface; to have a first thread, calling the video coding standard, read the video stream pulled from the main input source according to the addresses so set; to obtain the audio data packets and video data packets of that video stream and store them into the audio data packet queue and the video data packet queue respectively; and, once the packets are queued, to start a second thread that decodes the video data packets and a third thread that decodes the audio data packets.
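As a rough illustration of this thread layout, the following Python sketch wires a reader thread to two decoder threads through packet queues. The callables read_packet, decode_v, decode_a, out_v and out_a stand in for the demuxer, decoders and frame sinks; they are assumptions for the sketch, not the application's API:

    import threading
    import queue

    video_packets: "queue.Queue" = queue.Queue()
    audio_packets: "queue.Queue" = queue.Queue()

    def read_main_input(pull_address, read_packet):
        # First thread: demux the pulled stream, route packets by type.
        for packet in read_packet(pull_address):
            if packet["type"] == "video":
                video_packets.put(packet)
            else:
                audio_packets.put(packet)

    def decode_loop(packet_queue, decode, on_frame):
        # Second/third threads: one packet in, one frame out.
        while True:
            packet = packet_queue.get()
            on_frame(decode(packet))

    def start_pipeline(pull_address, read_packet, decode_v, decode_a, out_v, out_a):
        threads = [
            threading.Thread(target=read_main_input, args=(pull_address, read_packet), daemon=True),
            threading.Thread(target=decode_loop, args=(video_packets, decode_v, out_v), daemon=True),
            threading.Thread(target=decode_loop, args=(audio_packets, decode_a, out_a), daemon=True),
        ]
        for t in threads:
            t.start()
        return threads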
Specifically, in another embodiment of the present application, the data acquisition module is configured to decode the video data packets with the second thread and obtain the preview path preview_os and the output path target_os of the video stream; to obtain the video frame data of the two paths, storing the frame data of the preview path preview_os into the preview video frame list and the frame data of the output path target_os into the output video frame list together with the video decoding timestamp information; to obtain therefrom the current video decoding timestamp of the video frame being decoded and the first video decoding timestamp of the first video frame decoded from the video data packet; to decode the audio data packets with the third thread, obtain the audio decoding timestamp, resample the audio frame data of the audio data packets and reconstruct the audio frame data; to obtain the audio data frame list information from the reconstructed audio frame data; and to obtain the audio push timestamp of the audio frame data from the audio decoding timestamp and the audio data frame list information.
Specifically, in another embodiment of the present application, the data acquisition module is configured to acquire the audio frame data and the resampling parameters of the audio frame data, where the resampling parameters are adapted to the output parameters and comprise the sampling rate, the number of channels and the audio coding format; to store the resampled audio frame data into a first-in first-out queue; to wait until the size of that queue is not less than 1024; and, once it is, to read 1024 samples from the queue and reconstruct the audio frame data from them, as in the sketch below.
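A minimal sketch of the reconstruction step, assuming the resampler has already matched sample rate, channel count and coding format to the output parameters; 1024 is the per-frame sample count named above, and the class name is illustrative:

    from collections import deque

    FRAME_SIZE = 1024  # samples per reconstructed audio frame

    class AudioFifo:
        # Repackages resampled audio, which arrives in chunks of
        # arbitrary size, into fixed 1024-sample frames.
        def __init__(self):
            self._samples = deque()

        def write(self, samples):
            self._samples.extend(samples)

        def read_frame(self):
            # Only emit a frame once at least 1024 samples are queued.
            if len(self._samples) < FRAME_SIZE:
                return None
            return [self._samples.popleft() for _ in range(FRAME_SIZE)]

This mirrors the common FIFO pattern in audio pipelines, where the resampler's output sizes do not match the fixed frame size the encoder consumes.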
Specifically, in another embodiment of the present application, the timestamp analysis module is configured to compare the video decoding timestamp and the audio decoding timestamp of the data frames; if the audio decoding timestamp of a data frame is greater than or equal to the video decoding timestamp, to put the sending of the audio data to sleep; and if the audio decoding timestamp is smaller than the video decoding timestamp, to acquire the sending speed of the video data and control the sending speed of the audio data according to it.
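A minimal sketch of this pacing rule, assuming timestamps in seconds and a callable that reports the video sender's current decoding timestamp (both names are ours):

    import time

    def pace_audio(next_audio_dts, current_video_dts, send_audio_frame):
        # Audio may never run ahead of video: while the next audio
        # timestamp is at or past the video timestamp, sleep instead
        # of sending; once video has moved ahead, send at video's pace.
        while next_audio_dts >= current_video_dts():
            time.sleep(0.005)
        send_audio_frame()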
Specifically, in another embodiment of the present application, the synchronization control module is configured to obtain the video frame data sending information and video sending timestamp, and the audio frame data sending information and audio sending timestamp, from the packet queues; when sending video frame data, to judge whether the video sending timestamp is greater than the audio sending timestamp, and if so, to lock the video frame data sending thread and wait for the audio frame data to be sent; and when sending audio frame data, to judge whether the audio sending timestamp is greater than the video sending timestamp, and if so, to lock the audio frame data sending thread and wait for the video frame data to be sent.
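One way to realize this mutual waiting is a shared condition variable over which each sender publishes the timestamp of its pending frame and blocks while the other stream has an earlier frame waiting. A sketch under that assumption; the names are illustrative, not the application's implementation:

    import threading

    class SendGate:
        # Interleaves the audio and video sender threads in timestamp
        # order: the sender whose pending frame is later locks and waits
        # until the other stream's earlier frame has been sent.
        def __init__(self):
            self._cond = threading.Condition()
            self._pending = {"video": None, "audio": None}

        def send(self, stream, frame_ts, do_send):
            other = "audio" if stream == "video" else "video"
            with self._cond:
                self._pending[stream] = frame_ts
                self._cond.notify_all()
                # Wait while the other stream has an earlier frame pending.
                while (self._pending[other] is not None
                       and self._pending[other] < frame_ts):
                    self._cond.wait()
                do_send()
                self._pending[stream] = None
                self._cond.notify_all()

Comparing the two pending timestamps, rather than the last-sent ones, guarantees that the sender holding the smaller timestamp always proceeds, so the two threads cannot deadlock.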
Specifically, in another embodiment of the present application, the synchronization control module is configured to obtain the calling frame rates of the multiple paths of video and audio data packets; to determine the path with the highest calling frame rate among them; to record, for every path and with the highest-frame-rate path as the reference, the last frame fetched; if any path cannot produce a frame, to continue with that path's recorded last frame; and, through this frame-fetching method, to make the output frame rates of the multiple paths the same.
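Combining this with the per-path queue strategy, a fixed-rate fetch loop over several paths might look like the following sketch. Here paths maps a path name to a zero-argument fetch function returning the newest decoded frame or None, and compose combines one frame per path; all names are assumptions:

    import time

    def mix_paths(paths, output_fps, compose, num_frames):
        # Fetch one frame per path at a fixed output rate, repeating a
        # path's last frame whenever it has nothing new to offer.
        last = {name: None for name in paths}
        interval = 1.0 / output_fps
        deadline = time.monotonic()
        for _ in range(num_frames):
            for name, fetch in paths.items():
                frame = fetch()
                if frame is not None:
                    last[name] = frame   # newest frame wins
                # else: reuse last[name] (frame supplement for slow paths)
            compose(last)                # combine one frame per path
            deadline += interval
            time.sleep(max(0.0, deadline - time.monotonic()))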
Specifically, in another embodiment of the present application, the synchronization control module is configured to obtain the time base conversion information used when the video and audio data are output, where that information includes the actual output frame of the last output frame and the timestamp of the current output frame; to obtain confirmation that the video and audio data packets were output successfully according to the conversion; and to perform the time base conversion again at each subsequent output.
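Time base conversion itself is a rescale of the timestamp from the input stream's time base to the output's, in the spirit of FFmpeg's av_rescale_q. A sketch with exact rational arithmetic:

    from fractions import Fraction

    def rescale_ts(ts, src_time_base, dst_time_base):
        # Express the timestamp in seconds (ts * src), then count how
        # many destination ticks that is, rounding to the nearest tick.
        return round(ts * src_time_base / dst_time_base)

    # Example: a frame stamped 3000 in a 1/90000 time base (a common
    # 90 kHz clock) lands at tick 1 in a 1/30 output time base,
    # since 3000/90000 s = 1/30 s.
    assert rescale_ts(3000, Fraction(1, 90000), Fraction(1, 30)) == 1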
According to the above scheme for audio and video synchronous continuous broadcasting, the data acquisition module acquires the video and audio call and output parameters, which comprise the video-audio call interface, the cover map path, the preview push address, the program output push address and the main input source pull address; the timestamp analysis module acquires the video decoding timestamp and the audio decoding timestamp of the data frames in the audio-video data packets according to those parameters; the synchronization control module acquires the sending speed of the video data according to the two timestamps and controls the sending speed of the audio data accordingly; when the sending frame rate of the audio data packets and the sending frame rate of the video data packets differ, the audio and video sending frame rates are synchronized by locking; or, when multiple paths of video and audio data packets have different calling frame rates, the same output frame rate is achieved by frame dropping or frame supplementing; or, when the calling frame rate and the output frame rate of the video and audio data packets differ, the packets are output correctly by time base conversion. The application thus realizes lightweight broadcast directing, remote scheduling over an HTTP interface, and self-handling of abnormal live-broadcast states without manual intervention; it supports video and audio data packets with different frame rates and keeps the output of single-path or multi-path video and audio data packets synchronized.
It should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be construed as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment, or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into a plurality of sub-modules.
Those skilled in the art will appreciate that the modules in the apparatus of an embodiment may be adaptively changed and disposed in one or more apparatuses different from that embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Furthermore, some of the embodiments are described herein as methods or combinations of method elements that may be implemented by a processor of a computer system or by other means of performing the functions. Thus, a processor with the necessary instructions for implementing the described method or method element forms a means for implementing the method or method element. Furthermore, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is for carrying out the functions performed by the elements for carrying out the objects of the invention.
As used herein, unless otherwise specified, the use of the ordinal terms "first," "second," "third," etc., to describe a common object merely denotes different instances of like objects and is not intended to imply that the objects so described must be in a given order, whether temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.

Claims (9)

1. A method for synchronous continuous broadcasting of audio and video, wherein the method controls synchronization of the decoded video and audio timestamps by storing the video and audio of a read video stream into queues, achieves timestamp consistency when the video is sent, and achieves the same output data frame rate by locking, frame dropping or frame supplementing, the method comprising the following steps:
obtaining video and audio call and output parameters, wherein the video and audio call and output parameters comprise a video-audio call interface, a cover map path, a preview push address, a program output push address and a main input source pull address, specifically comprising: obtaining the video-audio call interface called by a user event notification library server; setting the cover map path, the preview push address, the program output push address and the main input source pull address according to that interface; reading, by a first thread calling the video coding standard, the video stream pulled from the main input source according to the addresses so set; obtaining the audio data packets and video data packets of the video stream and storing them into an audio data packet queue and a video data packet queue respectively; and starting a second thread to decode the video data packets and a third thread to decode the audio data packets;
obtaining a video decoding timestamp and an audio decoding timestamp of the data frames in the audio-video data packets according to the video and audio call and output parameters, specifically comprising: decoding the video data packets by the second thread to obtain a preview path preview_os and an output path target_os of the video stream; obtaining the video frame data of the preview path preview_os and of the output path target_os, storing the video frame data of the preview path preview_os into a preview video frame list, and storing the video frame data of the output path target_os into an output video frame list together with video decoding timestamp information; obtaining therefrom the current video decoding timestamp of the video frame being decoded and the first video decoding timestamp of the first video frame decoded in the video data packet; decoding the audio data packets by the third thread to obtain the audio decoding timestamp, resampling the audio frame data of the audio data packets, and reconstructing the audio frame data; obtaining audio data frame list information of the audio frame data according to the reconstructed audio frame data; and obtaining an audio push timestamp of the audio frame data according to the audio decoding timestamp and the audio data frame list information;
obtaining the sending speed of the video data in the audio-video data packets according to the video decoding timestamp and the audio decoding timestamp of the data frames, and controlling the sending speed of the audio data;
controlling the sending speed of the audio data according to the sending speed of the video data in the audio-video data packets, and, when the sending frame rate of the audio data packets and the sending frame rate of the video data packets differ, synchronizing the audio and video sending frame rates by locking;
or, when multiple paths of video and audio data packets have different calling frame rates, achieving the same output frame rate by frame dropping or frame supplementing;
or, when the calling frame rate and the output frame rate of the video and audio data packets differ, outputting the video and audio data packets correctly by time base conversion.
2. The method of claim 1, wherein reconstructing the audio frame data comprises:
obtaining the audio frame data and the resampling parameters of the audio frame data, wherein the resampling parameters are adapted to the output parameters and comprise the sampling rate, the number of channels and the audio coding format;
storing the resampled audio frame data into a first-in first-out queue according to the resampling parameters;
waiting until the size of the first-in first-out queue storing the resampled audio frame data is not less than 1024; and
when the size of the queue is not less than 1024, reading 1024 samples from the first-in first-out queue and reconstructing the audio frame data from them.
3. The method of claim 1, wherein the step of obtaining the transmission speed of the video data in the audio-video data packet and controlling the transmission speed of the audio data according to the video decoding time stamp and the audio decoding time stamp of the data frame comprises:
comparing the video decoding time stamp and the audio decoding time stamp of the data frame;
if the audio decoding time stamp of the data frame is greater than or equal to the video decoding time stamp of the data frame, putting the sending of the audio data to sleep;
if the audio decoding time stamp of the data frame is smaller than the video decoding time stamp of the data frame, acquiring the sending speed of the video data;
and controlling the sending speed of the audio data according to the sending speed of the video data.
4. The method of claim 1, wherein, when the sending frame rate of the audio data packets and the sending frame rate of the video data packets differ, the step of synchronizing the audio and video sending frame rates by locking comprises:
obtaining the video frame data sending information and video sending timestamp, and the audio frame data sending information and audio sending timestamp, from the video data packet queue;
when sending video frame data, judging whether the video sending timestamp is greater than the audio sending timestamp;
if so, locking the video frame data sending thread and waiting for the audio frame data to be sent;
when sending audio frame data, judging whether the audio sending timestamp is greater than the video sending timestamp;
if so, locking the audio frame data sending thread and waiting for the video frame data to be sent.
5. The method of claim 1, wherein, when the multiple paths of video and audio data packets have different calling frame rates, the step of achieving the same output frame rate by frame dropping or frame supplementing comprises:
obtaining the calling frame rates of the multiple paths of video and audio data packets;
determining, from those calling frame rates, the path of video and audio data packets with the highest frame rate;
recording, for each path of video and audio data packets and with the highest-frame-rate path as the reference, the last frame fetched;
if any path of the multiple paths cannot produce a frame, continuing with the recorded last frame fetched from that path; and
making the output frame rates of the multiple paths of audio and video data packets the same by this frame-fetching method.
6. The method of claim 1, wherein, when the calling frame rate and the output frame rate of the video and audio data packets differ, the step of outputting the video and audio data packets correctly by time base conversion comprises:
obtaining the time base conversion information used when the video and audio data are output, wherein the time base conversion information comprises the actual output frame of the last output frame and the timestamp of the current output frame;
obtaining confirmation that the video and audio data packets were output successfully according to the time base conversion performed at output; and
performing the time base conversion again at the next output of the video and audio data according to the successful output information.
7. An apparatus for audio and video synchronous continuous broadcasting, wherein the apparatus controls synchronization of the decoded video and audio timestamps by storing the video and audio of a read video stream into queues, achieves timestamp consistency when the video is sent, and achieves the same output data frame rate by locking, frame dropping or frame supplementing, the apparatus comprising:
a data acquisition module, configured to obtain video and audio call and output parameters, wherein the video and audio call and output parameters comprise a video-audio call interface, a cover map path, a preview push address, a program output push address and a main input source pull address, specifically: obtaining the video-audio call interface called by a user event notification library server; setting the cover map path, the preview push address, the program output push address and the main input source pull address according to that interface; reading, by a first thread calling the video coding standard, the video stream pulled from the main input source according to the addresses so set; obtaining the audio data packets and video data packets of the video stream and storing them into an audio data packet queue and a video data packet queue respectively; and starting a second thread to decode the video data packets and a third thread to decode the audio data packets;
a timestamp analysis module, configured to obtain a video decoding timestamp and an audio decoding timestamp of the data frames in the audio-video data packets according to the video and audio call and output parameters, specifically: decoding the video data packets by the second thread to obtain a preview path preview_os and an output path target_os of the video stream; obtaining the video frame data of the preview path preview_os and of the output path target_os, storing the video frame data of the preview path preview_os into a preview video frame list, and storing the video frame data of the output path target_os into an output video frame list together with video decoding timestamp information; obtaining therefrom the current video decoding timestamp of the video frame being decoded and the first video decoding timestamp of the first video frame decoded in the video data packet; decoding the audio data packets by the third thread to obtain the audio decoding timestamp, resampling the audio frame data of the audio data packets and reconstructing the audio frame data; obtaining audio data frame list information of the audio frame data according to the reconstructed audio frame data; and obtaining an audio push timestamp of the audio frame data according to the audio decoding timestamp and the audio data frame list information;
a synchronization control module, configured to obtain the sending speed of the video data in the audio-video data packets according to the video decoding timestamp and the audio decoding timestamp of the data frames and to control the sending speed of the audio data; to synchronize the audio and video sending frame rates by locking when the sending frame rate of the audio data packets and the sending frame rate of the video data packets differ; or, when multiple paths of video and audio data packets have different calling frame rates, to achieve the same output frame rate by frame dropping or frame supplementing; or, when the calling frame rate and the output frame rate of the video and audio data packets differ, to output the video and audio data packets correctly by time base conversion.
8. An electronic device, comprising:
one or more processors; and
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-6.
9. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-6.
CN202210414021.2A 2022-04-14 2022-04-14 Method, device, electronic equipment and storage medium for synchronous continuous playing of audio and video Active CN114979718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210414021.2A CN114979718B (en) 2022-04-14 2022-04-14 Method, device, electronic equipment and storage medium for synchronous continuous playing of audio and video


Publications (2)

Publication Number Publication Date
CN114979718A CN114979718A (en) 2022-08-30
CN114979718B true CN114979718B (en) 2023-09-19

Family

ID=82978120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210414021.2A Active CN114979718B (en) 2022-04-14 2022-04-14 Method, device, electronic equipment and storage medium for synchronous continuous playing of audio and video

Country Status (1)

Country Link
CN (1) CN114979718B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6173317B1 (en) * 1997-03-14 2001-01-09 Microsoft Corporation Streaming and displaying a video stream with synchronized annotations over a computer network
US6269122B1 (en) * 1998-01-02 2001-07-31 Intel Corporation Synchronization of related audio and video streams
WO2013082965A1 (en) * 2011-12-05 2013-06-13 优视科技有限公司 Streaming media data processing method and apparatus and streaming media data reproducing device
CN109756789A (en) * 2018-12-28 2019-05-14 视联动力信息技术股份有限公司 A kind of loss treating method and system of audio, video data packet
CN110505511A (en) * 2019-08-20 2019-11-26 海南车智易通信息技术有限公司 It is a kind of to play the method, apparatus of video, system in webpage and calculate equipment
CN113259738A (en) * 2021-05-08 2021-08-13 广州市奥威亚电子科技有限公司 Audio and video synchronization method and device, electronic equipment and storage medium
CN113727061A (en) * 2021-11-01 2021-11-30 江苏怀业信息技术股份有限公司 Audio and video synchronization method for video conference system
CN114079824A (en) * 2021-11-02 2022-02-22 深圳市洲明科技股份有限公司 Transmitting card, display device, computer device, storage medium, and control method for transmitting card


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Synchronization of Processed Audio-Video Signals using Time-stamps; Mohamed EL-Helaly et al.; 2007 IEEE International Conference on Image Processing; full text *
A multi-channel real-time streaming media synchronization and composition scheme for Internet applications; Wang Yinglan et al.; Journal of Donghua University; full text *
Research on synchronous transmission methods for audio-video streams and screen streams; Huang Ruohong et al.; Computer Engineering and Design; full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant