WO2023071598A1 - Audio and video synchronization monitoring method, apparatus, electronic device, and storage medium - Google Patents
Audio and video synchronization monitoring method, apparatus, electronic device, and storage medium
- Publication number
- WO2023071598A1 (PCT/CN2022/119419)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- data
- audio
- frame
- rendering time
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 71
- 230000001360 synchronised effect Effects 0.000 title claims abstract description 58
- 238000012544 monitoring process Methods 0.000 title claims abstract description 45
- 230000008569 process Effects 0.000 claims abstract description 22
- 238000009877 rendering Methods 0.000 claims description 173
- 238000005070 sampling Methods 0.000 claims description 34
- 238000012806 monitoring device Methods 0.000 claims description 23
- 230000005540 biological transmission Effects 0.000 claims description 10
- 239000013589 supplement Substances 0.000 claims description 5
- 238000004891 communication Methods 0.000 description 14
- 238000010586 diagram Methods 0.000 description 11
- 230000006870 function Effects 0.000 description 10
- 238000004590 computer program Methods 0.000 description 8
- 230000003287 optical effect Effects 0.000 description 6
- 238000012545 processing Methods 0.000 description 6
- 230000009286 beneficial effect Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 230000003993 interaction Effects 0.000 description 3
- 238000003491 array Methods 0.000 description 2
- 239000013307 optical fiber Substances 0.000 description 2
- 230000000644 propagated effect Effects 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 1
- 230000006835 compression Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000008713 feedback mechanism Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003032 molecular docking Methods 0.000 description 1
- 238000004806 packaging method and process Methods 0.000 description 1
- 238000012858 packaging process Methods 0.000 description 1
- 239000000047 product Substances 0.000 description 1
- 230000001105 regulatory effect Effects 0.000 description 1
- 230000000153 supplemental effect Effects 0.000 description 1
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/2407—Monitoring of transmitted content, e.g. distribution time, number of downloads
- H04N21/235—Processing of additional data, e.g. scrambling of additional data or processing content descriptors
- H04N21/43072—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen, of multiple content streams on the same device
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
- H04N21/439—Processing of audio elementary streams
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/8547—Content authoring involving timestamps for synchronizing content
Definitions
- Embodiments of the present disclosure relate to the field of computer technology, for example, to an audio and video synchronization monitoring method, device, electronic equipment, and storage medium.
- in a live broadcast scenario, the anchor's client acts as the audio and video data push end, pushing audio and video data to the content distribution network, while the live audience's client acts as the audio and video data pull end.
- the pull end, using a third-party server as an intermediary, requests audio and video data from the content distribution network, and then decodes and plays the fetched audio and video data.
- in the related art, the pull end monitors audio and video synchronization based on the sender report that the third-party server sends when forwarding the audio and video data, together with the timestamps on the audio and video data packets.
- this requires close cooperation between the pull end and the third-party server, so the integration cost is relatively high; moreover, the third-party server is a party other than the push end and the pull end, so the reliability of the time information it provides cannot be guaranteed.
- the embodiments of the present disclosure provide an audio and video synchronization monitoring method, apparatus, electronic device, and storage medium, so that the push end and the pull end no longer rely on time information provided by a third party for audio and video data synchronization, improving the efficiency of streaming.
- an embodiment of the present disclosure provides an audio and video synchronization monitoring method applied to the data push end; the method includes:
- a video reference frame is selected from the video data, and additional enhancement information is written during the encoding of the video reference frame, wherein the additional enhancement information includes reference information for playing the audio data and the video data synchronously;
- the audio data and the video data are played synchronously.
- an embodiment of the present disclosure provides an audio and video synchronization monitoring method applied to the data pull end; the method includes:
- an embodiment of the present disclosure also provides an audio and video synchronization monitoring device configured at the data push end; the device includes:
- the data encoding module is configured to collect audio data and video data to be streamed, and encode the audio data and the video data;
- the data information supplement module is configured to select a video reference frame from the video data, and write additional enhancement information during the encoding of the video reference frame, wherein the additional enhancement information includes reference information for playing the audio data and the video data synchronously;
- the data streaming module is configured to push the encoded audio data and video data to the target content distribution network, so that the data pull end can pull the audio data and the video data through a third-party server and, according to the additional enhancement information, monitor the audio data and the video data to achieve synchronous playback.
- an embodiment of the present disclosure also provides an audio and video synchronization monitoring device configured at the data pull end; the device includes:
- the data pulling module is configured to pull the audio data and video data to be played, and obtain the additional enhancement information of the video reference frame in the video data;
- a data rendering time determination module configured to determine the rendering time of video frames in the video data and the rendering time of audio frames in the audio data based on the additional enhancement information
- the data synchronization monitoring module is configured to monitor the video data and the audio data for synchronous playback according to the rendering time of video frames in the video data and the rendering time of audio frames in the audio data.
- an embodiment of the present disclosure further provides an electronic device, and the electronic device includes:
- one or more processors;
- a storage device configured to store one or more programs;
- when the one or more programs are executed by the one or more processors, the one or more processors implement the audio and video synchronization monitoring method for the data push end or the data pull end as described in any embodiment of the present disclosure.
- the embodiments of the present disclosure further provide a storage medium containing computer-executable instructions, where the computer-executable instructions, when executed by a computer processor, are configured to execute the audio and video synchronization monitoring method described in any one of the embodiments of the present disclosure.
- FIG. 1 is a schematic flowchart of an audio and video synchronization monitoring method applied to a data push end provided by an embodiment of the present disclosure;
- FIG. 2 is a schematic flowchart of an audio and video synchronization monitoring method applied to a data pull end provided by another embodiment of the present disclosure;
- FIG. 3 is a schematic structural diagram of an audio and video synchronization monitoring device configured at a data push end provided by an embodiment of the present disclosure;
- FIG. 4 is a schematic structural diagram of an audio and video synchronization monitoring device configured at a data pull end provided by another embodiment of the present disclosure;
- FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
- the term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
- the term “based on” is “based at least in part on”.
- the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
- FIG. 1 is a schematic flowchart of an audio and video synchronization monitoring method applied to a data push end provided by an embodiment of the present disclosure.
- the embodiment of the present disclosure is applicable to the situation where audio data and video data are played synchronously during a live broadcast.
- the method can be executed by an audio-video synchronization monitoring device configured on a data streaming end, and the device can be implemented in the form of software and/or hardware, and the device can be configured in electronic equipment, for example, in a server device.
- the audio and video synchronization monitoring method applied to the data push end includes:
- S110 Collect audio data and video data to be streamed, and encode the audio data and video data.
- the data push end is the data sender in real-time data interaction; it can be the live broadcast client used by the anchor in a live broadcast, or the conference client of the speaker in a real-time online conference.
- the data to be pushed is the video data and audio data collected by the data push end through the camera and microphone of the terminal device on which it runs.
- data pushing is the process of separately encoding and packaging the collected audio data and video data, and then transmitting the packaged data packets to the target server based on a real-time communication transport protocol.
- the target server is the service node of the content distribution network.
- video encoding refers to converting a file in one video format into a file in another video format through a specific compression technique.
- Commonly used encoding formats in video streaming include H.261, H.263, H.264, M-JPEG, and MPEG.
- an encoded image can be regarded as a video frame.
- audio frames differ depending on the encoding format.
- the audio frame is related to the audio encoding format: audio data can be encoded under multiple encoding standards, and different encoding formats have different parameters such as audio frame length and sampling rate.
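For instance, many codecs use a fixed number of samples per frame, so the frame duration follows directly from the sampling rate; a minimal sketch (the AAC figures below are standard, but the function name is ours):

```python
def audio_frame_duration_ms(samples_per_frame: int, sample_rate_hz: int) -> float:
    """Duration of one audio frame in milliseconds."""
    return samples_per_frame / sample_rate_hz * 1000.0

# AAC packs 1024 samples per frame, so at 44100 Hz one frame lasts ~23.2 ms
print(round(audio_frame_duration_ms(1024, 44100), 1))
```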
- video data usually contains multiple video frames, and the video reference frame serves as a calibration point for the other video frames.
- the rendering time of each frame can be determined by referring to the rendering time of the video reference frame.
- additional information can be added to the encoded video stream through Supplemental Enhancement Information (SEI).
- SEI Supplemental Enhancement Information
- the additional enhancement information includes reference information used to make the audio data play synchronously with the video data.
- the data pull end can determine the video reference frames containing additional enhancement information from the pulled video data, but cannot identify the audio reference frame that corresponds to each video reference frame; therefore, related information of the audio reference frame is set in the additional enhancement information.
- the data push end selects an audio frame whose encoding time is the same as or close to that of the video reference frame, determines it as the audio reference frame corresponding to the video reference frame, and writes the signature, audio frame length, audio data sampling rate, and rendering time of that audio frame into the additional enhancement information, so that the rendering time of each audio frame can be determined.
- the video data sampling rate of the video reference frame and the rendering time of the video reference frame are also included in the additional enhancement information and are used to determine the rendering time of the other video frames.
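As an illustration of what such a payload could carry, here is a hypothetical sketch (the field names are ours, not the patent's; a real SEI message would be serialized into the encoded bitstream rather than held as a Python object):

```python
from dataclasses import dataclass

@dataclass
class SyncSeiPayload:
    """Hypothetical layout of the sync reference data carried in the SEI."""
    # Audio reference frame chosen at the push end
    audio_frame_signature: bytes   # e.g. a digest identifying the audio frame
    audio_frame_length: int        # length of the audio frame in bytes
    audio_sample_rate: int         # audio data sampling rate in Hz
    audio_render_time_ms: float    # rendering time of the audio reference frame
    # Video reference frame that carries this SEI message
    video_sample_rate: int         # video clock rate, e.g. 90000 for RTP video
    video_render_time_ms: float    # rendering time of the video reference frame

sei = SyncSeiPayload(b"\x8f\x01", 371, 44100, 1200.0, 90000, 1200.0)
```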
- the selection of the video reference frame may be selecting a video frame as the video reference frame according to a preset time interval, for example, selecting a video reference frame every two seconds.
- a video reference frame may be selected every certain number of video frames, for example, a video reference frame may be selected every 40 video frames.
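The two selection policies above can be combined: at a known frame rate, a fixed time interval maps to a fixed frame count. A sketch (the policy parameters are illustrative, not mandated by the patent):

```python
def select_reference_indices(total_frames: int, fps: int = 20,
                             interval_seconds: float = 2.0) -> list[int]:
    """Return the indices of frames chosen as video reference frames.

    One reference every `interval_seconds`; at 20 fps and a 2 s interval
    this is one reference every 40 frames, matching the example above.
    """
    step = max(1, int(fps * interval_seconds))
    return list(range(0, total_frames, step))
```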
- the data push end sends the encoded audio data and video data to the target content delivery network (Content Delivery Network, CDN) in the streaming media (Flash Video, FLV) format.
- CDN Content Delivery Network
- Flv streaming media
- the data pull end sends a streaming request to the third-party server; the third-party server pulls the audio data and video data from the target CDN according to the streaming address, converts the pulled FLV-format audio and video data into data packets under the real-time communication protocol, and sends them to the pull end.
- the pull end can thus obtain the video data and audio data that need to be played synchronously.
- the pull end can monitor the synchronous playback of audio data and video data according to the additional enhancement information of the video reference frame in the video data, realizing end-to-end audio and video synchronization monitoring from the push end to the pull end.
- the security and reliability of audio and video synchronization monitoring are enhanced.
- a video reference frame is selected, and additional enhancement information is added to the video reference frame as reference information for the synchronous playback of audio data and video data; finally, the encoded audio data and video data are pushed to the target content distribution network, so that the data pull end can pull the audio data and video data through a third-party server and monitor them for synchronous playback based on the additional enhancement information in the video reference frame.
- the technical solution of the embodiment of the present disclosure avoids the related-art approach in which the data pull end monitors the synchronization of audio and video data based on the messages and timestamp information of a third-party server other than the push end and the pull end, an approach with a high integration cost and low reliability.
- the push end and the pull end no longer rely on the information provided by the third-party server for audio and video data synchronization, which improves the security and reliability of data synchronization at the pull end.
- an embodiment of the present disclosure provides an audio and video synchronization monitoring method applied to the data pull end, which belongs to the same inventive concept as the audio and video synchronization monitoring method applied to the data push end provided in the above embodiments.
- this embodiment describes the process of synchronously playing the pulled audio data and video data at the data pull end.
- the method can be executed by an audio and video synchronization monitoring device configured at the data pull end; the device can be implemented in the form of software and/or hardware, and can be configured in an electronic device or a server device.
- FIG. 2 is a schematic flowchart of an audio and video synchronization monitoring method applied to a data pull end provided by another embodiment of the present disclosure.
- the audio-video synchronous monitoring method provided in this embodiment includes:
- S210 Pull audio data and video data to be played, and acquire additional enhancement information of video reference frames in the video data.
- the data pull end pulls the audio data and video data to be played from the content distribution network through a third-party server.
- the pulled video data includes video reference frames to which additional enhancement information has been added.
- the additional enhancement information is the information added to the video code stream when data encoding is performed at the data push end.
- the additional enhancement information includes reference information for synchronously playing the audio data and video data to be played.
- after a video reference frame is determined, it is first necessary to determine the audio reference frame corresponding to the video reference frame based on the additional enhancement information; then the rendering time of each video frame in the video data and the rendering time of each audio frame in the audio data are determined according to the relevant information of the video reference frame and the audio reference frame. Of course, after the additional enhancement information is acquired, the operations of determining the audio reference frame and determining the rendering time of each video frame may be performed in parallel. The determination process is as follows:
- the video reference frame in the most recently pulled video data is taken as the latest video reference frame, and its additional enhancement information serves as the reference data for calculating the rendering time of the video frames and audio frames whose rendering time is currently unknown.
- one embodiment of determining the rendering time of a video frame whose rendering time is currently unknown is as follows: for each video frame in the pulled video data, calculate the first time difference between the sending timestamp of that video frame and the sending timestamp of the video reference frame.
- rtp_timestamp is the timestamp at which the third-party server sends the real-time communication data packet; its data type is a 64-bit unsigned integer, indicating the time at which the sender report (SR) of the corresponding video frame is sent. Then, according to the first time difference and the video data sampling rate, determine the first rendering time difference between each video frame and the video reference frame; superimpose the first rendering time difference on the video frame rendering time of the video reference frame to determine the rendering time of each video frame.
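Under the assumption that rtp_timestamp counts ticks of the media clock (e.g. a 90 kHz clock for video), the computation just described could be sketched as follows (the function and parameter names are ours, not the patent's):

```python
def video_render_time_ms(frame_rtp_ts: int, ref_rtp_ts: int,
                         video_sample_rate: int, ref_render_time_ms: float) -> float:
    """Rendering time of a video frame, derived from the video reference frame.

    first_time_diff is in clock ticks; dividing by the sampling rate gives
    seconds, and multiplying by 1000 gives the rendering-time offset in ms.
    """
    first_time_diff = frame_rtp_ts - ref_rtp_ts
    first_render_diff_ms = first_time_diff / video_sample_rate * 1000.0
    return ref_render_time_ms + first_render_diff_ms
```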
- one embodiment of determining the rendering time of an audio frame whose rendering time is currently unknown is to first determine the audio reference frame: based on the audio reference frame signature and audio frame length in the additional enhancement information, match the corresponding audio reference frame for the video reference frame. After the data pull end pulls the audio data, it decodes the audio data and temporarily stores it in an audio buffer. After the additional enhancement information is obtained, the audio frame with the same signature and frame length can be matched in the audio buffer as the latest audio reference frame, which serves as the reference for the audio frames whose rendering time is undetermined.
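The buffer lookup might be sketched as below (the tuple layout of the buffer is an assumption for illustration):

```python
def find_audio_reference(audio_buffer, ref_signature: bytes, ref_length: int):
    """Match the audio reference frame in the decoded-audio buffer.

    `audio_buffer` is assumed to hold (signature, frame_length, frame) tuples;
    a frame matches only when both signature and length agree with the SEI.
    """
    for signature, length, frame in audio_buffer:
        if signature == ref_signature and length == ref_length:
            return frame
    return None  # reference frame not yet in the buffer
```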
- the rendering time of each audio frame in the audio data can be determined from the audio data sampling rate and the audio frame rendering time of the audio reference frame in the additional enhancement information, together with the sending timestamp of each audio frame in the audio data.
- rtp_timestamp is the timestamp at which the third-party server sends the real-time communication data packet.
- its data type is a 64-bit unsigned integer, indicating the time at which the sender report (SR) of the corresponding audio frame is sent. For example, according to the second time difference and the audio data sampling rate, determine the second rendering time difference between each audio frame and the audio reference frame; superimpose the second rendering time difference on the audio frame rendering time of the audio reference frame to determine the rendering time of each audio frame.
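The audio-side computation mirrors the video one, using the audio sampling rate and the audio reference frame's rendering time (again, the names are ours):

```python
def audio_render_time_ms(frame_rtp_ts: int, ref_rtp_ts: int,
                         audio_sample_rate: int, ref_render_time_ms: float) -> float:
    """Rendering time of an audio frame, derived from the audio reference frame."""
    second_time_diff = frame_rtp_ts - ref_rtp_ts
    return ref_render_time_ms + second_time_diff / audio_sample_rate * 1000.0
```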
- the audio and video playback can be adjusted according to the real-time video frame rendering time and audio frame rendering time, so that the delay of the video relative to the audio stays within a preset delay range.
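The check on the video's delay relative to the audio could be sketched as below; the threshold values are illustrative only (common lip-sync tolerances, not figures from the patent):

```python
def is_in_sync(video_render_ms: float, audio_render_ms: float,
               max_video_lead_ms: float = 25.0,
               max_video_lag_ms: float = 90.0) -> bool:
    """True if the video's delay relative to the audio is within the preset range."""
    delay = video_render_ms - audio_render_ms  # > 0: video lags behind audio
    return -max_video_lead_ms <= delay <= max_video_lag_ms
```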
- the rendering times of the video frames and audio frames in the video data and audio data are determined according to the additional enhancement information of the video reference frame, so that the audio data and video data are monitored for synchronous playback according to these rendering times; synchronization is monitored based on the actual rendering times of the audio and video frames, without needing audio and video message information from third-party servers.
- the technical solution of the embodiment of the present disclosure avoids the related-art approach in which the data pull end monitors the synchronization of audio and video data based on the messages and timestamp information of a third-party server other than the push end and the pull end, an approach with a high integration cost and low reliability.
- the push end and the pull end no longer rely on the information provided by the third-party server for audio and video data synchronization, which improves the security and reliability of data synchronization at the pull end.
- FIG. 3 is a schematic structural diagram of an audio and video synchronization monitoring device configured at a data push end provided by an embodiment of the present disclosure.
- the audio and video synchronization monitoring device provided in this embodiment and configured at the data push end is suitable for synchronously playing audio data and video data during a live broadcast.
- the audio and video synchronization monitoring device configured at the data push end includes: a data encoding module 310, a data information supplement module 320, and a data streaming module 330.
- the data encoding module 310 is configured to collect audio data and video data to be streamed, and encode the audio data and the video data;
- the data information supplement module 320 is configured to select a video reference frame from the video data and write additional enhancement information during the encoding of the video reference frame, wherein the additional enhancement information includes reference information for playing the audio data and the video data synchronously;
- the data streaming module 330 is configured to push the encoded audio data and video data to the target content distribution network, so that the data pull end can pull the audio data and the video data through the third-party server and, according to the additional enhancement information, monitor the audio data and the video data to achieve synchronous playback.
- a video reference frame is selected, and additional enhancement information is added to the video reference frame as reference information for the synchronous playback of audio data and video data; finally, the encoded audio data and video data are pushed to the target content distribution network, so that the data pull end can pull the audio data and video data through a third-party server and monitor them for synchronous playback based on the additional enhancement information in the video reference frame.
- the technical solution of the embodiment of the present disclosure avoids the related-art approach in which the data pull end monitors the synchronization of audio and video data based on the messages and timestamp information of a third-party server other than the push end and the pull end, an approach with a high integration cost and low reliability.
- the push end and the pull end no longer rely on the information provided by the third-party server for audio and video data synchronization, which improves the security and reliability of data synchronization at the pull end.
- the data information supplement module 320 is set to:
- the audio and video synchronization monitoring device provided by the embodiments of the present disclosure and configured at the data push end can execute the audio and video synchronization monitoring method applied to the data push end provided by any embodiment of the present disclosure, and has corresponding functional modules and beneficial effects.
- FIG. 4 is a schematic structural diagram of an audio and video synchronization monitoring device configured at a data pull end provided by another embodiment of the present disclosure.
- the audio and video synchronization monitoring device configured at the data pull end provided in this embodiment is suitable for the process of synchronously playing the pulled audio data and video data at the data pull end.
- the audio and video synchronization monitoring device configured at the data pull end includes: a data pulling module 410, a data rendering time determination module 420, and a data synchronization monitoring module 430.
- the data pulling module 410 is configured to pull the audio data and video data to be played, and obtain the additional enhancement information of the video reference frame in the video data;
- the data rendering time determination module 420 is configured to determine, based on the additional enhancement information, the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data;
- the data synchronization monitoring module 430 is configured to monitor the video data and the audio data for synchronous playback according to the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data.
- the rendering of the video frame and audio frame in the video data and audio data is determined according to the additional enhancement information of the video reference frame Time, so as to monitor audio data and video data according to the rendering time of video frames and audio frames to achieve synchronous playback, and monitor the synchronization of audio and video based on the actual rendering time of audio and video frames, without the need for audio and video data reports from third-party servers text information.
- The technical solution of the embodiments of the present disclosure avoids the situation in the related art where the data pull end monitors the synchronization of audio and video data based on the reports and timestamp information of a third-party server external to the data push end, which incurs high integration cost and low reliability.
- the push end and the pull end no longer rely on the information provided by the third-party server for audio and video data synchronization, which improves the security and reliability of data synchronization at the pull end.
- the data rendering time determining module 420 includes: an audio reference frame determining submodule, an audio frame rendering time determining submodule, and a video frame rendering time determining submodule.
- the audio reference frame determining sub-module is configured to: match the corresponding audio reference frame for the video reference frame according to the audio reference frame signature and the audio frame length in the additional enhancement information.
- The audio frame rendering time determination submodule is configured to: determine the audio frame rendering time of each audio frame in the audio data through the audio data sampling rate and audio frame rendering time of the audio reference frame in the additional enhancement information, together with the send timestamp of each audio frame in the audio data.
- The video frame rendering time determination submodule is configured to: determine the video frame rendering time of each video frame in the video data through the video data sampling rate and video frame rendering time of the video reference frame in the additional enhancement information, together with the send timestamp of each video frame in the video data.
- the video frame rendering time determination submodule is set to:
- the video frame rendering time of each video frame is determined by superimposing the first rendering time difference with the video frame rendering time of the video reference frame.
- the audio frame rendering time determination submodule is configured to:
- the audio frame rendering time of each audio frame is determined by superimposing the second rendering time difference onto the audio frame rendering time of the audio reference frame.
- the data synchronization monitoring module 430 is set to:
- the arrival time difference of the video data relative to the audio data is determined from the video frame rendering time and data arrival timestamp of the latest video frame in the video data, together with the audio frame rendering time and data arrival time of the latest audio frame in the audio data; and the video data and the audio data are monitored for synchronous playback based on the arrival time difference.
- The audio-video synchronization monitoring device provided by the embodiments of the present disclosure and configured at the data pull end can execute the audio-video synchronization monitoring method applied to the data pull end provided by any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects.
- Referring to FIG. 5, it shows a schematic structural diagram of an electronic device 500 (such as the terminal device or server in FIG. 5) suitable for implementing an embodiment of the present disclosure.
- The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
- the electronic device shown in FIG. 5 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.
- The electronic device 500 may include a processing device (such as a central processing unit or a graphics processor) 501, which can execute various appropriate actions and processes according to a program stored in a read-only memory (Read-Only Memory, ROM) 502 or a program loaded from a storage device 506 into a random access memory (Random Access Memory, RAM) 503.
- The RAM 503 also stores various programs and data necessary for the operation of the electronic device 500.
- The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
- An input/output (I/O) interface 505 is also connected to the bus 504 .
- The following devices may be connected to the I/O interface 505: an input device 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; an output device 507 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage device 508 including, for example, a magnetic tape and a hard disk; and a communication device 509.
- the communication means 509 may allow the electronic device 500 to perform wireless or wired communication with other devices to exchange data. While FIG. 5 shows electronic device 500 having various means, it is to be understood that implementing or possessing all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
- embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
- the computer program may be downloaded and installed from a network via communication means 509 , or from storage means 506 , or from ROM 502 .
- When the computer program is executed by the processing device 501, the above-mentioned functions defined in the audio-video synchronization monitoring method applied to the data push end or the data pull end of the embodiments of the present disclosure are executed.
- The electronic device provided by the embodiments of the present disclosure and the audio-video synchronization monitoring method applied to the data push end or the data pull end provided by the above embodiments belong to the same disclosed concept; for technical details not described in detail in this embodiment, reference may be made to the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
- An embodiment of the present disclosure provides a computer storage medium, on which a computer program is stored.
- the program is executed by a processor, the audio-video synchronization monitoring method applied to the data pull end or the data push end provided by the above embodiment is implemented.
- the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
- a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
- Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory (FLASH), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable signal medium may send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
- Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
- The client and the server can communicate using any currently known or future-developed network protocol, such as the Hypertext Transfer Protocol (HyperText Transfer Protocol, HTTP), and can be interconnected with digital data communication in any form or medium (for example, a communication network).
- Examples of communication networks include local area networks (LANs), wide area networks (WANs), internetworks (eg, the Internet), and peer-to-peer networks (eg, ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
- the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
- the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device:
- audio data and video data to be pushed are collected, and the audio data and the video data are encoded;
- a video reference frame is selected from the video data, and additional enhancement information is written during the encoding of the video reference frame, wherein the additional enhancement information includes reference information for playing the audio data and the video data synchronously;
- the encoded audio data and video data are pushed to a target content delivery network, so that a data pull end can pull the audio data and the video data through a third-party server and monitor the audio data and the video data according to the additional enhancement information to achieve synchronous playback.
- the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device can also be made to:
- Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages, or combinations thereof; these include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- The remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
- Each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
- Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
- The units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. The names of the units and modules do not, in some cases, constitute a limitation on the units and modules themselves; for example, the data generating module may also be described as a "video data generating module".
- exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so on.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
- a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
- Machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
- Example 1 provides an audio-video synchronization monitoring method applied to a data push end, the method including:
- audio data and video data to be pushed are collected, and the audio data and the video data are encoded;
- a video reference frame is selected from the video data, and additional enhancement information is written during the encoding of the video reference frame, wherein the additional enhancement information includes reference information for playing the audio data and the video data synchronously;
- the encoded audio data and video data are pushed to a target content delivery network, so that a data pull end can pull the audio data and the video data through a third-party server and monitor the audio data and the video data according to the additional enhancement information to achieve synchronous playback.
- Example 2 provides the audio-video synchronization monitoring method applied to the data push end, further including:
- the writing of the additional enhancement information during the encoding of the video reference frame includes:
- Example 3 provides an audio-video synchronization monitoring method applied to a data pull end, further including:
- Example 4 provides the audio-video synchronization monitoring method applied to the data pull end, further including:
- determining the rendering time of video frames in the video data and the rendering time of audio frames in the audio data based on the additional enhancement information includes:
- Example 5 provides the audio-video synchronization monitoring method applied to the data pull end, including:
- determining the video frame rendering time of each video frame in the video data by using the video data sampling rate and video frame rendering time of the video reference frame in the additional enhancement information, together with the send timestamp of each video frame in the video data, includes:
- the video frame rendering time of each video frame is determined by superimposing the first rendering time difference with the video frame rendering time of the video reference frame.
- Example 6 provides the audio-video synchronization monitoring method applied to the data pull end, further including:
- determining the audio frame rendering time of each audio frame in the audio data by using the audio data sampling rate and audio frame rendering time of the audio reference frame in the additional enhancement information, together with the send timestamp of each audio frame in the audio data, includes:
- the audio frame rendering time of each audio frame is determined by superimposing the second rendering time difference onto the audio frame rendering time of the audio reference frame.
- Example 7 provides the audio-video synchronization monitoring method applied to the data pull end, further including:
- monitoring the video data and the audio data for synchronous playback according to the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data includes:
- the arrival time difference of the video data relative to the audio data is determined from the video frame rendering time and data arrival timestamp of the latest video frame in the video data, together with the audio frame rendering time and data arrival time of the latest audio frame in the audio data; and the video data and the audio data are monitored for synchronous playback based on the arrival time difference.
- Example 8 provides an audio-video synchronization monitoring device configured at a data push end, including:
- the data encoding module is configured to collect audio data and video data to be streamed, and encode the audio data and the video data;
- the data information supplement module is configured to select a video reference frame from the video data, and write additional enhancement information during the encoding of the video reference frame, wherein the additional enhancement information includes reference information for playing the audio data and the video data synchronously;
- the data push module is configured to push the encoded audio data and video data to a target content delivery network, so that the data pull end can pull the audio data and the video data through a third-party server and monitor the audio data and the video data according to the additional enhancement information to achieve synchronous playback.
- Example 9 provides the audio-video synchronization monitoring device configured at the data push end, further including:
- the data information supplement module is configured to: determine an audio reference frame corresponding to the video reference frame; and write the signature, audio frame length, audio data sampling rate, and audio frame rendering time of the audio reference frame, together with the video data sampling rate and video frame rendering time of the video reference frame, into the encoded data of the video reference frame as the additional enhancement information.
- Example 10 provides an audio-video synchronization monitoring device configured at a data pull end, which includes:
- the data pulling module is configured to pull the audio data and video data to be played, and obtain additional enhanced information of the video reference frame in the video data;
- a data rendering time determination module configured to determine, based on the additional enhancement information, the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data;
- the data synchronization monitoring module is configured to monitor the video data and the audio data for synchronous playback according to the rendering time of video frames in the video data and the rendering time of audio frames in the audio data.
- Example 11 provides the audio-video synchronization monitoring device configured at the data pull end, further including:
- the data rendering time determining module includes: an audio reference frame determining submodule, an audio frame rendering time determining submodule, and a video frame rendering time determining submodule.
- the audio reference frame determining sub-module is configured to: match the corresponding audio reference frame for the video reference frame according to the audio reference frame signature and the audio frame length in the additional enhancement information.
- the audio frame rendering time determination submodule is configured to: determine the audio frame rendering time of each audio frame in the audio data through the audio data sampling rate and audio frame rendering time of the audio reference frame in the additional enhancement information, together with the send timestamp of each audio frame in the audio data.
- the video frame rendering time determination submodule is configured to: determine the video frame rendering time of each video frame in the video data through the video data sampling rate and video frame rendering time of the video reference frame in the additional enhancement information, together with the send timestamp of each video frame in the video data.
- Example 12 provides the audio-video synchronization monitoring device configured at the data pull end, including:
- the video frame rendering time determination submodule is set to:
- the video frame rendering time of each video frame is determined by superimposing the first rendering time difference with the video frame rendering time of the video reference frame.
- Example 13 provides the audio-video synchronization monitoring device configured at the data pull end, further including:
- the audio frame rendering time determination submodule is configured to:
- the audio frame rendering time of each audio frame is determined by superimposing the second rendering time difference onto the audio frame rendering time of the audio reference frame.
- Example 14 provides the audio-video synchronization monitoring device configured at the data pull end, further comprising:
- the data synchronization monitoring module is set to:
- the arrival time difference of the video data relative to the audio data is determined from the video frame rendering time and data arrival timestamp of the latest video frame in the video data, together with the audio frame rendering time and data arrival time of the latest audio frame in the audio data; and the video data and the audio data are monitored for synchronous playback based on the arrival time difference.
Abstract
The embodiments of the present disclosure disclose an audio-video synchronization monitoring method and apparatus, an electronic device, and a storage medium. The method applied to the data push end includes: collecting audio data and video data to be pushed, and encoding the audio data and the video data; selecting a video reference frame from the video data, and writing additional enhancement information during the encoding of the video reference frame, wherein the additional enhancement information includes reference information for playing the audio data and the video data synchronously; and pushing the encoded audio data and video data to a target content delivery network, so that a data pull end can pull the audio data and the video data through a third-party server and monitor the audio data and the video data according to the additional enhancement information to achieve synchronous playback.
Description
This application claims priority to Chinese Patent Application No. 202111241413.5, filed with the Chinese Patent Office on October 25, 2021, the entire contents of which are incorporated herein by reference.
The embodiments of the present disclosure relate to the field of computer technology, for example, to an audio-video synchronization monitoring method and apparatus, an electronic device, and a storage medium.
During live streaming, real-time communication technology is usually adopted to reduce latency. The streamer's client, acting as the audio-video data push end, pushes audio and video data to a content delivery network; the viewer's client, acting as the audio-video data pull end, requests the audio and video data from the content delivery network through a third-party server as an intermediary, and then decodes and plays the pulled audio and video data.
The pull end's synchronization monitoring of the audio and video is based on the sender reports issued by the third-party server when forwarding the audio and video data, and on the timestamps stamped on the audio and video packets. This requires close cooperation between the pull end and the third-party server, with a high integration cost; moreover, since the third-party server is external to both the push end and the pull end, any problem in the sender reports or the real-time communication timestamps will cause the audio and video data played by the pull end to fall out of sync, degrading the playback effect.
Summary
The embodiments of the present disclosure provide an audio-video synchronization monitoring method and apparatus, an electronic device, and a storage medium, which enable the push end and the pull end to synchronize audio and video data without relying on time information provided by a third party, improving the security of data synchronization at the pull end.
In a first aspect, an embodiment of the present disclosure provides an audio-video synchronization monitoring method applied to a data push end, the method including:
collecting audio data and video data to be pushed, and encoding the audio data and the video data;
selecting a video reference frame from the video data, and writing additional enhancement information during the encoding of the video reference frame, wherein the additional enhancement information includes reference information for playing the audio data and the video data synchronously;
pushing the encoded audio data and video data to a target content delivery network, so that a data pull end can pull the audio data and the video data through a third-party server and monitor the audio data and the video data according to the additional enhancement information to achieve synchronous playback.
In a second aspect, an embodiment of the present disclosure provides an audio-video synchronization monitoring method applied to a data pull end, the method including:
pulling audio data and video data to be played, and obtaining the additional enhancement information of a video reference frame in the video data;
determining, based on the additional enhancement information, the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data;
monitoring the video data and the audio data for synchronous playback according to the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data.
In a third aspect, an embodiment of the present disclosure further provides an audio-video synchronization monitoring apparatus configured at a data push end, the apparatus including:
a data encoding module, configured to collect audio data and video data to be pushed, and encode the audio data and the video data;
a data information supplement module, configured to select a video reference frame from the video data, and write additional enhancement information during the encoding of the video reference frame, wherein the additional enhancement information includes reference information for playing the audio data and the video data synchronously;
a data push module, configured to push the encoded audio data and video data to a target content delivery network, so that a data pull end can pull the audio data and the video data through a third-party server and monitor the audio data and the video data according to the additional enhancement information to achieve synchronous playback.
In a fourth aspect, an embodiment of the present disclosure further provides an audio-video synchronization monitoring apparatus configured at a data pull end, the apparatus including:
a data pulling module, configured to pull audio data and video data to be played, and obtain the additional enhancement information of a video reference frame in the video data;
a data rendering time determination module, configured to determine, based on the additional enhancement information, the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data;
a data synchronization monitoring module, configured to monitor the video data and the audio data for synchronous playback according to the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data.
In a fifth aspect, an embodiment of the present disclosure further provides an electronic device, including:
one or more processors;
a storage device, configured to store one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the audio-video synchronization monitoring method applied to the data push end or the data pull end according to any embodiment of the present disclosure.
In a sixth aspect, an embodiment of the present disclosure further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are configured to perform the audio-video synchronization monitoring method applied to the data push end or the data pull end according to any embodiment of the present disclosure.
Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
Fig. 1 is a schematic flowchart of an audio-video synchronization monitoring method applied to a data push end provided by an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of an audio-video synchronization monitoring method applied to a data pull end provided by another embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of an audio-video synchronization monitoring apparatus configured at a data push end provided by an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of an audio-video synchronization monitoring apparatus configured at a data pull end provided by another embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, the method embodiments may include additional steps and/or omit some of the steps shown. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order of, or interdependence between, the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
Fig. 1 is a schematic flowchart of an audio-video synchronization monitoring method applied to a data push end provided by an embodiment of the present disclosure. The embodiment of the present disclosure is applicable to the situation of playing audio data and video data synchronously during live streaming. The method may be executed by an audio-video synchronization monitoring apparatus configured at the data push end; the apparatus may be implemented in the form of software and/or hardware, and may be configured in an electronic device, for example, in a server device.
As shown in Fig. 1, the audio-video synchronization monitoring method applied to the data push end provided by this embodiment includes:
S110. Collect audio data and video data to be pushed, and encode the audio data and the video data.
The data push end is the sender of data in real-time data interaction; it may be the live-streaming client used by the streamer in a live broadcast, or the conference client of a speaker in a real-time online meeting. The data to be pushed is the video data and audio data respectively captured by the camera and microphone of the terminal device on which the data push end runs. Data pushing is the process of encoding and packaging the collected audio data and video data separately, and then transmitting the packaged data packets to a target server based on a real-time communication transmission protocol. The target server is a service node of the content delivery network.
After collecting the audio data and video data, the data push end encodes and packages the data. Video encoding refers to converting a file in one video format into a file in another video format through a specific compression technique. Encoding formats commonly used in video streaming include H.261, H.263, H.264, M-JPEG, and MPEG. For any video encoding format, one encoded picture can be regarded as one video frame. Audio frames, by contrast, vary with the encoding format: an audio frame is tied to the audio encoding format, audio data is encoded under multiple coding standards, and different encoding formats have different parameters such as audio frame length and sampling rate.
S120. Select a video reference frame from the video data, and write additional enhancement information during the encoding of the video reference frame.
The video data usually contains multiple video frames, and the video reference frame serves as a calibration point for the other video frames: the rendering time of each frame can be determined with reference to the rendering time of the video reference frame.
During the packaging of the video data, extra information can be added to the encoded video bitstream through additional enhancement information (Supplemental Enhancement Information, SEI). In this embodiment, the additional enhancement information includes reference information for playing the audio data and the video data synchronously.
For example, in this embodiment, only a small number of video frames in the video data are selected as video reference frames. The data pull end can identify, from the pulled video data, the video reference frames carrying additional enhancement information, but cannot identify the audio reference frames corresponding to those video reference frames. Therefore, information about the audio reference frame is set in the additional enhancement information. The data push end selects an audio frame whose encoding time is synchronized with, or close to, that of the video reference frame as the audio reference frame corresponding to the video reference frame, and uses the signature, audio frame length, audio data sampling rate, and audio frame rendering time of that audio frame as content of the additional enhancement information, for determining the rendering time of each audio frame. In addition, the additional enhancement information also carries the video data sampling rate and video frame rendering time of the video reference frame, for determining the rendering times of the other video frames.
For example, the video reference frame may be selected by taking one video frame as the video reference frame at a preset time interval, for example, one video reference frame every two seconds. Alternatively, one video reference frame may be selected every certain number of video frames, for example, one video reference frame every 40 video frames.
S130. Push the encoded audio data and video data to a target content delivery network, so that a data pull end can pull the audio data and the video data through a third-party server and monitor the audio data and the video data according to the additional enhancement information to achieve synchronous playback.
For example, in a real-time data streaming communication process, the data push end sends the encoded audio data and video data to the target content delivery network (Content Delivery Network, CDN) in a streaming media (Flash Video, Flv) format. The data pull end then issues a pull request to a third-party server; the third-party server pulls the audio data and video data from the target CDN according to the pull address, converts the pulled Flv-format audio data and video data into packets under a real-time communication protocol, and sends them to the pull end. The pull end can thus obtain the video data and audio data that need to be played synchronously. In this embodiment, the pull end can monitor the synchronous playback of the audio data and video data according to the additional enhancement information of the video reference frames in the video data, realizing end-to-end audio-video synchronization monitoring from the push end to the pull end, without relying on the timestamps and sender reports applied by the third-party server when sending packets to the pull end, thereby enhancing the security and reliability of audio-video synchronization monitoring.
In the technical solution of the embodiments of the present disclosure, audio data and video data to be pushed are collected; during the encoding of the audio data and video data, a video reference frame is selected and additional enhancement information is added to the video reference frame as reference information for playing the audio data and the video data synchronously; finally, the encoded audio data and video data are pushed to the target content delivery network, so that the data pull end can pull the audio data and video data through a third-party server and monitor the audio data and video data according to the additional enhancement information in the video reference frame to achieve synchronous playback. The technical solution of the embodiments of the present disclosure avoids the situation in the related art where the data pull end monitors the synchronization of audio and video data based on the reports and timestamp information of a third-party server external to the data push end, which incurs high integration cost and low reliability; it enables the push end and the pull end to synchronize audio and video data without relying on information provided by a third-party server, improving the security and reliability of data synchronization at the pull end.
An embodiment of the present disclosure provides an audio-video synchronization monitoring method applied to a data pull end, which belongs to the same concept as the audio-video synchronization monitoring method applied to the data push end provided in the above embodiment. This embodiment describes the process of synchronously playing the pulled audio data and video data at the data pull end. The method may be executed by an audio-video synchronization monitoring apparatus configured at the data pull end; the apparatus may be implemented in the form of software and/or hardware, and may be configured in an electronic device or a server device.
Fig. 2 is a schematic flowchart of an audio-video synchronization monitoring method applied to a data pull end provided by another embodiment of the present disclosure. As shown in Fig. 2, the audio-video synchronization monitoring method provided by this embodiment includes:
S210. Pull audio data and video data to be played, and obtain the additional enhancement information of a video reference frame in the video data.
In low-latency live streaming or other real-time data transmission scenarios, the data pull end pulls the audio data and video data to be played from the content delivery network through a third-party server. In this embodiment, the pulled video data contains video reference frames to which additional enhancement information has been added. The additional enhancement information is information added to the video bitstream when the data push end encodes the data, and it includes reference information for playing the audio data and video data to be played synchronously.
S220. Determine, based on the additional enhancement information, the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data.
Since only the video data carries additional enhancement information, the video reference frame can be identified directly. First, the audio reference frame corresponding to the video reference frame is determined based on the additional enhancement information. Then, the rendering time of each video frame in the video data and the rendering time of each audio frame in the audio data are determined according to the related information of the video reference frame and the audio reference frame. Of course, after the additional enhancement information is obtained, the operations of determining the audio reference frame and determining the rendering time of each video frame can be performed in parallel. The determination process is as follows:
The video reference frame in the most recently pulled video data is taken as the latest video reference frame, and its additional enhancement information serves as the reference data for calculating the rendering times of the video frames and audio frames whose rendering times are currently unknown.
In one embodiment of determining the video frame rendering time of a video frame whose rendering time is currently unknown, for each video frame in the pulled video data, the first time difference between the send timestamp of that video frame and the send timestamp of the video reference frame is calculated, expressed by the formula:
diff-tsp1 (first time difference) = int64_t(rtp_timestamp of the video frame) - int64_t(rtp_timestamp of the video reference frame).
rtp_timestamp is the timestamp applied by the third-party server when sending the real-time communication data packet; its data type is a 64-bit unsigned integer, and it indicates the moment at which the Sender Report (SR) for the corresponding video frame was sent. Then, the first rendering time difference between each video frame and the video reference frame is determined according to the first time difference and the video data sampling rate, and the first rendering time difference is superimposed onto the video frame rendering time of the video reference frame to determine the video frame rendering time of each video frame. This can be expressed by the formula:
video frame rendering time = video frame rendering time of the video reference frame + diff-tsp1 / video data sampling rate.
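The video-frame computation above can be sketched as follows. The 90 kHz default clock rate is the conventional RTP video clock and the millisecond unit for rendering times are assumptions of this sketch, not values stated in the disclosure:

```python
def video_frame_render_time_ms(frame_rtp_ts: int,
                               ref_rtp_ts: int,
                               ref_render_time_ms: float,
                               video_sample_rate: int = 90000) -> float:
    """Render time of a video frame, anchored to the video reference frame."""
    # diff-tsp1: signed difference of the RTP send timestamps.
    diff_tsp1 = frame_rtp_ts - ref_rtp_ts
    # First rendering time difference, converted from clock ticks to milliseconds.
    first_render_diff_ms = diff_tsp1 / video_sample_rate * 1000.0
    # Superimpose onto the reference frame's rendering time.
    return ref_render_time_ms + first_render_diff_ms

# One frame later at 30 fps on a 90 kHz clock is +3000 ticks, i.e. +33.33 ms.
t = video_frame_render_time_ms(903000, 900000, 1000.0)
```

Python integers do not wrap, so the explicit `int64_t` casts of the formula are not needed here; a C implementation would keep them to make the subtraction well defined.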
In one embodiment of determining the audio frame rendering time of an audio frame whose rendering time is currently unknown, the audio reference frame is first determined by matching the corresponding audio reference frame for the video reference frame according to the audio reference frame signature and audio frame length in the additional enhancement information. After pulling the audio data, the data pull end decodes it and temporarily stores it in an audio buffer. Once the additional enhancement information is obtained, an audio frame whose signature and frame length match can be found in the audio buffer and taken as the current latest reference frame, serving as the reference for the audio frames whose rendering times have not been determined. Then, the audio frame rendering time of each corresponding audio frame in the audio data can be determined through the audio data sampling rate and audio frame rendering time of the audio reference frame in the additional enhancement information, together with the send timestamp of each audio frame in the audio data. For each audio frame in the audio data, the second time difference between the send timestamp of that audio frame and the send timestamp of the audio reference frame is calculated, expressed by the formula:
diff-tsp2 (second time difference) = int64_t(rtp_timestamp of the audio frame) - int64_t(rtp_timestamp of the audio reference frame).
rtp_timestamp is the timestamp applied by the third-party server when sending the real-time communication data packet; its data type is a 64-bit unsigned integer, and it indicates the moment at which the Sender Report (SR) for the corresponding audio frame was sent. For example, the second rendering time difference between each audio frame and the audio reference frame is determined according to the second time difference and the audio data sampling rate, and the second rendering time difference is superimposed onto the audio frame rendering time of the audio reference frame to determine the audio frame rendering time of each audio frame.
This can be expressed by the formula:
audio frame rendering time = audio frame rendering time of the audio reference frame + diff-tsp2 / audio data sampling rate.
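The buffer lookup that anchors the audio side can be sketched as below. The dictionary keys (`signature`, `length`, `rtp_ts`) describing a decoded frame are hypothetical names for this sketch; the disclosure only requires that the signature and frame length from the SEI identify one buffered frame:

```python
def match_audio_reference(buffer, signature: bytes, frame_length: int):
    """Scan the decoded-audio buffer for the frame whose signature and
    length match the SEI fields; return it, or None if not yet received."""
    for frame in buffer:
        if frame["signature"] == signature and frame["length"] == frame_length:
            return frame
    return None

# Illustrative buffer of decoded audio frames (Opus-like 960-sample frames).
buf = [
    {"signature": b"\x01", "length": 960, "rtp_ts": 48000},
    {"signature": b"\x02", "length": 960, "rtp_ts": 48960},
]
ref = match_audio_reference(buf, b"\x02", 960)
```

Returning `None` models the case where the matching audio frame has not arrived yet; a real pull end would retry once more audio data is decoded into the buffer.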
S230. Monitor the video data and the audio data for synchronous playback according to the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data.
In the process of synchronizing audio and video, since human hearing is more sensitive than vision, the video is generally synchronized to the audio clock. Audio-video synchronization in the usual sense allows a certain delay, i.e., the delay must be within an acceptable range. This amounts to a feedback mechanism: when the video is slower than the audio, the video playback speed is increased; otherwise, the video playback speed is decreased.
For example, the arrival time difference of the video data relative to the audio data can be determined from the video frame rendering time of the latest video frame in the pulled video data and the arrival timestamp of the data at the data pull end, together with the audio frame rendering time and data arrival timestamp of the latest audio frame in the audio data. This can be expressed by the formula: arrival time difference of video relative to audio = (rendering time of the latest video frame - arrival time of the latest video frame) - (rendering time of the latest audio frame - arrival time of the latest audio frame). Then, based on the arrival time difference of the video relative to the audio, the time difference of audio-video playback is updated to monitor the video data and audio data for synchronous playback. Moreover, during continuous audio-video playback, playback can be regulated according to the real-time video frame rendering times and audio frame rendering times, so that the delay of the video relative to the audio stays within a preset delay range.
In the technical solution of the embodiments of the present disclosure, after the data pull end in real-time data interaction pulls the audio data and video data, the rendering times of the video frames and audio frames in the video data and audio data are determined according to the additional enhancement information of the video reference frame, so that the audio data and video data are monitored for synchronous playback according to the rendering times of the video frames and audio frames, and the synchronization of audio and video is monitored based on the actual rendering times of the audio and video frames, without needing report information about the audio and video data from a third-party server. The technical solution of the embodiments of the present disclosure avoids the situation in the related art where the data pull end monitors the synchronization of audio and video data based on the reports and timestamp information of a third-party server external to the data push end, which incurs high integration cost and low reliability; it enables the push end and the pull end to synchronize audio and video data without relying on information provided by a third-party server, improving the security and reliability of data synchronization at the pull end.
Fig. 3 is a schematic structural diagram of an audio-video synchronization monitoring apparatus configured at a data push end provided by an embodiment of the present disclosure. The audio-video synchronization monitoring apparatus configured at the data push end provided by this embodiment is applicable to the situation of playing audio data and video data synchronously during live streaming.
As shown in Fig. 3, the audio-video synchronization monitoring apparatus configured at the data push end includes: a data encoding module 310, a data information supplement module 320, and a data push module 330.
The data encoding module 310 is configured to collect audio data and video data to be pushed, and encode the audio data and the video data; the data information supplement module 320 is configured to select a video reference frame from the video data, and write additional enhancement information during the encoding of the video reference frame, wherein the additional enhancement information includes reference information for playing the audio data and the video data synchronously; the data push module 330 is configured to push the encoded audio data and video data to a target content delivery network, so that the data pull end can pull the audio data and the video data through a third-party server and monitor the audio data and the video data according to the additional enhancement information to achieve synchronous playback.
In the technical solution of the embodiments of the present disclosure, audio data and video data to be pushed are collected; during the encoding of the audio data and video data, a video reference frame is selected and additional enhancement information is added to the video reference frame as reference information for playing the audio data and the video data synchronously; finally, the encoded audio data and video data are pushed to the target content delivery network, so that the data pull end can pull the audio data and video data through a third-party server and monitor the audio data and video data according to the additional enhancement information in the video reference frame to achieve synchronous playback. The technical solution of the embodiments of the present disclosure avoids the situation in the related art where the data pull end monitors the synchronization of audio and video data based on the reports and timestamp information of a third-party server external to the data push end, which incurs high integration cost and low reliability; it enables the push end and the pull end to synchronize audio and video data without relying on information provided by a third-party server, improving the security and reliability of data synchronization at the pull end.
For example, the data information supplement module 320 is configured to:
determine an audio reference frame corresponding to the video reference frame;
write the signature, audio frame length, audio data sampling rate, and audio frame rendering time of the audio reference frame, together with the video data sampling rate and video frame rendering time of the video reference frame, into the encoded data of the video reference frame as the additional enhancement information.
The audio-video synchronization monitoring apparatus configured at the data push end provided by the embodiments of the present disclosure can execute the audio-video synchronization monitoring method applied to the data push end provided by any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects for executing the method.
It is worth noting that the multiple units and modules included in the above apparatus are divided only according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, the specific names of the multiple functional units are only for ease of mutual distinction and are not used to limit the protection scope of the embodiments of the present disclosure.
Fig. 4 is a schematic structural diagram of an audio-video synchronization monitoring apparatus configured at a data pull end provided by another embodiment of the present disclosure. The audio-video synchronization monitoring apparatus configured at the data pull end provided by this embodiment is applicable to the process of synchronously playing the pulled audio data and video data at the data pull end.
As shown in Fig. 4, the audio-video synchronization monitoring apparatus configured at the data pull end includes: a data pulling module 410, a data rendering time determination module 420, and a data synchronization monitoring module 430.
The data pulling module 410 is configured to pull audio data and video data to be played, and obtain the additional enhancement information of a video reference frame in the video data; the data rendering time determination module 420 is configured to determine, based on the additional enhancement information, the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data; the data synchronization monitoring module 430 is configured to monitor the video data and the audio data for synchronous playback according to the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data.
In the technical solution of the embodiments of the present disclosure, after the data pull end in real-time data interaction pulls the audio data and video data, the rendering times of the video frames and audio frames in the video data and audio data are determined according to the additional enhancement information of the video reference frame, so that the audio data and video data are monitored for synchronous playback according to the rendering times of the video frames and audio frames, and the synchronization of audio and video is monitored based on the actual rendering times of the audio and video frames, without needing report information about the audio and video data from a third-party server. The technical solution of the embodiments of the present disclosure avoids the situation in the related art where the data pull end monitors the synchronization of audio and video data based on the reports and timestamp information of a third-party server external to the data push end, which incurs high integration cost and low reliability; it enables the push end and the pull end to synchronize audio and video data without relying on information provided by a third-party server, improving the security and reliability of data synchronization at the pull end.
For example, the data rendering time determination module 420 includes: an audio reference frame determination submodule, an audio frame rendering time determination submodule, and a video frame rendering time determination submodule.
The audio reference frame determination submodule is configured to: match the corresponding audio reference frame for the video reference frame according to the audio reference frame signature and audio frame length in the additional enhancement information.
The audio frame rendering time determination submodule is configured to: determine the audio frame rendering time of each audio frame in the audio data through the audio data sampling rate and audio frame rendering time of the audio reference frame in the additional enhancement information, together with the send timestamp of each audio frame in the audio data.
The video frame rendering time determination submodule is configured to: determine the video frame rendering time of each video frame in the video data through the video data sampling rate and video frame rendering time of the video reference frame in the additional enhancement information, together with the send timestamp of each video frame in the video data.
For example, the video frame rendering time determination submodule is configured to:
for each video frame in the video data, calculate the first time difference between the send timestamp of that video frame and the send timestamp of the video reference frame;
determine the first rendering time difference between each video frame and the video reference frame according to the first time difference and the video data sampling rate;
superimpose the first rendering time difference onto the video frame rendering time of the video reference frame to determine the video frame rendering time of each video frame.
For example, the audio frame rendering time determination submodule is configured to:
for each audio frame in the audio data, calculate the second time difference between the send timestamp of that audio frame and the send timestamp of the audio reference frame;
determine the second rendering time difference between each audio frame and the audio reference frame according to the second time difference and the audio data sampling rate;
superimpose the second rendering time difference onto the audio frame rendering time of the audio reference frame to determine the audio frame rendering time of each audio frame.
For example, the data synchronization monitoring module 430 is configured to:
determine the arrival time difference of the video data relative to the audio data through the video frame rendering time and data arrival timestamp of the latest video frame in the video data, together with the audio frame rendering time and data arrival time of the latest audio frame in the audio data;
monitor the video data and audio data for synchronous playback based on the arrival time difference.
The audio-video synchronization monitoring apparatus configured at the data pull end provided by the embodiments of the present disclosure can execute the audio-video synchronization monitoring method applied to the data pull end provided by any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects for executing the method.
It is worth noting that the multiple units and modules included in the above apparatus are divided only according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, the specific names of the multiple functional units are only for ease of mutual distinction and are not used to limit the protection scope of the embodiments of the present disclosure.
Referring now to Fig. 5, it shows a schematic structural diagram of an electronic device (such as the terminal device or server in Fig. 5) 500 suitable for implementing the embodiments of the present disclosure. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in Fig. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 5, the electronic device 500 may include a processing device (such as a central processing unit or a graphics processor) 501, which can execute various appropriate actions and processes according to a program stored in a read-only memory (Read-Only Memory, ROM) 502 or a program loaded from a storage device 506 into a random access memory (Random Access Memory, RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: an input device 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; an output device 507 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage device 508 including, for example, a magnetic tape and a hard disk; and a communication device 509. The communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 5 shows the electronic device 500 having various devices, it should be understood that it is not required to implement or possess all of the devices shown; more or fewer devices may alternatively be implemented or provided.
According to the embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 509, or installed from the storage device 506, or installed from the ROM 502. When the computer program is executed by the processing device 501, the above-mentioned functions defined in the audio-video synchronization monitoring method applied to the data push end or the data pull end of the embodiments of the present disclosure are executed.
The electronic device provided by the embodiments of the present disclosure and the audio-video synchronization monitoring method applied to the data push end or the data pull end provided by the above embodiments belong to the same disclosed concept; for technical details not described in detail in this embodiment, reference may be made to the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
The embodiments of the present disclosure provide a computer storage medium on which a computer program is stored; when the program is executed by a processor, the audio-video synchronization monitoring method applied to the data pull end or the data push end provided by the above embodiments is implemented.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM) or flash memory (FLASH), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it may send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (e.g., the Internet) and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be contained in the above electronic device, or may exist independently without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
collect audio data and video data to be pushed, and encode the audio data and the video data;
select a video reference frame in the video data, and write supplemental enhancement information into the video reference frame during its encoding, wherein the supplemental enhancement information includes reference information for playing the audio data and the video data synchronously;
push the encoded audio data and video data to a target content delivery network, so that a data pull end pulls the audio data and the video data through a third-party server and, according to the supplemental enhancement information, monitors the audio data and the video data to achieve synchronized playback.
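The push-end flow above (encode, select a reference frame, attach the synchronization reference information, push) can be sketched as follows. This is an illustrative sketch only; `VideoFrame`, `attach_sync_info` and the field names are hypothetical and not prescribed by the text:

```python
# Illustrative sketch of the push-end flow: pick a reference (key) frame and
# attach the synchronization reference info to it before pushing to the CDN.
from dataclasses import dataclass, field

@dataclass
class VideoFrame:
    send_ts: int                             # sending timestamp, in clock ticks
    is_keyframe: bool = False                # reference frames are chosen among keyframes here
    sei: dict = field(default_factory=dict)  # slot for supplemental enhancement information

def attach_sync_info(frames, sync_info):
    """Write the sync reference info into the first keyframe's SEI slot."""
    for frame in frames:
        if frame.is_keyframe:
            frame.sei.update(sync_info)      # this frame becomes the video reference frame
            return frame
    return None                              # no keyframe available yet

frames = [VideoFrame(0), VideoFrame(3000, is_keyframe=True), VideoFrame(6000)]
ref = attach_sync_info(frames, {"video_sample_rate": 90000, "video_render_time_ms": 0.0})
```

A real encoder would emit this information as part of the coded reference frame; the dictionary here only stands in for that payload.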
The above computer-readable medium carries one or more programs which, when executed by the electronic device, may further cause the electronic device to:
pull audio data and video data to be played, and acquire the supplemental enhancement information of a video reference frame in the video data;
determine, based on the supplemental enhancement information, the rendering time of video frames in the video data and the rendering time of audio frames in the audio data;
monitor the synchronized playback of the video data and the audio data according to the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data.
Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. In some cases, the name of a unit or module does not constitute a limitation on the unit or module itself; for example, a data generation module may also be described as a "video data generation module".
The functions described herein above may be executed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard parts (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in combination with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the above. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, [Example One] provides an audio-video synchronization monitoring method applied to a data push end, the method comprising:
collecting audio data and video data to be pushed, and encoding the audio data and the video data;
selecting a video reference frame in the video data, and writing supplemental enhancement information into the video reference frame during its encoding, wherein the supplemental enhancement information includes reference information for playing the audio data and the video data synchronously;
pushing the encoded audio data and video data to a target content delivery network, so that a data pull end pulls the audio data and the video data through a third-party server and, according to the supplemental enhancement information, monitors the audio data and the video data to achieve synchronized playback.
According to one or more embodiments of the present disclosure, [Example Two] provides an audio-video synchronization monitoring method applied to a data push end, further comprising:
For example, the writing of supplemental enhancement information during the encoding of the video reference frame includes:
determining an audio reference frame corresponding to the video reference frame;
writing the signature, audio frame length, audio sample rate and audio frame rendering time of the audio reference frame, together with the video sample rate and video frame rendering time of the video reference frame, into the coded data of the video reference frame as the supplemental enhancement information.
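The six fields listed above can be grouped into a single payload. A minimal sketch, assuming a JSON encoding and hypothetical field names (the text does not prescribe a serialization):

```python
# Sketch of the supplemental-enhancement-information payload described above:
# four fields from the audio reference frame plus two from the video reference frame.
import json
from dataclasses import dataclass, asdict

@dataclass
class SyncInfo:
    audio_signature: str         # signature of the audio reference frame
    audio_frame_len: int         # audio frame length
    audio_sample_rate: int       # audio sample rate, Hz
    audio_render_time_ms: float  # audio frame rendering time
    video_sample_rate: int       # video sample rate (clock rate)
    video_render_time_ms: float  # video frame rendering time

info = SyncInfo("a1b2c3", 1024, 48000, 0.0, 90000, 0.0)
payload = json.dumps(asdict(info))  # string to embed in the reference frame's coded data
```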
According to one or more embodiments of the present disclosure, [Example Three] provides an audio-video synchronization monitoring method applied to a data pull end, further comprising:
pulling audio data and video data to be played, and acquiring the supplemental enhancement information of a video reference frame in the video data;
determining, based on the supplemental enhancement information, the rendering time of video frames in the video data and the rendering time of audio frames in the audio data;
monitoring the synchronized playback of the video data and the audio data according to the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data.
According to one or more embodiments of the present disclosure, [Example Four] provides an audio-video synchronization monitoring method applied to a data pull end, further comprising:
For example, determining, based on the supplemental enhancement information, the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data includes:
matching a corresponding audio reference frame for the video reference frame according to the audio reference frame signature and the audio frame length in the supplemental enhancement information;
determining the audio frame rendering time of each audio frame in the audio data from the audio sample rate and audio frame rendering time of the audio reference frame in the supplemental enhancement information, together with the sending timestamp of that audio frame;
determining the video frame rendering time of each video frame in the video data from the video sample rate and video frame rendering time of the video reference frame in the supplemental enhancement information, together with the sending timestamp of that video frame.
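The matching step in the first item above can be sketched as a linear scan over the received audio frames, comparing signature and frame length against the fields carried in the supplemental enhancement information; the data layout here is a hypothetical simplification:

```python
def match_audio_reference(audio_frames, ref_signature, ref_frame_len):
    """Return the index of the audio frame whose signature and length match
    the audio-reference fields from the enhancement info, or -1 if none."""
    for idx, (signature, frame_len) in enumerate(audio_frames):
        if signature == ref_signature and frame_len == ref_frame_len:
            return idx
    return -1

# received frames modeled as (signature, frame_length) pairs
received = [("f00d", 960), ("a1b2", 1024), ("beef", 1024)]
match_idx = match_audio_reference(received, "a1b2", 1024)  # matches the second frame
```

Checking both fields guards against two frames that happen to share a length but carry different content.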
According to one or more embodiments of the present disclosure, [Example Five] provides an audio-video synchronization monitoring method applied to a data pull end, comprising:
the determining of the video frame rendering time of each video frame in the video data from the video sample rate and video frame rendering time of the video reference frame in the supplemental enhancement information, together with the sending timestamp of that video frame, includes:
for each video frame in the video data, calculating a first time difference between the sending timestamp of the video frame and the sending timestamp of the video reference frame;
determining a first rendering time difference between the video frame and the video reference frame according to the first time difference and the video sample rate;
superimposing the first rendering time difference onto the video frame rendering time of the video reference frame to determine the video frame rendering time of the video frame.
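Assuming sending timestamps count ticks of the stream's media clock (a common RTP convention; the text itself does not fix the units), the three steps above collapse into one formula, which the audio case in Example Six shares once the audio sample rate and audio reference values are substituted:

```python
def frame_render_time_ms(frame_send_ts, ref_send_ts, sample_rate, ref_render_time_ms):
    """Render time of a frame, derived from the reference frame's SEI fields."""
    tick_diff = frame_send_ts - ref_send_ts            # first time difference, in ticks
    render_diff_ms = tick_diff * 1000.0 / sample_rate  # first rendering time difference
    return ref_render_time_ms + render_diff_ms         # superimpose onto the reference time

# 9000 ticks at a 90 kHz video clock is 100 ms after the reference frame
video_t = frame_render_time_ms(9000, 0, 90000, 0.0)
```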
According to one or more embodiments of the present disclosure, [Example Six] provides an audio-video synchronization monitoring method applied to a data pull end, further comprising:
For example, the determining of the audio frame rendering time of each audio frame in the audio data from the audio sample rate and audio frame rendering time of the audio reference frame in the supplemental enhancement information, together with the sending timestamp of that audio frame, includes:
for each audio frame in the audio data, calculating a second time difference between the sending timestamp of the audio frame and the sending timestamp of the audio reference frame;
determining a second rendering time difference between the audio frame and the audio reference frame according to the second time difference and the audio sample rate;
superimposing the second rendering time difference onto the audio frame rendering time of the audio reference frame to determine the audio frame rendering time of the audio frame.
According to one or more embodiments of the present disclosure, [Example Seven] provides an audio-video synchronization monitoring method applied to a data pull end, further comprising:
For example, the monitoring of the synchronized playback of the video data and the audio data according to the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data includes:
determining an arrival time difference of the video data relative to the audio data from the video frame rendering time and the data arrival timestamp of the latest video frame in the video data, together with the audio frame rendering time and the data arrival timestamp of the latest audio frame in the audio data;
monitoring the synchronized playback of the video data and the audio data based on the arrival time difference.
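One plausible reading of the comparison above (the text does not spell out the formula) is to compare the transport delay of each stream's latest frame, that is, arrival time minus rendering time, and treat the difference as the audio-video skew. The 80 ms tolerance below is a hypothetical threshold, not taken from the text:

```python
def av_arrival_skew_ms(video_arrival_ms, video_render_ms,
                       audio_arrival_ms, audio_render_ms):
    """Arrival time difference of video relative to audio for the latest frames."""
    video_delay = video_arrival_ms - video_render_ms
    audio_delay = audio_arrival_ms - audio_render_ms
    return video_delay - audio_delay

def in_sync(skew_ms, threshold_ms=80.0):
    # hypothetical tolerance; a monitor would alert when the skew exceeds it
    return abs(skew_ms) <= threshold_ms

skew = av_arrival_skew_ms(1100.0, 1000.0, 1050.0, 1000.0)  # video arrives 50 ms later
```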
According to one or more embodiments of the present disclosure, [Example Eight] provides an audio-video synchronization monitoring apparatus configured at a data push end, comprising:
a data encoding module, configured to collect audio data and video data to be pushed and to encode the audio data and the video data;
a data information supplement module, configured to select a video reference frame in the video data and to write supplemental enhancement information into the video reference frame during its encoding, wherein the supplemental enhancement information includes reference information for playing the audio data and the video data synchronously;
a data pushing module, configured to push the encoded audio data and video data to a target content delivery network, so that a data pull end pulls the audio data and the video data through a third-party server and, according to the supplemental enhancement information, monitors the audio data and the video data to achieve synchronized playback.
According to one or more embodiments of the present disclosure, [Example Nine] provides an audio-video synchronization monitoring apparatus configured at a data push end, further comprising:
For example, the data information supplement module is configured to:
determine an audio reference frame corresponding to the video reference frame;
write the signature, audio frame length, audio sample rate and audio frame rendering time of the audio reference frame, together with the video sample rate and video frame rendering time of the video reference frame, into the coded data of the video reference frame as the supplemental enhancement information.
According to one or more embodiments of the present disclosure, [Example Ten] provides an audio-video synchronization monitoring apparatus configured at a data pull end, further comprising:
a data pulling module, configured to pull audio data and video data to be played and to acquire the supplemental enhancement information of a video reference frame in the video data;
a data rendering time determination module, configured to determine, based on the supplemental enhancement information, the rendering time of video frames in the video data and the rendering time of audio frames in the audio data;
a data synchronization monitoring module, configured to monitor the synchronized playback of the video data and the audio data according to the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data.
According to one or more embodiments of the present disclosure, [Example Eleven] provides an audio-video synchronization monitoring apparatus configured at a data pull end, further comprising:
For example, the data rendering time determination module includes an audio reference frame determination sub-module, an audio frame rendering time determination sub-module and a video frame rendering time determination sub-module.
The audio reference frame determination sub-module is configured to: match a corresponding audio reference frame for the video reference frame according to the audio reference frame signature and the audio frame length in the supplemental enhancement information.
The audio frame rendering time determination sub-module is configured to: determine the audio frame rendering time of each audio frame in the audio data from the audio sample rate and audio frame rendering time of the audio reference frame in the supplemental enhancement information, together with the sending timestamp of that audio frame.
The video frame rendering time determination sub-module is configured to: determine the video frame rendering time of each video frame in the video data from the video sample rate and video frame rendering time of the video reference frame in the supplemental enhancement information, together with the sending timestamp of that video frame.
According to one or more embodiments of the present disclosure, [Example Twelve] provides an audio-video synchronization monitoring apparatus configured at a data pull end, comprising:
For example, the video frame rendering time determination sub-module is configured to:
for each video frame in the video data, calculate a first time difference between the sending timestamp of the video frame and the sending timestamp of the video reference frame;
determine a first rendering time difference between the video frame and the video reference frame according to the first time difference and the video sample rate;
superimpose the first rendering time difference onto the video frame rendering time of the video reference frame to determine the video frame rendering time of the video frame.
According to one or more embodiments of the present disclosure, [Example Thirteen] provides an audio-video synchronization monitoring apparatus configured at a data pull end, further comprising:
For example, the audio frame rendering time determination sub-module is configured to:
for each audio frame in the audio data, calculate a second time difference between the sending timestamp of the audio frame and the sending timestamp of the audio reference frame;
determine a second rendering time difference between the audio frame and the audio reference frame according to the second time difference and the audio sample rate;
superimpose the second rendering time difference onto the audio frame rendering time of the audio reference frame to determine the audio frame rendering time of the audio frame.
According to one or more embodiments of the present disclosure, [Example Fourteen] provides an audio-video synchronization monitoring apparatus configured at a data pull end, further comprising:
For example, the data synchronization monitoring module is configured to:
determine an arrival time difference of the video data relative to the audio data from the video frame rendering time and the data arrival timestamp of the latest video frame in the video data, together with the audio frame rendering time and the data arrival timestamp of the latest audio frame in the audio data;
monitor the synchronized playback of the video data and the audio data based on the arrival time difference.
The above description is merely an illustration of example embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combinations of the above technical features; it should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Furthermore, although operations are depicted in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.
Claims (11)
- An audio-video synchronization monitoring method, applied to a data push end, comprising: collecting audio data and video data to be pushed, and encoding the audio data and the video data; selecting a video reference frame in the video data, and writing supplemental enhancement information into the video reference frame during its encoding, wherein the supplemental enhancement information comprises reference information for playing the audio data and the video data synchronously; and pushing the encoded audio data and video data to a target content delivery network, so that a data pull end pulls the audio data and the video data through a third-party server and, according to the supplemental enhancement information, monitors the audio data and the video data to achieve synchronized playback.
- The method according to claim 1, wherein the writing of supplemental enhancement information during the encoding of the video reference frame comprises: determining an audio reference frame corresponding to the video reference frame; and writing the signature, audio frame length, audio sample rate and audio frame rendering time of the audio reference frame, together with the video sample rate and video frame rendering time of the video reference frame, into the coded data of the video reference frame as the supplemental enhancement information.
- An audio-video synchronization monitoring method, applied to a data pull end, comprising: pulling audio data and video data to be played, and acquiring supplemental enhancement information of a video reference frame in the video data; determining, based on the supplemental enhancement information, the rendering time of video frames in the video data and the rendering time of audio frames in the audio data; and monitoring the synchronized playback of the video data and the audio data according to the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data.
- The method according to claim 3, wherein the video data comprises a plurality of video frames and the audio data comprises a plurality of audio frames, and determining, based on the supplemental enhancement information, the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data comprises: matching a corresponding audio reference frame for the video reference frame according to the audio reference frame signature and the audio frame length in the supplemental enhancement information; determining the audio frame rendering time of each audio frame in the audio data from the audio sample rate and audio frame rendering time of the audio reference frame in the supplemental enhancement information, together with the sending timestamp of each audio frame; and determining the video frame rendering time of each video frame in the video data from the video sample rate and video frame rendering time of the video reference frame in the supplemental enhancement information, together with the sending timestamp of each video frame.
- The method according to claim 4, wherein determining the video frame rendering time of each video frame in the video data from the video sample rate and video frame rendering time of the video reference frame in the supplemental enhancement information, together with the sending timestamp of each video frame, comprises: for each video frame in the video data, calculating a first time difference between the sending timestamp of the video frame and the sending timestamp of the video reference frame; determining a first rendering time difference between the video frame and the video reference frame according to the first time difference and the video sample rate; and superimposing the first rendering time difference onto the video frame rendering time of the video reference frame to determine the video frame rendering time of the video frame.
- The method according to claim 4, wherein determining the audio frame rendering time of each audio frame in the audio data from the audio sample rate and audio frame rendering time of the audio reference frame in the supplemental enhancement information, together with the sending timestamp of each audio frame, comprises: for each audio frame in the audio data, calculating a second time difference between the sending timestamp of the audio frame and the sending timestamp of the audio reference frame; determining a second rendering time difference between the audio frame and the audio reference frame according to the second time difference and the audio sample rate; and superimposing the second rendering time difference onto the audio frame rendering time of the audio reference frame to determine the audio frame rendering time of the audio frame.
- The method according to claim 3, wherein the monitoring of the synchronized playback of the video data and the audio data according to the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data comprises: determining an arrival time difference of the video data relative to the audio data from the video frame rendering time and the video data arrival timestamp of the latest video frame in the video data, together with the audio frame rendering time and the audio data arrival timestamp of the latest audio frame in the audio data; and monitoring the synchronized playback of the video data and the audio data based on the arrival time difference.
- An audio-video synchronization monitoring apparatus, configured at a data push end, comprising: a data encoding module, configured to collect audio data and video data to be pushed and to encode the audio data and the video data; a data information supplement module, configured to select a video reference frame in the video data and to write supplemental enhancement information into the video reference frame during its encoding, wherein the supplemental enhancement information comprises reference information for playing the audio data and the video data synchronously; and a data pushing module, configured to push the encoded audio data and video data to a target content delivery network, so that a data pull end pulls the audio data and the video data through a third-party server and, according to the supplemental enhancement information, monitors the audio data and the video data to achieve synchronized playback.
- An audio-video synchronization monitoring apparatus, configured at a data pull end, comprising: a data pulling module, configured to pull audio data and video data to be played and to acquire the supplemental enhancement information of a video reference frame in the video data; a data rendering time determination module, configured to determine, based on the supplemental enhancement information, the rendering time of video frames in the video data and the rendering time of audio frames in the audio data; and a data synchronization monitoring module, configured to monitor the synchronized playback of the video data and the audio data according to the rendering time of the video frames in the video data and the rendering time of the audio frames in the audio data.
- An electronic device, comprising: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the audio-video synchronization monitoring method applied to a data push end or a data pull end according to any one of claims 1-7.
- A storage medium containing computer-executable instructions which, when executed by a computer processor, are configured to execute the audio-video synchronization monitoring method applied to a data push end or a data pull end according to any one of claims 1-7.
Priority Applications (1)
- US 18/572,317 (US20240292044A1), priority date 2021-10-25, filed 2022-09-16: Method, apparatus, electronic device and storage medium for audio and video synchronization monitoring
Applications Claiming Priority (2)
- CN202111241413.5A, filed 2021-10-25: Audio-video synchronization monitoring method, apparatus, electronic device and storage medium
- CN202111241413.5, priority date 2021-10-25
Publications (1)
- WO2023071598A1, published 2023-05-04
Family ID: 79013943
Family Applications (1)
- PCT/CN2022/119419, priority date 2021-10-25, filed 2022-09-16
Country Status (3)
- US: US20240292044A1
- CN: CN113891132B
- WO: WO2023071598A1
Families Citing this family (3)
- CN113891132B (北京字节跳动网络技术有限公司, 2021-10-25, published 2024-07-19): Audio-video synchronization monitoring method, apparatus, electronic device and storage medium
- CN114339454A (浙江大华技术股份有限公司, 2022-03-11): Audio-video synchronization method, apparatus, electronic apparatus and storage medium
- CN114710687B (阿里巴巴(中国)有限公司, 2022-03-22): Audio-video synchronization method, apparatus, device and storage medium
Citations (6)
- CN104410894A (大唐移动通信设备有限公司, 2014-11-19): Method and device for audio-video synchronization in a wireless environment
- WO2017107516A1 (乐视控股(北京)有限公司, 2015-12-22): Network video playing method and device
- CN111464256A (北京百度网讯科技有限公司, 2020-04-14): Timestamp correction method, apparatus, electronic device and storage medium
- CN112272327A (腾讯科技(深圳)有限公司, 2020-10-26): Data processing method, apparatus, storage medium and device
- CN112291498A (新东方教育科技集团有限公司, 2020-10-30): Method, apparatus and storage medium for audio-video data transmission
- CN113891132A (北京字节跳动网络技术有限公司, 2021-10-25): Audio-video synchronization monitoring method, apparatus, electronic device and storage medium
Family Cites Families (14)
- JP2010252151A (Renesas Electronics Corp, 2009-04-17): Playback device and synchronized playback method
- CN103167320B (中国电信股份有限公司, 2011-12-15): Audio-video synchronization method, system and mobile-phone live-streaming client
- US9979997B2 (International Business Machines Corporation, 2015-10-14): Synchronization of live audio and video data streams
- US10177958B2 (Da Sheng Inc., 2017-02-07): Method for synchronously taking audio and video in order to proceed one-to-multi multimedia stream
- CN109218794B (全球能源互联网研究院, 2017-06-30): Remote work guidance method and system
- CN109089130B (网宿科技股份有限公司, 2018-09-18): Method and device for adjusting timestamps of live video
- CN109660843A (深圳市九洲电器有限公司, 2018-12-29): In-vehicle real-time playback method and system
- CN110062277A (北京河马能量体育科技有限公司, 2019-03-13): Automatic audio-video synchronization method and synchronization system
- US11102540B2 (Wangsu Science & Technology Co., Ltd., 2019-04-04): Method, device and system for synchronously playing message stream and audio-video stream
- US11228799B2 (Comcast Cable Communications, Llc, 2019-04-17): Methods and systems for content synchronization
- CN110234028A (北京大米科技有限公司, 2019-06-13): Method, apparatus, system, electronic device and medium for synchronized playback of audio-video data
- CN110753202B (广州河东科技有限公司, 2019-10-30): Audio-video synchronization method, apparatus, device and storage medium for a video intercom system
- CN111294634B (腾讯科技(深圳)有限公司, 2020-02-27): Live-streaming method, apparatus, system, device and computer-readable storage medium
- CN111654736B (北京百度网讯科技有限公司, 2020-06-10): Method, apparatus, electronic device and storage medium for determining audio-video synchronization error
- 2021-10-25: CN application CN202111241413.5A filed (granted as CN113891132B, status: active)
- 2022-09-16: US application 18/572,317 filed (US20240292044A1, status: pending); PCT application PCT/CN2022/119419 filed (WO2023071598A1)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104410894A (zh) * | 2014-11-19 | 2015-03-11 | 大唐移动通信设备有限公司 | 一种无线环境影音同步的方法和装置 |
WO2017107516A1 (zh) * | 2015-12-22 | 2017-06-29 | 乐视控股(北京)有限公司 | 网络视频播放方法及装置 |
CN111464256A (zh) * | 2020-04-14 | 2020-07-28 | 北京百度网讯科技有限公司 | 时间戳的校正方法、装置、电子设备和存储介质 |
CN112272327A (zh) * | 2020-10-26 | 2021-01-26 | 腾讯科技(深圳)有限公司 | 数据处理方法、装置、存储介质及设备 |
CN112291498A (zh) * | 2020-10-30 | 2021-01-29 | 新东方教育科技集团有限公司 | 音视频数据传输的方法、装置和存储介质 |
CN113891132A (zh) * | 2021-10-25 | 2022-01-04 | 北京字节跳动网络技术有限公司 | 一种音视频同步监控方法、装置、电子设备及存储介质 |
Also Published As
- CN113891132B, published 2024-07-19
- CN113891132A, published 2022-01-04
- US20240292044A1, published 2024-08-29
Legal Events
- 121: The EPO has been informed by WIPO that EP was designated in this application (ref document: 22885485; country: EP; kind code: A1)
- WWE: WIPO information, entry into national phase (ref document: 18572317; country: US)
- NENP: Non-entry into the national phase (country: DE)