CN114697720A - Method and device for synchronizing self-adaptive audio and video RTP timestamp - Google Patents


Publication number
CN114697720A
Authority
CN
China
Prior art keywords: packet, video, audio, timestamp, NTP
Legal status: Granted
Application number
CN202011629055.0A
Other languages
Chinese (zh)
Other versions: CN114697720B (en)
Inventor
符宁
李嘉豪
杨尚山
Current Assignee
Beijing Yizhangyunfeng Co ltd
Original Assignee
Beijing Yizhangyunfeng Co ltd
Application filed by Beijing Yizhangyunfeng Co ltd filed Critical Beijing Yizhangyunfeng Co ltd
Priority to CN202011629055.0A
Publication of CN114697720A
Application granted
Publication of CN114697720B
Current legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/63Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/643Communication protocols
    • H04N21/6437Real-time Transport Protocol [RTP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Synchronisation In Digital Transmission Systems (AREA)

Abstract

The invention relates to a method and a device for synchronizing adaptive audio and video RTP timestamps. The method comprises the following steps: a sending end sends the video packets and audio packets of an audio and video service to a receiving end. When the receiving end receives the RTP packet of the first video packet and the RTP packet of the first audio packet, the local timestamps at reception are used as the absolute timestamps of the first video packet and the first audio packet respectively, so that the video and audio packets are synchronized using absolute time. When the receiving end receives the NTP packet of the first video packet and the NTP packet of the first audio packet, a deviation value is computed between the local timestamps of the nth video and audio packets and their NTP absolute timestamps, and smooth deviation compensation is applied to this value during subsequent local playback.

Description

Method and device for synchronizing self-adaptive audio and video RTP timestamp
Technical Field
The invention relates to the technical field of computer audio and video, and in particular to a method for synchronizing the timestamps of RTP (Real-time Transport Protocol) and RTCP (RTP Control Protocol) audio and video data packets.
Background
Audio and video synchronization control is a key technique in real-time audio and video, especially video conferencing; whether audio and video are synchronized strongly affects the call experience. Real-time audio and video generally use the RTP/RTCP protocols to transmit media data, and audio and video packets are transmitted separately, so network transmission delay leaves the received data unsynchronized. The receiving end therefore has to buffer, order, synchronize and render the received data.
Two timestamps exist in audio and video data packets: the relative timestamp in the RTP packet and the NTP absolute timestamp in the RTCP packet. A relative timestamp is present in every packet, while an absolute timestamp is typically sent only every few seconds, and some streams carry no absolute timestamp at all.
There are three existing timestamp synchronization methods. The first uses relative timestamps for synchronization control; it is simple to implement, but the relative audio and video timestamps generated by the sending end must start from a fixed value, and a receiving end that joins a conference mid-call cannot synchronize (although the server can rebase the timestamps of a newly joined receiver so that they start from a fixed value). The second uses absolute timestamps for synchronization control; it can achieve absolute synchronization, but the sending end must actually send absolute timestamps, and before the first absolute timestamp arrives the receiver cannot render, so instant start-up is lost and user experience suffers. The third uses the local timestamp at which each packet is received as the synchronization control; it is simple to implement, but too sensitive to network delay and jitter: when jitter increases, cumulative delay and clock drift are introduced and the streams drift badly out of sync.
Therefore, a method and an apparatus capable of adaptively performing audio-video synchronization are needed.
The statements above are provided only to aid a thorough understanding of the present technical solution (the technical means used, the problems solved, the effects produced, etc.) and should not be taken as an acknowledgement, or any form of suggestion, that this information constitutes prior art already known to a person skilled in the art.
Disclosure of Invention
To address the defects of the prior art, the invention provides an adaptive algorithm that combines the NTP absolute timestamp, the RTP relative timestamp and the local timestamp, transitioning smoothly among the three so as to avoid their individual drawbacks, improve user experience and achieve a good synchronization effect.
According to an embodiment of the present invention, there is provided a synchronization method for adaptive audio and video RTP timestamps, comprising: a sending end sends the video packets and audio packets of an audio and video service to a receiving end, where the video packets and audio packets each comprise RTP packets and NTP packets; when the receiving end receives the RTP packet of the first video packet and the RTP packet of the first audio packet, the local timestamps LV(1) and LA(1) recorded on receiving the RTP relative timestamps RV(1) and RA(1) are used as the absolute timestamps of the first video packet and the first audio packet respectively, so that the video and audio packets are synchronized using absolute time; when the receiving end receives the NTP packet of the first video packet and the NTP packet of the first audio packet, a deviation value is computed from the local timestamps LV(n) and LA(n) of the nth video and audio packets and their NTP absolute timestamps AV(n) and AA(n), and smooth deviation compensation is applied to this value during subsequent local playback, so that the video and audio packets are synchronized using the deviation-compensated absolute time, where n is an integer greater than 1.
Preferably, when the NTP packet of the first video packet and the NTP packet of the first audio packet are received, the NTP absolute timestamp AV (1) of the first video packet and the NTP absolute timestamp AA (1) of the first audio packet are calculated by the following equations:
AA(1) = AA(ntp) + [RA(ntp) - RA(1)] / audio sampling rate
AV(1) = AV(ntp) + [RV(ntp) - RV(1)] / video sampling rate
where the NTP packet of the video stream carries the absolute timestamp AV(ntp) and the relative timestamp RV(ntp) of the video packets, and the NTP packet of the audio stream carries the absolute timestamp AA(ntp) and the relative timestamp RA(ntp) of the audio packets.
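The first-packet calculation above can be sketched in Python. This is a minimal illustration: the function name, argument names and example numbers are mine, not the patent's, and the sign convention follows the patent's equation as printed.

```python
def first_packet_abs_ms(a_ntp_ms: float, r_ntp: int, r_first: int, sample_rate: int) -> float:
    """A(1) = A(ntp) + [R(ntp) - R(1)] / sampling rate, per the patent's equation.
    RTP timestamps are in ticks of the media clock; absolute timestamps in ms."""
    return a_ntp_ms + (r_ntp - r_first) / sample_rate * 1000.0

# Hypothetical audio example: the RTCP report maps RTP tick 10480 to 90000 ms,
# the first audio packet carried tick 10000, and the audio clock is 48 kHz.
aa1 = first_packet_abs_ms(90000.0, 10480, 10000, 48000)  # 90010.0 ms
```

The 480-tick gap at 48 kHz is 10 ms, so the first packet's absolute timestamp sits 10 ms away from the report's NTP time.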
Preferably, the NTP absolute timestamp av (n) of the received nth video packet and the NTP absolute timestamp aa (n) of the nth audio packet are calculated by the following equations:
AV(n) = AV(1) + [RV(n) - RV(1)] / video sampling rate
AA(n) = AA(1) + [RA(n) - RA(1)] / audio sampling rate.
Preferably, when the NTP packet of the first video packet and the NTP packet of the first audio packet are received, the local timestamp lv (n) of the received nth video packet and the local timestamp la (n) of the received nth audio packet are calculated by the following equations:
LV(n) = LV(1) + [RV(n) - RV(1)] / video sampling rate
LA(n) = LA(1) + [RA(n) - RA(1)] / audio sampling rate.
Preferably, the deviation value deviation existing between the audio packet and the video packet is calculated by the following equation:
deviation = AA(n) - AV(n) - [LA(n) - LV(n)]
wherein, if the deviation value deviation is 0ms, it indicates that there is no fluctuation in network transmission; if the deviation value deviation is not 0ms, it indicates a fluctuation in network transmission.
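As a sketch (variable names are illustrative, not from the patent), the deviation computation reduces to one line:

```python
def av_deviation_ms(aa_n: float, av_n: float, la_n: float, lv_n: float) -> float:
    """deviation = AA(n) - AV(n) - [LA(n) - LV(n)], all values in ms.
    A result of 0 means no fluctuation in network transmission."""
    return (aa_n - av_n) - (la_n - lv_n)

# Hypothetical values: the absolute gap is 10 ms wider than the local gap.
d = av_deviation_ms(90025.0, 90015.0, 80010.0, 80010.0)  # 10.0 ms
```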
Preferably, when compensating for the deviation existing between the audio packet and the video packet, the local timestamp of the video packet is compensated gradually, with the audio packet as the reference, in smooth steps of size L: each step compensates deviation/L, the deviation remaining for subsequent video packets is deviation - deviation/L, and L is an integer greater than 1; when the remaining deviation reaches 0, the local timestamps of the video packets are no longer compensated.
Preferably, in compensating for the deviation value existing between the audio packet and the video packet, the estimated local timestamp LV (estimate _ n) of the video packet is calculated by the following equation:
LV(estimate_n)=LV(n)+deviation/L
when the receiving end receives the relative time stamp RV (n +1) of the (n +1) th video packet again, the local time stamp LV (n +1) of the (n +1) th video packet when the relative time stamp RV (n +1) of the (n +1) th video packet is received can be calculated by the following equation:
LV(n+1) = LV(estimate_n) + [RV(n+1) - RV(n)] / video sampling rate.
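One compensation step can be sketched as follows. The names are mine, and rounding the remainder with int() is one possible reading of the patent's (int)(...) notation:

```python
def smooth_step(lv_n_ms: float, deviation_ms: float, smoothing_l: int):
    """Compensate deviation/L on the current video local timestamp and
    carry the rounded remainder forward for later packets."""
    estimate = lv_n_ms + deviation_ms / smoothing_l               # LV(estimate_n)
    remaining = int(deviation_ms - deviation_ms / smoothing_l)    # deviation left over
    return estimate, remaining

def next_local_ts_ms(estimate_ms: float, rv_next: int, rv_n: int, video_rate: int) -> float:
    """LV(n+1) = LV(estimate_n) + [RV(n+1) - RV(n)] / video sampling rate."""
    return estimate_ms + (rv_next - rv_n) / video_rate * 1000.0

est, rem = smooth_step(80025.0, 10.0, 4)          # (80027.5, 7)
nxt = next_local_ts_ms(est, 23600, 21800, 90000)  # 80047.5 ms
```

With the worked-example values from the description (LV(n) = 80025 ms, deviation = 10 ms, L = 4, a 1800-tick RTP step at 90 kHz), this reproduces the 80027.5 ms estimate, the 7 ms rounded remainder and the 80047.5 ms next local timestamp.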
Preferably, when the remaining compensation deviation for the video packets is calculated, it is rounded to an integer.
According to an embodiment of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to an embodiment of the present invention.
Compared with the prior art, the method and the device for synchronizing the self-adaptive audio and video RTP timestamp can more effectively realize audio and video synchronization.
Drawings
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. For purposes of clarity, the same reference numbers will be used in different drawings to identify the same elements. It is noted that the drawings are merely schematic and are not necessarily drawn to scale. In these drawings:
fig. 1 is a flowchart illustrating a method for synchronization of adaptive audio-video RTP timestamps according to an embodiment of the present invention.
Fig. 2 is a schematic diagram showing the structure of a computing system for implementing an exemplary embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below. They are carried out on the premise of the technical scheme of the invention, and detailed implementations and specific operating procedures are given, but the scope of the invention is not limited to the embodiments described below.
In addition to the relative and absolute timestamps provided by conventional audio and video, the invention introduces the local timestamp, so that playback shifts from depending on the relative and absolute timestamps to depending on the local timestamp; the transmission delay caused by network fluctuation is then eliminated by the algorithm described below.
Fig. 1 is a flowchart illustrating a synchronization method of an adaptive audio-video RTP timestamp according to an embodiment of the present invention.
The synchronization method of the adaptive audio and video RTP timestamp according to the embodiment of the invention is divided into two stages.
The first phase is an initial relative timestamp phase.
In step 101: the sending end sends video packets and audio packets to the receiving end; at this stage the packets carry only RTP relative timestamps and no NTP absolute timestamps.
In step 102: the local timestamps recorded on receiving the RTP relative timestamps of the first video packet and the first audio packet are used as their absolute timestamps respectively. Network jitter is usually sporadic, and the sending-time differences and receiving-time differences advance at approximately equal rates, so this approximation is acceptable in the initial relative-timestamp phase.
Assume that the RTP relative timestamp of the first video packet is RV (1), the RTP relative timestamp of the first audio packet is RA (1), the local timestamp of the first video packet is LV (1), and the local timestamp of the first audio packet is LA (1).
When the RTP relative timestamps of the nth video packet and the nth audio packet are received, the local timestamps lv (n) and la (n) of the nth video packet and the nth audio packet may be calculated by the following equations:
LV(n) = LV(1) + [RV(n) - RV(1)] / video sampling rate
LA(n) = LA(1) + [RA(n) - RA(1)] / audio sampling rate
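This extrapolation can be sketched in Python (illustrative names; RTP timestamps are in ticks of the media clock, local timestamps in milliseconds; the numbers below match the worked example given later in the description):

```python
def local_ts_ms(l1_ms: float, r_n: int, r_1: int, sample_rate: int) -> float:
    """L(n) = L(1) + [R(n) - R(1)] / sampling rate, with the tick difference
    converted from the media clock to milliseconds."""
    return l1_ms + (r_n - r_1) / sample_rate * 1000.0

# 480 audio ticks at 48 kHz is 10 ms; 1800 video ticks at 90 kHz is 20 ms.
la_n = local_ts_ms(80000.0, 10480, 10000, 48000)  # 80010.0 ms
lv_n = local_ts_ms(80005.0, 21800, 20000, 90000)  # 80025.0 ms
```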
In the initial relative-timestamp phase, the local timestamps recorded on receiving the RTP relative timestamps of the first video packet and the first audio packet serve as absolute timestamps for synchronization control, which removes both the requirement that the receiving end must have received an NTP absolute timestamp and the requirement that the RTP relative timestamps generated by the sending end must start from a fixed value.
The second stage is the NTP absolute timestamp stage.
In step 103: the NTP absolute timestamps of the video and audio packets are received. Once both have been received, the method enters the NTP absolute-timestamp phase: the absolute time can now be calculated correctly, and the video and audio packets are played according to absolute time. While switching from the local timestamp, previously used as the absolute timestamp, to the received NTP absolute timestamp, the absolute timestamp must transition smoothly, otherwise the user perceives an abrupt jump.
Assuming the NTP packets for the nth video packet and the nth audio packet have been received, the absolute timestamps can be obtained: AV(n) for the nth video packet and AA(n) for the nth audio packet. The local timestamp LV(n+1) of the (n+1)th video packet is then adjusted by the following equation:
LV(n+1) = LV(n) - [AV(n) - AA(n)] / L
where L is the deviation smoothing step size: the larger its value, the smoother the transition, but the longer it takes to converge to the absolute timestamp; L can be tuned to the actual service requirements.
The following describes a processing procedure of a synchronization method of an adaptive audio-video RTP timestamp according to an embodiment of the present invention with a specific example.
Suppose that a transmitting end transmits an audio/video link to a receiving end, the audio sampling rate is 48000Hz, the video sampling rate is 90000Hz, and the deviation smoothing step amplitude L is 4.
Initial relative timestamp phase:
when a receiving end receives the RTP relative time stamp of the first audio packet, the local time of the receiving end is obtained, namely the RTP relative time stamp RA (1) of the first audio packet is 10000Hz, the local time stamp LA (1) is 80000ms, when the receiving end receives the RTP relative time stamp of the first video packet, the local time of the receiving end is obtained, namely the RTP relative time stamp RV (1) of the first video packet is 20000Hz, and the local time stamp LV (1) is 80005 ms. The RTP relative time stamp and the local time stamp of the first audio packet and the first video packet are recorded.
When the receiving end receives RTP timestamps ra (n) and rv (n) of subsequent audio and video packets, the local timestamps la (n) and lv (n) of the audio and video packets can be calculated by the following equations:
LA(n) = LA(1) + [RA(n) - RA(1)] / audio sampling rate
LV(n) = LV(1) + [RV(n) - RV(1)] / video sampling rate
Wherein n is an integer greater than 1.
In this phase, the local timestamps recorded when the first video packet and the first audio packet arrive are used as absolute timestamps for absolute-time synchronization control, which removes both the requirement that the receiving end must have received an NTP absolute timestamp and the requirement that the sending end's RTP relative timestamps must start from a fixed value.
NTP absolute timestamp stage:
when the receiving end receives the NTP packets of the first video packet and the first audio packet, respectively, according to absolute timestamps aa (NTP) and av (NTP) and relative timestamps ra (NTP) and rv (NTP) in the NTP packets (which are transmitted through RTCP data packets, where the RTCP data packets include an NTP absolute timestamp and an RTP relative timestamp corresponding to the absolute timestamp), the absolute timestamps of the received first video packet and first audio packet can be calculated by the following equation:
AA (1) ═ AA (ntp)) + [ RA (ntp)) -RA (1) ]/audio sampling rate
AV (1) (ntp) + [ RV (ntp) -RV (1) ]/video sampling rate
At this time, the RTP relative timestamp of the nth received audio packet is RA(n) = 10480, so the local timestamp of the nth audio packet is LA(n) = 80010 ms by
LA(n) = LA(1) + [RA(n) - RA(1)] / audio sampling rate
and the absolute timestamp of the nth audio packet is AA(n) = 90025 ms by
AA(n) = AA(1) + [RA(n) - RA(1)] / audio sampling rate.
Similarly, the RTP relative timestamp of the nth received video packet is RV(n) = 21800, so the local timestamp of the nth video packet is LV(n) = 80025 ms by
LV(n) = LV(1) + [RV(n) - RV(1)] / video sampling rate
and the NTP absolute timestamp of the nth video packet is AV(n) = 90025 ms by
AV(n) = AV(1) + [RV(n) - RV(1)] / video sampling rate.
If the network transmits the audio and video data without fluctuation, the relative time difference of audio and video should equal the absolute time difference, i.e. LA(n) - LV(n) = AA(n) - AV(n).
From the equations above: [AA(n) - AV(n)] - [LA(n) - LV(n)] = 10 ms, which shows that, due to network fluctuation, there is a 10 ms deviation between the absolute timestamps A(n) and the local timestamps L(n) of the audio and video packets; the deviation may originate from the audio, the video, or both.
According to an embodiment of the present invention, the audio packets are taken as the reference regardless of where the deviation lies: the audio packets are assumed to carry no deviation, and all of the deviation is attributed to the video packets. Therefore only the deviation of the video packets is considered in the calculation, and the deviation of the audio packets need not be considered.
If the relative time difference and the absolute time difference of the audio and video packets have deviation, the deviation of the video packets needs to be compensated.
The local timestamp of the nth video packet is LV(n) = 80025 ms, the video deviation is known to be 10 ms, and the deviation smoothing step L is set to 4.
The estimated time stamp LV (estimate _ n) of the nth video packet can be calculated by the following equation, which is 80027.5 ms.
LV(estimate_n)=LV(n)+deviation/L
At this point the deviation already compensated is deviation/L = 10/4 = 2.5 ms, and the deviation remaining for the following video packets is deviation - deviation/L = 7.5 ms. To ensure that the deviation converges as quickly as possible (the error approaches 0 ms), according to an embodiment of the invention the remaining compensation deviation is rounded to an integer, i.e. (int)(deviation - deviation/L) = 7 ms.
When the receiving end receives the (n+1)th video packet with relative timestamp RV(n+1) (here 23600), the estimated timestamp LV(estimate_n) is taken as the local timestamp of the nth video packet (since it is the local timestamp after compensation), and the local timestamp of the (n+1)th video packet is computed as LV(n+1) = 80047.5 ms by
LV(n+1) = LV(estimate_n) + [RV(n+1) - RV(n)] / video sampling rate
The LV(n+1) so calculated builds on the previous compensation, but it is still not the final timestamp, because the remaining deviation must be compensated again; the estimated timestamp of the (n+1)th video packet is LV(estimate_n+1) = 80049 ms by
LV(estimate_n+1) = LV(n+1) + deviation/L
After this compensation, the deviation remaining for the following video packets is (int)(deviation - deviation/L) = 6 ms.
The above steps are repeated in a loop. When the remaining compensation deviation reaches 0 ms, deviation/L = 0 ms, so by
LV(estimate_n+1) = LV(n+1) + deviation/L
we get LV(estimate_n+1) = LV(n+1); the video packets continue to be calculated by the formula above, but the deviation is no longer compensated.
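The convergence loop described above can be sketched as follows. This is an illustrative simulation only: it isolates the compensation increments (ignoring the RTP increments of newly arriving packets), and the rounding rule int(deviation - deviation/L) is one possible reading of the worked example, which is ambiguous about whether the quotient or the remainder is rounded.

```python
def compensate_until_converged(lv_ms: float, deviation_ms: int, smoothing_l: int = 4):
    """Repeat the smoothing step until the remaining rounded deviation is 0."""
    steps = 0
    while deviation_ms > 0:
        lv_ms += deviation_ms / smoothing_l                            # apply deviation/L
        deviation_ms = int(deviation_ms - deviation_ms / smoothing_l)  # rounded remainder
        steps += 1
    return lv_ms, steps

# Starting from LV(n) = 80025 ms with a 10 ms deviation and L = 4:
final, steps = compensate_until_converged(80025.0, 10)  # converges in 6 steps
```

Because the remainder is rounded down at each step, the total amount compensated can be less than the initial deviation; in the real flow the loop runs interleaved with arriving video packets, each of which also advances the timestamp by its RTP increment.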
At this point it can be concluded that the local timestamp LV of the video has eliminated the network fluctuations, approaching the absolute timestamp.
Here, L is the deviation smoothing step size: the larger its value, the smoother the transition, but the longer the time required to synchronize to the absolute timestamp; L is an integer greater than 1 and can be tuned to actual service needs.
After the NTP absolute timestamps of the video and audio packets are received, the switch from the local timestamp, previously used as the absolute timestamp, to the received NTP absolute timestamp must be smoothed, otherwise the user perceives an abrupt jump.
Fig. 2 is a schematic diagram showing the structure of a computing system for implementing an exemplary embodiment of the present invention.
Referring to fig. 2, a computing system 200 may include at least one processor 202, memory 203, I/O components 204, and a network interface 205 connected via a bus 201.
The processor 202 may be a Central Processing Unit (CPU) or a semiconductor device that performs processing on commands stored in the memory 203. The memory 203 may include various types of volatile or non-volatile storage media. For example, the memory 203 may include Read Only Memory (ROM) and Random Access Memory (RAM).
Thus, the operations of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware or in a software module executed by the processor 202, or in a combination of the two. A software module may reside on storage media (i.e., memory 203) such as RAM, flash memory, ROM, EPROM, EEPROM, registers, hard disk, a removable disk, and a CD-ROM. An exemplary storage medium is coupled to the processor 202, and the processor 202 can read information from, and write information to, the storage medium. In another approach, the storage medium may be integral to the processor 202. The processor 202 and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). The ASIC may reside in a user terminal. In another approach, the processor and the storage medium may reside as discrete components in a user terminal.
While the exemplary methods of the invention above are described as a series of acts for clarity, this is not intended to limit the order of the steps; steps may be performed concurrently or in a different order as needed. To implement a method according to the invention, the steps shown may be supplemented with other steps, some steps may be omitted, or additional steps may be included while others are removed.
The various embodiments of the invention are not an exhaustive list of all possible combinations, but are intended to describe representative aspects of the invention, and what is described in the various embodiments can be applied independently or in combinations of two or more.
In addition, various embodiments of the invention may be implemented in hardware, firmware, software, or a combination thereof. The hardware may be implemented by one or more of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a general purpose processor, a controller, a microcontroller, a microprocessor, or the like.
The scope of the present invention is intended to include software or machine-executable instructions (e.g., operating systems, application programs, firmware, programs, etc.) which cause operations according to various embodiments to be performed on an apparatus or computer, as well as non-volatile computer-readable media which are executable on a device or computer on which such software or instructions, etc., are stored.
The above description of exemplary embodiments has been presented only to illustrate the technical solutions of the present invention, and is not intended to be exhaustive or to limit the invention to the precise forms described. Obviously, many modifications and variations are possible to those skilled in the art in light of the above teachings. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to thereby enable others skilled in the art to understand, implement and utilize the invention in various exemplary embodiments and with various alternatives and modifications. It is intended that the scope of the invention be defined by the following claims and their equivalents.

Claims (9)

1. A synchronization method of an adaptive audio and video RTP timestamp is characterized by comprising the following steps:
a sending end sends a video packet and an audio packet of an audio and video service to a receiving end, wherein the video packet and the audio packet respectively comprise an RTP packet and an NTP packet;
when a receiving end receives an RTP packet of a first video packet and an RTP packet of a first audio packet, local timestamps LV (1) and LA (1) when the RTP relative timestamp RV (1) of the first video packet and the RTP relative timestamp RA (1) of the first audio packet are received are respectively used as an absolute timestamp of the first video packet and an absolute timestamp of the first audio packet, so that the video packet and the audio packet are synchronously controlled by using the absolute time;
when a receiving end receives an NTP packet of a first video packet and an NTP packet of a first audio packet, according to a calculated deviation value between a local timestamp LV(n) of an nth video packet and a local timestamp LA(n) of an nth audio packet, and a calculated NTP absolute timestamp AV(n) of the nth video packet and an NTP absolute timestamp AA(n) of the nth audio packet, smooth deviation compensation processing is carried out on the deviation value in subsequent local playing, so that the video packet and the audio packet are synchronized using the deviation-compensated absolute time, wherein n is an integer greater than 1.
2. The method of synchronizing an adaptive audio-video RTP timestamp according to claim 1,
when the NTP packet of the first video packet and the NTP packet of the first audio packet are received, the NTP absolute timestamp AA(1) of the first audio packet and the NTP absolute timestamp AV(1) of the first video packet are calculated by the following equations:
AA(1) = AA(NTP) + [RA(NTP) - RA(1)] / audio sampling rate
AV(1) = AV(NTP) + [RV(NTP) - RV(1)] / video sampling rate
wherein the NTP packet of a video packet contains the absolute timestamp AV(NTP) and the relative timestamp RV(NTP) of the video packet, and the NTP packet of an audio packet contains the absolute timestamp AA(NTP) and the relative timestamp RA(NTP) of the audio packet.
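The two equations above can be sketched as a small helper. This is an illustrative sketch, not the patented implementation; the function and argument names are hypothetical, and it assumes absolute timestamps in seconds and RTP relative timestamps in clock ticks (e.g. an 8000 Hz audio clock, a 90000 Hz video clock).

```python
def first_packet_ntp_time(ntp_abs, ntp_rel, first_rel, sampling_rate):
    """Claim 2: AA(1) = AA(NTP) + [RA(NTP) - RA(1)] / audio sampling rate.

    Called with the video packet's NTP absolute/relative timestamps,
    RV(1) and the video sampling rate, the same formula yields AV(1).
    """
    return ntp_abs + (ntp_rel - first_rel) / sampling_rate

# Example: the NTP report at t = 1000.0 s carries relative timestamp 8000,
# the first audio RTP packet carried relative timestamp 0, clock 8000 Hz:
aa1 = first_packet_ntp_time(1000.0, 8000, 0, 8000)   # -> 1001.0
```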
3. The method of synchronizing an adaptive audio-video RTP timestamp according to claim 2,
the NTP absolute timestamp av (n) of the received nth video packet and the NTP absolute timestamp aa (n) of the nth audio packet are calculated by the following equations:
AV (n) ═ AV (1) + [ RV (n) -RV (1) ]/video sampling rate
AA (1) + [ RA (n) -RA (1) ]/audio sampling rate.
4. The method of synchronizing an adaptive audio-video RTP timestamp according to claim 3,
when the NTP packet of the first video packet and the NTP packet of the first audio packet are received, the local timestamp LV(n) of the received nth video packet and the local timestamp LA(n) of the received nth audio packet are calculated by the following equations:
LV(n) = LV(1) + [RV(n) - RV(1)] / video sampling rate
LA(n) = LA(1) + [RA(n) - RA(1)] / audio sampling rate.
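Claims 3 and 4 share one linear form: the nth packet's absolute (or local) timestamp is the first packet's timestamp plus the RTP-tick difference divided by the sampling rate. A minimal sketch with hypothetical names, assuming timestamps in seconds and RTP timestamps in clock ticks:

```python
def nth_timestamp(first_ts, first_rel, nth_rel, sampling_rate):
    """X(n) = X(1) + [R(n) - R(1)] / sampling rate.

    The same form gives AV(n)/AA(n) (claim 3) and LV(n)/LA(n) (claim 4).
    """
    return first_ts + (nth_rel - first_rel) / sampling_rate

# 90 kHz video clock: 45000 ticks after the first packet is 0.5 s later.
av_n = nth_timestamp(1001.0, 0, 45000, 90000)   # -> 1001.5
```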
5. The method for synchronizing an adaptive audio-video RTP timestamp according to claim 4,
the offset value provision existing between the audio packet and the video packet is calculated by the following equation:
deviation=AA(n)-AV(n)-[LA(n)-LV(n)]
wherein, if the deviation value deviation is 0ms, it indicates that there is no fluctuation in network transmission; if the deviation value deviation is not 0ms, it indicates that there is a fluctuation in the network transmission.
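The deviation equation above compares the audio/video gap measured on the sender's NTP clock against the gap observed on the receiver's local clock. A minimal sketch, with hypothetical names, assuming all four timestamps are expressed in the same unit:

```python
def av_deviation(aa_n, av_n, la_n, lv_n):
    """Claim 5: deviation = AA(n) - AV(n) - [LA(n) - LV(n)].

    A result of 0 means the sender-side audio/video gap matches the
    receiver-side gap (no network fluctuation); any other value is the
    drift that the smoothing stage must compensate.
    """
    return (aa_n - av_n) - (la_n - lv_n)

# NTP gap of 10 ms vs. locally observed gap of 4 ms -> 6 ms to compensate.
d = av_deviation(1010.0, 1000.0, 2004.0, 2000.0)   # -> 6.0
```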
6. The method of synchronizing an adaptive audio-video RTP timestamp according to claim 5,
when compensating for the deviation between the audio packet and the video packet, the local timestamps of the video packets are compensated gradually, taking the audio packets as the reference and L as the smoothing step, wherein the compensation applied at each step is deviation/L and the compensation deviation remaining for the subsequent video packets is deviation - deviation/L, L being an integer greater than 1;
when the compensation deviation remaining for the video packets reaches 0, the local timestamps of the video packets are no longer compensated.
7. The method for synchronizing an adaptive audio-video RTP timestamp according to claim 6,
when compensating for the deviation value existing between the audio packet and the video packet, the estimated local timestamp LV(estimate_n) of the video packet is calculated by the following equation:
LV(estimate_n) = LV(n) + deviation/L
and when the receiving end next receives the relative timestamp RV(n+1) of the (n+1)th video packet, the local timestamp LV(n+1) at which RV(n+1) is received can be calculated by the following equation:
LV(n+1) = LV(estimate_n) + [RV(n+1) - RV(n)] / video sampling rate.
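One smoothing iteration over claims 6 and 7 can be sketched as follows. This is a hypothetical helper, not the patented implementation; it assumes the deviation and local timestamps share one unit, and that RTP timestamps are clock ticks:

```python
def smoothing_step(lv_n, rv_n, rv_next, deviation, L, video_rate):
    """Apply deviation/L to the current video packet's local timestamp,
    then project the next packet's local timestamp from the RTP tick
    difference. The caller repeats with the returned remaining deviation
    until it reaches 0 (claim 6's stopping condition).
    """
    step = deviation / L                  # portion compensated this step
    remaining = deviation - step          # deviation - deviation/L
    lv_estimate = lv_n + step             # LV(estimate_n) = LV(n) + deviation/L
    lv_next = lv_estimate + (rv_next - rv_n) / video_rate
    return lv_next, remaining

# 6 units of deviation smoothed with L = 3: compensate 2 now, 4 remain.
lv2, rest = smoothing_step(100.0, 0, 45000, 6.0, 3, 90000)
# lv2 -> 100.0 + 2.0 + 0.5 = 102.5, rest -> 4.0
```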
8. The method for synchronizing an adaptive audio-video RTP timestamp according to claim 6,
wherein, when calculating the compensation deviation remaining for the video packets, the remaining compensation deviation is rounded.
9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the method according to any one of claims 1 to 6.
CN202011629055.0A 2020-12-31 2020-12-31 Synchronization method and device of adaptive audio and video RTP (real-time protocol) time stamps Active CN114697720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011629055.0A CN114697720B (en) 2020-12-31 2020-12-31 Synchronization method and device of adaptive audio and video RTP (real-time protocol) time stamps

Publications (2)

Publication Number Publication Date
CN114697720A true CN114697720A (en) 2022-07-01
CN114697720B CN114697720B (en) 2023-11-07

Family

ID=82133981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011629055.0A Active CN114697720B (en) 2020-12-31 2020-12-31 Synchronization method and device of adaptive audio and video RTP (real-time protocol) time stamps

Country Status (1)

Country Link
CN (1) CN114697720B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070116057A1 (en) * 2003-07-04 2007-05-24 Liam Murphy System and method for determining clock skew in a packet-based telephony session
US20050281246A1 (en) * 2004-06-22 2005-12-22 Lg Electronics Inc. Synchronizing video/audio data of mobile communication terminal
CN1738437A (en) * 2004-06-22 2006-02-22 Lg电子株式会社 Synchronizing video/audio of mobile communication terminal
JP2009010863A (en) * 2007-06-29 2009-01-15 Oki Electric Ind Co Ltd Audio/video synchronizing method, audio/video synchronizing system and audio/video receiving terminal
US20100100917A1 (en) * 2008-10-16 2010-04-22 Industrial Technology Research Institute Mobile tv system and method for synchronizing the rendering of streaming services thereof
CN102571687A (en) * 2010-12-10 2012-07-11 联芯科技有限公司 Method for building synchronous status information among real-time media streams, device adopting same and SCC AS
CN103414957A (en) * 2013-07-30 2013-11-27 广东工业大学 Method and device for synchronization of audio data and video data
CN104092697A (en) * 2014-07-18 2014-10-08 杭州华三通信技术有限公司 Anti-replaying method and device based on time
CN111385625A (en) * 2018-12-29 2020-07-07 成都鼎桥通信技术有限公司 Non-IP data transmission synchronization method and device

Also Published As

Publication number Publication date
CN114697720B (en) 2023-11-07

Similar Documents

Publication Publication Date Title
EP1775964B1 (en) Method and device for stream synchronization of real-time multimedia transport over packet network
DK3118855T3 (en) Method, device and system for synchronous audio playback
US7392102B2 (en) Method of synchronizing the playback of a digital audio broadcast using an audio waveform sample
KR100968928B1 (en) Apparatus and method for synchronization of audio and video streams
US20080152309A1 (en) Method and apparatus for audio/video synchronization
US20030198254A1 (en) Method of synchronizing the playback of a digital audio broadcast by inserting a control track pulse
CN108366283B (en) Media synchronous playing method among multiple devices
US10602468B2 (en) Software based audio timing and synchronization
KR20080007577A (en) Synchronized audio/video decoding for network devices
JP2013134119A (en) Transmitter, transmission method, receiver, reception method, synchronous transmission system, synchronous transmission method, and program
JP2001186180A (en) Ip terminal device, method for estimating frequency error range, method of estimating frequency difference and method of calculating estimated required time
KR102566550B1 (en) Method of display playback synchronization of digital contents in multiple connected devices and apparatus using the same
US7440474B1 (en) Method and apparatus for synchronizing clocks on packet-switched networks
US20070009071A1 (en) Methods and apparatus to synchronize a clock in a voice over packet network
US8477810B2 (en) Synchronization using multicasting
US9991981B2 (en) Method for operating a node of a communications network, a node and a communications network
CN114697720A (en) Method and device for synchronizing self-adaptive audio and video RTP timestamp
JP4042396B2 (en) Data communication system, data transmission apparatus, data reception apparatus and method, and computer program
KR100457508B1 (en) Apparatus for setting time stamp offset and method thereof
US20040218633A1 (en) Method for multiplexing, in MPEG stream processor, packets of several input MPEG streams into one output transport stream with simultaneous correction of time stamps
US11900010B2 (en) Method of managing an audio stream read in a manner that is synchronized on a reference clock
JP4425115B2 (en) Clock synchronization apparatus and program
JP2018121199A (en) Receiving device and clock generation method
WO2020206465A1 (en) Software based audio timing and synchronization
CA2651701C (en) Generation of valid program clock reference time stamps for duplicate transport stream packets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant