CN113395561A - Audio and video synchronization method and device based on different reference clocks and computer equipment - Google Patents


Info

Publication number
CN113395561A
Authority
CN
China
Prior art keywords
audio
video
frame
timestamp
video frame
Prior art date
Legal status
Withdrawn
Application number
CN202110018877.3A
Other languages
Chinese (zh)
Inventor
姜发波 (Jiang Fabo)
Current Assignee
Hangzhou Tuya Information Technology Co Ltd
Original Assignee
Hangzhou Tuya Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Tuya Information Technology Co Ltd filed Critical Hangzhou Tuya Information Technology Co Ltd
Priority to CN202110018877.3A
Publication of CN113395561A
Legal status: Withdrawn


Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 — Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/4302 — Content synchronisation processes, e.g. decoder synchronisation
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 — Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 — Assembly of content; Generation of multimedia applications
    • H04N 21/854 — Content authoring
    • H04N 21/8547 — Content authoring involving timestamps for synchronizing content

Abstract

The application relates to an audio and video synchronization method, apparatus, and computer device based on different reference clocks. The synchronization method comprises: receiving an audio frame and calculating a first timestamp of the audio frame; receiving a video frame and calculating a second timestamp of the video frame such that the first and second timestamps are aligned; calculating a first offset between the first timestamp and the second timestamp; and correcting the second timestamp of the video frame according to the first offset to generate an audio-video sequence in which the timestamps of audio frames and video frames increase in an interleaved manner. By first aligning the timestamps of the audio and video frames and then correcting the video timestamps according to the offset between adjacent audio and video timestamps, the method speeds up or slows down video playback so that the timestamps of audio and video frames are arranged in uniform interleaved increasing order, solving the problem of unsynchronized audio and video playback under different reference clocks.

Description

Audio and video synchronization method and device based on different reference clocks and computer equipment
Technical Field
The present application relates to the field of streaming media technologies, and in particular, to an audio and video synchronization method and apparatus based on different reference clocks, and a computer device.
Background
Existing audio and video synchronization methods mainly have the playing end control the audio and video playback speed against a certain reference clock according to the audio/video timestamps (pts), thereby achieving synchronization. This approach keeps audio and video playback synchronized only on the premise that the timestamps generated at the encoding end use the same reference clock. In practice, however, the video and audio capture sources differ: the video source is usually a camera signal, the audio source is usually a microphone signal, and audio and video encoding are completed in two different systems. The video and audio timestamps therefore use different reference clocks, the playing end cannot correctly synchronize the audio and video playback speeds from the timestamps, and the video picture and the audio played at the client do not match.
Disclosure of Invention
The application provides an audio and video synchronization method and apparatus based on different reference clocks, and a computer device, to at least solve the problem in the related art that audio and video playback is not synchronized when different reference clocks are used.
In a first aspect, an embodiment of the present application provides an audio and video synchronization method based on different reference clocks, where the method includes:
receiving an audio frame and calculating a first time stamp of the audio frame; and receiving a video frame and calculating a second timestamp of the video frame such that the first timestamp and the second timestamp are aligned;
calculating a first offset between the first timestamp and the second timestamp;
and correcting the second time stamp of the video frame according to the first offset to generate an audio-video sequence with the time stamps of the audio frame and the video frame increasing in an interlaced mode.
In some of these embodiments, the receiving an audio frame and calculating a first timestamp for the audio frame comprises:
if the audio frame is the first audio frame received, setting the first timestamp of the audio frame to 0;
otherwise, calculating the playing duration of the audio frame according to the length of the audio frame;
and taking the sum of the first timestamp of the previous audio frame and the playing duration as the first timestamp of the audio frame.
In some of these embodiments, the receiving a video frame and calculating a second timestamp for the video frame comprises:
if the video frame is the received first frame video frame, setting a second timestamp of the video frame to be 0;
otherwise, calculating a second offset between the timestamp of the video frame and that of the previous video frame, and taking the second offset as the accumulation reference value of the second timestamp.
In some embodiments, the correcting the second timestamp of the video frame according to the first offset comprises:
if the first offset is larger than an offset threshold, calculating a correction coefficient of the video frame according to the first offset;
calculating an accumulation target value of the video frame according to the correction coefficient, the first timestamp, the second timestamp and the accumulation reference value;
and correcting the second time stamp of the video frame according to the accumulated target value.
In some embodiments, said calculating the correction coefficient for the video frame according to the first offset comprises:
setting the frame number of a correction video frame;
and calculating a correction coefficient of the video frame according to the first offset and the frame number of the corrected video frame.
In some embodiments, said calculating an accumulation target value for the video frame based on the correction factor, the first timestamp, the second timestamp, and the accumulation reference value comprises:
comparing the first time stamp of the adjacent audio frame with the second time stamp of the video frame to obtain a first comparison result;
comparing the accumulated reference value with the correction coefficient to obtain a second comparison result;
determining a corresponding function in a second preset piecewise function according to the first comparison result and the second comparison result;
and calculating to obtain the accumulation target value according to the accumulation reference value, the correction coefficient and a corresponding function in the second preset piecewise function.
In some embodiments, the second predetermined piecewise function is:
[The second preset piecewise function is given only as an image (Figure BDA0002887988620000031) in the original patent.]
wherein t represents an accumulation target value, v represents an accumulation reference value, y represents a correction coefficient, apts represents a first time stamp, vpts represents a second time stamp, and m represents a preset positive integer.
In a second aspect, an embodiment of the present application provides an audio and video synchronization apparatus based on different reference clocks, where the apparatus includes a receiving module, an alignment module, a calculation module, and a correction module; wherein:
the receiving module is used for receiving the audio frames and the video frames;
the alignment module is used for calculating a first time stamp of the audio frame and a second time stamp of the video frame and aligning the first time stamp and the second time stamp;
a calculation module to calculate an offset between the first timestamp and the second timestamp;
and the correcting module is used for correcting the second time stamp of the video frame according to the offset so as to generate an audio and video sequence with the time stamps of the audio frame and the video frame increasing in a staggered mode.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the processor implements the audio and video synchronization method based on different reference clocks as described in the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the different-reference-clock-based audio and video synchronization method according to the first aspect.
Compared with the related art, the audio and video synchronization method based on different reference clocks provided by the embodiments of the application comprises: receiving an audio frame and calculating a first timestamp of the audio frame; receiving a video frame and calculating a second timestamp of the video frame such that the first and second timestamps are aligned; calculating a first offset between the first timestamp and the second timestamp; and correcting the second timestamp of the video frame according to the first offset to generate an audio-video sequence with interleaved increasing timestamps of audio and video frames, thereby solving the related-art problem of unsynchronized audio and video playback under different reference clocks.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a flowchart of an audio and video synchronization method based on different reference clocks according to an embodiment;
fig. 2 is a schematic diagram of audio-video timestamps generated based on different reference clocks according to an embodiment;
FIG. 3 is a schematic diagram illustrating an example of a playing sequence of the audio and video signals in FIG. 2 according to original time stamps;
fig. 4 is a schematic diagram of an audio time stamp and a video time stamp after performing alignment processing on the audio and video frames of fig. 2 according to an embodiment;
fig. 5 is a schematic diagram of an audio/video playing sequence in fig. 4 according to an embodiment;
FIG. 6 is a diagram of audio and video timestamps generated within the same time period by timestamp alignment provided by one embodiment;
fig. 7 is a schematic diagram of audio/video timestamps with accumulated deviation of timestamp alignment caused by jitter of the audio/video timestamps according to an embodiment;
fig. 8 is a schematic diagram of an audio time stamp and a video time stamp after performing correction processing on the audio and video frame of fig. 7 according to an embodiment;
fig. 9 is a block diagram of an audio and video synchronization apparatus based on different reference clocks according to an embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
Fig. 1 is a flowchart of an audio and video synchronization method based on different reference clocks according to an embodiment, where as shown in fig. 1, the method includes steps 110 to 130; wherein:
step 110, receiving an audio frame and calculating a first time stamp of the audio frame; and receiving the video frame and calculating a second timestamp of the video frame such that the first timestamp and the second timestamp are aligned.
If different reference clocks are used for audio and video, the timestamps of audio and video frames generated at the same moment also differ. The present application is described in detail using an example with an audio frame rate of 20 frames per second and a video frame rate of 25 frames per second.
As shown in fig. 2, within about 320 milliseconds the video source generates 9 video frames and the audio source generates 8 audio frames, with the audio and video frames produced over the same interval. If the timestamps of the video and audio frames are left unprocessed, playback in the original timestamp order shown in fig. 3 places the video data ahead of the audio data, causing the audio and video to play out of sync.
In order to eliminate the timestamp difference between audio and video frames generated at the same moment, the audio and video timestamps are aligned after the audio and video data are received. The alignment method adopted in this embodiment is: the timestamp of the first frame received is set to 0, and each subsequent timestamp is the previous accumulated value plus the time interval since the previous frame.
In some of these embodiments, receiving the audio frame and calculating the first timestamp of the audio frame comprises: if the audio frame is the first audio frame received, setting its first timestamp to 0; otherwise, calculating the playing duration of the audio frame according to its length, and taking the sum of the previous audio frame's first timestamp and that playing duration as the first timestamp of the audio frame.
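The rule above (first frame starts at 0; every later frame adds its playing duration to the previous timestamp) can be sketched as follows. The function name and the PCM sample format (16 kHz mono 16-bit) are illustrative assumptions, not taken from the patent:

```python
def audio_pts(prev_pts, frame_bytes, sample_rate=16000, channels=1, bytes_per_sample=2):
    """First timestamp (ms) of an audio frame under the alignment rule.

    prev_pts is None for the first frame received; frame_bytes is the
    frame length, from which the playing duration is derived. The PCM
    parameters are assumptions for illustration.
    """
    if prev_pts is None:
        return 0  # first audio frame: timestamp set to 0
    # playing duration of the frame, computed from its length
    duration_ms = frame_bytes * 1000 // (sample_rate * channels * bytes_per_sample)
    return prev_pts + duration_ms
```

With these settings a 1600-byte frame lasts 50 ms, so successive timestamps run 0, 50, 100, and so on, matching a 20 frames-per-second audio source.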
In some of these embodiments, receiving the video frame and calculating the second timestamp of the video frame comprises: if the video frame is the first video frame received, setting its second timestamp to 0; otherwise, calculating a second offset between the timestamp of the video frame and that of the previous video frame, and taking the second offset as the accumulation reference value of the second timestamp.
The following description will be given taking a video frame as an example.
If the original timestamp of the first video frame received is 200, its timestamp is set to 0. If the received video frame is the second video frame, the offset between its original timestamp and that of the previous video frame is calculated. Assuming the original timestamp of the second video frame is 240, the offset is 240 − 200 = 40, so the second timestamp of the second video frame is 0 + 40 = 40. If the original timestamp of the third video frame is 280, its second timestamp is 40 + 40 = 80. By analogy, the new audio and video timestamps shown in fig. 4 are obtained, and playback proceeds according to the realigned timestamps in the order shown in fig. 5. As can be seen from fig. 5, aligning the timestamps of the audio and video frames synchronizes audio and video playback.
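The worked example above can be sketched as a small alignment routine; the function name is illustrative:

```python
def align_video_pts(raw_timestamps):
    """Realign raw video timestamps: the first becomes 0, and each later
    timestamp grows by its offset from the previous raw timestamp
    (the accumulation reference value)."""
    aligned, acc = [], 0
    for i, raw in enumerate(raw_timestamps):
        if i > 0:
            acc += raw - raw_timestamps[i - 1]  # inter-frame offset
        aligned.append(acc)
    return aligned
```

For the raw timestamps 200, 240, 280 from the example, this yields the aligned sequence 0, 40, 80.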
A first offset between the first timestamp and the second timestamp is calculated, step 120.
It should be noted that, by means of timestamp alignment, audio and video generated within a short window of the same period can be regarded as synchronized; that is, after alignment, the offset between the audio and video timestamps should be very small (for example within 200 milliseconds, below what a viewer notices), as shown in fig. 6. After a period of operation, however, small fluctuations can accumulate into an alignment deviation, so the asynchrony problem can reappear even after alignment, as shown in fig. 7. Therefore, after alignment, the aligned video frame timestamps are dynamically adjusted, which resists audio/video timestamp jitter and keeps playback synchronized.
To determine whether the timestamp of the video frame should be corrected, the first offset between the first timestamp of the audio frame and the second timestamp of the video frame is calculated first. If the first offset is greater than the offset threshold, that is, if adjacent audio and video frames differ too much in time, the video frame's timestamp needs to be corrected.
Adjacent audio and video frames can be understood as one audio/video group, in which the audio data corresponds to the video picture. As shown in fig. 5, the first audio frame and the first video frame form one group, and the second audio frame and the second video frame form another.
And step 130, correcting the second time stamp of the video frame according to the first offset to generate an audio-video sequence with the time stamps of the audio frame and the video frame increasing in an interlaced mode.
If the first offset is greater than the offset threshold, the second timestamp of the video frame is corrected to generate an audio-video sequence in which the timestamps of audio and video frames increase in an interleaved manner. Such a sequence can be understood as follows: the first timestamps of the audio frames increase uniformly, the second timestamps of the video frames increase uniformly, and the two are interleaved. For ease of understanding, an illustration: given an audio frame sequence A1, A2, A3, A4 and a video frame sequence B1, B2, B3, B4, the interleaved audio-video sequence is A1, B1, A2, B2, A3, B3, A4, B4, where A1 and B1 form one audio/video group. An audio-video sequence with interleaved increasing timestamps of audio and video frames enables synchronized audio and video playback.
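One way to realize such an interleaved sequence, assuming each frame is represented as a (timestamp, label) pair and both input lists are already sorted by timestamp, is a stable merge; this is a sketch, not the patent's implementation:

```python
import heapq

def interleave(audio_frames, video_frames):
    """Merge two timestamp-sorted frame lists into one audio-video
    sequence whose timestamps increase; on equal timestamps the audio
    frame comes first, since heapq.merge is stable across iterables."""
    return list(heapq.merge(audio_frames, video_frames, key=lambda f: f[0]))
```

For audio frames A1, A2 at 0 and 50 ms and video frames B1, B2 at 0 and 40 ms, this produces A1, B1, B2, A2, i.e. each audio frame adjacent to its corresponding video frame.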
Compared with the prior art, the audio and video synchronization method based on different reference clocks comprises: receiving an audio frame and calculating its first timestamp; receiving a video frame and calculating its second timestamp so that the first and second timestamps are aligned; calculating a first offset between the first timestamp and the second timestamp; and correcting the second timestamp of the video frame according to the first offset to generate an audio-video sequence with interleaved increasing timestamps. By aligning the timestamps of the audio and video frames and then correcting the video timestamps according to the offset between adjacent audio and video timestamps, the method speeds up or slows down video playback so that the timestamps of audio and video frames are arranged in uniform interleaved increasing order, solving the problem of unsynchronized audio and video playback under different reference clocks.
In some of these embodiments, correcting the second timestamp of the video frame according to the first offset comprises:
if the first offset is larger than the offset threshold, calculating a correction coefficient of the video frame according to the first offset;
calculating an accumulation target value of the video frame according to the correction coefficient, the first time stamp, the second time stamp and the accumulation reference value;
and correcting the second time stamp of the video frame according to the accumulated target value.
The correction coefficient can be understood as the adjustment amount of the video frame's second timestamp; the accumulation target value is added to the aligned timestamp of the video frame to yield the video frame's final timestamp.
It is understood that the correction coefficient may be updated in real time or may be fixed until the adjustment is completed.
The offset threshold is the time difference at which audio and video playback is perceived as out of sync. Its specific value may be set according to actual needs: the stricter the synchronization requirement, the smaller the threshold, the guiding principle being that the difference should not be noticeable to a viewer. The present application is described with an offset threshold of 200 ms.
Specifically, the audio/video data in fig. 7 is taken as an example for explanation.
Suppose a video frame is received whose original timestamp is 3500 and whose aligned timestamp is 2900, while the current adjacent audio timestamp is 2280. The first offset is 2900 − 2280 = 620, i.e. the audio and video frames are 620 ms apart. Since 620 > 200, the first offset exceeds the offset threshold, and the video timestamp must be adjusted to bring the difference back within 200 ms. This can be achieved by reducing the increment of the video timestamp; the reduction is defined as the correction coefficient k. The increment of a normal video frame's timestamp is 50 ms, i.e. the accumulation reference value is 50, so if k = 50 the increment becomes 0 and the audio and video are synchronized again after 620/50 = 12.4 video frames.
It should be noted that the above correction is the most extreme case and can make the video appear to skip, because about 12.4 video frames receive the same timestamp and would be played at effectively the same time. To reduce the impact on video playback as much as possible, the present application further provides a method for calculating the correction coefficient of the video frame according to the first offset, comprising:
setting the frame number of a correction video frame;
and calculating a correction coefficient of the video frame according to the first offset and the frame number of the correction video frame.
The number of corrected video frames is set to n, meaning that when a timestamp difference between audio and video occurs, the aligned video frames are adjusted by the coefficient k over the span of n video frames, after which the audio and video timestamps agree again.
It should be noted that if n is set too small the viewing experience suffers, and if set too large the out-of-sync period lengthens, so the specific value of n should follow practical experience; this embodiment does not limit it.
For example, if the current first offset is 620 ms and n is 40, i.e. the audio and video timestamps should agree after 40 video frames, the coefficient can be calculated as k = first offset / n.
Taking fig. 7 as an example, to achieve agreement after 40 video frames, k = 620/40 = 15.5, truncated to the integer 15; that is, each video timestamp increment is reduced by 15 milliseconds. The adjusted effect is shown in fig. 8.
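The calculation above, together with the effect of applying k over the next n frames, can be sketched as follows; the function names are illustrative:

```python
def correction_coefficient(first_offset_ms, n_frames):
    """k = first offset / n, truncated to an integer number of ms."""
    return int(first_offset_ms / n_frames)

def corrected_increments(v, k, n_frames):
    """Per-frame timestamp increments while the correction is active:
    each of the next n frames advances by the accumulation reference
    value v reduced by the correction coefficient k."""
    return [v - k] * n_frames
```

With an offset of 620 ms and n = 40 this gives k = 15, so each of the next 40 frames advances by 50 − 15 = 35 ms, recovering 40 × 15 = 600 ms of the 620 ms offset and leaving the remainder well under the 200 ms threshold.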
In some of these embodiments, calculating the accumulation target value for the video frame based on the correction factor, the first timestamp, the second timestamp, and the accumulation reference value comprises:
comparing the first time stamp of the adjacent audio frame with the second time stamp of the video frame to obtain a first comparison result;
comparing the accumulated reference value with the correction coefficient to obtain a second comparison result;
determining a corresponding function in the second preset piecewise function according to the first comparison result and the second comparison result;
and calculating to obtain an accumulation target value according to the accumulation reference value, the correction coefficient and a corresponding function in the second preset piecewise function.
In some embodiments, the second predetermined piecewise function is:
[The second preset piecewise function is given only as an image (Figure BDA0002887988620000121) in the original patent.]
wherein t represents an accumulation target value, v represents an accumulation reference value, y represents a correction coefficient, apts represents a first time stamp, vpts represents a second time stamp, and m represents a preset positive integer. The value of m is set to 5 in this application.
Specifically, after the correction coefficient is calculated, the corresponding branch of the second preset piecewise function is determined, and the accumulation target value computed from it is added to each aligned video frame timestamp. The audio and video timestamps then agree again after a period of time, with little impact on video playback.
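The exact piecewise function appears only as an image in the source, so the following sketch merely reproduces the behaviour described in the surrounding text: keep the normal increment while audio and video agree, shrink it by the correction coefficient while video leads, and grow it while video lags. The threshold parameter and the branch structure are assumptions, not the patent's formula:

```python
def accumulation_target(v, y, apts, vpts, threshold_ms=200):
    """Accumulation target value t for the next video frame.

    v: accumulation reference value, y: correction coefficient,
    apts/vpts: aligned audio and video timestamps. The piecewise
    structure is inferred from the text, not the patent's image.
    """
    if abs(vpts - apts) <= threshold_ms:
        return v          # timestamps already close: no correction
    if vpts > apts:
        return v - y      # video leads audio: slow the video down
    return v + y          # video lags audio: speed the video up
```

For the fig. 7 example (v = 50, y = 15, apts = 2280, vpts = 2900) this returns 35, matching the reduced per-frame increment discussed above.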
The above discusses the case where the aligned video frame timestamp is greater than the aligned audio frame timestamp; if the aligned video frame timestamp is smaller than the aligned audio frame timestamp, the same calculation applies except that the correction coefficient k is added to the original aligned timestamp.
It should be understood that, although the steps in the flowchart of fig. 1 are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least a portion of the steps in fig. 1 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternately with other steps or at least a portion of the sub-steps or stages of other steps.
The present embodiment further provides an audio and video synchronization apparatus based on different reference clocks. The apparatus is used to implement the foregoing embodiments and preferred embodiments; details already described are not repeated. As used below, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware implementing a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
In one embodiment, as shown in fig. 9, an audio and video synchronization apparatus based on different reference clocks is provided, including a receiving module 910, an alignment module 920, a calculating module 930, and a correcting module 940, where:
a receiving module 910, configured to receive an audio frame and a video frame;
an alignment module 920, configured to calculate a first timestamp of the audio frame and a second timestamp of the video frame, and perform alignment processing on the first timestamp and the second timestamp;
a calculating module 930 configured to calculate an offset between the first timestamp and the second timestamp;
and a correcting module 940, configured to correct the second timestamp of the video frame according to the offset, so as to generate an audio-video sequence with timestamps of the audio frame and the video frame increasing in an interlaced manner.
The present application provides an audio and video synchronization apparatus based on different reference clocks. The alignment module 920 aligns the timestamps of the audio and video frames, and the correction module 940 then corrects the timestamps of the video frames according to the offset between adjacent audio and video timestamps, speeding up or slowing down video frame playback so that the timestamps of audio frames and video frames are arranged in a uniformly interleaved, increasing order. This solves the problem of unsynchronized audio and video playback under different reference clocks.
In some embodiments, the alignment module 920 is further configured to set the first timestamp of the audio frame to 0 if the audio frame is the first audio frame received; otherwise, to calculate the playing duration of the audio frame according to the length of the audio frame, and to take the sum of the previous audio frame's first timestamp and that playing duration as the first timestamp of the audio frame.
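The audio timestamp rule above can be sketched in Python. The derivation of duration from frame length assumes uncompressed PCM; the parameter names and PCM assumption are illustrative, not from the patent.

```python
def audio_timestamp(prev_pts_ms, frame_bytes, sample_rate, channels,
                    bytes_per_sample, is_first):
    """First timestamp of an audio frame, per the scheme described above.

    The first received audio frame gets timestamp 0. Afterwards, the playing
    duration is derived from the frame length and added to the previous
    frame's timestamp. Assumes raw PCM so that length maps to duration.
    """
    if is_first:
        return 0.0
    samples = frame_bytes / (channels * bytes_per_sample)
    duration_ms = samples / sample_rate * 1000.0
    return prev_pts_ms + duration_ms

# 16-bit mono PCM at 16 kHz: a 640-byte frame lasts 20 ms
pts = audio_timestamp(100.0, 640, 16000, 1, 2, is_first=False)  # 120.0
```

Because each timestamp is the running sum of frame durations, the audio timeline advances at exactly the rate the audio is actually played, independent of the sender's clock.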
In some embodiments, the alignment module 920 is further configured to set the second timestamp of the video frame to 0 if the video frame is the first video frame received; otherwise, to calculate a second offset between the timestamp of the video frame and the timestamp of the previous video frame, and to take the second offset as an accumulation reference value for the second timestamp.
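The video-side bookkeeping can be sketched the same way: the first frame is re-based to 0, and each later frame advances by the raw inter-frame offset (the accumulation reference value). A minimal illustrative Python sketch, with assumed names:

```python
def align_video_timestamps(raw_timestamps):
    """Aligned second timestamps for a sequence of video frames.

    The first received video frame gets aligned timestamp 0. For each later
    frame, the offset between its raw timestamp and the previous frame's raw
    timestamp (the accumulation reference value) is added to the running
    aligned timestamp.
    """
    aligned = []
    prev_raw = None
    current = 0.0
    for raw in raw_timestamps:
        if prev_raw is None:
            current = 0.0              # first frame: second timestamp set to 0
        else:
            current += raw - prev_raw  # add the accumulation reference value
        aligned.append(current)
        prev_raw = raw
    return aligned

# Raw capture timestamps at ~40 ms spacing become a 0-based aligned timeline
align_video_timestamps([1000, 1040, 1081])  # [0.0, 40.0, 81.0]
```

Re-basing both streams to 0 in this way is what makes the first and second timestamps directly comparable even though they originate from different reference clocks.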
In some embodiments, the correcting module 940 is further configured to calculate a correction coefficient of the video frame according to the first offset if the first offset is greater than an offset threshold; calculating an accumulation target value of the video frame according to the correction coefficient, the first timestamp, the second timestamp and the accumulation reference value; and correcting the second time stamp of the video frame according to the accumulated target value.
In some embodiments, the correction module 940 is further configured to set a frame number for correcting the video frame; and calculating a correction coefficient of the video frame according to the first offset and the frame number of the corrected video frame.
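The patent states only that the correction coefficient is computed from the first offset and the number of frames set aside for correction. Dividing the offset evenly across those frames is one natural reading, offered here purely as an assumption in an illustrative Python sketch:

```python
def correction_coefficient(first_offset, num_correction_frames):
    """Per-frame correction coefficient for a video frame.

    first_offset          : measured offset between the first (audio) and
                            second (video) timestamps
    num_correction_frames : number of video frames over which the offset is
                            to be absorbed (a configured value)

    Splitting the offset evenly is an assumed reading of the patent, which
    does not disclose the exact formula in the translated text.
    """
    if num_correction_frames <= 0:
        raise ValueError("num_correction_frames must be positive")
    return first_offset / num_correction_frames

k = correction_coefficient(100.0, 5)  # 20.0 time units per corrected frame
```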
In some embodiments, the correction module 940 is further configured to compare the first time stamp of the adjacent audio frame with the second time stamp of the video frame to obtain a first comparison result; comparing the accumulated reference value with the correction coefficient to obtain a second comparison result; determining a corresponding function in a second preset piecewise function according to the first comparison result and the second comparison result; and calculating to obtain the accumulation target value according to the accumulation reference value, the correction coefficient and a corresponding function in the second preset piecewise function.
In some embodiments, the second predetermined piecewise function is:
[Second preset piecewise function, given as formula image BDA0002887988620000141.]
wherein t represents an accumulation target value, v represents an accumulation reference value, y represents a correction coefficient, apts represents a first time stamp, vpts represents a second time stamp, and m represents a preset positive integer.
For specific limitations of the audio and video synchronization apparatus based on different reference clocks, reference may be made to the above limitations of the audio and video synchronization method based on different reference clocks, which are not repeated here. The modules in the audio and video synchronization apparatus based on different reference clocks may be implemented wholly or partially in software, in hardware, or in a combination of the two. The modules may be embedded in hardware in, or independent of, a processor in the computer device, or stored in software in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In addition, the audio and video synchronization method based on different reference clocks in the embodiment of the present application described in conjunction with fig. 1 may be implemented by a computer device. Fig. 10 is a hardware configuration diagram of a computer device according to an embodiment of the present application.
The computer device may comprise a processor 101 and a memory 102 storing computer program instructions.
Specifically, the processor 101 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The memory 102 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, the memory 102 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 102 may include removable or non-removable (or fixed) media, where appropriate, and may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 102 is non-volatile memory. In particular embodiments, the memory 102 includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be Static Random-Access Memory (SRAM) or Dynamic Random-Access Memory (DRAM), where the DRAM may be Fast Page Mode DRAM (FPM DRAM), Extended Data Output DRAM (EDO DRAM), Synchronous DRAM (SDRAM), and the like.
The memory 102 may be used to store or cache various data files that need to be processed and/or communicated, as well as computer program instructions executed by the processor 101.
The processor 101 reads and executes the computer program instructions stored in the memory 102 to implement any one of the audio and video synchronization methods based on different reference clocks in the above embodiments.
In some of these embodiments, the computer device may also include a communication interface 103 and a bus 100. As shown in fig. 10, the processor 101, the memory 102, and the communication interface 103 are connected via the bus 100 and communicate with one another over it.
The communication interface 103 is used to implement communication between the modules, apparatuses, units and/or devices in the embodiments of the present application. The communication interface 103 may also carry out data communication with external components, such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
The bus 100 includes hardware, software, or both, coupling the components of the computer device to each other. The bus 100 includes, but is not limited to, at least one of the following: a Data Bus, an Address Bus, a Control Bus, an Expansion Bus, and a Local Bus. By way of example, and not limitation, the bus 100 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) Bus, a Front-Side Bus (FSB), a HyperTransport (HT) Interconnect, an Industry Standard Architecture (ISA) Bus, an InfiniBand Interconnect, a Low Pin Count (LPC) Bus, a memory bus, a Micro Channel Architecture (MCA) Bus, a Peripheral Component Interconnect (PCI) Bus, a PCI-Express (PCIe) Bus, a Serial Advanced Technology Attachment (SATA) Bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. The bus 100 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The computer device can execute the audio and video synchronization method based on different reference clocks in the embodiment of the present application based on the acquired program instruction, thereby implementing the audio and video synchronization method based on different reference clocks described in conjunction with fig. 1.
In addition, in combination with the audio and video synchronization method based on different reference clocks in the foregoing embodiments, embodiments of the present application may provide a computer-readable storage medium to implement. The computer readable storage medium having stored thereon computer program instructions; the computer program instructions, when executed by a processor, implement any one of the above embodiments of different reference clock based audio and video synchronization methods.
The technical features of the embodiments described above may be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; nevertheless, any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but this should not be construed as limiting the claims. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An audio and video synchronization method based on different reference clocks is characterized by comprising the following steps:
receiving an audio frame and calculating a first time stamp of the audio frame; and receiving a video frame and calculating a second timestamp of the video frame such that the first timestamp and the second timestamp are aligned;
calculating a first offset between the first timestamp and the second timestamp;
and correcting the second time stamp of the video frame according to the first offset to generate an audio-video sequence with the time stamps of the audio frame and the video frame increasing in an interlaced mode.
2. The method of claim 1, wherein receiving an audio frame and calculating a first timestamp for the audio frame comprises:
if the audio frame is a received first frame audio frame, setting a first timestamp of the audio frame to be 0;
otherwise, calculating the playing duration of the audio frame according to the length of the audio frame, and taking the sum of the first timestamp of the previous audio frame and the playing duration as the first timestamp of the audio frame.
3. The method of claim 1, wherein receiving a video frame and calculating a second timestamp for the video frame comprises:
if the video frame is the received first frame video frame, setting a second timestamp of the video frame to be 0;
otherwise, calculating a second offset between the timestamp of the video frame and the timestamp of the previous video frame, and taking the second offset as an accumulation reference value for the second timestamp.
4. The method of claim 3, wherein the correcting the second timestamp of the video frame according to the first offset comprises:
if the first offset is larger than an offset threshold, calculating a correction coefficient of the video frame according to the first offset;
calculating an accumulation target value of the video frame according to the correction coefficient, the first timestamp, the second timestamp and the accumulation reference value;
and correcting the second time stamp of the video frame according to the accumulated target value.
5. The method of claim 4, wherein the calculating the correction coefficient for the video frame according to the first offset comprises:
setting the frame number of a correction video frame;
and calculating a correction coefficient of the video frame according to the first offset and the frame number of the corrected video frame.
6. The method of claim 4, wherein calculating the accumulated target value for the video frame according to the correction factor, the first timestamp, the second timestamp, and the accumulated reference value comprises:
comparing the first time stamp of the adjacent audio frame with the second time stamp of the video frame to obtain a first comparison result;
comparing the accumulated reference value with the correction coefficient to obtain a second comparison result;
determining a corresponding function in a second preset piecewise function according to the first comparison result and the second comparison result;
and calculating to obtain the accumulation target value according to the accumulation reference value, the correction coefficient and a corresponding function in the second preset piecewise function.
7. The method of claim 6, wherein the second predetermined piecewise function is:
[Second preset piecewise function, given as formula image FDA0002887988610000021.]
wherein t represents an accumulation target value, v represents an accumulation reference value, y represents a correction coefficient, apts represents a first time stamp, vpts represents a second time stamp, and m represents a preset positive integer.
8. An audio and video synchronization device based on different reference clocks, the device comprising:
the receiving module is used for receiving the audio frames and the video frames;
the alignment module is used for calculating a first time stamp of the audio frame and a second time stamp of the video frame and aligning the first time stamp and the second time stamp;
a calculation module to calculate an offset between the first timestamp and the second timestamp;
and the correcting module is used for correcting the second time stamp of the video frame according to the offset so as to generate an audio and video sequence with the time stamps of the audio frame and the video frame increasing in a staggered mode.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110018877.3A 2021-01-07 2021-01-07 Audio and video synchronization method and device based on different reference clocks and computer equipment Withdrawn CN113395561A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110018877.3A CN113395561A (en) 2021-01-07 2021-01-07 Audio and video synchronization method and device based on different reference clocks and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110018877.3A CN113395561A (en) 2021-01-07 2021-01-07 Audio and video synchronization method and device based on different reference clocks and computer equipment

Publications (1)

Publication Number Publication Date
CN113395561A true CN113395561A (en) 2021-09-14

Family

ID=77616664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110018877.3A Withdrawn CN113395561A (en) 2021-01-07 2021-01-07 Audio and video synchronization method and device based on different reference clocks and computer equipment

Country Status (1)

Country Link
CN (1) CN113395561A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023029936A1 (en) * 2020-10-13 2023-03-09 杭州涂鸦信息技术有限公司 Audio and video synchronization method and apparatus based on different reference clocks, and computer device
CN114339350A (en) * 2021-12-30 2022-04-12 Oppo广东移动通信有限公司 Data processing method and device and electronic equipment
CN114339350B (en) * 2021-12-30 2023-12-05 Oppo广东移动通信有限公司 Data processing method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN113596549A (en) Audio and video synchronization method and device based on different reference clocks and computer equipment
CN106612452B (en) method and device for synchronizing audio and video of set top box
CN113395561A (en) Audio and video synchronization method and device based on different reference clocks and computer equipment
US11812103B2 (en) Dynamic playout of transition frames while transitioning between playout of media streams
CN103167342A (en) Audio and video synchronous processing device and method
US9998634B2 (en) Video frame playback scheduling
KR100707641B1 (en) Decoder apparatus
US20150235668A1 (en) Video/audio synchronization apparatus and video/audio synchronization method
US8279344B2 (en) Synchronization of video presentation by video cadence modification
US20130083859A1 (en) Method to match input and output timestamps in a video encoder and advertisement inserter
US10887646B2 (en) Live streaming with multiple remote commentators
EP2538689A1 (en) Adaptive media delay matching
JPH11215082A (en) Digital signal multiplier and method, digital signal transmitter and method, digital signal recorder, and method and recording medium thereof
US9736340B2 (en) Decoder and decoding method for audio video stream synchronization
US8531603B2 (en) System and method for in-band A/V timing measurement of serial digital video signals
WO2017048463A1 (en) System and method for controlling memory frequency using feed-forward compression statistics
CN112787742B (en) Clock synchronization method and device, wireless earphone and readable storage medium
US8300147B2 (en) System and method for in-band A/V timing measurement of serial digital video signals
US8384827B2 (en) System and method for in-band A/V timing measurement of serial digital video signals
JP5465278B2 (en) Video display device
EP2393288A2 (en) System and method for in-band a/v timing measurement of serial digital video signals
CN117478941A (en) Information processing method, device, equipment and storage medium for audio and video fusion
CN117319574A (en) Time axis adjusting method and electronic equipment
CN115842893A (en) Picture synchronous output method, device and equipment
CN114979739A (en) Audio processing method and system in video communication

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20210914