WO2023236767A1 - Audio and video processing method and apparatus, and storage medium - Google Patents


Info

Publication number
WO2023236767A1
WO2023236767A1 PCT/CN2023/095554 CN2023095554W WO2023236767A1 WO 2023236767 A1 WO2023236767 A1 WO 2023236767A1 CN 2023095554 W CN2023095554 W CN 2023095554W WO 2023236767 A1 WO2023236767 A1 WO 2023236767A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
video
data frame
time
expected
Prior art date
Application number
PCT/CN2023/095554
Other languages
French (fr)
Chinese (zh)
Inventor
郑万鹏
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023236767A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8547 Content authoring involving timestamps for synchronizing content

Definitions

  • Embodiments of the present application relate to, but are not limited to, the field of communication technology, and in particular, to an audio and video processing method, device, and storage medium thereof.
  • Embodiments of the present application provide an audio and video processing method, device, and storage medium.
  • Embodiments of the present application provide an audio and video processing method.
  • The audio and video processing method includes: obtaining a video data frame and an audio data frame; determining a video sequence number jump value according to the timestamp sequence number of the video data frame received this time and the timestamp sequence numbers of the two previously received video data frames; and determining an expected video playback time according to the video sequence number jump value, where the expected video playback time represents the playback time corresponding to the video data frame.
  • Embodiments of the present application also provide an audio and video processing device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the computer program, the above audio and video processing method is implemented.
  • Embodiments of the present application also provide a computer-readable storage medium that stores computer-executable instructions, and the computer-executable instructions are used to execute the above audio and video processing method.
  • Figure 1 is a flow chart of an audio and video processing method provided by an embodiment of the present application.
  • Figure 2 is a flow chart for determining the video sequence number jump value provided by an embodiment of the present application.
  • Figure 3 is a flow chart for determining the audio sequence number jump value provided by an embodiment of the present application.
  • When a video data frame is received, its timestamp sequence number can be used to determine the video sequence number jump value of the video data frame. The jump value indicates the continuity of the video stream and therefore whether a jump has occurred, and it serves as the basis for determining the expected video playback time of the video data frame received this time. This avoids prolonged frame repetition or frame loss during playback and allows the video data frame to be played accurately at its expected video playback time.
  • Similarly, when an audio data frame is received, its timestamp sequence number can be used to determine the audio sequence number jump value of the audio data frame. The jump value indicates the continuity of the audio stream and whether a jump has occurred, and it serves as the basis for determining the expected audio playback time of the audio data frame received this time. This avoids prolonged frame repetition or frame loss during playback and allows the audio data frame to be played accurately at its expected audio playback time.
  • As a result, the received audio data frames and video data frames correspond accurately in time, achieving synchronization of audio data and video data.
  • Figure 1 is a flow chart of an audio and video processing method provided by an embodiment of the present application.
  • The audio and video processing method includes step S100, step S200, step S300, step S400, step S500 and step S600.
  • Step S100: Obtain video data frames and audio data frames;
  • Step S200: Determine the video sequence number jump value based on the timestamp sequence number of the video data frame received this time and the timestamp sequence numbers of the two previously received video data frames;
  • Step S300: Determine the expected video playback time according to the video sequence number jump value, where the expected video playback time represents the playback time corresponding to the video data frame;
  • Step S400: Determine the audio sequence number jump value based on the timestamp sequence number of the audio data frame received this time and the timestamp sequence numbers of the two previously received audio data frames;
  • Step S500: Determine the expected audio playback time according to the audio sequence number jump value, where the expected audio playback time represents the playback time corresponding to the audio data frame and corresponds to the expected video playback time;
  • Step S600: Synchronize the audio data frame and the video data frame according to the expected audio playback time and the expected video playback time.
  • When a video data frame or audio data frame is received, the timestamp sequence number it carries can be obtained. The timestamp sequence number of the frame received this time and the timestamp sequence numbers of the two previously received frames can therefore be used directly to determine the video sequence number jump value or the audio sequence number jump value, and hence whether a timestamp sequence number jump has occurred. The expected video playback time or expected audio playback time is then determined from the corresponding jump value, so that a large change in the timestamp sequence number does not cause prolonged frame dropping or frame filling.
  • Each received video data frame corresponds to an expected video playback time, and each received audio data frame corresponds to an expected audio playback time. This time correspondence ensures that audio and video are synchronized.
  • The received video data frames carry timestamp sequence numbers. Because the audio and video data sending end sends video data frames continuously, the timestamp sequence numbers of the video data frames should be continuous under normal conditions. If the timestamp sequence numbers are discontinuous, there is a problem with the video stream transmission. In particular, a timestamp sequence number jump causes the timestamp sequence number to change greatly. It is therefore necessary to calculate the video sequence number jump value from the timestamp sequence numbers of three consecutively received video data frames and to analyze that value.
  • The video sequence number jump value is used to determine the expected video playback time in the jump state, so that the expected video playback time corresponding to the video data frame is known and the video data frame can be played on time.
  • The audio data frames received by the audio and video data processing end also carry timestamp sequence numbers, which should likewise be continuous. If the timestamp sequence numbers are discontinuous, there is a problem with the audio stream transmission. In particular, a timestamp sequence number jump causes the timestamp sequence number to change greatly. It is therefore necessary to calculate the audio sequence number jump value from the timestamp sequence numbers of three consecutively received audio data frames and to analyze that value.
  • The audio sequence number jump value is used to determine the expected audio playback time in the jump state, so that the expected audio playback time corresponding to the audio data frame is known and the audio data frame can be played on time.
  • Each received video data frame corresponds to an expected video playback time.
  • The video data frame is played on time at its expected video playback time, which is equivalent to placing the video data frames at different moments on a timeline.
  • Different audio data frames can likewise be placed at different moments on a timeline. It is then only necessary to ensure that the expected audio playback time corresponds to the expected video playback time, so that the two streams correspond on the timeline.
  • Subsequent processing only needs to handle each audio data frame or video data frame according to the method of the present application and set the corresponding expected playback time.
  • Step S200 includes but is not limited to step S210, step S220 and step S230.
  • Step S210: Determine the first sequence number difference based on the timestamp sequence numbers of the two previously received video data frames;
  • Step S220: Determine the second sequence number difference based on the timestamp sequence number of the video data frame received this time and the timestamp sequence number of the video data frame received last time;
  • Step S230: Calculate the video sequence number jump value based on the second sequence number difference and the first sequence number difference.
  • The two preceding timestamp sequence numbers are introduced here to jointly determine the degree of the timestamp sequence number jump.
  • The timestamp sequence numbers of three received video data frames are used to determine the video sequence number jump value: the ratio between the first sequence number difference and the second sequence number difference directly and effectively indicates the degree of the timestamp sequence number jump, which facilitates the subsequent judgment of whether a timestamp sequence number jump has occurred.
  • the constraint formula for calculating the video serial number jump value u can refer to the following formula:
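The formula for u is not reproduced in this text, so the following is only a minimal sketch of steps S210 to S230. It assumes that u is the ratio of the second sequence number difference to the first, so that a larger u means a larger jump. The function name, the use of absolute values, and the handling of a zero first difference are all assumptions made for illustration.

```python
def sequence_jump_value(prev2: int, prev1: int, current: int) -> float:
    """Illustrative jump value from three consecutive timestamp sequence numbers.

    prev2 and prev1 are the two previously received sequence numbers,
    current is the one received this time.
    """
    first_diff = prev1 - prev2      # S210: difference between the two earlier frames
    second_diff = current - prev1   # S220: difference between this frame and the last one
    if first_diff == 0:
        # Assumption: treat a repeated sequence number as a unit step
        # so that the ratio stays defined.
        first_diff = 1
    # S230 (assumed direction): newest step relative to the previous step.
    return abs(second_diff) / abs(first_diff)


# Continuous stream: ratio 1. Three lost packets: ratio 4. Sequence reset: very large ratio.
print(sequence_jump_value(100, 101, 102))    # 1.0
print(sequence_jump_value(100, 101, 105))    # 4.0
print(sequence_jump_value(100, 101, 50000))  # 49899.0
```

Under this assumption a short run of lost packets gives a small ratio while a sequence reset gives a very large one, which is the distinction the threshold is meant to capture.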
  • Step S300 includes but is not limited to the following step: when the video sequence number jump value is greater than the preset video sequence number jump threshold, a first time interval is added to the expected video playback time corresponding to the last received video data frame to determine the expected video playback time, which is recorded as the first expected video playback time. The first time interval represents the time interval between two received video data frames.
  • The video sequence number jump threshold needs to be set relatively large so that a jump can be distinguished from the short-term loss of timestamp sequence numbers caused by packet loss. After the video sequence number jump value of the video data frame received this time has been determined, it is compared with the threshold. When the jump value is greater than the video sequence number jump threshold, it can be determined that a timestamp sequence number jump has occurred. In that case, using the timestamp sequence number directly to determine the expected video playback time would easily make the expected video playback time discontinuous.
  • The expected playback time of the video data frame received this time can instead be determined from the first time interval. After a timestamp sequence number jump has been detected, it is only necessary to add one first time interval to the expected video playback time corresponding to the last received video data frame to obtain the expected video playback time of the currently received video data frame. This expected video playback time is also recorded as the first expected video playback time, which is later used to update the initial video timestamp sequence number and the initial video expected playback time.
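The jump branch described above can be sketched as follows. The function name, the millisecond unit and the example threshold of 100 are hypothetical; the text only says that the threshold should be large enough to tell a jump apart from ordinary packet loss.

```python
from typing import Optional


def expected_time_after_jump(jump_value: float, threshold: float,
                             last_expected_ms: int, interval_ms: int) -> Optional[int]:
    """Return the expected playback time for the jump case, or None if no jump.

    On a jump, the timestamp sequence number is ignored and the frame is placed
    exactly one frame interval after the previously received frame.
    """
    if jump_value > threshold:
        return last_expected_ms + interval_ms
    return None  # no jump: the sequence-number-based calculation applies instead


# A jump value far above the threshold continues the timeline by one interval.
print(expected_time_after_jump(29995.0, 100.0, last_expected_ms=200, interval_ms=40))  # 240
# A small jump value (for example, a few lost packets) is not treated as a jump.
print(expected_time_after_jump(3.0, 100.0, last_expected_ms=200, interval_ms=40))      # None
```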
  • The audio and video processing method also includes: when the video sequence number jump value is less than the video sequence number jump threshold, determining the expected video playback time based on the initial video timestamp sequence number, the initial video expected playback time, and the timestamp sequence number of the video data frame, to obtain the second expected video playback time. The initial video timestamp sequence number is obtained based on the timestamp sequence number of the video data frame received this time, and the initial video expected playback time is obtained based on the expected video playback time of the video data frame received this time.
  • When there is no timestamp sequence number jump, the timestamp sequence number does not change much. During normal playback, the timestamp sequence numbers are continuous. When network fluctuations cause packet loss and video data frames go missing, the number of missing timestamp sequence numbers is not large. In that case, the timestamp sequence number of the video data frame received this time can be used directly to determine the expected video playback time. In some embodiments, if the video sequence number jump value corresponding to the video data frame received this time is less than the preset video sequence number jump threshold, the frame is in a normal playback state or a packet loss state, and no jump has occurred.
  • The expected video playback time of this frame can then be determined quickly from the preset initial video timestamp sequence number, the initial video expected playback time, and the timestamp sequence number of the video data frame received this time.
  • Because the playback time interval between every two video data frames is fixed, it is only necessary to determine the difference between the timestamp sequence number of the video data frame received this time and the initial video timestamp sequence number. Starting from the initial video expected playback time, the corresponding time offset then gives the expected video playback time of this video data frame.
  • Specifically, the difference between the timestamp sequence number of the current video data frame and the initial video timestamp sequence number gives the number of intervals, which is multiplied by the playback time interval between two video data frames to obtain the video playback time difference from the initial video expected playback time. Adding this video playback time difference to the initial video expected playback time gives the expected video playback time of the video data frame received this time.
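The no-jump calculation above amounts to one multiplication and one addition. The sketch below uses integer milliseconds and a 40 ms frame interval purely as example values; the function and parameter names are assumptions.

```python
def expected_time_no_jump(seq: int, start_seq: int,
                          start_time_ms: int, interval_ms: int) -> int:
    """Expected playback time when no timestamp sequence number jump occurred.

    seq           -- timestamp sequence number of the frame received this time
    start_seq     -- initial video timestamp sequence number
    start_time_ms -- initial video expected playback time
    interval_ms   -- fixed playback interval between two frames
    """
    intervals = seq - start_seq                     # number of frame intervals elapsed
    return start_time_ms + intervals * interval_ms  # offset from the initial time


# Frame 1005 is five intervals after frame 1000, so it plays at 0 + 5 * 40 ms,
# regardless of whether frames 1001 to 1004 actually arrived.
print(expected_time_no_jump(1005, start_seq=1000, start_time_ms=0, interval_ms=40))  # 200
```

Because the result depends only on the sequence number, a lost packet leaves a gap on the timeline instead of shifting every later frame.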
  • The initial video timestamp sequence number and the initial video expected playback time are obtained as follows: when the video data frame is the first video data frame received, the timestamp sequence number of the first video data frame is determined as the initial video timestamp sequence number, and the expected video playback time corresponding to the first video data frame is determined as the initial video expected playback time.
  • Initialization begins after the first video data frame is received. The timestamp sequence number of the first video data frame is used to initialize the initial video timestamp sequence number, and the expected video playback time of the first video data frame is used to initialize the initial video expected playback time.
  • The audio and video processing method further includes:
  • updating the initial video timestamp sequence number according to the timestamp sequence number of the video data frame received this time;
  • updating the initial video expected playback time according to the first expected video playback time.
  • Once the timestamp sequence number of a video data frame jumps, the timestamp sequence numbers of all subsequently received video data frames are assigned relative to the sequence number after the jump. The initial video timestamp sequence number and the initial video expected playback time corresponding to the first video data frame can therefore no longer be used to calculate subsequent expected video playback times, and after each jump the previous initial video timestamp sequence number and initial video expected playback time cannot be used again.
  • After a jump, the timestamp sequence number of the video data frame received this time and the first expected video playback time are determined first, and these two values are then used directly to update the initial video timestamp sequence number and the initial video expected playback time. Subsequent expected video playback times are calculated from the updated values, which ensures the accuracy and smoothness of video data frame playback on the timeline.
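The baseline update after a jump can be illustrated with a small stateful sketch that combines the three cases (first frame, no jump, jump). The class name `TrackClock`, the ratio used as the jump value, the threshold of 100 and the 40 ms interval are all assumptions for illustration, not values taken from the publication.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrackClock:
    interval_ms: int                 # fixed time between two frames
    jump_threshold: float = 100.0    # hypothetical threshold value
    first_time_ms: int = 0           # expected time given to the first frame
    start_seq: Optional[int] = None  # initial timestamp sequence number
    start_time_ms: int = 0           # initial expected playback time
    last_time_ms: int = 0            # expected time of the last received frame
    history: List[int] = field(default_factory=list)  # last two sequence numbers

    def on_frame(self, seq: int) -> int:
        if self.start_seq is None:
            # First frame: initialize the baseline.
            self.start_seq, self.start_time_ms = seq, self.first_time_ms
            expected = self.first_time_ms
        elif self._jump_value(seq) > self.jump_threshold:
            # Jump: continue one interval after the last frame, then rebase.
            expected = self.last_time_ms + self.interval_ms
            self.start_seq, self.start_time_ms = seq, expected
        else:
            # No jump: derive the time from the sequence number offset.
            expected = self.start_time_ms + (seq - self.start_seq) * self.interval_ms
        self.history = (self.history + [seq])[-2:]
        self.last_time_ms = expected
        return expected

    def _jump_value(self, seq: int) -> float:
        if len(self.history) < 2:
            return 0.0  # assumption: too little history to detect a jump
        first_diff = self.history[1] - self.history[0]
        second_diff = seq - self.history[1]
        return abs(second_diff) / max(abs(first_diff), 1)


clock = TrackClock(interval_ms=40)
times = [clock.on_frame(s) for s in [10, 11, 12, 15, 90000, 90001, 90002]]
print(times)  # [0, 40, 80, 200, 240, 280, 320]
```

In this run the loss of frames 13 and 14 leaves a gap (80 to 200 ms), while the reset to 90000 is placed one interval after the previous frame and becomes the new baseline for the frames that follow.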
  • The audio and video processing method further includes: when the time of receiving the first video data frame is earlier than or equal to the time of receiving the first audio data frame, setting the expected video playback time to a preset time value.
  • The initial video timestamp sequence number can be obtained directly from the first video data frame, but the initial video expected playback time cannot be obtained directly.
  • The preset time value can be set directly to 0 seconds.
  • When the video data frame is the first video data frame received, the audio and video processing method further includes: when the time of receiving the first video data frame is later than the time of receiving the first audio data frame, determining the time interval between receiving the first video data frame and receiving the first audio data frame as the expected video playback time.
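A minimal sketch of the first-frame rule: whichever stream arrives first (or at the same time) starts at the preset value, and the later stream starts at the arrival gap. The function name and the use of local arrival times in milliseconds are assumptions; the same rule is applied symmetrically to audio.

```python
from typing import Tuple


def first_frame_expected_times(video_arrival_ms: int, audio_arrival_ms: int,
                               preset_ms: int = 0) -> Tuple[int, int]:
    """Return (video_first_expected_ms, audio_first_expected_ms)."""
    if video_arrival_ms <= audio_arrival_ms:
        video_first = preset_ms
    else:
        # Video arrived later: its first frame is offset by the arrival gap.
        video_first = video_arrival_ms - audio_arrival_ms
    if audio_arrival_ms <= video_arrival_ms:
        audio_first = preset_ms
    else:
        # Audio arrived later: its first frame is offset by the arrival gap.
        audio_first = audio_arrival_ms - video_arrival_ms
    return video_first, audio_first


print(first_frame_expected_times(1000, 1030))  # (0, 30): audio starts 30 ms after video
print(first_frame_expected_times(1030, 1000))  # (30, 0): video starts 30 ms after audio
print(first_frame_expected_times(500, 500))    # (0, 0): both start at the preset value
```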
  • x_start is the initial video timestamp sequence number
  • x_k is the k-th video data frame
  • t_x is the time interval between two video data frames
  • abs(x_start) is the expected play time of the initial video
  • 0 is 0 seconds
  • is the time interval between the first video data frame and the first audio data frame.
  • When the first video data frame is received, equation (1) is used to calculate the expected video playback time, from which the initial video timestamp sequence number and the initial video expected playback time can be determined.
  • When there is no jump in the video data frame transmission, equation (2) is used to determine the expected video playback time.
  • When a jump occurs, equation (3) is used to determine the expected video playback time, and the initial video timestamp sequence number and the initial video expected playback time are updated at the same time.
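Equations (1) to (3) themselves are not reproduced in this text. Based only on the symbol definitions and the prose description, the three cases might be written as follows. This is a reconstruction under stated assumptions, not the publication's own notation: abs(x_k) for the expected playback time of frame x_k (by analogy with abs(x_start)), Δ for the arrival gap between the first video and first audio data frames, u for the jump value and U for the jump threshold are all symbols introduced here for illustration.

```latex
% (1) first frame, (2) no jump, (3) jump -- assumed reconstruction
\begin{align}
\mathrm{abs}(x_{start}) &=
  \begin{cases}
    0, & \text{first video frame arrives no later than first audio frame}\\
    \Delta, & \text{first video frame arrives later than first audio frame}
  \end{cases} \tag{1}\\[4pt]
\mathrm{abs}(x_k) &= \mathrm{abs}(x_{start}) + (x_k - x_{start})\, t_x,
  \qquad u < U \tag{2}\\[4pt]
\mathrm{abs}(x_k) &= \mathrm{abs}(x_{k-1}) + t_x,
  \qquad u > U,\ \text{then } x_{start} \leftarrow x_k,\
  \mathrm{abs}(x_{start}) \leftarrow \mathrm{abs}(x_k) \tag{3}
\end{align}
```

In (2) the symbol x_k is read as the timestamp sequence number of the k-th frame, which is what makes the difference x_k − x_start a count of frame intervals.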
  • The audio and video processing method also includes:
  • determining the expected video playback time of a copied video data frame from the first time interval and the expected video playback time corresponding to the last received video data frame, where the first time interval represents the time interval between two received video data frames.
  • Step S400 includes but is not limited to step S410, step S420 and step S430.
  • Step S410: Determine the third sequence number difference based on the timestamp sequence numbers of the two previously received audio data frames;
  • Step S420: Determine the fourth sequence number difference based on the timestamp sequence number of the audio data frame received this time and the timestamp sequence number of the audio data frame received last time;
  • Step S430: Calculate the audio sequence number jump value based on the third sequence number difference and the fourth sequence number difference.
  • The two preceding timestamp sequence numbers are introduced here to jointly determine the degree of the timestamp sequence number jump.
  • The timestamp sequence numbers of three received audio data frames are used to determine the audio sequence number jump value: the ratio between the third sequence number difference and the fourth sequence number difference directly and effectively indicates the degree of the timestamp sequence number jump, which facilitates the subsequent judgment of whether a timestamp sequence number jump has occurred.
  • For the constraint formula for calculating the audio sequence number jump value, refer to the constraint formula for calculating the video sequence number jump value.
  • Step S500 includes but is not limited to the following step: when the audio sequence number jump value is greater than the preset audio sequence number jump threshold, a second time interval is added to the expected audio playback time corresponding to the last received audio data frame to determine the expected audio playback time, which is recorded as the first expected audio playback time. The second time interval represents the time interval between two received audio data frames.
  • The audio sequence number jump threshold needs to be set relatively large so that a jump can be distinguished from the short-term loss of timestamp sequence numbers caused by packet loss. After the audio sequence number jump value of the audio data frame received this time has been determined, it is compared with the threshold. When the jump value is greater than the audio sequence number jump threshold, it can be determined that a timestamp sequence number jump has occurred. In that case, using the timestamp sequence number directly to determine the expected audio playback time would easily make the expected audio playback time discontinuous.
  • The expected audio playback time of the audio data frame received this time can instead be determined from the second time interval. After a timestamp sequence number jump has been detected, it is only necessary to add one second time interval to the expected audio playback time corresponding to the last received audio data frame to obtain the expected audio playback time of the currently received audio data frame. This expected audio playback time is also recorded as the first expected audio playback time, which is later used to update the initial audio timestamp sequence number and the initial audio expected playback time.
  • The audio and video processing method further includes: when the audio sequence number jump value is less than the audio sequence number jump threshold, determining the expected audio playback time based on the initial audio timestamp sequence number, the initial audio expected playback time, and the timestamp sequence number of the audio data frame, to obtain the second expected audio playback time. The initial audio timestamp sequence number is obtained based on the timestamp sequence number of the audio data frame received this time, and the initial audio expected playback time is obtained based on the expected audio playback time of the audio data frame received this time.
  • When there is no timestamp sequence number jump, the timestamp sequence number does not change much. During normal playback, the timestamp sequence numbers are continuous. When network fluctuations cause packet loss and audio data frames go missing, the number of missing timestamp sequence numbers is not large. In that case, the timestamp sequence number of the audio data frame received this time can be used directly to determine the expected audio playback time. In some embodiments, if the audio sequence number jump value corresponding to the audio data frame received this time is less than the preset audio sequence number jump threshold, the frame is in a normal playback state or a packet loss state, and no jump has occurred.
  • The expected audio playback time of this frame can then be determined quickly from the preset initial audio timestamp sequence number, the initial audio expected playback time, and the timestamp sequence number of the audio data frame received this time.
  • Because the playback time interval between every two audio data frames is fixed, it is only necessary to determine the difference between the timestamp sequence number of the audio data frame received this time and the initial audio timestamp sequence number. Starting from the initial audio expected playback time, the corresponding time offset then gives the expected audio playback time of this audio data frame.
  • Specifically, the difference between the timestamp sequence number of the current audio data frame and the initial audio timestamp sequence number gives the number of intervals, which is multiplied by the playback time interval between two audio data frames to obtain the audio playback time difference from the initial audio expected playback time. Adding this audio playback time difference to the initial audio expected playback time gives the expected audio playback time of the audio data frame received this time.
  • The initial audio timestamp sequence number and the initial audio expected playback time are obtained as follows: when the audio data frame is the first audio data frame received, the timestamp sequence number of the first audio data frame is determined as the initial audio timestamp sequence number, and the expected audio playback time corresponding to the first audio data frame is determined as the initial audio expected playback time.
  • Initialization begins after the first audio data frame is received. The timestamp sequence number of the first audio data frame is used to initialize the initial audio timestamp sequence number, and the expected audio playback time of the first audio data frame is used to initialize the initial audio expected playback time.
  • The audio and video processing method further includes:
  • updating the initial audio timestamp sequence number according to the timestamp sequence number of the audio data frame received this time;
  • updating the initial audio expected playback time according to the first expected audio playback time.
  • Once the timestamp sequence number of an audio data frame jumps, the timestamp sequence numbers of all subsequently received audio data frames are assigned relative to the sequence number after the jump. The initial audio timestamp sequence number and the initial audio expected playback time corresponding to the first audio data frame can therefore no longer be used to calculate subsequent expected audio playback times, and after each jump the previous initial audio timestamp sequence number and initial audio expected playback time cannot be used again.
  • After a jump, the timestamp sequence number of the audio data frame received this time and the first expected audio playback time are determined first, and these two values are then used directly to update the initial audio timestamp sequence number and the initial audio expected playback time. Subsequent expected audio playback times are calculated from the updated values, which ensures the accuracy and smoothness of audio data frame playback on the timeline.
  • The audio and video processing method further includes: when the time of receiving the first audio data frame is earlier than or equal to the time of receiving the first video data frame, setting the expected audio playback time to a preset time value.
  • The initial audio timestamp sequence number can be obtained directly from the first audio data frame, but the initial audio expected playback time cannot be obtained directly.
  • The preset time value can be set directly to 0 seconds.
  • When the audio data frame is the first audio data frame received, the audio and video processing method further includes: when the time of receiving the first audio data frame is later than the time of receiving the first video data frame, determining the time interval between receiving the first audio data frame and receiving the first video data frame as the expected audio playback time.
  • For the constraint relationship among the expected audio playback time, the timestamp sequence number of the audio data frame, the initial audio timestamp sequence number, and the initial audio expected playback time, refer to the constraint relationship among the expected video playback time, the timestamp sequence number of the video data frame, the initial video timestamp sequence number, and the initial video expected playback time.
  • The audio and video processing method also includes:
  • determining the expected audio playback time of a copied audio data frame from the second time interval and the expected audio playback time corresponding to the last received audio data frame, where the second time interval represents the time interval between two received audio data frames.
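The text does not state when a frame is copied. As a sketch, assume copies are inserted wherever a gap longer than one interval appears between the expected playback times of received audio frames; each copy is then placed one second time interval after the frame before it. The function name, the 20 ms interval and the gap-filling policy are assumptions.

```python
from typing import List, Tuple


def fill_gaps(times_ms: List[int], interval_ms: int) -> List[Tuple[int, bool]]:
    """Return (expected_time_ms, is_copy) pairs with timeline gaps filled by copies.

    times_ms must be the increasing expected playback times of received frames.
    """
    out: List[Tuple[int, bool]] = []
    for t in times_ms:
        # Insert copies until the next received frame is one interval away.
        while out and out[-1][0] + interval_ms < t:
            out.append((out[-1][0] + interval_ms, True))
        out.append((t, False))
    return out


# Frames expected at 40 ms and 60 ms were lost, so two copies fill the gap.
print(fill_gaps([0, 20, 80, 100], interval_ms=20))
# [(0, False), (20, False), (40, True), (60, True), (80, False), (100, False)]
```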
  • the audio and video processing method includes the following steps:
  • on receiving the first audio data frame and the first video data frame, the initial expected video playback time and the initial video timestamp sequence number of the video data stream are determined from the first video data frame, and the initial expected audio playback time and the initial audio timestamp sequence number of the audio data stream are determined from the first audio data frame; at the first initialization, the initial video timestamp sequence number and the initial audio timestamp sequence number usually remain consistent, and the time interval between the initial expected video playback time and the initial expected audio playback time is determined based on NTP (Network Time Protocol);
  • the initial video timestamp sequence number, the initial video expected playback time and the timestamp sequence number of the video data frame can be directly used to determine the expected video playback time.
  • when a timestamp sequence number jump occurs, a first time interval is added to the expected video playback time corresponding to the last received video data frame to determine the expected video playback time after the jump; this gives the first expected video playback time, and the initial video timestamp sequence number and the initial expected video playback time are updated accordingly; the audio data frame is processed on the same principle as the video data frame to obtain the expected audio playback time after the jump and to update the initial audio timestamp sequence number and the initial expected audio playback time;
  • the audio and video processing method of this application directly uses the expected video playback time and the expected audio playback time to construct a timeline and determines the corresponding expected playback time for each video data frame and audio data frame, so that each video data frame and audio data frame has a unique position on the timeline and the entire media stream achieves audio and video synchronization simply and clearly.
  • an embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium stores computer-executable instructions.
  • the computer-executable instructions are used to execute the above audio and video processing method; for example, when executed by a processor in the above embodiment of the audio and video processing device, they cause that processor to execute the method of the above embodiments, for example the methods of Figure 1, Figure 2 and Figure 3 described above.
  • the audio and video processing device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • the processor executes the computer program.
  • the non-transitory software programs and instructions required to implement the audio and video processing methods of the above embodiments are stored in the memory.
  • the audio and video processing methods in the above embodiments are executed.
  • the above described Figure 1 is executed.
  • Embodiments of the present application include: acquiring a video data frame and an audio data frame; determining a video sequence number jump value based on the timestamp sequence number of the video data frame received this time and the timestamp sequence numbers of the two previously received video data frames; determining an expected video playback time based on the video sequence number jump value, where the expected video playback time represents the playback time corresponding to the video data frame; determining an audio sequence number jump value based on the timestamp sequence number of the audio data frame received this time and the timestamp sequence numbers of the two previously received audio data frames; determining an expected audio playback time based on the audio sequence number jump value, where the expected audio playback time represents the playback time corresponding to the audio data frame and is consistent with the expected video playback time; and synchronizing the audio data frame and the video data frame according to the expected audio playback time and the expected video playback time.
  • when a video data frame is received, its timestamp sequence number is used to determine the video sequence number jump value, which reveals the continuity of the video and whether a jump has occurred; the jump value, compared against the video sequence number jump threshold, then serves as the basis for determining the expected video playback time of the video data frame received this time, which avoids long runs of repeated or dropped frames during playback and allows video data frames to be played accurately at their expected video playback times; similarly, when an audio data frame is received, its timestamp sequence number is used to determine the audio sequence number jump value, which reveals the continuity of the audio and whether a jump has occurred, and the jump value, compared against the audio sequence number jump threshold, serves as the basis for determining the expected audio playback time of the audio data frame received this time, which avoids long runs of repeated or dropped frames and allows audio data frames to be played accurately at their expected audio playback times. Finally, because the expected audio playback time and the expected video playback time are consistent, the received audio data frames and video data frames correspond accurately in time, achieving synchronization of audio data and video data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, tapes, disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Provided in the present application are an audio and video processing method and apparatus, and a storage medium. The audio and video processing method comprises: acquiring a video data frame and an audio data frame (S100); determining a video serial number jump value according to a timestamp serial number of the currently received video data frame and timestamp serial numbers of two previously received video data frames (S200); then, determining an expected video playback time by using the video serial number jump value (S300); on the basis of the same principle, determining an audio serial number jump value (S400), and determining an expected audio playback time (S500); and finally, completing audio and video synchronization by using the correlation between the expected audio playback time and the expected video playback time (S600).

Description

Audio and video processing method and apparatus, and storage medium
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202210631499.0, filed on June 6, 2022, the entire content of which is incorporated herein by reference.
Technical field
Embodiments of the present application relate to, but are not limited to, the field of communication technology, and in particular to an audio and video processing method and apparatus, and a storage medium.
Background
Many audio and video synchronization techniques exist, but they generally synchronize the video stream and the audio stream by dropping or repeating frames. In practice, the following scenario arises: a camera on the other side of the network restarts or experiences network jitter, so its RTP (Real-time Transport Protocol) timestamps change abruptly at some moment, while the picture or sound remains continuous in time with what preceded the jump. If the existing methods are still used for audio and video synchronization in this scenario, frames are repeated or dropped for a long time, which seriously degrades the viewing experience at the user end.
Summary
Embodiments of the present application provide an audio and video processing method and apparatus, and a storage medium.
In a first aspect, an embodiment of the present application provides an audio and video processing method, comprising: acquiring a video data frame and an audio data frame; determining a video sequence number jump value according to the timestamp sequence number of the video data frame received this time and the timestamp sequence numbers of the two previously received video data frames; determining an expected video playback time according to the video sequence number jump value, the expected video playback time representing the playback time corresponding to the video data frame; determining an audio sequence number jump value according to the timestamp sequence number of the audio data frame received this time and the timestamp sequence numbers of the two previously received audio data frames; determining an expected audio playback time according to the audio sequence number jump value, the expected audio playback time representing the playback time corresponding to the audio data frame, wherein the expected audio playback time is consistent with the expected video playback time; and synchronizing the audio data frame and the video data frame according to the expected audio playback time and the expected video playback time.
In a second aspect, an embodiment of the present application further provides an audio and video processing apparatus, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above audio and video processing method when executing the computer program.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the above audio and video processing method.
Description of the drawings
Figure 1 is a flow chart of an audio and video processing method provided by an embodiment of the present application;
Figure 2 is a flow chart for determining the video sequence number jump value provided by an embodiment of the present application;
Figure 3 is a flow chart for determining the audio sequence number jump value provided by an embodiment of the present application.
Detailed description
To make the purpose, technical solutions and advantages of the present application clearer, the present application is described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
It should be noted that although a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in an order different from that in the flow charts. The terms "first", "second", etc. in the description, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
Embodiments of the present application provide an audio and video processing method and apparatus, and a storage medium. When a video data frame is received, its timestamp sequence number is used to determine the video sequence number jump value, which reveals the continuity of the video and whether a jump has occurred. The jump value, compared against the video sequence number jump threshold, then serves as the basis for determining the expected video playback time of the video data frame received this time. This avoids long runs of repeated or dropped frames during playback and allows each video data frame to be played accurately at its expected video playback time. Similarly, when an audio data frame is received, its timestamp sequence number is used to determine the audio sequence number jump value, which reveals the continuity of the audio and whether a jump has occurred. The jump value, compared against the audio sequence number jump threshold, then serves as the basis for determining the expected audio playback time of the audio data frame received this time, which avoids long runs of repeated or dropped frames and allows each audio data frame to be played accurately at its expected audio playback time. Finally, because the expected audio playback time and the expected video playback time are consistent, the received audio data frames and video data frames correspond accurately in time, achieving synchronization of audio data and video data.
As shown in Figure 1, Figure 1 is a flow chart of an audio and video processing method provided by an embodiment of the present application.
As shown in Figure 1, the audio and video processing method includes step S100, step S200, step S300, step S400, step S500 and step S600:
Step S100: acquire a video data frame and an audio data frame;
Step S200: determine the video sequence number jump value according to the timestamp sequence number of the video data frame received this time and the timestamp sequence numbers of the two previously received video data frames;
Step S300: determine the expected video playback time according to the video sequence number jump value, where the expected video playback time represents the playback time corresponding to the video data frame;
Step S400: determine the audio sequence number jump value according to the timestamp sequence number of the audio data frame received this time and the timestamp sequence numbers of the two previously received audio data frames;
Step S500: determine the expected audio playback time according to the audio sequence number jump value, where the expected audio playback time represents the playback time corresponding to the audio data frame and is consistent with the expected video playback time;
Step S600: synchronize the audio data frame and the video data frame according to the expected audio playback time and the expected video playback time.
In this embodiment, when a video data frame or an audio data frame is received, the timestamp sequence number it carries can be obtained. The video (or audio) sequence number jump value can therefore be determined directly from the timestamp sequence number of the frame received this time and those of the two previously received frames of the same stream, which shows whether a timestamp sequence number jump has occurred. The expected video playback time or expected audio playback time is then determined from the corresponding jump value, so that a large change in the timestamp sequence number does not force frames to be dropped or filled in for a long time. Moreover, because the expected video playback time and the expected audio playback time are consistent in time, audio and video stay synchronized as long as each received video data frame is matched to its expected video playback time and each received audio data frame to its expected audio playback time.
In some embodiments, in streaming scenarios such as real-time conferencing or live broadcast, each received video data frame carries a timestamp sequence number. Because the sender transmits video data frames continuously, the timestamp sequence numbers should normally be continuous; a discontinuity indicates a problem in video stream transmission. A timestamp sequence number jump produces a large change in the sequence number, so the timestamp sequence numbers of three consecutively received video data frames are used to calculate the video sequence number jump value. Analyzing this value shows whether a jump has occurred, and the value is then used to determine the expected video playback time in the jump state, so that the video data frame is played on time.
Similarly, each audio data frame received by the audio and video processing end carries a timestamp sequence number. Because the sender transmits audio data frames continuously, the timestamp sequence numbers of the audio data frames should normally be continuous as well; a discontinuity indicates a problem in audio stream transmission. A timestamp sequence number jump produces a large change in the sequence number, so the timestamp sequence numbers of three consecutively received audio data frames are used to calculate the audio sequence number jump value. Analyzing this value shows whether a jump has occurred, and the value is then used to determine the expected audio playback time in the jump state, so that the audio data frame is played on time.
To better explain the synchronization principle of the audio and video processing method of this application, a brief description follows. Each received video data frame corresponds to an expected video playback time, and after processing it is played exactly at that time. This is equivalent to placing the video data frames at different moments on a timeline, so that playback only needs to follow the arranged schedule. Audio data frames can likewise be placed at different moments on a timeline; as long as the expected audio playback time corresponds to the expected video playback time, the two streams correspond on the timeline. Therefore, once the expected audio playback time of the first audio data frame and the expected video playback time of the first video data frame have been determined, each subsequent audio or video data frame only needs its expected playback time set according to the method of this application.
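The timeline idea can be illustrated with a minimal sketch. It assumes each frame has already been assigned an expected playback time and that each stream is sorted by that time; the `TimedFrame` type and `merge_on_timeline` function are hypothetical names introduced only for illustration, not part of the application.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class TimedFrame(NamedTuple):
    expected_play_time: float  # seconds on the shared timeline
    kind: str                  # "video" or "audio"
    ts_seq: int                # timestamp sequence number carried by the frame


def merge_on_timeline(video: Iterable[TimedFrame],
                      audio: Iterable[TimedFrame]) -> Iterator[TimedFrame]:
    """Yield frames of both streams in order of expected playback time.

    Both inputs must already be sorted by expected_play_time.
    """
    return heapq.merge(video, audio, key=lambda f: f.expected_play_time)
```

Because both streams share one timeline, a player only has to walk the merged sequence and present each frame at its own expected playback time.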
As shown in Figure 2, which is a flow chart for determining the video sequence number jump value provided by an embodiment of the present application and illustrates step S200, step S200 includes, but is not limited to, step S210, step S220 and step S230:
Step S210: determine the first sequence number difference according to the timestamp sequence numbers of the two previously received video data frames;
Step S220: determine the second sequence number difference according to the timestamp sequence number of the video data frame received this time and the timestamp sequence number of the last received video data frame;
Step S230: calculate the video sequence number jump value from the second sequence number difference and the first sequence number difference.
The two previous timestamp sequence numbers are introduced here to help judge the degree of the timestamp sequence number jump. Using the timestamp sequence numbers of three received video data frames to determine the video sequence number jump value, that is, the ratio of the first sequence number difference and the second sequence number difference, directly and effectively reveals the degree of the jump and makes it easy to judge afterwards whether a timestamp sequence number jump has occurred.
In some embodiments, taking the timestamp sequence numbers of three consecutive video data frames as x_{k-1}, x_k and x_{k+1}, the video sequence number jump value u is calculated from these three values by a constraint formula; the formula itself is missing from this text.
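One form consistent with steps S210 to S230 and with the ratio described above would be the following. This is an assumed reconstruction, not the application's own formula: the orientation of the ratio (second difference over first, so that u grows when a jump occurs) and the absolute value are both assumptions.

```latex
d_1 = x_k - x_{k-1}, \qquad
d_2 = x_{k+1} - x_k, \qquad
u = \left| \frac{d_2}{d_1} \right|
```

Under this reading, evenly spaced frames give u = 1, a few lost frames give a small integer, and a timestamp jump gives a value that is larger by orders of magnitude.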
In some embodiments, step S300 includes, but is not limited to, the following step: when the video sequence number jump value is greater than a preset video sequence number jump threshold, a first time interval is added to the expected video playback time corresponding to the last received video data frame to determine the expected video playback time, giving the first expected video playback time, where the first time interval represents the time interval between two successive receptions of video data frames.
Because a jump changes the timestamp sequence number by a large amount, the video sequence number jump threshold must also be set relatively large, so that a jump can be distinguished from the brief gap in timestamp sequence numbers caused by packet loss. After the video sequence number jump value of the video data frame received this time is determined, it is compared with the threshold. If the jump value is greater than the video sequence number jump threshold, a timestamp sequence number jump has occurred; in that case, determining the expected video playback time directly from the timestamp sequence number would easily make the expected video playback time discontinuous. In some embodiments, since video data frames are still sent normally when the timestamp sequence number jumps, the expected playback time of the video data frame received this time can be determined from the first time interval. Once a jump is confirmed, it is therefore enough to add the first time interval to the expected video playback time corresponding to the last received video data frame to obtain the expected video playback time of the video data frame received this time. This value is also recorded as the first expected video playback time and is used later to update the initial video timestamp sequence number and the initial expected video playback time.
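The jump branch can be sketched as below. This is a minimal illustration under stated assumptions: the ratio form of the jump value (second difference over first, absolute value) and the zero-difference guard are assumptions, and the function names are hypothetical.

```python
import math


def jump_value(x_prev2: int, x_prev1: int, x_cur: int) -> float:
    """Jump value from three consecutive timestamp sequence numbers."""
    first_diff = x_prev1 - x_prev2   # step S210
    second_diff = x_cur - x_prev1    # step S220
    if first_diff == 0:
        return math.inf              # assumed guard; not specified in the text
    return abs(second_diff / first_diff)  # step S230 (assumed ratio form)


def expected_time_on_jump(prev_expected: float, first_interval: float) -> float:
    """On a jump, continue the timeline one frame interval after the last frame."""
    return prev_expected + first_interval
```

With a 90 kHz video clock at 25 frames per second (3600 ticks per frame, used here only as an example), `jump_value(3600, 7200, 10800)` is 1.0, two lost frames give 3.0, and a restart that moves the timestamp to 5,000,000 gives a value above 1000, which is why the threshold can be set large enough to ignore packet loss.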
In some embodiments, the audio and video processing method further includes: when the video sequence number jump value is less than the video sequence number jump threshold, determining the expected video playback time according to the initial video timestamp sequence number, the initial expected video playback time and the timestamp sequence number of the video data frame, giving the second expected video playback time, where the initial video timestamp sequence number is obtained from the timestamp sequence number of a received video data frame and the initial expected video playback time is obtained from the expected video playback time of that video data frame.
When no timestamp sequence number jump occurs, the timestamp sequence number does not change much. During normal playback the timestamp sequence numbers are continuous, and when network fluctuation causes packet loss and video data frames are missing, only a few timestamp sequence numbers are skipped. In these cases the timestamp sequence number of the video data frame received this time can be used directly to determine the expected video playback time. In some embodiments, if the video sequence number jump value of the video data frame received this time is less than the preset video sequence number jump threshold, the frame belongs to normal playback or to a packet-loss situation and no jump has occurred; the expected video playback time of this frame can then be determined quickly from the initial video timestamp sequence number, the initial expected video playback time and the timestamp sequence number of the frame. In some embodiments, the playback interval between every two video data frames is fixed, so it is enough to determine the difference between the timestamp sequence number of the video data frame received this time and the initial video timestamp sequence number, and then use this interval, starting from the initial expected video playback time, to determine the expected playback time of the frame.
In some embodiments, the initial video timestamp sequence number is subtracted from the timestamp sequence number of the current video data frame to obtain the number of timestamp sequence numbers between them. This number is multiplied by the playback interval between two successive video data frames to obtain the video playback time difference relative to the initial expected video playback time. Adding this difference to the initial expected video playback time gives the expected video playback time of the video data frame received this time.
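The no-jump calculation is a single linear mapping, sketched below. The function and parameter names are hypothetical; `interval_per_step` is assumed to be the playback interval that corresponds to one unit of timestamp sequence number.

```python
def expected_time_no_jump(initial_ts_seq: int,
                          initial_expected: float,
                          ts_seq: int,
                          interval_per_step: float) -> float:
    """Expected playback time when no timestamp sequence number jump occurred."""
    steps = ts_seq - initial_ts_seq             # sequence numbers elapsed since the initial frame
    play_time_diff = steps * interval_per_step  # video playback time difference
    return initial_expected + play_time_diff
```

For example, with an initial timestamp sequence number of 1000, an initial expected playback time of 0 s and 0.04 s per step, a frame carrying sequence number 1003 is expected at about 0.12 s. A frame that arrives after packet loss still lands at the right moment, because the mapping depends only on its own sequence number.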
In some embodiments, the initial video timestamp sequence number and the initial expected video playback time are obtained as follows: when the video data frame is the first received video data frame, the timestamp sequence number of the first video data frame is determined as the initial video timestamp sequence number, and the expected video playback time corresponding to the first video data frame is determined as the initial expected video playback time.
Initialization starts once the first video data frame is received. The timestamp sequence number of the first video data frame is assigned to the initial video timestamp sequence number, and the expected video playback time of the first video data frame is assigned to the initial expected video playback time.
在一些实施例中,音视频处理方法还包括:In some embodiments, the audio and video processing method further includes:
当视频序号跳变值大于视频序号跳变阈值,根据本次接收的视频数据帧的时间戳序号更新初始视频时间戳序号;When the video sequence number jump value is greater than the video sequence number jump threshold, the initial video timestamp sequence number is updated according to the timestamp sequence number of the video data frame received this time;
根据第一视频期望播放时间更新初始视频期望播放时间。The initial video expected play time is updated according to the first video expected play time.
After a timestamp sequence number jump occurs in the video data frames, the timestamp sequence numbers of all subsequently received video data frames are assigned on the basis of the post-jump timestamp sequence number. The initial video timestamp sequence number and initial expected video playback time derived from the first video data frame can therefore no longer be used to calculate subsequent expected video playback times. Likewise, after each jump, the initial video timestamp sequence number and initial expected video playback time from before that jump can no longer be used. In some embodiments, when a timestamp sequence number jump occurs, the timestamp sequence number of the video data frame received this time and the first expected video playback time are determined first, and these two values are then used directly to update the initial video timestamp sequence number and the initial expected video playback time. Subsequent expected video playback times are calculated from the updated values, which keeps the playback of the video data frames on the timeline accurate and smooth.
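The update performed on a jump can be sketched as follows. The `Timeline` class, its field names, and the millisecond unit are hypothetical and introduced only for illustration; the sketch assumes that the "first expected video playback time" equals the previous frame's expected playback time plus one frame interval, as the text describes for the jump case.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    base_seq: int       # initial timestamp sequence number
    base_time_ms: int   # initial expected playback time
    last_time_ms: int   # expected playback time of the previously received frame


def rebase_on_jump(tl: Timeline, seq: int, frame_interval_ms: int) -> int:
    """Handle a frame whose timestamp sequence number has jumped.

    Returns the first expected playback time and replaces the stored
    initial values so that later frames are computed from the new base.
    """
    first_expected = tl.last_time_ms + frame_interval_ms
    tl.base_seq = seq                  # new initial timestamp sequence number
    tl.base_time_ms = first_expected   # new initial expected playback time
    tl.last_time_ms = first_expected
    return first_expected


tl = Timeline(base_seq=100, base_time_ms=0, last_time_ms=400)
jump_time = rebase_on_jump(tl, seq=90000, frame_interval_ms=40)
# The next frame (90001) is computed from the updated base.
next_time = tl.base_time_ms + (90001 - tl.base_seq) * 40
print(jump_time, next_time)
```

Because the base is replaced rather than adjusted, the large gap in sequence numbers never enters the time calculation, so the timeline stays continuous across the jump.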
In some embodiments, when the video data frame is the first video data frame received, the audio and video processing method further includes: when the first video data frame is received earlier than or at the same time as the first audio data frame, setting the expected video playback time to a preset time value.
When the initial video timestamp sequence number and the initial expected video playback time are initialized, the initial video timestamp sequence number can be read directly from the first video data frame, but the initial expected video playback time cannot be obtained directly. In this case, a preset time value can simply be defined as the starting time, that is, as the initial expected video playback time. In some embodiments, the preset time value can be set to 0 seconds.
In some embodiments, when the video data frame is the first video data frame received, the audio and video processing method further includes: when the first video data frame is received later than the first audio data frame, taking the time interval between receiving the first video data frame and receiving the first audio data frame as the expected video playback time.
In practice, for various reasons, there may be a time interval between the first video data frame and the first audio data frame when they are first sent. To preserve the correspondence between audio data frames and video data frames in this case, the two streams must keep that fixed time interval from the start. Therefore, when the video data frame is the first video frame and arrives later than the first audio data frame, its expected video playback time cannot simply be set to 0 seconds; it must be delayed by that time interval.
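The two initialization cases can be combined in one small function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, arrival times are taken from a local clock in milliseconds, and `None` is used to mean that the other stream has not delivered its first frame yet.

```python
from typing import Optional


def initial_expected_time_ms(own_arrival_ms: int,
                             other_arrival_ms: Optional[int],
                             preset_ms: int = 0) -> int:
    """Expected playback time for the first frame of one stream.

    own_arrival_ms   -- arrival time of this stream's first frame
    other_arrival_ms -- arrival time of the other stream's first frame,
                        or None if it has not arrived yet (assumption)
    preset_ms        -- preset start value; 0 in the embodiment described
    """
    if other_arrival_ms is None or own_arrival_ms <= other_arrival_ms:
        # This stream started first (or at the same time): use the preset value.
        return preset_ms
    # This stream started later: keep the arrival gap as a fixed offset.
    return own_arrival_ms - other_arrival_ms


print(initial_expected_time_ms(1000, 1000))   # simultaneous start
print(initial_expected_time_ms(1120, 1000))   # this stream is 120 ms late
```

The same function serves both streams: call it with the video arrival time first for video, and with the audio arrival time first for audio.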
To better explain the constraint relationship among the expected video playback time, the timestamp sequence number, the initial video timestamp sequence number and the initial expected video playback time, refer to constraint formulas (1) to (3), in which x_start is the initial video timestamp sequence number, x_k is the timestamp sequence number of the k-th video data frame, t_x is the time interval between the playback of two consecutive video data frames, abs(x_start) is the initial expected video playback time, 0 denotes 0 seconds, and Δ is the time interval between the first video data frame and the first audio data frame.
When the video data frame is the first frame, formula (1) is used to calculate the expected video playback time, which also determines the initial video timestamp sequence number and the initial expected video playback time. When no jump occurs in the transmission of video data frames, formula (2) is used to determine the expected video playback time. When a jump occurs, formula (3) is used to determine the expected video playback time, and the initial video timestamp sequence number and the initial expected video playback time are updated at the same time.
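The constraint formula itself does not survive in this text. A reconstruction consistent with the symbol definitions and with the three cases described above might read as follows. It is an assumption inferred from the surrounding description, not the original formula; in particular, abs(x_{k-1}), the expected playback time of the previously received frame, and the assignment arrows are notation introduced here.

```latex
% (1) first video data frame
\mathrm{abs}(x_{start}) =
\begin{cases}
0, & \text{first video frame received no later than the first audio frame} \\
\Delta, & \text{first video frame received later than the first audio frame}
\end{cases}
\tag{1}

% (2) no timestamp sequence number jump
\mathrm{abs}(x_k) = \mathrm{abs}(x_{start}) + (x_k - x_{start})\, t_x
\tag{2}

% (3) timestamp sequence number jump, followed by the update of the initial values
\mathrm{abs}(x_k) = \mathrm{abs}(x_{k-1}) + t_x, \qquad
x_{start} \leftarrow x_k, \qquad
\mathrm{abs}(x_{start}) \leftarrow \mathrm{abs}(x_k)
\tag{3}
```

As a check of case (2) with illustrative numbers: for x_start = 100, abs(x_start) = 0 s, t_x = 0.04 s and x_k = 105, the expected playback time is 0 + 5 × 0.04 = 0.2 s.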
In some embodiments, the audio and video processing method further includes:
when no video data frame is received within a preset video frame-filling time threshold, copying the last received video data frame;
determining the expected video playback time of the copied video data frame according to a first time interval and the expected video playback time corresponding to the last received video data frame, where the first time interval represents the time interval between two successive receptions of video data frames.
If packet loss occurs during video data transmission, there may be no video data frames for a continuous period of time. Filling frames only after the video sequence number jump value has been determined would introduce a delay. In this situation, the video frame-filling time threshold can be used directly to copy the previous frame and compensate for the missing video data frame. Each time a further video frame-filling time threshold elapses without a video data frame being received, the previous frame is copied again. Once a video data frame is received normally, that frame is used to determine the expected video playback time.
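The frame-filling rule can be sketched as follows. The `Frame` class, the function name and the millisecond clock are hypothetical. The sketch assumes that one copy is produced for each full threshold period that has elapsed since the last real frame arrived, and that each copy is placed one frame interval after the frame it duplicates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    payload: bytes
    play_time_ms: int   # expected playback time


def fill_missing_frames(last: Frame, last_arrival_ms: int, now_ms: int,
                        fill_threshold_ms: int,
                        frame_interval_ms: int) -> List[Frame]:
    """Return copies of `last` for each threshold period without a new frame."""
    missed = max((now_ms - last_arrival_ms) // fill_threshold_ms, 0)
    filled: List[Frame] = []
    previous = last
    for _ in range(missed):
        # Each copy reuses the payload and advances the expected playback time.
        duplicate = Frame(previous.payload,
                          previous.play_time_ms + frame_interval_ms)
        filled.append(duplicate)
        previous = duplicate
    return filled


last = Frame(payload=b"frame-10", play_time_ms=400)
copies = fill_missing_frames(last, last_arrival_ms=1000, now_ms=1130,
                             fill_threshold_ms=60, frame_interval_ms=40)
print([c.play_time_ms for c in copies])
```

In a real receiver this would more likely be driven by a timer that fires once per threshold period; the batch form above is only easier to test.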
Figure 3 is a flowchart for determining the audio sequence number jump value provided by an embodiment of the present application, and illustrates step S400. Step S400 includes, but is not limited to, steps S410, S420 and S430:
Step S410: determining a third sequence number difference according to the timestamp sequence numbers of the two previously received audio data frames;
Step S420: determining a fourth sequence number difference according to the timestamp sequence number of the audio data frame received this time and the timestamp sequence number of the audio data frame received last time;
Step S430: calculating the audio sequence number jump value from the third sequence number difference and the fourth sequence number difference.
The timestamp sequence numbers of the two previous frames are introduced here to help judge how far the timestamp sequence number has jumped. The audio sequence number jump value, that is, the ratio between the third sequence number difference and the fourth sequence number difference, is determined from the timestamp sequence numbers of three received audio data frames. It directly and effectively indicates the degree of the jump, which makes it easier to determine afterwards whether a timestamp sequence number jump has occurred. The constraint formula for calculating the audio sequence number jump value can follow the constraint formula for calculating the video sequence number jump value.
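Steps S410 to S430 can be sketched as follows. The text only says that the jump value is a ratio of the two differences; the direction of the ratio (fourth difference divided by third), the use of the absolute value, and the handling of a zero third difference are assumptions made here so that a larger value means a larger jump. The function name is hypothetical.

```python
def sequence_jump_value(prev2_seq: int, prev1_seq: int, current_seq: int) -> float:
    """Jump value computed from three consecutive timestamp sequence numbers."""
    third_diff = prev1_seq - prev2_seq      # S410: between the two previous frames
    fourth_diff = current_seq - prev1_seq   # S420: between current and last frame
    if third_diff == 0:
        # Assumption: treat any change after a repeated number as an unbounded jump.
        return float("inf") if fourth_diff != 0 else 0.0
    return abs(fourth_diff / third_diff)    # S430: ratio of the two differences


print(sequence_jump_value(100, 101, 102))     # consecutive frames
print(sequence_jump_value(100, 101, 104))     # two frames lost
print(sequence_jump_value(100, 101, 90001))   # timestamp sequence number jump
```

With a hypothetical threshold such as 50, the first two cases count as normal playback or packet loss, and only the third is treated as a jump.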
In some embodiments, step S600 is described as follows. Step S600 includes, but is not limited to: when the audio sequence number jump value is greater than a preset audio sequence number jump threshold, adding a second time interval to the expected audio playback time corresponding to the last received audio data frame to determine the expected audio playback time, thereby obtaining a first expected audio playback time, where the second time interval represents the time interval between two successive receptions of audio data frames.
Because the timestamp sequence number changes by a large amount when a jump occurs, the audio sequence number jump threshold must also be set to a large value, so that a jump can be distinguished from the short-term absence of timestamp sequence numbers caused by packet loss. After the audio sequence number jump value of the audio data frame received this time has been determined, it is compared with the audio sequence number jump threshold. When the jump value is greater than the threshold, a timestamp sequence number jump has occurred. If the timestamp sequence number were then used directly to determine the expected audio playback time, the expected audio playback time would tend to become discontinuous. In some embodiments, since audio data frames continue to be sent normally when a timestamp sequence number jump occurs, the expected audio playback time of the audio data frame received this time can be determined from the second time interval. Therefore, once a jump has been detected, it is only necessary to add the second time interval to the expected audio playback time of the last received audio data frame to obtain the expected audio playback time of the audio data frame received this time. This value is also recorded as the first expected audio playback time and is later used to update the initial audio timestamp sequence number and the initial expected audio playback time.
In some embodiments, the audio and video processing method further includes: when the audio sequence number jump value is less than the audio sequence number jump threshold, determining the expected audio playback time according to the initial audio timestamp sequence number, the initial expected audio playback time and the timestamp sequence number of the audio data frame, thereby obtaining a second expected audio playback time, where the initial audio timestamp sequence number is obtained from the timestamp sequence number of a received audio data frame, and the initial expected audio playback time is obtained from the expected audio playback time of that audio data frame.
When no timestamp sequence number jump occurs, the timestamp sequence number does not change much. During normal playback the timestamp sequence numbers are consecutive, and when network fluctuation causes packet loss and audio data frames go missing, only a few timestamp sequence numbers are skipped. In these cases, the timestamp sequence number of the audio data frame received this time can be used directly to determine the expected audio playback time. In some embodiments, if the audio sequence number jump value of the audio data frame received this time is less than the preset audio sequence number jump threshold, the frame belongs to normal playback or to a packet loss situation, and no jump has occurred. The expected audio playback time of this frame can then be determined quickly from the initial audio timestamp sequence number, the initial expected audio playback time and the timestamp sequence number of the audio data frame received this time. In some embodiments, the playback time interval between two consecutive audio data frames is fixed, so it is only necessary to determine the difference between the timestamp sequence number of the audio data frame received this time and the initial audio timestamp sequence number; this difference and the fixed interval then give the expected audio playback time of the current audio data frame relative to the initial expected audio playback time.
In some embodiments, the initial audio timestamp sequence number is subtracted from the timestamp sequence number of the current audio data frame to obtain the number of timestamp sequence numbers between them. This number is multiplied by the playback time interval between two consecutive audio data frames to obtain the audio playback time difference relative to the initial expected audio playback time. Finally, adding this difference to the initial expected audio playback time gives the expected audio playback time of the audio data frame received this time.
In some embodiments, the initial audio timestamp sequence number and the initial expected audio playback time are obtained as follows: when the audio data frame is the first audio data frame received, the timestamp sequence number of the first audio data frame is taken as the initial audio timestamp sequence number, and the expected audio playback time corresponding to the first audio data frame is taken as the initial expected audio playback time.
Initialization begins once the first audio data frame is received. At this point, the timestamp sequence number of the first audio data frame is used to initialize the initial audio timestamp sequence number, and the expected audio playback time of the first audio data frame is used to initialize the initial expected audio playback time.
In some embodiments, the audio and video processing method further includes:
when the audio sequence number jump value is greater than the audio sequence number jump threshold, updating the initial audio timestamp sequence number according to the timestamp sequence number of the audio data frame received this time;
updating the initial expected audio playback time according to the first expected audio playback time.
After a timestamp sequence number jump occurs in the audio data frames, the timestamp sequence numbers of all subsequently received audio data frames are assigned on the basis of the post-jump timestamp sequence number. The initial audio timestamp sequence number and initial expected audio playback time derived from the first audio data frame can therefore no longer be used to calculate subsequent expected audio playback times. Likewise, after each jump, the initial audio timestamp sequence number and initial expected audio playback time from before that jump can no longer be used. In some embodiments, when a timestamp sequence number jump occurs, the timestamp sequence number of the audio data frame received this time and the first expected audio playback time are determined first, and these two values are then used directly to update the initial audio timestamp sequence number and the initial expected audio playback time. Subsequent expected audio playback times are calculated from the updated values, which keeps the playback of the audio data frames on the timeline accurate and smooth.
In some embodiments, when the audio data frame is the first audio data frame received, the audio and video processing method further includes: when the first audio data frame is received earlier than or at the same time as the first video data frame, setting the expected audio playback time to a preset time value.
When the initial audio timestamp sequence number and the initial expected audio playback time are initialized, the initial audio timestamp sequence number can be read directly from the first audio data frame, but the initial expected audio playback time cannot be obtained directly. In this case, a preset time value can simply be defined as the starting time, that is, as the initial expected audio playback time. In some embodiments, the preset time value can be set to 0 seconds.
In some embodiments, when the audio data frame is the first audio data frame received, the audio and video processing method further includes: when the first audio data frame is received later than the first video data frame, taking the time interval between receiving the first audio data frame and receiving the first video data frame as the expected audio playback time.
In practice, for various reasons, there may be a time interval between the first video data frame and the first audio data frame when they are first sent. To preserve the correspondence between audio data frames and video data frames in this case, the two streams must keep that fixed time interval from the start. Therefore, when the audio data frame is the first audio frame and arrives later than the first video data frame, its expected audio playback time cannot simply be set to 0 seconds; it must be delayed by that time interval.
In some embodiments, the constraint relationship among the expected audio playback time, the timestamp sequence number of the audio data frame, the initial audio timestamp sequence number and the initial expected audio playback time can follow the constraint relationship among the expected video playback time, the timestamp sequence number of the video data frame, the initial video timestamp sequence number and the initial expected video playback time.
In some embodiments, the audio and video processing method further includes:
when no audio data frame is received within a preset audio frame-filling time threshold, copying the last received audio data frame;
determining the expected audio playback time of the copied audio data frame according to the second time interval and the expected audio playback time corresponding to the last received audio data frame, where the second time interval represents the time interval between two successive receptions of audio data frames.
If packet loss occurs during audio data transmission, there may be no audio data frames for a continuous period of time. Filling frames only after the audio sequence number jump value has been determined would introduce a delay. In this situation, the audio frame-filling time threshold can be used directly to copy the previous frame and compensate for the missing audio data frame. Each time a further audio frame-filling time threshold elapses without an audio data frame being received, the previous frame is copied again. Once an audio data frame is received normally, that frame is used to determine the expected audio playback time.
To explain the processing flow of the audio and video processing method provided by the embodiments of the present application more clearly, an example is described below.
The audio and video processing method includes the following steps:
Obtain the first audio data frame and the first video data frame. Determine the initial expected video playback time and the initial video timestamp sequence number of the video data stream from the first video data frame, and determine the initial expected audio playback time and the initial audio timestamp sequence number of the audio data stream from the first audio data frame. At the first initialization, the initial video timestamp sequence number and the initial audio timestamp sequence number are usually the same, while the time interval between the initial expected video playback time and the initial expected audio playback time is determined according to NTP (Network Time Protocol);
Continuously receive audio data frames and video data frames, and keep a record of the timestamp sequence numbers of the two most recently received audio data frames and of the two most recently received video data frames. After the current video or audio data frame is received, the corresponding two recorded timestamp sequence numbers and the current timestamp sequence number are used to determine the video sequence number jump value or the audio sequence number jump value;
Compare the video sequence number jump value with the preset video sequence number jump threshold, or the audio sequence number jump value with the preset audio sequence number jump threshold, to determine whether a timestamp sequence number jump has occurred;
When it is determined that no timestamp sequence number jump has occurred in the video or audio data frame, determine the expected video playback time directly from the initial video timestamp sequence number, the initial expected video playback time and the timestamp sequence number of the video data frame, and determine the expected audio playback time from the initial audio timestamp sequence number, the initial expected audio playback time and the timestamp sequence number of the audio data frame;
When it is determined that a timestamp sequence number jump has occurred in the video data frame, add the first time interval to the expected video playback time corresponding to the last received video data frame to determine the expected video playback time after the jump. This value is the first expected video playback time, and it is used to update the initial expected video playback time, while the initial video timestamp sequence number is updated according to the timestamp sequence number of the video data frame received this time. Audio data frames are processed on the same principle as video data frames to obtain the expected audio playback time after the jump and to update the initial audio timestamp sequence number and the initial expected audio playback time;
In addition, when network packet loss is detected, the previous video or audio data frame can be copied directly, and the expected video playback time or expected audio playback time of the copy is then set in the usual way, which keeps video and audio playback complete and smooth.
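The steps above can be tied together in one small per-stream object. This is a sketch under several assumptions rather than the claimed implementation: the class and attribute names are hypothetical, times are integer milliseconds, the jump value is taken as the absolute ratio of the current sequence number difference to the previous one, a frame arriving before two earlier sequence numbers are recorded is never treated as a jump, and frame filling is left out for brevity.

```python
from typing import List, Optional


class StreamTimeline:
    """Assigns an expected playback time to each frame of one stream."""

    def __init__(self, frame_interval_ms: int, jump_threshold: float) -> None:
        self.frame_interval_ms = frame_interval_ms
        self.jump_threshold = jump_threshold
        self.base_seq: Optional[int] = None   # initial timestamp sequence number
        self.base_time_ms = 0                 # initial expected playback time
        self.last_time_ms = 0                 # expected time of the previous frame
        self.history: List[int] = []          # two most recent sequence numbers

    def _is_jump(self, seq: int) -> bool:
        if len(self.history) < 2:
            return False                      # assumption: not enough history yet
        previous_diff = self.history[1] - self.history[0]
        current_diff = seq - self.history[1]
        if previous_diff == 0:
            return current_diff != 0
        return abs(current_diff / previous_diff) > self.jump_threshold

    def on_frame(self, seq: int, start_offset_ms: int = 0) -> int:
        if self.base_seq is None:
            # First frame: initialise the base values (0 or the arrival gap).
            play = start_offset_ms
            self.base_seq, self.base_time_ms = seq, play
        elif self._is_jump(seq):
            # Jump: continue from the previous frame and rebase.
            play = self.last_time_ms + self.frame_interval_ms
            self.base_seq, self.base_time_ms = seq, play
        else:
            # Normal playback or packet loss: compute from the base values.
            play = self.base_time_ms + (seq - self.base_seq) * self.frame_interval_ms
        self.history = (self.history + [seq])[-2:]
        self.last_time_ms = play
        return play


video = StreamTimeline(frame_interval_ms=40, jump_threshold=50)
print([video.on_frame(s) for s in (100, 101, 102, 90000, 90001)])
```

An audio stream would use a second instance with its own frame interval; if its first frame arrives 120 ms after the first video frame, `on_frame(seq, start_offset_ms=120)` places it 120 ms into the shared timeline.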
The audio and video processing method of the present application builds a timeline directly from the expected video playback time and the expected audio playback time. By determining the corresponding expected playback time for each video data frame and each audio data frame, every video data frame and audio data frame occupies a unique position on the timeline, so that audio and video synchronization can be achieved simply and clearly for the entire media stream.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions for performing the audio and video processing method described above. For example, when executed by a processor in the embodiment of the audio and video processing apparatus described above, the instructions cause the processor to perform the audio and video processing method of the above embodiments, for example, the method in Figure 1, the method in Figure 2 and the method in Figure 3 described above.
In addition, an embodiment of the present application further provides an audio and video processing apparatus, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the audio and video processing method described above is implemented.
The non-transitory software programs and instructions required to implement the audio and video processing method of the above embodiments are stored in the memory. When executed by the processor, they perform the audio and video processing method of the above embodiments, for example, the method in Figure 1, the method in Figure 2 and the method in Figure 3 described above.
The embodiments of the present application include: obtaining video data frames and audio data frames; determining a video sequence number jump value according to the timestamp sequence number of the video data frame received this time and the timestamp sequence numbers of the two previously received video data frames; determining an expected video playback time according to the video sequence number jump value, where the expected video playback time represents the playback time corresponding to the video data frame; determining an audio sequence number jump value according to the timestamp sequence number of the audio data frame received this time and the timestamp sequence numbers of the two previously received audio data frames; determining an expected audio playback time according to the audio sequence number jump value, where the expected audio playback time represents the playback time corresponding to the audio data frame and is consistent with the expected video playback time; and synchronizing the audio data frames and the video data frames according to the expected audio playback time and the expected video playback time. The timestamp sequence numbers of the video data frames are used to determine the video sequence number jump value, which reveals whether the video is continuous and whether a jump has occurred. The video sequence number jump value then serves as the basis for determining the expected video playback time of the video data frame received this time. This avoids long runs of repeated or dropped frames during video playback and allows video data frames to be played accurately according to their expected video playback times. Similarly, when an audio data frame is received, its timestamp sequence number is used to determine the audio sequence number jump value, which reveals whether the audio is continuous and whether a jump has occurred. The audio sequence number jump value then serves as the basis for determining the expected audio playback time of the audio data frame received this time, which avoids long runs of repeated or dropped frames during audio playback and allows audio data frames to be played accurately according to their expected audio playback times. Finally, because the expected audio playback time and the expected video playback time are consistent with each other, the received audio data frames and video data frames correspond accurately in time, and the audio data and video data are synchronized.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, and appropriate combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, it is known to those of ordinary skill in the art that communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims (19)

  1. An audio and video processing method, comprising:
    acquiring a video data frame and an audio data frame;
    determining a video sequence number jump value according to the timestamp sequence number of the video data frame received this time and the timestamp sequence numbers of the video data frames received the previous two times;
    determining an expected video playback time according to the video sequence number jump value, wherein the expected video playback time represents the playback time corresponding to the video data frame;
    determining an audio sequence number jump value according to the timestamp sequence number of the audio data frame received this time and the timestamp sequence numbers of the audio data frames received the previous two times;
    determining an expected audio playback time according to the audio sequence number jump value, wherein the expected audio playback time represents the playback time corresponding to the audio data frame, and the expected audio playback time is consistent with the expected video playback time;
    synchronizing the audio data frame and the video data frame according to the expected audio playback time and the expected video playback time.
  2. The audio and video processing method according to claim 1, wherein determining the video sequence number jump value according to the timestamp sequence number of the video data frame received this time and the timestamp sequence numbers of the video data frames received the previous two times comprises:
    determining a first sequence number difference according to the timestamp sequence numbers of the video data frames received the previous two times;
    determining a second sequence number difference according to the timestamp sequence number of the video data frame received this time and the timestamp sequence number of the video data frame received last time;
    calculating the video sequence number jump value according to the second sequence number difference and the first sequence number difference.
  3. The audio and video processing method according to claim 1, wherein determining the expected video playback time according to the video sequence number jump value comprises:
    when the video sequence number jump value is greater than a preset video sequence number jump threshold, adding a first time interval to the expected video playback time corresponding to the video data frame received last time to determine the expected video playback time, thereby obtaining a first expected video playback time, wherein the first time interval represents the time interval between two successive receptions of the video data frame.
  4. The audio and video processing method according to claim 3, wherein determining the expected video playback time according to the video sequence number jump value comprises:
    when the video sequence number jump value is less than the video sequence number jump threshold, determining the expected video playback time according to an initial video timestamp sequence number, an initial expected video playback time, and the timestamp sequence number of the video data frame, thereby obtaining a second expected video playback time, wherein the initial video timestamp sequence number is obtained according to the timestamp sequence number of the video data frame received this time, and the initial expected video playback time is obtained according to the expected video playback time of the video data frame received this time.
  5. The audio and video processing method according to claim 4, wherein the initial video timestamp sequence number and the initial expected video playback time are obtained by the following step:
    when the video data frame is the first video data frame received, determining the timestamp sequence number of the first video data frame as the initial video timestamp sequence number, and determining the expected video playback time corresponding to the first video data frame as the initial expected video playback time.
  6. The audio and video processing method according to claim 4 or 5, further comprising:
    when the video sequence number jump value is greater than the video sequence number jump threshold, updating the initial video timestamp sequence number according to the timestamp sequence number of the video data frame received this time;
    updating the initial expected video playback time according to the first expected video playback time.
  7. The audio and video processing method according to claim 1, wherein, when the video data frame is the first video data frame received, the audio and video processing method further comprises:
    when the time at which the first video data frame is received is earlier than or equal to the time at which the first audio data frame is received, setting the expected video playback time to a preset time value.
  8. The audio and video processing method according to claim 1, wherein, when the video data frame is the first video data frame received, the audio and video processing method further comprises:
    when the time at which the first video data frame is received is later than the time at which the first audio data frame is received, determining the time interval between receiving the first video data frame and receiving the first audio data frame as the expected video playback time.
  9. The audio and video processing method according to claim 1, further comprising:
    when no video data frame has been received for longer than a preset video frame-filling time threshold, copying the video data frame received last time;
    determining the expected video playback time of the copied video data frame according to a first time interval and the expected video playback time corresponding to the video data frame received last time, wherein the first time interval represents the time interval between two successive receptions of the video data frame.
  10. The audio and video processing method according to claim 1, wherein determining the audio sequence number jump value according to the timestamp sequence number of the audio data frame received this time and the timestamp sequence numbers of the audio data frames received the previous two times comprises:
    determining a third sequence number difference according to the timestamp sequence numbers of the audio data frames received the previous two times;
    determining a fourth sequence number difference according to the timestamp sequence number of the audio data frame received this time and the timestamp sequence number of the audio data frame received last time;
    calculating the audio sequence number jump value according to the third sequence number difference and the fourth sequence number difference.
  11. The audio and video processing method according to claim 1, wherein determining the expected audio playback time according to the audio sequence number jump value comprises:
    when the audio sequence number jump value is greater than a preset audio sequence number jump threshold, adding a second time interval to the expected audio playback time corresponding to the audio data frame received last time to determine the expected audio playback time, thereby obtaining a first expected audio playback time, wherein the second time interval represents the time interval between two successive receptions of the audio data frame.
  12. The audio and video processing method according to claim 11, further comprising:
    when the audio sequence number jump value is less than the audio sequence number jump threshold, determining the expected audio playback time according to an initial audio timestamp sequence number, an initial expected audio playback time, and the timestamp sequence number of the audio data frame, thereby obtaining a second expected audio playback time, wherein the initial audio timestamp sequence number is obtained according to the timestamp sequence number of the audio data frame received this time, and the initial expected audio playback time is obtained according to the expected audio playback time of the audio data frame received this time.
  13. The audio and video processing method according to claim 12, wherein the initial audio timestamp sequence number and the initial expected audio playback time are obtained by the following step:
    when the audio data frame is the first audio data frame received, determining the timestamp sequence number of the first audio data frame as the initial audio timestamp sequence number, and determining the expected audio playback time corresponding to the first audio data frame as the initial expected audio playback time.
  14. The audio and video processing method according to claim 12 or 13, further comprising:
    when the audio sequence number jump value is greater than the audio sequence number jump threshold, updating the initial audio timestamp sequence number according to the timestamp sequence number of the audio data frame received this time;
    updating the initial expected audio playback time according to the first expected audio playback time.
  15. The audio and video processing method according to claim 1, wherein, when the audio data frame is the first audio data frame received, the audio and video processing method further comprises:
    when the time at which the first audio data frame is received is earlier than or equal to the time at which the first video data frame is received, setting the expected audio playback time to a preset time value.
  16. The audio and video processing method according to claim 1 or 15, wherein, when the audio data frame is the first audio data frame received, the audio and video processing method further comprises:
    when the time at which the first audio data frame is received is later than the time at which the first video data frame is received, determining the time interval between receiving the first audio data frame and receiving the first video data frame as the expected audio playback time.
  17. The audio and video processing method according to claim 1, further comprising:
    when no audio data frame has been received for longer than a preset audio frame-filling time threshold, copying the audio data frame received last time;
    determining the expected audio playback time of the copied audio data frame according to a second time interval and the expected audio playback time corresponding to the audio data frame received last time, wherein the second time interval represents the time interval between two successive receptions of the audio data frame.
  18. An audio and video processing apparatus, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the audio and video processing method according to any one of claims 1 to 17.
  19. A computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to execute the audio and video processing method according to any one of claims 1 to 17.
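
The first-frame handling of claims 7, 8, 15 and 16 and the frame filling of claims 9 and 17 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the claimed implementation: the function names `first_frame_play_times` and `fill_missing_frame`, the default preset time value of 0.0, and the use of addition to combine the time interval with the previous expected playback time for a copied frame are hypothetical choices not fixed by the claims.

```python
from typing import Optional, Tuple


def first_frame_play_times(video_arrival: float,
                           audio_arrival: float,
                           preset: float = 0.0) -> Tuple[float, float]:
    """Return (expected video time, expected audio time) for the first frames.

    The stream whose first frame arrives earlier (or at the same time) gets
    the preset value; the later stream gets the arrival gap between the two.
    """
    if video_arrival <= audio_arrival:
        video_time = preset
    else:
        video_time = video_arrival - audio_arrival
    if audio_arrival <= video_arrival:
        audio_time = preset
    else:
        audio_time = audio_arrival - video_arrival
    return video_time, audio_time


def fill_missing_frame(last_frame: bytes,
                       last_expected: float,
                       frame_interval: float,
                       elapsed: float,
                       fill_threshold: float) -> Optional[Tuple[bytes, float]]:
    """Copy the last frame when nothing has arrived for longer than the threshold.

    Returns None while the wait is still within the threshold; otherwise
    returns the copied frame together with its expected playback time
    (assumed here to be the previous expected time plus one frame interval).
    """
    if elapsed <= fill_threshold:
        return None
    return last_frame, last_expected + frame_interval


if __name__ == "__main__":
    # Audio arrives 0.25 s after video, so audio starts 0.25 s into the timeline.
    print(first_frame_play_times(video_arrival=1.0, audio_arrival=1.25))
    # No video for 0.2 s with a 0.1 s threshold: repeat the last frame.
    print(fill_missing_frame(b"frame", 0.08, 0.04, elapsed=0.2, fill_threshold=0.1))
```

Offsetting the later stream by the arrival gap places both streams on one shared timeline, which is what allows the expected audio playback time and the expected video playback time to be compared directly during synchronization.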
PCT/CN2023/095554 2022-06-06 2023-05-22 Audio and video processing method and apparatus, and storage medium WO2023236767A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210631499.0 2022-06-06
CN202210631499.0A CN117241080A (en) 2022-06-06 2022-06-06 Audio and video processing method and device and storage medium thereof

Publications (1)

Publication Number Publication Date
WO2023236767A1 (en) 2023-12-14

Family

ID=89093552

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/095554 WO2023236767A1 (en) 2022-06-06 2023-05-22 Audio and video processing method and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN117241080A (en)
WO (1) WO2023236767A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007028261A (en) * 2005-07-19 2007-02-01 Nec Viewtechnology Ltd Video/audio reproduction apparatus and video/audio reproducing method
CN101193311A (en) * 2006-12-21 2008-06-04 腾讯科技(深圳)有限公司 Audio and video data synchronization method in P2P system
CN102075803A (en) * 2010-12-22 2011-05-25 Tcl通力电子(惠州)有限公司 Method for synchronously playing video and audio
CN103731716A (en) * 2014-01-08 2014-04-16 珠海全志科技股份有限公司 Method for synchronizing audio and video in TS stream playing
CN110225385A (en) * 2019-06-19 2019-09-10 鼎桥通信技术有限公司 A kind of audio-visual synchronization method of adjustment and device

Also Published As

Publication number Publication date
CN117241080A (en) 2023-12-15

Similar Documents

Publication Publication Date Title
TWI762117B (en) Dynamic control of fingerprinting rate to facilitate time-accurate revision of media content
TWI788744B (en) Dynamic reduction in playout of replacement content to help align end of replacement content with end of replaced content
US11812103B2 (en) Dynamic playout of transition frames while transitioning between playout of media streams
KR20210022134A (en) Establishment and use of temporal mapping based on interpolation using low-rate fingerprinting to facilitate frame-accurate content modification
WO2017161998A1 (en) Video processing method and device and computer storage medium
JP6486628B2 (en) An interconnected multimedia system for synchronized playback of media streams
RU2763518C1 (en) Method, device and apparatus for adding special effects in video and data media
KR20210022133A (en) Preliminary preparation for content modification based on the expected latency in acquiring new content
JP2022517587A (en) Audio stream and video stream synchronization switching method and equipment
JP5117264B2 (en) Interconnected multimedia system with playback synchronization
WO2023236767A1 (en) Audio and video processing method and apparatus, and storage medium
CN114697712A (en) Method, device and equipment for downloading media stream and storage medium
KR101811366B1 (en) Media playe program
US11076197B1 (en) Synchronization of multiple video-on-demand streams and methods of broadcasting and displaying multiple concurrent live streams
JP2013121029A (en) Content reproduction system
WO2023170676A1 (en) Method of facilitating a synchronized play of content and system thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23818932

Country of ref document: EP

Kind code of ref document: A1