CN112423075A - Audio and video timestamp processing method and device, electronic equipment and storage medium - Google Patents

Audio and video timestamp processing method and device, electronic equipment and storage medium

Info

Publication number
CN112423075A
CN112423075A
Authority
CN
China
Prior art keywords
audio
timestamp
video
video data
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011255652.1A
Other languages
Chinese (zh)
Other versions
CN112423075B (en)
Inventor
童欢
方周
朱经腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huaduo Network Technology Co Ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd filed Critical Guangzhou Huaduo Network Technology Co Ltd
Priority to CN202011255652.1A
Publication of CN112423075A
Application granted
Publication of CN112423075B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen

Abstract

The application discloses a method and an apparatus for processing audio and video timestamps, an electronic device, and a storage medium. Audio and video data to be processed are acquired; a first timestamp corresponding to a target audio of the audio data is obtained, and a second timestamp corresponding to a target video of the video data is obtained; an acquisition delay between the audio data and the video data is then derived from the first timestamp and the second timestamp; and the first timestamp or the second timestamp is corrected based on that acquisition delay to obtain audio and video data with synchronized timestamps. In this way, once the acquisition delay between the audio data and the video data has been obtained from the first timestamp corresponding to the target audio and the second timestamp corresponding to the target video, either timestamp can be corrected based on the acquisition delay, yielding audio and video data with synchronized timestamps and improving the user's viewing experience.

Description

Audio and video timestamp processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of audio and video processing technologies, and in particular, to a method and an apparatus for processing an audio and video timestamp, an electronic device, and a storage medium.
Background
With the development of network technology, users frequently interact online, for example in online video conferences or online video chats. These activities require capturing the image data and sound data of the participating users, i.e., audio and video data, transmitting them to other users, and playing the captured audio and video data on the other users' devices, so users place increasingly high demands on the transmission quality of the played audio and video data. In one related approach to improving transmission quality, the sending end packages the audio data and the video data with timestamps, and after transmission over the network the receiving end matches the timestamps so that the audio and video data can be played synchronously. However, this approach cannot avoid unsynchronized playback caused by the delay that hardware buffering introduces when the terminal captures the audio and video information.
Disclosure of Invention
In view of the foregoing problems, the present application provides a method and an apparatus for processing an audio/video timestamp, an electronic device, and a storage medium, so as to mitigate those problems.
In a first aspect, an embodiment of the present application provides an audio and video timestamp processing method applicable to an electronic device, the method including: acquiring audio and video data to be processed, the audio and video data to be processed comprising audio data and video data; acquiring a first timestamp corresponding to a target audio of the audio data; acquiring a second timestamp corresponding to a target video of the video data; acquiring an acquisition delay between the audio data and the video data based on the first timestamp and the second timestamp; and correcting the first timestamp or the second timestamp based on the acquisition delay to obtain audio and video data with synchronized timestamps.
In a second aspect, an embodiment of the present application provides an apparatus for processing an audio/video timestamp, operable on an electronic device, the apparatus including: a data acquisition module for acquiring audio and video data to be processed, the audio and video data to be processed comprising audio data and video data; a first timestamp acquiring module configured to acquire a first timestamp corresponding to a target audio of the audio data; a second timestamp acquiring module configured to acquire a second timestamp corresponding to a target video of the video data; a delay obtaining module configured to obtain an acquisition delay between the audio data and the video data based on the first timestamp and the second timestamp; and a processing module for correcting the first timestamp or the second timestamp based on the acquisition delay to obtain audio and video data with synchronized timestamps.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and one or more processors; one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the method of the first aspect described above.
In a fourth aspect, the present application provides a computer-readable storage medium, in which a program code is stored, wherein when the program code is executed by a processor, the method according to the first aspect is performed.
The embodiments of the application provide a method and an apparatus for processing audio and video timestamps, an electronic device, and a storage medium. Audio and video data to be processed, comprising audio data and video data, are acquired; a first timestamp corresponding to a target audio of the audio data and a second timestamp corresponding to a target video of the video data are obtained; an acquisition delay between the audio data and the video data is then derived from the first timestamp and the second timestamp; and the first timestamp or the second timestamp is corrected based on that delay to obtain audio and video data with synchronized timestamps. Thus, once the acquisition delay between the audio data and the video data has been obtained from the two timestamps, either timestamp can be corrected accordingly, yielding audio and video data with synchronized timestamps and improving the user's viewing experience.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 illustrates an example of how an audio/video timestamp synchronization problem easily arises when a video with sound is recorded, according to an embodiment of the present application.
Fig. 2 shows a flowchart of a method for processing an audio/video timestamp according to an embodiment of the present application.
Fig. 3 shows a method flowchart of step S120 in fig. 2.
Fig. 4 shows a method flowchart of step S130 in fig. 2.
Fig. 5 is a schematic diagram illustrating an acquisition process for acquiring timestamps corresponding to audio data and video data according to an embodiment of the present application.
Fig. 6 illustrates an example diagram of a timestamp difference between an audio frame and a video frame provided by an embodiment of the present application.
Fig. 7 shows a flowchart of a method for processing an audio/video timestamp according to another embodiment of the present application.
Fig. 8 shows a block diagram of a processing apparatus of an audio/video timestamp according to an embodiment of the present application.
Fig. 9 shows a block diagram of an electronic device according to an embodiment of the present application.
Fig. 10 illustrates a storage unit for storing or carrying program codes for implementing an audio/video timestamp processing method according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
When recording a video with sound, a camera is required to capture the video data and a microphone to capture the audio data. After the audio and video data are captured, timestamps for the captured data can be determined, and when the data are played, the playback end plays them based on these timestamps. In applications with a video-recording function, it is common for the recorded video and its sound to fall out of sync. A related approach to handling this asynchrony assumes that the interval between two adjacent frames of the audio data and of the video data is fixed: for a given frame, the timestamp of the previous frame plus the interval is taken as the timestamp of that frame, and this timestamp is recorded in the recorded audio and video data.
Through long-term research, the inventors found that, as shown in fig. 1, when a video with sound is recorded, the camera and audio input buffers of the device (i.e., the camera software/hardware buffer and the system software/hardware buffer shown in fig. 1) cause the system API (Application Programming Interface) to deliver the captured audio and video data via callback with a certain delay, and this delay differs across devices and system versions. As a result, an error is introduced when timestamps are assigned to the data currently delivered, so the audio and video cannot be synchronized. In other words, even if the approach described above temporarily synchronizes the audio and video data, they inevitably drift out of sync again as playback continues.
In view of these problems, the inventors found through long-term research that audio and video data to be processed, comprising audio data and video data, can be acquired; a first timestamp corresponding to a target audio of the audio data and a second timestamp corresponding to a target video of the video data can then be obtained; an acquisition delay between the audio data and the video data can be derived from the two timestamps; and the first timestamp or the second timestamp can be corrected based on that delay to obtain audio and video data with synchronized timestamps. Thus, once the acquisition delay has been obtained, either timestamp can be corrected accordingly, yielding synchronized audio and video data and improving the user's viewing experience. Hence the method and apparatus for processing audio and video timestamps, the electronic device, and the storage medium provided by the embodiments of the application.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to fig. 2, a flowchart of a method for processing an audio/video timestamp provided in an embodiment of the present application is shown. The method is applicable to an electronic device, where the electronic device is a viewing-end device for the audio and video data and may run various systems (for example, win/mac/ios/Android). The method includes:
step S110: and acquiring audio and video data to be processed, wherein the audio and video data to be processed comprises audio data and video data.
In this embodiment, the audio and video data to be processed are the data to be played after being decoded by the viewing-end device, and they include audio data and video data. Optionally, when an audio/video stream file sent by the broadcasting-end device is received, the audio data packets and video data packets in the file can be decoded separately: decoding the received audio data packets yields the audio data and their corresponding timestamps, and decoding the received video data packets yields the video data and their corresponding timestamps. The decoded audio data and video data are taken as the audio and video data to be processed, so that whether they are synchronized can be checked before playback; if not, their timestamps can be processed so that the audio and video data play synchronously.
Step S120: a first timestamp corresponding to a target audio of the audio data is obtained.
The audio data in this embodiment may include multiple frames, each frame pre-marked with a corresponding timestamp (i.e., the timestamp obtained in the decoding process above), with different frames carrying different timestamps. In one approach, if the audio and video data are detected to be out of sync, a first timestamp corresponding to a target audio of the audio data may be acquired to help determine the timestamp difference between them accurately. Optionally, the target audio may include noise-frame audio, where the noise may be ambient noise, speech noise during a call, or other noise; its specific type is not limited.
Referring to fig. 3, as an alternative, step S120 may include:
step S121: and acquiring target coding modulation parameters corresponding to each frame of audio data.
In this embodiment, during decoding of the audio data packets, the audio frames in the audio frame sequence may be restored to audio data in PCM (pulse-code modulation) format. Each frame of audio data may then correspond to one or more pulse-code modulation periods, and the period lengths may differ between frames; for example, the duration of the pulse-code modulation period corresponding to audio frame data with a sharp sound is shorter than that corresponding to audio frame data with a deep sound. Optionally, each pulse-code modulation period may comprise multiple pulse-code modulation parameters (which may be understood as PCM values). In one approach, a target coding modulation parameter may be obtained for each frame of audio data, where this parameter may be understood as the sum of the frame's PCM values over its one or more pulse-code modulation periods.
For example, if audio frame data A spans one pulse-code modulation period, its target coding modulation parameter is the sum of (the absolute values of) all PCM values in that period; if it spans several periods, the parameter is the sum of (the absolute values of) all PCM values across those periods. Optionally, the target coding modulation parameters of different frames may differ, or in some embodiments they may be equal.
Step S122: and taking the audio frame corresponding to the target coding modulation parameter meeting the target condition as the target audio.
The target condition may be the audio frame whose target coding modulation parameter has the largest value among all frames, or an audio frame whose parameter exceeds a specified value (which may be set according to actual requirements). In one approach, the audio frame with the largest target coding modulation parameter is taken as the target audio, so that the audio timestamp can be determined accurately.
For example, in a specific application scenario, since the PCM values of a noise frame are large, the sum of the absolute PCM values of each audio frame may be computed to distinguish mute frames from noise frames, and the noise frame with the largest sum is taken as the target audio.
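Steps S121 and S122 can be sketched as follows. This is a minimal illustration, not the patent's implementation: frames are plain lists of signed PCM sample values, and the function names are invented for the example.

```python
# Score each audio frame by the sum of the absolute values of its PCM
# samples (the "target coding modulation parameter"), then take the
# highest-scoring frame as the noise (target) frame.

def frame_pcm_sum(samples):
    """Sum of absolute PCM sample values for one audio frame."""
    return sum(abs(s) for s in samples)

def pick_noise_frame(frames):
    """Return (index, score) of the frame with the largest |PCM| sum."""
    scores = [frame_pcm_sum(f) for f in frames]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy data: frame 1 is loud (noise); frames 0 and 2 are near-silent.
frames = [[1, -2, 1], [900, -1200, 1100], [0, 3, -2]]
idx, score = pick_noise_frame(frames)
```

A mute frame yields a small sum and a noise frame a large one, which is exactly the distinction the paragraph above describes.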
Step S123: identifying a timestamp corresponding to the target audio as a first timestamp.
Optionally, after the target audio is determined, a timestamp corresponding to the target audio in the obtained timestamps may be identified as the first timestamp. It should be noted that, in a frame of audio data, the value of the timestamp corresponding to the target audio is greater than the value of the timestamp corresponding to the non-target audio. For example, when the target audio is a noise frame audio and the non-target audio is a mute frame audio, the value of the timestamp corresponding to the noise frame audio is greater than the value of the timestamp corresponding to the mute frame audio.
By identifying the timestamp corresponding to the target audio as the timestamp of the audio data, the timestamp difference between the audio and video data can be detected through the more prominent audio feature, so that it can be corrected more accurately.
Step S130: a second timestamp corresponding to a target video of the video data is obtained.
The video data in this embodiment may include multiple frames of video data, each frame of video data is pre-marked with a corresponding timestamp, and timestamps of different frames of video data may be different. Optionally, the target video may include a white picture video frame, or a picture video frame with another color, and the other color may be any color other than black, which is not limited herein.
It should be noted that, in this embodiment, the order of obtaining the first timestamp corresponding to the target audio and the second timestamp corresponding to the target video may vary: the first timestamp may be obtained before the second, the second before the first, or both may be obtained simultaneously. The words "first" and "second" impose no limit on the order of timestamp acquisition.
Referring to fig. 4, as an alternative, step S130 may include:
step S131: and acquiring the brightness parameter of each frame of video data.
In this embodiment, after the video data packets are decoded, the video frames in the video frame sequence may be restored to video data in YUV format. For each frame, the sum of (the absolute values of) the Y components (Y representing luminance (luma), i.e., the gray-scale value) of all pixels in the corresponding YUV image may be used as its luminance parameter. Optionally, the luminance parameters of different frames may differ or may be equal, and the pulse-code modulation periods included in different frames of video data may differ; each frame of video data may include at least one pulse-code modulation period.
Step S132: and taking the video frame with the brightness parameter value larger than the target threshold value as the target video.
Optionally, the specific value of the target threshold is not limited. In one approach, when the timestamps of the audio and video data are detected to be inconsistent, or the audio and video data are not synchronized, the target video can be determined from the luminance parameter of each video frame, allowing the timestamp of the video data to be determined more accurately.
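Steps S131 and S132 can be sketched in the same way as the audio side. Again a minimal illustration under assumptions not in the patent: each frame's luma plane is a list of rows of Y values, and the threshold value is made up for the example.

```python
# Score each decoded video frame by the sum of its Y (luma) values and
# keep frames whose score exceeds a target threshold: bright (white)
# frames score high, dark frames score low.

def luma_sum(y_plane):
    """Sum of Y values over all pixels of one frame's luma plane."""
    return sum(sum(row) for row in y_plane)

def white_frames(frames, threshold):
    """Indices of frames whose luma sum exceeds the threshold."""
    return [i for i, f in enumerate(frames) if luma_sum(f) > threshold]

# 2x2 toy luma planes: one near-black frame, one near-white frame.
dark = [[16, 18], [17, 16]]
bright = [[235, 234], [233, 235]]
targets = white_frames([dark, bright], threshold=400)
```

With these toy values, only the bright frame exceeds the threshold and would be treated as the target video.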
Step S133: identifying a timestamp corresponding to the target video as a second timestamp.
Optionally, after the target video is determined, a timestamp corresponding to the target video in the obtained timestamps may be identified as the second timestamp. It should be noted that, in a frame of video data, the value of the timestamp corresponding to the target video is greater than the value of the timestamp corresponding to the non-target video. For example, when the target video is a white frame video and the non-target video is a black frame video, the value of the timestamp corresponding to the white frame video is greater than the value of the timestamp corresponding to the black frame video.
By identifying the timestamp corresponding to the target video as the timestamp of the video data, the timestamp difference between the audio and video data can be detected through a video feature that better reflects real time, so that it can be corrected more accurately.
In a specific application scenario, please refer to fig. 5, which shows a schematic diagram of the process of acquiring the timestamps corresponding to the audio data and the video data in this embodiment. In the related art, a synchronization source video can periodically play an alternating black-and-white picture, with noise audio accompanying the white picture and mute audio accompanying the black picture. The broadcasting-end device shoots a video of this synchronization source in the normal way; if only the system time returned by a system API function were taken as the audio timestamp and the video timestamp, the result would be inaccurate.
To overcome this problem, the inventors found that, from the viewing video stream, whether the current video frame shows a black or a white picture can be distinguished using the fact that the single Y vector value of a white-picture video frame is greater than 0 while that of a black-picture video frame is less than 0; and, using the fact that the PCM amplitude of an audio noise frame is large, whether the current frame is mute-frame audio or noise-frame audio can be distinguished by computing the sum of the absolute PCM values of each audio frame. This yields the correspondences shown in fig. 5 between video black/white frames and video timestamps and between audio noise frames and audio timestamps: the video timestamp is taken when the video frame is a white frame, and the audio timestamp is taken when the audio frame is a noise frame. In fig. 5, one pair (Tn, Pn) denotes the video timestamp and the picture video frame, and the other denotes the audio timestamp and the noise-frame audio.
In one embodiment, please refer to fig. 6, which illustrates an example of the timestamp difference between an audio frame and a video frame provided by this embodiment. As shown in fig. 6, the timestamp at which the PCM sum of the audio frames reaches its maximum (the peak of the PCM sum) may be taken as the audio timestamp, and the timestamp at which the Y vector sum of the video frames reaches its maximum (the peak of the Y vector sum) may be taken as the video timestamp.
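The peak-based choice illustrated in fig. 6 can be sketched as follows. The function name and sample values are illustrative only; per-frame scores would be PCM sums for audio and Y sums for video, as computed above.

```python
# Given per-frame timestamps and per-frame scores, take the timestamp
# of the frame at which the score peaks as the stream's reference
# timestamp (the peak of the PCM sum, or of the Y vector sum).

def peak_timestamp(timestamps, scores):
    """Timestamp at which the per-frame score reaches its maximum."""
    peak = max(range(len(scores)), key=scores.__getitem__)
    return timestamps[peak]

# Toy audio stream: the PCM-sum peak falls on the frame at t = 20.
audio_ts = peak_timestamp([0, 20, 40, 60], [5, 3200, 900, 4])
```

The same call applied to the video stream's Y-sum scores yields the video reference timestamp, and the two reference timestamps feed step S140 below.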
Step S140: acquiring a capture latency between the audio data and the video data based on the first timestamp and the second timestamp.
In one implementation, the difference between the first timestamp and the second timestamp may be obtained and taken as the acquisition delay between the audio data and the video data, where the acquisition delay is the delay caused by the difference in hardware or system buffering between the camera capturing the video data and the microphone capturing the audio data. Optionally, if the difference is negative, its absolute value may be taken as the acquisition delay between the audio data and the video data.
Continuing the example above, the difference between the audio timestamp and the video timestamp can be taken as the difference between the audio and video input delays of the broadcasting-end device, i.e., the acquisition delay between the audio data and the video data. Optionally, in some examples the acquisition delay may be 300 milliseconds, 400 milliseconds, or the like; the specific value is not limited.
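Step S140 reduces to a single subtraction; a minimal sketch follows, assuming millisecond timestamps (the unit is not fixed by the patent) and illustrative names.

```python
# The acquisition delay is the difference between the audio and video
# reference timestamps; a negative difference is replaced by its
# absolute value, as described above.

def acquisition_delay_ms(audio_ts, video_ts):
    return abs(audio_ts - video_ts)

delay = acquisition_delay_ms(audio_ts=1700, video_ts=1400)
```

Whichever stream lags, the delay comes out non-negative, matching the 300 ms / 400 ms examples mentioned in the text.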
Step S150: and correcting the first time stamp or the second time stamp based on the acquisition time delay to obtain audio and video data with synchronous time stamps.
Optionally, after the acquisition delay between the audio data and the video data is obtained, the first timestamp may be compared with the second timestamp. In one embodiment, if the first timestamp is greater than the second timestamp, the acquisition delay may be subtracted from the first timestamp so that, at the playback end, the first timestamp equals the second timestamp, i.e., the timestamps of the audio data and the video data are identical.
In another embodiment, if the first timestamp is smaller than the second timestamp, the acquisition delay may be subtracted from the second timestamp so that, at the playback end, the second timestamp equals the first timestamp, i.e., the timestamps of the video data and the audio data are identical.
Optionally, if the audio and video data to be processed are sent to multiple viewing-end devices, the acquisition delay may be sent to each of them, so that each viewing-end device can correct the timestamps of the audio and video data in time.
Optionally, in other embodiments, the playing speed of the audio data or the video data at the viewing end may also be controlled based on the acquisition delay, for example, if the timestamp corresponding to the audio data is greater than the timestamp corresponding to the video data, the playing speed of the video data may be reduced, so that the audio data and the video data may be played synchronously.
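The correction rule of step S150 described in the two embodiments above can be sketched as follows; a minimal illustration with invented names, subtracting the acquisition delay from whichever timestamp is larger.

```python
# Correct the larger of the two timestamps by the acquisition delay so
# that the corrected audio and video timestamps coincide.

def synchronize(audio_ts, video_ts, delay):
    if audio_ts > video_ts:
        audio_ts -= delay
    elif audio_ts < video_ts:
        video_ts -= delay
    return audio_ts, video_ts

# With a 300 ms delay and the audio timestamp ahead, both streams end
# up carrying the same timestamp.
a, v = synchronize(audio_ts=1700, video_ts=1400, delay=300)
```

Note the symmetry with step S140: when the delay was computed as the absolute difference of these same two timestamps, subtracting it from the larger one makes them equal.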
The method for processing audio and video timestamps provided by this embodiment acquires audio and video data to be processed, comprising audio data and video data; obtains a first timestamp corresponding to a target audio of the audio data and a second timestamp corresponding to a target video of the video data; derives the acquisition delay between the audio data and the video data from the two timestamps; and corrects the first timestamp or the second timestamp based on that delay to obtain audio and video data with synchronized timestamps. Thus, once the acquisition delay has been obtained from the first and second timestamps, either timestamp can be corrected accordingly, yielding synchronized audio and video data and improving the user's viewing experience.
Referring to fig. 7, a flowchart of a method for processing an audio/video timestamp according to another embodiment of the present application is shown. The method is applicable to an electronic device and includes:
step S210: and acquiring audio and video data to be processed, wherein the audio and video data to be processed comprises audio data and video data.
Step S220: a first timestamp corresponding to a target audio of the audio data is obtained.
Step S230: a second timestamp corresponding to a target video of the video data is obtained.
Step S240: aligning the target audio with the target video.
In some embodiments, the noise frame audio of the synchronized video source at the broadcasting end may correspond to a white picture video frame. In this way, the target audio may be aligned with the target video; that is, the noise frame audio Pn and the white picture video frame Pn in the foregoing embodiments may be aligned at the viewing end. The timestamp difference between the audio data and the video data can then be obtained from the timestamp of the aligned noise frame audio and the timestamp of the white picture video frame, so that the difference is calculated from more distinctive audio and video features, improving the accuracy of the obtained acquisition delay.
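A minimal sketch of this alignment, assuming the broadcasting end emits noise frames and white picture frames in matching order (so the n-th noise frame Pn pairs with the n-th white picture frame Pn); averaging the per-pair differences is an illustrative robustness choice, not something the text mandates:

```python
def delay_from_aligned_pairs(noise_audio_ts, white_video_ts):
    """noise_audio_ts[n] and white_video_ts[n] are the timestamps (ms) of the
    n-th noise audio frame and the n-th white picture video frame, which were
    generated together at the broadcasting end."""
    diffs = [a - v for a, v in zip(noise_audio_ts, white_video_ts)]
    # Average the per-pair timestamp differences; a negative result is
    # replaced by its absolute value, as in the method.
    return abs(sum(diffs) / len(diffs))
```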
Step S250: acquiring the acquisition time delay between the audio data and the video data based on the aligned first time stamp corresponding to the target audio and the aligned second time stamp corresponding to the target video.
In this way, the difference between the aligned first timestamp corresponding to the target audio and the aligned second timestamp corresponding to the target video may be obtained and used as the acquisition delay between the audio data and the video data. As in the foregoing embodiment, if the difference is negative, its absolute value may be used as the acquisition delay between the audio data and the video data.
Step S260: correcting the first timestamp or the second timestamp based on the acquisition delay to obtain audio and video data with synchronized timestamps.
According to the audio and video timestamp processing method provided by this embodiment, once the acquisition delay between the audio data and the video data has been obtained based on the first timestamp corresponding to the target audio of the audio data and the second timestamp corresponding to the target video of the video data, the first timestamp or the second timestamp can be corrected based on the acquisition delay, so that audio and video data with synchronized timestamps are obtained, further improving the user's viewing experience.
Referring to fig. 8, a structural block diagram of an audio and video timestamp processing apparatus according to an embodiment of the present application is shown. This embodiment provides an audio and video timestamp processing apparatus 300 that can run in an electronic device. The apparatus 300 includes a data obtaining module 310, a first timestamp obtaining module 320, a second timestamp obtaining module 330, a delay obtaining module 340, and a processing module 350:
the data obtaining module 310 is configured to obtain to-be-processed audio and video data, where the to-be-processed audio and video data includes audio data and video data.
A first timestamp obtaining module 320, configured to obtain a first timestamp corresponding to a target audio of the audio data.
Optionally, the audio data may include multiple frames of audio data, each pre-marked with a corresponding timestamp. In one implementation, the first timestamp obtaining module 320 may be configured to obtain the target coding modulation parameter corresponding to each frame of audio data; take the audio frame whose target coding modulation parameter meets the target condition as the target audio; and identify the timestamp corresponding to the target audio as the first timestamp. Optionally, the target audio in this embodiment includes a noise frame.
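A sketch of this module's logic; the concrete coding modulation parameter and the target condition are passed in as callables because the text does not fix them (a caller might, for example, test for the known parameter signature of the inserted noise frame):

```python
def find_target_audio(audio_frames, get_coding_modulation_param, target_condition):
    """Return the first audio frame whose target coding modulation parameter
    meets the target condition; its pre-marked timestamp is then identified
    as the first timestamp."""
    for frame in audio_frames:
        if target_condition(get_coding_modulation_param(frame)):
            return frame
    return None  # no matching (noise) frame in this batch
```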
A second timestamp obtaining module 330, configured to obtain a second timestamp corresponding to a target video of the video data.
Optionally, the video data may include multiple frames of video data, each pre-marked with a corresponding timestamp. In one implementation, the second timestamp obtaining module 330 may be configured to obtain the brightness parameter of each frame of video data; take the video frame whose brightness parameter value is greater than the target threshold as the target video; and identify the timestamp corresponding to the target video as the second timestamp. Optionally, the target video in this embodiment includes a white picture frame.
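A sketch of the brightness check; using the mean of 8-bit luma samples as the brightness parameter and 250 as the target threshold are illustrative assumptions — the text only requires the brightness parameter to be greater than a target threshold:

```python
def find_target_video(video_frames, target_threshold=250.0):
    """Return the first video frame whose brightness parameter exceeds the
    target threshold (i.e. the white picture frame); its pre-marked
    timestamp is then identified as the second timestamp."""
    for frame in video_frames:
        brightness = sum(frame["luma"]) / len(frame["luma"])  # mean 8-bit luma
        if brightness > target_threshold:
            return frame
    return None
```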
A delay obtaining module 340, configured to obtain a collection delay between the audio data and the video data based on the first timestamp and the second timestamp.
In one approach, the difference between the first timestamp and the second timestamp may be obtained, and the difference may be used as the acquisition delay between the audio data and the video data.
In another approach, the target audio and the target video may be aligned first, and the acquisition delay between the audio data and the video data may then be obtained based on the aligned first timestamp corresponding to the target audio and the aligned second timestamp corresponding to the target video.
The processing module 350 is configured to correct the first timestamp or the second timestamp based on the acquisition delay, so as to obtain audio and video data with synchronized timestamps.
Optionally, the processing module 350 may be configured to subtract the acquisition delay from the first timestamp so as to control the first timestamp and the second timestamp at the playing end to be equal; or to subtract the acquisition delay from the second timestamp so as to control the second timestamp at the playing end to be equal to the first timestamp.
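The two options can be sketched with a hypothetical helper (an illustration, not the module's actual interface); it assumes the acquisition delay was computed as the difference toward the other stream, so the subtraction makes the two timestamps equal at the playing end:

```python
def correct_timestamps(first_ts, second_ts, acquisition_delay, correct_first=True):
    """Option 1 (correct_first=True): subtract the acquisition delay from the
    first (audio) timestamp.  Option 2: subtract it from the second (video)
    timestamp.  Only one of the two timestamps is modified."""
    if correct_first:
        return first_ts - acquisition_delay, second_ts
    return first_ts, second_ts - acquisition_delay
```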
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in this application, the coupling, direct coupling, or communication connection between the modules shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or modules may be in electrical, mechanical, or other forms.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
Referring to fig. 9, based on the above audio and video timestamp processing method and apparatus, an embodiment of the present application further provides an electronic device 100 capable of executing the audio and video timestamp processing method. The electronic device 100 includes a memory 102 and one or more processors 104 (only one is shown) that are coupled to and communicate with each other. The memory 102 stores a program that can execute the content of the foregoing embodiments, and the processor 104 can execute the program stored in the memory 102.
The processor 104 may include one or more processing cores. The processor 104 connects various parts of the electronic device 100 using various interfaces and lines, and performs the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 102 and invoking data stored in the memory 102. Optionally, the processor 104 may be implemented in hardware using at least one of Digital Signal Processing (DSP), a Field-Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA). The processor 104 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 104 and instead be implemented by a separate communication chip.
The memory 102 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 102 may be used to store instructions, programs, code sets, or instruction sets. The memory 102 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the foregoing embodiments, and the like. The data storage area may also store data created by the electronic device 100 during use (e.g., phone book, audio and video data, chat log data), and the like.
Referring to fig. 10, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable storage medium 400 has stored therein program code that can be called by a processor to execute the methods described in the above-described method embodiments.
The computer-readable storage medium 400 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 400 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 400 has storage space for program code 410 for performing any of the method steps described above. The program code may be read from or written to one or more computer program products. The program code 410 may, for example, be compressed in a suitable form.
To sum up, the audio and video timestamp processing method, apparatus, electronic device, and storage medium provided by the embodiments of the present application obtain audio and video data to be processed, where the audio and video data to be processed includes audio data and video data; obtain a first timestamp corresponding to a target audio of the audio data and a second timestamp corresponding to a target video of the video data; obtain an acquisition delay between the audio data and the video data based on the first timestamp and the second timestamp; and correct the first timestamp or the second timestamp based on the acquisition delay, so as to obtain audio and video data with synchronized timestamps. In this way, once the acquisition delay has been obtained from the two timestamps, either timestamp can be corrected based on that delay, yielding audio and video data with synchronized timestamps and improving the user's viewing experience.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features thereof may be equivalently replaced; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method for processing audio and video time stamps is characterized by comprising the following steps:
acquiring audio and video data to be processed, wherein the audio and video data to be processed comprises audio data and video data;
acquiring a first time stamp corresponding to a target audio of the audio data;
acquiring a second timestamp corresponding to a target video of the video data;
acquiring an acquisition time delay between the audio data and the video data based on the first time stamp and the second time stamp;
and correcting the first time stamp or the second time stamp based on the acquisition time delay to obtain audio and video data with synchronous time stamps.
2. The method of claim 1, wherein the audio data comprises a plurality of frames of audio data, wherein the audio data is pre-marked with a corresponding timestamp, and wherein obtaining a first timestamp corresponding to a target audio of the audio data comprises:
acquiring target coding modulation parameters corresponding to each frame of audio data;
taking the audio frame corresponding to the target coding modulation parameter meeting the target condition as a target audio;
identifying a timestamp corresponding to the target audio as a first timestamp.
3. The method of claim 1, wherein the video data comprises a plurality of frames of video data, wherein the video data is pre-marked with a corresponding timestamp, and wherein obtaining a second timestamp corresponding to a target video of the video data comprises:
acquiring brightness parameters of each frame of video data;
taking the video frame with the brightness parameter value larger than the target threshold value as a target video;
identifying a timestamp corresponding to the target video as a second timestamp.
4. The method of any of claims 1-3, wherein the acquiring the acquisition time delay between the audio data and the video data based on the first time stamp and the second time stamp comprises:
obtaining a difference between the first timestamp and the second timestamp;
and taking the difference value as the acquisition time delay between the audio data and the video data.
5. The method of claim 4, wherein the correcting the first time stamp or the second time stamp based on the acquisition time delay comprises:
subtracting the acquisition time delay from the first time stamp to control the first time stamp and the second time stamp at the playing end to be equal; or
subtracting the acquisition time delay from the second time stamp to control the second time stamp at the playing end to be equal to the first time stamp.
6. The method of claim 4, wherein the acquiring the acquisition time delay between the audio data and the video data based on the first time stamp and the second time stamp comprises:
aligning the target audio with the target video;
acquiring the acquisition time delay between the audio data and the video data based on the aligned first time stamp corresponding to the target audio and the aligned second time stamp corresponding to the target video.
7. The method of claim 1, wherein the target audio comprises a noise frame and the target video comprises a white picture frame.
8. An apparatus for processing audio-video time stamps, the apparatus comprising:
the data acquisition module is used for acquiring audio and video data to be processed, and the audio and video data to be processed comprises audio data and video data;
a first timestamp acquiring module, configured to acquire a first timestamp corresponding to a target audio of the audio data;
a second timestamp acquiring module, configured to acquire a second timestamp corresponding to a target video of the video data;
a time delay obtaining module, configured to obtain an acquisition time delay between the audio data and the video data based on the first time stamp and the second time stamp;
and the processing module is used for correcting the first time stamp or the second time stamp based on the acquisition time delay so as to obtain audio and video data with synchronous time stamps.
9. An electronic device comprising one or more processors and memory;
one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the method of any of claims 1-7.
10. A computer-readable storage medium, having program code stored therein, wherein the program code when executed by a processor performs the method of any of claims 1-7.
CN202011255652.1A 2020-11-11 2020-11-11 Audio and video timestamp processing method and device, electronic equipment and storage medium Active CN112423075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011255652.1A CN112423075B (en) 2020-11-11 2020-11-11 Audio and video timestamp processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112423075A true CN112423075A (en) 2021-02-26
CN112423075B CN112423075B (en) 2022-09-16

Family

ID=74781498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011255652.1A Active CN112423075B (en) 2020-11-11 2020-11-11 Audio and video timestamp processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112423075B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113132672A (en) * 2021-03-24 2021-07-16 联想(北京)有限公司 Data processing method and video conference equipment
CN114339350A (en) * 2021-12-30 2022-04-12 Oppo广东移动通信有限公司 Data processing method and device and electronic equipment
CN114339454A (en) * 2022-03-11 2022-04-12 浙江大华技术股份有限公司 Audio and video synchronization method and device, electronic device and storage medium
CN114979739A (en) * 2022-05-25 2022-08-30 赵燕武 Audio processing method and system in video communication
CN115471780A (en) * 2022-11-11 2022-12-13 荣耀终端有限公司 Method and device for testing sound-picture time delay
CN116437134A (en) * 2023-06-13 2023-07-14 中国人民解放军军事科学院系统工程研究院 Method and device for detecting audio and video synchronicity

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109600564A (en) * 2018-08-01 2019-04-09 北京微播视界科技有限公司 Method and apparatus for determining timestamp
CN109963184A (en) * 2017-12-14 2019-07-02 阿里巴巴集团控股有限公司 A kind of method, apparatus and electronic equipment of audio-video network broadcasting
CN111540103A (en) * 2019-09-12 2020-08-14 扬州盛世云信息科技有限公司 Embedded voice video talkback face recognition access control system based on timestamp synchronization method
CN111757158A (en) * 2020-06-29 2020-10-09 北京百度网讯科技有限公司 Audio and video synchronous playing method, device, equipment and storage medium



Also Published As

Publication number Publication date
CN112423075B (en) 2022-09-16

Similar Documents

Publication Publication Date Title
CN112423075B (en) Audio and video timestamp processing method and device, electronic equipment and storage medium
CN110430441B (en) Cloud mobile phone video acquisition method, system, device and storage medium
US9928844B2 (en) Method and system of audio quality and latency adjustment for audio processing by using audio feedback
US20210281718A1 (en) Video Processing Method, Electronic Device and Storage Medium
TWI513320B (en) Video conferencing device and lip synchronization method thereof
CN109101216B (en) Sound effect adjusting method and device, electronic equipment and storage medium
US20230144483A1 (en) Method for encoding video data, device, and storage medium
CN111404882B (en) Media stream processing method and device
JP7409963B2 (en) Computing system with trigger feature based on channel change
CN113301342B (en) Video coding method, network live broadcasting method, device and terminal equipment
CN110933485A (en) Video subtitle generating method, system, device and storage medium
CN112423074B (en) Audio and video synchronization processing method and device, electronic equipment and storage medium
CN112929712A (en) Video code rate adjusting method and device
CN114339454A (en) Audio and video synchronization method and device, electronic device and storage medium
CN111182302B (en) Video image encoding method, terminal device, and storage medium
EP2814259A1 (en) Method, system, capturing device and synchronization server for enabling synchronization of rendering of multiple content parts, using a reference rendering timeline
CN115904281A (en) Cloud desktop conference sharing method, server and computer readable storage medium
CN103517044A (en) Video conference apparatus and lip synchronization method
CN112201264A (en) Audio processing method and device, electronic equipment, server and storage medium
CN113810629B (en) Video frame processing method and device for multimedia signal of fusion platform
CN116962742A (en) Live video image data transmission method, device and live video system
CN113810725A (en) Video processing method, device, storage medium and video communication terminal
CN110730408A (en) Audio parameter switching method and device, electronic equipment and storage medium
CN113727183B (en) Live push method, apparatus, device, storage medium and computer program product
CN111601157B (en) Audio output method and display device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant