WO2020024945A1 - Method and apparatus for determining timestamps - Google Patents

Method and apparatus for determining timestamps - Download PDF

Info

Publication number
WO2020024945A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
time
video data
determining
transmission ready
Prior art date
Application number
PCT/CN2019/098431
Other languages
English (en)
French (fr)
Inventor
施磊
Original Assignee
北京微播视界科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京微播视界科技有限公司
Publication of WO2020024945A1

Links

Images

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00: Details of television systems
    • H04N 5/76: Television signal recording
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/4302: Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/4307: Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85: Assembly of content; Generation of multimedia applications
    • H04N 21/854: Content authoring
    • H04N 21/8547: Content authoring involving timestamps for synchronizing content

Definitions

  • The embodiments of the present application relate to the field of computer technology, for example, to a method and an apparatus for determining a timestamp.
  • When recording a soundtrack video, audio (soundtrack) playback is usually performed at the same time as video capture with the camera. For example, while a song is playing, the user's singing is recorded, and the recorded video uses the song as background music. In applications with video recording capabilities, it is common for the audio and video of a recorded soundtrack video to be out of sync. Taking Android devices as an example, because there are large differences between devices and fragmentation is severe, it is difficult to achieve audio-video synchronization of the recording on different devices.
  • When recording a soundtrack video, related methods usually determine the timestamp of a frame based on the frame's acquisition time in the video data. For example, the acquisition time of the first frame is taken as the start time (that is, time 0), the interval between two adjacent frames in the video data is assumed to be fixed, and the sum of the previous frame's timestamp and this interval is determined as the timestamp of the current frame.
  • The embodiments of the present application provide a method and a device for determining a timestamp.
  • An embodiment of the present application provides a method for determining a timestamp.
  • The method includes: collecting video data and playing target audio data, where the video data includes multiple frames; obtaining the acquisition time and transmission ready time of at least one frame of the video data, and determining the frame delay duration of the video data based on the acquired acquisition time and transmission ready time; and, for each frame in the video data, determining the amount of target audio data that had been played when the frame was collected, and determining the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame.
  • An embodiment of the present application further provides a device for determining a timestamp.
  • The device includes: a collection unit, configured to collect video data and play target audio data, where the video data includes multiple frames; a first determining unit, configured to obtain the acquisition time and transmission ready time of at least one frame in the video data, and to determine the frame delay duration of the video data based on the acquired acquisition time and transmission ready time; and a second determining unit, configured to determine, for each frame in the video data, the amount of target audio data that had been played when the frame was collected, and to determine the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame.
  • An embodiment of the present application further provides a terminal device, including: one or more processors; and a storage device configured to store one or more programs, where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for determining a timestamp provided by any embodiment.
  • An embodiment of the present application further provides a computer-readable medium.
  • A computer program is stored on the computer-readable medium, and when the program is executed by a processor, the method for determining a timestamp provided by any embodiment is implemented.
  • FIG. 1 is a system architecture diagram provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a method for determining a time stamp according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of an application scenario of a method for determining a timestamp according to an embodiment of the present application
  • FIG. 4 is a flowchart of another method for determining a time stamp according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a device for determining a time stamp according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a computer system of a terminal device according to an embodiment of the present application.
  • FIG. 1 is a system architecture diagram provided by an embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 to which the method or device for determining a time stamp of the present application can be applied.
  • As shown in FIG. 1, the system architecture 100 may include a terminal device 101, a terminal device 102, a terminal device 103, a network 104, and a server 105.
  • The network 104 is used to provide a medium for communication links between the terminal devices 101, 102, and 103 and the server 105.
  • The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
  • The user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages (such as audio and video data upload requests and audio data acquisition requests).
  • A variety of communication client applications can be installed on the terminal devices 101, 102, and 103, such as video recording applications, audio playback applications, instant messaging tools, email clients, social platform software, and so on.
  • The terminal devices 101, 102, and 103 may be hardware or software.
  • When the terminal devices 101, 102, and 103 are hardware, they can be various electronic devices that have display screens and support video recording and audio playback, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like.
  • When the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above.
  • The terminal devices 101, 102, and 103 may be implemented as multiple software programs or software modules (for example, to provide distributed services), or as a single software program or software module. This is not specifically limited here.
  • The terminal devices 101, 102, and 103 may be equipped with an image acquisition device (such as a camera) to collect video data.
  • In this embodiment, the minimum visual unit constituting a video is a frame. Each frame is a static image, and combining a sequence of temporally consecutive frames forms a dynamic video.
  • The terminal devices 101, 102, and 103 may also be provided with a device configured to convert electrical signals into sound (such as a speaker) to play sound.
  • In this embodiment, audio data is data obtained by performing analog-to-digital conversion (ADC) on an analog audio signal at a certain frequency.
  • The playback of audio data is the process of performing digital-to-analog conversion on a digital audio signal, restoring it to an analog audio signal, and converting the analog audio signal (an electrical signal) into sound for output.
  • The terminal devices 101, 102, and 103 can collect video data using the image acquisition devices installed on them, and can play audio data using their installed audio processing components that support audio playback (for example, converting digital audio signals to analog audio signals) together with speakers.
  • In addition, the terminal devices 101, 102, and 103 may perform processing such as timestamp calculation on the collected video data, and finally store the processing results (for example, the video data including timestamps and the played audio data).
  • The server 105 may be a server that provides various services.
  • For example, the server 105 may be a background server that supports the video recording applications installed on the terminal devices 101, 102, and 103.
  • The background server can parse and store received data such as audio and video data upload requests.
  • The background server can also receive audio and video data acquisition requests sent by the terminal devices 101, 102, and 103, and feed the audio and video data indicated by each request back to the terminal devices 101, 102, and 103.
  • The server 105 may be hardware or software.
  • When the server 105 is hardware, it can be implemented as a distributed server cluster composed of multiple servers or as a single server.
  • When the server 105 is software, it may be implemented as multiple software programs or software modules (for example, to provide distributed services), or as a single software program or software module. This is not specifically limited here.
  • The method for determining a timestamp provided in the embodiments of the present application is generally executed by the terminal devices 101, 102, and 103. Accordingly, the device for determining a timestamp is generally disposed in the terminal devices 101, 102, and 103.
  • It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
  • FIG. 2 is a flowchart of a method for determining a time stamp according to an embodiment of the present application.
  • The method for determining a timestamp includes the following steps.
  • Step 2010: Collect video data and play target audio data.
  • In this embodiment, the execution subject of the method for determining a timestamp (for example, the terminal devices 101, 102, and 103 shown in FIG. 1) may obtain and store the target audio data in advance.
  • The target audio data may be audio data specified in advance by the user as the soundtrack of the video, for example, the audio data corresponding to a specified song.
  • In an embodiment, audio data is data obtained by digitizing a sound signal.
  • The digitization of a sound signal is the process of converting a continuous analog audio signal into a digital audio signal at a certain frequency to obtain audio data.
  • Typically, the digitization of a sound signal includes three steps: sampling, quantization, and encoding.
  • Sampling refers to replacing a signal that is continuous in time with a sequence of signal sample values taken at regular time intervals.
  • Quantization refers to approximating the continuously varying amplitude with a finite set of amplitude values, converting the continuous amplitude of the analog signal into a finite number of discrete values at certain time intervals.
  • Encoding means representing the quantized discrete values as binary digits according to a certain rule.
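To make the three digitization steps concrete, the following sketch samples, quantizes, and encodes one second of a sine tone into PCM. It is illustrative only; the 440 Hz tone, the 44100 Hz sampling rate, and the 16-bit little-endian layout are assumptions for the example, not values specified by this application.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcmExample {
    // Digitize one second of a 440 Hz sine tone into 16-bit PCM.
    public static byte[] digitize() {
        int sampleRate = 44100;                       // sampling: 44100 samples per second
        ByteBuffer pcm = ByteBuffer.allocate(sampleRate * 2)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (int n = 0; n < sampleRate; n++) {
            double amplitude = Math.sin(2 * Math.PI * 440 * n / sampleRate);
            short quantized = (short) (amplitude * Short.MAX_VALUE); // quantization to 16 bits
            pcm.putShort(quantized);                  // encoding as binary PCM
        }
        return pcm.array();
    }
}
```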
  • Pulse Code Modulation (PCM) produces digital audio data by sampling, quantizing, and encoding an analog audio signal. Therefore, the target audio data may be a data stream in the PCM encoding format, and the file recording the target audio data may be in the wav format.
  • In an embodiment, the file recording the target audio data may also be in other formats, such as the mp3 format, the ape format, and the like.
  • In an embodiment, the target audio data may be data in other encoding formats (for example, lossy compression formats such as Advanced Audio Coding (AAC)) and is not limited to the PCM encoding format.
  • In that case, the execution subject may convert the format of the target audio data file into the wav format.
  • The target audio data in the converted file is then a data stream in the PCM encoding format.
  • In this embodiment, the playback of audio data may be the process of performing digital-to-analog conversion on the digitized audio data, restoring it to an analog audio signal, and then converting the analog audio signal (an electrical signal) into sound for output.
  • In this embodiment, the execution subject may be equipped with an image acquisition device, such as a camera.
  • The execution subject may use the camera to collect video data.
  • The video data can be described in terms of frames.
  • A frame is the smallest visual unit that makes up a video.
  • Each frame is a static image, and combining a sequence of temporally consecutive frames forms a dynamic video.
  • In an embodiment, the execution subject may also be provided with a device for converting electrical signals into sound, such as a speaker. After obtaining the target audio data, the execution subject may turn on the camera to collect video data and, at the same time, convert the target audio data into an analog audio signal and output sound through the speaker, thereby playing the target audio data.
  • In this embodiment, the execution subject may play the target audio data in any manner.
  • As an example, the execution subject may play the target audio data based on a class for playing data streams in the PCM encoding format (for example, the AudioTrack class in the Android development kit). Before playback, this class may be called and instantiated in advance to create a target object for playing the target audio data.
  • When playing the target audio data, a streaming method (for example, transmitting a fixed amount of data per unit time) may be used to transmit the target audio data to the target object, so that the target object plays it.
  • AudioTrack in the Android development kit is a class that manages and plays a single audio resource. AudioTrack can be used to play PCM audio streams. Typically, audio data is played by pushing it to an object instantiated from AudioTrack. AudioTrack objects can operate in two modes: static mode and streaming mode. In streaming mode, a continuous PCM-encoded data stream is written (by calling the write method) to the AudioTrack object. In the above implementation, the target audio data can be written in streaming mode. In an embodiment, the execution subject may also use other components or tools that support audio playback to play the target audio data; playback is not limited to the foregoing manner.
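A minimal, non-authoritative sketch of the streaming mode described above follows. The PCM source stream, sample rate, and channel layout are assumptions for illustration, and the deprecated six-argument AudioTrack constructor is used for brevity.

```java
import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;
import java.io.IOException;
import java.io.InputStream;

public class PcmPlayer {
    // Stream 16-bit stereo PCM at 44100 Hz from pcmStream to an AudioTrack.
    public static void play(InputStream pcmStream) throws IOException {
        int sampleRate = 44100;
        int minBuf = AudioTrack.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
        // Instantiate the class to create the target object for playback.
        AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
                AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
                minBuf, AudioTrack.MODE_STREAM);
        track.play();
        byte[] buffer = new byte[minBuf];
        int read;
        // Streaming mode: write a continuous PCM-encoded stream to the object.
        while ((read = pcmStream.read(buffer)) > 0) {
            track.write(buffer, 0, read);
        }
        track.stop();
        track.release();
    }
}
```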
  • In an embodiment, a video recording application may be installed on the execution subject.
  • This video recording application can support the recording of soundtrack videos.
  • A soundtrack video is a video for which audio data is played while the video data is being collected.
  • The sound in the recorded soundtrack video is the sound corresponding to that audio data.
  • For example, while a song is playing, the user's singing is recorded, and the recorded video uses the song as background music.
  • The video recording application can support both continuous recording and segmented recording of soundtrack videos.
  • In the case of segmented recording, the user can first tap the record button to record the first segment; tapping the record button again triggers a pause-recording instruction; tapping it once more triggers a resume-recording instruction to record the second segment; tapping again triggers another pause-recording instruction; and so on.
  • In an embodiment, the recording, pause-recording, and resume-recording instructions may be triggered in other ways. For example, each segment may be recorded by long-pressing the record button, with the pause-recording instruction triggered when the button is released. Details are not repeated here.
  • Step 2020: Obtain the acquisition time and transmission ready time of at least one frame in the video data, and determine the frame delay duration of the video data based on the acquired acquisition time and transmission ready time.
  • In this embodiment, when the image acquisition device installed on the execution subject captures a frame of video data, the execution subject may record the acquisition time of that frame.
  • The acquisition time of a frame may be the system timestamp (for example, a Unix timestamp) at the moment the image acquisition device captures the frame.
  • In this embodiment, a timestamp is complete, verifiable data that can show that a piece of data already existed before a particular time.
  • Typically, a timestamp is a sequence of characters that uniquely identifies a moment in time.
  • After each frame is collected by the image acquisition device, the frame needs to be transmitted to the application layer so that the application layer can process it. After the frame is transmitted to the application layer, the execution subject can record the frame's transmission ready time.
  • In this embodiment, the transmission ready time of each frame may be the system timestamp at the moment the frame is transmitted to the application layer.
  • Since the execution subject records the acquisition time and transmission ready time of each frame in the collected video data, it can obtain the acquisition time and transmission ready time of at least one frame of the video data directly from local storage.
  • In this embodiment, the at least one frame may be one or more randomly obtained frames, or may be all frames in the collected video data. This is not limited here.
  • After obtaining the acquisition time and transmission ready time of the at least one frame, the execution subject may determine the frame delay duration of the video data based on the acquired acquisition time and transmission ready time.
  • In this embodiment, multiple methods may be used to determine the delay duration.
  • As an example, the number of frames in the at least one frame may first be determined, and different methods may be used for different counts. In an embodiment, if the count is 1, the difference between that frame's transmission ready time and acquisition time can be directly determined as the frame delay duration of the video data.
  • If the count is greater than 1, the difference between the transmission ready time and the acquisition time of each of the at least one frame may be determined first; then, the average of the determined differences is determined as the frame delay duration of the video data.
  • As another example, if the count is not greater than a preset value (for example, 3), the difference between the transmission ready time and the acquisition time of each of the at least one frame may be determined first; then, the average of the determined differences is determined as the frame delay duration of the video data.
  • If the count is greater than the preset value, the difference between the transmission ready time and the acquisition time of each of the at least one frame may be determined first; then, the maximum and minimum values may be removed from the determined differences; finally, the average of the remaining differences is determined as the frame delay duration of the video data.
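The counting logic above can be summarized in a short helper. This sketch is illustrative; the method name, the millisecond units, and the trimming threshold are assumptions rather than anything fixed by this application.

```java
import java.util.Arrays;

public class DelayEstimator {
    // Frame delay duration from per-frame (acquisition, transmission ready)
    // time pairs, in milliseconds.
    public static long delayMs(long[] acquisitionMs, long[] readyMs, int presetCount) {
        int n = acquisitionMs.length;
        if (n == 1) {
            return readyMs[0] - acquisitionMs[0]; // single frame: direct difference
        }
        long[] diffs = new long[n];
        for (int i = 0; i < n; i++) {
            diffs[i] = readyMs[i] - acquisitionMs[i];
        }
        int from = 0, to = n;
        if (n > presetCount && n > 2) {           // more frames than the preset value:
            Arrays.sort(diffs);                   // drop the maximum and minimum first
            from = 1;
            to = n - 1;
        }
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += diffs[i];
        }
        return sum / (to - from);                 // average of the (remaining) differences
    }
}
```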
  • In an embodiment, the execution subject may determine the transmission ready time of a frame as follows. First, a first preset interface (such as the updateTexImage() interface) may be called to obtain one frame of the collected video data; the first preset interface may be used to obtain a collected frame, and in an embodiment it can obtain frames collected by the image acquisition device. Then, in response to obtaining the frame, a second preset interface (such as the getTimestamp() interface) may be called to obtain the current timestamp, and that timestamp is determined as the frame's transmission ready time; the second preset interface may be used to obtain a timestamp. In an embodiment, after the frame is obtained, the timestamp obtained through the second preset interface is the system timestamp at which the frame was transmitted to the application layer.
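A sketch of how this might look with the named interfaces is below. Two assumptions: in the actual Android API, SurfaceTexture.getTimestamp() returns the timestamp of the frame set by the most recent updateTexImage() call rather than the current time, so System.nanoTime() stands in here for "the current timestamp"; and updateTexImage() is assumed to be called on a thread with a current OpenGL context. The recordFrameTimes() helper is hypothetical.

```java
import android.graphics.SurfaceTexture;

// Illustrative only: record a capture-side timestamp and a transmission
// ready time for each camera frame delivered through a SurfaceTexture.
void watchFrameTimes(SurfaceTexture surfaceTexture) {
    surfaceTexture.setOnFrameAvailableListener(st -> {
        st.updateTexImage();                      // "first preset interface": fetch the collected frame
        long acquisitionNs = st.getTimestamp();   // timestamp associated with the fetched frame
        long readyNs = System.nanoTime();         // stand-in for "the current timestamp"
        recordFrameTimes(acquisitionNs, readyNs); // hypothetical bookkeeping helper
    });
}
```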
  • In an embodiment, the execution subject may determine the delay duration as follows. First, the acquisition time and transmission ready time of at least one frame in the video data may be obtained. Then, for each of the at least one frame, the difference between that frame's transmission ready time and acquisition time is determined. Finally, the average of the determined differences may be determined as the frame delay duration of the video data.
  • In an embodiment, the acquisition time and transmission ready time obtained by the execution subject may include the acquisition time and transmission ready time of the first frame in the video data.
  • In this case, the execution subject may determine the difference between the first frame's transmission ready time and its acquisition time as the frame delay duration of the video data.
  • In an embodiment, the acquisition time and transmission ready time obtained by the execution subject may include the acquisition times and transmission ready times of multiple target frames in the video data.
  • In an embodiment, the multiple target frames may be two or more pre-designated frames, for example, the first three frames of the video data, or the first and last frames of the video data.
  • In an embodiment, the multiple target frames may also be two or more randomly selected frames in the collected video data.
  • After obtaining the acquisition times and transmission ready times of the multiple target frames, the execution subject may first determine the average of the acquisition times of the multiple target frames as a first average. Then, the average of the transmission ready times of the multiple target frames may be determined as a second average. Finally, the difference between the second average and the first average may be determined as the frame delay duration of the video data.
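Because the mean of differences equals the difference of means, this variant can be sketched compactly. Method and parameter names here are assumptions for illustration.

```java
// Illustrative helper: delay duration from multiple target frames,
// computed as mean(transmission ready times) - mean(acquisition times).
static long delayFromTargetFramesMs(long[] acquisitionMs, long[] readyMs) {
    long acqSum = 0;
    long readySum = 0;
    for (long t : acquisitionMs) acqSum += t;   // numerator of the first average
    for (long t : readyMs) readySum += t;       // numerator of the second average
    return readySum / readyMs.length - acqSum / acquisitionMs.length;
}
```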
  • In an embodiment, after determining the delay duration, the execution subject may further determine whether the delay duration is less than a preset delay duration threshold (for example, 0). In response to determining that the delay duration is less than the preset delay duration threshold, the delay duration is set to a preset value, where the preset value is not less than the preset delay duration threshold.
  • Step 2030: For each frame in the video data, determine the amount of target audio data that had been played when the frame was collected, and determine the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame.
  • In this embodiment, for each frame in the video data, the execution subject may first read the frame's acquisition time. Then, the amount of target audio data that had been played by that acquisition time can be determined. In an embodiment, the execution subject may determine the amount of target audio data that had been transmitted to the target object when the frame was acquired, and may take that amount as the amount of target audio data that had been played when the frame was acquired.
  • Since the target audio data is obtained by sampling and quantizing the sound signal at a set sampling rate and a set sampling size, and the number of channels used for playback is predetermined, the playback duration of the target audio data at the time a frame is acquired can be calculated from the amount of target audio data that had been played when the frame was collected, together with the sampling rate, sampling size, and number of channels.
  • The execution subject may determine the difference between that playback duration and the delay duration as the timestamp of the frame.
  • In an embodiment, the sampling frequency is also referred to as the sampling speed or sampling rate.
  • The sampling frequency is the number of samples extracted per second from the continuous signal to form the discrete signal.
  • The sampling frequency can be expressed in hertz (Hz).
  • The sampling size can be expressed in bits.
  • In an embodiment, the playback duration is determined as follows: first, the product of the sampling frequency, the sampling size, and the number of channels is determined; then, the ratio of the amount of target audio data that has been played to that product is determined as the playback duration of the target audio data.
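As a worked example under assumed parameters (16-bit stereo PCM at 44100 Hz, which is not mandated by this application): 44100 samples/s x 2 bytes/sample x 2 channels = 176400 bytes per second, so 352800 bytes of played data correspond to a playback duration of 2000 ms. In code:

```java
// Illustrative only: playback duration implied by the amount of PCM data
// already played, and the resulting frame timestamp (playback - delay).
static long frameTimestampMs(long playedBytes, int sampleRateHz,
                             int bytesPerSample, int channels, long delayMs) {
    long bytesPerSecond = (long) sampleRateHz * bytesPerSample * channels;
    long playbackMs = playedBytes * 1000L / bytesPerSecond;
    return playbackMs - delayMs;
}

// Example: frameTimestampMs(352800, 44100, 2, 2, 30) == 2000 - 30 == 1970 ms.
```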
  • In an embodiment, the execution subject may further take the target audio data that had been played when the last frame of the video data was collected as the target audio data interval, and extract that interval.
  • In an embodiment, the execution subject may first obtain the acquisition time of the last frame of the collected video data. Then, the amount of target audio data that had been played by that acquisition time can be determined. After that, the target audio data may be truncated from its playback start position according to that data amount, and the truncated data extracted as the target audio data interval. After the target audio data interval is extracted, the video data containing the timestamps and the target audio data interval can be stored.
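A minimal sketch of the truncation follows, assuming the target audio data is available as a PCM byte array; the array form and helper name are assumptions for illustration.

```java
import java.util.Arrays;

// Illustrative only: extract the interval of the target audio data that
// had been played by the time the last frame was collected, by truncating
// from the start of playback up to the played data amount.
static byte[] extractPlayedInterval(byte[] targetAudio, long playedBytes) {
    int end = (int) Math.min(playedBytes, targetAudio.length);
    return Arrays.copyOfRange(targetAudio, 0, end);
}
```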
  • In an embodiment, the target audio data interval and the video data including the timestamps may be stored in two separate files, with a mapping established between the two files. In an embodiment, the target audio data interval and the video data including the timestamps may also be stored in the same file.
  • In an embodiment, the execution subject may store the target audio data interval and the video data including the timestamps as follows: first, the video data including the timestamps may be encoded; after that, the target audio data interval and the encoded video data are stored in the same file.
  • In this embodiment, video encoding may refer to converting a file in one video format into a file in another video format through a specific compression technology. It should be noted that video encoding is a well-known technology that has been widely studied and applied, and it is not repeated here.
  • In an embodiment, after storing the target audio data interval and the video data including the timestamps, the execution subject may further upload the stored data to a server.
  • FIG. 3 is a schematic diagram of an application scenario of a method for determining a timestamp provided by an embodiment of the present application.
  • In the application scenario of FIG. 3, a user holds a terminal device 301 and records a soundtrack video.
  • A short video recording application runs on the terminal device 301.
  • The user first selects a soundtrack (such as the song "Little Apple") in the interface of the short video recording application.
  • The terminal device 301 then obtains the target audio data 302 corresponding to the soundtrack.
  • After the user taps the soundtrack video recording button, the terminal device 301 turns on the camera to collect video data 303 and, at the same time, plays the target audio data 302.
  • After that, the terminal device 301 may obtain the acquisition time and transmission ready time of at least one frame of the video data 303, and determine the frame delay duration of the video data based on the acquired acquisition time and transmission ready time. Finally, for each frame in the video data, the terminal device 301 may determine the amount of target audio data that had been played when the frame was collected, and determine the difference between the playback duration corresponding to that data amount and the delay duration as the frame's timestamp.
  • The method provided by the foregoing embodiment of the present application collects video data while playing target audio data, determines the frame delay duration of the video data based on the acquisition time and transmission ready time of at least one frame in the video data, and then, for each frame in the video data, determines the amount of target audio data that had been played when the frame was collected and takes the difference between the corresponding playback duration and the delay duration as the frame's timestamp.
  • In this way, the timestamp of a frame is determined from how much of the target audio data had been played at the moment the frame was collected, and the determined timestamp eliminates the frame's delay from acquisition to transmission readiness, improving the accuracy of frame timestamps in the video data and the audio-video synchronization of the recorded soundtrack video.
  • FIG. 4 is a flowchart of another method for determining a timestamp provided by an embodiment of the present application.
  • The method provided by this embodiment includes the following steps.
  • Step 4010: Collect video data and play target audio data.
  • In this embodiment, the execution subject of the method for determining a timestamp (for example, the terminal devices 101, 102, and 103 shown in FIG. 1) may collect video data using its installed camera and, at the same time, play the target audio data using a preset audio processing component.
  • In an embodiment, the target audio data may be a data stream in the PCM encoding format.
  • The target audio data can be played in the following manner: first, a target class (such as the AudioTrack class in the Android development kit) is instantiated to create a target object for playing the target audio data.
  • In an embodiment, the target class may be used to play data streams in the PCM encoding format.
  • After that, the target audio data may be transmitted to the target object in a streaming manner, so that the target object plays the target audio data.
  • Step 4020: Obtain the acquisition time and transmission ready time of the first frame in the video data.
  • In this embodiment, when the image acquisition device installed on the execution subject captures a frame of video data, the execution subject may record the frame's acquisition time. After the first frame of the video data is transmitted to the application layer, the transmission ready time of the first frame can be recorded. Since the execution subject records the acquisition time and transmission ready time of each frame in the captured video data, it can obtain the first frame's acquisition time and transmission ready time directly from local storage.
  • Step 4030: Determine the difference between the transmission ready time and the acquisition time as the frame delay duration of the video data.
  • In this embodiment, the execution subject may determine the difference between the transmission ready time and the acquisition time as the frame delay duration of the video data.
  • Step 4040: In response to determining that the delay duration is less than a preset delay duration threshold, set the delay duration to a preset value.
  • In this embodiment, the execution subject may determine whether the delay duration is less than a preset delay duration threshold (for example, 0). In response to determining that the delay duration is less than the preset delay duration threshold, the delay duration may be set to a preset value.
  • The preset value is not less than the preset delay duration threshold.
  • In this embodiment, the preset value may be a value specified by a technician after statistics and analysis on a large amount of data.
  • Step 4050: For each frame in the video data, determine the amount of target audio data that had been played when the frame was collected, and determine the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame.
  • In this embodiment, for each frame in the collected video data, the execution subject may first read the frame's acquisition time. Then, the amount of target audio data that had been transmitted to the target object when the frame was acquired can be determined and taken as the amount of target audio data that had been played when the frame was acquired. After that, the playback duration corresponding to that data amount can be determined. Finally, the difference between the playback duration and the delay duration can be determined as the frame's timestamp.
  • In an embodiment, the playback duration is determined as follows: first, the product of the sampling frequency, the sampling size, and the number of channels is determined; then, the ratio of the amount of target audio data that has been played to that product is determined as the playback duration of the target audio data.
  • Step 4060: Take the target audio data that had been played when the last frame of the video data was collected as the target audio data interval, and extract that interval.
  • In this embodiment, the execution subject may first obtain the acquisition time of the last frame of the collected video data (that is, the final frame in the video data). Then, the amount of target audio data that had been played by that acquisition time can be determined. After that, the target audio data may be truncated from its playback start position according to that data amount, and the truncated data extracted as the target audio data interval.
  • Step 4070: Store the video data containing the timestamps and the target audio data interval.
  • In this embodiment, the execution subject may store the video data including the timestamps and the target audio data interval.
  • In an embodiment, the target audio data interval and the video data including the timestamps can be stored in two separate files, with a mapping established between the two files.
  • In an embodiment, the target audio data interval and the video data including the timestamps may also be stored in the same file.
  • As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the method for determining a timestamp in this embodiment highlights the step of determining the delay duration based on the acquisition time and transmission ready time of the first frame of the video data; the solution described in this embodiment can therefore reduce the amount of computation and improve data processing efficiency. It also highlights the steps of extracting the target audio data interval and storing the audio and video data, so the solution described in this embodiment can record a soundtrack video and save the recorded data.
  • FIG. 5 is a schematic structural diagram of a device for determining a time stamp according to an embodiment of the present application.
  • An embodiment of the present application provides a device for determining a timestamp. This device embodiment corresponds to the method embodiment shown in FIG. 2.
  • The device can be applied to various electronic devices.
  • As shown in FIG. 5, the apparatus 500 for determining a timestamp in this embodiment includes: a collection unit 501, configured to collect video data and play target audio data; a first determining unit 502, configured to obtain the acquisition time and transmission ready time of at least one frame in the video data and to determine the frame delay duration of the video data based on the acquired acquisition time and transmission ready time; and a second determining unit 503, configured to determine, for each frame in the video data, the amount of target audio data that had been played when the frame was collected, and to determine the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame.
  • In an embodiment, the first determining unit 502 may include a first obtaining module, a first determining module, and a second determining module (not shown in the figure).
  • The first obtaining module may be configured to obtain the acquisition time and transmission ready time of at least one frame in the video data.
  • The first determining module may be configured to determine, for each of the at least one frame, the difference between that frame's transmission ready time and acquisition time.
  • The second determining module may be configured to determine the average of the determined differences as the frame delay duration of the video data.
  • In an embodiment, the at least one frame may include the first frame.
  • The first determining unit 502 may include a second obtaining module and a third determining module (not shown in the figure).
  • The second obtaining module may be configured to obtain the acquisition time and transmission ready time of the first frame in the video data.
  • The third determining module may be configured to determine the difference between the transmission ready time and the acquisition time as the frame delay duration of the video data.
  • In an embodiment, the at least one frame may include multiple target frames.
  • The first determining unit 502 may include a third obtaining module, a fourth determining module, and a fifth determining module (not shown in the figure).
  • The third obtaining module may be configured to obtain the acquisition times and transmission ready times of multiple target frames in the video data.
  • The fourth determining module may be configured to determine the average of the acquisition times of the multiple target frames as a first average, and to determine the average of the transmission ready times of the multiple target frames as a second average.
  • The fifth determining module may be configured to determine the difference between the second average and the first average as the frame delay duration of the video data.
  • In an embodiment, the transmission ready time of a frame may be obtained as follows: a first preset interface is called to obtain one frame of the collected video data, where the first preset interface is used to obtain a collected frame; in response to obtaining the frame, a second preset interface is called to obtain the current timestamp, and that timestamp is determined as the frame's transmission ready time, where the second preset interface is used to obtain a timestamp.
  • In an embodiment, the device may further include a setting unit (not shown in the figure).
  • The setting unit may be configured to set the delay duration to a preset value in response to determining that the delay duration is less than a preset delay duration threshold, where the preset value is not less than the preset delay duration threshold.
  • In an embodiment, the device may further include an extraction unit and a storage unit (not shown in the figure).
  • The extraction unit may be configured to take the target audio data that had been played when the last frame of the video data was collected as the target audio data interval, and to extract that interval.
  • The storage unit may be configured to store the video data including the timestamps and the target audio data interval.
  • In the device provided by the foregoing embodiment of the present application, the collection unit 501 collects video data and plays target audio data; the first determining unit 502 determines the frame delay duration of the video data based on the acquisition time and transmission ready time of at least one frame in the video data; and, for each frame in the video data, the second determining unit 503 determines the amount of target audio data that had been played when the frame was collected and determines the difference between the corresponding playback duration and the delay duration as the frame's timestamp.
  • The timestamp of a frame is thus determined from how much of the target audio data had been played at the moment of frame acquisition, and the determined timestamp eliminates the frame's delay from acquisition to transmission readiness, improving the accuracy of frame timestamps in the video data and the audio-video synchronization of the recorded soundtrack video.
  • FIG. 6 is a schematic structural diagram of a computer system of a terminal device according to an embodiment of the present application.
  • The terminal device/server shown in FIG. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • The computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603.
  • The RAM 603 also stores various programs and data required for the operation of the computer system 600.
  • The CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • The following components are connected to the I/O interface 605: an input portion 606 including a touch screen, a touch panel, and the like; an output portion 607 including a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a local area network (LAN) card or a modem.
  • The communication portion 609 performs communication processing via a network such as the Internet.
  • A drive 610 is also connected to the I/O interface 605 as needed.
  • A removable medium 611, such as a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from the removable medium 611 can be installed into the storage portion 608 as needed.
  • In particular, the process described above with reference to the flowchart may be implemented as a computer software program.
  • For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart.
  • The computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611.
  • The computer-readable medium described in this application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two.
  • The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • The computer-readable storage medium may include, but is not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • The computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, the data signal carrying computer-readable program code.
  • Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless links, wires, optical cables, radio frequency (RF), or any suitable combination of the foregoing.
  • Each block in the flowchart or block diagrams may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function.
  • Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • The units described in the embodiments of the present application may be implemented by software or hardware.
  • The described units may also be provided in a processor; for example, a processor may be described as including a collection unit, a first determining unit, and a second determining unit.
  • The names of these units do not, in some cases, constitute a limitation on the units themselves.
  • For example, the collection unit can also be described as a "unit that collects video data and plays target audio data".
  • As another aspect, the present application also provides a computer-readable medium, which may be included in the device described in the foregoing embodiments, or may exist separately without being assembled into the device.
  • The computer-readable medium carries one or more programs, and when the one or more programs are executed by the device, the device is caused to: collect video data and play target audio data; obtain the acquisition time and transmission ready time of at least one frame of the video data, and determine the frame delay duration of the video data based on the acquired acquisition time and transmission ready time; and, for each frame in the video data, determine the amount of target audio data that had been played when the frame was collected, and determine the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The embodiments of the present application disclose a method and apparatus for determining timestamps. The method includes: collecting video data and playing target audio data, the video data including multiple frames; obtaining the acquisition time and transmission ready time of at least one frame in the video data, and determining the frame delay duration of the video data based on the acquired acquisition time and transmission ready time; and, for each frame in the video data, determining the amount of target audio data that had been played when the frame was collected, and determining the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame.

Description

Method and apparatus for determining timestamps
This application claims priority to Chinese Patent Application No. 201810866765.1, filed with the Chinese Patent Office on August 1, 2018, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The embodiments of the present application relate to the field of computer technology, for example, to a method and apparatus for determining timestamps.
BACKGROUND
When recording a soundtrack video, audio (soundtrack) playback is usually performed at the same time as video capture with the camera. For example, while a song is playing, the user's singing is recorded, and the recorded video uses the song as background music. In applications with video recording capabilities, it is common for the audio and video of a recorded soundtrack video to be out of sync. Taking Android devices as an example, because there are large differences between devices and fragmentation is severe, it is quite difficult to achieve audio-video synchronization of the recording on different devices.
When recording a soundtrack video, related methods usually determine the timestamp of a frame based on the frame's acquisition time in the video data. For example, the acquisition time of the first frame is taken as the start time (that is, time 0), the interval between two adjacent frames in the video data is assumed to be fixed, and the sum of the previous frame's timestamp and this interval is determined as the timestamp of the current frame.
SUMMARY
The embodiments of the present application provide a method and apparatus for determining timestamps.
An embodiment of the present application provides a method for determining a timestamp, the method including: collecting video data and playing target audio data, where the video data includes multiple frames; obtaining the acquisition time and transmission ready time of at least one frame in the video data, and determining the frame delay duration of the video data based on the acquired acquisition time and transmission ready time; and, for each frame in the video data, determining the amount of target audio data that had been played when the frame was collected, and determining the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame.
An embodiment of the present application further provides an apparatus for determining a timestamp, the apparatus including: a collection unit, configured to collect video data and play target audio data, where the video data includes multiple frames; a first determining unit, configured to obtain the acquisition time and transmission ready time of at least one frame in the video data, and to determine the frame delay duration of the video data based on the acquired acquisition time and transmission ready time; and a second determining unit, configured to determine, for each frame in the video data, the amount of target audio data that had been played when the frame was collected, and to determine the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame.
An embodiment of the present application further provides a terminal device, including: one or more processors; and a storage device configured to store one or more programs, where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for determining a timestamp provided by any embodiment.
An embodiment of the present application further provides a computer-readable medium storing a computer program, where the program, when executed by a processor, implements the method for determining a timestamp provided by any embodiment.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a system architecture diagram provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for determining a timestamp provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for determining a timestamp provided by an embodiment of the present application;
FIG. 4 is a flowchart of another method for determining a timestamp provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for determining a timestamp provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of the computer system of a terminal device provided by an embodiment of the present application.
DETAILED DESCRIPTION
The present application is described below with reference to the drawings and embodiments. It can be understood that the specific embodiments described here are only used to explain the present application, not to limit it. It should also be noted that, for ease of description, only the parts related to the present application are shown in the drawings.
The present application will be described below with reference to the drawings and in combination with the embodiments.
FIG. 1 is a system architecture diagram provided by an embodiment of the present application. FIG. 1 shows an exemplary system architecture 100 to which the method for determining a timestamp or the apparatus for determining a timestamp of the present application can be applied.
As shown in FIG. 1, the system architecture 100 may include a terminal device 101, a terminal device 102, a terminal device 103, a network 104, and a server 105. The network 104 is used to provide a medium for communication links between the terminal devices 101, 102, and 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
The user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages (such as audio and video data upload requests and audio data acquisition requests). A variety of communication client applications can be installed on the terminal devices 101, 102, and 103, such as video recording applications, audio playback applications, instant messaging tools, email clients, social platform software, and so on.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they can be various electronic devices that have display screens and support video recording and audio playback, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above. They may be implemented as multiple software programs or software modules (for example, to provide distributed services), or as a single software program or software module. This is not specifically limited here.
The terminal devices 101, 102, and 103 may be equipped with an image acquisition device (such as a camera) to collect video data. In this embodiment, the minimum visual unit constituting a video is a frame. Each frame is a static image, and combining a sequence of temporally consecutive frames forms a dynamic video. In an embodiment, the terminal devices 101, 102, and 103 may also be provided with a device configured to convert electrical signals into sound (such as a speaker) to play sound. In this embodiment, audio data is data obtained by performing analog-to-digital conversion (ADC) on an analog audio signal at a certain frequency. The playback of audio data is the process of performing digital-to-analog conversion on a digital audio signal, restoring it to an analog audio signal, and converting the analog audio signal (an electrical signal) into sound for output.
The terminal devices 101, 102, and 103 can collect video data using the image acquisition devices installed on them, and can play audio data using their installed audio processing components that support audio playback (for example, converting digital audio signals to analog audio signals) together with speakers. In addition, the terminal devices 101, 102, and 103 may perform processing such as timestamp calculation on the collected video data, and finally store the processing results (for example, the video data including timestamps and the played audio data).
The server 105 may be a server that provides various services, for example, a background server that supports the video recording applications installed on the terminal devices 101, 102, and 103. The background server can parse and store received data such as audio and video data upload requests. The background server can also receive audio and video data acquisition requests sent by the terminal devices 101, 102, and 103, and feed the audio and video data indicated by each request back to the terminal devices 101, 102, and 103.
In an embodiment, the server 105 may be hardware or software. When the server 105 is hardware, it can be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server 105 is software, it may be implemented as multiple software programs or software modules (for example, to provide distributed services), or as a single software program or software module. This is not specifically limited here.
In this embodiment, the method for determining a timestamp provided by the embodiments of the present application is generally executed by the terminal devices 101, 102, and 103; accordingly, the apparatus for determining a timestamp is generally disposed in the terminal devices 101, 102, and 103.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
Continuing to refer to FIG. 2, FIG. 2 is a flowchart of a method for determining a timestamp provided by an embodiment of the present application. The method for determining a timestamp includes the following steps.
Step 2010: Collect video data and play target audio data.
In this embodiment, the execution subject of the method for determining a timestamp (for example, the terminal devices 101, 102, and 103 shown in FIG. 1) may obtain and store the target audio data in advance. In an embodiment, the target audio data may be audio data specified in advance by the user as the soundtrack of the video, for example, the audio data corresponding to a specified song.
In an embodiment, audio data is data obtained by digitizing a sound signal. The digitization of a sound signal is the process of converting a continuous analog audio signal into a digital audio signal at a certain frequency to obtain audio data. Typically, the digitization of a sound signal includes three steps: sampling, quantization, and encoding. Sampling refers to replacing a signal that is continuous in time with a sequence of signal sample values taken at regular time intervals. Quantization refers to approximating the continuously varying amplitude with a finite set of amplitude values, converting the continuous amplitude of the analog signal into a finite number of discrete values at certain time intervals. Encoding means representing the quantized discrete values as binary digits according to a certain rule. In an embodiment, Pulse Code Modulation (PCM) produces digital audio data by sampling, quantizing, and encoding an analog audio signal. Therefore, the target audio data may be a data stream in the PCM encoding format, and the file recording the target audio data may be in the wav format. In an embodiment, the file recording the target audio data may also be in other formats, such as the mp3 format, the ape format, and the like. In an embodiment, the target audio data may be data in other encoding formats (for example, lossy compression formats such as Advanced Audio Coding (AAC)) and is not limited to the PCM encoding format. The execution subject may convert the format of the target audio data file into the wav format; the target audio data in the converted file is then a data stream in the PCM encoding format.
In this embodiment, the playback of audio data may be the process of performing digital-to-analog conversion on the digitized audio data, restoring it to an analog audio signal, and then converting the analog audio signal (an electrical signal) into sound for output.
In this embodiment, the execution subject may be equipped with an image acquisition device, such as a camera, and may use the camera to collect video data. In this embodiment, the video data can be described in terms of frames. A frame is the smallest visual unit that makes up a video; each frame is a static image, and combining a sequence of temporally consecutive frames forms a dynamic video. In an embodiment, the execution subject may also be provided with a device for converting electrical signals into sound, such as a speaker. After obtaining the target audio data, the execution subject may turn on the camera to collect video data and, at the same time, convert the target audio data into an analog audio signal and output sound through the speaker, thereby playing the target audio data.
In this embodiment, the execution subject may play the target audio data in any manner. As an example, the execution subject may play the target audio data based on a class for playing data streams in the PCM encoding format (for example, the AudioTrack class in the Android development kit). Before playback, this class may be called and instantiated in advance to create a target object for playing the target audio data. When playing the target audio data, a streaming method (for example, transmitting a fixed amount of data per unit time) may be used to transmit the target audio data to the target object, so that the target object plays it.
AudioTrack in the Android development kit is a class that manages and plays a single audio resource. AudioTrack can be used to play PCM audio streams. Typically, audio data is played by pushing it to an object instantiated from AudioTrack. AudioTrack objects can operate in two modes: static mode and streaming mode. In streaming mode, a continuous PCM-encoded data stream is written (by calling the write method) to the AudioTrack object. In the above implementation, the target audio data can be written in streaming mode. In an embodiment, the execution subject may also use other components or tools that support audio playback to play the target audio data; playback is not limited to the foregoing manner.
In an embodiment, a video recording application may be installed on the execution subject. This video recording application can support the recording of soundtrack videos. A soundtrack video is a video for which audio data is played while the video data is being collected; the sound in the recorded soundtrack video is the sound corresponding to that audio data. For example, while a song is playing, the user's singing is recorded, and the recorded video uses the song as background music. The video recording application can support both continuous recording and segmented recording of soundtrack videos. In the case of segmented recording, the user can first tap the record button to record the first segment; tapping the record button again triggers a pause-recording instruction; tapping it once more triggers a resume-recording instruction to record the second segment; tapping again triggers another pause-recording instruction; and so on. In an embodiment, the recording, pause-recording, and resume-recording instructions may be triggered in other ways; for example, each segment may be recorded by long-pressing the record button, with the pause-recording instruction triggered when the button is released. Details are not repeated here.
Step 2020: Obtain the acquisition time and transmission ready time of at least one frame in the video data, and determine the frame delay duration of the video data based on the acquired acquisition time and transmission ready time.
In this embodiment, when the image acquisition device installed on the execution subject captures a frame of video data, the execution subject may record the acquisition time of that frame. The acquisition time of a frame may be the system timestamp (for example, a Unix timestamp) at the moment the image acquisition device captures the frame. In this embodiment, a timestamp is complete, verifiable data that can show that a piece of data already existed before a particular time. Typically, a timestamp is a sequence of characters that uniquely identifies a moment in time.
After each frame is collected by the image acquisition device, the frame needs to be transmitted to the application layer so that the application layer can process it. After the frame is transmitted to the application layer, the execution subject may record the frame's transmission ready time. In this embodiment, the transmission ready time of each frame may be the system timestamp at the moment the frame is transmitted to the application layer.
Since the execution subject records the acquisition time and transmission ready time of each frame in the collected video data, it can obtain the acquisition time and transmission ready time of at least one frame of the video data directly from local storage. In this embodiment, the at least one frame may be one or more randomly obtained frames, or all frames in the collected video data. This is not limited here.
In this embodiment, after obtaining the acquisition time and transmission ready time of the at least one frame, the execution subject may determine the frame delay duration of the video data based on them. In this embodiment, multiple methods may be used to determine the delay duration. As an example, the number of frames in the at least one frame may first be determined, and different methods may be used for different counts. In an embodiment, if the count is 1, the difference between that frame's transmission ready time and acquisition time may be directly determined as the frame delay duration of the video data. If the count is greater than 1, the difference between the transmission ready time and the acquisition time of each frame may be determined first, and the average of the determined differences is then determined as the frame delay duration. As another example, if the count is not greater than a preset value (for example, 3), the differences may be determined first and their average taken as the frame delay duration; if the count is greater than the preset value, the differences may be determined first, the maximum and minimum values removed from them, and the average of the remaining differences determined as the frame delay duration of the video data.
In an embodiment, the execution subject may determine the transmission ready time of a frame as follows: first, a first preset interface (such as the updateTexImage() interface) may be called to obtain one frame of the collected video data, where the first preset interface may be used to obtain a collected frame; in an embodiment, the first preset interface can obtain frames collected by the image acquisition device. Then, in response to obtaining the frame, a second preset interface (such as the getTimestamp() interface) may be called to obtain the current timestamp, and that timestamp is determined as the frame's transmission ready time, where the second preset interface may be used to obtain a timestamp. In an embodiment, after the frame is obtained, the timestamp obtained through the second preset interface is the system timestamp at which the frame was transmitted to the application layer.
In an embodiment, the execution subject may determine the delay duration as follows: first, the acquisition time and transmission ready time of at least one frame in the video data may be obtained; then, for each of the at least one frame, the difference between that frame's transmission ready time and acquisition time is determined; finally, the average of the determined differences may be determined as the frame delay duration of the video data.
In an embodiment, the acquisition time and transmission ready time obtained by the execution subject may include the acquisition time and transmission ready time of the first frame in the video data. In this case, the execution subject may determine the difference between the first frame's transmission ready time and its acquisition time as the frame delay duration of the video data.
In an embodiment, the acquisition time and transmission ready time obtained by the execution subject may include the acquisition times and transmission ready times of multiple target frames in the video data. In an embodiment, the multiple target frames may be two or more pre-designated frames, for example, the first three frames of the video data, or the first and last frames of the video data. In an embodiment, the multiple target frames may also be two or more randomly selected frames in the collected video data. After obtaining the acquisition times and transmission ready times of the multiple target frames, the execution subject may first determine the average of the acquisition times as a first average, then determine the average of the transmission ready times as a second average, and finally determine the difference between the second average and the first average as the frame delay duration of the video data.
In an embodiment, after determining the delay duration, the execution subject may further determine whether the delay duration is less than a preset delay duration threshold (for example, 0). In response to determining that the delay duration is less than the preset delay duration threshold, the delay duration is set to a preset value, where the preset value is not less than the preset delay duration threshold.
步骤2030,对于上述视频数据中的每帧,确定采集到所述每帧时已播放的目标音频数据的数据量,将上述数据量对应的播放时长与上述延迟时长的差值确定为所述每帧的时间戳。
在本实施例中,对于上述视频数据中的每帧,上述执行主体可以首先读取该帧的采集时间。而后,可以确定在该采集时间时,已播放的目标音频数据的数据量。在一实施例中,上述执行主体可以确定采集到该帧时已传输至上述目标对象的目标音频数据的数据量,可以将上述数据量确定为采集到该帧时已播放的目标音频数据的数据量。
在一实施例中,由于目标音频数据是按照设定的采样频率(Sampling Rate)、设定的采样大小(Sampling Size)对声音信号进行采样、量化等操作后得到的,并且播放目标音频数据的声道数是预先确定的,因此,可以基于采集某帧图像时已播放的目标音频数据的数据量,以及采样频率、采样大小和声道数,计算出采集到该帧时目标音频数据的播放时长。上述执行主体可以将该播放时长与上述延迟时长的差值确定为该帧的时间戳。在一实施例中,采样频率也称为采样速度或者采样率。采样频率可以是每秒从连续信号中提取并组成离散信号的采样个数。采样频率可以用赫兹(Hz)来表示。采样大小可以用比特(bit)来表示。在一实施例中,确定播放时长的步骤如下:首先,可以确定采样频率、采样大小和声道数三者的乘积。而后,可以将已播放的目标音频数据的数据量与该乘积的比值确定为目标音频数据的播放时长。
在一实施例中,上述执行主体还可以将采集到视频数据的尾帧时已播放的目标音频数据作为目标音频数据区间,提取目标音频数据区间。在一实施例中,上述执行主体可以首先获取所采集到的视频数据的尾帧的采集时间。而后,可以确定该采集时间时已播放的目标音频数据的数据量。之后,可以按照该数据量,从目标音频数据的播放的起始位置对目标音频数据进行截取,将所截取的数据作为目标音频数据区间进行提取。在提取目标频数据区间之后,可以将包含时间戳的视频数据和目标音频数据区间进行存储。在一实施例中,可以将上述目标音频数据区间和包含时间戳的视频数据分别存储至两个文件中,并建立上述两个文件的映射。在一实施例中,也可以将上述目标音频数据区间和包含时间戳的视频数据存储至同一个文件中。
In an embodiment, the execution body may store the target audio data interval and the video data containing the timestamps as follows: first, the video data containing the timestamps may be encoded; then, the target audio data interval and the encoded video data are stored in the same file. In this embodiment, video encoding may refer to converting a file in one video format into a file in another video format through a specific compression technique. It should be noted that video encoding is a widely studied and applied technique and is not described in detail here.
In an embodiment, after storing the target audio data interval and the video data containing the timestamps, the execution body may further upload the stored data to a server.
Continuing to refer to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of a method for determining a timestamp provided by an embodiment of the present application. In the application scenario of FIG. 3, a user holds a terminal device 301 to record a soundtrack video. A short-video recording application runs on the terminal device 301. In the interface of the short-video recording application, the user first selects a soundtrack (for example, the song "Little Apple"). The terminal device 301 then acquires the target audio data 302 corresponding to the soundtrack. After the user taps the soundtrack video record button, the terminal device 301 turns on the camera to capture video data 303 and, at the same time, plays the target audio data 302. The terminal device 301 may then acquire the capture time and the transmission ready time of at least one frame of the video data 303, and determine the frame delay duration of the video data based on the acquired capture time and transmission ready time. Finally, for each frame of the video data, the terminal device 301 may determine the amount of target audio data that has been played by the time the frame is captured, and determine the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame.
In the method provided by the above embodiment of the present application, video data is captured while target audio data is played; the frame delay duration of the video data is then determined based on the capture time and the transmission ready time of at least one frame of the video data; finally, for each frame of the video data, the amount of target audio data that has been played by the time the frame is captured is determined, and the difference between the playback duration corresponding to that data amount and the delay duration is determined as the timestamp of the frame. Thus, when a frame is captured, its timestamp can be determined from the amount of target audio data that has been played at the capture moment, and the determined timestamp eliminates the delay of the frame from capture to transmission readiness. This improves the accuracy of the timestamps of the frames of the video data and improves the audio-video synchronization of the recorded soundtrack video.
Referring to FIG. 4, FIG. 4 is a flowchart of another method for determining a timestamp provided by an embodiment of the present application. The method provided by this embodiment includes the following steps.
Step 4010: capture video data and play target audio data.
In this embodiment, the execution body of the method for determining a timestamp (for example, the terminal device 101, the terminal device 102 or the terminal device 103 shown in FIG. 1) may capture video data with its installed camera and, at the same time, play the target audio data with a preset audio processing component.
In an embodiment, the target audio data may be a data stream in the PCM encoding format, and the target audio data may be played as follows. First, a target class (for example, the AudioTrack class in the Android development kit) is instantiated to create a target object for playing the target audio data; in an embodiment, the target class may be used to play data streams in the PCM encoding format. Then, the target audio data may be transmitted to the target object in a streaming manner, so that the target object plays the target audio data.
Step 4020: acquire the capture time and the transmission ready time of the first frame of the video data.
In this embodiment, when the image capture apparatus installed on the execution body captures a frame of the video data, the execution body may record the capture time of the frame. After the first frame of the video data is transmitted to the application layer, the transmission ready time of the first frame may be recorded. Since the capture time and the transmission ready time of every frame of the captured video data may be recorded on the execution body, the execution body may directly acquire the capture time and the transmission ready time of the first frame of the video data locally.
Step 4030: determine the difference between the transmission ready time and the capture time as the frame delay duration of the video data.
In this embodiment, the execution body may determine the difference between the transmission ready time and the capture time as the frame delay duration of the video data.
Step 4040: in response to determining that the delay duration is less than a preset delay duration threshold, set the delay duration to a preset value.
In this embodiment, the execution body may determine whether the delay duration is less than a preset delay duration threshold (for example, 0). In response to determining that the delay duration is less than the preset delay duration threshold, the delay duration may be set to a preset value, where the preset value is not less than the preset delay duration threshold. In this embodiment, the preset value may be a value specified by technicians after statistical analysis of a large amount of data.
Step 4050: for each frame of the video data, determine the amount of target audio data that has been played by the time the frame is captured, and determine the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame.
In this embodiment, for each frame of the captured video data, the execution body may first read the capture time of the frame, then determine the amount of target audio data that has been transmitted to the target object by the time the frame is captured and determine that amount as the amount of target audio data that has been played by the time the frame is captured, then determine the playback duration corresponding to that data amount, and finally determine the difference between the playback duration and the delay duration as the timestamp of the frame. In an embodiment, the playback duration is determined as follows: first, the product of the sampling rate, the sampling size and the number of channels may be determined; then, the ratio of the amount of played target audio data to this product may be determined as the playback duration of the target audio data.
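Putting steps 4040 and 4050 together, a frame's timestamp might be computed as sketched below. The sketch reuses the hypothetical playedDurationMs() helper from the earlier example, models the threshold step as a simple substitution of the preset value, and uses illustrative parameter names throughout.

```java
/**
 * Frame timestamp per steps 4040/4050: playback duration of the audio played
 * by the capture moment, minus the delay duration (replaced by the preset
 * value when it falls below the preset threshold).
 */
static long frameTimestampMs(long playedBytesAtCapture, int sampleRateHz,
                             int sampleSizeBits, int channelCount,
                             long delayMs, long delayThresholdMs, long presetDelayMs) {
    long effectiveDelay = (delayMs < delayThresholdMs) ? presetDelayMs : delayMs;
    long playedMs = playedDurationMs(playedBytesAtCapture, sampleRateHz,
                                     sampleSizeBits, channelCount);
    return playedMs - effectiveDelay;
}
```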
Step 4060: take the target audio data that has been played by the time the tail frame of the video data is captured as a target audio data interval, and extract the target audio data interval.
In this embodiment, the execution body may first acquire the capture time of the tail frame (that is, the last frame) of the video data, then determine the amount of target audio data that has been played at that capture time, and then cut the target audio data from its playback start position according to that data amount, extracting the cut data as the target audio data interval.
Step 4070: store the video data containing the timestamps and the target audio data interval.
In this embodiment, the execution body may store the video data containing the timestamps and the target audio data interval. In an embodiment, the target audio data interval and the video data containing the timestamps may be stored in two separate files, and a mapping between the two files may be established. In an embodiment, the target audio data interval and the video data containing the timestamps may also be stored in the same file.
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the method for determining a timestamp in this embodiment embodies the step of determining the delay duration based on the capture time and the transmission ready time of the first frame of the video data, so the solution described in this embodiment can reduce the amount of data computation and improve data processing efficiency. It further embodies the step of extracting the target audio data interval and the step of storing the audio and video data, so that the solution described in this embodiment can record a soundtrack video and save the recorded data.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an apparatus for determining a timestamp provided by an embodiment of the present application. As an implementation of the methods shown in the above figures, an embodiment of the present application provides an apparatus for determining a timestamp. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to a variety of electronic devices.
As shown in FIG. 5, the apparatus 500 for determining a timestamp of this embodiment includes: a capture unit 501 configured to capture video data and play target audio data; a first determination unit 502 configured to acquire the capture time and the transmission ready time of at least one frame of the video data and determine the frame delay duration of the video data based on the acquired capture time and transmission ready time; and a second determination unit 503 configured to, for each frame of the video data, determine the amount of target audio data that has been played by the time the frame is captured, and determine the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame.
In an embodiment, the first determination unit 502 may include a first acquisition module, a first determination module and a second determination module (not shown in the figure). The first acquisition module may be configured to acquire the capture time and the transmission ready time of at least one frame of the video data. The first determination module may be configured to determine, for each of the at least one frame, the difference between the transmission ready time and the capture time of the frame. The second determination module may be configured to determine the average of the at least one determined difference as the frame delay duration of the video data.
In an embodiment, the at least one frame may include the first frame. The first determination unit 502 may include a second acquisition module and a third determination module (not shown in the figure). The second acquisition module may be configured to acquire the capture time and the transmission ready time of the first frame of the video data. The third determination module may be configured to determine the difference between the transmission ready time and the capture time as the frame delay duration of the video data.
In an embodiment, the at least one frame may include multiple target frames. The first determination unit 502 may include a third acquisition module, a fourth determination module and a fifth determination module (not shown in the figure). The third acquisition module may be configured to acquire the capture times and the transmission ready times of multiple target frames of the video data. The fourth determination module may be configured to determine the average of the capture times of the multiple target frames as a first average and the average of the transmission ready times of the multiple target frames as a second average. The fifth determination module may be configured to determine the difference between the second average and the first average as the frame delay duration of the video data.
In an embodiment, the transmission ready time of a frame may be acquired as follows: a first preset interface is called to obtain a frame of the captured video data, where the first preset interface is used to obtain a captured frame; in response to a frame being obtained, a second preset interface is called to obtain the current timestamp, and the current timestamp is determined as the transmission ready time of the frame, where the second preset interface is used to obtain a timestamp.
In an embodiment, the apparatus may further include a setting unit (not shown in the figure). The setting unit may be configured to, in response to determining that the delay duration is less than a preset delay duration threshold, set the delay duration to a preset value, where the preset value is not less than the preset delay duration threshold. In an embodiment, the apparatus may further include an extraction unit and a storage unit (not shown in the figure). The extraction unit may be configured to take the target audio data that has been played by the time the tail frame of the video data is captured as a target audio data interval and extract the target audio data interval. The storage unit may be configured to store the video data containing the timestamps and the target audio data interval.
In the apparatus provided by the above embodiment of the present application, the capture unit 501 captures video data and plays target audio data; the first determination unit 502 then determines the frame delay duration of the video data based on the capture time and the transmission ready time of at least one frame of the video data; finally, for each frame of the video data, the second determination unit 503 determines the amount of target audio data that has been played by the time the frame is captured, and determines the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame. Thus, when a frame is captured, its timestamp can be determined from the amount of target audio data that has been played at the capture moment, and the determined timestamp eliminates the delay of the frame from capture to transmission readiness, which improves the accuracy of the timestamps of the frames of the video data and improves the audio-video synchronization of the recorded soundtrack video.
Referring now to FIG. 6, FIG. 6 is a schematic structural diagram of a computer system of a terminal device provided by an embodiment of the present application. The terminal device/server shown in FIG. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform a variety of appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores a variety of programs and data required for the operation of the computer system 600. The CPU 601, the ROM 602 and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a touch screen, a touch pad and the like; an output portion 607 including a liquid crystal display (LCD), a speaker and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a local area network (LAN) card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from the removable medium 611 by the drive 610 is installed into the storage portion 608 as needed.
According to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the CPU 601, the above functions defined in the method of the present application are executed. In an embodiment, the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. The computer-readable storage medium may include, but is not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, the computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in combination with an instruction execution system, apparatus or device. In the present application, the computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, where the data signal carries computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and it may send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: wireless, wire, optical cable, radio frequency (RF) and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions and operations of the systems, methods and computer program products according to embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a part of code, and the module, the program segment or the part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that executes the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including a capture unit, a first determination unit and a second determination unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the capture unit may also be described as "a unit that captures video data and plays target audio data".
In another aspect, the present application further provides a computer-readable medium. The computer-readable medium may be included in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: capture video data and play target audio data; acquire the capture time and the transmission ready time of at least one frame of the video data, and determine the frame delay duration of the video data based on the acquired capture time and transmission ready time; and, for each frame of the video data, determine the amount of target audio data that has been played by the time the frame is captured, and determine the difference between the playback duration corresponding to that data amount and the delay duration as the timestamp of the frame.

Claims (16)

  1. A method for determining a timestamp, comprising:
    capturing video data and playing target audio data, wherein the video data comprises multiple frames;
    acquiring a capture time and a transmission ready time of at least one frame of the video data, and determining a frame delay duration of the video data based on the acquired capture time and transmission ready time;
    for each frame of the video data, determining an amount of target audio data that has been played by the time said each frame is captured, and determining a difference between a playback duration corresponding to the data amount and the delay duration as a timestamp of said each frame.
  2. The method according to claim 1, wherein the acquiring a capture time and a transmission ready time of at least one frame of the video data, and determining a frame delay duration of the video data based on the acquired capture time and transmission ready time, comprises:
    acquiring the capture time and the transmission ready time of the at least one frame of the video data;
    for each of the at least one frame, determining a difference between the transmission ready time of said each frame and the capture time of said each frame;
    determining an average of the at least one determined difference as the frame delay duration of the video data.
  3. The method according to claim 1, wherein the at least one frame comprises a first frame; and
    the acquiring a capture time and a transmission ready time of at least one frame of the video data, and determining a frame delay duration of the video data based on the acquired capture time and transmission ready time, comprises:
    acquiring the capture time and the transmission ready time of the first frame of the video data;
    determining a difference between the transmission ready time and the capture time as the frame delay duration of the video data.
  4. The method according to claim 1, wherein the at least one frame comprises multiple target frames; and
    the acquiring a capture time and a transmission ready time of at least one frame of the video data, and determining a frame delay duration of the video data based on the acquired capture time and transmission ready time, comprises:
    acquiring capture times and transmission ready times of multiple target frames of the video data;
    determining an average of the capture times of the multiple target frames as a first average, and determining an average of the transmission ready times of the multiple target frames as a second average;
    determining a difference between the second average and the first average as the frame delay duration of the video data.
  5. The method according to claim 1, wherein a transmission ready time of a frame is acquired as follows:
    calling a first preset interface to obtain a frame of the captured video data, wherein the first preset interface is used to obtain a captured frame;
    in response to the frame being obtained, calling a second preset interface to obtain a current timestamp, and determining the current timestamp as the transmission ready time of the frame, wherein the second preset interface is used to obtain a timestamp.
  6. The method according to claim 1, further comprising, after the determining the frame delay duration of the video data:
    in response to determining that the delay duration is less than a preset delay duration threshold, setting the delay duration to a preset value, wherein the preset value is not less than the preset delay duration threshold.
  7. The method according to claim 1, further comprising:
    taking target audio data that has been played by the time a tail frame of the video data is captured as a target audio data interval, and extracting the target audio data interval;
    storing the video data containing timestamps and the target audio data interval.
  8. An apparatus for determining a timestamp, comprising:
    a capture unit configured to capture video data and play target audio data, wherein the video data comprises multiple frames;
    a first determination unit configured to acquire a capture time and a transmission ready time of at least one frame of the video data, and determine a frame delay duration of the video data based on the acquired capture time and transmission ready time;
    a second determination unit configured to, for each frame of the video data, determine an amount of target audio data that has been played by the time said each frame is captured, and determine a difference between a playback duration corresponding to the data amount and the delay duration as a timestamp of said each frame.
  9. The apparatus according to claim 8, wherein the first determination unit comprises:
    a first acquisition module configured to acquire the capture time and the transmission ready time of the at least one frame of the video data;
    a first determination module configured to determine, for each of the at least one frame, a difference between the transmission ready time and the capture time of said each frame;
    a second determination module configured to determine an average of the differences corresponding to the at least one frame as the frame delay duration of the video data.
  10. The apparatus according to claim 8, wherein the at least one frame comprises a first frame; and
    the first determination unit comprises:
    a second acquisition module configured to acquire the capture time and the transmission ready time of the first frame of the video data;
    a third determination module configured to determine a difference between the transmission ready time and the capture time as the frame delay duration of the video data.
  11. The apparatus according to claim 8, wherein the at least one frame comprises multiple target frames; and
    the first determination unit comprises:
    a third acquisition module configured to acquire capture times and transmission ready times of multiple target frames of the video data;
    a fourth determination module configured to determine an average of the capture times of the multiple target frames as a first average, and determine an average of the transmission ready times of the multiple target frames as a second average;
    a fifth determination module configured to determine a difference between the second average and the first average as the frame delay duration of the video data.
  12. The apparatus according to claim 8, wherein a transmission ready time of a frame is acquired as follows:
    calling a first preset interface to obtain a frame of the captured video data, wherein the first preset interface is used to obtain the captured frame;
    in response to the frame being obtained, calling a second preset interface to obtain a current timestamp, and determining the current timestamp as the transmission ready time of the frame, wherein the second preset interface is used to obtain a timestamp.
  13. The apparatus according to claim 8, further comprising:
    a setting unit configured to, in response to determining that the delay duration is less than a preset delay duration threshold, set the delay duration to a preset value, wherein the preset value is not less than the preset delay duration threshold.
  14. The apparatus according to claim 8, further comprising:
    an extraction unit configured to take target audio data that has been played by the time a tail frame of the video data is captured as a target audio data interval, and extract the target audio data interval;
    a storage unit configured to store the video data containing timestamps and the target audio data interval.
  15. A terminal device, comprising:
    at least one processor; and
    a storage apparatus configured to store at least one program,
    wherein when the at least one program is executed by the at least one processor, the at least one program causes the at least one processor to implement the method according to any one of claims 1-7.
  16. A computer-readable medium storing a computer program, wherein when the computer program is executed by a processor, the method according to any one of claims 1-7 is implemented.
PCT/CN2019/098431 2018-08-01 2019-07-30 Method and apparatus for determining a timestamp WO2020024945A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810866765.1 2018-08-01
CN201810866765.1A CN109600564B (zh) 2018-08-01 2018-08-01 Method and apparatus for determining a timestamp

Publications (1)

Publication Number Publication Date
WO2020024945A1 (zh)

Family

ID=65956133

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/098431 WO2020024945A1 (zh) 2018-08-01 2019-07-30 Method and apparatus for determining a timestamp

Country Status (2)

Country Link
CN (1) CN109600564B (zh)
WO (1) WO2020024945A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109600564B (zh) 2018-08-01 2020-06-02 北京微播视界科技有限公司 Method and apparatus for determining a timestamp
CN110324643B (zh) * 2019-04-24 2021-02-02 网宿科技股份有限公司 Video recording method and system
TWI735890B (zh) * 2019-06-17 2021-08-11 瑞昱半導體股份有限公司 Audio playback system and method
CN110225279B (zh) * 2019-07-15 2022-08-16 北京小糖科技有限责任公司 Video production system and video production method for a mobile terminal
CN110381316B (zh) * 2019-07-17 2023-09-19 腾讯科技(深圳)有限公司 Video transmission control method, apparatus, device and storage medium
CN112423075B (zh) * 2020-11-11 2022-09-16 广州华多网络科技有限公司 Audio and video timestamp processing method, apparatus, electronic device and storage medium
CN112541472B (zh) * 2020-12-23 2023-11-24 北京百度网讯科技有限公司 Object detection method, apparatus and electronic device
CN114554269A (zh) * 2022-02-25 2022-05-27 深圳Tcl新技术有限公司 Data processing method, electronic device and computer-readable storage medium

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003169292A (ja) * 2001-11-30 2003-06-13 Victor Co Of Japan Ltd After-recording apparatus, computer program, recording medium, transmission method and playback apparatus
JP4375313B2 (ja) * 2005-09-16 2009-12-02 セイコーエプソン株式会社 Image and audio output system, image and audio data output apparatus, audio processing program, and recording medium
CN100499823C (zh) * 2006-02-15 2009-06-10 中国科学院声学研究所 Method for realizing synchronized playback of MXF video files and PCM audio files
CN100579238C (zh) * 2008-02-22 2010-01-06 上海华平信息技术股份有限公司 Method for synchronized playback of buffered audio and video
CN103208298A (zh) * 2012-01-11 2013-07-17 三星电子(中国)研发中心 Video capture method and system
US9154834B2 (en) * 2012-11-06 2015-10-06 Broadcom Corporation Fast switching of synchronized media using time-stamp management
US9892759B2 (en) * 2012-12-28 2018-02-13 Cbs Interactive Inc. Synchronized presentation of facets of a game event
CN103237191B (zh) * 2013-04-16 2016-04-06 成都飞视美视频技术有限公司 Method for synchronously pushing audio and video in a video conference
JP6287315B2 (ja) * 2014-02-20 2018-03-07 富士通株式会社 Video and audio synchronization apparatus, video and audio synchronization method, and computer program for video and audio synchronization
CN103905877A (zh) * 2014-03-13 2014-07-02 北京奇艺世纪科技有限公司 Method for playing audio and video data, smart television and mobile device
CN103888748B (zh) * 2014-03-24 2015-09-23 中国人民解放军国防科学技术大学 Video frame synchronization method for multi-viewpoint three-dimensional display systems
US10178281B2 (en) * 2014-07-28 2019-01-08 Starkey Laboratories, Inc. System and method for synchronizing audio and video signals for a listening system
CN105049917B (zh) * 2015-07-06 2018-12-07 深圳Tcl数字技术有限公司 Method and apparatus for recording audio-video synchronization timestamps
JP6720566B2 (ja) * 2016-02-17 2020-07-08 ヤマハ株式会社 Audio equipment
CN106658133B (zh) * 2016-10-26 2020-04-14 广州市百果园网络科技有限公司 Method and terminal for synchronized audio and video playback
CN107613357B (zh) * 2017-09-13 2020-05-19 广州酷狗计算机科技有限公司 Sound-picture synchronization optimization method, apparatus and readable storage medium
CN108282685A (zh) * 2018-01-04 2018-07-13 华南师范大学 Audio-video synchronization method and monitoring system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107517401A (zh) 2016-06-15 2017-12-26 成都鼎桥通信技术有限公司 Multimedia data playback method and apparatus
US20180041783A1 (en) 2016-08-05 2018-02-08 Alibaba Group Holding Limited Data processing method and live broadcasting method and device
CN106792073A (zh) 2016-12-29 2017-05-31 北京奇艺世纪科技有限公司 Method, playback device and system for synchronized cross-device playback of audio and video data
CN107509100A (zh) 2017-09-15 2017-12-22 深圳国微技术有限公司 Audio and video synchronization method, system, computer apparatus and computer-readable storage medium
CN107995503A (zh) 2017-11-07 2018-05-04 西安万像电子科技有限公司 Audio and video playback method and apparatus
CN109600564A (zh) 2018-08-01 2019-04-09 北京微播视界科技有限公司 Method and apparatus for determining a timestamp

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115065860A (zh) * 2022-07-01 2022-09-16 广州美录电子有限公司 Audio data processing method, apparatus, device and medium suitable for a stage
CN115065860B (zh) * 2022-07-01 2023-03-14 广州美录电子有限公司 Audio data processing method, apparatus, device and medium suitable for a stage

Also Published As

Publication number Publication date
CN109600564A (zh) 2019-04-09
CN109600564B (zh) 2020-06-02

Similar Documents

Publication Publication Date Title
WO2020024945A1 (zh) Method and apparatus for determining a timestamp
WO2020024962A1 (zh) Method and apparatus for processing data
WO2020024980A1 (zh) Method and apparatus for processing data
US11114133B2 (en) Video recording method and device
US20010029457A1 (en) System and method for automatic synchronization for multimedia presentations
CN109600661B (zh) Method and apparatus for recording video
WO2023125169A1 (zh) Audio processing method, apparatus, device and storage medium
WO2023024290A1 (zh) Video recording method, camera device, control terminal and video recording system
WO2019071808A1 (zh) Method, apparatus, system, terminal device and storage medium for displaying video pictures
WO2021169632A1 (zh) Video quality detection method, apparatus and computer device
WO2020024949A1 (zh) Method and apparatus for determining a timestamp
WO2020024960A1 (zh) Method and apparatus for processing data
CN111385576B (zh) Video encoding method, apparatus, mobile terminal and storage medium
CN109600660B (zh) Method and apparatus for recording video
CN111324576B (zh) Method, apparatus, storage medium and terminal device for saving recorded audio data
CN109618198A (zh) Live streaming content reporting method and apparatus, storage medium, and electronic device
CN109218849B (zh) Method, apparatus, device and storage medium for processing live streaming data
US11295726B2 (en) Synthetic narrowband data generation for narrowband automatic speech recognition systems
CN109600562B (zh) Method and apparatus for recording video
WO2020087788A1 (zh) Audio processing method and apparatus
CN114495941A (zh) Method, apparatus, electronic device and storage medium for converting single-channel audio to text
CN111145769A (zh) Audio processing method and apparatus
CN111899764B (zh) Audio monitoring method, apparatus, computer device and storage medium
CN115065852A (zh) Audio-picture synchronization method, apparatus, electronic device and readable storage medium
Xin et al. Live Signal Recording and Segmenting Solution Based on Cloud Architecture

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19843113

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.05.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19843113

Country of ref document: EP

Kind code of ref document: A1