WO2020024980A1 - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Info

Publication number
WO2020024980A1
WO2020024980A1 PCT/CN2019/098584 CN2019098584W WO2020024980A1 WO 2020024980 A1 WO2020024980 A1 WO 2020024980A1 CN 2019098584 W CN2019098584 W CN 2019098584W WO 2020024980 A1 WO2020024980 A1 WO 2020024980A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
audio data
time
audio
determining
Prior art date
Application number
PCT/CN2019/098584
Other languages
French (fr)
Chinese (zh)
Inventor
周驿
Original Assignee
北京微播视界科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京微播视界科技有限公司
Publication of WO2020024980A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 - Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4305 - Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 - Processing of audio elementary streams
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 - Assembly of content; Generation of multimedia applications
    • H04N21/854 - Content authoring
    • H04N21/8547 - Content authoring involving timestamps for synchronizing content

Definitions

  • Embodiments of the present disclosure relate to the field of computer technology, for example, to a method and an apparatus for processing data.
  • the interval time between two adjacent frames in audio data and video data is generally considered to be fixed.
  • the sum of the time stamp of the previous frame and the interval time is determined as the time stamp of the frame.
  • the time stamp is recorded in the recorded audio and video data.
  • the embodiments of the present disclosure provide a method and an apparatus for processing data.
  • An embodiment of the present disclosure provides a method for processing data, the method including: collecting audio and video data, the audio and video data including audio data and video data; for a frame in the video data, determining the collection time of the frame as the timestamp of the frame; using the first sampling time of the audio data as a start time and, for a frame in the audio data, determining the timestamp of the frame based on the start time, the total number of frames processed when processing of the frame is completed, a preset number of samples per frame, and a preset sampling frequency; and storing the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data.
  • an embodiment of the present disclosure provides an apparatus for processing data.
  • The apparatus includes: an acquisition unit configured to acquire audio and video data, the audio and video data including audio data and video data; a first determination unit configured to determine, for a frame in the video data, the collection time of the frame as the timestamp of the frame; a second determination unit configured to use the first sampling time of the audio data as a start time and, for a frame in the audio data, determine the timestamp of the frame based on the start time, the total number of frames processed when processing of the frame is completed, a preset number of samples per frame, and a preset sampling frequency; and a storage unit configured to store the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data.
  • An embodiment of the present disclosure provides a terminal device including: at least one processor; and a storage device storing at least one program, wherein, when the at least one program is executed by the at least one processor, the at least one processor implements any one of the methods for processing data.
  • an embodiment of the present disclosure provides a computer-readable medium having stored thereon a computer program that, when executed by a processor, implements the method as in any one of the methods for processing data.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for processing data according to the present disclosure;
  • FIG. 3 is a schematic diagram of an application scenario of a method for processing data according to the present disclosure;
  • FIG. 4 is a flowchart of still another embodiment of a method for processing data according to the present disclosure.
  • FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for processing data according to the present disclosure.
  • FIG. 6 is a schematic structural diagram of a terminal device computer system suitable for implementing the embodiments of the present disclosure.
  • FIG. 1 illustrates an exemplary system architecture 100 to which a method for processing data or a device for processing data of the present disclosure can be applied.
  • the system architecture 100 may include a terminal device 101, a terminal device 102, a terminal device 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium for a communication link between the terminal device 101, the terminal device 102, the terminal device 103, and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal device 101, the terminal device 102, and the terminal device 103 to interact with the server 105 through the network 104 to receive or send messages (such as audio and video data upload requests) and the like.
  • Various communication client applications can be installed on the terminal device 101, the terminal device 102, and the terminal device 103, such as video recording applications, audio playback applications, instant communication tools, email clients, social platform software, and the like.
  • the terminal device 101, the terminal device 102, and the terminal device 103 may be hardware or software.
  • When the terminal device 101, the terminal device 102, and the terminal device 103 are hardware, they can be various electronic devices that have a display screen and support audio and video recording, including but not limited to smartphones, tablets, laptops, and desktop computers.
  • When the terminal device 101, the terminal device 102, and the terminal device 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. This is not specifically limited here.
  • the terminal device 101, the terminal device 102, and the terminal device 103 may be equipped with an image acquisition device (such as a camera) to collect video data.
  • the smallest visual unit that makes up a video is a frame. Each frame is a static image. Combining a sequence of temporally consecutive frames together forms a dynamic video.
  • the terminal device 101, the terminal device 102, and the terminal device 103 may also be installed with an audio collection device (such as a microphone) to collect continuous analog audio signals.
  • the data obtained by performing analog-to-digital conversion (ADC) on a continuous analog audio signal from a device such as a microphone at a certain frequency is audio data.
  • the terminal device 101, the terminal device 102, and the terminal device 103 may use an image acquisition device and an audio acquisition device installed on the terminal device 101 to collect video data and audio data, respectively.
  • time stamp calculation and other processing may be performed on the collected video data, and finally the processing results (such as the collected audio data and video data including the time stamp) are stored.
  • the server 105 may be a server that provides various services, such as a background server that provides support for video recording applications installed on the terminal device 101, the terminal device 102, and the terminal device 103.
  • The background server can analyze and store received audio and video data upload requests and other data. It can also receive audio and video data acquisition requests sent by the terminal device 101, the terminal device 102, and the terminal device 103, and feed back the audio and video data indicated by such a request to the terminal device 101, the terminal device 102, and the terminal device 103.
  • the server may be hardware or software.
  • the server can be implemented as a distributed server cluster consisting of multiple servers or as a single server.
  • the server can be implemented as multiple software or software modules (for example, to provide distributed services), or it can be implemented as a single software or software module. It is not specifically limited here.
  • The method for processing data provided by the embodiments of the present disclosure is generally executed by the terminal device 101, the terminal device 102, and the terminal device 103. Accordingly, the data processing device is generally provided in the terminal device 101, the terminal device 102, and the terminal device 103.
  • terminal devices, networks, and servers in FIG. 1 are merely exemplary. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • the method for processing data includes steps 201 to 204.
  • In step 201, audio and video data are collected.
  • an execution subject of the method for processing data may be installed with an image acquisition device (such as a camera) and an audio signal acquisition device (such as a microphone).
  • the execution subject may turn on the image acquisition device and the audio signal acquisition device at the same time, and use the image acquisition device and the audio signal acquisition device to collect audio and video data.
  • the audio and video data includes audio data and video data.
  • video data can be described by frames.
  • a frame is the smallest visual unit that makes up a video.
  • Each frame is a static image.
  • Combining a sequence of temporally consecutive frames together forms a dynamic video.
  • audio data is data obtained by digitizing a sound signal.
  • the process of digitizing sound signals is a process of converting continuous analog audio signals from microphones and other equipment into digital signals at a certain frequency to obtain audio data.
  • the digitization process of sound signals usually includes three steps: sampling, quantization, and encoding.
  • sampling refers to replacing a signal that is continuous in time with a sequence of signal sample values at regular time intervals.
  • Quantization refers to the use of finite amplitude approximation to indicate the amplitude value that continuously changes in time, and changes the continuous amplitude of the analog signal into a finite number of discrete values with a certain time interval.
  • Encoding means that the quantized discrete value is represented by binary digits according to a certain rule.
  • The sampling frequency is also called the sampling speed or the sampling rate.
  • The sampling frequency is the number of samples taken per second from the continuous signal to form the discrete signal.
  • the sampling frequency can be expressed in Hertz (Hz).
  • the sample size can be expressed in bits.
  • Pulse Code Modulation (PCM)
  • the format of the file describing the target audio data may also be other formats, such as the mp3 format and the ape format.
  • the target audio data may be data of other encoding formats (for example, lossy compression formats such as AAC (Advanced Audio Coding)), and is not limited to the PCM encoding format.
  • The above-mentioned execution body may also perform format conversion on the file after obtaining it, converting it into a file in wav format.
  • The target audio data in the converted file is a data stream in PCM encoding format.
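As a rough illustration of the three digitization steps described above (sampling, quantization, encoding), the following Python sketch converts a continuous signal into 16-bit PCM bytes. The function name `digitize`, the 440 Hz test tone, the 44100 Hz sampling frequency, and the 16-bit little-endian sample size are illustrative assumptions, not values taken from the disclosure.

```python
import math
import struct


def digitize(signal, duration_s, sample_rate_hz):
    """Sample, quantize, and encode a continuous signal as 16-bit little-endian PCM.

    `signal` is a hypothetical callable mapping time in seconds to an
    amplitude in the range [-1.0, 1.0].
    """
    max_level = 2 ** 15 - 1                       # largest 16-bit signed level
    num_samples = round(duration_s * sample_rate_hz)
    pcm = bytearray()
    for n in range(num_samples):
        t = n / sample_rate_hz                    # sampling: regular time intervals
        amplitude = max(-1.0, min(1.0, signal(t)))
        level = round(amplitude * max_level)      # quantization: finite set of levels
        pcm += struct.pack("<h", level)           # encoding: 16-bit binary value
    return bytes(pcm)


def tone(t):
    """A 440 Hz sine wave standing in for the analog microphone signal."""
    return math.sin(2 * math.pi * 440.0 * t)


pcm_bytes = digitize(tone, duration_s=0.01, sample_rate_hz=44100)
```

Ten milliseconds at 44100 Hz gives 441 samples, so `pcm_bytes` holds 882 bytes of raw PCM data.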
  • a video recording application may be installed in the execution body.
  • This video recording application can support the recording of original sound video.
  • The above-mentioned original sound video may be a video that uses the original sound of the video as the background sound of the video.
  • The user can trigger the video recording instruction by clicking the video recording button in the running interface of the video recording application.
  • The execution subject may simultaneously turn on the image acquisition device and the audio acquisition device to record the original sound video.
  • In step 202, for a frame in the video data, the acquisition time of the frame is determined as the time stamp of the frame.
  • the above-mentioned execution subject may record the acquisition time when each frame of video data is acquired.
  • the collection time of each frame may be a system timestamp (such as a Unix timestamp) when the frame is collected.
  • the acquisition time of each frame can also adopt other timestamps, for example, relative timestamps relative to a specified time.
  • A timestamp is complete, verifiable data indicating that a piece of data already existed at a specific time.
  • a timestamp is a sequence of characters that uniquely identifies the time of a moment.
  • the execution body may determine the collection time of the frame as the time stamp of the frame.
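The two kinds of acquisition time mentioned above (a system timestamp such as a Unix timestamp, or a timestamp relative to a specified time) can be sketched as follows. The function names and the choice of milliseconds as the unit are assumptions made for illustration only.

```python
import time


def unix_timestamp_ms():
    """Return the current system time as a Unix timestamp in milliseconds."""
    return int(time.time() * 1000)


def relative_timestamp_ms(reference_ms, now_ms):
    """Return a timestamp relative to a specified reference time."""
    return now_ms - reference_ms


def stamp_video_frame(frame, acquisition_ms):
    """Attach the acquisition time to a video frame as its timestamp.

    `frame` is a hypothetical dict standing in for captured image data.
    """
    return {**frame, "timestamp_ms": acquisition_ms}


recording_start_ms = unix_timestamp_ms()
stamped = stamp_video_frame(
    {"index": 0},
    relative_timestamp_ms(recording_start_ms, recording_start_ms + 33),
)
```

Here the frame captured 33 ms after the reference time receives a relative timestamp of 33.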
  • In step 203, the first sampling time of the audio data is used as the start time, and, for a frame in the audio data, the timestamp of the frame is determined based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency.
  • the execution body may use the first sampling time of the audio data as the starting time.
  • the above start time may be the system time stamp of the first sampling of the audio data.
  • the above-mentioned start time may be a relative time stamp of the time of the first sampling of the audio data with respect to the specified time.
  • the above-mentioned execution body can perform various processing on the frame. For example, transparent transmission, reverberation, equalization, sound change, tone change, speed change and other processing can be performed.
  • The above-mentioned execution body may determine the time stamp of the frame based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency.
  • the processing of frames in the present disclosure refers to processing such as transparent transmission, reverberation, equalization, sound change, tone change, speed change, and the like.
  • the execution body may first determine the duration of each frame based on a preset number of samples per frame and a preset sampling frequency.
  • The duration of each frame is the ratio of the number of samples in each frame to the sampling frequency. Since the number of samples per frame and the sampling frequency are preset fixed values, the duration of each frame is a fixed value. Then, each time a frame is processed, the total number of frames currently processed (that is, the total number of frames processed when processing of the frame is completed) can be multiplied by the duration of each frame; the product is the total duration that the execution subject has processed by the time processing of the frame is completed. Finally, the sum of the start time and the total duration currently processed can be determined as the time stamp of the frame.
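The calculation described above amounts to timestamp = start time + (frames processed) × (samples per frame / sampling frequency). A minimal Python sketch follows; the function names, the use of milliseconds, and the example values of 480 samples per frame at 48000 Hz are assumptions chosen for illustration.

```python
def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Duration of one audio frame: samples per frame divided by sampling frequency."""
    return samples_per_frame * 1000.0 / sample_rate_hz


def audio_timestamp_ms(start_time_ms, frames_processed, samples_per_frame, sample_rate_hz):
    """Timestamp of a frame: start time plus the total duration processed so far.

    `frames_processed` is the total number of frames processed when
    processing of the current frame is completed.
    """
    total_duration_ms = frames_processed * frame_duration_ms(samples_per_frame, sample_rate_hz)
    return start_time_ms + total_duration_ms


# 480 samples at 48000 Hz is 10 ms per frame, so the third frame
# processed after a start time of 1000 ms is stamped 1030 ms.
third_frame_ts = audio_timestamp_ms(1000.0, 3, 480, 48000)
```

Because the frame duration is a fixed value, consecutive timestamps produced this way are always exactly one frame duration apart.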
  • the execution body may determine the collection time of the frame as the time stamp of the frame.
  • The execution body may determine the time stamp of the frame according to the following steps: first, a ratio of a preset number of samples per frame to a preset sampling frequency may be determined. Then, the product of the above ratio and the total number of frames processed when processing of the frame is completed can be determined. After that, the sum of the above product and the start time can be determined as the target time of the frame. Finally, the time stamp of the frame can be determined based on a numerical comparison between the target time of the frame and the acquisition time of the frame. As an example, if the difference between the target time and the acquisition time is within a preset numerical interval, the target time may be determined as the time stamp of the frame.
  • If the difference is not within the preset numerical interval, the acquisition time may be determined as the time stamp of the frame.
  • the above-mentioned numerical interval may be an interval that is preset by a technician based on a large amount of data statistics. It should be noted that the above-mentioned time stamp for determining the frame may be executed when processing of each frame is completed. For each frame, the total number of frames processed when the frame processing is completed is the total number of frames currently processed.
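The interval-based comparison between the target time and the acquisition time can be sketched as below. The function name and the default interval of plus or minus 5 ms are hypothetical; the disclosure only states that the interval is preset by a technician from data statistics.

```python
def timestamp_by_interval(target_ms, acquisition_ms, interval_ms=(-5.0, 5.0)):
    """Pick the frame timestamp by comparing target time and acquisition time.

    If the difference (target minus acquisition) lies inside the preset
    interval, the uniform target time is used; otherwise the acquisition
    time is used. The default interval is an illustrative assumption.
    """
    low, high = interval_ms
    difference = target_ms - acquisition_ms
    if low <= difference <= high:
        return target_ms
    return acquisition_ms


stable = timestamp_by_interval(100.0, 102.0)    # small deviation: target time
unstable = timestamp_by_interval(100.0, 140.0)  # large deviation: acquisition time
```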
  • In response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than a preset value, the target time of the frame may be determined as the time stamp of the frame.
  • the preset value may be a value determined in advance by a technician based on a large amount of data statistics.
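  • In response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the preset value, the acquisition time of the frame may be determined as the time stamp of the frame.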
  • In response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the above-mentioned preset value, the above-mentioned execution subject may also perform the following information resetting step: the start time is updated to the acquisition time of the frame, and the total number of frames currently processed is cleared.
  • the total number of frames currently processed is the total number of frames processed when the processing of the frame is completed.
  • the execution subject may further perform the following steps: First, the execution frequency of the information resetting step may be determined. In response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, for a processed frame in the audio data without determining a timestamp, the acquisition time of the frame may be determined as the timestamp of the frame.
  • determining the execution frequency of the information reset step may be, after the information reset step is performed, the execution frequency of the information reset step is calculated or directly read. After the execution frequency of the information resetting step is determined, it is compared with a preset execution frequency threshold.
  • The execution subject may further perform the following steps: first, the number of executions of the information reset step may be determined. Then, in response to determining that the number of executions of the above information reset step is greater than a preset execution count threshold, for a processed frame in the audio data whose timestamp has not been determined, the collection time of the frame may be determined as the timestamp of the frame.
  • determining the number of times the information reset step is performed may be, after performing the information reset step, calculating or directly reading the number of times the stored information reset step is executed. After determining the number of executions of the information reset step, it is compared with a preset number of executions threshold.
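The threshold comparison, the information resetting step, and the fallback once resets become too frequent can be combined into one small stateful helper. This is a sketch under stated assumptions, not the disclosed implementation: the class and parameter names are hypothetical, the difference is taken as an absolute value, and the variant shown counts the number of resets rather than measuring their frequency.

```python
class AudioTimestamper:
    """Assigns timestamps to processed audio frames (illustrative sketch)."""

    def __init__(self, start_ms, samples_per_frame, sample_rate_hz,
                 max_deviation_ms, max_resets):
        self.start_ms = start_ms                      # first sampling time
        self.frame_ms = samples_per_frame * 1000.0 / sample_rate_hz
        self.max_deviation_ms = max_deviation_ms      # the preset value
        self.max_resets = max_resets                  # preset reset-count threshold
        self.frames_processed = 0
        self.resets = 0

    def stamp(self, acquisition_ms):
        """Return the timestamp for a frame whose processing just completed."""
        # Too many resets: fall back to acquisition times for all later frames.
        if self.resets > self.max_resets:
            return acquisition_ms
        self.frames_processed += 1
        target_ms = self.start_ms + self.frames_processed * self.frame_ms
        if abs(target_ms - acquisition_ms) < self.max_deviation_ms:
            return target_ms
        # Information resetting step: restart the count from this frame.
        self.start_ms = acquisition_ms
        self.frames_processed = 0
        self.resets += 1
        return acquisition_ms


stamper = AudioTimestamper(start_ms=0.0, samples_per_frame=480,
                           sample_rate_hz=48000, max_deviation_ms=5.0,
                           max_resets=1)
stamps = [stamper.stamp(t) for t in (11.0, 19.0, 60.0, 71.0, 120.0, 131.0)]
```

With 10 ms frames, the first two frames receive the uniform target times 10 and 20. The jump to 60 triggers a reset, the next frame is stamped 70 relative to the new start time, the jump to 120 triggers a second reset, and after that the acquisition time 131 is used directly.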
  • In step 204, the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data are stored.
  • the above-mentioned execution subject may store audio data including a time stamp and video data including a time stamp.
  • the audio data containing the timestamp and the video data containing the timestamp may be stored in two files respectively, and the mapping of the two files is established.
  • Storing the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data may be performed as follows: first, the timestamped audio and video data are encoded. That is, the audio data including the time stamp and the video data including the time stamp are encoded separately.
  • Video encoding can refer to converting a file in one video format into a file in another video format through a specific compression technology. Audio coding can use coding methods such as waveform coding, parameter coding, and hybrid coding. It should be noted that audio coding and video coding are well-known technologies that are widely studied and applied at present, and will not be repeated here.
  • the encoded audio and video data can be stored locally, or the encoded audio and video data can be sent to the server.
  • the execution body may store the encoded audio data and the encoded video data in the same file, and store the file locally.
  • the encoded audio data and the encoded video data may also be stored in the same file and sent to the server (such as the server 105 shown in FIG. 1) through a wired connection or a wireless connection.
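As a loose illustration of storing timestamped audio and video frames in the same file, the sketch below merges the two frame sequences by timestamp and serializes them as JSON lines. The record layout, the field names `kind`, `timestamp_ms`, and `payload`, and the JSON-lines format are all assumptions; a real recorder would write encoded frames into a media container instead.

```python
import heapq
import json


def mux_frames(video_frames, audio_frames):
    """Merge two timestamp-ordered frame lists into one list ordered by timestamp."""
    tagged_video = [{"kind": "video", **frame} for frame in video_frames]
    tagged_audio = [{"kind": "audio", **frame} for frame in audio_frames]
    return list(heapq.merge(tagged_video, tagged_audio,
                            key=lambda frame: frame["timestamp_ms"]))


def serialize(frames):
    """Serialize frames as JSON lines, one timestamped record per line."""
    return "\n".join(json.dumps(frame) for frame in frames)


video = [{"timestamp_ms": 0.0, "payload": "v0"},
         {"timestamp_ms": 33.0, "payload": "v1"}]
audio = [{"timestamp_ms": 10.0, "payload": "a0"},
         {"timestamp_ms": 20.0, "payload": "a1"},
         {"timestamp_ms": 30.0, "payload": "a2"},
         {"timestamp_ms": 40.0, "payload": "a3"}]

muxed = mux_frames(video, audio)
text = serialize(muxed)
```

A player reading such a file back can present each frame at its recorded timestamp, which is what keeps audio and video aligned.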
  • FIG. 3 is a schematic diagram of an application scenario of a method for processing data according to this embodiment.
  • A user holds a terminal device 301 and records an original sound video.
  • A short video recording application runs on the terminal device 301.
  • After the user clicks the original sound video recording button in the interface of the short video recording application, the terminal device 301 simultaneously turns on the microphone and the camera, and collects audio data 302 and video data 303, respectively.
  • the terminal device 301 may determine the collection time of the frame as the time stamp of the frame.
  • The terminal device 301 may use the first sampling time of the audio data 302 as a start time and, for a frame in the audio data 302, determine the timestamp of the frame based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency. Finally, the timestamped audio and video data are stored in the file 304.
  • Because the acquisition time is used as the time stamp of each frame of the video data, and the acquisition time can be obtained directly, the time stamp does not need to be calculated from a fixed interval.
  • This avoids the inaccuracy caused by calculating frame timestamps at a fixed time interval.
  • For audio data, directly using the acquisition time of each frame as its timestamp may produce non-uniform timestamps.
  • Non-uniform timestamps are not accurate enough.
  • With the method provided by the above embodiment of the present disclosure, the start time, the total number of frames processed, the number of samples per frame, and the sampling frequency determine a uniform and stable time stamp, thereby avoiding uneven and inaccurate timestamps for frames of audio data. Therefore, the accuracy of the time stamps of the audio and video data is improved, and the audio and video synchronization of original sound video recording is improved.
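The difference between jittery acquisition times and computed timestamps can be made concrete with a small comparison. The acquisition times below are made-up example values, and `interval_spread` is a hypothetical helper that measures how uneven the gaps between consecutive timestamps are.

```python
def interval_spread(timestamps_ms):
    """Difference between the largest and smallest gap of consecutive timestamps."""
    gaps = [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    return max(gaps) - min(gaps)


# Hypothetical acquisition times of five 10 ms audio frames, with delivery jitter.
acquisition_ms = [10.0, 23.0, 28.0, 41.0, 50.0]

# Timestamps computed as start time plus frame count times frame duration.
start_ms, frame_ms = 0.0, 10.0
computed_ms = [start_ms + count * frame_ms for count in range(1, 6)]

jittery_spread = interval_spread(acquisition_ms)   # gaps of 13, 5, 13, 9 ms
uniform_spread = interval_spread(computed_ms)      # every gap is 10 ms
```

The acquisition times have gaps that differ by up to 8 ms, while the computed timestamps are perfectly evenly spaced.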
  • Referring to FIG. 4, a flowchart 400 of still another embodiment of a method for processing data is shown.
  • the process 400 of the method for processing data includes steps 401 to 406.
  • In step 401, audio and video data are collected.
  • an execution subject of the method for processing data may be installed with an image acquisition device (such as a camera) and an audio signal acquisition device (such as a microphone).
  • the execution subject may turn on the image acquisition device and the audio acquisition device at the same time, and use the image acquisition device and the audio acquisition device to collect audio and video data.
  • the audio and video data includes audio data and video data.
  • In step 402, for a frame in the video data, the acquisition time of the frame is determined as the time stamp of the frame.
  • the above-mentioned execution subject may record the acquisition time when each frame of video data is acquired.
  • the execution body may determine the collection time of the frame as the time stamp of the frame.
  • In step 403, the first sampling time of the audio data is used as the start time; for a frame in the audio data, the ratio of a preset number of samples per frame to a preset sampling frequency is determined, the product of this ratio and the total number of frames processed when processing of the frame is completed is determined, and the sum of this product and the start time is determined as the target time of the frame.
  • the execution body may use the first sampling time of the audio data as the starting time.
  • the above-mentioned execution body can perform various processing on the frame. For example, transparent transmission, reverberation, equalization, sound change, tone change, speed change and other processing can be performed.
  • the above execution body can perform the following steps:
  • the determined ratio is the duration of each frame. Since the number of samples and the sampling frequency of each frame are preset fixed values, the duration of each frame is a fixed value.
  • the product of the above ratio and the total number of frames processed when the frame processing is completed is determined.
  • the total number of frames processed when the frame processing is completed is the total number of frames currently processed.
  • The above product is the total duration that the execution body has processed by the time processing of the frame is completed.
  • In step 404, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than a preset value, the target time of the frame is determined as the time stamp of the frame.
  • That is, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than the preset value, the execution body may determine the target time of the frame as the time stamp of the frame.
  • In step 405, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to a preset value, the acquisition time of the frame is determined as the time stamp of the frame.
  • In this case, the execution body may determine the acquisition time of the frame as the time stamp of the frame.
  • In response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the above-mentioned preset value, the above-mentioned execution subject may also perform the following information resetting step: the start time is updated to the acquisition time of the frame, and the total number of frames currently processed is cleared.
  • the execution subject may further determine an execution frequency of the information reset step.
  • In response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, for a processed frame in the audio data whose time stamp has not been determined, the acquisition time of the frame is determined as the time stamp of the frame. It should be noted that when the execution frequency of the information resetting step is less than or equal to the above-mentioned execution frequency threshold, for frames in the audio data that are processed subsequently, the operation may continue according to step 403 to determine the time stamp of the frame.
  • The execution subject may further determine the number of times the information reset step has been performed. In response to determining that the number of executions of the above information reset step is greater than a preset execution count threshold, for a processed frame in the audio data whose time stamp has not been determined, the acquisition time of the frame is determined as the time stamp of the frame. It should be noted that when the number of executions of the above information reset step is less than or equal to the above execution count threshold, for frames in the audio data that are processed subsequently, the operation may continue according to step 403 to determine the time stamp of the frame.
  • In step 406, the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data are stored.
  • the above-mentioned execution subject may store audio data including a time stamp and video data including a time stamp.
  • the audio data containing the timestamp and the video data containing the timestamp may be stored in two files respectively, and the mapping of the two files is established.
  • The process 400 of the method for processing data in this embodiment embodies the step of, for a frame in the audio data, determining the timestamp of the frame based on a numerical comparison between the target time of the frame and the acquisition time of the frame.
  • In the case of unstable audio data collection (for example, when the device is overheated or its performance is insufficient), the acquisition times of the frames of audio data are uneven.
  • the target time determined by the total number of frames, the sampling frequency, and the number of samples per frame that are currently processed is uniform. In the case where the deviation between the target time and the acquisition time is small, it can be shown that the acquisition is relatively stable, and the amplitude of the acquisition jitter is small at this time.
  • the target time determined by the total number of frames, sampling frequency, and number of samples per frame currently processed is used as the frame time stamp in the audio data, which can increase the uniformity and stability of the time stamp of the audio data.
  • it can reflect that the acquisition is unstable, and frames are dropped.
  • the target time is used as the timestamp, when the frame is dropped, the calculated timestamp is not the timestamp of the current frame, and the accuracy is low.
  • the acquisition time is used at this time to ensure the relative accuracy of the time stamp. Therefore, the time stamp is determined in different ways in different situations, which improves the accuracy of the time stamp of the audio and video data, and improves the audio and video synchronization effect of the original audio and video recording.
  • the present disclosure provides an embodiment of a device for processing data.
  • the device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied to various electronic equipment.
  • the apparatus 500 for processing data includes: a collecting unit 501 configured to collect audio and video data, where the audio and video data includes audio data and video data; a first determining unit 502 configured to, for a frame in the video data, determine the acquisition time of the frame as the time stamp of the frame; a second determining unit 503 configured to use the first sampling time of the audio data as the start time and, for a frame in the audio data, determine the time stamp of the frame based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency; and a storage unit 504 configured to store the audio and video data carrying the time stamps.
  • the second determining unit 503 may include a first determining module and a second determining module (not shown in the figure).
  • the first determining module may be configured to, for a frame in the audio data, determine the ratio of the preset number of samples per frame to the preset sampling frequency, determine the product of this ratio and the total number of frames processed when processing of the frame is completed, and determine the sum of this product and the start time as the target time of the frame.
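  • The target-time computation described for the first determining module can be sketched as below. The function and parameter names are introduced here for illustration, and times are assumed to be expressed in seconds.

```python
def target_time(start_time: float, frames_processed: int,
                samples_per_frame: int, sample_rate_hz: int) -> float:
    """Return start_time + frames_processed * (samples_per_frame / sample_rate_hz)."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    # Ratio of samples per frame to sampling frequency: the duration of one frame.
    frame_duration = samples_per_frame / sample_rate_hz
    # Product with the processed frame count, added to the start time.
    return start_time + frames_processed * frame_duration


# Example: 1024 samples per frame at 44100 Hz, 100 frames processed so far.
print(target_time(0.0, 100, 1024, 44100))
```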
  • the above-mentioned second determination module may be configured to determine, for a frame in the audio data, the time stamp of the frame based on a numerical comparison between the target time of the frame and the acquisition time of the frame.
  • the foregoing second determination module may be configured to, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than a preset value, determine the target time of the frame as the time stamp of the frame.
  • the second determining module may be configured to, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the preset value, determine the acquisition time of the frame as the time stamp of the frame.
  • the apparatus may further include an execution unit (not shown in the figure).
  • the execution unit may be configured to, when the difference between the target time of a frame in the audio data and the acquisition time of the frame is greater than or equal to the preset value, perform the following information resetting step after the acquisition time of the frame has been determined as the time stamp of the frame: updating the start time to the acquisition time of the frame, and clearing the total number of frames currently processed.
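  • The comparison and the information resetting step can be combined in a small stateful sketch. The class `AudioTimestamper` and its field names are hypothetical. The sketch also assumes that the "difference" is an absolute difference, that the processed-frame count includes the frame just completed, and that times are in seconds; the disclosure does not fix these details.

```python
from dataclasses import dataclass


@dataclass
class AudioTimestamper:
    samples_per_frame: int
    sample_rate_hz: int
    threshold_s: float      # the "preset value" for the allowed deviation
    start_time: float       # first sampling time of the audio data
    frames_processed: int = 0
    resets: int = 0         # how many times the reset step has run

    def stamp(self, capture_time: float) -> float:
        """Return the time stamp for the frame whose processing just completed."""
        self.frames_processed += 1  # assumption: the count includes this frame
        target = self.start_time + self.frames_processed * (
            self.samples_per_frame / self.sample_rate_hz
        )
        if abs(target - capture_time) < self.threshold_s:
            return target  # capture is stable: use the uniform target time
        # Deviation too large: use the capture time, then reset the information.
        self.start_time = capture_time
        self.frames_processed = 0
        self.resets += 1
        return capture_time


ts = AudioTimestamper(samples_per_frame=441, sample_rate_hz=44100,
                      threshold_s=0.005, start_time=0.0)
for capture in (0.010, 0.021, 0.080, 0.091):
    print(capture, ts.stamp(capture))
```

  • In this usage, the third capture time deviates from its target time by more than the threshold, so the capture time is used and the following frame is timed relative to the new start time.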
  • the apparatus may further include a third determining unit and a fourth determining unit (not shown in the figure).
  • the third determining unit may be configured to determine an execution frequency of the information resetting step after the information resetting step is performed.
  • the fourth determining unit may be configured to, in response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, determine, for each processed frame in the audio data whose time stamp has not yet been determined, the acquisition time of the frame as the time stamp of the frame.
  • the apparatus may further include a fifth determination unit and a sixth determination unit (not shown in the figure).
  • the fifth determining unit may be configured to determine the number of times the information reset step is performed after the information reset step is performed.
  • the above-mentioned sixth determining unit may be configured to, in response to determining that the number of executions of the information reset step is greater than a preset execution count threshold, determine, for each processed frame in the audio data whose time stamp has not yet been determined, the acquisition time of the frame as the time stamp of the frame.
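  • The two fallback conditions (reset frequency and reset count) can be sketched together as below. The disclosure presents them as separate optional implementations; combining them in one class, measuring frequency as resets within a sliding time window, and latching the fallback once triggered are all assumptions of this sketch, as are the names `ResetMonitor` and `record_reset`.

```python
from collections import deque


class ResetMonitor:
    """Tracks information-reset events and decides when to fall back to capture times."""

    def __init__(self, max_resets: int, max_resets_per_window: int, window_s: float):
        self.max_resets = max_resets                        # execution count threshold
        self.max_resets_per_window = max_resets_per_window  # execution frequency threshold
        self.window_s = window_s                            # window used to measure frequency
        self.total_resets = 0
        self.fallback = False                               # latched once a threshold is exceeded
        self._recent = deque()                              # reset times inside the window

    def record_reset(self, now: float) -> bool:
        """Record one reset at time `now`; return True if capture times should be used."""
        self.total_resets += 1
        self._recent.append(now)
        # Drop resets that have fallen out of the sliding window.
        while self._recent and now - self._recent[0] > self.window_s:
            self._recent.popleft()
        if (self.total_resets > self.max_resets
                or len(self._recent) > self.max_resets_per_window):
            self.fallback = True
        return self.fallback


monitor = ResetMonitor(max_resets=100, max_resets_per_window=2, window_s=1.0)
for t in (0.0, 0.4, 0.8):
    print(t, monitor.record_reset(t))
```

  • Once `fallback` is set, a caller would stop computing target times and use each frame's acquisition time directly as its time stamp.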
  • in the device provided by the foregoing embodiment of the present disclosure, the acquisition unit 501 collects audio and video data; the first determination unit 502 then determines the acquisition time of each frame in the video data as the time stamp of that frame; and the second determination unit 503 then uses the first sampling time of the audio data as the start time and determines the time stamp of each frame in the audio data based on the start time, the total number of frames processed when processing of that frame is completed, the preset number of samples per frame, and the preset sampling frequency.
  • FIG. 6 illustrates a schematic structural diagram of a computer system 600 suitable for implementing a terminal device according to an embodiment of the present disclosure.
  • the terminal device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603.
  • In the RAM 603, various programs and data required for the operation of the system 600 are also stored.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • the following components are connected to the I/O interface 605: an input portion 606 including a touch screen, a touch panel, and the like; an output portion 607 including a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a local area network (LAN) card or a modem.
  • the communication section 609 performs communication processing via a network such as the Internet.
  • a drive 610 is also connected to the I/O interface 605 as necessary.
  • a removable medium 611 such as a semiconductor memory or the like, is installed on the drive 610 as necessary, so that a computer program read therefrom is installed into the storage section 608 as necessary.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from a removable medium 611.
  • the computer-readable medium described in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the foregoing.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal that is included in baseband or propagated as part of a carrier wave, and which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, radio frequency (RF), or any suitable combination of the foregoing.
  • each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function.
  • the functions noted in the blocks may also occur in a different order than those marked in the drawings. For example, two successively represented boxes may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present disclosure may be implemented by software or hardware.
  • the described unit may also be provided in a processor, for example, it may be described as: a processor includes an acquisition unit, a first determination unit, a second determination unit, and a storage unit.
  • the names of these units do not in any way constitute a limitation on the unit itself.
  • the acquisition unit can also be described as a “unit that collects audio and video data”.
  • the present disclosure also provides a computer-readable medium, which may be included in the device described in the above embodiments; or may exist alone without being assembled into the device.
  • the computer-readable medium carries one or more programs which, when executed by the device, cause the device to: collect audio and video data, the audio and video data including audio data and video data; for a frame in the video data, determine the acquisition time of the frame as the time stamp of the frame; use the first sampling time of the audio data as the start time and, for a frame in the audio data, determine the time stamp of the frame based on the start time, the total number of frames currently processed, the preset number of samples per frame, and the preset sampling frequency; and store the audio and video data carrying the time stamps.

Abstract

Disclosed in embodiments of the present application are a data processing method and apparatus. One exemplary embodiment of the method comprises: acquiring audio/video data, the audio/video data comprising audio data and video data; determining acquisition times of frames in the video data as timestamps of the frames in the video data; taking a first sampling time of the audio data as a starting time, and determining timestamps of frames in the audio data on the basis of the starting time, the total number of processed frames when the processing of the frames in the audio data is completed, a preset number of samplings of each frame, and a preset sampling frequency; and storing the audio/video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data.

Description

Method and device for processing data
The present disclosure claims priority to Chinese Patent Application No. 201810865732.5, filed with the China Patent Office on August 1, 2018, the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of the present disclosure relate to the field of computer technology, for example, to a method and an apparatus for processing data.
Background
When recording an original-sound video, a camera is used to collect video data while a microphone is used to collect audio data. After the audio and video data are collected, time stamps of the collected audio and video data can be determined. When the audio and video data are played, the playback end plays them based on the time stamps. In applications with a video recording function, it is fairly common for the audio and video of a recorded original-sound video to be out of sync.
In a related approach, the interval between two adjacent frames in the audio data and in the video data is generally assumed to be fixed. For a given frame in the audio data or the video data, the sum of the time stamp of the previous frame and this interval is determined as the time stamp of the frame. The time stamp is then recorded in the recorded audio and video data.
Summary of the invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
Embodiments of the present disclosure provide a method and an apparatus for processing data.
In a first aspect, an embodiment of the present disclosure provides a method for processing data, the method including: collecting audio and video data, the audio and video data including audio data and video data; determining the acquisition time of a frame in the video data as the time stamp of the frame in the video data; using the first sampling time of the audio data as a start time, and determining the time stamp of a frame in the audio data based on the start time, the total number of frames processed when processing of the frame in the audio data is completed, a preset number of samples per frame, and a preset sampling frequency; and storing the audio and video data carrying the time stamps of the frames in the video data and the time stamps of the frames in the audio data.
In a second aspect, an embodiment of the present disclosure provides an apparatus for processing data, the apparatus including: a collecting unit configured to collect audio and video data, the audio and video data including audio data and video data; a first determining unit configured to determine the acquisition time of a frame in the video data as the time stamp of the frame in the video data; a second determining unit configured to use the first sampling time of the audio data as a start time, and determine the time stamp of a frame in the audio data based on the start time, the total number of frames processed when processing of the frame in the audio data is completed, a preset number of samples per frame, and a preset sampling frequency; and a storage unit configured to store the audio and video data carrying the time stamps of the frames in the video data and the time stamps of the frames in the audio data.
In a third aspect, an embodiment of the present disclosure provides a terminal device including: at least one processor; and a storage device storing at least one program which, when executed by the at least one processor, causes the at least one processor to implement the method of any embodiment of the method for processing data.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method of any embodiment of the method for processing data.
Other aspects will become apparent upon reading and understanding the accompanying drawings and the detailed description.
Brief description of the drawings
FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
FIG. 2 is a flowchart of an embodiment of a method for processing data according to the present disclosure;
FIG. 3 is a schematic diagram of an application scenario of a method for processing data according to the present disclosure;
FIG. 4 is a flowchart of still another embodiment of a method for processing data according to the present disclosure;
FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for processing data according to the present disclosure;
FIG. 6 is a schematic structural diagram of a computer system of a terminal device suitable for implementing the embodiments of the present disclosure.
Detailed description
The present disclosure is described in further detail below with reference to the drawings and embodiments. It can be understood that the example embodiments described herein are only used to explain the present disclosure and do not limit it. It should also be noted that, for convenience of description, only the parts related to the present disclosure are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments in the present disclosure and the features in the embodiments can be combined with each other. The present disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which the method for processing data or the apparatus for processing data of the present disclosure can be applied.
As shown in FIG. 1, the system architecture 100 may include a terminal device 101, a terminal device 102, a terminal device 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, and 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages (for example, audio and video data upload requests). Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as video recording applications, audio playback applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they can be various electronic devices that have a display screen and support audio and video recording, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When they are software, they can be installed in the electronic devices listed above, and can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The terminal devices 101, 102, and 103 may be equipped with an image acquisition device (such as a camera) to collect video data. In practice, the smallest visual unit that makes up a video is a frame. Each frame is a static image. Combining a sequence of temporally consecutive frames forms a dynamic video. In addition, the terminal devices 101, 102, and 103 may be equipped with an audio acquisition device (such as a microphone) to collect continuous analog audio signals. In practice, audio data is the data obtained by performing analog-to-digital conversion (ADC) at a certain frequency on a continuous analog audio signal from a device such as a microphone.
The terminal devices 101, 102, and 103 may use the image acquisition device and the audio acquisition device installed on them to collect video data and audio data, respectively. In addition, processing such as time stamp calculation may be performed on the collected video data, and the processing results (for example, the collected audio data and the video data containing time stamps) are finally stored.
The server 105 may be a server that provides various services, for example, a background server that supports the video recording applications installed on the terminal devices 101, 102, and 103. The background server can parse, store, and otherwise process received data such as audio and video data upload requests. It can also receive audio and video data acquisition requests sent by the terminal devices 101, 102, and 103, and return the audio and video data indicated by such a request to the terminal devices 101, 102, and 103.
It should be noted that the server may be hardware or software. When the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be noted that the method for processing data provided by the embodiments of the present disclosure is generally executed by the terminal devices 101, 102, and 103. Accordingly, the apparatus for processing data is generally provided in the terminal devices 101, 102, and 103.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers, as required by the implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of the method for processing data according to the present disclosure is shown. The method for processing data includes steps 201 to 204.
In step 201, audio and video data are collected.
In this embodiment, the execution subject of the method for processing data (for example, the terminal devices 101, 102, and 103 shown in FIG. 1) may be equipped with an image acquisition device (such as a camera) and an audio signal acquisition device (such as a microphone). The execution subject may turn on the image acquisition device and the audio signal acquisition device at the same time and use them to collect audio and video data. The audio and video data include audio data (voice data) and video data (vision data).
In practice, video data can be described in terms of frames. Here, a frame is the smallest visual unit that makes up a video. Each frame is a static image. Combining a sequence of temporally consecutive frames forms a dynamic video.
In practice, audio data is data obtained by digitizing a sound signal. Digitizing a sound signal is the process of converting a continuous analog audio signal from a device such as a microphone into a digital signal at a certain frequency to obtain audio data. The process usually includes three steps: sampling, quantization, and encoding. Sampling replaces a signal that is continuous in time with a sequence of signal sample values taken at regular time intervals. Quantization approximates the originally continuously varying amplitude with a finite set of amplitudes, converting the continuous amplitude of the analog signal into a finite number of discrete values at certain intervals. Encoding represents the quantized discrete values as binary numbers according to a certain rule. The digitization of a sound signal generally has two important indicators: the sampling frequency (sampling rate) and the sample size (sampling size). The sampling frequency, also called the sampling speed or sampling rate, is the number of samples per second extracted from the continuous signal to form the discrete signal, and can be expressed in hertz (Hz). The sample size can be expressed in bits. Pulse code modulation (PCM) converts an analog audio signal into digitized audio data through sampling, quantization, and encoding. Therefore, the above audio data may be data in the PCM encoding format.
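The three digitization steps can be illustrated with a short sketch that turns a synthetic sine wave into 16-bit PCM bytes. The sine wave stands in for the analog microphone signal, and the function name, the 16-bit sample size, and the little-endian byte order are choices made for this illustration rather than requirements of the disclosure.

```python
import math
import struct


def sine_to_pcm16(freq_hz: float, duration_s: float, sample_rate_hz: int) -> bytes:
    """Sample a sine wave, quantize it to 16-bit integers, and encode it as PCM bytes."""
    n_samples = round(duration_s * sample_rate_hz)
    pcm = bytearray()
    for n in range(n_samples):
        t = n / sample_rate_hz                              # sampling: discrete instants
        amplitude = math.sin(2.0 * math.pi * freq_hz * t)   # continuous value in [-1.0, 1.0]
        level = int(round(amplitude * 32767))               # quantization: 16-bit sample size
        pcm += struct.pack("<h", level)                     # encoding: two bytes per sample
    return bytes(pcm)


pcm = sine_to_pcm16(freq_hz=440.0, duration_s=0.01, sample_rate_hz=44100)
print(len(pcm))
```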
It should be noted that the file containing the above target audio data may also be in other formats, such as the mp3 format or the ape format. In that case, the target audio data may be data in another encoding format (for example, a lossy compression format such as AAC (Advanced Audio Coding)) and is not limited to the PCM encoding format. After obtaining the file, the execution subject may also convert it to the wav format, in which case the target audio data in the converted file is a data stream in the PCM encoding format.
In practice, a video recording application may be installed in the execution subject. The video recording application can support the recording of original-sound videos, where an original-sound video is a video that uses the original sound of the video as its background sound. A user can trigger a video recording instruction by clicking the video recording button in the running interface of the video recording application. After receiving the video recording instruction, the execution subject can turn on the image acquisition device and the audio acquisition device at the same time to record the original-sound video.
In step 202, for a frame in the video data, the acquisition time of the frame is determined as the timestamp of the frame.
In this embodiment, the execution body may record the acquisition time when each frame of the video data is captured. The acquisition time of a frame may be the system timestamp (for example, a Unix timestamp) at the moment the frame is captured. Other timestamps may also be used, for example a timestamp relative to a specified time. It should be noted that a timestamp is complete, verifiable data showing that a piece of data already existed at a particular time. A timestamp is usually a character sequence that uniquely identifies a moment in time. Here, for a frame in the video data, the execution body may determine the acquisition time of the frame as the timestamp of the frame.
In step 203, the first sampling time of the audio data is used as the start time, and the timestamp of a frame in the audio data is determined based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency.
In this embodiment, the execution body may use the first sampling time of the audio data as the start time. It should be noted that, when system timestamps are used as the acquisition times of the frames in the video data, the start time may be the system timestamp of the first sampling of the audio data. When timestamps relative to a specified time are used as the acquisition times of the frames in the video data, the start time may be the timestamp of the first sampling of the audio data relative to that specified time. The execution body may apply various kinds of processing to each frame of the sequentially captured audio data, for example pass-through, reverberation, equalization, voice changing, pitch shifting, or speed changing. For each processed frame, the execution body may determine the timestamp of the frame based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency.
In the present disclosure, processing a frame refers to processing such as pass-through, reverberation, equalization, voice changing, pitch shifting, and speed changing.
As an example, the execution body may first determine the duration of each frame from the preset number of samples per frame and the preset sampling frequency. The duration of each frame is the ratio of the number of samples per frame to the sampling frequency. Because both are preset fixed values, the duration of each frame is also fixed. Then, each time a frame has been processed, the total number of frames processed so far (that is, the total number of frames processed when processing of the frame is completed) may be multiplied by the frame duration. The product is the total duration the execution body has processed at the moment processing of the frame is completed. Finally, the sum of the start time and the total duration processed so far may be determined as the timestamp of the frame.
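The calculation in this example can be sketched as follows. The function and parameter names are hypothetical, and the values (441 samples per frame at 44100 Hz, a start time of 100.0 seconds) are assumptions chosen so that each frame lasts 10 ms.

```python
def frame_duration(samples_per_frame, sample_rate):
    """Duration of one audio frame in seconds: samples per frame / sampling frequency."""
    return samples_per_frame / sample_rate


def audio_timestamp(start_time, frames_processed, samples_per_frame, sample_rate):
    """Start time plus the total duration processed so far.

    `frames_processed` counts the frames processed when the current frame
    finishes, so it includes the current frame.
    """
    total_duration = frames_processed * frame_duration(samples_per_frame, sample_rate)
    return start_time + total_duration


# Assumed values: 441 samples per frame at 44100 Hz gives 10 ms frames.
duration = frame_duration(441, 44100)
third_frame_ts = audio_timestamp(start_time=100.0, frames_processed=3,
                                 samples_per_frame=441, sample_rate=44100)
```

Under these assumptions the third processed frame is stamped 30 ms after the start time, regardless of when it actually arrived.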
As another example, for each processed frame, the execution body may instead determine the acquisition time of the frame as the timestamp of the frame.
In some implementations of this embodiment, for a frame in the audio data, the execution body may determine the timestamp of the frame as follows. First, the ratio of the preset number of samples per frame to the preset sampling frequency may be determined. Then, the product of this ratio and the total number of frames processed when processing of the frame is completed may be determined. Next, the sum of this product and the start time may be determined as the target time of the frame. Finally, the timestamp of the frame may be determined by numerically comparing the target time of the frame with the acquisition time of the frame. As an example, if the difference between the target time and the acquisition time lies within a preset numerical interval, the target time may be determined as the timestamp of the frame. If the difference does not lie within that interval, the acquisition time may be determined as the timestamp of the frame. The numerical interval may be set in advance by a technician based on statistics over a large amount of data. It should be noted that the timestamp of a frame may be determined when processing of that frame is completed. For each frame, the total number of frames processed when processing of the frame is completed is the total number of frames processed so far.
In some implementations of this embodiment, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than a preset value, the target time of the frame may be determined as the timestamp of the frame. The preset value may be determined in advance by a technician based on statistics over a large amount of data.
In some implementations of this embodiment, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the preset value, the acquisition time of the frame may be determined as the timestamp of the frame.
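The choice between the target time and the acquisition time can be sketched as a single comparison. The name `choose_timestamp` and the 10 ms threshold are hypothetical, and reading "difference" as an absolute difference is an assumption; the text does not say whether the sign matters.

```python
def choose_timestamp(target_time, capture_time, max_deviation):
    """Return the timestamp for one processed audio frame."""
    # Assumption: the "difference" is taken as an absolute value.
    if abs(target_time - capture_time) < max_deviation:
        # Capture is steady: keep the evenly spaced, computed value.
        return target_time
    # Deviation is large (for example, frames were dropped): use the capture time.
    return capture_time


ts_small = choose_timestamp(target_time=10.020, capture_time=10.023, max_deviation=0.010)
ts_large = choose_timestamp(target_time=10.020, capture_time=10.250, max_deviation=0.010)
```

In the first call the 3 ms deviation is below the assumed threshold, so the target time is kept; in the second call the 230 ms deviation causes the acquisition time to be used instead.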
In some implementations of this embodiment, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the preset value, the execution body may also perform the following information reset step: updating the start time to the acquisition time of the frame, and resetting the total number of frames processed so far to zero. Here, when the difference between the target time of a frame and its acquisition time is detected to be less than the preset value, the total number of frames processed so far is simply the total number of frames processed when processing of that frame is completed.
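A minimal sketch of the information reset step follows, under stated assumptions: the class name `AudioClock`, its field names, the 5 ms threshold, and the capture times are all hypothetical, and the difference is again read as an absolute value.

```python
from dataclasses import dataclass


@dataclass
class AudioClock:
    start_time: float        # first sampling time, or capture time at the last reset
    samples_per_frame: int
    sample_rate: int
    max_deviation: float     # the "preset value", in seconds
    frames_processed: int = 0

    def stamp(self, capture_time):
        """Timestamp the frame whose processing has just completed."""
        self.frames_processed += 1
        duration = self.samples_per_frame / self.sample_rate
        target = self.start_time + self.frames_processed * duration
        if abs(target - capture_time) < self.max_deviation:
            return target
        # Information reset step: re-anchor on this frame and clear the counter.
        self.start_time = capture_time
        self.frames_processed = 0
        return capture_time


clock = AudioClock(start_time=0.0, samples_per_frame=441,
                   sample_rate=44100, max_deviation=0.005)
# The third capture time simulates a gap, for example after dropped frames.
stamps = [clock.stamp(t) for t in (0.011, 0.019, 0.200, 0.211)]
```

After the gap at the third frame, the start time becomes 0.200 and counting restarts, so the fourth frame is stamped one frame duration after the new start time rather than relative to the original one.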
In some implementations of this embodiment, after performing the information reset step, the execution body may further perform the following steps. First, the execution frequency of the information reset step may be determined. Then, in response to determining that the execution frequency is greater than a preset execution frequency threshold, for each frame in the audio data that has been processed but whose timestamp has not yet been determined, the acquisition time of the frame may be determined as the timestamp of the frame.
In some implementations of this embodiment, the execution frequency of the information reset step may be determined, after the step has been performed, either by calculation or by directly reading a stored value. Once determined, the execution frequency is compared with the preset execution frequency threshold.
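The text does not define how the execution frequency is measured. One possible reading, sketched below, counts resets inside a sliding time window; the class `ResetRateMonitor`, the one-second window, and the threshold of two resets are all assumptions.

```python
from collections import deque


class ResetRateMonitor:
    """Tracks how often the information reset step runs within a sliding window."""

    def __init__(self, window_s, max_resets_per_window):
        self.window_s = window_s
        self.max_resets = max_resets_per_window
        self._reset_times = deque()

    def record_reset(self, now):
        """Store the time of a reset and forget resets older than the window."""
        self._reset_times.append(now)
        while self._reset_times and now - self._reset_times[0] > self.window_s:
            self._reset_times.popleft()

    def too_frequent(self):
        """True when the reset rate exceeds the preset threshold."""
        return len(self._reset_times) > self.max_resets


monitor = ResetRateMonitor(window_s=1.0, max_resets_per_window=2)
for reset_time in (0.1, 0.4, 0.7):
    monitor.record_reset(reset_time)
use_capture_time_only = monitor.too_frequent()
```

When `too_frequent()` returns true, the remaining processed frames would simply take their acquisition times as timestamps, as the preceding paragraph describes.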
In some implementations of this embodiment, after performing the information reset step, the execution body may further perform the following steps. First, the number of times the information reset step has been performed may be determined. Then, in response to determining that this number is greater than a preset execution count threshold, for each frame in the audio data that has been processed but whose timestamp has not yet been determined, the acquisition time of the frame may be determined as the timestamp of the frame.
In some implementations of this embodiment, the number of times the information reset step has been performed may be determined, after the step has been performed, either by calculation or by directly reading a stored value. Once determined, the execution count is compared with the preset execution count threshold.
In step 204, the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data is stored.
In this embodiment, the execution body may store the audio data containing timestamps and the video data containing timestamps. The two may be stored in two separate files, with a mapping established between the files. Alternatively, the audio data containing timestamps and the video data containing timestamps may be stored in the same file.
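As an illustration of the single-file option, the sketch below interleaves timestamped audio and video frames in timestamp order and serializes them as one record per line. The record layout, the `kind` and `ts` field names, and the JSON-lines serialization are assumptions made for this sketch; the disclosure does not specify a container format. Each input list is assumed to be already sorted by timestamp.

```python
import heapq
import json


def interleave(audio_frames, video_frames):
    """Merge two timestamp-sorted frame lists into one timestamp-ordered list."""
    return list(heapq.merge(audio_frames, video_frames, key=lambda frame: frame["ts"]))


# Hypothetical frame records; real frames would also carry the sample or pixel data.
audio = [{"kind": "audio", "ts": 0.010}, {"kind": "audio", "ts": 0.020}]
video = [{"kind": "video", "ts": 0.000}, {"kind": "video", "ts": 0.033}]

records = interleave(audio, video)
serialized = "\n".join(json.dumps(record) for record in records)
```

Keeping the records in timestamp order means a player reading the file sequentially encounters audio and video frames in the order they should be presented.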
In some implementations of this embodiment, storing the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data may be performed as follows. First, the timestamped audio and video data may be encoded; that is, the audio data containing timestamps and the video data containing timestamps are encoded separately. In practice, video encoding may refer to converting a file in one video format into a file in another video format by means of a specific compression technique. Audio encoding may use waveform coding, parametric coding, hybrid coding, or similar methods. Audio and video encoding are well-known technologies that are widely studied and applied, and are not described further here. The encoded audio and video data may then be stored locally or sent to a server. For example, the execution body may store the encoded audio data and the encoded video data in the same file and store the file locally. The file may also be sent to a server (for example, the server 105 shown in FIG. 1) through a wired or wireless connection.
With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for processing data according to this embodiment. In the scenario of FIG. 3, a user holds a terminal device 301 and records an original-sound video. A short-video recording application runs on the terminal device 301. After the user taps the original-sound video recording button in the interface of the application, the terminal device 301 turns on the microphone and the camera at the same time and collects audio data 302 and video data 303, respectively. For a frame in the video data 303, the terminal device 301 may determine the acquisition time of the frame as the timestamp of the frame. The terminal device 301 may use the first sampling time of the audio data 302 as the start time and, for a frame in the audio data 302, determine the timestamp of the frame based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency. Finally, the timestamped audio and video data is stored in a file 304.
In the method provided by the above embodiment of the present disclosure, audio and video data is collected; the acquisition time of each frame in the video data is determined as the timestamp of that frame; the first sampling time of the audio data is used as the start time, and the timestamp of each audio frame is determined based on the start time, the total number of frames processed when processing of that frame is completed, the preset number of samples per frame, and the preset sampling frequency; finally, the audio and video data carrying the timestamps of the video frames and the audio frames is stored. Because the acquisition time is used as the timestamp of each video frame, and the acquisition time can be obtained directly without calculation from a fixed interval, the method avoids the inaccurate timestamps that result from computing frame timestamps at a fixed interval when video capture is unstable (for example, when frames are dropped because the device overheats or lacks performance). In addition, when audio capture is unstable (for example, when overheating or insufficient performance causes small capture jitter), directly using the acquisition time of each audio frame as its timestamp yields unevenly spaced timestamps. Because audio data is continuous, unevenly spaced timestamps are not accurate enough. With the method provided by the above embodiment, uniform and stable timestamps can be determined from the total number of frames processed when processing of a frame is completed, the number of samples per frame, and the sampling frequency, which avoids uneven and inaccurate timestamps for audio frames. The accuracy of the timestamps of the audio and video data is thereby improved, as is the audio-video synchronization of original-sound video recording.
Referring to FIG. 4, a flow 400 of another embodiment of the method for processing data is shown. The flow 400 includes steps 401 to 406.
In step 401, audio and video data is collected.
In this embodiment, the execution body of the method for processing data (for example, the terminal devices 101, 102, and 103 shown in FIG. 1) may be equipped with an image acquisition device (such as a camera) and an audio signal acquisition device (such as a microphone). The execution body may turn on the image acquisition device and the audio acquisition device at the same time and use them to collect audio and video data. The audio and video data includes audio data and video data.
In step 402, for a frame in the video data, the acquisition time of the frame is determined as the timestamp of the frame.
In this embodiment, the execution body may record the acquisition time when each frame of the video data is captured. For a frame in the video data, the execution body may determine the acquisition time of the frame as the timestamp of the frame.
In step 403, the first sampling time of the audio data is used as the start time. For a frame in the audio data, the ratio of the preset number of samples per frame to the preset sampling frequency is determined, the product of this ratio and the total number of frames processed when processing of the frame is completed is determined, and the sum of this product and the start time is determined as the target time of the frame.
In this embodiment, the execution body may use the first sampling time of the audio data as the start time. The execution body may apply various kinds of processing to each frame of the sequentially captured audio data, for example pass-through, reverberation, equalization, voice changing, pitch shifting, or speed changing. For each processed frame, the execution body may perform the following steps:
First, the ratio of the preset number of samples per frame to the preset sampling frequency is determined. This ratio is the duration of each frame. Because the number of samples per frame and the sampling frequency are preset fixed values, the duration of each frame is also fixed.
Next, the product of this ratio and the total number of frames processed when processing of the frame is completed is determined. For each processed frame, the total number of frames processed when processing of the frame is completed is the total number of frames processed so far. In practice, this product is the total duration the execution body has processed at the moment processing of the frame is completed.
Finally, the sum of this product and the start time is determined as the target time of the frame.
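The three sub-steps of step 403 can be summarized in a single formula. The symbols are introduced here for illustration and do not appear in the original text: $t_0$ is the start time, $S$ the preset number of samples per frame, $f$ the preset sampling frequency, $n$ the total number of frames processed when processing of the frame is completed, and $T_n$ the resulting target time.

```latex
T_n = t_0 + n \cdot \frac{S}{f}
```

For example, assuming $S = 1024$ samples and $f = 44100$ Hz, each frame lasts $1024 / 44100 \approx 23.22$ ms, so the target time after $n = 100$ frames is approximately $t_0 + 2.322$ s.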
In step 404, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than a preset value, the target time of the frame is determined as the timestamp of the frame.
In this embodiment, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than the preset value, the execution body may determine the target time of the frame as the timestamp of the frame.
In step 405, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the preset value, the acquisition time of the frame is determined as the timestamp of the frame.
In this embodiment, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the preset value, the execution body may determine the acquisition time of the frame as the timestamp of the frame.
In some implementations of this embodiment, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the preset value, the execution body may also perform the following information reset step: updating the start time to the acquisition time of the frame, and resetting the total number of frames processed so far to zero.
In some implementations of this embodiment, after performing the information reset step, the execution body may further determine the execution frequency of the information reset step. In response to determining that the execution frequency is greater than a preset execution frequency threshold, for each frame in the audio data that has been processed but whose timestamp has not yet been determined, the acquisition time of the frame is determined as the timestamp of the frame. It should be noted that, when the execution frequency is less than or equal to the threshold, the operations of step 403 may continue to be performed for subsequently processed frames in the audio data in order to determine their timestamps.
In some implementations of this embodiment, after performing the information reset step, the execution body may further determine the number of times the information reset step has been performed. In response to determining that this number is greater than a preset execution count threshold, for each frame in the audio data that has been processed but whose timestamp has not yet been determined, the acquisition time of the frame is determined as the timestamp of the frame. It should be noted that, when the execution count is less than or equal to the threshold, the operations of step 403 may continue to be performed for subsequently processed frames in the audio data in order to determine their timestamps.
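Steps 403 to 405, together with the reset step and the execution count fallback, can be combined in one sketch. This is an illustration under stated assumptions rather than the implementation of the disclosure: the class `AudioTimestamper`, its attribute names, the 5 ms threshold, the count threshold of 1, and the capture times are hypothetical, and the difference is read as an absolute value.

```python
class AudioTimestamper:
    """Assigns timestamps to processed audio frames, one call per frame."""

    def __init__(self, start_time, samples_per_frame, sample_rate,
                 max_deviation, max_resets):
        self.start_time = start_time
        self.frame_duration = samples_per_frame / sample_rate
        self.max_deviation = max_deviation   # the "preset value", in seconds
        self.max_resets = max_resets         # the execution count threshold
        self.frames_processed = 0
        self.reset_count = 0

    def stamp(self, capture_time):
        # Fallback: once resets exceed the threshold, use capture times only.
        if self.reset_count > self.max_resets:
            return capture_time
        # Step 403: target time = start time + frames processed * frame duration.
        self.frames_processed += 1
        target = self.start_time + self.frames_processed * self.frame_duration
        # Step 404: small deviation, keep the evenly spaced target time.
        if abs(target - capture_time) < self.max_deviation:
            return target
        # Step 405 plus the information reset step.
        self.start_time = capture_time
        self.frames_processed = 0
        self.reset_count += 1
        return capture_time


stamper = AudioTimestamper(start_time=0.0, samples_per_frame=441, sample_rate=44100,
                           max_deviation=0.005, max_resets=1)
stamps = [stamper.stamp(t) for t in (0.010, 0.100, 0.300, 0.311)]
```

In this run the second and third frames each trigger a reset. Because two resets exceed the assumed threshold of one, the fourth frame takes its acquisition time (0.311) instead of the computed target time (0.310).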
In step 406, the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data is stored.
In this embodiment, the execution body may store the audio data containing timestamps and the video data containing timestamps. The two may be stored in two separate files, with a mapping established between the files. Alternatively, the audio data containing timestamps and the video data containing timestamps may be stored in the same file.
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the method for processing data in this embodiment highlights the step of determining the timestamp of a frame in the audio data by numerically comparing the target time of the frame with its acquisition time. When audio capture is unstable (for example, when the device overheats or lacks performance), the acquisition times of the audio frames are unevenly spaced, whereas the target times determined from the total number of frames processed so far, the sampling frequency, and the number of samples per frame are evenly spaced. A small deviation between the target time and the acquisition time indicates that capture is relatively stable and that capture jitter is small. In that case, using the target time as the timestamp of the audio frame increases the uniformity and stability of the audio timestamps. A large deviation between the target time and the acquisition time indicates that capture is unstable, for example because frames have been dropped. In that case, if the target time were used, the calculated value would not be the timestamp of the current frame and accuracy would be low. Using the acquisition time instead preserves the relative accuracy of the timestamp. Determining timestamps in different ways under different conditions thus improves the accuracy of the timestamps of the audio and video data and improves the audio-video synchronization of original-sound video recording.
Referring to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for processing data. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied in various electronic devices.
As shown in FIG. 5, the apparatus 500 for processing data according to this embodiment includes: a collection unit 501, configured to collect audio and video data, the audio and video data including audio data and video data; a first determination unit 502, configured to determine, for a frame in the video data, the acquisition time of the frame as the timestamp of the frame; a second determination unit 503, configured to use the first sampling time of the audio data as the start time and to determine, for a frame in the audio data, the timestamp of the frame based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency; and a storage unit 504, configured to store the timestamped audio and video data.
In some implementations of this embodiment, the second determination unit 503 may include a first determination module and a second determination module (not shown in the figure). The first determination module may be configured to, for a frame in the audio data, determine the ratio of the preset number of samples per frame to the preset sampling frequency, determine the product of this ratio and the total number of frames processed when processing of the frame is completed, and determine the sum of this product and the start time as the target time of the frame. The second determination module may be configured to, for a frame in the audio data, determine the timestamp of the frame by numerically comparing the target time of the frame with the acquisition time of the frame.
In some implementations of this embodiment, the second determination module may be configured to, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than a preset value, determine the target time of the frame as the timestamp of the frame.
In some implementations of this embodiment, the second determination module may be configured to, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the preset value, determine the acquisition time of the frame as the timestamp of the frame.
In some implementations of this embodiment, the apparatus may further include an execution unit (not shown in the figure). The execution unit may be configured to perform the following information reset step after the acquisition time of a frame in the audio data has been determined as the timestamp of that frame in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the preset value: updating the start time to the acquisition time of the frame, and resetting the total number of frames processed so far to zero.
In some implementations of this embodiment, the apparatus may further include a third determination unit and a fourth determination unit (not shown in the figure). The third determination unit may be configured to determine the execution frequency of the information reset step after the information reset step is performed. The fourth determination unit may be configured to, in response to determining that the execution frequency of the information reset step is greater than a preset execution frequency threshold, determine, for each frame in the audio data that has been processed but whose timestamp has not yet been determined, the acquisition time of the frame as the timestamp of the frame.
In some implementations of this embodiment, the apparatus may further include a fifth determining unit and a sixth determining unit (not shown in the figure). The fifth determining unit may be configured to determine the number of times the information resetting step has been performed after the information resetting step is performed. The sixth determining unit may be configured to, in response to determining that the number of times the information resetting step has been performed is greater than a preset execution count threshold, determine, for each frame in the audio data that has been processed but has not yet been assigned a timestamp, the acquisition time of the frame as the timestamp of the frame.
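The two fallback conditions (execution frequency and execution count of the information resetting step) can be sketched with a small monitor. The text presents them as alternative implementations; combining them in one class, the sliding-window definition of "frequency", and every name and default value below are assumptions made for illustration.

```python
from collections import deque


class ResetMonitor:
    """Tracks how often the reset step runs (hypothetical helper)."""

    def __init__(self, max_resets: int = 5, max_per_window: int = 3,
                 window: float = 10.0):
        self.max_resets = max_resets          # execution count threshold (assumed)
        self.max_per_window = max_per_window  # frequency threshold (assumed)
        self.window = window                  # window length in seconds (assumed)
        self.total = 0                        # resets since recording began
        self.recent = deque()                 # times of resets inside the window

    def record(self, now: float) -> None:
        """Register one execution of the reset step at time `now`."""
        self.total += 1
        self.recent.append(now)
        # Drop resets that have fallen out of the sliding window.
        while self.recent and now - self.recent[0] > self.window:
            self.recent.popleft()

    def should_fall_back(self) -> bool:
        """True once either threshold is exceeded."""
        return (self.total > self.max_resets
                or len(self.recent) > self.max_per_window)
```

Once `should_fall_back()` returns `True`, every audio frame that has been processed but not yet stamped would simply take its acquisition time as its timestamp.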
In the apparatus provided by the above embodiment of the present disclosure, the acquisition unit 501 acquires audio and video data; the first determining unit 502 then determines the acquisition time of each frame in the video data as the timestamp of that frame; the second determining unit 503 then takes the first sampling time of the audio data as the start time and determines the timestamp of each audio frame based on the start time, the total number of frames already processed when processing of that frame is completed, the preset number of samples per frame, and the preset sampling frequency; finally, the storage unit 504 stores the timestamped audio and video data. This avoids the inaccurate timestamps that result from computing frame timestamps at a fixed time interval when audio and video acquisition is unstable (for example, when device overheating or insufficient performance causes dropped frames). It thereby improves the accuracy of the timestamps determined for frames in the audio and video data and improves audio-video synchronization when recording video with its original sound.
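As a worked illustration of the audio timestamp calculation, write $t_0$ for the start time, $S$ for the preset number of samples per frame, $f$ for the preset sampling frequency, and $n$ for the total number of frames processed when the current frame is completed. The symbols and the numeric values $S = 1024$, $f = 44100$ Hz, and $n = 100$ are assumptions chosen only for the example.

```latex
\[
  T_n = t_0 + n \cdot \frac{S}{f},
  \qquad
  \frac{S}{f} = \frac{1024}{44100\ \text{Hz}} \approx 0.02322\ \text{s},
  \qquad
  T_{100} \approx t_0 + 2.322\ \text{s}.
\]
```

Because $T_n$ depends on the count of frames actually processed rather than on elapsed wall-clock time, a dropped frame does not silently advance the audio timeline.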
Reference is now made to FIG. 6, which shows a schematic structural diagram of a computer system 600 of a terminal device suitable for implementing embodiments of the present disclosure. The terminal device shown in FIG. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a touch screen, a touch pad, and the like; an output section 607 including a liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a local area network (LAN) card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
According to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the present disclosure are performed. It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two.
The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquisition unit, a first determining unit, a second determining unit, and a storage unit. The names of these units do not, in some cases, constitute a limitation on the units themselves. For example, the acquisition unit may also be described as "a unit that acquires audio and video data".
As another aspect, the present disclosure also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire audio and video data, the audio and video data including audio data and video data; for each frame in the video data, determine the acquisition time of the frame as the timestamp of the frame; take the first sampling time of the audio data as the start time and determine the timestamp of each frame in the audio data based on the start time, the total number of frames processed so far, the preset number of samples per frame, and the preset sampling frequency; and store the timestamped audio and video data.
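The four program steps listed above can be sketched end to end as follows. This is an illustrative outline only: the function name `stamp_streams`, its parameters, the default values, and the returned dictionary (standing in for the storage step) are assumptions, and the sketch omits any comparison against audio acquisition times.

```python
def stamp_streams(video_acquisition_times, audio_frame_count, first_sample_time,
                  samples_per_frame=1024, sample_rate=44100):
    """Assign timestamps to video and audio frames (hypothetical helper)."""
    # Video: each frame's timestamp is simply its acquisition time.
    video = list(video_acquisition_times)

    # Audio: start time plus (frames processed) * (samples per frame / rate).
    frame_duration = samples_per_frame / sample_rate
    audio = [first_sample_time + n * frame_duration
             for n in range(1, audio_frame_count + 1)]

    # Storage step: here the timestamps are just returned as a record.
    return {"video": video, "audio": audio}
```

A caller would pass the acquisition times reported for the video frames and the number of audio frames processed, then write the returned timestamps alongside the encoded data.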

Claims (16)

  1. A method for processing data, comprising:
    acquiring audio and video data, the audio and video data comprising audio data and video data;
    determining the acquisition time of a frame in the video data as the timestamp of the frame in the video data;
    taking the first sampling time of the audio data as a start time, and determining the timestamp of a frame in the audio data based on the start time, the total number of frames already processed when processing of the frame in the audio data is completed, a preset number of samples per frame, and a preset sampling frequency; and
    storing the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data.
  2. The method according to claim 1, wherein determining the timestamp of the frame in the audio data based on the start time, the total number of frames already processed when processing of the frame in the audio data is completed, the preset number of samples per frame, and the preset sampling frequency comprises:
    determining the ratio of the preset number of samples per frame to the preset sampling frequency, determining the product of the ratio and the total number of frames already processed when processing of the frame in the audio data is completed, and determining the sum of the product and the start time as the target time of the frame in the audio data; and
    determining the timestamp of the frame in the audio data based on a numerical comparison between the target time of the frame in the audio data and the acquisition time of the frame in the audio data.
  3. The method according to claim 2, wherein determining the timestamp of the frame in the audio data based on the numerical comparison between the target time of the frame in the audio data and the acquisition time of the frame in the audio data comprises:
    in response to determining that the difference between the target time of the frame in the audio data and the acquisition time of the frame in the audio data is less than a preset value, determining the target time of the frame in the audio data as the timestamp of the frame in the audio data.
  4. The method according to claim 2, wherein determining the timestamp of the frame in the audio data based on the numerical comparison between the target time of the frame in the audio data and the acquisition time of the frame in the audio data comprises:
    in response to determining that the difference between the target time of the frame in the audio data and the acquisition time of the frame in the audio data is greater than or equal to a preset value, determining the acquisition time of the frame in the audio data as the timestamp of the frame in the audio data.
  5. The method according to claim 4, further comprising, after determining the acquisition time of the frame in the audio data as the timestamp of the frame in the audio data in response to determining that the difference between the target time of the frame in the audio data and the acquisition time of the frame in the audio data is greater than or equal to the preset value:
    performing the following information resetting step: updating the start time to the acquisition time of the frame in the audio data; and resetting the total number of frames processed so far to zero.
  6. The method according to claim 5, further comprising, after performing the information resetting step:
    determining the execution frequency of the information resetting step; and
    in response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, determining, for each frame in the audio data that has been processed but has not yet been assigned a timestamp, the acquisition time of that frame as the timestamp of the corresponding frame.
  7. The method according to claim 5, further comprising, after performing the information resetting step:
    determining the number of times the information resetting step has been performed; and
    in response to determining that the number of times the information resetting step has been performed is greater than a preset execution count threshold, determining, for each frame in the audio data that has been processed but has not yet been assigned a timestamp, the acquisition time of that frame as the timestamp of the corresponding frame.
  8. An apparatus for processing data, comprising:
    an acquisition unit configured to acquire audio and video data, the audio and video data comprising audio data and video data;
    a first determining unit configured to determine the acquisition time of a frame in the video data as the timestamp of the frame in the video data;
    a second determining unit configured to take the first sampling time of the audio data as a start time, and to determine the timestamp of a frame in the audio data based on the start time, the total number of frames already processed when processing of the frame in the audio data is completed, a preset number of samples per frame, and a preset sampling frequency; and
    a storage unit configured to store the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data.
  9. The apparatus according to claim 8, wherein the second determining unit comprises:
    a first determining module configured to determine the ratio of the preset number of samples per frame to the preset sampling frequency, determine the product of the ratio and the total number of frames already processed when processing of the frame in the audio data is completed, and determine the sum of the product and the start time as the target time of the frame in the audio data; and
    a second determining module configured to determine the timestamp of the frame in the audio data based on a numerical comparison between the target time of the frame in the audio data and the acquisition time of the frame in the audio data.
  10. The apparatus according to claim 9, wherein the second determining module is configured to:
    in response to determining that the difference between the target time of the frame in the audio data and the acquisition time of the frame in the audio data is less than a preset value, determine the target time of the frame in the audio data as the timestamp of the frame in the audio data.
  11. The apparatus according to claim 9, wherein the second determining module is configured to:
    in response to determining that the difference between the target time of the frame in the audio data and the acquisition time of the frame in the audio data is greater than or equal to a preset value, determine the acquisition time of the frame in the audio data as the timestamp of the frame in the audio data.
  12. The apparatus according to claim 11, further comprising:
    an execution unit configured to perform the following information resetting step after the acquisition time of the frame in the audio data has been determined as the timestamp of the frame in the audio data in response to determining that the difference between the target time of the frame in the audio data and the acquisition time of the frame in the audio data is greater than or equal to the preset value: updating the start time to the acquisition time of the frame in the audio data; and resetting the total number of frames processed so far to zero.
  13. The apparatus according to claim 12, further comprising:
    a third determining unit configured to determine the execution frequency of the information resetting step after the information resetting step is performed; and
    a fourth determining unit configured to, in response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, determine, for each frame in the audio data that has been processed but has not yet been assigned a timestamp, the acquisition time of that frame as the timestamp of the corresponding frame.
  14. The apparatus according to claim 12, further comprising:
    a fifth determining unit configured to determine the number of times the information resetting step has been performed after the information resetting step is performed; and
    a sixth determining unit configured to, in response to determining that the number of times the information resetting step has been performed is greater than a preset execution count threshold, determine, for each frame in the audio data that has been processed but has not yet been assigned a timestamp, the acquisition time of that frame as the timestamp of the corresponding frame.
  15. A terminal device, comprising:
    at least one processor; and
    a storage device storing at least one program,
    wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1-7.
  16. A computer-readable medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
PCT/CN2019/098584 2018-08-01 2019-07-31 Data processing method and apparatus WO2020024980A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810865732.5 2018-08-01
CN201810865732.5A CN109600665B (en) 2018-08-01 2018-08-01 Method and apparatus for processing data

Publications (1)

Publication Number Publication Date
WO2020024980A1 true WO2020024980A1 (en) 2020-02-06

Family

ID=65956762

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/098584 WO2020024980A1 (en) 2018-08-01 2019-07-31 Data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN109600665B (en)
WO (1) WO2020024980A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109600665B (en) * 2018-08-01 2020-06-19 北京微播视界科技有限公司 Method and apparatus for processing data
CN110290422B (en) * 2019-06-13 2021-09-10 浙江大华技术股份有限公司 Timestamp superposition method and device, shooting device and storage device
CN111601162B (en) * 2020-06-08 2022-08-02 北京世纪好未来教育科技有限公司 Video segmentation method and device and computer storage medium
CN113132672B (en) * 2021-03-24 2022-07-26 联想(北京)有限公司 Data processing method and video conference equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102595114A (en) * 2011-01-13 2012-07-18 安凯(广州)微电子技术有限公司 Method and terminal for playing video on low-side embedded product
WO2015107372A1 (en) * 2014-01-20 2015-07-23 British Broadcasting Corporation Method and apparatus for determining synchronisation of audio signals
CN106412662A (en) * 2016-09-20 2017-02-15 腾讯科技(深圳)有限公司 Timestamp distribution method and device
JP2017147594A (en) * 2016-02-17 2017-08-24 ヤマハ株式会社 Audio apparatus
CN109600665A (en) * 2018-08-01 2019-04-09 北京微播视界科技有限公司 Method and apparatus for handling data

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9019087B2 (en) * 2007-10-16 2015-04-28 Immersion Corporation Synchronization of haptic effect data in a media stream
CN103686315A (en) * 2012-09-13 2014-03-26 深圳市快播科技有限公司 Synchronous audio and video playing method and device
US9635334B2 (en) * 2012-12-03 2017-04-25 Avago Technologies General Ip (Singapore) Pte. Ltd. Audio and video management for parallel transcoding
CN105049917B (en) * 2015-07-06 2018-12-07 深圳Tcl数字技术有限公司 The method and apparatus of recording audio/video synchronized timestamp
CN106792073B (en) * 2016-12-29 2019-09-17 北京奇艺世纪科技有限公司 Method, playback equipment and the system that the audio, video data of striding equipment is played simultaneously
CN107135407B (en) * 2017-03-29 2019-10-18 华东交通大学 Synchronous method and system in a kind of piano video teaching
CN108322811A (en) * 2018-02-26 2018-07-24 宝鸡文理学院 A kind of synchronous method in piano video teaching and system
CN108259965B (en) * 2018-03-31 2020-05-12 湖南广播电视台广播传媒中心 Video editing method and system

Also Published As

Publication number Publication date
CN109600665A (en) 2019-04-09
CN109600665B (en) 2020-06-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19844125

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.05.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19844125

Country of ref document: EP

Kind code of ref document: A1