CN109600665B - Method and apparatus for processing data


Info

Publication number
CN109600665B
Authority
CN
China
Prior art keywords: frame, time, determining, audio data, audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810865732.5A
Other languages
Chinese (zh)
Other versions
CN109600665A (en)
Inventor
周驿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honey Grapefruit Network Technology Shanghai Co ltd
Original Assignee
Beijing Microlive Vision Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Microlive Vision Technology Co Ltd
Priority to CN201810865732.5A
Publication of CN109600665A
Priority to PCT/CN2019/098584 (WO2020024980A1)
Application granted
Publication of CN109600665B


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4302: Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4305: Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
    • H04N21/439: Processing of audio elementary streams
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85: Assembly of content; Generation of multimedia applications
    • H04N21/854: Content authoring
    • H04N21/8547: Content authoring involving timestamps for synchronizing content

Abstract

An embodiment of the present application discloses a method and an apparatus for processing data. One embodiment of the method comprises: collecting audio and video data, where the audio and video data comprise audio data and video data; for each frame in the video data, determining the acquisition time of the frame as the timestamp of the frame; taking the first sampling time of the audio data as a start time and, for each frame in the audio data, determining the timestamp of the frame based on the start time, the total number of frames processed so far, a preset number of samples per frame, and a preset sampling frequency; and storing the audio and video data with the timestamps. This embodiment improves the accuracy of the timestamps of the audio and video data.

Description

Method and apparatus for processing data
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for processing data.
Background
When recording an original-sound video, a camera is required to collect video data and a microphone is used to collect audio data. After the audio and video data are collected, timestamps for the collected data may be determined. When the audio and video data are played, the playing end plays them based on these timestamps. In applications with a video recording function, it is common for the audio and video of a recorded original-sound video to fall out of synchronization.
In the related art, the interval between two adjacent frames in the audio data and the video data is generally assumed to be fixed. For a given frame in the audio data or the video data, the sum of the previous frame's timestamp and this fixed interval is determined as the frame's timestamp, and the timestamp is recorded in the recorded audio and video data.
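To illustrate the failure mode of this related-art approach, the following is a minimal Python sketch (all names and values are hypothetical, chosen only for illustration): when the capture device drops a frame, fixed-interval timestamps drift away from the frames' true capture times, and every subsequent frame inherits the error.

```python
# Hypothetical sketch of the related-art scheme: each frame's timestamp is
# the previous frame's timestamp plus a fixed inter-frame interval.
FPS = 30
INTERVAL_MS = 1000.0 / FPS  # assumed fixed interval (~33.33 ms)

def fixed_interval_timestamps(num_frames, start_ms=0.0):
    """Assign timestamps assuming perfectly regular frame arrival."""
    return [start_ms + i * INTERVAL_MS for i in range(num_frames)]

# If the device overheats and drops the third captured frame, the third
# *received* frame was really captured at ~100 ms, but this scheme stamps
# it at ~66.7 ms -- audio and video drift out of synchronization.
print(fixed_interval_timestamps(4))  # [0.0, ~33.3, ~66.7, ~100.0]
```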
Disclosure of Invention
The embodiment of the application provides a method and a device for processing data.
In a first aspect, an embodiment of the present application provides a method for processing data, the method including: collecting audio and video data, where the audio and video data comprise audio data and video data; for each frame in the video data, determining the acquisition time of the frame as the timestamp of the frame; taking the first sampling time of the audio data as a start time and, for each frame in the audio data, determining the timestamp of the frame based on the start time, the total number of frames processed when the frame's processing is completed, a preset number of samples per frame, and a preset sampling frequency; and storing the time-stamped audio and video data.
In some embodiments, for a frame in the audio data, determining a timestamp for the frame based on the start time, the total number of frames processed when the frame processing is complete, a preset number of samples per frame, and a preset sampling frequency comprises: for a frame in audio data, determining the ratio of the number of samples of each frame to the sampling frequency, determining the product of the ratio and the total number of frames processed when the frame is processed, and determining the sum of the product and the starting time as the target time of the frame; for a frame in audio data, a timestamp for the frame is determined based on a numerical comparison of a target time for the frame and an acquisition time for the frame.
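In symbols (the notation is ours, not the application's): with t0 the first sampling time, n the total number of frames processed when the frame's processing is completed, s the preset number of samples per frame, and f the preset sampling frequency, the target time is

```latex
t_{\text{target}} = t_0 + n \cdot \frac{s}{f}
```

and the frame's timestamp is then chosen between this target time and the frame's acquisition time according to their numerical comparison.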
In some embodiments, for a frame in audio data, determining a timestamp for the frame based on a numerical comparison of a target time for the frame and an acquisition time for the frame comprises: for a frame in the audio data, in response to determining that a difference between a target time of the frame and a capture time of the frame is less than a preset value, determining the target time of the frame as a timestamp of the frame.
In some embodiments, for a frame in audio data, determining a timestamp for the frame based on a numerical comparison of a target time for the frame and an acquisition time for the frame comprises: for a frame in the audio data, in response to determining that a difference between a target time of the frame and a capture time of the frame is not less than a preset value, determining the capture time of the frame as a timestamp of the frame.
In some embodiments, after determining that the difference between the target time of the frame and the acquisition time of the frame is not less than a preset value, the method further comprises performing the following information resetting step: updating the start time to the acquisition time of the frame; and clearing the total number of currently processed frames.
In some embodiments, after performing the information resetting step, the method further comprises: determining an execution frequency of the information resetting step; and in response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, for each processed frame in the audio data whose timestamp has not yet been determined, determining the acquisition time of the frame as the timestamp of the frame.
In some embodiments, after performing the information resetting step, the method further comprises: determining the number of times the information resetting step has been executed; and in response to determining that the number of executions of the information resetting step is greater than a preset execution-count threshold, for each processed frame in the audio data whose timestamp has not yet been determined, determining the acquisition time of the frame as the timestamp of the frame.
In a second aspect, an embodiment of the present application provides an apparatus for processing data, the apparatus including: an acquisition unit configured to collect audio and video data, where the audio and video data comprise audio data and video data; a first determination unit configured to determine, for a frame in the video data, the acquisition time of the frame as the timestamp of the frame; a second determination unit configured to determine, for a frame in the audio data, the timestamp of the frame based on the start time, the total number of frames processed when the frame's processing is completed, a preset number of samples per frame, and a preset sampling frequency, taking the first sampling time of the audio data as the start time; and a storage unit configured to store the time-stamped audio and video data.
In some embodiments, the second determining unit comprises: a first determination module configured to determine, for a frame in the audio data, a ratio of a preset number of samples per frame to a preset sampling frequency, determine a product of the ratio and a total number of frames processed when the frame processing is completed, and determine a sum of the product and a start time as a target time of the frame; a second determination module configured to determine, for a frame in the audio data, a timestamp for the frame based on a numerical comparison of a target time for the frame and an acquisition time for the frame.
In some embodiments, the second determination module is further configured to: for a frame in the audio data, in response to determining that a difference between a target time of the frame and a capture time of the frame is less than a preset value, determining the target time of the frame as a timestamp of the frame.
In some embodiments, the second determination module is further configured to: for a frame in the audio data, in response to determining that a difference between a target time of the frame and a capture time of the frame is not less than a preset value, determining the capture time of the frame as a timestamp of the frame.
In some embodiments, the apparatus further comprises: an execution unit configured to execute the following information resetting steps: updating the starting time to the acquisition time of the frame; and clearing the total frame number which is processed at present.
In some embodiments, the apparatus further comprises: a third determination unit configured to determine an execution frequency of the information resetting step; and a fourth determination unit configured to determine, for a frame of which the time stamp is not determined and which is processed in the audio data, the acquisition time of the frame as the time stamp of the frame in response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold.
In some embodiments, the apparatus further comprises: a fifth determination unit configured to determine the number of times the information resetting step has been executed; and a sixth determination unit configured to determine, for a processed frame in the audio data whose timestamp is not yet determined, the acquisition time of the frame as the timestamp of the frame, in response to determining that the number of executions of the information resetting step is greater than a preset execution-count threshold.
In a third aspect, an embodiment of the present application provides a terminal device, including: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any embodiment of a method for processing data.
In a fourth aspect, embodiments of the present application provide a computer-readable medium on which a computer program is stored, which program, when executed by a processor, implements a method as in any one of the embodiments of the method for processing data.
According to the method and apparatus for processing data provided herein, audio and video data are collected; the acquisition time of each frame in the video data is determined as the timestamp of that frame; taking the first sampling time of the audio data as the start time, the timestamp of each audio frame is determined based on the start time, the total number of frames processed when that frame's processing is completed, the preset number of samples per frame, and the preset sampling frequency; and finally the time-stamped audio and video data are stored.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for processing data according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for processing data according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for processing data according to the present application;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for processing data according to the present application;
FIG. 6 is a block diagram of a terminal device computer system suitable for use in implementing embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which the method for processing data or the apparatus for processing data of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages (e.g., audio-visual data upload requests), etc. Various communication client applications, such as a video recording application, an audio playing application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen and capable of recording audio and video, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The terminal devices 101, 102, 103 may be equipped with an image capturing device (e.g., a camera) to capture video data. In practice, the smallest visual unit that makes up a video is a Frame (Frame). Each frame is a static image. Temporally successive sequences of frames are composited together to form a motion video. Furthermore, the terminal devices 101, 102, 103 may also be equipped with audio capturing means (e.g. microphones) to capture continuous analog audio signals. In practice, data obtained by performing analog-to-Digital Conversion (ADC) on continuous analog audio signals from a microphone or the like at a certain frequency is audio data.
The terminal apparatuses 101, 102, 103 can perform acquisition of video data and audio data using an image acquisition device and an audio acquisition device mounted thereon, respectively. Further, the captured video data may be subjected to processing such as time stamp calculation, and the processing result (for example, the captured audio data and the video data including the time stamp) may be finally stored.
The server 105 may be a server providing various services, such as a background server providing support for video recording type applications installed on the terminal devices 101, 102, 103. The background server can analyze and store the received data such as the audio and video data uploading request and the like. And audio and video data acquisition requests sent by the terminal equipment 101, 102 and 103 can be received, and the audio and video data indicated by the audio and video data acquisition requests are fed back to the terminal equipment 101, 102 and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the method for processing data provided in the embodiment of the present application is generally executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for processing data is generally disposed in the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for processing data in accordance with the present application is shown. The method for processing data comprises the following steps:
step 201, collecting audio and video data.
In the present embodiment, the execution subject of the method for processing data (e.g., terminal apparatuses 101, 102, 103 shown in fig. 1) may be mounted with an image pickup device (e.g., a camera) and an audio signal pickup device (e.g., a microphone). The execution main body can simultaneously start the image acquisition device and the audio signal acquisition device and acquire audio and video data by utilizing the image acquisition device and the audio signal acquisition device. The audio/video data includes audio data (voice data) and video data (vision data).
In practice, video data may be described in frames (frames). Here, a frame is the smallest visual unit constituting a video. Each frame is a static image. Temporally successive sequences of frames are composited together to form a motion video.
In practice, audio data is data obtained by digitizing a sound signal. Digitizing an audio signal means converting the continuous analog audio signal from a microphone or the like into a digital signal at a certain frequency to obtain audio data. The digitization of a sound signal typically involves three steps: sampling, quantization, and encoding. Here, sampling replaces a signal that is continuous in time with a sequence of signal sample values taken at regular intervals. Quantization approximates the continuously varying amplitude of the original signal with a finite set of amplitudes, turning the continuous amplitude of the analog signal into a finite number of discrete values at certain time intervals. Encoding represents the quantized discrete values as binary numbers according to a certain rule. Generally, there are two important indicators of the digitization process: the sampling frequency (Sampling Rate) and the sample size (Sampling Size). The sampling frequency, also referred to as the sampling rate or sampling speed, is the number of samples extracted per second from the continuous signal to form the discrete signal, and may be expressed in hertz (Hz). The sample size may be expressed in bits. Here, Pulse Code Modulation (PCM) digitizes an analog audio signal by sampling, quantizing, and encoding it. Accordingly, the audio data may be data in the PCM encoding format.
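As a brief illustration of these two indicators and the per-frame duration they imply, here is a Python sketch (the concrete values are assumptions, not values fixed by the application):

```python
# Assumed illustrative values: CD-quality mono PCM.
SAMPLING_RATE_HZ = 44100   # sampling frequency: samples extracted per second
SAMPLE_SIZE_BITS = 16      # sample size: bits per quantized sample
SAMPLES_PER_FRAME = 1024   # preset number of samples per audio frame

# Duration of one audio frame = samples per frame / sampling frequency.
frame_duration_ms = SAMPLES_PER_FRAME / SAMPLING_RATE_HZ * 1000
print(f"{frame_duration_ms:.2f} ms per frame")   # ~23.22 ms

# Raw PCM payload of one mono frame.
frame_bytes = SAMPLES_PER_FRAME * SAMPLE_SIZE_BITS // 8
print(f"{frame_bytes} bytes per frame")          # 2048 bytes
```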
The file containing the audio data may also be in other formats, such as the mp3 or ape format. In that case, the audio data may be in another encoding format (for example, a lossy compression format such as AAC (Advanced Audio Coding)) and is not limited to the PCM encoding format. After acquiring such a file, the execution body may also convert it, for example into the wav format; the audio data in the converted file is then a data stream in the PCM encoding format.
In practice, the execution main body may be installed with a video recording application. The video recording application can support the recording of the original sound video. The original sound video may be a video in which the original sound of the video is used as the background sound of the video. The user can trigger the video recording instruction by clicking the video recording button in the running interface of the video recording application. After receiving the video recording instruction, the execution main body can simultaneously start the image acquisition device and the audio acquisition device to record the original sound video.
For a frame in video data, the acquisition time of the frame is determined as the timestamp of the frame, step 202.
In this embodiment, the execution subject may record the capture time when capturing each frame of the video data. The acquisition time for each frame may be a system time stamp (e.g., unix time stamp) at the time the frame was acquired. In addition, other timestamps may be used for the acquisition time of each frame, such as a relative timestamp with respect to a specified time. It should be noted that the timestamp (timestamp) is a complete and verifiable data that can indicate that a piece of data already exists at a particular time. Generally, a time stamp is a sequence of characters that uniquely identifies a time of a moment. Here, for a frame in the video data, the execution body may determine the capture time of the frame as the time stamp of the frame.
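A minimal sketch of this step, assuming the system clock is read at the moment each video frame is captured (the function name is hypothetical):

```python
import time

def video_frame_timestamp():
    """Timestamp of a video frame = its capture time (here a Unix timestamp)."""
    return time.time()  # read at the moment the frame is captured
```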
Step 203, using the first sampling time of the audio data as the starting time, and determining the time stamp of the frame based on the starting time, the total number of frames processed when the frame processing is completed, the preset number of samples per frame and the preset sampling frequency for the frame in the audio data.
In this embodiment, the execution body may use the first sampling time of the audio data as the start time. It should be noted that, when the system time stamp is used as the capture time of the frame in the video data, the start time may be the system time stamp of the first sampling of the audio data. When the relative timestamp relative to a specified time is taken as the capture time of a frame in the video data, then the start time may be the relative timestamp of the time of the first sample of the audio data relative to the specified time. For each frame of the sequentially acquired audio data, the execution subject may perform various processes on the frame. For example, it is possible to perform processing such as transparent transmission, reverberation, equalization, sound change, tone change, and speed change. For each processed frame, the execution body may determine the timestamp of the frame based on the start time, the total number of frames processed when the frame processing is completed, a preset number of samples per frame, and a preset sampling frequency.
As an example, the execution body may first determine the duration of each frame based on the preset number of samples per frame and the preset sampling frequency. Here, the duration of each frame is the ratio of the number of samples per frame to the sampling frequency. Since the number of samples per frame and the sampling frequency are preset fixed values, the duration of each frame is a fixed value. Then, each time a frame is processed, the total number of frames processed so far (i.e., the total number of frames processed when this frame's processing is completed) may be multiplied by the duration of each frame; the product is the total duration of audio the execution body has processed when this frame's processing is completed. Finally, the sum of the start time and this total processed duration may be determined as the timestamp of the frame.
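A minimal sketch of this example computation, under assumed parameter values (the class and method names are ours):

```python
class AudioTimestamper:
    """Assigns uniform timestamps: start time + frames processed * frame duration."""

    def __init__(self, start_time_s, samples_per_frame=1024, sampling_rate_hz=44100):
        self.start = start_time_s                               # first sampling time
        self.frame_dur = samples_per_frame / sampling_rate_hz   # fixed duration per frame
        self.frames_done = 0                                    # total frames processed so far

    def on_frame_processed(self):
        """Return the timestamp of the frame whose processing just completed."""
        self.frames_done += 1
        # Total processed duration when this frame's processing completes.
        return self.start + self.frames_done * self.frame_dur
```

With 1024 samples per frame at 44100 Hz, successive timestamps advance by a uniform ~23.22 ms regardless of capture jitter.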
As another example, for each processed frame, the execution body may also determine the acquisition time of the frame as the timestamp of the frame.
In some optional implementations of this embodiment, for a frame in the audio data, the execution subject may determine the timestamp of the frame according to the following steps: first, a ratio of a preset number of samples per frame to a preset sampling frequency may be determined. The product of the ratio and the total number of frames processed at the completion of the frame processing can then be determined. The sum of the product and the start time may then be determined as the target time for the frame. Finally, a timestamp for the frame may be determined based on a numerical comparison of the target time for the frame and the acquisition time for the frame. As an example, if the difference between the target time and the acquisition time is within a preset value interval, the target time may be determined as the timestamp of the frame. If the difference between the target time and the acquisition time is not within this preset value interval, the acquisition time may be determined as the timestamp of the frame. Here, the preset value interval may be predetermined by a technician based on extensive data statistics. It should be noted that the above-mentioned determination of the frame's timestamp may be performed when the processing of each frame is completed. For each frame, the total number of frames processed when the frame processing is completed is the current total number of frames processed.
In some optional implementations of this embodiment, for a frame in the audio data, in response to determining that a difference between the target time of the frame and the acquisition time of the frame is less than a preset value, the target time of the frame may be determined as a timestamp of the frame. Here, the preset value may be a value predetermined by a technician based on a large amount of data statistics.
In some optional implementations of this embodiment, for a frame in the audio data, in response to determining that a difference between the target time of the frame and the acquisition time of the frame is not less than the preset value, the acquisition time of the frame may be determined as the timestamp of the frame.
In some optional implementations of this embodiment, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is not less than the preset value, the execution body may further perform the following information resetting step: updating the start time to the acquisition time of the frame; and clearing the total number of currently processed frames. Here, when it is detected that the difference between the target time of a certain frame and its acquisition time is not less than the preset value, the total number of currently processed frames is the total number of frames processed when that frame's processing is completed.
In some optional implementations of this embodiment, after the information resetting step is performed, the execution body may further perform the following steps: first, the execution frequency of the information resetting step may be determined. In response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, for each processed frame in the audio data whose timestamp is not yet determined, the acquisition time of the frame may be determined as the timestamp of the frame.
In some optional implementations of this embodiment, after the information resetting step is performed, the execution body may further perform the following steps: first, the number of times the information resetting step has been executed may be determined. Then, in response to determining that the number of executions of the information resetting step is greater than a preset execution-count threshold, for each processed frame in the audio data whose timestamp is not yet determined, the acquisition time of the frame may be determined as the timestamp of the frame.
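Pulling the optional implementations above together, here is a hedged sketch of the full decision logic for audio frames, implementing the execution-count variant (the threshold values and all names are assumptions; the application fixes neither, and the use of the absolute difference is our reading of "difference"):

```python
class ComparingTimestamper:
    """Choose target time vs. acquisition time, with the information resetting step."""

    PRESET_DIFF_S = 0.05   # assumed preset value for |target - acquisition|
    MAX_RESETS = 3         # assumed preset execution-count threshold

    def __init__(self, first_sample_time_s, samples_per_frame, sampling_rate_hz):
        self.start = first_sample_time_s
        self.frame_dur = samples_per_frame / sampling_rate_hz
        self.frames_done = 0
        self.reset_count = 0
        self.fallback = False  # once set, acquisition times are used directly

    def timestamp(self, acquisition_time_s):
        if self.fallback:
            return acquisition_time_s
        self.frames_done += 1
        target = self.start + self.frames_done * self.frame_dur
        if abs(target - acquisition_time_s) < self.PRESET_DIFF_S:
            return target  # stable capture: keep uniform timestamps
        # Large deviation (e.g. frames were dropped): use the acquisition
        # time and perform the information resetting step.
        self.start = acquisition_time_s   # update start time to this frame's capture time
        self.frames_done = 0              # clear the total of processed frames
        self.reset_count += 1
        if self.reset_count > self.MAX_RESETS:
            self.fallback = True          # resetting too often: stop using target times
        return acquisition_time_s
```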
And step 204, storing the audio and video data with the time stamp.
In this embodiment, the execution main body may store audio data containing a time stamp and video data containing a time stamp. Here, the audio data including the time stamp and the video data including the time stamp may be stored in two files, respectively, and a mapping of the two files may be established. In addition, the audio data containing the time stamp and the video data containing the time stamp may be stored in the same file.
In some optional implementations of this embodiment, the storing of the time-stamped audio and video data may be performed according to the following steps: first, the time-stamped audio and video data can be encoded; that is, the audio data containing timestamps and the video data containing timestamps are encoded separately. In practice, video coding may refer to converting a file in one video format into a file in another video format by a specific compression technique. The audio coding may adopt waveform coding, parametric coding, hybrid coding, or other coding modes. It should be noted that audio coding and video coding are well-known techniques that are widely researched and applied at present, and are not described here again. Then, the encoded audio and video data can be stored locally, or sent to a server. Specifically, the execution body may store the encoded audio data and the encoded video data in the same file, and store the file locally. The encoded audio data and the encoded video data may also be stored in the same file and transmitted to a server (e.g., server 105 shown in fig. 1) via a wired or wireless connection.
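As one hedged illustration of the two-file storage option described earlier (the file naming and the mapping format are our assumptions; a real implementation would typically encode the streams before writing):

```python
import json

def store_separately(audio_frames, video_frames, basename):
    """Store timestamped audio and video in two files and record their mapping.

    audio_frames / video_frames: lists of (timestamp_seconds, payload_bytes).
    """
    with open(f"{basename}.audio", "wb") as f:
        for _ts, payload in audio_frames:
            f.write(payload)
    with open(f"{basename}.video", "wb") as f:
        for _ts, payload in video_frames:
            f.write(payload)
    # The mapping ties the two files together, with per-frame timestamps.
    mapping = {
        "audio_file": f"{basename}.audio",
        "video_file": f"{basename}.video",
        "audio_timestamps": [ts for ts, _ in audio_frames],
        "video_timestamps": [ts for ts, _ in video_frames],
    }
    with open(f"{basename}.map.json", "w", encoding="utf-8") as f:
        json.dump(mapping, f)
```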
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for processing data according to the present embodiment. In the application scenario of fig. 3, a user holds the terminal device 301 and records an original sound video. The terminal device 301 runs a short video recording application. After the user clicks the original sound video recording button in the interface of the short video recording application, the terminal device 301 simultaneously starts the microphone and the camera to collect the audio data 302 and the video data 303 respectively. For a frame in video data 303, terminal device 301 may determine the capture time of the frame as the timestamp of the frame. The terminal device 301 may use the first sampling time of the audio data 302 as a start time, and determine, for a frame in the audio data 302, a timestamp of the frame based on the start time, the total number of frames processed when the frame processing is completed, a preset number of samples per frame, and a preset sampling frequency. Finally, the time-stamped audio-video data is stored in a file 304.
In the method provided by the embodiment of the application, audio and video data are collected; the acquisition time of each frame in the video data is determined as the timestamp of that frame; taking the first sampling time of the audio data as the start time, the timestamp of each audio frame is determined based on the start time, the total number of frames processed when that frame's processing is completed, the preset number of samples per frame, and the preset sampling frequency; and finally the time-stamped audio and video data are stored. Using the acquisition time as the timestamp of a video frame requires no computation with a fixed interval, since the acquisition time can be obtained directly; this avoids the inaccurate timestamps that result from computing frame timestamps at fixed intervals when video capture is unstable (for example, when frames are lost because the device overheats or its performance is insufficient). Conversely, when audio capture is unstable (for example, when device overheating or insufficient performance causes small jitter in acquisition times), directly using the acquisition time of an audio frame as its timestamp makes the timestamps non-uniform; because audio data is continuous, non-uniform timestamps are not accurate enough. With the method provided by the embodiment of the application, uniform and stable timestamps can be determined based on the total number of frames processed when each frame's processing is completed, the number of samples per frame, and the sampling frequency, avoiding non-uniform and inaccurate timestamps for audio frames. The accuracy of the timestamps of the audio and video data is thereby improved, and the audio-video synchronization of recorded original-sound videos is improved.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for processing data is shown. The flow 400 of the method for processing data includes the steps of:
and step 401, collecting audio and video data.
In the present embodiment, the execution subject of the method for processing data (e.g., terminal apparatuses 101, 102, 103 shown in fig. 1) may be mounted with an image pickup device (e.g., a camera) and an audio signal pickup device (e.g., a microphone). The execution main body can simultaneously start the image acquisition device and the audio acquisition device, and the image acquisition device and the audio acquisition device are utilized to acquire audio and video data. The audio and video data comprises audio data and video data.
For a frame in video data, an acquisition time of the frame is determined as a timestamp of the frame, step 402.
In this embodiment, the execution subject may record the capture time when capturing each frame of the video data. For a frame in video data, the execution body may determine the capture time of the frame as the timestamp of the frame.
Step 403, using the first sampling time of the audio data as the starting time, determining the ratio of the number of samples of each frame to the preset sampling frequency for the frame in the audio data, determining the product of the ratio and the total number of frames processed when the frame processing is completed, and determining the sum of the product and the starting time as the target time of the frame.
In this embodiment, the execution body may use the first sampling time of the audio data as the start time. For each frame of the sequentially acquired audio data, the execution subject may perform various processes on the frame. For example, it is possible to perform processing such as transparent transmission, reverberation, equalization, sound change, tone change, and speed change. For each processed frame, the executing entity may perform the following steps:
first, the ratio of the preset number of samples per frame to the preset sampling frequency is determined. Here, the determined ratio is the duration of each frame. Since the number of samples per frame and the sampling frequency are preset fixed values, the duration of each frame is a fixed value.
Thereafter, the product of the ratio and the total number of frames processed at the completion of the frame processing is determined. Here, for each frame that is processed, the total number of frames processed when the frame processing is completed is the current total number of frames processed. In practice, the product is the total duration of the frame processing completed by the execution body.
Finally, the sum of the product and the start time is determined as the target time of the frame.
In step 404, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than a preset value, the target time of the frame is determined as the timestamp of the frame.
In this embodiment, for a frame in audio data, in response to determining that a difference between a target time of the frame and a capture time of the frame is less than a preset value, the target time of the frame may be determined as a timestamp of the frame.
Step 405, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is not less than a preset value, determining the acquisition time of the frame as the timestamp of the frame.
In this embodiment, for a frame in the audio data, in response to determining that a difference between the target time of the frame and the capture time of the frame is not less than the preset value, the execution subject may determine the capture time of the frame as the timestamp of the frame.
In some optional implementation manners of this embodiment, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is not less than the preset value, the executing body may further perform the following information resetting step: updating the starting time to the acquisition time of the frame; and clearing the total frame number which is processed at present.
In some optional implementations of this embodiment, after the information resetting step is performed, the execution body may further determine the execution frequency of the information resetting step. In response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, for each processed frame in the audio data whose timestamp is not yet determined, the acquisition time of the frame is determined as the timestamp of the frame. It should be noted that when the execution frequency of the information resetting step is not greater than the execution frequency threshold, timestamps of subsequently processed frames in the audio data may continue to be determined according to the operation of step 403.
In some optional implementations of this embodiment, after the information resetting step is performed, the execution body may further determine the number of times the information resetting step has been executed. In response to determining that the number of executions of the information resetting step is greater than a preset execution-count threshold, for each processed frame in the audio data whose timestamp is not yet determined, the acquisition time of the frame is determined as the timestamp of the frame. It should be noted that when the number of executions of the information resetting step is not greater than the execution-count threshold, timestamps of subsequently processed frames in the audio data may continue to be determined according to the operation of step 403.
And step 406, storing the audio and video data with the time stamp.
In this embodiment, the execution main body may store audio data containing a time stamp and video data containing a time stamp. Here, the audio data including the time stamp and the video data including the time stamp may be stored in two files, respectively, and a mapping of the two files may be established. In addition, the audio data containing the time stamp and the video data containing the time stamp may be stored in the same file.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for processing data in this embodiment highlights the step of determining, for a frame in the audio data, the timestamp of the frame based on a numerical comparison of the frame's target time and its acquisition time. When audio capture is unstable (e.g., due to device overheating or insufficient performance), the acquisition times of audio frames are not uniform, whereas the target time determined from the total number of frames processed so far, the sampling frequency, and the number of samples per frame is uniform. When the deviation between the target time and the acquisition time is small, capture is relatively stable and jitter is small; using the target time as the frame's timestamp further improves the uniformity and stability of the audio timestamps. When the deviation is large, this may reflect unstable capture, frame loss, and the like; in that case, if the target time were used as the timestamp, the computed value would no longer correspond to the current frame once frames have been dropped, and accuracy would be low, so using the acquisition time instead ensures the relative accuracy of the timestamp. Determining timestamps in different ways under different conditions thus improves the accuracy of the audio and video timestamps and improves the audio-video synchronization of recorded original-sound videos.
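A small worked example of the two regimes (all numbers are assumed for illustration):

```python
# Assumed values: 1024 samples per frame at 44100 Hz, threshold 10 ms.
frame_dur_ms = 1024 / 44100 * 1000   # ~23.22 ms per audio frame
threshold_ms = 10                    # assumed preset value

small_jitter_ms = 2                  # capture jitter while acquisition is stable
print(small_jitter_ms < threshold_ms)    # True  -> use the uniform target time

dropped_gap_ms = frame_dur_ms        # one dropped frame shifts capture by ~23 ms
print(dropped_gap_ms < threshold_ms)     # False -> use the acquisition time and reset
```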
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for processing data, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for processing data according to the present embodiment includes: the acquisition unit 501 is configured to acquire audio and video data, where the audio and video data includes audio data and video data; a first determining unit 502 configured to determine, for a frame in the video data, an acquisition time of the frame as a timestamp of the frame; a second determining unit 503 configured to determine, for a frame in the audio data, a timestamp of the frame based on the start time, the total number of frames processed when the frame processing is completed, a preset number of samples per frame, and a preset sampling frequency, with the first sampling time of the audio data as a start time; a storage unit 504 configured to store the time-stamped audio-video data.
In some optional implementations of this embodiment, the second determining unit 503 may include a first determining module and a second determining module (not shown in the figure). The first determining module may be configured to determine, for a frame in the audio data, a ratio of a preset number of samples per frame to a preset sampling frequency, determine a product of the ratio and a total number of frames processed when the frame processing is completed, and determine a sum of the product and a start time as a target time of the frame. The second determination module may be configured to determine, for a frame in the audio data, a timestamp for the frame based on a numerical comparison of a target time for the frame and an acquisition time for the frame.
In some optional implementations of this embodiment, the second determining module may be further configured to, for a frame in the audio data, determine the target time of the frame as the timestamp of the frame in response to determining that a difference between the target time of the frame and the acquisition time of the frame is less than a preset value.
In some optional implementations of this embodiment, the second determining module may be further configured to, for a frame in the audio data, determine the acquisition time of the frame as the timestamp of the frame in response to determining that a difference between the target time of the frame and the acquisition time of the frame is not less than the preset value.
In some optional implementations of this embodiment, the apparatus may further include an execution unit (not shown in the figure). Wherein the execution unit may be configured to execute the following information resetting steps: updating the starting time to the acquisition time of the frame; and clearing the total frame number which is processed at present.
In some optional implementations of this embodiment, the apparatus may further include a third determining unit and a fourth determining unit (not shown in the figure). The third determining unit may be configured to determine the execution frequency of the information resetting step. The fourth determining unit may be configured to determine, for each processed frame in the audio data whose timestamp is not yet determined, the acquisition time of the frame as the timestamp of the frame, in response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold.
In some optional implementations of this embodiment, the apparatus may further include a fifth determining unit and a sixth determining unit (not shown in the figure). The fifth determining unit may be configured to determine the number of times the information resetting step has been executed. The sixth determining unit may be configured to determine, for each processed frame in the audio data whose timestamp is not yet determined, the acquisition time of the frame as the timestamp of the frame, in response to determining that the number of executions of the information resetting step is greater than a preset execution-count threshold.
In the apparatus provided by the above embodiment of the present application, the acquisition unit 501 acquires audio and video data; the first determining unit 502 determines the acquisition time of each frame in the video data as the timestamp of that frame; the second determining unit 503, taking the first sampling time of the audio data as the start time, determines the timestamp of each audio frame based on the start time, the total number of frames processed when that frame's processing is completed, the preset number of samples per frame, and the preset sampling frequency; and finally the storage unit 504 stores the time-stamped audio and video data. This solves the problem of inaccurate timestamps caused by computing frame timestamps at a fixed interval when audio and video capture is unstable (for example, frame loss caused by device overheating or insufficient performance), improves the accuracy of the determined timestamps, and improves the audio-video synchronization of recorded original-sound videos.
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing a terminal device of an embodiment of the present application. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a touch screen, a touch panel, and the like; an output portion 607 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a semiconductor memory or the like is mounted on the drive 610 as necessary, so that the computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a first determination unit, a second determination unit, and a storage unit. The names of the units do not in some cases form a limitation on the units themselves, and for example, the acquisition unit may also be described as a "unit for acquiring audio and video data".
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: collect audio and video data, the audio and video data including audio data and video data; for a frame in the video data, determine the acquisition time of the frame as the timestamp of the frame; for a frame in the audio data, take the first sampling time of the audio data as the start time and determine the timestamp of the frame based on the start time, the total number of frames processed so far, the preset number of samples per frame, and the preset sampling frequency; and store the time-stamped audio and video data.
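For illustration only, and not as a statement of the claimed implementation, the timestamping scheme just described can be sketched in Python roughly as follows. The names AudioTimestamper, samples_per_frame, sample_rate_hz, and video_timestamp are hypothetical, and reading "total number of frames processed" as the count at the moment the frame's processing completes is one interpretation of the claim wording:

    # A minimal sketch, assuming hypothetical names and times in seconds.
    class AudioTimestamper:
        def __init__(self, samples_per_frame, sample_rate_hz):
            self.samples_per_frame = samples_per_frame  # preset number of samples per frame
            self.sample_rate_hz = sample_rate_hz        # preset sampling frequency
            self.start_time = None                      # first sampling time of the audio data
            self.frames_processed = 0                   # total number of frames processed

        def audio_timestamp(self, acquisition_time):
            # The first sampling time of the audio data anchors the start time.
            if self.start_time is None:
                self.start_time = acquisition_time
            # Count this frame as processed, then place its timestamp at
            # start_time + frames_processed * (samples_per_frame / sample_rate_hz).
            self.frames_processed += 1
            frame_duration = self.samples_per_frame / self.sample_rate_hz
            return self.start_time + self.frames_processed * frame_duration

    def video_timestamp(acquisition_time):
        # For a video frame, the acquisition time itself is the timestamp.
        return acquisition_time

Under this reading, audio timestamps advance by a fixed frame duration regardless of jitter in the acquisition times, which keeps the stored audio track evenly spaced.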
The above description is only a preferred embodiment of the present application and an illustration of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention disclosed herein is not limited to the particular combination of features described above, and also covers other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example, arrangements in which the above features are replaced by (but not limited to) features with similar functions disclosed in the present application.

Claims (16)

1. A method for processing data, comprising:
collecting audio and video data, wherein the audio and video data comprises audio data and video data;
for a frame in video data, determining the acquisition time of the frame as a timestamp of the frame;
for a frame in the audio data, taking the first sampling time of the audio data as the start time, and determining the timestamp of the frame based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency;
storing the audio and video data with the timestamps;
wherein taking the first sampling time of the audio data as the start time, and determining, for a frame in the audio data, the timestamp of the frame based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency comprises:
for a frame in the audio data, determining the ratio of the preset number of samples per frame to the preset sampling frequency, determining the product of the ratio and the total number of frames processed when processing of the frame is completed, and determining the timestamp of the frame from the sum of the product and the start time.
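As a purely illustrative calculation (the numeric values are assumptions, not drawn from the claims): with a preset 1024 samples per frame and a preset sampling frequency of 44100 Hz, the ratio is 1024 / 44100 ≈ 0.02322 s, so a frame whose completed processing brings the total to 100 frames would receive the timestamp start time + 100 × 0.02322 s ≈ start time + 2.322 s.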
2. The method for processing data according to claim 1, wherein said determining, for a frame in the audio data, the timestamp of the frame based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency comprises:
for a frame in the audio data, determining the ratio of the preset number of samples per frame to the preset sampling frequency, determining the product of the ratio and the total number of frames processed when processing of the frame is completed, and determining the sum of the product and the start time as the target time of the frame;
for a frame in the audio data, determining the timestamp of the frame based on a numerical comparison of the target time of the frame and the acquisition time of the frame.
3. The method for processing data according to claim 2, wherein said determining, for a frame in audio data, a timestamp for the frame based on a numerical comparison of a target time for the frame and an acquisition time for the frame comprises:
for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than a preset value, determining the target time of the frame as the timestamp of the frame.
4. The method for processing data according to claim 2, wherein said determining, for a frame in audio data, a timestamp for the frame based on a numerical comparison of a target time for the frame and an acquisition time for the frame comprises:
for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is not less than the preset value, determining the acquisition time of the frame as the timestamp of the frame.
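A minimal sketch of this comparison, assuming the difference is taken as an absolute value and using an arbitrary placeholder for the preset value (both are interpretive assumptions the claims leave open):

    # PRESET_VALUE is hypothetical; the claims do not fix a number.
    PRESET_VALUE = 0.2  # seconds

    def choose_timestamp(target_time, acquisition_time, preset=PRESET_VALUE):
        # Claim 3: small drift, keep the smooth computed target time.
        # Claim 4: large drift, fall back to the wall-clock acquisition time.
        if abs(target_time - acquisition_time) < preset:
            return target_time
        return acquisition_time

The effect is that audio timestamps stay evenly spaced while the computed clock tracks the acquisition clock, and snap back to the acquisition clock once the two drift apart.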
5. The method for processing data of claim 4, wherein, after determining that the difference between the target time of the frame and the acquisition time of the frame is not less than the preset value, the method further comprises:
performing the following information resetting step: updating the start time to the acquisition time of the frame; and clearing the total number of frames processed so far.
6. The method for processing data according to claim 5, wherein after performing said information resetting step, said method further comprises:
determining an execution frequency of the information resetting step;
in response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, determining, for a processed frame in the audio data whose timestamp has not yet been determined, the acquisition time of the frame as the timestamp of the frame.
7. The method for processing data according to claim 5, wherein after performing said information resetting step, said method further comprises:
determining the number of times the information resetting step has been executed;
and in response to determining that the number of executions of the information resetting step is greater than a preset execution count threshold, determining, for a processed frame in the audio data whose timestamp has not yet been determined, the acquisition time of the frame as the timestamp of the frame.
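A minimal sketch of the information resetting step and its guard in claims 5-7, reusing the hypothetical AudioTimestamper fields from the earlier sketch; ResetTracker and the threshold of 3 are assumptions:

    # Hypothetical tracker for the information resetting step.
    class ResetTracker:
        def __init__(self, max_resets=3):
            self.reset_count = 0          # executions of the resetting step
            self.max_resets = max_resets  # preset execution count threshold (assumed)

        def reset(self, timestamper, acquisition_time):
            # Claim 5: update the start time and clear the processed-frame count.
            timestamper.start_time = acquisition_time
            timestamper.frames_processed = 0
            self.reset_count += 1
            # Claims 6-7: frequent resets mean the computed clock is unreliable,
            # so the caller should stamp pending frames with their acquisition
            # times instead of computed target times.
            return self.reset_count > self.max_resets

Claim 6 states the same guard as a frequency (executions per unit time) rather than a raw count.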
8. An apparatus for processing data, comprising:
a collection unit configured to collect audio and video data, the audio and video data including audio data and video data;
a first determination unit configured to determine, for a frame in video data, an acquisition time of the frame as a timestamp of the frame;
a second determination unit configured to take the first sampling time of the audio data as the start time and determine, for a frame in the audio data, the timestamp of the frame based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency;
a storage unit configured to store the audio and video data with the timestamps;
wherein the second determination unit is further configured to determine, for a frame in the audio data, the ratio of the preset number of samples per frame to the preset sampling frequency, determine the product of the ratio and the total number of frames processed when processing of the frame is completed, and determine the timestamp of the frame from the sum of the product and the start time.
9. The apparatus for processing data according to claim 8, wherein said second determination unit comprises:
a first determination module configured to determine, for a frame in audio data, a ratio of a preset number of samples per frame to a preset sampling frequency, determine a product of the ratio and a total number of frames processed when the frame processing is completed, and determine a sum of the product and a start time as a target time of the frame;
a second determination module configured to determine, for a frame in the audio data, a timestamp for the frame based on a numerical comparison of a target time for the frame and an acquisition time for the frame.
10. The apparatus for processing data of claim 9, wherein the second determination module is further configured to:
for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than a preset value, determine the target time of the frame as the timestamp of the frame.
11. The apparatus for processing data of claim 9, wherein the second determination module is further configured to:
for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is not less than the preset value, determine the acquisition time of the frame as the timestamp of the frame.
12. The apparatus for processing data according to claim 11, wherein said apparatus further comprises:
an execution unit configured to perform the following information resetting step: updating the start time to the acquisition time of the frame; and clearing the total number of frames processed so far.
13. The apparatus for processing data according to claim 12, wherein said apparatus further comprises:
a third determination unit configured to determine an execution frequency of the information resetting step;
a fourth determination unit configured to, in response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, determine, for a processed frame in the audio data whose timestamp has not yet been determined, the acquisition time of the frame as the timestamp of the frame.
14. The apparatus for processing data according to claim 12, wherein said apparatus further comprises:
a fifth determination unit configured to determine the number of times the information resetting step has been executed;
a sixth determination unit configured to, in response to determining that the number of executions of the information resetting step is greater than a preset execution count threshold, determine, for a processed frame in the audio data whose timestamp has not yet been determined, the acquisition time of the frame as the timestamp of the frame.
15. A terminal device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
16. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201810865732.5A 2018-08-01 2018-08-01 Method and apparatus for processing data Active CN109600665B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810865732.5A CN109600665B (en) 2018-08-01 2018-08-01 Method and apparatus for processing data
PCT/CN2019/098584 WO2020024980A1 (en) 2018-08-01 2019-07-31 Data processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810865732.5A CN109600665B (en) 2018-08-01 2018-08-01 Method and apparatus for processing data

Publications (2)

Publication Number Publication Date
CN109600665A (en) 2019-04-09
CN109600665B (en) 2020-06-19

Family

ID=65956762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810865732.5A Active CN109600665B (en) 2018-08-01 2018-08-01 Method and apparatus for processing data

Country Status (2)

Country Link
CN (1) CN109600665B (en)
WO (1) WO2020024980A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109600665B (en) * 2018-08-01 2020-06-19 北京微播视界科技有限公司 Method and apparatus for processing data
CN110290422B (en) * 2019-06-13 2021-09-10 浙江大华技术股份有限公司 Timestamp superposition method and device, shooting device and storage device
CN111601162B (en) * 2020-06-08 2022-08-02 北京世纪好未来教育科技有限公司 Video segmentation method and device and computer storage medium
CN113132672B (en) * 2021-03-24 2022-07-26 联想(北京)有限公司 Data processing method and video conference equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103686315A (en) * 2012-09-13 2014-03-26 深圳市快播科技有限公司 Synchronous audio and video playing method and device
CN105049917A (en) * 2015-07-06 2015-11-11 深圳Tcl数字技术有限公司 Method and device for recording an audio and video synchronization timestamp
CN106412662A (en) * 2016-09-20 2017-02-15 腾讯科技(深圳)有限公司 Timestamp distribution method and device
US9635334B2 (en) * 2012-12-03 2017-04-25 Avago Technologies General Ip (Singapore) Pte. Ltd. Audio and video management for parallel transcoding
CN107135407A (en) * 2017-03-29 2017-09-05 华东交通大学 Synchronous method and system in a kind of piano video teaching

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9019087B2 (en) * 2007-10-16 2015-04-28 Immersion Corporation Synchronization of haptic effect data in a media stream
CN102595114B (en) * 2011-01-13 2015-04-15 安凯(广州)微电子技术有限公司 Method and terminal for playing video on low-side embedded product
GB2522260A (en) * 2014-01-20 2015-07-22 British Broadcasting Corp Method and apparatus for determining synchronisation of audio signals
JP6720566B2 (en) * 2016-02-17 2020-07-08 ヤマハ株式会社 Audio equipment
CN106792073B (en) * 2016-12-29 2019-09-17 北京奇艺世纪科技有限公司 Method, playback equipment and the system that the audio, video data of striding equipment is played simultaneously
CN108322811A (en) * 2018-02-26 2018-07-24 宝鸡文理学院 A kind of synchronous method in piano video teaching and system
CN108259965B (en) * 2018-03-31 2020-05-12 湖南广播电视台广播传媒中心 Video editing method and system
CN109600665B (en) * 2018-08-01 2020-06-19 北京微播视界科技有限公司 Method and apparatus for processing data

Also Published As

Publication number Publication date
CN109600665A (en) 2019-04-09
WO2020024980A1 (en) 2020-02-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230103

Address after: Room 1445A, No. 55 Xili Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 200120

Patentee after: Honey Grapefruit Network Technology (Shanghai) Co.,Ltd.

Address before: 100080 408, 4th floor, 51 Zhichun Road, Haidian District, Beijing

Patentee before: BEIJING MICROLIVE VISION TECHNOLOGY Co.,Ltd.