WO2020024980A1

WO2020024980A1 - Data processing method and apparatus

Info

Publication number: WO2020024980A1
Application number: PCT/CN2019/098584
Authority: WO
Inventors: 周驿
Original assignee: 北京微播视界科技有限公司
Priority date: 2018-08-01
Filing date: 2019-07-31
Publication date: 2020-02-06
Also published as: CN109600665A; CN109600665B

Abstract

Disclosed in embodiments of the present application are a data processing method and apparatus. One exemplary embodiment of the method comprises: acquiring audio/video data, the audio/video data comprising audio data and video data; determining acquisition times of frames in the video data as timestamps of the frames in the video data; taking a first sampling time of the audio data as a starting time, and determining timestamps of frames in the audio data on the basis of the starting time, the total number of processed frames when the processing of the frames in the audio data is completed, a preset number of samplings of each frame, and a preset sampling frequency; and storing the audio/video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data.

Description

Method and device for processing data

The present disclosure claims priority to Chinese Patent Application No. 201810865732.5, filed with the China Patent Office on August 1, 2018, the entire contents of which are incorporated herein by reference.

Technical field

Embodiments of the present disclosure relate to the field of computer technology, for example, to a method and an apparatus for processing data.

Background

When recording a video with its original sound, a camera is used to collect video data while a microphone collects audio data. After the audio and video data are collected, timestamps of the collected audio and video data can be determined. When the audio and video data are played, the player plays them according to the timestamps. In applications with a video recording function, it is common for the audio and the video in a recording to be out of sync.

In the related art, the interval between two adjacent frames in audio data and in video data is generally assumed to be fixed. For a given frame in the audio data or video data, the sum of the timestamp of the previous frame and the interval is determined as the timestamp of that frame. The timestamps are then recorded in the recorded audio and video data.

Summary of the invention

The following is an overview of the topics detailed in this article. This summary is not intended to limit the scope of protection of the claims.

The embodiments of the present disclosure provide a method and an apparatus for processing data.

In a first aspect, an embodiment of the present disclosure provides a method for processing data, the method including: collecting audio and video data, the audio and video data including audio data and video data; determining the collection time of each frame in the video data as the timestamp of that frame; using the first sampling time of the audio data as a start time, and determining the timestamps of the frames in the audio data based on the start time, the total number of frames processed when processing of a frame in the audio data is completed, a preset number of samples per frame, and a preset sampling frequency; and storing the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data.

In a second aspect, an embodiment of the present disclosure provides an apparatus for processing data. The apparatus includes: an acquisition unit configured to collect audio and video data, the audio and video data including audio data and video data; a first determining unit configured to determine the collection time of each frame in the video data as the timestamp of that frame; a second determining unit configured to use the first sampling time of the audio data as a start time, and to determine the timestamps of the frames in the audio data based on the start time, the total number of frames processed when processing of a frame in the audio data is completed, a preset number of samples per frame, and a preset sampling frequency; and a storage unit configured to store the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data.

In a third aspect, an embodiment of the present disclosure provides a terminal device including: at least one processor; and a storage device storing at least one program which, when executed by the at least one processor, causes the at least one processor to implement any one of the methods for processing data.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium storing a computer program that, when executed by a processor, implements any one of the methods for processing data.

Other aspects may be understood after reading and understanding the accompanying drawings and the detailed description.

Brief description of the drawings

FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;

FIG. 2 is a flowchart of an embodiment of a method for processing data according to the present disclosure;

FIG. 3 is a schematic diagram of an application scenario of a method for processing data according to the present disclosure;

FIG. 4 is a flowchart of still another embodiment of a method for processing data according to the present disclosure;

FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for processing data according to the present disclosure;

FIG. 6 is a schematic structural diagram of a computer system of a terminal device suitable for implementing the embodiments of the present disclosure.

Detailed description

The disclosure is further described in detail below with reference to the drawings and embodiments. It can be understood that the example embodiments described herein are only used to explain the disclosure, but not to limit the disclosure. It should also be noted that, for convenience of description, only the parts related to the present disclosure are shown in the drawings.

It should be noted that, in the case of no conflict, the embodiments in the present disclosure and the features in the embodiments can be combined with each other. The disclosure will be described in detail below with reference to the drawings and embodiments.

FIG. 1 illustrates an exemplary system architecture 100 to which a method for processing data or a device for processing data of the present disclosure can be applied.

As shown in FIG. 1, the system architecture 100 may include a terminal device 101, a terminal device 102, a terminal device 103, a network 104, and a server 105. The network 104 provides the medium for communication links between the terminal device 101, the terminal device 102, the terminal device 103, and the server 105. The network 104 may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables.

The user can use the terminal device 101, the terminal device 102, and the terminal device 103 to interact with the server 105 through the network 104 to receive or send messages (such as audio and video data upload requests) and the like. Various communication client applications can be installed on the terminal device 101, the terminal device 102, and the terminal device 103, such as video recording applications, audio playback applications, instant communication tools, email clients, social platform software, and the like.

The terminal device 101, the terminal device 102, and the terminal device 103 may be hardware or software. When the terminal device 101, the terminal device 102, and the terminal device 103 are hardware, they can be various electronic devices that have a display screen and support audio and video recording, including but not limited to smartphones, tablets, laptops, and desktop computers. When the terminal device 101, the terminal device 102, and the terminal device 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is imposed here.

The terminal device 101, the terminal device 102, and the terminal device 103 may be equipped with an image acquisition device (such as a camera) to collect video data. In practice, the smallest visual unit that makes up a video is a frame. Each frame is a static image. Combining a sequence of temporally consecutive frames together forms a dynamic video. In addition, the terminal device 101, the terminal device 102, and the terminal device 103 may also be installed with an audio collection device (such as a microphone) to collect continuous analog audio signals. In practice, the data obtained by performing analog-to-digital conversion (ADC) on a continuous analog audio signal from a device such as a microphone at a certain frequency is audio data.

The terminal device 101, the terminal device 102, and the terminal device 103 may use the image acquisition device and the audio acquisition device installed on them to collect video data and audio data, respectively. In addition, processing such as timestamp calculation may be performed on the collected audio and video data, and the processing results (such as the collected audio data and video data carrying timestamps) are finally stored.

The server 105 may be a server that provides various services, such as a background server that supports the video recording applications installed on the terminal device 101, the terminal device 102, and the terminal device 103. The background server can analyze and store received data such as audio and video data upload requests. It can also receive audio and video data acquisition requests sent by the terminal device 101, the terminal device 102, and the terminal device 103, and return the audio and video data indicated by those requests to the terminal device 101, the terminal device 102, and the terminal device 103.

It should be noted that the server may be hardware or software. When the server is hardware, it can be implemented as a distributed server cluster consisting of multiple servers or as a single server. When the server is software, it can be implemented as multiple software or software modules (for example, to provide distributed services), or it can be implemented as a single software or software module. It is not specifically limited here.

It should be noted that the method for processing data provided by the embodiments of the present disclosure is generally executed by the terminal device 101, the terminal device 102, and the terminal device 103. Accordingly, the apparatus for processing data is generally provided in the terminal device 101, the terminal device 102, and the terminal device 103.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely exemplary. According to implementation needs, there can be any number of terminal devices, networks, and servers.

With continued reference to FIG. 2, a flowchart 200 of one embodiment of a method of processing data according to the present disclosure is shown. The method for processing data includes steps 201 to 204.

In step 201, audio and video data is collected.

In this embodiment, an execution subject of the method for processing data (for example, the terminal devices 101, 102, and 103 shown in FIG. 1) may be installed with an image acquisition device (such as a camera) and an audio signal acquisition device (such as a microphone). The execution subject may turn on the image acquisition device and the audio signal acquisition device at the same time, and use the image acquisition device and the audio signal acquisition device to collect audio and video data. The audio and video data includes audio data and video data.

In practice, video data can be described by frames. Here, a frame is the smallest visual unit that makes up a video. Each frame is a static image. Combining a sequence of temporally consecutive frames together forms a dynamic video.

In practice, audio data is data obtained by digitizing a sound signal. Digitizing a sound signal means converting the continuous analog audio signal from a microphone or similar device into a digital signal at a certain frequency. The digitization process usually includes three steps: sampling, quantization, and encoding. Sampling replaces a signal that is continuous in time with a sequence of sample values taken at regular time intervals. Quantization approximates the continuously varying amplitude with a finite set of amplitude levels, turning the continuous amplitude of the analog signal into a finite number of discrete values at those time intervals. Encoding represents the quantized discrete values as binary digits according to a certain rule. There are generally two important indicators for the digitization of a sound signal: the sampling frequency (Sampling Rate) and the sampling size (Sampling Size). The sampling frequency is also called the sampling speed or sampling rate. It is the number of samples per second that are taken from the continuous signal to form the discrete signal, and it can be expressed in hertz (Hz). The sampling size can be expressed in bits. Pulse code modulation (PCM) produces digital audio data by sampling, quantizing, and encoding an analog audio signal. Therefore, the above audio data may be data in the PCM encoding format.
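The three digitization steps can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the function name digitize, the 440 Hz test tone, and the choice of 16-bit little-endian samples at 44100 Hz are all assumptions made for the example.

```python
import math
import struct

MAX_AMPLITUDE = 32767  # largest value of a signed 16-bit sample (assumed sample size)

def digitize(signal, duration_s, sample_rate_hz):
    """Sample, quantize, and encode a continuous signal as 16-bit little-endian PCM.

    `signal` is any function mapping time in seconds to an amplitude in [-1.0, 1.0].
    """
    num_samples = round(duration_s * sample_rate_hz)
    pcm = bytearray()
    for n in range(num_samples):
        t = n / sample_rate_hz                      # sampling at regular time intervals
        value = max(-1.0, min(1.0, signal(t)))      # clip to the representable range
        level = round(value * MAX_AMPLITUDE)        # quantization to a discrete level
        pcm += struct.pack("<h", level)             # encoding as two binary bytes
    return bytes(pcm)

def tone(t):
    """Hypothetical stand-in for the analog input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)

pcm_bytes = digitize(tone, duration_s=0.01, sample_rate_hz=44100)
```

Here 0.01 s at 44100 Hz yields 441 samples of two bytes each.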

It should be noted that the file describing the audio data may also be in other formats, such as the mp3 format or the ape format. In that case, the audio data may be data in another encoding format (for example, a lossy compression format such as AAC (Advanced Audio Coding)) and is not limited to the PCM encoding format. After obtaining such a file, the execution body may also convert it into the wav format. The audio data in the converted file is then a data stream in the PCM encoding format.

In practice, a video recording application may be installed in the execution body. This video recording application can support the recording of original-sound videos. An original-sound video is a video that uses the sound captured during recording as its background sound. The user can trigger a video recording instruction by clicking the video recording button in the running interface of the video recording application. After receiving the video recording instruction, the execution subject may turn on the image acquisition device and the audio acquisition device at the same time to record the original-sound video.

In step 202, for a frame in the video data, the acquisition time of the frame is determined as the time stamp of the frame.

In this embodiment, the above-mentioned execution subject may record the collection time of each frame of video data when that frame is collected. The collection time of each frame may be the system timestamp (such as a Unix timestamp) at the moment the frame is collected. The collection time of each frame can also use other timestamps, for example, a relative timestamp measured from a specified time. It should be noted that a timestamp is complete, verifiable data that can show that a piece of data existed at a specific time. Usually, a timestamp is a sequence of characters that uniquely identifies a moment in time. Here, for each frame in the video data, the execution body may determine the collection time of the frame as the timestamp of the frame.
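As a rough illustration of step 202, the sketch below stamps a video frame with its collection time. The names VideoFrame and stamp_video_frame, and the injectable clock parameter, are hypothetical; the disclosure does not prescribe a data structure.

```python
import time
from dataclasses import dataclass

@dataclass
class VideoFrame:
    image: bytes        # placeholder for the frame's pixel data
    timestamp: float    # seconds

def stamp_video_frame(image, clock=time.time, reference=None):
    """Use the collection time of a frame as its timestamp.

    With `reference` left as None the system (Unix) timestamp is used;
    otherwise the timestamp is relative to the specified reference time.
    """
    collected_at = clock()
    if reference is not None:
        collected_at -= reference
    return VideoFrame(image=image, timestamp=collected_at)
```

Passing a fake clock, for example `stamp_video_frame(b"", clock=lambda: 100.0, reference=40.0)`, gives a frame whose timestamp is 60.0.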

In step 203, the first sampling time of the audio data is used as the start time, and the timestamps of the frames in the audio data are determined based on the start time, the total number of frames processed when processing of a frame is completed, the preset number of samples per frame, and the preset sampling frequency.

In this embodiment, the execution body may use the first sampling time of the audio data as the start time. It should be noted that when a system timestamp is used as the collection time of frames in the video data, the start time may be the system timestamp of the first sampling of the audio data. When a relative timestamp measured from a specified time is used as the collection time of frames in the video data, the start time may be the relative timestamp of the first sampling of the audio data, measured from the same specified time. For each frame of the audio data collected in sequence, the execution body can perform various kinds of processing on the frame, for example, transparent transmission, reverberation, equalization, sound change, tone change, and speed change. For each frame whose processing is completed, the execution body may determine the timestamp of the frame based on the start time, the total number of frames processed when processing of that frame is completed, the preset number of samples per frame, and the preset sampling frequency.

The processing of frames in the present disclosure refers to processing such as transparent transmission, reverberation, equalization, sound change, tone change, speed change, and the like.

As an example, the execution body may first determine the duration of each frame based on the preset number of samples per frame and the preset sampling frequency. Here, the duration of each frame is the ratio of the number of samples per frame to the sampling frequency. Since the number of samples per frame and the sampling frequency are preset fixed values, the duration of each frame is also a fixed value. Then, each time processing of a frame is completed, the total number of frames currently processed (that is, the total number of frames processed when processing of that frame is completed) can be multiplied by the duration of each frame. The product is the total duration of audio that the execution subject has processed at that point. Finally, the sum of the start time and this total duration can be determined as the timestamp of the frame.
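The calculation in this example can be sketched in Python as follows. The constants 1024 samples per frame and 44100 Hz are assumed values chosen for illustration, and the function names are hypothetical. Fractions are used so that the arithmetic stays exact.

```python
from fractions import Fraction

SAMPLES_PER_FRAME = 1024   # assumed preset number of samples per frame
SAMPLE_RATE_HZ = 44100     # assumed preset sampling frequency

def frame_duration(samples_per_frame=SAMPLES_PER_FRAME, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one audio frame in seconds: samples per frame / sampling frequency."""
    return Fraction(samples_per_frame, sample_rate_hz)

def audio_timestamp(start_time, frames_processed,
                    samples_per_frame=SAMPLES_PER_FRAME, sample_rate_hz=SAMPLE_RATE_HZ):
    """Timestamp of the frame whose processing has just completed.

    `frames_processed` is the total number of frames processed so far,
    including the frame being stamped.
    """
    total_duration = frames_processed * frame_duration(samples_per_frame, sample_rate_hz)
    return start_time + total_duration
```

Because every frame advances the timestamp by the same fixed duration, the resulting timestamps are evenly spaced regardless of when each frame was actually delivered.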

As another example, for each frame that has been processed, the execution body may determine the collection time of the frame as the time stamp of the frame.

In some implementations of this embodiment, for a frame in the audio data, the execution body may determine the timestamp of the frame according to the following steps. First, the ratio of the preset number of samples per frame to the preset sampling frequency may be determined. Then, the product of this ratio and the total number of frames processed when processing of the frame is completed can be determined. After that, the sum of this product and the start time can be determined as the target time of the frame. Finally, the timestamp of the frame can be determined based on a comparison between the target time of the frame and the collection time of the frame. As an example, if the difference between the target time and the collection time is within a preset numerical interval, the target time may be determined as the timestamp of the frame. If the difference between the target time and the collection time is not within that interval, the collection time may be determined as the timestamp of the frame. Here, the numerical interval may be preset by a technician based on statistics over a large amount of data. It should be noted that the step of determining the timestamp of a frame may be executed each time processing of a frame is completed. For each frame, the total number of frames processed when processing of the frame is completed is the total number of frames currently processed.
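A minimal sketch of this comparison is given below. The function name choose_timestamp and the default interval of plus or minus 50 ms are assumptions for illustration only; the disclosure leaves the interval to be set from data statistics.

```python
def choose_timestamp(start_time, frames_processed, collection_time,
                     samples_per_frame, sample_rate_hz, interval=(-0.05, 0.05)):
    """Pick the target time or the collection time as the frame's timestamp.

    All times are in seconds. `interval` is the assumed preset numerical
    interval for (target time - collection time).
    """
    # Target time: start time plus the total duration processed so far.
    target_time = start_time + frames_processed * samples_per_frame / sample_rate_hz
    low, high = interval
    if low <= target_time - collection_time <= high:
        return target_time       # small jitter: keep the uniform timestamp
    return collection_time       # large deviation: fall back to the collection time
```

For example, with 1000 samples per frame at 50000 Hz, the tenth frame has a target time of 0.2 s. A collection time of 0.21 s is within the interval, so 0.2 s is used; a collection time of 0.5 s is not, so 0.5 s is used.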

In some implementations of this embodiment, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the collection time of the frame is less than a preset value, the target time of the frame may be determined as the timestamp of the frame. Here, the preset value may be determined in advance by a technician based on statistics over a large amount of data.

In some implementations of this embodiment, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the collection time of the frame is greater than or equal to the preset value, the collection time of the frame may be determined as the timestamp of the frame.

In some implementations of this embodiment, in response to determining that the difference between the target time of the frame and the collection time of the frame is greater than or equal to the preset value, the execution subject may also perform the following information resetting step: the start time is updated to the collection time of the frame, and the total number of frames currently processed is cleared. Here, when the difference between the target time of a frame and the collection time of the frame is detected to be less than the preset value, the total number of frames currently processed is the total number of frames processed when processing of that frame is completed.
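The threshold check and the information resetting step can be sketched as a small stateful class. This is an illustrative assumption rather than the disclosed implementation: the class name AudioTimestamper and its attributes are hypothetical, and the "difference" is taken here as an absolute difference, which the text does not state explicitly.

```python
class AudioTimestamper:
    """Assigns timestamps to processed audio frames, resetting on large drift."""

    def __init__(self, start_time, samples_per_frame, sample_rate_hz, preset_value):
        self.start_time = start_time                      # first sampling time, in seconds
        self.frame_duration = samples_per_frame / sample_rate_hz
        self.preset_value = preset_value                  # drift threshold, in seconds
        self.frames_processed = 0
        self.reset_count = 0

    def stamp(self, collection_time):
        """Return the timestamp for the frame whose processing just completed."""
        self.frames_processed += 1
        target_time = self.start_time + self.frames_processed * self.frame_duration
        # Assumption: the difference is compared as an absolute value.
        if abs(target_time - collection_time) < self.preset_value:
            return target_time
        # Information resetting step: restart counting from this frame.
        self.start_time = collection_time
        self.frames_processed = 0
        self.reset_count += 1
        return collection_time
```

After a reset, later frames are timed relative to the new start time, so one large gap in collection does not shift every subsequent timestamp.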

In some implementations of this embodiment, after performing the information resetting step, the execution subject may further perform the following steps. First, the execution frequency of the information resetting step may be determined. Then, in response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, for each processed frame in the audio data whose timestamp has not yet been determined, the collection time of the frame may be determined as the timestamp of the frame.

In some implementations of this embodiment, the execution frequency of the information resetting step may be determined by calculating it, or by directly reading a stored value, after the information resetting step is performed. Once determined, the execution frequency is compared with the preset execution frequency threshold.

In some implementations of this embodiment, after performing the information resetting step, the execution subject may further perform the following steps. First, the number of times the information resetting step has been executed may be determined. Then, in response to determining that this number is greater than a preset execution count threshold, for each processed frame in the audio data whose timestamp has not yet been determined, the collection time of the frame may be determined as the timestamp of the frame.

In some implementations of this embodiment, the number of times the information resetting step has been executed may be determined by calculating it, or by directly reading a stored value, after the information resetting step is performed. Once determined, the number of executions is compared with the preset execution count threshold.
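The count-based fallback can be sketched as below. The function name stamp_audio_frames and its parameters are hypothetical, times are integer microseconds to keep the arithmetic exact, and the difference is again assumed to be an absolute difference. A frequency-based variant would count resets within a time window instead of in total; it is not shown here.

```python
def stamp_audio_frames(collection_times, start_time, frame_duration,
                       preset_value, max_resets):
    """Timestamp a sequence of processed audio frames (all times in microseconds).

    Once the information resetting step has run more than `max_resets` times,
    every remaining frame simply uses its collection time.
    """
    timestamps = []
    frames_processed = 0
    resets = 0
    for collected_at in collection_times:
        if resets > max_resets:
            timestamps.append(collected_at)     # fallback: collection is too unstable
            continue
        frames_processed += 1
        target_time = start_time + frames_processed * frame_duration
        if abs(target_time - collected_at) < preset_value:
            timestamps.append(target_time)
        else:
            timestamps.append(collected_at)
            start_time = collected_at           # information resetting step
            frames_processed = 0
            resets += 1
    return timestamps
```

With collection times of 20000, 500000, 900000, and 921000 microseconds, a 20000 microsecond frame duration, and a 50000 microsecond threshold, two resets occur. If `max_resets` is 1, the last frame falls back to its collection time of 921000; if `max_resets` is 5, it keeps the computed target time of 920000.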

In step 204, the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data are stored.

In this embodiment, the above-mentioned execution subject may store the audio data carrying timestamps and the video data carrying timestamps. Here, the timestamped audio data and the timestamped video data may be stored in two separate files, with a mapping established between the two files. Alternatively, the timestamped audio data and the timestamped video data may be stored in the same file.

In some implementations of this embodiment, storing the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data may be performed as follows. First, the timestamped audio and video data are encoded; that is, the timestamped audio data and the timestamped video data are encoded separately. In practice, video encoding refers to converting a file in one video format into a file in another video format through a specific compression technology. Audio encoding can use methods such as waveform coding, parameter coding, and hybrid coding. It should be noted that audio encoding and video encoding are well-known technologies that are widely studied and applied, and they are not described further here. Afterwards, the encoded audio and video data can be stored locally or sent to the server. For example, the execution body may store the encoded audio data and the encoded video data in the same file and store the file locally. The encoded audio data and the encoded video data may also be stored in the same file and sent to the server (such as the server 105 shown in FIG. 1) through a wired or wireless connection.
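For the single-file case, one possible layout is to interleave the two streams in timestamp order. The sketch below is only an assumption about how that could look: the disclosure does not specify a file layout, the function names are hypothetical, and frames are represented as dictionaries of metadata with the encoded payloads omitted.

```python
import heapq
import json

def interleave(audio_frames, video_frames):
    """Merge two timestamp-sorted frame lists into one timestamp-ordered list."""
    return list(heapq.merge(audio_frames, video_frames,
                            key=lambda frame: frame["timestamp"]))

def serialize(frames):
    """Serialize frame metadata as one JSON object per line (payloads omitted)."""
    return "\n".join(json.dumps(frame) for frame in frames)

# Hypothetical frames with timestamps in milliseconds.
audio = [{"kind": "audio", "timestamp": 20}, {"kind": "audio", "timestamp": 40}]
video = [{"kind": "video", "timestamp": 33}]
merged = interleave(audio, video)
```

Here `merged` contains the audio frame at 20 ms, the video frame at 33 ms, and the audio frame at 40 ms, in that order.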

With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for processing data according to this embodiment. In the application scenario of FIG. 3, a user holds a terminal device 301 and records an original-sound video. A short video recording application runs on the terminal device 301. After the user clicks the original-sound video recording button in the interface of the short video recording application, the terminal device 301 turns on the microphone and the camera at the same time and collects audio data 302 and video data 303, respectively. For each frame in the video data 303, the terminal device 301 may determine the collection time of the frame as the timestamp of the frame. The terminal device 301 may use the first sampling time of the audio data 302 as the start time, and determine the timestamp of each frame in the audio data 302 based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency. Finally, the timestamped audio and video data are stored in the file 304.

In the method provided by the foregoing embodiment of the present disclosure, audio and video data are collected; the collection time of each frame in the video data is determined as the timestamp of that frame; the first sampling time of the audio data is then used as the start time, and the timestamp of each frame in the audio data is determined based on the start time, the total number of frames processed when processing of that frame is completed, the preset number of samples per frame, and the preset sampling frequency; finally, the audio and video data carrying the timestamps of the frames in the video data and the timestamps of the frames in the audio data are stored. Because the collection time is used as the timestamp of each frame of the video data, and the collection time can be obtained directly, the timestamp does not need to be calculated from a fixed interval. This avoids the inaccurate timestamps that result from calculating frame timestamps at a fixed interval when video data collection is unstable (for example, when frames are dropped because the device overheats or has insufficient performance). In addition, when audio data collection is unstable (for example, when device overheating or insufficient performance causes small jitter in collection), directly using the collection time of each audio frame as its timestamp may produce non-uniform timestamps. Because audio data is continuous, non-uniform timestamps are not accurate enough. With the method provided by the above embodiment of the present disclosure, uniform and stable timestamps can be determined from the start time, the total number of processed frames, the number of samples per frame, and the sampling frequency, thereby avoiding non-uniform and inaccurate timestamps for frames of audio data. Therefore, the accuracy of the timestamps of the audio and video data is improved, and the audio-video synchronization of recorded original-sound videos is improved.
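The difference between jittery collection times and computed timestamps can be seen in a small numerical sketch. The collection times below are invented for illustration, as are the assumed 1000 samples per frame and 50000 Hz sampling frequency; times are integer microseconds.

```python
# Assumed presets: 1000 samples per frame at 50000 Hz gives 20000 microseconds per frame.
FRAME_DURATION_US = 1000 * 1_000_000 // 50_000

def gaps(timestamps):
    """Intervals between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

# Hypothetical collection times with small jitter, in microseconds.
collection_us = [20_000, 43_000, 59_000, 81_000]

# Timestamps computed from the start time (0) and the processed-frame count.
computed_us = [0 + n * FRAME_DURATION_US for n in range(1, len(collection_us) + 1)]

jittery_gaps = gaps(collection_us)   # uneven spacing
uniform_gaps = gaps(computed_us)     # constant spacing
```

The gaps between collection times vary from frame to frame, while the gaps between computed timestamps are all exactly one frame duration.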

Referring to FIG. 4, a flowchart 400 of still another embodiment of a method of processing data is shown. The process 400 of the method for processing data includes steps 401 to 406.

In step 401, audio and video data is collected.

In this embodiment, an execution subject of the method for processing data (for example, the terminal devices 101, 102, and 103 shown in FIG. 1) may be installed with an image acquisition device (such as a camera) and an audio acquisition device (such as a microphone). The execution subject may turn on the image acquisition device and the audio acquisition device at the same time, and use the image acquisition device and the audio acquisition device to collect audio and video data. The audio and video data includes audio data and video data.

In step 402, for a frame in the video data, the acquisition time of the frame is determined as the time stamp of the frame.

In this embodiment, the above-mentioned execution subject may record the acquisition time when each frame of video data is acquired. For a frame in video data, the execution body may determine the collection time of the frame as the time stamp of the frame.

In step 403, the first sampling time of the audio data is used as the start time. For each frame in the audio data, the ratio of the preset number of samples per frame to the preset sampling frequency is determined, the product of this ratio and the total number of frames processed when processing of the frame is completed is determined, and the sum of this product and the start time is determined as the target time of the frame.

In this embodiment, the execution body may use the first sampling time of the audio data as the starting time. For each frame in the audio data collected in turn, the above-mentioned execution body can perform various processing on the frame. For example, transparent transmission, reverberation, equalization, sound change, tone change, speed change and other processing can be performed. For each frame after processing, the above execution body can perform the following steps:

First, determine a ratio between a preset number of samples per frame and a preset sampling frequency. Here, the determined ratio is the duration of each frame. Since the number of samples and the sampling frequency of each frame are preset fixed values, the duration of each frame is a fixed value.

Then, the product of the above ratio and the total number of frames processed when processing of the frame is completed is determined. Here, for each processed frame, the total number of frames processed when processing of the frame is completed is the total number of frames currently processed. In practice, this product is the total duration of audio that the execution body has processed when processing of the frame is completed.

Finally, the sum of the above product and the start time is determined as the target time of the frame.
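The three steps above amount to a single formula. The symbols below are introduced here for illustration and do not appear in the original text: t_0 is the start time, S is the preset number of samples per frame, f is the preset sampling frequency, and n is the total number of frames processed when processing of the frame is completed.

```latex
T_n = t_0 + n \cdot \frac{S}{f}
```

As a worked example under assumed values S = 1024 and f = 44100 Hz, each frame lasts 1024 / 44100, or about 23.22 ms, so the target time of the 100th processed frame is approximately t_0 + 2.322 s.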

In step 404, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than a preset value, the target time of the frame is determined as the time stamp of the frame.

In this embodiment, for a frame in audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than a preset value, the target time of the frame may be determined as the time stamp of the frame.

In step 405, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to a preset value, the acquisition time of the frame is determined as the time stamp of the frame.

In this embodiment, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the preset value, the execution body may determine the acquisition time of the frame as the time stamp of the frame.
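The selection made in steps 404 and 405 may be sketched as a single comparison. This is a minimal sketch under stated assumptions: the function name `choose_timestamp` is hypothetical, and taking the absolute value of the difference is an assumption, since the text only speaks of "the difference" between the two times.

```python
def choose_timestamp(target_time: float, acquisition_time: float,
                     preset_value: float) -> float:
    """Return the target time when it is close to the acquisition time,
    otherwise fall back to the acquisition time."""
    # Assumption: the "difference" is treated as an absolute deviation.
    deviation = abs(target_time - acquisition_time)
    if deviation < preset_value:
        return target_time       # step 404: small deviation
    return acquisition_time      # step 405: deviation >= preset value
```

For example, with a preset value of 0.010 s, a frame whose target time is 1.000 s and whose acquisition time is 1.004 s keeps the target time, while an acquisition time of 1.050 s would be used directly.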

In some implementations of this embodiment, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the above-mentioned preset value, the execution body may also perform the following information resetting step: updating the start time to the acquisition time of the frame, and clearing the total number of frames currently processed to zero.
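One way to picture the information resetting step is as a small stateful object that carries the start time and the frame count between frames. The class name `AudioTimestamper`, its field names, and the use of an absolute difference are assumptions made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class AudioTimestamper:
    samples_per_frame: int
    sample_rate_hz: int
    preset_value: float        # allowed deviation, in seconds
    start_time: float          # first sampling time of the audio data
    frames_processed: int = 0
    reset_count: int = 0       # how many times the reset step has run

    def stamp(self, acquisition_time: float) -> float:
        """Return the time stamp for a frame whose processing has just completed."""
        self.frames_processed += 1
        target = (self.start_time
                  + self.frames_processed * self.samples_per_frame / self.sample_rate_hz)
        if abs(target - acquisition_time) < self.preset_value:
            return target
        # Information resetting step: restart the uniform clock at this frame.
        self.start_time = acquisition_time
        self.frames_processed = 0
        self.reset_count += 1
        return acquisition_time
```

After a reset, the next frame's target time is computed from the new start time, so the evenly spaced time stamps resume from the point where collection became stable again.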

In some implementations of this embodiment, after the information resetting step is performed, the execution body may further determine the execution frequency of the information resetting step. In response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, for each frame in the audio data that has been processed but has not yet been assigned a time stamp, the acquisition time of the frame is determined as the time stamp of the frame. It should be noted that when the execution frequency of the information resetting step is less than or equal to the execution frequency threshold, the operations starting from step 403 may continue to be performed for subsequently processed frames in the audio data to determine the time stamp of each frame.
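The text does not say how the execution frequency is measured. As one possible reading, it may be taken as the number of resets observed within a sliding time window. The sketch below rests on that assumption; the class name `ResetFrequencyMonitor` and its parameters are hypothetical.

```python
from collections import deque


class ResetFrequencyMonitor:
    """Counts resets inside a sliding window (an assumed definition of 'frequency')."""

    def __init__(self, window_s: float, max_resets_in_window: int):
        self.window_s = window_s
        self.max_resets = max_resets_in_window
        self._reset_times = deque()

    def record_reset(self, now: float) -> None:
        """Call each time the information resetting step is performed."""
        self._reset_times.append(now)

    def use_acquisition_time_only(self, now: float) -> bool:
        """True when resets in the window exceed the threshold."""
        # Discard resets that fell out of the window.
        while self._reset_times and now - self._reset_times[0] > self.window_s:
            self._reset_times.popleft()
        return len(self._reset_times) > self.max_resets
```

While `use_acquisition_time_only` returns `True`, a caller would assign each frame its acquisition time; otherwise it would continue with the target-time calculation.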

In some implementations of this embodiment, after the information resetting step is performed, the execution body may further determine the number of times the information resetting step has been performed. In response to determining that the number of executions of the information resetting step is greater than a preset execution count threshold, for each frame in the audio data that has been processed but has not yet been assigned a time stamp, the acquisition time of the frame is determined as the time stamp of the frame. It should be noted that when the number of executions of the information resetting step is less than or equal to the execution count threshold, the operations starting from step 403 may continue to be performed for subsequently processed frames in the audio data to determine the time stamp of each frame.
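The count-based fallback may be sketched as a loop over the acquisition times of successive frames. The function name `timestamps_with_reset_limit`, its parameters, and the example values are assumptions for illustration. The frame duration is passed in directly rather than derived from a sample count and sampling frequency.

```python
def timestamps_with_reset_limit(acquisition_times, start_time, frame_duration,
                                preset_value, max_resets):
    """Assign a time stamp to each frame, switching permanently to acquisition
    times once the reset step has run more than max_resets times."""
    stamps = []
    frames_processed = 0
    resets = 0
    for acq in acquisition_times:
        if resets > max_resets:
            stamps.append(acq)          # threshold exceeded: acquisition time only
            continue
        frames_processed += 1
        target = start_time + frames_processed * frame_duration
        if abs(target - acq) < preset_value:
            stamps.append(target)
        else:
            stamps.append(acq)
            start_time = acq            # information resetting step
            frames_processed = 0
            resets += 1
    return stamps


# Assumed example: 1 s frames, 0.5 s tolerance, at most one reset tolerated.
stamps = timestamps_with_reset_limit([1.0, 5.0, 9.0, 9.9], 0.0, 1.0, 0.5, 1)
```

In this example the second and third frames each trigger a reset, so the fourth frame is stamped with its acquisition time of 9.9 s instead of the target time of 10.0 s that it would otherwise receive.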

In step 406, the audio and video data with the timestamp of the frames in the video data and the timestamp of the frames in the audio data are stored.

As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the process 400 of the method for processing data in this embodiment highlights the step of determining, for a frame in the audio data, the time stamp of the frame based on a numerical comparison between the target time of the frame and the acquisition time of the frame. When audio data collection is unstable (for example, when the device is overheated or its performance is insufficient), the acquisition times of the frames of the audio data are unevenly spaced, whereas the target times determined from the total number of frames currently processed, the sampling frequency, and the number of samples per frame are evenly spaced. A small deviation between the target time and the acquisition time indicates that collection is relatively stable and that the collection jitter is small. In that case, using the target time as the time stamp of the frame in the audio data makes the time stamps of the audio data more uniform and stable. A large deviation between the target time and the acquisition time indicates that collection is unstable and that frames have been dropped. If the target time were used as the time stamp in that case, the calculated time stamp would not correspond to the current frame and its accuracy would be low, so the acquisition time is used instead to keep the time stamp relatively accurate. Determining the time stamp in different ways in different situations therefore improves the accuracy of the time stamps of the audio and video data and improves the audio and video synchronization of recorded original audio and video.
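The uniformity argument above can be checked with a small numeric example. The acquisition times below are made-up values chosen to show mild jitter, and the 960-sample frame at 48000 Hz is an assumed configuration. Times are held in integer milliseconds so that the arithmetic is exact.

```python
from statistics import pstdev

# Assumed configuration: 960 samples per frame at 48000 Hz gives 20 ms frames.
FRAME_MS = 960 * 1000 // 48000

# Made-up acquisition times (ms) with a few milliseconds of jitter.
acquisition_ms = [21, 39, 62, 80, 101]
# Target times with a start time of 0 ms: n * frame duration.
target_ms = [FRAME_MS * n for n in range(1, len(acquisition_ms) + 1)]


def intervals(times):
    """Spacing between consecutive time stamps."""
    return [b - a for a, b in zip(times, times[1:])]


acquisition_jitter = pstdev(intervals(acquisition_ms))  # spread of uneven spacing
target_jitter = pstdev(intervals(target_ms))            # zero: spacing is uniform
max_deviation = max(abs(t - a) for t, a in zip(target_ms, acquisition_ms))
```

Here every deviation between target time and acquisition time is at most 2 ms, so with a preset value of, say, 10 ms all frames would keep their evenly spaced target times.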

Referring to FIG. 5, as an implementation of the methods shown in the foregoing figures, the present disclosure provides an embodiment of an apparatus for processing data. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.

As shown in FIG. 5, the apparatus 500 for processing data according to this embodiment includes: an acquisition unit 501 configured to collect audio and video data, where the audio and video data includes audio data and video data; a first determining unit 502 configured to determine, for a frame in the video data, the acquisition time of the frame as the time stamp of the frame; a second determining unit 503 configured to use the first sampling time of the audio data as the start time and, for a frame in the audio data, determine the time stamp of the frame based on the start time, the total number of frames processed when processing of the frame is completed, the preset number of samples per frame, and the preset sampling frequency; and a storage unit 504 configured to store the audio and video data carrying the time stamps.
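The division of work among the four units may be pictured with the following minimal sketch. It is not the disclosed apparatus: the names `Frame` and `DataProcessingApparatus` are hypothetical, the input list stands in for the output of the acquisition unit, and the first sampling time is assumed to equal the acquisition time of the first audio frame.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    kind: str                          # "audio" or "video"
    acquisition_time: float            # seconds
    timestamp: Optional[float] = None


@dataclass
class DataProcessingApparatus:
    samples_per_frame: int
    sample_rate_hz: int
    stored: List[Frame] = field(default_factory=list)

    def process(self, frames: List[Frame]) -> List[Frame]:
        audio_start = None
        audio_count = 0
        for f in frames:               # frames stand in for the acquisition unit output
            if f.kind == "video":
                # Role of the first determining unit: acquisition time as time stamp.
                f.timestamp = f.acquisition_time
            else:
                # Role of the second determining unit: start time plus n frame durations.
                if audio_start is None:
                    audio_start = f.acquisition_time   # assumed first sampling time
                audio_count += 1
                f.timestamp = (audio_start
                               + audio_count * self.samples_per_frame / self.sample_rate_hz)
            self.stored.append(f)      # role of the storage unit
        return self.stored
```

A caller would build a list of `Frame` objects in collection order and pass it to `process`, after which every stored frame carries a time stamp.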

In some implementations of this embodiment, the second determining unit 503 may include a first determining module and a second determining module (not shown in the figure). The first determining module may be configured to, for a frame in the audio data, determine the ratio of the preset number of samples per frame to the preset sampling frequency, determine the product of that ratio and the total number of frames processed when processing of the frame is completed, and determine the sum of that product and the start time as the target time of the frame. The second determining module may be configured to determine, for a frame in the audio data, the time stamp of the frame based on a numerical comparison between the target time of the frame and the acquisition time of the frame.

In some implementations of this embodiment, the second determining module may be configured to, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is less than a preset value, determine the target time of the frame as the time stamp of the frame.

In some implementations of this embodiment, the second determining module may be configured to, for a frame in the audio data, in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the preset value, determine the acquisition time of the frame as the time stamp of the frame.

In some implementations of this embodiment, the apparatus may further include an execution unit (not shown in the figure). The execution unit may be configured to perform the following information resetting step after the acquisition time of a frame in the audio data has been determined as the time stamp of that frame in response to determining that the difference between the target time of the frame and the acquisition time of the frame is greater than or equal to the preset value: updating the start time to the acquisition time of the frame, and clearing the total number of frames currently processed to zero.

In some implementations of this embodiment, the apparatus may further include a third determining unit and a fourth determining unit (not shown in the figure). The third determining unit may be configured to determine the execution frequency of the information resetting step after the information resetting step is performed. The fourth determining unit may be configured to, in response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, determine, for each frame in the audio data that has been processed but has not yet been assigned a time stamp, the acquisition time of the frame as the time stamp of the frame.

In some implementations of this embodiment, the apparatus may further include a fifth determining unit and a sixth determining unit (not shown in the figure). The fifth determining unit may be configured to determine the number of times the information resetting step has been performed after the information resetting step is performed. The sixth determining unit may be configured to, in response to determining that the number of executions of the information resetting step is greater than a preset execution count threshold, determine, for each frame in the audio data that has been processed but has not yet been assigned a time stamp, the acquisition time of the frame as the time stamp of the frame.

In the apparatus provided by the foregoing embodiment of the present disclosure, the acquisition unit 501 collects audio and video data; the first determining unit 502 then determines the acquisition time of each frame in the video data as the time stamp of that frame; the second determining unit 503 then uses the first sampling time of the audio data as the start time and determines the time stamp of each frame in the audio data based on the start time, the total number of frames processed when processing of that frame is completed, the preset number of samples per frame, and the preset sampling frequency; and the storage unit 504 finally stores the audio and video data carrying the time stamps. This avoids the inaccurate time stamps that result from calculating the time stamps of frames at a fixed time interval when audio and video data collection is unstable (for example, when device overheating or insufficient performance causes dropped frames), improves the accuracy of the time stamps determined for the frames in the audio and video data, and improves the audio and video synchronization of recorded original audio and video.

Reference is now made to FIG. 6, which illustrates a schematic structural diagram of a computer system 600 suitable for implementing a terminal device according to an embodiment of the present disclosure. The terminal device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: an input portion 606 including a touch screen, a touch panel, and the like; an output portion 607 including a liquid crystal display (LCD) and a speaker; a storage portion 608 including a hard disk; and a communication portion 609 including a network interface card such as a local area network (LAN) card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.

According to an embodiment of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-mentioned functions defined in the method of the present disclosure are performed. It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
In this disclosure, a computer-readable signal medium may include a data signal that is propagated in baseband or as part of a carrier wave and that carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, radio frequency (RF), or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described unit may also be provided in a processor, for example, it may be described as: a processor includes an acquisition unit, a first determination unit, a second determination unit, and a storage unit. Among them, the names of these units do not in any way constitute a limitation on the unit itself. For example, the acquisition unit can also be described as a “unit that collects audio and video data”.

As another aspect, the present disclosure also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: collect audio and video data, the audio and video data including audio data and video data; for a frame in the video data, determine the acquisition time of the frame as the time stamp of the frame; use the first sampling time of the audio data as the start time and, for a frame in the audio data, determine the time stamp of the frame based on the start time, the total number of frames currently processed, the preset number of samples per frame, and the preset sampling frequency; and store the audio and video data carrying the time stamps.

Claims

A method for processing data, including:

Collecting audio and video data, the audio and video data including audio data and video data;

Determining a collection time of a frame in the video data as a time stamp of the frame in the video data;

Using a first sampling time of the audio data as a start time, and determining a time stamp of a frame in the audio data based on the start time, a total number of frames processed when processing of the frame in the audio data is completed, a preset number of samples per frame, and a preset sampling frequency;

Storing the audio and video data carrying the time stamps of the frames in the video data and the time stamps of the frames in the audio data.
The method according to claim 1, wherein the determining a time stamp of a frame in the audio data based on the start time, a total number of frames processed when processing of the frame in the audio data is completed, a preset number of samples per frame, and a preset sampling frequency includes:

Determining a ratio of the preset number of samples per frame to the preset sampling frequency, determining a product of the ratio and the total number of frames processed when processing of the frame in the audio data is completed, and determining a sum of the product and the start time as a target time of the frame in the audio data;

Determining the time stamp of the frame in the audio data based on a numerical comparison between the target time of the frame in the audio data and an acquisition time of the frame in the audio data.
The method according to claim 2, wherein the determining the time stamp of the frame in the audio data based on a numerical comparison between the target time of the frame in the audio data and an acquisition time of the frame in the audio data includes:

In response to determining that a difference between the target time of the frame in the audio data and the acquisition time of the frame in the audio data is less than a preset value, determining the target time of the frame in the audio data as the time stamp of the frame in the audio data.
The method according to claim 2, wherein the determining the time stamp of the frame in the audio data based on a numerical comparison between the target time of the frame in the audio data and an acquisition time of the frame in the audio data includes:

In response to determining that a difference between the target time of the frame in the audio data and the acquisition time of the frame in the audio data is greater than or equal to a preset value, determining the acquisition time of the frame in the audio data as the time stamp of the frame in the audio data.
The method according to claim 4, wherein after the determining, in response to determining that the difference between the target time of the frame in the audio data and the acquisition time of the frame in the audio data is greater than or equal to the preset value, the acquisition time of the frame in the audio data as the time stamp of the frame in the audio data, the method further includes:

Performing the following information resetting step: updating the start time to the acquisition time of the frame in the audio data, and clearing the total number of frames currently processed to zero.
The method according to claim 5, after performing the information resetting step, further comprising:

Determining an execution frequency of the information resetting step;

In response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, for each frame in the audio data that has been processed and whose time stamp has not been determined, determining the acquisition time of the frame as the time stamp of the frame.
The method according to claim 5, after performing the information resetting step, further comprising:

Determining the number of executions of the information resetting step;

In response to determining that the number of executions of the information resetting step is greater than a preset execution count threshold, for each frame in the audio data that has been processed and whose time stamp has not been determined, determining the acquisition time of the frame as the time stamp of the frame.
A device for processing data includes:

An acquisition unit configured to acquire audio and video data, where the audio and video data includes audio data and video data;

A first determining unit configured to determine an acquisition time of a frame in the video data as a time stamp of the frame in the video data;

A second determining unit configured to use a first sampling time of the audio data as a start time, and determine a time stamp of a frame in the audio data based on the start time, a total number of frames processed when processing of the frame in the audio data is completed, a preset number of samples per frame, and a preset sampling frequency;

A storage unit configured to store the audio and video data carrying the time stamps of the frames in the video data and the time stamps of the frames in the audio data.
The apparatus according to claim 8, wherein the second determining unit comprises:

A first determining module is configured to determine a ratio between a preset number of samples per frame and a preset sampling frequency, determine a product of the ratio and a total number of frames processed when frame processing in the audio data is completed, and The sum of the product and the start time is determined as a target time of a frame in the audio data;

A second determination module is configured to determine a time stamp of a frame in the audio data based on a comparison between a target time of the frame in the audio data and a value of a collection time of the frame in the audio data.
The apparatus according to claim 9, wherein the second determination module is configured to:

In response to determining that a difference between the target time of the frame in the audio data and the acquisition time of the frame in the audio data is less than a preset value, determine the target time of the frame in the audio data as the time stamp of the frame in the audio data.
The apparatus according to claim 9, wherein the second determination module is configured to:

In response to determining that a difference between the target time of the frame in the audio data and the acquisition time of the frame in the audio data is greater than or equal to a preset value, determine the acquisition time of the frame in the audio data as the time stamp of the frame in the audio data.
The apparatus according to claim 11, further comprising:

An execution unit configured to, after the acquisition time of the frame in the audio data is determined as the time stamp of the frame in the audio data in response to determining that the difference between the target time of the frame in the audio data and the acquisition time of the frame in the audio data is greater than or equal to the preset value, perform the following information resetting step: updating the start time to the acquisition time of the frame in the audio data, and clearing the total number of frames currently processed to zero.
The apparatus according to claim 12, further comprising:

A third determining unit configured to determine an execution frequency of the information resetting step after executing the information resetting step;

A fourth determining unit configured to, in response to determining that the execution frequency of the information resetting step is greater than a preset execution frequency threshold, determine, for each frame in the audio data that has been processed and whose time stamp has not been determined, the acquisition time of the frame as the time stamp of the frame.
The apparatus according to claim 12, further comprising:

A fifth determining unit configured to determine the number of times the information reset step is performed after the information reset step is performed;

A sixth determining unit configured to, in response to determining that the number of executions of the information resetting step is greater than a preset execution count threshold, determine, for each frame in the audio data that has been processed and whose time stamp has not been determined, the acquisition time of the frame as the time stamp of the frame.
A terminal device includes:

At least one processor;

A storage device storing at least one program thereon,

Wherein, when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the method according to any one of claims 1-7.
A computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.