WO2020024962A1 - Method and apparatus for processing data - Google Patents

Method and apparatus for processing data

Info

Publication number
WO2020024962A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2019/098510
Other languages
French (fr)
Chinese (zh)
Inventor
宫昀
Original Assignee
北京微播视界科技有限公司
Application filed by 北京微播视界科技有限公司
Publication of WO2020024962A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4305 Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
    • H04N21/439 Processing of audio elementary streams
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8547 Content authoring involving timestamps for synchronizing content

Description

  • Embodiments of the present disclosure relate to the field of computer technology, for example, to a method and an apparatus for processing data.
  • When recording a soundtrack video, audio playback usually runs at the same time as video capture with the camera. For example, while a song is playing, the user's singing performance is recorded, and the recorded video uses the song as background music. In applications with video recording capabilities, the audio and video of a recorded soundtrack video are often out of sync. Taking Android devices as an example, because hardware differs from device to device, it is difficult to keep recorded audio and video synchronized across devices.
  • In the related art, the interval between two adjacent frames of the collected video data is assumed to be fixed.
  • The timestamp of a frame is then usually computed as the timestamp of the previous frame plus this fixed interval, after which the time-stamped video data and the played audio data are stored.
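The related-art approach described above can be sketched as follows. This is an illustrative Python sketch, not code from the patent; the frame rate and function name are assumptions.

```python
# Related-art approach (sketch): assign frame timestamps at a fixed interval,
# regardless of when frames actually arrive.
FPS = 30
FRAME_INTERVAL_MS = 1000 / FPS  # assumed-fixed inter-frame interval

def fixed_interval_timestamps(num_frames):
    """Timestamp of each frame = timestamp of the previous frame + fixed interval."""
    timestamps = []
    t = 0.0
    for _ in range(num_frames):
        timestamps.append(t)
        t += FRAME_INTERVAL_MS
    return timestamps

# If capture stalls (e.g. a frame arrives 20 ms late), these timestamps drift
# away from the audio clock, producing the desynchronization described above.
timestamps = fixed_interval_timestamps(4)
```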
  • the embodiments of the present disclosure provide a method and an apparatus for processing data.
  • an embodiment of the present disclosure provides a method for processing data.
  • The method includes: collecting video data and playing target audio data; determining the amount of the target audio data that has been played when a frame of the video data is collected, and determining the playback duration corresponding to that data amount as the timestamp of the frame; and storing the video data containing the timestamps together with the played portion of the target audio data.
  • an embodiment of the present disclosure provides a device for processing data.
  • The device includes: an acquisition unit configured to collect video data and play target audio data; a first determination unit configured to determine the amount of the target audio data that has been played when a frame of the video data is collected, and to determine the playback duration corresponding to that data amount as the timestamp of the frame; and a storage unit configured to store the video data containing the timestamps together with the played portion of the target audio data.
  • An embodiment of the present disclosure provides a terminal device including: at least one processor; and a storage device storing at least one program which, when executed by the at least one processor, causes the at least one processor to implement any one of the above methods for processing data.
  • An embodiment of the present disclosure provides a computer-readable medium having stored thereon a computer program that, when executed by a processor, implements any one of the above methods for processing data.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for processing data according to the present disclosure.
  • FIG. 3 is a schematic diagram of an application scenario of a method for processing data according to the present disclosure.
  • FIG. 4 is a flowchart of still another embodiment of a method for processing data according to the present disclosure.
  • FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for processing data according to the present disclosure.
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device according to an embodiment of the present disclosure.
  • FIG. 1 illustrates an exemplary system architecture 100 to which a method for processing data or a device for processing data of the present disclosure can be applied.
  • the system architecture 100 may include a terminal device 101, a terminal device 102, a terminal device 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium for a communication link between the terminal device 101, the terminal device 102, the terminal device 103, and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal device 101, the terminal device 102, and the terminal device 103 to interact with the server 105 through the network 104 to receive or send messages (such as audio and video data upload requests, audio data acquisition requests), and the like.
  • Various communication client applications can be installed on the terminal device 101, the terminal device 102, and the terminal device 103, such as video recording applications, audio playback applications, instant communication tools, email clients, social platform software, and the like.
  • the terminal device 101, the terminal device 102, and the terminal device 103 may be hardware or software.
  • If the terminal device 101, terminal device 102, and terminal device 103 are hardware, they can be various electronic devices that have a display screen and support video recording and audio playback, including but not limited to smartphones, tablets, laptop computers, desktop computers, and so on.
  • If the terminal device 101, the terminal device 102, and the terminal device 103 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. This is not specifically limited here.
  • the terminal device 101, the terminal device 102, and the terminal device 103 may be equipped with an image acquisition device (such as a camera) to collect video data.
  • the smallest visual unit that makes up a video is a frame. Each frame is a static image. Combining a sequence of temporally consecutive frames together forms a dynamic video.
  • the terminal device 101, the terminal device 102, and the terminal device 103 may also be provided with a device (for example, a speaker) for converting an electric signal into a sound to play the sound.
  • the audio data is obtained by performing analog-to-digital conversion (ADC) on an analog audio signal at a certain frequency.
  • The playback of audio data is the process of performing digital-to-analog conversion on the digital audio signal, restoring it to an analog audio signal (an electrical signal), and converting that signal into sound for output.
  • The terminal device 101, the terminal device 102, and the terminal device 103 can use the image acquisition devices installed on them to collect video data, and can play audio data using their audio processing components (which, for example, convert digital audio signals into analog audio signals) and speakers.
  • the terminal device 101, the terminal device 102, and the terminal device 103 may perform processing such as timestamp calculation on the collected video data, and finally store the processing results (for example, the video data including the timestamp and the played audio data).
  • the server 105 may be a server that provides various services, such as a background server that provides support for video recording applications installed on the terminal devices 101, 102, and 103.
  • The background server can analyze and store received data such as audio and video upload requests. It can also receive audio and video acquisition requests sent by the terminal devices 101, 102, and 103, and return the requested audio and video data to those terminal devices.
  • the server may be hardware or software.
  • the server can be implemented as a distributed server cluster composed of multiple servers or as a single server.
  • the server can be implemented as multiple software or software modules (for example, to provide distributed services), or it can be implemented as a single software or software module. It is not specifically limited here.
  • The method for processing data provided by the embodiments of the present disclosure is generally executed by the terminal device 101, the terminal device 102, and the terminal device 103. Accordingly, the apparatus for processing data is generally provided in the terminal devices 101, 102, and 103.
  • terminal devices, networks, and servers in FIG. 1 are merely exemplary. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • the method for processing data includes steps 201 to 203.
  • In step 201, video data is collected and the target audio data is played.
  • an execution subject of the method for processing data may obtain a file in which the target audio data is recorded.
  • the above-mentioned target audio data may be audio data specified in advance by the user as a soundtrack of the video, for example, audio data corresponding to a specified song.
  • The execution body may store in advance a large number of files recording different audio data, and may directly locate and obtain the file recording the target audio data locally.
  • audio data is data obtained by digitizing a sound signal.
  • the process of digitizing sound signals is a process of converting continuous analog audio signals into digital signals at a certain frequency to obtain audio data.
  • the digitization process of a sound signal includes three steps: sampling, quantization, and encoding.
  • sampling refers to replacing a signal that is continuous in time with a sequence of signal sample values at regular time intervals.
  • Quantization refers to approximating the continuously varying amplitude of the signal with a finite set of amplitudes, converting the continuous amplitude of the analog signal into a finite number of discrete values.
  • Encoding means that the quantized discrete value is represented by binary digits according to a certain rule.
  • Pulse Code Modulation (PCM) produces digital audio data by sampling, quantizing, and encoding an analog audio signal. The above-mentioned target audio data may therefore be a data stream in PCM encoding format, in which case the file recording the target audio data may be in wav format.
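The sampling, quantization, and encoding steps described above can be sketched as follows. This is an illustrative Python sketch (the sine "signal", sample rate, and function name are assumptions, not from the patent):

```python
# Sketch of PCM digitization: sample a simulated analog signal, quantize each
# sample to 16 bits, and encode the result as little-endian bytes.
import math
import struct

SAMPLE_RATE = 44100    # samples per second (sampling frequency)
SAMPLE_SIZE_BITS = 16  # bits per sample (quantization precision)

def digitize(duration_s, freq_hz=440.0):
    """Sample a sine 'signal', quantize to 16-bit integers, encode as bytes."""
    pcm = bytearray()
    for n in range(int(SAMPLE_RATE * duration_s)):
        t = n / SAMPLE_RATE
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # continuous value
        q = int(amplitude * 32767)                       # quantize to 16 bits
        pcm += struct.pack('<h', q)                      # encode (little-endian)
    return bytes(pcm)

data = digitize(0.01)
# 0.01 s at 44100 Hz mono, 2 bytes per sample -> 441 samples, 882 bytes
```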
  • the format of the file describing the target audio data may also be other formats, such as the mp3 format and the ape format.
  • the target audio data may be data of other encoding formats (for example, lossy compression formats such as Advanced Audio Coding (AAC)), and is not limited to the PCM encoding format.
  • The above-mentioned execution body may also perform format conversion after obtaining the file, converting it into wav format.
  • After conversion, the target audio data in the file is a data stream in PCM encoding format.
  • the playback of audio data may be a process of digitally analogizing the digital audio data, restoring it to an analog audio signal, and then converting the analog audio signal (electrical signal) into sound for output.
  • the above-mentioned execution body may be equipped with an image acquisition device, such as a camera.
  • the execution subject may use the camera to collect video data.
  • video data can be described by frames.
  • a frame is the smallest visual unit that makes up a video.
  • Each frame is a static image. Combining a sequence of temporally consecutive frames together forms a dynamic video.
  • the above-mentioned execution body may also be provided with a device for converting an electric signal into a sound, such as a speaker. After obtaining the target audio data, the execution subject may turn on the camera to collect video data, and at the same time, may convert the target audio data into an analog audio signal and output sound using the speaker to implement playback of the target audio data.
  • the above-mentioned execution subject may play the target audio data in various ways.
  • the above-mentioned execution body may first instantiate a class for playing audio and video (for example, the MediaPlayer class in the Android multimedia package) to create an object.
  • This object can be used to play the above target audio data.
  • The target audio data can then be transmitted to the object in order to play the target audio data.
  • The MediaPlayer class in the Android multimedia package can support playing sound files in multiple formats, such as mp3, aac, and wav.
  • When playing audio data, it first decodes the data into a data stream in PCM encoding format, and then performs digital-to-analog conversion on that stream.
  • a video recording application can be installed in the execution body.
  • This video recording application can support the recording of soundtrack videos.
  • A soundtrack video is a video recorded while audio data is played during video data collection.
  • the sound in the recorded soundtrack video is the sound corresponding to the audio data.
  • a singing action performed by a user is recorded, and the recorded video uses the song as background music.
  • the user may first click the name of an audio (such as the name of a song or melody) in the running interface of the video recording application.
  • the execution body can obtain the audio data corresponding to the name and use it as the target audio data.
  • the user can click the video recording button in the running interface of the video recording application to trigger a video recording instruction.
  • After receiving the video recording instruction, the execution body can turn on the camera for video recording and, at the same time, process the target audio data, convert it into an analog audio signal, and output sound using the speaker. The user can then perform while listening to the sound, so that a performance video is recorded.
  • users can perform continuous recording of videos.
  • the above-mentioned execution subject can continuously collect video data and simultaneously play the target audio data.
  • users can perform segmented recording of videos.
  • The execution body can continuously collect video data and simultaneously play the target audio data until it detects that the user has triggered a pause-recording instruction (for example, by clicking the recording button again or releasing it), at which point it pauses the playback of the target audio data and stops collecting video data.
  • When recording resumes, the execution body can continue to collect video data and continue to play the target audio data (that is, the amount of data played in the first segment is taken as the starting point of the second segment), until it detects that the user triggers the pause-recording instruction again, and so on.
  • In step 202, the amount of the target audio data that has been played when a frame of the video data is collected is determined, and the playback duration corresponding to that data amount is determined as the timestamp of the frame.
  • the frame collection time can be recorded.
  • the collection time of each frame may be a system timestamp (such as a Unix timestamp) when the frame is collected.
  • A timestamp is a complete, verifiable piece of data indicating that certain data already existed at a specific moment; typically it is a character sequence that uniquely identifies a moment in time.
  • the execution subject may determine the acquisition time of the first frame of the video data as the start time of the video data.
  • the execution subject can read the acquisition time of the frame. Then, the data amount of the target audio data that has been played at the acquisition time can be determined. Finally, the playback time corresponding to the data amount can be determined as the time stamp of the frame.
  • Various methods can be used to determine the amount of the target audio data that has been played at a given acquisition time. As an example, after instantiating a preset class for playing audio (such as the MediaPlayer class in the Android multimedia package) and transmitting the target audio data to the created object, the amount of target audio data that has been transferred to the object at the frame's acquisition time can be determined, and that amount can be taken as the amount of target audio data that has been played when the frame was collected.
  • The target audio data is obtained by sampling and quantizing a sound signal at a set sampling frequency (sampling rate) and a set sample size, and the number of channels of the target audio data is predetermined. Therefore, from the amount of target audio data that has been played at the acquisition time of a frame, together with the sampling frequency, sample size, and number of channels, the playback duration of the target audio data at that moment can be calculated.
  • the execution subject may determine the playback duration as the time stamp of the frame.
  • the sampling frequency is also called the sampling speed or sampling rate.
  • the sampling frequency can be the number of samples taken from the continuous signal per second and composed of discrete signals.
  • the sampling frequency can be expressed in Hertz (Hz).
  • the sample size can be expressed in bits.
  • the steps for determining the playback duration are as follows: First, the product of the sampling frequency, the sampling size, and the number of channels can be determined. Then, the ratio of the data amount of the target audio data that has been played to the product can be determined as the playback duration of the target audio data.
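The playback-duration computation above can be sketched as follows. This is an illustrative Python sketch assuming the sample size is expressed in bytes; the constants and function name are assumptions, not from the patent:

```python
# Playback duration = data amount / (sampling frequency * sample size * channels).
SAMPLE_RATE_HZ = 44100   # sampling frequency
SAMPLE_SIZE_BYTES = 2    # 16-bit samples
CHANNELS = 2             # stereo

def playback_duration_ms(bytes_played):
    """Convert an amount of played PCM data (bytes) into a playback duration (ms)."""
    bytes_per_second = SAMPLE_RATE_HZ * SAMPLE_SIZE_BYTES * CHANNELS
    return bytes_played * 1000.0 / bytes_per_second

# One second of 44.1 kHz / 16-bit / stereo PCM is 176400 bytes:
duration = playback_duration_ms(176400)  # 1000.0 ms
```

This duration is exactly what step 202 assigns as the frame timestamp.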
  • the foregoing target audio data may be a data stream in a PCM encoding format.
  • The above-mentioned execution body may also play the target audio data through the following steps: first, instantiate a target class (such as the AudioTrack class in the Android development kit) to create a target object for playing the target audio data.
  • the target class can be used to play a data stream in PCM encoding format.
  • the target audio data may be transmitted to the target object in a streaming manner, so as to play the target audio data by using the target object.
  • AudioTrack in the Android development kit is a class that manages and plays a single audio resource; it is used for the playback of PCM audio streams.
  • Audio data is played by pushing it to an instantiated AudioTrack object.
  • AudioTrack objects can operate in two modes: static mode and streaming mode. In streaming mode, a continuous PCM-encoded data stream is written to the AudioTrack object by calling its write method. In the above implementation, the target audio data can be written in streaming mode.
  • In such implementations, the amount of target audio data that has been played can be determined as follows: for a frame of the video data, determine the amount of target audio data that has been transmitted to the target object when the frame is collected, and take that amount as the amount of target audio data that has been played at that moment.
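The bytes-transmitted bookkeeping described above can be sketched as follows. This is an illustrative Python stand-in; the `StreamingPlayer` class is hypothetical (a real implementation would wrap an AudioTrack-style streaming player):

```python
# Track the cumulative number of bytes handed to a streaming PCM player, and
# use that count as the amount of target audio data "played" when a frame is
# captured.
class StreamingPlayer:
    def __init__(self):
        self.bytes_written = 0

    def write(self, chunk: bytes):
        # in a real AudioTrack-style player this call would enqueue PCM data
        # for playback; here we only account for the transmitted amount
        self.bytes_written += len(chunk)

player = StreamingPlayer()
player.write(b"\x00" * 4096)
player.write(b"\x00" * 4096)
# Amount of target audio data played when the next frame is collected:
played_bytes = player.bytes_written  # 8192
```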
  • the foregoing target audio data may be a data stream in a PCM encoding format.
  • the above-mentioned execution body may also play the target audio data through the following steps: first, call a preset audio processing component (such as the OpenSL ES component in the Android development kit) that supports audio playback.
  • the audio processing component may support setting of a buffer and setting of a callback function.
  • the above callback function may be used to return the data volume of the processed audio data after the audio data processing (such as reading, playing, etc.) in the buffer is completed.
  • the target audio data may be transmitted to the audio processing component to play the target audio data using the audio processing component.
  • the execution subject may determine the sum of the amount of data that the callback function has returned when the frame was collected.
  • the execution body may determine the sum of the data amounts as the data amount of the target audio data that has been played when the frame is acquired.
  • A technician may set the size of the buffer of the audio processing component to a target value in advance, where the target value may be less than or equal to the amount of audio data corresponding to the preset interval between two adjacent frames of video data (for example, the amount of audio data corresponding to 33 ms).
  • the preset interval duration may be a reciprocal of a preset frame rate (Frames Per Second) of the collected video data.
  • the frame rate refers to the number of frames collected per second.
  • the unit of the frame rate can be fps or Hertz (Hz).
  • For example, at a preset frame rate of 30 fps, the interval between two adjacent frames is about 33 ms.
  • This implementation can determine more accurately the amount of target audio data that has been played at a given time, thereby improving the accuracy of the determined frame timestamps of the video data.
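The buffer sizing described above can be sketched as follows. This is an illustrative computation under assumed audio parameters (not values from the patent): choosing a buffer no larger than one inter-frame interval of audio means the callback fires at least once per video frame.

```python
# Choose an audio buffer no larger than the audio corresponding to one
# inter-frame interval, so the playback callback reports progress at least
# once per captured video frame.
SAMPLE_RATE_HZ = 44100
SAMPLE_SIZE_BYTES = 2
CHANNELS = 2
FPS = 30

frame_interval_s = 1.0 / FPS  # preset inter-frame interval (~33 ms)
bytes_per_second = SAMPLE_RATE_HZ * SAMPLE_SIZE_BYTES * CHANNELS
max_buffer_bytes = int(bytes_per_second * frame_interval_s)  # 5880 bytes
```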
  • In step 203, the video data including the timestamps and the played data of the target audio data are stored.
  • the execution subject may store the played data in the target audio data and the video data including a time stamp.
  • The played portion of the target audio data and the time-stamped video data may be stored in two separate files, with a mapping established between the two files.
  • Alternatively, the two may be stored in the same file.
  • The execution body may first determine the amount of the target audio data that has been played after the stop-recording instruction is triggered (for example, after the user clicks the stop-recording button). The data corresponding to that played amount can then be extracted. Finally, the time-stamped video data and the extracted data can be stored.
  • The execution body may first obtain a target audio data interval based on the target audio data that had been played when the last frame of the video data was collected.
  • To do so, it may obtain the acquisition time of the last collected frame, and then extract, as the target audio data interval, the portion of the target audio data played between the start of collection and that acquisition time.
  • The execution subject may then store the target audio data interval together with the video data containing timestamps for all of its frames. In this way, the amount of target audio data played when video recording stops can be determined more accurately, improving audio-video synchronization at the end of recording.
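The interval extraction described above can be sketched as follows. This is an illustrative Python sketch; the function name and byte-oriented slicing are assumptions (a real implementation would also respect sample and channel boundaries):

```python
# Extract the played interval of the target audio data when recording stops,
# using the byte count that had been played at the last collected frame.
def extract_played_interval(target_audio: bytes, bytes_played_at_last_frame: int) -> bytes:
    """Keep only the portion of the target audio actually played during recording."""
    return target_audio[:bytes_played_at_last_frame]

audio = bytes(range(256)) * 100  # stand-in for the PCM target audio data
clip = extract_played_interval(audio, 1024)  # the target audio data interval
```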
  • the above-mentioned execution body may first encode video data including a time stamp. After that, the target audio data interval and the encoded video data are stored in the same file.
  • Video encoding refers to converting a file in one video format into a file in another video format through a specific compression technique. Video coding is a well-known and widely applied technology and is not described further here.
  • The execution entity may further upload the stored data to a server (for example, the server 105 shown in FIG. 1).
  • FIG. 3 is a schematic diagram of an application scenario of a method for processing data according to this embodiment.
  • a user holds a terminal device 301 and records a soundtrack video.
  • a short video recording application runs on the terminal device 301.
  • the user first selects a certain soundtrack (such as the song "Little Apple") in the interface of the short video recording application.
  • the terminal device 301 then obtains the target audio file 302 corresponding to the soundtrack.
  • the terminal device 301 simultaneously turns on the camera to collect video data 303, and at the same time, plays the target audio file 302.
  • the terminal device 301 may determine the data amount of the target audio data that has been played when the frame is collected, and determine the playback time corresponding to the data amount as the time stamp of the frame. Finally, the terminal device 301 may store the video data including the timestamp and the played data in the above target audio data in the file 304.
  • The method provided by the above embodiments of the present disclosure collects video data while playing target audio data; for each frame of the video data, it determines the amount of the target audio data that has been played when the frame is collected, and takes the playback duration corresponding to that amount as the frame's timestamp; finally, it stores the time-stamped video data together with the played portion of the target audio data. The timestamp of each frame is thus determined from the amount of target audio data actually played when the frame is collected, rather than from a fixed time interval.
  • When video capture is unstable, the interval between two adjacent frames of the video data is not fixed, and computing timestamps at fixed intervals yields inaccurate results. The present method avoids this situation, improving the accuracy of the frame timestamps of the video data and thereby the audio-video synchronization of the recorded soundtrack video.
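The overall idea can be sketched end to end as follows. This is an illustrative Python sketch under assumed audio parameters; the input list of "bytes played at each capture" stands in for the per-frame accounting described in step 202:

```python
# Each captured frame is stamped with the playback duration of the audio
# played so far, instead of previous_timestamp + fixed_interval.
SAMPLE_RATE_HZ = 44100
SAMPLE_SIZE_BYTES = 2
CHANNELS = 2
BYTES_PER_MS = SAMPLE_RATE_HZ * SAMPLE_SIZE_BYTES * CHANNELS / 1000.0

def frame_timestamps(bytes_played_per_frame):
    """Map 'audio bytes played when each frame was captured' to timestamps (ms)."""
    return [played / BYTES_PER_MS for played in bytes_played_per_frame]

# Even if frames arrive irregularly, the timestamps track the audio clock:
played = [0, 5880, 14000, 17640]  # bytes of audio played at each capture
timestamps = frame_timestamps(played)
```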
  • In FIG. 4, a flowchart 400 of still another embodiment of the method for processing data is shown.
  • the process 400 of the method for processing data includes steps 401 to 406.
  • In step 401, it is determined whether the target audio data is stored locally.
  • an execution subject of the method for processing data may determine whether the target audio data is stored locally.
  • the above-mentioned target audio data may be a data stream in a PCM encoding format.
  • In step 402, if the target audio data is not stored locally, a request for acquiring the target audio data is sent to the server, and the target audio data returned by the server is received.
  • The execution entity may send a request for acquiring the target audio data to a server (for example, the server 105 shown in FIG. 1) through a wired or wireless connection, and then receive the target audio data returned by the server.
  • Wireless connection methods may include, but are not limited to, 3G/4G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, Ultra Wideband (UWB) connections, and other wireless connection methods now known or developed in the future.
  • If the received target audio data is not a data stream in the PCM encoding format, the execution body may convert it to a data stream in the PCM encoding format.
  • In step 403, video data is collected, a preset audio processing component that supports audio playback is called, and the target audio data is transmitted to the audio processing component so that the target audio data is played using the audio processing component.
  • The execution body may use the camera installed on it to collect video data and, at the same time, play the target audio data.
  • the target audio data can be played in the following ways:
  • the audio processing component may support setting of a buffer and setting of a callback function.
  • The callback function may be used to return the data amount of the processed audio data each time the audio processing component finishes processing the audio data in the buffer.
  • the target audio data may be transmitted to the audio processing component to play the target audio data using the audio processing component.
  • In step 404, when a frame in the video data is collected, the sum of the data amounts returned by the callback function is determined, that sum is determined as the data amount of the target audio data that has been played when the frame is collected, and the playback duration corresponding to the data amount is determined as the timestamp of the frame in the video data.
  • During playback, the callback function may return the data amount of the processed target audio data. Therefore, for a frame of the video data, the execution body may determine the sum of the data amounts that the callback function has returned by the time the frame is collected, and determine that sum as the data amount of the target audio data that has been played when the frame was collected. After that, the execution body may determine the playback duration corresponding to the data amount of the played target audio data as follows: first, determine the product of the sampling frequency, the sample size, and the number of channels; then, determine the ratio of the data amount of the played target audio data to that product as the playback duration of the target audio data.
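The ratio described above can be sketched as follows (the function and parameter names are illustrative, not from the disclosure):

```python
def playback_duration_seconds(played_bytes, sample_rate_hz, sample_size_bytes, channels):
    # Playback duration = played data amount / (sampling frequency
    # * sample size * number of channels), as described above.
    bytes_per_second = sample_rate_hz * sample_size_bytes * channels
    return played_bytes / bytes_per_second
```

For 44.1 kHz, 16-bit (2-byte) stereo audio, one second of playback consumes 44100 × 2 × 2 = 176400 bytes, so a played amount of 176400 bytes maps to a frame timestamp of 1.0 s.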
  • In some implementations, a technician can set the size of the buffer of the audio processing component to a target value in advance, where the target value can be less than or equal to the size of the audio data corresponding to a preset interval (for example, 33 ms) between two adjacent frames of video data. Therefore, this implementation can more accurately determine the data amount of the target audio data that has been played at a given moment, thereby improving the accuracy of the determined timestamps of the frames of the video data.
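As an illustrative calculation (assuming 16-bit stereo audio at 44.1 kHz; the disclosure itself only gives the 33 ms frame-interval example), the largest buffer that still triggers at least one callback per frame interval would be:

```python
def max_buffer_bytes(frame_interval_s, sample_rate_hz, sample_size_bytes, channels):
    # The buffer should hold no more audio than is played during one
    # video-frame interval, so the callback fires at least once per frame.
    return int(frame_interval_s * sample_rate_hz * sample_size_bytes * channels)

limit = max_buffer_bytes(0.033, 44100, 2, 2)  # roughly 5.8 KB
```

A smaller buffer makes the callback fire more often and the played-amount bookkeeping correspondingly finer-grained.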
  • In step 405, a target audio data interval is obtained according to the target audio data that has been played when the tail frame of the video data is collected, and the target audio data interval is extracted.
  • The execution body may first obtain the collection time of the tail frame of the collected video data, and then extract, as the target audio data interval, the interval of the target audio data that had been played by that collection time.
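A minimal sketch of this extraction step (names are illustrative; the byte count is the played data amount recorded at the tail frame's collection time):

```python
def extract_played_interval(target_audio_pcm: bytes, played_bytes_at_tail_frame: int) -> bytes:
    # Keep only the portion of the soundtrack that had been played by
    # the time the last (tail) video frame was collected.
    return target_audio_pcm[:played_bytes_at_tail_frame]
```

This trims the stored soundtrack so its duration matches the recorded video rather than the full source audio.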
  • In step 406, the video data containing the timestamps corresponding to all the frames in the video data and the target audio data interval are stored.
  • The execution body may store the video data including the timestamps and the played data in the target audio data.
  • The played data in the target audio data and the video data including the timestamps may be stored in two separate files, and a mapping between the two files may be established.
  • the target audio data interval and the video data including the time stamp may be stored in the same file.
  • The process 400 of the method for processing data in this embodiment embodies the step of playing the target audio data using a preset audio processing component that supports audio playback, and the step of determining, based on the callback function, the data amount of the target audio data that has been played at the frame collection time. Because the callback function of the audio processing component returns the data amount each time the audio processing component finishes processing the data in the buffer, the execution body can directly calculate the played amount from the data amounts returned by the callback function when a frame of video data is collected.
  • Therefore, the solution described in this embodiment can more accurately determine the played amount of the target audio data at each frame collection time, which improves the accuracy of the determined timestamps of the frames in the video data and further improves the audio-video synchronization of the recorded soundtrack video.
  • the present disclosure provides an embodiment of a device for processing data.
  • The device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied in various electronic equipment.
  • The apparatus 500 for processing data includes: a collection unit 501 configured to collect video data and play target audio data; a first determination unit 502 configured to determine, for a frame in the video data, the data amount of the target audio data that has been played when the frame is collected, and to determine the playback duration corresponding to the data amount as the timestamp of the frame; and a storage unit 503 configured to store the video data including the timestamps and the played data in the target audio data.
  • the storage unit 503 may include an extraction module and a storage module (not shown in the figure).
  • the extraction module may be configured to obtain a target audio data interval based on the target audio data that has been played when the last frame of the video data is collected, and extract the target audio data interval.
  • the storage module may be configured to store video data including time stamps corresponding to all frames in the video data and the target audio data interval.
  • The apparatus may further include a second determining unit and a sending unit (not shown in the figure).
  • The second determining unit may be configured to determine whether the target audio data is stored locally; the sending unit may be configured to send a request for obtaining the target audio data to the server when the target audio data is not stored locally, and to receive the target audio data returned by the server.
  • The target audio data is a data stream in a pulse code modulation (PCM) encoding format.
  • the acquisition unit 501 may include an object creation module and a first transmission module (not shown in the figure).
  • the object creation module may be configured to instantiate a target class to create a target object for playing target audio data, where the target class is used to play a data stream in a pulse code modulation format.
  • the first transmission module may be configured to transmit the target audio data to the target object in a streaming manner, so as to play the target audio data by using the target object.
  • In some embodiments, the first determining unit 502 may be configured to: for a frame of the video data, determine the data amount of the target audio data that has been transmitted to the target object when the frame is collected, and determine that data amount as the data amount of the target audio data that has been played when the frame was collected.
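On Android, the target class could be something like `android.media.AudioTrack` in streaming mode, though the disclosure does not name a specific class. A language-neutral sketch of the bookkeeping (all names illustrative):

```python
class StreamingPlayer:
    # Toy stand-in for a streaming PCM player object: it tracks how many
    # bytes have been transmitted to it. Under the assumption described
    # above, the transmitted amount approximates the played amount.
    def __init__(self):
        self.transmitted_bytes = 0

    def write(self, chunk: bytes) -> None:
        self.transmitted_bytes += len(chunk)


def frame_timestamp_seconds(player: StreamingPlayer, sample_rate_hz: int,
                            sample_size_bytes: int, channels: int) -> float:
    # Timestamp of a just-collected video frame, derived from the amount
    # of audio transmitted to the player so far.
    return player.transmitted_bytes / (sample_rate_hz * sample_size_bytes * channels)
```

Writing one second's worth of 44.1 kHz 16-bit stereo PCM (176400 bytes) to such a player would yield a frame timestamp of 1.0 s.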
  • The target audio data is a data stream in a pulse code modulation encoding format.
  • the above-mentioned acquisition unit may include a calling module and a second transmission module (not shown in the figure).
  • the calling module may be configured to call a preset audio processing component that supports audio playback.
  • The audio processing component supports setting of a buffer and setting of a callback function, where the callback function is used to return the data amount of the processed audio data each time the audio processing component finishes processing the audio data in the buffer.
  • the second transmission module may be configured to transmit the target audio data to the audio processing component to play the target audio data using the audio processing component.
  • The first determining unit may be configured to determine, for a frame of the video data, the sum of the data amounts that the callback function has returned when the frame is collected, and to determine that sum as the data amount of the target audio data that has been played when the frame was collected.
  • The size of the buffer of the audio processing component may be a preset target value, where the target value is less than or equal to the size of the audio data corresponding to a preset interval between two adjacent frames of video data.
  • the foregoing storage module may include an encoding submodule and a storage submodule (not shown in the figure).
  • The encoding submodule may be configured to encode the video data with the timestamps.
  • The storage submodule may be configured to store the encoded video data and the target audio data interval in the same file.
  • The device provided by the foregoing embodiment of the present disclosure collects video data and plays target audio data through the collection unit 501; then the first determination unit 502 determines, for a frame in the video data, the data amount of the target audio data that has been played when the frame is collected, and determines the playback duration corresponding to that data amount as the timestamp of the frame; finally, the storage unit 503 stores the video data including the timestamps and the played data in the target audio data. Thus, when a frame is collected, its timestamp can be determined according to the played amount of the target audio data at the frame collection moment, which improves the audio-video synchronization of the recorded soundtrack video.
  • FIG. 6 illustrates a schematic structural diagram of a computer system 600 suitable for implementing a terminal device according to an embodiment of the present disclosure.
  • the terminal device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and use range of the embodiments of the present disclosure.
  • The computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input / output (I / O) interface 605 is also connected to the bus 604.
  • The following components are connected to the I/O interface 605: an input portion 606 including a touch screen, a touch panel, and the like; an output portion 607 including a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk; and a communication portion 609 including a network interface card such as a local area network (LAN) card, a modem, and the like.
  • The communication portion 609 performs communication processing via a network such as the Internet.
  • A drive 610 is also connected to the I/O interface 605 as necessary. A removable medium 611, such as a semiconductor memory, is mounted on the drive 610 as necessary, so that a computer program read therefrom is installed into the storage portion 608 as necessary.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • the computer program may be downloaded and installed from a network through the communication portion 609, and / or installed from a removable medium 611.
  • When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present disclosure are executed.
  • the computer-readable medium described in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the foregoing.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, radio frequency (RF), or any suitable combination of the foregoing.
  • Each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function.
  • The functions noted in the blocks may also occur in a different order from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present disclosure may be implemented by software or hardware.
  • The described units may also be provided in a processor; for example, a processor may be described as including a collection unit, a first determination unit, an extraction unit, and a storage unit. The names of these units do not, in some cases, constitute a limitation on the units themselves.
  • For example, the collection unit may also be described as "a unit that collects video data and plays target audio data".
  • the present disclosure also provides a computer-readable medium, which may be included in the device described in the above embodiments; or may exist alone without being assembled into the device.
  • The computer-readable medium carries one or more programs, and when the one or more programs are executed by the device, the device is caused to: collect video data and play target audio data; for a frame in the video data, determine the data amount of the target audio data that has been played when the frame is collected; determine the playback duration corresponding to the data amount as the timestamp of the frame; and store the video data containing the timestamps and the played data in the target audio data.

Abstract

Disclosed by the embodiments of the present disclosure are a method and apparatus for processing data. An exemplary embodiment of the method includes: collecting video data and playing target audio data; determining a data amount of target audio data that has been played when frames in the video data are collected, and determining a playback duration corresponding to the data amount as a timestamp of a corresponding frame in the video data; and storing the video data containing the timestamp and the played data in the target audio data.

Description

Method and apparatus for processing data
This disclosure claims priority to a Chinese patent application filed with the Chinese Patent Office on August 01, 2018, with application number 201810866740.1, the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of the present disclosure relate to the field of computer technology, for example, to a method and an apparatus for processing data.
Background
When recording a soundtrack video, audio (the soundtrack) is usually played while video is captured with the camera. For example, while a song is playing, a user's singing performance is recorded, and the recorded video uses the song as background music. In applications with video recording capabilities, it is common for the audio and video of recorded soundtrack videos to be out of sync. Taking an Android device as an example, because of differences between devices, achieving synchronization of recorded audio and video on different devices is difficult.
In a related manner, during the recording of a soundtrack video, it is generally assumed that the interval between two adjacent frames in the collected video data is fixed. For a frame in the video data, the sum of the timestamp of the previous frame and this interval is usually determined as the timestamp of the frame. The time-stamped video data and the played audio data are then stored.
Summary of the invention
The following is an overview of the subject matter described in detail herein. This summary is not intended to limit the scope of protection of the claims.
The embodiments of the present disclosure provide a method and an apparatus for processing data.
In a first aspect, an embodiment of the present disclosure provides a method for processing data. The method includes: collecting video data and playing target audio data; determining the data amount of the target audio data that has been played when a frame in the video data is collected, and determining the playback duration corresponding to the data amount as the timestamp of the frame in the video data; and storing the video data containing the timestamps and the played data in the target audio data.
In a second aspect, an embodiment of the present disclosure provides an apparatus for processing data. The apparatus includes: a collection unit configured to collect video data and play target audio data; a first determination unit configured to determine the data amount of the target audio data that has been played when a frame in the video data is collected, and to determine the playback duration corresponding to the data amount as the timestamp of the frame in the video data; and a storage unit configured to store the video data containing the timestamps and the played data in the target audio data.
In a third aspect, an embodiment of the present disclosure provides a terminal device, including: at least one processor; and a storage device storing at least one program thereon, where the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any embodiment of the method for processing data.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium having stored thereon a computer program that, when executed by a processor, implements the method of any embodiment of the method for processing data.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief description of the drawings
FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
FIG. 2 is a flowchart of an embodiment of a method for processing data according to the present disclosure;
FIG. 3 is a schematic diagram of an application scenario of a method for processing data according to the present disclosure;
FIG. 4 is a flowchart of still another embodiment of a method for processing data according to the present disclosure;
FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for processing data according to the present disclosure;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device according to an embodiment of the present disclosure.
Detailed description
The disclosure is described in further detail below with reference to the drawings and embodiments. It should be understood that the example embodiments described herein are only used to explain the disclosure, not to limit it. It should also be noted that, for convenience of description, only the parts related to the present disclosure are shown in the drawings.
It should be noted that, where there is no conflict, the embodiments of the present disclosure and the features of the embodiments may be combined with each other. The disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which the method for processing data or the apparatus for processing data of the present disclosure can be applied.
As shown in FIG. 1, the system architecture 100 may include a terminal device 101, a terminal device 102, a terminal device 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages (for example, audio and video data upload requests or audio data acquisition requests). Various communication client applications, such as video recording applications, audio playback applications, instant messaging tools, email clients, and social platform software, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices that have a display screen and support video recording and audio playback, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple software programs or software modules (for example, to provide distributed services) or as a single software program or software module. This is not specifically limited here.
The terminal devices 101, 102, 103 may be equipped with an image acquisition device (for example, a camera) to collect video data. In practice, the smallest visual unit that makes up a video is a frame; each frame is a static image, and combining a temporally continuous sequence of frames forms a dynamic video. In addition, the terminal devices 101, 102, 103 may also be equipped with a device for converting an electrical signal into sound (for example, a speaker) to play sound. In practice, audio data is obtained by performing analog-to-digital conversion (ADC) on an analog audio signal at a certain frequency. Playing audio data is the process of performing digital-to-analog conversion on the digital audio signal to restore it to an analog audio signal, and then converting the analog audio signal (an electrical signal) into sound for output.
The terminal devices 101, 102, 103 may use the image acquisition devices installed on them to collect video data, and may use the components installed on them for audio data processing (for example, converting digital audio signals into analog audio signals) together with speakers to play audio data. In addition, the terminal devices 101, 102, 103 may perform processing such as timestamp calculation on the collected video data, and finally store the processing results (for example, the video data containing the timestamps and the played audio data).
The server 105 may be a server that provides various services, such as a background server that supports the video recording applications installed on the terminal devices 101, 102, 103. The background server may parse, store, and otherwise process received data such as audio and video data upload requests. It may also receive audio and video data acquisition requests sent by the terminal devices 101, 102, 103, and feed the audio and video data indicated by each request back to the terminal devices 101, 102, 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple software programs or software modules (for example, to provide distributed services) or as a single software program or software module. This is not specifically limited here.
It should be noted that the method for processing data provided by the embodiments of the present disclosure is generally executed by the terminal devices 101, 102, 103; accordingly, the apparatus for processing data is generally provided in the terminal devices 101, 102, 103.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided according to implementation needs.
With continued reference to FIG. 2, a flow 200 of an embodiment of a method for processing data according to the present disclosure is shown. The method for processing data includes steps 201 to 203.
In step 201, video data is collected and target audio data is played.
In this embodiment, the execution body of the method for processing data (for example, the terminal devices 101, 102, 103 shown in FIG. 1) may obtain a file in which the target audio data is recorded. Here, the target audio data may be audio data specified in advance by the user as the soundtrack of a video, for example, the audio data corresponding to a specified song. The execution body may store in advance a large number of files recording different audio data, and may directly find and obtain the file recording the target audio data locally.
In practice, audio data is data obtained by digitizing a sound signal. The digitization of a sound signal is the process of converting a continuous analog audio signal into a digital signal at a certain frequency to obtain audio data. Generally, the digitization of a sound signal includes three steps: sampling, quantization, and encoding. Sampling refers to replacing a signal that is continuous in time with a sequence of signal sample values taken at regular time intervals. Quantization refers to approximating the continuously varying amplitude values with a finite set of amplitudes, turning the continuous amplitude of the analog signal into a finite number of discrete values at certain intervals. Encoding refers to representing the quantized discrete values as binary codes according to a certain rule. Pulse code modulation (PCM) can convert an analog audio signal into digitized audio data through sampling, quantization, and encoding. Therefore, the target audio data may be a data stream in a PCM encoding format, and the file recording the target audio data may be in the wav format.
It should be noted that the file recording the target audio data may also be in other formats, such as mp3 or ape. In that case, the target audio data may be data in another encoding format (for example, a lossy compression format such as Advanced Audio Coding (AAC)), and is not limited to the PCM encoding format. After obtaining such a file, the executing body may also convert it into wav format, in which case the target audio data in the converted file is a data stream in PCM encoding format.
It should be pointed out that playing audio data may be a process of performing digital-to-analog conversion on the digitized audio data to restore it to an analog audio signal, and then converting the analog audio signal (an electrical signal) into sound for output.
In this embodiment, the executing body may be equipped with an image acquisition device, such as a camera, and may use the camera to collect video data. In practice, video data can be described in frames. A frame is the smallest visual unit of a video; each frame is a static image, and combining a temporally continuous sequence of frames forms a dynamic video. In addition, the executing body may be equipped with a device for converting electrical signals into sound, such as a speaker. After obtaining the target audio data, the executing body may turn on the camera to collect video data while converting the target audio data into an analog audio signal and outputting sound through the speaker, thereby playing the target audio data.
In this embodiment, the executing body may play the target audio data in various ways. As an example, the executing body may first instantiate a class for playing audio and video (for example, the MediaPlayer class in the Android multimedia package) to create an object for playing the target audio data, and then transmit the target audio data to that object for playback. In practice, the MediaPlayer class in the Android multimedia package supports playing sound files in multiple formats, such as mp3, aac, and wav. When playing audio data, it first decodes the data into a data stream in PCM encoding format, and then performs digital-to-analog conversion and other processing on that stream.
Generally, a video recording application may be installed on the executing body. The application may support recording soundtrack videos, that is, videos in which audio data is played while video data is collected, the sound in the recorded video being the sound corresponding to that audio data. For example, a user's singing performance is recorded while a song is playing, and the recorded video uses that song as background music. The user may first tap the name of a piece of audio (for example, the name of a song or melody) in the running interface of the video recording application. The executing body then obtains the audio data corresponding to that name and takes it as the target audio data. Afterwards, the user may tap the video recording button in the running interface to trigger a video recording instruction. Upon receiving the instruction, the executing body may turn on the camera to record video while processing the target audio data, converting it into an analog audio signal, and outputting sound through the speaker. The user can perform while listening to the sound, thereby recording a performance video.
In one application scenario, the user may record video continuously. In this case, the executing body may continuously collect video data while playing the target audio data.
In another application scenario, the user may record video in segments. As an example, the first segment is recorded first: the executing body continuously collects video data while playing the target audio data until it detects that the user has triggered a pause-recording instruction (for example, by tapping or releasing the recording button), whereupon it pauses playback of the target audio data and stops collecting video data. After detecting that the user has triggered a resume-recording instruction (for example, by tapping the recording button again), the executing body may resume continuously collecting video data while resuming playback of the target audio data (that is, using the amount of data already played in the first segment as the playback starting point of the second segment), until it detects that the user has triggered the pause-recording instruction again, whereupon it again pauses playback and stops collection, and so on.
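The pause/resume bookkeeping described above can be sketched as follows. The class and method names are hypothetical, and actual playback of the bytes to an audio output is omitted; the point is only that the played-byte counter carries over between segments.

```java
// Hypothetical sketch: tracks how many bytes of the soundtrack have been
// played so that a later segment resumes exactly where the previous stopped.
public class SegmentedSoundtrack {
    private final byte[] pcm;      // target audio data, PCM stream
    private int playedBytes = 0;   // total bytes played across all segments

    public SegmentedSoundtrack(byte[] pcm) { this.pcm = pcm; }

    // Simulate playing `n` bytes of the soundtrack during one segment.
    public void playSegment(int n) {
        playedBytes = Math.min(playedBytes + n, pcm.length);
    }

    // Offset at which the next segment starts playback.
    public int resumeOffset() { return playedBytes; }

    public static void main(String[] args) {
        SegmentedSoundtrack s = new SegmentedSoundtrack(new byte[44100]);
        s.playSegment(10000);                 // first segment, then pause
        System.out.println(s.resumeOffset()); // 10000
        s.playSegment(5000);                  // second segment continues
        System.out.println(s.resumeOffset()); // 15000
    }
}
```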
In step 202, the amount of target audio data that has been played when a frame of the video data is collected is determined, and the playback duration corresponding to that amount is determined as the timestamp of the frame of the video data.
In this embodiment, when collecting each frame of the video data, the executing body may record the collection time of that frame. The collection time of each frame may be the system timestamp (for example, a Unix timestamp) at which the frame was collected. It should be noted that a timestamp is complete, verifiable data indicating that a piece of data existed at a specific moment; usually, it is a character sequence that uniquely identifies a moment in time. Here, the executing body may take the collection time of the first frame of the video data as the start time of the video data.
For a frame of the video data, the executing body may read the collection time of the frame, then determine the amount of target audio data that had been played by that collection time, and finally determine the playback duration corresponding to that amount as the timestamp of the frame. Various approaches may be used to determine the amount of target audio data played by a given collection time. As an example, after instantiating a preset class for playing audio and video (for example, the MediaPlayer class in the Android multimedia package) and transmitting the target audio data to the created object, the amount of target audio data already transmitted to the object by each frame's collection time may be determined and taken as the amount of target audio data played when that frame was collected.
Here, since the target audio data is obtained by sampling and quantizing the sound signal at a set sampling rate and a set sample size, and the number of channels used to play the target audio data is predetermined, the playback duration of the target audio data at the time a frame was collected can be calculated from the amount of target audio data played by that frame's collection time together with the sampling rate, the sample size, and the number of channels. The executing body may determine this playback duration as the timestamp of the frame. In practice, the sampling rate, also called the sampling speed or sampling frequency, is the number of samples extracted per second from the continuous signal to form the discrete signal, and may be expressed in hertz (Hz); the sample size may be expressed in bits. The playback duration is determined as follows: first, the product of the sampling rate, the sample size, and the number of channels is computed; then, the ratio of the amount of played target audio data to this product is taken as the playback duration of the target audio data.
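The playback-duration computation described above can be sketched as follows. The units are an assumption for the example: the played data amount is counted in bytes and the sample size is given in bits, so the divisor is the PCM byte rate.

```java
public class PlaybackClock {
    // Playback duration in milliseconds for `playedBytes` bytes of PCM audio:
    // the ratio of the played data amount to (rate x sample size x channels).
    static long playbackMillis(long playedBytes, int sampleRateHz,
                               int sampleSizeBits, int channels) {
        long bytesPerSecond = (long) sampleRateHz * (sampleSizeBits / 8) * channels;
        return playedBytes * 1000 / bytesPerSecond;
    }

    public static void main(String[] args) {
        // 44.1 kHz, 16-bit, stereo -> 176400 bytes per second.
        System.out.println(playbackMillis(176400, 44100, 16, 2)); // 1000
        System.out.println(playbackMillis(88200, 44100, 16, 2));  // 500
    }
}
```

The resulting duration is what the executing body would assign as the frame's timestamp.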
In some implementations of this embodiment, the target audio data may be a data stream in PCM encoding format, and the executing body may play it as follows: first, a target class (for example, the AudioTrack class in the Android development kit) is instantiated to create a target object for playing the target audio data, where the target class is capable of playing data streams in PCM encoding format; then, the target audio data is transmitted to the target object by streaming, so that the target object plays the target audio data.
In practice, AudioTrack in the Android development kit is a class that manages and plays a single audio resource, and is used for playback of PCM audio streams. Generally, audio data is played by pushing it to an object instantiated from AudioTrack. An AudioTrack object can operate in two modes: static mode and streaming mode. In streaming mode, a continuous data stream in PCM encoding format is written to the AudioTrack object by calling its write method. In the above implementation, the target audio data may be written in streaming mode.
In some implementations of this embodiment, if the target audio data is played in the above manner (that is, by instantiating a target class such as the AudioTrack class in the Android development kit), the amount of played target audio data may be determined as follows: for a frame of the video data, the amount of target audio data that had been transmitted to the target object by the time the frame was collected is determined, and that amount is taken as the amount of target audio data played when the frame was collected.
In some implementations of this embodiment, the target audio data may be a data stream in PCM encoding format, and the executing body may play it as follows: first, a preset audio processing component that supports audio playback (for example, the OpenSL ES component in the Android development kit) is invoked, where the component supports configuring a buffer and a callback function, the callback function returning, each time the audio data in the buffer has been processed (for example, read or played), the amount of audio data processed that time; then, the target audio data is transmitted to the audio processing component so that the component plays the target audio data.
In some implementations of this embodiment, with the above implementation (that is, playing the target audio data through a preset audio processing component that supports audio playback), each time the target audio data in the buffer has been processed by the component, the callback function returns the amount of target audio data processed that time. Therefore, for a frame of the video data, the executing body may determine the sum of the data amounts returned by the callback function by the time the frame was collected, and take that sum as the amount of target audio data played when the frame was collected.
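The accumulation of callback-returned amounts can be modeled without the native OpenSL ES API; in this sketch the buffer-completion callback is a plain method, and the buffer sizes are arbitrary example values.

```java
public class CallbackCounter {
    private long playedBytes = 0;

    // Stand-in for the buffer-completion callback: invoked once per
    // processed buffer with the number of bytes just played.
    public void onBufferProcessed(int bytes) { playedBytes += bytes; }

    // Sum of all callback-returned amounts, i.e. the played data amount
    // read off at the moment a video frame is captured.
    public long playedBytes() { return playedBytes; }

    public static void main(String[] args) {
        CallbackCounter c = new CallbackCounter();
        c.onBufferProcessed(5880);  // two completed buffers of 5880 bytes each
        c.onBufferProcessed(5880);
        System.out.println(c.playedBytes()); // 11760
    }
}
```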
In some implementations of this embodiment, a technician may set the size of the buffer of the audio processing component to a target value in advance, where the target value may be less than or equal to the size of the audio data corresponding to the preset interval duration between two adjacent frames of video data (for example, 33 ms). Here, the preset interval duration may be the reciprocal of the preset frame rate, in frames per second (FPS), at which video data is collected. In practice, the frame rate is the number of frames collected per second, and its unit may be fps or hertz (Hz). As an example, at a frame rate of 30 fps, the preset interval between two adjacent frames is about 33 ms.
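Under these settings, the target value follows directly from the frame rate and the audio parameters. The following sketch computes the largest buffer size not exceeding one inter-frame interval of audio; the parameter values are example assumptions, and real OpenSL ES buffer sizes are fixed when the buffer queue is created.

```java
public class BufferSizing {
    // Largest buffer (in bytes) not exceeding the audio played during
    // one inter-frame interval of 1/fps seconds.
    static int targetBufferBytes(int fps, int sampleRateHz,
                                 int sampleSizeBits, int channels) {
        int bytesPerSecond = sampleRateHz * (sampleSizeBits / 8) * channels;
        return bytesPerSecond / fps;
    }

    public static void main(String[] args) {
        // 30 fps video, 44.1 kHz 16-bit stereo audio.
        System.out.println(targetBufferBytes(30, 44100, 16, 2)); // 5880
    }
}
```

A smaller buffer means the callback fires at least once per frame interval, so the played-byte counter is never stale by more than one frame.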
In this way, this implementation can more accurately determine the amount of target audio data played at a given moment, which improves the accuracy of the determined timestamps of the frames of the video data.
In step 203, the video data containing the timestamps and the played portion of the target audio data are stored.
In this embodiment, the executing body may store the played portion of the target audio data together with the video data containing the timestamps. Here, the played portion of the target audio data and the video data containing the timestamps may be stored in two separate files, with a mapping established between the two files; alternatively, they may be stored in the same file.
In some implementations of this embodiment, the executing body may first determine the amount of target audio data that has been played when the stop-recording instruction is triggered (for example, after the user taps the stop-recording button), then extract the data corresponding to that played amount, and finally store the video data containing the timestamps together with the extracted data.
In some implementations of this embodiment, the executing body may first obtain the target audio data interval from the target audio data played by the time the tail frame of the video data was collected, and extract that interval. For example, the executing body may first obtain the collection time of the tail frame of the collected video data, then take as the target audio data interval the interval corresponding to the target audio data played from the start of collection up to that collection time, and extract the target audio data interval. After extracting the target audio data interval, the executing body may store the video data containing the timestamps corresponding to all the frames of the video data together with the target audio data interval. In this way, the amount of target audio data played when video recording stops can be determined more accurately, improving the audio-video synchronization at the moment recording stops.
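The extraction step can be sketched as a simple slice of the PCM stream, assuming the played amount at the tail frame is tracked in bytes:

```java
import java.util.Arrays;

public class AudioInterval {
    // Returns the target audio data interval: the portion of the PCM
    // stream played between the start of collection and the tail frame.
    static byte[] extractInterval(byte[] pcm, int playedBytesAtTailFrame) {
        int end = Math.min(playedBytesAtTailFrame, pcm.length);
        return Arrays.copyOfRange(pcm, 0, end);
    }

    public static void main(String[] args) {
        byte[] pcm = new byte[100];
        System.out.println(extractInterval(pcm, 40).length);  // 40
        System.out.println(extractInterval(pcm, 150).length); // 100 (clamped)
    }
}
```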
In some implementations of this embodiment, the executing body may first encode the video data containing the timestamps, and then store the target audio data interval and the encoded video data in the same file. In practice, video encoding may refer to converting a file in one video format into a file in another video format through a specific compression technique. It should be noted that video encoding is a well-known technique that is widely studied and applied at present, and is not described in detail here.
In some implementations of this embodiment, after storing the target audio data interval and the video data containing the timestamps, the executing body may further upload the stored data to a server (for example, the server 105 shown in FIG. 1).
With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for processing data according to this embodiment. In the application scenario of FIG. 3, a user holds a terminal device 301 and records a soundtrack video. A short-video recording application runs on the terminal device 301. The user first selects a soundtrack (for example, the song "Little Apple") in the interface of the application, whereupon the terminal device 301 obtains the target audio file 302 corresponding to that soundtrack. After the user taps the recording button, the terminal device 301 turns on the camera to collect video data 303 while playing the target audio file 302. For each frame of the video data, the terminal device 301 may determine the amount of target audio data played when the frame was collected, and determine the playback duration corresponding to that amount as the timestamp of the frame. Finally, the terminal device 301 may store the video data containing the timestamps and the played portion of the target audio data in a file 304.
In the method provided by the above embodiments of the present disclosure, video data is collected while target audio data is played; then, for each frame of the video data, the amount of target audio data played when the frame was collected is determined, and the playback duration corresponding to that amount is determined as the timestamp of the frame; finally, the video data containing the timestamps and the played portion of the target audio data are stored. Thus, when a frame is collected, its timestamp can be determined from the amount of target audio data played at the moment of collection; that is, the timestamps of the frames of the video data are determined based on the data amount of the target audio data rather than on a fixed time interval. When video data collection is unstable (for example, when frames are dropped due to device overheating or insufficient performance), the interval between two adjacent frames of the video data is not fixed, and determining frame timestamps by a fixed time interval is inaccurate. The method provided by the above embodiments of the present disclosure avoids the inaccurate timestamps caused by computing frame timestamps at fixed time intervals under unstable video data collection, improves the accuracy of the determined timestamps of the frames of the video data, and improves the audio-video synchronization of the recorded soundtrack video.
Referring to FIG. 4, a flow 400 of still another embodiment of the method for processing data is shown. The flow 400 of the method for processing data includes steps 401 to 406.
In step 401, it is determined whether target audio data is stored locally.
In this embodiment, the executing body of the method for processing data (for example, the terminal device 101, 102, or 103 shown in FIG. 1) may determine whether the target audio data is stored locally. Here, the target audio data may be a data stream in PCM encoding format.
In step 402, if the target audio data is not stored locally, a request for obtaining the target audio data is sent to a server, and the target audio data returned by the server is received.
In this embodiment, in response to determining that the target audio data is not stored locally, the executing body may send a request for obtaining the target audio data to a server (for example, the server 105 shown in FIG. 1) through a wired or wireless connection, and then receive the target audio data returned by the server.
It should be pointed out that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, an Ultra Wideband (UWB) connection, and other wireless connections now known or developed in the future.
It should be noted that, if the target audio data returned by the server is not a data stream in PCM encoding format, the executing body may convert it into a data stream in PCM encoding format.
In step 403, video data is collected, and a preset audio processing component that supports audio playback is invoked; the target audio data is transmitted to the audio processing component so that the component plays the target audio data.
In this embodiment, the executing body may use its installed camera to collect video data while playing the target audio data. Here, the target audio data may be played as follows:
First, a preset audio processing component that supports audio playback (for example, the OpenSL ES component in the Android development kit) is invoked. The component supports configuring a buffer and a callback function, where the callback function returns, each time the audio data in the buffer has been processed by the component, the amount of audio data processed that time.
Then, the target audio data is transmitted to the audio processing component so that the component plays the target audio data.
In step 404, the sum of the data amounts returned by the callback function by the time a frame of the video data is collected is determined; this sum is determined as the amount of target audio data played when the frame was collected, and the playback duration corresponding to that amount is determined as the timestamp of the frame of the video data.
In this embodiment, each time the target audio data in the buffer has been processed by the audio processing component, the callback function returns the amount of target audio data processed that time. Therefore, for a frame of the video data, the executing body may determine the sum of the data amounts returned by the callback function by the time the frame was collected, and take that sum as the amount of target audio data played when the frame was collected. The executing body may then determine the playback duration corresponding to that played amount as follows: first, the product of the sampling rate, the sample size, and the number of channels is computed; then, the ratio of the played amount of target audio data to that product is taken as the playback duration of the target audio data.
In practice, a technician may set the size of the buffer of the audio processing component to a target value in advance, where the target value may be less than or equal to the size of the audio data corresponding to the preset interval duration between two adjacent frames of video data (for example, 33 ms). In this way, the amount of target audio data played at a given moment can be determined more accurately, improving the accuracy of the determined timestamps of the frames of the video data.
In step 405, the target audio data interval is obtained from the target audio data played by the time the tail frame of the video data was collected, and the target audio data interval is extracted.
In this embodiment, the executing body may first obtain the collection time of the tail frame of the collected video data, then take as the target audio data interval the interval corresponding to the target audio data played from the start of collection up to that collection time, and extract the target audio data interval.
In step 406, the video data containing the timestamps corresponding to all the frames of the video data and the target audio data interval are stored.
In this embodiment, the executing body may store the target audio data interval together with the video data containing the timestamps. The two may be stored in two separate files, with a mapping established between the two files; alternatively, they may be stored in the same file.
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the method for processing data in this embodiment embodies the step of playing the target audio data through a preset audio processing component that supports audio playback, and the step of determining, based on the callback function, the amount of target audio data played at each frame's collection time. Thus, when a frame of video data is collected, since the callback function of the audio processing component in the solution described in this embodiment returns the data amount each time the data in the buffer has been processed by the component, the executing body can calculate the played amount directly from the data amounts returned by the callback function. Therefore, compared with taking the amount of audio data transmitted as the played amount, the solution described in this embodiment can determine the played amount of the target audio data at each frame's collection time more accurately, which further improves the accuracy of the determined timestamps of the frames of the video data and further improves the audio-video synchronization of the recorded soundtrack video.
Referring to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for processing data. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied to various electronic devices.
As shown in FIG. 5, the apparatus 500 for processing data in this embodiment includes: a collection unit 501 configured to collect video data and play target audio data; a first determination unit 502 configured to determine, for a frame in the video data, the amount of target audio data already played when the frame was captured, and to determine the playback duration corresponding to that amount as the timestamp of the frame; and a storage unit 503 configured to store the video data containing the timestamps and the played portion of the target audio data.
In some implementations of this embodiment, the storage unit 503 may include an extraction module and a storage module (not shown in the figure). The extraction module may be configured to obtain a target audio data interval from the target audio data already played when the last frame of the video data was captured, and to extract the target audio data interval. The storage module may be configured to store the video data containing the timestamps corresponding to all frames of the video data, together with the target audio data interval.
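One possible reading of the extraction module's job, sketched under the assumption that the target audio data is a PCM byte buffer and that the interval is simply the prefix played by the time the last frame was captured (the function name and clamping behavior are assumptions, not taken from the disclosure):

```python
def extract_played_interval(target_audio: bytes, played_at_tail_frame: int) -> bytes:
    """Return the target audio data interval, i.e. the portion of the audio
    that had been played when the last video frame was captured."""
    if played_at_tail_frame < 0:
        raise ValueError("played amount cannot be negative")
    # Clamp in case slightly more was reported than the clip contains.
    end = min(played_at_tail_frame, len(target_audio))
    return target_audio[:end]
```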
In some implementations of this embodiment, the apparatus may further include a second determination unit (not shown in the figure), configured to determine whether the target audio data is stored locally, and a sending unit, configured to, when the target audio data is not stored locally, send a request for the target audio data to a server and receive the target audio data returned by the server.
In some implementations of this embodiment, the target audio data is a data stream in a pulse-code modulation (PCM) coding format, and the collection unit 501 may include an object creation module and a first transmission module (not shown in the figure). The object creation module may be configured to instantiate a target class to create a target object for playing the target audio data, where the target class is used to play PCM data streams. The first transmission module may be configured to transmit the target audio data to the target object in a streaming manner, so that the target object plays the target audio data.
In some implementations of this embodiment, the first determination unit 502 may be configured to: for a frame of the video data, determine the amount of target audio data already transmitted to the target object when the frame was captured, and take that amount as the amount of target audio data already played when the frame was captured.
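One way to picture this streaming variant (a hypothetical sketch; the class and method names are invented): keep a running count of the bytes written to the playback object, and snapshot that counter whenever a video frame arrives.

```python
class TransmittedAmountTracker:
    """Uses the amount of audio data already transmitted to the playback
    object as a proxy for the amount already played."""

    def __init__(self) -> None:
        self.transmitted = 0  # total bytes streamed to the target object

    def on_audio_chunk_sent(self, chunk: bytes) -> None:
        # Called each time a chunk is written to the playback object.
        self.transmitted += len(chunk)

    def on_video_frame_captured(self) -> int:
        # The transmitted amount is taken as the played amount for this frame.
        return self.transmitted
```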
In some implementations of this embodiment, the target audio data is a data stream in a pulse-code modulation coding format, and the collection unit may include a calling module and a second transmission module (not shown in the figure). The calling module may be configured to call a preset audio processing component that supports audio playback, where the audio processing component supports configuring a buffer and a callback function, and the callback function returns the amount of audio data processed in that pass each time the audio data in the buffer has been processed by the component. The second transmission module may be configured to transmit the target audio data to the audio processing component, so that the component plays the target audio data.
In some implementations of this embodiment, the first determination unit may be configured to: for a frame of the video data, determine the sum of the data amounts returned by the callback function by the time the frame was captured, and take that sum as the amount of target audio data already played when the frame was captured.
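The callback mechanism can be pictured with a toy simulation (all names here are invented, and the component is simulated in pure Python since the disclosure does not name a specific audio library): the component "plays" data one buffer at a time and invokes the callback with each processed amount, so the sum of those amounts at frame-capture time is the played amount.

```python
class SimulatedAudioComponent:
    """Toy stand-in for the preset audio component: it processes data one
    buffer at a time and invokes the callback with each processed amount."""

    def __init__(self, buffer_size: int, callback) -> None:
        self.buffer_size = buffer_size
        self.callback = callback

    def play(self, pcm: bytes) -> None:
        for i in range(0, len(pcm), self.buffer_size):
            chunk = pcm[i:i + self.buffer_size]
            # The callback fires after each buffer's worth of data is processed.
            self.callback(len(chunk))

played = []
component = SimulatedAudioComponent(4096, played.append)
component.play(b"\x00" * 10000)
# `played` now holds the per-buffer amounts; sum(played) is the played total.
```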
In some implementations of this embodiment, the size of the buffer of the audio processing component may be a preset target value, where the target value is less than or equal to the amount of audio data corresponding to the preset interval between two adjacent frames of the video data.
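This buffer-size constraint can be computed directly from the frame rate: at 30 fps the interval between adjacent frames is 1/30 s, so the buffer should hold at most that interval's worth of audio. A sketch, with the 44.1 kHz / stereo / 16-bit audio parameters assumed:

```python
def max_buffer_size(fps: int,
                    sample_rate: int = 44100,
                    channels: int = 2,
                    bytes_per_sample: int = 2) -> int:
    """Largest buffer size (in bytes) that does not exceed the audio data
    corresponding to the interval between two adjacent video frames."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    # Integer floor division keeps the result <= one frame interval of audio.
    return bytes_per_second // fps
```

For 16-bit stereo at 44.1 kHz (176,400 bytes/s), a 30 fps recording gives a target value of at most 5,880 bytes; keeping the buffer this small bounds the error between the callback sum and the true playback position to less than one frame interval.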
In some implementations of this embodiment, the storage module may include an encoding submodule and a storage submodule (not shown in the figure). The encoding submodule may be configured to encode the timestamped video data. The storage submodule may be configured to store the encoded video data and the target audio data interval in the same file.
In the apparatus provided by the above embodiment of the present disclosure, the collection unit 501 collects video data and plays target audio data; the first determination unit 502 then determines, for each frame in the video data, the amount of target audio data already played when the frame was captured, and determines the playback duration corresponding to that amount as the timestamp of the frame; finally, the storage unit 503 stores the video data containing the timestamps and the played portion of the target audio data. Thus, when a frame is captured, its timestamp can be determined from the playback amount of the target audio data at the capture moment, which improves the audio-video synchronization of the recorded soundtrack video.
Referring now to FIG. 6, a schematic structural diagram of a computer system 600 suitable for implementing a terminal device according to an embodiment of the present disclosure is shown. The terminal device shown in FIG. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a touch screen, a touch panel, and the like; an output portion 607 including a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a local area network (LAN) card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.
According to an embodiment of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present disclosure are performed. It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, radio frequency (RF), or any suitable combination of the above.
The flowcharts and block diagrams in the figures illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including a collection unit, a first determination unit, an extraction unit, and a storage unit. The names of these units do not in some cases limit the units themselves; for example, the collection unit may also be described as "a unit that collects video data and plays target audio data".
As another aspect, the present disclosure also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: collect video data and play target audio data; for a frame in the video data, determine the amount of target audio data already played when the frame was captured, and determine the playback duration corresponding to that amount as the timestamp of the frame; and store the video data containing the timestamps and the played portion of the target audio data.

Claims (18)

  1. A method for processing data, comprising:
    collecting video data and playing target audio data;
    determining an amount of the target audio data already played when a frame in the video data is captured, and determining a playback duration corresponding to the amount as a timestamp of the frame in the video data; and
    storing the video data containing the timestamps and the played data in the target audio data.
  2. The method according to claim 1, wherein the storing the video data containing the timestamps and the played data in the target audio data comprises:
    obtaining a target audio data interval from the target audio data already played when a last frame of the video data is captured, and extracting the target audio data interval; and
    storing the video data containing the timestamps corresponding to all frames in the video data, and the target audio data interval.
  3. The method according to claim 1, further comprising, before the collecting video data and playing target audio data:
    determining whether the target audio data is stored locally; and
    in a case where the target audio data is not stored locally, sending a request for acquiring the target audio data to a server, and receiving the target audio data returned by the server.
  4. The method according to claim 1, wherein the target audio data is a data stream in a pulse-code modulation coding format; and
    the playing target audio data comprises:
    creating a target object for playing the target audio data; and
    transmitting the target audio data to the target object in a streaming manner, so as to play the target audio data with the target object.
  5. The method according to claim 4, wherein the determining an amount of the target audio data already played when a frame in the video data is captured comprises:
    determining an amount of the target audio data already transmitted to the target object when the frame in the video data is captured, and determining that amount as the amount of the target audio data already played when the frame in the video data is captured.
  6. The method according to claim 1, wherein the target audio data is a data stream in a pulse-code modulation coding format; and
    the playing target audio data comprises:
    calling a preset audio processing component that supports audio playback, wherein the audio processing component supports configuring a buffer and a callback function, and the callback function returns the amount of processed audio data each time the audio data in the buffer has been processed by the audio processing component; and
    transmitting the target audio data to the audio processing component, so as to play the target audio data with the audio processing component.
  7. The method according to claim 6, wherein the determining an amount of the target audio data already played when a frame in the video data is captured comprises:
    determining a sum of the data amounts returned by the callback function when the frame in the video data is captured, and determining that sum as the amount of the target audio data already played when the frame in the video data is captured.
  8. The method according to claim 6 or 7, wherein a size of the buffer of the audio processing component is a preset target value, the target value being less than or equal to the amount of audio data corresponding to a preset interval between two adjacent frames of the video data.
  9. An apparatus for processing data, comprising:
    a collection unit configured to collect video data and play target audio data;
    a first determination unit configured to determine an amount of the target audio data already played when a frame in the video data is captured, and to determine a playback duration corresponding to the amount as a timestamp of the frame in the video data; and
    a storage unit configured to store the video data containing the timestamps and the played data in the target audio data.
  10. The apparatus according to claim 9, wherein the storage unit comprises:
    an extraction module configured to obtain a target audio data interval from the target audio data already played when a last frame of the video data is captured, and to extract the target audio data interval; and
    a storage module configured to store the video data containing the timestamps corresponding to all frames in the video data, and the target audio data interval.
  11. The apparatus according to claim 9, further comprising:
    a second determination unit configured to determine whether the target audio data is stored locally; and
    a sending unit configured to, in a case where the target audio data is not stored locally, send a request for acquiring the target audio data to a server, and receive the target audio data returned by the server.
  12. The apparatus according to claim 9, wherein the target audio data is a data stream in a pulse-code modulation coding format; and
    the collection unit comprises:
    an object creation module configured to create a target object for playing the target audio data; and
    a first transmission module configured to transmit the target audio data to the target object in a streaming manner, so as to play the target audio data with the target object.
  13. The apparatus according to claim 12, wherein the first determination unit is configured to:
    determine an amount of the target audio data already transmitted to the target object when the frame in the video data is captured, and determine that amount as the amount of the target audio data already played when the frame in the video data is captured.
  14. The apparatus according to claim 9, wherein the target audio data is a data stream in a pulse-code modulation coding format; and
    the collection unit comprises:
    a calling module configured to call a preset audio processing component that supports audio playback, wherein the audio processing component supports configuring a buffer and a callback function, and the callback function returns the amount of processed audio data each time the audio data in the buffer has been processed by the audio processing component; and
    a second transmission module configured to transmit the target audio data to the audio processing component, so as to play the target audio data with the audio processing component.
  15. The apparatus according to claim 14, wherein the first determination unit is configured to:
    determine a sum of the data amounts returned by the callback function when the frame in the video data is captured, and determine that sum as the amount of the target audio data already played when the frame in the video data is captured.
  16. The apparatus according to claim 14 or 15, wherein a size of the buffer of the audio processing component is a preset target value, the target value being less than or equal to the amount of audio data corresponding to a preset interval between two adjacent frames of the video data.
  17. A terminal device, comprising:
    at least one processor; and
    a storage device having at least one program stored thereon,
    wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1-8.
  18. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
PCT/CN2019/098510 2018-08-01 2019-07-31 Method and apparatus for processing data WO2020024962A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810866740.1A CN109600650B (en) 2018-08-01 2018-08-01 Method and apparatus for processing data
CN201810866740.1 2018-08-01

Publications (1)

Publication Number Publication Date
WO2020024962A1 true WO2020024962A1 (en) 2020-02-06

Family

ID=65956557

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/098510 WO2020024962A1 (en) 2018-08-01 2019-07-31 Method and apparatus for processing data

Country Status (2)

Country Link
CN (1) CN109600650B (en)
WO (1) WO2020024962A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109600650B (en) * 2018-08-01 2020-06-19 北京微播视界科技有限公司 Method and apparatus for processing data
CN110225279B (en) * 2019-07-15 2022-08-16 北京小糖科技有限责任公司 Video production system and video production method of mobile terminal
CN110418183B (en) * 2019-08-05 2022-11-15 北京字节跳动网络技术有限公司 Audio and video synchronization method and device, electronic equipment and readable medium
CN112764709B (en) * 2021-01-07 2021-09-21 北京创世云科技股份有限公司 Sound card data processing method and device and electronic equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
EP2955713A1 (en) * 2014-06-12 2015-12-16 Huawei Technologies Co., Ltd. Synchronous audio playback method, apparatus and system
CN107613357A (en) * 2017-09-13 2018-01-19 广州酷狗计算机科技有限公司 Sound picture Synchronous fluorimetry method, apparatus and readable storage medium storing program for executing
CN108063970A (en) * 2017-11-22 2018-05-22 北京奇艺世纪科技有限公司 A kind of method and apparatus for handling live TV stream
CN109600650A (en) * 2018-08-01 2019-04-09 北京微播视界科技有限公司 Method and apparatus for handling data

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
JP2003169292A (en) * 2001-11-30 2003-06-13 Victor Co Of Japan Ltd After-recording device, computer program, recording medium, transmission method and reproducing device
CN102348086A (en) * 2010-08-03 2012-02-08 中兴通讯股份有限公司 Method and mobile terminal for loading background sounds in video recording process
CN105933724A (en) * 2016-05-23 2016-09-07 福建星网视易信息系统有限公司 Video producing method, device and system
CN106060424A (en) * 2016-06-14 2016-10-26 徐文波 Video dubbing method and device
CN107786876A (en) * 2017-09-21 2018-03-09 北京达佳互联信息技术有限公司 The synchronous method of music and video, device and mobile terminal
CN108111903A (en) * 2018-01-17 2018-06-01 广东欧珀移动通信有限公司 Record screen document play-back method, device and terminal

Also Published As

Publication number Publication date
CN109600650A (en) 2019-04-09
CN109600650B (en) 2020-06-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19843617

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31.05.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19843617

Country of ref document: EP

Kind code of ref document: A1