WO2020024962A1 - Method and apparatus for processing data - Google Patents

Method and apparatus for processing data

Info

Publication number
WO2020024962A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2019/098510
Other languages
French (fr)
Chinese (zh)
Inventor
宫昀
Original Assignee
北京微播视界科技有限公司
Application filed by 北京微播视界科技有限公司
Publication of WO2020024962A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4305 Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
    • H04N21/439 Processing of audio elementary streams
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8547 Content authoring involving timestamps for synchronizing content

Description

  • Embodiments of the present disclosure relate to the field of computer technology, for example, to a method and an apparatus for processing data.
  • When recording a soundtrack video, audio playback usually runs at the same time as video capture with the camera. For example, while a song is playing, the user's singing performance is recorded, and the recorded video uses the song as background music. In applications with video recording capabilities, the audio and video of a recorded soundtrack video are often out of sync. Taking Android devices as an example, because hardware differs from device to device, it is difficult to keep recorded audio and video synchronized across devices.
  • In the related art, the interval between two adjacent frames of the collected video data is assumed to be fixed.
  • The timestamp of a frame is then usually computed as the timestamp of the previous frame plus this fixed interval, after which the time-stamped video data and the played audio data are stored.
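The related-art approach described above can be sketched as follows. This is an illustrative Python sketch, not code from the patent; the frame rate and function name are assumptions.

```python
# Related-art approach (sketch): assign frame timestamps at a fixed interval,
# regardless of when frames actually arrive.
FPS = 30
FRAME_INTERVAL_MS = 1000 / FPS  # assumed-fixed inter-frame interval

def fixed_interval_timestamps(num_frames):
    """Timestamp of each frame = timestamp of the previous frame + fixed interval."""
    timestamps = []
    t = 0.0
    for _ in range(num_frames):
        timestamps.append(t)
        t += FRAME_INTERVAL_MS
    return timestamps

# If capture stalls (e.g. a frame arrives 20 ms late), these timestamps drift
# away from the audio clock, producing the desynchronization described above.
timestamps = fixed_interval_timestamps(4)
```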
  • the embodiments of the present disclosure provide a method and an apparatus for processing data.
  • an embodiment of the present disclosure provides a method for processing data.
  • The method includes: collecting video data and playing target audio data; determining the amount of the target audio data that has been played when a frame of the video data is collected, and determining the playback duration corresponding to that data amount as the timestamp of the frame; and storing the video data containing the timestamps together with the played portion of the target audio data.
  • an embodiment of the present disclosure provides a device for processing data.
  • The device includes: an acquisition unit configured to collect video data and play target audio data; a first determination unit configured to determine the amount of the target audio data that has been played when a frame of the video data is collected, and to determine the playback duration corresponding to that data amount as the timestamp of the frame; and a storage unit configured to store the video data containing the timestamps together with the played portion of the target audio data.
  • An embodiment of the present disclosure provides a terminal device including: at least one processor; and a storage device storing at least one program which, when executed by the at least one processor, causes the at least one processor to implement any one of the above methods for processing data.
  • An embodiment of the present disclosure provides a computer-readable medium having stored thereon a computer program that, when executed by a processor, implements any one of the above methods for processing data.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for processing data according to the present disclosure.
  • FIG. 3 is a schematic diagram of an application scenario of a method for processing data according to the present disclosure.
  • FIG. 4 is a flowchart of still another embodiment of a method for processing data according to the present disclosure.
  • FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for processing data according to the present disclosure.
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device according to an embodiment of the present disclosure.
  • FIG. 1 illustrates an exemplary system architecture 100 to which a method for processing data or a device for processing data of the present disclosure can be applied.
  • the system architecture 100 may include a terminal device 101, a terminal device 102, a terminal device 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium for a communication link between the terminal device 101, the terminal device 102, the terminal device 103, and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal device 101, the terminal device 102, and the terminal device 103 to interact with the server 105 through the network 104 to receive or send messages (such as audio and video data upload requests, audio data acquisition requests), and the like.
  • Various communication client applications can be installed on the terminal device 101, the terminal device 102, and the terminal device 103, such as video recording applications, audio playback applications, instant communication tools, email clients, social platform software, and the like.
  • the terminal device 101, the terminal device 102, and the terminal device 103 may be hardware or software.
  • If the terminal device 101, terminal device 102, and terminal device 103 are hardware, they can be various electronic devices that have a display screen and support video recording and audio playback, including but not limited to smartphones, tablets, laptop computers, desktop computers, and so on.
  • If the terminal device 101, the terminal device 102, and the terminal device 103 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. This is not specifically limited here.
  • the terminal device 101, the terminal device 102, and the terminal device 103 may be equipped with an image acquisition device (such as a camera) to collect video data.
  • the smallest visual unit that makes up a video is a frame. Each frame is a static image. Combining a sequence of temporally consecutive frames together forms a dynamic video.
  • the terminal device 101, the terminal device 102, and the terminal device 103 may also be provided with a device (for example, a speaker) for converting an electric signal into a sound to play the sound.
  • the audio data is obtained by performing analog-to-digital conversion (ADC) on an analog audio signal at a certain frequency.
  • The playback of audio data is the process of performing digital-to-analog conversion on the digital audio signal, restoring it to an analog audio signal (an electrical signal), and converting that signal into sound for output.
  • The terminal device 101, the terminal device 102, and the terminal device 103 can use the image acquisition devices installed on them to collect video data, and can play audio data using their audio processing components (which, for example, convert digital audio signals into analog audio signals) and speakers.
  • the terminal device 101, the terminal device 102, and the terminal device 103 may perform processing such as timestamp calculation on the collected video data, and finally store the processing results (for example, the video data including the timestamp and the played audio data).
  • the server 105 may be a server that provides various services, such as a background server that provides support for video recording applications installed on the terminal devices 101, 102, and 103.
  • The background server can analyze and store received data such as audio and video upload requests. It can also receive audio and video acquisition requests sent by the terminal devices 101, 102, and 103, and return the requested audio and video data to those terminal devices.
  • the server may be hardware or software.
  • the server can be implemented as a distributed server cluster composed of multiple servers or as a single server.
  • the server can be implemented as multiple software or software modules (for example, to provide distributed services), or it can be implemented as a single software or software module. It is not specifically limited here.
  • The method for processing data provided by the embodiments of the present disclosure is generally executed by the terminal device 101, the terminal device 102, and the terminal device 103. Accordingly, the apparatus for processing data is generally provided in the terminal devices 101, 102, and 103.
  • terminal devices, networks, and servers in FIG. 1 are merely exemplary. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • the method for processing data includes steps 201 to 203.
  • In step 201, video data is collected and the target audio data is played.
  • an execution subject of the method for processing data may obtain a file in which the target audio data is recorded.
  • the above-mentioned target audio data may be audio data specified in advance by the user as a soundtrack of the video, for example, audio data corresponding to a specified song.
  • The execution body may store in advance a large number of files recording different audio data, and may directly locate and obtain the file recording the target audio data locally.
  • audio data is data obtained by digitizing a sound signal.
  • the process of digitizing sound signals is a process of converting continuous analog audio signals into digital signals at a certain frequency to obtain audio data.
  • the digitization process of a sound signal includes three steps: sampling, quantization, and encoding.
  • sampling refers to replacing a signal that is continuous in time with a sequence of signal sample values at regular time intervals.
  • Quantization refers to approximating the continuously varying amplitude of the signal with a finite set of amplitudes, converting the continuous amplitude of the analog signal into a finite number of discrete values.
  • Encoding means that the quantized discrete value is represented by binary digits according to a certain rule.
  • Pulse Code Modulation (PCM) produces digital audio data by sampling, quantizing, and encoding an analog audio signal. The above-mentioned target audio data may therefore be a data stream in PCM encoding format, in which case the file recording the target audio data may be in wav format.
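The sampling, quantization, and encoding steps described above can be sketched as follows. This is an illustrative Python sketch (the sine "signal", sample rate, and function name are assumptions, not from the patent):

```python
# Sketch of PCM digitization: sample a simulated analog signal, quantize each
# sample to 16 bits, and encode the result as little-endian bytes.
import math
import struct

SAMPLE_RATE = 44100    # samples per second (sampling frequency)
SAMPLE_SIZE_BITS = 16  # bits per sample (quantization precision)

def digitize(duration_s, freq_hz=440.0):
    """Sample a sine 'signal', quantize to 16-bit integers, encode as bytes."""
    pcm = bytearray()
    for n in range(int(SAMPLE_RATE * duration_s)):
        t = n / SAMPLE_RATE
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # continuous value
        q = int(amplitude * 32767)                       # quantize to 16 bits
        pcm += struct.pack('<h', q)                      # encode (little-endian)
    return bytes(pcm)

data = digitize(0.01)
# 0.01 s at 44100 Hz mono, 2 bytes per sample -> 441 samples, 882 bytes
```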
  • the format of the file describing the target audio data may also be other formats, such as the mp3 format and the ape format.
  • the target audio data may be data of other encoding formats (for example, lossy compression formats such as Advanced Audio Coding (AAC)), and is not limited to the PCM encoding format.
  • The above-mentioned execution body may also perform format conversion after obtaining the file, converting it into wav format.
  • After conversion, the target audio data in the file is a data stream in PCM encoding format.
  • the playback of audio data may be a process of digitally analogizing the digital audio data, restoring it to an analog audio signal, and then converting the analog audio signal (electrical signal) into sound for output.
  • the above-mentioned execution body may be equipped with an image acquisition device, such as a camera.
  • the execution subject may use the camera to collect video data.
  • video data can be described by frames.
  • a frame is the smallest visual unit that makes up a video.
  • Each frame is a static image. Combining a sequence of temporally consecutive frames together forms a dynamic video.
  • the above-mentioned execution body may also be provided with a device for converting an electric signal into a sound, such as a speaker. After obtaining the target audio data, the execution subject may turn on the camera to collect video data, and at the same time, may convert the target audio data into an analog audio signal and output sound using the speaker to implement playback of the target audio data.
  • the above-mentioned execution subject may play the target audio data in various ways.
  • the above-mentioned execution body may first instantiate a class for playing audio and video (for example, the MediaPlayer class in the Android multimedia package) to create an object.
  • This object can be used to play the above target audio data.
  • The target audio data can then be transmitted to the object in order to play the target audio data.
  • The MediaPlayer class in the Android multimedia package can support playing sound files in multiple formats, such as mp3, aac, and wav.
  • When playing audio data, it first decodes the data into a data stream in PCM encoding format, and then performs digital-to-analog conversion on that stream.
  • a video recording application can be installed in the execution body.
  • This video recording application can support the recording of soundtrack videos.
  • A soundtrack video is a video recorded while audio data is played during video data collection.
  • the sound in the recorded soundtrack video is the sound corresponding to the audio data.
  • a singing action performed by a user is recorded, and the recorded video uses the song as background music.
  • the user may first click the name of an audio (such as the name of a song or melody) in the running interface of the video recording application.
  • the execution body can obtain the audio data corresponding to the name and use it as the target audio data.
  • the user can click the video recording button in the running interface of the video recording application to trigger a video recording instruction.
  • After receiving the video recording instruction, the execution body can turn on the camera for video recording and, at the same time, process the target audio data, convert it into an analog audio signal, and output sound using the speaker. The user can then perform while listening to the sound, so that a performance video is recorded.
  • users can perform continuous recording of videos.
  • the above-mentioned execution subject can continuously collect video data and simultaneously play the target audio data.
  • users can perform segmented recording of videos.
  • The execution body can continuously collect video data and simultaneously play the target audio data until it detects that the user has triggered a pause-recording instruction (for example, by clicking the recording button again or releasing it), at which point it pauses the playback of the target audio data and stops collecting video data.
  • When recording resumes, the execution body can continue to collect video data and continue to play the target audio data (that is, the amount of data played in the first segment is taken as the starting point of the second segment), until it detects that the user triggers the pause-recording instruction again, and so on.
  • In step 202, the amount of the target audio data that has been played when a frame of the video data is collected is determined, and the playback duration corresponding to that data amount is determined as the timestamp of the frame.
  • the frame collection time can be recorded.
  • the collection time of each frame may be a system timestamp (such as a Unix timestamp) when the frame is collected.
  • A timestamp is a complete, verifiable piece of data indicating that certain data already existed at a specific moment; typically it is a character sequence that uniquely identifies a moment in time.
  • the execution subject may determine the acquisition time of the first frame of the video data as the start time of the video data.
  • the execution subject can read the acquisition time of the frame. Then, the data amount of the target audio data that has been played at the acquisition time can be determined. Finally, the playback time corresponding to the data amount can be determined as the time stamp of the frame.
  • Various methods can be used to determine the amount of the target audio data that has been played at a given acquisition time. As an example, after instantiating a preset class for playing audio (such as the MediaPlayer class in the Android multimedia package) and transmitting the target audio data to the created object, the amount of target audio data that has been transferred to the object at the frame's acquisition time can be determined, and that amount can be taken as the amount of target audio data that has been played when the frame was collected.
  • The target audio data is obtained by sampling and quantizing a sound signal at a set sampling frequency (sampling rate) and a set sample size, and the number of channels of the target audio data is predetermined. Therefore, from the amount of target audio data that has been played at the acquisition time of a frame, together with the sampling frequency, sample size, and number of channels, the playback duration of the target audio data at that moment can be calculated.
  • the execution subject may determine the playback duration as the time stamp of the frame.
  • the sampling frequency is also called the sampling speed or sampling rate.
  • the sampling frequency can be the number of samples taken from the continuous signal per second and composed of discrete signals.
  • the sampling frequency can be expressed in Hertz (Hz).
  • the sample size can be expressed in bits.
  • the steps for determining the playback duration are as follows: First, the product of the sampling frequency, the sampling size, and the number of channels can be determined. Then, the ratio of the data amount of the target audio data that has been played to the product can be determined as the playback duration of the target audio data.
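The playback-duration computation above can be sketched as follows. This is an illustrative Python sketch assuming the sample size is expressed in bytes; the constants and function name are assumptions, not from the patent:

```python
# Playback duration = data amount / (sampling frequency * sample size * channels).
SAMPLE_RATE_HZ = 44100   # sampling frequency
SAMPLE_SIZE_BYTES = 2    # 16-bit samples
CHANNELS = 2             # stereo

def playback_duration_ms(bytes_played):
    """Convert an amount of played PCM data (bytes) into a playback duration (ms)."""
    bytes_per_second = SAMPLE_RATE_HZ * SAMPLE_SIZE_BYTES * CHANNELS
    return bytes_played * 1000.0 / bytes_per_second

# One second of 44.1 kHz / 16-bit / stereo PCM is 176400 bytes:
duration = playback_duration_ms(176400)  # 1000.0 ms
```

This duration is exactly what step 202 assigns as the frame timestamp.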
  • the foregoing target audio data may be a data stream in a PCM encoding format.
  • The above-mentioned execution body may also play the target audio data through the following steps: first, instantiate a target class (such as the AudioTrack class in the Android development kit) to create a target object for playing the target audio data.
  • the target class can be used to play a data stream in PCM encoding format.
  • the target audio data may be transmitted to the target object in a streaming manner, so as to play the target audio data by using the target object.
  • AudioTrack in the Android development kit is a class that manages and plays a single audio resource; it is used for the playback of PCM audio streams.
  • Audio data is played by pushing it to an instantiated AudioTrack object.
  • AudioTrack objects can operate in two modes: static mode and streaming mode. In streaming mode, a continuous PCM-encoded data stream is written to the AudioTrack object by calling its write method. In the above implementation, the target audio data can be written in streaming mode.
  • In such implementations, the amount of target audio data that has been played can be determined as follows: for a frame of the video data, determine the amount of target audio data that has been transmitted to the target object when the frame is collected, and take that amount as the amount of target audio data that has been played at that moment.
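The bytes-transmitted bookkeeping described above can be sketched as follows. This is an illustrative Python stand-in; the `StreamingPlayer` class is hypothetical (a real implementation would wrap an AudioTrack-style streaming player):

```python
# Track the cumulative number of bytes handed to a streaming PCM player, and
# use that count as the amount of target audio data "played" when a frame is
# captured.
class StreamingPlayer:
    def __init__(self):
        self.bytes_written = 0

    def write(self, chunk: bytes):
        # in a real AudioTrack-style player this call would enqueue PCM data
        # for playback; here we only account for the transmitted amount
        self.bytes_written += len(chunk)

player = StreamingPlayer()
player.write(b"\x00" * 4096)
player.write(b"\x00" * 4096)
# Amount of target audio data played when the next frame is collected:
played_bytes = player.bytes_written  # 8192
```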
  • the foregoing target audio data may be a data stream in a PCM encoding format.
  • the above-mentioned execution body may also play the target audio data through the following steps: first, call a preset audio processing component (such as the OpenSL ES component in the Android development kit) that supports audio playback.
  • the audio processing component may support setting of a buffer and setting of a callback function.
  • the above callback function may be used to return the data volume of the processed audio data after the audio data processing (such as reading, playing, etc.) in the buffer is completed.
  • the target audio data may be transmitted to the audio processing component to play the target audio data using the audio processing component.
  • the execution subject may determine the sum of the amount of data that the callback function has returned when the frame was collected.
  • the execution body may determine the sum of the data amounts as the data amount of the target audio data that has been played when the frame is acquired.
  • A technician may set the size of the buffer of the audio processing component to a target value in advance, where the target value may be less than or equal to the amount of audio data corresponding to the preset interval between two adjacent frames of video data (for example, the amount of audio data corresponding to 33 ms).
  • the preset interval duration may be a reciprocal of a preset frame rate (Frames Per Second) of the collected video data.
  • the frame rate refers to the number of frames collected per second.
  • the unit of the frame rate can be fps or Hertz (Hz).
  • For example, at a preset frame rate of 30 fps, the interval between two adjacent frames is about 33 ms.
  • This implementation can determine more accurately the amount of target audio data that has been played at a given time, thereby improving the accuracy of the determined frame timestamps of the video data.
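The buffer sizing described above can be sketched as follows. This is an illustrative computation under assumed audio parameters (not values from the patent): choosing a buffer no larger than one inter-frame interval of audio means the callback fires at least once per video frame.

```python
# Choose an audio buffer no larger than the audio corresponding to one
# inter-frame interval, so the playback callback reports progress at least
# once per captured video frame.
SAMPLE_RATE_HZ = 44100
SAMPLE_SIZE_BYTES = 2
CHANNELS = 2
FPS = 30

frame_interval_s = 1.0 / FPS  # preset inter-frame interval (~33 ms)
bytes_per_second = SAMPLE_RATE_HZ * SAMPLE_SIZE_BYTES * CHANNELS
max_buffer_bytes = int(bytes_per_second * frame_interval_s)  # 5880 bytes
```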
  • In step 203, the video data including the timestamps and the played data of the target audio data are stored.
  • the execution subject may store the played data in the target audio data and the video data including a time stamp.
  • The played portion of the target audio data and the time-stamped video data may be stored in two separate files, with a mapping established between the two files.
  • Alternatively, the two may be stored in the same file.
  • The execution body may first determine the amount of the target audio data that has been played after the stop-recording instruction is triggered (for example, after the user clicks the stop-recording button). The data corresponding to that played amount can then be extracted. Finally, the time-stamped video data and the extracted data can be stored.
  • The execution body may first obtain a target audio data interval based on the target audio data that had been played when the last frame of the video data was collected.
  • To do so, it may obtain the acquisition time of the last collected frame, and then extract, as the target audio data interval, the portion of the target audio data played between the start of collection and that acquisition time.
  • The execution subject may then store the target audio data interval together with the video data containing timestamps for all of its frames. In this way, the amount of target audio data played when video recording stops can be determined more accurately, improving audio-video synchronization at the end of recording.
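The interval extraction described above can be sketched as follows. This is an illustrative Python sketch; the function name and byte-oriented slicing are assumptions (a real implementation would also respect sample and channel boundaries):

```python
# Extract the played interval of the target audio data when recording stops,
# using the byte count that had been played at the last collected frame.
def extract_played_interval(target_audio: bytes, bytes_played_at_last_frame: int) -> bytes:
    """Keep only the portion of the target audio actually played during recording."""
    return target_audio[:bytes_played_at_last_frame]

audio = bytes(range(256)) * 100  # stand-in for the PCM target audio data
clip = extract_played_interval(audio, 1024)  # the target audio data interval
```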
  • the above-mentioned execution body may first encode video data including a time stamp. After that, the target audio data interval and the encoded video data are stored in the same file.
  • Video encoding refers to converting a file in one video format into a file in another video format through a specific compression technique. Video coding is a well-known and widely applied technology and is not described further here.
  • The execution entity may further upload the stored data to a server (for example, the server 105 shown in FIG. 1).
  • FIG. 3 is a schematic diagram of an application scenario of a method for processing data according to this embodiment.
  • a user holds a terminal device 301 and records a soundtrack video.
  • a short video recording application runs on the terminal device 301.
  • the user first selects a certain soundtrack (such as the song "Little Apple") in the interface of the short video recording application.
  • the terminal device 301 then obtains the target audio file 302 corresponding to the soundtrack.
  • the terminal device 301 simultaneously turns on the camera to collect video data 303, and at the same time, plays the target audio file 302.
  • the terminal device 301 may determine the data amount of the target audio data that has been played when the frame is collected, and determine the playback time corresponding to the data amount as the time stamp of the frame. Finally, the terminal device 301 may store the video data including the timestamp and the played data in the above target audio data in the file 304.
  • The method provided by the above embodiments of the present disclosure collects video data while playing target audio data; for each frame of the video data, it determines the amount of the target audio data that has been played when the frame is collected, and takes the playback duration corresponding to that amount as the frame's timestamp; finally, it stores the time-stamped video data together with the played portion of the target audio data. The timestamp of each frame is thus determined from the amount of target audio data actually played when the frame is collected, rather than from a fixed time interval.
  • When video capture is unstable, the interval between two adjacent frames of the video data is not fixed, and computing timestamps at fixed intervals yields inaccurate results. The present method avoids this situation, improving the accuracy of the frame timestamps of the video data and thereby the audio-video synchronization of the recorded soundtrack video.
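The overall idea can be sketched end to end as follows. This is an illustrative Python sketch under assumed audio parameters; the input list of "bytes played at each capture" stands in for the per-frame accounting described in step 202:

```python
# Each captured frame is stamped with the playback duration of the audio
# played so far, instead of previous_timestamp + fixed_interval.
SAMPLE_RATE_HZ = 44100
SAMPLE_SIZE_BYTES = 2
CHANNELS = 2
BYTES_PER_MS = SAMPLE_RATE_HZ * SAMPLE_SIZE_BYTES * CHANNELS / 1000.0

def frame_timestamps(bytes_played_per_frame):
    """Map 'audio bytes played when each frame was captured' to timestamps (ms)."""
    return [played / BYTES_PER_MS for played in bytes_played_per_frame]

# Even if frames arrive irregularly, the timestamps track the audio clock:
played = [0, 5880, 14000, 17640]  # bytes of audio played at each capture
timestamps = frame_timestamps(played)
```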
  • In FIG. 4, a flowchart 400 of still another embodiment of the method for processing data is shown.
  • the process 400 of the method for processing data includes steps 401 to 406.
  • In step 401, it is determined whether the target audio data is stored locally.
  • an execution subject of the method for processing data may determine whether the target audio data is stored locally.
  • the above-mentioned target audio data may be a data stream in a PCM encoding format.
  • In step 402, if the target audio data is not stored locally, a request for acquiring the target audio data is sent to the server, and the target audio data returned by the server is received.
  • The execution entity may send a request for acquiring the target audio data to a server (for example, the server 105 shown in FIG. 1) through a wired or wireless connection, and then receive the target audio data returned by the server.
  • Wireless connection methods may include, but are not limited to, 3G/4G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, Ultra Wideband (UWB) connections, and other wireless connection methods now known or developed in the future.
  • If the received target audio data is not a data stream in the PCM encoding format, the execution body may convert it to a data stream in the PCM encoding format.
  • In step 403, video data is collected, a preset audio processing component that supports audio playback is called, and the target audio data is transmitted to the audio processing component so that the target audio data is played using the audio processing component.
  • The execution body may use the camera installed on it to collect video data and, at the same time, play the target audio data.
  • the target audio data can be played in the following ways:
  • the audio processing component may support setting of a buffer and setting of a callback function.
  • The callback function may be used to return the data amount of the processed audio data each time the audio processing component finishes processing the audio data in the buffer.
  • the target audio data may be transmitted to the audio processing component to play the target audio data using the audio processing component.
  • In step 404, when a frame in the video data is collected, the sum of the data amounts returned by the callback function is determined, that sum is determined as the data amount of the target audio data that has been played when the frame is collected, and the playback duration corresponding to the data amount is determined as the timestamp of the frame in the video data.
  • During playback, the callback function may return the data amount of the processed target audio data. Therefore, for a frame of the video data, the execution body may determine the sum of the data amounts that the callback function has returned by the time the frame is collected, and determine that sum as the data amount of the target audio data that has been played when the frame was collected. After that, the execution body may determine the playback duration corresponding to the data amount of the played target audio data as follows: first, determine the product of the sampling frequency, the sample size, and the number of channels; then, determine the ratio of the data amount of the played target audio data to that product as the playback duration of the target audio data.
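The ratio described above can be sketched as follows (the function and parameter names are illustrative, not from the disclosure):

```python
def playback_duration_seconds(played_bytes, sample_rate_hz, sample_size_bytes, channels):
    # Playback duration = played data amount / (sampling frequency
    # * sample size * number of channels), as described above.
    bytes_per_second = sample_rate_hz * sample_size_bytes * channels
    return played_bytes / bytes_per_second
```

For 44.1 kHz, 16-bit (2-byte) stereo audio, one second of playback consumes 44100 × 2 × 2 = 176400 bytes, so a played amount of 176400 bytes maps to a frame timestamp of 1.0 s.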
  • In some implementations, a technician can set the size of the buffer of the audio processing component to a target value in advance, where the target value can be less than or equal to the size of the audio data corresponding to a preset interval (for example, 33 ms) between two adjacent frames of video data. Therefore, this implementation can more accurately determine the data amount of the target audio data that has been played at a given moment, thereby improving the accuracy of the determined timestamps of the frames of the video data.
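As an illustrative calculation (assuming 16-bit stereo audio at 44.1 kHz; the disclosure itself only gives the 33 ms frame-interval example), the largest buffer that still triggers at least one callback per frame interval would be:

```python
def max_buffer_bytes(frame_interval_s, sample_rate_hz, sample_size_bytes, channels):
    # The buffer should hold no more audio than is played during one
    # video-frame interval, so the callback fires at least once per frame.
    return int(frame_interval_s * sample_rate_hz * sample_size_bytes * channels)

limit = max_buffer_bytes(0.033, 44100, 2, 2)  # roughly 5.8 KB
```

A smaller buffer makes the callback fire more often and the played-amount bookkeeping correspondingly finer-grained.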
  • In step 405, a target audio data interval is obtained according to the target audio data that has been played when the tail frame of the video data is collected, and the target audio data interval is extracted.
  • The execution body may first obtain the collection time of the tail frame of the collected video data, and then extract, as the target audio data interval, the interval of the target audio data that had been played by that collection time.
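A minimal sketch of this extraction step (names are illustrative; the byte count is the played data amount recorded at the tail frame's collection time):

```python
def extract_played_interval(target_audio_pcm: bytes, played_bytes_at_tail_frame: int) -> bytes:
    # Keep only the portion of the soundtrack that had been played by
    # the time the last (tail) video frame was collected.
    return target_audio_pcm[:played_bytes_at_tail_frame]
```

This trims the stored soundtrack so its duration matches the recorded video rather than the full source audio.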
  • In step 406, the video data containing the timestamps corresponding to all the frames in the video data and the target audio data interval are stored.
  • The execution body may store the video data including the timestamps and the played data in the target audio data.
  • The played data in the target audio data and the video data including the timestamps may be stored in two separate files, and a mapping between the two files may be established.
  • the target audio data interval and the video data including the time stamp may be stored in the same file.
  • The process 400 of the method for processing data in this embodiment embodies the step of playing the target audio data using a preset audio processing component that supports audio playback, and the step of determining, based on the callback function, the data amount of the target audio data that has been played at the frame collection time. Because the callback function of the audio processing component returns the data amount each time the audio processing component finishes processing the data in the buffer, the execution body can directly calculate the played amount from the data amounts returned by the callback function when a frame of video data is collected.
  • Therefore, the solution described in this embodiment can more accurately determine the played amount of the target audio data at each frame collection time, which improves the accuracy of the determined timestamps of the frames in the video data and further improves the audio-video synchronization of the recorded soundtrack video.
  • the present disclosure provides an embodiment of a device for processing data.
  • The device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied in various electronic equipment.
  • The apparatus 500 for processing data includes: a collection unit 501 configured to collect video data and play target audio data; a first determination unit 502 configured to determine, for a frame in the video data, the data amount of the target audio data that has been played when the frame is collected, and to determine the playback duration corresponding to the data amount as the timestamp of the frame; and a storage unit 503 configured to store the video data including the timestamps and the played data in the target audio data.
  • the storage unit 503 may include an extraction module and a storage module (not shown in the figure).
  • the extraction module may be configured to obtain a target audio data interval based on the target audio data that has been played when the last frame of the video data is collected, and extract the target audio data interval.
  • the storage module may be configured to store video data including time stamps corresponding to all frames in the video data and the target audio data interval.
  • The apparatus may further include a second determining unit and a sending unit (not shown in the figure).
  • The second determining unit may be configured to determine whether the target audio data is stored locally; the sending unit may be configured to send a request for obtaining the target audio data to the server when the target audio data is not stored locally, and to receive the target audio data returned by the server.
  • The target audio data is a data stream in a pulse code modulation (PCM) encoding format.
  • the acquisition unit 501 may include an object creation module and a first transmission module (not shown in the figure).
  • the object creation module may be configured to instantiate a target class to create a target object for playing target audio data, where the target class is used to play a data stream in a pulse code modulation format.
  • the first transmission module may be configured to transmit the target audio data to the target object in a streaming manner, so as to play the target audio data by using the target object.
  • In some embodiments, the first determining unit 502 may be configured to: for a frame of the video data, determine the data amount of the target audio data that has been transmitted to the target object when the frame is collected, and determine that data amount as the data amount of the target audio data that has been played when the frame was collected.
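On Android, the target class could be something like `android.media.AudioTrack` in streaming mode, though the disclosure does not name a specific class. A language-neutral sketch of the bookkeeping (all names illustrative):

```python
class StreamingPlayer:
    # Toy stand-in for a streaming PCM player object: it tracks how many
    # bytes have been transmitted to it. Under the assumption described
    # above, the transmitted amount approximates the played amount.
    def __init__(self):
        self.transmitted_bytes = 0

    def write(self, chunk: bytes) -> None:
        self.transmitted_bytes += len(chunk)


def frame_timestamp_seconds(player: StreamingPlayer, sample_rate_hz: int,
                            sample_size_bytes: int, channels: int) -> float:
    # Timestamp of a just-collected video frame, derived from the amount
    # of audio transmitted to the player so far.
    return player.transmitted_bytes / (sample_rate_hz * sample_size_bytes * channels)
```

Writing one second's worth of 44.1 kHz 16-bit stereo PCM (176400 bytes) to such a player would yield a frame timestamp of 1.0 s.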
  • The target audio data is a data stream in a pulse code modulation encoding format.
  • the above-mentioned acquisition unit may include a calling module and a second transmission module (not shown in the figure).
  • the calling module may be configured to call a preset audio processing component that supports audio playback.
  • The audio processing component supports setting of a buffer and setting of a callback function, where the callback function is used to return the data amount of the processed audio data each time the audio processing component finishes processing the audio data in the buffer.
  • the second transmission module may be configured to transmit the target audio data to the audio processing component to play the target audio data using the audio processing component.
  • The first determining unit may be configured to determine, for a frame of the video data, the sum of the data amounts that the callback function has returned when the frame is collected, and to determine that sum as the data amount of the target audio data that has been played when the frame was collected.
  • The size of the buffer of the audio processing component may be a preset target value, where the target value is less than or equal to the size of the audio data corresponding to a preset interval between two adjacent frames of video data.
  • the foregoing storage module may include an encoding submodule and a storage submodule (not shown in the figure).
  • The encoding submodule may be configured to encode the video data with the timestamps.
  • The storage submodule may be configured to store the encoded video data and the target audio data interval in the same file.
  • The device provided by the foregoing embodiment of the present disclosure collects video data and plays target audio data through the collection unit 501; then the first determination unit 502 determines, for a frame in the video data, the data amount of the target audio data that has been played when the frame is collected, and determines the playback duration corresponding to that data amount as the timestamp of the frame; finally, the storage unit 503 stores the video data including the timestamps and the played data in the target audio data. Thus, when a frame is collected, its timestamp can be determined according to the played amount of the target audio data at the frame collection moment, which improves the audio-video synchronization of the recorded soundtrack video.
  • FIG. 6 illustrates a schematic structural diagram of a computer system 600 suitable for implementing a terminal device according to an embodiment of the present disclosure.
  • the terminal device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and use range of the embodiments of the present disclosure.
  • The computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input / output (I / O) interface 605 is also connected to the bus 604.
  • The following components are connected to the I/O interface 605: an input portion 606 including a touch screen, a touch panel, and the like; an output portion 607 including a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk; and a communication portion 609 including a network interface card such as a local area network (LAN) card, a modem, and the like.
  • The communication portion 609 performs communication processing via a network such as the Internet.
  • A drive 610 is also connected to the I/O interface 605 as necessary. A removable medium 611, such as a semiconductor memory, is mounted on the drive 610 as necessary, so that a computer program read therefrom is installed into the storage portion 608 as necessary.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • the computer program may be downloaded and installed from a network through the communication portion 609, and / or installed from a removable medium 611.
  • When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present disclosure are executed.
  • the computer-readable medium described in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the foregoing.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, radio frequency (RF), or any suitable combination of the foregoing.
  • Each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function.
  • The functions noted in the blocks may also occur in a different order from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present disclosure may be implemented by software or hardware.
  • The described units may also be provided in a processor; for example, a processor may be described as including a collection unit, a first determination unit, an extraction unit, and a storage unit. The names of these units do not, in some cases, constitute a limitation on the units themselves.
  • For example, the collection unit may also be described as "a unit that collects video data and plays target audio data".
  • the present disclosure also provides a computer-readable medium, which may be included in the device described in the above embodiments; or may exist alone without being assembled into the device.
  • The computer-readable medium carries one or more programs, and when the one or more programs are executed by the device, the device is caused to: collect video data and play target audio data; for a frame in the video data, determine the data amount of the target audio data that has been played when the frame is collected; determine the playback duration corresponding to the data amount as the timestamp of the frame; and store the video data containing the timestamps and the played data in the target audio data.

Abstract

Disclosed by the embodiments of the present disclosure are a method and apparatus for processing data. An exemplary embodiment of the method includes: collecting video data and playing target audio data; determining a data amount of target audio data that has been played when frames in the video data are collected, and determining a playback duration corresponding to the data amount as a timestamp of a corresponding frame in the video data; and storing the video data containing the timestamp and the played data in the target audio data.

Description

Method and apparatus for processing data
This disclosure claims priority to a Chinese patent application filed with the Chinese Patent Office on August 01, 2018, with application number 201810866740.1, the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of the present disclosure relate to the field of computer technology, for example, to a method and an apparatus for processing data.
Background
When recording a soundtrack video, audio (the soundtrack) is usually played while video is captured with the camera. For example, while a song is playing, a user's singing performance is recorded, and the recorded video uses the song as background music. In applications with video recording capabilities, it is common for the audio and video of recorded soundtrack videos to be out of sync. Taking an Android device as an example, because of differences between devices, achieving synchronization of recorded audio and video on different devices is difficult.
In a related manner, during the recording of a soundtrack video, it is generally assumed that the interval between two adjacent frames in the collected video data is fixed. For a frame in the video data, the sum of the timestamp of the previous frame and this interval is usually determined as the timestamp of the frame. The time-stamped video data and the played audio data are then stored.
Summary of the invention
The following is an overview of the subject matter described in detail herein. This summary is not intended to limit the scope of protection of the claims.
The embodiments of the present disclosure provide a method and an apparatus for processing data.
In a first aspect, an embodiment of the present disclosure provides a method for processing data. The method includes: collecting video data and playing target audio data; determining the data amount of the target audio data that has been played when a frame in the video data is collected, and determining the playback duration corresponding to the data amount as the timestamp of the frame in the video data; and storing the video data containing the timestamps and the played data in the target audio data.
In a second aspect, an embodiment of the present disclosure provides an apparatus for processing data. The apparatus includes: a collection unit configured to collect video data and play target audio data; a first determination unit configured to determine the data amount of the target audio data that has been played when a frame in the video data is collected, and to determine the playback duration corresponding to the data amount as the timestamp of the frame in the video data; and a storage unit configured to store the video data containing the timestamps and the played data in the target audio data.
In a third aspect, an embodiment of the present disclosure provides a terminal device, including: at least one processor; and a storage device storing at least one program thereon, where the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any embodiment of the method for processing data.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium having stored thereon a computer program that, when executed by a processor, implements the method of any embodiment of the method for processing data.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief description of the drawings
FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
FIG. 2 is a flowchart of an embodiment of a method for processing data according to the present disclosure;
FIG. 3 is a schematic diagram of an application scenario of a method for processing data according to the present disclosure;
FIG. 4 is a flowchart of still another embodiment of a method for processing data according to the present disclosure;
FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for processing data according to the present disclosure;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device according to an embodiment of the present disclosure.
Detailed description
The disclosure is described in further detail below with reference to the drawings and embodiments. It should be understood that the example embodiments described herein are only used to explain the disclosure, not to limit it. It should also be noted that, for convenience of description, only the parts related to the present disclosure are shown in the drawings.
It should be noted that, where there is no conflict, the embodiments of the present disclosure and the features of the embodiments may be combined with each other. The disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which the method for processing data or the apparatus for processing data of the present disclosure can be applied.
As shown in FIG. 1, the system architecture 100 may include a terminal device 101, a terminal device 102, a terminal device 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages (for example, audio and video data upload requests or audio data acquisition requests). Various communication client applications, such as video recording applications, audio playback applications, instant messaging tools, email clients, and social platform software, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices that have a display screen and support video recording and audio playback, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple software programs or software modules (for example, to provide distributed services) or as a single software program or software module. This is not specifically limited here.
The terminal devices 101, 102, 103 may be equipped with an image acquisition device (for example, a camera) to collect video data. In practice, the smallest visual unit that makes up a video is a frame; each frame is a static image, and combining a temporally continuous sequence of frames forms a dynamic video. In addition, the terminal devices 101, 102, 103 may also be equipped with a device for converting an electrical signal into sound (for example, a speaker) to play sound. In practice, audio data is obtained by performing analog-to-digital conversion (ADC) on an analog audio signal at a certain frequency. Playing audio data is the process of performing digital-to-analog conversion on the digital audio signal to restore it to an analog audio signal, and then converting the analog audio signal (an electrical signal) into sound for output.
The terminal devices 101, 102, 103 may use the image acquisition devices installed on them to collect video data, and may use the components installed on them for audio data processing (for example, converting digital audio signals into analog audio signals) together with speakers to play audio data. In addition, the terminal devices 101, 102, 103 may perform processing such as timestamp calculation on the collected video data, and finally store the processing results (for example, the video data containing the timestamps and the played audio data).
The server 105 may be a server that provides various services, such as a background server that supports the video recording applications installed on the terminal devices 101, 102, 103. The background server may parse, store, and otherwise process received data such as audio and video data upload requests. It may also receive audio and video data acquisition requests sent by the terminal devices 101, 102, 103, and feed the audio and video data indicated by each request back to the terminal devices 101, 102, 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple software programs or software modules (for example, to provide distributed services) or as a single software program or software module. This is not specifically limited here.
It should be noted that the method for processing data provided by the embodiments of the present disclosure is generally executed by the terminal devices 101, 102, 103; accordingly, the apparatus for processing data is generally provided in the terminal devices 101, 102, 103.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided according to implementation needs.
With continued reference to FIG. 2, a flow 200 of an embodiment of a method for processing data according to the present disclosure is shown. The method for processing data includes steps 201 to 203.
In step 201, video data is collected and target audio data is played.
In this embodiment, the execution body of the method for processing data (for example, the terminal devices 101, 102, 103 shown in FIG. 1) may obtain a file in which the target audio data is recorded. Here, the target audio data may be audio data specified in advance by the user as the soundtrack of a video, for example, the audio data corresponding to a specified song. The execution body may store in advance a large number of files recording different audio data, and may directly find and obtain the file recording the target audio data locally.
In practice, audio data is data obtained by digitizing a sound signal. The digitization of a sound signal is the process of converting a continuous analog audio signal into a digital signal at a certain frequency to obtain audio data. Generally, the digitization of a sound signal includes three steps: sampling, quantization, and encoding. Sampling refers to replacing a signal that is continuous in time with a sequence of signal sample values taken at regular time intervals. Quantization refers to approximating the continuously varying amplitude values with a finite set of amplitudes, turning the continuous amplitude of the analog signal into a finite number of discrete values at certain intervals. Encoding refers to representing the quantized discrete values as binary codes according to a certain rule. Pulse code modulation (PCM) can convert an analog audio signal into digitized audio data through sampling, quantization, and encoding. Therefore, the target audio data may be a data stream in a PCM encoding format, and the file recording the target audio data may be in the wav format.
It should be noted that the file recording the target audio data may also be in other formats, such as mp3 or ape. In that case, the target audio data may be data in another encoding format (for example, a lossy compression format such as Advanced Audio Coding (AAC)), and is not limited to the PCM encoding format. After obtaining such a file, the executing body may also convert it into wav format, in which case the target audio data in the converted file is a data stream in PCM encoding format.
It should be pointed out that playing audio data may be a process of performing digital-to-analog conversion on the digitized audio data to restore it to an analog audio signal, and then converting the analog audio signal (an electrical signal) into sound for output.
In this embodiment, the executing body may be equipped with an image acquisition device, such as a camera, and may use the camera to collect video data. In practice, video data can be described in frames. A frame is the smallest visual unit of a video; each frame is a static image, and combining a temporally continuous sequence of frames forms a dynamic video. In addition, the executing body may be equipped with a device for converting electrical signals into sound, such as a speaker. After obtaining the target audio data, the executing body may turn on the camera to collect video data while converting the target audio data into an analog audio signal and outputting sound through the speaker, thereby playing the target audio data.
In this embodiment, the executing body may play the target audio data in various ways. As an example, the executing body may first instantiate a class for playing audio and video (for example, the MediaPlayer class in the Android multimedia package) to create an object for playing the target audio data, and then transmit the target audio data to that object for playback. In practice, the MediaPlayer class in the Android multimedia package supports playing sound files in multiple formats, such as mp3, aac, and wav. When playing audio data, it first decodes the data into a data stream in PCM encoding format, and then performs digital-to-analog conversion and other processing on that stream.
Generally, a video recording application may be installed on the executing body. The application may support recording soundtrack videos, that is, videos in which audio data is played while video data is collected, the sound in the recorded video being the sound corresponding to that audio data. For example, a user's singing performance is recorded while a song is playing, and the recorded video uses that song as background music. The user may first tap the name of a piece of audio (for example, the name of a song or melody) in the running interface of the video recording application. The executing body then obtains the audio data corresponding to that name and takes it as the target audio data. Afterwards, the user may tap the video recording button in the running interface to trigger a video recording instruction. Upon receiving the instruction, the executing body may turn on the camera to record video while processing the target audio data, converting it into an analog audio signal, and outputting sound through the speaker. The user can perform while listening to the sound, thereby recording a performance video.
In one application scenario, the user may record video continuously. In this case, the executing body may continuously collect video data while playing the target audio data.
In another application scenario, the user may record video in segments. As an example, the first segment is recorded first: the executing body continuously collects video data while playing the target audio data until it detects that the user has triggered a pause-recording instruction (for example, by tapping or releasing the recording button), whereupon it pauses playback of the target audio data and stops collecting video data. After detecting that the user has triggered a resume-recording instruction (for example, by tapping the recording button again), the executing body may resume continuously collecting video data while resuming playback of the target audio data (that is, using the amount of data already played in the first segment as the playback starting point of the second segment), until it detects that the user has triggered the pause-recording instruction again, whereupon it again pauses playback and stops collection, and so on.
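The pause/resume bookkeeping described above can be sketched as follows. The class and method names are hypothetical, and actual playback of the bytes to an audio output is omitted; the point is only that the played-byte counter carries over between segments.

```java
// Hypothetical sketch: tracks how many bytes of the soundtrack have been
// played so that a later segment resumes exactly where the previous stopped.
public class SegmentedSoundtrack {
    private final byte[] pcm;      // target audio data, PCM stream
    private int playedBytes = 0;   // total bytes played across all segments

    public SegmentedSoundtrack(byte[] pcm) { this.pcm = pcm; }

    // Simulate playing `n` bytes of the soundtrack during one segment.
    public void playSegment(int n) {
        playedBytes = Math.min(playedBytes + n, pcm.length);
    }

    // Offset at which the next segment starts playback.
    public int resumeOffset() { return playedBytes; }

    public static void main(String[] args) {
        SegmentedSoundtrack s = new SegmentedSoundtrack(new byte[44100]);
        s.playSegment(10000);                 // first segment, then pause
        System.out.println(s.resumeOffset()); // 10000
        s.playSegment(5000);                  // second segment continues
        System.out.println(s.resumeOffset()); // 15000
    }
}
```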
In step 202, the amount of target audio data that has been played when a frame of the video data is collected is determined, and the playback duration corresponding to that amount is determined as the timestamp of the frame of the video data.
In this embodiment, when collecting each frame of the video data, the executing body may record the collection time of that frame. The collection time of each frame may be the system timestamp (for example, a Unix timestamp) at which the frame was collected. It should be noted that a timestamp is complete, verifiable data indicating that a piece of data existed at a specific moment; usually, it is a character sequence that uniquely identifies a moment in time. Here, the executing body may take the collection time of the first frame of the video data as the start time of the video data.
For a frame of the video data, the executing body may read the collection time of the frame, then determine the amount of target audio data that had been played by that collection time, and finally determine the playback duration corresponding to that amount as the timestamp of the frame. Various approaches may be used to determine the amount of target audio data played by a given collection time. As an example, after instantiating a preset class for playing audio and video (for example, the MediaPlayer class in the Android multimedia package) and transmitting the target audio data to the created object, the amount of target audio data already transmitted to the object by each frame's collection time may be determined and taken as the amount of target audio data played when that frame was collected.
Here, since the target audio data is obtained by sampling and quantizing the sound signal at a set sampling rate and a set sample size, and the number of channels used to play the target audio data is predetermined, the playback duration of the target audio data at the time a frame was collected can be calculated from the amount of target audio data played by that frame's collection time together with the sampling rate, the sample size, and the number of channels. The executing body may determine this playback duration as the timestamp of the frame. In practice, the sampling rate, also called the sampling speed or sampling frequency, is the number of samples extracted per second from the continuous signal to form the discrete signal, and may be expressed in hertz (Hz); the sample size may be expressed in bits. The playback duration is determined as follows: first, the product of the sampling rate, the sample size, and the number of channels is computed; then, the ratio of the amount of played target audio data to this product is taken as the playback duration of the target audio data.
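The playback-duration computation described above can be sketched as follows. The units are an assumption for the example: the played data amount is counted in bytes and the sample size is given in bits, so the divisor is the PCM byte rate.

```java
public class PlaybackClock {
    // Playback duration in milliseconds for `playedBytes` bytes of PCM audio:
    // the ratio of the played data amount to (rate x sample size x channels).
    static long playbackMillis(long playedBytes, int sampleRateHz,
                               int sampleSizeBits, int channels) {
        long bytesPerSecond = (long) sampleRateHz * (sampleSizeBits / 8) * channels;
        return playedBytes * 1000 / bytesPerSecond;
    }

    public static void main(String[] args) {
        // 44.1 kHz, 16-bit, stereo -> 176400 bytes per second.
        System.out.println(playbackMillis(176400, 44100, 16, 2)); // 1000
        System.out.println(playbackMillis(88200, 44100, 16, 2));  // 500
    }
}
```

The resulting duration is what the executing body would assign as the frame's timestamp.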
In some implementations of this embodiment, the target audio data may be a data stream in PCM encoding format, and the executing body may play it as follows: first, a target class (for example, the AudioTrack class in the Android development kit) is instantiated to create a target object for playing the target audio data, where the target class is capable of playing data streams in PCM encoding format; then, the target audio data is transmitted to the target object by streaming, so that the target object plays the target audio data.
In practice, AudioTrack in the Android development kit is a class that manages and plays a single audio resource, and is used for playback of PCM audio streams. Generally, audio data is played by pushing it to an object instantiated from AudioTrack. An AudioTrack object can operate in two modes: static mode and streaming mode. In streaming mode, a continuous data stream in PCM encoding format is written to the AudioTrack object by calling its write method. In the above implementation, the target audio data may be written in streaming mode.
In some implementations of this embodiment, if the target audio data is played in the above manner (that is, by instantiating a target class such as the AudioTrack class in the Android development kit), the amount of played target audio data may be determined as follows: for a frame of the video data, the amount of target audio data that had been transmitted to the target object by the time the frame was collected is determined, and that amount is taken as the amount of target audio data played when the frame was collected.
In some implementations of this embodiment, the target audio data may be a data stream in PCM encoding format, and the executing body may play it as follows: first, a preset audio processing component that supports audio playback (for example, the OpenSL ES component in the Android development kit) is invoked, where the component supports configuring a buffer and a callback function, the callback function returning, each time the audio data in the buffer has been processed (for example, read or played), the amount of audio data processed that time; then, the target audio data is transmitted to the audio processing component so that the component plays the target audio data.
In some implementations of this embodiment, with the above implementation (that is, playing the target audio data through a preset audio processing component that supports audio playback), each time the target audio data in the buffer has been processed by the component, the callback function returns the amount of target audio data processed that time. Therefore, for a frame of the video data, the executing body may determine the sum of the data amounts returned by the callback function by the time the frame was collected, and take that sum as the amount of target audio data played when the frame was collected.
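The accumulation of callback-returned amounts can be modeled without the native OpenSL ES API; in this sketch the buffer-completion callback is a plain method, and the buffer sizes are arbitrary example values.

```java
public class CallbackCounter {
    private long playedBytes = 0;

    // Stand-in for the buffer-completion callback: invoked once per
    // processed buffer with the number of bytes just played.
    public void onBufferProcessed(int bytes) { playedBytes += bytes; }

    // Sum of all callback-returned amounts, i.e. the played data amount
    // read off at the moment a video frame is captured.
    public long playedBytes() { return playedBytes; }

    public static void main(String[] args) {
        CallbackCounter c = new CallbackCounter();
        c.onBufferProcessed(5880);  // two completed buffers of 5880 bytes each
        c.onBufferProcessed(5880);
        System.out.println(c.playedBytes()); // 11760
    }
}
```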
In some implementations of this embodiment, a technician may set the size of the buffer of the audio processing component to a target value in advance, where the target value may be less than or equal to the size of the audio data corresponding to the preset interval duration between two adjacent frames of video data (for example, 33 ms). Here, the preset interval duration may be the reciprocal of the preset frame rate, in frames per second (FPS), at which video data is collected. In practice, the frame rate is the number of frames collected per second, and its unit may be fps or hertz (Hz). As an example, at a frame rate of 30 fps, the preset interval between two adjacent frames is about 33 ms.
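Under these settings, the target value follows directly from the frame rate and the audio parameters. The following sketch computes the largest buffer size not exceeding one inter-frame interval of audio; the parameter values are example assumptions, and real OpenSL ES buffer sizes are fixed when the buffer queue is created.

```java
public class BufferSizing {
    // Largest buffer (in bytes) not exceeding the audio played during
    // one inter-frame interval of 1/fps seconds.
    static int targetBufferBytes(int fps, int sampleRateHz,
                                 int sampleSizeBits, int channels) {
        int bytesPerSecond = sampleRateHz * (sampleSizeBits / 8) * channels;
        return bytesPerSecond / fps;
    }

    public static void main(String[] args) {
        // 30 fps video, 44.1 kHz 16-bit stereo audio.
        System.out.println(targetBufferBytes(30, 44100, 16, 2)); // 5880
    }
}
```

A smaller buffer means the callback fires at least once per frame interval, so the played-byte counter is never stale by more than one frame.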
In this way, this implementation can more accurately determine the amount of target audio data played at a given moment, which improves the accuracy of the determined timestamps of the frames of the video data.
In step 203, the video data containing the timestamps and the played portion of the target audio data are stored.
In this embodiment, the executing body may store the played portion of the target audio data together with the video data containing the timestamps. Here, the played portion of the target audio data and the video data containing the timestamps may be stored in two separate files, with a mapping established between the two files; alternatively, they may be stored in the same file.
In some implementations of this embodiment, the executing body may first determine the amount of target audio data that has been played when the stop-recording instruction is triggered (for example, after the user taps the stop-recording button), then extract the data corresponding to that played amount, and finally store the video data containing the timestamps together with the extracted data.
In some implementations of this embodiment, the executing body may first obtain the target audio data interval from the target audio data played by the time the tail frame of the video data was collected, and extract that interval. For example, the executing body may first obtain the collection time of the tail frame of the collected video data, then take as the target audio data interval the interval corresponding to the target audio data played from the start of collection up to that collection time, and extract the target audio data interval. After extracting the target audio data interval, the executing body may store the video data containing the timestamps corresponding to all the frames of the video data together with the target audio data interval. In this way, the amount of target audio data played when video recording stops can be determined more accurately, improving the audio-video synchronization at the moment recording stops.
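The extraction step can be sketched as a simple slice of the PCM stream, assuming the played amount at the tail frame is tracked in bytes:

```java
import java.util.Arrays;

public class AudioInterval {
    // Returns the target audio data interval: the portion of the PCM
    // stream played between the start of collection and the tail frame.
    static byte[] extractInterval(byte[] pcm, int playedBytesAtTailFrame) {
        int end = Math.min(playedBytesAtTailFrame, pcm.length);
        return Arrays.copyOfRange(pcm, 0, end);
    }

    public static void main(String[] args) {
        byte[] pcm = new byte[100];
        System.out.println(extractInterval(pcm, 40).length);  // 40
        System.out.println(extractInterval(pcm, 150).length); // 100 (clamped)
    }
}
```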
In some implementations of this embodiment, the executing body may first encode the video data containing the timestamps, and then store the target audio data interval and the encoded video data in the same file. In practice, video encoding may refer to converting a file in one video format into a file in another video format through a specific compression technique. It should be noted that video encoding is a well-known technique that is widely studied and applied at present, and is not described in detail here.
In some implementations of this embodiment, after storing the target audio data interval and the video data containing the timestamps, the executing body may further upload the stored data to a server (for example, the server 105 shown in FIG. 1).
With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for processing data according to this embodiment. In the application scenario of FIG. 3, a user holds a terminal device 301 and records a soundtrack video. A short-video recording application runs on the terminal device 301. The user first selects a soundtrack (for example, the song "Little Apple") in the interface of the application, whereupon the terminal device 301 obtains the target audio file 302 corresponding to that soundtrack. After the user taps the recording button, the terminal device 301 turns on the camera to collect video data 303 while playing the target audio file 302. For each frame of the video data, the terminal device 301 may determine the amount of target audio data played when the frame was collected, and determine the playback duration corresponding to that amount as the timestamp of the frame. Finally, the terminal device 301 may store the video data containing the timestamps and the played portion of the target audio data in a file 304.
In the method provided by the above embodiments of the present disclosure, video data is collected while target audio data is played; then, for each frame of the video data, the amount of target audio data played when the frame was collected is determined, and the playback duration corresponding to that amount is determined as the timestamp of the frame; finally, the video data containing the timestamps and the played portion of the target audio data are stored. Thus, when a frame is collected, its timestamp can be determined from the amount of target audio data played at the moment of collection; that is, the timestamps of the frames of the video data are determined based on the data amount of the target audio data rather than on a fixed time interval. When video data collection is unstable (for example, when frames are dropped due to device overheating or insufficient performance), the interval between two adjacent frames of the video data is not fixed, and determining frame timestamps by a fixed time interval is inaccurate. The method provided by the above embodiments of the present disclosure avoids the inaccurate timestamps caused by computing frame timestamps at fixed time intervals under unstable video data collection, improves the accuracy of the determined timestamps of the frames of the video data, and improves the audio-video synchronization of the recorded soundtrack video.
Referring to FIG. 4, a flow 400 of still another embodiment of the method for processing data is shown. The flow 400 of the method for processing data includes steps 401 to 406.
In step 401, it is determined whether target audio data is stored locally.
In this embodiment, the executing body of the method for processing data (for example, the terminal device 101, 102, or 103 shown in FIG. 1) may determine whether the target audio data is stored locally. Here, the target audio data may be a data stream in PCM encoding format.
In step 402, if the target audio data is not stored locally, a request for obtaining the target audio data is sent to a server, and the target audio data returned by the server is received.
In this embodiment, in response to determining that the target audio data is not stored locally, the executing body may send a request for obtaining the target audio data to a server (for example, the server 105 shown in FIG. 1) through a wired or wireless connection, and then receive the target audio data returned by the server.
It should be pointed out that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, an Ultra Wideband (UWB) connection, and other wireless connections now known or developed in the future.
It should be noted that, if the target audio data returned by the server is not a data stream in PCM encoding format, the executing body may convert it into a data stream in PCM encoding format.
In step 403, video data is collected, and a preset audio processing component that supports audio playback is invoked; the target audio data is transmitted to the audio processing component so that the component plays the target audio data.
In this embodiment, the executing body may use its installed camera to collect video data while playing the target audio data. Here, the target audio data may be played as follows:
First, a preset audio processing component that supports audio playback (for example, the OpenSL ES component in the Android development kit) is invoked. The component supports configuring a buffer and a callback function, where the callback function returns, each time the audio data in the buffer has been processed by the component, the amount of audio data processed that time.
Then, the target audio data is transmitted to the audio processing component so that the component plays the target audio data.
In step 404, the sum of the data amounts returned by the callback function by the time a frame of the video data is collected is determined; this sum is determined as the amount of target audio data played when the frame was collected, and the playback duration corresponding to that amount is determined as the timestamp of the frame of the video data.
In this embodiment, each time the target audio data in the buffer has been processed by the audio processing component, the callback function returns the amount of target audio data processed that time. Therefore, for a frame of the video data, the executing body may determine the sum of the data amounts returned by the callback function by the time the frame was collected, and take that sum as the amount of target audio data played when the frame was collected. The executing body may then determine the playback duration corresponding to that played amount as follows: first, the product of the sampling rate, the sample size, and the number of channels is computed; then, the ratio of the played amount of target audio data to that product is taken as the playback duration of the target audio data.
In practice, a technician may set the size of the buffer of the audio processing component to a target value in advance, where the target value may be less than or equal to the size of the audio data corresponding to the preset interval duration between two adjacent frames of video data (for example, 33 ms). In this way, the amount of target audio data played at a given moment can be determined more accurately, improving the accuracy of the determined timestamps of the frames of the video data.
In step 405, the target audio data interval is obtained from the target audio data played by the time the tail frame of the video data was collected, and the target audio data interval is extracted.
In this embodiment, the executing body may first obtain the collection time of the tail frame of the collected video data, then take as the target audio data interval the interval corresponding to the target audio data played from the start of collection up to that collection time, and extract the target audio data interval.
In step 406, the video data containing the timestamps corresponding to all the frames of the video data and the target audio data interval are stored.
In this embodiment, the executing body may store the target audio data interval together with the video data containing the timestamps. The two may be stored in two separate files, with a mapping established between the two files; alternatively, they may be stored in the same file.
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the method for processing data in this embodiment embodies the step of playing the target audio data through a preset audio processing component that supports audio playback, and the step of determining, based on the callback function, the amount of target audio data played at each frame's collection time. Thus, when a frame of video data is collected, since the callback function of the audio processing component in the solution described in this embodiment returns the data amount each time the data in the buffer has been processed by the component, the executing body can calculate the played amount directly from the data amounts returned by the callback function. Therefore, compared with taking the amount of audio data transmitted as the played amount, the solution described in this embodiment can determine the played amount of the target audio data at each frame's collection time more accurately, which further improves the accuracy of the determined timestamps of the frames of the video data and further improves the audio-video synchronization of the recorded soundtrack video.
Referring to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for processing data. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied to various electronic devices.
As shown in FIG. 5, the apparatus 500 for processing data in this embodiment includes: a collection unit 501 configured to collect video data and play target audio data; a first determination unit 502 configured to determine, for a frame in the video data, the amount of target audio data already played when the frame was captured, and to determine the playback duration corresponding to that amount as the timestamp of the frame; and a storage unit 503 configured to store the video data containing the timestamps and the played portion of the target audio data.
In some implementations of this embodiment, the storage unit 503 may include an extraction module and a storage module (not shown in the figure). The extraction module may be configured to obtain a target audio data interval from the target audio data already played when the last frame of the video data was captured, and to extract the target audio data interval. The storage module may be configured to store the video data containing the timestamps corresponding to all frames of the video data, together with the target audio data interval.
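One possible reading of the extraction module's job, sketched under the assumption that the target audio data is a PCM byte buffer and that the interval is simply the prefix played by the time the last frame was captured (the function name and clamping behavior are assumptions, not taken from the disclosure):

```python
def extract_played_interval(target_audio: bytes, played_at_tail_frame: int) -> bytes:
    """Return the target audio data interval, i.e. the portion of the audio
    that had been played when the last video frame was captured."""
    if played_at_tail_frame < 0:
        raise ValueError("played amount cannot be negative")
    # Clamp in case slightly more was reported than the clip contains.
    end = min(played_at_tail_frame, len(target_audio))
    return target_audio[:end]
```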
In some implementations of this embodiment, the apparatus may further include a second determination unit (not shown in the figure), configured to determine whether the target audio data is stored locally, and a sending unit, configured to, when the target audio data is not stored locally, send a request for the target audio data to a server and receive the target audio data returned by the server.
In some implementations of this embodiment, the target audio data is a data stream in a pulse-code modulation (PCM) coding format, and the collection unit 501 may include an object creation module and a first transmission module (not shown in the figure). The object creation module may be configured to instantiate a target class to create a target object for playing the target audio data, where the target class is used to play PCM data streams. The first transmission module may be configured to transmit the target audio data to the target object in a streaming manner, so that the target object plays the target audio data.
In some implementations of this embodiment, the first determination unit 502 may be configured to: for a frame of the video data, determine the amount of target audio data already transmitted to the target object when the frame was captured, and take that amount as the amount of target audio data already played when the frame was captured.
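One way to picture this streaming variant (a hypothetical sketch; the class and method names are invented): keep a running count of the bytes written to the playback object, and snapshot that counter whenever a video frame arrives.

```python
class TransmittedAmountTracker:
    """Uses the amount of audio data already transmitted to the playback
    object as a proxy for the amount already played."""

    def __init__(self) -> None:
        self.transmitted = 0  # total bytes streamed to the target object

    def on_audio_chunk_sent(self, chunk: bytes) -> None:
        # Called each time a chunk is written to the playback object.
        self.transmitted += len(chunk)

    def on_video_frame_captured(self) -> int:
        # The transmitted amount is taken as the played amount for this frame.
        return self.transmitted
```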
In some implementations of this embodiment, the target audio data is a data stream in a pulse-code modulation coding format, and the collection unit may include a calling module and a second transmission module (not shown in the figure). The calling module may be configured to call a preset audio processing component that supports audio playback, where the audio processing component supports configuring a buffer and a callback function, and the callback function returns the amount of audio data processed in that pass each time the audio data in the buffer has been processed by the component. The second transmission module may be configured to transmit the target audio data to the audio processing component, so that the component plays the target audio data.
In some implementations of this embodiment, the first determination unit may be configured to: for a frame of the video data, determine the sum of the data amounts returned by the callback function by the time the frame was captured, and take that sum as the amount of target audio data already played when the frame was captured.
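The callback mechanism can be pictured with a toy simulation (all names here are invented, and the component is simulated in pure Python since the disclosure does not name a specific audio library): the component "plays" data one buffer at a time and invokes the callback with each processed amount, so the sum of those amounts at frame-capture time is the played amount.

```python
class SimulatedAudioComponent:
    """Toy stand-in for the preset audio component: it processes data one
    buffer at a time and invokes the callback with each processed amount."""

    def __init__(self, buffer_size: int, callback) -> None:
        self.buffer_size = buffer_size
        self.callback = callback

    def play(self, pcm: bytes) -> None:
        for i in range(0, len(pcm), self.buffer_size):
            chunk = pcm[i:i + self.buffer_size]
            # The callback fires after each buffer's worth of data is processed.
            self.callback(len(chunk))

played = []
component = SimulatedAudioComponent(4096, played.append)
component.play(b"\x00" * 10000)
# `played` now holds the per-buffer amounts; sum(played) is the played total.
```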
In some implementations of this embodiment, the size of the buffer of the audio processing component may be a preset target value, where the target value is less than or equal to the amount of audio data corresponding to the preset interval between two adjacent frames of the video data.
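This buffer-size constraint can be computed directly from the frame rate: at 30 fps the interval between adjacent frames is 1/30 s, so the buffer should hold at most that interval's worth of audio. A sketch, with the 44.1 kHz / stereo / 16-bit audio parameters assumed:

```python
def max_buffer_size(fps: int,
                    sample_rate: int = 44100,
                    channels: int = 2,
                    bytes_per_sample: int = 2) -> int:
    """Largest buffer size (in bytes) that does not exceed the audio data
    corresponding to the interval between two adjacent video frames."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    # Integer floor division keeps the result <= one frame interval of audio.
    return bytes_per_second // fps
```

For 16-bit stereo at 44.1 kHz (176,400 bytes/s), a 30 fps recording gives a target value of at most 5,880 bytes; keeping the buffer this small bounds the error between the callback sum and the true playback position to less than one frame interval.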
In some implementations of this embodiment, the storage module may include an encoding submodule and a storage submodule (not shown in the figure). The encoding submodule may be configured to encode the timestamped video data. The storage submodule may be configured to store the encoded video data and the target audio data interval in the same file.
In the apparatus provided by the above embodiment of the present disclosure, the collection unit 501 collects video data and plays target audio data; the first determination unit 502 then determines, for each frame in the video data, the amount of target audio data already played when the frame was captured, and determines the playback duration corresponding to that amount as the timestamp of the frame; finally, the storage unit 503 stores the video data containing the timestamps and the played portion of the target audio data. Thus, when a frame is captured, its timestamp can be determined from the playback amount of the target audio data at the capture moment, which improves the audio-video synchronization of the recorded soundtrack video.
Referring now to FIG. 6, a schematic structural diagram of a computer system 600 suitable for implementing a terminal device according to an embodiment of the present disclosure is shown. The terminal device shown in FIG. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a touch screen, a touch panel, and the like; an output portion 607 including a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a local area network (LAN) card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.
According to an embodiment of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present disclosure are performed. It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, radio frequency (RF), or any suitable combination of the above.
The flowcharts and block diagrams in the figures illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including a collection unit, a first determination unit, an extraction unit, and a storage unit. The names of these units do not in some cases limit the units themselves; for example, the collection unit may also be described as "a unit that collects video data and plays target audio data".
As another aspect, the present disclosure also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: collect video data and play target audio data; for a frame in the video data, determine the amount of target audio data already played when the frame was captured, and determine the playback duration corresponding to that amount as the timestamp of the frame; and store the video data containing the timestamps and the played portion of the target audio data.

Claims (18)

  1. A method for processing data, comprising:
    collecting video data and playing target audio data;
    determining an amount of the target audio data already played when a frame in the video data is captured, and determining a playback duration corresponding to the amount as a timestamp of the frame in the video data; and
    storing the video data containing the timestamps and the played data in the target audio data.
  2. The method according to claim 1, wherein the storing the video data containing the timestamps and the played data in the target audio data comprises:
    obtaining a target audio data interval from the target audio data already played when a last frame of the video data is captured, and extracting the target audio data interval; and
    storing the video data containing the timestamps corresponding to all frames in the video data, and the target audio data interval.
  3. The method according to claim 1, further comprising, before the collecting video data and playing target audio data:
    determining whether the target audio data is stored locally; and
    in a case where the target audio data is not stored locally, sending a request for acquiring the target audio data to a server, and receiving the target audio data returned by the server.
  4. The method according to claim 1, wherein the target audio data is a data stream in a pulse-code modulation coding format; and
    the playing target audio data comprises:
    creating a target object for playing the target audio data; and
    transmitting the target audio data to the target object in a streaming manner, so as to play the target audio data with the target object.
  5. The method according to claim 4, wherein the determining an amount of the target audio data already played when a frame in the video data is captured comprises:
    determining an amount of the target audio data already transmitted to the target object when the frame in the video data is captured, and determining that amount as the amount of the target audio data already played when the frame in the video data is captured.
  6. The method according to claim 1, wherein the target audio data is a data stream in a pulse-code modulation coding format; and
    the playing target audio data comprises:
    calling a preset audio processing component that supports audio playback, wherein the audio processing component supports configuring a buffer and a callback function, and the callback function returns the amount of processed audio data each time the audio data in the buffer has been processed by the audio processing component; and
    transmitting the target audio data to the audio processing component, so as to play the target audio data with the audio processing component.
  7. The method according to claim 6, wherein the determining an amount of the target audio data already played when a frame in the video data is captured comprises:
    determining a sum of the data amounts returned by the callback function when the frame in the video data is captured, and determining that sum as the amount of the target audio data already played when the frame in the video data is captured.
  8. The method according to claim 6 or 7, wherein a size of the buffer of the audio processing component is a preset target value, the target value being less than or equal to the amount of audio data corresponding to a preset interval between two adjacent frames of the video data.
  9. An apparatus for processing data, comprising:
    a collection unit configured to collect video data and play target audio data;
    a first determination unit configured to determine an amount of the target audio data already played when a frame in the video data is captured, and to determine a playback duration corresponding to the amount as a timestamp of the frame in the video data; and
    a storage unit configured to store the video data containing the timestamps and the played data in the target audio data.
  10. The apparatus according to claim 9, wherein the storage unit comprises:
    an extraction module configured to obtain a target audio data interval from the target audio data already played when a last frame of the video data is captured, and to extract the target audio data interval; and
    a storage module configured to store the video data containing the timestamps corresponding to all frames in the video data, and the target audio data interval.
  11. The apparatus according to claim 9, further comprising:
    a second determination unit configured to determine whether the target audio data is stored locally; and
    a sending unit configured to, in a case where the target audio data is not stored locally, send a request for acquiring the target audio data to a server, and receive the target audio data returned by the server.
  12. The apparatus according to claim 9, wherein the target audio data is a data stream in a pulse-code modulation coding format; and
    the collection unit comprises:
    an object creation module configured to create a target object for playing the target audio data; and
    a first transmission module configured to transmit the target audio data to the target object in a streaming manner, so as to play the target audio data with the target object.
  13. The apparatus according to claim 12, wherein the first determination unit is configured to:
    determine an amount of the target audio data already transmitted to the target object when the frame in the video data is captured, and determine that amount as the amount of the target audio data already played when the frame in the video data is captured.
  14. The apparatus according to claim 9, wherein the target audio data is a data stream in a pulse-code modulation coding format; and
    the collection unit comprises:
    a calling module configured to call a preset audio processing component that supports audio playback, wherein the audio processing component supports configuring a buffer and a callback function, and the callback function returns the amount of processed audio data each time the audio data in the buffer has been processed by the audio processing component; and
    a second transmission module configured to transmit the target audio data to the audio processing component, so as to play the target audio data with the audio processing component.
  15. The apparatus according to claim 14, wherein the first determination unit is configured to:
    determine a sum of the data amounts returned by the callback function when the frame in the video data is captured, and determine that sum as the amount of the target audio data already played when the frame in the video data is captured.
  16. The apparatus according to claim 14 or 15, wherein a size of the buffer of the audio processing component is a preset target value, the target value being less than or equal to the amount of audio data corresponding to a preset interval between two adjacent frames of the video data.
  17. A terminal device, comprising:
    at least one processor; and
    a storage device having at least one program stored thereon,
    wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1-8.
  18. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
PCT/CN2019/098510 2018-08-01 2019-07-31 Method and apparatus for processing data WO2020024962A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810866740.1A CN109600650B (en) 2018-08-01 2018-08-01 Method and apparatus for processing data
CN201810866740.1 2018-08-01

Publications (1)

Publication Number Publication Date
WO2020024962A1 true WO2020024962A1 (en) 2020-02-06

Family

ID=65956557

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/098510 WO2020024962A1 (en) 2018-08-01 2019-07-31 Method and apparatus for processing data

Country Status (2)

Country Link
CN (1) CN109600650B (en)
WO (1) WO2020024962A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109600650B (en) * 2018-08-01 2020-06-19 北京微播视界科技有限公司 Method and apparatus for processing data
CN110225279B (en) * 2019-07-15 2022-08-16 北京小糖科技有限责任公司 Video production system and video production method of mobile terminal
CN110418183B (en) * 2019-08-05 2022-11-15 北京字节跳动网络技术有限公司 Audio and video synchronization method and device, electronic equipment and readable medium
CN112764709B (en) * 2021-01-07 2021-09-21 北京创世云科技股份有限公司 Sound card data processing method and device and electronic equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
EP2955713A1 (en) * 2014-06-12 2015-12-16 Huawei Technologies Co., Ltd. Synchronous audio playback method, apparatus and system
CN107613357A (en) * 2017-09-13 2018-01-19 广州酷狗计算机科技有限公司 Sound picture Synchronous fluorimetry method, apparatus and readable storage medium storing program for executing
CN108063970A (en) * 2017-11-22 2018-05-22 北京奇艺世纪科技有限公司 A kind of method and apparatus for handling live TV stream
CN109600650A (en) * 2018-08-01 2019-04-09 北京微播视界科技有限公司 Method and apparatus for handling data

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
JP2003169292A (en) * 2001-11-30 2003-06-13 Victor Co Of Japan Ltd After-recording device, computer program, recording medium, transmission method and reproducing device
CN102348086A (en) * 2010-08-03 2012-02-08 中兴通讯股份有限公司 Method and mobile terminal for loading background sounds in video recording process
CN105933724A (en) * 2016-05-23 2016-09-07 福建星网视易信息系统有限公司 Video producing method, device and system
CN106060424A (en) * 2016-06-14 2016-10-26 徐文波 Video dubbing method and device
CN107786876A (en) * 2017-09-21 2018-03-09 北京达佳互联信息技术有限公司 The synchronous method of music and video, device and mobile terminal
CN108111903A (en) * 2018-01-17 2018-06-01 广东欧珀移动通信有限公司 Record screen document play-back method, device and terminal

Also Published As

Publication number Publication date
CN109600650A (en) 2019-04-09
CN109600650B (en) 2020-06-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19843617

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31.05.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19843617

Country of ref document: EP

Kind code of ref document: A1