CN113784073A - Method, device and related medium for synchronizing sound and picture of sound recording and video recording - Google Patents

Method, device and related medium for synchronizing sound and picture of sound recording and video recording

Info

Publication number
CN113784073A
CN113784073A
Authority
CN
China
Prior art keywords
audio
data
video
recording
audio data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111141585.5A
Other languages
Chinese (zh)
Inventor
朱炳科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Wondershare Software Co Ltd
Original Assignee
Shenzhen Wondershare Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Wondershare Software Co Ltd filed Critical Shenzhen Wondershare Software Co Ltd
Priority to CN202111141585.5A priority Critical patent/CN113784073A/en
Publication of CN113784073A publication Critical patent/CN113784073A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4331Caching operations, e.g. of an advertisement for later insertion during playback
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4334Recording operations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • H04N5/91Television signal processing therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The invention discloses a method, a device and a related medium for synchronizing the sound and picture of a sound recording and video recording, wherein the method comprises the following steps: starting recording and acquiring audio data and video data separately; buffering the audio data into an audio queue and buffering the video data; encoding the audio data in the audio queue, encoding the buffered video data, and writing the encoded audio data and video data into a recording file; and synchronizing the audio data and the video data according to the audio timestamp corresponding to the audio data and the video timestamp corresponding to the video data in the recording file. The invention captures audio data and video data separately, buffers and encodes each independently, and then synchronizes the audio and video through their respective timestamps, so that the sound and picture of the output content remain synchronized during real-time recording even when system resources are insufficient.

Description

Method, device and related medium for synchronizing sound and picture of sound recording and video recording
Technical Field
The invention relates to the technical field of computer software, and in particular to a method, a device and a related medium for synchronizing the sound and picture of a sound recording and video recording.
Background
Sound recording refers to capturing, in real time, the sound output by a computer loudspeaker or the sound input from a microphone, and either storing the captured sound in a file or converting it into a real-time live stream. Screen recording and video recording refer to capturing the computer desktop and camera pictures in real time, and either storing the encoded pictures in a file or broadcasting them live in real time. When hardware performance is strong, a simple implementation has no difficulty keeping the sound and the picture synchronized; when system resources are insufficient, however, ensuring audio-video synchronization becomes more difficult.
At present, when system resources are insufficient, the main way a recording keeps sound and picture synchronized is as follows: the recorded audio and video are buffered together, encoded in sequence later, and then written into a file or broadcast live in real time. This method has clear drawbacks: when system resources are insufficient, a large amount of video is buffered, which occupies more system memory and further degrades recording performance; meanwhile, when recording ends, a large amount of audio and video data has not yet been processed, so the user has to wait a long time for the recording to finish, and the longer the recording, the longer the wait.
Disclosure of Invention
The embodiment of the invention provides a method, a device and a related medium for synchronizing the sound and picture of a sound recording and video recording, aiming to solve the problem that sound and picture fall out of synchronization during recording when system resources are insufficient.
In a first aspect, an embodiment of the present invention provides a method for synchronizing sound and a picture of a sound recording and video recording, including:
starting recording, and respectively acquiring audio data and video data;
buffering the audio data into an audio queue and buffering the video data;
coding the audio data in the audio queue, coding the cached video data, and writing the coded audio data and video data into a recording file;
and synchronizing the audio data and the video data according to the audio time stamp corresponding to the audio data and the video time stamp corresponding to the video data in the recording file.
In a second aspect, an embodiment of the present invention provides an apparatus for synchronizing sound and picture of a sound recording and video recording, including:
the data acquisition unit is used for starting recording and respectively acquiring audio data and video data;
the data buffer unit is used for buffering the audio data into an audio queue and buffering the video data;
the data coding unit is used for coding the audio data in the audio queue, coding the cached video data and writing the coded audio data and video data into a recording file;
and the data synchronization unit is used for synchronizing the audio data and the video data according to the audio time stamp corresponding to the audio data and the video time stamp corresponding to the video data in the recording file.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the sound and picture synchronization method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the sound and picture synchronization method according to the first aspect.
The embodiment of the invention provides a method and a device for synchronizing the sound and picture of a sound recording and video recording, computer equipment and a storage medium, wherein the method comprises the following steps: starting recording and acquiring audio data and video data separately; buffering the audio data into an audio queue and buffering the video data; encoding the audio data in the audio queue, encoding the buffered video data, and writing the encoded audio data and video data into a recording file; and synchronizing the audio data and the video data according to the audio timestamp corresponding to the audio data and the video timestamp corresponding to the video data in the recording file. In the embodiment of the invention, when a file is written or a live broadcast is made, the audio data and the video data are captured separately, buffered separately, and encoded separately into the file multiplexer or the broadcast service, so that the file multiplexer or the broadcast service can synchronize them through their respective timestamps; that is, audio-video sound-picture synchronization is achieved, and the sound and picture of the output content remain synchronized during real-time recording even when system resources are insufficient.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
FIG. 1 is a flowchart illustrating a method for synchronizing sound and video of a sound recording and video recording according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of an apparatus for synchronizing sound and picture of a sound recording and video recording according to an embodiment of the present invention;
fig. 3 is another flowchart illustrating a method for synchronizing sound and video of a sound recording and video recording according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for synchronizing sound and picture of audio and video recording according to an embodiment of the present invention, which specifically includes: steps S101 to S104.
S101, starting recording, and respectively acquiring audio data and video data;
S102, caching the audio data into an audio queue, and caching the video data;
S103, coding the audio data in the audio queue, coding the cached video data, and writing the coded audio data and video data into a recording file;
and S104, synchronizing the audio data and the video data according to the audio time stamp corresponding to the audio data and the video time stamp corresponding to the video data in the recording file.
In this embodiment, when recording is started, for example when a file is to be written or a live broadcast is started, the audio data and the video data are acquired separately; the acquired audio data is buffered into the audio queue, while the video data does not need to be queued and only needs to be stored. The buffered audio queue and the buffered video data are then encoded separately and written into the recording file, such as a file multiplexer or a live-broadcast service, so that the recording file can synchronize sound and picture according to the audio timestamp and the video timestamp corresponding to each of them.
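By way of illustration only, the following minimal C++ sketch shows how a multiplexer-style sink could interleave encoded audio and video packets by their timestamps; the Packet structure and the writePacket() sink are assumptions made for this example and are not part of the disclosed system.

    // Sketch: interleaving encoded audio and video packets by presentation
    // timestamp (PTS) before handing them to a muxer or live-broadcast sink.
    #include <cstdint>
    #include <deque>
    #include <iostream>
    #include <string>

    struct Packet {
        std::string stream;   // "audio" or "video"
        int64_t     ptsMs;    // presentation timestamp in milliseconds
    };

    // Hypothetical sink standing in for a file multiplexer or live service.
    void writePacket(const Packet& p) {
        std::cout << p.stream << " packet at " << p.ptsMs << " ms\n";
    }

    // Pop from whichever stream currently holds the earlier timestamp, so the
    // recording file (or live stream) receives packets in presentation order.
    void muxByTimestamp(std::deque<Packet>& audio, std::deque<Packet>& video) {
        while (!audio.empty() || !video.empty()) {
            bool takeAudio = !audio.empty() &&
                (video.empty() || audio.front().ptsMs <= video.front().ptsMs);
            if (takeAudio) { writePacket(audio.front()); audio.pop_front(); }
            else           { writePacket(video.front()); video.pop_front(); }
        }
    }

    int main() {
        std::deque<Packet> audio{{"audio", 0}, {"audio", 23}, {"audio", 46}};
        std::deque<Packet> video{{"video", 0}, {"video", 40}};
        muxByTimestamp(audio, video);   // audio 0, video 0, audio 23, video 40, audio 46
    }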
In this embodiment, when a file is written or a broadcast is made, the audio data and the video data are captured separately and are separately buffered and encoded into the file multiplexer or broadcast service, so that the file multiplexer or broadcast service can synchronize them through their respective timestamps; that is, audio-video sound-picture synchronization is achieved, and the sound and picture of the output content remain synchronized during real-time recording even when system resources are insufficient.
In one embodiment, the step S101 includes:
respectively setting an audio thread and a video thread;
calling a system API through the audio thread to acquire the audio data;
and calling a system API through the video thread to acquire the video data.
In this embodiment, when acquiring audio data and video data, an audio thread and a video thread are first created and then run in parallel, so that the audio thread and the video thread each call the system API to acquire the audio data and the video data corresponding to them. When audio data and video data are acquired through the system API, acquisition follows the pattern of sending a request → checking the server's response state → acquiring the data; that is, a request to acquire the audio data and video data is first sent to the server, and acquisition of the audio data and video data starts only after the server's response to the request has been received. If no response to the request is received from the server, the request is sent again until the server responds.
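As an illustration of the parallel capture threads described above, the following sketch starts an audio thread and a video thread using only the C++ standard library; captureAudioOnce() and captureVideoOnce() are hypothetical stand-ins for the actual system API calls.

    // Sketch: one audio thread and one video thread running in parallel, each
    // repeatedly calling its own capture routine until recording stops.
    #include <atomic>
    #include <chrono>
    #include <iostream>
    #include <thread>

    std::atomic<bool> recording{true};

    // Placeholders for the system API calls (assumptions for illustration).
    void captureAudioOnce() { std::this_thread::sleep_for(std::chrono::milliseconds(23)); }
    void captureVideoOnce() { std::this_thread::sleep_for(std::chrono::milliseconds(40)); }

    int main() {
        std::thread audioThread([] {
            while (recording) captureAudioOnce();   // e.g. read one audio buffer
        });
        std::thread videoThread([] {
            while (recording) captureVideoOnce();   // e.g. grab one screen/camera frame
        });

        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        recording = false;                           // stop recording
        audioThread.join();
        videoThread.join();
        std::cout << "capture threads stopped\n";
    }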
Further, in an embodiment, the obtaining the audio data by calling a system API through the audio thread includes:
calling a system API through the audio thread to carry out initialization operation on a microphone and a loudspeaker;
setting the size of a system API audio cache, simultaneously acquiring the sampling frequency and the number of sound channels of an audio stream, and storing the sampling frequency and the number of the sound channels of the audio stream into the system API audio cache;
and calling a system start method and starting an external thread to read audio data from the audio cache of the system API.
In this embodiment, during the acquisition of audio data through the audio thread, the audio thread is responsible for acquiring audio data from the loudspeaker or the microphone. Specifically, the system API is called to initialize the microphone and the loudspeaker, the size of the system API audio cache is set, and information such as the sampling frequency and the number of channels of the audio stream is obtained; the system start method is then called, and at the same time a dedicated external thread is started, which reads audio data from the system API audio cache.
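The following sketch illustrates this initialization sequence under stated assumptions: AudioDevice and its methods are hypothetical wrappers around the platform audio API, and the buffer size and audio format shown are placeholder values rather than values taken from the disclosure.

    // Sketch of the audio-thread initialization sequence: initialize the device,
    // set the size of the system API audio cache, query the stream format, call
    // start, and launch a dedicated external thread that reads audio data.
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <thread>
    #include <vector>

    struct AudioFormat { int sampleRate; int channels; };

    class AudioDevice {                      // hypothetical system-API wrapper
    public:
        void initialize()                 {}                       // open mic/speaker
        void setBufferSize(size_t bytes)  { bufferSize_ = bytes; } // system API audio cache
        AudioFormat queryFormat() const   { return {48000, 2}; }   // placeholder format
        void start()                      {}                       // system "start" method
        std::vector<uint8_t> readBuffer() {                        // blocking read
            return std::vector<uint8_t>(bufferSize_, 0);
        }
    private:
        size_t bufferSize_ = 0;
    };

    int main() {
        AudioDevice dev;
        dev.initialize();                       // init microphone and loudspeaker
        dev.setBufferSize(4096);                // size of the system API audio cache
        AudioFormat fmt = dev.queryFormat();    // sampling frequency and channel count
        dev.start();

        // Dedicated external thread that keeps reading audio data from the cache.
        std::thread reader([&dev] {
            for (int i = 0; i < 3; ++i)         // bounded loop for the sketch
                std::cout << "read " << dev.readBuffer().size() << " bytes of audio\n";
        });
        reader.join();
        std::cout << fmt.sampleRate << " Hz, " << fmt.channels << " channels\n";
    }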
In one embodiment, the step S102 includes:
reading audio data and a corresponding first system clock from a system API audio cache;
converting the first system clock to audio timestamps;
and adding the audio data and the corresponding audio time stamp to the tail part of the audio queue.
In this embodiment, the audio data read from the system API audio cache carries a system clock, namely the first system clock, and this first system clock is converted into the corresponding audio timestamp (PTS). In addition, because the amount of audio data obtained in each read from the system API audio cache is not fixed, the acquired audio data is added, together with its timestamp, to the tail of the audio buffer queue once the read audio data reaches a preset fixed size.
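A minimal sketch of this step is given below. The 4096-byte chunk size and the steady-clock origin at the start of recording are illustrative assumptions: each read from the system API audio cache is accumulated together with the system clock observed for its first bytes, and once a full chunk has been collected it is stamped with the derived PTS and appended to the tail of the queue.

    // Sketch: convert the first system clock into an audio PTS (milliseconds
    // since recording start) and append fixed-size chunks to the queue tail.
    #include <chrono>
    #include <cstddef>
    #include <cstdint>
    #include <deque>
    #include <utility>
    #include <vector>

    using Clock = std::chrono::steady_clock;

    struct AudioChunk {
        std::vector<uint8_t> samples;
        int64_t              ptsMs;   // audio timestamp derived from the system clock
    };

    class AudioQueue {
    public:
        explicit AudioQueue(Clock::time_point recordStart) : start_(recordStart) {}

        // Called with each buffer read from the system API audio cache and the
        // system clock value observed at read time.
        void push(const std::vector<uint8_t>& data, Clock::time_point systemClock) {
            if (pending_.empty())
                pendingClock_ = systemClock;           // clock of the chunk's first byte
            pending_.insert(pending_.end(), data.begin(), data.end());
            if (pending_.size() >= kChunkBytes) {      // preset fixed size reached
                int64_t pts = std::chrono::duration_cast<std::chrono::milliseconds>(
                                  pendingClock_ - start_).count();
                queue_.push_back({std::move(pending_), pts});   // add to the tail
                pending_.clear();
            }
        }

        std::deque<AudioChunk>& chunks() { return queue_; }

    private:
        static constexpr size_t kChunkBytes = 4096;   // assumed chunk size
        Clock::time_point start_;
        Clock::time_point pendingClock_{};
        std::vector<uint8_t> pending_;
        std::deque<AudioChunk> queue_;
    };

    int main() {
        AudioQueue q(Clock::now());
        q.push(std::vector<uint8_t>(4096, 0), Clock::now());
        return q.chunks().size() == 1 ? 0 : 1;        // one chunk queued
    }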
In an embodiment, the step S102 further includes:
when a user pauses the recording process, audio data acquired during the pause period is discarded as invalid data in real time;
and after the user restarts recording, adding the acquired audio data into the audio queue as valid audio data.
In this embodiment, the audio must remain continuous, and the user may pause during the recording process. Since the audio captured during a pause is not needed in the recording result, audio data (i.e., invalid data) continues to be acquired while recording is paused, but it is discarded in real time so as not to add to the load on system resources. After the user resumes recording, no audio data is discarded in the non-paused state (i.e., during recording); that is, all valid audio data is added to the audio queue. It should be noted that, although this embodiment may also buffer multiple frames of audio data when the system hardware performance is insufficient, the data volume of the audio stream is much smaller than that of the video data, so this buffering approach places no pressure on the system.
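A small sketch of this pause behaviour follows; the atomic pause flag and the plain deque standing in for the audio queue are implementation choices made only for illustration.

    // Sketch: audio captured while recording is paused is discarded immediately
    // instead of being buffered; audio captured while recording is queued.
    #include <atomic>
    #include <cstdint>
    #include <deque>
    #include <iostream>
    #include <utility>
    #include <vector>

    std::atomic<bool> paused{false};
    std::deque<std::vector<uint8_t>> audioQueue;     // stands in for the audio queue

    void onAudioCaptured(std::vector<uint8_t> data) {
        if (paused) return;                          // invalid data during pause: drop it
        audioQueue.push_back(std::move(data));       // valid data: append to the queue
    }

    int main() {
        onAudioCaptured(std::vector<uint8_t>(1024, 0));   // recording: kept
        paused = true;
        onAudioCaptured(std::vector<uint8_t>(1024, 0));   // paused: dropped
        paused = false;
        onAudioCaptured(std::vector<uint8_t>(1024, 0));   // resumed: kept
        std::cout << "queued chunks: " << audioQueue.size() << "\n";   // prints 2
    }

Dropping paused-period audio at the capture callback, rather than filtering it out later, keeps the queue from growing during a pause and so avoids the extra memory load mentioned above.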
In one embodiment, the step S103 includes:
and reading the audio data in the audio queue one by one, and sequentially sending the audio data to an audio encoder for encoding.
In this embodiment, once the audio thread is started, the audio data stored in the audio queue during capture is read item by item and sent to the audio encoder for encoding. Because the audio thread and the audio encoder run concurrently, the size of the audio queue can be controlled effectively. That is, even if system performance is insufficient, the amount of audio data buffered in the audio queue does not grow too large, and when recording ends only the small amount of audio data remaining in the audio queue needs to finish encoding.
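The following sketch shows one possible producer/consumer arrangement for this step; encodeAudio() stands in for a real audio encoder, and the condition-variable signalling is an implementation choice rather than something prescribed by the disclosure.

    // Sketch: the encoder thread drains the audio queue concurrently with
    // capture, so the queue stays small and only a short tail remains to be
    // encoded when recording stops.
    #include <condition_variable>
    #include <cstdint>
    #include <deque>
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <utility>
    #include <vector>

    std::mutex m;
    std::condition_variable cv;
    std::deque<std::vector<uint8_t>> audioQueue;
    bool capturing = true;

    void encodeAudio(const std::vector<uint8_t>& pcm) {   // stand-in encoder
        std::cout << "encoded " << pcm.size() << " bytes\n";
    }

    void encoderLoop() {
        std::unique_lock<std::mutex> lock(m);
        while (capturing || !audioQueue.empty()) {
            cv.wait(lock, [] { return !audioQueue.empty() || !capturing; });
            while (!audioQueue.empty()) {                 // read chunks one by one
                auto chunk = std::move(audioQueue.front());
                audioQueue.pop_front();
                lock.unlock();                            // encode outside the lock
                encodeAudio(chunk);
                lock.lock();
            }
        }
    }

    int main() {
        std::thread encoder(encoderLoop);
        for (int i = 0; i < 3; ++i) {                     // capture side (producer)
            { std::lock_guard<std::mutex> g(m); audioQueue.emplace_back(1024, 0); }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> g(m); capturing = false; }   // recording ends
        cv.notify_one();
        encoder.join();                                   // only the tail is left to drain
    }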
A queue is characterized by first-in, first-out: insertion is allowed only at one end (the tail of the queue) and deletion only at the other end (the head of the queue). Therefore, in this embodiment, when audio data is written into the audio queue, the acquired audio data is written in order of its timestamps, starting from the audio data with the earliest timestamp. Of course, since the storage space of the queue is limited, the state of the queue is judged from the head-of-queue pointer front, the tail-of-queue pointer rear and the maximum size maxSize: before writing it can be determined whether the queue is full (for example, rear + 1 == front or rear == front + maxSize - 1, which is the usual circular-queue condition (rear + 1) % maxSize == front), and whether it is empty (rear == front), thereby determining whether audio data can be written into the audio queue. Further, when the audio queue is full, a new audio queue can be created, and the order of the different audio queues is determined by the timestamps written into them.
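A brief sketch of these occupancy checks follows, using the common one-slot-free circular-queue convention; this convention is an assumption for illustration rather than a verbatim reproduction of the formulas above.

    // Sketch: circular-queue occupancy checks built from the head-of-queue
    // pointer `front`, the tail-of-queue pointer `rear` and the capacity
    // `maxSize` (one slot is kept free to distinguish full from empty).
    #include <iostream>

    struct RingIndex {
        int front = 0;      // head of queue: next element to remove
        int rear  = 0;      // tail of queue: next free slot to insert into
        int maxSize;        // number of slots

        bool empty() const { return rear == front; }
        bool full()  const { return (rear + 1) % maxSize == front; }

        bool push() {                       // returns false when the queue is full
            if (full()) return false;
            rear = (rear + 1) % maxSize;
            return true;
        }
        bool pop() {                        // returns false when the queue is empty
            if (empty()) return false;
            front = (front + 1) % maxSize;
            return true;
        }
    };

    int main() {
        RingIndex q{0, 0, 4};               // 4 slots -> holds at most 3 elements
        while (q.push()) {}                 // fill until the full check triggers
        std::cout << "full: " << q.full() << ", empty: " << q.empty() << "\n";
    }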
Of course, in other embodiments, a stack (i.e., a first-in, last-out structure) may be used to store the audio data.
In an embodiment, the obtaining the video data by calling a system API through the video thread includes:
calling a system API through the video thread to acquire current picture data of a camera and/or capture current picture data of a screen, and acquiring a corresponding second system clock;
and taking the current picture data of the camera and/or the current picture data of the screen as video data, and converting the second system clock into a video time stamp corresponding to the video data.
In this embodiment, after the video thread is started, it only needs to acquire the current picture from the camera or capture the current picture of the screen. When the current picture is obtained, the current system clock (i.e., the second system clock) is obtained at the same time, and converting this second system clock yields the timestamp (PTS) of the current picture, i.e., the video timestamp. When system performance is insufficient and the previous frame has not yet been encoded, the new picture overwrites the previous frame, which guarantees that only the most recently obtained frame is cached and that only this latest frame is processed during encoding. Meanwhile, because the video data is not stored in a buffering queue, the corresponding encoding workload is reduced, and when recording ends there is no backlog of video data waiting to be processed; that is, work can stop immediately once the current video data has been processed.
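A minimal sketch of such a single-slot, latest-frame-only video buffer is shown below; the Frame type and the millisecond steady clock are illustrative assumptions.

    // Sketch: single-slot video buffer. If the encoder has not consumed the
    // previous frame in time, the new frame overwrites it, so at most one
    // (the latest) frame is ever cached and encoded.
    #include <chrono>
    #include <cstdint>
    #include <mutex>
    #include <optional>
    #include <utility>
    #include <vector>

    using Clock = std::chrono::steady_clock;

    struct Frame {
        std::vector<uint8_t> pixels;
        int64_t              ptsMs;   // video timestamp from the second system clock
    };

    class LatestFrameSlot {
    public:
        explicit LatestFrameSlot(Clock::time_point recordStart) : start_(recordStart) {}

        // Capture side: always overwrite whatever is currently in the slot.
        void submit(std::vector<uint8_t> pixels, Clock::time_point systemClock) {
            int64_t pts = std::chrono::duration_cast<std::chrono::milliseconds>(
                              systemClock - start_).count();
            std::lock_guard<std::mutex> g(m_);
            slot_ = Frame{std::move(pixels), pts};    // older unencoded frame is dropped
        }

        // Encoder side: take the latest frame, if any, leaving the slot empty.
        std::optional<Frame> take() {
            std::lock_guard<std::mutex> g(m_);
            std::optional<Frame> out = std::move(slot_);
            slot_.reset();
            return out;
        }

    private:
        std::mutex m_;
        std::optional<Frame> slot_;
        Clock::time_point start_;
    };

    int main() {
        LatestFrameSlot slot(Clock::now());
        slot.submit(std::vector<uint8_t>(100, 0), Clock::now());
        slot.submit(std::vector<uint8_t>(200, 0), Clock::now());   // overwrites the first
        auto f = slot.take();
        return (f && f->pixels.size() == 200) ? 0 : 1;             // only the latest remains
    }

Because the capture side never queues more than one frame, stopping the recording leaves no video backlog to drain, which matches the behaviour described above.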
In a specific embodiment, with reference to FIG. 3, audio capture and video capture are started simultaneously to capture audio data and video data, and audio processing, i.e., encoding, is started at the same time as audio capture. During audio capture, it is judged whether recording is paused: if recording is paused, the captured audio data is not buffered; if recording is not paused, the captured audio data is written into audio buffer queue A. Whether to stop capturing audio data is determined by whether recording has stopped; meanwhile, audio processing, i.e., encoding, proceeds in parallel, fetching audio data from audio buffer queue A and encoding it until no audio data remains in audio buffer queue A. When capturing video images (i.e., video data), pausing or stopping the recording correspondingly pauses or stops the capture of video data; the captured video data is buffered directly, and the buffered video data is then encoded accordingly.
Fig. 2 is a schematic block diagram of an apparatus 200 for synchronizing sound and picture of a sound recording and video recording according to an embodiment of the present invention, where the apparatus 200 includes:
a data obtaining unit 201, configured to start recording and obtain audio data and video data, respectively;
a data buffering unit 202, configured to buffer the audio data into an audio queue, and buffer the video data;
the data encoding unit 203 is configured to perform encoding processing on the audio data in the audio queue, perform encoding processing on the cached video data, and write the audio data and the video data after the encoding processing into a recording file;
the data synchronization unit 204 is configured to synchronize the audio data and the video data according to an audio time stamp corresponding to the audio data and a video time stamp corresponding to the video data in the recording file.
In one embodiment, the data acquiring unit 201 includes:
the thread setting unit is used for respectively setting an audio thread and a video thread;
the first calling unit is used for calling a system API through the audio thread to acquire the audio data;
and the second calling unit is used for calling a system API through the video thread to acquire the video data.
In one embodiment, the first calling unit includes:
the initialization unit is used for calling a system API through the audio thread to carry out initialization operation on the microphone and the loudspeaker;
the device comprises a cache setting unit, a system API audio cache processing unit and a processing unit, wherein the cache setting unit is used for setting the size of the system API audio cache, simultaneously acquiring the sampling frequency and the number of sound channels of an audio stream, and storing the sampling frequency and the number of the sound channels of the audio stream into the system API audio cache;
and the starting reading unit is used for calling the system start method and starting an external thread to read the audio data from the system API audio cache.
In one embodiment, the data caching unit 202 includes:
the clock reading unit is used for reading the audio data and the corresponding first system clock from the system API audio cache;
a first conversion unit for converting the first system clock into audio time stamps;
and the first adding unit is used for adding the audio data and the corresponding audio time stamp to the tail part of the audio queue.
In an embodiment, the data caching unit 202 further includes:
a real-time discarding unit, configured to discard audio data acquired during a pause period as invalid data in real time when a user pauses a recording process;
and the second adding unit is used for adding the acquired audio data into the audio queue as valid audio data after the user restarts recording.
In one embodiment, the data encoding unit 203 includes:
and the reading and sending unit is used for reading the audio data in the audio queue one by one and sending the audio data to an audio encoder in sequence for encoding processing.
In one embodiment, the second calling unit includes:
the clock acquisition unit is used for calling a system API through the video thread to acquire the current picture data of the camera and/or capture the current picture data of the screen, and for acquiring a corresponding second system clock;
and the second conversion unit is used for taking the current picture data of the camera and/or the current picture data of the screen as video data and converting the second system clock into a video timestamp corresponding to the video data.
Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.
Embodiments of the present invention also provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed, the steps provided by the above embodiments can be implemented. The storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the present invention further provides a computer device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided in the above embodiments when calling the computer program in the memory. Of course, the computer device may also include various network interfaces, power supplies, and the like.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for synchronizing sound and picture of sound recording and video recording is characterized by comprising the following steps:
starting recording, and respectively acquiring audio data and video data;
buffering the audio data into an audio queue and buffering the video data;
coding the audio data in the audio queue, coding the cached video data, and writing the coded audio data and video data into a recording file;
and synchronizing the audio data and the video data according to the audio time stamp corresponding to the audio data and the video time stamp corresponding to the video data in the recording file.
2. The method of claim 1, wherein said separately acquiring audio data and video data comprises:
respectively setting an audio thread and a video thread;
calling a system API through the audio thread to acquire the audio data;
and calling a system API through the video thread to acquire the video data.
3. The method of claim 2, wherein the retrieving the audio data by the audio thread calling a system API comprises:
calling a system API through the audio thread to carry out initialization operation on a microphone and a loudspeaker;
setting the size of a system API audio cache, simultaneously acquiring the sampling frequency and the number of sound channels of an audio stream, and storing the sampling frequency and the number of the sound channels of the audio stream into the system API audio cache;
and calling a system start method and starting an external thread to read audio data from the audio cache of the system API.
4. The method of claim 3, wherein buffering the audio data into an audio queue comprises:
reading audio data and a corresponding first system clock from a system API audio cache;
converting the first system clock to audio timestamps;
and adding the audio data and the corresponding audio time stamp to the tail part of the audio queue.
5. The method of claim 1, wherein buffering the audio data into an audio queue further comprises:
when a user pauses the recording process, audio data acquired during the pause period is discarded as invalid data in real time;
and after the user restarts recording, adding the acquired audio data into the audio queue as valid audio data.
6. The method of claim 1, wherein the encoding the audio data in the audio queue comprises:
and reading the audio data in the audio queue one by one, and sequentially sending the audio data to an audio encoder for encoding.
7. The method of claim 2, wherein said retrieving the video data by calling a system API through the video thread comprises:
calling a system API through the video thread to acquire current picture data of a camera and/or capture current picture data of a screen, and acquiring a corresponding second system clock;
and taking the current picture data of the camera and/or the current picture data of the screen as video data, and converting the second system clock into a video time stamp corresponding to the video data.
8. An apparatus for synchronizing sound and picture of a sound recording/video recording, comprising:
the data acquisition unit is used for starting recording and respectively acquiring audio data and video data;
the data buffer unit is used for buffering the audio data into an audio queue and buffering the video data;
the data coding unit is used for coding the audio data in the audio queue, coding the cached video data and writing the coded audio data and video data into a recording file;
and the data synchronization unit is used for synchronizing the audio data and the video data according to the audio time stamp corresponding to the audio data and the video time stamp corresponding to the video data in the recording file.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of sound and picture synchronization according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the method of sound and picture synchronization of a sound recording according to any one of claims 1 to 7.
CN202111141585.5A 2021-09-28 2021-09-28 Method, device and related medium for synchronizing sound and picture of sound recording and video recording Pending CN113784073A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111141585.5A CN113784073A (en) 2021-09-28 2021-09-28 Method, device and related medium for synchronizing sound and picture of sound recording and video recording

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111141585.5A CN113784073A (en) 2021-09-28 2021-09-28 Method, device and related medium for synchronizing sound and picture of sound recording and video recording

Publications (1)

Publication Number Publication Date
CN113784073A true CN113784073A (en) 2021-12-10

Family

ID=78854072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111141585.5A Pending CN113784073A (en) 2021-09-28 2021-09-28 Method, device and related medium for synchronizing sound and picture of sound recording and video recording

Country Status (1)

Country Link
CN (1) CN113784073A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116033096A (en) * 2022-07-08 2023-04-28 荣耀终端有限公司 Picture content dubbing method and device and terminal equipment
CN116052701A (en) * 2022-07-07 2023-05-02 荣耀终端有限公司 Audio processing method and electronic equipment


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101237586A (en) * 2008-02-22 2008-08-06 上海华平信息技术股份有限公司 Synchronous playing method for audio and video buffer
EP2254342A1 (en) * 2009-05-18 2010-11-24 Albis Technologies AG Method for the synchronisation of an audio and videostream
CN102821308A (en) * 2012-06-04 2012-12-12 西安交通大学 Multi-scene streaming media courseware recording and direct-broadcasting method
CN103237191A (en) * 2013-04-16 2013-08-07 成都飞视美视频技术有限公司 Method for synchronously pushing audios and videos in video conference
CN104092957A (en) * 2014-07-16 2014-10-08 浙江航天长峰科技发展有限公司 Method for generating screen video integrating image with voice
CN105791939A (en) * 2016-03-14 2016-07-20 北京捷思锐科技股份有限公司 Audio and video synchronization method and apparatus
CN108282685A (en) * 2018-01-04 2018-07-13 华南师范大学 A kind of method and monitoring system of audio-visual synchronization
CN108924631A (en) * 2018-06-27 2018-11-30 杭州叙简科技股份有限公司 A kind of video recording generation method shunting storage based on audio-video
CN110650307A (en) * 2019-10-30 2020-01-03 广州河东科技有限公司 QT-based audio and video plug flow method, device, equipment and storage medium
CN110753202A (en) * 2019-10-30 2020-02-04 广州河东科技有限公司 Audio and video synchronization method, device, equipment and storage medium of video intercom system
CN111641758A (en) * 2020-05-09 2020-09-08 北京中广上洋科技股份有限公司 Video and audio recording method and device and computer readable storage medium
CN112218115A (en) * 2020-09-25 2021-01-12 深圳市捷视飞通科技股份有限公司 Control method and device for streaming media audio and video synchronization and computer equipment
CN112565873A (en) * 2020-12-01 2021-03-26 梦想合力(北京)科技有限公司 Screen recording method and device, equipment and storage medium
CN112698872A (en) * 2020-12-21 2021-04-23 北京百度网讯科技有限公司 Voice data processing method, device, equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116052701A (en) * 2022-07-07 2023-05-02 荣耀终端有限公司 Audio processing method and electronic equipment
CN116052701B (en) * 2022-07-07 2023-10-20 荣耀终端有限公司 Audio processing method and electronic equipment
CN116033096A (en) * 2022-07-08 2023-04-28 荣耀终端有限公司 Picture content dubbing method and device and terminal equipment
CN116033096B (en) * 2022-07-08 2023-10-20 荣耀终端有限公司 Picture content dubbing method and device and terminal equipment

Similar Documents

Publication Publication Date Title
CN113784073A (en) Method, device and related medium for synchronizing sound and picture of sound recording and video recording
US7418190B2 (en) Accelerated access to frames from a compressed digital video stream without keyframes
US8576922B2 (en) Capturing media in synchronized fashion
US20040109067A1 (en) Image pickup device with still picture pickup function during moving picture pickup operation
EP4054190A1 (en) Video data encoding method and device, apparatus, and storage medium
CN109168059B (en) Lip sound synchronization method for respectively playing audio and video on different devices
CN107948713B (en) Delayed live broadcasting method and system
CN111225171B (en) Video recording method, device, terminal equipment and computer storage medium
CN107093436B (en) Prerecorded audio and video data storage method and device and mobile terminal
CN113645490A (en) Soft and hard combined multi-channel video synchronous decoding method
CN107371053B (en) Audio and video stream contrast analysis method and device
JP2007336263A (en) Image processing method, apparatus, and program
CN112235600B (en) Method, device and system for processing video data and video service request
AU2009345285B2 (en) Method and terminal for synchronously recording sounds and images of opposite ends based on circuit domain video telephone
US11457286B2 (en) Video distribution apparatus, distribution method, and recording medium
CN113077532A (en) Dynamic photo generation method and device and readable storage medium
CN109600563B (en) Method and apparatus for determining a timestamp
CN114268830A (en) Cloud director synchronization method, device, equipment and storage medium
CN113709574A (en) Video screenshot method and device, electronic equipment and computer-readable storage medium
US9258540B2 (en) Imaging apparatus
CN114979718B (en) Method, device, electronic equipment and storage medium for synchronous continuous playing of audio and video
US8929723B2 (en) Recording apparatus, imaging and recording apparatus, recording method, and program
JP2008228014A (en) Video conference system, conference video processing method, program and recording medium
CN116801034B (en) Method and device for storing audio and video data by client
CN112954483B (en) Data transmission method, system and non-volatile storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination