CN108924631B - Video generation method based on audio and video shunt storage - Google Patents

Info

Publication number
CN108924631B
CN108924631B (application CN201810675756.4A)
Authority
CN
China
Prior art keywords
video
audio
stream
frame
audio stream
Prior art date
Legal status
Active
Application number
CN201810675756.4A
Other languages
Chinese (zh)
Other versions
CN108924631A (en)
Inventor
吴宣辉
胡松涛
卢锡芹
Current Assignee
Hangzhou Xujian Science And Technology Co ltd
Original Assignee
Hangzhou Xujian Science And Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Xujian Science And Technology Co ltd filed Critical Hangzhou Xujian Science And Technology Co ltd
Priority to CN201810675756.4A priority Critical patent/CN108924631B/en
Publication of CN108924631A publication Critical patent/CN108924631A/en
Application granted granted Critical
Publication of CN108924631B publication Critical patent/CN108924631B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H04N21/4334 Recording operations (content storage, e.g. storage operation in response to a pause request)
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/4331 Caching operations, e.g. of an advertisement for later insertion during playback
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/443 OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
    • H04N21/8547 Content authoring involving timestamps for synchronizing content
    • H04N5/92 Transformation of the television signal for recording, e.g. modulation, frequency changing; inverse transformation for playback
    • H04N5/9202 Transformation for recording involving the multiplexing of an additional signal and the video signal, the additional signal being a sound signal
    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals

Abstract

The invention discloses a video generation method based on audio and video shunt storage, comprising the following steps. Step 1: after receiving audio stream A, video stream A, audio stream B and video stream B, cache them. Step 2: select a suitable time to synthesize the video. Step 3: select audio/video stream A as the reference basis for synthesis, mix audio stream A with audio stream B, and extract the resolution of video stream A so that video stream B is scaled to the parameters of video stream A. When audio stream A has sound, the picture of video A is used; when audio stream B has sound, the picture of video B is used; when silence is detected, or both audio streams have sound, the picture of the reference video A is used. Synchronization processing is then performed: timestamps are calculated and adjusted to keep audio and video in sync, and the result is finally merged into a video file. The video presentation effect of the technical scheme of the invention is novel: the picture of the party who has sound is displayed, similar to voice-following.

Description

Video generation method based on audio and video shunt storage
Technical Field
The invention relates to the technical field of computer information data processing, in particular to a video generation method based on audio and video shunt storage.
Background
The existing method of generating video from a two-party call works as follows: as shown in fig. 1, two video files are generated; the sound is mixed, but the video is not, i.e. each file is a single-picture video. The current recording effect is therefore that both voices can be heard but only one party's image can be seen (synchronized with the local side's voice), so one of the voices has no corresponding video picture. To solve this problem, an upgraded scheme is needed that supports the mixing of video pictures and achieves the best viewing experience, i.e. every voice is matched by a synchronized picture.
Disclosure of Invention
The invention aims to provide a video generation method based on audio and video shunt storage: data are held in temporary storage blocks, and an idle CPU is computed and selected so that the overall performance of the server is not affected; with one path used as the reference basis, only the other path needs video transcoding, which reduces performance overhead and efficiently synthesizes the video file, thereby solving the problems set forth in the background art described above.
In order to achieve the purpose, the invention provides the following technical scheme:
a video generation method based on audio and video shunt storage comprises the following steps:
step (1), buffering and storing the audio and video
After receiving audio stream A and video stream A as well as audio stream B and video stream B, cache them and store them into storage blocks. This avoids continuously requesting CPU resource scheduling when CPU resources are scarce, which would overload the CPU and hinder the subsequent video synthesis.
Step (2) selecting the time needing to be synthesized to synthesize the video
Calculating and selecting an idle CPU, and starting the synthesis of the video file;
step (3) audio and video coding and decoding processing
Select audio/video stream A as the reference basis for synthesis, mix audio stream A with audio stream B, and extract the resolution of video stream A so that video stream B is scaled to the parameters of video stream A, i.e. the same resolution, bit rate and frame rate. When audio stream A has sound, the picture of video A is used; when audio stream B has sound, the picture of video B is used; when silence is detected, or both audio streams have sound, the picture of the reference video A is used. Synchronization processing is then performed: timestamps are calculated and adjusted to keep audio and video in sync, and the result is finally merged into the video file.
Preferably, in the step (1), the audio and video are buffered according to the following specific flow:
(1.1) pre-establish the storage blocks, each 256 MB in size; the number created is determined by actual requirements, and each storage block has its own number;
(1.2) divide the storage blocks for use: for example, the storage blocks for audio/video A are numbered 1A to 100A and those for audio/video B are numbered 1B to 100B; the storage blocks are reused cyclically;
(1.3) put the packetized audio and video data into the storage blocks; audio and video data are distinguished by different identifiers, e.g. the audio identifier $a and the video identifier $v;
(1.3.1) store video frame data into a storage block first: write the SPS, PPS and I frame of the video before storing audio data, to prevent the synthesized video from starting with sound but no picture;
(1.3.2) for the audio stream, extract the timestamp of each RTP packet and put each audio RTP packet into a storage block in the record format "4-byte packet length + 4-byte timestamp + packet body", where the packet body is a complete RTP packet; this "4-byte packet length + 4-byte timestamp + packet body" layout is a private, custom packet format;
(1.3.3) for the video stream, the RTP packets must be de-packetized and assembled into complete frame data; the timestamp of each frame is recorded, and each frame's data is put into a storage block in the same "4-byte packet length + 4-byte timestamp + packet body" format, where the packet body is the pure H264 data extracted from the RTP packets;
that is, the video packet body is the extracted pure H264 data, while the audio packet body is a complete RTP packet.
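The private record layout described in (1.3.2)-(1.3.3) can be sketched as follows. This is a minimal illustration in Python, assuming big-endian 32-bit fields (the patent does not specify a byte order); the helper names are mine, not the patent's:

```python
import struct

RECORD_HEADER = struct.Struct(">II")  # 4-byte packet length + 4-byte timestamp

def pack_record(timestamp: int, body: bytes) -> bytes:
    """Serialize one record: 4-byte length + 4-byte timestamp + packet body."""
    return RECORD_HEADER.pack(len(body), timestamp) + body

def unpack_record(buf: bytes, offset: int = 0):
    """Read one record at `offset`; return (timestamp, body, next_offset)."""
    length, timestamp = RECORD_HEADER.unpack_from(buf, offset)
    start = offset + RECORD_HEADER.size
    return timestamp, buf[start:start + length], start + length

# Two consecutive records in one storage block, e.g. audio RTP packets.
block = pack_record(0, b"pkt-0") + pack_record(160, b"pkt-1")
ts0, body0, nxt = unpack_record(block)
ts1, body1, _ = unpack_record(block, nxt)
```

For the audio stream the body would be a whole RTP packet; for the video stream it would be the assembled H264 frame data, as stated above.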
Preferably, the CPU performance calculation in step (2) is performed according to the following process:
(2.1) CPU performance is typically assessed on 3 points: the run queue, CPU utilization and context switching;
(2.2) the run queue should preferably not exceed 3 per CPU (e.g. 6 on a dual-core CPU); if the queue stays above 3 for a long time, processes cannot get CPU time promptly when they run, and a CPU upgrade may need to be considered. In addition, under full load the preferred CPU utilization is: user space kept at 65%-70%, system space at 30%, and idle at 0%-5%;
(2.3) check the overall system running state and CPU utilization with the top command, and check the run-queue length, average load, process-creation average and number of context switches with sar;
(2.4) when the CPU occupancy is no more than 50% (i.e. the CPU is sufficiently idle), start the synthesis of the video file.
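The occupancy check of step (2.4) can be approximated as below. This is a Linux-only sketch that samples /proc/stat directly rather than invoking the top and sar commands named above; the 50% threshold comes from step (2.4), and the function names are mine:

```python
import time

def cpu_busy_fraction(interval: float = 0.1) -> float:
    """Sample /proc/stat twice (Linux only) and return the fraction of
    CPU time spent busy over the interval."""
    def snapshot():
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        idle = fields[3] + fields[4]  # idle + iowait ticks
        return idle, sum(fields)
    idle0, total0 = snapshot()
    time.sleep(interval)
    idle1, total1 = snapshot()
    delta = (total1 - total0) or 1
    return 1.0 - (idle1 - idle0) / delta

def ok_to_synthesize() -> bool:
    """Start video file synthesis only when CPU occupancy <= 50%."""
    return cpu_busy_fraction() <= 0.5
```

In production one would more likely parse the output of sar or read per-core statistics, but the threshold logic is the same.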
Preferably, the audio/video reference basis selection policy in step (3) is specifically as follows:
(3.1) calculate the energy values of audio stream A and audio stream B, and select the one with the higher total energy as the reference basis; suppose audio stream A is selected;
(3.2) record the start time and duration of each voiced segment of audio stream A and audio stream B;
(3.3) decode audio stream A and audio stream B and mix them additively; if the audio is encoded as G.711 A-law or μ-law, decode it to PCM first and then perform additive mixing;
(3.4) with video stream A as the reference, analyse its resolution, re-decode video stream B, scale it by processing the YUV data, and re-encode it so that its resolution finally matches that of video stream A;
(3.5) audio/video synthesis: along the time axis, when audio stream A is computed to have sound, take the I frame and subsequent frames of video stream A at the corresponding time point and merge them into the video file; when audio stream B is computed to have sound, take the I frame and subsequent frames of video stream B at the corresponding time point; when silence is computed, or both audio streams have sound, take the I frame and subsequent frames of the reference video stream A at the corresponding time point.
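Steps (3.1) and (3.3) can be sketched as follows for 16-bit signed PCM, assuming both streams have already been decoded to PCM at the same sample rate; the names are illustrative, not from the patent:

```python
def energy(pcm):
    """Total energy of a PCM segment: the sum of squared samples."""
    return sum(s * s for s in pcm)

def pick_reference(pcm_a, pcm_b):
    """Step (3.1): the stream with the higher total energy is the reference."""
    return "A" if energy(pcm_a) >= energy(pcm_b) else "B"

def mix_additive(pcm_a, pcm_b):
    """Step (3.3): additive mixing, clipped to the 16-bit signed range."""
    return [max(-32768, min(32767, a + b)) for a, b in zip(pcm_a, pcm_b)]
```

Clipping (or attenuation) is needed because the sum of two full-scale samples overflows 16 bits.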
Preferably, the specific method of the synchronization process in the step (3) is as follows:
(4.1) let the base timestamp be the timestamp of the first frame;
(4.2) audio timestamp calculation: the relative audio timestamp Ta = (frame timestamp − base timestamp) ÷ (8000 ÷ 1000), where 8000 is the audio RTP clock rate, giving Ta in milliseconds;
(4.3) video timestamp calculation: the relative video timestamp Tv = (frame timestamp − base timestamp) ÷ (90000 ÷ 1000), where 90000 is the video RTP clock rate;
(4.4) to handle unstable packet sending (i.e. the audio/video device sending packets too slowly), compute the timestamp deviation between consecutive frames: deviation = (timestamp of the next frame − timestamp of the previous frame) ÷ (8000 ÷ 1000); if the deviation exceeds 1000 (1 second), the packets are being sent too slowly and the base timestamp must be adjusted: base timestamp = base timestamp + (timestamp of the next frame − timestamp of the previous frame); the relative timestamps Ta and Tv are then recomputed, thereby controlling audio/video synchronization;
(4.5) keep the difference between the relative timestamps Ta and Tv of adjacent received RTP audio and video within 1000 (1 second), so that audio and video stay synchronized.
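The timestamp arithmetic in (4.1)-(4.4) can be written out as below, assuming the standard RTP clock rates the formulas imply (8000 Hz for G.711 audio, 90000 Hz for video); this is a sketch of the stated rules, not the patent's implementation:

```python
AUDIO_CLOCK = 8000   # G.711 RTP clock, ticks per second
VIDEO_CLOCK = 90000  # video RTP clock, ticks per second

def relative_ms(ts: int, base_ts: int, clock: int) -> float:
    """Relative timestamp in ms: (ts - base) / (clock / 1000)."""
    return (ts - base_ts) / (clock / 1000)

def adjust_base(base_ts: int, prev_ts: int, next_ts: int,
                clock: int = AUDIO_CLOCK) -> int:
    """Step (4.4): if two adjacent frames are more than 1000 ms apart
    (packets sent too slowly), shift the base timestamp by the gap."""
    gap_ms = (next_ts - prev_ts) / (clock / 1000)
    if gap_ms > 1000:
        base_ts += next_ts - prev_ts
    return base_ts
```

After any base adjustment, Ta and Tv are recomputed with the new base, which keeps the relative audio and video timestamps within the 1-second bound of step (4.5).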
Name interpretation
Mixing: mixing the multiple audio streams into one audio;
resolution: the number of pixel points contained per unit inch;
CPU: the central processing unit, a very-large-scale integrated circuit that is the computation core (Core) and control unit (Control Unit) of a computer;
YUV: a colour encoding method used by European television systems; it is the colour space adopted by the PAL and SECAM analogue colour television standards;
SPS: Sequence Parameter Set, which holds a set of global parameters for a coded video sequence (Coded Video Sequence);
PPS: Picture Parameter Set; the PPS is usually stored, together with the SPS, in the header of a video file in a container-dependent format;
I frame (intra-coded frame): also known as an intra picture; the I frame is usually the first frame of each GOP (group of pictures, a video compression technique used by MPEG); it is moderately compressed and serves as a reference point for random access, and can be treated as a complete picture on its own;
RTP (Real-time Transport Protocol): the real-time transport protocol is a network transport protocol, which was published by the multimedia transport working group of IETF in RFC 1889 in 1996 and later updated in RFC 3550;
RTP packet: defining a packet format of RTP transmission;
time stamping: a complete, verifiable piece of data, usually a sequence of characters, that indicates that a piece of data existed before a particular time, uniquely identifies the time of the moment.
Compared with the prior art, the invention has the beneficial effects that:
1. compared with a mode of directly generating a video file, the technical scheme of the invention increases the use of the storage block, temporarily stores audio and video data and is beneficial to generating a video at a required moment.
2. The use of the storage block can select the work of starting the generation of the video file when the CPU is idle, so that the use of the CPU is reasonably distributed, and other tasks needing CPU resources are not influenced.
3. Compared with a screen-mixing approach, the technical scheme of the invention takes the audio path with the higher energy value as the base reference: a higher energy value means its sound occupies more of the time, so less of the other path's video has to be decoded and re-encoded, using fewer CPU resources and synthesizing the video faster.
4. The video presentation effect of the technical scheme of the invention is novel, and the picture with sound is displayed, which is similar to voice following.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram illustrating a conventional method for generating a double-talk video;
fig. 2 is a schematic structural diagram of a video generation method based on audio and video shunt storage according to the present invention;
the figures in the drawings are marked with numbers: the device comprises a storage block (1), an idle CPU (2), audio and video coding and decoding processing (3), synchronous processing (4) and a video file (5).
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 2: the invention provides a specific embodiment of a video generation method based on audio and video shunt storage, which comprises the following steps:
step (1), buffering and storing the audio and video
After receiving audio stream A and video stream A as well as audio stream B and video stream B, cache them and store them into the storage block (1). This avoids continuously requesting CPU resource scheduling when CPU resources are scarce, which would overload the CPU and hinder the subsequent video synthesis.
Step (2) selecting the time needing to be synthesized to synthesize the video
Calculating and selecting an idle CPU (2) and starting the synthesis of the video file;
step (3) audio and video coding and decoding processing (3)
Select audio/video stream A as the reference basis for synthesis, mix audio stream A with audio stream B, and extract the resolution of video stream A so that video stream B is scaled to the parameters of video stream A, i.e. the same resolution, bit rate and frame rate. When audio stream A has sound, the picture of video A is used; when audio stream B has sound, the picture of video B is used; when silence is detected, or both audio streams have sound, the picture of the reference video A is used. Synchronization processing (4) is then performed: timestamps are calculated and adjusted to keep audio and video in sync, and the result is finally merged into the video file (5).
In the step (1), the audio and video are buffered and stored according to the following specific flow:
(1.1) pre-establish the storage blocks (1), each 256 MB in size; the number created is determined by actual requirements, and each storage block has its own number;
(1.2) divide the storage blocks (1) for use: for example, the storage blocks for audio/video A are numbered 1A to 100A and those for audio/video B are numbered 1B to 100B; the storage blocks (1) are reused cyclically;
(1.3) put the packetized audio and video data into the storage blocks (1); audio and video data are distinguished by different identifiers, e.g. the audio identifier $a and the video identifier $v;
(1.3.1) store video frame data into a storage block (1) first: write the SPS, PPS and I frame of the video before storing audio data, to prevent the synthesized video from starting with sound but no picture;
(1.3.2) for the audio stream, extract the timestamp of each RTP packet and put each audio RTP packet into the storage block (1) in the record format "4-byte packet length + 4-byte timestamp + packet body", where the packet body is a complete RTP packet; this "4-byte packet length + 4-byte timestamp + packet body" layout is a private, custom packet format;
(1.3.3) for the video stream, the RTP packets must be de-packetized and assembled into complete frame data; the timestamp of each frame is recorded, and each frame's data is put into a storage block (1) in the same "4-byte packet length + 4-byte timestamp + packet body" format, where the packet body is the pure H264 data extracted from the RTP packets;
that is, the video packet body is the extracted pure H264 data, while the audio packet body is a complete RTP packet.
The CPU performance calculation in the step (2) is specifically performed according to the following process:
(2.1) CPU performance is typically assessed on 3 points: the run queue, CPU utilization and context switching;
(2.2) the run queue should preferably not exceed 3 per CPU (e.g. 6 on a dual-core CPU); if the queue stays above 3 for a long time, processes cannot get CPU time promptly when they run, and a CPU upgrade may need to be considered. In addition, under full load the preferred CPU utilization is: user space kept at 65%-70%, system space at 30%, and idle at 0%-5%;
(2.3) check the overall system running state and CPU utilization with the top command, and check the run-queue length, average load, process-creation average and number of context switches with sar;
(2.4) when the CPU occupancy is no more than 50% (i.e. the CPU is sufficiently idle), start the synthesis of the video file.
The audio and video reference basis selection strategy in the step (3) is specifically as follows:
(3.1) calculate the energy values of audio stream A and audio stream B, and select the one with the higher total energy as the reference basis; suppose audio stream A is selected;
(3.2) record the start time and duration of each voiced segment of audio stream A and audio stream B;
(3.3) decode audio stream A and audio stream B and mix them additively; if the audio is encoded as G.711 A-law or μ-law, decode it to PCM first and then perform additive mixing;
(3.4) with video stream A as the reference, analyse its resolution, re-decode video stream B, scale it by processing the YUV data, and re-encode it so that its resolution finally matches that of video stream A;
(3.5) audio/video synthesis: along the time axis, when audio stream A is computed to have sound, take the I frame and subsequent frames of video stream A at the corresponding time point and merge them into the video file; when audio stream B is computed to have sound, take the I frame and subsequent frames of video stream B at the corresponding time point; when silence is computed, or both audio streams have sound, take the I frame and subsequent frames of the reference video stream A at the corresponding time point.
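The picture-selection rule of step (3.5) reduces to a small decision function, evaluated once per segment of the timeline and applied at I-frame boundaries; this is an illustrative sketch (the names are mine, not the patent's):

```python
def pick_picture(a_has_sound: bool, b_has_sound: bool) -> str:
    """Step (3.5): the speaking side supplies the picture; on silence or
    double-talk, fall back to the reference stream A."""
    if b_has_sound and not a_has_sound:
        return "B"
    return "A"  # A speaking, silence, or both speaking

# One decision per timeline segment: (A has sound, B has sound).
timeline = [(True, False), (False, True), (False, False), (True, True)]
pictures = [pick_picture(a, b) for a, b in timeline]
```

Switching only at an I frame ensures each merged segment starts with a decodable frame, which is why step (3.5) always takes "the I frame and subsequent frames".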
The specific method of the synchronization processing (4) in the step (3) is as follows:
(4.1) let the base timestamp be the timestamp of the first frame;
(4.2) audio timestamp calculation: the relative audio timestamp Ta = (frame timestamp − base timestamp) ÷ (8000 ÷ 1000), where 8000 is the audio RTP clock rate, giving Ta in milliseconds;
(4.3) video timestamp calculation: the relative video timestamp Tv = (frame timestamp − base timestamp) ÷ (90000 ÷ 1000), where 90000 is the video RTP clock rate;
(4.4) to handle unstable packet sending (i.e. the audio/video device sending packets too slowly), compute the timestamp deviation between consecutive frames: deviation = (timestamp of the next frame − timestamp of the previous frame) ÷ (8000 ÷ 1000); if the deviation exceeds 1000 (1 second), the packets are being sent too slowly and the base timestamp must be adjusted: base timestamp = base timestamp + (timestamp of the next frame − timestamp of the previous frame); the relative timestamps Ta and Tv are then recomputed, thereby controlling audio/video synchronization;
(4.5) keep the difference between the relative timestamps Ta and Tv of adjacent received RTP audio and video within 1000 (1 second), so that audio and video stay synchronized.
The invention has the beneficial effects that:
1. compared with a mode of directly generating a video file, the technical scheme of the invention increases the use of the storage block, temporarily stores audio and video data and is beneficial to generating a video at a required moment.
2. The use of the storage block can select the work of starting the generation of the video file when the CPU is idle, so that the use of the CPU is reasonably distributed, and other tasks needing CPU resources are not influenced.
3. Compared with a screen-mixing approach, the technical scheme of the invention takes the audio path with the higher energy value as the base reference: a higher energy value means its sound occupies more of the time, so less of the other path's video has to be decoded and re-encoded, using fewer CPU resources and synthesizing the video faster.
4. The video presentation effect of the technical scheme of the invention is novel, and the picture with sound is displayed, which is similar to voice following.
It is further noted that relational terms such as first and second may be used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Likewise, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising an … …" does not exclude the presence of further identical elements in the process, method, article, or apparatus that comprises it.
The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the others, and for the parts the embodiments have in common, reference may be made between them.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (1)

1. A video generation method based on audio and video shunt storage, characterized by comprising the following steps:
Step (1): buffering and storing the audio and video
Receiving an audio stream A, a video stream A, an audio stream B and a video stream B, buffering them, and storing them in storage blocks;
Step (2): selecting the moment at which composition is needed and composing the video
Determining that an idle CPU is available and starting the composition of the video file;
Step (3): audio and video encoding/decoding processing
Selecting audio stream A as the audio reference base and video stream A as the video reference base; mixing audio stream A with audio stream B; extracting the resolution of video stream A and scaling video stream B to that resolution; when audio stream A is sounding, using the picture of video A; when audio stream B is sounding, using the picture of video B; when both audio streams are detected to be sounding, or both are silent, using the picture of the reference base video A; then performing synchronization processing, calculating and adjusting the timestamps to keep the audio and video synchronized, and finally merging everything into one video file;
In step (1), the specific flow for buffering and storing the audio and video is:
(1.1) Creating the storage blocks in advance; each block is 256 MB in size, the number of blocks created is determined by actual demand, and each block has its own number;
(1.2) Partitioning the storage blocks for use and reusing them;
(1.3) Putting the packed audio and video data into the storage blocks, distinguishing audio data from video data by different identifiers;
(1.3.1) Storing video frame data first: writing the video's SPS, PPS and I-frame before storing any audio data, to avoid a composed recording that has sound but no picture;
(1.3.2) For the audio stream, extracting the timestamp of each RTP packet and writing each audio RTP packet into a storage block in the record format "4-byte packet length + 4-byte timestamp + packet body", where the packet body is a complete RTP packet;
(1.3.3) For the video stream, de-packetizing the RTP packets, assembling complete frame data, recording the timestamp of each frame, and writing each frame into a storage block in the record format "4-byte packet length + 4-byte timestamp + packet body", where the packet body is the pure H.264 data extracted from the RTP packets;
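The record layout of (1.3.2) and (1.3.3) can be sketched as below. This is illustrative Python: the function names are invented, and big-endian byte order is an assumption, since the patent does not specify endianness.

```python
import struct

def pack_record(timestamp, body):
    """Serialize one storage-block record in the patent's layout:
    4-byte packet length + 4-byte timestamp + packet body.
    Big-endian byte order is an assumption."""
    return struct.pack(">II", len(body), timestamp) + body

def unpack_records(buf):
    """Walk a storage block and yield (timestamp, body) pairs."""
    records, off = [], 0
    while off + 8 <= len(buf):
        length, ts = struct.unpack_from(">II", buf, off)
        off += 8
        records.append((ts, buf[off:off + length]))
        off += length
    return records
```

Because each record carries its own length prefix, audio RTP packets and assembled H.264 frames of arbitrary size can be appended to the same block and recovered sequentially.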
Step (2) specifically comprises:
(2.1) Checking the overall system running state and CPU utilization with the top command, and using the sar command to check the run-queue length, the load averages, the process-creation rate and the number of context switches;
(2.2) When the monitored CPU occupancy does not exceed 50%, starting the composition of the video files;
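As a sketch of the idle-CPU gate in (2.1)–(2.2): instead of parsing `top`/`sar` output as the patent describes, the same overall utilization figure can be sampled from `/proc/stat` on Linux. This is an assumption and a stand-in technique; the function names and the sampling interval are illustrative.

```python
import time

def cpu_usage_percent(interval=0.5):
    """Approximate overall CPU usage (%) by sampling /proc/stat twice
    (a Linux-only stand-in for parsing `top` output)."""
    def snapshot():
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        idle = fields[3] + fields[4]  # idle + iowait jiffies
        return idle, sum(fields)

    idle1, total1 = snapshot()
    time.sleep(interval)
    idle2, total2 = snapshot()
    dt = total2 - total1
    return 100.0 * (1 - (idle2 - idle1) / dt) if dt else 0.0

def should_start_composition(threshold=50.0):
    """Step (2.2): start composing only when CPU occupancy <= threshold."""
    return cpu_usage_percent() <= threshold
```

The 50% default threshold mirrors step (2.2); in practice the check would run periodically and trigger composition the first time it passes.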
Step (3) specifically comprises:
(3.1) Counting, for audio stream A and audio stream B, the start time and duration of every sounding segment;
(3.2) Decoding audio stream A and audio stream B, then summing the samples to mix them;
(3.3) Taking video stream A as the video reference base and analyzing its resolution; re-decoding video stream B, scaling it by processing the YUV data, and re-encoding it, so that its resolution finally matches that of video stream A;
(3.4) Composing the audio and video along the time axis: where audio stream A is computed to be sounding, selecting the I-frame of video stream A at that time point and the subsequent frames within the segment's duration and merging them into the video file; where audio stream B is computed to be sounding, doing the same with video stream B; where both streams are silent, or both audio stream A and audio stream B are sounding, selecting the I-frame of the reference base video stream A at that time point and the subsequent frames within the duration and merging them into the video file;
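The per-interval picture-selection rule of step (3.4) amounts to the following. This is an illustrative Python sketch; the interval representation and the one-second step are assumptions, and the function names are invented.

```python
def pick_source(a_sounding, b_sounding):
    """Step (3.4) rule: only A sounding -> picture A; only B sounding ->
    picture B; both sounding or both silent -> reference picture A."""
    if a_sounding and not b_sounding:
        return "A"
    if b_sounding and not a_sounding:
        return "B"
    return "A"  # both sounding, or both silent: fall back to the reference stream

def build_timeline(a_intervals, b_intervals, duration, step=1):
    """Walk the time axis and record which stream supplies the picture.
    Intervals are (start, length) pairs in seconds, as counted in step (3.1)."""
    def sounding(t, intervals):
        return any(s <= t < s + l for s, l in intervals)

    return [pick_source(sounding(t, a_intervals), sounding(t, b_intervals))
            for t in range(0, duration, step)]
```

At each switch point the composer would seek to the selected stream's I-frame at that time and copy frames until the segment ends, as the step describes.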
The specific method of the synchronization processing is:
(4.1) Letting the base timestamp be the timestamp of the first frame;
(4.2) Audio timestamp calculation: the relative timestamp of the audio Ta = (per-frame timestamp − base timestamp) ÷ (8000 ÷ 1000);
(4.3) Video timestamp calculation: the relative timestamp of the video Tv = (per-frame timestamp − base timestamp) ÷ (90000 ÷ 1000);
(4.4) Computing the timestamp deviation between consecutive audio/video frames as (next-frame timestamp − previous-frame timestamp) ÷ (8000 ÷ 1000); if the deviation exceeds 1000, the audio/video device is sending packets too slowly and the base timestamp is adjusted: base timestamp = base timestamp + (next-frame timestamp − previous-frame timestamp); the relative timestamps Ta and Tv are then recomputed to keep the audio and video synchronized;
(4.5) Keeping the difference between the relative timestamps Ta and Tv of adjacent audio and video RTP packets within 1000, so that audio and video are synchronized.
CN201810675756.4A 2018-06-27 2018-06-27 Video generation method based on audio and video shunt storage Active CN108924631B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810675756.4A CN108924631B (en) 2018-06-27 2018-06-27 Video generation method based on audio and video shunt storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810675756.4A CN108924631B (en) 2018-06-27 2018-06-27 Video generation method based on audio and video shunt storage

Publications (2)

Publication Number Publication Date
CN108924631A CN108924631A (en) 2018-11-30
CN108924631B true CN108924631B (en) 2021-07-06

Family

ID=64421569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810675756.4A Active CN108924631B (en) 2018-06-27 2018-06-27 Video generation method based on audio and video shunt storage

Country Status (1)

Country Link
CN (1) CN108924631B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110381350B (en) * 2019-06-25 2021-07-30 杭州叙简科技股份有限公司 Multi-channel video playback synchronization system based on webrtc and processing method thereof
CN113784073A (en) * 2021-09-28 2021-12-10 深圳万兴软件有限公司 Method, device and related medium for synchronizing sound and picture of sound recording and video recording
CN114143491A (en) * 2021-11-17 2022-03-04 深蓝感知(杭州)物联科技有限公司 Video fragment generation method for single-editing 5G recorder
CN114512139B (en) * 2022-04-18 2022-09-20 杭州星犀科技有限公司 Processing method and system for multi-channel audio mixing, mixing processor and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN103024517A (en) * 2012-12-17 2013-04-03 四川九洲电器集团有限责任公司 Method for synchronously playing streaming media audios and videos based on parallel processing
CN103338386A (en) * 2013-07-10 2013-10-02 航天恒星科技有限公司 Audio and video synchronization method based on simplified timestamps
CN106254805A (en) * 2016-07-28 2016-12-21 浙江大华技术股份有限公司 Storage method, device and the videocorder of a kind of Video data

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
KR100810269B1 (en) * 2006-04-18 2008-03-06 삼성전자주식회사 Wireless terminal and a method for performing video communication service using that
CN101951492A (en) * 2010-09-15 2011-01-19 中兴通讯股份有限公司 Method and device for recording videos in video call
CN102447875A (en) * 2010-09-30 2012-05-09 中兴通讯股份有限公司 Method and system for centralized monitoring of video session terminals and relevant devices
US9094569B1 (en) * 2012-02-01 2015-07-28 Gary James Humphries Remote web-based visitation system for prisons
CN108024085A (en) * 2016-10-31 2018-05-11 联芯科技有限公司 The method for recording and equipment of audio and video
CN106507027A (en) * 2016-11-28 2017-03-15 北京小米移动软件有限公司 Video calling recording method and device



Similar Documents

Publication Publication Date Title
CN108924631B (en) Video generation method based on audio and video shunt storage
KR101234146B1 (en) Methods, apparatuses, and computer program products for adaptive synchronized decoding of digital video
KR101008764B1 (en) Method and system for improving interactive media response systems using visual cues
CN105791939B (en) The synchronous method and device of audio & video
JP2002141945A (en) Data transmission system and data transmission method, and program storage medium
JPH11225168A (en) Video/audio transmitter, video/audio receiver, data processing unit, data processing method, waveform data transmission method, system, waveform data reception method, system, and moving image transmission method and system
JP2003114845A (en) Media conversion method and media conversion device
EP1938498A2 (en) Method for signaling a device to perform no synchronization or include a syncronization delay on multimedia streams
JP2006140984A (en) Transmitting device with discard control of specific media data, and transmission program
JP2004509491A (en) Synchronization of audio and video signals
CN101370220B (en) Video media monitoring method and system
CN108540745B (en) High-definition double-stream video transmission method, transmitting end, receiving end and transmission system
US8842740B2 (en) Method and system for fast channel change
KR20180031673A (en) Switching display devices in video telephony
US20040184540A1 (en) Data processing system, data processing apparatus and data processing method
JPH06125363A (en) Packet communication system
CN114339316A (en) Video stream coding processing method based on live video
CN111385081A (en) End-to-end communication method, device, electronic equipment and medium
JP5488694B2 (en) Remote mobile communication system, server device, and remote mobile communication system control method
CN114554277A (en) Multimedia processing method, device, server and computer readable storage medium
CN108353035B (en) Method and apparatus for multiplexing data
US8565318B2 (en) Restamping transport streams to avoid vertical rolls
CN115102927B (en) SIP intercom method, system and storage device for keeping video clear
CN103079048B (en) Video and audio recording and program request implementation method when the call of multimedia command dispatching system keeps
KR100701032B1 (en) Video data transmission control system for network and method therefore

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Video Recording Generation Method Based on Audio Video Streaming Storage

Effective date of registration: 20231007

Granted publication date: 20210706

Pledgee: Guotou Taikang Trust Co.,Ltd.

Pledgor: HANGZHOU XUJIAN SCIENCE AND TECHNOLOGY Co.,Ltd.

Registration number: Y2023980059619