CN115550728A - Communication network video live broadcast method and system based on sampling-level audio multi-track synthesis - Google Patents


Info

Publication number
CN115550728A
CN115550728A
Authority
CN
China
Prior art keywords
audio
track
video
communication network
audio signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211136511.7A
Other languages
Chinese (zh)
Inventor
丁英锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Wild Grass Acoustics Co ltd
Original Assignee
Shenzhen Wild Grass Acoustics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Wild Grass Acoustics Co ltd filed Critical Shenzhen Wild Grass Acoustics Co ltd
Priority to CN202211136511.7A
Publication of CN115550728A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen, of multiple content streams on the same device

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to a communication network video live broadcast method based on sampling-level audio multi-track synthesis, comprising the following steps: while the video shooting device records video information, each audio acquisition device collects audio information; each audio acquisition device packs the collected audio information into data packets and sends them to the video shooting device over a wireless communication network; the video shooting device restores the received data packets of each audio acquisition device into one track of audio signal, matches and aligns the waveforms of the multi-track audio signals, and synthesizes them into a single synthesized audio track; the synthesized audio signal and the video information are then combined into the live video. Because the waveforms of the multi-track audio signals are matched and aligned, the tracks can be synchronized at the sampling level, with a precision far higher than clock synchronization, yielding a better audio synthesis effect.

Description

Communication network video live broadcast method and system based on sampling-level audio multi-track synthesis
Technical Field
The invention belongs to the technical field of video live broadcast, and relates to a communication network video live broadcast method and system based on sampling-level audio multi-track synthesis.
Background
During live video broadcasting, several audio acquisition devices are often set up at different positions to capture audio independently, so that close-range capture yields better sound quality; the captured audio is then mixed to achieve a better overall audio effect. Multi-track audio is generally mixed manually on a mixing console: the sound engineer amplifies, mixes, distributes, and modifies the multiple input signals and applies sound effects by ear, which places high demands on the engineer's skill. In addition, audio acquisition devices and video shooting devices usually transmit audio over a wired link or via Bluetooth. Wired transmission restricts the movement of the devices and suffers greater loss; Bluetooth transmission has a low data rate that cannot support lossless audio, and its range, typically about 10 meters, cannot be extended by networking, so the transmission distance is limited. Although WIFI transmission offers a far higher data rate than Bluetooth and its range can be greatly increased through network extension, the WIFI protocol has long delays and inevitable packet loss, so many problems remain to be solved before WIFI can be used to transmit audio.
Disclosure of Invention
In view of the above, the present invention provides a method and a system for live video broadcast in a communication network based on sample-level audio multi-track synthesis.
To achieve this purpose, the invention provides the following technical solution:
a communication network video live broadcast method based on sampling-level audio multi-track synthesis comprises the following steps:
S1, a plurality of audio acquisition devices are connected to a video shooting device through a wireless communication network, and the video shooting device sends an instruction so that each audio acquisition device collects audio information while the video information is being recorded;
s2, each audio acquisition device acquires audio information, packages the audio information into a data packet and sends the data packet to the video shooting device through a wireless communication network;
s3, after the video shooting equipment receives the data packets of the audio acquisition equipment, recovering the data packets of the audio acquisition equipment into one-track audio signals respectively, matching and aligning the waveforms of the multi-track audio signals, and synthesizing the multi-track audio signals into one-track synthesized audio signals;
and S4, synthesizing the synthesized audio signal and the video information into a live video.
Further, in the step S3, the method for matching and aligning the waveforms of the multi-track audio signal includes the following steps:
s311, presetting the duration of a matching period, and selecting a track of audio signals as reference audio in one matching period;
s312, overlapping the waveform of the audio signal of the other track with the waveform of the reference audio, moving left and right in a preset range, integrating the Euclidean distances of the two waveforms after each movement, and taking the time point corresponding to the position with the minimum Euclidean distance integral value as the alignment time point of the audio signal of the track;
s313 and repeating step S312, sequentially calculating the alignment time points of the audio signals of other tracks, and aligning the alignment time points of the audio signals of the tracks.
Further, in the step S3, the method for matching and aligning the waveforms of the multi-track audio signal includes the following steps:
s321, presetting the duration of a matching period, and sequentially finding out a plurality of peak values of each track of audio signal from high to low according to audio level values in one matching period to serve as reference peak values;
s322, aligning the time points of one reference peak value corresponding to each track of audio signals in sequence, and summing the time differences of the time points among other corresponding reference peak values;
and S323, finding out the time point of the reference peak value when the sum of the time differences is minimum as an alignment time point, and aligning the alignment time points of the audio signals of the tracks.
Further, in the step S3, the method for matching and aligning the waveforms of the multi-track audio signal includes the following steps:
s331, presetting the duration of a matching period, and respectively calculating the envelope curve of each track of audio signals in one matching period;
s332, finding out time points corresponding to all peak values of the envelope curve of each track of audio signals;
s333, aligning the time points of one peak value corresponding to the envelope curve of each track audio signal in sequence, and summing the time differences of the time points between the other corresponding envelope curve peak values;
s334, the time point of the envelope peak value when the sum of the time differences is the minimum is found as an alignment time point, and the alignment time points of the audio signals of the respective tracks are aligned.
Further, the duration of the matching period is the duration of the audio information in one data packet.
Further, in step S3, before synthesizing the multi-track audio signals into a one-track synthesized audio signal, the following steps are further performed:
s351, an audio low level threshold value is preset, and before the audio signals of all the tracks are superposed and synthesized, the part, lower than the audio low level threshold value, of the audio signals of all the tracks is removed.
Further, in step S3, before synthesizing multi-track audio signals into one-track synthesized audio signal, the following steps are also performed:
and S352, attenuating the audio signal of each track.
Furthermore, the wireless communication network is a WIFI communication network, the WIFI communication network comprises a WIFI router, the audio acquisition device and the video shooting device are both provided with WIFI modules, and the audio acquisition device and the video shooting device are respectively connected with the WIFI router through the WIFI modules.
A communication network video shooting device based on sampling-level audio multi-track synthesis comprises:
The video shooting module is used for obtaining video information through video shooting;
the first wireless communication module is used for connecting the audio acquisition equipment through a wireless communication network and acquiring a data packet of audio information sent by the audio acquisition equipment;
the receiving data storage queue is used for storing, among the received data packets from the audio acquisition device, those not yet stored, and for shifting out the earliest stored data packets in first-in first-out order once the number of stored data packets reaches a preset number;
the multi-track audio synthesis module is used for respectively recovering the data packet of each audio acquisition device into a track audio signal and synthesizing the multi-track audio signal into a track synthesized audio signal through matching and aligning the waveforms of the multi-track audio signal;
the video buffer area is used for caching the video information shot by the video shooting module; and
and the audio and video synthesis module is used for synthesizing the synthesized audio signal and the video information into the live video.
A communication network video live broadcast system based on sampling-level audio multi-track synthesis comprises a video shooting device and a plurality of audio acquisition devices, wherein each audio acquisition device comprises:
the audio acquisition module is used for acquiring audio information through audio sampling and packaging the acquired audio information into a data packet;
the sending data storage queue is used for storing the data packets generated by the audio acquisition module and discarding the data packets stored firstly according to a first-in first-out principle after the number of the stored data packets reaches a preset number; and
and the second wireless communication module is used for sending the data packets stored in the sending data storage queue to the wireless communication network.
In the invention, the waveforms of the multi-track audio signals are matched and aligned, so the multi-track audio signals can be synchronized at the sampling level; the synchronization precision is improved from the tens-to-one-hundred-millisecond level of clock synchronization to the 1 ms level, far higher than clock synchronization, thereby obtaining a better audio synthesis effect.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
fig. 1 is a flow chart of a preferred embodiment of the video live broadcast method of the communication network based on sampling-level audio multi-track synthesis of the present invention.
Fig. 2 is a flow chart of a method of aligning waveform matching of a multi-track audio signal in a preferred embodiment.
Fig. 3 is a flow chart of a method of aligning waveform matching of multi-track audio signals in another preferred embodiment.
Fig. 4 is a flow chart of a method of aligning waveform matching of multi-track audio signals in yet another preferred embodiment.
Fig. 5 is a schematic structural diagram of a communication network video live broadcast system based on sample-level audio multi-track synthesis according to a preferred embodiment of the present invention.
Detailed Description
The embodiments of the invention are explained below by means of specific examples. The illustrations provided in the following embodiments merely illustrate the basic idea of the invention in a schematic manner, and the features of the following embodiments may be combined with one another where no conflict arises.
As shown in fig. 1, a preferred embodiment of the present invention, a communication network video live broadcast method based on sampling-level audio multi-track synthesis, comprises the following steps:
the method comprises the following steps that S1, a plurality of audio acquisition devices are connected with a video shooting device through a wireless communication network, and the video shooting device sends instructions to enable the audio acquisition devices to acquire audio information while recording the video information. Of course, a plurality of audio acquisition devices can be synchronized with the video shooting device respectively, so that the audio acquisition devices are synchronized with the video shooting device respectively, and the time axes of the audio information and the video information are aligned when the audio information and the video information are synthesized into live video. The wireless communication network is preferably a WIFI communication network, for example, the audio acquisition device and the video shooting device may preferably be electronic products such as a mobile phone and a tablet computer provided with a WIFI module, and the electronic products are accessed to form the WIFI communication network through a WIFI router. Of course, the wireless communication network may also be a 4G or 5G mobile communication network, the audio acquisition device and the video shooting device may preferably be electronic products such as a mobile phone and a tablet computer provided with a 4G communication module or a 5G communication module, and the electronic products are connected to a mobile communication base station, that is, information can be performed between the audio acquisition device and the video shooting device through the mobile communication network. Adopt wireless communication network transmission audio information, it is convenient not only that audio information transmits, and the network of being convenient for extends, supports multichannel audio information simultaneous transmission, can carry out the transmission of harmless audio frequency moreover, ensures tone quality effect.
S2, each audio acquisition device collects audio information through audio sampling, packs the sampled audio information into data packets, and sends the data packets to the video shooting device through the wireless communication network. The audio sampling rate of the audio acquisition device is generally 48000 samples per second, although other values may be used; the size of a data packet may be 64-2048 bits, typically 128 or 256 bits. To prevent packet loss on the wireless communication network from affecting audio transmission, a sending data storage queue may be set up on each audio acquisition device: the device stores each sampled data packet in the sending data storage queue and sends all data packets stored in the queue to the video shooting device through the wireless network. Step S2 may comprise the following steps:
s201, sequentially moving back the data packets at each storage position in the sending data storage queue. Assuming that only the 1 st data packet generated by the audio acquisition device is stored in the first storage location in the previous transmission data storage queue, after the 2 nd data packet generated by the audio acquisition device, the data packet 1 is moved from the first storage location of the transmission data storage queue to the second storage location, and the data packet 2 is stored in the first storage location of the transmission data storage queue.
S202, discarding the data packet stored in the last storage position in the data storage queue. When the number of the data packets stored in the transmission data storage queue reaches the maximum storage number of the transmission data storage queue (that is, when the data packet is stored in the last storage position of the transmission data storage queue), the data packet stored in the last storage position of the transmission data storage queue is discarded when the data packet stored in the transmission data storage queue is moved backwards, so that the first storage position is free for storing the data packet newly generated by the audio acquisition device.
S203, storing the data packet newly generated by the audio acquisition equipment in a first storage position of the transmission data storage queue. Therefore, the data packets stored in the sending data storage queue are updated, the sending data storage queue discards the data packets stored in the early stage, and the newly generated data packets are cached.
And S204, sending all data packets stored in the sending data storage queue to the video shooting equipment through a wireless communication network. If the transmission data storage queue can store 5 data packets, the 5 data packets stored in the transmission data storage queue are all transmitted when the data packets are transmitted; therefore, each data packet is sent 5 times, so as to avoid that the video shooting device fails to receive the data packet due to packet loss.
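The behaviour of steps S201 to S204 can be sketched with a small Python model (the class name, queue depth, and packet representation are illustrative, not from the patent):

```python
from collections import deque

class SendQueue:
    """Sketch of the sending data storage queue (steps S201-S204).
    Each new packet is stored at the front; once the queue is full the
    oldest packet is discarded; every transmission sends the whole queue,
    so each packet is sent `depth` times for redundancy."""
    def __init__(self, depth=5):
        self.depth = depth
        self.queue = deque(maxlen=depth)  # maxlen drops the oldest on overflow

    def push(self, packet):
        # S201-S203: shift stored packets back and store the new one first
        self.queue.appendleft(packet)

    def transmit(self):
        # S204: send every stored packet (newest first here)
        return list(self.queue)
```

Because every transmission resends the whole queue, a packet lost in one transmission can still reach the receiver in any of the next few transmissions.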
S3, after the video shooting device receives the data packets of the audio acquisition devices, it restores the data packets of each device into one track of audio signal, matches and aligns the waveforms of the multi-track audio signals, and synthesizes them into a single synthesized audio track. To prevent packet loss on the wireless network from affecting audio transmission, a receiving data storage queue may be set up on the video shooting device for each audio acquisition device, and the received data packets of each device are stored in the corresponding queue, achieving track-separated storage of the audio. The number of packets held by a receiving data storage queue is preferably equal to that held by the sending data storage queue. The video shooting device stores received data packets in the receiving queue in the same order as in the sending queue; when a data packet is missing, its storage position in the receiving queue is reserved. After receiving data packets from an audio acquisition device, the video shooting device may specifically perform the following steps:
s301, detecting whether the data packets stored in the received data storage queue are missing, if so, reserving a storage position corresponding to the missing data packet in the received data storage queue, and executing the step S302, otherwise, executing the step S303.
S302, finding out the data packet missing from the received data packet and storing the data packet to the corresponding position in the received data storage queue; the step S303 is executed.
S303, moving the data packet stored in the last storage position in the received data storage queue out of the received data storage queue, and sequentially moving the data packet in each storage position in the received data storage queue back to one storage position.
S304, detecting whether a data packet newly generated by the audio acquisition equipment exists in the received data packet, if so, storing the data packet in a first storage position of a received data storage queue, if not, reserving the first storage position, and marking the data packet missing in the storage position.
Through the above steps, the data packets and their storage order in the sending and receiving data storage queues can be kept completely consistent. Because the received data is buffered by the receiving data storage queue, a missing data packet detected on packet loss can be recovered from later received packets, completing the stream and preventing packet loss from degrading sound quality.
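A minimal Python model of the receiving side (steps S301 to S304), assuming packets carry sequence numbers; the class and method names are hypothetical:

```python
class ReceiveQueue:
    """Sketch of the receiving data storage queue (S301-S304): packets are
    stored by sequence number, missing slots stay reserved, and later
    redundant transmissions fill the gaps before packets move out of the
    queue window."""
    def __init__(self, depth=5):
        self.depth = depth
        self.slots = {}        # sequence number -> packet payload
        self.newest_seq = -1

    def receive(self, packets):
        # `packets` is one transmission: a list of (seq, payload) pairs.
        for seq, payload in packets:
            if seq > self.newest_seq:
                self.newest_seq = seq
            # S302/S304: fill any slot we do not already hold
            self.slots.setdefault(seq, payload)
        # S303: drop slots that have moved past the queue window
        for seq in [s for s in self.slots if s <= self.newest_seq - self.depth]:
            self.slots.pop(seq)

    def missing(self):
        # Slots still marked "packet missing" inside the window (S301)
        lo = max(0, self.newest_seq - self.depth + 1)
        return [s for s in range(lo, self.newest_seq + 1) if s not in self.slots]
```

Because the sender retransmits the whole queue each time, a gap left by one lost transmission is normally filled by the next transmission before the slot leaves the window.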
The data packets shifted out of the receiving data storage queue of each track are restored into analog audio signals, and the waveforms of the audio signals of the tracks are matched and aligned. As shown in fig. 2, to achieve accurate waveform matching and alignment, in a preferred embodiment the method of matching and aligning the waveforms of the multi-track audio signals comprises the following steps:
s311, presetting the duration of a matching period, and selecting a track of audio signal as a reference audio in one matching period; the duration of the matching period is preferably the duration of the audio information in one data packet.
And S312, overlapping the waveform of the audio signal of the other track with the waveform of the reference audio, moving the two tracks left and right in a preset range, integrating the Euclidean distances of the two waveforms after each movement, and taking the time point corresponding to the position with the minimum Euclidean distance integral value as the alignment time point of the audio signal of the track.
And S313, repeating the step S312, sequentially calculating the alignment time points of the audio signals of other tracks, and aligning the alignment time points of the audio signals of the tracks.
Waveform alignment using the Euclidean distance can align the waveforms precisely, but the amount of computation is very large, making implementation difficult; the following embodiments therefore optimize the alignment method to greatly reduce the computation required.
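The Euclidean-distance alignment of steps S311 to S313 can be sketched as follows (plain Python lists stand in for sampled waveforms; normalising the cost by overlap length is an assumption, not stated in the patent):

```python
def align_by_distance(ref, sig, max_shift):
    """Sketch of S311-S313: slide `sig` over the reference waveform within
    +/- max_shift samples and pick the shift whose summed squared sample
    difference (a discrete Euclidean-distance integral) is smallest."""
    best_shift, best_cost = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        cost, count = 0.0, 0
        for i, r in enumerate(ref):
            j = i + shift
            if 0 <= j < len(sig):
                cost += (r - sig[j]) ** 2
                count += 1
        if count:
            cost /= count          # normalise so overlap length is fair
            if cost < best_cost:
                best_cost, best_shift = cost, shift
    return best_shift
```

The nested loop makes the cost O(N * max_shift) per matching period, which illustrates why the patent calls this variant computationally heavy.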
For example, as shown in fig. 3, in another preferred embodiment, the method of matching and aligning the waveforms of the multi-track audio signal may employ the following steps:
s321, presetting the duration of a matching period, and sequentially finding out a plurality of peak values of each track of audio signal from high to low according to audio level values in one matching period to serve as reference peak values; the duration of the matching period is preferably the duration of the audio information in one data packet.
And S322, aligning the time points of one reference peak value corresponding to each track audio signal in sequence, and summing the time differences of the time points between other corresponding reference peak values.
And S323, finding out the time point of the reference peak value when the sum of the time differences is minimum as an alignment time point, and aligning the alignment time points of the audio signals of each track.
For example, ten peaks may be selected, from high to low audio level, as reference peaks. The time points of the highest-level peak of each track are first aligned; then the pairwise time differences (taken as absolute values) between the time points of the second-highest peaks of the tracks are calculated, and likewise for the third-highest through the tenth-highest peaks, and these time differences are summed to give the time-difference sum corresponding to aligning on the highest peak. The time-difference sums corresponding to aligning on the second-highest through the tenth-highest peaks are computed in the same way, and the time point of the reference peak whose sum is smallest is taken as the alignment time point; if the sum corresponding to the highest peak is smallest, its time point is taken as the alignment time point.
Finding the alignment time point by summing time differences over a few selected peaks greatly reduces the number of time points involved and avoids complex integral operations, so the amount of computation drops sharply and the task can be completed by an ordinary processor chip.
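A rough Python sketch of the peak-matching idea of steps S321 to S323, restricted to two tracks for brevity (the local-maximum peak rule and the default of ten reference peaks follow the example above; the function names are illustrative):

```python
def top_peaks(sig, k):
    """Local maxima of `sig`, highest level first, as (time index, level)."""
    peaks = [(i, sig[i]) for i in range(1, len(sig) - 1)
             if sig[i - 1] < sig[i] >= sig[i + 1]]
    peaks.sort(key=lambda p: p[1], reverse=True)
    return peaks[:k]

def align_by_peaks(ref, sig, k=10):
    """Sketch of S321-S323 for two tracks: for each pair of same-rank
    reference peaks, shift `sig` so the peaks coincide, sum the time
    differences of the remaining peak pairs, and keep the shift whose
    sum is smallest."""
    rp = top_peaks(ref, k)
    sp = top_peaks(sig, k)
    n = min(len(rp), len(sp))
    best_shift, best_sum = 0, float("inf")
    for rank in range(n):
        shift = rp[rank][0] - sp[rank][0]
        total = sum(abs(rp[m][0] - (sp[m][0] + shift))
                    for m in range(n) if m != rank)
        if total < best_sum:
            best_sum, best_shift = total, shift
    return best_shift
```

With k reference peaks this considers only k candidate shifts and k time differences each, instead of a squared-difference integral at every shift.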
In a further preferred embodiment, as shown in fig. 4, the method for matching and aligning the waveforms of the multi-track audio signal may further comprise the steps of:
s331, presetting the duration of a matching period, and respectively calculating the envelope curve of each track of audio signals in one matching period; the duration of the matching period is preferably the duration of the audio information in one data packet.
And S332, finding out time points corresponding to the peaks of the envelope curve of each track of audio signals.
And S333, aligning the time points of one peak value corresponding to the envelope of each track audio signal in sequence, and summing the time differences of the time points between the envelope peak values corresponding to other tracks.
And S334, finding out the time point of the envelope peak value when the sum of the time differences is minimum as an alignment time point, and aligning the alignment time points of the audio signals of the tracks.
Finding the alignment time point by summing time differences over envelope peaks reduces the negative influence of glitches in the audio signal, making the waveform alignment more accurate while still keeping the amount of computation small.
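The envelope computation of step S331 and the peak extraction of step S332 might be sketched as follows; the sliding-window maximum is one possible envelope estimator (the patent does not specify which to use), and the window length is an illustrative assumption:

```python
def envelope(sig, window=8):
    """Sketch for S331: a simple amplitude envelope, taken as the maximum
    absolute sample value over a sliding window (a real implementation
    might use a Hilbert transform or a peak-follower filter instead)."""
    n = len(sig)
    return [max(abs(sig[j]) for j in range(max(0, i - window),
                                           min(n, i + window + 1)))
            for i in range(n)]

def envelope_peak_times(sig, window=8):
    """S332: time indices of the local maxima of the envelope; steps
    S333-S334 then align these peak times as in the previous method."""
    env = envelope(sig, window)
    return [i for i in range(1, len(env) - 1)
            if env[i - 1] < env[i] >= env[i + 1]]
```

Because the envelope smooths over individual samples, an isolated spike inside the window no longer creates a spurious reference peak of its own.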
In other embodiments, to avoid superposing noise and/or an excessively high synthesized signal level, one or both of the following steps may be performed before the multi-track audio signals are synthesized into the one-track synthesized audio signal:
s351, an audio low level threshold value is preset, and before the audio signals of all the tracks are superposed and synthesized, the part, lower than the audio low level threshold value, of the audio signals of all the tracks is removed. The audio low level threshold is preferably-50 db to-30 db, and the influence of the background noise can be effectively reduced by removing the part of the audio signals of each track which is lower than the audio low level threshold and then performing superposition synthesis on the audio signals of each track. For example, if the audio low level threshold is set to-40 db, the part of each track audio signal lower than-40 db is removed, so that the bottom noise can be well removed; since the audio acquisition device will typically be placed nearby to the sound source, the level of the desired audio signal will necessarily be greater than-40 db and thus not be mistakenly removed.
S352, the audio signal of each track is attenuated. To avoid the popping noise caused by an excessively high synthesized level, an audio high-level threshold may be preset so that the sum of the maximum audio levels of the attenuated tracks is less than or equal to it. The synthesized audio may "crackle" when its level exceeds 0 dB, because the amplitude of the sound then exceeds the maximum representable range of the device; since no audio level exceeds 0 dB in this representation, the high-level threshold may be set at or below 0 dB so that the level of the synthesized audio signal never exceeds 0 dB. To reduce the maximum level of the synthesized signal while preserving the main characteristics of each track, an audio attenuation level threshold may additionally be preset so that only the portion of each track above that threshold is attenuated, preserving the features of the portion below it.
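One possible reading of step S352, sketched in Python: only the portion of each sample above an attenuation threshold (the "knee") is scaled, and the scale factor is chosen so the summed track maxima meet the high-level threshold. The knee value and the gain formula are illustrative assumptions, not taken from the patent:

```python
import math

def soft_limit(tracks, high_db=0.0, knee_db=-10.0):
    """Sketch of S352: attenuate only the portion of each track above a
    preset attenuation threshold, scaling so that the sum of the tracks'
    maximum levels stays at or below the high-level threshold.
    Levels are linear amplitudes (1.0 = full scale = 0 dB)."""
    high = 10.0 ** (high_db / 20.0)   # high-level threshold, 0 dB -> 1.0
    knee = 10.0 ** (knee_db / 20.0)   # portion below this is left untouched
    peak_sum = sum(max(abs(s) for s in t) for t in tracks)
    if peak_sum <= high:
        return tracks                  # already within range, no attenuation
    # Gain applied only to the portion of each sample above the knee,
    # chosen so the attenuated track maxima sum to `high`.
    gain = (high - len(tracks) * knee) / (peak_sum - len(tracks) * knee)
    return [[s if abs(s) <= knee
             else math.copysign(knee + (abs(s) - knee) * gain, s)
             for s in t]
            for t in tracks]
```

This preserves everything below the knee exactly, so low-level detail survives while the loud portions are compressed.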
And S4, synthesizing the synthesized audio signal and the video information into a live video. Because the synthesis of audio information and video information does not require high-precision synchronization, only frame-level synchronization of the video needs to be satisfied. Therefore, before live shooting, the audio acquisition devices are each clock-synchronized with the video shooting device, so that the clock of any one audio signal before audio synthesis can serve as the clock of the synthesized audio signal; the synthesized audio signal and the video information are thereby aligned on a common time axis and synthesized into a live video carrying the audio information. Since the video information is stored in the video buffer after being captured, alignment of the time axes of the synthesized audio signal and the video information can be achieved by adjusting the delay of the video buffer.
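The buffer-delay alignment reduces to choosing how many whole video frames to hold back. A minimal sketch, assuming a 40 ms frame duration (25 fps), which the patent does not specify:

```python
def video_delay_for_alignment(clock_offset_ms, frame_ms=40.0):
    """Round the measured audio/video clock offset to a whole number of
    video frames; delaying the video buffer by that many frames puts the
    synthesized audio and the video on a common time axis (only
    frame-level accuracy is required at this stage)."""
    frames = round(clock_offset_ms / frame_ms)
    return frames, frames * frame_ms
```

For a measured 95 ms offset this yields a 2-frame (80 ms) buffer delay, within half a frame of the true offset.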
Because the clock accuracy of the video shooting device and the audio acquisition devices is generally on the order of tens of milliseconds to one hundred milliseconds, clock synchronization alone is limited to that range. In this embodiment, the waveforms of the multi-track audio signals are matched and aligned, so that sample-level synchronization can be performed on the multi-track audio signals. Taking a sampling rate of 48,000 samples per second as an example, the sample-level synchronization accuracy achieved by this scheme can reach the order of 1 ms, far better than clock synchronization alone, so a better audio synthesis effect is obtained.
In addition, by providing a sending data storage queue and a receiving data storage queue, data packets lost during transmission over the wireless communication network can be filled in, avoiding the impact of packet loss on audio quality. A data packet can be sent to the wireless communication network as soon as it enters the sending data storage queue, so the sending queue introduces no data delay; the receiving data storage queue does buffer packets and thus introduces delay, but because the audio sampling rate is very high, the actual delay is so short as to be almost negligible and does not affect the real-time performance of the live broadcast.
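The receiving-side queue can be sketched as a fixed-depth FIFO keyed by packet sequence number. The patent says lost packets are "complemented" but does not specify how; repeating the previous packet's payload is one common concealment strategy assumed here, and all names are illustrative:

```python
from collections import deque

class ReceiveQueue:
    """Fixed-depth FIFO for incoming audio packets. A gap in sequence
    numbers is filled ("complemented") with a copy of the most recent
    payload -- an assumed concealment strategy, since the patent does
    not specify one."""
    def __init__(self, depth=8):
        self.depth = depth
        self.queue = deque()
        self.last_seq = None
        self.last_payload = None

    def push(self, seq, payload):
        if self.last_seq is not None:
            for missing in range(self.last_seq + 1, seq):
                # conceal the lost packet with the previous payload
                self.queue.append((missing, self.last_payload))
        self.queue.append((seq, payload))
        self.last_seq, self.last_payload = seq, payload

    def pop_ready(self):
        """Once the queue holds at least `depth` packets, shift them
        out in order, oldest first (the FIFO rule of the patent)."""
        out = []
        while len(self.queue) >= self.depth:
            out.append(self.queue.popleft())
        return out
```

Because packets are only released once the queue is full, a late or lost packet can be detected and concealed before its slot is played out, at the cost of a small fixed buffering delay.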
As shown in fig. 5, a preferred embodiment of the communication network video shooting device based on sampling-level audio multi-track synthesis of the present invention includes a video shooting module, a first wireless communication module, a receiving data storage queue, a multi-track audio synthesis module, a video buffer, and an audio-video synthesis module.
The video shooting module is used for acquiring video information through video shooting; it is also used for clock synchronization with the audio acquisition devices. The first wireless communication module is used for connecting to the audio acquisition devices through a wireless communication network and acquiring the data packets of audio information sent by the audio acquisition devices. The first wireless communication module is preferably a WIFI module, and the wireless communication network is preferably a WIFI communication network. Of course, when the audio signal needs to be transmitted over a long distance, the first wireless communication module may instead be a 4G or 5G module, and the wireless communication network a 4G or 5G mobile communication network.
The receiving data storage queue is used for storing, among the received data packets from the audio acquisition devices, those packets not yet stored, and for shifting the stored data packets out in sequence on a first-in-first-out basis once the number of stored packets reaches a preset number. The multi-track audio synthesis module is used for restoring the data packets of each audio acquisition device into one track of audio signal and, by matching and aligning the waveforms of the multi-track audio signals, synthesizing them into one track of synthesized audio signal. The video buffer is used for caching the video information shot by the video shooting module, and the audio-video synthesis module is used for synthesizing the synthesized audio signal and the video information into the live video.
The invention also discloses a communication network video live broadcast system based on sampling-level audio multi-track synthesis. Still referring to fig. 5, a preferred embodiment of the system comprises the above video shooting device and a plurality of audio acquisition devices, each audio acquisition device comprising an audio acquisition module, a sending data storage queue, and a second wireless communication module. The audio acquisition module is used for acquiring audio information through audio sampling and packaging the acquired audio information into data packets; the sending data storage queue is used for storing the data packets generated by the audio acquisition module and discarding the earliest-stored packets on a first-in-first-out basis once the number of stored packets reaches a preset number; the second wireless communication module is used for sending the data packets stored in the sending data storage queue to the wireless communication network.
Finally, the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such changes should be covered by the claims of the present invention.

Claims (10)

1. A communication network video live broadcast method based on sampling-level audio multi-track synthesis is characterized by comprising the following steps:
S1, a plurality of audio acquisition devices are connected with a video shooting device through a wireless communication network, and the video shooting device sends an instruction so that each audio acquisition device acquires audio information while the video shooting device records video information;
S2, each audio acquisition device acquires audio information, packages it into data packets, and sends the data packets to the video shooting device through the wireless communication network;
S3, after receiving the data packets of the audio acquisition devices, the video shooting device restores the data packets of each audio acquisition device into one track of audio signal, matches and aligns the waveforms of the multi-track audio signals, and then synthesizes the multi-track audio signals into one track of synthesized audio signal;
and S4, synthesizing the synthesized audio signal and the video information into a live video.
2. The communication network video live broadcast method based on sample-level audio multi-track synthesis of claim 1, characterized in that: in the step S3, the method for matching and aligning the waveforms of the multi-track audio signal includes the following steps:
s311, presetting the duration of a matching period, and selecting a track of audio signal as a reference audio in one matching period;
s312, overlapping the waveform of the other track of audio signal with the waveform of the reference audio, moving left and right within a preset range, integrating the Euclidean distances of the two waveforms after each movement, and taking the time point corresponding to the position with the minimum Euclidean distance integral value as the alignment time point of the track of audio signal;
and S313, repeating the step S312, sequentially calculating the alignment time points of the audio signals of other tracks, and aligning the alignment time points of the audio signals of the tracks.
3. The communication network video live broadcasting method based on sampling-level audio multi-track synthesis according to claim 1, characterized in that: in the step S3, the method for matching and aligning the waveforms of the multi-track audio signal includes the following steps:
s321, presetting the duration of a matching period, and sequentially finding out a plurality of peak values of each track of audio signal from high to low according to the audio level value in one matching period to serve as reference peak values;
s322, aligning the time points of one reference peak value corresponding to each track of audio signals in sequence, and summing the time differences of the time points among other corresponding reference peak values;
and S323, finding out the time point of the reference peak value when the sum of the time differences is minimum as an alignment time point, and aligning the alignment time points of the audio signals of each track.
4. The communication network video live broadcasting method based on sampling-level audio multi-track synthesis according to claim 1, characterized in that: in the step S3, the method for matching and aligning the waveforms of the multi-track audio signal includes the following steps:
s331, presetting the duration of a matching period, and respectively calculating the envelope curve of each track of audio signal in one matching period;
s332, finding out time points corresponding to all peak values of the envelope curve of each track of audio signals;
s333, aligning the time points of one peak value corresponding to the envelope curve of each track audio signal in sequence, and summing the time differences of the time points between the other corresponding envelope curve peak values;
s334, the time point of the envelope peak value when the sum of the time differences is the minimum is found as an alignment time point, and the alignment time points of the audio signals of the respective tracks are aligned.
5. The communication network video live broadcasting method based on sampling-level audio multi-track synthesis according to any one of claims 2 to 4, characterized in that the duration of the matching period is the duration of the audio information in one data packet.
6. The communication network video live broadcasting method based on sampling-level audio multi-track synthesis according to any one of claims 1 to 4, characterized in that in the step S3, before synthesizing the multi-track audio signals into one track of synthesized audio signal, the following step is further performed:
s351, an audio low level threshold value is preset, and before the audio signals of all the tracks are superposed and synthesized, the part, lower than the audio low level threshold value, of the audio signals of all the tracks is removed.
7. The communication network video live broadcasting method based on sample-level audio multi-track synthesis according to any one of claims 1 to 4, characterized in that, before synthesizing multi-track audio signals into one-track synthesized audio signals in the step S3, the following steps are further executed:
and S352, attenuating the audio signal of each track.
8. The communication network video live broadcasting method based on sampling-level audio multi-track synthesis according to any one of claims 1 to 4, wherein the wireless communication network is a WIFI communication network, the WIFI communication network comprises a WIFI router, the audio acquisition device and the video shooting device are both provided with WIFI modules, and the audio acquisition device and the video shooting device are respectively connected with the WIFI router through the WIFI modules.
9. A communication network video shooting device based on sampling-level audio multi-track synthesis, characterized by comprising:
The video shooting module is used for acquiring video information through video shooting;
the first wireless communication module is used for connecting the audio acquisition equipment through a wireless communication network and acquiring a data packet of audio information sent by the audio acquisition equipment;
the receiving data storage queue is used for storing data packets which are not stored in the received data packets from the audio acquisition equipment, and sequentially shifting out the stored data packets according to a first-in first-out principle after the number of the stored data packets reaches a preset number;
the multi-track audio synthesis module is used for respectively recovering the data packets of each audio acquisition device into one-track audio signals and synthesizing the multi-track audio signals into one-track synthesized audio signals through matching and aligning the waveforms of the multi-track audio signals;
the video buffer area is used for caching the video information shot by the video shooting module; and
and the audio and video synthesis module is used for synthesizing the synthesized audio signal and the video information into the live video.
10. A communication network video live broadcast system based on sampling-level audio multi-track synthesis, characterized by comprising the video shooting device of claim 9 and a plurality of audio acquisition devices, each audio acquisition device comprising:
the audio acquisition module is used for acquiring audio information through audio sampling and packaging the acquired audio information into a data packet;
the sending data storage queue is used for storing the data packets generated by the audio acquisition module and discarding the data packets stored firstly according to a first-in first-out principle after the number of the stored data packets reaches a preset number; and
and the second wireless communication module is used for sending the data packets stored in the sending data storage queue to a wireless communication network.
CN202211136511.7A 2022-09-19 2022-09-19 Communication network video live broadcast method and system based on sampling-level audio multi-track synthesis Pending CN115550728A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211136511.7A CN115550728A (en) 2022-09-19 2022-09-19 Communication network video live broadcast method and system based on sampling-level audio multi-track synthesis


Publications (1)

Publication Number Publication Date
CN115550728A true CN115550728A (en) 2022-12-30

Family

ID=84727049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211136511.7A Pending CN115550728A (en) 2022-09-19 2022-09-19 Communication network video live broadcast method and system based on sampling-level audio multi-track synthesis

Country Status (1)

Country Link
CN (1) CN115550728A (en)

Similar Documents

Publication Publication Date Title
US8665370B2 (en) Method for synchronized playback of wireless audio and video and playback system using the same
CN104320843B (en) Audio synchronization method of Bluetooth sound production device
CN101184195B (en) Audio/video living broadcast system and method
KR101878279B1 (en) Video remote-commentary synchronization method and system, and terminal device
WO2013151878A1 (en) Synchronizing wireless earphones
CN101202613B (en) Terminal for clock synchronization
CN107707962A (en) A kind of method for realizing that video requency frame data is synchronous with gps time position and FPGA
CN104010226A (en) Multi-terminal interactive playing method and system based on voice frequency
CN114974321A (en) Audio playing method, equipment and system
CN115550728A (en) Communication network video live broadcast method and system based on sampling-level audio multi-track synthesis
CN105611191B (en) Voice and video file synthesis method, apparatus and system
CN111081238A (en) Bluetooth sound box voice interaction control method, device and system
CN113055312B (en) Multichannel audio pickup method and system based on synchronous Ethernet
CN109039994A (en) A kind of method and apparatus calculating the audio and video asynchronous time difference
US8006007B1 (en) Time scale normalization of a digitized signal
CN107197162B (en) Shooting method, shooting device, video storage equipment and shooting terminal
CN115297337B (en) Audio transmission method and system based on data transceiving cache during live video broadcast
CN115297335B (en) Audio transmission method and system based on receiving buffer area during live video broadcast
CN105471776B (en) A kind of method for transmitting signals and device
CN210627896U (en) High-speed signal acquisition playback system
CN113608714A (en) Echo cancellation method, electronic device and computer readable storage medium
CN115499675A (en) Multi-machine-bit audio and video synthesis method and system based on communication network live video
CN109215664A (en) Method of speech processing and device
CN113452789B (en) Frequency domain combining system and frequency domain combining method for forward interface
CN106792143B (en) Share playback method and system in media file multiple terminals

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination