CN104409079A

CN104409079A - Method and device for audio superposition

Info

Publication number: CN104409079A
Application number: CN201410610820.2A
Authority: CN
Inventors: 阮兰桂
Original assignee: CCOM COMMUNICATIONS TECHNOLOGY Co Ltd
Current assignee: CCOM COMMUNICATIONS TECHNOLOGY Co Ltd
Priority date: 2014-11-03
Filing date: 2014-11-03
Publication date: 2015-03-11

Abstract

The invention discloses a method and a device for audio superposition, relates to the field of communication technology, and can reduce the bandwidth occupied by data transmission. The method comprises: performing silence detection on at least two received paths of original audio signals to obtain silence signals and non-silence signals; discarding the silence signals to obtain at least two paths of first audio signals; performing voice enhancement processing on the at least two paths of first audio signals to obtain at least two paths of second audio signals; assigning weight values to signals of different components in the at least two paths of second audio signals; performing voice equalization processing on the at least two paths of second audio signals according to the assigned weight values to obtain at least two paths of third audio signals; and superposing the at least two paths of third audio signals to obtain a superposed audio signal. The method is applicable to scenarios in which multiple parties talk simultaneously.

Description

Audio superposition method and device

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for audio superposition.

Background

With the development of Internet communication technology, audio communication technologies such as ordinary telephony and VoIP (Voice over Internet Protocol) telephony have continued to develop and are now widely used.

At present, in scenarios such as multiparty conferences, several parties need to talk simultaneously, so their audio signals must be superposed. Existing methods of superposing audio signals are mainly based on G.711 audio compression and mixing technology. G.711 is a speech compression standard published by the International Telecommunication Union (ITU-T). In mixing based on G.711 audio compression, the audio signals of the parties are linearly superposed. This superposition approach occupies excessive transmission bandwidth, and when the bandwidth required by the superposed signal exceeds the actual bandwidth of the network, audio data packets cannot be transmitted normally, which can seriously disrupt communication among the parties.
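For comparison, the linear superposition used in prior-art mixing can be sketched as follows. This is a minimal illustration, not a G.711 mixer: it assumes each party's stream has already been decoded from G.711 to 16-bit linear PCM and time-aligned, and the name `linear_mix` is hypothetical.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def linear_mix(paths: Sequence[Sequence[int]]) -> List[int]:
    """Sum time-aligned 16-bit PCM samples from several paths, then clip.

    A path that is shorter than the others contributes nothing once it ends,
    which is equivalent to padding it with silence (zeros).
    """
    length = max((len(p) for p in paths), default=0)
    mixed = []
    for i in range(length):
        total = sum(p[i] for p in paths if i < len(p))
        # Clip to the int16 range instead of letting the sum overflow.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


print(linear_mix([[1000, -2000, 30000], [500, -500, 10000]]))
# [1500, -2500, 32767]
```

The third sample shows why clipping is needed: 30000 + 10000 exceeds the 16-bit range.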

Disclosure of Invention

In view of the above problems, the method and apparatus for audio superposition provided by the present invention address the problem of the excessive bandwidth required during data transmission.

To solve the above technical problem, in one aspect, the present invention provides an audio superposition method, including:

carrying out mute detection on at least two paths of received original audio signals to obtain a mute signal and a non-mute signal;

discarding the mute signal to obtain at least two paths of first audio signals;

performing voice enhancement processing on the at least two paths of first audio signals to obtain at least two paths of second audio signals;

distributing weighted values to signals of different components in the at least two paths of second audio signals;

performing voice equalization processing on the at least two paths of second audio signals according to the distributed weighted values to obtain at least two paths of third audio signals;

and superposing the at least two paths of third audio signals to obtain superposed audio signals.

In another aspect, the present invention provides an apparatus for audio superposition, including:

a detection unit, configured to perform silence detection on at least two received paths of original audio signals to obtain a mute signal and a non-mute signal;

a discarding unit, configured to discard the mute signal detected by the detection unit to obtain at least two paths of first audio signals;

an enhancement processing unit, configured to perform voice enhancement processing on the at least two paths of first audio signals obtained by the discarding unit to obtain at least two paths of second audio signals;

a distribution unit, configured to assign weight values to signals of different components in the at least two paths of second audio signals obtained by the enhancement processing unit;

an equalization processing unit, configured to perform voice equalization processing on the at least two paths of second audio signals obtained by the enhancement processing unit according to the weight values assigned by the distribution unit, to obtain at least two paths of third audio signals; and

a superposition unit, configured to superpose the at least two paths of third audio signals obtained by the equalization processing unit to obtain a superposed audio signal.

With the above technical solution, the method and apparatus for audio superposition perform silence detection, voice enhancement processing and voice equalization processing on at least two received paths of original audio signals to obtain processed audio signals, and superpose the processed audio signals to obtain a superposed audio signal. Compared with the prior art, the invention removes mute signals from the audio signals through silence detection, thereby reducing the bandwidth occupied by data transmission. The invention also improves the transmission quality of the audio signals through voice enhancement processing, voice equalization processing and the like, preserving call quality for all conference participants while reducing bandwidth usage.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a flowchart of a specific implementation of an audio superposition method;

FIG. 2 is a schematic diagram of an audio superposition approach;

FIG. 3 is a flowchart of another specific implementation of an audio superposition method;

FIG. 4 is a flowchart of yet another specific implementation of an audio superposition method;

FIG. 5 is a schematic structural diagram of a specific implementation of an audio superposition apparatus;

FIG. 6 is a schematic structural diagram of another specific implementation of an audio superposition apparatus.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

An embodiment of the invention provides an audio superposition method, which is mainly applied to Internet-based IP (Internet Protocol) communication and can also be applied to communication over other communication networks. The method can be applied to both wired and wireless networks. As shown in fig. 1, the method includes:

101. Perform silence detection on the at least two received paths of original audio signals to obtain a mute signal and a non-mute signal.

Each original audio signal may include at least one audio data block, and each audio data block carries a time identifier and a terminal identifier. The time identifier indicates the temporal order of the blocks, and the terminal identifier indicates which terminal the original audio signal came from. For example, if a path of original audio signal originates from terminal 1 and contains an audio data block with time identifier 1 and an audio data block with time identifier 2, then on the time axis the block with time identifier 1 comes first, followed by the block with time identifier 2.
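The block structure described above can be sketched as a small data type. The field names (`terminal_id`, `time_id`, `payload`) and the grouping helper `group_by_terminal` are hypothetical; the text only states that each block carries a time identifier and a terminal identifier.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AudioBlock:
    terminal_id: int  # terminal identifier: which terminal the block came from
    time_id: int      # time identifier: position of the block on the time axis
    payload: bytes    # audio data carried by the block


def group_by_terminal(blocks: List[AudioBlock]) -> Dict[int, List[AudioBlock]]:
    """Split received blocks into one path per terminal, each sorted by time."""
    paths: Dict[int, List[AudioBlock]] = {}
    for block in blocks:
        paths.setdefault(block.terminal_id, []).append(block)
    for path in paths.values():
        path.sort(key=lambda b: b.time_id)
    return paths


received = [
    AudioBlock(terminal_id=1, time_id=2, payload=b"\x02"),
    AudioBlock(terminal_id=2, time_id=1, payload=b"\x10"),
    AudioBlock(terminal_id=1, time_id=1, payload=b"\x01"),
]
paths = group_by_terminal(received)
print([b.time_id for b in paths[1]])  # [1, 2]
```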

Furthermore, silence may contain ambient background sound; it is not silence in the absolute sense. When silence detection is performed on the at least two received paths of original audio signals, each signal can be classified as a silence signal or a non-silence signal using a preset boundary parameter, such as the waveform amplitude of the original audio signal.

102. Discard the mute signal to obtain at least two paths of first audio signals.

Once the mute signal and the non-mute signal have been identified in each original audio signal, the mute signal is discarded (for example, removed by a filter), leaving the non-mute signal, i.e., the first audio signal. This reduces the amount of audio data.
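A minimal sketch of the discarding step, assuming each path is a list of audio data blocks holding 16-bit PCM samples and that silence is judged by peak amplitude against a hypothetical threshold of 500. It also counts the raw payload size before and after discarding, to illustrate the data reduction described above. All helper names are illustrative only.

```python
from typing import Callable, List, Sequence

Frame = List[int]  # 16-bit PCM samples of one audio data block


def discard_silence(path: Sequence[Frame],
                    is_silent: Callable[[Frame], bool]) -> List[Frame]:
    """Keep only the non-silent blocks of one path (the first audio signal)."""
    return [frame for frame in path if not is_silent(frame)]


def payload_bytes(path: Sequence[Frame]) -> int:
    """Raw size of a path stored as 16-bit PCM (2 bytes per sample)."""
    return sum(2 * len(frame) for frame in path)


def is_quiet(frame: Frame, threshold: int = 500) -> bool:
    """Hypothetical silence test: peak amplitude at or below the threshold."""
    return max((abs(s) for s in frame), default=0) <= threshold


path = [[10, -20, 15], [4000, -3500, 2000], [100, -300, 50]]
first = discard_silence(path, is_quiet)
print(payload_bytes(path), "->", payload_bytes(first))  # 18 -> 6
```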

103. Perform voice enhancement processing on the at least two paths of first audio signals to obtain at least two paths of second audio signals.

The voice enhancement processing may consist of detecting and eliminating noise in the audio signal. When voice enhancement processing is performed on the at least two paths of first audio signals, the noise in them is detected so that noise signals can be distinguished from non-noise signals; the remaining non-noise signals form the at least two paths of second audio signals. Methods for speech enhancement include noise cancellation, harmonic enhancement, enhancement algorithms based on short-time spectral estimation of speech, and the like.

104. Assign weight values to the signals of different components in the at least two paths of second audio signals.

Each path of second audio signal may include signals of different components; that is, different audio data blocks in the same path may differ in volume. Weight values of different sizes can therefore be assigned to the signals of different components according to their volume and similar properties.

105. Perform voice equalization processing on the at least two paths of second audio signals according to the assigned weight values to obtain at least two paths of third audio signals.

After weight values have been assigned to the signals of different components in each path of second audio signal, the volume of each component signal can be multiplied by its corresponding weight value, that is, scaled by a factor equal to the weight value. This equalizes the signals of different components and yields the third audio signal.

106. Superpose the at least two paths of third audio signals to obtain a superposed audio signal.

Superposing the at least two paths of third audio signals means packaging them into one audio data packet, which is then transmitted over the network.

For example, as shown in fig. 2, there are three paths of third audio signals: a first, a second and a third path. Superposing them means packaging the three paths into one audio data packet and transmitting it over the network.
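Since superposition here means packaging the paths into one audio data packet, a sketch of such packaging follows. The packet layout (a 2-byte path count, then for each path a 4-byte terminal identifier, a 4-byte length and the payload, all in network byte order) is a hypothetical format chosen for illustration; the text does not define one.

```python
import struct
from typing import Dict


def pack_paths(paths: Dict[int, bytes]) -> bytes:
    """Package several processed paths into one packet (hypothetical layout)."""
    packet = bytearray(struct.pack("!H", len(paths)))  # uint16 path count
    for terminal_id, payload in paths.items():
        # uint32 terminal id and uint32 payload length, then the payload.
        packet += struct.pack("!II", terminal_id, len(payload))
        packet += payload
    return bytes(packet)


def unpack_paths(packet: bytes) -> Dict[int, bytes]:
    """Inverse of pack_paths: recover each path's payload by terminal id."""
    (count,) = struct.unpack_from("!H", packet, 0)
    offset = 2
    paths: Dict[int, bytes] = {}
    for _ in range(count):
        terminal_id, length = struct.unpack_from("!II", packet, offset)
        offset += 8
        paths[terminal_id] = packet[offset:offset + length]
        offset += length
    return paths


third_signals = {1: b"\x01\x02", 2: b"\x03", 3: b"\x04\x05\x06"}
packet = pack_paths(third_signals)
print(len(packet), unpack_paths(packet) == third_signals)  # 32 True
```

The 32 bytes are the 2-byte header, three 8-byte path headers and 6 payload bytes.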

The audio superposition method provided by the invention performs silence detection, voice enhancement processing and voice equalization processing on at least two received paths of original audio signals to obtain processed audio signals, and superposes the processed audio signals to obtain a superposed audio signal. Compared with the prior art, the invention removes mute signals from the audio signals through silence detection, thereby reducing the bandwidth occupied by data transmission. The invention also improves the transmission quality of the audio signals through voice enhancement processing, voice equalization processing and the like, preserving call quality for all conference participants while reducing bandwidth usage.

Further, as a refinement and an extension of the method shown in fig. 1, another embodiment of the present invention further provides an audio superposition method, as shown in fig. 3, including:

201. Perform silence detection on the at least two received paths of original audio signals to obtain a mute signal and a non-mute signal.

The implementation of this step is the same as that of step 101 in fig. 1, and is not described here again.

Optionally, to distinguish the mute signal from the non-mute signal, step 201 in fig. 3 may be refined into steps 2011 and 2012, as shown in fig. 4:

2011. Perform voice recognition on the original audio signal to obtain its waveform amplitude.

The waveform amplitude of the original audio signal reflects the strength of the voice. Voice recognition is performed on the original audio signal to obtain the amplitudes corresponding to different frequencies in its waveform.
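The text does not specify how the amplitudes at different frequencies are obtained. One common way, assumed here, is a discrete Fourier transform of each frame. The sketch uses a direct O(N²) DFT for clarity, whereas a practical system would use an FFT; `magnitude_spectrum` is a hypothetical name.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Return |X[k]| for k = 0..N/2 using a direct (naive) DFT.

    Bin k corresponds to frequency k * sample_rate / N.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


# A pure tone completing 4 cycles in a 32-sample frame peaks in bin 4.
frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
spectrum = magnitude_spectrum(frame)
print(max(range(len(spectrum)), key=spectrum.__getitem__))  # 4
```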

2012. Compare the waveform amplitude of the original audio signal with a preset mute amplitude threshold to distinguish the mute signal from the non-mute signal.

The preset mute amplitude threshold is obtained from experimental statistics. The waveform amplitude of the original audio signal is compared with this threshold: when the amplitude is less than or equal to the preset mute amplitude threshold, the original audio signal is a mute signal; when it is greater than the threshold, the original audio signal is a non-mute signal.
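The threshold rule above (less than or equal to the threshold means mute, greater means non-mute) can be sketched per audio data block as follows. Using the peak absolute sample value as the "waveform amplitude" is an assumption, and the threshold of 500 is a placeholder for the experimentally determined value.

```python
from typing import List, Sequence, Tuple


def classify_blocks(path: Sequence[Sequence[int]],
                    mute_threshold: int) -> Tuple[List[int], List[int]]:
    """Return (mute, non_mute) block indices for one path.

    A block is a mute signal when its peak absolute amplitude is less than or
    equal to mute_threshold, and a non-mute signal when it is greater.
    """
    mute: List[int] = []
    non_mute: List[int] = []
    for index, block in enumerate(path):
        peak = max((abs(s) for s in block), default=0)
        if peak <= mute_threshold:
            mute.append(index)
        else:
            non_mute.append(index)
    return mute, non_mute


# 500 stands in for the experimentally determined threshold.
print(classify_blocks([[0, 12, -8], [300, -900], [500, -499]], mute_threshold=500))
# ([0, 2], [1])
```

Block 2 peaks at exactly 500, so the "less than or equal" rule classifies it as mute.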

202. Discard the mute signal to obtain at least two paths of first audio signals.

The implementation of this step is the same as that of step 102 in fig. 1, and is not described here again.

203. Perform voice enhancement processing on the at least two paths of first audio signals to obtain at least two paths of second audio signals.

The implementation of this step is the same as that of step 103 in fig. 1, and is not described here again.

Optionally, step 203 in fig. 3 may be further refined into steps 2031 to 2033, as shown in fig. 4:

2031. Perform voice recognition on the at least two paths of first audio signals to obtain the waveform amplitude of the first audio signals.

Voice recognition on the first audio signal yields the amplitudes corresponding to different frequencies in its waveform. Low-energy noise usually has low amplitude, so part of the noise can be identified from the amplitudes.

2032. Compare the waveform amplitude of the first audio signal with a preset noise amplitude threshold to distinguish a voice signal from a noise signal.

The preset noise amplitude threshold is obtained from experimental statistics. The waveform amplitude of the first audio signal is compared with the preset noise amplitude threshold: when the amplitude is less than or equal to the threshold, the first audio signal is a noise signal; when it is greater than the threshold, the first audio signal is a voice signal.

Note that noise detection is not limited to amplitude; noise may also be detected using other parameters such as frequency and short-time energy.
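Short-time energy, one of the alternative parameters mentioned above, is commonly computed as the sum of squared samples over short frames. A minimal sketch follows; the frame length and hop size are hypothetical parameters, and integer samples are used so the result is exact.

```python
from typing import List, Sequence


def short_time_energy(signal: Sequence[int], frame_len: int,
                      hop: int) -> List[int]:
    """Sum of squared samples in each frame.

    Frames start every `hop` samples; a trailing partial frame is dropped.
    """
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


energies = short_time_energy([1, -2, 1, 0, 50, -60, 40, 2], frame_len=4, hop=4)
print(energies)  # [6, 7704]
```

Frames whose energy falls at or below a chosen threshold could then be treated as noise, in the same way as the amplitude rule above.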

2033. Remove the noise signal from the first audio signal.

Once the noise signal and the voice signal have been distinguished, the noise signal can be removed with a filter, leaving the voice signal.
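The text leaves the filter unspecified. As a stand-in, the sketch below uses a simple frame-based noise gate: frames whose peak amplitude is at or below a hypothetical noise threshold are attenuated (to zero by default), while louder frames pass unchanged. This is an illustrative substitute, not the filter the text has in mind.

```python
from typing import List, Sequence


def noise_gate(frames: Sequence[Sequence[int]], noise_threshold: int,
               attenuation: float = 0.0) -> List[List[int]]:
    """Attenuate frames whose peak amplitude is <= noise_threshold.

    With attenuation=0.0 such frames are replaced by zeros, which removes
    their content while keeping frame timing. Louder frames pass unchanged.
    """
    gated = []
    for frame in frames:
        peak = max((abs(s) for s in frame), default=0)
        if peak <= noise_threshold:
            gated.append([int(round(s * attenuation)) for s in frame])
        else:
            gated.append(list(frame))
    return gated


first_signal = [[120, -80, 60], [2500, -3100, 1800], [90, -150, 40]]
print(noise_gate(first_signal, noise_threshold=200))
# [[0, 0, 0], [2500, -3100, 1800], [0, 0, 0]]
```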

204. Assign weight values to the signals of different components in the at least two paths of second audio signals.

Optionally, the assignment of weighted values to the signals of different components in the at least two paths of second audio signals may be further refined as: signals with smaller volumes are assigned higher weight values.

The weight values may be assigned relative to a preset volume. For example, if signal 1 has a volume of 40%, signal 2 has a volume of 60% and the preset volume is 50%, signal 1 is assigned a weight value of 5/4 and signal 2 a weight value of 5/6.
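The example numbers are reproduced by taking each weight value as the preset volume divided by the signal's volume (50/40 = 5/4, 50/60 = 5/6), which also gives quieter signals higher weights. The text does not state this formula explicitly, so treat it as an inference; `fractions.Fraction` keeps the ratios exact, and `assign_weights` is a hypothetical name.

```python
from fractions import Fraction
from typing import Dict


def assign_weights(volumes: Dict[str, Fraction],
                   preset_volume: Fraction) -> Dict[str, Fraction]:
    """Weight = preset volume / measured volume.

    Signals quieter than the preset get weights above 1 and louder ones get
    weights below 1, so the quieter signal always has the higher weight.
    A volume of zero would raise ZeroDivisionError.
    """
    return {name: preset_volume / vol for name, vol in volumes.items()}


weights = assign_weights(
    {"signal 1": Fraction(40, 100), "signal 2": Fraction(60, 100)},
    preset_volume=Fraction(50, 100),
)
print(weights)  # {'signal 1': Fraction(5, 4), 'signal 2': Fraction(5, 6)}
```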

205. Perform voice equalization processing on the at least two paths of second audio signals according to the assigned weight values to obtain at least two paths of third audio signals.

Optionally, performing voice equalization processing on the at least two paths of second audio signals according to the assigned weight values may be further refined as: equalizing the volumes of the different component signals according to their different weight values.

Equalization processing brings signals of different volumes to similar or identical volumes: the volume of each component signal is amplified or attenuated toward the preset volume, with the scaling factor equal to that signal's weight value.

For example, if signal 1 has a volume of 40%, signal 2 has a volume of 60%, the preset volume is 50%, and the weight values assigned to signal 1 and signal 2 are 5/4 and 5/6 respectively, then the volume of signal 1 is multiplied by 5/4 and the volume of signal 2 by 5/6, so that both reach 50% (40% × 5/4 = 50% and 60% × 5/6 = 50%).
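A sketch of applying the weights to sample data, assuming 16-bit PCM and defining "volume" as peak absolute amplitude divided by full scale (32767). Both the volume measure and the helper names are assumptions; results are rounded and clipped to the 16-bit range.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767
FULL_SCALE = INT16_MAX  # assumed reference for expressing volume as a fraction


def volume(samples: Sequence[int]) -> float:
    """Assumed volume measure: peak absolute amplitude relative to full scale."""
    return max((abs(s) for s in samples), default=0) / FULL_SCALE


def equalize(samples: Sequence[int], weight: float) -> List[int]:
    """Scale every sample by the weight value, rounding and clipping to int16."""
    return [max(INT16_MIN, min(INT16_MAX, round(s * weight))) for s in samples]


signal_1 = [0, 13107, -6000]   # peak 13107, about 40% of full scale
signal_2 = [0, -19660, 9000]   # peak 19660, about 60% of full scale
out_1 = equalize(signal_1, 5 / 4)
out_2 = equalize(signal_2, 5 / 6)
print(round(volume(out_1), 2), round(volume(out_2), 2))  # 0.5 0.5
```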

In this step, the signals of different volumes in the second audio signal are equalized so that all signals reach the same or similar volume, reducing the volume differences between them.

206. Superpose the at least two paths of third audio signals to obtain a superposed audio signal.

The superposition in this step is performed in the same way as in step 106 of fig. 1 and is not described again here.

207. Perform noise reduction processing, distortion processing and volume enhancement processing on the superposed audio signal to obtain an optimized superposed audio signal.

After the superposed audio signal is obtained, it can be further inspected. When noise is detected in the superposed audio signal, noise reduction processing may be performed on it, for example by filtering; when distortion is detected, distortion processing may be performed; and when its volume is detected to be too low, volume enhancement processing may be performed.
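Volume enhancement of a too-quiet superposed signal can be sketched as a uniform gain applied when the peak falls below a threshold. The thresholds, the target peak and the function name are hypothetical, 16-bit PCM is assumed, and noise reduction and distortion processing are not shown.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def enhance_volume(samples: Sequence[int], low_peak: int,
                   target_peak: int) -> List[int]:
    """Apply a uniform gain when the peak is below low_peak.

    The gain brings the peak to target_peak. Signals that are already loud
    enough, or entirely silent, are returned unchanged.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0 or peak >= low_peak:
        return list(samples)
    gain = target_peak / peak
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]


print(enhance_volume([1000, -2000, 500], low_peak=8000, target_peak=16000))
# [8000, -16000, 4000]
```

A single gain for the whole signal keeps the waveform shape, unlike per-sample adjustments.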

In this step, noise reduction, distortion processing, volume enhancement and similar processing optimize the superposed audio signal and improve audio quality in network communication.

In addition, when the network bandwidth jumps, and especially when it decreases, the optimized superposed audio signal can be split into several data packets, each of which can be transmitted within the new bandwidth, thereby ensuring normal communication.

For example, suppose the optimized superposed audio signal consists of a first path of 0.3M, a second path of 0.5M and a third path of 0.2M, and the network bandwidth is 1M. When the bandwidth drops to 0.5M, the 0.3M first path and the 0.2M third path can be packed into data packet 1, and the 0.5M second path into data packet 2, so that the optimized superposed audio signal is divided into two data packets that are transmitted in sequence.
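Choosing which paths share a packet is a bin-packing problem. The sketch below assumes a first-fit decreasing heuristic (not named in the text) and expresses sizes in units of 0.1M to avoid floating-point rounding; with the example sizes it reproduces the grouping above. `split_into_packets` is a hypothetical name.

```python
from typing import Dict, List


def split_into_packets(sizes: Dict[str, int], capacity: int) -> List[List[str]]:
    """Group paths into packets whose total size fits within `capacity`.

    First-fit decreasing: take paths from largest to smallest and put each
    into the first packet that still has room, opening a new packet if none
    does. A path larger than `capacity` raises ValueError.
    """
    packets: List[List[str]] = []
    free: List[int] = []  # remaining room in each packet
    for name in sorted(sizes, key=sizes.get, reverse=True):
        size = sizes[name]
        if size > capacity:
            raise ValueError(f"{name} ({size}) exceeds capacity {capacity}")
        for i, room in enumerate(free):
            if size <= room:
                packets[i].append(name)
                free[i] -= size
                break
        else:
            packets.append([name])
            free.append(capacity - size)
    return packets


# 0.3M, 0.5M and 0.2M paths with the bandwidth reduced to 0.5M.
print(split_into_packets({"path 1": 3, "path 2": 5, "path 3": 2}, capacity=5))
# [['path 2'], ['path 1', 'path 3']]
```

Packets come out largest-first, so the packet holding the second path is listed before the one holding the first and third paths; the grouping matches the example, and the transmission order is a separate choice.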

In this embodiment, when the network bandwidth jumps, and especially when it decreases, the optimized superposed audio signal is split into several data packets that can each be transmitted within the new bandwidth. This avoids the network congestion, and the resulting transmission delay, that a bandwidth jump causes in the prior art, and further ensures normal transmission of communication data.

It should be noted that the above embodiments can be applied to the server side, that is, the audio signal can be received and processed by the server.

Further, as an implementation of the foregoing method embodiments, another embodiment of the present invention provides an apparatus for audio superposition. As shown in fig. 5, the apparatus includes a detection unit 31, a discarding unit 32, an enhancement processing unit 33, an allocation unit 34, an equalization processing unit 35 and a superimposing unit 36, where:

the detection unit 31 is configured to perform silence detection on the at least two received paths of original audio signals to obtain a mute signal and a non-mute signal;

the discarding unit 32 is configured to discard the mute signal detected by the detection unit 31 to obtain at least two paths of first audio signals;

the enhancement processing unit 33 is configured to perform voice enhancement processing on the at least two paths of first audio signals obtained by the discarding unit 32 to obtain at least two paths of second audio signals;

the allocation unit 34 is configured to assign weight values to signals of different components in the at least two paths of second audio signals obtained by the enhancement processing unit 33;

the equalization processing unit 35 is configured to perform voice equalization processing on the at least two paths of second audio signals obtained by the enhancement processing unit 33 according to the weight values assigned by the allocation unit 34, to obtain at least two paths of third audio signals; and

the superimposing unit 36 is configured to superpose the at least two paths of third audio signals obtained by the equalization processing unit 35 to obtain a superposed audio signal.

Further, as shown in fig. 6, the detection unit 31 includes:

a first recognition module 311, configured to perform voice recognition on the original audio signal to obtain the waveform amplitude of the original audio signal; and

a first comparison module 312, configured to compare the waveform amplitude of the original audio signal obtained by the first recognition module 311 with a preset mute amplitude threshold to distinguish the mute signal from the non-mute signal.

Further, as shown in fig. 6, the enhancement processing unit 33 includes:

a second recognition module 331, configured to perform voice recognition on the at least two paths of first audio signals to obtain the waveform amplitude of the first audio signals;

a second comparison module 332, configured to compare the waveform amplitude of the first audio signal obtained by the second recognition module 331 with a preset noise amplitude threshold to distinguish a voice signal from a noise signal; and

a removing module 333, configured to remove the noise signal identified by the second comparison module 332 from the first audio signal.

Further, the allocation unit 34 is also configured to assign a higher weight value to a signal with a smaller volume.

The equalization processing unit 35 is also configured to equalize the volumes of different component signals according to the different weight values assigned by the allocation unit 34.

Further, as shown in fig. 6, the apparatus may also include an optimization processing unit 37, configured to, after the superimposing unit 36 obtains the superposed audio signal, perform noise reduction processing, distortion processing and volume enhancement processing on it to obtain an optimized superposed audio signal.

The audio superposition apparatus provided by the invention performs silence detection, voice enhancement processing and voice equalization processing on at least two received paths of original audio signals to obtain processed audio signals, and superposes the processed audio signals to obtain a superposed audio signal. Compared with the prior art, the invention removes mute signals from the audio signals through silence detection, thereby reducing the bandwidth occupied by data transmission. The invention also improves the transmission quality of the audio signals through voice enhancement processing, voice equalization processing and the like, preserving call quality for all conference participants while reducing bandwidth usage.

The signals of different volumes in the second audio signal are equalized so that all signals reach the same or similar volume, reducing the volume differences between them.

Noise reduction, distortion processing, volume enhancement and similar processing optimize the superposed audio signal and improve audio quality in network communication.

When the network bandwidth jumps, and especially when it decreases, the optimized superposed audio signal is split into several data packets that can each be transmitted within the new bandwidth. This avoids the network congestion, and the resulting transmission delay, that a bandwidth jump causes in the prior art, and further ensures normal transmission of communication data.

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

It will be appreciated that related features of the method and apparatus described above may be cross-referenced. In addition, terms such as "first" and "second" in the above embodiments are used to distinguish the embodiments and do not indicate that one is better than another.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the audio superposition apparatus according to embodiments of the invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing some or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

Claims

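1. A method of audio superposition, the method comprising:

performing silence detection on at least two paths of received original audio signals to obtain a mute signal and a non-mute signal;

discarding the mute signal to obtain at least two paths of first audio signals;

performing speech enhancement processing on the at least two paths of first audio signals to obtain at least two paths of second audio signals;

assigning weight values to signals of different components of the at least two paths of second audio signals;

performing voice equalization processing on the at least two paths of second audio signals according to the assigned weight values to obtain at least two paths of third audio signals; and

superimposing the at least two paths of third audio signals to obtain a superimposed audio signal.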
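The point of discarding the mute signal is that only paths that are actually active take part in the superposition. A minimal sketch of that idea, assuming each path delivers an equal-length frame of 16-bit PCM samples, that a per-frame mute decision has already been made, and that superposition is a plain sample-wise sum saturated to the 16-bit range; the function name `superimpose_active_paths` is hypothetical and not taken from the patent:

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def superimpose_active_paths(frames: Sequence[Sequence[int]],
                             mute_flags: Sequence[bool]) -> List[int]:
    """Sum the non-mute frames sample by sample, saturating to the int16 range.

    frames: one equal-length frame of 16-bit PCM samples per path (assumed layout).
    mute_flags: True where the corresponding path was detected as mute.
    """
    if len(frames) != len(mute_flags):
        raise ValueError("one mute flag is required per path")
    # Discard the mute paths; only the remaining ones take part in the mix.
    active = [f for f, mute in zip(frames, mute_flags) if not mute]
    if not active:
        return [0] * (len(frames[0]) if frames else 0)
    length = len(active[0])
    if any(len(f) != length for f in active):
        raise ValueError("all frames must have the same length")
    mixed = []
    for i in range(length):
        total = sum(f[i] for f in active)
        # Clamp instead of letting the sum wrap around outside 16 bits.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

For example, `superimpose_active_paths([[1000, -2000], [30000, 30000], [5, -5]], [False, False, True])` ignores the third (mute) path and returns `[31000, 28000]`.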

2. The method of claim 1, wherein the performing silence detection on the received at least two original audio signals comprises:

carrying out voice recognition on the original audio signal to obtain the waveform amplitude of the original audio signal;

and comparing the waveform amplitude of the original audio signal with a preset mute amplitude threshold to distinguish the mute signal from the non-mute signal.
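Claim 2 only states that the waveform amplitude is compared with a preset mute amplitude threshold. A minimal sketch, assuming the "waveform amplitude" of a frame is its peak absolute sample value and that a frame whose peak lies below the threshold counts as mute; the helper names, and the choice of peak rather than, for example, RMS amplitude, are assumptions:

```python
from typing import List, Sequence, Tuple


def peak_amplitude(frame: Sequence[int]) -> int:
    """Largest absolute sample value in the frame (one possible 'waveform amplitude')."""
    return max((abs(s) for s in frame), default=0)


def is_mute(frame: Sequence[int], mute_threshold: int) -> bool:
    """Treat the frame as mute when its peak amplitude is below the threshold."""
    return peak_amplitude(frame) < mute_threshold


def split_mute(frames: Sequence[Sequence[int]],
               mute_threshold: int) -> Tuple[List[int], List[int]]:
    """Return (mute_indices, non_mute_indices) for a list of per-path frames."""
    mute, non_mute = [], []
    for idx, frame in enumerate(frames):
        (mute if is_mute(frame, mute_threshold) else non_mute).append(idx)
    return mute, non_mute
```

With a threshold of 200, `split_mute([[10, -20, 5], [900, -1500, 30], []], 200)` classifies paths 0 and 2 as mute and path 1 as non-mute.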

3. The method according to claim 1, wherein said performing speech enhancement processing on said at least two first audio signals comprises:

performing voice recognition on the at least two paths of first audio signals to obtain the waveform amplitude of the first audio signals;

comparing the waveform amplitude of the first audio signal with a preset noise amplitude threshold to distinguish a voice signal from a noise signal; and

removing the noise signal from the first audio signal.
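One simple reading of claim 3 is a sample-level noise gate: samples whose absolute amplitude is below the preset noise amplitude threshold are treated as noise and set to zero, while louder samples are kept as voice. The sketch below assumes that interpretation; the claim does not specify the granularity or how the noise is removed, and practical speech enhancement usually operates on short windows or in the frequency domain. The function names are hypothetical.

```python
from typing import List, Sequence


def remove_noise(frame: Sequence[int], noise_threshold: int) -> List[int]:
    """Zero out samples classified as noise (|x| < noise_threshold); keep voice samples."""
    return [s if abs(s) >= noise_threshold else 0 for s in frame]


def enhance_paths(frames: Sequence[Sequence[int]],
                  noise_threshold: int) -> List[List[int]]:
    """Apply the noise gate to every first audio signal, yielding the second audio signals."""
    return [remove_noise(f, noise_threshold) for f in frames]
```

For instance, `remove_noise([50, -300, 120, 2000], 150)` keeps only the two samples at or above the threshold and returns `[0, -300, 0, 2000]`.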

4. The method according to claim 1, wherein said assigning weight values to signals of different components of said at least two second audio signals comprises:

assigning a higher weight value to a signal with a smaller volume;

and wherein said performing voice equalization processing on the at least two paths of second audio signals according to the assigned weight values comprises:

equalizing the volumes of the different component signals according to their respective weight values.
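Claim 4 requires only that a quieter component receives a higher weight and that volumes are equalized according to those weights. A minimal sketch, assuming each path's volume is measured as its RMS level and that its weight is `target_rms / rms`, so every non-silent path is scaled toward a common target level; `target_rms` and the function names are hypothetical parameters chosen for illustration:

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of a frame, used here as its 'volume'."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def assign_weights(frames: Sequence[Sequence[float]], target_rms: float) -> List[float]:
    """Quieter paths get larger weights (weight = target_rms / rms); silent paths get 0."""
    weights = []
    for frame in frames:
        level = rms(frame)
        weights.append(target_rms / level if level > 0 else 0.0)
    return weights


def equalize(frames: Sequence[Sequence[float]],
             weights: Sequence[float]) -> List[List[float]]:
    """Scale each path by its weight, producing the third audio signals."""
    return [[s * w for s in frame] for frame, w in zip(frames, weights)]
```

With `target_rms = 200`, a path at RMS 100 gets weight 2.0 and a path at RMS 400 gets weight 0.5, so both end up at the same level after `equalize`. Making the weight inversely proportional to volume is one way to satisfy "higher weight for smaller volume"; a bounded gain would normally be added in practice to avoid amplifying very quiet paths excessively.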

5. The method according to any of claims 1 to 4, wherein after said obtaining the superimposed audio signal, the method further comprises:

performing noise reduction processing, distortion processing and volume enhancement processing on the superimposed audio signal to obtain an optimized superimposed audio signal.
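Claim 5 does not say how the three optimization steps are performed. Purely as an illustration, the sketch below assumes that noise reduction is a small noise gate, that volume enhancement is a fixed linear gain, and that "distortion processing" means limiting the boosted signal to the 16-bit range so that the gain cannot cause wrap-around distortion. Because a limiter only makes sense after the gain, the sketch applies gate, gain and then limit, which differs from the order in which the claim lists the steps. The function name and the `noise_floor` and `gain` parameters are hypothetical.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def optimize_mix(mixed: Sequence[int], noise_floor: int, gain: float) -> List[int]:
    """Post-process a superimposed frame: gate, amplify, then limit to int16."""
    out = []
    for s in mixed:
        # Noise reduction: suppress residual low-level samples.
        if abs(s) < noise_floor:
            s = 0
        # Volume enhancement: fixed linear gain, rounded back to an integer sample.
        boosted = int(round(s * gain))
        # Distortion handling: clamp instead of letting the value wrap around.
        out.append(max(INT16_MIN, min(INT16_MAX, boosted)))
    return out
```

For example, `optimize_mix([30, -1000, 20000], noise_floor=50, gain=2.0)` gates the first sample, doubles the second and limits the third, returning `[0, -2000, 32767]`.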

6. An apparatus for audio superposition, the apparatus comprising:

7. The apparatus of claim 6, wherein the detection unit comprises:

a first identification module, configured to perform voice identification on the original audio signal to obtain the waveform amplitude of the original audio signal; and

a first comparison module, configured to compare the waveform amplitude of the original audio signal obtained by the first identification module with a preset mute amplitude threshold, so as to distinguish the mute signal from the non-mute signal.

8. The apparatus of claim 6, wherein the enhancement processing unit comprises:

a second identification module, configured to perform voice identification on the at least two paths of first audio signals to obtain the waveform amplitude of the first audio signals;

a second comparison module, configured to compare the waveform amplitude of the first audio signal obtained by the second identification module with a preset noise amplitude threshold, so as to distinguish a voice signal from a noise signal; and

an elimination module, configured to remove the noise signal identified by the second comparison module from the first audio signal.

9. The apparatus according to claim 6, wherein the assigning unit is configured to assign a higher weight value to a signal with a smaller volume; and

the equalization processing unit is configured to equalize the volumes of the different component signals according to the weight values assigned by the assigning unit.

10. The apparatus of any one of claims 6 to 9, further comprising:

an optimization processing unit, configured to perform noise reduction processing, distortion processing and volume enhancement processing on the superimposed audio signal after the superposition unit obtains it, so as to obtain an optimized superimposed audio signal.
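Claims 6 to 10 recast the method as a set of cooperating units. One way to picture that structure in software is a pipeline object whose stages correspond to the named units, with the assigning and optimization units pluggable. The sketch below assumes this design; the class, attribute and default values are hypothetical, and the default stages (peak-amplitude mute test, sample-level noise gate, unit weights, saturating sum, identity post-processing) are deliberately simple placeholders rather than the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Frame = List[int]


def _clamp16(x: float) -> int:
    """Round to the nearest integer and saturate to the int16 range."""
    return max(-32768, min(32767, int(round(x))))


@dataclass
class AudioSuperpositionApparatus:
    """Hypothetical software layout of the units named in claims 7 to 10."""
    mute_threshold: int = 200
    noise_threshold: int = 100
    # Assigning unit: maps the second audio signals to one weight per path.
    assign: Callable[[Sequence[Frame]], List[float]] = field(
        default=lambda frames: [1.0] * len(frames))
    # Optimization processing unit: post-processes the superimposed frame.
    optimize: Callable[[Frame], Frame] = field(default=lambda frame: list(frame))

    def detect(self, frames: Sequence[Frame]) -> List[bool]:
        """Detection unit: True where a path's peak amplitude is below the mute threshold."""
        return [max((abs(s) for s in f), default=0) < self.mute_threshold
                for f in frames]

    def enhance(self, frame: Frame) -> Frame:
        """Enhancement processing unit: zero samples below the noise threshold."""
        return [s if abs(s) >= self.noise_threshold else 0 for s in frame]

    def process(self, frames: Sequence[Frame]) -> Frame:
        """Run detection, discarding, enhancement, weighting, equalization, superposition."""
        mute = self.detect(frames)
        first = [list(f) for f, m in zip(frames, mute) if not m]  # mute paths discarded
        if not first:
            return [0] * (len(frames[0]) if frames else 0)
        second = [self.enhance(f) for f in first]
        weights = self.assign(second)
        third = [[s * w for s in f] for f, w in zip(second, weights)]  # equalization unit
        mixed = [_clamp16(sum(col)) for col in zip(*third)]  # superposition unit
        return self.optimize(mixed)
```

Swapping in a different assigning unit, for example `AudioSuperpositionApparatus(assign=lambda fs: [0.5] * len(fs))`, changes only the weighting stage while the rest of the pipeline stays the same, which mirrors how the claims treat each unit as a separable component.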