CN109327633B - Sound mixing method, device, equipment and storage medium

Info

Publication number: CN109327633B
Application number: CN201710641953.XA
Authority: CN
Inventors: 吴威麒; 袁荣喜; 苗广艺; 张凯磊
Original assignee: Suzhou Qianwen Wandaba Education Technology Co Ltd
Current assignee: Suzhou Qianwen wandaba Education Technology Co., Ltd
Priority date: 2017-07-31
Filing date: 2017-07-31
Publication date: 2020-09-22
Anticipated expiration: 2037-07-31
Also published as: CN109327633A

Abstract

The embodiments of the invention disclose a sound mixing method, apparatus, device, and storage medium. The sound mixing method comprises: receiving audio streams of at least two channels; calculating the waveform envelope strength of the amplitude or energy of each audio stream; assigning a mixing weight to each audio stream according to the proportion of that stream's waveform envelope strength in the sum of the waveform envelope strengths of all the audio streams, wherein the degree of positive correlation between the mixing weight and the proportion is within a preset range; and mixing the audio streams according to their corresponding mixing weights to generate the mixed audio stream. Because the mixing weight is assigned according to this proportion, a human-voice audio stream with higher amplitude or energy has a higher waveform envelope strength and is therefore assigned a larger mixing weight, which makes the mix clearer.

Description

Sound mixing method, device, equipment and storage medium

Technical Field

The present invention relates to the field of audio mixing technologies, and in particular, to an audio mixing method, apparatus, device, and storage medium.

Background

In a VoIP conference call, several people take part in a conversation. For any given receiving party to hear everyone else, the audio streams of all the other participants must be mixed. Mixing can be performed on the server, which saves bandwidth and reduces the computing load on the clients but increases the load on the server; this arrangement suits calls in which many people participate at the same time. Mixing can also be performed on the client, which places no load on the server and suits calls among a few people.

Wherever the mixing is performed, the listener must be able to hear the speaker's voice clearly. The classical mixing algorithm in the prior art is linear superposition, as follows:

Assuming that M channels are active and the audio stream generated by each channel has length N, the audio stream of the i-th channel is denoted x_i(n), where i = 1 to M and n = 1 to N.

Denoting the mixing result as mix(n), the linear mixing is calculated as:

This algorithm uses purely linear processing. It is simple and effective and introduces no obvious distortion, but it does not distinguish human-voice audio streams from noise audio streams. When the number of channels grows, that is, when M is large, the volume of the human voice is noticeably weakened and the user experience is poor.
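The formula for mix(n) is not reproduced in this text. The sketch below assumes the averaged form of linear superposition, mix(n) = (1/M) × Σ x_i(n), which is one form consistent with the remark that the voice weakens as M grows. The function name `linear_mix` and the sample values are hypothetical and serve only to illustrate the effect.

```python
def linear_mix(streams):
    """Average M equal-length streams sample by sample (assumed form of mix(n))."""
    m = len(streams)
    length = len(streams[0])
    return [sum(s[n] for s in streams) / m for n in range(length)]


# One speaker at constant amplitude 0.8; every other channel is silent.
voice = [0.8, 0.8, 0.8, 0.8]
silence = [0.0, 0.0, 0.0, 0.0]

mix_2 = linear_mix([voice, silence])          # M = 2
mix_8 = linear_mix([voice] + [silence] * 7)   # M = 8

print(mix_2[0], mix_8[0])
```

Under this assumption the single voice is scaled by 1/M, so adding idle channels lowers its level even though nobody else is speaking.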

Disclosure of Invention

Embodiments of the present invention provide a sound mixing method, apparatus, device, and storage medium, which can effectively solve the problem of volume reduction after mixing multiple channels of sound, highlight the voice of a speaker, and reduce the noise volume.

In a first aspect, an embodiment of the present invention provides a sound mixing method, including:

receiving audio streams of at least two channels;

obtaining the waveform envelope strength of each audio stream;

distributing the mixing weight of each audio stream according to the proportion of the waveform envelope strength of the audio stream to the sum of the waveform envelope strengths of all the audio streams;

and performing sound mixing according to each audio stream and the sound mixing weight corresponding to the audio stream to generate the audio stream after sound mixing.

Further, before the obtaining the waveform envelope strength of each audio stream, the method further includes:

adjusting the amplitudes of all the audio streams to enable the amplitudes of the audio streams which accord with the preset amplitude condition to fluctuate within a first threshold range, and the amplitudes of other audio streams to fluctuate within a second threshold range;

wherein the first threshold range is greater than the second threshold range.

Further, the adjusting the amplitudes of the audio streams in all the channels so that the amplitude of the audio stream meeting the preset amplitude condition fluctuates within a first threshold range and the amplitudes of the other audio streams fluctuate within a second threshold range includes:

searching a first amplitude audio stream in all audio streams;

updating the first amplitude audio stream by normalizing the amplitude of the first amplitude audio stream to a first threshold range;

updating the other audio stream by normalizing the amplitude of the other audio stream to a second threshold range.

Further, the step of adjusting the amplitudes of the audio streams in all the channels so that the amplitude of the audio stream meeting the preset condition fluctuates within a first threshold range and the amplitudes of the other audio streams fluctuate within a second threshold range includes:

judging whether the difference between the amplitudes of a first amplitude audio stream and a second amplitude audio stream is within a preset amplitude range, wherein the amplitude of the first amplitude audio stream is larger than that of the second amplitude audio stream, and the first amplitude audio stream and the second amplitude audio stream are two audio streams with the largest amplitudes;

if so, updating the first amplitude audio stream and the second amplitude audio stream by normalizing the first amplitude audio stream and the second amplitude audio stream to a first threshold range;

if not, updating the first amplitude audio stream by normalizing it to the first threshold range, and updating the second amplitude audio stream and the other audio streams by normalizing them to the second threshold range.

Further, obtaining the waveform envelope strength of each audio stream includes:

performing noise reduction on each audio stream to generate noise-reduced audio streams;

and calculating the waveform envelope strength of the noise-reduced audio streams.

Further, obtaining the waveform envelope strength of each audio stream includes:

obtaining the waveform envelope strength of each audio stream by Hilbert transform or low-pass filtering.

Further, the obtaining of the waveform envelope strength of each audio stream by low-pass filtering includes:

calculating the waveform envelope strength of each audio stream at time n according to the following formula:

Env(i,n) = (1 - coeff(n)) × Env(i,n-1) + coeff(n) × a(n)

where Env(i,n) is the waveform envelope strength of the i-th audio stream at time n; Env(i,n-1) is its waveform envelope strength at time n-1; a(n) is the absolute value of the amplitude of the audio stream at time n; and coeff(n) is the waveform envelope coefficient at time n. coeff(n) is set to at when the amplitude of the audio stream at time n is greater than or equal to its amplitude at time n-1, and to rt when the amplitude at time n is smaller than that at time n-1. Both at and rt are empirical values.

In a second aspect, an embodiment of the present invention further provides a sound mixing apparatus, including:

the audio stream receiving module is used for receiving audio streams of at least two channels;

the waveform envelope intensity calculating module is used for calculating the waveform envelope intensity of the amplitude or the energy of each audio stream;

the audio mixing weight distribution module is used for distributing the audio mixing weight of each audio stream according to the proportion of the waveform envelope strength of each audio stream in the sum of the waveform envelope strengths of all the audio streams, and the positive correlation degree of the audio mixing weight and the proportion is in a preset range;

and the audio mixing module is used for mixing audio according to each audio stream and the audio mixing weight corresponding to the audio stream to generate audio streams after audio mixing.

In a third aspect, an embodiment of the present invention further provides a mixing device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the mixing method according to the first aspect.

In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the mixing method according to the first aspect.

According to the sound mixing method, the device, the equipment and the storage medium provided by the embodiment of the invention, the sound mixing weight of each audio stream is distributed through the proportion of the waveform envelope strength of each audio stream in the sum of the waveform envelope strengths of all the audio streams, and the positive correlation between the sound mixing weight and the proportion is in the preset range.

Drawings

FIG. 1 is a flowchart illustrating a mixing method according to a first embodiment of the present invention;

FIG. 2 is a flowchart of a mixing method according to a second embodiment of the present invention;

FIG. 3 is a flowchart of a mixing method according to a second embodiment of the present invention;

FIG. 4 is a flowchart of a mixing method according to a third embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an audio mixing apparatus according to a fourth embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a mixing device according to a fifth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

For the convenience of understanding the embodiments of the present invention, the mixing method disclosed in the embodiments of the present invention will be described in detail first.

Example one

FIG. 1 is a flowchart of a sound mixing method in the first embodiment of the present invention. The method is applicable to online voice calls such as multi-party teleconferences or network conferences, can be executed by software and/or hardware, and can be deployed on a server or a client. As shown in FIG. 1, the sound mixing method provided in this embodiment includes:

S102, receiving audio streams of at least two channels.

Audio streams are received from all channels that are in the working state in the teleconference or network conference; the number of such channels depends on the number of participants. When the number of channels in the working state is above a preset channel-count threshold, the mixing method of this embodiment is preferably executed by the server; when it is below the threshold, either the server or the client may execute the method. In addition, in a teleconference it is usually the case that one person, or sometimes two people, speak at any given moment, while the other channels carry only noise.

S104, obtaining the waveform envelope strength of the amplitude or energy of each audio stream.

The larger the amplitude of an audio stream, the larger its waveform envelope strength, so this embodiment uses the waveform envelope strength to describe the trend of the audio stream. The amplitude or energy of the audio stream on a channel carrying a human voice is generally larger than that of the noise audio stream on a channel without a voice, so the waveform envelope strength of the former is larger than that of the latter.

S106, assigning the mixing weight of each audio stream according to the proportion of the waveform envelope strength of that audio stream in the sum of the waveform envelope strengths of all the audio streams, wherein the degree of positive correlation between the mixing weight and the proportion is within a preset range.

In this step, an audio stream with a larger amplitude has a larger waveform envelope strength and is therefore assigned a larger mixing weight, which raises its volume in the mix; that is, the human voice, whose audio stream has the larger amplitude, stands out in the mix.

S108, mixing according to each audio stream and the mixing weight corresponding to that audio stream to generate the mixed audio stream.

The product of each audio stream and its mixing weight is computed, and the sum of all the products is taken as the mixed audio stream.

For example, suppose n channels are in the working state in a teleconference. The audio streams of these n channels are acquired, the waveform envelope strength of each audio stream and the sum of the waveform envelope strengths are obtained, and a mixing weight is assigned to each channel according to the waveform envelope strength of its audio stream. The product of each audio stream and its mixing weight is then computed, and the sum of all the products is the mixed audio stream.
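Steps S106 and S108 can be sketched for a single time instant as follows. The sketch assumes the simplest positively correlated choice, namely that the mixing weight equals the stream's share of the summed envelope strength; the text only requires the correlation to lie within a preset range, so other mappings are possible. The function names, the equal-weight fallback for silence, and the sample values are all hypothetical.

```python
def mixing_weights(envelopes):
    """Return each stream's share of the summed envelope strength at one instant."""
    total = sum(envelopes)
    if total == 0.0:
        # All channels silent: fall back to equal weights (an assumption).
        return [1.0 / len(envelopes)] * len(envelopes)
    return [e / total for e in envelopes]


def weighted_mix(samples, envelopes):
    """Mix one sample per stream: the sum of each sample times its mixing weight."""
    weights = mixing_weights(envelopes)
    return sum(w * x for w, x in zip(weights, samples))


# Three channels at one instant: one voice and two low-level noise channels.
samples = [0.6, 0.05, -0.05]
envelopes = [0.6, 0.05, 0.05]

weights = mixing_weights(envelopes)
mixed = weighted_mix(samples, envelopes)
print(weights, mixed)
```

Because the weights sum to one, the voice channel keeps most of its level here, while the two noise channels contribute little to the output.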

In summary, this embodiment assigns the mixing weight of each audio stream according to the proportion of its waveform envelope strength in the sum of the waveform envelope strengths of all the audio streams, with the degree of positive correlation between the mixing weight and the proportion within a preset range. Compared with the prior art, in which the volume of the human voice is reduced after linear mixing, a human-voice audio stream with higher amplitude or energy has a higher waveform envelope strength and is assigned a larger mixing weight, so the mix is clearer and both practicality and user experience are improved.

Example two

FIG. 2 is a flowchart of a mixing method in the second embodiment of the present invention. As shown in FIG. 2, compared with the previous embodiment, the mixing method provided in this embodiment adds the following operation before obtaining the waveform envelope strength of the amplitude or energy of each audio stream: adjusting the amplitudes of all the audio streams so that the amplitude of an audio stream meeting a preset amplitude condition fluctuates within a first threshold range and the amplitudes of the other audio streams fluctuate within a second threshold range, wherein the first threshold range is greater than the second threshold range.

This step controls the amplitude variation range of each audio stream. The audio stream meeting the preset amplitude condition is treated as the human-voice audio stream and fluctuates within a larger range, while the audio streams of the other channels are treated as noise audio streams and vary within a relatively small range. As a result, the audio stream meeting the preset amplitude condition is assigned a larger mixing weight and the other audio streams are assigned smaller mixing weights, so that it is prominent in the output mix, participants can hear the speaker with good sound quality, and the user experience is improved.

In this embodiment, the first threshold range is preferably -0.8 to 0.8 and the second threshold range is preferably -0.5 to 0.5. The appropriate ranges depend on the specific use environment, such as the quality of the equipment and whether the participant's surroundings are quiet, so both ranges may be set according to actual conditions.

Fig. 3 is a flowchart of a mixing method in the second embodiment of the present invention. As shown in fig. 3, adjusting the amplitudes of all audio streams so that the amplitude of the audio stream meeting the preset amplitude condition fluctuates within a first threshold range and the amplitudes of other audio streams fluctuate within a second threshold range preferably includes:

S1031, searching all the audio streams for a first amplitude audio stream, wherein the first amplitude audio stream is the audio stream with the largest amplitude among all the audio streams.

S1032, updating the first amplitude audio stream by normalizing it to the first threshold range; updating the other audio streams by normalizing them to the second threshold range.

In this embodiment, the first amplitude audio stream is assumed by default to be a human voice and the other audio streams are assumed to be noise. Letting the first amplitude audio stream fluctuate within a larger range and the other audio streams vary within a relatively small range highlights the amplitude variation of the first amplitude audio stream and, in turn, its waveform envelope strength. The first amplitude audio stream is therefore assigned a larger mixing weight during mixing, while the other audio streams, regarded as noise, are assigned smaller mixing weights. The first amplitude audio stream is thus prominent in the output mix, and participants can hear the speaker with good sound quality. This approach suits a teleconference or network conference in which only one person speaks at a time.
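Steps S1031 and S1032 can be sketched as below. The text does not define how a stream's amplitude is measured or what normalizing to a range means exactly; this sketch assumes the amplitude is the peak absolute sample of a frame and that normalizing means rescaling the frame so its peak equals the upper bound of the range. The function names are hypothetical, and the bounds 0.8 and 0.5 are the preferred values given above.

```python
def peak(stream):
    """Largest absolute sample value, used here as the stream's amplitude."""
    return max(abs(x) for x in stream)


def scale_to_peak(stream, target):
    """Rescale so the peak equals target; an all-zero stream is returned unchanged."""
    p = peak(stream)
    return list(stream) if p == 0.0 else [x * target / p for x in stream]


def normalize_single_speaker(streams, first_range=0.8, second_range=0.5):
    """S1031/S1032: the loudest stream goes to first_range, the rest to second_range."""
    # S1031: index of the stream with the largest amplitude.
    loudest = max(range(len(streams)), key=lambda i: peak(streams[i]))
    # S1032: normalize each stream to the range that applies to it.
    return [
        scale_to_peak(s, first_range if i == loudest else second_range)
        for i, s in enumerate(streams)
    ]


voice = [0.2, -0.4, 0.3]
noise = [0.05, -0.1, 0.02]
adjusted = normalize_single_speaker([voice, noise])
print([peak(s) for s in adjusted])
```

Under this reading a quiet stream is scaled up to the second range as well as a loud one scaled down, so an implementation might prefer to only attenuate; the text leaves that choice open.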

In summary, the present embodiment controls the amplitude variation range of each audio stream by making the amplitude of the audio stream that meets the preset amplitude condition fluctuate within the first threshold range, and making the amplitudes of other audio streams fluctuate within the second threshold range. The audio stream meeting the preset amplitude condition is used as the voice, the audio stream of the voice is further highlighted through normalization, the audio stream of noise is reduced, the difference between the voice audio stream and the noise audio stream is enlarged, the difference between the intensity of the waveform envelope of the voice audio stream and the intensity of the waveform envelope of the noise audio stream is increased, the voice can obtain larger sound mixing weight, the voice can be well highlighted in the audio after sound mixing, and the improvement of user experience is facilitated.

Example three

FIG. 4 is a flowchart of a mixing method in the third embodiment of the present invention, which addresses the situation in which two people often speak at the same time in a teleconference or network conference. As shown in FIG. 4, the mixing method provided in this embodiment optimizes the step, described in the foregoing embodiments, of adjusting the amplitudes of all the audio streams so that the amplitude of an audio stream meeting a preset amplitude condition fluctuates within a first threshold range and the amplitudes of the other audio streams fluctuate within a second threshold range. The optimized step includes:

S1033, obtaining a first amplitude audio stream and a second amplitude audio stream, wherein the amplitude of the first amplitude audio stream is greater than that of the second amplitude audio stream, and the first amplitude audio stream and the second amplitude audio stream are the two audio streams with the largest amplitudes.

S1034, judging whether the difference of the amplitudes between the first amplitude audio stream and the second amplitude audio stream is in a preset amplitude range, if so, executing the step S1035; if not, step S1036 is executed.

S1035, updating the first amplitude audio stream and the second amplitude audio stream by normalizing the first amplitude audio stream and the second amplitude audio stream to the first threshold range.

S1036, updating the first amplitude audio stream by normalizing it to the first threshold range; updating the second amplitude audio stream and the other audio streams by normalizing them to the second threshold range.

The preset amplitude range in this embodiment captures the case in which the amplitude difference between the first amplitude audio stream and the second amplitude audio stream is small, so that both streams need to be highlighted at the same time. The specific range may be set according to the use scenario and is not limited by this embodiment.
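The decision in steps S1033 to S1036 can be sketched as follows, under the same assumptions as before: amplitude is taken as the peak absolute sample, and normalizing means rescaling so the peak equals the upper bound of the range. The threshold `max_diff=0.1` is a hypothetical stand-in for the preset amplitude range, and sending the remaining streams to the second range in the S1035 branch is also an assumption, since the text does not say what happens to them there.

```python
def peak(stream):
    """Largest absolute sample value, used here as the stream's amplitude."""
    return max(abs(x) for x in stream)


def scale_to_peak(stream, target):
    """Rescale so the peak equals target; an all-zero stream is returned unchanged."""
    p = peak(stream)
    return list(stream) if p == 0.0 else [x * target / p for x in stream]


def normalize_two_speakers(streams, max_diff=0.1, first_range=0.8, second_range=0.5):
    """S1033 to S1036: promote the two loudest streams together only if they are close."""
    # S1033: indices of the streams ordered from loudest to quietest.
    order = sorted(range(len(streams)), key=lambda i: peak(streams[i]), reverse=True)
    first, second = order[0], order[1]
    # S1034: is the amplitude difference within the preset range?
    close = peak(streams[first]) - peak(streams[second]) <= max_diff
    # S1035 promotes both streams; S1036 promotes only the loudest.
    promoted = {first, second} if close else {first}
    return [
        scale_to_peak(s, first_range if i in promoted else second_range)
        for i, s in enumerate(streams)
    ]


a = [0.5, -0.6]
b = [0.55, -0.5]
quiet = [0.2, -0.1]
noise = [0.05, 0.02]

both = normalize_two_speakers([a, b, noise])        # peaks 0.6 and 0.55 are close
single = normalize_two_speakers([a, quiet, noise])  # peaks 0.6 and 0.2 are not
```

In the first call both `a` and `b` are raised to the first range; in the second call only `a` is, and `quiet` is treated like the other streams.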

In step S1035, the first amplitude audio stream and the second amplitude audio stream are normalized to the first threshold range, and each original stream is replaced by its normalized version. Relative to the other audio streams, both streams then have a larger waveform envelope strength and are assigned larger mixing weights, so both are highlighted in the mixed audio stream and participants can hear both speakers with good sound quality. This suits a teleconference or network conference in which two people speak at the same time.

In order to adapt to the situation that two persons speak simultaneously in a teleconference or a network conference but the primary and secondary situations need to be distinguished, the first amplitude audio stream is preferably normalized to a first threshold range, the second amplitude audio stream is preferably normalized to a third threshold range, other audio streams are normalized to the second threshold range, the first threshold range is larger than the third threshold range, and the third threshold range is larger than the second threshold range. The voice of two speakers can be highlighted simultaneously, the primary and secondary speakers can be distinguished, and the user experience is good.

Preferably, before the audio streams are acquired, the channels are labeled as primary or secondary. When the audio stream labeled as the primary channel is the first amplitude audio stream, it is normalized to the first threshold range, the second amplitude audio stream is normalized to a third threshold range, and the other audio streams are normalized to the second threshold range. When the audio stream labeled as the primary channel is the second amplitude audio stream, it is still normalized to the first threshold range, the first amplitude audio stream is normalized to the third threshold range, and the other audio streams are normalized to the second threshold range. The first threshold range is larger than the third threshold range, and the third threshold range is larger than the second threshold range. In this way the voices of both speakers are highlighted while the primary channel remains the loudest, which suits, for example, the host of a conference.

In summary, in the embodiment, the amplitude of the audio stream that meets the preset amplitude condition fluctuates in the first threshold range, and the amplitudes of the other audio streams fluctuate in the second threshold range to control the amplitude variation range of each audio stream, the audio stream that meets the preset amplitude condition is used as a voice, the amplitude of the audio stream of the voice is further emphasized by normalization, the amplitude of the audio stream of the noise is reduced, that is, the waveform envelope intensity of the audio stream that meets the preset amplitude condition is increased, and then the difference between the waveform envelope intensity of the audio stream of the voice and the waveform envelope intensity of the audio stream of the noise is enlarged, so that the voice can obtain a larger mixing weight, the voice can be well emphasized in the audio after mixing, and the improvement of user experience is facilitated.

In addition, this embodiment further optimizes how the waveform envelope strength of each audio stream is obtained in the foregoing embodiments: the waveform envelope strength of each audio stream is obtained by Hilbert transform or low-pass filtering. These two methods are preferred but not required; any other method capable of obtaining the waveform envelope strength of an audio stream may be used.

Taking low-pass filtering as an example, this embodiment calculates the waveform envelope strength of each audio stream at time n as follows:

Env(i,n) = (1 - coeff(n)) × Env(i,n-1) + coeff(n) × a(n)

where Env(i,n) is the waveform envelope strength of the i-th audio stream at time n; Env(i,n-1) is its waveform envelope strength at time n-1; a(n) is the absolute value of the amplitude of the audio stream at time n; and coeff(n) is the waveform envelope coefficient at time n. coeff(n) is set to at when the amplitude of the audio stream at time n is greater than or equal to its amplitude at time n-1, and to rt when the amplitude at time n is smaller than that at time n-1. Both at and rt are empirical values that are set and tuned according to the hardware quality of the mixing device, the conditions of the teleconference or network conference, or other factors that may affect the quality of the audio stream.
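The recursion above can be sketched for one stream as follows. The sketch reads the rule for coeff(n) literally, comparing a(n) with a(n-1) to choose between at and rt. The starting envelope of zero, the treatment of the first sample (compared against a previous amplitude of zero), and the values of `at` and `rt` used in the call are assumptions chosen only for illustration; the text describes at and rt as empirical values.

```python
def envelope(samples, at, rt, initial=0.0):
    """Env(n) = (1 - coeff(n)) * Env(n-1) + coeff(n) * a(n) for one stream."""
    env = []
    prev_env = initial   # Env(n-1); starts at `initial` (an assumption)
    prev_amp = 0.0       # a(n-1); zero before the first sample (an assumption)
    for x in samples:
        a = abs(x)
        # Rising or equal amplitude uses at; falling amplitude uses rt.
        coeff = at if a >= prev_amp else rt
        prev_env = (1.0 - coeff) * prev_env + coeff * a
        env.append(prev_env)
        prev_amp = a
    return env


# A short burst followed by silence, with illustrative coefficients.
env = envelope([1.0, -1.0, 0.0, 0.0], at=0.5, rt=0.25)
print(env)  # [0.5, 0.75, 0.5625, 0.28125]
```

The envelope rises toward the burst level and decays after it ends rather than jumping, which is the smoothing behaviour that low-pass filtering is meant to provide.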

After the waveform envelope strength is obtained, the mixing weight w(i,n) of the i-th channel is as follows:

The mixed output is as follows:

In addition, because the envelope strength is a slowly varying curve, the weights do not change abruptly over time. This makes the allocation of mixing weights more reasonable, avoids sudden changes in the sound after mixing, and improves the quality of the mix.
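The formulas for the mixing weight and the mixed output are not reproduced in this text. Assuming that the weight is exactly the share described in step S106 and that the output is the weighted sum described in step S108, with M channels as in the Background section, they could be written as:

```latex
w(i,n) = \frac{\mathrm{Env}(i,n)}{\sum_{j=1}^{M} \mathrm{Env}(j,n)},
\qquad
\mathrm{mix}(n) = \sum_{i=1}^{M} w(i,n)\, x_i(n)
```

With this choice the weights at each time n sum to one, and a channel whose envelope strength dominates the sum passes through nearly unattenuated. Other positively correlated mappings within the preset range would also satisfy the description.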

To improve the quality of the audio streams and the sound quality after mixing, each audio stream undergoes pre-processing such as noise reduction before its waveform envelope strength is obtained. In this embodiment, noise reduction is preferably performed with a Gaussian noise reduction or wavelet noise reduction algorithm. This widens the difference between the human-voice audio stream and the noise audio stream, which in turn widens the difference between their waveform envelope strengths and between their mixing weights, helping the human voice stand out in the mixing result.

Example four

FIG. 5 is a schematic structural diagram of a mixing apparatus in the fourth embodiment of the present invention. The apparatus is suitable for mixing in a teleconference or network conference, may be implemented by software and/or hardware, and may be deployed on a server or a client. As shown in FIG. 5, the mixing apparatus provided in this embodiment includes: an audio stream receiving module 101, configured to receive audio streams of at least two channels; a waveform envelope strength calculating module 102, configured to calculate the waveform envelope strength of the amplitude or energy of each audio stream; a mixing weight distribution module 103, configured to assign the mixing weight of each audio stream according to the proportion of its waveform envelope strength in the sum of the waveform envelope strengths of all the audio streams, where the degree of positive correlation between the mixing weight and the proportion is within a preset range; and a mixing module 104, configured to mix the audio streams according to their corresponding mixing weights to generate the mixed audio stream.

The audio mixing apparatus in this embodiment preferably further includes an audio mixing output apparatus, configured to output the audio after audio mixing, so that the audio after audio mixing is output to a receiving end of a client through an original channel, where the receiving end may be an earphone or a headset, or the audio after audio mixing is played through an external playing device, and at this time, the audio mixing output apparatus may be a speaker.

The embodiment further includes an adjusting module 105, configured to adjust the amplitudes of all the audio streams, so that the amplitudes of the audio streams that meet the preset amplitude condition fluctuate within a first threshold range, and the amplitudes of other audio streams fluctuate within a second threshold range; wherein the first threshold range is greater than the second threshold range.

Before the waveform envelope intensities of the audio streams are obtained, the amplitude of the audio stream meeting the preset amplitude condition fluctuates in a first threshold range, the amplitudes of other audio streams fluctuate in a second threshold range, the amplitude difference between the first amplitude audio stream and other audio streams is enlarged, the difference between the waveform envelope intensity of the first amplitude audio stream and the waveform envelope intensities of other audio streams is enlarged, the proportion of the intensity of the waveform envelope of the audio stream meeting the preset amplitude condition to the sum of the waveform envelope intensities of all the audio streams is improved, the mixing weight of the audio stream meeting the preset amplitude condition is improved, and the audio stream meeting the preset amplitude condition can be highlighted in mixing.

In summary, in this embodiment the mixing weight of each audio stream is distributed according to the proportion of its waveform envelope strength in the sum of the waveform envelope strengths of all the audio streams, and the mixing weight is positively correlated with the proportion within a preset range. In the prior art, linear mixing reduces the volume of the human voice and of the mix as a whole. By contrast, in the embodiment of the present invention a human voice audio stream with higher amplitude or energy has a higher waveform envelope strength and is therefore assigned a larger mixing weight, so that the mix is clearer and practicability and user experience are improved.

The mixing apparatus can execute the mixing method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method. For brevity, for matters not described in detail in this apparatus embodiment, refer to the corresponding content in the preceding method embodiments.

EXAMPLE five

Fig. 6 is a schematic structural diagram of a mixing apparatus according to a fifth embodiment. As shown in fig. 6, the mixing apparatus provided in this embodiment includes a processor 201, a memory 202, an input device 203, and an output device 204. The apparatus may include one or more processors 201; one processor 201 is taken as an example in fig. 6. The processor 201, the memory 202, the input device 203, and the output device 204 in the apparatus may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 6.

The memory 202, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the mixing method in the embodiments of the present invention (for example, the audio stream receiving module 101, the waveform envelope strength calculating module 102, the mixing weight distribution module 103, and the audio mixing module 104). The processor 201 executes the various functional applications and data processing of the apparatus by running the software programs, instructions, and modules stored in the memory 202, that is, implements the above-described mixing method.

The memory 202 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the terminal, and the like. Further, the memory 202 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 202 may further include memory located remotely from the processor 201, which may be connected to the apparatus over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 203 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function controls of the apparatus.

The output device 204 may include a display device, for example the display screen of a user terminal.

EXAMPLE six

The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the mixing method of any of the above method embodiments, the method comprising:

receiving audio streams of at least two channels;

calculating the waveform envelope strength of the amplitude or energy of each audio stream;

distributing the mixing weight of each audio stream according to the proportion of the waveform envelope strength of each audio stream in the sum of the waveform envelope strengths of all the audio streams, wherein the mixing weight is positively correlated with the proportion within a preset range; and

performing sound mixing according to each audio stream and the mixing weight corresponding to the audio stream to generate the mixed audio stream.

Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the above-described method operations, and may also perform related operations in the mixing method provided by any embodiment of the present invention.

From the above description of the embodiments, it will be clear to those skilled in the art that the present invention can be implemented by software plus necessary general-purpose hardware, and can of course also be implemented by hardware alone, although the former is the preferable implementation in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk of a computer, and which includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) to execute the mixing method according to the embodiments of the present invention.

It should be noted that, in the embodiments of the mixing apparatus, the included units and modules are divided only according to functional logic; the division is not limited to the one described above, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for convenience of distinguishing them from each other, and are not used to limit the protection scope of the present invention.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made by those skilled in the art without departing from the scope of protection of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A mixing method, comprising:

receiving audio streams of at least two channels;

normalizing the audio streams of the channels with different identifications to corresponding threshold value ranges according to the magnitude relation among the amplitudes of the audio streams with different channel identifications, wherein after normalization, the audio stream of the main channel corresponds to the maximum threshold value range in all the threshold value ranges;

calculating the waveform envelope strength of the amplitude or energy of the audio stream of each channel;

distributing the mixing weight of the audio stream according to the proportion of the waveform envelope strength of the audio stream of each channel to the sum of the waveform envelope strengths of all the audio streams, wherein the mixing weight is positively correlated with the proportion within a preset range;

2. The method according to claim 1, wherein the normalizing the audio streams of the different identified channels to the corresponding threshold ranges according to the magnitude relationship between the magnitudes of the audio streams of the different identified channels, and after normalization, the audio stream of the main channel corresponds to a maximum threshold range of all the threshold ranges, comprises:

if the audio stream identified as the main channel has the maximum amplitude among all the audio streams, normalizing the amplitude of the audio stream of the main channel to a first threshold range to update the audio stream of the main channel, and normalizing the amplitudes of the audio streams of the other channels to a second threshold range to update the audio streams of the other channels.

3. The method according to claim 1, wherein the normalizing the audio streams of the different identified channels to the corresponding threshold ranges according to the magnitude relationship between the magnitudes of the audio streams of the different identified channels, and after normalization, the audio stream of the main channel corresponds to a maximum threshold range of all the threshold ranges, comprises:

if the audio stream identified as the main channel does not have the maximum amplitude among all the audio streams, normalizing the amplitude of the audio stream of the main channel to a first threshold range to update the audio stream of the main channel, normalizing the amplitude of the audio stream with the maximum amplitude to a third threshold range, and normalizing the remaining audio streams to a second threshold range, wherein the third threshold range is smaller than the first threshold range and larger than the second threshold range.

4. The method of claim 1, wherein prior to determining the waveform envelope strength of each audio stream, further comprising:

5. The method according to any one of claims 1 to 4, wherein the determining the waveform envelope strength of each audio stream comprises:

6. The method of claim 5, wherein the determining the waveform envelope strength of each audio stream by low-pass filtering comprises:

Env(i, n) = (1 - coeff(n)) × Env(i, n-1) + coeff(n) × a(n)

wherein i is a channel identifier; Env(i, n) is the waveform envelope strength of the audio stream of the channel identified as i at time n; Env(i, n-1) is the waveform envelope strength of the audio stream of the channel identified as i at time n-1; a(n) is the absolute value of the amplitude of the audio stream at time n; and coeff(n) is the waveform envelope coefficient at time n, where coeff(n) is set to at when the amplitude of the audio stream at time n is greater than or equal to the amplitude at time n-1, and is set to rt when the amplitude of the audio stream at time n is smaller than the amplitude at time n-1, at and rt both being empirical values.

7. An audio mixing apparatus, comprising:

the adjusting module is used for normalizing the audio streams of the channels with different identifications to corresponding threshold value ranges according to the magnitude relation among the amplitude values of the audio streams with different channel identifications, and after normalization, the audio stream of the main channel corresponds to the maximum threshold value range in all the threshold value ranges;

the waveform envelope strength calculating module is used for calculating the waveform envelope strength of each audio stream;

the audio mixing weight distribution module is used for distributing the audio mixing weight of the audio stream according to the proportion of the waveform envelope strength of the audio stream of each channel to the sum of the waveform envelope strengths of all the audio streams, wherein the audio mixing weight is positively correlated with the proportion within a preset range;

and the audio mixing module is used for mixing audio according to the audio stream of each channel and the audio mixing weight corresponding to the audio stream to generate the audio stream after audio mixing.

8. A mixing apparatus, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the mixing method according to any one of claims 1 to 6.

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the mixing method according to any one of claims 1 to 6.
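For illustration only, the envelope recursion recited in claim 6, Env(i, n) = (1 - coeff(n)) × Env(i, n-1) + coeff(n) × a(n), can be sketched in Python for a single channel. The sketch assumes that the envelope and the previous amplitude both start at zero, that the comparison between time n and time n-1 is made on absolute amplitudes, and that at = 0.5 and rt = 0.1; the claim states only that at and rt are empirical values, so these numbers and the function name `envelope` are hypothetical.

```python
from typing import List, Sequence


def envelope(samples: Sequence[float], at: float = 0.5, rt: float = 0.1) -> List[float]:
    """Track the waveform envelope of one channel with a one-pole low-pass filter.

    `at` is used while the amplitude is rising or steady, `rt` while it is falling.
    """
    env = 0.0       # assumed initial envelope Env(i, -1)
    prev_amp = 0.0  # assumed amplitude before the first sample
    out: List[float] = []
    for x in samples:
        amp = abs(x)                          # a(n)
        coeff = at if amp >= prev_amp else rt  # coeff(n)
        env = (1.0 - coeff) * env + coeff * amp
        out.append(env)
        prev_amp = amp
    return out


# Two loud samples followed by silence: the envelope rises quickly, then decays slowly.
trace = envelope([1.0, 1.0, 0.0], at=0.5, rt=0.1)
```

Choosing at larger than rt, as in this sketch, makes the envelope follow onsets quickly and release slowly; whether the method uses that relationship is not stated in the claim.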