CN106504758A - Mixer and sound mixing method - Google Patents
- Publication number
- CN106504758A (application number CN201610939143.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0324—Details of processing therefor
- G10L21/034—Automatic adjustment
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Abstract
The invention discloses a mixer and a sound mixing method. The mixer comprises: a framing unit; a speech detection unit connected to the framing unit, which detects whether each framed signal contains a voice signal; a time-varying filter connected to the speech detection unit; a loudness computing unit connected to the speech detection unit; a weight calculation unit connected to the loudness computing unit; a downmixing unit connected to the time-varying filter and the weight calculation unit; and a post-processing unit connected to the downmixing unit. The invention improves voice quality and treats every participant fairly.
Description
Technical field
The present invention relates to audio mixing technology, and specifically to a mixer and a sound mixing method.
Background art
Video conferences and teleconferences are meetings held over a communication network; they give participants in different locations real-time voice communication. To approach the atmosphere of a face-to-face meeting, audio mixing is indispensable, and it directly affects the voice quality of the conference. Audio mixing is divided into analog mixing and digital mixing. Digital mixing is widely used because of its high precision, large dynamic range, and low noise. Its basic principle is to superimpose the digital signals obtained by analog-to-digital conversion of each voice signal, forming a single mixed output signal.
Because digital audio signals are quantized within upper and lower bounds, superposition can overflow. Digital mixing therefore has two requirements. 1. The mixed signal must not overflow frequently. As the number of voice channels grows, overflow becomes more and more frequent; if the overflowing samples are simply saturated, noise is introduced and the mixed sound becomes discontinuous or produces popping sounds. 2. The quality of each voice channel must be preserved. The channels differ in level and frequency content, and how well their quality survives mixing is a major criterion for evaluating a digital mixing technique.
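The overflow problem described above can be illustrated with a minimal sketch. It assumes 16-bit signed samples and a naive mixer that simply sums the channels and saturates; the function name and the clipping counter are illustrative and do not come from the patent.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit quantization bounds


def mix_with_saturation(frames):
    """Sum equal-length frames sample by sample and clamp on overflow.

    Returns the mixed frame and the number of samples that had to be clamped.
    """
    mixed = []
    clipped = 0
    for samples in zip(*frames):
        total = sum(samples)
        if total > INT16_MAX or total < INT16_MIN:
            clipped += 1  # this sample overflowed and will be distorted
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed, clipped
```

Every clamped sample is a point where the waveform is flattened, which is the source of the noise and popping the text mentions; the count grows as more channels are added.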
The paper by Zhang Chuanyong, "Audio Mixing Technology and Its Application in VoIP Conference Systems", discloses a weighting-method mixing technique. Its main idea is to compute a weight for each voice signal and then superimpose the weighted signals; the purpose of weighting is to reduce or eliminate overflow and so preserve voice quality. It is implemented as follows. Assume there are N signals with M samples per frame, and let f(i, j) be the i-th sample value of the j-th signal. The corresponding weight is:
[formula not reproduced in this text]
The final output is:
[formula not reproduced in this text]
where weight(i, j) is the weight of the i-th sample of the j-th signal and Output(i) is the i-th output sample. This technique has the following problems. First, the smaller a signal's amplitude, the smaller its weight, so small signals become even smaller after mixing and are easily distorted. Second, a video conference rarely has more than four people speaking at once, yet this approach lets the signals of lines on which nobody is speaking (which contain noise) take part in mixing without any processing, which lowers the signal-to-noise ratio of the mixed speech.
The paper by Zhou Jingli et al., "A New Real-Time Audio Mixing Scheme for Multimedia Conferencing", discloses an automatic-threshold mixing technique. Before mixing, it decides from each signal's short-time energy whether the signal contains speech (silence detection). A line without speech is judged to be in the "no-speech state", and such signals do not take part in mixing. During mixing, the attenuation factor of each signal is computed from its own short-time energy: the signal is attenuated by a certain proportion when its short-time energy exceeds a threshold and is left unattenuated when it is below the threshold, so the weight of each signal depends only on its own short-time energy. This technique has the following problems. Because signals judged to be in the "no-speech state" are excluded, the mixed sound carries no trace of them, so those participants have no sense of presence; moreover, a participant's transition from silence to speech produces an audible fluctuation. In addition, although this approach guarantees a weight ≥ 1 for small signals, it still cannot guarantee their voice quality.
In summary, existing mixing techniques either let every signal take part in mixing without any processing, whether or not it contains speech, or use silence detection to reduce the number of signals mixed. Without silence detection, unnecessary noise is added during mixing and voice quality suffers. With silence detection, participants in the "no-speech state" lose any sense of participation. Furthermore, existing techniques use the amplitude of a voice signal as the criterion for determining each signal's weight before superposition, but the loudness of speech is not simply its amplitude: it depends on both the amplitude and the frequency of the signal.
Summary of the invention
In view of the above problems, the present invention provides a mixer and a sound mixing method that improve voice quality and treat every participant fairly.
The technical means of the present invention are as follows:
A mixer, comprising:
a framing unit, for framing each signal that takes part in mixing;
a speech detection unit connected to the framing unit, for detecting whether each framed signal contains a voice signal; the speech detection unit determines whether a signal contains a voice signal by detecting whether the current frame of that signal is in the voice-present state;
a time-varying filter connected to the speech detection unit, for applying time-varying low-pass filtering to each framed signal according to the detection result of the speech detection unit; when the current frame of a signal is in the voice-present state, the passband of the time-varying filter gradually widens, and when the current frame of a signal is in the voice-absent state, the passband gradually narrows;
a loudness computing unit connected to the speech detection unit, for calculating, according to the detection result of the speech detection unit, the mean loudness of each framed signal over the current predetermined time period;
a weight calculation unit connected to the loudness computing unit, for calculating, according to the result of the loudness computing unit, the weight of each sample in the current frame of each signal that takes part in mixing;
a downmixing unit connected to the time-varying filter and the weight calculation unit, for obtaining and outputting a single mixed current-frame signal from the time-varying low-pass filtered current-frame output of each signal and the sample weights of the current frame of each signal;
a post-processing unit connected to the downmixing unit, for calculating, from the single current-frame signal output by the downmixing unit, the weight of each sample in the current frame of the mixed signal and each output sample;
Further, the speech detection unit comprises:
a power computation module, for calculating the power of the current frame of each framed signal;
a minimum frame power determination module connected to the power computation module, for obtaining, from the results of the power computation module, the minimum frame power of each framed signal within the current predetermined time period;
a voice state determination module connected to the power computation module and the minimum frame power determination module, for detecting whether a signal contains a voice signal by comparing the power of the current frame of that signal with the minimum frame power;
Further,
the power computation module calculates the power of the current frame of a signal by the formula pow = (1/N)·Σ_{i=0}^{N−1} x[i]², in which pow is the power of the current frame, x[i] is the i-th input sample of the current frame, and N is the number of samples in a frame;
the current predetermined time period is the duration T from the r-th frame before the current frame up to the current frame; the minimum frame power determination module obtains the minimum frame power of a signal within the current predetermined time period by the formula pow_min = min{power of the current frame, power of the 1st frame before the current frame, ..., power of the r-th frame before the current frame}, in which min{ } is the minimum of all values in the braces, r = ceil(T·F_S/N), ceil(x) is the smallest integer greater than or equal to x, F_S is the sampling frequency, and N is the number of samples in a frame;
the voice state determination module uses a flag VAD to indicate whether a signal contains a voice signal, with initial value VAD = 1; when pow ≥ 32·pow_min and VAD = 0, the module sets VAD to 1, indicating that the signal is in the voice-present state; when pow ≤ 4·pow_min and VAD = 1, the module sets VAD to 0, indicating that the signal is in the voice-absent state; in all other cases the module leaves VAD unchanged;
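The detection rule above amounts to a two-threshold (hysteresis) comparison against a running minimum. A minimal sketch follows; it assumes the frame power is the mean of the squared samples and that T is given in seconds. The class name, the method names, and the use of a bounded deque for the last r + 1 frame powers are illustrative choices, not taken from the patent.

```python
import math
from collections import deque


def frame_power(frame):
    """Mean power of one frame (assumed to be the mean of squared samples)."""
    return sum(x * x for x in frame) / len(frame)


class PowerVad:
    """Hysteresis voice detector driven by the minimum frame power."""

    def __init__(self, period_s, sample_rate, frame_len):
        r = math.ceil(period_s * sample_rate / frame_len)  # r = ceil(T * Fs / N)
        self.history = deque(maxlen=r + 1)  # current frame plus the r frames before it
        self.vad = 1  # initial value stated in the text

    def update(self, frame):
        power = frame_power(frame)
        self.history.append(power)
        pow_min = min(self.history)
        if self.vad == 0 and power >= 32 * pow_min:
            self.vad = 1  # switch to the voice-present state
        elif self.vad == 1 and power <= 4 * pow_min:
            self.vad = 0  # switch to the voice-absent state
        return self.vad  # otherwise unchanged
```

The gap between the factors 32 and 4 keeps the flag from toggling when the frame power hovers near a single threshold.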
Further, the time-varying filter obtains the i-th filtered output value of the current frame of a signal by the formula f[i] = (1 − b)·x[i] + b·f[i−1], in which f[i] is the i-th filtered output value of the current frame, x[i] is the i-th input sample of the current frame, N is the number of samples in a frame, 0 ≤ i < N, and b is the filter coefficient. When the current frame is in the voice-present state, b is updated by one formula, and when the current frame is in the voice-absent state, by another [neither update formula is reproduced in this text]; when b < 0.18, b is set to 0.18, and when b > 0.956, b is set to 0.956; p1 is the number of sampling points over which b changes from 0.956 to 0.18, and p2 is the number of sampling points over which b changes from 0.18 to 0.956;
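As a rough illustration of the filter above, the sketch below applies f[i] = (1 − b)·x[i] + b·f[i−1] while moving b between 0.18 and 0.956. The update rule for b is not available in this text, so the sketch assumes a linear per-sample step of (0.956 − 0.18)/p1 downward during speech and (0.956 − 0.18)/p2 upward otherwise; that step rule, the starting value of b, and all names are assumptions.

```python
B_MIN, B_MAX = 0.18, 0.956  # coefficient limits given in the text


class TimeVaryingLowPass:
    """One-pole low-pass filter whose coefficient follows the voice state."""

    def __init__(self, p1, p2):
        self.step_down = (B_MAX - B_MIN) / p1  # assumed linear step, speech present
        self.step_up = (B_MAX - B_MIN) / p2    # assumed linear step, speech absent
        self.b = B_MAX    # assumption: start with the narrowest passband
        self.prev = 0.0   # f[i-1], carried across frames

    def process(self, frame, vad):
        out = []
        for x in frame:
            if vad:
                self.b = max(B_MIN, self.b - self.step_down)  # passband widens
            else:
                self.b = min(B_MAX, self.b + self.step_up)    # passband narrows
            self.prev = (1 - self.b) * x + self.b * self.prev
            out.append(self.prev)
        return out
```

A smaller b lets the input through almost unchanged, while a b near 0.956 smooths heavily, so silent lines are attenuated at high frequencies rather than removed from the mix.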
Further,
the loudness computing unit comprises: a DFT module, a loudness acquisition module connected to the DFT module, and a mean loudness acquisition module connected to the loudness acquisition module;
when the current frame of a signal is in the voice-present state, the DFT module applies the DFT to the current frame of that signal, the loudness acquisition module then calculates the loudness value of the current frame, and finally the mean loudness acquisition module calculates the mean loudness of that signal over the current predetermined time period;
when the current frame of a signal is in the voice-absent state, the mean loudness of that signal over the current predetermined time period equals the mean loudness over the most recent predetermined time period before the current frame that contained a voice signal;
Further,
the DFT module applies the DFT to the current frame of a signal by the formula X[s] = Σ_{i=0}^{N−1} x[i]·e^(−j·2π·s·i/N), in which 0 ≤ s < N, s is the discrete frequency index, x[i] is the i-th input sample of the current frame, X[s] is the result of applying the DFT to x[i], and j is the imaginary unit, j² = −1;
the loudness acquisition module calculates the loudness value of the current frame by a formula [not reproduced in this text], in which loudness is the loudness value of the current frame, X[s] is the result of applying the DFT to x[i], Equal[s] is an equal-loudness array with preset values, s20 = ceil(20·N/F_S), s20000 = floor(20000·N/F_S), ceil(x) is the smallest integer greater than or equal to x, floor(x) is the largest integer less than or equal to x, F_S is the sampling frequency, and N is the number of samples in a frame;
the mean loudness acquisition module calculates the mean loudness of a signal over the current predetermined time period by a formula [not reproduced in this text]; the current predetermined time period is the duration T from the r-th frame before the current frame up to the current frame, in which r = ceil(T·F_S/N), ceil(x) is the smallest integer greater than or equal to x, F_S is the sampling frequency, and N is the number of samples in a frame;
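The quantities named above can be sketched as follows. The DFT is the standard definition and the bin limits s20 and s20000 follow the expressions in the text. The loudness formula itself is not available here, so the sketch assumes a sum of DFT magnitudes weighted by Equal[s] over the bins from s20 to s20000, limited to the bins below the Nyquist frequency; that weighted sum and every function name are assumptions.

```python
import cmath
import math


def dft(frame):
    """Direct O(N^2) DFT: X[s] = sum_i x[i] * exp(-j*2*pi*s*i/N)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * s * i / n) for i, x in enumerate(frame))
        for s in range(n)
    ]


def audible_bins(frame_len, sample_rate):
    """Bin indices for 20 Hz and 20 kHz as defined in the text."""
    s20 = math.ceil(20 * frame_len / sample_rate)
    s20000 = math.floor(20000 * frame_len / sample_rate)
    return s20, s20000


def frame_loudness(frame, sample_rate, equal):
    """Assumed loudness: equal-loudness-weighted sum of spectral magnitudes."""
    spectrum = dft(frame)
    s20, s20000 = audible_bins(len(frame), sample_rate)
    s_hi = min(s20000, len(frame) // 2)  # assumption: stop at the Nyquist bin
    return sum(abs(spectrum[s]) * equal[s] for s in range(s20, s_hi + 1))
```

A production implementation would normally replace the direct DFT with an FFT; the direct form is used here only to mirror the formula.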
Further,
the weight calculation unit obtains the current-frame weight of the k-th signal by a formula [not reproduced in this text], and obtains the weight of the i-th sample of the current frame of the k-th signal by the formula weight_k[i] = weight_k, 0 ≤ i < N, in which weight_k is the current-frame weight of the k-th signal, LOUD_k is the mean loudness of the k-th signal over the current predetermined time period, M is the number of signals taking part in mixing, k = 1, 2, ..., M, and weight_k[i] is the weight of the i-th sample of the current frame of the k-th signal;
the downmixing unit obtains the single current-frame signal produced by mixing the current frames of all signals by a formula [not reproduced in this text], in which Mix[i] is the i-th sample of the mixed current-frame signal, M is the number of signals taking part in mixing, f_k[i] is the i-th output sample of the current frame of the k-th signal after time-varying low-pass filtering, weight_k[i] is the weight of the i-th sample of the current frame of the k-th signal, and k = 1, 2, ..., M;
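The weighting and mixing step can be sketched under explicit assumptions, since neither formula is available in this text. The sketch assumes that each frame weight scales its signal toward the average loudness of all M signals, weight_k = mean(LOUD) / LOUD_k, and that the mix is the weighted sum Mix[i] = Σ_k f_k[i]·weight_k[i]. Both formulas and both function names are hypothetical; only weight_k[i] = weight_k comes from the text.

```python
def equal_loudness_weights(mean_loudness):
    """Hypothetical frame weights that pull every signal toward the average loudness."""
    target = sum(mean_loudness) / len(mean_loudness)
    # Guard against a zero loudness value; a neutral weight is used in that case.
    return [target / loud if loud > 0 else 1.0 for loud in mean_loudness]


def mix_frame(filtered_frames, frame_weights):
    """Assumed mix: weighted sum of the filtered current frames of all signals."""
    n = len(filtered_frames[0])
    # weight_k[i] = weight_k for 0 <= i < N (no smoothing in this sketch)
    sample_weights = [[w] * n for w in frame_weights]
    return [
        sum(f[i] * w[i] for f, w in zip(filtered_frames, sample_weights))
        for i in range(n)
    ]
```

With loudness values 1.0 and 4.0, for example, the quieter signal is boosted by 2.5 and the louder one scaled by 0.625, so both contribute equally to the mix.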
the post-processing unit is further used to obtain the maximum sample in the current frame of the mixed signal and to calculate the current-frame weight of the mixed signal;
Further,
the weight calculation unit is further used to smooth the frame weights of each signal taking part in mixing;
Further,
the weight calculation unit smooths the frame weights of a signal as follows:
when the current frame of a signal is the first frame, the weight calculation unit obtains the weight of the i-th sample of the current frame of the k-th signal by the formula weight_k[i] = weight_k, 0 ≤ i < N;
when the current frame of a signal is not the first frame, the weight calculation unit obtains the weight of the i-th sample of the current frame of the k-th signal by a formula [not reproduced in this text]; here weight_k[i] is the weight of the i-th sample of the current frame of the k-th signal, N is the number of samples in a frame, and P is the number of sampling points over which weight_k[i] gradually changes from the weight of the previous frame to the weight of the current frame;
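The smoothing described above can be pictured as a ramp between consecutive frame weights. The exact formula is not available in this text, so the sketch below assumes a linear ramp that reaches the current frame weight after P samples and then holds it; the linear shape and the function name are assumptions.

```python
def smoothed_sample_weights(prev_weight, cur_weight, n, p):
    """Per-sample weights for one frame of n samples.

    prev_weight is None for the first frame, in which case every sample
    simply takes the current frame weight (weight_k[i] = weight_k).
    """
    if prev_weight is None:
        return [cur_weight] * n
    weights = []
    for i in range(n):
        if i < p:
            # assumed linear transition from the previous to the current weight
            weights.append(prev_weight + (cur_weight - prev_weight) * (i + 1) / p)
        else:
            weights.append(cur_weight)
    return weights
```

Ramping the gain avoids a step change at the frame boundary, which would otherwise be audible as a click.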
the post-processing unit obtains the maximum sample in the current frame of the mixed signal by the formula signal_max = max{|Mix[0]|, |Mix[1]|, ..., |Mix[N−1]|}, in which signal_max is the maximum sample in the current frame of the mixed signal, max{ } is the maximum of the values in the braces, and Mix[0], Mix[1], ..., Mix[N−1] are the outputs of the 0th, 1st, ..., (N−1)-th samples of the current frame of the mixed signal;
when signal_max ≤ 32768, the post-processing unit sets the current-frame weight of the mixed signal to weight_mix = 1; when signal_max > 32768, the post-processing unit calculates the current-frame weight weight_mix of the mixed signal by a formula [not reproduced in this text];
when the current frame of the mixed signal is the first frame, the post-processing unit obtains the weight of the i-th sample of the current frame of the mixed signal by the formula weight_mix[i] = weight_mix, 0 ≤ i < N; when the current frame of the mixed signal is not the first frame, the post-processing unit obtains that weight by a formula [not reproduced in this text], in which weight_mix[i] is the weight of the i-th sample of the current frame of the mixed signal and Q is the number of sampling points over which weight_mix[i] gradually changes from the weight of the previous mixed frame to the weight of the current mixed frame;
the post-processing unit obtains the i-th output sample y[i] of the current frame of the mixed signal by a formula [not reproduced in this text], in which Final[i] = Mix[i]·weight_mix[i], 0 ≤ i < N.
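The post-processing stage can be sketched as peak-based gain reduction followed by a final limit to the sample range. The text gives signal_max, the threshold 32768, weight_mix = 1 below the threshold, and Final[i] = Mix[i]·weight_mix[i]. The gain 32768 / signal_max above the threshold and the final clamp to 16-bit bounds are assumptions, as are the function names.

```python
FULL_SCALE = 32768  # threshold used in the text


def mix_frame_weight(mix):
    """Current-frame weight of the mixed signal."""
    signal_max = max(abs(v) for v in mix)
    if signal_max <= FULL_SCALE:
        return 1.0
    return FULL_SCALE / signal_max  # assumption: scale the peak back to full scale


def post_process(mix, sample_weights):
    """Apply Final[i] = Mix[i] * weight_mix[i], then clamp to assumed int16 bounds."""
    out = []
    for value, w in zip(mix, sample_weights):
        final = value * w
        out.append(max(-32768, min(32767, int(round(final)))))
    return out
```

In a full implementation the per-sample weights passed to `post_process` would themselves be ramped over Q samples from the previous frame's weight, in the same way as the per-signal weights.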
A sound mixing method, comprising the following steps:
Step 1: frame each signal that takes part in mixing;
Step 2: detect whether each framed signal contains a voice signal; whether a signal contains a voice signal is determined by detecting whether the current frame of that signal is in the voice-present state;
Step 3: according to the voice detection result, apply time-varying low-pass filtering to each framed signal; when the current frame of a signal is in the voice-present state, the passband gradually widens, and when the current frame of a signal is in the voice-absent state, the passband gradually narrows;
Step 4: according to the voice detection result, calculate the mean loudness of each framed signal over the current predetermined time period;
Step 5: according to the mean loudness result, calculate the weight of each sample in the current frame of each signal that takes part in mixing;
Step 6: from the time-varying low-pass filtered current-frame output of each signal and the sample weights of the current frame of each signal, obtain and output the single current-frame signal produced by mixing the current frames of all signals;
Step 7: from the single current-frame signal produced by mixing, calculate the weight of each sample in the current frame of the mixed signal and each output sample.
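Step 1 is the only step that does not depend on a formula, and a minimal framing sketch is given here for orientation. The function name and the choice to zero-pad an incomplete last frame are assumptions; the text does not say how a partial frame is handled.

```python
def split_into_frames(signal, frame_len):
    """Split a sample sequence into consecutive frames of frame_len samples."""
    frames = []
    for start in range(0, len(signal), frame_len):
        frame = list(signal[start:start + frame_len])
        frame += [0] * (frame_len - len(frame))  # assumption: zero-pad the tail
        frames.append(frame)
    return frames
```

Each signal is framed independently with the same frame length N, so that frame n of every signal covers the same time interval and Steps 2 to 7 can run frame by frame.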
By adopting the above technical solution, the mixer and sound mixing method provided by the present invention offer a new voice activity detection method, namely judging from the mean power of the signal whether the current frame is a voice signal. By introducing the time-varying filter, the invention solves the problem in current mixing techniques that signals take part in mixing directly and introduce unnecessary noise, while also avoiding the problem that reducing the number of mixed signals through silence detection leaves "no-speech" participants with no sense of presence. The invention adopts an equal-loudness control strategy: the weight of each signal is derived from its calculated loudness, so that the mean loudness of all signals ends up nearly identical and they sound similarly loud. The invention thus improves voice quality, especially for small signals, and treats every participant fairly.
Description of the drawings
Fig. 1 is a structural block diagram of the mixer of the present invention;
Fig. 2 is a structural block diagram of the speech detection unit of the present invention;
Fig. 3 is a structural block diagram of the loudness computing unit of the present invention;
Fig. 4 is a workflow diagram of the mixer of the present invention;
Fig. 5 is a waveform diagram of three voice signals with different characteristics;
Fig. 6 is a waveform diagram of the three voice signals after processing by the loudness computing unit and the weight calculation unit of the present invention;
Fig. 7 is a waveform diagram of the output signal of the mixer of the present invention;
Fig. 8 is a flow chart of the sound mixing method of the present invention.
Specific embodiment
A kind of mixer as shown in Figure 1, Figure 2, Figure 3 and Figure 4, including:Framing unit, for each road to participating in audio mixing
Signal carries out framing respectively;The speech detection unit being connected with the framing unit;The speech detection unit is used for dividing
Whether Zheng Houge roads signal is detected containing voice signal;The speech detection unit by detection all the way in signal when
Whether previous frame determines whether the road signal contain voice signal in there is voice signal state;With the speech detection unit
The time varing filter being connected;The time varing filter is used for according to the testing result of the speech detection unit, to framing after
Each road signal carry out time-varying low-pass filtering treatment respectively;When in signal all the way present frame in have voice signal state when, institute
The passband width for stating time varing filter gradually broadens, when in signal all the way, present frame is in without voice signal state, when described
The passband width for becoming wave filter becomes narrow gradually;The loudness computing unit being connected with the speech detection unit;The program meter
Calculating unit is used for the testing result according to the speech detection unit, makes a reservation for currently to calculate framing Hou Ge roads signal respectively
Mean loudness in time period;The weight calculation unit being connected with loudness computing unit;The weight calculation unit is used for root
According to the result of calculation of the loudness computing unit, each sample included by present frame in each road signal for participate in audio mixing is calculated respectively
Weight;The downmixing unit being connected with the time varing filter and the weight calculation unit;The downmixing unit is used for basis
Each road signal respectively through the present frame output signal after time-varying low-pass filtering treatment, and in each road signal included by present frame
Each sample weights, obtain and export the current frame signal all the way after each road signal present frame audio mixing;With the downmixing unit phase
The post-processing unit of connection;The post-processing unit is used for the current frame signal all the way according to downmixing unit output, calculates audio mixing
Each sample weights afterwards included by signal present frame and each output sample;Further, the speech detection unit includes:Power
Computing module, for calculating the power of present frame in the signal of framing Hou Ge roads respectively;It is connected with the power computation module
Minimum frame power determination module;The minimum frame power determination module is used for being tied according to the calculating of the power computation module
Really, the frame power minimum in current predetermined amount of time to obtain framing Hou Ge roads signal respectively;With the power calculation mould
The voice status that block is connected with the minimum frame power determination module know module;The voice status know that module is used for leading to
Cross between the power of present frame in signal all the way and the minimum frame power relatively detecting in signal all the way whether contain
There is voice signal;Further, the power computation module passes through formulaTo calculate signal all the way
The power of middle present frame, in formula:Pow represents that the power of present frame, x [i] represent that i-th input data of present frame, N represent a frame
In sample size;
The current predetermined amount of time represents the duration T of the front r frames to present frame from present frame;The minimum frame work(
Rate determining module obtains the minimum frame power of a signal within the current predetermined time period by the formula pow_min = min{power of the current frame, power of the frame 1 before the current frame, ..., power of the frame r before the current frame}, where min{ } denotes the minimum of all the data in the braces, r is given by [formula not reproduced], ceil(x) denotes the smallest integer greater than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame. The voice state acquisition module uses a flag VAD to indicate whether a signal contains a voice signal, and initializes VAD to 1. When pow ≥ 32*pow_min and VAD = 0, the voice state acquisition module sets VAD to 1, indicating that the signal is in the speech-present state; when pow ≤ 4*pow_min and VAD = 1, the module sets VAD to 0, indicating that the signal is in the speech-absent state; in every other case of the comparison between pow and pow_min, the module leaves VAD unchanged.
Further, the time-varying filter obtains the i-th filtered output value of the current frame of a signal by the formula f[i] = (1-b)*x[i] + b*f[i-1], where f[i] denotes the i-th filtered output value of the current frame, x[i] denotes the i-th input datum of the current frame, N denotes the number of samples in a frame, 0 ≤ i < N, and b denotes the filter coefficient. When the current frame is in the speech-present state, b = [formula not reproduced]; when the current frame is in the speech-absent state, b = [formula not reproduced]. When b < 0.18, b is set to 0.18; when b > 0.956, b is set to 0.956. p1 denotes the number of samples in the time span over which b changes gradually from 0.956 to 0.18, and p2 denotes the number of samples in the time span over which b changes gradually from 0.18 to 0.956.
Further, the loudness computing unit includes a DFT module, a loudness acquisition module connected to the DFT module, and a mean loudness acquisition module connected to the loudness acquisition module. When the current frame of a signal is in the speech-present state, the DFT module performs a DFT on the current frame of that signal, the loudness acquisition module then computes the loudness value of the current frame, and finally the mean loudness acquisition module computes the mean loudness of the signal within the current predetermined time period. When the current frame of a signal is in the speech-absent state, the mean loudness of the signal within the current predetermined time period equals the mean loudness within the previous predetermined time period, before the current frame, that contained a voice signal.
Further, the DFT module performs a DFT on the current frame of a signal by the formula [formula not reproduced], where s denotes the discrete frequency, x[i] denotes the i-th input datum of the current frame, X[s] denotes the result of applying the DFT to x[i], and j denotes the imaginary unit, j^2 = -1;
The loudness acquisition module computes the loudness value of the current frame by the formula [formula not reproduced], where loudness denotes the loudness value of the current frame, X[s] denotes the result of applying the DFT to x[i], Equal[s] denotes an equal-loudness array with preset values, s_20 = ceil(20*N/F_S), s_20000 = floor(20000*N/F_S), ceil(x) denotes the smallest integer greater than or equal to x, floor(x) denotes the largest integer less than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame. The mean loudness acquisition module computes the mean loudness of a signal within the current predetermined time period by the formula [formula not reproduced]. The current predetermined time period is the duration T from the r-th frame before the current frame to the current frame, where r is given by [formula not reproduced], ceil(x) denotes the smallest integer greater than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame.
Further, the weight calculation unit obtains the current-frame weight of the k-th signal by the formula [formula not reproduced], and obtains the weight of the i-th sample of the current frame of the k-th signal by the formula weight_k[i] = weight_k, 0 ≤ i < N, where weight_k denotes the current-frame weight of the k-th signal, LOUD_k denotes the mean loudness of the k-th signal within the current predetermined time period, M denotes the number of signals participating in the mix, k = 1, 2, ..., M, and weight_k[i] denotes the weight of the i-th sample of the current frame of the k-th signal. The mixing unit obtains the single current-frame signal produced by mixing the current frames of all signals by the formula [formula not reproduced], where Mix[i] denotes the i-th sample of the mixed current-frame signal, M denotes the number of signals participating in the mix, f_k[i] denotes the i-th output sample of the current frame of the k-th signal after time-varying low-pass filtering, weight_k[i] denotes the weight of the i-th sample in the current frame of the k-th signal, and k = 1, 2, ..., M. The post-processing unit is further configured to obtain the maximum sample in the current frame of the mixed signal and to compute the current-frame weight of the mixed signal.
Further, the weight calculation unit is also configured to smooth the frame weights of each signal participating in the mix. Further, the weight calculation unit smooths the frame weights of a signal as follows: when the current frame of the signal is the first frame, the weight calculation unit obtains the weight of the i-th sample of the current frame of the k-th signal by the formula weight_k[i] = weight_k, 0 ≤ i < N; when the current frame is not the first frame, the weight calculation unit obtains that weight by the formula [formula not reproduced]. Here weight_k[i] denotes the weight of the i-th sample of the current frame of the k-th signal, N denotes the number of samples in a frame, and P denotes the number of samples over which weight_k[i] changes gradually from the weight of the frame 1 before the current frame to the weight of the current frame.
The post-processing unit obtains the maximum sample in the current frame of the mixed signal by the formula signal_max = max{|Mix[0]|, |Mix[1]|, ..., |Mix[N-1]|}, where signal_max denotes the maximum sample in the current frame of the mixed signal, max{ } denotes the maximum of the data in the braces, Mix[0] denotes the output of sample 0 of the current frame of the mixed signal, Mix[1] denotes the output of sample 1, and Mix[N-1] denotes the output of sample N-1. When signal_max ≤ 32768, the post-processing unit sets the current-frame weight of the mixed signal to weight_mix = 1; when signal_max > 32768, the post-processing unit computes the current-frame weight of the mixed signal as weight_mix = [formula not reproduced]. When the current frame of the mixed signal is the first frame, the post-processing unit obtains the weight of the i-th sample of the current frame of the mixed signal by the formula weight_mix[i] = weight_mix, 0 ≤ i < N; when the current frame of the mixed signal is not the first frame, the post-processing unit obtains that weight by the formula [formula not reproduced]. Here weight_mix[i] denotes the weight of the i-th sample of the current frame of the mixed signal, and Q denotes the number of samples over which weight_mix[i] changes gradually from the weight of the mixed frame 1 before the current frame to the weight of the current mixed frame. The post-processing unit obtains the i-th output sample y[i] of the current frame of the mixed signal by the formula [formula not reproduced], where Final[i] = Mix[i]*weight_mix[i], 0 ≤ i < N.
As shown in Fig. 8, the present invention also provides a mixing method comprising the following steps:
Step 1: Frame each signal participating in the mix;
Step 2: Detect whether each framed signal contains a voice signal; whether a signal contains a voice signal is determined by detecting whether the current frame of that signal is in the speech-present state;
Step 3: According to the voice detection result, apply time-varying low-pass filtering to each framed signal: when the current frame of a signal is in the speech-present state, the passband gradually widens; when the current frame of a signal is in the speech-absent state, the passband gradually narrows;
Step 4: According to the voice detection result, compute the mean loudness of each framed signal within the current predetermined time period;
Step 5: According to the mean loudness results, compute the weight of each sample in the current frame of each signal participating in the mix;
Step 6: From the current-frame output of each signal after time-varying low-pass filtering and the sample weights of the current frame of each signal, obtain and output the single current-frame signal produced by mixing the current frames of all signals;
Step 7: From the single current-frame signal produced by the mix, compute the sample weights and the output samples of the current frame of the mixed signal.
Further, the step 1 specifically includes the following steps:
Step 11: Compute the power of the current frame of each framed signal;
Step 12: From the computed current-frame power of each signal, obtain the minimum frame power of each framed signal within the current predetermined time period;
Step 13: Detect whether a signal contains a voice signal by comparing the power of its current frame with the minimum frame power;
Further,
The power of the current frame of a signal is computed by the formula [formula not reproduced], where pow denotes the power of the current frame, x[i] denotes the i-th input datum of the current frame, and N denotes the number of samples in a frame;
The current predetermined time period is the duration T from the r-th frame before the current frame to the current frame. The minimum frame power of a signal within the current predetermined time period is obtained by the formula pow_min = min{power of the current frame, power of the frame 1 before the current frame, ..., power of the frame r before the current frame}, where min{ } denotes the minimum of all the data in the braces, r is given by [formula not reproduced], ceil(x) denotes the smallest integer greater than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame;
A flag VAD indicates whether a signal contains a voice signal, and VAD is initialized to 1. When pow ≥ 32*pow_min and VAD = 0, VAD is set to 1, indicating that the signal is in the speech-present state; when pow ≤ 4*pow_min and VAD = 1, VAD is set to 0, indicating that the signal is in the speech-absent state; in every other case of the comparison between pow and pow_min, VAD is left unchanged;
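The detection rule above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the power formula is not reproduced in this text, so mean-square power is assumed, and the names `frame_power` and `HysteresisVad` are hypothetical. The thresholds 32 and 4 and the initial value VAD = 1 come from the text.

```python
from collections import deque


def frame_power(frame):
    """Mean-square power of one frame (ASSUMED definition of pow)."""
    return sum(x * x for x in frame) / len(frame)


class HysteresisVad:
    """Per-signal VAD flag driven by frame power versus recent minimum power."""

    def __init__(self, r):
        # Keep the current frame plus the r frames before it.
        self.history = deque(maxlen=r + 1)
        self.vad = 1  # initial value stated in the text

    def update(self, frame):
        power = frame_power(frame)
        self.history.append(power)
        pow_min = min(self.history)
        if self.vad == 0 and power >= 32 * pow_min:
            self.vad = 1  # speech-present state
        elif self.vad == 1 and power <= 4 * pow_min:
            self.vad = 0  # speech-absent state
        # Any other case: leave the flag unchanged.
        return self.vad
```

The two thresholds form a hysteresis band: a frame whose power lies between 4 and 32 times the recent minimum never changes the flag, which avoids rapid toggling near a single threshold.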
Further, the step 2 is specifically:
The i-th filtered output value of the current frame of a signal is obtained by the formula f[i] = (1-b)*x[i] + b*f[i-1], where f[i] denotes the i-th filtered output value of the current frame, x[i] denotes the i-th input datum of the current frame, N denotes the number of samples in a frame, 0 ≤ i < N, and b denotes the filter coefficient. When the current frame is in the speech-present state, b = [formula not reproduced]; when the current frame is in the speech-absent state, b = [formula not reproduced]. When b < 0.18, b is set to 0.18; when b > 0.956, b is set to 0.956. p1 denotes the number of samples in the time span over which b changes gradually from 0.956 to 0.18, and p2 denotes the number of samples in the time span over which b changes gradually from 0.18 to 0.956;
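A minimal Python sketch of the filter recursion follows. The update formulas for b are not reproduced in this text, so the sketch assumes a linear per-sample ramp: b falls by (0.956 - 0.18)/p1 per sample while speech is present and rises by (0.956 - 0.18)/p2 per sample while it is absent. The class name `TimeVaryingLowPass` is hypothetical; the recursion, the clamping limits, and the initial state f[-1] = 0, b = 0.18 come from the text.

```python
B_MIN, B_MAX = 0.18, 0.956


class TimeVaryingLowPass:
    """One-pole low-pass f[i] = (1-b)*x[i] + b*f[i-1] with a drifting coefficient."""

    def __init__(self, p1, p2):
        # ASSUMPTION: b moves linearly, reaching the other limit in p1 or p2 samples.
        self.step_down = (B_MAX - B_MIN) / p1
        self.step_up = (B_MAX - B_MIN) / p2
        self.b = B_MIN   # initial coefficient (widest passband)
        self.prev = 0.0  # f[-1] = 0

    def process(self, frame, vad):
        out = []
        for x in frame:
            if vad:
                self.b -= self.step_down  # widen the passband
            else:
                self.b += self.step_up    # narrow the passband
            self.b = min(B_MAX, max(B_MIN, self.b))  # clamp to [0.18, 0.956]
            self.prev = (1 - self.b) * x + self.b * self.prev
            out.append(self.prev)
        return out
```

A smaller b lets more of the current input through, so the passband is widest at b = 0.18 and narrowest at b = 0.956. The filter state is kept across frames so that consecutive frames join without a discontinuity.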
Further,
When the current frame of a signal is in the speech-present state, the step 3 specifically includes the following steps:
Step 31: Perform a DFT on the current frame of the signal;
Step 32: Compute the loudness value of the current frame;
Step 33: Compute the mean loudness of the signal within the current predetermined time period;
When the current frame of a signal is in the speech-absent state, the mean loudness of the signal within the current predetermined time period equals the mean loudness within the previous predetermined time period, before the current frame, that contained a voice signal;
Further,
A DFT is performed on the current frame of a signal by the formula [formula not reproduced], where s denotes the discrete frequency, x[i] denotes the i-th input datum of the current frame, X[s] denotes the result of applying the DFT to x[i], and j denotes the imaginary unit, j^2 = -1;
The loudness value of the current frame is computed by the formula [formula not reproduced], where loudness denotes the loudness value of the current frame, X[s] denotes the result of applying the DFT to x[i], Equal[s] denotes an equal-loudness array with preset values, s_20 = ceil(20*N/F_S), s_20000 = floor(20000*N/F_S), ceil(x) denotes the smallest integer greater than or equal to x, floor(x) denotes the largest integer less than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame;
The mean loudness of a signal within the current predetermined time period is computed by the formula [formula not reproduced]. The current predetermined time period is the duration T from the r-th frame before the current frame to the current frame, where r is given by [formula not reproduced], ceil(x) denotes the smallest integer greater than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame;
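The band limits s_20 and s_20000 above can be computed directly, as in the Python sketch below. Everything beyond those two limits is an assumption, because the DFT and loudness formulas are not reproduced in this text: the sketch uses the standard unnormalized DFT and treats frame loudness as the Equal[s]-weighted sum of squared magnitudes over the bins from s_20 to s_20000. The function names are hypothetical.

```python
import cmath
import math


def band_limits(n, fs):
    """Bin indices s_20 and s_20000 as defined in the text."""
    return math.ceil(20 * n / fs), math.floor(20000 * n / fs)


def dft(frame):
    """Standard unnormalized DFT (ASSUMED form), O(N^2) for clarity."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * s * i / n) for i, x in enumerate(frame))
        for s in range(n)
    ]


def frame_loudness(frame, fs, equal):
    """ASSUMED loudness: weighted spectral power between s_20 and s_20000."""
    s20, s20000 = band_limits(len(frame), fs)
    spectrum = dft(frame)
    return sum(equal[s] * abs(spectrum[s]) ** 2 for s in range(s20, s20000 + 1))
```

With N = 2048 and F_S = 48 kHz, `band_limits` returns bins 1 and 853, so the DC bin and everything above 20 kHz are excluded from the sum.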
Further, the current-frame weight of the k-th signal is obtained by the formula [formula not reproduced], and the weight of the i-th sample of the current frame of the k-th signal is obtained by the formula weight_k[i] = weight_k, 0 ≤ i < N, where weight_k denotes the current-frame weight of the k-th signal, LOUD_k denotes the mean loudness of the k-th signal within the current predetermined time period, M denotes the number of signals participating in the mix, k = 1, 2, ..., M, and weight_k[i] denotes the weight of the i-th sample of the current frame of the k-th signal;
Further, the single current-frame signal produced by mixing the current frames of all signals is obtained by the formula [formula not reproduced], where Mix[i] denotes the i-th sample of the mixed current-frame signal, M denotes the number of signals participating in the mix, f_k[i] denotes the i-th output sample of the current frame of the k-th signal after time-varying low-pass filtering, weight_k[i] denotes the weight of the i-th sample in the current frame of the k-th signal, and k = 1, 2, ..., M;
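The weighting and mixing steps can be illustrated with the Python sketch below. Both formulas are missing from this text, so the sketch rests on two labeled assumptions: each frame weight scales the signal toward the average of all mean loudness values (weight_k = mean of LOUD over the M signals divided by LOUD_k), and the mix is the plain weighted sum of the filtered samples. The function names are hypothetical.

```python
def channel_weights(mean_loudness):
    """ASSUMED rule: weight_k = (average of all LOUD values) / LOUD_k.

    Every LOUD_k must be non-zero.
    """
    target = sum(mean_loudness) / len(mean_loudness)
    return [target / loud for loud in mean_loudness]


def mix_frame(filtered, sample_weights):
    """ASSUMED rule: Mix[i] = sum over k of f_k[i] * weight_k[i].

    filtered[k][i] is f_k[i]; sample_weights[k][i] is weight_k[i].
    """
    n = len(filtered[0])
    return [
        sum(f[i] * w[i] for f, w in zip(filtered, sample_weights))
        for i in range(n)
    ]
```

Under this assumption a quiet signal receives a weight above 1 and a loud signal a weight below 1, which matches the stated goal of bringing the mean loudness of all signals close together.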
The method further comprises the following step: obtaining the maximum sample in the current frame of the mixed signal and computing the current-frame weight of the mixed signal;
Further, the following step is also performed after the step 4: smoothing the frame weights of each signal participating in the mix;
Further, the frame weights of a signal are smoothed as follows:
When the current frame of the signal is the first frame, the weight of the i-th sample of the current frame of the k-th signal is obtained by the formula weight_k[i] = weight_k, 0 ≤ i < N;
When the current frame of the signal is not the first frame, the weight of the i-th sample of the current frame of the k-th signal is obtained by the formula [formula not reproduced], where weight_k[i] denotes the weight of the i-th sample of the current frame of the k-th signal, N denotes the number of samples in a frame, and P denotes the number of samples over which weight_k[i] changes gradually from the weight of the frame 1 before the current frame to the weight of the current frame;
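The smoothing rule above can be sketched in Python as follows. The smoothing formula itself is not reproduced in this text, so a linear ramp over the first P samples of the frame is assumed; the function name and its parameters are hypothetical.

```python
def smoothed_sample_weights(prev_weight, cur_weight, n, p, first_frame=False):
    """Per-sample weights for one frame of n samples.

    ASSUMPTION: the weight moves linearly from the previous frame weight to
    the current frame weight over the first p samples, then stays constant.
    """
    if first_frame:
        # First frame: every sample takes the frame weight directly.
        return [cur_weight] * n
    weights = []
    for i in range(n):
        if i < p:
            weights.append(prev_weight + (cur_weight - prev_weight) * (i + 1) / p)
        else:
            weights.append(cur_weight)
    return weights
```

For example, moving from a frame weight of 1.0 to 2.0 with P = 4 gives 1.25, 1.5, 1.75, 2.0 for the first four samples and 2.0 for the rest, so the gain never jumps at a frame boundary.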
The maximum sample in the current frame of the mixed signal is obtained by the formula signal_max = max{|Mix[0]|, |Mix[1]|, ..., |Mix[N-1]|}, where signal_max denotes the maximum sample in the current frame of the mixed signal, max{ } denotes the maximum of the data in the braces, Mix[0] denotes the output of sample 0 of the current frame of the mixed signal, Mix[1] denotes the output of sample 1, and Mix[N-1] denotes the output of sample N-1;
When signal_max ≤ 32768, the current-frame weight of the mixed signal is weight_mix = 1; when signal_max > 32768, the current-frame weight of the mixed signal is computed as weight_mix = [formula not reproduced];
When the current frame of the mixed signal is the first frame, the weight of the i-th sample of the current frame of the mixed signal is obtained by the formula weight_mix[i] = weight_mix, 0 ≤ i < N; when the current frame of the mixed signal is not the first frame, that weight is obtained by the formula [formula not reproduced], where weight_mix[i] denotes the weight of the i-th sample of the current frame of the mixed signal, and Q denotes the number of samples over which weight_mix[i] changes gradually from the weight of the mixed frame 1 before the current frame to the weight of the current mixed frame;
The i-th output sample y[i] of the current frame of the mixed signal is obtained by the formula [formula not reproduced], where Final[i] = Mix[i]*weight_mix[i], 0 ≤ i < N.
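A minimal Python sketch of the post-processing stage follows. The formulas for weight_mix above 32768 and for y[i] are not reproduced in this text, so two assumptions are made and labeled in the code: the frame weight is 32768/signal_max, and y[i] is Final[i] rounded and saturated to the signed 16-bit range. The function names are hypothetical; signal_max, the 32768 threshold, and Final[i] = Mix[i]*weight_mix[i] come from the text.

```python
FULL_SCALE = 32768


def mix_frame_weight(mix):
    """Frame weight of the mixed signal from its largest absolute sample."""
    signal_max = max(abs(v) for v in mix)
    if signal_max <= FULL_SCALE:
        return 1.0
    # ASSUMPTION: scale the frame so that its peak lands on full scale.
    return FULL_SCALE / signal_max


def post_process(mix, sample_weights):
    """Final[i] = Mix[i] * weight_mix[i], then ASSUMED 16-bit saturation."""
    final = [m * w for m, w in zip(mix, sample_weights)]
    return [max(-32768, min(32767, int(round(v)))) for v in final]
```

The saturation step still matters when per-sample weights are smoothed across frames, because a weight that is still ramping down can leave a few samples above full scale.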
The upper limit of the passband width of the time-varying filter of the present invention is 20 kHz and the lower limit is 0.3 kHz; when the filter passband would exceed the upper limit or fall below the lower limit, it is held at that limit. Before time-varying low-pass filtering is performed, the time-varying filter is first initialized by setting f[-1] = 0 and b = 0.18, at which point the passband of the time-varying filter is 0 to 20 kHz. The current predetermined time period of the present invention may be the current 4 s; the previous predetermined time period is the 4 s preceding the current 4 s. Equal[s] denotes an equal-loudness array with preset values; the preset values in the equal-loudness array are obtained from equal-loudness curves, and their specific values are listed in Table 1;
Table 1. Values of the equal-loudness array Equal[s].
The procedure for obtaining an Equal[s] value is illustrated with reference to Table 1: 1. compute the value of s*N/F_S; 2. using that value, find the frequency range in Table 1 that contains it; 3. read the Equal[s] value corresponding to that frequency range in Table 1. For example, when s*N/F_S = 1, the value falls within the frequency range 0.985 to 1.500 of Table 1, so Equal[s] = 1.5.
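The three-step lookup above can be sketched in Python as follows. Table 1 is not reproduced in this text, so the table in the sketch holds only the single row quoted in the example (range 0.985 to 1.500, value 1.5); the function name, the row layout, and the choice to treat each range as closed at the bottom and open at the top are assumptions.

```python
# (low, high, value) rows. Only this one row is quoted in the text;
# the remaining rows of Table 1 are not available here.
EQUAL_TABLE = [
    (0.985, 1.500, 1.5),
]


def equal_value(s, n, fs, table=EQUAL_TABLE):
    """Look up Equal[s] from the range containing s*N/F_S, or None if no row matches."""
    key = s * n / fs  # step 1: compute s*N/F_S as written in the text
    for low, high, value in table:  # step 2: find the containing range
        if low <= key < high:       # ASSUMPTION: half-open ranges
            return value            # step 3: read the value
    return None
```

Because the ranges do not depend on the signal, the whole Equal array can be computed once per frame size and sampling frequency and then reused for every frame.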
To make the mean loudness of all signals identical, an equal-loudness weight must be computed for each signal. With the smoothing step of the present invention, the weight changes between frames over a transition of P samples, which ensures that the weights vary smoothly from frame to frame; the voice signal therefore sounds smoother, which helps preserve speech quality. If the weight calculation unit does not apply the preferred smoothing to the frame weights of each signal, then the weight of every sample in the current frame of the k-th signal (weight_k[i], i = 0, 1, 2, ..., N-1) equals the current-frame weight weight_k of the k-th signal, that is, weight_k[i] = weight_k for 0 ≤ i < N. If the weight calculation unit does apply the preferred smoothing to the frame weights of each signal, then when the current frame of a signal is the first frame, the weight calculation unit obtains the weight of the i-th sample of the current frame of the k-th signal by the formula weight_k[i] = weight_k, 0 ≤ i < N; when the current frame of a signal is not the first frame, the weight calculation unit obtains that weight by the formula [formula not reproduced]. In other words, after smoothing, the sample weights in the current frame of the k-th signal are no longer all equal to the current-frame weight weight_k of the k-th signal. In addition, after the signals are mixed, post-processing can be applied to prevent the frequent overflow that may occur when several voice signals are added; the post-processing also keeps the weights of successive frames steady, which in turn helps the mixed speech sound smooth. The present invention computes the weight of each voice signal using its loudness as the criterion, makes the mean loudness of all voice signals identical, and finally applies overflow handling, so that after mixing the voice signals sound nearly equally loud and overflow is infrequent.
The effectiveness of the present invention is further illustrated below by using the mixer of the present invention to mix three voice signals with different characteristics. The sampling frequency F_S is 48 kHz; the number of samples N in a frame is 2048; the current predetermined time period is the current 4 seconds; the frame count r is 100; the number of samples p1 in the time span over which b changes gradually from 0.956 to 0.18 is 960; the number of samples p2 in the time span over which b changes gradually from 0.18 to 0.956 is 96000; the number of samples P over which weight_k[i] changes gradually from the weight of the frame 1 before the current frame to the weight of the current frame is 100; and the number of samples Q over which weight_mix[i] changes gradually from the weight of the mixed frame 1 before the current frame to the weight of the current mixed frame is 100;
Fig. 5 shows the waveforms of the three voice signals with different characteristics. As shown in Fig. 5, to verify the mixing of a small signal, the first voice signal (signal 1) has an amplitude range of -3500 to 3500, much smaller than the other two voice signals; since loudness is proportional to amplitude, the loudness of the first voice signal is also much smaller than that of the other two. To verify the effectiveness of the speech detection unit and the time-varying filter, the second voice signal (signal 2) alternates between the "speech-present" state and the "speech-absent" state and has uniform white noise added. The third voice signal (signal 3) has a small amplitude in its earlier part and a larger amplitude in its later part, so that the change in its amplitude before and after mixing can be compared and the change in its loudness analyzed. Fig. 6 shows the waveforms of the three voice signals after processing by the loudness computing unit and the weight calculation unit of the present invention. As shown in Fig. 6, after processing by the speech detection unit, the time-varying filter, the loudness computing unit, and the weight calculation unit, the three voice signals change in different ways: the amplitude of the first voice signal (signal 1) increases markedly, and its loudness increases with it; while the second voice signal (signal 2) remains in the "speech-absent" state, the uniform white noise in the signal is attenuated and the signal amplitude is also reduced; the amplitude of the earlier part of the third voice signal (signal 3) increases, while the amplitude of its later part decreases. Fig. 7 shows the waveform of the output signal of the mixer of the present invention. As shown in Fig. 7, the mixing unit superimposes the three voice signals, and the post-processing unit then ensures that the final output signal overflows only infrequently and that any overflow is handled by saturation. The above test results show that three voice signals of different loudness have nearly equal mean loudness after equal-loudness control by the mixer of the present invention; since the three voice signals have different characteristics, the results also demonstrate the good robustness and stability of the present invention.
The present invention provides a new voice activity detection scheme, namely judging from the mean power of the voice signal whether the current frame is a voice signal. By introducing the time-varying filter, the present invention solves the problem in current mixing technology that every signal participates directly in the mix and introduces unnecessary noise, while avoiding the problem caused by using silence detection to reduce the number of mixed signals, namely that participants who are "not speaking" give no sense of presence. The present invention adopts an equal-loudness control strategy: the weight of each signal is derived from its computed loudness, so that the mean loudness of all signals ends up nearly identical and they sound similarly loud. The present invention thus improves the speech quality of small signals while also treating every participant fairly.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to the technical solution and inventive concept of the present invention, shall fall within the scope of protection of the present invention.
Claims (10)
1. A mixer, characterized in that the mixer comprises:
a framing unit, configured to frame each signal participating in the mix;
a speech detection unit connected to the framing unit; the speech detection unit is configured to detect whether each framed signal contains a voice signal; the speech detection unit determines whether a signal contains a voice signal by detecting whether the current frame of that signal is in the speech-present state;
a time-varying filter connected to the speech detection unit; the time-varying filter is configured to apply time-varying low-pass filtering to each framed signal according to the detection result of the speech detection unit; when the current frame of a signal is in the speech-present state, the passband of the time-varying filter gradually widens, and when the current frame of a signal is in the speech-absent state, the passband of the time-varying filter gradually narrows;
a loudness computing unit connected to the speech detection unit; the loudness computing unit is configured to compute, according to the detection result of the speech detection unit, the mean loudness of each framed signal within the current predetermined time period;
a weight calculation unit connected to the loudness computing unit; the weight calculation unit is configured to compute, according to the result of the loudness computing unit, the weight of each sample in the current frame of each signal participating in the mix;
a mixing unit connected to the time-varying filter and the weight calculation unit; the mixing unit is configured to obtain and output, from the current-frame output of each signal after time-varying low-pass filtering and the sample weights of the current frame of each signal, the single current-frame signal produced by mixing the current frames of all signals;
a post-processing unit connected to the mixing unit; the post-processing unit is configured to compute, from the single current-frame signal output by the mixing unit, the sample weights and the output samples of the current frame of the mixed signal.
2. The mixer according to claim 1, characterized in that the speech detection unit comprises:
a power computation module, configured to compute the power of the current frame of each framed signal;
a minimum frame power determination module connected to the power computation module; the minimum frame power determination module is configured to obtain, from the result of the power computation module, the minimum frame power of each framed signal within the current predetermined time period;
a voice state acquisition module connected to the power computation module and the minimum frame power determination module; the voice state acquisition module is configured to detect whether a signal contains a voice signal by comparing the power of the current frame of that signal with the minimum frame power.
3. The mixer according to claim 2, characterized in that
the power computation module computes the power of the current frame of a signal by the formula [formula not reproduced], where pow denotes the power of the current frame, x[i] denotes the i-th input datum of the current frame, and N denotes the number of samples in a frame;
the current predetermined time period is the duration T from the r-th frame before the current frame to the current frame; the minimum frame power determination module obtains the minimum frame power of a signal within the current predetermined time period by the formula pow_min = min{power of the current frame, power of the frame 1 before the current frame, ..., power of the frame r before the current frame}, where min{ } denotes the minimum of all the data in the braces, r is given by [formula not reproduced], ceil(x) denotes the smallest integer greater than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame;
the voice state acquisition module uses a flag VAD to indicate whether a signal contains a voice signal, and initializes VAD to 1; when pow ≥ 32*pow_min and VAD = 0, the voice state acquisition module sets VAD to 1, indicating that the signal is in the speech-present state; when pow ≤ 4*pow_min and VAD = 1, the voice state acquisition module sets VAD to 0, indicating that the signal is in the speech-absent state; in every other case of the comparison between pow and pow_min, the voice state acquisition module leaves VAD unchanged.
4. The mixer according to claim 1, characterized in that the time-varying filter obtains the i-th filtered output value of the current frame of a signal by the formula f[i] = (1-b)*x[i] + b*f[i-1], where f[i] denotes the i-th filtered output value of the current frame of the signal, x[i] denotes the i-th input datum of the current frame, N denotes the number of samples in a frame, 0 ≤ i < N, and b denotes the filter coefficient; when the current frame is in the speech-present state, b = [formula not reproduced]; when the current frame is in the speech-absent state, b = [formula not reproduced]; when b < 0.18, b is set to 0.18, and when b > 0.956, b is set to 0.956; p1 denotes the number of samples in the time span over which b changes gradually from 0.956 to 0.18, and p2 denotes the number of samples in the time span over which b changes gradually from 0.18 to 0.956.
5. The mixer according to claim 1, characterized in that
the loudness computing unit comprises a DFT module, a loudness acquisition module connected to the DFT module, and a mean loudness acquisition module connected to the loudness acquisition module;
when the current frame of a signal is in the speech-present state, the DFT module performs a DFT on the current frame of that signal, the loudness acquisition module then computes the loudness value of the current frame, and finally the mean loudness acquisition module computes the mean loudness of the signal within the current predetermined time period;
when the current frame of a signal is in the speech-absent state, the mean loudness of the signal within the current predetermined time period equals the mean loudness within the previous predetermined time period, before the current frame, that contained a voice signal.
6. The mixer according to claim 5, characterized in that:
The DFT transform module performs a DFT on the current frame of a signal by the formula $X[s] = \sum_{i=0}^{N-1} x[i]\, e^{-j 2\pi s i / N}$, $s = 0, 1, \ldots, N-1$, where s represents the discrete frequency, x[i] represents the i-th input sample of the current frame, X[s] represents the result obtained from x[i] after the DFT, and j represents the imaginary unit, $j^2 = -1$;
The loudness obtaining module calculates the loudness value of the current frame by the formula [formula missing in source], where loudness represents the loudness value of the current frame, X[s] represents the result obtained from x[i] after the DFT, Equal[s] represents an equal-loudness array with preset values, $s_{20} = \mathrm{ceil}(20 N / F_S)$, $s_{20000} = \mathrm{floor}(20000 N / F_S)$, ceil(x) represents the nearest integer greater than or equal to x, floor(x) represents the nearest integer less than or equal to x, $F_S$ represents the sampling frequency, and N represents the number of samples in a frame;
The mean loudness obtaining module calculates the mean loudness of a signal over the current predetermined time period by the formula [formula missing in source]; the current predetermined time period spans from the r-th frame before the current frame to the current frame and has duration T; in the formula, ceil(x) represents the nearest integer greater than or equal to x, $F_S$ represents the sampling frequency, and N represents the number of samples in a frame.
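The DFT and the band limits s20 and s20000 named in claim 6 can be illustrated with a short Python sketch. The DFT and the ceil/floor index formulas follow the definitions in the claim. The loudness expression itself is not reproduced in this text, so `frame_loudness` below uses an assumed form (an Equal[s]-weighted sum of squared magnitudes between s20 and s20000); the clamp to N/2 is also an assumption added so the sketch works at low sampling rates.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def dft(x: Sequence[float]) -> List[complex]:
    """Direct O(N^2) DFT: X[s] = sum_i x[i] * exp(-j*2*pi*s*i/N)."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * cmath.pi * s * i / n) for i in range(n))
        for s in range(n)
    ]


def band_edges(n: int, fs: float) -> Tuple[int, int]:
    """DFT bin indices for 20 Hz and 20 kHz, as defined in the claim."""
    s20 = math.ceil(20 * n / fs)
    s20000 = math.floor(20000 * n / fs)
    return s20, s20000


def frame_loudness(x: Sequence[float], fs: float, equal: Sequence[float]) -> float:
    """ASSUMED loudness: weighted power sum over the audible band."""
    n = len(x)
    spectrum = dft(x)
    s20, s20000 = band_edges(n, fs)
    # ASSUMPTION: never index past the Nyquist bin.
    s_hi = min(s20000, n // 2)
    return sum(equal[s] * abs(spectrum[s]) ** 2 for s in range(s20, s_hi + 1))
```

For a 20 ms frame at 48 kHz (N = 960), `band_edges(960, 48000)` selects bins 1 through 400. A production implementation would normally replace the direct DFT with an FFT.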
7. The mixer according to claim 6, characterized in that:
The weight calculation unit obtains the current-frame weight of the k-th signal by the formula [formula missing in source], and obtains the weight of the i-th sample of the current frame of the k-th signal by the formula $\mathrm{weight}_k[i] = \mathrm{weight}_k$, $0 \le i < N$; where $\mathrm{weight}_k$ represents the current-frame weight of the k-th signal, $\mathrm{LOUD}_k$ represents the mean loudness of the k-th signal over the current predetermined time period, M represents the number of signals participating in the mix, k = 1, 2, …, M, and $\mathrm{weight}_k[i]$ represents the weight of the i-th sample of the current frame of the k-th signal;
The downmixing unit obtains the single mixed current-frame signal from the current frames of all signals by the formula [formula missing in source], $0 \le i < N$; where Mix[i] represents the i-th sample of the mixed current-frame signal, M represents the number of signals participating in the mix, $f_k[i]$ represents the output of the i-th sample of the current frame of the k-th signal after time-varying low-pass filtering, $\mathrm{weight}_k[i]$ represents the weight of the i-th sample in the current frame of the k-th signal, and k = 1, 2, …, M;
The post-processing unit is further configured to obtain the maximum sample in the current frame of the mixed signal and to calculate the current-frame weight of the mixed signal.
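A rough Python sketch of the weighting and downmixing stage in claim 7 is given below. Neither the weight formula nor the mixing formula is reproduced in this text, so both functions are assumptions: `channel_weights` uses inverse-loudness weights normalised to sum to 1 (one plausible way to equalise loudness), and `downmix` uses a per-sample weighted sum of the filtered frames. The function names are hypothetical.

```python
from typing import List, Sequence


def channel_weights(mean_loudness: Sequence[float]) -> List[float]:
    """ASSUMED weight rule: quieter signals get proportionally larger weights."""
    # A small floor avoids division by zero for a silent signal.
    inverse = [1.0 / max(loud, 1e-12) for loud in mean_loudness]
    total = sum(inverse)
    return [value / total for value in inverse]


def downmix(
    frames: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
) -> List[float]:
    """ASSUMED mix rule: Mix[i] = sum over k of weights[k][i] * frames[k][i]."""
    n = len(frames[0])
    return [
        sum(w[i] * f[i] for f, w in zip(frames, weights))
        for i in range(n)
    ]
```

Here `frames[k]` stands for the filtered current frame of the k-th signal and `weights[k]` for its per-sample weights; without smoothing, every entry of `weights[k]` is simply the frame weight of that signal.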
8. The mixer according to claim 7, characterized in that:
The weight calculation unit is further configured to smooth the per-frame weights of each signal participating in the mix.
9. The mixer according to claim 8, characterized in that:
The weight calculation unit smooths the per-frame weights of a signal as follows:
When the current frame of a signal is the first frame, the weight calculation unit obtains the weight of the i-th sample of the current frame of the k-th signal by the formula $\mathrm{weight}_k[i] = \mathrm{weight}_k$, $0 \le i < N$;
When the current frame of a signal is not the first frame, the weight calculation unit obtains the weight of the i-th sample of the current frame of the k-th signal by the formula [formula missing in source], $0 \le i < P$; where $\mathrm{weight}_k[i]$ represents the weight of the i-th sample of the current frame of the k-th signal, N represents the number of samples in a frame, and P represents the number of samples over which $\mathrm{weight}_k[i]$ gradually fades from the weight of the frame preceding the current frame to the current-frame weight;
The post-processing unit obtains the maximum sample in the current frame of the mixed signal by the formula $\mathrm{signal}_{max} = \max\{|Mix[0]|, |Mix[1]|, \ldots, |Mix[N-1]|\}$; where $\mathrm{signal}_{max}$ represents the maximum sample in the current frame of the mixed signal, max{ } represents the maximum of the values in the braces, Mix[0] represents the output of sample 0 of the current frame of the mixed signal, Mix[1] represents the output of sample 1, and Mix[N-1] represents the output of sample N-1;
When $\mathrm{signal}_{max} \le 32768$, the post-processing unit sets the current-frame weight of the mixed signal to $\mathrm{weight}_{mix} = 1$; when $\mathrm{signal}_{max} > 32768$, the post-processing unit calculates the current-frame weight of the mixed signal by the formula [formula missing in source];
When the current frame of the mixed signal is the first frame, the post-processing unit obtains the weight of the i-th sample of the current frame of the mixed signal by the formula $\mathrm{weight}_{mix}[i] = \mathrm{weight}_{mix}$, $0 \le i < N$; when the current frame of the mixed signal is not the first frame, the post-processing unit obtains that weight by the formula [formula missing in source], $0 \le i < Q$; where $\mathrm{weight}_{mix}[i]$ represents the weight of the i-th sample of the current frame of the mixed signal, and Q represents the number of samples over which $\mathrm{weight}_{mix}[i]$ gradually fades from the weight of the preceding frame of the mixed signal to the current-frame weight of the mixed signal;
The post-processing unit obtains the i-th output sample y[i] of the current frame of the mixed signal by the formula [formula missing in source]; where $\mathrm{final}[i] = Mix[i] \cdot \mathrm{weight}_{mix}[i]$, $0 \le i < N$.
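The smoothing and post-processing of claim 9 can be sketched in Python as follows. The fade formulas, the gain applied when the peak exceeds 32768 and the final output formula are not reproduced in this text, so the sketch assumes a linear fade over the first P (or Q) samples, a gain of 32768 / signal_max, and clamping of the result to the signed 16-bit range. All function names are hypothetical.

```python
from typing import List, Sequence

FULL_SCALE = 32768  # threshold named in the claim


def smoothed_weights(
    prev_w: float, cur_w: float, n: int, p: int, first_frame: bool
) -> List[float]:
    """Per-sample weights; ASSUMED linear fade from prev_w to cur_w."""
    if first_frame:
        # First frame: every sample uses the frame weight directly.
        return [cur_w] * n
    p = min(p, n)
    ramp = [prev_w + (cur_w - prev_w) * (i + 1) / p for i in range(p)]
    # After the fade, the remaining samples hold the current-frame weight.
    return ramp + [cur_w] * (n - p)


def frame_gain(mix: Sequence[float]) -> float:
    """Frame weight of the mixed signal, based on its peak magnitude."""
    signal_max = max(abs(v) for v in mix)
    if signal_max <= FULL_SCALE:
        return 1.0
    # ASSUMPTION: scale the peak back down to full scale.
    return FULL_SCALE / signal_max


def postprocess(mix: Sequence[float], gains: Sequence[float]) -> List[int]:
    """final[i] = Mix[i] * weight_mix[i], then ASSUMED 16-bit clamping."""
    out = []
    for m, g in zip(mix, gains):
        final = m * g
        out.append(int(max(-32768, min(32767, round(final)))))
    return out
```

The same `smoothed_weights` helper serves both the per-signal weights (fade length P) and the mixed-signal weight (fade length Q), because the claim describes the two fades in identical terms.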
10. A sound mixing method, characterized in that the method comprises the following steps:
Step 1: Frame each signal participating in the mix separately;
Step 2: Detect whether each framed signal contains a voice signal, determining this for a given signal by detecting whether its current frame is in the voice-present state;
Step 3: According to the voice detection result, apply time-varying low-pass filtering to each framed signal separately: when the current frame of a signal is in the voice-present state, the passband gradually widens; when the current frame of a signal is in the voice-absent state, the passband gradually narrows;
Step 4: According to the voice detection result, calculate the mean loudness of each framed signal over the current predetermined time period;
Step 5: According to the mean loudness results, calculate the weight of each sample in the current frame of each signal participating in the mix;
Step 6: From the current-frame output of each signal after time-varying low-pass filtering and the weights of the samples in the current frame of each signal, obtain and output the single mixed current-frame signal;
Step 7: From the mixed current-frame signal, calculate the weight of each sample and each output sample of the current frame of the mixed signal.
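Step 3 of the method can be illustrated with a small Python sketch of a low-pass filter whose passband widens while voice is present and narrows while it is absent. The filter structure is not specified in this text, so the one-pole smoother below, the class name `TimeVaryingLowPass` and the parameters `alpha_min`, `alpha_max` and `step` are all assumptions chosen only to show the gradual change from frame to frame.

```python
from typing import List, Sequence


class TimeVaryingLowPass:
    """One-pole low-pass y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    A larger alpha means a wider passband; alpha = 1 passes the input through.
    """

    def __init__(
        self, alpha_min: float = 0.05, alpha_max: float = 1.0, step: float = 0.05
    ) -> None:
        self.alpha_min = alpha_min
        self.alpha_max = alpha_max
        self.step = step
        self.alpha = alpha_min  # start with the narrowest passband
        self.state = 0.0

    def process(self, frame: Sequence[float], has_voice: bool) -> List[float]:
        # Move alpha one step per frame toward its target, so the passband
        # changes gradually instead of switching abruptly.
        if has_voice:
            self.alpha = min(self.alpha_max, self.alpha + self.step)
        else:
            self.alpha = max(self.alpha_min, self.alpha - self.step)
        out = []
        for x in frame:
            self.state += self.alpha * (x - self.state)
            out.append(self.state)
        return out
```

Each input signal would own one filter instance, called once per frame with the voice detection result from Step 2, and its output would feed the weighted mix of Step 6.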
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610939143.8A CN106504758B (en) | 2016-10-25 | 2016-10-25 | Mixer and sound mixing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610939143.8A CN106504758B (en) | 2016-10-25 | 2016-10-25 | Mixer and sound mixing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106504758A (en) | 2017-03-15 |
CN106504758B (en) | 2019-07-16 |
Family
ID=58319112
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610939143.8A Active CN106504758B (en) | 2016-10-25 | 2016-10-25 | Mixer and sound mixing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106504758B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107202976A (en) * | 2017-05-15 | 2017-09-26 | 大连理工大学 | The distributed microphone array sound source localization system of low complex degree |
CN111770413A (en) * | 2020-06-30 | 2020-10-13 | 浙江大华技术股份有限公司 | Multi-sound-source sound mixing method and device and storage medium |
CN112750444A (en) * | 2020-06-30 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Sound mixing method and device and electronic equipment |
CN112951197A (en) * | 2021-04-02 | 2021-06-11 | 北京百瑞互联技术有限公司 | Audio mixing method, device, medium and equipment |
CN112995425A (en) * | 2021-05-13 | 2021-06-18 | 北京百瑞互联技术有限公司 | Equal loudness sound mixing method and device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1770256A (en) * | 2004-11-02 | 2006-05-10 | 北京中科信利技术有限公司 | Digital audio frequency mixing method based on transform domain |
CN101477800A (en) * | 2008-12-31 | 2009-07-08 | 瑞声声学科技(深圳)有限公司 | Voice enhancing process |
CN101593522A (en) * | 2009-07-08 | 2009-12-02 | 清华大学 | A kind of full frequency domain digital hearing aid method and apparatus |
CN102664019A (en) * | 2012-04-27 | 2012-09-12 | 深圳市邦彦信息技术有限公司 | DSP sound mixing method and device for full-interactive conference |
CN102779527A (en) * | 2012-08-07 | 2012-11-14 | 无锡成电科大科技发展有限公司 | Speech enhancement method on basis of enhancement of formants of window function |
CN104539816A (en) * | 2014-12-25 | 2015-04-22 | 广州华多网络科技有限公司 | Intelligent voice mixing method and device for multi-party voice communication |
CN104616665A (en) * | 2015-01-30 | 2015-05-13 | 深圳市云之讯网络技术有限公司 | Voice similarity based sound mixing method |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107202976A (en) * | 2017-05-15 | 2017-09-26 | 大连理工大学 | The distributed microphone array sound source localization system of low complex degree |
CN111770413A (en) * | 2020-06-30 | 2020-10-13 | 浙江大华技术股份有限公司 | Multi-sound-source sound mixing method and device and storage medium |
CN112750444A (en) * | 2020-06-30 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Sound mixing method and device and electronic equipment |
CN111770413B (en) * | 2020-06-30 | 2021-08-27 | 浙江大华技术股份有限公司 | Multi-sound-source sound mixing method and device and storage medium |
CN112750444B (en) * | 2020-06-30 | 2023-12-12 | 腾讯科技(深圳)有限公司 | Sound mixing method and device and electronic equipment |
CN112951197A (en) * | 2021-04-02 | 2021-06-11 | 北京百瑞互联技术有限公司 | Audio mixing method, device, medium and equipment |
CN112951197B (en) * | 2021-04-02 | 2022-06-24 | 北京百瑞互联技术有限公司 | Audio mixing method, device, medium and equipment |
CN112995425A (en) * | 2021-05-13 | 2021-06-18 | 北京百瑞互联技术有限公司 | Equal loudness sound mixing method and device |
CN112995425B (en) * | 2021-05-13 | 2021-09-07 | 北京百瑞互联技术有限公司 | Equal loudness sound mixing method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106504758B (en) | 2019-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106504758A (en) | Mixer and sound mixing method | |
CN101960516B (en) | Speech enhancement | |
Lavandier et al. | Binaural prediction of speech intelligibility in reverberant rooms with multiple noise sources | |
CN103413547B (en) | A kind of method that room reverberation is eliminated | |
CN104485114B (en) | A kind of method of the voice quality objective evaluation based on auditory perception property | |
CN103238183A (en) | Noise suppression device | |
US20150302865A1 (en) | System and method for audio conferencing | |
WO2017129005A1 (en) | Audio mixing method and apparatus | |
CN102354500A (en) | Virtual bass boosting method based on harmonic control | |
CN104658543A (en) | Method for eliminating indoor reverberation | |
CN104916288B (en) | The method and device of the prominent processing of voice in a kind of audio | |
EP2860989B1 (en) | System and method for dynamically mixing audio signals | |
CN104616665B (en) | Sound mixing method based on voice similar degree | |
CN103280225B (en) | Low-complexity silence detection method | |
CN101740035A (en) | Call voice processing apparatus, call voice processing method and program | |
CN112750444A (en) | Sound mixing method and device and electronic equipment | |
WO2015085946A1 (en) | Voice signal processing method, apparatus and server | |
Sato et al. | Relationship between listening difficulty and acoustical objective measures in reverberant sound fields | |
Schoenmaker et al. | The multiple contributions of interaural differences to improved speech intelligibility in multitalker scenarios | |
Liu et al. | The speech intelligibility and applicability of the speech transmission index in large spaces | |
CN109887521B (en) | Dynamic master tape processing method and device for audio | |
Zhang et al. | A new method of objective speech quality assessment in communication system | |
CN105720939B (en) | A kind of processing method and electronic equipment of audio data | |
CN104424954B (en) | noise estimation method and device | |
Bhat et al. | Smartphone based real-time super gaussian single microphone speech enhancement to improve intelligibility for hearing aid users using formant information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||