CN106504758A - Mixer and sound mixing method - Google Patents

Publication number
CN106504758A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610939143.8A
Other languages
Chinese (zh)
Other versions
CN106504758B (en)
Inventor
陈喆
殷福亮
呼德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Application filed by Dalian University of Technology
Priority to CN201610939143.8A
Publication of CN106504758A
Application granted
Publication of CN106504758B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0324 Details of processing therefor
    • G10L21/034 Automatic adjustment
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a mixer and a mixing method. The mixer comprises: a framing unit; a voice detection unit connected to the framing unit, which detects whether each framed channel signal contains a voice signal; a time-varying filter connected to the voice detection unit; a loudness calculation unit connected to the voice detection unit; a weight calculation unit connected to the loudness calculation unit; a mixing unit connected to the time-varying filter and the weight calculation unit; and a post-processing unit connected to the mixing unit. The invention improves voice quality and treats every participant fairly.

Description

Mixer and sound mixing method
Technical field
The present invention relates to audio mixing technology, and in particular to a mixer and a mixing method.
Background art
Video conferences and teleconferences are meetings held over a communication network; they give participants in different locations real-time voice communication. Audio mixing is indispensable for approaching the atmosphere of a face-to-face meeting, and it directly affects the voice quality of the conference. Mixing techniques are divided into analog mixing and digital mixing; digital mixing is widely used because of its high precision, large dynamic range, and low noise. The basic principle of digital mixing is to superimpose the digital signals obtained from each channel's voice signal after analog-to-digital conversion, forming a single mixed output signal.
Because digital audio signals have upper and lower quantization limits, superposition may overflow. Digital mixing therefore has to meet two requirements. 1. The mixed signal must not overflow frequently. As the number of voice channels grows, overflow becomes more and more frequent; if the overflowing samples are simply saturated, noise is introduced and the mixed sound becomes discontinuous or contains popping sounds. 2. The voice quality of every channel must be preserved. The channels differ in level and frequency content, and how well their quality is preserved after mixing is a major criterion for evaluating a digital mixing technique.
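The overflow problem described above can be illustrated with a minimal sketch. It assumes 16-bit signed samples (the text only speaks of quantization limits), and the function name `saturating_mix` is hypothetical. It sums the channels sample by sample, saturates the result, and counts how many samples had to be clipped.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit sample range


def saturating_mix(channels):
    """Sum channels sample by sample and saturate to the 16-bit range.

    Returns the mixed samples and the number of samples that were clipped.
    """
    mixed = []
    clipped = 0
    for samples in zip(*channels):
        total = sum(samples)
        saturated = max(INT16_MIN, min(INT16_MAX, total))
        if saturated != total:
            clipped += 1  # this sample overflowed and was distorted
        mixed.append(saturated)
    return mixed, clipped


mixed, clipped = saturating_mix([[30000, 100, -30000], [10000, 200, -10000]])
print(mixed, clipped)
```

Every clipped sample is a distortion of the true sum, which is why plain saturation becomes audible as the number of channels grows.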
The existing document by Zhang Chuanyong, 《Audio mixing technology and its application in voip session system》, discloses a weighted mixing technique. Its main idea is to calculate a weight for each channel's voice signal and then superimpose the weighted signals; the purpose of the weighting is to reduce or eliminate overflow and thereby preserve voice quality. The technique is implemented as follows: assume there are N channel signals, each frame of each channel has M samples, and f(i, j) is the i-th sample value of the j-th channel signal. The corresponding weight and the final output are each given by a formula, where weight(i, j) is the weight of the i-th sample of the j-th channel signal and Output(i) is the i-th output sample. This weighted mixing technique has the following problems. First, the smaller the amplitude of a channel's voice signal, the smaller its weight, so small signals become even smaller after mixing, which easily causes large distortion. Second, usually no more than four people speak at the same time in a video conference, yet this technique mixes the signals of non-speaking lines (which contain noise) directly without any processing, which easily lowers the signal-to-noise ratio of the mixed voice.
The existing document by Zhou Jingli et al., 《A kind of new multimedia conferencing real-time sound mixing scheme》, discloses an automatic-threshold mixing technique. Before mixing, it decides from the short-time energy of each channel signal whether the signal contains voice (that is, silence detection). A line without voice is judged to be in the "non-speaking state", and signals in that state do not take part in the mixing. During mixing, the technique calculates an attenuation factor from the short-time energy of the audio data itself: when the short-time energy exceeds a threshold the signal is attenuated by a certain ratio, and when it is below the threshold no attenuation is applied, so the weight of each channel signal depends only on its own short-time energy. This automatic-threshold mixing technique has the following problems. Because signals judged to be in the "non-speaking state" do not take part in the mixing, nothing of them remains in the mixed sound, and the participants in that state have no sense of presence. In addition, a fluctuation occurs when a participant goes from silence to speaking, which affects the listening experience. Finally, although the technique guarantees a weight ≥ 1 for small signals, it still cannot guarantee their voice quality.
Therefore, existing mixing techniques either mix every channel signal directly without any processing (whether or not it contains voice) or use silence detection to reduce the number of channels taking part in the mixing. Without silence detection, unnecessary noise is added during mixing and voice quality suffers. With silence detection, participants in the "non-speaking state" lose all sense of participation. Moreover, existing mixing techniques determine the weight of each channel signal from the amplitude of the voice signal before superimposing, but the loudness of a voice is not exactly equal to its amplitude; it depends on both the amplitude and the frequency of the voice signal.
Summary of the invention
In view of the above problems, the present invention provides a mixer and a mixing method that improve voice quality and also treat every participant fairly.
The technical means of the present invention are as follows:
A mixer, comprising:
A framing unit, configured to divide each channel signal taking part in the mixing into frames;
A voice detection unit connected to the framing unit; the voice detection unit is configured to detect whether each framed channel signal contains a voice signal; it determines whether a channel signal contains a voice signal by detecting whether the current frame of that channel signal is in the voice-present state;
A time-varying filter connected to the voice detection unit; the time-varying filter is configured to apply time-varying low-pass filtering to each framed channel signal according to the detection result of the voice detection unit; when the current frame of a channel signal is in the voice-present state, the passband of the time-varying filter gradually widens, and when the current frame of a channel signal is in the voice-absent state, the passband of the time-varying filter gradually narrows;
A loudness calculation unit connected to the voice detection unit; the loudness calculation unit is configured to calculate, according to the detection result of the voice detection unit, the average loudness of each framed channel signal within the current predetermined time period;
A weight calculation unit connected to the loudness calculation unit; the weight calculation unit is configured to calculate, according to the result of the loudness calculation unit, the weight of each sample in the current frame of each channel signal taking part in the mixing;
A mixing unit connected to the time-varying filter and the weight calculation unit; the mixing unit is configured to obtain and output the single current-frame signal that results from mixing the current frames of all channel signals, using the current-frame output of each channel signal after time-varying low-pass filtering and the weight of each sample in the current frame of each channel signal;
A post-processing unit connected to the mixing unit; the post-processing unit is configured to calculate, from the single current-frame signal output by the mixing unit, the weight of each sample in the current frame of the mixed signal and each output sample;
Further, the voice detection unit comprises:
A power calculation module, configured to calculate the power of the current frame of each framed channel signal;
A minimum frame power determination module connected to the power calculation module; the minimum frame power determination module is configured to obtain, from the results of the power calculation module, the minimum frame power of each framed channel signal within the current predetermined time period;
A voice state determination module connected to the power calculation module and the minimum frame power determination module; the voice state determination module is configured to detect whether a channel signal contains a voice signal by comparing the power of its current frame with the minimum frame power;
Further,
The power calculation module calculates the power of the current frame of a channel signal by a formula in which pow denotes the power of the current frame, x[i] denotes the i-th input sample of the current frame, and N denotes the number of samples in a frame;
The current predetermined time period is the duration T from r frames before the current frame up to the current frame; the minimum frame power determination module obtains the minimum frame power of a channel signal within the current predetermined time period by the formula pow_min = min{power of the current frame, power of the frame 1 frame before the current frame, ..., power of the frame r frames before the current frame}, where min{ } denotes the minimum of all values in the braces, r is given by a formula that uses ceil(x), ceil(x) denotes the smallest integer greater than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame;
The voice state determination module uses a flag VAD to indicate whether a channel signal contains a voice signal and assigns it the initial value VAD = 1; when pow ≥ 32·pow_min and VAD = 0, the voice state determination module sets VAD to 1, indicating that the channel signal is in the voice-present state; when pow ≤ 4·pow_min and VAD = 1, the voice state determination module sets VAD to 0, indicating that the channel signal is in the voice-absent state; for any other outcome of the comparison between pow and pow_min, the voice state determination module leaves VAD unchanged;
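The two-threshold decision described above can be sketched as follows. The formulas for pow and r are not reproduced in this text, so the sketch assumes that pow is the mean of the squared samples and that r = ceil(T·F_S/N); both are assumptions, and the class name `FrameVAD` is hypothetical. The thresholds 32 and 4 and the initial value VAD = 1 come from the text.

```python
import math
from collections import deque


class FrameVAD:
    """Hysteresis voice detector driven by frame power (illustrative sketch)."""

    def __init__(self, frame_len, sample_rate, window_seconds):
        # Assumption: r = ceil(T * Fs / N) frames cover the predetermined period T.
        r = math.ceil(window_seconds * sample_rate / frame_len)
        # Powers of the current frame and of the r frames before it.
        self.history = deque(maxlen=r + 1)
        self.vad = 1  # initial value given in the text

    @staticmethod
    def frame_power(frame):
        # Assumption: frame power is the mean of the squared samples.
        return sum(x * x for x in frame) / len(frame)

    def update(self, frame):
        """Process one frame and return the updated VAD flag (0 or 1)."""
        power = self.frame_power(frame)
        self.history.append(power)
        pow_min = min(self.history)
        if self.vad == 0 and power >= 32 * pow_min:
            self.vad = 1  # voice-present state
        elif self.vad == 1 and power <= 4 * pow_min:
            self.vad = 0  # voice-absent state
        return self.vad  # otherwise unchanged
```

In this sketch the very first frame is its own minimum, so the flag drops to 0 immediately and rises again only once a frame is at least 32 times stronger than the quietest frame in the window. The gap between the two thresholds keeps the flag from toggling on small power fluctuations.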
Further, the time-varying filter obtains the i-th filtered output value of the current frame of a channel signal by the formula f[i] = (1-b)*x[i] + b*f[i-1], where f[i] denotes the i-th filtered output value of the current frame of the channel signal, x[i] denotes the i-th input sample of the current frame, N denotes the number of samples in a frame, 0 ≤ i < N, and b denotes the filter coefficient; b is updated by one formula when the current frame is in the voice-present state and by another formula when the current frame is in the voice-absent state; when b < 0.18, b = 0.18 is taken, and when b > 0.956, b = 0.956 is taken; p1 denotes the number of samples over which b changes from 0.956 to 0.18, and p2 denotes the number of samples over which b changes from 0.18 to 0.956;
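A minimal sketch of this filter follows. The recursion f[i] = (1-b)*x[i] + b*f[i-1] and the limits 0.18 and 0.956 are from the text; the update formulas for b are not reproduced there, so the sketch assumes a linear ramp of (0.956 - 0.18)/p1 per sample downward while voice is present and (0.956 - 0.18)/p2 per sample upward while voice is absent, applied before each sample is filtered. The function name and argument layout are hypothetical.

```python
B_MIN, B_MAX = 0.18, 0.956  # limits on the filter coefficient b from the text


def time_varying_lowpass(frame, vad, b, prev_out, p1, p2):
    """Filter one frame with f[i] = (1 - b) * x[i] + b * f[i - 1].

    Returns (filtered frame, updated b, last output) so that the state can be
    carried into the next frame.
    """
    # Assumption: b moves linearly, one step per sample.
    if vad:
        step = -(B_MAX - B_MIN) / p1  # smaller b -> wider passband
    else:
        step = (B_MAX - B_MIN) / p2   # larger b -> narrower passband
    out = []
    for x in frame:
        b = min(B_MAX, max(B_MIN, b + step))  # clamp to [0.18, 0.956]
        prev_out = (1.0 - b) * x + b * prev_out
        out.append(prev_out)
    return out, b, prev_out


filtered, b, last = time_varying_lowpass([1.0] * 4, vad=1, b=B_MAX, prev_out=0.0, p1=2, p2=2)
print(filtered, b)
```

Because b only drifts between its two limits, a channel that stops speaking is gradually low-passed rather than cut, and it opens up again gradually when speech resumes.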
Further,
The loudness calculation unit comprises: a DFT module, a loudness acquisition module connected to the DFT module, and an average loudness acquisition module connected to the loudness acquisition module;
When the current frame of a channel signal is in the voice-present state, the DFT module applies the DFT to the current frame of that channel signal, the loudness acquisition module then calculates the loudness value of the current frame, and finally the average loudness acquisition module calculates the average loudness of the channel signal within the current predetermined time period;
When the current frame of a channel signal is in the voice-absent state, the average loudness of the channel signal within the current predetermined time period equals the average loudness of the most recent predetermined time period before the current frame that contained a voice signal;
Further,
The DFT module applies the DFT to the current frame of a channel signal by a formula in which s denotes the discrete frequency, x[i] denotes the i-th input sample of the current frame, X[s] denotes the result of applying the DFT to x[i], and j denotes the imaginary unit, j^2 = -1;
The loudness acquisition module calculates the loudness value of the current frame by a formula in which loudness denotes the loudness value of the current frame, X[s] denotes the result of applying the DFT to x[i], Equal[s] denotes the equal-loudness array, which holds preset values, s_20 = ceil(20*N/F_S), s_20000 = floor(20000*N/F_S), ceil(x) denotes the smallest integer greater than or equal to x, floor(x) denotes the largest integer less than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame;
The average loudness acquisition module calculates the average loudness of a channel signal within the current predetermined time period by a formula; the current predetermined time period is the duration T from r frames before the current frame up to the current frame; in the formula, ceil(x) denotes the smallest integer greater than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame;
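The loudness path can be sketched as below. The loudness and average-loudness formulas are not reproduced in this text, so several parts are assumptions: loudness is taken as the equal-loudness-weighted spectral energy between the bins s_20 and s_20000, the upper bin is capped at the Nyquist bin N/2, and the average is an arithmetic mean over the most recent voice-present frames. The bin definitions and the rule that the average is held while voice is absent come from the text. All function and class names are hypothetical, and `equal` stands in for the preset Equal[s] array.

```python
import cmath
import math
from collections import deque


def dft(frame):
    """Direct O(N^2) DFT: X[s] = sum over i of x[i] * exp(-j*2*pi*s*i/N)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * s * i / n) for i, x in enumerate(frame))
        for s in range(n)
    ]


def audible_bins(n, fs):
    """Bins for 20 Hz and 20 kHz: s20 = ceil(20*N/Fs), s20000 = floor(20000*N/Fs)."""
    s20 = math.ceil(20 * n / fs)
    s20000 = math.floor(20000 * n / fs)
    # Assumption: stop at the Nyquist bin when Fs is below 40 kHz.
    return s20, min(s20000, n // 2)


def frame_loudness(frame, fs, equal):
    """Assumed form: sum of Equal[s] * |X[s]|^2 over the audible bins."""
    spectrum = dft(frame)
    lo, hi = audible_bins(len(frame), fs)
    return sum(equal[s] * abs(spectrum[s]) ** 2 for s in range(lo, hi + 1))


class MeanLoudness:
    """Average loudness over recent voice-present frames, held while voice is absent."""

    def __init__(self, r):
        self.values = deque(maxlen=r + 1)
        self.mean = 0.0

    def update(self, loudness, vad):
        if vad:  # only voice-present frames change the average
            self.values.append(loudness)
            self.mean = sum(self.values) / len(self.values)
        return self.mean  # unchanged when vad == 0
```

A production implementation would use an FFT instead of the direct DFT; the direct form is kept here only because it mirrors the definition.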
Further,
The weight calculation unit obtains the current-frame weight of the k-th channel signal by a formula, and obtains the weight of the i-th sample of the current frame of the k-th channel signal by the formula weight_k[i] = weight_k, 0 ≤ i < N; in the formulas, weight_k denotes the current-frame weight of the k-th channel signal, LOUD_k denotes the average loudness of the k-th channel signal within the current predetermined time period, M denotes the number of channel signals taking part in the mixing, k = 1, 2, ..., M, and weight_k[i] denotes the weight of the i-th sample of the current frame of the k-th channel signal;
The mixing unit obtains the single current-frame signal that results from mixing the current frames of all channel signals by a formula in which Mix[i] denotes the i-th sample of the mixed current-frame signal, M denotes the number of channel signals taking part in the mixing, f_k[i] denotes the i-th output sample of the current frame of the k-th channel signal after time-varying low-pass filtering, weight_k[i] denotes the weight of the i-th sample of the current frame of the k-th channel signal, and k = 1, 2, ..., M;
The post-processing unit is further configured to obtain the maximum sample in the current frame of the mixed signal and to calculate the current-frame weight of the mixed signal;
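The mixing step can be sketched as a weighted sum. The mixing formula is not reproduced in this text; the sketch assumes Mix[i] is the sum over all M channels of f_k[i]*weight_k[i], which matches the symbols defined above. The weight formula itself is also missing, so the per-sample weights are simply passed in. The function name `mix_frame` is hypothetical.

```python
def mix_frame(filtered, weights):
    """Weighted sum of M filtered channel frames.

    filtered[k][i] is f_k[i], the i-th filtered sample of channel k;
    weights[k][i] is weight_k[i]. Assumed form: Mix[i] = sum_k f_k[i] * weight_k[i].
    """
    n = len(filtered[0])
    return [
        sum(f[i] * w[i] for f, w in zip(filtered, weights))
        for i in range(n)
    ]


# Two channels, two samples per frame, constant weight within the frame.
print(mix_frame([[1.0, 2.0], [10.0, 20.0]], [[0.5, 0.5], [0.1, 0.1]]))
```

Keeping a separate weight per sample, rather than one per frame, is what allows the weights to be ramped smoothly between frames.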
Further,
The weight calculation unit is further configured to smooth the frame weights of each channel signal taking part in the mixing;
Further,
The weight calculation unit smooths the frame weights of a channel signal as follows:
When the current frame of a channel signal is the first frame, the weight calculation unit obtains the weight of the i-th sample of the current frame of the k-th channel signal by the formula weight_k[i] = weight_k, 0 ≤ i < N;
When the current frame of a channel signal is not the first frame, the weight calculation unit obtains the weight of the i-th sample of the current frame of the k-th channel signal by a formula in which weight_k[i] denotes the weight of the i-th sample of the current frame of the k-th channel signal, N denotes the number of samples in a frame, and P denotes the number of samples over which weight_k[i] changes gradually from the weight of the frame before the current frame to the weight of the current frame;
The post-processing unit obtains the maximum sample in the current frame of the mixed signal by the formula signal_max = max{|Mix[0]|, |Mix[1]|, ..., |Mix[N-1]|}, where signal_max denotes the maximum sample in the current frame of the mixed signal, max{ } denotes the maximum of the values in the braces, Mix[0] denotes the output signal of sample 0 of the current frame of the mixed signal, Mix[1] denotes the output signal of sample 1, and Mix[N-1] denotes the output signal of sample N-1;
When signal_max ≤ 32768, the post-processing unit sets the current-frame weight of the mixed signal to weight_mix = 1; when signal_max > 32768, the post-processing unit calculates the current-frame weight of the mixed signal by a formula;
When the current frame of the mixed signal is the first frame, the post-processing unit obtains the weight of the i-th sample of the current frame of the mixed signal by the formula weight_mix[i] = weight_mix, 0 ≤ i < N; when the current frame of the mixed signal is not the first frame, the post-processing unit obtains that weight by a formula in which weight_mix[i] denotes the weight of the i-th sample of the current frame of the mixed signal and Q denotes the number of samples over which weight_mix[i] changes gradually from the weight of the frame before the current frame of the mixed signal to the weight of the current frame of the mixed signal;
The post-processing unit obtains the i-th output sample y[i] of the current frame of the mixed signal by a formula in which Final[i] = Mix[i]*weight_mix[i], 0 ≤ i < N.
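The post-processing stage can be sketched as follows. The comparison against 32768, the first-frame rule, and Final[i] = Mix[i]*weight_mix[i] come from the text. Three formulas are not reproduced there and are assumed here: weight_mix = 32768/signal_max when the peak exceeds 32768, a linear ramp over Q samples from the previous frame weight to the current one, and a final saturation of y[i] to the 16-bit range. The function names are hypothetical; the same ramp helper could serve the per-channel weights with P in place of Q.

```python
def smooth_weights(prev_weight, cur_weight, n, ramp):
    """Assumed linear ramp from prev_weight to cur_weight over `ramp` samples, then hold."""
    return [
        prev_weight + (cur_weight - prev_weight) * min(i + 1, ramp) / ramp
        for i in range(n)
    ]


def postprocess(mix, prev_weight, ramp, first_frame=False):
    """Scale one mixed frame so its peak fits and return (output samples, frame weight)."""
    signal_max = max(abs(v) for v in mix)
    # weight_mix = 1 when the peak fits; the scaling rule above 32768 is an assumption.
    weight = 1.0 if signal_max <= 32768 else 32768 / signal_max
    if first_frame:
        per_sample = [weight] * len(mix)  # weight_mix[i] = weight_mix
    else:
        per_sample = smooth_weights(prev_weight, weight, len(mix), ramp)
    final = [m * w for m, w in zip(mix, per_sample)]  # Final[i] = Mix[i] * weight_mix[i]
    # Assumption: y[i] is Final[i] rounded and saturated to 16 bits.
    y = [max(-32768, min(32767, round(v))) for v in final]
    return y, weight


y, w = postprocess([65536.0, -1000.0, 100.0, 0.0], prev_weight=1.0, ramp=2, first_frame=True)
print(y, w)
```

The returned frame weight is meant to be fed back as `prev_weight` for the next frame, so that a sudden loud frame lowers the gain gradually instead of in a single step.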
A mixing method, comprising the following steps:
Step 1: divide each channel signal taking part in the mixing into frames;
Step 2: detect whether each framed channel signal contains a voice signal; whether a channel signal contains a voice signal is determined by detecting whether the current frame of that channel signal is in the voice-present state;
Step 3: according to the voice detection result, apply time-varying low-pass filtering to each framed channel signal: when the current frame of a channel signal is in the voice-present state, the passband gradually widens, and when the current frame of a channel signal is in the voice-absent state, the passband gradually narrows;
Step 4: according to the voice detection result, calculate the average loudness of each framed channel signal within the current predetermined time period;
Step 5: according to the average loudness results, calculate the weight of each sample in the current frame of each channel signal taking part in the mixing;
Step 6: from the current-frame output of each channel signal after time-varying low-pass filtering and the weight of each sample in the current frame of each channel signal, obtain and output the single current-frame signal that results from mixing the current frames of all channel signals;
Step 7: from the single mixed current-frame signal, calculate the weight of each sample in the current frame of the mixed signal and each output sample.
With the above technical solution, the mixer and mixing method provided by the present invention offer a new voice activity detection scheme, in which the average power of the voice signal is used to decide whether the current frame is a voice signal. By introducing the time-varying filter, the invention solves the problem that current mixing techniques introduce unnecessary noise by mixing every channel signal directly, and at the same time avoids the problem that reducing the number of mixed channels by silence detection leaves "non-speaking" participants without any sense of presence. The invention adopts an equal-loudness control strategy: the weight of each channel signal is derived from its calculated loudness, so that the average loudness of all channel signals ends up nearly the same and they sound nearly equally loud. The invention improves the voice quality of voice signals, small signals in particular, and also treats every participant fairly.
Brief description of the drawings
Fig. 1 is a structural block diagram of the mixer of the present invention;
Fig. 2 is a structural block diagram of the voice detection unit of the present invention;
Fig. 3 is a structural block diagram of the loudness calculation unit of the present invention;
Fig. 4 is a workflow diagram of the mixer of the present invention;
Fig. 5 is a waveform diagram of three voice signals with different characteristics;
Fig. 6 is a waveform diagram of the three voice signals after processing by the loudness calculation unit and the weight calculation unit of the present invention;
Fig. 7 is a waveform diagram of the output signal of the mixer of the present invention;
Fig. 8 is a flowchart of the mixing method of the present invention.
Detailed description
As shown in Fig. 1, Fig. 2, Fig. 3 and Fig. 4, a mixer comprises: a framing unit, configured to divide each channel signal taking part in the mixing into frames; a voice detection unit connected to the framing unit, configured to detect whether each framed channel signal contains a voice signal, which it does by detecting whether the current frame of a channel signal is in the voice-present state; a time-varying filter connected to the voice detection unit, configured to apply time-varying low-pass filtering to each framed channel signal according to the detection result of the voice detection unit, such that the passband of the time-varying filter gradually widens when the current frame of a channel signal is in the voice-present state and gradually narrows when it is in the voice-absent state; a loudness calculation unit connected to the voice detection unit, configured to calculate, according to the detection result of the voice detection unit, the average loudness of each framed channel signal within the current predetermined time period; a weight calculation unit connected to the loudness calculation unit, configured to calculate, according to the result of the loudness calculation unit, the weight of each sample in the current frame of each channel signal taking part in the mixing; a mixing unit connected to the time-varying filter and the weight calculation unit, configured to obtain and output the single current-frame signal that results from mixing the current frames of all channel signals, using the current-frame output of each channel signal after time-varying low-pass filtering and the weight of each sample in the current frame of each channel signal; and a post-processing unit connected to the mixing unit, configured to calculate, from the single current-frame signal output by the mixing unit, the weight of each sample in the current frame of the mixed signal and each output sample.
Further, the voice detection unit comprises: a power calculation module, configured to calculate the power of the current frame of each framed channel signal; a minimum frame power determination module connected to the power calculation module, configured to obtain, from the results of the power calculation module, the minimum frame power of each framed channel signal within the current predetermined time period; and a voice state determination module connected to the power calculation module and the minimum frame power determination module, configured to detect whether a channel signal contains a voice signal by comparing the power of its current frame with the minimum frame power.
Further, the power calculation module calculates the power of the current frame of a channel signal by a formula in which pow denotes the power of the current frame, x[i] denotes the i-th input sample of the current frame, and N denotes the number of samples in a frame;
The current predetermined time period is the duration T from r frames before the current frame up to the current frame. The minimum frame power determination module obtains the minimum frame power of a channel signal within the current predetermined time period by the formula pow_min = min{power of the current frame, power of the frame 1 frame before the current frame, ..., power of the frame r frames before the current frame}, where min{ } denotes the minimum of all values in the braces, r is given by a formula that uses ceil(x), ceil(x) denotes the smallest integer greater than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame.
The voice state determination module uses a flag VAD to indicate whether a channel signal contains a voice signal and assigns it the initial value VAD = 1. When pow ≥ 32·pow_min and VAD = 0, the module sets VAD to 1, indicating that the channel signal is in the voice-present state; when pow ≤ 4·pow_min and VAD = 1, the module sets VAD to 0, indicating that the channel signal is in the voice-absent state; for any other outcome of the comparison between pow and pow_min, the module leaves VAD unchanged.
Further, the time-varying filter obtains the i-th filtered output value of the current frame of a channel signal by the formula f[i] = (1-b)*x[i] + b*f[i-1], where f[i] denotes the i-th filtered output value of the current frame of the channel signal, x[i] denotes the i-th input sample of the current frame, N denotes the number of samples in a frame, 0 ≤ i < N, and b denotes the filter coefficient. b is updated by one formula when the current frame is in the voice-present state and by another formula when the current frame is in the voice-absent state; when b < 0.18, b = 0.18 is taken, and when b > 0.956, b = 0.956 is taken; p1 denotes the number of samples over which b changes from 0.956 to 0.18, and p2 denotes the number of samples over which b changes from 0.18 to 0.956.
Further, the loudness calculation unit comprises: a DFT module, a loudness acquisition module connected to the DFT module, and an average loudness acquisition module connected to the loudness acquisition module. When the current frame of a channel signal is in the voice-present state, the DFT module applies the DFT to the current frame of that channel signal, the loudness acquisition module then calculates the loudness value of the current frame, and finally the average loudness acquisition module calculates the average loudness of the channel signal within the current predetermined time period. When the current frame of a channel signal is in the voice-absent state, the average loudness of the channel signal within the current predetermined time period equals the average loudness of the most recent predetermined time period before the current frame that contained a voice signal.
Further, the DFT module applies the DFT to the current frame of a channel signal by a formula in which s denotes the discrete frequency, x[i] denotes the i-th input sample of the current frame, X[s] denotes the result of applying the DFT to x[i], and j denotes the imaginary unit, j^2 = -1;
The loudness obtains module and utilizes formula The loudness value of the present frame is calculated, in formula, loudness represents that the loudness value of present frame, X [s] represent that x [i] is passed through The result that obtains after DFT transform, Equal [s] represent have default value etc. loudness array, s20=ceil (20*N/Fs)、 s20000=floor (20000*N/Fs), ceil (x) represent be close to x and the integer more than or equal to x, floor (x) represent be close to x and Integer, F less than or equal to xSRepresent that sample frequency, N represent the sample size in a frame;The mean loudness obtains module and passes through FormulaTo count Calculate the mean loudness in the current predetermined amount of time of signal all the way;Current predetermined amount of time is represented from the front r frames of present frame to current The duration T of frame;In formula:Ceil (x) is represented and is close to x and the integer more than or equal to x, FSExpression is adopted Sample frequency, N represent the sample size in a frame;Further, the weight calculation unit passes through formulaPresent frame weight in the signal of kth road is drawn, and passes through formula weightk[i]=weightk0 ≤ i < N obtain the weight of i-th sample of present frame in the signal of kth road;In formula:weightkRepresent present frame in the signal of kth road Weight, LOUDkRepresent that mean loudness of the kth road signal in current predetermined amount of time, M represent the signal way for participating in audio mixing, k =1,2 ..., M, weightk[i] represents the weight of i-th sample of present frame in the signal of kth road;The downmixing unit is by public affairs FormulaDraw current all the way after each road signal present frame audio mixing Frame signal, in formula:I-th sample of the current frame signal all the way after Mix [i] expression audio mixings, M represent the signal road for participating in audio mixing Number, f_k [i] represent i-th sample output 
signal of present frame in the signal of time-varying low-pass filtering treatment Houk road, weightk[i] represents the weight of i-th sample in the signal of kth road included by present frame, k=1,2 ..., M;The post processing Unit maximum sample in signal present frame and calculates the present frame weight of signal after audio mixing after being additionally operable to obtain audio mixing;Enter One step ground, the weight calculation unit are additionally operable to carry out smooth place to each frame weight in each road signal of participation audio mixing respectively Reason;Further, the process of realizing that each frame weight during the weight calculation unit is to signal all the way is smoothed is:When When in signal, present frame is the first frame all the way, the weight calculation unit passes through formula weightk[i]=weightk0≤i < N Obtain the weight of i-th sample of present frame in the signal of kth road;When in signal all the way, present frame is not the first frame, the weight Computing unit passes through formula Obtain the weight of i-th sample of present frame in the signal of kth road;Wherein, weightk[i] represents present frame i-th in the signal of kth road The weight of individual sample, N represent that sample size in a frame, P represent weightk[i] 1 frame weight from before present frame is gradually faded to be worked as The sampling number of previous frame weight;The post-processing unit passes through formula signalmax=max | Mix [0] |, | Mix [1] |, | Mix [N-1] | } maximum sample in signal present frame after audio mixing is obtained, in formula:signalmaxBelieve after representing audio mixing Maximum sample, max { } number included by present frame represents that signal is worked as after the maximum of data, Mix [0] represent audio mixing in braces The output signal of the 1st sample of signal present frame, Mix [N-1] table after the output signal of the 0th sample of previous frame, Mix [1] expression audio mixings 
Show the output signal of signal present frame N-1 samples after audio mixing;Work as signalmaxWhen≤32768, the post-processing unit is calculated Go out present frame weight weight of signal after audio mixingmix=1, work as signalmaxDuring > 32768, the post-processing unit is calculated The present frame weight of signal after audio mixingWhen the present frame of signal after audio mixing is the first frame, institute Post-processing unit is stated by formula weightmix[i]=weightmix0≤i < N obtain signal present frame i-th after audio mixing The weight of individual sample, when the present frame of signal after audio mixing is not the first frame, the post-processing unit passes through formulaObtain signal after audio mixing The weight of i-th sample of present frame, in formula:weightmixThe weight of signal i-th sample of present frame, Q tables after [i] expression audio mixing Show weightmix[i] before present frame, 1 frame weight gradually fades to the sampling number of present frame weight after audio mixing from after audio mixing;After described Processing unit passes through formulaDraw i-th of signal present frame after audio mixing Individual output sample y [i], in formula:Final [i]=Mix [i] * weightmix[i], 0≤i < N.
As shown in Fig. 8, the present invention also provides a sound mixing method comprising the following steps:
Step 1: Divide each channel signal participating in the mix into frames;
Step 2: Detect whether each framed channel signal contains a voice signal; whether a channel signal contains a voice signal is determined by detecting whether the current frame of that channel signal is in the voice-present state;
Step 3: According to the voice detection result, apply time-varying low-pass filtering to each framed channel signal: when the current frame of a channel signal is in the voice-present state, the passband gradually widens; when the current frame of a channel signal is in the voice-absent state, the passband gradually narrows;
Step 4: According to the voice detection result, calculate the average loudness of each framed channel signal over the current predetermined time period;
Step 5: According to the calculated average loudness, calculate the weight of each sample in the current frame of each channel signal participating in the mix;
Step 6: From the current-frame output of each channel signal after time-varying low-pass filtering and the sample weights of the current frame of each channel signal, obtain and output the single-channel current-frame signal produced by mixing the current frames of all channels;
Step 7: From the mixed single-channel current-frame signal, calculate the weight of each sample and each output sample of the current frame of the mixed signal.
Further, step 1 specifically comprises the following steps:
Step 11: Calculate the power of the current frame of each framed channel signal;
Step 12: From the calculated current-frame power of each channel signal, obtain the minimum frame power of each framed channel signal within the current predetermined time period;
Step 13: Detect whether a channel signal contains a voice signal by comparing the power of its current frame with the minimum frame power;
Further,
The power of the current frame of a channel signal is calculated by the formula [formula missing], where pow denotes the power of the current frame, x[i] denotes the i-th input sample of the current frame, and N denotes the number of samples in a frame;
The current predetermined time period is the duration T from the r-th frame before the current frame up to the current frame; the minimum frame power of a channel signal within the current predetermined time period is obtained by the formula pow_min = min{power of the current frame, power of the 1st frame before the current frame, ..., power of the r-th frame before the current frame}, where min{ } denotes the minimum of all values in the braces, [formula missing], ceil(x) denotes the smallest integer greater than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame;
A flag VAD indicates whether a channel signal contains a voice signal and is initialized to VAD = 1; when pow ≥ 32·pow_min and VAD = 0, VAD is set to 1, indicating that the channel signal is in the voice-present state; when pow ≤ 4·pow_min and VAD = 1, VAD is set to 0, indicating that the channel signal is in the voice-absent state; in every other case VAD is left unchanged;
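The detection rule above can be sketched in Python. This is a minimal illustration, not the patented implementation: the power formula is not reproduced in this text, so the mean-square form in `frame_power` is an assumption, and the names `frame_power` and `VoiceStateTracker` are hypothetical. The thresholds 32 and 4, the initial value VAD = 1, and the window of the current frame plus the r preceding frames come from the text.

```python
from collections import deque


def frame_power(frame):
    """Mean-square power of one frame (assumed form of the power formula)."""
    return sum(x * x for x in frame) / len(frame)


class VoiceStateTracker:
    """Hysteresis voice detector for one channel (hypothetical helper)."""

    def __init__(self, r):
        # Holds the power of the current frame plus the r frames before it.
        self.history = deque(maxlen=r + 1)
        self.vad = 1  # the text initializes VAD to 1

    def update(self, frame):
        pow_cur = frame_power(frame)
        self.history.append(pow_cur)
        pow_min = min(self.history)
        if self.vad == 0 and pow_cur >= 32 * pow_min:
            self.vad = 1  # switch to the voice-present state
        elif self.vad == 1 and pow_cur <= 4 * pow_min:
            self.vad = 0  # switch to the voice-absent state
        return self.vad   # otherwise unchanged
```

The two thresholds form a hysteresis band: a frame whose power lies between 4 and 32 times the recent minimum never toggles the state, which avoids rapid switching on borderline frames.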
Further, step 2 is specifically:
The i-th filtered output value of the current frame of a channel signal is obtained by the formula f[i] = (1-b)*x[i] + b*f[i-1], where f[i] denotes the i-th filtered output value of the current frame, x[i] denotes the i-th input sample of the current frame, N denotes the number of samples in a frame, 0 ≤ i < N, and b denotes the filter coefficient; when the current frame is in the voice-present state, [formula missing]; when the current frame is in the voice-absent state, [formula missing]; when b < 0.18, b is set to 0.18, and when b > 0.956, b is set to 0.956; p1 denotes the number of samples over which b changes from 0.956 to 0.18, and p2 denotes the number of samples over which b changes from 0.18 to 0.956;
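The filter recursion can be illustrated with a short Python sketch. The recursion f[i] = (1-b)*x[i] + b*f[i-1] and the limits 0.18 and 0.956 come from the text, and the defaults for p1 and p2 are the values used in the worked example. The update rule for b is not reproduced in this text, so the linear per-sample ramp in `step_coefficient` is an assumption, and both function names are hypothetical.

```python
B_MIN, B_MAX = 0.18, 0.956


def step_coefficient(b, voiced, p1, p2):
    """Move b one sample toward its target and clamp it to [B_MIN, B_MAX].

    Assumption: b changes linearly, crossing the full range in p1 samples
    when voice is present and in p2 samples when voice is absent.
    """
    if voiced:
        b -= (B_MAX - B_MIN) / p1   # smaller b: wider passband
    else:
        b += (B_MAX - B_MIN) / p2   # larger b: narrower passband
    return min(max(b, B_MIN), B_MAX)


def filter_frame(frame, voiced, b, f_prev, p1=960, p2=96000):
    """Apply f[i] = (1 - b) * x[i] + b * f[i - 1] to one frame.

    Returns the filtered samples together with the updated b and the last
    output value, both of which carry over into the next frame.
    """
    out = []
    for x in frame:
        b = step_coefficient(b, voiced, p1, p2)
        f_prev = (1.0 - b) * x + b * f_prev
        out.append(f_prev)
    return out, b, f_prev
```

With p1 much smaller than p2, the passband would open quickly when speech starts and close slowly after it stops, which matches the asymmetry of the example values 960 and 96000.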
Further,
When the current frame of a channel signal is in the voice-present state, step 3 specifically comprises the following steps:
Step 31: Apply the DFT to the current frame of that channel signal;
Step 32: Calculate the loudness value of the current frame;
Step 33: Calculate the average loudness of that channel signal over the current predetermined time period;
When the current frame of a channel signal is in the voice-absent state, the average loudness of that channel signal over the current predetermined time period equals the average loudness over the previous predetermined time period, before the current frame, that contained a voice signal;
Further,
The DFT is applied to the current frame of a channel signal by the formula [formula missing], where [formula missing], s denotes the discrete frequency, x[i] denotes the i-th input sample of the current frame, X[s] denotes the result of applying the DFT to x[i], and j denotes the imaginary unit, j^2 = -1;
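For readers who want a concrete reference point, the sketch below evaluates the standard discrete Fourier transform, X[s] = sum over i of x[i]*W^(i*s) with W = exp(-2*pi*j/N). The patent's own formula is not reproduced in this text, so using this textbook definition is an assumption, and the function name `dft` is hypothetical. A practical implementation would normally use an FFT routine instead of this direct O(N^2) sum.

```python
import cmath


def dft(frame):
    """Direct O(N^2) DFT of one frame, using the textbook definition."""
    n = len(frame)
    w = cmath.exp(-2j * cmath.pi / n)  # primitive N-th root of unity
    return [
        sum(x * w ** (i * s) for i, x in enumerate(frame))
        for s in range(n)
    ]
```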
The loudness value of the current frame is calculated by the formula [formula missing], where loudness denotes the loudness value of the current frame, X[s] denotes the result of applying the DFT to x[i], Equal[s] denotes an equal-loudness array with preset values, s20 = ceil(20*N/F_S), s20000 = floor(20000*N/F_S), ceil(x) denotes the smallest integer greater than or equal to x, floor(x) denotes the largest integer less than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame;
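The two index bounds s20 and s20000 are fully specified in the text and can be computed directly. The small sketch below does so; the function name `audible_bin_range` is hypothetical, and the loudness formula itself is not reproduced in this text, so it is not shown.

```python
import math


def audible_bin_range(n, fs):
    """Return (s20, s20000), the DFT bins bounding the 20 Hz to 20 kHz band."""
    s20 = math.ceil(20 * n / fs)
    s20000 = math.floor(20000 * n / fs)
    return s20, s20000
```

With the values from the worked example (N = 2048, F_S = 48000), `audible_bin_range(2048, 48000)` gives (1, 853).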
The average loudness of a channel signal over the current predetermined time period is calculated by the formula [formula missing]; the current predetermined time period is the duration T from the r-th frame before the current frame up to the current frame, where [formula missing], ceil(x) denotes the smallest integer greater than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame;
Further, the current-frame weight of the k-th channel signal is obtained by the formula [formula missing], and the weight of the i-th sample of the current frame of the k-th channel signal is obtained by the formula weight_k[i] = weight_k, 0 ≤ i < N, where weight_k denotes the current-frame weight of the k-th channel signal, LOUD_k denotes the average loudness of the k-th channel signal over the current predetermined time period, M denotes the number of channels participating in the mix, k = 1, 2, ..., M, and weight_k[i] denotes the weight of the i-th sample of the current frame of the k-th channel signal;
Further, the single-channel current-frame signal produced by mixing the current frames of all channels is obtained by the formula [formula missing], where Mix[i] denotes the i-th sample of the mixed current-frame signal, M denotes the number of channels participating in the mix, f_k[i] denotes the i-th output sample of the current frame of the k-th channel signal after time-varying low-pass filtering, weight_k[i] denotes the weight of the i-th sample of the current frame of the k-th channel signal, and k = 1, 2, ..., M;
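One plausible reading of the mixing step is a per-sample weighted sum over channels, Mix[i] = sum over k of weight_k[i]*f_k[i]. The formula is not reproduced in this text, so this form is an assumption inferred from the symbol definitions, and the function name `mix_frame` is hypothetical.

```python
def mix_frame(filtered, weights):
    """Weighted sum over channels for one frame (assumed mixing formula).

    filtered[k][i] is sample i of channel k after time-varying filtering,
    weights[k][i] is the weight of that sample.
    """
    n = len(filtered[0])
    return [
        sum(w[i] * f[i] for f, w in zip(filtered, weights))
        for i in range(n)
    ]
```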
The method further comprises the following step: obtain the maximum sample in the current frame of the mixed signal and calculate the current-frame weight of the mixed signal;
Further, the following step is performed after step 4: smooth the frame weights of each channel signal participating in the mix;
Further, the frame weights of a channel signal are smoothed as follows:
When the current frame of the channel signal is the first frame, the weight of the i-th sample of the current frame of the k-th channel signal is obtained by the formula weight_k[i] = weight_k, 0 ≤ i < N;
When the current frame of the channel signal is not the first frame, the weight of the i-th sample of the current frame of the k-th channel signal is obtained by the formula [formula missing];
Here weight_k[i] denotes the weight of the i-th sample of the current frame of the k-th channel signal, N denotes the number of samples in a frame, and P denotes the number of samples over which weight_k[i] changes gradually from the weight of the previous frame to the weight of the current frame;
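The smoothing step can be pictured as a short ramp at the start of each frame. The smoothing formula is not reproduced in this text, so the linear interpolation below is an assumption, as is the hypothetical function name `smoothed_weights`. Only the first-frame rule and the meaning of P are taken from the text.

```python
def smoothed_weights(prev_weight, cur_weight, n, p, first_frame=False):
    """Per-sample weights for one frame of n samples (requires p <= n).

    Assumption: the weight ramps linearly from prev_weight to cur_weight
    over the first p samples and stays at cur_weight afterwards.
    """
    if first_frame:
        return [cur_weight] * n  # first frame: weight_k[i] = weight_k
    ramp = [prev_weight + (cur_weight - prev_weight) * i / p for i in range(p)]
    return ramp + [cur_weight] * (n - p)
```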
The maximum sample in the current frame of the mixed signal is obtained by the formula signal_max = max{|Mix[0]|, |Mix[1]|, ..., |Mix[N-1]|}, where signal_max denotes the maximum sample in the current frame of the mixed signal, max{ } denotes the maximum of the values in the braces, and Mix[0], Mix[1] and Mix[N-1] denote the 0th, 1st and (N-1)-th output samples of the current frame of the mixed signal;
When signal_max ≤ 32768, the current-frame weight of the mixed signal is weight_mix = 1; when signal_max > 32768, the current-frame weight of the mixed signal is calculated by the formula [formula missing];
When the current frame of the mixed signal is the first frame, the weight of the i-th sample of the current frame of the mixed signal is obtained by the formula weight_mix[i] = weight_mix, 0 ≤ i < N; when the current frame of the mixed signal is not the first frame, that weight is obtained by the formula [formula missing], where weight_mix[i] denotes the weight of the i-th sample of the current frame of the mixed signal and Q denotes the number of samples over which weight_mix[i] changes gradually from the weight of the previous frame of the mixed signal to the weight of its current frame;
The i-th output sample y[i] of the current frame of the mixed signal is obtained by the formula [formula missing], where Final[i] = Mix[i]*weight_mix[i], 0 ≤ i < N.
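The post-processing stage can be sketched as a peak-dependent gain followed by saturation. The threshold 32768 and the rule weight_mix = 1 below it come from the text. The gain used above the threshold and the final y[i] formula are not reproduced in this text, so the scaling 32768/signal_max and the 16-bit clipping are assumptions, and the function names are hypothetical. The Q-sample smoothing of the gain is left out for brevity.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def frame_gain(mix):
    """Current-frame weight of the mixed signal.

    weight_mix = 1 when the peak is at most 32768 (from the text); the
    scaling applied above that threshold is an assumption.
    """
    signal_max = max(abs(v) for v in mix)
    return 1.0 if signal_max <= 32768 else 32768 / signal_max


def saturate(final):
    """Clip each Final[i] to the signed 16-bit range (assumed form of y[i])."""
    return [int(min(max(v, INT16_MIN), INT16_MAX)) for v in final]
```

A frame would then be processed as `saturate([m * g for m in mix])` with `g = frame_gain(mix)`, so that clipping only has to catch the samples that the gain does not already bring into range.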
In the present invention the upper limit of the passband width of the time-varying filter is 20 kHz and the lower limit is 0.3 kHz; when the filter passband would exceed the upper limit or fall below the lower limit, it is held at that limit. Before time-varying low-pass filtering is performed, the time-varying filter is first initialized, specifically by setting f[-1] = 0 and b = 0.18, at which point the passband of the time-varying filter is 0 to 20 kHz. The current predetermined time period may be taken as the current 4 s; the previous predetermined time period is the 4 s preceding the current 4 s. Equal[s] denotes an equal-loudness array with preset values; these preset values are derived from the equal-loudness curve and are listed in Table 1.
Table 1. Values of the equal-loudness array Equal[s].
With reference to Table 1, the value of Equal[s] is obtained as follows: 1. calculate the value of s*N/F_S; 2. find the frequency range in Table 1 that contains this value; 3. read the corresponding Equal[s] value from Table 1. For example, when s*N/F_S = 1, the value falls in the frequency range (0.985~1.500) of Table 1, and therefore Equal[s] = 1.5.
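The three lookup steps above amount to a range search. The sketch below shows the idea with a hypothetical function `lookup_equal`; the table is passed in as rows of (low, high, value), and treating each range as closed at the lower end and open at the upper end is an assumption. Only the single row quoted in the example is known from this text, so the other rows of Table 1 are not shown.

```python
def lookup_equal(value, bands):
    """Return the Equal[s] entry whose frequency range contains value.

    bands is a list of (low, high, equal) rows in the style of Table 1.
    """
    for low, high, equal in bands:
        if low <= value < high:
            return equal
    raise ValueError(f"{value} is outside the table")


# The one row quoted in the text: range 0.985 to 1.500 maps to 1.5.
known_rows = [(0.985, 1.500, 1.5)]
```

Calling `lookup_equal(1.0, known_rows)` reproduces the worked example and returns 1.5.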
To ensure that every channel signal has the same average loudness, an equal-loudness weight must be calculated for each channel signal. The smoothing step of the present invention gives the weight a P-sample transition between frames, so that the weight data changes smoothly from frame to frame; the voice signal therefore sounds smoother, which helps preserve voice quality. If the weight calculation unit does not apply the preferred smoothing to the frame weights of each channel signal, the weight of every sample in the current frame of the k-th channel signal (weight_k[i], i = 0, 1, 2, ..., N-1) equals the current-frame weight weight_k of the k-th channel signal, that is, weight_k[i] = weight_k for 0 ≤ i < N. If the weight calculation unit does apply the preferred smoothing, then when the current frame of a channel signal is the first frame, the weight calculation unit obtains the weight of the i-th sample of the current frame of the k-th channel signal by the formula weight_k[i] = weight_k, 0 ≤ i < N; when the current frame is not the first frame, the weight calculation unit obtains that weight by the formula [formula missing]. After smoothing, therefore, the sample weights in the current frame of the k-th channel signal are not all equal to the current-frame weight weight_k. In addition, after the channel signals have been mixed, post-processing can be applied to prevent the frequent overflow that may occur when several voice signals are added together; the post-processing also keeps the weight data of successive frames stable, which benefits the smoothness of the mixed voice. The present invention calculates the weight of each channel using the loudness of each voice signal as the criterion, so that the average loudness of all voice signals is the same, and applies overflow handling last; as a result, after mixing, the voice signals are perceived as nearly equal in loudness and overflow is infrequent.
The effectiveness of the present invention is further illustrated below by mixing three voice signals with different characteristics using the mixer of the present invention. The sampling frequency F_S is 48 kHz; the number of samples N in a frame is 2048; the current predetermined time period is the current 4 seconds; the frame count r is 100; the number of samples p1 over which b changes from 0.956 to 0.18 is 960; the number of samples p2 over which b changes from 0.18 to 0.956 is 96000; the number of samples P over which weight_k[i] changes gradually from the weight of the previous frame to the weight of the current frame is 100; and the number of samples Q over which weight_mix[i] changes gradually from the weight of the previous frame of the mixed signal to the weight of its current frame is 100;
Fig. 5 shows the waveforms of the three voice signals with different characteristics. As shown in Fig. 5, to verify the mixing of a weak signal, the first voice signal (signal 1) has an amplitude range of -3500 to 3500, much smaller than that of the other two voice signals; since loudness is proportional to amplitude, the loudness of the first voice signal is also much smaller than that of the other two. To verify the effectiveness of the voice detection unit and the time-varying filter, the second voice signal (signal 2) alternates between the "voice" and "no voice" states and has uniform white noise added. The third voice signal (signal 3) has a smaller amplitude in its earlier part and a larger amplitude in its later part, so that the change in its amplitude before and after mixing can be compared and the change in its loudness analyzed. Fig. 6 shows the waveforms of the three voice signals after processing by the loudness calculation unit and the weight calculation unit of the present invention. As shown in Fig. 6, after processing by the voice detection unit, the time-varying filter, the loudness calculation unit and the weight calculation unit, the three voice signals change in different ways: the amplitude of the first voice signal (signal 1) increases markedly, and its loudness increases with it; while the second voice signal (signal 2) remains in the "no voice" state, the uniform white noise in the signal is reduced and the signal amplitude decreases; for the third voice signal (signal 3), the amplitude of the earlier part increases and the amplitude of the later part decreases. Fig. 7 shows the waveform of the output signal of the mixer of the present invention. As shown in Fig. 7, the mixing unit superimposes the three voice signals, and the post-processing unit then ensures that the final output signal overflows only infrequently and applies saturation where overflow does occur. These test results show that three voice signals of different loudness have nearly equal average loudness after equal-loudness control by the mixer of the present invention; because the three voice signals have different characteristics, the results also demonstrate the good robustness and stability of the present invention.
The present invention provides a new voice activity detection method, in which the mean power of the voice signal is used to judge whether the current frame is a voice signal. By introducing a time-varying filter, the present invention solves the problem in current mixing techniques that every channel signal takes part in the mix directly and introduces unnecessary noise, while also avoiding the problem that arises when silence detection is used to reduce the number of mixed channels, namely that participants who are not speaking lose all sense of presence. The present invention adopts an equal-loudness control strategy: the weight of each channel signal is derived from its calculated loudness, so that the average loudness of all channel signals is nearly the same and they sound similar to the listener. The present invention thus improves the voice quality of weak signals and also treats every participant fairly.
The above is only a preferred specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (10)

1. A mixer, characterized in that the mixer comprises:
A framing unit configured to divide each channel signal participating in the mix into frames;
A voice detection unit connected to the framing unit; the voice detection unit is configured to detect whether each framed channel signal contains a voice signal; the voice detection unit determines whether a channel signal contains a voice signal by detecting whether the current frame of that channel signal is in the voice-present state;
A time-varying filter connected to the voice detection unit; the time-varying filter is configured to apply time-varying low-pass filtering to each framed channel signal according to the detection result of the voice detection unit; when the current frame of a channel signal is in the voice-present state, the passband of the time-varying filter gradually widens, and when the current frame of a channel signal is in the voice-absent state, the passband of the time-varying filter gradually narrows;
A loudness calculation unit connected to the voice detection unit; the loudness calculation unit is configured to calculate, according to the detection result of the voice detection unit, the average loudness of each framed channel signal over the current predetermined time period;
A weight calculation unit connected to the loudness calculation unit; the weight calculation unit is configured to calculate, according to the result of the loudness calculation unit, the weight of each sample in the current frame of each channel signal participating in the mix;
A mixing unit connected to the time-varying filter and the weight calculation unit; the mixing unit is configured to obtain and output the single-channel current-frame signal produced by mixing the current frames of all channels, from the current-frame output of each channel signal after time-varying low-pass filtering and the sample weights of the current frame of each channel signal;
A post-processing unit connected to the mixing unit; the post-processing unit is configured to calculate, from the single-channel current-frame signal output by the mixing unit, the weight of each sample and each output sample of the current frame of the mixed signal.
2. The mixer according to claim 1, characterized in that the voice detection unit comprises:
A power calculation module configured to calculate the power of the current frame of each framed channel signal;
A minimum frame power determination module connected to the power calculation module; the minimum frame power determination module is configured to obtain, according to the result of the power calculation module, the minimum frame power of each framed channel signal within the current predetermined time period;
A voice state determination module connected to the power calculation module and the minimum frame power determination module; the voice state determination module is configured to detect whether a channel signal contains a voice signal by comparing the power of the current frame of that channel signal with the minimum frame power.
3. The mixer according to claim 2, characterized in that
The power calculation module calculates the power of the current frame of a channel signal by the formula [formula missing], where pow denotes the power of the current frame, x[i] denotes the i-th input sample of the current frame, and N denotes the number of samples in a frame;
The current predetermined time period is the duration T from the r-th frame before the current frame up to the current frame; the minimum frame power determination module obtains the minimum frame power of a channel signal within the current predetermined time period by the formula pow_min = min{power of the current frame, power of the 1st frame before the current frame, ..., power of the r-th frame before the current frame}, where min{ } denotes the minimum of all values in the braces, [formula missing], ceil(x) denotes the smallest integer greater than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame;
The voice state determination module uses a flag VAD to indicate whether a channel signal contains a voice signal and initializes it to VAD = 1; when pow ≥ 32·pow_min and VAD = 0, the voice state determination module sets VAD to 1, indicating that the channel signal is in the voice-present state; when pow ≤ 4·pow_min and VAD = 1, the voice state determination module sets VAD to 0, indicating that the channel signal is in the voice-absent state; in every other case the voice state determination module leaves VAD unchanged.
4. The mixer according to claim 1, characterized in that the time-varying filter obtains the i-th filtered output value of the current frame of a channel signal by the formula f[i] = (1-b)*x[i] + b*f[i-1], where f[i] denotes the i-th filtered output value of the current frame of the channel signal, x[i] denotes the i-th input sample of the current frame, N denotes the number of samples in a frame, 0 ≤ i < N, and b denotes the filter coefficient; when the current frame is in the voice-present state, [formula missing]; when the current frame is in the voice-absent state, [formula missing]; when b < 0.18, b is set to 0.18, and when b > 0.956, b is set to 0.956; p1 denotes the number of samples over which b changes from 0.956 to 0.18, and p2 denotes the number of samples over which b changes from 0.18 to 0.956.
5. The mixer according to claim 1, characterized in that
The loudness calculation unit comprises a DFT module, a loudness acquisition module connected to the DFT module, and an average loudness acquisition module connected to the loudness acquisition module;
When the current frame of a channel signal is in the voice-present state, the DFT module applies the DFT to the current frame of that channel signal, the loudness acquisition module then calculates the loudness value of the current frame, and finally the average loudness acquisition module calculates the average loudness of that channel signal over the current predetermined time period;
When the current frame of a channel signal is in the voice-absent state, the average loudness of that channel signal over the current predetermined time period equals the average loudness over the previous predetermined time period, before the current frame, that contained a voice signal.
6. The mixer according to claim 5, characterized in that
The DFT module applies the DFT to the current frame of a channel signal by the formula [formula missing], s = 0, 1, ..., N-1, where [formula missing], s denotes the discrete frequency, x[i] denotes the i-th input sample of the current frame, X[s] denotes the result of applying the DFT to x[i], and j denotes the imaginary unit, j^2 = -1;
The loudness acquisition module calculates the loudness value of the current frame by the formula [formula missing], where loudness denotes the loudness value of the current frame, X[s] denotes the result of applying the DFT to x[i], Equal[s] denotes an equal-loudness array with preset values, s20 = ceil(20*N/F_S), s20000 = floor(20000*N/F_S), ceil(x) denotes the smallest integer greater than or equal to x, floor(x) denotes the largest integer less than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame;
The average loudness acquisition module calculates the average loudness of a channel signal over the current predetermined time period by the formula [formula missing]; the current predetermined time period is the duration T from the r-th frame before the current frame up to the current frame, where [formula missing], ceil(x) denotes the smallest integer greater than or equal to x, F_S denotes the sampling frequency, and N denotes the number of samples in a frame.
7. The mixer according to claim 6, characterized in that
The weight calculation unit obtains the current-frame weight of the k-th signal by a formula, and obtains the weight of the i-th sample of the current frame of the k-th signal by the formula weight_k[i] = weight_k, 0 ≤ i < N. In the formulas: weight_k denotes the current-frame weight of the k-th signal; LOUD_k denotes the mean loudness of the k-th signal over the current predetermined time period; M denotes the number of signals participating in the mix; k = 1, 2, …, M; and weight_k[i] denotes the weight of the i-th sample of the current frame of the k-th signal;
The downmixing unit obtains, by a formula evaluated for 0 ≤ i < N, the single current-frame signal that results from mixing the current frames of all signals. In the formula: Mix[i] denotes the i-th sample of the single current-frame signal after mixing; M denotes the number of signals participating in the mix; f_k[i] denotes the i-th output sample of the current frame of the k-th signal after time-varying low-pass filtering; weight_k[i] denotes the weight of the i-th sample of the current frame of the k-th signal; and k = 1, 2, …, M;
The post-processing unit is further configured to obtain the maximum sample in the current frame of the mixed signal and to calculate the current-frame weight of the mixed signal.
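The per-sample weighting and mixing of claim 7 can be sketched as follows. The mixing formula itself is not reproduced in this text, so the weighted sum Mix[i] = Σ_k weight_k[i] * f_k[i] used here is an assumption inferred from the symbol definitions. The function names are hypothetical, and the frame weights are taken as given because the formula that derives them from LOUD_k is likewise not available.

```python
def expand_weight(weight_k, n):
    """Per-sample weights for one frame: weight_k[i] = weight_k for 0 <= i < n."""
    return [weight_k] * n


def mix_frame(filtered, weights):
    """Mix the current frames of M signals into one frame.

    filtered -- list of M frames; filtered[k][i] is f_k[i]
    weights  -- list of M frames; weights[k][i] is weight_k[i]
    ASSUMPTION: the mix is a per-sample weighted sum over the M signals.
    """
    m = len(filtered)
    n = len(filtered[0])
    return [sum(weights[k][i] * filtered[k][i] for k in range(m))
            for i in range(n)]


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0]]
    w = [expand_weight(0.5, 2), expand_weight(0.25, 2)]
    print(mix_frame(frames, w))
```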
8. The mixer according to claim 7, characterized in that
The weight calculation unit is further configured to smooth the frame weights of each signal participating in the mix.
9. The mixer according to claim 8, characterized in that
The weight calculation unit smooths the frame weights of a signal as follows:
When the current frame of the signal is the first frame, the weight calculation unit obtains the weight of the i-th sample of the current frame of the k-th signal by the formula weight_k[i] = weight_k, 0 ≤ i < N;
When the current frame of the signal is not the first frame, the weight calculation unit obtains the weight of the i-th sample of the current frame of the k-th signal by a formula evaluated for 0 ≤ i < P, where weight_k[i] denotes the weight of the i-th sample of the current frame of the k-th signal, N denotes the number of samples in one frame, and P denotes the number of samples over which weight_k[i] gradually transitions from the weight of the previous frame to the weight of the current frame;
The post-processing unit obtains the maximum sample in the current frame of the mixed signal by the formula signal_max = max{|Mix[0]|, |Mix[1]|, …, |Mix[N-1]|}, where: signal_max denotes the maximum sample in the current frame of the mixed signal; max{ } denotes the maximum of the values in the braces; Mix[0] denotes the 0th output sample of the current frame of the mixed signal; Mix[1] denotes the 1st output sample of the current frame of the mixed signal; and Mix[N-1] denotes the (N-1)-th output sample of the current frame of the mixed signal;
When signal_max ≤ 32768, the post-processing unit sets the current-frame weight of the mixed signal to weight_mix = 1; when signal_max > 32768, the post-processing unit calculates the current-frame weight of the mixed signal by a formula;
When the current frame of the mixed signal is the first frame, the post-processing unit obtains the weight of the i-th sample of the current frame of the mixed signal by the formula weight_mix[i] = weight_mix, 0 ≤ i < N; when the current frame of the mixed signal is not the first frame, the post-processing unit obtains that weight by a formula evaluated for 0 ≤ i < Q, where weight_mix[i] denotes the weight of the i-th sample of the current frame of the mixed signal, and Q denotes the number of samples over which weight_mix[i] gradually transitions from the weight of the previous frame of the mixed signal to the weight of the current frame of the mixed signal;
The post-processing unit obtains the i-th output sample y[i] of the current frame of the mixed signal by a formula in which Final[i] = Mix[i] * weight_mix[i], 0 ≤ i < N.
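The post-processing of claim 9 can be sketched as below. Only signal_max, the threshold 32768, weight_mix = 1 below the threshold, and Final[i] = Mix[i] * weight_mix[i] come from the claim text. Two parts are assumptions, since the corresponding formulas are not reproduced here: the gain 32768 / signal_max for frames above the threshold, and the linear ramp over the first Q samples. The function names are hypothetical, and the final mapping from Final[i] to y[i] is left out because its formula is not available.

```python
FULL_SCALE = 32768  # threshold named in the claim


def peak(mix):
    """signal_max = max{|Mix[0]|, ..., |Mix[N-1]|}."""
    return max(abs(v) for v in mix)


def frame_gain(signal_max, full_scale=FULL_SCALE):
    """Current-frame weight of the mixed signal.

    ASSUMPTION: above the threshold the peak is scaled back to full scale.
    """
    if signal_max <= full_scale:
        return 1.0
    return full_scale / signal_max


def ramp_weights(prev_weight, cur_weight, n, q):
    """Per-sample weights for a frame that is not the first frame.

    ASSUMPTION: linear transition from prev_weight to cur_weight over the
    first q samples, then cur_weight for the remaining n - q samples.
    """
    out = []
    for i in range(n):
        if i < q:
            out.append(prev_weight + (cur_weight - prev_weight) * (i + 1) / q)
        else:
            out.append(cur_weight)
    return out


def apply_gain(mix, weights):
    """Final[i] = Mix[i] * weight_mix[i]."""
    return [m * w for m, w in zip(mix, weights)]


if __name__ == "__main__":
    mix = [100, -40000, 20000, 5000]
    g = frame_gain(peak(mix))
    print(apply_gain(mix, ramp_weights(1.0, g, len(mix), 2)))
```

The same ramp shape could serve the per-signal weight smoothing with P in place of Q, again only as an illustration of a gradual transition.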
10. A sound mixing method, characterized in that the method comprises the following steps:
Step 1: Divide each signal participating in the mix into frames;
Step 2: Detect whether each framed signal contains a voice signal, where a signal is determined to contain a voice signal according to whether its current frame is in the voice-present state;
Step 3: According to the voice detection result, apply time-varying low-pass filtering to each framed signal: when the current frame of a signal is in the voice-present state, the passband gradually widens; when the current frame of a signal is in the voice-absent state, the passband gradually narrows;
Step 4: According to the voice detection result, calculate the mean loudness of each framed signal over the current predetermined time period;
Step 5: According to the mean loudness results, calculate the weight of each sample in the current frame of each signal participating in the mix;
Step 6: From the current-frame output of each signal after time-varying low-pass filtering and the weight of each sample in the current frame of each signal, obtain and output the single current-frame signal that results from mixing the current frames of all signals;
Step 7: From the single current-frame signal after mixing, calculate the weight of each sample and each output sample in the current frame of the mixed signal.
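Steps 1 and 3 above can be illustrated with a minimal sketch. Everything numeric here is an assumption: the claim does not state whether frames overlap, how fast the passband changes, or what its limits are. Non-overlapping frames, a fixed step of 200 Hz per frame, and limits of 300 Hz and 3400 Hz are hypothetical choices, as are the function names.

```python
def split_frames(signal, n):
    """Step 1 sketch: split a sample list into consecutive frames of n samples.

    ASSUMPTION: frames do not overlap; a trailing partial frame is dropped.
    """
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def next_cutoff(cutoff_hz, has_voice, step_hz=200.0, min_hz=300.0, max_hz=3400.0):
    """Step 3 sketch: move the low-pass cutoff one step per frame.

    The passband widens while voice is present and narrows while it is
    absent, clamped to [min_hz, max_hz]. All default values are hypothetical.
    """
    if has_voice:
        return min(cutoff_hz + step_hz, max_hz)
    return max(cutoff_hz - step_hz, min_hz)


if __name__ == "__main__":
    print(split_frames(list(range(7)), 3))
    cutoff = 300.0
    for voiced in [True, True, False]:
        cutoff = next_cutoff(cutoff, voiced)
        print(cutoff)
```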
CN201610939143.8A 2016-10-25 2016-10-25 Mixer and sound mixing method Active CN106504758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610939143.8A CN106504758B (en) 2016-10-25 2016-10-25 Mixer and sound mixing method

Publications (2)

Publication Number Publication Date
CN106504758A true CN106504758A (en) 2017-03-15
CN106504758B CN106504758B (en) 2019-07-16

Family

ID=58319112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610939143.8A Active CN106504758B (en) 2016-10-25 2016-10-25 Mixer and sound mixing method

Country Status (1)

Country Link
CN (1) CN106504758B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1770256A (en) * 2004-11-02 2006-05-10 北京中科信利技术有限公司 Digital audio frequency mixing method based on transform domain
CN101477800A (en) * 2008-12-31 2009-07-08 瑞声声学科技(深圳)有限公司 Voice enhancing process
CN101593522A (en) * 2009-07-08 2009-12-02 清华大学 A kind of full frequency domain digital hearing aid method and apparatus
CN102664019A (en) * 2012-04-27 2012-09-12 深圳市邦彦信息技术有限公司 DSP sound mixing method and device for full-interactive conference
CN102779527A (en) * 2012-08-07 2012-11-14 无锡成电科大科技发展有限公司 Speech enhancement method on basis of enhancement of formants of window function
CN104539816A (en) * 2014-12-25 2015-04-22 广州华多网络科技有限公司 Intelligent voice mixing method and device for multi-party voice communication
CN104616665A (en) * 2015-01-30 2015-05-13 深圳市云之讯网络技术有限公司 Voice similarity based sound mixing method

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107202976A (en) * 2017-05-15 2017-09-26 大连理工大学 The distributed microphone array sound source localization system of low complex degree
CN111770413A (en) * 2020-06-30 2020-10-13 浙江大华技术股份有限公司 Multi-sound-source sound mixing method and device and storage medium
CN112750444A (en) * 2020-06-30 2021-05-04 腾讯科技(深圳)有限公司 Sound mixing method and device and electronic equipment
CN111770413B (en) * 2020-06-30 2021-08-27 浙江大华技术股份有限公司 Multi-sound-source sound mixing method and device and storage medium
CN112750444B (en) * 2020-06-30 2023-12-12 腾讯科技(深圳)有限公司 Sound mixing method and device and electronic equipment
CN112951197A (en) * 2021-04-02 2021-06-11 北京百瑞互联技术有限公司 Audio mixing method, device, medium and equipment
CN112951197B (en) * 2021-04-02 2022-06-24 北京百瑞互联技术有限公司 Audio mixing method, device, medium and equipment
CN112995425A (en) * 2021-05-13 2021-06-18 北京百瑞互联技术有限公司 Equal loudness sound mixing method and device
CN112995425B (en) * 2021-05-13 2021-09-07 北京百瑞互联技术有限公司 Equal loudness sound mixing method and device

Also Published As

Publication number Publication date
CN106504758B (en) 2019-07-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant