CN106782587A - Sound mask device and sound mask method - Google Patents

Sound mask device and sound mask method

Info

Publication number
CN106782587A
Authority
CN
China
Prior art keywords
frame
main channel
power
channel signal
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611029084.7A
Other languages
Chinese (zh)
Other versions
CN106782587B (en)
Inventor
陈喆
殷福亮
崔行悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201611029084.7A
Publication of CN106782587A
Application granted
Publication of CN106782587B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Noise Elimination (AREA)
  • Time-Division Multiplex Systems (AREA)

Abstract

The invention discloses a sound mask device and a sound mask method. The sound mask device includes: a main channel that receives a voice signal; a secondary channel that receives a background signal; a framing unit; a band-pass filter connected to the framing unit; a speech detection unit connected to the band-pass filter; a time-varying filter connected to the speech detection unit; a masking unit connected to the framing unit; and a summing unit connected to the time-varying filter and the masking unit. The invention gives good masking results and high flexibility.

Description

Sound mask device and sound mask method
Technical field
The present invention relates to a sound mask device and a sound mask method.
Background
In film and television recording, live sports broadcasts and live radio, a background signal such as music often occurs at the same time as dialogue (commentary). To make the dialogue (commentary) stand out, the level of the background signal must be reduced so that the dialogue masks the background signal; when the dialogue (commentary) ends, the background signal is restored to its original level. A sound mask device is a dynamics processor that implements this function: it automatically masks the main channel signal by means of an additional signal, and once the additional signal disappears, the amplitude of the main channel signal is restored automatically.
Chinese utility model patent publication No. CN205004028U provides a single-channel sound mask device that directly prevents others from understanding or recording the content of a conversation. Its basic concept is as follows. A voice selection switch selects a voice interference signal stored in a memory, and a sound capture unit converts speech into a voice signal. Signal processing is divided into two units: the first signal processing unit splits the voice signal and recombines the pieces at random to generate a first masking signal, and the second signal processing unit recombines the voice interference signal at random to generate a second masking signal. Finally, an audio player outputs the first masking signal as a first masking sound and the second masking signal as a second masking sound. This prior-art approach has the following problems: 1. the two generated masking signals do not change level automatically as the level of the voice signal changes, so flexibility is low; 2. the input voice signal is not processed in any way, so noise may affect the masking result.
Summary of the invention
The present invention is proposed in view of the above problems and provides a sound mask device and a sound mask method. They avoid the problem that, when the voice signal is weak and the noise signal is strong, a fixed decision level makes the speech/noise decision inaccurate. When a frame on the main channel is a speech frame, it passes in full without loss; when the frame is a noise frame, it is filtered out as far as possible, so that the influence of noise is minimized. In addition, when a frame on the main channel is judged to be a speech frame, the power of the secondary channel signal in the corresponding frame decreases quickly with the speech power but remains identifiable; when speech is judged to have ended, the power of the secondary channel signal in the corresponding frame increases slowly until the end of that frame.
The technical means of the invention are as follows:
A sound mask device, comprising:
a main channel that receives a voice signal;
a secondary channel that receives a background signal;
a framing unit, which divides the signal on the main channel into frames to obtain the main channel signal of each frame, and divides the signal on the secondary channel into frames to obtain the secondary channel signal of each frame;
a band-pass filter connected to the framing unit, which band-pass filters the main channel signal of each frame;
a speech detection unit connected to the band-pass filter, which judges whether each band-pass-filtered main channel frame is a speech frame or a noise frame;
a time-varying filter connected to the speech detection unit, which applies time-varying band-pass filtering to each main channel frame after the speech/noise judgment;
a masking unit connected to the framing unit, which applies masking processing to each secondary channel frame while the time-varying filter applies time-varying band-pass filtering to each main channel frame; when a main channel frame is a speech frame, the filtering bandwidth of the time-varying filter gradually increases while the masking unit gradually reduces the secondary channel frame corresponding to that speech frame; when a main channel frame is a noise frame, the filtering bandwidth of the time-varying filter gradually decreases while the masking unit gradually increases the secondary channel frame corresponding to that noise frame;
a summing unit connected to the time-varying filter and the masking unit, which superimposes a main channel frame after time-varying band-pass filtering on the corresponding secondary channel frame after masking processing to obtain one frame of the output signal;
Further, let $in\_x(n)$ be the sample value at the n-th sampling point of a main channel frame before band-pass filtering. The output $u(n)$ of the band-pass filter at the n-th sampling point is given by [equation missing from this text], where $IIR\_A(k)$ is the k-th filter coefficient and $IIR\_B(r)$ is the r-th filter coefficient; $u(n-k)$ is the band-pass filter output at the (n-k)-th sampling point of the current main channel frame (when $n \ge k$) or at the k-th sampling point from the end of the previous main channel frame (when $n < k$); $in\_x(n-r)$ is the sample value before band-pass filtering at the (n-r)-th sampling point of the current main channel frame (when $n \ge r$) or at the r-th sampling point from the end of the previous main channel frame (when $n < r$);
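The frame-by-frame filtering described above, where samples from the previous frame are used at the start of each new frame, can be sketched as follows. This is a minimal illustration under stated assumptions: the equation itself is not available in this text, so the standard direct-form IIR difference equation is assumed, and the class name `FrameIIR`, the sign convention, and the use of `a[0]` as an unused leading coefficient are all hypothetical.

```python
class FrameIIR:
    """Frame-wise IIR filter that carries its state across frames.

    ASSUMED difference equation (not given in the text):
        u(n) = sum_r b[r] * in_x(n - r) - sum_{k>=1} a[k] * u(n - k)
    """

    def __init__(self, b, a):
        self.b = list(b)                    # plays the role of IIR_B(r)
        self.a = list(a)                    # plays the role of IIR_A(k); a[0] unused
        self.x_hist = [0.0] * (len(self.b) - 1)  # last inputs of previous frame
        self.u_hist = [0.0] * (len(self.a) - 1)  # last outputs of previous frame

    def process(self, frame):
        x = self.x_hist + list(frame)
        u = self.u_hist[:]
        xo, uo = len(self.x_hist), len(self.u_hist)
        for n in range(len(frame)):
            # For n < r or n < k the index falls into the previous-frame history.
            acc = sum(self.b[r] * x[xo + n - r] for r in range(len(self.b)))
            acc -= sum(self.a[k] * u[uo + n - k] for k in range(1, len(self.a)))
            u.append(acc)
        if xo:
            self.x_hist = x[-xo:]
        if uo:
            self.u_hist = u[-uo:]
        return u[uo:]
```

Because the history is kept between calls, filtering a signal frame by frame gives the same result as filtering it in one pass, which is the point of the previous-frame cases in the definition.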
Further, the speech detection unit comprises:
a power computation module, which computes the power of each main channel frame after the band-pass filter;
a threshold acquisition module connected to the power computation module, which uses the frame powers computed by the power computation module to obtain the speech power set, the noise power set, and the power boundary threshold between speech and noise;
a $P_n$ acquisition module connected to the threshold acquisition module, which sums the probability distribution of the noise powers of the noise power set within the main channel frame powers and, when the resulting sum exceeds a first preset value, defines the maximum of those noise powers as $P_n$; when $P_n$ satisfies a preset condition and the speech power mean of the speech power set and the noise power mean of the noise power set satisfy a preset relation, the $P_n$ acquisition module updates $P_n$, and otherwise it keeps $P_n$ unchanged;
a decision module connected to the threshold acquisition module and the $P_n$ acquisition module, which determines whether a main channel frame is a speech frame or a noise frame by comparing the computed power of that frame after the band-pass filter with $P_n$;
Further, the power computation module computes the power of a main channel frame after band-pass filtering by the formula $P_x = -10 \times \log_{10}(P_{real}/32768^2)$, where $P_x$ is the power of the main channel frame after band-pass filtering, [equation for $P_{real}$ missing from this text], $LEN$ is the number of sampling points in one main channel frame, and $u(n)$ is the sample value at the n-th sampling point of the main channel frame after band-pass filtering;
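A minimal sketch of the frame power computation follows. The outer formula $P_x = -10 \log_{10}(P_{real}/32768^2)$ is taken from the text; the definition of $P_{real}$ is not available here, so the mean square of the frame samples is assumed. The function name `frame_power` and the small floor that avoids a logarithm of zero are hypothetical additions.

```python
import math

FULL_SCALE = 32768.0  # 16-bit full scale, from the 32768^2 term in the text


def frame_power(u):
    """Return P_x = -10 * log10(P_real / 32768^2) for one frame of samples.

    ASSUMPTION: P_real is the mean square of the LEN samples u(n).
    """
    p_real = sum(s * s for s in u) / len(u)
    p_real = max(p_real, 1e-12)  # hypothetical guard for all-zero frames
    return -10.0 * math.log10(p_real / FULL_SCALE ** 2)
```

Under this assumption a full-scale frame gives a value of 0 and a frame at one tenth of full-scale amplitude gives 20, so larger values of $P_x$ correspond to weaker frames.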
Further, the threshold acquisition module obtains the speech power set, the noise power set, and the power boundary threshold between speech and noise as follows:
count how often each power value occurs among the main channel frame powers computed by the power computation module for a preset number of frames;
build the probability distribution histogram of the main channel frame power values for the preset number of frames, and obtain the speech power distribution region and the noise power distribution region from that histogram;
compute the power boundary threshold between speech and noise using the maximum between-class variance method;
Further, let the speech power set in the speech power distribution region be $[0, P_{th}-1]$ and the noise power set in the noise power distribution region be $[P_{th}, P_m]$, where $P_{th}-1$ is the minimum speech power in the speech power distribution region, $P_{th}$ is the maximum noise power in the noise power distribution region, and $P_m$ is the minimum noise power in the noise power distribution region (the power value $P_x$ is a negated logarithm, so a larger value corresponds to a weaker signal). The threshold acquisition module computes the power boundary threshold $P_{thbest}$ between speech and noise using the maximum between-class variance method as follows:
the speech power mean $u_v$ of the speech power set is computed by [equation missing from this text] and the noise power mean $u_n$ of the noise power set is computed by [equation missing from this text], where [equation missing from this text], $p_i$ is the distribution probability of speech power $i$, $p_k$ is the distribution probability of noise power $k$, $i$ takes each value in $[0, P_{th}-1]$ in turn, and $k$ takes each value in $[P_{th}, P_m]$ in turn;
the mean power of the main channel frames for the preset number of frames is obtained by $u_T = w_0 u_v + w_1 u_n$, where $u_T$ is that mean power;
the between-class variance between speech power and noise power is obtained by $\sigma^2 = w_0 (u_v - u_T)^2 + w_1 (u_n - u_T)^2$, and the power value that maximizes $\sigma^2$ is the power boundary threshold $P_{thbest}$ between speech and noise;
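The threshold search above can be sketched as an exhaustive scan over candidate values of $P_{th}$. The formulas for $u_T$ and $\sigma^2$ come from the text; the class weights and class means are not available here, so the standard maximum between-class variance (Otsu) definitions are assumed: $w_0$ and $w_1$ are the total probabilities of the two classes, and $u_v$, $u_n$ are the probability-weighted means within each class. The function name `otsu_power_threshold` and the use of integer-rounded power values are hypothetical.

```python
def otsu_power_threshold(powers):
    """Return the P_th in [1, P_m] that maximizes the between-class variance.

    powers: non-negative integer power values of the preset number of frames.
    Classes: speech = [0, P_th - 1], noise = [P_th, P_m].
    ASSUMPTION: w0, w1, u_v, u_n follow the standard Otsu definitions.
    """
    p_m = max(powers)
    counts = [0] * (p_m + 1)
    for p in powers:
        counts[p] += 1
    prob = [c / len(powers) for c in counts]  # histogram as probabilities

    best_th, best_var = None, -1.0
    for th in range(1, p_m + 1):
        w0 = sum(prob[:th])
        w1 = sum(prob[th:])
        if w0 == 0.0 or w1 == 0.0:
            continue  # one class is empty, variance undefined
        u_v = sum(i * prob[i] for i in range(th)) / w0
        u_n = sum(k * prob[k] for k in range(th, p_m + 1)) / w1
        u_t = w0 * u_v + w1 * u_n
        var = w0 * (u_v - u_t) ** 2 + w1 * (u_n - u_t) ** 2
        if var > best_var:
            best_th, best_var = th, var
    return best_th
```

For a histogram with two well-separated clusters, the returned threshold falls in the gap between them; the function returns `None` when every frame has the same power value.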
Further, the decision module first sets the initial value of the state value VAD to 0. For a given main channel frame, when the state value VAD is 0 and the power value $P_x$ of the frame is larger than $P_n$ by a second preset value, the decision module judges the frame to be a speech frame and updates the state value VAD to 1; when the state value VAD is 1 and the power value $P_x$ of the frame is smaller than $P_n$ by a third preset value, the decision module judges the frame to be a noise frame and updates the state value VAD to 0;
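The state update above is a two-threshold (hysteresis) decision, which can be sketched as follows. The direction of each comparison follows the wording of the text literally. The class name `HysteresisVAD`, the parameter names, and the reading of "larger by a preset value" as "larger by at least that value" are assumptions.

```python
class HysteresisVAD:
    """Speech/noise state machine with separate rise and fall margins."""

    def __init__(self, rise_margin, fall_margin):
        self.rise_margin = rise_margin  # the "second preset value"
        self.fall_margin = fall_margin  # the "third preset value"
        self.vad = 0                    # initial state: noise

    def update(self, p_x, p_n):
        """Update the state with one frame power p_x against the reference p_n."""
        if self.vad == 0 and p_x - p_n >= self.rise_margin:
            self.vad = 1  # frame judged to be speech
        elif self.vad == 1 and p_n - p_x >= self.fall_margin:
            self.vad = 0  # frame judged to be noise
        return self.vad
```

Using two margins rather than a single threshold keeps the state from toggling when $P_x$ hovers near $P_n$: a frame that lies within the margins leaves the previous state unchanged.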
Further, the centre frequency of the time-varying filter is 0.8 kHz, its amplitude-frequency response is [equation missing from this text], and its difference equation is [equation missing from this text], where $r$ is the bandwidth control variable with range [0.005, 0.995]. Specifically, when a main channel frame is a speech frame, the bandwidth control variable $r$ changes from 0.995 to 0.005 in steps of step1; when a main channel frame is a noise frame, $r$ changes from 0.005 to 0.995 in steps of step2. Here $f_s$ is the sampling frequency; $y(n)$ is the output at the n-th sampling point of the main channel frame after time-varying band-pass filtering; $y(n-1)$ is the output at the last sampling point of the previous main channel frame after time-varying band-pass filtering; $y(n-2)$ is the output at the second-to-last sampling point of the previous main channel frame after time-varying band-pass filtering; $j$ is the imaginary unit, $j^2 = -1$; and $\omega$ is the angular frequency;
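The behaviour of the bandwidth control variable $r$ can be illustrated with a stand-in filter. The amplitude response and difference equation of the time-varying filter are not available in this text, so the sketch below substitutes a common two-pole resonator centred at 0.8 kHz with pole radius $r$; it is not the filter of the invention. The class name `TimeVaryingBandpass`, the per-sample update of $r$, and the input gain $(1 - r)$ are all assumptions.

```python
import math

R_MIN, R_MAX = 0.005, 0.995  # range of r stated in the text
F_CENTER = 800.0             # centre frequency, 0.8 kHz


class TimeVaryingBandpass:
    """Stand-in two-pole resonator whose bandwidth widens as r decreases.

    ASSUMED difference equation (hypothetical, not from the text):
        y(n) = (1 - r) * x(n) + 2 r cos(w0) y(n-1) - r^2 y(n-2)
    """

    def __init__(self, fs, step1, step2):
        self.cos_w0 = math.cos(2.0 * math.pi * F_CENTER / fs)
        self.step1 = step1  # step used on speech frames (r decreases)
        self.step2 = step2  # step used on noise frames (r increases)
        self.r = R_MAX      # start narrow, as for noise
        self.y1 = 0.0       # y(n-1), carried across frames
        self.y2 = 0.0       # y(n-2), carried across frames

    def process(self, frame, is_speech):
        out = []
        for x in frame:
            if is_speech:
                self.r = max(R_MIN, self.r - self.step1)
            else:
                self.r = min(R_MAX, self.r + self.step2)
            r = self.r
            y = (1.0 - r) * x + 2.0 * r * self.cos_w0 * self.y1 - r * r * self.y2
            self.y2, self.y1 = self.y1, y
            out.append(y)
        return out
```

With $r$ near 0.005 the stand-in passes the input almost unchanged, and with $r$ near 0.995 it keeps only a narrow band around 0.8 kHz, which matches the stated intent that speech frames pass in full while noise frames are filtered out as far as possible.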
when a main channel frame is a speech frame, the masking unit gradually reduces the power of the secondary channel frame corresponding to that speech frame, from the power of the previous secondary channel frame down to one percent of the power of the speech frame after time-varying band-pass filtering; when a main channel frame is a noise frame, the masking unit gradually increases the power of the secondary channel frame corresponding to that noise frame, from the power of the previous secondary channel frame up to the power of the current secondary channel frame. The masking unit obtains the power $P_y$ of the secondary channel frame corresponding to a main channel frame by the formula [equation missing from this text], where $LEN$ is the number of sampling points in one main channel frame and $in\_y(n)$ is the sample value at the n-th sampling point of the secondary channel frame corresponding to that main channel frame.
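The masking behaviour above can be sketched as a gain that ramps across each secondary channel frame. The one-percent target comes from the text; everything else is an assumption: the formula for $P_y$ is not available here, so the mean square of the frame is used, the ramp is linear over one frame, and the gain is capped at 1 so that the background is never amplified. The names `mean_square` and `mask_frame` are hypothetical.

```python
import math


def mean_square(frame):
    """ASSUMED power measure: mean of the squared samples."""
    return sum(s * s for s in frame) / len(frame)


def mask_frame(bg_frame, is_speech, speech_power, prev_gain):
    """Scale one secondary channel frame with a gain ramp.

    speech_power: mean-square power of the main channel frame after
    time-varying band-pass filtering (only used when is_speech is True).
    Returns the scaled frame and the gain reached at the end of the frame.
    """
    p_y = mean_square(bg_frame)
    if is_speech and p_y > 0.0:
        # Target power is 1% of the filtered speech power; gain is its square root.
        target_gain = min(1.0, math.sqrt(0.01 * speech_power / p_y))
    else:
        target_gain = 1.0  # restore the frame's own power
    n = len(bg_frame)
    out = []
    for i, s in enumerate(bg_frame):
        gain = prev_gain + (target_gain - prev_gain) * (i + 1) / n
        out.append(gain * s)
    return out, target_gain
```

Passing the returned gain as `prev_gain` for the next frame makes the level change continuous across frame boundaries, so the background fades down when speech starts and fades back up when it ends.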
A sound mask method, in which a voice signal is received through a main channel and a background signal is received through a secondary channel, comprising the following steps:
Step 1: divide the signal on the main channel into frames to obtain the main channel signal of each frame, and divide the signal on the secondary channel into frames to obtain the secondary channel signal of each frame; go to Step 2;
Step 2: band-pass filter the main channel signal of each frame; go to Step 3;
Step 3: judge whether each band-pass-filtered main channel frame is a speech frame or a noise frame; go to Step 4;
Step 4: apply time-varying band-pass filtering to each main channel frame while applying masking processing to each secondary channel frame; when a main channel frame is a speech frame, the filtering bandwidth gradually increases while the secondary channel frame corresponding to that speech frame is gradually reduced; when a main channel frame is a noise frame, the filtering bandwidth gradually decreases while the secondary channel frame corresponding to that noise frame is gradually increased; go to Step 5;
Step 5: superimpose a main channel frame after time-varying band-pass filtering on the corresponding secondary channel frame after masking processing to obtain one frame of the output signal;
Further, Step 3 comprises the following steps:
Step 31: compute the power of each main channel frame after band-pass filtering; go to Step 32;
Step 32: use the computed power of each main channel frame after band-pass filtering to obtain the speech power set, the noise power set, and the power boundary threshold between speech and noise; go to Step 33;
Step 33: sum the probability distribution of the noise powers of the noise power set within the main channel frame powers, and when the resulting sum exceeds a first preset value, define the maximum of those noise powers as $P_n$; go to Step 34;
Step 34: when $P_n$ satisfies a preset condition and the speech power mean of the speech power set and the noise power mean of the noise power set satisfy a preset relation, update $P_n$; otherwise keep $P_n$ unchanged; go to Step 35;
Step 35: determine whether a main channel frame is a speech frame or a noise frame by comparing its computed power after band-pass filtering with $P_n$.
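Steps 1 and 5 of the method above, framing and superposition, are simple enough to sketch directly. The text does not say whether frames overlap or how an incomplete final frame is handled, so the sketch assumes consecutive non-overlapping frames and drops any incomplete tail. The function names `split_frames` and `sum_frames` are hypothetical.

```python
def split_frames(signal, frame_len):
    """Step 1 sketch: split a signal into consecutive frames of frame_len samples.

    ASSUMPTION: frames do not overlap and an incomplete tail is discarded.
    """
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def sum_frames(main_frame, bg_frame):
    """Step 5 sketch: superimpose a processed main channel frame on its
    corresponding processed secondary channel frame, sample by sample."""
    return [m + b for m, b in zip(main_frame, bg_frame)]
```

Both channels are assumed to be split with the same frame length, so that the i-th main channel frame and the i-th secondary channel frame correspond to each other in Steps 4 and 5.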
With the above technical solution, the sound mask device and sound mask method provided by the present invention avoid the problem that, when the voice signal is weak and the noise signal is strong, a fixed decision level makes the speech/noise decision inaccurate. When a frame on the main channel is a speech frame, it passes in full without loss; when the frame is a noise frame, it is filtered out as far as possible, so that the influence of noise is minimized. In addition, when a frame on the main channel is judged to be a speech frame, the power of the secondary channel signal in the corresponding frame decreases quickly with the speech power but remains identifiable; when speech is judged to have ended, the power of the secondary channel signal in the corresponding frame increases slowly until the end of that frame. The invention gives good masking results and high flexibility.
Brief description of the drawings
Fig. 1 is a structural block diagram of the sound mask device of the present invention;
Fig. 2 is a structural block diagram of the speech detection unit of the present invention;
Fig. 3 is a schematic diagram of the power of each main channel frame when the preset number of frames is 200;
Fig. 4 is the voice signal used as an input signal of the sound mask device of the present invention;
Fig. 5 is the background signal used as an input signal of the sound mask device of the present invention;
Fig. 6 is a schematic diagram of the curve of the state value VAD;
Fig. 7 is a schematic diagram of how the gain of the background signal changes depending on whether the voice signal is present;
Fig. 8 is the background signal after processing by the sound mask device of the present invention;
Fig. 9 is the output signal of the sound mask device of the present invention;
Fig. 10 is the probability distribution histogram of the power values of the main channel frames for the preset number of frames;
Fig. 11 is a flow chart of the sound mask method of the present invention.
Detailed description of the embodiments
As shown in Fig. 1 and Fig. 2, the sound mask device comprises: a main channel that receives a voice signal; a secondary channel that receives a background signal; a framing unit; a band-pass filter connected to the framing unit; a speech detection unit connected to the band-pass filter; a time-varying filter connected to the speech detection unit; a masking unit connected to the framing unit; and a summing unit connected to the time-varying filter and the masking unit. The speech detection unit comprises a power computation module, a threshold acquisition module connected to the power computation module, a $P_n$ acquisition module connected to the threshold acquisition module, and a decision module connected to the threshold acquisition module and the $P_n$ acquisition module. The function of each unit and module is as set out in the summary of the invention, including: the band-pass filter output defined from $in\_x(n)$, $IIR\_A(k)$ and $IIR\_B(r)$; the frame power $P_x = -10 \times \log_{10}(P_{real}/32768^2)$; the power boundary threshold $P_{thbest}$ obtained by the maximum between-class variance method from the speech power set $[0, P_{th}-1]$ and the noise power set $[P_{th}, P_m]$; the update of the state value VAD from the comparison of $P_x$ with $P_n$; the time-varying filter with centre frequency 0.8 kHz and bandwidth control variable $r$ in [0.005, 0.995], stepped by step1 on speech frames and by step2 on noise frames; and the masking unit, which reduces the secondary channel frame power to one percent of the power of the speech frame after time-varying band-pass filtering and restores it on noise frames.
A kind of sound mask method as shown in figure 11, receives voice signal, and receive by subaisle by main channel Background signal, the sound mask method comprises the following steps:
Step 1: Frame the signal on the main channel to obtain the main channel frames, frame the signal on the secondary channel to obtain the secondary channel frames, then perform step 2;
Step 2: Band-pass filter each main channel frame, then perform step 3;
Step 3: Judge whether each band-pass-filtered main channel frame is a speech frame or a noise frame, then perform step 4;
Step 4: Apply time-varying band-pass filtering to each main channel frame while applying masking processing to each secondary channel frame. When a main channel frame is a speech frame, the filter bandwidth gradually increases while the corresponding secondary channel frame is gradually attenuated; when a main channel frame is a noise frame, the filter bandwidth gradually decreases while the corresponding secondary channel frame is gradually amplified; then perform step 5;
Step 5: Superimpose, sample by sample, the main channel frame after time-varying band-pass filtering and the corresponding secondary channel frame after masking processing to obtain one frame of the output signal;
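The framing in step 1 can be sketched as follows. This is a minimal Python illustration, assuming non-overlapping frames of 960 sampling points (the value this description later gives for LEN); the function name split_frames is hypothetical, and trailing samples that do not fill a whole frame are simply dropped here.

```python
def split_frames(signal, frame_len=960):
    """Split a sequence of samples into consecutive, non-overlapping frames.

    Assumption: frames do not overlap; an incomplete final frame is discarded.
    """
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]
```

The same helper would be applied to the main channel and to the secondary channel, so that frame k of one channel corresponds to frame k of the other.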
Further, step 3 specifically comprises the following steps:
Step 31: Calculate the power of each band-pass-filtered main channel frame, then perform step 32;
Step 32: From the calculated powers of the band-pass-filtered main channel frames, obtain the speech power set, the noise power set, and the power demarcation threshold between speech and noise, then perform step 33;
Step 33: Sum the probabilities with which the noise powers in the noise power set occur in the distribution of main channel frame powers, and, when the resulting sum exceeds the first preset value, take the maximum of those noise powers as P_n, then perform step 34;
Step 34: When P_n satisfies the preset condition and the mean speech power of the speech power set and the mean noise power of the noise power set satisfy the preset relation, update P_n; otherwise keep P_n unchanged; then perform step 35;
Step 35: Determine whether the main channel frame is a speech frame or a noise frame by comparing the calculated power of the band-pass-filtered main channel frame with P_n.
Further, if the sampled value of a main channel frame at the n-th sampling point before band-pass filtering is in_x(n), the output u(n) at the n-th sampling point after band-pass filtering is given by a formula in which IIR_A(k) is the k-th filter coefficient and IIR_B(r) is the r-th filter coefficient; u(n-k) denotes the band-pass-filtered output at the (n-k)-th sampling point of the current main channel frame (when n ≥ k) or at the k-th sampling point from the end of the previous main channel frame (when n < k); and in_x(n-r) denotes the sampled value before band-pass filtering at the (n-r)-th sampling point of the current main channel frame (when n ≥ r) or at the r-th sampling point from the end of the previous main channel frame (when n < r);
Further, step 31 is specifically: calculate the power of a band-pass-filtered main channel frame by the formula $P_x = -10 \log_{10}(P_{real}/32768^2)$, where P_x denotes the power of the band-pass-filtered main channel frame, P_real is computed from the samples u(n) of the frame, LEN denotes the number of sampling points in one main channel frame, and u(n) denotes the sampled value at the n-th sampling point of the frame after band-pass filtering;
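The power calculation of step 31 can be sketched as follows. This is a minimal Python illustration, assuming that P_real is the mean of the squared samples of the frame (the formula for P_real is not legible in this text, so this is an assumption) and that an all-zero frame is floored to avoid taking the logarithm of zero. The function name frame_power is hypothetical.

```python
import math

FULL_SCALE = 32768.0  # amplitude whose square is the 0 dBov reference in the text


def frame_power(u):
    """Return P_x = -10*log10(P_real / 32768^2) for one frame of samples u."""
    # Assumption: P_real is the mean of the squared samples over the frame.
    p_real = sum(s * s for s in u) / len(u)
    # Assumption: floor P_real so that a silent frame does not yield log10(0).
    p_real = max(p_real, 1e-12)
    return -10.0 * math.log10(p_real / FULL_SCALE ** 2)
```

Because of the leading minus sign, a quieter frame gives a numerically larger P_x under this formula; a frame whose amplitude is one tenth of full scale gives a value of about 20.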
Further, step 32 specifically comprises the following steps:
Step 321: Count how often each power value occurs among the main channel frames of the preset number of frames;
Step 322: Build the probability distribution histogram of the power values of the main channel frames of the preset number of frames, and obtain the speech power distribution region and the noise power distribution region from this histogram;
Step 323: Calculate the power demarcation threshold P_thbest between speech and noise by the maximum between-class variance method.
Further, let the speech power set in the speech power distribution region be [0, P_th - 1] and the noise power set in the noise power distribution region be [P_th, P_m], where P_th - 1 represents the minimum speech power in the speech power distribution region, P_th represents the maximum noise power in the noise power distribution region, and P_m represents the minimum noise power in the noise power distribution region (P_x carries a leading minus sign, so a numerically larger value corresponds to a lower power). The power demarcation threshold P_thbest between speech and noise is calculated by the maximum between-class variance method as follows:
The mean speech power u_v of the speech power set and the mean noise power u_n of the noise power set are each calculated by a formula, where u_v denotes the mean of the speech powers in the speech power set, u_n denotes the mean of the noise powers in the noise power set, p_i denotes the distribution probability of speech power i, p_k denotes the distribution probability of noise power k, i takes each value in [0, P_th - 1] in turn, and k takes each value in [P_th, P_m] in turn;
The mean power of the main channel frames of the preset number of frames is obtained by the formula $u_T = w_0 u_v + w_1 u_n$, where u_T denotes that mean power;
The between-class variance between speech power and noise power is obtained by the formula $\sigma^2 = w_0 (u_v - u_T)^2 + w_1 (u_n - u_T)^2$, and the power value that maximizes σ^2 is taken as the power demarcation threshold P_thbest between speech and noise.
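The threshold search described above can be sketched as follows. This is a minimal Python illustration, assuming the usual form of the maximum between-class variance (Otsu) method: w_0 and w_1 are taken as the total probabilities of the speech set [0, P_th - 1] and the noise set [P_th, P_m], and u_v and u_n as the probability-weighted means of those sets. These definitions are assumptions, since the corresponding formulas are not legible in this text. The function name otsu_threshold is hypothetical.

```python
def otsu_threshold(p):
    """p[i] is the probability of the integer power value i, for i = 0..P_m.

    Returns the candidate P_th that maximizes the between-class variance,
    or None if no candidate splits the histogram into two non-empty classes.
    """
    best_th, best_var = None, -1.0
    for th in range(1, len(p)):
        w0 = sum(p[:th])   # assumed weight of the speech set [0, th-1]
        w1 = sum(p[th:])   # assumed weight of the noise set [th, P_m]
        if w0 == 0.0 or w1 == 0.0:
            continue       # one class is empty, no valid split here
        u_v = sum(i * p[i] for i in range(th)) / w0
        u_n = sum(k * p[k] for k in range(th, len(p))) / w1
        u_t = w0 * u_v + w1 * u_n                              # u_T in the text
        var = w0 * (u_v - u_t) ** 2 + w1 * (u_n - u_t) ** 2    # sigma^2 in the text
        if var > best_var:
            best_th, best_var = th, var
    return best_th
```

For a histogram with two well-separated clusters, any returned threshold lies between the clusters; when several candidates tie, this sketch keeps the smallest.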
Further, step 35 is specifically: the initial value of the state value VAD is first set to 0. For a given main channel frame, when the state value VAD is 0 and the power value P_x of the frame is larger than P_n by the second preset value, the frame is judged to be a speech frame and the state value VAD is updated to 1; when the state value VAD is 1 and the power value P_x of the frame is smaller than P_n by the third preset value, the frame is judged to be a noise frame and the state value VAD is updated to 0;
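The two-threshold decision of step 35 can be sketched as a small state machine. This is a minimal Python illustration of one reading of the text, in which "larger than P_n by the second preset value" is taken to mean P_x > P_n + second preset and "smaller than P_n by the third preset value" to mean P_x < P_n - third preset. The default values 12 and 6 are the preset values given later in this description; the function name update_vad is hypothetical.

```python
def update_vad(vad, p_x, p_n, second_preset=12.0, third_preset=6.0):
    """Return the new state value VAD (1 = speech frame, 0 = noise frame)."""
    if vad == 0 and p_x > p_n + second_preset:
        return 1   # noise -> speech transition
    if vad == 1 and p_x < p_n - third_preset:
        return 0   # speech -> noise transition
    return vad     # otherwise the previous decision is kept
```

Using different margins for the two transitions gives hysteresis, so a power value hovering near P_n does not make the decision toggle on every frame.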
Further, time-varying band-pass filtering is applied to each main channel frame by a time-varying filter. The center frequency of the time-varying filter is 0.8 kHz, and its amplitude-frequency response and difference equation are given by formulas in which r denotes the bandwidth control variable, whose range is [0.005, 0.995]. Specifically, when a main channel frame is a speech frame, r changes from 0.995 to 0.005 in steps of step1; when a main channel frame is a noise frame, r changes from 0.005 to 0.995 in steps of step2. Here f_s denotes the sampling frequency; y(n) denotes the output at the n-th sampling point of the main channel frame after time-varying band-pass filtering; y(n-1) and y(n-2) denote the outputs at the last and second-to-last sampling points of the previous main channel frame after time-varying band-pass filtering; j denotes the imaginary unit, with j^2 = -1; and ω denotes the angular frequency;
When a main channel frame is a speech frame, the power of the corresponding secondary channel frame is gradually reduced from the power of the previous secondary channel frame to one percent of the power of the speech frame after time-varying band-pass filtering; when a main channel frame is a noise frame, the power of the corresponding secondary channel frame is gradually increased from the power of the previous secondary channel frame to the power of the current secondary channel frame. The power P_y of the secondary channel frame corresponding to a main channel frame is obtained by a formula in which LEN denotes the number of sampling points in one main channel frame and in_y(n) denotes the sampled value of that secondary channel frame at the n-th sampling point.
The invention counts how often each power value occurs among the main channel frames of the preset number of frames as follows. First, the power value of each of these frames is rounded to an integer. Then, starting from the power value of the first main channel frame, the power value of each frame in turn is used as the index of an accumulator, until the power values of all frames of the preset number have been traversed. Whenever several frames have equal power values, the accumulator indexed by that value is incremented by one for each of them. This yields the probability with which the power value of the i-th main channel frame occurs. The preset number of frames may be 200.
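The counting procedure above can be sketched as follows. This is a minimal Python illustration, assuming that rounding means truncation to an integer, that negative values are clamped to 0, and that each probability is the accumulator count divided by the number of frames (the probability formula is not legible in this text, so the last point is an assumption). The function name power_probabilities is hypothetical.

```python
def power_probabilities(powers):
    """Return a list p where p[v] is the fraction of frames with integer power v."""
    # Assumption: truncate to an integer and clamp at 0 so values can index a list.
    values = [max(0, int(p)) for p in powers]
    counts = [0] * (max(values) + 1)      # one accumulator per possible power value
    for v in values:
        counts[v] += 1                    # the power value is the accumulator index
    return [c / len(values) for c in counts]
```

With the preset number of frames equal to 200, the input would be the 200 most recent frame powers, and the returned list is the histogram used for the threshold search.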
The invention band-pass filters each main channel frame, which reduces the influence of noise on the speech detection result. The passband of the band-pass filter is 0.3 kHz to 3.4 kHz. The reference design specifications of the band-pass filter are: center frequency 1.55 kHz, passband 0.3 kHz to 3.4 kHz, lower stopband cutoff frequency 1 Hz, upper stopband cutoff frequency 4 kHz, and stopband attenuation greater than 60 dB. The reference design result is a 7th-order IIR filter with filter coefficients IIR_B(7) = {1.012205768830948 × 10^-3, -4.911110647449132 × 10^-4, 7.807184279553245 × 10^-5, -1.198332967805191 × 10^-3, 7.807184279553245 × 10^-5, -4.911110647449132 × 10^-4, 1.012205768830948 × 10^-3} and IIR_A(7) = {1.0, -5.380875974547, 12.23643655587558, -15.06088779864848, 10.58294743567488, -4.024466821830663, 0.6468480167658538}. When n < k, the output u(n-k) at the k-th sampling point from the end of the previous band-pass-filtered main channel frame is initialized to 0; when n < r, the sampled value in_x(n-r) at the r-th sampling point from the end of the previous main channel frame before band-pass filtering is likewise initialized to 0.
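Frame-by-frame IIR filtering with state carried over from the previous frame can be sketched as follows. This is a minimal Python illustration, assuming the conventional direct-form difference equation u(n) = Σ b[r]·in_x(n-r) - Σ a[k]·u(n-k) with a[0] = 1 (the sign convention and index ranges are assumptions, since the filter formula is not legible in this text). The function name iir_filter_frame and its history arguments are hypothetical.

```python
def iir_filter_frame(in_x, b, a, x_hist, u_hist):
    """Filter one frame and return (u, new_x_hist, new_u_hist).

    x_hist holds the last len(b)-1 inputs of the previous frame and u_hist the
    last len(a)-1 outputs, oldest first; both are all zeros for the first frame.
    """
    nb, na = len(b) - 1, len(a) - 1
    x = list(x_hist) + list(in_x)   # x[nb + n] is in_x(n)
    u = list(u_hist)                # u[na + n] will be u(n)
    for n in range(len(in_x)):
        acc = sum(b[r] * x[nb + n - r] for r in range(nb + 1))
        acc -= sum(a[k] * u[na + n - k] for k in range(1, na + 1))
        u.append(acc / a[0])
    return u[na:], x[len(x) - nb:], u[len(u) - na:]
```

The coefficient lists IIR_B and IIR_A given above would be passed as b and a, with six zeros in each history list for the first frame.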
In the invention, the first preset value is 0.1, the second preset value is 12 dB, and the third preset value is 6 dB. The preset condition is that P_n lies in the range [P_thbest, P_m], and the preset relation is |u_v - u_n| > 10. LEN may be 960. The invention calculates the power of a band-pass-filtered main channel frame by the formula $P_x = -10 \log_{10}(P_{real}/32768^2)$, where P_real is an intermediate variable representing the actual power of the band-pass-filtered main channel frame and P_x represents its relative power. The invention defines the actual power corresponding to 0 dBov as 32768^2, so P_x is a power value relative to 0 dBov. The invention may also obtain the envelopes of the main channel signal and the secondary channel signal by envelope detection or peak detection and then compute the power from the envelope of each frame.
Fig. 3 is a schematic diagram of the power of each main channel frame when the preset number of frames is 200. As shown in Fig. 3, P_x together with the powers of the 199 previously computed speech signal frames makes up the powers of the preset number of frames, that is, of 200 main channel frames.
Fig. 10 shows the probability distribution histogram of the power values of the main channel frames of the preset number of frames. As shown in Fig. 10, once this histogram has been established, the speech power distribution region and the noise power distribution region can be obtained from it, together with the boundary region between them; the power demarcation threshold P_thbest between speech and noise is calculated within this boundary region by the maximum between-class variance method.
In the invention, when a main channel frame is a speech frame, the filter bandwidth of the time-varying filter gradually increases while the masking unit gradually attenuates the corresponding secondary channel frame; specifically, the amplitude of the secondary channel frame is reduced by controlling its power. When a main channel frame is a noise frame, the filter bandwidth of the time-varying filter gradually decreases while the masking unit gradually amplifies the corresponding secondary channel frame; specifically, the amplitude of the secondary channel frame is increased by controlling its power. y(n-1) and y(n-2) are both initialized to 0. When a main channel frame is a speech frame, the power of the corresponding secondary channel frame is gradually reduced from the power of the previous secondary channel frame to one percent of the power of the speech frame after time-varying band-pass filtering; when a main channel frame is a noise frame, the power of the corresponding secondary channel frame is gradually increased from the power of the previous secondary channel frame to the power of the current secondary channel frame. Specifically, when the state value VAD is 1, the secondary channel gain curr_gain decreases quickly within the frame, by a step of 0.01 per sampling point, until it reaches a minimum value given by a formula in which P_xx denotes the power of the main channel frame after time-varying band-pass filtering. When the state value VAD is 0, curr_gain increases slowly within the frame, by a step of 0.000005 per sampling point, until it reaches the preset maximum gain of 1. In practice the frame length may be 20 ms and the sampling frequency 48 kHz, so that one frame contains 960 sampling points; the per-point step is the step applied at each of these 960 sampling points. The output of the secondary channel signal at the n-th sampling point of a frame is outputm(n) = curr_gain*in_y(n). Finally, the invention superimposes, sample by sample, the main channel frame after time-varying band-pass filtering and the corresponding secondary channel frame after masking processing to obtain one frame of the output signal; specifically, at the n-th sampling point the output signal is output(n) = y(n) + curr_gain*in_y(n).
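The per-sample gain ramp and the final superposition can be sketched as follows. This is a minimal Python illustration using the step sizes 0.01 and 0.000005 and the maximum gain 1 stated above. The minimum gain is passed in as a parameter min_gain because its formula is not legible in this text; the function names mask_frame and mix are hypothetical.

```python
def mask_frame(in_y, vad, curr_gain, min_gain, down_step=0.01, up_step=0.000005):
    """Apply the ramped gain to one secondary channel frame.

    Returns (masked_samples, curr_gain) so the gain carries over to the next frame.
    """
    out = []
    for s in in_y:
        if vad == 1:
            curr_gain = max(min_gain, curr_gain - down_step)  # fast attenuation
        else:
            curr_gain = min(1.0, curr_gain + up_step)         # slow recovery
        out.append(curr_gain * s)                             # outputm(n)
    return out, curr_gain


def mix(y, masked):
    """output(n) = y(n) + curr_gain * in_y(n), with the gain already applied."""
    return [a + b for a, b in zip(y, masked)]
```

With these step sizes, a gain starting at 1 reaches a floor of 0.1 in about 90 sampling points, whereas the recovery adds only 0.0048 over a whole 960-point frame.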
The range of the bandwidth control variable r is [0.005, 0.995]. Specifically, when the state value VAD is 1, r changes from 0.995 to 0.005 in steps of step1; when the state value VAD is 0, r changes from 0.005 to 0.995 in steps of step2, where step1 and step2 are given by formulas in terms of the sampling frequency f_s. Here y(n) denotes the output at the n-th sampling point of the main channel frame after time-varying band-pass filtering; y(n-1) and y(n-2) denote the outputs at the last and second-to-last sampling points of the previous main channel frame after time-varying band-pass filtering (when y(n) is the first sampling point, y(n-1) and y(n-2) must be initialized to 0); j denotes the imaginary unit, with j^2 = -1; and ω denotes the angular frequency. Specifically, when f_s is 48 kHz, step1 = 0.0006875 and step2 = 0.000006875. Because step1 is much larger than step2, r reaches 0.005 within a short time when it changes by step1, whereas it needs a long time to rise slowly to 0.995 when it changes by step2.
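The update of the bandwidth control variable can be sketched as follows. This is a minimal Python illustration, assuming that r is updated once per sampling point and clamped to [0.005, 0.995]; the default steps are the values given above for f_s = 48 kHz, and the function names update_r and samples_to_traverse are hypothetical. The time-varying filter itself is not implemented here, because its difference equation is not legible in this text.

```python
R_MIN, R_MAX = 0.005, 0.995


def update_r(r, vad, step1=0.0006875, step2=0.000006875):
    """Move r one step toward R_MIN for a speech frame, toward R_MAX otherwise."""
    if vad == 1:
        return max(R_MIN, r - step1)   # fast change: bandwidth widens
    return min(R_MAX, r + step2)       # slow change: bandwidth narrows


def samples_to_traverse(step):
    """Number of per-sample steps needed to cross the whole range of r."""
    return (R_MAX - R_MIN) / step
```

Under the per-sample assumption, crossing the full range takes about 1440 sampling points with step1 (30 ms at 48 kHz) and about 144000 sampling points with step2 (3 s at 48 kHz).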
The effect of the invention was tested with a background music signal as the secondary channel signal. Fig. 4 shows the signal received on the main channel of the sound masking device (a speech signal), and Fig. 5 shows the signal received on the secondary channel of the sound masking device (a background music signal). In the test, the power demarcation threshold between speech and noise is obtained by the maximum between-class variance method (Otsu's method) and combined with P_n to judge whether a main channel frame is a speech frame (state value VAD is 1) or a noise frame (state value VAD is 0). Fig. 6 shows the curve of the state value VAD: where speech is present the state value VAD is 1, and when the speech disappears the state value VAD returns to 0. Fig. 7 shows how the gain of the background music signal changes with the presence or absence of the speech signal: while the speech signal is present the background music gain decreases quickly, and after the speech signal ends the background music gain increases slowly. Fig. 8 shows the background music signal after processing by the sound masking device, and Fig. 9 shows the output signal of the sound masking device. The test results show that while the speech signal is present, the speech is prominent and the background music can still be made out; when there is no speech signal, the background music increases smoothly and gradually, and the listening experience is good.
The invention avoids the technical problem that, when the speech signal is weak and the noise signal is strong, a fixed decision level makes the speech/noise decision inaccurate. It also ensures that when a frame on the main channel is speech, the frame passes through with as little loss as possible, whereas when the frame is noise it is filtered out as far as possible, so that the influence of noise is minimized. At the same time, when a frame on the main channel is judged to be speech, the power of the corresponding secondary channel frame decreases quickly with the speech power but remains recognizable; when the speech is judged to have ended, the power of the corresponding secondary channel frame increases slowly until the end of the frame. Further, the invention obtains the power demarcation threshold between speech and noise with Otsu's algorithm, which improves the accuracy of the threshold. Applying time-varying band-pass filtering to speech frames and noise frames with the time-varying filter lets as much speech energy as possible pass while reducing the influence of noise on the signal. During masking processing, the secondary channel signal follows the level of the main channel signal frame by frame.
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the invention.

Claims (10)

1. A sound masking device, characterized in that the sound masking device comprises:
A main channel that receives a speech signal;
A secondary channel that receives a background signal;
A framing unit, configured to frame the signal on the main channel to obtain the main channel frames, and to frame the signal on the secondary channel to obtain the secondary channel frames;
A band-pass filter connected to the framing unit; the band-pass filter is configured to band-pass filter each main channel frame;
A speech detection unit connected to the band-pass filter; the speech detection unit is configured to judge whether each band-pass-filtered main channel frame is a speech frame or a noise frame;
A time-varying filter connected to the speech detection unit; the time-varying filter is configured to apply time-varying band-pass filtering to each main channel frame that has been judged to be a speech frame or a noise frame;
A masking unit connected to the framing unit; while the time-varying filter applies time-varying band-pass filtering to each main channel frame, the masking unit applies masking processing to each secondary channel frame; when a main channel frame is a speech frame, the filter bandwidth of the time-varying filter gradually increases while the masking unit gradually attenuates the corresponding secondary channel frame; when a main channel frame is a noise frame, the filter bandwidth of the time-varying filter gradually decreases while the masking unit gradually amplifies the corresponding secondary channel frame;
A summation unit connected to the time-varying filter and the masking unit; the summation unit is configured to superimpose, sample by sample, the main channel frame after time-varying band-pass filtering and the corresponding secondary channel frame after masking processing to obtain one frame of the output signal.
2. The sound masking device according to claim 1, characterized in that
if the sampled value of a main channel frame at the n-th sampling point before band-pass filtering is in_x(n), the output u(n) at the n-th sampling point after the band-pass filter is given by a formula in which IIR_A(k) is the k-th filter coefficient and IIR_B(r) is the r-th filter coefficient; u(n-k) denotes the band-pass-filtered output at the (n-k)-th sampling point of the current main channel frame (when n ≥ k) or at the k-th sampling point from the end of the previous main channel frame (when n < k); and in_x(n-r) denotes the sampled value before band-pass filtering at the (n-r)-th sampling point of the current main channel frame (when n ≥ r) or at the r-th sampling point from the end of the previous main channel frame (when n < r).
3. The sound masking device according to claim 1, characterized in that the speech detection unit comprises:
A power calculation module, configured to calculate the power of each main channel frame after the band-pass filter;
A threshold acquisition module connected to the power calculation module; the threshold acquisition module is configured to obtain the speech power set, the noise power set, and the power demarcation threshold between speech and noise from the powers, calculated by the power calculation module, of the main channel frames after the band-pass filter;
A P_n acquisition module connected to the threshold acquisition module; the P_n acquisition module is configured to sum the probabilities with which the noise powers in the noise power set occur in the distribution of main channel frame powers and, when the resulting sum exceeds the first preset value, to take the maximum of those noise powers as P_n; when P_n satisfies the preset condition and the mean speech power of the speech power set and the mean noise power of the noise power set satisfy the preset relation, the P_n acquisition module updates P_n, and otherwise keeps P_n unchanged;
A decision module connected to the threshold acquisition module and the P_n acquisition module; the decision module is configured to determine whether a main channel frame is a speech frame or a noise frame by comparing the calculated power of the main channel frame after the band-pass filter with P_n.
4. The sound masking device according to claim 3, characterized in that the power calculation module calculates the power of a band-pass-filtered main channel frame by the formula $P_x = -10 \log_{10}(P_{real}/32768^2)$, where P_x denotes the power of the band-pass-filtered main channel frame, P_real is computed from the samples u(n) of the frame, LEN denotes the number of sampling points in one main channel frame, and u(n) denotes the sampled value at the n-th sampling point of the frame after band-pass filtering.
5. The sound masking device according to claim 3, characterized in that the threshold acquisition module obtains the speech power set, the noise power set, and the power demarcation threshold between speech and noise as follows:
Count how often each power value, as calculated by the power calculation module, occurs among the main channel frames of the preset number of frames;
Build the probability distribution histogram of the power values of the main channel frames of the preset number of frames, and obtain the speech power distribution region and the noise power distribution region from this histogram;
Calculate the power demarcation threshold between speech and noise by the maximum between-class variance method.
6. The sound masking device according to claim 5, characterized in that the speech power set in the speech power distribution region is [0, P_th - 1] and the noise power set in the noise power distribution region is [P_th, P_m], where P_th - 1 represents the minimum speech power in the speech power distribution region, P_th represents the maximum noise power in the noise power distribution region, and P_m represents the minimum noise power in the noise power distribution region; the threshold acquisition module calculates the power demarcation threshold P_thbest between speech and noise by the maximum between-class variance method as follows:
The mean speech power u_v of the speech power set and the mean noise power u_n of the noise power set are each calculated by a formula, where u_v denotes the mean of the speech powers in the speech power set, u_n denotes the mean of the noise powers in the noise power set, p_i denotes the distribution probability of speech power i, p_k denotes the distribution probability of noise power k, i takes each value in [0, P_th - 1] in turn, and k takes each value in [P_th, P_m] in turn;
The mean power of the main channel frames of the preset number of frames is obtained by the formula $u_T = w_0 u_v + w_1 u_n$, where u_T denotes that mean power;
The between-class variance between speech power and noise power is obtained by the formula $\sigma^2 = w_0 (u_v - u_T)^2 + w_1 (u_n - u_T)^2$, and the power value that maximizes σ^2 is taken as the power demarcation threshold P_thbest between speech and noise.
7. The sound masking device according to claim 2, characterized in that the decision module first sets the initial value of the state value VAD to 0; for a given main channel frame, when the state value VAD is 0 and the power value P_x of the frame is larger than P_n by the second preset value, the decision module judges the frame to be a speech frame and updates the state value VAD to 1; when the state value VAD is 1 and the power value P_x of the frame is smaller than P_n by the third preset value, the decision module judges the frame to be a noise frame and updates the state value VAD to 0.
8. The sound masking device according to claim 1, characterized in that
the center frequency of the time-varying filter is 0.8 kHz, and its amplitude-frequency response and difference equation are given by formulas in which r denotes the bandwidth control variable, whose range is [0.005, 0.995]; specifically, when a main channel frame is a speech frame, r changes from 0.995 to 0.005 in steps of step1, and when a main channel frame is a noise frame, r changes from 0.005 to 0.995 in steps of step2; f_s denotes the sampling frequency; y(n) denotes the output at the n-th sampling point of the main channel frame after time-varying band-pass filtering; y(n-1) and y(n-2) denote the outputs at the last and second-to-last sampling points of the previous main channel frame after time-varying band-pass filtering; j denotes the imaginary unit, with j^2 = -1; and ω denotes the angular frequency;
When a main channel frame is a speech frame, the masking unit gradually reduces the power of the corresponding secondary channel frame from the power of the previous secondary channel frame to one percent of the power of the speech frame after time-varying band-pass filtering; when a main channel frame is a noise frame, the masking unit gradually increases the power of the corresponding secondary channel frame from the power of the previous secondary channel frame to the power of the current secondary channel frame; the masking unit obtains the power P_y of the secondary channel frame corresponding to a main channel frame by a formula in which LEN denotes the number of sampling points in one main channel frame and in_y(n) denotes the sampled value of that secondary channel frame at the n-th sampling point.
9. A sound masking method, characterized in that the sound masking method receives a voice signal through a main channel and receives a background signal through a sub-channel, the sound masking method comprising the following steps:
Step 1: Divide the signal on the main channel into frames to obtain the main channel signal frames, divide the signal on the sub-channel into frames to obtain the sub-channel signal frames, and perform step 2;
Step 2: Bandpass filter each main channel signal frame, and perform step 3;
Step 3: Determine whether each bandpass-filtered main channel signal frame is a speech frame or a noise frame, and perform step 4;
Step 4: Apply time-varying bandpass filtering to each main channel signal frame while applying masking processing to each sub-channel signal frame; when a main channel signal frame is a speech frame, the filter bandwidth gradually increases while the sub-channel signal frame corresponding to that speech frame gradually decreases, and when a main channel signal frame is a noise frame, the filter bandwidth gradually decreases while the sub-channel signal frame corresponding to that noise frame gradually increases; perform step 5;
Step 5: Superimpose each main channel signal frame after time-varying bandpass filtering on the masking-processed sub-channel signal frame corresponding to that main channel signal frame to obtain one frame of the output signal.
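The framing, bandwidth adjustment, and superposition of steps 1, 4, and 5 can be sketched as follows. This is only an illustration under stated assumptions: frames are taken to be non-overlapping, the bandwidth limits (300 Hz and 3400 Hz) and the per-frame step of 100 Hz are invented for the example, the filter itself is omitted, and all function names are hypothetical.

```python
def split_frames(signal, frame_len):
    """Step 1: cut a signal into consecutive non-overlapping frames.
    Samples left over after the last full frame are dropped (an assumption)."""
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def next_bandwidth(bandwidth, is_speech, step=100.0, lo=300.0, hi=3400.0):
    """Step 4: widen the passband on a speech frame, narrow it on a noise frame.
    The step size and the lo/hi limits in Hz are assumed values."""
    bandwidth += step if is_speech else -step
    return min(hi, max(lo, bandwidth))


def superimpose(main_frame, masker_frame):
    """Step 5: sample-wise sum of the filtered main channel frame and the
    masking-processed sub-channel frame."""
    if len(main_frame) != len(masker_frame):
        raise ValueError("frames must have equal length")
    return [a + b for a, b in zip(main_frame, masker_frame)]
```

In a complete implementation, `next_bandwidth` would be evaluated once per frame and its result used to redesign the bandpass filter for that frame before `superimpose` is called.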
10. The sound masking method according to claim 9, characterized in that step 3 specifically comprises the following steps:
Step 31: Calculate the power of each bandpass-filtered main channel signal frame, and perform step 32;
Step 32: From the calculated powers of the bandpass-filtered main channel signal frames, obtain a speech power set, a noise power set, and a power demarcation threshold between speech and noise, and perform step 33;
Step 33: Sum the probability distribution of the noise powers of the noise power set among the main channel signal frame powers, and when the resulting sum exceeds a first preset value, define the maximum of those noise powers as Pn; perform step 34;
Step 34: When Pn satisfies a preset condition and the mean of the speech powers in the speech power set and the mean of the noise powers in the noise power set satisfy a preset relationship, update Pn; otherwise keep Pn unchanged; perform step 35;
Step 35: Determine whether the main channel signal frame is a speech frame or a noise frame by comparing the calculated power of the bandpass-filtered main channel signal frame with Pn.
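Steps 33 and 35 can be sketched in Python as below. The sketch makes assumptions beyond the claim text: the noise powers are summed in ascending order, each noise power is given equal empirical probability, a frame whose power exceeds Pn is treated as speech, and the first preset value of 0.9 is invented for the example. The update rule of step 34 is left out because the claim does not state its conditions. All function names are hypothetical.

```python
def noise_threshold(noise_powers, first_preset=0.9):
    """Step 33 (assumed reading): accumulate the empirical probability of the
    noise powers in ascending order and return the largest power reached when
    the cumulative sum first exceeds first_preset."""
    if not noise_powers:
        raise ValueError("noise power set is empty")
    ordered = sorted(noise_powers)
    total = len(ordered)
    for count, power in enumerate(ordered, start=1):
        if count / total > first_preset:
            return power
    return ordered[-1]  # reached only when first_preset >= 1


def classify_frame(frame_power, pn):
    """Step 35 (assumed direction): power above Pn means speech, otherwise noise."""
    return "speech" if frame_power > pn else "noise"
```

For example, with ten noise powers 1 through 10 and a preset of 0.85, the cumulative share first exceeds the preset at the ninth value, so Pn is 9 and a frame with power 12 is classified as speech.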
CN201611029084.7A 2016-11-20 2016-11-20 Sound masking device and sound masking method Active CN106782587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611029084.7A CN106782587B (en) 2016-11-20 2016-11-20 Sound masking device and sound masking method

Publications (2)

Publication Number Publication Date
CN106782587A (en) 2017-05-31
CN106782587B (en) 2020-04-28

Family

ID=58971532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611029084.7A Active CN106782587B (en) 2016-11-20 2016-11-20 Sound masking device and sound masking method

Country Status (1)

Country Link
CN (1) CN106782587B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102099851A (en) * 2008-07-18 2011-06-15 皇家飞利浦电子股份有限公司 Method and system for preventing overhearing of private conversations in public places
JP2012194415A (en) * 2011-03-17 2012-10-11 Yamaha Corp Masker sound measurement instrument and sound masking device
CN104156578A (en) * 2014-07-31 2014-11-19 南京工程学院 Recording time identification method
CN204495996U (en) * 2011-10-26 2015-07-22 菲力尔系统公司 Broadband sonar receiver
CN205004029U (en) * 2015-09-29 2016-01-27 苏州一天声学科技有限公司 Array sound masker

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
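AJAY NATARAJAN: "An Auditory-Masking-Threshold-Based Noise Suppression Algorithm GMMSE-AMT For Listeners", 《EURASIP》 *
JUGURTA MONTALVAO: "IMPROVED SIGNAL REPRESENTATION FOR EVENT DETECTION IN REMOTE HEALTH CARE THROUGH PSYCHOANALYICAL MASKING", 《RESEARCHGATE》 *
吕勇: "Speech enhancement algorithm based on minimum statistics and the masking properties of the human ear", 《电声技术》 *
张勇: "Speech enhancement combined with auditory masking in non-stationary noise environments", 《计算机工程与设计》 *
曹亮: "Multi-band spectral subtraction speech enhancement algorithm based on the auditory masking effect", 《计算机工程与设计》 *
曹龙涛: "Binary masking speech enhancement algorithm based on noise estimation", 《计算机工程与应用》 *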

Also Published As

Publication number Publication date
CN106782587B (en) 2020-04-28

Similar Documents

Publication Publication Date Title
CN107393542B (en) Bird species identification method based on two-channel neural network
CN102282867B (en) Hearing aid and a method of detecting and attenuating transients
DE60108401T2 (en) SYSTEM FOR INCREASING LANGUAGE QUALITY
KR101676211B1 (en) Reduction of transient sounds in hearing implants
CN106340303B (en) Voice de-noising method based on temporal frequency domain
CN106504765B (en) Automatic gain control method and device for audio signals
CN102984634A (en) Digital hearing-aid unequal-width sub-band automatic gain control method
CN106448712B (en) Automatic gain control method and device for audio signals
CN110265065B (en) Method for constructing voice endpoint detection model and voice endpoint detection system
DE112011105908B4 (en) Method and device for adaptive control of the sound effect
CN110248300B (en) Howling suppression method based on autonomous learning and sound amplification system
CN112242147A (en) Voice gain control method and computer storage medium
CN103812462B (en) Volume control method and device
CN107274913A (en) Sound identification method and device
CN106409309A (en) Tone quality enhancement method and microphone
CN110012331A (en) Infrared-triggered far-field dual-microphone far-field speech recognition method
CN102610232B (en) Method for adjusting self-adaptive audio sensing loudness
CN106448690A (en) Automatic gain control method and apparatus of audio signals
CN106782592A (en) System and method for eliminating echo and howling in network sound transmission
US7646912B2 (en) Method and device for ascertaining feature vectors from a signal
CN106782587A (en) Sound masking device and sound masking method
CN1355916A (en) Signal noise reduction by time-domain spectral subtraction
CN110010150A (en) Auditory perception speech feature parameter extraction method based on multiresolution
CN112564655A (en) Audio signal gain control method, device, equipment and storage medium
US20060087380A1 (en) Method for limiting the dynamic range of audio signals, and circuit arrangement for this purpose

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant