CN106782587B - Sound masking device and sound masking method - Google Patents

Sound masking device and sound masking method

Info

Publication number: CN106782587B
Application number: CN201611029084.7A
Authority: CN (China)
Prior art keywords: frame, power, main channel, channel signal, noise
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN106782587A
Inventors: 陈喆, 殷福亮, 崔行悦
Current assignee: Dalian University of Technology (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Dalian University of Technology
Application filed by Dalian University of Technology; priority to CN201611029084.7A; published as CN106782587A, granted as CN106782587B


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208: Noise filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Time-Division Multiplex Systems (AREA)
  • Noise Elimination (AREA)

Abstract

The invention discloses a sound masker and a sound masking method. The sound masker comprises: a main channel for receiving a voice signal; a sub-channel for receiving a background signal; a framing unit; a band-pass filter connected with the framing unit; a voice detection unit connected with the band-pass filter; a time-varying filter connected with the voice detection unit; a masking unit connected with the framing unit; and a summing unit connected with the time-varying filter and the masking unit. The invention achieves a good masking effect with high flexibility.

Description

Sound masking device and sound masking method
Technical Field
The present invention relates to a sound masker and a sound masking method.
Background
When a film is dubbed, a game is broadcast live, or a radio program is recorded, a background signal such as music often occurs simultaneously with the dialogue (commentary). To make the dialogue (commentary) stand out, the background signal level must be reduced so that the dialogue can mask the background signal; when the dialogue (commentary) ends, the background signal is restored to its original level. A sound masker is a dynamic processor that realizes this function: it automatically masks the main channel signal by means of the additional signal, and once the additional signal disappears, the amplitude of the main channel signal is automatically restored.
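The "duck while dialogue is present, restore afterwards" behaviour described above can be sketched as a per-frame background gain with a fast attack and a slow release. This is a minimal illustration only, not the patent's algorithm; the constants `attack`, `release`, and `floor` are arbitrary assumptions:

```python
# Minimal ducking sketch (illustrative; constants are assumptions, not the
# patent's values). The background gain falls quickly while speech is
# active and recovers slowly afterwards.

def duck_gains(speech_active, attack=0.5, release=0.05, floor=0.1):
    """Per-frame background gain: fast attack, slow release."""
    g, gains = 1.0, []
    for active in speech_active:
        target = floor if active else 1.0
        step = attack if active else release
        g += step * (target - g)   # first-order smoothing toward target
        gains.append(round(g, 4))
    return gains

gains = duck_gains([False, True, True, True, False, False])
# gain drops toward `floor` during the speech frames, then creeps back toward 1.0
```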
The Chinese utility model with publication No. CN205004028U provides a single-channel sound masking device for directly preventing others from overhearing or effectively recording a conversation. Its idea is as follows: a language selection switch selects a speech interference signal stored in a memory, and a sound acquisition device converts the speech into a speech signal. Signal processing is split into two units: the first unit segments the speech signal and recombines the segments at random to generate a first masking signal, while the second unit randomly recombines the speech interference signal to generate a second masking signal. Finally, an audio player renders the first masking signal as a first masking sound output and the second masking signal as a second masking sound output.
Disclosure of Invention
To address these problems, the invention provides a sound masker and a sound masking method. They avoid the inaccuracy that arises when a fixed decision level is used to distinguish voice from noise while the voice signal is weak and the noise signal is strong. When a frame of the main channel is a voice frame, the frame passes through without loss; when it is a noise frame, it is filtered as far as possible, so that the influence of noise is reduced to a minimum. Meanwhile, when a frame of the main channel is judged to be a voice frame, the power of the corresponding sub-channel frame is reduced rapidly along with the voice power yet remains identifiable; when the voice is judged to have ended, the power of the corresponding sub-channel frame is raised slowly until the frame ends.
The technical solution of the invention is as follows:
a sound masker, comprising:
a main channel for receiving a voice signal;
a sub-channel to receive a background signal;
the framing unit is used for framing the signals on the main channel to obtain main channel signals of each frame and framing the signals on the auxiliary channel to obtain auxiliary channel signals of each frame;
the band-pass filter is connected with the framing unit; the band-pass filter is used for performing band-pass filtering on the main channel signals of each frame;
the voice detection unit is connected with the band-pass filter; the voice detection unit is used for judging whether each frame of main channel signal after band-pass filtering is a voice frame or a noise frame;
a time-varying filter coupled to the voice detection unit; the time-varying filter is used for carrying out time-varying band-pass filtering processing on each frame of main channel signals after the voice frame or noise frame judgment is carried out;
the masking unit is connected with the framing unit; the masking unit is used for masking each frame of sub-channel signals while the time-varying filter carries out time-varying band-pass filtering processing on each frame of main channel signals; when a frame of main channel signal is a voice frame, the filtering bandwidth of the time-varying filter is gradually increased, and the masking unit adjusts a certain frame of auxiliary channel signal corresponding to the voice frame to be gradually reduced;
a summing unit connected to the time-varying filter and the masking unit; the summation unit is used for correspondingly superposing a frame of main channel signals subjected to time-varying band-pass filtering and a frame of auxiliary channel signals corresponding to the frame of main channel signals subjected to masking processing to obtain a frame of output signals;
Further, if in_x(n) is the sampling value at the n-th sampling point of a frame of the main channel signal before band-pass filtering, the output of the band-pass filter at the n-th sampling point is

u(n) = Σ_{r=0..R} IIR_B(r)·in_x(n-r) - Σ_{k=1..K} IIR_A(k)·u(n-k),

where IIR_A(k) is the k-th feedback filter coefficient, IIR_B(r) is the r-th feedforward filter coefficient, and R and K are the filter orders; u(n-k) represents the band-pass-filtered output at the (n-k)-th sampling point of this frame (when n ≥ k) or at the k-th-from-last sampling point of the previous frame (when n < k); in_x(n-r) represents the sampling value before band-pass filtering at the (n-r)-th sampling point of this frame (when n ≥ r) or at the r-th-from-last sampling point of the previous frame (when n < r);
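A direct rendering of this difference equation, with previous-frame samples supplying the filter history, might look like the following sketch. The coefficient values in the example are arbitrary placeholders; real IIR_A/IIR_B coefficients would come from an actual band-pass filter design:

```python
# Direct implementation of the per-frame IIR difference equation
#   u(n) = sum_r IIR_B[r]*in_x[n-r] - sum_k IIR_A[k]*u[n-k],
# where samples with negative index are taken from the end of the
# previous frame, as the text describes.

def iir_filter_frame(in_x, prev_in, prev_out, iir_b, iir_a):
    u = []
    for n in range(len(in_x)):
        acc = 0.0
        for r, b in enumerate(iir_b):
            x = in_x[n - r] if n >= r else prev_in[n - r]   # previous frame tail
            acc += b * x
        for k, a in enumerate(iir_a, start=1):
            y = u[n - k] if n >= k else prev_out[n - k]     # previous frame tail
            acc -= a * y
        u.append(acc)
    return u

# Example: two-tap smoother u(n) = 0.5*x(n) + 0.5*x(n-1), no feedback
out = iir_filter_frame([1.0, 0.0, 0.0], prev_in=[0.0], prev_out=[0.0],
                       iir_b=[0.5, 0.5], iir_a=[])
```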
further, the voice detection unit includes:
the power calculation module is used for calculating the power of each frame of main channel signal after passing through the band-pass filter;
the threshold acquisition module is connected with the power calculation module; the threshold acquisition module is used for obtaining a voice power set, a noise power set and the power demarcation threshold between voice and noise from the band-pass-filtered per-frame main channel signal powers calculated by the power calculation module;
p connected with threshold acquisition modulenA learning module; the P isnThe learning module is used for summing probability distribution of each noise power of the noise power set in the power of each frame of main channel signal, and when the obtained sum value is larger than a first preset value, the maximum value in each noise power is defined as Pn(ii) a When P is presentnWhen a preset condition is met and a preset relation is met between each voice power mean value in the voice power set and each noise power mean value in the noise power set, the P valuenLearning module update PnOtherwise, P is maintainednThe change is not changed;
and a threshold acquisition module and PnA judging module connected with the learning module; the judging module is used for calculating the power and P of a frame of main channel signal after passing through the band-pass filternDetermining the frame main channel signal as a voice frame or a noise frame according to the comparison result;
Further, the power calculation module uses the formula P_x = -10 × log10(P_real / 32768^2) to calculate the power of a band-pass-filtered frame of the main channel signal, where P_x represents the power of the frame after band-pass filtering,

P_real = (1 / LEN) Σ_{n=1..LEN} u(n)^2,

LEN represents the number of sampling points in one frame of the main channel signal, and u(n) represents the sampling value at the n-th sampling point after band-pass filtering of the frame;
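On the power scale above, a full-scale 16-bit frame maps to 0 and quieter frames map to larger values. A small sketch of the computation (P_real as the mean squared sample is reconstructed from the surrounding definitions):

```python
import math

# Frame power on the patent's scale: P_x = -10*log10(P_real / 32768^2),
# with P_real the mean squared sample value over the frame (16-bit full
# scale 32768). Louder frames give P_x closer to 0.

def frame_power(u):
    LEN = len(u)
    p_real = sum(s * s for s in u) / LEN
    return -10.0 * math.log10(p_real / 32768.0 ** 2)

loud = frame_power([32768.0] * 160)    # full-scale frame
quiet = frame_power([327.68] * 160)    # 40 dB quieter frame
```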
Further, the specific process by which the threshold acquisition module obtains the voice power set, the noise power set, and the power demarcation threshold between voice and noise comprises:
counting the frequency of occurrence of the power value of each frame of the main channel signal over a preset number of frames, as calculated by the power calculation module;
building a probability distribution histogram of the per-frame main channel signal power values over the preset number of frames, and obtaining a voice power distribution region and a noise power distribution region from the histogram;
calculating the power demarcation threshold between voice and noise by the maximum between-class variance method;
Further, the voice power set over the voice power distribution region is [0, P_th - 1] and the noise power set over the noise power distribution region is [P_th, P_m], where P_th - 1 represents the minimum voice power in the voice power distribution region, P_th represents the maximum noise power in the noise power distribution region, and P_m represents the minimum noise power in the noise power distribution region. The specific process by which the threshold acquisition module calculates the power demarcation threshold P_thbest between voice and noise using the maximum between-class variance method is as follows:
calculate the mean of the voice powers in the voice power set by the formula

u_v = (1 / w_0) Σ_{i=0..P_th-1} i·p_i

and the mean of the noise powers in the noise power set by the formula

u_n = (1 / w_1) Σ_{k=P_th..P_m} k·p_k,

where

w_0 = Σ_{i=0..P_th-1} p_i,  w_1 = Σ_{k=P_th..P_m} p_k,

u_v represents the mean of the voice powers in the voice power set, u_n represents the mean of the noise powers in the noise power set, p_i represents the distribution probability of voice power i, p_k represents the distribution probability of noise power k, i takes each value in [0, P_th - 1] in turn, and k takes each value in [P_th, P_m] in turn;
using the formula u_T = w_0·u_v + w_1·u_n, obtain the average power of the per-frame main channel signals over the preset number of frames, where u_T represents that average power;
by the formula σ^2 = w_0·(u_v - u_T)^2 + w_1·(u_n - u_T)^2, obtain the between-class variance of the voice and noise powers, and take the power value that maximizes σ^2 as the power demarcation threshold P_thbest between voice and noise;
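The threshold search above is Otsu's method applied to the power histogram: for each candidate P_th the between-class variance σ^2 is evaluated and the maximizer is kept. A self-contained sketch (the histogram values are illustrative, not from the patent):

```python
# Otsu / maximum between-class variance threshold over a power histogram:
# for each candidate th, sigma^2 = w0*(u_v - u_T)^2 + w1*(u_n - u_T)^2
# is evaluated and the maximizing th is returned.

def otsu_threshold(prob):  # prob[i] = probability of power value i
    m = len(prob)
    best_th, best_sigma = 1, -1.0
    for th in range(1, m):                 # voice: [0, th-1], noise: [th, m-1]
        w0 = sum(prob[:th])
        w1 = sum(prob[th:])
        if w0 == 0 or w1 == 0:
            continue
        u_v = sum(i * prob[i] for i in range(th)) / w0
        u_n = sum(i * prob[i] for i in range(th, m)) / w1
        u_t = w0 * u_v + w1 * u_n
        sigma2 = w0 * (u_v - u_t) ** 2 + w1 * (u_n - u_t) ** 2
        if sigma2 > best_sigma:
            best_sigma, best_th = sigma2, th
    return best_th

# Two well-separated power clusters around bins 1 and 7:
hist = [0.05, 0.25, 0.15, 0.0, 0.0, 0.0, 0.1, 0.25, 0.2]
th = otsu_threshold(hist)
```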
Further, the judging module first sets the initial value of the state value VAD to 0. For a frame of the main channel signal: when VAD is 0 and the power value P_x of the frame is larger than P_n by more than a second preset value, the judging module judges the frame to be a voice frame and updates VAD to 1; when VAD is 1 and P_x is smaller than P_n by more than a third preset value, the judging module judges the frame to be a noise frame and updates VAD to 0;
Further, the center frequency of the time-varying filter is 0.8 kHz, and its amplitude-frequency response is

|H(e^{jω})| = |(1 - r) / (1 - 2r·cos(2π·800/f_s)·e^{-jω} + r^2·e^{-j2ω})|.

The difference equation is

y(n) = (1 - r)·u(n) + 2r·cos(2π·800/f_s)·y(n-1) - r^2·y(n-2),

where r represents a bandwidth control variable with variation range [0.005, 0.995]. Specifically, when a frame of the main channel signal is a voice frame, r is changed from 0.995 to 0.005 in steps of step1; when a frame of the main channel signal is a noise frame, r is changed from 0.005 to 0.995 in steps of step2. Here f_s represents the sampling frequency; u(n) represents the band-pass-filtered input sample at the n-th sampling point; y(n) represents the output at the n-th sampling point of the frame after time-varying band-pass filtering; y(n-1) and y(n-2) represent the outputs at the two preceding sampling points (taken from the end of the previous frame when those points fall before the current frame); j represents the imaginary unit, j^2 = -1; and ω represents the angular frequency;
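The difference equation is that of a standard two-pole resonator (reconstructed here from the text's description, since the original formula image is missing): small r pushes the poles toward the origin, so the passband widens and the voice frame passes almost unchanged, while r near 1 narrows the passband around 0.8 kHz. A sketch, with f_s = 8000 Hz assumed:

```python
import math

# Second-order resonator realizing
#   y(n) = (1-r)*u(n) + 2r*cos(2*pi*800/fs)*y(n-1) - r^2*y(n-2).
# Small r -> wide passband (voice passes), r near 1 -> narrow band
# around 0.8 kHz (noise suppressed).

def resonator_frame(u, r, fs=8000.0, y1=0.0, y2=0.0):
    w0 = 2.0 * math.pi * 800.0 / fs
    out = []
    for x in u:
        y = (1.0 - r) * x + 2.0 * r * math.cos(w0) * y1 - r * r * y2
        y2, y1 = y1, y
        out.append(y)
    return out

# With r near 0 the filter is almost transparent:
wide = resonator_frame([1.0, -1.0, 1.0], r=0.005)
```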
When a frame of the main channel signal is a voice frame, the masking unit gradually reduces the power of the corresponding frame of the sub-channel signal from the power of the previous sub-channel frame down to one percent of the power of the time-varying band-pass-filtered voice frame; when the frame of the main channel signal is a noise frame, the masking unit gradually raises the power of the corresponding sub-channel frame from the power of the previous sub-channel frame up to the power of the current sub-channel frame. The masking unit obtains the power P_y of the frame of the sub-channel signal corresponding to a frame of the main channel signal by

P_y = (1 / LEN) Σ_{n=1..LEN} in_y(n)^2,

where LEN represents the number of sampling points in one frame of the main channel signal and in_y(n) represents the sampling value, at the n-th sampling point, of the sub-channel frame corresponding to the main channel frame.
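The side-channel frame power and the gradual power ramp can be sketched as follows. P_y follows the mean-square formula above (itself reconstructed from the surrounding definitions); the linear per-sample ramp and its length are assumptions for illustration, since the patent only specifies the start and end levels:

```python
# Side-channel frame power and the masking unit's gain ramp, sketched.
# During a voice frame the side channel is faded toward 1% of the
# filtered voice-frame power; on a noise frame it is faded back up.

def side_power(in_y):
    """Mean-square power P_y of one sub-channel frame."""
    LEN = len(in_y)
    return sum(s * s for s in in_y) / LEN

def ramp(start, target, n):
    """Linear per-sample interpolation from start to target over n samples."""
    if n == 1:
        return [target]
    return [start + (target - start) * i / (n - 1) for i in range(n)]

g = ramp(1.0, 0.01, 5)   # fade the background down across one short frame
```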
A sound masking method for receiving a speech signal through a main channel and a background signal through a sub-channel, the sound masking method comprising the steps of:
Step 1: frame the signal on the main channel to obtain per-frame main channel signals, frame the signal on the sub-channel to obtain per-frame sub-channel signals, and go to step 2;
Step 2: band-pass filter each frame of the main channel signal, and go to step 3;
Step 3: judge whether each band-pass-filtered frame of the main channel signal is a voice frame or a noise frame, and go to step 4;
Step 4: perform time-varying band-pass filtering on each frame of the main channel signal while masking each frame of the sub-channel signal; when a frame of the main channel signal is a voice frame, the filtering bandwidth is gradually increased and the corresponding sub-channel frame is gradually reduced; when a frame of the main channel signal is a noise frame, the filtering bandwidth is gradually decreased and the corresponding sub-channel frame is gradually increased; go to step 5;
Step 5: superpose each time-varying band-pass-filtered frame of the main channel signal on the corresponding masked frame of the sub-channel signal to obtain a frame of the output signal;
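Steps 1 through 5 can be tied together in a compact end-to-end skeleton. Everything here is a placeholder stand-in (frame length, the stub power-threshold detector, and the gain rule are not the patent's values); it only shows how framing, detection, masking, and summation compose:

```python
# End-to-end skeleton of steps 1-5 (all constants and the stub VAD are
# placeholders, not the patent's values).

def mask(main, side, frame_len=4):
    out, gain = [], 1.0
    for i in range(0, min(len(main), len(side)), frame_len):     # step 1: framing
        m = main[i:i + frame_len]
        s = side[i:i + frame_len]
        power = sum(x * x for x in m) / len(m)                   # steps 2-3: stub detector
        is_speech = power > 0.1
        gain = 0.1 if is_speech else min(1.0, gain + 0.2)        # step 4: duck / recover
        out.extend(a + gain * b for a, b in zip(m, s))           # step 5: summation
    return out

y = mask([0.0] * 4 + [1.0] * 4, [0.5] * 8)   # silence frame, then a "voice" frame
```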
further, the step 3 specifically includes the following steps:
Step 31: calculate the power of each band-pass-filtered frame of the main channel signal, and go to step 32;
Step 32: obtain the voice power set, the noise power set, and the power demarcation threshold between voice and noise from the calculated band-pass-filtered per-frame powers, and go to step 33;
Step 33: sum the probability distribution of each noise power of the noise power set over the per-frame main channel signal powers; when the resulting sum is larger than a first preset value, define the maximum of the noise powers as P_n, and go to step 34;
Step 34: when P_n meets a preset condition and the voice power mean of the voice power set and the noise power mean of the noise power set satisfy a preset relation, update P_n, otherwise keep P_n unchanged; go to step 35;
Step 35: compare the calculated band-pass-filtered power of a frame of the main channel signal with P_n to determine whether the frame is a voice frame or a noise frame.
By adopting the above technical solution, the sound masker and sound masking method provided by the invention avoid the inaccuracy that arises when a fixed decision level is used to distinguish voice from noise while the voice signal is weak and the noise signal is strong. When a frame of the main channel is a voice frame, it passes through without loss; when it is a noise frame, it is filtered as far as possible, so that the influence of noise is reduced to a minimum. Meanwhile, when a frame of the main channel is judged to be a voice frame, the power of the corresponding sub-channel frame is reduced rapidly along with the voice power yet remains identifiable; when the voice is judged to have ended, the power of the corresponding sub-channel frame is raised slowly until the frame ends. The invention achieves a good masking effect with high flexibility.
Drawings
FIG. 1 is a block diagram of the structure of a sound masker according to the present invention;
FIG. 2 is a block diagram of the structure of the voice detection unit according to the present invention;
FIG. 3 is a schematic diagram of the power of the main channel signal of each frame when the preset frame number value is 200;
FIG. 4 is a speech signal as an input signal to the sound masker according to the present invention;
FIG. 5 is a background signal as an input signal to the sound masker according to the present invention;
FIG. 6 is a diagram illustrating the variation curve of the state value VAD of the present invention;
FIG. 7 is a schematic diagram of the gain variation of the background signal under the influence of the presence or absence of the voice signal;
FIG. 8 is a background signal after being processed by the sound masker of the present invention;
FIG. 9 is an output signal of the sound masker of the present invention;
FIG. 10 is a histogram of probability distribution of power values of main channel signals of each frame of preset number of frames according to the present invention;
fig. 11 is a flow chart of the sound masking method of the present invention.
Detailed Description
A sound masker, as shown in fig. 1 and 2, comprising: a main channel for receiving a voice signal; a sub-channel to receive a background signal; the framing unit is used for framing the signals on the main channel to obtain main channel signals of each frame and framing the signals on the auxiliary channel to obtain auxiliary channel signals of each frame; the band-pass filter is connected with the framing unit; the band-pass filter is used for performing band-pass filtering on the main channel signals of each frame; the voice detection unit is connected with the band-pass filter; the voice detection unit is used for judging whether each frame of main channel signal after band-pass filtering is a voice frame or a noise frame; a time-varying filter coupled to the voice detection unit; the time-varying filter is used for carrying out time-varying band-pass filtering processing on each frame of main channel signals after the voice frame or noise frame judgment is carried out; the masking unit is connected with the framing unit; the masking unit is used for masking each frame of sub-channel signals while the time-varying filter carries out time-varying band-pass filtering processing on each frame of main channel signals; when a frame of main channel signal is a voice frame, the filtering bandwidth of the time-varying filter is gradually increased, and the masking unit adjusts a certain frame of auxiliary channel signal corresponding to the voice frame to be gradually reduced; a summing unit connected to the time-varying filter and the masking unit; the summation unit is used for correspondingly superposing a frame of main channel signals subjected to time-varying band-pass filtering and a frame of auxiliary channel signals corresponding to the frame of main channel signals subjected to masking processing to obtain a frame of output signals; further, if the main channel signal of a frame before the band-pass filtering is corresponding to the nth sampling pointThe 
sampling value is in _ x (n), and the sampling value corresponding to the nth sampling point passes through the output result of the band-pass filter
Figure BDA0001157094810000071
IIR _ A (k) is a kth filter coefficient, IIR _ B (r) is a kth filter coefficient, u (n-k) represents an output result corresponding to an nth-k sampling point after band-pass filtering of the main channel signal of the frame (when n is larger than or equal to k) or an output result corresponding to a last kth sampling point after band-pass filtering of the main channel signal of the last frame (when n is larger than or equal to k)<k, in _ x (n-r) represents a sampling value corresponding to the nth-r sampling point before band-pass filtering of the main channel signal of the frame (when n is larger than or equal to r) or a sampling value corresponding to the last r sampling point before band-pass filtering of the main channel signal of the last frame (when n is larger than or equal to r)<r is (r); further, the voice detection unit includes: the power calculation module is used for calculating the power of each frame of main channel signal after passing through the band-pass filter; the threshold value acquisition module is connected with the power calculation module; the threshold acquisition module is used for acquiring a voice power set, a noise power set and a power boundary threshold of voice and noise according to the power of each frame of main channel signals which are calculated by the power calculation module and pass through the band-pass filter; p connected with threshold acquisition modulenA learning module; the P isnThe learning module is used for summing probability distribution of each noise power of the noise power set in the power of each frame of main channel signal, and when the obtained sum value is larger than a first preset value, the maximum value in each noise power is defined as Pn(ii) a When P is presentnWhen a preset condition is met and a preset relation is met between each voice power mean value in the voice power set and each noise power mean value in the noise power set, the P valuenLearning module update PnOtherwise, P is 
maintainednThe change is not changed; and a threshold acquisition module and PnA judging module connected with the learning module; the judging module is used for calculating the power and P of a frame of main channel signal after passing through the band-pass filternDetermining the frame main channel signal as a voice frame or a noise frame according to the comparison result; further, the power calculation module is formulated byPx=-10×log10(Preal/327682) Calculating the power of the band-pass filtered frame of main channel signal, wherein: pxRepresents the power of the main channel signal of the frame after band-pass filtering,
Figure BDA0001157094810000081
LEN represents the number of sampling points corresponding to one frame length of the main channel signal, and u (n) represents the sampling value corresponding to the nth sampling point after the band-pass filtering is carried out on the frame of the main channel signal; further, the specific process of the threshold obtaining module obtaining the voice power set, the noise power set, and the power dividing threshold of voice and noise includes: counting the frequency of the occurrence of the power value of each frame of main channel signals of a preset frame number calculated by a power calculation module; establishing a probability distribution histogram of signal power values of each frame main channel with preset frame numbers, and obtaining a voice power distribution area and a noise power distribution area according to the probability distribution histogram; calculating a power demarcation threshold value of voice and noise by adopting a maximum inter-class variance method; further, the voice power set in the voice power distribution region is set to [0, P ]th-1]The noise power in the noise power distribution region is set as [ P ]th,Pm]Wherein P isth-1 represents the minimum value of speech power, P, in the region of speech power distributionthRepresenting the maximum value of noise power, P, in the noise power distribution regionmRepresenting a noise power minimum value in a noise power distribution region; the threshold acquisition module calculates the power demarcation threshold P of voice and noise by adopting the maximum inter-class variance methodthbestThe specific process comprises the following steps: by the formula
Figure BDA0001157094810000082
Calculating the average value of each voice power in the voice power set and passing through a formula
Figure BDA0001157094810000083
Calculating the average value of each noise power in the noise power set, wherein,
Figure BDA0001157094810000084
uvrepresenting the mean value, u, of the individual speech powers in the set of speech powersnRepresenting the mean value, p, of each noise power in a set of noise powersiRepresenting the distribution probability, p, of the speech power ikThe distribution probability of the voice power k is represented, and i is sequentially valued as [0, P ]th-1]Each value of k is sequentially taken as [ P ]th,Pm]Each value of (a); using the formula uT=w0uv+w1unObtaining the average power of the main channel signal of each frame with preset frame number, wherein uTRepresenting the average power of each frame main channel signal of a preset frame number; by the formula σ2=w0(uv-uT)2+w1(un-uT)2Obtaining the inter-class variance between the speech power and the noise power and obtaining the inter-class variance sigma2Taking the maximum power value which is the power demarcation threshold value P of voice and noisethbest(ii) a Further, the judging module firstly sets the initial value of the state value VAD to be 0; for a frame of main channel signal, when the state value VAD is 0, the power value P of the frame of main channel signalxRatio PnIf the value is larger than the second preset value, the judging module judges that the frame main channel signal is a voice frame and updates the state value VAD to 1, and when the state value VAD is 1, the power value P of the frame main channel signal is at the same timexRatio PnIf the value is smaller than a third preset value, the judging module judges that the frame main channel signal is a noise frame and updates the state value VAD to 0; further, the time-varying filter has a center frequency of 0.8kHz and an amplitude-frequency response
Figure BDA0001157094810000091
The difference equation is
Figure BDA0001157094810000092
Wherein r represents a bandwidth control variable and a variation range of [0.005,0.995 ]]Specifically, the bandwidth control variable r is changed from 0.995 to 0.005 in steps of step1 when a frame of the main channel signal is a speech frame, and the bandwidth control variable r is changed from 0.995 to 0.005 when a frame of the main channel signal is a noise frameThe variable r is varied from 0.005 to 0.995 in steps of step2, where: f. ofsRepresenting sampling frequency, y (n) representing an output result corresponding to an nth sampling point of the frame of main channel signal after time-varying band-pass filtering, y (n-1) representing an output result corresponding to a 1 st sampling point of the last frame of main channel signal after time-varying band-pass filtering, y (n-2) representing an output result corresponding to a 2 nd sampling point of the last frame of main channel signal after time-varying band-pass filtering, j representing an imaginary unit, j2-1, ω represents the circumferential angular frequency; when a frame of main channel signal is a voice frame, the masking unit adjusts the power of a certain frame of auxiliary channel signal corresponding to the voice frame to be gradually reduced from the power of the previous frame of auxiliary channel signal to one percent of the power of the voice frame after time-varying band-pass filtering, and when the frame of main channel signal is a noise frame, the masking unit adjusts the power of the certain frame of auxiliary channel signal corresponding to the noise frame to be gradually increased from the power of the previous frame of auxiliary channel signal to the power of the current frame of auxiliary channel signal; the masking unit is formed by
Py = −10×log10((1/LEN)·Σ_{n=0}^{LEN−1} in_y²(n) / 32768²)
to obtain the power Py of the frame of sub-channel signal corresponding to a frame of main channel signal, where LEN represents the number of sampling points included in one frame of main channel signal and in_y(n) represents the sampling value of the corresponding frame of sub-channel signal at the nth sampling point.
A sound masking method as shown in fig. 11, which receives a speech signal through a main channel and receives a background signal through a sub-channel, the sound masking method comprising the steps of:
step 1: performing framing processing on the signals on the main channel to obtain main channel signals of each frame, performing framing processing on the signals on the auxiliary channels to obtain auxiliary channel signals of each frame, and executing the step 2;
step 2: performing band-pass filtering on the main channel signals of each frame, and executing the step 3;
step 3: judging whether each frame of main channel signal after band-pass filtering is a voice frame or a noise frame, and executing step 4;
step 4: carrying out time-varying band-pass filtering processing on each frame of main channel signal while masking each frame of sub-channel signal; when a frame of main channel signal is a voice frame, the filtering bandwidth is gradually increased and the corresponding frame of sub-channel signal is gradually decreased; when a frame of main channel signal is a noise frame, the filtering bandwidth is gradually decreased and the corresponding frame of sub-channel signal is gradually increased; then executing step 5;
step 5: correspondingly superposing a frame of main channel signal after time-varying band-pass filtering with the corresponding masked frame of sub-channel signal to obtain a frame of output signal;
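As a concrete illustration of step 1, the framing of a channel signal can be sketched as follows. This is an illustrative sketch, not the patented implementation; the frame length of 960 samples (20 ms at a 48 kHz sampling frequency) is taken from the embodiment described later in the text, and `split_into_frames` is a hypothetical helper name.

```python
import numpy as np

FRAME_LEN = 960  # 20 ms at 48 kHz, as suggested later in the description


def split_into_frames(signal, frame_len=FRAME_LEN):
    """Step 1 (framing): split a 1-D signal into consecutive frames,
    zero-padding the tail so every frame has exactly frame_len samples."""
    signal = np.asarray(signal, dtype=float)
    n_frames = int(np.ceil(len(signal) / frame_len))
    padded = np.zeros(n_frames * frame_len)
    padded[:len(signal)] = signal
    return padded.reshape(n_frames, frame_len)
```

The same routine would be applied to both the main channel and the sub-channel so that main and sub frames stay aligned sample for sample.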
further, the step 3 specifically includes the following steps:
step 31: calculating the power of each frame of main channel signal after band-pass filtering, and executing step 32;
step 32: obtaining a voice power set, a noise power set and a power boundary threshold of voice and noise according to the calculated power of the main channel signal of each frame after band-pass filtering, and executing step 33;
step 33: summing the distribution probabilities of the noise powers of the noise power set within the powers of the frames of main channel signals; when the obtained sum is greater than a first preset value, defining the maximum value among the noise powers as Pn, and executing step 34;
step 34: when Pn meets a preset condition and a preset relation is met between the mean of the speech powers in the speech power set and the mean of the noise powers in the noise power set, updating Pn; otherwise keeping Pn unchanged; then executing step 35;
step 35: comparing the calculated power of the band-pass filtered frame of main channel signal with Pn to determine whether the frame of main channel signal is a speech frame or a noise frame.
Further, if the sampling value of a frame of main channel signal at the nth sampling point before band-pass filtering is in_x(n), the output corresponding to the nth sampling point after band-pass filtering is
u(n) = Σ_{r=0}^{6} IIR_B(r)·in_x(n−r) − Σ_{k=1}^{6} IIR_A(k)·u(n−k),
wherein IIR_A(k) is the kth feedback filter coefficient and IIR_B(r) is the rth feedforward filter coefficient; u(n−k) represents the output corresponding to the (n−k)th sampling point of the frame of main channel signal after band-pass filtering (when n ≥ k) or the output corresponding to the kth-from-last sampling point of the previous frame of main channel signal after band-pass filtering (when n < k); in_x(n−r) represents the sampling value corresponding to the (n−r)th sampling point of the frame of main channel signal before band-pass filtering (when n ≥ r) or the sampling value corresponding to the rth-from-last sampling point of the previous frame of main channel signal before band-pass filtering (when n < r);
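The recursion above can be realized directly in code. The sketch below is illustrative (hypothetical helper name `bandpass_frame`); the coefficients are the reference design values quoted later in the description, and the frame-boundary history is initialized to zero as the text prescribes.

```python
import numpy as np

# Reference 7-coefficient IIR band-pass design quoted later in the text
IIR_B = np.array([1.012205768830948e-3, -4.911110647449132e-4,
                  7.807184279553245e-5, -1.198332967805191e-3,
                  7.807184279553245e-5, -4.911110647449132e-4,
                  1.012205768830948e-3])
IIR_A = np.array([1.0, -5.380875974547, 12.23643655587558,
                  -15.06088779864848, 10.58294743567488,
                  -4.024466821830663, 0.6468480167658538])


def bandpass_frame(in_x, x_hist=None, u_hist=None):
    """Direct-form realization of
    u(n) = sum_r IIR_B[r]*in_x(n-r) - sum_{k>=1} IIR_A[k]*u(n-k),
    carrying the last 6 inputs/outputs across frames (zeros initially)."""
    x_hist = np.zeros(6) if x_hist is None else x_hist  # in_x(n-r) for n < r
    u_hist = np.zeros(6) if u_hist is None else u_hist  # u(n-k) for n < k
    x_ext = np.concatenate([x_hist, np.asarray(in_x, dtype=float)])
    u = np.zeros(len(x_ext))
    u[:6] = u_hist
    for n in range(6, len(x_ext)):
        # feedforward: IIR_B[r]*x(n-r), feedback: IIR_A[k]*u(n-k), k=1..6
        u[n] = IIR_B @ x_ext[n - 6:n + 1][::-1] - IIR_A[1:] @ u[n - 6:n][::-1]
    return u[6:], x_ext[-6:], u[-6:]
```

The returned histories are fed back in for the next frame, matching the "previous frame" cases (n < k, n < r) in the formula.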
further, step 31 specifically includes: calculating the power of the band-pass filtered frame of main channel signal by the formula Px = −10×log10(Preal/32768²), wherein Px represents the power of the frame of main channel signal after band-pass filtering,

Preal = (1/LEN)·Σ_{n=0}^{LEN−1} u²(n),

LEN represents the number of sampling points corresponding to one frame length of the main channel signal, and u(n) represents the sampling value corresponding to the nth sampling point after band-pass filtering of the frame of main channel signal;
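The power formula can be checked numerically; note that with the minus sign in front of the logarithm, Px grows as the frame gets quieter relative to full scale (32768²). The sketch below uses a hypothetical helper name `frame_power_db`.

```python
import numpy as np


def frame_power_db(u):
    """Relative frame power as defined in the text:
    Px = -10*log10(Preal/32768**2), with Preal the mean squared sample."""
    u = np.asarray(u, dtype=float)
    p_real = np.mean(u ** 2)  # Preal = (1/LEN) * sum of u(n)^2
    return -10.0 * np.log10(p_real / 32768.0 ** 2)
```

A full-scale constant frame gives Px = 0, and a frame at one hundredth of full-scale power gives Px = 20, illustrating the sign convention.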
further, the step 32 specifically includes the following steps:
step 321: counting the frequency of occurrence of the power values of the main channel signals over a preset number of frames;
step 322: establishing a probability distribution histogram of the power values of the main channel signals over the preset number of frames, and obtaining a voice power distribution region and a noise power distribution region from the histogram;
step 323: calculating the power demarcation threshold Pthbest between voice and noise by the maximum between-class variance method.
Further, the speech power set of the speech power distribution region is [0, Pth−1], and the noise power set of the noise power distribution region is [Pth, Pm], where Pth−1 represents the speech power maximum in the speech power distribution region, Pth represents the noise power minimum in the noise power distribution region, and Pm represents the noise power maximum in the noise power distribution region; the specific process of calculating the power demarcation threshold Pthbest between speech and noise by the maximum between-class variance method is as follows:
the mean of the speech powers in the speech power set is calculated by the formula

uv = (Σ_{i=0}^{Pth−1} i·pi) / w0,

and the mean of the noise powers in the noise power set is calculated by the formula

un = (Σ_{k=Pth}^{Pm} k·pk) / w1,

wherein

w0 = Σ_{i=0}^{Pth−1} pi,

w1 = Σ_{k=Pth}^{Pm} pk,
uv represents the mean of the speech powers in the speech power set, un represents the mean of the noise powers in the noise power set, pi represents the distribution probability of speech power value i, pk represents the distribution probability of noise power value k, i takes each value in [0, Pth−1] in turn, and k takes each value in [Pth, Pm] in turn;
the average power of the main channel signals of the preset number of frames is obtained by the formula uT = w0·uv + w1·un, where uT represents that average power;
the between-class variance of the speech power and the noise power is obtained by the formula σ² = w0·(uv − uT)² + w1·(un − uT)², and the power value that maximizes the between-class variance σ² is taken as the power demarcation threshold Pthbest between speech and noise.
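The computation just described is an exhaustive Otsu search over candidate thresholds. A minimal sketch (hypothetical helper name `otsu_threshold`, operating on a probability histogram indexed by integer power value) is:

```python
import numpy as np


def otsu_threshold(p):
    """Maximum between-class variance (Otsu) search over a power
    probability histogram p, where p[i] is the probability of integer
    power value i. Candidate classes are [0, t-1] and [t, len(p)-1];
    returns the t maximizing sigma^2 = w0*(uv-uT)^2 + w1*(un-uT)^2."""
    values = np.arange(len(p))
    best_t, best_var = 1, -1.0
    for t in range(1, len(p)):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue  # one class empty; variance undefined
        uv = (values[:t] * p[:t]).sum() / w0   # speech-class mean
        un = (values[t:] * p[t:]).sum() / w1   # noise-class mean
        uT = w0 * uv + w1 * un                 # overall mean
        var = w0 * (uv - uT) ** 2 + w1 * (un - uT) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```

On a clearly bimodal histogram the maximizing threshold lands at the edge of the gap between the two modes.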
Further, step 35 specifically includes: first setting the initial value of the state value VAD to 0; for a frame of main channel signal, when the state value VAD is 0 and the power value Px of the frame exceeds Pn by more than a second preset value, judging that the frame of main channel signal is a speech frame and updating the state value VAD to 1; when the state value VAD is 1 and the power value Px of the frame exceeds Pn by less than a third preset value, judging that the frame of main channel signal is a noise frame and updating the state value VAD to 0;
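The hysteresis logic of step 35 can be sketched as follows. This is an illustrative helper (`vad_sequence` is a hypothetical name); the 12 dB and 6 dB defaults are the second and third preset values given later in the text, and Px is assumed to be on the same relative scale as Pn.

```python
def vad_sequence(frame_powers, Pn, up_db=12.0, down_db=6.0):
    """Hysteresis VAD: start with VAD=0; switch to 1 when Px exceeds Pn
    by more than up_db (second preset value), and back to 0 when Px
    exceeds Pn by less than down_db (third preset value)."""
    vad, out = 0, []
    for Px in frame_powers:
        if vad == 0 and Px - Pn > up_db:
            vad = 1          # speech frame detected
        elif vad == 1 and Px - Pn < down_db:
            vad = 0          # back to noise
        out.append(vad)
    return out
```

The gap between the two thresholds keeps the decision from chattering when Px hovers near a single fixed level.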
further, a time-varying filter is adopted to carry out time-varying band-pass filtering processing on each frame of main channel signals; the center frequency of the time-varying filter is 0.8kHz, and the amplitude-frequency response is
|H(e^jω)| = (1 − r) / |1 − 2r·cos(2πf0/fs)·e^(−jω) + r²·e^(−j2ω)|, with f0 = 0.8 kHz,
The difference equation is
y(n) = (1 − r)·x(n) + 2r·cos(2πf0/fs)·y(n−1) − r²·y(n−2), where x(n) is the nth sampling point of the frame of main channel signal input to the filter and f0 = 0.8 kHz,
wherein r represents a bandwidth control variable with variation range [0.005, 0.995]; specifically, when a frame of main channel signal is a speech frame, the bandwidth control variable r is changed from 0.995 to 0.005 in steps of step1, and when a frame of main channel signal is a noise frame, the bandwidth control variable r is changed from 0.005 to 0.995 in steps of step2, wherein: fs represents the sampling frequency; y(n) represents the output corresponding to the nth sampling point of the frame of main channel signal after time-varying band-pass filtering; y(n−1) and y(n−2) represent the two preceding filtered outputs (at the start of a frame, the last and second-to-last outputs of the previous frame after time-varying band-pass filtering); j represents the imaginary unit, j² = −1; and ω represents the angular frequency;
when a frame of main channel signal is a speech frame, the power of the corresponding frame of sub-channel signal is adjusted to decrease gradually from the power of the previous frame of sub-channel signal to one percent of the power of the speech frame after time-varying band-pass filtering; when the frame of main channel signal is a noise frame, the power of the corresponding frame of sub-channel signal is adjusted to increase gradually from the power of the previous frame of sub-channel signal to the power of the current frame of sub-channel signal; by the formula
Py = −10×log10((1/LEN)·Σ_{n=0}^{LEN−1} in_y²(n) / 32768²)
the power Py of the frame of sub-channel signal corresponding to a frame of main channel signal is obtained, where LEN represents the number of sampling points included in one frame of main channel signal and in_y(n) represents the sampling value of the corresponding frame of sub-channel signal at the nth sampling point.
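The time-varying filter's difference equation appears only as an image in the original document; a standard two-pole resonator consistent with the stated behavior (0.8 kHz center frequency, r near 1 narrowing the band around the center, r near 0 passing the input almost unchanged) would look like the sketch below. The (1 − r) gain normalization and the helper name `timevarying_resonator` are assumptions, not quotations from the patent.

```python
import numpy as np

FS = 48000.0   # sampling frequency from the embodiment
F0 = 800.0     # 0.8 kHz center frequency from the text


def timevarying_resonator(x, r_values, y1=0.0, y2=0.0):
    """Assumed two-pole resonator sketch:
    y(n) = (1-r)*x(n) + 2*r*cos(2*pi*F0/FS)*y(n-1) - r**2*y(n-2).
    r_values supplies the bandwidth control variable r per sample;
    y1, y2 carry the last two outputs of the previous frame (0 initially)."""
    c = 2.0 * np.cos(2.0 * np.pi * F0 / FS)
    y = np.zeros(len(x))
    for n, (xn, r) in enumerate(zip(x, r_values)):
        yp1 = y[n - 1] if n >= 1 else y1
        yp2 = y[n - 2] if n >= 2 else y2
        y[n] = (1.0 - r) * xn + r * c * yp1 - r * r * yp2
    return y
```

With r = 0 the filter is transparent (wide band), and with r = 0.995 the poles sit just inside the unit circle at ±2π·F0/FS, giving a narrow band around 0.8 kHz, which matches the speech-frame/noise-frame sweep described above.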
The specific process of counting the frequency of occurrence of the power values of the main channel signals over the preset number of frames is as follows: first, the power value of each frame of main channel signal within the preset number of frames is rounded to an integer; then, starting from the power value of the first frame of main channel signal, each frame's power value is used in turn as an accumulator subscript until all power values within the preset number of frames have been traversed; whenever a power value recurs, the accumulator corresponding to that value's subscript is incremented by one. The probability of power value i of the main channel signals is then pi = num(i)/M, where num(i) is the accumulator count for power value i and M is the preset frame number. The preset frame number may take 200.
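The accumulator procedure just described can be sketched directly. This is an illustrative helper (hypothetical name `power_histogram`); power values are assumed non-negative after rounding, and the accumulator is sized from the data rather than fixed in advance.

```python
import numpy as np


def power_histogram(frame_powers, n_frames=200):
    """Round each frame power to an integer, use it as an accumulator
    subscript, and divide by the preset frame number to obtain the
    probability p_i of each power value i."""
    idx = np.rint(np.asarray(frame_powers, dtype=float)).astype(int)
    acc = np.zeros(idx.max() + 1)
    for i in idx:
        acc[i] += 1  # increment the accumulator at subscript i
    return acc / n_frames
```

The resulting array is exactly the probability histogram that the Otsu threshold search operates on.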
The invention performs band-pass filtering on each frame of main channel signal, thereby reducing the influence of noise on the voice detection result. The bandwidth of the band-pass filter is 0.3 kHz to 3.4 kHz. The reference design indexes of the band-pass filter are: center frequency 1.55 kHz, pass-band frequency range 0.3 kHz to 3.4 kHz, lower stop-band cut-off frequency 1 Hz, upper stop-band cut-off frequency 4 kHz, and stop-band attenuation greater than 60 dB. The reference design result of the band-pass filter is a 7-order IIR filter with filter coefficients IIR_B(7) = {1.012205768830948×10⁻³, −4.911110647449132×10⁻⁴, 7.807184279553245×10⁻⁵, −1.198332967805191×10⁻³, 7.807184279553245×10⁻⁵, −4.911110647449132×10⁻⁴, 1.012205768830948×10⁻³} and IIR_A(7) = {1.0, −5.380875974547, 12.23643655587558, −15.06088779864848, 10.58294743567488, −4.024466821830663, 0.6468480167658538}. When n < k, the output u(n−k) corresponding to the kth-from-last sampling point of the previous frame of main channel signal after band-pass filtering is assigned an initial value of 0; likewise, when n < r, the sampling value in_x(n−r) corresponding to the rth-from-last sampling point of the previous frame of main channel signal before band-pass filtering is assigned an initial value of 0.
The first preset value is 0.1; the second preset value is 12 dB; the third preset value is 6 dB; the preset condition is that Pn lies in the range [Pthbest, Pm]; the preset relation is |uv − un| > 10; LEN may take the value 960. The invention calculates the power of a band-pass filtered frame of main channel signal by the formula Px = −10×log10(Preal/32768²), where

Preal = (1/LEN)·Σ_{n=0}^{LEN−1} u²(n)

is an intermediate variable; specifically, Preal represents the actual power of the band-pass filtered frame of main channel signal, and, correspondingly, Px represents the relative power value of the frame after band-pass filtering. The invention defines the actual power value corresponding to 0 dBov as 32768², so Px is a relative power value in dBov. The invention may also use envelope detection or peak detection to calculate the envelopes of the main channel signal and the sub-channel signal, and then calculate the power from each frame's envelope.
FIG. 3 shows the power of each frame of main channel signal when the preset frame number is 200; as shown in FIG. 3, Px and the powers of the previously obtained 199 frames of voice signal together form the powers of the preset number of frames, i.e. 200 frames, of main channel signals.
FIG. 10 is the probability distribution histogram of the power values of the main channel signals over the preset number of frames. As shown in FIG. 10, after this histogram is established, the voice power distribution region and the noise power distribution region can be obtained from it, together with the demarcation region between the two regions; the power demarcation threshold Pthbest between voice and noise is calculated within this demarcation region by the maximum between-class variance method.
When a frame of main channel signal is a voice frame, the filtering bandwidth of the time-varying filter is gradually increased, and the masking unit adjusts the corresponding frame of sub-channel signal to be gradually reduced; specifically, the amplitude of the frame of sub-channel signal is reduced by controlling its power. The initial values of y(n−1) and y(n−2) are both 0.
When a frame of main channel signal is a speech frame, the power of the corresponding frame of sub-channel signal is gradually reduced from the power of the previous frame of sub-channel signal to one percent of the power of the speech frame after time-varying band-pass filtering; when the frame of main channel signal is a noise frame, the power of the corresponding frame of sub-channel signal is gradually increased from the power of the previous frame of sub-channel signal to the power of the current frame of sub-channel signal. Specifically, when the state value VAD is 1, the gain curr_gain of the sub-channel signal is rapidly reduced within the frame in steps of 0.01 per sampling point until the minimum value is reached, i.e. the gain at which the sub-channel power equals one percent of Pxx, where Pxx represents the power of the frame of main channel signal after time-varying band-pass filtering; when the state value VAD is 0, the gain curr_gain of the sub-channel signal is slowly increased within the frame in steps of 0.000005 per sampling point until the preset maximum gain 1 is reached. In practical application, the frame length may be 20 ms with a sampling frequency of 48 kHz, so that one frame corresponds to 960 sampling points and the per-point step is applied over 960 sampling points. The output of the sub-channel signal at the nth sampling point within a frame is outputm(n) = curr_gain·in_y(n). Finally, the invention superposes a frame of main channel signal after time-varying band-pass filtering with the corresponding masked frame of sub-channel signal to obtain a frame of output signal; specifically, at the nth sampling point, output(n) = y(n) + curr_gain·in_y(n).
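The per-sample gain ramp can be sketched as follows. This is an illustrative helper (hypothetical name `mask_frame`); `min_gain` stands in for the power-matched floor, which the text derives from Pxx and Py, and the step sizes 0.01 and 0.000005 per point are the values given in the description.

```python
import numpy as np


def mask_frame(in_y, vad, curr_gain, min_gain=0.0):
    """Apply the per-sample gain ramp to one frame of sub-channel signal:
    VAD=1 -> fall quickly (0.01 per point) toward min_gain;
    VAD=0 -> rise slowly (0.000005 per point) toward the maximum gain 1."""
    out = np.empty(len(in_y))
    for n, s in enumerate(in_y):
        if vad == 1:
            curr_gain = max(curr_gain - 0.01, min_gain)
        else:
            curr_gain = min(curr_gain + 0.000005, 1.0)
        out[n] = curr_gain * s  # outputm(n) = curr_gain * in_y(n)
    return out, curr_gain
```

The asymmetry of the two steps is what produces the fast duck / slow recovery behavior shown in FIG. 7: from gain 1 the fall takes only 100 samples, while a full rise from 0 spans many frames.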
The variation range of the bandwidth control variable r of the invention is [0.005, 0.995]; specifically, when the state value VAD is 1, the bandwidth control variable r is changed from 0.995 to 0.005 in steps of step1, and when the state value VAD is 0, the bandwidth control variable r is changed from 0.005 to 0.995 in steps of step2, the difference equation being

y(n) = (1 − r)·x(n) + 2r·cos(2πf0/fs)·y(n−1) − r²·y(n−2), with f0 = 0.8 kHz,

wherein fs represents the sampling frequency; y(n) represents the output corresponding to the nth sampling point of the frame of main channel signal after time-varying band-pass filtering; y(n−1) and y(n−2) represent the two preceding filtered outputs (when y(n) is the 1st sampling point of a frame, y(n−1) and y(n−2) are given initial values of 0); j represents the imaginary unit, j² = −1; and ω represents the angular frequency. In particular, when fs is 48 kHz, step1 = 0.0006875 and step2 = 0.000006875; step1 is much larger than step2, so when r changes in steps of step1 it reaches 0.005 quickly in a short time, and when r changes in steps of step2 it reaches 0.995 slowly over a longer time.
The specific effect of the invention after implementation was tested with the sub-channel signal taken as a background music signal. FIG. 4 shows the received signal (speech signal) on the main channel of the sound masker of the invention, and FIG. 5 shows the received signal (background music signal) on the sub-channel. In testing the sound masker, the maximum between-class variance method (Otsu) is used to find the power demarcation threshold between voice and noise, which is combined with Pn to determine whether a frame of main channel signal is a speech frame (state value VAD is 1) or a noise frame (state value VAD is 0); FIG. 6 shows the resulting state value VAD. With reference to FIG. 6, it can be seen that the state value VAD is 1 where there is voice and 0 when the voice disappears. FIG. 7 shows the gain of the background music signal changing under the action of the voice signal; as shown in FIG. 7, when the voice signal is present the gain of the background music signal decreases rapidly, and when the voice signal ends the gain of the background music signal increases slowly. FIG. 8 shows the background music signal after processing by the sound masker of the invention, and FIG. 9 shows the output signal of the sound masker. The test results show that when a voice signal is present, the voice signal is highlighted while the background music remains identifiable; when no voice signal is present, the background music increases smoothly and gradually, giving a good auditory experience.
The invention avoids the technical problem that a fixed decision level gives inaccurate voice/noise decisions when the voice signal is weak and the noise signal is strong; when a frame of the main channel is a voice signal, the voice signal passes through without loss, while when the frame is noise, the noise is filtered out as far as possible, reducing its influence to a minimum. Meanwhile, the sound masking device and the sound masking method ensure that when a frame of the main channel is judged to be a voice signal, the power of the sub-channel signal in the corresponding frame decreases rapidly with the voice power yet remains identifiable, and when the voice is judged to have ended, the power of the sub-channel signal increases slowly until the frame ends. Furthermore, the invention uses the Otsu algorithm to obtain the power demarcation threshold between voice and noise, improving the accuracy of that threshold; the time-varying filter applies time-varying band-pass filtering to voice frames and noise frames, ensuring that the voice energy passes as much as possible while reducing the influence of noise on the signal; and during masking, the sub-channel signal is scaled frame by frame following the main channel signal.
The above description covers only preferred embodiments of the present invention, but the scope of the present invention is not limited thereto; any change or substitution that can readily occur to a person skilled in the art within the technical scope disclosed by the present invention, based on the technical solutions and inventive concept of the present invention, shall fall within the scope of the present invention.

Claims (10)

1. A sound masker, characterized in that the sound masker comprises:
a main channel for receiving a voice signal;
a sub-channel to receive a background signal;
the framing unit is used for framing the signals on the main channel to obtain main channel signals of each frame and framing the signals on the auxiliary channel to obtain auxiliary channel signals of each frame;
the band-pass filter is connected with the framing unit; the band-pass filter is used for performing band-pass filtering on the main channel signals of each frame;
the voice detection unit is connected with the band-pass filter; the voice detection unit is used for judging whether each frame of main channel signal after band-pass filtering is a voice frame or a noise frame;
a time-varying filter coupled to the voice detection unit; the time-varying filter is used for carrying out time-varying band-pass filtering processing on each frame of main channel signals after the voice frame or noise frame judgment is carried out;
the masking unit is connected with the framing unit; the masking unit is used for masking each frame of sub-channel signals while the time-varying filter carries out time-varying band-pass filtering processing on each frame of main channel signals; when a frame of main channel signal is a voice frame, the filtering bandwidth of the time-varying filter is gradually increased, and the masking unit adjusts a certain frame of auxiliary channel signal corresponding to the voice frame to be gradually reduced;
a summing unit connected to the time-varying filter and the masking unit; the summation unit is used for correspondingly superposing a frame of main channel signals subjected to time-varying band-pass filtering processing and a frame of auxiliary channel signals corresponding to the frame of main channel signals subjected to masking processing to obtain a frame of output signals.
2. The sound masker of claim 1, wherein if the sampling value corresponding to the nth sampling point of a frame of main channel signal before band-pass filtering is in_x(n), the output corresponding to the nth sampling point after passing through the band-pass filter is
u(n) = Σ_{r=0}^{6} IIR_B(r)·in_x(n−r) − Σ_{k=1}^{6} IIR_A(k)·u(n−k),
wherein IIR_A(k) is the kth feedback filter coefficient and IIR_B(r) is the rth feedforward filter coefficient; when n ≥ k, u(n−k) represents the output corresponding to the (n−k)th sampling point of the frame of main channel signal after band-pass filtering, and when n < k, u(n−k) represents the output corresponding to the kth-from-last sampling point of the previous frame of main channel signal after band-pass filtering; when n ≥ r, in_x(n−r) represents the sampling value corresponding to the (n−r)th sampling point of the frame of main channel signal before band-pass filtering, and when n < r, in_x(n−r) represents the sampling value corresponding to the rth-from-last sampling point of the previous frame of main channel signal before band-pass filtering.
3. The sound masker of claim 1, characterized in that the speech detection unit comprises:
the power calculation module is used for calculating the power of each frame of main channel signal after passing through the band-pass filter;
the threshold value acquisition module is connected with the power calculation module; the threshold acquisition module is used for acquiring a voice power set, a noise power set and a power boundary threshold of voice and noise according to the power of each frame of main channel signals which are calculated by the power calculation module and pass through the band-pass filter;
p connected with threshold acquisition modulenA learning module; the P isnThe learning module is used for summing probability distribution of each noise power of the noise power set in the power of each frame of main channel signal, and when the obtained sum value is greater than a first preset value, the learning module sums the probability distribution of each noise power in each frame of main channel signalThe maximum value in the noise power is defined as Pn(ii) a When P is presentnWhen a preset condition is met and a preset relation is met between each voice power mean value in the voice power set and each noise power mean value in the noise power set, the P valuenLearning module update PnOtherwise, P is maintainednThe change is not changed;
and a judging module connected with the threshold acquisition module and the Pn learning module; the judging module is used for comparing the calculated power of a frame of main channel signal after passing through the band-pass filter with Pn to determine whether the frame of main channel signal is a speech frame or a noise frame.
4. The sound masker of claim 3, wherein the power calculation module calculates the power of the band-pass filtered frame of main channel signal by the formula

Px = −10×log10((1/LEN)·Σ_{n=0}^{LEN−1} u²(n) / 32768²),

wherein Px represents the power of the frame of main channel signal after band-pass filtering, LEN represents the number of sampling points corresponding to one frame length of the main channel signal, and u(n) represents the sampling value corresponding to the nth sampling point after band-pass filtering of the frame of main channel signal.
5. The sound masker of claim 3, wherein the threshold obtaining module obtains the speech power set, the noise power set, and the power boundary threshold of speech and noise by:
counting the frequency of the occurrence of the power value of each frame of main channel signals of a preset frame number calculated by a power calculation module;
establishing a probability distribution histogram of signal power values of each frame main channel with preset frame numbers, and obtaining a voice power distribution area and a noise power distribution area according to the probability distribution histogram;
and calculating the power demarcation threshold value of the voice and the noise by adopting a maximum inter-class variance method.
6. The sound masker of claim 5, wherein the speech power set of the speech power distribution region is [0, Pth−1] and the noise power set of the noise power distribution region is [Pth, Pm], where Pth−1 represents the speech power maximum in the speech power distribution region, Pth represents the noise power minimum in the noise power distribution region, and Pm represents the noise power maximum in the noise power distribution region; the specific process by which the threshold acquisition module calculates the power demarcation threshold Pthbest between voice and noise by the maximum between-class variance method is as follows:
the mean of the speech powers in the speech power set is calculated by the formula

uv = (Σ_{i=0}^{Pth−1} i·pi) / w0,

and the mean of the noise powers in the noise power set is calculated by the formula

un = (Σ_{k=Pth}^{Pm} k·pk) / w1,

wherein

w0 = Σ_{i=0}^{Pth−1} pi,

w1 = Σ_{k=Pth}^{Pm} pk,
uv represents the mean of the speech powers in the speech power set, un represents the mean of the noise powers in the noise power set, pi represents the distribution probability of speech power value i, pk represents the distribution probability of noise power value k, i takes each value in [0, Pth−1] in turn, and k takes each value in [Pth, Pm] in turn;
the average power of the main channel signals of the preset number of frames is obtained by the formula uT = w0·uv + w1·un, where uT represents that average power;
the between-class variance of the speech power and the noise power is obtained by the formula σ² = w0·(uv − uT)² + w1·(un − uT)², and the power value that maximizes the between-class variance σ² is taken as the power demarcation threshold Pthbest between speech and noise.
7. The sound masker of claim 3, wherein the judging module first sets the initial value of the state value VAD to 0; for a frame of main channel signal, when the state value VAD is 0 and the power value Px of the frame exceeds Pn by more than a second preset value, the judging module judges that the frame of main channel signal is a speech frame and updates the state value VAD to 1; when the state value VAD is 1 and the power value Px of the frame exceeds Pn by less than a third preset value, the judging module judges that the frame of main channel signal is a noise frame and updates the state value VAD to 0.
8. The sound masker of claim 1,
the center frequency of the time-varying filter is 0.8kHz, and the amplitude-frequency response is
|H(e^jω)| = (1 − r) / |1 − 2r·cos(2πf0/fs)·e^(−jω) + r²·e^(−j2ω)|, with f0 = 0.8 kHz,
The difference equation is
y(n) = (1 − r)·x(n) + 2r·cos(2πf0/fs)·y(n−1) − r²·y(n−2), where x(n) is the nth sampling point of the frame of main channel signal input to the filter and f0 = 0.8 kHz,
Wherein r represents a bandwidth control variable and a variation range of [0.005,0.995 ]]Specifically, when a frame of the main channel signal is a speech frame, the bandwidth control variable r is changed from 0.995 to 0.005 in steps of 1, and when a frame of the main channel signal is a noise frame, the bandwidth control variable r is changed from 0.005 to 0.995 in steps of 2, where: f. ofsRepresenting the sampling frequency, y (n) representing the output result corresponding to the nth sampling point after the time-varying band-pass filtering of the frame of main channel signals, y (n-1) representing the output result corresponding to the 1 st sampling point after the time-varying band-pass filtering of the last frame of main channel signals, and y (n-2) representing the last frame of main channel signalsThe output result corresponding to the 2 nd sampling point after the time-varying band-pass filtering of the signal, j represents an imaginary unit, j2Where-1 and omega denote the circumferential angular frequency,
Figure FDA0002295034390000041
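The patent's exact amplitude-frequency response and difference equation are given only as images above, so they cannot be reproduced here. The sketch below instead assumes a conventional second-order resonator whose pole radius is the bandwidth control variable r (smaller r, wider passband), centered at 0.8 kHz as the claim states; this is a stand-in for, not a transcription of, the patented filter:

```python
import math

def time_varying_bandpass(x, fs, r, f0=800.0):
    """Second-order resonator sketch: y(n) = (1-r)*x(n)
    + 2*r*cos(w0)*y(n-1) - r^2*y(n-2). The pole radius r in
    [0.005, 0.995] controls the bandwidth; smaller r -> wider band."""
    w0 = 2.0 * math.pi * f0 / fs       # center frequency in rad/sample
    b0 = 1.0 - r                        # scales the peak gain toward unity
    a1 = 2.0 * r * math.cos(w0)
    a2 = -r * r
    y1 = y2 = 0.0                       # y(n-1), y(n-2) carried across frames
    y = []
    for xn in x:
        yn = b0 * xn + a1 * y1 + a2 * y2
        y2, y1 = y1, yn
        y.append(yn)
    return y
```

A tone at the 0.8 kHz center frequency passes with much higher energy than an off-center tone, which is the behavior the time-varying bandwidth exploits.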
When a frame of main channel signal is a speech frame, the masking unit adjusts the power of the frame of auxiliary channel signal corresponding to the speech frame so that it gradually decreases from the power of the previous frame of auxiliary channel signal to one percent of the power of the speech frame after time-varying band-pass filtering; when the frame of main channel signal is a noise frame, the masking unit adjusts the power of the frame of auxiliary channel signal corresponding to the noise frame so that it gradually increases from the power of the previous frame of auxiliary channel signal to the power of the current frame of auxiliary channel signal; the masking unit obtains, by the formula
Figure FDA0002295034390000042
the power Py of the frame of auxiliary channel signal corresponding to a frame of main channel signal, in which: LEN represents the number of sampling points included in one frame of main channel signal, and in_y(n) represents the sampling value of the frame of auxiliary channel signal corresponding to the frame of main channel signal at the nth sampling point.
9. A sound masking method, wherein the sound masking method receives a speech signal through a main channel and receives a background signal through a sub-channel, the sound masking method comprising the steps of:
Step 1: performing framing processing on the signal on the main channel to obtain each frame of main channel signal, performing framing processing on the signal on the auxiliary channel to obtain each frame of auxiliary channel signal, and executing step 2;
Step 2: performing band-pass filtering on each frame of main channel signal, and executing step 3;
Step 3: judging whether each frame of main channel signal after the band-pass filtering is a speech frame or a noise frame, and executing step 4;
Step 4: carrying out time-varying band-pass filtering processing on each frame of main channel signal, and simultaneously carrying out masking processing on each frame of auxiliary channel signal; when a frame of main channel signal is a speech frame, the filtering bandwidth is gradually increased and the corresponding frame of auxiliary channel signal is gradually decreased; when a frame of main channel signal is a noise frame, the filtering bandwidth is gradually decreased and the corresponding frame of auxiliary channel signal is gradually increased; then executing step 5;
Step 5: correspondingly superposing a frame of main channel signal after the time-varying band-pass filtering processing with the masked frame of auxiliary channel signal corresponding to it, to obtain a frame of output signal.
10. The sound masking method according to claim 9, wherein said step 3 specifically comprises the steps of:
Step 31: calculating the power of each frame of main channel signal after band-pass filtering, and executing step 32;
Step 32: obtaining a speech power set, a noise power set and a power demarcation threshold of speech and noise according to the calculated power of each frame of main channel signal after band-pass filtering, and executing step 33;
Step 33: summing the probability distribution of each noise power of the noise power set over the frame powers of the main channel signal, and when the obtained sum is greater than a first preset value, defining the maximum value among these noise powers as Pn, and executing step 34;
Step 34: when Pn meets a preset condition and each speech power mean value in the speech power set and each noise power mean value in the noise power set meet a preset relation, updating Pn; otherwise, keeping Pn unchanged; then executing step 35;
Step 35: comparing the calculated power of the band-pass filtered frame of main channel signal with Pn to determine whether the frame of main channel signal is a speech frame or a noise frame.
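Step 33 can be read as accumulating the empirical probability mass of the noise-class powers until it exceeds the first preset value, then taking the largest accepted noise power as Pn. A sketch under that reading (the accumulation rule, equal per-frame probability mass, and the 0.5 preset value are all assumptions; the patent leaves the exact probability distribution and preset value unspecified):

```python
def noise_floor(powers, thr, prob_sum_min=0.5):
    """Step 33 sketch: walk the noise-class powers (those at or below the
    speech/noise threshold `thr`) in ascending order, summing each one's
    empirical probability; once the sum exceeds `prob_sum_min` (the
    illustrative 'first preset value'), return the largest power seen as Pn."""
    noise_powers = sorted(p for p in powers if p <= thr)
    total = len(powers)
    acc, p_n = 0.0, 0.0
    for p in noise_powers:
        acc += 1.0 / total      # each frame contributes equal probability mass
        p_n = p                 # running maximum of accepted noise powers
        if acc > prob_sum_min:
            break
    return p_n
```

Stopping at a probability mass rather than the absolute maximum makes the estimate robust to a few misclassified high-power frames in the noise set.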
CN201611029084.7A 2016-11-20 2016-11-20 Sound masking device and sound masking method Active CN106782587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611029084.7A CN106782587B (en) 2016-11-20 2016-11-20 Sound masking device and sound masking method


Publications (2)

Publication Number Publication Date
CN106782587A CN106782587A (en) 2017-05-31
CN106782587B true CN106782587B (en) 2020-04-28

Family

ID=58971532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611029084.7A Active CN106782587B (en) 2016-11-20 2016-11-20 Sound masking device and sound masking method

Country Status (1)

Country Link
CN (1) CN106782587B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102099851A (en) * 2008-07-18 2011-06-15 皇家飞利浦电子股份有限公司 Method and system for preventing overhearing of private conversations in public places
CN104156578A (en) * 2014-07-31 2014-11-19 南京工程学院 Recording time identification method
CN204495996U (en) * 2011-10-26 2015-07-22 菲力尔系统公司 Broadband sonar receiver
CN205004029U (en) * 2015-09-29 2016-01-27 苏州一天声学科技有限公司 Array sound masking device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5673256B2 (en) * 2011-03-17 2015-02-18 Yamaha Corporation Masker sound measuring device and sound masking device


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
"An Auditory-Masking-Threshold-Based Noise Suppression Algorithm GMMSE-AMT For Listeners";Ajay Natarajan;《EURASIP》;20051231;全文 *
"IMPROVED SIGNAL REPRESENTATION FOR EVENT DETECTION IN REMOTE HEALTH CARE THROUGH PSYCHOANALYICAL MASKING";Jugurta Montalvao;《ResearchGate》;20140102;全文 *
"Multi-band spectral subtraction speech enhancement algorithm based on the auditory masking effect"; Cao Liang; Computer Engineering and Design (《计算机工程与设计》); 20130131; Vol. 34, No. 1; full text *
"Binary masking speech enhancement algorithm based on noise estimation"; Cao Longtao; Computer Engineering and Applications (《计算机工程与应用》); 20151231; Vol. 51, No. 17; full text *
"Speech enhancement algorithm based on minimum statistics and human auditory masking characteristics"; Lv Yong; Audio Engineering (《电声技术》); 20121231; Vol. 37, No. 12; full text *
"Speech enhancement combined with auditory masking in non-stationary noise environments"; Zhang Yong; Computer Engineering and Design (《计算机工程与设计》); 20150531; Vol. 36, No. 5; full text *

Also Published As

Publication number Publication date
CN106782587A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
KR101461141B1 (en) System and method for adaptively controlling a noise suppressor
CN103871421B (en) A kind of self-adaptation noise reduction method and system based on subband noise analysis
WO2022160593A1 (en) Speech enhancement method, apparatus and system, and computer-readable storage medium
EP3163902A1 (en) Information-processing device, information processing method, and program
US9431982B1 (en) Loudness learning and balancing system
US20110188671A1 (en) Adaptive gain control based on signal-to-noise ratio for noise suppression
US9036825B2 (en) Audio signal correction and calibration for a room environment
US7155385B2 (en) Automatic gain control for adjusting gain during non-speech portions
US20130156208A1 (en) Hearing aid and method of detecting vibration
US9384759B2 (en) Voice activity detection and pitch estimation
KR20130038857A (en) Adaptive environmental noise compensation for audio playback
US20140161280A1 (en) Audio signal correction and calibration for a room environment
CN106782586B (en) Audio signal processing method and device
US9437213B2 (en) Voice signal enhancement
CN103812462B (en) Volume control method and device
WO2015085946A1 (en) Voice signal processing method, apparatus and server
CN110914901A Speech signal leveling
CN110708651A (en) Hearing aid squeal detection and suppression method and device based on segmented trapped wave
US20130223644A1 (en) Systems and Methods for Reducing Unwanted Sounds in Signals Received From an Arrangement of Microphones
US11445307B2 (en) Personal communication device as a hearing aid with real-time interactive user interface
Shin et al. Perceptual reinforcement of speech signal based on partial specific loudness
CN105869652A (en) Psychological acoustic model calculation method and device
CN106782587B (en) Sound masking device and sound masking method
CN115410593A (en) Audio channel selection method, device, equipment and storage medium
KR20080068397A (en) Speech intelligibility enhancement apparatus and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant