CN108766454A - Voice noise suppression method and device - Google Patents
- Publication number
- CN108766454A CN108766454A CN201810692665.1A CN201810692665A CN108766454A CN 108766454 A CN108766454 A CN 108766454A CN 201810692665 A CN201810692665 A CN 201810692665A CN 108766454 A CN108766454 A CN 108766454A
- Authority
- CN
- China
- Prior art keywords
- noise
- voice
- frequency domain
- signal
- speech signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 68
- 230000000694 effects Effects 0.000 claims abstract description 37
- 230000000873 masking effect Effects 0.000 claims description 31
- 238000001514 detection method Methods 0.000 claims description 30
- 238000012545 processing Methods 0.000 claims description 21
- 230000000452 restraining effect Effects 0.000 claims description 16
- 238000003860 storage Methods 0.000 claims description 11
- 238000004458 analytical method Methods 0.000 claims description 8
- 238000004364 calculation method Methods 0.000 claims description 3
- 238000004590 computer program Methods 0.000 claims description 3
- 230000005764 inhibitory process Effects 0.000 claims description 2
- 230000008569 process Effects 0.000 abstract description 11
- 238000005516 engineering process Methods 0.000 abstract description 4
- 230000015654 memory Effects 0.000 description 18
- 238000004422 calculation algorithm Methods 0.000 description 14
- 230000006870 function Effects 0.000 description 13
- 238000010586 diagram Methods 0.000 description 9
- 238000001228 spectrum Methods 0.000 description 9
- 230000003595 spectral effect Effects 0.000 description 8
- 230000015572 biosynthetic process Effects 0.000 description 5
- 230000002708 enhancing effect Effects 0.000 description 5
- 230000002093 peripheral effect Effects 0.000 description 5
- 230000001629 suppression Effects 0.000 description 5
- 238000003786 synthesis reaction Methods 0.000 description 5
- 230000009467 reduction Effects 0.000 description 4
- 230000005236 sound signal Effects 0.000 description 4
- 238000001914 filtration Methods 0.000 description 3
- 230000005534 acoustic noise Effects 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 230000001965 increasing effect Effects 0.000 description 2
- 238000004088 simulation Methods 0.000 description 2
- 230000009471 action Effects 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000012880 independent component analysis Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000012067 mathematical method Methods 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 238000011946 reduction process Methods 0.000 description 1
- 230000008439 repair process Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 238000005316 response function Methods 0.000 description 1
- 230000000638 stimulation Effects 0.000 description 1
- 238000012706 support-vector machine Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
- 238000005303 weighing Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The present invention provides a voice noise suppression method and device, relating to the field of speech processing technology. The voice noise suppression method first determines an acoustic scene corresponding to a frequency-domain speech signal according to the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal, adjusts the parameters of a noise processing model according to the acoustic scene, and then performs speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model. By adjusting the noise processing model for a variety of acoustic scenes before enhancement, the method makes noise suppression scene-specific, improving both noise processing speed and speech enhancement quality.
Description
Technical field
The present invention relates to the field of speech processing technology, and in particular to a voice noise suppression method and device.
Background technology
With the popularization of electronic devices, more and more operations and inputs rely on voice functions, and many electronic devices place increasingly high demands on the precision of voice input. Since voice input is typically performed in noisy scenes, in practical applications the target speech is often disturbed by factors such as the ambient noise, so that its clarity, intelligibility, and comfort are substantially reduced, seriously affecting both human auditory perception and the device's analysis of the speech. The recorded noisy speech signal therefore generally needs to undergo noise reduction and speech enhancement before being output or otherwise processed.
However, existing voice denoising methods process speech recorded under various noise environments with one identical pipeline. They lack scene specificity and suffer from low denoising efficiency and poor noise suppression.
Summary of the invention
In view of this, embodiments of the present invention aim to provide a voice noise suppression method and device, to solve the problems of low denoising efficiency and poor noise suppression in the existing speech enhancement methods described above.
In a first aspect, an embodiment of the present invention provides a voice noise suppression method, including: determining an acoustic scene corresponding to a frequency-domain speech signal according to the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal; adjusting the parameters of a noise processing model according to the acoustic scene; and performing speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model.
With reference to the first aspect, before determining the acoustic scene corresponding to the frequency-domain speech signal according to its noise estimation result and signal-to-noise ratio, the voice noise suppression method further includes: converting the collected original time-domain speech signal into the frequency-domain speech signal through a simulated human-ear filter.
With reference to the first aspect, after converting the collected original time-domain speech signal into the frequency-domain speech signal through the simulated human-ear filter, and before determining the acoustic scene corresponding to the frequency-domain speech signal according to its noise estimation result and signal-to-noise ratio, the voice noise suppression method further includes: obtaining a voice activity detection result of the frequency-domain speech signal; and calculating the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal according to the voice activity detection result.
With reference to the first aspect, obtaining the voice activity detection result of the frequency-domain speech signal includes: performing voice activity detection on the frequency-domain speech signal so as to divide it into voiced segments and unvoiced segments, and taking the voiced segments and the unvoiced segments as the voice activity detection result, where a voiced segment is a frequency range containing both the speech signal and the noise signal, and an unvoiced segment is a frequency range containing only the noise signal.
With reference to the first aspect, calculating the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal according to the voice activity detection result includes: comparing the energy features of the voiced segments and the unvoiced segments to obtain the noise estimation result and the signal-to-noise ratio.
With reference to the first aspect, the noise processing model includes a noise suppression submodel and a human-ear acoustic masking submodel, and adjusting the parameters of the noise processing model according to the acoustic scene and performing speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model includes: determining an estimated masking threshold of the human-ear acoustic masking submodel according to the acoustic scene, and filtering out an auditory-perception frequency-domain speech signal from the frequency-domain speech signal using the human-ear acoustic masking submodel; and performing spectral-subtraction-based noise suppression on the auditory-perception frequency-domain speech signal using the noise suppression submodel to obtain a speech enhancement output signal.
With reference to the first aspect, after performing speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model, the voice noise suppression method further includes: converting the speech enhancement output signal into a time-domain speech signal; and amplifying the time-domain speech signal with a power amplifier and outputting it through a loudspeaker.
In a second aspect, an embodiment of the present invention provides a voice noise suppression device, including an acoustic scene determining module, a parameter adjustment module, and a noise processing module. The acoustic scene determining module is configured to determine the acoustic scene corresponding to a frequency-domain speech signal according to its noise estimation result and signal-to-noise ratio. The parameter adjustment module is configured to adjust the parameters of a noise processing model according to the acoustic scene. The noise processing module is configured to perform speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model.
With reference to the second aspect, the voice noise suppression device further includes a voice activity detection module and a noise analysis module. The voice activity detection module is configured to obtain the voice activity detection result of the frequency-domain speech signal. The noise analysis module is configured to calculate the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal according to the voice activity detection result.
In a third aspect, an embodiment of the present invention further provides a storage medium applied in a computer. The storage medium stores a plurality of instructions configured to cause the computer to execute the above method.
The advantageous effects provided by the invention are as follows:
The present invention provides a voice noise suppression method and device. The method operates on the frequency-domain speech signal, which is more amenable to analysis and processing by the device and improves the speed and precision of speech signal processing. Meanwhile, the method adjusts the noise processing model for different acoustic scenes, so that it adapts to the acoustic scene more accurately and achieves more targeted and effective noise suppression. Furthermore, judging the acoustic scene from the noise estimation result and signal-to-noise ratio increases both the accuracy and the speed of the scene decision, improving the effectiveness and efficiency of voice noise suppression.
Other features and advantages of the present invention will be set forth in the following description, and in part will become apparent from the description or be understood by implementing the embodiments of the present invention. The objectives and other advantages of the invention may be realized and attained by the structures particularly pointed out in the written description, claims, and drawings.
Description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings illustrate only certain embodiments of the present invention and are therefore not to be construed as limiting its scope; those of ordinary skill in the art can derive other relevant drawings from these drawings without creative effort.
Fig. 1 is a flowchart of a voice noise suppression method provided by the first embodiment of the present invention;
Fig. 2 is a flowchart of voice input and processing steps provided by the first embodiment of the present invention;
Fig. 3 is a schematic flow diagram of a noise suppression mode provided by the first embodiment of the present invention;
Fig. 4 is a module diagram of a voice noise suppression device provided by the second embodiment of the present invention;
Fig. 5 is a structural block diagram of an electronic device applicable to the embodiments of the present application, provided by the third embodiment of the present invention.
Reference numerals: 100 - voice noise suppression device; 101 - voice activity detection module; 102 - noise analysis module; 110 - acoustic scene determining module; 120 - parameter adjustment module; 130 - noise processing module; 140 - speech signal output module; 200 - electronic device; 201 - memory; 202 - storage controller; 203 - processor; 204 - peripheral interface; 205 - input/output unit; 206 - audio unit; 207 - display unit.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.
It should be noted that similar labels and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are only used to distinguish the description and are not to be understood as indicating or implying relative importance.
Some terms involved in the embodiments of the present invention are first explained below:
The masking effect refers to the phenomenon that, due to the presence of multiple stimuli of the same category (such as sounds or images), the subject cannot fully receive the information of all the stimuli. Visual masking effects include brightness masking and pattern masking, with influencing factors mainly in the spatial, temporal, and color domains; auditory masking effects mainly include noise masking, human-ear masking, frequency-domain masking, time-domain masking, and temporal masking. When a masking effect occurs, sounds of different natures, such as pure tones, complex tones, or noise, are generally used as the masking sound. Research has also found that masking can occur even when the masking sound and the masked sound do not arrive simultaneously; this occlusion is known as non-simultaneous masking. Masking exerted by a masking sound acting before the masked sound is called forward masking, and masking exerted after the masked sound is called backward masking. The auditory masking effect is usually represented by the new hearing threshold curve in the presence of the masking sound, so the masked sound referred to here generally means a pure tone. The hearing threshold in the presence of the masking sound is called the masking threshold, and the estimated masking threshold corresponds to this masking threshold.
First embodiment
The applicant has found through research that, in real-life voice input, voice information is disturbed by noise from various scenes, whereas traditional noise reduction schemes typically operate on a single general noise reduction model and do not adjust the algorithm parameters of that model according to the actual acoustic scene of the voice recording. Their noise reduction is therefore poor and may leave much residual noise.
To solve the above problems, the first embodiment of the present invention provides a voice noise suppression method.
Please refer to Fig. 1, a flowchart of the voice noise suppression method provided by the first embodiment of the present invention. The voice noise suppression method is applied to an electronic device that performs any kind of speech signal processing, and its steps can be as follows:
Step S10: Determine an acoustic scene corresponding to a frequency-domain speech signal according to the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal.
Step S20: Adjust the parameters of a noise processing model according to the acoustic scene.
Step S30: Perform speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model.
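The three steps can be sketched as a minimal pipeline. This is an illustrative sketch only: the function names, SNR thresholds, scene labels, and parameter presets are invented for the example, not taken from the patent.

```python
import numpy as np

def determine_scene(snr_db):
    # Illustrative scene decision (step S10); a real decision would also use
    # the noise estimation result. The thresholds here are assumptions.
    if snr_db > 18.0:
        return "office"
    if snr_db > 8.0:
        return "street"
    return "wind"

def adjust_model(scene):
    # Map each scene to noise-model parameters (step S20); values are placeholders.
    presets = {"office": {"alpha": 3.0},
               "street": {"alpha": 4.5},
               "wind":   {"alpha": 6.0}}
    return presets[scene]

def enhance(spec, noise_psd, params):
    # Spectral-subtraction style enhancement with the adjusted parameters (step S30).
    power = np.maximum(np.abs(spec) ** 2 - params["alpha"] * noise_psd, 0.0)
    return np.sqrt(power) * np.exp(1j * np.angle(spec))

# Steps S10-S30 on one dummy frame
rng = np.random.default_rng(0)
spec = np.fft.rfft(rng.standard_normal(256))
noise_psd = np.full(spec.shape, 0.5)
scene = determine_scene(snr_db=12.0)              # -> "street"
out = enhance(spec, noise_psd, adjust_model(scene))
```

The point of the structure is that only `adjust_model` changes between scenes; the enhancement routine itself stays the same.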
For step S10, the acoustic scenes are various scenes differing in noise type, and may include an office noise scene, a street noise scene, a wind noise scene, and so on; the value ranges of the energy, amplitude, and noise ratio of the noise acquired in the various acoustic scenes differ. This embodiment performs acoustic scene recognition on the frequency-domain speech signal because the energy and amplitude of the speech signal allow the noise class, and hence the corresponding acoustic scene, to be determined more quickly and accurately, and because the extraction of features such as short-time energy and short-time average magnitude from the frequency-domain speech signal is mature and fast. The noise estimation is usually performed by short-time analysis of the noisy speech signal, estimating the power spectrum of the noise with mathematical tools such as random processes and probability statistics, so as to learn how the noise power is distributed over frequency. The signal-to-noise ratio is the ratio of signal to noise in the frequency-domain speech signal; different acoustic scenes typically have different signal-to-noise ratios, and scenes whose noise sources lie at different distances from the voice input position also often differ in signal-to-noise ratio, so selecting the acoustic scene with the signal-to-noise ratio taken into account improves the accuracy of the scene decision.
For step S20, namely adjusting the parameters of the noise processing model according to the acoustic scene: the noise processing model mainly applies noise reduction to the background environmental sound, i.e., the additive acoustic noise, picked up by the microphone while recording speech. For such additive acoustic noise, the noise processing model may include a human-ear acoustic masking submodel. Optionally, the step of adjusting the parameters of the noise processing model according to the acoustic scene may include: determining the estimated masking threshold of the human-ear acoustic masking submodel according to the acoustic scene, and filtering out an auditory-perception frequency-domain speech signal from the frequency-domain speech signal using the human-ear acoustic masking submodel. The auditory-perception frequency-domain speech signal is the intersection, in the auditory-perception frequency domain, of the estimated masking threshold and the masking threshold corresponding to the frequency-domain speech signal.
For step S30, namely performing speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model: the noise processing model may include a noise suppression submodel, and the denoising of the frequency-domain speech signal is mainly carried out by this noise suppression submodel. The basic algorithm of the noise suppression submodel can be a speech enhancement algorithm based on spectral subtraction, wavelet analysis, Kalman filtering, signal subspace, the auditory masking effect, independent component analysis, or a neural network; optionally, this embodiment uses the speech enhancement algorithm based on spectral subtraction.
Through the above steps S10-S30, this embodiment first makes an acoustic scene decision when enhancing the speech signal, and then adjusts the parameters of the noise processing model according to the different acoustic scenes. By targeting the noise characteristics of the actual voice input scene, it adapts more precisely to the different noises of different acoustic scenes, achieves more targeted noise suppression, and improves the efficiency and effect of noise suppression.
As an alternative embodiment, before the step S10 of determining the acoustic scene corresponding to the frequency-domain speech signal according to its noise estimation result and signal-to-noise ratio, this embodiment also needs to obtain the frequency-domain speech signal and perform noise estimation and signal-to-noise ratio calculation on it. Please refer to Fig. 2, a flowchart of the voice input and processing steps provided by the first embodiment of the present invention.
Optionally, the voice input step S1 is: converting the collected original time-domain speech signal into a frequency-domain speech signal through a simulated human-ear filter, thereby completing the fast Fourier transform (FFT). The simulated human-ear filter (the first human-ear analog filter, the second human-ear analog filter) is a band-pass filter bank that simulates the filtering and band division performed by the human ear. When a 128-channel gammatone band-pass filter bank is used, the impulse response of the i-th filter is as follows:
g_i(t) = t^3 exp(−2π b_i t) cos(2π f_i t + φ_i), if t ≥ 0
g_i(t) = 0, otherwise
where b_i represents the attenuation rate of the impulse response, which is related to the bandwidth of the filter, f_i represents the center frequency of the filter, and φ_i represents the phase (taken as 0). b_i is calculated as follows:
ERB(f_i) = 24.7(4.37 f_i/1000 + 1)
b_i = 1.019 ERB(f_i)
where ERB is the equivalent rectangular bandwidth, a scale that models psychoacoustic response, and the center frequencies f_i are distributed uniformly on the ERB scale from 80 Hz to 5 kHz. After this conversion, subsequent stages can process the frequency domain more finely, improving the precision of speech signal processing.
For example, the noisy speech signal can be split into 128 frequency-band units after filtering by the first human-ear filter, then windowed and processed frame by frame, yielding 128 speech T-F units (also called speech time-frequency units) in each frame.
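The ERB formula and impulse response above can be sketched as follows. The ERB-rate scale used to place the 128 center frequencies uniformly between 80 Hz and 5 kHz is the standard Glasberg-Moore form (21.4 log10(4.37f/1000 + 1)), an assumption since the patent text does not spell it out; the sampling rate and impulse-response length are also assumed.

```python
import numpy as np

def erb(f):
    # Equivalent rectangular bandwidth at centre frequency f (Hz): 24.7(4.37f/1000 + 1)
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def erb_scale(f):
    # ERB-rate value (Glasberg-Moore) used to space centre frequencies uniformly
    return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)

def center_freqs(n=128, fmin=80.0, fmax=5000.0):
    # n centre frequencies uniformly spaced on the ERB scale from fmin to fmax
    pts = np.linspace(erb_scale(fmin), erb_scale(fmax), n)
    return (10.0 ** (pts / 21.4) - 1.0) * 1000.0 / 4.37

def gammatone_ir(fc, fs=16000, dur=0.025, phi=0.0):
    # g_i(t) = t^3 exp(-2*pi*b_i*t) cos(2*pi*f_i*t + phi) for t >= 0
    t = np.arange(int(dur * fs)) / fs
    b = 1.019 * erb(fc)
    return t ** 3 * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * fc * t + phi)

fcs = center_freqs()
ir = gammatone_ir(fcs[0])
```

Filtering the input with each of the 128 impulse responses yields the per-band signals that are then windowed into the T-F units described above.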
Optionally, the following steps are executed after step S1:
Step S2: Obtain the voice activity detection result of the frequency-domain speech signal.
Step S3: Calculate the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal according to the voice activity detection result.
For step S2, voice activity detection refers to detecting the presence or absence of speech in a noisy environment. Optionally, the voice activity detection can divide the frequency-domain speech signal into voiced segments and unvoiced segments, where a voiced segment is a frequency range containing both the speech signal and the noise signal, and an unvoiced segment is a frequency range containing only the noise signal.
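A toy energy-threshold detector illustrates the voiced/unvoiced split; the threshold and the synthetic signals are assumptions, and practical detectors use richer features than frame energy alone.

```python
import numpy as np

def vad(frames, threshold_db=0.0):
    # Label frames whose log-energy exceeds the threshold as voiced
    # (speech + noise); the rest are unvoiced (noise only).
    energy_db = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    return energy_db > threshold_db

rng = np.random.default_rng(0)
noise_frames = 0.01 * rng.standard_normal((4, 160))           # noise only
tone = np.sin(2.0 * np.pi * 300.0 / 8000.0 * np.arange(160))  # a "speech" frame
speech_frame = tone[None, :] + 0.01 * rng.standard_normal((1, 160))
labels = vad(np.vstack([noise_frames, speech_frame]))
# labels -> [False, False, False, False, True]
```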
For step S3, the step of calculating the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal according to the voice activity detection result may include: comparing the energy features of the voiced segments and the unvoiced segments to obtain the noise estimation result and the signal-to-noise ratio. Taking the time-recursive averaging noise estimation algorithm as an example, it determines the probability that speech is present at frequency point k from the division into voiced and unvoiced segments. With this probability introduced, the noise power spectral density can be obtained by weighting the noise power spectral density under the speech-absent condition and the noise power spectral density under the speech-present condition by the conditional probabilities of speech absence and speech presence at frequency point k of the noisy speech, respectively, and then summing; this noise power spectral density is the noise estimation result.
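A minimal sketch of such a time-recursive update, in the common form where the per-bin smoothing factor grows with the speech-presence probability (the smoothing constant 0.85 is an assumption, not a value from the patent):

```python
import numpy as np

def update_noise_psd(noise_psd, frame_psd, p_speech, alpha=0.85):
    # Per bin: when speech is probably present the old noise estimate is kept;
    # when probably absent the estimate tracks the current frame energy.
    a = alpha + (1.0 - alpha) * p_speech
    return a * noise_psd + (1.0 - a) * frame_psd

noise = np.full(4, 1.0)
frame = np.array([2.0, 2.0, 9.0, 9.0])   # bins 2-3 carry speech energy
p = np.array([0.0, 0.0, 1.0, 1.0])       # speech-presence probability per bin
noise = update_noise_psd(noise, frame, p)
# noise-only bins: 0.85*1 + 0.15*2 = 1.15; speech bins stay near 1.0
```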
For step S10, i.e., determining the acoustic scene corresponding to the frequency-domain speech signal according to the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal, optional scene decision procedures include: matching the noise power spectral density in the noise estimation result against the power spectral densities of various acoustic scenes, matching the signal-to-noise ratio against the signal-to-noise ratios of those scenes, and selecting the acoustic scene with the highest mean match rate as the scene corresponding to the frequency-domain speech signal; or selecting the frequency points whose signal-to-noise ratio in the noise estimation result falls below a preset threshold for noise power spectral density matching, and taking the acoustic scene with the highest such match among the various acoustic scenes as the scene corresponding to the frequency-domain speech signal; or obtaining from the noise estimation result, by a formula, an estimated noise energy value s_in(i) of the noisy speech signal f(i) in dB SPL, generating a normalization function of the estimated noise energy value and the signal-to-noise ratio based on computer-simulated big-data analysis results for different acoustic scenes (such as quiet, office, in-car, meeting room and concert hall), and judging the acoustic scene corresponding to the frequency-domain speech signal from the value of this normalization function. It should be understood that the acoustic scene decision can also be made by a neural network model, a support vector machine or other decision procedures.
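A minimal sketch of the first decision procedure (highest mean match rate). The per-scene reference statistics in `SCENE_PROFILES`, the cosine/SNR scoring and all numeric values are hypothetical illustrations, not values from the patent:

```python
import numpy as np

# Hypothetical per-scene reference statistics: (noise PSD template, typical SNR in dB).
SCENE_PROFILES = {
    "quiet":   (np.array([0.1, 0.1, 0.1, 0.1]), 30.0),
    "office":  (np.array([0.8, 0.5, 0.3, 0.2]), 20.0),
    "in_car":  (np.array([2.0, 1.2, 0.6, 0.3]), 10.0),
    "concert": (np.array([1.5, 1.5, 1.4, 1.3]),  5.0),
}

def classify_scene(noise_psd, snr_db):
    """Pick the scene whose reference PSD shape and SNR best match the estimates."""
    best, best_score = None, -np.inf
    for name, (ref_psd, ref_snr) in SCENE_PROFILES.items():
        # cosine similarity between the PSD shapes
        psd_match = np.dot(noise_psd, ref_psd) / (
            np.linalg.norm(noise_psd) * np.linalg.norm(ref_psd))
        # SNR closeness mapped into (0, 1]
        snr_match = 1.0 / (1.0 + abs(snr_db - ref_snr))
        score = 0.5 * (psd_match + snr_match)  # mean match rate
        if score > best_score:
            best, best_score = name, score
    return best
```

A neural network or SVM classifier, as the text notes, would simply replace this hand-built scoring with a learned decision function over the same (noise estimate, SNR) features.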
For step S30, after the step of "determining the estimation masking threshold of the human-ear acoustic masking sub-model according to the acoustic scene, and filtering out the auditory-perceptible frequency-domain speech signal from the frequency-domain speech signal using the human-ear acoustic masking sub-model" described in step S20 is completed, the step of "performing speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model" may include: performing spectral-subtraction-based noise suppression processing on the auditory-perceptible frequency-domain speech signal using the noise suppression sub-model, to obtain a speech-enhanced output signal.
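A minimal sketch of masking-threshold filtering, under the assumption that bins whose energy lies below the estimated masking threshold are simply zeroed (the function name and the hard thresholding rule are illustrative, not the patent's exact procedure):

```python
import numpy as np

def perceptual_filter(spectrum, masking_threshold):
    """Keep only the auditory-perceptible part of a spectrum.

    Bins whose energy falls below the estimated masking threshold are
    inaudible to the human ear and are zeroed, so the later noise
    suppression stage only has to process perceptible components.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    audible = np.abs(spectrum) ** 2 >= masking_threshold
    return np.where(audible, spectrum, 0.0)
```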
Referring to FIG. 3, FIG. 3 is a flow diagram of a noise suppression mode provided by the first embodiment of the present invention. The noise suppression processing of the spectral subtraction can be performed as follows:

|Ŷ_i(k)|² = |X̄_i(k)|² − α_i · δ_i · |D̂_i(k)|²,  n ≤ k ≤ m  (1)

where k denotes the k-th frequency point, n and m denote the lower and upper limits of the i-th frequency band respectively, |Ŷ_i(k)|² denotes the enhanced speech signal energy, |X̄_i(k)|² denotes the smoothed speech energy to be processed, |D̂_i(k)|² denotes the estimated noise energy, α_i denotes the over-subtraction coefficient of the i-th subband, and δ_i denotes the additional subband subtraction factor of the i-th subband.
As can be seen from the figure above, the multi-subband spectral subtraction noise suppression method first separates the amplitude information and the phase of the input speech signal X(k): the amplitude information is used for the subsequent processing, while the phase information is used to recombine with the enhanced amplitude information to obtain the enhanced speech signal Y(k). Then the amplitude of the noisy speech is pre-processed according to formula (2); the effect of this pre-processing is to reduce large fluctuations of the noisy speech amplitude, reduce residual noise and improve speech quality. In formula (2), |X̄_j(k)| denotes the pre-processed speech amplitude of the current frame, i.e. the j-th frame, |X_(j−m)(k)| denotes the speech amplitudes of the current input frame and the n frames preceding it, and W denotes the pre-processing spectral gain control coefficient. After the noisy speech spectrum has been pre-processed, it can be divided into subbands according to the noise and speech spectra, and the over-subtraction coefficient of each subband is calculated separately.
The over-subtraction coefficient of the i-th subband is calculated by formula (3), in which the signal-to-noise ratio SNR_i of each subband is obtained by formula (4). The subband subtraction factor δ_i is calculated as in formula (5), which mainly accounts for the different amounts of speech information carried at different frequencies.
The spectral subtraction described above is a relatively early and well-established speech denoising algorithm. It exploits the fact that additive noise is uncorrelated with speech: under the assumption that the noise is statistically stationary, the noise spectrum estimated during speech-free gaps is taken to stand in for the noise spectrum during speech, and is subtracted from the noisy speech spectrum to obtain an estimate of the speech spectrum. Spectral subtraction is algorithmically simple, computationally light and easy to implement for fast processing, and tends to achieve a high output signal-to-noise ratio, which makes the voice noise suppression of this embodiment faster and more accurate.
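The multiband procedure above can be sketched in the style of the well-known Kamath–Loizou multiband spectral subtraction algorithm. Since formulas (3)–(5) are not reproduced in the text, the coefficient choices below (the bounds on `alpha_i`, the `delta_i` values, and the spectral floor `beta`) are assumptions for illustration, not the patent's values:

```python
import numpy as np

def multiband_spectral_subtraction(noisy_mag, noise_mag, band_edges, beta=0.002):
    """One-frame multiband spectral subtraction sketch.

    noisy_mag  : smoothed noisy magnitude spectrum |X̄(k)|
    noise_mag  : estimated noise magnitude spectrum |D̂(k)|
    band_edges : list of (n, m) bin index ranges, one per subband
    beta       : spectral floor, limits musical noise (assumed value)
    """
    out = np.empty_like(noisy_mag)
    for i, (n, m) in enumerate(band_edges):
        x2 = noisy_mag[n:m] ** 2
        d2 = noise_mag[n:m] ** 2
        # subband SNR in dB, cf. formula (4)
        snr_i = 10.0 * np.log10(np.sum(x2) / np.sum(d2))
        # over-subtraction factor alpha_i, larger at low SNR, cf. formula (3)
        alpha_i = np.clip(4.0 - 0.15 * snr_i, 1.0, 5.0)
        # additional subband factor delta_i: subtract less in the lowest
        # band, which carries more speech information, cf. formula (5)
        delta_i = 1.0 if i == 0 else 2.5
        y2 = x2 - alpha_i * delta_i * d2
        # where subtraction over-shoots, keep a small fraction of the noisy spectrum
        out[n:m] = np.sqrt(np.maximum(y2, beta * x2))
    return out
```

The enhanced magnitudes returned here would then be recombined with the original phase, as the text describes, to form Y(k).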
As an implementation, in order to enable the user or related personnel to obtain the speech-enhanced speech signal more conveniently and quickly, the voice noise suppression method further includes, after the step of "performing speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model": converting the speech-enhanced output signal into a time-domain speech signal; and amplifying the time-domain speech signal with a power amplifier before outputting it through a loudspeaker.

It should be understood that this embodiment can perform speech enhancement only on the frequency-domain speech signal of a single frequency band, or can apply the steps of this embodiment separately to the frequency-domain speech signals of multiple frequency bands of a section of speech, and then merge and output the enhancement results.
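The conversion back to the time domain described above, recombining the enhanced magnitude with the original noisy phase and inverse-transforming, might look like the following single-frame sketch (no overlap-add across frames; the function name is illustrative):

```python
import numpy as np

def to_time_domain(enhanced_mag, noisy_phase):
    """Recombine enhanced magnitude with the original noisy phase and
    inverse-transform one frame back to a real time-domain signal."""
    spectrum = enhanced_mag * np.exp(1j * noisy_phase)
    return np.fft.irfft(spectrum)
```

In a streaming implementation, successive frames would be windowed and overlap-added before being handed to the power amplifier and loudspeaker.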
Second embodiment
In order to better implement the voice noise suppression method provided by the first embodiment of the present invention, the second embodiment of the present invention further provides a voice noise suppression device 100.
Referring to FIG. 4, FIG. 4 is a block diagram of a voice noise suppression device provided by the second embodiment of the present invention.

The voice noise suppression device 100 includes an acoustic scene determining module 110, a parameter adjustment module 120 and a noise processing module 130.
Optionally, the voice noise suppression device 100 further includes a voice activity detection module 101, a noise analysis module 102 and a speech signal output module 140.

The voice activity detection module 101 is configured to obtain the voice activity detection result of the frequency-domain speech signal.

The noise analysis module 102 is configured to calculate the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal according to the voice activity detection result.

The acoustic scene determining module 110 is configured to determine the acoustic scene corresponding to the frequency-domain speech signal according to the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal.

The parameter adjustment module 120 is configured to adjust the parameters of the noise processing model according to the acoustic scene.

The noise processing module 130 is configured to perform speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model.

The speech signal output module 140 is configured to convert the speech-enhanced output signal into a time-domain speech signal, amplify the time-domain speech signal with a power amplifier, and output it through a loudspeaker.
It is apparent to those skilled in the art that, for convenience and brevity of description, for the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method, which is not repeated here.
Third embodiment
Please refer to FIG. 5. FIG. 5 is a structural block diagram of an electronic device applicable to the embodiments of the present application, provided by the third embodiment of the present invention.
The electronic device 200 may include the voice noise suppression device 100, a memory 201, a storage controller 202, a processor 203, a peripheral interface 204, an input-output unit 205, an audio unit 206 and a display unit 207.

The memory 201, storage controller 202, processor 203, peripheral interface 204, input-output unit 205, audio unit 206 and display unit 207 are electrically connected to one another, directly or indirectly, to realize the transmission or interaction of data. For example, these elements can be electrically connected to one another through one or more communication buses or signal lines. The voice noise suppression device 100 includes at least one software function module that can be stored in the memory 201 in the form of software or firmware, or solidified in the operating system (OS) of the voice noise suppression device 100. The processor 203 is configured to execute the executable modules stored in the memory 201, such as the software function modules or computer programs included in the voice noise suppression device 100.
The memory 201 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), etc. The memory 201 is configured to store a program, and the processor 203 executes the program after receiving an execution instruction. The method performed by the server defined by the flow disclosed in any of the foregoing embodiments of the present invention can be applied to, or implemented by, the processor 203.
The processor 203 can be an integrated circuit chip with signal processing capability. The processor 203 can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps and logic diagrams disclosed in the embodiments of the present invention. The general-purpose processor can be a microprocessor, or the processor 203 can be any conventional processor.
The peripheral interface 204 couples various input/output devices to the processor 203 and the memory 201. In some embodiments, the peripheral interface 204, the processor 203 and the storage controller 202 can be implemented in a single chip; in other examples, they can each be implemented by an independent chip.

The input-output unit 205 is configured to provide input data to the user so as to realize the interaction between the user and the server (or local terminal). The input-output unit 205 may be, but is not limited to, a mouse, a keyboard and the like.

The audio unit 206 provides an audio interface to the user and may include one or more microphones, one or more loudspeakers and an audio circuit.

The display unit 207 provides an interactive interface (for example a user operation interface) between the electronic device 200 and the user, or displays image data for the user's reference. In this embodiment, the display unit 207 can be a liquid crystal display or a touch display. If it is a touch display, it can be a capacitive or resistive touch screen supporting single-point and multi-point touch operations. Supporting single-point and multi-point touch operations means that the touch display can sense touch operations generated simultaneously at one or more positions on the touch display, and hand the sensed touch operations over to the processor 203 for calculation and processing.
It can be understood that the structure shown in FIG. 5 is only schematic; the electronic device 200 may include more or fewer components than shown in FIG. 5, or have a configuration different from that shown in FIG. 5. Each component shown in FIG. 5 can be implemented in hardware, software or a combination thereof.

It is apparent to those skilled in the art that, for convenience and brevity of description, for the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method, which is not repeated here.
In conclusion, the embodiments of the present invention provide a voice noise suppression method and device. The voice noise suppression method operates on frequency-domain speech signals, which are easier for a processing device to analyze and handle, improving the speed and accuracy of speech signal processing. Meanwhile, the voice noise suppression method adjusts the noise processing model for different acoustic scenes, so that the method adapts to acoustic scenes more accurately, realizes more targeted noise suppression, and improves the noise suppression effect. Further, the acoustic scene decision is made from the noise estimation result and the signal-to-noise ratio, which increases both the accuracy and the speed of the decision, thereby improving the effect and efficiency of voice noise suppression.
In the several embodiments provided in the present application, it should be understood that the disclosed device and method can also be realized in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architectures, functions and operations of the devices, methods and computer program products of multiple embodiments of the present invention. In this regard, each box in a flowchart or block diagram can represent a module, a program segment or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two consecutive boxes can in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the function modules in the embodiments of the present invention can be integrated to form an independent part, or each module can exist separately, or two or more modules can be integrated to form an independent part.

If the functions are realized in the form of software function modules and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which can be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention; for those skilled in the art, the invention may be variously modified and varied. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention. It should be noted that similar labels and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined and explained in subsequent drawings.

The above description is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any person familiar with the art can easily think of changes or replacements within the technical scope disclosed by the present invention, which shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
Claims (10)
1. A voice noise suppression method, characterized in that the voice noise suppression method comprises:
determining an acoustic scene corresponding to a frequency-domain speech signal according to a noise estimation result and a signal-to-noise ratio of the frequency-domain speech signal;
adjusting parameters of a noise processing model according to the acoustic scene;
performing speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model.
2. The voice noise suppression method according to claim 1, characterized in that, before determining the acoustic scene corresponding to the frequency-domain speech signal according to the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal, the voice noise suppression method further comprises:
converting a collected original time-domain speech signal into the frequency-domain speech signal through a simulated human-ear filter.
3. The voice noise suppression method according to claim 2, characterized in that, after converting the collected original time-domain speech signal into the frequency-domain speech signal through the simulated human-ear filter and before determining the acoustic scene corresponding to the frequency-domain speech signal according to the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal, the voice noise suppression method further comprises:
obtaining a voice activity detection result of the frequency-domain speech signal;
calculating the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal according to the voice activity detection result.
4. The voice noise suppression method according to claim 3, characterized in that obtaining the voice activity detection result of the frequency-domain speech signal comprises:
performing voice activity detection on the frequency-domain speech signal so as to divide the frequency-domain speech signal into voiced segments and unvoiced segments, and taking the voiced segments and the unvoiced segments as the voice activity detection result of the frequency-domain speech signal, wherein a voiced segment is a frequency range containing both a speech signal and a noise signal, and an unvoiced segment is a frequency range containing only a noise signal.
5. The voice noise suppression method according to claim 4, characterized in that calculating the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal according to the voice activity detection result comprises:
comparing and calculating the energy features of the voiced segments and the unvoiced segments to obtain the noise estimation result and the signal-to-noise ratio.
6. The voice noise suppression method according to any one of claims 1-5, characterized in that the noise processing model comprises a noise suppression sub-model and a human-ear acoustic masking sub-model, and adjusting the parameters of the noise processing model according to the acoustic scene and performing speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model comprises:
determining an estimation masking threshold of the human-ear acoustic masking sub-model according to the acoustic scene, and filtering out an auditory-perceptible frequency-domain speech signal from the frequency-domain speech signal using the human-ear acoustic masking sub-model;
performing spectral-subtraction-based noise suppression processing on the auditory-perceptible frequency-domain speech signal using the noise suppression sub-model, to obtain a speech-enhanced output signal.
7. The voice noise suppression method according to claim 6, characterized in that, after performing speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model, the voice noise suppression method further comprises:
converting the speech-enhanced output signal into a time-domain speech signal;
amplifying the time-domain speech signal with a power amplifier and then outputting it through a loudspeaker.
8. A voice noise suppression device, characterized in that the voice noise suppression device comprises:
an acoustic scene determining module, configured to determine an acoustic scene corresponding to a frequency-domain speech signal according to a noise estimation result and a signal-to-noise ratio of the frequency-domain speech signal;
a parameter adjustment module, configured to adjust parameters of a noise processing model according to the acoustic scene;
a noise processing module, configured to perform speech enhancement on the frequency-domain speech signal according to the adjusted noise processing model.
9. The voice noise suppression device according to claim 8, characterized in that the voice noise suppression device further comprises:
a voice activity detection module, configured to obtain a voice activity detection result of the frequency-domain speech signal;
a noise analysis module, configured to calculate the noise estimation result and signal-to-noise ratio of the frequency-domain speech signal according to the voice activity detection result.
10. A storage medium, characterized in that computer program instructions are stored in the storage medium, and when the computer program instructions are read and run by a processor, the steps of the method according to any one of claims 1-7 are performed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810692665.1A CN108766454A (en) | 2018-06-28 | 2018-06-28 | A kind of voice noise suppressing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108766454A true CN108766454A (en) | 2018-11-06 |
Family
ID=63974574
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810692665.1A Pending CN108766454A (en) | 2018-06-28 | 2018-06-28 | A kind of voice noise suppressing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108766454A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109671446A (en) * | 2019-02-20 | 2019-04-23 | 西华大学 | A kind of deep learning sound enhancement method based on absolute hearing threshold |
CN110197670A (en) * | 2019-06-04 | 2019-09-03 | 大众问问(北京)信息科技有限公司 | Audio defeat method, apparatus and electronic equipment |
CN110544468A (en) * | 2019-08-23 | 2019-12-06 | Oppo广东移动通信有限公司 | Application awakening method and device, storage medium and electronic equipment |
WO2020097820A1 (en) * | 2018-11-14 | 2020-05-22 | 深圳市大疆创新科技有限公司 | Wind noise processing method, device, and system employing multiple microphones, and storage medium |
CN111261183A (en) * | 2018-12-03 | 2020-06-09 | 珠海格力电器股份有限公司 | Method and device for denoising voice |
CN111477241A (en) * | 2020-04-15 | 2020-07-31 | 南京邮电大学 | Layered self-adaptive denoising method and system for household noise environment |
CN111564161A (en) * | 2020-04-28 | 2020-08-21 | 长沙世邦通信技术有限公司 | Sound processing device and method for intelligently suppressing noise, terminal equipment and readable medium |
CN111796790A (en) * | 2019-04-09 | 2020-10-20 | 深圳市冠旭电子股份有限公司 | Sound effect adjusting method and device, readable storage medium and terminal equipment |
CN112165590A (en) * | 2020-09-30 | 2021-01-01 | 联想(北京)有限公司 | Video recording implementation method and device and electronic equipment |
CN112185410A (en) * | 2020-10-21 | 2021-01-05 | 北京猿力未来科技有限公司 | Audio processing method and device |
CN112309418A (en) * | 2020-10-30 | 2021-02-02 | 出门问问(苏州)信息科技有限公司 | Method and device for inhibiting wind noise |
CN112349291A (en) * | 2020-09-29 | 2021-02-09 | 成都千立网络科技有限公司 | Sound amplification system and method based on AI noise reduction model |
CN112420073A (en) * | 2020-10-12 | 2021-02-26 | 北京百度网讯科技有限公司 | Voice signal processing method, device, electronic equipment and storage medium |
CN112951259A (en) * | 2021-03-01 | 2021-06-11 | 杭州网易云音乐科技有限公司 | Audio noise reduction method and device, electronic equipment and computer readable storage medium |
CN112992153A (en) * | 2021-04-27 | 2021-06-18 | 太平金融科技服务(上海)有限公司 | Audio processing method, voiceprint recognition device and computer equipment |
CN113257272A (en) * | 2021-06-29 | 2021-08-13 | 深圳小米通讯技术有限公司 | Voice signal processing method and device, electronic equipment and storage medium |
WO2023138252A1 (en) * | 2022-01-24 | 2023-07-27 | Oppo广东移动通信有限公司 | Audio signal processing method and apparatus, earphone device, and storage medium |
WO2024041512A1 (en) * | 2022-08-25 | 2024-02-29 | 维沃移动通信有限公司 | Audio noise reduction method and apparatus, and electronic device and readable storage medium |
CN117746828A (en) * | 2024-02-20 | 2024-03-22 | 华侨大学 | Noise masking control method, device, equipment and medium for open office |
CN113949955B (en) * | 2020-07-16 | 2024-04-09 | Oppo广东移动通信有限公司 | Noise reduction processing method and device, electronic equipment, earphone and storage medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101777349A (en) * | 2009-12-08 | 2010-07-14 | 中国科学院自动化研究所 | Auditory perception property-based signal subspace microphone array voice enhancement method |
CN102014205A (en) * | 2010-11-19 | 2011-04-13 | 中兴通讯股份有限公司 | Method and device for treating voice call quality |
CN103077725A (en) * | 2012-12-31 | 2013-05-01 | 东莞宇龙通信科技有限公司 | Speech processing method and device |
CN103456301A (en) * | 2012-05-28 | 2013-12-18 | 中兴通讯股份有限公司 | Ambient sound based scene recognition method and device and mobile terminal |
CN103617797A (en) * | 2013-12-09 | 2014-03-05 | 腾讯科技(深圳)有限公司 | Voice processing method and device |
CN104575511A (en) * | 2013-10-22 | 2015-04-29 | 陈卓 | Voice enhancement method and device |
CN106128451A (en) * | 2016-07-01 | 2016-11-16 | 北京地平线机器人技术研发有限公司 | Method for voice recognition and device |
CN107910011A (en) * | 2017-12-28 | 2018-04-13 | 科大讯飞股份有限公司 | A kind of voice de-noising method, device, server and storage medium |
US10021507B2 (en) * | 2013-05-24 | 2018-07-10 | Barco Nv | Arrangement and method for reproducing audio data of an acoustic scene |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108766454A (en) | A kind of voice noise suppressing method and device | |
CN106486131B (en) | A kind of method and device of speech de-noising | |
US9601119B2 (en) | Systems and methods for segmenting and/or classifying an audio signal from transformed audio information | |
US9666183B2 (en) | Deep neural net based filter prediction for audio event classification and extraction | |
CN103827965B (en) | Adaptive voice intelligibility processor | |
EP1973104B1 (en) | Method and apparatus for estimating noise by using harmonics of a voice signal | |
CN109313909B (en) | Method, device, apparatus and system for evaluating consistency of microphone array | |
CN105261359B (en) | The noise-canceling system and noise-eliminating method of mobile microphone | |
CN103026407A (en) | A bandwidth extender | |
TR201810466T4 (en) | Apparatus and method for processing an audio signal to improve speech using feature extraction. | |
CN103903634B (en) | The detection of activation sound and the method and apparatus for activating sound detection | |
CN111128214A (en) | Audio noise reduction method and device, electronic equipment and medium | |
EP3316256A1 (en) | Voice activity modification frame acquiring method, and voice activity detection method and apparatus | |
KR20220062598A (en) | Systems and methods for generating audio signals | |
CN104916292B (en) | Method and apparatus for detecting audio signals | |
US20140321655A1 (en) | Sensitivity Calibration Method and Audio Device | |
CN110390947B (en) | Method, system, device and storage medium for determining sound source position | |
CN109979476A (en) | A kind of method and device of speech dereverbcration | |
Mack et al. | Single-Channel Dereverberation Using Direct MMSE Optimization and Bidirectional LSTM Networks. | |
Tian et al. | Spoofing detection under noisy conditions: a preliminary investigation and an initial database | |
Fraile et al. | Mfcc-based remote pathology detection on speech transmitted through the telephone channel-impact of linear distortions: Band limitation, frequency response and noise | |
Shankar et al. | Noise dependent super gaussian-coherence based dual microphone speech enhancement for hearing aid application using smartphone | |
CN113593604A (en) | Method, device and storage medium for detecting audio quality | |
Chen et al. | Neuromorphic pitch based noise reduction for monosyllable hearing aid system application | |
Dai et al. | An improved model of masking effects for robust speech recognition system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
20200617 | TA01 | Transfer of patent application right | Effective date of registration: 20200617. Address after: 317028 Zhang Village No., Cang Town, Taizhou City, Zhejiang Province. Applicant after: Taizhou Zhige Electronic Technology Co., Ltd. Address before: 318000 Zhangjia Du 2-110, Gugang Town, Linghai City, Taizhou, Zhejiang. Applicant before: ZHEJIANG FEIGE ELECTRONIC TECHNOLOGY Co., Ltd. |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181106 |