CN104778951A - Speech enhancement method and device

Info

Publication number: CN104778951A
Application number: CN201510159358.3A
Authority: CN
Inventors: 周璇; 夏丙寅; 苗磊
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2015-04-07
Filing date: 2015-04-07
Publication date: 2015-07-15

Abstract

The invention discloses a speech enhancement method and device. The method comprises steps as follows: the characteristic quantity of noise in a silence section of a speech signal is acquired; a noise class matched with the noise in the silence section is determined from multiple preset noise classes according to the characteristic quantity of the noise in the silence section, and the multiple noise classes are acquired after clustering of multiple noise samples according to characteristic information of the multiple noise samples; a noise model corresponding to the noise class matched with the noise in the silence section is determined according to the noise class matched with the noise in the silence section as well as the mapping relation between the noise class and the noise model; speech enhancement is performed on the speech signal according to the noise model corresponding to the noise class matched with the noise in the silence section. With the adoption of the speech enhancement method and device, speech enhancement is performed on the speech signal according to the noise model corresponding to the noise class, and the speech enhancement effect can be improved.

Description

Method and device for speech enhancement

Technical field

The embodiments of the present invention relate to the field of speech processing, and more specifically, to a method and a device for speech enhancement.

Background

With the rapid development of communication and network technologies, voice communication has moved far beyond the traditional landline telephone and is now widely used in mobile communication, video conferencing and teleconferencing, in-car hands-free communication, IP telephony, and many other fields. In these applications, keeping the speech signal clear and of high quality, and removing the various kinds of noise that these new forms of voice communication introduce into the speech signal, is a challenging problem.

At present, the greatest difficulty for speech enhancement in voice communication is that the noise environment is unknown (that is, its statistical properties are uncertain). The most widely used and studied approach in the prior art is speech enhancement based on the hidden Markov model (HMM). HMM-based speech enhancement performs well, but it relies heavily on externally supplied prior knowledge, and its performance is poor when the noise type is unknown or when the noise changes.

To overcome this shortcoming, research has proposed training noise models from collected noise samples, so that multiple noise samples and their noise models form a mapping list between noise samples and noise models. The noise sample that matches the noise in the input noisy speech signal is determined first, and the matching noise model is then determined from the mapping list. Performing HMM-based speech enhancement with the matched noise model improves the robustness of the algorithm to some extent.

However, the mapping list contains only a limited number of noise samples and noise models. When the mapping list contains no noise sample reasonably close to the noise in the input speech signal, the noise model determined from the list gives poor enhancement, so the generality of this method needs to be improved. In addition, the method must train and build a separate model for every kind of noise, which requires a large amount of storage space and limits its practical usefulness.

Summary of the invention

The embodiments of the present invention provide a method and a device for speech enhancement that can improve the effect of speech enhancement.

According to a first aspect, a speech enhancement method is provided, comprising:

obtaining a characteristic quantity of the noise in a silence section of a speech signal;

determining, according to the characteristic quantity of the noise in the silence section, a noise class that matches the noise in the silence section from a plurality of preset noise classes, where the plurality of noise classes are obtained by clustering a plurality of noise samples according to the characteristic quantities of the plurality of noise samples;

determining, according to the noise class that matches the noise in the silence section and a mapping relation between noise classes and noise models, the noise model corresponding to the noise class that matches the noise in the silence section; and

performing speech enhancement on the speech signal according to the noise model corresponding to the noise class that matches the noise in the silence section.

With reference to the first aspect, in a first possible implementation of the first aspect, the characteristic quantity comprises at least one of a noise average spectral entropy, a noise normalized critical-band energy ratio, and a noise average zero-crossing rate.

With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the plurality of noise samples comprises n noise samples, and the method further comprises:

obtaining the n noise samples and calculating the characteristic quantity of each of the n noise samples;

clustering the n noise samples into m noise classes according to the characteristic quantity of each of the n noise samples;

training on the m noise classes to obtain the noise models corresponding to the m noise classes; and

mapping the m noise classes and the noise models corresponding to the m noise classes into the mapping relation between noise classes and noise models, where m is less than n.

With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the clustering of the n noise samples into m noise classes according to the characteristic quantity of each of the n noise samples comprises:

selecting m noise samples from the n noise samples as m noise cluster centroids; and

for each of the remaining n-m noise samples among the n noise samples, calculating the distances from the characteristic quantity of that noise sample to the characteristic quantities of the m noise cluster centroids, and assigning that noise sample to the noise class corresponding to the nearest noise cluster centroid.

With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the selecting of m noise samples from the n noise samples as m noise cluster centroids comprises:

selecting the m noise samples from the n noise samples as the m noise cluster centroids according to the collection sources of the n noise samples; or

selecting the m noise samples from the n noise samples as the m noise cluster centroids according to the magnitude of the noise average spectral entropy of each of the n noise samples.

With reference to any one of the second to the fourth possible implementations of the first aspect, in a fifth possible implementation of the first aspect, the training on the m noise classes to obtain the noise models corresponding to the m noise classes comprises:

for each of the m noise classes, combining the noise samples in that noise class into noise class training data according to a preset rule, where the noise class training data of any two of the m noise classes are equal in length; and

training separately on the noise class training data of each of the m noise classes to obtain the noise model corresponding to each of the m noise classes.

With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the combining, for each of the m noise classes, of the noise samples in that noise class into noise class training data according to a preset rule comprises:

for any one of the m noise classes, when that noise class comprises a plurality of noise samples, combining the plurality of noise samples in that noise class into the noise class training data in equal proportions.

With reference to any one of the second to the sixth possible implementations of the first aspect, in a seventh possible implementation of the first aspect, the determining, according to the characteristic quantity of the noise in the silence section, of a noise class that matches the noise in the silence section from a plurality of preset noise classes comprises:

comparing the characteristic quantity of the noise in the silence section with the characteristic quantities of the noise cluster centroids of the m noise classes, and determining the noise class of the noise cluster centroid nearest to the noise in the silence section as the noise class that matches the noise in the silence section.

With reference to the first aspect or any one of the first to the seventh possible implementations of the first aspect, in an eighth possible implementation of the first aspect, the determining, according to the characteristic quantity of the noise in the silence section, of a noise class that matches the noise in the silence section from a plurality of preset noise classes comprises:

determining, according to the characteristic quantity of the noise in the silence section, whether a noise sample exists that matches the noise in the silence section; and

when no noise sample matches the noise in the silence section, determining, according to the characteristic quantity of the noise in the silence section, the noise class that matches the noise in the silence section from the plurality of preset noise classes.

According to a second aspect, a speech enhancement device is provided, comprising:

a first acquisition module, configured to obtain a characteristic quantity of the noise in a silence section of a speech signal;

a first determination module, configured to determine, according to the characteristic quantity of the noise in the silence section obtained by the first acquisition module, a noise class that matches the noise in the silence section from a plurality of preset noise classes, where the plurality of noise classes are obtained by clustering a plurality of noise samples according to the characteristic quantities of the plurality of noise samples;

a second determination module, configured to determine, according to the noise class that matches the noise in the silence section as determined by the first determination module and a mapping relation between noise classes and noise models, the noise model corresponding to the noise class that matches the noise in the silence section; and

an enhancement module, configured to perform speech enhancement on the speech signal according to the noise model, determined by the second determination module, that corresponds to the noise class that matches the noise in the silence section.

With reference to the second aspect, in a first possible implementation of the second aspect, the characteristic quantity comprises at least one of a noise average spectral entropy, a noise normalized critical-band energy ratio, and a noise average zero-crossing rate.

With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the plurality of noise samples comprises n noise samples, and the device further comprises:

a second acquisition module, configured to obtain the n noise samples and calculate the characteristic quantity of each of the n noise samples;

a clustering module, configured to cluster the n noise samples into m noise classes according to the characteristic quantity of each of the n noise samples obtained by the second acquisition module;

a training module, configured to train on the m noise classes obtained by the clustering module to obtain the noise models corresponding to the m noise classes; and

a mapping module, configured to map the m noise classes obtained by the clustering module and the noise models corresponding to the m noise classes obtained by the training module into the mapping relation between noise classes and noise models, where m is less than n.

With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the clustering module is specifically configured to:

With reference to the third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the selecting, by the clustering module, of m noise samples from the n noise samples as m noise cluster centroids comprises:

With reference to any one of the second to the fourth possible implementations of the second aspect, in a fifth possible implementation of the second aspect, the training module is specifically configured to:

for each of the m noise classes, combine the noise samples in that noise class into noise class training data according to a preset rule, where the noise class training data corresponding to any two of the m noise classes are equal in length;

With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the combining, by the training module for each of the m noise classes, of the noise samples in that noise class into noise class training data according to a preset rule comprises:

With reference to any one of the second to the sixth possible implementations of the second aspect, in a seventh possible implementation of the second aspect, the first determination module is specifically configured to:

With reference to the second aspect or any one of the first to the seventh possible implementations of the second aspect, in an eighth possible implementation of the second aspect, the first determination module is specifically configured to:

With the method and device for speech enhancement provided by the embodiments of the present invention, in the training stage, a plurality of noise samples are clustered into noise classes and the noise samples in each noise class are trained to obtain the noise model corresponding to that noise class. In the enhancement stage, the characteristic quantity of the noise in a silence section of the speech signal to be enhanced is obtained, and the noise model corresponding to the noise class that matches the noise in the silence section is determined from that characteristic quantity. This noise model is closer to the real noise, so performing speech enhancement on the speech signal with the matched noise model can improve the effect of speech enhancement.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a speech enhancement method according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of a speech enhancement method according to an embodiment of the present invention.

Fig. 3 is a schematic flowchart of a speech enhancement method according to an embodiment of the present invention.

Fig. 4 is a schematic block diagram of a speech enhancement device according to an embodiment of the present invention.

Fig. 5 is a schematic block diagram of a speech enhancement device according to an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

First, the existing speech enhancement method is briefly introduced. It comprises two stages: a training stage and an enhancement stage. The training stage proceeds as follows: n noise samples are collected, and a noise segment of the same duration is cut from each of the n noise samples, forming n pieces of noise training data. The n pieces of noise training data are trained separately to obtain the n noise models corresponding to the n noise samples, and the n noise samples and n noise models form a mapping list between noise samples and noise models. Training on a piece of noise training data yields multiple parameters corresponding to that noise sample, and applying different weighted combinations to these parameters yields multiple noise signals; that is, the noise model corresponding to each noise sample may comprise multiple noise signals.

The enhancement stage proceeds as follows: a noisy speech signal is input and spectrum analysis is performed on it. The noise sample that matches the noise in the silence section of the speech signal is determined from the mapping list between noise samples and noise models obtained in the training stage, and the noise model corresponding to that noise sample is determined from the matched noise sample and the mapping list. A suitable noise signal in the noise model is then determined by calculation. The speech signal to be enhanced and the noise signal in the determined noise model are used as inputs to the HMM-based speech enhancement method, which enhances the speech signal and outputs the enhanced speech signal.

Fig. 1 shows a schematic flowchart of a speech enhancement method 100 according to an embodiment of the present invention. The method 100 is performed by a speech enhancement device or apparatus, and comprises:

S101: obtain a characteristic quantity of the noise in a silence section of a speech signal;

S102: according to the characteristic quantity of the noise in the silence section, determine a noise class that matches the noise in the silence section from a plurality of preset noise classes, where the plurality of noise classes are obtained by clustering a plurality of noise samples according to characteristic information of the plurality of noise samples;

S103: according to the noise class that matches the noise in the silence section and a mapping relation between noise classes and noise models, determine the noise model corresponding to the noise class that matches the noise in the silence section;

S104: perform speech enhancement on the speech signal according to the noise model corresponding to the noise class that matches the noise in the silence section.

Therefore, with the speech enhancement method provided by the embodiments of the present invention, in the training stage, a plurality of noise samples are clustered into noise classes and the noise samples in each noise class are trained to obtain the noise model corresponding to that noise class. In the enhancement stage, the characteristic quantity of the noise in a silence section of the speech signal to be enhanced is obtained, and the noise model corresponding to the noise class that matches the noise in the silence section is determined from that characteristic quantity. This noise model is closer to the real noise, so performing speech enhancement on the speech signal with the matched noise model can improve the effect of speech enhancement.

Specifically, in the training stage of the speech enhancement method 100 according to the embodiments of the present invention, the obtained noise samples are clustered into noise classes, and the noise samples in each noise class are trained to obtain the noise model corresponding to that noise class. For example, n noise samples are clustered into m noise classes, and the m noise classes are trained to obtain the noise models corresponding to the m noise classes. The mapping relation between the m noise classes and their corresponding noise models is then established.

Many algorithms can be used to cluster the noise samples, for example a hierarchical clustering algorithm, a self-organizing map (SOM) clustering algorithm, a fuzzy C-means (FCM) clustering algorithm, or an algorithm similar in spirit to the K-means clustering algorithm. The embodiments of the present invention do not limit the clustering algorithm. The noise samples are clustered on the basis of their characteristic quantities. The characteristic quantity may be chosen from features that are used to distinguish noise types and that are independent of the absolute energy of the noise.

Optionally, in an embodiment, the characteristic quantity may comprise at least one of the noise average spectral entropy, the noise normalized critical-band energy ratio, and the noise average zero-crossing rate, and may also comprise other features usable for clustering. The embodiments of the present invention do not limit this.

In the enhancement stage of the speech enhancement method 100 according to the embodiments of the present invention, the characteristic quantity of the noise in a silence section of the speech signal is obtained. It may, for example, comprise at least one of the noise average spectral entropy, the noise normalized critical-band energy ratio, and the noise average zero-crossing rate. According to the characteristic quantity of the noise in the silence section, the noise class that matches the noise in the silence section is determined from the m noise classes obtained in the training stage of the method 100 (that is, the plurality of preset noise classes). The noise model corresponding to the matched noise class is then determined from the matched noise class and the mapping relation between noise classes and noise models, and speech enhancement is performed on the speech signal.

It should be understood that the matching noise class may be determined from the m noise classes by comparing the characteristic quantity of the noise in the silence section with the characteristic quantity of each of the m noise cluster centroids of the m noise classes, and taking the noise class of the noise cluster centroid nearest to the characteristic quantity of the noise in the silence section as the matching noise class. In the embodiments of the present invention, speech enhancement of the speech signal according to the noise model corresponding to the noise class may use an HMM-based speech enhancement method or another speech enhancement method; the embodiments of the present invention do not limit this.
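The nearest-centroid matching described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the Euclidean distance, the two-dimensional feature vectors, and the class labels and model names are all assumptions made for the example, since the text does not fix a distance measure.

```python
import math
from typing import Dict, Sequence


def match_noise_class(feature: Sequence[float],
                      centroids: Dict[str, Sequence[float]]) -> str:
    """Return the label of the noise class whose centroid is nearest to `feature`.

    Euclidean distance is an assumption; the source only says "nearest".
    """
    return min(centroids, key=lambda label: math.dist(feature, centroids[label]))


# Hypothetical centroids and class-to-model mapping, for illustration only.
centroids = {"babble": [0.80, 0.30], "car": [0.40, 0.05], "white": [0.99, 0.50]}
models = {"babble": "model_babble", "car": "model_car", "white": "model_white"}

label = match_noise_class([0.45, 0.10], centroids)  # feature of the silence-section noise
model = models[label]                               # lookup in the class-to-model mapping
```

The lookup in `models` corresponds to step S103; whatever enhancement method is used in S104 would then receive `model`.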

It should also be understood that the method 100 may or may not include a process of comparing the noise in the silence section with the n noise samples, selecting a matching noise sample, and only then determining the noise model corresponding to the matching noise class. When the method 100 does not include the process of determining a noise sample that matches the noise in the silence section, the speech enhancement device need not train the n noise samples individually or store the noise models corresponding to the n noise samples, which effectively reduces both the complexity of offline training and the online storage consumption. If n noise samples are divided into m noise classes (m < n), and the same number of states and the same number of mixture components are used in training, then storing the noise models of the m noise classes reduces the online storage consumption by a fraction (n-m)/n compared with storing the noise models of the n noise samples.
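As a small worked example of the storage saving, under the stated condition that every model has the same size (same number of states and mixture components), the reduction depends only on n and m. The function name and the numbers below are illustrative assumptions.

```python
def storage_reduction(n: int, m: int) -> float:
    """Fraction of model storage saved when n per-sample models are
    replaced by m per-class models of the same size: (n - m) / n."""
    if not 0 < m < n:
        raise ValueError("expected 0 < m < n")
    return (n - m) / n


# Hypothetical figures: 20 noise samples clustered into 5 noise classes.
saving = storage_reduction(20, 5)
```

With these figures three quarters of the model storage is saved, because only 5 of the original 20 equally sized models need to be kept.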

Optionally, in an embodiment, S102, determining according to the characteristic quantity of the noise in the silence section a noise class that matches the noise in the silence section from a plurality of preset noise classes, comprises:

determining, according to the characteristic quantity of the noise in the silence section, whether a noise sample exists that matches the noise in the silence section; and

when no noise sample matches the noise in the silence section, determining, according to the characteristic quantity of the noise in the silence section, the noise class that matches the noise in the silence section from the plurality of preset noise classes.

Specifically, before the noise class that matches the noise in the silence section of the speech signal is determined, it may first be determined whether a noise sample exists that matches the noise in the silence section. The noise sample may be one of the noise samples in a mapping list between noise samples and noise models. That mapping list may be obtained by the training stage of the existing speech enhancement method introduced above and cover n noise samples, but the embodiments of the present invention are not limited to this.

Whether a noise sample matches the noise in the silence section may be determined by extracting the characteristic quantity of the noise in the silence section and judging whether any of the n noise samples has a characteristic quantity that matches it. When a matching noise sample exists, the noise model is determined from the mapping list between noise samples and noise models, and speech enhancement is performed on the speech signal according to that noise model. When none of the n noise samples matches, the noise class that matches the noise in the silence section is determined, according to the characteristic quantity of the noise in the silence section, from the m noise classes obtained in the training stage of the method 100. The matching noise class may be determined from the m noise classes by comparing the characteristic quantity of the noise in the silence section with the characteristic quantity of each of the m noise cluster centroids of the m noise classes, and taking the noise class of the nearest noise cluster centroid as the matching noise class; the noise model corresponding to that noise class is then determined.
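The two-stage lookup described above, sample match first and noise class as fallback, might look like the sketch below. The text does not define when a noise sample "matches", so the distance threshold `match_threshold`, the Euclidean distance, and all names and values are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def select_model(feature: Sequence[float],
                 sample_feats: Dict[str, Sequence[float]],
                 sample_models: Dict[str, str],
                 class_centroids: Dict[str, Sequence[float]],
                 class_models: Dict[str, str],
                 match_threshold: float) -> Tuple[str, str, str]:
    """Return (source, label, model), where source is "sample" or "class"."""
    # Stage 1: is any individual noise sample close enough to count as a match?
    best = min(sample_feats,
               key=lambda s: math.dist(feature, sample_feats[s]),
               default=None)
    if best is not None and math.dist(feature, sample_feats[best]) <= match_threshold:
        return ("sample", best, sample_models[best])
    # Stage 2: no matching sample, so fall back to the nearest noise cluster centroid.
    cls = min(class_centroids, key=lambda c: math.dist(feature, class_centroids[c]))
    return ("class", cls, class_models[cls])


# Hypothetical data for illustration.
sample_feats = {"s1": [0.0, 0.0], "s2": [1.0, 1.0]}
sample_models = {"s1": "hmm_s1", "s2": "hmm_s2"}
class_centroids = {"c1": [0.2, 0.2], "c2": [0.9, 0.9]}
class_models = {"c1": "hmm_c1", "c2": "hmm_c2"}

hit = select_model([0.05, 0.0], sample_feats, sample_models,
                   class_centroids, class_models, match_threshold=0.1)
miss = select_model([0.6, 0.6], sample_feats, sample_models,
                    class_centroids, class_models, match_threshold=0.1)
```

In the first call a stored sample lies within the threshold, so its own model is used; in the second call no sample is close enough and the model of the nearest class is returned instead.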

Therefore, the speech enhancement method provided by the embodiments of the present invention first judges whether a matching noise sample exists and, when none exists, determines the matching noise class. The noise model corresponding to the noise class is closer to the real noise, and performing speech enhancement on the speech signal with this noise model can further improve the effect of speech enhancement.

Optionally, in an embodiment, the plurality of noise samples comprises n noise samples, and the method 100 further comprises:

obtaining the n noise samples and calculating the characteristic quantity of each of the n noise samples;

clustering the n noise samples into m noise classes according to the characteristic quantity of each of the n noise samples;

training on the m noise classes to obtain the noise models corresponding to the m noise classes; and

mapping the m noise classes and the noise models corresponding to the m noise classes into the mapping relation between noise classes and noise models, where m is less than n.

Specifically, n noise samples are obtained and the characteristic quantity of each is extracted. The characteristic quantity comprises at least one of three features: the noise average spectral entropy, the noise normalized critical-band energy ratio, and the noise average zero-crossing rate. The characteristic quantities are calculated as follows:

(a) Noise average spectral entropy

The signal of the noise sample is divided into L frames, each containing N frequency bins. The spectral entropy of each frame is calculated, and the average spectral entropy of the noise sample is then calculated from the per-frame values. To calculate the spectral entropy of a frame, the probability p_k of each frequency bin in the power spectrum of the noise sample must be calculated first. The probability of a frequency bin is the fraction of the total energy of all frequency bins in the frame that is carried by that bin. The probability p_k is obtained by normalizing the energy values of all frequency bins in the frame, that is:

p_k = \frac{D_k^2}{\sum_{i=0}^{N-1} D_i^2}, \quad k = 0, 1, 2, \ldots, N-1

(formula 1)

In formula 1, D_k^2 represents the noise power value of the corresponding frequency bin. The spectral entropy H_j of the j-th frame can then be expressed as:

H_j = -\sum_{k=0}^{N-1} p_k \log p_k

(formula 2)

The average spectral entropy of the noise sample can then be expressed as:

\bar{H} = \frac{\sum_{j=1}^{L} H_j}{L}

(formula 3)
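Formulas 1 to 3 can be sketched in a few lines. The sketch assumes that the per-frame power values D_k^2 are already available as lists, that the logarithm is the natural logarithm (the text does not state a base), and that a zero-probability bin contributes nothing to the entropy; the function names are illustrative.

```python
import math
from typing import List, Sequence


def frame_spectral_entropy(power: Sequence[float]) -> float:
    """Spectral entropy of one frame from its power values D_k^2 (formulas 1 and 2)."""
    total = sum(power)
    if total <= 0:
        return 0.0  # assumption: an all-zero frame is given zero entropy
    entropy = 0.0
    for d2 in power:
        p = d2 / total          # formula 1: normalized bin energy
        if p > 0:               # convention: 0 * log(0) = 0
            entropy -= p * math.log(p)
    return entropy


def average_spectral_entropy(power_frames: List[Sequence[float]]) -> float:
    """Mean of the per-frame entropies over all L frames (formula 3)."""
    return sum(frame_spectral_entropy(f) for f in power_frames) / len(power_frames)


# A flat frame has maximal entropy log(N); a single-peak frame has entropy 0.
h_bar = average_spectral_entropy([[1.0, 1.0, 1.0, 1.0], [4.0, 0.0, 0.0, 0.0]])
```

Because each frame is normalized by its own total energy, scaling the noise level leaves the feature unchanged, which fits the requirement that the characteristic quantity be independent of absolute energy.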

(b) Noise normalized critical-band energy ratio

The signal of the noise sample is divided into L frames, and each frame is then divided into t critical bands. The normalized energy of each critical band is the ratio of the energy of that critical band to the spectral energy of the whole band:

Br_w = \frac{\sum_{k=bl_w}^{bh_w} P(k)}{P_s}, \quad w = 1, 2, \ldots, t

(formula 4)

Here, bl_w and bh_w are the lower and upper frequency-bin limits of the w-th critical band; Br_w is the normalized energy of the w-th critical band; P(k) is the energy value of frequency bin k; and P_s is the spectral energy of the whole noise band. This gives a t-dimensional critical-band energy ratio feature vector for the current frame:

\{ Br_1, Br_2, Br_3, \ldots, Br_t \}  (formula 5)

Summing the critical-band energy ratio feature vectors of all L frames and dividing by L gives the noise normalized critical-band energy ratio feature vector used for clustering.
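The per-frame ratio of formula 4 and the averaging over L frames can be sketched as below. The band edges are passed in as inclusive bin-index pairs (bl_w, bh_w); the text does not list the critical-band boundaries, so the two bands used in the example are purely hypothetical, as are the function names.

```python
from typing import List, Sequence, Tuple


def band_energy_ratios(power: Sequence[float],
                       bands: Sequence[Tuple[int, int]]) -> List[float]:
    """Formula 4: energy of each band (bins bl..bh inclusive) over total energy P_s."""
    total = sum(power)
    if total <= 0:
        return [0.0] * len(bands)  # assumption: an all-zero frame yields zero ratios
    return [sum(power[bl:bh + 1]) / total for bl, bh in bands]


def average_band_energy_ratios(power_frames: Sequence[Sequence[float]],
                               bands: Sequence[Tuple[int, int]]) -> List[float]:
    """Sum the per-frame t-dimensional vectors (formula 5) and divide by L."""
    per_frame = [band_energy_ratios(frame, bands) for frame in power_frames]
    return [sum(column) / len(per_frame) for column in zip(*per_frame)]


# Two frames of four bins each, split into two hypothetical bands.
ratios = average_band_energy_ratios([[1.0, 1.0, 2.0, 4.0], [2.0, 2.0, 2.0, 2.0]],
                                    bands=[(0, 1), (2, 3)])
```

When the bands cover every bin exactly once, the components of each per-frame vector sum to 1, and so do the components of the averaged vector.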

(c) Noise average zero-crossing rate

The signal of the noise sample is divided into L frames. For each frame of the noise signal, the number of times M that consecutive sampling points change sign is counted; this is the zero-crossing rate of the frame. Summing the zero-crossing rates of all frames and dividing by L gives the average zero-crossing rate feature of the noise.
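A minimal sketch of the average zero-crossing rate follows. It assumes non-overlapping frames of a fixed length `frame_len`, drops any incomplete trailing frame, and treats a sample equal to zero as non-negative; none of these choices is specified in the text, and the function names are illustrative.

```python
from typing import List, Sequence


def frame_zero_crossings(frame: Sequence[float]) -> int:
    """Count sign changes M between consecutive samples of one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def average_zero_crossing_rate(signal: Sequence[float], frame_len: int) -> float:
    """Split the signal into L full frames and average their zero-crossing counts."""
    frames: List[Sequence[float]] = [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_zero_crossings(f) for f in frames) / len(frames)


# First frame alternates sign (3 crossings), second frame never crosses zero.
zcr = average_zero_crossing_rate([1, -1, 1, -1, 1, 1, 1, 1], frame_len=4)
```

Sign changes across a frame boundary are not counted here, which matches counting within each frame separately.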

After the characteristic quantities of the n noise samples are obtained, the n noise samples are clustered into m noise classes (where m < n) according to the characteristic quantity of each noise sample. The clustering algorithm may be a hierarchical clustering algorithm, a SOM clustering algorithm, a fuzzy C-means clustering algorithm, an algorithm similar in concept to the K-means clustering algorithm, or another clustering algorithm; the embodiments of the present invention impose no limitation on this. After the m noise classes are obtained by clustering, m sets of noise class training data are formed from the noise samples contained in each noise class, and the m sets of noise class training data are trained to obtain the noise models corresponding to the m noise classes. Then, the mapping relationship between the m noise classes and the noise models corresponding to the m noise classes is established.

The process of clustering n noise samples into m noise classes by an algorithm similar in concept to the K-means clustering algorithm is described in detail below. Correspondingly, as an embodiment, clustering the n noise samples into m noise classes according to the characteristic quantities of the n noise samples comprises:

selecting m noise samples from the n noise samples as m noise cluster centroids;

for each of the remaining n-m noise samples among the n noise samples, calculating the distance from the characteristic quantity of the noise sample to the characteristic quantity of each of the m noise cluster centroids, and assigning the noise sample to the noise class corresponding to the nearest noise cluster centroid.

Specifically, the steps of the clustering algorithm of the embodiment of the present invention are shown in Fig. 2 and comprise:

1. Select m noise samples from the n noise samples of Fig. 2A as m noise cluster centroids (shown schematically as the larger black dots in Fig. 2B). Optionally, the m noise samples may be selected from the n noise samples as the m noise cluster centroids according to the collection sources of the n noise samples, or according to the magnitudes of the noise average spectral entropies of the n noise samples.

Specifically, in one example, the collection source of each of the n noise samples is known when the samples are collected, for example office noise, white noise, and so on. The m noise samples are selected from the n noise samples by experience as the m noise cluster centroids. For example, representative noise samples such as white noise, office noise and valve noise are selected as noise cluster centroids. In another example, the m noise cluster centroids are chosen from the n noise samples according to the magnitude of the noise average spectral entropy of each noise sample. Each frame of each noise sample contains N frequency bins, so the average spectral entropy of a noise sample lies between 0 and log₂N. m spectral entropy nodes are chosen at even intervals within the range 0 to log₂N, and the noise samples among the n noise samples whose average spectral entropies are closest to the nodes are selected as the noise cluster centroids. In addition, the m noise samples may also be selected from the n noise samples as the m noise cluster centroids according to the noise normalized critical band energy ratio or the noise average zero-crossing rate of each of the n noise samples, or according to a combination of at least two of the noise average spectral entropy, the noise normalized critical band energy ratio and the noise average zero-crossing rate; the embodiments of the present invention impose no limitation on this.

Correspondingly, selecting m noise samples from the n noise samples as m noise cluster centroids comprises:

selecting the m noise samples from the n noise samples as the m noise cluster centroids according to the collection sources of the n noise samples; or

selecting the m noise samples from the n noise samples as the m noise cluster centroids according to the magnitudes of the noise average spectral entropies of the n noise samples.

2. For any one of the remaining n-m noise samples among the n noise samples, calculate the distance from the characteristic quantity of the noise sample to the characteristic quantity of each of the m noise cluster centroids, and assign the noise sample to the noise class corresponding to the nearest noise cluster centroid.

When the characteristic quantity is one of the noise average spectral entropy, the noise normalized critical band energy ratio and the noise average zero-crossing rate, for example the noise normalized critical band energy ratio, calculating the distance from the characteristic quantity of the noise sample to the characteristic quantity of each of the m noise cluster centroids may consist of calculating the difference between the noise normalized critical band energy ratio of the noise sample and that of each of the m noise cluster centroids, and assigning the noise sample to the noise class corresponding to the noise cluster centroid with the smallest absolute difference. When the characteristic quantity comprises at least two of the noise average spectral entropy, the noise normalized critical band energy ratio and the noise average zero-crossing rate, the distance from the characteristic quantity of the noise sample to the characteristic quantity of each of the m noise cluster centroids may be calculated as a weighted combination of the differences between the at least two features of the noise sample and the corresponding features of the noise cluster centroid, and the noise sample is then assigned to the noise class corresponding to the nearest noise cluster centroid. Preferably, the above weighting is an average weighting.

3. Repeat step 2 until each of the n-m noise samples has been assigned to one of the m noise classes (as shown in Fig. 2B, where the noise cluster centroids are shown schematically as larger black dots for ease of distinction).
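Steps 1 to 3 can be sketched as follows for the case where the centroids are chosen by average spectral entropy. All names are hypothetical, and several details are assumptions not fixed by the text: the m entropy nodes are placed at evenly spaced interior points of the range 0 to log₂N, a sample is never chosen as a centroid twice, and the distance is a weighted sum of absolute per-feature differences.

```python
import math


def pick_centroids_by_entropy(entropies, m, n_bins):
    """Step 1: pick m sample indices closest to m evenly spaced entropy nodes."""
    top = math.log2(n_bins)
    # Assumption: nodes are interior points that split [0, log2 N] evenly.
    nodes = [(i + 1) * top / (m + 1) for i in range(m)]
    chosen = []
    for node in nodes:
        candidates = [i for i in range(len(entropies)) if i not in chosen]
        chosen.append(min(candidates, key=lambda i: abs(entropies[i] - node)))
    return chosen


def weighted_distance(a, b, weights):
    """Assumed distance: weighted sum of absolute per-feature differences."""
    return sum(w * abs(x - y) for x, y, w in zip(a, b, weights))


def assign_to_classes(features, centroid_ids, weights):
    """Steps 2 and 3: assign every non-centroid sample to its nearest centroid."""
    classes = {c: [c] for c in centroid_ids}
    for i, feat in enumerate(features):
        if i in classes:  # centroids already belong to their own class
            continue
        nearest = min(
            centroid_ids,
            key=lambda c: weighted_distance(feat, features[c], weights),
        )
        classes[nearest].append(i)
    return classes
```

Unlike standard K-means, this sketch makes a single assignment pass and does not recompute the centroids, which matches the three steps as described.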

Optionally, as an embodiment, training the m noise classes to obtain the noise models corresponding to the m noise classes comprises:

for each of the m noise classes, combining the noise samples in the noise class into noise class training data according to a preset rule, wherein the noise class training data corresponding to any two of the m noise classes are equal in length;

training the noise class training data of each of the m noise classes separately to obtain the noise model corresponding to each of the m noise classes.

Specifically, after the m noise classes are obtained, the noise samples in each of the m noise classes are combined into noise class training data according to the preset rule. Optionally, for each of the m noise classes, combining the noise samples in the noise class into noise class training data according to the preset rule comprises:

for any one of the m noise classes, when the noise class contains a plurality of noise samples, combining the plurality of noise samples in the noise class into noise class training data in equal proportions. That is, the preset rule may be to combine the noise samples in a noise class into noise class training data in equal proportions. For example, when a noise class contains x noise samples:

(a) x = 1: the noise sample accounts for 100% of the noise class training data;

(b) x > 1: each noise sample accounts for (100/x)% of the noise class training data.

In addition, other proportion rules may be preset to determine the proportion of the noise class training data taken up by each noise sample. For example, when a noise class contains y noise samples:

(a) y = 1: the noise sample accounts for 100% of the noise class training data;

(b) y = 2: the two noise samples each account for 50% of the noise class training data;

(c) y > 2: the noise cluster centroid accounts for 50% of the noise class training data, and the other noise samples fill the remaining 50% in proportion to the normalized distance between their noise average spectral entropy and that of the noise cluster centroid; the closer a noise sample is to the noise cluster centroid, the larger its proportion.

It should be understood that the m sets of noise class training data obtained from the m noise classes are equal in length. The noise class training data corresponding to each of the m noise classes is trained to obtain the noise models corresponding to the m noise classes.
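The two proportion rules can be sketched as follows. The function names are hypothetical. For the case y > 2, the text does not give the exact mapping from normalized distance to proportion; the sketch assumes each non-centroid sample is weighted by one minus its share of the total distance, so that closer samples receive larger proportions.

```python
def equal_shares(sample_ids):
    """First rule: each of the x samples takes 1/x of the training data."""
    return {s: 1.0 / len(sample_ids) for s in sample_ids}


def centroid_weighted_shares(centroid_id, distances):
    """Second rule. `distances` maps each non-centroid sample to the distance
    between its average spectral entropy and that of the centroid."""
    y = len(distances) + 1
    if y == 1:
        return {centroid_id: 1.0}
    if y == 2:
        (other,) = distances
        return {centroid_id: 0.5, other: 0.5}
    total = sum(distances.values())
    # Assumed weighting: closeness = 1 - normalized distance.
    closeness = {
        s: (1.0 - d / total) if total > 0 else 1.0
        for s, d in distances.items()
    }
    norm = sum(closeness.values())
    shares = {centroid_id: 0.5}
    for s, c in closeness.items():
        shares[s] = 0.5 * c / norm  # the other samples fill the remaining 50%
    return shares
```

Multiplying each share by a common total length gives per-sample segment lengths, so that the training data of all m noise classes can be kept equal in length.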

Optionally, as an embodiment, determining, from the plurality of preset noise classes, the noise class matching the noise in the silence section according to the characteristic quantity of the noise in the silence section comprises:

comparing the characteristic quantity of the noise in the silence section with the characteristic quantities of the noise cluster centroids of the m noise classes, and determining the noise class of the noise cluster centroid nearest to the noise in the silence section as the noise class matching the noise in the silence section.

Specifically, in the enhancement stage of the speech enhancement method 100, the characteristic quantity of the noise in the silence section of the speech signal is obtained; according to the characteristic quantity, the noise cluster centroid nearest to the noise in the silence section is found among the m noise classes; the noise class corresponding to that nearest noise cluster centroid is the noise class matching the noise in the silence section.
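The matching step and the subsequent lookup in the mapping relationship can be sketched as follows. The function names, the dictionary-based mapping and the default equal weights are assumptions made for illustration; the distance is again taken as a weighted sum of absolute per-feature differences.

```python
def match_noise_class(feature, centroids, weights=None):
    """Return the id of the noise class whose centroid is nearest to `feature`.

    `centroids` maps a noise class id to the feature vector of its centroid.
    """
    if weights is None:
        # Assumption: equal weights when none are given.
        weights = [1.0 / len(feature)] * len(feature)

    def distance(class_id):
        return sum(
            w * abs(x - y)
            for w, x, y in zip(weights, feature, centroids[class_id])
        )

    return min(centroids, key=distance)


def select_noise_model(feature, centroids, class_to_model):
    """Look up the noise model of the matched class in the class-to-model mapping."""
    return class_to_model[match_noise_class(feature, centroids)]
```

For example, with centroids `{"white": [2.9, 0.45], "office": [2.1, 0.20]}` (illustrative values only), the feature `[2.2, 0.25]` is matched to `"office"`.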

The speech enhancement method of the embodiment of the present invention is described below with a detailed example. Fig. 3 shows a speech enhancement method 300 of the embodiment of the present invention. The method 300 comprises:

S301: input a noisy speech signal and extract the silence section of the speech signal.

S302: obtain the characteristic quantity of the noise in the silence section of the speech signal, the characteristic quantity comprising at least one of the noise average spectral entropy, the noise normalized critical band energy ratio and the noise average zero-crossing rate.

S303: determine the noise class matching the noise in the silence section according to the characteristic quantity of the noise in the silence section. That is, according to the characteristic quantity of the noise in the silence section, the noise cluster centroid nearest to the noise in the silence section is found among the m noise classes; the noise class corresponding to that nearest noise cluster centroid is the noise class matching the noise in the silence section.

S304: use the noise model corresponding to the noise class as one input of the HMM-based speech enhancement method.

S305: use the speech model as another input of the HMM-based speech enhancement method.

S306: perform spectrum analysis on the input noisy speech signal and calculate the power spectrum of the noisy speech, as one of the inputs of the HMM-based speech enhancement method.

S307: perform HMM-based speech enhancement, taking as inputs the noise model determined in S304 that corresponds to the noise class matching the noise in the silence section, the speech model input in S305, and the speech signal after the spectrum analysis of S306.

S308: perform Wiener filtering on the speech signal that has undergone the HMM-based speech enhancement of S307, according to that speech signal and the characteristic quantity of the noise obtained in S302.

S309: perform spectrum synthesis on the Wiener-filtered speech signal.

S310: output the enhanced speech signal after spectrum synthesis.

It should be understood that, in the embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
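The text does not specify the form of the Wiener filter used in S308. Purely as an illustration, the sketch below computes a textbook per-bin Wiener gain from a noisy power spectrum and a noise power estimate; the function name, the inputs and the optional gain floor are assumptions and are not taken from the embodiment.

```python
def wiener_gains(noisy_power, noise_power, floor=0.0):
    """Textbook Wiener gain per frequency bin: snr / (1 + snr).

    The a priori SNR is estimated here as max(P_y / P_n - 1, 0), a common
    simplification that is not stated in the source text.
    """
    gains = []
    for py, pn in zip(noisy_power, noise_power):
        if pn <= 0.0:
            gains.append(1.0)  # no noise estimated in this bin: pass through
            continue
        snr = max(py / pn - 1.0, 0.0)
        gains.append(max(snr / (1.0 + snr), floor))
    return gains
```

Each gain would be applied to the corresponding bin of the spectrum before the spectrum synthesis of S309; a small positive `floor` is often used in practice to limit musical noise.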

The speech enhancement method according to the embodiments of the present invention has been described in detail above with reference to Fig. 1 to Fig. 3. The speech enhancement device according to the embodiments of the present invention is described below with reference to Fig. 4 and Fig. 5.

Fig. 4 shows a speech enhancement device 400 according to an embodiment of the present invention. As shown in Fig. 4, the device 400 comprises:

a first acquisition module 401, configured to obtain the characteristic quantity of the noise in a silence section of a speech signal;

a first determination module 402, configured to determine, from a plurality of preset noise classes, the noise class matching the noise in the silence section according to the characteristic quantity of the noise in the silence section obtained by the first acquisition module 401, the plurality of noise classes being obtained by clustering a plurality of noise samples according to the characteristic quantities of the plurality of noise samples;

a second determination module 403, configured to determine the noise model corresponding to the noise class matching the noise in the silence section, according to the noise class matching the noise in the silence section determined by the first determination module 402 and the mapping relationship between noise classes and noise models;

an enhancement module 404, configured to perform speech enhancement on the speech signal according to the noise model, determined by the second determination module 403, that corresponds to the noise class matching the noise in the silence section.

Therefore, in the speech enhancement device provided by the embodiment of the present invention, in the training stage, the noise samples in the noise classes obtained by clustering a plurality of noise samples are trained to obtain the noise models corresponding to the noise classes; in the enhancement stage, the characteristic quantity of the noise in the silence section of the speech signal to be enhanced is obtained, and the noise model corresponding to the noise class matching the noise in the silence section is determined according to the obtained characteristic quantity. This noise model is closer to the real noise, so performing speech enhancement on the speech signal according to the matched noise model can improve the effect of speech enhancement.

Optionally, as an embodiment, the characteristic quantity comprises at least one of the noise average spectral entropy, the noise normalized critical band energy ratio and the noise average zero-crossing rate.

Optionally, as an embodiment, the plurality of noise samples comprises n noise samples, and the device 400 further comprises:

a second acquisition module, configured to obtain the n noise samples and calculate the characteristic quantity of each of the n noise samples;

a clustering module, configured to cluster the n noise samples into m noise classes according to the characteristic quantities of the n noise samples obtained by the second acquisition module;

a training module, configured to train the m noise classes obtained by the clustering module to obtain the noise models corresponding to the m noise classes;

a mapping module, configured to establish the mapping relationship between noise classes and noise models from the m noise classes obtained by the clustering module and the noise models corresponding to the m noise classes obtained by the training module, wherein m is less than n.

Optionally, as an embodiment, the clustering module is specifically configured to:

Optionally, as an embodiment, the clustering module selecting m noise samples from the n noise samples as m noise cluster centroids comprises:

selecting the m noise samples from the n noise samples as the m noise cluster centroids according to the collection sources of the n noise samples; or

Optionally, as an embodiment, the training module is specifically configured to:

Optionally, as an embodiment, the training module combining, for each of the m noise classes, the noise samples in the noise class into noise class training data according to a preset rule comprises:

for any one of the m noise classes, when the noise class contains a plurality of noise samples, combining the plurality of noise samples in the noise class into noise class training data in equal proportions.

Optionally, as an embodiment, the first determination module 402 is specifically configured to:

Optionally, as an embodiment, the first determination module 403 is specifically configured to:

It should be understood that the device 400 according to the embodiment of the present invention may correspond to the entity executing the method 100 in the method embodiments of the present invention, and that the above and other operations and/or functions of the modules in the device 400 are respectively intended to implement the corresponding flows of the methods in Fig. 1 to Fig. 3; for brevity, they are not described again here.

As shown in Fig. 5, an embodiment of the present invention further provides a speech enhancement device 500. The device 500 comprises a processor 501, a memory 502 and a bus system 503. The processor 501 and the memory 502 are connected by the bus system 503; the memory 502 is configured to store instructions, and the processor 501 is configured to execute the instructions stored in the memory 502. The processor 501 is configured to:

obtain the characteristic quantity of the noise in a silence section of a speech signal;

determine, from a plurality of preset noise classes, the noise class matching the noise in the silence section according to the characteristic quantity of the noise in the silence section, the plurality of noise classes being obtained by clustering a plurality of noise samples according to the characteristic quantities of the plurality of noise samples;

determine the noise model corresponding to the noise class matching the noise in the silence section, according to the noise class matching the noise in the silence section and the mapping relationship between noise classes and noise models;

perform speech enhancement on the speech signal according to the noise model corresponding to the noise class matching the noise in the silence section.

It should be understood that, in the embodiments of the present invention, the processor 501 may be a central processing unit (Central Processing Unit, "CPU" for short); the processor 501 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

The memory 502 may comprise read-only memory and random access memory, and provides instructions and data to the processor 501. A part of the memory 502 may also comprise non-volatile random access memory. For example, the memory 502 may also store device type information.

In addition to a data bus, the bus system 503 may comprise a power bus, a control bus, a status signal bus, and so on. For clarity of description, however, the various buses are all labeled as the bus system 503 in the figure.

In implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 501 or by instructions in the form of software. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 502; the processor 501 reads the information in the memory 502 and completes the steps of the above method in combination with its hardware. To avoid repetition, details are not described here.

Optionally, as an embodiment, the plurality of noise samples comprises n noise samples, and the processor 501 is further configured to:

Optionally, as an embodiment, the processor 501 clustering the n noise samples into m noise classes according to the characteristic quantities of the n noise samples comprises:

Optionally, as an embodiment, the processor 501 selecting m noise samples from the n noise samples as m noise cluster centroids comprises:

selecting the m noise samples from the n noise samples as the m noise cluster centroids according to the collection sources of the n noise samples; or

selecting the m noise samples from the n noise samples as the m noise cluster centroids according to the magnitudes of the noise average spectral entropies of the n noise samples.

Optionally, as an embodiment, the processor 501 training the m noise classes to obtain the noise models corresponding to the m noise classes comprises:

training the noise class training data of each of the m noise classes separately to obtain the noise model corresponding to each of the m noise classes.

Optionally, as an embodiment, the processor 501 combining, for each of the m noise classes, the noise samples in the noise class into noise class training data according to a preset rule comprises:

Optionally, as an embodiment, the processor 501 determining, from the plurality of preset noise classes, the noise class matching the noise in the silence section according to the characteristic quantity of the noise in the silence section comprises:

It should be understood that the speech enhancement device 500 according to the embodiment of the present invention may correspond to the entity executing the method of the embodiments of the present invention, and that the above and other operations and/or functions of the modules in the device 500 are respectively intended to implement the corresponding flows of the methods in Fig. 1 to Fig. 3; for brevity, they are not described again here.

A person of ordinary skill in the art will recognize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered to go beyond the scope of the present invention.

A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described again here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation: for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between devices or units may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk or an optical disc.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A speech enhancement method, characterized by comprising:

obtaining the characteristic quantity of the noise in a silence section of a speech signal;

2. The method according to claim 1, characterized in that the characteristic quantity comprises at least one of the noise average spectral entropy, the noise normalized critical band energy ratio and the noise average zero-crossing rate.

3. The method according to claim 1 or 2, characterized in that the plurality of noise samples comprises n noise samples, and the method further comprises:

4. The method according to claim 3, characterized in that clustering the n noise samples into m noise classes according to the characteristic quantities of the n noise samples comprises:

5. The method according to claim 4, characterized in that selecting m noise samples from the n noise samples as m noise cluster centroids comprises:

6. The method according to any one of claims 3 to 5, characterized in that training the m noise classes to obtain the noise models corresponding to the m noise classes comprises:

7. The method according to claim 6, characterized in that, for each of the m noise classes, combining the noise samples in the noise class into noise class training data according to a preset rule comprises:

8. The method according to any one of claims 3 to 7, characterized in that determining, from the plurality of preset noise classes, the noise class matching the noise in the silence section according to the characteristic quantity of the noise in the silence section comprises:

9. The method according to any one of claims 1 to 8, characterized in that determining, from the plurality of preset noise classes, the noise class matching the noise in the silence section according to the characteristic quantity of the noise in the silence section comprises:

10. A speech enhancement device, characterized by comprising:

11. The device according to claim 10, characterized in that the characteristic quantity comprises at least one of the noise average spectral entropy, the noise normalized critical band energy ratio and the noise average zero-crossing rate.

12. The device according to claim 10 or 11, characterized in that the plurality of noise samples comprises n noise samples, and the device further comprises:

13. The device according to claim 12, characterized in that the clustering module is specifically configured to:

14. The device according to claim 13, characterized in that the clustering module selecting m noise samples from the n noise samples as m noise cluster centroids comprises:

15. The device according to any one of claims 12 to 14, characterized in that the training module is specifically configured to:

16. The device according to claim 15, characterized in that the training module combining, for each of the m noise classes, the noise samples in the noise class into noise class training data according to a preset rule comprises:

17. The device according to any one of claims 12 to 16, characterized in that the first determination module is specifically configured to:

18. The device according to any one of claims 10 to 17, characterized in that the first determination module is specifically configured to: