CN107527611A - MFCC speech recognition method, storage medium, electronic device and system - Google Patents
- Publication number
- CN107527611A CN107527611A CN201710731077.XA CN201710731077A CN107527611A CN 107527611 A CN107527611 A CN 107527611A CN 201710731077 A CN201710731077 A CN 201710731077A CN 107527611 A CN107527611 A CN 107527611A
- Authority
- CN
- China
- Prior art keywords
- mfcc
- frequency
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Abstract
The invention discloses an MFCC speech recognition method, storage medium, electronic device and system, relating to the field of speech recognition. The steps of the method are: pre-process the speech signal to be recognized to obtain the MFCC initial signal; calculate the low frequency, mid frequency and high frequency of the MFCC initial signal; fuse the low, mid and high frequencies to obtain the MFCC characteristic parameter F; and reduce the dimensionality of F to obtain the dimension-reduced MFCC characteristic parameter. The invention significantly improves the recognition precision of MFCC characteristic parameters in noisy environments and high-frequency regions, and thereby achieves the goal of extracting MFCC characteristic parameters from speech signals in noisy environments and high-frequency regions. It is well suited for wide adoption.
Description
Technical field
The present invention relates to the field of speech recognition, and in particular to an MFCC (Mel-Frequency Cepstral Coefficients) speech recognition method, storage medium, electronic device and system.
Background technology
MFCC is a widely used feature in automatic speech and speaker recognition. Because MFCC characteristic parameters are among the most discriminative features of speech, they have been widely adopted in the speech recognition field: when performing speech recognition on an audio signal, extracting the MFCC characteristic parameters of the speech signal essentially completes the recognition task.
However, for noisy speech signals and for the high-frequency region of a speech signal, the recognition precision of MFCC characteristic parameters is relatively low, which in turn makes the MFCC characteristic parameters difficult to extract.
Summary of the invention
In view of the defects in the prior art, the technical problem solved by the present invention is: how to identify MFCC characteristic parameters in speech signals in noisy environments and high-frequency regions. The present invention can substantially increase the recognition precision of MFCC characteristic parameters and is well suited for wide adoption.
To achieve the above objectives, the MFCC speech recognition method provided by the invention comprises the following steps:
S1: Pre-process the speech signal to be recognized to obtain the MFCC initial signal; go to S2.
S2: From the low-band frequency-domain signal f1 of the MFCC initial signal, calculate its low frequency fl; the calculation formula is:
From the mid-band frequency-domain signal f2 of the MFCC initial signal, calculate its mid frequency fm; the calculation formula is:
From the high-band frequency-domain signal f3 of the MFCC initial signal, calculate its high frequency fh; the calculation formula is: Go to S3.
S3: Fuse the low frequency fl, mid frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F; the calculation formula is:
where N is the number of points of the FFT applied to the speech signal, Fs is the stop-band cutoff frequency (a constant), Q⁻¹ is the inverse of the fm mapping, Q(fl) denotes evaluating the fm formula with fl substituted for f2, Q(fh) denotes evaluating the fm formula with fh substituted for f2, and H is the number of filters used in speech recognition.
On the basis of the above technical solution, the method further comprises, after S3, the following step: S4: Reduce the dimensionality of the MFCC characteristic parameter F to obtain the dimension-reduced MFCC characteristic parameter F_drop; the calculation formula is: σ_between is the inter-class scatter, representing the sum of the inter-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals; σ_within is the intra-class variance, representing the sum of the intra-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals.
The storage medium provided by the invention stores a computer program which, when executed by a processor, implements the above MFCC speech recognition method.
The electronic device provided by the invention includes a memory and a processor; the memory stores a computer program that runs on the processor, and the above MFCC speech recognition method is implemented when the processor executes the program.
The MFCC speech recognition system provided by the invention includes a speech signal pre-processing module, an MFCC initial signal frequency calculation module and an MFCC characteristic parameter fusion module;
the speech signal pre-processing module is used to: pre-process the speech signal to be recognized to obtain the MFCC initial signal, and send an MFCC initial signal frequency calculation signal to the MFCC initial signal frequency calculation module;
the MFCC initial signal frequency calculation module is used to: after receiving the MFCC initial signal frequency calculation signal, calculate from the low-band frequency-domain signal f1 of the MFCC initial signal its low frequency fl, the calculation formula being:
from the mid-band frequency-domain signal f2 of the MFCC initial signal, calculate its mid frequency fm, the calculation formula being:
from the high-band frequency-domain signal f3 of the MFCC initial signal, calculate its high frequency fh, the calculation formula being: and then send an MFCC characteristic parameter fusion signal to the MFCC characteristic parameter fusion module;
the MFCC characteristic parameter fusion module is used to: after receiving the MFCC characteristic parameter fusion signal, fuse the low frequency fl, mid frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F, the calculation formula being:
where N is the number of points of the FFT applied to the speech signal, Fs is the stop-band cutoff frequency (a constant), Q⁻¹ is the inverse of the fm mapping, Q(fl) denotes evaluating the fm formula with fl substituted for f2, Q(fh) denotes evaluating the fm formula with fh substituted for f2, and H is the number of filters used in speech recognition.
On the basis of the above technical solution, the system further includes an MFCC characteristic parameter dimension-reduction module, which is used to: after the MFCC characteristic parameter fusion module completes its work, reduce the dimensionality of the MFCC characteristic parameters to obtain the dimension-reduced MFCC characteristic parameter F_drop; the calculation formula is: σ_between is the inter-class scatter, representing the sum of the inter-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals; σ_within is the intra-class variance, representing the sum of the intra-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals.
Compared with the prior art, the advantages of the invention are:
(1) As can be seen from S1 to S3, the invention first calculates the low, mid and high frequencies of the MFCC initial signal with a self-developed algorithm, and then fuses the low, mid and high frequencies to obtain the MFCC characteristic parameters. This significantly improves the recognition precision of MFCC characteristic parameters in noisy environments and high-frequency regions, and thereby achieves the goal of extracting MFCC characteristic parameters from speech signals in noisy environments and high-frequency regions. The invention is well suited for wide adoption.
(2) As can be seen from S4, the invention reduces the dimensionality of the MFCC characteristic parameters with a self-developed algorithm, further improving their recognition precision.
Brief description of the drawings
Fig. 1 is a flow chart of the MFCC speech recognition method in the embodiment of the present invention;
Fig. 2 is a connection block diagram of the electronic device in the embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments.
As shown in Fig. 1, the MFCC speech recognition method in the embodiment of the present invention comprises the following steps:
S1: Pre-process the speech signal to be recognized to obtain the MFCC initial signal; go to S2.
The specific flow of S1 is: apply pre-emphasis, framing and windowing to the speech signal; apply the FFT (Fast Fourier Transform, the fast algorithm of the discrete Fourier transform; the number of FFT points can be chosen as needed) to convert the signal from the time domain to the frequency domain, obtaining the frequency-domain signal; take the squared modulus of the frequency-domain signal to obtain the discrete power spectrum; pass the discrete power spectrum through the filter bank; and take the logarithm of the filtered signal and apply a DCT (Discrete Cosine Transform) to obtain the MFCC initial signal.
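As a concrete illustration, the S1 flow above (pre-emphasis, framing, windowing, FFT, squared modulus, filter bank, logarithm, DCT) can be sketched in Python. The frame length, frame shift, filter count and number of kept coefficients below are plausible defaults rather than values fixed by the patent, and a standard triangular Mel filter bank stands in for the patent's filters:

```python
import numpy as np

def mfcc_initial(signal, fs=8000, frame_len=256, frame_shift=128,
                 n_filters=26, n_ceps=13, preemph=0.97):
    """Sketch of S1: pre-emphasis, framing, windowing, FFT, power
    spectrum, Mel filter bank, logarithm, DCT."""
    # Pre-emphasis: y[n] = x[n] - 0.97 * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # Framing with overlap
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    frames = np.stack([emphasized[i * frame_shift:i * frame_shift + frame_len]
                       for i in range(n_frames)])
    # Hamming window
    frames = frames * np.hamming(frame_len)
    # FFT, then squared modulus -> discrete power spectrum
    power = np.abs(np.fft.rfft(frames, n=frame_len)) ** 2
    # Triangular Mel filter bank (assumed standard construction)
    mel_max = 2595 * np.log10(1 + (fs / 2) / 700)
    hz_pts = 700 * (10 ** (np.linspace(0, mel_max, n_filters + 2) / 2595) - 1)
    bins = np.floor((frame_len + 1) * hz_pts / fs).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    filtered = np.maximum(power @ fbank.T, np.finfo(float).eps)
    # Logarithm, then DCT-II; keep the first n_ceps coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return np.log(filtered) @ dct.T
```

Each row of the returned matrix is the MFCC initial signal of one frame.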
S2: After the MFCC initial signal is obtained, the precision of its low-band, mid-band and high-band components must each be raised. The specific flow is:
From the low-band frequency-domain signal f1 of the MFCC initial signal, calculate its low frequency fl; the calculation formula is:
From the mid-band frequency-domain signal f2 of the MFCC initial signal, calculate its mid frequency fm; the calculation formula is:
The design principle of the fm calculation formula is: to solve the precision problem of the mid band, a suitable Mel-Hz correspondence for the mid-frequency region must be found, one under which the filter distribution is sparse in the low- and high-frequency sections and relatively dense in the mid-frequency section, so as to guarantee the computational precision of the mid band. The fm calculated by the above formula meets this demand and, being in the form of a logarithmic function, still guarantees the computational precision of the mid-band coefficients.
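Transcribed directly from the piecewise fm formula given in the claims, the mid-frequency mapping can be sketched as follows. Note that, as the formula is written, the logarithm's argument in the first branch is positive only for f2 below 500 Hz, so callers must respect that domain:

```python
import math

def mid_frequency_fm(f2):
    """Piecewise Mel-Hz correspondence for the mid band, transcribed from
    the claims; the first branch is real-valued only while
    1 + (200 - f2) / 300 > 0."""
    if 0 < f2 <= 2000:
        return 1073.05 - 527 * math.log(1 + (200 - f2) / 300)
    if 2000 < f2 <= 4000:
        return 1073.05 + 527 * math.log(1 + (f2 - 200) / 300)
    raise ValueError("f2 must lie in (0, 4000] Hz")
```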
From the high-band frequency-domain signal f3 of the MFCC initial signal, calculate its high frequency fh; the calculation formula is: Go to S3.
The design principle of the fh calculation formula is: because the Mel filters used in MFCC extraction are numerous in the low-frequency region and few in the high-frequency region, the computational precision of MFCC declines as the frequency rises. IMFCC (inverse Mel-frequency cepstral coefficients), i.e. the high frequency fh, is therefore introduced; with reference to the Mel domain, IMFCC constructs a new structure whose scale is opposite to that of the Mel domain, so that fewer filters are distributed in the low-frequency range and more in the high-frequency range.
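One common way to realize the IMFCC idea, shown here purely as an illustration rather than the patent's exact construction, is to mirror the standard Mel warping about the Nyquist frequency, which makes the warped scale change fastest (and the filters densest) at high frequencies:

```python
import math

def hz_to_mel(f):
    # Standard Mel scale: dense filter spacing at low frequencies
    return 2595 * math.log10(1 + f / 700)

def hz_to_inverse_mel(f, fs=8000):
    # Mirrored (inverse-Mel-style) scale: dense spacing at high
    # frequencies; valid for 0 <= f < fs / 2 (fs is an assumed rate)
    return hz_to_mel(fs / 2) - hz_to_mel(fs / 2 - f)
```

On this mirrored scale a 100 Hz step near 3.9 kHz covers far more warped units than the same step near 100 Hz, which is exactly the reversed filter density the fh design calls for.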
S3: Fuse the low frequency fl, mid frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F; the calculation formula is:
where N is the number of points of the FFT applied to the speech signal, Fs is the stop-band cutoff frequency (a constant), Q⁻¹ is the inverse of the fm mapping, Q(fl) denotes evaluating the fm formula with fl substituted for f2, Q(fh) denotes evaluating the fm formula with fh substituted for f2, and H is the number of filters used in speech recognition. Go to S4.
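The S3 fusion can be sketched as follows, taking Q to be the upper branch of the fm mapping (so that its analytic inverse exists in closed form) and using illustrative values for N, Fs and H; none of these defaults is fixed by the patent:

```python
import math

def Q(f):
    # Upper branch of the fm mapping, used here as the warping function
    return 1073.05 + 527 * math.log(1 + (f - 200) / 300)

def Q_inv(q):
    # Closed-form inverse of Q
    return 200 + 300 * (math.exp((q - 1073.05) / 527) - 1)

def fuse_frequencies(fl, fh, N=512, Fs=8000, H=24):
    """F = (N / Fs) * Q^{-1}[ Q(fl) + H * (Q(fh) - Q(fl)) / (H + 1) ]"""
    return (N / Fs) * Q_inv(Q(fl) + H * (Q(fh) - Q(fl)) / (H + 1))
```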
S4: Reduce the dimensionality of the MFCC characteristic parameter F to obtain the dimension-reduced MFCC characteristic parameter F_drop, further improving the recognition precision; the calculation formula is:
where σ_between is the inter-class scatter, representing the sum of the inter-class variances of the k-th dimension of the MFCC characteristic parameters across a group of speech signals; σ_within is the intra-class variance, representing the sum of the intra-class variances of the k-th dimension of the MFCC characteristic parameters of a speech signal; M is the total number of speech samples, n_i is the number of speech samples possessed by speech signal i, μ_k^(i) is the mean of the k-th dimension of the MFCC characteristic parameters of speech signal i, and μ_k is the mean of the k-th dimension of the MFCC characteristic parameters over the speech signals; x_k^(j) is the k-th dimension MFCC characteristic parameter component of the j-th segment of speech samples of speech signal i, and x_k^(i) is the k-th dimension MFCC characteristic parameter component of the i-th segment of speech samples of speech signal i.
The principle of S4 is: MFCC characteristic parameters generally need 20 to 30 dimensions to guarantee the recognition rate of a speech recognition system, and the larger the dimensionality, the more accurately the features of the speech signal are characterized. However, because each dimension of the characteristic parameters contributes differently to the speech recognition system, the parameters usually contain much useless or even interfering information, which harms both the responsiveness and the recognition rate of the system. The MFCC characteristic parameters are therefore reduced in dimensionality, so as to select the effective dimensions among them.
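A sketch of the S4 idea in Python: score each dimension by the ratio of inter-class to intra-class variance (a Fisher ratio) following the σ_between and σ_within definitions, then keep the best-scoring dimensions. The "keep the top `keep` dimensions" selection rule is an assumption, since the patent does not state exactly how F_drop chooses the effective dimensions:

```python
import numpy as np

def fisher_ratios(features, labels):
    """Per-dimension sigma_between / sigma_within over the classes
    (speech signals) identified by `labels`."""
    labels = np.asarray(labels)
    mu = features.mean(axis=0)            # global mean of each dimension
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        grp = features[labels == c]
        mu_c = grp.mean(axis=0)           # class mean, mu_k^(i)
        between += (mu_c - mu) ** 2       # inter-class scatter term
        within += ((grp - mu_c) ** 2).mean(axis=0)  # intra-class variance
    return between / np.maximum(within, 1e-12)

def reduce_dims(features, labels, keep):
    """Keep the `keep` dimensions with the largest Fisher ratio
    (selection rule assumed, not stated in the patent)."""
    order = np.argsort(fisher_ratios(features, labels))[::-1]
    idx = np.sort(order[:keep])
    return idx, features[:, idx]
```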
The embodiment of the present invention also provides a storage medium storing a computer program which, when executed by a processor, implements the above MFCC speech recognition method. It should be noted that the storage medium includes various media capable of storing program code, such as a USB flash drive, removable hard disk, ROM (Read-Only Memory), RAM (Random Access Memory), magnetic disk or optical disc.
As shown in Fig. 2, the embodiment of the present invention also provides an electronic device including a memory and a processor; the memory stores a computer program that runs on the processor, and the above MFCC speech recognition method is implemented when the processor executes the program.
The MFCC speech recognition system provided in the embodiment of the present invention includes a speech signal pre-processing module, an MFCC initial signal frequency calculation module, an MFCC characteristic parameter fusion module and an MFCC characteristic parameter dimension-reduction module.
The speech signal pre-processing module is used to: pre-process the speech signal to be recognized to obtain the MFCC initial signal, and send an MFCC initial signal frequency calculation signal to the MFCC initial signal frequency calculation module. The specific flow is: apply pre-emphasis, framing and windowing to the speech signal; apply the FFT to obtain the frequency-domain signal; take the squared modulus of the frequency-domain signal to obtain the discrete power spectrum; pass the discrete power spectrum through the filter bank; and take the logarithm of the filtered signal and apply the DCT to obtain the MFCC initial signal.
The MFCC initial signal frequency calculation module is used to: after receiving the MFCC initial signal frequency calculation signal, calculate from the low-band frequency-domain signal f1 of the MFCC initial signal its low frequency fl, the calculation formula being:
from the mid-band frequency-domain signal f2 of the MFCC initial signal, calculate its mid frequency fm, the calculation formula being:
from the high-band frequency-domain signal f3 of the MFCC initial signal, calculate its high frequency fh, the calculation formula being: and then send an MFCC characteristic parameter fusion signal to the MFCC characteristic parameter fusion module.
The MFCC characteristic parameter fusion module is used to: after receiving the MFCC characteristic parameter fusion signal, fuse the low frequency fl, mid frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F, the calculation formula being:
where N is the number of points of the FFT applied to the speech signal (the specific value of N can be chosen according to the state of the art), Fs is the stop-band cutoff frequency (a constant), Q⁻¹ is the inverse of the fm mapping, Q(fl) denotes evaluating the fm formula with fl substituted for f2, Q(fh) denotes evaluating the fm formula with fh substituted for f2, and H is the number of filters used in speech recognition. The module then sends an MFCC characteristic parameter dimension-reduction signal to the MFCC characteristic parameter dimension-reduction module.
The MFCC characteristic parameter dimension-reduction module is used to: after receiving the MFCC characteristic parameter dimension-reduction signal, reduce the dimensionality of the MFCC characteristic parameters to obtain the dimension-reduced MFCC characteristic parameter F_drop; the calculation formula is:
where σ_between is the inter-class scatter, representing the sum of the inter-class variances of the k-th dimension of the MFCC characteristic parameters across a group of speech signals; σ_within is the intra-class variance, representing the sum of the intra-class variances of the k-th dimension of the MFCC characteristic parameters of a speech signal; M is the total number of speech samples, n_i is the number of speech samples possessed by speech signal i, μ_k^(i) is the mean of the k-th dimension of the MFCC characteristic parameters of speech signal i, and μ_k is the mean of the k-th dimension of the MFCC characteristic parameters over the speech signals; x_k^(j) is the k-th dimension MFCC characteristic parameter component of the j-th segment of speech samples of speech signal i, and x_k^(i) is the k-th dimension MFCC characteristic parameter component of the i-th segment of speech samples of speech signal i.
It should be noted that when the system provided by the embodiment of the present invention performs inter-module communication, the division into the above functional modules is only given as an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the system may be divided into different functional modules to complete all or part of the functions described above.
Further, the present invention is not limited to the above-described embodiments. For those skilled in the art, improvements and modifications can also be made without departing from the principles of the invention, and these improvements and modifications are also considered to fall within the protection scope of the present invention. The content not described in detail in this specification belongs to the prior art known to those skilled in the art.
Claims (10)
1. An MFCC speech recognition method, characterized in that the method comprises the following steps:
S1: Pre-process the speech signal to be recognized to obtain the MFCC initial signal; go to S2.
S2: From the low-band frequency-domain signal f1 of the MFCC initial signal, calculate its low frequency fl; the calculation formula is:
From the mid-band frequency-domain signal f2 of the MFCC initial signal, calculate its mid frequency fm; the calculation formula is:
fm = 1073.05 − 527 · ln(1 + (200 − f2)/300), for 0 < f2 ≤ 2000;
fm = 1073.05 + 527 · ln(1 + (f2 − 200)/300), for 2000 < f2 ≤ 4000;
From the high-band frequency-domain signal f3 of the MFCC initial signal, calculate its high frequency fh; the calculation formula is: Go to S3.
S3: Fuse the low frequency fl, mid frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F; the calculation formula is:
F = (N/Fs) · Q⁻¹[ Q(fl) + H · (Q(fh) − Q(fl)) / (H + 1) ];
where N is the number of points of the FFT applied to the speech signal, Fs is the stop-band cutoff frequency (a constant), Q⁻¹ is the inverse of the fm mapping, Q(fl) denotes evaluating the fm formula with fl substituted for f2, Q(fh) denotes evaluating the fm formula with fh substituted for f2, and H is the number of filters used in speech recognition.
2. The MFCC speech recognition method as claimed in claim 1, characterized in that it further comprises, after S3, the following step: S4: Reduce the dimensionality of the MFCC characteristic parameter F to obtain the dimension-reduced MFCC characteristic parameter F_drop; the calculation formula is: σ_between is the inter-class scatter, representing the sum of the inter-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals; σ_within is the intra-class variance, representing the sum of the intra-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals.
3. The MFCC speech recognition method as claimed in claim 2, characterized in that:
σ_between = Σ_{i=1}^{M} (μ_k^(i) − μ_k)²,  σ_within = Σ_{i=1}^{M} [ (1/n) Σ_{j=1}^{n_i} (x_k^(j) − x_k^(i))² ];
where M is the total number of speech samples, n_i is the number of speech samples possessed by speech signal i, μ_k^(i) is the mean of the k-th dimension of the MFCC characteristic parameters of speech signal i, and μ_k is the mean of the k-th dimension of the MFCC characteristic parameters over the speech signals; x_k^(j) is the k-th dimension MFCC characteristic parameter component of the j-th segment of speech samples of speech signal i, and x_k^(i) is the k-th dimension MFCC characteristic parameter component of the i-th segment of speech samples of speech signal i.
4. The MFCC speech recognition method as claimed in any one of claims 1 to 3, characterized in that the flow of S1 includes: apply pre-emphasis, framing and windowing to the speech signal; apply the FFT to obtain the frequency-domain signal; take the squared modulus of the frequency-domain signal to obtain the discrete power spectrum; pass the discrete power spectrum through the filter bank; and take the logarithm of the filtered signal and apply the DCT to obtain the MFCC initial signal.
5. A storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method described in any one of claims 1 to 4.
6. An electronic device including a memory and a processor, the memory storing a computer program that runs on the processor, characterized in that the method described in any one of claims 1 to 4 is implemented when the processor executes the program.
7. An MFCC speech recognition system, characterized in that the system includes a speech signal pre-processing module, an MFCC initial signal frequency calculation module and an MFCC characteristic parameter fusion module;
the speech signal pre-processing module is used to: pre-process the speech signal to be recognized to obtain the MFCC initial signal, and send an MFCC initial signal frequency calculation signal to the MFCC initial signal frequency calculation module;
the MFCC initial signal frequency calculation module is used to: after receiving the MFCC initial signal frequency calculation signal, calculate from the low-band frequency-domain signal f1 of the MFCC initial signal its low frequency fl, the calculation formula being:
from the mid-band frequency-domain signal f2 of the MFCC initial signal, calculate its mid frequency fm, the calculation formula being:
fm = 1073.05 − 527 · ln(1 + (200 − f2)/300), for 0 < f2 ≤ 2000;
fm = 1073.05 + 527 · ln(1 + (f2 − 200)/300), for 2000 < f2 ≤ 4000;
from the high-band frequency-domain signal f3 of the MFCC initial signal, calculate its high frequency fh, the calculation formula being: and then send an MFCC characteristic parameter fusion signal to the MFCC characteristic parameter fusion module;
the MFCC characteristic parameter fusion module is used to: after receiving the MFCC characteristic parameter fusion signal, fuse the low frequency fl, mid frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F, the calculation formula being:
F = (N/Fs) · Q⁻¹[ Q(fl) + H · (Q(fh) − Q(fl)) / (H + 1) ];
where N is the number of points of the FFT applied to the speech signal, Fs is the stop-band cutoff frequency (a constant), Q⁻¹ is the inverse of the fm mapping, Q(fl) denotes evaluating the fm formula with fl substituted for f2, Q(fh) denotes evaluating the fm formula with fh substituted for f2, and H is the number of filters used in speech recognition.
8. The MFCC speech recognition system as claimed in claim 7, characterized in that the system further includes an MFCC characteristic parameter dimension-reduction module, which is used to: after the MFCC characteristic parameter fusion module completes its work, reduce the dimensionality of the MFCC characteristic parameters to obtain the dimension-reduced MFCC characteristic parameter F_drop; the calculation formula is: σ_between is the inter-class scatter, representing the sum of the inter-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals; σ_within is the intra-class variance, representing the sum of the intra-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals.
9. The MFCC speech recognition system of claim 8, characterized in that:
$$\sigma_{between} = \sum_{i=1}^{M} \left( \mu_k^{(i)} - \mu_k \right)^2, \qquad \sigma_{within} = \sum_{i=1}^{M} \left[ \frac{1}{n_i} \sum_{j=1}^{n_i} \left( x_k^{(j)} - x_k^{(i)} \right)^2 \right];$$
where M is the total number of speech samples; n_i is the number of speech samples belonging to speech signal i; μ_k^{(i)} is the mean of the k-th-dimension MFCC characteristic parameters of speech signal i; μ_k is the mean of the k-th-dimension MFCC characteristic parameters over all speech signals; x_k^{(j)} is the k-th-dimension MFCC characteristic-parameter component of the j-th speech sample of speech signal i; and x_k^{(i)} is the k-th-dimension MFCC characteristic-parameter component of the i-th speech sample of speech signal i.
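The two variance sums above can be computed per feature dimension, and their ratio σ_between/σ_within is the usual Fisher criterion for ranking dimensions before reduction. A minimal numpy sketch, assuming the index i ranges over speech signals (classes) and taking x_k^{(i)} as the per-class mean; function names are illustrative:

```python
import numpy as np

def fisher_ratio(samples_by_class):
    """Per-dimension Fisher criterion sigma_between / sigma_within.

    samples_by_class: list of arrays, one per speech signal (class),
    each of shape (n_i, D) holding n_i samples of D-dim MFCC features.
    """
    # mu_k^{(i)}: per-class means; mu_k: mean over all classes
    class_means = np.stack([c.mean(axis=0) for c in samples_by_class])
    overall_mean = class_means.mean(axis=0)
    # sigma_between = sum_i (mu_k^{(i)} - mu_k)^2
    sigma_between = ((class_means - overall_mean) ** 2).sum(axis=0)
    # sigma_within = sum_i (1/n_i) sum_j (x_k^{(j)} - class mean of i)^2
    sigma_within = sum(
        ((c - c.mean(axis=0)) ** 2).sum(axis=0) / len(c)
        for c in samples_by_class
    )
    return sigma_between / sigma_within

def top_dimensions(samples_by_class, keep):
    """Dimensionality reduction: keep the dims with the largest ratio."""
    return np.argsort(fisher_ratio(samples_by_class))[::-1][:keep]
```

Dimensions with a large ratio separate classes well relative to their internal spread, which is what the claim-8 reduction module exploits.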
10. The MFCC speech recognition system of any one of claims 7 to 9, characterized in that the workflow of the speech-signal pre-processing module comprises: applying pre-emphasis, framing, and windowing to the speech signal; performing an FFT on the resulting signal to obtain a frequency-domain signal; squaring the modulus of the frequency-domain signal to obtain a discrete power spectrum; filtering the discrete power spectrum with a filter bank; and taking the logarithm of the filtered signal followed by a DCT to obtain the MFCC initial signal.
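The claim-10 front end (pre-emphasis, framing, windowing, FFT, squared-modulus power spectrum, filter bank, log, DCT) can be sketched end to end in numpy. Frame length, hop, filter count, and the Mel-spaced triangular filters below are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def mfcc_initial_signal(signal, fs=8000, frame_len=256, hop=128,
                        n_filters=26, n_ceps=13, pre_emph=0.97):
    # 1. Pre-emphasis: first-order high-pass boost
    s = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Framing and Hamming windowing
    n_frames = 1 + (len(s) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = s[idx] * np.hamming(frame_len)
    # 3. FFT and discrete power spectrum (squared modulus)
    power = np.abs(np.fft.rfft(frames, frame_len)) ** 2 / frame_len
    # 4. Triangular Mel-spaced filter bank (illustrative choice of warping)
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((frame_len + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[i, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    # 5. Filter, take the logarithm, then DCT-II for the MFCC initial signal
    logfb = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1)
                   / (2 * n_filters))
    return logfb @ basis.T
```

Each row of the returned array is one frame's cepstral vector; keeping only the first `n_ceps` DCT coefficients is the usual truncation choice.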
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710731077.XA CN107527611A (en) | 2017-08-23 | 2017-08-23 | MFCC audio recognition methods, storage medium, electronic equipment and system |
PCT/CN2018/081321 WO2019037426A1 (en) | 2017-08-23 | 2018-03-30 | Mfcc voice recognition method, storage medium, electronic device, and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710731077.XA CN107527611A (en) | 2017-08-23 | 2017-08-23 | MFCC audio recognition methods, storage medium, electronic equipment and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107527611A true CN107527611A (en) | 2017-12-29 |
Family
ID=60681946
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710731077.XA Pending CN107527611A (en) | 2017-08-23 | 2017-08-23 | MFCC audio recognition methods, storage medium, electronic equipment and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107527611A (en) |
WO (1) | WO2019037426A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108269566A (en) * | 2018-01-17 | 2018-07-10 | 南京理工大学 | A kind of thorax mouth wave recognition methods based on multiple dimensioned sub-belt energy collection feature |
WO2019037426A1 (en) * | 2017-08-23 | 2019-02-28 | 武汉斗鱼网络科技有限公司 | Mfcc voice recognition method, storage medium, electronic device, and system |
CN113571078B (en) * | 2021-01-29 | 2024-04-26 | 腾讯科技(深圳)有限公司 | Noise suppression method, device, medium and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090144053A1 (en) * | 2007-12-03 | 2009-06-04 | Kabushiki Kaisha Toshiba | Speech processing apparatus and speech synthesis apparatus |
CN104900229A (en) * | 2015-05-25 | 2015-09-09 | 桂林电子科技大学信息科技学院 | Method for extracting mixed characteristic parameters of voice signals |
CN105895087A (en) * | 2016-03-24 | 2016-08-24 | 海信集团有限公司 | Voice recognition method and apparatus |
CN106356058A (en) * | 2016-09-08 | 2017-01-25 | 河海大学 | Robust speech recognition method based on multi-band characteristic compensation |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7027979B2 (en) * | 2003-01-14 | 2006-04-11 | Motorola, Inc. | Method and apparatus for speech reconstruction within a distributed speech recognition system |
CN101577116B (en) * | 2009-02-27 | 2012-07-18 | 北京中星微电子有限公司 | Extracting method of MFCC coefficients of voice signal, device and Mel filtering method |
JP2012163919A (en) * | 2011-02-09 | 2012-08-30 | Sony Corp | Voice signal processing device, method and program |
CN103390403B (en) * | 2013-06-19 | 2015-11-25 | 北京百度网讯科技有限公司 | The extracting method of MFCC feature and device |
CN105405448B (en) * | 2014-09-16 | 2019-09-03 | 科大讯飞股份有限公司 | A kind of sound effect treatment method and device |
CN106782565A (en) * | 2016-11-29 | 2017-05-31 | 重庆重智机器人研究院有限公司 | A kind of vocal print feature recognition methods and system |
CN107527611A (en) * | 2017-08-23 | 2017-12-29 | 武汉斗鱼网络科技有限公司 | MFCC audio recognition methods, storage medium, electronic equipment and system |
- 2017
  - 2017-08-23 CN CN201710731077.XA patent/CN107527611A/en active Pending
- 2018
  - 2018-03-30 WO PCT/CN2018/081321 patent/WO2019037426A1/en active Application Filing
Non-Patent Citations (3)
Title |
---|
ZHANG Wenke: China Master's Theses Full-text Database, Information Science and Technology series, 15 March 2017, China Academic Journal (CD-ROM Edition) Electronic Publishing House * |
YUAN Zhengwu et al.: "Research on an improved hybrid MFCC speech recognition algorithm", Computer Engineering and Applications * |
XIAN Xiaodong et al.: "Hybrid feature extraction method for Mel-frequency cepstral coefficients based on the Fisher ratio", Journal of Computer Applications * |
Also Published As
Publication number | Publication date |
---|---|
WO2019037426A1 (en) | 2019-02-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102968986B (en) | Overlapped voice and single voice distinguishing method based on long time characteristics and short time characteristics | |
CN103117059B (en) | Voice signal characteristics extracting method based on tensor decomposition | |
CN102968990B (en) | Speaker identifying method and system | |
CN106847292A (en) | Method for recognizing sound-groove and device | |
CN102483916B (en) | Audio feature extracting apparatus, audio feature extracting method, and audio feature extracting program | |
CN109378010A (en) | Training method, the speech de-noising method and device of neural network model | |
CN113327626B (en) | Voice noise reduction method, device, equipment and storage medium | |
CN104221079B (en) | Carry out the improved Mel filter bank structure of phonetic analysiss using spectral characteristic | |
CN110189757A (en) | A kind of giant panda individual discrimination method, equipment and computer readable storage medium | |
CN102486920A (en) | Audio event detection method and device | |
CN109192200B (en) | Speech recognition method | |
CN109147798B (en) | Speech recognition method, device, electronic equipment and readable storage medium | |
CN109065043B (en) | Command word recognition method and computer storage medium | |
CN110942766A (en) | Audio event detection method, system, mobile terminal and storage medium | |
CN107123432A (en) | A kind of Self Matching Top N audio events recognize channel self-adapted method | |
CN103258537A (en) | Method utilizing characteristic combination to identify speech emotions and device thereof | |
Chen et al. | An audio scene classification framework with embedded filters and a DCT-based temporal module | |
CN106782503A (en) | Automatic speech recognition method based on physiologic information in phonation | |
CN108806725A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN101577116A (en) | Extracting method of MFCC coefficients of voice signal, device and Mel filtering method | |
CN107527611A (en) | MFCC audio recognition methods, storage medium, electronic equipment and system | |
CN105845143A (en) | Speaker confirmation method and speaker confirmation system based on support vector machine | |
CN116524939A (en) | ECAPA-TDNN-based automatic identification method for bird song species | |
CN108172214A (en) | A kind of small echo speech recognition features parameter extracting method based on Mel domains | |
CN109616124A (en) | Lightweight method for recognizing sound-groove and system based on ivector |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20171229 |