CN107527611A - MFCC speech recognition method, storage medium, electronic device and system - Google Patents
- Publication number
- CN107527611A CN107527611A CN201710731077.XA CN201710731077A CN107527611A CN 107527611 A CN107527611 A CN 107527611A CN 201710731077 A CN201710731077 A CN 201710731077A CN 107527611 A CN107527611 A CN 107527611A
- Authority
- CN
- China
- Prior art keywords
- mfcc
- frequency
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Abstract
The invention discloses an MFCC speech recognition method, storage medium, electronic device and system, relating to the field of speech recognition. The steps of the method are: pre-process the speech signal to be recognized to obtain the MFCC initial signal; calculate the low frequency, mid frequency and high frequency of the MFCC initial signal; fuse the low, mid and high frequencies to obtain the MFCC characteristic parameter F; and reduce the dimensionality of F to obtain the dimension-reduced MFCC characteristic parameter. The invention significantly improves the recognition precision of MFCC characteristic parameters in noisy environments and high-frequency regions, and thereby achieves the goal of extracting MFCC characteristic parameters from speech signals in noisy environments and high-frequency regions. It is well suited for wide adoption.
Description
Technical field
The present invention relates to the field of speech recognition, and in particular to an MFCC (Mel-Frequency Cepstral Coefficients) speech recognition method, storage medium, electronic device and system.
Background technology
MFCC is a widely used feature in automatic speech and speaker recognition. Because MFCC characteristic parameters are among the most discriminative features of speech, they have been widely adopted in the speech recognition field: when performing speech recognition on an audio signal, extracting the MFCC characteristic parameters of the speech signal essentially completes the recognition task.
However, for noisy speech signals and for the high-frequency region of a speech signal, the recognition precision of MFCC characteristic parameters is relatively low, which in turn makes the MFCC characteristic parameters difficult to extract.
Summary of the invention
In view of the defects in the prior art, the technical problem solved by the present invention is: how to identify MFCC characteristic parameters in speech signals in noisy environments and high-frequency regions. The present invention can substantially increase the recognition precision of MFCC characteristic parameters and is well suited for wide adoption.
To achieve the above objectives, the MFCC speech recognition method provided by the invention comprises the following steps:
S1: Pre-process the speech signal to be recognized to obtain the MFCC initial signal; go to S2.
S2: From the low-band frequency-domain signal f1 of the MFCC initial signal, calculate its low frequency fl; the calculation formula is:
From the mid-band frequency-domain signal f2 of the MFCC initial signal, calculate its mid frequency fm; the calculation formula is:
From the high-band frequency-domain signal f3 of the MFCC initial signal, calculate its high frequency fh; the calculation formula is: Go to S3.
S3: Fuse the low frequency fl, mid frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F; the calculation formula is:
where N is the number of points of the FFT applied to the speech signal, Fs is the stop-band cutoff frequency (a constant), Q⁻¹ is the inverse of the fm mapping, Q(fl) denotes evaluating the fm formula with fl substituted for f2, Q(fh) denotes evaluating the fm formula with fh substituted for f2, and H is the number of filters used in speech recognition.
On the basis of the above technical solution, the method further comprises, after S3, the following step: S4: Reduce the dimensionality of the MFCC characteristic parameter F to obtain the dimension-reduced MFCC characteristic parameter F_drop; the calculation formula is: σ_between is the inter-class scatter, representing the sum of the inter-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals; σ_within is the intra-class variance, representing the sum of the intra-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals.
The storage medium provided by the invention stores a computer program which, when executed by a processor, implements the above MFCC speech recognition method.
The electronic device provided by the invention includes a memory and a processor; the memory stores a computer program that runs on the processor, and the above MFCC speech recognition method is implemented when the processor executes the program.
The MFCC speech recognition system provided by the invention includes a speech signal pre-processing module, an MFCC initial signal frequency calculation module and an MFCC characteristic parameter fusion module;
the speech signal pre-processing module is used to: pre-process the speech signal to be recognized to obtain the MFCC initial signal, and send an MFCC initial signal frequency calculation signal to the MFCC initial signal frequency calculation module;
the MFCC initial signal frequency calculation module is used to: after receiving the MFCC initial signal frequency calculation signal, calculate from the low-band frequency-domain signal f1 of the MFCC initial signal its low frequency fl, the calculation formula being:
from the mid-band frequency-domain signal f2 of the MFCC initial signal, calculate its mid frequency fm, the calculation formula being:
from the high-band frequency-domain signal f3 of the MFCC initial signal, calculate its high frequency fh, the calculation formula being: and then send an MFCC characteristic parameter fusion signal to the MFCC characteristic parameter fusion module;
the MFCC characteristic parameter fusion module is used to: after receiving the MFCC characteristic parameter fusion signal, fuse the low frequency fl, mid frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F, the calculation formula being:
where N is the number of points of the FFT applied to the speech signal, Fs is the stop-band cutoff frequency (a constant), Q⁻¹ is the inverse of the fm mapping, Q(fl) denotes evaluating the fm formula with fl substituted for f2, Q(fh) denotes evaluating the fm formula with fh substituted for f2, and H is the number of filters used in speech recognition.
On the basis of the above technical solution, the system further includes an MFCC characteristic parameter dimension-reduction module, which is used to: after the MFCC characteristic parameter fusion module completes its work, reduce the dimensionality of the MFCC characteristic parameters to obtain the dimension-reduced MFCC characteristic parameter F_drop; the calculation formula is: σ_between is the inter-class scatter, representing the sum of the inter-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals; σ_within is the intra-class variance, representing the sum of the intra-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals.
Compared with the prior art, the advantages of the invention are:
(1) As can be seen from S1 to S3, the invention first calculates the low, mid and high frequencies of the MFCC initial signal with a self-developed algorithm, and then fuses the low, mid and high frequencies to obtain the MFCC characteristic parameters. This significantly improves the recognition precision of MFCC characteristic parameters in noisy environments and high-frequency regions, and thereby achieves the goal of extracting MFCC characteristic parameters from speech signals in noisy environments and high-frequency regions. The invention is well suited for wide adoption.
(2) As can be seen from S4, the invention reduces the dimensionality of the MFCC characteristic parameters with a self-developed algorithm, further improving their recognition precision.
Brief description of the drawings
Fig. 1 is a flow chart of the MFCC speech recognition method in the embodiment of the present invention;
Fig. 2 is a connection block diagram of the electronic device in the embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments.
As shown in Fig. 1, the MFCC speech recognition method in the embodiment of the present invention comprises the following steps:
S1: Pre-process the speech signal to be recognized to obtain the MFCC initial signal; go to S2.
The specific flow of S1 is: apply pre-emphasis, framing and windowing to the speech signal; apply the FFT (Fast Fourier Transform, the fast algorithm of the discrete Fourier transform; the number of FFT points can be chosen as needed) to convert the signal from the time domain to the frequency domain, obtaining the frequency-domain signal; take the squared modulus of the frequency-domain signal to obtain the discrete power spectrum; pass the discrete power spectrum through the filter bank; and take the logarithm of the filtered signal and apply a DCT (Discrete Cosine Transform) to obtain the MFCC initial signal.
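As a concrete illustration, the S1 flow above (pre-emphasis, framing, windowing, FFT, squared modulus, filter bank, logarithm, DCT) can be sketched in Python. The frame length, frame shift, filter count and number of kept coefficients below are plausible defaults rather than values fixed by the patent, and a standard triangular Mel filter bank stands in for the patent's filters:

```python
import numpy as np

def mfcc_initial(signal, fs=8000, frame_len=256, frame_shift=128,
                 n_filters=26, n_ceps=13, preemph=0.97):
    """Sketch of S1: pre-emphasis, framing, windowing, FFT, power
    spectrum, Mel filter bank, logarithm, DCT."""
    # Pre-emphasis: y[n] = x[n] - 0.97 * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # Framing with overlap
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    frames = np.stack([emphasized[i * frame_shift:i * frame_shift + frame_len]
                       for i in range(n_frames)])
    # Hamming window
    frames = frames * np.hamming(frame_len)
    # FFT, then squared modulus -> discrete power spectrum
    power = np.abs(np.fft.rfft(frames, n=frame_len)) ** 2
    # Triangular Mel filter bank (assumed standard construction)
    mel_max = 2595 * np.log10(1 + (fs / 2) / 700)
    hz_pts = 700 * (10 ** (np.linspace(0, mel_max, n_filters + 2) / 2595) - 1)
    bins = np.floor((frame_len + 1) * hz_pts / fs).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    filtered = np.maximum(power @ fbank.T, np.finfo(float).eps)
    # Logarithm, then DCT-II; keep the first n_ceps coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return np.log(filtered) @ dct.T
```

Each row of the returned matrix is the MFCC initial signal of one frame.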
S2: After the MFCC initial signal is obtained, the precision of its low-band, mid-band and high-band components must each be raised. The specific flow is:
From the low-band frequency-domain signal f1 of the MFCC initial signal, calculate its low frequency fl; the calculation formula is:
From the mid-band frequency-domain signal f2 of the MFCC initial signal, calculate its mid frequency fm; the calculation formula is:
The design principle of the fm calculation formula is: to solve the precision problem of the mid band, a suitable Mel-Hz correspondence for the mid-frequency region must be found, one under which the filter distribution is sparse in the low- and high-frequency sections and relatively dense in the mid-frequency section, so as to guarantee the computational precision of the mid band. The fm calculated by the above formula meets this demand and, being in the form of a logarithmic function, still guarantees the computational precision of the mid-band coefficients.
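Transcribed directly from the piecewise fm formula given in the claims, the mid-frequency mapping can be sketched as follows. Note that, as the formula is written, the logarithm's argument in the first branch is positive only for f2 below 500 Hz, so callers must respect that domain:

```python
import math

def mid_frequency_fm(f2):
    """Piecewise Mel-Hz correspondence for the mid band, transcribed from
    the claims; the first branch is real-valued only while
    1 + (200 - f2) / 300 > 0."""
    if 0 < f2 <= 2000:
        return 1073.05 - 527 * math.log(1 + (200 - f2) / 300)
    if 2000 < f2 <= 4000:
        return 1073.05 + 527 * math.log(1 + (f2 - 200) / 300)
    raise ValueError("f2 must lie in (0, 4000] Hz")
```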
From the high-band frequency-domain signal f3 of the MFCC initial signal, calculate its high frequency fh; the calculation formula is: Go to S3.
The design principle of the fh calculation formula is: because the Mel filters used in MFCC extraction are numerous in the low-frequency region and few in the high-frequency region, the computational precision of MFCC declines as the frequency rises. IMFCC (inverse Mel-frequency cepstral coefficients), i.e. the high frequency fh, is therefore introduced; with reference to the Mel domain, IMFCC constructs a new structure whose scale is opposite to that of the Mel domain, so that fewer filters are distributed in the low-frequency range and more in the high-frequency range.
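One common way to realize the IMFCC idea, shown here purely as an illustration rather than the patent's exact construction, is to mirror the standard Mel warping about the Nyquist frequency, which makes the warped scale change fastest (and the filters densest) at high frequencies:

```python
import math

def hz_to_mel(f):
    # Standard Mel scale: dense filter spacing at low frequencies
    return 2595 * math.log10(1 + f / 700)

def hz_to_inverse_mel(f, fs=8000):
    # Mirrored (inverse-Mel-style) scale: dense spacing at high
    # frequencies; valid for 0 <= f < fs / 2 (fs is an assumed rate)
    return hz_to_mel(fs / 2) - hz_to_mel(fs / 2 - f)
```

On this mirrored scale a 100 Hz step near 3.9 kHz covers far more warped units than the same step near 100 Hz, which is exactly the reversed filter density the fh design calls for.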
S3: Fuse the low frequency fl, mid frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F; the calculation formula is:
where N is the number of points of the FFT applied to the speech signal, Fs is the stop-band cutoff frequency (a constant), Q⁻¹ is the inverse of the fm mapping, Q(fl) denotes evaluating the fm formula with fl substituted for f2, Q(fh) denotes evaluating the fm formula with fh substituted for f2, and H is the number of filters used in speech recognition. Go to S4.
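The S3 fusion can be sketched as follows, taking Q to be the upper branch of the fm mapping (so that its analytic inverse exists in closed form) and using illustrative values for N, Fs and H; none of these defaults is fixed by the patent:

```python
import math

def Q(f):
    # Upper branch of the fm mapping, used here as the warping function
    return 1073.05 + 527 * math.log(1 + (f - 200) / 300)

def Q_inv(q):
    # Closed-form inverse of Q
    return 200 + 300 * (math.exp((q - 1073.05) / 527) - 1)

def fuse_frequencies(fl, fh, N=512, Fs=8000, H=24):
    """F = (N / Fs) * Q^{-1}[ Q(fl) + H * (Q(fh) - Q(fl)) / (H + 1) ]"""
    return (N / Fs) * Q_inv(Q(fl) + H * (Q(fh) - Q(fl)) / (H + 1))
```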
S4: Reduce the dimensionality of the MFCC characteristic parameter F to obtain the dimension-reduced MFCC characteristic parameter F_drop, further improving the recognition precision; the calculation formula is:
where σ_between is the inter-class scatter, representing the sum of the inter-class variances of the k-th dimension of the MFCC characteristic parameters across a group of speech signals; σ_within is the intra-class variance, representing the sum of the intra-class variances of the k-th dimension of the MFCC characteristic parameters of a speech signal; M is the total number of speech samples, n_i is the number of speech samples possessed by speech signal i, μ_k^(i) is the mean of the k-th dimension of the MFCC characteristic parameters of speech signal i, and μ_k is the mean of the k-th dimension of the MFCC characteristic parameters over the speech signals; x_k^(j) is the k-th dimension MFCC characteristic parameter component of the j-th segment of speech samples of speech signal i, and x_k^(i) is the k-th dimension MFCC characteristic parameter component of the i-th segment of speech samples of speech signal i.
The principle of S4 is: MFCC characteristic parameters generally need 20 to 30 dimensions to guarantee the recognition rate of a speech recognition system, and the larger the dimensionality, the more accurately the features of the speech signal are characterized. However, because each dimension of the characteristic parameters contributes differently to the speech recognition system, the parameters usually contain much useless or even interfering information, which harms both the responsiveness and the recognition rate of the system. The MFCC characteristic parameters are therefore reduced in dimensionality, so as to select the effective dimensions among them.
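A sketch of the S4 idea in Python: score each dimension by the ratio of inter-class to intra-class variance (a Fisher ratio) following the σ_between and σ_within definitions, then keep the best-scoring dimensions. The "keep the top `keep` dimensions" selection rule is an assumption, since the patent does not state exactly how F_drop chooses the effective dimensions:

```python
import numpy as np

def fisher_ratios(features, labels):
    """Per-dimension sigma_between / sigma_within over the classes
    (speech signals) identified by `labels`."""
    labels = np.asarray(labels)
    mu = features.mean(axis=0)            # global mean of each dimension
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        grp = features[labels == c]
        mu_c = grp.mean(axis=0)           # class mean, mu_k^(i)
        between += (mu_c - mu) ** 2       # inter-class scatter term
        within += ((grp - mu_c) ** 2).mean(axis=0)  # intra-class variance
    return between / np.maximum(within, 1e-12)

def reduce_dims(features, labels, keep):
    """Keep the `keep` dimensions with the largest Fisher ratio
    (selection rule assumed, not stated in the patent)."""
    order = np.argsort(fisher_ratios(features, labels))[::-1]
    idx = np.sort(order[:keep])
    return idx, features[:, idx]
```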
The embodiment of the present invention also provides a storage medium storing a computer program which, when executed by a processor, implements the above MFCC speech recognition method. It should be noted that the storage medium includes various media capable of storing program code, such as a USB flash drive, removable hard disk, ROM (Read-Only Memory), RAM (Random Access Memory), magnetic disk or optical disc.
As shown in Fig. 2, the embodiment of the present invention also provides an electronic device including a memory and a processor; the memory stores a computer program that runs on the processor, and the above MFCC speech recognition method is implemented when the processor executes the program.
The MFCC speech recognition system provided in the embodiment of the present invention includes a speech signal pre-processing module, an MFCC initial signal frequency calculation module, an MFCC characteristic parameter fusion module and an MFCC characteristic parameter dimension-reduction module.
The speech signal pre-processing module is used to: pre-process the speech signal to be recognized to obtain the MFCC initial signal, and send an MFCC initial signal frequency calculation signal to the MFCC initial signal frequency calculation module. The specific flow is: apply pre-emphasis, framing and windowing to the speech signal; apply the FFT to obtain the frequency-domain signal; take the squared modulus of the frequency-domain signal to obtain the discrete power spectrum; pass the discrete power spectrum through the filter bank; and take the logarithm of the filtered signal and apply the DCT to obtain the MFCC initial signal.
The MFCC initial signal frequency calculation module is used to: after receiving the MFCC initial signal frequency calculation signal, calculate from the low-band frequency-domain signal f1 of the MFCC initial signal its low frequency fl, the calculation formula being:
from the mid-band frequency-domain signal f2 of the MFCC initial signal, calculate its mid frequency fm, the calculation formula being:
from the high-band frequency-domain signal f3 of the MFCC initial signal, calculate its high frequency fh, the calculation formula being: and then send an MFCC characteristic parameter fusion signal to the MFCC characteristic parameter fusion module.
The MFCC characteristic parameter fusion module is used to: after receiving the MFCC characteristic parameter fusion signal, fuse the low frequency fl, mid frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F, the calculation formula being:
where N is the number of points of the FFT applied to the speech signal (the specific value of N can be chosen according to the state of the art), Fs is the stop-band cutoff frequency (a constant), Q⁻¹ is the inverse of the fm mapping, Q(fl) denotes evaluating the fm formula with fl substituted for f2, Q(fh) denotes evaluating the fm formula with fh substituted for f2, and H is the number of filters used in speech recognition. The module then sends an MFCC characteristic parameter dimension-reduction signal to the MFCC characteristic parameter dimension-reduction module.
The MFCC characteristic parameter dimension-reduction module is used to: after receiving the MFCC characteristic parameter dimension-reduction signal, reduce the dimensionality of the MFCC characteristic parameters to obtain the dimension-reduced MFCC characteristic parameter F_drop; the calculation formula is:
where σ_between is the inter-class scatter, representing the sum of the inter-class variances of the k-th dimension of the MFCC characteristic parameters across a group of speech signals; σ_within is the intra-class variance, representing the sum of the intra-class variances of the k-th dimension of the MFCC characteristic parameters of a speech signal; M is the total number of speech samples, n_i is the number of speech samples possessed by speech signal i, μ_k^(i) is the mean of the k-th dimension of the MFCC characteristic parameters of speech signal i, and μ_k is the mean of the k-th dimension of the MFCC characteristic parameters over the speech signals; x_k^(j) is the k-th dimension MFCC characteristic parameter component of the j-th segment of speech samples of speech signal i, and x_k^(i) is the k-th dimension MFCC characteristic parameter component of the i-th segment of speech samples of speech signal i.
It should be noted that when the system provided by the embodiment of the present invention performs inter-module communication, the division into the above functional modules is only given as an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the system may be divided into different functional modules to complete all or part of the functions described above.
Further, the present invention is not limited to the above-described embodiments. For those skilled in the art, improvements and modifications can also be made without departing from the principles of the invention, and these improvements and modifications are also considered to fall within the protection scope of the present invention. The content not described in detail in this specification belongs to the prior art known to those skilled in the art.
Claims (10)
1. An MFCC speech recognition method, characterized in that the method comprises the following steps:
S1: Pre-process the speech signal to be recognized to obtain the MFCC initial signal; go to S2.
S2: From the low-band frequency-domain signal f1 of the MFCC initial signal, calculate its low frequency fl; the calculation formula is:
From the mid-band frequency-domain signal f2 of the MFCC initial signal, calculate its mid frequency fm; the calculation formula is:
fm = 1073.05 − 527 · ln(1 + (200 − f2)/300), for 0 < f2 ≤ 2000;
fm = 1073.05 + 527 · ln(1 + (f2 − 200)/300), for 2000 < f2 ≤ 4000;
From the high-band frequency-domain signal f3 of the MFCC initial signal, calculate its high frequency fh; the calculation formula is: Go to S3.
S3: Fuse the low frequency fl, mid frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F; the calculation formula is:
F = (N/Fs) · Q⁻¹[ Q(fl) + H · (Q(fh) − Q(fl)) / (H + 1) ];
where N is the number of points of the FFT applied to the speech signal, Fs is the stop-band cutoff frequency (a constant), Q⁻¹ is the inverse of the fm mapping, Q(fl) denotes evaluating the fm formula with fl substituted for f2, Q(fh) denotes evaluating the fm formula with fh substituted for f2, and H is the number of filters used in speech recognition.
2. The MFCC speech recognition method as claimed in claim 1, characterized in that it further comprises, after S3, the following step: S4: Reduce the dimensionality of the MFCC characteristic parameter F to obtain the dimension-reduced MFCC characteristic parameter F_drop; the calculation formula is: σ_between is the inter-class scatter, representing the sum of the inter-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals; σ_within is the intra-class variance, representing the sum of the intra-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals.
3. The MFCC speech recognition method as claimed in claim 2, characterized in that:
σ_between = Σ_{i=1}^{M} (μ_k^(i) − μ_k)²,  σ_within = Σ_{i=1}^{M} [ (1/n) Σ_{j=1}^{n_i} (x_k^(j) − x_k^(i))² ];
where M is the total number of speech samples, n_i is the number of speech samples possessed by speech signal i, μ_k^(i) is the mean of the k-th dimension of the MFCC characteristic parameters of speech signal i, and μ_k is the mean of the k-th dimension of the MFCC characteristic parameters over the speech signals; x_k^(j) is the k-th dimension MFCC characteristic parameter component of the j-th segment of speech samples of speech signal i, and x_k^(i) is the k-th dimension MFCC characteristic parameter component of the i-th segment of speech samples of speech signal i.
4. The MFCC speech recognition method as claimed in any one of claims 1 to 3, characterized in that the flow of S1 includes: apply pre-emphasis, framing and windowing to the speech signal; apply the FFT to obtain the frequency-domain signal; take the squared modulus of the frequency-domain signal to obtain the discrete power spectrum; pass the discrete power spectrum through the filter bank; and take the logarithm of the filtered signal and apply the DCT to obtain the MFCC initial signal.
5. A storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method described in any one of claims 1 to 4.
6. An electronic device including a memory and a processor, the memory storing a computer program that runs on the processor, characterized in that the method described in any one of claims 1 to 4 is implemented when the processor executes the program.
7. An MFCC speech recognition system, characterized in that the system includes a speech signal pre-processing module, an MFCC initial signal frequency calculation module and an MFCC characteristic parameter fusion module;
the speech signal pre-processing module is used to: pre-process the speech signal to be recognized to obtain the MFCC initial signal, and send an MFCC initial signal frequency calculation signal to the MFCC initial signal frequency calculation module;
the MFCC initial signal frequency calculation module is used to: after receiving the MFCC initial signal frequency calculation signal, calculate from the low-band frequency-domain signal f1 of the MFCC initial signal its low frequency fl, the calculation formula being:
from the mid-band frequency-domain signal f2 of the MFCC initial signal, calculate its mid frequency fm, the calculation formula being:
fm = 1073.05 − 527 · ln(1 + (200 − f2)/300), for 0 < f2 ≤ 2000;
fm = 1073.05 + 527 · ln(1 + (f2 − 200)/300), for 2000 < f2 ≤ 4000;
from the high-band frequency-domain signal f3 of the MFCC initial signal, calculate its high frequency fh, the calculation formula being: and then send an MFCC characteristic parameter fusion signal to the MFCC characteristic parameter fusion module;
the MFCC characteristic parameter fusion module is used to: after receiving the MFCC characteristic parameter fusion signal, fuse the low frequency fl, mid frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F, the calculation formula being:
F = (N/Fs) · Q⁻¹[ Q(fl) + H · (Q(fh) − Q(fl)) / (H + 1) ];
where N is the number of points of the FFT applied to the speech signal, Fs is the stop-band cutoff frequency (a constant), Q⁻¹ is the inverse of the fm mapping, Q(fl) denotes evaluating the fm formula with fl substituted for f2, Q(fh) denotes evaluating the fm formula with fh substituted for f2, and H is the number of filters used in speech recognition.
8. The MFCC speech recognition system as claimed in claim 7, characterized in that the system further includes an MFCC characteristic parameter dimension-reduction module, which is used to: after the MFCC characteristic parameter fusion module completes its work, reduce the dimensionality of the MFCC characteristic parameters to obtain the dimension-reduced MFCC characteristic parameter F_drop; the calculation formula is: σ_between is the inter-class scatter, representing the sum of the inter-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals; σ_within is the intra-class variance, representing the sum of the intra-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals.
9. The MFCC speech recognition system of claim 8, characterized in that:
$$\sigma_{between} = \sum_{i=1}^{M} \left( \mu_k^{(i)} - \mu_k \right)^2, \qquad \sigma_{within} = \sum_{i=1}^{M} \left[ \frac{1}{n_i} \sum_{j=1}^{n_i} \left( x_k^{(j)} - x_k^{(i)} \right)^2 \right];$$
where M is the total number of speech samples; n_i is the number of speech samples belonging to speech signal i; μ_k^{(i)} is the mean of the k-th-dimension MFCC characteristic parameters of speech signal i; μ_k is the mean of the k-th-dimension MFCC characteristic parameters over all speech signals; x_k^{(j)} is the k-th-dimension MFCC characteristic-parameter component of the j-th speech sample of speech signal i; and x_k^{(i)} is the k-th-dimension MFCC characteristic-parameter component of the i-th speech sample of speech signal i.
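The two variance sums above can be computed per feature dimension, and their ratio σ_between/σ_within is the usual Fisher criterion for ranking dimensions before reduction. A minimal numpy sketch, assuming the index i ranges over speech signals (classes) and taking x_k^{(i)} as the per-class mean; function names are illustrative:

```python
import numpy as np

def fisher_ratio(samples_by_class):
    """Per-dimension Fisher criterion sigma_between / sigma_within.

    samples_by_class: list of arrays, one per speech signal (class),
    each of shape (n_i, D) holding n_i samples of D-dim MFCC features.
    """
    # mu_k^{(i)}: per-class means; mu_k: mean over all classes
    class_means = np.stack([c.mean(axis=0) for c in samples_by_class])
    overall_mean = class_means.mean(axis=0)
    # sigma_between = sum_i (mu_k^{(i)} - mu_k)^2
    sigma_between = ((class_means - overall_mean) ** 2).sum(axis=0)
    # sigma_within = sum_i (1/n_i) sum_j (x_k^{(j)} - class mean of i)^2
    sigma_within = sum(
        ((c - c.mean(axis=0)) ** 2).sum(axis=0) / len(c)
        for c in samples_by_class
    )
    return sigma_between / sigma_within

def top_dimensions(samples_by_class, keep):
    """Dimensionality reduction: keep the dims with the largest ratio."""
    return np.argsort(fisher_ratio(samples_by_class))[::-1][:keep]
```

Dimensions with a large ratio separate classes well relative to their internal spread, which is what the claim-8 reduction module exploits.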
10. The MFCC speech recognition system of any one of claims 7 to 9, characterized in that the workflow of the speech-signal pre-processing module comprises: applying pre-emphasis, framing, and windowing to the speech signal; performing an FFT on the resulting signal to obtain a frequency-domain signal; squaring the modulus of the frequency-domain signal to obtain a discrete power spectrum; filtering the discrete power spectrum with a filter bank; and taking the logarithm of the filtered signal followed by a DCT to obtain the MFCC initial signal.
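The claim-10 front end (pre-emphasis, framing, windowing, FFT, squared-modulus power spectrum, filter bank, log, DCT) can be sketched end to end in numpy. Frame length, hop, filter count, and the Mel-spaced triangular filters below are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def mfcc_initial_signal(signal, fs=8000, frame_len=256, hop=128,
                        n_filters=26, n_ceps=13, pre_emph=0.97):
    # 1. Pre-emphasis: first-order high-pass boost
    s = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Framing and Hamming windowing
    n_frames = 1 + (len(s) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = s[idx] * np.hamming(frame_len)
    # 3. FFT and discrete power spectrum (squared modulus)
    power = np.abs(np.fft.rfft(frames, frame_len)) ** 2 / frame_len
    # 4. Triangular Mel-spaced filter bank (illustrative choice of warping)
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((frame_len + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[i, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    # 5. Filter, take the logarithm, then DCT-II for the MFCC initial signal
    logfb = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1)
                   / (2 * n_filters))
    return logfb @ basis.T
```

Each row of the returned array is one frame's cepstral vector; keeping only the first `n_ceps` DCT coefficients is the usual truncation choice.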
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710731077.XA CN107527611A (en) | 2017-08-23 | 2017-08-23 | MFCC audio recognition methods, storage medium, electronic equipment and system |
PCT/CN2018/081321 WO2019037426A1 (en) | 2017-08-23 | 2018-03-30 | Mfcc voice recognition method, storage medium, electronic device, and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710731077.XA CN107527611A (en) | 2017-08-23 | 2017-08-23 | MFCC audio recognition methods, storage medium, electronic equipment and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107527611A true CN107527611A (en) | 2017-12-29 |
Family
ID=60681946
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710731077.XA Pending CN107527611A (en) | 2017-08-23 | 2017-08-23 | MFCC audio recognition methods, storage medium, electronic equipment and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107527611A (en) |
WO (1) | WO2019037426A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108269566A (en) * | 2018-01-17 | 2018-07-10 | 南京理工大学 | A kind of thorax mouth wave recognition methods based on multiple dimensioned sub-belt energy collection feature |
WO2019037426A1 (en) * | 2017-08-23 | 2019-02-28 | 武汉斗鱼网络科技有限公司 | Mfcc voice recognition method, storage medium, electronic device, and system |
CN113571078B (en) * | 2021-01-29 | 2024-04-26 | 腾讯科技(深圳)有限公司 | Noise suppression method, device, medium and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090144053A1 (en) * | 2007-12-03 | 2009-06-04 | Kabushiki Kaisha Toshiba | Speech processing apparatus and speech synthesis apparatus |
CN104900229A (en) * | 2015-05-25 | 2015-09-09 | 桂林电子科技大学信息科技学院 | Method for extracting mixed characteristic parameters of voice signals |
CN105895087A (en) * | 2016-03-24 | 2016-08-24 | 海信集团有限公司 | Voice recognition method and apparatus |
CN106356058A (en) * | 2016-09-08 | 2017-01-25 | 河海大学 | Robust speech recognition method based on multi-band characteristic compensation |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7027979B2 (en) * | 2003-01-14 | 2006-04-11 | Motorola, Inc. | Method and apparatus for speech reconstruction within a distributed speech recognition system |
CN101577116B (en) * | 2009-02-27 | 2012-07-18 | 北京中星微电子有限公司 | Extracting method of MFCC coefficients of voice signal, device and Mel filtering method |
JP2012163919A (en) * | 2011-02-09 | 2012-08-30 | Sony Corp | Voice signal processing device, method and program |
CN103390403B (en) * | 2013-06-19 | 2015-11-25 | 北京百度网讯科技有限公司 | The extracting method of MFCC feature and device |
CN105405448B (en) * | 2014-09-16 | 2019-09-03 | 科大讯飞股份有限公司 | A kind of sound effect treatment method and device |
CN106782565A (en) * | 2016-11-29 | 2017-05-31 | 重庆重智机器人研究院有限公司 | A kind of vocal print feature recognition methods and system |
CN107527611A (en) * | 2017-08-23 | 2017-12-29 | 武汉斗鱼网络科技有限公司 | MFCC audio recognition methods, storage medium, electronic equipment and system |
- 2017
  - 2017-08-23 CN CN201710731077.XA patent/CN107527611A/en active Pending
- 2018
  - 2018-03-30 WO PCT/CN2018/081321 patent/WO2019037426A1/en active Application Filing
Non-Patent Citations (3)
Title |
---|
ZHANG Wenke: China Master's Theses Full-text Database, Information Science and Technology series, 15 March 2017, China Academic Journal (CD-ROM Edition) Electronic Publishing House * |
YUAN Zhengwu et al.: "Research on an improved hybrid MFCC speech recognition algorithm", Computer Engineering and Applications * |
XIAN Xiaodong et al.: "Hybrid feature extraction method for Mel-frequency cepstral coefficients based on the Fisher ratio", Journal of Computer Applications * |
Also Published As
Publication number | Publication date |
---|---|
WO2019037426A1 (en) | 2019-02-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102968986B (en) | Overlapped voice and single voice distinguishing method based on long time characteristics and short time characteristics | |
CN103117059B (en) | Voice signal characteristics extracting method based on tensor decomposition | |
CN102968990B (en) | Speaker identifying method and system | |
CN106847292A (en) | Method for recognizing sound-groove and device | |
CN102483916B (en) | Audio feature extracting apparatus, audio feature extracting method, and audio feature extracting program | |
CN109378010A (en) | Training method, the speech de-noising method and device of neural network model | |
CN113327626B (en) | Voice noise reduction method, device, equipment and storage medium | |
CN104221079B (en) | Carry out the improved Mel filter bank structure of phonetic analysiss using spectral characteristic | |
CN110189757A (en) | A kind of giant panda individual discrimination method, equipment and computer readable storage medium | |
CN102486920A (en) | Audio event detection method and device | |
CN109192200B (en) | Speech recognition method | |
CN109147798B (en) | Speech recognition method, device, electronic equipment and readable storage medium | |
CN109065043B (en) | Command word recognition method and computer storage medium | |
CN110942766A (en) | Audio event detection method, system, mobile terminal and storage medium | |
CN107123432A (en) | A kind of Self Matching Top N audio events recognize channel self-adapted method | |
CN103258537A (en) | Method utilizing characteristic combination to identify speech emotions and device thereof | |
Chen et al. | An audio scene classification framework with embedded filters and a DCT-based temporal module | |
CN106782503A (en) | Automatic speech recognition method based on physiologic information in phonation | |
CN108806725A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN101577116A (en) | Extracting method of MFCC coefficients of voice signal, device and Mel filtering method | |
CN107527611A (en) | MFCC audio recognition methods, storage medium, electronic equipment and system | |
CN105845143A (en) | Speaker confirmation method and speaker confirmation system based on support vector machine | |
CN116524939A (en) | ECAPA-TDNN-based automatic identification method for bird song species | |
CN108172214A (en) | A kind of small echo speech recognition features parameter extracting method based on Mel domains | |
CN109616124A (en) | Lightweight method for recognizing sound-groove and system based on ivector |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20171229 |