CN107527611A - MFCC speech recognition method, storage medium, electronic device and system - Google Patents

MFCC speech recognition method, storage medium, electronic device and system

Info

Publication number
CN107527611A
CN107527611A (application CN201710731077.XA)
Authority
CN
China
Prior art keywords
mfcc
frequency
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710731077.XA
Other languages
Chinese (zh)
Inventor
李振华
陈少杰
张文明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd filed Critical Wuhan Douyu Network Technology Co Ltd
Priority to CN201710731077.XA priority Critical patent/CN107527611A/en
Publication of CN107527611A publication Critical patent/CN107527611A/en
Priority to PCT/CN2018/081321 priority patent/WO2019037426A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Abstract

The invention discloses an MFCC speech recognition method, a storage medium, an electronic device and a system, relating to the field of speech recognition. The steps of the method are: pre-process the speech signal to be recognized to obtain an MFCC initial signal; calculate the low frequency, intermediate frequency and high frequency of the MFCC initial signal; fuse the low, intermediate and high frequencies of the MFCC initial signal to obtain the MFCC characteristic parameter F; and perform dimensionality reduction on the MFCC characteristic parameter F to obtain the MFCC dimensionality-reduced characteristic parameter. The invention significantly improves the recognition precision of MFCC characteristic parameters in noisy environments and in the high-frequency region, thereby achieving the purpose of extracting MFCC characteristic parameters from speech signals in noisy environments and in the high-frequency region, and is well suited for popularization.

Description

MFCC speech recognition method, storage medium, electronic device and system
Technical field
The present invention relates to the field of speech recognition, and in particular to an MFCC (Mel-Frequency Cepstral Coefficients) speech recognition method, a storage medium, an electronic device and a system.
Background art
MFCC is a feature widely used in automatic speech and speaker recognition. Because MFCC characteristic parameters are the features that best characterize speech, they have been widely adopted in the field of speech recognition; that is, when performing speech recognition on a sound signal, extracting the MFCC characteristic parameters from the speech signal essentially completes the speech recognition function.
However, for speech signals with heavy noise and for the high-frequency region of a speech signal, the recognition precision of MFCC characteristic parameters is relatively low, which in turn makes the MFCC characteristic parameters difficult to extract.
Summary of the invention
In view of the defects in the prior art, the technical problem solved by the present invention is how to identify MFCC characteristic parameters in speech signals in noisy environments and in the high-frequency region. The present invention substantially improves the recognition precision of MFCC characteristic parameters and is well suited for popularization.
To achieve the above objective, the MFCC speech recognition method provided by the present invention comprises the following steps:
S1: pre-process the speech signal to be recognized to obtain an MFCC initial signal, then go to S2;
S2: according to the low-frequency frequency-domain signal f1 of the MFCC initial signal, calculate the low frequency fl of the MFCC initial signal; the calculation formula is:
According to the intermediate-frequency frequency-domain signal f2 of the MFCC initial signal, calculate the intermediate frequency fm of the MFCC initial signal; the calculation formula is:
According to the high-frequency frequency-domain signal f3 of the MFCC initial signal, calculate the high frequency fh of the MFCC initial signal; the calculation formula is: Go to S3;
S3: fuse the low frequency fl, intermediate frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F; the calculation formula is:
where N represents the number of FFT points used for the speech signal, Fs represents the stop-band cutoff frequency (Fs is a constant), Q^(-1) represents the inverse function of the fm mapping, Q(fl) represents the value of fm obtained by substituting fl for f2 in the fm calculation formula, Q(fh) represents the value of fm obtained by substituting fh for f2 in the fm calculation formula, and H represents the number of filters needed for speech recognition.
On the basis of the above technical solution, the method further comprises, after S3: S4: perform dimensionality reduction on the MFCC characteristic parameter F to obtain the MFCC dimensionality-reduced characteristic parameter F_drop; the calculation formula is: σ_between is the between-class scatter, representing the sum of the between-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals; σ_within is the within-class variance, representing the sum of the within-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signal.
The storage medium provided by the present invention stores a computer program which, when executed by a processor, implements the above MFCC speech recognition method.
The electronic device provided by the present invention comprises a memory and a processor; the memory stores a computer program that runs on the processor, and the processor implements the above MFCC speech recognition method when executing the computer program.
The MFCC speech recognition system provided by the present invention comprises a speech signal pre-processing module, an MFCC initial signal frequency calculation module and an MFCC characteristic parameter fusion module;
The speech signal pre-processing module is configured to: pre-process the speech signal to be recognized to obtain an MFCC initial signal, and send an MFCC initial signal frequency calculation signal to the MFCC initial signal frequency calculation module;
The MFCC initial signal frequency calculation module is configured to: after receiving the MFCC initial signal frequency calculation signal, calculate the low frequency fl of the MFCC initial signal according to its low-frequency frequency-domain signal f1; the calculation formula is:
According to the intermediate-frequency frequency-domain signal f2 of the MFCC initial signal, calculate the intermediate frequency fm of the MFCC initial signal; the calculation formula is:
According to the high-frequency frequency-domain signal f3 of the MFCC initial signal, calculate the high frequency fh of the MFCC initial signal; the calculation formula is: Then send an MFCC characteristic parameter fusion signal to the MFCC characteristic parameter fusion module;
The MFCC characteristic parameter fusion module is configured to: after receiving the MFCC characteristic parameter fusion signal, fuse the low frequency fl, intermediate frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F; the calculation formula is:
where N represents the number of FFT points used for the speech signal, Fs represents the stop-band cutoff frequency (Fs is a constant), Q^(-1) represents the inverse function of the fm mapping, Q(fl) represents the value of fm obtained by substituting fl for f2 in the fm calculation formula, Q(fh) represents the value of fm obtained by substituting fh for f2 in the fm calculation formula, and H represents the number of filters needed for speech recognition.
On the basis of the above technical solution, the system further comprises an MFCC characteristic parameter dimensionality reduction module, which is configured to: after the MFCC characteristic parameter fusion module finishes its work, perform dimensionality reduction on the MFCC characteristic parameters to obtain the MFCC dimensionality-reduced characteristic parameter F_drop; the calculation formula is: σ_between is the between-class scatter, representing the sum of the between-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signal; σ_within is the within-class variance, representing the sum of the within-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signal.
Compared with the prior art, the advantages of the present invention are:
(1) As can be seen from S1 to S3 of the present invention, the present invention first calculates the low, intermediate and high frequencies of the MFCC initial signal with an independently developed algorithm, and then fuses the low, intermediate and high frequencies to obtain the MFCC characteristic parameters. This significantly improves the recognition precision of MFCC characteristic parameters in noisy environments and in the high-frequency region, thereby achieving the purpose of extracting MFCC characteristic parameters from speech signals in noisy environments and in the high-frequency region, and the method is well suited for popularization.
(2) As can be seen from S4 of the present invention, the present invention performs dimensionality reduction on the MFCC characteristic parameters with an independently developed algorithm, so as to further improve the recognition precision of the MFCC characteristic parameters.
Brief description of the drawings
Fig. 1 is a flow chart of the MFCC speech recognition method in an embodiment of the present invention;
Fig. 2 is a connection block diagram of the electronic device in an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments.
As shown in Fig. 1, the MFCC speech recognition method in the embodiment of the present invention comprises the following steps:
S1: pre-process the speech signal to be recognized to obtain an MFCC initial signal, then go to S2.
The specific flow of S1 is: perform pre-emphasis, framing and windowing on the speech signal to obtain a processed speech signal; apply an FFT (Fast Fourier Transform) to the speech signal, i.e. transform it from the time domain to the frequency domain (the number of FFT points can be chosen as needed), to obtain a frequency-domain signal; take the squared modulus of the frequency-domain signal to obtain a discrete power spectrum; pass the discrete power spectrum through a filter bank, take the logarithm of the filtered signal, and then apply a DCT (Discrete Cosine Transform) to obtain the MFCC initial signal.
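As a concrete illustration of this pre-processing chain (not part of the patent text), the following Python sketch runs pre-emphasis, framing, windowing, FFT, power spectrum, filtering, logarithm and DCT; numpy and scipy are assumed to be available, the frame length, hop size and filter-bank sizes are illustrative choices, and the standard Mel scale is used for the filter bank only because the patent does not specify the mapping at this step.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_initial_signal(signal, fs, frame_len=400, hop=160, n_fft=512,
                        n_filters=26, n_ceps=13, pre_emph=0.97):
    """Sketch of step S1: pre-emphasis, framing, windowing, FFT, power
    spectrum, filter bank, logarithm and DCT (parameter values are
    illustrative, not taken from the patent)."""
    # Pre-emphasis
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # Framing with a Hamming window (signal assumed longer than one frame)
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([emphasized[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # FFT and discrete power spectrum (squared modulus)
    power_spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular filter bank on the standard Mel scale (assumption)
    high_mel = 2595.0 * np.log10(1.0 + (fs / 2.0) / 700.0)
    hz_points = 700.0 * (10.0 ** (np.linspace(0.0, high_mel, n_filters + 2) / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_points / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    # Filter, take the logarithm, then DCT to get the MFCC initial signal
    filtered = np.maximum(power_spec @ fbank.T, np.finfo(float).eps)
    return dct(np.log(filtered), type=2, axis=1, norm='ortho')[:, :n_ceps]
```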
S2: after the MFCC initial signal is obtained, the precision of the MFCC initial signal in the low-frequency, intermediate-frequency and high-frequency bands needs to be improved separately. The specific flow is:
According to the low-frequency frequency-domain signal f1 of the MFCC initial signal, calculate the low frequency fl of the MFCC initial signal; the calculation formula is:
According to the intermediate-frequency frequency-domain signal f2 of the MFCC initial signal, calculate the intermediate frequency fm of the MFCC initial signal; the calculation formula is:

$$f_m = \begin{cases} 1073.05 - 527\ln\!\left(1 + \dfrac{200 - f_2}{300}\right), & 0 < f_2 \le 2000 \\ 1073.05 + 527\ln\!\left(1 + \dfrac{f_2 - 200}{300}\right), & 2000 < f_2 \le 4000 \end{cases}$$
The design principle of the intermediate frequency fm calculation formula is: in order to solve the precision problem of the intermediate-frequency band, a suitable Mel-Hz correspondence for the intermediate-frequency region needs to be found. This correspondence should make the filter distribution sparse in the low- and high-frequency bands and relatively dense in the intermediate-frequency band, so as to guarantee the computational precision of the intermediate-frequency band. The fm calculated by the above formula meets this requirement and, being in the form of a logarithmic function, still guarantees the computational precision of the intermediate-frequency coefficients.
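For reference, the following Python sketch transcribes the fm mapping exactly as published in claim 1 and adds a closed-form inverse (our own derivation, introduced only so that it can serve as Q^(-1) in the fusion step below); the names q_mid and q_mid_inv are ours, and numpy is assumed. Note that, as written in the claim, the argument of the first logarithm becomes negative once f2 exceeds 500 Hz, so part of the stated range evaluates to NaN; the formula is reproduced here without modification.

```python
import numpy as np

def q_mid(f2):
    """Intermediate-frequency mapping fm(f2), transcribed literally from claim 1
    (first branch stated for 0 < f2 <= 2000, second for 2000 < f2 <= 4000)."""
    f2 = np.asarray(f2, dtype=float)
    low = 1073.05 - 527.0 * np.log(1.0 + (200.0 - f2) / 300.0)
    high = 1073.05 + 527.0 * np.log(1.0 + (f2 - 200.0) / 300.0)
    return np.where(f2 <= 2000.0, low, high)

def q_mid_inv(fm):
    """Inverse of q_mid (our derivation): each logarithmic branch solved for f2.
    The two branches meet at fm = 1073.05, where f2 = 200."""
    fm = np.asarray(fm, dtype=float)
    low = 200.0 - 300.0 * (np.exp((1073.05 - fm) / 527.0) - 1.0)
    high = 200.0 + 300.0 * (np.exp((fm - 1073.05) / 527.0) - 1.0)
    return np.where(fm <= 1073.05, low, high)
```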
According to the high-frequency frequency-domain signal f3 of the MFCC initial signal, calculate the high frequency fh of the MFCC initial signal; the calculation formula is: Go to S3.
The design principle of the high frequency fh calculation formula is: because the Mel filters used in MFCC extraction are numerous in the low-frequency region and few in the high-frequency region, the computational precision of MFCC decreases as the frequency increases. IMFCC (inverse Mel-frequency cepstral coefficients), i.e. the high frequency fh, is therefore introduced. With reference to the Mel domain, IMFCC constructs a new structure whose scale is opposite to that of the Mel domain, so that the number of filters distributed in the low-frequency band is reduced and the number in the high-frequency band is increased.
S3: fuse the low frequency fl, intermediate frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F; the calculation formula is:

$$F = \frac{N}{F_s}\,Q^{-1}\!\left[Q(f_l) + H\,\frac{Q(f_h) - Q(f_l)}{H + 1}\right]$$
where N represents the number of FFT points used for the speech signal, Fs represents the stop-band cutoff frequency (Fs is a constant), Q^(-1) represents the inverse function of the fm mapping, Q(fl) represents the value of fm obtained by substituting fl for f2 in the fm calculation formula, Q(fh) represents the value of fm obtained by substituting fh for f2 in the fm calculation formula, and H represents the number of filters needed for speech recognition. Go to S4.
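A minimal sketch of this fusion step under the assumptions above: fuse_mfcc_parameter is our name for the function, q and q_inv stand for the fm mapping Q and its inverse (for example the q_mid / q_mid_inv functions sketched earlier), and the values in the commented usage lines are purely illustrative. Note that the published formula involves fm only through the mapping Q.

```python
def fuse_mfcc_parameter(fl, fh, n_points, fs_stop, n_filters, q, q_inv):
    """Step S3 fusion:
        F = (N / Fs) * Q^{-1}[ Q(fl) + H * (Q(fh) - Q(fl)) / (H + 1) ]
    where q / q_inv implement the fm mapping Q and its inverse Q^{-1},
    n_points is the FFT size N, fs_stop the stop-band cutoff frequency Fs,
    and n_filters the number of filters H."""
    warped = q(fl) + n_filters * (q(fh) - q(fl)) / (n_filters + 1)
    return (n_points / fs_stop) * q_inv(warped)

# Illustrative usage (all values are assumptions, not taken from the patent):
# F = fuse_mfcc_parameter(fl=250.0, fh=3500.0, n_points=512,
#                         fs_stop=4000.0, n_filters=26,
#                         q=q_mid, q_inv=q_mid_inv)
```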
S4: perform dimensionality reduction on the MFCC characteristic parameter F to obtain the MFCC dimensionality-reduced characteristic parameter F_drop, so as to further improve the recognition precision; the calculation formula is:
where σ_between is the between-class scatter, representing the sum of the between-class variances of the k-th dimension of the MFCC characteristic parameters over a group of speech signals; σ_within is the within-class variance, representing the sum of the within-class variances of the k-th dimension of the MFCC characteristic parameters of a speech signal; M represents the total number of speech samples, n_i represents the number of speech samples belonging to speech signal i, μ_k^(i) represents the mean of the k-th dimension of the MFCC characteristic parameters of speech signal i, and μ_k is the mean of the k-th dimension of the MFCC characteristic parameters over the speech signals; x_k^(j) represents the k-th dimension MFCC characteristic parameter component of the j-th speech sample of speech signal i, and x_k^(i) represents the k-th dimension MFCC characteristic parameter component of the i-th speech sample of speech signal i.
The principle of S4 is: MFCC characteristic parameters generally need 20 to 30 dimensions to guarantee the recognition rate of a speech recognition system, and the larger the number of dimensions, the more accurately the features of the speech signal are described. However, because each dimension of the characteristic parameter contributes differently to the speech recognition system, the characteristic parameter usually contains much useless or even interfering information, which affects both the timeliness and the recognition rate of the system. Therefore, dimensionality reduction is performed on the MFCC characteristic parameters to select the effective dimensions of the MFCC characteristic parameters.
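A minimal sketch of this selection step under stated assumptions: σ_between and σ_within are accumulated per dimension following the definitions in claim 3 (with the class reference x_k^(i) interpreted as the per-class mean), and dimensions are then ranked by the ratio σ_between / σ_within. The patent does not reproduce the F_drop formula itself, so using this ratio as the ranking score and keeping the top-scoring dimensions is our assumption; numpy is assumed.

```python
import numpy as np

def select_mfcc_dimensions(features, labels, n_keep):
    """features: array of shape (n_samples, n_dims) holding MFCC characteristic
    parameters; labels: the speech-signal (class) index of each sample.
    Returns the indices of the n_keep dimensions with the largest
    between-class / within-class variance ratio (assumed selection criterion)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    overall_mean = features.mean(axis=0)                      # mu_k per dimension
    sigma_between = np.zeros(features.shape[1])
    sigma_within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        x = features[labels == c]                             # samples of speech signal i
        class_mean = x.mean(axis=0)                           # mu_k^(i), used as the class reference
        sigma_between += (class_mean - overall_mean) ** 2     # between-class term
        sigma_within += ((x - class_mean) ** 2).mean(axis=0)  # within-class term (1/n_i used here)
    ratio = sigma_between / np.maximum(sigma_within, np.finfo(float).eps)
    return np.argsort(ratio)[::-1][:n_keep]
```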
An embodiment of the present invention also provides a storage medium on which a computer program is stored; when the computer program is executed by a processor, the above MFCC speech recognition method is implemented. It should be noted that the storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a ROM (Read-Only Memory), a RAM (Random Access Memory), a magnetic disk or an optical disk.
As shown in Fig. 2, an embodiment of the present invention also provides an electronic device comprising a memory and a processor; the memory stores a computer program that runs on the processor, and the processor implements the above MFCC speech recognition method when executing the computer program.
The MFCC speech recognition system provided by the embodiment of the present invention comprises a speech signal pre-processing module, an MFCC initial signal frequency calculation module, an MFCC characteristic parameter fusion module and an MFCC characteristic parameter dimensionality reduction module.
The speech signal pre-processing module is configured to: pre-process the speech signal to be recognized to obtain an MFCC initial signal, and send an MFCC initial signal frequency calculation signal to the MFCC initial signal frequency calculation module. The specific flow is: perform pre-emphasis, framing and windowing on the speech signal to obtain a processed speech signal; apply an FFT to the speech signal to obtain a frequency-domain signal; take the squared modulus of the frequency-domain signal to obtain a discrete power spectrum; pass the discrete power spectrum through a filter bank, take the logarithm of the filtered signal, and then apply a DCT to obtain the MFCC initial signal.
The MFCC initial signal frequency calculation module is configured to: after receiving the MFCC initial signal frequency calculation signal, calculate the low frequency fl of the MFCC initial signal according to its low-frequency frequency-domain signal f1; the calculation formula is:
According to the intermediate-frequency frequency-domain signal f2 of the MFCC initial signal, calculate the intermediate frequency fm of the MFCC initial signal; the calculation formula is:
According to the high-frequency frequency-domain signal f3 of the MFCC initial signal, calculate the high frequency fh of the MFCC initial signal; the calculation formula is: Then send an MFCC characteristic parameter fusion signal to the MFCC characteristic parameter fusion module.
The MFCC characteristic parameter fusion module is configured to: after receiving the MFCC characteristic parameter fusion signal, fuse the low frequency fl, intermediate frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F; the calculation formula is:
where N represents the number of FFT points used for the speech signal (the specific value of N can be chosen according to the state of the art), Fs represents the stop-band cutoff frequency (Fs is a constant), Q^(-1) represents the inverse function of the fm mapping, Q(fl) represents the value of fm obtained by substituting fl for f2 in the fm calculation formula, Q(fh) represents the value of fm obtained by substituting fh for f2 in the fm calculation formula, and H represents the number of filters needed for speech recognition. Then send an MFCC characteristic parameter dimensionality reduction signal to the MFCC characteristic parameter dimensionality reduction module.
The MFCC characteristic parameter dimensionality reduction module is configured to: after receiving the MFCC characteristic parameter dimensionality reduction signal, perform dimensionality reduction on the MFCC characteristic parameters to obtain the MFCC dimensionality-reduced characteristic parameter F_drop; the calculation formula is:
where σ_between is the between-class scatter, representing the sum of the between-class variances of the k-th dimension of the MFCC characteristic parameters over a group of speech signals; σ_within is the within-class variance, representing the sum of the within-class variances of the k-th dimension of the MFCC characteristic parameters of a speech signal; M represents the total number of speech samples, n_i represents the number of speech samples belonging to speech signal i, μ_k^(i) represents the mean of the k-th dimension of the MFCC characteristic parameters of speech signal i, and μ_k is the mean of the k-th dimension of the MFCC characteristic parameters over the speech signals; x_k^(j) represents the k-th dimension MFCC characteristic parameter component of the j-th speech sample of speech signal i, and x_k^(i) represents the k-th dimension MFCC characteristic parameter component of the i-th speech sample of speech signal i.
It should be noted that, when the system provided by the embodiment of the present invention performs inter-module communication, the division into the above functional modules is used only as an example; in practical applications, the above functions may be assigned to different functional modules as needed, i.e. the internal structure of the system may be divided into different functional modules to complete all or part of the functions described above.
Furthermore, the present invention is not limited to the above embodiments. For those skilled in the art, several improvements and modifications can be made without departing from the principles of the present invention, and such improvements and modifications are also considered to fall within the protection scope of the present invention. Content not described in detail in this specification belongs to the prior art known to those skilled in the art.

Claims (10)

1. An MFCC speech recognition method, characterized in that the method comprises the following steps:
S1: pre-process the speech signal to be recognized to obtain an MFCC initial signal, then go to S2;
S2: according to the low-frequency frequency-domain signal f1 of the MFCC initial signal, calculate the low frequency fl of the MFCC initial signal; the calculation formula is:
According to the intermediate-frequency frequency-domain signal f2 of the MFCC initial signal, calculate the intermediate frequency fm of the MFCC initial signal; the calculation formula is:
$$f_m = \begin{cases} 1073.05 - 527\ln\!\left(1 + \dfrac{200 - f_2}{300}\right), & 0 < f_2 \le 2000 \\ 1073.05 + 527\ln\!\left(1 + \dfrac{f_2 - 200}{300}\right), & 2000 < f_2 \le 4000 \end{cases}$$
According to the high-frequency frequency-domain signal f3 of the MFCC initial signal, calculate the high frequency fh of the MFCC initial signal; the calculation formula is:
Go to S3;
S3: fuse the low frequency fl, intermediate frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F; the calculation formula is:
$$F = \frac{N}{F_s}\,Q^{-1}\!\left[Q(f_l) + H\,\frac{Q(f_h) - Q(f_l)}{H + 1}\right]$$
where N represents the number of FFT points used for the speech signal, Fs represents the stop-band cutoff frequency and is a constant, Q^(-1) represents the inverse function of the fm mapping, Q(fl) represents the value of fm obtained by substituting fl for f2 in the fm calculation formula, Q(fh) represents the value of fm obtained by substituting fh for f2 in the fm calculation formula, and H represents the number of filters needed for speech recognition.
2. The MFCC speech recognition method according to claim 1, characterized in that the method further comprises, after S3: S4: perform dimensionality reduction on the MFCC characteristic parameter F to obtain the MFCC dimensionality-reduced characteristic parameter F_drop; the calculation formula is: σ_between is the between-class scatter, representing the sum of the between-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signals; σ_within is the within-class variance, representing the sum of the within-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signal.
3. The MFCC speech recognition method according to claim 2, characterized in that:
$$\sigma_{between} = \sum_{i=1}^{M}\left(\mu_k^{(i)} - \mu_k\right)^2,\qquad \sigma_{within} = \sum_{i=1}^{M}\left[\frac{1}{n}\sum_{j=1}^{n_i}\left(x_k^{(j)} - x_k^{(i)}\right)^2\right]$$
where M represents the total number of speech samples, n_i represents the number of speech samples belonging to speech signal i, μ_k^(i) represents the mean of the k-th dimension of the MFCC characteristic parameters of speech signal i, and μ_k is the mean of the k-th dimension of the MFCC characteristic parameters over the speech signals; x_k^(j) represents the k-th dimension MFCC characteristic parameter component of the j-th speech sample of speech signal i, and x_k^(i) represents the k-th dimension MFCC characteristic parameter component of the i-th speech sample of speech signal i.
4. The MFCC speech recognition method according to any one of claims 1 to 3, characterized in that the flow of S1 includes: perform pre-emphasis, framing and windowing on the speech signal to obtain a processed speech signal; apply an FFT to the speech signal to obtain a frequency-domain signal; take the squared modulus of the frequency-domain signal to obtain a discrete power spectrum; pass the discrete power spectrum through a filter bank, take the logarithm of the filtered signal, and then apply a DCT to obtain the MFCC initial signal.
5. A storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 4.
6. An electronic device comprising a memory and a processor, the memory storing a computer program that runs on the processor, characterized in that the processor implements the method according to any one of claims 1 to 4 when executing the computer program.
7. An MFCC speech recognition system, characterized in that the system comprises a speech signal pre-processing module, an MFCC initial signal frequency calculation module and an MFCC characteristic parameter fusion module;
The speech signal pre-processing module is configured to: pre-process the speech signal to be recognized to obtain an MFCC initial signal, and send an MFCC initial signal frequency calculation signal to the MFCC initial signal frequency calculation module;
The MFCC initial signal frequency calculation module is configured to: after receiving the MFCC initial signal frequency calculation signal, calculate the low frequency fl of the MFCC initial signal according to its low-frequency frequency-domain signal f1; the calculation formula is:
According to the intermediate-frequency frequency-domain signal f2 of the MFCC initial signal, calculate the intermediate frequency fm of the MFCC initial signal; the calculation formula is:
$$f_m = \begin{cases} 1073.05 - 527\ln\!\left(1 + \dfrac{200 - f_2}{300}\right), & 0 < f_2 \le 2000 \\ 1073.05 + 527\ln\!\left(1 + \dfrac{f_2 - 200}{300}\right), & 2000 < f_2 \le 4000 \end{cases}$$
According to the high-frequency frequency-domain signal f3 of the MFCC initial signal, calculate the high frequency fh of the MFCC initial signal; the calculation formula is:
Then send an MFCC characteristic parameter fusion signal to the MFCC characteristic parameter fusion module;
The MFCC characteristic parameter fusion module is configured to: after receiving the MFCC characteristic parameter fusion signal, fuse the low frequency fl, intermediate frequency fm and high frequency fh of the MFCC initial signal to obtain the MFCC characteristic parameter F; the calculation formula is:
$$F = \frac{N}{F_s}\,Q^{-1}\!\left[Q(f_l) + H\,\frac{Q(f_h) - Q(f_l)}{H + 1}\right]$$
where N represents the number of FFT points used for the speech signal, Fs represents the stop-band cutoff frequency and is a constant, Q^(-1) represents the inverse function of the fm mapping, Q(fl) represents the value of fm obtained by substituting fl for f2 in the fm calculation formula, Q(fh) represents the value of fm obtained by substituting fh for f2 in the fm calculation formula, and H represents the number of filters needed for speech recognition.
8. The MFCC speech recognition system according to claim 7, characterized in that the system further comprises an MFCC characteristic parameter dimensionality reduction module configured to: after the MFCC characteristic parameter fusion module finishes its work, perform dimensionality reduction on the MFCC characteristic parameters to obtain the MFCC dimensionality-reduced characteristic parameter F_drop; the calculation formula is: σ_between is the between-class scatter, representing the sum of the between-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signal; σ_within is the within-class variance, representing the sum of the within-class variances of the k-th dimension of the MFCC characteristic parameters of the speech signal.
9. The MFCC speech recognition system according to claim 8, characterized in that:
$$\sigma_{between} = \sum_{i=1}^{M}\left(\mu_k^{(i)} - \mu_k\right)^2,\qquad \sigma_{within} = \sum_{i=1}^{M}\left[\frac{1}{n}\sum_{j=1}^{n_i}\left(x_k^{(j)} - x_k^{(i)}\right)^2\right]$$
where M represents the total number of speech samples, n_i represents the number of speech samples belonging to speech signal i, μ_k^(i) represents the mean of the k-th dimension of the MFCC characteristic parameters of speech signal i, and μ_k is the mean of the k-th dimension of the MFCC characteristic parameters over the speech signals; x_k^(j) represents the k-th dimension MFCC characteristic parameter component of the j-th speech sample of speech signal i, and x_k^(i) represents the k-th dimension MFCC characteristic parameter component of the i-th speech sample of speech signal i.
10. The MFCC speech recognition system according to any one of claims 7 to 9, characterized in that the workflow of the speech signal pre-processing module includes: perform pre-emphasis, framing and windowing on the speech signal to obtain a processed speech signal; apply an FFT to the speech signal to obtain a frequency-domain signal; take the squared modulus of the frequency-domain signal to obtain a discrete power spectrum; pass the discrete power spectrum through a filter bank, take the logarithm of the filtered signal, and then apply a DCT to obtain the MFCC initial signal.
CN201710731077.XA 2017-08-23 2017-08-23 MFCC speech recognition method, storage medium, electronic device and system Pending CN107527611A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710731077.XA CN107527611A (en) 2017-08-23 2017-08-23 MFCC speech recognition method, storage medium, electronic device and system
PCT/CN2018/081321 WO2019037426A1 (en) 2017-08-23 2018-03-30 Mfcc voice recognition method, storage medium, electronic device, and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710731077.XA CN107527611A (en) 2017-08-23 2017-08-23 MFCC speech recognition method, storage medium, electronic device and system

Publications (1)

Publication Number Publication Date
CN107527611A true CN107527611A (en) 2017-12-29

Family

ID=60681946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710731077.XA Pending CN107527611A (en) 2017-08-23 2017-08-23 MFCC speech recognition method, storage medium, electronic device and system

Country Status (2)

Country Link
CN (1) CN107527611A (en)
WO (1) WO2019037426A1 (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7027979B2 (en) * 2003-01-14 2006-04-11 Motorola, Inc. Method and apparatus for speech reconstruction within a distributed speech recognition system
CN101577116B (en) * 2009-02-27 2012-07-18 北京中星微电子有限公司 Extracting method of MFCC coefficients of voice signal, device and Mel filtering method
JP2012163919A (en) * 2011-02-09 2012-08-30 Sony Corp Voice signal processing device, method and program
CN103390403B (en) * 2013-06-19 2015-11-25 北京百度网讯科技有限公司 The extracting method of MFCC feature and device
CN105405448B (en) * 2014-09-16 2019-09-03 科大讯飞股份有限公司 A kind of sound effect treatment method and device
CN106782565A (en) * 2016-11-29 2017-05-31 重庆重智机器人研究院有限公司 A kind of vocal print feature recognition methods and system
CN107527611A (en) * 2017-08-23 2017-12-29 武汉斗鱼网络科技有限公司 MFCC speech recognition method, storage medium, electronic device and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090144053A1 (en) * 2007-12-03 2009-06-04 Kabushiki Kaisha Toshiba Speech processing apparatus and speech synthesis apparatus
CN104900229A (en) * 2015-05-25 2015-09-09 桂林电子科技大学信息科技学院 Method for extracting mixed characteristic parameters of voice signals
CN105895087A (en) * 2016-03-24 2016-08-24 海信集团有限公司 Voice recognition method and apparatus
CN106356058A (en) * 2016-09-08 2017-01-25 河海大学 Robust speech recognition method based on multi-band characteristic compensation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张文克: China Master's Theses Full-text Database, Information Science and Technology Series, 15 March 2017, China Academic Journal (CD) Electronic Publishing House *
袁正午 et al.: "Research on an improved hybrid MFCC speech recognition algorithm", Computer Engineering and Applications *
鲜晓东 et al.: "Mixed feature extraction method of Mel-frequency cepstral coefficients based on Fisher ratio", Journal of Computer Applications *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019037426A1 (en) * 2017-08-23 2019-02-28 武汉斗鱼网络科技有限公司 Mfcc voice recognition method, storage medium, electronic device, and system
CN108269566A (en) * 2018-01-17 2018-07-10 南京理工大学 A kind of thorax mouth wave recognition methods based on multiple dimensioned sub-belt energy collection feature
CN113571078B (en) * 2021-01-29 2024-04-26 腾讯科技(深圳)有限公司 Noise suppression method, device, medium and electronic equipment

Also Published As

Publication number Publication date
WO2019037426A1 (en) 2019-02-28

Similar Documents

Publication Publication Date Title
CN102968986B (en) Overlapped voice and single voice distinguishing method based on long time characteristics and short time characteristics
CN103117059B (en) Voice signal characteristics extracting method based on tensor decomposition
CN102968990B (en) Speaker identifying method and system
CN106847292A (en) Method for recognizing sound-groove and device
CN102483916B (en) Audio feature extracting apparatus, audio feature extracting method, and audio feature extracting program
CN109378010A (en) Training method, the speech de-noising method and device of neural network model
CN113327626B (en) Voice noise reduction method, device, equipment and storage medium
CN104221079B (en) Carry out the improved Mel filter bank structure of phonetic analysiss using spectral characteristic
CN110189757A (en) A kind of giant panda individual discrimination method, equipment and computer readable storage medium
CN102486920A (en) Audio event detection method and device
CN109192200B (en) Speech recognition method
CN109147798B (en) Speech recognition method, device, electronic equipment and readable storage medium
CN109065043B (en) Command word recognition method and computer storage medium
CN110942766A (en) Audio event detection method, system, mobile terminal and storage medium
CN107123432A (en) A kind of Self Matching Top N audio events recognize channel self-adapted method
CN103258537A (en) Method utilizing characteristic combination to identify speech emotions and device thereof
Chen et al. An audio scene classification framework with embedded filters and a DCT-based temporal module
CN106782503A (en) Automatic speech recognition method based on physiologic information in phonation
CN108806725A (en) Speech differentiation method, apparatus, computer equipment and storage medium
CN101577116A (en) Extracting method of MFCC coefficients of voice signal, device and Mel filtering method
CN107527611A (en) MFCC speech recognition method, storage medium, electronic device and system
CN105845143A (en) Speaker confirmation method and speaker confirmation system based on support vector machine
CN116524939A (en) ECAPA-TDNN-based automatic identification method for bird song species
CN108172214A (en) A kind of small echo speech recognition features parameter extracting method based on Mel domains
CN109616124A (en) Lightweight method for recognizing sound-groove and system based on ivector

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171229