CN106024010A - Speech signal dynamic characteristic extraction method based on formant curves - Google Patents


Info

Publication number
CN106024010A
CN106024010A (application CN201610340935.3A)
Authority
CN
China
Prior art keywords
formant
curve
voice signal
frame
formant curve
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610340935.3A
Other languages
Chinese (zh)
Other versions
CN106024010B (en)
Inventor
韩志艳
王健
王东
周建壮
郭继宁
刘继行
曹丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bohai University
Original Assignee
Bohai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bohai University filed Critical Bohai University
Priority to CN201610340935.3A priority Critical patent/CN106024010B/en
Publication of CN106024010A publication Critical patent/CN106024010A/en
Application granted granted Critical
Publication of CN106024010B publication Critical patent/CN106024010B/en
Legal status: Expired - Fee Related


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/15 - Speech or voice analysis techniques characterised by the extracted parameters being formant information
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/04 - Segmentation; Word boundary detection
    • G10L25/24 - Speech or voice analysis techniques characterised by the extracted parameters being the cepstrum
    • G10L25/18 - Speech or voice analysis techniques characterised by the extracted parameters being spectral information of each sub-band
    • G10L25/21 - Speech or voice analysis techniques characterised by the extracted parameters being power information

Abstract

The invention provides a speech signal dynamic characteristic extraction method based on formant curves, belonging to the technical field of Chinese speech signal dynamic characteristic extraction. The method comprises the following steps: acquiring speech signals; preprocessing the speech signals; extracting formant frequency characteristics of the speech signals; combining, in order from the first frame to the last frame, the first formant frequency characteristic values of all frames of the preprocessed speech signals to obtain a first formant curve, and obtaining a second, a third, and a fourth formant curve in the same manner; carrying out a fast Fourier transform on each formant curve to obtain a linear spectrum; obtaining an energy spectrum from the linear spectrum; obtaining log energy from the energy spectrum; and carrying out a discrete cosine transform on the log energy. Compared with existing methods, the dynamic characteristics extracted by the method have temporal correlation, revealing the close association between preceding, following, and adjacent parts of the speech signal and improving speech recognition performance.

Description

A speech signal dynamic feature extraction method based on formant curves
Technical field
The invention belongs to the technical field of Chinese speech signal dynamic feature extraction, and specifically relates to a speech signal dynamic feature extraction method based on formant curves.
Background technology
Speech recognition research in China began in the 1950s but did not develop rapidly until the 1970s. Research units such as the Chinese Academy of Sciences, Tsinghua University, and Peking University are engaged in developing Chinese speech recognition systems, and research on continuous large-vocabulary speech recognition is already close to the top international level. In China's Eighth Five-Year Plan and in the "863" program, research on Chinese speech recognition received strong support, with the national "863" intelligent-computer expert group setting up dedicated projects for it. Because of China's growing international status and its critical position in the economy and the market, Chinese speech recognition has also received increasing attention from foreign research institutions and companies: IBM, Microsoft, Apple, Motorola, Intel, L&H, and others have established research institutions in China and invested in the development of Chinese speech recognition systems, strongly promoting Mandarin speech recognition research.
Nevertheless, we are still far from truly free human-machine communication. Existing commercial systems still have problems, such as unsatisfactory recognition rates and poor robustness in noisy environments.
The most fundamental and important step in speech recognition is the extraction of speech feature parameters. As early as the 1940s, R. K. Potter et al. proposed the concept of "Visible Speech", pointing out that the spectrogram has strong descriptive power for speech signals, and attempted speech recognition using spectrogram information, which formed the earliest speech features. By the 1950s it was recognized that identifying speech requires extracting from the waveform parameters that reflect speech characteristics; this not only reduces the number of templates, the computation, and the storage, but also filters out useless redundancy in the signal. Time-domain parameters such as amplitude, short-time frame average energy, short-time frame zero-crossing rate, and short-time autocorrelation coefficients then appeared. As recognition technology developed, the stability and discriminative ability of time-domain parameters were found to be insufficient, so frequency-domain parameters came into use as speech features, such as pitch period, formant frequency, linear prediction coefficients (LPC), line spectrum pairs (LSP), and cepstral coefficients; the most widely used feature parameter at present is the Mel-frequency cepstral coefficient (MFCC), based on a model of human hearing. However, once these parameters are applied in a noisy environment, their performance declines sharply.
All of the feature parameters above reflect static characteristics of speech. The dynamic characteristics of a speech signal are parameters extracted from several adjacent frames; they can be obtained, for example, as the differential (delta) and acceleration parameters of the static features. However, differential and acceleration parameters do not mine the dynamic information sufficiently, so they still cannot reflect the dynamic characteristics of the speech signal well.
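The delta and acceleration parameters mentioned above are usually computed by regressing each static feature over a few neighboring frames. A minimal numpy sketch follows; the 2-frame regression window and edge padding are common conventions assumed for illustration, not taken from the text:

```python
import numpy as np

def delta(features, width=2):
    # Regression-based dynamic features over +/- width neighboring frames:
    # d[t] = sum_w w * (c[t+w] - c[t-w]) / (2 * sum_w w^2)
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    T = len(features)
    num = sum(w * (padded[width + w : T + width + w] - padded[width - w : T + width - w])
              for w in range(1, width + 1))
    return num / (2 * sum(w * w for w in range(1, width + 1)))

static = np.arange(20.0).reshape(10, 2)  # toy static features: 10 frames x 2 dims
d = delta(static)                        # first-order (delta) parameters
dd = delta(d)                            # second-order (acceleration) parameters
print(d.shape, dd.shape)
```

Applying `delta` twice gives the acceleration parameters; for the linear ramp above, the interior delta values equal the per-frame slope, and the acceleration values are near zero.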
Summary of the invention
To address the deficiencies of the prior art, the present invention proposes a speech signal dynamic feature extraction method based on formant curves, with the aims of broadening applications, improving speech recognition performance, quickly and effectively capturing the dynamic characteristics of the signal, and making it feasible to apply existing speech recognition technology in strong-noise environments.
A speech signal dynamic feature extraction method based on formant curves comprises the following steps:
Step 1: acquire the speech signal;
Step 2: preprocess the speech signal, including pre-emphasis, framing with windowing, and endpoint detection;
Step 3: using a method based on the Hilbert-Huang transform, estimate the formant frequency features of the preprocessed speech signal, obtaining the first, second, third, and fourth formant feature values of every frame of the speech signal;
Step 4: form the formant curves, specifically:
in frame order from the first frame to the last frame, combine the first formant feature values of every frame of the preprocessed speech signal to obtain the first formant curve;
in frame order from the first frame to the last frame, combine the second formant feature values of every frame of the preprocessed speech signal to obtain the second formant curve;
in frame order from the first frame to the last frame, combine the third formant feature values of every frame of the preprocessed speech signal to obtain the third formant curve;
in frame order from the first frame to the last frame, combine the fourth formant feature values of every frame of the preprocessed speech signal to obtain the fourth formant curve;
Step 5: apply the fast Fourier transform to the first, second, third, and fourth formant curves obtained, yielding the linear spectrum of each formant curve;
Step 6: obtain the energy spectrum of each formant curve from its linear spectrum;
Step 7: obtain the log energy of each formant curve from its energy spectrum;
Step 8: apply the discrete cosine transform to the log energy to reach the cepstral domain, i.e., obtain the dynamic feature parameters of the speech signal.
The preprocessing of the speech signal in step 2 comprises pre-emphasis, framing with windowing, and endpoint detection, wherein:
the pre-emphasis is implemented by a first-order digital pre-emphasis filter whose coefficient lies in the range 0.93-0.97;
the framing and windowing divide the signal into frames of 256 samples and apply a Hamming window to each frame;
the endpoint detection uses the short-time energy-zero product method.
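As a concrete illustration of this preprocessing, a minimal numpy sketch follows. The 0.9375 filter coefficient matches the embodiment described later; the 128-sample frame shift is an assumption not fixed by the text, and endpoint detection is omitted:

```python
import numpy as np

def preprocess(signal, alpha=0.9375, frame_len=256, frame_shift=128):
    # First-order digital pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Split into overlapping frames of frame_len samples
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    frames = np.stack([emphasized[i * frame_shift : i * frame_shift + frame_len]
                       for i in range(n_frames)])
    # Multiply every frame by a Hamming window
    return frames * np.hamming(frame_len)

rng = np.random.default_rng(0)
frames = preprocess(rng.standard_normal(11025))  # one second at 11.025 kHz
print(frames.shape)
```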
In step 5, the fast Fourier transform is applied to the first, second, third, and fourth formant curves obtained, yielding the linear spectrum of each formant curve;
the specific formula is:
X_i(k) = Σ_{n=0}^{N-1} x_i(n) · e^{-j2πnk/N}        (1)
where X_i(k) is the linear spectrum obtained from the i-th formant curve after the fast Fourier transform; i = 1, 2, 3, 4; k = 0, 1, 2, …, N-1; N is the number of frames of the speech signal; x_i(n) is the i-th formant curve; j is the imaginary unit; and e is the base of the natural logarithm.
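Equation (1) is an N-point discrete Fourier transform of each formant curve, which `numpy.fft.fft` computes directly. A sketch, using simulated per-frame formant values in place of the HHT-based estimates of step 3:

```python
import numpy as np

N = 100                                   # number of frames
rng = np.random.default_rng(1)
# Hypothetical formant curves x_i(n): rough F1..F4 center values plus jitter
curves = 500.0 * np.arange(1, 5)[:, None] + 50.0 * rng.standard_normal((4, N))

# Equation (1): linear spectrum X_i(k) of every formant curve
linear_spectra = np.fft.fft(curves, axis=1)

# The same quantity written out as the explicit sum of equation (1)
n = np.arange(N)
X1_manual = np.array([np.sum(curves[0] * np.exp(-2j * np.pi * n * k / N))
                      for k in range(N)])
print(np.allclose(linear_spectra[0], X1_manual))
```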
In step 8, the discrete cosine transform is applied to the above log energy to reach the cepstral domain, i.e., to obtain the dynamic feature parameters of the speech signal;
the specific formula is:
C_i(t) = Σ_{k=0}^{N-1} L_i(k) · cos[πt(k + 0.5)/N]        (2)
where C_i(t) is the dynamic feature coefficient of the i-th formant curve; i = 1, 2, 3, 4; t = 1, 2, …, T, where T is the chosen number of cepstral coefficients, in the range 12-16; L_i(k) is the log energy of the i-th formant curve; k = 0, 1, 2, …, N-1; and N is the number of frames of the speech signal.
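Equation (2) is, up to a factor of 2, the standard unnormalized DCT-II with the zeroth coefficient discarded, so it can be checked against `scipy.fft.dct`. A sketch on a hypothetical log-energy sequence:

```python
import numpy as np
from scipy.fft import dct

def dynamic_features(log_energy, T=12):
    # Equation (2): C(t) = sum_k L(k) * cos(pi * t * (k + 0.5) / N), t = 1..T
    N = len(log_energy)
    k = np.arange(N)
    return np.array([np.sum(log_energy * np.cos(np.pi * t * (k + 0.5) / N))
                     for t in range(1, T + 1)])

rng = np.random.default_rng(2)
log_energy = rng.standard_normal(100)   # hypothetical L_i(k), N = 100 frames
C = dynamic_features(log_energy, T=12)

# scipy's unnormalized DCT-II is exactly twice the sum in equation (2)
print(np.allclose(C, 0.5 * dct(log_energy, type=2, norm=None)[1:13]))
```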
The invention has the following advantages:
1. The speech signal dynamic feature coefficients obtained by the invention are mainly intended for computer dictation machines and for speech information query and service systems combined with the telephone network or the Internet; they can also be applied in miniaturized and portable speech products, such as voice dialing on wireless phones, voice control of automobile equipment, intelligent toys, and household remote controls.
2. The invention extracts dynamic features of the speech signal, which have temporal correlation and reveal the close association between preceding, following, and adjacent parts of the speech signal; compared with the traditional MFCC method, this greatly improves speech recognition performance.
3. The invention uses a method based on the Hilbert-Huang transform to estimate the formant frequency features of the preprocessed speech. Empirical mode decomposition (EMD) decomposes the signal into a set of intrinsic mode function (IMF) components of different scales; each IMF component obtained by the decomposition represents one frequency component, and these components effectively highlight the local characteristics and fine variations of the signal, helping to capture its dynamic characteristics quickly and effectively.
4. The formant curves constructed by the invention have temporal correlation and reveal the close association between preceding, following, and adjacent parts of the speech signal; this property makes it feasible to apply speech recognition technology in strong-noise environments.
Brief description of the drawings
Fig. 1 is a flow chart of the formant-curve-based speech signal dynamic feature extraction method of an embodiment of the present invention;
Fig. 2 compares the recognition performance curves of the parameters under white noise for an embodiment of the present invention;
Fig. 3 compares the recognition performance curves of the parameters under pink noise for an embodiment of the present invention;
Fig. 4 compares the recognition performance curves of the parameters under street noise for an embodiment of the present invention;
Fig. 5 compares the recognition performance curves of the parameters under tank noise for an embodiment of the present invention.
Detailed description of the invention
An embodiment of the present invention is described further below with reference to the accompanying drawings.
A speech signal dynamic feature extraction method based on formant curves, whose flow is shown in Fig. 1, comprises the following steps:
Step 1: acquire the speech signal;
In this embodiment, speech data are input through a microphone and sampled and quantized by a processing unit such as a computer, a single-chip microcomputer, or a DSP chip at a sampling frequency of 11.025 kHz with 16-bit quantization precision, obtaining the corresponding speech signal; this embodiment uses a computer as the processing unit;
Step 2: preprocess the speech signal, including pre-emphasis, framing with windowing, and endpoint detection;
In this embodiment, the pre-emphasis is implemented by a first-order digital pre-emphasis filter; the filter coefficient lies in the range 0.93-0.97 and is set to 0.9375 in this embodiment. The framing and windowing divide the signal into frames of 256 samples and apply a Hamming window to each frame. The endpoint detection uses the short-time energy-zero product method;
Step 3: using a method based on the Hilbert-Huang transform, estimate the formant frequency features of the preprocessed speech signal, obtaining for every frame the first formant feature value F1, the second formant feature value F2, the third formant feature value F3, and the fourth formant feature value F4;
In this embodiment, each formant frequency of the speech signal is first roughly estimated with the fast Fourier transform (FFT) and used to set the parameters of a corresponding band-pass filter; the signal is filtered with these parameters, the filtered signal is decomposed by empirical mode decomposition (EMD) into a family of intrinsic mode functions (IMFs), the IMF containing the formant frequency is selected by the energy-maximum principle, and computing the instantaneous frequency and Hilbert spectrum of that IMF yields the formant frequency parameter of the speech signal;
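A full EMD implementation is beyond a short sketch, but the last stage of the step above, reading an instantaneous frequency off a selected IMF via the Hilbert transform, can be illustrated with `scipy.signal.hilbert`; the 700 Hz test tone standing in for an IMF near a formant is an assumption for illustration:

```python
import numpy as np
from scipy.signal import hilbert

fs = 11025.0                         # sampling rate used in the embodiment
t = np.arange(0, 0.1, 1 / fs)
imf = np.cos(2 * np.pi * 700.0 * t)  # stand-in for an IMF around a 700 Hz formant

# Analytic signal via the Hilbert transform; the derivative of its
# unwrapped phase is the instantaneous frequency of the IMF.
analytic = hilbert(imf)
phase = np.unwrap(np.angle(analytic))
inst_freq = np.diff(phase) * fs / (2 * np.pi)

# Away from the window edges the estimate sits near the tone frequency
estimate = np.median(inst_freq[50:-50])
print(round(estimate))
```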
Step 4: form the formant curves, specifically:
In this embodiment, in frame order from the first frame to the last frame, the first formant frequency feature values F1 of every frame of the preprocessed speech signal are combined to obtain the first formant curve x_1(n), n = 0, 1, 2, …, N-1, where N is the number of frames of the speech signal; the second formant frequency feature values F2 are likewise combined to obtain the second formant curve x_2(n); the third formant frequency feature values F3 are combined to obtain the third formant curve x_3(n); and the fourth formant frequency feature values F4 are combined to obtain the fourth formant curve x_4(n);
Step 5: apply the fast Fourier transform to the first, second, third, and fourth formant curves obtained, yielding the linear spectrum of each formant curve;
In this embodiment, the specific formula is:
X_i(k) = Σ_{n=0}^{N-1} x_i(n) · e^{-j2πnk/N}        (1)
where X_i(k) is the linear spectrum obtained from the i-th formant curve after the fast Fourier transform; i = 1, 2, 3, 4; k = 0, 1, 2, …, N-1; N is the number of frames of the speech signal; x_i(n) is the i-th formant curve; j is the imaginary unit; and e is the base of the natural logarithm (approximately 2.718);
Step 6: obtain the energy spectrum of each formant curve from its linear spectrum;
In this embodiment, the squared modulus of the linear spectrum X_i(k) gives the corresponding energy spectrum S_i(k):
S_i(k) = |X_i(k)|²        (3)
where S_i(k) is the energy spectrum of the i-th formant curve;
Step 7: obtain the log energy of each formant curve from its energy spectrum;
In this embodiment, to make the result more robust to noise, the logarithm of the energy spectrum S_i(k) obtained above is taken, yielding the log energy L_i(k):
L_i(k) = log(S_i(k))        (4)
where L_i(k) is the log energy of the i-th formant curve;
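Equations (3) and (4), together with equation (1), reduce to three numpy lines; the formant-curve values here are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
curve = 500.0 + 50.0 * rng.standard_normal(100)  # hypothetical formant curve, N = 100

X = np.fft.fft(curve)   # equation (1): linear spectrum
S = np.abs(X) ** 2      # equation (3): energy spectrum  S(k) = |X(k)|^2
L = np.log(S)           # equation (4): log energy       L(k) = log(S(k))
print(L.shape)
```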
Step 8: apply the discrete cosine transform to the above log energy to reach the cepstral domain, i.e., obtain the dynamic feature coefficients of the speech signal.
The specific formula is:
C_i(t) = Σ_{k=0}^{N-1} L_i(k) · cos[πt(k + 0.5)/N]        (2)
where C_i(t) is the dynamic feature coefficient of the i-th formant curve; i = 1, 2, 3, 4; t = 1, 2, …, T, where T is the chosen number of cepstral coefficients, in the range 12-16; this embodiment takes T = 12;
In this embodiment, a vocabulary of 50 typical Chinese words is used for testing. Since a recognition system is easily affected by factors such as environmental noise, channel variation, and speaker variation, the training set uses speech data recorded in a quiet environment, while the test set uses noisy data.
To verify the robustness of the feature parameters to speaker variation, the training data were recorded in two sessions: 50 speakers in total, each pronouncing every word once, giving 5000 utterances; the test data were likewise recorded in two sessions: 30 speakers in total, each pronouncing every word once, giving 3000 utterances. To verify robustness to channel variation, a different microphone was used for each recording session. To verify robustness to environmental noise, four kinds of noise (white noise, pink noise, street noise, and tank noise) were artificially added to each test utterance, producing noisy speech signals at signal-to-noise ratios of 15 dB, 10 dB, 5 dB, 0 dB, and -5 dB.
A wavelet neural network improved by a genetic algorithm is used as the classifier in this embodiment; the network input layer has 48 neurons, the output layer has 50 neurons, and the number of hidden-layer nodes is determined by the genetic algorithm.
In this embodiment, Figs. 2, 3, 4, and 5 compare the system recognition performance curves of the MFCC method, under the same conditions as the embodiment, with those of the method of the embodiment under white noise, pink noise, street noise, and tank noise interference respectively. It can be seen that at low signal-to-noise ratios the recognition rate of the embodiment's method is much higher than that of the MFCC method.

Claims (4)

1. A speech signal dynamic feature extraction method based on formant curves, characterized by comprising the following steps:
Step 1: acquiring a speech signal;
Step 2: preprocessing the speech signal, including pre-emphasis, framing with windowing, and endpoint detection;
Step 3: using a method based on the Hilbert-Huang transform, estimating the formant frequency features of the preprocessed speech signal to obtain the first, second, third, and fourth formant feature values of every frame of the speech signal;
Step 4: forming the formant curves, specifically:
in frame order from the first frame to the last frame, combining the first formant feature values of every frame of the preprocessed speech signal to obtain the first formant curve;
in frame order from the first frame to the last frame, combining the second formant feature values of every frame of the preprocessed speech signal to obtain the second formant curve;
in frame order from the first frame to the last frame, combining the third formant feature values of every frame of the preprocessed speech signal to obtain the third formant curve;
in frame order from the first frame to the last frame, combining the fourth formant feature values of every frame of the preprocessed speech signal to obtain the fourth formant curve;
Step 5: applying the fast Fourier transform to the first, second, third, and fourth formant curves obtained, yielding the linear spectrum of each formant curve;
Step 6: obtaining the energy spectrum of each formant curve from its linear spectrum;
Step 7: obtaining the log energy of each formant curve from its energy spectrum;
Step 8: applying the discrete cosine transform to the above log energy to reach the cepstral domain, i.e., obtaining the dynamic feature parameters of the speech signal.
2. The speech signal dynamic feature extraction method based on formant curves according to claim 1, characterized in that the preprocessing of the speech signal in step 2 comprises pre-emphasis, framing with windowing, and endpoint detection, wherein:
the pre-emphasis is implemented by a first-order digital pre-emphasis filter whose coefficient lies in the range 0.93-0.97;
the framing and windowing divide the signal into frames of 256 samples and apply a Hamming window to each frame;
the endpoint detection uses the short-time energy-zero product method.
3. The speech signal dynamic feature extraction method based on formant curves according to claim 1, characterized in that in step 5 the fast Fourier transform is applied to the first, second, third, and fourth formant curves obtained, yielding the linear spectrum of each formant curve;
the specific formula is:
X_i(k) = Σ_{n=0}^{N-1} x_i(n) · e^{-j2πnk/N}        (1)
where X_i(k) is the linear spectrum obtained from the i-th formant curve after the fast Fourier transform; i = 1, 2, 3, 4; k = 0, 1, 2, …, N-1; N is the number of frames of the speech signal; x_i(n) is the i-th formant curve, n = 0, 1, 2, …, N-1; j is the imaginary unit; and e is the base of the natural logarithm.
4. The speech signal dynamic feature extraction method based on formant curves according to claim 1, characterized in that in step 8 the discrete cosine transform is applied to the above log energy to reach the cepstral domain, i.e., to obtain the dynamic feature parameters of the speech signal;
the specific formula is:
C_i(t) = Σ_{k=0}^{N-1} L_i(k) · cos[πt(k + 0.5)/N]        (2)
where C_i(t) is the dynamic feature coefficient of the i-th formant curve; i = 1, 2, 3, 4; t = 1, 2, …, T, where T is the chosen number of cepstral coefficients, in the range 12-16; L_i(k) is the log energy of the i-th formant curve; k = 0, 1, 2, …, N-1; and N is the number of frames of the speech signal.
CN201610340935.3A 2016-05-19 2016-05-19 Speech signal dynamic feature extraction method based on formant curves Expired - Fee Related CN106024010B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610340935.3A CN106024010B (en) 2016-05-19 2016-05-19 Speech signal dynamic feature extraction method based on formant curves

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610340935.3A CN106024010B (en) 2016-05-19 2016-05-19 Speech signal dynamic feature extraction method based on formant curves

Publications (2)

Publication Number Publication Date
CN106024010A true CN106024010A (en) 2016-10-12
CN106024010B CN106024010B (en) 2019-08-20

Family

ID=57095695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610340935.3A Expired - Fee Related CN106024010B (en) 2016-05-19 2016-05-19 A kind of voice signal dynamic feature extraction method based on formant curve

Country Status (1)

Country Link
CN (1) CN106024010B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106596002A (en) * 2016-12-14 2017-04-26 东南大学 High-speed railway steel truss arch bridge vehicle-bridge resonance curve measuring method
CN108053842A (en) * 2017-12-13 2018-05-18 电子科技大学 Shortwave sound end detecting method based on image identification
CN109410971A (en) * 2018-11-13 2019-03-01 无锡冰河计算机科技发展有限公司 A kind of method and apparatus for beautifying sound
CN110135291A (en) * 2019-04-29 2019-08-16 西北工业大学 A kind of method for parameter estimation of Low SNR signal
CN110663080A (en) * 2017-02-13 2020-01-07 法国国家科研中心 Method and apparatus for dynamically modifying the timbre of speech by frequency shifting of spectral envelope formants
CN111726728A (en) * 2020-06-30 2020-09-29 联想(北京)有限公司 Resonance suppression method and device
CN111899724A (en) * 2020-08-06 2020-11-06 中国人民解放军空军预警学院 Voice feature coefficient extraction method based on Hilbert-Huang transform and related equipment
CN112966528A (en) * 2021-03-01 2021-06-15 郑州铁路职业技术学院 English voice translation fuzzy matching system
CN114598565A (en) * 2022-05-10 2022-06-07 深圳市发掘科技有限公司 Kitchen electrical equipment remote control system and method and computer equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101067929A (en) * 2007-06-05 2007-11-07 南京大学 Method for enhancing and extracting phonetic resonance hump trace utilizing formant
CN102231281A (en) * 2011-07-18 2011-11-02 渤海大学 Voice visualization method based on integration characteristic and neural network
CN102820037A (en) * 2012-07-21 2012-12-12 渤海大学 Chinese initial and final visualization method based on combination feature
CN102855408A (en) * 2012-09-18 2013-01-02 福州大学 ICA (independent component analysis)-based EMD (empirical mode decomposition) improvement process IMF (intrinsic mode function) judgment method
CN103021405A (en) * 2012-12-05 2013-04-03 渤海大学 Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter
CN104835507A (en) * 2015-03-30 2015-08-12 渤海大学 Serial-parallel combined multi-mode emotion information fusion and identification method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
乐莎莎 (Yue Shasha): "Research on Cough Sound Recognition Based on HHT", China Master's Theses Full-text Database, Information Science and Technology Series *
王洪海 (Wang Honghai): "Research on Automatic Language Identification Based on Acoustic Features", China Master's Theses Full-text Database, Information Science and Technology Series *
莫家玲 (Mo Jialing): "Research on Speech Feature Parameter Extraction Based on Invariant-Set Multiwavelets", China Master's Theses Full-text Database, Information Science and Technology Series *
顾亚强 (Gu Yaqiang): "Research on Key Technologies of Speaker-Independent Speech Recognition", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106596002A (en) * 2016-12-14 2017-04-26 Southeast University High-speed railway steel truss arch bridge vehicle-bridge resonance curve measuring method
CN110663080A (en) * 2017-02-13 2020-01-07 Centre National de la Recherche Scientifique (CNRS) Method and apparatus for dynamically modifying the timbre of speech by frequency shifting of spectral envelope formants
CN108053842A (en) * 2017-12-13 2018-05-18 University of Electronic Science and Technology of China Short-wave voice endpoint detection method based on image recognition
CN108053842B (en) * 2017-12-13 2021-09-14 University of Electronic Science and Technology of China Short-wave voice endpoint detection method based on image recognition
CN109410971B (en) * 2018-11-13 2021-08-31 无锡冰河计算机科技发展有限公司 Method and device for beautifying sound
CN109410971A (en) * 2018-11-13 2019-03-01 无锡冰河计算机科技发展有限公司 Method and apparatus for beautifying sound
CN110135291A (en) * 2019-04-29 2019-08-16 Northwestern Polytechnical University Parameter estimation method for low signal-to-noise ratio signals
CN110135291B (en) * 2019-04-29 2023-03-24 Northwestern Polytechnical University Parameter estimation method for low signal-to-noise ratio signals
CN111726728A (en) * 2020-06-30 2020-09-29 Lenovo (Beijing) Co., Ltd. Resonance suppression method and device
CN111899724A (en) * 2020-08-06 2020-11-06 Air Force Early Warning Academy of the PLA Voice feature coefficient extraction method based on Hilbert-Huang transform and related equipment
CN112966528A (en) * 2021-03-01 2021-06-15 Zhengzhou Railway Vocational and Technical College English speech translation fuzzy matching system
CN112966528B (en) * 2021-03-01 2023-09-19 Zhengzhou Railway Vocational and Technical College English speech translation fuzzy matching system
CN114598565A (en) * 2022-05-10 2022-06-07 深圳市发掘科技有限公司 Kitchen electrical equipment remote control system and method, and computer equipment

Also Published As

Publication number Publication date
CN106024010B (en) 2019-08-20

Similar Documents

Publication Publication Date Title
CN106024010B (en) Speech signal dynamic feature extraction method based on formant curves
CN103236260B (en) Speech recognition system
CN102509547B (en) Method and system for voiceprint recognition based on vector quantization
CN103345923B (en) Short-utterance speaker recognition method based on sparse representation
CN103310789B (en) Sound event recognition method based on improved parallel model combination
CN104900229A (en) Method for extracting mixed characteristic parameters of voice signals
CN102968990B (en) Speaker identifying method and system
CN102568476B (en) Voice conversion method based on self-organizing feature map network cluster and radial basis network
CN103065629A (en) Speech recognition system of humanoid robot
CN101226743A (en) Speaker recognition method based on conversion between neutral and emotional voiceprint models
CN104123933A (en) Self-adaptive non-parallel training based voice conversion method
CN104183245A (en) Method and device for recommending music stars with tones similar to those of singers
CN113012720B (en) Depression detection method by multi-voice feature fusion under spectral subtraction noise reduction
CN108597505A (en) Audio recognition method, device and terminal device
CN110136709A (en) Audio recognition method and video conferencing system based on speech recognition
CN111192598A (en) Voice enhancement method for jump connection deep neural network
CN109036458A (en) A kind of multilingual scene analysis method based on audio frequency characteristics parameter
CN103021405A (en) Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter
CN106531174A (en) Animal sound recognition method based on wavelet packet decomposition and spectrogram features
CN105679312A (en) Phonetic feature processing method of voiceprint identification in noise environment
CN102237083A (en) Portable interpretation system based on WinCE platform and language recognition method thereof
CN100543840C (en) Speaker recognition method based on emotion migration rules and voice correction
CN109192196A (en) Noise-robust audio feature selection method for an SVM classifier
CN106373559A (en) Robustness feature extraction method based on logarithmic spectrum noise-to-signal weighting
CN110728991A (en) Improved recording equipment identification algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190820
Termination date: 20200519